tokenspeed — feel LLM tokens-per-second

488 points · 96 comments · by hexagr · 6d ago

Pangram verdict · v3.3

We believe that this document is fully AI-generated

AI likelihood (overall): 99%

Human-written: 0% · AI-generated: 100%
Segments: 0 of 1 human · 1 of 1 AI
Word count: 261
Peak AI: 100% (§1)
Analyzed: May 20 · backend: pangram/v3.3
Segments scanned: 1 window, avg 261 words

Article text · 261 words · 1 segment analyzed

§1 · AI · 100%

Every local-LLM benchmark reports throughput: "47 tok/s on an M3," "180 tok/s on a 4090," "500 tok/s on Groq." Unless you've actually watched tokens stream at those rates, the numbers are hard to internalize. This tool renders them so you can see each rate for yourself.

Four modes

code: syntax-highlighted pseudo-code, the most common thing you watch stream out of an LLM.
text: lorem ipsum prose, for the chat/answer case.
think: dim-italic reasoning sentences alternating with code, mimicking a reasoning model thinking out loud.
agent: alternating tool calls and code generation with processing pauses, simulating an AI coding agent.

What to try

Start at the default 30 tok/s and read along. Then press 1 (5 tok/s, a Raspberry-Pi-class local model), 5 (60 tok/s, typical hosted Claude or GPT), 7 (200 tok/s, Groq territory), and 9 (800 tok/s, Cerebras-class, where the bottleneck is your eyeballs). Now switch between c (code) and t (text) at the same rate. The same tok/s reads very differently as code than as prose, and that is intentional.
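For readers who want to reproduce the effect outside the tool, here is a minimal pacing sketch in Python. It is not the tool's own implementation: the names `PRESETS`, `token_deadlines`, and `stream` are hypothetical, and the preset table contains only the four key/rate pairs mentioned above. It schedules each token against an absolute deadline, so small sleep errors do not accumulate at high rates.

```python
import time
from typing import Callable, Iterable, Iterator

# Hypothetical preset table: only the four key/rate pairs named in the text.
PRESETS = {"1": 5, "5": 60, "7": 200, "9": 800}


def token_deadlines(n_tokens: int, tok_per_s: float) -> list[float]:
    """Return the offset in seconds at which each token should appear."""
    if tok_per_s <= 0:
        raise ValueError("tok_per_s must be positive")
    return [i / tok_per_s for i in range(n_tokens)]


def stream(
    tokens: Iterable[str],
    tok_per_s: float,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Iterator[str]:
    """Yield tokens at a steady rate.

    Each token waits for its own absolute deadline (start + i / rate)
    rather than sleeping a fixed 1 / rate after the previous token.
    """
    items = list(tokens)
    start = clock()
    for token, offset in zip(items, token_deadlines(len(items), tok_per_s)):
        remaining = start + offset - clock()
        if remaining > 0:
            sleep(remaining)
        yield token
```

To watch it, loop over `stream("the quick brown fox".split(), PRESETS["5"])` and print each token with `end=" "` and `flush=True`. The `sleep` and `clock` parameters exist only so the pacing can be tested without real delays.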

What counts as a token

This approximates BPE-style tokenization, not any vendor-specific encoder (tiktoken, Claude's tokenizer, and so on; those disagree in the details anyway). Short words are often one token; longer identifiers split into chunks (processUserInput → process + User + Input); punctuation and operators usually count as tokens too. English prose averages ~1.3 tokens per word, so 30 tok/s ≈ 23 words/s. Code is more token-dense than prose, so the same tok/s can feel very different depending on what's streaming. The benchmark number is honest, but the perceptual effect varies a lot by content type, which is the gap this tool exists to expose.
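The counting rules above can be sketched with a small regex heuristic. This is an illustration under stated assumptions, not the tool's tokenizer and not real BPE: a real encoder learns its merges from data and will also split long plain words, which this sketch does not. The names `approx_tokens`, `tokens_per_word`, and `words_per_second` are hypothetical, and the 1.3 default is the prose average quoted in the text.

```python
import re

# Rough heuristic only; real BPE encoders split differently.
_PIECE = re.compile(
    r"[A-Z]+(?![a-z])"     # acronym run such as "HTTP"
    r"|[A-Z]?[a-z]+"       # a word or camelCase chunk: "process", "User"
    r"|\d+"                # a run of digits
    r"|[^\sA-Za-z\d]"      # each punctuation mark or operator on its own
)


def approx_tokens(text: str) -> list[str]:
    """Split text into approximate token-like pieces."""
    return _PIECE.findall(text)


def tokens_per_word(text: str) -> float:
    """Approximate token density: pieces per whitespace-separated word."""
    words = text.split()
    if not words:
        return 0.0
    return len(approx_tokens(text)) / len(words)


def words_per_second(tok_per_s: float, tok_per_word: float = 1.3) -> float:
    """Convert a token rate to a reading rate for a given token density."""
    return tok_per_s / tok_per_word


print(approx_tokens("processUserInput"))  # ['process', 'User', 'Input']
print(round(words_per_second(30)))        # 23
```

Under this heuristic a line like `total += price * qty;` yields 7 pieces for 5 whitespace-separated words, while plain lowercase prose yields one piece per word. The direction of that gap, not the exact ratio, is the point: denser content puts fewer words on screen at the same tok/s.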