Human Typing Habits and Token Counts

pankajpipada.com

by ppipada


May 8, 2026 | Reading Time: 3 min

Humans type for speed, tone, and habit. Tokenizers split text based on common patterns, and providers bill per token. That means ordinary habits like typos, shorthand, filler words, pasted IDs, and stray whitespace can change token counts without changing intent much.

I started noticing this on a tiny prompt: 5 words, 2 spelling mistakes, 13 tokens. I fixed the spelling and sent it again: 6 tokens, including the full stop.

Counts below use OpenAI’s tokenizer and Claude’s API-based tokenizer. In my usage, Claude generally spits out more tokens than OpenAI for the same text. Counts here are for isolated strings. In real prompts, counts can shift slightly based on surrounding spaces, punctuation, and casing.

Typos

Swapped letters, dropped letters, doubled letters, nearby-key misses: all normal typing habits, all billable.

- template → 1, tempalte → 3
- loaded → 1, lodaed → 2, Claude: 3
- assistant → 1, assitant → 2, Claude: 3
- simple → 1, simpel → 2
- like → 1, liek → 2

Same intent. Different split.

Common spellings compress. Rarer spellings fragment. In code, this can compound quickly: the same bad identifier, variable name, or function name shows up in declarations, references, logs, errors, diffs, etc.

When I type for work (code, prompts, texts, etc.), my left hand is slightly faster than my right, which results in some swapped letters. I never bothered to correct myself in Google searches, text messages, and the like. Now, apparently, that difference has a pricing model.

Word shapes

Word shapes matter too. Quick checks:

- describe → 1, describer → 2, describers → 3
- error → 1, errored → 2

A tiny suffix looks harmless to a human. Tokenizers may split it very differently.

Conversation habits

Human chat carries a lot of low-signal padding:

- fillers: just, basically, actually, really
- hedges: maybe, I think, kind of, sort of
- wrappers: hey, please, thanks, sorry
- tails: etc., or so, and all that. The etc. case is a bit mixed: if it replaces a long, useless tail, it can save tokens; if it just hangs off the end of a sentence, it mostly adds fog.
- chat noise: lol, haha, ..., !!, ??
- transcript filler: uh, um, you know, like

Tiny expressive habits count too:

- Good → 1 / Good... → 2
- Yes → 1 / Yes!! → 2
- Ok/Okay → 1 / Ok/Okay??? → 2
- yes → 1 / yesss → 3
- really → 1 / reeeally → 3

These help tone. They rarely help the task.

Shorter to type is not always cheaper

Humans optimize for keystrokes. Tokenizers optimize for common text. Those are not the same thing.

- please → 1 / pls → Claude: 2
- thanks → 1 / thx → 2
- without → 1 / w/o → 2, Claude: 3

Most of the time, standard dictionary words are 1 token, and they are almost always more explicit, clearer, and closer to the text models saw during training than shorthand.

Quiet token leaks

Some things are not conversational, but they show up in normal work and still inflate tokens:

- UUIDs, hashes, timestamps, request IDs.

  - e.g., UUID: 019d6ce9-7cfe-753a-b6d6-df719510c9e3 → 24, Claude: 26
  - e.g., RFC 3339 timestamp: 2026-05-08T21:00:00+05:30 → 16, Claude: 17

- long URLs and file paths
- leading spaces, trailing spaces
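One way to compare these leaks with the single-token words earlier is characters per token. The sketch below is only arithmetic under a stated assumption: it copies the OpenAI counts reported in this post and divides string length by them. It does not call any tokenizer, and the names `REPORTED_OPENAI_COUNTS` and `chars_per_token` are hypothetical, introduced just for illustration.

```python
# Characters per token, using the OpenAI token counts reported in this post.
# Assumption: the counts are copied from the text, not recomputed; nothing
# here calls a real tokenizer. The script only does the division.
REPORTED_OPENAI_COUNTS = {
    "template": 1,
    "tempalte": 3,
    "019d6ce9-7cfe-753a-b6d6-df719510c9e3": 24,  # UUID
    "2026-05-08T21:00:00+05:30": 16,             # RFC 3339 timestamp
}


def chars_per_token(text: str, tokens: int) -> float:
    """Average number of characters covered by each billed token."""
    if tokens <= 0:
        raise ValueError("token count must be positive")
    return len(text) / tokens


if __name__ == "__main__":
    for text, tokens in REPORTED_OPENAI_COUNTS.items():
        ratio = chars_per_token(text, tokens)
        print(f"{text}: {len(text)} chars / {tokens} tokens = {ratio:.2f}")
```

With these numbers, the 36-character UUID works out to 1.5 characters per token, while template packs all 8 of its characters into one token and the misspelled tempalte drops to about 2.7.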


Normal internal spacing is usually fine. Boundary whitespace is where things get weird.

Conclusion

The model may recover meaning from all of this. Billing does not.

Humans type by habit. Tokenizers bill by pattern.

Which is mildly annoying, because now even tempalte feels like a line item to rectify.