An interactive introduction to the terrific experience of rendering Arabic typography and its technical debt

This post was discussed on Lobsters.

Once upon a time, a frontend ticket landed in my queue that was not properly mine, but the only other Arabic reader on the team was on leave. It went roughly as follows: a block of mixed-content Arabic prose on the customer-facing dashboard was rendering with a ragged left edge (in Arabic the rag falls on the left, since the lines set out from the right margin; the ticket said "ragged right") when the design team had explicitly specified justified text. Attached were three screenshots from three browsers and a polite note from the product manager observing that the Latin-script version of the same block looked, I quote, "fine."

In the same six months I had closed three other tickets against the same product, each of which had presented itself to its filer as the only bug. A customer's name had appeared with its letters unjoined on a printed agreement, the way a sign-painter would have laid them out in 1962, because the PDF library on the receipt server predated the existence of a shaping engine in its language runtime. A search index had been returning nothing for accounts the customer service team could see in the database, because a 2017 import had encoded twelve thousand names using fossil Unicode codepoints from 1991 instead of the regular ones from 1995, and the index, very reasonably, treated the two encodings as different strings. The ragged-left ticket was the smallest of the four; however, it sat on top of the same iceberg and pointed at the same thing.

Here is the disagreement, reproduced live. I used different text from the original, which had more spacing; I was too lazy to pick words that would maximize the rag and the stretched spaces.

PRODUCTION, ANY BROWSER
الخط هندسة روحانية ظهرت بآلة جسمانية، وهو لسان اليد ورسول العقل، وسفير الضمير ووحي الفكر، وسلاح المعرفة وأنس الإخوان عند الفرقة.

apply the fix the ticket asks for

THE MOCKUP, AS DESIGN APPROVED IT
الخـــــط هندســـــة روحانيـــــة ظهــــرت
بآلــــة جسمانيــــة، وهــــو لســـان اليــــد
ورســـــول العقـــــل، وسفيــــر الضميــــر
ووحـــــي الفكـــــر، وســـــلاح المعــــرفة
وأنـــــس الإخـــــوان عنــــد الفــــرقــــة.

On the right, the agreed design: both margins flush, every line filled by elongating the strokes inside the words, never the spaces between them. It renders in your browser only because I placed every elongation by hand, a confession I will expand on below. On the left, what production ships. Tick the box to apply the one tool CSS offers, text-align: justify.

(For these demonstrations this site ships its first webfont ever: Amiri, self-hosted, a hundred and fifty kilobytes of one man's unpaid evenings, redistributed under the OFL. That this is what it takes to show you something your operating system cannot do on its own is, I want to be clear, part of the argument. I think it is a delightful hundred and fifty kilobytes.)

It did look fine. I spent about half an hour with it: I walked the rendered DOM, I tried text-align: justify in many different combinations of font-family and direction declarations, and at the end of the exercise I wrote a reply explaining, more or less honestly, that the problem was not a bug in our stylesheet but the state of Arabic typography on the web.

Writing the reply and closing the ticket took half an hour or so. The reasons behind it took five hundred years to pile up, and they involve a twice-mutilated vizier, a Qurʾān that vanished for four centuries, a Beirut newspaperman with a deadline, and an Egyptian physician who taught himself font engineering for fun (or so I imagine).
Walking through them turned out to be the most enjoyable couple of weeks in that job, and I want to go through them here too.

What the scribes solved

The history deserves recording because most people outside the small world of Arabic font engineering don't know it, and it is wonderful.

Classical Arabic typography, by which I mean the manuscript tradition that the early printers of Istanbul and Bulaq spent their careers chasing, justifies a line of text without stretching the spaces between words at all. Stretched spaces are the Latin convention, and in Arabic they produce an effect the scribes would have found simply ugly. Instead the scribe extends the letterforms themselves along the baseline, using what is called taṭwīl or, in the modern technical vocabulary, kashida: the connecting strokes between certain pairs of letters can be lengthened, sometimes lavishly, to carry a line out to the margin. A well-set page of Naskh from the seventeenth century has every line flush at both margins, and the result is the dense, regular weave that anyone who has spent time with a good manuscript Qurʾān will recognise on sight.

Fig. 1. A Qurʾān folio, fourteenth century, now in the Metropolitan Museum of Art. Run your eye down the left edge: every line lands flush, and not one word-space was stretched to get it there. The justification lives inside the words. (Public domain, via Wikimedia Commons.)

And this was not improvisation but a system, with a paper trail. The system was written down by Ibn Muqla, Abbasid vizier and chief calligrapher, who served three caliphs in succession and was imprisoned by two of them. The third had his right hand amputated on a charge of treasonous correspondence; Ibn Muqla kept writing for the next several months by lashing a reed pen to the stump of his wrist, was rewarded for what he wrote by having his tongue cut out, and died in prison around the year 940. His body was buried three times in three different places, his daughter moving it after each interment to keep the grave out of police hands. The system he wrote down outlasted everybody who hurt him by a thousand years. It is called al-khaṭṭ al-mansūb, the proportional script: every letterform measured in rhombic dots of the reed nib, every curve a defined arc of a defined circle, the alif a fixed number of dots high, and everything else derived from the alif.

Within that system the elongation is a drawn stroke with its own rules: which letter pairs accept it, how the curve swells and tapers, how many elongations a line may carry, where they may sit. The scribes also justified by choosing different shapes, because most letters have alternate forms of different widths, and a skilled hand selects among them as the margin approaches. Justification, in this tradition, is not a spacing problem but a shaping problem.

The tradition Ibn Muqla started did not stay with him; it was refined, in writing, by named human beings over the following six hundred years. Ibn al-Bawwāb in Baghdad, around the year 1022, smoothed out the proportions and produced the manuscript that defined Naskh for the rest of the millennium; a single Qurʾān in his hand survives in the Chester Beatty Library in Dublin, and you can date the Persian, Ottoman, and Mamluk traditions by how closely they follow it. Yāqūt al-Mustaʿṣimī, who survived the Mongol sack of Baghdad in 1258 by climbing a minaret and continuing to write, codified what later scholars called the Six Pens, the canonical hands: Naskh, Thuluth, Muḥaqqaq, Rayḥān, Tawqīʿ, and Riqāʿ, each with its own metrics and its own justification grammar. Then the Persian scribes invented Nastaʿlīq in the fourteenth century, a hanging script that justifies by sloping the baseline downward at the end of each phrase, which is to ordinary justification roughly what a vertical garden is to a lawn. The Ottomans developed Dīwānī for the chancery and a tightly knotted Dīwānī Jalī for the sultanic seal, both of which fill space by interleaving letters at heights ordinary baselines never visit. All of these are the same alphabet of twenty-eight letters; all of them have their own rules about which letters accept the kashida, which never do, and how the line breathes.

Latin typesetting never needed any of this, because Latin letters do not hold hands.
Arabic letters do, and the web, in 2026, looks at them holding hands and stretches the air between the words anyway.
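To make "stretch the strokes, not the spaces" concrete, here is a toy sketch of automatic kashida filling with the ARABIC TATWEEL character, U+0640. It is not the scribes' rules and not how any browser or font engine does it. It assumes some layout step has already measured how many tatweel widths a line is short (the `extra` parameter), uses a hand-picked, Arabic-only list of right-joining letters instead of Unicode's joining data, ignores diacritics, and spreads the tatweels round-robin over every position where two letters are actually joined, skipping lam-alef so that ligature survives. `kashida_slots` and `kashida_fill` are hypothetical names.

```python
import unicodedata

TATWEEL = "\u0640"  # ARABIC TATWEEL, the elongation filler character

# Assumption: a hand-picked, Arabic-only subset. These letters join to the
# letter before them but never to the one after (alef forms, dal, thal, reh,
# zain, waw forms, teh marbuta). Real code should use the joining types in
# Unicode's ArabicShaping.txt instead.
RIGHT_JOINING = set("اأإآٱدذرزوؤة")
NON_JOINING = {"ء"}  # standalone hamza joins on neither side
ALEF_FORMS = set("اأإآٱ")


def is_arabic_letter(ch: str) -> bool:
    return unicodedata.name(ch, "").startswith("ARABIC LETTER")


def joins_forward(ch: str) -> bool:
    return is_arabic_letter(ch) and ch not in RIGHT_JOINING and ch not in NON_JOINING


def joins_backward(ch: str) -> bool:
    return is_arabic_letter(ch) and ch not in NON_JOINING


def kashida_slots(text: str) -> list[int]:
    """Indices i where a tatweel may go between text[i] and text[i + 1]."""
    slots = []
    for i in range(len(text) - 1):
        a, b = text[i], text[i + 1]
        if not (joins_forward(a) and joins_backward(b)):
            continue  # the letters are not joined here, nothing to stretch
        if a == "ل" and b in ALEF_FORMS:
            continue  # keep the lam-alef ligature intact
        slots.append(i)
    return slots


def kashida_fill(text: str, extra: int) -> str:
    """Hypothetical helper: spread `extra` tatweels round-robin over the slots."""
    slots = kashida_slots(text)
    if extra <= 0 or not slots:
        return text  # nothing to add, or no joined pair to stretch
    per_slot = dict.fromkeys(slots, 0)
    for k in range(extra):
        per_slot[slots[k % len(slots)]] += 1
    out = []
    for i, ch in enumerate(text):
        out.append(ch)
        out.append(TATWEEL * per_slot.get(i, 0))
    return "".join(out)


print(kashida_fill("الخط هندسة", 5))
```

Round-robin is the simplest possible placement policy; the classical grammar instead prefers particular letter pairs and limits how many elongations a line carries. Note also that this rewrites the text itself: the elongated string no longer equals the plain one, so search and copy-paste see the tatweel characters too. Doing the elongation at layout time would leave the stored text alone.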

So now you know what the mockup card at the top of the page was doing: it was faking a page of this manuscript tradition in HTML, every line carried to the measure by the strokes and not the spaces. The fakery, since I promised a confession, is U+0640 TATWEEL characters that I placed and sized by hand.

Four shapes for every letter

To understand why every machine since Gutenberg has wrestled with this script and mostly lost, you need one structural fact: Arabic is always cursive. There is no print-versus-handwriting distinction, no block letters. The letters connect in stone inscriptions, in manuscripts, in metal, on screens. Each letter therefore changes shape depending on its neighbours (an isolated, an initial, a medial, and a final form), and six letters (ا د ذ ر ز و) refuse to connect forward at all, which breaks words into joined clusters and gives the script its rhythm. The shapes are not costumes over some underlying "real" letter. The positional variation is the letter.

And the alphabet is bigger than Arabic the language. Persian extends it with four letters Arabic does not have (پ pe, چ che, ژ zhe, گ gaf) and uses two of the existing letters in subtly different forms (ی for yāʾ, which drops its dots in final position, and ک for kaf). Urdu adds an aspirated do-chashmī he (ھ), a retroflex set (ٹ ڈ ڑ), and a hanging ye barree (ے), and writes most of its everyday text in Nastaʿlīq, which a Naskh-shaped font will render as a phonetically correct but visually unrecognisable approximation. Sindhi has more again. Pashto, Kurdish, Uyghur, Kashmiri, and Punjabi each take the alphabet, add what their phonology requires, and ship. Any font that calls itself "Arabic" without consulting the Persian and Urdu communities will produce, for hundreds of millions of readers in Iran and South Asia, text that is technically rendered but functionally wrong: the kaf has the wrong terminal, the heh fuses where it shouldn't, the digits are from the wrong belt.
The Noto project ships separate families to cover these (NotoNaskhArabic, NotoNastaliqUrdu, NotoSansArabicUI), and OS font fallback chains usually get it right.
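The joining logic behind the four shapes is small enough to sketch. The block below is a toy: it reuses a hand-picked, Arabic-only joining table, whereas real shapers such as HarfBuzz work from the joining types in Unicode's ArabicShaping.txt and also handle ZWJ, ZWNJ, and combining marks, all of which this ignores. It labels each letter isolated, initial, medial, or final. It also takes a guess at the search-index ticket from the start of this post: assuming the "fossil" codepoints were the Arabic Presentation Forms blocks (U+FB50 to U+FDFF and U+FE70 to U+FEFF), which encode each positional shape as a separate character, Unicode NFKC normalization folds them back to the base letters, so normalizing both stored names and queries before indexing would make them compare equal. `positional_forms` and `fold_presentation_forms` are hypothetical names.

```python
import unicodedata

# Assumption: hand-picked, Arabic-only joining table (see ArabicShaping.txt
# for the real data). These join backward but never forward.
RIGHT_JOINING = set("اأإآٱدذرزوؤة")
NON_JOINING = {"ء"}  # standalone hamza joins on neither side


def is_arabic_letter(ch: str) -> bool:
    return unicodedata.name(ch, "").startswith("ARABIC LETTER")


def joins_forward(ch: str) -> bool:
    return is_arabic_letter(ch) and ch not in RIGHT_JOINING and ch not in NON_JOINING


def joins_backward(ch: str) -> bool:
    return is_arabic_letter(ch) and ch not in NON_JOINING


def positional_forms(word: str) -> list[tuple[str, str]]:
    """Label each character of `word` isolated, initial, medial, or final."""
    forms = []
    for i, ch in enumerate(word):
        prev = word[i - 1] if i > 0 else ""
        nxt = word[i + 1] if i + 1 < len(word) else ""
        # Two letters join only if the earlier one connects forward
        # and the later one connects backward.
        joined_before = bool(prev) and joins_forward(prev) and joins_backward(ch)
        joined_after = bool(nxt) and joins_forward(ch) and joins_backward(nxt)
        if joined_before and joined_after:
            form = "medial"
        elif joined_before:
            form = "final"
        elif joined_after:
            form = "initial"
        else:
            form = "isolated"
        forms.append((ch, form))
    return forms


def fold_presentation_forms(text: str) -> str:
    """Map presentation-form codepoints (e.g. U+FEB3 SEEN INITIAL FORM) back
    to base letters with NFKC; a candidate normalization for a search key."""
    return unicodedata.normalize("NFKC", text)


print(positional_forms("بيت"))  # beh, yeh, teh: all three join
print(positional_forms("دار"))  # dal and alef never join forward

legacy = "\uFEB3\uFEFC\uFEE1"  # seen initial, lam-alef final, meem isolated
print(legacy == "سلام", fold_presentation_forms(legacy) == "سلام")
```

NFKC is deliberately blunt: it also folds full-width Latin, superscripts, and other compatibility characters, and it does not merge distinct letters such as Arabic kaf (U+0643) and Persian keheh (U+06A9). A search key that must match across Arabic and Persian spellings would need its own explicit mapping on top.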