Can A.I. Produce Writing That We Actually Want to Read?

newyorker.com


Pangram verdict · v3.3

We believe that this document is fully human-written

AI likelihood (overall): 4%

Distribution (human / AI): 100% / 0%

Segments: 5 of 5 human, 0 of 5 AI (5 windows, avg 351 words each)

Word count: 1,753

Peak AI %: 23% (§4)

Analyzed: Jun 3

Article text · 1,753 words · 5 segments analyzed


§1 Human · 0%

In the previous installment of this series on the future of higher education, I talked with professors about the ways that A.I. has changed their classrooms. Most felt despair over the breakdown of a contract between student and teacher, one predicated on the faith that, even if students weren’t always perfect, they would at least challenge themselves to think every once in a while. If students rely on A.I. summaries to do their “reading” for them, if they don’t attempt to put their ideas into prose, are they really learning anything?

When I consider the original question of this series—whether my nine-year-old daughter will go to college—I find myself wondering whether she will actually struggle through the writing process in that old-fashioned way. Readers will always want literature written by humans, but, for everything else—e-mails, advertising copy, legal briefs, student papers—the resistance to A.I.-generated writing will almost certainly slip as technology improves and it becomes functionally impossible to see the difference between writing by a person and writing by a machine. When that happens, the major incentive that educators hold over students—“I will fail you if you cheat”—will disappear, because there will simply be no way to know.

With that in mind, I want to take a step back from the implications of A.I. for higher education and ask a more fundamental question: How far are we from that moment? Right now, I believe it’s still easy for people to spot obvious examples of A.I. writing. A professor who reads hundreds of papers and has a decent grasp of her students’ writing ability can recognize the fakes. A manager who starts getting tidy, bullet-pointed, and mostly cheery e-mails from her employees will rightly suspect that robots have autocompleted their messages. Robot writing is also frequently filled with tells: copious em dashes, “not X but Y” constructions, conspicuous verbs (“delve” comes to mind).

But those tells generally show up only in Claude’s most rudimentary outputs. What about the kind of prose that we actually want to read? Can Claude produce that?

This question, or some version of it, was asked by thousands of enraged readers during the past couple of weeks, after the literary magazine Granta published a Commonwealth Prize-winning story by a writer named Jamir Nazir that seemed to bear all the hallmarks of A.I. writing.

§2 Human · 0%

People noted the strange recurrence of the word “hum,” for instance, and, especially, the awkward, constipated metaphors that didn’t make much sense. The publisher of Granta then put out a bizarrely ambivalent statement, concluding that “perhaps we never will know” whether A.I. had written the story. Nazir, for his part, rebutted the allegation. A whole bunch of writers screamed that the end times had arrived, or, less persuasively, insisted that the reason A.I. writing could win the Commonwealth Prize was that literary fiction was in such a bad place. (Is literary fiction better or worse today than it was twenty or thirty or forty years ago? I have no idea, but I do know that every generation of writers has made more or less the same complaint.)

Using Claude, I vibe-coded a simple game that presented roughly two hundred words of text and asked the player whether it was written by a human or generated by A.I. The sample texts all came from Project Gutenberg, an online library of public-domain literature; I asked the robots to scan through works by writers including George Eliot, James Joyce, Ernest Hemingway, and Arthur Conan Doyle and come up with passages in their respective styles. The robot would then display the results and let me and a few of my friends guess whether each was the real deal or a fabrication.

The test rounds were fairly easy. The A.I. writing had tells, including formatting and punctuation problems, and an overreliance on tortured similes and metaphors. A.I. also had a weird habit of making its characters fidget constantly, always running a finger along the edge of a table or adjusting a collar. The most reliable marker, though, was something more abstract, and, I suppose, upon reflection, even a little spooky. The scenes generated by A.I. had characters, but, apart from fidgeting, they mostly did nothing.

Consider this passage that Claude generated in the style of Henry Fielding:

Sophia, who had hitherto said very little, now looked towards her father with an expression which Mr. Western could not well interpret, whether as entreaty or reproach and indeed it is probable she scarce knew herself what she meant by it. Jones stood near the window, and had the appearance of a man waiting to hear his sentence pronounced.

§3 Human · 6%

Western, for his part, had by this time recovered something of his usual bluster, and began again upon the subject of Blifil, commending his estate and his family with great earnestness, as though these considerations alone ought to have settled the matter long since. He spoke of Allworthy’s approval with particular force, repeating the name two or three times, as if that name carried an authority which no reasonable person could withstand. Sophia said nothing to this, but she turned away towards the fireplace, where a small coal fire was burning, though the afternoon was not cold enough to have required one.

There is very little action and no certainty. Sophia doesn’t say much, and Mr. Western can’t interpret her expression, which she herself does not fully understand. And, after Western says his piece, which is described with both an “as if” and an “as though” clause, Sophia doesn’t respond, and looks to the fireplace that is burning a pointless flame.

In early rounds, the people I shared such deadened passages with immediately assumed that they were fake, even if the robots had done a decent job of approximating a given writer’s style.

For the next couple of days, I chatted with Claude about how to get rid of these tells. I told it to avoid similes and to cut down on such words as “nowhere” and “something,” which tended to betray its odd, core ambivalence. For a while, Claude kept spitting out the same inert passages, in which Jay Gatsby or Sherlock Holmes did a whole lot of nothing and had no opinion about the very little that was happening around them. I told Claude that it wasn’t doing a very good job of unlearning its bad habits, and suggested that it create another agent to scan through the fakes and catch any mistakes it made.

§4 Human · 23%

A third agent made notes with instructions on how best to imitate each author. I imagined these as cue cards that the agent would hold up to make sure everyone remembered to make Dorothea Brooke actually do something.

Here’s a sampling of the rules, which I had no part in writing—these are Claude’s instructions to itself regarding how to mimic each author’s style. (I have included only a few; there were typically about ten instructions in each “Does” and “Does Not” category.)

ERNEST HEMINGWAY

DOES:
Strings short declarative sentences with “and” as the primary connective tissue, creating forward momentum
Strips dialogue tags to bare “he said / she said”; rarely uses adverbs or action beats on the same line
Places weather or landscape as a flat factual sentence, not a framed observation (“The sun was over the hills”)

DOES NOT:
Never uses subordinate clause stacking or periodic sentences that withhold the main verb
Avoids Latinate or polysyllabic vocabulary (“illuminated,” “nevertheless,” “subsequently”)
Never attributes interior thought through free indirect discourse or italicized reflection
Never names or explains what a character is feeling directly (“he felt sad,” “she was afraid”)

GEORGE ELIOT

DOES:
Builds long, architecturally balanced sentences with multiple embedded subordinate clauses joined by semicolons or colons
Introduces characters with a brief sociological or class-placing phrase before the name arrives (“a man of some fifty years, whose . . .”)

DOES NOT:
Never uses sentence fragments for emphasis or rhythm
Avoids present-tense narration; everything moves in past tense with controlled retrospect
Never uses colloquial or American idiom; no contractions in narration

Multiplying the robot workforce and reminding the bot of its task seemed to work, at least in part. (

§5 Human · 0%

When I asked a friend who teaches computer science and machine learning at U.C. Berkeley why the robots needed other robots to check their work, he replied, “One hundred percent serious answer: No one knows.”) The similes went away. But Claude took some of the new directives a bit too seriously; suddenly, every fake passage was filled with characters hopping on a horse, or delivering an important package, or running. This, for whatever reason, led to very short sentences that were easy for people to spot as fake. So I loosened the rules a bit, and let Claude do its usual thing, with a handful of strict rules about vague words and similes.

After a few days of testing, I posted a link to the test on my X account. Within five days, I had more than thirty thousand responses. The people who took the test were able to identify a real passage versus a fake one roughly fifty-two per cent of the time—which might be another way of saying that they couldn’t actually distinguish the two. But roughly ten per cent of players seemed good at the game, whether because they had prior knowledge of the original material or a particularly keen eye for A.I. tics that I still don’t recognize.

By this point, I had figured out how to make slightly better fakes. I deployed another A.I. employee and had it double-check both samples for tells. And, by the end of the week, I was fooling more than half of the people who played the game. The sample that tricked the most people came from a robot Bram Stoker. Only seventeen per cent of players were able to discern that it was fake.

4 May. I have spent the greater part of this morning at the window of my room and have given myself up to a course of reflection which I had hoped to avoid by means of constant activity, but which the absence of any occupation in this place has finally rendered unavoidable. The Count was last seen by me, so far as I can with certainty assert, on the evening of the second; and his absence has now extended through two nights and the better part of three days. I do not believe that he has left the castle. The horses are in the stable. The great door at the foot of the south stairs has been locked from within since Tuesday. I have walked the corridors of the three lower floors twice each night and have heard no sound but the wind in the chimney of the hall.