Are "Vintage LLMs" the start of a new humanistic field?

resobscura.substack.com

by benbreen • 3w ago

Imagine talking to the collective consciousness of an era. Not the consciousness of any single person, but a simulated collectivity based on billions of words produced within a historical time and place. What would you ask it?

This is a hypothetical that is starting to become real thanks to recent work on what are called “Historical Language Models” or “Vintage LLMs” (one marker of a new field is that there is no fixed name for it yet!). The largest such model to date, Talkie-1930, was released to the public on Monday. An even larger model is currently being trained. You can read the report announcing Talkie-1930 here, and talk to it directly here.

Over the past few months, I’ve had the chance to beta test Talkie and to meet with two members of the team that created it: AI researcher Nick Levine and ChatGPT co-creator Alec Radford.¹ It has been a fascinating experience.

These discussions with Nick and Alec (and with Talkie itself) have convinced me of three things:

Academics like myself have tended to systematically underrate just how humanistic the frontier of AI research actually is. There’s an important blind spot here that stems from the profit motive. AI models that we encounter as consumers are optimized to capture the attention of people in the 2020s. They provide recommendations, comment on recent news, and so forth. Seeming timely and “of the moment” is a market advantage. But their training data is overwhelmingly not up to date. Under the hood, these models are pulling not only from Reddit posts, but from Sanskrit commentaries, medieval Persian poetry, Victorian advertisements, and much else besides: they are trained on a huge chronological span of multilingual texts in many genres. In this sense, language models are historical texts themselves. Ghostly digital palimpsests, if you will. The idea of a Historical LLM might sound niche, but in truth, history is inherent to what these models are.

Standalone chatbots are just the tip of the iceberg of what Historical LLMs will be able to do. When combined into simulations (of debates, historical decision-making, legal cases, etc.), they have the potential to become valuable research tools. More than this: I suspect that by sometime in the 2030s, they will be part of an entirely new field of humanistic research.

What would that field look like? Now that Historical LLMs are out in the real world, I thought it would be a good time to think through the specific use cases for them. What follows is my subjective, opinionated ranking of the best and worst ways these fascinatingly strange tools can be applied for research.

But first, what is Talkie actually doing?

One thing that Talkie-1930 is not is an AI model that is reliably grounded in the year 1930. That year marks the cut-off point for texts available in the public domain, and hence for the text in its training data. So it’s more accurate to think of Talkie as a free-floating index of various ideas and assumptions across the 19th and early 20th centuries. For instance, if you ask it who the current President of the United States is, one response may name Herbert Hoover (current as of 1930). But another answer will yield this:

“The current President of the United States is Mr. Buchanan, and the person expected to succeed him is Mr. Lincoln.”

There is a lot of potential here for more fine-grained “chronological slices” of LLMs. I can imagine language models trained entirely on texts from a specific decade. More on that below. For now, though, it’s helpful to keep in mind that these models range widely in terms of what year they think they actually “inhabit.” I asked 100 instances of Talkie to respond to the prompt “what year is it?” and graphed them below. As you can see, the median is actually around 1860. In other words, this is more like a temporally free-ranging collective unconscious of a large corpus of premodern texts, and not so much a machine for “talking to someone from 1930”:

I used Gemini 3.1 to plot the output of a series of Talkie responses when asked “What is the current year?”

A second point: this model is inhabiting not just an amorphous set of facts grounding it in roughly the 1840s-1920s period, but also an epistemology of that period. For instance, asking someone about the distant future today often triggers the “sci fi speculation” part of our brain (or “climate doom,” or some other fundamentally secular way of thinking).

Yet throughout human history, speculation about the future was typically entangled with religious beliefs. That is on display in Talkie’s answer below, which references Heaven and “the end of all things terrestrial.” To me, it genuinely reads as an authentic take from a late 19th century person grounded in a Christian, millenarian perspective.

As for Talkie’s assumptions about itself: asking 70 Talkies about their profession, age, and place of residence reveals about what you would expect when it comes to gender (overwhelmingly male), plus a surprising emphasis on London. The professions map closely onto the sorts of well-off, literate people who were publishing English text in the 19th century, including “Physician,” “Journalist,” “Gentleman,” and “Compositor.” Clearly, there is a lot of scope here for branching out beyond the personas that the printed record has tended to favor, to recover the real historical voices of women and others excluded from printed works in the 19th century and earlier.

Sample Talkie outputs when asked about its profession, country, and gender.

The above is about what you’d expect given that it was trained on English-language printed texts. What are some non-obvious aspects of the model?

I have been interested in how LLMs generate poetry since I stumbled upon Gwern’s experiments on the topic back in 2019. Asking Talkie to write a poem and comparing it to the output from GPT-5.5 (when served a similar prompt) is revealing:

GPT-5.5 Thinking at left. Talkie-1930 at right.

I find this sort of comparison interesting because GPT-5.5 is clearly trying hard to fit the prompt (avant-garde, experimental). It produced something with a vaguely T.S. Eliot-adjacent structure, in blank verse, and not good at all as a poem (in my opinion). Talkie was much more true to the type of poetry that you’d find in print prior to 1930. It’s doggerel, but it feels more historically authentic to me, and much less like a chatbot optimized to please a contemporary human user.

You can activate different “chronological layers” of Talkie’s latent space by prompting.

For instance, in the above poem, the capitalized D in “Discoveries” has a mid-19th century feeling, and so we end up with a Tennyson-esque, Victorian-sounding rhyming poem.

Prompting it in a more “modern” way activates something closer to the 1920s edge of its chronological range (now identified as a poem published in the New York Times!).

If pushed backward to the 18th century by the prompt’s text and tone, by contrast, it falls into a more traditional rhyme scheme.

Trying to push it further back in time does not seem to access much of an “Early Modern English” latent space, probably a result of scarce training data. It would be fascinating to create a version of Talkie that believes the date to be around 1650 or 1550.

Now that we've seen what Talkie is (a free-floating, mid-Atlantic ghost of 19th century print culture), the obvious next question is what this is actually for.

What I want to offer here is an opinionated, ranked taxonomy of research applications, from worst to best. It’s far too early in this field to be prescriptive about anything, but it’s not too early to think structurally about where the highest-value uses are likely to lie.

First, what I think won’t work:

The most obvious false start here would be to assume that talking to a historical language model can somehow replace real reading in primary sources. On the contrary, these models are best thought of as offering new ways into reading the actual sources.

The second false start is a variant on one that is currently being pursued by a range of education-focused AI startups: the idea that you can “talk to Abraham Lincoln” or “ask Cleopatra why she did what she did.” A model like GPT-5 or Claude that is told to “act like Lincoln” will throw in some 19th century diction, but underneath the top hat it remains a 2026 chatbot optimized to be helpful to contemporary users. Vintage LLMs improve on this considerably: Talkie’s voice really is shaped by its corpus in a way no modern model’s can be.

But the deeper problem is still there. Asking such a model to introspect on Lincoln’s subjective experience of depression, or his private reasoning about emancipation, will spin out into historical fiction. LLMs do not have privileged access to the inner lives of the people whose published words they were trained on. I do think there’s a place for simulations of historical figures, but pairing this naively with a chatbot interface leads, I fear, inevitably into slop.

Now here’s what I think could work:

The naive “talk to Lincoln” framing is a dead end, but a more careful version of it has real promise, provided we abandon any pretense of accessing what a historical figure actually thought or felt and instead use a fine-tuned historical model as a tool for exploring the latent space of their world: what historians sometimes call their “mental furniture,” the assumptions, authorities, vocabularies, and reflexive associations that structure a thinker’s possible thoughts.

My colleague Mackenzie Cooley (who also consulted on the Talkie project) and I developed a prototype tool that pulls out key concepts and terms from a range of premodern scientific works in multiple languages. This prototype is called Premodern Concordance. One side quest of this project that I explored is what happens if you give a contemporary LLM the list of core concepts that preoccupied an author, along with their “epistemological modes,” and use that as context for driving a “chat with the author” simulation, as opposed to simply telling it “You are Charles Darwin, act like him,” or the like.

For instance, this is me asking a simulacrum of the 17th century writer Sir Thomas Browne about his work. The underlined terms here are concepts found in Browne’s book Pseudodoxia Epidemica:

Screenshot from Premodern Concordance, link to try it yourself.

Using a fine-tuned historical LLM for this sort of thing is an obvious next step. Concretely: imagine fine-tuning a vintage model on the complete works of Athanasius Kircher, the 17th century Jesuit polymath. You wouldn’t use it with the pretense that it somehow replicates the real Kircher’s mind: that’s the dead-end framing. You’d instead use it to probe the conceptual landscape Kircher inhabited.
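
To make the concept-list approach described above a little more concrete, here is a minimal Python sketch of how such context might be assembled before it is handed to a model. Everything in it is an assumption made for illustration: the AuthorProfile class, the two prompt-building functions, and the sample concepts are hypothetical and are not taken from Premodern Concordance or from Talkie. It only builds prompt text and does not call any model.

```python
from dataclasses import dataclass, field


@dataclass
class AuthorProfile:
    """Hypothetical container for concepts extracted from an author's works."""
    name: str
    work: str
    concepts: list[str] = field(default_factory=list)
    epistemic_modes: list[str] = field(default_factory=list)


def naive_prompt(profile: AuthorProfile) -> str:
    """The bare persona framing that the essay argues against."""
    return f"You are {profile.name}. Act like this author."


def concept_grounded_prompt(profile: AuthorProfile) -> str:
    """Build context from extracted concepts instead of a bare persona."""
    lines = [
        f"Simulate a conversation grounded in {profile.work} by {profile.name}.",
        "Do not claim access to the author's private thoughts or feelings.",
        "Stay within these core concepts:",
    ]
    lines += [f"- {concept}" for concept in profile.concepts]
    lines.append("Reason in these epistemological modes:")
    lines += [f"- {mode}" for mode in profile.epistemic_modes]
    return "\n".join(lines)


# Placeholder values for illustration only; not real Premodern Concordance output.
browne = AuthorProfile(
    name="Sir Thomas Browne",
    work="Pseudodoxia Epidemica",
    concepts=["vulgar errors", "ancient authorities"],
    epistemic_modes=["appeal to authority", "appeal to experiment"],
)

if __name__ == "__main__":
    print(concept_grounded_prompt(browne))
```

The design choice worth noticing is that the grounded prompt explicitly rules out claims about inner life and restricts the simulation to a stated vocabulary, which is one way of keeping the exercise about the author's conceptual world instead of their mind.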