Stochastic Parrots 🦜: Frequently Unasked Questions

M medium.com ↗

It’s been a bit over five years since the Stochastic Parrots paper (Bender, Gebru et al. 2021) was published (and somewhat longer since Google made it an enormous news story by firing my co-authors). During that time, I have been watching the phrase stochastic parrot(s) on social media, initially out of linguistic interest (it’s rare to get to see how a coinage develops from its very beginning). In the early days, most usage I saw was from people referring to the paper, and then people who had read the paper referring to large language models as stochastic parrots. Eventually, though, the phrase outran the paper, as people picked it up as a way to refer to LLMs.

Tracking this phrase also provides a window into parts of the online discourse about “AI” that I would otherwise be unlikely to see. In that discourse, I see a lot of misconceptions about a) how large language models work and b) my own work on this topic. Accordingly, it seems like a fitting time to do some debunking, answering questions that people frequently fail to ask. What you’ll find below aren’t questions, but the various statements that people make, when perhaps they should have stopped and asked a question.

To keep this grounded in the actual text in question, here is where we introduce the term in the original paper:

Text generated by an LM is not grounded in communicative intent, any model of the world, or any model of the reader’s state of mind. It can’t have been, because the training data never included sharing thoughts with a listener, nor does the machine have the ability to do that. This can seem counter-intuitive given the increasingly fluent qualities of automatically generated text, but we have to account for the fact that our perception of natural language text, regardless of how it was generated, is mediated by our own linguistic competence and our predisposition to interpret communicative acts as conveying coherent meaning and intent, whether or not they do [89, 140]. The problem is, if one side of the communication does not have meaning, then the comprehension of the implicit meaning is an illusion arising from our singular human understanding of language (independent of the model).
Contrary to how it may seem when we observe its output, an LM is a system for haphazardly stitching together sequences of linguistic forms it has observed in its vast training data, according to probabilistic information about how they combine, but without any reference to meaning: a stochastic parrot. (p.616–617)

The phrase stochastic parrots was one attempt (among several) to make vivid what it is that large language models, when used to synthesize text, are doing. In later work (Mystery AI Hype Theater 3000, The AI Con), I’ve also added synthetic text extruding machine as a way to describe systems that closely model which bits of words tend to co-occur in their input data and can be used to, well, extrude synthetic text.

Bender says “AI is a stochastic parrot”

I have never said and will never say that “AI” is a stochastic parrot, because I reject “AI” as a way to describe technologies (LLMs or otherwise). Also, the Stochastic Parrots paper, written in Sept-Oct 2020, was not a paper about “AI” at all, but a paper about the risks and harms associated with the drive for ever larger language models, which, at that point, mostly weren’t being used to extrude synthetic text. (OpenAI had made GPT-2 and GPT-3 available for playing with, but this was still two years before they imposed ChatGPT on the world and synthetic text suddenly became everyone’s problem.) The term “AI” appears only once, near the end of the paper, where we write:

Work on synthetic human behavior is a bright line in ethical AI development, where downstream effects need to be understood and modeled in order to block foreseeable harm to society and different social groups. (p.619)

I believe this particular insight, and its phrasing, is due to Margaret Mitchell (aka Shmargaret). In the years since, this observation has unfortunately been repeatedly reinforced: work on synthetic human behavior continued apace, and the foreseeable harms (predictably) came to pass.

Bender says [some model] is “just” a stochastic parrot

Indulge me in a little digression into linguistics here. The word just is the kind of word that evokes a scale or ranking.
For example, “She is just 5 feet tall” places her on a scale of height and furthermore suggests that her height is further down that scale than would be expected or desirable or just normal/normative. So someone who says that I say that some model is “just” a stochastic parrot is also attributing a scale, perhaps of functionality (or, in the anthropomorphizing language I am always struggling against, “capability”), and asserting that I am placing whatever model in the wrong, or at least a surprisingly low, spot on that scale.

This misunderstands what I was doing with the phrase stochastic parrots, and what we were doing in that paper in general. While I can’t speak for my co-authors, I am not invested in the project of “AI”, do not see it as a goal that is worthwhile (nor feasible) to work towards, and am not measuring large language models against some scale of progress towards that goal. What I am trying to do, in a world absolutely saturated with marketing selling the idea that the synthetic text extruding machines are “AI”, or maybe even “AGI”, is to help people understand what these systems actually are: systems designed to mimic the language (specifically: linguistic forms) that people use.

An important related point here is that though all of these systems (Claude, Gemini, ChatGPT, etc.) have LLMs specifically designed to produce synthetic text as key components, that doesn’t mean there aren’t other components, as Margaret Mitchell also points out. Most things we historically do with computing are not well approximated by extruding synthetic text. Accordingly, if a company’s goal is to portray their product as functional, they would be well advised, for example, to run text classification systems on user input to intercept any arithmetic queries and route those to an actual calculator.

“Stochastic parrot” is a critique of LLMs/“AI”

I often see people talking about “the stochastic parrots critique of LLMs,” but this, too, misapprehends at least the way I use the phrase. (This may be an accurate description of how other people use it.) I definitely take a critical view on the project of “AI”, and on the ways in which people are using synthetic text extruding machines (aka LLMs). But the target of my criticism is not the models.
Rather, I am concerned about the actions of people: the data theft, the exploitative labor practices, the haphazard creation of and failure to document datasets, the complete disregard for environmental impact, and the astonishing willingness of so many to surrender their own power and turn to synthetic text (for which no one is accountable) for all kinds of weighty decisions.

“Stochastic parrot” is an insult

Another common trope in the discourse around this phrase is to claim that stochastic parrot is an insult (or even a slur). On one reading, that would require LLMs to be the kind of thing that can take or feel offense, which they clearly aren’t. But, indeed, it is also possible to insult someone’s work, or a consumer product they have acquired, etc. At which point, I refer the reader to the previous two points.

Folks have also pointed out that this coinage is somewhat unfair to actual parrots who, for all I know, do have internal lives and do use their ability to mimic human speech with some kind of communicative intent. My best answer here is to say that (despite parrot in stochastic parrot being a noun), I am drawing not on the name of the bird directly but rather on the English verb to parrot, which means to repeat back without understanding.

It can’t be a stochastic parrot, it’s come up with something new!

This one misses the role of stochastic in stochastic parrot, which means randomly, according to some probability distribution. What comes out of these systems is not usually a direct regurgitation of their input, but rather a remix of it. This remix is shaped by the specific ways in which the systems were built (“trained”) through multiple steps, by the “system prompt” (a prompt prepended to user input that the user doesn’t usually see), and by the user input itself.
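
A minimal sketch of what stochastic means here, under loud assumptions: the code below is a toy bigram sampler, not how any actual LLM is built, and the three-sentence corpus, the function names (build_bigram_table, extrude), and the one-word stand-in for a system prompt are all hypothetical, chosen only for illustration. It samples each next word at random from the words that followed the current word in the input data.

```python
import random
from collections import defaultdict


def build_bigram_table(corpus):
    """Map each word to the list of words observed directly after it."""
    table = defaultdict(list)
    for sentence in corpus:
        words = sentence.split()
        for current, following in zip(words, words[1:]):
            table[current].append(following)
    return table


def extrude(table, prompt, max_words=8, seed=None):
    """Continue `prompt` by repeatedly sampling an observed next word."""
    rng = random.Random(seed)
    words = prompt.split()
    while len(words) < max_words and words[-1] in table:
        # Repeated entries in the list make frequent continuations more likely.
        words.append(rng.choice(table[words[-1]]))
    return " ".join(words)


# Hypothetical toy "training data" (an assumption, not real training text).
corpus = [
    "the parrot repeats the phrase",
    "the model repeats the data",
    "the phrase outran the paper",
]
table = build_bigram_table(corpus)

# Hypothetical stand-ins: a hidden prefix is prepended to what the user typed.
system_prompt = "the"
user_input = "parrot"

samples = {
    extrude(table, f"{system_prompt} {user_input}", seed=s) for s in range(20)
}
for sample in sorted(samples):
    print(sample)
```

With this toy corpus, none of the sampled strings is a sentence from the input, yet every adjacent pair of words in every sample was observed there. The output is new only in the sense of being a remix, and it is steered by the sampling seed, the prepended prefix, and the user input.
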
In other words, these systems make papier-mâché of their training data, molded around the balloons of these other components.

The stochastic parrots argument [is wrong, is out of date, etc]

This one is funny because it comes up, in the same form, every time one of the companies promotes a new model. “Stochastic parrots might have been an accurate description in [year], but not anymore because…” and then reference to whatever demo the author has been impressed by.
This is framed as heralding the arrival of “real” “AI”, over and over and over again.

But stochastic parrots (in my writing at least) isn’t an argument. It’s a description or a metaphor, again an attempt to make vivid what language mimicking machines do.

The stochastic parrots hypothesis has been disproved

Stochastic parrots also does not refer to an empirical hypothesis. Accordingly, it doesn’t make sense to say it’s been “disproved” or that it is “unfalsifiable”.

The closest thing to a hypothesis in this space in my writing is the argument (again, not empirical hypothesis) in Bender and Koller 2020, the one with the octopus thought experiment. The Stochastic Parrots paper refers to this earlier paper, which lays out the argument that language models don’t understand text they are used to process, because language models only ever have access to the linguistic form (i.e. spellings of words) in the training data.

In that paper, we provide a definition of understanding as mapping from language to something outside of language, and show that systems built only with linguistic form have no purchase with which to encode (“learn”) such a mapping.

They’re not stochastic parrots because they’re multi-modal models now

Stochastic parrots was coined to refer to language models, i.e. systems trained only on linguistic form used to mimic the kinds of sequences of linguistic form that people use. It is true that image/text models, for example, that can be used to map from linguistic strings to images or vice versa, can be argued to meet the definition of understanding in Bender & Koller 2020, albeit in an extremely thin way. But the stochastic parrots framing is still extremely relevant to these models, as well as systems built with them. As quoted above:

we have to account for the fact that our perception of natural language text, regardless of how it was generated, is mediated by our own linguistic competence

When we look at the text in an image/text model, we make sense of it in a way that is rich and socially situated and we must not project that onto the model if we want to keep a clear-eyed view of how such models actually function (and in what circumstances we should be willing to use them). Similar things can be said about the images, too, though it’s generally not linguistic competence per se they are experienced through.