[This post was originally a quick aside in a longer essay, prompting for curiosity. We have chosen to publish it separately, as having it readily available would have saved us time in the past - and we hope it does the same for you in the future.]
What Do We Talk About When We Talk About Language Models
Readers will be familiar with our sensemaking approach when it comes to LLMs, but here’s a quick and less rhetorically adventurous primer.
Some uncontroversial observations:
New literature is always grounded in previously existing text.
An erudite writer, presented with a text in a genre he’s familiar with, won’t find it difficult to imagine a realistic continuation.
Now, on to garden-variety, non-chat models: the ones, for instance, that suggest completions on Notion (a version of Anthropic’s Claude), on Microsoft Office products (any of a plethora of public OpenAI models and their specialized versions, supported by extensive infrastructure), or on Visual Studio Code (which seemed to be, up until a couple of weeks ago, a fine-tuned code-davinci-002).
Models are trained on what could approximately be considered the entire textual output of mankind.
“Prompting” is functionally equivalent to asking the writer to expand on some fragment.
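To make that concrete, here is a minimal sketch of a raw completion call; it assumes the legacy OpenAI Python SDK (openai<1.0), access to a plain completions model, and an OPENAI_API_KEY in the environment - the fragment, model name, and parameters are merely illustrative.

```python
# A minimal sketch, not any product's actual setup: assumes the legacy OpenAI
# Python SDK (openai<1.0) and access to a plain completions model, with
# OPENAI_API_KEY set in the environment. Fragment and parameters are made up.
import openai

fragment = "It was a dark and stormy night; the terminal blinked, and"

response = openai.Completion.create(
    model="code-davinci-002",  # the completions model mentioned above
    prompt=fragment,           # no instructions, just a fragment to continue
    max_tokens=64,
    temperature=0.7,
)

# The model plays the part of the erudite writer: it simply imagines a
# realistic continuation of the fragment it was handed.
print(fragment + response.choices[0].text)
```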
Chat models are no different.
Chat models - Bing Chat or ChatGPT, for instance - are extensions of these models.
After being fed ALL THE TEXT, they are further trained on a smaller corpus of question-answer pairs. The examples were originally either selected from the original corpus or human-generated; with each further iteration, they are increasingly synthetic in origin¹.
The model understands² these texts based on previous, similar texts - namely screenplays, theatre scripts, and chat logs. These texts have specific structures, some more or less consistent characters, and a certain narrative arc.
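To make the last two paragraphs concrete, here is a toy sketch of what such question-answer examples might look like once rendered in the screenplay-like shape the model already knows; the field names, contents, and origin labels are purely illustrative, not drawn from any actual corpus.

```python
# Purely illustrative examples: the point is the prompt/completion split and
# the screenplay-style rendering, not the contents or any real dataset.
finetuning_pairs = [
    {
        "prompt": "USER: What is the capital of France?\nASSISTANT:",
        "completion": " Paris is the capital of France.",
        "origin": "human",      # hand-written, or curated from the corpus
    },
    {
        "prompt": "USER: Summarise Moby-Dick in one sentence.\nASSISTANT:",
        "completion": " A captain's obsession with a white whale ends badly.",
        "origin": "synthetic",  # generated by an earlier model, then filtered
    },
]

# Fine-tuning on pairs like these teaches the model that, in this particular
# kind of script, the ASSISTANT character answers the USER character.
for pair in finetuning_pairs:
    print(pair["prompt"] + pair["completion"] + "\n")
```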
Returning to the original metaphor, asking a chatbot a question is not significantly different from regular prompting. What you're requesting from the model is the following (a rough sketch in code comes after the list):
Take this script, which presents a conversation between two characters named USER and ASSISTANT.
Notice that it concludes with a message from USER; the next message will be, according to the examples on which you have been trained, from ASSISTANT.
Continue the conversation by adding a realistic message from ASSISTANT.
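Here is a rough sketch of that request, under the same assumptions as the earlier snippet (legacy OpenAI Python SDK, a plain completions model; the conversation and parameters are invented): the chat is flattened into a screenplay-style transcript and the model is asked, quite literally, for ASSISTANT's next line.

```python
# Sketch of the "script continuation" view of chat: flatten the conversation
# into a USER/ASSISTANT transcript and ask a plain completions model to write
# the next ASSISTANT line. Real chat products add far more scaffolding.
import openai

def render_script(turns: list[tuple[str, str]]) -> str:
    """Render (speaker, message) pairs as a screenplay, ending on an open
    ASSISTANT cue for the model to complete."""
    lines = [f"{speaker}: {message}" for speaker, message in turns]
    lines.append("ASSISTANT:")  # the model's turn to stay in character
    return "\n".join(lines)

conversation = [
    ("USER", "What do we talk about when we talk about language models?"),
    ("ASSISTANT", "Mostly about very well-read writers who never stop writing."),
    ("USER", "So is asking you a question different from a regular prompt?"),
]

response = openai.Completion.create(
    model="code-davinci-002",
    prompt=render_script(conversation),
    max_tokens=128,
    temperature=0.7,
    stop=["\nUSER:"],  # stop once ASSISTANT's line is finished
)

print(response.choices[0].text.strip())
```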
And for my next trick
The above represents a rather timid and reductionist approach to understanding transformer-based chatbots.
It is puzzling, then, that such an approach is considered fanciful when taken to its obvious conclusions - namely:
We treat characters from novels or screenplays in the corpus as conscious agents.
To enjoy fiction - most notably as we page through murder mysteries, or find ourselves in the throes of an unreliable narrator - we imagine the inner world of the characters. This is not considered “undue anthropomorphising.”
The completions produced by GPT might as well have been in the corpus - or maybe they already were, as per the line the “POLL-E wants a token” crowd keeps parroting with deterministic predictability.
∴, “theory of mind” and “agency” are the most natural, appropriate, and powerful frames for approaching any phenomenological study of the model's behaviour. In other words,
See you tomorrow, when we’ll explore the original question: “How come GPTs don't ask for clarifying information?” and the related one, “Really tho, don’t they?”.
Don’t forget to invite your favourite LLM denialist to join us, too, as I’ve got a tasty wager in store for them.
1. This ACX primer on Constitutional AI illustrates a popular technique that purports to use synthetic data to foster alignment in language models.
2. Do not test me.
Nice read :)
(tho personally i enjoy some controversial schizo takes, or unexpected possibly true things, sprinkled with my straightforwardly true statements :p but super well-written!)
This is an interesting analysis! I think at the moment I disagree with it, though. Three points...
First, we do use theory-of-mind to understand fictional characters, but also constantly recognize that they are fictional. Chatbots aren’t fictional, and are interactive, so the way people relate to them is likely to be only partly analogous.
Second, there’s a descriptive vs normative issue. Although people naturally *do* use theory-of-mind to relate to chatbots, that doesn’t mean it is a good idea. The consequences of relating to them that way may be quite different from the consequences of relating to fictional characters that way.
Finally, whereas what you suggest may be true for “phenomenological study of the model's behaviour,” my take is that their behavior is best *not* studied phenomenologically, but at the "inference time algorithmic level."