Thinking Like an AI

A little intuition can help

Ethan Mollick
Oct 20, 2024

This is my 100th post on this Substack, which got me thinking about how I could summarize the many things I have written about how to use AI. I came to the conclusion that [23]the advice in my book is still the advice I would give: just use AI to do stuff that you do for work or fun, for about 10 hours, and you will figure out a remarkable amount. However, I do think that having a little intuition about the way Large Language Models work can help you understand how to use them best. I ask my technical readers for their forgiveness, because I will simplify here, but here are some clues for getting into the "mind" of an AI:

LLMs do next token prediction

Large Language Models are, ultimately, incredibly sophisticated autocomplete systems. They use a vast model of human language to predict the next token in a sentence. For models working with text, tokens are words or parts of words. Many common words are single tokens (often with a leading space attached), but other words are broken into multiple tokens. For example, one tokenizer turns the 10-word sentence "This breaks up words (even phantasmagorically long words) into tokens" into 20 tokens.

[25] [image: the example sentence split into 20 tokens]

When you give an AI a prompt, you are effectively asking it to predict the next token that would come after the prompt. The AI takes everything that has been written before, runs it through a mathematical model of language, and generates a probability for each token that could come next in the sequence. For example, if I write "The best type of pet is a," the LLM predicts that the most likely tokens to come next, based on its model of human language, are "dog," "personal," "subjective," or "cat." The most likely is actually "dog," but LLMs are generally set to include some randomness, which is what makes LLM answers interesting, so they do not always pick the most likely token (in most cases, even attempts to eliminate this randomness cannot remove it entirely). Thus, I will often get "dog," but I may get a different word instead.

[26] [image: the top next-token probabilities for "The best type of pet is a"] These are the actual probabilities from GPT-3.5, as are the other examples in this post.

But these predictions take into account everything in the memory of the LLM (more on memory in a bit), and even tiny changes can radically alter the predictions of what token comes next. I created three examples with minor changes to the original sentence. If I choose not to capitalize the first word, the model now says that "dog" and "cat" are much more likely answers than they were originally, and "fish" joins the top three. If I change the word "type" to "kind," the probabilities of all the top tokens drop and I am much more likely to get an exotic answer like "calm" or "bunny." If I add an extra space after the word "pet," then "dog" isn't even in the top three predicted tokens!

[27] [image: how three small edits to the prompt shift the top predicted tokens]

But the LLM does not just produce one token. After each token, it looks at the entire original sentence plus the new token ("The best type of pet is a dog") and predicts the token that comes after that, then uses that longer sequence to predict the next one, and so on.
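To make these two mechanics, tokenization and sampling from a next-token distribution, concrete, here is a minimal sketch in Python. It assumes OpenAI's open-source tiktoken tokenizer library; the next-token probabilities are hand-written stand-ins that loosely echo the GPT-3.5 numbers above, not output from a real model.

```python
import random

import tiktoken  # assumes OpenAI's open-source tokenizer library (pip install tiktoken)

# Tokenization: real models split text into subword tokens, not words.
enc = tiktoken.get_encoding("cl100k_base")
sentence = "This breaks up words (even phantasmagorically long words) into tokens"
ids = enc.encode(sentence)
print(len(ids), "tokens:", [enc.decode([i]) for i in ids])

# Next-token prediction: a hand-written stand-in for the model's output,
# loosely echoing the probabilities shown above (illustrative numbers only).
next_token_probs = {"dog": 0.40, "personal": 0.22, "subjective": 0.20, "cat": 0.18}

# Sampling: rather than always taking the most likely token, the model
# draws from the distribution; this is where the randomness comes from.
tokens, weights = zip(*next_token_probs.items())
print("The best type of pet is a", random.choices(tokens, weights=weights, k=1)[0])
```

Run the sampling step a few times and "dog" will come up most often, but not always; that one draw is all the "randomness" described above amounts to.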
The model chains one token to another like cars on a train. Current LLMs can't go back and change a token that came before; they have to soldier on, adding word after word. This results in a butterfly effect: if the first predicted token is the word "dog," then the rest of the sentence will follow from that choice, and if it is "subjective," you will get an entirely different sentence. Any difference between the tokens in two different answers will result in radically diverging responses.

[28] [image: two completions of the same prompt diverging after the first token]

The intuition: This helps explain why you may get very different answers than someone else using the same AI, even if you ask exactly the same question. Tiny differences in probabilities result in very different answers. It also gives you a sense of why one of the biases people worry about with AI is that it may respond differently to people depending on their writing style, as the probabilities for the next token may lead down a path to worse answers. Indeed, [29]some of the early LLMs gave less accurate answers if you wrote in a less educated way.

You can also see some of why hallucinations happen, and why they are so pernicious. The AI is not pulling from a database; it is guessing the next word based on statistical patterns in its training data. That means that what it produces is not necessarily true (in fact, one of the many surprises about LLMs is how often they are right, given this), but even when it provides false information, it likely sounds plausible. That makes it hard to tell when it is making things up.

It is also helpful to think about tokens to understand why AIs get stubborn about a topic. If the first prediction is "dog," the AI is much more likely to keep producing text about how great dogs are, because those tokens are more likely. However, if it is "subjective," it is less likely to give you an opinion, even when you push it. Additionally, once the AI has written something, it cannot go back, so it needs to justify (or explain, or lie about) that statement in the future. I like this example that [30]Rohit Krishnan [31]shared, where you can see the AI make an error and then attempt to justify the result.

[32] [image: the AI makes an error, then tries to justify it]

The caveat: Saying "AI is just next-token prediction" is a bit of a joke online, because it doesn't really help us understand why AI can produce such seemingly creative, novel, and interesting results. If you have been reading my posts for any length of time, you will know that AI accomplishes impressive outcomes that, intuitively, we would not expect from an autocomplete system.

[33] [image: Claude makes themed Excel formulas on demand and explains them in delightful ways]

Next token prediction is capable of lots of unexpected results.
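Here are the chaining, the butterfly effect, and the stubbornness in one runnable sketch. The lookup table is a toy I invented for illustration (a real LLM conditions on the entire context, not just the last word), but the loop structure, appending one sampled token at a time with no going back, is the same.

```python
import random

# A toy next-word table standing in for a real model: given the last word,
# it gives a probability distribution over possible next words. The words
# and the numbers are invented for illustration.
TOY_MODEL = {
    "a":          {"dog": 0.5, "subjective": 0.3, "cat": 0.2},
    "dog":        {"because": 0.7, ".": 0.3},
    "cat":        {"because": 0.7, ".": 0.3},
    "subjective": {"choice": 0.6, "question": 0.4},
    "because":    {"they": 1.0},
    "they":       {"are": 1.0},
    "are":        {"loyal": 0.5, "friendly": 0.5},
    "choice":     {"that": 1.0},
    "question":   {"that": 1.0},
    "that":       {"depends": 1.0},
    "depends":    {".": 1.0},
    "loyal":      {".": 1.0},
    "friendly":   {".": 1.0},
}

def generate(prompt: str, seed: int, max_tokens: int = 10) -> str:
    """Append one sampled word at a time; earlier choices are never revised."""
    rng = random.Random(seed)
    words = prompt.split()
    for _ in range(max_tokens):
        dist = TOY_MODEL.get(words[-1])
        if dist is None:  # no continuation known; the sentence has ended
            break
        options, weights = zip(*dist.items())
        words.append(rng.choices(options, weights=weights, k=1)[0])
    return " ".join(words)

prompt = "The best type of pet is a"
print(generate(prompt, seed=1))  # one seed may head down the "dog" branch...
print(generate(prompt, seed=7))  # ...another down the "subjective" branch
```

Run it with different seeds: the first divergent word sends the rest of the sentence down a completely different branch (the butterfly effect), and once the toy model is on the "dog" branch, every later word follows from dogs (the stubbornness).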
LLMs make predictions based on their training data

Where does an LLM get the material from which it builds a model of language? From the data it was trained on. Modern LLMs are trained on an incredibly vast set of data, incorporating large amounts of the web and every free book or archive possible (plus some archives that almost certainly contain copyrighted work). The AI companies largely did not ask permission before using this information, but, leaving aside the legal and ethical concerns, it can be helpful to conceptualize the training data. The original [35]Pile dataset, which most of the major AI companies used for training, is about 1/3 based on the internet, 1/3 on scientific papers, and the rest divided up between books, coding, chats, and more.

So your intuition is often a good guide: if you expect something was on the internet or in the public domain, it is likely in the training data. But we can get a little more granular. For example, [36]thanks to this study, we have a rough idea of which fiction books appear most often in the training data for GPT-4, which largely tracks the books most commonly found on the web (many of the top 20 are out of copyright, with a couple of notable exceptions of books that are much pirated).

[37] [image: the fiction books that appear most often in GPT-4's training data]

Remember that LLMs use a statistical model of language; they do not pull from a database. So the more common a piece of work is in the training data, the more likely the AI is to "recall" that data accurately when prompted. You can see this at work when I give it a sentence from the most common fiction book in its training data, Alice in Wonderland. It gets the next sentence exactly right, and you can see that almost every possible next token would continue along the lines of the original passage.

[38] [image: the model continues a famous Alice in Wonderland passage exactly, with near-certain token probabilities]

Let's try something different: a passage from a fairly obscure mid-century science fiction author, [39]Cordwainer Smith, whose unusual writing style was shaped in part by his time in China (he was Sun Yat-sen's godson) and his knowledge of multiple languages. One of his stories starts: "Go back to An-fang, the Peace Square at An-fang, the Beginning Place at An-fang, where all things start." It then continues: "Bright it was. Red square, dead square, clear square, under a yellow sun." If I give the AI the first section, the probabilities show there is almost no chance it will produce the correct next word, "Bright." Instead, perhaps primed by the mythic language and the fact that An-fang registers as potentially Chinese (it is actually a play on the German word for beginning), it creates a passage about a religious journey.

[40] [image: the model's continuation of the Cordwainer Smith passage veers off into a religious journey]

The intuition: The fact that the LLM does not directly recall text would be frustrating if you were trying to use an LLM like Google, but LLMs are not like Google. They are capable of producing original material, and even when they attempt to give you Alice in Wonderland word-for-word, small differences will randomly appear and eventually the stories will diverge. However, knowing what is in the training data can help you in a number of ways. First, it can help you understand what the AI is good at: any document or writing style that is common in its training data is likely something the AI is very good at producing. More interestingly, it can help you think about how to get more original work from the AI. By using your prompts to push it into a more unusual section of its probability space, you will get very different answers than other people. Asking the AI to write a memo in the style of [41]Walter Pater will give you more interesting (and more overwrought) answers than asking for a professional memo, of which there are millions in the training data.

[42] [image: a memo written in the style of Walter Pater]

The caveat: Contrary to some people's beliefs, the AI is rarely reproducing substantial text from its training data verbatim. The sentences the AI provides are usually entirely novel, extrapolated from the language patterns it learned. Occasionally, the model might reproduce a specific fact or phrase it memorized from its training data, but more often it is generalizing from learned patterns to produce new content. Outside of training, carefully crafted prompts can guide the model to produce more original or task-specific content, demonstrating a capability known as "in-context learning." This allows LLMs to appear to learn new tasks within a conversation, even though they are not actually updating their underlying model, as you will see.
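Before moving on: if you want to probe the "recall" effect from the Alice in Wonderland example yourself, many model APIs will report token probabilities directly. Here is a sketch assuming the official OpenAI Python SDK and a model that exposes logprobs; the model name and the prompts are my examples, not the exact setup behind the screenshots above.

```python
import math

from openai import OpenAI  # assumes the official OpenAI Python SDK

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def show_top_next_tokens(passage: str) -> None:
    """Print the model's top candidate next tokens and their probabilities."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # example model; any logprobs-capable model works
        messages=[{"role": "user",
                   "content": f"Continue this passage with its exact next word: {passage}"}],
        max_tokens=1,
        logprobs=True,
        top_logprobs=5,
    )
    for candidate in response.choices[0].logprobs.content[0].top_logprobs:
        print(f"{candidate.token!r}: {math.exp(candidate.logprob):.1%}")

# A famous passage: expect one continuation to dominate.
show_top_next_tokens("Alice was beginning to get very tired of sitting by her sister on the bank,")

# An obscure passage: expect the probability mass to be spread out.
show_top_next_tokens("Go back to An-fang, the Peace Square at An-fang, the Beginning Place at An-fang,")
```

The exact numbers will differ from the GPT-3.5 screenshots above, but the pattern should hold: the famous line gets a confident, concentrated distribution, while the obscure one gets a flat, spread-out one.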
LLMs have a limited memory

Given how much we have discussed training, it may be surprising to learn that AIs are generally not learning anything permanent from their conversations with you. Training is usually a discrete event, not something that happens all the time. If you have privacy features turned on, your chats are not being fed into the training data at all, but even if your data will be used for training, the training process is not continuous. Instead, chats happen within what is called a "context window." This context window is like the AI's short-term memory: it is the amount of previous text the AI can consider when generating its next response. As long as you stay in a single chat session and the conversation fits inside the context window, the AI will keep track of what is happening, but as soon as you start a new chat, the memories from the last one generally do not carry over. You are starting fresh. The only exception is the limited "memory" feature of ChatGPT, which notes down scattered facts about you in a memory file and inserts those into the context window of every conversation. Otherwise, the AI is not learning about you between chats.

Even as I write this, I know I will be getting comments from some people arguing that I am wrong, along with descriptions of insights from the AI that seem to violate this rule. People are often fooled because the AI is a very good guesser, which [44]Simon Willison explains at length in his excellent post on asking the AI for insights into yourself. It is worth reading.

The intuition: It can help to think about what the AI knows and doesn't know about you. Do not expect deep insights based on information the AI does not have, but do expect it to make up insightful-sounding things if you push it. Knowing how memory works, you can also see why it can help to start a new chat when the AI gets stuck, or when you don't like where things are heading in a conversation. Also, if you use ChatGPT, you may want to check out and [45]clean up your memories every once in a while.

The caveat: The context windows of AIs are growing very long (Google's Gemini can hold 2 million tokens in memory), and AI companies want the experience of working with their models to feel personal. I expect we will soon see more tricks for getting AIs to remember things about you across conversations.
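To see what "fitting in the context window" means in practice, here is a sketch of how a chat application typically manages it. The budget and the word-based token counting are simplifications I have made up for illustration; real products count tokens with a tokenizer and have far larger budgets, but the core move, silently dropping whatever no longer fits, is the same.

```python
# A toy chat-memory manager. The model itself is stateless: on every turn the
# application re-sends as much recent history as fits in the context window.
CONTEXT_BUDGET = 50  # toy budget; real windows hold thousands to millions of tokens

def count_tokens(message: dict) -> int:
    # Stand-in for a real tokenizer (such as tiktoken): counts words instead.
    return len(message["content"].split())

def build_context(history: list, budget: int = CONTEXT_BUDGET) -> list:
    """Keep the most recent messages that fit; older ones silently fall away."""
    context, used = [], 0
    for message in reversed(history):
        cost = count_tokens(message)
        if used + cost > budget:
            break  # everything older than this point is invisible to the model
        context.append(message)
        used += cost
    return list(reversed(context))

history = [
    {"role": "user", "content": "My name is Ada and I am writing a novel about lighthouse keepers."},
    {"role": "assistant", "content": "A great premise! Tell me about your protagonist."},
    # ...imagine dozens more turns here, using up the budget...
    {"role": "user", "content": "Wait, what did I say my name was?"},
]
# Only what build_context() returns is ever shown to the model on this turn.
for message in build_context(history):
    print(message["role"] + ":", message["content"])
```

A new chat is just an empty history list, and even within one chat, once early turns fall outside the budget the model simply never sees them. ChatGPT's memory feature works inside this same constraint: the scattered saved facts are inserted into this window each conversation.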
All of this is only sort of helpful

We still do not have a solid answer for how these basic principles of how LLMs work come together to make a system that is [47]seemingly more creative than most humans, that we enjoy speaking with, and that does a surprisingly good job at tasks ranging from corporate strategy to medicine. There is no manual that lists what AI does well or where it might mess up, and we can only tell so much from the underlying technology itself. Understanding token prediction, training data, and memory constraints gives us a peek behind the curtain, but it doesn't fully explain the magic happening on stage.

That said, this knowledge can help you push AI in more interesting directions. Want more original outputs? Try prompts that veer into less common territory in the training data. Stuck in a conversational rut? Remember the context window and start fresh.

But the real way to understand AI is to use it. A lot. For about 10 hours, just do stuff with AI that you do for work or fun. Poke it, prod it, ask it weird questions. See where it shines and where it stumbles. Your hands-on experience will teach you more than any article ever could (even this long one). You'll figure out a remarkable amount about how to use AI effectively, and you might even surprise yourself with what you discover.
References:
[23] https://a.co/d/9onRd33
[25] https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F805116f4-c2dc-4804-b277-253d14b2139d_1292x105.png
[26] https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcfb74661-2025-4694-b0db-a96d2166865e_1098x711.png
[27] https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F623e802b-c122-4ef0-a667-6e429b09cc54_1992x504.png
[28] https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff7f2a21-1252-474d-896d-d307dc88eea7_1255x837.png
[29] https://arxiv.org/pdf/2212.09251
[30] https://www.strangeloopcanon.com/
[31] https://x.com/krishnanrohit/status/1802747007838384382
[32] https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffc187f7b-6341-4ac9-b2e4-0c97d1eddef9_924x502.jpeg
[33] https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd959adb9-d728-4e2f-b0f1-840b125ac9e0_1900x1126.png
[35] https://arxiv.org/abs/2101.00027
[36] https://arxiv.org/abs/2305.00118
[37] https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fedb0dd91-9b8e-468e-8c37-cdda8bd3db5c_1290x864.jpeg
[38] https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3dc09899-6a1a-47b3-90b9-c23be78835f8_1504x429.png
[39] https://en.wikipedia.org/wiki/Cordwainer_Smith
[40] https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc63b6bec-2dc7-48e4-8e71-ec056768ac96_1494x430.png
[41] https://en.wikipedia.org/wiki/Walter_Pater
[42] https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59a0b6a1-37ca-4447-8777-b94593809c4f_2025x1324.png
[44] https://simonwillison.net/2024/Oct/15/chatgpt-horoscopes/
[45] https://openai.com/index/memory-and-new-controls-for-chatgpt/
[47] https://docs.iza.org/dp17302.pdf