by Joe Miller
It wasn’t that long ago that you’d need to visit a university research lab, a philosophy classroom or a science fiction convention to talk about AI (artificial intelligence).
Now it’s pretty much everywhere.
ChatGPT has replaced search for younger Americans. Claude has done the same for tech insiders. Google’s Gemini offers to write your emails for you. You can send direct messages to Llama through Facebook or Instagram. Apple’s AI automatically edits your photos.
Some of those implementations are annoying. (I write this column in Google Docs, and I am forever telling Gemini to go away.)
But some are pretty amazing. I use a tool called Elicit to search academic journals. I enter my research question in natural language. Elicit translates that sentence into a search query, returns a set of relevant results and then generates a bespoke summary explaining how each result relates to my question. It saves hours of work every time I conduct a literature review.
That said, Elicit doesn’t eliminate the work entirely. I still have to read all the articles myself. That’s because the AI behind Elicit (a combination of GPT and Claude) sometimes hallucinates. (Hallucination is a tech-speak euphemism for “makes things up.”)
Hallucinations happen because AIs like GPT and Gemini and Claude don’t actually think.
If you’ve been around for a while, you probably remember when chatbots first started popping up as customer service representatives. Even before the web really took off, we had to navigate annoying phone trees to reach an actual human being.
Those telephone agents – like their early web chatbot cousins – relied on a set of rules to guide users to an automated answer. It’s the equivalent of writing a whole bunch of statements that take the form, “If input is x, then output y.”
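Here’s a minimal sketch of that approach in Python. (It’s a hypothetical toy, not any real company’s system, but it captures the spirit: the programmer has to anticipate every keyword in advance.)

```python
# A toy rule-based responder of the old phone-tree variety (hypothetical example).
# Every recognizable input has to be written down ahead of time.
RULES = {
    "billing": "Your balance is available under Account > Billing.",
    "hours": "We are open 9 a.m. to 5 p.m., Monday through Friday.",
    "reset password": "Visit the account page to choose a new password.",
}

def respond(user_input: str) -> str:
    text = user_input.lower()
    for keyword, canned_answer in RULES.items():
        if keyword in text:          # "if input is x..."
            return canned_answer     # "...then output y"
    return "Sorry, I didn't understand that. Please try rephrasing."

print(respond("What are your hours?"))   # matches a rule
print(respond("My bill seems wrong"))    # no rule matches, so you get the fallback
```

Notice how the second request falls straight through to the fallback: “bill” isn’t the exact keyword the rules anticipate.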
You can probably see why this approach was mostly annoying. There’s no way to program responses for every possible input because the number of possible inputs is effectively infinite. That left us trying to guess at just the right phrasing to get a useful response.
The current crop of AI tools works differently. They are built using something called large language models, or LLMs.
AI built with an LLM doesn’t rely on if-then statements to crank out canned responses. LLMs are trained on a huge set of existing texts, which allows them to uncover patterns in language use. When you ask an LLM a question, it generates a unique response by predicting the most likely first word to follow your question, then the most likely word to follow that word, and so forth.
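At toy scale, the idea looks something like this. (This is a made-up bigram model, a crude stand-in for the real thing; actual LLMs use neural networks trained on vastly more text, but the generation loop, pick the most likely next word, append it, repeat, is the same basic shape.)

```python
# A toy next-word predictor: count which word tends to follow which,
# then generate text one "most likely next word" at a time.
from collections import Counter, defaultdict

corpus = (
    "the cat sat on the mat . the dog sat on the rug . "
    "the cat chased the dog ."
).split()

# Tally, for each word, the words that follow it in the sample text.
follows = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    follows[current_word][next_word] += 1

def generate(start: str, length: int = 6) -> str:
    words = [start]
    for _ in range(length):
        candidates = follows.get(words[-1])
        if not candidates:
            break
        # Pick the single most likely next word, then repeat.
        words.append(candidates.most_common(1)[0][0])
    return " ".join(words)

print(generate("the"))
```

Train that loop on a few sentences and you get babble; train it on most of the written internet and you get something that sounds remarkably like a person.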
You know how your phone will autosuggest words after you type the first few letters?
LLMs are that, but bigger.
I mean a lot bigger. The current crop of LLMs has absorbed the entire public-facing internet. That scale is important, because the more examples an LLM has seen, the more accurately it can predict the right response.
But while each generation of LLMs has gotten more accurate, LLMs still hallucinate because they don’t actually understand any of the words they produce.
The American philosopher John Searle has a thought experiment that helps illustrate the problem.
Imagine that you are sealed inside a room that contains a stack of paper, a pen and a lot of books. A mail slot provides your only access to the outside world. Occasionally, pieces of paper with sets of squiggles written on them come through the mail slot. Your task is to look up those squiggles in your books and then write down a corresponding set of squiggles. You pass these new squiggles back through the mail slot.
It turns out that the squiggles coming through the mail slot are questions written in Chinese. Your squiggles are answers, also written in Chinese. A Chinese speaker outside the room is having what appears to be an intelligible conversation.
Nevertheless, you don’t actually know Chinese. You are formally manipulating Chinese symbols. But you don’t understand what any of the symbols mean.
Large language models are stuck inside Searle’s Chinese room.
They do not understand the meaning of the symbols you input, nor do they understand the meaning of the symbols they output in response. They don’t know whether their answer is true because they don’t know what “true” means any more than they know what any other word they produce means.
As a result, you end up with things like the strawberry bug. Multiple LLMs can spell “strawberry” correctly but will insist that the word contains only two r’s rather than three. The models don’t know what words mean and therefore cannot make simple inferences about them.
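For comparison, a program that actually operates on the letters has no trouble:

```python
# Counting characters directly is trivial when you work with the symbols themselves.
print("strawberry".count("r"))  # prints 3
```

The model isn’t short on computing power; it just isn’t counting letters at all. It’s predicting what an answer to that question usually looks like.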
As LLM training sets get bigger, hallucinations will grow less frequent. But these models won’t get you all the way to thinking any more than adding more books to the Chinese room will turn you into a Chinese speaker.
Joe Miller
joe.miller@fountaindigitalconsulting.com
