This week, I got an early look at some fascinating new research that tackles one of the most pressing problems in generative AI: its habit of confidently asserting false information, or “hallucinating.”
By now, you probably know what I mean. Maybe you heard about the American lawyer who was fined last year after using ChatGPT for a court filing that included references to cases that never existed. Or perhaps you saw Google’s new “AI Overviews” feature that told users it was healthy to eat rocks.
These aren’t just amusing anecdotes — they’re a serious barrier to AI’s usefulness. So it was interesting to read a new study from researchers at Oxford, who have developed a way of checking when an AI is likely to be hallucinating. Their approach, which they call “semantic entropy,” can distinguish between factual and nonsense AI-generated answers with about 79% accuracy. That’s a significant improvement over existing methods.
“There's a simple check for when a language model makes something up for no reason at all, which is if you ask the same question multiple times, it will give you different answers,” Sebastian Farquhar, the study’s lead author, tells me. Computer scientists have known this for a long time, but have struggled to turn it into a reliable way of detecting hallucinations. That’s because chatbots are talented at saying the same thing in lots of different ways. They might answer the same question (“What’s the capital of France?”) in differently formulated sentences that all mean the same thing. (“The capital of France is Paris” … “Paris is France’s capital city.”)
The innovation in the paper is to use a large language model to group sentences with the same meaning together. That clustering lets the researchers calculate a score for how much the meanings (not just the wording) of several answers to the same prompt disagree with one another. The score can be used as a barometer for how likely a chatbot is to be hallucinating.
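To give a flavor of how that works, here’s a minimal sketch in Python. It’s my own illustration, not the researchers’ code: the `means_the_same` function is a toy stand-in (a small lookup table) for the meaning-equivalence check that the paper performs with a language model, and the example answers are invented. The core recipe, though, is the same: sample several answers to one prompt, cluster them by meaning, and measure the entropy of those clusters.

```python
import math

# A minimal sketch of the idea, not the authors' code. The paper checks whether
# two answers mean the same thing by querying a language model; here
# `means_the_same` is a toy stand-in using a hand-written equivalence table so
# the example runs with no model calls. All names below are illustrative.

EQUIVALENT_PAIRS = {
    ("the capital of france is paris", "paris is france's capital city"),
}

def normalise(answer: str) -> str:
    return answer.lower().strip().rstrip(".")

def means_the_same(a: str, b: str) -> bool:
    """Toy semantic-equivalence check (stand-in for an LLM-based entailment test)."""
    a, b = normalise(a), normalise(b)
    return a == b or (a, b) in EQUIVALENT_PAIRS or (b, a) in EQUIVALENT_PAIRS

def cluster_by_meaning(answers: list[str]) -> list[list[str]]:
    """Greedily group answers whose meanings match."""
    clusters: list[list[str]] = []
    for ans in answers:
        for cluster in clusters:
            if means_the_same(ans, cluster[0]):
                cluster.append(ans)
                break
        else:
            clusters.append([ans])
    return clusters

def semantic_entropy(answers: list[str]) -> float:
    """Entropy over meaning-clusters: low = consistent answers, high = likely hallucination."""
    clusters = cluster_by_meaning(answers)
    probs = [len(c) / len(answers) for c in clusters]
    return 0.0 if len(probs) == 1 else -sum(p * math.log(p) for p in probs)

# Same question asked three times, answers agree in meaning -> one cluster -> entropy 0.0
print(semantic_entropy([
    "The capital of France is Paris.",
    "Paris is France's capital city.",
    "The capital of France is Paris.",
]))

# Answers contradict one another -> three clusters -> higher entropy, flag for review
print(semantic_entropy([
    "The case was decided in 1987.",
    "It was decided in 1992.",
    "The ruling came down in 2003.",
]))  # ~1.10, i.e. ln(3)
```

In the real system, each meaning-equivalence check is itself a model query, and several answers have to be generated for every prompt, which is where much of the extra computing cost comes from.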
It’s not a perfect mechanism: a chatbot can still be consistently wrong, for example as a result of flawed training data, and this method can’t catch those kinds of hallucinations. And it’s not cheap: it uses about 10 times more computing power than a normal conversation with a chatbot.
But the cool thing about this method is that it scales. Unlike some other ways of reducing hallucinations, it doesn’t require training an AI on domain-specific data — an approach that might, say, make a model better at answering science questions accurately, but wouldn’t make it any more reliable when it comes to sport.
“My hope is that this opens up ways for large language models to be deployed where they can't currently be deployed – where a little bit more reliability than is currently available is needed,” says Farquhar, who is a senior research fellow at Oxford University’s department of computer science and a research scientist on Google DeepMind’s safety team. He imagines adding a button to ChatGPT that would let a user click on an answer and see how likely it is to be accurate, giving people more reason to trust what the chatbot tells them. And if you can get a reliable enough signal that an AI is hallucinating, you also have a number that the AI can be trained to minimize.
Of the lawyer who was fined for relying on a ChatGPT hallucination, Farquhar says: “This would have saved him.”
You can read my full story in TIME here.
Called it
A couple of newsletters ago, on May 6, I wrote this:
We haven’t yet had a big whistleblower from a “frontier” AI lab, but right now I feel the way I felt that summer before Haugen went public. AI companies, fuelled by big tech money, are racing one another to build bigger and more capable AI models. Meanwhile, there are many workers inside these companies who care deeply about ethics and safety — and who worry intensely about the negative externalities of what they see as dangerous industry “race dynamics.” It’s clearer than ever that the business side of AI and the safety side of AI are in a shaky alliance. There are plenty of disincentives to whistleblowing: intimidating NDAs, the risk of ostracism, the harsh glare of public attention. But at some point, someone will decide the risks of inaction are so high that speaking out is worth the pain.
Barely a month later, on June 5, a group of whistleblowers, mostly from OpenAI, went public. Among their complaints: a lack of whistleblower protections. I spoke to two of them for a story in TIME. “Preexisting whistleblower protections don’t apply here because this industry is not really regulated, so there are no rules about a lot of the potentially dangerous stuff that companies could be doing,” one of them told me.
“The AGI labs are not really accountable to anyone,” said another. “Accountability requires that if some organization does something wrong, information about that can be shared. And right now that is not the case.”