Can we get artificial intelligence to tell the truth? Probably not, but developers of large language model (LLM) chatbots should be legally required to reduce the risk of error, say a team of ethicists.
“What we’re trying to do is create incentive structures that encourage companies to put more emphasis on truth and accuracy when building their systems,” says Brent Mittelstadt of the University of Oxford.
LLM chatbots such as ChatGPT generate human-like responses to user questions based on statistical analysis of vast amounts of text. But while the answers usually seem convincing, they are prone to errors, a flaw known as “hallucinations.”
“We have really amazing generative AI systems, but they make mistakes very frequently, and there’s no fundamental way to fix them based on our understanding of how the systems fundamentally work,” Mittelstadt says.
This is a “huge problem” for LLM systems, he says, because they are being deployed in a variety of settings, such as government decision-making, where it is important to give factually correct, truthful answers and to be honest about the limits of one’s knowledge.
To address this issue, he and his colleagues have proposed a number of countermeasures. They say that large language models should respond to factual questions in much the same way a careful person would.
That means being honest about what you know and what you don’t know. “It’s about taking the steps necessary to actually pay attention to what you’re claiming,” Mittelstadt says. “If I’m not sure about something, I’m not going to make something up to sound convincing. Rather, I’d say, ‘Hey, you know? I don’t know. Let me look into it. I’ll get back to you later.'”
This seems like a laudable goal, but Eerke Boiten of De Montfort University in the UK questions whether the ethicists’ demands are technically feasible. Companies have tried to get LLMs to stick to the truth, but so far that has proven too labor-intensive to be practical. “I don’t understand why you would want to make something a legal requirement that you think is fundamentally technologically impossible,” he says.
Mittelstadt and his colleagues also suggest more direct ways to bring LLMs closer to the truth. Models should link to sources of information, as many already do, to back up their claims, he says, and making extensive use of a technique called retrieval-augmented generation to derive answers might help limit the chance of hallucinations.
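To make that idea concrete, here is a minimal sketch of the retrieval-augmented generation pattern under simplifying assumptions: a toy keyword-overlap retriever stands in for the vector search a real system would use, and the Source class, the sample corpus, and the prompt wording are illustrative inventions rather than the researchers’ proposal or any particular product’s API.

```python
# Minimal retrieval-augmented generation (RAG) sketch: retrieve a few relevant
# sources, then prompt the model to answer only from them and cite them.
from dataclasses import dataclass


@dataclass
class Source:
    title: str
    text: str


def retrieve(question: str, corpus: list[Source], k: int = 3) -> list[Source]:
    """Rank sources by naive keyword overlap with the question (a stand-in
    for the vector search a production RAG system would use)."""
    q_terms = set(question.lower().split())
    scored = sorted(
        corpus,
        key=lambda s: len(q_terms & set(s.text.lower().split())),
        reverse=True,
    )
    return scored[:k]


def build_prompt(question: str, sources: list[Source]) -> str:
    """Assemble a prompt that asks the model to answer only from the cited
    sources and to admit when they are insufficient."""
    cited = "\n".join(f"[{i + 1}] {s.title}: {s.text}" for i, s in enumerate(sources))
    return (
        "Answer the question using only the sources below, citing them as [n]. "
        "If the sources do not contain the answer, say you do not know.\n\n"
        f"Sources:\n{cited}\n\nQuestion: {question}\nAnswer:"
    )


if __name__ == "__main__":
    corpus = [
        Source("Policy brief", "LLM chatbots can produce plausible but false answers, known as hallucinations."),
        Source("Tech note", "Retrieval-augmented generation grounds answers in documents retrieved at query time."),
    ]
    question = "How can hallucinations be limited?"
    print(build_prompt(question, retrieve(question, corpus)))
```

The relevant design choice sits in the prompt: the model is instructed to cite the retrieved sources and to say it does not know when they fall short, which mirrors the honesty norm the ethicists describe.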
He also argues that LLMs deployed in high-risk areas, such as government decision-making, should be scaled back or restricted in the sources they can draw on. “If you had a language model that you wanted to use only in medicine, you might limit it to searching only academic articles published in high-quality medical journals,” he says.
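That restriction can be sketched as a filter applied to the retrieval corpus before any search runs. The journal field and the whitelist below are hypothetical examples used only to illustrate the idea, not a real dataset or vetted list.

```python
# Sketch of source restriction for a high-stakes deployment: keep only
# documents from a whitelist of trusted journals so the retriever never
# sees untrusted material.
from dataclasses import dataclass


@dataclass
class Document:
    journal: str
    text: str


# Illustrative whitelist; a real deployment would maintain a vetted list.
TRUSTED_JOURNALS = {"The Lancet", "BMJ", "NEJM"}


def restrict_corpus(corpus: list[Document]) -> list[Document]:
    """Drop any document whose journal is not on the whitelist."""
    return [doc for doc in corpus if doc.journal in TRUSTED_JOURNALS]
```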
Changing perceptions is also important, Mittelstadt says. “It would be good to move away from the idea that LLMs are good at answering factual questions, or at least that they can give reliable answers to factual questions, and to start seeing LLMs as something that helps you with the facts you bring to them,” he says.
Catalina Goanta of Utrecht University in the Netherlands says the researchers have focused too much on the technology and not enough on the longer-term problem of falsehood in public discourse. “Blaming only LLMs in this context gives the impression that humans are perfectly diligent and would never make such mistakes,” she says. “Ask any judge in any jurisdiction and you will hear horror stories about the negligence of lawyers, and vice versa. This is not a machine problem.”
Topic: AI