A few years before ChatGPT started chattering away, Google developed an entirely different kind of artificial intelligence program called AlphaGo, which learned to play the board game Go with superhuman skill through relentless practice.
The company’s researchers have now published work combining the power of large language models (the AI behind today’s chatbots) with the abilities of AlphaZero, the chess-playing successor to AlphaGo, to produce proofs for extremely difficult mathematical problems.
Their new Frankenstein-like creation, “AlphaProof,” has demonstrated its capabilities by tackling several problems in the 2024 International Mathematical Olympiad (IMO), a prestigious competition for high school students.
AlphaProof uses the Gemini large language model to translate mathematical problems stated in natural language into a programming language called Lean. That translation provides the training material for a second algorithm, which learns through trial and error how to find proofs that can be verified as correct.
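To give a sense of what that translation step produces, here is a minimal, illustrative Lean snippet. It is a trivial statement, not an Olympiad problem: the point is that once a claim is written this way, any proof the system finds can be checked mechanically by the Lean kernel, which is what makes trial-and-error learning possible.

```lean
-- Illustration only (a trivial fact, not an IMO problem): a statement
-- formalized in Lean, followed by a proof term the kernel can verify.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```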
Earlier this year, Google DeepMind introduced another math-focused algorithm, called AlphaGeometry, that likewise pairs a language model with a different AI approach. AlphaGeometry uses Gemini to convert geometry problems into a form that programs built to manipulate geometric elements can work on and test. Google today also announced a new, improved version of AlphaGeometry.
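As a rough illustration of what “a form that programs can manipulate and test” might look like (this is not DeepMind’s code, and the predicates and rule are invented for the example), a symbolic solver can store facts about a figure as structured statements and apply deduction rules until new facts stop appearing:

```python
# Toy sketch of the symbolic half of a neurosymbolic geometry solver.
# Facts about a figure are stored as predicates, e.g. ("parallel", "AB", "CD"),
# and a single deduction rule is applied by forward chaining.

facts = {
    ("parallel", "AB", "CD"),
    ("parallel", "CD", "EF"),
}

def deduce_parallel_transitivity(known):
    """If AB is parallel to CD and CD is parallel to EF, conclude AB is parallel to EF."""
    new = set()
    for (p1, a, b) in known:
        for (p2, c, d) in known:
            if p1 == p2 == "parallel" and b == c and a != d:
                new.add(("parallel", a, d))
    return new - known

# Keep applying the rule until no new facts are derived.
while True:
    derived = deduce_parallel_transitivity(facts)
    if not derived:
        break
    facts |= derived

print(("parallel", "AB", "EF") in facts)  # True
```

The real systems are far more sophisticated, but the division of labor is the same idea: the language model translates the problem into symbols, and a deterministic engine manipulates and checks them.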
The researchers found that, together, the two math programs produced proofs of the IMO problems at the level of a silver medalist. They solved two algebra problems and one number theory problem out of six in total. One problem was solved within minutes, while another took days. Google DeepMind did not disclose how much computing power it devoted to solving them.
Google DeepMind calls the approach used for both AlphaProof and AlphaGeometry “neurosymbolic” because it combines the pure machine learning of artificial neural networks, the technology underpinning recent advances in AI, with the language of traditional symbolic programming.
“What we’ve seen here is that you can combine the approach that was so successful in something like AlphaGo with large language models and produce something very capable,” says David Silver, a researcher at Google DeepMind who led the AlphaZero work. Silver says the techniques demonstrated by AlphaProof should, in principle, extend to other areas of mathematics.
In fact, this research suggests that applying logic and reasoning in a more grounded way may address the worst tendencies of large language models: they can be phenomenally fluent, yet they struggle with even basic mathematics, let alone reasoning logically about problems.
In the future, neurosymbolic methods could become a way of translating questions or tasks into a form that AI systems can reason about to produce trustworthy results. OpenAI is also rumored to be working on such a system, codenamed “Strawberry.”
But as Silver acknowledges, the system unveiled today has important limitations. Mathematical solutions are either right or wrong, which gives AlphaProof and AlphaGeometry a clear target to work toward. Many real-world problems, such as figuring out an ideal travel itinerary, have many possible solutions, and which one is best can be unclear. For more ambiguous questions, Silver says, one approach might be to have the language model determine what constitutes a “right” answer during training. “There are a variety of different approaches you can try,” he says.
Silver is also careful to say that Google DeepMind’s systems won’t take over the jobs of human mathematicians. “We aim to provide a system that can prove anything, but that’s not the only job of a mathematician,” Silver said. “A big part of math is posing problems and figuring out what are interesting questions to ask. Think of this as a tool, similar to a slide rule, a calculator, or other computational aids.”