AI reaches silver-medal level at this year’s Mathematics Olympiad
At the 2024 International Mathematical Olympiad, Google DeepMind presented an AI program capable of generating complex mathematical proofs.

As Paris prepares to host the 33rd Summer Olympiad, more than 600 students from nearly 110 countries gathered in the beautiful English city of Bath in July for the International Mathematical Olympiad (IMO). The students tackled six problems spanning a range of mathematical disciplines over two four-and-a-half-hour sessions. In the individual rankings, Chinese student Haojia Shi took first place with a perfect score; in the national rankings, the team from the United States came out on top. But the event’s biggest draw was the performance of two Google DeepMind systems that took on the same problems. Together, DeepMind’s AI programs solved four of the six problems, earning 28 of a possible 42 points, a result at silver-medal level. Timothy Gowers, a mathematician, Fields Medal winner and former gold medalist in the competition, wrote in a thread on X (formerly Twitter) that only about 60 students scored better than the AI.
To achieve this impressive result, the DeepMind team used two different AI programs: AlphaProof and AlphaGeometry 2. The former works in a similar way to the algorithms that have mastered chess, shogi and Go. Using a technique called reinforcement learning, AlphaProof repeatedly plays against itself, improving step by step. This method lends itself naturally to board games: the AI makes a series of moves, and if they do not lead to victory, it is penalized and learns to pursue other strategies.
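To make that idea concrete, here is a minimal, hypothetical sketch of self-play reinforcement learning in Python, applied to a toy stone-taking game rather than anything DeepMind used; the game, the update rule and all names are illustrative assumptions, not AlphaProof's internals.

```python
import random

# Hypothetical sketch of reinforcement learning via self-play (not DeepMind's
# code). Toy game: players alternately take 1 or 2 stones from a pile; whoever
# takes the last stone wins. The agent plays both sides; moves on the winning
# side are reinforced and the losing side's moves are penalized.

values = {}  # estimated value of (stones_remaining, move) for the mover

def choose(stones, eps=0.2):
    moves = [m for m in (1, 2) if m <= stones]
    if random.random() < eps:
        return random.choice(moves)  # explore a random move
    return max(moves, key=lambda m: values.get((stones, m), 0.0))

for episode in range(20000):
    stones, history = 10, []
    while stones > 0:
        move = choose(stones)
        history.append((stones, move))
        stones -= move
    reward = 1.0                          # the last mover took the final stone
    for state_move in reversed(history):  # credit moves backward in time
        old = values.get(state_move, 0.0)
        values[state_move] = old + 0.1 * (reward - old)
        reward = -reward                  # alternate players' perspectives

# After training, the agent should prefer taking 1 stone from a pile of 10,
# leaving the opponent a losing multiple of 3.
print(choose(10, eps=0.0))
```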
But to do the same for mathematical problems, it is not enough to confirm that a program has arrived at the right answer; the reasoning steps that led there must also be verified. For this, AlphaProof relies on so-called proof assistants: programs that check a logical argument step by step to confirm that it correctly answers the posed problem. Proof assistants have existed for decades, but their use in machine learning has been constrained by the very limited amount of mathematical data available in formal, computer-readable languages such as Lean.
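For a flavor of what a proof assistant accepts, here is a tiny Lean 4 example, far simpler than any Olympiad problem; Lean checks each claim the same way, rejecting anything that does not follow from the rules.

```lean
-- Lean accepts this theorem only because every claim is formally justified:
-- `Nat.add_comm` is a library lemma stating that addition commutes.
theorem swap_sum (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b

-- Some facts hold by pure computation; `rfl` asks Lean to evaluate both sides.
example : 2 + 2 = 4 := rfl
```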
Solutions to mathematical problems written in natural language, by contrast, are plentiful: the Internet is full of problems that humans have solved step by step. So the DeepMind team trained the large language model Gemini to translate a million such problems into the Lean programming language so that the proof assistant could use them for training. “When presented with a problem, AlphaProof generates candidate solutions and searches for possible proof steps in Lean to prove or disprove them,” the developers wrote on the DeepMind website. In doing so, AlphaProof gradually learns which proof steps are useful and which are not, improving its ability to solve more complex problems.
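As an illustration of that pipeline, a natural-language statement such as “the product of two consecutive natural numbers is even” might be rendered in Lean as below. The formal statement is what the translation step produces; the `sorry` placeholder marks the hole that a proof search would then try to fill. The example is ours, not one of DeepMind’s training problems.

```lean
-- "The product of two consecutive natural numbers is even," translated into
-- Lean. The statement is machine-checkable as written; `sorry` stands in for
-- the proof that a search procedure must construct.
theorem consecutive_product_even (n : Nat) : 2 ∣ n * (n + 1) := by
  sorry
```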
Geometry problems, which also appear at the IMO, usually require a completely different approach. In January, DeepMind unveiled an AI called AlphaGeometry that handles such problems well. To build it, experts first generated a large set of geometric “premises,” or starting points: for example, a triangle with an altitude drawn in and points marked along its sides. The researchers then used what they call a “deduction engine” to infer further properties of the figure, such as which angles are equal and which lines are perpendicular to one another. By combining these diagrams with the derived properties, the experts created a training dataset of theorems and corresponding proofs. This procedure was paired with a large language model that can also propose so-called auxiliary constructions: the model might, for example, add a new point to a triangle to turn it into a quadrilateral, which can help crack a problem. By training the model on even more data and speeding up the algorithm, the DeepMind team produced the improved version, AlphaGeometry 2.
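The following toy Python sketch conveys the idea of a deduction engine: starting from a handful of premises, it applies geometric rules until no new facts emerge. The fact encoding and the two rules are illustrative assumptions, not AlphaGeometry’s internal representation.

```python
# Hypothetical sketch of a forward-chaining "deduction engine" (illustrative,
# not AlphaGeometry's code). Facts are (relation, line, line) triples.
facts = {
    ("parallel", "AB", "CD"),
    ("parallel", "CD", "EF"),
    ("perpendicular", "AB", "GH"),
}

def apply_rules(facts):
    new = set()
    for (r1, a, b) in facts:
        for (r2, c, d) in facts:
            # Transitivity of parallelism: AB ∥ CD and CD ∥ EF imply AB ∥ EF.
            if r1 == r2 == "parallel" and b == c and a != d:
                new.add(("parallel", a, d))
            # A line perpendicular to one of two parallels is perpendicular
            # to the other: AB ∥ CD and AB ⊥ GH imply CD ⊥ GH.
            if r1 == "parallel" and r2 == "perpendicular" and c == a:
                new.add(("perpendicular", b, d))
    return new - facts

# Saturate: keep deriving until a fixed point is reached.
while True:
    derived = apply_rules(facts)
    if not derived:
        break
    facts |= derived

print(sorted(facts))  # includes the derived facts AB ∥ EF, CD ⊥ GH, EF ⊥ GH
```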
To put their programs to the test, the DeepMind researchers set both AI systems to work on this year’s Olympiad problems, which the researchers first had to translate into Lean by hand. AlphaGeometry 2 solved the geometry problem correctly in just 19 seconds. AlphaProof, meanwhile, solved one number theory problem and two algebra problems, including one that only five of the human participants managed to crack. But neither system could solve the two combinatorics problems, perhaps because such problems are very difficult to translate into a formal language such as Lean.
AlphaProof was slow, however, taking more than 60 hours to solve some of the problems, far longer than the total of nine hours allotted to the students. “Human competitors would undoubtedly have scored higher if they had been given that much time per problem,” Gowers wrote on X. “That said, (i) this is well beyond what automated theorem provers have been able to do so far, and (ii) these times are likely to decrease as efficiency improves.”
Gowers and mathematician Joseph K. Myers, another former gold medalist, graded the two AI systems’ solutions by the same criteria applied to the human participants. By those criteria, the programs earned an impressive 28 points, equivalent to a silver medal; the AI thus missed a gold-medal performance, awarded for 29 points or more, by a single point.
On X, Gowers also emphasized that the AI programs were trained on a fairly wide range of problems and that these techniques are not limited to the Mathematical Olympiad. “We may be getting closer to having programs that allow mathematicians to get answers to a wide range of problems,” he explained. “Are we getting close to the point where we no longer need mathematicians? I don’t know.”
This article was originally published in Spektrum der Wissenschaft and is reprinted with permission.