New AI stuns mathematicians with its problem-solving skill

By David H Bailey, on June 9th, 2025

Credit: Yuichiro Chino/Getty

Updated 10 June 2025

Artificial intelligence’s march of progress

One by one, large AI- and machine learning-based systems have scored numerous successes over the past few years; it is clear that we have witnessed (and are witnessing) a revolution in the field. Among these successes are the following:

Defeating the reigning world chess champion.
Defeating two champion contestants on the American quiz show Jeopardy.
Defeating the world’s best human Go player.
Defeating all other entries in the 2020 Critical Assessment of Protein Structure Prediction (CASP) competition, a development described by German biologist Andrei Lupas as “This will change medicine. It will change research. It will change bioengineering. It will change everything.”
Demonstrating a weather prediction system that exhibits better accuracy than state-of-the-art conventional systems, at significantly less computational cost.
Generating impressive reports and analyses on diverse topics, with the recent emergence of tools such as ChatGPT.

AI and mathematical research

Computers have long been employed in mathematical research, notably with the rise of tools such as Mathematica and Maple, as well as a host of more specialized tools for symbolic and high-precision computation. The present author has personally employed and championed the use of such tools for at least 30 years, as part of the emerging discipline of experimental mathematics. See, for example, this article.

But in spite the above-listed successes of AI- and machine-learning-based tools, most mathematicians have been rather skeptical that this type of software would be of much use in leading-edge mathematical research. For example, in February 2023 the present author, in an earlier Math Scholar article, reported the results of asking ChatGPT to provide proofs for four classic mathematical theorems:

A general angle cannot be trisected with ruler and compass.
Pi is irrational.
Pi is transcendental.
Every algebraic equation with integer coefficients has a root in the complex numbers.

The result were not very impressive, to the least. Key steps were swept under the rug by vague references to other results. Circular and nonsensical reasoning was employed. None of the results was even remotely acceptable.

Two years later, in February 2025, the present author revisited this exercise, this time with the newly released DeepSeek tool. This time, the results were very different — in each of the four cases, an entirely cogent and well-justified proof was produced. The results were, frankly, rather stunning. It is clear that DeepSeek and other similar cutting-edge AI-based tools had made dramatic advances in mathematical skill.

A meeting to assess AI-based tools for mathematical research

On May 17-18, 2025, a number of leading mathematicians met in Berkeley, California to devise a set of benchmark mathematical problems for AI-powered software systems to solve. The meeting was conducted by Epoch AI, a nonprofit organization funded by OpenAI devoted to producing benchmarks for large language model (LLM) AI systems.

Epoch AI’s specific task was to produce a set of 300 challenging math problems whose solutions had not yet been published. Most AI systems tested by Epoch AI performed rather poorly on the emerging suite, typically solving less than 2% of the test suite, but one stood out — the o4-mini tool. The group subsequently formulated a suite of problems that o4-mini could not solve. The in-person meeting in May was conducted to finalize the preparation of the suite. But even then the group had difficulty formulating a test suite that was impervious to o4-mini.

One mathematician attending the meeting was Ken Ono of the University of Virginia. He is well-known as a consultant for the university’s Olympic medal-winning swim team (see here for some details), but he is also a distinguished mathematician specializing in number theory. Among other roles, he serves as Editor-in-Chief of the Ramanujan Journal.

During the May 17-18 meeting, Ono admitted becoming frustrated at the o4-mini bot, whose unexpectedly impressive mathematical skill had been foiling the group’s progress. He reported, “I came up with a problem which experts in my field would recognize as an open question in number theory—a good Ph.D.-level problem.” He then asked o4-mini to solve the question. A Scientific American report describes the outcome:

Over the next 10 minutes, Ono watched in stunned silence as the bot unfurled a solution in real time, showing its reasoning process along the way. The bot spent the first two minutes finding and mastering the related literature in the field. Then it wrote on the screen that it wanted to try solving a simpler “toy” version of the question first in order to learn. A few minutes later, it wrote that it was finally prepared to solve the more difficult problem. Five minutes after that, o4-mini presented a correct but sassy solution. “It was starting to get really cheeky,” says Ono. … “And at the end, it says, ‘No citation necessary because the mystery number was computed by me!’”

[See a more detailed account at the bottom of this article.]

The group finally succeeded in finding 10 questions that o4-mini could not solve, but they were amazed at its skill. For one thing, the bot was much faster than a professional mathematician — it could do in minutes what a good human mathematician would require weeks or months to complete.

At the end of the meeting, the discussion inevitably turned to consider the future of mathematics in general and the role of human mathematicians in particular. Indeed, it may well be that human researchers will become more like a senior mentor, steering the course of the research, but relying on bots (or graduate students skilled in using the bots) to do the day-to-day work.

Additional details and analysis are in this Scientific American report.

Terence Tao on proof checkers, AI and the future of mathematical research

In June 2024, UCLA Fields Medalist mathematician Terence Tao was interviewed by Christoph Drosser, a writer for Spektrum der Wissenschaft, the German edition of Scientific American. Drosser and Tao discussed at length the current status of computer tools in mathematical research, including proof-checking software such as Lean and also AI-based tools, and what is likely to be the future course of the field with the ascendance of these tools. Here are some brief excerpts from this remarkable interview, with some minor editing by the present author (see also this summary):

Drosser: I have talked to people that try to use large language models or similar machine-learning technologies to create new proofs. Tony Wu and Christian Szegedy, who recently co-founded the company xAI, with Elon Musk and others, told me that in two to three years mathematics will be “solved” in the same sense that chess is solved — that machines will be better than any human at finding proofs.

Tao: I think in three years AI will become useful for mathematicians. It will be a great co-pilot. You’re trying to prove a theorem, and there’s one step that you think is true, but you can’t quite see how it’s true. And you can say, “AI, can you do this stuff for me?” And it may say, “I think I can prove this.” I don’t think mathematics will become solved. If there was another major breakthrough in AI, it’s possible, but I would say that in three years you will see notable progress, and it will become more and more manageable to actually use AI. And even if AI can do the type of mathematics we do now, it means that we will just move to a higher type of mathematics. So right now, for example, we prove things one at a time. It’s like individual craftsmen making a wooden doll or something. You take one doll and you very carefully paint everything, and so forth, and then you take another one. The way we do mathematics hasn’t changed that much. But in every other type of discipline, we have mass production. And so with AI, we can start proving hundreds of theorems or thousands of theorems at a time. And human mathematicians will direct the AIs to do various things. So I think the way we do mathematics will change, but their time frame is maybe a little bit aggressive.

Drosser: I interviewed Peter Scholze when he won the Fields Medal in 2018. I asked him, How many people understand what you’re doing? And he said there were about 10 people.

Tao: With formalization projects, what we’ve noticed is that you can collaborate with people who don’t understand the entire mathematics of the entire project, but they understand one tiny little piece. It’s like any modern device. No single person can build a computer on their own, mine all the metals and refine them, and then create the hardware and the software. We have all these specialists, and we have a big logistics supply chain, and eventually we can create a smartphone or whatever. Right now, in a mathematical collaboration, everyone has to know pretty much all the mathematics, and that is a stumbling block, as [Scholze] mentioned. But with these formalizations, it is possible to compartmentalize and contribute to a project only knowing a piece of it. I think also we should start formalizing textbooks. If a textbook is formalized, you can create these very interactive textbooks, where you could describe the proof of a result in a very high-level sense, assuming lots of knowledge. But if there are steps that you don’t understand, you can expand them and go into details — all the way down the axioms if you want to. No one does this right now for textbooks because it’s too much work. But if you’re already formalizing it, the computer can create these interactive textbooks for you. It will make it easier for a mathematician in one field to start contributing to another because you can precisely specify subtasks of a big task that don’t require understanding everything.

[Continuing after some skip:]

Drosser: By breaking down a problem and exploring it, you learn a lot of new things on the way, too. Fermat’s Last Theorem, for example, was a simple conjecture about natural numbers, but the math that was developed to prove it isn’t necessarily about natural numbers anymore. So tackling a proof is much more than just proving this one instance.

Tao: Let’s say an AI supplies an incomprehensible, ugly proof. Then you can work with it, and you can analyze it. Suppose this proof uses 10 hypotheses to get one conclusion — if I delete one hypothesis, does the proof still work? That’s a science that doesn’t really exist yet because we don’t have so many AI-generated proofs, but I think there will be a new type of mathematician that will take AI-generated mathematics and make it more comprehensible. Like, we have theoretical and experimental science. There are lots of things that we discover empirically, but then we do more experiments, and we discover laws of nature. We don’t do that right now in mathematics. But I think there’ll be an industry of people trying to extract insight from AI proofs that initially don’t have any insight.

Drosser: So instead of this being the end of mathematics, would it be a bright future for mathematics?

Tao: I think there’ll be different ways of doing mathematics that just don’t exist right now. I can see project manager mathematicians who can organize very complicated projects — they don’t understand all the mathematics, but they can break things up into smaller pieces and delegate them to other people, and they have good people skills. Then there are specialists who work in subfields. There are people who are good at trying to train AI on specific types of mathematics, and then there are people who can convert the AI proofs into something human-readable. It will become much more like the way almost any other modern industry works. Like, in journalism, not everyone has the same set of skills. You have editors, you have journalists, and you have businesspeople, and so forth — we’ll have similar things in mathematics eventually.

The full text of the interview is available here.

Addendum: Here is a more detailed summary (from Ken Ono) of the o4-mini’s solution process for the number theory question he posted:

Math Scholar

New AI stuns mathematicians with its problem-solving skill

Artificial intelligence’s march of progress

AI and mathematical research

A meeting to assess AI-based tools for mathematical research

Terence Tao on proof checkers, AI and the future of mathematical research

Recent Posts

Meta