DeepSeek: A breakthrough in AI for math (and everything else)


[Chart: DeepSeek-R1 performance versus OpenAI o1 on six benchmarks, from DeepSeek's technical report]

DeepSeek’s big splash

By now, many readers have likely heard about DeepSeek, a new AI software system developed by a team in China. The latest version (R1) was introduced on 20 Jan 2025, while many in the U.S. were preoccupied with Donald Trump’s inauguration. DeepSeek is variously termed a generative AI tool or a large language model (LLM): it uses machine learning techniques to process very large amounts of input text and, in the process, becomes uncannily adept at generating responses to new queries. It represents yet another step forward in the march toward artificial general intelligence.

DeepSeek’s advantages

DeepSeek-R1 is not only remarkably effective, but it is also much more compact and less computationally expensive than competing AI software, such as the latest version (“o1-1217”) of OpenAI’s chatbot. Peter Diamandis noted that DeepSeek was founded only about two years ago, has only about 200 employees, and started with only about 5 million dollars in capital (although it has invested much more since its founding). By comparison, OpenAI is 10 years old, has roughly 4,500 employees, and has raised over 6 billion dollars.

As for hardware, Gale Pooley reported that DeepSeek runs on a system of only about 2,000 Nvidia graphics processing units (GPUs); another analyst claimed 50,000 Nvidia processors. Either figure is far fewer than the 100,000+ Nvidia processors reportedly used by OpenAI and other state-of-the-art AI systems. On 27 Jan 2025, largely in response to the DeepSeek-R1 rollout, Nvidia’s stock tumbled 17%, erasing hundreds of billions of dollars of market value (although it has subsequently recouped most of this loss). And whereas OpenAI’s system reportedly employs roughly 1.8 trillion parameters, all active all the time, DeepSeek-R1 has only 671 billion parameters in total, and, further, only about 37 billion of these need be active for any given token, for a dramatic saving in computation. Finally, DeepSeek has released its software as open source, so that anyone can test it and build tools based on it. For additional analysis of DeepSeek’s technology, see this article by Sahin Ahmed or DeepSeek’s just-released technical report.
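Savings of this kind are typically obtained with a “mixture of experts” design, in which a small router network selects, for each token, a few expert sub-networks to evaluate, so that the parameters of the unselected experts are never touched for that token. The following Python sketch illustrates the general technique; it is a toy example with invented names and sizes, not DeepSeek’s actual architecture or code.

```python
# Toy sketch of sparse "mixture of experts" activation (illustrative only;
# all names and sizes here are invented, and this is not DeepSeek's code).
import numpy as np

rng = np.random.default_rng(0)

D = 64            # hidden dimension of a token representation (toy size)
N_EXPERTS = 16    # total number of expert sub-networks in the layer
TOP_K = 2         # number of experts actually evaluated per token

# In this toy, each "expert" is just one small feed-forward weight matrix.
experts = [rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(N_EXPERTS)]
router = rng.standard_normal((D, N_EXPERTS)) / np.sqrt(D)  # gating weights

def moe_layer(x):
    """Apply only TOP_K of the N_EXPERTS experts to the token vector x."""
    scores = x @ router                    # one routing score per expert
    top = np.argsort(scores)[-TOP_K:]      # indices of the TOP_K highest scores
    weights = np.exp(scores[top] - scores[top].max())
    weights /= weights.sum()               # softmax over the selected experts only
    # Weighted sum of the chosen experts' outputs; the other 14 experts'
    # parameters are never used for this token.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

token = rng.standard_normal(D)
print(moe_layer(token).shape)   # (64,): only 2 of the 16 experts were evaluated
```

Only the selected experts’ weight matrices participate in the computation for each token, which is the sense in which only a fraction of a model’s total parameters are “active” at any one time.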

The remarkable fact is that DeepSeek-R1, in spite of being much more economical, performs nearly as well as, if not better than, other state-of-the-art systems, including OpenAI’s “o1-1217” system. See the chart above, which is from DeepSeek’s technical report. The tested benchmarks are as follows:

  1. AIME 2024: A set of problems from the 2024 edition of the American Invitational Mathematics Examination.
  2. Codeforces: A competitive-programming benchmark designed to evaluate the reasoning capabilities of LLMs with standardized, human-comparable Elo ratings.
  3. GPQA Diamond: A subset of the larger Graduate-Level Google-Proof Q&A dataset of challenging questions that domain experts consistently answer correctly, but non-experts struggle to answer accurately, even with extensive internet access.
  4. MATH-500: This tests the ability to solve challenging high-school-level mathematical problems, typically requiring significant logical reasoning and multi-step solutions.
  5. MMLU: Massive Multitask Language Understanding is a benchmark designed to measure knowledge acquired during pretraining, by evaluating LLMs exclusively in zero-shot and few-shot settings.
  6. SWE-bench: This assesses an LLM’s ability to complete real-world software engineering tasks, specifically how the model can resolve GitHub issues from popular open-source Python repositories.

How well does DeepSeek perform on mathematical queries?

Two years ago (23 Feb 2023), the present author tested ChatGPT, which was then the state of the art among AI chatbots, with requests to produce rigorous mathematical proofs of four well-known but nontrivial theorems:

  1. A general angle cannot be trisected with ruler and compass.
  2. $\pi$ is irrational.
  3. $\pi$ is transcendental.
  4. Every algebraic equation with integer coefficients has a root in the complex numbers.

The results, frankly, were abysmal — none of the “proofs” was acceptable. The trisection “proof” swept all the nontrivial steps under the rug with phrases such as “using Galois theory.” The “proof” that $\pi$ is irrational reasoned that “it has been proven that the decimal representation of $\pi$ is non-repeating and non-terminating, leading to a contradiction,” but of course $\pi$’s nonrepeating decimal expansion is a consequence, not a proof, of its irrationality. See this Math Scholar article for more details.

So how well does DeepSeek perform with these problems? Here are screen shots for the trisection theorem [1 above], the irrationality of $\pi$ [2 above], and the fundamental theorem of algebra [4 above], respectively, taken directly from DeepSeek responses, using the publicly available DeepSeek version 3 (not the more advanced version R1) on the DeepSeek website, on 1 Feb 2025:

[Screen shots of the three DeepSeek responses]

As one can readily see, DeepSeek’s responses are accurate, complete, very well written as English text, and even very nicely typeset. One can cite a few nits:

  • In the trisection proof, one might prefer that a proof of why the degrees of field extensions are multiplicative be included, but a reasonable proof of this fact can be obtained with additional queries.
  • Also in the trisection proof, one might prefer that the equation $4x^3 - 3x - \cos \theta = 0$ be followed by a specific instance, say for $\theta = \pi / 3$, which yields the specific equation $8 x^3 - 6 x - 1 = 0$ (a worked instance is sketched below), but this is a minor point.
  • In the proof of the fundamental theorem of algebra, the assumption that $P(z)$ is of degree $n \geq 1$ should have been stated explicitly at the start, but again this is a minor nit.
  • One might also prefer that this proof be self-contained, rather than relying on Liouville’s theorem, but one can separately request a proof of Liouville’s theorem, so this is not a significant issue.
  • A larger criticism is that none of the three proofs cited any specific references. It may be that these can be provided if one requests them explicitly.
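To make the second nit concrete, here is a brief sketch of the standard argument for the case $\theta = \pi/3$; this working is supplied here for illustration and is not taken from DeepSeek’s response. Trisecting a $60^\circ$ angle would require constructing $x = \cos(\pi/9) = \cos 20^\circ$. The triple-angle identity $\cos \theta = 4 \cos^3 (\theta/3) - 3 \cos (\theta/3)$ with $\theta = \pi/3$ gives

$$4x^3 - 3x - \frac{1}{2} = 0, \qquad \text{or equivalently} \qquad 8x^3 - 6x - 1 = 0.$$

By the rational root test, the only possible rational roots are $\pm 1, \pm 1/2, \pm 1/4, \pm 1/8$, and none of these satisfies the equation, so the cubic is irreducible over $\mathbb{Q}$. Hence $[\mathbb{Q}(\cos 20^\circ) : \mathbb{Q}] = 3$, which is not a power of $2$, so $\cos 20^\circ$ is not constructible with ruler and compass, and thus the $60^\circ$ angle cannot be trisected.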

Overall, the present author was stunned at the quality of the DeepSeek responses. There is no question that it represents a major improvement over the state of the art of just two years ago. Given DeepSeek’s simplicity, economy, and open-source distribution policy, it must be taken very seriously in the AI world and in the larger realm of mathematics and scientific research.
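Readers who wish to repeat such experiments programmatically, rather than through the web interface, can do so via DeepSeek’s API, which follows the OpenAI chat-completion conventions. Here is a minimal sketch; the endpoint URL, model names, and environment variable shown are assumptions based on DeepSeek’s published documentation and may change, so consult that documentation before use.

```python
# Minimal sketch of posing one of the proof queries to DeepSeek's API.
# Assumes DeepSeek's OpenAI-compatible endpoint and model names
# ("deepseek-chat" for V3, "deepseek-reasoner" for R1); these details
# may change, so check DeepSeek's documentation. Requires an API key.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],  # your own DeepSeek API key
    base_url="https://api.deepseek.com",     # DeepSeek's OpenAI-style endpoint
)

response = client.chat.completions.create(
    model="deepseek-chat",  # V3; use "deepseek-reasoner" to query R1
    messages=[
        {"role": "user",
         "content": "Give a rigorous proof that a general angle cannot be "
                    "trisected with ruler and compass."},
    ],
)
print(response.choices[0].message.content)
```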

What does the future hold?

In a recent interview, Fields medal-winning UCLA mathematician Terence Tao described his vision for the future of mathematical research in a world with AI and automatic proof-checking software (see also here):

Tao: I think in three years AI will become useful for mathematicians. It will be a great co-pilot. You’re trying to prove a theorem, and there’s one step that you think is true, but you can’t quite see how it’s true. And you can say, “AI, can you do this stuff for me?” And it may say, “I think I can prove this.” I don’t think mathematics will become solved. If there was another major breakthrough in AI, it’s possible, but I would say that in three years you will see notable progress, and it will become more and more manageable to actually use AI. And even if AI can do the type of mathematics we do now, it means that we will just move to a higher type of mathematics. So right now, for example, we prove things one at a time. It’s like individual craftsmen making a wooden doll or something. You take one doll and you very carefully paint everything, and so forth, and then you take another one. The way we do mathematics hasn’t changed that much. But in every other type of discipline, we have mass production. And so with AI, we can start proving hundreds of theorems or thousands of theorems at a time. And human mathematicians will direct the AIs to do various things. So I think the way we do mathematics will change, but their time frame is maybe a little bit aggressive.

For additional reading, see:

  • Previous MathScholar article on ChatGPT: Here.
  • Terence Tao’s vision of AI in mathematics: Here and Here.
  • Gale Pooley’s analysis of DeepSeek: Here.
  • Sahin Ahmed’s analysis of the DeepSeek technology: Here.
  • DeepSeek’s January 2025 technical report: Here.
  • DeepSeek’s website, from which one may experiment with or download their software: Here.
