Can ChatGPT prove math theorems?

By David H Bailey, on February 2nd, 2023

ChatGPT: A milestone in artificial intelligence-based language models

Many readers have doubtless heard of ChatGPT, the latest instance of a language generation tool developed by the technology startup OpenAI. This tool, which is now available for public experimental use, takes a request or other statement from the user as input and responds to it. It employs a dialogue format, which makes it possible for ChatGPT to answer follow-up questions and even admit its mistakes.

It is clear, even from a cursory examination, that ChatGPT represents a dramatic advance in artificial intelligence, and some of its results are startling. Here, for instance, is what ChatGPT returned in response to the request, “Write a biblical verse in the style of the King James Bible explaining how to remove a peanut butter sandwich from a VCR”:

And it came to pass that a man was troubled by a peanut butter sandwich, for it had been placed within his VCR, and he knew not how to remove it. And he cried out to the Lord, saying, “Oh, Lord, how can I remove this sandwich from my VCR, for it is stuck fast and will not budge?” And the Lord spoke unto him, saying, “Fear not, my child, for I shall guide thy hand and show thee the way. Take thy butter knife, and carefully insert it between the sandwich and the VCR, and gently pry them apart. And with patience and perseverance, the sandwich shall be removed, and thy VCR shall be saved.” And the man did as the Lord commanded, and lo and behold, the sandwich was removed from the VCR, and the man was saved. And the Lord said, “Verily I say unto thee, seek not to put thy peanut butter sandwiches in thy VCR, for it is not a suitable place for such things. Rather, keep thy sandwiches in thy refrigerator or on thy plate, where they belong.” And the man heeded the Lord’s words, and from that day forth, he kept his sandwiches in their proper place, and was saved from trouble and woe. Amen.

Here is another, more serious example, in response to the query, “Can you write me a 4 paragraph academic essay comparing and contrasting the theories of nationalism of Benedict Anderson and Ernest Gellner?”:

Benedict Anderson and Ernest Gellner both have theories about the origins and nature of nationalism, but their perspectives differ in key ways. Anderson believes that nationalism is a cultural construct, arising from the development of print capitalism and the creation of imagined communities. Gellner, on the other hand, sees nationalism as a product of modernization and industrialization, with nations being formed as a means of organizing society and providing a shared identity.

One key difference between the two theories is the role of culture. For Anderson, culture is central to the creation of nationalism, as the spread of print media allows for the creation of imagined communities that transcend physical boundaries. This allows for the development of a national identity based on shared cultural experiences and practices. In contrast, Gellner downplays the importance of culture in the formation of nationalism, arguing that it is primarily a function of economic and political developments. He sees nations as being formed in response to the need for a shared identity and sense of belonging in an industrialized society.

Another point of contrast is the emphasis on agency. Anderson’s theory suggests that people actively participate in the construction of national identity, through their engagement with print media and other cultural practices. In contrast, Gellner’s theory emphasizes the role of external factors in shaping national identity, with individuals being passive recipients of the national identity imposed on them by the forces of modernization.

Overall, while both Anderson and Gellner provide valuable insights into the origins of nationalism, their theories offer different perspectives on the role of culture and agency in the formation of national identity. Anderson emphasizes the cultural construction of nationalism, while Gellner sees it as a product of modernization and industrialization. Both theories offer valuable insights into the complex process of national identity formation, and can be useful in understanding the dynamics of nationalism in different historical contexts.

How good is ChatGPT at proving mathematical theorems?

We might have a good chuckle at some of ChatGPT’s essay output, but how good is it at proving mathematical theorems? To find out, the present author asked ChatGPT to prove four well-known mathematical theorems:

Most angles cannot be trisected with ruler and compass.
$\pi$ is irrational.
$\pi$ is transcendental.
Every algebraic equation with integer coefficients has a root in the complex numbers.

Here are the results (after some minor formatting):

1. Input: “Compose a proof that most angles cannot be trisected with ruler and compass”. Result:
+++++
The statement that most angles cannot be trisected with ruler and compass was first proven by Pierre Wantzel in 1837. The proof relies on the fact that if an angle could be trisected with ruler and compass, then the solution to the following equation would be constructible: $$x^3 + x^2 = y^2,$$ where $x$ and $y$ are rational numbers. However, it can be shown that this equation has no solution in rational numbers, using Galois theory. This implies that the roots of the equation cannot be constructed with ruler and compass, and thus the angle cannot be trisected.

In summary, Wantzel’s proof shows that the trisection of an angle is possible with ruler and compass if and only if the cube root of 3 can be constructed with ruler and compass, which is not possible. Therefore, most angles cannot be trisected with ruler and compass.
+++++

2. Input: “Write a proof that pi is irrational”. Result:
+++++
One proof that $\pi$ is irrational uses contradiction. Assume, for the sake of contradiction, that $\pi$ is rational, meaning it can be expressed as a fraction of two integers, $a/b$ (where $b \neq 0$). This fraction, when converted to decimal form, would eventually become periodic. However, it has been proven that the decimal representation of $\pi$ is non-repeating and non-terminating, leading to a contradiction. Thus, $\pi$ must be irrational.
+++++

3. Input: “Write a proof that pi is transcendental”. Result:
+++++
One proof that $\pi$ is transcendental uses contradiction by assuming that $\pi$ is algebraic (i.e. the solution of a polynomial equation with rational coefficients). Assume that $\pi$ is the solution of the polynomial equation $$a_n x^n + a_{n-1} x^{n-1} + \dots + a_1 x + a_0 = 0,$$ where $n$ is a positive integer and all $a_i$ are rational numbers. By the theorem of algebra, $n$ roots of this equation, including $\pi$, are either rational or algebraic numbers. However, if $\pi$ were algebraic, we could write $\pi = p/q$ for some rational numbers $p$ and $q$, which would mean that $\pi$ satisfies a polynomial equation with rational coefficients of the form $$p^n = q^n a_n + q^{n-1} a_{n-1} + \dots + q a_1 + a_0.$$ Since $p$ and $q$ are rational, it follows that all the coefficients of the polynomial on the right-hand side are also rational. This contradicts our original assumption that $\pi$ is transcendental, implying that $\pi$ must be transcendental.
+++++

4. Input: “Prove that every algebraic equation with integer coefficients has a root in the complex numbers”. Result:
+++++
This statement is known as the Fundamental Theorem of Algebra, which states that every non-constant polynomial equation with complex coefficients has at least one complex root. This theorem can be proven using the following steps:

The theorem holds for polynomials of degree 1, as any equation of the form $ax + b = 0$ has a unique root given by $x = -b/a$.
Suppose that the theorem holds for all polynomials of degree $n$, for some positive integer $n$. Then consider a polynomial of degree $n+1$: $$a_{n+1} x^{n+1} + a_n x^n + \dots + a_1 x + a_0 = 0.$$
By the Factor Theorem, $x = r$ is a root of this polynomial if and only if the polynomial divided by $x - r$ is a polynomial of degree $n$ with rational coefficients.
By the induction hypothesis, this polynomial of degree $n$ has a root in the complex numbers, which means that the original polynomial of degree $n+1$ also has a root in the complex numbers.
Thus, by mathematical induction, the theorem holds for all polynomials with integer coefficients, including those of arbitrarily high degree.

+++++

Assessment of ChatGPT’s “proofs”

ChatGPT’s responses are quite remarkable, but professional research mathematicians will not be joining the ranks of the unemployed anytime soon. Here is a brief assessment of these “proofs”:

ChatGPT’s “proof” of the impossibility of trisection is superficially correct, but all the nontrivial steps of the proof are swept under the rug with the phrases “using Galois theory” and “if and only if the cube root of 3 can be constructed with ruler and compass, which is not possible.” Readers interested in a complete, self-contained proof of this theorem are referred to the present author’s article here.
ChatGPT’s “proof” that $\pi$ is irrational is similarly flawed. Note the phrase, “it has been proven that the decimal representation of $\pi$ is non-repeating and non-terminating, leading to a contradiction.” But the fact that the decimal (or any other base) expansion of $\pi$ never repeats is a consequence of its irrationality. To the present author’s knowledge, there is no independent proof that $\pi$’s digits never repeat, except on the hypothesis of irrationality. Readers interested in a complete, self-contained proof that $\pi$ is irrational are referred to the present author’s article here.
ChatGPT’s “proof” that $\pi$ is transcendental concludes with the nonsensical statement, “This contradicts our original assumption that $\pi$ is transcendental, implying that $\pi$ must be transcendental.” Needless to say, this “proof” is completely unacceptable. Readers interested in a full proof that $\pi$ is transcendental are referred to the proof here, among other sources.
ChatGPT’s “proof” of the fundamental theorem of algebra is similarly unacceptable. Note that in invoking the “factor theorem,” it assumes that a root exists, which is the central conclusion of the theorem. In other words, dividing by $x - r$ leaves a zero remainder, so that the polynomial factors as $x - r$ times a polynomial of degree $n$, only if $r$ is already known to be a root. Readers interested in a complete, self-contained proof of the fundamental theorem of algebra are referred to the present author’s article here.
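For comparison, the step that ChatGPT’s trisection “proof” hides behind “using Galois theory” can be made concrete for a single angle. In the standard argument, trisecting a $60^\circ$ angle would require constructing $x = \cos 20^\circ$, which by the identity $\cos 3\theta = 4\cos^3\theta - 3\cos\theta$ satisfies $8x^3 - 6x - 1 = 0$. A cubic with rational coefficients and no rational root is irreducible over the rationals, so $\cos 20^\circ$ would have degree 3, whereas every constructible number has degree a power of 2. The minimal Python sketch below checks only the “no rational root” step, via the rational root theorem; the function names are hypothetical, and the field-degree facts are assumed rather than computed.

```python
import math
from fractions import Fraction
from itertools import product


def poly_eval(coeffs, x):
    """Evaluate a polynomial (coefficients listed from highest degree) by Horner's rule."""
    result = 0
    for c in coeffs:
        result = result * x + c
    return result


def divisors(n):
    """Positive divisors of a nonzero integer n."""
    n = abs(n)
    return [d for d in range(1, n + 1) if n % d == 0]


def rational_roots(coeffs):
    """All rational roots of an integer-coefficient polynomial (rational root theorem).

    Assumes nonzero leading and constant coefficients. A root p/q in lowest
    terms must have p dividing the constant term and q dividing the leading term.
    """
    candidates = {
        sign * Fraction(p, q)
        for p, q, sign in product(divisors(coeffs[-1]), divisors(coeffs[0]), (1, -1))
    }
    return sorted(r for r in candidates if poly_eval(coeffs, r) == 0)


# x = cos 20 deg satisfies 8x^3 - 6x - 1 = 0, from cos 3t = 4cos^3 t - 3cos t with 3t = 60 deg.
cubic = [8, 0, -6, -1]
print(f"residual at cos 20 deg: {poly_eval(cubic, math.cos(math.radians(20))):.1e}")
print("rational roots:", rational_roots(cubic))  # [] means no linear factor over Q
```

Running it should show a residual near zero for $\cos 20^\circ$ and an empty list of rational roots, which is the irreducibility fact the argument needs.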
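The half of ChatGPT’s irrationality argument that does hold, namely that a rational number $a/b$ has an eventually periodic expansion, follows from long division: every remainder lies in $\{0, 1, \dots, b-1\}$, so some remainder must recur within $b$ steps, after which the digits repeat. The hypothetical helper below, a minimal Python sketch assuming $a \ge 0$ and $b > 0$, returns the integer part, the non-repeating digits, and the repeating block of $a/b$. It can certify that a given fraction repeats, but no finite computation on the digits of $\pi$ can show that they never repeat, which is why, as noted above, that fact is known only as a consequence of $\pi$’s irrationality.

```python
def decimal_expansion(a, b, base=10):
    """Expand a/b (a >= 0, b > 0) as (integer_part, preperiod, period) in the given base.

    Long division: every remainder lies in range(b), so some remainder must
    recur within b steps, and from then on the digits repeat forever.
    """
    if a < 0 or b <= 0:
        raise ValueError("expects a >= 0 and b > 0")
    integer_part, r = divmod(a, b)
    seen = {}    # remainder -> position of the digit it generates
    digits = []
    while r not in seen:
        seen[r] = len(digits)
        d, r = divmod(r * base, b)
        digits.append(d)
    start = seen[r]
    return integer_part, digits[:start], digits[start:]


print(decimal_expansion(22, 7))  # (3, [], [1, 4, 2, 8, 5, 7]), i.e. 3.142857142857...
print(decimal_expansion(1, 4))   # (0, [2, 5], [0]); a terminating decimal repeats 0
print(decimal_expansion(1, 6))   # (0, [1], [6]), i.e. 0.1666...
```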
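The gap in ChatGPT’s induction step can be seen directly with synthetic division. Dividing a polynomial $p(x)$ of degree $n+1$ by $x - r$ always produces a quotient of degree $n$, but the remainder equals $p(r)$, so $p(x) = (x - r)\,q(x)$ holds exactly when $r$ is already a root. The minimal Python sketch below (the function name is hypothetical) illustrates this; it says nothing about whether a root exists, which is precisely what a proof of the fundamental theorem of algebra must supply.

```python
def synthetic_division(coeffs, r):
    """Divide p(x) by (x - r); coeffs are listed from highest degree to lowest.

    Returns (quotient_coeffs, remainder). For p of degree n >= 1 the quotient
    has degree n - 1, and the remainder always equals p(r), so (x - r) is a
    factor exactly when r is a root.
    """
    acc = 0
    partial = []
    for c in coeffs:
        acc = acc * r + c   # same recurrence as Horner evaluation of p at r
        partial.append(acc)
    return partial[:-1], partial[-1]


p = [1, -6, 11, -6]  # x^3 - 6x^2 + 11x - 6 = (x - 1)(x - 2)(x - 3)
print(synthetic_division(p, 2))  # ([1, -4, 3], 0): x - 2 divides p exactly
print(synthetic_division(p, 4))  # ([1, -2, 3], 6): remainder p(4) = 6, not a factor
```

The same routine works with complex $r$ (for example, dividing $x^2 + 1$ by $x - i$ leaves remainder 0), but finding a suitable $r$ in the first place is the whole difficulty.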

What does the future hold?

While it may be true that mathematicians’ jobs are safe for the time being, how long will this be the case? And how long before other occupations, white collar and blue collar, are called into question?

Consider some of the dramatic advances, just within the past 12 years:

In 2011, IBM’s “Watson” computer system defeated the two premier champions of the American quiz show Jeopardy!.
In 2017, AlphaGo Zero, developed by DeepMind, a subsidiary of Alphabet (Google’s parent company), defeated an earlier program, also developed by DeepMind, which in turn had defeated the world’s best Go player, a feat that many observers had not expected to see for decades. By one measure, AlphaGo Zero’s performance is as far above the world’s best Go player as the world’s best Go player is above a typical amateur.
In 2020, AlphaFold 2, also developed by DeepMind, scored 92% on the 2020 Critical Assessment of Protein Structure Prediction (CASP) test, far above the 62% achieved by the second-best program in the competition. Nobel laureate Venki Ramakrishnan of Cambridge University exulted, “This computational work represents a stunning advance on the protein-folding problem, a 50-year-old grand challenge in biology. It has occurred decades before many people in the field would have predicted.”
The financial industry already relies heavily on machine-learning methods, and a major expansion of these technologies is coming, possibly displacing thousands of highly paid workers or rendering their jobs obsolete.
In January 2023, researchers with the Search for Extraterrestrial Intelligence (SETI) project announced that they are deploying machine-learning techniques to sift through large volumes of microwave data. As Alexandra Witze asks in Nature, “Will an AI Be the First to Discover Alien Life?”

We have commented before in this forum (see here, for example) on the disruptive potential of artificial intelligence and other machine-learning-based technologies. In the wake of ChatGPT, it is clear, at the very least, that there will soon be another casualty: the college essay. Why go to the trouble of composing an essay, particularly if you have learned English as a second language, if ChatGPT can do just as well? But if college essays are obsolete, how will students gain the language skills needed to perform well in the Information Age? More importantly, how will they acquire the all-important critical thinking skills required in the modern global high-tech society?

Indeed, maybe it is time to start talking about how good A.I. is getting.

[Added 21 Feb 2023:] Nature has published the article, “How will AI change mathematics? Rise of chatbots highlights discussion. Machine learning tools already help mathematicians to formulate new theories and solve tough problems. But they’re set to shake up the field even more.”

[Added 30 Mar 2023:] The Future of Life Institute has published an Open Letter calling for all AI research laboratories to pause the training of advanced AI systems for at least six months, allowing time to assess the potential risks of these developments to society.

Math Scholar
