Simple proofs: Pi is transcendental



NOTE: A typeset PDF version of this article, with references, is available HERE.

Introduction

The first rigorous mathematical calculation of $\pi$ was due to Archimedes (ca. 250 BCE), who used a scheme of inscribed and circumscribed polygons to obtain the bounds $3 \frac{10}{71} < \pi < 3 \frac{1}{7}$. The fifth-century Indian mathematician Aryabhata produced four digits; the Chinese mathematician Tsu Chung-Chih produced seven. Following the discovery of calculus by Newton and Leibniz in the 1600s, several new formulas for $\pi$ were found. Newton published 15 digits, but later sheepishly admitted, "I am ashamed to tell you to how many figures I carried these computations, having no other business at the time." Shanks computed 707 digits in 1873, although his result was in error after the 527th digit. In the 20th and 21st centuries, $\pi$ was computed to thousands, then millions, then billions, and, by 2022, to 100 trillion decimal digits [Iwao2022].

One motivation for computations of $\pi$ in earlier eras was to see if the decimal expansion repeats, thus disclosing that $\pi$ is the ratio of two integers (although few believed it was). This question was settled in 1768, when Lambert proved that $\pi$ is irrational. Some still wondered whether $\pi$ might be given by an algebraic expression; indeed, countless “proofs” of algebraic values and geometric constructions were received by scholarly journals. This question was finally settled in 1882 when Lindemann proved that $\pi$ is transcendental. His result also settled the ancient Greek question of whether the circle could be squared by ruler and compass: this is impossible, because constructible numbers are given by finite combinations of rationals and square roots and thus are algebraic.

Sadly, even in the 21st century, some continue to assert that the conventional value of $\pi$ is not correct, but instead is given by some simple algebraic expression, such as $17 - 8 \sqrt{3} = 3.1435935394\ldots$ or $(14 - \sqrt{2})/4 = 3.1464466094\ldots$. For a list of 11 specific examples of post-2000 papers offering “proofs” of some algebraic value for $\pi$, published in supposedly “peer-reviewed” journals, see [Bailey2017]; most likely there are numerous others.

In an attempt to make the proof of the transcendence of $\pi$ more widely known and understood, and to help counter the current wave of pseudomathematics, we present here an elementary, self-contained proof of this result. It involves only algebra and elementary calculus, and includes some intermediate lemmas, examples and illustrations. This material is based upon, but further elaborates, several earlier works, notably a 1939 paper by Ivan Niven [Niven1939] and a 2006 note by Steve Mayer [Mayer2006], but also a 1949 work by Bourbaki [Bourbaki1949] and a 2017 paper by Ben Blum-Smith and Samuel Coskey [Blum-Smith2017]. See also [Morris2022].

Other articles in this series include:

  1. The irrationality of pi
  2. The fundamental theorem of algebra
  3. The impossibility of trisection
  4. The fundamental theorem of calculus
  5. Archimedes’ calculation of pi
  6. Pi as the limit of n-sided circumscribed and inscribed polygons

Proofs that $\pi$ and $\pi^2$ are irrational

Following Mayer’s exposition, and before proceeding with a proof of transcendence, we first present a proof that $\pi$ is irrational and also that $\pi^2$ is irrational, because these surprisingly simple proofs illustrate a high-level strategy that will be employed later (however, neither of these two results is necessary for the transcendence proof below). The proof that $\pi$ is irrational is adapted from a 1949 Bourbaki work [Bourbaki1949] (see English version at [Pi-irrational2025]), while the proof that $\pi^2$ is irrational is adapted from Mayer [Mayer2006].

Theorem 1: $\pi$ is irrational.

Proof: Assume that $\pi = a/b$ for relatively prime positive integers $a$ and $b$. Define $f_n(x) = x^n (a - bx)^n / n! = b^n x^n (\pi - x)^n / n!$ for integer $n \geq 0$, and let $D$ denote differentiation by $x$. Then by using integration by parts,
\begin{align}
A_n \; = \; \int_0^\pi f_n(x) \sin(x) \, {\rm d}x &= \left[-f_n(x) \cos(x)\right]_0^\pi - \left[-D(f_n(x)) \sin(x)\right]_0^\pi + \cdots \nonumber \\
&{} \hspace{1em} \cdots \pm \left[D^{2n}(f_n(x)) \cos(x)\right]_0^\pi \pm \int_0^\pi D^{2n+1}(f_n(x)) \cos(x) \, {\rm d}x. \tag{1}
\end{align}
For $0 \leq k < n$, the derivative $D^k(f_n (x))$ is zero at $x = 0$ and $x = \pi$; for $n \leq k \leq 2n$, it takes integer values at $x = 0$ and $x = \pi$ (the $n!$ in the denominator is cancelled by the factor $k!$ produced by differentiation, and $f_n(\pi - x) = f_n(x)$); and for $k = 2n + 1$, it is identically zero. Also, $\sin(0) = \sin(\pi) = 0$, while $\cos(0) = 1$ and $\cos(\pi) = -1$. Thus it follows that $A_n$ is an integer for all $n \ge 0$. However, one can also write
\begin{align}
A_n &= \int_0^\pi f_n(x) \sin(x) \, {\rm d}x \; = \; \frac{b^n}{n!} \int_0^\pi x^n (\pi - x)^n \sin(x) \, {\rm d}x. \tag{2}
\end{align}
Note that the integrand here is zero at $x = 0$ and $x = \pi$, but is strictly positive elsewhere in $(0, \pi)$, so that $A_n > 0$. In addition, since $x (\pi - x) \leq (\pi/2)^2$ for $0 \leq x \leq \pi$, one can write, for all sufficiently large $n$,
\begin{align}
A_n &\leq \frac{\pi b^n}{n!}\left(\frac{\pi}{2}\right)^{2n} \; = \; \frac{\pi (b \pi^2 / 4)^n}{n!} \; < \; 1, \tag{3}
\end{align}
since for fixed $b$ the $n!$ in the denominator increases faster than the $n$-th power in the numerator. Thus, for all sufficiently large $n$, it follows that $0 < A_n < 1$, contradicting the finding above that $A_n$ is an integer. Thus $\pi$ is irrational.
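
As a sanity check on the integrality claim, the case $n = 1$ can be worked out by hand (still under the hypothetical assumption $\pi = a/b$). Here $f_1(x) = ax - bx^2$, so that $D(f_1(x)) = a - 2bx$ and $D^2(f_1(x)) = -2b$. In (1), the bracket with $f_1$ vanishes because $f_1(0) = f_1(\pi) = 0$, the bracket with $\sin(x)$ vanishes at both endpoints, and the remaining cosine bracket enters with a plus sign:
\begin{align}
A_1 &= \left[D^2(f_1(x)) \cos(x)\right]_0^\pi \; = \; (-2b)(-1) - (-2b)(1) \; = \; 4b, \nonumber
\end{align}
which agrees with direct evaluation of (2), since $\int_0^\pi x (\pi - x) \sin(x) \, {\rm d}x = 4$.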

Theorem 2: $\pi^2$ is irrational.

Proof: Assume that $\pi^2 = a / b$ for positive integers $a$ and $b$, and let $D$ denote differentiation by $x$. Define
\begin{align}
f(x) &= \frac{x^n (1 - x)^n}{n!} \nonumber \\
g(x) &= b^n \left[\pi^{2n} f(x) - \pi^{2n-2} D^2(f(x)) + \cdots + (-1)^n \pi^0 D^{2n}(f(x))\right]. \tag{4}
\end{align}
It is clear that $D^{k} (f(x))$, when evaluated at 0 or 1, is zero if $k < n$; is some integer for $n \leq k \leq 2n$; and is zero for $k > 2n$. Also note that $g(0)$ and $g(1)$ are integers, since $b^n \pi^{2n-2j} = a^{n-j} b^j$ for $0 \leq j \leq n$. Further,
\begin{align}
D \left[D(g(x)) \sin(\pi x) - \pi g(x) \cos(\pi x)\right] &= (D^2(g(x)) + \pi^2 g(x)) \sin (\pi x) \nonumber \\
&= b^n \pi^{2n+2} f(x) \sin(\pi x) \; = \; a^n \pi^2 f(x) \sin (\pi x). \tag{5}
\end{align}
Thus it follows that, for any $n \geq 0$,
\begin{align}
A_n &= \pi \int_0^1 a^n f(x) \sin (\pi x) \, {\rm d} x \; = \; \left[D(g(x)) \sin (\pi x) / \pi - g(x) \cos (\pi x)\right]_0^1 \nonumber \\
&= {\rm some} \; {\rm integer}. \tag{6}
\end{align}
However, note that the integrand is zero at $x = 0$ and $x = 1$, but is strictly positive elsewhere in $(0,1)$, so that $A_n > 0$, and
\begin{align}
A_n &= \pi \int_0^1 a^n f(x) \sin (\pi x) \, {\rm d} x \; \leq \frac{\pi a^n}{n!}. \tag{7}
\end{align}
So for all sufficiently large $n$, it follows that $0 < A_n < 1$, contradicting the finding above that $A_n$ is an integer. Thus $\pi^2$ is irrational.
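
For illustration, take $n = 1$ (again under the hypothetical assumption $\pi^2 = a/b$). Then $f(x) = x(1 - x)$ and $D^2(f(x)) = -2$, so that
\begin{align}
g(x) &= b \left[\pi^2 x (1 - x) + 2\right], \qquad g(0) \; = \; g(1) \; = \; 2b, \nonumber \\
A_1 &= g(1) + g(0) \; = \; 4b, \nonumber
\end{align}
which is consistent with the direct evaluation $\pi a \int_0^1 x (1 - x) \sin(\pi x) \, {\rm d}x = 4a/\pi^2 = 4b$.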

Vieta’s theorem and the fundamental theorem of symmetric polynomials

We include here the following two well-known results from polynomial theory that will be employed below:

Theorem 3: (Vieta’s theorem)
If $p(x)$ is a monic polynomial (i.e., the highest-degree coefficient is one) with integer or rational coefficients and degree $m$, and with roots $(\alpha_1, \alpha_2, \cdots, \alpha_m)$, then
\begin{align}
p(x) &= (x - \alpha_1) (x - \alpha_2) \cdots (x - \alpha_m) \nonumber \\
&= x^m - \sigma_1 x^{m-1} + \sigma_2 x^{m-2} - \cdots + (-1)^m \sigma_m, \tag{8}
\end{align}
where the $\sigma_k$ are the $m$-variable elementary symmetric polynomials of the $\alpha_k$:
\begin{align}
\sigma_1 &= \sum_{k} \alpha_k \; = \; \alpha_1 + \alpha_2 + \cdots + \alpha_m \nonumber \\
\sigma_2 &= \sum_{j < k} \alpha_j \alpha_k \; = \; \alpha_1 \alpha_2 + \alpha_1 \alpha_3 + \cdots + \alpha_{m-1} \alpha_m \nonumber \\
\sigma_3 &= \sum_{i < j < k} \alpha_i \alpha_j \alpha_k \; = \; \alpha_1 \alpha_2 \alpha_3 + \alpha_1 \alpha_2 \alpha_4 + \cdots + \alpha_{m-2} \alpha_{m-1} \alpha_m \nonumber \\
&{} \cdots \nonumber \\
\sigma_m &= \alpha_1 \alpha_2 \cdots \alpha_m. \tag{9}
\end{align}
If all coefficients of $p(x)$ are integers or rationals, then each $\sigma_k$ is an integer or rational, respectively. Conversely, if all elementary symmetric polynomials of the set $(\alpha_k, 1 \leq k \leq m)$ are integer or rational, then the $\alpha_k$ are the roots of a degree-$m$ polynomial with integer or rational coefficients, respectively.

Proof: The proof is immediate, as shown above.
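
As a small worked instance (an example chosen here for illustration), take the cubic with roots $(1, 2, 3)$:
\begin{align}
p(x) &= (x - 1)(x - 2)(x - 3) \; = \; x^3 - 6 x^2 + 11 x - 6, \nonumber \\
\sigma_1 &= 1 + 2 + 3 \; = \; 6, \qquad \sigma_2 \; = \; 1 \cdot 2 + 1 \cdot 3 + 2 \cdot 3 \; = \; 11, \qquad \sigma_3 \; = \; 1 \cdot 2 \cdot 3 \; = \; 6, \nonumber
\end{align}
so the coefficients are $-\sigma_1$, $+\sigma_2$ and $-\sigma_3$, as in the alternating pattern of (8).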

There are, of course, many symmetric polynomial expressions that are not elementary symmetric polynomials, for example: $x_1^2 + x_2^2$ (two variables); $(x_1 + x_2)^3 + (x_1 + x_3)^3 + (x_2 + x_3)^3$ (three variables); and $(x_1 - x_2)^2 + (x_1 - x_3)^2 + (x_1 - x_4)^2 + (x_2 - x_3)^2 + (x_2 - x_4)^2 + (x_3 - x_4)^2$ (four variables). Can these expressions be written in terms of the elementary symmetric polynomials? In these cases, yes:
\begin{align}
&{} x_1^2 + x_2^2 \; = \; \sigma_1^2 - 2 \sigma_2 \nonumber \\
&{} (x_1 + x_2)^3 + (x_1 + x_3)^3 + (x_2 + x_3)^3 \; = \; 2\sigma_1^3 - 3\sigma_1\sigma_2 - 3\sigma_3 \nonumber \\
&{} (x_1 - x_2)^2 + (x_1 - x_3)^2 + (x_1 - x_4)^2 + (x_2 - x_3)^2 + (x_2 - x_4)^2 + (x_3 - x_4)^2 \; = \; 3 \sigma_1^2 - 8 \sigma_2. \tag{10} \label{form:reduce}
\end{align}
According to the fundamental theorem of symmetric polynomials, this type of reduction is always possible. The proof here follows a paper by Ben Blum-Smith and Samuel Coskey [Blum-Smith2017], who in turn cite Bernd Sturmfels [Sturmfels2008], but note that the basic idea appeared in the writings of Gauss.

Theorem 4: (Fundamental theorem of symmetric polynomials)
Any symmetric multivariate polynomial can be written uniquely as a polynomial in the elementary symmetric polynomials.

Proof: Let $f(x)$ be a symmetric polynomial in $m$ variables $x_1, x_2, \cdots, x_m$ [here $x$ alone will be shorthand for the list $(x_1, x_2, \cdots, x_m)$]. The sum of the terms of $f(x)$ of a given total degree is itself a symmetric polynomial, and clearly if each such homogeneous part can be represented with elementary symmetric polynomials, then $f(x)$ can also be represented. Thus one can safely assume that all terms of $f(x)$ have the same degree.

The key strategy here is to order the terms of $f(x)$ lexicographically: place the term with the highest power of $x_1$ first; if multiple terms have the same power of $x_1$, select the term with the highest power of $x_2$; if multiple terms have the same powers of $x_1$ and $x_2$, select the term with the highest power of $x_3$, etc.

Because $f(x)$ is symmetric, if $c x_1^{i_1} x_2^{i_2} \cdots x_m^{i_m}$ is one term, then $f(x)$ also contains all terms with permuted exponents (these are the term’s conjugates). By the ordering above, the leading term of $f(x)$ has $i_1 \geq i_2 \geq \cdots \geq i_m$. Now define
\begin{align}
g_1(x) = c_1 \sigma_1^{i_1-i_2} \sigma_2^{i_2-i_3} \cdots \sigma_{m-1}^{i_{m-1}-i_m} \sigma_m^{i_m}. \tag{11}
\end{align}
Clearly $g_1(x)$ is a polynomial in the elementary symmetric polynomials (and hence is symmetric), and, with $c_1$ taken to be the coefficient of the leading term of $f(x)$, it has the same leading term as $f(x)$, so that $f(x) - g_1(x)$ is symmetric with a leading term that is lower in the lexicographical “degree.” Continuing in this manner, and noting that there are only finitely many monomials of a given degree, so that the leading term cannot decrease indefinitely, one eventually obtains $f(x) - g_1(x) - g_2(x) - \cdots - g_n(x) = 0$, for some $n$. Then $f(x) = g_1(x) + g_2(x) + \cdots + g_n(x)$ is the required representation in terms of elementary symmetric polynomials. With some additional effort, it can be established that this representation is unique, although since the transcendence proof below does not require this fact, the uniqueness proof is omitted here. For a more detailed exposition of this proof, as well as an interesting alternative proof, see [Blum-Smith2017].
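
To see the procedure in action, consider $f(x) = x_1^3 + x_2^3$ in two variables (an example chosen here for illustration, not taken from the cited papers). The leading term is $x_1^3$, with exponents $(i_1, i_2) = (3, 0)$, so $g_1(x) = \sigma_1^3$. The remainder has leading term $-3 x_1^2 x_2$, with exponents $(2, 1)$, so $g_2(x) = -3 \sigma_1 \sigma_2$:
\begin{align}
f(x) - g_1(x) &= x_1^3 + x_2^3 - (x_1 + x_2)^3 \; = \; -3 x_1^2 x_2 - 3 x_1 x_2^2 \nonumber \\
f(x) - g_1(x) - g_2(x) &= -3 x_1^2 x_2 - 3 x_1 x_2^2 + 3 (x_1 + x_2) x_1 x_2 \; = \; 0, \nonumber
\end{align}
and thus $x_1^3 + x_2^3 = \sigma_1^3 - 3 \sigma_1 \sigma_2$ after two steps.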

By the way, the reduction of a symmetric polynomial into elementary symmetric polynomials is implemented in many computer algebra systems, for instance with the Mathematica function SymmetricReduction. In fact, the reductions listed above in \eqref{form:reduce} were obtained using SymmetricReduction.

The transcendence of $\pi$

As noted above, the transcendence of $\pi$ was first proved by Lindemann in 1882 [Lindemann1882]. In particular, he proved that if $\alpha$ is a nonzero algebraic number (real or complex), then $e^\alpha$ is transcendental. Then since $e^{i\pi} = -1$, it follows that $i \pi$, and hence $\pi$, cannot be algebraic. In 1885, Weierstrass proved the more general fact that if $(\alpha_1, \alpha_2, \cdots, \alpha_m)$ are algebraic and linearly independent over the rationals, then $(e^{\alpha_1}, e^{\alpha_2}, \cdots, e^{\alpha_m})$ are algebraically independent.

Many contemporary proofs of the transcendence of $\pi$ simply note that this is a consequence of the Lindemann-Weierstrass theorem; see, for example [Lindemann2025]. However, in the present author’s opinion, the proof of the Lindemann-Weierstrass theorem is significantly more complicated than a direct proof of the transcendence of $\pi$, which is presented here. This is adapted from an elegant 1939 proof by Niven [Niven1939] (and mostly uses his notation) and from Mayer’s note [Mayer2006], but includes some additional illustration and elaboration.

Theorem 5: $\pi$ is transcendental.

Proof: Assume $\pi$ is algebraic. Then $i \pi$ is algebraic also. In other words, $i \pi$ is the root of an algebraic equation $\theta_1(x) = 0$ with integer coefficients (not necessarily monic) and with roots $(\alpha_1, \alpha_2, \cdots, \alpha_m)$ for some integer $m$, where $\alpha_1 = i \pi$. From Euler’s formula $e^{i\pi} + 1 = 0$, we have $1 + e^{\alpha_1} = 0$, so that
\begin{align}
0 &= (1 + e^{\alpha_1}) (1 + e^{\alpha_2}) \cdots (1 + e^{\alpha_m}) \nonumber \\
&= 1 + (e^{\alpha_1} + e^{\alpha_2} + \cdots + e^{\alpha_m}) + (e^{\alpha_1 + \alpha_2} + e^{\alpha_1 + \alpha_3} + \cdots + e^{\alpha_{m-1} + \alpha_m}) \nonumber \\
&{} \hspace{1em} + (e^{\alpha_1 + \alpha_2 + \alpha_3} + e^{\alpha_1 + \alpha_2 + \alpha_4} + \cdots + e^{\alpha_{m-2} + \alpha_{m-1} + \alpha_m}) \nonumber \\
&{} \hspace{1em} + \cdots + e^{\alpha_1 + \alpha_2 + \cdots + \alpha_m}. \tag{12} \label{form:F1}
\end{align}

We now employ a clever scheme, from Niven’s paper, to construct an algebraic equation with rational coefficients whose roots are precisely the exponents in the expanded form of \eqref{form:F1}. Note that since the $\alpha_k$ are the roots of $\theta_1(x) = 0$, it follows from Theorem 3 that the elementary symmetric polynomials of $(\alpha_1, \alpha_2, \cdots, \alpha_m)$ evaluate to rationals. Now consider the set of roots summed two at a time:
\begin{align}
S_2 &= (\alpha_i + \alpha_j, i < j \leq m) \, = \, (\alpha_1 + \alpha_2, \alpha_1 + \alpha_3, \cdots, \alpha_1 + \alpha_m, \cdots, \alpha_{m-1} + \alpha_m). \tag{13}
\end{align}
Since the elementary symmetric polynomials of the entries of $S_2$ are also symmetric in the $\alpha_k$, it follows by Theorems 3 and 4 that they evaluate to rationals, and thus these rationals constitute the coefficients (except for signs, as in Vieta's theorem) of an algebraic equation $\theta_2(x) = 0$ whose roots are precisely the two-at-a-time sums in $S_2$. We do the same with the set of three-at-a-time sums $S_3 = (\alpha_i + \alpha_j + \alpha_k, i < j < k)$, producing an algebraic equation $\theta_3(x) = 0$ whose roots are the sums in $S_3$, continuing to $S_m$ and $\theta_m(x) = 0$. It follows that the roots of the product equation
\begin{align}
\theta_1(x) \theta_2(x) \cdots \theta_m(x) &= 0 \tag{14} \label{form:F2}
\end{align}
are precisely the exponents in the expansion of \eqref{form:F1}. Deleting roots that are zero, if any, and multiplying through by a common denominator of the coefficients, yields an equation
\begin{align}
\theta(x) &= c_0 x^r + c_1 x^{r-1} + \cdots + c_r = 0, \tag{15} \label{form:F3}
\end{align}
whose roots $\beta_1, \beta_2, \cdots, \beta_r$ are the nonzero exponents in the expanded form of \eqref{form:F1}, and whose coefficients are integers, with $c_0$ and $c_r$ nonzero. Thus the expanded form of \eqref{form:F1} may be written
\begin{align}
e^{\beta_1} + e^{\beta_2} + \cdots + e^{\beta_r} + k = 0, \tag{16} \label{form:F4}
\end{align}
for some positive integer $k$. Note that $k \geq 1$, since the expanded form of \eqref{form:F1} begins with the term $1$, and each zero exponent contributes a further $e^0 = 1$.
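
The bookkeeping in this construction can be illustrated with a toy polynomial. This is a hypothetical example only: its roots do not include $i \pi$, so the product below is not zero, and it serves merely to show how $\theta(x)$, $r$ and $k$ arise. Take $\theta_1(x) = x^2 + 1$, with roots $\alpha_1 = i$ and $\alpha_2 = -i$, so that $m = 2$:
\begin{align}
(1 + e^{\alpha_1}) (1 + e^{\alpha_2}) &= 1 + e^{i} + e^{-i} + e^{0} \nonumber \\
S_2 &= (\alpha_1 + \alpha_2) \; = \; (0), \qquad \theta_2(x) \; = \; x, \qquad \theta_1(x) \theta_2(x) \; = \; x (x^2 + 1). \nonumber
\end{align}
Deleting the zero root leaves $\theta(x) = x^2 + 1$, so that $r = 2$, $(\beta_1, \beta_2) = (i, -i)$ and $c_0 = c_r = 1$. The expansion then reads $e^{\beta_1} + e^{\beta_2} + k$ with $k = 2$: one unit from the leading term $1$ and one from the zero exponent $e^0$.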

Hereafter let $c = c_0$ be the high-order coefficient in \eqref{form:F3}. Define
\begin{align}
f(x) &= \frac{c^s x^{p - 1} [\theta(x)]^p}{(p - 1)!}, \tag{17} \label{form:F5}
\end{align}
where the integer $s = r p - 1$, and $p$ is a prime to be chosen below, and define
\begin{align}
F(x) &= f(x) + D(f(x)) + D^2(f(x)) + \cdots + D^{s+p}(f(x)), \tag{18} \label{form:F6}
\end{align}
where $D$ denotes differentiation by $x$. Note that $F(x) - D(F(x)) = f(x)$, since $f(x)$ has degree $s + p$ and hence $D^{s+p+1}(f(x)) = 0$; therefore $D(e^{-x} F(x)) = - e^{-x} f(x)$. Thus
\begin{align}
e^{-x} F(x) - F(0) &= - \int_0^x e^{-y} f(y) \, {\rm d}y, \tag{19}
\end{align}
which, after substituting $y = \lambda x$ and rearranging, becomes
\begin{align}
F(x) - e^x F(0) &= -x \int_0^1 e^{(1 - \lambda) x} f(\lambda x) \, {\rm d} \lambda. \tag{20}
\end{align}
Recall that $\sum_{1 \leq j \leq r} e^{\beta_j} + k = 0$ from \eqref{form:F4} above. Thus by substituting $x = \beta_j$ and summing over $j$, we can write
\begin{align}
\sum_{j=1}^r F(\beta_j) + k F(0) = - \sum_{j=1}^r \beta_j \int_0^1 e^{(1 - \lambda) \beta_j} f(\lambda \beta_j) \, {\rm d}\lambda. \tag{21} \label{form:F7}
\end{align}

The result \eqref{form:F7} leads to a contradiction, because it will be shown below that for suitably large prime $p$, the left-hand expression is a nonzero integer, whereas the right-hand expression is arbitrarily small. First, note that by \eqref{form:F5},
\begin{align}
\sum_{j=1}^r D^t(f(x))|_{x=\beta_j} = 0 \quad {\rm for} \quad 0 \leq t < p, \tag{22} \label{form:F8}
\end{align}
since each $\beta_j$ is a root of $f(x)$ of multiplicity at least $p$. Further, the polynomial $g(x) = (p - 1)! f(x)$ has integer coefficients. Since the product of $p$ consecutive integers is always divisible by $p!$, the $p$-th and higher-order derivatives of $g(x)$ are polynomials in $x$ with integer coefficients divisible by $p!$. Thus the $p$-th and higher-order derivatives of $f(x)$ are polynomials with integer coefficients divisible by $p$. It is also clear that each of these coefficients is divisible by $c^s$.
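
The divisibility claim rests on the following identity for a single monomial $x^n$ with $n \geq p$ (for $n < p$ the derivative is simply zero):
\begin{align}
D^p(x^n) &= n (n - 1) \cdots (n - p + 1) \, x^{n-p} \; = \; p! \binom{n}{p} x^{n-p}. \nonumber
\end{align}
For instance, with the illustrative values $p = 3$ and $n = 5$, one has $D^3(x^5) = 5 \cdot 4 \cdot 3 \, x^2 = 60 x^2 = 3! \cdot 10 \, x^2$. Dividing by $(p - 1)! = 2$ leaves $30 x^2$, whose coefficient is still divisible by $p = 3$.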

Thus for $t \geq p$, it follows that $D^t(f(x))|_{x=\beta_j}$ is a polynomial in $\beta_j$ with degree at most $s$, and with coefficients divisible by $p c^s$. By \eqref{form:F3}, any symmetric function of $(\beta_1, \beta_2, \cdots, \beta_r)$ with integer coefficients and degree at most $s$ evaluates to an integer, provided that each coefficient is divisible by $c^s$, which it is. Thus
\begin{align}
\sum_{j=1}^r D^t(f(x))|_{x=\beta_j} &= p k_t \quad {\rm for} \quad p \leq t \leq p + s, \tag{23} \label{form:F9}
\end{align}
where $k_t$ are integers. Combining \eqref{form:F8} and \eqref{form:F9} and summing over $t$, we obtain
\begin{align}
\sum_{j=1}^r F(\beta_j) &= p \sum_{t=p}^{p+s} k_t. \tag{24}
\end{align}
From \eqref{form:F5} and \eqref{form:F6}, it follows that
\begin{align}
D^t(f(x))|_{x=0} &= \left\{ \begin{array}{lll} 0 & {\rm for} & 0 \leq t \leq p - 2 \\ c^s c_r^p & {\rm for} & t = p - 1 \\ p q_t & {\rm for} & p \leq t \leq p + s, \\ \end{array} \right. \tag{25}
\end{align}
for some integers $q_t$, so $F(0) = p q + c^s c_r^p$, for some integer $q$. Thus the left-hand side of \eqref{form:F7} is $p q' + k c^s c_r^p$, for some integer $q'$. Recall that $k, c, c_r \neq 0$. Thus $k c^s c_r^p$ is not divisible by $p$ provided one chooses the prime $p > \max(k, |c|, |c_r|)$. This means that the left-hand side of \eqref{form:F7} is an integer not divisible by $p$, and hence is a nonzero integer. But the right-hand side of \eqref{form:F7} can be written
\begin{align}
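- \sum_{j=1}^r \int_0^1 \frac{\lambda^{p-1} \left[c^r \beta_j \, \theta(\lambda \beta_j)\right]^p e^{(1-\lambda)\beta_j}} {c (p - 1)!} \, {\rm d}\lambda, \tag{26}
\end{align}
which is arbitrarily small for all sufficiently large $p$: if $M$ is an upper bound for $|c^r \beta_j \, \theta(\lambda \beta_j)|$ and $E$ is an upper bound for $|e^{(1-\lambda)\beta_j}|$ over all $j$ and all $0 \leq \lambda \leq 1$, then the absolute value of this expression is at most $r E M^p / (|c| (p - 1)!)$, which tends to zero as $p$ increases. Since a nonzero integer has absolute value at least one, this is a contradiction, which concludes the proof.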
