Simple proofs: Archimedes’ calculation of pi

**Introduction**

Archimedes is widely regarded as the greatest mathematician of antiquity. He was a pioneer of applied mathematics, for instance with his discovery of the principle of buoyancy, and a master of engineering designs, for instance with his “screw” to raise water from one level to another. But his most far-reaching discovery was the “method of exhaustion,” which he used to deduce the area of a circle, the surface area and volume of a sphere and the area under a parabola. Indeed, with this method Archimedes anticipated, by nearly 2000 years, the development of calculus in the 17th century by Leibniz and Newton. For additional details, see the Wikipedia article.

In this article, we present Archimedes’ ingenious method to calculate the perimeter and area of a circle, while taking advantage of a much more facile system of notation (algebra), a much more facile system of calculation (decimal arithmetic and computer technology), and a much better-developed framework for rigorous mathematical proof. For a step-by-step presentation of Archimedes’ actual computation, see this article by Chuck Lindsey.

One motivation for presenting this material is that a surprisingly large fraction of treatments the present author has seen are either incomplete, deficient in rigor or assume some concept or technique (such as radian measure or other facts about trig functions) that presupposes properties of $\pi$. This presentation aims to avoid such missteps.

**The Pi denial movement**

But another motivation is to counter the rise of what might sadly be termed the “$\pi$ denial movement”: the growing numbers of writers who reject basic mathematical theory and the numerical value of $\pi$, proclaiming instead that they have found $\pi$ to be some other value. For example, one author, in a supposedly peer-reviewed (!) article, asserts that $\pi = 17 – 8 \sqrt{3} = 3.1435935394\ldots$. Another author, in another supposedly peer-reviewed (!) article, asserts that $\pi = (14 – \sqrt{2}) / 4 = 3.1464466094\ldots$. A third person promises to reveal an “exact” value of $\pi$, differing significantly from the accepted value, on 13 March 2019. For other examples, see this Math Scholar blog. The well-known fact that $\pi$ cannot possibly be algebraic or any other variant value does not seem to impress these writers.

Thus this material attempts to demonstrate, as simply and concisely as possible, why these many instances of $\pi$ denial are utterly indefensible. To that end, the material below requires no mathematical background beyond very basic algebra, trigonometry and the Pythagorean theorem, and scrupulously avoids advanced analysis or any reasoning that depends on properties of $\pi$. Along this line, traditional degree notation is used for angles instead of the radian measure customary in professional research work, both to make the presentation easier to follow and to avoid any concepts or techniques that might be viewed as dependent on $\pi$.

We start by establishing some basic identities, deriving them from first principles, in other words assuming only the definitions of the trigonometric functions sine, cosine and tangent, together with the Pythagorean theorem. Readers who are familiar with the proofs of these identities may skip to the next section.

**LEMMA 1 (Double-angle and half-angle formulas)**: The double angle formulas are $\sin(2\alpha) = 2 \cos(\alpha) \sin(\alpha), \; \cos(2\alpha) = 1 – 2 \sin^2(\alpha)$ and $\tan(2\alpha) = 2 \tan(\alpha) / (1 – \tan^2(\alpha))$. The corresponding half-angle formulas are $$\sin(\alpha/2) = \sqrt{(1 – \cos(\alpha))/2}, \;\; \cos(\alpha/2) = \sqrt{(1 + \cos(\alpha))/2}, \;\; \tan(\alpha/2) = \frac{\sin(\alpha)}{1 + \cos(\alpha)} = \frac{\tan(\alpha)\sin(\alpha)}{\tan(\alpha) + \sin(\alpha)},$$ however note that the first two of these are valid only for $0 \le \alpha \leq 180^\circ$, because of the ambiguity of the sign when taking a square root.

**Proof**: We first establish some more general results: $$\sin (\alpha + \beta) = \sin (\alpha) \cos (\beta) + \cos (\alpha) \sin (\beta),$$ $$\cos (\alpha + \beta) = \cos (\alpha) \cos (\beta) – \sin (\alpha) \sin (\beta),$$ $$\tan(\alpha + \beta) = \frac{\tan(\alpha) + \tan(\beta)}{1 – \tan(\alpha)\tan(\beta)}.$$ The formula for $\sin(\alpha + \beta)$ has a simple geometric proof, based only on the Pythagorean formula and simple rules of right triangles, which is illustrated to the right (here $OP = 1$). First note that $RPQ = \alpha, \, PQ = \sin(\beta)$ and $OQ = \cos (\beta)$. Further, $AQ/OQ = \sin(\alpha)$, so $AQ = \sin(\alpha) \cos(\beta)$, and $PR/PQ = \cos(\alpha)$, so $PR = \cos(\alpha) \sin(\beta)$. Combining these results, $$\sin(\alpha + \beta) = PB = RB + PR = AQ + PR = \sin(\alpha) \cos(\beta) + \cos(\alpha) \sin(\beta).$$ The proof of the formula for the cosine of the sum of two angles is entirely similar, and the formula for $\tan(\alpha + \beta)$ is obtained by dividing the formula for $\sin(\alpha + \beta)$ by the formula for $\cos(\alpha + \beta)$, followed by some simple algebra. See this Wikipedia article, from which the above illustration and proof were taken, for additional details.

Setting $\alpha = \beta$ in the above angle-sum formulas yields $\sin(2\alpha) = 2 \cos(\alpha) \sin(\alpha), \, \cos(2\alpha) = \cos^2(\alpha) - \sin^2(\alpha) = 1 - 2 \sin^2(\alpha)$, and $\tan(2\alpha) = 2 \tan(\alpha) / (1 - \tan^2(\alpha))$. The half-angle formulas can then easily be derived by simple algebra. For example, from $\cos(\alpha) = 1 - 2 \sin^2(\alpha/2)$ we can write $2 \sin^2(\alpha/2) = 1 - \cos(\alpha)$, from which we deduce $\sin(\alpha/2) = \sqrt{(1 - \cos(\alpha))/2}$ (however, as noted before, this formula is only valid for $0 \leq \alpha \leq 180^\circ$, because of the ambiguity in the sign when taking a square root). Similarly, $\tan(\alpha/2) = \sin(\alpha/2)/\cos(\alpha/2) = 2 \sin(\alpha/2)\cos(\alpha/2)/(2\cos^2(\alpha/2)) = \sin(\alpha)/(1 + \cos(\alpha))$, and multiplying numerator and denominator by $\tan(\alpha)$ gives $\tan(\alpha)\sin(\alpha)/(\tan(\alpha) + \sin(\alpha))$.
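As a quick numerical sanity check of the half-angle formulas in Lemma 1, the following minimal sketch compares them with direct evaluation at a few angles. The function names are illustrative only. The library trigonometric functions expect radians, so the conversion below is purely a programming convenience and plays no part in the proofs.

```python
import math

def half_angle_values(alpha_deg):
    """Half-angle values computed only from sin, cos and tan of the full angle."""
    alpha = math.radians(alpha_deg)  # library functions expect radians
    s, c, t = math.sin(alpha), math.cos(alpha), math.tan(alpha)
    return {
        "sin": math.sqrt((1 - c) / 2),
        "cos": math.sqrt((1 + c) / 2),
        "tan": s / (1 + c),
        "tan_alt": t * s / (t + s),
    }

def direct_values(alpha_deg):
    """sin, cos and tan of alpha/2 evaluated directly, for comparison."""
    half = math.radians(alpha_deg / 2)
    return {"sin": math.sin(half), "cos": math.cos(half), "tan": math.tan(half)}

for alpha_deg in (30, 60, 120):
    print(alpha_deg, half_angle_values(alpha_deg), direct_values(alpha_deg))
```

The angles are kept within $0 \le \alpha \le 180^\circ$, the range for which the square-root forms are stated, and $90^\circ$ is avoided because the tangent is undefined there.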

**Archimedes’ algorithm for approximating Pi**

With this background, we are now able to present Archimedes’ algorithm for approximating $\pi$. Consider the case of a circle with radius one (see diagram). We see that each side of a regular inscribed hexagon has length one, and thus, of course, each half-side has length one-half. This is reflected in the formula $\sin(30^\circ) = 1/2$, a formula which in effect is proven by this diagram. Note that by applying the identity $\cos^2(\alpha) = 1 – \sin^2(\alpha)$, we obtain $\cos(30^\circ) = \sqrt{3}/2 = 0.866025\ldots$, and also that $\tan(30^\circ) = \sin(30^\circ)/\cos(30^\circ) = \sqrt{3}/3 = 0.577350\ldots$.

Let $a_1$ be the semi-perimeter of the regular circumscribed hexagon of a circle with radius one, and let $b_1$ denote the semi-perimeter of the regular inscribed hexagon. By examining the figure, we see each of the six equilateral triangles in the circumscribed hexagon has base $= 2 \tan{30^\circ} = 2 \sqrt{3}/3$. Thus $a_1 = 6 \tan(30^\circ) = 2\sqrt{3} = 3.464101\ldots$. Each of the six equilateral triangles in the inscribed hexagon has base $= 2 \sin(30^\circ) = 1$, so that $b_1 = 6 \sin(30^\circ) = 3$. In a similar fashion, let $c_1$ be the area of the regular circumscribed hexagon of a circle with radius one, and let $d_1$ denote the area of the regular inscribed hexagon. Since the altitude of each section of the circumscribed hexagon is one, $c_1 = a_1 = 2\sqrt{3} = 3.464101\ldots$. Since the altitude of each section of the inscribed hexagon is $\cos(30^\circ)$, $d_1 = 6 \sin(30^\circ) \cos(30^\circ) = 2.598076\ldots$.

Now consider a $12$-sided regular circumscribed polygon of a circle with radius one, and a $12$-sided regular inscribed polygon. Their semi-perimeters will be denoted $a_2$ and $b_2$, respectively, and their full areas will be denoted $c_2$ and $d_2$, respectively. The angles are halved, but the number of sides is doubled. Thus $a_2 = 12 \tan(15^\circ), \, b_2 = 12 \sin(15^\circ), \, c_2 = a_2 = 12 \tan(15^\circ)$ and $d_2 = 12 \sin(15^\circ) \cos(15^\circ)$, the latter of which, by applying the double angle formula for sine from Lemma 1, can be written as $d_2 = 6 \sin(30^\circ) = b_1$. Applying the half-angle formulas from Lemma 1, we obtain $a_2 = 12 (2 – \sqrt{3}) = 3.215390\ldots, \; b_2 = 3 (\sqrt{6} – \sqrt{2}) = 3.105828\ldots, \; c_2 = a_2 = 3.215390\ldots$ and $d_2 = b_1 = 3$.

In general, after $k$ steps of doubling, denote the semi-perimeters of the regular circumscribed and inscribed polygons for a circle of radius one with $3 \cdot 2^k$ sides as $a_k$ and $b_k$, respectively, and denote the full areas as $c_k$ and $d_k$, respectively. As before, because the altitudes of the triangles in the circumscribed polygons always have length one, $c_k = a_k$ for each $k$. Also, as before, after applying the double-angle identity for sine from Lemma 1, we can write $d_k = 3 \cdot 2^k \sin(60^\circ/2^k) \cos(60^\circ/2^k) = 3 \cdot 2^{k-1} \sin(60^\circ/2^{k-1}) = b_{k-1}$. In summary, let $\theta_k = 60^\circ/2^k$. Then $$a_k = 3 \cdot 2^k \tan(\theta_k), \; b_k = 3 \cdot 2^k \sin(\theta_k), \; c_k = a_k, \; d_k = b_{k-1}.$$
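The closed-form expressions for $a_k, b_k, c_k, d_k$ can be evaluated directly as a cross-check. This is only a sketch: `polygon_values` is a hypothetical helper name, ordinary double precision is used, and the conversion to radians is needed only because the library functions require it.

```python
import math

def polygon_values(k):
    """Return (a_k, b_k, c_k, d_k) for regular polygons with 3 * 2**k sides
    circumscribed about and inscribed in a circle of radius one."""
    n = 3 * 2**k
    theta = math.radians(60.0 / 2**k)        # theta_k = 60 degrees / 2^k
    a = n * math.tan(theta)                  # circumscribed semi-perimeter
    b = n * math.sin(theta)                  # inscribed semi-perimeter
    c = n * math.tan(theta)                  # circumscribed area (altitude is 1)
    d = n * math.sin(theta) * math.cos(theta)  # inscribed area
    return a, b, c, d

for k in range(1, 6):
    print(k, 3 * 2**k, polygon_values(k))
```

Comparing the last entry for index $k$ with the second entry for index $k-1$ illustrates the relation $d_k = b_{k-1}$.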

**THEOREM 1 (The Archimedean iteration for Pi)**: Define the sequences of real numbers $A_k, \, B_k$ by the following: $A_1 = 2 \sqrt{3}, \, B_1 = 3$. Then, for $k \ge 1$, set $$A_{k+1} = \frac{2 A_k B_k}{A_k + B_k}, \quad B_{k+1} = \sqrt{A_{k+1} B_k}.$$ Then for all $k \ge 1$, we have $A_k = a_k$ and $B_k = b_k$, as given by the formulas above.

**Proof**: $A_1 = a_1$ and $B_1 = b_1$, so the result is true for $k = 1$. By induction, assume the result is true up to some $k$. Then we can write, recalling the formula $\tan(\alpha/2) = \tan(\alpha)\sin(\alpha)/(\tan(\alpha) + \sin(\alpha))$ from Lemma 1, $$A_{k+1} = \frac{2 A_k B_k}{A_k + B_k} = \frac{2 \cdot 3 \cdot 2^k \tan(\theta_k) \cdot 3 \cdot 2^k \sin(\theta_k)}{3 \cdot 2^k \tan(\theta_k) + 3 \cdot 2^k \sin(\theta_k)} = 3 \cdot 2^{k+1} \tan(\theta_k/2) = 3 \cdot 2^{k+1} \tan(\theta_{k+1}) = a_{k+1}.$$ Similarly, recalling the identity $\sin(2\alpha) = 2 \sin(\alpha) \cos(\alpha)$ from Lemma 1, we can write $$B_{k+1} = \sqrt{A_{k+1} B_k} = \sqrt{9 \cdot 2^{2k+1} \tan(\theta_{k+1}) \sin(\theta_k)} = \sqrt{9 \cdot 2^{2k+2} \tan(\theta_{k+1}) \sin(\theta_{k+1}) \cos(\theta_{k+1})},$$ $$ = \sqrt{9 \cdot 2^{2k+2} \sin^2(\theta_{k+1})} = 3 \cdot 2^{k+1} \sin(\theta_{k+1}) = b_{k+1}.$$
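The iteration of Theorem 1 needs only addition, multiplication, division and square roots, so it is easy to run. The following is a minimal sketch using Python's standard `decimal` module; the function name `archimedes` and the default of 50 working digits are choices made here for illustration.

```python
from decimal import Decimal, getcontext

def archimedes(steps, digits=50):
    """Return rows (k, sides, A_k, B_k) for k = 1..steps using Theorem 1."""
    getcontext().prec = digits
    a = 2 * Decimal(3).sqrt()   # A_1 = 2 * sqrt(3)
    b = Decimal(3)              # B_1 = 3
    rows = [(1, 6, a, b)]
    for k in range(2, steps + 1):
        a = 2 * a * b / (a + b)   # A_{k+1}: harmonic mean of A_k and B_k
        b = (a * b).sqrt()        # B_{k+1}: geometric mean of A_{k+1} and B_k
        rows.append((k, 3 * 2**k, a, b))
    return rows

for k, sides, a, b in archimedes(20):
    print(f"{k:2d} {sides:8d} {a:.16f} {b:.16f}")
```

Note that the update for $B_{k+1}$ uses the freshly computed $A_{k+1}$, not $A_k$, which is why the order of the two assignments matters.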

**Computations using the Archimedean iteration**

We are now able to directly compute some approximations to $\pi$, using only the formulas of Theorem 1. These results are shown in the table to 16 digits after the decimal point, but the computations were performed using 50-digit precision arithmetic to rule out any possibility of numerical round-off error corrupting the table results.

| Iteration $k$ | Sides $3 \cdot 2^k$ | Circumscribed semi-perimeter $a_k$ | Inscribed semi-perimeter $b_k$ | Circumscribed area $c_k$ | Inscribed area $d_k$ |
|---|---|---|---|---|---|
| 1 | 6 | 3.4641016151377545 | 3.0000000000000000 | 3.4641016151377545 | 2.5980762113533159 |
| 2 | 12 | 3.2153903091734724 | 3.1058285412302491 | 3.2153903091734724 | 3.0000000000000000 |
| 3 | 24 | 3.1596599420975004 | 3.1326286132812381 | 3.1596599420975004 | 3.1058285412302491 |
| 4 | 48 | 3.1460862151314349 | 3.1393502030468672 | 3.1460862151314349 | 3.1326286132812381 |
| 5 | 96 | 3.1427145996453682 | 3.1410319508905096 | 3.1427145996453682 | 3.1393502030468672 |
| 6 | 192 | 3.1418730499798238 | 3.1414524722854620 | 3.1418730499798238 | 3.1410319508905096 |
| 7 | 384 | 3.1416627470568485 | 3.1415576079118576 | 3.1416627470568485 | 3.1414524722854620 |
| 8 | 768 | 3.1416101766046895 | 3.1415838921483184 | 3.1416101766046895 | 3.1415576079118576 |
| 9 | 1536 | 3.1415970343215261 | 3.1415904632280500 | 3.1415970343215261 | 3.1415838921483184 |
| 10 | 3072 | 3.1415937487713520 | 3.1415921059992715 | 3.1415937487713520 | 3.1415904632280500 |
| 11 | 6144 | 3.1415929273850970 | 3.1415925166921574 | 3.1415929273850970 | 3.1415921059992715 |
| 12 | 12288 | 3.1415927220386138 | 3.1415926193653839 | 3.1415927220386138 | 3.1415925166921574 |
| 13 | 24576 | 3.1415926707019980 | 3.1415926450336908 | 3.1415926707019980 | 3.1415926193653839 |
| 14 | 49152 | 3.1415926578678444 | 3.1415926514507676 | 3.1415926578678444 | 3.1415926450336908 |
| 15 | 98304 | 3.1415926546593060 | 3.1415926530550368 | 3.1415926546593060 | 3.1415926514507676 |
| 16 | 196608 | 3.1415926538571714 | 3.1415926534561041 | 3.1415926538571714 | 3.1415926530550368 |
| 17 | 393216 | 3.1415926536566377 | 3.1415926535563709 | 3.1415926536566377 | 3.1415926534561041 |
| 18 | 786432 | 3.1415926536065043 | 3.1415926535814376 | 3.1415926536065043 | 3.1415926535563709 |
| 19 | 1572864 | 3.1415926535939710 | 3.1415926535877043 | 3.1415926535939710 | 3.1415926535814376 |
| 20 | 3145728 | 3.1415926535908376 | 3.1415926535892710 | 3.1415926535908376 | 3.1415926535877043 |

As can be easily seen, each of these columns converges quickly to the well-known value of $\pi$. In the final row of the table, which presents results for circumscribed and inscribed polygons with 3,145,728 sides, all four entries agree to ten digits after the decimal point: $3.1415926535\ldots$. Note, by the way, that *both of the two variant values of $\pi$ mentioned in the Pi denial section above are excluded by iteration five*: the value $3.1464466094\ldots$ already exceeds the circumscribed semi-perimeter $a_4 = 3.1460862151\ldots$, and the value $3.1435935394\ldots$ exceeds $a_5 = 3.1427145996\ldots$. There is no escaping these calculations: the variant values for $\pi$ are simply wrong.
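The exclusion of the two variant values can be checked mechanically. The sketch below, with the hypothetical helper name `first_excluding_iteration`, runs the iteration of Theorem 1 in ordinary double precision (ample for the first few iterations) and reports the first index $k$ at which a candidate value falls outside the interval $[b_k, a_k]$.

```python
import math

def first_excluding_iteration(candidate, max_iter=10):
    """Return the first k with candidate outside [b_k, a_k], or None if
    the candidate is still inside the interval after max_iter iterations."""
    a, b = 2 * math.sqrt(3), 3.0   # a_1 and b_1
    for k in range(1, max_iter + 1):
        if candidate > a or candidate < b:
            return k
        a = 2 * a * b / (a + b)    # a_{k+1}
        b = math.sqrt(a * b)       # b_{k+1}
    return None

variant_1 = 17 - 8 * math.sqrt(3)       # 3.1435935394...
variant_2 = (14 - math.sqrt(2)) / 4     # 3.1464466094...
print(first_excluding_iteration(variant_1))
print(first_excluding_iteration(variant_2))
```

With these inputs the first value is rejected at iteration five and the second at iteration four, in both cases because the candidate exceeds the circumscribed semi-perimeter.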

We will now rigorously prove that the Archimedean iteration converges to $\pi$ in both the circumference and area senses, again relying only on first-principles reasoning.

**THEOREM 2 (Pi as the limit of circumscribed and inscribed polygons)**:

**Theorem 2a**: As the index $k$ increases, the limit of semi-perimeters of circumscribed and inscribed regular polygons with $3 \cdot 2^k$ sides, for a circle of radius one, is a common value, which we may define as $\pi$.

**Theorem 2b**: As the index $k$ increases, the limit of areas of circumscribed and inscribed regular polygons with $3 \cdot 2^k$ sides, for a circle of radius one, is a common value, which value is exactly equal to $\pi$ as defined in Theorem 2a.

**Proof**: Recall that $$a_k = 3 \cdot 2^k \tan(\theta_k), \; b_k = 3 \cdot 2^k \sin(\theta_k), \; c_k = a_k, \; d_k = b_{k-1}.$$ First note that since all $\theta_k \gt 0$, all $\cos(\theta_k) \lt 1$ or, in other words, $1 - \cos(\theta_k) \gt 0$. Then we can write $$a_{k} - a_{k+1} = 3 \cdot 2^k \tan(\theta_k) - 3 \cdot 2^{k+1} \tan(\theta_{k+1}) = 3 \cdot 2^k \left(\tan(\theta_k) - \frac{2 \sin(\theta_k)}{1 + \cos(\theta_k)}\right) = \frac{3 \cdot 2^k \tan(\theta_k) (1 - \cos(\theta_k))}{1 + \cos(\theta_k)} \gt 0, $$ $$b_{k+1} - b_k = 3 \cdot 2^{k+1} \sin(\theta_{k+1}) - 3 \cdot 2^k \sin(\theta_k) = 3 \cdot 2^{k+1} (\sin(\theta_{k+1}) - \sin(\theta_{k+1}) \cos(\theta_{k+1})) = 3 \cdot 2^{k+1} \sin(\theta_{k+1})(1 - \cos(\theta_{k+1})) \gt 0,$$ $$a_k - b_k = 3 \cdot 2^k (\tan(\theta_k) - \sin(\theta_k)) = 3 \cdot 2^k \tan(\theta_k) (1 - \cos(\theta_k)) \gt 0.$$ Thus $a_k$ is a strictly decreasing sequence, $b_k$ is a strictly increasing sequence, and each $a_k \gt b_k$. If $k \le m$, then $a_k \ge a_m \gt b_m$, so $a_k \gt b_m$; if instead $k \gt m$, then $a_k \gt b_k \gt b_m$. Thus every $a_k$ is strictly greater than every $b_m$. In particular, since $a_1 = 2 \sqrt{3} \lt 4$, this means that all $a_k \lt 4$ and thus all $b_k \lt 4$. Similarly, since $b_1 = 3$, all $b_k \ge 3$ and thus all $a_k \gt 3$. Also, since $\theta_1 = 30^\circ$ and all $\theta_k$ for $k \gt 1$ are smaller than $\theta_1$, this means that $\cos(\theta_k) \gt 1/2$ for all $k$.
Now we can write, starting from the expression a few lines above for $a_k - b_k$, $$a_k - b_k = 3 \cdot 2^k \tan(\theta_k) (1 - \cos(\theta_k)) = \frac{3 \cdot 2^k \tan(\theta_k) \sin^2(\theta_k)}{1 + \cos(\theta_k)} \le 3 \cdot 2^k \tan(\theta_k) \sin^2(\theta_k)$$ $$= \frac{3 \cdot 2^k \sin^3(\theta_k)}{\cos(\theta_k)} \le 2 \cdot 3 \cdot 2^k \sin^3(\theta_k) = \frac{2 (3 \cdot 2^{k})^3 \sin^3(\theta_k)}{(3 \cdot 2^{k})^2} = \frac{2 b_k^3}{9 \cdot 4^k} \le \frac{128}{9 \cdot 4^k},$$ so that the difference between the circumscribed and inscribed semi-perimeters decreases by roughly a factor of four with each iteration (as is also seen in the table above).
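The bound $a_k - b_k \le 128/(9 \cdot 4^k)$ and the factor-of-four behavior can be observed numerically. The sketch below reuses the iteration of Theorem 1 with Python's `decimal` module. The helper name `gaps_and_bounds` is illustrative, and 50 working digits are assumed so that the small differences are not lost to round-off.

```python
from decimal import Decimal, getcontext

def gaps_and_bounds(steps, digits=50):
    """Return rows (k, a_k - b_k, 128 / (9 * 4**k)) for k = 1..steps."""
    getcontext().prec = digits
    a, b = 2 * Decimal(3).sqrt(), Decimal(3)   # a_1 and b_1
    rows = []
    for k in range(1, steps + 1):
        rows.append((k, a - b, Decimal(128) / (9 * Decimal(4) ** k)))
        a = 2 * a * b / (a + b)   # a_{k+1}
        b = (a * b).sqrt()        # b_{k+1}
    return rows

rows = gaps_and_bounds(20)
for (k, gap, bound), (_, next_gap, _) in zip(rows, rows[1:]):
    print(f"{k:2d} gap={gap:.3E} bound={bound:.3E} ratio={gap / next_gap:.4f}")
```

The bound is far from tight, since it was derived using only the crude estimates $\cos(\theta_k) \gt 1/2$ and $b_k \lt 4$, but it is all that the convergence argument requires.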

A fundamental axiom of the real numbers is “Every sequence that is bounded above has a least upper bound,” and, equivalently, “Every sequence that is bounded below has a greatest lower bound.” Recall from the above that all $b_k \lt 4$, so that $(b_k)$ are bounded above, and all $a_k \ge 3$, so that $(a_k)$ are bounded below. Since for any $\epsilon \gt 0$ and all sufficiently large $k$, $a_k – b_k \lt \epsilon$, it follows that the greatest lower bound of the circumscribed semi-perimeters $a_k$ is exactly equal to the least upper bound of the inscribed semi-perimeters $b_k$, so that the common limit can be defined as $\pi$.

For Theorem 2b, the difference between the circumscribed and inscribed areas is $$c_k – d_k = 3 \cdot 2^k (\tan(\theta_k) – \sin(\theta_k)\cos(\theta_k)) = 3 \cdot 2^k \left(\frac{\sin(\theta_k)}{\cos(\theta_k)} – \sin(\theta_k) \cos(\theta_k)\right) $$ $$= \frac{3 \cdot 2^k \sin(\theta_k) (1 – \cos^2(\theta_k))}{\cos(\theta_k)} = \frac{3 \cdot 2^k \sin^3(\theta_k)}{\cos(\theta_k)} \le \frac{128}{9 \cdot 4^k},$$ since the final inequality was established a few lines above. As before, it follows that the greatest lower bound of the circumscribed areas $c_k$ is exactly equal to the least upper bound of the inscribed areas $d_k$. Furthermore, since the sequence $a_k$ of semi-perimeters of the circumscribed polygons is *exactly the same* as the sequence $c_k$ of areas of the circumscribed polygons, we conclude that the common limit of the areas is identical to the common limit of the semi-perimeters, namely $\pi$. This completes the proof.

**Other formulas and algorithms for Pi**

We note in conclusion that Archimedes’ scheme is just one of many formulas and algorithms for $\pi$. The present author has produced a collection of approximately 80 such formulas and algorithms. One, for instance, is the Borwein quartic algorithm: Set $a_0 = 6 - 4\sqrt{2}$ and $y_0 = \sqrt{2} - 1$. Iterate, for $k \ge 0$, $$y_{k+1} = \frac{1 - (1 - y_k^4)^{1/4}}{1 + (1 - y_k^4)^{1/4}},$$ $$a_{k+1} = a_k (1 + y_{k+1})^4 - 2^{2k+3} y_{k+1} (1 + y_{k+1} + y_{k+1}^2).$$ Then $1/a_k$ converges quartically to $\pi$: each iteration approximately *quadruples* the number of correct digits. Just three iterations yield 171 correct digits, which are as follows: $$3.14159265358979323846264338327950288419716939937510582097494459230781640628620899862803482$$ $$534211706798214808651328230664709384460955058223172535940812848111745028410270193\ldots$$
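For comparison with the Archimedean iteration, here is a minimal sketch of the Borwein quartic algorithm using Python's `decimal` module. It uses the standard form of the update, in which the subtracted term carries the factor $y_{k+1}$. The function name, the default precision and the 20 guard digits are assumptions made for this illustration.

```python
from decimal import Decimal, getcontext

def borwein_quartic(iterations, digits=200):
    """Return 1/a_k after the given number of Borwein quartic iterations."""
    getcontext().prec = digits + 20          # guard digits against round-off
    sqrt2 = Decimal(2).sqrt()
    a = 6 - 4 * sqrt2                        # a_0
    y = sqrt2 - 1                            # y_0
    for k in range(iterations):
        r = (1 - y**4).sqrt().sqrt()         # fourth root of 1 - y_k^4
        y = (1 - r) / (1 + r)                # y_{k+1}
        a = a * (1 + y) ** 4 - Decimal(2) ** (2 * k + 3) * y * (1 + y + y * y)
    return 1 / a

print(borwein_quartic(3))
```

The fourth root is taken as two successive square roots because `Decimal` provides an exact-rounding `sqrt` method, which keeps the sketch within the standard library.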

**Other posts in the “Simple proofs” series**

The other posts in the “Simple proofs of great theorems” series are available here.

Simple proofs: The fundamental theorem of calculus

**Introduction:**

The fundamental theorem of calculus, namely the fact that integration is the inverse of differentiation, is indisputably one of the most important results of all mathematics, with applications across the whole of modern science and engineering. It is not an exaggeration to say that our entire modern world hinges on the fundamental theorem of calculus. It has applications in astronomy, astrophysics, quantum theory, relativity, geology, biology, economics, just to name a few fields of science, as well as countless applications in all types of engineering — civil, mechanical, electrical, electronic, chip manufacture, aerospace, medical and more.

The fundamental theorem of calculus was first stated and proved in rudimentary form in the 1600s by James Gregory, and, in improved form, by Isaac Barrow, while Gottfried Leibniz coined the notation and theoretical framework that we still use today.

But it was Isaac Newton who grasped the full impact of the theorem and applied it to unravel both the cosmos and the everyday world. In particular, Newton’s second law of motion states that force is the product of mass and acceleration, where acceleration is the second derivative of position with respect to time. The second law can then be solved using the fundamental theorem of calculus to predict motion and much else, once the basic underlying forces are known. Ernst Mach, writing in 1901, graciously acknowledged Newton’s contributions to science in general, and to calculus in particular, in these terms:

All that has been accomplished in mechanics since his day has been a deductive, formal, and mathematical development of mechanics on the basis of Newton’s laws.

We present here a *rigorous* and *self-contained* proof of the fundamental theorem of calculus (Parts 1 and 2), including proofs of necessary underlying lemmas such as the fact that a continuous function on a closed interval is integrable. These proofs are based only on elementary algebra and some basic completeness axioms of real numbers, and thus are suitable for anyone with a high school background in mathematics, although some familiarity with limits, inequalities, derivatives and integrals is required. We believe this exposition to be significantly more concise than most textbook treatments, and thus easier to grasp.

**Definitions: Continuous functions, derivatives and integrals.**

A function $f(t)$ defined on a closed interval $[a,b]$ is *continuous* at a point $t \in [a,b]$ if, given any $\epsilon \gt 0$, there is a $\delta \gt 0$ such that $|f(s) – f(t)| \lt \epsilon$ for all $s \in [a,b]$ with $|s – t| \lt \delta$. The function $f(t)$ is *uniformly continuous* on the closed interval $[a,b]$ if, given any $\epsilon \gt 0$, there exists a $\delta \gt 0$ (independent of $t$) such that $|f(s) – f(t)| \lt \epsilon$ for all $s, t \in [a,b]$ with $|s – t| \lt \delta$.

Given a continuous function $f(t)$ on $[a,b]$, the *derivative* $f'(t)$ is defined for $t \in (a,b)$ as the limit, if it exists,

$$f'(t) = \lim_{h \to 0} \frac{f(t + h) – f(t)}{h}.$$ Note that it follows immediately from this definition that if $f(t) = g(t) + h(t)$ for all $t \in [a,b]$, then $f'(t) = g'(t) + h'(t)$ for all $t \in (a,b)$, and if $f(t) = C$ for all $t \in [a,b]$, then $f'(t) = 0$ for all $t \in (a,b)$. These facts will be used in Theorems 1 and 2 below.

The *Riemann integral* $\int_{a}^{b}f(t)\,{\rm d}t$ is defined informally as the signed area of the region in the $xy$-plane that is bounded by the graph of $f(t)$ and the $x$-axis between $x = a$ and $x = b$. Note that the area above the $x$-axis is positive and adds to the total area, while the area below the $x$-axis is negative and subtracts from the total area.

More formally, given the closed interval $[a, b]$, define a “tagged partition” of $[a,b]$ as a pair of sequences $(x_i, 0 \leq i \leq n)$ and $(t_i, 1 \leq i \le n)$, such that

$$a = x_0 \leq t_1 \leq x_1 \leq t_2 \leq x_2 \leq \cdots \leq x_{n-1} \leq t_n \leq x_n = b.$$ Note that the tagged partition $P = \{x_i, t_i\}$ divides the interval $[a,b]$ into $n$ sub-intervals $[x_{i-1}, x_i]$, each of which includes a point $t_i$. Let $d_i = x_i - x_{i-1}$, and let $D(P) = \max_i d_i$. The *Riemann sum* of the tagged partition $P$ is defined as $R(P) = \sum _{i=1}^{n}f(t_{i}) \, d_i$. The *Riemann integral* of $f(t)$ on $[a,b]$ is then defined as $I = \int_a^b f(t) \, {\rm d}t$, provided that, given any $\epsilon \gt 0$, there is a $\delta \gt 0$ such that for any tagged partition $P$ of $[a,b]$ with $D(P) \lt \delta$, the condition $|I - R(P)| \lt \epsilon$ is satisfied. Note that it follows immediately from this definition that for any $c \in (a,b)$, we have $\int_a^b f(t) \, {\rm d}t = \int_a^c f(t) \, {\rm d}t + \int_c^b f(t) \, {\rm d}t$. This fact will be used in Theorems 1 and 2 below.
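To make the definition concrete, the following sketch evaluates the Riemann sum $R(P)$ of a tagged partition. The names `riemann_sum` and `uniform_midpoint_partition` are hypothetical helpers introduced here; the second builds one particular tagged partition (equal sub-intervals tagged at their midpoints), which is merely a convenient choice and not required by the definition.

```python
def riemann_sum(f, xs, ts):
    """R(P) = sum of f(t_i) * (x_i - x_{i-1}) for the tagged partition (xs, ts)."""
    if len(ts) != len(xs) - 1:
        raise ValueError("need exactly one tag per sub-interval")
    total = 0.0
    for i, t in enumerate(ts, start=1):
        if not xs[i - 1] <= t <= xs[i]:
            raise ValueError("each tag must lie in its sub-interval")
        total += f(t) * (xs[i] - xs[i - 1])
    return total

def uniform_midpoint_partition(a, b, n):
    """Tagged partition of [a, b] into n equal sub-intervals tagged at midpoints."""
    xs = [a + (b - a) * i / n for i in range(n + 1)]
    ts = [(xs[i - 1] + xs[i]) / 2 for i in range(1, n + 1)]
    return xs, ts

for n in (10, 100, 1000):
    xs, ts = uniform_midpoint_partition(0.0, 1.0, n)
    print(n, riemann_sum(lambda t: t * t, xs, ts))
```

As $D(P) = 1/n$ shrinks, the printed sums for $f(t) = t^2$ on $[0,1]$ approach $1/3$, in line with the $\epsilon$-$\delta$ condition in the definition.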

**AXIOM 1 (Completeness axioms)**:

Axiom 1a (The Heine-Borel theorem): Any collection of open intervals covering a closed and bounded set of real numbers (such as a closed interval $[a,b]$) has a finite subcover.

Axiom 1b (Intermediate value theorem): Every continuous function on a closed real interval attains each value between and including its minimum and maximum.

Axiom 1c (Least upper bound / greatest lower bound theorem): Every set of reals that is bounded above has a least upper bound; every set of reals that is bounded below has a greatest lower bound.

**Comment**: These are not really “theorems” but instead are merely equivalent axioms of the property of completeness for real numbers; each can be proven from the others. See the Wikipedia article Completeness of the real numbers and this Chapter for details.

**LEMMA 1 (Continuity on a closed interval implies uniform continuity)**: Let $f(t)$ be a continuous function on the closed interval $[a,b]$. Then $f(t)$ is uniformly continuous on $[a,b]$.

**Proof**: By the definition given above for a continuous function, for every point $t \in [a,b]$, given $\epsilon \gt 0$, there is some $\delta_t \gt 0$ such that for all $s$ in the region $|s - t| \lt \delta_t$, we have $|f(s) - f(t)| \lt \epsilon$. Then consider the collection of intervals $C(\epsilon) = \{(t - \delta_t, t + \delta_t), t \in [a,b]\}$. Clearly $C(\epsilon)$ is an open cover of $[a,b]$. Now by Axiom 1a, this collection has a finite subcover $C_n(\epsilon) = \{(t_1 - \delta_1, t_1 + \delta_1), (t_2 - \delta_2, t_2 + \delta_2), \cdots, (t_n - \delta_n, t_n + \delta_n)\}$. Let $\delta = \min_i \delta_i$. Now consider any $x, y \in [a,b]$ with $|x - y| \lt \delta$. If both $x$ and $y$ are in the same interval of the collection, we have $|f(x) - f(y)| \lt 2 \epsilon$. If they lie in adjacent intervals, then $|f(x) - f(y)| \lt 4 \epsilon$. Either way, the condition for uniform continuity is satisfied.

**LEMMA 2 (A continuous function on a closed interval is integrable)**: If the function $f(t)$ is continuous on $[a,b]$, then the integral $\int_a^b f(t) \, {\rm d}t$ exists.

**Proof**: For any tagged partition $P = \{x_i, t_i\}$ of $[a,b]$, define $u_i, v_i \in [x_{i-1}, x_i]$ as the values, guaranteed to exist by Axiom 1b above, such that $f(u_i) = \min_{x \in [x_{i-1},x_i]} f(x)$ and $f(v_i) = \max_{x \in [x_{i-1},x_i]} f(x)$. By Lemma 1, given any $\epsilon \gt 0$, there is some $\delta \gt 0$ such that whenever $D(P) \lt \delta$, (so that $|u_i – v_i| \lt \delta$), it follows that $|f(u_i) – f(v_i)| \lt \epsilon / (b – a)$. Define $U(P) = \sum_{i=1}^n f(u_i) d_i$ and $V(P) = \sum_{i=1}^n f(v_i) d_i$, and let $R(P) = \sum_{i=1}^n f(t_i) d_i$ be the Riemann sum as above. Clearly $U(P) \leq R(P) \leq V(P)$. However, $$V(P) – U(P) = \sum_{i=1}^n (f(v_i) – f(u_i)) d_i \lt \frac{\epsilon}{b-a} \sum_{i=1}^n d_i = \epsilon.$$ Since the set of $U(P)$ over all tagged partitions $P$ is bounded above by $(b-a) \max_{t \in [a,b]} f(t)$ or by any $V(P)$, by Axiom 1c there is a least upper bound $U$. Similarly, since the set of $V(P)$ over all tagged partitions $P$ is bounded below by $(b-a) \min_{t \in [a,b]} f(t)$ or by any $U(P)$, there is a greatest lower bound $V$. In summary, we have shown that given any $\epsilon \gt 0$, there is a $\delta \gt 0$ such that for any tagged partition $P$ with $D(P) \lt \delta$, we have $U(P) \leq R(P) \leq V(P)$; $U(P) \leq U \leq V(P)$; $U(P) \leq V \leq V(P)$; and $V(P) \lt U(P) + \epsilon$. It follows that $U = V = \int_a^b f(t) \, {\rm d}t$.

**LEMMA 3 (Rolle’s theorem)**: If $f(t)$ is continuous on $[a,b]$ and differentiable on $(a,b)$, and if $f(a) = f(b)$, then there is some $c \in (a,b)$ such that $f'(c) = 0$.

**Proof**: By Axiom 1b, $f(t)$ has a maximum and a minimum in $[a,b]$. If $f(t)$ is constant on $[a,b]$, then $f'(c) = 0$ for every $c \in (a,b)$ and there is nothing to prove. Otherwise, since $f(a) = f(b)$, $f(t)$ must have either a minimum or a maximum at some point $c$ in the *interior* of $[a,b]$. Say $c$ is an interior maximum, but that $f'(c) \gt 0$. Recall that $$f'(c) = \lim_{h \to 0} \frac{f(c+h) - f(c)}{h}.$$ Thus given $\epsilon$ with $0 \lt \epsilon \lt f'(c)$, there is a $\delta \gt 0$ such that for all $h$ with $0 \lt h \lt \delta$ we have that $f(c+h) \gt f(c) + h f'(c) - h \epsilon \gt f(c)$, contradicting the assumption that $c$ is an interior maximum. A similar contradiction results if one assumes that $f'(c) \lt 0$, or assumes that $c$ is an interior minimum and $f'(c) \ne 0$.

**LEMMA 4 (Zero derivative implies constant)**: If $f(t)$ is continuous on $[a,b]$ and $f'(t) = 0$ for all $t \in (a,b)$, then $f(t) = C$ for some constant $C$.

**Proof**: If $f(t)$ is not constant, then by Axiom 1b it has a minimum at some point $c_1$ and a maximum at some point $c_2$. Let $d_1 = f(c_1)$ and $d_2 = f(c_2)$, and assume, for convenience, that $c_1 \lt c_2$. Now define, on the closed interval $[c_1, c_2]$, the function $g(t) = f(t) – (d_2 – d_1) (t – c_1) / (c_2 – c_1) – d_1$. Its derivative is $g'(t) = f'(t) – (d_2 – d_1) / (c_2 – c_1) = – (d_2 – d_1) / (c_2 – c_1)$, since by hypothesis $f'(t) = 0$ for all $t \in (a,b)$. Now note that $g(c_1) = g(c_2) = 0$. Thus by Lemma 3 there is some $c \in (c_1, c_2)$ such that $g'(c) = 0$. But since $g'(c) = – (d_2 – d_1) / (c_2 – c_1)$, this implies that $d_1 = d_2$ and in fact that $f(t)$ is constant on $[a,b]$.

**THEOREM 1 (Fundamental theorem of calculus, Part 1)**: Let $f(t)$ be continuous on $[a,b]$, and define the function $g(x) = \int_a^x f(t) \, {\rm d}t$. Then $g(x)$ is differentiable on $(a,b)$, and for every $x \in (a,b), \, g'(x) = f(x)$.

**Proof**: By Lemma 2, for any $x \in (a,b)$ and any $h \gt 0$ with $x + h \lt b$, we have that the integrals $\int_a^x f(t) \, {\rm d}t$, $\int_a^{x+h} f(t) \, {\rm d}t$ and their difference, namely $\int_x^{x+h} f(t) \, {\rm d}t$, each exist. By Lemma 1, given any $\epsilon \gt 0$, there is some $\delta \gt 0$ such that for any $x \in (a,b)$, any $h$ with $0 \lt h \lt \delta$ such that $(x-h, x+h) \subset (a,b)$, and any $t \in (x-h, x+h)$, the inequality $f(x) - \epsilon \le f(t) \le f(x) + \epsilon$ holds. Thus for any tagged partition $P$ of $[x,x+h]$ with $D(P) \lt \delta$, we can write $$h (f(x) - \epsilon) \le \sum_{i=1}^n f(t_i) \, d_i \le h (f(x) + \epsilon),$$ which then means that $$h (f(x) - \epsilon) \le \int_x^{x+h} f(t) \, {\rm d}t \le h (f(x) + \epsilon).$$ Now note that $\int_x^{x+h} f(t) \, {\rm d}t = g(x+h) - g(x)$. Thus $$f(x) - \epsilon \le \frac{g(x+h) - g(x)}{h} \le f(x) + \epsilon.$$ A similar argument for $\int_{x-h}^x f(t) \, {\rm d}t$ gives the desired result $g'(x) = f(x)$. We note in passing that the condition for continuity of $g(x)$ on $[a,b]$ is easily met: Given $\epsilon \gt 0$, choose $\delta \lt \epsilon / (\max_{t \in [a,b]} |f(t)|)$, and then $|g(x) - g(y)| \lt \epsilon$ whenever $|x - y| \lt \delta$.
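The sandwich inequality at the heart of this proof can be observed numerically. The following sketch, with illustrative helper names and a midpoint Riemann sum standing in for the exact integral, computes the difference quotient $(g(x+h) - g(x))/h$ for the sample function $f(t) = t^2$ at $x = 1$ and checks that it lies within $\epsilon$ of $f(x)$, where $\epsilon$ is taken as the largest deviation of $f$ from $f(x)$ on $[x, x+h]$.

```python
def midpoint_integral(f, lo, hi, n=1000):
    """Midpoint Riemann sum approximating the integral of f over [lo, hi]."""
    width = (hi - lo) / n
    return width * sum(f(lo + (i + 0.5) * width) for i in range(n))

def difference_quotient(f, x, h, n=1000):
    """(g(x+h) - g(x)) / h, using g(x+h) - g(x) = integral of f over [x, x+h]."""
    return midpoint_integral(f, x, x + h, n) / h

f = lambda t: t * t   # sample integrand, increasing on [1, 2]
x = 1.0
for h in (1e-1, 1e-2, 1e-3):
    q = difference_quotient(f, x, h)
    eps = f(x + h) - f(x)   # largest |f(t) - f(x)| on [x, x+h] for increasing f
    print(h, q, f(x) - eps <= q <= f(x) + eps)
```

As $h$ shrinks, so does the admissible $\epsilon$, and the quotient is squeezed toward $f(1) = 1$.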

**THEOREM 2 (Fundamental theorem of calculus, Part 2)**: Let $f(t)$ be continuous on $[a,b]$, and let $G(x)$ be any continuous function on $[a,b]$ that satisfies $G'(x) = f(x)$ on $(a,b)$. Then $\int_a^b f(t) \, {\rm d}t = G(b) – G(a)$.

**Proof**: The function $g(x)$ of Part 1 is continuous on $[a,b]$ and also satisfies $g'(x) = f(x)$ for $x \in (a,b)$. Then $h(x) = G(x) - g(x)$ is continuous and has zero derivative on $(a,b)$. Thus by Lemma 4, $h(x) = C$ for some constant $C$, so that $G(x) = g(x) + C$. Now, since $g(a) = 0$, we can write $$G(b) - G(a) = (g(b) + C) - (g(a) + C) = g(b) - g(a) = \int_a^b f(t) \, {\rm d}t.$$
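As a small illustration of Part 2, the sketch below compares a midpoint Riemann sum for $\int_1^2 3t^2 \, {\rm d}t$ with $G(2) - G(1)$ for the antiderivative $G(x) = x^3$. The integrand, the interval and the helper name `midpoint_integral` are all choices made for this example.

```python
def midpoint_integral(f, a, b, n=1000):
    """Midpoint Riemann sum approximating the integral of f over [a, b]."""
    width = (b - a) / n
    return width * sum(f(a + (i + 0.5) * width) for i in range(n))

f = lambda t: 3 * t * t   # integrand
G = lambda x: x ** 3      # an antiderivative: G'(x) = f(x)
a, b = 1.0, 2.0

approx = midpoint_integral(f, a, b)
exact = G(b) - G(a)
print(approx, exact)
```

The Riemann sum requires a thousand function evaluations to get close, whereas the antiderivative gives the value $7$ from two evaluations, which is the practical content of the theorem.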

**Other posts in the “Simple proofs” series**

The other posts in the “Simple proofs of great theorems” series are available here.

**Is scientific progress real?**

The fact that scientific research has made immense progress over the past years, decades and centuries is taken for granted among professional scientists and most of the lay public as well. But there are others, from both the left wing and the right wing of society, who question, dismiss or even reject the notion that science progresses. One group, which is mostly rooted in the right wing of society, rejects the scientific consensus on evolution, as with the creationism and intelligent design movements, or the scientific consensus on global warming. The other group, namely the “postmodern science studies” movement, which is mostly rooted in the left wing of society, questions, in a very fundamental sense, whether scientific research even uncovers truth at all, much less progresses to an ever sharper view of nature.

Much of today’s postmodern science studies literature is rooted in the writings of Karl Popper and Thomas Kuhn, although the postmodern writers go much further than either Popper or Kuhn. Austrian-British philosopher Karl Popper was struck by the differences in approach that he perceived at the time between the writings of some popular Freudians and Marxists, who saw “verifications” of their theories in every news report and clinical visit, and the writings of Albert Einstein, who for instance acknowledged that if the predicted red shift of spectral lines due to gravitation were not observed, then his general theory of relativity would be untenable. Popper was convinced that *falsifiability* was the key distinguishing factor, a view he presented in his oft-cited book *The Logic of Scientific Discovery* [Popper1959, pg. 40-41]. His basic view is widely accepted in modern scientific thinking.

Thomas Kuhn’s work *The Structure of Scientific Revolutions*, first published in 1962 and reissued in a second edition in 1970, analyzed numerous historical cases of scientific advancements, and then argued that key paradigm shifts did not come easily [Kuhn1970]. As a trained scientist, Kuhn was able to bring significant scientific insight into his analyses of historical scientific revolutions. Unfortunately, his work includes some very dubious and immoderate analysis, such as at one point where he denies that paradigm shifts carry scientists closer to fundamental truth [Kuhn1970, pg. 170], or when he argues that paradigm shifts often occur due to non-experimental factors [Kuhn1970, pg. 135]. For one thing, Kuhn’s “paradigm shift” model has not worked as well in recent years. As a single example, the “standard model” of physics within just a few years completely displaced previous theories of particle physics, after a very orderly transition [Tipler1994, pg. 88-89].

More recent writings in the postmodern science studies field have greatly extended the scope and sharpness of these critiques, declaring that much of modern science, like literary and historical analysis, is “socially constructed,” dependent on the social environment and privileged power structures of the researchers, with no claim whatsoever to fundamental truth or progress [Koertge1998, pg. 258; Madsen1990, pg. 471; Sokal1998, pg. 5-91, 229-258]. Here are just a few of the many examples that could be cited:

- “The validity of theoretical propositions in the sciences is in no way affected by the factual evidence.” [Gergen1988, pg. 258; Sokal2008, pg. 230].
- “The natural world has a small or non-existent role in the construction of scientific knowledge.” [Collins1981; Sokal2008, pg. 230].
- “Since the settlement of a controversy is the *cause* of Nature’s representation, not the consequence, we can never use the outcome — Nature — to explain how and why a controversy has been settled.” [Latour1987, pg. 99; Sokal2008, pg. 230].
- “For the relativist [such as ourselves] there is no sense attached to the idea that some standards or beliefs are really rational as distinct from merely locally accepted as such.” [Barnes1981, pg. 27; Sokal2008, pg. 230].
- “Science legitimates itself by linking its discoveries with power, a connection which *determines* (not merely influences) what counts as reliable knowledge.” [Aronowitz1988, pg. 204; Sokal2008, pg. 230].

In a curious turn of events, these postmodern science writers, by undermining scientists’ claim to objective truth, have unwittingly provided arguments and talking points for the creationism, intelligent design and climate change denial movements [Otto2016a; Otto2016b].

In a recently published interview with *Scientific American* writer John Horgan, Kuhn was deeply upset that he had become a patron saint to this type of would-be scientific revolutionary: “I get a lot of letters saying, ‘I’ve just read your book, and it’s transformed my life. I’m trying to start a revolution. Please help me,’ and accompanied by a book-length manuscript.” Kuhn emphasized that in spite of the often iconoclastic way his writings have been interpreted, he remained “pro-science,” noting that science has produced “the greatest and most original bursts of creativity” of any human enterprise [Horgan2012].

While philosophers and postmodern writers may debate whether science fundamentally progresses, what are the facts? Even after properly acknowledging the tentative, falsifiable nature of science as taught by writers such as Popper and Kuhn, it is clear that modern science has produced a sizable body of broad-reaching theoretical structures that describe the universe and life on Earth ever more accurately with each passing year. Keep in mind that each year approximately two million new peer-reviewed scientific research papers are published worldwide [Ware2012]. This rapidly expanding corpus of scientific work is an undeniable testament to the progress of modern science.

It is easy to be blasé and dismissive of this progress, but consider for a moment a few of the remarkable developments of the past 120 years:

- **Relativity**. In 1905, Albert Einstein published what is now known as the special theory of relativity, which extended classical Newtonian physics, a theory that had reigned supreme for over 200 years, to the realm of very fast-moving objects and systems. Then in 1915, Einstein’s general theory of relativity further extended the theory to accelerating systems, and in the process explained gravity and a host of other phenomena in a very mathematically elegant framework. Relativity has now passed a full century of the most exacting tests, including measurements of the change over time in Mercury’s perihelion motion, measurements of the periodic changes in frequency of binary stars (which agree with theory to astonishing precision), and predictions of exotica such as gravitational lenses and black holes, and, in the process, has pushed aside several other competing theories [General2019; Leach2018]. Today’s GPS technology crucially relies on both special and general relativity, and if either of these theories were in error to any significant degree, the entire GPS system and the applications that rely on it would quickly fail [Global2019].
- **Quantum physics**. Another 1905 paper by Einstein proposed that light comes in discrete packets, now called “quanta,” rather than in continuous beams. Subsequently physicists such as Max Planck, Niels Bohr, Paul Dirac, Werner Heisenberg and Erwin Schrödinger developed what is now known as quantum physics, which governs phenomena in the submicroscopic realm. Quantum physics has also passed a full century of the most exacting tests imaginable, from confirmations of the perplexing predicted behavior of electrons traveling through slits to numerous measurements of fundamental chemical and nuclear properties. As a single example of thousands that could be mentioned here, the numerical values of the magnetic dipole moment of the electron (in certain units), calculated from the present-day theory of quantum electrodynamics on one hand, and from the best available experimental measurements on the other hand, are [Sokal1998, pg. 57]: Theoretical: 1.001159652201 (plus or minus 30 in the last two digits); Experimental: 1.001159652188 (plus or minus 4 in the last two digits). Is this agreement, to within one part in 70 billion, just a coincidence? Quantum physics is the basis for chemistry, semiconductor technology and materials science, and thus has far-reaching and absolutely indispensable applications in today’s world. It is not an exaggeration to say that quantum physics underlies virtually every electronic circuit, device and system in use today, and they would quickly fail to work if these theories were even slightly in error.
- **Standard model**. In the 1970s, quantum electrodynamics (QED) was extended to what is now known as quantum chromodynamics (QCD), which, together with relativity, constitutes what is known as the “standard model” of modern physics. Perhaps the most dramatic confirmation of the standard model was the 2012 discovery of the Higgs boson [Overbye2012a]. In the past 30 years other attempts have been made to extend the frontiers of physics, including supersymmetry and string theory, but so far the standard model continues to reign supreme, even though researchers recognize that ultimately either relativity or quantum physics or both must give way to a more fundamental theory.
- **Structure of DNA**. Surely the discovery of the structure and function of DNA by Francis Crick and James Watson (with assistance from several others, notably Rosalind Franklin) must rank as one of the most significant discoveries of the 20th century, and arguably the single most significant discovery in molecular biology of all time. As the two researchers modestly observed at the conclusion of their original paper, “It has not escaped our notice that the specific pairing we have postulated immediately suggests a possible copying mechanism for the genetic material.” [Watson1953; Pray2008]. The full impact of this discovery in medicine and biology is only now being realized, with the advent of inexpensive full-genome sequencing and gene editing (see below).
- **Accelerating universe**. Astronomers and physicists were startled when in 1998, two different observational teams found that the expansion of the universe, long assumed to be gradually slowing due to gravitation, was actually accelerating [Wilford1998]. This finding has enormous impact on cosmological models of the universe, and has caused considerable consternation in the field, yet the latest studies continue to confirm this finding [Susskind2005, pg. 22, 80-82, 154; Amit2017].
- **Extrasolar planets**. Following some initial discoveries in the 1990s, astronomers have discovered thousands of planets orbiting other stars, in a development that has significant implications for the existence of life outside the Earth. As a single example of these discoveries, in 2017 astronomers found seven roughly Earth-size planets orbiting a star named Trappist-1, about 40 light-years away. At least one or two of these planets appear to have a temperature regime that would support life, although it is still much too soon to say whether or not any life actually exists there [Chang2017].
- **Gravitational wave astronomy**. In 2016, a team of researchers operating the new Laser Interferometer Gravitational-Wave Observatory (LIGO) system announced that they had detected a brief chirp, the “sound” in the fabric of the universe of two black holes colliding, as predicted by Einstein’s general relativity. This discovery heralds the start of a new era of astronomy, one where optical and radio telescopes are combined with gravitational wave detections to explore the universe [Overbye2016].
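The level of agreement quoted above for the electron's magnetic dipole moment can be checked with a few lines of arithmetic. The following is only an illustrative sketch: it takes the two values exactly as quoted from [Sokal1998, pg. 57], ignores the stated uncertainties, and uses exact rational arithmetic so that no digits are lost to floating-point rounding. The variable names are hypothetical.

```python
from fractions import Fraction

# Values as quoted in the text (uncertainties are ignored in this sketch).
theoretical = Fraction("1.001159652201")
experimental = Fraction("1.001159652188")

# Relative gap between theory and experiment, and its reciprocal
# ("agreement to within one part in N").
relative_gap = abs(theoretical - experimental) / experimental
one_part_in = 1 / relative_gap

print(f"relative gap: {float(relative_gap):.3e}")
print(f"agreement to about one part in {float(one_part_in):.3g}")
```

The reciprocal lands in the tens of billions, the same order of magnitude as the figure quoted in the text.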

Of course, this is just the briefest summary of highlights. For every item listed above, 1000 very significant other items could have been included. Scientific progress is very real.

Similarly, it is very easy to take for granted our current technology, which is merely the endpoint of a tidal wave of scientific and technological advances in our modern era. But consider just for a moment what has been accomplished:

- **Medical technology**. Eyeglasses, from their widespread adoption in the 19th century, have restored clear vision to billions of otherwise blind or nearly blind persons. A recent analysis listed eyeglasses as the fifth most significant invention of all time, one that has “dramatically raised the collective human IQ” [Fallows2013]. Yet vision is just one detail in a huge body of medical technology, mostly developed in the 20th and 21st centuries, including: (a) vaccination and antibiotics, which have saved billions of persons from otherwise debilitating and deadly disease; (b) x-rays and magnetic resonance imaging; (c) surgical procedures; (d) effective painkillers and pharmaceuticals; and (e) dental procedures that have saved the teeth of billions of persons worldwide. As a result of this technology, worldwide life expectancy has soared from 29 in 1880 to 71 today [Pinker2018, Chap. 5].
- **Transportation**. Today’s worldwide rail network, which serves a large fraction of humanity, has grown from a few miles in England, Europe and the U.S. in 1830 to millions of miles today. Even more amazing is the growth, beginning in the late 1800s and early 1900s, in highways and automobiles, with over one billion vehicles in use today [Motor2019]. Additionally, a whopping 3.5 billion airplane trips are taken each year, and although 80% of the world’s population has never flown, each year 100 million fly for the first time [Gurdus2017]. More remarkably, very likely within 25 years or so, human passengers (not just a handful of astronauts) will travel to the Moon and Mars, an achievement that only a few years ago was the realm of fantasy [Drake2017]. Firms pursuing commercial space transport include Blue Origin (founded by Jeff Bezos), Boeing, Orbital Sciences, SpaceX (founded by Elon Musk) and Virgin Galactic (founded by Richard Branson).
- **Moore’s Law and computer technology**. No other single statistic is as compelling a demonstration of technological progress as Moore’s Law, namely the observation that beginning in 1965, when Intel pioneer Gordon Moore first noted it [Moore1965], the number of transistors that can be crammed onto a single integrated circuit roughly doubles every 18-24 months. Moore’s Law has now continued unabated for over 50 years, and the end is not yet in sight. As of 2019, state-of-the-art devices typically have more than 20 billion transistors, an increase by a factor of 80 million over the best 1965-era devices [Transistor2019]. This staggering number of on-chip transistors translates directly into memory capacity and processing speed, endowing a broad range of high-tech devices with capabilities unthinkable even a few years ago. For example, 2019-era supercomputers compute one million times faster, and include one million times as much memory, compared to supercomputers just 25 years ago [Top500]. With technologies such as nanotechnology and quantum computing in development, even more futuristic applications are in the works.
- **Communication**. Human society leaped forward in the 15th century with the printing press, which provided public access to many literary and scientific works and directly contributed to the birth of modern science. Similarly, the development of the telegraph and telephone facilitated the huge technological boom of the 19th and 20th centuries. Now we are seeing the effects of an even more far-reaching communication revolution, namely the Internet, which quite literally brings the entire world’s cumulative knowledge to one’s computer or smartphone. Some of the older generation may recall when telephone service was first provided to individual homes, via “party lines,” back in the 1930s, 1940s and 1950s. Long-distance calls were possible, but only at very high rates, typically 50 cents to $1.00 per minute within first-world countries, and $3.00 to $5.00 per minute to foreign countries. Nowadays, via the Internet and services such as Apple’s FaceTime and Microsoft’s Skype, one can communicate by high-resolution color video, for free, to virtually anywhere worldwide.
- **Smartphones**. No first-world teenager needs to be lectured about the miracle of a smartphone, which quite literally connects nearly the entire world’s population in a communications network, provides full access to Internet resources and includes a GPS mapping facility that by itself would astound anyone of an earlier era. As of 2019, over five billion persons, or roughly 70% of the entire world population, own at least a cell phone, and 2.71 billion, or more than a third of the world’s population, own a smartphone [Smartphone2019]. As of 2019, state-of-the-art smartphones typically include touchscreens with over three million pixel resolution, at least two cameras with over ten million pixel resolution, and up to 512 Gbyte storage. Beginning with the 2018-2019 models, Apple smartphones also include special hardware and software that enable 3-D facial recognition. Today’s leading-edge smartphones can perform more than *five trillion* operations per second, a speed faster than that of the most powerful supercomputers of twenty years earlier. Now many of these same smartphone capabilities are being delivered in smartwatches, with built-in wireless facilities, GPS mapping and features for monitoring health and fitness. For example, beginning with the 2018-2019 models, Apple watches can produce a clinical-quality electrocardiogram and can automatically call emergency services if they detect that the wearer has fallen and is not able to get up.
- **Genome sequencing**. The first complete human genome sequence was completed in 2000, after a ten-year effort that cost approximately $2.7 billion [Genome2010]. But in the wake of several waves of new technology since then, genomes can now be sequenced for less than $1,000, and the cost is expected to drop to only $100 by 2020 or so [Vance2014; Fikes2017]. If the $100 price is achieved by 2020, as expected, the price will have dropped by a stunning factor of 27 million in only twenty years. This is sustained progress by a factor of 2.35 compounded per year, which is significantly faster than the Moore’s Law rate of 1.59 compounded per year. Partial genome sequencing is already wildly popular as a means to identify the national origin of one’s ancestors, and full genome sequencing is being used to identify possible genetic defects. It is inevitable that full genome sequencing will become a standard part of modern medicine in ways that we can only dimly foresee at the present time. What’s more, this same sequencing technology has enabled biologists to study the genomes of thousands of other biological species, producing indisputable evidence of common ancestry between species. In the latest application of DNA technology, researchers have discovered a technique for gene editing (“CRISPR”), a development that is certain to have far-reaching applications in medicine and almost certainly will merit a Nobel Prize [Zimmer2015c].
- **Artificial intelligence**. Although the notion of artificial intelligence (AI) was first articulated by 1950, early optimism soon faded as researchers realized that AI was very much more difficult than first envisioned. In 1997, in a seminal event for the field, an IBM computer system defeated the world’s champion chess player. In 2011, in a much more impressive AI achievement, an IBM computer named “Watson” defeated two champion contestants on the American quiz show Jeopardy!. In 2017, a computer program developed by Google’s DeepMind defeated the world’s champion Go player, an achievement that many had thought would not come for decades, if ever. Then later that year, in an even more startling development, DeepMind researchers developed a new program that was merely taught the rules of Go, and then played against itself. Within just three days it had exceeded the skill of the previous program; similar programs then quickly conquered chess and the Japanese game shogi as well. This same AI-machine learning approach is now being applied in numerous commercial developments; indeed, 2018 appears to be the year that AI truly came of age. Among the current participants are Amazon, Apple, Facebook, Google, IBM, Microsoft, Salesforce, and numerous financial firms [Bailey2018]. Closely related is the development of self-driving automobiles and trucks. Waymo has already launched a robot-ride taxi service in Arizona [Davies2018], and self-driving trucks will likely be fielded by 2022 [Freedman2017].
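The compounded annual rates quoted in the genome sequencing item can be reproduced directly. This is a minimal sketch under stated assumptions: the cost figures ($2.7 billion falling to $100 over twenty years) are those given in the text, and the Moore's Law rate is computed assuming a doubling every 18 months, the fast end of the 18-24 month range mentioned above. The function name `annual_factor` is hypothetical.

```python
def annual_factor(total_factor, years):
    """Per-year improvement factor that compounds to total_factor over the given years."""
    return total_factor ** (1.0 / years)

# Genome sequencing: $2.7 billion down to $100 over twenty years.
genome_total = 2.7e9 / 100.0          # overall improvement factor, 27 million
genome_rate = annual_factor(genome_total, 20)

# Moore's Law, assuming a doubling every 18 months (1.5 years).
moore_rate = annual_factor(2.0, 1.5)

print(f"genome sequencing: factor {genome_total:,.0f} overall, {genome_rate:.2f} per year")
print(f"Moore's Law (18-month doubling): {moore_rate:.2f} per year")
```

With a 24-month doubling period instead, the same function gives a Moore's Law rate of about 1.41 per year, which widens the gap further.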

It is a sad commentary on our current society that a large fraction of the populace are so absorbed by day-to-day bad news that they do not recognize unmistakable evidence of longer-term progress, across a broad range of social and economic indicators. Crime is down; life expectancy is up; numerous diseases have been conquered; hundreds of millions fewer worldwide live in poverty; many fewer are dying in military conflicts or in accidents; many more worldwide live in democratic societies where basic human rights are defended, and literacy is on the rise in every nation for which reliable data are available [Pinker2011b; Pinker2018].

At the same time, and in spite of continuing naysaying from both the left wing (in particular, the postmodern science studies movement) and the right wing (in particular, the creationism, intelligent design and climate change denial movements), the engine of scientific and technological progress continues unabated. Just in the past 20 years, scientists have discovered that the universe’s expansion is accelerating, have discovered thousands of planets orbiting other stars, and have catalogued the entire human genome. The latter task cost roughly $2.7 billion when it was completed in 2000, yet dramatic improvements since then have reduced the cost to roughly $1,000, and it will soon fall to $100. Computer and information technologies continue their relentless advance with Moore’s Law. This is perhaps most evident when we see the vast numbers of cell phones and smartphones now in use — more than 5 billion, or roughly 70% of the entire world human population, now own at least a cell phone, and 2.7 billion own a smartphone. The latest smartphones pack worldwide communication, full-fledged Internet facilities, voice recognition, facial recognition and GPS mapping, and feature computing power exceeding the world’s most powerful supercomputers of just 20 years ago. And technologies such as genome sequencing, artificial intelligence, self-driving vehicles and commercial space travel are just getting started.

There is no sign that this torrid rate of progress is slowing down. Twenty years hence we will look back on our own time with just as much disdain as we do today when we recall the world of 20 years ago. So we have much to look forward to. The future is destined to be as exciting as any time in the past. It’s a great time to be alive.

**2018: The year that artificial intelligence came of age**

The field of artificial intelligence (AI) is actually rather old. Ancient Greek, Chinese and Indian philosophers developed principles of formal reasoning several centuries before Christ. In 1651, British philosopher Thomas Hobbes wrote in *Leviathan* that “reason … is nothing but reckoning (that is, adding and subtracting).” In 1843 Ada Lovelace, widely considered to be the first computer programmer, ventured that machines such as Babbage’s analytical engine “might compose elaborate and scientific pieces of music of any degree of complexity or extent.”

In 1950 Alan Turing’s landmark paper “Computing machinery and intelligence” outlined the principles of AI and proposed a test, now known as the Turing test, for establishing whether true AI had been achieved. Early computer scientists were confident that true AI systems would soon be a reality. In 1965 Herbert Simon predicted that “machines will be capable, within twenty years, of doing any work a man can do.” In 1970 Marvin Minsky declared “In from three to eight years we will have a machine with the general intelligence of an average human being.”

But this early optimism collided with hard reality. For example, early attempts at producing practical machine translation systems, which were presumed to be imminent, were slammed in a 1966 report. Weizenbaum’s ELIZA program attempted to emulate a psychotherapist, but the resulting dialogue was little more than a reassembly of the user’s input, and there was little indication of how to extend this to true AI. The inevitable backlash against inflated promises and expectations during the 1970s was dubbed the “AI Winter,” a phenomenon that sadly was repeated in the late 1980s and early 1990s, when a second wave of AI systems also resulted in disappointment.

In retrospect, these pioneers failed to appreciate the true difficulties of constructing true AI. These include limited computer power, the combinatorial explosion of logical branches, the requirement for commonsense knowledge, and the lack of appreciation for seemingly trivial human capabilities such as visual pattern recognition and physical motion. For additional details, see the Wikipedia article on the history of AI.

A breakthrough of sorts came in the late 1990s and early 2000s with the development of machine learning methods based on Bayes’ theorem, which quickly displaced the older methods based mostly on formal reasoning. In other words, rather than trying to program an AI system as a large web of discrete logical reasoning operations, these researchers were content to use statistical machine learning schemes to automatically produce the reasoning tree. These new methods proved to be superior both in efficiency and in effectiveness.

The other major development was the inexorable rise of computing power and memory, gifts of Moore’s Law that have continued unabated for over 50 years. A typical 2018-2019-era smartphone is based on 8 nm technology, features up to 512 Gbyte flash memory and can perform trillions of operations per second. With such huge computing power and memory, greater in many respects than that of the world’s most powerful 2000-era supercomputers, previously unthinkable AI capabilities, such as 3-D facial recognition, can be provided directly to the consumer.

One highly publicized advance came in 1997, when Garry Kasparov, the reigning world chess champion, was defeated by an IBM-developed computer system named “Deep Blue.” Deep Blue employed some new techniques, but for the most part it simply applied enormous computing power to store openings, look ahead many moves, apply alpha-beta tree pruning and never make mistakes.

This was followed in 2011 with the defeat of two champion contestants on the American quiz show “Jeopardy!” by an IBM-developed computer system named “Watson.” The Watson achievement was significantly more impressive as an AI demonstration, because it involved natural language understanding, i.e. the understanding of ordinary (and often tricky) English text. For example, the “Final Jeopardy” clue at the culmination of the contest, in the category “19th century novelists,” was the following: “William Wilkinson’s ‘An Account of the Principalities of Wallachia and Moldavia’ inspired this author’s most famous novel.” Watson correctly responded “Who is Bram Stoker?” [the author of Dracula], thus sealing the victory.

Legendary Jeopardy champ Ken Jennings conceded by writing on his tablet, “I for one welcome our new computer overlords.”

The ancient Chinese game of Go involves placing black and white stones on a 19×19 grid. The game is notoriously complicated, with strategies that can only be described in vague, subjective terms. For these reasons, many observers did not expect Go-playing computer programs to beat the best human players for many years, if ever. See the earlier MathScholar blog for more details.

This pessimistic outlook changed abruptly in March 2016, when a computer program named “AlphaGo,” developed by researchers at DeepMind, a subsidiary of Alphabet (Google’s parent company), defeated Lee Se-dol, a South Korean Go master, 4-1 in a 5-game tournament. The DeepMind researchers further enhanced their program, which then in May 2017 defeated Ke Jie, a 19-year-old Chinese Go master thought to be the world’s best human Go player.

In developing the program that defeated Lee and Ke, DeepMind researchers fed their program 100,000 top amateur games and “taught” it to imitate what it observed. Then they had the program play itself and learn from the results, slowly increasing its skill.

In an even more startling development, in October 2017, DeepMind researchers developed from scratch a new program, called AlphaGo Zero. For this program, the DeepMind researchers merely programmed the rules of Go, with a simple reward function that rewarded games won, and then instructed the program to play games against itself. This program was *not* given any records of human games, nor was it programmed with any strategies, general or specific.

Initially, the program merely scattered pieces seemingly at random across the board. But it quickly became more adept at evaluating board positions, and gradually increased in skill. Interestingly, along the way the program rediscovered many well-known elements of Go strategies used by human players, including anticipating its opponent’s probable next moves. But unshackled from the experience of humans, it developed new complex strategies never before seen in human Go games. After just three days of training and 4.9 million training games (with the program playing against itself), the AlphaGo Zero program had advanced to the point that it defeated the earlier AlphaGo program 100 games to zero.

Skill at Go (and several other games) is quantified by the Elo rating, which is based on a player’s record of past games. Lee’s rating is 3526, while Ke’s rating is 3661. After 40 days of training, AlphaGo Zero’s Elo rating was over 5000. Thus AlphaGo Zero was as far ahead of Ke as Ke is ahead of a good amateur player. Additional details are available in an Economist article, a Scientific American article and a Nature article.
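To get a feel for what these rating gaps mean, one can plug them into the standard Elo expected-score formula, $E_A = 1 / (1 + 10^{(R_B - R_A)/400})$. The sketch below assumes this standard logistic form with its conventional 400-point scale, and takes 5000 as a round figure for AlphaGo Zero's rating; the rating systems actually used for Go may differ in detail, and the function name `expected_score` is hypothetical.

```python
def expected_score(rating_a, rating_b):
    """Expected score (between 0 and 1) of player A against player B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

# Ratings quoted in the text; 5000 is a round lower bound for AlphaGo Zero.
lee, ke, alphago_zero = 3526, 3661, 5000

ke_vs_lee = expected_score(ke, lee)
zero_vs_ke = expected_score(alphago_zero, ke)

print(f"Ke vs. Lee:          expected score {ke_vs_lee:.3f}")
print(f"AlphaGo Zero vs. Ke: expected score {zero_vs_ke:.5f}")
```

Under this model a gap of 135 points, as between Ke and Lee, is a modest edge, whereas a gap of more than 1300 points corresponds to an expected score above 0.999.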

In December 2017, DeepMind announced that they had generalized the AlphaGo Zero program into a new program, dubbed AlphaZero, to play other games, including chess and shogi, a Japanese version of chess that is significantly more complicated and challenging. Recently (December 2018) the DeepMind researchers documented their groundbreaking work in a technical paper published in *Science* (see also this excellent New York Times analysis by mathematician Steven Strogatz).

In the paper, the researchers described various experiments they have run comparing their AlphaZero program to championship-grade software programs, including Stockfish, the 2016 Top Chess Engine Championship champion (significantly more powerful than the 1997 IBM Deep Blue system), and Elmo, the 2017 Computer Shogi Association champion. In matches against Stockfish in chess, with AlphaZero playing white, AlphaZero won 29%, drew 70.6% and lost 0.4%. In a similar match against Elmo in shogi, with AlphaZero playing white, AlphaZero won 84.2%, drew 2.2% and lost 13.6%. Other comparison results are presented in the technical paper.

Just as impressive as these statistics is the fact that AlphaZero seemed to play with a human-like style. As Strogatz explains, describing the chess program,

[AlphaZero] played like no computer ever has, intuitively and beautifully, with a romantic, attacking style. It played gambits and took risks. … Grandmasters had never seen anything like it. AlphaZero had the finesse of a virtuoso and the power of a machine. It was humankind’s first glimpse of an awesome new kind of intelligence.

AI systems are doing much more than defeating human opponents in games. Here are just a few of the current commercial developments:

- Apple’s Siri and Amazon’s Alexa smartphone-based voice recognition systems are now significantly improved over the earlier versions, and speaker systems incorporating them are rapidly becoming a household staple.
- Facial recognition has also come of age, with Facebook’s facial recognition API and, even more impressively, with Apple’s 3-D facial recognition hardware and software, which is built into the latest iPhones and iPads.
- Self-driving cars are already on the road, and 3.5 million truck driving jobs, just in the U.S., are at risk within the next ten years.
- AI-powered surgical robots may soon perform some procedures faster and more accurately than humans.
- The financial industry already relies heavily on financial machine learning methods, and a huge expansion of these technologies is coming, possibly displacing thousands of highly paid workers.
- Other occupations likely to be impacted include package delivery drivers, construction workers, legal workers, accountants, report writers and salespeople.

So where is all this heading? A recent Time article features an interview with futurist Ray Kurzweil, who predicts an era, roughly in 2045, when machine intelligence will meet, then transcend human intelligence. Such future intelligent systems will then design even more powerful technology, resulting in a dizzying advance that we can only dimly foresee at the present time. Kurzweil outlines this vision in his book The Singularity Is Near.

Futurists such as Kurzweil certainly have their skeptics and detractors. Sun Microsystem founder Bill Joy is concerned that humans could be relegated to minor players in the future, if not extinguished. Indeed, in many cases AI systems already make decisions that humans cannot readily understand or gain insight into. But even setting aside such concerns, there is considerable concern about the societal, legal, financial and ethical challenges of such technologies, as exhibited by the current backlash against technology, science and “elites” today.

One implication of all this is that education programs in engineering, finance, medicine, law and other fields will need to change dramatically to train students in the usage of emerging AI technology. And even the educational system itself will need to change, perhaps along the lines of massive open online courses (MOOC). It should also be noted that large technology firms such as Amazon, Apple, Facebook, Google and Microsoft are aggressively luring top AI talent, including university faculty, with huge salaries. But clearly the field cannot eat its seed corn in this way; some solution is needed to permit faculty to continue teaching while still participating in commercial R&D work.

But one way or the other, intelligent computers are coming. Society must find a way to accommodate this technology, and to deal respectfully with the many people whose lives will be affected. But not all is gloom and doom. Steven Strogatz envisions a mixed future:

Maybe eventually our lack of insight would no longer bother us. After all, AlphaInfinity could cure all our diseases, solve all our scientific problems and make all our other intellectual trains run on time. We did pretty well without much insight for the first 300,000 years or so of our existence as Homo sapiens. And we’ll have no shortage of memory: we will recall with pride the golden era of human insight, this glorious interlude, a few thousand years long, between our uncomprehending past and our incomprehensible future.

US leads but China rises in latest Top500 supercomputer list

Four prestigious professional society awards are presented at the SC18 conference. This year’s awardees include:

- David E. Shaw received the Seymour Cray Engineering Award from the IEEE Computer Society. Shaw, the billionaire former CEO of the D. E. Shaw hedge fund, has for the past 12 years led a team to develop ANTON, a special-purpose computer system for protein folding and other biomedical computations. His team has twice received the Gordon Bell prize (see below) for landmark calculations in the biomedical field using the ANTON system.
- Sarita Adve received the Ken Kennedy Award from the ACM and the IEEE Computer Society. Adve, a professor of computer science at the University of Illinois, Urbana-Champaign, co-developed the memory models used in the C++ and Java programming languages, and is known for related work in hardware-software interfaces, system resiliency, cache coherence, reliability and power management.
- Linda Petzold, a professor at the University of California, Santa Barbara, received the 2018 Sidney Fernbach Award from the IEEE Computer Society. Petzold is best known for her work in the numerical solution of differential-algebraic equations, which have applications in materials science, computational biology and medicine.
- Two teams shared the Gordon Bell Prize from the ACM. One team, from the Oak Ridge National Laboratory, developed a supercomputer program that processes huge amounts of genome data to better understand the genetic basis of chronic pain and opioid addiction. The other group, based at the Lawrence Berkeley National Laboratory, developed a neural-net-based supercomputer program to identify extreme weather patterns in climate simulations.

This is the first year in which two women (Adve and Petzold) were each awarded one of the four major professional society awards in the high-performance computing field.

Another announcement at the SC18 conference was the latest edition of the Top500 list of the world’s most powerful supercomputers.

Number one on the November 2018 list is the Summit supercomputer at Oak Ridge National Laboratory in the U.S. The Summit system consists of more than 4,600 individual servers, each containing two 22-core IBM Power9 processors and six NVIDIA Tesla V100 graphics processing unit accelerators, all connected with a Mellanox 100 Gb/s interconnection network. The system has a potential peak performance of 200 Pflop/s ($2 \times 10^{17}$ floating-point operations per second). Its performance on the Linpack benchmark, which is used for the Top500 ranking, was 143.5 Pflop/s ($1.435 \times 10^{17}$ floating-point operations per second).

Number two on the November 2018 list is the Sierra system, another IBM-based supercomputer, this one at the Lawrence Livermore National Laboratory in the U.S. The number three system is the Sunway TaihuLight system, at the National Supercomputer Center in Wuxi, China. It consists of 40,960 Chinese-designed SW26010 manycore processors.

The remaining top ten supercomputers are: (#4) the Tianhe-2A supercomputer in Guangzhou, China; (#5) the Piz Daint supercomputer in Lugano, Switzerland; (#6) the Trinity supercomputer in Albuquerque, New Mexico; (#7) the AI Bridging Cloud Infrastructure supercomputer in Tokyo, Japan; (#8) the Leibniz Rechenzentrum supercomputer near Munich, Germany; (#9) the Titan supercomputer in Oak Ridge, Tennessee; and (#10) the Sequoia supercomputer in Livermore, California.

For full details on these supercomputers, see the Top500 site.

Although the U.S. still dominates the Top500 list, including the top two systems, China has made remarkable progress in recent years. In the November 2018 list, 227 of the 500 systems are in China, versus 109 in the U.S., 38 in Japan, and 20 in the U.K. By contrast, in 2008 only 12 of the top 500 systems were in China. If one counts the share of total performance, then the U.S. leads with 37.7%, versus 31% in China, and 7.7% in Japan. In 2008, China’s share was a mere 2.5%.

The following graphic shows the remarkable advance of China’s presence in the Top 500 list over the past ten years:

Simple proofs: The impossibility of trisection

Ancient Greek mathematicians developed the methodology of “ruler-and-compass” constructions: if one is given only a ruler (without marks) and a compass, what objects can be constructed as a result of a finite set of operations? While they achieved many successes, three problems confounded their efforts: (1) squaring the circle; (2) trisecting an angle; and (3) duplicating a cube (i.e., constructing a cube whose volume is twice that of a given cube). Indeed, countless mathematicians through the ages have attempted to solve these problems, and countless incorrect “proofs” have been offered. We might add that Archimedes discovered a way to trisect an angle, but it did not strictly conform to the rules of ruler-and-compass construction.

The impossibility of squaring the circle was first proved by Lindemann in 1882, who showed that $\pi$ is *transcendental* — it is not the root of any algebraic equation with integer or rational coefficients. The impossibility of trisecting an arbitrary angle was proved earlier, in 1837, by Pierre Wantzel, and the impossibility of duplicating a cube was also proved at about that same time. For additional historical background on the trisection problem, see this Wikipedia article.

In most textbooks, these impossibility proofs are presented only after extensive background in groups, rings, fields, field extensions and Galois theory. Needless to say, very few college-level students, even among those studying majors such as physics or engineering, ever take such coursework. This is typically accepted as an unpleasant but necessary aspect of mathematical pedagogy.

We present here a proof of the impossibility of trisecting an arbitrary angle by ruler-and-compass construction and, as a bonus, the impossibility of cube duplication. We emphasize that this proof is both *elementary* and *complete* — all necessary lemmas and nontrivial facts are proved below. It should be understandable by anyone with a solid high school background in algebra, geometry and trigonometry, although it would help if the reader is familiar with the integers modulo $n$ and algebraic fields. This proof is somewhat longer than others in this series, but this is mainly due to the inclusion of several illustrative examples, as well as proofs of facts, such as the rational root theorem and the formula for cosine of a triple angle, that some might consider “obvious.” The core of the proof (see “Proof of the main theorems” below) is actually the *shortest* of those published so far in this series.

**Gist of the proof:**

We will show that trisecting a 60 degree angle, in particular, is equivalent to constructing the number $\cos (\pi/9)$ (i.e., the cosine of 20 degrees), which is an algebraic number that satisfies an irreducible polynomial of degree 3. Since the only numbers and number fields that can be produced by ruler-and-compass construction have algebraic degrees that are powers of two, this shows that the trisection of a 60-degree angle is impossible.

We first define some basic notions about polynomials, algebraic numbers and field extensions, and prove three basic lemmas (Lemmas 1, 2 and 3). Readers familiar with these lemmas may skip to the “Constructible numbers” section below.

**Definitions: Algebraic numbers, irreducible polynomials, minimal polynomials and degrees.**

By an *algebraic number*, we mean some real or complex number $\alpha$ that is the root of a polynomial $a_0 + a_1 x + a_2 x^2 + \cdots + a_m x^m$ with integer coefficients $a_i$. We will assume here that the $a_i$ have no common factor, since if they do, all coefficients can be divided by this factor, and $\alpha$ is still a root. Note also that if $\alpha$ is the root of a polynomial with rational coefficients, then by multiplying all coefficients by the least common multiple of the denominators of the $a_i$, one obtains a polynomial with integer coefficients that has $\alpha$ as a root. An *irreducible polynomial* is a polynomial with integer coefficients that cannot be factored into two polynomials of lesser degrees. The *minimal polynomial* of an algebraic number $\alpha$ is the polynomial with integer coefficients of minimal degree (i.e., the irreducible polynomial) that has $\alpha$ as a root, and the *degree* of $\alpha$, denoted ${\rm deg}(\alpha)$, is the degree of its minimal polynomial. Note that if $a_0 + a_1 x + a_2 x^2 + \cdots + a_m x^m$ is the minimal polynomial of $\alpha$, then neither $a_0$ nor $a_m$ can be zero, since otherwise $\alpha$ would satisfy a polynomial of lower degree.

**LEMMA 1 (The rational root theorem)**: If $x = p / q$ is a rational root of a polynomial $a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0$, where $a_i$ are integer coefficients with neither $a_n$ nor $a_0$ zero (otherwise the polynomial is equivalent to one of lesser degree), and where $p$ and $q$ are integers without any common factor, then $p$ divides $a_0$ and $q$ divides $a_n$.

**Proof**: By hypothesis, $a_n (p/q)^n + a_{n-1} (p/q)^{n-1} + \cdots + a_1 (p/q) + a_0 = 0.$ After multiplying through by $q^n$, taking the lowest-order (constant) term to the right-hand side and factoring out $p$ from the other terms, we obtain $$p (a_n p^{n-1} + a_{n-1} p^{n-2}q + \cdots + a_1 q^{n-1}) = - a_0 q^n.$$ Since the large expression in parentheses is an integer, $p$ divides the left-hand side and thus also the right-hand side. But since $p$ and $q$ have no common factors, $p$ has no common factor with $q^n$ either. Thus $p$ must divide $a_0$. Similarly, by taking the highest-order term to the right-hand side and factoring out $q$ from the other terms, we can write $$q (a_{n-1} p^{n-1} + a_{n-2} p^{n-2} q + \cdots + a_0 q^{n-1}) = - a_n p^n.$$ But again, since the expression in parentheses is an integer, $q$ divides the left-hand side and thus must also divide the right-hand side. Since $q$ has no common factor with $p^n$, it must divide $a_n$.

**Examples**: The constant $1 + \sqrt{2}$ satisfies the polynomial $x^2 - 2 x - 1$, so that its algebraic degree cannot exceed two. By Lemma 1, the only possible rational roots are $x = 1$ and $x = -1$, and neither satisfies $x^2 - 2x - 1 = 0$. Thus the polynomial $x^2 - 2 x - 1$ is irreducible and is the minimal polynomial of $1 + \sqrt{2}$, so that ${\rm deg}(1 + \sqrt{2}) = 2$. As another example, consider $\sqrt[3]{2}$, which satisfies $x^3 - 2 = 0$. If this polynomial factors at all over the rational numbers, it has a linear factor and thus a rational root. But by Lemma 1, the only possible rational roots are $x = \pm 1$ and $x = \pm 2$, and none of these satisfies $x^3 - 2 = 0$. Thus $x^3 - 2$ is irreducible and is the minimal polynomial of $\sqrt[3]{2}$, so that ${\rm deg}(\sqrt[3]{2}) = 3$.
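The candidate test used in these examples is mechanical, so it can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `divisors` and `rational_roots` are hypothetical, coefficients are listed from the constant term upward, and exact arithmetic is done with the standard `fractions` module.

```python
from fractions import Fraction

def divisors(n):
    """Positive divisors of a nonzero integer n."""
    n = abs(n)
    return [d for d in range(1, n + 1) if n % d == 0]

def rational_roots(coeffs):
    """Rational roots of a_0 + a_1 x + ... + a_n x^n, where coeffs = [a_0, ..., a_n]
    are integers with a_0 and a_n nonzero. Only the candidates p/q allowed by
    Lemma 1 (p divides a_0, q divides a_n) are tested."""
    a0, an = coeffs[0], coeffs[-1]
    candidates = {Fraction(sign * p, q)
                  for p in divisors(a0)
                  for q in divisors(an)
                  for sign in (1, -1)}

    def value(x):
        # Exact evaluation of the polynomial at the rational number x.
        return sum(c * x**k for k, c in enumerate(coeffs))

    return sorted(x for x in candidates if value(x) == 0)

print(rational_roots([-1, -2, 1]))    # x^2 - 2x - 1: no rational roots
print(rational_roots([-2, 0, 0, 1]))  # x^3 - 2: no rational roots
print(rational_roots([-6, 1, 1]))     # x^2 + x - 6 = (x - 2)(x + 3): roots -3 and 2
```

An empty result for a polynomial of degree 2 or 3 means it has no linear factor, which is exactly the irreducibility argument used above.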

**Definitions: Algebraic fields.**

By an *algebraic field*, we will mean, for our purposes here, a set of numbers $F$ that includes the rational numbers $R$, with the usual four arithmetic operations defined, and with the property that if $x$ and $y$ are in $F$, then so are the sum $x+y$, difference $x-y$, product $x y$, and, if $x \neq 0$, the reciprocal $1/x$. There is a rich theory of algebraic fields, but we will not require anything here more than this definition and two basic lemmas (Lemmas 2 and 3), which we prove here.

**LEMMA 2 (Algebraic extensions are fields)**: Let $R$ be the rational numbers, and let $\alpha$ be algebraic of degree $m \ge 2$, with minimal polynomial $P(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_m x^m$. Then the set $S = \{r_0 + r_1 \alpha + r_2 \alpha^2 + \cdots + r_{m-1} \alpha^{m-1}\}$, where $r_i \in R$, is a field.

**Proof**: If $s = s_0 + s_1 \alpha + s_2 \alpha^2 + \cdots + s_{m-1} \alpha^{m-1}$, where $s_i \in R$, and $t = t_0 + t_1 \alpha + t_2 \alpha^2 + \cdots + t_{m-1} \alpha^{m-1}$, where $t_i \in R$, are elements of $S$, their sum $s + t = (s_0 + t_0) + (s_1 + t_1) \alpha + \cdots + (s_{m-1} + t_{m-1}) \alpha^{m-1}$ is clearly in $S$, as is their difference. We can write their product as: $$st = s_0 t_0 + (s_0 t_1 + s_1 t_0) \alpha + (s_0 t_2 + s_1 t_1 + s_2 t_0) \alpha^2 + \cdots + s_{m-1} t_{m-1} \alpha^{2m-2}.$$ This expression involves terms with $\alpha^k$ for $k \ge m$, but note that the term with $\alpha^m$ can be rewritten by solving the minimal polynomial $P(\alpha) = 0$ to obtain $\alpha^m = -(a_0/a_m) - (a_1/a_m) \alpha - \cdots - (a_{m-1}/a_m) \alpha^{m-1}$; higher powers of $\alpha$ can similarly be rewritten in terms of $\alpha^k$ for $k \le m-1$. Thus the product $st$ is in $S$. The reciprocal of a nonzero $s = Q(\alpha) = s_0 + s_1 \alpha + s_2 \alpha^2 + \cdots + s_{m-1} \alpha^{m-1}$ can be calculated by applying the extended Euclidean algorithm to the polynomials $P(x)$ and $Q(x)$, where polynomial long division, dropping the remainder, is used instead of ordinary division. In this case, where $P(x)$ and $Q(x)$ have no common factor (since $P(x)$ is irreducible and $Q(x)$ is a nonzero polynomial of lower degree), the extended Euclidean algorithm produces polynomials $A(x)$ and $B(x)$, of lower degree than $P(x)$, such that $A(x) P(x) + B(x) Q(x) = 1$. Substituting $x = \alpha$ and recalling that $P(\alpha) = 0$, we have $B(\alpha) Q(\alpha) = 1$, so that $1/s = 1 / Q(\alpha) = B(\alpha)$, which is in $S$ since ${\rm deg}(B(x)) \le m - 1$. This proves that $S$ is a field.

**The extended Euclidean algorithm**: The extended Euclidean algorithm is best illustrated by using it to find the reciprocal (multiplicative inverse) of an integer $m$ in the field of integers modulo $p$, where $p$ is a prime. By the “field of integers modulo $p$,” we mean the set of integers $\{0, 1, 2, \cdots, p - 1\}$, where multiples of $p$ are added to or subtracted from the results of the operations $+, \, -, \, \times$ to reduce them to the range $(0, 1, 2, \cdots, p - 1)$. We will illustrate the case $m = 100$ and $p = 257$ (a prime). First start with a column vector containing $100$ and $257$, together with a $2 \times 2$ identity matrix. Then operate as follows, where ${\rm int}(\cdot)$ denotes integer part, where $R_1$ and $R_2$ denote the first and second rows of the column vector, and where $C_1$ and $C_2$ denote the first and second columns of the $2 \times 2$ matrix:

\begin{align*}
\left( \begin{array}{r} 100 \\ 257 \end{array} \right) & \quad
\left( \begin{array}{rr} 1 & 0 \\ 0 & 1 \end{array} \right) & {\rm int}(257/100) = 2, \; {\rm so} \; R_2 \leftarrow R_2 - 2 \cdot R_1 \; {\rm and} \; C_1 \leftarrow C_1 + 2 \cdot C_2. \\
\left( \begin{array}{r} 100 \\ 57 \end{array} \right) & \quad
\left( \begin{array}{rr} 1 & 0 \\ 2 & 1 \end{array} \right) & {\rm int}(100/57) = 1, \; {\rm so} \; R_1 \leftarrow R_1 - 1 \cdot R_2 \; {\rm and} \; C_2 \leftarrow C_2 + 1 \cdot C_1. \\
\left( \begin{array}{r} 43 \\ 57 \end{array} \right) & \quad
\left( \begin{array}{rr} 1 & 1 \\ 2 & 3 \end{array} \right) & {\rm int}(57/43) = 1, \; {\rm so} \; R_2 \leftarrow R_2 - 1 \cdot R_1 \; {\rm and} \; C_1 \leftarrow C_1 + 1 \cdot C_2. \\
\left( \begin{array}{r} 43 \\ 14 \end{array} \right) & \quad
\left( \begin{array}{rr} 2 & 1 \\ 5 & 3 \end{array} \right) & {\rm int}(43/14) = 3, \; {\rm so} \; R_1 \leftarrow R_1 - 3 \cdot R_2 \; {\rm and} \; C_2 \leftarrow C_2 + 3 \cdot C_1. \\
\left( \begin{array}{r} 1 \\ 14 \end{array} \right) & \quad
\left( \begin{array}{rr} 2 & 7 \\ 5 & 18 \end{array} \right) & {\rm int}(14/1) = 14, \; {\rm so} \; R_2 \leftarrow R_2 - 14 \cdot R_1 \; {\rm and} \; C_1 \leftarrow C_1 + 14 \cdot C_2. \\
\left( \begin{array}{r} 1 \\ 0 \end{array} \right) & \quad
\left( \begin{array}{rr} 100 & 7 \\ 257 & 18 \end{array} \right) & {\rm one \; entry \; of \; vector \; is \; zero, \; so \; process \; ends; \; result \; = \; 18.}
\end{align*}

Thus $100^{-1} \bmod 257 = 18$. Indeed, $18 \cdot 100 - 7 \cdot 257 = 1$. Note that $18$ appears in the final matrix, in the opposite corner from the input value $100$. If the number of iterations performed is even, the sign of the result must be changed (this is not necessary in the above example, which takes five iterations). The polynomial version of the extended Euclidean algorithm operates in exactly the same way, except: (a) the division operation is replaced by polynomial long division, dropping the remainder; and (b) if at termination the nonzero constant in the column vector is not $1$, then the result must be divided by this constant.
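The column-vector and matrix bookkeeping above can be sketched in Python as follows. This is a minimal illustration, not a production routine: the name `mod_inverse` is hypothetical, and the sketch assumes $0 < m < p$ with $m$ and $p$ sharing no common factor. Rather than counting iterations, it applies the sign change whenever the final $1$ lands in the second row of the column vector, which is the situation in which the opposite-corner entry carries the wrong sign.

```python
def mod_inverse(m, p):
    """Inverse of m modulo p using the column-vector / 2x2-matrix scheme.
    Assumes 0 < m < p and that m and p have no common factor."""
    v = [m, p]                       # column vector with rows R1, R2
    M = [[1, 0], [0, 1]]             # 2x2 matrix with columns C1, C2
    while v[0] != 0 and v[1] != 0:
        if v[1] >= v[0]:
            q = v[1] // v[0]         # int(R2 / R1)
            v[1] -= q * v[0]         # R2 <- R2 - q * R1
            M[0][0] += q * M[0][1]   # C1 <- C1 + q * C2
            M[1][0] += q * M[1][1]
        else:
            q = v[0] // v[1]         # int(R1 / R2)
            v[0] -= q * v[1]         # R1 <- R1 - q * R2
            M[0][1] += q * M[0][0]   # C2 <- C2 + q * C1
            M[1][1] += q * M[1][0]
    if max(v) != 1:
        raise ValueError("m and p have a common factor; no inverse exists")
    if v[1] == 0:
        result = M[1][1]             # 1 ended in row 1: opposite corner, sign unchanged
    else:
        result = -M[1][0]            # 1 ended in row 2: opposite corner, sign changed
    return result % p

print(mod_inverse(100, 257))         # the worked example above
print(mod_inverse(3, 7))             # a case where the sign change is needed
```

The invariant behind the scheme is that the matrix times the current column vector always equals the original column vector, and the matrix keeps determinant $1$, which is why the opposite corner yields the inverse up to sign.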

**Example**: Let $\alpha$ be a root of the polynomial $P(x) = 1 + x + x^3$ (note that $P(x)$ is irreducible by Lemma 1, so it is the minimal polynomial of $\alpha$), and define $S = \{r_0 + r_1 \alpha + r_2 \alpha^2\}$ for $r_i \in R$. Let $s = 1 + 2 \alpha + \alpha^2$ and $t = 1 + \alpha^2$ be two typical elements of $S$. Clearly their sum $s + t = 2 + 2 \alpha + 2 \alpha^2$ is in $S$, as is their difference. Their product is $s t = 1 + 2 \alpha + 2 \alpha^2 + 2 \alpha^3 + \alpha^4$. Since $P(\alpha) = 0$, we can solve the minimal polynomial to write $\alpha^3 = -1 - \alpha$ and $\alpha^4 = - \alpha - \alpha^2$, and rewrite the product as $s t = -1 -\alpha + \alpha^2$, which then is in $S$. To find the reciprocal of $s = 1 + 2 \alpha + \alpha^2$, we apply the extended Euclidean algorithm to the polynomials $P(x) = 1 + x + x^3$ and $Q(x) = 1 + 2 x + x^2$. The algorithm proceeds as follows, where ${\rm quot}(\cdot,\cdot)$ denotes polynomial quotient, dropping remainder:

\begin{align*}
\left( \begin{array}{c} x^2 + 2 x + 1 \\ x^3 + x + 1 \end{array} \right) & \quad \left( \begin{array}{cc} 1 & 0 \\ 0 & 1 \end{array} \right) & \begin{array}{c} {\rm quot}((x^3 + x + 1), (x^2 + 2 x + 1)) = x - 2, \; {\rm so} \\ R_2 \leftarrow R_2 - (x - 2) \cdot R_1 \; {\rm and} \; C_1 \leftarrow C_1 + (x - 2) \cdot C_2. \end{array} \\
\left( \begin{array}{c} x^2 + 2 x + 1 \\ 4 x + 3 \end{array} \right) & \quad \left( \begin{array}{cc} 1 & 0 \\ x - 2 & 1 \end{array} \right) & \begin{array}{c} {\rm quot}((x^2 + 2 x + 1), (4 x + 3)) = x / 4 + 5/16, \; {\rm so} \\ R_1 \leftarrow R_1 - (x/4 + 5/16) \cdot R_2 \; {\rm and} \; C_2 \leftarrow C_2 + (x/4 + 5/16) \cdot C_1. \end{array} \\
\left( \begin{array}{c} 1/16 \\ 4 x + 3 \end{array} \right) & \quad \left( \begin{array}{cc} 1 & x/4 + 5/16 \\ x - 2 & x^2/4 - 3 x/16 + 3/8 \end{array} \right) & \begin{array}{c} {\rm quot}((4x + 3), 1/16) = 64 x + 48, \; {\rm so} \\ R_2 \leftarrow R_2 - (64 x + 48) \cdot R_1 \; {\rm and} \; C_1 \leftarrow C_1 + (64 x + 48) \cdot C_2. \end{array} \\
\left( \begin{array}{c} 1/16 \\ 0 \end{array} \right) & \quad \left( \begin{array}{cc} 16 + 32 x + 16 x^2 & x / 4 + 5/16 \\ 16 x^3 + 16 x + 16 & x^2/4 - 3 x /16 + 3/8 \end{array} \right) & {\rm process \; ends; \; result} \; = \; x^2 / 4 - 3 x / 16 + 3 / 8.
\end{align*}

After dividing the result $(x^2 / 4 - 3 x / 16 + 3 / 8)$ by $1/16$, we obtain $B(x) = 4 x^2 - 3 x + 6$, so that $1/s = B(\alpha) = 6 - 3 \alpha + 4 \alpha^2$. Indeed, $(6 - 3 \alpha + 4 \alpha^2)s = (6 - 3 \alpha + 4 \alpha^2)(1 + 2 \alpha + \alpha^2) = 6 + 9 \alpha + 4 \alpha^2 + 5 \alpha^3 + 4 \alpha^4$, which, after recalling $\alpha^3 = -1 - \alpha$ and $\alpha^4 = -\alpha - \alpha^2$, gives $1$.
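The arithmetic in this example can be double-checked with exact rational polynomial arithmetic. The sketch below is illustrative only: polynomials are assumed to be stored as coefficient lists with the constant term first, the helper names (`trim`, `add`, `mul`, `divmod_poly`, `inverse_mod`) are hypothetical, and the inverse is computed with the usual remainder-sequence form of the extended Euclidean algorithm rather than the matrix layout shown above.

```python
from fractions import Fraction

def trim(a):
    """Copy of a with trailing zero coefficients removed (lowest degree first)."""
    a = list(a)
    while a and a[-1] == 0:
        a.pop()
    return a

def add(a, b):
    n = max(len(a), len(b))
    a = list(a) + [0] * (n - len(a))
    b = list(b) + [0] * (n - len(b))
    return trim([x + y for x, y in zip(a, b)])

def mul(a, b):
    if not a or not b:
        return []
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return trim(out)

def divmod_poly(a, b):
    """Polynomial long division of a by nonzero b: returns (quotient, remainder)."""
    a, b = trim(a), trim(b)
    q = [Fraction(0)] * max(len(a) - len(b) + 1, 1)
    while len(a) >= len(b):
        shift = len(a) - len(b)
        c = Fraction(a[-1]) / b[-1]          # ratio of leading coefficients
        q[shift] = c
        a = trim([x - c * (b[i - shift] if i >= shift else 0)
                  for i, x in enumerate(a)])
    return trim(q), a

def inverse_mod(Q, P):
    """B(x) with B(x) Q(x) = 1 modulo P(x); assumes P irreducible, P not dividing Q."""
    r0, r1 = trim(P), trim(Q)                # remainder sequence
    t0, t1 = [], [Fraction(1)]               # r_i = (multiple of P) + t_i * Q
    while r1:
        q, r = divmod_poly(r0, r1)
        r0, r1 = r1, r
        t0, t1 = t1, add(t0, mul([-1], mul(q, t1)))
    return [c / r0[0] for c in t0]           # r0 is a nonzero constant; scale it to 1

P = [1, 1, 0, 1]                             # 1 + x + x^3
s = [1, 2, 1]                                # 1 + 2x + x^2
t = [1, 0, 1]                                # 1 + x^2
print(divmod_poly(mul(s, t), P)[1])          # s*t reduced modulo P
print(inverse_mod(s, P))                     # coefficients of 1/s
```

Reducing a product modulo $P(x)$ by long division is equivalent to the substitutions $\alpha^3 = -1 - \alpha$ and $\alpha^4 = -\alpha - \alpha^2$ used in the text.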

**Definitions: Linear independence and degrees.**

Suppose that we have a field $S$ that contains the rational numbers $R$ (i.e., $R \subset S$), such that every $s \in S$ can be written as $s = r_1 \alpha_1 + r_2 \alpha_2 + \cdots + r_m \alpha_m$, with $r_i \in R$, and with the property that the $\alpha_i$ are *linearly independent*: if $r_1 \alpha_1 + r_2 \alpha_2 + \cdots + r_m \alpha_m = 0$, with $r_i \in R$, then each $r_i = 0$. Such an extension is said to be of *degree* $m$, which we will denote here as ${\rm deg}(S / R) = m$ (this is also denoted $[S:R] = m$).

Suppose $\alpha$ is algebraic of degree $m$, with minimal polynomial $a_0 + a_1 x + a_2 x^2 + \cdots + a_m x^m$. Then consider the field $S$ consisting of all $s$ that can be written $s = r_0 + r_1 \alpha + r_2 \alpha^2 + \cdots + r_{m-1} \alpha^{m-1}$, with $r_i \in R$. Since there are $m$ terms, it is clear that ${\rm deg}(S/R) \leq m$. Now suppose that, for some choice of $r_0, r_1, \cdots, r_{m-1}$, we have $r_0 + r_1 \alpha + r_2 \alpha^2 + \cdots + r_{m-1} \alpha^{m-1} = 0$. Since the minimal polynomial of $\alpha$ has degree $m$, and not degree $m-1$ or less, this can only happen if each $r_i = 0$. Thus ${\rm deg}(S/R) = m$ exactly. In other words, *the notions of algebraic degree and field extension degree coincide when the field is generated by an algebraic number*.

**Example**: Consider the case $S = R(\sqrt[3]{2})$, where $S$ is the set of all $s$ that can be written $s = r_0 + r_1 \sqrt[3]{2} + r_2 (\sqrt[3]{2})^2$, with $r_i \in R$. Clearly $R \subset S$, and, since there are three terms, ${\rm deg}(S/R) \leq 3$. Now suppose that, for some choice of $r_0, r_1, r_2$, we have $r_0 + r_1 \sqrt[3]{2} + r_2 (\sqrt[3]{2})^2 = 0$. Since $\sqrt[3]{2}$ satisfies the polynomial $x^3 - 2 = 0$, and this polynomial is irreducible, as we showed above, $x^3 - 2$ is the minimal polynomial of $\sqrt[3]{2}$ and ${\rm deg}(\sqrt[3]{2}) = 3$. Since the minimal polynomial of $\sqrt[3]{2}$ is of degree $3$ and not $2$ or less, it follows that $r_0 + r_1 \sqrt[3]{2} + r_2 (\sqrt[3]{2})^2 = 0$ only when $r_0 = r_1 = r_2 = 0$. Thus ${\rm deg}(S/R) = 3$ exactly.

We now prove an important fact about the degrees of field extensions, which will be key to our main impossibility proofs below:

**LEMMA 3 (Multiplication of degrees of field extensions)**: Suppose that we have three fields $R, S, T$ (where $R$ is the field of rational numbers as before) such that $R \subset S \subset T$, and with ${\rm deg}(S/R) = m$ and ${\rm deg}(T/S) = n$. Then ${\rm deg} (T/R) = {\rm deg} (S/R) \cdot {\rm deg} (T/S) = m \cdot n$.

**Proof**: For ease of notation, we will prove this in the specific case $m = 2, n = 3$, i.e., where ${\rm deg} (S/R) = 2$ and ${\rm deg} (T/S) = 3$, although the argument is completely general. By hypothesis, all $s \in S$ can be written as $s = r_1 \alpha_1 + r_2 \alpha_2$, where $r_i \in R$ are rational numbers, and with the property that if $r_1 \alpha_1 + r_2 \alpha_2 = 0$, then each $r_i = 0$. In a similar way, by hypothesis all $t \in T$ can be written as $t = s_1 \beta_1 + s_2 \beta_2 + s_3 \beta_3$, where $s_i \in S$, and with the property that if $s_1 \beta_1 + s_2 \beta_2 + s_3 \beta_3 = 0$, then each $s_i = 0$.

First note that when we write any $t \in T$ as $t = s_1 \beta_1 + s_2 \beta_2 + s_3 \beta_3,$ where $s_1, s_2, s_3 \in S$, each of these $s_i$ can in turn be rewritten in terms of $r_i$ and $\alpha_i$: in particular, $s_1 = a_1 \alpha_1 + a_2 \alpha_2$, for some $a_i \in R$; $s_2 = b_1 \alpha_1 + b_2 \alpha_2$, for some $b_i \in R$; and $s_3 = c_1 \alpha_1 + c_2 \alpha_2$, for some $c_i \in R$. Thus we can write $$t = (a_1 \alpha_1 + a_2 \alpha_2) \beta_1 + (b_1 \alpha_1 + b_2 \alpha_2) \beta_2 + (c_1 \alpha_1 + c_2 \alpha_2) \beta_3 $$ $$= a_1 (\alpha_1 \beta_1) + a_2 (\alpha_2 \beta_1) + b_1 (\alpha_1 \beta_2) + b_2 (\alpha_2 \beta_2) + c_1 (\alpha_1 \beta_3) + c_2 (\alpha_2 \beta_3),$$ where $a_1, a_2, b_1, b_2, c_1, c_2 \in R$. We now see that we can write any $t \in T$ as a linear sum of the six terms shown, with coefficients in $R$. Thus ${\rm deg}(T/R) \leq 6$. Now suppose that some $t \in T$ is zero, so that $$t = (a_1 \alpha_1 + a_2 \alpha_2) \beta_1 + (b_1 \alpha_1 + b_2 \alpha_2) \beta_2 + (c_1 \alpha_1 + c_2 \alpha_2) \beta_3 = 0.$$ By hypothesis, this can only happen if each of the three coefficient expressions $(a_1 \alpha_1 + a_2 \alpha_2)$, $(b_1 \alpha_1 + b_2 \alpha_2)$ and $(c_1 \alpha_1 + c_2 \alpha_2)$ is zero, which in turn is only possible if each $a_1, a_2, b_1, b_2, c_1, c_2$ is zero. Thus the six terms $\alpha_1 \beta_1, \alpha_2 \beta_1, \alpha_1 \beta_2, \alpha_2 \beta_2, \alpha_1 \beta_3$ and $\alpha_2 \beta_3$ are linearly independent, so that ${\rm deg}(T/R) = 6$ exactly.

The more general case where ${\rm deg} (S/R) = m$ and ${\rm deg} (T/S) = n$ is handled in exactly the same way, only with somewhat more complicated notation.

**Definitions: Constructible numbers.**

The next step is to establish what kinds of numbers can be constructed by ruler and compass. Readers familiar with the mathematics of ruler-and-compass constructions may skip to the “Angle trisection” section below.

First, it is clear that addition can be easily done with ruler and compass by marking the two distances on a straight line, then the combined distance is the sum. Subtraction can be done similarly. Multiplication can also be done by ruler and compass, as illustrated in the figure to the right (taken from here), and division is just the inverse of this process.

What’s more, one can compute square roots by ruler and compass. In the second illustration to the right (taken from here), assuming that $AC = 1$, simple properties of right triangles show that the distance $AD = \sqrt{AB}$.

Thus, starting with a distance of unit length, numbers composed of any finite combination of the four arithmetic operations and square roots can be constructed. The converse is also true: Such numbers are the *only* numbers that can be constructed, as we will now show.

**LEMMA 4 (Degrees of constructible numbers)**: If $S$ is the number field resulting from a finite sequence of ruler-and-compass constructions, then ${\rm deg}(S/R) = 2^m$ for some integer $m$.

**Proof**: Ruler-and-compass constructions can only extend the rational number field by a sequence of one of the following operations, each of which has algebraic degree 1 or 2 over the field generated by the previous operation:

- Finding the $(x,y)$ coordinates of the intersection of two lines is equivalent to solving a set of two linear equations in two unknowns, which involves only the four arithmetic operations. Note that this does *not* involve any field extension.
- Finding the $(x,y)$ coordinates of the intersection of a line with a circle is equivalent to solving a system of a linear equation and a quadratic equation, which requires only arithmetic and one square root.
- Finding the $(x,y)$ coordinates of the intersection of two circles is equivalent to solving two simultaneous quadratic equations, which again requires only arithmetic and one square root. See any high school analytic geometry text for details.
- Copying the distance between any two constructed points to the $x$ axis is equivalent to computing the distance between the two points, which requires only arithmetic and one square root.

Thus, by Lemma 3, the algebraic degree over the rationals of the final number field so constructed, after a sequence of $n$ ruler-and-compass operations, is $2^m$ for some integer $m \le n$.
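To make the "arithmetic and one square root" claim concrete for the line-and-circle case, here is a small sketch in Python. It is an illustration under assumptions, not part of the proof: the function name `line_circle` and its return convention are hypothetical, and all inputs are taken to be rational so that everything except the single square root stays exact.

```python
from fractions import Fraction
import math

def line_circle(a, b, c, h, k, r2):
    """Intersect the line a*x + b*y = c with the circle (x-h)^2 + (y-k)^2 = r2.

    Returns rational numbers (x0, y0, dx, dy, d) such that the two points are
    (x0 + dx*sqrt(d), y0 + dy*sqrt(d)) and (x0 - dx*sqrt(d), y0 - dy*sqrt(d)),
    or None if the line misses the circle.
    """
    a, b, c, h, k, r2 = map(Fraction, (a, b, c, h, k, r2))
    n = a * a + b * b                # squared length of the normal vector (a, b)
    e = (a * h + b * k - c) / n      # scaled signed offset of the center from the line
    x0, y0 = h - a * e, k - b * e    # foot of the perpendicular from the center
    d = (r2 - e * e * n) / n         # the only quantity whose square root is needed
    if d < 0:
        return None
    return x0, y0, -b, a, d

# Unit circle and the horizontal line y = 1/2.
x0, y0, dx, dy, d = line_circle(0, 1, Fraction(1, 2), 0, 0, 1)
print(x0, y0, dx, dy, d)
x = float(x0) + float(dx) * math.sqrt(d)
y = float(y0) + float(dy) * math.sqrt(d)
print(x * x + y * y)                 # should be very close to 1
```

Every returned quantity is obtained from the inputs by the four arithmetic operations, so the coordinates of the intersection points lie in an extension of degree at most $2$ over the field containing the inputs.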

**Definitions: Angle trisection.**

We first must keep in mind that some angles *can* be trisected. For example, a right angle can be trisected, because a 30 degree angle can be constructed, simply by bisecting one of the interior angles of an equilateral triangle. To show the impossibility result, we need only exhibit a single angle that cannot be trisected. We choose a 60 degree angle (i.e., $\pi/3$ radians). In other words, we will show that a 20 degree angle (i.e., $\pi/9$ radians) cannot be constructed. Note that it is sufficient to prove that the number $x = \cos(\pi/9)$ cannot be constructed, because if it could, then by copying this distance to the $x$ axis, with the left end at the origin, then constructing a perpendicular from the right end to the unit circle, the resulting angle will be the desired trisection.

Here we require only an elementary fact of trigonometry, namely the formula for the cosine of a triple angle. For convenience, we will prove this and some related formulas here, based only on elementary geometry and a little algebra. Readers familiar with these formulas may skip to the “Proof of the main theorems” section below.

**LEMMA 5 (Triple angle formula for cosine)**: $\cos(3\alpha) = 4 \cos^3(\alpha) - 3 \cos(\alpha).$

**Proof**: We start with the formula $$\sin (\alpha + \beta) = \sin (\alpha) \cos (\beta) + \cos (\alpha) \sin (\beta).$$ This fact has a simple geometric proof, which is illustrated to the right, where $OP = 1$. First note that $\angle RPQ = \alpha, \, PQ = \sin(\beta)$ and $OQ = \cos (\beta)$. Further, $AQ/OQ = \sin(\alpha)$, so $AQ = \sin(\alpha) \cos(\beta)$, and $PR/PQ = \cos(\alpha)$, so $PR = \cos(\alpha) \sin(\beta)$. Combining these results, $$\sin(\alpha + \beta) = PB = RB + PR = AQ + PR = \sin(\alpha) \cos(\beta) + \cos(\alpha) \sin(\beta).$$ [This proof and illustration are taken from here.] By applying this formula to $(\pi/2 - \alpha)$ and $(-\beta)$, one obtains the related formula: $$\cos(\alpha + \beta) = \cos(\alpha) \cos(\beta) - \sin(\alpha) \sin(\beta).$$ From these formulas one immediately obtains the formulas $\sin(2\alpha) = 2 \sin(\alpha) \cos(\alpha)$ and $\cos(2\alpha) = 2 \cos^2(\alpha) - 1$. Now we can write: $$\cos(3\alpha) = \cos (\alpha + 2\alpha) = \cos(\alpha) \cos(2\alpha) - \sin(\alpha) \sin(2\alpha) = \cos(\alpha) (2 \cos^2(\alpha) - 1) - 2 \sin^2(\alpha)\cos(\alpha)$$ $$= \cos(\alpha) (2 \cos^2(\alpha) - 1) - 2 (1 - \cos^2(\alpha))\cos(\alpha) = 4 \cos^3(\alpha) - 3 \cos(\alpha).$$
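As a quick sanity check on Lemma 5 (not a substitute for the proof), the identity can be tested numerically. The helper name below is made up for this sketch, and the tolerance is an assumption about floating-point rounding.

```python
import math

def triple_angle_residual(alpha):
    """Return cos(3*alpha) - (4*cos(alpha)**3 - 3*cos(alpha))."""
    c = math.cos(alpha)
    return math.cos(3 * alpha) - (4 * c**3 - 3 * c)

# Sample a few angles, including the 20-degree angle (pi/9) discussed above.
for alpha in (0.1, 1.0, math.pi / 9, 2.5):
    print(alpha, triple_angle_residual(alpha))   # residuals should be near zero
```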

**Proof of the main theorems:**

**THEOREM 1**: There is no ruler-and-compass scheme to trisect an arbitrary angle.

**Proof**: As mentioned above, we will show that the angle $\pi/3$ radians or 60 degrees cannot be trisected, or, in other words, that the number $\cos (\pi/9)$ cannot be constructed. From Lemma 5 we can write $$\cos (3(\pi/9)) = \cos (\pi/3) = 1/2 = 4 \cos^3(\pi/9) - 3 \cos(\pi/9),$$ so that $\cos(\pi/9)$ is a root of the polynomial equation $8 x^3 - 6 x - 1 = 0$. If this cubic polynomial factors at all, it must have a linear factor and hence a rational root. But by Lemma 1, the only possible rational roots are $x = \pm 1, \, x = \pm 1/2, \, x = \pm 1/4$ and $x = \pm 1/8$, none of which satisfies $8 x^3 - 6 x - 1 = 0$. Thus $8 x^3 - 6 x - 1$ is irreducible and is the minimal polynomial of $\cos(\pi/9)$, so that $\deg(\cos(\pi/9)) = 3$. Hence any extension field $F$ containing $\cos(\pi/9)$ must have $\deg(F/R) = 3n$ for some integer $n$. However, by Lemma 4 the only possible degrees for number fields produced by ruler-and-compass constructions are powers of two. Since $2^m$ is not divisible by $3$, no finite sequence of ruler-and-compass constructions can produce an angle of $\pi/9$ or 20 degrees, which is the trisection of the angle $\pi/3$ or 60 degrees.

**THEOREM 2**: There is no ruler-and-compass scheme to duplicate an arbitrary cube.

**Proof**: This follows by the same line of reasoning, since duplication of a cube is equivalent to constructing the number $\sqrt[3]{2}$, which is a root of the polynomial equation $x^3 - 2 = 0$. As noted in one of the examples above, if $x^3 - 2$ factors at all, one of the factors must be linear, yielding a rational root. But by Lemma 1, the only possible rational roots are $x = \pm 1$ and $x = \pm 2$, and none of these satisfies $x^3 - 2 = 0$. Thus $x^3 - 2$ is irreducible and is the minimal polynomial of $\sqrt[3]{2}$, so that ${\rm deg}(\sqrt[3]{2}) = 3$. Hence any extension field $F$ containing $\sqrt[3]{2}$ must have $\deg(F/R) = 3n$ for some integer $n$. Since the only possible degrees for number fields produced by ruler-and-compass constructions are powers of two, and $2^m$ is not divisible by $3$, the number $\sqrt[3]{2}$ cannot be constructed.
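The rational-root checks used in the two proofs above can be reproduced with exact arithmetic. The sketch below assumes Lemma 1 is the usual rational root test (numerator divides the constant term, denominator divides the leading coefficient); the function names are illustrative only.

```python
from fractions import Fraction

def divisors(n):
    """Positive divisors of a nonzero integer n."""
    n = abs(n)
    return [d for d in range(1, n + 1) if n % d == 0]

def rational_root_candidates(coeffs):
    """Candidates p/q for rational roots of an integer polynomial.

    coeffs lists the coefficients from the constant term upward;
    p divides the constant term and q divides the leading coefficient.
    """
    ps, qs = divisors(coeffs[0]), divisors(coeffs[-1])
    return sorted({sign * Fraction(p, q) for p in ps for q in qs for sign in (1, -1)})

def evaluate(coeffs, x):
    """Evaluate the polynomial at x by Horner's rule (exact for Fractions)."""
    result = 0
    for c in reversed(coeffs):
        result = result * x + c
    return result

def rational_roots(coeffs):
    """Candidates that are actual roots."""
    return [x for x in rational_root_candidates(coeffs) if evaluate(coeffs, x) == 0]

print(rational_roots([-1, -6, 0, 8]))   # 8x^3 - 6x - 1  -> []
print(rational_roots([-2, 0, 0, 1]))    # x^3 - 2        -> []
```

Both lists come back empty, which is the computational content of "none of which satisfies" in each proof.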

For other proofs in this series see the listing at Simple proofs of great theorems.

**Simple proofs: The fundamental theorem of algebra**

This theorem has a long, tortuous history. In 1608, Peter Roth wrote that a polynomial equation of degree $n$ with real coefficients may have $n$ solutions, but offered no proof. Leibniz and Nikolaus Bernoulli both asserted that quartic polynomials of the form $x^4 + a^4$, where $a$ is real and nonzero, could not be solved or factored into linear or quadratic factors, but Euler later pointed out that $$x^4 + a^4 = (x^2 + a \sqrt{2} x + a^2) (x^2 - a \sqrt{2} x + a^2).$$ Eventually mathematicians became convinced that something like the fundamental theorem must be true. Numerous mathematicians, including d’Alembert, Euler, Lagrange, Laplace and Gauss, published proofs in the 1700s, but each was later found to be flawed or incomplete. The first complete and fully rigorous proof was by Argand in 1806. For additional historical background on the fundamental theorem of algebra, see this Wikipedia article.

A proof of the fundamental theorem of algebra is typically presented in a college-level course in complex analysis, but only after an extensive background of underlying theory such as Cauchy’s theorem, the argument principle and Liouville’s theorem. Only a tiny percentage of college students ever take such coursework, and even many of those who do take such coursework never really grasp the essential idea behind the fundamental theorem of algebra, because of the way it is typically presented in textbooks (this was certainly the editor’s initial experience with the fundamental theorem of algebra). All this is generally accepted as an unpleasant but unavoidable feature of mathematical pedagogy.

We present here a proof of the fundamental theorem of algebra. We emphasize that it is both *elementary* and *self-contained*: it relies only on a well-known completeness axiom and some straightforward reasoning with estimates and inequalities. It should be understandable by anyone with a solid high school background in algebra, trigonometry and complex numbers, although some experience with limits and estimates, as is commonly taught in high school or first-year college calculus courses, would also help.

**Gist of the proof**:

Readers familiar with Newton’s method for solving equations will recall that one starts with a reasonably close approximation to a root, then adjusts the approximation by moving closer in an appropriate direction. We will employ a similar strategy here: assuming that the polynomial has no root, we consider the argument where the polynomial function achieves its minimum absolute value, and show that there is a nearby argument where the polynomial function has an even smaller absolute value. This contradicts the minimality, so the polynomial must have a root.

**Definitions and axioms**:

In the following $p(z)$ will denote the $n$-th degree polynomial $p(z) = p_0 + p_1 z + p_2 z^2 + \cdots + p_n z^n$, where the coefficients $p_i$ are any complex numbers, with neither $p_0$ nor $p_n$ equal to zero (otherwise the polynomial is equivalent to one of lesser degree). We will utilize a fundamental completeness property of real and complex numbers, namely that a continuous real-valued function on a closed and bounded set achieves its minimum at some point in the domain. This can be taken as an axiom, or can be easily proved by applying other well-known completeness axioms, such as the Cauchy sequence axiom or the nested interval axiom. See this Wikipedia article for more discussion on completeness axioms.

**THEOREM 1**: Every non-constant polynomial with real or complex coefficients has at least one complex root.

**Proof**: Suppose that $p(z)$ has no roots in the complex plane. First note that for large $z$, say $|z| > 2 \max_i |p_i/p_n|$, the $z^n$ term of $p(z)$ is greater in absolute value than the sum of all the other terms. Thus given some $B > 0$, then for any sufficiently large $s$, we have $|p(z)| > B$ for all $z$ with $|z| \ge s$. We will take $B = 2 |p(0)| = 2 |p_0|$. Since $|p(z)|$ is continuous on the interior and boundary of the circle with radius $s$, it follows by the completeness axiom mentioned above that $|p(z)|$ achieves its minimum value at some point $t$ in this circle (possibly on the boundary). But since $|p(0)| < 1/2 \cdot |p(z)|$ for all $z$ on the circumference of the circle, it follows that $|p(z)|$ achieves its minimum at some point $t$ in the *interior* of the circle.

Now rewrite the polynomial $p(z)$, translating the argument $z$ by $t$, thus producing a new polynomial $$q(z) = p(z + t) \; = \; q_0 + q_1 z + q_2 z^2 + \cdots + q_n z^n,$$ and similarly translate the circle described above. By our supposition, the polynomial $q(z)$, defined on some circle centered at the origin (which circle is contained within the translated circle above), has a minimum absolute value $M > 0$ at $z = 0$. Note that $M = |q(0)| = |q_0| = |p(t)|$.

Our proof strategy is to construct some point $x$, close to the origin, such that $|q(x)| < |q(0)|,$ thus contradicting the presumption that $|q(z)|$ has a minimum nonzero value at $z = 0$. If our method gives us merely a direction in the complex plane for which the function value decreases in magnitude (a *descent direction*), then by moving a small distance in that direction, we hope to achieve our goal of constructing a complex $x$ such that $|q(x)| < |q(0)|$. This is the strategy we will pursue.

**Construction of $x$ such that $|q(x)| < |q(0)|$**:

Let the first nonzero coefficient of $q(z)$, following $q_0$, be $q_m$, so that $q(z) = q_0 + q_m z^m + q_{m+1} z^{m+1} + \cdots + q_n z^n$. We will choose $x$ to be the complex number $$x = r \left(\frac{- q_0}{ q_m}\right)^{1/m},$$ where $r$ is a small positive real value we will specify below, and where $(-q_0/q_m)^{1/m}$ denotes any of the $m$-th roots of $(-q_0/q_m)$.

**Comment**: As an aside, note that unlike the real numbers, in the complex number system the $m$-th roots of a real or complex number are always guaranteed to exist: if $z = z_1 + i z_2$ is nonzero, with $z_1$ and $z_2$ real, then the $m$-th roots of $z$ are given explicitly by $$\{R^{1/m} \cos ((\theta + 2k\pi)/m) + i R^{1/m} \sin ((\theta+2k\pi)/m), \, k = 0, 1, \cdots, m-1\},$$ where $R = \sqrt{z_1^2 + z_2^2}$ and $\theta$ is the angle of $z$ in the complex plane, namely $\theta = \arctan (z_2/z_1)$ adjusted to the quadrant in which $z$ lies. The guaranteed existence of $m$-th roots, a feature of the complex number system, is the key fact behind the fundamental theorem of algebra.
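The explicit formula in the comment above translates directly into code. This is a minimal sketch using Python's standard `cmath` module; the function name `mth_roots` is chosen here for illustration. Note that `cmath.phase` returns the angle in the correct quadrant, which a bare arctangent of $z_2/z_1$ would not.

```python
import cmath
import math

def mth_roots(z, m):
    """Return the m distinct m-th roots of a nonzero complex number z."""
    R = abs(z)                 # modulus sqrt(z1^2 + z2^2)
    theta = cmath.phase(z)     # angle of z, in (-pi, pi]
    return [
        R ** (1.0 / m) * complex(math.cos((theta + 2 * k * math.pi) / m),
                                 math.sin((theta + 2 * k * math.pi) / m))
        for k in range(m)
    ]

# Example: the three cube roots of -8 (one of them is the real number -2).
for w in mth_roots(-8, 3):
    print(w, abs(w**3 - (-8)))   # second column should be near zero
```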

**Proof that $|q(x)| < |q(0)|$**:

With the definition of $x$ given above, we can write

$$q(x) = q_0 - q_0 r^m + q_{m+1} r^{m+1} \left(\frac{-q_0} {q_m}\right)^{(m+1)/m} + \cdots + q_n r^n \left(\frac{-q_0} {q_m}\right)^{n/m}$$ $$= q_0 - q_0 r^m + E,$$ where the extra terms $E$ can be bounded as follows. Assume that $|q_0| \leq |q_m|$ (a very similar expression is obtained for $|E|$ in the case $|q_0| \geq |q_m|$), take $r < 1$, and define $s = r(|q_0/q_m|)^{1/m}$, so that $s \leq r < 1$. Then, by applying the well-known formula for the sum of a geometric series, we can write $$|E| \leq r^{m+1} \max_i |q_i| \left|\frac{q_0}{q_m}\right|^{(m+1)/m} (1 + s + s^2 + \cdots + s^{n-m-1}) \leq \frac{r^{m+1}\max_i |q_i|}{1 - s} \left|\frac{q_0}{q_m}\right|^{(m+1)/m}.$$ Thus $|E|$ can be made arbitrarily smaller than $|q_0 r^m| = |q_0|r^m$ by choosing $r$ small enough. For instance, select $r$ so that $|E| < |q_0|r^m / 2$. Then for such an $r$, we have $$|q(x)| = |q_0 - q_0 r^m + E| \leq |q_0|(1 - r^m) + |E| < |q_0|(1 - r^m / 2) < |q_0| = |q(0)|,$$ which contradicts the original assumption that $|q(z)|$ has a minimum nonzero value at $z = 0$.
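The construction can be tried numerically. The sketch below picks the point $x = r(-q_0/q_m)^{1/m}$ for a sample polynomial and compares $|q(x)|$ with $|q(0)|$. The polynomial $q(z) = 2 + z^2 + 3z^3$ and the values of $r$ are arbitrary choices for illustration, and the function names are not from the text.

```python
import cmath

def poly_eval(q, z):
    """Evaluate q[0] + q[1]*z + ... + q[n]*z**n by Horner's rule."""
    result = 0
    for c in reversed(q):
        result = result * z + c
    return result

def descent_point(q, r):
    """Return x = r * (-q_0/q_m)**(1/m), with q_m the first nonzero coefficient after q_0."""
    m = next(i for i in range(1, len(q)) if q[i] != 0)
    ratio = -q[0] / q[m]
    root = cmath.exp(cmath.log(ratio) / m)   # one particular m-th root
    return r * root

# Illustrative polynomial (an assumption of this sketch): q(z) = 2 + z^2 + 3z^3,
# so q_0 = 2, q_1 = 0 and m = 2.
q = [2, 0, 1, 3]
for r in (0.5, 0.1, 0.01):
    x = descent_point(q, r)
    print(r, abs(poly_eval(q, x)), abs(q[0]))   # middle column is smaller than the last
```

For this example the terms $q_0 - q_0 r^m$ equal $2 - 2r^2$, and the remaining cubic term plays the role of $E$.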

**THEOREM 2**: Every polynomial of degree $n$ with real or complex coefficients has exactly $n$ complex roots, when counting individually any repeated roots.

**Proof**: If $\alpha$ is a real or complex root of the polynomial $p(z)$ of degree $n$ with real or complex coefficients (at least one such root exists by Theorem 1), then by dividing this polynomial by $(z - \alpha)$, using the well-known polynomial division process, one obtains $p(z) = (z - \alpha) q(z) + r$, where $q(z)$ has degree $n - 1$ and $r$ is a constant. But note that $p(\alpha) = r = 0$, so that $p(z) = (z - \alpha) q(z)$. Continuing by induction, we conclude that the original polynomial $p(z)$ has exactly $n$ complex roots, although some might be repeated.
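The division step in this proof is ordinary synthetic division. A minimal sketch follows, with coefficients listed from the constant term upward; the function name `deflate` and the sample cubic are chosen for illustration.

```python
def deflate(p, alpha):
    """Divide p(z) by (z - alpha) using synthetic division.

    p lists coefficients from the constant term upward. Returns (q, r)
    with p(z) = (z - alpha) * q(z) + r, where r equals p(alpha).
    """
    n = len(p) - 1
    q = [0] * n
    carry = 0
    for i in range(n, 0, -1):       # work down from the leading coefficient
        carry = p[i] + carry * alpha
        q[i - 1] = carry
    r = p[0] + carry * alpha
    return q, r

# Example: p(z) = z^3 - 6z^2 + 11z - 6 = (z - 1)(z - 2)(z - 3).
p = [-6, 11, -6, 1]
q, r = deflate(p, 1)
print(q, r)            # [6, -5, 1] 0, i.e. z^2 - 5z + 6 with zero remainder
print(deflate(q, 2))   # ([-3, 1], 0), i.e. z - 3 with zero remainder
```

Each deflation lowers the degree by one, which is the induction step: after $n$ deflations only a constant remains.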

For other proofs in this series see the listing at Simple proofs of great theorems.

**Simple proofs: The irrationality of pi**

Mankind has been fascinated with $\pi$, the ratio between the circumference of a circle and its diameter, for at least 2500 years. Ancient Hebrews used the approximation 3 (see 1 Kings 7:23 and 2 Chron. 4:2). Babylonians used the approximation 3 1/8. Archimedes, in the first rigorous analysis of $\pi$, proved that 3 10/71 < $\pi$ < 3 1/7, by means of a sequence of inscribed and circumscribed polygons. Later scholars in India (where decimal arithmetic was first developed, at least by 300 CE), China and the Middle East computed $\pi$ ever more accurately. In 1665, Newton computed 16 digits, but, as he later confessed, “I am ashamed to tell you to how many figures I carried these computations, having no other business at the time.” In 1844 the computing prodigy Johann Dase produced 200 digits. In 1874, Shanks published 707 digits, although later it was found that only the first 527 were correct.

In the 20th century, $\pi$ was computed to thousands, then to millions, then to billions of digits, in part due to some remarkable new formulas and algorithms for $\pi$, and in part due to clever computational techniques, all accelerated by the relentless advance of Moore’s Law. The most recent computation, as far as the present author is aware, produced 22.4 trillion digits. One may download up to the first trillion digits here. For additional details on the history and computation of $\pi$, see The quest for $\pi$ and The computation of previously inaccessible digits of pi^2 and Catalan’s constant.

For many years, part of the motivation for computing $\pi$ was to answer the question of whether $\pi$ was a rational number: if $\pi$ were rational, the decimal expansion would eventually repeat. But given that no repetitions were found in early computations, many leading mathematicians in the 17th and 18th centuries concluded that $\pi$ must be irrational. In 1761, Johann Heinrich Lambert settled the question by proving that $\pi$ is irrational. Then in 1882 Ferdinand von Lindemann proved that $\pi$ is transcendental, which proved once and for all that the ancient Greek problem of squaring the circle is impossible (because ruler-and-compass constructions can only produce power-of-two degree algebraic numbers).

We present here what we believe to be the simplest proof that $\pi$ is irrational. It was first published in the 1930s as an exercise in the Bourbaki treatise on calculus. It requires only a familiarity with integration, including integration by parts, which is a staple of any high school or first-year college calculus course. It is similar to, but simpler than (in the present author’s opinion), a related proof due to Ivan Niven.

**Gist of the proof:**

The basic idea is to define a function $A_n(b)$, based on an integral from $0$ to $\pi$. This function has the property that for each positive integer $b$ and for all sufficiently large integers $n$, $A_n(b)$ lies strictly between 0 and 1. Yet another line of reasoning, assuming $\pi$ is a rational number, and applying integration by parts, concludes that $A_n(b)$ must be an integer. This contradiction shows that $\pi$ must be irrational.

**THEOREM**: $\pi$ is irrational.

**Proof**:

For each positive integer $b$ and non-negative integer $n$, define $$A_n(b) = b^n \int_0^\pi \frac{x^n(\pi - x)^n \sin(x)}{n!} \, {\rm d}x.$$ Note that the integrand function of $A_n(b)$ is zero at $x = 0$ and $x = \pi$, but is strictly positive elsewhere in the integration interval. Thus $A_n(b) > 0$. In addition, since $x (\pi - x) \le (\pi/2)^2$, we can write $$A_n(b) \le \frac{\pi b^n}{n!}\left(\frac{\pi}{2}\right)^{2n} = \frac{\pi (b \pi^2 / 4)^n}{n!},$$ which is less than one for large $n$, since, by Stirling’s formula, the $n!$ in the denominator increases faster than the $n$-th power in the numerator. Thus we have established, for any integer $b \ge 1$ and all sufficiently large $n$, that $0 < A_n(b) < 1.$
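To see how quickly the factorial wins, one can tabulate the upper bound $\pi (b \pi^2/4)^n / n!$. The sketch below works with logarithms (via `math.lgamma`) to avoid overflow for larger $b$; the function names are illustrative, and $b = 7$ is just a sample value.

```python
import math

def log_upper_bound(b, n):
    """Natural log of the bound pi * (b * pi^2 / 4)^n / n! on A_n(b)."""
    return math.log(math.pi) + n * math.log(b * math.pi**2 / 4) - math.lgamma(n + 1)

def n_below_one(b):
    """Return an n such that the bound is below 1 for this and every larger n.

    Once n is at least b*pi^2/4, each further step multiplies the bound by
    a factor less than 1, so the search starts there.
    """
    n = math.ceil(b * math.pi**2 / 4)
    while log_upper_bound(b, n) >= 0:
        n += 1
    return n

for b in (1, 7):
    n = n_below_one(b)
    print(b, n, math.exp(log_upper_bound(b, n)))
```

For $b = 1$ the bound first drops below one at $n = 6$.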

Now let us assume that $\pi = a/b$ for relatively prime positive integers $a$ and $b$. Define $f(x) = x^n (a - bx)^n / n!$, so that $b^n x^n (\pi - x)^n / n! = f(x)$. Then we can write, using integration by parts, $$A_n(b) = \int_0^\pi f(x) \sin(x) \, {\rm d}x = \left[-f(x) \cos(x)\right]_0^\pi - \left[-f'(x) \sin(x)\right]_0^\pi + \cdots$$ $$ \cdots \pm \left[f^{(2n)} (x) \cos(x)\right]_0^\pi \pm \int_0^\pi f^{(2n+1)} (x) \cos(x) \, {\rm d}x.$$ Now note that for $k = 0$ to $k = n - 1$, the derivative $f^{(k)} (x)$ is zero at $x = 0$ and $x = \pi$. For $k = n$ to $2n$, $f^{(k)} (x)$ takes various integer values at $x = 0$ and $x = \pi$; and for $k = 2n + 1$, the derivative is identically zero, since $f$ is a polynomial of degree $2n$. Also, the function $\sin(x)$ is $0$ at $x = 0$ and $x = \pi$, while the function $\cos(x)$ is $1$ at $x = 0$ and $-1$ at $x = \pi$. Combining these facts, we conclude that $A_n (b)$ must be an integer. This contradicts the earlier result that $0 < A_n(b) < 1$ for all sufficiently large $n$, and so proves that $\pi$ is irrational.
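The integrality of the derivatives of $f$ at $x = 0$ can be checked with exact arithmetic. The sketch below expands $n!\,f(x) = \sum_j \binom{n}{j} a^{n-j} (-b)^j x^{n+j}$ and reads off $f^{(k)}(0)$ as $k!/n!$ times the coefficient of $x^k$. The values $a = 22$, $b = 7$ are a hypothetical stand-in for the assumed fraction, used only to exercise the formula. By the symmetry $f(a/b - x) = f(x)$, the values at $x = a/b$ agree with those at $0$ up to sign.

```python
from fractions import Fraction
from math import comb, factorial

def derivatives_at_zero(a, b, n):
    """Return [f(0), f'(0), ..., f^(2n+1)(0)] for f(x) = x^n (a - b x)^n / n!."""
    values = []
    for k in range(2 * n + 2):
        j = k - n
        # Coefficient of x^k in n! * f(x); zero unless n <= k <= 2n.
        coeff = comb(n, j) * a ** (n - j) * (-b) ** j if 0 <= j <= n else 0
        values.append(Fraction(factorial(k) * coeff, factorial(n)))
    return values

# Hypothetical a/b = 22/7 and n = 4, chosen only for illustration.
vals = derivatives_at_zero(22, 7, 4)
for k, v in enumerate(vals):
    print(k, v)
```

The derivatives of order below $n$ vanish, those of order $n$ through $2n$ are integers (for instance $f^{(n)}(0) = a^n$), and the derivative of order $2n + 1$ is zero.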

For other proofs in this series see the listing at Simple proofs of great theorems.

**Simple proofs of great theorems**

Modern mathematics is one of the most enduring edifices created by humankind, a magnificent form of art and science that all too few have the opportunity of appreciating. The great British mathematician G.H. Hardy wrote, “Beauty is the first test; there is no permanent place in the world for ugly mathematics.” Mathematician-philosopher Bertrand Russell added: “Mathematics, rightly viewed, possesses not only truth, but supreme beauty — a beauty cold and austere, like that of sculpture, without appeal to any part of our weaker nature, without the gorgeous trappings of painting or music, yet sublimely pure, and capable of a stern perfection such as only the greatest art can show.”

Yet it is a sad fact that few people on this planet even begin to grasp the full power and beauty of mathematics. For the vast majority of humankind, even in first-world nations, all students see of mathematics is some relatively dull drilling in decimal arithmetic (e.g., the dreaded times tables), some very basic algebra and various textbook “story problems.” Most never glimpse the higher-level beauty of real modern mathematics that lies at the end of the tunnel. Many otherwise promising students lose interest in mathematics as a result of dreary instruction, often by teachers who did not major in mathematics and for whom teaching mathematics is a chore. Inevitably, math homework often loses out to social media, computer games, parties and sports. The result is a tragic waste of desperately needed mathematical talent for the present and future information age.

Part of the problem here is that hardly any students ever see some of the more beautiful parts of mathematics, such as elegant proofs of important mathematical theorems. Some of these theorems may be mentioned in textbooks, but often their proofs are dismissed as “beyond the scope of this text,” relegated to some higher-level mathematics course only taken by advanced math majors or graduate students, which is an extremely tiny audience. Even when these proofs are part of the curriculum, the presentation in textbooks often fails to highlight the key ideas in a way that students can easily grasp and gain intuition.

As a single example, the fundamental theorem of algebra, namely the fact that every algebraic equation of degree *n* has *n* roots, typically is presented in a college-level course in complex analysis, and then only after an extensive background of underlying theory such as Cauchy’s theorem, the argument principle and Liouville’s theorem. Only a tiny percentage of college students ever take such coursework, and even many of those who do take such coursework never really grasp the essential idea behind the fundamental theorem of algebra, because of the way it is typically presented in textbooks (this was certainly the editor’s initial experience with the fundamental theorem of algebra). All this is generally accepted as an unpleasant but unavoidable feature of mathematical pedagogy.

The editor of this blog rejects this defeatism. He is convinced that many of the greatest theorems of mathematics can be proved significantly more simply, and requiring significantly less background, than they are typically presented in traditional textbooks and courses.

Thus the editor has decided to start a new feature in this blog, namely to present simple, beautiful and readily understandable proofs of a number of important theorems. In most cases, as we will see, this can be done without sacrificing rigor, and yet also without a huge background of prior coursework and machinery that many younger students especially might not have the patience or time for.

The proofs envisioned for presentation will be designed to satisfy the following requirements:

- The proofs will include some historical background and context.
- In most cases, the most simple, elegant and beautiful proof of a given theorem will be the one presented.
- Whenever possible, the proofs will be suitable for presentation to bright high school students or, at most, first- or second-year college mathematics students.
- Any necessary lemmas or other background material beyond very basic facts will be included.
- The entire proof, including historical context and background material, will not exceed three pages or so.

An impossible dream? The editor is convinced otherwise.

Here are the articles currently in the series. Others will be added in the future:

- The irrationality of pi.
- The fundamental theorem of algebra.
- The impossibility of trisection.
- The fundamental theorem of calculus.
- Archimedes’ calculation of pi.

Comments on any of this material are certainly welcome. What’s more, if a reader has a suggestion for some theorem that has not yet been featured, or even for an alternate proof of some theorem that already has been featured, please contact the editor. His email is given at Welcome to the Math Scholar blog.

**New books and articles on the “great silence”**

As we have explained in previous Math Scholar blogs (see, for example, MS1 and MS2), the question of why the heavens are silent, even though, from all evidence, the universe is teeming with potentially habitable exoplanets, continues to perplex and fascinate scientists. It is one of the most significant questions of modern science, with connections to mathematics, physics, astronomy, cosmology, biology and philosophy.

In spite of the glib dismissals that are often presented in public venues and (quite sadly) in writings by some professional scientists (see MS1 and MS2 for examples and rejoinders), there is no easy resolution to Fermi’s paradox, as it is known.

**Sociological explanations**, such as “they have lost interest in research and exploration,” or “they are under a galactic directive not to disturb Earth,” or “they have moved on to more advanced communication technologies,” all fall prey to a diversity argument: Any advanced society that arose from an evolutionary process (which is the only conceivable natural process to explain the origin of complex beings), surely must consist of millions if not billions or even trillions of diverse individuals. All it takes is for one individual in just one of these advanced civilizations to disagree, say with the galactic directive not to disturb Earth, and to send probes or communications to Earth in a form that earthlings or other newly technological societies can easily recognize, and this explanation fails. Note, for example, that any signal (microwave, light or gravitational wave), once it has been sent by extraterrestrials (ETs) on its way to Earth, cannot be censored or called back, according to known laws of physics.

Similarly, **technological explanations** (i.e., it is too difficult for them) were once considered persuasive, but now founder on the fact that even present-day human technology is sufficient to communicate with and (soon) to send probes to distant stars and planets (see, for example, technical reports TR1, TR2, TR3 and TR4). If we can do this now, or within the next 20-30 years, surely civilizations thousands or millions of years more advanced can do this much better, and much more cheaply too. This same space technology, by the way, defeats the “all advanced civilizations destroy themselves” explanation, since as soon as human civilization (for instance) establishes permanent outposts on the Moon, Mars and beyond, its continued survival will be largely impervious to any calamities that may befall the home planet. By the way, the self-destruct explanation also falls prey to a diversity argument — even if most civilizations destroy themselves, surely at least a handful have figured out how to survive and thrive over the long term?

So are we alone, at least in the Milky Way? Perhaps, although many (including the present author) find such a conclusion rather distasteful, to say the least, since it goes against the Copernican principle that has guided scientific research for many years. See previous Math Scholar blogs MS1 and MS2 for a more complete discussion of these issues.

Recently two books and a *Scientific American* article (which summarizes a third book) have appeared on this topic. Here is a brief synopsis of each.

In his book If the Universe Is Teeming with Aliens … WHERE IS EVERYBODY?: Seventy-Five Solutions to the Fermi Paradox and the Problem of Extraterrestrial Life, British physicist Stephen Webb updates his 2002 edition with a thorough discussion of 50 earlier proposed solutions to Fermi’s paradox, updated to reflect the latest scientific findings, plus 25 new proposed solutions for this edition. The discussion is very readable and focused, yet does not omit technical detail where needed.

Here are just a few of the proposed solutions presented (often with devastating rejoinders) in Webb’s book. Item numbers and titles are from Webb’s book:

- *The Zoo scenario*: We are part of a preserve, being tended by an overarching society of ETs.
- *The interdict scenario*: There is a galaxy-wide prohibition on visits to or communication with Earth.
- *They have not had time to reach us*: ETs may exist, but they have not yet been able to reach us or communicate with us.
- *Bracewell-von Neumann probes*: ETs could have explored the galaxy by means of self-replicating probes that arrive at a star system, find a suitable asteroid or planet, construct additional copies of themselves, and then launch them to other star systems, with updated software beamed from the home planet. Such probes could have explored the entire Milky Way in a few million years, an eyeblink in cosmic time. So where are they?
- *They are signaling, but we don’t know how to listen*: We are not using the right technology to receive and/or decipher their signals.
- *They have no desire to communicate*: ETs have settled into the good life and no longer have any desire to explore the cosmos or learn about other newly technological societies like us.
- *Intelligence isn’t permanent*: Intelligence has arisen elsewhere, but then degraded because it was not needed so much anymore.
- *We live in a post-biological universe*: ETs have progressed from biological to machine-based existence and no longer have interest in us.
- *Continuously habitable zones are narrow*: The Earth is nearly unique in maintaining a habitable environment over billions of years.
- *The galaxy is a dangerous place*: Much if not most of the galaxy is uninhabitable due to a steady rain of supernova explosions, gamma ray bursts and other phenomena that regularly sterilize their surroundings.
- *Life’s genesis is rare*: Although progress has been made, the origin of the first reproducing biomolecules on Earth is still a mystery. Perhaps biogenesis was a freak of nature, much more singular than we suppose.
- *Intelligence at the human level is rare*: Even if life is common, perhaps human-level intelligence, with its associated technology, is extremely rare.
- *Science is not inevitable*: Perhaps even if intelligence emerges, the development of science and advanced technology is rare.

Webb discusses each of these proposed solutions in detail, with numerous references to technical literature and other analyses. He also frankly discusses the weaknesses of each of these, and, in many places, offers his own assessment. In the end, Webb acknowledges that while we still have much to learn, he personally thinks it is unlikely that there are any other human-like technological societies in the Milky Way.

In his new book The Great Silence: Science and Philosophy of Fermi’s Paradox, Milan Cirkovic analyzes Fermi’s paradox from a somewhat deeper philosophical and scientific basis. He argues that many approaches to the great silence are riddled with logical errors and other weaknesses. He then attempts to carefully examine many of the proposed solutions and to identify what can be safely said.

For example, Cirkovic argues that discussions of the Drake equation, which is used to estimate the number of space-communicating civilizations in the Milky Way, are typically fallacious. Drake’s equation, which was first proposed by Frank Drake in the 1960s, is:

$$N = R_{*} \, f_{p} \, n_{e} \, f_{l} \, f_{i} \, f_{c} \, L,$$

where $N$ is the number of predicted ET civilizations, $R_{*}$ is the star formation rate in the Milky Way, $f_{p}$ is the fraction of stars possessing planets, $n_{e}$ is the average number of habitable planets per planetary system, $f_{l}$ is the fraction of habitable planets possessing life, $f_{i}$ is the fraction of inhabited planets developing intelligent life, $f_{c}$ is the fraction of intelligent societies developing space communication technology, and $L$ is the lifespan of civilizations that have developed technology.

Cirkovic argues that because of the huge uncertainties involved, each term should be given by a probability distribution, and the product integrated over the multidimensional parameter space. In any event, since virtually all of these terms are matters of active astrobiological research, writers who cite Drake’s equation without analyzing it more carefully are on shaky ground.
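One way to picture the "probability distribution for each term" idea is a simple Monte Carlo simulation, which approximates the integral over the parameter space by random sampling. The sketch below is only an illustration: the log-uniform form and every numeric range in it are placeholder assumptions made for this example, not values from Cirkovic's book or from the research literature.

```python
import math
import random
import statistics

# Illustrative (lo, hi) ranges only: placeholders, NOT values from
# Cirkovic's book or from any astrobiological study.
ASSUMED_RANGES = {
    "R_star": (1.0, 10.0),
    "f_p": (0.1, 1.0),
    "n_e": (0.01, 1.0),
    "f_l": (1e-6, 1.0),
    "f_i": (1e-6, 1.0),
    "f_c": (0.01, 1.0),
    "L": (1e2, 1e8),
}

def sample_log_uniform(lo, hi, rng):
    """Draw a value whose logarithm is uniform between log(lo) and log(hi)."""
    return math.exp(rng.uniform(math.log(lo), math.log(hi)))

def sample_N(rng, ranges=ASSUMED_RANGES):
    """One Monte Carlo draw of N = R_* f_p n_e f_l f_i f_c L."""
    n = 1.0
    for lo, hi in ranges.values():
        n *= sample_log_uniform(lo, hi, rng)
    return n

def drake_distribution(num_samples=100_000, seed=0):
    """Sorted list of sampled N values, reproducible for a given seed."""
    rng = random.Random(seed)
    return sorted(sample_N(rng) for _ in range(num_samples))

samples = drake_distribution()
print("median N:", statistics.median(samples))
print("fraction of draws with N < 1:", sum(s < 1 for s in samples) / len(samples))
```

The output is a whole distribution for $N$ rather than a single number, so the conclusions depend heavily on the assumed ranges. That sensitivity is the point of the criticism.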

In the end, Cirkovic argues that in addition to identifying and correcting prejudices and fallacies in this arena, we need to realize that substantive progress awaits more real scientific data. But he insists that we must not retreat from the Copernican principle, namely that ultimately our existence is not unique. He concludes on a positive note:

Even when [space exploration] is achieved, it will not be the beginning of the end — but it might be the end of the beginning of the greatest of all journeys, the wildest adventure of all adventures. … After all, this is what science has stood for in its most brilliant moments: courage, conviction, and the spirit of great adventure. These qualities might, in the final analysis, withstand the test of cosmic time, repulse the last challenge to Copernicanism, and ultimately break the Great Silence.

Astrophysicist and well-known science writer John Gribbin has weighed in on the debate with an article in the August 2018 *Scientific American* entitled "Are Humans Alone in the Milky Way? Why we are probably the only intelligent life in the galaxy." His article presents a brief summary of his 2011 book *Alone in the Universe: Why Our Planet Is Unique*, with a number of updates reflecting recent scientific discoveries and findings.

Gribbin argues that we are the product of a long series of remarkably fortuitous (and unlikely) events that might not have been repeated anywhere else, at least within the Milky Way. These include:

*Special timing*: Given that all elements heavier than hydrogen, helium and lithium were created in supernova explosions, a metal-rich environment like our sun and solar system could not exist until several billion years after the big bang, yet it must arise not so long after that the sun begins to die.

*Special location*: While the Milky Way seems vast, regions much closer than our sun to the center of the galaxy are bathed in high-energy gamma particles and gamma ray bursts, which are lethal to any conceivable form of biology. Much farther from the center than our sun, the required metallicity and other conditions are not met. In other words, we reside in a Goldilocks zone in the Milky Way, estimated to span only about 7% of the galactic radius.

*Special planet*: In spite of all the talk about exoplanets in the Milky Way, only a few are rocky like Earth, fewer reside in the habitable zone, and fewer still (or perhaps none at all) have these features plus a molten core generating a magnetic field together with plate tectonics to regulate climate. In our solar system, Mars is too far away, has no magnetic field and is too cold (although primitive organisms might exist underground). Venus lacks a magnetic field and plate tectonics, and, because of a runaway greenhouse effect, is much too hot for any life. The collision of a Mars-size object with Earth in the early solar system, and the presence of a large moon, also appear to be key to Earth’s long-term, life-friendly environment.

*Special life*: Once Earth’s original molten state settled, life appeared with “indecent rapidity,” in Gribbin’s words. But then not much happened for the next three billion years. More complex cells, known as eukaryotes, resulted from the chance merger of two primordial organisms, bacteria and archaea, but even after this crucial milestone event, not much happened for another billion years, until the Cambrian explosion roughly 540 million years ago. So the rise of complex life forms, much less intelligent creatures such as Homo sapiens, was hardly inevitable.

*Special species*: Homo sapiens appeared about 300,000 years ago, but nearly went extinct 150,000 years ago, and again about 70,000 years ago. Thus our subsequent domination of the planet seems far from assured.

Gribbin concludes as follows:

As we put everything together, what can we say? Is life likely to exist elsewhere in the galaxy? Almost certainly yes, given the speed with which it appeared on Earth. Is another technological civilization likely to exist today? Almost certainly no, given the chain of circumstances that led to our existence. These considerations suggest we are unique not just on our planet but in the whole Milky Way. And if our planet is so special, it becomes all the more important to preserve this unique world for ourselves, our descendants and the many creatures that call Earth home.

As mentioned above, the question of extraterrestrial intelligent life, and why we have not yet found any, surely must rank at or near the top of the most significant scientific questions of all time. In his foreword to Webb’s book, noted British astronomer Martin Rees explains what is at stake:

Maybe we will one day find ET. On the other hand, [Webb’s] book offers 75 reasons why SETI searches may fail; Earth’s intricate biosphere may be unique. That would disappoint the searchers, but it would have an upside: it would entitle us humans to be less “cosmically modest.” Moreover, this outcome would not render life a cosmic sideshow. Evolution may still be nearer its beginning than its end. Our Solar System is barely middle aged and, if humans avoid self-destruction, the post-human era beckons. Life from Earth could spread through the Galaxy, evolving into a teeming complexity far beyond what we can even conceive. If so, our tiny planet — this pale blue dot floating in space — could be the most important place in the entire Galaxy, and the first interstellar voyagers from Earth would have a mission that would resonate through the entire Galaxy and perhaps beyond.