Simple proofs: The fundamental theorem of calculus


Isaac Newton, Credit: sjisblog.com

Introduction:
The fundamental theorem of calculus, namely the fact that integration is the inverse of differentiation, is indisputably one of the most important results of all mathematics, with applications across the whole of modern science and engineering. It is not an exaggeration to say that our entire modern world hinges on the fundamental theorem of calculus. It has applications in astronomy, astrophysics, quantum theory, relativity, geology, biology, economics, just to name a few fields of science, as well as countless applications in all types of engineering — civil, mechanical, electrical, electronic, chip manufacture, aerospace, medical and more.

The fundamental theorem of calculus was first stated and proved in rudimentary form in the 1600s by James Gregory, and, in improved form, by Isaac Barrow, while Gottfried Leibniz coined the notation and theoretical framework that we still use today.

But it was Isaac Newton who grasped the full impact of the theorem and applied it to unravel both the cosmos and the everyday world. In particular, Newton’s second law of motion states that force is the product of mass and acceleration, where acceleration is the second derivative of position with respect to time. The second law can then be solved using the fundamental theorem of calculus to predict motion and much else, once the basic underlying forces are known. Ernst Mach, writing in 1901, graciously acknowledged Newton’s contributions to science in general, and to calculus in particular, in these terms:

All that has been accomplished in mechanics since his day has been a deductive, formal, and mathematical development of mechanics on the basis of Newton’s laws.

We present here a rigorous and self-contained proof of the fundamental theorem of calculus (Parts 1 and 2), including proofs of necessary underlying lemmas such as the fact that a continuous function on a closed interval is integrable. These proofs are based only on elementary algebra and some basic completeness axioms of real numbers, and thus are suitable for anyone with a high school background in mathematics, although some familiarity with limits, inequalities, derivatives and integrals is required. We believe this exposition to be significantly more concise than most textbook treatments, and thus easier to grasp.

Definitions: Continuous functions, derivatives and integrals.
A function $f(t)$ defined on a closed interval $[a,b]$ is continuous at a point $t \in [a,b]$ if, given any $\epsilon \gt 0$, there is a $\delta \gt 0$ such that $|f(s) - f(t)| \lt \epsilon$ for all $s \in [a,b]$ with $|s - t| \lt \delta$. The function $f(t)$ is uniformly continuous on the closed interval $[a,b]$ if, given any $\epsilon \gt 0$, there exists a $\delta \gt 0$ (independent of $t$) such that $|f(s) - f(t)| \lt \epsilon$ for all $s, t \in [a,b]$ with $|s - t| \lt \delta$.

Credit: Wikimedia

Given a continuous function $f(t)$ on $[a,b]$, the derivative $f'(t)$ is defined for $t \in (a,b)$ as the limit, if it exists,
$$f'(t) = \lim_{h \to 0} \frac{f(t + h) - f(t)}{h}.$$ Note that it follows immediately from this definition that if $f(t) = g(t) + h(t)$ for all $t \in [a,b]$, then $f'(t) = g'(t) + h'(t)$ for all $t \in (a,b)$, and if $f(t) = C$ for all $t \in [a,b]$, then $f'(t) = 0$ for all $t \in (a,b)$. These facts will be used in Theorems 1 and 2 below.
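As a rough numerical illustration of this limit, the sketch below evaluates the difference quotient for shrinking $h$. The function $f(t) = t^3$, the point $t = 2$ and the helper name `difference_quotient` are choices made for this example only; the exact derivative there is $12$.

```python
def difference_quotient(f, t, h):
    """Return (f(t + h) - f(t)) / h, the quotient whose limit defines f'(t)."""
    return (f(t + h) - f(t)) / h


def cube(t):
    # Illustrative choice of function: f(t) = t^3, so f'(2) = 12.
    return t ** 3


# Evaluate the quotient at t = 2 for h = 0.1, 0.01, ..., 0.00001.
quotients = [difference_quotient(cube, 2.0, 10.0 ** -k) for k in range(1, 6)]

# Distance of each quotient from the exact derivative 12.
errors = [abs(q - 12.0) for q in quotients]

for k, (q, e) in enumerate(zip(quotients, errors), start=1):
    print(f"h = 1e-{k}: quotient = {q:.8f}, error = {e:.2e}")
```

For this particular $f$ the quotient equals $12 + 6h + h^2$, so the error shrinks roughly in proportion to $h$.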

The Riemann integral $\int_{a}^{b}f(t)\,{\rm d}t$ is defined informally as the signed area of the region in the $xy$-plane that is bounded by the graph of $f(t)$ and the $x$-axis between $x = a$ and $x = b$. Note that the area above the $x$-axis is positive and adds to the total area, while the area below the $x$-axis is negative and subtracts from the total area.

More formally, given the closed interval $[a, b]$, define a “tagged partition” of $[a,b]$ as a pair of sequences $(x_i, 0 \leq i \leq n)$ and $(t_i, 1 \leq i \le n)$, such that
$$a = x_0 \leq t_1 \leq x_1 \leq t_2 \leq x_2 \leq \cdots \leq x_{n-1} \leq t_n \leq x_n = b.$$ Note that the tagged partition $P = \{x_i, t_i\}$ divides the interval $[a,b]$ into $n$ sub-intervals $[x_{i-1}, x_i]$, each of which includes a point $t_i$. Let $d_i = x_i - x_{i-1}$, and let $D(P) = \max_i d_i$. The Riemann sum of the tagged partition $P$ is defined as $R(P) = \sum _{i=1}^{n}f(t_{i}) \, d_i$. The Riemann integral of $f(t)$ on $[a,b]$ is then defined as $I = \int_a^b f(t) \, {\rm d}t$, provided that, given any $\epsilon \gt 0$, there is a $\delta \gt 0$ such that for any tagged partition $P$ of $[a,b]$ with $D(P) \lt \delta$, the condition $|I - R(P)| \lt \epsilon$ is satisfied. Note that it follows immediately from this definition that for any $c \in (a,b)$, we have $\int_a^b f(t) \, {\rm d}t = \int_a^c f(t) \, {\rm d}t + \int_c^b f(t) \, {\rm d}t$. This fact will be used in Theorems 1 and 2 below.
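The definition can be tried out numerically. The following is a minimal sketch, not part of the proof: it builds tagged partitions with equally spaced $x_i$ and randomly chosen tags $t_i$ (both are assumptions of this example, as are the helper names), and compares $R(P)$ for $f(t) = t^2$ on $[0,1]$ with the known value $1/3$.

```python
import random


def riemann_sum(f, xs, ts):
    """R(P) = sum of f(t_i) * d_i for the tagged partition P = {x_i, t_i}."""
    return sum(f(t) * (x1 - x0) for x0, x1, t in zip(xs, xs[1:], ts))


def random_tagged_partition(a, b, n, rng):
    """Equally spaced x_i with one random tag t_i in each [x_{i-1}, x_i]."""
    xs = [a + (b - a) * i / n for i in range(n + 1)]
    ts = [rng.uniform(x0, x1) for x0, x1 in zip(xs, xs[1:])]
    return xs, ts


def square(t):
    return t * t


rng = random.Random(0)   # fixed seed so the run is repeatable
exact = 1.0 / 3.0        # known value of the integral of t^2 over [0, 1]
sizes = (10, 100, 1000)
gaps = []
for n in sizes:
    xs, ts = random_tagged_partition(0.0, 1.0, n, rng)
    gaps.append(abs(riemann_sum(square, xs, ts) - exact))

for n, gap in zip(sizes, gaps):
    print(f"n = {n}: D(P) = {1.0 / n}, |I - R(P)| = {gap:.2e}")
```

Whatever tags are drawn, the gap $|I - R(P)|$ here cannot exceed $1/n$, because for this increasing function the largest and smallest possible Riemann sums on the same partition differ by exactly $(f(1) - f(0))/n$.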

AXIOM 1 (Completeness axioms):
Axiom 1a (The Heine-Borel theorem): Any collection of open intervals covering a closed and bounded set of real numbers has a finite subcover.
Axiom 1b (Intermediate value theorem): Every continuous function on a closed real interval attains each value between and including its minimum and maximum.
Axiom 1c (Least upper bound / greatest lower bound theorem): Every set of reals that is bounded above has a least upper bound; every set of reals that is bounded below has a greatest lower bound.

Comment: These are not really “theorems” but instead are merely equivalent axioms of the property of completeness for real numbers — i.e., the property that the set of real numbers, unlike say the set of rational numbers, has no “holes.” Each of these axioms can be proven from the others. See the Wikipedia article Completeness of the real numbers and this Chapter for details.

LEMMA 1 (Continuity on a closed interval implies uniform continuity): Let $f(t)$ be a continuous function on the closed interval $[a,b]$. Then $f(t)$ is uniformly continuous on $[a,b]$.

Proof: Fix $\epsilon \gt 0$. By the definition given above for a continuous function, for every point $t \in [a,b]$ there is some $\delta_t \gt 0$ such that for all $s \in [a,b]$ with $|s - t| \lt \delta_t$, we have $|f(s) - f(t)| \lt \epsilon$. Then consider the collection of intervals $C(\epsilon) = \{(t - \delta_t, t + \delta_t), t \in [a,b]\}$. Clearly $C(\epsilon)$ is an open cover of $[a,b]$. Now by Axiom 1a, this collection has a finite subcover $C_n(\epsilon) = \{(t_1 - \delta_1, t_1 + \delta_1), (t_2 - \delta_2, t_2 + \delta_2), \cdots, (t_n - \delta_n, t_n + \delta_n)\}$. Let $\delta = \min_i \delta_i$. Now consider any $x, y \in [a,b]$ with $|x - y| \lt \delta$. If both $x$ and $y$ are in the same interval of the collection, say the one centered at $t_i$, then $|f(x) - f(y)| \le |f(x) - f(t_i)| + |f(t_i) - f(y)| \lt 2 \epsilon$. If they lie in adjacent intervals, then $|f(x) - f(y)| \lt 4 \epsilon$. Either way, since $\epsilon$ was arbitrary (the argument can be run with $\epsilon / 4$ in place of $\epsilon$), the condition for uniform continuity is satisfied.
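To make the idea of a single $\delta$ that works everywhere concrete, here is a small numerical check under assumptions chosen for this example: $f(t) = t^2$ on $[0,1]$, where $|s^2 - t^2| = |s + t| \, |s - t| \le 2 |s - t|$, so $\delta = \epsilon / 2$ should serve for every point at once. The helper `worst_gap` is hypothetical and only scans a finite grid, so it illustrates the lemma rather than proving anything.

```python
def worst_gap(f, a, b, delta, n=400):
    """Largest |f(s) - f(t)| over grid points s <= t in [a, b] with t - s < delta."""
    pts = [a + (b - a) * i / n for i in range(n + 1)]
    worst = 0.0
    for i, s in enumerate(pts):
        for t in pts[i:]:
            if t - s >= delta:
                break  # later grid points are even farther from s
            worst = max(worst, abs(f(s) - f(t)))
    return worst


def square(t):
    return t * t


epsilon = 0.1
delta = epsilon / 2.0  # assumed choice, justified by |s^2 - t^2| <= 2 |s - t| on [0, 1]
gap = worst_gap(square, 0.0, 1.0, delta)
print(f"epsilon = {epsilon}, delta = {delta}, worst observed gap = {gap:.4f}")
```

The same $\delta$ is used near $0$ and near $1$, which is exactly what distinguishes uniform continuity from pointwise continuity.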

LEMMA 2 (A continuous function on a closed interval is integrable): If the function $f(t)$ is continuous on $[a,b]$, then the integral $\int_a^b f(t) \, {\rm d}t$ exists.

Proof: For any tagged partition $P = \{x_i, t_i\}$ of $[a,b]$, define $u_i, v_i \in [x_{i-1}, x_i]$ as the values, guaranteed to exist by Axiom 1b above, such that $f(u_i) = \min_{x \in [x_{i-1},x_i]} f(x)$ and $f(v_i) = \max_{x \in [x_{i-1},x_i]} f(x)$. By Lemma 1, given any $\epsilon \gt 0$, there is some $\delta \gt 0$ such that whenever $D(P) \lt \delta$ (so that $|u_i - v_i| \lt \delta$), it follows that $|f(u_i) - f(v_i)| \lt \epsilon / (b - a)$. Define $U(P) = \sum_{i=1}^n f(u_i) d_i$ and $V(P) = \sum_{i=1}^n f(v_i) d_i$, and let $R(P) = \sum_{i=1}^n f(t_i) d_i$ be the Riemann sum as above. Clearly $U(P) \leq R(P) \leq V(P)$. However, $$V(P) - U(P) = \sum_{i=1}^n (f(v_i) - f(u_i)) d_i \lt \frac{\epsilon}{b-a} \sum_{i=1}^n d_i = \epsilon.$$ Since the set of $U(P)$ over all tagged partitions $P$ is bounded above by $(b-a) \max_{t \in [a,b]} f(t)$ or by any $V(P)$, by Axiom 1c there is a least upper bound $U$. Similarly, since the set of $V(P)$ over all tagged partitions $P$ is bounded below by $(b-a) \min_{t \in [a,b]} f(t)$ or by any $U(P)$, there is a greatest lower bound $V$. In summary, we have shown that given any $\epsilon \gt 0$, there is a $\delta \gt 0$ such that for any tagged partition $P$ with $D(P) \lt \delta$, we have $U(P) \leq R(P) \leq V(P)$; $U(P) \leq U \leq V(P)$; $U(P) \leq V \leq V(P)$; and $V(P) \lt U(P) + \epsilon$. It follows that $U = V = \int_a^b f(t) \, {\rm d}t$.
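The squeeze between $U(P)$ and $V(P)$ can be watched numerically. The sketch below assumes an increasing function, so that the minimum on each sub-interval sits at the left endpoint ($u_i = x_{i-1}$) and the maximum at the right ($v_i = x_i$); the choice $f(t) = e^t$ on $[0,1]$ and the helper name are illustrative only.

```python
import math


def lower_upper_sums(f, a, b, n):
    """U(P) and V(P) on n equal sub-intervals.

    Assumes f is increasing on [a, b], so u_i = x_{i-1} and v_i = x_i.
    """
    xs = [a + (b - a) * i / n for i in range(n + 1)]
    d = (b - a) / n
    lower = sum(f(x) for x in xs[:-1]) * d  # uses left endpoints (minima)
    upper = sum(f(x) for x in xs[1:]) * d   # uses right endpoints (maxima)
    return lower, upper


exact = math.e - 1.0  # known value of the integral of exp over [0, 1]
results = [(n, *lower_upper_sums(math.exp, 0.0, 1.0, n)) for n in (10, 100, 1000)]

for n, lower, upper in results:
    print(f"n = {n}: U(P) = {lower:.6f}, V(P) = {upper:.6f}, V - U = {upper - lower:.2e}")
```

In this example $V(P) - U(P) = (e - 1)/n$, so the two sums close in on a single value as $D(P) = 1/n$ shrinks, which is the mechanism the proof relies on.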

LEMMA 3 (Rolle’s theorem): If $f(t)$ is continuous on $[a,b]$ and differentiable on $(a,b)$, and if $f(a) = f(b)$, then there is some $c \in (a,b)$ such that $f'(c) = 0$.

Proof: By Axiom 1b, $f(t)$ has a maximum and a minimum in $[a,b]$. If $f(t)$ is constant on $[a,b]$, then $f'(t) = 0$ for every $t \in (a,b)$ and any $c$ will do. Otherwise, since $f(a) = f(b)$, $f(t)$ must have either a minimum or a maximum at some point $c$ in the interior of $[a,b]$. Say $c$ is an interior maximum, but that $f'(c) \gt 0$. Recall that $$f'(c) = \lim_{h \to 0} \frac{f(c+h) - f(c)}{h}.$$ Thus given $\epsilon$ with $0 \lt \epsilon \lt f'(c)$, there is a $\delta \gt 0$ such that for all $h$ with $0 \lt h \lt \delta$ we have that $f(c+h) \gt f(c) + h f'(c) - h \epsilon \gt f(c)$, contradicting the assumption that $c$ is an interior maximum. A similar contradiction results if one assumes that $f'(c) \lt 0$, or assumes that $c$ is an interior minimum and $f'(c) \ne 0$.

LEMMA 4 (Zero derivative implies constant): If $f(t)$ is continuous on $[a,b]$ and $f'(t) = 0$ for all $t \in (a,b)$, then $f(t) = C$ for some constant $C$.

Proof: If $f(t)$ is not constant, then by Axiom 1b it has a minimum at some point $c_1$ and a maximum at some point $c_2$. Let $d_1 = f(c_1)$ and $d_2 = f(c_2)$, and assume, for convenience, that $c_1 \lt c_2$. Now define, on the closed interval $[c_1, c_2]$, the function $g(t) = f(t) - (d_2 - d_1) (t - c_1) / (c_2 - c_1) - d_1$. Its derivative is $g'(t) = f'(t) - (d_2 - d_1) / (c_2 - c_1) = - (d_2 - d_1) / (c_2 - c_1)$, since by hypothesis $f'(t) = 0$ for all $t \in (a,b)$. Now note that $g(c_1) = g(c_2) = 0$. Thus by Lemma 3 there is some $c \in (c_1, c_2)$ such that $g'(c) = 0$. But since $g'(c) = - (d_2 - d_1) / (c_2 - c_1)$, this implies that $d_1 = d_2$ and in fact that $f(t)$ is constant on $[a,b]$.

THEOREM 1 (Fundamental theorem of calculus, Part 1): Let $f(t)$ be continuous on $[a,b]$, and define the function $g(x) = \int_a^x f(t) \, {\rm d}t$. Then $g(x)$ is differentiable on $(a,b)$, and for every $x \in (a,b), \, g'(x) = f(x)$.

Credit: Wikimedia

Proof: By Lemma 2, for any $x \in (a,b)$ and any $h \gt 0$ with $x + h \lt b$, the integrals $\int_a^x f(t) \, {\rm d}t$, $\int_a^{x+h} f(t) \, {\rm d}t$ and their difference, namely $\int_x^{x+h} f(t) \, {\rm d}t$, each exist. By Lemma 1, given any $\epsilon \gt 0$, there is some $\delta \gt 0$ such that for any $x \in (a,b)$, any $h$ with $0 \lt h \lt \delta$ such that $(x-h, x+h) \subset (a,b)$, and any $t \in (x-h, x+h)$, the inequality $f(x) - \epsilon \le f(t) \le f(x) + \epsilon$ holds. Thus for any tagged partition $P$ of $[x,x+h]$ with $D(P) \lt \delta$, we can write $$h (f(x) - \epsilon) \le \sum_{i=1}^n f(t_i) \, d_i \le h (f(x) + \epsilon),$$ which then means that $$h (f(x) - \epsilon) \le \int_x^{x+h} f(t) \, {\rm d}t \le h (f(x) + \epsilon).$$ Now note that $\int_x^{x+h} f(t) \, {\rm d}t = g(x+h) - g(x)$. Thus $$f(x) - \epsilon \le \frac{g(x+h) - g(x)}{h} \le f(x) + \epsilon.$$ Since $\epsilon$ was arbitrary, this together with a similar argument for $\int_{x-h}^x f(t) \, {\rm d}t$ gives the desired result $g'(x) = f(x)$. We note in passing that the condition for continuity of $g(x)$ on $[a,b]$ is easily met: Given $\epsilon \gt 0$, choose $\delta \lt \epsilon / (\max_{t \in [a,b]} |f(t)|)$, and then $|g(x) - g(y)| \lt \epsilon$ whenever $|x - y| \lt \delta$.
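A quick numerical sanity check of Part 1, offered as a sketch under stated assumptions: take $f(t) = \cos t$ with $a = 0$, approximate $g(x) = \int_0^x \cos t \, {\rm d}t$ by a midpoint Riemann sum (the helper `integral` and its default of 20000 sub-intervals are choices made for this example), and compare the difference quotient of $g$ at $x = 1$ with $f(1)$.

```python
import math


def integral(f, a, x, n=20000):
    """Midpoint Riemann sum approximating the integral of f over [a, x]."""
    d = (x - a) / n
    return sum(f(a + (i + 0.5) * d) for i in range(n)) * d


def g(x):
    # g(x) = integral of cos(t) from 0 to x, computed numerically.
    return integral(math.cos, 0.0, x)


x, h = 1.0, 1e-3
quotient = (g(x + h) - g(x)) / h  # the quotient bounded in the proof above

print(f"(g(x+h) - g(x)) / h = {quotient:.6f}")
print(f"f(x) = cos(1)       = {math.cos(x):.6f}")
```

The two printed values agree to about three decimal places; the remaining gap is of order $h$, consistent with the bound $|(g(x+h) - g(x))/h - f(x)| \le \epsilon$ for small $h$.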

THEOREM 2 (Fundamental theorem of calculus, Part 2): Let $f(t)$ be continuous on $[a,b]$, and let $G(x)$ be any continuous function on $[a,b]$ that satisfies $G'(x) = f(x)$ on $(a,b)$. Then $\int_a^b f(t) \, {\rm d}t = G(b) - G(a)$.

Proof: The function $g(x)$ of Part 1 is continuous on $[a,b]$ and also satisfies $g'(x) = f(x)$ for $x \in (a,b)$. Then $h(x) = G(x) - g(x)$ is continuous on $[a,b]$ and has zero derivative on $(a,b)$. Thus by Lemma 4, $h(x) = C$ for some constant $C$, so that $G(x) = g(x) + C$. Now we can write $$G(b) - G(a) = (g(b) + C) - (g(a) + C) = g(b) - g(a) = \int_a^b f(t) \, {\rm d}t,$$ where the last equality uses $g(a) = 0$.
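Part 2 can likewise be checked on a concrete case. In this minimal sketch the integrand $f(t) = 3t^2$, its antiderivative $G(t) = t^3$, the interval $[0, 2]$ and the helper `midpoint_sum` are all choices made for illustration; the theorem predicts that the Riemann sums approach $G(2) - G(0) = 8$.

```python
def midpoint_sum(f, a, b, n=1000):
    """Riemann sum of f over [a, b] with n equal sub-intervals and midpoint tags."""
    d = (b - a) / n
    return sum(f(a + (i + 0.5) * d) for i in range(n)) * d


def f(t):
    return 3.0 * t * t


def G(t):
    # An antiderivative of f: G'(t) = 3 t^2 = f(t).
    return t ** 3


a, b = 0.0, 2.0
riemann = midpoint_sum(f, a, b)
antiderivative_difference = G(b) - G(a)

print(f"Riemann sum  = {riemann:.6f}")
print(f"G(b) - G(a)  = {antiderivative_difference:.6f}")
```

The practical point of the theorem shows up in the cost: the left-hand side needs a thousand function evaluations to get close, while the right-hand side needs two.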
