Deconstructing the Mean Value Theorem, Part 2

The Mean Value Theorem says there’s at least one point where the slope of the tangent line on a smooth curve equals the slope of the secant line between the endpoints.

At its heart, the Mean Value Theorem is fairly intuitive. It essentially states that a smooth curve has at least one tangent line whose slope is equal to the slope of the (secant) line between the endpoints of the graph over any closed interval.

The main goal of this post is to study the big ideas of the mean value theorem proof. However, we will not go into complete depth on all the details.

In Part 1 of this series of posts, we looked at the precise statement of the Mean Value Theorem and deconstructed it into its pieces. Our goals were: 1) to understand the meaning of the statement of the theorem; 2) to understand why its hypotheses are necessary; and 3) to understand why its hypotheses make the theorem as strong as possible.

But how is the Mean Value Theorem (MVT) rigorously proved? We will look at how the proof of the MVT can be deconstructed by working backwards. A complete and polished proof will not be given. Instead, a partial proof will be given that relies on other facts. We will then explore why those other facts are true. This process will continue until the “core” foundational truths and definitions that lead to the MVT are reached. These truths will be labeled as axioms (or “postulates”). This means that we actually assume that they are true and construct our mathematics from them as a starting point.

This may seem counter-intuitive and perhaps even heretical. How can a mathematical “fact” be taken as “proved” based on another fact that is assumed to be true?

This is actually done all the time in mathematics, and it is necessary; there is really no way around it. It illustrates that any given subject in mathematics lies within a certain axiomatic system; that is, it is based on assumed truths and the rules of logic, as well as precise definitions. We have to start somewhere in our development of mathematics. There is no way to avoid axioms without implicit assumptions and/or circular reasoning.

Axiomatic Mathematics and Geometry

The logical validity of axiomatic mathematics is actually separate from the issue of the real-life utility of mathematics. The real-life usefulness of any area of mathematics is dependent on whether the axioms and theorems for that area seem to most closely match a real-life situation.

For example, there are many “kinds” of geometry. The geometry most people learn when they are young is called Euclidean geometry. This is based upon a system developed by the ancient Greek mathematician Euclid in a treatise called Euclid’s Elements. The most “famous” axiom of that system is the (controversial) parallel postulate, which essentially says that non-parallel lines will intersect, and about which whole books have been written.

One example of a theorem that can be proved in Euclidean geometry is the fact that the angles in any triangle add up to 180^{\circ}. In other words, the sum of the measures of its angles equals the measure of two right angles. This is a fundamental fact for problem solving in Newtonian physics and engineering.

Suffice it to say that you had better use Euclidean geometry to design bridges and buildings if you want to be an engineer or architect!

Non-Euclidean Geometries

On the other hand, there are many non-Euclidean geometries that arise when we ditch the parallel postulate. In one such non-Euclidean geometry, called spherical geometry, the sum of the measures of the angles in any triangle is greater than 180^{\circ}!!!

When should you use spherical geometry? The “biggest” and perhaps most obvious situation is when you need to think about lines and triangles that lie on the surface of the Earth! A “line” on the surface of the Earth is actually a so-called “great circle”. This is a circle on the surface of the Earth whose center is at the center of the Earth. Lines of longitude are great circles but lines of latitude, except for the equator, are not.

If there are three points on the surface of the Earth that are not all on the same great circle, then the three lines (great circles) that connect the three points will form a triangle on the surface of the Earth (and the three given points will be the vertices of the triangle). If one vertex is at the north pole and two other vertices are both on the equator, one quarter of a turn around the equator from each other, the resulting triangle on the Earth will have all right angles whose measures add up to 270^{\circ}!!!
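The 270^{\circ} claim can be checked numerically. What follows is only a minimal sketch: it assumes the Earth is a perfect unit sphere, and the helper names (tangent_direction, vertex_angle) are my own, made up for illustration. The angle at each vertex is measured between the directions in which the two great circles leave that vertex.

```python
import math

def tangent_direction(p, q):
    """Unit tangent at p of the great circle heading toward q (p, q are unit vectors)."""
    dot = sum(pi * qi for pi, qi in zip(p, q))
    # Remove the component of q along p; what is left is tangent to the sphere at p.
    t = [qi - dot * pi for pi, qi in zip(p, q)]
    norm = math.sqrt(sum(ti * ti for ti in t))
    return [ti / norm for ti in t]

def vertex_angle(p, q, r):
    """Angle in degrees at vertex p of the spherical triangle with vertices p, q, r."""
    u = tangent_direction(p, q)
    v = tangent_direction(p, r)
    cos_angle = sum(ui * vi for ui, vi in zip(u, v))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))

north_pole = (0.0, 0.0, 1.0)
equator_a = (1.0, 0.0, 0.0)  # a point on the equator
equator_b = (0.0, 1.0, 0.0)  # a quarter turn around the equator from equator_a

angles = [
    vertex_angle(north_pole, equator_a, equator_b),
    vertex_angle(equator_a, equator_b, north_pole),
    vertex_angle(equator_b, north_pole, equator_a),
]
angle_sum = sum(angles)
print(angles, angle_sum)
```

Each of the three angles should come out as a right angle (up to rounding), so the sum exceeds the Euclidean 180^{\circ} by a full 90^{\circ}.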

Great circles are used, when possible, for circumnavigation around the Earth. Airplanes use these paths to save time and fuel. Spherical geometry is used more generally in, for example, the programming of computer systems for GPS satellites.

Physical Intuition for the MVT

Before we dive into the deconstruction of the proof, is there anything we can do to gain more intuition about why the MVT is true?

Besides looking at graphs, perhaps the best way to gain more intuition is to look at motion.

Imagine a race between two cars, one red and one blue. The driver of the blue car is Steady Eddie. He drives the blue car at a constant speed the entire time (to keep things simple, this also means there is no speed-up at the start and no slow-down at the end). The driver of the red car is Wild Kyle. He continually speeds up and slows down the entire time he drives.

If Steady Eddie and Wild Kyle both leave the starting line at time t=0 and reach the finish line at the same moment, then their average speeds must have been the same (by definition of average speed as \frac{\mbox{distance traveled}}{\mbox{time elapsed}}). The animation below shows an overhead camera view of the race (with reverse time shown as well).

Animation: an overhead view of the race, illustrating the physical intuition behind the truth of the mean value theorem. Steady Eddie is in the blue car on the bottom and Wild Kyle is in the red car on the top. They start and end the race at the same times, so they have the same average speed.

Are Their Instantaneous Speeds Ever Equal?

Does Wild Kyle ever have a moment in time where his instantaneous speed equals his average speed (and that of the constant speed Steady Eddie)? The MVT says “yes”.

Why? Using whatever units are handy, let f(t) be the distance traveled by Steady Eddie from time 0 to time t (for 0\leq t\leq b). And let g(t) be the distance traveled by Wild Kyle from time 0 to time t. Then the derivatives f'(t) and g'(t) represent, respectively, the instantaneous speeds of Eddie and Kyle at time t (for any t with 0<t<b).

Since Eddie is Steady, f'(t)=constant, which equals both of their average speeds \frac{f(b)-f(0)}{b-0}=\frac{f(b)}{b}=\frac{g(b)}{b}=\frac{g(b)-g(0)}{b-0}. Since Kyle is Wild, g'(t) is not constant. However, the MVT says that there is some number c between the starting time and the ending time where g'(c) is the same as the average speed \frac{g(b)}{b}=\frac{g(b)-g(0)}{b-0}. This is also the constant speed of Steady Eddie.

This should seem intuitive. Kyle’s speeds are sometimes lower (slower) and sometimes higher (faster) than Eddie’s; if Kyle were always slower or always faster, the two cars could not finish at the same moment. Since Kyle’s instantaneous speeds should change in a continuous way, his speed should sometimes equal Eddie’s, which is also their average speed.

I’m actually making the implicit assumption that speeds are continuous in this intuitive explanation. This is not always the case in purely mathematical settings. However, this intuitive explanation is good enough to convince most people.
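To make the race concrete, here is a small numerical sketch. The formula for Kyle's distance g(t) is a hypothetical choice of mine (it is not taken from the animation), picked so that his speed g'(t)=1+\cos(2\pi t) swings between 0 and 2 while he still finishes alongside Eddie at time b=1. The bisection search leans on exactly the continuity assumption mentioned above.

```python
import math

b = 1.0  # finishing time (hypothetical units)

def f(t):
    """Steady Eddie's distance: constant speed 1."""
    return t

def g(t):
    """Wild Kyle's distance (an assumed formula, chosen only for illustration)."""
    return t + math.sin(2 * math.pi * t) / (2 * math.pi)

def g_speed(t):
    """Kyle's instantaneous speed g'(t), computed by hand from g."""
    return 1 + math.cos(2 * math.pi * t)

average_speed = (g(b) - g(0.0)) / (b - 0.0)

def bisect(func, lo, hi, tol=1e-12):
    """Approximate a root of func in [lo, hi]; assumes func is continuous and changes sign."""
    f_lo = func(lo)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        f_mid = func(mid)
        if (f_lo > 0) == (f_mid > 0):
            lo, f_lo = mid, f_mid  # the sign change is in the right half
        else:
            hi = mid               # the sign change is in the left half
    return (lo + hi) / 2

# Kyle is faster than average at t = 0 and slower at t = 0.5, so search between them.
c = bisect(lambda t: g_speed(t) - average_speed, 0.0, 0.5)
print(c, g_speed(c), average_speed)
```

For this particular g, the search lands near t=0.25, a moment when Kyle's speedometer reads the same as Eddie's.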

Videos About Deconstructing the Proof

In the written content to follow, I will mathematically deconstruct the proof of the MVT. I have also made video content for this material in the form of two parts from one lecture below.

A Starting Point for Deconstructing the Proof: Rolle’s Theorem

There is another theorem intimately related to the MVT that goes by a different name: Rolle’s Theorem. It is actually a special case of the MVT. But, as we shall see at the end of this post, Rolle’s Theorem can, amazingly, be used to prove the MVT. It is pretty rare for a special case of a theorem to be equivalent to the theorem itself.

Here’s a precise statement of Rolle’s Theorem:

Rolle’s Theorem: If f is a real-valued function defined and continuous on a closed interval [a,b], if f is differentiable on the open interval (a,b), and if f(a)=f(b), then there exists a number c\in (a,b) with the property that f'(c)=0.

Why is this a special case of the MVT? Recall the precise statement of the MVT from Part 1 of this series:

MVT: If f is a real-valued function defined and continuous on a closed interval [a,b] and if f is differentiable on the open interval (a,b), then there exists a number c\in (a,b) with the property that f'(c)=\frac{f(b)-f(a)}{b-a}.

Looking carefully at these statements, we see that Rolle’s Theorem adds the extra hypothesis that f(a)=f(b). But if this extra hypothesis is added to the MVT, the conclusion becomes that there is a number c\in (a,b) such that f'(c)=\frac{f(b)-f(a)}{b-a}=\frac{0}{b-a}=0, which is the conclusion of Rolle’s Theorem.
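A quick worked example may help to see Rolle’s Theorem in action. The function and interval here are my own choice, made only for illustration:

```latex
\begin{align*}
f(x) &= x^{2}-4x+5 \quad\text{on } [a,b]=[1,3],\\
f(1) &= 1-4+5 = 2, \qquad f(3) = 9-12+5 = 2, \quad\text{so } f(a)=f(b),\\
f'(x) &= 2x-4,\\
f'(c) &= 0 \iff c = 2 \in (1,3).
\end{align*}
```

Since f is a polynomial, it is continuous on [1,3] and differentiable on (1,3), so all the hypotheses hold, and the promised number is c=2. The secant slope \frac{f(3)-f(1)}{3-1} is also 0, as the MVT requires.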

Why is Rolle’s Theorem True?

Our task now is to understand why Rolle’s Theorem is true. We want to show that the derivative of f, which exists on (a,b), is somewhere equal to zero.

What’s needed here is some other knowledge about derivatives from calculus. Hopefully you recall that, for a continuous function f over a closed interval [a,b] (note that this is a key assumption in Rolle’s Theorem), the extreme (maximum and minimum) values must occur either at the endpoints a or b of the interval or they must occur at the critical points of f in the open interval (a,b). The critical points are those values of x where f'(x) is either zero or undefined. Since f is assumed to be differentiable over (a,b), these reduce to where f'(x)=0. This sounds like a promising line of argument!

Ruling Out a Trivial Case

Before we make this argument airtight, let’s rule out a trivial case. What happens if f(x)=constant for all x\in [a,b]? Then the conclusion of Rolle’s Theorem is easily seen to be true because this implies that f'(x)=0 for all x\in (a,b). In other words, c can be any number in (a,b)!

So now we assume that f is not constant over [a,b]. Since f(a)=f(b), this means there is a number x\in (a,b) with f(x)\not=f(a)=f(b).

In all of what we do now, let’s assume that f(x)>f(a)=f(b) (the case where f(x)<f(a)=f(b) is similar — you can ultimately give it a try on your own). The fact that there is such a number implies that the maximum value of f over [a,b] must occur at some number c\in (a,b). But at such a maximum value, we know that f'(c)=0, so we are done!

Is this a proof? Only if you accept two truths that have not been proved yet: Fact 1) that a continuous function f on [a,b] must have a maximum value over [a,b], and Fact 2) that f'(c)=0 for a number c\in (a,b) at which a differentiable function f has a maximum value.

So, if we want to continue to deconstruct our proof of Rolle’s Theorem, and ultimately the MVT, we must continue by demonstrating the truth of the two facts from the previous paragraph.

Fermat’s Theorem for Stationary Points

It turns out that Fact 2 is easier to prove than Fact 1. To prove Fact 2, sometimes called Fermat’s Theorem for Stationary Points, we use: 1) the definition of the derivative; 2) the definition of the maximum value of a function; and 3) the fact that if a two-sided limit of a function exists, then the one-sided limits of that function (at that same point) both exist and equal the same value as the two-sided limit.

First, we state the definition of the maximum value of a (real-valued) function. There is a similar definition for a minimum value.

Definition of Maximum Value: Let f be a real-valued function defined on an interval I (which could be closed, open, or neither). We say that f has a (global) maximum value at a number c\in I if f(x)\leq f(c) for all x\in I. The real number f(c) is called the maximum value of f on I, and the number c\in I is called the “point where the maximum value occurs”.

Here is the precise statement of Fact 2. There is a similar true statement when the function has a minimum value.

Fermat’s Theorem for Stationary Points: Let f be a real-valued function defined and differentiable on an open interval (a,b). If f has a maximum value at c\in (a,b), then f'(c)=0.

Before moving on, we briefly note that this theorem is true even if the maximum value is “local” as opposed to our definition, which is “global”.

The Essence of the Proof of Fermat’s Theorem

Here’s the essence of the proof of Fermat’s Theorem above. Let c\in (a,b) be a point where f has a maximum value. Since f'(c) exists, we can say that

f'(c)=\displaystyle\lim_{h\rightarrow 0^{+}}\frac{f(c+h)-f(c)}{h}=\displaystyle\lim_{h\rightarrow 0^{-}}\frac{f(c+h)-f(c)}{h},

where the + and – notation denotes right- and left-hand limits, respectively.

Since f has a maximum value at c\in (a,b), we can say that f(c+h)\leq f(c), no matter whether h>0 or h<0 (assuming h is close enough to 0 for f(c+h) to be defined). Therefore,

\frac{f(c+h)-f(c)}{h}\leq 0 when h>0 and \frac{f(c+h)-f(c)}{h}\geq 0 when h<0.

But this means that:

\displaystyle\lim_{h\rightarrow 0^{+}}\frac{f(c+h)-f(c)}{h}\leq 0 \mbox{ and } \displaystyle\lim_{h\rightarrow 0^{-}}\frac{f(c+h)-f(c)}{h}\geq 0.

Since both of these one-sided limits equal f'(c), we have f'(c)\leq 0 and f'(c)\geq 0 at the same time. Therefore f'(c)=0, proving Fermat’s Theorem for Stationary Points.
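The sign pattern in this argument is easy to see numerically. Below is a minimal sketch using a function of my own choosing, f(x)=x(2-x), which has its maximum over (0,2) at c=1. The difference quotients from the right should never be positive, the ones from the left should never be negative, and both should shrink toward 0.

```python
def f(x):
    """An assumed example function with its maximum over (0, 2) at c = 1."""
    return x * (2 - x)

c = 1.0

def difference_quotient(h):
    """The quotient (f(c + h) - f(c)) / h from the definition of f'(c)."""
    return (f(c + h) - f(c)) / h

steps = [10.0 ** (-k) for k in range(1, 7)]           # h = 0.1, 0.01, ..., 0.000001
right = [difference_quotient(h) for h in steps]       # h > 0
left = [difference_quotient(-h) for h in steps]       # h < 0

for h, q_right, q_left in zip(steps, right, left):
    print(h, q_right, q_left)
```

For this f, the quotient works out algebraically to -h, so the right-hand values are squeezed up to 0 from below and the left-hand values are squeezed down to 0 from above.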

The Extreme Value Theorem

Now we turn to Fact 1. It is a special case of the extremely important Extreme Value Theorem (EVT). It only requires the function to be continuous on a closed interval, not differentiable. Most significantly, the EVT is also something that can be generalized. It can be generalized to continuous functions between arbitrary topological spaces when the domain of the function is assumed to be “compact”, though that’s a topic for a future post.

Extreme Value Theorem (EVT): Let f be a real-valued function defined and continuous on a closed interval [a,b]. Then f has a maximum value on [a,b].

A similar statement can be made about f having a minimum value on [a,b].

This is the point where our deconstruction of the MVT gets extra difficult: proving the EVT. In fact, this is the point where understanding of “true” real analysis, and not “just” calculus, becomes necessary.

The proof of the EVT relies on the relationship between continuity and limits of functions. The proof also relies on a fact called the Bolzano-Weierstrass Theorem. I will describe this theorem in this post, but I will only do so intuitively and not precisely. To be fully rigorous and thorough would take many more pages of writing.

Intuitive Description of the Bolzano-Weierstrass Theorem: If an infinite list (“sequence”) of real numbers c_{1},c_{2},c_{3},\ldots has the property that all of these numbers are in some closed interval [a,b], then there is some number c\in [a,b] for which there are points in the sequence that are as close as we like to c.

Examples to Help in Understanding the Bolzano-Weierstrass Theorem

Perhaps this is best understood at a basic level by thinking about some examples. We will start simple and progress upward in difficulty.

Example 1: Let the sequence be 1,\frac{1}{2},\frac{1}{3},\frac{1}{4},\ldots. Notice that all of these numbers are in, for example, the closed interval [0,1]. Let c=0 and notice that, no matter how small \epsilon>0 is, there is a number \frac{1}{n} in the sequence such that c=0<\frac{1}{n}<\epsilon (for example, if \epsilon=0.0001, then just choose n>10000). The number \epsilon>0 is our representation of the idea of “as close as we like”.

Example 2: Let the sequence be 1,-1,1,-1,1,-1,\ldots (the numbers in a sequence can repeat themselves). Notice that all of these numbers are in, for example, the closed interval [-1,1]. Also notice that, if c=-1 or if c=1, then no matter how small \epsilon>0 is, there are numbers in the sequence a distance <\epsilon from 1 or -1 (because the sequence itself consists of alternating 1’s and -1’s).

Example 3: Consider the sequence defined to be 1+\frac{1}{n} when n is even and -1-\frac{1}{n} when n is odd (so the sequence, written out, would be -2, \frac{3}{2}, -\frac{4}{3}, \frac{5}{4}, -\frac{6}{5},\ldots). These numbers are all in, for example, the closed interval [-2,2]. You should check on your own that c=\pm 1 work as in Example 2 (draw a picture of these points on a number line).

Example 4: This is a far more difficult example, but should be thought about anyway. Consider the sequence defined by \sin(n). As an approximate infinite list, using radian measure for the input n, this would be 0.84147, 0.90930, 0.14112, -0.75680, -0.95892, -0.27942,\ldots, which are all points in [-1,1]. It’s a difficult proof, but it can be shown that any number c\in [-1,1] satisfies the conclusion of the Bolzano-Weierstrass Theorem for this example!
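If you would like to experiment with these examples, here is a small sketch that hunts for a term of a sequence within \epsilon of a chosen number c. The helper name first_index_within and the search cap max_n are my own inventions, and a finite search like this can only illustrate the theorem, not prove it.

```python
import math

def first_index_within(term, c, eps, max_n=1_000_000):
    """Return the smallest n >= 1 with |term(n) - c| < eps, or None if none is found by max_n."""
    for n in range(1, max_n + 1):
        if abs(term(n) - c) < eps:
            return n
    return None

def example_3(n):
    """The sequence from Example 3: 1 + 1/n for even n, -1 - 1/n for odd n."""
    return 1 + 1 / n if n % 2 == 0 else -1 - 1 / n

eps = 1e-3  # our stand-in for "as close as we like"

n1 = first_index_within(lambda n: 1 / n, 0.0, eps)   # Example 1 with c = 0
n3_plus = first_index_within(example_3, 1.0, eps)    # Example 3 with c = 1
n3_minus = first_index_within(example_3, -1.0, eps)  # Example 3 with c = -1
n4 = first_index_within(math.sin, 0.5, eps)          # Example 4 with c = 0.5

print(n1, n3_plus, n3_minus, n4)
```

Try changing c in the last search to any other number in [-1,1], or shrinking eps. For the \sin(n) sequence a suitable term should always turn up, though you may need to raise max_n.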

Back to the Proof of the MVT

How is the Bolzano-Weierstrass Theorem helpful for proving the EVT? Again, the proof also relies on properties of continuous functions. A key property of continuous functions that we will need can be described intuitively as saying that if a sequence of points c_{1},c_{2},c_{3},\ldots converges to (“gets closer and closer to”) some number c, and if f is continuous at c, then the sequence f(c_{1}), f(c_{2}), f(c_{3}),\ldots converges to f(c).

The first thing to prove is that the real-valued continuous function f defined on [a,b] is “bounded above” (such a function would also be “bounded below”). This means that there is some real number M so that f(x)\leq M for all x\in [a,b] (the graph of y=f(x) stays below some horizontal line y=M).

To argue for why this is true, assume to the contrary that it is not true and try to obtain a logical contradiction. If it is not true, that means, for example, that for any integer n>0, there is a number c_{n}\in [a,b] so that f(c_{n})>n (think about this!). But then this defines a sequence of numbers c_{1},c_{2},c_{3},\ldots in the closed interval [a,b]. By the Bolzano-Weierstrass Theorem, there must be a number c\in [a,b] that some of the points in this sequence get arbitrarily close to. In fact, you could create from those points another sequence that converges to c (called a “subsequence” of the original sequence).

Using the Continuity of the Function

By continuity of f, including continuity at c\in [a,b], this would mean that the values of f at those points that converge to c must converge to f(c). But this is impossible because f(c_{n})>n for all n=1,2,3,\ldots! There’s no way the values of f at those points could converge at all because they get “arbitrarily large”!

This contradiction implies that f must, in fact, be bounded above on [a,b].

The Completeness Axiom is Needed

Now we argue that f has a maximum value on [a,b]. This is another point in this article where the depths of “true” real analysis must be plumbed. We need something called — wait for it — the Completeness Axiom! This is something we will assume is true!

Completeness Axiom: Let S be any nonempty set of real numbers that is bounded above, so that there is a real number M such that x\leq M for all x\in S. Then there is a real number \beta with the following properties: 1) x\leq \beta for all x\in S (\beta is an “upper bound” of S) and 2) if y<\beta, then there is a number x\in S so that y<x (any real number less than \beta is not an upper bound of S). The number \beta is called the (unique) least upper bound of S. It is also called the supremum of S and we write \beta=\sup(S).

This is quite a mouthful and you really need to consider examples to understand it. But at the moment let us just remark that \beta=\sup(S) is kind of like a maximum value of the set S, though it might NOT actually be a member of S! As a simple example, if S is the open interval (0,1), then 1=\sup(S), but 1\not\in S.
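For the simple example S=(0,1), both properties in the Completeness Axiom can be checked by hand. Here is one way to write that check out (the particular choice of x in step 2 is mine; many other choices work):

```latex
\begin{align*}
&\text{Claim: } \sup(S)=1 \text{ for } S=(0,1).\\
&\text{1) If } x\in S, \text{ then } x<1, \text{ so } x\leq 1 \text{ and } \beta=1 \text{ is an upper bound of } S.\\
&\text{2) If } y<1, \text{ let } x=\frac{\max(y,0)+1}{2}. \text{ Then } 0<x<1 \text{ and } y<x,\\
&\quad\text{so } x\in S \text{ and } y \text{ is not an upper bound of } S.
\end{align*}
```

In step 2, if y<0 then x=\frac{1}{2}, and if 0\leq y<1 then x is the midpoint between y and 1. Either way, x is a member of S that beats y.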

Back to the Argument for the Truth of the EVT

We now come back to the argument for the truth of the EVT. Since the continuous function f is bounded above on the closed interval [a,b], we can say that the set of outputs S=\{f(x)|x\in [a,b]\} is nonempty and bounded above. The Completeness Axiom now implies that \beta=\sup(S) exists as a real number. If we can show that there is a number c\in [a,b] with f(c)=\beta, we will be done by the definition of the supremum given in the statement of the Completeness Axiom as well as the definition of a maximum value of a function over an interval.

For any positive integer n, the number \beta-\frac{1}{n} is less than \beta. In other words, \beta-\frac{1}{n} is like the “y” in the definition of the supremum (in the Completeness Axiom). This means that there is a number c_{n}\in [a,b] with the property that \beta-\frac{1}{n}<f(c_{n})\leq \beta (also think about the definition of S).

But this means that \displaystyle\lim_{n\rightarrow\infty}f(c_{n})=\beta, since \displaystyle\lim_{n\rightarrow\infty}\frac{1}{n}=0. The Bolzano-Weierstrass Theorem also implies that there is a number c\in [a,b] that some points in the sequence c_{1},c_{2},c_{3},\ldots get arbitrarily close to. As before, those points can be chosen to form a subsequence that converges to c. The values of f along this subsequence still converge to \beta, and the continuity of f implies that they also converge to f(c). Since a sequence cannot converge to two different limits, f(c)=\beta. We are done with our argument for the truth of the EVT (“Fact 1”) and ultimately our argument for the proof of Rolle’s Theorem.

Concluding Comments about the Completeness Axiom

In all this, there are still many details to be filled in. For example, what are the precise definitions of the limit of a function and the limit of a sequence? What does continuity really mean and why is it related to limits of sequences?

Perhaps the most important question is: why should the Completeness Axiom be an axiom? Shouldn’t we be able to prove it from (hopefully) far simpler principles?

Actually, the Completeness Axiom can be proved from simpler principles. However, even though these principles are indeed simpler, this approach turns out to be a lot of work. If you are interested in how this can be done, you should learn about the subject of Dedekind cuts. For our purposes, we will be satisfied with taking it to be an axiom.

Proving the MVT using Rolle’s Theorem

I promised that I would end this post showing how Rolle’s Theorem can be used to prove the MVT. This can be done pretty easily, though it does involve a mathematical trick.

Here’s the trick: given a function f that satisfies the hypotheses of the MVT, define a new function g by the formula g(x)=f(x)-L(x), where L(x)=f(a)+\frac{f(b)-f(a)}{b-a}(x-a) (the graph of L is the secant line connecting the points (a,f(a)) and (b,f(b))).

Then g is continuous on [a,b] and differentiable on (a,b) since f and L are. Furthermore, since L(a)=f(a) and L(b)=f(b), we can conclude that g(a)=g(b)=0. Hence, by Rolle’s Theorem, there is a number c\in (a,b) such that g'(c)=0. But g'(x)=f'(x)-L'(x)=f'(x)-\frac{f(b)-f(a)}{b-a}. Therefore, f'(c)-\frac{f(b)-f(a)}{b-a}=0, which is equivalent to saying that f'(c)=\frac{f(b)-f(a)}{b-a}.
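Here is the trick carried out numerically for one specific function. Everything in this sketch is an illustrative assumption on my part: the choice f(x)=x^{3} on [0,2], the hand-computed derivative, and the use of bisection to locate c (bisection works here because g' happens to be continuous, which the MVT itself does not require).

```python
a, b = 0.0, 2.0

def f(x):
    """An assumed example function satisfying the hypotheses of the MVT on [a, b]."""
    return x ** 3

def f_prime(x):
    """Derivative of f, computed by hand."""
    return 3 * x ** 2

secant_slope = (f(b) - f(a)) / (b - a)

def L(x):
    """The secant line through (a, f(a)) and (b, f(b))."""
    return f(a) + secant_slope * (x - a)

def g(x):
    """The auxiliary function g = f - L from the trick."""
    return f(x) - L(x)

def g_prime(x):
    """g'(x) = f'(x) - L'(x), where L'(x) is the secant slope."""
    return f_prime(x) - secant_slope

# g vanishes at both endpoints, so Rolle's Theorem applies to g.
print(g(a), g(b))

# g'(a) < 0 < g'(b) for this f, so bisection can locate a point where g'(c) = 0.
lo, hi = a, b
while hi - lo > 1e-12:
    mid = (lo + hi) / 2
    if g_prime(mid) < 0:
        lo = mid
    else:
        hi = mid
c = (lo + hi) / 2

print(c, f_prime(c), secant_slope)
```

For this f, the exact answer is c=\frac{2}{\sqrt{3}}, which solves 3c^{2}=4, and the printed value of f'(c) should agree with the secant slope up to rounding.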

WHEW!!! YOU FINISHED READING!!! CONGRATULATIONS!!! THE END!!!
