At its heart, the Mean Value Theorem is fairly intuitive. It essentially states that a smooth curve has at least one tangent line whose slope is equal to the slope of the (secant) line between the endpoints of the graph over any closed interval.
The main goal of this post is to study the big ideas of the mean value theorem proof. However, we will not go into complete depth on all the details.
In Part 1 of this series of posts, we looked at the precise statement of the Mean Value Theorem and deconstructed it into its pieces. Our goals were 1) to understand the meaning of the statement of the theorem, 2) to understand why its hypotheses are necessary, and 3) to understand why its hypotheses make the theorem as strong as possible.
But how is the Mean Value Theorem (MVT) rigorously proved? We will look at how the proof of the MVT can be deconstructed by working backwards. A complete and polished proof will not be given. Instead, a partial proof will be given that relies on other facts. We will then explore why those other facts are true. This process will continue until the “core” foundational truths and definitions that lead to the MVT are reached. These truths will be labeled as axioms (or “postulates”). This means that we actually assume that they are true and construct our mathematics from them as a starting point.
This may seem counter-intuitive and perhaps even heretical. How can a mathematical “fact” be taken as “proved” based on another fact that is assumed to be true?
This is actually done all the time in mathematics, and it is necessary; there is really no way around it. It illustrates that any given subject in mathematics lies within a certain axiomatic system; that is, it is based on assumed truths and the rules of logic, as well as precise definitions. We have to start somewhere in our development of mathematics. There is no way to avoid axioms without implicit assumptions and/or circular reasoning.
Axiomatic Mathematics and Geometry
The logical validity of axiomatic mathematics is actually separate from the issue of the real-life utility of mathematics. The real-life usefulness of any area of mathematics is dependent on whether the axioms and theorems for that area seem to most closely match a real-life situation.
For example, there are many “kinds” of geometry. The geometry most people learn when they are young is called Euclidean geometry. This is based upon a system developed by the ancient Greek mathematician Euclid in a treatise called Euclid’s Elements. The most “famous” axiom of that system is the (controversial) parallel postulate, which essentially says that non-parallel lines will intersect, and about which whole books have been written.
One example of a theorem that can be proved in Euclidean geometry is the fact that the angles in any triangle add up to $180^\circ$. In other words, the sum of the measures of the angles equals the measure of two right angles. This is a fundamental fact for problem solving in Newtonian physics and engineering.
Suffice it to say that you had better use Euclidean geometry to design bridges and buildings if you want to be an engineer or architect!
Non-Euclidean Geometries
On the other hand, there are many non-Euclidean geometries that arise when we ditch the parallel postulate. In one such non-Euclidean geometry, called spherical geometry, the sum of the measures of the angles in any triangle is greater than $180^\circ$.
When should you use spherical geometry? The “biggest” and perhaps most obvious situation is when you need to think about lines and triangles that lie on the surface of the Earth! A “line” on the surface of the Earth is actually a so-called “great circle“. This is a circle on the surface of the Earth whose center is at the center of the Earth. Lines of longitude are great circles but lines of latitude, except for the equator, are not.
If there are three points on the surface of the Earth that are not all on the same great circle, then the three lines (great circles) that connect the three points will form a triangle on the surface of the Earth (and the three given points will be the vertices of the triangle). If one vertex is at the north pole and two other vertices are both on the equator, one quarter of a turn around the equator from each other, the resulting triangle on the Earth will have all right angles whose measures add up to $270^\circ$.
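The angle sum of the pole-and-equator triangle can be checked numerically. The following is a minimal sketch, assuming a unit sphere and hypothetical vertex coordinates (north pole on the z-axis, the two equator points on the x- and y-axes); the helper name `angle_at` is made up for this illustration.

```python
import math

def angle_at(vertex, p, q):
    """Angle in degrees at `vertex` between the great-circle arcs toward p and q.

    All points are assumed to be unit vectors on the sphere.
    """
    def tangent(target):
        # Project `target` onto the plane tangent to the sphere at `vertex`,
        # then normalize to get the direction in which the arc leaves `vertex`.
        dot = sum(v * t for v, t in zip(vertex, target))
        tan = [t - dot * v for v, t in zip(vertex, target)]
        norm = math.sqrt(sum(comp * comp for comp in tan))
        return [comp / norm for comp in tan]

    u, w = tangent(p), tangent(q)
    cos_angle = sum(x * y for x, y in zip(u, w))
    # Clamp to [-1, 1] to guard against rounding before calling acos.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))

# Hypothetical coordinates: north pole and two equator points a quarter turn apart.
north_pole = (0.0, 0.0, 1.0)
equator_a = (1.0, 0.0, 0.0)
equator_b = (0.0, 1.0, 0.0)

angles = [
    angle_at(north_pole, equator_a, equator_b),
    angle_at(equator_a, equator_b, north_pole),
    angle_at(equator_b, north_pole, equator_a),
]
angle_sum = sum(angles)
print(angles, angle_sum)
```

Each of the three angles comes out as a right angle, so the sum exceeds the Euclidean value by a full right angle.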
Great circles are used, when possible, for circumnavigation around the Earth. Airplanes use these paths to save time and fuel. Spherical geometry is used more generally in, for example, the programming of computer systems for GPS satellites.
Physical Intuition for the MVT
Before we dive into the deconstruction of the proof, is there anything we can do to gain more intuition about why the MVT is true?
Besides looking at graphs, perhaps the best way to gain more intuition is to look at motion.
Imagine a race between two cars, one red and one blue. The driver of the blue car is Steady Eddie. He drives the blue car at a constant speed the entire time (to keep things simple, this also means there is no speed-up at the start and no slow-down at the end). The driver of the red car is Wild Kyle. He continually speeds up and slows down the entire time he drives.
If Steady Eddie and Wild Kyle both leave the starting line at time $t = 0$ and reach the finish line at the same moment, then their average speeds must have been the same (by definition of average speed as $\frac{\text{distance traveled}}{\text{time elapsed}}$). The animation below shows an overhead camera view of the race (with reverse time shown as well).
Are Their Instantaneous Speeds Ever Equal?
Does Wild Kyle ever have a moment in time where his instantaneous speed equals his average speed (and that of the constant speed Steady Eddie)? The MVT says “yes”.
Why? Using whatever units are handy, let $E(t)$ be the distance traveled by Steady Eddie from time 0 to time $t$ (for $0 \le t \le T$, where $T$ is the time at which both cars finish). And let $K(t)$ be the distance traveled by Wild Kyle from time 0 to time $t$. Then the derivatives $E'(t)$ and $K'(t)$ represent, respectively, the instantaneous speeds of Eddie and Kyle at time $t$ (for any $t$ with $0 < t < T$).
Since Eddie is Steady, $E'(t) = \frac{E(T) - E(0)}{T - 0}$, which equals both of their average speeds $\frac{K(T) - K(0)}{T - 0}$. Since Kyle is Wild, $K'(t)$ is not constant. However, the MVT says that there is some number $c$ between the starting time $0$ and the ending time $T$ where $K'(c)$ is the same as the average speed $\frac{K(T) - K(0)}{T - 0}$. This is also the constant speed of Steady Eddie.
This should seem intuitive. Kyle’s speeds are sometimes lower (slower) and sometimes higher (faster) than Eddie’s. Since Kyle’s instantaneous speeds should change in a continuous way, his speed should sometimes equal Eddie’s, which is also their average speed.
I’m actually making the implicit assumption that speeds are continuous in this intuitive explanation. This is not always the case in purely mathematical settings. However, this intuitive explanation is good enough to convince most people.
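To make the race concrete, here is a minimal sketch that invents a position function for Wild Kyle and searches for a moment when his instantaneous speed matches the average speed. The formula for `kyle_position`, the race duration of 1 time unit, and the helper `bisect` are all assumptions chosen for illustration; they do not come from the animation.

```python
import math

T = 1.0  # assumed race duration, in arbitrary time units

def kyle_position(t):
    # Hypothetical "wild" position function: covers distance 1 by time T,
    # but with a speed that keeps changing along the way.
    return t + math.sin(2 * math.pi * t) / (2 * math.pi)

def kyle_speed(t):
    # Derivative of kyle_position, computed by hand.
    return 1 + math.cos(2 * math.pi * t)

# Average speed = distance traveled / time elapsed (also Eddie's constant speed).
average_speed = (kyle_position(T) - kyle_position(0.0)) / T

def speed_gap(t):
    return kyle_speed(t) - average_speed

def bisect(func, lo, hi, tol=1e-12):
    """Locate a sign change of func in [lo, hi]; func(lo) and func(hi) must differ in sign."""
    f_lo = func(lo)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        f_mid = func(mid)
        if (f_lo > 0) == (f_mid > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return (lo + hi) / 2

# Kyle is faster than average at t = 0 and slower at t = 0.5,
# so his speed must cross the average somewhere in between.
c = bisect(speed_gap, 0.0, 0.5)
print(f"average speed = {average_speed:.6f}")
print(f"speeds match near t = {c:.6f}")
```

Bisection relies on the same continuity assumption as the intuitive argument: the speed gap changes sign, so it must pass through zero. For this particular choice of position function, the match happens where $\cos(2\pi t) = 0$, that is, near $t = 0.25$.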
Videos About Deconstructing the Proof
In the written content to follow, I will mathematically deconstruct the proof of the MVT. I have also made video content for this material in the form of two parts from one lecture below.
A Starting Point for Deconstructing the Proof: Rolle’s Theorem
There is another theorem intimately related to the MVT that goes by a different name: Rolle’s Theorem. It is actually a special case of the MVT. But, as we shall see at the end of this post, Rolle’s Theorem can, amazingly, be used to prove the MVT. It is pretty rare for a special case of a theorem to be equivalent to the theorem itself.
Here’s a precise statement of Rolle’s Theorem:
Rolle’s Theorem: If $f$ is a real-valued function defined and continuous on a closed interval $[a,b]$, if $f$ is differentiable on the open interval $(a,b)$, and if $f(a) = f(b)$, then there exists a number $c \in (a,b)$ with the property that $f'(c) = 0$.
Why is this a special case of the MVT? Recall the precise statement of the MVT from Part 1 of this series:
MVT: If $f$ is a real-valued function defined and continuous on a closed interval $[a,b]$ and if $f$ is differentiable on the open interval $(a,b)$, then there exists a number $c \in (a,b)$ with the property that $f'(c) = \frac{f(b) - f(a)}{b - a}$.
Looking carefully at these statements, we see that Rolle’s Theorem adds the extra hypothesis that $f(a) = f(b)$. But if this extra hypothesis is added to the MVT, the conclusion becomes that there is a number $c \in (a,b)$ such that $f'(c) = \frac{f(b) - f(a)}{b - a} = \frac{0}{b - a} = 0$, which is the conclusion of Rolle’s Theorem.
Why is Rolle’s Theorem True?
Our task now is to understand why Rolle’s Theorem is true. We want to show that the derivative of $f$, which exists on $(a,b)$, is somewhere equal to zero.
What’s needed here is some other knowledge about derivatives from calculus. Hopefully you recall that, for a continuous function $f$ over a closed interval $[a,b]$ (note that this is a key assumption in Rolle’s Theorem), the extreme (maximum and minimum) values must occur either at the endpoints $a$ or $b$ of the interval or they must occur at the critical points of $f$ in the open interval $(a,b)$. The critical points are those values of $x$ where $f'(x)$ is either zero or undefined. Since $f$ is assumed to be differentiable over $(a,b)$, these reduce to the points where $f'(x) = 0$. This sounds like a promising line of argument!
Ruling Out a Trivial Case
Before we make this argument airtight, let’s rule out a trivial case. What happens if $f(x) = f(a) = f(b)$ for all $x \in [a,b]$? Then the conclusion of Rolle’s Theorem is easily seen to be true because this implies that $f'(x) = 0$ for all $x \in (a,b)$. In other words, $c$ can be any number in $(a,b)$!
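So now we assume that $f$ is not constant over $[a,b]$. Since $f(a) = f(b)$, this means there is a number $d \in (a,b)$ with $f(d) \neq f(a)$.
In all of what we do now, let’s assume that $f(d) > f(a)$ (the case where $f(d) < f(a)$ is similar, and you can give it a try on your own). The fact that there is such a number $d$ implies that the maximum value of $f$ over $[a,b]$ must occur at some number $c \in (a,b)$ rather than at an endpoint, because the maximum value is at least $f(d)$, which is greater than $f(a) = f(b)$. But at such a maximum value, we know that $f'(c) = 0$, so we are done!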
Is this a proof? Only if you accept two truths that have not been proved yet: Fact 1) that a continuous function on $[a,b]$ must have a maximum value over $[a,b]$, and Fact 2) that $f'(c) = 0$ for a number $c$ at which a differentiable function $f$ has a maximum value.
So, if we want to continue to deconstruct our proof of Rolle’s Theorem, and ultimately the MVT, we must continue by demonstrating the truth of the two facts from the previous paragraph.
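The two facts can be seen at work in a small numerical experiment. This is only a sketch under assumptions: the function $\sin(x)$ on $[0, \pi]$ is a hypothetical example (its endpoint values agree up to rounding), the maximum is located by a brute-force grid search, and the derivative is estimated with a central difference.

```python
import math

def f(x):
    return math.sin(x)  # hypothetical example with f(a) = f(b) = 0 (up to rounding)

a, b = 0.0, math.pi

# Fact 1 (a maximum exists): approximate its location with a fine grid search.
n = 100_000
grid = [a + (b - a) * i / n for i in range(n + 1)]
c = max(grid, key=f)

def derivative(func, x, h=1e-6):
    # Central-difference estimate of func'(x).
    return (func(x + h) - func(x - h)) / (2 * h)

# Fact 2 (the derivative vanishes at an interior maximum): check it numerically.
slope_at_c = derivative(f, c)
print(f"maximum near c = {c:.6f}, estimated f'(c) = {slope_at_c:.2e}")
```

The grid search lands in the interior of the interval, near $\pi/2$, and the estimated derivative there is close to zero, as Rolle’s Theorem predicts.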
Fermat’s Theorem for Stationary Points
It turns out that Fact 2 is easier to prove than Fact 1. To prove Fact 2, sometimes called Fermat’s Theorem for Stationary Points, we use 1) the definition of the derivative, 2) the definition of the maximum value of a function, and 3) the fact that if a two-sided limit of a function exists, then the one-sided limits of that function (at that same point) both exist and equal the two-sided limit.
First, we state the definition of the maximum value of a (real-valued) function. There is a similar definition for a minimum value.
Definition of Maximum Value: Let $f$ be a real-valued function defined on an interval $I$ (which could be closed, open, or neither). We say that $f$ has a (global) maximum value at a number $c \in I$ if $f(x) \le f(c)$ for all $x \in I$. The real number $f(c)$ is called the maximum value of $f$ on $I$, and the number $c$ is called the “point where the maximum value occurs”.
Here is the precise statement of Fact 2. There is a similar true statement when the function has a minimum value.
Fermat’s Theorem for Stationary Points: Let $f$ be a real-valued function defined and differentiable on an open interval $(a,b)$. If $f$ has a maximum value at $c \in (a,b)$, then $f'(c) = 0$.
Before moving on, we briefly note that this theorem is true even if the maximum value is “local” as opposed to our definition, which is “global”.
The Essence of the Proof of Fermat’s Theorem
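Here’s the essence of the proof of Fermat’s Theorem above. Let $c \in (a,b)$ be a point where $f$ has a maximum value. Since $f'(c)$ exists, we can say that
$$f'(c) = \lim_{h \to 0} \frac{f(c+h) - f(c)}{h} = \lim_{h \to 0^+} \frac{f(c+h) - f(c)}{h} = \lim_{h \to 0^-} \frac{f(c+h) - f(c)}{h},$$
where the + and – notation denotes right- and left-hand limits, respectively.
Since $f$ has a maximum value at $c$, we can say that $f(c+h) \le f(c)$, no matter whether $h > 0$ or $h < 0$ (assuming $h$ is close enough to 0 for $f(c+h)$ to be defined). Therefore,
$$\frac{f(c+h) - f(c)}{h} \le 0 \text{ when } h > 0 \quad \text{and} \quad \frac{f(c+h) - f(c)}{h} \ge 0 \text{ when } h < 0.$$
But this means that:
$$\lim_{h \to 0^+} \frac{f(c+h) - f(c)}{h} \le 0 \quad \text{and} \quad \lim_{h \to 0^-} \frac{f(c+h) - f(c)}{h} \ge 0.$$
Since both of these one-sided limits are equal to $f'(c)$, we have both $f'(c) \le 0$ and $f'(c) \ge 0$, so they must both equal zero. Therefore $f'(c) = 0$, proving Fermat’s Theorem for Stationary Points.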
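The sign pattern of the difference quotients in this argument is easy to observe numerically. The sketch below assumes a hypothetical function $1 - x^2$, which has its global maximum at $c = 0$, and a handful of shrinking step sizes.

```python
def f(x):
    return 1 - x**2  # hypothetical example; its global maximum is at c = 0

c = 0.0
steps = [10.0 ** (-k) for k in range(1, 7)]  # h = 0.1, 0.01, ..., 0.000001

# Difference quotients with h > 0 (approaching c from the right).
right_quotients = [(f(c + h) - f(c)) / h for h in steps]

# Difference quotients with h < 0 (approaching c from the left).
left_quotients = [(f(c - h) - f(c)) / (-h) for h in steps]

for h, right, left in zip(steps, right_quotients, left_quotients):
    print(f"h = {h:.0e}: right quotient = {right:+.2e}, left quotient = {left:+.2e}")
```

The right-hand quotients are never positive and the left-hand quotients are never negative, and both shrink toward zero as the step size shrinks, which is the squeeze that forces the derivative at the maximum to be zero.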
The Extreme Value Theorem
Now we turn to Fact 1. It is a special case of the extremely important Extreme Value Theorem (EVT). It only requires the function to be continuous on a closed interval, not differentiable. Most significantly, the EVT is also something that can be generalized. It can be generalized to continuous functions between arbitrary topological spaces when the domain of the function is assumed to be “compact“, though that’s a topic for a future post.
Extreme Value Theorem (EVT): Let $f$ be a real-valued function defined and continuous on a closed interval $[a,b]$. Then $f$ has a maximum value on $[a,b]$.
A similar statement can be made about $f$ having a minimum value on $[a,b]$.
This is the point where our deconstruction of the MVT gets extra difficult: proving the EVT. In fact, this is the point where understanding of “true” real analysis, and not “just” calculus, becomes necessary.
The proof of the EVT relies on the relationship between continuity and limits of functions. The proof also relies on a fact called the Bolzano-Weierstrass Theorem. I will describe this theorem in this post, but I will only do so intuitively and not precisely. To be fully rigorous and thorough would take many more pages of writing.
Intuitive Description of the Bolzano-Weierstrass Theorem: If an infinite list (“sequence”) of real numbers $x_1, x_2, x_3, \ldots$ has the property that all of these numbers are in some closed interval $[a,b]$, then there is some number $c \in [a,b]$ for which there are points in the sequence that are as close as we like to $c$.
Examples to Help in Understanding the Bolzano-Weierstrass Theorem
Perhaps this is best understood at a basic level by thinking about some examples. We will start simple and progress upward in difficulty.
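Example 1: Let the sequence be $1, \frac{1}{2}, \frac{1}{3}, \frac{1}{4}, \ldots$. Notice that all of these numbers are in, for example, the closed interval $[0,1]$. Let $c = 0$ and notice that, no matter how small $\epsilon > 0$ is, there is a number $x_n$ in the sequence such that $|x_n - c| < \epsilon$ (for example, if $\epsilon = 0.01$, then just choose $x_n = \frac{1}{101}$). The number $\epsilon$ is our representation of the idea of “as close as we like”.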
Example 2: Let the sequence be $1, -1, 1, -1, 1, -1, \ldots$ (the numbers in a sequence can repeat themselves). Notice that all of these numbers are in, for example, the closed interval $[-1,1]$. Also notice that, if $c = 1$ or if $c = -1$, then no matter how small $\epsilon > 0$ is, there are numbers in the sequence within a distance $\epsilon$ of $c$ (because the sequence itself consists of alternating 1’s and -1’s).
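Example 3: Consider the sequence defined to be $x_n = 1 - \frac{1}{n}$ when $n$ is even and $x_n = -1 + \frac{1}{n}$ when $n$ is odd (so the sequence, written out, would be $0, \frac{1}{2}, -\frac{2}{3}, \frac{3}{4}, -\frac{4}{5}, \ldots$). These numbers are all in, for example, the closed interval $[-1,1]$. You should check on your own that $c = 1$ and $c = -1$ work as in Example 2 (draw a picture of these points on a number line).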
Example 4: This is a far more difficult example, but should be thought about anyway. Consider the sequence defined by $x_n = \sin(n)$. As an approximate infinite list, using radian measure for the input $n$, this would be $0.841, 0.909, 0.141, -0.757, -0.959, \ldots$, which are all points in $[-1,1]$. It’s a difficult proof, but it can be shown that any number $c \in [-1,1]$ satisfies the conclusion of the Bolzano-Weierstrass Theorem for this example!
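The claim in Example 4 can be explored by brute force. The following sketch assumes the sequence is $\sin(n)$ with $n$ in radians, and uses a hypothetical helper `first_index_within` that searches for the first term landing within a chosen tolerance of a target number. It is an experiment, not a proof.

```python
import math

def first_index_within(target, eps, max_n=100_000):
    """Return the first n >= 1 with |sin(n) - target| < eps, or None if not found by max_n."""
    for n in range(1, max_n + 1):
        if abs(math.sin(n) - target) < eps:
            return n
    return None

eps = 0.01  # our stand-in for "as close as we like"
targets = [-1.0, 0.0, 0.5, 1.0]  # arbitrary sample targets in [-1, 1]
hits = {target: first_index_within(target, eps) for target in targets}

for target, n in hits.items():
    print(f"target {target:+.2f}: sin({n}) = {math.sin(n):+.6f}")
```

Shrinking `eps` or changing the targets shows the same behavior, at the cost of searching further along the sequence.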
Back to the Proof of the MVT
How is the Bolzano-Weierstrass Theorem helpful for proving the EVT? Again, the proof also relies on properties of continuous functions. A key property of continuous functions that we will need can be described intuitively as saying that if a sequence of points $x_1, x_2, x_3, \ldots$ converges to (“gets closer and closer to”) some number $c$, and if $f$ is continuous at $c$, then the sequence $f(x_1), f(x_2), f(x_3), \ldots$ converges to $f(c)$.
The first thing to prove is that the real-valued continuous function $f$ defined on $[a,b]$ is “bounded above” (such a function would also be “bounded below”). This means that there is some real number $M$ so that $f(x) \le M$ for all $x \in [a,b]$ (the graph of $f$ stays below some horizontal line $y = M$).
To argue for why this is true, assume to the contrary that it is not true and try to obtain a logical contradiction. If it is not true, that means, for example, that for any positive integer $n$ there is a number $x_n \in [a,b]$ so that $f(x_n) > n$ (think about this!). But then this defines a sequence $x_1, x_2, x_3, \ldots$ of numbers in the closed interval $[a,b]$. By the Bolzano-Weierstrass Theorem, there must be a number $c \in [a,b]$ that some of the points in this sequence get arbitrarily close to. In fact, you could create from those points another sequence that converges to $c$ (called a “subsequence” of the original sequence).
Using the Continuity of the Function
By continuity of $f$, including continuity at $c$, this would mean that the values of $f$ at those points that converge to $c$ must converge to $f(c)$. But this is impossible because $f(x_n) > n$ for all $n$! There’s no way the values of $f$ at those points could converge at all because they get “arbitrarily large”!
This contradiction implies that $f$ must, in fact, be bounded above on $[a,b]$.
The Completeness Axiom is Needed
Now we argue that $f$ has a maximum value on $[a,b]$. This is another point in this article where the depths of “true” real analysis must be plumbed. We need something called (wait for it) the Completeness Axiom! This is something we will assume is true!
Completeness Axiom: Let $S$ be any nonempty set of real numbers that is bounded above, so that there is a real number $M$ such that $x \le M$ for all $x \in S$. Then there is a real number $\beta$ with the following properties: 1) $x \le \beta$ for all $x \in S$ ($\beta$ is an “upper bound” of $S$) and 2) if $\epsilon > 0$ then there is a number $x \in S$ so that $x > \beta - \epsilon$ (any real number less than $\beta$ is not an upper bound of $S$). The number $\beta$ is called the (unique) least upper bound of $S$. It is also called the supremum of $S$ and we write $\beta = \sup S$.
This is quite a mouthful and you really need to consider examples to understand it. But at the moment let us just remark that $\sup S$ is kind of like a maximum value of the set $S$, though it might NOT actually be a member of $S$. As a simple example, if $S$ is the open interval $(0,1)$, then $\sup S = 1$, but $1 \notin S$.
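Both properties in the Completeness Axiom can be checked by hand for a simple set. The sketch below takes the open interval $(0,1)$ as an assumed concrete set, with candidate supremum 1, and uses a hypothetical helper `witness_above` that produces a member of the set exceeding $1 - \epsilon$ for a given $\epsilon > 0$.

```python
def in_open_unit_interval(x):
    # Membership test for the assumed set S = (0, 1).
    return 0 < x < 1

def witness_above(eps):
    """Return a member of (0, 1) that is greater than 1 - eps, for eps > 0.

    Halving the gap keeps the witness strictly below 1; capping eps at 1
    keeps it strictly above 0.
    """
    return 1 - min(eps, 1.0) / 2

candidate_sup = 1.0
tolerances = [0.5, 0.1, 1e-3, 1e-9]
witnesses = [witness_above(eps) for eps in tolerances]

for eps, x in zip(tolerances, witnesses):
    # Property 1: x <= candidate_sup. Property 2: x > candidate_sup - eps.
    print(eps, x, in_open_unit_interval(x), x > candidate_sup - eps)
```

Every witness lies in the set and beats $1 - \epsilon$, so no number smaller than 1 is an upper bound, yet 1 itself fails the membership test. Extremely small tolerances (below floating-point precision) would need exact arithmetic rather than floats.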
Back to the Argument for the Truth of the EVT
We now come back to the argument for the truth of the EVT. Since the continuous function $f$ is bounded above on the closed interval $[a,b]$, we can say that the set of outputs $S = \{f(x) : x \in [a,b]\}$ is bounded above. The Completeness Axiom now implies that $\beta = \sup S$ exists as a real number. If we can show that there is a number $c \in [a,b]$ with $f(c) = \beta$, we will be done by the definition of the supremum given in the statement of the Completeness Axiom as well as the definition of a maximum value of a function over an interval.
For any positive integer $n$, the number $\frac{1}{n} > 0$. In other words, $\frac{1}{n}$ is like the “$\epsilon$” in the definition of the supremum (in the Completeness Axiom). This means that there is a number $x_n \in [a,b]$ with the property that $\beta - \frac{1}{n} < f(x_n) \le \beta$ (also think about the definition of $S$).
But this means that $f(x_n) \to \beta$ as $n \to \infty$ since $\frac{1}{n} \to 0$. The Bolzano-Weierstrass Theorem also implies that there is a number $c \in [a,b]$ that some points in the sequence $x_1, x_2, x_3, \ldots$ get arbitrarily close to. Finally, the continuity of $f$ now implies that $f(c) = \beta$. We are done with our argument for the truth of the EVT (“Fact 1”) and ultimately our argument for the proof of Rolle’s Theorem.
Concluding Comments about the Completeness Axiom
In all this, there are still many details to be filled in. For example, what are the precise definitions of the limit of a function and the limit of a sequence? What does continuity really mean and why is it related to limits of sequences?
Perhaps the most important question is: why should the Completeness Axiom be an axiom? Shouldn’t we be able to prove it from (hopefully) far simpler principles?
Actually, the Completeness Axiom can be proved from simpler principles. However, even though these principles are indeed simpler, this approach turns out to be a lot of work. If you are interested in how this can be done, you should learn about the subject of Dedekind cuts. For our purposes, we will be satisfied with taking it to be an axiom.
Proving the MVT using Rolle’s Theorem
I promised that I would end this post showing how Rolle’s Theorem can be used to prove the MVT. This can be done pretty easily, though it does involve a mathematical trick.
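Here’s the trick: given a function $f$ that satisfies the hypotheses of the MVT, define a new function $g$ by the formula $g(x) = f(x) - L(x)$, where $L(x) = f(a) + \frac{f(b) - f(a)}{b - a}(x - a)$ (the graph of $L$ is the secant line connecting the points $(a, f(a))$ and $(b, f(b))$).
Then $g$ is continuous on $[a,b]$ and differentiable on $(a,b)$ since $f$ and $L$ are. Furthermore, since $L(a) = f(a)$ and $L(b) = f(b)$, we can conclude that $g(a) = g(b) = 0$. Hence, by Rolle’s Theorem, there is a number $c \in (a,b)$ such that $g'(c) = 0$. But $g'(x) = f'(x) - \frac{f(b) - f(a)}{b - a}$. Therefore, $f'(c) - \frac{f(b) - f(a)}{b - a} = 0$, which is equivalent to saying that $f'(c) = \frac{f(b) - f(a)}{b - a}$.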
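The trick of subtracting the secant line can be traced through a concrete case. This is a minimal sketch, assuming the hypothetical function $x^3$ on the interval $[0, 2]$; the names `secant`, `g`, `g_prime`, and `bisect` are chosen for this illustration, and the derivative of $x^3$ is entered by hand.

```python
def f(x):
    return x**3  # hypothetical example function

a, b = 0.0, 2.0
slope = (f(b) - f(a)) / (b - a)  # slope of the secant line through the endpoints

def secant(x):
    # Secant line through (a, f(a)) and (b, f(b)).
    return f(a) + slope * (x - a)

def g(x):
    # Auxiliary function: vertical gap between the graph of f and the secant line.
    return f(x) - secant(x)

def g_prime(x):
    # Derivative of g, using the hand-computed derivative 3x^2 of f.
    return 3 * x**2 - slope

def bisect(func, lo, hi, tol=1e-12):
    """Locate a sign change of func in [lo, hi]; func(lo) and func(hi) must differ in sign."""
    f_lo = func(lo)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        f_mid = func(mid)
        if (f_lo > 0) == (f_mid > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return (lo + hi) / 2

# g vanishes at both endpoints, so Rolle's Theorem applies to g.
print(g(a), g(b))

# g' is negative at a and positive at b, so bisection can find where it vanishes.
c = bisect(g_prime, a, b)
print(f"c = {c:.6f}, f'(c) = {3 * c**2:.6f}, secant slope = {slope:.6f}")
```

At the number found, the tangent slope of the example function matches the secant slope, which is the conclusion of the MVT for this example.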
WHEW!!! YOU FINISHED READING!!! CONGRATULATIONS!!! THE END!!!