Differentiable Functions and Local Linearity

Calculus 1, Lectures 12 through 15B

An infinitely oscillating continuous and differentiable function.
This function is both continuous and smooth at x=0, in spite of oscillating infinitely often in any neighborhood of x=0.

In Steven Strogatz’s excellent book, Infinite Powers, there is a big emphasis on an idea he calls The Infinity Principle.

The concepts of a continuous function and of a differentiable function are closely tied to the Infinity Principle. But what does the Infinity Principle say? Here is the quote from page xvi of Infinite Powers.

To shed light on any continuous shape, object, motion, process, or phenomenon — no matter how wild and complicated it may appear — reimagine it as an infinite series of simpler parts, analyze those, and then add the results back together to make sense of the original whole.

Infinity Principle as stated on page xvi in “Infinite Powers, How Calculus Reveals the Secrets of the Universe”, by Steven Strogatz

Relationship to Continuous Functions

How is this related, first of all, to continuous functions? What did you learn to do when you were first taught about functions? You learned how to graph them (a.k.a. plot them).

The natural procedure to graph y=f(x) is:

  1. Pick some values for the independent variable x.
  2. For each x, find the corresponding (unique!) value of the dependent variable y.
  3. Then plot the corresponding points (x,y) (in a rectangular (Cartesian) coordinate plane).
  4. Finally, connect the dots with a continuous curve.

Look at the graph below to see this process visualized for the function y=f(x)=x^{2}. The points plotted are (x,y)=(-3,9),(-2,4),(-1,1),(0,0),(1,1),(2,4), and (3,9). The points are connected with a continuous curve. It happens that the slope (steepness) of this curve changes in a continuous way as well.

The squaring function has a graph that is a continuous curve. It is also smooth.
Part of the graph of y=f(x)=x^{2} sketched by first plotting points and then connecting the dots with a continuous curve. The curve is also ‘smooth’ so that its steepness changes in a continuous way as well. This is related to the concept of differentiability.
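If you want to experiment with this procedure yourself using technology, here is a minimal sketch in Python (my choice for this post; the lectures themselves use Mathematica), assuming numpy and matplotlib are installed:

```python
# Plot y = x^2 by first plotting points, then "connecting the dots" with a
# densely sampled curve (a stand-in for drawing a continuous curve by hand).
import numpy as np
import matplotlib.pyplot as plt

def f(x):
    return x**2

xs = np.array([-3, -2, -1, 0, 1, 2, 3])    # Steps 1-3: chosen x-values and points
plt.plot(xs, f(xs), 'o', label='plotted points')

x_dense = np.linspace(-3, 3, 400)          # Step 4: "connect the dots"
plt.plot(x_dense, f(x_dense), '-', label='continuous curve')

plt.legend()
plt.show()
```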

The key questions now are: a) What is a continuous curve? b) How do we know the curve in question is continuous? And c) How is this related to the Infinity Principle?

Intuitive and Precise Descriptions of Continuity

Continuity has an intuitive description. A function y=f(x) is continuous over its domain (i.e., its graph is a continuous curve) if you can draw its graph without picking up your writing utensil.

However, as described in the post “Limit Definition, Continuity, and Derivatives”, a precise definition of continuity entails a precise definition of a limit.

But the precise definition of a limit is quite difficult to understand. In part, this is because it involves the idea of a distance function (a “metric”), and also rests on the Completeness Axiom of the real number system. In part, it’s also difficult because of its symbolic and abstract nature.

Indeed, continuity is a more subtle concept than you might imagine. It implies, for example, that the square root of two exists. That is, there is a positive number, written as \sqrt{2}, such that \left(\sqrt{2}\right)^{2}=2.

This is actually somewhat surprising when you also learn that \sqrt{2} is an irrational number. After all, if its decimal expansion goes on forever and ever without a repeating pattern, how can you ever hope to actually square it to confirm that its square equals 2?!? That’s certainly something I would not want to try to do! At least not without knowing the answer ahead of time.

The existence of \sqrt{2} can be directly proved from the Completeness Axiom.

On the other hand, if you fully understand continuity, or even if you just trust the definition of continuity, then its existence follows from the Intermediate Value Theorem. This theorem can be applied to the function f(x)=x^{2}-2. Basically, since f(1)=-1<0 and f(2)=2>0, the continuity of f implies it must cross (and touch) the x-axis somewhere between x=1 and x=2. This location is labeled x=\sqrt{2} because f(x)=0 is equivalent to x^{2}=2.
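To make this concrete (my own illustration, not part of the lecture), the Intermediate Value Theorem reasoning can be turned into a computation: repeatedly halve the interval [1,2], always keeping the half on which f changes sign. A minimal Python sketch:

```python
# Bisection method for f(x) = x^2 - 2 on [1, 2]: since f(1) < 0 < f(2),
# continuity guarantees a root (namely sqrt(2)) somewhere in between.
def f(x):
    return x**2 - 2

a, b = 1.0, 2.0
for _ in range(50):        # each pass halves the interval containing the root
    m = (a + b) / 2
    if f(m) < 0:
        a = m              # the sign change is now in [m, b]
    else:
        b = m              # the sign change is now in [a, m]

print(m, m**2)   # prints an approximation of sqrt(2) and its square (essentially 2)
```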

How Do We Know When a Curve is Continuous?

The graph of a function y=f(x) is continuous over its domain if \displaystyle\lim_{x\rightarrow c}f(x)=f(c) for every point c in the domain.

To prove this limit equation is true for a particular example requires the precise definition of a limit mentioned above. I will go ahead and use the precise definition to prove that f(x)=x^{2} is continuous for all real numbers x here. However, if you want to skip ahead to the next section, that’s fine.

You should realize that this proof will seem like it’s “pulled out of a hat” (like a rabbit by a magician) with no clear way of seeing how to discover it on your own. That’s okay for the moment. If you are interested, a book like “Real Analysis, a First Course”, by Russell Gordon, can definitely help you.

Proof that the squaring function is continuous for all inputs:

Let f(x)=x^{2} and let c be an arbitrary real number. We want to prove that \displaystyle\lim_{x\rightarrow c}f(x)=\displaystyle\lim_{x\rightarrow c}x^{2}=c^{2}.

Towards this end, let \epsilon>0 be given. We want to show that f(x)=x^{2} is within a distance of \epsilon from c^{2} when x is sufficiently close to c.

Choose \delta=\mbox{min}\left\{1,\frac{\epsilon}{1+2|c|}\right\}>0 and suppose that x is chosen so that 0<|x-c|<\delta. (This is the part that seems to be “pulled out of a hat”.)

Since \delta\leq 1, we can say that |x-c|<1. By the so-called Triangle Inequality and a bit of ingenuity, we can conclude that:

|x+c|=|(x-c)+2c|\leq |x-c|+|2c|<1+2|c|.

Therefore,

|f(x)-f(c)|=|x^{2}-c^{2}|=|(x+c)(x-c)|=|x+c|\cdot |x-c|< (1+2|c|)\cdot \frac{\epsilon}{1+2|c|}=\epsilon.

Since c is arbitrary, this concludes the proof. The function f(x)=x^{2} is continuous over the set of all real numbers \Bbb{R}. Q.E.D.
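Though it is no substitute for the proof, here is a quick numerical sanity check of the chosen \delta (my own addition, using Python’s random module):

```python
# Randomly test: if |x - c| < delta = min(1, eps/(1 + 2|c|)), is |x^2 - c^2| < eps?
import random

for _ in range(100_000):
    c = random.uniform(-100, 100)
    eps = random.uniform(1e-6, 10)
    delta = min(1, eps / (1 + 2 * abs(c)))
    x = c + 0.999 * delta * random.uniform(-1, 1)   # guarantees |x - c| < delta
    assert abs(x**2 - c**2) < eps

print("No counterexample found in 100,000 random trials.")
```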

Relationship to the Infinity Principle

The continuity of a function over its domain can be related to the Infinity Principle in a simple way. The “whole” that we want to understand is the graph of the function. And the “simpler parts” that we want to put together to understand the whole are the individual points on the graph. We put them together by “connecting” them. If we plot enough points and connect them in a continuous and smooth way, it will give a faithful representation of the function.

Again, this is a valid thing to do because of the continuity of the function.

This does not mean that graphs of continuous functions are necessarily simple objects, however.

Some continuous functions, such as the function f defined by f(x)=x\sin\left(\frac{1}{x}\right) when x\not=0 and f(0)=0, oscillate infinitely often in any neighborhood (small interval) containing a certain point (see the second graph at the preceding link).

Other continuous functions, such as the Weierstrass function, are actually fractals whose graphs appear jagged on every scale (every level of magnification).

Continuity and Areas

Another way continuous functions are related to the Infinity Principle is this: when the graph of a continuous function is above the horizontal axis over a closed interval, the area under the graph and above the horizontal axis over that closed interval is well-defined. In a sense, in this situation, it is valid to “build up” the area of this region by “adding together” an infinite number of vertical lines.

This is described in a precise way in the Fundamental Theorem of Calculus.

In the following animation, imagine the area under the graph expanding as shown because we keep adding more and more vertical lines to it. This is indicated by the moving bold blue vertical line.

The area under the graph of a continuous function is well-defined and related to the infinity principle.
The area under the continuous graph of y=f(x)=x^{2} over the closed interval -3\leq x\leq 3 is well-defined. In fact, we can use methods of calculus to show that this area equals 18.

Lecture 12

The main thing that I ultimately want to emphasize in this blog post is the relationship between the Infinity Principle and the idea of a differentiable function.

Towards this end, I will now summarize the content of Lectures 12 through 15B from my Calculus 1 class at Bethel University during the Fall of 2019. Connections to the idea of a differentiable function will be emphasized along the way. In the end, we will also consider the idea of an infinitesimal number to bring everything all together.

Lecture 12 is the first lecture I give after the first exam. And I spend the first 6-7 minutes emphasizing that math classes, exams, and grades, while important, are certainly not the most important things in life. As a Christian, I believe that humans are created in God’s image and that we all have intrinsic value because of that fact.

Calculus 1, Lecture 12: Testing, Math, & Life; Main Applications; Distance and Speed, Position and Velocity

I also emphasize that everyone can improve. I share a story about a woman named Sarah Seales who overcame a lot of obstacles to become a mathematician.

Distance Traveled, Speed, Position, and Velocity

At that point, I review the main applications for the course and then I move on to discuss distance traveled, speed, position, and velocity as the main topics of the lecture.

There are two broad concepts that I emphasize: 1) the distinction between distance traveled and position, and the corresponding distinction between speed and velocity; and 2) the facts that speed is the derivative of distance traveled and velocity is the derivative of position.

I approach the second concept by considering examples graphically, numerically, and symbolically — with formulas and limits. The limit definition of the derivative is reviewed and used in the context of motion. I use relatively simple examples for which the derivative exists at all points. This means the functions I consider are differentiable functions.

But what about the first concept? In a nutshell, velocity is the same as speed except that direction is taken into account. This means that a velocity can be negative (you first need to pick a “positive direction” and a “negative direction” along a chosen axis). It also means that speed is the absolute value of velocity.

Position is measured with respect to the given axis. Distance traveled is just how far you travel, regardless of direction.

Lectures 13A and 13B

In Lectures 13A and 13B, I start to get at the heart of what it means to be a differentiable function and its relationship to the Infinity Principle. I also consider a number of other topics.

At the very start of Lecture 13A, I review the Infinity Principle and mention that it is related to the idea of a differentiable function. I do this by emphasizing an important way to conceptualize the graph of a differentiable function.

The graph of a differentiable function is, in a sense, made up of infinitely many infinitesimally small straight lines.

Calculus 1, Lecture 13A: Infinity Principle, Local Linearity & Mathematica, Second Derivatives, Acceleration

What do I mean by this? I mean that, when you zoom in toward any point on the graph of a differentiable function, the graph looks more and more like a straight line. This phenomenon is also called local linearity. This is the concept I will explore in more detail at the end of this post.

Acceleration is the Derivative of Velocity and the Second Derivative of Position

From there, I discuss physics more. The acceleration of a moving object is the derivative (rate of change) of the velocity. Since velocity is the derivative of position, we say the acceleration is the second derivative of the position.

If y=f(t) is position, then we write the velocity as v=f'(t)=\frac{dy}{dt}. Then the acceleration is a=\frac{dv}{dt}=\left(f'\right)'(t)=f''(t)=\frac{d^{2}y}{dt^{2}}. This last bit of notation is an outgrowth of the expression \frac{d}{dt}\left(\frac{d}{dt}(y)\right).

It’s almost as if the “d” in the numerator is being squared and the “dt” in the denominator is being squared. That’s not really what is happening. Instead, the \frac{d}{dt} “operator” is being applied twice. See the discussion about Lecture 14 below.
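Here is a small sketch of these relationships in Python using sympy (the position function below is a hypothetical one I chose for illustration):

```python
# Velocity as the first derivative of position, acceleration as the second.
import sympy as sp

t = sp.symbols('t')
y = t**3 - 6*t**2 + 9*t        # hypothetical position function y = f(t)
v = sp.diff(y, t)              # velocity: dy/dt = 3*t**2 - 12*t + 9
a = sp.diff(y, t, 2)           # acceleration: d^2y/dt^2 = 6*t - 12
print(v, a)
```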

Interpreting Derivatives as Rates of Change in Real Life Situations

Finally, I also spend time in the last part of Lecture 13B discussing how to interpret derivatives as (instantaneous) rates of change in real life situations. The two main situations where I do this in Lecture 13B are:

  1. A yam is heating up in an oven. Let T be the temperature and t be the time, in minutes, since it started heating up. Then T=f(t) and we want to use the function value f(20) and the derivative value f'(20) to estimate f(23). The key approximation is based on the equation of the tangent line: f(23)\approx f(20)+f'(20)\cdot (23-20)=f(20)+f'(20)\cdot 3 (see the sketch after this list).
  2. Money grows in a bank account for 10 years. Let B be the final balance and let r be the interest rate (quoted as r\%). Then B=g(r), and the goal is to interpret g'(5) as a rate of change and then use g(5) and g'(5) to estimate g(6). The key approximation is g(6)\approx g(5)+g'(5)\cdot 1.
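Here is the yam approximation from item 1 as a tiny Python function. The specific numbers are hypothetical values I made up; the lecture’s actual values may differ.

```python
# Tangent-line (linear) approximation: f(a + h) is approximately f(a) + f'(a)*h.
def linear_approximation(f_a, df_a, h):
    return f_a + df_a * h

# Hypothetical yam data: f(20) = 150 degrees F, f'(20) = 2 degrees F per minute.
estimate_f23 = linear_approximation(150, 2, 3)   # 150 + 2*3 = 156 degrees F
print(estimate_f23)
```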

Lecture 14 (Including the Idea of a Differentiable Function)

At the start of Lecture 14, I use Mathematica to make an interactive graph of a fourth degree polynomial, its derivative, and its second derivative. I emphasize the importance of being able to roughly draw these graphs by hand without technology.

Calculus 1, Lecture 14: Graph f’ & f”, d/dx is a Linear Operator, Linear Approximations & Differentiability

The Derivative is a Linear Operator

Then I emphasize that the symbol \frac{d}{dx} can be thought of as a “name” for a linear operator that takes a differentiable function of x as input and returns its derivative, as a function of x, as output. The linearity of this operator is expressed symbolically by the following equation, which is true for all differentiable functions f and g and all constants a and b.

\frac{d}{dx}\left(a\cdot f(x)+b\cdot g(x)\right)=a\cdot \frac{d}{dx}\left(f(x)\right)+b\cdot \frac{d}{dx}\left(g(x)\right)=af'(x)+bg'(x)

This equation can be expressed in words as: “the derivative of a linear combination of functions is the corresponding linear combination of their derivatives”.

This fact is fundamental to the usefulness of calculus. Its truth follows from the definition of the derivative (see “Limit Definition, Continuity, and Derivatives”) and the corresponding linearity of limits.

Linearity can also be extended, via mathematical induction, to a linear combination of an arbitrary finite number of functions and constants. Here is the property written with summation (sigma) notation.

 \frac{d}{dx}\left(\displaystyle\sum_{k=1}^{n}c_{k}f_{k}(x)\right)=\displaystyle\sum_{k=1}^{n}c_{k}\cdot \frac{d}{dx}\left(f_{k}(x)\right)=\displaystyle\sum_{k=1}^{n}c_{k}f_{k}'(x)
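Linearity is also easy to check numerically, with a difference quotient standing in for the derivative. A minimal sketch (the functions and constants are arbitrary choices of mine):

```python
# Check that the (approximate) derivative of a*f + b*g matches a*f' + b*g'.
import math

def ddx(f, x, h=1e-6):
    """Central difference quotient: an approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

a, b = 3.0, -5.0
f, g = math.sin, math.exp
x = 1.2

lhs = ddx(lambda t: a * f(t) + b * g(t), x)   # derivative of the combination
rhs = a * ddx(f, x) + b * ddx(g, x)           # combination of the derivatives
print(lhs, rhs)   # the two values agree up to tiny numerical error
```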

Second Derivative Notation

Recall from above that if y=f(x), then its second derivative (assuming it exists) is written as \frac{d^{2}y}{dx^{2}}=f''(x). The first form can be rewritten as \frac{d}{dx}\left(\frac{d}{dx}(y)\right).

We are not literally “squaring” the operator \frac{d}{dx} here (in the sense of multiplication). Instead, we are really applying (using) it twice. In reality, it’s a function composition because operators are actually functions (they have inputs and outputs). For this case, \frac{d^{2}}{dx^{2}}=\frac{d}{dx}\circ \frac{d}{dx}. See Lecture 4B in the blog post “The Derivative of y = x^2”.

The Power Rule (Proved in a Special Case using the Binomial Theorem)

After this, I discuss the “most remembered” rule in calculus: the Power Rule. It states that, for any number n, we have \frac{d}{dx}\left(x^{n}\right)=nx^{n-1} over an appropriate domain for x (which always at least includes the set where x>0).

The proof of the Power Rule is difficult for arbitrary n. However, when n is a positive integer, its proof is straightforward if you know the Binomial Theorem.

In Lecture 14, I use a “weak” form of the Binomial Theorem (where I don’t explicitly describe all the terms).

Here, we can fairly quickly use the “regular” form. All you need to know is that the symbol \left(\begin{array}{c} n \\ k \end{array}\right) is “n choose k” (a “binomial coefficient” — also see “Pascal’s Triangle”) and equals \frac{n!}{k!(n-k)!} (where “!” represents the factorial function).

Proof of the Power Rule When the Power is a Positive Integer

Here are the details when f(x)=x^{n}, where n\in \{1,2,3,\ldots\}. It is in the second equality where the Binomial Theorem is used.

f'(x)=\displaystyle\lim_{h\rightarrow 0}\frac{(x+h)^{n}-x^{n}}{h}=\displaystyle\lim_{h\rightarrow 0}\frac{\left(\displaystyle\sum_{k=0}^{n}\left(\begin{array}{c} n \\ k \end{array}\right)x^{n-k}h^{k}\right)-x^{n}}{h}

Note that \left(\begin{array}{c} n \\ 0\end{array}\right)=\frac{n!}{0!(n-0)!}=1, since 0!=1 by definition. Therefore, there are two x^{n} terms that cancel in the top and we can write:

f'(x)=\displaystyle\lim_{h\rightarrow 0}\frac{\displaystyle\sum_{k=1}^{n}\left(\begin{array}{c} n \\ k \end{array}\right)x^{n-k}h^{k}}{h}=\displaystyle\lim_{h\rightarrow 0}\frac{h\left(\displaystyle\sum_{k=1}^{n}\left(\begin{array}{c} n \\ k \end{array}\right)x^{n-k}h^{k-1}\right)}{h}

Continuing,

=\displaystyle\lim_{h\rightarrow 0}\displaystyle\sum_{k=1}^{n}\left(\begin{array}{c} n \\ k \end{array}\right)x^{n-k}h^{k-1}=\left(\begin{array}{c} n \\ 1 \end{array}\right)x^{n-1}=\frac{n!}{1!(n-1)!}x^{n-1}=nx^{n-1}.

The third to last equality on the last line follows by continuity, as a function of h, of the expression \displaystyle\sum_{k=1}^{n}\left(\begin{array}{c} n \\ k \end{array}\right)x^{n-k}h^{k-1} at h=0.
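Here is a quick numerical spot-check of the result (corroboration, not a proof) in Python:

```python
# The difference quotient for x^n should approach n*x^(n-1) as h shrinks.
n, x = 5, 2.0
for h in [0.1, 0.01, 0.001, 0.0001]:
    print(h, ((x + h)**n - x**n) / h)   # tends toward 80
print(n * x**(n - 1))                   # 5 * 2^4 = 80.0
```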

Differentiability and Local Linearity

Once again, I then describe, at an intuitive level, what it means for a function to be differentiable and locally linear. First, I apply this concept to more real-life approximations (also see Lecture 13B above).

Then I describe this concept in a more technical way before illustrating how it fails as we zoom in near the point (x,y)=(0,0) on the graph of the absolute value function y=|x|. This graph has a “corner” at (x,y)=(0,0) which never goes away upon greater magnification. It is not locally linear at x=0 and is not differentiable there.

Real Life Interpretations

In Lecture 14, the real-life situations I consider are:

  1. Painkiller dosage D as a function of the weight w of a patient.
  2. Fuel efficiency E as a function of the speed v of a car.
  3. Revenue from car sales C as a function of the advertising expenditure a.

As in the other real-life situations, units are a big key to helping us to interpret the derivative as a rate of change. The units for the derivative are always the units for the dependent variable per unit of the independent variable.

Differentiability and Lack of Differentiability

Let’s be a bit more “official” in our definition of differentiable function now. Let f be a function defined on some open interval containing the number x=a. We say that f is differentiable at x=a if the derivative f'(a) exists. This is equivalent to the limit \displaystyle\lim_{h\rightarrow 0}\frac{f(a+h)-f(a)}{h} existing.

This implies that the graph of f is “smooth” (no corners). It also implies that the graph is continuous. Therefore, if the graph of a function has a corner or a discontinuity at x=a, then f cannot be differentiable at x=a.

Appearing Smooth is No Guarantee of Differentiability

It is also interesting and important to note that some graphs that look continuous and “smooth” from a visual standpoint can fail to be differentiable. This can occur when the tangent line at a point is vertical (in a sense, its slope is infinite). The simplest example where this occurs is for the cube root function f(x)=\sqrt[3]{x}=x^{1/3} at x=a=0.

Although the derivative of this function is, by the Power Rule, f'(x)=\frac{1}{3}x^{-2/3}=\frac{1}{3x^{2/3}}, which is undefined at x=0, this is not the most fundamental reason to give for its failure to be differentiable at x=0.

The most fundamental reason is based on the limit definition of the derivative of f at x=0. In this case, the relevant limit is \displaystyle\lim_{h\rightarrow 0}\frac{f(0+h)-f(0)}{h}=\displaystyle\lim_{h\rightarrow 0}\frac{h^{1/3}}{h}=\displaystyle\lim_{h\rightarrow 0}\frac{1}{h^{2/3}}. But this limit clearly does not exist. Therefore, f'(0) does not exist.
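Numerically (a quick illustration of my own), the difference quotient does blow up:

```python
# For f(x) = x^(1/3), the difference quotient at 0 is 1/h^(2/3), which grows
# without bound. (Using h > 0 here; negative h gives the same magnitudes.)
for h in [0.1, 0.001, 0.00001]:
    print(h, h**(1/3) / h)   # roughly 4.64, 100, 2154, ...
```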

Zooming In Where Functions Fail to be Differentiable Functions

If we zoom in on the graph of f(x)=\sqrt[3]{x}=x^{1/3} near (0,0), we can see that it does indeed seem to look more and more vertical the closer we zoom in.

The cube root function is not a differentiable function at zero. It has a tangent line with an infinite slope there.
The function f(x)=\sqrt[3]{x}=x^{1/3} fails to be differentiable at x=0, in spite of the fact that it is continuous there and is, apparently, ‘smooth’ there. This is because the tangent line to this graph at (x,y)=(0,0) is vertical. In a sense, the derivative equals infinity there, though we don’t treat infinity as a number in calculus.

In addition, the graph of y=g(x)=|x| seems to have a true “corner” at (x,y)=(0,0) no matter how far we zoom in. This function also fails to be differentiable at x=a=0.

The absolute value function is not differentiable at zero. It has a corner in its graph there.
The function g(x)=|x| fails to be differentiable at x=0. It has a ‘corner’ in its graph at (0,0) that looks the same at all levels of magnification. It never looks more and more like a straight line upon zooming in. This function is not locally linear at x=0.

Be aware, however, that appearances can be deceiving.

If you plot f(x)=\sqrt{x^{2}+0.0001} on an “ordinary” window, it will appear to have a corner in its graph near the origin.

However, if you zoom in on the point (x,y)=(0,f(0))=(0,0.01), you will see that the graph is actually locally linear there. In fact, you will see that f'(0)=0. Try it on your calculator!
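If you prefer Python to a calculator, here is the same experiment as a difference quotient:

```python
# Difference quotient for f(x) = sqrt(x^2 + 0.0001) at x = 0: it tends to 0,
# so f'(0) = 0 despite the corner-like appearance on an ordinary window.
import math

def f(x):
    return math.sqrt(x**2 + 0.0001)

for h in [0.1, 0.01, 0.001, 0.0001]:
    print(h, (f(h) - f(0)) / h)   # shrinks toward 0
```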

Lectures 15A and 15B (Getting Further into Differentiable Functions)

The final couple lectures related to this blog post are Lectures 15A and 15B.

Once again, the concept of a differentiable function and its local linearity is reviewed. In addition, in Lecture 15A, the derivatives of exponential functions and their relationship to continuous interest rates are explored.

Calculus 1, Lecture 15A: Differentiability, Derivatives of Exponentials, Compound Interest and Continuous Interest Rates

Derivatives of Exponential Functions

Given b>0, the derivative of the exponential function f(t)=b^{t} is f'(t)=\ln(b)\cdot b^{t}. In Lecture 15A, this is partially derived by considering the limit definition of the derivative, as well as numerical and graphical evidence. A full proof is not given because it is beyond the scope of the course.

The key calculation in this partial derivation is based on rules of exponents as well as the linearity of limits:

\displaystyle\lim_{h\rightarrow 0}\frac{b^{t+h}-b^{t}}{h}=\displaystyle\lim_{h\rightarrow 0}\frac{b^{t}\left(b^{h}-1\right)}{h}=\left(\displaystyle\lim_{h\rightarrow 0}\frac{b^{h}-1}{h}\right)\cdot b^{t}

It turns out that \displaystyle\lim_{h\rightarrow 0}\frac{b^{h}-1}{h}=\ln(b), though I only give numerical evidence of this fact, not a proof.
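Here is the kind of numerical evidence I have in mind, sketched in Python for b=2:

```python
# Numerical evidence that (b^h - 1)/h approaches ln(b) as h approaches 0.
import math

b = 2.0
for h in [0.1, 0.01, 0.001, 0.0001]:
    print(h, (b**h - 1) / h)   # 0.7177..., 0.6956..., 0.6934..., 0.6932...
print(math.log(b))             # ln(2) = 0.6931..., matching the trend above
```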

Note that when b=e=2.71828\ldots, Euler’s number, the derivative fact from further above simplifies to \frac{d}{dt}\left(e^{t}\right)=e^{t}. This is sometimes said to be the most important specific derivative fact in calculus.

Fortunately it is also the easiest to remember!

In general, since \ln(e^{k})=k, it follows that \frac{d}{dt}\left(e^{kt}\right)=\frac{d}{dt}\left(\left(e^{k}\right)^{t}\right)=\ln\left(e^{k}\right)\cdot \left(e^{k}\right)^{t}=ke^{kt}. We will eventually see that this can also be derived with the Chain Rule.

These facts all illustrate that e is the most natural base to use for exponential functions, as well as their inverses, the logarithmic functions.

Effective Relative Growth Rates vs Continuous Relative Growth Rates

Let’s take the time to focus on what a continuous interest rate (a.k.a. continuous relative rate of growth) represents. Consider an example. Let A=f(t)=1000\cdot 1.1^{t} be the value of your investment of $1000 at time t, in years. We also considered this example in the post “Linear versus Exponential Growth and Decay” about Lecture 3A and Lecture 3B.

This means your investment has an “actual”, or “effective” interest rate (relative growth rate) of 10% per year. For example, f(1)=\$1100, which is 10% more than $1000; f(2)=\$1210, which is 10% more than $1100; f(3)=\$1331, which is 10% more than $1210; and f(4)=\$1464.10, which is 10% more than $1331.

This growth rate, 10%, is a relative growth rate (rather than an “absolute” growth rate). Technically speaking, in each of the time periods from the previous paragraph, it equals \frac{\mbox{ending value}-\mbox{starting value}}{\mbox{starting value}}=\frac{f(n+1)-f(n)}{f(n)}. The amount of growth f(n+1)-f(n) is described relative to the starting value f(n) over the time interval [n,n+1]. This description takes the form of a ratio.

Corresponding Continuous Relative Growth Rate

The corresponding continuous relative growth rate is the continuous version of this. In essence, at a specific moment in time t, it is \displaystyle\lim_{h\rightarrow 0}\frac{\frac{f(t+h)-f(t)}{h}}{f(t)}. By the limit definition of the derivative, this is \frac{f'(t)}{f(t)}.

If k is the continuous interest rate for this problem, then we have seen that f(t)=1000e^{kt} (in fact, k=\ln(1.1)\approx 0.0953101798\approx 9.53\%). Note that \frac{f'(t)}{f(t)}=\frac{1000ke^{kt}}{1000e^{kt}}=k for all t. In other words, k is indeed the continuous relative growth rate!

At any instant in time, this is the relative rate at which your money is growing, if we assume that the growth continues along a straight line (the tangent line to the graph). This is not what actually happens, since the graph is concave up. The effective annual growth rate is higher than k. It is 10%.
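The relationship between the two rates is easy to verify in Python (a small check of my own):

```python
# k = ln(1.1) is the continuous rate whose effective annual rate is exactly 10%.
import math

k = math.log(1.1)
print(k)                 # about 0.09531, i.e., roughly 9.53% per year

for t in range(5):
    print(t, 1000 * math.exp(k * t), 1000 * 1.1**t)   # the two columns agree

print(math.exp(k) - 1)   # effective annual rate: 0.1 (i.e., 10%), up to rounding
```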

More on this will come in my next blog post, which will include the content of Lecture 16A.

Lecture 15B

The final lecture for this blog post is Lecture 15B. This lecture includes an introduction to one of my favorite intuitive (non-rigorous) math topics: infinitesimals. This is “calculus sans limits” (i.e., “calculus without limits”). Doesn’t that sound great?

However, it starts with yet another dive into the main topic of this blog post: the definition of a differentiable function, its consequences, and some visuals where I zoom in to graphs.

Calculus 1, Lecture 15B: Visualizing Local Linearity (or Lack Thereof), Infinitesimal Calculus Introduction

I take the time to go through an outline of the proof that if a function f is differentiable at x=a, then it is continuous at x=a. It’s not a difficult proof, though it is a bit tricky.

This fact also implies that if f is not continuous at x=a, it will not be differentiable at x=a, as mentioned further above.

Zooming in on Two Wild Functions, One of Which is a Differentiable Function

A couple of new functions are zoomed in on during the course of Lecture 15B: 1) the function f(x)=x\sin\left(\frac{1}{x}\right) when x\not=0 and f(0)=0; and 2) the function f(x)=x^{2}\sin\left(\frac{1}{x}\right) when x\not=0 and f(0)=0. Both of these functions oscillate infinitely often in any little neighborhood of x=0, and both are continuous at x=0 because the oscillations decrease in amplitude to zero. But only the second example is differentiable at x=0.

Here is an animation of the first example.

An infinitely oscillating function, x*sin(1/x), which is a continuous function at x = 0 but not a differentiable function at x = 0.
The graph of f(x)=x\sin\left(\frac{1}{x}\right) when x\not=0 and f(0)=0 is a continuous function at x=0, but not a differentiable function there. It is not locally linear near x=0.

And here is an animation of the second example.

An infinitely oscillating function, x^2*sin(1/x), which is a continuous and differentiable function at x = 0.
The graph of y=f(x)=x^{2}\sin\left(\frac{1}{x}\right) when x\not=0 and f(0)=0 is both continuous and smooth at x=0, in spite of oscillating infinitely often in any neighborhood of x=0.

The non-differentiability of the first example is based on the fact that \displaystyle\lim_{h\rightarrow 0}\frac{f(0+h)-f(0)}{h}=\displaystyle\lim_{h\rightarrow 0}\sin\left(\frac{1}{h}\right) does not exist.

The differentiability of the second example is based on the fact that \displaystyle\lim_{h\rightarrow 0}\frac{f(0+h)-f(0)}{h}=\displaystyle\lim_{h\rightarrow 0}h\sin\left(\frac{1}{h}\right) does exist. The so-called Squeeze Theorem provides the quickest proof of this last fact.
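A numerical look at the two difference quotients (my own quick illustration) makes the contrast vivid:

```python
# Difference quotients at 0: sin(1/h) keeps oscillating between -1 and 1,
# while h*sin(1/h) is squeezed between -|h| and |h| and so tends to 0.
import math

for h in [0.1, 0.01, 0.001, 0.0001]:
    print(h, math.sin(1 / h), h * math.sin(1 / h))
# middle column: no trend; right column: shrinks toward 0
```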

Calculus Sans Limits (Calculus with Infinitesimals)

This, our final topic for this post, is a fun topic. It is also the way the inventors of calculus, Isaac Newton and Gottfried Wilhelm Leibniz, conceptualized the subject. Although Newton discovered/created the subject first, the notation is due to Leibniz.

I spend the last 15 minutes of Lecture 15B introducing this topic.

What is the topic? It is the idea of an infinitesimal quantity and how that idea can be used to derive facts in calculus without thinking about limits.

An infinitesimal number is thought of as a positive number dx that is so close to zero that it is smaller than any positive real number. How small is this? Certainly it’s much smaller than 10^{-10} and 10^{-100}. Getting even more extreme, it’s much smaller than even 10^{-10^{100}} or even 10^{-(10^{100})!}. Is this possible?

Immediately we are struck by contradictions. After all, wouldn’t \frac{dx}{2} be a positive number smaller than dx? In fact, to be technical about it, there is no smallest positive real number!

An Example

Let’s not worry about these contradictions and just blindly proceed. In fact, let’s consider an example.

Let y=f(x)=x^{3}. Think of x as some fixed real number.

Here is the key question to ask and attempt to answer: if x is changed by an infinitesimal amount dx, what is the corresponding change in the output value y? Will it be infinitesimal?

Let’s assume the change in y is infinitesimal and call it dy. Can this be written in terms of dx? Do a computation!

dy=f(x+dx)-f(x)=(x+dx)^{3}-x^{3}=x^{3}+3x^{2}\cdot dx+3x\cdot dx^{2}+dx^{3}-x^{3}

=3x^{2}\cdot dx+3x\cdot dx^{2}+dx^{3}.

So we see that if x is changed by an infinitesimally tiny amount to x+dx, then y is indeed changed by an infinitesimally tiny amount to y+dy, where dy=3x^{2}\cdot dx+3x\cdot dx^{2}+dx^{3}.

Now if dx is infinitesimal, how small should we say that dx^{2} and dx^{3} are? After all, dx is “much” smaller than, for instance, 10^{-100}. Hence, dx^{2} is “much” smaller than \left(10^{-100}\right)^{2}=10^{-200} and dx^{3} is “much” smaller than \left(10^{-100}\right)^{3}=10^{-300}.

In fact, based on this line of “loose” reasoning, we might posit that dx^{2} and dx^{3} are, to coin a new phrase, unspeakably small. We might even declare that they are so small that we can take them to be zero (“negligible”).

If we set dx^{2}=0 and dx^{3}=0, then the formula for dy simplifies to dy=3x^{2}dx.

Dividing both sides of this equation by dx gives \frac{dy}{dx}=3x^{2}.

Hey! This is the derivative of y=f(x)=x^{3}, by the Power Rule!

What happened? Some sort of mathematical magic?!?
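If you would like a machine to do this bookkeeping, here is a sketch with sympy (my own addition; the lecture does this by hand):

```python
# Expand (x + dx)^3 - x^3, declare dx^2 and dx^3 "negligible", divide by dx.
import sympy as sp

x, dx = sp.symbols('x dx')
dy = sp.expand((x + dx)**3 - x**3)       # 3*x**2*dx + 3*x*dx**2 + dx**3
dy = dy.subs(dx**3, 0).subs(dx**2, 0)    # set the higher powers of dx to zero
print(sp.simplify(dy / dx))              # 3*x**2, matching the Power Rule
```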

What is Going on Here?

In a sense, it is indeed a bit of mathematical magic. Based on the standard real number system, it is not rigorous mathematics.

It is made rigorous in the subject called non-standard analysis. However, this subject is either too difficult for or simply not of interest to many mathematicians, much less high-school students and undergraduates.

We will just treat it as a “fun” and somewhat “intuitive” approach to deriving equations in calculus. It’s also the foundation for a lot of the content found in Grant Sanderson’s “The Essence of Calculus” series at 3Blue1Brown.

The Essence of Calculus by Grant Sanderson at 3Blue1Brown

Infinitesimals were how Newton and Leibniz thought about calculus. In fact, we use Leibniz’s notation, including for infinitesimals. The foundations of calculus were not made fully rigorous until the “Arithmetization of Analysis” research program begun by Cauchy, Weierstrass, Dedekind, and others in the 1800s.

Does this have anything to do with the Infinity Principle? Yes. Here is the connection. The “whole” that we want to build up is a smooth (differentiable) function. We build up such a function by constructing it with infinitesimally small straight lines. These lines have varying slopes that equal the varying values of the derivative \frac{dy}{dx} of the function.

In fact, in this approach, we are thinking of \frac{dy}{dx} as an actual fraction. It is giving us “rise over run” for the infinitesimally small straight line.

This is all “intuitively justified” by the phenomenon of local linearity for smooth (differentiable) functions.

This is a good enough justification for us when we take this approach. However, we will still talk about the more rigorous approach based on limits as it suits us.