Infinitesimal Calculus and Calculus Rules

Calculus 1, Lectures 16A through 18A

Infinitesimal calculus can be used to derive the derivative of the sine function.

Did you know that Newton and Leibniz did not know the precise definition of a limit?

Instead, they approached calculus in an intuitive way. Today, this intuitive method is called infinitesimal calculus. It is based on the concept of infinitesimal quantities, or “infinitesimals” for short. These are quantities so small that they are smaller than any positive real number. In a sense, you can think of them as quantities of the form \frac{\mbox{positive number}}{\infty}.

But, first things first: there are no such real numbers! You cannot divide by infinity!

There is also no smallest positive real number!

This last statement is easy to prove. Given any real number x>0, the number \frac{x}{2}>0, but \frac{x}{2}<x.

Moreover, if 0<x<1, then x^{2}, x^{3}, etc… can be “much smaller than” x itself (by many “orders of magnitude”).

Using Infinitesimals

So if there are no such real numbers, how can they possibly be used? There are two approaches to answering this question.

  1. They can be made rigorous through the arduous process of studying the subject of non-standard analysis.
  2. They can be kept at an intuitive, but non-rigorous, level.

Most people don’t have the stomach for approach #1. So instead, we use approach #2. Approach #2 also has the benefit of being a lot of fun! — once you get used to it, at least.

Part of the fun that arises from this approach is that calculus formulas can be derived without resorting to the use of limits. You could call this approach Calculus Sans Limits. I discussed this at the end of my previous blog post, “Differentiable Functions and Local Linearity”.

A more serious benefit of Approach #2 is that it can give you insight into many applications of calculus. Indeed, this approach results in some of the main benefits that scientists, engineers, economists, etc… get out of learning calculus.

These benefits do not come so naturally to people trained to be pure mathematicians.

I myself was trained this way. It always bothered me when my physics teachers took this approach.

I would ask them, “how do you know you can do that?”

And they would respond, “because it makes sense!” And then they would look at me and ask, “what are you, a mathematician or something?”

I have learned over the years to appreciate their point of view. Again, it helps you get a lot of insight into the applications of calculus; and not just in differential calculus, but even more so in integral calculus.

Lecture 16A: Continuous Growth Rates, Errors, and Newton’s Method

In the lectures, before getting into infinitesimal calculus, I spend some time nailing down the meaning of two things.

The first is the interpretation of continuous growth rates. The second is what it means for a linear approximation to be “good”. These two topics take up the first half of Lecture 16A.

Calculus 1, Lecture 16A: Continuous Growth Rates, Errors for Tangent Line Approximations, and Newton’s Method

Continuous (Instantaneous) Relative Growth Rates

I start by considering a situation where an investment grows by 100% for every year that goes by. In other words, the value of the investment doubles every year. The only situation where this might be somewhat realistic is for a newly-formed company whose value skyrockets in its first few years.

In this situation, if $1000 is invested at time t=0, then the investment’s value at an arbitrary time t>0 is A=f(t)=1000\cdot 2^{t}. If k=\ln(2), then 2=e^{\ln(2)}=e^{k}\approx e^{0.693147}. Therefore, we can also write A=f(t)=1000e^{\ln(2)t}\approx 1000e^{0.693147t}. The quantity k=\ln(2)\approx 0.693147\approx 69.3\% is the continuous growth rate in this situation.

This is an instantaneous relative (percent) rate of growth. If the growth continued along a straight line rather than a concave up exponential growth curve, it would grow by about 69.3% in one year.

To see this, note that \frac{dA}{dt}=f'(t)=1000\cdot \ln(2)\cdot 2^{t}\approx 693.147\cdot 2^{t}. Then, at any moment in time t=a, the tangent line approximation to f at the point (a,f(a)) gives \Delta A\approx f'(a)\cdot \Delta t \approx 693.147\cdot 2^{a}\cdot \Delta t. Hence, the relative change along the tangent line when \Delta t=1 is \frac{f'(a)\cdot \Delta t}{A}\approx \frac{693.147\cdot 2^{a}\cdot 1}{1000\cdot 2^{a}}=\frac{693.147}{1000}=0.693147\approx 69.3\%.
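If you want to check this numerically, here is a minimal sketch in Python (the function names are my own, chosen just for illustration). Note that the ratio \frac{f'(a)}{f(a)} comes out the same at every moment a.

```python
# A numeric check: the relative growth rate along the tangent line is
# ln(2) ~ 0.693147 no matter when you measure it.
import math

def f(t):
    return 1000 * 2**t  # investment value, doubling every year

def fprime(t):
    return 1000 * math.log(2) * 2**t  # derivative of f

for a in [0.0, 1.0, 5.0]:
    # relative change along the tangent line over one year (Delta t = 1)
    print(a, fprime(a) * 1 / f(a))  # prints ~0.693147 each time
```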

This is described visually in the lecture embedded above and shown in the figure below.

Illustrating continuous relative growth rates with an investment example growing by 100% (doubling) for every unit of time that goes by.
While the blue curve grows by 100% for every unit of time that goes by, its tangent line approximations grow by about 69.3% for every unit of time that goes by.

Errors in Linear Approximations

What does it mean for a linear approximation L(x)=f(a)+f'(a)(x-a) to be a “good” approximation for a nonlinear function f near x=a?

It means the error in the approximation goes to zero “rapidly” as x approaches a.

By “rapidly”, we mean that the error goes to zero faster than x-a does as x\rightarrow a. This, in turn, is defined by requiring that \frac{\mbox{error}}{x-a}\rightarrow 0 as x\rightarrow a. In other words, the top of this fraction goes to zero significantly faster than the bottom does.

But what is the error? In applied mathematics, the error in an approximation is always defined to be (\mbox{actual value}) - (\mbox{approximate value}).

An example will help. The example from the lecture is f(x)=x^{2} and a=3. The tangent line (linear) approximation of f near x=a is L(x)=f(3)+f'(3)(x-3). Since f'(x)=2x, this gives L(x)=9+6(x-3). The error is therefore \mbox{error}(x)=f(x)-L(x)=x^{2}-(9+6(x-3)). This formula simplifies to \mbox{error}(x)=x^{2}-6x+9=(x-3)^{2}. Therefore, \frac{\mbox{error}(x)}{x-3}=x-3 when x\not=3. This definitely has a limit of 0 as x\rightarrow 3.

In other words, \mbox{error}(x) goes to zero significantly faster than x-3 does. This is fast enough to call the tangent line approximation “good”. The error function has a graph which is a parabola with a vertex at (x,y)=(3,0). The function outputs are very close to zero when x is close to 3.
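Here is a quick numeric sketch of this example (Python, with names chosen just for illustration):

```python
# Verify that error(x)/(x - 3) -> 0 as x -> 3 for f(x) = x^2, a = 3.
def f(x):
    return x**2

def L(x):
    return 9 + 6 * (x - 3)  # tangent line approximation at a = 3

for x in [3.1, 3.01, 3.001, 3.0001]:
    error = f(x) - L(x)  # actual value minus approximate value
    print(x, error, error / (x - 3))  # the ratio shrinks like x - 3
```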

The error in a linear approximation should go to zero very fast as x approaches a. This is related to infinitesimal calculus by the fact that the error function is written in terms of higher powers of dx.
The error function (green) in using the red line to approximate the blue curve is a parabola with a vertex at x=a=3. The error function goes to zero much faster as x\rightarrow 3 than y=x-3 (orange) does.
Errors in Terms of Infinitesimals

It turns out that errors can also be thought of in terms of infinitesimal calculus. In the previous example, let dx=x-3 (imagine x is “infinitesimally close” to 3). Then we can think of the error as a function of dx and write \mbox{error}(dx)=dx^{2}. If dx is infinitesimally small, then dx^{2} is even “more” infinitesimally small.

Maybe we need a new adjective. Should we describe dx^{2} as unbelievably small? Inconceivably small? Unspeakably small? I chose to use “unspeakably small” in later lectures. This is how the error goes to zero “very fast” when described in terms of infinitesimals.

Of course, none of this is rigorous mathematics. In fact, it is, in part, meant to be mildly humorous. In spite of this, however, it is still worth doing.

Newton’s Method is the topic of the last half of Lecture 16A. It is also a topic in Lecture 17A, so I will get into its details in the next section.

Lectures 16B and 17A: Putting Infinitesimal Calculus to Use

It is in Lectures 16B and 17A where I put infinitesimals to their most significant use in my Calculus 1 lectures.

Calculus 1, Lecture 16B: Infinitesimal Calculus for d(sin(x)) and d(cos(x)), Product Rule and Applications

The thumbnail for the video embedded above is an infinitesimal calculus version of the derivative fact \frac{d}{dx}(\sin(x))=\cos(x). The purpose of using infinitesimals in this context is to derive this equation. The derivation is done without using the limit definition of the derivative: it is Calculus Sans Limits. It does rely on a foundational angle sum trigonometric identity, however. That identity is below.

For any two numbers A and B, we have:

\sin(A+B)=\sin(A)\cos(B)+\cos(A)\sin(B)

Approximations Can Become “Exact” in Infinitesimal Calculus

The derivation also relies on the following “exact” equations involving infinitesimals. If dx is an infinitesimal, then

\cos(dx)=1\ \mbox{ and }\ \sin(dx)=dx

To confirm this at an intuitive level, get your calculator out and make sure it is in radian mode.

Use your calculator to see that \cos(0.1)\approx 0.995, \cos(0.01)\approx 0.99995, \cos(0.001)\approx 0.9999995, and \cos(0.0001)\approx 0.999999995.

Now we make a couple observations. First note that we keep dividing successive inputs by 10. Next, note that the successive “errors” in how close the outputs are to 1 are: 0.005, 0.00005, 0.0000005, and 0.000000005. They keep getting divided by 10^{2}=100 as the inputs keep getting divided by 10.

Since an infinitesimal quantity dx is smaller than 10^{-n} no matter how big n is, it therefore makes intuitive sense to say \cos(dx) is “exactly” 1. We go ahead and write \cos(dx)=1 whenever it might be handy.

And why do we write \sin(dx)=dx whenever it might be handy?

Once again, use your calculator to confirm that \sin(0.1)\approx 0.0998, \sin(0.01)\approx 0.0099998, \sin(0.001)\approx 0.0009999998, and \sin(0.0001)\approx 0.0000999999998.

We are again dividing successive inputs by 10. In turn, the “error” in each output in successively approximating 0.1, 0.01, 0.001, and 0.0001 keeps getting divided by 10^{3}=1000 — even faster than the cosine errors above get divided by 100. Because of this, it makes intuitive sense to write the “exact” equation \sin(dx)=dx when dx is infinitesimal.
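You can reproduce both calculator experiments with a few lines of code (a sketch, where a small ordinary number stands in for dx):

```python
# Divide the input by 10 repeatedly and watch the "errors" shrink.
import math

dx = 0.1
for _ in range(4):
    print(dx, 1 - math.cos(dx), dx - math.sin(dx))
    dx /= 10
# 1 - cos(dx) behaves like dx^2/2 (divided by 100 at each step), while
# dx - sin(dx) behaves like dx^3/6 (divided by 1000 at each step).
```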

Deriving the Derivative of the Sine Function

Now we can derive the derivative of the sine function. Let y=f(x)=\sin(x) (where x is measured in radians if it is thought of as an angle). Then, for an infinitesimal increase in the input from x to x+dx, we have

dy=f(x+dx)-f(x)=\sin(x+dx)-\sin(x).

Using the angle sum formula from above, this becomes

dy=d(\sin(x))=\sin(x)\cos(dx)+\cos(x)\sin(dx)-\sin(x).

Then we use our “exact” infinitesimal equations to get

dy=d(\sin(x))=\sin(x)\cdot 1+\cos(x)dx-\sin(x)=\cos(x)dx.

Now just divide both sides by the (nonzero!) infinitesimal dx to get the derivative fact that we seek:

\frac{dy}{dx}=\frac{d}{dx}(\sin(x))=\cos(x).

Isn’t that fun?!? No limits necessary! Calculus Sans Limits!
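If you want a sanity check without trusting the infinitesimal manipulations, here is a numeric sketch where a tiny (but ordinary) number plays the role of dx:

```python
# Compare dy/dx = (sin(x + dx) - sin(x))/dx with cos(x) for a tiny dx.
import math

dx = 1e-6
for x in [0.5, 1.0, 2.0]:
    dy = math.sin(x + dx) - math.sin(x)
    print(x, dy / dx, math.cos(x))  # the two columns agree to ~6 digits
```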

It may seem like magic, but this is how Newton and Leibniz, as well as many people after them, thought about these things.

To a person trained as a modern-day pure mathematician, however, it can leave an uneasy feeling in their stomach.

Personally, I have learned to just appreciate it for what it is: an intuitive way to (oftentimes) get correct answers. Sometimes, however, if you are not careful, it can lead you astray to wrong answers.

The Quotient Rule discussed below is one situation where it is easy to get the wrong answer.

The Product Rule

What is the instantaneous rate of change of the product f(x)\cdot g(x)? Since the derivative of a sum is the sum of the derivatives, i.e., \frac{d}{dx}(f(x)+g(x))=f'(x)+g'(x), we might be tempted to say that the derivative of a product is the product of the derivatives.

This, however, would be incorrect. A simple example suffices to demonstrate this. Let f(x)=2 and g(x)=3x. Then f(x)\cdot g(x)=6x, whose derivative is 6. But the product f'(x)\cdot g'(x)=0 since f is a constant function.

The essence of this issue is this: it is not only the size of the derivatives f'(x) and g'(x) that affects the size of the derivative \frac{d}{dx}(f(x)\cdot g(x)), it is also the sizes of f(x) and g(x) themselves.

For the example above, f(1)=2, g(1)=3, and f(1)\cdot g(1)=6, while f(1.1)=2, g(1.1)=3.3 and f(1.1)\cdot g(1.1)=6.6. Therefore, when \Delta x=0.1, the change in the product is 6.6-6=0.6.

On the other hand, if we double the value for f to f(x)=4, then the change in the product is doubled as well:

f(1.1)\cdot g(1.1)-f(1)\cdot g(1)=4\cdot 3.3-4\cdot 3=13.2-12=1.2.

That should make sense when you think about formulas. After all, f(x)\cdot g(x)=6x in the first case and f(x)\cdot g(x)=12x in the second case.

Deriving the Product Rule with Infinitesimal Calculus

Let’s see if we can work out the derivative of a product f(x)g(x) using infinitesimal calculus. Suppose the input for the product changes by an infinitesimal amount from x to x+dx. Then:

d(fg)=f(x+dx)g(x+dx)-f(x)g(x).

Assuming f and g are differentiable, we can write df=f(x+dx)-f(x)=f'(x)dx and dg=g(x+dx)-g(x)=g'(x)dx. Therefore, f(x+dx)=f(x)+f'(x)dx and g(x+dx)=g(x)+g'(x)dx, so

d(fg)=(f(x)+f'(x)dx)(g(x)+g'(x)dx)-f(x)g(x).

Expanding this out gives

d(fg)=f(x)g(x)+f(x)g'(x)dx+f'(x)g(x)dx+f'(x)g'(x)dx^{2}-f(x)g(x).

Replacing dx^{2} with 0 and cancelling the two f(x)g(x) terms leads to d(fg)=f(x)g'(x)dx+f'(x)g(x)dx. Now just divide both sides by dx to get the Product Rule:

\frac{d}{dx}(fg)=f(x)g'(x)+f'(x)g(x).
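Here is a numeric sketch of the key step in the derivation (the particular values of f, f', g, and g' are made up for illustration):

```python
# With a tiny dx, the discarded f'(x)g'(x)dx^2 term really is negligible.
dx = 1e-6
f, df = 2.0, 5.0 * dx  # pretend f(x) = 2 and f'(x) = 5 at some point x
g, dg = 3.0, 7.0 * dx  # pretend g(x) = 3 and g'(x) = 7 at the same point

d_fg = (f + df) * (g + dg) - f * g  # the exact change in the product
rule = f * dg + g * df              # the Product Rule terms f g' dx + f' g dx
print(d_fg, rule, d_fg - rule)      # the difference is df*dg = 35*dx^2, tiny
```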

This derivation is actually done in Lecture 17A. You will find Lecture 17A embedded further below.

There is a 3Blue1Brown Essence of Calculus video where Grant Sanderson talks about how to visualize this. The function values f(x) and g(x) represent lengths while their product f(x)g(x) represents an area.

3Blue1Brown: Visualizing the chain rule and product rule | Essence of calculus, chapter 4

Grant also recommends memorizing the Product Rule as “right dleft plus left dright”. There is a left function and a right function being multiplied and we’d like to take the derivative of the product. The “d” in the mnemonic represents differentiation.

I like this way of memorizing it because it really flows off of your tongue.

A quick application in my lecture is to find the derivative of h(x)=x^{2}e^{x}. The left function is f(x)=x^{2} while the right function is g(x)=e^{x}. Since f'(x)=2x and g'(x)=e^{x}, the Product Rule allows us to conclude that h'(x)=x^{2}e^{x}+2xe^{x}.

For applications where you need to determine where h'(x)=0, it is good to factor the answer as h'(x)=xe^{x}(x+2). That helps you see that h'(x)=0 if and only if x=0 or x=-2.
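A computer algebra system can confirm both the derivative and the factored form (a sketch using sympy):

```python
from sympy import symbols, exp, diff, factor

x = symbols('x')
h = x**2 * exp(x)
print(diff(h, x))          # x**2*exp(x) + 2*x*exp(x)
print(factor(diff(h, x)))  # x*(x + 2)*exp(x), zero iff x = 0 or x = -2
```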

Application to Revenue

Finally, I wrap up Lecture 16B with a business application. If p is the price of an item being sold, let q=f(p) be the corresponding demand, which is the number of items you will sell (over a certain time period, of course). Then the product R(p)=pq=pf(p) will be the revenue from the sales, which is the amount of money taken in.

The Product Rule implies that \frac{dR}{dp}=R'(p)=pf'(p)+f(p). It is interesting to note that R'(p)=0 if and only if f'(p)=-\frac{f(p)}{p}. This has an interesting graphical interpretation in the lectures.
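To make this concrete, here is a sketch with a hypothetical linear demand curve f(p)=100-2p (my own illustration, not an example from the lecture):

```python
from sympy import symbols, diff, solve

p = symbols('p', positive=True)
f = 100 - 2 * p  # hypothetical demand function
R = p * f        # revenue R(p) = p * f(p)
print(solve(diff(R, p), p))  # [25], the revenue-maximizing price
# Check the characterization: f'(p) = -f(p)/p when p = 25 (both equal -2).
print(diff(f, p), (-f / p).subs(p, 25))
```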

Newton’s Method and the Nature of the Square Root of Two

Lecture 17A ends with the derivation of the Product Rule and more discussion about the revenue example from above.

Calculus 1, Lecture 17A: Newton’s Method, Sqrt(2) Existence & Irrationality, Product Rule with Infinitesimals

Before that point, the main things I emphasize are Newton’s Method and the nature of the number \sqrt{2}. This content can also be found at my blog post: “Does the Square Root of Two Exist?“.

One thing that I prove is that, if it exists, the square root of 2 cannot be a rational number. This means it cannot be written as a ratio of the form \frac{p}{q} where p and q are integers (positive or negative whole numbers) and q\not=0.

The proof of this is considered to be among the most beautiful in mathematics. Why? Because it is elegant (short and ingenious) and proves something very profound (deep and unexpected) about creation.

Another Video and More About the Square Root of Two

I won’t reproduce the proof here. I strongly encourage you to watch either the video above or the video below, where I prove a more general fact about irrational numbers.

Prove Square Roots of Non-Perfect Squares are Irrational

But here is perhaps a deeper question: does \sqrt{2} exist? This question is also addressed in the same blog post “Does the Square Root of Two Exist?“.

The proof of this fact from first principles is also harder. In fact, an entire research program to address this and related questions was started in the 19th century. It was called the Arithmetization of Analysis.

The proof can be done more easily with “higher-level” principles. In fact, I will discuss that further below under Lecture 18A. It is based on a theorem named the Intermediate Value Theorem (IVT).

Before moving on, you should take the time to realize that it is an issue that needs to be resolved. After all, no one but God knows all the infinitely many decimal places of \sqrt{2}! And think about this: If no person actually knows them all, how do we know that it is a well-defined number?

Newton’s Method

If we assume that \sqrt{2} exists, then our next goal is to approximate it. Newton’s method, alluded to above, is the quickest way to approximate it from scratch.

Newton’s Method relies on the fact that a tangent line to the graph of a nonlinear function f near a point x=x_{0} is a good approximation to the graph of f (see above again). Therefore, as long as the point (x_{0},f(x_{0})) is relatively close to the x-axis to begin with, then the x-intercepts of the function f and its tangent line should be close together.

The linear function L whose graph is the tangent line to f at the given point is defined by y=L(x)=f(x_{0})+f'(x_{0})(x-x_{0}). Call the x-intercept of this function x_{1}. Then 0=L(x_{1})=f(x_{0})+f'(x_{0})(x_{1}-x_{0}). Solving this equation for x_{1} yields x_{1}=x_{0}-\frac{f(x_{0})}{f'(x_{0})}.

Rinse and Repeat

This process can be repeated (iterated) to produce an x-intercept of the tangent line to the graph of f near x=x_{1}. This gives a new x-intercept x_{2}=x_{1}-\frac{f(x_{1})}{f'(x_{1})}.

In general, we have a recursive formula that generates, based on an initial guess x_{0}, a sequence x_{0},x_{1},x_{2},x_{3},\ldots.

x_{n}=x_{n-1}-\frac{f(x_{n-1})}{f'(x_{n-1})}\ \mbox{ for }n=1,2,3,\ldots

The hope is that x_{n} approaches the true root of f as n\rightarrow \infty.

In fact, it often converges to the true root very rapidly.

Approximating the Square Root of Two

How should we use Newton’s Method to approximate \sqrt{2}?

Start by noting that, by definition, \sqrt{2} is the unique positive root of the differentiable and continuous function f(x)=x^{2}-2.

The recursive formula above becomes x_{n}=x_{n-1}-\frac{x_{n-1}^{2}-2}{2x_{n-1}}.

If we start by guessing x_{0}=1.5, the next value is x_{1}=1.5-\frac{1.5^{2}-2}{3}=1.5-0.08333\ldots=1.416\ldots. Using this number, the next value is x_{2}=1.416\ldots-\frac{1.416\ldots^{2}-2}{2.833\ldots}\approx 1.414215686. This is already very close to \sqrt{2}\approx 1.414213562. Newton’s Method does indeed seem to produce estimates that converge very quickly to the true value.
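Here is the whole computation as a short Python sketch (the function name is mine):

```python
def newton_sqrt2(x0, steps):
    # Newton's Method applied to f(x) = x^2 - 2.
    x = x0
    for n in range(steps):
        x = x - (x**2 - 2) / (2 * x)  # x_n = x_{n-1} - f(x_{n-1})/f'(x_{n-1})
        print(n + 1, x)
    return x

newton_sqrt2(1.5, 4)
# iterates: 1.4166666..., 1.41421568627..., 1.41421356237468...,
# then 1.4142135623730951, which is sqrt(2) to machine precision
```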

Lectures 17B and 18A: More Calculus Rules

I start Lecture 17B off with more discussion of the revenue example from above before diving into the Quotient Rule and Chain Rule.

Calculus 1, Lecture 17B: Demand & Revenue Curves (Geometric Relationship at Max), Quotient Rule, Chain Rule
Derivation of the Quotient Rule with Infinitesimal Calculus

The Quotient Rule can be derived with infinitesimals, though this is a situation where it is easy to make a mistake and get the wrong answer.

To be more precise, this is a situation where one infinitesimal will be ignored (replaced by zero) and one will not. It is difficult to know which should be replaced by zero to get the correct final answer.

Here is the calculation. Let h(x)=\frac{f(x)}{g(x)} and suppose the input gets “nudged” by an infinitesimal amount from x to x+dx. Then

dh=\frac{f+df}{g+dg}-\frac{f}{g}=\frac{g(f+df)-f(g+dg)}{g(g+dg)}=\frac{g\cdot df-f\cdot dg}{g(g+dg)}.

Now, here comes the extra-tricky part. In the bottom of this fraction, make the replacement dg=0. But do not do this in the top! This results in dh=\frac{g\cdot df-f\cdot dg}{g^{2}}. Dividing both sides by dx and explicitly showing the input x gives the Quotient Rule:

\frac{dh}{dx}=h'(x)=\frac{g(x)f'(x)-f(x)g'(x)}{(g(x))^{2}}.

The Trouble with Infinitesimals

But this is very confusing! Why should dg be replaced by zero in one spot but not in another? Is it excusable because we already know the “right” final answer?

As unsatisfying as it may be, I think this is just something that we’ll have to accept as part of the “risk vs. reward” of using infinitesimals. They can be fun and often get you to the right answers without using limits, but they can also easily lead you to making errors.

A fundamental fact that can be obtained using the Quotient Rule is the derivative of the tangent function.

\frac{d}{dx}(\tan(x))=\frac{d}{dx}\left(\frac{\sin(x)}{\cos(x)}\right)=\frac{\cos(x)\cdot \cos(x)-\sin(x)\cdot (-\sin(x))}{\cos^{2}(x)}=\frac{\cos^{2}(x)+\sin^{2}(x)}{\cos^{2}(x)}=\frac{1}{\cos^{2}(x)}=\sec^{2}(x).
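sympy can confirm this Quotient Rule computation (a sketch):

```python
from sympy import symbols, sin, cos, diff, simplify

x = symbols('x')
lhs = diff(sin(x) / cos(x), x)        # derivative of tan(x) via the quotient
print(simplify(lhs - 1 / cos(x)**2))  # 0, so d/dx tan(x) = 1/cos^2(x) = sec^2(x)
```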

I usually remember the Quotient Rule by the mnemonic: low dhigh minus high dlow over the square of what’s below.

Derivation of the Chain Rule with Infinitesimal Calculus

The Chain Rule tells us how to differentiate a function composition y=h(x)=(f\circ g)(x)=f(g(x)). In this case, it is best to define an intermediate variable, often called u, to be u=g(x). Then we can also write y=f(u).

If we now “nudge” the input by an infinitesimal amount from x to x+dx, then du=g'(x)dx. This then leads to a “chain reaction” that produces an infinitesimal change in the final output dy=f'(u)du.

But this last statement can then be written as dy=f'(u)du=f'(g(x))g'(x)dx. Dividing both sides by dx gives the Chain Rule:

\frac{dy}{dx}=h'(x)=f'(g(x))g'(x)=\frac{dy}{du}\cdot \frac{du}{dx}.

We can apply the Chain Rule to, for example, find the derivative of h(x)=\sin^{2}(x). We can write h as h(x)=f(g(x)) if we choose g(x)=\sin(x) and f(x)=x^{2}. Since f'(x)=2x and g'(x)=\cos(x), the Chain Rule implies that h'(x)=2\sin(x)\cdot \cos(x).

Contrast this with the derivative of h(x)=\sin(x^{2}), which is h'(x)=\cos(x^{2})\cdot 2x.
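Both examples are easy to double-check symbolically (a sketch using sympy):

```python
from sympy import symbols, sin, diff

x = symbols('x')
print(diff(sin(x)**2, x))  # 2*sin(x)*cos(x)
print(diff(sin(x**2), x))  # 2*x*cos(x**2)
```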

I discuss the derivations of these facts in Lecture 18A as well.

Calculus 1, Lecture 18A: IVT (Intermediate Value Theorem), Quotient Rule, Chain Rule, Derivatives of Logarithms & Inverse Tangent

But I also discuss many other things in Lecture 18A.

Proving the Square Root of Two Exists with the Intermediate Value Theorem (IVT)

As mentioned above, a “higher-level” proof of the existence of \sqrt{2} can be accomplished with a theorem called the Intermediate Value Theorem (IVT).

The IVT requires a thorough understanding of continuity, however. This ultimately rests on the precise definition of a limit as well as the Completeness Property of the real number system {\Bbb R}. It is not an easy thing to prove from scratch.

Intermediate Value Theorem (IVT): Suppose f is a function which is defined and continuous on a closed interval [a,b]=\{x\in {\Bbb R}| a\leq x\leq b\}. If v is any number between f(a) and f(b), then there exists a number c\in [a,b] such that f(c)=v.

Applying the IVT to prove that \sqrt{2} exists is pretty easy. Let f(x)=x^{2}-2 (the same function we applied Newton’s Method to above). Then f is continuous over the whole real number line {\Bbb R}. In particular, f is continuous on the closed interval [a,b]=[1,2]. Also note that f(a)=f(1)=-1 and f(b)=f(2)=2. Let v=0 and note that v is between f(a) and f(b). The IVT now implies the existence of a number c\in [1,2] with the property that f(c)=0. But this means c^{2}=2. Since c\geq 1>0, it follows that c=\sqrt{2}. Q.E.D.
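The IVT is also what makes the bisection method work: since f changes sign on [1,2], a root is trapped inside, and halving the interval traps it in a smaller and smaller range. A minimal sketch:

```python
def bisect(f, a, b, steps):
    # Repeatedly halve [a, b], keeping the half where f changes sign.
    for _ in range(steps):
        m = (a + b) / 2
        if f(a) * f(m) <= 0:
            b = m  # sign change on [a, m], so a root is there
        else:
            a = m  # otherwise a root is in [m, b]
    return (a + b) / 2

print(bisect(lambda x: x**2 - 2, 1, 2, 40))  # ~1.414213562...
```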

Other Content

Other content of Lecture 18A includes the derivations of the facts that \frac{d}{dx}\left(\log_{b}(x)\right)=\frac{1}{\ln(b)x} for all x>0 and \frac{d}{dx}\left(\tan^{-1}(x)\right)=\frac{d}{dx}\left(\arctan(x)\right)=\frac{1}{1+x^{2}} for all x\in {\Bbb R}.

The first of these rules also includes the case where b=e so that \ln(b)=\ln(e)=1. Therefore, \frac{d}{dx}\left(\ln(x)\right)=\frac{1}{x}.

These facts are derived using some ingenuity along with the Chain Rule. For example, assuming that \tan^{-1}(x) is differentiable for all x\in {\Bbb R}, the Chain Rule implies that the equation \tan(\tan^{-1}(x))=x can be differentiated on both sides to get \sec^{2}(\tan^{-1}(x))\cdot \frac{d}{dx}\left(\tan^{-1}(x)\right)=1. Multiplying both sides of this equation by \cos^{2}(\tan^{-1}(x)) implies that \frac{d}{dx}\left(\tan^{-1}(x)\right)=\cos^{2}(\tan^{-1}(x)). By drawing a right triangle and labeling one of the non-right angles with \tan^{-1}(x), you will see that \frac{d}{dx}\left(\tan^{-1}(x)\right)=\frac{1}{1+x^{2}}. (Trigonometry and the Pythagorean Theorem are needed here — see Lecture 18A above).
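Both derivative facts can be double-checked symbolically (a sketch using sympy; note that sympy writes \log_{b}(x) as log(x, b)):

```python
from sympy import symbols, log, atan, diff, simplify

x, b = symbols('x b', positive=True)
print(simplify(diff(log(x, b), x)))  # 1/(x*log(b))
print(simplify(diff(atan(x), x)))    # 1/(x**2 + 1)
```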