Adventures in Mathematics PDF

Exponentiation: Theorems, Proofs, Problems
Pre/Calculus 11, Veritas Prep.
Our Exponentiation Theorems

Theorem A: $a^{n+m} = a^n a^m$                                  Theorem E: $\frac{a^n}{a^m} = a^{n-m}$
Theorem B: $(a^n)^m = a^{nm}$                                    Theorem F: $a^0 = 1$
Theorem C: $(ab)^n = a^n b^n$                                    Theorem G: $a^{-n} = \frac{1}{a^n}$
Theorem D: $\left(\frac{a}{b}\right)^n = \frac{a^n}{b^n}$        Theorem H: $a^{n/m} = \sqrt[m]{a^n}$
Like all theorems, these do not come out of nowhere. They come from a definition and logical deduction.
Let us start, then, with a definition. What is exponentiation, anyway? What do we mean when we
write a number with a superscript? I contend that we mean[1] something like repeated multiplication. When
we write, for example, $5^4$, we mean this simply as a more convenient way of writing 5 · 5 · 5 · 5 (or, even
more simply, 625). Formally, let’s define exponentiation in the following way[2]:
$$a^n \equiv \underbrace{a \cdot a \cdot a \cdots a}_{n \text{ times}}$$
(The triple-equals sign here is used to show that this is a definition: this equation is not the consequence
of some earlier theorem or axiom; rather, with it we’re defining what we mean by $a^n$.) Let us
prove these theorems.
Theorem A: $a^{n+m} = a^n a^m$
Proof: We’ll start with the left side of the equation, apply the definition of exponentiation, do some
algebra, and eventually end up with the right side.
$$\begin{aligned}
a^{n+m} &= \underbrace{a \cdot a \cdot a \cdots a}_{n+m \text{ times}} && \text{(by the definition of exponentiation)} \\
&= (\underbrace{a \cdot a \cdots a}_{n \text{ times}})(\underbrace{a \cdot a \cdots a}_{m \text{ times}}) && \text{(because multiplication is associative)} \\
&= a^n \cdot a^m && \text{(applying the definition of exponentiation again)}
\end{aligned}$$
(Done! This stylized A is my end-of-proof symbol.)
Theorem B: $(a^n)^m = a^{nm}$

Proof: To prove this, we’ll need to apply the definition of exponentiation twice. (Well, actually three
times, but the last time doesn’t count.)
$$\begin{aligned}
(a^n)^m &= (\underbrace{a \cdot a \cdots a}_{n \text{ times}})^m && \text{(by definition)} \\
&= \underbrace{(\underbrace{a \cdot a \cdots a}_{n \text{ times}}) \cdot (\underbrace{a \cdot a \cdots a}_{n \text{ times}}) \cdots (\underbrace{a \cdot a \cdots a}_{n \text{ times}})}_{m \text{ times}} && \text{(by definition, again)} \\
&= \underbrace{a \cdot a \cdot a \cdots a}_{nm \text{ times}} && \text{(multiplication)} \\
&= a^{nm} && \text{(finish by re-applying the definition in reverse)}
\end{aligned}$$

[1] For the most part. If you’re a mathematician, we could have a lengthy conversation about everything I say on this sheet,
but for our purposes this definition is sufficient. If you disagree (and this is in an imaginary conversation between myself and
a mathematician), I will say but this: do you believe we should teach students Dedekind cuts before we let them utter the
words, “real number”?
[2] Note that we haven’t put any specifications on what a and n can and/or should be: must they be integers? real numbers?
complex numbers? The way our definition works, what with the “n times” business, it can only hold for n being a positive
integer, but these theorems hold for n being any real number. What we’ll see in our derivations is that, even though we start
by considering n only as a positive integer, we can extend our idea of “exponentiation” in a very logical way such that it holds
for any real number. But this is in a footnote for a reason.
Theorem C: $(ab)^n = a^n b^n$
Proof: See a pattern? We’ll apply the definition of exponentiation, do some algebra, and eventually get
what we want.
$$\begin{aligned}
(ab)^n &= \underbrace{ab \cdot ab \cdots ab}_{n \text{ times}} && \text{(by definition)} \\
&= (\underbrace{a \cdot a \cdots a}_{n \text{ times}}) \cdot (\underbrace{b \cdot b \cdots b}_{n \text{ times}}) && \text{(multiplication is commutative, so we can rearrange)} \\
&= a^n b^n && \text{(definition)}
\end{aligned}$$
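As a quick sanity check of Theorems A, B, and C, here is a minimal Python sketch that takes the definition literally (repeated multiplication, positive integer exponents only). The function name `power` and the sample values are illustrative assumptions, not part of the handout, and checking a few cases is evidence rather than a proof.

```python
def power(a, n):
    """Compute a**n straight from the definition: n copies of a multiplied together.

    Only positive integer n is allowed, matching the definition in the text.
    """
    if not isinstance(n, int) or n < 1:
        raise ValueError("the definition only covers positive integer exponents")
    result = a
    for _ in range(n - 1):  # n factors of a means n - 1 multiplications
        result *= a
    return result


a, b, n, m = 5, 3, 4, 2  # arbitrary sample values

# Theorem A: a^(n+m) = a^n * a^m
assert power(a, n + m) == power(a, n) * power(a, m)
# Theorem B: (a^n)^m = a^(nm)
assert power(power(a, n), m) == power(a, n * m)
# Theorem C: (ab)^n = a^n * b^n
assert power(a * b, n) == power(a, n) * power(b, n)

print(power(5, 4))  # 625, the example from the text
```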
Theorem D: $\left(\frac{a}{b}\right)^n = \frac{a^n}{b^n}$
Proof: How do you think this should go?
Theorem E: $\frac{a^n}{a^m} = a^{n-m}$.
Proof: This is where things start to get interesting. The clear first step is to write out the entire quantity:
$$\frac{a^n}{a^m} = \frac{\overbrace{a \cdot a \cdots a}^{n \text{ times}}}{\underbrace{a \cdot a \cdots a}_{m \text{ times}}}$$
But where we go from here will depend on the relative size of n and m. n could be bigger than m, it could
be smaller, or the two could be equal.
Let’s start with the first case: that of n > m. In that case, both the top and the bottom will have at
least m of the a’s, and the top will have more: it’ll have n − m more (because m + (n − m) = n). Put
differently, we can think of our fraction as being something like this:
$$\frac{a^n}{a^m} = \frac{\overbrace{a \cdot a \cdots a}^{m \text{ times}} \cdot \overbrace{a \cdot a \cdots a}^{n-m \text{ times}}}{\underbrace{a \cdot a \cdots a}_{m \text{ times}}}$$
If you’re confused, count up the total number of a’s on top. There should be n of them, just like what we
started with. And then, wait a second! There are m a’s on top, and m a’s on the bottom! We can cancel
them out! And we just end up with:
$$= \overbrace{a \cdot a \cdots a}^{n-m \text{ times}}$$
Which, by definition, is just $a^{n-m}$.
But what about these other two cases? First of all, what happens in the case that m = n? Well, if
m = n, then $\frac{a^n}{a^m}$ will just equal $\frac{a^n}{a^n}$ (or $\frac{a^m}{a^m}$; either way is the same). But that clearly[3] is just equal to 1
(as long as a ≠ 0, so that we’re allowed to divide).
And I hope you will agree that if n and m are equal, then n − m = 0, and that then $a^{n-m} = a^0$. So clearly,
then, since we want $\frac{a^n}{a^m} = a^{n-m}$, we need $a^0 = 1$. (Which should make sense. 1, after all, is the multiplicative
identity, so if we multiply something “no” times, we should end up not with zero, but with one. Think
about how if you have a fraction in which everything cancels out, you have 1, not 0. Same sort of deal.)
Finally, what if n < m? Then (by the same argument as the first case) we have something like:
$$\frac{a^n}{a^m} = \frac{\overbrace{a \cdot a \cdots a}^{n \text{ times}}}{\underbrace{a \cdot a \cdots a}_{n \text{ times}} \cdot \underbrace{a \cdot a \cdots a}_{m-n \text{ times}}}$$
But wait! We can cancel things here, too! We get:
$$= \frac{1}{\underbrace{a \cdot a \cdots a}_{m-n \text{ times}}} = \frac{1}{a^{m-n}}$$
But this isn’t quite what we want. We want $a^{n-m}$. So let’s define $\frac{1}{a^{\text{stuff}}}$ as being equal to $a^{-\text{stuff}}$. That
way, our equation here will become:
$$= \frac{1}{a^{m-n}} = a^{-(m-n)} = a^{-m+n} = a^{n-m}$$
Excellent.
This has been a long discussion, so let’s review what we’ve done. We’ve extended our idea of an
exponent from just the positive integers to all integers (i.e., we can now exponentiate by 0 and negative
numbers!) But in order to do this, we had to make a slight extension of our definition. Our original
definition only considered the case of an exponent being a positive integer; in the course of this proof, we
discovered a natural way to extend that definition to 0 and the negatives. Thus this has been somewhat
more than just a “proof”—it has been partly a proof (for n > m), and partly a redefinition.
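To see the extended definition at work, here is a small Python sketch that handles positive, zero, and negative integer exponents and then spot-checks Theorem E in all three cases (n > m, n = m, n < m). The name `int_power` and the sample values are hypothetical, and the sketch assumes a ≠ 0. It uses exact fractions so that no rounding gets in the way.

```python
from fractions import Fraction


def int_power(a, n):
    """a**n for any integer n and nonzero a, using the extended definition."""
    a = Fraction(a)
    if n == 0:
        return Fraction(1)            # Theorem F: a^0 = 1
    if n < 0:
        return 1 / int_power(a, -n)   # Theorem G: a^(-n) = 1 / a^n
    result = Fraction(1)
    for _ in range(n):                # positive n: repeated multiplication
        result *= a
    return result


# Theorem E in all three cases: n > m, n = m, n < m
a = 7
for n, m in [(5, 2), (3, 3), (2, 6)]:
    assert int_power(a, n) / int_power(a, m) == int_power(a, n - m)

print(int_power(2, -3))  # 1/8
```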
Theorem F: $a^0 = 1$
Proof: We already discussed this, in the proof of E.
Theorem G: $a^{-n} = \frac{1}{a^n}$
Proof: Again, we already discussed this in the proof of E.
[3] Note how my writing style is a bit different in this proof than in the previous ones: I’m making an argument in prose
rather than in a nice, clean, two-column format. That’s OK. Many math textbooks do the same thing, and it’s the job of the
reader (and what a job it is!) to translate that into a more clear, more symbolic format if necessary.
Theorem H: $a^{n/m} = \sqrt[m]{a^n}$
Proof: So far, we already know how to deal with exponents when the exponents are positive integers
(thanks to our definition), zero (thanks to F), or negative integers (thanks to G). But what if we want to
move beyond the integers and have exponents that are rational numbers (i.e., fractions)? This theorem
will tell us how to think about that.
First of all, I should point out that we don’t really have a solid definition of what we mean by $\sqrt{\text{stuff}}$.
(From a technical, mathematical standpoint, what we’re doing here is defining what we mean by such
notation.) But intuitively, I hope you agree that the basic idea of a radical/root is that if I have an nth
root of something, and I multiply n of those nth roots, I end up with my original thing. Meaning the thing
under the radical sign. Right? Good.
So imagine we have $a^{1/m}$. (Can we do this? Let’s assume we can.) Now imagine we have m copies of
it, all multiplied together:
$$\underbrace{a^{1/m} \cdot a^{1/m} \cdot a^{1/m} \cdots a^{1/m}}_{m \text{ times}}$$
By the definition of exponentiation, this must be $(a^{1/m})^m$. But by B, we can simplify this:
$$a^{(1/m) \cdot m} = a^{m/m} = a$$
In other words, whatever this $a^{1/m}$ is, if we take m copies of it, we get a. So this is just a root! (The mth
root, to be specific.) Formally:
$$a^{1/m} = \sqrt[m]{a}$$
What about the rest (the n/m part)? Imagine we have $a^{n/m}$. Because of Theorem B again, we can write
this as $(a^n)^{1/m}$. Which, because of what we just proved, is $\sqrt[m]{a^n}$.
So now we know how to exponentiate by any rational number. Hooray!
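A numerical illustration of Theorem H, as a hedged sketch: it computes $a^{n/m}$ as the mth root of $a^n$ using Python's built-in floating-point power, so results are approximate and are compared with a tolerance. The helper name `rational_power` is an assumption for illustration, and the sketch assumes a > 0 and m ≥ 1.

```python
import math


def rational_power(a, n, m):
    """Approximate a**(n/m) as the m-th root of a**n (a > 0, integer m >= 1)."""
    if a <= 0 or m < 1:
        raise ValueError("this sketch only handles a > 0 and m >= 1")
    return (a ** n) ** (1.0 / m)


# An m-th root, multiplied by itself m times, gives back the original number.
root = rational_power(2, 1, 5)            # the fifth root of 2
assert math.isclose(root ** 5, 2.0)

# A few values in the style of the problems that follow.
assert math.isclose(rational_power(8, 2, 3), 4.0)      # 8^(2/3)
assert math.isclose(rational_power(9, 5, 2), 243.0)    # 9^(5/2)
assert math.isclose(rational_power(16, -3, 4), 0.125)  # 16^(-3/4)
```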
Problems
Evaluate each of the following expressions.
1/2 0
1. 4−3 25. 8243,458 + 121437

4
13.
2. 5−3 25 1/2
−3 −2/3 26. 150 + 15
2 1 −1
3. 14. 27. 3−1 + 3−2
3 125
4. 95/2 1 1
15. 163/4 28. +
1/2 4−2 4−1
5. 9−3/2 16. 1210 64
6. 16−3/4 29.
17. 5−1 · 5−3 642/3
7. 64−2/3 18. 5−2 · 6−2 91/2
2/3 30.
8. 50 19. 4−3 + 8−2 27−1/3
√ √ √
0
20. 60 + 6−1 31. 5 20 − 45 + 2 80
9. 71/3 √ √ √
1/2
21. 30 + 3 32. 3 40 + 2 3 135 − 5 3 320
10. 81−3/4
1/3
1 22. −90 + 91/2 211/12 · 2−7 · 2−5
11. √ 33.
27
2 23 · 21/2 · 2−10
23. 3 216
(32 )−1/2 (94 )−1
1/3 34.
8 24. 811/2 − 81−1/2 27−3
12.
27
Simplify each of the following expressions so that they are written only with positive exponents and no
radicals.
−2 √
5 (6a)1/2 ab
35. −3 45. 2x 3y 54.
y a2 b3/2
−8 18x −2
x !15/9
36. −2 46. −3 r 2/3
x 9xy 55.
−1 −2 s1/5
37. 2x y y 47. (27x−3 y −9 )1/3
√
38. (3x−2 y)(4x5 y −4 ) 4
48. r8 s12 56. (c2/5 d−2/3 )(c6 d3 )4/3
√
39. (2x3 y)−2 3
49. 27a−3 b (2a)1/2 (3b)−2 (4a)3/5
−1 −3 57.
40. (3a b) (4a)−3/2 (3b)2 (2a)1/5
p
50. 7 −x14 y 28
41. (16x−2 )1/2 2
p
51. 9 (4x + 2y)18 58. (ax )1/x
42. (8y 6 x−3 )1/3 √3
p
52. √ a + b + 3 −(a + b)2 + (bx )x−1
−1
43. (4x )(2x) −2 3
a + b 59.
b−x
x2 (7a)2 (5b)3/2 c
44. −2 −1 53. 3/2 4
60. 5/6 42 51 −2/3
x y (5a) (7b) (c ) (c )
Prove that each of the following equations are true. (How do you do this? Think of it as being like a
two-column proof in geometry. Start with one side of the equation and step-by-step apply exponentiation
laws until you end up with the other side of the equation.)
√ √
ab 1 c2 d6 d5 aa b7 1
61. 2 3/2 = √ 62. √ =√ 63. √ = 14−a
a b ab a 3
4c d −4 4c (a b) 14 a
Solving (Polynomial) Equations
Solve each of the following equations for x. (What do I mean by “solve”? I mean, find all the values of x
such that the equation is true.)
1. x = 0 22. x2 − 5x = 14 43. x2 − 2x = 12
2. 2x = 4 23. x2 + 5x + 6 = 0 44. x2 − 4x − 30 = 0
3. 3x = 5 24. x2 + x = 20 45. x2 − x − 1 = 0
4. x − b = 0 25. 2x2 + 5x − 3 = 0 46. x2 + 3x − 2 = 0
5. 4x − 2a = −7 26. 3x2 − x − 2 = 0 47. x2 − 4x + 1 = 0
6. 5 = 3 27. 4x2 + 9x + 2 = 0 48. x2 + 6x + 7 = 0
7. x − 5 = 6 28. 9x2 + 2 = 11x 49. x2 + 6 = 2x
8. x2 = 9 29. 3x2 + x = 4 50. 4x2 − 4x = 7
9. x2 = 12 30. 5x2 + 26x = −5 51. 4x2 − 8x + 1 = 0
10. x2 = 40 31. 12x2 + 13x = 4 52. 5x2 + 8x = −2
11. −x2 = −10 32. 18x2 = 23x + 6 53. x2 + 9x + 18 = 0
12. 3x2 = 12 33. x2 − yz + xz − xy = 0 54. 4x(x + 1) = 1
1 2
13. 2x = 10 34. x6 − 2x4 − 8x2 + 16 = 0 55. 2x2 = 7x + 15
14. −5x2 = −30 35. a3 − 2b2 + 2a2 b − ab = 0 56. x2 + 4x + 13 = 0
15. −3x2 = 11 36. u2 x − 2w2 − 2uxw + uw = 0 7x2 2x
57. 3 = 3 −1
16. 25x2 − 4 = 0 37. x3 + 4x2 − 8x − 32 = 0 58. 25x + 4
x = 20
17. 4x2 − 28 = 0 38. z8 − 5z 7 + 2z − 10 = 0 √
59. 7x2 + 3x + 33 = 0
18. −3x2 + 8 = −20 39. 12x3 + 9x2 + 8x + 6 = 0 60. x2 + πx + 3.1 = 0
19. −2x2 − 11 = 5 40. 10x2 y − 8x2 + 5y − 4 = 0 61. 5x2 + 5x + 5 = 0
1
20. x =0 41. 3x2 + 5x + 11 = 0 62. 10αx29 −40gαx27 −5yx2 α2 +
21. x2 − 8x + 15 = 0 42. px2 + qx + r = 0 20α2 gy = 0
Painting Perfect Polynomials
Calculus 11/12, Veritas Prep.
Polynomials, as you have known for years, are functions that look like
$f(x) = 5x^2 + 3x - 2$          $f(x) = -8x^{15}$          $f(x) = 3x + 4$
$f(x) = 25.7x^{9,000,000} - 3x$          $f(x) = 9$          $f(x) = x^3 + 92x$
and so on. Basically, we have a bunch of terms added together, where each term looks like a variable raised
to some power times some coefficient. We can have as many or as few terms as we want; the coefficients
can be integers, real numbers, whatever.
The only real restriction is that the variable must be raised to some non-negative, integer power. For
example, as tempting as it is to say that something like $5x^{-2} - 3x^{-1}$ is a polynomial, it’s not. Sorry. We
exclude things with negative exponents from our definition of a polynomial because they cause problems—
we get vertical asymptotes and such, which are messy and discontinuous, and we want to be able to live
in a world without discontinuities. Such is the world of polynomials.
Likewise, we don’t want to call something with fractional exponents, like $5x^{1/2} - 3x^{2/3}$, a polynomial,
either. Imagine $x^{1/2}$, a square root. What if you try to plug a negative in (i.e., take the square root of
a negative number)? You can’t do it. (Not in the real numbers, anyway.) We don’t want that to be a
polynomial—we want to be able to plug any number, positive or negative, big or small, into our polynomial,
and get a result.
What about something like $x^{1/3}$? This is just another way of writing a cube root. And we can take
the cube root of a negative number: the domain of $x^{1/3}$ is all real numbers. Except... what’s going on at
x = 0? The function has a vertical tangent line. (Think about what the graph looks like.) Put differently,
the function, at the instant that x = 0, is vertical. (But not so much that it violates the vertical line test...
crazy, eh?) And the immediate consequence of that is that its derivative (which is, after all, just the slope
of the function) has a vertical asymptote there. Ouch. We don’t want that—we want the function to have
no vertical asymptotes, and we want its derivatives to have no vertical asymptotes, either. Not only do we
want functions that behave nicely on the surface, we want them to still behave nicely once we start doing
calculus with them (we want to be able to take their derivatives, as many derivatives we want, and still
not get anything messy).
What else? Trigonometric and exponential functions ($e^x$, sin(x), etc.) aren’t polynomials either. What
we’ll see eventually in calculus is that if we could write polynomials with infinitely many terms, we could
write trig functions and exponentials as polynomials[1]. But that’s for later. For now, we’ll require our
polynomials to be of finite length because infinity is nasty and nebulous and it’s not entirely clear what
we mean by “infinity”[2].
My point with all of this is that the definition of a polynomial is not something chiseled in stone
that ancient mathematicians found and which has been passed down, generation to generation, ever since.
The reason we define polynomials the way we do is because we want to have a class of functions that, to
put it colloquially, behave nicely. We want to have functions that are defined everywhere—functions free
of vertical asymptotes, holes, or other nether regions. We want to have functions that are differentiable
everywhere—and whose derivatives are also differentiable everywhere, and also have no unpleasant features.
This is why we want this thing we call a “polynomial” to satisfy all these conditions about the exponents
being non-negative integers and so forth, because we want to create a mathematical playground we know we
can have fun in without getting hurt.
[1] And that in fact, trig functions are just a kind of exponential function.
[2] That is not to say that infinity is totally un-understandable, just that we’re not yet at the point where we can understand it. For a
great book on this subject (my favorite popular math book, actually) see David Foster Wallace, Everything and More: A
Compact History of ∞ (W.W. Norton, 2003).
Formally, we define a polynomial (written in standard form) as a function of the form:
$$f(x) = c_n x^n + c_{n-1} x^{n-1} + c_{n-2} x^{n-2} + \cdots + c_2 x^2 + c_1 x^1 + c_0 x^0$$
or just
$$f(x) = c_n x^n + c_{n-1} x^{n-1} + c_{n-2} x^{n-2} + \cdots + c_2 x^2 + c_1 x + c_0$$
where n is a nonnegative integer and where the numbers $c_n, c_{n-1}, c_{n-2}, \ldots, c_2, c_1, c_0$ are real numbers (called
the coefficients of each term). The subscripts are just a way of labeling each of the coefficients. You could
label them as a, b, c etc., if you wanted, but it’d make it hard to write the definition (if you start with
$ax^n + bx^{n-1} + \ldots$, what would you call the coefficient on the $x^1$ and $x^0$ terms?) Note that when we write
them as $c_n, c_{n-1}, \cdots$, the n and n − 1 have no connection to the actual, numerical value of the coefficient;
they’re just a way of saying, “this number $c_k$ (for some k) is the number I want to be the coefficient of the
$x^k$ term.” For example, maybe you have a third degree polynomial like $4x^3 + 8x^2 + 7x + 1.4$. Then $c_3 = 4$,
$c_2 = 8$, $c_1 = 7$, and $c_0 = 1.4$.
If you like, you can write the definition as a finite sum[3]:
$$f(x) = \sum_{k=0}^{n} c_k x^k$$
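The finite-sum definition translates almost word for word into code. The following Python sketch stores the coefficients in a list where position k holds $c_k$ (this storage convention and the function name `evaluate` are assumptions made for illustration) and evaluates the third-degree example from the text.

```python
def evaluate(coeffs, x):
    """Evaluate f(x) = sum over k of c_k * x**k, where coeffs[k] is c_k."""
    return sum(c * x ** k for k, c in enumerate(coeffs))


# 4x^3 + 8x^2 + 7x + 1.4, so c_0 = 1.4, c_1 = 7, c_2 = 8, c_3 = 4
coeffs = [1.4, 7, 8, 4]

degree = len(coeffs) - 1       # highest exponent (assuming the last coefficient is nonzero)
print(degree)                  # 3
print(evaluate(coeffs, 0))     # 1.4, the constant term
print(evaluate(coeffs, 2))     # 4*8 + 8*4 + 7*2 + 1.4
```

Listing the coefficients from $c_0$ upward makes the list index match the exponent, which is exactly the labeling job the subscripts do in the definition.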
There’s some terminology associated with polynomials. The value of the highest exponent of the polynomial—
in this case, n—is called the degree of the polynomial. The term with the highest exponent is called the
leading term (because it’s usually written first), and the coefficient on that term is called the leading
coefficient, like so:
$$f(x) = \overbrace{c_n x^n}^{\text{leading term}} + \underbrace{c_{n-1} x^{n-1}}_{\text{term}} + c_{n-2} x^{n-2} + \cdots + \underbrace{c_2 x^2}_{\text{another term}} + c_1 x + \underbrace{c_0}_{\text{constant term}}$$
Often the terms between the leading term and the constant term are called the cross terms.
We can also write a polynomial in factored form, which is often more convenient. (Sometimes this is
also called the complete factorization of the polynomial.) In fact, for sketching polynomials, it is more
convenient, since it makes it much, much easier to find the x-intercepts (which I’ll talk about in a bit).
For example...
Standard form: $f(x) = x^2 + x - 6$          Standard form: $f(x) = 5x^2 + 5x$

Factored form: $f(x) = \underbrace{(x - 2)}_{\text{factor}} \underbrace{(x + 3)}_{\text{factor}}$          Factored form: $f(x) = 5x(x + 1)$
On the other hand, consider something like $f(x) = (x - 2)(x + 3)^2 - 7$. This is neither factored nor in
standard form. It’s just a weird Frankenstein polynomial.
[3] I mentioned earlier that if you could write polynomials with infinitely many terms, you could write exponential functions
and trig functions as polynomials. Here are two common examples:
$$e^x = \sum_{k=0}^{\infty} \frac{x^k}{k!} = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \frac{x^4}{4!} + \cdots$$
$$\sin x = \sum_{k=0}^{\infty} \frac{(-1)^k x^{2k+1}}{(2k+1)!} = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \cdots$$
(Remember that the exclamation mark is the symbol for the factorial, e.g., 4! = 4 · 3 · 2 · 1.) Look up “Taylor series” if you
want to learn more.
Factoring can be tough. When you first learn it, it’s really hard. There’s no formula for how to
do it. You’re sitting there, a wee little eighth-grader confronting quadratics like $x^2 - 2x - 15$, and you’re
not sure what to do. So you write down (x + __)(x + __), and stare at it for a while. You experiment. You
try some numbers, multiply them out, and they don’t work. So you try some more. And eventually—in a
blinding flash of insight—you realize how to do it! You realize that it’s (x + 3)(x − 5)! In a sense, then,
this is the first time that you really get to experience the feeling of doing math. You get that feeling of
being lost in the woods without a map, and having to stumble through, groping at dark branches, trusting
that eventually you’ll make it out, and then finally, without warning, coming to an opening and pushing
the branches aside and seeing the glory of a beautiful [something]!
You keep factoring. You do more and more problems, and it gets easier. You come up with little
strategies for how to factor, little mental heuristics, and you develop an intuition. And you wonder: could
you formalize this intuition? Meaning: you’ve gotten good at factoring, but you still don’t really want to
have to factor every quadratic you see by hand. It would be nice if you could do it automatically. It would
be nice if you could come up with an equation that tells you how to factor!
So you work on this project for a while. And eventually you come up with the method of completing
the square, and eventually you derive the quadratic formula. And with the quadratic formula, you can
now factor every quadratic!!! To wit:
$$ax^2 + bx + c = a\left(x - \frac{-b + \sqrt{b^2 - 4ac}}{2a}\right)\left(x - \frac{-b - \sqrt{b^2 - 4ac}}{2a}\right)$$
If you’re skeptical, just multiply out the right side (using FOIL or something). After a lot of simplification,
you’ll get just $ax^2 + bx + c$! Isn’t it beautiful? Here’s a picture of the quadratic formula spray-painted
onto a Jersey barrier in New York[4]:
[Figure: the quadratic formula spray-painted onto a Jersey barrier]
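For the skeptical, the multiplying-out can also be delegated to a computer. This Python sketch finds the two roots with the quadratic formula and then expands a(x − r₁)(x − r₂), with the leading coefficient a kept out front, to confirm that the original coefficients come back. The function names are illustrative assumptions, and `cmath.sqrt` is used so that a negative discriminant (complex roots) is handled too.

```python
import cmath


def quadratic_roots(a, b, c):
    """Return the two roots of a*x**2 + b*x + c via the quadratic formula."""
    if a == 0:
        raise ValueError("not a quadratic: a must be nonzero")
    root_of_discriminant = cmath.sqrt(b * b - 4 * a * c)  # complex-safe square root
    return ((-b + root_of_discriminant) / (2 * a),
            (-b - root_of_discriminant) / (2 * a))


def expand_factored(a, r1, r2):
    """Multiply out a*(x - r1)*(x - r2) and return the coefficients (a, b, c)."""
    return a, -a * (r1 + r2), a * r1 * r2


# x^2 - 2x - 15 from the text should factor as (x + 3)(x - 5)
r1, r2 = quadratic_roots(1, -2, -15)
print(r1, r2)

# Expanding the factored form should recover the original coefficients,
# including when the leading coefficient is not 1 or the roots are complex.
for a, b, c in [(1, -2, -15), (2, 5, -3), (1, 4, 13)]:
    r1, r2 = quadratic_roots(a, b, c)
    rebuilt = expand_factored(a, r1, r2)
    assert all(abs(got - want) < 1e-9 for got, want in zip(rebuilt, (a, b, c)))
```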
It turns out that we can do the same thing for any cubic polynomial. Factor it, I mean. Though I guess
you could also use it as a graffiti subject. We can come up with a “cubic formula” which is hideously long
but does factor any third-degree polynomial. Likewise with quartics (fourth-degree polynomials), though
the quartic formula is so long and messy that it’s kind of a nightmare. But as it turns out, for fifth-degree
polynomials and above, there is no general formula for factoring (that is, no formula built from the coefficients
using only arithmetic and roots), so sometimes it’s impossible to factor fifth-degree and greater polynomials
this way! Thank Évariste Galois (1811-1832) for working out exactly why, and then getting
killed in a duel over a girl before he could do even more awesome mathematics.
[4] Reprinted under a Creative Commons license from http://www.flickr.com/photos/37665276@N00/76780093
Honestly, although it might seem boring on this level, the theory of factoring is one of my favorite
things in mathematics. When you do it abstractly enough (i.e., you define a polynomial such that it might
not involve numbers, or even arithmetic), there is a lot of beauty in factorization. It requires a lot of
mathematical machinery, but there are some really amazing connections between whether you can factor
a polynomial and the symmetries of its roots. I don’t really know how to explain it in a way that
would make any sense to someone who hasn’t seen a lot more math, but it really is quite beautiful.
Anyway, take abstract algebra before you die. Hopefully a lot sooner. Getting back on topic, the one
disadvantage to writing a polynomial in factored form (aside from the fact that it’s difficult and sometimes
impossible) is that it’s not as immediately obvious what the degree is. But it’s not hard to figure out:
just count up all the x’s (including the exponents on the outside of factors). For example, the function
$f(x) = x(x - 2)(x + 4)^3(2x - 6.39)^2$ is of degree 1 + 1 + 3 + 2 = 7.
More broadly, the reason we care about factoring is that we want to sketch polynomials, and so
we need to know the x-intercepts. If we know the factored form of a polynomial, it’s very easy to find the
x-intercepts. Which brings us to the graphs of polynomials.
Graphs of Polynomials
You probably already know that simple, one-term polynomials (“monomials”) are quite predictable.
Even-degree monomials look like this:
[Figure: graphs of $f(x) = x^2$, $f(x) = x^4$, $f(x) = x^6$, etc. ($f(x) = x^n$, n even)]
And odd-degree monomials look like this:
[Figure: graphs of $f(x) = x^3$, $f(x) = x^5$, $f(x) = x^7$, etc. ($f(x) = x^n$, n odd)]
But for the most part, we are curious about polynomials that have lots of terms—stuff like
$$f(x) = 4x^6 - 20x^5 + 20x^4 + 3x^3 - 5$$
What happens when we add all of those extra terms? How do all of those cross-terms affect the shape
of the polynomial? The basic idea is that when we add these additional terms, we might introduce the
possibility of factoring the polynomial differently, which would give us new x-intercepts, and consequently,
new maxima and minima. But the polynomial still retains the “general shape” of its leading term, meaning
that if you zoom out far enough, the polynomial above will look more and more like
$$f(x) = 4x^6$$
In other words, it’ll look more and more like a (steep, flat-bottomed) parabola. Here’s what the function looks like:
[Figure: two graphs of the function, one zoomed in and one zoomed out]
Another way of thinking about this is that the polynomial has an end asymptote at whatever its
leading term is. In the above example, for instance, the polynomial $f(x) = 4x^6 - 20x^5 + 20x^4 + 3x^3 - 5$
has an end asymptote at $f(x) = 4x^6$ (a parabolic asymptote).
Let me see if I can try to justify the fact that the end behavior depends only on the leading term.
This isn’t a formal proof, but it does contain the basic idea as to why this behavior happens. Imagine you
have some polynomial. Say you have $f(x) = x^6 + x^4 + 3x^2 + 7x + 24$. What happens as you plug in bigger
and bigger values for x? Let’s make a table and see.
x       $x^6$                        $x^4$                $3x^2$       $7x$     24
0       0                            0                    0            0        24
1       1                            1                    3            7        24
5       15,625                       625                  75           35       24
10      1,000,000                    10,000               300          70       24
100     1,000,000,000,000            100,000,000          30,000       700      24
1000    1,000,000,000,000,000,000    1,000,000,000,000    3,000,000    7,000    24
What’s going on here? When we plug in zero for x, everything is zero, except for the constant term.
Near x = 0, all the other terms will be comparatively small, so the constant term (to use a fancy phrase)
is the “dominant term”, and the function will be close to y = 24. But as we plug in larger values for x, 24
just stays the same, and all of a sudden other terms start getting larger than it.
As we plug in bigger and bigger values for x, not only is the leading term ($x^6$) getting dramatically
bigger, but the gap between $x^6$ and all the other terms is itself getting dramatically bigger. Compared
to 24, 7,000, or even one trillion, 1,000,000,000,000,000,000 is just way, way bigger. (It’s 1,000,000 times
larger than even 1,000,000,000,000.)[5]
Let’s look at that a little more closely. Here’s the table again, but this time we’ll include a column
that measures the difference between the leading term ($x^6$) and all the other terms ($x^4$, $3x^2$, $7x$, 24). We’ll
do that by writing (in the far-right column) the leading term as a percentage of all the other terms, i.e.,
we’ll compute $\frac{x^6}{x^4 + 3x^2 + 7x + 24}$.
[5] You can see these more easily if you write them in scientific notation: 1,000,000,000,000,000,000 = $10^{18}$ and
1,000,000,000,000 = $10^{12}$. So then $10^{18}/10^{12} = 10^{18-12} = 10^6$ = 1,000,000.
x       $x^6$                        $x^4$                $3x^2$       $7x$     24    leading term as a % of other terms
0       0                            0                    0            0        24    0%
1       1                            1                    3            7        24    2.9%
5       15,625                       625                  75           35       24    2,059%
10      1,000,000                    10,000               300          70       24    9,621%
100     1,000,000,000,000            100,000,000          30,000       700      24    999,693%
1000    1,000,000,000,000,000,000    1,000,000,000,000    3,000,000    7,000    24    99,999,699%
Do you see that? By the time x is 5, the leading term is about twenty times bigger than all the other terms
combined. By the time we’re at x = 100, it’s about 10,000 times bigger.
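The comparison is easy to recompute. This short Python sketch, for the polynomial $f(x) = x^6 + x^4 + 3x^2 + 7x + 24$ used in this section, prints the leading term divided by the sum of all the other terms (multiply by 100 to get a percentage). The function name `leading_share` is an assumption for illustration.

```python
def leading_share(x):
    """Ratio of the leading term x^6 to the sum of the other terms of
    f(x) = x^6 + x^4 + 3x^2 + 7x + 24."""
    leading = x ** 6
    others = x ** 4 + 3 * x ** 2 + 7 * x + 24
    return leading / others


for x in [0, 1, 5, 10, 100, 1000]:
    ratio = leading_share(x)
    print(f"x = {x:>4}: ratio = {ratio:14.3f}  ({100 * ratio:,.1f}%)")
```

The ratio grows without bound as x grows, which is the informal reason the leading term alone decides the end behavior.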
In summary, then, it’s easy to find out what a polynomial looks like on the far left side and the far right
side—we just look at the leading term. The tails of a polynomial always go either up to infinity or down
to negative infinity—there are never any horizontal asymptotes or anything like that. Put differently...
• Polynomials of even degree: tails both go in the same direction.
– Leading coefficient +: tails both go up (like $+x^2$).
– Leading coefficient −: tails both go down (like $-x^2$).
• Polynomials of odd degree: tails go in opposite directions.
– Leading coefficient +: like $+x^3$ (left tail down, right tail up).
– Leading coefficient −: like $-x^3$ (left tail up, right tail down).
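The four cases above fit in a few lines of code. This Python sketch (the function name and the "up"/"down" labels are illustrative assumptions) returns the direction of the left and right tails given only the degree and the sign of the leading coefficient.

```python
def end_behavior(degree, leading_coefficient):
    """Return (left_tail, right_tail), each either 'up' or 'down'."""
    if degree < 1 or leading_coefficient == 0:
        raise ValueError("need degree >= 1 and a nonzero leading coefficient")
    # The right tail follows the sign of the leading coefficient.
    right = "up" if leading_coefficient > 0 else "down"
    if degree % 2 == 0:
        left = right                                 # even degree: same direction
    else:
        left = "down" if right == "up" else "up"     # odd degree: opposite directions
    return left, right


print(end_behavior(2, 1))    # ('up', 'up'), like +x^2
print(end_behavior(2, -1))   # ('down', 'down'), like -x^2
print(end_behavior(3, 1))    # ('down', 'up'), like +x^3
print(end_behavior(3, -1))   # ('up', 'down'), like -x^3
```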
The real question, then, is what happens in the middle of the graph. In the middle of the graph (i.e.,
near the origin), polynomials bounce up and down and up and down and etc. etc. etc. (this is why we
love polynomials) and they always do so smoothly and continuously, with never a vertical asymptote (like
1/x has) nor cusp (like $\sqrt[3]{x^2}$ has) or anything messy like that. Polynomials are very nice functions. They
are continuous everywhere and differentiable everywhere.
Here are some fun facts (proofs omitted—you can do them with calculus, though).
• A polynomial of degree n has
– at most n x-intercepts (maybe fewer)

– at most n − 1 extrema (maxima and minima) (but again, it might have fewer)
– at most n − 2 inflection points (points where the concavity changes, and yes, it can have fewer
than n − 2)
And here’s what we really want to talk about, so that we can know how these polynomials bounce up and
down:
x-Intercepts/Roots/Solutions/Zeroes
• Four different names for the same thing—basically interchangeable. I usually use “root” or “zero”,
but it doesn’t really matter. Mad important! Algebraic geometry! Roots!
• Occur wherever the polynomial = 0 (given a polynomial f (x), set f (x) = 0 and see what values of x
make that true.)
• We already said that a polynomial of degree n can have up to n roots (it can have fewer).
– Even-degree polynomials can have as few as 0 roots (why? Give an example.)
– Odd-degree polynomials must have at least 1 root (why?)
• Easy to find x-intercepts when polynomial is in factored form. When any individual factor = 0, the
whole function will be 0 (because 0 · (stuff) = 0 )
• So it’s sufficient just to find where each factor equals zero (i.e., each factor has a corresponding
x-intercept/root/solution/zero).
– For example, consider the polynomial f (x) = (x + 2)(x − 3). We know the polynomial will have
roots wherever (x + 2) = 0, and wherever (x − 3) = 0. So it has roots at x = −2 (from the first
factor) and x = 3 (from the second factor).
– Note that some factors might not cause x-intercepts. For example, consider $(x^2 + 1)$. If we set
this equal to zero and try to solve for x, we get $x = \pm\sqrt{-1}$, which is not a real number (it’s
a complex number). So it doesn’t show up as an x-intercept on the graph (though the factor still
counts toward the degree, and it can affect the y-intercept).
• Each root also has a multiplicity. We say the multiplicity of an x-intercept/solution/zero/root is
the number of times the factor creating the particular root shows up in the complete factorization
(in the factored form of the polynomial).
For example, the polynomial $f(x) = (x - 3)(x + 5)^2$ has a root at x = 3 of multiplicity 1, and a root
at x = −5 of multiplicity 2.
Why do we care? Because the multiplicity tells us what the polynomial looks like near that root! It
tells us the shape of the polynomial near the x-intercept/root/solution/zero.
In particular...
– if the root is of multiplicity 1: polynomial looks like a straight line near that root (crosses
through axis)
– if the root is of even multiplicity: polynomial looks like a parabola (even-degree polynomial)
near the root (i.e., it only bounces off the axis; it doesn’t cross it).
– if the root is of odd multiplicity greater than 1 (3, 5, ...): polynomial looks like an odd-degree
polynomial such as $x^3$ near the root (i.e., it crosses through the axis and is bendy as it goes through).
[Figure: three graphs, each with a root at x = 3: one of multiplicity 1, one of multiplicity 2, and one of multiplicity 3]
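The cross-or-bounce rule can be checked numerically by comparing the sign of the function just to the left and just to the right of a root. Here is a hedged Python sketch using the example $f(x) = (x - 3)(x + 5)^2$ from above; the helper names and the step size `eps` are assumptions made for illustration.

```python
def behavior_at_root(multiplicity):
    """Describe the graph near a root, given the root's multiplicity."""
    if multiplicity < 1:
        raise ValueError("multiplicity must be a positive integer")
    if multiplicity == 1:
        return "crosses like a straight line"
    if multiplicity % 2 == 0:
        return "bounces off the axis"
    return "crosses with a bend"


def f(x):
    return (x - 3) * (x + 5) ** 2


def crosses_axis(func, root, eps=0.01):
    """True if func changes sign between root - eps and root + eps."""
    return func(root - eps) * func(root + eps) < 0


print(behavior_at_root(1), "->", crosses_axis(f, 3))    # multiplicity 1 at x = 3
print(behavior_at_root(2), "->", crosses_axis(f, -5))   # multiplicity 2 at x = -5
```

A sign change means the graph passes through the axis, while no sign change means it only touches the axis and turns back.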
y-intercept
• This can also be useful. Since the y-axis is just the line x = 0, the y-intercept is just the value of the
function when x = 0. So we can plug 0 in for x to find the y-intercept.
• For example, consider the polynomial $f(x) = 5x^6 - 2x^3 + 4x + 32$. Then $f(0) = 5 \cdot 0^6 - 2 \cdot 0^3 + 4 \cdot 0 + 32 = 32$,
so the y-intercept is at y = 32. Which is just the constant term.
• Or if we have something factored, like $f(x) = (x + 3)(x - 4)(x + 2)^2$. Then $f(0) = (0 + 3)(0 - 4)(0 + 2)^2 =
(3)(-4)(2)^2 = -48$. So the y-intercept is at y = −48.
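Both y-intercept computations above amount to evaluating the function at x = 0, which a short Python sketch can confirm. The function names `f` and `g` are just labels chosen here for the two examples.

```python
def f(x):
    """The standard-form example: 5x^6 - 2x^3 + 4x + 32."""
    return 5 * x ** 6 - 2 * x ** 3 + 4 * x + 32


def g(x):
    """The factored example: (x + 3)(x - 4)(x + 2)^2."""
    return (x + 3) * (x - 4) * (x + 2) ** 2


print(f(0))  # 32, the constant term
print(g(0))  # -48
```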
Let’s graph a real polynomial!
• Consider a complicated example: $f(x) = x(x - 2)(x + 4)^3(2x - 6.39)^2$. Let’s graph it!
• First, let’s find the x-intercepts/roots/solutions/zeroes. This function has four factors, so we’ll set
each of them equal to zero and figure out the corresponding x-intercept/root/solution/zero.
– if x = 0, then x = 0 is an x-intercept/etc. (of multiplicity 1)

– if x − 2 = 0, then x = 2 is an x-intercept (of multiplicity 1)
– if $(x + 4)^3 = 0$, then we do some more algebra...
$$\begin{aligned}
(x + 4)^3 &= 0 \\
\sqrt[3]{(x + 4)^3} &= \sqrt[3]{0} \\
x + 4 &= 0 \\
x &= -4
\end{aligned}$$
so x = −4 is an x-intercept. And, since the factor (x + 4) is cubed, the root is of multiplicity
3.
– and likewise with $(2x - 6.39)^2$, which gives us an x-intercept at x = 3.195 (of multiplicity 2).
• We can summarize our findings:

$f(x) = x(x - 2)(x + 4)^3(2x - 6.39)^2$
Factor          $x$     $x - 2$     $(x + 4)^3$     $(2x - 6.39)^2$
Root            0       2           −4              3.195
Multiplicity    1       1           3               2
• How about the y-intercept? We’ll just plug 0 in for x, and get: $f(0) = 0(0 - 2)(0 + 4)^3(2 \cdot 0 - 6.39)^2 = 0$.
So this polynomial has a y-intercept at 0.
• Also, we know that the polynomial is 7th degree, and the leading coefficient is positive, since there were
no negatives in front of any of the x’s in the factorization (and no negative in front), so the general
shape will look like $+x^3$.
• So now we can graph it! All we have to do is plot the x-intercepts and, basically, connect the dots—we
need to know where to start drawing a line from (in this case, starting at the bottom left and ending
at the top right, because it looks like +x3 ), and we need to know what the polynomial looks like at
each of the x-intercepts (which the multiplicity tells us). But then we can do it!
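The bookkeeping in this example (degree, sign of the leading coefficient, roots and multiplicities) can be sketched in a few lines of Python. This is only an illustration: the tuple layout `(a, b, m)` for a factor $(ax + b)^m$ and the helper name `summarize` are assumptions of mine, not anything from the notes.

```python
# Each factor (a*x + b)^m is stored as a hypothetical tuple (a, b, m).
# f(x) = x (x - 2) (x + 4)^3 (2x - 6.39)^2
factors = [(1, 0, 1), (1, -2, 1), (1, 4, 3), (2, -6.39, 2)]

def summarize(factors, sign=1):
    """Return (degree, leading coefficient, [(root, multiplicity), ...])."""
    degree = sum(m for _, _, m in factors)
    leading = sign
    for a, _, m in factors:
        leading *= a ** m
    roots = [(-b / a, m) for a, b, m in factors]
    return degree, leading, roots

degree, leading, roots = summarize(factors)
print(degree)    # 7
print(leading)   # 4

for root, mult in roots:
    # Odd multiplicity: the graph crosses the x-axis. Even: it touches and turns back.
    behavior = "crosses" if mult % 2 == 1 else "touches and turns back"
    print(root, mult, behavior)
```

An odd degree with a positive leading coefficient is exactly the "starts at the bottom left, ends at the top right" shape described above.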
Another example
• Consider $f(x) = -(x - 3)(x + 1)^2 (x - 1)^3$
• This is sixth-degree, and there’s a negative in front, so the general shape is like $-x^2$ (so both the
tails go down).
• It has a y-intercept at $f(0) = -(0 - 3)(0 + 1)^2 (0 - 1)^3 = -3$.
• And the roots and their multiplicities are thus:
$f(x) = -(x - 3)(x + 1)^2 (x - 1)^3$
Factor         $(x - 3)$    $(x + 1)^2$    $(x - 1)^3$
Root           3            −1             1
Multiplicity   1            2              3
• So our sketch should look like:
Going backwards
• What if we wanted to do the reverse—to walk into the Metropolitan Museum of Art, see a beautiful
oil painting of a polynomial on the wall (there is a little-known mathematics gallery underneath the
Temple of Dendur), and show off our math skills to our attractive date by saying, “Aha! I can come
up with an equation for that polynomial!”?
• We could do just that.
• So imagine that the painting that strikes your fancy looks like this:
• What can you say about this just from looking at it? We can see that both tails point down, so it
must be of even degree, and it must have a negative somewhere (making it open down rather than
up).
• We can also see that it has three roots.
– It has a root at x = −2, and since the polynomial looks like a parabola near x = −2, this root
must be of even multiplicity. It might be of multiplicity 2, multiplicity 4, 6, etc. So a possible
factor of the polynomial could be $(x + 2)^2$.
– It has a root at x = +1, and since it goes through the axis, straight through, that root must be
of multiplicity 1. So it must be created by the factor (x − 1).
– Finally, it has a root at x = +3, and since it goes through the axis and is kinda bendy as it goes
through, that root must be of odd multiplicity—maybe multiplicity 3, maybe 5, maybe 341. So
a possible factor could be $(x - 3)^3$.
We can summarize our findings, similarly to how we did before:

Root              x = −2         x = +1       x = +3
Multiplicity      even           1            odd
Possible Factor   $(x + 2)^2$    $(x - 1)$    $(x - 3)^3$
• So now we can take a stab at coming up with an equation! We’ll just pile all those factors together,
and remember to include a − in front:
$f(x) = -(x + 2)^2 (x - 1)(x - 3)^3$
• Could this be an equation for this graph? Sure. The polynomial we’ve written is of degree 6 (so it’s
even, which checks out). It has a negative in front, which will make it open downward (good). And,
finally, we can check the y-intercept: $f(0) = -(0 + 2)^2 (0 - 1)(0 - 3)^3 = -(2)^2 (-1)(-3)^3 = -108$. The
y-axis on our graph doesn’t have a scale, but we can see that the y-intercept should be some negative
number, so this appears to work. Hooray!
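Here is a small Python sketch of that sanity check. It assumes the candidate built from the root analysis above: $(x + 2)^2$ for the even root at −2, $(x - 1)$ for the multiplicity-1 root at +1, and $(x - 3)^3$ for the odd root at +3, with a minus sign in front. The helper `sign` and the sample points are my own choices for illustration.

```python
def f(x):
    # Candidate equation: -(x + 2)^2 (x - 1) (x - 3)^3
    return -(x + 2) ** 2 * (x - 1) * (x - 3) ** 3

def sign(v):
    # -1, 0, or +1 depending on the sign of v
    return (v > 0) - (v < 0)

# One sample point inside each interval cut out by the roots -2, 1, 3.
samples = [-3, 0, 2, 4]
signs = [sign(f(x)) for x in samples]

print(signs)   # [-1, -1, 1, -1]
print(f(0))    # -108
```

The sign pattern says: negative on both sides of x = −2 (the graph only touches the axis there), a crossing at x = 1, another crossing at x = 3, and both tails pointing down.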
• So you tell this to your date. And she or he says, “Mmm... but couldn’t it also be
$f(x) = -(x + 2)^4 (x - 1)(x - 3)^7$?” And you stammer, “Well, of course. That function would still have the same
basic properties. It just might have a different vertical scale, and some of the roots might be a little
more shapely.”
Then she or he says, “What about $f(x) = -(x + 2)^4 (x - 1)(x - 3)^7 (x^2 + 1)$?”
And you say, “Of course not! What’s that extra $(x^2 + 1)$ factor doing in there? There’s no root at...
oh.”
She says, “It works. The factor $(x^2 + 1)$ doesn’t give you a real-valued root that you can plot on the
x, y plane. If you try to solve it, you get:
$x^2 + 1 = 0$
$x^2 = -1$
$x = \pm\sqrt{-1}$
“The square root of negative one is a complex number, not a real one. It’ll change the vertical scale
of the polynomial—it’s like a non-linear vertical expansion, if you can imagine that—but it’ll still
have all these basic properties.”
And then she walks off toward the Henry Moore exhibit.
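To see why the extra factor is harmless, here is a minimal Python sketch using the standard-library `cmath` module. The range of sample points is an arbitrary choice of mine.

```python
import cmath

# Solving x^2 + 1 = 0 gives x = +sqrt(-1) or x = -sqrt(-1), which are not real numbers.
root = cmath.sqrt(-1)
print(root)                  # 1j
print(root ** 2 + 1 == 0)    # True

# For real x, the factor x^2 + 1 is always at least 1:
# it is never zero and never changes sign, so it adds no x-intercepts.
values = [x * x + 1 for x in range(-5, 6)]
print(min(values))           # 1
```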
Problems
For the following polynomials:
1. What is the degree of the polynomial?

2. What is the sign of the leading coefficient?
3. How many real roots/solutions/zeroes/x-intercepts does it have? Where are they, and what are their
multiplicities?
4. What is the y-intercept?
5. What does the polynomial look like? (i.e., sketch it. Without a calculator!)
(You might have to factor or otherwise manipulate the expressions to get them into a convenient form.)
1. $f(x) = (x - 2)(x + 4)$
2. $h(x) = x^2 + 8x + 15$
3. $t(x) = x^2 - 2x - 15$
4. $r(x) = x^2 - x - 2$
5. $w(x) = x^2 - 2x - 63$
6. $f(x) = x^3 + x^2 - 4x - 4$
7. $j(x) = x^4 + 2x^3 + 9x + 18$
8. $g(x) = x^5 - x^3 + 5x^2 - 5$
9. $h(x) = (x - 2)(x + 3)(x + 5)$
10. $t(x) = -(x - 1)(x + 9)(x - 7)(x + 3)$
11. $r(x) = (x + 4)(x - 2)(x + 6)(x - 5)$
12. $f(x) = 5(x - 2)(x + 10)(x + 4)$
13. $f(x) = 3x^2 (x - 1)(x + 7)(x + 8)$
14. $w(x) = (x - 6)(x + 4)^2 (x - 1)^3$
15. $f(x) = (x + 3)^2 (x + 5)^2$
16. $j(x) = (x + 1)^2$
17. $f(x) = (x + 1)(x - 5)^2 (x^2 + 3)$
18. $f(x) = -(x^2 - 1)(x - 3)^2 x^2$
19. $f(x) = 5x(3x + 2)(x - 5)(x + 6)$
20. $f(x) = (x^3 + 27)(x + 2)^2$
21. $f(x) = -(x^2 - x - 12)(x^2 + 8x + 12)$
22. $f(x) = (x + 1)^2 (x + 7)^7 (x^2 + 3)$
23. $f(x) = x^3 (x^2 - 9)$
24. $f(x) = (x - a)(x + 5a)^2 (x + a)$, where a is a constant greater than zero.
25. $f(x) = a^2 (x - 3a)(x + a)^3$, where a is a constant greater than zero.
26. $f(x) = (x - a)(x + b)^2 (x + 5b)$
For the following graphs of polynomials:
1. What is the degree of the polynomial (even or odd)?

2. What is the sign of the leading coefficient (positive or negative)?
3. How many real roots/x-intercepts does it have? Where are they, and what are their
multiplicities?
4. How many extrema (minima and maxima) does the polynomial have?
5. What could a possible equation for the polynomial be? (Give a factorized form.)
6. What could a second possible equation for this polynomial be? (i.e., an equation which gives a
polynomial that has the same general shape and the same x-intercepts.)
(Note that the x-axes and y-axes of these graphs have different scales. Also note that the number of the
problem is at the bottom-left corner of the graph (not the top).)
[Graphs for problems 27 through 40 are not reproduced here.]
The Dream of Rational Functions
Calculus 11/12, Veritas Prep.
Andrew M.H. Alexander [1]
In the beginning were the natural numbers:
0, 1, 2, 3, · · ·
But with only the natural numbers, mathematicians could not subtract, and so they created the integers:
· · · , −2, −1, 0, +1, +2, · · ·
But with only the integers, mathematicians could not divide. And so mathematicians created the rational
numbers. They did this by taking every possible ratio of the integers: they said that any number that can
be written in the form of one integer divided by another integer is a rational number.
More formally, each rational number can be written as p/q, where p and q are both integers (and q is not zero). What is
cool is that each time we expand our number system (when we go from the naturals to the integers to the
rationals, and so forth) we get more and more structure and richness and beauty. If we can only use the
natural numbers, then we can count, which is pretty cool [2], but as far as operations go, we’re restricted to
only adding and multiplying. (For example, what’s 5 − 12? It’s not a natural number.) If we add negatives
then all of a sudden, we can subtract. And if we go further, and start using the rationals, we can divide.
(5/12 isn’t an integer; it’s a rational.) We can keep going, too, and create the real numbers, and the
complex numbers, then the quaternions, and the Hamiltonians [3], etc., etc., and get all these amazing and
beautiful structures of numbers and their relationships to each other.
The analogy holds for polynomials. In the beginning, we learn about lines, which are just first-degree
polynomials. Then we generalize to polynomials of second degree (quadratics), of third degree (cubics), and
eventually, of degree n. They are pretty cool objects, and as we’ve seen, we can do a lot with polynomials.
But what if we generalized some more? What if we considered a function that is not a polynomial per se,
but rather a ratio of two polynomials?
We already know one example. Think about f(x) = 1/x. Obviously, x is a polynomial, if not a very
interesting one. And so is 1 (we can think of it as $1x^0$). And, as you know, the graph of 1/x has some
pretty weird properties:
Namely, it has this vertical asymptote at x = 0 and this horizontal asymptote at y = 0. Those are
weird! Polynomials don’t have those! It’s co-mingling the finite with the infinite! And such is the case
for rational functions in general: we can still have the same shapes as polynomials (we still have the same
behavior with maxima and minima and roots, etc.), but we can have so much more.
[1] Draft 1, September 2010. Contact: amha@uchicago.edu
[2] It actually is extremely cool. All of combinatorics (all that stuff with permutations and combinations)
happens using just the natural numbers. You can take entire classes just on counting (i.e., combinatorics).
[3] Look them up on Wikipedia. You’ll be glad you did.
Here are some beautiful pictures of rational functions, just to give you an idea of how cool they can
look:
Look at these! Just look at them! Here are some things to notice: the maxima and minima! the roots!
the vertical asymptotes! the end behavior! Sometimes there are horizontal asymptotes, and sometimes
not! Sometimes the function goes up or down on both sides of a vertical asymptote; sometimes it goes in
opposite directions! See how weird these can look!
I guess I should formally define a rational function before we go much further. A rational function
is a function f (x) that can be written in the form
$f(x) = \frac{p(x)}{q(x)}$
where both p(x) and q(x) are polynomials.

One of the neat things about moving from natural numbers to integers to rationals, etc., is that each
new number system doesn’t replace the previous number system. It simply expands it. Every natural
number is also an integer (as well as a rational number and a real number and a complex number, etc.);
every integer is also a rational number, etc., and so forth. This analogy holds for our more abstract concept
of a function: every polynomial is also a rational function. Why?
In these notes, I’ll discuss each of the main features of rational functions in turn:
1. end behavior,
2. roots (and their multiplicities),
3. vertical asymptotes (and their multiplicities),
4. holes,
5. and the y-intercept.
Then I’ll give some examples of how to sketch rational functions using these properties, and an example of
how to come up with an equation for a graph of a rational function. Then I’ll discuss an entirely different
(but equally-valid) way of graphing rational functions.
A word of warning first: there’s a lot of information here! But don’t be overwhelmed! Graphing
rational functions is really quite simple and quite beautiful. If anything, I think I may have made it seem
excessively complicated in the notes. If you ever feel overwhelmed, just turn to the back page, rip off Hunter
Hahn’s emergency reference sheet (giving you quick reminders of how to find all the basic properties of
rational functions), and use it!
End Behavior
• We know what polynomials look like on the far left side and the far right side of the graph: they
look like whatever their leading term looks like. For example, $f(x) = x^3 - 3x^2 + 5x - 12$ looks more
and more like $x^3$, the more and more you zoom out (or follow the polynomial to the left or right of
the origin). Another way of saying this, I guess, would be that $f(x) = x^3 - 3x^2 + 5x - 12$ has an end
asymptote at $y = x^3$. (It’s an asymptote that isn’t a straight line, but it’s still the same idea.)
With rational functions, by contrast, we have two polynomials (one in the numerator, and one in the
denominator), and so we need to consider the ratio of their leading terms. Whatever that ratio is,
when simplified, will be the “end asymptote” of our rational function. This “end asymptote” (for
lack of a better phrase) might be horizontal (e.g., the line y = 0) or it might not be horizontal (e.g.,
the parabola $y = x^2$, or the line y = 4x). Put most generally: a rational function will have an
end asymptote at $y = \frac{\text{the leading term on top}}{\text{the leading term on bottom}}$.
• For example, consider the rational function $f(x) = \frac{5x^3 + 3x^2 - 6x + 4}{6x^3 - 8x^2 - 7x + 5}$. The leading term of the
top is $5x^3$, and the leading term of the bottom is $6x^3$. So this function has an “end asymptote”
at $y = \frac{5x^3}{6x^3} = \frac{5}{6}$. Which is just a horizontal asymptote at the line y = 5/6. So even though this
rational function might be jumping around and spiking and doing weird things near the origin, the
further and further we get from the origin, the more and more this function will start to look like
the horizontal line y = 5/6. In fact, we can state this as a general law: whenever the numerator
and denominator have the same degree, the rational function will have a horizontal asymptote at
$y = \frac{\text{the leading coefficient on top}}{\text{the leading coefficient on bottom}}$.
• Here’s another example. What if we have the rational function $f(x) = \frac{x^2 + 2}{x^3 + 4x^2 - 8x + 9}$? The
leading term of the top is $x^2$, and the leading term of the bottom is $x^3$. So this will have an end
asymptote at $y = \frac{x^2}{x^3} = \frac{1}{x}$. The further and further we get from the origin, the more and more f(x)
will look like y = 1/x. But we also know that the further and further we get from the origin, the
more and more y = 1/x looks like the horizontal line y = 0. So then f(x), the further and further
we get from the origin, will look more and more like the horizontal line y = 0.
Put differently: $y = \frac{x^2 + 2}{x^3 + 4x^2 - 8x + 9}$ has an end asymptote at y = 1/x, but y = 1/x has a
horizontal asymptote at y = 0, so $y = \frac{x^2 + 2}{x^3 + 4x^2 - 8x + 9}$ has a horizontal asymptote at y = 0. (It’s
transitive.)
We can state this as a more general law: whenever the degree of the numerator is less than the degree
of the denominator, the rational function will have a horizontal asymptote at y = 0.
• Let’s do one more example: consider $f(x) = \frac{5x^3 + 3x^2 - 7x + 90}{x - 7}$. In this case, the ratio of the
leading terms is $\frac{5x^3}{x} = 5x^2$. So the rational function will have a parabolic asymptote at $y = 5x^2$,
so both the left and right sides will spike up to infinity (like the parabola $5x^2$!). Here’s what it looks
like (way, way zoomed out, and note the single vertical asymptote at x = 7):
See the parabolic asymptote? More generally: whenever the degree of the numerator is greater than
that of the denominator, the rational function will have a linear [4]/parabolic/cubic asymptote, the
choice of which depending on the precise difference in degree.
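The three end-behavior rules above boil down to comparing two degrees and dividing two leading coefficients. Here is a rough Python sketch of that decision; the function name `end_behavior`, its argument order, and the wording of the returned strings are all my own invention for illustration.

```python
from fractions import Fraction

def end_behavior(num_degree, num_lead, den_degree, den_lead):
    """Describe the end asymptote from the leading terms a*x^n (top) and b*x^m (bottom)."""
    ratio_text = str(Fraction(num_lead, den_lead))
    diff = num_degree - den_degree
    if diff < 0:
        # Bottom-heavy: the ratio of leading terms shrinks toward zero.
        return "horizontal asymptote at y = 0"
    if diff == 0:
        # Same degree: the powers of x cancel, leaving the ratio of coefficients.
        return "horizontal asymptote at y = " + ratio_text
    # Top-heavy: linear if diff is 1, parabolic if 2, cubic if 3, and so on.
    return "end asymptote shaped like y = " + ratio_text + " x^" + str(diff)

# The three examples from this section:
print(end_behavior(3, 5, 3, 6))   # horizontal asymptote at y = 5/6
print(end_behavior(2, 1, 3, 1))   # horizontal asymptote at y = 0
print(end_behavior(3, 5, 1, 1))   # end asymptote shaped like y = 5 x^2
```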
• Here are some more pictures of end asymptotes:
[Four graphs: a function with a cubic end asymptote; a function with a linear end asymptote; a function
with a horizontal end asymptote at y = 0; a function with a horizontal end asymptote (not at y = 0).]
[4] Meaning that it will be slanted like a straight line with nonzero slope, “linear” just being the adjectival form of “line.”
x-intercepts/roots/solutions/zeroes
• We know how to find the roots of a polynomial—we just factor it, and then deal with the individual
factors. But what about a rational function?!? We have factors on the top and on the bottom!
Factors everywhere! What a mess!
Well, luckily, it turns out that we only need to consider the factors on the top in our search for roots.
Put differently: the roots of the top will be the roots of the entire rational function. Here’s
why:
Consider the rational function $f(x) = \frac{p(x)}{q(x)}$, where p(x) and q(x) are both polynomials. We want to
find the x-intercepts, and we know that to do that, we’ll need to set the entire function equal to zero.
So we have
$0 = \frac{p(x)}{q(x)}$
To solve this, we can just multiply both sides by q(x):
$q(x) \cdot 0 = \frac{p(x)}{q(x)} \cdot q(x)$
But then the left side will still be zero, and the q(x)’s will cancel out on the right side! So we’re left
with just
$0 = p(x)$
So all we need to do is find where the top polynomial is zero. Which we already know how to do—we
just factor it and look at the individual factors.
• Let’s do an example. What if we have $f(x) = \frac{(x + 3)(x - 2)}{x(x + 6)}$? If we set this equal to zero and
simplify:
$f(x) = \frac{(x + 3)(x - 2)}{x(x + 6)}$
$0 = \frac{(x + 3)(x - 2)}{x(x + 6)}$
$0 = (x + 3)(x - 2)$
$x = -3, +2$
So we have two x-intercepts at −3 and +2. Note that these intercepts both have multiplicity 1, and
so f (x) will pass through the x-axis at these points.
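As a quick check of this example, here is a small Python sketch. It uses exact fractions from the standard library so that "equals zero" really means zero; the function name `f` simply mirrors the example.

```python
from fractions import Fraction

def f(x):
    # f(x) = (x + 3)(x - 2) / (x (x + 6)), evaluated exactly
    x = Fraction(x)
    return (x + 3) * (x - 2) / (x * (x + 6))

# The zeroes of the top are the zeroes of the whole function
# (the bottom is not zero at either of these points).
print(f(-3), f(2))   # 0 0

# Away from those two points, the function is not zero:
print(f(1))          # -4/7
```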
• Note that basically everything we already know about the multiplicity of roots still applies.
The one difference is that rational functions, thanks to their asymptotes, tend to be pretty bendy,
and so it’s hard to distinguish between roots of multiplicity 1 and roots of odd–but–greater–than–1
multiplicity. So basically we’re just left with classifying roots as either even or odd.
Vertical Asymptotes
• We’ve discussed one of the major cool things about rational functions—that they can have horizontal
asymptotes. (They don’t have to go up to +∞ or down to −∞, like polynomials do!) The second
cool feature of rational functions is their vertical asymptotes.
Let’s start discussing this by considering the simplest rational function we know: 1/x. What happens
to this function as x gets close to 0?
x 1/x
2 0.5
1 1
0.5 2
0.1 10
0.01 100
0.001 1000
0.0001 10000
0.000001 1000000
As we plug smaller and smaller numbers in for x, 1/x becomes bigger and bigger and bigger. Put
differently: as we plug into x numbers that are closer and closer to 0, 1/x explodes! We get a vertical
asymptote. Vertical asymptotes are the physical manifestation of a divide-by-zero problem!
Rational functions will have vertical asymptotes wherever the denominator is zero. Put
more formally, if we have a rational function $f(x) = \frac{p(x)}{q(x)}$, then f(x) will have vertical asymptotes
wherever q(x) = 0 (with one slight exception I’ll discuss later).
This isn’t a proof, of course. But later in the year, we’ll have a more formal definition of an “asymptote,”
and we’ll be able to prove this.
• For example, imagine we have the function
$f(x) = \frac{(x + 2)(x - 4)(x + 7)}{(x - 1)(x - 3)(x + 5)}$
This function has vertical asymptotes at x = 1, x = 3, and x = −5. It looks like this:
(Don’t worry yet about how we drew the rest of the graph. Just observe its beautiful vertical asymptotes.)
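To watch one of these asymptotes "explode" numerically, the way the table for 1/x did, here is a minimal Python sketch. The step sizes are arbitrary choices of mine; exact fractions are used so that plugging in x = 1 itself raises a genuine divide-by-zero error.

```python
from fractions import Fraction

def f(x):
    # f(x) = (x + 2)(x - 4)(x + 7) / ((x - 1)(x - 3)(x + 5))
    x = Fraction(x)
    return (x + 2) * (x - 4) * (x + 7) / ((x - 1) * (x - 3) * (x + 5))

# Creep up on the asymptote at x = 1 from the right.
steps = [Fraction(1, 10), Fraction(1, 100), Fraction(1, 1000)]
near_one = [f(1 + step) for step in steps]
for step, value in zip(steps, near_one):
    print(float(1 + step), float(value))

# At x = 1 exactly, the denominator is zero and the function does not exist.
try:
    f(1)
except ZeroDivisionError:
    print("f(1) is undefined")
```

The printed values grow in size as x gets closer to 1, which is the numerical fingerprint of a vertical asymptote.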
• NB: When we draw graphs, it can be nice to denote asymptotes (vertical or horizontal) with a dotted
line. It makes things a bit clearer.
• Oh, and by the way: can a vertical asymptote have a multiplicity? Does that affect what it looks
like? Yes! Consider the difference between 1/x and $1/x^2$:
With 1/x, the vertical asymptote has a multiplicity of 1 (because the x in the denominator only
shows up once); with $1/x^2$, the vertical asymptote has a multiplicity of 2. And the visual difference
this begets is that:
– with a vertical asymptote of odd multiplicity, the function goes up on one side of the asymptote
and down on the other
– with a vertical asymptote of even multiplicity, the function either goes up on both sides of
the asymptote, or goes down on both sides
Why this is true isn’t immediately apparent yet—but later in these notes, I’ll discuss a different
method of rational function sketching that will explain this.
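In the meantime, the odd/even rule is easy to check numerically for 1/x and 1/x^2. This is a rough sketch; the helper names `sign` and `sides` and the offset `eps` are assumptions of mine.

```python
def sign(v):
    # -1, 0, or +1 depending on the sign of v
    return (v > 0) - (v < 0)

def sides(f, a, eps=1e-3):
    """Signs of f just to the left and just to the right of the vertical asymptote x = a."""
    return sign(f(a - eps)), sign(f(a + eps))

# Multiplicity 1: down on one side, up on the other.
print(sides(lambda x: 1 / x, 0))        # (-1, 1)

# Multiplicity 2: up on both sides.
print(sides(lambda x: 1 / x ** 2, 0))   # (1, 1)
```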
• So, for example, what if we take a somewhat modified version of the last function we used as an
example:
$f(x) = \frac{(x + 2)(x - 4)(x + 7)}{(x - 1)^2 (x - 3)^2 (x + 5)^2}$
Here, I’ve changed all of its asymptotes so that they are now of multiplicity 2. This should change
the graph so that the function always approaches the asymptote in the same direction from either
side. And indeed, it does:
Again, don’t worry about how we graphed the rest of the function—just notice that the asymptotes
are now all going in the same direction. The other thing to notice with this graph is that there was
some collateral damage from changing the multiplicities—e.g., that parabolicky guy on the right side
now opens down, not up, and the horizontal asymptote is now at y = 0, not y = 1.
Holes
• There is one slight exception to our “vertical asymptotes are given by the zeroes of the denominator;
roots are given by the zeroes of the numerator” rule. That is: what if we have a
number that is both a zero of the numerator and the denominator?
• For example, consider the function $f(x) = \frac{x^2 + 5x + 6}{x + 2}$. If we factor it, we get $f(x) = \frac{(x+2)(x+3)}{(x+2)}$.
But we have an (x + 2) on both the top and bottom, so these essentially cancel out and give us
f(x) = x + 3. Except... well, they don’t really cancel out, because (in unfactored form) we still have
$f(x) = \frac{x^2 + 5x + 6}{x + 2}$. What happens is this: at every point other than x = −2, the numerator and
the denominator will cancel out, and my function will look exactly like f(x) = x + 3. And at x = −2,
we’ll have f(x) = 0/0, which is basically like a divide-by-zero problem, except we’ve got zero on the
top, too, so instead of getting a vertical asymptote, the function simply won’t exist at that point. It
will be undefined. It will look exactly like f(x) = x + 3, but with that one point (−2, 1) missing. It’s
a ghost point!
Usually we denote a hole with a little circle in the graph:
• Holes are weird and kind of esoteric and they don’t usually come up too often—mostly they just
show up in problems math teachers write to deliberately include holes—but you should know of
their existence, just in case (and because the other time holes come up is with limits—you can make
functions with holes that are pedagogically excellent for teaching limits).
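Here is a minimal Python sketch of the hole in $(x^2 + 5x + 6)/(x + 2)$, using exact fractions. The sample points near −2 are my own picks for illustration.

```python
from fractions import Fraction

def f(x):
    # f(x) = (x^2 + 5x + 6) / (x + 2), evaluated exactly
    x = Fraction(x)
    return (x * x + 5 * x + 6) / (x + 2)

# At x = -2 we get 0/0: the function is simply undefined there.
try:
    f(-2)
except ZeroDivisionError:
    print("f(-2) is undefined (0/0)")

# Everywhere else, f agrees exactly with the line y = x + 3.
nearby = [Fraction(-21, 10), Fraction(-201, 100), Fraction(-199, 100), Fraction(-19, 10)]
for x in nearby:
    print(float(x), float(f(x)), f(x) == x + 3)
```

The values on either side close in on 1, which is why the missing "ghost point" sits at (−2, 1).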
y-intercept
• As with every other function, we can find the y-intercept just by plugging in 0 for x. This is pretty
straightforward. I don’t even want to do an example. Look on your polynomial notes if you want
an example. (Of course, sometimes we might have a vertical asymptote at x = 0 (i.e., along the y-axis),
and thus not have a y-intercept.)
So now we can figure out quite a bit about a given rational function. Sadly, even with all of this information,
we still don’t have quite enough information to fully graph one. We’ll need to know a couple more things.
But we know enough to start graphing them, so let’s dive in with an example.
One example
• Consider the function $f(x) = \frac{5}{x + 2}$. Now, obviously, we could think of this as just being 1/x with
some transformations (namely, moved left two and expanded vertically by 500%). But let’s treat it
as a rational function. (We’ll still get the same graph.)
– Roots: We know it will have roots/zeroes/etc. wherever the top polynomial equals zero. That
never happens. So this function has no x-intercepts.
– Vertical Asymptotes: We know that vertical asymptotes will happen whenever the bottom
polynomial equals zero. That happens at x = −2, so that’s our vertical asymptote. It has a
multiplicity of 1.
– Holes: These will happen whenever the top and bottom polynomials have a root at the same
place (of the same multiplicity). But that doesn’t happen here, so there are no holes.
– End behavior: We need to look at the ratio of the leading terms. The leading term of the
top is 5, and the leading term of the bottom is x, so this function has an end asymptote at
y = (5)/(x), which looks basically like 1/x, which has a horizontal asymptote at y = 0. So this
function has a horizontal asymptote at y = 0.
– y-intercept: Like with polynomials (or any other function), this is equivalent to plugging in 0
for x. Which gives us $y = \frac{5}{0 + 2} = 2.5$ as the y-intercept.
• So now we can start to draw it! We’ll start by drawing the asymptotes (with dotted lines), and
marking the location of the y-intercept. (If we had x-intercepts, we’d mark those, too.)
(Note that I’ve drawn the horizontal asymptote just a little bit under y = 0—I want to make it
visible and not get in the way of the x-axis.)
So from here, we can try to connect the dots, more-or-less. On the right-hand part, our function will
have to look like this (thanks to the y-intercept):
But what about the left part? We have two options. The function could either be above the x-axis,
or below it:
(Obviously, we know which one it should be, based on our knowledge of 1/x and its transformations,
but indulge me here. We could very easily have a more-complicated rational function for which we
don’t know where to draw the graph. Actually, that’s coming up in the next example.)
So how do we figure this out? There are a few ways to do it.
– One way: test the parity. Consider this: we have two possible graphs. On one of the graphs,
the function is positive (above the x-axis) on the left side; on the other, it’s negative on the left
side (below the x-axis). Put differently, on one graph, f (x) is positive when x < −2; on the
other, f (x) is negative when x < −2. So we just need to ask: when x is less than −2, is f (x)
positive or negative? We can do this quite easily algebraically, just by considering the individual
factors. To wit:
∗ If x < −2, then 5 is, um, still positive. (Obviously. Five is always positive. Ain’t no x
that’ll make it negative!)
∗ If x < −2, then x + 2 is negative.
So we have a positive over a negative. Put differently, when x < −2, then $f(x) = \frac{(+)}{(-)}$, or just
f(x) = (−). So when x < −2, then f(x) < 0. Which is just a fancy way of saying that on the
left side, the function has to be negative. So our graph must be this one:
– Another way is to just plug in some number less than −2 and see if the function is positive or
negative. But the algebraic way is prettier.
– An even easier way to have figured this out would have been to consider the multiplicity of
the vertical asymptote. The vertical asymptote at x = −2 has a multiplicity of 1, which means
that the function needs to go up on one side, and down on the other side. We already know
that the function goes up on the right side. Therefore, it must go down on the left side.
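The "just plug in a number" method takes only a couple of lines of Python. A minimal sketch, where the test points −3 and −1 are my own choices (any number on the correct side of the asymptote at x = −2 would do):

```python
def f(x):
    # f(x) = 5 / (x + 2)
    return 5 / (x + 2)

print(f(-3))   # -5.0  (left of the asymptote: negative)
print(f(-1))   # 5.0   (right of the asymptote: positive)
print(f(0))    # 2.5   (the y-intercept)
```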
Another example
• Imagine we have $f(x) = \frac{x - 5}{x^2 + 5x + 4} = \frac{(x - 5)}{(x + 4)(x + 1)}$. What do we know about this?
– End behavior: We need to look at the ratio of the leading terms. The leading term of the
top is x, and the leading term of the bottom is $x^2$, so this function has an end asymptote at
$y = (x)/(x^2) = 1/x$, which means it will have a horizontal asymptote at y = 0.
– Roots: We know it will have roots/x-intercepts/etc. wherever the top polynomial equals zero.
That happens only at x = 5. This root has multiplicity 1.
– Vertical Asymptotes: We know that vertical asymptotes will happen whenever the bottom
polynomial equals zero. That happens at x = −4 and x = −1. So those are our vertical
asymptotes. They both have a multiplicity of 1.
– y-intercept: Like with polynomials (or any other function), this is equivalent to plugging in 0
for x. Which gives us $y = \frac{(-5)}{(4)(1)} = -1.25$ as the y-intercept.
So far, so good! Let’s sketch what we have so far. I’ll represent the intercepts with dots, and represent
the asymptotes with dotted lines:
We can connect dots on the far right side. Note how, because of the intercepts, it’ll have to start on
bottom, go up through (0, −1.25) and (5, 0), but then go back down again to approach the horizontal
asymptote at y = 0. It can’t start on the top, because then how would it have a negative y-intercept?
But what do we do in the middle? It can’t go through the x-axis, because then there would be an
x-intercept, but there isn’t one there. So it has to either always be above the axis, or below it. We
have two options:
The difference between the two possible graphs is that in one of them, the function is always positive
between −1 and −4 (i.e., above the x-axis); in the other, the function is always negative (i.e., below
the x-axis). So which is it? Is the function f (x) positive between x = −4 and x = −1, or negative?
We can figure out whether it’s positive or negative in that interval using algebra! In fact, since it’s
factored, we can do this simply by finding out whether each factor is positive or negative, and then
multiplying. For example, if the top factor is negative, and the bottom factors are both positive,
we’ll have $\frac{(\text{a negative})}{(\text{a positive})(\text{a positive})}$, which = (a negative). It’s totally algebraic!
So let’s investigate. Let’s see which factors are positive between −1 and −4, and which are negative.
– Let’s do the top factor (x − 5) first. If we have a number between −1 and −4 (a negative
number), and we subtract 5 from it, then we certainly still have a negative number. So this
factor must be negative there.
That’s a kind of informal way of thinking about it. Put more algebraically: if we have
−4 < x < −1
then we must have
−4 − 5 < x − 5 < −1 − 5
−9 < (x − 5) < −6
So then between −1 and −4, the top factor (x − 5) must be between −9 and −6, which means
that that factor is negative.
– Likewise, if we have (x + 4), and we plug in a number between −1 and −4, then we will get a
number between +3 and 0. So this factor is positive in this interval.
– Finally, for the (x + 1) factor, plugging in a number between −1 and −4 gives us a negative
number.
We can summarize our findings [5]:
when −4 < x < −1...
Factor   (x − 5)   (x + 4)   (x + 1)
Parity   −         +         −
So then, when x is between −4 and −1, we have $f(x) = \frac{(-)}{(+)(-)} = (+)$. So the function must be
positive between −4 and −1, which means the graph we want is this one:
Another way to have figured this out (an easier way) would have been to have just considered the
multiplicity of the vertical asymptote at x = −1. It had multiplicity 1, so the function should go up on one side
and down on the other side. We already knew that it went down on the right side of the asymptote,
so we could have concluded that it went up on the left side (and thus, that the graph was the positive
one). The reason I’m spending all this time with the algebra is because sometimes you have to use
that method to figure out the shape of the graph, and it’s kind of complicated, so I’m trying to use
the repetition to make it clear.
[5] Note that “parity” is just a fancy name for “positive-or-negative-ness”.
What about far left? Again, we have two choices. The function could approach the asymptote from
below, or from above.
We know that it has to approach the horizontal asymptote from below, though, because the vertical
asymptote at x = −4 has multiplicity 1. The function goes up on the right side of that asymptote,
so it needs to go down on the left side. So it must approach the horizontal asymptote below (i.e.,
the function must be negative).
Alternatively, we could figure this out simply by knowing whether the function is positive or negative
on that interval (i.e., when x < −4, is f (x) > 0 or is f (x) < 0?) Let’s make a table again:
when x < −4...
Factor   (x − 5)   (x + 4)   (x + 1)
Parity   −         −         −
So then, when x is less than −4, we have $f(x) = \frac{(-)}{(-)(-)} = (-)$. So the function is always negative,
which means it approaches the asymptote from below. So both methods give us the same result (good!).
Thus, the final graph is this one:
And that’s our function!!!
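The factor-by-factor sign tables used in this example can be generated mechanically. Here is a rough Python sketch for $f(x) = (x - 5)/((x + 4)(x + 1))$; the helper names and the sample points (one inside each interval cut out by −4, −1, and 5) are my own choices.

```python
def sign(v):
    return "+" if v > 0 else "-" if v < 0 else "0"

def sign_row(x):
    """Signs of (x - 5), (x + 4), (x + 1), and of f(x) itself, at the point x."""
    top, bottom1, bottom2 = x - 5, x + 4, x + 1
    return sign(top), sign(bottom1), sign(bottom2), sign(top / (bottom1 * bottom2))

# One sample point in each interval: x < -4, -4 < x < -1, -1 < x < 5, x > 5.
samples = [-5, -2, 0, 6]
for x in samples:
    print(x, sign_row(x))
# -5 ('-', '-', '-', '-')
# -2 ('-', '+', '-', '+')
# 0 ('-', '+', '+', '-')
# 6 ('+', '+', '+', '+')
```

The last entry in each row is the sign of the whole function on that interval: below the axis on the far left, above it between the two asymptotes, below again until the root at x = 5, and above after that.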
A Third Example With Weird Asymptotes, Etc.
• This stuff gets complicated, so let’s do a third example with some weird roots, some weird asymptotes,
and some weird end behavior. How about the function
$f(x) = \frac{3x(x + 1)^2}{(x - 2)^2}$
What do we know about this?
– End behavior: We need to look at the ratio of the leading terms. The leading term of the
top is $3x^3$, and the leading term of the bottom is $x^2$, so this function has an end asymptote
at $y = (3x^3)/(x^2) = 3x$, which means it will have a slanted/linear end asymptote at the line
y = 3x.
– Roots: We know it will have roots/x-intercepts/etc. wherever the top polynomial equals zero.
So we have two roots: at x = 0 (of multiplicity 1), and at x = −1 (of multiplicity 2).
– Vertical Asymptotes: We know that vertical asymptotes will happen whenever the bottom
polynomial equals zero. That happens at x = 2. This vertical asymptote has a multiplicity of
2, which means that the function will either spike up on both sides of that asymptote (like a
volcano!), or drop down on both sides (into an abyss!).
– y-intercept: We just plug in 0 for x. Which gives us y = 0 as the y-intercept. (This makes
sense, because we have an x-intercept at 0, so the function should pass through the origin.)
So far, so good! Let’s sketch what we have so far. I’ll represent the intercepts with dots, and represent
the asymptotes with dotted lines:
Let’s start drawing this on the right side. I have to either come up from or come down from the
vertical asymptote at x = 2, and then start hanging out with the linear asymptote. So my graph
could look like either of these:
Except it can’t be the one on the bottom, because then there’d be an x-intercept somewhere to the
right of x = 2, and we don’t have an x-intercept there! So it must be this:
What about the left side? I can figure this out in many different ways. It has to come either down or
up from the vertical asymptote, go directly through the intercept at x = 0, bounce off the intercept
at x = −1, and then go down and hang out with the slant asymptote. I have two options:
It has to be the graph that starts on the top and goes down. Why? Because
– the vertical asymptote has a multiplicity of 2. Since it goes up on the right side, it must go up
on the left side, too (volcano-like).
– if it started at the bottom and went up, it’d have to keep going up—and then how would it ever
meet up with the end asymptote?
– if neither of the preceding two reasons convinced you, you could see where the function was
positive and negative, and come to the same conclusion.
Many ways of explaining the same behavior. The function has to look like this:
Going backwards
• With polynomials, we saw that we can do this process in both directions—we can take an equation
and draw a picture of it, or we can take a picture and write an equation for it. We can do the same
with rational functions.
• So imagine we have a rational function that looks like this:
How can we find an equation for it? What do we know?
– Vertical asymptotes: This function appears to have vertical asymptotes at x = 1 and x = −2.
They both split the function, so they must have multiplicity 1. So the factors in the denominator
could be (x − 1) and (x + 2).
– Roots and their multiplicities: This function appears to have only one root, at x = 0. It
goes through the x-axis, so the root must be of odd multiplicity. Thus, the numerator of the
function could be x, or x3 , etc.
– End behavior: This function appears to have some sort of a slant asymptote (maybe like
y = x), or at the very least, an asymptote that is some odd-degree polynomial (could be $x^3$, $x^5$, etc.,
because it goes up in the far right and down in the far left). This means that the degree of the
numerator will need to be greater than the degree of the denominator.
– y-intercept: It’s at zero, and it has to be, since we have an x-intercept there.
• Really, all we need to know to come up with a preliminary equation are the roots and the vertical
asymptotes (because that way we can make the numerator and the denominator). So now we can
come up with an equation.
• What about $f(x) = \frac{x}{(x - 1)(x + 2)}$? Well, this won’t work, because this equation has a horizontal
asymptote at y = 0.
• What about $f(x) = \frac{x^3}{(x - 1)(x + 2)}$? This could work. It gives a slant asymptote at y = x. Yay!
• Of course, $f(x) = \frac{x^3(x^2 + 1)}{(x - 1)(x + 2)}$ could work, too (the slant asymptotes might be a bit steeper), as could
$f(x) = \frac{x^5}{(x - 1)(x + 2)}$, or $f(x) = \frac{546x^3}{(x - 1)(x + 2)}$, etc., etc. All of these equations have graphs with this same
general shape.
An Entirely Different Way of Drawing Rational Functions

• There’s another way of drawing rational functions that you might prefer. In certain ways, it’s actually
much simpler. The inspiration comes from this question, asked last year by Hannah Duncan:
Imagine we have the rational function $f(x) = \frac{(x + 1)(x - 3)}{x(x - 2)(x - 4)}$. It looks like this:
But what if we want to flip that second-from-left segment (and only that segment) so that it looks like this:
How do we do this? It’s not immediately obvious. Maybe we could fiddle around with the multi-
plicities of the vertical asymptotes or something, but that’d require some trial and error. Is there a
better way to do this?
• Yes!
• Here’s the idea: what if we try drawing both the numerator and the denominator as separate poly-
nomials, on the same graph, and then sort of intuitively divide and draw it?
• What I mean is this. Let’s first draw both polynomials on the same axis, like so. I’ve drawn
the numerator-polynomial with long dashes, and the denominator-polynomial with shorter dashes:
• We know we’ll have vertical asymptotes wherever the denominator (short-dashed polynomial) is zero,
and x-intercepts wherever the numerator (long-dashed) polynomial is zero. So we can mark those:
• But how do we draw the rest of the rational function? Here’s the trick. Let’s consider the far-left segment
first.
– We know that between −1 and 0, both polynomials are negative. Then the full rational function
must be a negative over a negative, which is a positive, so the full rational function must be
positive there. This means that it must go up to the asymptote.
– and we know that to the left of −1, the numerator is positive, while the denominator is still
negative. A positive over a negative is a negative, so the function must be negative there.
– And we know that the full rational function must have a horizontal asymptote at y = 0. So the
left-most segment must always be negative, but must eventually go back up and approach 0.
So then we can draw in the function on the left side:
• What about the next segment (i.e., the second segment from the left)? In that segment, the numerator
is always negative, and the denominator is always positive. So the function must always be negative,
like so:
• Then, in the third-from-left segment, we have an x-intercept.
– For the left part of this segment (between 2 and 3), both polynomials are negative, so the full
function must be positive.
– For the right part of this segment (between 3 and 4), the numerator is positive, and the denom-
inator is negative. So the full function must be negative.
So it will look like this:
• FINALLY, in the last segment (on the rightmost side), both polynomials are positive. So the function
must be positive there, too. And we know it has to approach y = 0, so I guess it’ll come down from
the asymptote. So our FULL FINAL FUNCTION looks like this (the function in solid bold, and its
constituent polynomials in dashes):
• But the whole motivation for doing this was to be able to flip portions of the functions. What’s cool
is that we can easily see that if we want to make that chunk of our rational function between 0 and
2 positive, we just need to make the denominator negative there! Then we’ll have a negative over
a negative, which will give us a positive. (We could flip the sign of the numerator there instead, but
that’d change the end behavior.) So we just need to change the multiplicities on the x = 0 and x = 2
roots of the denominator to 2, like so:
Which will give us the graph we want, with the middle-left section flipped:
• By the way: using this method you can see why asymptote multiplicity changes the graph in
the way that it does. At each vertical asymptote, the denominator-polynomial might or might not
change sign (might or might not go from being positive/above the x-axis to being negative/below
the x-axis, or vice-versa). The numerator-polynomial, however, will not change sign at a vertical
asymptote (because if it did, it’d have to be zero, but it can’t be zero, because then there’d be a
hole). So
– if the denominator-polynomial doesn’t change sign, the entire function won’t change sign. But
in order for the denominator-polynomial to not change sign, the root that causes the vertical
asymptote must be of even multiplicity. So if the root is of even multiplicity, the entire function
won’t change sign, and thus it’ll either stay positive on both sides of the asymptote (spike up),
or stay negative on both sides of the asymptote (spike down)
– if the denominator-polynomial does change sign (i.e., is of odd multiplicity), then the entire
function will change sign. But for it to change sign at a vertical asymptote, it must be negative
(go down) on one side, and be positive (go up) on the other side.
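The sign bookkeeping in this section can be sketched in a few lines of code. This is only an illustration under stated assumptions: the helper names are made up, each polynomial is described by its roots and multiplicities (leading coefficient taken to be +1), and the "flipped" version assumes the roots of the denominator at x = 0 and x = 2 are the ones given multiplicity 2.

```python
def sign(value):
    """Return '+', '-' or '0' for a number."""
    return "+" if value > 0 else "-" if value < 0 else "0"


def evaluate(roots, x):
    """Evaluate the product of (x - root)**multiplicity over all roots."""
    result = 1.0
    for root, multiplicity in roots:
        result *= (x - root) ** multiplicity
    return result


def sign_chart(num_roots, den_roots):
    """Sign of numerator, denominator and quotient on each interval.

    One test point is taken left of the smallest root, one between each
    pair of neighbouring roots, and one right of the largest root.
    """
    cuts = sorted({r for r, _ in num_roots} | {r for r, _ in den_roots})
    tests = [cuts[0] - 1]
    tests += [(a + b) / 2 for a, b in zip(cuts, cuts[1:])]
    tests.append(cuts[-1] + 1)
    chart = []
    for x in tests:
        n = evaluate(num_roots, x)
        d = evaluate(den_roots, x)
        chart.append((x, sign(n), sign(d), sign(n / d)))
    return chart


numerator = [(-1, 1), (3, 1)]                 # (x + 1)(x - 3)
original_den = [(0, 1), (2, 1), (4, 1)]       # x(x - 2)(x - 4)
flipped_den = [(0, 2), (2, 2), (4, 1)]        # x^2 (x - 2)^2 (x - 4)

for label, den in [("original", original_den), ("flipped", flipped_den)]:
    print(label)
    for x, n, d, f in sign_chart(numerator, den):
        print(f"  x = {x:5}: top {n}, bottom {d}, f(x) {f}")
```

Reading the last column from left to right gives the sign of the rational function on each interval, so the two charts can be compared segment by segment.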
Problems
For the following rational functions:
1. Where are the vertical asymptotes?

2. What is the end behavior? Is there a horizontal asymptote? Where? A parabolic asymptote? Cubic
asymptote?
3. Where are the holes, if any?
4. Where are the x-intercepts/roots, if any? What are their multiplicities?
5. What is the y-intercept?
6. What does the rational function look like? (i.e., sketch it. Without a calculator! You may need to
determine on what intervals the function is positive (i.e., above the x-axis) and on what intervals the
function is negative (i.e., below the x-axis).)
(You might have to factor or otherwise manipulate the expressions to get them into a convenient form.)
1. $f(x) = \frac{3}{x + 2}$
2. $f(x) = \frac{4}{x + 12}$
3. $f(x) = \frac{1}{x - 3}$
4. $f(x) = \frac{x}{x + 1}$
5. $f(x) = \frac{3x}{x - 7}$
6. $f(x) = \frac{x^2}{x + 4}$
7. $f(x) = \frac{5}{(x + 2)^2}$
8. $f(x) = \frac{5}{(x - 1)^2}$
9. $f(x) = \frac{7x}{(x - 1)^2}$
10. $f(x) = \frac{x + 4}{(x + 3)^3}$
11. $f(x) = \frac{x^2 - x - 2}{x^2 - 2x - 63}$
12. $f(x) = \frac{x^2 - 3x - 2}{x^2 + x - 20}$
13. $f(x) = \frac{x^2 + 9x + 18}{x^2 - 1}$
14. $f(x) = \frac{x^2 + 4x + 3}{x^2 - x - 2}$
15. $f(x) = \frac{x^2 - 12x + 35}{x^2 - 25}$
16. $f(x) = \frac{x^2 - 4}{x^2 - 5x}$
17. $f(x) = \frac{x^3 - 4x^2 + x - 4}{4x}$
18. $f(x) = \frac{x^3 + 4x^2 + 7x + 28}{5x + 3}$
19. $f(x) = \frac{2x^3 - 6x^2 + x - 3}{2x + 1}$
20. $f(x) = \frac{5}{x^2 - x - 6}$
21. $f(x) = \frac{3}{x^2 + x - 20}$
22. $f(x) = \frac{2x}{x^2 + 4x + 3}$
23. $f(x) = \frac{2x^2 - 5x - 3}{3x^2 + 2x - 1}$
24. $f(x) = \frac{5(x^2 - x - 2)}{x^2 - 2x - 63}$
25. $f(x) = \frac{(x + 1)(x - 2)(x + 7)}{4(x - 4)(x + 7)(x + 2)}$
26. $f(x) = \frac{(x + 1)(x + 3)(x + 5)}{x(x + 2)(x + 4)}$
27. $f(x) = \frac{(x - 4)(x - 5)}{(x - 4)^2}$
28. $f(x) = \frac{x^2(x + 3)}{(x + 5)^2(x + 7)^2}$
29. $f(x) = \frac{-x^3}{(x - 4)^3(x + 1)^2}$
30. $f(x) = \frac{x(x - 4)(x - 2)}{(x - 1)(x - 3)(x - 5)}$
31. $f(x) = \frac{x^2 - x - 2}{x^2 - 2x - 63}$
32. $f(x) = \frac{(x - 2)(x + 4)}{x^2 + 8x + 15}$
33. $f(x) = \frac{(x - 1)(x + 9)(x - 7)(x + 3)}{(x + 4)(x - 2)(x + 6)(x - 5)}$
34. $f(x) = \frac{3(x - 2)(x + 1)(x - 4)(x + 2)}{(x - 1)(x - 3)(x + 1)(x - 6)}$
35. $f(x) = \frac{-(x - 7)(x + 1)(x - 2)(x + 0)}{(x - 1)(x - 9)(x + 2)(x - 5)}$
36. $f(x) = \frac{x^5 - x^3 + 5x^2 - 5}{(x - 2)(x + 3)(x + 5)}$
37. $f(x) = \frac{-(x - 2)(x + 1)^2}{(x - 1)(x - 3)(x + 2)(x - 5)}$
38. $f(x) = \frac{(x^2 - 1)(x^2 - 9)}{x(x + 2)^2(x - 2)^2}$
39. $f(x) = \frac{(x - 2)^4(x + 1)^4}{(x - 1)^2(x - 3)^2(x + 2)^2(x - 5)^2}$
40. $f(x) = \frac{(x + 2)(x - 2)}{x^3 - 3x}$
41. $f(x) = \frac{-4(x - 2)^3}{(x - 7)(x^2 - 16)}$
42. $f(x) = \frac{-4(x - 2)^3}{(x - 7)(x^2 - 16)}$
43. $f(x) = \frac{x^3 + x^2 - 4x - 4}{x^4 + 2x^3 + 9x + 18}$
44. $f(x) = \frac{-3(x + 1)(x - 2)(x + 4)}{x + 2}$
45. $f(x) = \frac{5(x^2 + 1)(x + 5)(x + 7)}{(x + 7)(x - 2)}$
46. $f(x) = \frac{x(x^2 - 1)}{x^2 - 4}$
47. $f(x) = \frac{(x^2 - 4)(x + 5)(x - 5)}{x}$
48. $f(x) = \frac{4x + 2}{1}$
49. $f(x) = \frac{1}{(x + 1)(x + 54)(x - 12)}$
50. $f(x) = \frac{3}{4}$
51. $f(x) = \frac{x^3(x + 4)(x - 5)}{(x + 2)(x - 1)}$
52. $f(x) = \frac{(x + 2)^2(x - 4)}{x^2}$
53. $f(x) = \frac{1}{x^2 + 1}$
For the following graphs of rational functions:

1. Where are the vertical asymptotes? What are their multiplicities? (What does this tell you about
the denominator?)
2. What is the end behavior/end asymptote? (What does this tell you about the relative degrees of the
numerator and denominator?)
3. Where are the x-intercepts/roots/solutions/zeroes, if there are any? Their multiplicities? (What
does this tell you about the numerator?)
4. Where is the y-intercept, if it exists? (You might not know it as a number, but is it positive or
negative?)
5. Finally (and most importantly): write an equation for the graph.
(Note that the x-axes and y-axes of these graphs have different scales. Also note that the number of the
problem is at the bottom-left corner of the graph (not the top).)
[Graphs for problems 54 through 95 are not reproduced here.]
HUNTER HAHN’S
EMERGENCY REFERENCE:
Vertical asymptotes: at x = the zeroes of the bottom
odd multiplicity: fxn goes up on one side & down on other
even multiplicity: fxn either goes up or down on both sides
End asymptote: at y = (the leading term on top) / (the leading term on bottom)
x-intercepts: at x = the zeroes of the top (w/ normal multiplicities)
y-intercept: at y = the function, but with zero plugged in for x
Holes: at x = any number that’s both a zero of the top and the bottom
Inverse Functions
One of the most important things we do with functions—I mean, our entire method of calculus is based
on functions, so they’re pretty important as a structural element, but here I’m talking about functions in
the abstract—the most important thing we do with functions is compose them. We can put two functions
together to create a new function. With numbers, we can add two numbers together and get a third
number; with functions, we can compose two functions together and get a third function. Composition is
our most basic operation with functions. For instance, if
$f(x) = 3x^2$ and $g(x) = 5x + 1$
then we can plug g(x) into f (x) and create a third function, h(x):
$f(g(x)) = 3(g(x))^2 = 3(5x + 1)^2 = h(x)$
This is what we do when we talk about linear transformations—we take a parent function, f (x), and into
it we either plug a different function (to make a horizontal transformation), or we plug the parent function
into a different function (to make a vertical transformation). For instance, say we have the parent function
f (x) (without knowing exactly what it is), and say we have the transformation-function g(x) = x − 3.
Then
$f(g(x)) = f(x - 3)$ = the parent function shifted right by three
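Composition is easy to try out in code. The following is a small sketch, not anything from the original notes: `compose` is a hypothetical helper, and `parent` is an arbitrary stand-in chosen here because the text leaves the parent function unspecified.

```python
def compose(outer, inner):
    """Return the function x -> outer(inner(x))."""
    def composed(x):
        return outer(inner(x))
    return composed


def f(x):
    return 3 * x ** 2


def g(x):
    return 5 * x + 1


# h(x) = f(g(x)) = 3(5x + 1)^2
h = compose(f, g)


def parent(x):
    # Stand-in parent function (an assumption; any function would do).
    return x ** 2


def shift(x):
    return x - 3


# Plugging x - 3 into the parent moves its graph three units to the right.
shifted = compose(parent, shift)

print(h(2))        # f(g(2)) = f(11)
print(shifted(3))  # same value the parent takes at 0
```

Whatever the parent does at x = 0, the shifted function does at x = 3, which is the horizontal shift described above.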
The ‘x’ in a function sometimes stands for a position along the horizontal axis of a coordinate plane, but
more generally, ‘x’ is just a “placeholder”1 that can stand for anything. For instance, these four functions
are all written with different placeholders, but they all describe the same underlying function, f :
$f(x) = \frac{1}{x^2}$        $f(t) = \frac{1}{t^2}$
$f(\text{Mr. Dickerson}) = \frac{1}{(\text{Mr. Dickerson})^2}$        $f(\zeta) = 1/\zeta^2$
So instead of plugging x or Mr. Dickerson into f , why not replace the placeholder with an entire function?
This is the essence of composition.
Anyway, here’s our real question: what if we want to undo composition of functions? Think about
linear transformations: if we move something right by three, we can undo that action by moving it left
by three. This question is hardly restricted to functions alone. We do things all the time, and so it is
completely natural to ask, can we undo things (and if so, how)? For instance: if we add two things, we can
undo that by subtracting, thus getting the first thing back:
A+B−B =A
If we multiply two things, we can undo that by dividing:

$A \cdot B \cdot \frac{1}{B} = A$
We usually don’t use the word “undo” to describe this operation; we usually refer to it as inverting or
finding an inverse. So here’s our question: we know how to compose functions, but how do we invert that
operation? How do we find an inverse for function composition? Can we?
1
I feel that this word is overused and underdefined, but I don’t know any better words.
Let’s do a quick example. Imagine I take something and quint it (i.e., raise it to the fifth power). How
do I get the original thing back? I just take a quintic (fifth) root.
$(x^5)^{1/5} = \sqrt[5]{x^5} = x$
So quinting (raising something to the fifth) and taking a quintic root (taking a fifth root) are inverse
functions. We know that because if we do one after the other, we get back the original thing we plugged
into the equation. In this case that was just x, but it could well have been anything. Imagine that Mr.
Dickerson, on a field trip to the quinting factory, accidentally fell into the quinting machine. In order to
get Mr. Dickerson back, you’d just need to bring him to the quintic-root factory next door:
$\sqrt[5]{(\text{Mr. Dickerson})^5} = \text{Mr. Dickerson}$
Of course, it doesn’t matter what order we compose these functions in2. If A is the inverse of B, then B
is the inverse of A:
$\sqrt[5]{x^5} = x$
$(\sqrt[5]{x})^5 = x$
Formally, then, we define an inverse function in the following way: f (x) and g(x) are inverses if (and
only if) the following are true:
• f ( g(x) ) = x, or
• g( f (x) ) = x
Note that if one of these identities is true, the other will be true, too.
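The definition can be spot-checked numerically. The sketch below is an illustration only: the function names are hypothetical, the check runs on a handful of sample points (so it can suggest, but never prove, that two functions are inverses), and a small tolerance absorbs floating-point rounding.

```python
import math


def quint(x):
    """Raise x to the fifth power."""
    return x ** 5


def fifth_root(x):
    """Real fifth root, also for negative x."""
    return math.copysign(abs(x) ** (1 / 5), x)


def looks_like_inverses(f, g, samples, tol=1e-9):
    """True if f(g(x)) and g(f(x)) both return x at every sample point."""
    return all(
        abs(f(g(x)) - x) <= tol and abs(g(f(x)) - x) <= tol
        for x in samples
    )


samples = [-2, -0.5, 0, 1, 3]
print(looks_like_inverses(quint, fifth_root, samples))


def reciprocal_of_quint(x):
    # The reciprocal 1/f(x), which is NOT the inverse function of f.
    return 1 / quint(x)


print(looks_like_inverses(quint, reciprocal_of_quint, [1, 2, 3]))
```

The second check fails, which is the point made in the next paragraph about the reciprocal not being the inverse.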
Often we use a special notation for inverses: if f (x) is some function, then we write its inverse function
as $f^{-1}(x)$. But this is bad notation. The superscript (−1) here has absolutely, absolutely nothing to do
with a reciprocal. $\frac{1}{f(x)}$ is not (with one or two uninteresting exceptions) the inverse of f (x). Yes, we do
usually write reciprocals with a (−1) superscript. But here the (−1) just denotes an inverse. That’s why
it’s horrible notation—the same symbol for two entirely different things. “In the language of everyday
life,” Wittgenstein writes,
it very often happens that the same word signifies in two different ways—and therefore
belongs to two different symbols—or that two words, which signify in different ways, are appar-
ently applied in the same way in the proposition... Thus there easily arise the most fundamental
confusions. In order to avoid those errors, we must employ a symbolism which excludes them,
by not applying the same sign in different symbols, and by not applying signs in the same way
which signify in different ways.3
(This quotation isn’t necessary to prove my point; I’m only including it because it is apt, and I was
reading it earlier today.4 ) Quite frankly, I’d rather you never use that notation at all; the only reason I’m
mentioning it is because other people use it, so if you see an $f^{-1}(x)$ somewhere else in life, you’ll know
what they mean. If you need a good way of writing the inverse of f (x), why not write $f^{\text{inv}}(x)$ instead?
(Or come up with some other clever notation.)
Beyond defining what an inverse is, though, the useful question is: how do we actually compute them?
There are two ways: algebraically and graphically5 .
2
Not for our purposes, anyway—not for the types of functions we deal with, for the most part. I could give more complicated
examples where g is the inverse of f but f isn’t precisely the inverse of g.
3
Tractatus 3.323-3.325
4
Also, Wittgenstein uses the word “sign” to mean the same thing that I called a “symbol,” and by “symbol” W. means
something else altogether... so I am afraid I am guilty of the very offense both he and I believe should be capital.
5
“Graphically” isn’t really a method of “computation”, but I think you know what I mean.
• Algebraically: we transpose (switch) x and y in the equation (and then solve for y). For example:
if we have
$y = 2x^3 + 5$
then its inverse is
$x = 2y^3 + 5$
Usually, though, we write functions with the y isolated (i.e., solved for y in terms of x), and if we
were to rearrange this, we’d get
$x - 5 = 2y^3$
$\frac{x - 5}{2} = y^3$
$y = \sqrt[3]{\frac{x - 5}{2}}$
So then the following are inverse functions:
$y = 2x^3 + 5$ and $y = \sqrt[3]{\frac{x - 5}{2}}$
I guess another way to denote this would be to say:
$f(x) = 2x^3 + 5$ and $f^{\text{inv}}(x) = \sqrt[3]{\frac{x - 5}{2}}$
or
$f(x) = \sqrt[3]{\frac{x - 5}{2}}$ and $f^{\text{inv}}(x) = 2x^3 + 5$
If we wanted to formally prove that these are inverses, we’d need to show that they satisfy our
definition of an inverse. So we’d need to plug them into one another, simplify, and end up just with
x:
$f^{\text{inv}}(f(x)) = \sqrt[3]{\frac{f(x) - 5}{2}}$
$= \sqrt[3]{\frac{(2x^3 + 5) - 5}{2}}$
$= \sqrt[3]{\frac{2x^3}{2}}$
$= \sqrt[3]{x^3}$
$= x$
Ta-da!
• Graphically: finding an inverse just corresponds to a simple swap of the x and y axis (a 90-degree
rotation left, followed by a horizontal reflection), which is all just the same as reflecting the function
across the line y = x, like so:
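[Figure: the graphs of $y = 2x^3 + 5$ and $y = \sqrt[3]{\frac{x - 5}{2}}$, mirror images of each other across the line y = x.]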
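Both views of the inverse (algebraic and graphical) can be tried out numerically. This is a rough sketch with made-up names: `f_inv` is the cube-root formula derived above, and the "reflection" is simply swapping the coordinates of a few points on the graph of f.

```python
import math


def f(x):
    return 2 * x ** 3 + 5


def f_inv(x):
    """Cube root of (x - 5)/2, written so it also works for negative inputs."""
    u = (x - 5) / 2
    return math.copysign(abs(u) ** (1 / 3), u)


xs = [-2, -1, 0, 0.5, 3]

# Algebraic check: applying f and then f_inv should hand back x.
round_trip_errors = [abs(f_inv(f(x)) - x) for x in xs]

# Graphical check: reflect points (x, y) on the graph of f across y = x
# by swapping coordinates; the swapped points should lie on f_inv.
points_on_f = [(x, f(x)) for x in xs]
reflected = [(y, x) for x, y in points_on_f]
reflection_errors = [abs(f_inv(a) - b) for a, b in reflected]

print(max(round_trip_errors), max(reflection_errors))
```

Both error lists stay at the level of floating-point rounding, which is what the "Ta-da!" computation predicts.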
Note that not every function has an inverse that is itself a function. Why not? Come up with an example
of a function whose inverse isn’t a function. For functions we have a “vertical line test” for functionality—
can you come up with a similar test to automatically determine whether a given function has an inverse
function? (This, by the way, is where commutativity of inverses begins to break down—where A being the
inverse of B doesn’t mean that B is the inverse of A.)
Here’s another fun question: can you come up with a function(s) that is (are) its own inverse(s)?6
What are the necessary algebraic/graphical properties of such a function?
Problems
Graph each of the following functions and try to sketch their inverses; then find the inverse algebraically.
(If the inverse is a function, say so; if not, say why.) Then prove that the inverse is, in fact, an inverse.
1. $f(x) = -x$
2. $f(x) = -x + 1$
3. $f(x) = 5x - 12$
4. $f(x) = ax + b$
5. $f(x) = 5x^2 - 4$
6. $f(x) = 5 - 2x^3$
7. $f(x) = (x^5 + 1)^3$
8. $f(x) = \sqrt{4x - 7}$
9. $f(x) = 5 + \sqrt{3x - 2}$
10. $f(x) = 1/x$
11. $f(x) = 1/x^2$
12. $f(x) = x^k$
13. $f(x) = \frac{1}{2x^2 + 1}$
14. $f(x) = \frac{x}{x^2 + 1}$
15. $f(x) = \sqrt[5]{\frac{3x - 1}{x - 2}}$
16. $f(x) = ax^2$
17. $f(x) = a(x + b)^3 + c$
18. $f(x) = e^x$
6
The mathematical name for such a function is an “involution.”
Logarithms! And Exponentials.
Calculus 11, Veritas Prep.
Arizona. August 2010. It’s 145 degrees. Wave after wave of photons crashes down on you as
you stroll along the shore of Tempe Town Flats. It used to be a lake, but that was before—before it
all evaporated. The water, the town, the people, their souls, your bank account—gone. All gone. They
evaporated into a humid human mist that hung in the air briefly, like a firework at the apex of its flight,
and then dispersed, constrained by the second law of thermodynamics to forever increase the entropy of
the universe.
It’s so hot you can’t even ride your bike. The rubber tires crumble the instant they meet UV. Forget
about driving. Car tires are just as bad. Prices have reached $10,000 for a set, and even then, you can
only drive between 1 and 2 AM. Your buddy stole a tank from the abandoned army base—no rubber on
that thing!—but you wouldn’t want to be sealed into a hermetic steel chamber in this heat. You no longer
need superheated gases to weld your metal sculptures together. You just use a magnifying glass.
You’ve got to get out of Tempe. Luckily, you’ve saved up $3,000 over the past few years by running
a textbook-smuggling ring. New editions of glossy books, delivered in crates by your contacts at print
shops in Dallas and Sacramento, resold at a tenth of retail to impoverished ASU students. Somehow the
university still operates, despite what’s happened to the rest of the Valley. Quality doesn’t seem to have
gone down.
You’ve got to get out of the wasteland. You want to go to Alaska. You’re not really sure what you
would do. Maybe set up a used-book store outside of Fairbanks. Maybe set up a homestead; live off the
land. Lumber’s probably easily available. You could export moose- and bear-jerky to gourmet food stores
in the Lower 48.
In any case, you figure you need $9,000 to get up there and get set up. You’ve got a guy in Tucson
who says he can give you a 15% return per year on your money. 15% per year, every year. That beats
inflation. That beats inflation a lot. You decide to invest with him. Eventually, you’ll have enough to
export yourself to the land of the midnight sun.
time passed      your assets
initially        $3,000
after 1 year     $3,000 · (1.15) = $3,450
after 2 years    [$3,000 · (1.15)] · (1.15) = $3,000 · (1.15)^2 = $3,967.50
after 3 years    [$3,000 · (1.15)(1.15)] · (1.15) = $3,000 · (1.15)^3 = $4,562.63
after 4 years    [$3,000 · (1.15)(1.15)(1.15)] · (1.15) = $3,000 · (1.15)^4 = $5,247.02
...              ...
after t years    $3,000 · (1.15)^t
(In each row, the bracketed part is what you had last year.)
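The table can be reproduced with a short script. This is a sketch under the story’s assumptions ($3,000 to start, 15% per year, compounded once a year); the function names are hypothetical, and `Decimal` is used so that amounts round to the cent the way they would by hand.

```python
from decimal import Decimal, ROUND_HALF_UP


def assets(years, principal=Decimal("3000"), rate=Decimal("0.15")):
    """Assets after a whole number of years of compound growth."""
    return principal * (1 + rate) ** years


def to_cents(amount):
    """Round a Decimal amount to the nearest cent (halves round up)."""
    return amount.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def first_year_reaching(target):
    """Smallest whole number of years after which assets >= target."""
    years = 0
    while assets(years) < target:
        years += 1
    return years


for t in range(5):
    print(f"after {t} year(s): ${to_cents(assets(t)):,}")

print("first whole year with at least $9,000:", first_year_reaching(Decimal("9000")))
```

This only answers the Alaska question to the nearest whole year; pinning down the exact moment needs an inverse of the exponential.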
After you invest, your assets will increase exponentially! You can even plot their growth with time:
Here’s the question: when will you be able to go to Alaska? When will you have the $9,000 necessary
to leave? From the graph, it looks like it’ll take another 8 years of textbook-running.
But how do you find this exactly? You need to find what value of t makes your asset-function equal
to $9,000:
$\$9{,}000 = \$3{,}000 \cdot (1.15)^t$
How do you solve this equation? I guess you could start by dividing off the $3,000:
$\frac{\$9{,}000}{\$3{,}000} = (1.15)^t$
$3 = (1.15)^t$
But how do we isolate t? Could we take a t’th root?
$3^{1/t} = ((1.15)^t)^{1/t}$
$3^{1/t} = 1.15$
That doesn’t work. It just gives us the same problem, in a different place. We still have a t in the exponent.
What we really need, I guess, is some function that could get the t out of the exponent. Some function
that could undo the action of exponentiation... an inverse function!
How would we do this? I guess we’d say that we have the function $f(x) = 3000 \cdot 1.15^x$, and we want
to find $f^{\text{inv}}(x)$, such that $f^{\text{inv}}(3000 \cdot 1.15^t) = t$. That way, we could solve our equation for t:
$9000 = 3000(1.15)^t$
$f^{\text{inv}}(9000) = f^{\text{inv}}(3000 \cdot 1.15^t)$
$f^{\text{inv}}(9000) = t$
$t = f^{\text{inv}}(9000)$
Whatever that inverse function is (and we don’t really know what it is; we’re just assuming that it exists
and that it’s something), we can just plug $9,000 into it, and find out how long we need to wait until we
triple our assets and can go to Alaska!!! I guess the inverse function will look like this (a reflection of the
original function across y = x):
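Even without a formula for the inverse, its value at a single point can be pinned down numerically, because the asset function only ever increases. The sketch below uses bisection, which is a technique swapped in here for illustration and not something the notes themselves use; `invert_increasing` and its parameters are hypothetical names.

```python
def f(t):
    """Assets after t years: 3000 * 1.15^t."""
    return 3000 * 1.15 ** t


def invert_increasing(func, target, lo, hi, tol=1e-10):
    """Find t in [lo, hi] with func(t) = target, for an increasing func.

    Repeatedly halves the interval, keeping the half that still
    brackets the target value.
    """
    if not func(lo) <= target <= func(hi):
        raise ValueError("target is not bracketed by func(lo) and func(hi)")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if func(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


t = invert_increasing(f, 9000, 0, 20)
print(t)     # somewhere between 7 and 8 years
print(f(t))  # back to (very nearly) 9000
```

The search interval 0 to 20 years is an arbitrary choice that happens to bracket the answer.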
The problem is even more general than simply finding out when we can go to Alaska. More broadly,
we know how to deal with exponential functions. Can we come up with functions that are the inverse
of exponential functions? Let us call such functions (whatever they are) logarithms, and define them
formally in this way, as the inverse of an exponential function: if we have an exponential function with
base a
$f(x) = a^x$
then the logarithm base a is its inverse:
$f^{\text{inv}}(x) = \log_a x$
such that
$\log_a(a^x) = x$ and/or $a^{\log_a(x)} = x$
We might not have a general method to figure these out—to actually compute logarithms as specific
functions or specific numbers, written in terms of decimals and square roots and arithmetic operations and
whatnot. But if we define the logarithm in this way, we should be able to figure out some simple ones. I’ll
give examples in a moment.
The problem is, if we define the logarithm in this way, we don’t know very much about it. We don’t
know what the function is in terms of numbers and addition/subtraction/multiplication/division, in terms
of x’s and square roots and that sort of thing. All we know is that it’s the inverse of an exponential. That,
by itself, isn’t particularly useful. We can’t call up the travel agent and book tickets to Alaska, “departing
$\log_{1.15}(3)$ years from now.” Not yet, at least. So the question is: if this is all we know about the
logarithm, what else must necessarily be true?
I guess we must know what it looks like. We know what exponential functions look like (e.g. $e^x$, $5^x$,
etc.), and we know what inverses look like (they look like the original function flipped around the line
y = x). So we can graph logarithms! Imagine we have some generic exponential function $a^x$, where a is
some constant. Then $\log_a(x)$ must look like this (graphed together with $y = a^x$ and y = x):
So that’s helpful. The graph should give us some idea as to how log should behave as a function: what
sort of numbers it should give us when we plug numbers into it. There’s a vertical asymptote at x = 0,
and an x-intercept at x = 1. It looks like it increases as we go to the right, but slowly1 .
What else can we figure out about logarithms, based solely on this definition? I suppose the next most
important observation (which concerns specific numbers and not more general functions) is this. Imagine
we have some sort of exponential equation, like:
$a^b = c$
This is the same2 as writing:
$\log_a(c) = b$
1
Note that it doesn’t exist on the left side of the graph. For our purposes, we’re only defining the logarithm of positive
numbers. You can define the logarithm for negative numbers, but for various reasons there are lots of different ways to define
it, and it gets kind of messy. (Complex numbers get involved. The typical way to define the logarithm of a negative is such
that ln(−1) = iπ.)
2
By “the same” I mean that these two statements are logically equivalent—that if one is true, then the other is necessarily
true as well. Often we just denote this using a double-double-arrow: $a^b = c \iff \log_a(c) = b$
For instance, since $5^2 = 25$, then $\log_5(25) = 2$. Why is this true (in general)? Well, imagine I have $a^b = c$.
Then...
$a^b = c$
$\log_a(a^b) = \log_a(c)$    (taking the $\log_a$ of both sides)
$b = \log_a(c)$    (they’re inverse fxns, so they just cancel out on the left)
$\log_a(c) = b$    (rearranging)
So, for example:
• $5^2 = 25$, and $\log_5(25) = 2$
• $4^3 = 64$, and $\log_4(64) = 3$
• $2^4 = 16$, and $\log_2(16) = 4$
Another way to think about these examples is this:

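• $\log_5(25) = \log_5(5^2)$    (another way to write 25)
  $= 2$    (inverse fxns: they cancel out)
• $\log_4(64) = \log_4(4^3)$    (another way to write 64)
  $= 3$    (inverse fxns cancel)
• $\log_2(16) = \log_2(2^4)$    (another way to write 16)
  $= 4$    (inverse fxns cancel)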
Yet another way to think about these (and I think this is the best way) is as a question:
• $\log_5(25)$ is like asking, $5^{\text{what}} = 25$? Clearly, the answer is just 2. So $\log_5(25) = 2$.
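The “logarithm as a question” view translates directly into a loop: keep multiplying by the base and count the steps. The helper below is a hypothetical sketch that only handles the clean cases (whole-number bases and values, whole-number answers).

```python
def exact_log(base, value):
    """Answer 'base to the what equals value?' for whole numbers.

    Returns the whole-number exponent n with base**n == value,
    or None if no such whole number exists.
    """
    if base <= 1 or value < 1:
        raise ValueError("expects base > 1 and value >= 1")
    power, exponent = 1, 0
    while power < value:
        power *= base
        exponent += 1
    return exponent if power == value else None


print(exact_log(5, 25))   # 5^what = 25?
print(exact_log(4, 64))   # 4^what = 64?
print(exact_log(2, 16))   # 2^what = 16?
print(exact_log(5, 7))    # no whole-number answer
```

The last call returns `None`: 7 is not a whole-number power of 5, so its logarithm base 5 has to be something in between 1 and 2.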
Here’s another way of thinking about logs. (I’m worried this is really disorganized, so if you have better
ideas, please let me know.) I like the number seven. It’s my favorite number. But sometimes I don’t want
to write just “7”. Sometimes I want to write it differently. Luckily, there are lots of different ways to write
7:
• 7
• 10 − 3
• $\sqrt{49}$
• 28/4
• 7 · 1
• the solution to the equation x − 7 = 0
• $343^{1/3}$
But two years before 7 became my favorite number, 5 was my favorite number. What if I wanted to
combine these two? What if I wanted to write my current favorite number (7) in terms of my older favorite
number (5)? I guess I could write:
• 5+2=7
• 5 · 1.4 = 7
But what if I wanted to do this not using addition (like in the first example) or multiplication (like in the
second), but by using my third binary operation, exponentiation? Put differently, what if I want to write:
$5^{\text{something}} = 7$
How do I figure out what that something in the exponent is? Put more formally, if I have $5^x = 7$, how do
I solve for x? The cool observation here, that might have escaped your notice, is that we can write any
(positive) number, simply as any other positive number (other than 1) raised to some power!!! And logarithms
tell us how we can do that. It’s the same as how we can write any number as some other number plus
something:
there is some number x such that 5 + x = 7
we can find it using subtraction: x = 7 − 5
or how I can write any number as some other number times something:
there is some number x such that 5 · x = 7
we can find it using division: x = 7/5
so we define the logarithm as the operation that does this—that is the inverse of exponentiation.
there is some number x such that $5^x = 7$
we can find it using a logarithm: $x = \log_5(7)$
Does any of this make any sense? I have given, like, five different intuitive explanations of a logarithm, plus
a formal definition, and some wacky story about textbooks and Tucson. I’m trying to make it un-confusing,
but I worry that excessive information will have the opposite effect.
Anyway, your persisting question should be, “Well, it’s all well and good to say that $\log_5(7)$ is the
number such that $5^{\text{that number}} = 7$, but I still don’t know what it is! We’ve just written “log”! We don’t
know what it is as a decimal!!!”
operation          inverse operation
addition           subtraction
multiplication     division
exponentiation     logarithmancy
That is a good objection. I hope you agree, as we saw
before, that we can find the logarithm as a decimal under the right circumstances: $\log_5(25) = 2$, because
$5^2 = 25$. But this leaves us with a lot to be desired. What is $\log_5(7)$ as a decimal, anyway? And when can
we go to Alaska? From looking at the graphs back on pages 2 or 3, we can see that it’ll take maybe 7
or 8 years. But we’d like to know the exact date, so we can buy plane tickets! How do we figure that out?
The analogy here is to division. Sometimes we can divide numbers and get really clean results:
30/6 = 5. But sometimes we can’t, and we have to approximate using long division: 30/7 ≈ 4.285714 . . .
It’s the same with logarithms. So if we’re faced with something like log5 (7), we can do two things:
• We might be able to approximate in our heads. For example, if we want to find $\log_{10}(1004)$, we know
that’s probably a little bit more than 3, because $10^3 = 1000$. Or if we want to find $\log_6(34)$, that’s
probably a little less than 2, because $6^2 = 36$.
• You could use your... your... calculator. Eek. Except your calculator only has ln and Log buttons,
for the base-e and base-10 logs. What if you want to find log5 (7)? As it turns out, there’s a change-
of-base formula that lets you write any logarithm in terms of any other logarithm. We’ll prove it in
a bit. But I’ll give it to you now:
$\log_a(b) = \frac{\log_c(b)}{\log_c(a)}$
So in this case, you could have
$\log_5(7) = \frac{\log_e(7)}{\log_e(5)} = \frac{\ln(7)}{\ln(5)}$
And you could type that into your stupid hunk of silicon. You get that $\log_5(7) \approx 1.209 \ldots$
• Or you could use the change-of-base formula, coupled with this fun fact from higher calculus:
$\ln(1 + x) = \sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n} x^n = x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \cdots$
This, by the way, is basically how your calculator computes logs—it uses a series approximation like
this. But I don’t think you’d really want to do this by hand, because it’d take a long time, and it’d
be boring...
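To see the last two bullets in action, here is a small Python sketch. It is only an illustration, not part of the handout: the helper name ln_series is made up, and the trick of feeding the series 1/y instead of y is my own assumption, needed because the series above only behaves itself when x is between −1 and 1.

```python
import math

def ln_series(y, terms=200):
    """Approximate ln(y) for y >= 1 with the series for ln(1 + x).

    The series only converges for -1 < x <= 1, so we can't plug in
    x = y - 1 directly when y is big. Instead we use ln(y) = -ln(1/y)
    and take x = 1/y - 1, which always lands between -1 and 0.
    """
    x = 1.0 / y - 1.0
    total = 0.0
    for n in range(1, terms + 1):
        total += (-1) ** (n + 1) * x ** n / n
    return -total

# Change of base, using the built-in natural log (the "ln button"):
log5_of_7 = math.log(7) / math.log(5)

# The same formula, but with the hand-rolled series standing in for ln:
log5_of_7_series = ln_series(7) / ln_series(5)

print(log5_of_7)         # roughly 1.20906
print(log5_of_7_series)  # should agree with the line above to many places
```

Two hundred terms is overkill for a handout, but it shows why nobody wants to do this by hand.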
I guess the broader comment I would make here is, who cares if we can’t write it as a decimal? A
decimal is (usually) just an approximation of a number, anyway. We’d far prefer to write 30/7 as 30/7
and not as 4.2857 . . . . The former is the actual number; the latter is just a dirty photocopy. This sort of
thing comes up all the time. There are very few numbers that we can write exactly as a decimal (like 2
or 5.68). There are far more numbers that we can’t express exactly in decimal form, like √2 or π or even
something basic like 1/3.
the actual number      its decimal approximation
46                     46
π                      3.14159 . . .
√2                     1.41421 . . .
50/17                  2.94117 . . .
5/4                    1.25
sin(π/5)               0.58778 . . .
\(\log_5(7)\)          1.20906 . . .
ln(5)                  1.60943 . . .
Before we go much further, I should probably comment on a special logarithm, and the special number that
is its base. (Because I just used it in my examples, among other things.) That is: the natural logarithm
ln(x), defined as the logarithm to the base e: ln(x) = loge (x), where e is an irrational, transcendental
number (like π): e ≈ 2.718... Incidentally, there’s a cool mnemonic to remember the first 15 decimal places:
e ≈ 2.7 1828 1828 45 90 45...
So it’s just 2.7, followed by the year Andrew Jackson was elected president, then that again, followed by
a 45-45-90 right triangle (but rearranged).
The abbreviation ln, by the way, comes from the Latin logarithmus naturalis. You can pronounce
it “ell-enn” or “linn” or even just “log,” if you’re of a certain bent. Often mathematicians will write
“log” when they mean “log base e”. But the button Log(x) on your calculator computes the base-10 log
(log10 (x)), and a lot of K-12 teachers say that “log” means “log base ten.” I think this is silly. 10 is a
horrifically unnatural base for a logarithm. The only reason we use base 10 is because we count in units
of ten, and the only reason we count in units of ten (base 10) is because we have ten fingers. What if you
were a tapir? with only four toes on each of your front feet? Would you then use a base-8 number system?
What if you wanted to count with your rear toes? Tapirs only have three toes on each rear foot. Would you
then use a base-6 number system? Thus, you can assume that if I just say “log” without any qualification
as to the base, I’m talking about the natural log. If I say “the artificial, evolutionarily-induced log,” then
I’m probably talking about the base-10 log. (Or going on a bizarre rant about faux-wood trim.)
But where does e come from? Why do we care about the natural log? Is it just one of those bizarre
math-things that we’re not supposed to be able to understand but have to accept? No. e is a number
that eerily arises out of mathematical nature, like π. π is just the ratio of the circumference of a circle
to its diameter. But it’s not a nice, clean integer, or even a rational number. It’s an infinitely-long,
non-repeating, patternless decimal. Out of very basic geometry we get a very strange number3 .
e’s origin is slightly more abstract. It comes from this question: is there a function that is its own
slope? As it turns out—you won’t know why for a bit—as it turns out, the slope of exponential functions
like \(5^x\) or \(7^x\) is equal to the same exponential function back again, but times some constant:
\[ \text{slope of } 5^x = (\text{some constant}) \cdot 5^x \]
And e is the number such that that constant is 1:
\[ \text{slope of } e^x = 1 \cdot e^x = e^x \]
THAT is why we care about the number e. It’s the number whose exponential function is its own slope!!!
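If you’d like to poke at this claim before we prove it, here is a rough numerical sketch in Python. Nothing here comes from the handout itself: the function names are invented, and "slope" is only estimated by taking a tiny step to either side of x, which is an approximation and not a proof.

```python
import math

def slope(f, x, h=1e-6):
    """Estimate the slope of f at x using a small symmetric step."""
    return (f(x + h) - f(x - h)) / (2 * h)

def slope_constant(base, x):
    """Return (slope of base**x at x) divided by base**x itself."""
    f = lambda t: base ** t
    return slope(f, x) / f(x)

for base in (5, 7, math.e):
    # For each base the ratio is the same at every x we try;
    # for base e, that ratio is (very nearly) 1.
    print(base, [round(slope_constant(base, x), 4) for x in (0, 1, 2)])
```

The thing to notice is that each row prints the same number three times: the constant depends on the base, not on x.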
There’s another amazing fact we’ll prove in 12th grade:
\[ e^{i\pi} + 1 = 0 \]
This is known as Euler’s Identity, after Leonhard Euler (1707–1783). It is wondrous and beautiful.
It gives, in a single equation, a relationship between the five most important numbers in math—e, i, π,
0, and 1—the three most important operations—addition, multiplication, exponentiation—and the most
important relationship—equality.
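We won’t prove this until later, but a computer can at least check it numerically. A minimal sketch, assuming Python’s standard cmath module for complex numbers (the variable name is just for illustration):

```python
import cmath
import math

# e^(i*pi) + 1, computed with floating-point complex numbers.
value = cmath.exp(1j * math.pi) + 1

print(value)       # not exactly 0: a tiny leftover from rounding pi
print(abs(value))  # the size of that leftover, which is extremely small
```

The answer isn’t exactly zero only because the computer is using a decimal approximation of π, which is one more reason to prefer the actual number to its dirty photocopy.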
Anyway, let’s talk more about how logs behave. We’ve talked a lot about how we can compute
logarithms of specific numbers. But let’s consider how logarithms behave in a more general sense. Here
are a bunch of cool properties that we’ll prove (and that you can use):
Our Logarithm Theorems/Properties

Theorem A: \(\log_a(a^x) = x\)
Theorem B: \(a^{\log_a(x)} = x\)
Theorem C: \(\log_a(xy) = \log_a(x) + \log_a(y)\)
Theorem D: \(\log_a(x^k) = k \log_a(x)\)
Theorem E: \(\log_a\left(\frac{x}{y}\right) = \log_a(x) - \log_a(y)\)
Theorem F: \(\log_a(1) = 0\)
Theorem G: \(\log_a(x) = \frac{\log_b(x)}{\log_b(a)}\)

Theorems A & B: \(\log_a(a^x) = x\) and \(a^{\log_a(x)} = x\)

Proof: We already proved these, right after we gave the definition of the logarithm (as an inverse function).
But let’s demonstrate the proof again.
Imagine we have some exponential function, \(f(x) = a^x\). Then, by definition, the inverse of \(f(x)\) is the
logarithm base a (“log base a” is just another name for “the inverse of \(a^x\)”). So:
\[ f(x) = a^x \qquad f^{\text{inv}}(x) = \log_a(x) \]
But we know that, if we have any two functions that are inverses of each other, they will cancel out upon
composition:
\[ f^{\text{inv}}(f(x)) = x \quad \text{and} \quad f(f^{\text{inv}}(x)) = x \]
So then, for these specific two inverses, we will have:
\[ \log_a(a^x) = x \quad \text{and} \quad a^{\log_a(x)} = x \]
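As a sanity check (not a proof!), here is a short Python sketch of the two compositions. The names f and f_inv mirror the notation above; the choice of base 5 and the test values are arbitrary assumptions of mine.

```python
import math

def f(x, a):
    """The exponential function base a."""
    return a ** x

def f_inv(x, a):
    """Its inverse: the logarithm base a (math.log takes the base second)."""
    return math.log(x, a)

a = 5
for x in (0.5, 2.0, 7.0):
    # Both compositions should hand x back, up to rounding.
    print(x, f_inv(f(x, a), a), f(f_inv(x, a), a))
```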
3 Incidentally: what’s the length of the hypotenuse of a right triangle whose other sides are 1?
Theorem C: \(\log_a(xy) = \log_a(x) + \log_a(y)\)
This is one of the coolest properties of logs: that we can split them up along multiplication! Or,
put differently, that logarithms can turn multiplication into addition. In fact, we could (if we wanted to)
define logarithms only as a function that has this property (rather than define them as the inverse of
an exponential). That’s how they were created: 17th-century mathematicians wanted an easier way to
multiply numbers together, and tried to find a function that could turn a (potentially difficult) multiplication
problem into an (easier) addition problem.
Proof: The proof is kind of cool. First, we’ll use logs to find a different way to write x and y, and
then multiply them together to find a different way to write xy. And then we’ll hit that up with a log
again, simplify, and poof! we’ll get our answer.
I know (from Thm B) that I can rewrite x as: \(x = a^{\log_a(x)}\)
Likewise, I can rewrite y as: \(y = a^{\log_a(y)}\)
If I put them together, xy must be equal to this: \(xy = a^{\log_a(x)} \, a^{\log_a(y)}\)
but by properties of exponents, this must be just: \(xy = a^{\log_a(x) + \log_a(y)}\)
and if I take \(\log_a\) of both sides: \(\log_a(xy) = \log_a\left(a^{\log_a(x) + \log_a(y)}\right)\)
on right side, the \(\log_a\) and a cancel, b/c inverses: \(\log_a(xy) = \log_a(x) + \log_a(y)\)
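Here is the "multiplication becomes addition" idea as a tiny Python sketch. It is only an illustration under my own assumptions: the function name multiply_with_logs is made up, and base 10 is an arbitrary choice (any base would do).

```python
import math

def multiply_with_logs(x, y, a=10):
    """Multiply two positive numbers by adding their base-a logs."""
    log_sum = math.log(x, a) + math.log(y, a)   # right-hand side of Theorem C
    return a ** log_sum                         # undo the log (Theorem B)

print(multiply_with_logs(123.0, 456.0))  # very close to 56088
print(123.0 * 456.0)                     # the ordinary product, for comparison
```

The only multiplication-like step left is the final exponentiation, which in the old days came from looking up a table.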
Theorem D: \(\log_a(x^k) = k \log_a(x)\)

This theorem is really useful and really cool. It tells us that if we want to get something out of an
exponent (which is our whole reason for coming up with logs in the first place), we can take any logarithm;
we don’t need to get the base right. This is a very, very useful property. It tells us, for instance, that
\(\ln(7^5) = 5 \ln(7)\).
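Proof:
\begin{align*}
\log_a(x^k) &= \log_a(\underbrace{x \cdot x \cdots x}_{k \text{ times}}) && \text{(by definition of exponentiation)} \\
&= \underbrace{\log_a(x) + \log_a(x) + \cdots + \log_a(x)}_{k \text{ times}} && \text{(by Thm C: I can split logs up along multiplication)} \\
&= k \cdot \log_a(x) && \text{(multiplication)} \\
&= k \log_a(x)
\end{align*}
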
Theorem E: \(\log_a\left(\frac{x}{y}\right) = \log_a(x) - \log_a(y)\)
This proof is a slightly different version of Theorem C. In Theorem C, we showed that logarithms turn
multiplication into addition; here, we show that logarithms turn division into subtraction. (Which should
make sense.) Plus, I think the proof is cool, because rather than going all the way back to first principles,
we can just use Theorem C (in combination with Theorem D) to prove it.
Proof:

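\begin{align*}
\log_a\left(\frac{x}{y}\right) &= \log_a(x \cdot y^{-1}) && \text{(properties of exponents)} \\
&= \log_a(x) + \log_a(y^{-1}) && \text{(by Thm C)} \\
&= \log_a(x) + (-1) \log_a(y) && \text{(by Thm D)} \\
&= \log_a(x) - \log_a(y)
\end{align*}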
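A quick numerical spot-check of Theorems D and E, sketched in Python. The particular values of a, x, y, and k are arbitrary assumptions; any positive x and y and any sensible base should behave the same way.

```python
import math

a, x, y, k = 3.0, 20.0, 8.0, 5

# Theorem D: an exponent inside the log becomes a multiplier outside.
lhs_D = math.log(x ** k, a)
rhs_D = k * math.log(x, a)

# Theorem E: division inside the log becomes subtraction outside.
lhs_E = math.log(x / y, a)
rhs_E = math.log(x, a) - math.log(y, a)

print(lhs_D, rhs_D)  # the two numbers on each line should match
print(lhs_E, rhs_E)
```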
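Theorem F: \(\log_a(1) = 0\)
Proof: We know, as a property of exponents, that any number raised to the 0 is 1: \(a^0 = 1\). So the exponent
we need to put on a in order to get 1 is 0, which is exactly what \(\log_a(1) = 0\) says. Boom.
Theorem G: \(\log_a(x) = \frac{\log_b(x)}{\log_b(a)}\)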
This is the change-of-base formula! With this formula, we’ll finally be able to use our calculators (ew
ew ew) and evaluate logarithms to weird bases—we’ll be able to compute log5 (7), and we’ll be able to
compute when we can finally go to Alaska!
Proof: So imagine we want to rewrite loga (x) using a differently-based logarithm (say in terms of a
base-b logarithm). For starters, I hope you agree that, due to Theorem B, I can rewrite x as:
\[ x = a^{\log_a(x)} \]
This may seem obvious and pointless, but be patient. Now imagine that we take the \(\log_b\) of both sides.
Then we have:
\[ \log_b(x) = \log_b\left(a^{\log_a(x)}\right) \]
But because of Theorem D (I told you it was cool), we can just move that exponent inside the log out of
it:
\[ \log_b(x) = \log_a(x) \cdot \log_b(a) \]
But if we just rearrange this, we get:
\[ \log_a(x) = \frac{\log_b(x)}{\log_b(a)} \]
So now we can finally compute log5 (7), as well as figure out when we’ll go to Alaska! For the former...
\[ \log_5(7) = \frac{\ln(7)}{\ln(5)} \approx 1.20906\ldots \]
So then \(5^{1.20906\ldots} = 7\)!!! As for Alaska, we needed to solve the following equation for t:
\begin{align*}
\$9{,}000 &= \$3{,}000 \cdot (1.15)^t \\
\frac{\$9{,}000}{\$3{,}000} &= 1.15^t \\
3 &= 1.15^t \\
\ln(3) &= \ln(1.15^t) \\
\ln(3) &= t \ln(1.15) \\
t &= \frac{\ln(3)}{\ln(1.15)} \\
t &\approx 7.8606\ldots
\end{align*}
So we can go to Alaska in 7.86 years!!! Note that (secret of secrets) we didn’t even need to use the change-
of-base formula—we could have taken a log1.15 , yes, and then used the change-of-base formula, but since
we just wanted to get the t out of the exponent, any logarithm would do, so we chose the easiest one (the
natural log).
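Here is the Alaska calculation as a short Python sketch, done both ways: once with the natural log and once with a base-1.15 log. The variable names are my own; the numbers ($3,000 growing at 15% per year toward $9,000) come from the equation we just solved.

```python
import math

start, goal, growth = 3000.0, 9000.0, 1.15

# Any logarithm gets t out of the exponent; here are two choices.
t_natural = math.log(goal / start) / math.log(growth)   # ln(3) / ln(1.15)
t_base_115 = math.log(goal / start, growth)             # log base 1.15 of 3

print(t_natural)                      # roughly 7.86 years
print(t_base_115)                     # same answer, different log
print(start * growth ** t_natural)    # plugging t back in gives about 9000
```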
Problems
Evaluate the following logarithms (without a calculator!):
1. \(\log_2(2)\)                     17. \(\log_{125}(1)\)           33. \(\log_{12}(1/144)\)
2. \(\log_5(5)\)                     18. \(\log_4(1)\)               34. \(\log_2(1/4{,}096)\)
3. \(\log_2(2^3)\)                   19. \(\log_7(1)\)               35. \(\log_5(1/125)\)
4. \(\log_7(7^9)\)                   20. \(\log_{23,344.1}(1)\)      36. \(\log_{1/2}(1/2)\)
5. \(\log_{11}(11^{45})\)            21. \(\log_\pi(1)\)             37. \(\log_{1/6}(1/6)\)
6. \(\log_{15}(15)\)                 22. \(\ln(1)\)                  38. \(\log_{1/7}(1/7)\)
7. \(\ln(e)\)                        23. \(\log_{246.7}(1)\)         39. \(\log_{1/9}(9)\)
8. \(\ln(e^3)\)                      24. \(\log_{27.9}(27.9)\)       40. \(\log_{1/5}(5)\)
9. \(\log_2(2^{10})\)                25. \(\log_{46}(46)\)           41. \(\log_{1/3}(3)\)
10. \(\log_4(4^3)\)                  26. \(\log_3(3)\)               42. \(\log_{1/5}(125)\)
11. \(\log_5(25)\)                   27. \(\log_5(1/5)\)             43. \(\log_{1/9}(81)\)
12. \(\log_3(81)\)                   28. \(\log_7(1/7)\)             44. \(\log_{1/4}(16)\)
13. \(\log_7(343)\)                  29. \(\log_4(1/4)\)             45. \(\log_{1/9}(3)\)
14. \(\log_2(512)\)                  30. \(\log_6(1/36)\)            46. \(\log_{1/16}(2)\)
15. \(\log_4(64)\)                   31. \(\log_9(1/81)\)            47. \(\ln(1/e)\)
16. \(\log_{10}(10{,}000{,}000)\)    32. \(\log_{11}(1/121)\)
Estimate the following logarithms WITHOUT A CALCULATOR!!! Then use the change-of-base formula
to estimate them with your calculator.
48. log10 (947) 52. log7 (45) 56. ln(10e)

49. log2 (65) 53. ln(3) 57. log10 (15)
50. log2 (131) 54. ln(2.5e) 58. log10 (7500)
51. log5 (33.1) 55. ln(10) 59. log3 (3)
Simplify the following logarithms/logarithmic expressions:

60. \(\log_a(a^b / a^c)\)                      78. \(\ln(12x^3 + 9x^2 + 8x + 6)\)
61. \(\log_a(a^b a^c)\)                        79. \(\log_a(3x^2 + 5x + 11)\)
62. \(\log_x(x^5)\)                            80. \(2 \ln(\sqrt{e})\)
63. \(\log_a(a^b + a^b)\)                      81. \(\ln(\ln(e^e))\)
64. \(\log_k(5^{2000})\)                       82. \(\ln(e^{2 \ln(x)})\)
65. \(\log_a(a + 4a^2)\)                       83. \(\ln(\sin\theta) - \ln(\frac{\sin\theta}{5})\)
66. \(\log_a(a^b k^b)\)                        84. \(\ln(3x^2 - 9x) + \ln(\frac{1}{3x})\)
67. \(\log_a(a^b \sqrt{a})\)                   85. \(\frac{1}{2} \ln(4t^4) - \ln(2)\)
68. \(\ln(2k^5)\)                              86. \(\ln(\frac{1}{\cos\theta}) + \ln(\cos\theta)\)
69. \(\log_a(\frac{xy^2}{4x^3})\)              87. \(\ln(8x + 4) - 2 \ln(2)\)
70. \(\log_a(a^b b^5)\)                        88. \(3 \ln(\sqrt[3]{t^2 - 1}) - \ln(t + 1)\)
71. \(\log_a(a^b / b^5)\)                      89. \(e^{\ln(7.2)}\)
72. \(\log_a(a / 50)\)                         90. \(e^{-\ln(x^2)}\)
73. \(\log_a(10^5 \, 5^3)\)                    91. \(e^{\ln(x) - \ln(y)}\)
74. \(\ln(25^2 \, 9^5)\)                       92. \(e^{\ln(x^2 + y^2)}\)
75. \(\log_a(2^2 3^3 5^2 7^2 11^5)\)           93. \(e^{-\ln(.3)}\)
76. \(\ln(5^{10} \cdot 3^{10} \cdot 2^{10} \cdot 7^{10} \cdot 100 \cdot 13^{10})\)    94. \(e^{\ln(\pi x) - \ln(2)}\)
77. \(\log_a(x^2 - 2x - 3)\)
Solve the following equations for y or t:
95. \(\ln(y - 40) = 5t\)                                104. \(e^{y/1000} = a\)
96. \(\ln(y - 1) - \ln(2) = x + \ln(x)\)                105. \(e^{y \ln(.8)} = .8\)
97. \(\ln(y) = -t + 5\)                                 106. \(e^{-.3t} = 27\)
98. \(\ln(1 - 2y) = t\)                                 107. \(e^{ky} = 1/2\)
99. \(\ln(y^2 - 1) - \ln(y + 1) = \ln(\sin x)\)         108. \(e^{t \ln(.2)} = .4\)
100. \(e^{2y} = 4\)                                     109. \(e^{ky} = 1/10\)
101. \(100e^{10y} - 200 = 0\)                           110. \(e^{t \ln(2)} - 1/2 = 0\)
102. \(e^{5y} = 1/4\)                                   111. \(\ln(t) = 2t + 4\)
103. \(80e^y = 1\)
Here are some word problems involving exponential growth and decay. Do them.
112. After doing some sort of bacterial experiment, you place your agar-coated petri dish underneath a
UV lamp to sterilize it so that you can reuse it. You leave it under the UV lamp for ten minutes, and
the high-energy light kills all the bacteria... except for one. The lone remaining bacterium, fed by
the plentiful supply of agar, begins to grow and reproduce. It takes about an hour for it to undergo
a complete cell cycle, so an hour later, there are two bacteria. How many bacteria are there after
two hours? three hours? four hours? what about after t hours? what about after m minutes? how
many bacteria will there be after a day? when will there be ten million bacteria on your petri dish?
113. Russia has a problem. A problem with overpopulation. Or rather, a problem with underpopulation.
Population growth in Russia is (and has been for the past several decades) below replacement levels,
meaning that the population has actually been decreasing. (Somewhat hilariously, this is due not
only to a declining birthrate, but also to a decreasing life expectancy, thanks to the increasingly
third-world-quality of Russia’s healthcare system (not to mention its political system).) In 2009,
Russia’s population was about 142 million, and was declining at a rate of 0.177% per year. If this
is true, what should Russia’s population be this year (2010)? next year? in 2011? what about in
2020? what about in the centenary of Putin’s birth in 2052? when will the Russian population drop
below 100 million? according to the model, when will it drop below 75 million? If you were a Russian
policymaker, what would you do to alleviate this problem?
114. In 1985, the only freeways that existed in the greater Salt River Valley area4 were I-10 and I-17,
totaling about 85 miles in length. In 1985, Arizona voters passed Proposition 300, enacting a half-
cent sales tax to fund highway construction. This resulted in the current car-encouraging sprawl
of superhighways. In 2005, the total freeway network in Maricopa County was about 185 miles in
length5 . Assuming you have no other knowledge about Ph****x’s growth patterns, Arizona politics,
American demographics, etc., and can only assume that the Maricopa highway network will continue
to grow at the same rate, how many miles of freeway should there be this year (2010)? What about
in 2015? What about in 2020? What about in year t? When will the highway network in the Salt
River Valley reach 1,000 miles in length?
4 I refuse to call it the “Valley of the Sun,” which emphasizes the superficial desires of local immigrants for “good weather”
and ignores a perfectly competent, geographically-accurate, existing name.
5 These data come from some Google Maps calculations, as well as AZDOT’s 2005 Annual Report on Proposition 400,
http://www.azdot.gov/Highways/valley freeways/US60/Superstition/PDF/ANNUALREPORT89292.pdf
115. After graduating from high school, you gradually begin to lose touch with friends, such that every
year, you only have half as many friends from high school as you did the year before. At the instant
of your graduation, you had 22 friends from high school. How many do you have the summer after
your first year of college? the summer after that? Come up with a function for the number of high
school friends you have as a function of the years since your graduation. (Do you think that creating
said functions might be responsible for the decline in your number of friends?) When will you only
have four friends left from high school? When will you have no friends left from high school?
116. One of the (viscerally) coolest physics labs I did at the University of Chicago (cooler even than my
experiment with polymers that required commandeering my dorm’s industrial-size freezer) involved
creating a radioactive isotope and measuring its half life. In the basement of the undergraduate
physics building in a lead-lined room is a plutonium-beryllium “neutron howitzer”. The howitzer’s
radioactive ingredients spit out a steady stream of neutrons. If you were to stand next to it for too
long, your atoms would absorb some of the neutrons and become radioactive, too6 .
So my friend Alex and I went down to the basement with a small piece of silver, and stuck it into
the neutron howitzer. Silver, as it turns out, is an excellent material for this sort of experiment.
In its natural state, it is a mixture of two stable isotopes, Ag107 and Ag109 . Adding a neutron to
each produces radioactive isotopes, Ag108 and Ag110 . Each of these isotopes will eventually decay by
spitting the neutron back out.
Before we go any further, let’s talk about half-life. Radioactive decay is fundamentally quantum-
mechanical. If you have something like Ag108 ... well, anyway, I just realized that this problem is
going to be really hard, because these two isotopes have different half-lives, and so the tricky thing in
the experiment was isolating from the data the decay from each of the isotopes, etc., and measuring
the half-life of each of them... I guess it would be helpful if I had the actual lab report in front of
me. But it’s back in Ithaca. I’m just making this up as I remember it. Hm. Well, anyway, the point
is that it was a fun way to end a year of general physics.
117. Between 2003 and 2007, the University of Chicago’s endowment increased from $3.22 billion dollars
to $6.2 billion dollars—an average increase of roughly 18% per year7 . Assuming that the U of C’s
endowment increases exponentially, as financial instruments seem wont to do in first-order approxi-
mations, come up with a function for the value of the U of C’s endowment as a function of time. (It
might help to set 2003 as “year zero,” though if you set the year 0 C.E. as “year zero” in the model,
it’s just a simple horizontal shift away.) Based on your model, what was the endowment worth in
2006? When did the endowment hit $5 billion? What should the endowment have been worth in
2009, according to your model? The actual value of the endowment in 2009 was $5.1 billion. Why
does your model disagree?
118. You have a baby. You want the baby to go to college. (You hope the baby wants to go to college,
too.) But college is expensive, so as soon as the baby is born, you start putting aside money into an
educational investment account (a 529 plan8 ). You’re not sure how much money to put in, so you
do some calculations.
(a) Imagine you deposit k dollars into this account every year, and the account grows at a nice,
predictable rate of p percent every year. (So after the first year, the account is worth k(1 + p),
with p written as a decimal.)
How much money will there be in 18 years, when your kid is headed off to Cambridge? Give your
answer in terms of k and p. If you assume that it grows at a reasonable rate of 5% every year9 ,
what will a $1,000 annual contribution become in 18 years? A $5,000 annual contribution?
$10,000? The current cost of attending MIT for four years is about $200,000. How much
money will you need to deposit every year in order to have that much in 18 years?
(b) Realistically, though, the cost of attending MIT will increase in the next 18 years. Assume that
the total four-year cost of attending school X is currently c dollars, and that it increases at a
steady rate of q percent each year. How much will it cost to attend school X in 18 years? Give
your answer in terms of c and q. What will the cost of attending MIT be in 18 years?
(c) Let’s combine these two ideas. Your kid wants to attend school X in 18 years, and you deposit
k dollars annually into an account that grows at p percent per year. How much money should
you deposit per year? Give your answer in terms of p, c, and q. Now imagine that school X
is MIT, and that your kid’s 529 plan returns 5% per year. How much should you deposit per
year?
(d) Of course, realistically, your income is probably going to increase over the course of the 18 years
this smelly little thing is wandering around your house, so we could continue this and say, well,
what if our contribution to the 529 plan increases by r percent each year... but that would be
creating a very complicated model all teetering on top of this assumption of nice, predictable,
exponential growth. And economies and incomes don’t actually work like that. Also, MIT gives
out lots of financial aid.
6 Well, there are two options: either your atoms could absorb a neutron and be boosted into a heavier-but-still-stable isotope,
or they could absorb a neutron and be boosted into a heavier-but-unstable isotope, which will eventually radioactively decay
by spitting the extra neutron back out.
7 2009 University of Chicago Annual Report, http://www.uchicago.edu/annualreport/financials/endowment.shtml
8 See section 529 of the Internal Revenue Code, 26 U.S.C. §529
9 Not historically unreasonable, but then again, “past performance is not a predictor of future returns.”
Radical Angle Measurement
How do we measure angles? We tend to measure them in degrees:
But why degrees? In a complete revolution of a circle, we have 360 degrees. Why 360? Isn’t that kind
of an awkward number? Why not define a “degree” such that all the degrees in a circle add up to 100
degrees? or 10 degrees?
Or maybe we could measure angles as a percent. We could say that the measure of the angle of a
full circle is 100% (or 1), and then a 90◦ angle would be the same as a 25% (0.25) angle, a 135◦ angle
would be equivalent to a 37.5% (0.375) angle, and so forth. This would be a natural way of measuring
an angle—rather than being based on the arbitrariness of the number “360”, it would use the much more
natural choice of the number 1.
Another way to measure an angle—the way we’ll use—is this. Today is Monday, and so it’s pizza day
at Veritas. In fact, it’s the first Monday of the month, and so it’s not only pizza day for students; it’s free
pizza day for faculty, too. And so imagine you’re me, and you go down to the faculty office after this class
to get your free pizza. And you have a strong predisposition for geometric symmetry, and so you want the
crust of your piece of pizza to be the same length as the two cheesy sides, like so:
(Note that you measure the crust length as its outer length.) So you want, I guess, an equilateral piece
of pizza. Sort of like an equilateral triangle, I suppose, except not a triangle per se, because the crust-side
is curvy. But the same idea. So the question is: what angle should you cut this slice of pizza at, such that
its crust length is the same as the length of the other two sides? (By angle I mean the angle at the center
of the pizza, the one whose opposite side is the crust.)
Let’s think about this. Imagine this is a pizza with an 18-inch diameter. Then it has a radius of 9
inches, and then we know the total length of its crust (measured on the outside) is 2π · 9 = 18π. We also
know—and we’ll slip back into degrees for a moment here, but I assure you it’s in the service of a greater
good—that the total number of degrees in this pizza is 360◦ . So we can think of this like a proportion:
\[ \frac{\text{total crust length}}{\text{total degrees in pizza}} = \frac{\text{crust length of our slice}}{\text{degrees in our slice}} \]
Now, we know that, in our beautifully-equilateral slice, we want the length of the crust to be equal to the
length of the other two sides. But because the pizza slice is just a slice of a circle, the other two sides must
be both 9 inches long:
So if we plug in all that stuff, our proportion looks like this:
\[ \frac{18\pi \text{ inches}}{360^\circ} = \frac{9 \text{ inches}}{\text{degrees in our slice}} \]
Then it is easy to solve this and find out how many degrees there are in our equilateral slice of pizza:
\begin{align*}
\text{degrees in our slice} \cdot \frac{18\pi \text{ inches}}{360^\circ} &= 9 \text{ inches} \\
\text{degrees in our slice} &= \frac{9 \text{ inches} \cdot 360^\circ}{18\pi \text{ inches}} \\
\text{degrees in our slice} &\approx 57.3^\circ
\end{align*}
So then if we want a beautiful, equilateral slice of pizza, in which all the sides are the same length, we
should cut it at an angle of 57.3°!!!1 Note something interesting here: even though we started with a
pizza of radius 9 inches, that all got cancelled out in our calculations. For any size pizza, an angle of 57.3°
will give us an equilateral slice2 .
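To watch the radius cancel out, here is a little Python sketch of the same proportion. The function name is invented for illustration, and the radii (in inches) are arbitrary.

```python
import math

def equilateral_slice_degrees(radius):
    """Angle (in degrees) whose crust length equals the pizza's radius."""
    total_crust = 2 * math.pi * radius
    # Same proportion as above: total_crust / 360 = radius / angle
    return radius * 360 / total_crust

for radius in (9, 6, 12.5):
    # Every pizza size gives the same angle: 180/pi, about 57.3 degrees.
    print(radius, equilateral_slice_degrees(radius))
```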
There is a point to all of this. Namely: this is how we will define the system of angle-measurement that
we will use. We will create a measurement system in which the base unit is not 360◦ in a full revolution, or
100% in a full revolution, but rather, in which the base unit of measurement represents the angle needed
to make the opposite arc of a circle equal to the length of its radii. Informally: we will define a radian as
the angle needed to make all three sides of a slice of pizza the same length. Somewhat more
formally, I could define it with the following picture:
By the way, note that the fact that a radian is approximately 57.3° should make sense. We know that
an equilateral triangle has angles of 60°, and an equilateral pizza slice isn’t quite the same thing (because
of the curvy side), but it is pretty close. So it makes sense that its angle should be close to 60°.
If we want to convert from degrees to radians (you would, of course, never want to convert out of
radians!), we can simply derive a unit conversion factor. We know that the total number of radians in a full
circle must be 2π, since that’s how many times we can fit the radius of a circle around its circumference.
(We would have a pizza whose crust-length was 2π times that of its slice-radius.) Likewise, we know that
the total number of degrees in a full circle is 360. So we must have 360◦ for every 2π radians.
1 Or rather, we should tell Mr. Fink to cut it for us with that angle.
2 It wouldn’t be too hard to prove this: just repeat all of the calculations we did, but using a pizza of radius r instead of
radius 9.
So, for example, if we want to convert 75° to radians:
\[ \frac{75 \text{ degrees}}{1} \cdot \frac{2\pi \text{ radians}}{360 \text{ degrees}} = \frac{150\pi}{360} \text{ radians} = \frac{5\pi}{12} \text{ radians} \]
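The same unit conversion, sketched in Python. The function name degrees_to_radians is my own; math.radians is Python’s built-in version, shown only for comparison.

```python
import math

def degrees_to_radians(deg):
    """Multiply by the conversion factor (2*pi radians) / (360 degrees)."""
    return deg * (2 * math.pi) / 360

print(degrees_to_radians(75))   # 5*pi/12 as a decimal, about 1.309
print(5 * math.pi / 12)         # the exact answer, written out
print(math.radians(75))         # the built-in conversion agrees
```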
Problems
Rewrite the following angles in radians (or degrees, as appropriate):
1. 0         10. 13π/6    19. 7π/3           28. 60°     37. 12°
2. π         11. π/4      20. π/2            29. 30°     38. 5.34°
3. 2π        12. 3π/4     21. 3π/2           30. 90°     39. 7°
4. 3π        13. 7π/4     22. 5π/2           31. 135°    40. 180°
5. 4π        14. 9π/4     23. 74,452π        32. 110°    41. 360°
6. π/6       15. π/3      24. 8,000,000π     33. 150°    42. 365°
7. 5π/6      16. 2π/3     25. kπ             34. 170°    43. (5 million)°
8. 7π/6      17. 4π/3     26. aπ/b           35. 179°    44. (your favorite number)°
9. 11π/6     18. 5π/3     27. 45°            36. 225°
The Unit Circle Definition of Trig Functions
Thanks to your previous experiences in math and physics, you already have some basic knowledge of
trigonometric functions—sine, cosine, tangent, and the like. You’ve probably defined them in terms of a
right triangle: for example, if you have some right triangle, and some angle θ in that triangle, then the
sine of θ is the ratio of the length of the opposite side to the length of the hypotenuse.
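\[ \sin(\theta) = \frac{\text{opp}}{\text{hyp}} \qquad \cos(\theta) = \frac{\text{adj}}{\text{hyp}} \qquad \tan(\theta) = \frac{\text{opp}}{\text{adj}} \]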
These are useful, because they are the Rosetta Stone that allows us to go from talking about angles
to talking about distances (from circular rotation to linear motion). In math and physics, you’ve probably
used calculators to find sines and cosines most of the time, but you’ve probably seen that there are some
special angles, like π/6 = 30◦ , that you can find the sine/cosine/tangent of exactly, and not as a decimal
approximation. You can do this using these two special right triangles:
For example, you can exactly evaluate sin(π/3) by looking at the special right triangle with a π/3
angle, seeing that the side opposite to that angle has length √3, and the hypotenuse of that triangle has
length 2. So then
\[ \sin\left(\frac{\pi}{3}\right) = \frac{\sqrt{3}}{2} \]
And it is much nicer to write √3/2 than it is to write 0.866025404, which is what your calculator would
tell you.
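If you want to confirm that the exact answer and the calculator’s answer really are the same number, here is a two-line check in Python (the variable names are just for illustration):

```python
import math

exact = math.sqrt(3) / 2                  # from the special right triangle
from_calculator = math.sin(math.pi / 3)   # what the sine button reports

print(exact)            # 0.8660254...
print(from_calculator)  # matches the line above, up to rounding
```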
Now, using this method, we still might not be able to calculate every trig function nicely. For
example, what’s sin(7π/34)? We don’t know, because we don’t know any special right triangles with a
7π/34 angle in them. Like with logarithms, we are restricted to only evaluating some trig functions exactly
and beautifully. The others we can use a calculator for. Your calculator, of course, uses formulas that fill in
the gaps to estimate trig functions/logs as decimal approximations, but those formulas aren’t particularly
interesting.
Our real issue now is this one: our definition of these trig functions is based on triangles. Specifically,
on the angles in, and the ratio of the sides of, a right triangle. But the angles in a right triangle have to
be between 0 and 90◦ . What if we want to find the sine of some angle greater than 90◦ ? We can’t make
a right triangle that has an angle greater than 90◦ , so we can’t find the sine/cosine/tangent of any angle
greater than 90◦ . What if we want to find the sine/cosine/tangent of a negative angle? We certainly can’t
make a right triangle with a negative angle. (Do negative angles even exist? is it even reasonable to talk
about such things?) This is the downside to our current definition of trig functions. It restricts us to only
finding trig functions of angles between 0 and 90 degrees.
What if we wanted to extend our definition so that we could find the sine/cosine/tangent of angles
less than 0◦ , and greater than 90◦ ?
Just as we extended our definition of exponents, we can extend our definition of trig functions. What if
we want to take the sine of −π/3? Or the cosine of 5π/4? We can’t draw a triangle that has a negative
angle (that’d be like drawing a negative distance!), and we can’t draw a right triangle that has an angle
of more than 90 degrees in it.
What we need is some sort of coordinate system, so that having a “negative angle” makes sense, in the
same way that we can have a “negative distance”: just go in the opposite direction. So put a triangle on
the coordinate system! If we do this, then one thing we notice is that I can draw a whole bunch of right
triangles, but I only care about similar ones, so let’s keep the hypotenuse the same. And why not choose 1?
Thus: the unit circle.
Examples to work out:
EG find sin(π/3)
EG find cos(5π/6)
EG find sin(−π/4)
Clearly, this matches up with our old definition of trig functions. In our old definition, sine was
opp/hyp. In this case, sin(θ) = opp/1 = opp, which is the same as the y-coordinate of that point. Likewise
with cos(θ), which is adj/1 = adj, the x-coordinate.
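Here is a Python sketch of the idea that sine and cosine are just the coordinates of a point on the unit circle. Everything in it is my own illustration: the function name is made up, and the three points are the ones I read off the unit circle (using the special right triangles) for the three example angles above.

```python
import math

def trig_from_point(x, y):
    """Given a point (x, y) other than the origin, return (sin, cos) of its angle.

    hyp is the distance from the origin. On the unit circle hyp is 1,
    so sine is just the y-coordinate and cosine is just the x-coordinate.
    """
    hyp = math.hypot(x, y)
    return y / hyp, x / hyp

# Points on the unit circle for the three example angles:
examples = {
    "pi/3":  (1 / 2, math.sqrt(3) / 2),
    "5pi/6": (-math.sqrt(3) / 2, 1 / 2),
    "-pi/4": (1 / math.sqrt(2), -1 / math.sqrt(2)),
}

for name, (x, y) in examples.items():
    sine, cosine = trig_from_point(x, y)
    print(name, "sin =", round(sine, 4), "cos =", round(cosine, 4))
```

Because the function divides by the hypotenuse, it gives the same answers for a bigger similar triangle, such as the point (3, 3√3), which is why nothing is lost by choosing a hypotenuse of 1.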
Problems
Using what you know about trigonometry, special right triangles, and the unit circle, evaluate the
following trig functions without a calculator:
1. sin(0)        22. sin(−3π)       43. sin(π/4)     64. sin(4π/3)
2. cos(0)        23. cos(−3π)       44. cos(π/4)     65. cos(4π/3)
3. tan(0)        24. tan(−3π)       45. tan(π/4)     66. tan(4π/3)
4. sin(π)        25. sin(−4π)       46. sin(3π/4)    67. sin(5π/3)
5. cos(π)        26. cos(−4π)       47. cos(3π/4)    68. cos(5π/3)
6. tan(π)        27. tan(−4π)       48. tan(3π/4)    69. tan(5π/3)
7. sin(2π)       28. sin(π/6)       49. sin(7π/4)    70. sin(7π/3)
8. cos(2π)       29. cos(π/6)       50. sin(5π/4)    71. cos(7π/3)
9. tan(2π)       30. tan(π/6)       51. cos(5π/4)    72. tan(7π/3)
10. sin(3π)      31. sin(5π/6)      52. tan(5π/4)    73. sin(π/2)
11. cos(3π)      32. cos(5π/6)      53. cos(7π/4)    74. cos(π/2)
12. tan(3π)      33. tan(5π/6)      54. tan(7π/4)    75. tan(π/2)
13. sin(4π)      34. sin(7π/6)      55. sin(9π/4)    76. sin(3π/2)
14. cos(4π)      35. cos(7π/6)      56. cos(9π/4)    77. cos(3π/2)
15. tan(4π)      36. tan(7π/6)      57. tan(9π/4)    78. tan(3π/2)
16. sin(−π)      37. sin(11π/6)     58. sin(π/3)     79. sin(5π/2)
17. cos(−π)      38. cos(11π/6)     59. cos(π/3)     80. cos(5π/2)
18. tan(−π)      39. tan(11π/6)     60. tan(π/3)     81. tan(5π/2)
19. sin(−2π)     40. sin(13π/6)     61. sin(2π/3)
20. cos(−2π)     41. cos(13π/6)     62. cos(2π/3)
21. tan(−2π)     42. tan(13π/6)     63. tan(2π/3)

A Helpful Mnemonic
One of the things you have probably noticed while working with our unit-circle definition of trig
functions is that the trig functions of angles greater than π/2 or less than 0 really aren’t all that different
from the trig functions of angles between 0 and π/2. The only difference is that sometimes they have a
negative sign in front of them. For example, cos(5π/4) is −1/√2, which, except for that negative sign, is
the same as cos(π/4) (which is +1/√2).
So—in order to make our computation of trig functions go a bit faster (because really, who wants to
write out the entire unit circle every time?)—we can make the following helpful generalization, based on
our knowledge of the unit circle, and summarize it with a nice mnemonic:
My high school trig teacher, before he moved to Ithaca, had taught in New York City public schools,
and as a result, he had a slightly different version of this mnemonic—at his school, students took something
else.
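The sign pattern behind that generalization can be tabulated directly. This is a minimal Python sketch under stated assumptions: the helper `signs` and the one sample angle per quadrant are illustrative choices, not something from the notes.

```python
import math

def signs(theta):
    """Return the signs of (sin, cos, tan) at theta as '+' or '-' characters."""
    def sign(value):
        return "+" if value > 0 else "-"
    return sign(math.sin(theta)), sign(math.cos(theta)), sign(math.tan(theta))

# One sample angle strictly inside each quadrant (hypothetical choices).
samples = {
    "I": math.pi / 4,
    "II": 3 * math.pi / 4,
    "III": 5 * math.pi / 4,
    "IV": 7 * math.pi / 4,
}

for quadrant, theta in samples.items():
    s, c, t = signs(theta)
    print(f"Quadrant {quadrant}: sin {s}, cos {c}, tan {t}")
```

Only the signs change from quadrant to quadrant; the magnitudes for these four angles all match those at π/4.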
The Proof of the Pythagorean Theorem
Trig Identities
Periodicity Identities
What if we’re trying to find the sine of 9π/4? We know this will be exactly the same as sin(π/4).
Why? Because 9π/4 is just the same as π/4 + 2π, and the extra 2π is just another revolution around the
unit circle: it’s a different angle than π/4, but it lands you back in the same place that π/4 does. So the
sine is the same! (It’s 1/√2.)
This is, not surprisingly, a general law. Both sine and cosine repeat every time we go around the unit
circle. Put differently, every time we add 2π to an angle, the sine and cosine don’t change. And we could
add 2π as many times as we want—we could add 4π, 6π, 358π—or we could even go around the circle in
the opposite direction, and subtract 2π, or 4π, or 358π. More formally, the following equations (in which
k is any integer) are true:
sin(θ + 2kπ) = sin(θ)
cos(θ + 2kπ) = cos(θ)
What about tangent? It repeats even faster than sine and cosine. Since the only thing that changes
about sine and cosine every π units is their sign (s-i-g-n, i.e., whether it’s + or −), and since tangent is just
the quotient of sine and cosine, the signs will cancel out. And so tangent repeats every π units. Formally:
tan(θ + kπ) = tan(θ)
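A quick numerical spot check of the three periodicity identities, as a minimal Python sketch. The function name `periodicity_holds`, the test angle 0.7, and the tolerance are assumptions made for illustration; floating-point arithmetic only confirms the identities approximately.

```python
import math

def periodicity_holds(theta, k, tol=1e-9):
    """Check the three periodicity identities for one angle theta and one integer k."""
    sin_ok = math.isclose(math.sin(theta + 2 * k * math.pi), math.sin(theta), abs_tol=tol)
    cos_ok = math.isclose(math.cos(theta + 2 * k * math.pi), math.cos(theta), abs_tol=tol)
    tan_ok = math.isclose(math.tan(theta + k * math.pi), math.tan(theta), abs_tol=tol)
    return sin_ok and cos_ok and tan_ok

theta = 0.7  # arbitrary test angle in radians; any angle with cos(theta) != 0 works
results = {k: periodicity_holds(theta, k) for k in (-179, -2, -1, 0, 1, 2, 179)}
print(results)

# Adding only pi (half a revolution) flips the sign of sine, so sine does NOT repeat every pi.
print(math.sin(theta + math.pi), -math.sin(theta))
```

Note that tangent is checked with a shift of kπ while sine and cosine are checked with 2kπ, matching the equations above.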
Symmetry Identities
There are some other useful identities. What if I have some angle θ, and I find cos(θ), and then I also
want to find cos(−θ)? Is there any relationship between these two cosines? Think about what this will
look like on a unit circle:
Cosine, of course, is just the x-coordinate of that point on the unit circle. So if we consider −θ, that
point just becomes the vertical reflection of the corresponding point for +θ. Meaning that its x-coordinate
doesn’t change. Meaning that its cosine doesn’t change. And thus we have the following identity:
cos(−θ) = cos(θ)
By contrast, the y-coordinate of that point does change. It becomes negative of whatever it was before.
And we know that the y-coordinate of that point is (by definition) sin(θ). So we must have:
sin(−θ) = − sin(θ)
Here’s another graph that shows it a bit better
Do you notice anything familiar about these two equations? They look just like the definition of even
and odd functions! So cosine is an even function/a function that’s horizontally symmetric around the
y-axis (like x²), and sine is an odd function (like x³).
Tangent, meanwhile, will incorporate the negative:
\[ \tan(-\theta) = \frac{\sin(-\theta)}{\cos(-\theta)} = \frac{-\sin(\theta)}{\cos(\theta)} = -\tan(\theta) \]
and thus will be odd. This is just like the question on that test: the product/quotient of an even function
and an odd function is an odd function!
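The even/odd behavior can be spot-checked numerically. This is a minimal Python sketch; the helpers `is_even` and `is_odd` and the sample angles are hypothetical, and checking a handful of points illustrates the identities rather than proving them.

```python
import math

def is_even(f, xs, tol=1e-12):
    """True if f(-x) == f(x) (within tol) at every sample point."""
    return all(math.isclose(f(-x), f(x), abs_tol=tol) for x in xs)

def is_odd(f, xs, tol=1e-12):
    """True if f(-x) == -f(x) (within tol) at every sample point."""
    return all(math.isclose(f(-x), -f(x), abs_tol=tol) for x in xs)

angles = [0.1, 0.7, 1.0, 2.5, 4.0]  # arbitrary sample angles in radians

print("cos is even:", is_even(math.cos, angles))
print("sin is odd: ", is_odd(math.sin, angles))
print("tan is odd: ", is_odd(math.tan, angles))
print("x^2 is even:", is_even(lambda x: x**2, angles))
print("x^3 is odd: ", is_odd(lambda x: x**3, angles))
```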
Interchange Identities
Another thing you’ve probably noticed, just from knowing the graphs of sine and cosine, is that they’re
the same—they’ve just got a slight horizontal shift. Why is this? The quick explanation (going back to
the unit circle definition) is that circles have this nice radial symmetry, so that the x and y coordinates
are changing at the same rate–the only difference is that the x-coordinate (cosine) starts at 1, and the
y-coordinate (sine) starts at 0. Put differently, the x-coordinate is exactly π/2 radians ahead of where the
y-coordinate is. Or:
cos(θ) = sin(θ + π/2)
Stated in the opposite way:
sin(θ) = cos(θ − π/2)
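Both interchange identities can be verified at a few sample angles. A minimal Python sketch, assuming nothing beyond the standard `math` module; the function name `interchange_holds` and the sample angles are illustrative.

```python
import math

def interchange_holds(theta, tol=1e-12):
    """Check cos(theta) = sin(theta + pi/2) and sin(theta) = cos(theta - pi/2)."""
    first = math.isclose(math.cos(theta), math.sin(theta + math.pi / 2), abs_tol=tol)
    second = math.isclose(math.sin(theta), math.cos(theta - math.pi / 2), abs_tol=tol)
    return first and second

sample_angles = [-2.0, 0.0, 0.3, math.pi / 6, 1.0, 5.0]  # arbitrary angles in radians
for theta in sample_angles:
    print(f"theta={theta:.4f}: {interchange_holds(theta)}")
```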
The Pythagorean Identity

There’s another really cool relationship between trig functions. Namely: if I square the cosine of some
angle, and then square the sine of the same angle, and add them, I just get one:
(sin θ)² + (cos θ)² = 1
Why is this true? Imagine we have a right triangle, with legs of length a and b, hypotenuse c, and an angle
θ (opposite the leg a):
Then we know that

\[ \sin(\theta) = \frac{a}{c} \]
\[ \cos(\theta) = \frac{b}{c} \]
Which, if we rearrange, is just another way of saying:
a = c sin(θ)
b = c cos(θ)
So we might as well label the sides of our triangle as c sin(θ) and c cos(θ), since they’re just equal to the
lengths:
But this is a right triangle, and so we must have:
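\begin{align*}
(c\sin\theta)^2 + (c\cos\theta)^2 &= c^2 && \text{(by the Pythagorean thm)} \\
c^2(\sin\theta)^2 + c^2(\cos\theta)^2 &= c^2 && \text{(distributing the square)} \\
(\sin\theta)^2 + (\cos\theta)^2 &= 1 && \text{(dividing by } c^2\text{)}
\end{align*}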
Often, by the way, we write things like (sin θ)² and (cos θ)² as sin²θ and cos²θ, just as a more
convenient notation (fewer parentheses!).
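Note that we could write slightly modified versions of this identity. For example, if we divide both
sides by cos²(θ), we get:
\begin{align*}
\frac{\sin^2\theta + \cos^2\theta}{\cos^2\theta} &= \frac{1}{\cos^2\theta} \\
\frac{\sin^2\theta}{\cos^2\theta} + \frac{\cos^2\theta}{\cos^2\theta} &= \frac{1}{\cos^2\theta} \\
\tan^2\theta + 1 &= 1/\cos^2\theta
\end{align*}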
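Or if we divide it all by sin²θ:
\begin{align*}
\frac{\sin^2\theta + \cos^2\theta}{\sin^2\theta} &= \frac{1}{\sin^2\theta} \\
\frac{\sin^2\theta}{\sin^2\theta} + \frac{\cos^2\theta}{\sin^2\theta} &= \frac{1}{\sin^2\theta} \\
1 + 1/\tan^2\theta &= 1/\sin^2\theta
\end{align*}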
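The Pythagorean identity and its two divided-through versions can be checked side by side. This is a minimal Python sketch; the function name `pythagorean_identities` and the sample angles are assumptions, and the angles deliberately avoid multiples of π/2, where tangent is zero or undefined and the divisions break down.

```python
import math

def pythagorean_identities(theta):
    """Return (left side, right side) pairs for the three forms of the identity."""
    s, c, t = math.sin(theta), math.cos(theta), math.tan(theta)
    return {
        "sin^2 + cos^2 = 1": (s**2 + c**2, 1.0),
        "tan^2 + 1 = 1/cos^2": (t**2 + 1, 1 / c**2),
        "1 + 1/tan^2 = 1/sin^2": (1 + 1 / t**2, 1 / s**2),
    }

sample_angles = [0.4, 1.1, 2.0, 3.5, 5.2]  # arbitrary angles, none a multiple of pi/2
for theta in sample_angles:
    for name, (left, right) in pythagorean_identities(theta).items():
        print(f"theta={theta}: {name}: {left:.9f} vs {right:.9f}")
```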
For your convenience, I’ve summarized below all of the identities we’ve discussed. But please, please,
don’t try to memorize them! That will not help you understand trigonometry better! If you understand
the trig (really understand it), then all of these identities¹ should make sense. They should be natural and
obvious and you shouldn’t even really have to think about these equations in order to apply them. This
list, then, should not be a list of things you need to know, but a codification of things you already know.
Periodicity Identities (for any integer k):
• sin(θ + 2kπ) = sin(θ)
• cos(θ + 2kπ) = cos(θ)
• tan(θ + kπ) = tan(θ)

Symmetry Identities:
• cos(−θ) = cos(θ)
• sin(−θ) = − sin(θ)
• tan(−θ) = − tan(θ)

Interchange Identities:
• cos(θ) = sin(θ + π/2)
• sin(θ) = cos(θ − π/2)

Pythagorean Identity:
• sin²θ + cos²θ = 1
¹ With the possible exception of the Pythagorean identity, which is not obvious and takes a little work to derive.
Equations with Trig Functions
Using what you know about algebra (factoring, the quadratic formula, etc., etc.), and what you know
about trigonometry (the unit circle, special right triangles, Pythagorean and other identities, etc.), solve
the following equations for θ. That is, find all values of θ that make the equation true. (You may not
be able to solve each equation exactly and might have to write some of your answers using an inverse
trigonometric function.)
1. sin(θ) = 1/2
2. cos(θ) = 1/√2
3. tan(θ) = (√3 + 1)/(√3 − 1)
4. cos²(θ) = 3/4
5. cos²(θ) = 1/2
6. cos²(θ) = 0
7. sin²(θ) = 1/4
8. sin²(θ) = 1/2
9. sin²(θ) = 1
10. sin²(θ) − 1/4 = 0
11. sin²(θ) − (4 − 2√2)/8 = 0
12. cos²(θ) − 1 = 0
13. tan²(θ) − (4 + 2√3)/(4 − 2√3) = 0
14. sin²(θ) − 5 sin(θ) − (1/√2) sin(θ) + 5/√2 = 0
15. sin²(θ) + (3/2) sin(θ) − 1 = 0
16. sin(θ) cos(θ) + cos(θ) − (1/√2) sin(θ) + 1/√2 = 0
17. 2 cos²(θ) + sin(θ) + 1 = 0
18. sin(2θ) = √2/2
19. cos(5θ) = (√3 + 1)/(2√2)
20. tan(15θ) = 0
21. tan(θ) cos²(θ) = tan(θ)
22. 3 sin²(θ) − 8 sin(θ) = 3
23. 5 cos²(θ) + 6 cos(θ) = 0
24. 2 tan²(θ) + 5 tan(θ) + 3 = 0
25. 3 sin²(θ) + 2 sin(θ) = 5
26. cos(θ)/tan(θ) = cos(θ)
27. tan(θ) cos(θ) = cos(θ)
28. cos(θ) · 1/sin(θ) = 2 cos(θ)
29. tan(θ) · 1/cos(θ) + 3 tan(θ) = 0
30. 4 sin(θ) tan(θ) − 3 tan(θ) + 20 sin(θ) − 15 = 0
31. 25 sin(θ) cos(θ) = 5 sin(θ) + 20 cos(θ) = 4
32. sin²(θ) + 2 sin(θ) − 2 = 0
33. cos²(θ) + 5 cos(θ) = 1
34. tan²(θ) + 1 = 3 tan(θ)
35. 4 cos²(θ) − 2 cos(θ) = 1
36. 2 tan²(θ) − 1 = 3 tan(θ)
37. 6 sin²(θ) + 4 sin(θ) = 1
38. 1/cos²(θ) − 2 tan²(θ) = 0
39. 9 − 12 sin(θ) = 4 cos²(θ)
40. 1/cos²(θ) + tan(θ) = 3
41. cos²(θ) − sin²(θ) + sin(θ) = 0
42. 2 tan²(θ) + tan(θ) = 5 − 1/cos²(θ)
43. cos(θ) = sin(θ)
44. cos²(θ) = sin²(θ)
45. sin(θ) cos(θ) − (1/√2) cos(θ) + (√3/2) sin(θ) − √3/(2√2) = 0
46. tan(θ) sin(θ) − tan(θ) + sin(θ) − 1 = 0
47. cos²(θ) − cos(θ) − (1/√2) cos(θ) + 1/√2 = 0
The Law of Sines
So we’ve learned a lot of trig so far. And it’s been awesome. The useful thing about trig is its ability to
turn problems of geometry into problems of algebra—and thus reduce all of our knowledge of the physical
world to just equations with numbers¹. But the major downside to the trig functions we’ve worked with
so far is that they only work with right triangles. And yet many (most?) triangles aren’t right triangles!!!
You’ve seen plenty of word problems in which the major element of interest has been a triangle that didn’t
have a 90° angle. And you’ve dealt with these by turning them into right triangles: by drawing a few more
sides and angles to make right triangles, then applying trig functions to them, and using that information
to figure out whatever it was you wanted to figure out about your original, non-right triangle. I could
probably add in a relevant word problem here to motivate this further.
Anyway, we can codify that procedure—finding things out about a non-right-triangle by turning it
into right triangles—in this way. Imagine we have some triangle—doesn’t have to be a right triangle—with
angles α, β, and γ, and opposite sides (respectively) of A, B, and C, like so:
Then the following equation(s) are true:
\[ \frac{\sin(\alpha)}{A} = \frac{\sin(\beta)}{B} = \frac{\sin(\gamma)}{C} \]
Why? Imagine we have such a triangle (as seen above), and we turn it into two right triangles by
drawing a line of length k, like so:
then, just from our knowledge of right-triangle trig, we must have

\[ \sin(\beta) = \frac{k}{C} \quad \text{and} \quad \sin(\gamma) = \frac{k}{B} \]
If we rearrange both of these equations, we get:
k = C sin(β) and k = B sin(γ)
so I can set them equal to each other:

C sin(β) = B sin(γ)
and divide:
\[ \frac{\sin(\beta)}{B} = \frac{\sin(\gamma)}{C} \]
Ta-da! To prove the rest of this (to show that these are both equal to sin(α)/A), we can do the same
procedure, but this time draw a line that doesn’t chop into the angle α. (For example, we could draw a
line from the point near angle β, and get two right triangles, one with an angle α and another with an
angle γ.)
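The Law of Sines can be tried out on a concrete non-right triangle. This is a minimal Python sketch under stated assumptions: the vertex coordinates are made up, the helpers `interior_angle` and `side` are hypothetical names, and the angles are measured from the coordinates (via cross and dot products) rather than from the law itself, so the check is not circular.

```python
import math

def interior_angle(p, q, r):
    """Interior angle at vertex p of triangle pqr, via atan2 of the cross and dot products."""
    ux, uy = q[0] - p[0], q[1] - p[1]
    vx, vy = r[0] - p[0], r[1] - p[1]
    return math.atan2(abs(ux * vy - uy * vx), ux * vx + uy * vy)

def side(p, q):
    """Length of the segment from p to q."""
    return math.hypot(q[0] - p[0], q[1] - p[1])

# Hypothetical vertices: angle alpha sits at P_alpha, beta at P_beta, gamma at P_gamma.
P_alpha, P_beta, P_gamma = (0.0, 0.0), (5.0, 0.0), (1.5, 3.0)

alpha = interior_angle(P_alpha, P_beta, P_gamma)
beta = interior_angle(P_beta, P_gamma, P_alpha)
gamma = interior_angle(P_gamma, P_alpha, P_beta)

# Each side is opposite the angle with the matching letter.
A = side(P_beta, P_gamma)
B = side(P_alpha, P_gamma)
C = side(P_alpha, P_beta)

ratios = (math.sin(alpha) / A, math.sin(beta) / B, math.sin(gamma) / C)
print(ratios)
```

All three printed ratios should agree up to floating-point rounding.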
¹ Okay, this is way overblown, but I think you get the idea.
The Law of Cosines
You know the Pythagorean Theorem. a² + b² = c². It’s pretty awesome. It turns geometry into algebra.
But it only works for right triangles. You know what would be even more awesome? A Pythagorean
Theorem that worked for all triangles!!!
So let’s create one. Let’s come up with a super-Pythagorean Theorem that relates the lengths of the
sides of any triangle, even one that doesn’t have a right angle. The usual name given to the theorem we’ll
come up with is the Law of Cosines, but that doesn’t really do it justice. Call it the Super Pythagorean
Theorem, or something. To put it more mathematically, the Law of Cosines is a generalization of the
Pythagorean theorem, meaning that it does everything the Pythagorean Theorem does (relate the side
lengths of a right triangle), but it does even more (relate the sides of any triangle). Likewise, the unit
circle definition of trig functions is a generalization of the right-triangle definition of trig functions (because
it does all the same stuff that the right-triangle definition does, but it also accounts for angles greater than
90° or less than 0°), and the rational numbers are a generalization of the integers (because the rationals
include the integers, but also include other numbers, like 2/5 or 0.945). Or to use a non-math example,
a cordless drill is a generalization of a screwdriver (because, given the right drill bits, you can use it as a
screwdriver, but you can also use it to drill holes in masonry to pack dynamite into).
The theorem is this: imagine we have some triangle with angles α, β, and γ, and opposite sides
(respectively) of length a, b, and c, like so:
Then this equation is true:

\[ c^2 = a^2 + b^2 - 2ab\cos(\gamma) \]
Proof: What is really cool about this proof is that it uses the Pythagorean Theorem. Meaning, it uses
the Pythagorean Theorem to create a stronger version of itself. The strategy is to take a triangle (any
triangle), split it up into right triangles, and apply the Pyth. Thm. to them. This is cool because, I
mean, it’s like, using simple tools to create complicated tools. So my historicizing comparison would be:
civilization started with people banging stones together to create simple tools, and then using those tools
to create slightly more complicated tools, etc. etc., and then ten thousand years later we have lasers and
747s. Which can all be traced back to people in Anatolia banging stones together.
Now, in daily life, I don’t know how useful it is. I had never actually run across it until my fourth year
at the U of C, when I learned (the day before) that I was supposed to be teaching it. But I got so excited,
because it is SO COOL. Because it’s a generalization of this awesome theorem that we use all the time.
If there are any fundamental themes or ideas behind “mathematics,” this idea of generalization is one of
them. It is this idea of taking what we know and trying to think about it more broadly. What if we can
take the square root of negative numbers? What happens? The number system that results (the complex
numbers) not only contains everything we already know about the real numbers, but has so much more,
and has so much additional, beautiful structure, that we’d never see if we refused to look for it and simply
plugged our ears and clenched our eyes shut and shouted, “Of course you can’t take the square root of a
negative number!”
Right. So, the proof. Imagine we have a triangle like this:
And we split it up into two right triangles by drawing a line from angle β, and let’s say that line has length
k:
Then: consider that right triangle on the right. Let’s call the length of that bottom side of that
triangle (the side adjacent to γ) j:
Now let’s see if we can find the lengths of k and j, only in terms of what we already know. We know that
\[ \sin(\gamma) = \frac{k}{a} \quad \text{and} \quad \cos(\gamma) = \frac{j}{a} \]
which are just different ways of saying that
k = a sin(γ) and j = a cos(γ)
So we can relabel our triangle accordingly:
But now let’s consider the right triangle on the left. We know that it has a hypotenuse of c, and that one
of its sides (the side opposite angle α) is of length a sin(γ). But what about the side on the bottom? What
is its length? Since the total length of the bottom line (for both triangles) is b, and
the length of the right-hand portion is a cos(γ), the length of the portion belonging to the left-hand
right triangle must be b − a cos(γ):
But now we can apply the Pythagorean Theorem to that right triangle on the left!!! We must have:
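\begin{align*}
c^2 &= (b - a\cos\gamma)^2 + (a\sin\gamma)^2 \\
&= (b^2 - 2ab\cos\gamma + a^2\cos^2\gamma) + a^2\sin^2\gamma && \text{(squaring)} \\
&= b^2 - 2ab\cos\gamma + a^2(\cos^2\gamma + \sin^2\gamma) && \text{(factoring last couple terms)} \\
&= b^2 - 2ab\cos\gamma + a^2(1) && \text{(Pythagorean identity!)} \\
&= a^2 + b^2 - 2ab\cos(\gamma) && \text{(rearranging)}
\end{align*}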
Yay! Note that, if γ = 90° = π/2, this just reduces to the good old Pythagorean Theorem:
\[ c^2 = a^2 + b^2 - 2ab\cos(\pi/2) = a^2 + b^2 - 2ab \cdot 0 = a^2 + b^2 \]
So don’t memorize the Pythagorean Theorem—memorize THIS!!!
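The proof's picture translates directly into coordinates, which gives an independent way to check the formula. This is a minimal Python sketch, with hypothetical function names and an arbitrary sample triangle: the γ vertex goes at the origin, side b lies along the x-axis, and side a runs out at angle γ, so the third side has horizontal part b − a cos(γ) and vertical part a sin(γ).

```python
import math

def third_side(a, b, gamma):
    """Length c of the side opposite gamma, from the Law of Cosines."""
    return math.sqrt(a**2 + b**2 - 2 * a * b * math.cos(gamma))

def third_side_by_coordinates(a, b, gamma):
    """Same length measured directly: gamma vertex at the origin,
    side b along the x-axis, side a running out at angle gamma."""
    end_of_b = (b, 0.0)
    end_of_a = (a * math.cos(gamma), a * math.sin(gamma))
    return math.hypot(end_of_b[0] - end_of_a[0], end_of_b[1] - end_of_a[1])

a, b, gamma = 7.0, 10.0, math.radians(40)  # arbitrary sample triangle
print(third_side(a, b, gamma), third_side_by_coordinates(a, b, gamma))

# With a right angle, the formula collapses to the ordinary Pythagorean Theorem.
print(third_side(3.0, 4.0, math.pi / 2))
```

The second print uses a 3-4 right triangle, where the extra cosine term vanishes.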
Some Word Problems With Trig
Here are a bunch of word problems involving trig. Do them. Convert angles to radians when appropriate.
Leave your answers in fully-written-out form as far as you can; if you are desperate for a decimal answer,
wait until the last step before you plug things into your calculator. (And if you do that, be sure your
calculator is in the correct mode (radians or degrees).)
Note that a lot of these problems use the phrases “angle of elevation” and “angle of depression.” From the
context of the problems I think it’s clear what these mean, but here’s a visual explanation:
1. A 24-ft ladder leaned up against a wall forms an angle of 75° with the ground. How high up the wall does the ladder reach? How far is the base of the ladder from the wall?

2. Imagine you have a 15-foot ladder. Come up with an equation for the height the ladder can reach as a function of the angle of elevation between the ladder and the ground.

3. A guy wire stretches from the top of a radio tower to a point (on flat ground) 18 feet from the base of the tower. The angle between the wire and the ground is π/3. How high is the tower?

4. A plane takes off at an angle of 5°. After traveling three miles away from the airport (measured along the surface of the earth), how high is the plane?

5. A plane takes off at an angle of 6° traveling at the (constant) rate of 200 feet per second. If it continues on this flight path at the same speed, how many minutes will it take to reach an altitude of 8000 feet? (Recall that distance = rate · time, if the rate is constant.)

6. The angle between the top of a building and a point 80 feet away from the base (on level ground) is 70°. How tall is the building?

7. The Ohio Turnpike has a maximum uphill slope of 3°. How long must a straight uphill segment of the road be in order to rise 450 feet vertically?

8. Jilly-Jane is flying a kite. Her hand is 3 feet above the ground, and is holding the end of a 300-ft kite string, which makes an angle of 57° with the horizontal. How high is the kite?

9. Imagine that a person with a reach of 27 inches and a shoulder height of 5 feet is standing upright on a rock face that makes a 62° angle with the horizontal (as shown below). Can the person touch the mountain?

10. A swimming pool is 3 feet deep in the shallow end. The bottom of the pool has a steady downward drop of 12° towards the deep end. If the pool is 50 feet long, how deep is the deep end?

11. A wire from the top of a TV tower makes an angle of 49.5° with the ground and touches the ground 225 feet from the base of the tower. How high is the tower?

12. A building casts a shadow 130 feet long when the angle of elevation of the sun (measured from the horizon) is 38°. How tall is the building?

13. A plane flies on a straight course. On the ground directly below the flight path, observers 2 miles apart spot the plane at the same time. The plane’s angle of elevation is 46° from one observation point and 71° from the other. How high is the plane?

14. A buoy in the ocean is observed from the top of a 40-meter-high oil rig. The angle of depression from the top of the tower to the buoy is 6.5°. How far is the buoy from the base of the oil rig?

15. A plane passes directly over your head at an elevation of 500 feet. Two seconds later, you observe its angle of elevation (the angle between the plane, you, and the horizontal) as π/4. How far did the plane travel during those 2 seconds?

16. A man stands 12 feet from a statue. The angle of elevation from eye level to the top of the statue is 20°, and the angle of depression to the base of the statue is 15°. How tall is the statue?

17. From the top of a 100-foot building, a man observes a car moving towards him. If the angle of depression of the car changes from 15° to 33° during the three minutes that the man is watching the car, how far does the car travel? How fast is it traveling?

18. Two boats lie on a straight line with the base of a lighthouse. From the top of the lighthouse, 21 meters above the water level, you observe that the angles of depression of the two boats are 53° and 27°. How far apart are the boats?

19. A rocket shoots straight up from the launch pad. Five seconds after lift-off, an observer 2 miles away notes the rocket’s angle of elevation as 3.5°. Four seconds after that, the angle of elevation is 42°. How far did the rocket rise during those 4 seconds? What is its velocity? (Give units.)

20. You are sitting in your apartment in Brooklyn late one night, staring out the window. You see a streetlight. The angle of depression to the top of the streetlight is 55°, and the angle of depression to the base of the streetlight is 57.8°. You’re on the ninth floor of your building, and each floor (including floor/ceiling space) takes up about 12 vertical feet. How tall is the streetlight?

21. The 60-foot drawbridge to your castle is 24 feet above water level when closed. When open, the bridge makes an angle of π/3 with the horizontal. When closed, how high is the tip of the bridge above the water? How far is the top of the bridge away from the opposite bank of your moat?

22. In aerial navigation, directions are given in degrees clockwise from north, called headings. Thus due east is 90°, due south is 180°, due west is 270°, etc. A plane travels from an airport for 200 miles at a heading of 300°. How far west of the airport is the plane?

23. Okay, another airplane question. A plane travels from an airport at a constant 300mph at a heading of 65°. How far east of the airport is the plane after half an hour? How far north of the airport is the plane after 2 hours and 24 minutes?

24. A car on a straight road passes under a bridge. Two seconds later, an observer on the bridge, 20 feet above the road, notes that the angle of depression to the car is 7.4°. How fast (in mph) is the car traveling?

25. You walk across the shown-below pedestrian overpass. As you step onto it, your sweater snags on the bridge. Unaware, you continue walking, and it begins to unravel. When you disembark at the opposite side of the bridge, how many feet of yarn have unravelled from your sweater? (Assume, however unrealistically, that the yarn doesn’t stretch, and is lying on the bottom of the bridge.)

26. A 50-foot flagpole stands on top of a building. From a point on the ground, the angle of elevation to the top of the pole is 43°, and the angle of elevation to the bottom of the pole is 40°. How high is the building?

27. Two points on level ground are 500 meters apart. The angles of elevation from these points to the top of a nearby hill are 52° and 67°, respectively. The two points and the top of the hill lie on a straight line. How high is the hill?

28. A boat travels at 40mph from port on a course of 65° for 2 hours, and then changes to a course of 155° for 4 hours. After these six hours, how far is it away from port?

29. A point on the North Rim of the Grand Canyon is 7,256 feet above sea level. A point on the South Rim directly across is 6,159 feet above sea level. The canyon is 3,180 feet wide (horizontally) between the two points. What is the angle of depression from the North Rim point to the South Rim point?

30. A truss for a barn roof is constructed as shown below. What is the height of the barn directly underneath the center of the roof?

31. Your grandfather dies and leaves you his 9-foot-tall clock. It has a base (and top) that’s about 2 feet square. Your apartment has 9-foot, two-inch ceilings. But the door to your apartment is only 7 feet tall. Will you be able to use the grandfather clock as an interior decoration?

32. More generally: imagine you have a grandfather clock a units tall and b units wide, a ceiling k units high, and a door less than k units tall. When will you be able to fit it into your apartment?

33. On a foggy night in San Francisco, you set off in your jetski at 9:30 PM, zipping across San Francisco Bay at 23mph on a heading of 105°. Simultaneously, your friend Bob leaves from a marina in Oakland due east of your marina in SF, traveling at a heading of 195°. Later, other boaters will tell police that they heard screams and saw a large fireball in the Bay. What happened? When? What was Bob’s average speed before the crash?

34. Somewhere in Canada’s frozen north, a poacher is setting a trap for a polar bear. Unbeknownst to him, a polar bear has noticed, and is running toward the poacher to teach him a lesson in Arctic ethics. The poacher’s friend, watching in horror, tries to warn the poacher, but they’re a half-mile apart, and the poacher can’t hear his friend shouting for him to turn around. The friend, meanwhile, has a compass with him, and observes that over a thirty-second period, the angle between the two friends and the bear (i.e., the angle given by poacher–friend–bear) decreases from π/3 to π/4. (You can assume that the polar bear is running on a course perpendicular to the line between the two friends, i.e., that the angle friend–poacher–bear is a right angle.) When will the polar bear reach the poacher? (Be sure to give proper units.)

35. Imagine a rectangle of width 2x inscribed in an isosceles right triangle with a hypotenuse 2 units long:
(a) Express the y-coordinate of the point P in terms of x.
(b) Come up with a function for the area of the rectangle in terms of x.
(c) Imagine you wanted to find the area of the largest possible rectangle that can fit inside the triangle. How would you do it?

36. You want to make an open-topped rectangular box by taking an 8-by-15in piece of cardboard, cutting squares out from the corners, and folding up the sides. Come up with an equation for the volume of said box, if each square you cut out is x inches long. What if you wanted to make the biggest box you can make? What would its dimensions be? How would you figure it out?

37. At 7 AM, Ship A is 60 miles due east of Ship B. Ship A had been sailing west at 20mph and Ship B had been sailing southeast at 30mph. Come up with a function for the distance between the two ships as a function of time. When are the two ships closest together?

38. A fence h feet high runs parallel to a tall building and w feet away from it. Find the length of the shortest ladder that will reach from the ground across the top of the fence to the wall of the building. (Suggestion: come up with a function for the length of such a ladder as a function of its angle of elevation.)

39. Determine the point(s) on the function f(x) = x² + 1 that are closest to the point (0, 2). (Suggestion: Draw a picture. Then come up with a function for the distance between the point (0, 2) and some arbitrary point on the function f(x). When will that function be smallest?)

40. You are a lifeguard at the municipal beach in Churchill, Manitoba. One day, as you are sitting on your lifeguard chair next to Hudson Bay, you see a swimmer being attacked by a polar bear. The swimmer appears to be roughly 120 yards out to sea (on a straight line between the swimmer and the shore), and the lifeguard station is roughly 300 yards down the beach from the nearest point on shore to the swimmer. You can run at 13 yards per minute along the beach, and you can swim at 5 yards per minute. You want to get to the swimmer in the shortest amount of time, but you know that if you swim directly to him/her, it will take longer (since you can’t swim very fast), and you know that if you run all the way down the beach, it’ll take longer (since you’ll have to travel a longer distance). Given that you want to reach the swimmer as quickly as possible, how far down the beach do you run, and how far do you swim?
Suggestion: if you run x yards down the beach, how far do you swim (in terms of x)? How long will it take you to get to the swimmer (in terms of x)? How can you find where this function has a minimum?
Exactly Evaluating Even More Trig Functions
We know how to find trig functions of certain, special angles. Using our unit circle definition of the trig
functions, as well as our knowledge of a couple special right triangles, we can find the sine/cosine/tangent of
angles like π, 2π, 3π, −π, −2π—all the multiples of π—as well as all the multiples of π/2, π/3, etc.—really,
all the multiples of
\[ \pi,\ \frac{\pi}{2},\ \frac{\pi}{3},\ \frac{\pi}{4},\ \frac{\pi}{6} \]
But it would be nice if we could evaluate the trig functions of even more angles!!! So... well, I don’t know
how we would do this in general. But let me suggest one way that maybe we could go about adding one
more angle to our inventory. Here’s something I notice when I look at that list. What happens if I subtract
π/3 and π/4?
\[ \frac{\pi}{3} - \frac{\pi}{4} = \frac{4\pi - 3\pi}{12} = \frac{\pi}{12} \]
I get a new angle, π/12, an angle that isn’t already on the list¹. So maybe... I mean, we don’t know what,
like, sin(π/3 − π/4) is. We don’t know any way to simplify that. It’s certainly not just sin(π/3) − sin(π/4).
We can’t just break trig functions up like that. But maybe... maybe if we could come up with some sort
of formula for, like, the sine and cosine of (something − something else)... maybe then I could be able to
work out sin(π/3 − π/4), and by extension, sin(π/12). I guess what I mean, more formally, is: can we come
up with a formula for sin(α − β) (and cos(α − β)) only using sin(α), sin(β), cos(α), cos(β), and ordinary
arithmetic? Because if we could do that, we could figure out the sine and cosine of π/12. Which would be
one little step towards our ultimate goal of being able to find the sine and cosine of any angle!!!
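A short numerical experiment shows why we can’t just break the sine up. This is a minimal Python sketch; the variable names are illustrative, and the decimal values it prints are floating-point approximations.

```python
import math

# The angle we actually want: pi/3 - pi/4, which is pi/12.
difference_of_angles = math.sin(math.pi / 3 - math.pi / 4)

# The tempting (but wrong) shortcut: subtract the two sines.
difference_of_sines = math.sin(math.pi / 3) - math.sin(math.pi / 4)

print("sin(pi/3 - pi/4)      =", difference_of_angles)
print("sin(pi/12)            =", math.sin(math.pi / 12))
print("sin(pi/3) - sin(pi/4) =", difference_of_sines)
```

The first two printed values agree, while the third is visibly different, so a genuine formula for sin(α − β) has to be more than term-by-term subtraction.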
Let’s do it. Let’s try to find a formula for cos(α − β). The basic idea of this derivation² is that we’ll
draw the angle α − β in two different ways. Then we’ll use the Pythagorean theorem/distance formula
to translate these two geometric pictures into algebraic sentences (i.e., equations). Then, because our two
pictures are pictures of the same angle, we’ll be able to set these two equations equal to each other, do
some algebra, and ultimately solve for cos(α − β).
Here’s how we’ll start. Imagine I have Cartesian axes³, and I draw the angle β on them:
And then, on the same axes, also starting from the right side of the x-axis, I draw the angle α:
Obviously, α and β could be anything; they’re not specific angles. The cool observation here is that this
picture contains the angle α − β! See it? It’s just the angle in between α and β:
¹ In number-theory terms, the reason this happens is because 4 and 3 are “relatively prime”: they share no common factors. And so when I go to find a common denominator, I have to multiply them together.
² Which is just a polysyllabic word for “proof,” or, more accurately, for a certain type of proof.
³ Just a fancy name for “a graph with x and y axes,” after René Descartes.
Meanwhile, I can also draw the angle α − β in the “normal” way, meaning that I can draw it starting from
the x-axis (rather than drawing it in the middle):
So then I have two different ways of drawing the same angle!
One way: Another way:
Here’s the interesting thing. What if I draw the unit circle on these graphs, like so:
Then I can notice that the following line (opposite the angle α − β)(the dotted one) is the same line in
both drawings:
The line is in a slightly different position, yes—but it must be the same length. The only difference in the
two pictures is that we’ve rotated the angle α − β. The dotted line is just the other side of this triangle
formed by these two radii of length 1 and the angle α − β (side-angle-side, anyone?). So whatever the
length of the line is, it must be the same for both drawings.
If only we had a way to measure the length! But we do. Because this is a unit-circle setup, we know
the coordinates of the two ends of the line:
So we can use the distance formula to find the lengths of the line! The equations will look different,
but we know that the triangles are the same, so they’ll have to work out to be the same thing (and then
we’ll be able to set the two different equations for the length of the dotted line equal to each other, etc.,
and eventually solve for just cos(α − β)).
The distance formula, keep in mind, is just a modified version of the Pythagorean theorem. If we have
two points $(x_1, y_1)$ and $(x_2, y_2)$, then the distance between them (or the length of a straight line between
them) is $\sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$.
So let’s apply this to the diagram on the left. If I use the distance formula to find the length of the
dotted line, I get:
\[ \text{length} = \sqrt{(\cos\alpha - \cos\beta)^2 + (\sin\beta - \sin\alpha)^2} \]
Then if I multiply out the first square, this becomes:
\[ \text{length} = \sqrt{\cos^2\alpha - 2\cos\alpha\cos\beta + \cos^2\beta + (\sin\beta - \sin\alpha)^2} \]
And if I multiply out the next square, I get:

q
length = cos2 α − 2 cos α cos β + cos2 β + sin2 β − 2 sin α sin β + sin2 α
Messy, messy, messy. And long. But we can make it shorter! Remember that thing we proved the other
day? The Pythagorean Identity?⁴ The one about $\sin^2\theta + \cos^2\theta = 1$?
We can use that here! Allow me to rearrange this equation slightly:
\[ \text{length} = \sqrt{(\cos^2\alpha + \sin^2\alpha) - 2\cos\alpha\cos\beta + (\cos^2\beta + \sin^2\beta) - 2\sin\alpha\sin\beta} \]
See? Parentheses? We have $\cos^2\alpha + \sin^2\alpha$, which must be equal to just 1, and we also have
$\cos^2\beta + \sin^2\beta$, which must also be equal to 1. So we have:
\[ \text{length} = \sqrt{1 - 2\cos\alpha\cos\beta + 1 - 2\sin\alpha\sin\beta} \]
or if I combine the 1’s:
\[ \text{length} = \sqrt{2 - 2\cos\alpha\cos\beta - 2\sin\alpha\sin\beta} \]
So this is ONE way of writing the length of that dotted line. Phew. ALTERNATIVELY, we could find
the length using the picture on the RIGHT. If we do that, and if we apply the distance formula, we get:
\[ \text{length} = \sqrt{(\cos(\alpha - \beta) - 1)^2 + (\sin(\alpha - \beta) - 0)^2} \]
You may want to glance back at the picture to convince yourself that this is true. Then, if I multiply out
those squares, I get:
\[ \text{length} = \sqrt{\cos^2(\alpha - \beta) - 2\cos(\alpha - \beta) + 1 + \sin^2(\alpha - \beta)} \]
Kinda messy. But AGAIN, I can apply the Pythagorean Identity!!! (I TOLD you it would be useful!)⁵ We
have $\cos^2(\alpha - \beta)$ and also $\sin^2(\alpha - \beta)$, so when added together these must be 1. So I must have:
\[ \text{length} = \sqrt{-2\cos(\alpha - \beta) + 1 + 1} \]
\[ \text{length} = \sqrt{2 - 2\cos(\alpha - \beta)} \]
So we have two different ways to write the length of the same line. Let’s summarize:
4
The thing I love about this proof, by the way, is how long and intricate it is and how it pulls together so many different
concepts.
5
Sorry for all the capitals. It’s 12:15 AM, and I a) have to finish this before I go to bed, b) have to wake up at 5:30, and
c) have had lots of caffeine due to a).
Length of dotted line (left picture): $\sqrt{2 - 2\cos\alpha\cos\beta - 2\sin\alpha\sin\beta}$
Length of dotted line (right picture): $\sqrt{2 - 2\cos(\alpha - \beta)}$
Obviously, these two lines are the same lines. So their lengths must be equal. So I can set these two
equations equal to each other, and (hopefully!) solve for cos(α − β)!!!
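\begin{align*}
\sqrt{2 - 2\cos\alpha\cos\beta - 2\sin\alpha\sin\beta} &= \sqrt{2 - 2\cos(\alpha - \beta)} \\
2 - 2\cos\alpha\cos\beta - 2\sin\alpha\sin\beta &= 2 - 2\cos(\alpha - \beta) && \text{(squaring both sides)} \\
-2\cos\alpha\cos\beta - 2\sin\alpha\sin\beta &= -2\cos(\alpha - \beta) && \text{(subtracting 2)} \\
\cos\alpha\cos\beta + \sin\alpha\sin\beta &= \cos(\alpha - \beta) && \text{(dividing by } -2\text{)} \\
\cos(\alpha - \beta) &= \cos\alpha\cos\beta + \sin\alpha\sin\beta
\end{align*}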
There we go! We’ve done it! We’ve found an equation for cos(α − β)! This means that we can now
find cos(π/12)!!! We can use the equation that we just derived:
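\begin{align*}
\cos(\pi/12) &= \cos(\pi/3 - \pi/4) && \text{(fractions)} \\
&= \cos(\pi/3)\cos(\pi/4) + \sin(\pi/3)\sin(\pi/4) && \text{(by our equation!!!)} \\
&= \frac{1}{2}\cdot\frac{1}{\sqrt{2}} + \frac{\sqrt{3}}{2}\cdot\frac{1}{\sqrt{2}} && \text{(trig)} \\
&= \frac{1 + \sqrt{3}}{2\sqrt{2}} && \text{(fractions)}
\end{align*}
OMG!!!!1!!!!1111 We’ve found the cosine! $\cos\frac{\pi}{12} = \frac{1 + \sqrt{3}}{2\sqrt{2}}$!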
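If you’d like some reassurance that all of that algebra didn’t go off the rails, here is a quick numerical sanity check. It is only a sketch in Python (not something we did in class; the function name `cos_difference` is made up for this example), and all it does is compare our formula and our exact value against the computer’s built-in cosine:

```python
import math


def cos_difference(alpha, beta):
    """Right-hand side of our identity: cos(a)cos(b) + sin(a)sin(b)."""
    return math.cos(alpha) * math.cos(beta) + math.sin(alpha) * math.sin(beta)


# The exact value we derived by hand: (1 + sqrt(3)) / (2 sqrt(2))
exact_value = (1 + math.sqrt(3)) / (2 * math.sqrt(2))

# The same number, computed via the identity and via the library directly
from_identity = cos_difference(math.pi / 3, math.pi / 4)
from_library = math.cos(math.pi / 12)

print("exact value by hand:", exact_value)
print("from our identity:  ", from_identity)
print("from math.cos:      ", from_library)
```

All three printed numbers should agree to many decimal places. That is evidence, not proof; the proof is the derivation above.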
But there are still many questions unanswered. We know the cosine of π/12—but what’s the sine?
We know cos(α − β)—but what about cos(α + β)? Or sin(α + β)? Or sin(α − β)? We could make a whole
list of related formulae:
Sum and Difference Identities:
sin(α + β) =
sin(α − β) =
cos(α + β) =
cos(α − β) = cos α cos β + sin α sin β
We only know one of these formulae. Let’s see if we can find the other three (with the goal, ultimately, to
be able to find sin(α − β), and thus sin(π/12)).
Now, we could follow the same procedure for these other three formulae that we used for the first one. It would work.
But it was a rather tedious, rather long process. It would be nice if there was an easier way. There is! Now
that we’ve done the hard work to get one of the formulae, the others come much more easily. (In class I
described this with an extended metaphor about burglary that I don’t really have time to scribble down.)
For instance, if we want to find cos(α + β), we can use the formula we already have—we can just
“subtract” “negative beta”, and then use two of our symmetry identities to simplify it:
cos(α + β) = cos(α − (−β)) (just algebra!!!)
= cos(α) cos(−β) + sin(α) sin(−β) (we can apply our equation)
= cos(α) cos(β) + sin(α) sin(−β) (we know cos(−θ) = cos(θ))
= cos(α) cos(β) + sin(α)(− sin(β)) (and sin(−θ) = − sin(θ))
= cos(α) cos(β) − sin(α) sin(β) (distributing negative)
What about the sine? Here’s the tricky thing: we know that sine and cosine are the same function...
just modulo a horizontal shift6 . More formally, we know that sin(θ) = cos(θ − π/2). So if we think of a
sine as just being a shifted cosine, we can have:
sin(α + β) = cos( (α + β) − π/2)
And I can move the parentheses around and write this as:
sin(α + β) = cos(α + (β − π/2) )
But then we can just apply our formula for the cosine of a sum! We have two parts—α, and β − π/2. So
if we plug those into the formula, we get:
sin(α + β) = cos(α) cos(β − π/2) − sin(α) sin(β − π/2)
Which is kind of ugly. We could simplify the cos(β −π/2) by applying our formula for cos(α −β) again, but
we don’t need to. This is just that same horizontal shift identity!!! We know that sin(θ) = cos(θ − π/2).
We’ve already used it, even. We can apply it AGAIN. So then we have:
sin(α + β) = cos(α) sin(β) − sin(α) sin(β − π/2)
What about this sin(β − π/2)? Can we simplify that? You might not know it offhand, but we can come
up with a similar horizontal shift identity. You can convince yourself that sin(θ − π/2) = − cos(θ). So if
we apply that, we have:
sin(α + β) = cos(α) sin(β) − sin(α)(− cos(β) )
which, if we distribute the negative, becomes just:
sin(α + β) = cos(α) sin(β) + sin(α) cos(β)
Okay. But we still want to be able to find sin(π/12). And in order to do that, we’ll need to find
sin(π/3 − π/4), and in order to do that, we’ll need to have a formula for sin(α − β). Luckily, we can just
use the same method (or a similar method) we used a little bit ago:
6
The word “modulo” in this context means “they are the same except for a horizontal shift.” Another similar usage might
be, “my blue 1999 minivan is the same as your blue 2001 minivan, modulo the model year.” I have been part of a small cult
movement that has been trying, for the past few years, to introduce this word (originally from math) into the popular lexicon.
Here’s another example, from an email my dad sent a few days ago: “Actually, life in Switzerland is not so different from life
at home, modulo a few obvious things like living without a car, struggling to communicate with people on the street, and
being dirt poor.”
sin(α − β) = sin(α + (−β)) (just algebra!!!)
= cos(α) sin(−β) + sin(α) cos(−β) (by the formula we just came
up with)
= cos(α) sin(−β) + sin(α) cos(β) (we know cos(−θ) = cos(θ))
= cos(α)(− sin(β)) + sin(α) cos(β) (and sin(−θ) = − sin(θ))
= − cos(α) sin(β) + sin(α) cos(β) (distributing negative)
= sin(α) cos(β) − cos(α) sin(β) (rearranging)
So there we have it: all of our sum and difference identities!!! Let’s summarize:
Sum and Difference Identities:
sin(α + β) = sin(α) cos(β) + cos(α) sin(β)
sin(α − β) = sin(α) cos(β) − cos(α) sin(β)
cos(α + β) = cos(α) cos(β) − sin(α) sin(β)
cos(α − β) = cos(α) cos(β) + sin(α) sin(β)
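Here is a small sketch, in Python, that spot-checks all four identities at a thousand random pairs of angles. The helper name `identities_hold`, the tolerance, and the range of angles are all assumptions chosen for this illustration. A numerical check like this can catch a sign error, but it is no substitute for the derivations above.

```python
import math
import random


def identities_hold(alpha, beta, tol=1e-12):
    """Compare both sides of each sum/difference identity at one (alpha, beta)."""
    sa, ca = math.sin(alpha), math.cos(alpha)
    sb, cb = math.sin(beta), math.cos(beta)
    pairs = [
        (math.sin(alpha + beta), sa * cb + ca * sb),
        (math.sin(alpha - beta), sa * cb - ca * sb),
        (math.cos(alpha + beta), ca * cb - sa * sb),
        (math.cos(alpha - beta), ca * cb + sa * sb),
    ]
    return all(abs(left - right) < tol for left, right in pairs)


random.seed(0)  # fixed seed so the run is repeatable
all_ok = all(
    identities_hold(random.uniform(-10, 10), random.uniform(-10, 10))
    for _ in range(1000)
)
print("all four identities held at every sampled pair:", all_ok)
```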
Formulae that follow like a tedious argument... we still have this overwhelming question: what’s the
sine of π/12? Now we can figure it out!!! Using the same method that we used to find cos(π/12), we can
split π/12 up into π/3 − π/4, and then apply the identity we just derived:
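\begin{align*}
\sin(\pi/12) &= \sin(\pi/3 - \pi/4) && \text{(fractions)} \\
&= \sin(\pi/3)\cos(\pi/4) - \cos(\pi/3)\sin(\pi/4) && \text{(by our equation!!!)} \\
&= \frac{\sqrt{3}}{2}\cdot\frac{1}{\sqrt{2}} - \frac{1}{2}\cdot\frac{1}{\sqrt{2}} && \text{(trig)} \\
&= \frac{-1 + \sqrt{3}}{2\sqrt{2}} && \text{(fractions)}
\end{align*}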
Whoopee!!!! Now we know both the sine and cosine of π/12! But wait: something’s interesting here. We
know
\[ \cos(\pi/12) = \frac{1 + \sqrt{3}}{2\sqrt{2}} \quad\text{and}\quad \sin(\pi/12) = \frac{-1 + \sqrt{3}}{2\sqrt{2}} \]
And we also know that, at least with angles between 0 and π/2, cosine and sine are the ratios of the
adjacent and opposite sides of a right triangle to its hypotenuse!
\[ \cos(\pi/12) = \frac{1 + \sqrt{3}}{2\sqrt{2}} = \frac{\text{adj}}{\text{hyp}} \quad\text{and}\quad \sin(\pi/12) = \frac{-1 + \sqrt{3}}{2\sqrt{2}} = \frac{\text{opp}}{\text{hyp}} \]
So we can construct a new special right triangle, whose⁷ hypotenuse is $2\sqrt{2}$, whose side adjacent to π/12
is $1 + \sqrt{3}$, and whose side opposite to π/12 is $-1 + \sqrt{3}$! (Note, then, that the other acute angle in the
triangle will be 5π/12, since the two acute angles have to add up to π/2.) THIS IS SO COOL.
7
I know I should be writing “the hypotenuse of which is...”, but I’d rather anthropomorphize my triangles.
So this is what all of our work comes down to. We did this huge derivation to find the sum
and difference identities, and then we used the sum and difference identities to evaluate the sine and
cosine of π/12, and then we used that to build this special right triangle. And now we can evaluate the
sine/cosine/tangent of any multiple of π/12. Just using this new special right triangle. Phew.
This was fun.
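For the skeptical: a short Python sketch (variable names are my own, chosen for this example) that checks whether the new special right triangle really is a right triangle, and whether its small angle really is π/12. It also reads the sine of the other acute angle, 5π/12, straight off the triangle.

```python
import math

# Side lengths of the new special right triangle, as found above
adjacent = 1 + math.sqrt(3)    # side next to the pi/12 angle
opposite = -1 + math.sqrt(3)   # side across from the pi/12 angle
hypotenuse = 2 * math.sqrt(2)

# Pythagorean check: adj^2 + opp^2 should equal hyp^2
print("adj^2 + opp^2 =", adjacent**2 + opposite**2)
print("hyp^2         =", hypotenuse**2)

# The angle whose tangent is opp/adj should be pi/12
small_angle = math.atan2(opposite, adjacent)
print("small angle   =", small_angle)
print("pi/12         =", math.pi / 12)

# From the other acute angle's point of view, the roles of adj and opp swap
sin_of_other_angle = adjacent / hypotenuse
print("sin(5pi/12) from the triangle:", sin_of_other_angle)
print("sin(5pi/12) from math.sin:    ", math.sin(5 * math.pi / 12))
```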
Problems
Using what you know about trigonometry, the unit circle, and special right triangles—including our brand-
new special right triangle!—evaluate the following trig functions without a calculator:
 1. sin(π/12)      12. tan(11π/12)     23. cos(23π/12)
 2. cos(π/12)      13. sin(13π/12)     24. tan(23π/12)
 3. tan(π/12)      14. cos(13π/12)     25. sin(25π/12)
 4. sin(5π/12)     15. tan(13π/12)     26. cos(25π/12)
 5. cos(5π/12)     16. sin(17π/12)     27. tan(25π/12)
 6. tan(5π/12)     17. cos(17π/12)     28. sin(9,456,342π)
 7. sin(7π/12)     18. tan(17π/12)     29. cos(456,093,235π)
 8. cos(7π/12)     19. sin(19π/12)     30. tan(300,564,222π)
 9. tan(7π/12)     20. cos(19π/12)     31. sin(349π)
10. sin(11π/12)    21. tan(19π/12)     32. cos(557,563π)
11. cos(11π/12)    22. sin(23π/12)     33. tan(1137π)
Intuitive Slope Sketching
Sketch the slope of each of the following functions on the same set of axes (i.e., on the worksheet). (Obvi-
ously, it does not need to be exactly to scale, but you should include the key features.) For the graphs of
polynomials: do you notice any relationship between the degree of the function and the degree of its slope?
[Graphs 1 through 44 appear here on the original worksheet; the figures are not reproduced.]
The Derivative of $x^n$ is $nx^{n-1}$
In all of the slopes we’ve drawn and computed so far, you’ve probably noticed a pattern. Visually, the
slope of a polynomial appears to be one degree less than the polynomial itself. Algebraically, you’ve seen
that:
\[ \frac{d}{dx}(x^2) = 2x \qquad \frac{d}{dx}(x^3) = 3x^2 \qquad \text{etc.} \]
Is this pattern more than just a coincidence? Is it true, in the general case, that $\frac{d}{dx}(x^n) = nx^{n-1}$,
for any value of n?
Let’s find out. In principle, we don’t have to do anything particularly different than what we did to
differentiate $x^2$ and $x^3$. All we need to do is take $x^n$, plug it into Fermat’s difference quotient, fiddle a bit,
send h to 0, and voila! we should get $nx^{n-1}$:
\[ \frac{(x + h)^n - x^n}{h} \]
In practice, of course, we have the substantial complication that our exponent is not 2 or 3 or some definite
number. Our exponent is n. But how do we simplify this? When we had $(x + h)^2$, we were able to multiply
it out; when we had $(x + h)^3$, we were able to multiply that out, too. But what if I want to multiply (x + h)
by itself not once or twice but n times? Clearly, there is a pattern, but what is it?
You might recall that we discussed this earlier in the year. We can distill that pattern and write it as
an iterated sum (the things with the giant Σ), in a formula known as the binomial theorem:
\begin{align*}
(a + b)^0 &= 1 \\
(a + b)^1 &= a + b \\
(a + b)^2 &= a^2 + 2ab + b^2 \\
(a + b)^3 &= a^3 + 3a^2 b + 3ab^2 + b^3 \\
&\ \ \vdots \\
(a + b)^n &= \sum_{k=0}^{n} \binom{n}{n-k} a^{n-k} b^k
\end{align*}
Now, this looks very scary. There’s a giant Greek letter in the middle of the page. But it’s not scary. All
we’re going to do in this proof is do the same thing we did with $x^2$ and $x^3$. The only difference is that
we’re going to use a slightly fancier tool: a giant Σ rather than normal arithmetic.
It is the difference between using a hammer to crush a small stone, and using a jackhammer to tear
apart asphalt. The jackhammer looks huge and expensive and scary, and it runs on gasoline, not adenosine
triphosphate, and you have to borrow it from your cousin in the demolition business. But in principle, it
is no different than the hammer. It operates by exactly the same mechanism: hitting things with a blunt
object. It is the same with the binomial theorem: it looks different, but it’s really just the same thing.
There are some extra things we need to do as a result of its presence (get a permit, fill it with gas, put on
safety glasses, etc.) but the core idea of the proof is identical. We are going to expand $(x + h)^n$, simplify
it, cancel x’s and h’s out, and then send h to zero.
Now. As to the binomial theorem itself. It is only true, of course, for n being some nonnegative
integer, like 0, 1, 2, 3 . . . And remember that the $\binom{n}{n-k}$ is a combination. Informally, we can think of a
combination like $\binom{a}{b}$ as being the number of different ways we can select b things from a collection of a
things, with the order being irrelevant. Formally, we define it as
\[ \binom{a}{b} = \frac{a!}{b!(a - b)!} \]
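If the giant Σ still feels suspicious, you can have a computer multiply things out and compare. The following is a minimal Python sketch, assuming Python 3.8 or later (for `math.comb`); the function names `binomial_sum` and `combination_from_formula` are invented for this example. It uses whole numbers for a and b so that the comparison is exact.

```python
from math import comb, factorial


def binomial_sum(a, b, n):
    """Right-hand side of the binomial theorem, indexed as in the text:
    the sum over k = 0..n of C(n, n-k) * a^(n-k) * b^k."""
    return sum(comb(n, n - k) * a ** (n - k) * b ** k for k in range(n + 1))


def combination_from_formula(a, b):
    """'a choose b' straight from the factorial definition a! / (b! (a-b)!)."""
    return factorial(a) // (factorial(b) * factorial(a - b))


# Compare the sum against (a + b)^n for a = 3, b = 5 and small n
for n in range(6):
    print(n, binomial_sum(3, 5, n), (3 + 5) ** n)

# The library's combination and the factorial formula should agree
print(comb(5, 2), combination_from_formula(5, 2))
```

Each row of the loop should show the same number twice.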
So if we apply this to our problem, we have
\[ \frac{(x + h)^n - x^n}{h} = \frac{\displaystyle\sum_{k=0}^{n} \binom{n}{n-k} x^{n-k} h^k - x^n}{h} \]
or just
\[ = \frac{1}{h}\left( \sum_{k=0}^{n} \binom{n}{n-k} x^{n-k} h^k - x^n \right) \]
Which is not immediately more helpful. In fact, it looks worse. But what you may have noticed from
working out the derivatives of $x^2$, $x^3$, and so forth, is that after you multiply out $(x + h)^2$ (or to the 3),
you get an $x^2$ (or $x^3$) term, which then cancels out with the $-x^2$ (or $-x^3$) term later on. We want to do
the same thing here. We want to somehow extract an $x^n$ term from this sum, such that we can get rid
of the $-x^n$. We can do it in this way: we can partially break up the sum. Meaning: we have this sum,
consisting of n + 1 terms from k = 0 to k = n. What if we extract the first term (the term where k = 0)?
Then we would have:
\begin{align*}
\sum_{k=0}^{n} \binom{n}{n-k} x^{n-k} h^k
&= \underbrace{\binom{n}{n-0} x^{n-0} h^0}_{\text{the } k=0 \text{ term}}
 + \underbrace{\sum_{k=1}^{n} \binom{n}{n-k} x^{n-k} h^k}_{\text{now our sum needs to start at } k=1} \\
&= \binom{n}{n} x^{n-0} h^0 + \sum_{k=1}^{n} \binom{n}{n-k} x^{n-k} h^k \\
&= x^n + \sum_{k=1}^{n} \binom{n}{n-k} x^{n-k} h^k
\end{align*}
I’ve condensed a whole lot of work into two steps, so stare at this for a while (and try to do the intermediate
steps yourself on scrap paper, if you need to) until you see it. Note that I worked out the combination:
$\binom{n}{n} = 1$. You could either do this from the formula, or by realizing that if you have a collection of n objects,
there’s only one way you can pick n of them (if the order doesn’t matter).
If we take what we just learned, and apply it to our derivative formula, we’ll have:
\[ = \frac{1}{h}\left( \left( x^n + \sum_{k=1}^{n} \binom{n}{n-k} x^{n-k} h^k \right) - x^n \right) \]
And THEN we can cancel out the lonely $x^n$’s on the right side and get:
\[ = \frac{1}{h}\left( \sum_{k=1}^{n} \binom{n}{n-k} x^{n-k} h^k \right) \]
or just
\[ = \frac{1}{h} \sum_{k=1}^{n} \binom{n}{n-k} x^{n-k} h^k \]
Then we can bring the 1/h inside of the sum. We can do this because a sum is just a bunch of things
added together, and a giant Σ is like a pair of parentheses around them; we’re just distributing the 1/h
to the inside.
\[ = \sum_{k=1}^{n} \binom{n}{n-k} \frac{x^{n-k} h^k}{h} \]
But using laws of exponents we can simplify this to:
\[ = \sum_{k=1}^{n} \binom{n}{n-k} x^{n-k} h^{k-1} \]
And it’s not entirely clear where we go from here... but wait! What if we try that same fancy trick we
tried before, where we pull out the first term of the sum? We can extract the k = 1 term, and rewrite this
as:
\[ = \binom{n}{n-1} x^{n-1} h^{1-1} + \sum_{k=2}^{n} \binom{n}{n-k} x^{n-k} h^{k-1} \]
If we use the formula for $\binom{n}{n-1}$, we can discover that it’s just n.
\[ \binom{n}{n-1} = \frac{n!}{(n-1)!\,(n-(n-1))!} = \frac{n!}{(n-1)! \cdot 1!} = \frac{n!}{(n-1)!} = \frac{n \cdot (n-1) \cdot (n-2) \cdot (n-3) \cdots}{(n-1) \cdot (n-2) \cdot (n-3) \cdots} = n \]
Which should make sense: if I have a collection of n objects, there are n different ways I can choose n − 1
objects from it (I just systematically exclude one of the objects). So then this becomes:
\[ = n x^{n-1} h^0 + \sum_{k=2}^{n} \binom{n}{n-k} x^{n-k} h^{k-1} \]
or just:
\[ = n x^{n-1} + \sum_{k=2}^{n} \binom{n}{n-k} x^{n-k} h^{k-1} \]
ALMOST DONE. See the sum? The sum starts at k = 2, which means that the lowest-powered h inside
the sum will be $h^{2-1} = h^1 = h$. Meaning that if we were to write this out, we’d get something like:
\[ = n x^{n-1} + (\text{blah})h + (\text{blah blah})h^2 + (\text{blah blah blah})h^3 + \cdots \]
And so then when h gets really small, all of that other stuff will go away! As h goes to 0, it will drag the
rest of the sum with it, down into the depths of nothingness! And all that will be left, when h goes to 0,
is the exact slope of $x^n$, the derivative, or just
\[ = n x^{n-1} \]
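To watch the leftover (blah)h terms fade away, here is a small numerical sketch in Python. The function name `difference_quotient` and the particular choices n = 5 and x = 2 are just assumptions for the sake of a concrete example; nothing about them is special.

```python
def difference_quotient(n, x, h):
    """Fermat's difference quotient for x^n: ((x + h)^n - x^n) / h."""
    return ((x + h) ** n - x ** n) / h


n, x = 5, 2.0
power_rule_value = n * x ** (n - 1)  # what n x^(n-1) predicts

# As h shrinks, the quotient should creep toward the power-rule value
for h in (0.1, 0.01, 0.001, 0.0001):
    print(h, difference_quotient(n, x, h))

print("n x^(n-1) =", power_rule_value)
```

The printed quotients should get closer and closer to the last line as h gets smaller, because everything other than the $nx^{n-1}$ term carries at least one factor of h.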
Problem
Strictly speaking, we’ve only proven that $(x^n)' = nx^{n-1}$ when n is a positive integer (or zero). Our
procedure was based on combinatorics and the binomial theorem, which are all based on the natural
numbers (0, 1, 2, 3, . . . ). (What would it mean to “choose −2 objects from a group of π objects?” What is
7.2 factorial?) There is nothing wrong with this approach; it’s just that we’d like to be able to prove that
$(x^n)' = nx^{n-1}$ for ALL real-numbered values of n. But what if n is a negative integer? What if we want
to find the derivative of, say, $x^{-1}$ (i.e., 1/x)? Or $x^{-2}$ (i.e., $1/x^2$)? What if n is a fraction? What if n is an
irrational number? We’d like to prove that:
\[ \frac{d}{dx}(x^{\pi}) = \pi x^{\pi - 1} \qquad \frac{d}{dx}(x^{3/2}) = \frac{3}{2} x^{1/2} \]
\[ \frac{d}{dx}(x^{-1}) = \frac{d}{dx}\left(\frac{1}{x}\right) = -1x^{-2} = \frac{-1}{x^2} \]
and so forth.
There is a very beautiful proof for the case of n being any real number that uses the derivatives of
a logarithm. It only takes two or three lines and is very elegant. But you don’t know the derivative of a
logarithm yet, and it takes a bit of work to get there.
On the other hand, it is not too difficult to generalize our binomial-theorem-based proof to negative
values of n. Imagine that we want to prove that $\frac{d}{dx}(x^n) = nx^{n-1}$ for n being some negative integer. This
is the same as proving that $\frac{d}{dx}\left(\frac{1}{x^n}\right) = \frac{-n}{x^{n+1}} = -nx^{-n-1}$ for some positive integer n. So then, if we use
our definition of a derivative, we’ll have
\[ \frac{\dfrac{1}{(x + h)^n} - \dfrac{1}{x^n}}{h} \]
which is
\[ \frac{1}{h}\left( \frac{1}{(x + h)^n} - \frac{1}{x^n} \right) \]
which, if we substitute in the binomial theorem, will become
\[ \frac{1}{h}\left( \frac{1}{\displaystyle\sum_{k=0}^{n} \binom{n}{n-k} x^{n-k} h^k} - \frac{1}{x^n} \right) \]
For extra credit, finish this proof, write it up, and turn it in on the Monday after Thanksgiving.
(The next step, in parallel to our derivation of the derivatives of 1/x and 1/x2 , is to combine the two
fractions atop a common denominator. This will get messy, but you certainly are able to do it.) When
your obnoxious relatives ask during Thanksgiving dinner what you’re doing in school (“What grade are
you in now? 9th?”), you can pause for a moment, as if to wonder how to explain your work to the laity,
and then say, “I’m trying to use the binomial theorem to differentiate $x^n$ for negative n.” And you will
sound like a graduate-student-in-the-making.
If you do write it up, please take the time to write it up nicely—don’t hand in a paper full of
scratch work. Use sentences, explain your methodology, and justify each of your steps in English and in
math. Somewhere on the internet there’s an excellent four-page article entitled “How to Write Math in
Paragraph Style,” by Tim Hsu. Read it if you want an idea of what I mean. Either Google it, or find it at
http://www.math.sjsu.edu/~hsu/
Limits
So far, we have been quite good at finding the derivatives of functions by plugging the functions into
Fermat’s difference quotient, rearranging, and then letting h = 0. This is how we found the derivative of
$x^2$ and 1/x; in fact, our proof of the formula $\frac{d}{dx} x^n = nx^{n-1}$ told us that this general method works for
any rational function¹, insofar as we had to do nothing special about the h in that proof.
So let’s move on to other functions. What about trig functions? We can see, just by sketching it, that
the derivative of sin(x) looks kind of like cos(x) (potentially with a vertical expansion/compression):
But can we calculate it? Presumably it’s not too hard. We’ll just plug it into Fermat’s little machine,
and then see what happens when h gets small:
\[ \frac{\sin(x + h) - \sin(x)}{h} \]
But, well, this looks kind of nasty. Unlike with, say, $x^2$, we can’t “multiply out” the sin(x + h), or distribute
it. It’s certainly not equal to sin(x) + sin(h). But we can simplify it using that formula we proved in trig:
\[ \sin(\alpha + \beta) = \sin(\alpha)\cos(\beta) + \cos(\alpha)\sin(\beta) \]
If we apply that here, we’ll get:
\[ \frac{\sin(x + h) - \sin(x)}{h} = \frac{\cos(x)\sin(h) + \sin(x)\cos(h) - \sin(x)}{h} \]
which, if I do some rearranging, is
\[ = \cos(x)\,\frac{\sin(h)}{h} - \sin(x)\,\frac{1 - \cos(h)}{h} \]
But then what do we do? As h gets really small, sin(h)/h gets close to... 0/0? And (1 − cos h)/h also
approaches 0/0? What does that mean? 0/0 is undefined. It isn’t anything. It’s a divide-by-zero error.
Um. This is bad.
Really, really, really bad.
We want to take the derivative of sine. We want to find its slope. But, uh. This is the equivalent of
going to the grocery store and accidentally running over someone with your car. It was just a quick trip
to the store! Nothing bad was supposed to happen! We were supposed to find the derivative of sin(x) as
easily as we found the derivative of $x^2$!
1
Or polynomial, since polynomials are just a type of rational function.
We are faced with a grave problem: we cannot get the sin(h)/h and the (1 − cos h)/h to go away!
We can find the average slope of sine between x and x + h, but if we want to find the exact slope (the
derivative) we get this divide-by-zero error. Both the top and the bottom go to zero. This is bizarre.
What sort of exotic derivative could result? What strange creature could it be? And yet we can see, from
our sketch, that the derivative should not be exotic at all—it should be just cosine, or some variant thereof.
Maybe it’s just that sine doesn’t have a derivative. That it’s weird? That, for some reason, there’s
something about it that prevents us from taking its derivative? But it’s not that weird. It just goes up
and down and up and down and up and down ad infinitum. What could be more boring? Surely it must
have a derivative. I mean, we have a picture of it! The derivative should be cosine, or at least some close
relative of cosine! And yet...
Maybe the problem is more fundamental. Consider Fermat’s difference quotient. It gives us the
average slope of a function between two points, x and x + h. What happens as h gets really small?
\[ \frac{f(x + h) - f(x)}{h} \xrightarrow{\text{as } h \text{ gets really small}} \frac{f(x) - f(x)}{0} = \frac{0}{0} \]
WE GET A DIVIDE-BY-ZERO ERROR. Always. Whenever we use Fermat’s difference quotient. This is
not a problem specific to sine. IT WAS THERE ALL ALONG! With the other functions we dealt with,
we were able to shove this problem under the carpet, behind the couch, into the closet. With $x^2$ and 1/x
and whatnot we were able to cancel things out and fiddle with it so that we could make h = 0 without
creating any divide-by-zero problems. BUT WE WERE DIVIDING BY ZERO ALL ALONG. Our very
fundamental formulation of a derivative creates a divide-by-zero error. This is bad. Extremely bad. We
can’t make it zero, because then we’d be dividing by zero, and we can’t not make it zero, because then we
wouldn’t have the precise derivative—we’d just have the average slope between two points on the function.
This is the same problem, note, as “how can a single point have a slope?” If h = 0, then we effectively
are finding the slope at not two points x and x + h, but at one point, x.
Clearly we need to better understand what we actually are doing when we say “as h gets small” or “as
h goes to 0”. Obviously we can’t just be plugging in zero. But then what? Can we never take derivatives?
That would be bad. Is everything we know wrong? That, too, would be bad.
Perhaps we should first ask ourselves: what does happen when we divide by zero? Recall that there
are two possibilities.
Usually what happens is that things become bizarre: we get vertical asymptotes. Consider, for
example, $1/x^2$.
What’s $1/x^2$ doing at x = 0? Nothing. It doesn’t exist at 0. But really near x = 0, it’s
getting really big. The closer and closer we get to x = 0, the bigger and bigger $1/x^2$ gets. It doesn’t really
exist at 0, unless you want to say that’s infinity², but we do have a sense of what it’s doing near 0. And
this is the general trend: if, in the course of some function, we have (some number)/0, we get a vertical
asymptote.
But that’s not the only option. Consider, for example:
\[ f(x) = \frac{5x^2 - 3x - 2}{x - 1} \]
2
though that causes some complications if you do
This doesn’t exist at x = 1. But it doesn’t have a vertical asymptote at x = 1. Instead, it has a hole,
because if we factor it, we get:
\[ f(x) = \frac{5x^2 - 3x - 2}{x - 1} = \frac{(5x + 2)(x - 1)}{(x - 1)} \]
What’s going on is that 99% of the time, this function looks exactly like y = 5x + 2. Everywhere other
than x = 1, the (x − 1) factors will cancel out and go away and we’ll just have 5x + 2. But at x = 1, the
(x − 1) factors will give us 0/0—and that’s certainly not 1. 0/0 doesn’t cancel out. It causes the universe
to explode. So we get a hole at that point. At that instant that x = 1, the function blips out of existence,
due to the divide-by-zero error. It wants to exist at x = 1. It should exist at x = 1. We can make it as
close to x = 1 as we want, so long as we don’t make it actually 1.
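You can poke at this hole yourself. Below is a minimal Python sketch (the sample x-values are arbitrary choices of mine) that evaluates f close to x = 1 on both sides, and then tries to evaluate it at x = 1 exactly:

```python
def f(x):
    """The function from the text, left unsimplified on purpose."""
    return (5 * x**2 - 3 * x - 2) / (x - 1)


# Near x = 1 (but not at it), f behaves like 5x + 2, so it hovers near 7
for x in (0.9, 0.99, 0.999, 1.001, 1.01, 1.1):
    print(x, f(x))

# At x = 1 exactly, we are asking for 0/0, and Python refuses
try:
    f(1)
except ZeroDivisionError:
    print("f(1) is undefined: divide-by-zero error")
```

The values on either side of 1 should crowd in toward 7, which is what 5x + 2 would be at x = 1, even though the function itself never gets to exist there.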
And this is exactly what is going on with our derivative. sin(h)/h turns into 0/0 when h = 0;
(1 − cos h)/h also turns into 0/0 when h = 0. They both have holes. Which, in a sense, is a good thing.
See, it would be a problem if they had, like, vertical asymptotes, because then what would the derivative
be? infinite? undefined? But the thing about a hole is that... all it does is affect that one tiny point. With
a vertical asymptote, not only is the function undefined at that point, it’s shooting up to infinity (or −∞,
or something) nearby. The divide-by-zero error affects a whole region of the function. But with a hole, we
can put it back when no one’s looking, and pretend that it never was a hole. With $f(x) = \frac{5x^2 - 3x - 2}{x - 1}$, we
can just cancel out the (x − 1)’s, get f(x) = 5x + 2, and no one else will know. You’ve read the Inferno,
right? Does Dante assign to a circle of hell the people who fix removable discontinuities in functions (which
is the technical name for “pretending a function with a hole doesn’t have a hole”)? Of course not.
Anyway. Let’s talk about sin(h)/h first. It has a hole at h = 0. It doesn’t exist when h = 0. But it
wants to exist at h = 0. It should exist there. The only question is, what should it be? It wants to be
something at h = 0, but what?
We’re already committing one sin, so we may as well commit another. Let’s use a calc***tor. If we
graph sin(h)/h, we get something like this:
(Using the calculator is going to be our little secret, OK? Don’t tell your parents about it.) So as h gets
closer and closer to 0, sin(h)/h gets closer and closer to some number. That number, in fact, though I
haven’t labelled it on this graph, is 1. It gets closer and closer to 1. It doesn’t actually reach 1, but it does
get close (infinitesimally close?)
Thus, if we’re trying to find the derivative of sin(x) (which we are), we can say that as h gets smaller
and smaller, sin(h)/h gets closer and closer to 1, and so the first half of our derivative will be:
\[ = \cos(x) \cdot \frac{\sin(h)}{h} - \sin(x) \cdot \frac{1 - \cos(h)}{h} \]
\[ \downarrow \ \text{as } h \text{ gets small} \]
\[ \cos(x) \cdot 1 - (\text{etc.}) \]
But we still need to figure out what (1 − cos h)/h turns into as h gets really, really small. If we look at a
graph of it, we see:
(1 − cos h)/h does not actually exist when h = 0. But as h gets closer and closer to 0, (1 − cos h)/h
gets closer and closer to 0. So then we can say:
\begin{align*}
(\sin x)' &= \frac{\sin(x + h) - \sin(x)}{h} \\
&= \frac{\cos(x)\sin(h) + \sin(x)\cos(h) - \sin(x)}{h} \\
&= \cos(x) \cdot \frac{\sin(h)}{h} - \sin(x) \cdot \frac{1 - \cos(h)}{h} \\
&\qquad \downarrow \ \text{as } h \text{ gets small} \ \downarrow \\
&\phantom{=}\ \cos(x) \cdot 1 - \sin(x) \cdot 0 \\
&= \cos(x)
\end{align*}
And so the derivative of sin(x) is just cos(x). !!!
(Of course, this is not really a proof, since it relies on these graphs. But it gives you an idea of
what’s going on when we take derivatives—not just the derivative of sine, but any derivative, since, due to
Fermat’s difference quotient, we will always have this 0/0 problem.)
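If you don’t trust the graphs (or the calc***tor), a few lines of Python make the same point with numbers. This is only a sketch, with sample values of h and a sample point x = 1 picked arbitrarily; like the graphs, it is evidence rather than proof.

```python
import math

# Watch the two troublesome ratios as h shrinks toward 0
for h in (1.0, 0.1, 0.01, 0.001):
    sin_ratio = math.sin(h) / h
    cos_ratio = (1 - math.cos(h)) / h
    print(h, sin_ratio, cos_ratio)

# Compare the average slope of sine near x = 1 with cos(1)
x, h = 1.0, 1e-6
average_slope = (math.sin(x + h) - math.sin(x)) / h
print("average slope of sin near x = 1:", average_slope)
print("cos(1):                         ", math.cos(x))
```

The first ratio should drift toward 1 and the second toward 0, and the average slope should land very close to cos(1).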
More generally, though, what is it we are doing here? What is it that we mean when we say “as h
gets really, really small”? What is this value of h that can simultaneously be small enough to be 0, but
not so small that we can’t divide by it?
We need to better understand what’s going on here. And in order to do so, we’re going to have to
take a bit of a detour. Well—detour isn’t quite the right word. Because there’s no better way. Ever had
it happen when you’re crossing the Khyber Pass that there’s a rockslide and part of the road is blocked
with boulders? Well, sometimes, you can get around simply by scrambling over the boulders, and then
calling a taxi from Kabul on the other side. But sometimes the rocks are too steep and too unstable. It’s
too dangerous. And so you have to get in your car, put it in reverse, and take an entirely different route.
That’s what we need to do now. We’re still trying to get to this wonderful world where we can see
derivatives in their full glory. But we can’t go up this canyon any further. We’d never make it to the top
of the mesa. We need to go back and take a different route. We need to figure out what’s actually going
on when we take a derivative. And I guess we should start here:
The usual gloss on calculus is that despite all the formulas and the formalism, it’s about just two
ideas: slopes and areas. But in a more fundamental sense, calculus is about just one idea: what happens
when we tangle with the infinite?
Slopes and areas, you see, are just applications of that tangling. We find the slope of a curvy line by
considering it as an infinite number of infinitely-short straight lines; we find the area of a curvy shape by
splitting it up into an infinite number of infinitely-small rectangles. The idea underlying all of this is the
idea of the infinite.
This gives us two interwoven questions:
1. What happens to things as they become infinite?
2. What happens to things as they become infinitesimal?
The somewhat less dangerous way to phrase this—because if you start using the word “infinite,” certain
theologians and philosophers get upset—is:
1. What happens to things as they get really, really big?
2. What happens to things as they get really, really small?
I could ask:
What is the largest number³ smaller than 2?
or analogously:
What is the smallest number greater than 2?
These two questions seem different. But they’re not.
Implicitly, we have been talking about this all year. That’s all we talk about when we talk about
asymptotes. Consider, for instance, $1/x^2$. This has both a vertical asymptote (at x = 0) and a horizontal
asymptote (at y = 0):
By “it has a vertical asymptote at x = 0”, all we really mean is:

3
By “number” here I mean “real number.” Had I meant, say, “integer,” the answer would be obvious—1.
“as x gets closer and closer to 0, $1/x^2$ gets bigger and bigger.”
Or, put differently:
“as x gets infinitesimally close to 0, $1/x^2$ gets infinitely big.”
This is not new; this is what we have been talking about all year. But the word a mathematician would
use to describe this would be “limit.” As in:
“the limit of $1/x^2$ as x approaches 0 is ∞.”
Which we would write using the following notation:
\[ \text{either}\quad \lim_{x \to 0} \frac{1}{x^2} = \infty \quad\text{or:}\quad \frac{1}{x^2} \xrightarrow[\;0\;]{\;x\;} \infty \]
This is just a more formal notation and name for stuff we’ve been doing all along. (Another way to say
this would be that as x goes to 0, $1/x^2$ “increases without bound”.)
As to the horizontal asymptote... all we really mean when we say that $1/x^2$ has a horizontal asymptote
at y = 0 is that as x gets bigger and bigger, $1/x^2$ gets closer and closer to 0. It might or might not actually
get there at some point, but it wants to be 0. We could say this using a limit, too:
\[ \text{either}\quad \lim_{x \to \infty} \frac{1}{x^2} = 0 \quad\text{or:}\quad \frac{1}{x^2} \xrightarrow[\;\infty\;]{\;x\;} 0 \]
These two notations (the “lim” and the arrow) are equivalent; depending on the context, one might be
more useful than the other.
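Both of these limits are easy to watch happen. Here is a tiny Python sketch (the name `f` and the sample x-values are assumptions made for this example) that tabulates $1/x^2$ as x creeps toward 0 and as x gets large:

```python
def f(x):
    """The function 1/x^2 from the text."""
    return 1 / x**2


# As x gets closer and closer to 0, 1/x^2 gets bigger and bigger
for x in (0.1, 0.01, 0.001):
    print("x =", x, " 1/x^2 =", f(x))

# As x gets bigger and bigger, 1/x^2 gets closer and closer to 0
for x in (10.0, 100.0, 1000.0):
    print("x =", x, " 1/x^2 =", f(x))
```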
Here’s another example. What about $\lim_{x \to 5} x^3$? This is just another way of asking, “as x gets closer
and closer to 5, what does $x^3$ get closer and closer to?” Obviously the answer is 125:
\[ \lim_{x \to 5} x^3 = 125 \quad\text{or, written differently:}\quad x^3 \xrightarrow[\;5\;]{\;x\;} 125 \]
Moreover, when x actually is 5, $x^3$ actually is 125. But that last part is irrelevant: the limit doesn’t care
about what the function actually is at that point. It cares about what the function wants to be at that
point. Which might or might not be the same as what it actually is. $x^3$ wants to be 125 when x = 5, and
it actually is 125 when x = 5. It is successful in its quest.
Contrast this with the function
$$f(x) = \frac{x^3 (x - 5)}{x - 5}$$
This looks almost exactly the same as x³. They are identical in every respect, except that, at x = 5, this
function doesn’t exist. It just blips out of existence for a moment. It has a hole at x = 5. At every other
value of x, the x − 5 on top and the x − 5 on the bottom will cancel out and the function will look like
x³, except at x = 5, because then we’ll have 0/0, and that certainly doesn’t cancel out. It’s not equal to
1; it’s not equal to anything. It’s undefined.
However, when we’re really close to x = 5, the function still looks like x³. And so the closer and closer
we get to x = 5, the closer and closer this function will get to being 125. It never actually is 125: it has a
hiccup, and so just skips over that point. But it wants to be 125. It wants to be 125 so, so, so badly. And
so, just like with x³, the limit of this function as x approaches 5 is 125:
$\lim_{x \to 5} \frac{x^3 (x - 5)}{x - 5} = 125$ or, written differently: $\frac{x^3 (x - 5)}{x - 5} \xrightarrow[5]{x} 125$
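Here is a rough numerical sketch of that “hole,” in Python. The offsets are arbitrary choices of mine, and the name `defined_at_5` is just a flag I made up for the illustration. The point is that the function refuses to give a value at x = 5 itself, yet the values on either side crowd in on 125.

```python
# The function with a hole at x = 5: identical to x^3 everywhere else.

def f(x):
    return x**3 * (x - 5) / (x - 5)

# Step toward 5 from both sides (offsets chosen arbitrarily).
offsets = [0.1, 0.01, 0.001, 0.0001]
left = [f(5 - h) for h in offsets]
right = [f(5 + h) for h in offsets]

# At x = 5 itself we hit 0/0, which Python reports as an error.
try:
    f(5)
    defined_at_5 = True
except ZeroDivisionError:
    defined_at_5 = False

for h, lo, hi in zip(offsets, left, right):
    print(f"h = {h:<7g} f(5 - h) = {lo:.6f}   f(5 + h) = {hi:.6f}")
print("f(5) is defined:", defined_at_5)
```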
Apropos of x³: what’s the limit of x³ as x goes to ∞? Put differently, as x gets bigger and bigger
and bigger, what happens to x³? Obviously, x³ also gets bigger and bigger. It spikes up to ∞. We know
this already. Back when we were talking about polynomials, we called this the “end behavior.” But we
can state it just as well with limits:
$\lim_{x \to \infty} x^3 = \infty$ or, written differently: $x^3 \xrightarrow[\infty]{x} \infty$
Again, none of this is new. It’s just applying the name “limit” and giving some new symbols for
things we’ve already talked about.
Here’s another example. Consider the end behavior of a rational function (which we called, for the
most part, an “end asymptote”). We could formulate our entire theory of end asymptotes in this language
of limits. We know that if we have a rational function, its end asymptote will be the ratio of the leading
terms. So imagine we have a rational function composed of an n-degree polynomial on top and m-degree
polynomial on the bottom:
$$\frac{a_n x^n + a_{n-1} x^{n-1} + a_{n-2} x^{n-2} + \cdots}{b_m x^m + b_{m-1} x^{m-1} + b_{m-2} x^{m-2} + \cdots}$$
Then we know it will have an end asymptote at
$$y = \frac{a_n x^n}{b_m x^m}$$
Or, put differently: as x gets bigger and bigger, this function will get closer and closer to
$$y = \frac{a_n x^n}{b_m x^m}$$
Using the notation of a limit, we could say this as:
$$\lim_{x \to \pm\infty} \frac{a_n x^n + a_{n-1} x^{n-1} + a_{n-2} x^{n-2} + \cdots}{b_m x^m + b_{m-1} x^{m-1} + b_{m-2} x^{m-2} + \cdots} = \frac{a_n x^n}{b_m x^m} = \frac{a_n}{b_m} x^{n-m}$$
Or, using the other limit notation, that
$$\frac{a_n x^n + a_{n-1} x^{n-1} + a_{n-2} x^{n-2} + \cdots}{b_m x^m + b_{m-1} x^{m-1} + b_{m-2} x^{m-2} + \cdots} \xrightarrow[\pm\infty]{x} \frac{a_n x^n}{b_m x^m} = \frac{a_n}{b_m} x^{n-m}$$
(Note that here I’ve really compressed two limits into one—I’ve said that this is true either as x goes to
+∞ or −∞.)
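To make this concrete, here is a quick Python sketch using one made-up rational function, (2x³ + x)/(x² + 1), so n = 3, m = 2, and the ratio of leading terms is 2x. The function names `r` and `leading_ratio` and the sample points are hypothetical choices of mine. One reasonable way to check “the function looks like its leading-term ratio” is to see whether the quotient of the two heads toward 1 as x heads toward +∞ or −∞.

```python
# Hypothetical example: cubic over quadratic, so n - m = 1.

def r(x):
    return (2 * x**3 + x) / (x**2 + 1)

def leading_ratio(x):
    # (a_n / b_m) * x^(n - m) with a_n = 2, b_m = 1, n - m = 1
    return 2 * x

# March off toward +infinity, then toward -infinity.
sample_points = [10.0, 100.0, 1000.0, -10.0, -100.0, -1000.0]
ratios = [r(x) / leading_ratio(x) for x in sample_points]

for x, q in zip(sample_points, ratios):
    print(f"x = {x:<8g} r(x) / (2x) = {q:.8f}")
```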
Oh! Here’s something else. Consider the function 1/x:
What happens to this function as x goes to 0? Put differently, what is:

$$\lim_{x \to 0} \frac{1}{x} = \; ???$$
On the one hand, it’s spiking up to +∞ on the right side. But on the other hand, it’s also dropping down
into the abyss of −∞ on the left side. Which is it? Do we have to choose?4 Can we say that the limit is
4
Note that this is why I was using 1/x² as the example earlier: it doesn’t have this problem.
±∞? If we did that, we’d no longer be able to consider a limit as being a function5 . Let’s deal with this
by making the distinction between:
1. getting really, really close to a certain x value from the left side, and
2. getting really, really close to a certain x value from the right side.
In this case, as we get really, really close to x = 0 from the left side, 1/x gets really, really negative (it
goes down to −∞). But as we get really, really close to x = 0 from the right side, 1/x gets really, really
big (it goes to ∞). So let’s make the distinction in this way:
• When we get really, really close to an x-value from the left side, let’s call that the left-handed
limit, and symbolize it with a little superscript negative sign (because the left side is the negative
side of the graph). So in this case:
$$\lim_{x \to 0^-} \frac{1}{x} = -\infty$$
• When we get really, really close to an x-value from the right side, let’s call that the right-handed
limit, and symbolize it with a little superscript positive sign (because the right side of the graph is
the positive side). So in this case:
$$\lim_{x \to 0^+} \frac{1}{x} = +\infty$$
Of course, in many cases the left- and right-handed limits will be the same, and in those cases it makes
sense to talk about the limit, without differentiating between the chirality6 .
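A small Python sketch of the two one-sided limits of 1/x, for anyone who wants to see the numbers. The names `g`, `from_left`, and `from_right`, and the step sizes, are my own choices for illustration. Sneaking up on 0 from the left uses negative x-values; sneaking up from the right uses positive ones.

```python
# One-sided behavior of 1/x near x = 0.

def g(x):
    return 1 / x

# Distances from 0, shrinking (arbitrary illustrative choices).
steps = [10.0**-k for k in range(1, 6)]

from_left = [g(-h) for h in steps]   # x -> 0 from the negative side
from_right = [g(h) for h in steps]   # x -> 0 from the positive side

for h, lo, hi in zip(steps, from_left, from_right):
    print(f"h = {h:<7g} 1/(-h) = {lo:<12g} 1/(+h) = {hi:g}")
```

The left-hand values plunge downward while the right-hand values shoot upward, so there is no single number (or single infinity) that both sides agree on.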
Anyway, there are a bunch of problems at the end of these notes that will give you practice computing
limits. I’ll leave you with one last example of a limit, drawn from a different field of mathematics. See if
you can figure out what’s going on in this picture:
Anyway. Obviously none of what we have been doing has been particularly formal. “As x gets closer
and closer to 5, x³ gets closer and closer to 125.” “As x gets bigger and bigger, 1/x gets smaller and
smaller.” What does all of this mean? “Closer and closer”? “Bigger and bigger”? This does not seem
particularly mathematical. It sounds qualitative and subjective, not rigorous and logical. This is a valid
criticism. However, this reasoning can be made logical and rigorous. It is not fundamentally unsound. We
will make it logical and rigorous. But that is for next time.
(For a quick overview of the history of calculus, which is essentially the history of the resolution
of the problems we’ve described here, read “Infinitesimally Yours,” by Jim Holt, The New York Review
of Books, May 20, 1999.)
5
Which, note, it basically is—I plug a function into this “take the limit as x goes to blah” function, and get some answer.
6
Oh man. Awesome word. It just means “handedness,” essentially. Which clearly comes from the Greek. Also, did I not
just rock that pun with “differentiating”?
Problems
Evaluate the following limits:
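1. $\lim_{x \to \infty} 1/x$
2. $\lim_{x \to -\infty} 1/x$
3. $\lim_{x \to 2} 1/x$
4. $\lim_{x \to 0} 1/x^2$
5. $\lim_{x \to \infty} 1/x^2$
6. $\lim_{x \to -\infty} 1/x^2$
7. $\lim_{x \to 7} 1/x^2$
8. $\lim_{x \to 7} \frac{5x^2 - 3x - 2}{x - 1}$
9. $\lim_{x \to 1} \frac{5x^2 - 3x - 2}{x - 1}$
10. $\lim_{x \to \infty} \frac{5x^2 - 3x - 2}{x - 1}$
11. $\lim_{x \to -\infty} \frac{5x^2 - 3x - 2}{x - 1}$
12. $\lim_{x \to -\infty} e^x$
13. $\lim_{x \to \infty} e^x$
14. $\lim_{x \to 0} e^x$
15. $\lim_{x \to 3} e^x$
16. $\lim_{x \to 8} x^2$
17. $\lim_{x \to -\infty} x^2$
18. $\lim_{x \to -12} x^2$
19. $\lim_{x \to 0} \tan(x)$
20. $\lim_{x \to 0.5\pi^+} \tan(x)$
21. $\lim_{x \to 0.5\pi^-} \tan(x)$
22. $\lim_{x \to 0.25\pi} \tan(x)$
23. $\lim_{x \to 3} \frac{5}{(x - 1)^2}$
24. $\lim_{x \to 1} \frac{5}{(x - 1)^2}$
25. $\lim_{x \to 0} \frac{5}{(x - 1)^2}$
26. $\lim_{x \to 0} \frac{x + 4}{(x + 3)^3}$
27. $\lim_{x \to -3^-} \frac{x + 4}{(x + 3)^3}$
28. $\lim_{x \to -3^+} \frac{x + 4}{(x + 3)^3}$
29. $\lim_{x \to -4} \frac{x + 4}{(x + 3)^3}$
30. $\lim_{x \to \infty} \frac{x + 4}{(x + 3)^3}$
31. $\lim_{x \to -\infty} \frac{x + 4}{(x + 3)^3}$
32. $\lim_{x \to -4} \frac{x^2 + 11x + 18}{x^2 - 4}$
33. $\lim_{x \to 2} \frac{x^2 + 11x + 18}{x^2 - 4}$
34. $\lim_{x \to -2} \frac{x^2 + 11x + 18}{x^2 - 4}$
35. $\lim_{x \to -9} \frac{x^2 + 11x + 18}{x^2 - 4}$
36. $\lim_{x \to +\infty} \frac{x^2 + 11x + 18}{x^2 - 4}$
37. $\lim_{x \to -\infty} \frac{x^2 + 11x + 18}{x^2 - 4}$
38. $\lim_{x \to -\infty} \frac{(x - 4)(x - 5)}{(x - 4)^2}$
39. $\lim_{x \to +\infty} \frac{(x - 4)(x - 5)}{(x - 4)^2}$
40. $\lim_{x \to 4} \frac{(x - 4)(x - 5)}{(x - 4)^2}$
41. $\lim_{x \to 5} \frac{(x - 4)(x - 5)}{(x - 4)^2}$
42. $\lim_{x \to 12} \frac{(x - 4)(x - 5)}{(x - 4)^2}$
43. $\lim_{x \to 3} (2x + 1)$
44. $\lim_{x \to \infty} (2x + 1)$
45. $\lim_{x \to -\infty} (2x + 1)$
46. $\lim_{x \to 0^+} 1/x$
47. $\lim_{x \to 0^-} 1/x$
48. $\lim_{x \to 0^-} \sqrt{x}$
49. $\lim_{x \to 0^+} \sqrt{x}$
50. $\lim_{x \to 9} \sqrt{x}$
51. $\lim_{h \to 0} \frac{(x + h)^2 - x^2}{h}$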
$$\forall \varepsilon > 0 \;\; \exists \delta > 0 \;\; (|x - c| < \delta \Rightarrow |f(x) - L| < \varepsilon)$$
A Brief History of The Calculus:

1. Late 1660s. Isaac Newton (1643-1727) and Gottfried Leibniz (1646-1716) invent calculus, more-or-
less simultaneously. Though their methods have some differences, they are both essentially based
on the idea of the infinitesimal—of making the h in Fermat’s difference quotient infinitely small.
Calculus immediately proves to be immensely useful, but there is nervousness over the question of
“aren’t we just dividing by zero?”. The philosopher George Berkeley (1685-1753), better known for
his metaphysical positions, writes an entire treatise (c. 1734) attacking the foundations of calculus,
famously calling infinitesimals “ghosts of departed quantities.”
(a) The infinitesimal method has the added complication of involving the concept of “infinity,”
which brings in all sorts of philosophical and theological concerns. The prevailing belief until
the end of the 19th century is that anything involving “infinity” is ipso facto divine and thus
intractable. Infinitesimals, the belief is, do not exist.
2. The early-to-mid 19th century. After 150-plus years of foundational uncertainty and embarrassment,
calculus is finally saved when Augustin-Louis Cauchy (1789-1857), Bernhard Bolzano (1781-1848),
and Karl Weierstrass (1815-1897), among others, develop the concept of a limit and give it a rigorous
definition. Nowhere in their formulation of a limit do they use the idea of an infinitesimal—their
limit only involves variables becoming “closer and closer” or “arbitrarily close” to points. They then
define the derivative using a limit, and calculus is saved.
3. Mathematicians move on to other interesting questions. But around 1960, the mathematician
Abraham Robinson (1918-1974), later of UCLA, shows that infinitesimal numbers do actually exist: that in the same
way that we can add on to the integers to make the rational numbers, or add on to the real numbers to
make the complex numbers, we can add on infinitesimals to the real numbers (to make the hyperreals).
And then we can do calculus just in the way that Newton and Leibniz intended. (The actual proof
of the existence of infinitesimals is quite elaborate, and requires quite a bit of modern mathematics.)
Some More Background

Let’s remind ourselves of where we are. We wanted to find a way of finding the slopes of smooth
curves. So we derived Fermat’s difference quotient, (f(x + h) − f(x))/h, and then said that as h gets
smaller, the difference quotient (the approximate slope) approaches the derivative (the exact slope). But it
wasn’t quite clear what we meant by “as h gets smaller,” because we were basically just plugging in 0 for h.
And we soon realized that this was quite terrible, because it meant we were dividing by zero. So we tried
to fix this by elaborating on what we meant by “as h gets smaller.” We generalized this by considering
what a function f (x) gets close to as x gets close to something else, and we called this a “limit.” Thus, we
decided that we could formalize this concept of a derivative by including a limit in it; i.e., if we were to
take the limit of the difference quotient, we would have the derivative:
$$f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}$$
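As a reminder of what this looks like with actual numbers, here is a Python sketch of the difference quotient for f(x) = x² at x = 3. The names `difference_quotient` and `square`, the point x = 3, and the list of h-values are all my own choices for illustration. As h shrinks, the approximate slope settles toward 6, but plugging in h = 0 directly is exactly the division by zero we were worried about.

```python
# Fermat's difference quotient, evaluated for smaller and smaller h.

def difference_quotient(f, x, h):
    return (f(x + h) - f(x)) / h

def square(x):
    return x * x

# h-values shrinking toward 0 (but never equal to 0).
hs = [10.0**-k for k in range(1, 7)]
slopes = [difference_quotient(square, 3.0, h) for h in hs]

for h, s in zip(hs, slopes):
    print(f"h = {h:<8g} approximate slope = {s:.7f}")

# Setting h = 0 outright does not work: it is 0/0.
try:
    difference_quotient(square, 3.0, 0.0)
except ZeroDivisionError:
    print("h = 0: division by zero")
```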
But there is a problem with all this. Namely, we still don’t know what a limit is. I mean, we “know”
what it is, in an intuitive sense ($\lim_{x \to c} f(x)$ is what f(x) gets close to, as x gets close to c), but this is not a
particularly rigorous/formal/mathematical definition. We are still saying “close to,” and what does that
mean???
In a very serious way, then, we are in no better a place than we were when we realized that our
procedure for finding a derivative involves dividing by zero. We haven’t actually improved our situation.
We have a better grasp of what goes on when we take a derivative (the difference quotient has a hole at
h = 0, and so we find the presumptive y-value of the hole), but all we’ve really done to fix this problem—to
find the y-value of the hole—is to make up this nonsense about “closer and closer”. We’ve tried to talk
around the problem—we’ve said “well, h isn’t actually zero, blah blah, but it gets close to 0, and then the
difference quotient gets close to the derivative”—but this is really no different than what we started our
investigation of derivatives by saying. We are still saying that “h gets closer to 0,” and it is totally unclear
what this actually means, in a real, mathematical sense. It is totally unclear whether we actually can do
this, or whether, like some interlocutor in a Socratic dialogue, we have simply been building a grander and
grander rhetorical edifice on top of something that is, fundamentally, sophistry. We’ve given a fancy name
to this procedure—we’ve called it a “limit”—but we don’t actually understand it. Put differently: we still
don’t know that all of calculus isn’t a lie.
We would like to have some certainty. We’d like to know whether calculus works or not (and if it
works, why it works). As it turns out, calculus does work. There are two ways to explain it. One way is
to say that
1. infinitesimal numbers exist,

2. the h in the difference quotient is an infinitesimal number, and
3. thus, calculus works.
The other way is to say that
1. limits exist,
2. the derivative is an example of a limit (the limit of F.’s D.Q. as h → 0), and
3. thus, calculus works.
The second method is the method we have already begun developing, and which we will continue. This
is the method of the great 19th century mathematicians—Cauchy, Bolzano, Dedekind, Weierstrass—who
solved the 200-year-old problem of calculus’s foundations, and saved it from metaphysical gloom. We will
follow their method, not because it is necessarily better than the first method. In fact, I actually
believe that the first method, the method of developing calculus through infinitesimals, is somewhat
superior. (By the way, the formal name for that is “non-standard analysis,” because “analysis” is the
mathy word for “calculus.”) We will develop calculus using the method of limits
because
1. Nearly every other calculus class uses that method, as does every advanced “real analysis” class.
Thus, when you go on to take more calculus classes in college, you’ll be familiar with the method of
ε-δ proofs, and already good at them.
2. To actually prove that infinitesimal numbers exist requires a substantial amount of mathematics—far
more than what you guys know. However, to prove that limits exist is not as onerous—you are more
than capable of understanding them. Thus, you can actually understand, for yourselves, how this
remarkable problem was solved, whereas with the infinitesimal method, you’d have to be content
with me assuring you that infinitesimals do, in fact, exist.
3. That said, to actually understand the deep, foundational, mathematical machinery of a limit (i.e.,
why and how limits actually exist) is not particularly easy. The level of abstraction involved in the
formal definition of a limit is substantial. It will be difficult. It will be challenging. And this will be
good for you—it will be a brutal mathematical workout, from which you will emerge much stronger.
I guess there is actually a third way of developing calculus. Namely, we could
1. pretend that the h-gets-smaller/division-by-zero problem isn’t actually a problem, and

2. just continue finding formulas for derivatives and whatnot.
This is how we started doing calculus. And we were able to get pretty far—we found a formula for the
derivative of any polynomial, and we had an educated guess about the formulas for the derivative of sine
and cosine. In fact, this is how people did calculus for 200 years. They pretended that these very basic,
very foundational problems weren’t actually problems, and they plowed ahead, to much success. So in a
sense, whether we develop calculus using infinitesimals or limits or sophistry is irrelevant—because when
we actually do calculus, when we actually find formulas for derivatives and integrals and do word problems
and whatnot, none of this will actually come into play. It is, in a sense, useless.
But we are not concerned with utility. We are concerned with truth—with the truth of the calculus.
What We Talk About When We Talk About Limits

What do we mean when we say1 :
$\lim_{x \to c} f(x) = L$ or equivalently: $f(x) \xrightarrow[c]{x} L$
We mean:
“As x gets closer and closer to c,
f (x) gets closer and closer to L”
or, put differently:
“The closer that x gets to c,
the closer that f (x) gets to L.”
or, yet differently:
“We can make f (x) as close to L as we want,
simply by making x close enough to c.”
The important thing is, we’re never saying that x is infinitesimally close to c—we’re never using this
idea of the infinite. If you’re a 19th century mathematician, this is very important. But can we make
this more mathematical? Let’s start with the third definition. In many ways this is the worst of the lot,
because we have agency in there (will the limit still exist if “we” aren’t around to take it?) and desire
(why should we “want” f (x) to be anything?).
We can note that this definition really has two parts. In order for something to be a limit, we need
1. to know how close we want f (x) to be to L, and
2. determine how close x must be to c to ensure this.
But how can we say this more mathematically? We are still saying “close,” which is vague. Can we measure
it? How do we measure closeness? Imagine the following dialogue:
Ezra: I live close to the Metropolitan Museum of Art.
Cornelius: That’s kind of vague. How close?
Ezra: Really close.
Cornelius: Well, I live “really close” to the sun, compared to someone who lives in another galaxy.
Seriously, how close are you to the Met?
Ezra: Three blocks.
We measure closeness—i.e., distances—with numbers! In particular, we usually measure them with
real numbers (since we think of distance as being a continual thing—distance (usually) doesn’t jump in
increments, like the integers do), and we usually have those real numbers be nonnegative (since if we’re
just talking about distance, we don’t have a sense of direction, and thus negatives). Positive real numbers
(or 0). That’s how we measure distance.
So let’s rephrase this, however awkwardly:
1
Where do I put the question mark? I’ve asked a question, but I can’t put the question mark until the question is finished.
Do I put it beneath the equations? to the right side of them? do I do the Spanish thing and start with an upside-down
question mark???
No matter what positive real number we use in specifying how close we want f (x) to be to L,
we must be able to find how close x must be to c (measured as a positive real number),
in order to ensure that f (x) is that close to L.
But let’s be a bit more clear here:

• The small positive real number used in measuring how close we want f(x) to be to L: let’s call that
ε.
• And the small positive real number used in measuring how close x needs to be to c: let’s call that δ.
Presumably ε and δ are related to each other: presumably, as δ gets smaller, ε will get smaller, too.
If we want ε to be smaller, we’ll need to have a smaller δ. And, of course, ε could be anything: we could
want f(x) to be within 10 units of L, within 2.3 units of L, or within 1/1,000,000 of a unit of L. Whatever
that distance is, is ε.
So we mean:
For every positive real number ε,
there is a positive real number δ (that depends on ε), such that
if x is within δ of c, then f(x) is within ε of L.
But again, can we phrase this more mathematically? How can we say “within such-and-such-distance of
such-and-such” in a more formal way?
“f(x) is within ε of L” means that
the distance between f(x) and L is less than ε,
which means that f(x) − L < ε,
except when we talk about distance, we don’t care about direction, and so
we’ll take an absolute value: |f(x) − L| < ε
Likewise...
“x is within δ of c” means that
the distance between x and c is less than δ,
which means that x − c < δ,
except with distance we don’t care about direction, so let’s absolute-value
that: |x − c| < δ
So then, if we clean up our definition some more, what we mean by $\lim_{x \to c} f(x) = L$ is:
For every positive real number ε, there is a positive real number δ (that depends on ε), such
that if |x − c| < δ, then |f(x) − L| < ε
Put differently:
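$$\overbrace{\text{For every positive real number } \varepsilon}^{\text{no matter how close we want } f(x) \text{ to be to } L}, \; \underbrace{\text{there is a number } \delta}_{\text{we can find how close } x \text{ must be to } c}, \; \text{such that} \; \overbrace{\text{if } |x - c| < \delta, \text{ then } |f(x) - L| < \varepsilon}^{\text{so that } f(x) \text{ actually is that close to } L}$$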
Let us state this formally:
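The Epsilon-Delta Definition of a Limit
The limit of f(x) as x approaches c is L if, for every positive
real number ε, there is a corresponding (positive real) number
δ such that the following is true:
if |x − c| < δ, then |f(x) − L| < ε
(Strictly speaking, we only need this to hold for x ≠ c, that is, for 0 < |x − c| < δ, since the limit
doesn’t care what the function actually does at c itself.)
And we write this as
$\lim_{x \to c} f(x) = L$ or $f(x) \xrightarrow[c]{x} L$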
Here are a couple comments about this definition:

• First of all, it’s hard. The ε-δ definition of a limit, and the resulting ε-δ proofs, are easily the hardest
topic in first-year calculus; in fact, they are often skipped altogether (the “let’s pretend dividing by
zero isn’t a problem” approach to calculus). The definition is abstract, abstruse, and hard to feel.
I spent two pages (and an hour in class) simply explaining where the definition comes from. Most
of the time, when we define mathematical objects, we just throw the definitions out there, because
they’re easy to understand. For instance, we define a rational function f (x) as
$f(x) = \frac{p(x)}{q(x)}$, where p(x) and q(x) are both polynomials.
That’s totally simple. All this junk about epsilons and deltas... much less so.
• Note that this definition only works for a limit in which x is approaching a finite number, and f (x)
is also approaching a finite number. If we had a limit either to or from ±∞, or where the function
approaches ±∞, we’d have to make a slightly different definition. Because in those cases, we wouldn’t
have x “getting closer and closer” to infinity; strictly speaking, what we’d have would be x “getting
bigger and bigger.” So we’d need to modify things a bit.
• Obviously ε and δ, while they represent numbers, are not explicit numbers per se. ε could be
anything, and consequently, so could δ. We want to simultaneously consider all the possible values
of ε (to account for letting f(x) be within 2 units of L, and 15 units of L, and 1/3 of a unit of L,
and one-twenty-three-millionth of a unit of L) all at once. That we discuss ε and δ and not specific
numbers is just one of the many layers of abstraction in this definition and these proofs.
• The other really weird thing about this definition is that (to use a mathy word) it’s non-constructive.
Meaning: it tells you what a limit is, but it doesn’t tell you how to find it. If we want to find a
limit, we still have to use the ad hoc methods we’ve been using all along. And then this definition
can tell us whether the things we think are limits actually are limits. It is like having a calculator
that doesn’t tell you what 200 · 374 is, but will tell you whether 74,800 is the correct answer or not.
(A calculator that knows the answer, and will tell you if your answer is correct, but won’t help you
come up with the answer.)
As an analogy, compare these two ways of defining the number e:
1. e is the number such that the function $e^x$ is its own derivative (i.e., that $(e^x)' = e^x$).
2. $e = \sum_{n=0}^{\infty} \frac{1}{n!} = \frac{1}{0!} + \frac{1}{1!} + \frac{1}{2!} + \frac{1}{3!} + \frac{1}{4!} + \cdots$
The second way2 tells you how to actually compute e as a decimal; the first way doesn’t. Put
differently, the second way is an explicit definition (a construction); the first way is an implicit
definition. (This is not a perfect analogy and a mathematician might quibble a bit, but it gets the
idea across.)
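To see in what sense the second definition is a “construction,” here is a short Python sketch that simply adds up the first few terms of that sum. The function name `e_partial_sum` and the choice of how many terms to take are mine, purely for illustration. Nothing comparable can be typed in directly from the first definition, which is the point of the analogy.

```python
import math

def e_partial_sum(terms):
    """Add up 1/0! + 1/1! + ... for the given number of terms."""
    return sum(1 / math.factorial(n) for n in range(terms))

# A few partial sums (term counts chosen arbitrarily).
for k in (1, 2, 5, 10, 15):
    print(f"{k:>2} terms: {e_partial_sum(k):.12f}")

print(f"math.e  : {math.e:.12f}")
```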
Having defined a limit, then, our next question is: can we use this definition to show that our intuitive
calculations of limits are actually correct? that what we think are limits actually are limits, in accordance
with this definition? Can we use this definition to show that $\lim_{x \to 5} (3x + 1)$ actually does equal 16 (for example),
or more importantly (for example) to show that $\lim_{h \to 0} \frac{(x + h)^2 - x^2}{h}$ actually does equal 2x?
As an analogy: we feel3 , intuitively, that murder is wrong. And we write laws saying that if you
murder someone, you go to jail. If you’re a judge, you can’t convict4 a murderer because you “feel” they
did something wrong. You can only convict them because they violated the law. We might base the law
in intuition, but for the sake of consistency and due process, we must formalize it.
Likewise with a limit. We’ve developed, over the past few weeks, a strong intuition for what a limit is.
We can compute them pretty well. But we want to formalize that: we want to put our intuition down on
paper, and make it clear enough so that a computer, or a robot, (or a judge) with no sense of what “closer
and closer” or “approaches” means could still understand a limit. We want to make it mathematically
valid. We want to find a way of giving it a logical basis, because only if we do that can we be sure that limits
actually work/exist, and that they’re not sophistical, and thus, that derivatives and the rest of calculus all
work/exist, too. And we want to not just say “Here it is! This is the logical basis of a limit!” We want to
connect it to our intuition—to show that it matches up with what our intuition for a limit is.
Example
What if we have $\lim_{x \to 2} (3x + 1)$? Obviously, we know this limit is just 7... but do we? How do we really
know that the limit is 7? Can we prove this? Using our definition of a limit?
Because here is the thing: it is our obvious intuition that the limit is 7. But we want to be totally
formal and totally symbolic. Not being formal/rigorous is what got us into this mess in the first place—it
was a lack of formality that had us happily dividing by zero to take all those derivatives, and convinced
that this was fine.
So if we want to prove that the limit is 7, we’ll need to show that the statement “$\lim_{x \to 2} (3x + 1) = 7$”
actually is in conformity/compliance with our rigorous, mathematical definition of a limit. Meaning we’ll
need to show that for every possible positive real number ε, there does actually exist a number δ such
that if |x − 2| < δ, then |(3x + 1) − 7| < ε.
2
Note that 0! = 1, so the first term isn’t undefined or anything.
3
Meaning, “most of us.”
4
Granted, capital cases in the US are tried by jury. But in some other countries (e.g., Japan) they are tried by judges.
In any case, juries have to apply the law, too, rather than “feelings.”
So in principle, all we need to do is show that if x is within δ of 2, then 3x + 1 is within ε of 7.
Put differently, we need to start with the statement |x − 2| < δ and algebraically manipulate it to get
|3x + 1 − 7| < ε:
|x − c| < δ    (start)
|x − 2| < δ    (substituting)
etc.
ALGEBRA MAGIC GOES HERE!
etc.
|3x + 1 − 7| < ε
|f(x) − L| < ε    (finished!)
Here is the difficulty: we start with δ on the right side of the inequality, but then at the end we need to
end up with ε on the right side. How do we get from δ to ε? What is the relationship between δ and ε?
That’s the algebraic way of putting that question; put graphically, this is the same as asking: if 3x + 1 is
within ε of 7, how close must x be to 2? (Because the distance from x to 2 is, by definition, less than δ.) Maybe the
distance between x and 2 has to be a third of the distance between 3x + 1 and 7 (i.e., maybe δ = ε/3). Or
maybe we’ll be okay if the distance between x and 2 is less than twice the distance between 3x + 1 and
7 (i.e., if δ = 2ε). We don’t know. We could guess, and then just try to do the proof for various random
values of δ, but this wouldn’t be particularly efficient.
So here’s a method we can use, at least for relatively simple functions, to make an educated guess
about the relationship between ε and δ. What if we, in a sense, work backwards? We want to prove
|x − 2| < δ ⇒ |3x + 1 − 7| < ε; what if we start with |3x + 1 − 7| < ε and try to manipulate it to get it to
look like |x − 2| < (epsilon and stuff)? Then we could compare it to |x − 2| < δ, set δ = (epsilon and stuff),
and see if that works in our proof.
|f(x) − L| < ε
|3x + 1 − 7| < ε    (substituting)
|3x − 6| < ε
|3(x − 2)| < ε
|3| · |x − 2| < ε    (can factor out of abs. value)
3|x − 2| < ε
|x − 2| < ε/3
which looks a lot like:
|x − 2| < δ
We end up with |x − 2| < ε/3, which looks a lot like |x − 2| < δ! So perhaps the relationship between
ε and δ is this: perhaps δ = ε/3. So this is suggesting (suggesting, not proving) that if we want 3x + 1 to be
within ε units of 7, then x needs to be within ε/3 units of 2.
Put differently,
• if we want the function to be within 3 units of 7 (between 4 and 10), then x has to be within 1 unit
of 2;
• if we want the function to be within 1 unit of 7 (between 6 and 8), then x has to be within 1/3 of a
unit of 2 (between 1 and two-thirds and 2 and a third);
• if we want the function to be within 1/10 of a unit of 7, then x has to be within 1/30 of a unit of 2.
Let’s see if this is actually true: let’s see if we can prove that if x is within ε/3 units of 2, then 3x + 1 will
be within ε units of 7. If we can do so, then we’ll prove that no matter how close we want 3x + 1 to be to
7, we can make x close enough to 2 to make that the case. (We can make this jump because ε could be
any number, hence “no matter how close...”)
|x − 2| < ε/3    (if δ = ε/3)
3|x − 2| < ε    (multiplying)
|3(x − 2)| < ε    (can distribute into abs. value)
|3x − 6| < ε    (distributing)
|3x + 1 − 7| < ε    (because −6 = 1 − 7)
YAY!!!! So the limit as x approaches 2 of 3x + 1 is 7!!!
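If you want a sanity check on that algebra, here is a rough Python sketch that spot-checks the claim on a grid of x-values. To be loud about it: this is sampling, not a proof, and the helper name `delta_works`, the grid size, and the particular ε-values are all assumptions of mine. For each ε it picks δ = ε/3, walks through x-values strictly within δ of 2, and confirms that 3x + 1 lands strictly within ε of 7 every time.

```python
# Spot-check (not a proof!) that delta = epsilon / 3 works for
# f(x) = 3x + 1 near c = 2 with L = 7.

def f(x):
    return 3 * x + 1

def delta_works(epsilon, delta, c=2.0, L=7.0, steps=1000):
    """Return True if every sampled x with |x - c| < delta
    also satisfies |f(x) - L| < epsilon."""
    for i in range(-steps + 1, steps):
        x = c + delta * i / steps      # |x - c| is at most 0.999 * delta
        if abs(f(x) - L) >= epsilon:
            return False
    return True

# The epsilon values from the bullets above: 3, 1, and 1/10.
for epsilon in (3.0, 1.0, 0.1):
    print(f"epsilon = {epsilon:<4g} delta = epsilon/3 works on the grid:",
          delta_works(epsilon, epsilon / 3))
```

A computer can only ever try finitely many x-values and finitely many ε-values, which is exactly why we needed the algebra to cover all of them at once.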
Here’s a question: what if δ were even smaller than ε/3? Meaning: we’ve shown that if x is within
ε/3 of 2, then 3x + 1 is within ε of 7. But what if x were within, say, ε/18 of 2? What if it were even closer
to 2?
Well, presumably if x is even closer to 2, then 3x + 1 will certainly still be within ε of 7. Put differently:
if we make the distance between x and 2 really really really small, presumably f(x) will still be within the
normal distance ε away from L. (“Within” and the inequality are the key concepts here: we’re not saying
that 3x + 1 has to be exactly ε away from 7; we’re saying that it can be at most ε away from 7, but if it’s
closer, so much the better.)
We can prove this:
|x − 2| < ε/18    (if δ = ε/18)
18|x − 2| < ε    (multiplying)
6 · 3|x − 2| < ε    (because 18 = 6 · 3)
6 · |3(x − 2)| < ε    (distributing the 3)
6 · |3x − 6| < ε    (distributing more)
6 · |3x + 1 − 7| < ε    (because −6 = 1 − 7)
|3x + 1 − 7| < ε/6    (dividing by 6)
|3x + 1 − 7| < ε/6 < ε    (because ε/6 is certainly smaller than ε)
|3x + 1 − 7| < ε    (getting rid of middle (“transitivity”))
Yay! So, ε/3 is sort of like an upper bound for δ: it’s the largest that δ can be and still have 3x + 1 be
within ε of 7. If we were to try this with, say, δ = ε/2, it wouldn’t work: 3x + 1 might be further than
ε away from 7. But it certainly can be closer.
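To see concretely why δ = ε/2 is too generous, here is one counterexample worked out. The particular choices ε = 1 and x = 2.4 are mine, picked only for illustration; any x close enough to the edge of the δ-window would do.

```latex
\begin{align*}
\varepsilon &= 1, \qquad \delta = \tfrac{\varepsilon}{2} = 0.5, \qquad x = 2.4 \\
|x - 2| &= 0.4 < 0.5 = \delta \\
|(3x + 1) - 7| &= |8.2 - 7| = 1.2 > 1 = \varepsilon
\end{align*}
```

So x is within δ of 2, yet 3x + 1 is not within ε of 7, and the if-then statement fails for this choice of δ.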
You might think, “aren’t we doing the same thing twice here? didn’t we just do all of this, but
backwards?” In a sense, this is true. But the direction in which we do this proof is very important. Here’s
what I mean by that: essentially, these types of very simple ε-δ proofs have two steps. First, you find
a relationship between ε and δ; second, you prove that it works. Why isn’t the first step sufficient for
showing that it works?
Because as far as logic goes, direction does matter in if-then statements. Consider the two statements:
If I kill someone, then I will go to jail.
and
If I go to jail, then I killed someone.
These are not equivalent! The former is (hopefully) true; the latter, false. There are other reasons I might
go to jail. I might have committed tax fraud. Consider another example:
If I’m in Ithaca, then it’s cold outside.
and
If it’s cold outside, then I’m in Ithaca.
Again, these are clearly not the same statement—one is always true, and the other, always false. (Ithaca
is not the only place that is cold. Chicago is cold, too. And Archangel5 .) So in the case of our ε-δ proofs,
we are trying to prove
if |x − c| < δ, then |f(x) − L| < ε.
which is NOT the same as proving that
if |f(x) − L| < ε, then |x − c| < δ.
Trying to use the fact that |f(x) − L| < ε ⇒ |x − c| < δ to prove that |x − c| < δ ⇒ |f(x) − L| < ε is like
saying “It is cold outside; therefore, I am in Ithaca!” or “I am in jail; therefore, I killed someone!” It does
not logically follow. (Remember that the double arrow ⇒ is the way of writing if-then (a conditional or
implication) in logical notation.)
You might also want to remember the truth-table for implication:
A B A⇒B
T T T
F T T
T F F
F F T
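If it helps, the same table can be generated mechanically. This is a small Python sketch under my own naming (`implies`, `truth_table`, and `differs` are hypothetical helper names, not standard library functions); it uses the usual logical fact that “A ⇒ B” is false only when A is true and B is false. It also lists the rows where “A ⇒ B” and its converse “B ⇒ A” disagree, which is the jail/Ithaca point from above.

```python
from itertools import product

def implies(a, b):
    """A => B is false only when A is true and B is false."""
    return (not a) or b

# Every combination of truth values for A and B.
truth_table = [(a, b, implies(a, b))
               for a, b in product([True, False], repeat=2)]

print("A      B      A => B")
for a, b, result in truth_table:
    print(f"{a!s:<6} {b!s:<6} {result}")

# Rows where the statement and its converse give different answers.
differs = [(a, b) for a, b, _ in truth_table
           if implies(a, b) != implies(b, a)]
print("A => B and B => A disagree when (A, B) is:", differs)
```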
Weird Example
What about $\lim_{x \to 2} 4$? Obviously this is just equal to 4. The function f(x) = 4 is
shockingly boring: no matter what x is, it’s always 4.
5
the city in northern Russia, where that scene from Frankenstein takes place
But how do we prove that $\lim_{x \to 2} 4 = 4$? The first step, of course, is finding a relationship between ε and
δ: finding out, if we want f(x) to be within ε of 4, how close x has to be to 2.
But obviously, x can be as close or far away from 2 as we like. No matter how close x is to 2, f(x)
will always be within ε of 4, because f(x) always is 4. Put differently: δ can be absolutely anything. It
doesn’t matter what δ is.
How do we show this algebraically? Well, we know, by our construction of the ε-δ definition, that ε
must be greater than 0. (Part of the definition is that ε is a positive real number.) But if 0 < ε, then surely
|0| < ε    (because |0| = 0)
|4 − 4| < ε    (because 4 − 4 = 0)
|f(x) − L| < ε
We’ve proven it, and surprisingly quickly! We never even had to use any knowledge of x or c or δ. Because
regardless of what x and c and δ are, |f(x) − L| will always be less than ε. Because f(x) − L is always 0,
and ε has to be a positive number. So then |f(x) − L| = 0 < ε. Which is just what we showed up there.
If you’re skeptical about our lack of inclusion of |x − c| < δ, think about the logic6 . We want to show
that |x − c| < δ ⇒ |f (x) − L| < ε, and the truth table for implication (“if-then”) looks like this:
6
Logic in the real, LOGICAL sense—not in the sense of, “Um, Mr. Alexander, can’t we just do this problem using logic?
Because it, like, makes sense. So it’s logical. Right?”
|x − c| < δ    |f (x) − L| < ε    |x − c| < δ ⇒ |f (x) − L| < ε
T              T                  T
F              T                  T
T              F                  F
F              F                  T
If |f (x) − L| < ε is true, then |x − c| < δ ⇒ |f (x) − L| < ε will always be true, regardless of whether
|x − c| is less than δ or not. And |f (x) − L| < ε is true. That’s what we showed with algebra above. So
it is true that |x − c| < δ ⇒ |f (x) − L| < ε; thus, we have satisfied the definition of a limit, and thus,
$\lim_{x \to 2} 4 = 4$.
Weirder Example
TO COME
We Mean Multiple Things When We Talk About Limits

Earlier, I used a legal analogy to describe the ε-δ definition of a limit and ε-δ proofs. I said that the
definition was an attempt to formalize our intuition—to give limits a solid basis outside of our human
minds, in the same way that we codify laws in order to be consistent and give due process.
Now, occasionally, we might feel that the law doesn’t accurately represent our intuition—that there
is some loophole through which “bad guys” are, legally, doing the wrong thing. So then we might change
the law such that the loophole is closed. Then what they are doing is illegal, and then they will go to jail.
But even if we do that, we still have to rely on the law. We might be basing the law in intuition, but for
the sake of consistency and due process, we must formalize it7 .
Allow me to assert, then, that our current ε-δ definition of a limit does not capture our complete
intuition about what we mean by a limit. Consider, for instance:
$\lim_{x \to c} f(x) = \infty$
(Such as in the case of a vertical asymptote.) What do we mean when we say this? We do not mean literally
the same thing as in the case of $\lim_{x \to c} f(x) = L$; we do not mean “as x gets closer and closer to c, f (x) gets
closer and closer to infinity.” It is somewhat oxymoronic to talk about something getting “closer” to an
infinite magnitude. Previously, we have only discussed the cases of limits approaching finite numbers and
becoming finite numbers; what if we have a function that (again, using the math terminology), “increases
without bound”? What we mean is simply:
“as x gets closer and closer to c,
f (x) gets bigger and bigger.”
Or, put differently:
“I can make f (x) as big as I want,

simply by making x close enough to c.”
But, as before, we can measure distance (i.e., “big”ness and “close”ness) using a number!
7
Of course—and this continues the analogy with morality—there are plenty of situations in which we don’t have intuition
about how things should be. If we are standing on a bridge next to a fat person, and see, below the bridge, a train speeding
out of control towards a half-dozen unsuspecting railworkers, should we push the fat guy off the bridge so that he falls onto
the switch and diverts the runaway train onto a different track, saving the lives of the railworkers but losing the life of the fat
guy? It is not clear. In this sort of situation we might want to revert to the formalism to develop our intuition—e.g., “I don’t
know, intuitively, what is right and wrong in this situation, but if we apply the laws...”
“No matter what positive number I use in specifying how big I want f (x) to be,
I can find how close x has to be to c (measured as a positive number)
such that f (x) is, in fact, actually that big.”
See a pattern here in our definition-construction? Let’s be slightly clearer:
• The positive real number used in measuring how big we want f (x) to be—let’s call that M .
• And the small positive real number used in measuring how close x needs to be to c—let’s call that δ.
So then we can say:
For every positive real number M ,

there is a positive real number δ (that depends on M ), such that
if x is within δ of c, then f (x) is greater than M .
But again, let’s phrase this more mathematically.

“f (x) is greater than M ” means that f (x) > M
And...
“x is within δ of c” means that
the distance between x and c is less than δ,
which means that |x − c| < δ.
So then what we mean by $\lim_{x \to c} f(x) = \infty$ is:
For every positive real number M , there is a positive real number δ (that depends on M ), such
that if |x − c| < δ, then f (x) > M .
Put differently:
For every positive real number M (no matter how big we want f (x) to be),
there is a number δ (we can find how close x must be to c),
such that if |x − c| < δ, then f (x) > M (so that f (x) is at least that big).
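To see the definition in action, here is a small Python sketch for one particular function, f (x) = 1/x² near c = 0. That function is my choice of example, not one worked above. The relationship δ = 1/√M and the names `f`, `delta_for`, and `check` are assumptions for illustration, and the sketch only samples x with 0 < |x| < δ, skipping x = c itself, where 1/x² is undefined. A finite sample is evidence, not a proof.

```python
import math


def f(x: float) -> float:
    """Example function with a vertical asymptote at x = 0."""
    return 1.0 / (x * x)


def delta_for(M: float) -> float:
    """Candidate delta: if 0 < |x| < 1/sqrt(M), then x^2 < 1/M, so 1/x^2 > M."""
    return 1.0 / math.sqrt(M)


def check(M: float, samples: int = 1000) -> bool:
    """Sample points on both sides of 0 inside the delta-neighborhood
    and confirm f(x) > M at each one."""
    d = delta_for(M)
    for i in range(1, samples + 1):
        x = d * i / (samples + 1)  # strictly between 0 and d
        if not (f(x) > M and f(-x) > M):
            return False
    return True


if __name__ == "__main__":
    for M in (1, 10, 1_000_000):
        print("M =", M, "delta =", delta_for(M), "works on samples:", check(M))
```

Notice that a bigger M forces a smaller δ, which matches the "δ depends on M" clause in the definition.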
We could come up with analogous definitions for $\lim_{x \to \infty} f(x) = L$ and $\lim_{x \to \infty} f(x) = \infty$. In fact, you’ll do just
that in the homework. In fact, you could come up with a whole table of various limit definitions for the
various possible limits of x and f (x):
               f (x) → finite    f (x) → infinite
x → finite     .                 .
x → infinite   .                 .
Problems
For the following problems:
1. Find the value of the limit.

2. Sketch the function, being sure to label the limit point (on the x) and a little δ-neighborhood around
it, and the limit value (on the y) and a little ε-neighborhood around it.
3. Write down the ε-δ definition of a limit. Seriously. I know it’s on this sheet, but you’ll need to have
it more-or-less memorized, so you may as well practice.
4. Using the ε-δ definition, find a feasible relationship between ε and δ. (Graphically, does this make
sense? Does it seem, based on your sketch of the situation, that the relationship should be so?)
5. Then do the actual proof: show that if |x − c| < δ, then |f (x) − L| < ε. You can do this by
starting with |x − c| < δ and then, via substitution and algebraic manipulation (possibly painful
manipulation, and keeping in mind the various properties of absolute values that you’ll need), ending
up with |f (x) − L| < ε.
1. $\lim_{x \to 3} (3x - 2)$
2. $\lim_{x \to 1} (4x + 2)$
3. $\lim_{x \to 5} (x)$
4. $\lim_{x \to 0} (-x + 2)$
5. $\lim_{x \to 4} \left(\frac{1}{2}x + 7\right)$
6. $\lim_{x \to 2} (6x + 3)$
7. $\lim_{x \to 7} (-2x + 19)$
8. $\lim_{x \to 0} \left(\frac{5}{2}x + 3\right)$
9. $\lim_{x \to 1} (4)$
10. $\lim_{x \to \pi} (\pi)$
11. $\lim_{x \to c} (k)$, where c and k are both constants.
12. $\lim_{x \to 4} (x - 6)$
13. $\lim_{x \to 1} (2x - 7)$
14. $\lim_{x \to -3} \left(\frac{1}{3}x - 4\right)$
15. $\lim_{x \to c} (ax + b)$, where a, b, and c are all constants. (This is the general case of the previous dozen
or so problems... which, note, are all relatively simple linear functions. Epsilon-delta proofs
get quite a bit tougher as the functions get more interesting. But, if you do an ε-δ proof
for this function, you’ve basically done it for every other linear function (and you can get
the particular results by plugging in the appropriate values of a, b, and c.))
16. $\lim_{x \to 0} \frac{1}{x^2}$
17. $\lim_{x \to 0} \frac{1}{x^4}$
18. $\lim_{x \to -7} \frac{1}{(x + 7)^2}$
19. $\lim_{x \to 3} \sqrt{x + 1}$
20. $\lim_{x \to 0} (x^2)$
21. $\lim_{x \to 0} (x^2 - 1)$
22. $\lim_{x \to 3} \sqrt{3 - x}$
23. $\lim_{x \to 1} (x^2 + 3)$
24. $\lim_{x \to -1} (x^2 + 3)$
25. $\lim_{x \to 2} (3x^2 - x)$
26. $\lim_{x \to 0} (x^2 + x - 2)$
27. $\lim_{x \to 0} (x^2 - 4x - 5)$
28. $\lim_{x \to 0} (x^3)$
29. $\lim_{x \to c} (f(x) + g(x)) = \lim_{x \to c} f(x) + \lim_{x \to c} g(x)$ (This
is a fundamental property of limits! Hint: use
the triangle inequality (one of the properties of
absolute values).)
30. $\lim_{x \to c} k f(x) = k \lim_{x \to c} f(x)$, where k is a constant.
(Again, this is another important limit property!)
31. $\lim_{x \to c} (f(x) - g(x)) = \lim_{x \to c} f(x) - \lim_{x \to c} g(x)$
32. $\lim_{h \to 0} \frac{(x + h)^2 - x^2}{h}$
33. $\lim_{h \to 0} \frac{(a(x + h) + b) - (ax + b)}{h}$
34. In the same manner in which we constructed the definitions for $\lim_{x \to c} f(x) = L$ and $\lim_{x \to c} f(x) = \infty$,
come up with a suitable definition for $\lim_{x \to \infty} f(x) = L$. (Explain your reasoning!)
35. Using the definition you constructed above, prove that $\lim_{x \to \infty} \frac{1}{x} = 0$.
36. Likewise, construct a definition for $\lim_{x \to c} f(x) = -\infty$, and then prove that $\lim_{x \to 5} \frac{-1}{(x - 5)^2} = -\infty$
37. Now come up with a definition for $\lim_{x \to \infty} f(x) = \infty$ (i.e., the case in which x and f (x) both get bigger
and bigger), and then show that $\lim_{x \to \infty} x^2 = \infty$
Differentiation Laws
Derivatives informally and then formally; limits informally and then formally; now
back to derivatives.
We started calculus, two months ago, by discussing derivatives informally. We learned how we could
draw the slopes of functions from the graphs of the functions, and in doing so gained a good feeling for how
derivatives work. Then, once we were comfortable, we moved on to discussing derivatives more formally:
we said, “Well, drawing pictures is great and everything, but wouldn’t it be better if we had equations for
derivatives?” So we came up with Fermat’s difference quotient, and used that to compute the equations
for derivatives.
We did that for a while, and we had some success, but then we realized that we had serious problems
in our understanding of Fermat’s difference quotient: weren’t we just dividing by zero? So we came up with
the concept of a limit to get around this problem. We played around with the idea of a limit for a while (as
x gets closer to something, what does f (x) get closer to?). But (still being vaguely uncertain about their
true nature, and not wanting to base calculus on anything but absolute, black-and-white, Manichaean,
anti-relativist Truth) we decided we needed to formalize our idea of a limit. So we came up with our ε-δ
definition of a limit, and had quite the mental adventure trying to understand that.
Having done all of that—having satisfied ourselves that limitry is not Sophistry—we can now return
to our original goal: that of understanding derivatives. How do these things that are the slopes of functions
work? What do we know about them?
For starters, we know Fermat’s difference quotient, on which we base our formal definition of a deriva-
tive. Everything else we know about derivatives comes out of this equation. Fermat’s difference
quotient is the burning oil rig from which the unrefined petroleum that is our knowledge of derivatives
gushes forth, billions of barrels per second, about to spontaneously ignite and cause awful smoke that will
lower global surface temperatures by a fraction of a degree for the next year, because as Plutarch reminds
us: “the mind is not a vessel to be filled; it is a fire to be lighted.” Anyway, Fermat’s difference quotient
is this:
the derivative of a function f (x) is $\lim_{h \to 0} \frac{f(x + h) - f(x)}{h}$
We also spent a long time proving that the derivative of $x^n$ is $nx^{n-1}$:
in Leibniz notation: $\frac{d}{dx}[x^n] = nx^{n-1}$ or, in Lagrange notation: $(x^n)' = nx^{n-1}$
(Look in your notes if you don’t remember the theorem or the proof.) So, for example, if you want to find
the derivative of $x^{12}$, you could use Fermat’s difference quotient:
$$\frac{d}{dx}(x^{12}) = \lim_{h \to 0} \frac{(x + h)^{12} - x^{12}}{h}$$
But then you’d have to spend all weekend multiplying out $(x + h)^{12}$, and that’s simply drudgery. It’s not
interesting. It doesn’t require actual thinking. It’s downright boring! Far better it would be to simply use
the shortcut we proved, and find in just a few moments that the derivative is just:
$$\frac{d}{dx}\left[x^{12}\right] = 12x^{11}$$
This isn’t cheating! It is a real shortcut! We proved it! It comes out of the definition of the derivative! It
is Fermat’s difference quotient, but just applied to a specific situation (that of $x^n$) and cleaned up a bit.
Analogously, you could walk from here to Boston in order to hang out with your friend in Cambridge, but
why spend months doing that when you could simply fly?1 You’d achieve your objective much quicker.
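If you want to watch the shortcut and the long way agree, here is a rough Python sketch. It plugs a small but nonzero h into Fermat's difference quotient (so it only approximates the limit) and compares the result with 12x¹¹. The helper names `difference_quotient`, `power_rule`, and `twelfth_power`, and the sample point x = 1.5, are assumptions made up for this illustration.

```python
def difference_quotient(f, x, h):
    """Fermat's difference quotient for a fixed, nonzero h (no limit is taken)."""
    return (f(x + h) - f(x)) / h


def power_rule(n, x):
    """The shortcut: the derivative of x**n is n * x**(n - 1)."""
    return n * x ** (n - 1)


def twelfth_power(x):
    return x ** 12


if __name__ == "__main__":
    x = 1.5
    print("shortcut 12x^11:", power_rule(12, x))
    # As h shrinks, the quotient should creep toward the shortcut's value.
    for h in (1e-1, 1e-3, 1e-6):
        print("h =", h, "quotient =", difference_quotient(twelfth_power, x, h))
```

The computer never multiplies out (x + h)¹² symbolically here; it just evaluates numbers, which is why this is a sanity check rather than a proof.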
We know some other shortcuts, too. What if we want to take the derivative of, say, $5x^2$? We can’t use
our $x^n$ law (at least not directly), because this doesn’t look like $x^n$: there’s that 5 in the way. However,
we also proved that we can pull constants out of a derivative:
in Leibniz notation: $\frac{d}{dx}[a \cdot f(x)] = a \cdot \frac{df}{dx}$ or, in Lagrange notation: $(a \cdot f(x))' = a \cdot f'(x)$
So we can use both of these laws together to take the derivative of $5x^2$. First we’ll use our constant rule:
$$\frac{d}{dx}\left[5x^2\right] = 5 \cdot \frac{d}{dx}\left[x^2\right]$$
and THEN, with the 5 out of the way, we can use our $x^n$ rule:
$$= 5 \cdot 2x$$
$$= 10x$$
What if we want to find the derivative of, say, $x^{12} + x^5$? Again, we don’t want to have to write it as
$\lim_{h \to 0} \frac{((x + h)^{12} + (x + h)^5) - (x^{12} + x^5)}{h}$ and simplify, because that would be truly awful. Thankfully, we
also proved that we can split derivatives up along addition, and thus, that if we want to find the derivative
of $x^{12} + x^5$, it’s sufficient to find the derivatives of $x^{12}$ and $x^5$ individually, and then add them:
\begin{align*}
\frac{d}{dx}(x^{12} + x^5) &= \frac{d}{dx}(x^{12}) + \frac{d}{dx}(x^5) \\
&= 12x^{11} + 5x^4
\end{align*}
More generally, we have this rule:
in Leibniz notation: $\frac{d}{dx}[f(x) + g(x)] = \frac{df}{dx} + \frac{dg}{dx}$
or, in Lagrange notation: $(f(x) + g(x))' = f'(x) + g'(x)$
Note that the fact that we can split it up along addition is hardly unique to this “take the derivative”
function2 . Compare it with how other functions treat addition:
• We can split the “multiply by five” function along addition: 5(a + b) = 5a + 5b
• We can’t split square roots up along addition: $\sqrt{a + b} \neq \sqrt{a} + \sqrt{b}$
• We can’t split squaring along addition: $(a + b)^2 \neq a^2 + b^2$
Or rather, we can split it up, but in a weird way: $(a + b)^2 = a^2 + 2ab + b^2$
• We can’t split logs up along addition: ln(a + b) ≠ ln(a) + ln(b)
• We can split trig functions up along addition, but weirdly: sin(a + b) = sin(a) cos(b) + cos(a) sin(b)
• We can split exponentials up along addition, but in a bizarre way: $e^{a+b} = e^a \cdot e^b$
1
For perfectly good reasons to do that, see, e.g., The Places in Between, The Roads to Sata, or any number of other books
about long-distance walking.
2
Of course differentiation is a function! It’s just a function into which we usually give and get other functions, rather than
numbers.
Want a cool math word? Ignore this if you don’t. Homomorphic (and homomorphism). Basically,
a homomorphism is a function that preserves algebraic structure... for example, square-rooting is homomorphic
with respect to the operation of multiplication (because $\sqrt{a \cdot b} = \sqrt{a} \cdot \sqrt{b}$), but not homomorphic
w.r.t. addition (because $\sqrt{a + b} \neq \sqrt{a} + \sqrt{b}$). So this one time at the University of Chicago, I was at a
party and I was telling this girl about my algebra class and how we were studying homomorphisms (which
I abbreviated) and finite abelian groups (which have a questionable acronym), and... actually, never mind.
Incidentally, because we know that a) derivatives split up along addition, and b) we can pull constants
out of derivatives, we know that c) derivatives must split up along subtraction, too. Subtraction, after all,
is just the same as adding a negative, and a negative is the same as a positive multiplied by −1. Here’s
the formal proof:
\begin{align*}
\frac{d}{dx}[f(x) - g(x)] &= \frac{d}{dx}[f(x) + (-1)g(x)] && \text{(algebra)} \\
&= \frac{d}{dx}[f(x)] + \frac{d}{dx}[(-1)g(x)] && \text{(we can split derivatives up along addition)} \\
&= \frac{d}{dx}[f(x)] + (-1) \cdot \frac{d}{dx}[g(x)] && \text{(and we can pull constants out)} \\
&= \frac{d}{dx}[f(x)] - \frac{d}{dx}[g(x)] && \text{(algebra)}
\end{align*}
Or, in Lagrange (prime) notation:
\begin{align*}
(f(x) - g(x))' &= (f(x) + (-1)g(x))' && \text{(algebra)} \\
&= f'(x) + ((-1)g(x))' && \text{(we can split derivatives up along addition)} \\
&= f'(x) + (-1)g'(x) && \text{(and we can pull constants out)} \\
&= f'(x) - g'(x) && \text{(algebra)}
\end{align*}
What all of these properties combined mean is that we can take the derivative of any polynomial!!!
We just need to differentiate each term (leaving the constant in place):
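\begin{align*}
\frac{d}{dx}(7x^{10} + 3x^9 - 2x^{13} + 5) &= \frac{d}{dx}(7x^{10}) + \frac{d}{dx}(3x^9) - \frac{d}{dx}(2x^{13}) + \frac{d}{dx}(5) \\
&= 7 \cdot \frac{d}{dx}(x^{10}) + 3 \cdot \frac{d}{dx}(x^9) - 2 \cdot \frac{d}{dx}(x^{13}) + \frac{d}{dx}(5) \\
&= 7 \cdot 10x^9 + 3 \cdot 9x^8 - 2 \cdot 13x^{12} + 0 \\
&= 70x^9 + 27x^8 - 26x^{12}
\end{align*}
Here’s another example:
\begin{align*}
\frac{d}{dx}(10x^{15} - 12x^4 + 5x^3 + 2x - 7) &= 10 \cdot 15x^{14} - 12 \cdot 4x^3 + 5 \cdot 3x^2 + 2x^0 - 0 \\
&= 150x^{14} - 48x^3 + 15x^2 + 2
\end{align*}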
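The term-by-term recipe is mechanical enough to hand to a computer. Here is a minimal Python sketch, assuming we store a polynomial as a dictionary from exponent to coefficient (that representation and the name `differentiate` are my own choices for illustration). It applies the xⁿ rule and the constant multiple rule to each term, and drops constant terms since their derivative is 0.

```python
def differentiate(coeffs):
    """Differentiate a polynomial given as {exponent: coefficient}.

    Each term c * x**n becomes (c * n) * x**(n - 1); constant terms vanish.
    """
    result = {}
    for power, coeff in coeffs.items():
        if power == 0:
            continue  # the derivative of a constant is 0
        result[power - 1] = result.get(power - 1, 0) + coeff * power
    return result


if __name__ == "__main__":
    # 7x^10 + 3x^9 - 2x^13 + 5
    print(differentiate({10: 7, 9: 3, 13: -2, 0: 5}))
    # 10x^15 - 12x^4 + 5x^3 + 2x - 7
    print(differentiate({15: 10, 4: -12, 3: 5, 1: 2, 0: -7}))
```

Calling `differentiate` on its own output gives the second derivative, which is handy for the homework at the end of this sheet.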
So that’s it for polynomials! We can take the derivative of any polynomial! Calculus: conquered!!!
Except not. There are lots of things that aren’t polynomials3 . tan(x) isn’t a polynomial, though we might
be curious as to what its derivative is. $e^x$ isn’t a polynomial. Rational functions aren’t polynomials.
Logarithms aren’t. Trig functions aren’t. And so forth! What are their derivatives? Do we have to go
back to the definition of a derivative, or can we find other shortcuts?
Here are some more questions:
3
although, if you are familiar with Taylor series, the entire world is made only out of polynomials
• We know how to take the derivative of two functions added together... what if we have two functions
multiplied together? like $x^2 \sin(x)$? what’s the derivative? is it $2x \cos(x)$? or $x^2 \cos(x)$? or $2x \sin(x)$?
or something else altogether???
• What if I want to find the derivative of two functions divided, like $\frac{x^2}{\sin(x)}$? Is it $\frac{2x}{\cos(x)}$? (Hint: No.)
• What if I want to find the derivative of one function inside another function, like $\sin(x^2)$? What if I
want to find the derivative of one function raised to another function, like $(\sin x)^{x^2}$? Or even more
simply, what if I want to find the derivative of something like $e^x$ or $5^x$, or maybe $5^{\sin x}$? And what
about logarithms? What’s the derivative of $\log_k(x)$?
• And what if I put all this stuff together??!? What if I need to find the derivative of:
$$\frac{5x^7 - x^2 \sin(x) + e^x - \tan(x)}{\log_7(x) + (x^2 + 3x)^{34} - \cos(x^8)}$$
If we don’t have any shortcuts, our life would be totally miserable, because we’d have to work out4 :
$$\lim_{h \to 0} \frac{\dfrac{5(x+h)^7 - (x+h)^2 \sin(x+h) + e^{x+h} - \tan(x+h)}{\log_7(x+h) + ((x+h)^2 + 3(x+h))^{34} - \cos((x+h)^8)} - \dfrac{5x^7 - x^2 \sin(x) + e^x - \tan(x)}{\log_7(x) + (x^2 + 3x)^{34} - \cos(x^8)}}{h}$$
WE’RE GOING TO NEED MORE SHORTCUTS.
Let’s deal with these questions one at a time.
First of all: what’s the derivative of $x^2 \sin(x)$? We know that the derivative of $x^2$ is 2x; we know
that the derivative of sin(x) is cos(x); is the derivative of $x^2 \sin(x)$ just 2x cos(x)? More generally... is the
derivative of two things multiplied together just the derivative of each of the two things multiplied?
We know that we can split derivatives up along addition:
$$(f(x) + g(x))' = f'(x) + g'(x)$$
Can we also split derivatives up along multiplication?
$$(f(x)g(x))' = f'(x)g'(x) \quad ???$$
We know that we can split derivatives up along addition because we proved it. But, without proving it,
that’s not an obvious result. Likewise, it’s not obvious that we can split derivatives up along multiplication.
Maybe we can; maybe we can’t.
Another way we can think about this is: how does this take-the-derivative function treat multiplica-
tion? Compare to how other functions treat multiplication:
• We can split square roots up along multiplication: $\sqrt{ab} = \sqrt{a} \cdot \sqrt{b}$
• We can split squaring along multiplication: $(ab)^2 = a^2 b^2$
• We can’t split the “multiply by five” function along multiplication: 5(ab) ≠ 5a · 5b
• We can split logs up along multiplication, but in a weird way: ln(ab) = ln(a) + ln(b)
• We can’t split trig functions up along multiplication5 : sin(ab) ≠ sin(a) · sin(b)
4
by “work out” I mean “simplify such that we get rid of all the h’s and the limit so that the whole thing looks pretty(-ier)”.
5
There is, actually, some formula for sin(ab), but it’s really weird. The point is that, like with logs, multiplication inside a
trig function doesn’t transform into multiplication outside a trig function.
• We can’t split exponentials up along multiplication: $e^{ab} \neq e^a \cdot e^b$
As it turns out, we can’t split derivatives up along multiplication. At least not cleanly. The derivative
of two things multiplied together (and we’ll prove this in a moment) is:
in Leibniz notation: $\frac{d}{dx}[f(x) \cdot g(x)] = \frac{df}{dx} \cdot g(x) + f(x) \cdot \frac{dg}{dx}$
or, in Lagrange notation: $(f(x)g(x))' = f'(x)g(x) + f(x)g'(x)$
So it’s the derivative of the first thing, times the second thing, and then plus the original first thing times
the derivative of the second thing6 .
So, for instance, if we return to our example of x2 sin(x), its derivative is:
\begin{align*}
\frac{d}{dx}\left[x^2 \sin(x)\right] &= (x^2)' \cdot \sin(x) + x^2 (\sin x)' \\
&= 2x \sin(x) + x^2 \cos(x)
\end{align*}
This rule, by the way, is known as the product rule. You can call it the “multiplicative rule,” too, if you
like, or “the one about the derivative of f (x) times g(x),” but this is the name most people give it.
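Before the proof, a quick numerical sanity check may be reassuring. This Python sketch evaluates Fermat's difference quotient for x² sin(x) with a small, fixed h (an approximation, not a limit) and compares it against both the product rule's answer and the tempting-but-wrong guess 2x cos(x). The function names and the sample point x = 1 are assumptions for illustration.

```python
import math


def difference_quotient(f, x, h=1e-6):
    """Fermat's difference quotient with a small fixed h (approximates the limit)."""
    return (f(x + h) - f(x)) / h


def original(x):
    return x ** 2 * math.sin(x)


def product_rule_answer(x):
    """f'(x)g(x) + f(x)g'(x) with f(x) = x^2 and g(x) = sin(x)."""
    return 2 * x * math.sin(x) + x ** 2 * math.cos(x)


def naive_guess(x):
    """The wrong guess f'(x)g'(x): derivative of each factor, multiplied."""
    return 2 * x * math.cos(x)


if __name__ == "__main__":
    x = 1.0
    print("difference quotient:", difference_quotient(original, x))
    print("product rule:       ", product_rule_answer(x))
    print("naive guess:        ", naive_guess(x))
```

At x = 1 the first two numbers agree to several decimal places, while the naive guess is off by more than 1.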
Let’s prove it! The proof is relatively straightforward, but with one caveat. All we really need to do is
plug f (x)g(x) into Fermat’s difference quotient and work things out, except there’s one weird step, where
in order to break it up into two pieces (into $f'(x)g(x)$ PLUS $f(x)g'(x)$) we need to add zero. And by
“add zero” I mean “add a special form of zero such that the equation doesn’t actually change (because
how can it change when we’re adding nothing?) but which enables us to look at it in a new and useful
way.” In this case, our zero will be:
0 = f (x)g(x + h) − f (x)g(x + h)
You’ll see it come up in a moment. Now, let’s start the proof!
\begin{align*}
[f(x)g(x)]' &= \lim_{h \to 0} \frac{f(x+h)g(x+h) - f(x)g(x)}{h} && \text{(definition of a derivative)} \\
&= \lim_{h \to 0} \frac{f(x+h)g(x+h) \overbrace{{} - f(x)g(x+h) + f(x)g(x+h)}^{=0} - f(x)g(x)}{h} && \text{(adding zero!)} \\
&= \lim_{h \to 0} \left[ \frac{f(x+h)g(x+h) - f(x)g(x+h)}{h} + \frac{f(x)g(x+h) - f(x)g(x)}{h} \right] && \text{(splitting up fraction)} \\
&= \lim_{h \to 0} \left[ \frac{f(x+h)g(x+h) - f(x)g(x+h)}{h} + f(x) \, \frac{g(x+h) - g(x)}{h} \right] && \text{(factor $f(x)$ out of second fraction)} \\
&= \lim_{h \to 0} \left[ \frac{f(x+h) - f(x)}{h} \, g(x+h) + f(x) \, \frac{g(x+h) - g(x)}{h} \right] && \text{(factor $g(x+h)$ out of first fraction)} \\
&= \lim_{h \to 0} \left[ \frac{f(x+h) - f(x)}{h} \, g(x+h) \right] + \lim_{h \to 0} \left[ f(x) \, \frac{g(x+h) - g(x)}{h} \right] && \text{(can split limits up along addition)}
\end{align*}
6
Were we to continue thinking about abstract mathematical structures, rather than the particular mathematical structure
that is calculus, we might ask: are there other functions that treat multiplication the same way derivatives do? are there other
functions f (x) such that f (ab) = f (a)·b + a·f (b)?

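\begin{align*}
&= \lim_{h \to 0} \frac{f(x+h) - f(x)}{h} \cdot \lim_{h \to 0} [g(x+h)] + \lim_{h \to 0} [f(x)] \cdot \lim_{h \to 0} \frac{g(x+h) - g(x)}{h} && \text{(can split limits up along multiplication)} \\
&= \underbrace{\lim_{h \to 0} \frac{f(x+h) - f(x)}{h}}_{=f'(x)} \cdot \underbrace{\lim_{h \to 0} [g(x+h)]}_{=g(x)} + \underbrace{\lim_{h \to 0} [f(x)]}_{=f(x)} \cdot \underbrace{\lim_{h \to 0} \frac{g(x+h) - g(x)}{h}}_{=g'(x)} \\
&= f'(x) \cdot g(x) + f(x) \cdot g'(x)
\end{align*}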
Good! Any questions? No? Then let’s move on to our next question: what if I want to find the
derivative of two things divided, like $x^2 / \sin(x)$? Like with multiplication, differentiation does weird things
to fractions, so the derivative isn’t just $2x / \cos(x)$; actually, it works out to be, like,
$$\frac{d}{dx}\left[\frac{x^2}{\sin(x)}\right] = \frac{2x \sin(x) - x^2 \cos(x)}{(\sin x)^2}$$
But, actually, wait: I don’t want to do this yet. Let’s save this theorem for a few pages from now. There
are two ways to prove it, and one of them (the way I prefer) requires knowing how to do something we
haven’t done yet. So let’s actually talk about a different question.
Question: what’s the derivative of one function inside another? What if, for instance, I want to find
the derivative of $\sin(x^4)$, with $x^4$ on the inside and sin(x) (or sin(stuff)) on the outside? Or the derivative
of $(\sin x)^4$, with sin x on the inside and $x^4$ (or $(\text{stuff})^4$) on the outside? Or what about, say, $(x^3 + x^9)^{22}$?
Or $e^{x^2}$ (where $e^{\text{stuff}}$ is on the outside and $x^2$ is on the inside)?
$$\frac{d}{dx}[\sin(x^4)] = \; ???$$
$$\frac{d}{dx}[(\sin x)^4] = \; ???$$
$$\frac{d}{dx}[(x^3 + x^9)^{22}] = \; ???$$
$$\frac{d}{dx}[e^{x^2}] = \; ???$$
Is the derivative of $\sin(x^4)$ just $\cos(4x^3)$? No. As it turns out, its derivative is: $\cos(x^4) \cdot 4x^3$. Or, more
generally:
$$(f(g(x)))' = f'(g(x)) \cdot g'(x)$$
So it’s just the derivative of the outside function, with the same insides, and then times the derivative of
the inside. This is known as the chain rule (etymology unknown to me).
So for example:
• $\frac{d}{dx}\left[\sin(x^4)\right]$
$= \cos(x^4) \cdot (x^4)'$
$= \cos(x^4) \cdot 4x^3$
• $\frac{d}{dx}\left[(\sin x)^4\right]$
$= 4(\sin x)^3 \cdot (\sin x)'$
$= 4(\sin x)^3 \cdot \cos(x)$
• $\frac{d}{dx}\left[(x^3 + x^9)^{22}\right]$
$= 22(x^3 + x^9)^{21} \cdot (x^3 + x^9)'$
$= 22(x^3 + x^9)^{21} \cdot (3x^2 + 9x^8)$
• $\frac{d}{dx}\left[e^{x^2}\right]$
$= e^{x^2} \cdot (x^2)'$ ($e^x$ is its own derivative)
$= e^{x^2} \cdot 2x$
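Here is a rough numerical check of those four answers, as a Python sketch. It compares each chain-rule result against Fermat's difference quotient with a small fixed h (so it approximates the limit rather than taking it). The list name `EXAMPLES`, the helper `agrees`, the sample point x = 0.7, and the 0.1% tolerance are all assumptions chosen for illustration.

```python
import math


def difference_quotient(f, x, h=1e-6):
    """Fermat's difference quotient with a small fixed h (approximates the limit)."""
    return (f(x + h) - f(x)) / h


# Each entry: (label, the composite function, the chain-rule answer from the text).
EXAMPLES = [
    ("sin(x^4)", lambda x: math.sin(x ** 4),
     lambda x: math.cos(x ** 4) * 4 * x ** 3),
    ("(sin x)^4", lambda x: math.sin(x) ** 4,
     lambda x: 4 * math.sin(x) ** 3 * math.cos(x)),
    ("(x^3 + x^9)^22", lambda x: (x ** 3 + x ** 9) ** 22,
     lambda x: 22 * (x ** 3 + x ** 9) ** 21 * (3 * x ** 2 + 9 * x ** 8)),
    ("e^(x^2)", lambda x: math.exp(x ** 2),
     lambda x: math.exp(x ** 2) * 2 * x),
]


def agrees(f, claimed_derivative, x, rel_tol=1e-3):
    """True if the claimed derivative matches the difference quotient at x."""
    return math.isclose(difference_quotient(f, x), claimed_derivative(x), rel_tol=rel_tol)


if __name__ == "__main__":
    for label, f, claimed in EXAMPLES:
        print(label, "chain rule matches:", agrees(f, claimed, 0.7))
    # The tempting wrong answer for sin(x^4) is cos(4x^3).
    wrong = lambda x: math.cos(4 * x ** 3)
    print("cos(4x^3) matches:", agrees(EXAMPLES[0][1], wrong, 0.7))
```

A match at one point does not prove the rule, of course; that is what the proof below is for.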
Let’s prove it. To do this proof, we’ll use a slightly different version of Fermat’s difference quotient. Usually
we write the FDQ so that only h is in the denominator:
$$f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}$$
but when we originally constructed it7 we just had ∆y in the top and ∆x in the bottom; the actual change
in x, going from a point x and ending at a point x + h, is (x + h) − x. So we could also define the derivative
like this:
$$f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{(x + h) - x}$$
Obviously, these two are exactly the same, because (x + h) − x = h. But my point is, it’s this latter version
that will be somewhat more helpful in proving the chain rule.
So. Let’s do this. If we want to take the derivative of f ( g(x) ) I guess we could, for lack of any other
obvious starting point, consider what the FDQ is:
$$[f(g(x))]' = \lim_{h \to 0} \frac{f(g(x+h)) - f(g(x))}{h}$$
Or, written differently:
$$[f(g(x))]' = \lim_{h \to 0} \frac{f(g(x+h)) - f(g(x))}{(x+h) - x}$$
Here’s the trick in this proof: we multiply by 1:
$$= \lim_{h \to 0} \frac{f(g(x+h)) - f(g(x))}{(x+h) - x} \cdot \underbrace{\frac{g(x+h) - g(x)}{g(x+h) - g(x)}}_{=1}$$
A common trick, as you’ve seen. If we multiply by 1, we don’t actually change anything; we just look at
it in a different way. Anyway, from here, the proof is straightforward (even if there’s a lot of writing). All
we really need to do is fiddle with this equation so that we eventually get $f'(g(x)) \cdot g'(x)$. For starters, we
can combine these two fractions:
$$= \lim_{h \to 0} \frac{(f(g(x+h)) - f(g(x))) \cdot (g(x+h) - g(x))}{((x+h) - x) \cdot (g(x+h) - g(x))}$$
and rearrange:
$$= \lim_{h \to 0} \frac{(f(g(x+h)) - f(g(x))) \cdot (g(x+h) - g(x))}{(g(x+h) - g(x)) \cdot ((x+h) - x)}$$
split up into two fractions again:
$$= \lim_{h \to 0} \frac{f(g(x+h)) - f(g(x))}{g(x+h) - g(x)} \cdot \frac{g(x+h) - g(x)}{(x+h) - x}$$
7
“To construct” here means “to derive”, especially when coming from geometry; what I mean in this sense is that we
“constructed” the equation by translating this geometric thing into algebra.
and, since we know we can split limits up along multiplication:
$$= \lim_{h \to 0} \frac{f(g(x+h)) - f(g(x))}{g(x+h) - g(x)} \cdot \lim_{h \to 0} \frac{g(x+h) - g(x)}{(x+h) - x}$$
BUT WAIT! Those two things look like derivatives!!!
$$= \underbrace{\lim_{h \to 0} \frac{f(g(x+h)) - f(g(x))}{g(x+h) - g(x)}}_{=f'(g(x))} \cdot \underbrace{\lim_{h \to 0} \frac{g(x+h) - g(x)}{(x+h) - x}}_{=g'(x)}$$
So we just have:
$$= f'(g(x)) \cdot g'(x)$$
Yay! One additional comment, in case you didn’t like the last step. If you’re not convinced that
$\lim_{h \to 0} \frac{f(g(x+h)) - f(g(x))}{g(x+h) - g(x)}$ actually is $f'(g(x))$, consider this: we know that the derivative of f is:
$$f'(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}$$
or, written differently:
$$f'(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{(x+h) - x}$$
but $f'$ is itself a function of x, so instead of plugging in x, why not plug in a triangle:
$$f'(\Delta) = \lim_{h \to 0} \frac{f(\Delta + h) - f(\Delta)}{(\Delta + h) - \Delta}$$
or Mr. Fink:
$$f'(\text{Mr. Fink}) = \lim_{h \to 0} \frac{f(\text{Mr. Fink} + h) - f(\text{Mr. Fink})}{(\text{Mr. Fink} + h) - \text{Mr. Fink}}$$
or another function of x, like $\sqrt{x}$:
$$f'(\sqrt{x}) = \lim_{h \to 0} \frac{f(\sqrt{x+h}) - f(\sqrt{x})}{\sqrt{x+h} - \sqrt{x}}$$
or ANY other function of x, like g(x):
$$f'(g(x)) = \lim_{h \to 0} \frac{f(g(x+h)) - f(g(x))}{g(x+h) - g(x)}$$
I hope that helps.
Question that’s still pending: what is the derivative of one function divided by another function? what
is the derivative of, say, $x^2 / \sin(x)$? Here’s the answer (the formula is called the quotient rule):
in Leibniz notation: $\frac{d}{dx}\left[\frac{f(x)}{g(x)}\right] = \frac{\frac{df}{dx} \cdot g(x) - f(x) \cdot \frac{dg}{dx}}{(g(x))^2}$
or, in Lagrange notation: $\left(\frac{f(x)}{g(x)}\right)' = \frac{f'(x)g(x) - f(x)g'(x)}{(g(x))^2}$
There are two different ways we could prove this. The way I prefer uses the chain rule. You see,
you don’t need to know the quotient rule at all, because division doesn’t exist—it’s just multiplication of
something that’s been raised to the −1. (In the same way that subtraction doesn’t exist—it’s just adding
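a number that’s been multiplied by −1.) If you want to take the derivative of $x^2 / \sin(x)$, for example:
\begin{align*}
\left(\frac{x^2}{\sin(x)}\right)' &= \left[(x^2)(\sin x)^{-1}\right]' && \text{(algebra/properties of exponents)} \\
&= \frac{d}{dx}(x^2) \cdot (\sin x)^{-1} + x^2 \cdot \frac{d}{dx}(\sin x)^{-1} && \text{(product rule)} \\
&= 2x \cdot (\sin x)^{-1} + x^2 \cdot \frac{d}{dx}(\sin x)^{-1} && \text{(derivative of $x^2$)} \\
&= 2x \cdot (\sin x)^{-1} + x^2 \cdot (-1)(\sin x)^{-2} \cdot \frac{d}{dx}(\sin x) && \text{(chain rule!)} \\
&= 2x \cdot (\sin x)^{-1} + x^2 \cdot (-1)(\sin x)^{-2} \cdot \cos(x) && \text{(chain rule, continued)} \\
&= 2x(\sin x)^{-1} - x^2 (\sin x)^{-2} \cdot \cos(x) && \text{(algebra)} \\
&= \frac{2x}{\sin x} - \frac{x^2 \cos x}{(\sin x)^2} && \text{(algebra)} \\
&= \frac{2x \sin(x) - x^2 \cos(x)}{(\sin x)^2} && \text{(combining fractions)}
\end{align*}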
So we can prove the quotient rule in precisely this manner, except instead of using the specific functions
$x^2$ and sin x, we’ll use the general functions f (x) and g(x):
\begin{align*}
\left(\frac{f(x)}{g(x)}\right)' &= \left[(f(x))(g(x))^{-1}\right]' && \text{(algebra/properties of exponents)} \\
&= f'(x) \cdot (g(x))^{-1} + f(x) \cdot \frac{d}{dx}(g(x))^{-1} && \text{(product rule)} \\
&= f'(x) \cdot (g(x))^{-1} + f(x) \cdot (-1)(g(x))^{-2} \cdot g'(x) && \text{(chain rule!)} \\
&= f'(x)(g(x))^{-1} - f(x)(g(x))^{-2} \cdot g'(x) && \text{(algebra)} \\
&= \frac{f'(x)}{g(x)} - \frac{f(x)g'(x)}{(g(x))^2} && \text{(algebra)} \\
&= \frac{f'(x)g(x) - f(x)g'(x)}{(g(x))^2} && \text{(combining fractions)}
\end{align*}
Excellent! Now, the other way of proving the quotient rule is to go back to first principles: rather than
use shortcuts we already know (chain rule/product rule) to create another shortcut (quotient rule), we
simply build a new shortcut from scratch. That is to say, from Fermat’s difference quotient. If you’ve
already seen the FDQ-based proofs of the additive rule, the constant multiple rule, the $x^n$ rule8 , the product
rule, the chain rule, etc., then the proof of the quotient rule is pretty straightforward. We take f (x)/g(x),
we plug it into the FDQ, we rearrange, we maybe do some algebra tricks (there’s an adding-zero step), and
eventually, after enough obnoxious manipulation, we end up with the familiar form of the quotient rule.
The proof (or this version of the proof) is a homework problem, but I’ll be nice and get you started:
$$\left(\frac{f(x)}{g(x)}\right)' = \lim_{h \to 0} \frac{\frac{f(x+h)}{g(x+h)} - \frac{f(x)}{g(x)}}{h} \qquad \text{(definition of a derivative)}$$
8
which, by the way, tends to get called the “power rule” by people other than me
\begin{align*}
&= \lim_{h \to 0} \frac{\dfrac{f(x+h)g(x) - f(x)g(x+h)}{g(x+h)g(x)}}{h} && \text{(combining fractions)} \\
&= \lim_{h \to 0} \frac{\dfrac{f(x+h)g(x) \overbrace{{} - f(x)g(x) + f(x)g(x)}^{=0} - f(x)g(x+h)}{g(x+h)g(x)}}{h} && \text{(adding zero)}
\end{align*}
And keep going from here! You’ll need to break the fractions up and factor things and do some limit
things, and it’ll be a lot of writing, but eventually you should end up with the quotient rule.
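While you grind through that algebra, a numerical cross-check can tell you what you should be aiming for. This Python sketch takes the running example x²/sin(x) and compares three things: Fermat's difference quotient with a small fixed h (an approximation), the quotient rule formula, and the product-plus-chain-rule form from the first proof. It also shows that the naive guess 2x/cos(x) does not match. All function names and the sample point x = 1 are assumptions for illustration.

```python
import math


def difference_quotient(f, x, h=1e-6):
    """Fermat's difference quotient with a small fixed h (approximates the limit)."""
    return (f(x + h) - f(x)) / h


def quotient(x):
    return x ** 2 / math.sin(x)


def quotient_rule(x):
    """(f'g - fg') / g^2 with f(x) = x^2 and g(x) = sin(x)."""
    return (2 * x * math.sin(x) - x ** 2 * math.cos(x)) / math.sin(x) ** 2


def product_and_chain(x):
    """The same derivative written as f' * g^(-1) + f * (-1) * g^(-2) * g'."""
    return 2 * x * math.sin(x) ** -1 + x ** 2 * (-1) * math.sin(x) ** -2 * math.cos(x)


def naive_guess(x):
    """The wrong guess f'(x) / g'(x)."""
    return 2 * x / math.cos(x)


if __name__ == "__main__":
    x = 1.0
    print("difference quotient:", difference_quotient(quotient, x))
    print("quotient rule:      ", quotient_rule(x))
    print("product + chain:    ", product_and_chain(x))
    print("naive guess:        ", naive_guess(x))
```

The two rule-based forms agree with each other essentially exactly, since they are the same expression rearranged; the difference quotient agrees only approximately because h is not actually zero.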
I have one more comment to make. All of these rules we’ve been proving and using are really
just shortcuts to Fermat’s difference quotient. These rules are all specific cases of the more general case of
Fermat’s difference quotient. The FDQ gives us a formula for the derivative of any function; the product
rule (e.g.) gives us a formula for the derivative of a function that looks like two things multiplied together.
I’ve listed the rules we’ve discussed here on the next page; you can fill them in and use it as a reference.
But my point is that there is nothing special about this set of rules; there’s no reason why we couldn’t
have come up with more rules (or fewer). All of these rules are just riffs, variations, and mash-ups of the
one fundamental equation: $\lim_{h \to 0} \frac{f(x + h) - f(x)}{h}$. I’ve discussed the rules I’ve discussed merely because I
think that they’ll be the most helpful in your calculus adventures.
This is a way of leading into my story. Yenyen Gatela noticed that she had a lot of homework problems
that looked like some number over $x^n$:

for example: $\frac{d}{dx}\left[\frac{5}{x^3}\right]$, $\frac{d}{dx}\left[\frac{52}{x^6}\right]$, etc.

So she wondered: rather than do each of these problems by hand, could she come up with a shortcut?
Could she find some equation for the derivative of $\frac{a}{x^n}$ (where a and n are any constants)? She had a bunch
of specific cases ($5/x^3$, $52/x^6$, etc.); could she come up with a more general case ($a/x^n$)? Based on her
experience, she guessed that it was this:

$$\frac{d}{dx}\left[\frac{a}{x^n}\right] = \frac{-an}{x^{n+1}}$$
And she was able to prove this without too much trouble:
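$$
\begin{aligned}
\frac{d}{dx}\left[\frac{a}{x^n}\right] &= \frac{d}{dx}\left[a x^{-n}\right] && \text{(laws of exponents)} \\
&= a \cdot \frac{d}{dx}\left[x^{-n}\right] && \text{(constant multiple rule)} \\
&= a \cdot (-n) \cdot x^{-n-1} && \text{(the } x^n \text{ rule)} \\
&= -an\,x^{-n-1} && \text{(algebra)} \\
&= -an\,x^{-(n+1)} && \text{(more algebra)} \\
&= \frac{-an}{x^{n+1}} && \text{(laws of exponents)}
\end{aligned}
$$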
Awesome! You see, that is what all of you should be doing. You should all be noticing similarities between
problems and using them to come up with more general rules—rules which are themselves but servants to
the FDQ.
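If you want to convince yourself that a guessed shortcut like this one is right before (or after) proving it, a quick numerical comparison can help. The sketch below is only an illustration: the helper names `central_difference` and `shortcut` are made up for this example, and the difference quotient uses a small fixed h rather than a true limit.

```python
def central_difference(f, x, h=1e-6):
    """Approximate f'(x) with a symmetric difference quotient (small h, not a true limit)."""
    return (f(x + h) - f(x - h)) / (2 * h)


def shortcut(a, n, x):
    """The guessed rule from the text: d/dx [a / x^n] = -a*n / x^(n+1)."""
    return -a * n / x ** (n + 1)


# The two specific cases mentioned in the text: 5/x^3 and 52/x^6, checked at x = 2.
for a, n in [(5, 3), (52, 6)]:
    x = 2.0
    numeric = central_difference(lambda t: a / t ** n, x)
    print(f"a={a}, n={n}: difference quotient {numeric:.6f}, shortcut {shortcut(a, n, x):.6f}")
```

Agreement at a few points is evidence, not a proof; the proof is the chain of equalities above.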
Differentiation Quick Reference
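• $\frac{d}{dx}[x^n] =$
• $\frac{d}{dx}[\sin(x)] =$
• $\frac{d}{dx}[\cos(x)] =$
• $\frac{d}{dx}[\tan(x)] =$
• $\frac{d}{dx}[e^x] =$
• $\frac{d}{dx}[\ln(x)] =$
• $\frac{d}{dx}[k \cdot f(x)] =$
• $\frac{d}{dx}[f(x) + g(x)] =$
• $\frac{d}{dx}[f(x) - g(x)] =$
• $\frac{d}{dx}[f(x) \cdot g(x)] =$
• $\frac{d}{dx}\left[\frac{f(x)}{g(x)}\right] =$
• $\frac{d}{dx}[f(g(x))] =$
• $\frac{d}{dx}\left[f(x)^{g(x)}\right] =$
• $\frac{d}{dx}\left[\log_{f(x)}(g(x))\right] =$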
Problems
Find the derivative of the function, then find the second derivative (i.e., the derivative of the derivative):
1. $f(x) = 1 - x$
2. $f(x) = 2(1 + x)$
3. $f(x) = 11x^5 - 6x^3 + 8$
4. $f(x) = -x^2 + 3$
5. $f(x) = x^2 + x + 8$
6. $f(x) = 3x^7 - 7x^3 + 21x^2$
7. $f(x) = \frac{4x^3}{3} - x$
8. $f(x) = \frac{x^3}{3} + \frac{x^2}{2} + \frac{x}{4}$
9. $f(x) = 3x^{-2}$
10. $f(x) = \frac{3}{x^2}$
11. $f(x) = -2x^{-1}$
12. $f(x) = \frac{-2}{x}$
13. $f(x) = \frac{1}{x}$
14. $f(x) = \frac{4}{x^2}$
15. $f(x) = ax^2 + bx + c$ (a, b, c constant)
16. $f(x) = \frac{x^2 + 2}{x^3}$
17. $f(x) = (x^2 - 1)(x - 3)$
18. $f(x) = x - \frac{1}{x}$
19. $f(x) = \frac{x^3}{1 - x}$
20. $f(x) = 6x^2 - 10x - 5x^{-1}$
21. $f(x) = 4 - 2x - x^{-3}$
22. $f(x) = \frac{1}{3x^2} - \frac{5}{2x}$
23. $f(x) = \frac{1}{6x^2} + \frac{7}{12x^3}$
24. $f(x) = 9x + 1 + \frac{1}{x}$
25. $f(x) = \frac{12}{x} - \frac{4}{x^3} + \frac{1}{x^4}$
26. $f(x) = x^{1/2}$
27. $f(x) = \sqrt{x}$
28. $f(x) = x^{1/3}$
29. $f(x) = \sqrt[3]{x}$
30. $f(x) = \sqrt[5]{x}$
31. $f(x) = \frac{10}{\sqrt{x}}$
32. $f(x) = x^{1/n}$, n constant
33. $f(x) = \sqrt[n]{x}$
34. $f(x) = (x^3 - 2x)(2x + 5)$
35. $f(x) = (9x^8 - 8x^9)\left(x + \frac{1}{x}\right)$
36. $f(x) = \left(1 + \frac{1}{x}\right)\left(1 + \frac{1}{x^2}\right)$
37. $f(x) = x^2(x - 1)$
38. $f(x) = x^2 \sin(x)$
39. $f(x) = 5x^9 \cos(x)$
40. $f(x) = \frac{x^2 - 1}{2x + 3}$
41. $f(x) = \frac{7x^4 + 11}{x + 1}$
42. $f(x) = \frac{x^3 + 3x}{x^2 - 1}$
43. $f(x) = \frac{6 - 1/x}{x - 2}$
44. $f(x) = \frac{1 + x^4}{x^2}$
45. $f(x) = \frac{ax - b}{cx - d}$ (a, b, c, d constants)
46. $f(x) = \frac{ax^2 + bx + c}{cx^2 + bx + a}$ (a, b, c constants)
47. $f(x) = \frac{x^3 + 7}{x}$
48. $f(x) = \frac{x^2 + 5x - 1}{x^2}$
49. $f(x) = \frac{(x - 1)(x^2 + x + 1)}{x^3}$
50. $f(x) = \frac{(x^2 + x)(x^2 - x + 1)}{x^4}$
51. $f(x) = x\,h(x)$
52. $f(x) = h(x) - \frac{1}{h(x)}$
53. $f(x) = 3x^2 g(x) - 5x$
54. $f(x) = g(x) + \frac{x}{g(x)}$
55. $f(x) = (h(x))^2$
56. $f(x) = h(x^2 + 1)$
57. $f(x) = h\left(\frac{x - 1}{x + 1}\right)$
58. $f(x) = (h(x))^2 + 1$
59. $f(x) = \frac{h(x) - 1}{h(x) + 1}$
60. $f(x) = \sin(x)$
61. $f(x) = \cos(x)$
62. $f(x) = \tan(x)$ (hint: how can you rewrite tan(x)?)
63. $f(x) = -10x + 3\cos(x)$
64. $f(x) = 6x^2 - \sin(x)$
65. $f(x) = 4 + \frac{2}{x} - \cos(x)$
66. $f(x) = \frac{3}{x} + 5\sin(x)$
67. $f(x) = \frac{x^2}{\tan(x)} - \frac{1}{x^2}$
68. $f(x) = x\sin(x) + \cos(x)$
69. $f(x) = \frac{4}{\cos(x)} + \frac{1}{\sin(x)}$
70. $f(x) = x^2 \sin(x) + 2x\cos(x)$
Find the derivative of each of the following functions:
71. f (x) = (2x + 1)400 88. f (x) = x3 (2x − 5)4

72. f (x) = (4 − 3x)9 89. f (x) = (1 − x)(3x2 − 5)5
73. f (x) = (x2 + 1)−3 90. f (x) = (4x + 3)4 (x + 1)−3
74. f (x) = (x + x3 )−2
91. f (x) = (1 − x)(3x2 − 5)5
1
75. f (x) =
5x − 7 92. f (x) = (4x + 3)4 (x + 1)−3
2
76. f (x) = 93. f (x) = (2x − 5)−1 (x2 − 5x)6
x2 +6
2
x

sin(x)
77. f (x) = (1 − )−7 94. f (x) =
7 1 + cos(x)
x −10
78. f (x) = −1 −1
2 1 + cos(x)
95. f (x) =
2 4 sin(x)
x 1
79. f (x) = +x− −3
8 x x
96. f (x) =
5 x−1
x 1
80. f (x) = + 2
5 5x
x 4
97. f (x) = −
4
81. f (x) = (sin(x)) + (cos(x)) −2 x−1 x−1
82. f (x) = x5 − 25 sin(x/5) 98. f (x) = sin3 (x) tan(4x)
83. f (x) = 2 cos(x/2) + x2 /4 cos4 (x)

99. f (x) =
84. f (x) = x tan(3x) + 7 tan(x)
−3
2x

x
85. f (x) = 100. f (x) =
cos(2x) x−1
1 −1

1 7 x3 + x2 + x + 1
86. f (x) = (3x − 2) + 4 − 2 101. f (x) =
21 2x x3 − x2 + x − 1
4
−3 1 2 x3 + x2 + x − 1
87. f (x) = (5 − 2x) + +1 102. f (x) =
8 x x3 − x2 + x + 1
Find the indicated derivatives:
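103. $\frac{d}{dx}\left[x\,\frac{d}{dx}\left(x - x^2\right)\right]$
104. $\frac{d^2}{dx^2}\left[(x^2 - 3x)\,\frac{d}{dx}\left(x + x^{-1}\right)\right]$
105. $\frac{d^4}{dx^4}\left[3x - x^4\right]$
106. $\frac{d^5}{dx^5}\left[ax^4 + bx^3 + cx^2 + dx + e\right]$ (a, b, c, d, e constant)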
Find all the derivatives of the following functions (i.e., the first derivative, the second derivative, the third
derivative, etc., etc., etc., ad infinitum)
107. $f(x) = x^2 - x$
108. $f(x) = \frac{x^3}{3} + \frac{x^2}{2} - 5$
109. $f(x) = \frac{x^4}{2} - \frac{3}{2}x^2 - x$
110. $f(x) = \frac{x^5}{120}$
111. $f(x) = x^n$, n constant
Find a formula for the nth derivative:
112. $f(x) = \frac{1}{x}$
113. $f(x) = \frac{1}{1 - x}$
114. $f(x) = \frac{x}{1 + x}$
115. $f(x) = \frac{1}{bx + c}$ (a, b, c constants)
There are lots of functions that we can sketch pretty easily. But there are others that we can’t sketch so
easily. Luckily, the derivative gives us a new bit of information that can help us see functions! For the
following functions:
1. Where is the slope of the function 0?

2. Where is the function increasing (i.e., where is the slope positive)?
3. Where is the function decreasing (i.e., where is the slope negative)?
116. $f(x) = x^3$
117. $f(x) = x^4$
118. $f(x) = x^2 + 4$
119. $f(x) = x + \frac{1}{x}$
120. $f(x) = x^3 + 3x^2 - 2$
121. $f(x) = 2x^3 - 12x^2 + 7$
122. $f(x) = 2x^3 - 3x^2 - 12x + 1$
123. $f(x) = x^2 - \frac{1}{x}$
124. $f(x) = x + \frac{4}{x^2}$
125. $f(x) = (1 + x^2)^{-2}$
126. $f(x) = (1 - x^2)^2$
Find a function with the given derivative (and then check your answer by differentiating):
127. $f'(x) = 3x^2 + 2x + 1$
128. $f'(x) = 4x^3 - 2x + 4$
129. $f'(x) = 2x^2 - 3x - \frac{1}{x^2}$
130. $f'(x) = x^4 + 2x^3 + \frac{1}{2\sqrt{x}}$
131. $f'(x) = 2\cos(x) - 3\sin(x)$
132. $f'(x) = 2x\cos(x^2) - 2\sin(2x)$
133. $f'(x) = 3(x^2 + 1)^3(2x)$
134. $f'(x) = 2x(x^2 - 1)$
135. $f'(x) = 2(x^3 - 2)(3x^2)$
136. $\frac{dy}{dx} = 3x^2(x^3 + 2)^2$
137. Finish the Fermat’s difference quotient-based proof of the quotient rule, as started on page 9.
138. Consider the function $f(x) = x^n$, where n is a positive integer. Find the kth derivative of f, if a)
k = n, b) k > n, and c) k < n.
139. What’s the second derivative of a product? That is, what is $[f(x)g(x)]''$? What about the third
derivative? the fourth derivative? fifth? what about the kth derivative?
140. Find the values of x where the slope of $f(x) = ax^2 + bx + c$ is zero.
141. Find the conditions/values of a, b, and c such that the function $f(x) = ax^3 + bx^2 + cx + d$
(a) has a slope of zero at two distinct points
(b) has a slope of zero at one point
(c) never has a slope of zero
142. Find $\frac{d^{999}}{dx^{999}}[\cos(x)]$
143. Find $\frac{d^{725}}{dx^{725}}[\sin(x)]$
144. Find $\frac{d}{dx}[\sin(\sin(\sin(\cos(x^2 + 5x))))]$
145. Find $\frac{d}{dx}[f(g(h(x)))]$
146. Consider the function f(x). Use the chain rule to show that a) if f(x) is even, then $f'(x)$ is odd,
and b) if f(x) is odd, then $f'(x)$ is even. (If you don’t remember the definitions of even and odd
functions, look them up on the internet or in a book.)
Circle of Trig Derivatives!
Calculus With Logarithms and Exponentials
We know how to work with exponential functions. We know how to work with logarithms. But there are
still two things we don’t know:
1. What is the derivative of an exponential function?

Answer: $\frac{d}{dx}(a^x) = \ln(a) \cdot a^x$

2. What is the derivative of a logarithm?

Answer: $\frac{d}{dx}\log_a x = \frac{1}{x \ln(a)}$
In the rest of these notes, we’ll derive these two identities (and more). It might seem like a lot of details,
and a lot of proofs, but don’t get bogged down in the formalism. All that is just details. Everything boils
down to these two equations. All the stuff we already know about calculus, logarithms, and exponential
functions still applies. This just gives us new toys to play with.
That said, you want to know why these two things are true, right? Let’s go find out!
The Derivative of an Exponential Function

What if we have an exponential function like $f(x) = a^x$, where a is some greater-than-zero constant?
What is its derivative?
We have no idea. This isn’t a polynomial like $f(x) = x^a$. In that case, the variable is in the base and
the constant is in the exponent. For example, $x^2$ and $2^x$ are very different functions:
[Graphs: $f(x) = x^2$ and $f(x) = 2^x$]
We have long known what the derivative of something like $x^2$ is. But in this case, in the case of an
exponential function like $2^x$, the base is a constant, and the exponent is a variable. We don’t know how
to find the derivative of that. And we don’t really know where to begin in our search for the derivative of
an exponential function, so let’s get back to basics.
We know that the definition of the derivative is
$$f'(x) = \lim_{h\to 0} \frac{f(x+h) - f(x)}{h}$$
(Ages ago we constructed this definition using Fermat’s difference quotient.) Let’s use this. If we use some
general exponential function $f(x) = a^x$, then we have:
$$
\begin{aligned}
\frac{d}{dx}(a^x) &= \lim_{h\to 0} \frac{a^{x+h} - a^x}{h} \\
&= \lim_{h\to 0} \frac{a^x a^h - a^x}{h} && \text{(properties of exponents)} \\
&= \lim_{h\to 0} \frac{a^x (a^h - 1)}{h} && \text{(factor out } a^x\text{)} \\
&= a^x \cdot \lim_{h\to 0} \frac{a^h - 1}{h} && \text{(as } h \to 0,\ a^x \text{ doesn’t change, so we can pull it out of the limit)}
\end{aligned}
$$
But where to go from here? We’re kind of stuck with this nasty limit thing.
But... notice that $\lim_{h\to 0} \frac{a^h - 1}{h}$ is constant with respect to x. It’s just a fixed number (for some fixed
a). Hmm. For convenience, let’s call it L. Then we have:
$$\frac{d}{dx}(a^x) = a^x \cdot \lim_{h\to 0} \frac{a^h - 1}{h} = a^x \cdot L = L \cdot a^x$$
This is cool! This means that the derivative of any exponential function is just the function again
times some constant L! Now, we have no idea how to actually compute L—it’s a really scary-looking
limit—but just be patient for a moment. Do you like taking derivatives? I do. If we take a whole bunch...
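$$
\begin{aligned}
\frac{d}{dx}[a^x] &= L \cdot a^x \\
\frac{d^2}{dx^2}[a^x] &= L^2 \cdot a^x \\
\frac{d^3}{dx^3}[a^x] &= L^3 \cdot a^x \\
\frac{d^4}{dx^4}[a^x] &= L^4 \cdot a^x \\
\frac{d^5}{dx^5}[a^x] &= L^5 \cdot a^x \\
&\ \ \vdots
\end{aligned}
$$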
The L’s keep piling up! What a mess.
Anyway, back to what L is. We don’t really know how to compute it. $\lim_{h\to 0} \frac{a^h - 1}{h}$ is a total mess. We
can’t simplify it. But we can estimate it with a silicon slave. Don’t worry too much about how we came
up with these numbers (you could, for example, graph $(2^x - 1)/x$, look at the graph, and see what value
it gets close to as x gets close to 0), but, for example:

a      $\lim_{h\to 0} \frac{a^h - 1}{h}$      $\frac{d}{dx}(a^x)$
2      0.693      $0.693 \cdot 2^x$
2.5    0.916      $0.916 \cdot 2.5^x$
3      1.098      $1.098 \cdot 3^x$
3.5    1.252      $1.252 \cdot 3.5^x$
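One way a calculator or computer might produce numbers like these is to plug a very small h into the expression and see what comes out. The sketch below assumes that approach; the function name `estimate_limit` and the choice h = 1e-7 are illustrative, not anything prescribed by these notes.

```python
def estimate_limit(a, h=1e-7):
    """Evaluate (a^h - 1)/h at one small h as a stand-in for the limit as h -> 0."""
    return (a ** h - 1) / h


# The same bases as in the table above.
for a in (2, 2.5, 3, 3.5):
    print(f"a = {a}: L is roughly {estimate_limit(a):.4f}")
```

Making h much smaller than this does not help: eventually rounding error in the subtraction swamps the answer, which is one reason a tidier formula for L would be nice to have.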
And so forth. Anyway, here’s an observation: if we could make L = 1 (i.e., if we could find some
number such that $\lim_{h\to 0} \frac{(\text{that number})^h - 1}{h} = 1$), we could have a function who is its own derivative!!!¹
THAT WOULD BE AWESOME. Question is, what is that number? Based on the examples, it must
be between 2.5 and 3. If we estimate it more precisely, we get:
that number ≈ 2.718281828459045...

Unfortunately, it’s not something nice like an integer. It’s an irrational, transcendental number that
goes on forever, like π. Which I guess in some ways makes it more beautiful.
Anyway, this is e! The base of all natural logarithms! “Euler’s Number”², after Leonhard Euler
(1707–1783)! This is how we define e! It is the number such that $\frac{d}{dx}(e^x) = e^x$ (don’t worry about the
self-referentiality there). Like with π, which is the ratio of the circumference of a circle to its diameter, e
is the number whose exponential function is its own slope. It arises eerily out of mathematical nature.
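To see how one might "estimate it more precisely," here is a rough sketch that hunts for the base whose limit equals 1 by repeatedly halving the interval between 2.5 and 3. The bisection strategy, the function names, and the step count are all assumptions made for illustration; the notes do not say how the estimate was actually obtained.

```python
import math


def limit_estimate(a, h=1e-7):
    """Approximate lim_{h->0} (a^h - 1)/h by plugging in one small h."""
    return (a ** h - 1) / h


def find_base(lo=2.5, hi=3.0, steps=40):
    """Bisect on the base a until limit_estimate(a) is as close to 1 as we can get."""
    for _ in range(steps):
        mid = (lo + hi) / 2
        if limit_estimate(mid) < 1:
            lo = mid  # limit too small, so the base must be bigger
        else:
            hi = mid  # limit too big, so the base must be smaller
    return (lo + hi) / 2


approx = find_base()
print(approx, "vs. math.e =", math.e)
```

Because the limit itself is only approximated, the result agrees with e to several decimal places rather than to full machine precision.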
So, to recap. We know:
$$\frac{d}{dx}(a^x) = a^x \cdot \lim_{h\to 0} \frac{a^h - 1}{h} \qquad \text{(for any } a > 0\text{)}$$
$$\frac{d}{dx}(e^x) = e^x \qquad \text{(for } e \approx 2.718\ldots\text{)}$$
Maybe we should also include the chain rule in these theorems—i.e., what if I want to raise a constant
not just to x but to some more complicated function of x? Then these two formulas follow directly from
the chain rule:
$$\frac{d}{dx}\left[a^{g(x)}\right] = a^{g(x)} \cdot \lim_{h\to 0} \frac{a^h - 1}{h} \cdot g'(x)$$
$$\frac{d}{dx}\left[e^{g(x)}\right] = e^{g(x)} \cdot g'(x)$$
But we still have this nasty limit expression in our formula for the derivative of a general exponential
function $a^x$. Is there a nicer way we can write the derivative of $a^x$? Yes. Let’s do it.
The Derivative of an Exponential (Way #2)

Back when we started this unit, we showed that
$$\frac{d}{dx}(a^x) = a^x \cdot \lim_{h\to 0} \frac{a^h - 1}{h}$$
But unless a ≈ 2.71828..., this is a useless mess, because we have no idea how to compute this limit (i.e.,
approximate the number that it represents, as a decimal). So, let’s see a new way to write this derivative,
which will be in terms of a natural log.
Theorem: $\frac{d}{dx}(a^x) = \ln(a) \cdot a^x$
Proof: Because of the laws of logarithms, we can write $a^x$ as $e^{\ln(a^x)}$, because $e^{\text{stuff}}$ and ln(stuff) are inverse
functions and will cancel each other out when composed. So then we can write $a^x$ as:
$$a^x = e^{\ln(a^x)}$$
1 Notice how I’ve anthropomorphized derivatives here.
2 e roughly equals 2.7 1828 1828 45 90 45, which is easy to remember, because then it’s just the year 1828, twice, and a
45-90-45 triangle. Of course, memorizing numbers like this is what gives math a bad name (math isn’t about numbers and
certainly not about memorizing long strings of them, etc. etc.), but this is kind of cool, and makes for a nice party trick.
And we can use laws of logarithms to simplify this even further. We can pull the exponent x out and have:
$$a^x = e^{x \ln(a)}$$
If we differentiate this equation, we get:
$$\frac{d}{dx}[a^x] = \frac{d}{dx}\left[e^{x \ln(a)}\right]$$
But we already know how to take the derivative of $e^{x \ln a}$. It looks like $e^{g(x)}$, and we’ve figured out how to
differentiate that. Because of the chain rule, we must have:
$$= e^{x \ln(a)} \cdot \frac{d}{dx}[x \ln(a)]$$
What’s the derivative of x ln(a)? Well, ln(a) is just a constant; it’s a number, like 5 or π or three-sevenths.
So the derivative of x ln(a) must just be ln(a) (in the same way that the derivative of 5x is 5).
$$= e^{x \ln(a)} \cdot \ln(a)$$
and we already know that another way to write $e^{x \ln a}$ is $a^x$:
$$= a^x \cdot \ln(a) = \ln(a) \cdot a^x$$
For example...
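• $\frac{d}{dx}(7^x) = \ln(7) \cdot 7^x$
• $\frac{d}{dx}\left(e^{5x}\right) = e^{5x} \cdot \frac{d}{dx}(5x) = 5e^{5x}$
• $\frac{d}{dx}\left(5^{x^3}\right) = \ln(5) \cdot 5^{x^3} \cdot 3x^2$ (by the chain rule)
• $\frac{d}{dx}\left[(3\pi)^{2x+1}\right] = \ln(3\pi) \cdot (3\pi)^{2x+1} \cdot 2$ (chain rule again)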
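The four examples above can be sanity-checked numerically. This is a hedged sketch: `central_difference` is a made-up helper that approximates a derivative with a small symmetric step, and the test point x = 0.7 is arbitrary.

```python
import math


def central_difference(f, x, h=1e-6):
    """Approximate f'(x) with a symmetric difference quotient."""
    return (f(x + h) - f(x - h)) / (2 * h)


# name -> (the function, the derivative claimed in the worked examples)
examples = {
    "7^x": (lambda x: 7 ** x, lambda x: math.log(7) * 7 ** x),
    "e^(5x)": (lambda x: math.exp(5 * x), lambda x: 5 * math.exp(5 * x)),
    "5^(x^3)": (lambda x: 5 ** (x ** 3),
                lambda x: math.log(5) * 5 ** (x ** 3) * 3 * x ** 2),
    "(3pi)^(2x+1)": (lambda x: (3 * math.pi) ** (2 * x + 1),
                     lambda x: math.log(3 * math.pi) * (3 * math.pi) ** (2 * x + 1) * 2),
}


def relative_error(name, x=0.7):
    """How far the numerical slope is from the claimed formula, relative to its size."""
    f, claimed = examples[name]
    return abs(central_difference(f, x) - claimed(x)) / abs(claimed(x))


for name in examples:
    print(f"{name}: relative error {relative_error(name):.2e}")
```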
Note
What if we have $\frac{d}{dx}(e^x)$? This must just be $\ln(e) \cdot e^x$, but since $\ln(e) = \log_e(e^1) = 1$, we must have:
$$\frac{d}{dx}(e^x) = \ln(e) \cdot e^x = 1 \cdot e^x = e^x$$
Which we already knew. So thankfully, this is consistent.
Corollary
Notice how we needed to use the chain rule in that last example. Might it not be nice to come up
with a version of this theorem that has a built-in chain rule? To wit:
$$\frac{d}{dx}\left(a^{g(x)}\right) = \ln(a) \cdot a^{g(x)} \cdot g'(x)$$
(where g(x) is any function of x)
Proof: Do it yourself! It’s basically the same as the proof we just did, but with g(x) instead of x. Or, even
more easily, it’s just the last theorem, but with an application of the chain rule.
Another note
From before, we know $\frac{d}{dx}(a^x) = \lim_{h\to 0} \frac{a^h - 1}{h} \cdot a^x$
And now, we know $\frac{d}{dx}(a^x) = \ln(a) \cdot a^x$
Then... these are just two different ways of writing the same derivative, so they must be equal. But then
$\lim_{h\to 0} \frac{a^h - 1}{h}$ and ln(a) must be equal! Or:
$$\lim_{h\to 0} \frac{a^h - 1}{h} = \ln(a)$$
Which is totally crazy and counterintuitive and doesn’t at all make visceral sense. Just by thinking about
how limits work, or by thinking about how logarithms work, you would not at all expect these two things
to be equal. But they are. We’ve proved it. This is, in a sense, one of the real powers of math: we are
not constrained simply to what we feel is true, or think should be true. We can use logic to discover new
truths, or even truths that contradict our innate, not-necessarily-rational beliefs.
Of course, I am talking about mathematical truths. In a sense, it is much easier to discover truth in
mathematics than truth in reality (whatever that is). In math, we start with explicit axioms and rules
of logical inference; mathematical truths are only true relative to these axioms. They are not true in any
universal, structure–of–the–cosmos sort of way3 .
Anyway, back to logs. I have no idea when you would need to calculate $\lim_{h\to 0} \frac{a^h - 1}{h}$, but if you did,
you could use this formula.
For example...
• $\lim_{h\to 0} \frac{2^h - 1}{h} = \ln(2) \approx 0.693\ldots$
• $\lim_{t\to 0} \frac{\pi^t - 1}{t} = \ln(\pi) \approx 1.1447\ldots$
• $\lim_{h\to 0} \frac{f(x)^h - 1}{h} = \ln(f(x))$
The Derivative of a Logarithm

This is tricky. We have no rule that will help us take the derivative of a log. What if we go back to
basics and use the definition of a derivative? This, after all, is whence all of our other derivative laws came
(e.g., $\frac{d}{dx} x^n = nx^{n-1}$), and it’s how we came up with the derivative of $a^x$.
So let’s try that. Given a function f(x), the derivative is $\lim_{h\to 0} \frac{f(x+h) - f(x)}{h}$. So if $f(x) = \log_k(x)$...
$$\frac{d}{dx}(\log_k(x)) = \lim_{h\to 0} \frac{\log_k(x+h) - \log_k(x)}{h}$$
3 Or are they? Would aliens on Alpha Centauri come up with an equivalent set of mathematical axioms, and thus an
equivalent mathematics? We can use different axioms and create different mathematical structures. Does math reflect reality?
Is it part of it? Or is it just an awesome game?
Maybe we could rewrite this using laws of logs:
$$\frac{d}{dx}(\log_k(x)) = \lim_{h\to 0} \frac{1}{h} \log_k\left(\frac{x+h}{x}\right)$$
And maybe split the fraction up:
$$\frac{d}{dx}(\log_k(x)) = \lim_{h\to 0} \frac{1}{h} \log_k\left(1 + \frac{h}{x}\right)$$
But where do we go from here? We can’t do much more to simplify this. Uh. Darn. We’ll need to use
another method.
The Derivative of a Logarithm (Second Attempt)

What else do we know about logarithms? We know they’re cool. We also know that, by definition,
they cancel out with exponentials upon composition:
$$\log_k(k^x) = x \quad \text{and} \quad k^{\log_k(x)} = x$$
What if we try to take the derivative of that equation on the right? It has a log in it, but more importantly,
it has an exponential, and we already know how to take the derivative of those.
$$
\begin{aligned}
\frac{d}{dx}\left[k^{\log_k(x)}\right] &= \frac{d}{dx}(x) && \text{(taking derivative of both sides)} \\
\frac{d}{dx}\left[k^{\log_k(x)}\right] &= 1 && \text{(} d/dx(x) = 1 \text{)} \\
k^{\log_k(x)} \cdot \ln(k) \cdot \frac{d}{dx}[\log_k(x)] &= 1 && \text{(derivative of exponential; derivative of exponent comes out by chain rule)} \\
\frac{d}{dx}[\log_k(x)] &= \frac{1}{k^{\log_k(x)} \cdot \ln(k)} && \text{(dividing by } k^{\log_k(x)} \cdot \ln(k) \text{)} \\
\frac{d}{dx}[\log_k(x)] &= \frac{1}{x \cdot \ln(k)} && \text{(and we already know that } k^{\log_k(x)} = x \text{)}
\end{aligned}
$$
Yay! Now we know how to take the derivative of a logarithm!!!

For example...
• $\frac{d}{dx}(\log_7(x)) = \frac{1}{x \ln(7)}$
• $\frac{d}{dx}(\ln(x)) = \frac{d}{dx}(\log_e(x)) = \frac{1}{x \ln(e)} = \frac{1}{x}$
• $\frac{d}{dx}[\log_9(x^5)] = \frac{1}{x^5 \ln(9)} \cdot 5x^4 = \frac{5}{x \ln(9)}$
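As with the exponential rule, the three logarithm examples can be compared against a numerical slope. The helper name `central_difference` and the test point x = 2 are illustrative choices, not part of the notes.

```python
import math


def central_difference(f, x, h=1e-6):
    """Approximate f'(x) with a symmetric difference quotient."""
    return (f(x + h) - f(x - h)) / (2 * h)


x = 2.0
# (label, the function, the derivative value the worked examples predict at x)
checks = [
    ("log_7(x)", lambda t: math.log(t, 7), 1 / (x * math.log(7))),
    ("ln(x)", math.log, 1 / x),
    ("log_9(x^5)", lambda t: math.log(t ** 5, 9), 5 / (x * math.log(9))),
]

for label, f, claimed in checks:
    print(f"{label}: numerical {central_difference(f, x):.6f}, formula {claimed:.6f}")
```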
Corollary
Notice how we needed to use the chain rule in that last example. Perhaps we could come up with a
formula for the derivative of a logarithm that has a built-in chain rule?
Imagine we have a logarithm base a and some function g(x). Then:
$$\frac{d}{dx}[\log_a(g(x))] = \frac{1}{g(x) \ln(a)} \cdot g'(x)$$
Proof: You can do it yourself! It’s basically the same as the proof we just did, but with g(x) instead
of x.
The Derivative of an Inverse Function

Note that in order to take the derivative of a log, we used the property that logs and their inverses
(exponentials) cancel out when we put them inside of each other. This is, of course, true about any pair
of inverse functions; it’s the very definition of an inverse function. So the cool thing is, then, that we can
use the same method we used to figure out the derivative of a logarithm to figure out the derivative of any
inverse function (at least in terms of some other stuff).
Theorem: $\underbrace{\left(f^{\text{inv}}\right)'(x)}_{\text{derivative of the inverse}} = \frac{1}{f'\left(f^{\text{inv}}(x)\right)}$
For example...
$f(x) = x^3$ and $f^{\text{inv}}(x) = \sqrt[3]{x}$ are inverses. And we know $f'(x) = 3x^2$. So by this theorem,
$$\left(f^{\text{inv}}\right)'(x) = \frac{1}{f'(\sqrt[3]{x})} = \frac{1}{3(\sqrt[3]{x})^2} = \frac{1}{3(x^{1/3})^2} = \frac{1}{3x^{2/3}} = \frac{1}{3}x^{-2/3}$$
Of course, we already knew this: we know how to differentiate $\sqrt[3]{x}$ the normal way. (Just write it like
$x^{1/3}$ and use the power rule.) This is just another way to do it.
Proof: By the definition of an inverse function, we must have
$$f\left(f^{\text{inv}}(x)\right) = x$$
So if we differentiate both sides:
$$\frac{d}{dx}\left[f\left(f^{\text{inv}}(x)\right)\right] = \frac{d}{dx}(x)$$
Which is just:
$$\frac{d}{dx}\left[f\left(f^{\text{inv}}(x)\right)\right] = 1$$
But on the left, we have the derivative of a function inside another function... which requires, like, THE
CHAIN RULE! So if we use the chain rule we’ll get:
$$f'\left(f^{\text{inv}}(x)\right) \cdot \left(f^{\text{inv}}\right)'(x) = 1$$
But we can just use algebra to rearrange this:
$$\left(f^{\text{inv}}\right)'(x) = \frac{1}{f'\left(f^{\text{inv}}(x)\right)}$$
Whee!
Another example
$f(x) = \sin(x)$ and $f^{\text{inv}}(x) = \sin^{\text{inv}}(x)$ are inverses. And $\frac{d}{dx}(\sin x) = \cos(x)$. So then the derivative of
$\sin^{\text{inv}}(x)$ is...
$$\left(\sin^{\text{inv}}\right)'(x) = \frac{1}{\cos(\sin^{\text{inv}}(x))}$$
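Both inverse-function examples (the cube root and the inverse sine) can be checked against a numerical slope. In this sketch, `inverse_derivative` is a hypothetical helper that just evaluates the right-hand side of the theorem, and `math.asin` plays the role of the inverse sine.

```python
import math


def central_difference(f, x, h=1e-6):
    """Approximate f'(x) with a symmetric difference quotient."""
    return (f(x + h) - f(x - h)) / (2 * h)


def inverse_derivative(f_prime, f_inv, x):
    """Right-hand side of the theorem: 1 / f'(f_inv(x))."""
    return 1 / f_prime(f_inv(x))


def cube_root(x):
    return x ** (1 / 3)


# f(x) = x^3, so f'(u) = 3u^2 and the inverse is the cube root.
print(inverse_derivative(lambda u: 3 * u ** 2, cube_root, 8.0),
      central_difference(cube_root, 8.0))

# f(x) = sin(x), so f'(u) = cos(u) and the inverse is asin.
print(inverse_derivative(math.cos, math.asin, 0.5),
      central_difference(math.asin, 0.5))
```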
The Derivative of $x^x$
We know how to take the derivative of a variable raised to a constant: $\frac{d}{dx}(x^a) = ax^{a-1}$
We know how to take the derivative of a constant raised to a variable: $\frac{d}{dx}(a^x) = \ln(a)\,a^x$
But how do we take the derivative of a variable raised to a variable??? $\frac{d}{dx}(x^x) = ????$
(By the way, what is the derivative of a constant raised to a constant?)
Theorem: $\frac{d}{dx}(x^x) = x^x(1 + \ln(x))$
Proof: By properties of logarithms, we can write $x^x$ as $e^{\ln(x^x)}$:
$$x^x = e^{\ln(x^x)}$$
Which we can simplify by using another property of logs:
$$x^x = e^{x \ln(x)} \qquad (*)$$
We’ll use this equation again, so let’s label it as ∗. If we differentiate with respect to x:
$$\frac{d}{dx}(x^x) = \frac{d}{dx}\left(e^{x \ln(x)}\right)$$
But we know how to take the derivative of $e^{\text{stuff}}$:
$$\frac{d}{dx}(x^x) = e^{x \ln(x)} \cdot \underbrace{\frac{d}{dx}(x \ln(x))}_{\text{by chain rule}}$$
Using the product rule, we get:
$$
\begin{aligned}
\frac{d}{dx}(x^x) &= e^{x \ln(x)} \cdot \left(x \cdot \frac{d}{dx}(\ln x) + \ln x \cdot \frac{d}{dx}(x)\right) \\
&= e^{x \ln(x)} \cdot \left(x \cdot \frac{1}{x} + \ln(x) \cdot 1\right) \\
&= e^{x \ln(x)} \cdot (1 + \ln x)
\end{aligned}
$$
Almost there! By the identity we labeled ∗, we can write $e^{x \ln x}$ as $x^x$, so this becomes:
$$\frac{d}{dx}(x^x) = x^x(1 + \ln x)$$
For example...
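• $\frac{d}{dt}(t^t) = t^t(1 + \ln t)$
• $\frac{d}{dx}\left((x^2)^{x^2}\right) = (x^2)^{x^2}(1 + \ln(x^2)) \cdot \underbrace{2x}_{\text{chain rule}}$
• $\frac{d}{d\theta}\left((\sin\theta)^{\sin\theta}\right) = (\sin\theta)^{\sin\theta}(1 + \ln(\sin\theta)) \cdot \cos\theta$ (chain rule again)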
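Since $x^x$ is an unfamiliar function, it may be reassuring to compare the theorem against a numerical slope at a few positive values of x. The helper names below are invented for this sketch, and the sample points are arbitrary.

```python
import math


def central_difference(f, x, h=1e-6):
    """Approximate f'(x) with a symmetric difference quotient."""
    return (f(x + h) - f(x - h)) / (2 * h)


def x_to_the_x(x):
    return x ** x


def x_to_the_x_derivative(x):
    """The theorem: d/dx (x^x) = x^x (1 + ln x), for x > 0."""
    return x ** x * (1 + math.log(x))


for x in (0.5, 1.0, 2.0, 3.0):
    print(f"x = {x}: numerical {central_difference(x_to_the_x, x):.6f}, "
          f"formula {x_to_the_x_derivative(x):.6f}")
```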
BUT. This formula only tells us how to find the derivative of a variable raised to itself. What if we
want, say, to find the derivative of $(3x + 2)^{\sin x}$? More generally, what if we want to find the derivative of
one function of x raised to another function of x?
Theorem: $\frac{d}{dx}\left[g(x)^{h(x)}\right] = g(x)^{h(x)} \cdot \frac{d}{dx}[h(x) \ln(g(x))]$
Proof: This is basically the same as the proof of $\frac{d}{dx}(x^x)$, but with arbitrary functions (i.e., with g(x) in
place of the base x and h(x) in place of the exponent x). By laws of logarithms, we know:
$$g(x)^{h(x)} = e^{\ln\left(g(x)^{h(x)}\right)}$$
Also by log laws:
$$g(x)^{h(x)} = e^{h(x) \ln(g(x))} \qquad (**)$$
differentiating...
$$\frac{d}{dx}\left[g(x)^{h(x)}\right] = \frac{d}{dx}\left[e^{h(x) \ln(g(x))}\right]$$
which is:
$$\frac{d}{dx}\left[g(x)^{h(x)}\right] = e^{h(x) \ln(g(x))} \cdot \frac{d}{dx}[h(x) \ln(g(x))]$$
But we already know that $e^{h(x) \ln(g(x))} = g(x)^{h(x)}$ (from ∗∗). So:
$$\frac{d}{dx}\left[g(x)^{h(x)}\right] = g(x)^{h(x)} \cdot \frac{d}{dx}[h(x) \ln(g(x))]$$
For example...
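• $\frac{d}{dx}\left[(x^2)^{\sin x}\right] = (x^2)^{\sin x} \cdot \frac{d}{dx}\left(\sin x \cdot \ln(x^2)\right)$
$= (x^2)^{\sin x} \cdot \left(\cos x \cdot \ln(x^2) + \sin(x) \cdot \frac{1}{x^2} \cdot 2x\right)$ (product rule)
$= (x^2)^{\sin x} \cdot \left(\cos x \cdot \ln(x^2) + \sin(x) \cdot \frac{2}{x}\right)$
$= (x^2)^{\sin x} \cdot \left(2\cos(x)\ln(x) + \frac{2\sin(x)}{x}\right)$
• $\frac{d}{dx}\left[(8x + 4)^{x^3}\right] = (8x + 4)^{x^3} \cdot \frac{d}{dx}\left(x^3 \ln(8x + 4)\right)$
$= (8x + 4)^{x^3} \cdot \left(3x^2 \ln(8x + 4) + x^3 \cdot \frac{1}{8x + 4} \cdot 8\right)$ (product rule)
$= (8x + 4)^{x^3} \cdot \left(3x^2 \ln(8x + 4) + \frac{8x^3}{8x + 4}\right)$ (simplification)
$= (8x + 4)^{x^3} \cdot \left(3x^2 \ln(8x + 4) + \frac{2x^3}{2x + 1}\right)$ (simplification)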
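The general theorem can be wrapped in a small function and tested on the second example, $(8x+4)^{x^3}$. This is a sketch under assumptions: the name `general_power_derivative` is invented here, the caller supplies g, g', h, and h' by hand, and g(x) must be positive for the logarithm to make sense.

```python
import math


def central_difference(f, x, h=1e-6):
    """Approximate f'(x) with a symmetric difference quotient."""
    return (f(x + h) - f(x - h)) / (2 * h)


def general_power_derivative(g, g_prime, h, h_prime, x):
    """g(x)^h(x) times d/dx[h(x) ln g(x)], with the inner derivative done by the product rule."""
    inner = h_prime(x) * math.log(g(x)) + h(x) * g_prime(x) / g(x)
    return g(x) ** h(x) * inner


# The worked example: g(x) = 8x + 4, h(x) = x^3.
g = lambda x: 8 * x + 4
g_prime = lambda x: 8.0
h = lambda x: x ** 3
h_prime = lambda x: 3 * x ** 2

x = 1.0
formula = general_power_derivative(g, g_prime, h, h_prime, x)
simplified = g(x) ** h(x) * (3 * x ** 2 * math.log(8 * x + 4) + 2 * x ** 3 / (2 * x + 1))
numerical = central_difference(lambda t: g(t) ** h(t), x)
print(formula, simplified, numerical)
```

The three printed numbers should agree closely: the theorem, the hand-simplified last line of the example, and the numerical slope.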
An Awesome Proof of Why $(x^n)' = nx^{n-1}$

We’ve already proven that the derivative of $x^n$ is $nx^{n-1}$... sort of. Our proof, which was rather
excruciating and involved huge amounts of writing, only actually worked for n being some positive integer.
Some of you, for an extra credit problem, proved it for n being a negative integer. But this still leaves
out large numbers of numbers: what if n is a rational number? for example, what if we want to take the
derivative of $x^{1/2}$? what if n is a real number? what if we want to take the derivative of $x^{\pi}$?
As it turns out, the $(x^n)' = nx^{n-1}$ law is true if n is any real number (and not just an integer or a
rational number). And, as it turns out, there’s a very simple and very clean proof. It doesn’t take any
of the pain that our proof that only worked for natural numbers did; in fact, it doesn’t even use Fermat’s
difference quotient. It simply relies on the derivatives of logs, exponentials, and the chain rule.
The basic idea is that if we have $x^n$, then we can rewrite it as $e^{\ln(x^n)}$, just using properties of logs: the
$e^{\text{stuff}}$ and ln(stuff) will cancel out. But then using a different property of logs (the one about how you can
pull exponents down) you can rewrite it as $e^{n \ln(x)}$. (That works for any real number n, as long as x > 0.) And then we
know how to take its derivative!
$$
\begin{aligned}
\frac{d}{dx}[x^n] &= \frac{d}{dx}\left[e^{\ln(x^n)}\right] && \text{properties of logs} \\
&= \frac{d}{dx}\left[e^{n \ln(x)}\right] && \text{another property of logs} \\
&= e^{n \ln(x)} \cdot \frac{d}{dx}[n \ln(x)] && \text{chain rule} \\
&= e^{n \ln(x)} \cdot n \cdot \frac{1}{x}
\end{aligned}
$$
But now we can rewrite $e^{n \ln(x)}$, since we know that’s just equal to $e^{\ln(x^n)}$, or just $x^n$:
$$
\begin{aligned}
&= x^n \cdot n \cdot \frac{1}{x} \\
&= \frac{nx^n}{x} \\
&= nx^{n-1}
\end{aligned}
$$
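To see the rule working for exponents that are not integers, here is a small numerical comparison for n = 1/2, n = π, and a negative non-integer. The helper names and the test point x = 2 are assumptions made for this sketch.

```python
import math


def central_difference(f, x, h=1e-6):
    """Approximate f'(x) with a symmetric difference quotient."""
    return (f(x + h) - f(x - h)) / (2 * h)


def power_rule(n, x):
    """The claimed derivative of x^n for any real n (with x > 0)."""
    return n * x ** (n - 1)


x = 2.0
for n in (0.5, math.pi, -2.5):
    numerical = central_difference(lambda t: t ** n, x)
    print(f"n = {n:.4f}: numerical {numerical:.6f}, n*x^(n-1) = {power_rule(n, x):.6f}")
```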
One Last Question (or two)

What is the derivative of $x^{x^x}$? What about of $g(x)^{h(x)^{j(x)}}$? Remember how we can make “iterated
sums” with a giant Σ and “iterated products” with a giant Π? Can we make an “iterated exponentiation”
function, like $\text{Power}_n(x) = \underbrace{x^{x^{x^{\cdot^{\cdot^{\cdot}}}}}}_{n \text{ times}}$? What is its derivative?
One thing to be careful with is that, unlike addition and multiplication,
exponentiation is not commutative: $a^b \neq b^a$, or, in calculator notation, a ∧ b ≠ b ∧ a. (For example,
$2^5 \neq 5^2$.) Likewise, it’s not associative: a ∧ (b ∧ c) ≠ (a ∧ b) ∧ c. So if you have a teetering tower of things
being exponentiated, you need to be clear where your parentheses are: the function $x^{(x^2)}$ isn’t the same
as the function $(x^x)^2$.
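A few lines of arithmetic make the parenthesization point concrete. The function names `tower_right` and `tower_left` are made up for this illustration; note that Python’s own `**` operator groups from the right, so `a ** b ** c` means `a ** (b ** c)`.

```python
# Not commutative: swapping base and exponent changes the answer.
print(2 ** 5, 5 ** 2)

# Not associative: where the parentheses go matters.
a, b, c = 2, 3, 2
print(a ** (b ** c), (a ** b) ** c)


def tower_right(x):
    """x^(x^2): the exponent is itself a power."""
    return x ** (x ** 2)


def tower_left(x):
    """(x^x)^2: square the result of x^x."""
    return (x ** x) ** 2


print(tower_right(3.0), tower_left(3.0))
```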
Actually, I Have Another Question

Throughout these notes, we’ve dealt exclusively with logarithms which bases⁴ are constants. The
natural log (ln) has base e; the $\log_7$ has base 7, $\log_k$ (which I use in proofs a lot) has a base of constant k.
But logarithms only exist as inverses of exponential functions. Usually we talk about exponential
functions which bases are constants: $e^x$, $7^x$, $k^x$. Sometimes we raise them to things that are more
complicated than just x: $e^{\sin x}$, $7^{x^2}$, $k^{\log_k(5)}$. But there’s no reason why we couldn’t exponentiate them by
something more complicated than a constant. There’s no reason why the base has to be constant. What’s
wrong with the function $(\sin(x))^x$? What’s wrong with $(x^2 + 5x^3)^{\cos x}$? Absolutely nothing! We can take
functions and raise them to other functions.
4 Note my use of “which” as a possessive relative pronoun. Certainly I couldn’t say “whose base,” since logarithms are
inanimate, and while I could sound mildly pretentious and say “the base of which,” I discovered (in the midst of a David
Foster Wallace essay)(he used it in this way) that “which” used to be used in this way. I’ve since seen it once or twice in early
20th century/late 19th century English books.
So, ultimately my question is “what is the derivative of the log base f(x) of g(x)”?
$$\frac{d}{dx}\left[\log_{f(x)}(g(x))\right] = ???$$
But to answer this question, we need to first ask the question: “what is a log base f(x) of g(x)”?? Meaning:
we’ve never seriously dealt with logarithms which bases are arbitrary functions and not constants. We’ve
only seen them once in passing. On the logs quiz back in October (September?), I asked you:
$$\log_{a^3 b^4}\left(\frac{b^{-12}}{\left(\sqrt[4]{a^3}\right)^{12}}\right) = ???$$
Many of you were able to fiddle with this and get:
$$\log_{a^3 b^4}\left(\frac{b^{-12}}{\left(\sqrt[4]{a^3}\right)^{12}}\right) = \log_{a^3 b^4}\left(\left(a^3 b^4\right)^{-3}\right) = -3$$
But we don’t have a general theory of how logs to a variable base behave. We don’t have a list of properties
of logs-base-f (x); we only have a list of properties of logs-base-k (where k is some constant). Before we
start doing calculus with logs-base-f (x), we really ought to know how to do algebra5 with logs-base-f (x).
(It might or might not be necessary as a prerequisite for calculus; if nothing else, it would probably be
interesting.)
I have no idea what such a theory would be; I haven’t thought about it very much. But I suppose
I would start by thinking about general properties of exponential functions with variable bases. If we
understand how things like $f(x) = g(x)^{h(x)}$ behave, maybe we can start thinking about how $f^{\text{inv}}(x)$ must
behave.
Problems
Differentiate the following functions, with respect to x, θ, t, etc., as appropriate. (As is my usual convention,
a, b, c, k, and n represent constants.)
1. y = ex 11. y = esin x 21. y = xex

2. y = e100x 12. y = 2etan θ 22. y = −3tet
1
3. y = e2x/3 13. y = 7e cos 4θ
2 23. y = xex − ex
4. y = e−4x/5 14. y = (ex + 1)2
√
5. y = ex+ 2 15. y = (e2x − e−2x )2 24. y = (1 + 2x)e−2x
√ √
6. y = ex−π 16. y = e x ln( x) 25. y = (6x2 + 6x + 3)e2x
2
7. y = e3x 17. y = e2 ln(x)
26. y = (9x2 − 6x + 2)e3x
8. y = e −x2 18. y = ex/ ln(x) √
27. y = 2te t
9. y = e 5t2 −7t 19. y =
2
e2x −x
2 3
10. y = e4t+t 20. y = ex ln(x) 28. y = t2 e2/t
5
I use this word loosely. Mathematicians, please don’t quibble.
29. y = x2 ex − xex
2
e2θ
40. y =
ex e2θ + 1
30. y = −x
e +1 eθ
41. y =
e−x 1 − e2θ
31. y = x
e +1 42. y = ln(cos e2x )
32. y = (cos(θ) − 1)ecos θ 43. y = ax
33. y = (sin(θ) − 1)esin θ 44. y = 2x
et 45. y = 8x
34. y = (sin t + cos t)
2
46. y = 3−x
e−t
35. y = (sin t − cos t) 47. y = 9−x
2
√
e−x 48. y = 5 x
36. y = (2 sin(2x) − cos(2x)
5 49. y = 2 s2
e2x 1
37. y = (2 sin(3x) − 3 cos(3x) 50. y = 7 cos θ ln(7)
13
ax − 1 ax 51. y = 3tan θ ln(3)
38. y = e
a2 52. y = 2sin(3t)
ax + 1 −ax
39. y = e 53. y = 5− cos(2t)
a2

54. y = ln(x) 72. y = (ln x)3 x+2
86. y = ln
x3 − 1
55. y = ln(5x) 73. y = ln(ln x) θ
e
56. y = ln(x/5) 74. y = 1/(ln x) 87. y = ln
1 + eθ
57. y = ln(t2 ) 75. y = t(ln t)2 √ !
θ
58. y = ln(t3/2 ) p
76. y = t ln(t) 88. y = ln √
1+ θ
59. y = ln(3/x) ln(t)
77. y = 1 cos θ
60. y = ln(10/x) t 89. y = ln +
sin(θ) sin θ
61. y = ln(x3 + 1) 1 + ln(t)
78. y = 90. y = sin(ln x)
t
62. y = ln(x2 + 3x + π)
ln x 91. y = cos(ln x)
63. y = x2 ln(x) 79. y =
a + ln x 92. y = log2 (5θ)
64. y = ln( (x + 1)x ) x ln(x)
√ 80. y = 93. y = log3 (1 + θ ln(3))
65. y = ln( 1 + x2 ) 1 + ln(x)
√ 94. y = log4 (x) + log4 (x2 )
66. y = ln( 4 x2 + 1) 81. y = ln(3te−t ) √
95. y = log25 (ex ) − log5 ( x)
67. y = ln(θ + 1) 82. y = ln(2e−t sin t)
96. y = log2 (r) · log4 (r)
68. y = ln(2θ + 2) 83. y = (2x + 1)2 ln(2x + 1) ln 3
x+1
69. y = x4
ln(x) − x4
ln x

1 3
97. y = log3 x−1
4 16
84. y = 2 + ln
x3 x3 x ln(x2 ) x !
70. y = 3 ln(x) − 9
r ln 5
√
98. y = log5 7x
71. y = x2 ln x + (ln x)3 85. y = ln x + x2 + 1 3x+2
12
2
√ √
99. y = 10x + (x2 )10 104. y = (cos θ) 2 109. y = ( t)t
√
100. y = (sin x)2 + 2sin x 105. y = (ln θ)π 110. y = t t
101. y = x2 106. y = (x2 + 1)ln x 111. y = (sin x)x

102. y = xln(2) 107. y = (x + 1)x 112. y = xsin x)
103. y = t1−e 108. y = xx+1 113. y = (ln x2 )2x+3
114. Consider the graphs of $f(x) = x^k e^{-x}$, where k is some positive constant. What do they look like (for
varying values of k)? As x → ∞, f(x) → ? Also: f(x) has a maximum between x = 0 and ∞. What
are its coordinates?
115. The equation $x^2 = 2^x$ has three solutions⁶: one at x = 2, one at x = 4, and one elsewhere. Use a
calc***tor to estimate it as a decimal.
116. Find the nth derivative of a) $y = e^{ax}$ (where a is some positive constant), and b) $y = e^{-ax}$.
117. Show⁷ that the average slope of the natural logarithm from x = a to x = b is $\frac{1}{b-a}\ln\left(\frac{b}{a}\right)$
118. Show that f(x) = ln(2x) and g(x) = ln(3x) have the same derivative. Then calculate the derivative
of y = ln(kx), where k is any positive number. Explain your results in terms of the properties of
logarithms.
119. Find the nth derivative of a) y = ln x and b) y = ln(1 − x).
ax − 1
120. Consider the function f (x) = . Find its inverse f inv (x). Then find the derivative of its inverse.
ax + 1
6 “Solutions” just means “values of x for which the equation is true;” in this case, values of x for which $x^2$ is the same as
$2^x$.
7 i.e., prove
The “Teach Mr. Vierra About Derivatives” Paper
What are derivatives and why are they important (and awesome)? Imagine you must explain
this to someone with no knowledge of calculus—like Mr. Vierra! Use American English, symbols
(i.e., math), graphs, charts, metaphors, etc., to teach your reader what derivatives are, how to think
about them informally, how to define them formally, where all these formulas (like the quotient rule)
come from, what various applications are, and so forth. Go into detail! Be creative! This is your
opportunity to show me what you know. Feel free to use references other than your notes (such as
your book, handouts, the internet); if you do, be sure to cite properly.
Due: Friday, February 25th.
The Meno
(somewhat adapted and abbreviated)
Socrates: By the gods, Meno, be generous, and tell me what you say that a limit is;
for I shall be truly delighted to find that I have been mistaken, and that you and
Leibniz do really have this knowledge; although I have been just saying that I have
never found anybody who had.
Meno: There will be no difficulty, Socrates, in answering your question. Let us take
first the limit of a rational function––if a rational function has a hole, one must
eliminate the hole, and then plug in numbers. The limit of a polynomial, if you wish
to know about that, may be easily described: all you do is plug the number into the
function. Sometimes the limit is the true value of the function, but sometimes it
differs. There are also limits from the left side and limits from the right side. Every
function, every point, large or small, rational or integral, real or complex, has a
different limit: there are limits numberless, and no lack of definitions of them; for a
limit is relative to the actions and axes of each particular function at each particular
point.
Socrates: How fortunate I am, Meno! When I ask you for one limit, you present
me with a swarm of them, which are in your keeping. Suppose that I carry on the
figure of the swarm, and ask of you, What is the nature of the bee? and you answer
that there are many kinds of bees, and I reply: But do bees differ as bees, because
there are many and different kinds of them; or are they not rather to be
distinguished by some other quality, as for example beauty, size, or shape? How
would you answer me?
Meno: I should answer that bees do not differ from one another, as bees.
Socrates: And if I went on to say: That is what I desire to know, Meno; tell me
what is the quality in which they do not differ, but are all alike; would you be able
to answer?
Meno: I should.
Socrates: And so of the limits, however many and different they may be, they have
all a common nature which makes them limits; and on this he who would answer
the question, “What is a limit?” would do well to have his eye fixed. Do you
understand?
Meno: I am beginning to understand; but I do not as yet take hold of the question
as I could wish.
sin'(x) = cos(x)
We have long suspected, based on the visuals alone, that the derivative of sin(x) looks kind of like cos(x)
(potentially with a vertical expansion/compression):
And I’ve assured you that, in fact, the derivative of sin(x) is cos(x). But how do we prove this? We
tried to do this back in November[1], but we ran into a stumbling block, one that caused us to trip and fall
not onto the ground, but into a weird alternate universe of limits and epsilons and deltas and h’s going to
zero. It gave us a whole new perspective—a fuller perspective—on what derivatives are.
Now that we’re back in the main universe (the one that has derivatives), let’s see if we can actually
prove this to be true. (That is, let’s see if we can see why this is true.) We want to calculate $\sin'(x)$, and
since we have no other rules that we can apply (can’t use the product rule, can’t use the chain rule, etc.),
we may as well start with first principles: plug it into Fermat’s difference quotient.

$$\sin'(x) = \lim_{h \to 0} \frac{\sin(x + h) - \sin(x)}{h}$$
But this looks kind of nasty. Unlike with, e.g., $x^2$, we can’t “multiply out” the sin(x + h), or distribute it.
It’s certainly not equal to sin(x) + sin(h). But we can simplify it using that formula we proved in trig:
$$\sin(a + b) = \sin(a)\cos(b) + \cos(a)\sin(b)$$
If we apply that here, we’ll get:

$$\begin{aligned}
\sin'(x) &= \lim_{h \to 0} \frac{\sin(x + h) - \sin(x)}{h} \\
&= \lim_{h \to 0} \frac{\cos(x)\sin(h) + \sin(x)\cos(h) - \sin(x)}{h}
\end{aligned}$$
which, if I do some rearranging, is

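$$\begin{aligned}
&= \lim_{h \to 0} \left[ \frac{\cos(x)\sin(h)}{h} + \frac{\sin(x)\cos(h) - \sin(x)}{h} \right] \\
&= \lim_{h \to 0} \left[ \cos(x)\,\frac{\sin(h)}{h} + \sin(x)\,\frac{\cos(h) - 1}{h} \right] \\
&= \lim_{h \to 0} \left[ \cos(x)\,\frac{\sin(h)}{h} - \sin(x)\,\frac{-\cos(h) + 1}{h} \right] && \text{(factoring out a } -1\text{)} \\
&= \lim_{h \to 0} \left[ \cos(x)\,\frac{\sin(h)}{h} - \sin(x)\,\frac{1 - \cos(h)}{h} \right] \\
&= \lim_{h \to 0} \cos(x)\,\frac{\sin(h)}{h} - \lim_{h \to 0} \sin(x)\,\frac{1 - \cos(h)}{h} \\
&= \cos(x) \cdot \lim_{h \to 0} \frac{\sin(h)}{h} - \sin(x) \cdot \lim_{h \to 0} \frac{1 - \cos(h)}{h}
\end{aligned}$$
[1] see the original limits notes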
So now we’re basically at the place where we had to stop when we tried to do this in November. Because:
what are these limits? It’s not clear how we work out $\lim_{h \to 0} \sin(h)/h$ and $\lim_{h \to 0} (1 - \cos h)/h$, because
if we simply plug 0 in, we get 0/0. Each of these functions has a hole at h = 0. So we need some way of
finding the presumptive y-value of that hole—of finding what sin(0)/0 should be, if it existed.
With rational functions, this is no big deal—we simply cancel out the offending/hole-creating factor,
and then act as if it never existed. But we can’t do that here, because sine and cosine aren’t polynomials.
So. Um.
Geometry to the rescue! (I have this image of Mr. Austin swooping in with a cape.) We need
to figure out what those two limits are. So we’ll go on a bit of a tangent, and build some triangles.
What we’re going to do, in a sentence, is use geometry to build some algebraic relationships, and then
use those to find our limit. We’re going to construct an inequality that has sin(x)/x in the middle (like
stuff < sin(x)/x < more stuff), and then we’re going to (more or less) crush sin(x)/x between the two
things on the outside. We’ll be able to find the limit as x → 0 of both of the things on the outside (they’ll
both go to 1), and since we know that sin(x)/x is always between them, we’ll know that the limit as x → 0
of sin(x)/x will be 1, too. This is the general idea.
So. Imagine I draw a right triangle with an angle of x, legs of length a and b, and hypotenuse of 1:
so then, just using trig, we know that

$$\sin(x) = \frac{\text{opp}}{\text{hyp}} = \frac{a}{1} = a$$
and
$$\cos(x) = \frac{\text{adj}}{\text{hyp}} = \frac{b}{1} = b$$
So rather than using two new variables, a and b, I can just label the legs with sin(x) and cos(x), since
that’s how long each of them is:
The bottom side has length cos(x) and the right side has length sin(x). Then, what if I extend this
to make a slightly bigger triangle? I’ll extend the bottom side such that it’s 1 unit long and I no longer
have a right triangle:
(This triangle will be important. For reference, let’s call this triangle “little triangle.” Which might
seem a bit odd, because it’s bigger than the triangle we started with, but later in the proof we’re going to
make an even bigger triangle. So this is our “little one.”)
Then, what if I imagine that the far right side of the triangle were not a straight line but a curvy
arc, so that instead of having a triangle, we have a wedge, a pizza slice, a fraction of a larger circle:
This is another shape we’ll refer to again. Let’s call this wedge-shaped area our “pizza slice.”
THEN, let’s build another triangle! What if I turn this shape into a (larger) right triangle, like so:
Let’s call this the “big triangle”. I’ve labelled the side opposite x as γ, but we can write this solely in
terms of x:
$$\tan(x) = \frac{\text{opp}}{\text{adj}} = \frac{\gamma}{1} = \gamma$$
So then rather than calling the far-right side γ, let’s call it tan(x):
Why are we doing all of this? We have three overlapping shapes: a small triangle, a pizza slice, and
a big triangle. They’re stacked on top of each other. We know that the little triangle is smaller than the
pizza slice, and we know that the pizza slice is smaller than the big triangle. So we know:
area of little triangle < area of pizza slice < area of big triangle
But we also know what each of their areas are. The little triangle we know has a base of 1, and a height of
sin(x) (from what we started with), so we can find its area. The big triangle has a base of 1, and a height
of tan x. And the pizza slice—well, we know that it’s a slice out of a pizza with radius 1, and so the entire
pizza will have area $\pi \cdot 1^2$. But it’s only a percentage of the entire pizza. It’s x radians of the entire
pizza, and the entire pizza has 2π radians. So the pizza slice’s area will be $(\pi \cdot 1^2) \cdot \frac{x}{2\pi}$. In summary:
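$$\begin{aligned}
\text{area of little triangle} &= \tfrac{1}{2} \cdot \text{base} \cdot \text{height} = \tfrac{1}{2} \cdot 1 \cdot \sin x = \tfrac{1}{2} \sin x \\
\text{area of pizza slice} &= (\text{area of full circle}) \cdot (\text{percentage of the circle it is}) = (\pi \cdot 1^2) \cdot \frac{x}{2\pi} = \tfrac{1}{2} x \\
\text{area of big triangle} &= \tfrac{1}{2} \cdot \text{base} \cdot \text{height} = \tfrac{1}{2} \cdot 1 \cdot \tan x = \tfrac{1}{2} \tan x = \tfrac{1}{2} \frac{\sin x}{\cos x}
\end{aligned}$$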
So because we know:
area of little triangle < area of pizza slice < area of big triangle
if we plug in the areas, we must have:

$$\frac{1}{2}\sin(x) < \frac{1}{2}x < \frac{1}{2}\frac{\sin x}{\cos x}$$
Let’s clean this up a bit. We want to get sin(x)/x in the middle there, so that we can CRUSH it. If I get
rid of the 1/2s...
$$\sin x < x < \frac{\sin x}{\cos x}$$
and divide by sin(x)...
$$\frac{\sin x}{\sin x} < \frac{x}{\sin x} < \frac{\sin x}{\cos x \sin x}$$
$$1 < \frac{x}{\sin x} < \frac{1}{\cos x}$$
and then take the reciprocal (which will flip the direction of the inequalities[2]):
$$\frac{1}{1} > \frac{\sin x}{x} > \frac{\cos x}{1}$$
$$1 > \frac{\sin x}{x} > \cos x$$
So I get a sin(x)/x! This is the thing whose limit as x → 0 we want to figure out. Here’s the trick: we
know that sin(x)/x is always between 1 and cos x. And we know what happens to cos x and 1 as x → 0.
Both of them just become 1. (Well, 1 already is 1):
$$\cos(x) \xrightarrow{\;x \to 0\;} 1$$
$$1 \xrightarrow{\;x \to 0\;} 1$$
[2] since (for example) $2 < 3$ and $\frac{1}{2} > \frac{1}{3}$
And since we know that sin(x)/x is always between cos(x) and 1, then we know that sin(x)/x must go
to 1 as well!! (Which, by the way, obviously tells us what $\lim_{h \to 0} \frac{\sin h}{h}$ is, since they’re the same thing, just
with the letter h instead of the letter x.)
$$\lim_{h \to 0} \frac{\sin h}{h} = 1$$
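As a numerical sanity check on the squeeze (a rough sketch, not a substitute for the proof), the short Python script below tabulates cos(x), sin(x)/x, and 1 for a few small positive angles. The function name `squeeze_row` and the sample angles are choices made purely for illustration.

```python
import math


def squeeze_row(x):
    """Return (cos x, sin(x)/x, 1) for a nonzero angle x in radians."""
    return math.cos(x), math.sin(x) / x, 1.0


# Sample angles between 0 and pi/2, shrinking toward 0 (illustrative values).
for x in [1.0, 0.5, 0.1, 0.01, 0.001]:
    lower, middle, upper = squeeze_row(x)
    # The geometric argument says lower < middle < upper on this range.
    print(f"x = {x:<6} cos(x) = {lower:.8f}  sin(x)/x = {middle:.8f}  1 = {upper}")
```

Reading down the table, the two outer columns close in on 1 and the middle column is trapped between them.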
That’s kind of a cool technique. We don’t know how to take the limit of sin(x)/x directly, so instead
we take the limit of two of its best friends—friends that we know sin(x)/x always hangs out with. And
since we know sin(x)/x is always hanging out with 1 and cos(x), if we know where 1 and cos x are, we
know where sin(x)/x is.
This technique is an example of what is called the sandwich theorem (if you’re a high school
teacher), the pinching theorem (if you’re a mathematician), or the two policemen theorem (if you’re
a Russian mathematician). Also sometimes the squeeze theorem. The idea is simple enough: if you put
some mozzarella between two pieces of bread, and then press the two pieces of bread together and put it
in your mouth, the mozzarella will go in your mouth, too. Or, put differently, if you get arrested, and
there’s one cop in front of you and one cop behind you, and the front cop walks to the precinct, and the
rear cop walks to the precinct, you’ll get walked to the precinct, too. (This assumes you are walking on a
one-dimensional path; given the width of European streets, this is not a bad assumption.) Were we doing
calculus totally rigorously (totally formally), we’d need to prove this theorem. But it’s pretty intuitive,
and for us limits are a tool, not an end, and to prove it requires some other fancy techniques that you
don’t know and that are sort of extraneous, so we won’t prove it. But feel free to look it up online or in a
book if you’re curious.
So now we know what $\lim_{h \to 0} \frac{\sin h}{h}$ is. But our original formula for the derivative of sin(x) had two limits
in it: not just $\lim_{h \to 0} \frac{\sin h}{h}$, but also $\lim_{h \to 0} \frac{1 - \cos h}{h}$. So we need to figure that out, too.
Unfortunately, the procedure isn’t quite as interesting. All we need to do is a little bit of algebra to
write it in a more convenient form, and then it’ll be obvious what the limit is. First, the algebra:

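$$\begin{aligned}
\frac{1 - \cos(x)}{x} &= \frac{1 - \cos(x)}{x} \cdot \underbrace{\frac{1 + \cos x}{1 + \cos x}}_{=1} && \text{(multiplying by 1)} \\
&= \frac{1 - \cos^2(x)}{x(1 + \cos x)}
\end{aligned}$$
but because of the Pythagorean theorem from trig, we know that $\sin^2 x + \cos^2 x = 1$, and thus that
$1 - \cos^2(x) = \sin^2 x$:
$$\begin{aligned}
&= \frac{\sin^2(x)}{x(1 + \cos x)} \\
&= \frac{\sin(x) \cdot \sin(x)}{x(1 + \cos x)} \\
&= \frac{\sin(x)}{x} \cdot \frac{\sin(x)}{1 + \cos x}
\end{aligned}$$
So we know, then, how to rewrite $(1 - \cos x)/x$:
$$\frac{1 - \cos x}{x} = \frac{\sin(x)}{x} \cdot \frac{\sin(x)}{1 + \cos x}$$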
But we also know that we can split limits up along multiplication, and we know how to take the limit of
both of those things! We just found out that as x → 0, sin(x)/x goes to 1, and as x → 0, sin(x)/(1 + cos x)
should just go to 0/(1 + 1), or just 0.

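$$\begin{aligned}
\lim_{x \to 0} \frac{1 - \cos x}{x} &= \lim_{x \to 0} \left[ \frac{\sin(x)}{x} \cdot \frac{\sin(x)}{1 + \cos x} \right] \\
&= \underbrace{\lim_{x \to 0} \frac{\sin(x)}{x}}_{=1} \cdot \underbrace{\lim_{x \to 0} \frac{\sin(x)}{1 + \cos x}}_{=0} \\
&= 1 \cdot 0 \\
&= 0
\end{aligned}$$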
So we’ve calculated both of the limits! We know:

$$\lim_{h \to 0} \frac{\sin h}{h} = 1 \quad \text{and} \quad \lim_{h \to 0} \frac{1 - \cos h}{h} = 0$$
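If you’d like to see the second limit numerically, here is a small Python sketch. It evaluates (1 − cos h)/h directly and also through the rewritten product (sin h / h) · (sin h / (1 + cos h)). The function names are illustrative only, and a table of values is evidence, not a proof.

```python
import math


def one_minus_cos_over_h(h):
    """Evaluate (1 - cos h) / h directly."""
    return (1 - math.cos(h)) / h


def rewritten_form(h):
    """Evaluate the same quantity as (sin h / h) * (sin h / (1 + cos h))."""
    return (math.sin(h) / h) * (math.sin(h) / (1 + math.cos(h)))


# Both columns should agree with each other and shrink toward 0 as h shrinks.
for h in [1.0, 0.1, 0.01, 0.001]:
    print(f"h = {h:<6} direct = {one_minus_cos_over_h(h):.10f}  rewritten = {rewritten_form(h):.10f}")
```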
So with these limits both calculated, we can return to our original argument, and finally prove that
the derivative of sine is cosine! When we left off, we had:

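$$\sin'(x) = \cos(x) \cdot \lim_{h \to 0} \frac{\sin(h)}{h} - \sin(x) \cdot \lim_{h \to 0} \frac{1 - \cos(h)}{h}$$
and if I plug in what we just found out about these limits...
$$\begin{aligned}
&= \cos(x) \cdot \underbrace{\lim_{h \to 0} \frac{\sin(h)}{h}}_{=1} - \sin(x) \cdot \underbrace{\lim_{h \to 0} \frac{1 - \cos(h)}{h}}_{=0} \\
&= \cos(x) \cdot 1 - \sin(x) \cdot 0 \\
&= \cos(x)
\end{aligned}$$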
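To watch the result emerge numerically, the Python sketch below plugs sin into the difference quotient at a fixed point and shrinks h. The helper name `difference_quotient` and the sample point x = 1 are assumptions made for illustration; the quotient should creep toward cos(1).

```python
import math


def difference_quotient(f, x, h):
    """Fermat's difference quotient (f(x + h) - f(x)) / h for a fixed, nonzero h."""
    return (f(x + h) - f(x)) / h


x = 1.0  # sample point (illustrative)
for h in [0.1, 0.01, 0.001, 0.0001]:
    approx = difference_quotient(math.sin, x, h)
    print(f"h = {h:<7} quotient = {approx:.8f}  cos(x) = {math.cos(x):.8f}")
```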
Problem
Prove that $\cos'(x) = -\sin(x)$. There are a couple ways you could do this. One way would be to follow
the method we used here—to plug cos(x) into the FDQ, do the trig addition identity, rearrange, and figure
out the limits. You can probably get it just in terms of the two limits we evaluated here, so no need to
do anything new to figure out what they are. Another way would be to first rewrite cosine using a trig
identity, and then take its derivative. Try it both ways. Write them up nicely and turn it in.
Two Problems
We know how to find the derivative of two functions multiplied together—that’s what the product rule is
for. What if I want to find the third derivative of two functions multiplied together? the fourth derivative?
fifth? what if I want to find the nth derivative?
$$\begin{aligned}
\frac{d}{dx}[f(x)g(x)] &= f'(x)g(x) + f(x)g'(x) \\
\frac{d^2}{dx^2}[f(x)g(x)] &= \;??? \\
\frac{d^3}{dx^3}[f(x)g(x)] &= \;??? \\
\frac{d^4}{dx^4}[f(x)g(x)] &= \;??? \\
&\;\;\vdots \\
\frac{d^n}{dx^n}[f(x)g(x)] &= \;???
\end{aligned}$$
We know how to find the derivative of two functions multiplied together—that’s what the product rule is
for. But what if we have three functions multiplied together? what if we have four functions multiplied
together? five? what if I have n functions multiplied together??? (For notational convenience as you try
to figure this out, you may wish to write your functions as $f_1(x)$, $f_2(x)$, $f_3(x)$, and so on, because there
are only 26 letters. And you might wish to drop the “(x)”, too, since it creates a lot of redundant writing.)
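$$\begin{aligned}
\frac{d}{dx}[f_1 f_2] &= f_1' f_2 + f_1 f_2' \\
\frac{d}{dx}[f_1 f_2 f_3] &= \;??? \\
\frac{d}{dx}[f_1 f_2 f_3 f_4] &= \;??? \\
&\;\;\vdots \\
\frac{d}{dx}[f_1 f_2 f_3 \cdots f_n] &= \;???
\end{aligned}$$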
Alternative Proofs
Probably our most basic derivative rule is the additive rule: the theorem that says that $\frac{d}{dx}[f(x) + g(x)] = f'(x) + g'(x)$. Sensibly enough, we proved this by starting with Fermat’s difference quotient. The FDQ
is our fundamental definition of a derivative, and it’s not too hard to plug f(x) + g(x) into it, rearrange
things a bit, and get $f'(x) + g'(x)$. But (as we discovered during one 11th grade class) we can prove it
entirely differently. We can prove it just using our knowledge of the derivatives of logs and exponentials,
and the chain rule. The idea is that f(x) + g(x) is equal to $\ln\left(e^{f(x)+g(x)}\right)$, by basic properties of logs, so
if I want to take the derivative of the former, I can just take the derivative of the latter:
$$\frac{d}{dx}[f(x) + g(x)] = \frac{d}{dx}\left[\ln\left(e^{f(x)+g(x)}\right)\right]$$
but I can use properties of exponents to rewrite this as:
$$= \frac{d}{dx}\left[\ln\left(e^{f(x)} e^{g(x)}\right)\right]$$
so I just do the derivative of a log:
$$= \frac{1}{e^{f(x)} e^{g(x)}} \cdot \frac{d}{dx}\left[e^{f(x)} e^{g(x)}\right]$$
and the chain rule “derivative of the inside” part I can evaluate using the product rule:
$$= \frac{1}{e^{f(x)} e^{g(x)}} \cdot \left( f'(x) e^{f(x)} e^{g(x)} + g'(x) e^{g(x)} e^{f(x)} \right)$$
but then I can factor out the $f'(x) + g'(x)$:
$$= \frac{1}{e^{f(x)} e^{g(x)}} \cdot \left( f'(x) + g'(x) \right) \cdot e^{f(x)} e^{g(x)}$$
and then everything else cancels:
$$= f'(x) + g'(x)$$
Woah! So we’ve used a couple of derivative laws in here—we’ve used the laws about the derivative of an
exponential, the derivative of a log, and the chain rule. The derivative of a log and the chain rule we can
both prove using only Fermat’s difference quotient (and the derivative of a log requires only the derivative
of an exponential). So from just two applications of the FDQ, we can get all these other things! We don’t
have to keep using the FDQ! Compare to the two ways of proving the quotient rule: we proved it first
using the FDQ, and then using a combination of the product rule and the chain rule. We can base all of
our shortcuts on the FDQ, or we can try to base our shortcuts on other shortcuts based on shortcuts based
on the FDQ.
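For the skeptical, here is a numerical spot check of the log-and-exponential route to the additive rule, written as a hedged Python sketch. The sample functions f(x) = sin(x) and g(x) = x², the point x = 0.7, and the helper `numerical_derivative` (a symmetric difference quotient, used here only because it is more accurate than the one-sided FDQ) are all assumptions chosen for illustration.

```python
import math


def numerical_derivative(func, x, h=1e-6):
    """Approximate func'(x) with a symmetric difference quotient."""
    return (func(x + h) - func(x - h)) / (2 * h)


def f(x):
    """Sample function f (illustrative choice)."""
    return math.sin(x)


def g(x):
    """Sample function g (illustrative choice)."""
    return x ** 2


def log_of_exp_product(x):
    """ln(e^f(x) * e^g(x)), which should equal f(x) + g(x)."""
    return math.log(math.exp(f(x)) * math.exp(g(x)))


x = 0.7  # sample point (illustrative)
lhs = numerical_derivative(log_of_exp_product, x)
rhs = numerical_derivative(f, x) + numerical_derivative(g, x)
print(f"derivative of ln(e^f e^g): {lhs:.8f}")
print(f"f'(x) + g'(x):             {rhs:.8f}")
```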
Problem
Without using Fermat’s difference quotient, prove the product rule. Then prove the constant multiple rule.
For some extra credit, write the proofs up nicely and turn them in before break.
Another fun thing to do would be to draw a “dependency chart” for our various theorems: can you
visually represent which derivative rules are based on which other derivative rules (like a family tree, but a
family in which Fermat’s difference quotient is the ultimate patriarch and everyone else is related through
incestuous logic that spans and skips generations)?
Some Notes on Usage: Derive, Differentiate, Etc.
A derivation is a type of proof; for our purposes, it’s synonymous with “proof,” since all of the proofs
you’ve seen have been derivations. To prove something in the manner of a derivation is to derive it.
A derivative, on the other hand, is the slope of a function. To compute the derivative of a function
is to differentiate it; we also often speak of “taking” a derivative: I took the derivative of $x^2$ and got
$2x$. I differentiated $x^2$ and got $2x$. The adjectival form of “derivative” is differential: We are studying
differential calculus this year.
(Implicit) Differentiation
Example #1
Imagine I have the equation $y = x^2$ and I want to find dy/dx (that is, the derivative of y with respect
to x). How do I do this? Easy. I just take the derivative of both sides:
$$\begin{aligned}
y &= x^2 \\
\frac{d}{dx}[y] &= \frac{d}{dx}\left[x^2\right] \\
\frac{dy}{dx} &= 2x
\end{aligned}$$
Not very interesting.
Example #2
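What if I have a more complicated equation, like $y^3 + x^2 = 5$? How do I find dy/dx? Presumably I
do some algebra first, to solve for y, and then take the derivative:
$$\begin{aligned}
y^3 + x^2 &= 5 \\
y^3 &= 5 - x^2 \\
y &= \sqrt[3]{5 - x^2}
\end{aligned}$$
differentiating:
$$\begin{aligned}
\frac{d}{dx}[y] &= \frac{d}{dx}\left[\sqrt[3]{5 - x^2}\right] \\
\frac{dy}{dx} &= \frac{d}{dx}\left[(5 - x^2)^{1/3}\right] \\
\frac{dy}{dx} &= \frac{1}{3}(5 - x^2)^{-2/3} \cdot (-2x)
\end{aligned}$$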
Again, not particularly interesting.
Example #3
But what if I ask you to find dy/dx of the following equation:
$$y^5 + y = x^2$$
So we’ll just solve for y and then take the derivative... except... you can’t solve this equation for y.
This is a perfectly valid equation, and it does represent some function or some curve in a two-dimensional
plane, but we can’t isolate y. We can’t get a single y alone, on one side.[1] Can we still take its derivative?
Can we still come up with an equation for its rate of change? YES. The idea is this: we can still take
the derivative with respect to x of both sides. y here is some function of x—it somehow changes as x
changes—it’s just that we don’t know exactly how it changes. We don’t know exactly what its relationship
to x is.
[1] If you don’t believe me, try solving for y. You should ask yourself: what does it mean for an equation to be solvable? can
we solve every equation? why, or why not? under what conditions are equations solvable? is it possible to make equations
solvable by creating new numbers? (“creating new numbers”?? does that make any sense? what am I talking about? what
about $x^2 + 1 = 0$? for what values of x is that true?)
So just to remind ourselves that y is a function of x, and not a constant, we can write the y’s as
“y(x)” and not “y”. This doesn’t change anything; it’s just a clearer way of writing it.[2] It’s exactly how
we usually write functions as f (x) = blah blah, and not f = blah blah. It’s just that here we’re using the
letter y instead of f .
$$\begin{aligned}
y^5 + y &= x^2 \\
(y(x))^5 + y(x) &= x^2
\end{aligned}$$
And then if we take the derivative of both sides with respect to x (which means “considering x as the
variable”):
$$\frac{d}{dx}\left[(y(x))^5 + y(x)\right] = \frac{d}{dx}\left[x^2\right]$$
The right side is easy; that’s just 2x:
$$\frac{d}{dx}\left[(y(x))^5 + y(x)\right] = 2x$$
And we know we can split the left side up:
$$\frac{d}{dx}\left[(y(x))^5\right] + \frac{d}{dx}[y(x)] = 2x$$
To work out the first thing on the left side, we can use the chain rule. We have two functions: we have
y(x), and we have $(\text{stuff})^5$. So by the chain rule, the derivative will be:
$$5(y(x))^4 \cdot y'(x) + \frac{d}{dx}[y(x)] = 2x$$
Of course, we don’t know what $y'(x)$ is, so we just have to leave it like that... except that’s actually a
GOOD thing, because $y'(x)$ is what we want to find, ultimately!!! It’s just different notation for dy/dx.
So eventually, we’ll be able to use algebra to solve for it! But we need to finish taking the derivative first.
The derivative of y(x), clearly, is just $y'(x)$ (whatever that is):
$$5(y(x))^4 \cdot y'(x) + y'(x) = 2x$$
So now we have an equation that has the derivative inside of it! We just need to solve for it. First we’ll
factor out the $y'(x)$:
$$y'(x) \cdot \left(5(y(x))^4 + 1\right) = 2x$$
and divide:
$$y'(x) = \frac{2x}{5(y(x))^4 + 1}$$
and there we go! We’ve found the derivative! If you want to write it in the other notation (and if you
don’t need the “of x” to remind you that y is a function), you could write it like this:
$$\frac{dy}{dx} = \frac{2x}{5y^4 + 1}$$
So there we go! We’ve found the derivative! Of course, the derivative is in terms of y—in terms of the
original function—and since we still don’t know explicitly what y is, we can’t go much further. But, hey!
It’s progress!
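Even though we can’t write y as a formula in x, a computer can still hunt for the y that goes with a given x, which gives us a way to test the derivative formula. The Python sketch below is only an illustration under stated assumptions: `solve_for_y` is a hypothetical helper that uses bisection (valid here because y⁵ + y always increases as y increases), and x = 1.5 is an arbitrary sample point.

```python
def solve_for_y(x, iterations=80):
    """Find the y with y**5 + y == x**2 by bisection.

    y**5 + y is strictly increasing in y, so there is exactly one such y,
    and it lies between 0 and max(1, x**2).
    """
    target = x * x
    low, high = 0.0, max(1.0, target)
    for _ in range(iterations):
        mid = (low + high) / 2
        if mid ** 5 + mid < target:
            low = mid
        else:
            high = mid
    return (low + high) / 2


def implicit_slope(x, y):
    """The formula from implicit differentiation: dy/dx = 2x / (5y^4 + 1)."""
    return 2 * x / (5 * y ** 4 + 1)


def numerical_slope(x, h=1e-5):
    """Estimate dy/dx by nudging x a little in each direction and re-solving for y."""
    return (solve_for_y(x + h) - solve_for_y(x - h)) / (2 * h)


x = 1.5  # sample point (illustrative)
y = solve_for_y(x)
print(f"y at x = {x}: {y:.8f}")
print(f"slope from the formula:  {implicit_slope(x, y):.8f}")
print(f"slope from nudging x:    {numerical_slope(x):.8f}")
```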
[2] Sometimes you might see mathematicians abbreviate all of this simply by writing, “y = y(x)”.
Example #2, Again
We could have done this with our previous equation, too. Rather than rewriting $y^3 + x^2 = 5$ as
$y = \sqrt[3]{5 - x^2}$, we could have kept it as is (possibly reminding ourselves that y is in fact y(x)):
$$\begin{aligned}
(y(x))^3 + x^2 &= 5 \\
\frac{d}{dx}\left[(y(x))^3 + x^2\right] &= \frac{d}{dx}[5] \\
3(y(x))^2 \cdot y'(x) + 2x &= 0
\end{aligned}$$
so if we solve for $y'(x)$:
$$\begin{aligned}
3(y(x))^2 \cdot y'(x) &= -2x \\
y'(x) &= \frac{-2x}{3(y(x))^2}
\end{aligned}$$
written differently:
$$\frac{dy}{dx} = \frac{-2x}{3y^2}$$
Yay! One thing, though. This might look a little bit different. When we found the derivative the first way
(by doing algebra and then calculus), we found that the derivative was:
$$\frac{dy}{dx} = \frac{1}{3}(5 - x^2)^{-2/3} \cdot (-2x)$$
Which does not look the same as:
$$\frac{dy}{dx} = \frac{-2x}{3y^2}$$
Except they actually are the same thing! Because, remember that we found, by doing algebra, that
$y = \sqrt[3]{5 - x^2}$. So if we plug that into our second version of the derivative, we get...
$$\begin{aligned}
\frac{-2x}{3y^2} &= \frac{-2x}{3\left(\sqrt[3]{5 - x^2}\right)^2} \\
&= \frac{-2x}{3\left((5 - x^2)^{1/3}\right)^2} \\
&= \frac{-2x}{3(5 - x^2)^{2/3}} \\
&= \frac{1}{3} \cdot \frac{1}{(5 - x^2)^{2/3}} \cdot (-2x) \\
&= \frac{1}{3} \cdot (5 - x^2)^{-2/3} \cdot (-2x)
\end{aligned}$$
Which is exactly what we found by taking the derivative the first way.
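The agreement between the two forms can also be spot-checked with numbers. In this Python sketch (function names and sample x-values are illustrative assumptions, and the x-values are kept where 5 − x² is positive so the fractional powers stay real), each x is turned into its y, and both versions of the derivative are evaluated.

```python
def explicit_slope(x):
    """Algebra first, then calculus: (1/3) * (5 - x^2)^(-2/3) * (-2x)."""
    return (1 / 3) * (5 - x ** 2) ** (-2 / 3) * (-2 * x)


def implicit_slope(x, y):
    """Calculus first, then algebra: -2x / (3y^2)."""
    return -2 * x / (3 * y ** 2)


for x in [0.0, 0.5, 1.0, 2.0]:
    y = (5 - x ** 2) ** (1 / 3)  # the y that goes with this x
    print(f"x = {x:<4} explicit = {explicit_slope(x):.10f}  implicit = {implicit_slope(x, y):.10f}")
```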
So What?
In summary, then: if we have an equation and we want to find the derivative of one of the variables
with respect to the other variable, we can either:
• do algebra to solve for the desired variable, then take its derivative, or
• take the derivative of both sides, and then solve for the desired derivative.
We can either do algebra first and then calculus, or we can do calculus first and then algebra.
In one sense, this is just a generalization of what we’ve been doing all along. Up until now, we’ve
only taken the derivative of things that have been solved for y (or f or whatever). We’ve only taken the
derivative of equations in which one side of the equation was a single, unique variable. But now, we can
take the derivative of equations in which both sides can be anything. This is really just the more general
case (both sides can be anything) of a specific case (one side must be very clean).
Incidentally, the name of this procedure (as you may have guessed from the title) is implicit differentiation.
The idea is that you can have an equation that explicitly relates two variables (i.e., one
variable is completely isolated):
$$y = \sqrt[3]{5 - x^2}$$
or you can have an equation that, while it does relate two variables, does not explicitly show the relationship
(it relates them implicitly):
$$y^3 + x^2 = 5$$
If you were a calculator plotting points to make a graph, it’d be easy to use the first equation to draw your
graph. Just plug in a whole bunch of different x values, and you get out a whole bunch of corresponding
y-values! But you can’t really do that with the second equation; you’d have to, like, pick random points in
the plane (random (x, y) pairs), plug them into the equation, and see whether they work (whether $y^3 + x^2$
does, in fact, equal 5).
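That guess-and-check procedure is easy to sketch in Python. The version below uses a regular grid rather than truly random points, and the names `satisfies`, `steps`, and `hits`, the grid spacing, and the tolerance are all assumptions made for illustration. It keeps every grid point that comes close to making y³ + x² equal 5.

```python
def satisfies(x, y, tolerance=0.05):
    """Check whether (x, y) comes within `tolerance` of satisfying y^3 + x^2 = 5."""
    return abs(y ** 3 + x ** 2 - 5) < tolerance


# Grid of candidate coordinates from -3.0 to 3.0 in steps of 0.1 (illustrative).
steps = [i / 10 for i in range(-30, 31)]

# Try every (x, y) pair on the grid and keep the ones that (nearly) work.
hits = [(x, y) for x in steps for y in steps if satisfies(x, y)]

print(f"tested {len(steps) ** 2} points, kept {len(hits)}")
for point in hits[:5]:
    print(point)
```

Most of the tested points get thrown away, which is exactly why the explicit form is so much friendlier for plotting.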
Problems
Express $\frac{dy}{dx}$ (or $y'$ or $y'(x)$, if you prefer one of those notations) in terms of x and y:
1. $x^2 + y^2 = 4$
2. $x^3 + y^3 - 3xy = 0$
3. $x^2 y + xy^2 = 6$
4. $2xy + y^2 = x + y$
5. $4x^2 + 9y^2 = 36$
6. $x^4 + 4x^3 y + y^4 = 1$
7. $x^3 - xy + y^3 = 1$
8. $\sqrt{x} + \sqrt{y} = 4$
9. $x^2 - x^2 y + xy^2 + y^2 = 1$
10. $x^3 - xy + y^3 = 1$
11. $(x - y)^2 - y = 0$
12. $(y + 3x)^2 - 4x = 0$
13. $x^2 (x - y)^2 = x^2 - y^2$
14. $(3xy + 7)^2 = 6y$
15. $\sin(x + y) = xy$
16. $x^2 = \frac{x - y}{x + y}$
17. $x = \sin(y)$
18. $x + \sin(y) = xy$
19. $y \sin(1/y) = 1 - xy$
20. $y^2 \cos(1/y) = 2x + 2y$
Derivatives as Rates of Change
Imagine the following scenario: a tortoise and a hare decide to go on an Aesopian footrace, the prize
of which will be eternal glory. Except rather than race down a straight path, or even race down the same
path, a Darwinian glitch in the hare’s navigational systems has caused it to hop off at an orthogonal angle[1]
to the tortoise:
This is bad for any number of reasons, among them the fact the winner of the race will not be known
until each animal has circumambulated the surface of the earth—the primitive technology available to the
tortoises and hares doesn’t allow them to directly measure velocity or distance. Hence the need for direct
comparison. Which, I guess, is what one always does in a race—everyone starts and finishes at the same
points.
Uh, anyway, as a third complication, a practical jokester (perhaps an otter?) has attached a spool of
string to the back of the tortoise, and tied the end of the string to one of the hare’s ears. So as soon as
they start their race, the string starts unspooling, unbeknownst to either of them.
A couple of questions: first of all, when will they run out of string? that is to say, when will the tortoise
and the hare feel a sudden jerk back towards each other, and realize that they’re tied together? And,
secondly: the amount of internal injuries they suffer will be directly proportional to the force exerted by
the string at the moment that it stops them from running independently, which in turn will be proportional
to the rate at which the distance between them is changing, i.e., the speed at which the string is coming
off the spool. So at the instant they run out of string, how fast is the distance between the tortoise and the
hare changing?
(If they were both running in the same direction, we’d just subtract their speeds; if they were both
running in opposite directions, we’d add their speeds. You did this in physics last year—you discussed
relative motion.) (Note how I’ve used the word “speed” rather than “velocity,” since velocity includes
information about direction.)
Let’s assume that the tortoise can run (“run”) 5m/h (where “m” means “meter” and not “mile”) and
the hare can run 200m/h, and that at the beginning of the race, there’s 1000m of string on the tortoise’s
back. (Assume, for simplicity’s sake, that the weight of the string doesn’t slow the tortoise down.) And
let’s let:
• T(t) be the position of the tortoise as a function of time
– then $T'(t)$ (or dT/dt) is the velocity of the tortoise (as a fxn of time) (which, incidentally, we
know is 5m/h.)
• H(t) be the position of the hare as a function of time
– then $H'(t)$ (or dH/dt) is the velocity of the hare (as a fxn of time) (which, incidentally, we know
is 200m/h.)
[1] the adult version of “right angle”.
Furthermore, let’s assume there’s 1km (i.e., 1000m) of string on the tortoise’s back. Let’s let S(t)
be the amount of string that has been unspooled (as a function of time), which, note, is just the distance
between the tortoise and the hare. And then $S'(t)$ (or dS/dt) will be the rate at which the string is
unspooling (i.e., the rate at which the distance between the tortoise and the hare is increasing).
So our situation looks something like this:
(So S(t) is really just the hypotenuse of this right triangle—which means that the Pythagorean theorem
might come into play...)
We want to find a couple things: first, when do they run out of string? Meaning, when is the
distance between the tortoise and the hare 1000m? Meaning, when is S(t) = 1000? (And here “when”
means “for what value of t is S(t) = 1000?”)
Because of the Pythagorean theorem, we must have:
$$(T(t))^2 + (H(t))^2 = (S(t))^2$$
For ease of writing, you might wish to drop the “(t)”’s:
$$T^2 + H^2 = S^2$$
We want to find when S = 1000, so we can plug that in:
$$T^2 + H^2 = 1000^2$$
But how do I solve for t? We’ll need more information. Luckily, we have more information. We know
the velocities of both the tortoise and the hare, and we know that both of these velocities are constant
(neither the tortoise nor the hare speeds up or slows down). So then after t hours, the tortoise has travelled
T(t) = 5 · t meters, and after t hours, the hare has travelled H(t) = 200 · t meters.[2] So we know:
T (t) = 5t
H(t) = 200t
If we plug these into our Pythagorean thing, we get:
$$\begin{aligned}
(5t)^2 + (200t)^2 &= 1000^2 \\
25t^2 + 200^2 t^2 &= 1000^2
\end{aligned}$$
I haven’t bothered squaring 1000 and 200, because that’s too much writing, and I can always let the
calculator take care of it. Now, if I solve for t:
$$\begin{aligned}
t^2 (25 + 200^2) &= 1000^2 \\
t^2 &= \frac{1000^2}{25 + 200^2} \\
t &= \sqrt{\frac{1000^2}{25 + 200^2}} \\
t &\approx 4.99
\end{aligned}$$
So the hare and the tortoise will be running for about five hours before they run out of string and
notice that they’re tied to each other.
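Here is the “let the calculator take care of it” step as a small Python sketch. The constant and function names are illustrative assumptions; the numbers are the ones given above (5 m/h, 200 m/h, 1000 m of string). Because both animals move at constant speed along perpendicular paths, the string runs out when t times the hypotenuse of the two speeds equals the string length.

```python
import math

TORTOISE_SPEED = 5.0     # meters per hour
HARE_SPEED = 200.0       # meters per hour
STRING_LENGTH = 1000.0   # meters


def time_until_string_runs_out(v_tortoise, v_hare, length):
    """Solve (v_tortoise * t)^2 + (v_hare * t)^2 = length^2 for t > 0."""
    return length / math.hypot(v_tortoise, v_hare)


t = time_until_string_runs_out(TORTOISE_SPEED, HARE_SPEED, STRING_LENGTH)
print(f"the string runs out after about {t:.4f} hours")
```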
Second question: what is the rate at which the distance between the animals is changing at
the instant they run out of string? (I.e., when S(t) = 1000.) Put differently, what is $S'(t)$ (or dS/dt)
when S = 1000? This is the crucial idea here: we can think of the derivative as a rate of change.
dy/dx is the amount that y changes for every change in x. By “rate” we usually mean “change per unit
time”, and so something like dS/dt or $S'(t)$ is the rate at which S changes, per some change in t. Velocity
is a rate (e.g., miles per hour). And it’s just a derivative: the derivative of position. So I guess we’ll want
to find $S'(t)$ to find the speed at which the distance is changing (at any time t), and then plug in 4.99 for
t to find the speed at that moment.
We already know:
(T (t))2 + (H(t))2 = (S(t))2
So I guess if we want to find $S'(t)$, we’ll need to take a derivative! We could do it in two ways: either solve
for S(t) and then differentiate, or differentiate and then solve for $S'(t)$.
If I differentiate and then solve, I get:
$$\begin{aligned}
(T(t))^2 + (H(t))^2 &= (S(t))^2 \\
\frac{d}{dt}\left[(T(t))^2 + (H(t))^2\right] &= \frac{d}{dt}\left[(S(t))^2\right] \\
2T(t) \cdot T'(t) + 2H(t) \cdot H'(t) &= 2S(t) \cdot S'(t) \\
S'(t) &= \frac{2T(t)T'(t) + 2H(t)H'(t)}{2S(t)} \\
S'(t) &= \frac{T(t)T'(t) + H(t)H'(t)}{S(t)}
\end{aligned}$$
Alternatively, if I first solve for S(t) and then differentiate:
$$(T(t))^2 + (H(t))^2 = (S(t))^2$$
so then
$$S(t) = \sqrt{(T(t))^2 + (H(t))^2}$$
differentiating...
$$\begin{aligned}
\frac{d}{dt}[S(t)] &= \frac{d}{dt}\left[\sqrt{(T(t))^2 + (H(t))^2}\right] \\
S'(t) &= \frac{1}{2\sqrt{(T(t))^2 + (H(t))^2}} \cdot \left(2T(t)T'(t) + 2H(t)H'(t)\right) \\
&= \frac{2T(t)T'(t) + 2H(t)H'(t)}{2\sqrt{(T(t))^2 + (H(t))^2}} \\
&= \frac{T(t)T'(t) + H(t)H'(t)}{\sqrt{(T(t))^2 + (H(t))^2}}
\end{aligned}$$
but since we know that $S(t) = \sqrt{(T(t))^2 + (H(t))^2}$, we can simplify this to:
$$S'(t) = \frac{T(t)T'(t) + H(t)H'(t)}{S(t)}$$
[2] If their velocities weren’t constant (if they were speeding up or slowing down; if their speedometer weren’t stuck) then it
wouldn’t be so easy. We’d need to use an integral. Distance (and this is just one of the many, many lies that evil Mr. Ward
told you last year) is not always equal to rate times time. d = rt is only true if rate is constant.
So, either way I get the same thing! We get that
$$S'(t) = \frac{T(t)T'(t) + H(t)H'(t)}{S(t)}$$
(I’m sorry if these two derivations were hard to read; I think all the “(t)”s made it needlessly messy, but I
wanted to put them in there just so you wouldn’t forget that S, T , and H are all functions of t... I dunno.
If you find them confusing, try writing them out without the “(t)”s, and see if it’s clearer.)
So this tells us the rate (the speed) at which the distance between the tortoise and the hare is changing
as a function of time. Or rather, as a function of the distance the tortoise and the hare have each travelled,
as well as their speeds and the distance between them. We can simplify it a bit. First of all, we know the
speeds of the tortoise and the hare (we know $T'(t) = 5$ and $H'(t) = 200$), so we can plug those in:
$$S'(t) = \frac{T(t) \cdot 5 + H(t) \cdot 200}{S(t)}$$
Likewise, we also know that $S(t) = \sqrt{(T(t))^2 + (H(t))^2}$, so if we plug that in:
$$S'(t) = \frac{T(t) \cdot 5 + H(t) \cdot 200}{\sqrt{(T(t))^2 + (H(t))^2}}$$
And since we know that H(t) = 200t and T(t) = 5t, we have:
$$S'(t) = \frac{5t \cdot 5 + 200t \cdot 200}{\sqrt{(5t)^2 + (200t)^2}}$$
or:
$$\begin{aligned}
&= \frac{25t + 200^2 t}{\sqrt{25t^2 + 200^2 t^2}} \\
&= \frac{t(25 + 200^2)}{\sqrt{(t^2)(25 + 200^2)}} \\
&= \frac{t(25 + 200^2)}{\sqrt{t^2}\,\sqrt{25 + 200^2}} \\
&= \frac{t(25 + 200^2)}{t\sqrt{25 + 200^2}} \\
&= \frac{25 + 200^2}{\sqrt{25 + 200^2}} \\
&\approx 200.0625 \text{ m/h}
\end{aligned}$$
So that means that the distance between the tortoise and the hare is increasing at a constant rate: a
speed of a little more than 200 meters per hour! So at the instant that the string runs out, the tortoise
and the hare are getting further apart at a rate of about 200 meters per hour. (Which is close to the
hare’s speed—that’s because the difference in speeds is so great. If they were going at similar speeds, we’d
get something different... this is close to the scenario in which the hare is speeding along and the tortoise
is standing still.)
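If you’d like a sanity check on that algebra, here is a small Python sketch (my own illustration, not something from these notes). It assumes the same setup as above, $T(t) = 5t$ and $H(t) = 200t$ in meters, and compares the formula $S'(t) = (T T' + H H')/S$ against a crude numerical slope of $S$. The function names and the step size `dt` are just choices made for the sketch.

```python
import math

def T(t):
    # Tortoise's distance from the start, in meters: T(t) = 5t
    return 5.0 * t

def H(t):
    # Hare's distance from the start, in meters: H(t) = 200t
    return 200.0 * t

def S(t):
    # Distance between them: S = sqrt(T^2 + H^2)
    return math.hypot(T(t), H(t))

def S_prime_formula(t):
    # The related-rates result, with T'(t) = 5 and H'(t) = 200 plugged in
    return (T(t) * 5.0 + H(t) * 200.0) / S(t)

def S_prime_numeric(t, dt=1e-6):
    # Slope of S estimated from two nearby points (a central difference)
    return (S(t + dt) - S(t - dt)) / (2 * dt)

for t in (0.5, 1.0, 10.0):
    print(t, S_prime_formula(t), S_prime_numeric(t))

# The closed form from the simplification above
print(math.sqrt(25 + 200**2))
```

Every row should show roughly the same number no matter which t you pick, which is the "constant rate" claim in numerical form.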
velocity: change in position per change in time interest: change in money per change in time flux
power: change in energy per change in time(?) other physics examples
usually by “rate” we mean change in X per change in time though, of course, we could measure
change relative to something other than time. for instance, i could
Another Example
Here’s another good example. You’re in Arizona. You’re trying to get to Tombstone, but your party
was massacred by Indians/bandits/wild animals somewhere north of Nogales. You’re the only survivor.
So you’ve been walking for days, with no sign of anything living. And then you see it: train tracks. They
extend to the horizon like a pencil line in two-point perspective, and just before the vanishing point rises
an object that you remember from your boyhood days studying Euclid: a cone. Are you hallucinating?
Are you so close to death that time is melting? Are you in the desert or in Mr. Dick’s class? Are you
entering the Platonic world of shapes?
The cone, though, isn’t quite a cone: it’s an inverted cone, raised up above the ground and supported
on an iron lattice. It’s a water tower. You straggle forth. The water tower must be miles away but in
what seems like minutes you’re underneath it, lapping up the drip-drip-dripping coming from the leak at
the very vertex of the cone. The water tastes like rust, but it also tastes like water, which is delicious and
revivifying. So you lie down directly underneath the leak and open your mouth. You don’t have a cup or
anything to catch the water with. All you have is your mouth. But that’s OK, because you could lie there
and drink water all day.
Unfortunately, that won’t be possible. There’s only a finite amount of water in the tank. Moreover:
the rate at which the water is dripping out of the tank is decreasing over time. The rate at which the water
drips will be proportional to the amount of water in the tank—the more water there is in the tank, the
greater the water pressure at the vertex will be, and the faster it’ll drip—and so the more water that drips
out, the less water there is, and the less the pressure there is, and the slower it drips.
Let’s imagine you don’t actually care about the amount of water in the tank. Let’s imagine, instead,
that the water is sort of like an IV for you—that you need to drink a certain amount of water per unit time,
lest you lapse back into unconsciousness. You do not want to return to your desiccated desert delirium.
You need to be drinking at least 10 cubic centimeters of water per minute to remain conscious.
Uh. Crud. I just realized that this problem, the way I’m setting it up, is a bit beyond your level. This
always happens when I write word problems—I get way too involved and write stuff that’s too complicated
and elaborate. Look, let’s just do a dumb problem involving a stupid cone and water dripping.
OK, so imagine you have a cone or a funnel or something and water is dripping out of it. Let’s imagine
that cone is 10 feet in radius at the opening and 15 feet tall, and that the water is dripping at a constant
rate of one cubic foot per hour. That’s a stupid assumption to make (the water won’t drip at a constant
rate), but all of the textbooks love making that assumption. So, anyway, water drips out, the water level
in the cone changes, and the surface area of the water in the cone changes, too. Let’s say that the height
of the water at any given time t is $h_{\text{water}}(t)$. Then I have a situation like this:
I’ve labelled the radius of the water level in the cone as $r_{\text{water}}(t)$. So, the typical textbook question to
ask now—not like you’d really care, were you dying of thirst in the Sonoran desert—the typical question
to ask is “at what rate is the water level falling?”
So really we are asking, at what rate is the height of the water changing? Or: what is $dh_{\text{water}}/dt$, a/k/a
$h'_{\text{water}}(t)$? I guess we’ll need to find an equation with $h(t)$ in it that we can differentiate.
What do we know? We know the rate at which water is leaving the cone: if the volume of water is
$V_{\text{water}}$, then
$$\frac{dV_{\text{water}}}{dt} = V'_{\text{water}}(t) = -1\ \text{ft}^3/\text{hr}$$
We also know that the volume of a cone with radius r and height h is:
$$V = \frac{1}{3}\pi r^2 h$$
Here, of course, we have two cones: we have the actual container, with a permanent height of 15 ft
and radius of 10 ft. And inside of that, we have a cone of water, whose height and radius are changing (as
the water is leaking out).
$$V_{\text{water}} = \frac{1}{3}\pi r_{\text{water}}^2 h_{\text{water}}$$
Or if I write it to remind myself that these things are all functions of time:
$$V_{\text{water}}(t) = \frac{1}{3}\pi (r_{\text{water}}(t))^2 h_{\text{water}}(t)$$
Our question is “at what rate is the water level falling?” So really we are asking, at what rate is the
height of the water changing? Or: what is $dh_{\text{water}}/dt$, a/k/a $h'_{\text{water}}(t)$? We want to find this. We could do
this a couple ways: we could solve this equation for $h_{\text{water}}$ and take a derivative, or we could differentiate
what we have implicitly and then solve for $h'_{\text{water}}(t)$.
But let’s simplify it a bit before we do either. We have three functions here: the volume, the height,
and the radius. We can simplify this. We can reduce h(t) and r(t) to just one function. We know that
the radius of the container cone is 10 ft and the height of the container cone is 15 ft, and we know that
any cone inside of this one will have the same proportions (because of similar triangles or something):
$$\frac{r_{\text{cone}}}{h_{\text{cone}}} = \frac{r_{\text{water}}}{h_{\text{water}}}$$
$$\frac{10}{15} = \frac{r_{\text{water}}}{h_{\text{water}}}$$
$$r_{\text{water}} = \frac{10}{15} h_{\text{water}}$$
$$r_{\text{water}} = \frac{2}{3} h_{\text{water}}$$
So really, we know that:
$$r_{\text{water}}(t) = \frac{2}{3} h_{\text{water}}(t)$$
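We can plug this back into our formula for the volume of water and get:
$$V_{\text{water}}(t) = \frac{1}{3}\pi (r_{\text{water}}(t))^2 h_{\text{water}}(t)$$
$$= \frac{1}{3}\pi \left(\frac{2}{3} h_{\text{water}}(t)\right)^2 h_{\text{water}}(t)$$
$$= \frac{1}{3}\pi \cdot \frac{4}{9}(h_{\text{water}}(t))^2 \cdot h_{\text{water}}(t)$$
$$= \frac{4}{27}\pi (h_{\text{water}}(t))^3$$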
This is a bit easier to deal with. So. So I know $V_{\text{water}}(t)$. I want to find $h'(t)$. I should probably take a
derivative! I could either take what I have and solve it for $h(t)$ and then differentiate; alternatively, I could
just differentiate what I have implicitly, and solve for $h'(t)$.
$$\frac{d}{dt}\left(V_{\text{water}}(t)\right) = \frac{d}{dt}\left(\frac{4}{27}\pi (h_{\text{water}}(t))^3\right)$$
$$V'_{\text{water}}(t) = \frac{4}{27}\pi \cdot 3(h_{\text{water}}(t))^2 \cdot h'_{\text{water}}(t)$$
$$V'_{\text{water}}(t) = \frac{4}{9}\pi \cdot (h_{\text{water}}(t))^2 \cdot h'_{\text{water}}(t)$$
so then if I solve for $h'(t)$
$$h'_{\text{water}}(t) = \frac{V'_{\text{water}}(t)}{\frac{4}{9}\pi \cdot (h_{\text{water}}(t))^2}$$
We’ve found it! But we can simplify this a bit. We already know another way of writing $V'_{\text{water}}(t)$. We
know it’s just equal to $-1\ \text{ft}^3/\text{hr}$. So if we plug this in:
$$h'(t) = \frac{-1}{\frac{4}{9}\pi \cdot (h_{\text{water}}(t))^2}$$
simplifying:
$$h'(t) = \frac{-9}{4\pi (h_{\text{water}}(t))^2}$$
There it is! There’s the rate at which the water level is falling! I might be curious about a specific case:
for instance, at the instant that the water is 7 feet deep, how fast is the water level falling? I don’t know
at what time that will be, so I can’t plug something in for t, but I don’t need to: I can just plug in 7 for
$h_{\text{water}}(t)$:
$$h'(t) = \frac{-9}{4\pi (7)^2}$$
$$= \frac{-9}{4\pi \cdot 49}$$
$$\approx -0.0146$$
So at that instant, the water level is dropping at about 0.0146 feet per hour!
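Here is a short Python sketch of that last step, in case you want to try other depths. It is only an illustration built on the formulas derived above ($V = \frac{4}{27}\pi h^3$ and $h' = V' / (\frac{4}{9}\pi h^2)$, with the container radius 10 ft and height 15 ft); the function names and the small time step `dt` are my own choices. The second half is a rough plausibility check: if the level drops at the computed rate for a tiny bit of time, the volume lost per hour should come out very close to one cubic foot.

```python
import math

def volume(h):
    # Volume of water (cubic feet) at depth h (feet), using r = (2/3) h
    return 4.0 * math.pi * h**3 / 27.0

def dh_dt(h, dV_dt=-1.0):
    # Rate of change of depth (feet per hour), from V' = (4/9) * pi * h^2 * h'
    return dV_dt / (4.0 * math.pi * h**2 / 9.0)

rate_at_7 = dh_dt(7.0)
print(round(rate_at_7, 4))

# Plausibility check: drop the level at that rate for a short time dt (hours)
dt = 1e-3
h_after = 7.0 + rate_at_7 * dt
volume_lost_per_hour = (volume(7.0) - volume(h_after)) / dt
print(volume_lost_per_hour)
```

Notice that `dh_dt` only needs the depth, not the time, which is exactly why we could plug in 7 without knowing when the water is 7 feet deep.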
Problems
In the following problems: an object is moving along a line such that its position at any time t is given
by the function x(t). Find the velocity and acceleration of the particle at any time t, and then find the
position, velocity, and acceleration of the particle at the given instant.
1. $x(t) = 4 + 3t - t^2$, when t = 5
2. $x(t) = 5t - t^3$, when t = 3
3. $x(t) = 18/(t + 2)$, when t = 1
4. $x(t) = (2t)/(t + 3)$, when t = 3
5. $x(t) = (t^2 + 5t)(t^2 + t - 2)$, when t = 1
6. $x(t) = (t^2 - 3t)(t^2 + 3t)$, when t = 2
7. Imagine that the radius of a circle is changing over time. Can you come up with a function
   for the area of the circle as a function of time? Can you come up with a function for the rate
   at which the area of the circle changes (possibly in terms of the radius and the rate at which
   the radius is changing)?
8. Likewise, imagine that the radius of a sphere is changing over time. (Like, you’re inflating a
   balloon or something.) Can you come up with an expression for the volume of the sphere as
   a function of time? Can you come up with an expression for the rate at which the volume
   changes (again, as a function of the radius and the rate the radius changes)?
9. Or imagine this. Imagine you have a balloon-inflating machine that inflates balloons at a
   rate of three cubic centimeters per second. Assume that the balloon is a sphere. You
   know two different ways of writing $V'(t)$ (or $dV/dt$): you know it’s $3\ \text{cm}^3/\text{s}$, and since you
   also know the equation for the volume of a sphere, you can differentiate it (with respect
   to time, t) and get the second way of writing $dV/dt$. Do that. Can you come up with an
   expression for the rate at which the radius of the balloon changes over time?
10. Assuming that Ohm’s law holds (which it doesn’t always), the power P of an electrical
    circuit is given by $P = Ri^2$, where R is the resistance of the circuit and i is the current.
    (a) If the resistance of the circuit changes over time, but current remains constant,
        how does the power through the circuit change over time?
    (b) If the current through the circuit changes over time, but the resistance remains
        constant, how does the power of the circuit change over time?
    (c) If both the current and the resistance change over time, how does the power
        through the circuit change?
    (d) If neither the current nor the resistance change, how does the power through the
        circuit change?
    Note that in none of these questions do you know how this stuff changes with time. You
    simply know, for example, that i is somehow a function of time (that is, if t changes, i changes,
    too). (The usual abbreviation for this would be “i = i(t)”.) You don’t know, for example, that
    it changes at three amps per second. You know that it doesn’t change at zero units per second
    (it’s not constant), but that’s all you know.
11. The volume of a cylinder with radius r and height h is $V = \pi r^2 h$
    (a) If the radius changes with time but the height does not, how does the volume of
        the cylinder change with time?
    (b) If the height of the cylinder changes with time but the radius doesn’t, how does the
        volume change with time?
    (c) If both the radius and the height change with time, how does the volume change?
    (d) If neither the radius nor the height of the cylinder change, how does the volume of
        the cylinder change?
12. The length of the diagonals d in a rectangle of sides length x and y is given by
    $d = \sqrt{x^2 + y^2}$.
    (a) Imagine that one of the two sets of parallel sides is changing over time, but the
        other set of sides is not. How is the length of the diagonal changing?
    (b) Imagine that all of the sides are changing over time (but such that the rectangle is
        remaining a rectangle). How is the length of the diagonal changing?
    (c) Imagine that none of the sides of the rectangle are changing over time. How is the
        length of the diagonal changing?
13. The area of a triangle with sides a and b and an angle θ between them is
    $A = \frac{1}{2}ab\sin(\theta)$.
    (a) If a changes with time, but b and θ are constant w.r.t.³ time, how does the area
        of the triangle change with time?
    (b) If θ changes with time but a and b remain constant w.r.t. time, how does the area
        change with time?
    (c) If both a and b change with time (but θ doesn’t), how does the area change with
        time?
    (d) If a, b, and θ all change over time, how does the area change?
    (e) What if none of them change over time? How does the area change?
14. Each edge of a cube is increasing at a rate of three miles per second. How fast is the volume
    of the cube increasing when the edges are each 12 miles long?
15. A metal disk expands during heating. If its radius increases at a rate of 0.02 inches per
    second, how fast is the area of one of its sides increasing at the instant the radius is 8.1 inches?
16. How fast is the radius of a soap bubble increasing if air is blown into it at a rate of three cubic
    inches per second? Come up with an expression as a function of time. How fast is the
    radius increasing at the instant that the radius is four inches?
17. You are drinking out of a conical paper cup (height 10 cm, diameter of top 6 cm, like the
    kind you find next to water dispensers) at a rate of three cubic centimeters of water per
    second. Come up with a function for the rate at which the water level drops. How fast is the
    water level falling when the water is only one centimeter from the top of the cup?
18. A 747 flying west at 550 miles per hour goes over the air traffic control tower at the
    Ithaca-Tompkins Regional Airport (ITH) at noon. An hour later, a stealth fighter at the same
    altitude flies over the tower headed due north at 1000 miles per hour. Come up with a
    function for the rate at which the distance between the airplanes is changing as a function of time.
    How fast is the distance between the planes changing at 2:00 PM?
19. A 20-foot ladder is leaning against a building. If the bottom of the ladder is sliding away from
    the building along the ground at one foot per second, how fast is the top of the ladder falling
    when the bottom of the ladder is five feet from the wall?
20. Sand is pouring out of a pipe at the rate of 16 cubic feet per second. The falling sand forms
    a conical pile that makes an angle with the ground, thanks to research by Sidney Nagel
    and others⁴, of about 30°. How fast is the height of the pile increasing at the instant the
    pile is four feet high?
21. Let’s go fly a kite! You are flying a kite that is 90 feet above your hand level. The wind is
    blowing it horizontally away from you at 5 feet per second. How fast are you letting out cord?
    How fast are you letting out cord at the instant that the kite is 150 feet away? (Assume
    that the cord is straight from your hand to the kite.)
22. A particle is moving along the line $x + 2y = 2$. Find a) the rate of change of the y-coordinate,
    if the x-coordinate is increasing at a rate of 4 units per second, and b) the rate of change
    of the x-coordinate, if the y-coordinate is decreasing at a rate of 2 units per second.
23. A rectangle has two sides on the positive x and y-axes and one corner at a point P that moves
    along the curve $y = e^x$ in such a way that y increases at a rate of 1/2 units per minute. How
    fast is the area of the rectangle changing at the instant when y = 3?
24. A tank contains 1000 cubic feet of natural gas at a pressure of 5 psi. Find the rate of change
    of the volume of gas if the pressure decreases at a rate of 0.05 psi/hour. (Assume Boyle’s
    law: pressure · volume = a constant.)
25. The volume of a spherical balloon is increasing at a constant rate of 8 cubic feet per minute.
    How fast is the radius increasing at the instant the radius is exactly 10 feet? How fast is the
    surface area increasing at that instant?
26. At a certain instant the side of an equilateral triangle is α centimeters long and increasing at
    the rate of k centimeters per minute. How fast is the area increasing?
27. The perimeter of a rectangle is fixed at 24 centimeters. If the length l of the rectangle is
    increasing at a rate of 1 centimeter per second, when (meaning, for what value of l) does the
    area of the rectangle start to decrease?
28. A spherical snowball is melting in such a manner that its radius is changing at a constant
    rate, decreasing from 16 cm to 10 cm in 30 minutes. How fast is the volume of the snowball
    changing at the instant the radius is 12 cm?
29. A man standing three feet from the base of a lamppost casts a shadow four feet long. If the
    man is six feet tall and walks away from the lamppost at a speed of 400 feet per minute, at
    what rate will his shadow lengthen? How fast is the tip of his shadow moving?
30. An object that weighs 150 pounds on the surface of the earth will weigh
    $150\left(1 + \frac{1}{4000}r\right)^{-2}$ pounds when it is r miles above the surface.
    Given that the altitude of the object is increasing at the rate of 10 miles per second, how fast
    is the weight decreasing at the instant it is 400 miles above the surface?
31. According to special relativity, the mass of a particle moving at velocity v is
    $$\frac{m_r}{\sqrt{1 - v^2/c^2}}$$
    where $m_r$ is the mass at rest and c is the speed of light. At what rate is the mass changing
    when the particle’s velocity is 0.5c and the rate of change of the velocity is 0.01c per second?
32. Water is dripping through the bottom of a conical cup four inches across and six inches deep.
    Given that the cup loses half a cubic inch of water per minute, how fast is the water level
    dropping when the water is three inches deep?
33. A revolving searchlight a half-mile from shore makes one revolution per minute. How fast
    is the light travelling along a straight beach at the instant it passes over a shorepoint one
    mile away from the shorepoint nearest to the searchlight?
34. A searchlight is trained on a plane that flies directly above the light at an altitude of two
    miles and a speed of 400 miles per hour. How fast must the light be turning 2 seconds after
    the plane passes directly overhead?
35. When the shadow of the sash appeared on the curtains it was between seven and eight
    oclock and then I was in time again, hearing the watch. It was Grandfather’s and when
    Father gave it to me he said I give you the mausoleum of all hope and desire; it’s rather
    excruciating-ly apt that you will use it to gain the reducto absurdum of all human experience
    which can fit your individual needs no better than it fitted his or his father’s. I give it to you
    not that you may remember time but that you might forget it now and then for a moment and
    not spend all your breath trying to conquer it. Because no battle is ever won he said. They
    are not even fought. The field only reveals to man his own folly and despair, and victory is
    an illusion of philosophers and fools.
    (William Faulkner, The Sound and the Fury)
    The minute hand on a clock is five inches long, and the hour hand is four inches long. How fast
    is the distance between the tips of the hands changing at 3:00?
³ = “with respect to”
⁴ GRANULAR MATERIAL IS SO COOL: http://en.wikipedia.org/wiki/Angle_of_repose
Optimization
Our entire exploration of calculus began with a single question:
You are a lifeguard at the municipal beach in Churchill, Manitoba. One day, as you are sitting
on your lifeguard chair next to Hudson Bay, you see a swimmer being attacked by a polar bear.
The swimmer appears to be roughly 120 feet out to sea (on a straight line between the swimmer
and the shore), and the lifeguard station is roughly 300 feet down the beach from the nearest
point on shore to the swimmer. You can run at 13 feet per second along the beach, and you can
swim at 5 feet per second. Given that you want to reach the swimmer as quickly as possible,
how far down the beach do you run, and how far do you swim?
The problem here is that we need to find the perfect balance between two competing desires. On one hand,
we want to take the shortest possible route to the swimmer. But if we do that, we’d swim all the way,
and we can’t swim very fast. So on the other hand, we want to spend the least amount of time swimming.
But if we do that—if we run all the way down the beach and then hop in the water when we’re only a
short swim away—the total distance we travel will be quite long. What we want to do is find the perfect
median between these two extremes: find the path that minimizes the amount of time it takes to reach the
swimmer.
Back in October, when we faced this question, we were only able to get as far as coming up with a
function for the amount of time it takes to reach the swimmer (as a function of how far down the beach
you run and how far you swim). To find what choice takes the minimum amount of time, we would have
had to somehow find the minimum of this function. But how were we to do that?
Now, after four months of work, we are finally at a position to do that. Using our tools of calculus,
we can finally find the minimum of that function—and, in doing so, save the swimmer. As we discussed in
November, the essential idea is that the minimum (or, for that matter, any extremum) is a point at which
the function is neither increasing nor decreasing—a point at which the derivative of the function is zero.
So all we need to do is come up with a function for the amount of time it takes to reach the swimmer, take
a derivative, and then find what value(s) of x make it equal to zero.
Let’s first refresh our memory by coming up with the function for how long it takes to reach the
swimmer. Our situation looks something like this:
So let’s say that we run all but the last x feet down the beach—i.e., we run 300 − x feet down the
beach—and then swim on a straight line out to the swimmer. If we abstract the relevant details, we get
something like this:
We know that I’m running at a constant rate of 13 feet/second, and swimming at a constant rate of
5 feet/second. And we know that if rate is constant, distance = rate · time, or time = distance/rate. So the time
it takes me to reach the swimmer will be:
$$\text{Time}(x) = \frac{\text{distance running}}{\text{speed running}} + \frac{\text{distance swimming}}{\text{speed swimming}}$$
Or just:
$$\text{Time}(x) = \frac{300 - x}{13} + \frac{\text{distance swimming}}{5}$$
So, clearly we need to figure out how far we swim—but we can do that! All we have is a right triangle,
with sides x and 120, so we can figure out how far we swim. We must have:
$$120^2 + x^2 = (\text{distance swimming})^2$$
or:
$$\text{distance swimming} = \sqrt{120^2 + x^2}$$
So then, the total amount of time it takes to get to the swimmer, if we run 300 − x feet down the beach,
is:
$$\text{Time}(x) = \frac{300 - x}{13} + \frac{\sqrt{120^2 + x^2}}{5}$$
Were I to graph this (which I can’t, because all those years of saltwater lifeguarding have corroded my
TI-83 beyond use), I’d see something like:
How can I find where time is minimal? I just need to find where the slope of this function is zero—I
just need to find where its derivative is zero. So if I take a derivative, I get:
$$\text{Time}'(x) = \frac{-1}{13} + \frac{2x}{5 \cdot 2\sqrt{120^2 + x^2}}$$
$$= \frac{-1}{13} + \frac{x}{5\sqrt{120^2 + x^2}}$$
And if I want to find what value of x makes this zero, I can simply set it equal to zero and do some algebra:
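$$0 = \frac{-1}{13} + \frac{x}{5\sqrt{120^2 + x^2}}$$
$$\frac{1}{13} = \frac{x}{5\sqrt{120^2 + x^2}}$$
$$\sqrt{120^2 + x^2} = \frac{13}{5}x$$
squaring both sides...
$$120^2 + x^2 = \frac{13^2}{5^2}x^2$$
$$120^2 = \frac{13^2}{5^2}x^2 - x^2$$
$$120^2 = \left(\frac{13^2}{5^2} - 1\right)x^2$$
$$\frac{120^2}{\frac{13^2}{5^2} - 1} = x^2$$
$$x = \sqrt{\frac{120^2}{\frac{13^2}{5^2} - 1}}$$
$$x = 50$$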
so if we want to reach the swimmer in the least amount of time, we need to run all but the last 50 feet
down the beach, and swim from there! This means we’ll need to run:
$$\text{distance running} = 300 - x = 250 \text{ feet}$$
and swim:
$$\text{distance swimming} = \sqrt{120^2 + x^2}$$
$$= \sqrt{120^2 + 50^2}$$
$$= 130 \text{ feet}$$
Perhaps the logical follow-up question to ask is, how long does it take us to get out to the swimmer? We
know how to get there, but how minimal is this minimum? Well, we already have a function for how long
it takes us to reach the swimmer as a function of x, and now we have an x (50), so we can just plug that
in:
$$\text{Time}(50) = \frac{300 - 50}{13} + \frac{\sqrt{120^2 + 50^2}}{5}$$
$$\approx 45.2 \text{ seconds}$$
So it will take us about 45 seconds to reach the swimmer.
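Since the TI-83 is out of commission, here is a rough Python stand-in for the graph (a sketch of my own, not part of the original notes). It uses the same Time(x) function as above, with x measured in feet, and simply tries every whole number of feet from 0 to 300 to see which one gives the smallest time. The name `time_to_swimmer` and the one-foot step are choices made for the sketch; a brute-force scan like this is not a proof, just a check on the calculus.

```python
import math

def time_to_swimmer(x):
    # Run 300 - x feet at 13 ft/s, then swim the hypotenuse at 5 ft/s
    run = (300.0 - x) / 13.0
    swim = math.sqrt(120.0**2 + x**2) / 5.0
    return run + swim

# Try every whole-foot value of x and keep the one with the least time
best_x = min(range(0, 301), key=time_to_swimmer)
print(best_x, round(time_to_swimmer(best_x), 1))

# For comparison: jump in the water late (x = 0) or swim the whole way (x = 300)
print(round(time_to_swimmer(0), 1), round(time_to_swimmer(300), 1))
```

Both extremes come out slower than the x = 50 route, which is the "two competing desires" picture in numbers.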
As If It Were That Easy

Unfortunately, there is a catch. Namely: how do we know that this actually is the minimum amount
of time it takes to reach the swimmer? Knowing that the derivative is zero doesn’t tell us that it’s a
minimum—it just tells us that the slope is zero. That could just as easily happen at a maximum, or at a
point that is not an extremum at all:
Imagine how disastrous it would be if, in trying to minimize the amount of time it takes to get to
the swimmer, we accidentally maximize the time. Or if, in trying to calculate how our company should
maximize its profit, we accidentally minimize profit.
So the question is: how do we distinguish between maxima, minima, and non-extrema? Simply finding
where the derivative is zero doesn’t tell us. It tells us what points are POSSIBLE maxima (or possible
minima), but it doesn’t actually pinpoint the extrema. It gives us a list of suspects—but it doesn’t tell us
which of those suspects is the murderer.
Every maximum, and every minimum, will be a point at which the derivative is zero—but not every
point at which the derivative is zero will be a maximum (or an extremum at all). We could have something
like the inflection point on $x^3$.
Put differently: if a point on a function is a minimum, then the derivative at that point will be
zero; but if the derivative of a function is zero at some point, that point is not necessarily a minimum¹.
Clearly, we need to learn how to distinguish between these options. Allow me to suggest this as a
resolution:
• If we have a minimum, we know that the function must be decreasing until it gets to the minimum,
then zero, and then increasing after it gets past the minimum.
• If we have a maximum, we know that the function must be increasing until it gets to the maximum,
then zero, and then decreasing after it gets past the maximum.
¹ If a shape is a square, then that shape is a rectangle; but if a shape is a rectangle, that shape is not necessarily a square.
Visually:
But of course, “increasing” is just a fancy way of saying “the derivative is positive,” and “decreasing” is
just a fancy way of saying “the derivative is negative”:
So if we want to see which possible extrema are maxima and which are minima, we can just see what
the derivative does!
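To make the "see what the derivative does" idea concrete, here is a minimal Python sketch of that sign check. It is only an illustration under assumptions of my own: the function name `classify_critical_point`, the step size, and the test function $g(x) = x^3 - 3x$ (whose derivative is $3x^2 - 3$, zero at x = −1 and x = 1) are all made up for this example. It also includes $x^3$, whose derivative $3x^2$ touches zero at x = 0 without changing sign.

```python
def classify_critical_point(f_prime, c, step=1e-3):
    # Look at the sign of the derivative just to the left and right of c
    left = f_prime(c - step)
    right = f_prime(c + step)
    if left < 0 < right:
        return "minimum"   # decreasing, then increasing
    if left > 0 > right:
        return "maximum"   # increasing, then decreasing
    return "neither"       # the derivative does not change sign

def g_prime(x):
    # Derivative of the hypothetical example g(x) = x^3 - 3x
    return 3 * x**2 - 3

def cube_prime(x):
    # Derivative of x^3
    return 3 * x**2

print(classify_critical_point(g_prime, -1))    # maximum
print(classify_critical_point(g_prime, 1))     # minimum
print(classify_critical_point(cube_prime, 0))  # neither
```

The sketch assumes the step is small enough that no other critical point sits between c − step and c + step; for a function with critical points packed closer together you would need a smaller step.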
Example
Allow me to give an example. What if we have the function:
$$f(x) = 5(x - 4)^2 + 7$$
Now, obviously, we know this is a simple, upward-opening parabola, and so the point where the derivative
is zero will be a minimum. But pretend we don’t know that. Pretend we’re totally blind about parabolae,
and we can only use our fancy calculus tools².
So if we take a derivative, we get:
$$f'(x) = 10(x - 4)$$
And so we can ask: where is this function decreasing? That’s equivalent to finding out where the
derivative is negative:
$$f'(x) < 0$$
$$10(x - 4) < 0$$
$$x - 4 < 0$$
$$x < 4$$
So we know that f (x) will be decreasing wherever x is less than 4.

We can continue, and ask: where is this function increasing? f (x) will be increasing wherever the
² Which is actually a good idea, because you only already know that that point is a minimum because you’ve memorized
how the shapes of parabolas correspond to their equations. I would argue that you don’t actually know why that happens.
derivative is positive:
$$f'(x) > 0$$
$$10(x - 4) > 0$$
$$x - 4 > 0$$
$$x > 4$$
So we know that f (x) will be increasing wherever x is greater than 4. We can summarize our knowledge:
So the function is DECREASING to the left of x = 4, and then to the right of x = 4 it’s INCREASING...
so x = 4 must be a minimum!!! Moreover, we can ask: what is the y-coordinate of this minimum?
Since we know the x-coordinate, we can just plug that back into the original function:
$$f(4) = 5 \cdot (4 - 4)^2 + 7$$
$$= 7$$
So this function has a minimum at (4, 7)! It must look like this:
More Complicated Example

What if we do something slightly more complicated? Imagine we have the following function:
$$f(x) = 2x^3 - 3x^2 - 12x + 24$$
Where does it have maxima and minima? (It’s a cubic, so presumably it has one maximum and one
minimum, and it looks like $+x^3$, so presumably the maximum is to the left of the minimum. But let’s
pretend we don’t know that. Let’s do this only using our calculus tools.)
If I take a derivative, I get:
$$f'(x) = 6x^2 - 6x - 12$$
$$= 6(x^2 - x - 2)$$
$$= 6(x + 1)(x - 2)$$
So now I can ask: where is this function increasing? where is it decreasing? To do that, I just need to
find where the derivative is positive and negative, respectively. My derivative has two factors, so I have, I
guess, three cases to consider:
• If x < −1, then we have: 6 · (−)(−) = (+)
  – so when x is less than −1, $f'(x)$ is positive
    ∗ so when x is less than −1, $f(x)$ is increasing
• If −1 < x < 2, then we have: 6 · (+)(−) = (−)
  – so when x is between −1 and +2, $f'(x)$ is negative
    ∗ so when x is between −1 and +2, $f(x)$ is decreasing
• If x > 2, then we have: 6 · (+)(+) = (+),
  – so when x > 2, $f'(x)$ is positive
    ∗ so when x > 2, $f(x)$ is increasing
So our situation looks like this:
But if I am switching from increasing to decreasing at x = −1, then I must have a maximum at x = −1.
And that maximum will be located at y-coordinate of:
$$f(-1) = 2(-1)^3 - 3(-1)^2 - 12(-1) + 24$$
$$= 31$$
So we have a maximum at (−1, 31).

And then we go from decreasing to increasing at x = 2, so we must have a minimum at x = 2. That
minimum will be located at a y-coordinate of:
$$f(2) = 2(2)^3 - 3(2)^2 - 12(2) + 24$$
$$= 4$$
So we have a minimum at (2, 4). And then our function in total must look like:
(Incidentally, this tells us something else that’s kind of interesting. We can’t really factor $2x^3 - 3x^2 -
12x + 24$, at least not easily, and thus we can’t find out where its roots are, or even how many roots it
has. We know it can have up to 3, because it’s third degree, but it could have fewer. BUT, because we
know that it’s way negative way on the left, and increases until the maximum at y = 31 and then decreases
down until the minimum at y = 4, and then keeps on increasing, we know it must have only one root,
somewhere to the left of x = −1.)
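We can’t factor our way to that one root, but we can hunt it down numerically. The sketch below is my own addition and uses bisection, a technique these notes haven’t covered: keep halving an interval on which the function changes sign. The starting interval [−3, −2] is an assumption I checked by hand ($f(-3) = -21$ is negative and $f(-2) = 20$ is positive, so the root is trapped between them); the function name `bisect_root` and the tolerance are likewise just choices for the illustration.

```python
def f(x):
    # The cubic from the example above
    return 2 * x**3 - 3 * x**2 - 12 * x + 24

def bisect_root(func, lo, hi, tol=1e-10):
    # Requires func(lo) and func(hi) to have opposite signs
    if func(lo) * func(hi) > 0:
        raise ValueError("func must change sign on [lo, hi]")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if func(lo) * func(mid) <= 0:
            hi = mid   # the sign change is in the left half
        else:
            lo = mid   # the sign change is in the right half
    return (lo + hi) / 2

root = bisect_root(f, -3.0, -2.0)
print(root, f(root))
```

The printed root lands between −3 and −2, to the left of x = −1 as predicted, and f(root) comes out essentially zero.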
The “Second Derivative Test”

But, actually, there’s an easier way to do most of these. We’ve seen that we have a minimum if the
function goes from decreasing to increasing, i.e., if the derivative goes from negative to positive. But if the
derivative is going from (−) to (+), then the derivative itself must be increasing! But that’s the same as
saying that the second derivative is positive!!! So if we have a point at which
a) the first derivative is zero, and
b) the second derivative is positive,
then we must know that
c) the point is a minimum.
Likewise with maxima: we’ve seen that we have maxima if:
• the function goes from increasing to decreasing

• i.e., if the derivative goes from (+) to 0 to (−),
• but that’s the same as saying that the derivative is decreasing,
• but that’s the same as saying that the derivative of the derivative is negative,
• which is the same as saying that the second derivative is negative.
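The chain of reasoning above can be sketched in a few lines of Python. This is an illustration of mine, not a definitive implementation: the name `second_derivative_test` and the tolerance are made up, and the derivatives are typed in by hand for the cubic $f(x) = 2x^3 - 3x^2 - 12x + 24$ from earlier ($f'(x) = 6x^2 - 6x - 12$ and $f''(x) = 12x - 6$). One case goes beyond the lists above: when the second derivative is also zero at the point, this test by itself says nothing, so the sketch reports "inconclusive" (that happens for $x^3$ at x = 0).

```python
def second_derivative_test(f_prime, f_double_prime, c, tol=1e-9):
    # Step a): the first derivative has to be zero at c
    if abs(f_prime(c)) > tol:
        return "not a critical point"
    # Step b): the sign of the second derivative decides the rest
    curvature = f_double_prime(c)
    if curvature > 0:
        return "minimum"
    if curvature < 0:
        return "maximum"
    return "inconclusive"

def f_prime(x):
    return 6 * x**2 - 6 * x - 12

def f_double_prime(x):
    return 12 * x - 6

print(second_derivative_test(f_prime, f_double_prime, -1))  # maximum
print(second_derivative_test(f_prime, f_double_prime, 2))   # minimum

# x^3 has f'(x) = 3x^2 and f''(x) = 6x, both zero at x = 0
print(second_derivative_test(lambda x: 3 * x**2, lambda x: 6 * x, 0))  # inconclusive
```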
Don’t freak out if it takes you a while to grasp this. There are a lot of layers of abstraction here. It’s
hard. Read it, and then read it again, and then read it again. Perhaps some visuals will help: imagine I
have a function and its derivative that look like this:
So this is a function that has a maximum, a minimum, and an inflection point (i.e., a point where the
derivative is zero but which is not an extremum). What distinguishes the maximum from the minimum
from the inflection point?
At the extrema, the derivative is passing through the axis; at the inflection point, it is merely bouncing
off it. Moreover, we can see that at the maximum, the derivative is decreasing (i.e., the derivative itself
has a negative slope). And at the minimum, the derivative is increasing (i.e., the derivative itself has a
positive slope):
These sketches are very important, since I think they manage to capture in a single drawing the entire
content of these notes.
The salient feature you should take away from them is this: the derivative has to change sign
in order for the function to have an extremum. If the derivative simply bounces off the axis, the
function won’t have an extremum (it’ll have an inflection point). Put differently: if the derivative has a
root of odd multiplicity, that root is an extremum of the original function; if the derivative has a root of
even multiplicity, that root is not an extremum of the original function.
Or we might say, stronger yet: a function will have extrema exactly where its derivative has
odd roots³.
Returning to Those Two Examples

We could have found the extrema of those two functions we investigated earlier, and determined which
extrema are maxima and which are minima, much more easily simply by considering the second derivative.
3
“Exactly where” is a mathematical term of art equivalent to “if and only if;” what it means is that every odd root of the
derivative is the location of an extremum, and (conversely) every extremum happens where the derivative has an odd root.
If we have $f(x) = 5(x-4)^2 + 7$, then we have $f'(x) = 10(x-4)$, and so we know the derivative is zero
when $x = 4$. (We know $f'(4) = 0$.) Moreover, we know the second derivative is $f''(x) = 10$, so if I want to
find what the second derivative is at $x = 4$, I get $f''(4) = 10$, which is positive. So at this point where
the derivative is zero, the derivative is increasing (it’s going from negative to zero to positive), and so
that point ($x = 4$) must be a minimum.
If I have $f(x) = 2x^3 - 3x^2 - 12x + 24$, then my derivatives are as follows:
$f(x) = 2x^3 - 3x^2 - 12x + 24$
$f'(x) = 6x^2 - 6x - 12 = 6(x+1)(x-2)$
$f''(x) = 12x - 6$
There are two points where the derivative is zero: at x = −1, and at x = 2. If we consider what’s happening
with the second derivative at these points...
• When $x = -1$, $f''(-1) = -12 - 6 = -18$. So the second derivative is negative at this point where
the first derivative is zero, which means the first derivative must be decreasing, which means that
the first derivative must be going from positive to zero to negative, which means that the original
function must be going from increasing to flat to decreasing... which means that $x = -1$ must be a
maximum. (Phew. This matches up with what we found earlier.)
• When $x = 2$, $f''(2) = 12 \cdot 2 - 6 = 18$. So the second derivative is positive at this point, which means
that the first derivative must be increasing, which means that the first derivative must be going from
negative to zero to positive, which means that the original function must be going from decreasing
to flat to increasing... and so $x = 2$ must be a minimum.
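The two examples above can be double-checked mechanically. What follows is a small sketch, not a general tool: it handles only polynomials, stored as coefficient lists from the highest power down, and the function names are invented for this illustration.

```python
def poly_eval(coeffs, x):
    """Evaluate a polynomial (coefficients from highest power to constant)."""
    result = 0.0
    for c in coeffs:
        result = result * x + c   # Horner's rule
    return result


def poly_derivative(coeffs):
    """Differentiate term by term: c * x^k becomes (c * k) * x^(k - 1)."""
    degree = len(coeffs) - 1
    return [c * (degree - i) for i, c in enumerate(coeffs[:-1])]


def second_derivative_test(coeffs, x, tol=1e-9):
    """Apply the second derivative test at x."""
    first = poly_derivative(coeffs)
    second = poly_derivative(first)
    if abs(poly_eval(first, x)) > tol:
        return "not a critical point"
    curvature = poly_eval(second, x)
    if curvature > 0:
        return "minimum"
    if curvature < 0:
        return "maximum"
    return "inconclusive"         # f'' = 0 tells us nothing by itself


parabola = [5, -40, 87]           # 5(x - 4)^2 + 7, multiplied out
cubic = [2, -3, -12, 24]          # 2x^3 - 3x^2 - 12x + 24

print(second_derivative_test(parabola, 4))
print(second_derivative_test(cubic, -1))
print(second_derivative_test(cubic, 2))
```

Note the "inconclusive" branch: when the second derivative is also zero, this test cannot decide, and one has to fall back on checking whether the first derivative changes sign.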
Globally...
As one closing comment: the stuff we’re doing here—i.e., the procedure of finding maxima and
minima—is known as optimization4 . This is, in general, a hard thing to do. I hope you’ve realized
that it is more complicated than simply finding where the derivative is zero. But even beyond that,
you should realize that even finding where the derivative is zero is essentially a problem of algebra—the
problem of finding where a certain equation is zero, i.e., the problem of finding roots of an equation, i.e.,
factoring. This is hard. Factoring is not easy, as we’ve discussed. So we’ve hardly saved the world—we’ve
just reduced one problem that we don’t know how to solve (finding extrema) to another problem that we
sometimes know how to solve (finding roots).
A mathematician is tired of teaching and tired of research, so he decides to become a

firefighter. He goes down to the New Haven Fire Department to interview, and the fire chief
says, “Your record is excellent, and we’d love to hire you, but before we do, I have just two
questions: you’re walking down the street one day, and you see a fire in a dumpster. What do
you do?”
The mathematician responds, “I put it out.”
4
As you might have guessed from the title of the notes.
“Great!,” the fire chief says. “Now, what if you’re walking down the street and you see a
dumpster that’s not on fire? What do you do?”
The mathematician responds, “Well, I light it on fire, and then I put it out.”
The fire chief is flabbergasted. “You light it on fire—even though it’s not already on fire?
Why would you do that??”
“Well, because that way, I’ve reduced it to a problem I already know how to solve.”
Even assuming we can factor everything, there’s yet another, more important caveat I should make:
we are only considering finding extrema when we have nice, simple, smooth functions—things we can take
derivatives of—like polynomials and trig functions and exponentials and whatnot. The world is hardly this
simple. There are plenty of functions (elsewhere in math, and certainly in the world) that are not smooth
and continuous like polynomials. The world is far too chaotic. And when it gets to such functions, the
problem of finding maxima and minima can be incredibly difficult.
For instance: I spent four years at the University of Chicago working at Doc Films, the student-run
cinema. Every quarter, the hundred-odd volunteers would have to sign up for shifts. We’d rank our top
five shifts and positions, in order of preference, and send them in to the volunteer chair. There are fourteen
shows per week (average of two per night), and seven jobs per show, so that gives 7 · 14 = 98 possible slots.
So the question is: how do you schedule slots so that the largest number of people get the slots they want
the most? how do you schedule such that you create the greatest benefit for the greatest number?
This doesn’t reduce to an easy problem of “take the derivative!” We could make a function for the net
happiness of the volunteers, but it certainly wouldn’t be the sort of smooth curve that we could differentiate.
It would be somewhat random and arbitrary, based on the whims of the volunteers. There would be no
Platonic foundation. It wouldn’t be like how each of the infinite number of points on the function y = x2
satisfies that equation.
Of course, the advantage our situation has over finding the extrema of x2 is that we at least only have
a finite number of possibilities. With y = x2 , one way to find the extrema would be to look at every point,
and then look at the points immediately to the left and to the right—if both points are lower, we have
a maximum; if both points are higher, we have a minimum. But there are an infinite number of points
on y = x2 ; any one of them could be an extremum. We’d have to look at an infinite number of points5 .
In our scheduling problem, there are only a finite number of ways we can schedule our finite number of
volunteers into a finite number of slots. So, in principle, we could just look at all the possible arrangements,
calculate how optimal each arrangement is, and choose the best arrangement. For example, we could say
that a volunteer has a happiness-score of 98 if they get the job they wanted most, a score of 97 if they get
their second-most-wanted job, and so forth, and then add up all the happiness-scores of all the volunteers.
Then we’d just need to calculate this total-happiness-score for each possible arrangement of volunteers into
slots, and choose the arrangement that gives us the highest total-happiness-score. With 98 slots and 98
volunteers, there are 98! possible ways to assign slots, i.e., 98! possible arrangements. 98! is about $9.4 \cdot 10^{153}$.
The number of atoms in the universe is about $10^{80}$.
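Python handles integers of any size, so the comparison can be checked directly. This short sketch computes 98! exactly and compares it with the rough atom count quoted above (the $10^{80}$ figure is the estimate from the text, not something the code derives).

```python
import math

arrangements = math.factorial(98)   # ways to put 98 volunteers into 98 slots
atoms_in_universe = 10 ** 80        # rough estimate quoted in the text

print(f"{arrangements:.1e}")        # order of magnitude of 98!
print(len(str(arrangements)))       # number of decimal digits in 98!
print(arrangements > atoms_in_universe)
```

Computing the number is instant. Visiting that many arrangements one at a time is what is hopeless.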
Uh-oh. We can’t just check every possible arrangement—if the number of possibilities is greater than
the number of atoms in the universe, good luck getting your PowerBook to go through each possibility and
find the best one. Good luck getting any computer to do that. And you don’t have the benefit, as you
do with smooth curves that you can differentiate, of having some sort of predictability. You just have a
bunch of discrete, randomly-ordered states.
So we have to give up hope that we can ever find the optimal solution. Instead, we’ll have to find some
procedure that gives us a pretty good solution—some algorithm that will approximate a solution, that will
estimate the extremum, and hope that our approximation is pretty close to the actual optimum.
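To make "a pretty good solution" concrete, here is a toy sketch with three volunteers and three slots, small enough that we can still check every arrangement and compare. The happiness scores are invented for illustration, and the "greedy" rule (each volunteer in turn takes their favorite slot that is still free) is just one simple heuristic assumed here, not a method these notes prescribe.

```python
from itertools import permutations

# Hypothetical scores: prefs[v][s] is volunteer v's happiness in slot s.
prefs = [
    [3, 2, 1],
    [3, 1, 0],
    [1, 3, 2],
]


def total_happiness(assignment):
    """assignment[v] is the slot given to volunteer v."""
    return sum(prefs[v][slot] for v, slot in enumerate(assignment))


def brute_force():
    """Check every arrangement; only feasible for tiny problems."""
    return max(permutations(range(len(prefs))), key=total_happiness)


def greedy():
    """Each volunteer in turn takes their best slot that is still free."""
    free = set(range(len(prefs)))
    assignment = []
    for v in range(len(prefs)):
        best = max(free, key=lambda slot: prefs[v][slot])
        assignment.append(best)
        free.remove(best)
    return tuple(assignment)


print(brute_force(), total_happiness(brute_force()))
print(greedy(), total_happiness(greedy()))
```

On this instance the greedy rule scores 6 while the true optimum is 7: close, but not the best. That gap is the price of not checking everything, and with 98 slots we have no way to check everything to see how big the gap is.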
Here’s a cartoon about another movie-theater-related optimization problem (one that you may be
more familiar with)6 :
5
Not to mention the obvious issues with countability and continuity—as we discussed a few months ago, there is no such
thing as a “next” real number. What’s the real number immediately to the right of 3? 3.1? 3.01? 3.001?
6
http://xkcd.com/173/
So the general question of finding extrema is both tremendously important and tremendously difficult—
so much that I have friends who are getting PhDs in, quite literally, “optimization.” I once wrote a
lengthy paper on Nicomachean Ethics, the thesis of which was that life is just a tremendously complicated
optimization problem—and not just in the obvious “you need to find the perfect median!” way, but in a
really cool way that resolved (well, attempted to resolve) the paradoxes of Book X.
Here’s a press release from Cornell from a few years back on some kid’s doctoral thesis, which was
about optimal placement of ambulances within a city7 :
Cornell efficiency experts seek to save precious minutes in deploying ambulances
Every extra second it takes an ambulance to get to its destination can mean life or death. But
how, besides driving faster, can ambulances get emergency services to people in need as efficiently
as possible, every day? It’s a classic operations research question that three Cornell researchers are
tackling in groundbreaking ways.
A National Science Foundation grant of almost $300,000 is allowing associate professor of
operations research Shane Henderson, assistant professor of operations research Huseyin Topaloglu
and applied mathematics Ph.D. student Mateo Restrepo to work on this problem. They are seeking
to perfect a computer program that estimates how best to spread ambulances across a municipality
to get maximum coverage at all times.
The researchers are working on a computerized approach to take such available information as
historical trends of types and incidences of calls, geographical layout and real-time locations
of ambulances to figure out where ambulance bases should be, and where ambulances should be sent
once finished with a call.
The whole process is not unlike the puzzle game Tetris, Restrepo said. The easy part is knowing
what an ideal system should look like. The hard part is anticipating various outcomes in a
limited period of time, like the falling blocks in the video game.
Using their program, the researchers are recommending that ambulance organizations break
the traditional setup of assigning ambulance crews to various bases and sending them back to their
assigned locations once finished with a call.
Going back to base isn’t necessarily the best option for maximum efficiency, say the operations
researchers. It might be better to redeploy an idle ambulance to where coverage is lacking, even
though no calls have yet been placed there.
“If everyone is constantly going back to the base assigned, they’re ignoring what’s going on in
real time in the system,” Henderson explained.
The concept is easy enough, but the solution is tricky, especially because of the enormous
amount of uncertainty involved.
The field of operations research that deals with making decisions over time in the face of
uncertainty is called dynamic programming, in which Topaloglu is an expert. The key is coming up
with what’s called a value function, a mathematical construction that estimates the impact of a
current decision on the future evolution of the system. In this case, it’s the impact of current
ambulance locations on the number of future calls that are served on time.
“When you’re trying to make a decision, you have to select the locations of your ambulances so
the performance predicted by the value function is as good as possible,” Topaloglu explained. “But it
turns out that computing that function is very difficult, especially if you’re talking about the scale
of the problem we’re trying to solve.”
Henderson has more than 10 years of experience working on such problems, using a technique
called simulation optimization, which is modeling different scenarios of what could happen in any
given industrial system.
He and a colleague have already commercialized an earlier generation of emergency medical
system planning, which now forms the basis for the technology used by the New Zealand ambulance
company Optima.
7
by Anne Ju, Cornell Chronicle, 16 June 2008, http://www.news.cornell.edu/stories/June08/ambulance.orie.aj.html
Problems
Find where each of the following functions are increasing and where they are decreasing. Then find the
extrema, and say whether each extremum is a maximum or a minimum. (And, for good measure, sketch
the function.)
1. $f(x) = x^2 - 3x + 2$
2. $f(x) = x^3 - 3x^2 + 6$
3. $f(x) = x + \frac{1}{x}$
4. $f(x) = x^2 + \frac{1}{x}$
5. $f(x) = x + \frac{1}{x^2}$
6. $f(x) = (x - 3)^3$
7. $f(x) = x^3(1 + x)$
8. $f(x) = x(x + 1)(x + 2)$
9. $f(x) = (x + 1)^4$
10. $f(x) = 2x - \frac{1}{x^2}$
11. $f(x) = \frac{x}{1 + x^2}$
12. $f(x) = \frac{x^2 + 1}{x^2 - 1}$
13. $f(x) = \frac{x^2}{x^2 + 1}$
14. $f(x) = x^2(1 + x)^2$
15. $f(x) = \frac{x - 1}{x + 1}$
16. $f(x) = x^2 + \frac{16}{x^2}$
17. $f(x) = x - \cos(x)$
18. $f(x) = x + \sin(x)$
19. $f(x) = \cos(2x) + 2\cos(x)$
20. $f(x) = \cos^2(x)$
21. $f(x) = \left(\frac{x - 2}{x + 2}\right)^3$
22. $f(x) = \sqrt[3]{1 - x}$
23. $f(x) = x\sqrt[3]{1 - x}$
24. $f(x) = \frac{1}{x + 1} - \frac{1}{x - 2}$
25. $f(x) = x^{7/3} - 7x^{1/3}$
26. $f(x) = x^{2/3} + 2x^{1/3}$
27. $f(x) = (x - \sqrt{x})^2$
28. $f(x) = (4x - 1)^{1/3}(2x - 1)^{2/3}$
29. $y = \sqrt{x}\,e^{-x}$ (for $x \ge 0$, obviously)
2
30. y = x2 e−x 36. f (x) = x2 ln(1/x) 42. f (x) = e1/x
31. f (x) = 2esin(x/2) 37. y = ln(cos x) 2
43. f (x) = e−x
32. f (x) = 1 − e1+cos(πx) 38. y = cos(ln x)
2 +1) 44. f (x) = x2 e−x
33. f (x) = ex/(x 39. y = x2 ln(x)
√
34. g(x) =
4
ex/ x +1 40. y = x ln(x) 45. f (x) = x2 ln(x)
√
35. h(t) = et 4−t2 41. f (x) = (1 − x)ex 46. f (x) = (x − x2 )e−x
The following are derivatives of functions. Sketch both the derivatives and the original functions (i.e., both
$f$ and $f'$). Where does $f$ have minima? Where does $f$ have maxima?
47. $f'(x) = (x + 1)(x - 2)^2$
48. $f'(x) = x^4(x - 3)$
49. $f'(x) = x^3(x - 1)^2(x + 1)(x - 2)$
50. $f'(x) = x^4(x - 2)^3(x + 3)(x - 6)$
51. $f'(x) = (x + 1)(x - 5)^2(x^2 + 3)$
52. $f'(x) = (x - a)(x + b)^2(x + 5b)$
53. Imagine you have the function $f(x) = ax^2 + bx + c$. Where does $f(x)$ have an extremum? When
(i.e., under what conditions of $a$, $b$, and $c$) is this extremum a maximum? When is it a minimum?
54. Imagine you have $f(x) = ax^3 + bx^2 + cx + d$. When will $f(x)$ have two extrema? One extremum? No
extrema? Where are they? (Give $(x, y)$-coordinates.)
55. Consider the function $f(x) = x^2 + \frac{a}{x}$. What value of $a$ makes it have
(a) a minimum at $x = 2$?
(b) an inflection point at $x = 1$?
56. Where does $f(x) = x^2 + \frac{a}{x}$ have a maximum?
57. What values of $a$ and $b$ make $f(x) = x^3 + ax^2 + bx$ have
(a) a local maximum at $x = 1$?
(b) an inflection point at $x = 1$?
58. Imagine you have the equation $f(x) = x^4 - 8x^3 + 22x^2 - 24x + 4$. If you wanted to find out where
the roots of this equation are (and how many of them there are), you’d need to factor it (but good
luck doing so). Without factoring it, though, can you figure out how many roots this equation has
(and roughly where they are)? (Hint: find the extrema.)
59. Is the function $f(x) = x^2 - x + 1$ ever negative?
60. Is the function $f(x) = 3 + 4\cos(x) + \cos(2x)$ ever negative?
61. Suppose the function $f(x) = ax^2 + bx + c$ has a local minimum at $x = 2$ and passes through the
points $(-1, 3)$ and $(3, -1)$. Find $a$, $b$, and $c$.
62. If you want the function $f(x) = \frac{ax}{x^2 + b^2}$ to have a local minimum at $x = -2$ and for the derivative
at 0 to be $-1$ (i.e., $f'(0) = -1$), what must $a$ and $b$ be?
63. Imagine you have the function $f(x) = x^p(1 - x)^q$. Where might $f$ have extrema? Where does $f$
have a maximum? If $p$ is even, where does $f$ have a minimum? If $q$ is even, where does $f$ have a
minimum?
64. How many extrema can a polynomial of degree n have, at most? Why?
65. Imagine you have a function whose first derivative is never zero. What can you say about the
function?
66. Imagine you have a function whose second derivative is never zero. What can you say about the
function?
67. Imagine you have a point where the derivative is zero, and the second derivative is also zero. Does
this mean that the point is not an extremum?
68. Find the dimensions of the rectangle of perimeter 24 that has largest area. (What is that area?)
69. Prove that, of all the rectangles with a fixed perimeter, the one with largest area is a square.
70. The sum of two positive numbers is 20. If you want the product of the numbers to be as large
as possible, what must the numbers be? If you want the sum of the squares of the numbers to
be as large as possible, what must the numbers be? If you want the sum of one number plus
the square root of the other number to be as large as possible, what must the numbers be?
71. Imagine you have a rectangle inscribed in an isosceles right triangle whose hypotenuse is 2
units long:
(a) Express the y-coordinate of point P in terms of x. (It might help to first come
up with an equation for line AB.)
(b) Express the area of the rectangle in terms of x.
(c) What is the largest possible area the rectangle could have? What are its dimensions?
72. A rectangular garden 200 square feet in area is to be fenced off against deer. Find the
dimensions that will require the least amount of fencing if one side is already protected by a
barn.
73. Find the largest possible area for a rectangle whose base is on the x-axis and whose vertices
are on the curve $y = 4 - x^2$.
74. Find the largest possible area for a rectangle inscribed in a circle of radius 4.
75. A rectangular warehouse will have 5000 square feet of floor space and will be separated into
two rectangular rooms by an interior wall. The cost of the exterior walls is $1500 per linear
foot and the cost of the interior walls is $1000 per linear foot. Find the dimensions that will
minimize the cost of building the warehouse.
76. I once returned home to find that my cat had taken to sleeping in a box that was far too
small for it, resulting in ripples of fat spilling out over the edges. (I thought I had a photo
of the cat in the box on my cell phone, but alas.) Why she found this comfortable, I have
no idea. But imagine that you plan to build a similar catbox out of an eight-by-fifteen inch
sheet of cardboard. You construct it by cutting squares out from the corners, and folding
up the sides.
Assuming you want your cat to be comfortable, and thus to not spill out from the edges
of the box, what size squares should you cut out if you want to maximize the volume of the
box?
77. More generally, what if you have a piece of material that is w units wide and l units long?
What is the largest-volume open-topped box you can make with it?
78. A rectangular box with square base and top is to be made to contain 1250 cubic feet. The
material for the base costs 35 cents per square foot, for the top 15 cents per square foot, and
for the sides 20 cents per square foot. Find the dimensions that will minimize the cost of the
box.
79. You want to break into a building. The building is surrounded by an eight-foot fence,
exactly one foot away from the building. What is the shortest-length ladder that can go over
the fence and reach the building?
80. More generally, what if you have a fence h feet tall that is w feet away from a tall building?
What is the length of the shortest ladder that will reach the building from outside the fence?
81. Conical paper cups are usually made so that the depth is $\sqrt{2}$ times the radius of the
rim. Show that this design requires the least amount of paper per unit volume.
82. You have to escape from Isle Royale and make it back to the Minnesotan mainland before the
wolves eat you alive! Your rowboat is docked at a point that’s ten miles away from the
closest point on shore. You can row at three miles per hour, and you can walk at four miles per
hour. Where should you land the boat in order to make it to Grand Marais in the least amount
of time? (Assume the shoreline of Lake Superior is straight, which it basically is at that
point, and assume that Grand Marais is thirty miles away from the point on shore closest to
your dock.) How long does it take you to get there?
83. I have enough pure germanium to coat one square meter of surface. I plan to coat a sphere
and a cube. What dimensions should they be if I want the total volume of germanium-covered
solids to be maximal? minimal?
84. A rectangle has one side on the x-axis, and the upper two corners on the graph of $y = e^{-x^2}$.
Where should the vertices be placed in order to maximize the area of the rectangle?
85. A poster you are designing needs to contain fifty square inches of printed material, with
two-inch margins at the top and bottom and one-inch margins on the sides. What dimensions
for the poster minimize the amount of paper used?
86. As the deputy director for operations of the Central Arizona Project, you are trying to
design a new canal from Tempe to Tucson. The canal will be above ground, for some reason,
and built using prefabricated sheets of an advanced polymer composite imported from Norway.
The polymer sheets are ten feet wide, and come in sections 39 feet long (to fit in standard
shipping containers). Sensibly enough, you’ll assemble them into a quasi-U shape, with one
section as the bottom of the canal, and two sections as sides. Since you want to maximize the
volume of delicious, life-providing water that this canal carries, what should the angle
between the two side pieces and the bottom piece be? (At some point in the answer, trig
identities may be helpful.)
87. Imagine you want to fit 12 ounces (about 355 ml) into a cylindrical can, like a soda can!
You could have a really narrow and really long can, but that’d take a lot of aluminum and
would be expensive; you could have a really short and really wide can, but that’d take a lot
of aluminum and’d be expensive. Somewhere in between these two extremes is the perfect
median that minimizes the amount of surface area in your can! Calculate it. What are the
dimensions of this optimal soda can? Then go out, find a soda can, measure it, and compare
it to your result.
88. More generally, if you want to enclose k ml of liquid into a cylindrical can, and you want the
surface area of the can to be minimal, what should the dimensions of the can be? (Compare
your result to whatever cans you find lying around the house!)
89. You are designing $1000\ \text{cm}^3$ cans, and in your calculations for the most efficient design, you
want to take waste into account. There is no waste in cutting the aluminum for the sides,
but the tops and bottoms of radius r must be cut from squares that measure 2r on a side.
Thus, the total amount of aluminum used by each can will be:
$A = 2 \cdot (2r)^2 + 2\pi r h$
What should r and h be to minimize the amount of aluminum used?
90. What is the maximum possible area for a triangle inscribed in a circle of radius r?
91. A power line is needed to connect a power station on the shore of a river to an island 4
kilometers downstream and 1 kilometer offshore. Find the minimum cost for such a line, given
that it costs $50,000 per kilometer to lay wire under water and $30,000 per kilometer to lay
wire under ground.
92. The U.S. Postal Service will only accept a box for domestic shipment if the sum of its length
and girth (distance around) does not exceed 108 inches. What dimensions will give a box
with a square end the largest possible volume?
93. How close does the semicircle $y = \sqrt{16 - x^2}$ come to the point $(1, \sqrt{3})$? (Suggestion: what
is it you want to minimize?)
94. A tapestry 7 feet high hangs on the wall of your castle. The lower edge is 9 feet above an
observer’s eye. If you’re really close, then when you want to observe it you’ll have to bend your
head upwards and it’ll be really small (as a fraction of your visual field); if you’re really far
away, the tapestry will look really small (and thus the angular space it takes up in your
visual field will be really small). How far from the wall should you stand to obtain the
most favorable view? Namely, what distance from the wall maximizes the angle that the
tapestry takes up in your visual field?
95. Suppose that at some time t, the position of an object moving back and forth along the x-axis
is given by $x(t) = (t - 1)(t - 4)^4$.
(a) When is the particle at rest? Where is it when it is at rest?
(b) When is the particle moving leftwards?
(c) When is the particle going the fastest?
(d) Graph x as a function of t between t = 0 and t = 6.
96. An object of weight W is dragged along a horizontal plane by means of a force P whose line of
action makes an angle of θ with the plane. The magnitude of the force is given by the equation:
$P = \frac{\mu W}{\mu \sin(\theta) + \cos(\theta)}$
where µ is the coefficient of friction. For what value of θ is it easiest to pull the object?
97. Prove Fermat’s principle: that the angle of incidence of light on a planar, reflective surface,
is equal to the angle of reflection:
You can prove this geometrically or analytically (i.e., using calculus); do it using calculus.
Suggestion: imagine that light travels from some point A (the flashlight) to another
point B (the eye). Can you come up with a function for the total distance the light travels,
and minimize that? I always do this one wrong, because I try to write a function for distance
in terms of $\theta_i$ and $\theta_r$, and that doesn’t seem to get you anywhere. Instead, try appealing to
Pythagoras.
98. What if you have a curved mirror, i.e., a mirror whose surface is not given by a straight line
(as in the above problem) but whose surface is given by some function $f(x)$? Can you prove
that Fermat’s principle still holds?
99. Two sources of heat are placed s meters apart: a source of intensity a at A and a source
of intensity b at B. The intensity of heat at a point P on the line between A and B is given
by the formula:
$I = \frac{a}{x^2} + \frac{b}{(s - x)^2}$
If you plan on sticking your hand in between the two heat sources, where should you put it?
100. Your bus company offers charter trips to the Adirondack Museum in Blue Mountain Lake
at a fare of $37 per person if 16 to 35 passengers sign up. You don’t charter trips for fewer
than 16 passengers. The bus has 48 seats. If more than 35 passengers sign up, the fare for
every passenger is reduced by 50 cents for each passenger in excess of 35 who signs up.
Determine the number of passengers that maximizes your revenue.
101. You are driving a truck on a freeway at a constant speed of x miles per hour. The maximum
speed on the freeway is 55 mph, and the minimum speed is 35 mph. Assume that fuel costs
$1.35 per gallon (how old is this textbook!?!), and is consumed at a rate of $2 + \frac{1}{600}x^2$ gallons
per hour. Given that the driver is paid $13 per hour, at what speed should the truck be driven
to minimize the truck owner’s expenses?
Write Your Own Adventure
Imagine this scenario: you’re a student at an academically-demanding school in the northern Sonoran
desert, and you’ve been doing word problems with derivatives for the last month or two, except the word
problems are really rather bland, and you’re good enough at the fundamental mathematics that you can
do all of them without much trouble. They’re so boring and so tedious that whenever you sit down to do
them, you always end up zoning out. You sit at your desk, idly tapping your pencil and staring at the
wall, all while wishing: if only this stuff could be harder and more interesting!
This is not a hypothetical situation! This is, in fact, your final project for this class: I want you to
write and solve a word problem. It might sound easy, but in many ways it is quite a bit harder than just
solving a word problem. Often in the 12th grade class I have spent several days designing a single word
problem (and I can do most of the symbol-manipulation in my head). The math in your problem needs
to be all correct. But that’s just the basics. When you write a Humane Letters essay, the English needs
to be sensible and syntactical. You have plenty of time at home to check your algebra and calculus. You
shouldn’t get any of it wrong. What I’m really looking for is for you to do two things: one, create a reality
that induces a challenging, non-trivial math problem, and two, solve it. Put differently, I want you to
write and solve a very creative word problem that requires difficult mathematics.
As examples, consider the following four word problems:
Socrates walks into a coffee shop and the barista asks him what 9 + 1 is.
This is terrible. The math is entirely deus ex machina—it is not remotely related to Socrates or a coffee
shop. Plus, it’s utterly simple—no one taking calculus should have the slightest bit of difficulty adding
two one-digit numbers.
Socrates and Plato walk into a coffee shop. Socrates orders nine shots of espresso and Plato orders one
shot of espresso. How many shots of espresso do they order in total?
This is marginally better. The math is actually integrated into the narrative, but still, 9 + 1? The math
needs to be challenging.
Socrates walks into a coffee shop. The barista asks him to find the volume of the solid generated by
√
revolving the functions y = 3 x + 4 − 2x and y = 3 sin(2x4 ) about the line y = x.
OK, so here we have math that is actually substantive. This is a difficult problem. For one thing, the
functions are complicated, which will probably make the algebra unpleasant. But more importantly, how
do we revolve a shape around a slanted line? We know how to revolve shapes around axes or lines parallel
to the axes—but how do we deal with a slanted line?1 Anyway, the math in this problem is good, but it’s
still totally unrelated to Socrates’s coffee break—and as a result, fails as a word problem.
Socrates walks into a coffee shop... but then the barista tells him that the Sophists are hot on his tail and
he needs to flee Athens. So Socrates runs outside and jumps into his Toyota Corolla—maybe not the
best getaway car, but sensible and affordable. After all, who is Socrates trying to impress? He’d rather
donate the profits from his academy to charity. Anyway, as he’s racing down the street, sending money-
changers and street urchins flying, he looks in his rear-view mirror and sees three F-22 fighters from the
elite Sophist Air Superiority Arm in pursuit. One of them fires a Hellfire missile. At the moment the
missile is fired, Socrates is driving 15 mph, and his car can accelerate at 14 meters per second per second
($m/s^2$) up to a top speed of 75 mph. The Hellfire, on the other hand, can accelerate at 40 $m/s^2$ up to a
top speed of Mach 1.3. Question: when does the vial of hemlock Socrates has stored in his wallet become
irrelevant?
1
Go online and look up “rotation matrix” if you’re curious—it will look really scary, because it pretends to be a matrix,
but it’s really nothing more than a messy formula. You can understand it without thinking of it as a matrix, and if you had
enough time on your hands, you could probably come up with it by yourself.
This is good. The math is difficult, and it is clearly related to the very vividly written story. (Whether
the story is an accurate interpretation of Socrates’s teachings or death remains an open question.)
I hope this gives you an idea of the kind of problem I’m looking for. There are plenty more examples
in the problems I’ve given you this year and the problems in your book. I want you to challenge yourself
and produce something you’re proud of. This is a way of pulling together everything you’ve learned this
year into a unified whole.
Your word problem should involve calculus, which means that it should probably be a related rates
problem and/or an optimization problem. One especially cool thing you might try, if you’re ambitious,
would be to combine the two—write some sort of a problem that involves finding a maximum/minimum
rate.
When you turn it in, be sure it is written up neatly (typed is best), and that your solution to the
problem is clear, carefully-explained, and complete (so that even your younger siblings can understand
it)2 . I’m giving you this assignment quite a while before it’s due, so you are welcome to turn it in early.
(Do not wait until the night before to start—you will regret it.)
Due: May 2nd
2
As a reference/guide, you might want to read the article “How to Write Math in Paragraph Style,” by Tim Hsu, online
at http://www.math.sjsu.edu/~hsu/
Calculus Presentations
In this class we are not just learning math—we are also learning how to teach math. These
are two very different things. Math may be an object of contemplation, but math teaching requires
practical, not theoretical, wisdom. So we need to practice. These presentations are an opportunity
for you to take charge and teach the class. Each night when I assign homework, I’ll assign two of
you problems to present in class (see schedule on reverse); the next day, you’ll come in and spend
ten minutes or so at the board going over the problem, just as if I were going over it (except better).
Teach us how to solve the problem! You should set up, explain, and work through the problem
as clearly as possible, as if we had just seen that type of problem for the first time. Write on the
board, use algebra, calculus, graphs, words, color-code, come up with clever metaphors, call on
people to help with steps, bring in visual aids—do everything that you think is necessary to fully
explain how to solve the problem.
You won’t know what problem you’ll be presenting until the day before, due to some strategic
decisions about long-term planning on my part, but I don’t mean that to be a hindrance. Most
of the problems I’ll be assigning for you to present will be on topics that we’ve already spent some
time covering, or that are review from last quarter, so that you can focus your efforts on explaining
the math rather than understanding it. (This is not to say that explaining and understanding are
disjoint, but that’s a much longer conversation...) Come talk to me if you are uncertain about
how to work out the math for your problem, or even if you just want to practice explaining your
solution—I’m happy to help. I really do want you to do well. The point isn’t to show off how
good at math you are, or how you can antidifferentiate a factored fifth-degree polynomial in your
head—the point is to show off how good you are at teaching us how we can do the math.
Some Themes in 11th Grade Math
Abstraction and generality. We know how to find the sine/cosine/tangent of angles between 0 and
90 degrees—but can we find the sine/cosine/tangent of negative angles? what about angles greater than
90°? We know the Pythagorean theorem for right triangles; can we extend it to any triangle? We know
how to deal with functions of the form $ax^n$ where n is a natural number (i.e., polynomials); what if n
is any integer (and thus $ax^n$ a rational function)? what if n is a rational number (and thus $ax^n$ a root)?
We know how to find the slope of a straight line—can we find the slope of a curvy line? If we have a
function, we know how to take its first/second/third/etc. derivative—but what would it mean to take its
−1st derivative? its −2nd derivative? what about its three-and-a-halfth derivative?
Formalism. We have a rough idea of how to sketch the slope of a curvy line—but how do we find the
equation of the slope? We have some idea of what we mean by “limit”—but how can we state this idea
in a logical, mathematical way? We suspect that the derivative of a function $x^n$ is $nx^{n-1}$; can we prove
this? We know how to find the area of circles, rectangles, and triangles—but what do these “areas” have
in common? What is the fundamental idea of “area” beneath each of these particular instances? Can we
express it mathematically? We have intuitive concepts in our minds (math, morals, truth) and we want
to formalize them (into equations, laws, syllogisms). When the intuitive and the formal versions are in
conflict, which one is right?
Indirect knowledge. It’s hard to factor polynomials, but it’s easy to check that our factorization
works—just multiply it out. Does it make sense to define things “non-constructively”? Does it make sense
to define a limit (using εs and δs) in a way that tells us whether a given number is a limit, but not what
a limit actually is? Is there a formula for antidifferentiation, or is an antiderivative of a function just
something that, when you differentiate it, gives you the function?
Mathematical Biography: Volume II
At the beginning of last year, I asked you to write a “mathematical biography,” so that you
could introduce yourself to me as a mathematician and as a person. This year, that isn’t necessary,
since I already know you as a student. But—given that you are halfway done with calculus, and
six-sevenths done with Veritas—I would like some sort of state of your mathematical union: is your
soul divided or unified with respect to calculus?
• What do you think you did well with last year in calculus? What areas do you think you
need to improve in? This could be a specific topic (e.g., “I’m still fuzzy about fractions, but
I can differentiate like a demon”), or it could be more general habit (e.g., “I put a lot of
effort into my papers, but I need to be more conscientious about doing day-to-day homework
assignments”).
• What are you excited about for this year? What mathematical topics do you want to learn?
Likewise, what are you least excited about?
• You exist as a person outside of the calculus classroom. Tell me something interesting you
did during our hiatus, and tell me what you want to do next year (college? something else?
where? for what? why?)
Your biography will be graded on effort; as long as it is complete and thoughtful, you will receive
full credit. (Your biography will be kept confidential.) Due:
(rev. 8/2010)
What We Did Last Year
These problems should be fairly self-explanatory. Please write up your solutions/answers for the last
five problems (8–12) on a separate sheet and hand them in.
1. Solve for x (i.e., find all values of x that make this equation true):
$$0 = 10\alpha x^{29} - 40g\alpha x^{27} - 5yx^2\alpha^2 + 20\alpha^2 gy$$
2. Sketch $y = \dfrac{x(x - 5)(x + 2)}{3(x + 1)(x + 2)}$
3. Sketch $f(x) = x^2(x + 1)(x - 20)(x + 5)^3(2x + 4)$
4. Evaluate: sin(π/3), cos(π/4), tan(5π/4), tan(5π/6).
5. Find $\frac{dy}{dx}$, given that $y = x^2 + x^{74} - \ln(x) - \log_3(x) + 51^x - e^x + \sin(x) - \cos(x)$
6. Find $\frac{dy}{dx}$, given that $y = x(x^2 + 2) - \sin(x^4 - x^{90}) + e^{\sin(x)} + \ln(\cos(x^2))$
7. Find $\frac{dy}{dx}$, given that $y = \dfrac{x^5 + x^{25}}{\sin(x)} + x^5\sin(x) + x^3\sin(x)e^{5x}$
8. A pig and a porcupine, connected by a spring, are perambulating away from a pine tree at
perpendicular angles to each other. The pig is waddling at 5m/s and the porcupine is strolling at 3m/s.
After they’ve been fleeing the tree for twenty seconds, how quickly is the length of the spring changing?
(Assume that, despite the spring force getting stronger and stronger, they keep constant velocities and
trajectories. A much, much cooler version of this problem would have it entangled with Hooke’s law, which
is, after all, just a basic differential equation: force = −(some constant) · displacement, and since force = mass
· $\frac{dv}{dt}$ (where v is velocity), we have $m\frac{dv}{dt} = -kx$ (where x is how far the spring is stretched). I guess we’d need to know the masses of the animals.
Anyway, I am totally excited about writing a problem like this sometime in the future.)
9. As the deputy director for operations of the Central Arizona Project, you are trying to design
a new canal from Tempe to Tucson. The canal will be above ground, for some reason, and built using
prefabricated sheets of an advanced polymer composite imported from Norway. The polymer sheets are
one foot wide, and come in sections 39 feet long (to fit in standard shipping containers). Sensibly enough,
you’ll assemble them into a quasi-U shape, with one section as the bottom of the canal, and two sections
as sides. Since you want to maximize the volume of delicious, life-providing water that this canal carries,
what should the angle between the two side pieces and the bottom piece be? (At some point in the answer,
the periodic trig identities may be helpful.)
10. Explain the significance of the equation $\lim_{h \to 0} \dfrac{f(x + h) - f(x)}{h}$. Where does it come from?
11. What is a limit? (Write a paragraph.)
12. What is an antiderivative? Discuss.
Integration
For all the formulas and the formalism, despite the rigors of the difference quotient and the difficulties
of the quotient rule, calculus is about just two ideas: slopes and areas. These are two ideas that seem
to have nothing in common with each other. And yet—as Newton and Leibniz discovered—these two
operations are not only related, they are the same operation. Or, rather, they are inverses of each other:
if we have a function, and we take its derivative, we get its slope. If we have a function and we take an
antiderivative, we get its area. It makes sense that “slope” and “derivative” should be equivalent, since
that was our very purpose in coming up with the formal idea of the derivative, but Newton and Leibniz’s
remarkable realization was that area is just an antiderivative.
Let me try to show you why. Imagine we have some function f (x):
And imagine that the area beneath this function, from the origin out to some point x, is given by the
function A(x):
Obviously, we don’t yet know what A(x) is; our goal is to find some sort of formula for it. So bear with me.
What if I want to find not the area underneath f(x) from 0 to x, but just the area of a little sliver, from
x to a point, say, h units beyond it, x + h?
We can actually come up with a formula for this using our function A(x). We know that the area from 0
to x is A(x). But x could be anything—it could be 5, it could be 6, it could even be x + h. So the area
from 0 to x + h must be A(x + h). But we don’t want to find the area from 0 to x + h—we want to find
the area from x to x + h. So all I need to do is take the area from 0 to x + h, and subtract that big chunk
I don’t want—subtract the area of the region from 0 to x.
So then the area of my little sliver is A(x + h) − A(x).
But there’s another way that I can find this area. Namely: this is a little sliver, not a giant plank. h is
pretty small. So then the region from x to x + h is probably pretty close to a rectangle. Sure, there’s that
curvy bit at the top, but because h is reasonably small, that doesn’t make that much of a difference.
But we already know how to find the area of a rectangle: width times height. The width of this little
sliver is h, and the height is approximately f (x):
So then the area must be roughly f (x) · h. But then I have two different ways of writing the area of this
sliver: the area is A(x + h) − A(x), and the area is also (approximately) f (x) · h. So I must have:
approximate area ≈ exact area

f (x) · h ≈ A(x + h) − A(x)
Or just:
$$f(x) \approx \frac{A(x + h) - A(x)}{h}$$
Moreover, as h gets smaller and smaller, this approximation gets better and better—as h goes to zero,
these two things become equal:
$$f(x) = \lim_{h \to 0} \frac{A(x + h) - A(x)}{h}$$
(One of the things I dislike about writing, as opposed to teaching in person, is that it’s harder to toy with
the timing. Because what I really want you to do right now is stare at that equation and let it sink in and
realize what just happened.)
This looks horrifyingly familiar. We tried to find the area of this shape—and we ended up with
Fermat’s difference quotient. What this is telling us is that if we take the derivative of this equation for
the area, A(x), we get the equation for the curve, f (x). Conversely, if we had the equation for the curve,
and took an antiderivative, we’d get the equation for the area. Or: the area beneath a curve is just the
antiderivative of the curve.
This is startling.
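If you'd like to watch this happen with actual numbers, here is a small Python sketch. The choices in it are mine, purely for illustration: the curve is f(x) = x^2, we look at x = 2, and the area function A(x) is estimated by brute force (adding up lots of thin rectangles) rather than by any formula. It is a sanity check, not a proof.

```python
# Assumptions for illustration only: the curve is f(x) = x^2 and we look at x = 2.
def f(x):
    return x ** 2

def area(x, boxes=20_000):
    """Estimate A(x), the area under f from 0 to x, by adding up many thin rectangles."""
    width = x / boxes
    # Each rectangle's height is read off the curve at the middle of its base.
    return sum(f((k + 0.5) * width) for k in range(boxes)) * width

def sliver_quotient(x, h):
    """(A(x + h) - A(x)) / h: the area of the sliver divided by its width."""
    return (area(x + h) - area(x)) / h

# As h shrinks, the quotient should creep toward f(2) = 4.
for h in (1.0, 0.1, 0.01):
    print(f"h = {h}: quotient = {sliver_quotient(2.0, h):.4f}, f(2) = {f(2.0)}")
```

The smaller h gets, the closer the printed quotient should sit to the height of the curve at x = 2.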
Somewhat More Formally

That was a somewhat informal exploration. Let’s see if we can develop this idea further. First of all,
let’s call this function for the area an “integral,” and define it thusly:
$$\int_a^b f(x)\,dx = \text{the area between } f(x) \text{ and the x-axis, between } a \text{ and } b$$
So, if we do this, we are really just creating a fancy notation for a very simple idea: the idea of
area. One of the consequences of this is that there are a bunch of integrals we already know. For
instance, what if you want to find $\int_0^{12} 5\,dx$? You can do this without any fancy techniques! You know
that the function 5 is just a horizontal line at y = 5, so if you want to find the area between y = 5, the
x-axis, and the x-coordinates 0 and 12, all you need to do is find the area of a rectangle with width 12 and
height 5:
So we have:
$$\int_0^{12} 5\,dx = 12 \cdot 5 = 60$$
In the homework, I’ll ask you to generalize this a little: what if you want to find the integral from a to b
of the function f (x) = k?
This is not at all interesting; all we’ve done is apply a fancy symbolism to stuff you’ve known for
years. You could go into an elementary school classroom, talk about “integrals” and use the ∫ symbol,
and even though you’d be doing absolutely nothing different, it’d seem completely incoherent and foreign
and complicated... even though the idea is very simple.
Here’s another example: what if we have a sloped line, like f (x) = 3x + 2? What if we want to find
$\int_0^5 3x + 2\,dx$?
This is not hard, either: I just break it up into a triangle and a rectangle. (Or use a formula for the area
of a trapezoid.) If I find the coordinates of all the relevant points, I have something like:
And then I can find the width and the height of each shape:
So I have a triangle with width 5 and height 17 − 2 = 15, so it has area $\frac{1}{2} \cdot 5 \cdot 15 = 37.5$. And I have a rectangle with
width 5 and height 2, so it has area 5 · 2 = 10.
So then the total area of my shape is 37.5 + 10 = 47.5. Put differently:

$$\int_0^5 3x + 2\,dx = 47.5$$
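Here is the same bookkeeping as a short Python sketch, in case you want to double-check the arithmetic or try other lines. The function name and its parameters are made up for this illustration, and it assumes a line with positive slope and positive intercept, so that the region really is a rectangle with a triangle sitting on top.

```python
def area_under_line(slope, intercept, width):
    """Area under y = slope*x + intercept from x = 0 to x = width, found two ways.

    Assumes slope > 0 and intercept > 0, so the region is a rectangle
    with a right triangle stacked on top of it.
    """
    height_left = intercept                    # height of the line at x = 0
    height_right = slope * width + intercept   # height of the line at x = width
    rectangle = width * height_left
    triangle = 0.5 * width * (height_right - height_left)
    # Cross-check with the trapezoid formula: average of the two heights, times the width.
    trapezoid = 0.5 * (height_left + height_right) * width
    return rectangle, triangle, trapezoid

rectangle, triangle, trapezoid = area_under_line(3, 2, 5)
print(rectangle, triangle, rectangle + triangle, trapezoid)
```

For the line 3x + 2 out to x = 5, the rectangle-plus-triangle total and the trapezoid formula should agree.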
Problems
Using various non-fancy techniques, evaluate each of the following fancy expressions:
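1. $\int_0^3 5\,dx$
2. $\int_0^b k\,dx$
3. $\int_a^b k\,dx$
4. $\int_5^{239} 7\,dx$
5. $\int_0^b x\,dx$
6. $\int_0^b x + 6\,dx$
7. $\int_0^b x + \beta\,dx$
8. $\int_0^b \alpha x + \beta\,dx$
9. $\int_a^b \alpha x + \beta\,dx$
10. $\int_{-4}^{4} \sqrt{16 - x^2}\,dx$
11. $\int_{-r}^{r} \sqrt{r^2 - x^2}\,dx$
12. $\int_0^r \sqrt{r^2 - x^2}\,dx$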
Towards a Unified Theory of Integration

There’s a bit of an issue that we haven’t addressed so far. In the previous section, we worked out
simple integrals using formulas we already know for the areas of familiar shapes. What we’ve found is just
a bunch of ad hoc methods for calculating integrals. We’ve said, more or less, that if the function is a
straight line, we can find the integral by considering it as a rectangle; if the function looks like a triangle,
we can use our formula for the area of a triangle; if the function looks like a trapezoid... all we’ve done
is to make a correspondence between shapes of functions and shapes whose areas we already know how to
find. We’ve made up this fancy symbolism that is pure decoration: it has zero utility. (Except possibly
to confuse: why use words like “area” and letters like “A” when you could use words like “integral” and
symbols like $\int f(x)\,dx$?)
Before that, we said that integrals are the same thing as areas, and we demonstrated that integrals
are the same as antiderivatives by assuming that we had some function for the area beneath a curve. We
assumed that this “take the area under f (x) from 0 to x” function exists. We take this idea of “area” as
a primitive, atomic idea, one that can’t be further understood or broken down.
But... what is area, anyway? Does the concept of “area” make any sense? Well, that’s kind of a
stupid thing to say. Of course “area” makes sense! We’ve been talking about the areas of shapes our entire
lives! “Area” IS a primitive concept! It makes perfect sense! Except... well, when we came up with the
derivative, we had to spend a long time just talking about the idea of a derivative—we had to convince
ourselves that this concept of the “slope” of a curvy line does make sense, that individual points can have
slopes, and so forth. “Slope” isn’t an idea that makes as much intuitive sense as area.
So if area is such an obvious concept, what is it? What is “area,” anyway? Is it length times width?
It can’t be that, because that wouldn’t work for a circle. Is it “the space something takes up” or “the
space something takes up bounded by a perimeter” or “the two-dimensional space of a region bounded by
a perimeter that a shape takes up”? Those “definitions” don’t really give any clarity—they just use lots
of fancy words to hide the fact that we can’t really explain what area is.
Here’s what I mean. When we talk about derivatives—well, a derivative is a slope, right? Except
“slope” isn’t really a concept of algebra or arithmetic. And all of our calculus is just algebra and arithmetic.
It’s just a fancy set of ideas layered on top of algebra and arithmetic, a beautiful piece of architecture
constructed out of these previous mathematical concepts. But “slope” isn’t really a primitive concept of
algebra and arithmetic. Not in the way that things like “plus” and “five” and “equals” are. So when
we built the derivative, we had to come up with Fermat’s Difference Quotient as a way of translating our
intuitive idea of “slope” into the language of arithmetic and algebra.
It seems to be the same for area. “Area” isn’t a basic concept of algebra or arithmetic. But we seem
to have some intuitive idea of what “area” is. So can we translate that into arithmetic and algebra?
We know a bunch of specific areas:
• Area of a circle = $\pi r^2$
• Area of a rectangle = width · length
• Area of a triangle = $\frac{1}{2}$ · base · height
But surely we don’t want to define area on a case-by-case basis! Surely we don’t want to say, “Well, if
you have this shape, the area is this; if you have this shape, the area is this,” and so forth. Surely there
is some fundamental, singular concept of AREA beneath all of these specific formulas! Can we come up
with a single, overarching definition of an area? One method that will always work? One rule to win every
time? As the heirs to Aristotle, Aquinas, and Russell, can we at last realize this Platonic dream of fully
ordering the universe?
Allow me to make a suggestion. First of all, we don’t have to come up with an equation per se, at
least not yet. We could come up with a procedure (an algorithm) instead. So what if we did this: what if
we take a blobby shape like this:
And draw a grid on it, like this:
And then fill in all the boxes that are completely inside of the shape:
Then we could say that the area of this blobular shape is 12 boxes.
Area = 12 boxes
Of course, that’s only an approximation, since we still have all of those boxes that are partly inside
the shape and partly outside. So what if we repeat this procedure, and draw a SMALLER grid inside
each of those boxes, and then color in the littler boxes? If we draw a grid that splits each box up into 9
sub-boxes:
And then fill in the sub-boxes:
We can fill in 72 of the little sub-boxes. Each sub-box is 1/9 of a full box, so the total area, then, of
our shape must be 12 + 72/9 boxes.
$$\text{Area} = 12 + \frac{72}{9} \text{ boxes} = 20 \text{ boxes}$$
But this is STILL just an approximation, because we still have sub-boxes that are partially inside and
partially outside of the shape. So let’s repeat our procedure: let’s split each sub-box up into a grid, so
that each sub-box contains four sub-sub-boxes, and then color those in:
$$\text{Area} = 12 + \frac{72}{9} + \frac{50}{36} \text{ boxes} \approx 21.389 \text{ boxes}$$
And we can just repeat this procedure, on and on, ad infinitum. And so let us define the area in
this way: area is what the number of boxes approaches as we repeat this procedure more and more times.
If we define area like this, our definition will work for any shape—triangle, circle, rectangle, blob—and,
presumably, it will give us the same answer as our intuitive concept of “area” does.
With Functions
Of course, our project is slightly more specific: we want to find the area of not just any shape, but
the area beneath a function. So allow me to suggest a slightly simplified version of this “split it up into
boxes” definition/procedure for finding area.
Allow me to suggest this as a procedure: rectangles. When we came up with our super-Pythagorean
theorem, we constructed it by using the old Pythagorean theorem. When we came up with our formula for
the slope of a curvy line (the derivative), we constructed it using the slope of a straight line. We were able
to build on our particular knowledge in order to understand the general case. Let’s do that here. What’s
the simplest shape to find the area of? A rectangle! Its area is just base times height! So—in the same
way that in constructing the derivative we considered every function to consist of an infinite number of
infinitely-short straight lines—what if we think of every region beneath a function to consist of an infinite
number of infinitely-narrow rectangles? Then we could just add up the area of all these rectangles, and
we’d have the total area. (An infinitely-large number of infinitely-small things! The continued clash of
infinities!!!)
This is a rather ambitious project, so let’s start somewhere simpler. What if we want to just approxi-
mate the area underneath a curve? We could do it with a finite number of rectangles. We could just make
some boxes, fit them beneath our curve, and then add up their areas.
For instance, let’s say I want to take a gander at what the area beneath $x^2$ from 0 to 4 is. I’ll split
the area into four boxes, each one unit wide, and I’ll draw the boxes so that their tops hit $x^2$ on the right
side, like so:
So then, if I work out the heights of the boxes, I get something like:
And so, all told I’ll get, as my area:
$$\begin{aligned}
\text{area underneath } x^2 \text{ from 0 to 4} &\approx 1 \cdot 1 + 1 \cdot 4 + 1 \cdot 9 + 1 \cdot 16 \\
&\approx 30
\end{aligned}$$
Obviously this isn’t the exact area¹, but it’s probably pretty close. If anything, our estimate is probably
a slight overestimate, since we’ve got those extra curvy-triangles of area on top.
So maybe I could try again, but instead of drawing the boxes such that their tops hit $x^2$ on the right
side, I could draw them so their tops hit $x^2$ on the left side, like so:
And then I get
$$\begin{aligned}
\text{area underneath } x^2 \text{ from 0 to 4} &\approx 1 \cdot 0 + 1 \cdot 1 + 1 \cdot 4 + 1 \cdot 9 \\
&\approx 14
\end{aligned}$$
So this different approximation tells me that the area is about 14. I guess I can draw a couple of conclusions:
1
because the exact area is 21 1/3.
1. This second approximation is an underestimate, since my boxes are all slightly below $x^2$. Thus, the
actual area is probably somewhere between 14 and 30.
2. That’s a big range for an estimate: 14 and 30 differ by more than a factor of two. It would be nice to have a
better idea of the actual area. One way to do this would be to use more boxes! Instead of 4, why
not 5? or 5,000,000? Or ∞?
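Using more boxes by hand gets old fast, so here is a minimal Python sketch of the same procedure. The function name and its `side` parameter are my own inventions for illustration; the boxes all have equal width, and the heights are read off either the left or the right edge of each box. With four boxes on $x^2$ from 0 to 4, the left-hand sum is 0 + 1 + 4 + 9 = 14 and the right-hand sum is 1 + 4 + 9 + 16 = 30.

```python
def riemann_sum(f, a, b, boxes, side="right"):
    """Add up equal-width boxes under f from a to b, heights taken on the given side."""
    width = (b - a) / boxes
    offset = 1 if side == "right" else 0   # right edge of box k is one width past its left edge
    return sum(f(a + (k + offset) * width) * width for k in range(boxes))

def square(x):
    return x ** 2

# More boxes: the underestimate and the overestimate squeeze together.
for boxes in (4, 5, 50, 5000):
    low = riemann_sum(square, 0, 4, boxes, side="left")
    high = riemann_sum(square, 0, 4, boxes, side="right")
    print(f"{boxes} boxes: between {low:.4f} and {high:.4f}")
```

Since $x^2$ only goes uphill between 0 and 4, the left-hand sum is always the underestimate and the right-hand sum the overestimate, so the true area is trapped between the two printed numbers.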
Quick vocabulary note: this type of procedure is called a Riemann sum, after Bernhard Riemann (1826–1866).
Riemann sums are the Legos that we will construct our integral out of: we will (in a minute)
formally define the integral as being an infinite Riemann sum. There are all sorts of different ways we
could make Riemann sums. We could make them (as in the previous example) with boxes of the same
width, with the heights drawn on the left or right sides:
[Figure: a Riemann sum with six equal-width partitions, with box heights drawn on the left endpoints]
[Figure: a Riemann sum with six equal-width partitions, with box heights drawn on the right endpoints]
Or we could make them with the heights drawn in the middle, or three-quarters of the way to the left, or
with the heights drawn anywhere within the box. I could draw the heights at the highest point in the box,
or the lowest point (known, respectively, as an upper sum and a lower sum):
[Figure: an upper sum with six partitions]
[Figure: a lower sum with six partitions]
(These types of Riemann sums—upper and lower sums—will be crucial in our proof of the Fundamental
Theorem of Calculus.) Or I could make a bunch of boxes of random width, with the height drawn at some
random point:
The point is, it doesn’t actually matter how we draw our boxes, because the more boxes we have, the
closer the Riemann sum will be to the actual area. Always. As the number of boxes goes to ∞,
the Riemann sum approximation will approach the actual area.
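As an experiment (not a proof), here is a Python sketch of the sloppiest possible Riemann sum: boxes of random width, with each height measured at a random spot inside its box. The function name, the fixed seed, and the choice of $x^2$ from 0 to 4 are all assumptions for illustration. One quiet condition is hiding in the word "more": the boxes all have to keep getting narrower, which does happen here because the random cut points crowd together as we add more of them.

```python
import random

def random_riemann_sum(f, a, b, boxes, rng):
    """Riemann sum with random box widths and a random sample point inside each box."""
    # Choose boxes - 1 random cut points strictly between the two ends.
    cuts = sorted(rng.uniform(a, b) for _ in range(boxes - 1))
    edges = [a] + cuts + [b]
    total = 0.0
    for left, right in zip(edges, edges[1:]):
        height = f(rng.uniform(left, right))   # height measured somewhere inside the box
        total += height * (right - left)       # area of this box = height * width
    return total

def square(x):
    return x ** 2

rng = random.Random(2010)   # fixed seed so the run is repeatable
for boxes in (4, 40, 400, 4000):
    print(boxes, random_riemann_sum(square, 0, 4, boxes, rng))
```

The four-box answers jump around a lot; by a few thousand boxes the sums should be hugging 21 1/3.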
Problems
For each of the following problems, estimate the given area using each of the following methods. For each
estimation, sketch the function and the relevant Riemann boxes. (I know these questions are tedious, but
do them anyway.)
(a) A Riemann sum with three partitions, and box heights drawn on the left endpoints,
(b) a Riemann sum with three partitions, and box heights drawn on the right endpoints,
(c) a Riemann sum with three partitions, and box heights drawn in the center of each box,
(d) a Riemann sum with six partitions, and box heights drawn on the left endpoints,
(e) a Riemann sum with six partitions, and box heights drawn on the right endpoints,
(f) a Riemann sum with six partitions, and box heights drawn in the center of each box,
(g) an upper sum with three partitions,
(h) a lower sum with three partitions,
(i) an upper sum with six partitions,
(j) a lower sum with six partitions,
(k) a Riemann sum with any number of partitions and box heights chosen in absolutely any way you want,
(l) and by asking a younger sibling. (If you don’t have a younger sibling, ask a friend if you can borrow theirs.)
134. The area underneath the function $f(x) = x^2$ between 0 and 3.
135. The area underneath $f(x) = x^3 + 8$ between −2 and 4.
136. The area underneath the function $f(x) = e^x$ between 0 and 6.
137. The area underneath the function $f(x) = \sqrt{x}$ between 2 and 8.
Infinite Riemann Sums
We’ve done Riemann sums with a finite number of boxes. But you know what’d be awesome? An
INFINITE number of boxes, all INFINITELY narrow! Then we’d find not just the approximate area—but
the EXACT area!!!
Let’s do this with our old curvy friend, $x^2$. What if I want to estimate the area underneath $x^2$ (from,
say, 0 out to some number b) using a Riemann sum? And what if I want to do this not using a Riemann
sum with a finite number of boxes, but with an infinite number? I guess that in order to do this, I’d need
to involve a limit somehow: I’d have to come up with a formula for a Riemann sum of $x^2$ with n boxes,
and then take a limit as n → ∞. So let’s see if we can do this. Let’s imagine that I cut my area into boxes
at the points $x_0, x_1, x_2, x_3$, and so on, all the way up to $x_n$.
Note, incidentally, that because my area goes from 0 to b, $x_0 = 0$ and $x_n = b$. Also, for convenience, let’s
assume that each of these boxes is the same width. In that case, we know that, since there are n boxes
fitting into the space from 0 to b (a space b units wide), each box must be b/n wide.
We don’t have boxes yet—we just have the bases of boxes. (The foundations of skyscrapers, but no
superstructure yet!) So let’s draw these boxes. We could, of course, draw their heights anywhere; for no
particular reason2 let’s draw them with their heights on the right side:
So let’s see if we can figure out the heights of each of these boxes. This function is $x^2$, so to find the height,
all I need to do is square the right-hand coordinate of the box. For instance, the first box must have height
$(x_1)^2$, the second box must have height $(x_2)^2$, and so on.
Except we can simplify this a bit. We already know that each box is b/n wide. But this means that $x_1$
(the right-hand side of the first box) must be at b/n. $x_2$ must be at 2 · b/n. $x_3$ must be at 3 · b/n. And so on.
2
other than that I’ve done this problem before, and this choice makes the algebra work out easiest
• So really, the height of the first box is: $(x_1)^2 = \left(\frac{b}{n}\right)^2 = \frac{b^2}{n^2}$
• The height of the second box is: $(x_2)^2 = \left(\frac{2 \cdot b}{n}\right)^2 = \frac{2^2 b^2}{n^2}$
• The height of the third box is: $(x_3)^2 = \left(\frac{3 \cdot b}{n}\right)^2 = \frac{3^2 b^2}{n^2}$
• And so forth. Put differently, the height of the kth box will be: $\frac{k^2 b^2}{n^2}$
Moreover, the width of the kth box will be b/n (all the boxes are b/n wide). So then the area of the kth
box will be:
$$\begin{aligned}
\text{area of kth box} &= \text{width} \cdot \text{height} \\
&= \frac{b}{n} \cdot \frac{k^2 b^2}{n^2} \\
&= \frac{k^2 b^3}{n^3}
\end{aligned}$$
So if I want to find the area of all these n boxes added together, I have something like:
$$\text{area of all the boxes} = \sum_{k=1}^{n} \frac{k^2 b^3}{n^3}$$
and since b and n are constant w.r.t. k, I can pull them out of the sum:
$$= \frac{b^3}{n^3} \sum_{k=1}^{n} k^2$$
So this gives us a formula for the area of a Riemann sum with n boxes, beneath $x^2$ from 0 to b. However,
to turn this into not just an approximation of the exact area but the exact area itself, we’ll need to take a
limit as n → ∞.
$$\text{exact area} = \lim_{n \to \infty} \left[ \frac{b^3}{n^3} \sum_{k=1}^{n} k^2 \right]$$
But we still have this issue of the $\sum k^2$. This isn’t a particularly satisfying answer, since we haven’t
worked out the limit. And it’s not clear how to deal with the $\sum k^2$. Presumably, were we to work out that
sum somehow, there’d be an n in it (it’s the sum from k = 1 to k = n; it’s the sum of the first n squares).
So allow me to suggest that we refer back to a fun formula from algebra. You may or may not remember
this, but if you want to add up the first n squares ($1^2 + 2^2 + 3^2 + 4^2 + \cdots$), there’s an easy formula:
$$\sum_{k=1}^{n} k^2 = \frac{n(n + 1)(2n + 1)}{6}$$
So if we plug that in here, we get that the area of this n-boxed Riemann sum is:
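$$\begin{aligned}
\frac{b^3}{n^3} \sum_{k=1}^{n} k^2 &= \frac{b^3}{n^3} \cdot \frac{n(n + 1)(2n + 1)}{6} \\
&= \frac{b^3}{n^3} \cdot \frac{2n^3 + 3n^2 + n}{6} \\
&= \frac{b^3}{n^3} \cdot \left( \frac{2n^3}{6} + \frac{3n^2}{6} + \frac{n}{6} \right) \\
&= b^3 \cdot \left( \frac{2n^3}{6n^3} + \frac{3n^2}{6n^3} + \frac{n}{6n^3} \right) \\
&= b^3 \cdot \left( \frac{2}{6} + \frac{3}{6n} + \frac{1}{6n^2} \right) \\
&= b^3 \cdot \left( \frac{1}{3} + \frac{1}{2n} + \frac{1}{6n^2} \right)
\end{aligned}$$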
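And to find the exact area, we take a limit as n → ∞:
$$\begin{aligned}
\lim_{n \to \infty} b^3 \cdot \left( \frac{1}{3} + \frac{1}{2n} + \frac{1}{6n^2} \right) &= b^3 \lim_{n \to \infty} \left( \frac{1}{3} + \frac{1}{2n} + \frac{1}{6n^2} \right) \\
&= b^3 \cdot \frac{1}{3} \\
&= \frac{1}{3} b^3
\end{aligned}$$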
So the area underneath $x^2$ from 0 to b is just $\frac{1}{3} b^3$! Put differently:
$$\int_0^b x^2\,dx = \frac{1}{3} b^3$$
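If the algebra above went by too quickly, here is a small Python sketch that checks it with exact fractions. The function names are mine, made up for illustration, and the sketch assumes b and n are whole numbers so that the fractions stay exact. It adds up the n boxes the slow way, one at a time, and compares that with the simplified formula $b^3 \cdot (1/3 + 1/(2n) + 1/(6n^2))$.

```python
from fractions import Fraction

def boxed_area(b, n):
    """Right-endpoint Riemann sum for x^2 from 0 to b with n equal boxes, one box at a time.

    Assumes b and n are whole numbers, so every quantity is an exact fraction.
    """
    width = Fraction(b, n)
    # The kth box has its right edge at k * width, so its height is (k * width)^2.
    return sum((k * width) ** 2 * width for k in range(1, n + 1))

def simplified_area(b, n):
    """The same sum after the algebra: b^3 * (1/3 + 1/(2n) + 1/(6n^2))."""
    return Fraction(b) ** 3 * (Fraction(1, 3) + Fraction(1, 2 * n) + Fraction(1, 6 * n * n))

for n in (4, 10, 100, 1000):
    slow = boxed_area(4, n)
    fast = simplified_area(4, n)
    print(n, slow == fast, float(fast))
```

The two versions should match exactly for every n, and the decimal values should settle down toward 64/3 as n grows, since the 1/(2n) and 1/(6n^2) pieces shrink away.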
This is interesting and useful. It is useful because, for example, in our previous problem of finding the
area beneath $x^2$ from 0 to 4, we could just use this formula to find the exact area. In that case b = 4, so
as the exact area, we’d have:
$$\frac{1}{3} b^3 = \frac{1}{3} 4^3 = \frac{1}{3} \cdot 64 = 21 + \frac{1}{3}$$
It is interesting, because $\frac{1}{3} b^3$ looks an awful lot like the antiderivative of $x^2$. I mean, it has a b instead
of an x, but still...
Problems
1. We now have a formula for $\int_0^b x^2\,dx$. But what’s $\int_a^b x^2\,dx$? (Think about how the geometry works.)
2. Using a method analogous to the one above, find the exact area underneath the functions $x^3$ and $x^4$
from 0 out to b. That is, evaluate $\int_0^b x^3\,dx$ and $\int_0^b x^4\,dx$. The following formulas may help you:
$$\sum_{k=1}^{n} k^3 = \frac{1}{4} n^4 + \frac{1}{2} n^3 + \frac{1}{4} n^2 \quad \text{and} \quad \sum_{k=1}^{n} k^4 = \frac{1}{5} n^5 + \frac{1}{2} n^4 + \frac{1}{3} n^3 - \frac{1}{30} n$$
3. As in #1, use problem #2 and geometry to find formulas for $\int_a^b x^3\,dx$ and $\int_a^b x^4\,dx$.
The Integral, Formally

So, earlier we informally defined an integral as being the “area” between a curve and the x-axis. But
what’s an area? Does algebra know about areas? Does calculus know about areas? We might have asked
the same question about slopes and derivatives—we wanted to come up with an equation for the slope of
a function, but “slope” is not a concept of algebra and arithmetic (not in the same way that “addition” or
“five” are). So instead we defined the derivative formally as $\lim_{h \to 0} \frac{f(x + h) - f(x)}{h}$, with the understanding
that this captures, algebraically, our intuitive concept of “slope.” We want to do something similar for an
integral: using only our well-defined concepts of arithmetic, algebra, and calculus, we want to come up
with an equation that is more-or-less equivalent to our intuitive notion of “area”3 .
Now that we have added Riemann sums to our arsenal, we can define an integral4 formally. Here we
3
Of course, one of the issues here is that intuitive concepts are, by definition, not particularly well-defined, and so formalizing
them is, to some degree, subjective. This is the perennial issue in the quantitative social sciences: if I want to measure
something (GDP, happiness, etc.), how do I measure it? what quantitative metric best captures my intuitive understanding
of the concept? what if my intuitive understanding is not exactly the same as everyone else’s? Pick a subject at random
and follow the debates: the arguments are overwhelmingly about context rather than content: “I think that’s a bad way of
measuring economic inequality,” not “I think you made those numbers up.” Wittgenstein: “The existence of the experimental
method makes us think we have the means of solving the problems which trouble us; though problem and method pass one
another by” (PI 2.XIV).
4
or, at least, a Riemann integral—one of the trippiest things about integration is that there are all sorts of different
integrals that one can define, some of which work in some situations but not others. The usual next step, if you’re taking
a “real analysis” class, is to talk about measure theory so that you can define and understand the Lebesgue integral
(pronounced “la-bayg”).
go:
\[
\int_a^b f(x)\,dx = \lim_{n\to\infty}\left[\sum_{k=1}^{k=n} f(c_k)\,\Delta x_k\right]
\]
This looks scary, so let’s talk about each part bit-by-bit:
• $f(c_k)$ is the height of the $k$th box
  – and so $c_k$ is, like, the x-coordinate from which we want to measure the height,
• and $\Delta x_k$ is the width of the $k$th box,
  – i.e., the difference (hence the $\Delta$) between the x-coordinates of box $k$ and box $k-1$
  – i.e., $x_k - x_{k-1}$
• so, thus, $f(c_k)\,\Delta x_k$ = (height of the $k$th box)·(width of the $k$th box) = area of the $k$th box
• and we want to add up all $n$ of the boxes, so we take a $\sum$ from the first box to the $n$th box
• and we take a limit as n → ∞, because we want this to be a Riemann sum not with a finite number of
boxes—we don’t want to approximate the area—we want this to be a Riemann sum with an infinite
number of boxes—we want the exact area!
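If you want to watch the limit happen, here is a minimal sketch in Python. It assumes equal-width boxes and picks each c_k as the midpoint of its box; both are my choices for illustration, since the definition allows any c_k inside the box. The function name riemann_sum and the test function x^2 on [0, 3] are made up for this example.

```python
def riemann_sum(f, a, b, n):
    """Riemann sum of f on [a, b] using n equal-width boxes (midpoint heights)."""
    dx = (b - a) / n                      # width of every box, Delta x_k
    total = 0.0
    for k in range(1, n + 1):
        c_k = a + (k - 0.5) * dx          # x-coordinate where we measure the height
        total += f(c_k) * dx              # (height of kth box) * (width of kth box)
    return total


def square(x):
    return x * x


# More boxes should mean a better approximation of the exact area (which is 9 here).
for n in (1, 10, 100, 1000):
    print(n, riemann_sum(square, 0.0, 3.0, n))
```

As n grows, the printed sums should settle toward 9, which is what taking the limit as n → ∞ is meant to capture.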
This illustration may help a bit:
The Fundamental Theorem of Calculus

He computed with great fascination,
The Riemann sum approximation.
But he knew he could get,
The best answer yet,
With definite integration.
What’s weird is that we don’t directly use the definition; we sort of attack this from the side. The
definition was kind of just for fun.
The Fundamental Theorem of Calculus, Part I: $\frac{d}{dx}\int_0^x f(t)\,dt = f(x)$
This tells us, more or less, that an integral is an antiderivative, at least when we choose the bounds
correctly. It tells us that if we take a derivative of an integral, we get the function that was inside the
integral—that $\frac{d}{dx}$ and $\int$ cancel each other out.
(Of course, the caveat is that we have an integral going from some fixed constant (here, 0) to the variable
x. (And note that we’ve written the function on the inside of the integral as f (t), rather than f (x), to
avoid ambiguity between x’s.) This should highlight one of the superficial differences between integrals
and derivatives: when we find a derivative, we find the derivative at any point on a function; our derivative
is a function of x. But with an integral, it doesn’t really make sense to find the “area” at a single point:
we really do need two points between which to find the area.)
Proof: Imagine we have this integral:
\[
\int_0^x f(t)\,dt
\]
We’ll be using this a lot in the proof, so for convenience, let’s abbreviate it as $F(x)$:
\[
F(x) = \int_0^x f(t)\,dt
\]
Then: what if we consider:
\[
\int_0^{x+h} f(t)\,dt - \int_0^x f(t)\,dt
\]
Using one of our geometric properties of integrals, this must be just equal to
\[
\int_0^{x+h} f(t)\,dt - \int_0^x f(t)\,dt = \int_x^{x+h} f(t)\,dt
\]
Moreover, it must also be equal to $F(x+h) - F(x)$:
\[
\underbrace{\int_0^{x+h} f(t)\,dt}_{F(x+h)} - \underbrace{\int_0^x f(t)\,dt}_{F(x)} = F(x+h) - F(x)
\]
so, combining both of those two ideas:
\[
\int_x^{x+h} f(t)\,dt = F(x+h) - F(x)
\]
Now: let’s make a Riemann sum for this integral! In particular, let’s make upper and lower sums! But
let’s be boring, and make an upper sum that consists only of one box, and a lower sum that also consists
only of one box.
• Let’s say that $M_h$ is the maximum value of $f$ between $x$ and $x+h$
  – so then $M_h \cdot h$ is an upper sum for $f$ between $x$ and $x+h$. It’s not a very interesting upper sum—
    it’s just a single, giant box encompassing the entire area—but it’s an upper sum nonetheless.
• Likewise, let’s say that $m_h$ is the minimum value of $f$ between $x$ and $x+h$
  – then, by the same reasoning, $m_h \cdot h$ is a lower sum for $f$ between $x$ and $x+h$.
So we have an upper sum and a lower sum, and we know that the actual area beneath the function—the
actual integral—must lie between the upper sum and the lower sum:
lower sum ≤ actual area ≤ upper sum
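or just:
\[
m_h \cdot h \le \int_x^{x+h} f(t)\,dt \le M_h \cdot h
\]
or just (dividing through by $h$, which we take to be positive):
\[
m_h \le \frac{1}{h}\int_x^{x+h} f(t)\,dt \le M_h
\]
but we know that $\int_x^{x+h} f(t)\,dt$ is just a more complicated way of writing $F(x+h) - F(x)$:
\[
m_h \le \frac{1}{h}\left[F(x+h) - F(x)\right] \le M_h
\]
\[
m_h \le \frac{F(x+h) - F(x)}{h} \le M_h
\]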
AND if we make h really small....
\[
\lim_{h\to 0}\left[m_h\right] \le \lim_{h\to 0}\frac{F(x+h) - F(x)}{h} \le \lim_{h\to 0}\left[M_h\right]
\]
What happens to these three expressions as h → 0? If you think about it geometrically, as h gets
close to 0, then the area between x and x + h gets narrower and narrower. And (so long as f is continuous) whatever the maximum
value of f is between x and x + h gets closer to f (x)—there’s nowhere else left to go.
Likewise, as x + h approaches x, the minimum value of f gets closer to f (x) itself. If we had no
area—if h actually were zero—then there would be only ONE value of the function (between x and x + 0),
and so f (x) would be both the minimum and maximum value.
So we have:
• $\lim_{h\to 0}\left[M_h\right] = f(x)$, and
• $\lim_{h\to 0}\left[m_h\right] = f(x)$
If we think of this in terms of our inequality, we get:
\[
f(x) \le \lim_{h\to 0}\frac{F(x+h) - F(x)}{h} \le f(x)
\]

But if $\lim_{h\to 0}\frac{F(x+h) - F(x)}{h}$ is both less than or equal to $f(x)$, and greater than or equal to $f(x)$,
then it must be equal to $f(x)$! (This is the same argument (the “two policemen theorem”) we made in
proving that the derivative of sine is cosine: we have something sandwiched in the middle of an inequality,
and we use a limit to crush everything together.)
So we must have
\[
f(x) = \lim_{h\to 0}\frac{F(x+h) - F(x)}{h}
\]
Or, put differently:
\[
f(x) = \frac{d}{dx}\left[F(x)\right]
\]
Or, yet differently:
\[
f(x) = \frac{d}{dx}\int_0^x f(t)\,dt
\]
But if the derivative of $\int_0^x f(t)\,dt$ is $f(x)$, then that’s the same as saying that an antiderivative of $f(x)$ is
$\int_0^x f(t)\,dt$.
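Here is a rough numerical sanity check of Part I, offered as a sketch rather than as anything resembling a proof. It assumes f(t) = cos(t) (my choice for illustration), approximates F(x) with a midpoint Riemann sum, and then compares the difference quotient from the proof against f(x). The helper names integral, F, and difference_quotient are made up for this example.

```python
import math


def integral(f, a, b, n=20000):
    """Midpoint Riemann sum approximating the integral of f from a to b."""
    dx = (b - a) / n
    return sum(f(a + (k + 0.5) * dx) for k in range(n)) * dx


def F(x):
    """F(x) = integral of cos(t) dt from 0 to x, computed numerically."""
    return integral(math.cos, 0.0, x)


def difference_quotient(x, h=1e-3):
    """(F(x + h) - F(x)) / h, the quantity squeezed between m_h and M_h."""
    return (F(x + h) - F(x)) / h


# The two right-hand columns should be close to each other for small h.
for x in (0.0, 0.5, 1.0, 2.0):
    print(x, difference_quotient(x), math.cos(x))
```

The agreement is only approximate because h is small but not zero; shrinking h (within floating-point reason) should tighten it.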
The F.T.C., Part II: If $F(x)$ is any antiderivative of $f(x)$, then $\int_a^b f(x)\,dx = F(b) - F(a)$
Proof: From Part I, we already know that one antiderivative of $f(x)$ is
\[
F(x) = \int_0^x f(t)\,dt
\]
So then we must have:
\[
F(b) = \int_0^b f(t)\,dt
\]
and
\[
F(a) = \int_0^a f(t)\,dt
\]
But then, because of this geometric property of integrals that we’ve discussed so many times before, we
must have:
\[
\underbrace{\int_0^b f(t)\,dt}_{F(b)} - \underbrace{\int_0^a f(t)\,dt}_{F(a)} = \int_a^b f(t)\,dt = F(b) - F(a)
\]
Moreover, it doesn’t matter which antiderivative we choose. We already know that functions can have more
than one antiderivative, and that different antiderivatives of the same function only differ by a constant.
Imagine that both F (x) and G(x) are antiderivatives of f (x). Then, since they only differ by a constant,
we must have:
F (x) = G(x) + C
for some constant C. But then if we consider F (b) − F (a)...
F (b) − F (a) = [G(b) + C] − [G(a) + C]
= G(b) − G(a)
The constant goes away when we subtract, since plugging in a different value for x (either a or b) doesn’t
change it. And we get the same thing.
So, for instance, if we wanted to find the area underneath the function f (x) = cos(x) + 4 from 0 to 2π,
we’d just use this formula.
An antiderivative of f (x) = cos(x) + 4 is F (x) = sin(x) + 4x, and so if we plug 0 and 2π in, we get
\[
\begin{aligned}
\int_0^{2\pi} \left(\cos(x) + 4\right) dx &= \underbrace{(\sin(2\pi) + 4\cdot 2\pi)}_{F(b)} - \underbrace{(\sin(0) + 4\cdot 0)}_{F(a)} \\
&= (0 + 8\pi) - (0 + 0) \\
&= 8\pi
\end{aligned}
\]
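If you’d like to see Part II agree with a brute-force Riemann sum, here is a small sketch for this example. The second antiderivative G, with its arbitrary +17, is my own addition to illustrate the point above that the choice of antiderivative doesn’t matter; riemann_sum is a made-up helper using midpoint boxes.

```python
import math


def f(x):
    return math.cos(x) + 4


def F(x):
    return math.sin(x) + 4 * x            # one antiderivative of f


def G(x):
    return math.sin(x) + 4 * x + 17       # another antiderivative; 17 is an arbitrary constant


def riemann_sum(func, a, b, n=100000):
    """Midpoint Riemann sum of func on [a, b] with n equal-width boxes."""
    dx = (b - a) / n
    return sum(func(a + (k + 0.5) * dx) for k in range(n)) * dx


a, b = 0.0, 2 * math.pi
print(riemann_sum(f, a, b))   # the "giant mess of boxes" approach
print(F(b) - F(a))            # the FTC approach
print(G(b) - G(a))            # same thing with a different antiderivative
print(8 * math.pi)            # the value worked out by hand
```

All four printed numbers should match up to small floating-point and approximation error.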
But if integrals are really just antiderivatives... I guess that means we’re going to need to talk more
about antiderivatives.
Integrals as “Net Area”
Problems
1. Using Part II of the FTC, calculate $\int_5^{12} x^2\,dx$. Then calculate $\int_5^{12} -x^2\,dx$. Compare and contemplate your results. (Note that I am not asking you here to find a certain area—I am asking you to calculate an integral.)
2. Likewise, calculate both $\int_0^{\pi} \sin(x)\,dx$ and $\int_{\pi}^{2\pi} \sin(x)\,dx$. Then calculate $\int_0^{2\pi} \sin(x)\,dx$.
3. Make a conjecture about the relationship between $\int_a^b f(x)\,dx$ and $\int_a^b -f(x)\,dx$.
4. Then calculate both $\int_{12}^{5} x^2\,dx$ and $\int_{\pi}^{0} \sin(x)\,dx$. How are these different from the above problems? Make a conjecture about the relationship between $\int_a^b f(x)\,dx$ and $\int_b^a f(x)\,dx$.
There’s a fairly major caveat to integrals that we haven’t mentioned yet. Namely: in the
way that we’ve constructed them, integrals measure not “area,” but rather something more like net area.
Meaning: regions above the x-axis get counted as positive-area and regions below the x-axis get counted
as negative area, like so:
For example, $\int_0^{2\pi} \sin(x)\,dx = 0$, even though there is obviously quite a bit of area between sin x and
the axis from 0 to 2π—it’s just that the two lobes cancel each other out1 :
This may seem odd, but it is a concept you’ve encountered many times before. The distinction
between “integral” and “area” is the same as the distinction between “velocity” and “speed,” or as between
“subtraction” and “distance.”
Measures of magnitude          Measures of magnitude AND direction
(i.e., scalar quantities)      (i.e., vector quantities)
area                           integral
speed                          velocity
distance                       subtraction
Subtraction, for instance, measures the “distance” between two numbers—but if you do it in the
wrong order, you get a negative:
5 − 2 = +3
2 − 5 = −3
If you want to really measure distance—the absolute distance, without any information about the direction—
you need to use an absolute value:
|5 − 2| = +3
|2 − 5| = +3
It’s basically the same with integrals. In our conception of area, like distance, area is something that
can never be negative. But integrals consider any area above the x-axis to be “positive” area and any area
below the x-axis to be “negative” area. Why? What’s going on here?
Let me give an example. Earlier, even before we learned that, as a result of the Fundamental Theorem
of Calculus, integrals are really just antiderivatives, we were able to use basic geometry to come up with
this formula:
\[
\int_a^b k\,dx = (b - a)k
\]
But the implicit assumption we made is that we have a situation like this:
1
In particular, $\int_0^{\pi} \sin(x)\,dx = 2$ and $\int_{\pi}^{2\pi} \sin(x)\,dx = -2$
Namely, we’re assuming that k is some positive number. We’re assuming that this line f (x) = k lies
above, not below, the x-axis. But what if k were negative? Then by our formula, the integral would be
negative, too. For example, if we had f (x) = −5, then according to this formula, we’d have:
\[
\int_3^7 (-5)\,dx = (7 - 3)(-5) = -20
\]
But obviously the region in question doesn’t have some sort of metaphysically-disturbing “negative area”—
it has a positive area. The area of this region is 20 units. It’s just that the region has been flipped below
the x-axis:
If we wanted that integral to be positive, we’d have to throw some sort of messy absolute-value into
my formula: we’d have to say:
\[
\int_a^b k\,dx = (b - a)|k|
\]
But obviously, that’s a problem, because then the integral would no longer be the same as the
antiderivative. Rather than simply have F (b) − F (a) be a general formula that tells us how to evaluate
any integral, we’d have to go back to the graph of the function every time and make sure it’s above the
x-axis (and throw in absolute values in the right places if it’s not). The beauty and simplicity of the
Fundamental Theorem of Calculus would be completely destroyed.
It wouldn’t work, by the way, just to throw in a single absolute value after the whole thing—to say,
for instance, that given some function g(x) and its antiderivative G(x), that:
\[
\int_a^b g(x)\,dx = |G(b) - G(a)|
\]
This wouldn’t work because we might have a region that is both above and below the x-axis between a
and b. Take sin(x), for instance:
If we try to find $\int_0^{2\pi} \sin(x)\,dx$ using the FTC, we just get zero. And taking an absolute value of zero
won’t change that:
\[
\begin{aligned}
\int_0^{2\pi} \sin(x)\,dx &= \left[-\cos(x)\right]_0^{2\pi} \\
&= [-\cos(2\pi)] - [-\cos(0)] \\
&= -\cos(2\pi) + \cos(0) \\
&= -1 + 1 \\
&= 0
\end{aligned}
\]
Note that if we do want to find the actual area beneath sine, or beneath any curve, we’ll have to split
it up at all of its x-intercepts, and then take integrals, and then absolute-value the integrals:
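\[
\begin{aligned}
\text{absolute area} &= \left|\int_0^{\pi} \sin(x)\,dx\right| + \left|\int_{\pi}^{2\pi} \sin(x)\,dx\right| \\
&= \left|\left[-\cos(x)\right]_0^{\pi}\right| + \left|\left[-\cos(x)\right]_{\pi}^{2\pi}\right| \\
&= \left|-\cos(\pi) - (-\cos(0))\right| + \left|-\cos(2\pi) - (-\cos(\pi))\right| \\
&= |1 - (-1)| + |-1 - (-(-1))| \\
&= |1 + 1| + |-1 - 1| \\
&= |2| + |-2| \\
&= 2 + 2 \\
&= 4
\end{aligned}
\]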
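The split-at-the-intercepts recipe can be sketched numerically as well. This is only an illustration for sin(x) on [0, 2π], where I am assuming we already know the one interior x-intercept is at x = π; riemann_sum is a made-up midpoint helper, not anything from earlier in these notes.

```python
import math


def riemann_sum(f, a, b, n=100000):
    """Midpoint Riemann sum of f on [a, b] with n equal-width boxes."""
    dx = (b - a) / n
    return sum(f(a + (k + 0.5) * dx) for k in range(n)) * dx


# Net area: one integral straight across, so the two lobes cancel.
net_area = riemann_sum(math.sin, 0.0, 2 * math.pi)

# Absolute area: split at the x-intercept (x = pi), integrate each piece,
# and take the absolute value of each piece before adding.
absolute_area = (abs(riemann_sum(math.sin, 0.0, math.pi))
                 + abs(riemann_sum(math.sin, math.pi, 2 * math.pi)))

print(net_area)        # should be very close to 0
print(absolute_area)   # should be very close to 4
```

For a function whose intercepts you don’t know in advance, you would have to find them first; that step is not shown here.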
So it seems that if we have a negative inside of an integral, it doesn’t go away—rather, it keeps the
entire integral negative:
\[
\int_a^b -f(x)\,dx = -\int_a^b f(x)\,dx
\]
Put differently: if we flip a function vertically (by multiplying it by −1), the integral’s sign flips as well.
Another way to interpret this formula is this: we can pull negatives out of integrals, just like we can
with derivatives. In fact, integrals obey many of the same nice properties as derivatives—we can pull out
constants, and we can split them up along addition:
\[
\int_a^b k \cdot f(x)\,dx = k \cdot \int_a^b f(x)\,dx
\]
\[
\int_a^b \left[f(x) + g(x)\right] dx = \int_a^b f(x)\,dx + \int_a^b g(x)\,dx
\]
You’ve seen a lot of operations that obey these two properties—derivatives, integrals, sums (with the Σ),
etc. The general name for an operation that obeys such properties is a linear operator (or linear function).
Back to integrals. There’s a second complication we haven’t dealt with. We made another assumption
in our sketch:
In figuring out $\int_a^b k\,dx$, not only did we assume that k was positive; we also assumed that a is to the
left of b (algebraically, that b > a). Our formula works out fine as long as a is to the left of b, like in the
diagram. If b > a, then (b − a) is a positive number, and thus we have a positive “width” for the box. But
what if we swap a and b? Imagine my same example, but with f (x) = 5. I’d have:
\[
\int_3^7 5\,dx = (7 - 3) \cdot 5 = 4 \cdot 5 = 20
\]
\[
\int_7^3 5\,dx = (3 - 7) \cdot 5 = (-4) \cdot 5 = -20
\]
All of a sudden our box has a “negative” width, and thus a negative area. Yipes.
More generally, as another consequence of the whole “net area” thing, the order in which we write the
little numbers to the right of the integral sign—the “limits” or “bounds” of integration—matters. Namely:
switching the bounds switches the signs:
\[
\int_a^b f(x)\,dx = -\int_b^a f(x)\,dx
\]
Ordinarily we write them with the leftmost bound on the bottom, and the rightmost bound on the top,
and then we don’t have to worry about this—if we do that, then we only need to keep track of whether
the area is above or below the x-axis.
As before, I could fix this simply by throwing an absolute value into my equation:
\[
\int_a^b k\,dx = |b - a| \cdot k
\]
but the problem now is that, like before, I have this messy absolute value. This is unpleasant for many
reasons, not least of which is that integrals would no longer be antiderivatives. The fundamental
theorem of calculus would no longer hold. (Or at least not without considerably more caveats and details.)
If we return to our definition of an integral (as being a limit of Riemann sums), we can see where
these negatives come from. We defined an integral as:
\[
\int_a^b f(x)\,dx = \lim_{n\to\infty}\left[\sum_{k=1}^{k=n} f(c_k)\,\Delta x_k\right]
\]
But if we have a shape that’s below the x-axis, then the f (ck )’s will be negative:
And so if the f (ck )’s are negative, then f (ck )∆xk will be negative, too, and I’ll be adding up a bunch of
“negative” areas.
Likewise, by $\Delta x_k$, we mean something like $x_k - x_{k-1}$. And we make our sum by saying that the
leftmost point on the region is $x_0$, and then the side of the next box is $x_1$, and then the next box to the
right is $x_2$, and so forth. We go from left to right. So as long as we’re going from left to right—as long as
$x_k > x_{k-1}$—then $x_k - x_{k-1}$ will be positive. But if I go from right to left, then I’ll be subtracting bigger
numbers from smaller numbers, and so $x_k - x_{k-1}$ will be negative. And then, as before, $f(c_k)\,\Delta x_k$ will be
negative, and I’ll be adding up a bunch of “negative” areas.
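To see both sources of negative signs at once, here is a small sketch of a Riemann sum that marches from the lower bound toward the upper bound, whichever direction that happens to be. The function name signed_riemann_sum and the choice of midpoints for c_k are my own assumptions for illustration.

```python
def signed_riemann_sum(f, a, b, n=1000):
    """Riemann sum from a to b. The partition runs from a toward b, so each
    width x_k - x_(k-1) comes out negative when b < a."""
    step = (b - a) / n
    total = 0.0
    for k in range(1, n + 1):
        x_prev = a + (k - 1) * step
        x_k = a + k * step
        c_k = (x_prev + x_k) / 2          # midpoint of the kth box
        total += f(c_k) * (x_k - x_prev)  # height times (signed) width
    return total


def five(x):
    return 5.0


def minus_five(x):
    return -5.0


print(signed_riemann_sum(five, 3.0, 7.0))        # left to right, above the axis
print(signed_riemann_sum(five, 7.0, 3.0))        # right to left: negative widths
print(signed_riemann_sum(minus_five, 3.0, 7.0))  # below the axis: negative heights
```

The first sum should come out near 20 and the other two near −20, matching the hand computations with (b − a)k.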
Again, we could totally make this into a non-issue by defining a Riemann integral as:
\[
\int_a^b f(x)\,dx = \lim_{n\to\infty}\left[\sum_{k=1}^{k=n} |f(c_k)| \cdot |\Delta x_k|\right]
\]
But then our integral would no longer be an antiderivative. The signs would be all messed up. And then, if
it weren’t an antiderivative, it would be a huge pain to compute. The whole joy and beauty of an integral
is that this giant mess of Riemann sums simplifies down just to an easy antiderivative.
Besides, there are plenty of times when we are far more interested in net quantity rather than absolute
quantity. If I go online to look at my bank account, I want it to show me the net amount of money I
have: the amount of money I’ve earned, minus the money I’ve spent. I don’t want it to add the money
I’ve spent to the money I’ve earned. That would be bizarre, confusing, and useless. If I want to sail down
Cayuga Lake, I want to know that there’s a lot of wind. But I want to know not only that there’s a lot of
wind, but that the wind is blowing north—otherwise I’d be sailing directly into the wind. So we’ll leave
the integral as telling us net area, and if we ever need to find the absolute area, we can just deal.
Antidifferentiation!
The year is not over yet. We have one last thing to do. We have learned how to differentiate—now we
must learn how to antidifferentiate. This problem is interesting enough in the context of differentiation.
Now that we’ve learned that integration is really just the same thing as antidifferentiation, it has taken on
added urgency.
But perhaps I should be a bit clearer with my pronouns: you must learn how to antidifferentiate, since
I am already fairly good at it. Here’s what we’ll do: I’ll give you 200–odd antiderivative problems (ranging
widely in difficulty and generality), and you’ll figure out how to do them. All by yourselves. I will do
nothing but sit and observe. And you will do this because during finals week, I will give you a very hard
test on antidifferentiation.
Now, in principle, each of you is more than capable of teaching yourselves how to antidifferentiate.
You are all already quite good at taking derivatives, so with sufficient play you should be able to figure
out how to move in the opposite direction. Put differently, you know how to do this:
function −→ derivative
and now you need to figure out how to do this:
function ←− derivative
so that you can understand the full correspondence:
function ←→ derivative
Some of you may be able to do all of the problems I give you with little difficulty. Others might have
more trouble. If you discover that you are good at this stuff, that doesn’t mean you can simply relax for
the next week and a half while your colleagues flounder. You have the obligation to help your classmates
who are struggling—to teach them and help them understand the material. If you do not, I will be very
disappointed. And you may be disappointed, too, when you see your semester evaluation, because I will
be evaluating you in this unit not simply on how well you do on the test, but on how well you operate as
a class.
Exactly how you organize the unit is up to you. You might want to split up the problems so you
discuss some predetermined fraction of them each day, you might want to designate a few people each day
to present problems and run the class, you might want to have a Hobbesian free-for-all... you are welcome
to do whatever you feel will be most efficacious. The point is, you will need to take the initiative yourselves.
As I said supra, a large part of how I evaluate you will be on your ability to function as a cohesive and
directed whole.
My hope is that this short unit will be neither nasty nor brutish. I think it will be fun and challenging,
and a good way to end the year.
Some Technical Notes

In the problems I give you, I’ll use the symbol $\int$ to represent an antiderivative. For example1:
\[
\int 2x\,dx = \text{“the antiderivative of } 2x\text{”} = x^2
\]
1
For now, don’t worry about what the dx means—we’ll talk plenty next year about it. Just think of it as being part of the
notation and telling you to take the antiderivative with respect to x. It’s the same dx that shows up in a derivative $\frac{dy}{dx}$—what
it is, basically, is an infinitesimally small change in x—some value of x that is not equal to zero but is smaller than every
positive real number. This should seem paradoxical. After all, didn’t we say, when discussing limits, that you couldn’t do that?
Well, yes, we did, and this is the paradox that haunted calculus for 200 years before Karl Weierstrass and his friends fixed it
in the mid-to-late 19th century. They did this by purging the idea of the infinitesimal from calculus (but without destroying
calculus in the process). Then around 1960, a guy named Abraham Robinson (who later taught at UCLA) showed that these infinitely small
numbers actually do exist—that what seems to be a paradox isn’t—and, anyway, for a good treatment of this, read David
Foster Wallace’s Everything and More: A Compact History of ∞ (one of my favorite popular math books) or “Infinitesimally
Yours,” by Jim Holt, New York Review of Books, 20 May 1999.
Of course, $\int 2x\,dx$ could be $x^2 + 5$, too, or even $x^2 + 3{,}429{,}836.8345$. So really we should just write
\[
\int 2x\,dx = x^2 + C
\]
where C can be any real number (i.e., is some constant). That way we can capture every antiderivative,
and not just one. Formally, I would define this notation as follows: given some function $f(x)$ and its
derivative $f'(x)$, the following relationship holds:
\[
\int f'(x)\,dx = f(x) + C
\]
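A quick way to convince yourself that the + C really doesn’t matter is to differentiate a few candidates numerically. The sketch below is an illustration only: derivative is a made-up helper that uses a symmetric difference quotient (an approximation, not the formal limit), and make_candidate builds x^2 + C for whatever constant you hand it.

```python
def derivative(g, x, h=1e-4):
    """Symmetric difference quotient approximating g'(x)."""
    return (g(x + h) - g(x - h)) / (2 * h)


def make_candidate(C):
    """Return the function x -> x**2 + C for a chosen constant C."""
    def candidate(x):
        return x * x + C
    return candidate


# Every candidate should have (approximately) the same derivative, namely 2x.
for C in (0.0, 5.0, 3429836.8345):
    g = make_candidate(C)
    print(C, [derivative(g, x) for x in (-2.0, 0.5, 3.0)])
```

Each printed row should be close to −4, 1, and 6, no matter which C was used.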
Additionally, unless otherwise specified, I represent constant quantities in these problems with single
letters (e.g. a, b, n) and functions with function-notation (e.g. a(x), b(t), f (x), r(θ)). Some of your
answers may have to include arbitrary constants or functions—that’s OK. You are welcome to ask me if
you have additional questions of this sort, but if you have questions on how to actually do the problems, I
will probably be quite reticent.
Antiderivatives I
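1.1. $\int 2x\,dx$
1.2. $\int 3x^2\,dx$
1.3. $\int 4x^3\,dx$
1.4. $\int 17x^{16}\,dx$
1.5. $\int 0\,dx$
1.6. $\int 1\,dx$
1.7. $\int 332\,dx$
1.8. $\int 5x\,dx$
1.9. $\int 6x^2\,dx$
1.10. $\int 4x^5\,dx$
1.11. $\int 22x^{47}\,dx$
1.12. ** $\int ax^n\,dx$
1.13. $\int -x^{-2}\,dx$
1.14. $\int \frac{-1}{x^2}\,dx$
1.15. $\int -2x^{-3}\,dx$
1.16. $\int \frac{-2}{x^3}\,dx$
1.17. $\int -3x^{-4}\,dx$
1.18. $\int \frac{-3}{x^4}\,dx$
1.19. $\int x^{-n}\,dx$
1.20. $\int \frac{1}{x^n}\,dx$
1.21. $\int \frac{5}{x^3}\,dx$
1.22. $\int \frac{7}{3x^4}\,dx$
1.23. $\int \frac{b}{x^n}\,dx$
1.24. $\int \frac{-12}{7x^6}\,dx$
1.25. $\int \frac{9}{x^8}\,dx$
1.26. $\int \left(\frac{5}{67x^{98}} + x^5\right) dx$
1.27. $\int \frac{1}{2}x^{-1/2}\,dx$
1.28. $\int \frac{1}{2\sqrt{x}}\,dx$
1.29. $\int \frac{3}{\sqrt{x}}\,dx$
1.30. $\int \frac{3}{2}x^{1/2}\,dx$
1.31. $\int \frac{3}{2}\sqrt{x}\,dx$
1.32. $\int \frac{-1}{2}x^{-3/2}\,dx$
1.33. $\int \frac{-3}{2}x^{-5/2}\,dx$
1.34. $\int 5\sqrt{x}\,dx$
1.35. $\int \frac{1}{3}x^{-2/3}\,dx$
1.36. $\int \frac{1}{3\sqrt[3]{x^2}}\,dx$
1.37. $\int \frac{-1}{3}x^{-4/3}\,dx$
1.38. $\int \left(x^{-3} + x^2\right) dx$
1.39. $\int \left(\sqrt{x} + \frac{1}{\sqrt{x}}\right) dx$
1.40. $\int x^{-5}\,dx$
1.41. $\int \left(ax^2 + bx + c\right) dx$
1.42. $\int \left(ax^3 - \frac{1}{ax}\right) dx$
1.43. $\int a f'(x)\,dx$
1.44. $\int \left(f'(x) + g'(x)\right) dx$
1.45. $\int \left(x^5 + 3x^8 - 12x^7 + 14\right) dx$
1.46. $\int \left[\sum_{k=0}^{k=n} a_k x^k\right] dx$ (where all of the $a_k$, i.e., $a_0$, $a_1$, $a_2$, etc., are constants)
1.47. $\int \left(5\sqrt{x} - 3x^6 + 23x^4 + \pi\right) dx$
1.48. $\int \left(x^3 + 3\sqrt[5]{x} + 8x^{2/3} + bx + a\right) dx$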
Antiderivatives II
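2.1. ** $\int \cos(x)\,dx$
2.2. ** $\int \sin(x)\,dx$
2.3. $\int -\pi \sin(\pi x)\,dx$
2.4. $\int 3\sin(x)\,dx$
2.5. $\int \sin(3x)\,dx$
2.6. $\int \left(\sin(\pi x) - 3\sin(3x)\right) dx$
2.7. $\int \frac{\pi}{2}\cos\left(\frac{\pi}{2}x\right) dx$
2.8. $\int \frac{1}{\cos^2 x}\,dx$
2.9. $\int \sin(x + 3\pi)\,dx$
2.10. $\int \cos\left(x - \frac{2}{3}\pi\right) dx$
2.11. $\int 2x\cos(x^2)\,dx$
2.12. $\int 2x\cos(x)\,dx$
2.13. $\int \frac{\sin(x)}{x}\,dx$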
Antiderivatives III
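3.1. $\int 2x\cos(x^2)\,dx$
3.2. $\int 3x^2\cos(x^3)\,dx$
3.3. $\int -5x^4\sin(x^5)\,dx$
3.4. $\int 323(4x^3 + 3x^2)(x^4 + x^3)^{322}\,dx$
3.5. $\int 9(6x^5 + 2x)(x^6 + x^2)^8\,dx$
3.6. $\int 15x^4(x^5 + 12)^2\,dx$
3.7. $\int 35x^6(x^7 + 3)^5\,dx$
3.8. $\int 4x^3\cos(x^4 + 2\pi)\,dx$
3.9. $\int (k+1)\cdot f'(x)\cdot(f(x))^k\,dx$
3.10. $\int f'(x)\cdot(f(x))^n\,dx$
3.11. $\int (4x^3 + 7x^6)\,f'(x^4 + x^7)\,dx$
3.12. **** $\int g'(x)\cdot f'(g(x))\,dx$
3.13. $\int x\sin(2x^2)\,dx$
3.14. $\int (1 - \cos(t/2))^2\sin(t/2)\,dt$
3.15. $\int 28(7x - 2)^{-5}\,dx$
3.16. $\int x^3(x^4 - 1)^2\,dx$
3.17. $\int \frac{9r}{\sqrt{1 - r^2}}\,dr$
3.18. $\int 12(y^4 + 4y^2 + 1)^2(y^3 + 2y)\,dy$
3.19. $\int \frac{1}{\sqrt{5x + 8}}\,dx$
3.20. $\int \sqrt{3 - 2s}\,ds$
3.21. $\int (2x + 1)^3\,dx$
3.22. $\int \frac{3}{(2 - x)^2}\,dx$
3.23. $\int \theta\sqrt[4]{1 - \theta^2}\,d\theta$
3.24. $\int g(x)\,g'(x)\,dx$
3.25. $\int \frac{4y}{\sqrt{2y^2 + 1}}\,dy$
3.26. $\int x^{1/2}\sin(x^{3/2} + 1)\,dx$
3.27. $\int \sin^5\left(\frac{x}{3}\right)\cos\left(\frac{x}{3}\right) dx$
3.28. $\int r^2\left(\frac{r^3}{18} - 1\right)^5 dr$
3.29. $\int r^4\left(7 - \frac{r^5}{10}\right) dr$
3.30. $\int x^{1/3}\sin(x^{4/3} - 8)\,dx$
3.31. $\int \frac{3}{x^2}\left(1 - \frac{1}{x}\right)^3 dx$
3.32. $\int \frac{(1 + \sqrt{x})^3}{\sqrt{x}}\,dx$
3.33. $\int \frac{1}{t^2}\cos\left(\frac{1}{t} - 1\right) dt$
3.34. $\int x\sqrt{1 - x^2}\,dx$
3.35. $\int -5x^4\sin(x^4)\,dx$
3.36. $\int \frac{10\sqrt{v}}{(1 + v^{3/2})^2}\,dv$
3.37. $\int \frac{4x}{\sqrt{x^2 + 1}}\,dx$
3.38. $\int \frac{x^3}{\sqrt{x^4 - 9}}\,dx$
3.39. $\int \frac{g(x)}{(g(x))^2}\,dx$
3.40. $\int \frac{\cos(z)}{\sqrt{4 + 3\sin(z)}}\,dz$
3.41. $\int \cos^{-3}(2\theta)\sin(2\theta)\,d\theta$
3.42. $\int \tan^{-5}\left(\frac{\theta}{6}\right)\frac{1}{\cos^2\left(\frac{\theta}{6}\right)}\,d\theta$
3.43. $\int (4y - y^2 + 4y^3 + 1)^{-2/3}(12y^2 - 2y + 4)\,dy$
3.44. $\int (y^3 + 6y^2 - 12y + 9)^{-1/2}(y^2 + 4y - 4)\,dy$
3.45. $\int \sqrt{x^4 + 1}\,dx$
3.46. $\int \frac{b^3x^3}{\sqrt{1 - a^4x^4}}\,dx$
3.47. $\int \frac{x^{n-1}}{\sqrt{a + bx^n}}\,dx$
3.48. $\int y^2\left(1 - \frac{y^3}{a^2}\right)^{-2} dy$
3.49. $\int \sqrt{1 + \sin(x)}\,\cos(x)\,dx$
3.50. $\int x\sin^3(x^2)\cos(x^2)\,dx$
3.51. $\int k'(x)\cdot g'(k(x))\cdot f'(g(k(x)))\,dx$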
Antiderivatives IV
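4.1. $\int e^x\,dx$
4.2. $\int 2e^{2x}\,dx$
4.3. $\int 276e^{276x}\,dx$
4.4. $\int \cos(x)\,e^{\sin(x)}\,dx$
4.5. $\int 2x\,e^{x^2}\,dx$
4.6. $\int (5x^4 + 3x^2)\,e^{x^5 + x^3}\,dx$
4.7. ** $\int g'(x)\,e^{g(x)}\,dx$
4.8. $\int 8e^{x+1}\,dx$
4.9. $\int \left(e^{3x} + 5e^{-x}\right) dx$
4.10. $\int \left(2e^x - 3e^{-2x}\right) dx$
4.11. $\int 2t\,e^{-t^2}\,dt$
4.12. $\int t^3 e^{t^4}\,dt$
4.13. $\int \frac{e^{\tan(\theta)}}{\cos^2(\theta)}\,d\theta$
4.14. $\int 5^x\,dx$
4.15. $\int 2x\cos(x^3)\,dx$
4.16. $\int \frac{1}{\ln(k)}\,k^x\,dx$
4.17. ** $\int a^x\,dx$
4.18. $\int 1.3^x\,dx$
4.19. $\int x\,2^{x^2}\,dx$
4.20. $\int 7^{\cos(t)}\sin(t)\,dt$
4.21. $\int \frac{\left(\frac{1}{3}\right)^{\tan(t)}}{\cos^2(t)}\,dt$
4.22. ** $\int \frac{1}{x}\,dx$
4.23. $\int \frac{1}{2x}\,dx$
4.24. $\int \frac{1}{x + 4}\,dx$
4.25. $\int \frac{1}{x - 2}\,dx$
4.26. $\int \frac{1}{3x + 8}\,dx$
4.27. $\int \frac{1}{3x - 2}\,dx$
4.28. $\int \frac{2x}{x^2 - 25}\,dx$
4.29. $\int \frac{8x}{4x^2 - 5}\,dx$
4.30. $\int \frac{4\sin(\theta)}{1 - 4\cos(\theta)}\,d\theta$
4.31. ** $\int \frac{f'(x)}{f(x)}\,dx$
4.32. $\int \frac{-\sin(\theta)}{\cos(\theta) - 2}\,d\theta$
4.33. $\int \frac{\sin(\theta)}{\cos(\theta)}\,d\theta$
4.34. ** $\int \tan(\theta)\,d\theta$
4.35. $\int \tan\left(\frac{\pi t}{2}\right) dt$
4.36. $\int \frac{5}{x}(8 + \ln(x))^4\,dx$
4.37. $\int 19\,\frac{3x^2 + 2}{x}\,(x^3 + 2x + \ln(x))^{18}\,dx$
4.38. $\int \frac{1}{x}\,f'(\ln(x))\,dx$
4.39. $\int \frac{g'(x)}{g(x)}\,f'[\ln(g(x))]\,dx$
4.40. $\int \frac{1}{x}\cdot 2\cdot(\ln(x))^1\,dx$
4.41. $\int \frac{\ln(x)}{x}\,dx$
4.42. $\int \frac{\ln(x)}{3x}\,dx$
4.43. $\int \frac{4}{x(\ln(x))^2}\,dx$
4.44. $\int \frac{(\ln(x))^2}{7x}\,dx$
4.45. $\int \frac{(\ln(x))^{2/3}}{x}\,dx$
4.46. $\int \frac{\log_7(x)}{\ln(7)}\,dx$
4.47. $\int \frac{\log_k(x)}{\ln(k)}\,dx$
4.48. ** $\int \log_k(x)\,dx$
4.49. $\int \frac{e^r}{1 + e^r}\,dr$
4.50. $\int \frac{\log_7(x)}{x}\,dx$
4.51. $\int \frac{\log_k(x)}{x}\,dx$
4.52. $\int x e^x\,dx$
4.53. $\int \ln(x)\,dx$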
Antiderivatives V
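5.1. $\int \left(1 + 2x + 3x^2 + 4x^3\right) dx$
5.2. $\int \left(x^3 + 4x^7 - 2x^4 + 6\right) dx$
5.3. $\int \left(ax^n + x^n + 3x^b - g\right) dx$
5.4. $\int \left(1/2 \cdot x^{-1/2} + 1/3 \cdot x^{-2/3} + 1/12 \cdot x^{-11/12}\right) dx$
5.5. $\int \left(x^{4/5} - x^{3/4} + x^{-1/3} + 5x^{2/7}\right) dx$
5.6. $\int \left(\sqrt[5]{x^4} - \sqrt[4]{3} + \frac{1}{\sqrt[3]{x}} + 5\sqrt[7]{x^2}\right) dx$
5.7. $\int g'(x)\cdot f'(g(x))\,dx$
5.8. $\int 3x^2 \cdot 8(x^3 + 5)^7\,dx$
5.9. $\int \cos(x)\cdot 4(\sin(x) + 3)^3\,dx$
5.10. $\int (3x^2 + 8x^7 - 1)\cdot 26(x^3 + x^8 - x)^{25}\,dx$
5.11. $\int (5x^4 - 9x^2 - 720x^{79})(x^5 - 3x^3 - 9x^{80})^{11}\,dx$
5.12. $\int 3(\cos(x) + 10x^9)(\sin(x) + x^{10})^4\,dx$
5.13. $\int (2x + 5)(x^2 + 5x)^7\,dx$
5.14. $\int (3 - x)^{10}\,dx$
5.15. $\int -2x\sqrt{4 - x^2}\,dx$
5.16. $\int \sqrt{7x + 9}\,dx$
5.17. $\int \frac{x^3}{(1 + x^4)^{1/3}}\,dx$
5.18. $\int e^{5x+2}\,dx$
5.19. $\int 4\cos(3x)\,dx$
5.20. $\int \frac{\sin(\ln(x))}{x}\,dx$
5.21. $\int \frac{3x + 6}{x^2 + 4x - 3}\,dx$
5.22. $\int x\,3^{x^2+1}\,dx$
5.23. $\int x^2 e^{-4x^3}\,dx$
5.24. $\int (\cos(x))^3\sin(x)\,dx$
5.25. $\int x^3\cos(5x^4)\,dx$
5.26. $\int (3x + 4)^{100}\,dx$
5.27. $\int \frac{\cos(x)}{1 + \sin^2(x)}\,dx$
5.28. $\int \frac{(2 - \sqrt{x})^5}{\sqrt{x}}\,dx$
5.29. $\int \frac{1}{\cos^2(2x - 3)}\,dx$
5.30. $\int \frac{x^{n-1}}{\sqrt{a + bx^n}}\,dx$
5.31. $\int x^3(x^2 + 1)^{1/8}\,dx$
5.32. $\int e^x e^x\,dx$
5.33. $\int \frac{\ln(ax)}{x}\,dx$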
Antiderivatives VI
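6.1. $\int \left(325 + 4x^8 + 9x^{34} + 7x^6 - \frac{3}{4}x^5 - 3\right) dx$
6.2. $\int \left(\sqrt[3]{x} - \frac{2}{x^2} + 5x^{3/2} - \frac{1}{4\sqrt{x}} + \frac{1}{5}\right) dx$
6.3. $\int \left(\frac{1}{x} + \frac{1}{x^2} + \frac{1}{x^3} + \frac{1}{x^4} + \frac{1}{x^n}\right) dx$
6.4. $\int g'(x)\cdot f'(g(x))\,dx$
6.5. $\int (5x^4 + 12x^{11})\cdot 37(x^5 + x^{12})^{36}\,dx$
6.6. $\int (3x^2 + 16x)\cdot\frac{1}{x^3 + 8x^2 + 56}\,dx$
6.7. $\int \cos(x + 3)\cdot 4(\sin(x + 3))^3\,dx$
6.8. $\int (2x + 2)\cdot e^{x^2 + 2x + 3}\,dx$
6.9. $\int \frac{3x^2 + 3}{x^3 + 3x}\,dx$
6.10. $\int (3x^2 + 8x^7 - 1)\cdot 26(x^3 + x^8 - x)^{25}\,dx$
6.11. $\int \frac{\sin(\ln(ax))}{x}\,dx$
6.12. $\int (x^4 + 2x^9)\sin(x^5 + x^{10})\cos(x^5 + x^{10})\,dx$
6.13. $\int \frac{2\sin(x) + 4x}{\cos(x) + x^2}\,dx$
6.14. $\int (x + 5)^{22}\,dx$
6.15. $\int (7x^6 + 20x^3)(x^7 + 5x^4)^{3/4}\,dx$
6.16. $\int e^{-x^2}\,dx$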
Antiderivatives VII
7.1. $\int 2x\cos(x)\,dx$
7.2. $\int x e^x\,dx$
7.3. $\int 2x\ln(x)\,dx$
7.4. $\int f'(x)\,g(x)\,dx$
7.5. $\int x^2 e^x\,dx$
7.6. $\int \ln(x)\,dx$
7.7. $\int 1\cdot\ln(x)\,dx$
7.8. $\int x^5\sin(x)\,dx$
7.9. $\int e^x\cos(x)\,dx$
Some Comments on Solids of Revolution
So far, you’ve learned one formula to find the volume of a solid of revolution:
\[
\text{(the volume of the shape generated by } f(x) \text{ from } a \text{ to } b \text{ revolved around the x-axis)} = \int_a^b \pi\,(f(x))^2\,dx
\]
Explained differently, this formula revolves a shape around whatever axis the function defining that shape
is written in terms of. For example, if we plug $y = x^2$ into this formula, it revolves $x^2$ around the x-axis;
if we plug $x = \sqrt[3]{y}$ into this formula (and find $\int \pi(\sqrt[3]{y})^2\,dy$), it’ll revolve $\sqrt[3]{y}$ around the y-axis.
We can derive another formula that does something similar, but with a subtle and important difference:
\[
\text{(the volume of the shape generated by } f(x) \text{ from } a \text{ to } b \text{ revolved around the y-axis)} = \int_a^b 2\pi x f(x)\,dx
\]
This formula is different. This formula takes a function written in terms of one variable, and revolves it
around the opposite axis. It takes functions written in terms of x and revolves them around the y-axis; it
takes functions in terms of y and revolves them around the x-axis.
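To make the difference between the two formulas concrete, here is a small numerical sketch. The region (the area under f(x) = x^2 from 0 to 1) is my own choice for illustration, and riemann_sum is a made-up midpoint helper; the point is only that the same flat region gives two different solids depending on which formula, and hence which axis, you use.

```python
import math


def riemann_sum(func, a, b, n=100000):
    """Midpoint Riemann sum of func on [a, b] with n equal-width boxes."""
    dx = (b - a) / n
    return sum(func(a + (k + 0.5) * dx) for k in range(n)) * dx


def f(x):
    return x * x   # illustrative choice: f(x) = x^2 on [0, 1]


# First formula: revolve the region under f around the x-axis.
volume_about_x = riemann_sum(lambda x: math.pi * f(x) ** 2, 0.0, 1.0)

# Second formula: revolve the same region around the y-axis.
volume_about_y = riemann_sum(lambda x: 2 * math.pi * x * f(x), 0.0, 1.0)

print(volume_about_x, math.pi / 5)   # by hand: pi * (1/5)
print(volume_about_y, math.pi / 2)   # by hand: 2 * pi * (1/4)
```

Each printed pair should agree closely, and the two volumes differ from each other because the two solids really are different shapes.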
If you’re wondering why we care about having two different formulas that do more-or-less the same
thing—aren’t they redundant? couldn’t we just solve a function for either variable and then use either
formula to find the volume of the same shape?—skip ahead to the second example.
As another comment: neither of these formulas actually describes the three-dimensional shapes. They
merely find their volumes. If you wanted to describe the shapes in three dimensions, you’d need a third
variable (and some understanding of multivariate calculus). For example—and this is just an example,
and not something to generalize from—if you want the equation for a parabola rotated around the y-axis
(a three-dimensional paraboloid), it’s $y = x^2 + z^2$, with z being the axis of our third dimension.
Example: The Volume Of The Same Shape Found Using Two Different Formulas
Maybe I want to revolve the bullet-like shape given by the function $y = \sqrt{x}$ from 0 to 9 around the x-axis
(and find the volume of said bullet). The 2D (unrevolved) shape looks like this:
I can find the volume of the 3D shape in two ways. Using the first formula, I can just plug things in, and get:
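\[
\begin{aligned}
V &= \int_0^9 \pi(\sqrt{x})^2\,dx \\
&= \int_0^9 \pi x\,dx \\
&= \left[\frac{\pi}{2}x^2\right]_0^9 \\
&= \frac{\pi}{2}9^2 - \frac{\pi}{2}0^2 \\
&= \frac{81\pi}{2} - 0 \\
&= \frac{81\pi}{2}
\end{aligned}
\]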
Or we could use the second formula. First, we have to write this in terms of y, because that way, when I
plug it into my other formula, the formula will revolve it around the x-axis (which is what we want):
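\[
\begin{aligned}
y &= \sqrt{x} \\
y^2 &= (\sqrt{x})^2 \\
x &= y^2
\end{aligned}
\]
Then I have to find the new starting and ending points of my shape. It starts at x = 0, and it ends at
x = 9. If I plug those into my original function, I find that those two points are
\[
\begin{aligned}
\sqrt{x} &= y \\
\sqrt{0} &= 0 \\
\sqrt{9} &= 3
\end{aligned}
\]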
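So (with respect to y) the shape goes from y = 0 to y = 3. One subtlety: at height y, the region stretches
horizontally from the curve $x = y^2$ out to the line $x = 9$, so the piece we revolve at height y has length $9 - y^2$.
(Plugging in $y^2$ itself would instead revolve the region between the curve and the y-axis.) So if I plug
things into my second formula, I have:
\[
\begin{aligned}
V &= \int_0^3 2\pi \cdot y \cdot (9 - y^2)\,dy \\
&= \int_0^3 \left(18\pi y - 2\pi y^3\right) dy \\
&= \left[9\pi y^2 - \frac{2\pi}{4}y^4\right]_0^3 \\
&= \left(9\pi \cdot 3^2 - \frac{2\pi}{4}3^4\right) - 0 \\
&= 81\pi - \frac{81\pi}{2} \\
&= \frac{81\pi}{2}
\end{aligned}
\]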
Same answer!
Another Important Example
What if we want to take the region bounded by $y = \sqrt{\ln(x)}$ and y = 0 (on the top and bottom) and x = 1
and $x = e^4$ (on the left and right), revolve it around the x-axis, and find the volume? The region, by the
way, looks like this:
So the 3D shape will look like a much-less-aerodynamic bullet. If we use our first formula, we get
\[
V = \int_1^{e^4} \pi\left(\sqrt{\ln(x)}\right)^2 dx = \int_1^{e^4} \pi\ln(x)\,dx
\]
How do you work this out? Who knows? Not you. You have no idea how to do this integral1 . GOOD
THING WE HAVE ANOTHER FORMULA! What if we try using it? First we’d need to put everything
in terms of y. So we’ll have:
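\[
y = \sqrt{\ln(x)} \implies y^2 = \ln(x) \implies e^{(y^2)} = x \implies x = e^{(y^2)}
\]
\[
x = 1 \implies y = \sqrt{\ln(1)} \implies y = \sqrt{0} \implies y = 0
\]
\[
x = e^4 \implies y = \sqrt{\ln(e^4)} \implies y = \sqrt{4} \implies y = 2
\]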
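Then if we plug this stuff into our other formula, it’ll revolve this shape around the x-axis (which is what
we want). At height y, the region runs horizontally from the curve $x = e^{(y^2)}$ out to the line $x = e^4$, so the
piece we revolve at height y has length $e^4 - e^{(y^2)}$, and y runs from 0 to 2. So we’ll have:
\[
\begin{aligned}
V &= \int_0^2 2\pi \cdot y \cdot \left(e^4 - e^{y^2}\right) dy \\
&= \left[\pi e^4 y^2 - \pi e^{y^2}\right]_0^2 \\
&= \left(4\pi e^4 - \pi e^4\right) - \left(0 - \pi e^0\right) \\
&= 3\pi e^4 + \pi \\
&= \pi(3e^4 + 1) \\
&\approx 517.72
\end{aligned}
\]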
Yay!
1
I know how to do it! Eventually, you will, too. But not yet.
One Last Thing
Note that we can generalize both of our formulas. What if we want to take a shape generated by rotating
the area between two functions around the x-axis, and find the volume of that shape? (Like, we could have
a ring or something. A “hollow” of revolution, rather than a solid of revolution.) Not surprisingly, we
simply subtract two integrals (the volume of the outer shape minus the volume of the inner shape):
\[
\text{(the volume of the shape generated by rotating the area between } f(x) \text{ and } g(x) \text{ between } x = a \text{ and } x = b \text{ around the x-axis)}
\]
\[
= \int_a^b \pi\,(f(x))^2\,dx - \int_a^b \pi\,(g(x))^2\,dx
\]
Note here that, as usual, the order of the functions is important—f (x) should be the outer radius of this
shape, and g(x) the inner radius. Otherwise you’d get a negative you didn’t want. And if g(x) = 0 (i.e., is
the x-axis), then we just have the same formula we had before. We can make the same generalization for
our other method:
\[
\text{(the volume of the shape generated by rotating the region between } f(x) \text{ and } g(x) \text{ between } x = a \text{ and } x = b \text{ around the y-axis)}
\]
\[
= \int_a^b 2\pi x f(x)\,dx - \int_a^b 2\pi x g(x)\,dx
\]
Centroids!
The centroid (or center) of the region between the functions f (x) & g(x) and the lines x = a & x = b is
given by the following coordinates:
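$$x_c = \frac{\int_a^b x\,[\,f(x) - g(x)\,]\,dx}{\int_a^b [\,f(x) - g(x)\,]\,dx}$$
$$y_c = \frac{\frac{1}{2}\int_a^b [\,(f(x))^2 - (g(x))^2\,]\,dx}{\int_a^b [\,f(x) - g(x)\,]\,dx}$$
or, put differently:
$$\left(\frac{\int_a^b x\,[\,f(x) - g(x)\,]\,dx}{\int_a^b [\,f(x) - g(x)\,]\,dx}\ ,\ \frac{\frac{1}{2}\int_a^b [\,(f(x))^2 - (g(x))^2\,]\,dx}{\int_a^b [\,f(x) - g(x)\,]\,dx}\right)$$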
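The centroid formulas can be tried out numerically. This is a rough sketch under some assumptions: the region is the triangle under $f(x) = x$ (with $g(x) = 0$) from $x = 0$ to $x = 1$, and the helper names and the midpoint Riemann sum are picked just for this illustration.

```python
def integrate(func, a, b, n=100_000):
    """Midpoint Riemann sum of func on [a, b] using n slices."""
    width = (b - a) / n
    return sum(func(a + (i + 0.5) * width) for i in range(n)) * width

def centroid(f, g, a, b):
    """Centroid of the region between f (top) and g (bottom) for a <= x <= b."""
    area = integrate(lambda x: f(x) - g(x), a, b)
    x_moment = integrate(lambda x: x * (f(x) - g(x)), a, b)
    y_moment = 0.5 * integrate(lambda x: f(x) ** 2 - g(x) ** 2, a, b)
    return x_moment / area, y_moment / area

# Triangle with corners (0, 0), (1, 0), (1, 1).
xc, yc = centroid(lambda x: x, lambda x: 0.0, 0.0, 1.0)
print(xc, yc)   # a triangle's centroid is the average of its corners: (2/3, 1/3)
```

The same `centroid` function should work for the sketches below, as long as `f` is the upper curve and `g` the lower one on the whole interval.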
Sketch the regions enclosed by the following curves and then find their centroids. (Mark the centroids on
your sketches.)
1. $y = 2 - x$, $y = 0$, and $x = 0$.
2. $y = 2 - x^2$, $y = 0$
3. $y = \frac{1}{3}x^2$, $y = 0$, $x = 4$
4. $y = x^3$, $y = 0$, $x = 1$
5. $y = \frac{1}{2}(x^2 - 10)$, $y = 0$, $x = -2$, $x = 2$
6. $y = 2x - 4$, $y = 2\sqrt{x}$, $x = 1$
7. $y = x^2$, $y = x + 3$
8. $x = y^2$, $x = 2$
9. $x = y^2 - 3y$, $x = -y$
10. $y = x^2$, $y = 4$.
11. $y = x^k$, $y = 0$, $x = 1$ (where $k$ is an integer greater than zero)
12. $y = x^m$, $y = \sqrt[n]{x}$ (again, where $m$ and $n$ are integers greater than zero. What happens if $m = n$?)
Improper Integrals
So far in our study of integrals—or, more specifically, in our study of the areas beneath curves—we’ve
only considered areas of regions that start and end at definite points. We’ve learned how to calculate, for
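example, the area underneath $x^2$ from −5 to 2:
$$\begin{aligned}
\int_{-5}^{2} x^2\,dx &= \left[\frac{1}{3}x^3\right]_{-5}^{2} \\
&= \frac{1}{3}(2)^3 - \frac{1}{3}(-5)^3 \\
&= \frac{133}{3} \\
&\approx 44.3
\end{aligned}$$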
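If you want to double-check this kind of arithmetic, here is a quick sketch: exact fractions for the antiderivative, plus a brute-force sum of thin rectangles for comparison. The function names and the number of rectangles are arbitrary choices for this illustration.

```python
from fractions import Fraction

def antiderivative(x):
    """(1/3) x^3, kept as an exact fraction."""
    return Fraction(x) ** 3 / 3

exact = antiderivative(2) - antiderivative(-5)

def midpoint_sum(f, a, b, n):
    """Add up n thin rectangles whose heights are taken at their midpoints."""
    width = (b - a) / n
    return sum(f(a + (i + 0.5) * width) for i in range(n)) * width

approx = midpoint_sum(lambda x: x * x, -5.0, 2.0, 100_000)
print(exact, float(exact), approx)
```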
But what if we wanted to find... all the area underneath $x^2$? Like, from −∞ to +∞? Well, that’s kind of a stupid question. The area is obviously infinite. And, indeed, if we did the math, that’s what we’d find. We can’t really plug ∞ into an integral, since it’s not a real number per se, but we can use our limit magic: we can just plug some dummy variable in and then take the limit as that variable goes to ∞. So, for instance, to find the area beneath $x^2$ from 0 to ∞, we could take the integral from 0 to some number $b$, and then see what happens as $b \to \infty$:
$$\begin{aligned}
\int_0^\infty x^2\,dx &= \lim_{b\to\infty}\int_0^b x^2\,dx \\
&= \lim_{b\to\infty}\left[\frac{1}{3}x^3\right]_0^b \\
&= \lim_{b\to\infty}\left(\frac{1}{3}b^3 - \frac{1}{3}0^3\right) \\
&= \lim_{b\to\infty}\frac{1}{3}b^3 \\
&= \infty
\end{aligned}$$
Intuitively, this is obvious; formally, we confirm it.

Here’s a slightly different example: what if we want to find the area beneath not $x^2$ between 0 and ∞ but, say, $1/x^2$ between 1 and ∞? This region looks slightly different ($1/x^2$ is asymptoting down to the x-axis rather than zooming up, like $x^2$) but the region is infinitely long, so its area should be infinite as well:
So if we do the math:
$$\begin{aligned}
\int_1^\infty \frac{1}{x^2}\,dx &= \lim_{b\to\infty}\int_1^b \frac{1}{x^2}\,dx \\
&= \lim_{b\to\infty}\left[\frac{-1}{x}\right]_1^b \\
&= \lim_{b\to\infty}\left(\frac{-1}{b} - \frac{-1}{1}\right) \\
&= \lim_{b\to\infty}\left(\frac{-1}{b} + 1\right) \\
&= \lim_{b\to\infty}\left(1 - \frac{1}{b}\right) \\
&= 1
\end{aligned}$$
... woah. The area isn’t infinite. The area is finite. We have this shape that’s infinitely long yet has finite
area. This is bizarre. This is strange. What is going on???
One way of thinking about this is that this shape is a type of fractal. Normally $1/x^2$ from 1 to ∞
looks like this:
but I can roll it up into a spiral, a spiral that gets smaller and smaller and goes on forever, and yet
which fits into a neat little circle of finite area:
AN INFINITE JELLYROLL! An infinitely-long jellyroll that ONLY HAS FINITE AREA! The real
question is: where does one get a baking pan for such a jellyroll? (Hint: a certain Swedish department
store’s Geneva, Switzerland branch.)
Does this mean that every shape that’s infinitely long, but which is getting narrower, will have finite area? What if I have a different shape? What if I have, say, $1/\sqrt{x}$ from 1 to ∞? Like $1/x^2$, $1/\sqrt{x}$ also has a horizontal asymptote at $y = 0$. It’s also getting closer and closer to the x-axis as $x$ gets bigger and bigger:
But if we calculate the area beneath $1/\sqrt{x}$ between 1 and ∞....
$$\begin{aligned}
\int_1^\infty \frac{1}{\sqrt{x}}\,dx &= \lim_{b\to\infty}\int_1^b \frac{1}{\sqrt{x}}\,dx \\
&= \lim_{b\to\infty}\left[2\sqrt{x}\right]_1^b \\
&= \lim_{b\to\infty}\left(2\sqrt{b} - 2\sqrt{1}\right) \\
&= \lim_{b\to\infty}\left(2\sqrt{b} - 2\right) \\
&= \infty
\end{aligned}$$
So NOW we have a shape that’s infinitely wide, and also has infinite area!
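To watch these limits happen, here is a small sketch that plugs bigger and bigger values of $b$ into the three antiderivative results worked out above. The function names and the particular values of $b$ are just picked for illustration.

```python
import math

def area_x_squared(b):
    """Area under x^2 from 0 to b, i.e. b^3 / 3."""
    return b ** 3 / 3

def area_one_over_x_squared(b):
    """Area under 1/x^2 from 1 to b, i.e. 1 - 1/b."""
    return 1 - 1 / b

def area_one_over_root_x(b):
    """Area under 1/sqrt(x) from 1 to b, i.e. 2*sqrt(b) - 2."""
    return 2 * math.sqrt(b) - 2

print(f"{'b':>12} {'x^2':>14} {'1/x^2':>12} {'1/sqrt(x)':>12}")
for b in (10, 1_000, 100_000, 10_000_000):
    print(f"{b:>12} {area_x_squared(b):>14.4g} "
          f"{area_one_over_x_squared(b):>12.7f} {area_one_over_root_x(b):>12.2f}")
```

The middle column creeps up toward 1 and stays there; the other two columns just keep growing. Both $1/x^2$ and $1/\sqrt{x}$ shrink toward the x-axis, but only one of them shrinks fast enough.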
If you think about it, we deal with the infinite in two ways: we can have shapes that are infinitely wide, but we can also have shapes that are infinitely tall. We have horizontal asymptotes, and we have vertical asymptotes. What if, for example, we want to find $\int_0^1 \frac{1}{x^2}\,dx$? We know $1/x^2$ has a vertical asymptote at $x = 0$, so when we do the integral, we’ll actually need to not do the integral from 0 to 1 (we’d get some sort of divide-by-zero error) but do the integral from some dummy variable, say, $a$, to 1, and then take the limit as $a$ approaches 0 from the right side:
$$\begin{aligned}
\int_0^1 \frac{1}{x^2}\,dx &= \lim_{a\to 0^+}\int_a^1 \frac{1}{x^2}\,dx \\
&= \lim_{a\to 0^+}\left[\frac{-1}{x}\right]_a^1 \\
&= \lim_{a\to 0^+}\left(\frac{-1}{1} - \frac{-1}{a}\right) \\
&= \lim_{a\to 0^+}\left(\frac{-1}{1} + \frac{1}{a}\right) \\
&= \lim_{a\to 0^+}\left(\frac{1}{a} - 1\right)
\end{aligned}$$
but as $a$ approaches 0 from the right side (positive direction), $1/a$ gets bigger and bigger and bigger...
$$= \infty$$
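The same kind of experiment works here, except that the dummy variable now sneaks toward 0 instead of running off to ∞. A tiny sketch (the function name and the sample values of $a$ are made up for illustration):

```python
def area_from_a_to_1(a):
    """Area under 1/x^2 from a to 1, i.e. 1/a - 1 (from the antiderivative -1/x)."""
    return 1 / a - 1

for a in (0.1, 0.01, 0.0001, 0.000001):
    print(f"a = {a:<8g} area from a to 1 = {area_from_a_to_1(a):,.0f}")
```

Every time $a$ gets ten times closer to 0, the area gets roughly ten times bigger, so there is no finite number for it to settle down to.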
So the area of this infinitely-tall shape is also infinite! But that’s kind of weird: it means we have this one function, $1/x^2$, and some infinitely-long pieces of it have infinite area (like the region from 0 to 1), and other infinitely-long pieces of it have finite area (like the region from 1 to ∞):
Problems
Draw the functions and shapes described in the following integrals, and then find their areas:
1. $\int_1^\infty \frac{1}{x^2}\,dx$
2. $\int_5^\infty \frac{1}{x^2}\,dx$
3. $\int_1^\infty \frac{1}{x}\,dx$
4. $\int_5^\infty \frac{1}{x}\,dx$
5. $\int_1^\infty \frac{1}{\sqrt{x}}\,dx$
6. $\int_1^\infty \frac{1}{\sqrt[3]{x}}\,dx$
7. $\int_1^\infty \frac{1}{x^4}\,dx$
8. $\int_1^\infty x\,dx$
9. $\int_1^\infty \frac{1}{\sqrt{1-x}}\,dx$
10. $\int_1^\infty \frac{\ln(x)}{x}\,dx$
11. $\int_1^\infty \frac{1}{x^4}\,dx$
12. $\int_{-\infty}^0 x e^x\,dx$
13. $\int_0^\infty e^{-x}\sin(x)\,dx$
14. $\int_0^\infty x e^{-x}\,dx$
15. $\int_0^\infty \sin(x)\,dx$
16. $\int_0^1 \frac{1}{x}\,dx$
17. $\int_0^1 \frac{1}{x^2}\,dx$
18. $\int_0^1 \frac{1}{\sqrt{x}}\,dx$
19. $\int_0^1 x\,dx$
20. $\int_0^1 x^5\,dx$
21. $\int_0^{\pi/2} \tan(x)\,dx$
22. $\int_{-\infty}^1 \frac{8x^3}{(x^4+1)^2}\,dx$
23. $\int_0^1 x\ln(x)\,dx$
24. $\int_0^2 \frac{1}{(x-2)}\,dx$
25. $\int_0^2 \frac{1}{(x-2)^2}\,dx$
26. $\int_0^2 \frac{1}{(x-2)^{2/3}}\,dx$
27. For what values of $p$ is $\int_0^\infty x^p e^{-x}\,dx$ infinite? For what values of $p$ is it finite?
28. Is $\int_1^\infty e^{-x^2}\,dx$ finite, or infinite? This is not particularly easy to answer, since you can’t antidifferentiate $e^{-x^2}$ without using an infinite series. So let me ask this: is $\int_1^\infty e^{-x}\,dx$ finite or infinite? Might knowing this aid you in knowing whether $\int_1^\infty e^{-x^2}\,dx$ is finite or infinite? (Note that I’m not asking for the precise area beneath $e^{-x^2}$, if it is finite; I’m merely curious as to whether it’s finite or infinite.)
Improper Integrals, II
Name:
Improper integrals are soooo cool. You can have a shape that’s infinitely long but has finite area? How
trippy is that? But, of course, sometimes you have a shape that’s infinitely long that does have infinite
area. Why the difference? Why can infinite shapes have finite area, anyway? Fill out this table, and see
if you can start to probe this. (Obviously, you’ll need to do the work on a separate sheet (or sheets) of
paper.) I think it might help if you drew the corresponding regions, too, rather than just
calculated the integrals.
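$k$ | $x^k$ | $\int_1^\infty x^k\,dx$
3 | $x^3$ | ______
2 | $x^2$ | ______
1 | $x$ | ______
1/2 | $x^{1/2} = \sqrt{x}$ | ______
1/3 | $x^{1/3} = \sqrt[3]{x}$ | ______
1/4 | $x^{1/4} = \sqrt[4]{x}$ | ______
0 | $x^0 = 1$ | ______
−1/4 | $x^{-1/4} = \frac{1}{\sqrt[4]{x}}$ | ______
−1/3 | $x^{-1/3} = \frac{1}{\sqrt[3]{x}}$ | ______
−1/2 | $x^{-1/2} = \frac{1}{\sqrt{x}}$ | ______
−1 | $x^{-1} = \frac{1}{x}$ | ______
−4/3 | $x^{-4/3} = \frac{1}{x^{4/3}}$ | ______
−3/2 | $x^{-3/2} = \frac{1}{x^{3/2}}$ | ______
−2 | $x^{-2} = \frac{1}{x^2}$ | ______
−3 | $x^{-3} = \frac{1}{x^3}$ | ______
−4 | $x^{-4} = \frac{1}{x^4}$ | ______
−5 | $x^{-5} = \frac{1}{x^5}$ | ______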
Suggestion: this looks like a huge number of integrals, but it’s a lot less work than it seems. Many have the same answer. And it might make your life easier if you find the antiderivative of $x^k$ first, and use that to evaluate all these integrals, rather than doing it anew in each case. But remember that doesn’t work for $1/x$ (because you’ll need to use a natural log instead).
You should see a pattern here. When $k$ is less than −1, the area under $x^k$ between 1 and ∞ is finite. When $k$ is greater than (or equal to) −1, the corresponding area is infinite.
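One way to gather that evidence by machine is to evaluate the area from 1 to some big number $b$ for a handful of exponents and watch what happens as $b$ grows. This is only a sketch: the function name and the sample values of $k$ and $b$ are chosen for illustration, and it leans on the antiderivative $x^{k+1}/(k+1)$ (with the natural log standing in when $k = -1$).

```python
import math

def area_from_1_to(b, k):
    """Area under x^k from 1 to b, using x^(k+1)/(k+1), or ln(x) when k = -1."""
    if k == -1:
        return math.log(b)
    return (b ** (k + 1) - 1) / (k + 1)

big_values_of_b = (1e2, 1e4, 1e8)
for k in (1, 0, -0.5, -1, -1.5, -2, -3):
    areas = ", ".join(f"{area_from_1_to(b, k):.6g}" for b in big_values_of_b)
    print(f"k = {k:>4}: {areas}")
```

For the exponents below −1 the three numbers in each row settle down; for the others they keep climbing (slowly, in the case of $k = -1$).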
But that’s just an empirical conjecture. It’s just a guess, based on the evidence. Not a proof. So can you prove this to be true? My suggestion: consider the function $f(x) = x^k$, and try to find $\int_1^\infty x^k\,dx$. For convenience, you’ll probably want to split it up into three separate cases:
• when k is greater than −1,

• when k = −1, and
• when k is less than −1.
Write up your proof/argument on a separate piece of paper and attach it to this. By “attach,” I don’t
mean, “by folding it,” or “by putting it nearby.” I mean, “with a staple.” If you don’t own a stapler, buy
one, and get your accountant to credit it as a tax write-off.
Next question: can you plot the value of the area under the function $x^k$ between 1 and ∞ as a function of $k$? That is, can you plot the function $f(k) = \int_1^\infty x^k\,dx$? What does it look like? Do that, too, and attach it. When I was 19, I thought that graph was one of the most remarkable things I had ever seen.
Improper Integrals, III
Name:
Yesterday, you analyzed the area beneath the functions $x^k$ from 1 to ∞, and found that the area was infinite for $k \ge -1$ (sensibly enough), and finite for $k < -1$ (bizarrely). Today, I ask you to consider a related question: what is the area beneath $x^k$ between 0 and 1? Sometimes this is obviously a finite number, for example, for $x^2$. The region between 0 and 1 is just a curvy-triangular shape. But sometimes (i.e., when $k < 0$), $x^k$ has a vertical asymptote at $x = 0$, and so this region between 0 and 1 is infinitely tall! So the obvious question is: does this infinitely tall region between 0 and 1 have infinite area, or finite area? When? Fill out this table, and see if you can start to probe this. (Obviously, you’ll need to do the work on a separate sheet (or sheets) of paper.)
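$\int_0^1 x^k\,dx$ | $x^k$ | $\int_1^\infty x^k\,dx$
______ | $x^3$ | ∞
______ | $x^2$ | ∞
______ | $x$ | ∞
______ | $x^{1/2} = \sqrt{x}$ | ∞
______ | $x^{1/3} = \sqrt[3]{x}$ | ∞
______ | $x^{1/4} = \sqrt[4]{x}$ | ∞
______ | $x^0 = 1$ | ∞
______ | $x^{-1/4} = \frac{1}{\sqrt[4]{x}}$ | ∞
______ | $x^{-1/3} = \frac{1}{\sqrt[3]{x}}$ | ∞
______ | $x^{-1/2} = \frac{1}{\sqrt{x}}$ | ∞
______ | $x^{-2/3} = \frac{1}{x^{2/3}}$ | ∞
______ | $x^{-3/4} = \frac{1}{x^{3/4}}$ | ∞
______ | $x^{-1} = \frac{1}{x}$ | ∞
______ | $x^{-4/3} = \frac{1}{x^{4/3}}$ | 3
______ | $x^{-3/2} = \frac{1}{x^{3/2}}$ | 2
______ | $x^{-2} = \frac{1}{x^2}$ | 1
______ | $x^{-3} = \frac{1}{x^3}$ | 1/2
______ | $x^{-4} = \frac{1}{x^4}$ | 1/3
______ | $x^{-5} = \frac{1}{x^5}$ | 1/4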
Just like with last time, you should see a pattern here. Make a conjecture as to what’s happening:
• when $k$ is greater than −1, the area under $x^k$ between 0 and 1 is:
• when $k$ is equal to −1, the area under $x^k$ between 0 and 1 is:
• when $k$ is less than −1, the area under $x^k$ between 0 and 1 is:
Now: can you prove your conjecture? Write up your proof/argument on a separate piece of paper and staple it to this. Also: like last time, can you plot the value of the area under the function $x^k$ between 0 and 1 as a function of $k$? That is, can you plot the function $f(k) = \int_0^1 x^k\,dx$? What does it look like? Do that, too, and attach it.
“Hey Friend, What’s An Integral?”
What is an integral and how do you find one? Imagine that your penpal—who is familiar with
derivatives but isn’t all that great at math and has never seen an integral before—imagine your
penpal wants you to explain the idea of an integral to them well enough that they can place into
Honors Multivariable Calculus next year at college. Use American English, symbols (i.e., math),
graphs, charts, metaphors, etc., to teach them what integrals are, how to find them, their connection
to derivatives, and why they’re useful (and interesting). Go into detail! Be creative! This is your
opportunity to show me what you know. Feel free to use references other than your notes (such as
your book or the internet); if you do, be sure to cite properly.
Due: Monday, September 20th.
Mathematical Biography
Even on the first day of school, you come to math class with a lot of baggage: not just pencils and
paper, but your previous experiences with math and math classes. This assignment is your opportunity to
think about those experiences and tell me something about them so I can get to know you, and teach you,
better. Take some time to reflect on the following questions, and write out careful answers on a page or
two to be handed in. You need not write a complete essay, though that’d be nice, but please do write in
complete sentences and make the effort to express complete thoughts. The better I can understand your
writing, the better I will be able to teach you. (Please type it.)
• Overall, how do you feel about math? Have you always felt that way, or were there specific experiences
or moments that have given you that feeling? If the latter, what were they, and why were they
important?
• What was your best experience with math—this could be a particular class, or problem you solved,
or book you read, or even a situation outside of class where math really helped you. What made it
so good? What were you doing? Try to be specific: “My teacher was always friendly when I asked
for extra help” is more useful than “My teacher was really cool.”
• What has been your worst experience in math? This could be a particular topic you couldn’t
understand, a problem you couldn’t solve, or situation where lack of expertise was embarrassing.
What made it so bad? Was it unavoidable, or was there something you could have done differently
to make it less awful? What could you do now if the situation came up again? Do you still feel bad
about it now, or is it something you can laugh about?
• Is there one particular math problem or topic that you really like? If so, what is it, and why do
you like it so much?
• Is there anything you’d like to change about your own approach to mathematics this year? If so,
what?
• In summary, if you had to pick one or two adjectives to describe your feelings about mathematics,
what would they be?
• What is something you wouldn’t expect me to know about you? This could be something like your
favorite flavor of ice cream, the fact that you’re a nationally ranked shuffleboard player, or that you’ve
never really liked wearing shorts—but it should be something that I wouldn’t know from seeing your
name on a class list.
Your biography will be graded on effort; as long as it is complete and thoughtful, you will receive full
credit. (Your biography will be kept confidential.) Due:
(rev. 8/2010)

