
A Short Course on

Approximation Theory

Math 682 Summer 1998


N. L. Carothers
Department of Mathematics and Statistics
Bowling Green State University
Table of Contents
Preliminaries
    Problem Set: Function Spaces
Approximation by Algebraic Polynomials
    Problem Set: Uniform Approximation by Polynomials
Trigonometric Polynomials
    Problem Set: Trigonometric Polynomials
Characterization of Best Approximation
    Problem Set: Chebyshev Polynomials
    Examples: Chebyshev Polynomials in Practice
A Brief Introduction to Interpolation
    Problem Set: Lagrange Interpolation
Approximation on Finite Sets
A Brief Introduction to Fourier Series
    Problem Set: Fourier Series
Jackson's Theorems
Orthogonal Polynomials
    Problem Set: Orthogonal Polynomials
Gaussian Quadrature
The Müntz Theorems
The Stone-Weierstrass Theorem
A Short List of References
Preface
These are notes for a six week summer course on approximation theory that I offer occasionally at Bowling Green State University. Our summer classes meet for 90 minutes, five days a week, for a total of 45 hours. But the pace is somewhat leisurely and there is probably not quite enough material here for a "regulation" one semester (45 hour) course. On the other hand, there is more than enough material here for a one quarter (30 hour) course and evidently enough for a five or six week summer course.

I should stress that my presentation here is by no means original: I borrow heavily from a number of well known texts on approximation theory (see the list of references at the end of these notes). I use T. J. Rivlin's book, An Introduction to the Approximation of Functions, as a complementary text and thus you will see many references to Rivlin throughout the notes. Also, a few passages here and there are taken from my book, Real Analysis. In particular, large portions of these notes are based on copyrighted material. They are offered here solely as an aid to teachers and students of approximation theory and are intended for limited personal use only. I should also point out that I am not an expert in approximation theory and I make no claims that the material presented here is in current fashion among experts in the field.

My interest in approximation theory stems from its beauty, its utility, and its rich history. There are also many connections that can be drawn to questions in both classical and modern analysis. For the purposes of this short introductory course, I focus on a handful of classical topics (with a little bit of modern terminology here and there) and "name" theorems. Indeed, the Weierstrass approximation theorem, along with its various relatives, is the central theme of the course.

In terms of prerequisites, I assume at least a one semester course in advanced calculus or real analysis (compactness, completeness, uniform convergence, uniform continuity, normed spaces, etc.) along with a course in linear algebra. The first chapter, entitled Preliminaries, contains four brief appendices that provide an all too brief review of such topics; they are included in order to make the notes as self-contained as possible. The course is designed for beginning master's students (in both pure and applied mathematics), but should be largely accessible to advanced undergraduates. From my experience, there are plenty of topics here that even advanced PhD students will find entertaining.
Math 682 Preliminaries 5/18/98

Introduction

In 1853, the great Russian mathematician P. L. Chebyshev [Čebyšev], while working on a problem of linkages, devices which translate the linear motion of a steam engine into the circular motion of a wheel, considered the following problem:

    Given a continuous function $f$ defined on a closed interval $[a, b]$ and a positive integer $n$, can we "represent" $f$ by a polynomial $p(x) = \sum_{k=0}^{n} a_k x^k$, of degree at most $n$, in such a way that the maximum error at any point $x$ in $[a, b]$ is controlled? In particular, is it possible to construct $p$ in such a way that the error $\max_{a \le x \le b} |f(x) - p(x)|$ is minimized?

This problem raises several questions, the first of which Chebyshev himself ignored:

- Why should such a polynomial even exist?
- If it does, can we hope to construct it?
- If it exists, is it also unique?
- What happens if we change the measure of the error to, say, $\int_a^b |f(x) - p(x)|^2 \, dx$?

Chebyshev's problem is perhaps best understood by rephrasing it in modern terms. What we have here is a problem of linear approximation in a normed linear space. Recall that a norm on a (real) vector space $X$ is a nonnegative function on $X$ satisfying

    $\|x\| \ge 0$, and $\|x\| = 0 \iff x = 0$;
    $\|\lambda x\| = |\lambda| \, \|x\|$ for $\lambda \in \mathbb{R}$;
    $\|x + y\| \le \|x\| + \|y\|$ for any $x, y \in X$.

Any norm on $X$ induces a metric or distance function by setting $\mathrm{dist}(x, y) = \|x - y\|$. The abstract version of our problem(s) can now be restated:

- Given a subset (or even a subspace) $Y$ of $X$ and a point $x \in X$, is there an element $y \in Y$ which is "nearest" to $x$; that is, can we find a vector $y \in Y$ such that $\|x - y\| = \inf_{z \in Y} \|x - z\|$? If there is such a "best approximation" to $x$ from elements of $Y$, is it unique?
Examples

1. In $X = \mathbb{R}^n$ with its usual norm $\|(x_k)_{k=1}^n\|_2 = \left( \sum_{k=1}^n |x_k|^2 \right)^{1/2}$, the problem has a complete solution for any subspace (or, indeed, any closed convex set) $Y$. This problem is often considered in Calculus or Linear Algebra where it is called "least-squares approximation." A large part of the current course will be taken up with least-squares approximations, too. For now let's simply note that the problem changes character dramatically if we consider a different norm on $\mathbb{R}^n$.

   Consider $X = \mathbb{R}^2$ under the norm $\|(x, y)\| = \max\{|x|, |y|\}$, and consider the subspace $Y = \{(0, y) : y \in \mathbb{R}\}$ (i.e., the $y$-axis). It's not hard to see that the point $x = (1, 0) \in \mathbb{R}^2$ has infinitely many nearest points in $Y$; indeed, every point $(0, y)$, $-1 \le y \le 1$, is nearest to $x$. (A quick numerical check of this appears just after these examples.)

2. There are many norms we might consider on $\mathbb{R}^n$. Of particular interest are the $\ell_p$-norms; that is, the scale of norms

   $$\|(x_i)_{i=1}^n\|_p = \Big( \sum_{k=1}^n |x_k|^p \Big)^{1/p}, \quad 1 \le p < \infty,$$

   and

   $$\|(x_i)_{i=1}^n\|_\infty = \max_{1 \le i \le n} |x_i|.$$

   It's easy to see that $\|\cdot\|_1$ and $\|\cdot\|_\infty$ define norms. The other cases take a bit more work; we'll supply full details later.

3. Our original problem concerns $X = C[a, b]$, the space of all continuous functions $f : [a, b] \to \mathbb{R}$ under the uniform norm $\|f\| = \max_{a \le x \le b} |f(x)|$. The word "uniform" is used because convergence in this norm is the same as uniform convergence on $[a, b]$:

   $$\|f_n - f\| \to 0 \iff f_n \rightrightarrows f \ \text{ on } [a, b].$$

   In this case we're interested in approximations by elements of $Y = \mathcal{P}_n$, the subspace of all polynomials of degree at most $n$ in $C[a, b]$. It's not hard to see that $\mathcal{P}_n$ is a finite-dimensional subspace of $C[a, b]$ of dimension exactly $n + 1$. (Why?)

   If we consider the subspace $Y = \mathcal{P}$ consisting of all polynomials in $X = C[a, b]$, we readily see that the existence of best approximations can be problematic. It follows from the Weierstrass theorem, for example, that each $f \in C[a, b]$ has distance 0 from $\mathcal{P}$; but, since not every $f \in C[a, b]$ is a polynomial (why?), we can't hope for a best approximating polynomial to exist in every case. For example, the function $f(x) = x \sin(1/x)$ is continuous on $[0, 1]$ but can't possibly agree with any polynomial on $[0, 1]$. (Why?)
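For readers who like to compute, here is a quick numerical check of the phenomenon in Example 1 (my own illustration, not part of the original notes):

    import numpy as np

    # In the max-norm on R^2, every point (0, y) with -1 <= y <= 1 is
    # nearest to x = (1, 0); the distance from x to the y-axis is 1.
    x = np.array([1.0, 0.0])
    for y in np.linspace(-1.5, 1.5, 7):
        dist = np.max(np.abs(x - np.array([0.0, y])))  # ||x - (0, y)||_inf
        print(y, dist)  # 1.0 whenever |y| <= 1, larger otherwise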
The key to the problem of polynomial approximation is the fact that each $\mathcal{P}_n$ is finite-dimensional. To see this, it will be most efficient to consider the abstract setting of finite-dimensional subspaces of arbitrary normed spaces.
\Soft" Approximation
Lemma. Let V be a nite-dimensional vector space. Then, all norms on V are equivalent.
That is, if k  k and jjj  jjj are norms on V , then there exist constants 0 < A B < 1 such
that
A kxk  jjjx jjj  B kxk
for all vectors x 2 V .
Proof. Suppose that V is n-dimensional and that k  k is a norm on V . Fix a basis
e1 : : : en for V and consider the norm
X  X
 ai ei  = n jaij = k(ai )ni=1k1
n
 i=1 1 i=1
P
for x = ni=1 ai ei 2 V . Since e1 : : : en is a basis for V , it's not hard to see that k  k1 is,
indeed, a norm on V . It now suces to show that k  k and k  k1 are equivalent. (Why?)
One inequality is easy to show indeed, notice that
X  n  X X 
 n ai ei   X ja jk e k  max k e k
n
ja j = B 

n
a e  :
 i=1  i=1 i i 1 i n i
i=1
i  i=1 1
i i

The real work comes in establishing the other inequality.


To begin, notice that we've actually set-up a correspondence between Rn and V 
P
specically, the map (ai )ni=1 7! ni=1 ai ei is obviously both one-to-one and onto. Moreover,
this correspondence is an isometry between (Rn k  k1) and (V k  k1 ).
Preliminaries 4
Now the inequality we've just established shows that the function x 7! kxk is contin-
uous on the space (V k  k1) since
 kxk ; kyk   kx ; yk  B kx ; yk
1

for any x, y 2 V . Thus, k  k assumes a minimum value on the compact set


S = fx 2 V : kxk1 = 1g:
(Why is S compact?) In particular, there is some A > 0 such that kxk  A whenever
kxk1 = 1. (Why can we assume that A > 0?) The inequality we need now follows from
the homogeneity of the norm:
 x 
   A =) kxk  A kxk1 :
kxk1
Corollary. Given $a < b$ (fixed) and a positive integer $n$, there exist $0 < A_n, B_n < \infty$ (constants which may depend on $n$) such that

$$A_n \sum_{k=0}^n |a_k| \le \max_{a \le x \le b} \Big| \sum_{k=0}^n a_k x^k \Big| \le B_n \sum_{k=0}^n |a_k|.$$

Exercise

Find explicit "formulas" for $A_n$ and $B_n$, above. (This can be done without any fancy theorems.) If it helps, you may consider the case $[a, b] = [0, 1]$.
Corollary. Let $Y$ be a finite-dimensional normed space and let $M > 0$. Then, any closed ball $\{ y \in Y : \|y\| \le M \}$ is compact.

Proof. Again suppose that $Y$ is $n$-dimensional and that $e_1, \ldots, e_n$ is a basis for $Y$. From our previous lemma we know that there is some constant $A > 0$ such that

$$A \sum_{i=1}^n |a_i| \le \Big\| \sum_{i=1}^n a_i e_i \Big\|$$

for all $x = \sum_{i=1}^n a_i e_i \in Y$. In particular,

$$A \sum_{i=1}^n |a_i| \le \Big\| \sum_{i=1}^n a_i e_i \Big\| \le M \implies |a_i| \le \frac{M}{A} \ \text{ for } i = 1, \ldots, n.$$

Thus, $\{ y \in Y : \|y\| \le M \}$ is a closed subset (why?) of the compact set

$$\Big\{ x = \sum_{i=1}^n a_i e_i : |a_i| \le \frac{M}{A}, \ i = 1, \ldots, n \Big\}.$$

Corollary. Every finite-dimensional normed space is complete. In particular, if $Y$ is a finite-dimensional subspace of a normed linear space $X$, then $Y$ is a closed subset of $X$.
Theorem. Let $Y$ be a finite-dimensional subspace of a normed linear space $X$, and let $x \in X$. Then, there exists a (not necessarily unique) $y^* \in Y$ such that

$$\|x - y^*\| \le \|x - y\|$$

for all $y \in Y$. That is, there is a best approximation to $x$ by elements of $Y$.

Proof. First notice that since $0 \in Y$, we know that a nearest point $y^*$ will satisfy $\|x - y^*\| \le \|x\| = \|x - 0\|$. Thus, it suffices to look for $y^*$ among the vectors $y \in Y$ satisfying $\|x - y\| \le \|x\|$. It will be convenient to use a slightly larger set of vectors, though. By the triangle inequality,

$$\|x - y\| \le \|x\| \implies \|y\| \le \|x - y\| + \|x\| \le 2\|x\|.$$

Thus, we may restrict our attention to those $y$'s in the compact set

$$K = \{ y \in Y : \|y\| \le 2\|x\| \}.$$

To finish the proof, we need only notice that the function $f(y) = \|x - y\|$ is continuous:

$$|f(y) - f(z)| = \big|\, \|x - y\| - \|x - z\| \,\big| \le \|y - z\|,$$

hence attains a minimum value at some point $y^* \in K$.

Corollary. For each $f \in C[a, b]$, and each positive integer $n$, there is a (not necessarily unique) polynomial $p_n^* \in \mathcal{P}_n$ such that

$$\|f - p_n^*\| = \min_{p \in \mathcal{P}_n} \|f - p\|.$$
Corollary. Given $f \in C[a, b]$ and a (fixed) positive integer $n$, there exists a constant $R < \infty$ such that if

$$\Big\| f - \sum_{k=0}^n a_k x^k \Big\| \le \|f\|,$$

then $\max_{0 \le k \le n} |a_k| \le R$.
Examples

Nothing in our Corollary says that $p_n^*$ will be a polynomial of degree exactly $n$; rather, it is a polynomial of degree at most $n$. For example, the best approximation to $f(x) = x$ by a polynomial of degree at most 3 is, of course, $p(x) = x$. Even examples of non-polynomial functions are easy to come by; for instance, the best linear approximation to $f(x) = |x|$ on $[-1, 1]$ is actually the constant function $p(x) = 1/2$, and this makes for an entertaining exercise.
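As a sanity check on that last claim, here is a brute-force search (my own illustration, not part of the notes); the coarse grids happen to contain the true minimizer $p(x) = 1/2$:

    import numpy as np

    # Search linear polynomials p(x) = a + b*x for the smallest uniform
    # error max |f(x) - p(x)| on [-1, 1], where f(x) = |x|.
    x = np.linspace(-1.0, 1.0, 2001)
    f = np.abs(x)

    best = (np.inf, None)
    for a in np.linspace(-1.0, 1.0, 201):
        for b in np.linspace(-1.0, 1.0, 201):
            err = np.max(np.abs(f - (a + b * x)))  # uniform norm of f - p
            if err < best[0]:
                best = (err, (a, b))

    print(best)  # error 0.5, attained at (a, b) = (0.5, 0.0)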
Before we leave these "soft" arguments behind, let's discuss the problem of uniqueness of best approximations. First, let's see why we want best approximations to be unique:

Lemma. Let $Y$ be a finite-dimensional subspace of a normed linear space $X$, and suppose that each $x \in X$ has a unique nearest point $y_x \in Y$. Then, the nearest point map $x \mapsto y_x$ is continuous.

Proof. Let's write $P(x) = y_x$ for the nearest point map, and let's suppose that $x_n \to x$ in $X$. We want to show that $P(x_n) \to P(x)$, and for this it's enough to show that there is a subsequence of $(P(x_n))$ which converges to $P(x)$. (Why?)

Since the sequence $(x_n)$ is bounded in $X$, say $\|x_n\| \le M$ for all $n$, we have

$$\|P(x_n)\| \le \|P(x_n) - x_n\| + \|x_n\| \le 2\|x_n\| \le 2M.$$

Thus, $(P(x_n))$ is a bounded sequence in $Y$, a finite-dimensional space. As such, by passing to a subsequence, we may suppose that $(P(x_n))$ converges to some element $P_0 \in Y$. (How?) Now we need to show that $P_0 = P(x)$. But

$$\|P(x_n) - x_n\| \le \|P(x) - x_n\| \quad \text{(why?)}$$

for any $n$, and hence, letting $n \to \infty$,

$$\|P_0 - x\| \le \|P(x) - x\|.$$

Since nearest points in $Y$ are unique, we must have $P_0 = P(x)$.

Exercise

Let $X$ be a normed linear space and let $P : X \to X$. Show that $P$ is continuous at $x \in X$ if and only if, whenever $x_n \to x$ in $X$, some subsequence of $(P(x_n))$ converges to $P(x)$. [Hint: The forward direction is easy; for the backward implication, suppose that $(P(x_n))$ fails to converge to $P(x)$ and work toward a contradiction.]

It should be pointed out that the nearest point map is, in general, nonlinear and, as such, can be very difficult to work with. Later we'll see at least one case in which nearest point maps always turn out to be linear.
We next observe that the set of best approximations is always pretty reasonable:

Theorem. Let $Y$ be a subspace of a normed linear space $X$, and let $x \in X$. The set $Y_x$, consisting of all best approximations to $x$ out of $Y$, is a bounded, convex set.

Proof. As we've seen, the set $Y_x$ is a subset of $\{ y \in X : \|y\| \le 2\|x\| \}$ and, hence, is bounded.

Now recall that a subset $K$ of a vector space $V$ is said to be convex if $K$ contains the line segment joining any pair of its points. Specifically, $K$ is convex if

$$x, y \in K, \ 0 \le \lambda \le 1 \implies \lambda x + (1 - \lambda) y \in K.$$

Now, $y_1, y_2 \in Y_x$ means that

$$\|x - y_1\| = \|x - y_2\| = \min_{y \in Y} \|x - y\|.$$

Next, given $0 \le \lambda \le 1$, set $y^* = \lambda y_1 + (1 - \lambda) y_2$. We want to show that $y^* \in Y_x$; but notice that we at least have $y^* \in Y$. Finally, we estimate:

$$\|x - y^*\| = \|x - (\lambda y_1 + (1 - \lambda) y_2)\| = \|\lambda (x - y_1) + (1 - \lambda)(x - y_2)\|$$
$$\le \lambda \|x - y_1\| + (1 - \lambda)\|x - y_2\| = \min_{y \in Y} \|x - y\|.$$

Hence, $\|x - y^*\| = \min_{y \in Y} \|x - y\|$; that is, $y^* \in Y_x$.

Exercise

If, in addition, $Y$ is finite-dimensional, show that $Y_x$ is closed (hence compact).

If $Y_x$ contains more than one point, then, in fact, it contains an entire line segment. Thus, $Y_x$ is either empty, contains exactly one point, or contains infinitely many points. This observation gives us a sufficient condition for uniqueness of nearest points: If our normed space $X$ contains no line segments on any sphere $\{ x \in X : \|x\| = r \}$, then any best approximation (out of any set) will be unique.
A norm $\|\cdot\|$ on a vector space $X$ is said to be strictly convex if, for any $x \ne y \in X$ with $\|x\| = r = \|y\|$, we always have $\|\lambda x + (1 - \lambda) y\| < r$ for any $0 < \lambda < 1$. That is, the open line segment between any pair of points on the surface of the ball of radius $r$ in $X$ lies entirely inside the ball. We often simply say that the space $X$ is strictly convex, with the understanding that a property of the norm on $X$ is implied. Here's an immediate corollary to our last result:

Corollary. If $X$ has a strictly convex norm, then, for any subspace $Y$ of $X$ and any point $x \in X$, there can be at most one best approximation to $x$ out of $Y$. That is, $Y_x$ is either empty or consists of a single point.

In order to arrive at a condition that's somewhat easier to check, let's translate our original definition into a statement about the triangle inequality in $X$.

Lemma. $X$ has a strictly convex norm if and only if the triangle inequality is strict on non-parallel vectors; that is, if and only if

$$x \ne \lambda y, \ y \ne \lambda x \ \text{ for all } \lambda \in \mathbb{R} \implies \|x + y\| < \|x\| + \|y\|.$$

Proof. First suppose that $X$ is strictly convex, and let $x$ and $y$ be non-parallel vectors in $X$. Then, in particular, the vectors $x/\|x\|$ and $y/\|y\|$ must be different. (Why?) Hence,

$$\Big\| \frac{\|x\|}{\|x\| + \|y\|} \cdot \frac{x}{\|x\|} + \frac{\|y\|}{\|x\| + \|y\|} \cdot \frac{y}{\|y\|} \Big\| < 1.$$

That is, $\|x + y\| < \|x\| + \|y\|$.

Next suppose that the triangle inequality is strict on non-parallel vectors, and let $x \ne y \in X$ with $\|x\| = r = \|y\|$. If $x$ and $y$ are parallel, then we must have $y = -x$. (Why?) In this case,

$$\|\lambda x + (1 - \lambda) y\| = |2\lambda - 1|\,\|x\| < r,$$

since $|2\lambda - 1| < 1$ whenever $0 < \lambda < 1$. Otherwise, $x$ and $y$ are non-parallel. In this case, for any $0 < \lambda < 1$, the vectors $\lambda x$ and $(1 - \lambda) y$ are likewise non-parallel. Thus,

$$\|\lambda x + (1 - \lambda) y\| < \lambda \|x\| + (1 - \lambda) \|y\| = r.$$

Examples

1. The usual norm on $C[a, b]$ is not strictly convex (and so the problem of uniqueness of best approximations is all the more interesting to tackle). For example, if $f(x) = x$ and $g(x) = x^2$ in $C[0, 1]$, then $\|f\| = 1 = \|g\|$ and $f \ne g$, while $\|f + g\| = 2$. (Why?)

2. The usual norm on $\mathbb{R}^n$ is strictly convex, as is any one of the norms $\|\cdot\|_p$, $1 < p < \infty$. (We'll prove these facts shortly.) The norms $\|\cdot\|_1$ and $\|\cdot\|_\infty$, on the other hand, are not strictly convex. (Why?)
Appendix A

For completeness, we supply a few of the missing details concerning the $\ell_p$-norms. We begin with a handful of classical inequalities of independent interest. First recall that we have defined a scale of "norms" on $\mathbb{R}^n$ by setting

$$\|x\|_p = \Big( \sum_{i=1}^n |x_i|^p \Big)^{1/p}, \quad 1 \le p < \infty,$$

and

$$\|x\|_\infty = \max_{1 \le i \le n} |x_i|,$$

where $x = (x_i)_{i=1}^n \in \mathbb{R}^n$. Please note that the case $p = 2$ gives the usual Euclidean norm on $\mathbb{R}^n$ and that the cases $p = 1$ and $p = \infty$ clearly give rise to legitimate norms on $\mathbb{R}^n$. Common parlance is to refer to these expressions as $\ell_p$-norms and to refer to the space $(\mathbb{R}^n, \|\cdot\|_p)$ as $\ell_p^n$. The space of all infinite sequences $x = (x_n)_{n=1}^\infty$ for which the analogous infinite sum (or supremum) $\|x\|_p$ is finite is referred to as $\ell_p$. What's more, there is a "continuous" analogue of this scale: We might also consider the norms

$$\|f\|_p = \Big( \int_a^b |f(x)|^p \, dx \Big)^{1/p}, \quad 1 \le p < \infty,$$

and

$$\|f\|_\infty = \sup_{a \le x \le b} |f(x)|,$$

where $f$ is in $C[a, b]$ (or is simply Lebesgue integrable). The subsequent discussion actually covers all of these cases, but we will settle for writing our proofs in the $\mathbb{R}^n$ setting only.

Lemma (Young's inequality). Let $1 < p < \infty$, and let $1 < q < \infty$ be defined by $\frac{1}{p} + \frac{1}{q} = 1$; that is, $q = \frac{p}{p-1}$. Then, for any $a, b \ge 0$, we have

$$ab \le \frac{1}{p}\, a^p + \frac{1}{q}\, b^q.$$

Moreover, equality can only occur if $a^p = b^q$. (We refer to $p$ and $q$ as conjugate exponents; note that $p$ satisfies $p = \frac{q}{q-1}$. Please note that the case $p = q = 2$ yields the familiar arithmetic-geometric mean inequality.)
Proof. A quick calculation before we begin:

$$q - 1 = \frac{p}{p-1} - 1 = \frac{p - (p-1)}{p-1} = \frac{1}{p-1}.$$

Now we just estimate areas; for this you might find it helpful to draw the graph of $y = x^{p-1}$ (or, equivalently, the graph of $x = y^{q-1}$). Comparing areas we get:

$$ab \le \int_0^a x^{p-1}\, dx + \int_0^b y^{q-1}\, dy = \frac{1}{p}\, a^p + \frac{1}{q}\, b^q.$$

The case for equality also follows easily from the graph of $y = x^{p-1}$ (or $x = y^{q-1}$), since $b = a^{p-1} = a^{p/q}$ means that $a^p = b^q$.
Corollary (Hölder's inequality). Let $1 < p < \infty$, and let $1 < q < \infty$ be defined by $\frac{1}{p} + \frac{1}{q} = 1$. Then, for any $a_1, \ldots, a_n$ and $b_1, \ldots, b_n$ in $\mathbb{R}$ we have:

$$\sum_{i=1}^n |a_i b_i| \le \Big( \sum_{i=1}^n |a_i|^p \Big)^{1/p} \Big( \sum_{i=1}^n |b_i|^q \Big)^{1/q}.$$

(Please note that the case $p = q = 2$ yields the familiar Cauchy-Schwarz inequality.)

Moreover, equality in Hölder's inequality can only occur if there exist nonnegative scalars $\alpha$ and $\beta$ such that $\alpha |a_i|^p = \beta |b_i|^q$ for all $i = 1, \ldots, n$.
Proof. Let $A = \big( \sum_{i=1}^n |a_i|^p \big)^{1/p}$ and let $B = \big( \sum_{i=1}^n |b_i|^q \big)^{1/q}$. We may clearly assume that $A, B \ne 0$ (why?), and hence we may divide (and appeal to Young's inequality):

$$\frac{|a_i b_i|}{AB} \le \frac{|a_i|^p}{pA^p} + \frac{|b_i|^q}{qB^q}.$$

Adding, we get:

$$\frac{1}{AB} \sum_{i=1}^n |a_i b_i| \le \frac{1}{pA^p} \sum_{i=1}^n |a_i|^p + \frac{1}{qB^q} \sum_{i=1}^n |b_i|^q = \frac{1}{p} + \frac{1}{q} = 1.$$

That is, $\sum_{i=1}^n |a_i b_i| \le AB$.

The case for equality in Hölder's inequality follows from what we know about Young's inequality: Equality in Hölder's inequality means that either $A = 0$, or $B = 0$, or else $|a_i|^p / A^p = |b_i|^q / B^q$ for all $i = 1, \ldots, n$. In short, there must exist nonnegative scalars $\alpha$ and $\beta$ such that $\alpha |a_i|^p = \beta |b_i|^q$ for all $i = 1, \ldots, n$.

Notice, too, that the case $p = 1$ ($q = \infty$) works, and is easy:

$$\sum_{i=1}^n |a_i b_i| \le \Big( \sum_{i=1}^n |a_i| \Big) \Big( \max_{1 \le i \le n} |b_i| \Big).$$

Exercise

When does equality occur in the case $p = 1$ ($q = \infty$)?
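For what it's worth, both the inequality and its equality condition are easy to test numerically; in this sketch (mine, not from the notes) the exponent p = 3 and the random vectors are arbitrary choices:

    import numpy as np

    # Check sum |a_i b_i| <= ||a||_p ||b||_q with 1/p + 1/q = 1.
    rng = np.random.default_rng(0)
    p = 3.0
    q = p / (p - 1.0)  # conjugate exponent

    a, b = rng.normal(size=10), rng.normal(size=10)
    lhs = np.sum(np.abs(a * b))
    rhs = np.sum(np.abs(a)**p)**(1/p) * np.sum(np.abs(b)**q)**(1/q)
    print(lhs <= rhs)  # True

    # Equality case: choose b with |b_i|^q = |a_i|^p.
    b_eq = np.abs(a)**(p / q)
    lhs_eq = np.sum(np.abs(a * b_eq))
    rhs_eq = np.sum(np.abs(a)**p)**(1/p) * np.sum(np.abs(b_eq)**q)**(1/q)
    print(lhs_eq, rhs_eq)  # equal, up to rounding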
Finally, an application of Hölder's inequality leads to an easy proof that $\|\cdot\|_p$ is actually a norm. It will help matters here if we first make a simple observation: If $1 < p < \infty$ and if $q = \frac{p}{p-1}$, notice that

$$\big\| \big( |a_i|^{p-1} \big)_{i=1}^n \big\|_q = \Big( \sum_{i=1}^n |a_i|^p \Big)^{(p-1)/p} = \|a\|_p^{p-1}.$$
Lemma (Minkowski's inequality). Let $1 < p < \infty$ and let $a = (a_i)_{i=1}^n$, $b = (b_i)_{i=1}^n \in \mathbb{R}^n$. Then, $\|a + b\|_p \le \|a\|_p + \|b\|_p$.

Proof. In order to prove the triangle inequality, we once again let $q$ be defined by $\frac{1}{p} + \frac{1}{q} = 1$, and now we use Hölder's inequality to estimate:

$$\sum_{i=1}^n |a_i + b_i|^p = \sum_{i=1}^n |a_i + b_i| \cdot |a_i + b_i|^{p-1}$$
$$\le \sum_{i=1}^n |a_i| \cdot |a_i + b_i|^{p-1} + \sum_{i=1}^n |b_i| \cdot |a_i + b_i|^{p-1}$$
$$\le \|a\|_p \cdot \big\| ( |a_i + b_i|^{p-1} )_{i=1}^n \big\|_q + \|b\|_p \cdot \big\| ( |a_i + b_i|^{p-1} )_{i=1}^n \big\|_q$$
$$= \|a + b\|_p^{p-1} \big( \|a\|_p + \|b\|_p \big).$$

That is, $\|a + b\|_p^p \le \|a + b\|_p^{p-1} \big( \|a\|_p + \|b\|_p \big)$, and the triangle inequality follows.

If $1 < p < \infty$, then equality in Minkowski's inequality can only occur if $a$ and $b$ are parallel; that is, the $\ell_p$-norm is strictly convex for $1 < p < \infty$. Indeed, if $\|a + b\|_p = \|a\|_p + \|b\|_p$, then either $a = 0$, or $b = 0$, or else $a, b \ne 0$ and we have equality at each stage of our proof. Now equality in the first inequality means that $|a_i + b_i| = |a_i| + |b_i|$, which easily implies that $a_i$ and $b_i$ have the same sign. Next, equality in our application of Hölder's inequality implies that there are nonnegative scalars $C$ and $D$ such that $|a_i|^p = C\, |a_i + b_i|^p$ and $|b_i|^p = D\, |a_i + b_i|^p$ for all $i = 1, \ldots, n$. Thus, $a_i = E\, b_i$ for some scalar $E$ and all $i = 1, \ldots, n$.

Of course, the triangle inequality also holds in either of the cases $p = 1$ or $p = \infty$ (with much simpler proofs).

Exercises

When does equality occur in the triangle inequality in the cases $p = 1$ or $p = \infty$? In particular, show that neither of the norms $\|\cdot\|_1$ nor $\|\cdot\|_\infty$ is strictly convex.

Appendix B

Next, we provide a brief review of completeness and compactness. Such a review is doomed to inadequacy; the reader unfamiliar with these concepts would be well served to consult a text on advanced calculus such as Analysis in Euclidean Space by K. Hoffman, or Principles of Mathematical Analysis by W. Rudin.

To begin, we recall that a subset $A$ of a normed space $X$ (such as $\mathbb{R}$ or $\mathbb{R}^n$) is said to be closed if $A$ is closed under the taking of sequential limits. That is, $A$ is closed if, whenever $(a_n)$ is a sequence from $A$ converging to some point $x \in X$, we always have $x \in A$. It's not hard to see that any closed interval, such as $[a, b]$ or $[a, \infty)$, is, indeed, a closed subset of $\mathbb{R}$ in this sense. There are, however, much more complicated examples of closed sets in $\mathbb{R}$.

A normed space $X$ is said to be complete if every Cauchy sequence from $X$ converges (to a point in $X$). It is a familiar fact from Calculus that $\mathbb{R}$ is complete, as is $\mathbb{R}^n$. In fact, the completeness of $\mathbb{R}$ is often assumed as an axiom (in the form of the least upper bound axiom). There are, however, many examples of normed spaces which are not complete; that is, there are examples of normed spaces in which Cauchy sequences need not converge.

We say that a subset $A$ of a normed space $X$ is complete if every Cauchy sequence from $A$ converges to a point in $A$. Please note here that we require not only that Cauchy sequences from $A$ converge, but also that the limit be back in $A$. As you might imagine, the completeness of $A$ depends on properties of both $A$ and the containing space $X$.

First note that a complete subset is necessarily also closed. Indeed, since every convergent sequence is also Cauchy, it follows that a complete subset is closed.

Exercise

If $A$ is a complete subset of a normed space $X$, show that $A$ is also closed.

If the containing space $X$ is itself complete, then it's easy to tell which of its subsets are complete. Indeed, since every Cauchy sequence in $X$ converges (somewhere), all we need to know is whether the subset is closed.

Exercise

Let $A$ be a subset of a complete normed space $X$. Show that $A$ is complete if and only if $A$ is a closed subset of $X$. In particular, please note that every closed subset of $\mathbb{R}$ (or $\mathbb{R}^n$) is complete.

Finally, we recall that a subset $A$ of a normed space $X$ is said to be compact if every sequence from $A$ has a subsequence which converges to a point in $A$. Again, since we have insisted that certain limits remain in $A$, it's not hard to see that compact sets are necessarily also closed.

Exercise

If $A$ is a compact subset of a normed space $X$, show that $A$ is also closed.

Moreover, since a Cauchy sequence with a convergent subsequence must itself converge (why?), we actually have that every compact set is necessarily complete.

Exercise

If $A$ is a compact subset of a normed space $X$, show that $A$ is also complete.

Since the compactness of a subset $A$ has something to do with every sequence in $A$, it's not hard to believe that it is a more stringent property than the others we've considered so far. In particular, it's not hard to see that a compact set must be bounded.

Exercise

If $A$ is a compact subset of a normed space $X$, show that $A$ is also bounded. [Hint: If not, then $A$ would contain a sequence $(a_n)$ with $\|a_n\| \to \infty$.]

Now it is generally not so easy to describe the compact subsets of a particular normed space $X$; however, it is quite easy to describe the compact subsets of $\mathbb{R}$ (or $\mathbb{R}^n$). This well-known result goes by many names; we will refer to it as the Heine-Borel theorem.

Theorem. A subset $A$ of $\mathbb{R}$ (or $\mathbb{R}^n$) is compact if and only if $A$ is both closed and bounded.

Proof. One direction of the proof is easy: As we've already seen, compact sets in $\mathbb{R}$ are necessarily closed and bounded. For the other direction, notice that if $A$ is a bounded subset of $\mathbb{R}$, then it follows from the Bolzano-Weierstrass theorem that every sequence from $A$ has a subsequence which converges in $\mathbb{R}$. If $A$ is also a closed set, then this limit must, in fact, be back in $A$. Thus, every sequence in $A$ has a subsequence converging to a point in $A$.
Appendix C

We next offer a brief review of pointwise and uniform convergence. We begin with an elementary example:

Example

(a) For each $n = 1, 2, 3, \ldots$, consider the function $f_n(x) = e^x + \frac{x}{n}$ for $x \in \mathbb{R}$. Note that for each (fixed) $x$ the sequence $(f_n(x))_{n=1}^\infty$ converges to $f(x) = e^x$ because

$$|f_n(x) - f(x)| = \frac{|x|}{n} \to 0 \quad \text{as } n \to \infty.$$

In this case we say that the sequence of functions $(f_n)$ converges pointwise to the function $f$ on $\mathbb{R}$. But notice, too, that the rate of convergence depends on $x$. In particular, in order to get $|f_n(x) - f(x)| < 1/2$ we would need to take $n > 2|x|$. Thus, at $x = 2$, the inequality is satisfied for all $n > 4$, while at $x = 1000$, the inequality is satisfied only for $n > 2000$. In short, the rate of convergence is not uniform in $x$.

(b) Consider the same sequence of functions as above, but now let's suppose that we restrict the values of $x$ to the interval $[-5, 5]$. Of course, we still have that $f_n(x) \to f(x)$ for each (fixed) $x$ in $[-5, 5]$; in other words, we still have that $(f_n)$ converges pointwise to $f$ on $[-5, 5]$. But notice that the rate of convergence is now uniform over $x$ in $[-5, 5]$. To see this, just rewrite the initial calculation:

$$|f_n(x) - f(x)| = \frac{|x|}{n} \le \frac{5}{n} \quad \text{for } x \in [-5, 5],$$

and notice that the upper bound $5/n$ tends to 0, as $n \to \infty$, independent of the choice of $x$. In this case, we say that $(f_n)$ converges uniformly to $f$ on $[-5, 5]$. The point here is that the notion of uniform convergence depends on the underlying domain as well as on the sequence of functions at hand.
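The example is easy to see numerically as well; in this sketch (my own illustration, not part of the notes) the sup is computed over a fine grid on $[-5, 5]$:

    import numpy as np

    # f_n(x) = e^x + x/n converges to f(x) = e^x uniformly on [-5, 5];
    # the sup-error there is exactly 5/n.
    x = np.linspace(-5.0, 5.0, 10001)
    for n in (1, 10, 100, 1000):
        sup_err = np.max(np.abs((np.exp(x) + x / n) - np.exp(x)))
        print(n, sup_err)  # prints (up to rounding) 5/n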
With this example in mind, we now offer formal definitions of pointwise and uniform convergence. In both cases we consider a sequence of functions $f_n : X \to \mathbb{R}$, $n = 1, 2, 3, \ldots$, each defined on the same underlying set $X$, and another function $f : X \to \mathbb{R}$ (the candidate for the limit).

We say that $(f_n)$ converges pointwise to $f$ on $X$ if, for each $x \in X$, we have $f_n(x) \to f(x)$ as $n \to \infty$; thus, for each $x \in X$ and each $\varepsilon > 0$, we can find an integer $N$ (which depends on $\varepsilon$ and which may also depend on $x$) such that $|f_n(x) - f(x)| < \varepsilon$ whenever $n > N$. A convenient shorthand for pointwise convergence is: $f_n \to f$ on $X$ or, if $X$ is understood, simply $f_n \to f$.

We say that $(f_n)$ converges uniformly to $f$ on $X$ if, for each $\varepsilon > 0$, we can find an integer $N$ (which depends on $\varepsilon$ but not on $x$) such that $|f_n(x) - f(x)| < \varepsilon$ for each $x \in X$, provided that $n > N$. Please notice that the phrase "for each $x \in X$" now occurs well after the phrase "for each $\varepsilon > 0$" and, in particular, that the rate of convergence $N$ does not depend on $x$. It should be reasonably clear that uniform convergence implies pointwise convergence; in other words, uniform convergence is "stronger" than pointwise convergence. For this reason, we sometimes use the shorthand: $f_n \rightrightarrows f$ on $X$ or, if $X$ is understood, simply $f_n \rightrightarrows f$.

The definition of uniform convergence can be simplified by "hiding" one of the quantifiers under different notation; indeed, note that the phrase "$|f_n(x) - f(x)| < \varepsilon$ for any $x \in X$" is (essentially) equivalent to the phrase "$\sup_{x \in X} |f_n(x) - f(x)| < \varepsilon$." Thus, our definition may be reworded as follows: $(f_n)$ converges uniformly to $f$ on $X$ if, given $\varepsilon > 0$, there is an integer $N$ such that $\sup_{x \in X} |f_n(x) - f(x)| < \varepsilon$ for all $n > N$.

The notion of uniform convergence exists for one very good reason: Continuity is preserved under uniform limits. This fact is well worth stating.

Exercise

Let $X$ be a subset of $\mathbb{R}$, let $f, f_n : X \to \mathbb{R}$ for $n = 1, 2, 3, \ldots$, and let $x_0 \in X$. If each $f_n$ is continuous at $x_0$, and if $f_n \rightrightarrows f$ on $X$, then $f$ is continuous at $x_0$. In particular, if each $f_n$ is continuous on all of $X$, then so is $f$. Give an example showing that this result may fail if we only assume that $f_n \to f$ on $X$.

Appendix D

Lastly, we discuss continuity for linear transformations between normed vector spaces. Throughout this section, we consider a linear map $T : V \to W$ between vector spaces $V$ and $W$; that is, we suppose that $T$ satisfies $T(\alpha x + \beta y) = \alpha T(x) + \beta T(y)$ for all $x, y \in V$ and all scalars $\alpha$, $\beta$. Please note that every linear map $T$ satisfies $T(0) = 0$. If we further suppose that $V$ is endowed with the norm $\|\cdot\|$, and that $W$ is endowed with the norm $|||\cdot|||$, then we may consider the issue of continuity of the map $T$.

The key result for our purposes is that, for linear maps, continuity, even at a single point, is equivalent to uniform continuity (and then some!).

Theorem. Let $(V, \|\cdot\|)$ and $(W, |||\cdot|||)$ be normed vector spaces, and let $T : V \to W$ be a linear map. Then, the following are equivalent:

(i) $T$ is Lipschitz;
(ii) $T$ is uniformly continuous;
(iii) $T$ is continuous (everywhere);
(iv) $T$ is continuous at $0 \in V$;
(v) there is a constant $C < \infty$ such that $|||T(x)||| \le C\,\|x\|$ for all $x \in V$.

Proof. Clearly, (i) $\implies$ (ii) $\implies$ (iii) $\implies$ (iv). We need to show that (iv) $\implies$ (v), and that (v) $\implies$ (i) (for example). The second of these is easier, so let's start there.

(v) $\implies$ (i): If condition (v) holds for a linear map $T$, then $T$ is Lipschitz (with constant $C$) since $|||T(x) - T(y)||| = |||T(x - y)||| \le C\,\|x - y\|$ for any $x, y \in V$.

(iv) $\implies$ (v): Suppose that $T$ is continuous at 0. Then we may choose a $\delta > 0$ so that $|||T(x)||| = |||T(x) - T(0)||| \le 1$ whenever $\|x\| = \|x - 0\| \le \delta$. (How?) Given $0 \ne x \in V$, we may scale by the factor $\delta / \|x\|$ to get $\big\| \delta x / \|x\| \big\| = \delta$. Hence, $\big|\big|\big| T\big( \delta x / \|x\| \big) \big|\big|\big| \le 1$. But $T\big( \delta x / \|x\| \big) = (\delta / \|x\|)\, T(x)$, since $T$ is linear, and so we get $|||T(x)||| \le (1/\delta)\, \|x\|$. That is, $C = 1/\delta$ works in condition (v). (Note that since condition (v) is trivial for $x = 0$, we only care about the case $x \ne 0$.)

A linear map satisfying condition (v) of the Theorem (i.e., a continuous linear map) is often said to be bounded. The meaning in this context is slightly different than usual. Here it means that $T$ maps bounded sets to bounded sets. This follows from the fact that $T$ is Lipschitz. Indeed, if $|||T(x)||| \le C\,\|x\|$ for all $x \in V$, then (as we've seen) $|||T(x) - T(y)||| \le C\,\|x - y\|$ for any $x, y \in V$, and hence $T$ maps the ball about $x$ of radius $r$ into the ball about $T(x)$ of radius $Cr$. In symbols, $T\big( B_r(x) \big) \subset B_{Cr}\big( T(x) \big)$. More generally, $T$ maps a set of diameter $d$ into a set of diameter at most $Cd$. There's no danger of confusion in our using the word bounded to mean something new here; the ordinary usage of the word (as applied to functions) is uninteresting for linear maps. A nonzero linear map always has an unbounded range. (Why?)
The smallest constant that works in (v) is called the norm of the operator $T$ and is usually written $\|T\|$. In symbols,

$$\|T\| = \sup_{x \ne 0} \frac{|||T x|||}{\|x\|} = \sup_{\|x\| = 1} |||T x|||.$$

Thus, $T$ is bounded (continuous) if and only if $\|T\| < \infty$.


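To make the definition concrete, here is a crude numerical estimate of an operator norm (my own illustration, not from the notes; the matrix T is an arbitrary choice):

    import numpy as np

    # Estimate the norm of T : (R^2, ||.||_inf) -> (R^2, ||.||_inf) by
    # sampling points on the unit sphere {x : ||x||_inf = 1}.
    T = np.array([[1.0, -2.0],
                  [3.0,  0.5]])

    rng = np.random.default_rng(1)
    x = rng.uniform(-1.0, 1.0, size=(100000, 2))
    x /= np.max(np.abs(x), axis=1, keepdims=True)  # push onto the sphere
    print(np.max(np.max(np.abs(x @ T.T), axis=1)))
    # approaches 3.5, the largest absolute row sum of T (a standard fact
    # for the sup-norm on R^n)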
The fact that all norms on a finite-dimensional normed space are equivalent provides a final (rather spectacular) corollary.

Corollary. Let $V$ and $W$ be normed vector spaces with $V$ finite-dimensional. Then, every linear map $T : V \to W$ is continuous.

Proof. Let $x_1, \ldots, x_n$ be a basis for $V$ and let $\big\| \sum_{i=1}^n \alpha_i x_i \big\|_1 = \sum_{i=1}^n |\alpha_i|$, as before. From our earlier Lemma on the equivalence of norms, we know that there is a constant $B < \infty$ such that $\|x\|_1 \le B\,\|x\|$ for every $x \in V$.

Now if $T : (V, \|\cdot\|) \to (W, |||\cdot|||)$ is linear, we get

$$\Big|\Big|\Big|\, T\Big( \sum_{i=1}^n \alpha_i x_i \Big) \Big|\Big|\Big| = \Big|\Big|\Big| \sum_{i=1}^n \alpha_i\, T(x_i) \Big|\Big|\Big| \le \sum_{i=1}^n |\alpha_i| \, |||T(x_i)|||$$
$$\le \Big( \max_{1 \le j \le n} |||T(x_j)||| \Big) \sum_{i=1}^n |\alpha_i| \le B \Big( \max_{1 \le j \le n} |||T(x_j)||| \Big) \Big\| \sum_{i=1}^n \alpha_i x_i \Big\|.$$

That is, $|||T(x)||| \le C\,\|x\|$, where $C = B \max_{1 \le j \le n} |||T(x_j)|||$ (a constant depending only on $T$ and the choice of basis for $V$). From our last result, $T$ is continuous (bounded).
Math 682 Problem Set: Function Spaces 5/18/98
[Problems marked (▷) are essential to a full understanding of the course; we will discuss most of these in class. Problems marked (∗) are of general interest and are offered as a contribution to your personal growth. Unmarked problems are just for fun.]

The most important collection of functions for our purposes is the space $C[a, b]$, consisting of all continuous functions $f : [a, b] \to \mathbb{R}$. It's easy to see that $C[a, b]$ is a vector space under the usual pointwise operations on functions: $(f + g)(x) = f(x) + g(x)$ and $(\alpha f)(x) = \alpha f(x)$ for $\alpha \in \mathbb{R}$. Actually, we will be most interested in the finite-dimensional subspaces $\mathcal{P}_n$ of $C[a, b]$, consisting of all algebraic polynomials of degree at most $n$.

▷ 1. The subspace $\mathcal{P}_n$ has dimension exactly $n + 1$. Why?

Another useful subset of $C[a, b]$ is the collection $\mathrm{lip}_K \alpha$, consisting of all those $f$'s which satisfy a Lipschitz condition of order $\alpha > 0$ with constant $0 < K < \infty$; i.e., those $f$'s for which $|f(x) - f(y)| \le K |x - y|^\alpha$ for all $x$, $y$ in $[a, b]$. [Some authors would say that $f$ is Hölder continuous with exponent $\alpha$.]

2. (a) Show that $\mathrm{lip}_K \alpha$ is, indeed, a subset of $C[a, b]$.
   (b) If $\alpha > 1$, show that $\mathrm{lip}_K \alpha$ contains only the constant functions.
   (c) Show that $\sqrt{x}$ is in $\mathrm{lip}_1 (1/2)$ and that $\sin x$ is in $\mathrm{lip}_1 1$ on $[0, 1]$.
   (d) Show that the collection $\mathrm{lip}\,\alpha$, consisting of all those $f$'s which are in $\mathrm{lip}_K \alpha$ for some $K$, is a subspace of $C[a, b]$.
   (e) Show that $\mathrm{lip}\,1$ contains all the polynomials.
   (f) If $f \in \mathrm{lip}\,\alpha$ for some $\alpha > 0$, show that $f \in \mathrm{lip}\,\beta$ for all $0 < \beta < \alpha$.
   (g) Given $0 < \alpha < 1$, show that $x^\alpha$ is in $\mathrm{lip}_1 \alpha$ on $[0, 1]$ but not in $\mathrm{lip}\,\beta$ for any $\beta > \alpha$.

We will also want to consider a norm on the vector space $C[a, b]$; we typically use the uniform or sup norm (Rivlin calls this the Chebyshev norm) defined by $\|f\| = \max_{a \le x \le b} |f(x)|$. [Some authors write $\|f\|_u$ or $\|f\|_\infty$.]

3. Show that $\mathcal{P}_n$ and $\mathrm{lip}_K \alpha$ are closed subsets of $C[a, b]$ (under the sup norm). Is $\mathrm{lip}\,\alpha$ closed? A bit harder: Show that $\mathrm{lip}\,1$ is both first category and dense in $C[a, b]$.

▷ 4. Fix $n$ and consider the norm $\|p\|_1 = \sum_{k=0}^n |a_k|$ for $p(x) = a_0 + a_1 x + \cdots + a_n x^n \in \mathcal{P}_n$. Show that there are constants $0 < A_n, B_n < \infty$ such that $A_n \|p\|_1 \le \|p\| \le B_n \|p\|_1$, where $\|p\| = \max_{a \le x \le b} |p(x)|$. Do $A_n$ and $B_n$ really depend on $n$?
We will occasionally consider spaces of real-valued functions defined on finite sets; that is, we will consider $\mathbb{R}^n$ under various norms. (Why is this the same?) We define a scale of norms on $\mathbb{R}^n$ by $\|x\|_p = \big( \sum_{i=1}^n |x_i|^p \big)^{1/p}$, where $x = (x_1, \ldots, x_n)$ and $1 \le p < \infty$ (we need $p \ge 1$ in order for this expression to be a legitimate norm, but the expression makes perfect sense for any $p > 0$, and even for $p < 0$ provided no $x_i$ is 0). Notice, please, that the usual norm on $\mathbb{R}^n$ is given by $\|x\|_2$.

5. Show that $\lim_{p \to \infty} \|x\|_p = \max_{1 \le i \le n} |x_i|$. For this reason we define $\|x\|_\infty = \max_{1 \le i \le n} |x_i|$. Thus $\mathbb{R}^n$ under the norm $\|\cdot\|_\infty$ is the same as $C(\{1, 2, \ldots, n\})$ with its usual norm.

∗ 6. Assuming $x_i \ne 0$ for $i = 1, \ldots, n$, compute $\lim_{p \to 0^+} \|x\|_p$ and $\lim_{p \to -\infty} \|x\|_p$.

7. Consider $\mathbb{R}^2$ under the norm $\|x\|_p$. Draw the graph of the unit sphere $\{x : \|x\|_p = 1\}$ for various values of $p$ (especially $p = 1$, 2, $\infty$).
8. (Young's inequality): Let $1 < p < \infty$ and let $q$ satisfy $\frac{1}{p} + \frac{1}{q} = 1$. Show that $ab \le \frac{1}{p} a^p + \frac{1}{q} b^q$ for all $a, b \ge 0$, with equality if and only if $a^p = b^q$.

9. (Hölder's inequality): Let $1 < p < \infty$ and let $q$ satisfy $\frac{1}{p} + \frac{1}{q} = 1$. Show that
   (a) $\sum_{i=1}^n |a_i b_i| \le \big( \sum_{i=1}^n |a_i|^p \big)^{1/p} \big( \sum_{i=1}^n |b_i|^q \big)^{1/q}$, and
   (b) $\int_a^b |f(x)\, g(x)| \, dx \le \big( \int_a^b |f(x)|^p \, dx \big)^{1/p} \big( \int_a^b |g(x)|^q \, dx \big)^{1/q}$.
   Describe the case for equality in each inequality. What happens if $p = 1$ ($q = \infty$)?

10. (Minkowski's inequality): For $1 \le p < \infty$, show that
    (a) $\big( \sum_{i=1}^n |a_i + b_i|^p \big)^{1/p} \le \big( \sum_{i=1}^n |a_i|^p \big)^{1/p} + \big( \sum_{i=1}^n |b_i|^p \big)^{1/p}$, and that
    (b) $\big( \int_a^b |f(x) + g(x)|^p \, dx \big)^{1/p} \le \big( \int_a^b |f(x)|^p \, dx \big)^{1/p} + \big( \int_a^b |g(x)|^p \, dx \big)^{1/p}$.
    Describe the case for equality in each inequality. What happens if $p = 1$?
Exercise 10 shows that $\|\cdot\|_p$ is indeed a norm for $1 \le p < \infty$. We write $L_p[a, b]$ to mean the vector space of functions on $[a, b]$ for which the integral norm is defined and finite, we write $\ell_p^n$ to mean the vector space of sequences of length $n$, that is, $\mathbb{R}^n$ supplied with the norm $\|\cdot\|_p$, and we write $\ell_p$ to mean the vector space of infinite sequences $x = (x_n)_{n=1}^\infty$ for which $\|x\|_p < \infty$. In each space, the usual algebraic operations are defined pointwise (or coordinatewise) and the norm is understood to be $\|\cdot\|_p$.

A normed space $(X, \|\cdot\|)$ is said to be strictly convex if $\|x + y\| = \|x\| + \|y\|$ always implies that $x$ and $y$ lie in the same direction; that is, either $x = \lambda y$ or $y = \lambda x$ for some nonnegative scalar $\lambda$. Equivalently, $X$ is strictly convex if the triangle inequality is always strict on nonparallel vectors.
11. Prove that the following are equivalent:
    (a) $(X, \|\cdot\|)$ is strictly convex.
    (b) If $x, y \in X$ are nonparallel, then $\big\| \frac{x + y}{2} \big\| < \frac{\|x\| + \|y\|}{2}$.
    (c) If $x \ne y \in X$ with $\|x\| = 1 = \|y\|$, then $\big\| \frac{x + y}{2} \big\| < 1$.

12. Show that $L_p$ and $\ell_p$ are strictly convex for $1 < p < \infty$. Show also that this fails in case $p = 1$. [Hint: This is actually a statement about the function $|t|^p$, $1 < p < \infty$.]

Strictly convex spaces are of interest when considering the problem of nearest points: Given a nonempty subset $K$ of a normed space $X$ and a point $x \notin K$, we ask whether there is a best approximation to $x$ from elements of $K$; that is, we want to know if there exist one or more points $y_0 \in K$ satisfying

$$\|x - y_0\| = \inf_{y \in K} \|x - y\| = \mathrm{dist}(x, K).$$

It's not hard to see that a satisfactory answer to this question will require that we take $K$ to be a closed set in $X$ (for otherwise the points in $\overline{K} \setminus K$ wouldn't have nearest points). Less easy to see is that we typically also want to assume that $K$ is a convex set. Recall that a subset $K$ of a vector space $X$ is said to be convex if it contains the line segment joining any pair of its points; that is, $K$ is convex if

$$x, y \in K, \ 0 \le \lambda \le 1 \implies \lambda x + (1 - \lambda) y \in K.$$

Obviously, any subspace of $X$ is a convex set and, for our purposes at least, this is the most important example.

13. Let $X$ be a normed space and let $B = \{ x \in X : \|x\| \le 1 \}$. Show that $B$ is a closed convex set.

14. Consider $\mathbb{R}^2$ under the norm $\|\cdot\|_1$. Let $B = \{ y \in \mathbb{R}^2 : \|y\|_1 \le 1 \}$ and let $x = (2, 0)$. Show that there are infinitely many points in $B$ nearest to $x$.

15. (a) Let $K = \{ f \in L_1[0, 1] : f \ge 0 \text{ and } \|f\|_1 = 1 \}$. Show that $K$ is a closed convex set in $L_1[0, 1]$, that $0 \notin K$, and that every point in $K$ is a nearest point to 0.
    (b) Let $K = \{ f \in C[0, 1] : f(0) = 0 \text{ and } \int_0^1 f = 1 \}$. Again, show that $K$ is a closed convex set in $C[0, 1]$, that $0 \notin K$, but that no point in $K$ is nearest to 0.

16. Let $K$ be a compact convex set in a strictly convex space $X$ and let $x \in X$. Show that $x$ has a unique nearest point $y_0 \in K$.

17. Let $K$ be a closed subset of a complete normed space $X$. Prove that $K$ is convex if and only if $K$ is midpoint convex; that is, if and only if $(x + y)/2 \in K$ whenever $x, y \in K$. Is this result true in more general settings? For example, can you prove it without assuming completeness? Or, for that matter, is it true for arbitrary sets in any vector space (i.e., without even assuming the presence of a norm)?
Math 682 Approximation by Algebraic Polynomials 5/20/98

Introduction

Let's begin with some notation. Throughout, we're concerned with the problem of best (uniform) approximation of a given function $f \in C[a, b]$ by elements from $\mathcal{P}_n$, the subspace of algebraic polynomials of degree at most $n$ in $C[a, b]$. We know that the problem has a solution (possibly more than one), which we've chosen to write as $p_n^*$. We set

$$E_n(f) = \min_{p \in \mathcal{P}_n} \|f - p\| = \|f - p_n^*\|.$$

Since $\mathcal{P}_n \subset \mathcal{P}_{n+1}$ for each $n$, it's clear that $E_n(f) \ge E_{n+1}(f)$ for each $n$. Our goal in this chapter is to prove that $E_n(f) \to 0$. We'll accomplish this by proving:

Theorem (The Weierstrass Approximation Theorem, 1885). Let $f \in C[a, b]$. Then, for every $\varepsilon > 0$, there is a polynomial $p$ such that $\|f - p\| < \varepsilon$.

It follows from the Weierstrass theorem that $p_n^* \rightrightarrows f$ for each $f \in C[a, b]$. (Why?) This is an important first step in determining the exact nature of $E_n(f)$ as a function of $f$ and $n$. We'll look for much more precise information in later sections.
Now there are many proofs of the Weierstrass theorem (a mere three are outlined in the exercises, but there are hundreds!), and all of them start with one simplification: The underlying interval $[a, b]$ is of no consequence.

Lemma. If the Weierstrass theorem holds for $C[0, 1]$, then it also holds for $C[a, b]$, and conversely. In fact, $C[0, 1]$ and $C[a, b]$ are, for all practical purposes, identical: They are linearly isometric as normed spaces, order isomorphic as lattices, and isomorphic as algebras (rings).

Proof. We'll settle for proving only the first assertion; the second is outlined in the exercises (and uses a similar argument).

Given $f \in C[a, b]$, notice that the function

$$g(x) = f\big( a + (b - a)x \big), \quad 0 \le x \le 1,$$

defines an element of $C[0, 1]$. Now, given $\varepsilon > 0$, suppose that we can find a polynomial $p$ such that $\|g - p\| < \varepsilon$; in other words, suppose that

$$\max_{0 \le x \le 1} \big| f\big( a + (b - a)x \big) - p(x) \big| < \varepsilon.$$

Then,

$$\max_{a \le t \le b} \Big| f(t) - p\Big( \frac{t - a}{b - a} \Big) \Big| < \varepsilon.$$

(Why?) But if $p(x)$ is a polynomial in $x$, then $q(t) = p\big( \frac{t - a}{b - a} \big)$ is a polynomial in $t$ (again, why?) satisfying $\|f - q\| < \varepsilon$.

The proof of the converse is entirely similar: If $g(x)$ is an element of $C[0, 1]$, then $f(t) = g\big( \frac{t - a}{b - a} \big)$, $a \le t \le b$, defines an element of $C[a, b]$. Moreover, if $q(t)$ is a polynomial in $t$ approximating $f(t)$, then $p(x) = q(a + (b - a)x)$ is a polynomial in $x$ approximating $g(x)$. The remaining details are left as an exercise.

The point to our first result is that it suffices to prove the Weierstrass theorem for any interval we like; $[0, 1]$ and $[-1, 1]$ are popular choices, but it hardly matters which interval we use.
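The change of variables in this proof is easy to watch in action. In the sketch below (my own illustration; the target f = cos on [2, 5] and the degree-8 least-squares fit are arbitrary choices), the uniform error is unchanged when the approximation is transported from [0, 1] back to [a, b]:

    import numpy as np

    a, b = 2.0, 5.0
    f = np.cos                         # target on [a, b]
    g = lambda x: f(a + (b - a) * x)   # its copy on [0, 1]

    # Any polynomial p approximating g on [0, 1] ...
    x = np.linspace(0.0, 1.0, 1001)
    p = np.polynomial.Polynomial.fit(x, g(x), deg=8)

    # ... yields q(t) = p((t - a)/(b - a)) approximating f on [a, b].
    t = np.linspace(a, b, 1001)
    q = lambda s: p((s - a) / (b - a))
    print(np.max(np.abs(g(x) - p(x))), np.max(np.abs(f(t) - q(t))))
    # the two uniform errors agree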
Bernstein's Proof

The proof of the Weierstrass theorem we present here is due to the great Russian mathematician S. N. Bernstein in 1912. Bernstein's proof is of interest to us for a variety of reasons; perhaps most important is that Bernstein actually displays a sequence of polynomials that approximate a given $f \in C[0, 1]$. Moreover, as we'll see later, Bernstein's proof generalizes to yield a powerful, unifying theorem, called the Bohman-Korovkin theorem.

If $f$ is any bounded function on $[0, 1]$, we define the sequence of Bernstein polynomials for $f$ by

$$\big( B_n(f) \big)(x) = \sum_{k=0}^n f\Big( \frac{k}{n} \Big) \binom{n}{k} x^k (1 - x)^{n-k}, \quad 0 \le x \le 1.$$

Please note that $B_n(f)$ is a polynomial of degree at most $n$. Also, it's easy to see that $\big( B_n(f) \big)(0) = f(0)$ and $\big( B_n(f) \big)(1) = f(1)$. In general, $\big( B_n(f) \big)(x)$ is an average of the numbers $f(k/n)$, $k = 0, \ldots, n$. Bernstein's theorem states that $B_n(f) \rightrightarrows f$ for each $f \in C[0, 1]$. Surprisingly, the proof actually only requires that we check three easy cases:

$$f_0(x) = 1, \quad f_1(x) = x, \quad \text{and} \quad f_2(x) = x^2.$$
This, and more, is the content of the following lemma.

Lemma. (i) $B_n(f_0) = f_0$ and $B_n(f_1) = f_1$.

(ii) $B_n(f_2) = \big( 1 - \frac{1}{n} \big) f_2 + \frac{1}{n} f_1$, and hence $B_n(f_2) \rightrightarrows f_2$.

(iii) $\displaystyle \sum_{k=0}^n \Big( \frac{k}{n} - x \Big)^2 \binom{n}{k} x^k (1 - x)^{n-k} = \frac{x(1 - x)}{n} \le \frac{1}{4n}$, if $0 \le x \le 1$.

(iv) Given $\delta > 0$ and $0 \le x \le 1$, let $F$ denote the set of $k$'s in $\{0, \ldots, n\}$ for which $\big| \frac{k}{n} - x \big| \ge \delta$. Then $\displaystyle \sum_{k \in F} \binom{n}{k} x^k (1 - x)^{n-k} \le \frac{1}{4n\delta^2}$.

Proof. That $B_n(f_0) = f_0$ follows from the binomial formula:

$$\sum_{k=0}^n \binom{n}{k} x^k (1 - x)^{n-k} = [\, x + (1 - x) \,]^n = 1.$$

To see that $B_n(f_1) = f_1$, first notice that for $k \ge 1$ we have

$$\frac{k}{n} \binom{n}{k} = \frac{(n-1)!}{(k-1)! \, (n-k)!} = \binom{n-1}{k-1}.$$

Consequently,

$$\sum_{k=0}^n \frac{k}{n} \binom{n}{k} x^k (1 - x)^{n-k} = x \sum_{k=1}^n \binom{n-1}{k-1} x^{k-1} (1 - x)^{n-k} = x \sum_{j=0}^{n-1} \binom{n-1}{j} x^j (1 - x)^{(n-1)-j} = x.$$

Next, to compute $B_n(f_2)$, we rewrite twice:

$$\Big( \frac{k}{n} \Big)^2 \binom{n}{k} = \frac{k}{n} \binom{n-1}{k-1} = \frac{n-1}{n} \cdot \frac{k-1}{n-1} \binom{n-1}{k-1} + \frac{1}{n} \binom{n-1}{k-1} \quad \text{if } k \ge 1$$
$$= \Big( 1 - \frac{1}{n} \Big) \binom{n-2}{k-2} + \frac{1}{n} \binom{n-1}{k-1} \quad \text{if } k \ge 2.$$

Thus,

$$\sum_{k=0}^n \Big( \frac{k}{n} \Big)^2 \binom{n}{k} x^k (1 - x)^{n-k} = \Big( 1 - \frac{1}{n} \Big) \sum_{k=2}^n \binom{n-2}{k-2} x^k (1 - x)^{n-k} + \frac{1}{n} \sum_{k=1}^n \binom{n-1}{k-1} x^k (1 - x)^{n-k}$$
$$= \Big( 1 - \frac{1}{n} \Big) x^2 + \frac{1}{n}\, x,$$

which establishes (ii), since $\|B_n(f_2) - f_2\| = \frac{1}{n} \|f_1 - f_2\| \to 0$ as $n \to \infty$.

To prove (iii) we combine the results in (i) and (ii) and simplify. Since $\big( \frac{k}{n} - x \big)^2 = \big( \frac{k}{n} \big)^2 - 2x \cdot \frac{k}{n} + x^2$, we get

$$\sum_{k=0}^n \Big( \frac{k}{n} - x \Big)^2 \binom{n}{k} x^k (1 - x)^{n-k} = \Big( 1 - \frac{1}{n} \Big) x^2 + \frac{1}{n}\, x - 2x^2 + x^2 = \frac{1}{n}\, x(1 - x) \le \frac{1}{4n}$$

for $0 \le x \le 1$.

Finally, to prove (iv), note that $1 \le \big( \frac{k}{n} - x \big)^2 / \delta^2$ for $k \in F$, and hence

$$\sum_{k \in F} \binom{n}{k} x^k (1 - x)^{n-k} \le \frac{1}{\delta^2} \sum_{k \in F} \Big( \frac{k}{n} - x \Big)^2 \binom{n}{k} x^k (1 - x)^{n-k}$$
$$\le \frac{1}{\delta^2} \sum_{k=0}^n \Big( \frac{k}{n} - x \Big)^2 \binom{n}{k} x^k (1 - x)^{n-k} \le \frac{1}{4n\delta^2}, \quad \text{from (iii)}.$$

Now we're ready for the proof of Bernstein's theorem:

Proof. Let $f \in C[0, 1]$ and let $\varepsilon > 0$. Then, since $f$ is uniformly continuous, there is a $\delta > 0$ such that $|f(x) - f(y)| < \varepsilon/2$ whenever $|x - y| < \delta$. Now we use the previous lemma to estimate $\|f - B_n(f)\|$. First notice that since the numbers $\binom{n}{k} x^k (1 - x)^{n-k}$ are nonnegative and sum to 1, we have

$$|f(x) - \big( B_n(f) \big)(x)| = \Big| \sum_{k=0}^n \Big[ f(x) - f\Big( \frac{k}{n} \Big) \Big] \binom{n}{k} x^k (1 - x)^{n-k} \Big|$$
$$\le \sum_{k=0}^n \Big| f(x) - f\Big( \frac{k}{n} \Big) \Big| \binom{n}{k} x^k (1 - x)^{n-k}.$$

Now fix $n$ (to be specified in a moment) and let $F$ denote the set of $k$'s in $\{0, \ldots, n\}$ for which $|(k/n) - x| \ge \delta$. Then $|f(x) - f(k/n)| < \varepsilon/2$ for $k \notin F$, while $|f(x) - f(k/n)| \le 2\|f\|$ for $k \in F$. Thus,

$$|f(x) - \big( B_n(f) \big)(x)| \le \frac{\varepsilon}{2} \sum_{k \notin F} \binom{n}{k} x^k (1 - x)^{n-k} + 2\|f\| \sum_{k \in F} \binom{n}{k} x^k (1 - x)^{n-k}$$
$$< \frac{\varepsilon}{2} \cdot 1 + 2\|f\| \cdot \frac{1}{4n\delta^2}, \quad \text{from (iv) of the Lemma},$$
$$< \varepsilon, \quad \text{provided that } n > \|f\| / \varepsilon \delta^2.$$
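Since $B_n(f)$ is completely explicit, it is easy to compute. The following sketch (my own illustration, not part of the notes) evaluates $B_n(f)$ straight from the definition and tabulates the uniform error for the arbitrary choice $f(x) = e^x$:

    import numpy as np
    from math import comb

    def bernstein(f, n, x):
        # B_n(f)(x) = sum_k f(k/n) C(n,k) x^k (1-x)^(n-k)
        return sum(f(k / n) * comb(n, k) * x**k * (1 - x)**(n - k)
                   for k in range(n + 1))

    f = lambda x: np.exp(x)
    x = np.linspace(0.0, 1.0, 1001)
    for n in (1, 10, 100, 500):
        print(n, np.max(np.abs(f(x) - bernstein(f, n, x))))
    # the uniform error decreases (slowly!) to 0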
Landau's Proof

Just because it's good for us, let's give a second proof of Weierstrass's theorem. This one is due to Landau in 1908. First, given $f \in C[0, 1]$, notice that it suffices to approximate $f - p$, where $p$ is any polynomial. (Why?) In particular, by subtracting the linear function $f(0) + x(f(1) - f(0))$, we may suppose that $f(0) = f(1) = 0$ and, hence, that $f \equiv 0$ outside $[0, 1]$. That is, we may suppose that $f$ is defined and uniformly continuous on all of $\mathbb{R}$.

Again we will display a sequence of polynomials that converge uniformly to $f$; this time we define

$$L_n(x) = c_n \int_{-1}^1 f(x + t)\, (1 - t^2)^n \, dt,$$

where $c_n$ is chosen so that

$$c_n \int_{-1}^1 (1 - t^2)^n \, dt = 1.$$

Note that by our assumptions on $f$, we may rewrite this expression as

$$L_n(x) = c_n \int_{-x}^{1-x} f(x + t)\, (1 - t^2)^n \, dt = c_n \int_0^1 f(t)\, \big( 1 - (t - x)^2 \big)^n \, dt.$$

Written this way, it's clear that $L_n$ is a polynomial in $x$ (of degree at most $2n$).

We first need to estimate $c_n$. An easy induction argument will convince you that $(1 - t^2)^n \ge 1 - nt^2$, and so we get

$$\int_{-1}^1 (1 - t^2)^n \, dt \ge 2 \int_0^{1/\sqrt{n}} (1 - nt^2) \, dt = \frac{4}{3\sqrt{n}} > \frac{1}{\sqrt{n}},$$

from which it follows that $c_n < \sqrt{n}$. In particular, for any $0 < \delta < 1$,

$$c_n \int_{\delta}^1 (1 - t^2)^n \, dt < \sqrt{n}\, (1 - \delta^2)^n \to 0 \quad (n \to \infty),$$

which is the inequality we'll need.

Next, let $\varepsilon > 0$ be given, and choose $0 < \delta < 1$ such that

$$|f(x) - f(y)| \le \varepsilon/2 \quad \text{whenever } |x - y| \le \delta.$$

Then, since $c_n (1 - t^2)^n \ge 0$ and integrates to 1, we get

$$|L_n(x) - f(x)| = \Big| c_n \int_{-1}^1 \big[ f(x + t) - f(x) \big] (1 - t^2)^n \, dt \Big|$$
$$\le c_n \int_{-1}^1 |f(x + t) - f(x)| \, (1 - t^2)^n \, dt$$
$$\le \frac{\varepsilon}{2}\, c_n \int_{-\delta}^{\delta} (1 - t^2)^n \, dt + 2\|f\|\, c_n \int_{\delta \le |t| \le 1} (1 - t^2)^n \, dt$$
$$\le \frac{\varepsilon}{2} + 4\|f\|\, \sqrt{n}\, (1 - \delta^2)^n < \varepsilon,$$

provided that $n$ is sufficiently large.
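Landau's polynomials are just as easy to tabulate if the integral is replaced by a Riemann sum; in this sketch (mine, not from the notes) the target $f(x) = \sin(\pi x)$ already satisfies $f(0) = f(1) = 0$:

    import numpy as np

    # f, extended by 0 outside [0, 1]
    f = lambda s: np.where((s >= 0) & (s <= 1), np.sin(np.pi * s), 0.0)

    t = np.linspace(-1.0, 1.0, 20001)
    dt = t[1] - t[0]
    x = np.linspace(0.0, 1.0, 201)
    for n in (5, 50, 500):
        kernel = (1 - t**2)**n
        c_n = 1.0 / (kernel.sum() * dt)  # normalizing constant
        L = np.array([c_n * (f(xi + t) * kernel).sum() * dt for xi in x])
        print(n, np.max(np.abs(L - np.sin(np.pi * x))))  # error shrinks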
A third proof of the Weierstrass theorem, due to Lebesgue in 1898, is outlined in the exercises. Lebesgue's proof is of particular interest since it inspired Stone's version of the Weierstrass theorem; we'll discuss the Stone-Weierstrass theorem a bit later in the course.

Before we go on, let's stop and make an observation or two: While the Bernstein polynomials $B_n(f)$ offer a convenient and explicit polynomial approximation to $f$, they are by no means the best approximations. Indeed, recall that if $f_1(x) = x$ and $f_2(x) = x^2$, then $B_n(f_2) = (1 - \frac{1}{n}) f_2 + \frac{1}{n} f_1 \ne f_2$. Clearly, the best approximation to $f_2$ out of $\mathcal{P}_n$ should be $f_2$ itself whenever $n \ge 2$. On the other hand, since we always have

$$E_n(f) \le \|f - B_n(f)\| \quad \text{(why?)},$$

a detailed understanding of Bernstein's proof will lend insight into the general problem of polynomial approximation. Our next project, then, is to improve upon our estimate of the error $\|f - B_n(f)\|$.

Improved Estimates

To begin, we will need a bit more notation. The modulus of continuity of a bounded function $f$ on the interval $[a, b]$ is defined by

$$\omega_f(\delta) = \omega_f([a, b]; \delta) = \sup\big\{ \, |f(x) - f(y)| : x, y \in [a, b], \ |x - y| \le \delta \, \big\}$$

for any $\delta > 0$. Note that $\omega_f(\delta)$ is a measure of the "$\varepsilon$" that goes along with $\delta$ (in the definition of uniform continuity); literally, we have written $\varepsilon = \omega_f(\delta)$ as a function of $\delta$.

Here are a few easy facts about the modulus of continuity:

Exercises

1. We always have $|f(x) - f(y)| \le \omega_f(|x - y|)$ for any $x \ne y \in [a, b]$.
2. If $0 < \delta' \le \delta$, then $\omega_f(\delta') \le \omega_f(\delta)$.
3. $f$ is uniformly continuous if and only if $\omega_f(\delta) \to 0$ as $\delta \to 0^+$.
4. If $f'$ exists and is bounded on $[a, b]$, then $\omega_f(\delta) \le K\delta$ for some constant $K$.
5. More generally, we say that $f$ satisfies a Lipschitz condition of order $\alpha$ with constant $K$, where $0 < \alpha \le 1$ and $0 \le K < \infty$, if $|f(x) - f(y)| \le K |x - y|^\alpha$ for all $x$, $y$. We abbreviate this statement by the symbols: $f \in \mathrm{lip}_K \alpha$. Check that if $f \in \mathrm{lip}_K \alpha$, then $\omega_f(\delta) \le K \delta^\alpha$ for all $\delta > 0$.

For the time being, we actually only need one simple fact about $\omega_f(\delta)$:

Lemma. Let $f$ be a bounded function on $[a, b]$ and let $\delta > 0$. Then, $\omega_f(n\delta) \le n\, \omega_f(\delta)$ for $n = 1, 2, \ldots$. Consequently, $\omega_f(\lambda \delta) \le (1 + \lambda)\, \omega_f(\delta)$ for any $\lambda > 0$.

Proof. Given $x < y$ with $|x - y| \le n\delta$, split the interval $[x, y]$ into $n$ pieces, each of length at most $\delta$. Specifically, if we set $z_k = x + k(y - x)/n$, for $k = 0, 1, \ldots, n$, then $|z_k - z_{k-1}| \le \delta$ for any $k \ge 1$, and so

$$|f(x) - f(y)| = \Big| \sum_{k=1}^n f(z_k) - f(z_{k-1}) \Big| \le \sum_{k=1}^n |f(z_k) - f(z_{k-1})| \le n\, \omega_f(\delta).$$

Thus, $\omega_f(n\delta) \le n\, \omega_f(\delta)$.

The second assertion follows from the first (and one of our exercises). Given $\lambda > 0$, choose an integer $n$ so that $n - 1 < \lambda \le n$. Then,

$$\omega_f(\lambda \delta) \le \omega_f(n\delta) \le n\, \omega_f(\delta) \le (1 + \lambda)\, \omega_f(\delta).$$

We next repeat the proof of Bernstein's theorem, making a few minor adjustments here and there.

Theorem. For any bounded function $f$ on $[0, 1]$ we have

$$\|f - B_n(f)\| \le \frac{3}{2}\, \omega_f\Big( \frac{1}{\sqrt{n}} \Big).$$

In particular, if $f \in C[0, 1]$, then $E_n(f) \le \frac{3}{2}\, \omega_f\big( \frac{1}{\sqrt{n}} \big) \to 0$ as $n \to \infty$.

Proof. We first do some term juggling:

$$|f(x) - \big( B_n(f) \big)(x)| = \Big| \sum_{k=0}^n \Big[ f(x) - f\Big( \frac{k}{n} \Big) \Big] \binom{n}{k} x^k (1 - x)^{n-k} \Big|$$
$$\le \sum_{k=0}^n \Big| f(x) - f\Big( \frac{k}{n} \Big) \Big| \binom{n}{k} x^k (1 - x)^{n-k}$$
$$\le \sum_{k=0}^n \omega_f\Big( \Big| x - \frac{k}{n} \Big| \Big) \binom{n}{k} x^k (1 - x)^{n-k}$$
$$\le \omega_f\Big( \frac{1}{\sqrt{n}} \Big) \sum_{k=0}^n \Big( 1 + \sqrt{n}\, \Big| x - \frac{k}{n} \Big| \Big) \binom{n}{k} x^k (1 - x)^{n-k}$$
$$= \omega_f\Big( \frac{1}{\sqrt{n}} \Big) \Big[ \, 1 + \sqrt{n} \sum_{k=0}^n \Big| x - \frac{k}{n} \Big| \binom{n}{k} x^k (1 - x)^{n-k} \, \Big],$$

where the third inequality follows from our previous Lemma (by taking $\lambda = \sqrt{n}\, \big| x - \frac{k}{n} \big|$ and $\delta = \frac{1}{\sqrt{n}}$). All that remains is to estimate the sum, and for this we'll use Cauchy-Schwarz (and our earlier observations about Bernstein polynomials). Since each of the terms $\binom{n}{k} x^k (1 - x)^{n-k}$ is nonnegative, we have

$$\sum_{k=0}^n \Big| x - \frac{k}{n} \Big| \binom{n}{k} x^k (1 - x)^{n-k} \le \Big[ \sum_{k=0}^n \Big( x - \frac{k}{n} \Big)^2 \binom{n}{k} x^k (1 - x)^{n-k} \Big]^{1/2} \Big[ \sum_{k=0}^n \binom{n}{k} x^k (1 - x)^{n-k} \Big]^{1/2}$$
$$\le \Big( \frac{1}{4n} \Big)^{1/2} = \frac{1}{2\sqrt{n}}.$$

Finally,

$$|f(x) - \big( B_n(f) \big)(x)| \le \omega_f\Big( \frac{1}{\sqrt{n}} \Big) \Big( 1 + \sqrt{n} \cdot \frac{1}{2\sqrt{n}} \Big) = \frac{3}{2}\, \omega_f\Big( \frac{1}{\sqrt{n}} \Big).$$
Examples

1. If $f \in \mathrm{lip}_K \alpha$, it follows that $\|f - B_n(f)\| \le \frac{3}{2} K n^{-\alpha/2}$ and hence $E_n(f) \le \frac{3}{2} K n^{-\alpha/2}$.

2. As a particular case of the first example, consider $f(x) = \big| x - \frac{1}{2} \big|$ on $[0, 1]$. Then $f \in \mathrm{lip}_1 1$, and so $\|f - B_n(f)\| \le \frac{3}{2} n^{-1/2}$. But, as Rivlin points out (see Remark 3 on p. 16 of his book), $\|f - B_n(f)\|$ is also at least a constant multiple of $n^{-1/2}$. Thus, we can't hope to improve on the power of $n$ in this estimate. Nevertheless, we will see an improvement in our estimate of $E_n(f)$.
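A quick computation (my own illustration, not part of the notes) makes Example 2 vivid: $\sqrt{n}\, \|f - B_n(f)\|$ stays bounded above and away from zero.

    import numpy as np
    from math import comb

    def bernstein(f, n, x):
        return sum(f(k / n) * comb(n, k) * x**k * (1 - x)**(n - k)
                   for k in range(n + 1))

    f = lambda x: np.abs(x - 0.5)
    x = np.linspace(0.0, 1.0, 2001)
    for n in (10, 40, 160, 640):
        err = np.max(np.abs(f(x) - bernstein(f, n, x)))
        print(n, err, np.sqrt(n) * err)  # sqrt(n)*err hovers near 0.4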
The Bohman-Korovkin Theorem

The real value to us in Bernstein's approach is that the map $f \mapsto B_n(f)$, while providing a simple formula for an approximating polynomial, is also linear and positive. In other words,

$$B_n(f + g) = B_n(f) + B_n(g),$$
$$B_n(\alpha f) = \alpha B_n(f), \quad \alpha \in \mathbb{R},$$

and

$$B_n(f) \ge 0 \quad \text{whenever } f \ge 0.$$

As it happens, any positive, linear map $T : C[0, 1] \to C[0, 1]$ is necessarily also continuous!

Lemma. If $T : C[a, b] \to C[a, b]$ is both positive and linear, then $T$ is continuous.

Proof. First note that a positive, linear map is also monotone. That is, $T$ satisfies $T(f) \le T(g)$ whenever $f \le g$. (Why?) Thus, for any $f \in C[a, b]$, we have

$$-|f| \le f \le |f| \implies -T(|f|) \le T(f) \le T(|f|);$$

that is, $|T(f)| \le T(|f|)$. But now $|f| \le \|f\| \cdot 1$, where $1$ denotes the constant 1 function, and so we get

$$|T(f)| \le T(|f|) \le \|f\| \, T(1).$$

Thus,

$$\|T(f)\| \le \|f\| \, \|T(1)\|$$

for any $f \in C[a, b]$. Finally, since $T$ is linear, it follows that $T$ is Lipschitz with constant $\|T(1)\|$:

$$\|T(f) - T(g)\| = \|T(f - g)\| \le \|T(1)\| \, \|f - g\|.$$

Consequently, $T$ is continuous.
Now positive, linear maps abound in analysis, so this is a fortunate turn of events. What's more, Bernstein's theorem generalizes very nicely when placed in this new setting. The following elegant theorem was proved (independently) by Bohman and Korovkin in, roughly, 1952.

Theorem. Let $T_n : C[0, 1] \to C[0, 1]$ be a sequence of positive, linear maps, and suppose that $T_n(f) \to f$ uniformly in each of the three cases

$$f_0(x) = 1, \quad f_1(x) = x, \quad \text{and} \quad f_2(x) = x^2.$$

Then, $T_n(f) \to f$ uniformly for every $f \in C[0, 1]$.

The proof of the Bohman-Korovkin theorem is essentially identical to the proof of Bernstein's theorem except, of course, we write $T_n(f)$ in place of $B_n(f)$. For full details, see Cheney's book An Introduction to Approximation Theory, Chelsea, 1982. Rather than proving the theorem, let's settle for a quick application.
Algebraic Polynomials 33
Example
Let $f \in C[0,1]$ and, for each $n$, let $L_n(f)$ be the "polygonal" approximation to $f$ with
nodes at $k/n$, $k = 0, 1, \ldots, n$. That is, $L_n(f)$ is linear on each subinterval $[(k-1)/n, k/n]$
and agrees with $f$ at each of the endpoints: $L_n(f)(k/n) = f(k/n)$. Then, $L_n(f) \to f$
uniformly for each $f \in C[0,1]$. This is actually an easy calculation all by itself, but let's
see why the Bohman-Korovkin theorem makes short work of it.
That $L_n(f)$ is positive and linear is (nearly) obvious; that $L_n(f_0) = f_0$ and $L_n(f_1) = f_1$
are really easy since, in fact, $L_n(f) = f$ for any linear function $f$. We just need to show
that $L_n(f_2) \rightrightarrows f_2$. But a picture will convince you that the maximum distance between
$L_n(f_2)$ and $f_2$ on the interval $[(k-1)/n, k/n]$ is at most
$$\Big(\frac{k}{n}\Big)^2 - \Big(\frac{k-1}{n}\Big)^2 = \frac{2k-1}{n^2} \le \frac{2}{n}.$$
That is, $\|f_2 - L_n(f_2)\| \le 2/n \to 0$ as $n \to \infty$.
[Note that $L_n$ is a linear projection from $C[0,1]$ onto the subspace of polygonal
functions based on the nodes $k/n$, $k = 0, \ldots, n$. An easy calculation, similar in spirit
to the example above, will show that $\|f - L_n(f)\| \le 2\,\omega_f(1/n) \to 0$ as $n \to \infty$ for any
$f \in C[0,1]$.]
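In Python, np.interp implements exactly this polygonal interpolation, so the $2/n$ bound for $f_2$ is easy to watch (a quick illustrative sketch of mine, with an arbitrary test grid):

    import numpy as np

    def polygonal(f, n, x):
        """Piecewise-linear interpolation of f at the nodes k/n (the operator L_n)."""
        nodes = np.linspace(0, 1, n + 1)
        return np.interp(x, nodes, f(nodes))

    f2 = lambda t: t**2
    x = np.linspace(0, 1, 2001)
    for n in [5, 50, 500]:
        # Observed error is O(1/n^2), comfortably within the 2/n bound above.
        print(n, np.max(np.abs(f2(x) - polygonal(f2, n, x))))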
Math 682 Problem Set: Uniform Approximation by Polynomials 5/20/98

One of our first tasks will be to give a constructive proof of Weierstrass's Theorem, stating
that each $f \in C[a,b]$ is the uniform limit of a sequence of polynomials. As it happens, the
choice of interval $[a,b]$ is inconsequential: If Weierstrass's theorem is true for one, then
it's true for all.
▷ 18. Define $\sigma : [0,1] \to [a,b]$ by $\sigma(t) = a + t(b-a)$ for $0 \le t \le 1$, and define a transfor-
mation $T_\sigma : C[a,b] \to C[0,1]$ by $(T_\sigma(f))(t) = f(\sigma(t))$. Prove that $T_\sigma$ satisfies:
(a) $T_\sigma(f + g) = T_\sigma(f) + T_\sigma(g)$ and $T_\sigma(cf) = c\, T_\sigma(f)$ for $c \in \mathbb{R}$.
(b) $T_\sigma(fg) = T_\sigma(f)\, T_\sigma(g)$. In particular, $T_\sigma$ maps polynomials to polynomials.
(c) $T_\sigma(f) \le T_\sigma(g)$ if and only if $f \le g$.
(d) $\|T_\sigma(f)\| = \|f\|$.
(e) $T_\sigma$ is both one-to-one and onto. Moreover, $(T_\sigma)^{-1} = T_{\sigma^{-1}}$.
The point to exercise 18 is that $C[a,b]$ and $C[0,1]$ are identical as vector spaces, metric
spaces, algebras, and lattices. For all practical purposes, they are one and the same space.
While Bernstein's proof of the Weierstrass theorem (below) will prove most useful for our
purposes, there are many others; two of these (in the case of $C[0,1]$) are sketched below.
19. (Landau's proof): For each $n = 1, 2, \ldots$ and $0 \le \delta \le 1$, define $I_n(\delta) = \int_\delta^1 (1-x^2)^n\, dx$.
Show that $I_n(\delta)/I_n(0) \to 0$ as $n \to \infty$ for any $\delta > 0$. Now, given $f \in C[0,1]$ with
$f(0) = f(1) = 0$, show that the polynomial $L_n(x) = (2 I_n(0))^{-1} \int_0^1 f(t)\,(1 - (t-x)^2)^n\, dt$
converges uniformly to $f(x)$ on $[0,1]$ as $n \to \infty$. [Hint: You may assume that $f \equiv 0$
outside of $[0,1]$.] To get the result for general $f \in C[0,1]$, we simply need to subtract
the linear function $f(0) + x(f(1) - f(0))$.
20. (Lebesgue's proof): Given $f \in C[0,1]$, first show that $f$ can be uniformly approxi-
mated by a polygonal function. Specifically, given a positive integer $N$, define $L(x)$
by the conditions $L(k/N) = f(k/N)$ for $k = 0, 1, \ldots, N$, and $L(x)$ is linear for
$k/N \le x \le (k+1)/N$; show that $\|f - L\|$ is small provided that $N$ is sufficiently large.
The function $L(x)$ can be written (uniquely) as a linear combination of the "angles"
$\varphi_k(x) = |x - k/N| + (x - k/N)$ and $\varphi_N(x) = 1$; the equation $L(x) = \sum_{k=0}^N c_k \varphi_k(x)$ can
be solved since the system of equations $L(k/N) = \sum_{k=0}^N c_k \varphi_k(k/N)$, $k = 0, \ldots, N$,
can be solved (uniquely) for $c_0, \ldots, c_N$. (How?) To finish the proof, we need to show
that $|x|$ can be approximated by polynomials on any interval $[a,b]$. (Why?)
21. Here's an elementary proof that there is a sequence of polynomials $(P_n)$ converging
uniformly to $|x|$ on $[-1,1]$. (A numeric sketch of the recursion follows this problem.)
(a) Define $(P_n)$ recursively by $P_{n+1}(x) = P_n(x) + [\,x - P_n(x)^2\,]/2$, where $P_0(x) = 0$.
Clearly, each $P_n$ is a polynomial.
(b) Check that $0 \le P_n(x) \le P_{n+1}(x) \le \sqrt{x}$ for $0 \le x \le 1$. Use Dini's theorem to
conclude that $P_n(x) \rightrightarrows \sqrt{x}$ on $[0,1]$.
(c) $P_n(x^2)$ is also a polynomial, and $P_n(x^2) \rightrightarrows |x|$ on $[-1,1]$.
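The promised numeric sketch of problem 21's recursion (the iteration is from the problem; the loop bound is my own, and the convergence is visibly slow near 0):

    import numpy as np

    x = np.linspace(0, 1, 1001)
    P = np.zeros_like(x)          # P_0 = 0
    for n in range(60):
        P = P + (x - P**2) / 2    # P_{n+1} = P_n + [x - P_n^2]/2
    # Uniform error on [0,1]; P evaluated at x**2 then approximates |x| on [-1,1].
    print(np.max(np.abs(np.sqrt(x) - P)))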
▷ 22. The result in problem 19 (or 20) shows that the polynomials are dense in $C[0,1]$.
Using the results in 18, conclude that the polynomials are also dense in $C[a,b]$.
▷ 23. How do we know that there are non-polynomial elements in $C[0,1]$? In other words,
is it possible that every element of $C[0,1]$ agrees with some polynomial on $[0,1]$?
24. Let $(Q_n)$ be a sequence of polynomials of degree $m_n$, and suppose that $(Q_n)$ converges
uniformly to $f$ on $[a,b]$, where $f$ is not a polynomial. Show that $m_n \to \infty$.
25. If $f \in C[-1,1]$ (or $C^{2\pi}$) is an even function, show that $f$ may be uniformly approxi-
mated by even polynomials (or even trig polynomials).
26. If $f \in C[0,1]$ and if $f(0) = f(1) = 0$, show that the sequence of polynomials
$\sum_{k=0}^n \big[\binom{n}{k} f(k/n)\big]\, x^k (1-x)^{n-k}$ with integer coefficients converges uniformly to $f$
(where $[x]$ denotes the greatest integer in $x$). The same trick works for any $f \in C[a,b]$
provided that $0 < a < b < 1$.
27. If $p$ is a polynomial and $\varepsilon > 0$, prove that there is a polynomial $q$ with rational
coefficients such that $\|p - q\| < \varepsilon$ on $[0,1]$. Conclude that $C[0,1]$ is separable.
28. Let $(x_i)$ be a sequence of numbers in $(0,1)$ such that $\lim_{n\to\infty} \frac{1}{n} \sum_{i=1}^n x_i^k$ exists for every
$k = 0, 1, 2, \ldots$. Show that $\lim_{n\to\infty} \frac{1}{n} \sum_{i=1}^n f(x_i)$ exists for every $f \in C[0,1]$.
29. If $f \in C[0,1]$ and if $\int_0^1 x^n f(x)\, dx = 0$ for each $n = 0, 1, 2, \ldots$, show that $f \equiv 0$. [Hint:
Using the Weierstrass theorem, show that $\int_0^1 f^2 = 0$.]
The next proof of the Weierstrass theorem that we consider is quite explicit; we actually
display a sequence of polynomials that converges uniformly to a given $f \in C[0,1]$. Given
$f \in C[0,1]$, we define the sequence $(B_n(f))_{n=1}^\infty$ of Bernstein polynomials for $f$ by
$$(B_n(f))(x) = \sum_{k=0}^n f\Big(\frac{k}{n}\Big) \binom{n}{k}\, x^k (1-x)^{n-k}.$$
Please note that $B_n(f)$ is a polynomial of degree at most $n$. Also, it's easy to see that
$(B_n(f))(0) = f(0)$ and $(B_n(f))(1) = f(1)$. In general, $(B_n(f))(x)$ is an average of
the numbers $f(k/n)$, $k = 0, \ldots, n$. Bernstein's theorem states that the sequence $(B_n(f))$
converges uniformly to $f$ for each $f \in C[0,1]$; the proof is rather simple once we have a
few facts about the Bernstein polynomials at our disposal. For later reference, let's write
$$f_0(x) = 1, \quad f_1(x) = x, \quad \text{and} \quad f_2(x) = x^2.$$
Among other things, the following exercise establishes Bernstein's theorem for these three
polynomials. Curiously, these few special cases will imply the general result.
▷ 30. (i) $B_n(f_0) = f_0$ and $B_n(f_1) = f_1$. [Hint: Use the binomial theorem.]
(ii) $B_n(f_2) = \big(1 - \frac{1}{n}\big) f_2 + \frac{1}{n} f_1$, and hence $(B_n(f_2))$ converges uniformly to $f_2$.
(iii) $\sum_{k=0}^n \big(\frac{k}{n} - x\big)^2 \binom{n}{k}\, x^k (1-x)^{n-k} = \frac{x(1-x)}{n} \le \frac{1}{4n}$, if $0 \le x \le 1$.
(iv) Given $\delta > 0$ and $0 \le x \le 1$, let $F$ denote the set of $k$'s in $\{0, \ldots, n\}$ for which
$\big|\frac{k}{n} - x\big| \ge \delta$. Then $\sum_{k \in F} \binom{n}{k}\, x^k (1-x)^{n-k} \le \frac{1}{4n\delta^2}$.
▷ 31. Show that $|B_n(f)| \le B_n(|f|)$, and that $B_n(f) \ge 0$ whenever $f \ge 0$. Conclude that
$\|B_n(f)\| \le \|f\|$.
32. If $f$ is a bounded function on $[0,1]$, show that $B_n(f)(x) \to f(x)$ at each point of
continuity of $f$.
33. (Bohman, Korovkin) Let $(T_n)$ be a sequence of monotone linear operators on $C[0,1]$;
that is, each $T_n$ is a linear map from $C[0,1]$ into itself satisfying $T_n(f) \le T_n(g)$
whenever $f \le g$. Suppose also that $T_n(f_0) \rightrightarrows f_0$, $T_n(f_1) \rightrightarrows f_1$, and $T_n(f_2) \rightrightarrows f_2$.
Prove that $T_n(f) \rightrightarrows f$ for every $f \in C[0,1]$. [Hint: Mimic the proof of Bernstein's
theorem.]
34. Find $B_n(f)$ for $f(x) = x^3$. [Hint: $k^2 = (k-1)(k-2) + 3(k-1) + 1$.] The same
method of calculation can be used to show that $B_n(f) \in \mathcal{P}_m$ whenever $f \in \mathcal{P}_m$ and
$n > m$.
35. Let $f$ be continuously differentiable on $[a,b]$, and let $\varepsilon > 0$. Show that there is a
polynomial $p$ such that $\|f - p\| < \varepsilon$ and $\|f' - p'\| < \varepsilon$.
36. Suppose that $f \in C[a,b]$ is twice continuously differentiable and has $f'' > 0$. Prove
that the best linear approximation to $f$ on $[a,b]$ is $a_0 + a_1 x$, where $a_1 = f'(c)$,
$a_0 = [f(a) + f(c) - f'(c)(a + c)]/2$, and where $c$ is the unique solution to
$f'(c) = (f(b) - f(a))/(b - a)$.
The next several exercises concern the modulus of continuity. Given a bounded real-valued
function $f$ defined on some interval $I$, we define $\omega_f$, the modulus of continuity of $f$, by
$$\omega_f(I; \delta) = \omega_f(\delta) = \sup\big\{\, |f(x) - f(y)| : x, y \in I,\ |x - y| \le \delta \,\big\}, \quad \delta \ge 0.$$
Note, for example, that if $f$ is uniformly continuous, then $\omega_f(\delta) \to 0$ as $\delta \to 0$. Indeed,
the statement that $|f(x) - f(y)| \le \varepsilon$ whenever $|x - y| \le \delta$ is equivalent to the statement
that $\omega_f(\delta) \le \varepsilon$. On the other hand, if the graph of $f$ has a jump of magnitude 1, say, then
$\omega_f(\delta) \ge 1$ for all $\delta > 0$.
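As a quick illustration (mine, not from the notes), the modulus of continuity can be estimated on a grid; for $g(x) = \sqrt{x}$ on $[0,1]$ the estimate tracks $\sqrt{\delta}$, consistent with problem 37 below:

    import numpy as np

    def modulus(f, a, b, delta, m=10001):
        """Rough grid estimate of omega_f(delta) = sup{|f(x)-f(y)| : |x-y| <= delta}."""
        x = np.linspace(a, b, m)
        fx = f(x)
        h = (b - a) / (m - 1)
        k = int(delta / h)          # number of grid steps within distance delta
        best = 0.0
        for j in range(1, k + 1):
            best = max(best, np.max(np.abs(fx[j:] - fx[:-j])))
        return best                 # a slight underestimate, since only grid pairs are checked

    for d in [0.1, 0.01, 0.001]:
        print(d, modulus(np.sqrt, 0, 1, d), np.sqrt(d))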
▷ 37. If $f$ satisfies the Lipschitz condition $|f(x) - f(y)| \le K|x - y|^\alpha$, what can you say about
$\omega_f$? Calculate $\omega_g$ for $g(x) = \sqrt{x}$.
38. If $f \in C[a,b]$, show that $\omega_f(\delta_1 + \delta_2) \le \omega_f(\delta_1) + \omega_f(\delta_2)$ and that $\omega_f(\delta) \downarrow 0$ as $\delta \downarrow 0$.
Use this to show that $\omega_f$ is continuous for $\delta \ge 0$. Finally, show that the modulus of
continuity of $\omega_f$ is again $\omega_f$.
▷ 39. (a) If $x = \cos\theta$, where $-1 \le x \le 1$, and if $g(\theta) = f(\cos\theta)$, show that
$\omega_g([-\pi,\pi]; \delta) = \omega_g([0,\pi]; \delta) \le \omega_f([-1,1]; \delta)$.
(b) If $g(x) = f(ax+b)$ for $c \le x \le d$, show that $\omega_g([c,d]; \delta) = \omega_f([ac+b, ad+b]; a\delta)$.
40. Let $f$ be continuously differentiable on $[0,1]$. Show that $((B_n(f))')$ converges uniformly
to $f'$ by showing that $\|B_n(f') - (B_{n+1}(f))'\| \le \omega_{f'}(1/(n+1))$. [In order to see
why this is of interest, find a uniformly convergent sequence of polynomials whose
derivatives fail to converge uniformly. Compare this result with problem 35.]
Math 682 Trigonometric Polynomials 5/26/98

Introduction
A (real) trigonometric polynomial, or trig polynomial for short, is a function of the form
$$a_0 + \sum_{k=1}^n \big( a_k \cos kx + b_k \sin kx \big), \tag{$*$}$$
where $a_0, \ldots, a_n$ and $b_1, \ldots, b_n$ are real numbers. The degree of a trig polynomial is the
highest frequency occurring in any representation of the form ($*$); thus, ($*$) has degree $n$
provided that one of $a_n$ or $b_n$ is nonzero. We will use $\mathcal{T}_n$ to denote the collection of trig
polynomials of degree at most $n$, and $\mathcal{T}$ to denote the collection of all trig polynomials
(i.e., the union of the $\mathcal{T}_n$'s).
It is convenient to take the space of all continuous $2\pi$-periodic functions on $\mathbb{R}$ as the
containing space for $\mathcal{T}_n$, a space we denote by $C^{2\pi}$. The space $C^{2\pi}$ has several equivalent
descriptions. For one, it's obvious that $C^{2\pi}$ is a subspace of $C(\mathbb{R})$, the space of all con-
tinuous functions on $\mathbb{R}$. But we might also consider $C^{2\pi}$ as a subspace of $C[0,2\pi]$ in the
following way: The $2\pi$-periodic continuous functions on $\mathbb{R}$ may be identified with the set
of functions $f \in C[0,2\pi]$ satisfying $f(0) = f(2\pi)$. Each such $f$ extends to a $2\pi$-periodic
element of $C(\mathbb{R})$ in an obvious way, and it's not hard to see that the condition $f(0) = f(2\pi)$
defines a subspace of $C[0,2\pi]$. As a third description, it is often convenient to identify $C^{2\pi}$
with the collection $C(\mathbb{T})$, consisting of all the continuous real-valued functions on $\mathbb{T}$, where
$\mathbb{T}$ is the unit circle in the complex plane $\mathbb{C}$. That is, we simply make the identifications
$$\theta \leftrightarrow e^{i\theta} \quad \text{and} \quad f(\theta) \leftrightarrow f(e^{i\theta}).$$
In any case, each $f \in C^{2\pi}$ is uniformly continuous and uniformly bounded on all of $\mathbb{R}$, and
is completely determined by its values on any interval of length $2\pi$. In particular, we may
(and will) endow $C^{2\pi}$ with the sup norm:
$$\|f\| = \max_{0 \le x \le 2\pi} |f(x)| = \max_{x \in \mathbb{R}} |f(x)|.$$
Our goal in this chapter is to prove what is sometimes called Weierstrass's second
theorem (also from 1885).
Theorem. (Weierstrass's Second Theorem, 1885) Let $f \in C^{2\pi}$. Then, for every $\varepsilon > 0$,
there exists a trig polynomial $T$ such that $\|f - T\| < \varepsilon$.
Ultimately, we will give several different proofs of this theorem. Weierstrass gave a
separate proof of this result in the same paper containing his theorem on approximation
by algebraic polynomials, but it was later pointed out by Lebesgue (1898) that the two
theorems are, in fact, equivalent. Lebesgue's proof is based on several elementary obser-
vations. We will outline these elementary facts as "exercises with hints," supplying a few
proofs here and there, but leaving full details to the reader.
We first justify the use of the word "polynomial" in describing ($*$).

Lemma. $\cos nx$ and $\sin(n+1)x/\sin x$ can be written as polynomials of degree exactly $n$
in $\cos x$ for any integer $n \ge 0$.

Proof. Using the recurrence formula $\cos kx + \cos(k-2)x = 2\cos(k-1)x \cos x$, it's not
hard to see that $\cos 2x = 2\cos^2 x - 1$, $\cos 3x = 4\cos^3 x - 3\cos x$, and $\cos 4x = 8\cos^4 x - 8\cos^2 x + 1$.
More generally, by induction, $\cos nx$ is a polynomial of degree $n$ in $\cos x$
with leading coefficient $2^{n-1}$. Using this fact and the identity $\sin(k+1)x - \sin(k-1)x = 2\cos kx \sin x$
(along with another easy induction argument), it follows that $\sin(n+1)x$ can
be written as $\sin x$ times a polynomial of degree $n$ in $\cos x$ with leading coefficient $2^n$.
Alternatively, notice that by writing $(i\sin x)^{2k} = (\cos^2 x - 1)^k$ we have
$$\cos nx = \mathrm{Re}\,[(\cos x + i\sin x)^n] = \mathrm{Re}\Big[ \sum_{k=0}^n \binom{n}{k} (i\sin x)^k \cos^{n-k} x \Big]
= \sum_{k=0}^{[n/2]} \binom{n}{2k} (\cos^2 x - 1)^k \cos^{n-2k} x.$$
The coefficient of $\cos^n x$ in this expansion is then
$$\sum_{k=0}^{[n/2]} \binom{n}{2k} = \frac{1}{2} \sum_{k=0}^n \binom{n}{k} = 2^{n-1}.$$
(All the binomial coefficients together sum to $(1+1)^n = 2^n$, but the even or odd terms
taken separately sum to exactly half this amount since $(1 + (-1))^n = 0$.)
Similarly,
$$\sin(n+1)x = \mathrm{Im}\big[(\cos x + i\sin x)^{n+1}\big] = \mathrm{Im}\Big[ \sum_{k=0}^{n+1} \binom{n+1}{k} (i\sin x)^k \cos^{n+1-k} x \Big]
= \sum_{k=0}^{[n/2]} \binom{n+1}{2k+1} (\cos^2 x - 1)^k \cos^{n-2k} x\, \sin x,$$
where we've written $(i\sin x)^{2k+1} = i(\cos^2 x - 1)^k \sin x$. The coefficient of $\cos^n x \sin x$ is
$$\sum_{k=0}^{[n/2]} \binom{n+1}{2k+1} = \frac{1}{2} \sum_{k=0}^{n+1} \binom{n+1}{k} = 2^n.$$

Corollary. Any trig polynomial ($*$) may be written as $P(\cos x) + Q(\cos x)\sin x$, where
$P$ and $Q$ are algebraic polynomials of degree at most $n$ and $n-1$, respectively. If ($*$)
represents an even function, then it can be written using only cosines.

Corollary. The collection $\mathcal{T}$, consisting of all trig polynomials, is both a subspace and
a subring of $C^{2\pi}$ (that is, $\mathcal{T}$ is closed under both linear combinations and products). In
other words, $\mathcal{T}$ is a subalgebra of $C^{2\pi}$.
It's not hard to see that the procedure we've described above can be reversed; that is,
each algebraic polynomial in $\cos x$ and $\sin x$ can be written in the form ($*$). For example,
$4\cos^3 x = 3\cos x + \cos 3x$. But, rather than duplicate our efforts, let's use a bit of linear
algebra. First, the $2n+1$ functions
$$\mathcal{A} = \{\, 1, \cos x, \cos 2x, \ldots, \cos nx, \sin x, \sin 2x, \ldots, \sin nx \,\}$$
are linearly independent; the easiest way to see this is to notice that we may define an
inner product on $C^{2\pi}$ under which these functions are orthogonal. Specifically,
$$\langle f, g \rangle = \int_0^{2\pi} f(x)\, g(x)\, dx = 0, \qquad \langle f, f \rangle = \int_0^{2\pi} f(x)^2\, dx \ne 0,$$
for any pair of functions $f \ne g \in \mathcal{A}$. (We'll pursue this direction in greater detail later in
the course.) Second, we've shown that each element of $\mathcal{A}$ lives in the space spanned by
the $2n+1$ functions
$$\mathcal{B} = \{\, 1, \cos x, \cos^2 x, \ldots, \cos^n x, \sin x, \cos x \sin x, \ldots, \cos^{n-1} x \sin x \,\}.$$
That is,
$$\mathcal{T}_n \subseteq \operatorname{span} \mathcal{A} \subseteq \operatorname{span} \mathcal{B}.$$
By comparing dimensions, we have
$$2n + 1 = \dim \mathcal{T}_n = \dim(\operatorname{span} \mathcal{A}) \le \dim(\operatorname{span} \mathcal{B}) \le 2n + 1,$$
and hence we must have $\operatorname{span} \mathcal{A} = \operatorname{span} \mathcal{B}$. The point here is that $\mathcal{T}_n$ is a finite-dimensional
subspace of $C^{2\pi}$ of dimension $2n+1$, and we may use either one of these sets of functions
as a basis for $\mathcal{T}_n$.
Before we leave these issues behind, let's summarize the situation for complex trig
polynomials, i.e., the case where we allow complex coefficients in ($*$). Now it's clear that
every trig polynomial ($*$), whether real or complex, can be written as
$$\sum_{k=-n}^n c_k e^{ikx}, \tag{$**$}$$
where the $c_k$'s are complex; that is, a trig polynomial is actually a polynomial (over $\mathbb{C}$) in
$z = e^{ix}$ and $\bar{z} = e^{-ix}$. Conversely, every polynomial ($**$) can be written in the form ($*$),
using complex $a_k$'s and $b_k$'s. Thus, the complex trig polynomials of degree $n$ form a vector
space of dimension $2n+1$ over $\mathbb{C}$ (hence of dimension $2(2n+1)$ when considered as a vector
space over $\mathbb{R}$). But, not every polynomial in $z$ and $\bar{z}$ represents a real trig polynomial.
Rather, the real trig polynomials are the real parts of the complex trig polynomials. To
see this, notice that ($**$) represents a real-valued function if and only if
$$\sum_{k=-n}^n c_k e^{ikx} = \overline{\sum_{k=-n}^n c_k e^{ikx}} = \sum_{k=-n}^n \bar{c}_{-k}\, e^{ikx};$$
that is, $c_k = \bar{c}_{-k}$ for each $k$. In particular, $c_0$ must be real, and hence
$$\sum_{k=-n}^n c_k e^{ikx} = c_0 + \sum_{k=1}^n \big( c_k e^{ikx} + c_{-k} e^{-ikx} \big)
= c_0 + \sum_{k=1}^n \big( c_k e^{ikx} + \bar{c}_k e^{-ikx} \big)
= c_0 + \sum_{k=1}^n \big[ (c_k + \bar{c}_k) \cos kx + i(c_k - \bar{c}_k) \sin kx \big]
= c_0 + \sum_{k=1}^n \big[ 2\,\mathrm{Re}(c_k) \cos kx - 2\,\mathrm{Im}(c_k) \sin kx \big],$$
which is of the form ($*$) with $a_k$ and $b_k$ real.
Conversely, given any real trig polynomial ($*$), we have
$$a_0 + \sum_{k=1}^n \big( a_k \cos kx + b_k \sin kx \big) = a_0 + \sum_{k=1}^n \Big[ \Big(\frac{a_k - ib_k}{2}\Big) e^{ikx} + \Big(\frac{a_k + ib_k}{2}\Big) e^{-ikx} \Big],$$
which is of the form ($**$) with $c_k = \bar{c}_{-k}$ for each $k$.
It's time we returned to approximation theory! Since we've been able to identify $C^{2\pi}$
with a subspace of $C[0,2\pi]$, and since $\mathcal{T}_n$ is a finite-dimensional subspace of $C^{2\pi}$, we have

Corollary. Each $f \in C^{2\pi}$ has a best approximation (on all of $\mathbb{R}$) out of $\mathcal{T}_n$. If $f$ is an
even function, then it has a best approximation which is also even.
Proof. We only need to prove the second claim, so suppose that $f \in C^{2\pi}$ is even and
that $T^* \in \mathcal{T}_n$ satisfies
$$\|f - T^*\| = \min_{T \in \mathcal{T}_n} \|f - T\|.$$
Then, since $f$ is even, $\widetilde{T}(x) = T^*(-x)$ is also a best approximation to $f$ out of $\mathcal{T}_n$; indeed,
$$\|f - \widetilde{T}\| = \max_{x \in \mathbb{R}} |f(x) - T^*(-x)| = \max_{x \in \mathbb{R}} |f(-x) - T^*(x)| = \max_{x \in \mathbb{R}} |f(x) - T^*(x)| = \|f - T^*\|.$$
But now, the even trig polynomial
$$\widehat{T}(x) = \frac{\widetilde{T}(x) + T^*(x)}{2} = \frac{T^*(-x) + T^*(x)}{2}$$
is also a best approximation out of $\mathcal{T}_n$ since
$$\|f - \widehat{T}\| = \Big\| \frac{(f - \widetilde{T}) + (f - T^*)}{2} \Big\| \le \frac{\|f - \widetilde{T}\| + \|f - T^*\|}{2} = \min_{T \in \mathcal{T}_n} \|f - T\|.$$
We next give (de la Vallée Poussin's version of) Lebesgue's proof of Weierstrass's
second theorem; that is, we will deduce the second theorem from the first.
Theorem. Let $f \in C^{2\pi}$ and let $\varepsilon > 0$. Then, there is a trig polynomial $T$ such that
$\|f - T\| = \max_{x \in \mathbb{R}} |f(x) - T(x)| < \varepsilon$.

Proof. We will prove that Weierstrass's first theorem for $C[-1,1]$ implies his second
theorem for $C^{2\pi}$.

Step 1. If $f$ is even, then $f$ may be uniformly approximated by even trig polynomials.

If $f$ is even, then it's enough to approximate $f$ on the interval $[0,\pi]$. In this case, we
may consider the function $g(y) = f(\arccos y)$, $-1 \le y \le 1$, in $C[-1,1]$. By Weierstrass's
first theorem, there is an algebraic polynomial $p(y)$ such that
$$\max_{-1 \le y \le 1} |f(\arccos y) - p(y)| = \max_{0 \le x \le \pi} |f(x) - p(\cos x)| < \varepsilon.$$
But $T(x) = p(\cos x)$ is an even trig polynomial! Hence,
$$\|f - T\| = \max_{x \in \mathbb{R}} |f(x) - T(x)| < \varepsilon.$$
Let's agree to abbreviate $\|f - T\| < \varepsilon$ as $f \approx T$.

Step 2. Given $f \in C^{2\pi}$, there is a trig polynomial $T$ such that $2f(x)\sin^2 x \approx T(x)$.

Each of the functions $f(x) + f(-x)$ and $[f(x) - f(-x)]\sin x$ is even. Thus, we may
choose even trig polynomials $T_1$ and $T_2$ such that
$$f(x) + f(-x) \approx T_1(x) \quad \text{and} \quad [f(x) - f(-x)]\sin x \approx T_2(x).$$
Multiplying the first expression by $\sin^2 x$, the second by $\sin x$, and adding, we get
$$2f(x)\sin^2 x \approx T_1(x)\sin^2 x + T_2(x)\sin x \equiv T_3(x),$$
where $T_3(x)$ is still a trig polynomial, and where "$\approx$" now means "within $2\varepsilon$" (since
$|\sin x| \le 1$).

Step 3. Given $f \in C^{2\pi}$, there is a trig polynomial $T$ such that $2f(x)\cos^2 x \approx T(x)$, where
"$\approx$" means "within $2\varepsilon$."

Repeat Step 2 for $f(x - \pi/2)$ and translate: We first choose a trig polynomial $T_4(x)$
such that
$$2f\Big(x - \frac{\pi}{2}\Big)\sin^2 x \approx T_4(x).$$
That is,
$$2f(x)\cos^2 x \approx T_5(x),$$
where $T_5(x)$ is a trig polynomial.
Finally, by combining the conclusions of Steps 2 and 3, we find that there is a trig
polynomial $T_6(x)$ such that $f \approx T_6(x)$, where, again, "$\approx$" means "within $2\varepsilon$."
Just for fun, let's complete the circle and show that Weierstrass's second theorem
for $C^{2\pi}$ implies his first theorem for $C[-1,1]$. Since, as we'll see, it's possible to give an
independent proof of the second theorem, this is a meaningful exercise.

Theorem. Given $f \in C[-1,1]$ and $\varepsilon > 0$, there exists an algebraic polynomial $p$ such
that $\|f - p\| < \varepsilon$.
Proof. Given $f \in C[-1,1]$, the function $f(\cos x)$ is an even function in $C^{2\pi}$. By our
Corollary to Weierstrass's second theorem, we may approximate $f(\cos x)$ by an even trig
polynomial:
$$f(\cos x) \approx a_0 + a_1 \cos x + a_2 \cos 2x + \cdots + a_n \cos nx.$$
But, as we've seen, $\cos kx$ can be written as an algebraic polynomial in $\cos x$. Hence, there
is some algebraic polynomial $p$ such that $f(\cos x) \approx p(\cos x)$. That is,
$$\max_{0 \le x \le \pi} |f(\cos x) - p(\cos x)| = \max_{-1 \le t \le 1} |f(t) - p(t)| < \varepsilon.$$

The algebraic polynomials $T_n(x)$ satisfying
$$T_n(\cos x) = \cos nx \quad \text{for } n = 0, 1, 2, \ldots$$
are called the Chebyshev polynomials of the first kind. Please note that this formula
uniquely defines $T_n$ as a polynomial of degree exactly $n$, and hence uniquely determines
the values of $T_n(x)$ for $|x| > 1$, too. The algebraic polynomials $U_n(x)$ satisfying
$$U_n(\cos x) = \frac{\sin(n+1)x}{\sin x} \quad \text{for } n = 0, 1, 2, \ldots$$
are called the Chebyshev polynomials of the second kind. Likewise, note that this formula
uniquely defines $U_n$ as a polynomial of degree exactly $n$.
We will discover many intriguing properties of the Chebyshev polynomials in the next
chapter. For now, let's settle for just one: The recurrence formula we gave earlier,
$$\cos nx = 2\cos x \cos(n-1)x - \cos(n-2)x,$$
now becomes
$$T_n(x) = 2x\, T_{n-1}(x) - T_{n-2}(x), \quad n \ge 2,$$
where $T_0(x) = 1$ and $T_1(x) = x$. This recurrence relation (along with the initial cases $T_0$
and $T_1$) may be taken as a definition for the Chebyshev polynomials of the first kind. At
any rate, it's now easy to list any number of the Chebyshev polynomials $T_n$; for example,
the next few are $T_2(x) = 2x^2 - 1$, $T_3(x) = 4x^3 - 3x$, $T_4(x) = 8x^4 - 8x^2 + 1$, and
$T_5(x) = 16x^5 - 20x^3 + 5x$.
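The recurrence translates directly into code. Here is a small sketch (mine; it uses numpy's poly1d purely for convenience) that reproduces the coefficient lists above:

    import numpy as np

    def chebyshev(n):
        """Return T_0, ..., T_n as poly1d objects via T_n = 2x T_{n-1} - T_{n-2}."""
        T = [np.poly1d([1]), np.poly1d([1, 0])]   # T_0 = 1, T_1 = x
        x = np.poly1d([1, 0])
        for _ in range(2, n + 1):
            T.append(2 * x * T[-1] - T[-2])
        return T[:n + 1]

    for p in chebyshev(5):
        print(p.coefficients)   # e.g. T_5 prints [16, 0, -20, 0, 5, 0]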
Math 682 Problem Set: Trigonometric Polynomials 5/26/98

A (real) trigonometric polynomial, or trig polynomial for short, is a function of the form
$$a_0 + \sum_{k=1}^n \big( a_k \cos kx + b_k \sin kx \big), \tag{$*$}$$
where $a_0, \ldots, a_n$ and $b_1, \ldots, b_n$ are real numbers. We will use $\mathcal{T}_n$ to denote the collection
of trig polynomials of degree at most $n$, considered as a subspace of $C^{2\pi}$, the space of all
continuous $2\pi$-periodic functions on $\mathbb{R}$. The space $C^{2\pi}$ may, in turn, be considered as a
subspace of $C[0,2\pi]$. Indeed, the $2\pi$-periodic continuous functions on $\mathbb{R}$ may be identified
with the subspace of $C[0,2\pi]$ consisting of those $f$'s which satisfy $f(0) = f(2\pi)$. As an
alternate description, it is often convenient to instead identify $C^{2\pi}$ with the collection
$C(\mathbb{T})$, consisting of all continuous real-valued functions on $\mathbb{T}$, where $\mathbb{T}$ is the unit circle in
the complex plane $\mathbb{C}$. In this case, we simply make the identifications
$$\theta \leftrightarrow e^{i\theta} \quad \text{and} \quad f(\theta) \leftrightarrow f(e^{i\theta}).$$
▷ 41. (a) By using the recurrence formulas $\cos kx + \cos(k-2)x = 2\cos(k-1)x \cos x$ and
$\sin(k+1)x - \sin(k-1)x = 2\cos kx \sin x$, show that each of the functions $\cos kx$
and $\sin(k+1)x/\sin x$ may be written as algebraic polynomials of degree exactly
$k$ in $\cos x$. In each case, what is the coefficient of $\cos^k x$?
(b) Equivalently, use the binomial formula to write the real and imaginary parts of
$(\cos x + i\sin x)^n = \cos nx + i\sin nx$ as algebraic polynomials in $\cos x$ and $\sin x$.
Again, what are the leading coefficients of these polynomials?
(c) If $P(x,y)$ is an algebraic polynomial (in two variables) of degree at most $n$, show
that $P(\cos x, \sin x)$ may be written as $Q(\cos x) + R(\cos x)\sin x$, where $Q$ and
$R$ are algebraic polynomials (in one variable) of degrees at most $n$ and $n-1$,
respectively.
(d) Show that $\cos^n x$ can be written as a linear combination of the functions $\cos kx$,
$k = 0, 1, \ldots, n$, and that $\cos^{n-1} x \sin x$ can be written as a linear combination of
the functions $\sin kx$, $k = 1, \ldots, n$. Thus, each polynomial $P(\cos x, \sin x)$ in $\cos x$
and $\sin x$ can be written in the form ($*$).
(e) If ($*$) represents an even function, show that it can be written using only cosines.
Conversely, if $P(x,y)$ is an even polynomial, show that $P(\cos x, \sin x)$ can be
written using only cosines.
42. Show that $\mathcal{T}_n$ has dimension exactly $2n+1$ (as a vector space over $\mathbb{R}$).
43. We might also consider complex trig polynomials; that is, functions of the form ($*$) in
which we now allow the $a_k$'s and $b_k$'s to be complex numbers.
(a) Show that every trig polynomial, whether real or complex, may be written as
$$\sum_{k=-n}^n c_k e^{ikx}, \tag{$**$}$$
where the $c_k$'s are complex. Thus, complex trig polynomials are just algebraic
polynomials in $z$ and $\bar{z}$, where $z = e^{ix} \in \mathbb{T}$.
(b) Show that ($**$) is real-valued if and only if $\bar{c}_k = c_{-k}$ for each $k$.
(c) If ($**$) is a real-valued function, show that it may be written as a real trig poly-
nomial; that is, it may be written in the form ($*$) using only real coefficients.
Math 682 Characterization of Best Approximation 5/27/98

We next discuss Chebyshev's solution to the problem of best polynomial approximation
from 1854. Given that there was no reason to believe that the problem even had a solution,
let alone a unique solution, Chebyshev's accomplishment should not be underestimated.
Chebyshev might very well have been able to prove Weierstrass's result (30 years early)
had the thought simply occurred to him! Chebyshev's original papers are apparently
rather sketchy. It wasn't until 1903 that full details were given by Kirchberger. Curiously,
Kirchberger's proofs foreshadow very modern techniques such as convexity and separation
arguments. The presentation we'll give owes much to Haar and to de la Vallée Poussin
(both from around 1918).
We begin with an easy observation:

Lemma. Let $f \in C[a,b]$ and let $p = p_n^*$ be a best approximation to $f$ out of $\mathcal{P}_n$. Then,
there are at least two distinct points $x_1, x_2 \in [a,b]$ such that
$$f(x_1) - p(x_1) = -(f(x_2) - p(x_2)) = \|f - p\|.$$
That is, $f - p$ attains both of the values $\pm\|f - p\|$.
Proof. Let's write $E = E_n(f) = \|f - p\| = \max_{a \le x \le b} |f(x) - p(x)|$. If the conclusion of the
Lemma is false, then we might as well suppose that $f(x_1) - p(x_1) = E$, for some $x_1$, but
that
$$e = \min_{a \le x \le b} \big(f(x) - p(x)\big) > -E.$$
In particular, $E + e \ne 0$, and so $q = p + (E + e)/2$ is an element of $\mathcal{P}_n$ with $q \ne p$. We
claim that $q$ is a better approximation to $f$ than $p$. Here's why:
$$E - \frac{E+e}{2} \ge f(x) - p(x) - \frac{E+e}{2} \ge e - \frac{E+e}{2},$$
or
$$\frac{E-e}{2} \ge f(x) - q(x) \ge -\frac{E-e}{2}.$$
That is,
$$\|f - q\| \le \frac{E-e}{2} < E = \|f - p\|,$$
a contradiction.
Corollary. The best approximating constant to $f \in C[a,b]$ is
$$p_0^* = \frac{1}{2} \Big[ \max_{a \le x \le b} f(x) + \min_{a \le x \le b} f(x) \Big],$$
and
$$E_0(f) = \frac{1}{2} \Big[ \max_{a \le x \le b} f(x) - \min_{a \le x \le b} f(x) \Big].$$

Proof. Exercise.
Now all of this is meant as motivation for the general case, which essentially repeats
the observation of our first Lemma inductively. A little experimentation will convince you
that a best linear approximation, for example, would imply the existence of three points
(at least) at which $f - p_1^*$ alternates between $\pm\|f - p_1^*\|$.
A bit of notation will help us set up the argument for the general case: Given $g$ in
$C[a,b]$, we'll say that $x \in [a,b]$ is a $(+)$ point for $g$ (respectively, a $(-)$ point for $g$) if
$g(x) = \|g\|$ (respectively, $g(x) = -\|g\|$). A set of distinct points $a \le x_0 < x_1 < \cdots < x_n \le b$
will be called an alternating set for $g$ if the $x_i$'s are alternately $(+)$ points and $(-)$ points;
that is, if
$$|g(x_i)| = \|g\|, \quad i = 0, 1, \ldots, n,$$
and
$$g(x_i) = -g(x_{i-1}), \quad i = 1, 2, \ldots, n.$$
Using this notation, we will be able to characterize the polynomial of best approximation.
Since the following three theorems are particularly important, we will number them for
future reference. Our first result is where all the fighting takes place:
Theorem 1. Let $f \in C[a,b]$, and suppose that $p = p_n^*$ is a best approximation to $f$ out
of $\mathcal{P}_n$. Then, there is an alternating set for $f - p$ consisting of at least $n+2$ points.

Proof. If $f \in \mathcal{P}_n$, there's nothing to show. (Why?) Thus, we may suppose that $f \notin \mathcal{P}_n$
and, hence, that $E = E_n(f) = \|f - p\| > 0$.
Now consider the (uniformly) continuous function $\varphi = f - p$. We may partition $[a,b]$
by way of $a = t_0 < t_1 < \cdots < t_n = b$ into sufficiently small intervals so that
$$|\varphi(x) - \varphi(y)| < E/2 \quad \text{whenever} \quad x, y \in [t_i, t_{i+1}].$$
Here's why we'd want to do such a thing: If $[t_i, t_{i+1}]$ contains a $(+)$ point for $\varphi = f - p$,
then $\varphi$ is positive on all of $[t_i, t_{i+1}]$. Indeed,
$$x, y \in [t_i, t_{i+1}] \text{ and } \varphi(x) = E \implies \varphi(y) > E/2 > 0.$$
Similarly, if $[t_i, t_{i+1}]$ contains a $(-)$ point for $\varphi$, then $\varphi$ is negative on all of $[t_i, t_{i+1}]$.
Consequently, no interval $[t_i, t_{i+1}]$ can contain both $(+)$ points and $(-)$ points.
Call $[t_i, t_{i+1}]$ a $(+)$ interval (respectively, a $(-)$ interval) if it contains a $(+)$ point
(respectively, a $(-)$ point) for $\varphi = f - p$. Notice that no $(+)$ interval can even touch a
$(-)$ interval. In other words, a $(+)$ interval and a $(-)$ interval must be strictly separated
(by some interval containing a zero for $\varphi$).
We now relabel the $(+)$ and $(-)$ intervals from left to right, ignoring the "neither"
intervals. There's no harm in supposing that the first "signed" interval is a $(+)$ interval.
Thus, we suppose that our relabeled intervals are written
$$I_1, I_2, \ldots, I_{k_1} \qquad (+) \text{ intervals},$$
$$I_{k_1+1}, I_{k_1+2}, \ldots, I_{k_2} \qquad (-) \text{ intervals},$$
$$\cdots\cdots$$
$$I_{k_{m-1}+1}, I_{k_{m-1}+2}, \ldots, I_{k_m} \qquad (-1)^{m-1} \text{ intervals},$$
where $I_{k_1}$ is the last $(+)$ interval before we reach the first $(-)$ interval, $I_{k_1+1}$. And so on.
For later reference, we let $S$ denote the union of all the "signed" intervals $[t_i, t_{i+1}]$;
that is, $S = \bigcup_{j=1}^{k_m} I_j$, and we let $N$ denote the union of all the "neither" intervals $[t_i, t_{i+1}]$.
Thus, $S$ and $N$ are compact sets with $S \cup N = [a,b]$ (note that while $S$ and $N$ aren't
quite disjoint, they are at least "non-overlapping": their interiors are disjoint).
Our goal here is to show that $m \ge n + 2$. (So far we only know that $m \ge 2$!) Let's
suppose that $m < n + 2$ and see what goes wrong.
Since any $(+)$ interval is strictly separated from any $(-)$ interval, we can find points
$z_1, \ldots, z_{m-1} \in N$ such that
$$\max I_{k_1} < z_1 < \min I_{k_1+1},$$
$$\max I_{k_2} < z_2 < \min I_{k_2+1},$$
$$\cdots\cdots$$
$$\max I_{k_{m-1}} < z_{m-1} < \min I_{k_{m-1}+1}.$$
And now we construct the offending polynomial:
$$q(x) = (z_1 - x)(z_2 - x) \cdots (z_{m-1} - x).$$
Notice that $q \in \mathcal{P}_n$ since $m - 1 \le n$. (Here is the only use we'll make of the assumption
$m < n + 2$!) We're going to show that $p + \lambda q \in \mathcal{P}_n$ is a better approximation to $f$ than $p$,
for some suitable scalar $\lambda$.
We first claim that $q$ and $f - p$ have the same sign. Indeed, $q$ has no zeros in any
of the $(\pm)$ intervals, hence is of constant sign on any such interval. Thus, $q > 0$ on
$I_1, \ldots, I_{k_1}$ because each $(z_j - x) > 0$ on these intervals; $q < 0$ on $I_{k_1+1}, \ldots, I_{k_2}$ because
here $(z_1 - x) < 0$, while $(z_j - x) > 0$ for $j > 1$; and so on.
We next find $\lambda$. Let $e = \max_{x \in N} |f(x) - p(x)|$, where $N$ is the union of all the subin-
tervals $[t_i, t_{i+1}]$ which are neither $(+)$ intervals nor $(-)$ intervals. Then, $e < E$. (Why?)
Now choose $\lambda > 0$ so that $\lambda\|q\| < \min\{E - e,\, E/2\}$. We claim that $p + \lambda q$ is a better
approximation to $f$ than $p$. One case is easy: If $x \in N$, then
$$|f(x) - (p(x) + \lambda q(x))| \le |f(x) - p(x)| + \lambda|q(x)| \le e + \lambda\|q\| < E.$$
On the other hand, if $x \notin N$, then $x$ is in either a $(+)$ interval or a $(-)$ interval. In
particular, we know that $|f(x) - p(x)| > E/2 > \lambda\|q\|$ and that $f(x) - p(x)$ and $\lambda q(x)$ have
the same sign. Thus,
$$|f(x) - (p(x) + \lambda q(x))| = |f(x) - p(x)| - \lambda|q(x)| \le E - \lambda \min_{x \in S} |q(x)| < E,$$
since $q$ is nonzero on $S$. This contradiction finishes the proof. (Phew!)
Remarks
1. It should be pointed out that the number $n+2$ here is actually $1 + \dim \mathcal{P}_n$.
2. Notice, too, that if $f - p_n^*$ alternates in sign $n+2$ times, then $f - p_n^*$ must have at
least $n+1$ zeros. Thus, $p_n^*$ actually agrees with $f$ (or "interpolates" $f$) at $n+1$ points.
We're now ready to establish the uniqueness of the polynomial of best approximation.

Theorem 2. Let $f \in C[a,b]$. Then, the polynomial of best approximation to $f$ out of
$\mathcal{P}_n$ is unique.

Proof. Suppose that $p, q \in \mathcal{P}_n$ both satisfy $\|f - p\| = \|f - q\| = E_n(f) = E$. Then,
as we've seen, their average $r = (p+q)/2 \in \mathcal{P}_n$ is also best: $\|f - r\| = E$ since
$f - r = (f-p)/2 + (f-q)/2$.
By Theorem 1, $f - r$ has an alternating set $x_0, x_1, \ldots, x_{n+1}$ containing $n+2$ points.
Thus, for each $i$,
$$(f-p)(x_i) + (f-q)(x_i) = \pm 2E \quad \text{(alternating)},$$
while
$$-E \le (f-p)(x_i),\ (f-q)(x_i) \le E.$$
But this means that
$$(f-p)(x_i) = (f-q)(x_i) = \pm E \quad \text{(alternating)}$$
for each $i$. (Why?) That is, $x_0, x_1, \ldots, x_{n+1}$ is an alternating set for both $f - p$ and $f - q$.
In particular, the polynomial $q - p = (f-p) - (f-q)$ has $n+2$ zeros! Since $q - p \in \mathcal{P}_n$,
we must have $p = q$.
Finally, we come full circle:

Theorem 3. Let $f \in C[a,b]$, and let $p \in \mathcal{P}_n$. If $f - p$ has an alternating set containing
$n+2$ (or more) points, then $p$ is the best approximation to $f$ out of $\mathcal{P}_n$.

Proof. Let $x_0, x_1, \ldots, x_{n+1}$ be an alternating set for $f - p$, and suppose that some $q \in \mathcal{P}_n$
is a better approximation to $f$ than $p$; that is, $\|f - q\| < \|f - p\|$. In particular, then, we
must have
$$|f(x_i) - p(x_i)| = \|f - p\| > \|f - q\| \ge |f(x_i) - q(x_i)|$$
for each $i = 0, 1, \ldots, n+1$. Now the inequality $|a| > |b|$ implies that $a$ and $a - b$ have the
same sign (why?), hence $q - p = (f-p) - (f-q)$ alternates in sign $n+2$ times (because
$f - p$ does). But then, $q - p$ would have at least $n+1$ zeros. Since $q - p \in \mathcal{P}_n$, we must have
$q = p$, which is a contradiction. Thus, $p$ is the best approximation to $f$ out of $\mathcal{P}_n$.
Example (taken from Rivlin)
While an alternating set for $f - p_n^*$ is supposed to have at least $n+2$ points, it may well
have more than $n+2$ points; thus, alternating sets need not be unique. For example,
consider the function $f(x) = \sin 4x$ on $[-\pi, \pi]$. Since there are 8 points where $f$ alternates
between $\pm 1$, it follows that $p_0^* = 0$ and that there are $4 \times 4 = 16$ different alternating
sets consisting of exactly 2 points (not to mention all those with more than 2 points). In
addition, notice that we actually have $p_1^* = \cdots = p_6^* = 0$, but that $p_7^* \ne 0$. (Why?)
Exercise
Show that $y = x - 1/8$ is the best linear approximation to $y = x^2$ on $[0,1]$.
Essentially repeating the proof given for Theorem 3 yields a lower bound for $E_n(f)$.

Theorem. Let $f \in C[a,b]$, and suppose that $q \in \mathcal{P}_n$ is such that $f(x_i) - q(x_i)$ alternates
in sign at $n+2$ points $a \le x_0 < x_1 < \cdots < x_{n+1} \le b$. Then,
$$E_n(f) \ge \min_{i=0,\ldots,n+1} |f(x_i) - q(x_i)|.$$

Proof. If the inequality fails, then the best approximation $p = p_n^*$ would satisfy
$$\max_{0 \le i \le n+1} |f(x_i) - p(x_i)| \le E_n(f) < \min_{0 \le i \le n+1} |f(x_i) - q(x_i)|.$$
Now we could repeat (essentially) the same argument used in the proof of Theorem 3 to
arrive at a contradiction. The details are left as an exercise.
Even for relatively simple functions, the problem of actually finding the polynomial
of best approximation is genuinely difficult (even computationally). We end this section
by stating two important problems that Chebyshev was able to solve.

Problem
Find the polynomial $p_{n-1}^* \in \mathcal{P}_{n-1}$, of degree at most $n-1$, that best approximates
$f(x) = x^n$ on the interval $[-1,1]$. (This particular choice of interval makes for a tidy
solution; we'll discuss the general situation later.)

Since $p_{n-1}^*$ is to minimize $\max_{|x| \le 1} |x^n - p_{n-1}(x)|$, our first problem is equivalent to:

Problem
Find the monic polynomial of degree $n$ which deviates least from 0 on $[-1,1]$. In other
words, find the monic polynomial of degree $n$ which has smallest norm in $C[-1,1]$.
We'll give two solutions to this problem (which we know has a unique solution, of
course). First, let's simplify our notation. We write
$$p(x) = x^n - p_{n-1}^*(x) \quad \text{(the solution)}$$
and
$$M = \|p\| = E_{n-1}(x^n;\, [-1,1]).$$
All we know about $p$ is that it has an alternating set $-1 \le x_0 < x_1 < \cdots < x_n \le 1$
containing $(n-1) + 2 = n+1$ points; that is, $|p(x_i)| = M$ and $p(x_{i+1}) = -p(x_i)$ for all $i$.
Using this tiny bit of information, Chebyshev was led to compare the polynomials $p^2$ and
$p'$. Watch closely!

Step 1. At any $x_i$ in $(-1,1)$, we must have $p'(x_i) = 0$ (because $p(x_i)$ is a relative extreme
value for $p$). But, $p'$ is a polynomial of degree $n-1$ and so can have at most $n-1$ zeros.
Thus, we must have
$$x_i \in (-1,1) \text{ and } p'(x_i) = 0 \quad \text{for } i = 1, \ldots, n-1$$
(in fact, $x_1, \ldots, x_{n-1}$ are all the zeros of $p'$), and
$$x_0 = -1, \quad p'(x_0) \ne 0; \qquad x_n = 1, \quad p'(x_n) \ne 0.$$
Step 2. Now consider the polynomial $M^2 - p^2 \in \mathcal{P}_{2n}$. We know that $M^2 - (p(x_i))^2 = 0$
for $i = 0, 1, \ldots, n$, and that $M^2 - p^2 \ge 0$ on $[-1,1]$. Thus, $x_1, \ldots, x_{n-1}$ must be double
roots (at least) of $M^2 - p^2$. But this makes for $2(n-1) + 2 = 2n$ roots already, so we
must have them all. Hence, $x_1, \ldots, x_{n-1}$ are double roots, $x_0$ and $x_n$ are simple roots, and
these are all the roots of $M^2 - p^2$.

Step 3. Next consider $(p')^2 \in \mathcal{P}_{2(n-1)}$. We know that $(p')^2$ has a double root at each
of $x_1, \ldots, x_{n-1}$ (and no other roots), hence $(1-x^2)(p'(x))^2$ has a double root at each of
$x_1, \ldots, x_{n-1}$, and simple roots at $x_0$ and $x_n$. Since $(1-x^2)(p'(x))^2 \in \mathcal{P}_{2n}$, we've found all
of its roots.
Here's the point to all this "rooting":

Step 4. Since $M^2 - (p(x))^2$ and $(1-x^2)(p'(x))^2$ are polynomials of the same degree with
the same roots, they are, up to a constant multiple, the same polynomial! It's easy to see
what constant, too: The leading coefficient of $p$ is 1, while the leading coefficient of $p'$ is
$n$; thus,
$$M^2 - (p(x))^2 = \frac{(1-x^2)(p'(x))^2}{n^2}.$$
After tidying up,
$$\frac{p'(x)}{\sqrt{M^2 - (p(x))^2}} = \frac{n}{\sqrt{1-x^2}}.$$
We really should have an extra $\pm$ here, but we know that $p'$ is positive on some interval;
we'll simply assume that it's positive on $[-1, x_1]$. Now, upon integrating,
$$\arccos\Big(\frac{p(x)}{M}\Big) = n \arccos x + C,$$
or
$$p(x) = M \cos(n \arccos x + C).$$
But $p(-1) = -M$ (because $p'(-1) \ge 0$), so
$$\cos(n\pi + C) = -1 \implies C = m\pi \ \ (\text{with } n + m \text{ odd})
\implies p(x) = \pm M \cos(n \arccos x)
\implies p(\cos x) = \pm M \cos nx.$$
Look familiar? Since we know that $\cos nx$ is a polynomial of degree $n$ in $\cos x$ with leading
coefficient $2^{n-1}$ (the $n$-th Chebyshev polynomial $T_n$), the solution to our problem must be
$$p(x) = 2^{-n+1}\, T_n(x).$$
Since $|T_n(x)| \le 1$ for $|x| \le 1$ (why?), the minimum norm is $M = 2^{-n+1}$.
Next we give a "fancy" solution, based on our characterization of best approximations
(Theorem 3) and a few simple properties of the Chebyshev polynomials.

Theorem. For any $n \ge 1$, the formula $p(x) = x^n - 2^{-n+1}\, T_n(x)$ defines a polynomial
$p \in \mathcal{P}_{n-1}$ satisfying
$$2^{-n+1} = \max_{|x| \le 1} |x^n - p(x)| < \max_{|x| \le 1} |x^n - q(x)|$$
for any other $q \in \mathcal{P}_{n-1}$.

Proof. We know that $2^{-n+1}\, T_n(x)$ has leading coefficient 1, and so $p \in \mathcal{P}_{n-1}$. Now set
$x_k = \cos((n-k)\pi/n)$ for $k = 0, 1, \ldots, n$. Then, $-1 = x_0 < x_1 < \cdots < x_n = 1$ and
$$T_n(x_k) = T_n(\cos((n-k)\pi/n)) = \cos((n-k)\pi) = (-1)^{n-k}.$$
Since $|T_n(x)| = |T_n(\cos\theta)| = |\cos n\theta| \le 1$ for $-1 \le x \le 1$, we've found an alternating set
for $T_n$ containing $n+1$ points.
In other words, $x^n - p(x) = 2^{-n+1}\, T_n(x)$ satisfies $|x^n - p(x)| \le 2^{-n+1}$ and, for each
$k = 0, 1, \ldots, n$, has $x_k^n - p(x_k) = 2^{-n+1}\, T_n(x_k) = (-1)^{n-k}\, 2^{-n+1}$. By our characterization
of best approximations (Theorem 3), $p$ must be the best approximation to $x^n$ out of
$\mathcal{P}_{n-1}$.
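A quick numeric sanity check (mine, not from the notes): the monic polynomial $2^{1-n} T_n$ has uniform norm exactly $2^{1-n}$ on $[-1,1]$, and other monic choices do far worse.

    import numpy as np
    from numpy.polynomial import chebyshev as C

    n = 6
    # Power-basis coefficients (ascending) of the monic polynomial 2^(1-n) T_n.
    monic_cheb = C.cheb2poly([0]*n + [2.0**(1 - n)])
    x = np.linspace(-1, 1, 10001)
    print(np.max(np.abs(np.polyval(monic_cheb[::-1], x))))   # 2^(1-n) = 0.03125

    # Compare with another monic polynomial of degree 6, say x^6:
    print(np.max(np.abs(x**6)))                              # 1, much bigger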
Corollary. The monic polynomial of degree exactly $n$ having smallest norm in $C[a,b]$ is
$$\frac{(b-a)^n}{2^{2n-1}}\, T_n\Big( \frac{2x - b - a}{b - a} \Big).$$

Proof. Exercise. [Hint: If $p(x)$ is a polynomial of degree $n$ with leading coefficient 1,
then $\tilde{p}(x) = p((2x - b - a)/(b - a))$ is a polynomial of degree $n$ with leading coefficient
$2^n/(b-a)^n$. Moreover, $\max_{a \le x \le b} |\tilde{p}(x)| = \max_{-1 \le x \le 1} |p(x)|$.]
Properties of the Chebyshev Polynomials
As we've seen, the Chebyshev polynomial $T_n(x)$ is the (unique, real) polynomial of degree
$n$ (having leading coefficient 1 if $n = 0$, and $2^{n-1}$ if $n \ge 1$) such that $T_n(\cos\theta) = \cos n\theta$
for all $\theta$. The Chebyshev polynomials have dozens of interesting properties and satisfy all
sorts of curious equations. We'll catalogue just a few.
C1. $T_n(x) = 2x\, T_{n-1}(x) - T_{n-2}(x)$ for $n \ge 2$.

Proof. It follows from the trig identity $\cos n\theta = 2\cos\theta \cos(n-1)\theta - \cos(n-2)\theta$ that
$T_n(\cos\theta) = 2\cos\theta\, T_{n-1}(\cos\theta) - T_{n-2}(\cos\theta)$ for all $\theta$; that is, the equation
$T_n(x) = 2x\, T_{n-1}(x) - T_{n-2}(x)$ holds for all $-1 \le x \le 1$. But since both sides are polynomials,
equality must hold for all $x$.
The next two properties are proved in essentially the same way:

C2. $T_m(x)\, T_n(x) = \frac{1}{2}\big[ T_{m+n}(x) + T_{m-n}(x) \big]$ for $m > n$.

C3. $T_m(T_n(x)) = T_{mn}(x)$.

C4. $T_n(x) = \frac{1}{2}\big[ \big(x + \sqrt{x^2-1}\,\big)^n + \big(x - \sqrt{x^2-1}\,\big)^n \big]$.

Proof. First notice that the expression on the right-hand side is actually a polynomial
since, on combining the binomial expansions of $(x + \sqrt{x^2-1}\,)^n$ and $(x - \sqrt{x^2-1}\,)^n$, the
odd powers of $\sqrt{x^2-1}$ cancel. Next, for $x = \cos\theta$,
$$T_n(x) = T_n(\cos\theta) = \cos n\theta = \tfrac{1}{2}\big(e^{in\theta} + e^{-in\theta}\big)
= \tfrac{1}{2}\big[ (\cos\theta + i\sin\theta)^n + (\cos\theta - i\sin\theta)^n \big]
= \tfrac{1}{2}\big[ \big(x + i\sqrt{1-x^2}\,\big)^n + \big(x - i\sqrt{1-x^2}\,\big)^n \big]
= \tfrac{1}{2}\big[ \big(x + \sqrt{x^2-1}\,\big)^n + \big(x - \sqrt{x^2-1}\,\big)^n \big].$$
We've shown that these two polynomials agree for $|x| \le 1$, hence they must agree for all $x$
(real or complex, for that matter).

For real $x$ with $|x| \ge 1$, the expression $\frac{1}{2}\big[ (x + \sqrt{x^2-1}\,)^n + (x - \sqrt{x^2-1}\,)^n \big]$ equals
$\cosh(n \cosh^{-1} x)$. In other words,

C5. $T_n(\cosh x) = \cosh nx$ for all real $x$.

The next property also follows from property C4.
C6. $|T_n(x)| \le \big( |x| + \sqrt{x^2-1}\, \big)^n$ for $|x| \ge 1$.

An approach similar to the proof of property C4 allows us to write $x^n$ in terms of the
Chebyshev polynomials $T_0, T_1, \ldots, T_n$.

C7. For $n$ odd, $2^n x^n = \sum_{k=0}^{[n/2]} \binom{n}{k}\, 2\, T_{n-2k}(x)$; for $n$ even, the final term $2\,T_0$ should be replaced by $T_0$.
Proof. For $-1 \le x \le 1$,
$$2^n x^n = 2^n (\cos\theta)^n = \big(e^{i\theta} + e^{-i\theta}\big)^n
= e^{in\theta} + \binom{n}{1} e^{i(n-2)\theta} + \binom{n}{2} e^{i(n-4)\theta} + \cdots
+ \binom{n}{n-2} e^{-i(n-4)\theta} + \binom{n}{n-1} e^{-i(n-2)\theta} + e^{-in\theta}$$
$$= 2\cos n\theta + \binom{n}{1}\, 2\cos(n-2)\theta + \binom{n}{2}\, 2\cos(n-4)\theta + \cdots
= 2\, T_n(x) + \binom{n}{1}\, 2\, T_{n-2}(x) + \binom{n}{2}\, 2\, T_{n-4}(x) + \cdots,$$
where, if $n$ is even, the last term in this last sum is $\binom{n}{n/2} T_0$ (since the central term in the
binomial expansion, namely $\binom{n}{n/2} = \binom{n}{n/2} T_0$, isn't doubled in this case).
C8. The zeros of $T_n$ are $x_k^{(n)} = \cos((2k-1)\pi/2n)$, $k = 1, \ldots, n$. They're real, simple, and
lie in the open interval $(-1,1)$.

Proof. Just check! But notice, please, that the zeros are listed here in decreasing order
(because cosine decreases).
C9. Between two consecutive zeros of $T_n$, there is precisely one root of $T_{n-1}$.

Proof. It's not hard to check that
$$\frac{2k-1}{2n}\,\pi < \frac{2k-1}{2(n-1)}\,\pi < \frac{2k+1}{2n}\,\pi$$
for $k = 1, \ldots, n-1$, which means that $x_k^{(n)} > x_k^{(n-1)} > x_{k+1}^{(n)}$.
C10. $T_n$ and $T_{n-1}$ have no common zeros.

Proof. Although this is immediate from property C9, there's another way to see it:
$T_n(x_0) = 0 = T_{n-1}(x_0)$ implies that $T_{n-2}(x_0) = 0$, by property C1. Repeating this
observation, we would have $T_k(x_0) = 0$ for every $k < n$, including $k = 0$. No good!
$T_0(x) = 1$ has no zeros.
C11. The set $\{\, x_k^{(n)} : 1 \le k \le n,\ n = 1, 2, \ldots \,\}$ is dense in $[-1,1]$.

Proof. Since $\cos x$ is (strictly) monotone on $[0,\pi]$, it's enough to know that the set
$\{(2k-1)\pi/2n\}_{k,n}$ is dense in $[0,\pi]$, and for this it's enough to know that $\{(2k-1)/2n\}_{k,n}$
is dense in $[0,1]$. (Why?) But
$$\frac{2k-1}{2n} = \frac{k}{n} - \frac{1}{2n} \approx \frac{k}{n}$$
for $n$ large; that is, the set $\{(2k-1)/2n\}_{k,n}$ is dense among the rationals in $[0,1]$.
It's interesting to note here that the distribution of the roots $\{x_k^{(n)}\}_{k,n}$ can be esti-
mated (see Natanson, Constructive Function Theory, Vol. I, pp. 48-51). For large $n$, the
number of roots of $T_n$ that lie in an interval $[\,x, x + \Delta x\,] \subseteq [-1,1]$ is approximately
$$\frac{n\,\Delta x}{\pi\sqrt{1-x^2}}.$$
In particular, for $n$ large, the roots of $T_n$ are "thickest" near the endpoints $\pm 1$.
In probabilistic terms, this means that if we assign equal probability to each of the
roots $x_1^{(n)}, \ldots, x_n^{(n)}$ (that is, if we think of each root as the position of a point with mass
$1/n$), then the density of this probability distribution (or the density of the system of point
masses) at a point $x$ is approximately $1/\pi\sqrt{1-x^2}$ for large $n$. In still other words, this
tells us that the probability that a root of $T_n$ lies in the interval $[a,b]$ is approximately
$$\frac{1}{\pi} \int_a^b \frac{dx}{\sqrt{1-x^2}}.$$
C12. The Chebyshev polynomials are mutually orthogonal relative to the weight
$w(x) = (1-x^2)^{-1/2}$ on $[-1,1]$.

Proof. For $m \ne n$, the substitution $x = \cos\theta$ yields
$$\int_{-1}^1 T_n(x)\, T_m(x)\, \frac{dx}{\sqrt{1-x^2}} = \int_0^\pi \cos m\theta \cos n\theta\, d\theta = 0,$$
while for $m = n$ we get
$$\int_{-1}^1 T_n^2(x)\, \frac{dx}{\sqrt{1-x^2}} = \int_0^\pi \cos^2 n\theta\, d\theta = \begin{cases} \pi & \text{if } n = 0, \\ \pi/2 & \text{if } n > 0. \end{cases}$$
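The same substitution gives a painless numeric check of C12 (a small sketch of mine):

    import numpy as np

    theta = np.linspace(0, np.pi, 100001)

    def inner(m, n):
        """Weighted inner product of T_m and T_n on [-1,1], computed as
        the integral of cos(m*theta)*cos(n*theta) over [0, pi]."""
        return np.trapz(np.cos(m * theta) * np.cos(n * theta), theta)

    print(inner(3, 5))   # ~ 0
    print(inner(4, 4))   # ~ pi/2
    print(inner(0, 0))   # ~ pi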
C13. $|T_n'(x)| \le n^2$ for $-1 \le x \le 1$, and $|T_n'(\pm 1)| = n^2$.

Proof. For $-1 < x < 1$ we have
$$\frac{d}{dx}\, T_n(x) = \frac{\dfrac{d}{d\theta}\, T_n(\cos\theta)}{\dfrac{d}{d\theta}\, \cos\theta} = \frac{n \sin n\theta}{\sin\theta}.$$
Thus, $|T_n'(x)| \le n^2$ because $|\sin n\theta| \le n\,|\sin\theta|$ (which can be easily checked by induction,
for example). At $x = \pm 1$, we interpret this derivative formula as a limit (as $\theta \to 0$ and
$\theta \to \pi$) and find that $|T_n'(\pm 1)| = n^2$.
As we'll see later, each $p \in \mathcal{P}_n$ satisfies $|p'(x)| \le \|p\|\, n^2 = \|p\|\, T_n'(1)$ for $-1 \le x \le 1$,
and this is, of course, best possible. As it happens, $T_n(x)$ has the largest possible rate of
growth outside of $[-1,1]$ among all polynomials of degree $n$. Specifically:

Theorem. Let $p \in \mathcal{P}_n$ and let $\|p\| = \max_{-1 \le x \le 1} |p(x)|$. Then, for any $x_0$ with $|x_0| \ge 1$ and
any $k = 0, 1, \ldots, n$, we have
$$|p^{(k)}(x_0)| \le \|p\|\, |T_n^{(k)}(x_0)|,$$
where $p^{(k)}$ is the $k$-th derivative of $p$.

We'll prove only the case $k = 0$. In other words, we'll check that $|p(x_0)| \le \|p\|\, |T_n(x_0)|$.
The more general case is in Rivlin, Theorem 1.10, p. 31.
Proof. Since all the zeros of $T_n$ lie in $(-1,1)$, we know that $T_n(x_0) \ne 0$. Thus, we may
consider the polynomial
$$q(x) = \frac{p(x_0)}{T_n(x_0)}\, T_n(x) - p(x) \in \mathcal{P}_n.$$
If the claim is false, then
$$\|p\| < \Big| \frac{p(x_0)}{T_n(x_0)} \Big|.$$
Now at each of the points $y_k = \cos(k\pi/n)$, $k = 0, 1, \ldots, n$, we have $T_n(y_k) = (-1)^k$ and,
hence,
$$q(y_k) = (-1)^k\, \frac{p(x_0)}{T_n(x_0)} - p(y_k).$$
Since $|p(y_k)| \le \|p\|$, it follows that $q$ alternates in sign at these $n+1$ points. In particular,
$q$ must have at least $n$ zeros in $(-1,1)$. But $q(x_0) = 0$, by design, and $|x_0| \ge 1$. That is,
we've found $n+1$ zeros for a polynomial of degree $n$. So, $q \equiv 0$; that is,
$$p(x) = \frac{p(x_0)}{T_n(x_0)}\, T_n(x).$$
But then,
$$|p(1)| = \Big| \frac{p(x_0)}{T_n(x_0)} \Big| > \|p\|,$$
since $T_n(1) = T_n(\cos 0) = 1$, which is a contradiction.
Corollary. Let $p \in \mathcal{P}_n$ and let $\|p\| = \max_{-1 \le x \le 1} |p(x)|$. Then, for any $x_0$ with $|x_0| \ge 1$, we
have
$$|p(x_0)| \le \|p\| \Big( |x_0| + \sqrt{x_0^2 - 1}\, \Big)^n.$$
Rivlin's proof of our last Theorem in the general case uses the following observation:

C14. For $x \ge 1$ and $k = 0, 1, \ldots, n$, we have $T_n^{(k)}(x) > 0$.

Proof. Exercise. [Hint: It follows from Rolle's theorem that $T_n^{(k)}$ is never zero for
$x \ge 1$. (Why?) Now just compute $T_n^{(k)}(1)$.]
Uniform Approximation by Trig Polynomials
We end this section by summarizing (without proofs) the analogues of Theorems 1-3
for uniform approximation by trig polynomials. Throughout, $f \in C^{2\pi}$ and $\mathcal{T}_n$ denotes the
collection of trig polynomials of degree at most $n$.
1. $f$ has a best approximation $T^* \in \mathcal{T}_n$.
2. $f - T^*$ has an alternating set containing $2n+2$ (or more) points in $[0, 2\pi)$. (Note here
that $2n+2 = 1 + \dim \mathcal{T}_n$.)
3. $T^*$ is unique.
4. If $T \in \mathcal{T}_n$ is such that $f - T$ has an alternating set containing $2n+2$ or more points
in $[0, 2\pi)$, then $T = T^*$.
The proofs of 1-4 are very similar to the corresponding results for algebraic polyno-
mials. As you might imagine, 2 is where all the fighting takes place, and there are a few
technical difficulties to cope with. Nevertheless, we'll swallow these facts whole and apply
them with a clear conscience to a few examples.
Example
For $m > n$, the best approximation to $f(x) = A\cos mx + B\sin mx$ out of $\mathcal{T}_n$ is 0!

Proof. We may write $f(x) = R\cos m(x - x_0)$ for some $R$ and $x_0$. (How?) Now we need
only display a sufficiently large alternating set for $f$ (in some interval of length $2\pi$).
Setting $x_k = x_0 + k\pi/m$, $k = 1, 2, \ldots, 2m$, we get $f(x_k) = R\cos k\pi = R\,(-1)^k$ and
$x_k \in (x_0, x_0 + 2\pi]$. Since $m > n$, it follows that $2m \ge 2n + 2$.
Example
The best approximation to
$$f(x) = a_0 + \sum_{k=1}^{n+1} \big( a_k \cos kx + b_k \sin kx \big)$$
out of $\mathcal{T}_n$ is
$$T^*(x) = a_0 + \sum_{k=1}^n \big( a_k \cos kx + b_k \sin kx \big),$$
and $\|f - T^*\| = \sqrt{a_{n+1}^2 + b_{n+1}^2}$ in $C^{2\pi}$.

Proof. By our last example, the best approximation to $f - T^*$ out of $\mathcal{T}_n$ is 0, hence $T^*$
must be the best approximation to $f$. (Why?) The last assertion is easy to check: Since
we can always write $A\cos mx + B\sin mx = \sqrt{A^2 + B^2}\, \cos m(x - x_0)$, for some $x_0$, it
follows that $\|f - T^*\| = \sqrt{a_{n+1}^2 + b_{n+1}^2}$.
Finally, let's make a simple connection between the two types of polynomial approxi-
mation:

Theorem. Let $f \in C[-1,1]$ and define $\varphi \in C^{2\pi}$ by $\varphi(\theta) = f(\cos\theta)$. Then,
$$E_n(f) = \min_{p \in \mathcal{P}_n} \|f - p\| = \min_{T \in \mathcal{T}_n} \|\varphi - T\| \equiv E_n^T(\varphi).$$

Proof. Suppose that $p^*(x) = \sum_{k=0}^n a_k x^k$ is the best approximation to $f$ out of $\mathcal{P}_n$.
Then, $\widehat{T}(\theta) = p^*(\cos\theta)$ is in $\mathcal{T}_n$ and, clearly,
$$\max_{-1 \le x \le 1} |f(x) - p^*(x)| = \max_{0 \le \theta \le 2\pi} |f(\cos\theta) - p^*(\cos\theta)|.$$
Thus, $\|f - p^*\| = \|\varphi - \widehat{T}\| \ge \min_{T \in \mathcal{T}_n} \|\varphi - T\|$.
On the other hand, since $\varphi$ is even, we know that $T^*$, its best approximation out of $\mathcal{T}_n$,
is also even. Thus, $T^*(\theta) = q(\cos\theta)$ for some algebraic polynomial $q \in \mathcal{P}_n$. Consequently,
$$\|\varphi - T^*\| = \|f - q\| \ge \min_{p \in \mathcal{P}_n} \|f - p\|.$$
Remarks
1. Once we know that $\min_{p \in \mathcal{P}_n} \|f - p\| = \min_{T \in \mathcal{T}_n} \|\varphi - T\|$, it follows that we must also have
$T^*(\theta) = p^*(\cos\theta)$.
2. Each even $\varphi \in C^{2\pi}$ corresponds to an $f \in C[-1,1]$ by $f(x) = \varphi(\arccos x)$ and, of
course, the conclusions of the Theorem and of Remark 1 hold in this case, too.
3. Whenever we speak of even trig polynomials, the Chebyshev polynomials are lurking
somewhere in the background. Indeed, let $T(\theta)$ be an even trig polynomial, write
$x = \cos\theta$, as usual, and consider the following cryptic equation:
$$T(\theta) = \sum_{k=0}^n a_k \cos k\theta = \sum_{k=0}^n a_k\, T_k(\cos\theta) = p(\cos\theta),$$
where $p(x) = \sum_{k=0}^n a_k\, T_k(x) \in \mathcal{P}_n$.
Math 682 Problem Set: Chebyshev Polynomials 5/28/98

We've shown that $\cos n\theta$ and $\sin(n+1)\theta/\sin\theta$ can be written as algebraic polynomials
of degree $n$ in $\cos\theta$; we use this observation to define the Chebyshev polynomials. The
Chebyshev polynomials of the first kind $(T_n(x))$ are defined by $T_n(\cos\theta) = \cos n\theta$, for
$n = 0, 1, 2, \ldots$, while the Chebyshev polynomials of the second kind $(U_n(x))$ are defined
by $U_n(\cos\theta) = \sin(n+1)\theta/\sin\theta$ for $n = 0, 1, 2, \ldots$.
▷ 44. Establish the following properties of $T_n(x)$.
(i) $T_0(x) = 1$, $T_1(x) = x$, and $T_n(x) = 2x\,T_{n-1}(x) - T_{n-2}(x)$ for $n \ge 2$.
(ii) $T_n(x)$ is a polynomial of degree $n$ having leading coefficient $2^{n-1}$ for $n \ge 1$, and
containing only even (resp., odd) powers of $x$ if $n$ is even (resp., odd).
(iii) $|T_n(x)| \le 1$ for $-1 \le x \le 1$; when does equality occur? Where are the zeros of
$T_n(x)$? Show that between two consecutive zeros of $T_n(x)$ there is exactly one
zero of $T_{n-1}(x)$. Can $T_n(x)$ and $T_{n-1}(x)$ have a common zero?
(iv) $|T_n'(x)| \le n^2$ for $-1 \le x \le 1$, and $|T_n'(\pm 1)| = n^2$.
(v) $T_m(x)\, T_n(x) = \frac{1}{2}\big[ T_{m+n}(x) + T_{m-n}(x) \big]$ for $m > n$.
(vi) $T_m(T_n(x)) = T_{mn}(x)$.
(vii) Evaluate $\displaystyle\int_{-1}^1 T_n(x)\, T_m(x)\, \frac{dx}{\sqrt{1-x^2}}$.
(viii) Show that $T_n$ is a solution to $(1-x^2)y'' - xy' + n^2 y = 0$.
(ix) $T_n(x) = \frac{1}{2}\big[ (x + \sqrt{x^2-1}\,)^n + (x - \sqrt{x^2-1}\,)^n \big]$ for any $x$, real or complex.
(x) $\mathrm{Re}\big( \sum_{n=0}^\infty t^n e^{in\theta} \big) = \sum_{n=0}^\infty t^n \cos n\theta = \dfrac{1 - t\cos\theta}{1 - 2t\cos\theta + t^2}$ for $-1 < t < 1$; that is,
$\sum_{n=0}^\infty t^n\, T_n(x) = \dfrac{1 - tx}{1 - 2tx + t^2}$ (this is a generating function for $T_n$; it's closely
related to the Poisson kernel).
(xi) Find analogues of (i)-(x) (if possible) for $U_n(x)$.
▷ 45. Show that every $p \in \mathcal{P}_n$ has a unique representation as $p = a_0 + a_1 T_1 + \cdots + a_n T_n$.
Find this representation in the case $p(x) = x^n$.
▷ 46. The polynomial of degree $n$ having leading coefficient 1 and deviating least from 0 on
$[-1,1]$ is given by $T_n(x)/2^{n-1}$. On an arbitrary interval $[a,b]$ we would instead take
$$\frac{(b-a)^n}{2^{2n-1}}\, T_n\Big( \frac{2x - b - a}{b - a} \Big).$$
Is this solution unique? Explain.
47. If $p$ is a polynomial on $[a,b]$ of degree $n$ having leading coefficient $a_n > 0$, then
$\|p\| \ge a_n (b-a)^n / 2^{2n-1}$. If $b - a \ge 4$, then no polynomial of degree exactly $n$ with
integer coefficients can satisfy $\|p\| < 2$ (compare this with problem 26 on the "Uniform
Approximation by Polynomials" problem set).
48. Given $p \in \mathcal{P}_n$, show that $|p(x)| \le \|p\|\, |T_n(x)|$ for $|x| > 1$.
49. If $p \in \mathcal{P}_n$ with $\|p\| = 1$ on $[-1,1]$, and if $|p(x_i)| = 1$ at $n+1$ distinct points $x_0, \ldots, x_n$
in $[-1,1]$, show that either $p = \pm 1$, or else $p = \pm T_n$. [Hint: One approach is to
compare the polynomials $1 - p^2$ and $(1-x^2)(p')^2$.]
50. Compute $T_n^{(k)}(\pm 1)$ for $k = 0, 1, \ldots, n$, where $T_n^{(k)}$ is the $k$-th derivative of $T_n$. For $x \ge 1$
and $k = 0, 1, \ldots, n$, show that $T_n^{(k)}(x) > 0$.
Math 682 Examples: Chebyshev Polynomials in Practice 5/28/98
The following examples are cribbed from the book Chebyshev Polynomials, by L. Fox and
I. B. Parker (Oxford University Press, 1968).
As we've seen, the Chebyshev polynomials can be generated by a recurrence relation. By
reversing the procedure, we could solve for $x^n$ in terms of $T_0, T_1, \ldots, T_n$ (we'll do this
calculation in class). Here are the first few terms in each of these relations:
$$T_0(x) = 1 \qquad\qquad 1 = T_0(x)$$
$$T_1(x) = x \qquad\qquad x = T_1(x)$$
$$T_2(x) = 2x^2 - 1 \qquad\qquad x^2 = (T_0(x) + T_2(x))/2$$
$$T_3(x) = 4x^3 - 3x \qquad\qquad x^3 = (3\,T_1(x) + T_3(x))/4$$
$$T_4(x) = 8x^4 - 8x^2 + 1 \qquad\qquad x^4 = (3\,T_0(x) + 4\,T_2(x) + T_4(x))/8$$
$$T_5(x) = 16x^5 - 20x^3 + 5x \qquad\qquad x^5 = (10\,T_1(x) + 5\,T_3(x) + T_5(x))/16$$
Note the separation of even and odd terms in each case. Writing ordinary, garden variety
polynomials in their equivalent Chebyshev form has some distinct advantages for numerical
computations. Here's why:
$$1 - x + x^2 - x^3 + x^4 = \frac{15}{8}\, T_0(x) - \frac{7}{4}\, T_1(x) + T_2(x) - \frac{1}{4}\, T_3(x) + \frac{1}{8}\, T_4(x)$$
(after some simplification). Now we see at once that we can get a cubic approximation to
$1 - x + x^2 - x^3 + x^4$ on $[-1,1]$ with error at most $1/8$ by simply dropping the $T_4$ term on
the right-hand side (since $|T_4(x)| \le 1$), whereas simply using $1 - x + x^2 - x^3$ as our cubic
approximation could cause an error as big as 1. Pretty slick! This gimmick of truncating
the equivalent Chebyshev form is called economization.
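numpy's polynomial module carries out these basis conversions, so economization is easy to experiment with (a sketch of mine, not from the notes):

    import numpy as np
    from numpy.polynomial import chebyshev as C

    # 1 - x + x^2 - x^3 + x^4, coefficients in ascending powers of x:
    p = [1, -1, 1, -1, 1]
    c = C.poly2cheb(p)
    print(c)   # [1.875, -1.75, 1.0, -0.25, 0.125], matching the display above

    economized = C.cheb2poly(c[:4])   # drop the T_4 term
    x = np.linspace(-1, 1, 2001)
    err = np.polyval(economized[::-1], x) - np.polyval(p[::-1], x)
    print(np.max(np.abs(err)))        # 0.125, exactly the dropped coefficient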
As a second example, we note that a polynomial with small norm on $[-1,1]$ may have
annoyingly large coefficients:
$$(1-x^2)^{10} = 1 - 10x^2 + 45x^4 - 120x^6 + 210x^8 - 252x^{10}
+ 210x^{12} - 120x^{14} + 45x^{16} - 10x^{18} + x^{20},$$
but in Chebyshev form (look out!):
$$(1-x^2)^{10} = \frac{1}{524288} \big[\, 92378\, T_0(x) - 167960\, T_2(x) + 125970\, T_4(x) - 77520\, T_6(x)$$
$$+\, 38760\, T_8(x) - 15504\, T_{10}(x) + 4845\, T_{12}(x) - 1140\, T_{14}(x)
+ 190\, T_{16}(x) - 20\, T_{18}(x) + T_{20}(x) \,\big].$$
The largest coefficient is now only about 0.3, and the omission of the last three terms
produces a maximum error of about 0.0004. Not bad.
P
As a last example, consider the Taylor polynomial ex = nk=0 xk =k! + xn+1e =(n + 1)!
(with remainder), where ;1  x,   1. Taking n = 6, the truncated series has error no
greater than e=7!  0:0005. But if we \economize" the rst six terms, then:
X
6
xk =k = 1:26606 T0(x) + 1:13021 T1(x) + 0:27148 T2(x) + 0:04427 T3(x)
k=0
+ 0:00547 T4(x) + 0:00052 T5(x) + 0:00004 T6(x):
The initial approximation already has an error of about 0:0005, so we can certainly drop
the T6 term without any additional error. Even dropping the T5 term causes an error of
no more than 0:001 (or thereabouts). The resulting approximation has a far smaller error
than the corresponding truncated Taylor series: e=5!  0:023.
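Again this is easy to reproduce numerically, using the same numpy conversion as above (my sketch; printed values should roughly match the table and error estimates quoted):

    import numpy as np
    from numpy.polynomial import chebyshev as C
    from math import factorial

    taylor = [1 / factorial(k) for k in range(7)]   # degree-6 Taylor polynomial of e^x
    print(np.round(C.poly2cheb(taylor), 5))         # compare with the table above

    x = np.linspace(-1, 1, 2001)
    for deg in [6, 5, 4]:
        approx = C.chebval(x, C.poly2cheb(taylor)[:deg + 1])
        print(deg, np.max(np.abs(np.exp(x) - approx)))   # uniform error after truncation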
The approach used in our last example has the decided disadvantage that we must first
decide where to truncate the Taylor series, which might converge very slowly. A better
approach would be to write $e^x$ as a series involving Chebyshev polynomials directly. That
is, if possible, we want to write $e^x = \sum_{k=0}^\infty a_k T_k(x)$. If the $a_k$'s are absolutely summable,
it will be very easy to estimate any truncation error. We'll get some idea on how to go
about this when we talk about "least-squares" approximation. As it happens, such a series
is easy to find (it's rather like a Fourier series), and its partial sums are remarkably good
uniform approximations.
Math 682 A Brief Introduction to Interpolation 6/2/98

Our goal in this section is to prove the following result (as well as discuss its ramifications).
In fact, this result is so fundamental that we will present three proofs!

Theorem. Let $x_0, x_1, \ldots, x_n$ be distinct points, and let $y_0, y_1, \ldots, y_n$ be arbitrary points
in $\mathbb{R}$. Then, there exists a unique polynomial $p \in \mathcal{P}_n$ satisfying $p(x_i) = y_i$, $i = 0, 1, \ldots, n$.
First notice that uniqueness is obvious. Indeed, if two polynomials $p, q \in \mathcal{P}_n$ agree at
$n+1$ points, then $p \equiv q$. (Why?) The real work comes in proving existence.

First Proof. (Vandermonde's determinant.) We seek $c_0, c_1, \ldots, c_n$ so that
$p(x) = \sum_{k=0}^n c_k x^k$ satisfies
$$p(x_i) = \sum_{k=0}^n c_k x_i^k = y_i, \quad i = 0, 1, \ldots, n.$$
That is, we need to solve a system of $n+1$ linear equations for the $c_i$'s. In matrix form:
$$\begin{bmatrix} 1 & x_0 & x_0^2 & \cdots & x_0^n \\ 1 & x_1 & x_1^2 & \cdots & x_1^n \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_n & x_n^2 & \cdots & x_n^n \end{bmatrix}
\begin{bmatrix} c_0 \\ c_1 \\ \vdots \\ c_n \end{bmatrix} = \begin{bmatrix} y_0 \\ y_1 \\ \vdots \\ y_n \end{bmatrix}.$$
This equation always has a unique solution because the coefficient matrix has determinant
$$D = \prod_{0 \le j < i \le n} (x_i - x_j) \ne 0.$$
($D$ is called Vandermonde's determinant; note that $D > 0$ if $x_0 < x_1 < \cdots < x_n$.) Since
this fact is of independent interest, we'll sketch a short proof below.
Lemma. $D = \prod_{0 \le j < i \le n} (x_i - x_j)$.

Proof. Consider
$$V(x_0, x_1, \ldots, x_{n-1}, x) = \begin{vmatrix} 1 & x_0 & x_0^2 & \cdots & x_0^n \\ 1 & x_1 & x_1^2 & \cdots & x_1^n \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n-1} & x_{n-1}^2 & \cdots & x_{n-1}^n \\ 1 & x & x^2 & \cdots & x^n \end{vmatrix}.$$
$V(x_0, x_1, \ldots, x_{n-1}, x)$ is a polynomial of degree $n$ in $x$, and it's 0 whenever $x = x_i$,
$i = 0, 1, \ldots, n-1$. Thus, $V(x_0, \ldots, x) = c \prod_{i=0}^{n-1} (x - x_i)$, by comparing roots and degree.
However, it's easy to see that the coefficient of $x^n$ in $V(x_0, \ldots, x)$ is $V(x_0, \ldots, x_{n-1})$.
Thus, $V(x_0, \ldots, x) = V(x_0, \ldots, x_{n-1}) \prod_{i=0}^{n-1} (x - x_i)$. The result now follows by induction
and the obvious case $\begin{vmatrix} 1 & x_0 \\ 1 & x_1 \end{vmatrix} = x_1 - x_0$.
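In floating point one can set up and solve this system directly (a quick sketch of mine; fine for small $n$, though the Vandermonde matrix becomes badly conditioned as $n$ grows):

    import numpy as np

    def interpolate_vandermonde(xs, ys):
        """Solve the Vandermonde system for the coefficients c_0, ..., c_n."""
        V = np.vander(xs, increasing=True)   # rows [1, x_i, x_i^2, ..., x_i^n]
        return np.linalg.solve(V, ys)

    c = interpolate_vandermonde(np.array([1.0, 2.0, 3.0]), np.array([2.0, -1.0, 1.0]))
    print(c)   # [10, -10.5, 2.5]: the quadratic from the worked example later in this section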
Second Proof. (Lagrange interpolation.) We could define $p$ immediately if we had
polynomials $\ell_i(x) \in \mathcal{P}_n$, $i = 0, \ldots, n$, such that $\ell_i(x_j) = \delta_{ij}$ (where $\delta_{ij}$ is Kronecker's
delta; that is, $\delta_{ij} = 0$ for $i \ne j$, and $\delta_{ij} = 1$ for $i = j$). Indeed, $p(x) = \sum_{i=0}^n y_i\, \ell_i(x)$
would then work as our interpolating polynomial. In short, notice that the polynomials
$\{\ell_0, \ell_1, \ldots, \ell_n\}$ would form a (particularly convenient) basis for $\mathcal{P}_n$.
We'll give two formulas for $\ell_i(x)$:
(a) Clearly, $\ell_i(x) = \prod_{j \ne i} \dfrac{x - x_j}{x_i - x_j}$ works.
(b) Start with $W(x) = (x - x_0)(x - x_1) \cdots (x - x_n)$, and notice that the polynomial we
need satisfies
$$\ell_i(x) = a_i \cdot \frac{W(x)}{x - x_i}$$
for some $a_i \in \mathbb{R}$. (Why?) But then, $1 = \ell_i(x_i) = a_i\, W'(x_i)$ (again, why?); that is,
$$\ell_i(x) = \frac{W(x)}{(x - x_i)\, W'(x_i)}.$$
Please note that $\ell_i(x)$ is a multiple of the polynomial $\prod_{j \ne i} (x - x_j)$, for $i = 0, \ldots, n$, and
that $p(x)$ is then a suitable linear combination of such polynomials.
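Formula (a) transcribes directly into code (my sketch):

    def lagrange_eval(xs, ys, x):
        """Evaluate the interpolating polynomial via the Lagrange basis l_i."""
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            li = 1.0
            for j, xj in enumerate(xs):
                if j != i:
                    li *= (x - xj) / (xi - xj)   # l_i(x) = prod_{j != i} (x - x_j)/(x_i - x_j)
            total += yi * li
        return total

    print([lagrange_eval([1, 2, 3], [2, -1, 1], t) for t in (1, 2, 3)])   # [2.0, -1.0, 1.0]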
Third Proof. (Newton's formula.) We seek $p(x)$ of the form
$$p(x) = a_0 + a_1(x - x_0) + a_2(x - x_0)(x - x_1) + \cdots + a_n(x - x_0) \cdots (x - x_{n-1}).$$
(Please note that $x_n$ does not appear on the right-hand side.) This form makes it almost
effortless to solve for the $a_i$'s by plugging in the $x_i$'s, $i = 0, \ldots, n$:
$$y_0 = p(x_0) = a_0,$$
$$y_1 = p(x_1) = a_0 + a_1(x_1 - x_0) \implies a_1 = \frac{y_1 - a_0}{x_1 - x_0}.$$
Continuing, we find
$$a_2 = \frac{y_2 - a_0 - a_1(x_2 - x_0)}{(x_2 - x_0)(x_2 - x_1)},$$
$$a_3 = \frac{y_3 - a_0 - a_1(x_3 - x_0) - a_2(x_3 - x_0)(x_3 - x_1)}{(x_3 - x_0)(x_3 - x_1)(x_3 - x_2)},$$
and so on. [Natanson, Vol. III, gives another formula for the $a_i$'s.]
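This triangular structure is exactly the divided-difference scheme; a compact sketch (mine, not from the notes):

    import numpy as np

    def newton_coeffs(xs, ys):
        """Divided-difference coefficients a_0, ..., a_n for Newton's form."""
        a = np.array(ys, dtype=float)
        n = len(xs)
        for j in range(1, n):
            # After pass j, a[i] holds the divided difference f[x_{i-j}, ..., x_i].
            a[j:] = (a[j:] - a[j - 1:-1]) / (np.array(xs[j:]) - np.array(xs[:-j]))
        return a

    print(newton_coeffs([1, 2, 3], [2, -1, 1]))   # [2, -3, 2.5], as in the example below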
Example
As a quick means of comparing these three solutions, let's find the interpolating polynomial (quadratic) passing through $(1, 2)$, $(2, -1)$, and $(3, 1)$. You're invited to check the following:
Vandermonde: $p(x) = 10 - \frac{21}{2}x + \frac{5}{2}x^2$.
Lagrange: $p(x) = (x-2)(x-3) + (x-1)(x-3) + \frac{1}{2}(x-1)(x-2)$.
Newton: $p(x) = 2 - 3(x-1) + \frac{5}{2}(x-1)(x-2)$.
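All three computations are easy to carry out numerically. Here is a minimal Python sketch (numpy is assumed, and the helper names lagrange_eval and newton_coeffs are ours, not library routines); it reproduces the example above, returning the monomial coefficients $(10, -21/2, 5/2)$ and the Newton coefficients $(2, -3, 5/2)$.

```python
import numpy as np

# Nodes and values from the example above.
xs = np.array([1.0, 2.0, 3.0])
ys = np.array([2.0, -1.0, 1.0])

# Vandermonde: solve the linear system for the monomial coefficients.
V = np.vander(xs, increasing=True)        # columns 1, x, x^2
coeffs = np.linalg.solve(V, ys)           # -> [10, -10.5, 2.5]

def lagrange_eval(x, xs, ys):
    """Evaluate p(x) = sum_i y_i * l_i(x) directly from formula (a)."""
    p = 0.0
    for i in range(len(xs)):
        li = 1.0
        for j in range(len(xs)):
            if j != i:
                li *= (x - xs[j]) / (xs[i] - xs[j])
        p += ys[i] * li
    return p

def newton_coeffs(xs, ys):
    """Divided-difference coefficients a_0, ..., a_n for Newton's form."""
    a = ys.astype(float).copy()
    for k in range(1, len(xs)):
        a[k:] = (a[k:] - a[k-1:-1]) / (xs[k:] - xs[:-k])
    return a                              # -> [2, -3, 2.5]

print(coeffs, lagrange_eval(1.5, xs, ys), newton_coeffs(xs, ys))
```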
As you might have already surmised, Lagrange's method is the easiest to apply by hand, although Newton's formula has much to recommend it too (it's especially well-suited to situations where we introduce additional nodes). We next set up the necessary notation to discuss the finer points of Lagrange's method.
Given $n+1$ distinct points $a \le x_0 < x_1 < \cdots < x_n \le b$ (sometimes called nodes), we first form the polynomials
$$W(x) = \prod_{i=0}^n (x - x_i)$$
and
$$\ell_i(x) = \prod_{j \ne i} \frac{x - x_j}{x_i - x_j} = \frac{W(x)}{(x - x_i)\, W'(x_i)}.$$
The Lagrange interpolation formula is
$$L_n(f)(x) = \sum_{i=0}^n f(x_i)\, \ell_i(x).$$
That is, $L_n(f)$ is the unique polynomial in $\mathcal{P}_n$ that agrees with $f$ at the $x_i$'s. In particular, notice that we must have $L_n(p) = p$ whenever $p \in \mathcal{P}_n$. In fact, $L_n$ is a linear projection from $C[a,b]$ onto $\mathcal{P}_n$. [Why is $L_n(f)$ linear in $f$?]
Typically we're given (or construct) an array of nodes:
$$X : \quad \begin{matrix} x_0^{(0)} & & \\ x_0^{(1)} & x_1^{(1)} & \\ x_0^{(2)} & x_1^{(2)} & x_2^{(2)} \\ \vdots & & \ddots \end{matrix}$$
and form the corresponding sequence of projections
$$L_n(f)(x) = \sum_{i=0}^n f\bigl(x_i^{(n)}\bigr)\, \ell_i^{(n)}(x).$$
An easy (but admittedly pointless) observation is that for a given $f \in C[a,b]$ we can always find an array $X$ so that $L_n(f) = p_n^*$, the polynomial of best approximation to $f$ out of $\mathcal{P}_n$ (since $f - p_n^*$ has $n+1$ zeros, we may use these for the $x_i$'s). Thus, $\|L_n(f) - f\| = E_n(f) \to 0$ in this case. However, the problem of convergence changes character dramatically if we first choose $X$ and then consider $L_n(f)$. In general, there's no reason to believe that $L_n(f)$ converges to $f$. In fact, quite the opposite is true:
Theorem. (Faber, 1914) Given any array of nodes $X$ in $[a,b]$, there is some $f \in C[a,b]$ for which $\|L_n(f) - f\|$ is unbounded.
The problem here has little to do with interpolation and everything to do with projections:
Theorem. (Kharshiladze, Lozinski, 1941) For each $n$, let $L_n$ be a continuous, linear projection from $C[a,b]$ onto $\mathcal{P}_n$. Then there is some $f \in C[a,b]$ for which $\|L_n(f) - f\|$ is unbounded.
Evidently, the operators $L_n$ aren't positive (monotone), for otherwise the Bohman-Korovkin theorem (and the fact that $L_n$ is a projection onto $\mathcal{P}_n$) would imply that $L_n(f)$ converges uniformly to $f$ for every $f \in C[a,b]$.
The proofs of these theorems are long and difficult; we'll save them for another day. (Some of you may recognize the Principle of Uniform Boundedness at work here.) The real point here is that we can't have everything: A positive result about convergence of interpolation will require that we impose some extra conditions on the functions $f$ we want to approximate. As a first step in this direction, we prove that if $f$ has sufficiently many derivatives, then the error $\|L_n(f) - f\|$ can at least be measured.
Theorem. Suppose that $f$ has $n+1$ continuous derivatives on $[a,b]$. Let $a \le x_0 < x_1 < \cdots < x_n \le b$, let $p \in \mathcal{P}_n$ be the polynomial that interpolates $f$ at the $x_i$'s, and let $W(x) = \prod_{i=0}^n (x - x_i)$. Then,
$$\|f - p\| \le \frac{1}{(n+1)!}\, \|f^{(n+1)}\|\, \|W\|.$$
Proof. We'll prove the Theorem by showing that, given $x$ in $[a,b]$, there is a $\xi$ in $(a,b)$ with
$$f(x) - p(x) = \frac{1}{(n+1)!}\, f^{(n+1)}(\xi)\, W(x). \qquad (*)$$
If $x$ is one of the $x_i$'s, then both sides of this formula are $0$ and we're done. Otherwise, $W(x) \ne 0$ and we may set $\lambda = [f(x) - p(x)]/W(x)$. Now consider
$$\varphi(t) = f(t) - p(t) - \lambda\, W(t).$$
Clearly, $\varphi(x_i) = 0$ for each $i = 0, 1, \ldots, n$ and, by our choice of $\lambda$, we also have $\varphi(x) = 0$. Here comes Rolle's theorem! Since $\varphi$ has $n+2$ distinct zeros in $[a,b]$, we must have $\varphi^{(n+1)}(\xi) = 0$ for some $\xi$ in $(a,b)$. (Why?) Hence,
$$0 = \varphi^{(n+1)}(\xi) = f^{(n+1)}(\xi) - p^{(n+1)}(\xi) - \lambda\, W^{(n+1)}(\xi) = f^{(n+1)}(\xi) - \frac{f(x) - p(x)}{W(x)}\,(n+1)!$$
because $p$ has degree at most $n$ and $W$ is monic of degree $n+1$.
Observations
1. Equation $(*)$ is called the Lagrange formula with remainder. [Compare this result to Taylor's formula with remainder.]
2. The term $f^{(n+1)}(\xi)$ is actually a continuous function of $x$. That is, $[f(x) - p(x)]/W(x)$ is continuous; its value at an $x_i$ is $[f'(x_i) - p'(x_i)]/W'(x_i)$ (why?) and $W'(x_i) = \prod_{j \ne i} (x_i - x_j) \ne 0$.
3. On any interval $[a,b]$, using any nodes, the sequence of Lagrange interpolating polynomials for $e^x$ converges uniformly to $e^x$. In this case,
$$\|e^x - L_n(e^x)\| \le \frac{c}{(n+1)!}\,(b-a)^{n+1}$$
where $c = \|e^x\|$ in $C[a,b]$. The same would hold true for any infinitely differentiable function satisfying, say, $\|f^{(n)}\| \le M^n$ (any entire function, for example).
4. On $[-1,1]$, the norm of $\prod_{i=1}^n (x - x_i)$ is minimized by taking $x_i = \cos((2i-1)\pi/2n)$, the zeros of the $n$-th Chebyshev polynomial $T_n$. (Why?) As Rivlin points out, the zeros of the Chebyshev polynomials are a nearly optimal choice for the nodes if good uniform approximation is desired.
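Observation 4 is easy to test numerically. The sketch below (Python with numpy assumed; lagrange_interp is our own helper, and Runge's function $1/(1 + 25x^2)$ is the standard textbook test case, not an example from these notes) compares equally spaced nodes with Chebyshev nodes; the equally spaced errors grow with $n$ while the Chebyshev errors decay.

```python
import numpy as np

def lagrange_interp(xs, fvals, t):
    """Evaluate the Lagrange interpolant through (xs, fvals) at points t."""
    p = np.zeros_like(t)
    for i in range(len(xs)):
        li = np.ones_like(t)
        for j in range(len(xs)):
            if j != i:
                li *= (t - xs[j]) / (xs[i] - xs[j])
        p += fvals[i] * li
    return p

f = lambda x: 1.0 / (1.0 + 25.0 * x**2)   # Runge's example
t = np.linspace(-1, 1, 2001)
for n in (5, 10, 20):
    eq = np.linspace(-1, 1, n + 1)
    # zeros of T_{n+1}, i.e., n+1 Chebyshev nodes for degree-n interpolation
    ch = np.cos((2 * np.arange(1, n + 2) - 1) * np.pi / (2 * (n + 1)))
    err_eq = np.max(np.abs(f(t) - lagrange_interp(eq, f(eq), t)))
    err_ch = np.max(np.abs(f(t) - lagrange_interp(ch, f(ch), t)))
    print(n, err_eq, err_ch)   # equispaced error blows up; Chebyshev decays
```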
The question of convergence of interpolation is actually very closely related to the analogous question for the convergence of Fourier series, and the answer here is nearly the same. We'll have more to say about this analogy later. First, let's note that $L_n$ is continuous (bounded); this will give us our first bit of insight into Faber's negative result.
Lemma. $\|L_n(f)\| \le \|f\| \cdot \bigl\| \sum_{i=0}^n |\ell_i(x)| \bigr\|$ for any $f \in C[a,b]$.
Proof. Exercise.
The numbers $\Lambda_n = \bigl\| \sum_{i=0}^n |\ell_i(x)| \bigr\|$ are called the Lebesgue numbers associated to this process. It's not hard to see that $\Lambda_n$ is the smallest possible constant that will work in this inequality (in other words, $\|L_n\| = \Lambda_n$). Indeed, if
$$\Bigl\| \sum_{i=0}^n |\ell_i(x)| \Bigr\| = \sum_{i=0}^n |\ell_i(x_0)|,$$
then we can find an $f \in C[a,b]$ with $\|f\| = 1$ and $f(x_i) = \mathrm{sgn}(\ell_i(x_0))$ for all $i$. (How?) Then,
$$\|L_n(f)\| \ge |L_n(f)(x_0)| = \Bigl| \sum_{i=0}^n \mathrm{sgn}(\ell_i(x_0))\, \ell_i(x_0) \Bigr| = \sum_{i=0}^n |\ell_i(x_0)| = \Lambda_n \|f\|.$$
As it happens, $\Lambda_n \ge c \log n$ (this is where the hard work comes in; see Rivlin or Natanson for further details), and, in particular, $\Lambda_n \to \infty$ as $n \to \infty$.
A simple application of the triangle inequality will allow us to bring $E_n(f)$ back into the picture:
Lemma. (Lebesgue's theorem) $\|f - L_n(f)\| \le (1 + \Lambda_n)\, E_n(f)$, for any $f \in C[a,b]$.
Proof. Let $p^*$ be the best approximation to $f$ out of $\mathcal{P}_n$. Then, since $L_n(p^*) = p^*$, we have
$$\|f - L_n(f)\| \le \|f - p^*\| + \|L_n(f - p^*)\| \le (1 + \Lambda_n)\, \|f - p^*\| = (1 + \Lambda_n)\, E_n(f).$$
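Since $\Lambda_n$ is just the sup of $\sum_i |\ell_i(x)|$, it is easy to estimate on a fine grid. Here is a minimal sketch (numpy assumed; lebesgue_constant is our own helper name). The printed values illustrate the point behind Faber's theorem: for equally spaced nodes $\Lambda_n$ grows rapidly, while for Chebyshev nodes it grows only logarithmically.

```python
import numpy as np

def lebesgue_constant(xs, t):
    """Estimate Lambda_n = max_t sum_i |l_i(t)| on a fine grid t."""
    s = np.zeros_like(t)
    for i in range(len(xs)):
        li = np.ones_like(t)
        for j in range(len(xs)):
            if j != i:
                li *= (t - xs[j]) / (xs[i] - xs[j])
        s += np.abs(li)
    return s.max()

t = np.linspace(-1, 1, 5001)
for n in (4, 8, 16):
    eq = np.linspace(-1, 1, n + 1)
    ch = np.cos((2 * np.arange(1, n + 2) - 1) * np.pi / (2 * (n + 1)))
    print(n, lebesgue_constant(eq, t), lebesgue_constant(ch, t))
# Chebyshev nodes give logarithmic growth; equispaced nodes grow much faster.
```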
Appendix
Although we won't need anything quite so fancy, it is of some interest to discuss more general problems of interpolation. We again suppose that we are given distinct points $x_0 < \cdots < x_n$ in $[a,b]$, but now we suppose that we are given an array of information
$$\begin{matrix} y_0 & y_0' & y_0'' & \ldots & y_0^{(m_0)} \\ y_1 & y_1' & y_1'' & \ldots & y_1^{(m_1)} \\ \vdots & \vdots & \vdots & & \vdots \\ y_n & y_n' & y_n'' & \ldots & y_n^{(m_n)} \end{matrix}$$
where each $m_i$ is a nonnegative integer. Our problem is to find the polynomial $p$ of least degree that incorporates all of this data by satisfying
$$p(x_i) = y_i, \quad p'(x_i) = y_i', \quad \ldots, \quad p^{(m_i)}(x_i) = y_i^{(m_i)}, \qquad i = 0, 1, \ldots, n.$$
In other words, we specify not only the value of $p$ at each $x_i$, but also the first $m_i$ derivatives of $p$ at $x_i$. This is often referred to as the problem of Hermite interpolation.
Since the problem has a total of $m_0 + m_1 + \cdots + m_n + n + 1$ "degrees of freedom," it won't come as any surprise that it has a (unique) solution $p$ of degree (at most) $N = m_0 + m_1 + \cdots + m_n + n$. Rather than discuss this particular problem any further, let's instead discuss the general problem of linear interpolation.
The notational framework for our problem is an $n$-dimensional vector space $X$ on which $m$ linear, real-valued functions (or linear functionals) $L_1, \ldots, L_m$ are defined. The general problem of linear interpolation asks whether the system of equations
$$L_i(f) = y_i, \qquad i = 1, \ldots, m \qquad (*)$$
has a (unique) solution $f \in X$ for any given set of scalars $y_1, \ldots, y_m \in \mathbb{R}$. Since a linear functional is completely determined by its values on any basis, we would next be led to consider a basis $f_1, \ldots, f_n$ for $X$, and from here it is a small step to rewrite $(*)$ as a matrix equation. That is, we seek a solution $f = a_1 f_1 + \cdots + a_n f_n$ satisfying
$$\begin{aligned} a_1 L_1(f_1) + \cdots + a_n L_1(f_n) &= y_1 \\ a_1 L_2(f_1) + \cdots + a_n L_2(f_n) &= y_2 \\ &\;\;\vdots \\ a_1 L_m(f_1) + \cdots + a_n L_m(f_n) &= y_m. \end{aligned}$$
If we are to guarantee a solution $a_1, \ldots, a_n$ for each choice of $y_1, \ldots, y_m$, then we'll need to have $m = n$ and, moreover, the matrix $[L_i(f_j)]$ will have to be nonsingular.
Lemma. Let $X$ be an $n$-dimensional vector space with basis $f_1, \ldots, f_n$, and let $L_1, \ldots, L_n$ be linear functionals on $X$. Then $L_1, \ldots, L_n$ are linearly independent if and only if the matrix $[L_i(f_j)]$ is nonsingular; that is, if and only if $\det\bigl(L_i(f_j)\bigr) \ne 0$.
Proof. If $[L_i(f_j)]$ is singular, then the system of equations
$$\begin{aligned} c_1 L_1(f_1) + \cdots + c_n L_n(f_1) &= 0 \\ c_1 L_1(f_2) + \cdots + c_n L_n(f_2) &= 0 \\ &\;\;\vdots \\ c_1 L_1(f_n) + \cdots + c_n L_n(f_n) &= 0 \end{aligned}$$
has a nontrivial solution $c_1, \ldots, c_n$. Thus, the functional $c_1 L_1 + \cdots + c_n L_n$ satisfies
$$(c_1 L_1 + \cdots + c_n L_n)(f_i) = 0, \qquad i = 1, \ldots, n.$$
Since $f_1, \ldots, f_n$ form a basis for $X$, this means that
$$(c_1 L_1 + \cdots + c_n L_n)(f) = 0$$
for all $f \in X$. That is, $c_1 L_1 + \cdots + c_n L_n = 0$ (the zero functional), and so $L_1, \ldots, L_n$ are linearly dependent.
Conversely, if $L_1, \ldots, L_n$ are linearly dependent, just reverse the steps in the first part of the proof to see that $[L_i(f_j)]$ is singular.
Theorem. Let $X$ be an $n$-dimensional vector space and let $L_1, \ldots, L_n$ be linear functionals on $X$. Then the interpolation problem
$$L_i(f) = y_i, \qquad i = 1, \ldots, n \qquad (*)$$
always has a (unique) solution $f \in X$ for any choice of scalars $y_1, \ldots, y_n$ if and only if $L_1, \ldots, L_n$ are linearly independent.
Proof. Let $f_1, \ldots, f_n$ be a basis for $X$. Then, $(*)$ is equivalent to the system of equations
$$\begin{aligned} a_1 L_1(f_1) + \cdots + a_n L_1(f_n) &= y_1 \\ a_1 L_2(f_1) + \cdots + a_n L_2(f_n) &= y_2 \\ &\;\;\vdots \\ a_1 L_n(f_1) + \cdots + a_n L_n(f_n) &= y_n \end{aligned} \qquad (**)$$
by taking $f = a_1 f_1 + \cdots + a_n f_n$. Thus, $(*)$ always has a solution if and only if $(**)$ always has a solution, if and only if $[L_i(f_j)]$ is nonsingular, if and only if $L_1, \ldots, L_n$ are linearly independent. In any of these cases, note that the solution must be unique.
In the case of Lagrange interpolation, $X = \mathcal{P}_n$ and $L_i$ is evaluation at $x_i$; i.e., $L_i(f) = f(x_i)$, which is easily seen to be linear in $f$. Moreover, $L_0, \ldots, L_n$ are linearly independent provided that $x_0, \ldots, x_n$ are distinct. (Why?)
In the case of Hermite interpolation, the linear functionals are of the form $L_{x,k}(f) = f^{(k)}(x)$, differentiation composed with a point evaluation. If $x \ne y$, then $L_{x,k}$ and $L_{y,m}$ are independent for any $k$ and $m$; if $k \ne m$, then $L_{x,k}$ and $L_{x,m}$ are independent. (How would you check this?)
Math 682 Problem Set: Lagrange Interpolation 6/2/98
Throughout, $x_0, x_1, \ldots, x_n$ are distinct points in some interval $[a,b]$, and $V(x_0, x_1, \ldots, x_n)$ denotes the Vandermonde determinant:
$$V(x_0, x_1, \ldots, x_n) = \begin{vmatrix} 1 & x_0 & x_0^2 & \cdots & x_0^n \\ 1 & x_1 & x_1^2 & \cdots & x_1^n \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_n & x_n^2 & \cdots & x_n^n \end{vmatrix}.$$
51. Show by induction that $V(x_0, x_1, \ldots, x_n) = \prod_{0 \le j < i \le n} (x_i - x_j)$.
[Hint: In order to reduce to the $n \times n$ case, replace $c_j$, the $j$-th column, by $c_j - x_0 c_{j-1}$, starting on the right with $j = n$. Factor and use the induction hypothesis.]
52. Let $y_0, y_1, \ldots, y_n \in \mathbb{R}$ be given. Show that the polynomial $p \in \mathcal{P}_n$ satisfying $p(x_i) = y_i$, $i = 0, 1, \ldots, n$, may be written as
$$p(x) = c \cdot \begin{vmatrix} 0 & 1 & x & x^2 & \cdots & x^n \\ y_0 & 1 & x_0 & x_0^2 & \cdots & x_0^n \\ \vdots & \vdots & \vdots & \vdots & & \vdots \\ y_n & 1 & x_n & x_n^2 & \cdots & x_n^n \end{vmatrix}$$
where $c$ is a certain constant. Find $c$ and prove the formula.
53. Given $f \in C[a,b]$, let $L_n(f)$ denote the polynomial of degree at most $n$ that agrees with $f$ at the $x_i$'s. Prove that $L_n$ is a linear projection onto $\mathcal{P}_n$. That is, show that $L_n(f + g) = L_n(f) + L_n(g)$, and that $L_n(f) = f$ if and only if $f \in \mathcal{P}_n$.
54. Let $\ell_i(x)$, $i = 0, 1, \ldots, n$, denote the Lagrange interpolating polynomials of degree at most $n$ associated with the nodes $x_0, x_1, \ldots, x_n$; that is, $\ell_i(x_j) = \delta_{ij}$. Show that $\sum_{i=0}^n \ell_i(x) \equiv 1$ and, more generally, that $\sum_{i=0}^n x_i^k\, \ell_i(x) = x^k$, for $k = 0, 1, \ldots, n$.
55. If $\ell_i$ and $L_n$ are as above, show that the error in the Lagrange interpolation formula is $(L_n(f) - f)(x) = \sum_{i=0}^n [\,f(x_i) - f(x)\,]\, \ell_i(x)$.
56. With $\ell_i$ and $L_n$ as above, show that $\|L_n(f)\| \le \Lambda_n \|f\|$, where $\Lambda_n = \bigl\| \sum_{i=0}^n |\ell_i(x)| \bigr\|$. Show that no smaller number $\Lambda$ has this property for all $f \in C[a,b]$.
Math 682 Approximation on Finite Sets 6/3/98
In this section we consider a question of computational interest: Since best approximations are often very hard to find, how might we approximate the best approximation? The answer to this question lies in approximations over finite sets. Here's the plan:
(1) Fix a finite subset $X_m$ of $[a,b]$ consisting of $m$ distinct points $a \le x_1 < \cdots < x_m \le b$, and find the best approximation to $f$ out of $\mathcal{P}_n$ considered as a subspace of $C(X_m)$. In other words, if we call the best approximation $p_n(X_m)$, then
$$\max_{1 \le i \le m} |f(x_i) - p_n(X_m)(x_i)| = \min_{p \in \mathcal{P}_n} \max_{1 \le i \le m} |f(x_i) - p(x_i)| \equiv E_n(f; X_m).$$
(2) Argue that this process converges (in some sense) to the best approximation on all of $[a,b]$ provided that $X_m$ "gets big" as $m \to \infty$. In actual practice, there's no need to worry about $p_n(X_m)$ converging to $p_n$ (the best approximation on all of $[a,b]$); rather, we will argue that $E_n(f; X_m) \to E_n(f)$ and appeal to "abstract nonsense."
(3) Find an efficient strategy for carrying out items (1) and (2).
Observations
1. If $m \le n+1$, then $E_n(f; X_m) = 0$. That is, we can always find a polynomial $p \in \mathcal{P}_n$ that agrees with $f$ at $n+1$ (or fewer) points. (How?) Of course, $p$ won't be unique if $m < n+1$. (Why?) In any case, we might as well assume that $m \ge n+2$. In fact, as we'll see, the case $m = n+2$ is all that we really need to worry about.
2. If $X \subset Y \subset [a,b]$, then $E_n(f; X) \le E_n(f; Y) \le E_n(f)$. Indeed, if $p \in \mathcal{P}_n$ is the best approximation on $Y$, then
$$E_n(f; X) \le \max_{x \in X} |f(x) - p(x)| \le \max_{x \in Y} |f(x) - p(x)| = E_n(f; Y).$$
Consequently, we expect $E_n(f; X_m)$ to increase to $E_n(f)$ as $X_m$ "gets big."
Now if we were to repeat our earlier work on characterizing best approximations, restricting ourselves to $X_m$ everywhere, here's what we'd get:
Theorem. Let $m \ge n+2$. Then,
(i) $p \in \mathcal{P}_n$ is a best approximation to $f$ on $X_m$ if and only if $f - p$ has an alternating set containing $n+2$ points out of $X_m$; that is, $f - p = \pm E_n(f; X_m)$, alternately, on $X_m$.
(ii) $p_n(X_m)$ is unique.
Next let's see how this reduces our study to the case $m = n+2$.
Theorem. Fix $n$, $m \ge n+2$, and $f \in C[a,b]$.
(i) If $p_n \in \mathcal{P}_n$ is best on all of $[a,b]$, then there is a subset $X_{n+2}^*$ of $[a,b]$, containing $n+2$ points, such that $p_n = p_n(X_{n+2}^*)$. Moreover, $E_n(f; X_{n+2}) \le E_n(f) = E_n(f; X_{n+2}^*)$ for any other subset $X_{n+2}$ of $[a,b]$, with equality if and only if $p_n(X_{n+2}) = p_n$.
(ii) If $p_n(X_m) \in \mathcal{P}_n$ is best on $X_m$, then there is a subset $X_{n+2}^*$ of $X_m$ such that $p_n(X_m) = p_n(X_{n+2}^*)$ and $E_n(f; X_m) = E_n(f; X_{n+2}^*)$. For any other $X_{n+2} \subset X_m$ we have $E_n(f; X_{n+2}) \le E_n(f; X_{n+2}^*) = E_n(f; X_m)$, with equality if and only if $p_n(X_{n+2}) = p_n(X_m)$.
Proof. (i): Let $X_{n+2}^*$ be an alternating set for $f - p_n$ over $[a,b]$ containing exactly $n+2$ points. Then, $X_{n+2}^*$ is also an alternating set for $f - p_n$ over $X_{n+2}^*$. That is, for $x \in X_{n+2}^*$,
$$\pm\bigl(f(x) - p_n(x)\bigr) = E_n(f) = \max_{y \in X_{n+2}^*} |f(y) - p_n(y)|.$$
So, by uniqueness of best approximations on $X_{n+2}^*$, we must have $p_n = p_n(X_{n+2}^*)$ and $E_n(f) = E_n(f; X_{n+2}^*)$. The second assertion follows from a similar argument using the uniqueness of $p_n$ on $[a,b]$.
(ii): This is just (i) with $[a,b]$ replaced everywhere by $X_m$.
Here's the point: Through some as yet undisclosed method, we choose $X_m$ with $m \ge n+2$ (in fact, $m \gg n+2$) such that $E_n(f; X_m) \le E_n(f) \le E_n(f; X_m) + \varepsilon$, and then we search for the "best" $X_{n+2} \subset X_m$, meaning the largest value of $E_n(f; X_{n+2})$. We then take $p_n(X_{n+2})$ as an approximation for $p_n$. As we'll see momentarily, $p_n(X_{n+2})$ can be computed directly and explicitly.
Now suppose that the elements of $X_{n+2}$ are $a \le x_0 < x_1 < \cdots < x_{n+1} \le b$, let $p = p_n(X_{n+2})$ be $p(x) = a_0 + a_1 x + \cdots + a_n x^n$, and let
$$E = E_n(f; X_{n+2}) = \max_{0 \le i \le n+1} |f(x_i) - p(x_i)|.$$
In order to compute $p$ and $E$, we use the fact that $f(x_i) - p(x_i) = \pm E$, alternately, and write (for instance)
$$\begin{aligned} f(x_0) &= E + p(x_0) \\ f(x_1) &= -E + p(x_1) \\ &\;\;\vdots \\ f(x_{n+1}) &= (-1)^{n+1} E + p(x_{n+1}) \end{aligned}$$
(where the "$E$ column" might, instead, read $-E$, $E$, $\ldots$, $(-1)^n E$). That is, in order to find $p$ and $E$, we need to solve a system of $n+2$ linear equations in the $n+2$ unknowns $E, a_0, \ldots, a_n$. The determinant of this system is (up to sign)
$$\begin{vmatrix} 1 & 1 & x_0 & \cdots & x_0^n \\ -1 & 1 & x_1 & \cdots & x_1^n \\ \vdots & \vdots & \vdots & & \vdots \\ (-1)^{n+1} & 1 & x_{n+1} & \cdots & x_{n+1}^n \end{vmatrix} = A_0 + A_1 + \cdots + A_{n+1} > 0,$$
where we have expanded by cofactors along the first column and have used the fact that each minor $A_k$ is a Vandermonde determinant (and hence each $A_k > 0$). If we apply Cramer's rule to find $E$ we get
$$E = \frac{f(x_0) A_0 - f(x_1) A_1 + \cdots + (-1)^{n+1} f(x_{n+1}) A_{n+1}}{A_0 + A_1 + \cdots + A_{n+1}} = \lambda_0 f(x_0) - \lambda_1 f(x_1) + \cdots + (-1)^{n+1} \lambda_{n+1} f(x_{n+1}),$$
where $\lambda_i > 0$ and $\sum_{i=0}^{n+1} \lambda_i = 1$. Moreover, these same $\lambda_i$'s satisfy $\sum_{i=0}^{n+1} (-1)^i \lambda_i\, q(x_i) = 0$ for every polynomial $q \in \mathcal{P}_n$, since $E = E_n(q; X_{n+2}) = 0$ for polynomials of degree at most $n$ (and since Cramer's rule supplies the same coefficients for all $f$'s).
It may be instructive to see a more explicit solution to this problem. For this, recall that since we have $n+2$ points we may interpolate exactly out of $\mathcal{P}_{n+1}$. Given this, our original problem can be rephrased quite succinctly.
Let $p$ be the (unique) polynomial in $\mathcal{P}_{n+1}$ satisfying $p(x_i) = f(x_i)$, $i = 0, 1, \ldots, n+1$, and let $e$ be the (unique) polynomial in $\mathcal{P}_{n+1}$ satisfying $e(x_i) = (-1)^i$, $i = 0, 1, \ldots, n+1$. If it is possible to find a scalar $\lambda$ so that $p - \lambda e \in \mathcal{P}_n$, then $p - \lambda e = p_n(X_{n+2})$ and $|\lambda| = E_n(f; X_{n+2})$. Why? Because $f - (p - \lambda e) = \lambda e = \pm\lambda$, alternately, on $X_{n+2}$, and so $|\lambda| = \max_{x \in X_{n+2}} |f(x) - (p(x) - \lambda e(x))|$. Thus, we need to compare leading coefficients of $p$ and $e$.
Now if $p$ has degree less than $n+1$, then $p = p_n(X_{n+2})$ and $E_n(f; X_{n+2}) = 0$. Thus, $\lambda = 0$ would do nicely in this case. Otherwise, $p$ has degree exactly $n+1$ and the question is whether $e$ does too. Now,
$$e(x) = \sum_{i=0}^{n+1} \frac{(-1)^i}{W'(x_i)} \cdot \frac{W(x)}{x - x_i},$$
where $W(x) = \prod_{i=0}^{n+1} (x - x_i)$, and so the leading coefficient of $e$ is $\sum_{i=0}^{n+1} (-1)^i / W'(x_i)$. We'll be done if we can convince ourselves that this is nonzero. But
$$W'(x_i) = \prod_{j \ne i} (x_i - x_j) = (-1)^{n-i+1} \prod_{j=0}^{i-1} (x_i - x_j) \prod_{j=i+1}^{n+1} (x_j - x_i);$$
hence $(-1)^i / W'(x_i)$ is of constant sign $(-1)^{n+1}$. Finally, since
$$p(x) = \sum_{i=0}^{n+1} \frac{f(x_i)}{W'(x_i)} \cdot \frac{W(x)}{x - x_i},$$
$p$ has leading coefficient $\sum_{i=0}^{n+1} f(x_i)/W'(x_i)$, and it's easy to find the value of $\lambda$.
Conclusion. $p_n(X_{n+2}) = p - \lambda e$, where
$$\lambda = \frac{\sum_{i=0}^{n+1} f(x_i)/W'(x_i)}{\sum_{i=0}^{n+1} (-1)^i/W'(x_i)} = \sum_{i=0}^{n+1} (-1)^i \mu_i\, f(x_i)$$
and
$$\mu_i = \frac{1/|W'(x_i)|}{\sum_{j=0}^{n+1} 1/|W'(x_j)|},$$
and $|\lambda| = E_n(f; X_{n+2})$. Moreover, $\sum_{i=0}^{n+1} (-1)^i \mu_i\, q(x_i) = 0$ for every $q \in \mathcal{P}_n$.
Example
Find the best linear approximation to $f(x) = x^2$ on $X_4 = \{0, 1/3, 2/3, 1\} \subset [0,1]$.
We seek $p(x) = a_0 + a_1 x$, and we need only consider subsets of $X_4$ of size $1 + 2 = 3$. There are four:
$$X_4^1 = \{0, 1/3, 2/3\}, \quad X_4^2 = \{0, 1/3, 1\}, \quad X_4^3 = \{0, 2/3, 1\}, \quad X_4^4 = \{1/3, 2/3, 1\}.$$
In each case we find a $p$ and a $\lambda$ ($= \pm E$ in our earlier setup). For instance, in the case of $X_4^2$ we would solve the system of equations $f(x) = \pm\lambda + p(x)$ for $x = 0, 1/3, 1$:
$$\left.\begin{aligned} 0 &= \lambda^{(2)} + a_0 \\ \tfrac19 &= -\lambda^{(2)} + a_0 + \tfrac13 a_1 \\ 1 &= \lambda^{(2)} + a_0 + a_1 \end{aligned}\right\} \implies \lambda^{(2)} = \tfrac19, \quad a_0 = -\tfrac19, \quad a_1 = 1.$$
In the other three cases you would find that $\lambda^{(1)} = 1/18$, $\lambda^{(3)} = 1/9$, and $\lambda^{(4)} = 1/18$. Since we need the largest $\lambda$, we're done: $X_4^2$ (or $X_4^3$) works, and $p_1(X_4)(x) = x - 1/9$. (Recall that the best approximation on all of $[0,1]$ is $p_1(x) = x - 1/8$.)
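The Conclusion above can be turned directly into code. Here is a minimal sketch (numpy assumed; best_on_reference is our own helper name); applied to the reference $X_4^2 = \{0, 1/3, 1\}$ it reproduces $p(x) = x - 1/9$ and $\lambda = 1/9$.

```python
import numpy as np

def best_on_reference(f, xs):
    """Best approximation out of P_n on a reference set of n+2 points xs,
    using the closed-form lambda from the Conclusion above."""
    n2 = len(xs)                       # n + 2 points
    # W'(x_i) = prod_{j != i} (x_i - x_j)
    Wp = np.array([np.prod([xs[i] - xs[j] for j in range(n2) if j != i])
                   for i in range(n2)])
    fv = f(xs)
    lam = np.sum(fv / Wp) / np.sum((-1.0) ** np.arange(n2) / Wp)
    # p interpolates f on xs; e interpolates (-1)^i; p - lam*e lies in P_n,
    # so a degree-n fit through the adjusted values recovers it exactly.
    target = fv - lam * (-1.0) ** np.arange(n2)
    coeffs = np.polyfit(xs, target, n2 - 2)
    return coeffs, lam

coeffs, lam = best_on_reference(lambda x: x**2, np.array([0.0, 1/3, 1.0]))
print(coeffs, lam)   # approx [1, -1/9] (highest degree first), lam = 1/9
```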
Where does this leave us? We still need to know that there is some hope of finding an initial set $X_m$ with $E_n(f) - \varepsilon \le E_n(f; X_m) \le E_n(f)$, and we need a more efficient means of searching through the $\binom{m}{n+2}$ subsets $X_{n+2} \subset X_m$. In order to attack the problem of finding an initial $X_m$, we'll need a few classical inequalities. We won't directly attack the second problem; instead, we'll outline an algorithm that begins with an initial set $X_{n+2}^0$, containing exactly $n+2$ points, which is then "improved" to some $X_{n+2}^1$ by changing only a single point.
The Inequalities of Markov and Bernstein
In order to discuss the convergence of approximations over finite sets, we will need to know that differentiation is bounded on $\mathcal{P}_n$ (a fact that is nearly obvious by itself). The inequality we'll use is due to A. A. Markov from 1889:
Theorem. If $p \in \mathcal{P}_n$, and if $|p(x)| \le 1$ for $|x| \le 1$, then $|p'(x)| \le n^2$ for $|x| \le 1$. Moreover, $|p'(x)| = n^2$ can only occur at $x = \pm 1$, and only when $p = \pm T_n$, the Chebyshev polynomial of degree $n$.
Markov's brother, V. A. Markov, later improved on this, in 1916, by showing that $|p^{(k)}(x)| \le T_n^{(k)}(1)$. We've alluded to this fact already (see Rivlin, p. 31), and even more is true. For our purposes, it's enough to have some bound on differentiation; in particular, we'll only use
$$\|p'\| \le n^2 \|p\| \quad \text{and} \quad \|p''\| \le n^4 \|p\|,$$
where $\|\cdot\|$ is the norm in $C[-1,1]$.
About 20 years after Markov, in 1912, Bernstein asked for a similar bound for the derivative of a complex polynomial over the unit disk $|z| \le 1$. Now the maximum modulus theorem tells us that we may reduce to the case $|z| = 1$, that is, $z = e^{i\theta}$, and so Bernstein was able to restate the problem in terms of trig polynomials.
Theorem. If $S \in \mathcal{T}_n$, and if $|S(\theta)| \le 1$, then $|S'(\theta)| \le n$. Equality is only possible for $S(\theta) = \sin n(\theta - \theta_0)$.
Our plan is to deduce Markov's inequality from Bernstein's inequality by a method of proof due to Pólya and Szegő in 1928. To begin, let's consider the Lagrange interpolation formula in the case where $x_i = \cos((2i-1)\pi/2n)$, $i = 1, \ldots, n$, are the zeros of the Chebyshev polynomial $T_n$. Recall that we have $-1 < x_n < x_{n-1} < \cdots < x_1 < 1$.
Lemma 1. Each polynomial $p \in \mathcal{P}_{n-1}$ may be written
$$p(x) = \frac{1}{n} \sum_{i=1}^n p(x_i)\,(-1)^{i-1}\sqrt{1 - x_i^2}\; \frac{T_n(x)}{x - x_i}.$$
Proof. We know that the Lagrange interpolation formula is exact for polynomials of degree $< n$, and we know that, up to a constant multiple, $T_n(x)$ is the product $W(x) = (x - x_1)\cdots(x - x_n)$. All that remains is to compute $T_n'(x_i)$. But recall that for $x = \cos\theta$ we have
$$T_n'(x) = \frac{n \sin n\theta}{\sin\theta} = \frac{n \sin n\theta}{\sqrt{1 - \cos^2\theta}} = \frac{n \sin n\theta}{\sqrt{1 - x^2}}.$$
But for $x_i = \cos((2i-1)\pi/2n)$, i.e., for $\theta_i = (2i-1)\pi/2n$, it follows that $\sin n\theta_i = \sin((2i-1)\pi/2) = (-1)^{i-1}$; that is,
$$\frac{1}{T_n'(x_i)} = (-1)^{i-1}\, \frac{\sqrt{1 - x_i^2}}{n}.$$
Lemma 2. For any polynomial $p \in \mathcal{P}_{n-1}$, we have
$$\max_{-1 \le x \le 1} |p(x)| \le \max_{-1 \le x \le 1} \bigl|\, n\sqrt{1 - x^2}\; p(x)\, \bigr|.$$
Proof. To save wear and tear, let's write $M = \max_{-1 \le x \le 1} \bigl|\, n\sqrt{1 - x^2}\; p(x)\, \bigr|$.
First consider an $x$ in the interval $[x_n, x_1]$; that is, $|x| \le \cos(\pi/2n) = x_1$. In this case we can estimate $\sqrt{1 - x^2}$ from below:
$$\sqrt{1 - x^2} \ge \sqrt{1 - x_1^2} = \sqrt{1 - \cos^2\frac{\pi}{2n}} = \sin\frac{\pi}{2n} \ge \frac{1}{n}$$
because $\sin\theta \ge 2\theta/\pi$ for $0 \le \theta \le \pi/2$ (from the mean value theorem). Hence, for $|x| \le \cos(\pi/2n)$, we get $|p(x)| \le n\sqrt{1 - x^2}\, |p(x)| \le M$.
Now, for $x$'s outside the interval $[x_n, x_1]$, we apply our interpolation formula. In this case, each of the factors $x - x_i$ is of the same sign. Thus,
$$|p(x)| = \frac{1}{n}\, \Bigl| \sum_{i=1}^n p(x_i)\,(-1)^{i-1}\sqrt{1 - x_i^2}\; \frac{T_n(x)}{x - x_i} \Bigr| \le \frac{M}{n^2} \sum_{i=1}^n \Bigl| \frac{T_n(x)}{x - x_i} \Bigr| = \frac{M}{n^2} \Bigl| \sum_{i=1}^n \frac{T_n(x)}{x - x_i} \Bigr|.$$
But
$$\sum_{i=1}^n \frac{T_n(x)}{x - x_i} = T_n'(x) \qquad \text{(why?)}$$
and we know that $|T_n'(x)| \le n^2$. Thus, $|p(x)| \le M$.
We next turn our attention to trig polynomials. As usual, given an algebraic polynomial $p \in \mathcal{P}_n$, we will sooner or later consider $S(\theta) = p(\cos\theta)$. In this case, $S'(\theta) = -p'(\cos\theta)\sin\theta$ is an odd trig polynomial of degree at most $n$ and $|S'(\theta)| = |p'(\cos\theta)\sin\theta| = |p'(x)\sqrt{1 - x^2}\,|$. Conversely, if $S \in \mathcal{T}_n$ is an odd trig polynomial, then $S(\theta)/\sin\theta$ is even, and so may be written $S(\theta)/\sin\theta = p(\cos\theta)$ for some algebraic polynomial $p$ of degree at most $n-1$. From Lemma 2,
$$\max_{0 \le \theta \le 2\pi} \Bigl| \frac{S(\theta)}{\sin\theta} \Bigr| = \max_{0 \le \theta \le 2\pi} |p(\cos\theta)| \le n \max_{0 \le \theta \le 2\pi} |p(\cos\theta)\sin\theta| = n \max_{0 \le \theta \le 2\pi} |S(\theta)|.$$
This proves
Corollary. If $S \in \mathcal{T}_n$ is an odd trig polynomial, then
$$\max_{0 \le \theta \le 2\pi} \Bigl| \frac{S(\theta)}{\sin\theta} \Bigr| \le n \max_{0 \le \theta \le 2\pi} |S(\theta)|.$$
Now we're ready for Bernstein's inequality.
Bernstein's Inequality. If $S \in \mathcal{T}_n$, then
$$\max_{0 \le \theta \le 2\pi} |S'(\theta)| \le n \max_{0 \le \theta \le 2\pi} |S(\theta)|.$$
Proof. We first define an auxiliary function $f(\theta) = [\,S(\alpha + \theta) - S(\alpha - \theta)\,]/2$. For $\alpha$ fixed, $f(\theta)$ is an odd trig polynomial in $\theta$ of degree at most $n$. Consequently,
$$\Bigl| \frac{f(\theta)}{\sin\theta} \Bigr| \le n \max_{0 \le \theta \le 2\pi} |f(\theta)| \le n \max_{0 \le \theta \le 2\pi} |S(\theta)|.$$
But
$$S'(\alpha) = \lim_{\theta \to 0} \frac{S(\alpha + \theta) - S(\alpha - \theta)}{2\theta} = \lim_{\theta \to 0} \frac{f(\theta)}{\sin\theta}$$
and hence $|S'(\alpha)| \le n \max_{0 \le \theta \le 2\pi} |S(\theta)|$.
Finally, we prove Markov's inequality.
Markov's Inequality. If $p \in \mathcal{P}_n$, then
$$\max_{-1 \le x \le 1} |p'(x)| \le n^2 \max_{-1 \le x \le 1} |p(x)|.$$
Proof. We know that $S(\theta) = p(\cos\theta)$ is a trig polynomial of degree at most $n$ satisfying
$$\max_{-1 \le x \le 1} |p(x)| = \max_{0 \le \theta \le 2\pi} |p(\cos\theta)|.$$
Since $S'(\theta) = -p'(\cos\theta)\sin\theta$ is also a trig polynomial of degree at most $n$, Bernstein's inequality yields
$$\max_{0 \le \theta \le 2\pi} |p'(\cos\theta)\sin\theta| \le n \max_{0 \le \theta \le 2\pi} |p(\cos\theta)|.$$
In other words,
$$\max_{-1 \le x \le 1} \bigl|\, p'(x)\sqrt{1 - x^2}\, \bigr| \le n \max_{-1 \le x \le 1} |p(x)|.$$
Since $p' \in \mathcal{P}_{n-1}$, the desired inequality now follows easily from Lemma 2:
$$\max_{-1 \le x \le 1} |p'(x)| \le n \max_{-1 \le x \le 1} \bigl|\, p'(x)\sqrt{1 - x^2}\, \bigr| \le n^2 \max_{-1 \le x \le 1} |p(x)|.$$
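As a quick numerical sanity check (not a proof!), here is a short sketch using numpy's Chebyshev module; since $T_n'(\pm 1) = n^2$ and $\|T_n\| = 1$, equality in Markov's inequality is visible at the endpoints.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

# Check max|p'| <= n^2 max|p| for p = T_n, the extremal case.
x = np.linspace(-1, 1, 10001)
for n in (3, 5, 8):
    Tn = C.Chebyshev.basis(n)
    dT = Tn.deriv()
    print(n, np.max(np.abs(dT(x))), n**2 * np.max(np.abs(Tn(x))))
```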
Convergence of Approximations over Finite Sets
In order to simplify things here, we will make several assumptions: For one, we will consider only approximation over the interval $I = [-1,1]$. As before, we consider a fixed $f \in C[-1,1]$ and a fixed integer $n = 0, 1, 2, \ldots$. For each integer $m \ge 1$ we choose a finite subset $X_m \subset I$, consisting of $m$ points $-1 \le x_1 < \cdots < x_m \le 1$; in addition, we will assume that $x_1 = -1$ and $x_m = 1$. If we put
$$\delta_m = \max_{x \in I} \min_{1 \le i \le m} |x - x_i| > 0,$$
then each $x \in I$ is within $\delta_m$ of some $x_i$. If $X_m$ consists of equally spaced points, for example, it's easy to see that $\delta_m = 1/(m-1)$.
Our goal is to prove
Theorem. If $\delta_m \to 0$, then $E_n(f; X_m) \to E_n(f)$.
And we would hope to accomplish this in such a way that $\delta_m$ is a measurable quantity, depending on $f$, $m$, and a prescribed tolerance $\varepsilon = E_n(f) - E_n(f; X_m)$.
As a first step in this direction, let's bring Markov's inequality into the picture.
Lemma. Suppose that $\varepsilon_m \equiv \delta_m^2 n^4/2 < 1$. Then, for any $p \in \mathcal{P}_n$, we have
(1) $\displaystyle \max_{-1 \le x \le 1} |p(x)| \le (1 - \varepsilon_m)^{-1} \max_{1 \le i \le m} |p(x_i)|$
and
(2) $\omega_p([-1,1]; \delta_m) \le \delta_m n^2 (1 - \varepsilon_m)^{-1} \max_{1 \le i \le m} |p(x_i)|$.
Proof. (1): Take $a$ in $[-1,1]$ with $|p(a)| = \|p\|$. If $a = \pm 1 \in X_m$, we're done (since $(1 - \varepsilon_m)^{-1} > 1$). Otherwise, we'll have $-1 < a < 1$ and $p'(a) = 0$. Next, choose $x_i \in X_m$ with $|a - x_i| \le \delta_m$ and apply Taylor's theorem:
$$p(x_i) = p(a) + (x_i - a)\, p'(a) + \frac{(x_i - a)^2}{2}\, p''(c)$$
for some $c$ in $(-1,1)$. Rewriting, we have
$$|p(a)| \le |p(x_i)| + \frac{\delta_m^2}{2}\, |p''(c)|.$$
And now we bring in Markov:
$$\|p\| \le \max_{1 \le i \le m} |p(x_i)| + \frac{\delta_m^2 n^4}{2}\, \|p\|,$$
which is what we need.
(2): The real point here is that each $p \in \mathcal{P}_n$ is Lipschitz with constant $n^2\|p\|$. Indeed,
$$|p(s) - p(t)| = |(s - t)\, p'(c)| \le |s - t|\, \|p'\| \le n^2 \|p\|\, |s - t|$$
(from the mean value theorem and Markov's inequality). Thus, $\omega_p(\delta) \le \delta\, n^2 \|p\|$ and, combining this with (1), we get
$$\omega_p(\delta_m) \le \delta_m n^2 \|p\| \le \delta_m n^2 (1 - \varepsilon_m)^{-1} \max_{1 \le i \le m} |p(x_i)|.$$
Now we're ready to compare $E_n(f; X_m)$ to $E_n(f)$. Our result won't be as good as Rivlin's (he uses a fancier version of Markov's inequality), but it will be a bit easier to prove. As in the Lemma, we'll suppose that
$$\varepsilon_m = \frac{\delta_m^2 n^4}{2} < 1$$
and we'll set
$$\Delta_m = \frac{\delta_m n^2}{1 - \varepsilon_m}.$$
[Note that as $\delta_m \to 0$ we also have $\varepsilon_m \to 0$ and $\Delta_m \to 0$.]
Theorem. For $f \in C[-1,1]$,
$$E_n(f; X_m) \le E_n(f) \le (1 + \Delta_m)\, E_n(f; X_m) + \omega_f([-1,1]; \delta_m) + \Delta_m \|f\|.$$
Consequently, if $\delta_m \to 0$, then $E_n(f; X_m) \to E_n(f)$ (as $m \to \infty$).
Proof. Let $p = p_n(X_m) \in \mathcal{P}_n$ be the best approximation to $f$ on $X_m$. Recall that
$$\max_{1 \le i \le m} |f(x_i) - p(x_i)| = E_n(f; X_m) \le E_n(f) \le \|f - p\|.$$
Our plan is to estimate $\|f - p\|$.
Let $x \in [-1,1]$ and choose $x_i \in X_m$ with $|x - x_i| \le \delta_m$. Then,
$$\begin{aligned} |f(x) - p(x)| &\le |f(x) - f(x_i)| + |f(x_i) - p(x_i)| + |p(x_i) - p(x)| \\ &\le \omega_f(\delta_m) + E_n(f; X_m) + \omega_p(\delta_m) \\ &\le \omega_f(\delta_m) + E_n(f; X_m) + \Delta_m \max_{1 \le i \le m} |p(x_i)|, \end{aligned}$$
where we've used (2) from the previous Lemma to estimate $\omega_p(\delta_m)$. All that remains is to revise this last estimate, eliminating reference to $p$. For this we use the triangle inequality again:
$$\max_{1 \le i \le m} |p(x_i)| \le \max_{1 \le i \le m} |f(x_i) - p(x_i)| + \max_{1 \le i \le m} |f(x_i)| \le E_n(f; X_m) + \|f\|.$$
Putting all the pieces together gives us our result:
$$E_n(f) \le \omega_f(\delta_m) + E_n(f; X_m) + \Delta_m\bigl( E_n(f; X_m) + \|f\| \bigr).$$
As Rivlin points out, it is quite possible to give a lower bound on $m$ in the case of, say, equally spaced points, which will give $E_n(f; X_m) \le E_n(f) \le E_n(f; X_m) + \varepsilon$, but this is surely an inefficient approach to the problem. Instead, we'll discuss the one point exchange algorithm.
The One Point Exchange Algorithm
We're given $f \in C[-1,1]$, $n$, and $\varepsilon > 0$.
1. Pick a starting "reference" $X_{n+2}$. A convenient choice is the set $x_i = \cos\bigl(\frac{n+1-i}{n+1}\pi\bigr)$, $i = 0, 1, \ldots, n+1$. These are the "peak points" of $T_{n+1}$; that is, $T_{n+1}(x_i) = (-1)^{n+1-i}$ (and so $T_{n+1}$ is the polynomial $e$ from our "Conclusion").
2. Find $p = p_n(X_{n+2})$ and $\lambda$ (by solving a system of linear equations). Recall that
$$|\lambda| = |f(x_i) - p(x_i)| \le \|f - p^*\| \le \|f - p\|,$$
where $p^*$ is the best approximation to $f$ on all of $[-1,1]$.
3. Find (approximately, if necessary) the "error function" $e(x) = f(x) - p(x)$ and any point $\eta$ where $|f(\eta) - p(\eta)| = \|f - p\|$. (According to Powell, this can be accomplished using "local quadratic fits.")
4. Replace an appropriate $x_i$ by $\eta$ so that the new reference $X_{n+2}' = \{x_0', x_1', \ldots\}$ has the properties that $f(x_i') - p(x_i')$ alternates in sign, and that $|f(x_i') - p(x_i')| \ge |\lambda|$ for all $i$. The new polynomial $p' = p_n(X_{n+2}')$ and new $\lambda'$ must then satisfy
$$|\lambda| = \min_{0 \le i \le n+1} |f(x_i') - p(x_i')| \le \max_{0 \le i \le n+1} |f(x_i') - p'(x_i')| = |\lambda'|.$$
This is an observation due to de la Vallée Poussin: Since $f - p$ alternates in sign on an alternating set for $f - p'$, it follows that $f - p'$ increases the minimum error over this set. (See the Theorem on page 53 of "Characterization of Best Approximation" for a precise statement.) Again according to Powell, the new $p'$ and $\lambda'$ can be found quickly through matrix "updating" techniques. (Since we've only changed one of the $x_i$'s, only one row of the matrix on page 82 needs to be changed.)
5. The new $\lambda'$ satisfies $|\lambda'| \le \|f - p^*\| \le \|f - p'\|$, and the calculation stops when
$$\|f - p'\| - |\lambda'| = |f(\eta') - p'(\eta')| - |\lambda'| < \varepsilon.$$
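Here is a bare-bones numerical sketch of the whole loop (Python with numpy assumed; the function name one_point_exchange is ours, and the brute-force grid search in step 3 stands in for the "local quadratic fits" mentioned above). It is an illustration of the algorithm, not production code.

```python
import numpy as np

def one_point_exchange(f, n, eps=1e-8, max_iter=100):
    grid = np.linspace(-1.0, 1.0, 20001)
    # Step 1: initial reference = peak points of T_{n+1}.
    xs = np.cos((n + 1 - np.arange(n + 2)) * np.pi / (n + 1))
    for _ in range(max_iter):
        # Step 2: solve f(x_i) = (-1)^i * lam + p(x_i) for lam, a_0, ..., a_n.
        A = np.column_stack([(-1.0) ** np.arange(n + 2),
                             np.vander(xs, n + 1, increasing=True)])
        sol = np.linalg.solve(A, f(xs))
        lam, coeffs = sol[0], sol[1:]
        p = lambda t: np.polyval(coeffs[::-1], t)
        # Step 3: locate a point eta of maximal error on the grid.
        err = f(grid) - p(grid)
        eta = grid[np.argmax(np.abs(err))]
        # Step 5: stop once the uniform error is within eps of |lam|.
        if np.max(np.abs(err)) - abs(lam) < eps:
            break
        # Step 4: exchange eta into the reference, preserving alternation.
        s, errs = np.sign(f(eta) - p(eta)), f(xs) - p(xs)
        if eta < xs[0]:
            xs = (np.concatenate(([eta], xs[1:])) if s == np.sign(errs[0])
                  else np.concatenate(([eta], xs[:-1])))
        elif eta > xs[-1]:
            xs = (np.concatenate((xs[:-1], [eta])) if s == np.sign(errs[-1])
                  else np.concatenate((xs[1:], [eta])))
        else:  # interior: replace the bracketing node with the same sign
            j = np.searchsorted(xs, eta)
            k = j - 1 if s == np.sign(errs[j - 1]) else j
            xs[k] = eta
    return coeffs, lam

coeffs, lam = one_point_exchange(np.abs, 3)  # approximate |x| out of P_3
print(coeffs, abs(lam))   # settles on p(x) = 1/8 + x^2, |lam| = 1/8
```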
Math 682 A Brief Introduction to Fourier Series 6/8/98
The Fourier series of a $2\pi$-periodic (bounded, integrable) function $f$ is
$$\frac{a_0}{2} + \sum_{k=1}^\infty \bigl( a_k \cos kx + b_k \sin kx \bigr),$$
where the coefficients are defined by
$$a_k = \frac{1}{\pi}\int_{-\pi}^{\pi} f(t)\cos kt\, dt \qquad \text{and} \qquad b_k = \frac{1}{\pi}\int_{-\pi}^{\pi} f(t)\sin kt\, dt.$$
Please note that if $f$ is Riemann integrable on $[-\pi,\pi]$, then each of these integrals is well-defined and finite; indeed,
$$|a_k| \le \frac{1}{\pi}\int_{-\pi}^{\pi} |f(t)|\, dt,$$
and so, for example, we would have $|a_k| \le 2\|f\|$ for $f \in C^{2\pi}$.
We write the partial sums of the series as
$$s_n(f)(x) = \frac{a_0}{2} + \sum_{k=1}^n \bigl( a_k \cos kx + b_k \sin kx \bigr).$$
Now while $s_n(f)$ need not converge pointwise to $f$ (in fact, it may even diverge at a given point), and while $s_n(f)$ is not typically a good uniform approximation to $f$, it is still a very natural choice for an approximation to $f$ in the "least-squares" sense (which we'll make precise shortly). Said in other words, the Fourier series for $f$ provides a useful representation for $f$ even if it fails to converge pointwise to $f$.
Observations
1. The functions $1, \cos x, \cos 2x, \ldots, \sin x, \sin 2x, \ldots$ are orthogonal on $[-\pi,\pi]$. That is,
$$\int_{-\pi}^{\pi} \cos mx \cos nx\, dx = \int_{-\pi}^{\pi} \sin mx \sin nx\, dx = \int_{-\pi}^{\pi} \cos mx \sin nx\, dx = 0$$
for any $m \ne n$ (and the last equation even holds for $m = n$),
$$\int_{-\pi}^{\pi} \cos^2 mx\, dx = \int_{-\pi}^{\pi} \sin^2 mx\, dx = \pi$$
for any $m \ne 0$, and, of course, $\int_{-\pi}^{\pi} 1\, dx = 2\pi$.
2. What this means is that if $T(x) = \frac{\alpha_0}{2} + \sum_{k=1}^n \bigl( \alpha_k \cos kx + \beta_k \sin kx \bigr)$, then
$$\frac{1}{\pi}\int_{-\pi}^{\pi} T(x)\cos mx\, dx = \frac{\alpha_m}{\pi}\int_{-\pi}^{\pi} \cos^2 mx\, dx = \alpha_m$$
for $m \ne 0$, while
$$\frac{1}{\pi}\int_{-\pi}^{\pi} T(x)\, dx = \frac{\alpha_0}{2\pi}\int_{-\pi}^{\pi} dx = \alpha_0.$$
That is, if $T \in \mathcal{T}_n$, then $T$ is actually equal to its own Fourier series.
3. The partial sum operator $s_n(f)$ is a linear projection from $C^{2\pi}$ onto $\mathcal{T}_n$.
4. If $T(x) = \frac{\alpha_0}{2} + \sum_{k=1}^n \bigl( \alpha_k \cos kx + \beta_k \sin kx \bigr)$ is a trig polynomial, then
$$\frac{1}{\pi}\int_{-\pi}^{\pi} f(x)\, T(x)\, dx = \frac{\alpha_0}{2\pi}\int_{-\pi}^{\pi} f(x)\, dx + \sum_{k=1}^n \frac{\alpha_k}{\pi}\int_{-\pi}^{\pi} f(x)\cos kx\, dx + \sum_{k=1}^n \frac{\beta_k}{\pi}\int_{-\pi}^{\pi} f(x)\sin kx\, dx = \frac{\alpha_0 a_0}{2} + \sum_{k=1}^n \bigl( \alpha_k a_k + \beta_k b_k \bigr),$$
where $(a_k)$ and $(b_k)$ are the Fourier coefficients for $f$. [This should remind you of the dot product of the coefficients.]
5. Motivated by 1, 2, and 4, we define the inner product of two elements $f, g \in C^{2\pi}$ by
$$\langle f, g \rangle = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)\, g(x)\, dx.$$
Note that from 4 we have $\langle f, s_n(f) \rangle = \langle s_n(f), s_n(f) \rangle$ for any $n$. (Why?)
6. If some $f \in C^{2\pi}$ has $a_k = b_k = 0$ for all $k$, then $f \equiv 0$. Indeed, by 4 (or linearity of the integral), this means that
$$\int_{-\pi}^{\pi} f(x)\, T(x)\, dx = 0$$
for any trig polynomial $T$. But from Weierstrass's second theorem we know that $f$ is the uniform limit of some sequence of trig polynomials $(T_n)$. Thus,
$$\int_{-\pi}^{\pi} f(x)^2\, dx = \lim_{n \to \infty} \int_{-\pi}^{\pi} f(x)\, T_n(x)\, dx = 0.$$
Since $f$ is continuous, this easily implies that $f \equiv 0$.
7. If $f, g \in C^{2\pi}$ have the same Fourier series, then $f \equiv g$. Hence, the Fourier series for an $f \in C^{2\pi}$ provides a representation for $f$ (even if the series fails to converge to $f$).
8. The coefficients $a_0, a_1, \ldots, a_n$ and $b_1, b_2, \ldots, b_n$ minimize the expression
$$\varphi(a_0, a_1, \ldots, b_n) = \int_{-\pi}^{\pi} \bigl[ f(x) - s_n(f)(x) \bigr]^2 dx.$$
It's not hard to see, for example, that
$$\frac{\partial\varphi}{\partial a_k} = -2\int_{-\pi}^{\pi} \bigl[ f(x) - s_n(f)(x) \bigr] \cos kx\, dx = 0$$
precisely when $a_k$ satisfies
$$\int_{-\pi}^{\pi} f(x)\cos kx\, dx = a_k \int_{-\pi}^{\pi} \cos^2 kx\, dx.$$
9. The partial sum $s_n(f)$ is the best approximation to $f$ out of $\mathcal{T}_n$ relative to the $L_2$ norm
$$\|f\|_2 = \sqrt{\langle f, f \rangle} = \left( \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)^2\, dx \right)^{1/2}.$$
(Be forewarned: Some authors prefer $1/2\pi$ in place of $1/\pi$.) That is,
$$\|f - s_n(f)\|_2 = \min_{T \in \mathcal{T}_n} \|f - T\|_2.$$
Moreover, using 4 and 5, we have
$$\|f - s_n(f)\|_2^2 = \langle f - s_n(f),\, f - s_n(f) \rangle = \langle f, f \rangle - 2\langle f, s_n(f) \rangle + \langle s_n(f), s_n(f) \rangle = \|f\|_2^2 - \|s_n(f)\|_2^2 = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)^2\, dx - \frac{a_0^2}{2} - \sum_{k=1}^n (a_k^2 + b_k^2).$$
[This should remind you of the Pythagorean theorem.]
10. It follows from 9 that
$$\frac{1}{\pi}\int_{-\pi}^{\pi} s_n(f)(x)^2\, dx = \frac{a_0^2}{2} + \sum_{k=1}^n \bigl( a_k^2 + b_k^2 \bigr) \le \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)^2\, dx.$$
In other symbols, $\|s_n(f)\|_2 \le \|f\|_2$. In particular, the Fourier coefficients of any $f \in C^{2\pi}$ are square summable. (Why?)
11. If $f \in C^{2\pi}$, then its Fourier coefficients $(a_n)$ and $(b_n)$ tend to zero as $n \to \infty$.
12. It follows from 10 and Weierstrass's second theorem that $s_n(f) \to f$ in the $L_2$ norm whenever $f \in C^{2\pi}$. Indeed, given $\varepsilon > 0$, choose a trig polynomial $T$ such that $\|f - T\| < \varepsilon$. Then, since $s_n(T) = T$ for large enough $n$, we have
$$\|f - s_n(f)\|_2 \le \|f - T\|_2 + \|s_n(T - f)\|_2 \le 2\|f - T\|_2 \le 2\sqrt{2}\, \|f - T\| < 2\sqrt{2}\, \varepsilon.$$
(Compare this calculation with Lebesgue's Theorem, page 74.)
By way of comparison, let's give a simple class of functions whose Fourier partial sums provide good uniform approximations.
Theorem. If $f'' \in C^{2\pi}$, then the Fourier series for $f$ converges absolutely and uniformly to $f$.
Proof. First notice that integration by parts leads to an estimate on the order of growth of the Fourier coefficients of $f$:
$$\pi a_k = \int_{-\pi}^{\pi} f(x)\cos kx\, dx = \int_{-\pi}^{\pi} f(x)\, d\Bigl( \frac{\sin kx}{k} \Bigr) = -\frac{1}{k}\int_{-\pi}^{\pi} f'(x)\sin kx\, dx$$
(because $f$ is $2\pi$-periodic). Thus, $|a_k| \le 2\|f'\|/k \to 0$ as $k \to \infty$. Now we integrate by parts again:
$$-\pi k\, a_k = \int_{-\pi}^{\pi} f'(x)\sin kx\, dx = \int_{-\pi}^{\pi} f'(x)\, d\Bigl( -\frac{\cos kx}{k} \Bigr) = \frac{1}{k}\int_{-\pi}^{\pi} f''(x)\cos kx\, dx$$
(because $f'$ is $2\pi$-periodic). Thus, $|a_k| \le 2\|f''\|/k^2 \to 0$ as $k \to \infty$. More importantly, this inequality (along with the Weierstrass $M$-test) implies that the Fourier series for $f$ is both uniformly and absolutely convergent:
$$\Bigl| \frac{a_0}{2} + \sum_{k=1}^\infty \bigl( a_k \cos kx + b_k \sin kx \bigr) \Bigr| \le \Bigl| \frac{a_0}{2} \Bigr| + \sum_{k=1}^\infty \bigl( |a_k| + |b_k| \bigr) \le C \sum_{k=1}^\infty \frac{1}{k^2}.$$
But why should the series actually converge to $f$? Well, if we call the sum
$$g(x) = \frac{a_0}{2} + \sum_{k=1}^\infty \bigl( a_k \cos kx + b_k \sin kx \bigr),$$
then $g \in C^{2\pi}$ (why?) and $g$ has the same Fourier coefficients as $f$ (why?). Hence (by 7), $g = f$.
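The decay estimate $|a_k| \le 2\|f''\|/k^2$ is easy to observe numerically. Below is a minimal sketch (numpy assumed; fourier_coeffs is our own helper name): it approximates the coefficients of the $2\pi$-periodic function $f(x) = (\pi - x)^2$ from Problem 57 below, for which $a_k = 4/k^2$ exactly.

```python
import numpy as np

def fourier_coeffs(f, n, m=4096):
    # The trapezoid rule on a uniform grid is very accurate for periodic f.
    t = np.linspace(-np.pi, np.pi, m, endpoint=False)
    ft = f(t)
    a = np.array([2 * np.mean(ft * np.cos(k * t)) for k in range(n + 1)])
    b = np.array([2 * np.mean(ft * np.sin(k * t)) for k in range(1, n + 1)])
    return a, b

f = lambda x: (np.pi - np.mod(x, 2 * np.pi)) ** 2
a, b = fourier_coeffs(f, 8)
print(a[0] / 2)                      # ~ pi^2 / 3
print(a[1:] * np.arange(1, 9) ** 2)  # ~ 4, 4, 4, ...  (so a_k = 4/k^2)
```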
Our next chore is to find a closed expression for $s_n(f)$. For this we'll need a couple of trig identities; the first two need no explanation.
$$\cos kt \cos kx + \sin kt \sin kx = \cos k(t - x)$$
$$2\cos\alpha\sin\beta = \sin(\alpha + \beta) - \sin(\alpha - \beta)$$
$$\frac12 + \cos\theta + \cos 2\theta + \cdots + \cos n\theta = \frac{\sin\bigl(n + \frac12\bigr)\theta}{2\sin\frac12\theta}$$
Here's a short proof for the third:
$$\sin\tfrac12\theta + \sum_{k=1}^n 2\cos k\theta \sin\tfrac12\theta = \sin\tfrac12\theta + \sum_{k=1}^n \bigl[ \sin\bigl(k + \tfrac12\bigr)\theta - \sin\bigl(k - \tfrac12\bigr)\theta \bigr] = \sin\bigl(n + \tfrac12\bigr)\theta.$$
The function
$$D_n(t) = \frac{\sin\bigl(n + \frac12\bigr)t}{2\sin\frac12 t}$$
is called Dirichlet's kernel. It plays an important role in our next calculation.
Now we're ready to re-write our formula for $s_n(f)$:
$$\begin{aligned} s_n(f)(x) &= \frac{a_0}{2} + \sum_{k=1}^n \bigl( a_k \cos kx + b_k \sin kx \bigr) \\ &= \frac{1}{\pi}\int_{-\pi}^{\pi} f(t)\Bigl[ \frac12 + \sum_{k=1}^n \bigl( \cos kt \cos kx + \sin kt \sin kx \bigr) \Bigr]\, dt \\ &= \frac{1}{\pi}\int_{-\pi}^{\pi} f(t)\Bigl[ \frac12 + \sum_{k=1}^n \cos k(t - x) \Bigr]\, dt \\ &= \frac{1}{\pi}\int_{-\pi}^{\pi} f(t)\, \frac{\sin\bigl(n + \frac12\bigr)(t - x)}{2\sin\frac12(t - x)}\, dt \\ &= \frac{1}{\pi}\int_{-\pi}^{\pi} f(t)\, D_n(t - x)\, dt = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x + t)\, D_n(t)\, dt. \end{aligned}$$
It now follows easily that $s_n(f)$ is linear in $f$ (because integration against $D_n$ is linear), that $s_n(f) \in \mathcal{T}_n$ (because $D_n \in \mathcal{T}_n$), and, in fact, that $s_n(T_m) = T_{\min(m,n)}$. In other words, $s_n$ is indeed a linear projection onto $\mathcal{T}_n$.
While we know that $s_n(f)$ is a good approximation to $f$ in the $L_2$ norm, a better understanding of its effectiveness as a uniform approximation will require a better understanding of the Dirichlet kernel $D_n$. Here are a few pertinent facts:
Lemma. (a) $D_n$ is even.
(b) $\displaystyle \frac{1}{\pi}\int_{-\pi}^{\pi} D_n(t)\, dt = \frac{2}{\pi}\int_0^{\pi} D_n(t)\, dt = 1$.
(c) $|D_n(t)| \le n + \frac12$ and $D_n(0) = n + \frac12$.
(d) $\displaystyle \frac{|\sin(n + \frac12)t|}{t} \le |D_n(t)| \le \frac{\pi}{2t}$ for $0 < t < \pi$.
(e) If $\displaystyle \lambda_n = \frac{1}{\pi}\int_{-\pi}^{\pi} |D_n(t)|\, dt$, then $\displaystyle \frac{4}{\pi^2}\log n \le \lambda_n \le 3 + \log n$.
Proof. (a), (b), and (c) are relatively clear from the fact that
$$D_n(t) = \frac12 + \cos t + \cos 2t + \cdots + \cos nt.$$
(Notice, too, that (b) follows from the fact that $s_n(1) = 1$.) For (d) we use a more delicate estimate: Since $2\theta/\pi \le \sin\theta \le \theta$ for $0 < \theta < \pi/2$, it follows that $2t/\pi \le 2\sin(t/2) \le t$ for $0 < t < \pi$. Hence,
$$\frac{|\sin(n + \frac12)t|}{t} \le \frac{|\sin(n + \frac12)t|}{2\sin\frac12 t} \le \frac{\pi}{2t}$$
for $0 < t < \pi$. Next, the upper estimate in (e) is easy:
$$\frac{2}{\pi}\int_0^{\pi} |D_n(t)|\, dt = \frac{2}{\pi}\int_0^{\pi} \frac{|\sin(n + \frac12)t|}{2\sin\frac12 t}\, dt \le \frac{2}{\pi}\int_0^{1/n} \bigl( n + \tfrac12 \bigr)\, dt + \frac{2}{\pi}\int_{1/n}^{\pi} \frac{\pi}{2t}\, dt = \frac{2n + 1}{\pi n} + \log\pi + \log n < 3 + \log n.$$
The lower estimate takes some work:
$$\begin{aligned} \frac{2}{\pi}\int_0^{\pi} |D_n(t)|\, dt &\ge \frac{2}{\pi}\int_0^{\pi} \frac{|\sin(n + \frac12)t|}{t}\, dt = \frac{2}{\pi}\int_0^{(n + \frac12)\pi} \frac{|\sin x|}{x}\, dx \ge \frac{2}{\pi}\int_0^{n\pi} \frac{|\sin x|}{x}\, dx \\ &= \frac{2}{\pi}\sum_{k=1}^n \int_{(k-1)\pi}^{k\pi} \frac{|\sin x|}{x}\, dx \ge \frac{2}{\pi}\sum_{k=1}^n \frac{1}{k\pi}\int_{(k-1)\pi}^{k\pi} |\sin x|\, dx = \frac{4}{\pi^2}\sum_{k=1}^n \frac{1}{k} \ge \frac{4}{\pi^2}\log n \end{aligned}$$
because $\sum_{k=1}^n \frac{1}{k} \ge \log n$.
The numbers $\lambda_n = \|D_n\|_1 = \frac{1}{\pi}\int_{-\pi}^{\pi} |D_n(t)|\, dt$ are called the Lebesgue numbers associated to this process (compare this to the terminology we used for interpolation). The point here is that $\lambda_n$ gives the norm of the partial sum operator (projection) on $C^{2\pi}$ and (just as with interpolation) $\lambda_n \to \infty$ as $n \to \infty$. As a matter of no small curiosity, notice that, from Observation 10, the norm of $s_n$ as an operator on $L_2$ is $1$.
Corollary. If $f \in C^{2\pi}$, then
$$|s_n(f)(x)| \le \frac{1}{\pi}\int_{-\pi}^{\pi} |f(x + t)|\, |D_n(t)|\, dt \le \lambda_n \|f\|. \qquad (*)$$
In particular, $\|s_n(f)\| \le \lambda_n \|f\| \le (3 + \log n)\|f\|$.
If we approximate the function $\mathrm{sgn}\, D_n$ by a continuous function $f$ of norm one, then
$$s_n(f)(0) \approx \frac{1}{\pi}\int_{-\pi}^{\pi} |D_n(t)|\, dt = \lambda_n.$$
Thus, $\lambda_n$ is the smallest constant that works in $(*)$. The fact that the partial sum operators are not uniformly bounded on $C^{2\pi}$, along with the Baire category theorem, tells us that there must be some $f \in C^{2\pi}$ for which $\|s_n(f)\|$ is unbounded. But, as we've seen, this has more to do with projections than it does with Fourier series:
Theorem. (Kharshiladze, Lozinski) For each $n$, let $L_n$ be a continuous, linear projection from $C^{2\pi}$ onto $\mathcal{T}_n$. Then there is some $f \in C^{2\pi}$ for which $\|L_n(f) - f\|$ is unbounded.
Although our last Corollary may not look very useful, it does give us some information about the effectiveness of $s_n(f)$ as a uniform approximation to $f$. Specifically, we have Lebesgue's theorem:
Theorem. If $f \in C^{2\pi}$, and if we set $E_n^T(f) = \min_{T \in \mathcal{T}_n} \|f - T\|$, then
$$E_n^T(f) \le \|f - s_n(f)\| \le (4 + \log n)\, E_n^T(f).$$
Proof. Let $T^*$ be the best uniform approximation to $f$ out of $\mathcal{T}_n$. Then, since $s_n(T^*) = T^*$, we get
$$\|f - s_n(f)\| \le \|f - T^*\| + \|s_n(T^* - f)\| \le (4 + \log n)\, \|f - T^*\|.$$
As an application of Lebesgue's theorem, let's speak briefly about "Chebyshev series," a notion that fits neatly in between our discussions of approximation by algebraic polynomials and by trig polynomials.
Theorem. Suppose that $f \in C[-1,1]$ is twice continuously differentiable. Then $f$ may be written as a uniformly and absolutely convergent Chebyshev series; that is, $f(x) = \sum_{k=0}^\infty a_k T_k(x)$, where $\sum_{k=0}^\infty |a_k| < \infty$.
Proof. As usual, consider $\varphi(\theta) = f(\cos\theta) \in C^{2\pi}$. Since $\varphi$ is even and twice differentiable, its Fourier series is an absolutely and uniformly convergent cosine series:
$$f(\cos\theta) = \varphi(\theta) = \sum_{k=0}^\infty a_k \cos k\theta = \sum_{k=0}^\infty a_k T_k(\cos\theta),$$
where $|a_k| \le 2\|\varphi''\|/k^2$. Thus, $f(x) = \sum_{k=0}^\infty a_k T_k(x)$.
If we write $S_n(f)(x) = \sum_{k=0}^n a_k T_k(x)$, we get an interesting consequence of this Theorem. First, notice that
$$S_n(f)(\cos\theta) = s_n(\varphi)(\theta).$$
Thus, from Lebesgue's theorem,
$$E_n(f) \le \|f - S_n(f)\|_{C[-1,1]} = \|\varphi - s_n(\varphi)\|_{C^{2\pi}} \le (4 + \log n)\, E_n^T(\varphi) = (4 + \log n)\, E_n(f).$$
For $n < 400$, this reads
$$E_n(f) \le \|f - S_n(f)\| \le 10\, E_n(f).$$
That is, for numerical purposes, the error incurred by using $\sum_{k=0}^n a_k T_k(x)$ to approximate $f$ is within one decimal place accuracy of the best approximation! Notice, too, that $E_n(f)$ would be very easy to estimate in this case since
$$E_n(f) \le \|f - S_n(f)\| = \Bigl\| \sum_{k>n} a_k T_k \Bigr\| \le \sum_{k>n} |a_k| \le 2\|\varphi''\| \sum_{k>n} \frac{1}{k^2}.$$
Lebesgue's theorem should remind you of our "fancy" version of Bernstein's theorem: if we knew that $E_n^T(f)\log n \to 0$ as $n \to \infty$, then we'd know that $s_n(f)$ converged uniformly to $f$. Our goal, then, is to improve our estimates on $E_n^T(f)$, and the idea behind these improvements is to replace $D_n$ by a better kernel (with regard to uniform approximation). Before we pursue anything quite so delicate as an estimate on $E_n^T(f)$, though, let's investigate a simple (and useful) replacement for $D_n$.
Since the sequence of partial sums $(s_n)$ need not converge to $f$, we might try looking at their arithmetic means (or Cesàro sums):
$$\sigma_n = \frac{s_0 + s_1 + \cdots + s_{n-1}}{n}.$$
(These averages typically have better convergence properties than the partial sums themselves. Consider $\sigma_n$ in the (scalar) case $s_n = (-1)^n$, for example.) Specifically, we set
$$\sigma_n(f)(x) = \frac{1}{n}\bigl[ s_0(f)(x) + \cdots + s_{n-1}(f)(x) \bigr] = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x + t)\, \frac{1}{n}\sum_{k=0}^{n-1} D_k(t)\, dt = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x + t)\, K_n(t)\, dt,$$
where $K_n = (D_0 + D_1 + \cdots + D_{n-1})/n$ is called Fejér's kernel. The same techniques we used earlier can be applied to find a closed form for $\sigma_n(f)$ which, of course, reduces to simplifying $(D_0 + D_1 + \cdots + D_{n-1})/n$. As before, we begin with a trig identity:
$$\sum_{k=0}^{n-1} 2\sin\theta \sin(2k + 1)\theta = \sum_{k=0}^{n-1} \bigl[ \cos 2k\theta - \cos(2k + 2)\theta \bigr] = 1 - \cos 2n\theta = 2\sin^2 n\theta.$$
Thus,
$$K_n(t) = \frac{1}{n}\sum_{k=0}^{n-1} \frac{\sin(2k + 1)\,t/2}{2\sin(t/2)} = \frac{\sin^2(nt/2)}{2n\sin^2(t/2)}.$$
Please note that $K_n$ is even, nonnegative, and $\frac{1}{\pi}\int_{-\pi}^{\pi} K_n(t)\, dt = 1$. Thus, $\sigma_n(f)$ is a positive, linear map from $C^{2\pi}$ onto $\mathcal{T}_n$ (but it's not a projection; why?), satisfying $\|\sigma_n(f)\|_2 \le \|f\|_2$ (why?).
Now the arithmetic mean operator $\sigma_n(f)$ is still a good approximation to $f$ in the $L_2$ norm. Indeed,
$$\|f - \sigma_n(f)\|_2 = \frac{1}{n}\Bigl\| \sum_{k=0}^{n-1} \bigl( f - s_k(f) \bigr) \Bigr\|_2 \le \frac{1}{n}\sum_{k=0}^{n-1} \|f - s_k(f)\|_2 \to 0$$
as $n \to \infty$ (since $\|f - s_k(f)\|_2 \to 0$). But, more to the point, $\sigma_n(f)$ is actually a good uniform approximation to $f$, a fact that we'll call Fejér's theorem:
Theorem. If $f \in C^{2\pi}$, then $\sigma_n(f)$ converges uniformly to $f$ as $n \to \infty$.
Note that, since $\sigma_n(f) \in \mathcal{T}_n$, Fejér's theorem implies Weierstrass's second theorem. Curiously, Fejér was only 19 years old when he proved this result (about 1900), while Weierstrass was 75 at the time he proved his approximation theorems.
We'll give two proofs of Fejér's theorem: one with details, one without. But both follow from quite general considerations. First:
Theorem. Suppose that $k_n \in C^{2\pi}$ satisfies
(a) $k_n \ge 0$,
(b) $\displaystyle \frac{1}{\pi}\int_{-\pi}^{\pi} k_n(t)\, dt = 1$, and
(c) $\displaystyle \int_{\delta \le |t| \le \pi} k_n(t)\, dt \to 0$ for every $\delta > 0$.
Then, $\displaystyle \frac{1}{\pi}\int_{-\pi}^{\pi} f(x + t)\, k_n(t)\, dt \rightrightarrows f(x)$ for each $f \in C^{2\pi}$.
Proof. Let $\varepsilon > 0$. Since $f$ is uniformly continuous, we may choose $\delta > 0$ so that $|f(x) - f(x + t)| < \varepsilon$, for any $x$, whenever $|t| < \delta$. Next, we use the fact that $k_n$ is nonnegative and integrates to $1$ to write
$$\begin{aligned} \Bigl| f(x) - \frac{1}{\pi}\int_{-\pi}^{\pi} f(x + t)\, k_n(t)\, dt \Bigr| &= \Bigl| \frac{1}{\pi}\int_{-\pi}^{\pi} \bigl[ f(x) - f(x + t) \bigr]\, k_n(t)\, dt \Bigr| \\ &\le \frac{1}{\pi}\int_{-\pi}^{\pi} \bigl| f(x) - f(x + t) \bigr|\, k_n(t)\, dt \\ &\le \frac{\varepsilon}{\pi}\int_{|t| < \delta} k_n(t)\, dt + \frac{2\|f\|}{\pi}\int_{\delta \le |t| \le \pi} k_n(t)\, dt \\ &< \varepsilon + \varepsilon = 2\varepsilon \end{aligned}$$
for $n$ sufficiently large.


To see that Fej"er's kernel satises the conditions of the Theorem is easy: In particular,
(c) follows from the fact that Kn (t) 0 on the set
 jtj  . Indeed, since sin(t=2)
increases on
 t   we have
2
Kn(t) = sin (nt=
2
2)  1 ! 0:
2n sin (t=2) 2n sin2(
=2)
Our second proof, or sketch, really, is based on a variant of the Bohman-Korovkin theorem for $C^{2\pi}$, due to Korovkin. In this setting, the three "test cases" are
$$f_0(x) = 1, \qquad f_1(x) = \cos x, \qquad \text{and} \qquad f_2(x) = \sin x.$$
Theorem. Let $(L_n)$ be a sequence of positive, linear maps on $C^{2\pi}$. If $L_n(f) \rightrightarrows f$ for each of the three functions $f_0(x) = 1$, $f_1(x) = \cos x$, and $f_2(x) = \sin x$, then $L_n(f) \rightrightarrows f$ for every $f \in C^{2\pi}$.
We won't prove this theorem; rather, we'll check that $\sigma_n(f) \rightrightarrows f$ in each of the three test cases. Since $s_n$ is a projection, this is painfully simple!
$$\sigma_n(f_0) = \tfrac{1}{n}(f_0 + f_0 + \cdots + f_0) = f_0$$
$$\sigma_n(f_1) = \tfrac{1}{n}(0 + f_1 + \cdots + f_1) = \tfrac{n-1}{n}\, f_1 \rightrightarrows f_1$$
$$\sigma_n(f_2) = \tfrac{1}{n}(0 + f_2 + \cdots + f_2) = \tfrac{n-1}{n}\, f_2 \rightrightarrows f_2.$$
Kernel operators abound in analysis; for example, Landau's proof of the Weierstrass theorem uses the kernel $L_n(x) = c_n(1 - x^2)^n$. And, in the next section, we'll encounter Jackson's kernel $J_n(t) = c_n \sin^4 nt / n^3 \sin^4 t$, which is essentially the square of Fejér's kernel. While we will have no need for a general theory of such operators, please note that the key to their utility is the fact that they're nonnegative!
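To see the difference between the two kernels in action, here is a minimal sketch (numpy assumed; the helpers dirichlet and fejer are ours, written in the cosine-sum form to avoid the removable singularity at $t = 0$). It compares the uniform errors of $s_n(f)$ and $\sigma_n(f)$ for the $2\pi$-periodic extension of $|x|$.

```python
import numpy as np

def dirichlet(n, t):
    # D_n(t) = 1/2 + cos t + ... + cos nt
    return 0.5 + sum(np.cos(k * t) for k in range(1, n + 1))

def fejer(n, t):
    # K_n = (D_0 + ... + D_{n-1})/n = 1/2 + sum_{k<n} (1 - k/n) cos kt
    return 0.5 + sum((1 - k / n) * np.cos(k * t) for k in range(1, n))

f = lambda x: np.abs(np.mod(x + np.pi, 2 * np.pi) - np.pi)  # periodized |x|
x = np.linspace(-np.pi, np.pi, 401)
t = np.linspace(-np.pi, np.pi, 4001)[:-1]
dt = t[1] - t[0]
for n in (4, 16, 64):
    sn = [np.sum(f(xi + t) * dirichlet(n, t)) * dt / np.pi for xi in x]
    on = [np.sum(f(xi + t) * fejer(n, t)) * dt / np.pi for xi in x]
    print(n, np.max(np.abs(f(x) - sn)), np.max(np.abs(f(x) - on)))
```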
Lastly, a word or two about Fourier series involving complex coefficients. Most modern textbooks consider the case of a $2\pi$-periodic, integrable function $f : \mathbb{R} \to \mathbb{C}$ and define the Fourier series of $f$ by
$$\sum_{k=-\infty}^{\infty} c_k e^{ikt},$$
where now we have only one formula for the $c_k$'s:
$$c_k = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(t)\, e^{-ikt}\, dt,$$
but, of course, the $c_k$'s may well be complex. This somewhat simpler approach has other advantages; for one, the exponentials $e^{ikt}$ are now an orthonormal set (relative to the normalizing constant $1/2\pi$). And, if we remain consistent with this choice and define the $L_2$ norm by
$$\|f\|_2 = \left( \frac{1}{2\pi}\int_{-\pi}^{\pi} |f(t)|^2\, dt \right)^{1/2},$$
then we have the simpler estimate $\|f\|_2 \le \|f\|$ for $f \in C^{2\pi}$.
The Dirichlet and Fejér kernels are essentially the same in this case, too, except that we would now write $s_n(f)(x) = \sum_{k=-n}^n c_k e^{ikx}$. Given this, the Dirichlet and Fejér kernels can be written
$$D_n(x) = \sum_{k=-n}^n e^{ikx} = 1 + \sum_{k=1}^n \bigl( e^{ikx} + e^{-ikx} \bigr) = 1 + 2\sum_{k=1}^n \cos kx = \frac{\sin\bigl(n + \frac12\bigr)x}{\sin\frac12 x}$$
and
$$K_n(x) = \frac{1}{n}\sum_{m=0}^{n-1} \sum_{k=-m}^m e^{ikx} = \sum_{k=-n}^n \Bigl( 1 - \frac{|k|}{n} \Bigr) e^{ikx} = \frac{1}{n}\sum_{m=0}^{n-1} \frac{\sin\bigl(m + \frac12\bigr)x}{\sin\frac12 x} = \frac{\sin^2(nx/2)}{n\sin^2(x/2)}.$$
In other words, each is twice its real-coefficient counterpart. Since the choice of normalizing constant ($1/\pi$ versus $1/2\pi$, and sometimes even $1/\sqrt{\pi}$ or $1/\sqrt{2\pi}$) has a (small) effect on these formulas, you may find some variation in other textbooks.
Math 682 Problem Set: Fourier Series 6/8/98
57. Define $f(x) = (\pi - x)^2$ for $0 \le x \le 2\pi$, and extend $f$ to a $2\pi$-periodic continuous function on $\mathbb{R}$ in the obvious way. Check that the Fourier series for $f$ is $\pi^2/3 + 4\sum_{n=1}^\infty \cos nx / n^2$. Since this series is uniformly convergent, it actually converges to $f$. In particular, note that setting $x = 0$ yields the familiar formula $\sum_{n=1}^\infty 1/n^2 = \pi^2/6$.
58. (a) Given $n \ge 1$ and $\varepsilon > 0$, show that there is a continuous function $f \in C^{2\pi}$ satisfying $\|f\| = 1$ and $\frac{1}{\pi}\int_{-\pi}^{\pi} |f(t) - \mathrm{sgn}\, D_n(t)|\, dt < \varepsilon/(n+1)$.
(b) Show that $s_n(f)(0) \ge \lambda_n - \varepsilon$ and, hence, that $\|s_n(f)\| \ge \lambda_n - \varepsilon$.
59. (a) If $f, k \in C^{2\pi}$, prove that $g(x) = \int_{-\pi}^{\pi} f(x + t)\, k(t)\, dt$ is also in $C^{2\pi}$.
(b) If we only assume that $f$ is $2\pi$-periodic and Riemann integrable on $[-\pi,\pi]$ (but still $k \in C^{2\pi}$), is $g$ still continuous?
(c) If we simply assume that $f$ and $k$ are $2\pi$-periodic and Riemann integrable on $[-\pi,\pi]$, is $g$ still continuous?
60. Suppose that $k_n \in C^{2\pi}$ satisfies
$$k_n \ge 0, \qquad \frac{1}{\pi}\int_{-\pi}^{\pi} k_n(t)\, dt = 1, \qquad \text{and} \qquad \int_{\delta \le |t| \le \pi} k_n(t)\, dt \to 0 \quad (n \to \infty)$$
for every $\delta > 0$. If $f$ is Riemann integrable, show that $\frac{1}{\pi}\int_{-\pi}^{\pi} f(x + t)\, k_n(t)\, dt \to f(x)$ pointwise, as $n \to \infty$, at each point of continuity of $f$. In particular, $\sigma_n(f)(x) \to f(x)$ at each point of continuity of $f$.
61. Given $f, g \in C^{2\pi}$, we define the convolution of $f$ and $g$, written $f * g$, by
$$(f * g)(x) = \frac{1}{\pi}\int_{-\pi}^{\pi} f(t)\, g(x - t)\, dt.$$
(Compare this integral with that used in problem 59.)
(a) Show that $f * g = g * f$ and that $f * g \in C^{2\pi}$.
(b) If one of $f$ or $g$ is a trig polynomial, show that $f * g$ is again a trig polynomial (of the same degree).
(c) If one of $f$ or $g$ is continuously differentiable, show that $f * g$ is likewise continuously differentiable, and find an integral formula for $(f * g)'(x)$.
Math 682 Jackson's Theorems 6/16/98
We continue our investigations of the "middle ground" between algebraic and trigonometric approximation by presenting several results due to the great American mathematician Dunham Jackson (from roughly 1911-1912). The first of these results will give us the best possible estimate of $E_n(f)$ in terms of $\omega_f$ and $n$.
Jackson's Theorem 1. If $f \in C^{2\pi}$, then $E_n^T(f) \le 6\, \omega_f\bigl([-\pi,\pi]; \frac{1}{n}\bigr)$.
Theorem 1 should be viewed as an improvement over Bernstein's Theorem, which stated that $E_n(f) \le \frac{3}{2}\, \omega_f\bigl(\frac{1}{\sqrt{n}}\bigr)$ for $f \in C[-1,1]$. As we'll see, the proof of Theorem 1 not only mimics the proof of Bernstein's result, but also uses some of the ideas we talked about in the last section. In particular, the proof we'll give involves integration against an "improved" Dirichlet kernel.
Before we dive into the proof, let's list several immediate and important Corollaries:
Corollary. Weierstrass's second theorem (since $\omega_f(\frac{1}{n}) \to 0$ for any $f \in C^{2\pi}$).
Corollary. The Dini-Lipschitz theorem: If $\omega_f(\frac{1}{n})\log n \to 0$ as $n \to \infty$, then the Fourier series for $f$ converges uniformly to $f$.
Proof. From Lebesgue's theorem,
$$\|f - s_n(f)\| \le (4 + \log n)\, E_n^T(f) \le 6\,(4 + \log n)\, \omega_f\Bigl(\frac{1}{n}\Bigr) \to 0.$$
Jackson's Theorem 2. If $f \in C[-1,1]$, then $E_n(f) \le 6\, \omega_f\bigl([-1,1]; \frac{1}{n}\bigr)$.
Proof. Let $\varphi(\theta) = f(\cos\theta)$. Then, as we've seen,
$$E_n(f) = E_n^T(\varphi) \le 6\, \omega_\varphi\Bigl([-\pi,\pi]; \frac{1}{n}\Bigr) \le 6\, \omega_f\Bigl([-1,1]; \frac{1}{n}\Bigr),$$
where the last inequality follows from the fact that
$$|\varphi(\theta) - \varphi(\beta)| = |f(\cos\theta) - f(\cos\beta)| \le \omega_f(|\cos\theta - \cos\beta|) \le \omega_f(|\theta - \beta|).$$
Corollary. If $f \in \mathrm{lip}_K\alpha$ on $[-1,1]$, then $E_n(f) \le 6K\, n^{-\alpha}$. (Recall that Bernstein's theorem gives only $n^{-\alpha/2}$.)
Corollary. If $f \in C[-1,1]$ has a bounded derivative, then $E_n(f) \le \frac{6}{n}\, \|f'\|$.
Corollary. If $f \in C[-1,1]$ has a continuous derivative, then $E_n(f) \le \frac{6}{n}\, E_{n-1}(f')$.
Proof. Let $p^* \in \mathcal{P}_{n-1}$ be the best uniform approximation to $f'$ and consider $p(x) = \int_{-1}^x p^*(t)\, dt \in \mathcal{P}_n$. From the previous Corollary,
$$E_n(f) = E_n(f - p) \le \frac{6}{n}\, \|f' - p^*\| = \frac{6}{n}\, E_{n-1}(f'). \qquad \text{(Why the first equality?)}$$
Iterating this last inequality will give the following result:
Corollary. If $f \in C[-1,1]$ is $k$-times continuously differentiable, then
$$E_n(f) \le \frac{6^{k+1}}{n(n-1)\cdots(n-k+1)}\, \omega_k\Bigl(\frac{1}{n-k}\Bigr),$$
where $\omega_k$ is the modulus of continuity of $f^{(k)}$.
Well, enough corollaries. It's time we proved Jackson's Theorem 1. Now Jackson's approach was to show that
$$\frac{1}{\pi}\int_{-\pi}^{\pi} f(x + t)\, J_n(t)\, dt \rightrightarrows f(x),$$
where $J_n(t) = c_n(\sin nt / \sin t)^4$ is the "improved" kernel we alluded to earlier (it's essentially the square of Fejér's kernel). The approach we'll take, due to Korovkin, proves the existence of a suitable kernel without giving a tidy formula for it. On the other hand, it's relatively easy to outline the idea. The key here is that $J_n(t)$ should be an even, nonnegative trig polynomial of degree $n$ with $\frac{1}{\pi}\int_{-\pi}^{\pi} J_n(t)\, dt = 1$. In other words,
$$J_n(t) = \frac12 + \sum_{k=1}^n \rho_{k,n} \cos kt$$
(why is the first term $1/2$?), where $\rho_{1,n}, \ldots, \rho_{n,n}$ must be chosen so that $J_n(t) \ge 0$. Assuming we can find such $\rho_{k,n}$'s, here's what we get:
Lemma. If $f \in C^{2\pi}$, then
$$\Bigl| f(x) - \frac{1}{\pi}\int_{-\pi}^{\pi} f(x + t)\, J_n(t)\, dt \Bigr| \le \omega_f\Bigl(\frac{1}{n}\Bigr)\Bigl[ 1 + n\pi\sqrt{\frac{1 - \rho_{1,n}}{2}} \Bigr].$$
Proof. We already know how the first several lines of the proof will go:
$$\Bigl| f(x) - \frac{1}{\pi}\int_{-\pi}^{\pi} f(x + t)\, J_n(t)\, dt \Bigr| = \Bigl| \frac{1}{\pi}\int_{-\pi}^{\pi} \bigl[ f(x) - f(x + t) \bigr]\, J_n(t)\, dt \Bigr| \le \frac{1}{\pi}\int_{-\pi}^{\pi} |f(x) - f(x + t)|\, J_n(t)\, dt \le \frac{1}{\pi}\int_{-\pi}^{\pi} \omega_f(|t|)\, J_n(t)\, dt.$$
Next we borrow a trick from Bernstein. We replace $\omega_f(|t|)$ by
$$\omega_f(|t|) = \omega_f\Bigl( n|t| \cdot \frac{1}{n} \Bigr) \le \bigl( 1 + n|t| \bigr)\, \omega_f\Bigl(\frac{1}{n}\Bigr),$$
and so the last integral on the right-hand side, above, is dominated by
$$\omega_f\Bigl(\frac{1}{n}\Bigr) \cdot \frac{1}{\pi}\int_{-\pi}^{\pi} \bigl( 1 + n|t| \bigr)\, J_n(t)\, dt = \omega_f\Bigl(\frac{1}{n}\Bigr)\Bigl[ 1 + \frac{n}{\pi}\int_{-\pi}^{\pi} |t|\, J_n(t)\, dt \Bigr].$$
All that remains is to estimate $\int_{-\pi}^{\pi} |t|\, J_n(t)\, dt$, and for this we'll appeal to the Cauchy-Schwarz inequality (again, compare this to the proof of Bernstein's theorem):
$$\frac{1}{\pi}\int_{-\pi}^{\pi} |t|\, J_n(t)\, dt = \frac{1}{\pi}\int_{-\pi}^{\pi} |t|\, J_n(t)^{1/2} J_n(t)^{1/2}\, dt \le \left( \frac{1}{\pi}\int_{-\pi}^{\pi} |t|^2 J_n(t)\, dt \right)^{1/2} \left( \frac{1}{\pi}\int_{-\pi}^{\pi} J_n(t)\, dt \right)^{1/2} = \left( \frac{1}{\pi}\int_{-\pi}^{\pi} |t|^2 J_n(t)\, dt \right)^{1/2}.$$
But,
$$|t|^2 \le \pi^2 \sin^2\frac{t}{2} = \frac{\pi^2}{2}\,(1 - \cos t).$$
So,
$$\frac{1}{\pi}\int_{-\pi}^{\pi} |t|\, J_n(t)\, dt \le \left( \frac{\pi^2}{2} \cdot \frac{1}{\pi}\int_{-\pi}^{\pi} (1 - \cos t)\, J_n(t)\, dt \right)^{1/2} = \pi\sqrt{\frac{1 - \rho_{1,n}}{2}}.$$
Now we still have to prove that we can actually find a suitable choice of scalars $\rho_{1,n}, \ldots, \rho_{n,n}$. We already know that we need to choose the $\rho_{k,n}$'s so that $J_n(t)$ will be nonnegative, but now it's clear that we also want $\rho_{1,n}$ to be very close to $1$. To get us started, let's first see why it's easy to generate nonnegative cosine polynomials. Given real numbers $c_0, \ldots, c_n$, note that
$$0 \le \Bigl| \sum_{k=0}^n c_k e^{ikx} \Bigr|^2 = \Bigl( \sum_{k=0}^n c_k e^{ikx} \Bigr)\Bigl( \sum_{j=0}^n c_j e^{-ijx} \Bigr) = \sum_{k,j} c_k c_j\, e^{i(k-j)x} = \sum_{k=0}^n c_k^2 + 2\sum_{k>j} c_k c_j \cos(k-j)x = \sum_{k=0}^n c_k^2 + 2\sum_{k=0}^{n-1} c_k c_{k+1} \cos x + \cdots + 2 c_0 c_n \cos nx. \qquad (*)$$
In particular, we need to find $c_0, \ldots, c_n$ with
$$\sum_{k=0}^n c_k^2 = \frac12 \qquad \text{and} \qquad \rho_{1,n} = 2\sum_{k=0}^{n-1} c_k c_{k+1} \approx 1.$$
What we'll do is find $c_k$'s with $\sum_{k=0}^{n-1} c_k c_{k+1} \approx \sum_{k=0}^n c_k^2$, and then normalize. But, in fact, we won't actually find anything: we'll simply write down a choice of $c_k$'s that happens to work! Consider:
$$\sum_{k=0}^n \sin\Bigl(\frac{k+1}{n+2}\,\pi\Bigr)\sin\Bigl(\frac{k+2}{n+2}\,\pi\Bigr) = \sum_{k=0}^n \sin\Bigl(\frac{k+1}{n+2}\,\pi\Bigr)\sin\Bigl(\frac{k}{n+2}\,\pi\Bigr) = \frac12 \sum_{k=0}^n \sin\Bigl(\frac{k+1}{n+2}\,\pi\Bigr)\Bigl[ \sin\Bigl(\frac{k}{n+2}\,\pi\Bigr) + \sin\Bigl(\frac{k+2}{n+2}\,\pi\Bigr) \Bigr].$$
(By changing the index of summation, it's easy to see that the first two sums are equal and, hence, each is equal to the average of the two.) Next we re-write this last sum, using the trig identity $\frac12\bigl[\sin A + \sin B\bigr] = \cos\frac{A-B}{2}\sin\frac{A+B}{2}$, to get
$$\sum_{k=0}^n \sin\Bigl(\frac{k+1}{n+2}\,\pi\Bigr)\sin\Bigl(\frac{k+2}{n+2}\,\pi\Bigr) = \cos\Bigl(\frac{\pi}{n+2}\Bigr)\sum_{k=0}^n \sin^2\Bigl(\frac{k+1}{n+2}\,\pi\Bigr).$$
Since $\cos\frac{\pi}{n+2} \approx 1$ for large $n$, we've done it! If we define $c_k = c \cdot \sin\bigl(\frac{k+1}{n+2}\pi\bigr)$, where $c$ is chosen so that $\sum_{k=0}^n c_k^2 = 1/2$, and if we define $J_n(x)$ using $(*)$, then $J_n(x) \ge 0$ and $\rho_{1,n} = \cos\frac{\pi}{n+2}$ (why?). The estimate needed in our Lemma becomes
$$\sqrt{\frac{1 - \rho_{1,n}}{2}} = \sqrt{\frac{1 - \cos\frac{\pi}{n+2}}{2}} = \sin\Bigl(\frac{\pi}{2n+4}\Bigr) \le \frac{\pi}{2n},$$
and so we have
$$E_n^T(f) \le \Bigl( 1 + \frac{\pi^2}{2} \Bigr)\, \omega_f\Bigl(\frac{1}{n}\Bigr) < 6\, \omega_f\Bigl(\frac{1}{n}\Bigr).$$
about f , then we can say something about En(f ). There is also the notion of an inverse
theorem, meaning that if we know something about En(f ), we should be able to say
something about f . In other words, we would expect an inverse theorem to be, more or
less, the converse of some direct theorem. Now inverse theorems are typically much harder
to prove than direct theorems, but in order to have some idea of what such theorems might
tell us (and to see some of the techniques used in their proofs), we present one of the easier
inverse theorems, due to Bernstein. This result gives the converse to one of our corollaries
to Jackson's theorem (see the top of page 105).
Theorem. If f 2 C 2 satis es EnT (f )  A n; , for some constants A and 0 <  < 1,
then f 2 lipK  for some constant K .
Proof. For each n, choose Un 2 Tn so that kf ; Un k  A n; . Then, in particular,
(Un) converges uniformly to f . Now if we set V0 = U1 and Vn = U2n ; U2n;1 for n  1,
P Vn . Indeed,
then Vn 2 T2n and f = 1 n=0
kVn k  kU2n ; f k + kU2n;1 ; f k  A (2n ); + A (2n;1 ); = B  2;n
which is summable thus, the (telescoping) series 1
P
n=0 Vn converges uniformly to f .
(Why?)
Next we estimate jf (x) ; f (y)j using nitely many of the Vn's, the precise number to
be specied later. Using the mean value theorem and Bernstein's inequality we get
X
1
jf (x) ; f (y)j  jVn(x) ; Vn(y)j
n=0
$$\begin{aligned} &\le \sum_{n=0}^{m-1} |V_n(x) - V_n(y)| + 2\sum_{n=m}^\infty \|V_n\| \\ &= \sum_{n=0}^{m-1} |V_n'(\xi_n)|\, |x - y| + 2\sum_{n=m}^\infty \|V_n\| \\ &\le |x - y| \sum_{n=0}^{m-1} 2^n \|V_n\| + 2\sum_{n=m}^\infty \|V_n\| \\ &\le |x - y|\, B \sum_{n=0}^{m-1} 2^{n(1-\alpha)} + 2B \sum_{n=m}^\infty 2^{-n\alpha} \\ &\le C\, \bigl[\, |x - y| \cdot 2^{m(1-\alpha)} + 2^{-m\alpha} \,\bigr], \qquad (*) \end{aligned}$$
where we've used, in the third line, the fact that $V_n \in \mathcal{T}_{2^n}$ and, in the last line, standard estimates for geometric series. Now we want the right-hand side to be dominated by a constant times $|x - y|^\alpha$. In other words, if we set $|x - y| = \delta$, then we want
$$\delta \cdot 2^{m(1-\alpha)} + 2^{-m\alpha} \le D\, \delta^\alpha$$
or, equivalently,
$$(2^m\delta)^{1-\alpha} + (2^m\delta)^{-\alpha} \le D.$$
Thus, we should choose $m$ so that $2^m\delta$ is both bounded above and bounded away from zero. For example, if $0 < \delta < 1$, we could choose $m$ so that $1 \le 2^m\delta < 2$.
In order to better explain the phrase "more or less the converse of some direct theorem," let's see how the previous result falls apart when $\alpha = 1$. Although we might hope that $E_n^T(f) \le A/n$ would imply that $f \in \mathrm{lip}_K 1$, it happens not to be true. The best result in this regard is due to Zygmund, who gave necessary and sufficient conditions on $f$ so that $E_n^T(f) \le A/n$ (and these conditions do not characterize $\mathrm{lip}_K 1$ functions). Instead of pursuing Zygmund's result, we'll settle for simple "surgery" on our previous result, keeping an eye out for what goes wrong. This result is again due to Bernstein.
Theorem. If $f \in C^{2\pi}$ satisfies $E_n^T(f) \le A/n$, then $\omega_f(\delta) \le K\delta\,|\log\delta|$ for some constant $K$ and all $\delta$ sufficiently small.
Proof. If we repeat the previous proof, setting $\alpha = 1$, only a few lines change. In particular, the conclusion of that long string of inequalities $(*)$ would now read
$$|f(x) - f(y)| \le C\,\bigl[\, |x - y| \cdot m + 2^{-m} \,\bigr] = C\,\bigl[\, m\delta + 2^{-m} \,\bigr].$$
Clearly, the right-hand side cannot be dominated by a constant times $\delta$, as we might have hoped, for this would force $m$ to be bounded (independent of $\delta$), which in turn bounds $\delta$ away from zero. But, if we again think of $2^m\delta$ as the "variable" in this inequality, then the term $m\delta$ suggests that the correct order of magnitude of the right-hand side is $\delta\,|\log\delta|$. Thus, we would try to find a constant $D$ so that
$$m\delta + 2^{-m} \le D\,\delta\,|\log\delta|$$
or
$$m(2^m\delta) + 1 \le D\,(2^m\delta)\,|\log\delta|.$$
Now if we take $0 < \delta < 1/2$, then $\log 2 < -\log\delta = |\log\delta|$. Hence, if we again choose $m \ge 1$ so that $1 \le 2^m\delta < 2$, we'll get
$$m\log 2 + \log\delta < \log 2 \implies m < \frac{\log 2 - \log\delta}{\log 2} < \frac{2}{\log 2}\,|\log\delta|$$
and, finally,
$$m(2^m\delta) + 1 \le 2m + 1 \le 3m \le \frac{6}{\log 2}\,|\log\delta| \le \frac{6}{\log 2}\,(2^m\delta)\,|\log\delta|.$$
Math 682 Orthogonal Polynomials 6/17/98

Given a positive (except possibly at finitely many points), Riemann integrable weight function $w(x)$ on $[a, b]$, the expression
$$\langle f, g \rangle = \int_a^b f(x)\, g(x)\, w(x)\, dx$$
defines an inner product on $C[a, b]$, and
$$\|f\|_2 = \Big( \int_a^b f(x)^2\, w(x)\, dx \Big)^{1/2} = \sqrt{\langle f, f \rangle}$$
defines a strictly convex norm on $C[a, b]$. Thus, given a finite dimensional subspace $E$ of $C[a, b]$ and an element $f \in C[a, b]$, there is a unique $g \in E$ such that
$$\|f - g\|_2 = \min_{h \in E} \|f - h\|_2.$$
We say that $g$ is the least-squares approximation to $f$ out of $E$ (relative to $w$).
Now if we apply the Gram-Schmidt procedure to the sequence $1, x, x^2, \ldots$, we will arrive at a sequence $(Q_n)$ of orthogonal polynomials relative to the above inner product. In this special case, however, the Gram-Schmidt procedure simplifies substantially:

Theorem. The following procedure defines a sequence $(Q_n)$ of orthogonal polynomials (relative to $w$). Set:
$$Q_0(x) = 1, \quad Q_1(x) = x - a_0, \quad \text{and} \quad Q_{n+1}(x) = (x - a_n)\, Q_n(x) - b_n\, Q_{n-1}(x)$$
for $n \ge 1$, where
$$a_n = \langle x\,Q_n, Q_n \rangle \big/ \langle Q_n, Q_n \rangle \quad \text{and} \quad b_n = \langle x\,Q_n, Q_{n-1} \rangle \big/ \langle Q_{n-1}, Q_{n-1} \rangle$$
(and where $x\,Q_n$ is shorthand for the polynomial $x\,Q_n(x)$).

Proof. It's easy to see from these formulas that $Q_n$ is a monic polynomial of degree exactly $n$. In particular, the $Q_n$'s are linearly independent (and nonzero).
Now we checked in class that $Q_0$, $Q_1$, and $Q_2$ are mutually orthogonal, so let's use induction and check that $Q_{n+1}$ is orthogonal to each $Q_k$, $k \le n$. First,
$$\langle Q_{n+1}, Q_n \rangle = \langle x\,Q_n, Q_n \rangle - a_n \langle Q_n, Q_n \rangle - b_n \langle Q_{n-1}, Q_n \rangle = 0$$
and
$$\langle Q_{n+1}, Q_{n-1} \rangle = \langle x\,Q_n, Q_{n-1} \rangle - a_n \langle Q_n, Q_{n-1} \rangle - b_n \langle Q_{n-1}, Q_{n-1} \rangle = 0$$
since $\langle Q_{n-1}, Q_n \rangle = 0$. Next, we take $k < n - 1$ and use the recurrence formula twice:
$$\begin{aligned}
\langle Q_{n+1}, Q_k \rangle &= \langle x\,Q_n, Q_k \rangle - a_n \langle Q_n, Q_k \rangle - b_n \langle Q_{n-1}, Q_k \rangle \\
&= \langle x\,Q_n, Q_k \rangle = \langle Q_n, x\,Q_k \rangle \quad \text{(Why?)} \\
&= \langle Q_n, Q_{k+1} + a_k Q_k + b_k Q_{k-1} \rangle = 0
\end{aligned}$$
since $k + 1 < n$.
Observations
1. Using the same trick as above, we have
$$b_n = \langle x\,Q_n, Q_{n-1} \rangle \big/ \langle Q_{n-1}, Q_{n-1} \rangle = \langle Q_n, Q_n \rangle \big/ \langle Q_{n-1}, Q_{n-1} \rangle > 0.$$
2. Each $p \in \mathcal{P}_n$ can be uniquely written $p = \sum_{i=0}^{n} \alpha_i Q_i$, where $\alpha_i = \langle p, Q_i \rangle \big/ \langle Q_i, Q_i \rangle$.
3. If $Q$ is any monic polynomial of degree exactly $n$, then $Q = Q_n + \sum_{i=0}^{n-1} \alpha_i Q_i$ (why?) and hence
$$\|Q\|_2^2 = \|Q_n\|_2^2 + \sum_{i=0}^{n-1} \alpha_i^2 \|Q_i\|_2^2 > \|Q_n\|_2^2$$
unless $Q = Q_n$. That is, $Q_n$ has the least $\|\cdot\|_2$ norm of all monic polynomials of degree $n$.
4. The $Q_n$'s are unique in the following sense: If $(P_n)$ is another sequence of orthogonal polynomials such that $P_n$ has degree exactly $n$, then $P_n = \lambda_n Q_n$ for some $\lambda_n \ne 0$. (Why?) Consequently, there's no harm in referring to the $Q_n$'s as the sequence of orthogonal polynomials relative to $w$.
5. For $n \ge 1$ note that $\int_a^b Q_n(t)\, w(t)\, dt = \langle Q_0, Q_n \rangle = 0$.
Examples
1. On $[-1, 1]$, the Chebyshev polynomials of the first kind $(T_n)$ are orthogonal relative to the weight $w(x) = 1/\sqrt{1 - x^2}$:
$$\int_{-1}^{1} T_m(x)\, T_n(x)\, \frac{dx}{\sqrt{1 - x^2}} = \int_0^{\pi} \cos m\theta\, \cos n\theta\, d\theta = \begin{cases} 0 & m \ne n \\ \pi & m = n = 0 \\ \pi/2 & m = n \ne 0. \end{cases}$$
Since $T_n$ has degree exactly $n$, this must be the right choice. Notice, too, that $\frac{1}{\sqrt{2}}\,T_0, T_1, T_2, \ldots$ are orthonormal relative to the weight $2/\pi\sqrt{1 - x^2}$.
In terms of the inductive procedure given above, we must have $Q_0 = T_0 = 1$ and $Q_n = 2^{-n+1} T_n$ for $n \ge 1$. (Why?) From this it follows that $a_n = 0$, $b_1 = 1/2$, and $b_n = 1/4$ for $n \ge 2$. (Why?) That is, the recurrence formula given in our first Theorem reduces to the familiar relationship $T_{n+1}(x) = 2x\, T_n(x) - T_{n-1}(x)$. Curiously, $Q_n = 2^{-n+1} T_n$ minimizes both
$$\max_{-1 \le x \le 1} |p(x)| \quad \text{and} \quad \Big( \int_{-1}^{1} p(x)^2\, \frac{dx}{\sqrt{1 - x^2}} \Big)^{1/2}$$
over all monic polynomials of degree exactly $n$.
The Chebyshev polynomials also satisfy $(1 - x^2)\, T_n''(x) - x\, T_n'(x) + n^2\, T_n(x) = 0$. Since this is a polynomial identity, it suffices to check it for $x = \cos\theta$. In this case,
$$T_n'(x) = \frac{n \sin n\theta}{\sin\theta}$$
and
$$T_n''(x) = \frac{n^2 \cos n\theta\, \sin\theta - n \sin n\theta\, \cos\theta}{\sin^2\theta\,(-\sin\theta)}.$$
Hence,
$$(1 - x^2)\, T_n''(x) - x\, T_n'(x) + n^2\, T_n(x) = -n^2 \cos n\theta + n \sin n\theta\, \cot\theta - n \sin n\theta\, \cot\theta + n^2 \cos n\theta = 0.$$
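The claims about $a_n$ and $b_n$ above are easy to corroborate numerically; a sketch (numpy's Chebyshev class serves as the reference):

    import numpy as np
    from numpy.polynomial import Chebyshev as T, Polynomial as P

    # Run the monic recurrence with a_n = 0, b_1 = 1/2, b_n = 1/4 (n >= 2)
    # and compare against the monic Chebyshev polynomials 2^(1-n) T_n.
    X = P([0, 1])
    Q = [P([1]), X]
    for n in range(1, 8):
        b = 0.5 if n == 1 else 0.25
        Q.append(X * Q[n] - b * Q[n - 1])
    for n in range(1, 9):
        monic_T = T.basis(n).convert(kind=P) * 2.0 ** (1 - n)
        assert np.allclose(Q[n].coef, monic_T.coef)
    print("Q_n = 2^(1-n) T_n verified for n = 1..8")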
2. On $[-1, 1]$, the Chebyshev polynomials of the second kind $(U_n)$ are orthogonal relative to the weight $w(x) = \sqrt{1 - x^2}$:
$$\int_{-1}^{1} U_m(x)\, U_n(x)\,(1 - x^2)\, \frac{dx}{\sqrt{1 - x^2}} = \int_0^{\pi} \frac{\sin(m+1)\theta}{\sin\theta} \cdot \frac{\sin(n+1)\theta}{\sin\theta} \cdot \sin^2\theta\, d\theta = \begin{cases} 0 & m \ne n \\ \pi/2 & m = n. \end{cases}$$
While we're at it, notice that
$$T_n'(x) = \frac{n \sin n\theta}{\sin\theta} = n\, U_{n-1}(x).$$
As a rule, the derivatives of a sequence of orthogonal polynomials are again orthogonal polynomials, but relative to a different weight.
3. On $[-1, 1]$ with weight $w(x) \equiv 1$, the sequence $(P_n)$ of Legendre polynomials is orthogonal; these are typically normalized by $P_n(1) = 1$. The first few Legendre polynomials are $P_0(x) = 1$, $P_1(x) = x$, $P_2(x) = \frac{3}{2} x^2 - \frac{1}{2}$, and $P_3(x) = \frac{5}{2} x^3 - \frac{3}{2} x$. (Check this!) After we've seen a few more examples, we'll come back and give an explicit formula for $P_n$.
4. All of the examples we've seen so far are special cases of the following: On $[-1, 1]$, consider the weight $w(x) = (1 - x)^{\alpha} (1 + x)^{\beta}$, where $\alpha, \beta > -1$. The corresponding orthogonal polynomials $(P_n^{(\alpha,\beta)})$ are called the Jacobi polynomials and are typically normalized by requiring that
$$P_n^{(\alpha,\beta)}(1) = \binom{n+\alpha}{n} = \frac{(\alpha+1)(\alpha+2)\cdots(\alpha+n)}{n!}.$$
It follows that $P_n^{(0,0)} = P_n$,
$$P_n^{(-1/2,\,-1/2)} = \frac{1 \cdot 3 \cdot 5 \cdots (2n-1)}{2^n\, n!}\, T_n$$
and
$$P_n^{(1/2,\,1/2)} = \frac{1 \cdot 3 \cdot 5 \cdots (2n+1)}{2^n\,(n+1)!}\, U_n.$$
The polynomials $P_n^{(\alpha,\alpha)}$ are called ultraspherical polynomials.
5. There are also several classical examples of orthogonal polynomials on unbounded intervals. In particular:
$(0, \infty)$, $w(x) = e^{-x}$: Laguerre polynomials;
$(0, \infty)$, $w(x) = x^{\alpha} e^{-x}$: generalized Laguerre polynomials;
$(-\infty, \infty)$, $w(x) = e^{-x^2}$: Hermite polynomials.

Since $Q_n$ is orthogonal to every element of $\mathcal{P}_{n-1}$, a fuller understanding of $Q_n$ will follow from a characterization of the orthogonal complement of $\mathcal{P}_{n-1}$. We begin with an easy fact about least-squares approximations in inner product spaces.
Lemma. Let $E$ be a finite dimensional subspace of an inner product space $X$, and let $x \in X \setminus E$. Then, $y^* \in E$ is the least-squares approximation to $x$ out of $E$ (a.k.a. the nearest point to $x$ in $E$) if and only if $\langle x - y^*, y \rangle = 0$ for every $y \in E$; that is, if and only if $(x - y^*) \perp E$.

Proof. [We've taken $E$ to be finite dimensional so that nearest points will exist; since $X$ is an inner product space, nearest points must also be unique (see the exercises for a proof that every inner product norm is strictly convex).]

($\Leftarrow$) First suppose that $(x - y^*) \perp E$. Then, given any $y \in E$, we have
$$\|x - y\|_2^2 = \|(x - y^*) + (y^* - y)\|_2^2 = \|x - y^*\|_2^2 + \|y^* - y\|_2^2$$
because $y^* - y \in E$ and, hence, $(x - y^*) \perp (y^* - y)$. Thus, $\|x - y\| > \|x - y^*\|$ unless $y = y^*$; that is, $y^*$ is the (unique) nearest point to $x$ in $E$.

($\Rightarrow$) Suppose that $x - y^*$ is not orthogonal to $E$. Then, there is some $y \in E$ with $\|y\| = 1$ such that $\alpha = \langle x - y^*, y \rangle \ne 0$. Now I claim that $y^* + \alpha y \in E$ is a better approximation to $x$ than $y^*$ (and $y^* + \alpha y \ne y^*$, of course); that is, $y^*$ is not the least-squares approximation to $x$. To see this, we again compute:
$$\begin{aligned}
\|x - (y^* + \alpha y)\|_2^2 &= \|(x - y^*) - \alpha y\|_2^2 = \langle (x - y^*) - \alpha y,\ (x - y^*) - \alpha y \rangle \\
&= \|x - y^*\|_2^2 - 2\alpha \langle x - y^*, y \rangle + \alpha^2 \\
&= \|x - y^*\|_2^2 - \alpha^2 < \|x - y^*\|_2^2.
\end{aligned}$$
Thus, we must have $\langle x - y^*, y \rangle = 0$ for every $y \in E$.
Lemma 1. (Integration by parts)
$$\int_a^b u^{(n)} v = \sum_{k=1}^{n} (-1)^{k-1} \Big[\, u^{(n-k)} v^{(k-1)} \,\Big]_a^b + (-1)^n \int_a^b u\, v^{(n)}.$$
Now if $v$ is a polynomial of degree $< n$, then $v^{(n)} = 0$ and we get:

Lemma 2. $f \in C[a, b]$ satisfies $\int_a^b f(x)\, p(x)\, w(x)\, dx = 0$ for all polynomials $p \in \mathcal{P}_{n-1}$ if and only if there is an $n$-times differentiable function $u$ on $[a, b]$ satisfying $fw = u^{(n)}$ and $u^{(k)}(a) = u^{(k)}(b) = 0$ for all $k = 0, 1, \ldots, n-1$.

Proof. One direction is clear from Lemma 1: Given $u$ as above, we would have $\int_a^b fpw = \int_a^b u^{(n)} p = (-1)^n \int_a^b u\, p^{(n)} = 0$.
So, suppose we have that $\int_a^b fpw = 0$ for all $p \in \mathcal{P}_{n-1}$. By integrating $fw$ repeatedly, choosing constants appropriately, we may define a function $u$ satisfying $fw = u^{(n)}$ and $u^{(k)}(a) = 0$ for all $k = 0, 1, \ldots, n-1$. We want to show that the hypotheses on $f$ force $u^{(k)}(b) = 0$ for all $k = 0, 1, \ldots, n-1$.
Now Lemma 1 tells us that
$$0 = \int_a^b fpw = \sum_{k=1}^{n} (-1)^{k-1}\, u^{(n-k)}(b)\, p^{(k-1)}(b)$$
for all $p \in \mathcal{P}_{n-1}$. But the numbers $p(b), p'(b), \ldots, p^{(n-1)}(b)$ are completely arbitrary; that is (again by integrating repeatedly, choosing our constants as we please), we can find polynomials $p_k$ of degree $k < n$ such that $p_k^{(k)}(b) \ne 0$ and $p_k^{(j)}(b) = 0$ for $j \ne k$. In fact, $p_k(x) = (x - b)^k$ works just fine! In any case, we must have $u^{(k)}(b) = 0$ for all $k = 0, 1, \ldots, n-1$.
Rolle's theorem tells us a bit more about the functions orthogonal to $\mathcal{P}_{n-1}$:

Lemma 3. If $w(x) > 0$ in $(a, b)$, and if $f \in C[a, b]$ satisfies $\int_a^b f(x)\, p(x)\, w(x)\, dx = 0$ for all polynomials $p \in \mathcal{P}_{n-1}$, then $f$ has at least $n$ distinct zeros in the open interval $(a, b)$.

Proof. Write $fw = u^{(n)}$, where $u^{(k)}(a) = u^{(k)}(b) = 0$ for all $k = 0, 1, \ldots, n-1$. In particular, since $u(a) = u(b) = 0$, Rolle's theorem gives a point $c$ in $(a, b)$ with $u'(c) = 0$. But then $u'(a) = u'(c) = u'(b) = 0$, and so $u''$ must have at least two zeros in $(a, b)$. Continuing, we find that $fw = u^{(n)}$ must have at least $n$ zeros in $(a, b)$. Since $w > 0$, the result follows.
Corollary. Let $(Q_n)$ be the sequence of orthogonal polynomials associated to a given weight $w$ with $w > 0$ in $(a, b)$. Then, the roots of $Q_n$ are real, simple, and lie in $(a, b)$.

Lemma 4. If $p^*$ is the least-squares approximation to $f \in C[a, b]$ out of $\mathcal{P}_{n-1}$, and if $w > 0$ in $(a, b)$, then $f - p^*$ has at least $n$ distinct zeros in $(a, b)$.

Proof. The least-squares approximation satisfies $\langle f - p^*, p \rangle = 0$ for all $p \in \mathcal{P}_{n-1}$.
The sheer volume of literature on orthogonal polynomials and other "special functions" is truly staggering. We'll content ourselves with the Legendre and the Chebyshev polynomials. In particular, let's return to the problem of finding an explicit formula for the Legendre polynomials. We could, as Rivlin does, use induction and a few observations that simplify the basic recurrence formula (you're encouraged to read this; see pp. 53-54). Instead we'll give a simple (but at first sight intimidating) formula that is of use in more general settings than ours.

Lemma 2 (with $w \equiv 1$ and $[a, b] = [-1, 1]$) says that if we want to find a polynomial $f$ of degree $n$ that is orthogonal to $\mathcal{P}_{n-1}$, then we'll need to take a polynomial for $u$, and this $u$ will have to be divisible by $(x - 1)^n (x + 1)^n$. (Why?) That is, we must have $P_n(x) = c_n \cdot D^n\big[(x^2 - 1)^n\big]$, where $D$ denotes differentiation, and where we find $c_n$ by evaluating the right-hand side at $x = 1$.

Lemma 5. (Leibniz's formula) $\displaystyle D^n(fg) = \sum_{k=0}^{n} \binom{n}{k}\, D^k(f)\, D^{n-k}(g)$.

Proof. Induction, and the fact that $\binom{n-1}{k-1} + \binom{n-1}{k} = \binom{n}{k}$.

Consequently, $Q(x) = D^n\big[(x-1)^n (x+1)^n\big] = \sum_{k=0}^{n} \binom{n}{k}\, D^k\big[(x-1)^n\big]\, D^{n-k}\big[(x+1)^n\big]$, and it follows that $Q(1) = 2^n n!$ and $Q(-1) = (-1)^n 2^n n!$. This, finally, gives us the formula discovered by Rodrigues in 1814:
$$P_n(x) = \frac{1}{2^n n!}\, D^n\big[(x^2 - 1)^n\big].$$
The Rodrigues formula is quite useful (and easily generalizes to the Jacobi polynomials).
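Though no such computation appears in the notes, the Rodrigues formula is easy to test symbolically; a sketch using sympy:

    import sympy as sp

    x = sp.symbols('x')

    def legendre_rodrigues(n):
        # P_n(x) = 1/(2^n n!) D^n [(x^2 - 1)^n]
        return sp.expand(sp.diff((x**2 - 1)**n, x, n) / (2**n * sp.factorial(n)))

    for n in range(4):
        print(n, legendre_rodrigues(n))
    # prints 1, x, 3*x**2/2 - 1/2, 5*x**3/2 - 3*x/2, matching P_0..P_3 above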
Observations
6. By Lemma 3, the roots of $P_n$ are real, distinct, and lie in $(-1, 1)$.
7. $(x^2 - 1)^n = \sum_{k=0}^{n} (-1)^k \binom{n}{k} x^{2n-2k}$. If we apply $\frac{1}{2^n n!} D^n$ and simplify, we get another formula for the Legendre polynomials:
$$P_n(x) = \frac{1}{2^n} \sum_{k=0}^{[n/2]} (-1)^k \binom{n}{k} \binom{2n-2k}{n}\, x^{n-2k}.$$
In particular, if $n$ is even (odd), then $P_n$ is even (odd). Notice, too, that if we let $\widetilde{P}_n$ denote the monic polynomial given by the standard construction, then we must have $P_n = 2^{-n} \binom{2n}{n} \widetilde{P}_n$.
8. In terms of our standard recurrence formula, it follows that $a_n = 0$ (because $x P_n(x)^2$ is always odd). It remains to compute $b_n$. First, integrating by parts,
$$\int_{-1}^{1} P_n(x)^2\, dx = \Big[\, x P_n(x)^2 \,\Big]_{-1}^{1} - \int_{-1}^{1} x \cdot 2 P_n(x)\, P_n'(x)\, dx$$
or $\langle P_n, P_n \rangle = 2 - 2 \langle P_n, x P_n' \rangle$. But $x P_n' = n P_n + (\text{lower degree terms})$; hence, $\langle P_n, x P_n' \rangle = n \langle P_n, P_n \rangle$. Thus, $\langle P_n, P_n \rangle = 2/(2n+1)$. Using this and the fact that $P_n = 2^{-n} \binom{2n}{n} \widetilde{P}_n$, we'd find that $b_n = n^2/(4n^2 - 1)$. Thus,
$$P_{n+1} = 2^{-n-1} \binom{2n+2}{n+1} \widetilde{P}_{n+1} = 2^{-n-1} \binom{2n+2}{n+1} \Big[\, x \widetilde{P}_n - \frac{n^2}{4n^2 - 1}\, \widetilde{P}_{n-1} \,\Big] = \frac{2n+1}{n+1}\, x P_n - \frac{n}{n+1}\, P_{n-1}.$$
That is, the Legendre polynomials satisfy the recurrence formula
$$(n+1)\, P_{n+1}(x) = (2n+1)\, x\, P_n(x) - n\, P_{n-1}(x)$$
(checked numerically in the sketch following these observations).
9. It follows from 8 that the sequence $\widehat{P}_n = \sqrt{\frac{2n+1}{2}}\, P_n$ is orthonormal on $[-1, 1]$.
10. The Legendre polynomials satisfy $(1 - x^2)\, P_n''(x) - 2x\, P_n'(x) + n(n+1)\, P_n(x) = 0$. If we set $u = (x^2 - 1)^n$, that is, if $u^{(n)} = 2^n n!\, P_n$, note that $u'(x^2 - 1) = 2nxu$. Now we apply $D^{n+1}$ to both sides of this last equation (using Leibniz's formula) and simplify:
$$u^{(n+2)}(x^2 - 1) + (n+1)\, u^{(n+1)} \cdot 2x + \frac{(n+1)n}{2}\, u^{(n)} \cdot 2 = 2n \big[\, u^{(n+1)} x + (n+1)\, u^{(n)} \,\big]$$
$$\implies (1 - x^2)\, u^{(n+2)} - 2x\, u^{(n+1)} + n(n+1)\, u^{(n)} = 0.$$
11. Through a series of exercises, similar in spirit to 10, Rivlin shows that $|P_n(x)| \le 1$ on $[-1, 1]$. See pp. 63-64 of Rivlin for details.
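Here is the numerical check of the recurrence in Observation 8 promised above (a sketch; numpy's Legendre basis is the reference):

    import numpy as np
    from numpy.polynomial import Polynomial as P, Legendre

    # Build P_n via (n+1) P_{n+1} = (2n+1) x P_n - n P_{n-1} and compare
    # against numpy's Legendre basis polynomials.
    X = P([0, 1])
    Ps = [P([1]), X]
    for n in range(1, 8):
        Ps.append(((2 * n + 1) * X * Ps[n] - n * Ps[n - 1]) / (n + 1))
    for n in range(9):
        assert np.allclose(Ps[n].coef, Legendre.basis(n).convert(kind=P).coef)
    print("recurrence reproduces P_0 through P_8")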
Given an orthogonal sequence, it makes sense to consider "generalized Fourier series" relative to the sequence and to find analogues of the Dirichlet kernel, Lebesgue's theorem, and so on. In the case of the Legendre polynomials we have the following:

Example. The "Fourier-Legendre" series for $f \in C[-1, 1]$ is given by $\sum_k \langle f, \widehat{P}_k \rangle \widehat{P}_k$, where
$$\widehat{P}_k = \sqrt{\frac{2k+1}{2}}\, P_k \quad \text{and} \quad \langle f, \widehat{P}_k \rangle = \int_{-1}^{1} f(x)\, \widehat{P}_k(x)\, dx.$$
The partial sum operator $S_n(f) = \sum_{k=0}^{n} \langle f, \widehat{P}_k \rangle \widehat{P}_k$ is a linear projection onto $\mathcal{P}_n$ and may be written as
$$S_n(f)(x) = \int_{-1}^{1} f(t)\, K_n(t, x)\, dt$$
where $K_n(t, x) = \sum_{k=0}^{n} \widehat{P}_k(t)\, \widehat{P}_k(x)$. (Why?)
Since the $\widehat{P}_k$'s are orthonormal, we have
$$\sum_{k=0}^{n} |\langle f, \widehat{P}_k \rangle|^2 = \|S_n(f)\|_2^2 \le \|f\|_2^2 = \sum_{k=0}^{\infty} |\langle f, \widehat{P}_k \rangle|^2$$
and so the generalized Fourier coefficients $\langle f, \widehat{P}_k \rangle$ are square summable; in particular, $\langle f, \widehat{P}_k \rangle \to 0$ as $k \to \infty$. As in the case of Fourier series, the fact that the polynomials (i.e., the span of the $\widehat{P}_k$'s) are dense in $C[a, b]$ implies that $S_n(f)$ actually converges to $f$ in the $\|\cdot\|_2$ norm. These same observations remain valid for any sequence of orthogonal polynomials. The real question remains, just as with Fourier series, whether $S_n(f)$ is a good uniform (or even pointwise) approximation to $f$.
If you're willing to swallow the fact that $|P_n(x)| \le 1$, then
$$|K_n(t, x)| \le \sum_{k=0}^{n} \sqrt{\frac{2k+1}{2}}\, \sqrt{\frac{2k+1}{2}} = \frac{1}{2} \sum_{k=0}^{n} (2k+1) = \frac{(n+1)^2}{2}.$$
Hence, $\|S_n(f)\| \le (n+1)^2\, \|f\|$. That is, the "Lebesgue numbers" for this process are at most $(n+1)^2$. The analogue of Lebesgue's theorem in this case would then read:
$$\|f - S_n(f)\| \le C n^2 E_n(f).$$
Thus, $S_n(f)$ converges uniformly to $f$ whenever $n^2 E_n(f) \to 0$, and Jackson's theorem tells us when this will happen: If $f$ is twice continuously differentiable, then the Fourier-Legendre series for $f$ converges uniformly to $f$ on $[-1, 1]$.
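As an illustration (a sketch, with an arbitrarily chosen smooth test function), one can compute the Fourier-Legendre coefficients by Gaussian quadrature and watch the uniform error decay:

    import numpy as np
    from numpy.polynomial import legendre

    f = lambda x: np.exp(x)
    x = np.linspace(-1, 1, 2001)
    nodes, wts = legendre.leggauss(80)    # accurate quadrature for the coefficients
    for n in [2, 4, 8]:
        # the coefficient of P_k in S_n(f) is (2k+1)/2 * integral of f P_k
        c = np.array([(2 * k + 1) / 2 *
                      np.sum(wts * f(nodes) * legendre.Legendre.basis(k)(nodes))
                      for k in range(n + 1)])
        Snf = legendre.Legendre(c)(x)
        print(n, np.max(np.abs(f(x) - Snf)))    # uniform error; drops rapidly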
The Christoffel-Darboux Identity
It would also be of interest to have a closed form for $K_n(t, x)$. That this is indeed always possible, for any sequence of orthogonal polynomials, is a very important fact.
Using our original notation, let $(Q_n)$ be the sequence of monic orthogonal polynomials corresponding to a given weight $w$, and let $(\widehat{Q}_n)$ be the orthonormal counterpart of $(Q_n)$; in other words, $Q_n = \lambda_n \widehat{Q}_n$, where $\lambda_n = \sqrt{\langle Q_n, Q_n \rangle}$. It will help things here if you recall (from Observation 1 on page 112) that $\lambda_n^2 = b_n \lambda_{n-1}^2$.
As with the Legendre polynomials, each $f \in C[a, b]$ is represented by the generalized Fourier series $\sum_k \langle f, \widehat{Q}_k \rangle \widehat{Q}_k$, with partial sum operator
$$S_n(f)(x) = \int_a^b f(t)\, K_n(t, x)\, w(t)\, dt$$
where $K_n(t, x) = \sum_{k=0}^{n} \widehat{Q}_k(t)\, \widehat{Q}_k(x)$. As before, $S_n$ is a projection onto $\mathcal{P}_n$; in particular, $S_n(1) = 1$ for every $n$.

Theorem. (Christoffel-Darboux) The kernel $K_n(t, x)$ can be written
$$\sum_{k=0}^{n} \widehat{Q}_k(t)\, \widehat{Q}_k(x) = \frac{\lambda_{n+1}}{\lambda_n} \cdot \frac{\widehat{Q}_{n+1}(t)\, \widehat{Q}_n(x) - \widehat{Q}_n(t)\, \widehat{Q}_{n+1}(x)}{t - x}.$$

Proof. We begin with the standard recurrence formulas
$$Q_{n+1}(t) = (t - a_n)\, Q_n(t) - b_n Q_{n-1}(t)$$
$$Q_{n+1}(x) = (x - a_n)\, Q_n(x) - b_n Q_{n-1}(x)$$
(where $b_0 = 0$). Multiplying the first by $Q_n(x)$, the second by $Q_n(t)$, and subtracting:
$$Q_{n+1}(t)\, Q_n(x) - Q_n(t)\, Q_{n+1}(x) = (t - x)\, Q_n(t)\, Q_n(x) + b_n \big[\, Q_n(t)\, Q_{n-1}(x) - Q_n(x)\, Q_{n-1}(t) \,\big]$$
(and again, $b_0 = 0$). If we divide both sides of this equation by $\lambda_n^2$ we get
$$\lambda_n^{-2} \big[\, Q_{n+1}(t)\, Q_n(x) - Q_n(t)\, Q_{n+1}(x) \,\big] = (t - x)\, \widehat{Q}_n(t)\, \widehat{Q}_n(x) + \lambda_{n-1}^{-2} \big[\, Q_n(t)\, Q_{n-1}(x) - Q_n(x)\, Q_{n-1}(t) \,\big].$$
Thus, we may repeat the process, arriving finally at
$$\lambda_n^{-2} \big[\, Q_{n+1}(t)\, Q_n(x) - Q_n(t)\, Q_{n+1}(x) \,\big] = (t - x) \sum_{k=0}^{n} \widehat{Q}_k(t)\, \widehat{Q}_k(x).$$
The Christoffel-Darboux identity now follows by writing $Q_n = \lambda_n \widehat{Q}_n$, etc.
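A quick numerical check of the identity (a sketch for the Legendre case, where $\lambda_{n+1}/\lambda_n = \sqrt{b_{n+1}}$ with $b_n = n^2/(4n^2-1)$ from Observation 8; the evaluation points are arbitrary):

    import numpy as np
    from numpy.polynomial.legendre import Legendre

    def Phat(k):
        # orthonormal Legendre polynomial sqrt((2k+1)/2) P_k
        return Legendre.basis(k) * np.sqrt((2 * k + 1) / 2)

    n, t, x = 6, 0.37, -0.52
    lhs = sum(Phat(k)(t) * Phat(k)(x) for k in range(n + 1))
    ratio = np.sqrt((n + 1) ** 2 / (4.0 * (n + 1) ** 2 - 1))   # lambda_{n+1}/lambda_n
    rhs = ratio * (Phat(n + 1)(t) * Phat(n)(x) - Phat(n)(t) * Phat(n + 1)(x)) / (t - x)
    print(lhs, rhs)    # the two values agree to rounding error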
And we now have a version of the Dini-Lipschitz theorem:

Theorem. Let $f \in C[a, b]$ and suppose that at some point $x_0$ in $[a, b]$ we have
(i) $f$ is Lipschitz at $x_0$; that is, $|f(x_0) - f(x)| \le K |x_0 - x|$ for some constant $K$ and all $x$ in $[a, b]$; and
(ii) the sequence $(\widehat{Q}_n(x_0))$ is bounded.
Then, the series $\sum_k \langle f, \widehat{Q}_k \rangle \widehat{Q}_k(x_0)$ converges to $f(x_0)$.

Proof. First note that the sequence $\lambda_{n+1}\lambda_n^{-1}$ is bounded: Indeed, by Cauchy-Schwarz,
$$\lambda_{n+1}^2 = \langle Q_{n+1}, Q_{n+1} \rangle = \langle Q_{n+1}, x\, Q_n \rangle \le \|Q_{n+1}\|_2 \cdot \|x\| \cdot \|Q_n\|_2 = \max\{|a|, |b|\}\, \lambda_{n+1} \lambda_n.$$
Thus, $\lambda_{n+1}\lambda_n^{-1} \le c = \max\{|a|, |b|\}$. Now, using the Christoffel-Darboux identity,
$$\begin{aligned}
S_n(f)(x_0) - f(x_0) &= \int_a^b \big[\, f(t) - f(x_0) \,\big]\, K_n(t, x_0)\, w(t)\, dt \\
&= \frac{\lambda_{n+1}}{\lambda_n} \int_a^b \frac{f(t) - f(x_0)}{t - x_0} \big[\, \widehat{Q}_{n+1}(t)\, \widehat{Q}_n(x_0) - \widehat{Q}_n(t)\, \widehat{Q}_{n+1}(x_0) \,\big]\, w(t)\, dt \\
&= \frac{\lambda_{n+1}}{\lambda_n} \Big[\, \langle h, \widehat{Q}_{n+1} \rangle\, \widehat{Q}_n(x_0) - \langle h, \widehat{Q}_n \rangle\, \widehat{Q}_{n+1}(x_0) \,\Big]
\end{aligned}$$
where $h(t) = (f(t) - f(x_0))/(t - x_0)$. But $h$ is bounded (and continuous everywhere except, possibly, at $x_0$) by hypothesis (i), $\lambda_{n+1}\lambda_n^{-1}$ is bounded, and $\widehat{Q}_n(x_0)$ is bounded by hypothesis (ii). All that remains is to notice that the numbers $\langle h, \widehat{Q}_n \rangle$ are the generalized Fourier coefficients of the bounded, Riemann integrable function $h$, and so must tend to zero (since, in fact, they're even square summable).
We end this section with a negative result, due to Nikolaev:

Theorem. There is no weight $w$ such that every $f \in C[a, b]$ has a uniformly convergent expansion in terms of orthogonal polynomials. In fact, given any $w$, there is always some $f$ for which $\|f - S_n(f)\|$ is unbounded.
Math 682 Problem Set: Orthogonal Polynomials 6/17/98

Throughout, $w$ denotes a fixed, positive (except possibly at finitely many points), Riemann integrable weight function on $[a, b]$, and we consider the inner product on $C[a, b]$ defined by
$$\langle f, g \rangle = \int_a^b f(x)\, g(x)\, w(x)\, dx$$
and the associated (strictly convex) norm
$$\|f\|_2 = \sqrt{\langle f, f \rangle} = \Big( \int_a^b |f(x)|^2\, w(x)\, dx \Big)^{1/2}.$$

62. Prove that every inner product norm is strictly convex. Specifically, let $\langle \cdot, \cdot \rangle$ be an inner product on a vector space $X$, and let $\|x\| = \sqrt{\langle x, x \rangle}$ be the associated norm. Show that:
(a) $\|x + y\|^2 + \|x - y\|^2 = 2\,(\|x\|^2 + \|y\|^2)$ for all $x$, $y \in X$ (the parallelogram identity).
(b) If $\|x\| = r = \|y\|$ and if $\|x - y\| = \delta$, then $\big\| \frac{x+y}{2} \big\|^2 = r^2 - (\delta/2)^2$. In particular, $\big\| \frac{x+y}{2} \big\| < r$ whenever $x \ne y$.
We define a sequence of polynomials $(Q_n)$ which are mutually orthogonal, relative to $w$, by setting $Q_0(x) = 1$, $Q_1(x) = x - a_0$, and
$$Q_{n+1}(x) = (x - a_n)\, Q_n(x) - b_n\, Q_{n-1}(x) \quad \text{for } n \ge 1, \text{ where}$$
$$a_n = \langle x\,Q_n, Q_n \rangle \big/ \langle Q_n, Q_n \rangle \quad \text{and} \quad b_n = \langle x\,Q_n, Q_{n-1} \rangle \big/ \langle Q_{n-1}, Q_{n-1} \rangle$$
(and where $x\,Q_n$ is shorthand for the polynomial $x\,Q_n(x)$).

63. Check that $Q_n$ is a monic polynomial of degree exactly $n$.
64. If $(P_n)$ is another sequence of orthogonal polynomials such that $P_n$ has degree exactly $n$, for each $n$, show that $P_n = \lambda_n Q_n$ for some $\lambda_n \ne 0$. In particular, if $P_n$ is a monic polynomial, then $P_n = Q_n$. [Hint: Choose $\lambda_n$ so that $P_n - \lambda_n Q_n \in \mathcal{P}_{n-1}$ and note that $(P_n - \lambda_n Q_n) \perp \mathcal{P}_{n-1}$. Conclude that $P_n - \lambda_n Q_n = 0$.]
65. Check that $\langle x\,Q_n, Q_{n-1} \rangle = \langle Q_n, Q_n \rangle$, and conclude that $b_n > 0$ for each $n$.
66. Given $f \in C[a, b]$ and $n \ge 0$, prove that $q_n^* \in \mathcal{P}_n$ is the least-squares approximation to $f$ out of $\mathcal{P}_n$ (with respect to $w$) if and only if
$$\langle f - q_n^*, p \rangle = \int_a^b \big( f(x) - q_n^*(x) \big)\, p(x)\, w(x)\, dx = 0$$
for every $p \in \mathcal{P}_n$; that is, if and only if $(f - q_n^*) \perp \mathcal{P}_n$.
67. If $f \in C[a, b]$ but $f \notin \mathcal{P}_n$, show that $f - q_n^*$ changes sign at $n + 1$ (or more) points in $(a, b)$. [Hint: If not, show that there is a polynomial $p \in \mathcal{P}_n$ such that $(f - q_n^*)\, p \ge 0$ (but $(f - q_n^*)\, p \not\equiv 0$) in $(a, b)$. Now appeal to the result in problem 66 to arrive at a contradiction.]
68. Show that the least-squares approximation to $f(x) = x^n$ out of $\mathcal{P}_{n-1}$ (relative to $w$) is $q_{n-1}^*(x) = x^n - Q_n(x)$.
69. Show that $Q_n$ has $n$ distinct, simple zeros in $(a, b)$. [Hint: Combine 67 and 68.]
70. Given $f \in C[a, b]$, let $p_n$ denote the best uniform approximation to $f$ out of $\mathcal{P}_n$ and let $q_n$ denote the least-squares approximation to $f$ out of $\mathcal{P}_n$. Show that $\|f - q_n\|_2 \le \|f - p_n\|_2$ and conclude that $\|f - q_n\|_2 \to 0$ as $n \to \infty$.
71. Show that the Chebyshev polynomials of the first kind, $(T_n)$, and of the second kind, $(U_n)$, satisfy the identities
$$T_n(x) = U_n(x) - x\, U_{n-1}(x) \quad \text{and} \quad (1 - x^2)\, U_{n-1}(x) = x\, T_n(x) - T_{n+1}(x).$$
72. Show that the Chebyshev polynomials of the second kind, $(U_n)$, satisfy the recurrence relation
$$U_{n+1}(x) = 2x\, U_n(x) - U_{n-1}(x), \quad n \ge 1,$$
where $U_0(x) = 1$ and $U_1(x) = 2x$. [Please compare this with the recurrence relation satisfied by the $T_n$'s!]
Math 682 Gaussian Quadrature 6/23/98

Numerical integration, or quadrature, is the process of approximating the value of a definite integral $\int_a^b f(x)\, w(x)\, dx$ based only on a finite number of values, or "samples," of $f$ (much like a Riemann sum). A linear quadrature formula takes the form
$$\int_a^b f(x)\, w(x)\, dx \approx \sum_{k=1}^{n} A_k\, f(x_k)$$
where the nodes $(x_k)$ and the weights $(A_k)$ are at our disposal. (Note that both sides of the formula are linear in $f$.)

Example. Consider the quadrature formula
$$I(f) = \int_{-1}^{1} f(x)\, dx \approx \frac{1}{n} \sum_{k=-n}^{n-1} f\Big( \frac{2k+1}{2n} \Big) = I_n(f).$$
If $f$ is continuous, then we clearly have $I_n(f) \to \int_{-1}^{1} f$ as $n \to \infty$. (Why?) But in the particular case $f(x) = x^2$ we have (after some simplification)
$$I_n(f) = \frac{1}{n} \sum_{k=-n}^{n-1} \Big( \frac{2k+1}{2n} \Big)^2 = \frac{1}{2n^3} \sum_{k=0}^{n-1} (2k+1)^2 = \frac{2}{3} - \frac{1}{6n^2}.$$
That is, $|I_n(f) - I(f)| = 1/6n^2$. In particular, we would need to take $n \ge 130$ to get $1/6n^2 \le 10^{-5}$, for example, and this would require that we perform over 250 evaluations of $f$. We'd like a method that converges a bit faster! In other words, there's no shortage of quadrature formulas; we just want faster ones.
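In code (a quick sketch of the example above):

    import numpy as np

    def I_n(f, n):
        # midpoint-style rule on [-1, 1] with 2n equally spaced samples
        k = np.arange(-n, n)
        return np.sum(f((2 * k + 1) / (2 * n))) / n

    for n in [10, 100, 130]:
        print(n, abs(I_n(np.square, n) - 2 / 3), 1 / (6 * n**2))
    # the two error columns agree, confirming |I_n(f) - I(f)| = 1/(6n^2)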
One reasonable requirement for our proposed quadrature formula is that it be exact for polynomials of low degree. As it happens, this is easy to come by.

Lemma 1. Given $w(x)$ on $[a, b]$ and nodes $a \le x_1 < \cdots < x_n \le b$, there exist unique weights $A_1, \ldots, A_n$ such that
$$\int_a^b p(x)\, w(x)\, dx = \sum_{i=1}^{n} A_i\, p(x_i)$$
for all polynomials $p \in \mathcal{P}_{n-1}$.

Proof. Let $\ell_1, \ldots, \ell_n$ be the Lagrange interpolating polynomials of degree $n - 1$ associated to the nodes $x_1, \ldots, x_n$, and recall that we have $p = \sum_{i=1}^{n} p(x_i)\, \ell_i$ for all $p \in \mathcal{P}_{n-1}$. Hence,
$$\int_a^b p(x)\, w(x)\, dx = \sum_{i=1}^{n} p(x_i) \int_a^b \ell_i(x)\, w(x)\, dx.$$
That is, $A_i = \int_a^b \ell_i(x)\, w(x)\, dx$ works. To see that this is the only choice, suppose that
$$\int_a^b p(x)\, w(x)\, dx = \sum_{i=1}^{n} B_i\, p(x_i)$$
is exact for all $p \in \mathcal{P}_{n-1}$, and set $p = \ell_j$:
$$A_j = \int_a^b \ell_j(x)\, w(x)\, dx = \sum_{i=1}^{n} B_i\, \ell_j(x_i) = B_j.$$

The point here is that $\ell_1, \ldots, \ell_n$ form a basis for $\mathcal{P}_{n-1}$ and integration is linear; thus, integration is completely determined by its action on the basis, that is, by the $n$ values $A_i = I(\ell_i)$, $i = 1, \ldots, n$.
Said another way, the $n$ point evaluations $\delta_i(p) = p(x_i)$ satisfy $\mathcal{P}_{n-1} \cap \big( \bigcap_{i=1}^{n} \ker \delta_i \big) = \{0\}$, and it follows that every linear, real-valued function on $\mathcal{P}_{n-1}$ must be a linear combination of the $\delta_i$'s. Here's why: Since the $x_i$'s are distinct, $\mathcal{P}_{n-1}$ may be identified with $\mathbb{R}^n$ by way of the isomorphism $p \mapsto (p(x_1), \ldots, p(x_n))$. A linear, real-valued function on $\mathcal{P}_{n-1}$ must, then, correspond to some linear, real-valued function on $\mathbb{R}^n$. In other words, it's given by the inner product against some fixed vector $(A_1, \ldots, A_n)$; in particular, we must have $I(p) = \sum_{i=1}^{n} A_i\, p(x_i)$.
In any case, we now have our quadrature formula: For $f \in C[a, b]$ we define $I_n(f) = \sum_{i=1}^{n} A_i\, f(x_i)$, where $A_i = \int_a^b \ell_i(x)\, w(x)\, dx$. But notice that the proof of our last result suggests an alternate way of writing our quadrature formula. Indeed, if $L_{n-1}(f)(x) = \sum_{i=1}^{n} f(x_i)\, \ell_i(x)$ is the Lagrange interpolating polynomial for $f$ of degree $n - 1$ based on the nodes $x_1, \ldots, x_n$, then
$$\int_a^b (L_{n-1}(f))(x)\, w(x)\, dx = \sum_{i=1}^{n} f(x_i) \int_a^b \ell_i(x)\, w(x)\, dx = \sum_{i=1}^{n} A_i\, f(x_i).$$
In summary, $I_n(f) = I(L_{n-1}(f)) \approx I(f)$; that is,
$$I_n(f) = \sum_{i=1}^{n} A_i\, f(x_i) = \int_a^b (L_{n-1}(f))(x)\, w(x)\, dx \approx \int_a^b f(x)\, w(x)\, dx = I(f)$$
where $L_{n-1}(f)$ is the Lagrange interpolating polynomial of degree $n - 1$ based on the nodes $x_1, \ldots, x_n$. This formula is obviously exact for $f \in \mathcal{P}_{n-1}$.
It's easy to give a bound on $|I_n(f)|$ in terms of $\|f\|$; indeed,
$$|I_n(f)| \le \sum_{i=1}^{n} |A_i|\, |f(x_i)| \le \|f\| \Big( \sum_{i=1}^{n} |A_i| \Big).$$
By considering a norm one continuous function $f$ satisfying $f(x_i) = \operatorname{sgn} A_i$ for each $i = 1, \ldots, n$, it's easy to see that $\sum_{i=1}^{n} |A_i|$ is the smallest constant that works in this inequality. In other words, the numbers $\sum_{i=1}^{n} |A_i|$, $n = 1, 2, \ldots$, are the "Lebesgue numbers" for this process. As in all previous settings, we want these numbers to be uniformly bounded.
If $w(x) \equiv 1$ and if $f$ is $n$-times continuously differentiable, we even have an error estimate for our quadrature formula:
$$\Big| \int_a^b f - \int_a^b L_{n-1}(f) \Big| \le \int_a^b |f - L_{n-1}(f)| \le \frac{1}{n!}\, \|f^{(n)}\| \int_a^b \prod_{i=1}^{n} |x - x_i|\, dx$$
(recall the Theorem on page 72 of "A Brief Introduction to Interpolation"). As it happens, the integral on the right is minimized when the $x_i$'s are taken to be the zeros of the Chebyshev polynomial $U_n$ (see Rivlin, page 72).
The fact that a quadrature formula is exact for polynomials of low degree does not by itself guarantee that the formula is highly accurate. The problem is that $\sum_{i=1}^{n} A_i\, f(x_i)$ may be estimating a very small quantity through the cancellation of very large quantities. So, for example, a positive function may yield a negative result in this approximate integral. This wouldn't happen if the $A_i$'s were all positive, and we've already seen how useful positivity can be. Our goal here is to further improve our quadrature formula to have this property. But we have yet to take advantage of the fact that the $x_i$'s are at our disposal. We'll let Gauss show us the way!
Theorem. (Gauss) Fix a weight $w(x)$ on $[a, b]$, and let $(Q_n)$ be the canonical sequence of orthogonal polynomials relative to $w$. Given $n$, let $x_1, \ldots, x_n$ be the zeros of $Q_n$ (these all lie in $(a, b)$), and choose $A_1, \ldots, A_n$ so that the formula $\sum_{i=1}^{n} A_i\, f(x_i) \approx \int_a^b f(x)\, w(x)\, dx$ is exact for polynomials of degree less than $n$. Then, in fact, the formula is exact for all polynomials of degree less than $2n$.

Proof. Given a polynomial $P$ of degree less than $2n$, we may divide: $P = Q_n R + S$, where $R$ and $S$ are polynomials of degree less than $n$. Then,
$$\begin{aligned}
\int_a^b P(x)\, w(x)\, dx &= \int_a^b Q_n(x)\, R(x)\, w(x)\, dx + \int_a^b S(x)\, w(x)\, dx \\
&= \int_a^b S(x)\, w(x)\, dx \qquad \text{since } \deg R < n \\
&= \sum_{i=1}^{n} A_i\, S(x_i) \qquad \text{since } \deg S < n.
\end{aligned}$$
But $P(x_i) = Q_n(x_i)\, R(x_i) + S(x_i) = S(x_i)$, since $Q_n(x_i) = 0$. Hence, $\int_a^b P(x)\, w(x)\, dx = \sum_{i=1}^{n} A_i\, P(x_i)$ for all polynomials $P$ of degree less than $2n$.

Amazing! But, well, not really: $\mathcal{P}_{2n-1}$ is of dimension $2n$, and we had $2n$ numbers $x_1, \ldots, x_n$ and $A_1, \ldots, A_n$ to choose as we saw fit. Said another way, the division algorithm tells us that $\mathcal{P}_{2n-1} = Q_n \mathcal{P}_{n-1} \oplus \mathcal{P}_{n-1}$. Since $Q_n \mathcal{P}_{n-1} \subset \ker(I_n)$, the action of $I_n$ on $\mathcal{P}_{2n-1}$ is the same as its action on a "copy" of $\mathcal{P}_{n-1}$.
In still other words, since any polynomial that vanishes at all the $x_i$'s must be divisible by $Q_n$ (and conversely), we have $Q_n \mathcal{P}_{n-1} = \mathcal{P}_{2n-1} \cap \big( \bigcap_{i=1}^{n} \ker \delta_i \big) = \ker(I_n|_{\mathcal{P}_{2n-1}})$. Thus, $I_n$ "factors through" the quotient space $\mathcal{P}_{2n-1}/Q_n \mathcal{P}_{n-1} \cong \mathcal{P}_{n-1}$.
Also not surprising is that this particular choice of $x_i$'s is unique.

Lemma 2. Suppose that $a \le x_1 < \cdots < x_n \le b$ and $A_1, \ldots, A_n$ are given so that the equation $\int_a^b P(x)\, w(x)\, dx = \sum_{i=1}^{n} A_i\, P(x_i)$ is satisfied for all polynomials $P$ of degree less than $2n$. Then, $x_1, \ldots, x_n$ are the zeros of $Q_n$.

Proof. Let $Q(x) = \prod_{i=1}^{n} (x - x_i)$. Then, for $k < n$, the polynomial $Q \cdot Q_k$ has degree $n + k < 2n$. Hence,
$$\int_a^b Q(x)\, Q_k(x)\, w(x)\, dx = \sum_{i=1}^{n} A_i\, Q(x_i)\, Q_k(x_i) = 0.$$
Since $Q$ is a monic polynomial of degree $n$ which is orthogonal to each $Q_k$, $k < n$, we must have $Q = Q_n$. Thus, the $x_i$'s are actually the zeros of $Q_n$.

According to Rivlin, the phrase Gaussian quadrature is usually reserved for the specific quadrature formula whereby $\int_{-1}^{1} f(x)\, dx$ is approximated by $\int_{-1}^{1} (L_{n-1}(f))(x)\, dx$, where $L_{n-1}(f)$ is the Lagrange interpolating polynomial to $f$ using the zeros of the $n$-th Legendre polynomial as nodes. (What a mouthful!) What is actually being described in our version of Gauss's theorem is Gaussian-type quadrature.
Before computers, Gaussian quadrature was little more than a curiosity; the roots of $Q_n$ are typically irrational, and certainly not easy to come by. By now, though, it's considered a standard quadrature technique. In any case, we still can't judge the quality of Gauss's method without a bit more information.

Gaussian-type Quadrature
First, let's summarize our rather cumbersome notation.

    orthogonal polynomial    zeros                              weights                            approximate integral
    Q_1                      x_1^(1)                            A_1^(1)                            I_1
    Q_2                      x_1^(2), x_2^(2)                   A_1^(2), A_2^(2)                   I_2
    Q_3                      x_1^(3), x_2^(3), x_3^(3)          A_1^(3), A_2^(3), A_3^(3)          I_3
    ...                      ...                                ...                                ...

Hidden here is the Lagrange interpolation formula $L_{n-1}(f) = \sum_{i=1}^{n} f(x_i^{(n)})\, \ell_i^{(n-1)}$, where the $\ell_i^{(n-1)}$ denote the Lagrange polynomials of degree $n - 1$ based on $x_1^{(n)}, \ldots, x_n^{(n)}$. The $n$-th quadrature formula is then
$$I_n(f) = \int_a^b L_{n-1}(f)(x)\, w(x)\, dx = \sum_{i=1}^{n} A_i^{(n)}\, f(x_i^{(n)}) \approx \int_a^b f(x)\, w(x)\, dx,$$
which is exact for polynomials of degree less than $2n$.
By way of one example, Hermite showed that $A_k^{(n)} = \pi/n$ for the Chebyshev weight $w(x) = (1 - x^2)^{-1/2}$ on $[-1, 1]$. Remarkably, $A_k^{(n)}$ doesn't depend on $k$! The quadrature formula in this case reads:
$$\int_{-1}^{1} \frac{f(x)}{\sqrt{1 - x^2}}\, dx \approx \frac{\pi}{n} \sum_{k=1}^{n} f\Big( \cos\frac{(2k-1)\pi}{2n} \Big).$$
Or, if you prefer,
$$\int_{-1}^{1} f(x)\, dx \approx \frac{\pi}{n} \sum_{k=1}^{n} f\Big( \cos\frac{(2k-1)\pi}{2n} \Big) \sin\frac{(2k-1)\pi}{2n}.$$
(Why?) You can find full details in Natanson's Constructive Function Theory, Vol. III.
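A sketch of the Chebyshev-Gauss rule in code (the test integrand is chosen arbitrarily):

    import numpy as np

    def cheb_gauss(f, n):
        # nodes cos((2k-1)pi/(2n)), the zeros of T_n, with equal weights pi/n
        k = np.arange(1, n + 1)
        x = np.cos((2 * k - 1) * np.pi / (2 * n))
        return np.pi / n * np.sum(f(x))

    # integral of x^4 / sqrt(1 - x^2) over [-1, 1] equals 3 pi / 8
    print(cheb_gauss(lambda x: x**4, 5), 3 * np.pi / 8)   # agree to machine precision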
The key result, due to Stieltjes, is that $I_n$ is positive:

Lemma 3. $A_1^{(n)}, \ldots, A_n^{(n)} > 0$ and $\sum_{i=1}^{n} A_i^{(n)} = \int_a^b w(x)\, dx$.

Proof. The second assertion is obvious (since $I_n(1) = I(1)$). For the first, fix $1 \le j \le n$ and notice that $\big( \ell_j^{(n-1)} \big)^2$ is of degree $2(n-1) < 2n$. Thus,
$$0 < \langle \ell_j^{(n-1)}, \ell_j^{(n-1)} \rangle = \int_a^b \big[ \ell_j^{(n-1)}(x) \big]^2\, w(x)\, dx = \sum_{i=1}^{n} A_i^{(n)} \big[ \ell_j^{(n-1)}(x_i^{(n)}) \big]^2 = A_j^{(n)}$$
because $\ell_j^{(n-1)}(x_i^{(n)}) = \delta_{ij}$.

Now our last calculation is quite curious; what we've shown is that
$$A_j^{(n)} = \int_a^b \ell_j^{(n-1)}(x)\, w(x)\, dx = \int_a^b \big[ \ell_j^{(n-1)}(x) \big]^2\, w(x)\, dx.$$
The same calculation as above also proves

Corollary. $\langle \ell_i^{(n-1)}, \ell_j^{(n-1)} \rangle = 0$ for $i \ne j$.
Since $A_1^{(n)}, \ldots, A_n^{(n)} > 0$, it follows that $I_n$ is positive; that is, $I_n(f) \ge 0$ whenever $f \ge 0$. The second assertion in Lemma 3 tells us that the $I_n$'s are uniformly bounded:
$$|I_n(f)| \le \|f\| \sum_{i=1}^{n} A_i^{(n)} = \|f\| \int_a^b w(x)\, dx$$
and this is the same bound that holds for $I(f) = \int_a^b f(x)\, w(x)\, dx$ itself. Given all of this, proving that $I_n(f) \to I(f)$ is a piece of cake. The following result is again due to Stieltjes (à la Lebesgue).
Theorem. In the above notation, $|I_n(f) - I(f)| \le 2 \big( \int_a^b w(x)\, dx \big)\, E_{2n-1}(f)$. In particular, $I_n(f) \to I(f)$ for every $f \in C[a, b]$.

Proof. Let $p^*$ be the best uniform approximation to $f$ out of $\mathcal{P}_{2n-1}$. Then, since $I_n(p^*) = I(p^*)$, we have
$$\begin{aligned}
|I(f) - I_n(f)| &\le |I(f - p^*)| + |I_n(f - p^*)| \\
&\le \|f - p^*\| \int_a^b w(x)\, dx + \|f - p^*\| \sum_{i=1}^{n} A_i^{(n)} \\
&= 2 \|f - p^*\| \int_a^b w(x)\, dx = 2 E_{2n-1}(f) \int_a^b w(x)\, dx.
\end{aligned}$$
Computational Considerations
You've probably been asking yourself: "How do I find the $A_i$'s without integrating?" Well, first let's recall the definition: In the case of Gaussian-type quadrature we have
$$A_i^{(n)} = \int_a^b \ell_i^{(n-1)}(x)\, w(x)\, dx = \int_a^b \frac{Q_n(x)}{(x - x_i^{(n)})\, Q_n'(x_i^{(n)})}\, w(x)\, dx$$
(because "$W$" is the same as $Q_n$ here; the $x_i$'s are the zeros of $Q_n$). Next, consider the function
$$\varphi_n(x) = \int_a^b \frac{Q_n(t) - Q_n(x)}{t - x}\, w(t)\, dt.$$
Since $t - x$ divides $Q_n(t) - Q_n(x)$, note that $\varphi_n$ is actually a polynomial (of degree at most $n - 1$) and that
$$\varphi_n(x_i^{(n)}) = \int_a^b \frac{Q_n(t)}{t - x_i^{(n)}}\, w(t)\, dt = A_i^{(n)}\, Q_n'(x_i^{(n)}).$$
Now $Q_n'(x_i^{(n)})$ is readily available; we just need to compute $\varphi_n(x_i^{(n)})$.

Claim. The $\varphi_n$'s satisfy the same recurrence formula as the $Q_n$'s,
$$\varphi_{n+1}(x) = (x - a_n)\, \varphi_n(x) - b_n\, \varphi_{n-1}(x), \quad n \ge 1,$$
but with different starting values:
$$\varphi_0(x) \equiv 0 \quad \text{and} \quad \varphi_1(x) \equiv \int_a^b w(x)\, dx.$$

Proof. The formulas for $\varphi_0$ and $\varphi_1$ are obviously correct, since $Q_0(x) \equiv 1$ and $Q_1(x) = x - a_0$. We only need to check the recurrence formula itself:
$$\begin{aligned}
\varphi_{n+1}(x) &= \int_a^b \frac{Q_{n+1}(t) - Q_{n+1}(x)}{t - x}\, w(t)\, dt \\
&= \int_a^b \frac{(t - a_n)\, Q_n(t) - b_n Q_{n-1}(t) - (x - a_n)\, Q_n(x) + b_n Q_{n-1}(x)}{t - x}\, w(t)\, dt \\
&= (x - a_n) \int_a^b \frac{Q_n(t) - Q_n(x)}{t - x}\, w(t)\, dt + \int_a^b Q_n(t)\, w(t)\, dt - b_n \int_a^b \frac{Q_{n-1}(t) - Q_{n-1}(x)}{t - x}\, w(t)\, dt \\
&= (x - a_n)\, \varphi_n(x) - b_n\, \varphi_{n-1}(x),
\end{aligned}$$
where we've used $(t - a_n)\, Q_n(t) - (x - a_n)\, Q_n(x) = (x - a_n)\big[ Q_n(t) - Q_n(x) \big] + (t - x)\, Q_n(t)$ and the fact that $\int_a^b Q_n(t)\, w(t)\, dt = 0$.

Of course, the derivatives $Q_n'$ satisfy a recurrence relation of sorts, too:
$$Q_{n+1}'(x) = Q_n(x) + (x - a_n)\, Q_n'(x) - b_n\, Q_{n-1}'(x).$$
But $Q_n'(x_i^{(n)})$ can be computed without knowing $Q_n'(x)$. Indeed, $Q_n(x) = \prod_{i=1}^{n} (x - x_i^{(n)})$, so we have $Q_n'(x_i^{(n)}) = \prod_{j \ne i} (x_i^{(n)} - x_j^{(n)})$.
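Assembled into code for the Legendre case (a sketch; $a_n = 0$ and $b_n = n^2/(4n^2-1)$ as computed earlier, with numpy's leggauss as an independent reference):

    import numpy as np
    from numpy.polynomial import Polynomial as P
    from numpy.polynomial.legendre import leggauss

    n = 5
    bn = lambda k: k**2 / (4.0 * k**2 - 1.0)   # Legendre recurrence; a_k = 0

    # Q_k and phi_k share the recurrence; phi_0 = 0, phi_1 = integral of w = 2
    X = P([0, 1])
    Q, phi = [P([1]), X], [P([0.0]), P([2.0])]
    for k in range(1, n):
        Q.append(X * Q[k] - bn(k) * Q[k - 1])
        phi.append(X * phi[k] - bn(k) * phi[k - 1])

    xs = np.sort(Q[n].roots().real)            # the nodes: zeros of Q_n
    A = phi[n](xs) / Q[n].deriv()(xs)          # A_i = phi_n(x_i) / Q_n'(x_i)
    xref, Aref = leggauss(n)
    print(np.allclose(xs, xref), np.allclose(A, Aref))   # True True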
The weights $A_i^{(n)}$, or Christoffel numbers, together with the zeros of $Q_n$, are tabulated in a variety of standard cases. See, for example, Handbook of Mathematical Functions with Formulas, Graphs, and Tables, by Abramowitz and Stegun, eds. In practice, of course, it's enough to tabulate data for the case $[a, b] = [-1, 1]$.
Applications to Interpolation
Although $L_n(f)$ isn't typically a good uniform approximation to $f$, if we interpolate at the zeros of an orthogonal polynomial $Q_{n+1}$, then $L_n(f)$ will be a good approximation in the $\|\cdot\|_1$ or $\|\cdot\|_2$ norm generated by the corresponding weight $w$. Specifically, by rewording our earlier results, it's easy to get estimates for each of the errors $\int_a^b |f - L_n(f)|\, w$ and $\int_a^b |f - L_n(f)|^2\, w$. We use essentially the same notation as before, except now we take
$$L_n(f) = \sum_{i=1}^{n+1} f\big( x_i^{(n+1)} \big)\, \ell_i^{(n)}$$
where $x_1^{(n+1)}, \ldots, x_{n+1}^{(n+1)}$ are the roots of $Q_{n+1}$ and $\ell_i^{(n)}$ is of degree $n$. This leads to a quadrature formula that's exact on polynomials of degree less than $2(n+1)$.
As we've already seen, $\ell_1^{(n)}, \ldots, \ell_{n+1}^{(n)}$ are orthogonal, and so $\|L_n(f)\|_2$ may be computed exactly.

Lemma. $\|L_n(f)\|_2 \le \|f\| \big( \int_a^b w(x)\, dx \big)^{1/2}$.

Proof. Since $L_n(f)^2$ is a polynomial of degree $\le 2n < 2(n+1)$, we have
$$\begin{aligned}
\|L_n(f)\|_2^2 &= \int_a^b [L_n(f)]^2\, w(x)\, dx \\
&= \sum_{j=1}^{n+1} A_j^{(n+1)} \Big[ \sum_{i=1}^{n+1} f\big( x_i^{(n+1)} \big)\, \ell_i^{(n)}\big( x_j^{(n+1)} \big) \Big]^2 \\
&= \sum_{j=1}^{n+1} A_j^{(n+1)} \big[ f\big( x_j^{(n+1)} \big) \big]^2 \\
&\le \|f\|^2 \sum_{j=1}^{n+1} A_j^{(n+1)} = \|f\|^2 \int_a^b w(x)\, dx.
\end{aligned}$$

Please note that we also have $\|f\|_2 \le \|f\| \big( \int_a^b w(x)\, dx \big)^{1/2}$; that is, this same estimate holds for $\|f\|_2$ itself.
As usual, once we have an estimate for the norm of an operator, we also have an analogue of Lebesgue's theorem.

Theorem. $\|f - L_n(f)\|_2 \le 2 \big( \int_a^b w(x)\, dx \big)^{1/2} E_n(f)$.

Proof. Here we go again! Let $p^*$ be the best uniform approximation to $f$ out of $\mathcal{P}_n$ and use the fact that $L_n(p^*) = p^*$ to see that:
$$\begin{aligned}
\|f - L_n(f)\|_2 &\le \|f - p^*\|_2 + \|L_n(f - p^*)\|_2 \\
&\le \|f - p^*\| \Big( \int_a^b w(x)\, dx \Big)^{1/2} + \|f - p^*\| \Big( \int_a^b w(x)\, dx \Big)^{1/2} \\
&= 2 E_n(f) \Big( \int_a^b w(x)\, dx \Big)^{1/2}.
\end{aligned}$$

Hence, if we interpolate $f \in C[a, b]$ at the zeros of $(Q_n)$, then $L_n(f) \to f$ in the $\|\cdot\|_2$ norm. The analogous result for the $\|\cdot\|_1$ norm is now easy:
Corollary. $\int_a^b |f(x) - L_n(f)(x)|\, w(x)\, dx \le 2 \big( \int_a^b w(x)\, dx \big)\, E_n(f)$.

Proof. We apply the Cauchy-Schwarz inequality:
$$\begin{aligned}
\int_a^b |f(x) - L_n(f)(x)|\, w(x)\, dx &= \int_a^b |f(x) - L_n(f)(x)|\, \sqrt{w(x)}\, \sqrt{w(x)}\, dx \\
&\le \Big( \int_a^b |f(x) - L_n(f)(x)|^2\, w(x)\, dx \Big)^{1/2} \Big( \int_a^b w(x)\, dx \Big)^{1/2} \\
&\le 2 E_n(f) \int_a^b w(x)\, dx.
\end{aligned}$$
Essentially the same device allows an estimate of $\int_a^b f(x)\, dx$ in terms of $\int_a^b f(x)\, w(x)\, dx$ (which may be easier to compute).

Corollary. If $\int_a^b w(x)^{-1}\, dx$ is finite, then
$$\begin{aligned}
\int_a^b |f(x) - L_n(f)(x)|\, dx &= \int_a^b |f(x) - L_n(f)(x)|\, \sqrt{w(x)}\, \frac{1}{\sqrt{w(x)}}\, dx \\
&\le \Big( \int_a^b |f(x) - L_n(f)(x)|^2\, w(x)\, dx \Big)^{1/2} \Big( \int_a^b \frac{1}{w(x)}\, dx \Big)^{1/2} \\
&\le 2 E_n(f) \Big( \int_a^b w(x)\, dx \Big)^{1/2} \Big( \int_a^b \frac{1}{w(x)}\, dx \Big)^{1/2}.
\end{aligned}$$

In particular, the Chebyshev weight satisfies
$$\int_{-1}^{1} \frac{dx}{\sqrt{1 - x^2}} = \pi \quad \text{and} \quad \int_{-1}^{1} \sqrt{1 - x^2}\, dx = \frac{\pi}{2}.$$
Thus, interpolation at the zeros of the Chebyshev polynomials (of the first kind) would provide good, simultaneous approximation in each of the norms $\|\cdot\|_1$, $\|\cdot\|_2$, and $\|\cdot\|$.
The Moment Problem
Given a positive, continuous weight function $w(x)$ on $[a, b]$, the number
$$\mu_k = \int_a^b x^k\, w(x)\, dx$$
is called the $k$-th moment of $w$. In physical terms, if we think of $w(x)$ as the density of a thin rod placed on the interval $[a, b]$, then $\mu_0$ is the mass of the rod, $\mu_1/\mu_0$ is its center of mass, $\mu_2$ is its moment of inertia (about 0), and so on. In probabilistic terms, if $\mu_0 = 1$, then $w$ is the probability density function for some random variable, $\mu_1$ is the expected value, or mean, of this random variable, and $\mu_2 - \mu_1^2$ is its variance. The moment problem (or problems, really) concerns the inverse procedure. What can be measured in real life are the moments; can the moments be used to find the density function?

Questions: Do the moments determine $w$? Do different weights have different moment sequences? If we knew the sequence $(\mu_k)$, could we find $w$? How do we tell if a given sequence $(\mu_k)$ is the moment sequence for some positive weight? Do "special" weights give rise to "special" moment sequences?

Now we've already answered one of these questions: The Weierstrass theorem tells us that different weights have different moment sequences. Said another way, if
$$\int_a^b x^k\, w(x)\, dx = 0 \quad \text{for all } k = 0, 1, 2, \ldots,$$
then $w \equiv 0$. Indeed, by linearity, this says that $\int_a^b p(x)\, w(x)\, dx = 0$ for all polynomials $p$ which, in turn, tells us that $\int_a^b w(x)^2\, dx = 0$. (Why?) The remaining questions are harder to answer. We'll settle for simply stating a few pertinent results.
Given a sequence of numbers $(\mu_k)$, we define the $n$-th difference sequence $(\Delta^n \mu_k)$ by
$$\Delta^0 \mu_k = \mu_k, \quad \Delta^1 \mu_k = \mu_k - \mu_{k+1}, \quad \Delta^n \mu_k = \Delta^{n-1} \mu_k - \Delta^{n-1} \mu_{k+1}, \quad n \ge 1.$$
For example, $\Delta^2 \mu_k = \mu_k - 2\mu_{k+1} + \mu_{k+2}$. More generally, induction will show that
$$\Delta^n \mu_k = \sum_{i=0}^{n} (-1)^i \binom{n}{i}\, \mu_{k+i}.$$
In the case of a weight $w$ on the interval $[0, 1]$, this sum is easy to recognize as an integral. Indeed,
$$\int_0^1 x^k (1 - x)^n\, w(x)\, dx = \sum_{i=0}^{n} (-1)^i \binom{n}{i} \int_0^1 x^{k+i}\, w(x)\, dx = \sum_{i=0}^{n} (-1)^i \binom{n}{i}\, \mu_{k+i}.$$
In particular, if $w$ is nonnegative, then we must have $\Delta^n \mu_k \ge 0$ for every $n$ and $k$. This observation serves as motivation for
Theorem. The following are equivalent:
(a) $(\mu_k)$ is the moment sequence of some nonnegative weight function $w$ on $[0, 1]$.
(b) $\Delta^n \mu_k \ge 0$ for every $n$ and $k$.
(c) $a_0 \mu_0 + a_1 \mu_1 + \cdots + a_n \mu_n \ge 0$ whenever $a_0 + a_1 x + \cdots + a_n x^n \ge 0$ for all $0 \le x \le 1$.

The equivalence of (a) and (b) is due to Hausdorff. A real sequence satisfying (b) or (c) is sometimes said to be positive definite.
Now dozens of mathematicians worked on various aspects of the moment problem: Chebyshev, Markov, Stieltjes, Cauchy, Riesz, Fréchet, and on and on. And several of them, in particular Cauchy and Stieltjes, noticed the importance of the integral $\int_a^b \frac{w(t)}{x - t}\, dt$ in attacking the problem. (Compare this expression to Cauchy's integral formula.) It was Stieltjes, however, who gave the first complete solution to such a problem, developing his own integral (by considering $\int_a^b \frac{dW(t)}{x - t}$), his own variety of continued fractions, and planting the seeds for the study of orthogonal polynomials while he was at it! We will attempt to at least sketch a few of these connections.
To begin, let's fix our notation: To simplify things, we suppose that we're given a nonnegative weight $w(x)$ on a symmetric interval $[-a, a]$, and that all of the moments of $w$ are finite. We will otherwise stick to our usual notation for $(Q_n)$, the Gaussian-type quadrature formulas, and so on. Next, we consider the moment-generating function:

Lemma. If $x \notin [-a, a]$, then $\displaystyle \int_{-a}^{a} \frac{w(t)}{x - t}\, dt = \sum_{k=0}^{\infty} \frac{\mu_k}{x^{k+1}}$.

Proof. $\displaystyle \frac{1}{x - t} = \frac{1}{x} \cdot \frac{1}{1 - (t/x)} = \sum_{k=0}^{\infty} \frac{t^k}{x^{k+1}}$, and the sum converges uniformly because $|t/x| \le a/|x| < 1$. Now just multiply by $w(t)$ and integrate.
By way of an example, consider the Chebyshev weight $w(x) = (1 - x^2)^{-1/2}$ on $[-1, 1]$. For $x > 1$ we have
$$\int_{-1}^{1} \frac{dt}{(x - t)\sqrt{1 - t^2}} = \frac{\pi}{\sqrt{x^2 - 1}} \qquad \big(\text{set } t = 2u/(1 + u^2)\big)$$
$$= \frac{\pi}{x} \Big( 1 - \frac{1}{x^2} \Big)^{-1/2} = \frac{\pi}{x} \Big( 1 + \frac{1}{2} \cdot \frac{1}{x^2} + \frac{1}{2} \cdot \frac{3}{2} \cdot \frac{1}{2!} \cdot \frac{1}{x^4} + \cdots \Big)$$
using the binomial formula. Thus, we've found all the moments:
$$\mu_0 = \int_{-1}^{1} \frac{dt}{\sqrt{1 - t^2}} = \pi,$$
$$\mu_{2n-1} = \int_{-1}^{1} \frac{t^{2n-1}\, dt}{\sqrt{1 - t^2}} = 0,$$
$$\mu_{2n} = \int_{-1}^{1} \frac{t^{2n}\, dt}{\sqrt{1 - t^2}} = \pi\, \frac{1 \cdot 3 \cdot 5 \cdots (2n-1)}{2^n\, n!}.$$
Stieltjes proved much more: The integral $\int_{-a}^{a} \frac{w(t)}{x - t}\, dt$ is actually an analytic function of $x$ in $\mathbb{C} \setminus [-a, a]$. In any case, since $x \notin [-a, a]$, we know that $\frac{1}{x - t}$ is continuous on $[-a, a]$. In particular, we can apply our quadrature formulas (and Stieltjes' theorem, p. 132) to write
$$\int_{-a}^{a} \frac{w(t)}{x - t}\, dt = \lim_{n \to \infty} \sum_{i=1}^{n} \frac{A_i^{(n)}}{x - x_i^{(n)}}$$
and these sums are recognizable:

Lemma. $\displaystyle \sum_{i=1}^{n} \frac{A_i^{(n)}}{x - x_i^{(n)}} = \frac{\varphi_n(x)}{Q_n(x)}$.

Proof. Since $\varphi_n$ has degree $< n$ and $\varphi_n(x_i^{(n)}) \ne 0$ for any $i$, we may appeal to partial fractions to write
$$\frac{\varphi_n(x)}{Q_n(x)} = \frac{\varphi_n(x)}{(x - x_1^{(n)}) \cdots (x - x_n^{(n)})} = \sum_{i=1}^{n} \frac{c_i}{x - x_i^{(n)}}$$
where $c_i$ is given by
$$c_i = \frac{\varphi_n(x)}{Q_n(x)}\,(x - x_i^{(n)}) \Big|_{x = x_i^{(n)}} = \frac{\varphi_n(x_i^{(n)})}{Q_n'(x_i^{(n)})} = A_i^{(n)}.$$
Now here's where the continued fractions come in: Stieltjes recognized the fact that
$$\frac{\varphi_{n+1}(x)}{Q_{n+1}(x)} = \cfrac{b_0}{(x - a_0) - \cfrac{b_1}{(x - a_1) - \cfrac{b_2}{\ddots \, - \cfrac{b_n}{(x - a_n)}}}}$$
(which can be proved by induction), where $b_0 = \int_a^b w(t)\, dt$. More generally, induction will show that the $n$-th convergent of a continued fraction can be written as
$$\frac{A_n}{B_n} = \cfrac{p_1}{q_1 - \cfrac{p_2}{q_2 - \cfrac{p_3}{\ddots \, - \cfrac{p_n}{q_n}}}}$$
by means of the recurrence formulas
$$A_0 = 0, \quad B_0 = 1, \quad A_1 = p_1, \quad B_1 = q_1,$$
$$A_n = q_n A_{n-1} - p_n A_{n-2}, \quad B_n = q_n B_{n-1} - p_n B_{n-2},$$
where $n = 2, 3, 4, \ldots$. Please note that $A_n$ and $B_n$ satisfy the same recurrence formula, but with different starting values (as is the case with $\varphi_n$ and $Q_n$).
Again using the Chebyshev weight as an example, for $x > 1$ we have
$$\frac{\pi}{\sqrt{x^2 - 1}} = \int_{-1}^{1} \frac{dt}{(x - t)\sqrt{1 - t^2}} = \cfrac{\pi}{x - \cfrac{1/2}{x - \cfrac{1/4}{x - \ddots}}}$$
since $a_n = 0$ for all $n$, $b_1 = 1/2$, and $b_n = 1/4$ for $n \ge 2$. In other words, we've just found a continued fraction expansion for $(x^2 - 1)^{-1/2}$.
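Evaluating the convergents bottom-up shows how quickly this expansion settles down (a sketch; the helper name is ours):

    import numpy as np

    def cf_chebyshev(x, depth):
        # pi / (x - (1/2)/(x - (1/4)/(x - (1/4)/(...))))
        val = x
        for _ in range(depth - 1):
            val = x - 0.25 / val
        return np.pi / (x - 0.5 / val)

    x = 1.5
    print(cf_chebyshev(x, 10), np.pi / np.sqrt(x**2 - 1))   # rapid agreement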
Appendix
Finally, here is a brief review of some of the fancier bits of linear algebra used in this chapter. To begin, we discuss sums and quotients of vector spaces.
Each subspace $M$ of a finite-dimensional vector space $X$ induces an equivalence relation on $X$ by
$$x \sim y \iff x - y \in M.$$
Standard arguments show that the equivalence classes under this relation are the cosets (translates) $x + M$, $x \in X$. That is,
$$x + M = y + M \iff x - y \in M \iff x \sim y.$$
Equally standard is the induced vector arithmetic
$$(x + M) + (y + M) = (x + y) + M \quad \text{and} \quad \alpha(x + M) = (\alpha x) + M,$$
where $x$, $y \in X$ and $\alpha \in \mathbb{R}$. The collection of cosets (or equivalence classes) is a vector space under these operations; it's denoted $X/M$ and called the quotient of $X$ by $M$. Please note that the zero vector in $X/M$ is simply $M$ itself.
Associated to the quotient space $X/M$ is the quotient map $q(x) = x + M$. It's easy to check that $q : X \to X/M$ is a vector space homomorphism with kernel $M$. (Why?)
Next we recall the isomorphism theorem.

Theorem. Let $T : X \to Y$ be a linear map between finite-dimensional vector spaces, and let $q : X \to X/\ker T$ be the quotient map. Then, there exists a (unique, into) isomorphism $S : X/\ker T \to Y$ satisfying $S(q(x)) = T(x)$ for every $x \in X$.

Proof. Since $q$ maps onto $X/\ker T$, it's "legal" to define a map $S : X/\ker T \to Y$ by setting $S(q(x)) = T(x)$ for $x \in X$. Please note that $S$ is well-defined since
$$T(x) = T(y) \iff T(x - y) = 0 \iff x - y \in \ker T \iff q(x - y) = 0 \iff q(x) = q(y).$$
It's easy to see that $S$ is linear, and precisely the same chain of equivalences shows that $S$ is one-to-one.

Corollary. Let $T : X \to Y$ be a linear map between finite-dimensional vector spaces. Then, the range of $T$ is isomorphic to $X/\ker T$.
Math 682 The Müntz Theorems

For several weeks now we've taken advantage of the fact that the monomials $1, x, x^2, \ldots$ have dense linear span in $C[0, 1]$. What, if anything, is so special about these particular powers? How about if we consider polynomials of the form $\sum_{k=0}^{n} a_k x^{k^2}$; are they dense, too? More generally, what can be said about the span of a sequence of monomials $(x^{\lambda_n})$, where $0 \le \lambda_0 < \lambda_1 < \lambda_2 < \cdots$? Of course, we'll have to assume that $\lambda_0 \ge 0$, but it's not hard to see that we will actually need $\lambda_0 = 0$, for otherwise each of the polynomials $\sum_{k=0}^{n} a_k x^{\lambda_k}$ vanishes at $x = 0$ (and so has distance at least 1 from the constant 1 function, for example). If the $\lambda_n$'s are integers, it's also clear that we'll have to have $\lambda_n \to \infty$ as $n \to \infty$. But what else is needed? The answer comes to us from Müntz in 1914. (You sometimes see the name Otto Szász associated with Müntz's theorem, because Szász proved a similar theorem at nearly the same time (1916).)

Theorem. Let $0 \le \lambda_0 < \lambda_1 < \lambda_2 < \cdots$. Then, the functions $(x^{\lambda_n})$ have dense linear span in $C[0, 1]$ if and only if $\lambda_0 = 0$ and $\sum_{n=1}^{\infty} \lambda_n^{-1} = \infty$.

What Müntz is trying to tell us here is that the $\lambda_n$'s can't get big too quickly. In particular, the polynomials of the form $\sum_{k=0}^{n} a_k x^{k^2}$ are evidently not dense in $C[0, 1]$. On the other hand, the $\lambda_n$'s don't have to be unbounded; indeed, Müntz's theorem implies an earlier result of Bernstein from 1912: If $0 < \lambda_1 < \lambda_2 < \cdots < K$ (some constant), then $1, x^{\lambda_1}, x^{\lambda_2}, \ldots$ have dense linear span in $C[0, 1]$.
Before we give the proof of Müntz's theorem, let's invent a bit of notation: We write
$$X_n = \Big\{ \sum_{k=0}^{n} a_k x^{\lambda_k} : a_0, \ldots, a_n \in \mathbb{R} \Big\}$$
and, given $f \in C[0, 1]$, we write $\operatorname{dist}(f, X_n)$ to denote the distance from $f$ to the space spanned by $1, x^{\lambda_1}, \ldots, x^{\lambda_n}$. Let's also write $X = \bigcup_{n=0}^{\infty} X_n$. That is, $X$ is the linear span of the entire sequence $(x^{\lambda_n})_{n=0}^{\infty}$. The question here is whether $X$ is dense, and we'll address the problem by determining whether $\operatorname{dist}(f, X_n) \to 0$, as $n \to \infty$, for every $f \in C[0, 1]$.
If we can show that each (fixed) power $x^m$ can be uniformly approximated by a linear combination of $x^{\lambda_n}$'s, then the Weierstrass theorem will tell us that $X$ is dense in $C[0, 1]$.
(How?) Surprisingly, the numbers $\operatorname{dist}(x^m, X_n)$ can be estimated. Our proof won't give the best estimate, but it will show how the condition $\sum_{n=1}^{\infty} \lambda_n^{-1} = \infty$ comes into the picture.

Lemma. Let $m > 0$. Then, $\displaystyle \operatorname{dist}(x^m, X_n) \le \prod_{k=1}^{n} \Big| 1 - \frac{m}{\lambda_k} \Big|$.

Proof. We may certainly assume that $m \ne \lambda_n$ for any $n$. Given this, we inductively define a sequence of functions by setting $P_0(x) = x^m$ and
$$P_n(x) = (\lambda_n - m)\, x^{\lambda_n} \int_x^1 t^{-1-\lambda_n}\, P_{n-1}(t)\, dt$$
for $n \ge 1$. For example,
$$P_1(x) = (\lambda_1 - m)\, x^{\lambda_1} \int_x^1 t^{-1-\lambda_1}\, t^m\, dt = -x^{\lambda_1}\, t^{m-\lambda_1} \Big|_x^1 = x^m - x^{\lambda_1}.$$
By induction, each $P_n$ is of the form $x^m - \sum_{k=0}^{n} a_k x^{\lambda_k}$ for some scalars $(a_k)$:
$$P_n(x) = (\lambda_n - m)\, x^{\lambda_n} \int_x^1 t^{-1-\lambda_n} \Big[ t^m - \sum_{k=0}^{n-1} a_k t^{\lambda_k} \Big] dt = x^m - x^{\lambda_n} + (\lambda_n - m) \sum_{k=0}^{n-1} \frac{a_k}{\lambda_n - \lambda_k} \big( x^{\lambda_k} - x^{\lambda_n} \big).$$
Finally, $\|P_0\| = 1$ and $\|P_n\| \le \big| 1 - \frac{m}{\lambda_n} \big|\, \|P_{n-1}\|$, because
$$|\lambda_n - m|\, x^{\lambda_n} \int_x^1 t^{-1-\lambda_n}\, dt = \frac{|\lambda_n - m|}{\lambda_n}\, \big( 1 - x^{\lambda_n} \big) \le \Big| 1 - \frac{m}{\lambda_n} \Big|.$$
Thus,
$$\operatorname{dist}(x^m, X_n) \le \|P_n\| \le \prod_{k=1}^{n} \Big| 1 - \frac{m}{\lambda_k} \Big|.$$

The preceding result is due to v. Golitschek. A slightly better estimate, also due to v. Golitschek (1970), is $\operatorname{dist}(x^m, X_n) \le \prod_{k=1}^{n} \frac{|m - \lambda_k|}{m + \lambda_k}$.
Now a well-known fact about infinite products is that, for positive $a_k$'s, the product $\prod_{k=1}^{\infty} (1 - a_k)$ diverges (to 0) if and only if the series $\sum_{k=1}^{\infty} a_k$ diverges (to $\infty$), if and only if the product $\prod_{k=1}^{\infty} (1 + a_k)$ diverges (to $\infty$). In particular, $\prod_{k=1}^{n} \big| 1 - \frac{m}{\lambda_k} \big| \to 0$ if and only if $\sum_{k=1}^{n} \frac{1}{\lambda_k} \to \infty$. That is, $\operatorname{dist}(x^m, X_n) \to 0$ if and only if $\sum_{k=1}^{\infty} \frac{1}{\lambda_k} = \infty$. This proves the "backward" direction of Müntz's theorem.
We'll prove the "forward" direction of Müntz's theorem by proving a version of Müntz's theorem for the space $L_2[0, 1]$. For our purposes, $L_2[0, 1]$ denotes the space $C[0, 1]$ endowed with the norm
$$\|f\|_2 = \Big( \int_0^1 |f(x)|^2\, dx \Big)^{1/2},$$
although our results are equally valid in the "real" space $L_2[0, 1]$ (consisting of square-integrable, Lebesgue measurable functions). In the latter case, we no longer need to assume that $\lambda_0 = 0$, but we do need to assume that each $\lambda_n > -1/2$ (in order that $x^{2\lambda_n}$ be integrable on $[0, 1]$).
Remarkably, the distance from $f$ to the span of $x^{\lambda_0}, x^{\lambda_1}, \ldots, x^{\lambda_n}$ can be computed exactly in the $L_2$ norm. For this we'll need some more notation: Given linearly independent vectors $f_1, \ldots, f_n$ in an inner product space, we call
$$G(f_1, \ldots, f_n) = \begin{vmatrix} \langle f_1, f_1 \rangle & \cdots & \langle f_1, f_n \rangle \\ \vdots & \ddots & \vdots \\ \langle f_n, f_1 \rangle & \cdots & \langle f_n, f_n \rangle \end{vmatrix} = \det\big[ \langle f_i, f_j \rangle \big]_{i,j}$$
the Gram determinant of the $f_k$'s.
Lemma. (Gram) Let $F$ be a finite dimensional subspace of an inner product space $V$, and let $g \in V \setminus F$. Then, the distance $d$ from $g$ to $F$ is given by
$$d^2 = \frac{G(g, f_1, \ldots, f_n)}{G(f_1, \ldots, f_n)}$$
where $f_1, \ldots, f_n$ is any basis for $F$.

Proof. Let $f^* = \sum_{i=1}^{n} a_i f_i$ be the best approximation to $g$ out of $F$. Then, since $g - f^*$ is orthogonal to $F$, we have, in particular, $\langle f_j, f^* \rangle = \langle f_j, g \rangle$ for all $j$; that is,
$$\sum_{i=1}^{n} a_i \langle f_j, f_i \rangle = \langle f_j, g \rangle, \quad j = 1, \ldots, n. \qquad (*)$$
Since this system of equations always has a unique solution $a_1, \ldots, a_n$, we must have $G(f_1, \ldots, f_n) \ne 0$ (and so the formula in our Lemma at least makes sense).
Next, notice that
$$d^2 = \langle g - f^*, g - f^* \rangle = \langle g - f^*, g \rangle = \langle g, g \rangle - \langle g, f^* \rangle;$$
in other words,
$$d^2 + \sum_{i=1}^{n} a_i \langle g, f_i \rangle = \langle g, g \rangle. \qquad (**)$$
Now consider $(*)$ and $(**)$ as a system of $n + 1$ equations in the $n + 1$ unknowns $a_1, \ldots, a_n$ and $d^2$; in matrix form we have
$$\begin{bmatrix} 1 & \langle g, f_1 \rangle & \cdots & \langle g, f_n \rangle \\ 0 & \langle f_1, f_1 \rangle & \cdots & \langle f_1, f_n \rangle \\ \vdots & \vdots & \ddots & \vdots \\ 0 & \langle f_n, f_1 \rangle & \cdots & \langle f_n, f_n \rangle \end{bmatrix} \begin{bmatrix} d^2 \\ a_1 \\ \vdots \\ a_n \end{bmatrix} = \begin{bmatrix} \langle g, g \rangle \\ \langle f_1, g \rangle \\ \vdots \\ \langle f_n, g \rangle \end{bmatrix}.$$
Solving for $d^2$ using Cramer's rule gives the desired result: expanding along the first column shows that the matrix of coefficients has determinant $G(f_1, \ldots, f_n)$, while the matrix obtained by replacing the "$d^2$ column" by the right-hand side has determinant $G(g, f_1, \ldots, f_n)$.

Note: By our last Lemma and induction, every Gram determinant is positive!
In what follows, we will still use $X_n$ to denote the span of $x^{\lambda_0}, \ldots, x^{\lambda_n}$, but now we'll write $\operatorname{dist}_2(f, X_n)$ to denote the distance from $f$ to $X_n$ in the $L_2$ norm.

Theorem. Let $m$, $\lambda_k > -1/2$ for $k = 0, 1, 2, \ldots$. Then,
$$\operatorname{dist}_2(x^m, X_n) = \frac{1}{\sqrt{2m + 1}} \prod_{k=0}^{n} \frac{|m - \lambda_k|}{m + \lambda_k + 1}.$$
Proof. The proof is based on a determinant formula due to Cauchy:
$$\prod_{i,j} (a_i + b_j) \cdot \begin{vmatrix} \frac{1}{a_1 + b_1} & \cdots & \frac{1}{a_1 + b_n} \\ \vdots & \ddots & \vdots \\ \frac{1}{a_n + b_1} & \cdots & \frac{1}{a_n + b_n} \end{vmatrix} = \prod_{i > j} (a_i - a_j)(b_i - b_j).$$
If we consider each of the $a_i$'s and $b_j$'s as "variables," then each side of the equation is a polynomial in $a_1, \ldots, a_n, b_1, \ldots, b_n$. (Why?) Now the right-hand side clearly vanishes if $a_i = a_j$ or $b_i = b_j$ for some $i \ne j$, but the left-hand side also vanishes in any of these cases. Thus, the right-hand side divides the left-hand side. But both polynomials have degree $n - 1$ in each of the $a_i$'s and $b_j$'s. (Why?) Thus, the left-hand side is a constant multiple of the right-hand side. To show that the constant must be 1, write the left-hand side as
$$\prod_{i \ne j} (a_i + b_j) \cdot \begin{vmatrix} 1 & \frac{a_1 + b_1}{a_1 + b_2} & \cdots & \frac{a_1 + b_1}{a_1 + b_n} \\ \frac{a_2 + b_2}{a_2 + b_1} & 1 & \cdots & \frac{a_2 + b_2}{a_2 + b_n} \\ \vdots & & \ddots & \vdots \\ \frac{a_n + b_n}{a_n + b_1} & \cdots & \frac{a_n + b_n}{a_n + b_{n-1}} & 1 \end{vmatrix}$$
and now take the limit as $b_1 \to -a_1$, $b_2 \to -a_2$, etc. The expression above tends to $\prod_{i \ne j} (a_i - a_j)$, as does the right-hand side of Cauchy's formula.
Now, $\langle x^p, x^q \rangle = \int_0^1 x^{p+q}\, dx = \frac{1}{p + q + 1}$ for $p$, $q > -1/2$, so
$$G(x^{\lambda_0}, \ldots, x^{\lambda_n}) = \det\Big[ \frac{1}{\lambda_i + \lambda_j + 1} \Big] = \frac{\prod_{i > j} (\lambda_i - \lambda_j)^2}{\prod_{i,j} (\lambda_i + \lambda_j + 1)}$$
with a similar formula holding for $G(x^m, x^{\lambda_0}, \ldots, x^{\lambda_n})$. Substituting these expressions into our distance formula and taking square roots finishes the proof.
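The exact distance formula invites a test (a sketch comparing the Gram-determinant ratio with the closed-form product; the exponents are chosen arbitrarily):

    import numpy as np

    def dist2(m, lam):
        # d^2 = G(x^m, x^lam_0, ..., x^lam_n) / G(x^lam_0, ..., x^lam_n),
        # where <x^p, x^q> = 1/(p + q + 1) on [0, 1]
        e = np.concatenate(([m], lam))
        G_big = np.linalg.det(1.0 / (e[:, None] + e[None, :] + 1))
        G_small = np.linalg.det(1.0 / (lam[:, None] + lam[None, :] + 1))
        return np.sqrt(G_big / G_small)

    m, lam = 2.5, np.array([0.0, 1.0, 3.0, 4.0])
    closed = np.prod(np.abs(m - lam) / (m + lam + 1)) / np.sqrt(2 * m + 1)
    print(dist2(m, lam), closed)    # identical up to rounding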
Now we can determine exactly when $X$ is dense in $L_2[0, 1]$. For easier comparison to the $C[0, 1]$ case, we suppose that the $\lambda_n$'s are nonnegative.

Theorem. Let $0 \le \lambda_0 < \lambda_1 < \lambda_2 < \cdots$. Then, the functions $(x^{\lambda_n})$ have dense linear span in $L_2[0, 1]$ if and only if $\sum_{n=1}^{\infty} \lambda_n^{-1} = \infty$.

Proof. If $\sum_{n=1}^{\infty} \lambda_n^{-1} < \infty$, then each of the products $\prod_{k=1}^{n} \big| 1 - \frac{m}{\lambda_k} \big|$ and $\prod_{k=1}^{n} \big( 1 + \frac{m+1}{\lambda_k} \big)$ converges to some nonzero limit for any $m$ not equal to any $\lambda_k$. Thus, $\operatorname{dist}_2(x^m, X_n) \not\to 0$, as $n \to \infty$, for any $m \ne \lambda_k$, $k = 0, 1, 2, \ldots$. In particular, the functions $(x^{\lambda_n})$ cannot have dense linear span in $L_2[0, 1]$.
Conversely, if $\sum_{n=1}^{\infty} \lambda_n^{-1} = \infty$, then $\prod_{k=1}^{n} \big| 1 - \frac{m}{\lambda_k} \big|$ diverges to 0 while $\prod_{k=1}^{n} \big( 1 + \frac{m+1}{\lambda_k} \big)$ diverges to $+\infty$. Thus, $\operatorname{dist}_2(x^m, X_n) \to 0$, as $n \to \infty$, for every $m > -1/2$. Since the polynomials are dense in $L_2[0, 1]$, this finishes the proof.
Finally, we can finish the proof of Müntz's theorem in the case of $C[0, 1]$. Suppose that the functions $(x^{\lambda_n})$ have dense linear span in $C[0, 1]$. Then, since $\|f\|_2 \le \|f\|$, it follows that the functions $(x^{\lambda_n})$ must also have dense linear span in $L_2[0, 1]$. (Why?) Hence, $\sum_{n=1}^{\infty} \lambda_n^{-1} = \infty$.
Just for good measure, here's a second proof of the "backward" direction for $C[0, 1]$, based on the $L_2[0, 1]$ version. Suppose that $\sum_{n=1}^{\infty} \lambda_n^{-1} = \infty$, and let $m \ge 1$. Then,
$$\begin{aligned}
\Big| x^m - \sum_{k=0}^{n} a_k x^{\lambda_k} \Big| &= \Big| \int_0^x \Big( m\, t^{m-1} - \sum_{k=0}^{n} a_k \lambda_k\, t^{\lambda_k - 1} \Big) dt \Big| \\
&\le \int_0^1 \Big| m\, t^{m-1} - \sum_{k=0}^{n} a_k \lambda_k\, t^{\lambda_k - 1} \Big|\, dt \\
&\le \Big( \int_0^1 \Big| m\, t^{m-1} - \sum_{k=0}^{n} a_k \lambda_k\, t^{\lambda_k - 1} \Big|^2\, dt \Big)^{1/2}.
\end{aligned}$$
Now the functions $(x^{\lambda_k - 1})$ have dense linear span in $L_2[0, 1]$ because $\sum_{\lambda_n > 1} (\lambda_n - 1)^{-1} = \infty$. Thus, we can find $a_k$'s so that the right-hand side of this inequality is less than a given $\varepsilon$. Since this estimate is independent of $x$, we've shown that
$$\max_{0 \le x \le 1} \Big| x^m - \sum_{k=0}^{n} a_k x^{\lambda_k} \Big| < \varepsilon.$$
Application. Let $0 = \lambda_0 < \lambda_1 < \lambda_2 < \cdots$ with $\sum_{n=1}^{\infty} \lambda_n^{-1} = \infty$, and let $f$ be a continuous function on $[0, \infty)$ for which $c = \lim_{t \to \infty} f(t)$ exists. Then, $f$ can be uniformly approximated by finite linear combinations of the exponentials $(e^{-\lambda_n t})_{n=0}^{\infty}$.

Proof. The function $g(x) = f(-\log x)$, for $0 < x \le 1$, and $g(0) = c$, is continuous on $[0, 1]$. In other words, $g(e^{-t}) = f(t)$ for each $0 \le t < \infty$. Thus, given $\varepsilon > 0$, we can find $n$ and $a_0, \ldots, a_n$ such that
$$\max_{0 \le x \le 1} \Big| g(x) - \sum_{k=0}^{n} a_k x^{\lambda_k} \Big| = \max_{0 \le t < \infty} \Big| f(t) - \sum_{k=0}^{n} a_k e^{-\lambda_k t} \Big| < \varepsilon.$$
Math 682 The Stone-Weierstrass Theorem

To begin, an algebra is a vector space $A$ on which there is a multiplication $(f, g) \mapsto fg$ (from $A \times A$ into $A$) satisfying
(i) $(fg)h = f(gh)$, for all $f$, $g$, $h \in A$;
(ii) $f(g + h) = fg + fh$ and $(f + g)h = fh + gh$, for all $f$, $g$, $h \in A$;
(iii) $\alpha(fg) = (\alpha f)g = f(\alpha g)$, for all scalars $\alpha$ and all $f$, $g \in A$.
In other words, an algebra is a ring under vector addition and multiplication, together with a compatible scalar multiplication. The algebra is commutative if
(iv) $fg = gf$, for all $f$, $g \in A$.
And we say that $A$ has an identity element if there is a vector $e \in A$ such that
(v) $fe = ef = f$, for all $f \in A$.
In case $A$ is a normed vector space, we also require that the norm satisfy
(vi) $\|fg\| \le \|f\|\,\|g\|$
(this simplifies things a bit), and in this case we refer to $A$ as a normed algebra. If a normed algebra is complete, we refer to it as a Banach algebra. Finally, a subset $B$ of an algebra $A$ is called a subalgebra (of $A$) if $B$ is itself an algebra (under the same operations); that is, if $B$ is a (vector) subspace of $A$ which is closed under multiplication.
If $A$ is a normed algebra, then all of the various operations on $A$ (or $A \times A$) are continuous. For example, since
$$\|fg - hk\| = \|fg - fk + fk - hk\| \le \|f\|\,\|g - k\| + \|k\|\,\|f - h\|,$$
it follows that multiplication is continuous. (How?) In particular, if $B$ is a subspace (or subalgebra) of $A$, then $\overline{B}$, the closure of $B$, is also a subspace (or subalgebra) of $A$.

Examples
1. If we define multiplication of vectors "coordinatewise," then $\mathbb{R}^n$ is a commutative Banach algebra with identity (the vector $(1, \ldots, 1)$) when equipped with the norm $\|x\|_\infty = \max_{1 \le i \le n} |x_i|$.
2. It's not hard to identify the subalgebras of $\mathbb{R}^n$ among its subspaces. For example, the subalgebras of $\mathbb{R}^2$ are $\{(x, 0) : x \in \mathbb{R}\}$, $\{(0, y) : y \in \mathbb{R}\}$, and $\{(x, x) : x \in \mathbb{R}\}$, along with $\{(0, 0)\}$ and $\mathbb{R}^2$.
3. Given a set $X$, we write $B(X)$ for the space of all bounded, real-valued functions on $X$. If we endow $B(X)$ with the sup norm, and if we define arithmetic with functions pointwise, then $B(X)$ is a commutative Banach algebra with identity (the constant 1 function). The constant functions in $B(X)$ form a subalgebra isomorphic (in every sense of the word) to $\mathbb{R}$.
4. If $X$ is a metric (or topological) space, then we may consider $C(X)$, the space of all continuous, real-valued functions on $X$. If we again define arithmetic with functions pointwise, then $C(X)$ is a commutative algebra with identity (the constant 1 function). The bounded, continuous functions on $X$, written $C_b(X) = C(X) \cap B(X)$, form a closed subalgebra of $B(X)$. If $X$ is compact, then $C_b(X) = C(X)$. In other words, if $X$ is compact, then $C(X)$ is itself a closed subalgebra of $B(X)$ and, in particular, $C(X)$ is a Banach algebra with identity.
5. The polynomials form a dense subalgebra of $C[a, b]$. The trig polynomials form a dense subalgebra of $C^{2\pi}$. These two sentences summarize Weierstrass's two classical theorems in modern parlance and form the basis for Stone's version of the theorem.

Using this new language, we may restate the classical Weierstrass theorem to read: If a subalgebra $A$ of $C[a, b]$ contains the functions $e(x) = 1$ and $f(x) = x$, then $A$ is dense in $C[a, b]$. Any subalgebra of $C[a, b]$ containing 1 and $x$ actually contains all the polynomials; thus, our restatement of Weierstrass's theorem amounts to the observation that any subalgebra containing a dense set is itself dense in $C[a, b]$.
Our goal in this section is to prove an analogue of this new version of the Weierstrass theorem for subalgebras of $C(X)$, where $X$ is a compact metric space. In particular, we will want to extract the essence of the functions 1 and $x$ from this statement. That is, we seek conditions on a subalgebra $A$ of $C(X)$ that will force $A$ to be dense in $C(X)$. The key role played by 1 and $x$, in the case of $C[a, b]$, is that a subalgebra containing these two functions must actually contain a much larger set of functions. But since we can't be assured of anything remotely like polynomials living in the more general $C(X)$ spaces, we might want to change our point of view. What we really need is some requirement on a subalgebra $A$ of $C(X)$ that will allow us to construct a wide variety of functions in $A$. And, if $A$ contains a sufficiently rich variety of functions, it might just be possible to show that $A$ is dense.
Since the two replacement conditions we have in mind make sense in any collection of
real-valued functions, we state them in some generality.
Let A be a collection of real-valued functions on some set X. We say that A separates
points in X if, given x ≠ y ∈ X, there is some f ∈ A such that f(x) ≠ f(y). We say that
A vanishes at no point of X if, given x ∈ X, there is some f ∈ A such that f(x) ≠ 0.
Examples
6. The single function f(x) = x clearly separates points in [a, b], and the function e(x) =
1 obviously vanishes at no point in [a, b]. Any subalgebra A of C[a, b] containing
these two functions will likewise separate points and vanish at no point in [a, b].
7. The set E of even functions in C[−1, 1] fails to separate points in [−1, 1]; indeed,
f(x) = f(−x) for any even function. However, since the constant functions are even,
E vanishes at no point of [−1, 1]. It's not hard to see that E is a proper closed
subalgebra of C[−1, 1]. The set of odd functions will separate points (since f(x) = x
is odd), but the odd functions all vanish at 0. The set of odd functions is a proper
closed subspace of C[−1, 1], although not a subalgebra.
8. The set of all functions f ∈ C[−1, 1] for which f(0) = 0 is a proper closed subalgebra
of C[−1, 1]. In fact, this set is a maximal (in the sense of containment) proper closed
subalgebra of C[−1, 1]. Note, however, that this set of functions does separate points
in [−1, 1] (again, because it contains f(x) = x).
9. It's easy to construct examples of non-trivial closed subalgebras of C(X). Indeed,
given any non-empty, proper closed subset X₀ of X, the set A(X₀) = {f ∈ C(X) :
f vanishes on X₀} is a non-empty, proper subalgebra of C(X). It's closed in any
reasonable topology on C(X) because it's closed under pointwise limits. Subalgebras
of the type A(X₀) are of interest because they're actually ideals in the ring C(X).
That is, if f ∈ C(X), and if g ∈ A(X₀), then fg ∈ A(X₀).
As these few examples illustrate, neither of our new conditions, taken separately, is
enough to force a subalgebra of C(X) to be dense. But both conditions together turn
out to be sufficient. In order to better appreciate the utility of these new conditions, let's
isolate the key computational tool that they permit within an algebra of functions.
Lemma. Let A be an algebra of real-valued functions on some set X, and suppose that
A separates points in X and vanishes at no point of X. Then, given x ≠ y ∈ X and a,
b ∈ R, we can find an f ∈ A with f(x) = a and f(y) = b.

Proof. Given any pair of distinct points x ≠ y ∈ X, the set Ã = {(f(x), f(y)) : f ∈ A}
is a subalgebra of R². If A separates points in X, then Ã is evidently neither {(0, 0)} nor
{(x, x) : x ∈ R}. If A vanishes at no point, then {(x, 0) : x ∈ R} and {(0, y) : y ∈ R} are
both excluded. Thus Ã = R². That is, for any a, b ∈ R, there is some f ∈ A for which
(f(x), f(y)) = (a, b).
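For a concrete instance of the Lemma (a sketch of mine, not from the notes; Python with numpy assumed), take A to be the polynomials on R. If some g ∈ A has g(x), g(y) distinct and both nonzero, then already f = αg + βg² ∈ A can be made to take any prescribed values at x and y, by solving a 2 × 2 system:

    import numpy as np

    def interpolating_element(g, x, y, a, b):
        # Solve for f = alpha*g + beta*g^2 with f(x) = a and f(y) = b; the matrix
        # is invertible precisely when g(x), g(y) are distinct and both nonzero.
        M = np.array([[g(x), g(x) ** 2],
                      [g(y), g(y) ** 2]])
        alpha, beta = np.linalg.solve(M, np.array([a, b]))
        return lambda t: alpha * g(t) + beta * g(t) ** 2

    g = lambda t: t                                   # separates points; vanishes only at 0
    f = interpolating_element(g, x=1.0, y=2.0, a=5.0, b=-3.0)
    print(f(1.0), f(2.0))                             # 5.0 -3.0

In general, one may need to combine several members of A to produce such a g; the point of the subalgebra argument above is that this is always possible.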
Now we can state Stone's version of the Weierstrass theorem (for compact metric
spaces). It should be pointed out that the theorem, as stated, also holds in C(X) when
X is a compact Hausdorff topological space (with the same proof), but does not hold for
algebras of complex-valued functions over C. More on this later.
Stone-Weierstrass Theorem. (real scalars) Let X be a compact metric space, and let
A be a subalgebra of C(X). If A separates points in X and vanishes at no point of X,
then A is dense in C(X).
What Cheney calls an "embryonic" version of this theorem appeared in 1937, as a small
part of a massive 106-page paper! Later versions, appearing in 1948 and 1962, benefitted
from the work of the great Japanese mathematician Kakutani and were somewhat more
palatable to the general mathematical public. But, no matter which version you consult,
you'll find them difficult to read. For more details, I would recommend you first consult
Folland's Real Analysis, or Simmons's Topology and Modern Analysis.
As a first step in attacking the proof of Stone's theorem, notice that if A satisfies the
conditions of the theorem, then so does its closure A̅. (Why?) Thus, we may assume that
A is actually a closed subalgebra of C(X) and prove, instead, that A = C(X). Now the
closed subalgebras of C(X) inherit more structure than you might first imagine.
Theorem. If A is a closed subalgebra of C(X), and if f ∈ A, then |f| ∈ A. Consequently,
A is a sublattice of C(X).
Proof. Let ε > 0, and consider the function |t| on the interval [−‖f‖, ‖f‖]. By the
Weierstrass theorem, there is a polynomial p(t) = Σ_{k=0}^n a_k t^k such that | |t| − p(t) | < ε for
all |t| ≤ ‖f‖. In particular, notice that |p(0)| = |a₀| < ε.

Now, since |f(x)| ≤ ‖f‖ for all x ∈ X, it follows that | |f(x)| − p(f(x)) | < ε for all
x ∈ X. But p(f(x)) = (p(f))(x), where p(f) = a₀·1 + a₁f + ⋯ + a_n f^n, and the function
g = a₁f + ⋯ + a_n f^n ∈ A, since A is an algebra. Thus, | |f(x)| − g(x) | ≤ |a₀| + ε < 2ε
for all x ∈ X. In other words, for each ε > 0, we can supply an element g ∈ A such that
‖ |f| − g ‖ < 2ε. Since A is closed, this means that |f| ∈ A.
The statement that A is a sublattice of C(X) means that if we're given f, g ∈ A, then
max{f, g} ∈ A and min{f, g} ∈ A, too. But this is actually just a statement about real
numbers. Indeed, since

    2 max{a, b} = a + b + |a − b|   and   2 min{a, b} = a + b − |a − b|,

it follows that a subspace of C(X) is a sublattice precisely when it contains the absolute
values of all its elements.
The point to our last result is that if we're given a closed subalgebra A of C(X), then
A is "closed" in every sense of the word: sums, products, absolute values, max's, and
min's of elements from A, and even limits of sequences of these, are all back in A. This is
precisely the sort of freedom we'll need if we hope to show that A = C(X).
Please notice that we could have avoided our appeal to the Weierstrass theorem in this
last result. Indeed, we really only need to supply polynomial approximations for the single
function |x| on [−1, 1], and this can be done directly. For example, we could appeal instead
to the binomial theorem, using |x| = √(1 − (1 − x²)). The resulting series can be shown
to converge uniformly on [−1, 1]. By side-stepping the classical Weierstrass theorem, it
becomes a corollary to Stone's version (rather than the other way around).
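Here's a brief numerical illustration of that remark (my sketch, not part of the argument; Python with numpy assumed): the partial sums of the binomial series for √(1 − u), evaluated at u = 1 − x², are polynomials in x whose uniform distance from |x| on [−1, 1] visibly shrinks.

    import numpy as np

    def abs_approx(x, n):
        # n-th partial sum of the binomial series for sqrt(1 - u), with u = 1 - x^2;
        # each partial sum is a polynomial in x approximating |x| on [-1, 1]
        u = 1.0 - x**2
        total = np.zeros_like(u)
        coeff = 1.0                           # the binomial coefficient C(1/2, 0)
        for k in range(n + 1):
            total += coeff * (-u)**k
            coeff *= (0.5 - k) / (k + 1)      # C(1/2, k+1) from C(1/2, k)
        return total

    x = np.linspace(-1.0, 1.0, 2001)
    for n in [4, 16, 64, 256]:
        print(n, np.max(np.abs(abs_approx(x, n) - np.abs(x))))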
Now we're ready for the proof of the Stone-Weierstrass theorem. As we've already
pointed out, we may assume that we're given a closed subalgebra (subspace, and sublattice)
A of C(X) and we want to show that A = C(X). We'll break the remainder of the proof
into two steps:
Step 1: Given f ∈ C(X), x ∈ X, and ε > 0, there is an element g_x ∈ A with g_x(x) = f(x)
and g_x(y) > f(y) − ε for all y ∈ X.

From our "computational" Lemma, we know that for each y ∈ X, y ≠ x, we can find
an h_y ∈ A so that h_y(x) = f(x) and h_y(y) = f(y). Since h_y − f is continuous and vanishes
at both x and y, the set U_y = {t ∈ X : h_y(t) > f(t) − ε} is open and contains both x and y.
Thus, the sets (U_y)_{y≠x} form an open cover for X. Since X is compact, finitely many U_y's
suffice, say X = U_{y_1} ∪ ⋯ ∪ U_{y_n}. Now set g_x = max{h_{y_1}, …, h_{y_n}}. Because A is a lattice,
we have g_x ∈ A. Note that g_x(x) = f(x), since each h_{y_i} agrees with f at x. And g_x > f − ε
since, given y ≠ x, we have y ∈ U_{y_i} for some i, and hence g_x(y) ≥ h_{y_i}(y) > f(y) − ε.
Step 2: Given f ∈ C(X) and ε > 0, there is an h ∈ A with ‖f − h‖ < ε.

From Step 1, for each x ∈ X we can find some g_x ∈ A such that g_x(x) = f(x) and
g_x(y) > f(y) − ε for all y ∈ X. And now we reverse the process used in Step 1: for each x,
the set V_x = {y ∈ X : g_x(y) < f(y) + ε} is open and contains x. Again, since X is compact,
X = V_{x_1} ∪ ⋯ ∪ V_{x_m} for some x_1, …, x_m. This time, set h = min{g_{x_1}, …, g_{x_m}} ∈ A. As
before, h(y) > f(y) − ε for all y, since each g_{x_i} does so, and h(y) < f(y) + ε for all y, since
at least one g_{x_i} does so.

The conclusion of Step 2 is that A is dense in C(X); but, since A is closed, this means
that A = C(X).
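To see the two steps in action, here's an illustrative sketch (mine, not from the notes; Python with numpy assumed) on X = [0, 1], where the chords h_y(t) = f(x) + (f(y) − f(x))(t − x)/(y − x) play the role of the functions from the Lemma: they agree with f at x and y, and they belong to the algebra generated by 1 and t. With a fixed finite set of nodes standing in for the compactness argument, the max-then-min construction produces a uniform approximation whose error shrinks as the nodes grow denser.

    import numpy as np

    f = lambda t: np.sin(3 * t) + t**2        # a continuous test function
    grid = np.linspace(0.0, 1.0, 401)         # stand-in for the compact space X
    nodes = np.linspace(0.0, 1.0, 21)         # the points x, y used in the proof

    def g_x(x, t):
        # Step 1: max of the chords through (x, f(x)) and (y, f(y)), y != x
        chords = [f(x) + (f(y) - f(x)) * (t - x) / (y - x) for y in nodes if y != x]
        return np.max(chords, axis=0)

    # Step 2: minimum over x of the functions g_x
    h = np.min([g_x(x, grid) for x in nodes], axis=0)
    print("uniform error:", np.max(np.abs(h - f(grid))))

Of course, in the actual proof the finite subcover depends on f and ε; the fixed grid above is only a stand-in.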
Corollary. If X and Y are compact metric spaces, then the subspace of C(X × Y) spanned
by the functions of the form f(x, y) = g(x)h(y), g ∈ C(X), h ∈ C(Y), is dense in C(X × Y).

Corollary. If K is a compact subset of Rⁿ, then the polynomials (in n variables) are
dense in C(K).
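As a quick illustration of the second corollary (again a sketch of mine, with Python and numpy assumed; the test function e^{xy} is just an example), least-squares fits by polynomials in two variables on the unit square already show the uniform error decaying as the degree grows:

    import numpy as np

    x, y = np.meshgrid(np.linspace(0, 1, 41), np.linspace(0, 1, 41))
    f = np.exp(x * y).ravel()                 # a continuous function on [0,1]^2
    for deg in [1, 2, 4, 6]:
        # columns x^i y^j with i + j <= deg span the polynomials of total degree deg
        cols = [(x**i * y**j).ravel()
                for i in range(deg + 1) for j in range(deg + 1 - i)]
        A = np.column_stack(cols)
        coef, *_ = np.linalg.lstsq(A, f, rcond=None)
        print(deg, np.max(np.abs(A @ coef - f)))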
Applications to C^{2π}

In many texts, the Stone-Weierstrass theorem is used to show that the trig polynomials are
dense in C^{2π}. One approach here might be to identify C^{2π} with the closed subalgebra of
C[0, 2π] consisting of those functions f satisfying f(0) = f(2π). Probably easier, though,
is to identify C^{2π} with the continuous functions on the unit circle T = {e^{iθ} : θ ∈ R} =
{z ∈ C : |z| = 1} in the complex plane using the identification

    f ∈ C^{2π} ↔ g ∈ C(T), where g(e^{it}) = f(t).

Under this correspondence, the trig polynomials in C^{2π} match up with (certain) polyno-
mials in z = e^{it} and z̄ = e^{−it}. But, as we've seen, even if we start with real-valued trig
polynomials, we'll end up with polynomials in z and z̄ having complex coefficients.
Given this, it might make more sense to consider the complex-valued continuous func-
tions on T. We'll write C_C(T) to denote the complex-valued continuous functions on
T, and C_R(T) to denote the real-valued continuous functions on T. Similarly, C_C^{2π} is
the space of complex-valued, 2π-periodic functions on R, while C_R^{2π} stands for the real-
valued, 2π-periodic functions on R. Now, under the identification we made earlier, we have
C_C(T) = C_C^{2π} and C_R(T) = C_R^{2π}. The complex-valued trig polynomials in C_C^{2π} now match
up with the full set of polynomials, with complex coefficients, in z = e^{it} and z̄ = e^{−it}. We'll
use the Stone-Weierstrass theorem to show that these polynomials are dense in C_C(T).
Now the polynomials in z obviously separate points in T and vanish at no point of T.
Nevertheless, the polynomials in z alone are not dense in C_C(T). To see this, here's a proof
that f(z) = z̄ cannot be uniformly approximated by polynomials in z. First, suppose that
we're given some polynomial p(z) = Σ_{k=0}^n c_k z^k. Then

    ∫₀^{2π} e^{it} p(e^{it}) dt = Σ_{k=0}^n c_k ∫₀^{2π} e^{i(k+1)t} dt = 0,

and so, since f(e^{it}) = e^{−it},

    2π = ∫₀^{2π} e^{it} e^{−it} dt = ∫₀^{2π} e^{it} [ f(e^{it}) − p(e^{it}) ] dt.

Now, taking absolute values, we get

    2π ≤ ∫₀^{2π} | f(e^{it}) − p(e^{it}) | dt ≤ 2π ‖f − p‖.

That is, ‖f − p‖ ≥ 1 for any polynomial p.
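The bound is easy to probe numerically (a sketch of mine, assuming Python with numpy; random polynomials only illustrate the inequality, they don't prove it):

    import numpy as np

    t = np.linspace(0.0, 2.0 * np.pi, 4001)
    z = np.exp(1j * t)
    f = np.conj(z)                                # f(z) = z-bar on the circle
    rng = np.random.default_rng(0)
    for _ in range(5):
        c = 0.3 * rng.normal(size=6)              # random coefficients c_0, ..., c_5
        p = np.polyval(c[::-1], z)                # p(z) = c_0 + c_1 z + ... + c_5 z^5
        print(np.max(np.abs(f - p)))              # never falls below 1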
We might as well proceed in some generality: given a compact metric space X, we'll
write C_C(X) for the set of all continuous, complex-valued functions f : X → C, and we
norm C_C(X) by ‖f‖ = max_{x∈X} |f(x)| (where |f(x)| is the modulus of the complex number
f(x), of course). C_C(X) is a Banach algebra over C. In order to make it clear which field
of scalars is involved, we'll write C_R(X) for the real-valued members of C_C(X). Notice,
though, that C_R(X) is nothing other than C(X) with a new name.

More generally, we'll write A_C to denote an algebra, over C, of complex-valued func-
tions and A_R to denote the real-valued members of A_C. It's not hard to see that A_R is
then an algebra, over R, of real-valued functions.
Now if f is in C_C(X), then so is the function f̄, whose value at x is the complex
conjugate of f(x). This puts

    Re f = (1/2)(f + f̄)   and   Im f = (1/2i)(f − f̄),

the real and imaginary parts of f, in C_R(X) too. Conversely, if g, h ∈ C_R(X), then
g + ih ∈ C_C(X).
This simple observation gives us a hint as to how we might apply the Stone-Weierstrass
theorem to subalgebras of C_C(X). Given a subalgebra A_C of C_C(X), suppose that we could
prove that A_R is dense in C_R(X). Then, given any f ∈ C_C(X), we could approximate Re f
and Im f by elements g, h ∈ A_R. But since A_R ⊆ A_C, this means that g + ih ∈ A_C, and
g + ih approximates f. That is, A_C is dense in C_C(X). Great! And what did we really use
here? Well, we need A_R to contain the real and imaginary parts of "most" functions in
C_C(X). If we insist that A_C separate points and vanish at no point, then A_R will contain
"most" of C_R(X). And, to be sure that we get both the real and imaginary parts of each
element of A_C, we'll insist that A_C contain the conjugates of each of its members: f̄ ∈ A_C
whenever f ∈ A_C. That is, we'll require that A_C be self-conjugate (or, as some authors
say, self-adjoint).
Stone-Weierstrass Theorem. (complex scalars) Let X be a compact metric space, and
let A_C be a subalgebra, over C, of C_C(X). If A_C separates points in X, vanishes at no
point of X, and is self-conjugate, then A_C is dense in C_C(X).

Proof. Again, write A_R for the set of real-valued members of A_C. Since A_C is self-
conjugate, A_R contains the real and imaginary parts of every f ∈ A_C:

    Re f = (1/2)(f + f̄) ∈ A_R   and   Im f = (1/2i)(f − f̄) ∈ A_R.

Moreover, A_R is a subalgebra, over R, of C_R(X). In addition, A_R separates points in X
and vanishes at no point of X. Indeed, given x ≠ y ∈ X and f ∈ A_C with f(x) ≠ f(y),
we must have at least one of Re f(x) ≠ Re f(y) or Im f(x) ≠ Im f(y). Similarly, f(x) ≠ 0
means that at least one of Re f(x) ≠ 0 or Im f(x) ≠ 0 holds. That is, A_R satisfies the
hypotheses of the real-scalar version of the Stone-Weierstrass theorem. Consequently, A_R
is dense in C_R(X).

Now, given f ∈ C_C(X) and ε > 0, take g, h ∈ A_R with ‖g − Re f‖ < ε/2 and
‖h − Im f‖ < ε/2. Then, g + ih ∈ A_C and ‖f − (g + ih)‖ < ε. Thus, A_C is dense in
C_C(X).
Corollary. The polynomials, with complex coefficients, in z and z̄ are dense in C_C(T). In
other words, the complex trig polynomials are dense in C_C^{2π}.

Note that it follows from the complex-scalar proof that the real parts of the polyno-
mials in z and z̄, that is, the real trig polynomials, are dense in C_R(T) = C_R^{2π}.

Corollary. The real trig polynomials are dense in C_R^{2π}.
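As a small illustration of this corollary (my sketch, assuming Python with numpy; note that the FFT partial sums below are the Fourier, i.e., least-squares, approximations rather than the best uniform ones), trig polynomials of modest degree already approximate a continuous 2π-periodic function quite well:

    import numpy as np

    t = np.linspace(0.0, 2.0 * np.pi, 512, endpoint=False)
    f = np.abs(np.sin(t)) ** 1.5              # continuous and 2*pi-periodic
    c = np.fft.rfft(f) / len(t)               # approximate Fourier coefficients
    for n in [2, 4, 8, 16]:
        ck = c[: n + 1].copy()
        ck[1:] *= 2.0                         # fold in the conjugate frequencies
        s = np.real(sum(ck[k] * np.exp(1j * k * t) for k in range(n + 1)))
        print(n, np.max(np.abs(s - f)))       # uniform error of the degree-n sum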
Application: Lipschitz Functions
In most Real Analysis courses, the classical Weierstrass theorem is used to prove that
C[a, b] is separable. Likewise, the Stone-Weierstrass theorem can be used to show that
C(X) is separable, where X is a compact metric space. While we won't have anything quite
so convenient as polynomials at our disposal, we do, at least, have a familiar collection of
functions to work with.

Given a metric space (X, d), and 0 ≤ K < ∞, we'll write lip_K(X) to denote the
collection of all real-valued Lipschitz functions on X with constant at most K; that is,
f : X → R is in lip_K(X) if |f(x) − f(y)| ≤ K d(x, y) for all x, y ∈ X. And we'll write
lip(X) to denote the set of functions that are in lip_K(X) for some K; in other words,
lip(X) = ⋃_{K=1}^∞ lip_K(X). It's easy to see that lip(X) is a subspace of C(X); in fact, if X
is compact, then lip(X) is even a subalgebra of C(X). Indeed, given f ∈ lip_K(X) and
g ∈ lip_M(X), we have

    |f(x)g(x) − f(y)g(y)| ≤ |f(x)g(x) − f(y)g(x)| + |f(y)g(x) − f(y)g(y)|
                          ≤ K ‖g‖ d(x, y) + M ‖f‖ d(x, y).
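Here's a quick numerical sanity check of that product estimate (my sketch, not part of the notes; Python with numpy assumed), on X = [0, 1] with d(x, y) = |x − y|:

    import numpy as np

    x = np.linspace(0.0, 1.0, 2001)
    dx = x[1] - x[0]
    f, K = np.abs(x - 0.3), 1.0                   # f is 1-Lipschitz on [0, 1]
    g, M = np.minimum(2.0 * x, 1.0), 2.0          # g is 2-Lipschitz on [0, 1]
    lip_fg = np.max(np.abs(np.diff(f * g))) / dx  # difference-quotient estimate of Lip(fg)
    print(lip_fg, K * np.max(np.abs(g)) + M * np.max(np.abs(f)))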
Lemma. If X is a compact metric space, then lip(X) is dense in C(X).

Proof. Clearly, lip(X) contains the constant functions and so vanishes at no point of
X. To see that lip(X) separates points in X, we use the fact that the metric d is itself
Lipschitz: given x₀ ≠ y₀ ∈ X, the function f(x) = d(x, y₀) satisfies f(x₀) > 0 = f(y₀);
moreover, f ∈ lip₁(X) since

    |f(x) − f(y)| = |d(x, y₀) − d(y, y₀)| ≤ d(x, y).

Thus, by the Stone-Weierstrass Theorem, lip(X) is dense in C(X).
Theorem. If X is a compact metric space, then C(X) is separable.

Proof. It suffices to show that lip(X) is separable. (Why?) To see this, first notice that
lip(X) = ⋃_{K=1}^∞ E_K, where

    E_K = {f ∈ C(X) : ‖f‖ ≤ K and f ∈ lip_K(X)}.

(Why?) The sets E_K are (uniformly) bounded and equicontinuous. Hence, by the Arzelà-
Ascoli theorem, each E_K is compact in C(X). Since compact sets are separable, as are
countable unions of compact sets, it follows that lip(X) is separable.
As it happens, the converse is also true (which is why this is interesting); see Folland's
Real Analysis for more details.
Theorem. If C(X) is separable, where X is a compact Hausdorff topological space, then
X is metrizable.
A Short List of References
Books
Abramowitz, M. and Stegun, I., eds., Handbook of Mathematical Functions with For-
mulas, Graphs, and Mathematical Tables, Dover, 1965.
Birkhoff, G., A Source Book in Classical Analysis, Harvard, 1973.
Buck, R. C., ed., Studies in Modern Analysis, MAA, 1962.
Carothers, N. L., Real Analysis, Cambridge, 2000.
Cheney, E. W., Introduction to Approximation Theory, Chelsea, 1982.
Davis, P. J., Interpolation and Approximation, Dover, 1975.
DeVore, R. A. and Lorentz, G. G., Constructive Approximation, Springer-Verlag,
1993.
Dudley, R. M., Real Analysis and Probability, Wadsworth & Brooks/Cole, 1989.
Folland, G. B., Real Analysis: modern techniques and their applications, Wiley, 1984.
Fox, L. and Parker, I. B., Chebyshev Polynomials, Oxford University Press, 1968.
Hoffman, K., Analysis in Euclidean Space, Prentice-Hall, 1975.
Jackson, D., Theory of Approximation, AMS, 1930.
Jackson, D., Fourier Series and Orthogonal Polynomials, MAA, 1941.
Körner, T. W., Fourier Analysis, Cambridge, 1988.
Korovkin, P. P., Linear Operators and Approximation Theory, Hindustan Publishing,
1960.
La Vallée Poussin, Ch.-J. de, Leçons sur l'Approximation des Fonctions d'une Vari-
able Réelle, Gauthier-Villars, 1919.
Lorentz, G. G., Bernstein Polynomials, Chelsea, 1986.
Lorentz, G. G., Approximation of Functions, Chelsea, 1986.
Natanson, I., Constructive Function Theory, 3 vols., Ungar, 1964-1965.
Powell, M. J. D., Approximation Theory and Methods, Cambridge, 1981.
Rivlin, T. J., An Introduction to the Approximation of Functions, Dover, 1981.
Rivlin, T. J., The Chebyshev Polynomials, Wiley, 1974.
Rudin, W., Principles of Mathematical Analysis, 3rd. ed., McGraw-Hill, 1976.
Simmons, G. F., Introduction to Topology and Modern Analysis, McGraw-Hill, 1963;
reprinted by Robert E. Krieger Publishing, 1986.
Articles
Boas, R. P., "Inequalities for the derivatives of polynomials," Mathematics Magazine,
42 (1969), 165-174.
Fisher, S., "Quantitative approximation theory," The American Mathematical Monthly,
85 (1978), 318-332.
Hedrick, E. R., "The significance of Weierstrass's theorem," The American Mathemat-
ical Monthly, 20 (1913), 211-213.
Jackson, D., "The general theory of approximation by polynomials and trigonometric
sums," Bulletin of the American Mathematical Society, 27 (1920-1921), 415-431.
Lebesgue, H., "Sur l'approximation des fonctions," Bulletin des Sciences Mathématiques,
22 (1898), 278-287.
Shields, A., "Polynomial approximation," The Mathematical Intelligencer, 9 (1987),
No. 3, 5-7.
Shohat, J. A., "On the development of functions in series of orthogonal polynomials,"
Bulletin of the American Mathematical Society, 41 (1935), 49-82.
Stone, M. H., "Applications of the theory of Boolean rings to general topology," Trans-
actions of the American Mathematical Society, 41 (1937), 375-481.
Stone, M. H., "A generalized Weierstrass theorem," in Studies in Modern Analysis, R.
C. Buck, ed., MAA, 1962.
Van Vleck, E. B., "The influence of Fourier's series upon the development of mathe-
matics," Science, 39 (1914), 113-124.
Weierstrass, K., "Über die analytische Darstellbarkeit sogenannter willkürlicher Func-
tionen einer reellen Veränderlichen," Sitzungsberichte der Königlich Preussischen
Akademie der Wissenschaften zu Berlin, (1885), 633-639, 789-805.
Weierstrass, K., "Sur la possibilité d'une représentation analytique des fonctions
dites arbitraires d'une variable réelle," Journal de Mathématiques Pures et Appliquées,
2 (1886), 105-138.