
Approximations:

From symbolic to
numerical computation,
and applications
Master d'informatique fondamentale
École normale supérieure de Lyon
Fall-winter 2013
Nicolas Brisebarre Bruno Salvy

http://www.ens-lyon.fr/LIP/AriC/M2R/ASNA
Chapter 1
Introduction
The classical presentation of mathematical methods usually leaves out the problems of actually
getting numerical values. In practice, a compromise between accuracy and efficiency is desirable,
and this turns out to require the development of specific (and often very nice) algorithms. Our
aim in this course is to exhibit the interplay between symbolic and numerical computation in order
to achieve computations that are as precise (or guaranteed, or proved) as possible, and fast! (At least in a
number of problems.)

1.1 From symbolic to numerical computations

1.1.1 Examples of numerical instability

Example 1.1. The Fibonacci recurrence.


It is classical that the recurrence

  u_{n+2} = u_{n+1} + u_n

admits a solution of the form

  u_n = a φ^n + b φ̄^n,  with  φ = (1 + √5)/2  and  φ̄ = (1 − √5)/2 = −1/φ,

the solutions of the characteristic polynomial x² − x − 1.

The values of a and b are dictated by the initial conditions. The classical Fibonacci sequence
is obtained with u_0 = 0 and u_1 = 1, leading to a = −b = 1/√5, so that in particular u_n → ∞ when
n → ∞, since a > 0 and |φ| > 1. On the opposite side, the sequence obtained with a = 0, b = 1, or
equivalently u_0 = 1, u_1 = φ̄, is (φ̄^n), whose terms tend to 0 when n → ∞ since |φ̄| < 1. In practice
however, this phenomenon is very difficult to observe, as the following experiments show.
Maple 1] phi:=[solve(x^2=x+1,x)];

[1/2 √5 + 1/2, 1/2 − 1/2 √5]
First, a purely numerical experiment: we compute φ̄ numerically and use it to get the first 50 values:
Maple 2] map(evalf,phi);
[1.618033988, −0.6180339880]
Maple 3] phi2:=%[2];
−0.6180339880
Maple 4] N:=50:
Maple 5] u[0]:=1:u[1]:=phi2:for i from 0 to N-2 do u[i+2]:=u[i]+u[i+1] od:
Maple 6] L:=[seq(u[i],i=0..N)];


[1, −0.6180339880, 0.3819660120, −0.2360679760, 0.1458980360, −0.0901699400, 0.0557280960,
−0.0344418440, 0.0212862520, −0.0131555920, 0.0081306600, −0.0050249320, 0.0031057280,
−0.0019192040, 0.0011865240, −0.0007326800, 0.0004538440, −0.0002788360, 0.0001750080,
−0.0001038280, 0.0000711800, −0.0000326480, 0.0000385320, 0.0000058840, 0.0000444160,
0.0000503000, 0.0000947160, 0.0001450160, 0.0002397320, 0.0003847480, 0.0006244800,
0.0010092280, 0.0016337080, 0.0026429360, 0.0042766440, 0.0069195800, 0.0111962240,
0.0181158040, 0.0293120280, 0.0474278320, 0.0767398600, 0.1241676920, 0.2009075520,
0.3250752440, 0.5259827960, 0.8510580400, 1.377040836, 2.228098876, 3.605139712,
5.833238588, 9.438378300]
Here is a plot of these values:
Maple 7] plots[listplot]([seq(u[i],i=0..N)]):

The problem is that the numerical error introduced when replacing φ̄ by a 10-digit approximation
amounts to having a very small, but nonzero, value for a. At first, this goes unnoticed, but
eventually, since φ^n tends to infinity, the part in a φ^n overtakes the part in φ̄^n.
A natural solution is to work exactly, starting with a symbolic value for φ̄ and reproducing the
same steps using symbolic computation:
Maple 8] phi2:=phi[2];

1/2 − 1/2 √5
Maple 9] u[0]:=1:u[1]:=phi2: for i from 0 to N do u[i+2]:=u[i]+u[i+1]
od:L:=[seq(u[i],i=0..N)];
[1, 1/2 − 1/2 √5, 3/2 − 1/2 √5, 2 − √5, 7/2 − 3/2 √5, 11/2 − 5/2 √5, 9 − 4 √5, 29/2 − 13/2 √5,
47/2 − 21/2 √5, 38 − 17 √5, 123/2 − 55/2 √5, 199/2 − 89/2 √5, 161 − 72 √5, 521/2 − 233/2 √5,
843/2 − 377/2 √5, 682 − 305 √5, 2207/2 − 987/2 √5, 3571/2 − 1597/2 √5, 2889 − 1292 √5,
9349/2 − 4181/2 √5, 15127/2 − 6765/2 √5, 12238 − 5473 √5, 39603/2 − 17711/2 √5,
64079/2 − 28657/2 √5, 51841 − 23184 √5, 167761/2 − 75025/2 √5, 271443/2 − 121393/2 √5,
219602 − 98209 √5, 710647/2 − 317811/2 √5, 1149851/2 − 514229/2 √5, 930249 − 416020 √5,
3010349/2 − 1346269/2 √5, 4870847/2 − 2178309/2 √5, 3940598 − 1762289 √5,
12752043/2 − 5702887/2 √5, 20633239/2 − 9227465/2 √5, 16692641 − 7465176 √5,
54018521/2 − 24157817/2 √5, 87403803/2 − 39088169/2 √5, 70711162 − 31622993 √5,
228826127/2 − 102334155/2 √5, 370248451/2 − 165580141/2 √5, 299537289 − 133957148 √5,
969323029/2 − 433494437/2 √5, 1568397607/2 − 701408733/2 √5, 1268860318 − 567451585 √5,
4106118243/2 − 1836311903/2 √5, 6643838879/2 − 2971215073/2 √5, 5374978561 − 2403763488 √5,
17393796001/2 − 7778742049/2 √5, 28143753123/2 − 12586269025/2 √5]

However, a new difficulty occurs:


Maple 10] plots[listplot]([seq(u[i],i=0..N)]):

Again, the values eventually explode, although we have exact values all along. The reason for this
lies in the numerical evaluation of the large coefficients involved in the exact expression. This can
be seen by evaluating both terms of each value separately:
Maple 11] u[50];
28143753123/2 − 12586269025/2 √5
Maple 12] A:=[op(%)];
[28143753123/2, −12586269025/2 √5]
Maple 13] evalf(A);
[14071876560.0, −14071876560.0]
Thus, in this case, increasing the precision of the numerical evaluation is sufficient:
Maple 14] evalf(u[50],20);
0.0
Maple 15] evalf(u[50],30);
3.55318637 × 10^(−11)
Note that since both summands in the expression grow like φ^n while we are computing a value that
decreases like φ̄^n, the number of required digits grows linearly with n, making such a computation
costly.
The behaviour of this sequence is by no means an isolated accident. Every time a sequence has
an asymptotic behaviour which is not the dominating one, its direct evaluation presents this kind
of difficulty. A simple and efficient way to compute such sequences will be presented in Chapter 8.

Example 1.2. The Airy function.


This is a classical special function, with many applications in asymptotic analysis and mathematical
physics (it is related to the location of the supernumerary rays that are sometimes visible
underneath a rainbow). It can be defined by the following equations:

  y''(x) − x y(x) = 0,   y(0) = 1/(3^{2/3} Γ(2/3)),   y'(0) = −3^{1/6} Γ(2/3)/(2π).
For very similar reasons, solving this equation numerically by a scheme like Euler's or Runge–Kutta
is bound to explode and fail to capture the true behaviour of this function, which tends to 0 as
x → +∞.
Maple 16] deq:={diff(y(x),x,x)-x*y(x)=0,y(0)=3^(1/3)/3/GAMMA(2/3),D(y)(0)=-3^(1/
6)*GAMMA(2/3)/2/Pi};
6 Introduction

{ d²/dx² y(x) − x y(x) = 0,  y(0) = 3^{1/3}/(3 Γ(2/3)),  D(y)(0) = −3^{1/6} Γ(2/3)/(2π) }

Maple 17] dsolve(deq,y(x));


y(x) = Ai(x)
Maple recognizes the solution as the Airy function Ai, which it knows about, and can plot it from there:
Maple 18] plot(rhs(%),x=-10..10):

But the numerical solver ends up exploding:


Maple 19] plots[odeplot](dsolve(deq,y(x),numeric),x=-10..10,color=black):

Note that apart from this region where x is large, the numerical solver behaves very well, as we
can see by superposing both curves:
Maple 20] plots[display](%%,%):

Here, this behaviour is very unfortunate: this function has been isolated and given a name by
mathematical physicists precisely because it has a mild behaviour at +∞. It is therefore necessary
to find other ways for its evaluation. An efficient approach to the guaranteed computation of such
functions with high precision will be presented in Chapter 3.

1.1.2 Efficiency issues


A typical example is provided by the following equation

  y''(x) + y(x) = 0,   y(0) = 0,   y'(0) = 1,

that defines the sine function. Suppose we want to evaluate this function numerically on the interval
[0, π] with, say, absolute error bounded by ε = 10^{−10}.
It is easy to compute many terms of the Taylor expansion of sin and make sure that the error
committed in truncating the power series is negligible compared to ε. However, approximation by
the Taylor expansion is usually not the most efficient way to approximate such a function. We will
see in Chapter 2 another approach based on:
1. computing a Chebyshev series symbolically instead of a Taylor series (i.e., expanding on the
basis of Chebyshev polynomials (T_n(x)) rather than the power basis (x^n));
2. evaluating this series numerically at well chosen points;
3. computing a polynomial of small degree interpolating these points, while constraining the
errors to be smaller than ε;
4. (optionally) using a process called economization to compute a polynomial that is even
cheaper to evaluate.
All these steps are motivated by the fact that evaluating a polynomial can be rather efficient.
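As a rough preview of these techniques, Maple's numapprox package provides Chebyshev series expansions; a minimal sketch (the function, interval and tolerance are chosen arbitrarily for illustration):

  with(numapprox):
  Digits := 15:
  # truncated Chebyshev series of sin on [0, Pi]: terms whose
  # coefficients are below the requested tolerance are discarded
  s := chebyshev(sin(x), x = 0 .. Pi, 1e-10);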

1.2 Approximations
We now list a few of the questions dealt with in this course, whose natural habitat lies sometimes
within approximation theory and sometimes within symbolic computation.

- Compute the first 1000 digits of π, ln 2, √7, exp(−10), ... (see Chapter 3);
- Compute the floating-point number in the IEEE standard that is closest to these numbers;
- Compute the first 1000 Taylor coefficients of

    1/(1 − x − x²),   arcsin(x),   sin(tan(x)) − tan(sin(x)),

  or of the solutions of

    y(x) = 1 + x y(x)⁵,   y(x) = x + x log(1/(1 − y(x))),   x² y''(x) + x y'(x) + (x² − 1) y(x) = 0

  (Efficient algorithms exist in computer algebra. A general purpose approach is described in
  Chapter 3 and a faster one for all but the second and fourth examples is in Chapter 3.)
- Compute a polynomial P of minimal degree such that

    |f(x) − P(x)| < 10^{−15} for all x ∈ [0, 1/4],

  for each of the functions above (see Chapter 2);
- Conversely, given a function f and a polynomial P, compute a bound on |f − P| over such
  an interval (Chapter 2);
- Polynomials are not very good at approximating functions that have poles at or near the
  endpoints of the interval. It is therefore natural to ask the same questions with rational
  functions instead of polynomials, minimizing the sum of the degrees of the numerator and
  denominator (Chapters 4 and 5);
- Same questions when minimizing

    ∫₀^{1/4} (f(t) − P(t))² dt

  instead;
8 Introduction

- Given (x₁, y₁), ..., (x_n, y_n), compute a polynomial of minimal degree (or a rational function
  with minimal sum of degrees) such that P(x_i) = y_i, i = 1, ..., n;
- Same question with a fixed degree, minimizing

    Σ_{i=1}^n |P(x_i) − y_i|   or   Σ_{i=1}^n |P(x_i) − y_i|²;

- Same question with a given f, with y_i = f(x_i), and now the aim is to minimize |f(x) − P(x)|
  for x ∈ [a, b] or

    ∫_a^b |f(t) − P(t)|² dt;

- Same question if the choice of x₁, ..., x_n is free;
- Compute these f(x_i) when f is given by a linear differential equation (end of the course);
- Compute integrals, zeroes of functions, ...
For all these questions, the objects of study will be the existence and uniqueness of solutions, the
discovery of iterations converging to them or other means of computing them, and their efficiency.
Proofs of irrationality are sometimes not very far.

1.3 Symbolic Computation


We want to use symbolic computation tools to help produce good numerical approximations to
functions. In the setting of approximation theory, the input is often given as a function f continuous
(say) over an interval [a, b]. In the setting of symbolic computation, it is necessary to be
much more precise concerning the way this function and this interval are given, as well as to take into
account constraints due to the algorithms. We thus conclude this first course by a brief introduction
to symbolic computation.
A fundamental issue is that not all mathematics can be decided by a computer. An important
result in this area is the following.

Theorem 1.3. [Richardson–Matiyasevich] There cannot exist an algorithm taking as input any
function of one variable f(x) built by repeated composition from the constant 1, addition, subtraction,
multiplication, the functions sine and absolute value, and deciding whether f = 0 or not.

This result restricts the class of functions that can be handled in symbolic computation systems.
It also implies that simplification is at best a collection of useful heuristics. The way out of this
undecidability result is to stay within algebraic constructions that preserve decidability.

1.3.1 Representation of mathematical objects


We call effective those sets of mathematical objects that can be dealt with exactly, meaning that
there is a representation for which the arithmetic operations and the test for equality to 0 are given
by algorithms. Examples are: machine integers (usually Z/2⁶⁴Z or Z/2³²Z); integers of arbitrary
size; vectors and matrices over an effective ring; polynomials in one or several variables over such
a ring; rational functions; truncated power series.
An important idea that enlarges the scope of symbolic computation is that equations are a
data structure for their solutions. Thus, using univariate polynomials as a data structure lets one
manipulate algebraic numbers and therefore work in the algebraic closure of any effective field
(the algorithms are nothing but Euclid's algorithm and the computation of resultants). A typical
exercise in this thread is to prove automatically the following beautiful identity:

  sin²(2π/7)/sin²(3π/7) − sin²(π/7)/sin²(2π/7) + sin²(3π/7)/sin²(π/7) = 2√7.

Similarly, and closer to the aim of this course, the solutions of linear differential equations with
polynomial coefficients over an effective field enjoy a large number of closure properties made
effective by simple algorithms that will be presented in Chapter 3. Not only can one prove automatically
identities like sin² + cos² = 1, but this also gives effective access to various
operations with special functions and orthogonal polynomials, thereby providing us with a large
set of effective functions that require approximation.

1.3.2 Efficiency

One minute. The basic algorithms of symbolic computation are extremely efficient. Thus, in one minute of
cpu time of a typical modern computer, it is possible to compute:
- the product of integers with 500,000,000 digits;
- the factorial of 20,000,000 (the result has roughly 140·10⁶ digits);
- (by contrast, only the factorization of a 45-digit number can be done within this time);
- the product of polynomials in K[x] of degree 14·10⁶ (where K = Z/pZ with p = 67,108,879
  a 26-bit long prime number);
- the gcd of polynomials of K[x] of degree 600,000;
- the determinant of matrices of size 4,500 × 4,500 over K;
- the determinant of matrices of size 700 × 700 if their entries are 32-bit integers.
Complexity models. A simple means of assessing the efficiency of algorithms is to analyze their complexity. This
entails selecting a complexity model that defines precisely what operations are counted in the
analysis. One such model that is commonly used is the RAM model (for Random Access Machine).
In this model, the machine has one tape where it reads its input (one integer per cell of the tape);
another tape where it writes its output; and a tape for its computations. It has an arbitrary number
of registers, and the operations that are counted at unit cost are: reading or writing a cell; adding,
multiplying, subtracting, dividing integers; and jumps, either unconditional or depending
on tests of the type = 0 or > 0 on the value of a register.
The complexity measured in this model is called the arithmetic complexity. While it is a good
measure of time when working in settings where the sizes of the integers are all similar, for instance
for polynomials or matrices over a finite field, this model does not predict time correctly when
large integers come into play. A variant of this model is then used, where the cells can only
contain one bit (0 or 1) and the operations also act on single bits. The complexity measured
in this model is called bit complexity.
Asymptotic estimates. On modern computers, a computation that takes more than 10 seconds is usually already
spending its time in the part that dominates the other ones asymptotically, and thus fair predictions
of execution time can be obtained by a first-order asymptotic estimate of the complexity. In each
case it is of course necessary to specify the complexity model (arithmetic or bit complexity in this
course).
For instance, the computation of n! requires:
- O(n² log² n) bit operations by the naïve algorithm, and only O(n) arithmetic operations;
- O(√n log n) arithmetic operations with the best algorithm currently known in terms of
  arithmetic operations;
- O(n log³ n loglog n) bit operations, with an algorithm presented in Chapter 3, which explains
  the speed displayed above.
Integers and polynomials

The starting point for fast algorithms in symbolic computation is fast multiplication. We use
M(N) to denote the arithmetic complexity of the product of two polynomials of degree bounded
by N in one variable. Then,

- M(N) = O(N²) by the naïve algorithm;
- M(N) = O(N^{log₂ 3}) by Karatsuba's algorithm;
- M(N) = O(N log N loglog N) by fast Fourier transform (FFT).

Thus, asymptotically, multiplication is not much slower than addition.


Similarly, we use M_Z(N) to denote the complexity of the product of two integers of at most N
bits. The algorithms are a bit more intricate because of the need to deal with carry propagation,
but the end result is almost the same:

- M_Z(N) = O(N²) by the naïve algorithm;
- M_Z(N) = O(N^{log₂ 3}) by Karatsuba's algorithm;
- M_Z(N) = O(N log N loglog N) by fast Fourier transform (FFT).

These will be the building blocks for fast algorithms in the following chapters. In particular, it is
important to keep in mind the following complexity estimates:
- For power series, products, inverses, square roots, and more generally solutions of polynomial
  equations, can be computed in O(M(N)) arithmetic operations;
- For polynomials, gcd, multipoint evaluation (evaluation of a polynomial of degree N at N
  points) and interpolation can be computed in O(M(N) log N) arithmetic operations.
Chapter 2
Polynomial approximations
In this chapter, we present various theoretical and algorithmic results regarding polynomial approximations
of functions. We will mainly deal with real-valued continuous functions over a compact
interval [a, b], a, b ∈ R, a ≤ b. We will denote by C([a, b]) the real vector space of continuous functions
over [a, b]. In the framework of function evaluation, one usually works with the following two norms
over this vector space:
- the least-squares norm L²: given a nonnegative weight function w ∈ C([a, b]), if dx denotes
  the Lebesgue measure, we write g ∈ L²([a, b], w, dx) if

    ∫_a^b w(x) |g(x)|² dx < ∞,

  and then we define

    ‖g‖_{2,w} = ( ∫_a^b w(x) |g(x)|² dx )^{1/2};

- the supremum norm (aka Chebyshev norm, infinity norm, L∞ norm): if g is bounded
  on [a, b], we set

    ‖g‖∞ = sup_{x∈[a,b]} |g(x)|

  (observe that for a continuous function g, we have ‖g‖∞ = max_{x∈[a,b]} |g(x)|).

For both norms, one of the main questions we are interested in here is the following.

Question. Given f ∈ C([a, b]) and n ∈ N, minimize ‖f − p‖ where p describes the space R_n[x] of
polynomials with real coefficients and degree at most n.

In the L² case, the answer to this question is easy. The space C([a, b]) is a subset of L²([a, b],
w, dx), which is a Hilbert space, i.e. a vector space equipped with an inner product

  ⟨f, g⟩ = ∫_a^b f(x) g(x) w(x) dx,

such that ‖·‖_{2,w} is the associated norm, for which L²([a, b], w, dx) is complete. The best polynomial
approximation of degree at most n is the projection p = pr(f) of f onto R_n[x]. We will give more
details on the L² case in Chapter 7. The situation in the L∞ case is more intricate and we will focus
on it in the sequel of this chapter.

2.1 Density of the polynomials in (C([a, b]), ‖·‖∞)

For all f ∈ C([a, b]) and n ∈ N, let

  E_n(f) = inf_{p∈R_n[x]} ‖f − p‖∞.


We first show that E_n(f) → 0 as n → ∞, a result due to Weierstraß (1885). Various proofs
of this result have been published, in particular those by Runge (1885), Picard (1891), Lerch
(1892 and 1903), Volterra (1897), Lebesgue (1898), Mittag-Leffler (1900), Fejér (1900 and 1916),
Landau (1908), de la Vallée Poussin (1908), Jackson (1911), Sierpinski (1911), Bernstein (1912),
Montel (1918). The text [16] is an interesting account of Weierstraß' contribution to Approximation
Theory and, in particular, of his fundamental result on the density of polynomials in C([a, b]).
We now give one proof inspired by Bernstein's.

Theorem 2.1. [Weierstraß, 1885] For all f ∈ C([a, b]) and for all ε > 0, there exist n ∈ N and p ∈ R_n[x]
such that ‖p − f‖∞ < ε.

Proof. Up to a change of variable, we can assume [a, b] = [0, 1]. Define the Bernstein polynomials as

  B_n(g, x) = Σ_{k=0}^n g(k/n) C(n,k) x^k (1−x)^{n−k}   for g ∈ C([0, 1]),

where C(n,k) denotes the binomial coefficient.
We have

  B_n(1, x) = Σ_{k=0}^n C(n,k) x^k (1−x)^{n−k} = 1,

  B_n(x, x) = Σ_{k=0}^n (k/n) C(n,k) x^k (1−x)^{n−k} = x Σ_{k=1}^n C(n−1, k−1) x^{k−1} (1−x)^{n−k}
            = x Σ_{k=0}^{n−1} C(n−1, k) x^k (1−x)^{n−1−k} = x,

  B_n(x², x) = Σ_{k=0}^n (k/n)² C(n,k) x^k (1−x)^{n−k} = x Σ_{k=1}^n (k/n) C(n−1, k−1) x^{k−1} (1−x)^{n−k}
             = x Σ_{k=1}^n ((k−1)/n) C(n−1, k−1) x^{k−1} (1−x)^{n−k} + x/n
             = ((n−1)/n) x² Σ_{k=2}^n C(n−2, k−2) x^{k−2} (1−x)^{n−k} + x/n = x/n + x² (n−1)/n.

Now consider the sequence

  f(x) − B_n(f, x) = Σ_{k=0}^n (f(x) − f(k/n)) b_{n,k}(x),   where b_{n,k}(x) = C(n,k) x^k (1−x)^{n−k}.

Fix ε > 0. The function f is continuous and hence uniformly continuous over [0, 1], hence there
exists δ > 0 such that

  ∀ x₁, x₂ ∈ [0, 1],  |x₂ − x₁| < δ  ⟹  |f(x₂) − f(x₁)| < ε.

Let M = max_{x∈[0,1]} |f(x)|. Since b_{n,k}(x) ≥ 0 for all x ∈ [0, 1], we can write

  |f(x) − B_n(f, x)| ≤ Σ_{k: |x−k/n|<δ} |f(x) − f(k/n)| b_{n,k}(x) + Σ_{k: |x−k/n|≥δ} |f(x) − f(k/n)| b_{n,k}(x)
                     ≤ ε Σ_{k=0}^n b_{n,k}(x) + 2M Σ_{k: |x−k/n|≥δ} b_{n,k}(x) = ε + 2M Σ_{k: |x−k/n|≥δ} b_{n,k}(x).

Note that we actually have

  Σ_{k: |x−k/n|≥δ} b_{n,k}(x) ≤ Σ_{k: |x−k/n|≥δ} ((x − k/n)/δ)² b_{n,k}(x) ≤ x(1−x)/(n δ²).

Therefore, we obtained |f(x) − B_n(f, x)| ≤ ε + M/(2 n δ²). The upper bound does not depend on x and
can be made as small as desired. □

Remark 2.2. One of the very nice features of this proof is that it provides an explicit sequence of
polynomials which converges to the function f. It is worth mentioning that Bernstein polynomials
prove useful in various other domains (computer graphics, global optimization, ...). See [7] for
instance.
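For illustration, the Bernstein operator is a one-liner in Maple (a sketch; f is passed as a procedure on [0, 1]):

  bernstein_poly := proc(f, n, x)
    # B_n(f, x) = sum of f(k/n)*C(n,k)*x^k*(1-x)^(n-k) for k = 0..n
    expand(add(f(k/n)*binomial(n, k)*x^k*(1 - x)^(n - k), k = 0 .. n));
  end proc:
  # slow, but guaranteed, uniform convergence:
  plot([abs(x - 1/2), bernstein_poly(t -> abs(t - 1/2), 30, x)], x = 0 .. 1);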

Note that, in the proof, we only used the values of B_n(g, x) for the monomials g = x^j, 0 ≤ j ≤ 2.
In fact, we have the following result.

Theorem 2.3. (Bohman and Korovkin) Let (L_n) be a sequence of monotone linear operators on
C([a, b]), that is to say, for all f, g ∈ C([a, b]):
- L_n(αf + βg) = α L_n(f) + β L_n(g) for all α, β ∈ R;
- if f(x) ≥ g(x) for all x ∈ [a, b], then (L_n f)(x) ≥ (L_n g)(x) for all x ∈ [a, b].
Then the following conditions are equivalent:
i. L_n f → f uniformly for all f ∈ C([a, b]);
ii. L_n f → f uniformly for the three functions x ↦ 1, x, x²;
iii. L_n 1 → 1 and (L_n φ_t)(t) → 0 uniformly in t ∈ [a, b], where φ_t : x ∈ [a, b] ↦ (t − x)².

Proof. See [4]. □

A refinement of Weierstraß' theorem that gives the speed of convergence is obtained in terms
of the modulus of continuity.

Definition 2.4. The modulus of continuity of f is the function ω defined by

  ω(δ) = sup_{|x−y|<δ, x,y∈[a,b]} |f(x) − f(y)|   for all δ > 0.

Proposition 2.5. If f is a continuous function over [0, 1] and ω its modulus of continuity, then

  ‖f − B_n(f, ·)‖∞ ≤ (9/4) ω(n^{−1/2}).

Proof. Let δ > 0 and x ∈ [0, 1]. Let k ∈ {0, ..., n} be such that |x − k/n| ≤ δ; then |f(x) − f(k/n)| ≤ ω(δ).
Since b_{n,k}(y) ≥ 0 for all y ∈ [0, 1], we have

  Σ_{k: |x−k/n|<δ} |f(x) − f(k/n)| b_{n,k}(x) ≤ ω(δ) Σ_{k=0}^n b_{n,k}(x) = ω(δ).

Now, let k ∈ {0, ..., n} be such that |x − k/n| > δ. Let M_k = ⌊|x − k/n|/δ⌋ and let
y_j = x + j/(M_k + 1) · (k/n − x) for j = 0, ..., M_k + 1. Note that, for all j = 0, ..., M_k, we have
|y_{j+1} − y_j| < δ, from which follows

  |f(x) − f(k/n)| ≤ Σ_{j=0}^{M_k} |f(y_{j+1}) − f(y_j)| ≤ (M_k + 1) ω(δ)
                 ≤ ω(δ) (1 + |x − k/n|/δ) ≤ ω(δ) (1 + (x − k/n)²/δ²).
n n

For all x ∈ [0, 1], we can write

  |f(x) − B_n(f, x)| ≤ Σ_{k: |x−k/n|<δ} |f(x) − f(k/n)| b_{n,k}(x) + Σ_{k: |x−k/n|≥δ} |f(x) − f(k/n)| b_{n,k}(x)
    ≤ ω(δ) + ω(δ) Σ_{k: |x−k/n|≥δ} (1 + (x − k/n)²/δ²) b_{n,k}(x)
    ≤ ω(δ) (2 + (1/δ²) Σ_{k=0}^n (x − k/n)² b_{n,k}(x))
    ≤ ω(δ) (2 + x(1−x)/(n δ²)) ≤ ω(δ) (2 + 1/(4 n δ²)).

Finally, replace δ with 1/√n. □

Remark 2.6. This result is not optimal. For improvements and refinements, see Section 4.6 of
[4] or Chapter 16 of [17] for a presentation of Jackson's theorems.

Corollary 2.7. When f is Lipschitz continuous, E_n(f) = O(n^{−1/2}).

2.2 Best L∞ (or minimax) approximation

The infimum E_n(f) is attained, thanks to the following proposition.

Proposition 2.8. Let (E, ‖·‖) be a normed R-vector space and let F be a finite-dimensional subspace
of (E, ‖·‖). For all f ∈ E, there exists p* ∈ F such that ‖p* − f‖ = min_{q∈F} ‖q − f‖. Moreover, the
set of best approximations to a given f ∈ E is convex.

Proof. Let f ∈ E. Consider F₀ = {p ∈ F : ‖p‖ ≤ 2‖f‖}. Then F₀ is nonempty (it contains 0), closed,
bounded, and we assumed dim F < ∞. Hence F₀ is compact. Let φ(p) = ‖f − p‖. The function φ is
1-Lipschitz and hence continuous. It follows that φ(F₀) is compact: there exists p* ∈ F₀ s.t. φ(p*) =
min_{p∈F₀} ‖f − p‖. Moreover, if p ∈ F \ F₀, then ‖f − p‖ ≥ ‖p‖ − ‖f‖ > ‖f‖ ≥ φ(p*) since 0 ∈ F₀. Thus,
‖f − p*‖ = min_{p∈F} ‖f − p‖.
Now, let p and q ∈ F be two best approximations to f. For all λ ∈ [0, 1], the vector λp + (1−λ)q
is an element of the vector space F and we have, from the triangle inequality, ‖λp + (1−λ)q − f‖ ≤
λ‖p − f‖ + (1−λ)‖q − f‖ = min_{r∈F} ‖r − f‖: the vector λp + (1−λ)q is also a best approximation
to f. □

The best L² approximation is unique, which is not always the case in the L∞ setting.
Exercise 2.1. Consider the following simple situation: the interval is [−1, 1], f is the constant function 1 and
F = Rg where g : x ↦ x². Determine the set of best L∞ approximations to f.

In the case of L∞, it is necessary to introduce an additional condition known as the Haar
condition.

Definition 2.9. Consider n+1 functions φ₀, ..., φ_n defined over [a, b]. We say that φ₀, ..., φ_n
satisfy the Haar condition iff
a) the φ_i are continuous;
b) and the following equivalent statements hold:
- for all x₀, x₁, ..., x_n ∈ [a, b],

    det( φ_j(x_i) )_{0≤i,j≤n} = 0   ⟹   ∃ i ≠ j, x_i = x_j;

- given pairwise distinct x₀, ..., x_n ∈ [a, b] and values y₀, ..., y_n, there exists a unique
  interpolant

    p = Σ_{k=0}^n λ_k φ_k,   with λ_k ∈ R, k = 0, ..., n,

  such that p(x_i) = y_i;
- any p = Σ_{k=0}^n λ_k φ_k ≠ 0 has at most n distinct zeros.

Exercise 2.2. Prove that the conditions above are equivalent.

A set of functions that satisfies the Haar condition is called a Chebyshev system. The prototype
example is φ_i(x) = x^i, for which we have

  det( x_i^j )_{0≤i,j≤n} = V_n = ∏_{0≤i<j≤n} (x_j − x_i).   (2.1)

(Proof: considering x_n = z as an indeterminate and looking at the roots of the polynomial V_n, we
see that V_n = V_{n−1} (z − x₀) ⋯ (z − x_{n−1}).)
Exercise 2.3. Show that the following families of functions are Chebyshev systems as well:
- {e^{λ₀x}, ..., e^{λ_n x}} for λ₀ < λ₁ < ⋯ < λ_n;
- {1, cos x, sin x, ..., cos(n x), sin(n x)} over [a, b] where 0 ≤ a < b < 2π;
- {x^{λ₀}, ..., x^{λ_n}}, λ₀ < ⋯ < λ_n, over [a, b] with a > 0.

Let E be a real vector space and e₁, e₂, ..., e_m ∈ E; we will denote by Span_R{e₁, ..., e_m} the set
{Σ_{k=1}^m λ_k e_k : λ₁, ..., λ_m ∈ R}. If {φ₀, ..., φ_n} is a Chebyshev system over [a, b], any element of
Span_R{φ₀, ..., φ_n} will be called a generalized polynomial.

Theorem 2.10. [Alternation Theorem. Chebyshev? Borel (1905)? Kirchberger (1902)?] Let
{φ₀, ..., φ_n} be a Chebyshev system over [a, b]. Let f ∈ C([a, b]). A generalized polynomial
p = Σ_{k=0}^n λ_k φ_k is the best approximation (or minimax approximation) to f iff there exist n+2
points x₀, ..., x_{n+1}, a ≤ x₀ < x₁ < ⋯ < x_{n+1} ≤ b, such that, for all k,

  f(x_k) − p(x_k) = (−1)^k (f(x₀) − p(x₀)) = ±‖f − p‖∞.

In other words, p is the best approximation if and only if the error function f − p has n+2 extrema,
all global (of the same absolute value) and with alternating signs.
Example 2.11. Let f : x ∈ [0, 1] ↦ e^{1/cos(x)} and p = Σ_{k=0}^{10} c_k x^k its minimax approximation.
The graph of the error function ε = f − p is: [graph omitted]

Example 2.12. Let f : x ∈ [−0.9, 0.9] ↦ arctan(x) and p = Σ_{k=0}^{15} c_k x^k its minimax approximation.
The graph of the error function ε = f − p is: [graph omitted]

Example 2.13. The best approximation to cos over [0, 10π] on the Chebyshev system {1, x, x²}
is the constant function 0! Moreover, the same is true for {1, x, ..., x^h} up to and including h = 10.

Proof. We can assume that f ∉ Span_R{φ₀, ..., φ_n}.

We already proved the existence of a best approximation.
We now show that the equioscillation property implies optimality of the approximation. Let
p be an approximation with equioscillating error function, and suppose that there exists q = Σ_j μ_j φ_j
with ‖f − q‖∞ < ‖f − p‖∞. Writing p − q = (p − f) − (q − f), we see that p − q changes sign between
each pair of consecutive x_i. It follows from the intermediate value theorem that there exist n+1
points y₀, ..., y_n such that x₀ < y₀ < x₁ < ⋯ < x_n < y_n < x_{n+1} and p(y_i) = q(y_i). By definition of a
Chebyshev system, this implies that p = q, a contradiction.
Conversely, optimality implies equioscillation.
For simplicity, we assume that {φ₀, ..., φ_n} = {1, x, ..., x^n}. Let p be a best approximation. First,
note that the global minimum and the global maximum of f − p must have the same absolute value:
otherwise, we could improve the approximation by shifting p by a constant. Now suppose that f − p
equioscillates at points x₀ < x₁ < ⋯ < x_μ only, with 1 ≤ μ < n+1. We choose the {x_i}_{0≤i≤μ} as follows.
The point x₀ is the smallest number in [a, b] at which |p − f| reaches its maximum: the set
A = {x ∈ [a, b] : |p(x) − f(x)| = ‖p − f‖∞} is nonempty, bounded and closed since |p − f| is continuous
and A = (|p − f|)^{−1}({‖p − f‖∞}) ∩ [a, b]; hence A is compact, and we let x₀ be the minimum of A. Likewise,
the point x₁ is defined as the smallest number in [x₀, b] at which p − f is equal to −(p − f)(x₀),
and so on and so forth.
Assume wlog that p(x₀) − f(x₀) = −‖p − f‖∞. For j = 0, ..., μ−1, let B_j = {x_j ≤ x ≤ x_{j+1} :
p(x) = f(x)}; the set B_j is nonempty since (p(x_j) − f(x_j))(p(x_{j+1}) − f(x_{j+1})) < 0, closed (p − f is
continuous and B_j = (p − f)^{−1}({0}) ∩ [x_j, x_{j+1}]) and bounded: it is a compact set, which has a
maximum y_{j+1} distinct from x_{j+1}. We remark that y₁ < y₂ < ⋯ < y_μ, in particular.
We now define Q(x) = (y₁ − x) ⋯ (y_μ − x). Note that Q(a) > 0. If we set y₀ = a and y_{μ+1} = b, let

  K₁ = [a, y₁] ∪ [y₂, y₃] ∪ ⋯ = ⋃_{k=0}^{⌊μ/2⌋} [y_{2k}, y_{2k+1}],

  K₂ = [y₁, y₂] ∪ [y₃, y₄] ∪ ⋯ = ⋃_{k=0}^{⌊(μ−1)/2⌋} [y_{2k+1}, y_{2k+2}].

The sets K₁ and K₂ are finite unions of compact sets, and hence compact. Moreover,

  −‖p − f‖∞ ≤ p − f < ‖p − f‖∞ and Q > 0 on the compact K₁,
  −‖p − f‖∞ < p − f ≤ ‖p − f‖∞ and Q ≤ 0 on the compact K₂.

Hence there exists λ ∈ (0, +∞) such that

  −‖p − f‖∞ < p + λQ − f < ‖p − f‖∞,

which contradicts the optimality of p.

Finally, let us prove the uniqueness. Let p, q be two best approximations, and let

  ρ = ‖f − p‖∞ = ‖f − q‖∞.

It follows from Proposition 2.8 that (p + q)/2 is a best approximation too. Thus there exist
t₀ < t₁ < ⋯ < t_{n+1} such that

  ((p + q)/2)(t_i) − f(t_i) = (−1)^i ρ.

Since |p(t_i) − f(t_i)| ≤ ρ and |q(t_i) − f(t_i)| ≤ ρ while their mean is ±ρ, we must have
p(t_i) − f(t_i) = q(t_i) − f(t_i) (= (−1)^i ρ) for all i = 0, ..., n+1, and hence p = q by the
Haar condition (p − q vanishes at n+2 points). □

Theorem 2.14. (de la Vallée Poussin) Let f ∈ C([a, b]). Let {φ₀, ..., φ_n} be a Chebyshev system
over [a, b], and let p ∈ Span_R{φ₀, ..., φ_n}. If there exist x₀ < x₁ < ⋯ < x_{n+1} such that p − f alternates
at the x_i, then

  min_i |f(x_i) − p(x_i)| ≤ E_n(f) ≤ ‖f − p‖∞,

where E_n(f) = inf_{q∈Span_R{φ_i}} ‖f − q‖∞.

Proof. The second inequality is obvious. If the first one does not hold, assume wlog that f(x₀) >
p(x₀). Then, if p* denotes the best approximation to f, we have, for all k = 0, ..., n+1,
(−1)^k (f(x_k) − p(x_k)) > (−1)^k (f(x_k) − p*(x_k)): the generalized polynomial p* − p changes sign
n+1 times over [a, b], which is not possible. □

Remark 2.15. The statements of Theorems 2.10 and 2.14 remain valid if [a, b] is replaced with
any compact subset of R containing at least n+2 points.

Theorem 2.16. [Haar's Unicity Theorem] Let φ₀, ..., φ_n be continuous functions over [a, b]. The
minimax approximation to a continuous function f by a generalized polynomial p = Σ_{k=0}^n λ_k φ_k is
unique for all choices of f iff {φ₀, ..., φ_n} satisfies the Haar condition.

Proof. We already proved in Theorem 2.10 that the Haar condition implies uniqueness.
For the converse, assume that {φ₀, ..., φ_n} does not satisfy the Haar condition. □

Remez (1934) proposed the following algorithm to approximate the minimax polynomial.

Algorithm 2.1
Remez's first algorithm
Input. A segment [a, b], a function f ∈ C([a, b]), a Chebyshev system {φ_k}_{0≤k≤n}, a tolerance ε.
Output. An approximation of the best approximation of f on the system {φ_k}.
1. Choose n+2 points x₀ < x₁ < ⋯ < x_{n+1} in [a, b]; δ ← 1, λ ← 0.
2. While δ > ε do
a. Solve for a₀, ..., a_n and λ the linear system

  Σ_{k=0}^n a_k φ_k(x_j) − f(x_j) = (−1)^j λ,   j = 0, ..., n+1.

b. Choose x_new ∈ [a, b] such that

  ‖p − f‖∞ = |p(x_new) − f(x_new)|,   with p = Σ_{k=0}^n a_k φ_k.

Replace one of the x_i with x_new, in such a way that p − f alternates at the points of the
resulting sequence x₀,new, ..., x_{n+1},new. Set δ = |p(x_new) − f(x_new)| − |λ|.
3. Return p.

Proof. The de la Vallée Poussin theorem tells us that after each iteration, we
have |λ| ≤ E_n(f) ≤ |λ| + δ. □

We will not give more details concerning this algorithm. See [4] or [17].
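In Maple, a Remez-type iteration is available as numapprox:-minimax; a quick sketch of its use (degree, interval and weight are arbitrary choices here):

  with(numapprox):
  # degree-5 minimax polynomial approximation of exp on [0, 1], with
  # weight 1; the name 'err' receives the computed minimax error
  p := minimax(exp(x), x = 0 .. 1, 5, 1, 'err'):
  err;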

Theorem 2.17. Let p_k denote the value of p after k(n+2) loop turns, and let p* be such that
E_n(f) = ‖f − p*‖∞. There exists θ ∈ (0, 1) such that ‖p_k − p*‖∞ = O(θ^k).

Under mild regularity assumptions, the bound O(θ^k) can in fact be improved to O(θ^{2^k}) [25].

2.3 Polynomial interpolation


Now we restrict our study to polynomials in R_n[x].
At this stage, it seems natural to focus on techniques for computing polynomials that interpolate
functions at a given finite family of points. We can first make two observations:
- Step 2.a of Remez's algorithm requires an efficient interpolation process;
- Theorem 2.10 shows that, for all n, there exist a ≤ z₀ < z₁ < ⋯ < z_n ≤ b such that
  f(z_i) = p(z_i) for i = 0, ..., n, where p is the minimax approximation of f: the polynomial
  p is an interpolation polynomial of f.

Let A be a commutative ring (with unity). Given pairwise distinct x₀, ..., x_n ∈ A and corresponding
y₀, ..., y_n ∈ A, the interpolation problem is to find p ∈ A_n[x] such that p(x_i) = y_i for all i.
Write p = Σ_k a_k x^k. The problem can be restated as

  V a = y   (2.2)

where V is a Vandermonde matrix. If det V is invertible, there is a unique solution.
From now on we assume A = R. The expression (2.1) of the Vandermonde determinant shows
that as soon as the x_i are pairwise distinct, there is a unique solution. We now discuss several ways
to compute the interpolation polynomial.
to compute the interpolation polynomial.
Linear algebra. We could invert the system (2.2) using standard linear algebra algorithms. This
takes O(n³) operations using Gaussian elimination. In theory, the best known complexity bound
is currently O(n^ω) where ω ≈ 2.3727 (Williams). In practice, Strassen's algorithm yields a cost of
O(n^{log₂ 7}). There are issues with this approach, though:
- the problem is ill-conditioned: a small perturbation of the y_i leads to a significant perturbation
  of the solution;
- we can do better from the complexity point of view: O(n²) or even O(n log^{O(1)} n) in general,
  O(n log n) if the x_i are so-called Chebyshev nodes.

The divided-difference method. Newton's divided-difference method allows us to compute
interpolation polynomials incrementally. The idea is as follows. Let p_k ∈ R_k[x] be such that
p_k(x_i) = y_i for 0 ≤ i ≤ k ≤ n, and write

  p_{n+1}(x) = p_n(x) + a_{n+1} (x − x₀) ⋯ (x − x_n).
Then we have

  p_{n+1}(x_j) = y_j,   0 ≤ j ≤ n,
  p_{n+1}(x_{n+1}) = p_n(x_{n+1}) + a_{n+1} (x_{n+1} − x₀) ⋯ (x_{n+1} − x_n).

Given y₀, ..., y_k, we denote by [y₀, ..., y_k] the corresponding a_k. Then, we can compute the a_k using
the relation

  [y₀, ..., y_{k+1}] = ([y₁, ..., y_{k+1}] − [y₀, ..., y_k]) / (x_{k+1} − x₀).
This leads to a tree of the following shape:

                      [y₀, ..., y_n]
         [y₀, ..., y_{n−1}]        [y₁, ..., y_n]
          ...       ...          ...       ...
  [y₀] = y₀   [y₁] = y₁   ⋯   [y_{n−1}] = y_{n−1}   [y_n] = y_n

Hence, the coefficients can be computed in O(n²) operations.
The evaluation cost at a given point z is then O(n) operations in R, using the Newton form as in
the sketch below.
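A short Maple sketch of both steps (hypothetical helper names; the nodes are assumed pairwise distinct):

  divided_differences := proc(X::list, Y::list)
    # in-place computation of [y0], [y0,y1], ..., [y0,...,yn], in O(n^2)
    local n, a, i, j;
    n := nops(X);
    a := Array(1 .. n, Y);
    for j from 2 to n do
      for i from n to j by -1 do
        a[i] := (a[i] - a[i - 1])/(X[i] - X[i - j + 1]);
      end do;
    end do;
    convert(a, list);
  end proc:

  newton_eval := proc(c::list, X::list, z)
    # Horner-like evaluation of the Newton form, O(n) operations
    local p, i;
    p := c[-1];
    for i from nops(c) - 1 to 1 by -1 do
      p := p*(z - X[i]) + c[i];
    end do;
    p;
  end proc: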


Lagrange's Formula. For all j, let

  L_j(x) = ∏_{k≠j} (x − x_k)/(x_j − x_k).

Then we have deg L_j = n and L_j(x_i) = δ_{i,j} for all 0 ≤ i, j ≤ n. The polynomials L_j, 0 ≤ j ≤ n, form
a basis of R_n[x], and the interpolation polynomial p can be written

  p(x) = Σ_{i=0}^n y_i L_i(x).

Thus, writing the interpolation polynomial on the Lagrange basis is straightforward.


What about the cost of evaluating the resulting polynomial at a given point z? If we do it
naively, computing L_j(z) costs (say) 2n subtractions, 2n+1 multiplications and 1 division. The
total cost is O(n²) operations in R.
But we can also write

  p(x) = W(x) Σ_{i=0}^n y_i / ((x − x_i) W'(x_i)),   W(x) = ∏_{i=0}^n (x − x_i).

Assuming the W'(x_i) are precomputed, the cost of evaluating p(z) using this formula is only O(n)
arithmetic operations.
arithmetical operations.
Google "barycentric Lagrange interpolation" and/or see Trefethen's book [22] for more information
(this decomposition has excellent properties regarding stability issues).

2.4 Interpolation and approximation, Chebyshev polynomials


How useful is interpolation for our initial L∞ approximation problem? It turns out that the choice
of the points is critical. The more points, the better? Actually, with equidistant points, the error
can grow with the number of points (Runge's phenomenon).
Exercise 2.4. Using your computer algebra system of choice, interpolate the function

  f : x ↦ 1/(1 + 5x²)

at the points −1 + 2k/n, 0 ≤ k ≤ n, for n = 10, 15, ..., 30. Compare with f on [−1, 1].

In short, we should never use equidistant points when approximating a function by interpolation.
Are there better choices?

Theorem 2.18. [Faber] For any prescribed triangular array of nodes a ≤ x₀^{(n)} < ⋯ < x_n^{(n)} ≤ b,
n ∈ N, there exists f ∈ C([a, b]) such that the interpolation polynomials of f at these nodes do not
converge uniformly to f.

We discuss better choices below. We start with the following analogue of the Taylor–Lagrange
formula.
formula.

Theorem 2.19. Let a < x₀ < ⋯ < x_n < b, and let f ∈ C^{n+1}([a, b]). Let p ∈ R_n[x] be such that
f(x_i) = p(x_i) for all i. Then, for all x ∈ [a, b], there exists ξ_x ∈ (a, b) such that

  f(x) − p(x) = (f^{(n+1)}(ξ_x)/(n+1)!) W(x),   W(x) = ∏_{i=0}^n (x − x_i).

Proof. This is obvious when x ∈ {x_i}. Assuming x ∉ {x_i}, let φ = f − p − λW where λ is chosen
so that φ(x) = 0. Then, we have φ(x_i) = 0 for all i, and by Rolle's theorem there exist n+1 points
y₁ < ⋯ < y_{n+1} with φ'(y_i) = 0. Iterating the argument, there exists ξ ∈ [a, b] such that φ^{(n+1)}(ξ) = 0.
Now recall that the polynomial W is monic and has degree n+1, while the polynomial p has degree at
most n: this implies W^{(n+1)}(ξ) = (n+1)! and p^{(n+1)}(ξ) = 0, which yields the result. □

This result encourages us to search for families of x_i which make ‖W‖∞ as small as possible.
It is time for us to introduce Chebyshev polynomials.

Assume [a, b] = [−1, 1]. The n-th Chebyshev polynomial of the first kind is defined by

  T_n(cos t) = cos(n t),   t ∈ [0, 2π].

The T_n can also be defined by

  T₀(x) = 1,   T₁(x) = x,   T_{n+2}(x) = 2x T_{n+1}(x) − T_n(x),   n ∈ N.

Among their numerous nice features, there is the following result, which suggests considering a
certain family of interpolation nodes.

Proposition 2.20. The minimum value of the set

  { max_{x∈[−1,1]} |p(x)| : p ∈ R_n[x], lc(p) = 1 }

is uniquely attained for T_n/2^{n−1} and is therefore equal to 2^{−n+1}.

Forcing W(x) = 2^{−n} T_{n+1}(x) leads to the interpolation points

  ρ_k = cos((2k+1)π/(2(n+1))),   k = 0, ..., n,

called the Chebyshev nodes of the first kind.
Another important family is that of the Chebyshev polynomials of the second kind U_n(x), defined by

  U_n(cos x) = sin((n+1)x)/sin(x).

They can also be defined by

  U₀(x) = 1,   U₁(x) = 2x,   U_{n+2}(x) = 2x U_{n+1}(x) − U_n(x),   n ∈ N.

For all n > 0, we have (d/dx) T_n = n U_{n−1}. So the extrema of T_{n+1} are −1, 1 and the zeros of U_n.
The points

  μ_k = cos(kπ/n),   k = 0, ..., n,

are called the Chebyshev nodes of the second kind. With W(x) = −2^{−n+1} (1 − x²) U_{n−1}(x), whose
roots are exactly the μ_k, we have ‖W‖∞ = 2^{−n+1}.
It is obvious that deg T_n = deg U_n = n for all n ∈ N. Therefore, in particular, the family (T_k)_{0≤k≤n}
is a basis of R_n[x]. In the sequel of the chapter, we give results that allow for the (fast) computation
of the coefficients of interpolation polynomials, at the Chebyshev nodes, expressed in the basis
(T_k)_{0≤k≤n}.

Proposition 2.21. (Discrete orthogonality.)

i. We have

  Σ_{k=0}^n T_i(ρ_k) T_j(ρ_k) = 0 if i ≠ j;   n+1 if i = j = 0;   (n+1)/2 if i = j ≠ 0.

ii. We have

  Σ''_{k=0}^n T_i(μ_k) T_j(μ_k) = 0 if i ≠ j;   n if i = j ∈ {0, n};   n/2 if i = j ∉ {0, n}.

Exercise 2.5. Prove the previous proposition.


The discrete orthogonality property implies the following (Σ' denotes a sum whose first term
has to be halved, Σ'' a sum whose first and last terms have to be halved).

Proposition 2.22.

i. If p_{1,n} = Σ'_{0≤i≤n} c_{1,i} T_i ∈ R_n[x] interpolates f on the set {ρ_k : 0 ≤ k ≤ n}, then

  c_{1,i} = (2/(n+1)) Σ_{k=0}^n f(ρ_k) T_i(ρ_k).

ii. Likewise, if p_{2,n} = Σ''_{0≤i≤n} c_{2,i} T_i interpolates f at {μ_k : 0 ≤ k ≤ n}, then

  c_{2,i} = (2/n) Σ''_{k=0}^n f(μ_k) T_i(μ_k).

Proof. Exercise. 
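A direct Maple transcription of formula ii (a sketch in O(n²) operations; Section 2.6 explains how the FFT does better):

  cheb_coeffs := proc(f, n)
    # coefficients c_{2,i} of the interpolant of f at the Chebyshev
    # nodes of the second kind mu_k = cos(k*Pi/n); the interpolant is
    # the sum over i of c[i+1]*T_i with first and last terms halved
    local y, i, k;
    y := [seq(evalf(f(cos(k*Pi/n))), k = 0 .. n)];
    [seq(evalf(2/n*add(`if`(k = 0 or k = n, 1/2, 1)*y[k + 1]*cos(i*k*Pi/n),
                       k = 0 .. n)), i = 0 .. n)];
  end proc: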

2.5 Clenshaw's method for Chebyshev sums

Given coefficients c₀, ..., c_N and a point t, we would like to compute the sum

  Σ_{k=0}^N c_k T_k(t).

Recall that the polynomials T_k satisfy T_{k+2}(x) = 2x T_{k+1}(x) − T_k(x). A first idea would be to
use this relation to compute the T_k(t) that appear in the sum. Unfortunately, this method is
numerically unstable. This is related to the fact that the U_k(x) satisfy the same recurrence but
grow faster: we have

  ‖T_k‖∞ = 1,   ‖U_k‖∞ = k + 1.

Clenshaw's algorithm below does better.

Algorithm 2.2
Input. Chebyshev coefficients c₀, ..., c_N, a point t.
Output. Σ_{k=0}^N c_k T_k(t).

1. b_{N+1} ← 0, b_N ← c_N
2. for k = N−1, N−2, ..., 1:
   a. b_k ← 2t b_{k+1} − b_{k+2} + c_k
3. return c₀ + t b₁ − b₂

Proof. By definition of the b_k, we have

  Σ_{k=0}^N c_k T_k(t) = c₀ + (b₁ − 2t b₂ + b₃) T₁(t) + ⋯ + (b_{N−1} − 2t b_N + b_{N+1}) T_{N−1}(t) + c_N T_N(t).

Using the recurrence relation and the values of b_N, b_{N+1}, the sum telescopes to
c₀ + b₁ T₁(t) + b₂ (T₂(t) − 2t T₁(t)) = c₀ + t b₁ − b₂. □

This algorithm runs in O(N ) arithmetic operations.
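In Maple, Algorithm 2.2 reads as follows (a sketch; c is the list [c_0, ..., c_N], and the pass k = N simply sets b_N = c_N):

  clenshaw := proc(c::list, t)
    local N, b1, b2, b, k;
    N := nops(c) - 1;
    b1 := 0; b2 := 0;            # b_{k+1} and b_{k+2}
    for k from N to 1 by -1 do
      b := 2*t*b1 - b2 + c[k + 1];
      b2 := b1; b1 := b;
    end do;
    c[1] + t*b1 - b2;
  end proc:
  # e.g. clenshaw([0, 0, 1], t) returns 2*t^2 - 1 = T_2(t)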

2.6 Computation of the Chebyshev coefficients


Now, how do we compute the c_k? Assume we want to perform interpolation at the Chebyshev
nodes of the second kind and obtain the result on the Chebyshev basis: given y₀, ..., y_N, we are
looking for c₀, ..., c_N such that p(x) = Σ''_{j=0}^N c_j T_j(x) satisfies p(μ_k) = y_k for all k.
By discrete orthogonality, we have

  c_j = (2/N) Σ''_{k=0}^N y_k T_k(μ_j).
Observe that we have

  T_k(μ_j) = cos(jkπ/N)

and hence

  c_j = (2/N) Re( Σ''_{k=0}^N y_k ω^{jk} ),   ω = e^{iπ/N}.

So the c_j are (up to scaling) the real part of the discrete Fourier transform of the y_k.
The DFT of order M is the map Y ↦ V_ω Y, where ω = e^{2iπ/M} and V_ω = Vandermonde(1, ω, ..., ω^{M−1}).
We have V_{ω^{−1}} V_ω = M·Id, hence the DFT is almost its own inverse. The DFT sends the coefficient
vector of a polynomial P = Σ_{n=0}^{M−1} y_n x^n to its values P(1), P(ω), ..., P(ω^{M−1}).
Assume that M = 2m is even; then ω^m = −1. Rewrite P as

  P(X) = Q₀(X)(X^m − 1) + R₀(X) = Q₁(X)(X^m + 1) + R₁(X)

with deg R₀, deg R₁ < m. Then

  P(ω^ℓ) = R₀(ω^ℓ) if ℓ is even,   R₁(ω^ℓ) if ℓ is odd.

Thus evaluating P at 1, ω, ..., ω^{M−1} reduces to two evaluations of the same type in half the size
(R₀ at the even powers of ω, R₁ at the odd ones); iterating yields the fast Fourier transform, which
computes the DFT in O(M log M) arithmetic operations.


Chapter 3
D-Finiteness
In this chapter, we present a nice class of functions that:
contains a large number of elementary and special functions;
admits fast algorithms for evaluation;
allows for automatic proofs of identities.

3.1 Linear differential equations and linear recurrences

3.1.1 Definition
Notation 3.1. K denotes a field, K[[x]] the ring of formal power series with coefficients in K,
and K((x)) the field of fractions of K[[x]], that is, the field of formal Laurent series. Observe that
K((x)) is an algebra over K(x).

Definition 3.2. A formal power series A ∈ K[[x]] is called differentially finite (abbreviated D-finite)
when its derivatives A, A', A'', ... span a finite-dimensional vector subspace of K((x))
regarded as a vector space over K(x).

In other words, there exist polynomials p₀(x), ..., p_r(x) in K[x] such that A satisfies a linear
differential equation of the form

  p₀(x) A^{(r)}(x) + ⋯ + p_{r−1}(x) A'(x) + p_r(x) A(x) = 0.

Example 3.3. Rational functions are D-finite, and so are the classical exp, ln, sin, cos, sinh, cosh,
arcsin(h), arccos(h), arctan(h), as well as many special functions like the Bessel functions J, I, K, Y,
the Airy functions Ai and Bi, the sine (Si), cosine (Ci) and exponential (Ei) integrals, and many more.

Our point of view on these objects is that differential equations will serve as a data structure
to work with D-finite series.

Definition 3.4. A sequence (a_n) of elements of K is called polynomially recursive (abbreviated
P-recursive) when its shifts (a_n), (a_{n+1}), ... span a finite-dimensional vector space over K(n).

Translation. A sequence (a_n) is P-recursive when it satisfies a recurrence relation of the form

  q₀(n) a_{n+ℓ} + ⋯ + q_ℓ(n) a_n = 0,   n ≥ 0,

with polynomial coefficients q₀, ..., q_ℓ.

3.1.2 Translation
Theorem 3.5. A formal power series is D-finite if and only if its sequence of coefficients is
P-recursive.


Proof. We have the following dictionary (actually a ring morphism in a suitable setting):

  f(x)   ↔ f_n
  f'(x)  ↔ (n+1) f_{n+1}
  x f(x) ↔ f_{n−1}
  x f'(x) ↔ n f_n

By combining these rules, we can translate any monomial x^i f^{(j)}(x) (resp. n^i f_{n+j}), and hence any
linear differential equation/recurrence. □
Example 3.6. The differential equation y' = y that defines exp translates into (n+1) y_{n+1} = y_n,
which defines (1/n!).
Example 3.7. The orders of the linear recurrence and of the differential equation do not necessarily
match. For instance, the first-order equation y' − x^{k−1} y = 0, which defines exp(x^k/k), translates into the
linear recurrence (n+1) y_{n+1} − y_{n−k+1} = 0 of order k. This recurrence has a vector space of
solutions of dimension k, but only a subspace of dimension 1 corresponds to the solutions of the
linear differential equation. This subspace can be isolated by paying attention to the initial values
during the translation. Here, the identities y₁ = ⋯ = y_{k−1} = 0 also come out of the translation.
Example 3.8. Assume we want to compute the coefficient of x^1000 in

  p(x) = (1 + x)^1000 (1 + x + x²)^500.

A naive way of doing it would be to expand the polynomial. However, observing that

  p'(x)/p(x) = 1000/(1 + x) + 500 (2x + 1)/(1 + x + x²),

yields a linear differential equation (LDE) of order 1 for p, with coefficients of degree 3:
Maple 7] p:=(1+x)^1000*(1+x+x^2)^500:
deq:=numer(diff(y(x),x)/y(x)-diff(p,x)/p);

  (1 + x)^999 (x² + x + 1)^499 ( (x³ + 2x² + 2x + 1) (d/dx)y(x) − (2000 x² + 2500 x + 1500) y(x) )
This equation then translates into a linear recurrence equation (LRE) of order 3 with linear
coefficients:
Maple 8] gfun:-diffeqtorec({%,y(0)=1},y(x),u(n));

  {(n − 2000) u(n) + (2n − 2498) u(n + 1) + (2n − 1496) u(n + 2) + (n + 3) u(n + 3),
   u(0) = 1, u(1) = 1500, u(2) = 1124750}

Then it suffices to unroll this recurrence. A fast way of doing so is presented in Section 3.4.

3.2 Closure properties

3.2.1 Sum and product


Theorem 3.9. The set of D-finite power series in K[[x]] is a K-algebra. The set of P-recursive
sequences with elements in K is a K-algebra as well.

Proof. We need to prove that both D-finite series and P-recursive sequences are stable under the
operations of sum and product. All these proofs are similar, products being slightly more difficult
than sums. We detail the case of products of D-finite series.
Let f, g ∈ K[[x]] be D-finite series, and let h = fg. We know that for all i, j,

  f^{(i)} ∈ Vect_{K(x)}(f, f', f'', ..., f^{(m)}),
  g^{(j)} ∈ Vect_{K(x)}(g, g', g'', ..., g^{(ℓ)})

for some m and ℓ. Now, by Leibniz's formula,

  h^{(k)} = Σ_{i=0}^k C(k,i) f^{(i)} g^{(k−i)} ∈ Vect_{K(x)}( f^{(i)} g^{(j)} : 0 ≤ i ≤ m, 0 ≤ j ≤ ℓ ),

i.e. the derivatives of h all lie in a finite-dimensional vector space. □


Note that these proofs are fully effective: for instance, from the differential equations for f and g,
we can compute a differential equation for fg using linear algebra over K(x).
Example 3.10. A simple proof that

  arcsin(x)² = Σ_{k=0}^∞ k! x^{2k+2} / ( (1/2)_k (1/2 + k) (2k + 2) ),

where (a)_k = a(a+1)⋯(a+k−1) denotes the rising factorial. The starting point is the observation
that f(x) = arcsin(x) satisfies f' = 1/√(1 − x²), which makes it possible to define it by the linear
differential equation f'' = (x/(1 − x²)) f', plus initial conditions. From
there, we get the sequence of computations:

  h = f²,
  h' = 2 f f',
  h'' = 2 f'² + (2x/(1 − x²)) f f',
  h''' = ((4x² + 2)/(1 − x²)²) f f' + (6x/(1 − x²)) f'².

At this stage we have 4 vectors (h, h', h'', h''') expressed in terms of 3 generators (f², f f', f'²). A
linear dependency is therefore found by looking for the kernel of a 3 × 4 matrix. The coordinates of
a generator of the kernel are the coefficients of the desired differential equation. (Note that actually,
in this example, since f² only occurs in h, it is sufficient to consider the three last equations.) This
whole computation is easily automated:
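For instance, with the gfun package (a sketch; the exact output shape depends on the gfun version):

  deq := gfun:-holexprtodiffeq(arcsin(x)^2, y(x));   # LDE plus initial conditions
  rec := gfun:-diffeqtorec(deq, y(x), u(n));         # translated recurrence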

Example 3.11. A computation-free proof that sin² + cos² = 1. Both sin and cos are defined by
y'' + y = 0. If f is a solution of this equation, then as in the previous example, f² is a solution of a
linear differential equation of order at most 3, all derivatives being generated by (f², f f', f'²). Thus
both sin² and cos² satisfy the same linear differential equation of order at most 3, and therefore so
does their sum. Next, 1 is a solution of a linear differential equation of order 1, namely y' = 0, and
thus the sum sin² + cos² − 1 and its derivatives live in a vector space of dimension at most 4; hence
it is a solution of a linear differential equation of order at most 4. Checking that the desired solution
of this equation is exactly 0 reduces to checking 4 initial conditions, i.e., the proof is summarized by

  sin²x + cos²x − 1 = O(x⁴)  ⟹  sin² + cos² = 1.

Note that the actual value of the differential equation is not important here, only its order has
been used.
Note also that one can simplify the argument further (and reduce the order accordingly) if one
takes into account the fact that sin' = cos. Then we consider h = f² + f'² − 1, whose derivative is
h' = 2 f f' + 2 f' f'' = 2 f f' − 2 f f' = 0, and thus checking the value at 0 is sufficient.

3.2.2 Hadamard product


Proposition 3.12. If f(x) = Σ f_n x^n and g(x) = Σ g_n x^n are D-finite, then so is their Hadamard
product

  f ⊙ g = Σ f_n g_n x^n.

Proof. This is a combination of the previous results: since f and g are D-finite, their sequences
of coefficients (f_n) and (g_n) are P-recursive, their product (f_n g_n) is P-recursive too by the
previous theorem, and finally the generating series of this product is D-finite. □

Example 3.13. Mehler's identity for Hermite polynomials. Hermite polynomials are defined by

  Σ_{n≥0} H_n(x) z^n/n! = exp(z (2x − z)).

Mehler's identity asserts that

  Σ_n H_n(x) H_n(y) z^n/n! = exp( 4z (x y − z (x² + y²)) / (1 − 4z²) ) / √(1 − 4z²).

This can be proved easily by noticing that the left-hand side is nothing but

  exp(z(2x − z)) ⊙ exp(z(2y − z)) ⊙ Σ_{n=0}^∞ n! z^n,

a Hadamard product of three factors that are clearly D-finite.

3.3 Algebraic series


Definition 3.14. A formal power series A(x) ∈ K[[x]] is algebraic if there exists a nonzero
polynomial P ∈ K[x, y] such that P(x, A(x)) = 0.

Theorem 3.15. Algebraic series are D-finite.

As a simple consequence, the series expansions of algebraic functions can be computed in only
linear (i.e., optimal) arithmetic complexity.

Proof. Without loss of generality, we can assume that P and P_y = ∂P/∂y are coprime. From the
equation P(x, A(x)) = 0, we get by differentiation

  P_x(x, A(x)) + P_y(x, A(x)) A'(x) = 0.

We then invert P_y mod P. Let U, V ∈ K(x)[y] be the cofactors in Bézout's identity, which satisfy
U P_y + V P = 1. Then multiplying the previous equation by U(x, A(x)) leads to

  U(x, A(x)) P_x(x, A(x)) + (1 − V(x, A(x)) P(x, A(x))) A'(x) = 0,

where the term V(x, A(x)) P(x, A(x)) vanishes. The factor of A' is exactly 1 (this was the aim of the
inversion modulo P) and the first term is a polynomial evaluated at A(x), which is therefore equal
to the evaluation of its remainder in the Euclidean division by P. Denoting by δ the degree of P
with respect to y, we obtain

  A'(x) = R₁(x, A(x)),   with deg_y R₁ < δ.

Differentiating once more leads to

  A''(x) = R_{1,x}(x, A(x)) + R_{1,y}(x, A(x)) R₁(x, A(x)) = R₂(x, A(x)),   with deg_y R₂ < δ.

Thus by induction, for all i, A^{(i)} ∈ Vect_{K(x)}(1, A, A², ..., A^{δ−1}). □

Corollary 3.16. If f is D-finite and A is algebraic with A(0) = 0, then f ∘ A is D-finite.

Proof. Consider Vect(f^{(i)}(A) A^j). □
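These computations are also automated in gfun. For example, for the generating function of the Catalan numbers, defined by the algebraic equation y = 1 + x y² (a sketch):

  deq := gfun:-algeqtodiffeq(y - 1 - x*y^2, y(x));   # LDE satisfied by the series
  rec := gfun:-diffeqtorec(deq, y(x), u(n));         # recurrence for the coefficients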

3.4 Binary splitting

3.4.1 Fast computation of n!


Stirling's formula tells us that

  log n! = n log n − n + (1/2) log n + O(1)   as n → ∞.

Hence the bit size of n! is Θ(n log n) (and thus we cannot hope to compute it in less than Ω(n) bit
operations). Similarly, 1!, 2!, ..., n! taken together have size Θ(n² log n).
In the naïve algorithm (compute n! by successive multiplications by k, k = 2, ..., n), the
multiplication k × (k−1)! can be done by decomposing (k−1)! into k chunks of roughly log k bits.
The cost of the multiplication is then O(k log k loglog k logloglog k), and the total complexity

  O(n² log n loglog n logloglog n),

which is not too bad if we need all the values 1!, 2!, ..., n!.
If however we want to compute n! alone, we can do much better. Define

  P(a, b) = (a + 1)(a + 2) ⋯ b.

Compute n! as

  n! = P(0, n) = (1 · 2 ⋯ n/2) · ((n/2 + 1) ⋯ n) = P(0, n/2) P(n/2, n)

and recurse. The key observation is that P(0, n/2) has size half of that of n! (by Stirling's formula)
and therefore so does the second factor. Assuming for simplicity that n is a power of 2, the binary
complexity can be bounded as follows:
     
n n n
C(0, n) = C 0, +C , n + MZ log n ,
 2  2   2
n n
6 MZ log n + 2C ,n ,
2 2  
   
n n 3n
6 MZ log n + 2MZ log n + 4C ,n ,
2 4 }}}}}}}}}}}}}}}}}}}
||||||||||||||||||||{z} 4
n
6MZ 2 log n

 
n
6 6 MZ log n log n = O(n log3 n log log n).
2
In the second line, we use the fact that the factors increase; in the third line, we iterate the
inequality once and use the convexity of the multiplication function M; in the last one the bound
log n on the number of recursive steps.
Finally, we have obtained n! in quasi-optimal bit complexity.
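In Maple, the recursion is a few lines (a sketch; the base-case threshold 4 is arbitrary):

  P := proc(a, b)
    # P(a, b) = (a+1)*(a+2)*...*b, splitting the range in two halves
    # so that both recursive factors have balanced bit sizes
    local m;
    if b - a <= 4 then return mul(k, k = a + 1 .. b) end if;
    m := iquo(a + b, 2);
    P(a, m)*P(m, b);
  end proc:
  # e.g. 100000! = P(0, 100000)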

3.4.2 General case


We now consider a general recurrence

  p₀(n) u_{n+r} + ⋯ + p_r(n) u_n = 0,   n ≥ 0,   (3.1)

with given initial values u₀, ..., u_{r−1}. Letting

  U_n = (u_n, u_{n+1}, ..., u_{n+r−1})^T

lets us reduce the question to a first-order recurrence over vectors:

  U_{n+1} = (1/p₀(n)) A(n) U_n,   where

  A(n) = [   0       p₀(n)                        ]
         [           0       ⋱                   ]
         [                   ⋱        p₀(n)      ]
         [ −p_r(n)  −p_{r−1}(n)   ⋯   −p₁(n)     ]

and hence we can apply the same idea and get

  U_N = (1 / (p₀(N−1) ⋯ p₀(0))) A(N−1) A(N−2) ⋯ A(0) U₀.

This matrix factorial can be computed the same way as the real factorial above, as the sketch
below shows.
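The same splitting applies verbatim to the matrix product (a sketch; A is a procedure mapping n to the Matrix A(n), and the order of the factors matters):

  matrix_factorial := proc(A, a, b)
    # returns A(b-1) . A(b-2) . ... . A(a)
    local m;
    if b = a + 1 then return A(a) end if;
    m := iquo(a + b, 2);
    matrix_factorial(A, m, b) . matrix_factorial(A, a, m);
  end proc: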

Theorem 3.17. Assume that the recurrence (3.1) is nonsingular (0 ∉ p₀(N)), that all p_i have
degree bounded by d and integer coefficients bounded by 2^ℓ, and that u₀, ..., u_{r−1} ∈ Q have
numerators and denominators bounded by 2^ℓ. Then one can compute u_N in

  O( MM(r) N log² N loglog N (d log N + ℓ) )

bit operations, where MM(r) is a bound on the number of arithmetic operations needed to compute the
product of two r × r matrices.

Proof. Consider the norm ‖u‖ := Σ_{j=1}^r |u_j| and the induced norm on matrices

  ‖M‖ := max_{1≤j≤r} Σ_{i=1}^r |m_{ij}|.

The hypothesis implies

  ‖A(k)‖ ≤ C r 2^ℓ k^d

for some C > 0. From there, portions of the matrix factorial are bounded by

  ‖A(βn)‖ ⋯ ‖A(αn)‖ ≤ (C r 2^ℓ)^{(β−α)n} ((βn)! / (αn)!)^d   (β > α)

and thus have size O((β−α)(d n log n + ℓ n)) as n → ∞. From there, the proof proceeds as in the
case of n!.
(Note that the values of p_i(n) for 0 ≤ n ≤ N−1 are also needed. They can be computed within
the same complexity bound even with a naïve algorithm.) □

3.5 Numerical evaluation


The idea of binary splitting leads to fast methods for the numerical evaluation of a wide variety
of constants and functions.

3.5.1 exp(1)

As a first example, consider the sequence

  e_n = Σ_{k=0}^n 1/k! → e,   n → ∞.

Additionally, we have 0 < e − e_n < 1/(n · n!) for all n, hence e − e_n < 10^{−N} for n = O(N/log N). Now
we obtain a linear recurrence of order 2 satisfied by (e_n):

  e_{n+1} − e_n = 1/(n+1)! = (1/(n+1)) (e_n − e_{n−1}).

By Theorem 3.17, it follows that e can be computed within precision 10^{−N} in O(N log² N loglog N)
bit operations. (This gives us a huge rational number close to e. To get a binary/decimal expansion,
there remains to do a division. The inverse of the denominator can be computed efficiently by
Newton's method in O(M_Z(N)) bit operations.)
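A binary-splitting sketch for the partial sums of e (hypothetical helper name): represent Σ_{j=a+1}^b a!/j! as a fraction P/Q with Q = (a+1)⋯b.

  esplit := proc(a, b)
    # returns [P, Q] with Q = (a+1)*...*b and sum(a!/j!, j = a+1 .. b) = P/Q
    local m, L, R;
    if b = a + 1 then return [1, b] end if;
    m := iquo(a + b, 2);
    L := esplit(a, m); R := esplit(m, b);
    [L[1]*R[2] + R[1], L[2]*R[2]];
  end proc:
  # e_n = 1 + P/Q:
  n := 100: PQ := esplit(0, n): evalf(1 + PQ[1]/PQ[2], 50);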

Example 3.18. All recent records in the computation of π were obtained using basically the same
technique, starting from the following series due to the Chudnovskys:

  1/π = (12/C^{3/2}) Σ_{n=0}^∞ (−1)^n (6n)! (A + Bn) / ((3n)! n!³ C^{3n}),

with A = 13,591,409, B = 545,140,134, C = 640,320. This series yields about 14 digits per term.
That alone is not sufficient to reach a good efficiency, which is achieved by observing that the
summand satisfies a linear recurrence of order 1 and applying binary splitting.

3.5.2 Analytic D-finite series

The fast computation of exp(1) can be generalized.
Consider y(z) a solution to

  a₀(z) y^{(r)}(z) + ⋯ + a_r(z) y(z) = 0,   (3.2)

where the a_i are polynomials. Our aim is to evaluate y numerically given initial conditions. We
first state an existence theorem, with an effective proof for later use.

Theorem 3.19. (Cauchy's theorem.) If z₀ ∈ C is such that 0 ∉ a₀(D(z₀, R)), then, for any y₀, ...,
y_{r−1}, there exists a solution of Eq. (3.2) such that y^{(i)}(z₀) = y_i for 0 ≤ i < r and that is analytic
in D(z₀, R).

Proof. (By the Cauchy–Kovalevskaya method of majorants.) Rewrite (3.2) as

  Y' = A Y,   Y = (y, y', ..., y^{(r−1)})^T.   (3.3)

Since 0 ∉ a₀(D(z₀, R)), the matrix function A is analytic in D(z₀, R). This implies that it has an
expansion as a power series

  A(z) = Σ_{k≥0} A_k (z − z₀)^k,

such that for 0 < ρ < R, there exists α > 0 satisfying ‖A_k‖ ≤ α/ρ^{k+1} for all k ≥ 0.
Now consider the formal power series defined by

  Y(z₀) = (y₀, ..., y_{r−1})^T   and   Y' = A Y,

that is, by

  (n+1) Y_{n+1} = Σ_{k=0}^n A_k Y_{n−k}.

We have

  (n+1) ‖Y_{n+1}‖ ≤ Σ_{k=0}^n ‖A_k‖ ‖Y_{n−k}‖ ≤ Σ_{k=0}^n (α/ρ^{k+1}) ‖Y_{n−k}‖.

Define (u_n) by u₀ = ‖Y₀‖ and

  (n+1) u_{n+1} = Σ_{k=0}^n (α/ρ^{k+1}) u_{n−k}.

Then ‖Y_n‖ ≤ u_n. The generating series u(z) = Σ u_n z^n satisfies

  u'(z) = (α/(ρ − z)) u(z),

hence

  u(z) = u₀ (1 − z/ρ)^{−α}.

Since the coefficients of Y are dominated by those of u, which is analytic, the series Y is convergent
for |z − z₀| < ρ. And since we can do this for any ρ < R, the function Y is analytic in D(z₀, R). □

The proof also yields bounds on the tails of the series expansion of Y(z). Indeed, we have

  | Σ_{n≥N} Y_n (z − z₀)^n | ≤ Σ_{n≥N} y_n |z − z₀|^n

where

  y_n = y₀ C(−α, n) (−ρ^{−1})^n = y₀ C(α + n − 1, n) ρ^{−n},

so that

  Σ_{n≥N} y_n |z − z₀|^n ≤ y_N |z − z₀|^N (1 + ((α + N)/(1 + N)) |z − z₀|/ρ + ⋯).

The series on the right-hand side is convergent, say its sum is bounded by M. Now, in order to
ensure

  | Σ_{n≥N} Y_n (z − z₀)^n | ≤ 10^{−k},

it is sufficient to take

  k log 10 ≤ N log(ρ/|z − z₀|) + log(...) + cst

i.e.

  N ≥ k log 10 / log(ρ/|z − z₀|) + cst.
Combining this bound with the binary splitting method, we obtain the following theorem.

Corollary 3.20. When yi Q, ai Q[z], Q D(z0, R), all with numerators and denominators
bounded by 2, then y() can be computed at precision 10N in

2 r log N +
O N log N log log N R

log | z |
0

bit operations. So can y (), ..., y (r 1)().

3.5.3 Analytic continuation


Proposition 3.21. The set of solutions of Y = A Y forms an r-dimensional vector space over C.

Proof. Consider Y [i] with initial conditions 0 except in the ith coordinate that is 1, for 1 6 i 6 r. 

Definition 3.22. A fundamental matrix of Y = A Y is a matrix W whose columns form a basis


of solutions.

Clearly, such a matrix satises W = A W and any solution can be written W C with C a
constant vector.

Definition 3.23. The transition matrix between z0 and z1 D(z0, R) is the matrix such that
W (z1) = M (z0 z1) W (z0).

This matrix is well dened since W (z0) is invertible. The fundamental matrix itself has a radius
of convergence and by Cauchys theorem the solution can be extended to an analytic continuation
inside this new disk. Proceeding in this manner, one constructs a path (z0, z1, ..., zk) and transition
matrices M (z0 z1), M (z1 z2), ..., M (zk1 zk) whose product (in the right order) constructs
the analytic continuation of the fundamental matrix along that path. Each of these matrices can
be computed eciently by binary splitting if one takes for zi points with rational coordinates
bounded by 2 in the notation of the previous section. With a bit more eort, the cumulated error
is bounded by the product of the norms kM (zi zi+1)k that can be controled.

3.5.4 Bit burst


The previous section shows how to compute the value of an arbitrary D-nite function at an
arbitrary non-singular point given by its rational coordinates of moderate size. If the point itself
is given at precision 2N , then the estimate obtained with = N gives a quadratic complexity.
However, this can be improved.
3.5 Numerical evaluation 31

Proposition 3.24. For any D(z0, R), the value y() can be computed at precision 2N in
O(N log3N loglog N ) binary operations.

Proof. Compute z0, z1, ..., zm = with


i i
zi = 22 22 ,
ie, zi is the binary number with the rst 2i bits of . The idea is that the computation of y(zi+1)
needs more accuracy than that of y(zi), but fewer terms of the series expansion. Thus in the
complexity estimate, the term
+ r log N
R
log |z
i+1 zi |
becomes
2i + r log N
2i + log R
and their sum is
log log
! log
XN  2i r log N
 XN XN r log N
+ 6 1 + 6 (r + 1)log N = O(log N ).
2 + log R 2i + log R
i 2i
i=1 i=1 i=1

Chapter 4
Rational Approximation

4.1 Why rational approximation?


From a computer scientists point of view, the two most natural kinds of approximations to deal
with are polynomial and rational approximations. Indeed, polynomial approximations can be eval-
uated using only the ring operations +, , , and rational approximations using +, , , /.
However, we will use rational fractions only when they oer a signicant advantage in terms of
approximation, for the following reasons.
1. Rational functions are nonlinear objects.
2. Divisions are usually slower than multiplications, especially for small precisions. (Floating-
point division at machine precision is currently about ten times as slow as multiplication.
At high precisions, the cost of division is not much larger than that of multiplication, thanks
to Newtons method.)
The complexity/nonlinearity of rational functions can also be an advantage. For instance, it is
hopeless to approximate a function by a polynomial in the neighborhood of a singularity, while
rational functions can swallow poles.
Let m, n N. Denote
 
P
Rm,n = R = R(x) : P Rm[x], Q Rn[x]\{0} .
Q

We could additionally assume P Q = 1 and lc Q = 1 in the denition. Additionally, since we are


interested in continuous functions (and thus elements of Rm,n with no poles on [a, b]), we can
replace Rm,n by
 
P
Rm,n = R = R(x) : P Rm[x], Q Rn[x]\{0}, |lc Q| = 1, x [a, b] Q(x) > 0
Q

in the statements of this chapter.


Let f C([a, b]). Observe that {kf Rk, R Rm,n } is a nonempty subset of R+, and let

Em,n = inf kf Rk.


RRm,n

(This generalizes the notation En introduced for polynomials in Chapter 2.)


Let us compare the quality of approximations by polynomials and by rational functions on some
examples. In the case of f = exp over [0, 1], one nds

E4,4(f ) = 4.95... 1013,


E8,0(f ) = 3.49... 1010,
E10,0(f ) = 1.98... 1014.

Using Horners scheme, evaluation of a polynomial of degree 8 uses 8 multiplications and 8 addi-
tions. Evaluation of a deg 4/deg 4 rational fractions uses 8 multiplications, 8 additions, plus one
(expensive!) division. Rational approximation makes little sense in this case.

33
34 Rational Approximation

If f (x) = tanh (5x) over [0, 1], one gets


E4,4(f ) = 3.33... 106,
E8,0(f ) = 2.99... 104
E15,0(f ) = 2.49... 106,
E16,0(f ) = 5.99... 107.
ex
Now consider f (x) = 2 x over [1, 1]. We nd

E5,1(f ) = 5.11... 106,


E5,5(f ) = 6.13... 1013,
E22,0(f ) = 6.43... 1013,
E23,0(f ) = 1.72... 1013.
Here rational functions perform much better; this is related to the presence of a pole of the function
not too far from the segment we are interested in.
More examples:
if we consider f (x) = |x| over [1, 1], one can show

En,0(f ) , with = 0.2801... [24],
n
En,n(f ) 8e n [19].
Assume f (x) = ex on (, 0]. Since f is bounded, it cannot be well approximated by
nonconstant polynomials. In contrast, one can prove that
p
n 1 1
n+ 2
E0,n(f ) [18] and En,n 2 H , H 1 9.28 [1]
3
as n . This last example is related to a question raised in [3].

Rational approximation is useful for


function evaluation;
digital signal processing;
Diophantine approximation (e.g., irrationality proofs);
analytic continuation;
acceleration of convergence;
...

4.2 Best L approximations

4.2.1 Existence
Proposition 4.1. To each function f C([a, b]), there corresponds at least one best rational
approximation in Rm,n.

Proof. By analogy with the polynomial case, we might be tempted to consider the set
{R Rm,n : kR f k 6 2 kf k}.
This is a nonempty, closed and bounded set. It is not compact, though (remember that Rm,n is
not a nite-dimensional vector space over R, unlike Rn[X]). To illustrate this, simply consider the
1
sequence of continuous functions Rk(x) = kx + 1 , x [0, 1], k N. For any k N, kRk k 6 1 but the
function R dened as limk+Rk is not continuous since R(0) = 1 and R(x) = 0 otherwise.
4.3 An extension of Taylor approximation: Pad approximation 35

Instead, let (Rh)hN be a sequence of elements of Rm,n such that


kRh f k Em,n(f ) and kRh k 6 2 kf k.
/ 0, x [a, b] andkQh k = 1. We have
Write Rh = Ph/Qh. We can assume that Qh(x) =
kPh k 6 kQh k kRh k 6 2 kf k,
hence both (kPh k) and (kQh k) are bounded sequences, now in nite-dimensional vector spaces. Let
: N N be a strictly increasing function such that P (h) P Rm[x] and Q (h) Q Rn[x].
Note that kQk = 1: Qis nonzero. Set R = P /Q. For all x [a, b] such that Q(x) =
/ 0, we have

P (h)(x)
|f (x) R(x)| = lim f (x) 6 Em,n(f ).
h Q (h)(x)

The same holds for the remaining x by continuity. 

4.2.2 Equioscillation and unicity


Let R = P /Q Rm,n, with P , Q R[x], P Q = 1, lc Q = 1. Write = deg P , = deg Q.

Definition 4.2. The defect of R is the integer d(R) = min (m , n ).

Theorem 4.3. (Achieser, 1930) Let f C([a, b]). A rational function R Rm,n is a best
approximation to f if and only if R f equioscillates between at least m + n + 2 d(R) extreme
points. There is a unique best approximation.

Remark 4.4. There is again a Remez algorithm for computing best rational approximations, with
the same rate of convergence as in the polynomial case.

4.3 An extension of Taylor approximation: Pad approxima-


tion

4.3.1 Introduction
Let K be a eld, and let f K[[x]]. For m N, the degree-m Taylor approximant to f is the unique
pm Km[x] such that
f (x) pm(x) = 0 mod xm+1.
If now f C m+1 in the neighborhood of 0 (instead of f being a formal series), the analogous
condition is f (x) pm(x) = O(xm+1).
To extend this to rational functions, given f K[[x]], we would like to determine R = P /Q
(P Km[x], Q Kn[x]) such that
f (x) R(x) = 0 mod xm+n+1. (4.1)
Here again, we may also consider f C m+n+1, in which case we ask that
f (x) R(x) = O(xm+n+1).
In contrast with the case of Taylor approximation, it is not always possible to satisfy (4.1).

Example 4.5. Consider m = n = 1 and f (x) = 1 + x2 + x4 + . If R is a solution, then

R(x) = 1 + x2 mod x3. (4.2)


Write R(x) = (a x + b)/(c x + d), where we can assume that a d b c =
/ 0 (otherwise R K). Then
adbc
R (0) = =
/0
d2
which contradicts (4.2).
36 Rational Approximation

If we consider instead the problem of nding P Km[x] and Q Kn[x] such that
Q(x)f (x) P (x) = 0 mod xm+n+1,
it always has a nontrivial solution: think of it as a linear algebra problem, the homogeneous
linear system has n+1 + m+1 = n + m + 2 unknowns which are the coecients of P and Q and
n + m + 1 quations. Actually this linear system is given by a so-called Toeplitz matrix of dimension
(m + n + 1, n + 1). It is a structured matrix for which fast inversion algorithms exist, with the
same costs as the ones given in Remark 4.13. Nevertheless, we favour another presentation of the
problem.

4.3.2 Rational fraction reconstruction


Definition 4.6. Let M , P K[x]\{0}, with deg M = N > deg P. Given k J1, N K, the rational
reconstruction of P modulo M is the determination of a pair (R, T ) K[x]2 satisfying the conditions
P1. gcd (T , M ) = 1, deg R < k, deg T 6 N k, and T 1 R = P mod M.
(T 1is the inverse of T mod M.)

Notable special cases:


If M = xN , a solution R/T is called a Pad approximation of type (k 1, N k) for P . In
this case we think of P as a truncated power series.
QN
If M = i=1 (X ui) with pairwise distinct ui, the problem is called Cauchy interpolation:
in that case, P (ui) = R(ui)/T (ui), for 1 6 i 6 N .
Exercise 4.1. Show that the solution to P1 is unique if it exists.

Remark 4.7. Since T is invertible, we feel like considering the problem of nding (R, T ) K[x]2
such that
P2. deg R < k, deg T 6 N k, and R = P T mod M .
This condition is strictly weaker than the condition in P1.

4.3.2.1 A reminder of the extended Euclidean algorithm


Warning: We only need the results of Propositions 4.10 and 4.11. Reading the rest of this subsection
is optional.

If P is a polynomial, we denote by lc(P ) its leading coecient. As usual, if A and B K[x],


we will denote by gcd (A, B) the monic greatest common divisor of A and B.

Algorithm 4.1
Euclidean division.
Pn Pm
Input. Two nonzero polynomials A = i=0 ai xi and B = i=0 bi xi K[x].
Output. The couple (Q, R) K[x]2 such that A = B Q + R and deg R < m
1. If n < m, return Q = 0 and R = A
1
2. R := A, u := bm ,
3. for i = n m, n m 1, ..., 0, do
4. if deg R = m + i then qi := lc (R) u, R := R qi X i B
else qi := 0
P
5. Return Q = 06i6nm qi X i and R

Proposition 4.8. Euclidean division. Let n, m N, A and B K[x] with deg A = n and deg B = m.
1. A couple (Q, R) K[x]2 such that A = B Q + R and deg R < m is unique.
2. The Algorithm 4.1 returns the correct output.
4.3 An extension of Taylor approximation: Pad approximation 37

3. If n m, this algorithm requires at most (2 m + 1) (n m + 1) + 1 arithmetic operations in


K.

Proof. 1. Let (Q1, R1) and (Q2, R2) K[x]2 such that A = B Q1 + R1 and deg R1 < m,
A = B Q2 + R2 and deg R2 < m. Therefore, we have B (Q1 Q2) = R1 R2. The polynomial
B divides R1 R2 and deg B > deg (R1 R2) : necessarily, Q1 Q2 = 0 and consequently
R1 R2 = 0.
2. By construction, the nal R satises R = A Q B. Moreover, for all i = n m, n m 1, ...,
0, if deg R = m + i, then deg (R qi X i B) 6 m + i 1 since lc(R) = lc(qi X i B) by
construction. Therefore, we have deg R 6 m + i 1 after step 4. We get by induction that
the degreee of the nal R is lesser than m. We proved the uniqueness in 1.
3. Step 2 requires an inversion. Step 4 requires m + 1 multiplications (qi bj for j = 0, ..., m 1)
and m subtractions (for j = 0, ..., m 1, if ri+j denotes the coecient of X i+j in R, we
perform the subtractions ri+ j qi b j for j = 0, ..., m 1). This last step is executed n m + 1
times, which yields the result. 

Algorithm 4.2
Extended Euclidean algorithm.
Pm Pn
Input. Two nonzero polynomials A = i=0 ai xi and B = i=0 bi xi.
Output. N, four tuples of polynomials (Qi)16i6, (Ri)06i6+1, (Si)06i6+1, (Ti)06i6+1.
1. Set
R0 := A, S0 := 1, T0 := 0,
R1 := B , S1 := 0, T1 := 1,
i := 1.
2. While Ri =
/ 0 do
a. Let Qi , Ri be the quotient and the remainder in the Euclidean division of Ri1 by Ri.
b. Set Si+1 := Si1 Qi Si.
c. Set Ti+1 := Ti1 Qi Ti.
d. i := i + 1.
3. Return = i 1, (Qi)16i6, (Ri)06i6+1, (Si)06i6+1, (Ti)06i6+1.

Now we introduce the matrices with coecients in K[x]


   
S0 T 0 0 1
U0 = , Wi = for all 1 6 i 6
S1 T 1 1 Qi
and
Ui = Wi...W1U0 for all 0 6 i 6 .

Proposition 4.9. For all 0 6 i 6 , we have


   
A Ri
1. Ui B
= Ri+1
,
 
Si Ti
2. Ui = Si+1 Ti+1
and Si A + Ti B = Ri (this equality is also true for i = + 1),

3. Si Ti+1 Si+1 Ti = (1)i,


4. gcd (A, B) = gcd (Ri,Ri+1) = R/lc(R).

Proof.
1. By induction on i. For i = 0, its the rst step of the algorithm. We now assume i > 1 and
          
A Ri1 0 1 Ri1 Ri Ri
Ui = Wi = = = ,
B Ri 1 Qi Ri Ri1 Qi Ri Ri+1
38 Rational Approximation

2. A straightforward induction yields the rst equality which, combined with 1 implies Si A +
Ti B = Ri for 0 6 i 6 + 1.
3. We have, by denition, Ui = Wi...W1U0. It follows det (Ui) = det (Wi)...det(W1 )det(U0).
Since det Ui = Si Ti+1 Si+1 Ti, det (W j ) = 1 for all j and det U0 = 1, we obtain
Si Ti+1 Si+1 Ti = (1)i.
4. Let i {0, ..., }. We deduce from 1 that
     
R A Ri
= WWi+1Ui = WWi+1 .
0 B Ri+1

It follows that R is a linear combination over K[x] of Ri and Ri+1. Therefore gcd (Ri,Ri+1)
divides R. Moreover, det Wi = 1 : the matrix Wi is invertible of inverse
 
1 Qi 1
Wi = .
1 0
Hence    
Ri R
1
= Wi+1 W1
Ri+1 0
which implies that R divides Ri and Ri+1. This shows that R is agreatest common divisor
of Ri and Ri+1 and gcd (Ri,Ri+1) =R/lc(R). This is true in particular for i = 0. 

Proposition 4.10. Assume deg A > deg B. Then


X
deg Si = deg Q j = deg R1 deg Ri1 for all 2 6 i 6 + 1,
26 j <i
X
deg Ti = deg Q j = deg R0 deg Ri1 for all 1 6 i 6 + 1.
16 j <i

Only the second statement will be useful to solve P2. We use the rst one to prove Proposition
4.11.

Proof. We rst observe that from the initial assumption,deg R0 > deg R1 and by construction,
deg Ri > deg Ri+1 for 1 6 i 6 . It follows that deg Q1 > 0 and for 2 6 i 6 , deg Qi > 0 since Qj
is the quotient of the division of R j 1 by R j for j = 1, ..., . Therefore we have for 1 6 i 6 ,
deg (Qi Ri) = deg Qi + deg Ri > deg Ri > deg Ri+1, hence deg P Ri1 = deg (Qi Ri + Ri+1) = deg (Qi Ri)
i.e. deg Qi = deg Ri1 deg Ri for 1 6 2 6 . We obtain 26 j <i deg Q j = deg R1 deg Ri1 for all
P
2 6 i 6 + 1 and 16 j <i deg Q j = deg R0 deg Ri1 for all 1 6 i 6 + 1.
We have S2 = S0 Q1 S1 = 1 and deg S1 = < 0 = deg S2. Lets assume that we proved
X
deg S j 1 < deg S j for all 2 6 j 6 i and deg Si = deg Q j
26 j <i

for i {2, ..., }. We have


deg Si+1 = deg (Si1 Qi Si) = deg (Qi Si)
since deg (Qi Si) = deg Qi + deg Si > deg Si > deg Si1, the rst inequality is a consequence of
deg Qi = deg Ri1 deg Ri > 0 since i > 2 and the second one follows from the induction hypothesis.
It comes
X
deg Si+1 > deg Si and deg Si+1 = deg (Qi Si) = deg (Qi) + deg (Si) = deg Q j
26 j <i+1

if we apply the induction hypothesis again. This proves that the induction hypothesis also holds
for i + 1.
We have deg T1 = 0 and T2 = 0 Q1 = Q1, hence deg T2 = deg Q1.
The rest of the proof is identical to that of the rst statement since we can show
X
deg Ti > deg Ti1 and deg Ti = deg Qj
16 j <i
4.3 An extension of Taylor approximation: Pad approximation 39

for all i = 3, ..., + 1. 


Proposition 4.11. The cost of the extended Euclidean algorithm is O(m n) operations in K.
Proof. If deg B > deg A, the rst step then consists of swapping A and B and there is no
arithmetical cost. We assume deg B 6 deg A. From Proposition4.8, we know that the Euclidean
division of a polynomial P by a polynomial Q requires at most (2 deg Q + 1) (deg P deg Q + 1) + 1
arithmetic operations in K. The cost of the Euclidean algorithm (i.e. the computation of the
sequences (Qi)16i6 and (Ri)06i6+1) is therefore upper bounded by the sum

X
((2 deg Ri + 1) (deg Ri1 deg Ri) + 1).
i=1

The degree of each Ri is lesser or equal to deg B =


Pm for i > 1. As the deg Ri are nonincreasing for
i > 1, the cost is then upper bounded by (2 m + 1) i=1 (deg Ri1 deg Ri) + = (2 m + 1) (deg R0
deg R) + . The number is lesser or equal to m + 1 since R1 = B and deg Ri+1 < deg Ri for i > 1.
This yields a cost upper bounded by (2 m + 1) (n deg R) + m + 1 2 m n + n + m + 1 5 m n as
soon as min (n, m) 1.
The computation of Si+1 = Si1 Qi Si requires at most 2 deg Qi deg Si + deg Qi + deg Si + 1
arithmetic operations for the product Qi Si and deg Si+1 + 1 arithmetic operations for the
subtraction. From Proposition 4.10, we deduce that the cost is no larger than
X X
(2 (deg Ri1 deg Ri) (deg R1 deg Ri1) + 2 (deg R1 deg Ri + 1)) 6 (2 (deg Ri1
26i6 26i6

deg Ri) m + 2 m)
P
because deg Ri1 deg Ri > 1 for 2 6 i 6 . This upper bound 2 m 26i6 (deg Ri1 deg Ri) +
2 m ( 1) = 2 m (deg R1 deg R) + 2 m2 6 4 m2 = O (n m) for we assumed n > m.
Likewise the computation of Ti+1 = Ti1 Qi Ti requires at most 2 deg Qi deg Ti + deg Qi +
deg Ti + 1 arithmetic operations for the product Qi Ti and deg Ti+1 + 1 arithmetic operations for
the subtraction.
From Proposition 4.10, we deduce that the cost is no larger than the sum of n m + 1 (for
i = 1) and
X X
(2 (deg Ri1 deg Ri) (deg R0 deg Ri1) + 2 (deg R0 deg Ri + 1)) 6 (2 (deg Ri1
26i6 26i6

deg Ri) n + 2 n)
sincePdeg Ri1 deg Ri > 1 for 2 6 i 6 . The cost is upper bounded by n m + 1 +
2 n 26i6 (deg Ri1 deg Ri) + 2 n ( 1) = n m + 1 + 2 n (deg R1 deg R) + 2 n m 6
n m + 1 + 4 n m 6 n m + 4 n m = 5 n m as soon as m > 1.
The total cost is in O (n m). 
4.3.2.2 Solving the approximation problems P1 and P2
Let (Ri)06i6+1, (Si)06i6+1, (Ti)06i6+1 be the sequences of remainders and Bzout coecient
constructed by the extended Euclidean algorithm applied to the couple (M , P ). We have, see
Proposition4.8,
Ri = Si M + Ti P = Ti P mod M
for all i {0, ..., + 1}. We search for an i such that
deg Ri < k and deg Ti 6 N k.
Proposition 4.10 states that deg Ti = N deg Ri+1. Hence we want deg Ri < k 6 deg Ri1. The
sequence (deg Ri)06i6+1 is strictly decreasing: we have deg R0 = deg M =N > deg P = deg R1 and
(deg Ri)16i6+1 is always strictly decreasing. Since deg R0 = N and deg R+1 = , the integer i
exists: it is the smallest index j such that deg Ri < k. The couple (R j , T j ) is a solution to P2.
We now focus on P1. Weve just found R j and T j such that deg R j < k, deg T j 6 N k and
R j = T jP mod M. If gcd (R j , T j ) = 1, since gcd (R j , T j ) = gcd (M , T j ) = 1, the couple (R j , T j ) is a
solution to P1!
40 Rational Approximation

Conversely, assume that the couple (R, T ) is a solution to P1, with gcd (R, T ) = 1. Let S K[x]
such that R = SM +T P . We also have S jM + T jP = R j . If we suppose that S jT = / ST j , we get
M = (R jT RT j )/(S jT ST j ). Therefore, N = deg M = deg (R j T R T j ) deg (S j T S T j ).
Moreover,
deg (R j T R T j ) 6 max (deg (R j T ), deg (R T j )) = max (deg R j + deg T , deg R + deg T j )
6 max (k 1 + N k, k 1 + N k)
<N ,
from the denition of j and condition P1. Therefore, N = deg M 6 deg (R jT R T j ) < N :
contradiction.
Hence, we deduce T j |S j T . We know from Proposition 4.9 that S j and T j are coprime. This
implies that there exists K[x] such that T = T j , from which follows S T j = S j T = S j T j . As
T j is nonzero, we also obtain S = S j . Finally, R = S M + T P = (S j M + T jPB) = R j . Weve
just proved the following results.

Theorem 4.12.

1. There exists a solution (R, T ) = / (0, 0)to P2, which is (R, T ) = (R j,T j ). If, moreover,
gcd (R j,T j ) = 1, then (R j,T j )is also a solution to P1.
2. If P1 has a solution R/T K[x] with gcd (R, T ) = 1, then there exists K \ {0} such
R = R j and T = T j.
The problem P1 has a solution if and only if gcd (M , T j )is also a solution to P1.

Exercise 4.2. How to modify the extended Euclidean algorithm to answer P1 and in particular the Pad
approximation problem?

Remark 4.13. The cost of the computation of a solution to P1 or P2 is essentially the cost of the
extended Euclidean algorithm. Proposition 4.11 tells us that this cost is in at most O(N 2) arith-
metic operations. Using a fast Euclidean algorithm [27], one can reduce this cost to O(M (N )log N )
arithmetic operations.

4.3.3 Summary for the case of Pad approximation


Let m, nN, f K[[x]], a Pad approximant of type (m, n) for f is a rational fraction P /Q K(x)
such that
P
X| Q, deg P 6 m, deg Q 6 n and = f mod X m+n+1. (4.3)
Q
We established that if (Ri)06i6+1, (Si)06i6+1, (Ti)06i6+1 are the sequences of remainders and
Bzout coecient constructed by the extended Euclidean algorithm applied to the couple (M , P )
and j is the minimum of the set {i J0, + 1K; deg Ri 6 m} then
1. the problem (4.3) has a solution if and only if gcd (R j ,T j ) = 1.
2. If gcd (R j ,T j ) = 1 then R j /T j is the unique Pad approximant of type (m, n) for f .

4.4 Application of Pad approximation to irrationality and


transcendence proofs

It has been known since antiquity that 2 Q. Then Euler proved that e
/ Q, Lambert proved in
/ Q and that ea
1761 that / Q when a Q.

Definition 4.14. A number C is algebraic when there exists a nonzero polynomial P Q[x]
such that P () = 0. It is said to be transcendental otherwise.
4.4 Application of Pad approximation to irrationality and transcendence proofs 41

Nothing was known about transcendence until Liouville proved in 1844 that for a N, a > 2,

X
an! is transcendental.
n=0

The transcendence of this number is related to the very fast convergence of the series. This is also
illustrated in the proof of the irrationality of ea
/ Q when a Q that we now present.

Example 4.15. We have, for all n N,


n +   +
X 1 X 1 1 1 1 1 X 1
e = 6 1+ + + = n+k .
k! (n + 1)! n + 2 (n + 2)(n + 3) (n + 1)!
 
k! k!
k=0 k=n+1 k=1 n+1

Thus, we established, for all n N,


n
X 1 e1
0<e < .
k! (n + 1)!
k=0

Assume that e = p/q with p, q N. Then for n > (e 1)q 1, we have


n
!
X 1 q(e 1)
0 < n! p q < < 1,
k! n+1
k=0
||||||||||||||||||||||||||||{z}}}}}}}}}}}}}}}}}}}}}}}}}}}}
Z
a contradiction!

Exercise 4.3. Try (and fail!) to extend this proof to the case of ea, a N, a > 1.

Actually, weve just applied the following result:

Lemma 4.16. Let x R, if there exist two sequences (pn)and (qn) ZN satisfying the conditions:
1. there exists n0 Nsuch that qnx pn =
/ 0, n > n0 ,
2. limn+ qnx pn = 0,

then x
/ Q.

Proof. Exercise. 

Now lets try to prove that ex


/ Q for any x Q. First we notice that it is sucient to address
P ak
/ Q . We have ea = +
the case a N. Lets try to use again the proof of the irrationality e k=0 k! .
Pn a k
We set pn = n! k=0 k! and qn = n! for all n N. The values of both sequences belong to Z and
P ak
for all n N,qnea pn = n! +
k=n+1 k! > 0. Unfortunately,
 
a an+1 a a2 an+1ea
qne pn 6 1+ + + = + as n + if a > 2.
n+1 n + 2 (n + 2)(n + 3) n+1
Lets have a look at this proof: We started from the Taylor approximation to exp. We truncated
it in order to get the sequence of integers pn and nonzero remainders qnea pn. Unfortunately the
remainder, which comes from the tail of the Taylor series, does not get small enough as n grows.
Then, a fairly natural attempt is to try to get a smaller remainder from the tail of a series associated
to a Pad approximation. If we consider diagonal Pad approximants, that is to say of type
(n, n), we should have again integers pn computed from degree-n polynomials while the remainders
qnea pn will be computed from the tail of the series Qn(z)exp z Pn(z) which converges to 0 as
n + faster than the Taylor series remainder. So, lets now try to compute Pad approximants
for exp.
42 Rational Approximation

We want to determine two sequences (Pm,n) and (Qm,n) Q[X], such that
deg Pm,n 6 m and deg Qm,n 6 n, Qm,n(z)ez Pm,n(z) = 0 mod (z m+n+1). (4.4)
d
We rst determine Qm,n. If D denotes the operator dz , we have D(Q(z)ez) = Q (z)ez + Q(z)ez =
ez(D + I)Q(z). Therefore, if we apply the operator (D + I)m+1 to the last equality of(4.4), it yields

ez (D + I)m+1 Qm,n(z) + Dm+1Pm,n(z) = 0 mod (z n)


||||||||||||||||||||{z}}}}}}}}}}}}}}}}}}}}
=0

which is equivalent to (D + I)m+1 Qm,n(z) = 0 mod (z n). Since deg Qm,n(z) 6 n, there exists
km,n Q such that (D + I)m+1 Qm,n(z) = qm,nz n. Now recall that
+  
1 X
k m+k
= (1) X k.
(1 + X)m+1 m
k=0

This allows us to obtain an explicit formula for Qm,n:


+   n  
X m+k k n X m+k k n
Qm,n(z) =qm,n (1)k D z = qm,n (1)k D z
m m
k=0 k=0
n   
X m+k n
=qm,n (1)kk! z nk.
m nk
k=0
We choose qm,n so that the leading coecient of Qm,nis equal to 1. Hence we have qm,n = 1 and
n   
X m+nk n
Qm,n(z) = (1)n(n k)! (z)k.
m k
k=0

We want Qm,n(z)ez Pm,n(z) O(z m+n+1) which is equivalent to Qm,n(z) Pm,n(z)ez


O(z m+n+1), which is also equivalent to Qm,n(z) Pm,n(z)ez O(z m+n+1). It then follows
m   
X m+nk m k
Pm,n(z) = Qn,m(z) = (1)n(m k)! z .
n k
k=0
m+1
If we use again the relation D (Qm,n(z)ez Pm,n(z)) = z nez and repeated integration by parts,
we get
Z 1
z z m+n+1
Qm,n(z)e Pm,n(z) = tn(1 t)mezt dt.
m! 0

We now assume m = n and we set, for all n N,


qn = Qn,n(a), pn = Pn,n(a).
We have pn and qn Z for any n N. The sequence of general term
2n+1
Z 1
a na
qne pn = (1) (t(1 t))nea t dt
n! 0
satises to the two conditions of Lemma 4.16 (exercise!): the number ea is irrational.
Chapter 5
Numerical approximation using Pad
approximants

5.1 Numerical experiments

5.1.1 Starting from power series


We start with the functions ln (1 + x) and tan (x). The rst one has a radius of convergence 1 and
logarithmic singularity at 1 but no other nite singularity, while the second one has poles at
all the odd multiples of /2. In both cases, we start from a truncation of the Taylor series, that
converges only up to the closest singularity.
> K := 10:
> f := ln(1+x);

ln (1 + x)

> S := series(f, x, K);


1 2 1 3 1 4 1 5 1 6 1 7 1 8 1 9
x x + x x + x x + x x + x + O (x10)
2 3 4 5 6 7 8 9
> app := i -> convert(series(S+O(x^(i+1)), x, K+1), ratpoly):
> app(1), app(2), app(3);
1
x x2 + x
x, 1
,6 2
1+ 2 x 1+ 3 x

> plot([seq(app[i],i=1..K),f],x=-2..5,view=-1..3,scaling=constrained);

On this picture, we observe that the approximants do not converge for x 6 1 (where there is
nothing to converge to), but do seem to converge to f for the other real values of x. Note that
this is achieved by starting from a series that converged only for |x| < 1. Another phenomenon
that appears is that the approximants are alternatively above and below the graph of the function.
This appears more clearly on a specic value, as well as the speed of convergence:

43
44 Numerical approximation using Pad approximants

> eval([seq(app(i), i=1..K)], x=2.);

[2., 1.000000000, 1.142857143, 1.090909091, 1.101449275, 1.098039216, 1.098805646, 1.09857\


0353, 1.098625759, 1.098609242]

> evalf(ln(3));

1.098612289

This gives the interval (1.098609, 1.098626) of width less than 2105 containing ln (3). Observe
again that we have computed these good approximations starting from a power series that con-
verges only in the unit disk!
The case of tan is similar:
> S := series(tan(x), x, K+1);

1 3 2 5 17 7 62 9
x+ x + x + x + x + O (x11)
3 15 315 2835
> eval([seq(app(i), i=1..10)], x=evalf(Pi));

[3.141592654, 3.141592654, 3.141592654, 1.371953523, .3645065196, .3645065196, .36450\


65196, .8193036257e 1, .1102048086e 1, .1102048086e 1]

Since the function tan is odd, Pad approximants are identical by pairs. Convergence still
holds, even beyond the disk of convergence of the power series they are constructed from. Here
is a picture illustrating this phenomenon with approximants constructed from the expansion at
order 20 of the series:

We now try something more unsettling: using Pad approximants to evaluate a divergent series!
> S := add((-1)^n*n!*x^n, n=0..K);

1 x + 2 x2 6 x3 + 24 x4 120 x5 + 720 x6 5040 x7 + 40320 x8 362880 x9 + 3628800 x10

> eval([seq(app(i), i=1..10)], x=2.);

[1., .6000000000, .1428571429, .5135135135, .3538461538, .4847645429, .4167776298, .473056\


6384, .4404474877, .4676639926]

> evalf(subs(x=2,Int(exp(-t)/(1+t*x),t=0..infinity)));
0.4614553164

Again, the aproximations seem to converge, and it turns out that the limiting value has a
meaning. In that case, the power series is the (divergent) asymptotic expansion of the following
function: Z
et
dt,
0 1+xt
5.1 Numerical experiments 45

which is a function of x correctly evaluated by the Pad approximant!

All these examples motivate a better study of Pad approximants and their convergence prop-
erties from the numerical point of view.

5.1.2 Acceleration of convergence


In the context of acceleration of convergence, one is given the rst elements S0, S1, ..., SN of
a sequence supposed to converge in a certain sense to a limit S. The aim is to compute an
approximation of S better than SN , by exploiting the regularity of the sequence. A natural
relation to Pad approximants is obtained by considering the power series S0 + (S1 S0) x +
(S2 S1) x2 + , whose value at 1 is desired.
Huygens and the calculation of . Archimedes gave one of the rst good approximations
of . He constructed two regular polygons, one inscribed and one circumscribed to the unit circle.
Letting their number n of sides increase produces converging enclosures of the form

n sin < < n tan .
n n
From there, Archimedes realized that one can compute simultaneously two sequences sin (/2k)
and tan (/2k) using the relations
1 1 1 x 1
x = + and sin2 = 1 .
tan 2 tan x sin x 2 1+ x
tan2 2

Starting from there and using = /3, Archimedes was able to compute the values up to k = 5,
corresponding to polygons with 96 edges! Let us rst look at the values he must have obtained.

> alpha := Pi/3:


> K := 5:

> S1 := evalf([seq(3*2^k*sin(Pi/(3*2^k)), k=1..K)]);

[3., 3.105828541, 3.132628613, 3.139350203, 3.141031953]

> S2 := evalf([seq(3*2^k*tan(Pi/(3*2^k)), k=1..K)]);

[3.464101616, 3.215390309, 3.159659942, 3.146086215, 3.142714601]

These estimates show an accuracy of about four correct digits.


Many years later, Huygens applied acceleration of convergence to this sequence to get more
digits. He did it basically using Eulers method described below. But let us do it slightly dierently,
by converting to Pad approximants.
> S := S1[1] + add((S1[i+1]-S1[i])*x^i, i=1..K-1);

3. + .105828541 x + .26800072e 1 x2 + .6721590e 2 x3 + .1681750e 2 x4

> convert(series(S, x, K), ratpoly);

2.999999999 .8330876223 x + .4090840118e 1 x2


.9999999998 .3129720544 x + .1574323503e 1 x2
> eval(%, x=1);

3.141592654

>
46 Numerical approximation using Pad approximants

Increasing the precision shows that an accuracy of about 10 digits has been obtained, starting
from the same first 5 values!

We now review a few classical methods in acceleration of convergence, before relating them to
Pad approximants.
Eulers method. A rst idea is the following. Assume that
Sn = S + r n (n)
for some known r < 1 and (n + 1)/(n) 1. Writing
Sn+1 = S + r n+1 (n + 1),
we can see that  
Sn+1 r Sn r n+1 (n)
Tn = = S + (n + 1) 1
1r 1r (n + 1)
should converge faster, since the last factor tends to 0. This is the idea that was used by Huygens
in 1654 who used r = 1/2 to obtain the rst estimate of with 15 digits of accuracy.

Aitkens 2 method. When r is not available, it can be estimated, starting from


Sn = Sn+1 Sn = r n (r (n + 1) (n)),
Sn+1 = Sn+2 Sn+1 = r n+1 (r (n + 2) (n + 1)),
whence by division
Sn+1
r.
Sn
Eulers method with r replaced by the this estimate gives:
Sn+1
Sn+1 Sn (Sn) Sn+1 (Sn+1) Sn (S )2
Tn(1) = Sn
Sn+1
= 2
= Sn 2 n .
1 Sn Sn
Sn

Shanks method. Shanks formula is a generalization of Aitkens formula. Observe that Aitkens
formula rewrites as
Sn Sn+1

(1) Sn+1 Sn+2
Tn = .
2 Sn
In a way, Aitkens method eliminates one unknown r. Shanks method eliminates k of them
simultaneously. It is based on the more general

Sn Sn+k





Sn+k Sn+2k
Tn(k) =

.
2 Sn 2 Sn+k 1




2
Sn+k 1 2 Sn+2k 2

The relation to Pad approximants is very strong:

Proposition 5.1. If Sn = u0 + u1 x + + un xn, then Tn(k) is a (n + k, k) Pad approximant to


Sn+2k.

Proof. We prove only the case k = 1, corresponding to Aitkens 2-method. We construct a


(n + 1, 1) approximant of Sn+2 of the form
A(x)
= Sn+2 + O(xn+3),
1 b1x
with A(x) a polynomial of degree at most n + 1. Multiplying both sides by 1 b1x, we obtain
A(x) = Sn+2 (1 b1x) + O(xn+3). (5.1)
5.2 Continued Fractions 47

Extracting the coecient of xn+2 on both sides gives


0 = un+2 b1un+1.
Since un+1xn+1 = Sn+1 Sn = Sn, we thus obtain
Sn+1
b1 x = .
Sn
Finally, we obtain A(x) by truncating explicitly (5.1) at precision xn+2, leading to
Sn+1 Sn+1
A(x) = Sn+1(1 b1 x) + un+2 xn+2 = Sn+1 (Sn + Sn) + Sn+1 = Sn+1 Sn.
Sn Sn
(1)
This is exactly the formula for Tn .


Slow convergence. In cases of slow convergence of the form


a b
Sn = S + + +
n n2
the previous methods break down since there is no r < 1 to get rid of. However, a simple idea that
works is to use the above methods to the subsequence S2n. Examples include
the computation of Eulers constant
n
!
X 1
= lim log n ;
n k
k=1

integration by the trapezoidal rule (in this context, Eulers acceleration scheme is called
Rombergs method).
Wynns -algorithm. Shanks method seems to require the computation of large determinants,
which is computationally expensive and dicult to organize in a way so as to get successive
approximants eciently. A simple method due to Wynn in 1956 proceeds as follows:
(n) (n) (n+1) (n+1) 1
1 = 0, 0 = Sn , k+1 = k 1 + (n+1) (n)
.
k k
We admit the following result.
(n) (k)
Proposition 5.2. Wynns 2k is Shanks Tn .

In other words, this method gives a way to compute (values of) Pad approximants without
any heavy linear algebra.

5.2 Continued Fractions


All the examples of the previous sections show how eective Pad approximants can be numerically.
However, so far we do not have any proof (or even explanation) for their behaving so nicely. We
now concentrate on diagonal Pad approximants, which in turn are related to the classical and
beautiful topic of continued fractions.

5.2.1 Definition and notation


P Q
In the same way that we are familiar with the symbols i=1 or i=1 , we dene

a1
K (ai/bi) = a2 .
i=1 b1 + a
b2 + 3

48 Numerical approximation using Pad approximants

= C {} to
The general framework is that we are given a sequence (k) of mappings from C
itself, we dene
n = 1 2 n ,
and study the limit of the sequence (n(c)) for an appropriate value of c.
P
In the case n: t 7 t + an, we recover n(0) = ni=1 ai.
Q
Similarly for n: t 7 an t, we have n(1) = ni=1 ai.
an
The notation dened above corresponds to n: t 7 b , and then we have
n+t
n

K (ai/bi) = n(0).
i=1


Definition 5.3. The n-th convergent to the continued fraction C = Kn=1 (an/bn) is by definition
n(0) = An/Bn. The elements an and bn are called the nth partial numerator and partial denom-
inator of C.

5.2.2 Mbius transformations and the Riemann sphere


The study of convergence properties of continued fractions is made easy by a geometric point of
view on the mappings n and their action on the complex plane.
The Riemann sphere
We can view C = C {} as a sphere as follows. Let S be the unit sphere of C R =R ,
3

and let N be its north pole. Consider the stereographic projection that maps a point z C
to the intersection of S\{N } and the line that goes through z and N , and maps to N .
The coordinates are given by
2 z 2 z |z |2 1
x1 = , x2 = , x3 = .
|z |2 + 1 |z |2 + 1 |z |2 + 1

Lines in C correspond to circles (passing through the north pole) on the sphere. This can
be seen by observing that this circle is the intersection of the sphere with the plane passing
through the line and containing N .
Circles on the sphere correspond to either lines or circles on the plane. To see this, consider
the circle dened as the intersection of the sphere with the plane
a1 x1 + a2 x2 + a3 x3 = b,
where without loss of generality we may assume a21 + a22 + a23 = 1 and 0 6 b < 1. (The scalar
product of both vectors is smaller than the product of their norms, equality to 1 would
correspond to a circle reduced to one point, which we exclude). Now, if the corresponding
point is z = x + i y, we have
2 a1x + 2 a2 y + a3 (x2 + y 2 1) = b (x2 + y 2 + 1).
If b = a3, this is the equation of a line (and the north pole (0, 0, 1) is on the circle). Otherwise,
we have a circle with a positive radius: the equation rewrites
 2  2
a1 a2 a +b a2 + a22 a2 + a22 + a23 b2
x+ + y+ = 3 + 1 2
= 1 .
a3 b a3 b a3 b (a3 b) (a3 b)2

The inversion map


1 xiy
z = x + i y 7 =
z x2 + y 2
maps generalized circles (circles on the sphere, lines and circles in C) to generalized circles.
Indeed, if (x1 , x2 , x3 ) are the coordinates of the image of 1/z on the sphere, we have
2x 1
x1 = = x1
|z |2 |1/z |2 + 1
5.2 Continued Fractions 49

and by a similar computation x2 = x2 and x3 = x3. Thus an inversion corresponds to a


rotation of the sphere, which obviously preserves circles, then mapped to generalized circles
on C.
Mbius transformations. These are mappings of the form
az+b
t: z 7 , adbc=
/ 0.
cz+d
By partial fraction decomposition, we see that
for c = 0,  
a b
t(z) = z+
d a
is the composition of a translation and a scaling;
for c =
/ 0,
a bcad 1
t(z) = +
c c2 d
z+
c
is the composition of a translation, and inversion, a scaling and another translation.
In particular, Mbius transformations map generalized circles to generalized circles.
Exercise 5.1. Show that Mbius transformations form a group.

5.2.3 Basic properties of continued fractions

Convergents are compositions of Mbius transformations, and hence are themselves Mbius
transformations.

Theorem 5.4. Write


An + rn u
n(u) = 1 2 n(u) = .
Bn + s n u
Then, we have rn = An1 , sn = Bn1 and the recurrence relation
     
An An2 An1
= an + bn , n>1
Bn Bn2 Bn1

with A1 = 1, B1 = 0, A0 = 0, B0 = 1.

Proof. It is a simple induction. For n = 1, we have 1(u) = a1/(b1 + u). Thus in that case, the
property holds. If it holds up to n 1, then a straightforward computation yields
an
An1 + An2 b anAn2 + bnAn1 + An1u
n+u
n(u) = n1(n(u)) = an = ,
Bn1 + Bn2 b anBn2 + bnBn1 + Bn1u
n+u

which concludes the proof. 

5.2.4 Relation to Pad approximants


Let
f (z) = f [0](z) = c[0] [0]
0 + c1 z + , c[0]
0 = / 0.
Then !
1 1 1 1
[0]
f (z) = , and [1]
f (z) = [0] = c[1] [1]
0 + c1 z + .
1 [1] z [0]
f (z) c0
[0]
+ z f (z)
c0
50 Numerical approximation using Pad approximants

Provided that at each step c[k]


0 =/ 0, we thus dene a sequence f [k] by
!
1 1 1
f [k+1](z) = , c[k] [k]
0 = f (0),
z f [k](z) c[k]
0

and f (z) has the following continued fraction expansion:


1
f (z) = .
1 z
+
[0]
c0 1 z
+
[1]
c0

Theorem 5.5. Provided that all the c[i]


0 are nonzero, to any power series corresponds a unique
continued fraction, and
A2m
= 2m(0)
B2m
is a (m 1, m) Pad approximant of f (z), while
A2m+1
= 2m+1(0)
B2m+1
is a (m, m) Pad approximant of f (z).

In other words, diagonal Pad approximants are obtained as convergents of continued fractions.
   
Proof. Using Theorem 5.4 with 0(w) = 1/ 1/c[0] [n1]
0 + w and n(w) = z/ 1/c0 + w for n > 1,
[0]
gives 1(0) = c0 and      
An+2 An 1 An+1
=z + [n+1] , n > 0.
Bn+2 Bn c Bn+1
0

[0]
Thus by induction starting from A0 = 0, A1 = c0 , B0 = 1, B1 = 1, we get
deg A2n 6 n 1, deg A2n+1 6 n, deg B2n 6 n, deg B2n+1 6 n,
so that these polynomials have the required degrees. Also by induction from the recurrence we
obtain that Bn(0) =/ 0. We now show that they provide Pad approximants to the expected
precision. We have 1(0) = f (0) showing that A1 B1 f (z) = O(z). Next, inverting
An + An1u
n(u) =
Bn + Bn1u
yields
An Bnn(u)
u= .
An1 Bn1n(u)

In view of f (z) = n z f [n](z) , we deduce
An Bnf (z)
z f [n](z) = = O(z)
An1 Bn1 f (z)

and by induction from there An Bnf (z) = O(z n). The conclusion follows since Bn(0) =
/ 0. 

5.2.5 Changes of representation


The representation of a continued fraction by partial numerators and partial denominators is not
unique. Two continued fractions C and C are called equivalent if they have the same sequence of
convergents. This is denoted C C . Depending on the context it is sometimes more convenient to
use dierent variants with only one family of parameters. A starting point is the following simple
lemma.
5.2 Continued Fractions 51

Lemma 5.6. For an arbitrary sequence of nonzero c0 = 1, c1, c2, ..., we have

ai ci1ciai
K
i=1 bi
=
i=1
K cibi
.

Proof. The rst convergents are both equal to a1. Next, by induction from Theorem 5.4, we obtain
that the convergents An/Bn and An /Bn of the left-hand and right-hand sides are related by
An = c1cnAn , Bn = c1cnBn. 

Appropriate choices of the sequence cn give the following useful special cases.

Corollary 5.7. The following one-parameter families are equivalent to Ki=1(ai/bi):

1 1
bi1 bi ai 1
K
i=1 1
and K
i=1 bici
,
with
a1a3a2i1 a2a4a2i
c2i = , c2i+1 = .
a2a4a2i a1a3a2i+1

Thus from now on, we can use partial numerators or denominators equal to 1 depending on
our needs.

5.2.6 Simple continued fractions



This is the name given to continued fractions of the form Ki=1(1/bi) with bis that are positive
integers. This was not discussed during the course, but many nice results are known:
the convergents An/Bn are such that An and Bn are relatively prime;
the fraction is terminating (i.e., has a nite number of terms) if and only if it is the expansion
of a rational number;
a proof of the irrationality of e = exp (1) can be deduced from this fact and the continued
fraction of exp given below;
the rst proof of the irrationality of was obtained from this fact and the continued fraction
of tan given below by Lambert in 1761 (using the fact that tan (/4) = 1 is rational and
thus must have a nite simple continued fraction);
continued fraction expansion gives the best rational approximations to irrational numbers;
the worst-case complexity for the Euclidean algorithm computing the greatest common
divisors of two integers occurs when these integers are two consecutive Fibonacci numbers,
which is related to the continued fraction of the golden ratio;
the application of continued fractions to the Gregorian calendar.
For this and much more, we refer to the chapter on Continued Fractions in the classical and very
accessible book by Hardy & Wright [13].

5.2.7 Hypergeometric series and recurrences of order 2


This section has not been treated during the course. It provides several explicit continued fractions
that are used elsewhere in the examples, but can be admitted with no loss.
Given a sequence (fk(z)) of power series that satisfy a 2nd order linear recurrence
fk(z) + bk(z)fk+1(z) + ak+1(z)fk+2(z) = 0, (5.2)
dividing by fk+1(z) and reorganizing terms gives the begining of a continued fraction expansion
under appropriate conditions:
fk(z) a (z)
= bk(z) + fk+1(z) .
fk+1(z) k+1
fk+2(z)
52 Numerical approximation using Pad approximants

Sucient conditions for this process to provide an explicit continued fraction expansion for f0(z)/
f1(z) (or for further fk(z)/fk+1(z)) are that bk(0) =
/ 0 while ak+1(0) = 0. In that case unrolling this
recurrence relates successively the pairs of indices (0, 1), (1, 2), (2, 3), ... and leads to
f0(z) a1(z)
= b0(z) + . (5.3)
f1(z) a2(z)
b1(z) +
a (z)
b2(z) + 3

In some cases, rewritings using Corollary 5.7 may lead to nicer formul.
Hypergeometric series
A large source of such sequences of power series is provided by hypergeometric series, that have
as special cases many classical elementary and special functions. The general framework is the fol-
lowing. Start with a parameterized power series f (; z) that satises a 2nd order linear dierential
equation with coecients in Q(, z). Assume moreover that the shifted power series f ( + 1; z)
satises an identity of the form
f ( + 1; z) = u(; z)f (; z) + v(; z)f (; z),
where the derivative is with respect to z. Then eliminating f (; z) between f (; z), f ( + 1; z),
f ( + 2; z) provides a recurrence like (5.2).

5.2.7.1 The hypergeometric 0F1(; z)


/ 0, 1, 2, ... as
The hypergeometric series 0F1(; z) is dened for =

1 z 1 z2 1 z3
0F1(; z) = 1 + + + + .
1! ( + 1) 2! ( + 1)( + 2) 3!
Since its sequence of coecients satises n( + n 1)un+1 = un, the series satises a 2nd order
linear dierential equation. Also, by simple inspection,
1
0F1 (; z) = 0F1( + 1; z).

Thus by the reasoning above, there exists a linear dependency between 0F1(; z), 0F1( + 1;
z), 0F1( + 2; z). Linear algebra (or a direct examination) yields
z
0F1(; z) = 0 F1( + 1; z) + 0 F1( + 2; z).
( + 1)
The valuation conditions are satised and thus we get a continued fraction expansion for 0F1(;
z)/0F1( + 1; z).

Example 5.8. (tan) As a special case, observe that


cos (z) = 0F1(1/2; z 2/4), sin(z) = z 0 F1(3/2; z 2/4).
From there, the following continued fraction expansion of
tan (z) = z/(0F1(1/2; z 2/4)/0F1(3/2; z 2/4))
1 3 
follows: in (5.3), we have bk(z) = 1 and ak+1(z) = z 2/ 4 k + 2 k + 2 , whence
z
tan (z) = 2 . (5.4)
z /(13)
1
z 2/(35)
1
z 2/(57)
1

Lamberts proof of the irrationality of follows from the fact that if itself was rational then we
would have a nonterminating simple continued fraction (after rewriting) for tan /4 = 1 which is a
rational number. That is a contradition.
5.2 Continued Fractions 53

5.2.7.2 The hypergeometric 1F1(; ; z)

/ 0, 1, 2, ... by
With one more parameter, we consider the hypergeometric series dened for =

z ( + 1) z 2 ( + 1)( + 2) z 3
1F1(; ; z) = 1 + + + + .
1! ( + 1) 2! ( + 1)( + 2) 3!

By the same reasoning this series satises a linear dierential equation of order 2 and each of
1F1( + 1; ; z) and 1F1(; + 1; z) can be rewritten as linear combination of 1F1(; ; z) and its
derivative. It follows that any three among the shifts 1F1( + p; + q; z) with integers p, q are
linearly dependent. A direct use of the same method as before leads to the slightly unpleasant
 
z ( + 1) z
1F1(; ; z) = 1 1F1( + 1; + 1; z) + 1 F1( + 2; + 2; z).
( + 1)

Although the valuation conditions are satised and we get a continued fraction expansion for 1F1(;
; z)/1F1( + 1; + 1; z), this is not really a nice one because of the presence of the variable z both
in the partial numerators and denominators.
Instead, using the following two identities

1F1(; ; z) = 1F1( + 1; + 1; z) +z 1 F1( + 1; + 2; z), 1F1(; ; z) = 1F1(; + 1;
( + 1)

z) +z 1 F1( + 1; + 2; z).
( + 1)

in alternance relates successively the indices (, ), ( + 1, + 1), ( + 1, + 2), ( + 2, + 3),


( + 2, + 4), ... and produces the following continued fraction:

z
1F1(;
; z) ( + 1)
=1+ .
1F1( + 1; + 1; z) +1
z
( + 1)( + 2)
1+
1+
z
( + 2)( + 3)
1+
+2
z
( + 3)( + 4)
1+

Example 5.9. (exp) Specializing at = 0 gives a simplication of the rst partial numerator
that makes the expression hold even for = 0. Then, the (obvious) special values

1F1(0; ; z) = 1, 1F1(1; 1; z) = exp (z)

provide a continued fraction expansion for exp (z):


z
exp (z) = 1 + , (5.5)
z/(21)
1+
z/(23)
1+
z/(23)
1+
z/(25)
1+
1 +

taking the inverse of which yields a continued fraction for exp (z).

5.2.7.3 The hypergeometric 2F1(, ; ; z)


With again one more parameter, we obtain Gauss hypergeometric series

z ( + 1)( + 1) z 2 ( + 1)( + 2)( + 1)( + 2) z 3


2F1(, ; ; z) = 1 + + + + .
1! ( + 1) 2! ( + 1)( + 2) 3!
54 Numerical approximation using Pad approximants

As before, this series satises a second order linear dierential equation and all the shifts of its
parameters belong to the same vector space of dimension 2, so that any three of them are linearly
dependent. We use the following linear dependency
( )
2F1(, ; ; z) = 2F1(, + 1; + 1; z) +z 2F1( + 1, + 1; + 2; z),
( + 1)
and that obtained by exchanging the roles of and . Using them in alternance relates successively
the indices (, , ), (, + 1, + 1), ( + 1, + 1, + 2), ( + 1, + 2, + 3), ( + 2, + 2, + 4), ...
and produces the nice continued fraction
( )
2F1(,; ; z) z ( + 1)
=1+ ( + 1)( 1)
2F1(, + 1; + 1; z) z ( + 1)( + 2)
1+ ( + 1)( 1)
z ( + 2)( + 3)
1+ ( + 2)( 2)
z ( + 3)( + 4)
1+

discovered by Gauss himself. The special value

2F1(, 0; ; z) = 1
gives a continued fraction for the whole 2-parameter family
( + 1) 2 ( + 1)( + 2) 3
2F1(, 1; ; z) = 1 + z+ z + z +
( + 1) ( + 1)( + 2)
under the form
1
2F1(, 1; ; z) = .
z ( + 1)
1+ 1(1 + )
z ( + 1)( + 2)
1+ ( + 1)( + 1)
z ( + 2)( + 3)
1+ 2(2 + )
z ( + 3)( + 4)
1+

Several classical functions are obtained as special cases, such as arctan (z), (1 + z) and log (1 + z).

Example 5.10. (log(1+z)) The special case log (1 + z) = z 2 F1(1, 1; 2; z) leads to the explicit
form
z
log (1 + z) = 12 . (5.6)
z 23
1+ 12
z 34
1+ 23
z 45
1+ 23
z
1 + 56

5.3 Convergence
Up to now, we have observed many properties of continued fractions or other Pad approximants
and witnessed their great numerical properties, but we are still missing an explanation for this
observed convergence. We now give such results for several classes of continued fractions.

5.3.1 Series expression


The study of the convergence of a continued fraction can often be reduced to the convergence of
a numerical series by the following formula.
5.3 Convergence 55


Proposition 5.11. The nth convergent An/Bn of the fraction Ki=1(ai/bi) is
a1 aa (1)n+1a1an
n(0) = 1 2 + + (5.7)
B1 B1B2 Bn1Bn
provided all the Bis are nonzero.

Proof. For n = 1 the result is a1/b1. Next, consider the dierence


An+1 An An+1Bn AnBn+1
= .
Bn+1 Bn BnBn+1
Theorem 5.4 shows that
    
An+1 Bn+1 bn+1 an+1 An Bn
= .
An Bn 1 0 An1 Bn1
Taking determinants then gives
An+1Bn AnBn+1 = an+1(AnBn1 An1Bn) = = (1)n+1an+1a1(A1B0 A0B1)
whence the result in view of the initial conditions given in Theorem 5.4. 

5.3.2 Positive coefficients


The series representation (5.7) leads to simple consequences when the partial numerators and
denominators are positive.

Corollary 5.12. When ai and bi are positive for all i, then the sequence Cn = n(0) of convergents

to Ki=1 (ai/bi) satisfies
C2 < C4 < C6 < < C2n < C2n+1 < C2n1 < < C1.
As a consequence, both sequences (C2n) and (C2n+1) converge. When they have the same limit, it
belongs to the intervals (C2n , C2n+1).

Proof. First, the positivity of the partial numerators and denominators induces the positivity of
the numerators and denominators of the convergents, by Proposition 5.4. Next, the sum of two
consecutive terms in the series (5.7) is
 
(1)n+1a1an (1)na1an+1 (1)n+1a1an 1 an+1
Cn+1 Cn1 = + = .
Bn1Bn BnBn+1 Bn Bn1 Bn+1
Positivity of bn+1 and Bn and the recurrence of Proposition 5.4 imply that Bn+1 = an+1Bn1 +
bn+1Bn > an+1Bn1 so that the last factor in the equality above is positive. It follows that the sign
of Cn+1 Cn1 is that of (1)n+1, which concludes the rst part of the proof. Then (C2n+1) is a
bounded increasing sequence, so that it converges, and similarly for (C2n). The last statement is
clear. 

We now (at last!) give our rst criterion for the convergence of continued fractions (and therefore
also for Pad approximants).

Proposition 5.13. When Pbi > 0 for all i, the continued fraction Ki=1(1/bi) converges if and only
if the numerical series i=1 bi is divergent.

Proof. The general term of the alternating series (5.7) is 1/BnBn+1 which is decreasing when
n increases. Thus the convergence of the continued fraction is equivalent to BnBn+1 . The
recurrence gives
BnBn+1 = bnBn2 + Bn1Bn = = bnBn2 + bn1Bn1
2
+ + b1B12 > (b1 + + bn) min (Bi2).
P
This proves that divergence of the sum i=1 bi implies convergence of the continued fraction.
Conversely,
Bn+1 + Bn = (1 + bn)Bn + Bn1 6 (1 + bn)(Bn + Bn1) 6 6 (1 + bn)(1 + b1)b1 6 b1eb1 ++bn.
56 Numerical approximation using Pad approximants

Since BnBn+1 6 (Bn+1 + Bn)2/4, if the sum of the bis is convergent with limit , then the last
term above is bounded by b1e and therefore the sequence BnBn+1 is bounded, which prevents the
series (5.7) from converging. 

Example 5.14. (log(1+z ) for z >0). The continued fraction for ln (1 + z) given in (5.6) has
positive coecients. A change of representation following Corollary 5.7 shows that this fraction

is equivalent to Ki=1 (1/bi) with b2i

= 2/i and b2i+1 = (2i + 1)/z. Thus, the sum diverges for any
z > 0 and the continued fraction converges over all of R+. This proves the behaviour observed at
the begining of this chapter, together with the fact that successive convergents provide smaller and
smaller intervals containing the limiting value.

5.3.3 Fractions with complex coefficients


A criterion that is often easy to check is provided by the following.

Theorem 5.15. [Worpitzky (1865)] If |ak | 6 1/4, for k = 1, 2, ... then the continued fraction

C = Kk=1 (ak/1) converges. Its limit C satisfies |C | 6 1/2 and the nth convergent obeys |n(0)
1
C | 6 2n + 1 .

Proof. Recall the notation n(u) = 1 2 n(u), where here k(u) = ak/(1 + u). The proof is
based on the fact that the image of a disk by the ks is a disk. Moreover, if D(, r) denotes the disk
of center and radius r, the hypothesis on ak will be shown to imply that k(D(0, 1/2)) D(0, 1/2).
Then the sequence
D0 = D(0, 1/2), D1 = n(D0), ..., Dn = 1(Dn1)
will be proved to have radii decreasing to 0 when n increases. Now, n+k(0) n(D0) = Dn implies
that the sequence (n(0)) is a Cauchy sequence, whence its convergence.
We now prove the required bounds. First, the image of D(, r) by : u 7 a/(1 + u) is computed
by a sequence of translation, inversion, scaling, given by
 
r
c + D(, r) = D(c + , r), D(, r) = D(, ||r), 1/D(, r) = D , .
||2 r 2 ||2 r 2
This gives
(D(, r)) = a/(1 + D(, r)) = a/D(1
 + , r)   
1 + r a(1 + ) |a|r
=a D , =D , .
|1 + |2 r 2 |1 + |2 r 2 |1 + |2 r 2 |1 + |2 r 2
 
4a 2|a|
In particular, when = 0 and r = 1/2 we obtain (D0) = D 3 , 3 which is indeed a subset
of D0 when |a| 6 1/4. For r < |1 + |, the following inequalities are consequences of the decrease
of x/(x2 r 2) for x > r:

a(1 + ) 1 || |a|r r
|1 + |2 r 2 6 |a| (1 ||)2 r 2 , 6 |a| . (5.8)

2
|1 + | r 2 (1 ||)2 r2
The right-hand sides of these inequalities are increasing functions of || and r as long as r < 1 ||
and thus provide the basis for an induction: if Dk = D(k , rk), then upper bounds on |k | and rk
produce upper bounds on k+1 and rk+1. In particular, the inequalities
k 1
|k | 6 , rk 6
2k + 1 2(2k + 1)
are valid for k = 0 and follow by induction using (5.8). This concludes the proof. 

Example 5.16. (log(1+z ) for complex z). In (5.6), the partial denominators are 1, while the
partial numerators are given by
k (k + 1) k(k + 1)
a2k = z, a2k+1 = z.
(2k)(2k + 1) (2k + 1)(2k + 2)
5.3 Convergence 57

This shows convergence of the continued fraction over the disk |z | 6 1.

Example 5.17. (exp(z )). The continued fraction (5.5) is in the desired form with a1 = z and
1 1
a2k = z, a2k+1 = z, k > 0.
2(2k 1) 2(2k + 1)
In this example, the partial numerators are all of modulus smaller than 1/4 if and only if the rst
one is, which implies |z | 6 1/4. However, since the sequence of partial numerators decreases to 0,
as soon as k > |z |, the continued fraction k k+1 is convergent with a limit of modulus at
most 1/2. From there follows that the whole fraction converges for arbitrary z C provided the k
poles of the kth convergent k(w) are avoided, for k = |z |.

Example 5.18. (tan(z )) A continued fraction for tan (z) is given in (5.4). It has |ak+1(z)| =
1 3 
|z |2/ 4 k + 2 k + 2 so that the continued fraction converges starting at the index k = |z |. As
above, from there follows that the whole fraction converges for arbitrary z C provided the k poles
of the kth convergent k(w) are avoided. This explains the behaviour observed on this example
in 5.1.1.

5.3.4 Speed of convergence


Worpitzkys theorem only assumes that the partial numerators are bounded by 1/4. When, as is
the case for exp (z) (Example 5.17), they tend to 0, more precise information can be obtained by
using the speed of convergence to 0. This is exemplied in the following result, that we admit.

Proposition 5.19. [Jones & Thron 1990] If, for m > n > 1, (an) satisfies
1
|am | < n <
4

then for any nonnegative n and k, the convergents of the continued fraction Kk=1 (ak/1) obey
Qn
1 1 4n j =1 j
|n+k(0) n(0)| 6 Q .
1 4n + 1 4n n1 j =1 (1 3 j )

Example 5.20. (exp(z ) again). We restrict ourselves to |z | < 1/4, but by the reasoning in
Example 5.17, this can be extended to larger values of |z |. For those values of |z |, the sequence
dened by 1 = |z |, and n = 1/(8n 8) for n > 1 satises the constraints an gives the bound
q
1/2
1 1 n1 8 nc
q Q n1 n
8 n!
1 n 1 + 1 n 1 (n 1) j =1 (8j 11)
1/2 1/2

for a certain constant c. Thus the convergence is extremely fast in this case.

5.3.5 Stieltjes series


There is a beautiful theory due to Stieltjes that relates continued fractions to orthogonal poly-
nomials and measure theory. The series that are concerned have a continued fraction of the

type Kk=1 (ak z/1) with nonnegative ak, so that the results of 5.3.2 apply for real positive z.
They also possess a strong structure that makes them suciently regular to exhibit very strong
convergence properties.
P
Definition 5.21. A power series U (z) = n>0 un(z)n is a Stieltjes series when there exists a
function : R+ R+ such that Z
un = tn(t) dt, n > 0.
0
58 Numerical approximation using Pad approximants

A rst (and clear) observation is that such a series is the asymptotic expansion of the function
Z
(t)
f (z) = dt
0 1 +tz
as z 0 with |arg z | < . This can be seen by integrating by parts. We admit the following powerful
result that summarizes the properties of these series.

Proposition 5.22. Let U (z) be a Stieltjes and Kk=1 (ak z/1) its corresponding continued fraction.
Then,
the coefficients ak are nonnegative;
P 1/2n
if an diverges then the sequence (n(0)) of convergents is convergent;
if (n(0)) converges for a value of z0 C \ R , then it is convergent for all z C \ R.

Example 5.23. With (t) = et, we have un = n! and thus the divergent series of our introduction
is the asymptotic expansion of the function
Z
et
F (z) = dt.
0 1+tz

This integral is convergent for all z C \ R. It has a continued fraction expansion with
coecients an = n/2. Thus the series in the second condition of the proposition converges and
the continued fraction is convergent in the whole slit complex plane.
Chapter 6
Orthogonal polynomials - Chebyshev series

6.1 Orthogonal polynomials


Let (a, b) R be an open interval (note that, in this section, it does not need to be bounded),
and let w be a weight function, that is to say w: (a, b) (0, ) is a continuous function (this last
hypothesis is not strictly necessary, we use it for ease of presentation). We assume
Z b
n N, |x|n w(x) dx < .
a

This is the case, for instance, if (a, b) is bounded and


Z b
w(x) dx < .
a
Let ( !1/2 )
Z b
E(w) = f C((a, b)) : kf k2 := f (x)2 w(x) dx < .
a

Observe that R[x] E(w). The space E(w) is equipped with an inner product
Z b
hf , g i = f (x) g(x) w(x) dx;
a

and kk2 is the norm associated to this inner product.

Definition 6.1. A family of orthogonal polynomials associated with w is a sequence (pn) R[x]N
where deg pk = k for all k, and
i=
/j hpi , p j i = 0.

Theorem 6.2. For any weight w, there exists a family of orthogonal polynomials associated with w.
If additionally we request that the pk are all monic, this family is unique.

Proof. We use Gram-Schmidt orthogonalization. Starting with p0 = 1, we iteratively construct


polynomials pk obeying the three conditions for an orthonormal family:
pk is monic;
SpanR{p0, ..., pk } = Rk[x];
pk SpanR{p0, ..., pk1}.
In view of these conditions, the polynomial pk is necessarily of the form
k
X 1
pk = x k + k,j pj , k, j R.
j =0

The third condition above is equivalent to the system




0 = hpk , p j i = xk , pj + k,j kp j k22, j = 0, ..., k 1.

59
60 Orthogonal polynomials - Chebyshev series

The unique solution to this system is


hxk , pj i
k,j = .
kp j k22
Thus uniqueness is established and the polynomial pk thus constructed has the required proper-
ties. 

The following statement gives us a way to recursively compute a sequence of orthogonal polyno-
mials. Note also that if you adapt Clenshaws method (cf. 2.5) to this recurrence, it also yields an
evaluation scheme in linear time for polynomials expressed in the corresponding basis of orthogonal
polynomials.

Theorem 6.3. The polynomials (pn)nN satisfy the recurrence relation


pn(x) = (x n) pn1(x) n pn2(x) (n > 2)
with
hxpn1, pn1i kpn1k22
n = , n = .
kpn1k22 kpn2k22

Proof. Let n > 2. The polynomial x pn1 is monic and has degree n, hence
n1
X
x pn1 = pn + a k pk .
k=0

hx pn 1, pk i
The orthogonality of the pn gives ak = kpk k22
for k = 0, ..., n 1. If we notice that hx pn1,
pk i = hpn1, xpk i for all k = 0, ..., n 1, we obtain ak = 0 if k 6 n 3 since xpk Rn2[x] and
pn1 (Rn2[x]). Hence, there are at most two nonzero coecients:
hpn1, x pn1i
an1 = = n ,
kpn1k22
hp , x pn2i hpn1, pn1 + q i
an2 = n1 = with q Rn2[x]
kpn2k22 kpn2k22
hp ,p i
= n1 n1 2 = n.
kpn2k2


Example 6.4.
(1, 1) w(x) = (1 x2)1/2 Chebyshev polynomials of the rst kind (up to normalization)
(1, 1) w(x) = 1 Legendre polynomials
(0, +) w(x) = ex Laguerre polynomials
2
(, ) w(x) = ex Hermite polynomials

Exercise 6.1. Prove that the rst statement of Example 6.4 is correct.

Theorem 6.5. For any weight w and for all n, the polynomial pn has n distinct zeros in (a, b).

Proof. Fix n. Let x1, ..., xk be the distinct zeros of pn in (a, b), with respective multiplicities
m1, ..., mk. We introduce the polynomial
k
Y
q(x) = (x x j )mj mod 2.
j=1

If k < n, we have deg q 6 k < n, and hence


Z b
hq, pn i = pn (x)q(x) w(x) dx = 0,
a

but the integrand is strictly positive over (a, b)\{x1, ..., xk }: contradiction. 
6.1 Orthogonal polynomials 61

Theorem 6.6. Let f E(w), n N. There exists a unique best L2(w) polynomial approximation
to f in Rn[x], denoted p2,n:
kf p2,n k2 = min kf pk2.
pRn[x]
It is characterized by
p Rn[x], hf p2,n , pi = 0.

Exercise 6.2. Prove this theorem.


Pn hpk , f i
Remark 6.7. We have p2,n = k=0 kpk k22 pk .

Theorem 6.8. If (a, b) is bounded, then for all f E(w), we have


kk2
p2,n








f
as n .

Proof. First assume that f C([a, b]). Let pn be the minimax degree-n approximation to f : then
Z b !1/2 Z b !1/2
kf p2,n k2 6 kf pn k2 = (f pn)2 w(x) dx 6 En(f ) w(x) dx
a a

but we already know that En(f ) 0 as n .


For general f , for all > 0, let

: [a, b] [0, 1] = ,

dened more precisely by



02 if x [a, a+ /2] [b /2, b],



x a 2 if x [a + /2, a + ],
a(x) = 2 



b 2 x if x [b , b /2],

1 if x [a + , b ].

We have f C([a, b]) if we assume f (a) = f (b) = 0. For almost all x [a, b], we have
|f (x) (f)(x)| |f (x)| 1[a,a+][b,b](x),
where 1[a,a+][b,b] denotes the indicator function of the set [a, a + ] [b , b]. Hence, for
almost all x [a, b]
lim0 |f (x) (f)(x)| = 0,
|f (x) (f)(x)| |f (x)| , with f L2([a, b], w).
It follows from Lebesgues dominated convergence theorem that
Z b
|f (x) (f)(x)|2 dx









0.
a 0

()
Denoting by p2,n the best L2(w) degree-n approximation to f , we have

() ()
kf p2,n k2 6 f p2,n 6 kf f k2 + kf p2,n k2
2

for all n and . Let > 0, there exists > 0 such that kf f k2 < . For this , there exists
n0 N such that kf p()
2,n k2 < for all n N, n > n0. 

Remark 6.9. The previous statement can be wrong if one does not assume that (a, b) is bounded.
Can you give a counter-example?
62 Orthogonal polynomials - Chebyshev series

Note that, from Remark 6.7, the computation of the coecients of the best approximations in
the basis of orthogonal polynomials seems to require the evaluation of several integrals. Hence,
this kind of polynomials approximation is often signicantly more expensive than the approach via
interpolation polynomials.

6.2 A little bit of quadrature: Gauss methods


Let w be a weight function over (a, b), and let f C((a, b)). We briey study methods which
approximate the integral
Z b
f (x) w(x) dx
a
with a sum of the form
n
X
wk f (xk), wk R, xk [a, b] pairwise distinct. (6.1)
k=0
Qn x xj
First of all, if k(x) = j =0, xk x j
, observe that if
j=/k n
X
p(x) = f (xk) k(x) Rn[x]
k=0

interpolates f at the points x0, ..., xn, then our approximation for the integral is equal to
R b Pn
a
p(x)w(x) dx = k=0 wk f (xk) with
Z b
wk = k(x)w(x) dx for k =0, ..., n.
a

Thus we obtain an approximation of the integral that is exact at least for polynomials of degree
up to n. It is possible to obtain a much better result if one is allowed to choose the points x0, ..., xn:

Theorem 6.10. There exists a unique choice of the points xk and the weights wk such that,
whenever f R2n+1[x], the formula ( 6.1) is exact in the sense that
Z b X n
f (x) w(x) dx = wk f (xk).
a k=0

These points xk belong to (a, b) and are the roots of the (n + 1)-th orthogonal polynomial associated
to w.

Proof. We start with the uniqueness. Assume that x j , w j are such that the method is exact for
any f Rm[x], m 6 2 n + 1. Set
Yn
n+1(x) = (x x j ).
j =0

For all p Rn[x], we have deg (pn+1) 6 2n + 1. Hence


Z b n
X
hp, n+1i = p(x) n+1(x) w(x)dx = p(xk) n+1(xk) wk = 0.
a k=0


The polynomial n+1 is monic and belongs to (Rn[x]) : it is the (n + 1)-th orthogonal polynomial
Pn R b
associated to w. The xk are its roots and, as noted above, wk = k=0 wkk(xk)= a k(x)w(x)dx.
As for the existence, let x0, ..., xn be the distinct
R b roots in (a, b) of the (n + 1)-th orthogonal
polynomial (cf. Proposition 6.5), and let wk = a k(x)w(x)dx where k is the corresponding k-th
Lagrange polynomial. Clearly the method is exact if f Rn[x]. If now f R2n+1[x], write
f = q n+1 + r, deg r 6 n.
6.3 Lebesgue constants 63

R b
As n+1 Rn[x] et deg q 6 n, we have a q(x)n+1(x)w(x)dx = 0. It follows that
Z b Z b n
X n
X
f (x) w(x)dx = r(x) w(x)dx = wk r(xk) = wk f (xk). 
a a k=0 k=0

See Chapter 19 of [22] for an interesting and up-to-date account on Gauss methods. Note that
a recent work [10] showed that the weights and the nodes for Gauss-Legendre or Gauss-Chebyshev
quadrature, for instance, can be computed in O(n) operations.

Remark 6.11. When w = 1, an alternative to Gauss quadrature with Legendre points is the so-
called Clenshaw-Curtis quadrature, which uses Chebyshev points as interpolation nodes. The
Chebyshev polynomials of the rst kind satisfy
Z 1 (
2
, k 2 N,
Tk(x)dx = 1 k2
1 0, k / 2 N.
Pn
Hence, if p = k=0 ck Tk is the interpolation polynomial of f , we deduce that the integral with
weight w = 1 of f is approximated by
Z 1 X 2 ck
p(x) dx = .
1 1 k2
06k6n
k2N

Since the coecients ck can be computed in O(n log n) arithmetic operations using the FFT, this
yields a complexity in O(n log n) for the computation of the quadrature approximant.

6.3 Lebesgue constants


For simplicity, we assume [a, b] = [1, 1].

Definition 6.12. We say that a linear mapping L: C([1, 1]) Rn[x] is a projection onto Rn[x]
if Lp = p for all p Rn[x]. The operator norm
kLf k
= sup
f C([1,1]) kf k

is called the Lebesgue constant for the projection.

Proposition 6.13. Let be the Lebesgue constant for the linear projection L of C([1, 1]) onto
Rn[x]. Let f C([1, 1]) and let p = Lf. Let p denote the minimax approximation to f. Then, we
have
kf pk 6 (1 + ) kf pk.

Proof. We have L (f p) = p p. It follows that


kp f k kf pk 6 kp pk = kL (f p)k 6 kf pk. 

6.3.1 Lebesgue constants for polynomial interpolation


Let x0, ..., xn be pairwise distinct points in [1, 1]. Consider the Lagrange interpolation operator
n
X
Ln: C([1, 1]) Rn[x], Ln f (x) = f (xk) k(x).
k=0

Clearly, Ln is a linear projection of C([1, 1]) onto Rn[x]. On the one hand, we have
n
X
|Ln f (x)| 6 kf k |k(x)|, for all x [1, 1],
k=0
64 Orthogonal polynomials - Chebyshev series

which implies that the corresponding Lebesgue constant n = 9Ln9 satises


n
X
n 6 A := max |k(x)|.
x[1,1]
k=0
P
On the other hand, since the function x [1, 1] 7 nk=0 |k(x)| is continuous, there exists
Pn
[1, 1] such that A = k=0 |k()|. Let g: [1, 1] [1, 1] be a continuous piecewise ane
function such that g(xi) = sgn i(). Then, we have
n
X
Ln g() = |k()|
k=0

and hence kLn g k > A kg k. Weve just proved the following statement.

Theorem 6.14. The Lebesgue constant of degree-n Lagrange interpolation at x0, ..., xn is equal to
n
X
max |k(x)|.
x[1,1]
k=0

Theorem 6.15. The Lebesgue constant n satisfies


   
2 4 2 4
log (n + 1) + + log 6 n , where + log = 0.52125...

Additionally,
for Chebyshev nodes (of the first and the second kinds), we have the bound
2 2
n 6 log (n + 1) + 1 and n log n as n+

for equispaced points,
2n2 2n+1
n > and n as n+.
n2 en log n

Proof. See Chapter 15 of [22]. 

Remark 6.16. We deduce from this theorem that Chebyshev interpolants (i.e. interpolation
polynomials at Chebyshev nodes) are "near-best" approximations:
15 = 2.76...: one loses at most 2 bits if one uses a Chebyshev interpolant instead of the
minimax polynomial;
30 = 3.18...: one loses at most 2 bits if one uses a Chebyshev interpolant instead of the
minimax polynomial;
100 = 3.93...: one loses at most 2 bits if one uses a Chebyshev interpolant instead of the
minimax polynomial;
100000 = 8.32...: one loses at most 4 bits if one uses a Chebyshev interpolant instead of the
minimax polynomial.

6.3.2 Lebesgue constants for L2 best approximation


Proof. This is obtained by combining
 Proposition 6.13, the bound on n above and Corollary 2.7
that states En(f ) = O n1/2 . Thus there exists K such that for large enough n,
 
2 K
kf pn k 6 2 + log(n + 1) 0, n .
n

6.4 Chebyshev expansions 65

Remark 6.17. We deduce from this theorem that truncated Chebyshev series are "near-best"
approximations:
15 = 4.12...: one loses at most 3 bits if one uses the truncated Chebyshev series instead of
the minimax polynomial;
30 = 4.39...: one loses at most 3 bits if one uses the truncated Chebyshev series instead of
the minimax polynomial;
100 = 4.87...: one loses at most 3 bits if one uses the truncated Chebyshev series instead of
the minimax polynomial;
100000 = 7.66...: one loses at most 3 bits if one uses the truncated Chebyshev series instead
of the minimax polynomial.

6.3.3 Corollary: A first statement on the convergence of Chebyshev


interpolants and truncated Chebyshev series
 
1
Definition 6.18. When the space under consideration is E
2
, the best L2 polynomial
1x
approximation p2,n is called the truncated Chebyshev expansion of f of order n. Its coefficients ak
are called the Chebyshev coecients of f. They are given by

hf , Tk i , k =

/0
hT , T i
ak = 1 khf , kT0i
2 hT , T i , k = 0,

0 0

and the formal series



X
akTk(x)
k=0
is called the Chebyshev expansion of f.
 
1
Theorem 6.19. The Lebesgue constant for the map f E 7 p2,n is
1 x2
Z

1 sin ((n + 1/2)t)
n = dt.
2
sin (t/2)
Its behaviour obeys
4 4
n 6 log (n + 1) + 3 and n log n as n+.
2 2

Proof. See Chapter 15 of [22]. 

Corollary 6.20. If f is Lipschitz continuous over [1, 1], then the truncated Chebyshev expansion
of f converges uniformly to f.

Proof. Same as above. 

6.4 Chebyshev expansions


The Chebyshev expansion of f is the Fourier expansion of f (cos t), so that many results on the
convergence of Chebyshev expansions can be deduced from corresponding results in the well-
developed theory of Fourier series.

6.4.1 Convergence
Here is a summary of convergence results that we are going to rely on (see [22] Thm.3.1, 7.1, 7.2,
8.1, 8.2 for versions with weaker hypotheses).
66 Orthogonal polynomials - Chebyshev series

Theorem 6.21. Let f be continous on [1, 1]. Denote by (ak) its sequence of Chebyshev coefficients
and by (fn) its sequence of truncated Chebyshev expansions. Then
1. The coefficients ak tend to 0 when k .
2. If f is Lipschitz continuous on [1, 1], then (fn) converges absolutely and uniformly to f.
3. If f is C m and f (m) is Lipschitz continous, then ak = O(1/k m+1) and kf fn k = O(nm).

4. If f is analytic inside the ellipse z + z 2 1 6 r with r > 1, then ak = O(r k) and
kf fn k = O(r n).

6.4.2 Relation with Chebyshev interpolation


Recall that Tn(cos t) = cos n t can be used to dene the Chebyshev polynomials (including the case
(n)
with negative index, with Tn = Tn) and that the Chebyshev points are j = cos (j/n). Then,
we have
(n) (n) (n)
Tn( j ) = (1) j = T(2m+1)n( j ) = Tn( j ).

Proposition
P 6.22. Assume that f is Lipschitz continuous on [1, 1]. Let its Chebyshev expansion
be ak Tk, and its n-th Chebyshev interpolant be
n
X (n)
pn(x) = ck Tk(x).
k=0
Then, we have
(n)
X
ck = aj.
j mod 2n=k
or 2nj mod 2n=k

These relations might be more readable under the form


(n)
c0 = a0 + a2n + a4n +
c(n)
1 = a1 + a2n1 + a2n+1 + a4n1 + a4n+1 +
...
c(n)
n = an + a3n + a5n + .

The sums converge thanks to the absolute convergence of the Chebyshev expansion.

Proof. Consider the polynomial q whose coecients are those sums. Then, we have
(n) (n) (n)
q( j ) = f ( j ) = pn( j )

for all 0 6 j 6 n. Since both pn and q have degree at most n, they are therefore equal. 

Hence, when convergence is fast, the interpolants are very close to the Chebyshev expansion.
This is exploited for instance in Trefethens Matlab package chebfun, which uses the pn as a data
structure to represent mathematical functions. Since computing the pn by FFT requires to evaluate
the function f , which may be expensive, we now turn to the direct computation of the Chebyshev
coecients when f is solution of a linear dierential equation.

6.5 Chebyshev expansions for D-finite functions


Consider the linear dierential equation

L(x, ) f = pm f (m) + + p0 f = 0, pi(x) = Q[x],


d
where = dx denotes the dierentiation operator, and x in the expression L(x, ) actually denotes
the linear operator that maps a function f to the function x 7 x f (x).
6.5 Chebyshev expansions for D-finite functions 67

Our aim is to show that for all functions that cancel such an operator, the rst n terms of the
Chebyshev expansion can be computed in a linear number O(n) of operations.

6.5.1 Warm-up: Taylor series


Denote Mn(x) = xn. We know the following important relations, valid for all n Z:
x Mn(x) = Mn+1(x), Mn (x) = n Mn1(x).
Let
X X
f= fk Mk , g= gk Mk
k>K Z k>K Z

be two formal Laurent series, and assume L f = g. By the above formulas, we have:
X X X
x f = fk x Mk = fk Mk+1 = fk1 Mk
X X X

f= fk Mk = fk k Mk1 = (k + 1) fk+1 Mk.
Dene the shift operator
S: (fk) 7 (fk+1)
that sends a sequence to the sequence with the same values but where the indices are shifted by 1.
Also dene
X = S 1, D = (k + 1) S = S k.
Then, an iterative application of the previous rules, plus linearity, leads to
(gk) = L(X , D) (fk).

Example 6.23. Consider y y = 0, that is, f (x) = ex, g(x) = 0, and L(x, ) = 1. Then we have
L(X , D) = D 1 = (k + 1) S 1,
hence ((k + 1) S 1) (yk) = 0, and we have obtained the expected recurrence (k + 1) yk+1 = yk
satised by the coecients yk = 1/k!.

6.5.2 Formal manipulation of Chebyshev series


Let us now try to do the same for Chebyshev series, working in an entirely formal way at rst.
Starting with the relation Tn(cos t) = cos (n t), simple trigonometric identites prove that
Tn+1(x) + Tn1(x)
x Tn(x) = , 2 (1 x2) Tn (x) = n (Tn1(x) Tn+1(x)).
2
Let now
X X
f= fk Tk , g= gk Tk ,
kZ kZ
1
be formal Chebyshev series (writing them this way avoids the factor 2 that appears in the
standard denition of Chebyshev coecients above). Assume L f = g. Writing
X X  Tk+1 + Tk1  X fk+1 + fk1
x f = fk x Tk = fk = Tk ,
2 2
we choose
S + S 1
X= .
2
We do not have such a nice expression for f , but we can still write
X X 1 X (k + 1) fk+1 (k 1) fk1
(1 x2) f = fk (1 x2) Tk = fk k (Tk 1 Tk+1) = Tk.
2 2
If now
X X
f = fk Tk = wk Tk ,
68 Orthogonal polynomials - Chebyshev series

then
(k + 1) S (k 1) S 1 S S 1
(1 X 2) (wk) = (fk) = k (fk).
2 2
Let us be bold and use a formal inverse of (1 X 2). Proceeding purely formally produces
 
S S 1 S 2 + S 2 + 2 1 S S 1
(1 X 2)1 k= 1 k
2 4 2
 
S S 1 2 S S 1
= k = 2(S 1 S)1k.
2 2
It is therefore natural to set
D = 2(S 1 S)1k.
Then, if g = L(x, ) f , we should have (gk) = L(X , D) (fk). There remains to make sense of this.

6.5.3 An algorithm for Chebyshev expansion


Notation 6.24. We let C be the space of sequences (ak)kZ such that ak = ak and ak










0.
k

The sequences of Chebyshev coecients of any function in C([1, 1]) belong to C: the symmetry
is clear from the denition of the coecients and the limit has been given in Theorem 6.21.
In this space C, the meaning of the operator D and its inverse can be made clear as follows.

Proposition 6.25. Let f and g belong to C([1, 1]) and let (fk) and (gk) be their Chebyshev
sequences. Then (gk) is the unique solution in C of the equation gk = Dfk (i.e., gk 1 gk+1 =
2 k fk). Conversely, given the Chebyshev sequence (gk) of a continuous function g, the primitive f
of g has for Chebyshev coefficients (fk) the unique solution of fk = D 1 gk in C such that f0 is given
by ( 6.18).

Proof. It is not hard to prove that for k =


/ 0, we have
   
T d Tk+1 Tk 1 d 1 
k = = 1 x2
Tk .
1 x2 dx 2 k 1 x2 dx k 2

Using this relation and integrating by parts, we get for k =/ 0 in Z


Z 1
1 f (x) Tk(x)
fk = dx
1 1 x2
 1 Z
1 1 1 1 T T
= f (x) 2 1 x Tk2
f (x) k+1 k1 dx
k x=1 1 2 k 1 x2
gk+1 gk+1
= +
2k 2k
and hence
2 k fk = gk1 gk+1. (6.2)
Here we are using bilateral Chebyshev series with fk = fk. This explains the factor 1/ instead
of 2/ above and shows that this last equality is true also for k = 0.
If (gk) and (hk) are solutions, then for all k Z, we have
gk1 gk+1 = hk 1 hk+1,
so that the sequence (ck) dened by ck = gk hk is in C and satises ck+1 ck 1 = 0, or in other
words (S S 1)(ck) = 0. Uniqueness is then proved by the following.

Lemma 6.26. If (ck) C and (S S 1)(ck) = 0, then ck = 0 for all k Z.

Proof. From
ck = ck+2 = = ck+K












0,
K
6.5 Chebyshev expansions for D-finite functions 69

we deduce that |ck | < for arbitrarily small , hence that ck = 0. 

The proof of the second statement of the Proposition is obtained by observing that conversely,
since f C 1([1, 1]), its sequence of Chebyshev coecients belongs to C and satises (6.2) where
now (fk) is the unknown sequence. For k = / 0, dividing by k shows uniqueness of the solution. The
case k = 0 depends on the constant of integration. 

Theorem 6.27. If f C m([1, 1]) and if g = L(x, ) f, with L of order m, then g is continuous
and their Chebyshev sequences are related by
L(X , D) (fk) = (gk).
In particular, if L(x, ) f = 0, then L(X , D) (fk) = 0.

Proof.
If L = 1 (the identity), the result is clear: f = g implies fk = gk.
Assume L = x, that is, g = x f . Then
1
gk = hx f , Tk i = hf , x Tk i = hf , 2 (Tk+1 Tk1)i = (X f )k.

If L = xm, then (gk) = X m (fk) by the previous point.


The case L = is dealt with by the previous proposition.
The case of k follows by induction...
...and the rest by linearity (and induction). 

6.5.4 Examples
Applying the operator D amounts to solving a second order recurrence. However, this is generally
unncessary.

Example 6.28. The function y(x) = exp x satises y y = 0. We get (D 1) (fk) = 0. Expressed
in terms of k and S, this is
D 1 = (2 (S S 1)1 k 1).
Now, we factor out S S 1 and obtain
D 1 = (S S 1)1 (2 k S 1 + S).
Multiplying by (S S 1) on the left shows that (2 k S 1 + S) (fk) = 0, or in other words
2 k fk fk1 + fk+1 = 0.

Example 6.29. The function y(x) = erf(x) satises y + 2x y = 0. We get (D2 + 2 X D)(f )k = 0,
which rewrites as
D2 + 2X D = 4(S 1 S)1k(S 1 S)1k + 2(S 1 + S)(S 1 S)1k.
In order obtain a recurrence equation out of this, we note that multiplying the commutative product
(S 1 + S)(S 1 S) = (S 1 S)(S 1 + S)

on the right and on the left by (S 1 S)1 lets us factor out S S 1 on the left as in the previous
example, which leads to
D2 + 2X D = (S 1 S)12(2k(S 1 S)1k + (S 1 + S)k).

We can now factor out k(S S 1), leading to


 
1
L(X , D) = (S 1 S)1k(S 1 S)12 2k + (S 1 S) (S 1 + S)k .
k
70 Orthogonal polynomials - Chebyshev series

Again, the equation L(X , D)(fk) = 0 is of the form (S 1 S)1(uk) = 0, which implies (uk) = 0.
Next, (uk) = k(S 1 S)1(vk) = 0 implies (S 1 S)1(vk) = 0 for all k =
/ 0, and this identity also
holds for k = 0 (y being analytic all the functions we consider are continous). Thus we nally get
the following recurrence for the Chebyshev coecients of erf:
   
k2 k k k+2
2 fk2 + 2k + fk fk+2 = 0,
k1 k 1 k+1 k+1
which can be further simplied to
(k 2)(k + 1)fk 2 + 2k 2 fk (k + 2)(k 1)fk+2 = 0.

6.6 Ore polynomials


At this point, it is not clear that we can always get a recurrence out of the expression L(X , D).
We now show that it is always possible to rewrite L(X , D) as

L(X , D) = Q(k, S) 1 P (k, S)


where P , Q are polynomials, thanks to the theory of the next section. Then P (k, S) is the
recurrence we are after.

6.6.1 Definition
Let : Q(x) Q(x) be an injective ring morphism, and let be a -derivation, that is, a linear
map such that
(f g) = (f ) (g) + (f ) g.
Then, the ring of polynomials in a variable with coecients in Q(x) subject to the commutation
rule
f = (f ) + (f ), f Q(x)

is called the ring of Ore polynomials over Q(x), and denoted Q(x)h; , i.

Example 6.30.
Commutative polynomials Q(x)[]: take = id, = 0.
Recurrence operators: = S, = 0.
d
Dierential operators: = id, = dx .

6.6.2 Properties
Degree Using the commutation rule, it is always possible to write an Ore polynomial P in a
unique way in the form
P = ak k + + a0,
with the coecients ai on the left. The largest exponent of in this expression is called the degree
of P .

Lemma 6.31. For P , Q Q(x)h; , i, we have deg (P Q) = deg (Q P ).

Proof. By induction, the leading monomial of iP with P as above is i(ak) k+i, since by
injectivity of , the coecient is not zero. If Q has degree and coecients bi, then by linearity,
it follows that the leading monomial of Q P is b (ak) k+, which has degree k + and by the
same reasoning, this is also the degree of the leading monomial ak k(b) k+ of P Q. 
6.6 Ore polynomials 71

Euclidean division on the right Based on the previous lemma, it is easy to obtain that for
any A, B, there exists a unique pair (Q, R) such that A = Q B + R, with deg R < deg B and Q on
the left of B. (Proof and algorithm are as usual, except that one needs to take care of the order
of the factors).
Euclidean algorithm for right gcds (gcrds) Again, the same algorithm as in the commutative
case works, using the Euclidean division on the right at each stage. The same proof shows that
the last nonzero remainder is a greatest common right divisor (gcrd).
Extended Euclidean algorithm This is the variant where the cofactors are computed at each
stage, leading in particular to Um , Vm such that
Um A + Vm B = G = gcrd(A, B).

Least common left multiples (lclms) In the last relation Um+1 A + Vm+1 B = 0 found by the
Extended Euclidean algorithm (on the right), one can prove that Um+1 A = Vm+1 B are lclms
(note that this is not the standard proof in the commutative case).

Example 6.32. The Fibonacci sequence (Fn) dened by Fn+2 = Fn+1 + Fn, F0 = 0, F1 = 1 satises
Fn+15 = 610Fn+1 + 376Fn. This is clear by Euclidean division: S 15 = A (S 2 S 1) + 610S + 376.

Example 6.33. In Chapter 3, it was proved that the sum of the solutions f and g of linear
dierential equations L(x, )f = 0 and M g = 0 satises a linear dierential equation N (f + g) = 0.
The operator N in this equation is nothing but lclm(L, M ).

6.6.3 Fractions
Two pairs (P1, Q1) and (P2, Q2) of Ore polynomials will be called equivalent when

Q1 P1 = Q2 P2, where lclm(Q1, Q2) = Q1 Q1 = Q2 Q2.

We view equivalence classes as fractions and write (P , Q) = Q1 P . This makes sense, since for
two equivalent pairs we have
 
Q11P1 = Q1 Q1 1 Q1 P1 = Q2 Q2 1 Q2 P2 = Q21P2.

Addition and multiplication of the equivalence classes are obtained in a way that is completely
similar to the construction of the fraction eld of a commutative Euclidean ring:
Addition:
1 consider A1 B + C 1 D and let L = lclm(A, C) = A A = C C. Then A1B =
A A AB = L AB and similarly for C 1D, so that
1


A1 B + C 1 D = lclm(A, C)1 A B + C D .

Multiplication: consider A 1 B C 1 D and write lclm(B , C) = BB = C C, then A1 B =


(B A)1 B B = (B A)1 C C so that
A1 B C 1 D = (B A)1 C D.

Theorem 6.34. (Ore, 1931). The set of fractions of Ore polynomials forms a non-commutative
field.

Proof. See [15]. 

6.6.4 Application to Chebyshev Series


A consequence of the possibility of expressing any expression using sums, products and inverses of
Ore polynomials as a fraction is that L(X , D)(fk) = 0 from Theorem 6.27 can be rewritten in the
form Q(k, S) 1 P (k, S)(fk) = 0. We now conclude this chapter by proving the following.
72 Orthogonal polynomials - Chebyshev series

Theorem 6.35. Let f C m([1, 1]) be such that L(x, ) f = 0 with L of order m and let
L(X , D) = Q(k, S)1P (k, S). Then the sequence (fk) of coefficients of the Chebyshev expansion
of f satisfies P (k, S)(fk) = 0.

Proof. Recall that L is an operator of order m. By integrating the equation L(x, )f = 0 exactly
m times, we obtain an equation of the form M (x, I)f = 0, where I denotes the integration operator.
By Proposition 6.25, this operator acts on Chebyshev sequences by D 1 = k 1(S 1 S)/2. It
follows that I mL(x, ) acts on Chebyshev sequences as D mL(X , D) = M (X , D 1) which is a
polynomial , i.e., does not have a denominator. From there follows that L(X , D) = DmM (X , D 1);
in other words the denominator of this fraction equal to L is exactly a power of D. Thus Q1P
and D mM are equal fractions. This implies that QP = DmM , where lclm(Q, D m) = QQ = D mDm.
From there follows that whatever the representation of L(X , D) as a fraction Q1P , the operator D
is a right factor of Q. Simplifying these fractions by D and repeating this reasoning shows that Q
is necessarily a power of D provided P and Q are relatively prime, so that L(X , D) = D iP
for some i 6 m. Thus, we have obtained that the sequence (fk) C satises D iP (fk) = 0. By
Proposition 6.25, this implies that P (fk) = 0 as was to be proved. 
Chapter 7
Interval Arithmetic, Interval Analysis
Interval Arithmetic is an arithmetic for inequalities. Assume for instance that we know that
5 6 a 6 6 and 10 6 b 6 11: then of course 50 6 a b 6 66. We will dene a product of real intervals
such that
[5, 6] [10, 11] = [50, 66]
that allows for such reasoning. Another need for interval arithmetic comes from the roundo errors
that occur when working with nite precision numbers.
Notable applications of interval arithmetic to bring rigor to numerical computations
performed on a computer include T. Hales proof of Keplers conjecture [11][12]
(see http://code.google.com/p/yspeck/), and W. Tuckers solution of Smales 14th
problem [20][21] (see http://www2.math.uu.se/~warwick/main/thesis.html and also
http://paulbourke.net/fractals/lorenz/).
The interested reader will nd numerous additional interesting information on the website
http://www.cs.utep.edu/interval-comp/.
In this course, we are interested in the use of interval arithmetic in the evaluation of mathem-
atical functions. Given > 0 and f : [a, b] R, we would like to make sure that the evaluation f (x)
of f at any value x [a, b] is such that
|f (x) f (x)| 6 .

f (x)
Note that, in practice, one commonly uses on relative error 1 rather than on absolute
f (x)
error |f (x) f (x)|. We focus on the absolute error case for the sake of clarity. To perform the
evaluation, we replace f by a polynomial p. Then we evaluate p, and f (x) = (p(x)), where is
the active rounding mode. There are two sources of error:
approximation error : let 1 be an upper bound for kf pk,
rounding error : let 2 be an upper bound for the error |p(x) (p(x))|,
we have to guarantee that 1 + 2 . In this chapter and in Chapter 9, we will develop tools
that help to establish rigorous approximation error. Regarding rounding errors, G.Melquiond has
developed a formal proof tools which address this issue[14][5][6](see http://gappa.gforge.inria.fr/).

7.1 Interval arithmetic

Definition 7.1. (Real interval.) Let x , x R, x 6 x. We define the interval


X = [x, x ] = {x R : x 6 x 6 x }.
The reals x and x are called the endpoints of the interval, x is its minimum, x its maximum. The
set of all real intervals will be denoted IR.

Definition 7.2. Let x IR. The width of x is denoted w(x) = x x. We also define the center
x + x
mid(x) = ,
2

73
74 Interval Arithmetic, Interval Analysis

1
and the radius rad(x) = 2 w(x).

Remark 7.3. It is common in the litterature to encounter the notation (mid(x), rad(x)) =
{x R: |x mid(x)| 6 rad(x)}.

Definition 7.4. A point (or degenerate, or thin) interval is one of the form [x, x], also denoted [x].

7.1.1 Operations on intervals


We now dene basic arithmetic operations on intervals. As you will see, monotonicity plays an
essential role for obtaining sharp enclosures.

Definition 7.5. Let X , Y IR. Let {+, , , /}. We denote

XY = {xy : x X , y Y }
where, if = /, we assume that 0
/ Y.

Proposition 7.6. We can compute the XY above using formulae such as

[x, x ] + [ y, y ] =[x + y, x + y ],
[x, x ] [ y, y ] =[x y , x y],
[x, x ] [ y, y ] =[min (x y, x y , x y , x y ), max (x y, x y , x y , x y )],
 
1 1
[x, x ]/[ y, y ] =[x, x ] , if 0/Y ,
y y
which depend only on the endpoints.

Proof. Exercise. 

Remark 7.7. Note that, in IR, the operations + and are associative and commutative.

Remark 7.8. In practice, multiplication (hence division) can be made more ecient. From the
formula in the previous proposition, it seems to require four real multiplications and several com-
parisons. And yet, if one checks the sign of the endpoints of the two intervals before starting the
computation, one can reduce this amount. Note that there are nine possible cases: one of them
indeed leads to four multiplications while the other eight only need two multiplications. Likewise,
there are six possible cases for the division.

Remark 7.9. It can be convenient to dene a result for the division even when 0 Y . One can
nd an interesting discussion regarding this issue in Section 2.3 of W. Tuckers book [23].To do
that, we work over R=R {+, }, with two signed zeros +0 and 0 (more precisely, over
over R\{0} {+, , +0, 0}, with two signed zeros +0 and 0). Hence, we can take advantage
of the following relations
1/(+ ) = +0, 1/(+0) = +, 1/() = 0, 1/(0) = .
Assume y < 0 < y . Then we dene
   
1 1 1 1
= , , = , +
[ y, 0] y [0, y ] y
and in general    
1 1 1
= , , + .
[ y, y ] y y

We will thus dene the notion of extend interval by removing the condition x 6 x and set

{x R : x 6 x 6 x } if x 6 x ,
X = [x, x ] =
[, x ] [x, +]otherwise.
7.1 Interval arithmetic 75

We introduce the notation IR = {[x, x ]: x, x R}. We then dene division over IR as follows


X [1/y , 1/ y] if 0 /Y ,

[, +] if 0 X and 0Y ,




[x / y, +] if x < 0 and y < y = 0,

[x / y, x /y ] if x < 0 and y < 0 < y ,

X/Y = [, x /y ] if x < 0 and 0 = y < y , (7.1)



[, x/ y] if 0 < x and y <
y = 0,

[x/y , x/ y] if 0 < x and y < 0<y ,




[x/y , +] if 0 < x and 0 = y <y ,

if 0/ X and Y = [0, 0].

Proposition 7.10.
1. Interval subtraction is not the inverse of addition.
2. Interval division is not the inverse of multiplication.
3. Interval multiplication of an interval with itself is not equivalent to squaring the interval,
i.e., in general,
/ [min (x2, x 2), max (x2, x 2)].
[x, x ] [x, x ] =

4. Interval multiplication is sub-distributive wrt addition: for all X , Y , Z IR, we have


X (Y + Z) X Y + X Z.

5. For all X IR, we have X + [0] = X and [0] X = [0].

Proof. Exercise. 

A straightforward yet quite useful statement is the following.

Lemma 7.11. (Inclusion isotonicity) If X X , Y Y , {+, , , /}, then


X Y X Y .
/ Y .
For division, we assume that 0

Proof. Obvious from Denition 7.5. 

7.1.2 Floating-point interval arithmetic


When it comes to implementing interval arithmetic on a computer, we no longer work over R, but
in most cases with oating-point numbers. Let F be the set of machine numbers we are working
with. Then we denote
IF = {[x, x ] : x, x F }.
Of course the set of oating-point numbers is not arithmetically closed (e.g., the sum of two oating-
point numbers is not always a oating-point number). When we perform arithmetic operations
on intervals in IF , we need to make sure to round the resulting interval outwards in order to
guarantee that it contains the true result. For X , Y IF , we set
X + Y = [(x + y), (x , y )],
X Y = [(x y ), (x y )],
X Y = [min ((x y), (x y ), (x y) , (x y )),
max ((x y), (x y ), (x y) , (x y ))],
X/Y = [min ((x / y), (x /y ), (x / y) , (x /y )),
max ((x / y), (x /y ), (x / y) , (x /y ))] if 0
/Y ,
where and denote rounding to and + respectively. Using (7.1), we can extend oating-
point interval arithmetic: IF = {[x, x ]: x, x F }.
76 Interval Arithmetic, Interval Analysis

Remark 7.12. Standard machine oating-point numbers are not always sucient, e.g., to work
with very small intervals. We may also use multiple-precision oating-point numbers as bounds for
our intervals. An example of a library which oers support for multiple precision interval arithmetic
is MPFR7.1.

7.2 Interval functions


Definition 7.13. Let D R, and let f : D R. We denote
R(f , D) = {f (x) : x D}
the range of f over D.

Remark 7.14. Finding the exact image of a (usually multivariate) function, and, in particular,
a value where f attains its minimum is a whole subdomain of Math and CS called Optimization
Theory.

Let X = [x, x ] IR. By monotonicity, interval functions dened as follows give the exact range
of the corresponding real functions:
eX = [exp x, exp x ],

X = [ x , x ], x > 0,
log X = [log x, log x ], x > 0,
arctan X = [arctan x, arctan x ],
For some other functions like xn, trigonometric functions..., writing down R(f , D) is also possible,
as long as we know their extrema. For instance, let n Z, X IR,


if n2N + 1, [xn , x n]
if nN \ {0}, n even, [min (xn , x n), max (xn , x n)] if 0
/ X,
n
X = pow(X , n) = [0, max (xn , x n)] otherwise,

[1, 1] if n =n
0,


[1/x , 1/x] if n N and 0 / X.
Exercise 7.1. Write the analogous formulas for sin, cos, tan. For sin and tan, consider

n o n o
S1+ = 2k + , k Z , S1 = 2k , k Z .
2 2
For cos, consider
S2+ = {2k, k Z}, S2 = {2k + , k Z}.

The example of f (x) = x2 x + 1 over [0, 2] illustrates two important issues:


overestimation;
dependency on the way the function is written.
We have f (x) [0, 2]2-[0,2]+[1]=[0, 4]+[-2,0]+[1]=[-1,5]. Now write f (x) = x(x 1) + 1. We have
f (x) [0, 2][1, 1] + [1] = [2, 2] + [1, 1] = [1, 3]. Actually, R(f , [0, 2]) = [3/4, 3].

Definition 7.15. (Interval extension.) Let X IR, and let f : X R. A function f: IR IR is


called an interval extension of f over X if for all Y IR with Y X, we have

f(Y ) R(f , Y ).

Several interval extensions are possible for the same function over the same X. Interval exten-
sions of exp over [1, 1] include
the constant function X 7 [e1, e],

7.1. http://www.mpfr.org
7.2 Interval functions 77

the constant function X 7 [1 + e1, 1 + e],


the function [x, x ] 7 [ex , ex].

Let us try to propose a systematic process for computing interval extensions. If f (x) is a
rational expression, one means to get an interval extension of the function it denotes is to replace
each occurrence of the variable x by the interval X, and overload all arithmetic operations with
interval operations. The resulting extension is called the natural interval extension.

Theorem 7.16. Given a rational expression denoting a real-valued function f, and its natural
interval extension F, which we assume to be well-defined over some interval X IR, then
1. Z Z X implies F (Z) F (Z ) (inclusion isotonicity);
2. R(f , X) F (X) (range enclosure).

Proof. To prove assertion 1, it suces to repeatedly use Lemma 7.11. Regarding assertion 2,
assume that there exists y X such that f (y) R(f , X) but f (y)
/ F (X). This implies that
F ([y, y]) = [f (y), f (y)]F (X)which contradicts assertion 1. 

We now would like to extend this notion of natural interval extension to a larger class of
functions.

Definition 7.17. We call basic (or standard) functions the elements of



S = sin, cos, exp, tan, log, x p/q
for which we can determine the exact range over a given interval based on a simple rule.

These functions are said to have a sharp interval enclosure.

Definition 7.18. We call elementary function a symbolic expression built from constants and basic
functions using arithmetic operations and composition. The class of elementary functions will be
denoted E. A function f E is given by an expression tree (or dag, for directed acyclic graph).

Definition 7.19. An interval valued function F : X IR IR is inclusion isotonic over X IR


if Z Z X implies F (Z) F (Z ).

Theorem 7.20. Given an elementary function f and an interval X over which the natural interval
extension F of f is well-defined:
1. F is inclusion isotonic over X;
2. R(f , X) F (X).

Proof. The statement holds for rational functions (cf. Lemma 7.11) and, by denition, for standard
functions. Let g and h be two elementary functions for which the Theorem holds. We have to
prove that it holds as well for the function gh, where {+, , , /, }. Lets address the case of
(the other cases are analogous). We assumed that F (X) is well dened: this implies that neither
f nor any of its sub-expressions have singularities in their domains, induced by the interval X.
Then, is Zand Z denote two sub-intervals on the domain of h such that Z Z , the function h
is continuous over the compacts Zand Z . If G and H denote the natural interval extensions of
g and h, the sets H(Z) and H(Z ) are compact intervals. Using the inclusion isotonicity property
satised by G and H, we obtain H(Z) H(Z )and G H(Z) = G(H(Z)) G(H(Z )) = G H(Z ).
The proof of the second assertion is analogous to the proof of assertion 2 of Theorem 7.16. 
78 Interval Arithmetic, Interval Analysis

Example 7.21. Consider


f (x) = (cos x x3 + x) (tan x + 1/2)
over [0, /4]. To show that f has no zero in this range, we compute the natural interval extension
" #
2 3
f ([0, /4]) = ,1+ [0.22, 1.18].
2 64 4

Exercise 7.2. Show that f (x) = x sin x + 2/5 has no zero over [0, /4]. Hint: evaluating the natural interval
extension is not enough. Split the domain.

Theorem 7.22. Let X IR. Let f be an elementary function such that any subexpression of f
is Lipschitz continuous. Let F be an inclusion isotonic interval extension such
S that F (X) is well-
defined. Then, there exists > 0, depending on F and X, such that, if X = ki=1 Xi, with Xi IR
for all i, then
k
[
R(f , X) F (Xi) F (X)
i=1
and !
k
[
rad F (Xi) 6 rad(R(f , X)) + max rad Xi.
i=1,...,k
i=1

Proof. The rst inclusion follows from the inclusion isotonicity and the range enclosure properties:
k
! k k k
!
[ [ [ [
R(f , X) = R f , F (Xi) = R(f , Xi) F (Xi) = F Xi = F (X).
i=1 i=1 i=1 i=1

Now, we are going to prove the following fact: if Z X and y0 F (Z), then for all y R(f , X), we
have |y y0| 6 rad(Z). This implies the inequality in the theorem.
This statement is true for constants. It is also true for standard functions which are bounded
(they have a sharp interval enclosure). In the same way as in the proof of Theorem 7.20, we
consider two connecting branches g1 and g2 of the expression tree dening f and we prove that
the statement is also valid for g1g2, where {+, , , /, }. We focus on , the other cases being
analogous. The functions g1 and g2 are elementary (as sub-expressions of the elementary function
f ) and Lipschitz continuous. From Theorem 7.20, their natural interval extensions G1 and G2 are
inclusion isotonic. We also know that, since F (X) is well-dened, these extensions ae also well-
dened on their respective domains ZG1 and ZG2 induced by X. We assumed that the statement
is true for g1 and g2: for i = 1, 2,
if V ZGi , y0 Gi(V ) and y R(gi , V ), then |y y0| 6 irad(V ).
The range enclosure property of Theorem 7.20 gives: for all V ZH , we have
R(g1 g2, V ) = R(g1, R(g2, V )) R(g1,G2(V )).
Let z R(g1 g2, V ), there exists u R(g2, V ) s.t. z = g1(u). The real number u also belongs to
G2(V ). Therefore, if z0 G1 G2(V )=G1(G2(V )), the inductive assumptions on gi and Gi yield
|z z0| 6 1rad(G2(V ))
and
rad(G2(V )) 6 rad(R(g2, V )) + 2rad(V ) 6 (K2 + 2)rad(V ),
where K2 is a Lipschitz constant for g2. If we combine these two inequalities, we obtain
|z z0| 6 1(2 + K2)rad(V ).
Now, we use the fact that the expression tree dening f is nite by denition, which implies that
the constant of the statement exists: it is the result of a nite accumulation of constants yielded
as above. 
7.2 Interval functions 79

However, the number of subdivisions needed may be very large.

Example 7.23. Let f (x) = e1/cos x, and let p be a degree-5 minimax approximation of f over [0, 1].
Let
(x) = f (x) p(x).
Using the natural interval extension of , we get kk < 0.4. But one can show that obtaining the
true value kk 1.12 106 by subdivision would require about 107 subintervals.
Chapter 8
Linear recurrences and Chebyshev series
Recall that we showed earlier that (under some regularity conditions) solutions of linear ODEs with
polynomial coecients have Chebyshev series that satisfy linear recurrences. These recurrences
provide an ecient way to compute the coecients. However, we also saw that using recurrences
to compute some sequences could be numerically unstable. It turns out that the recurrences we
computed for Chebyshev series are aected by this problem. In this chapter, we show how this
diculty can be overcome and values be computed in a numerically stable way. Our starting point
is the recurrence
un+k + a1(n) un+k 1 + + ak(n) un = 0, n N. (8.1)
or its matrix version using the companion matrix: Un+1 = A(n) Un with

1
un
Un = and A(n) = .
1
un+k1
ak(n) ak1(n) ... a1(n)
We rst consider this general setting and then specialize it to the specic form that recurrences
for Chebyshev coecients take.

8.1 Constant coefficients and the power method


When the recurrence (8.1) takes the form
un+k + 1un+k1 + + kun = 0, n N, (8.2)
with constant coecients 1, ..., k, the matrix A(n) above does not depend on n; its characteristic
polynomial
A(X) = X k + a1X k1 + + ak
is called the characteristic polynomial of the recurrence (8.2). The roots of A are the eigenvalues
of A; they play an important role in the convergence of solutions of the recurrence. In order
to simplify the technical aspects of the discussion, we henceforth make the following important
assumption:
Throughout this chapter, we assume that the matrix A Ck k is diagonalizable, that
its eigenvalues are numbered so that |1| > |2| > > |k | and that the corresponding
eigenvectors v1, ..., vk are taken with norm 1.
Under this assumption, any vector of initial conditions U0 = (u0, ..., uk 1) decomposes in this
basis as
U0 = c1v1 + + ckvk
so that
Un = c1n1 v1 + + cknk vn. (8.3)

8.1.1 Power Method


The previous observation is the basis of the power method , which consists in iterating Un+1 = AUn
in order to nd an eigenvector corresponding to the largest eigenvalue. The basic property and
justication of this method is summarized in the following lemma.

81
82 Linear recurrences and Chebyshev series

Lemma 8.1. If c1 = / 0 (ie, U0 does not belong to the subspace Vect(v2, ..., vk)) and |1| > |2|, then
the iteration Un+1 = AUn is such that, as n , Un/kUn k tends to Vect(v1) and hUn |Un+1i/
kUn k2 1

Proof. By hypothesis, neither c1 nor 1 is 0, thus (8.3) can be divided by c1n1 , which yields
 n  n
Un c2 2 ck k
v1 = v2 + + vk.
c1n1 c1 1 c1 1
Since the eigenvalues are sorted by decreasing modulus, the norm of the right-hand side tends
to 0, which proves at the same time that the vector Un/(c1n1 ) tends to v1. Its norm tends to 1,
which shows kUn k |c1n1 | and proves the rst statement. It follows that hUn| AUn i/kUn k2 hv1|
Av1i = 1 by continuity of multiplication by A and scalar product, which proves the second one. 

In general, a random U0 will do, since the probability that it belongs to Vect(v2, ..., vn) is
negligible. Note that in practical use of this method, in order to avoid an overow, it is preferable
to normalize Un at each step.

8.1.2 Inverse Power Method


A simple variant of the power method gives access to the other eigenvalues and eigenvectors.
Indeed, for any {1, ..., k }, the matrix B = (A Id)1 has for eigenvalues 1/(i ), i = 1, ...,
k the largest of which corresponds to the eigenvalue of A that is closest to . It is thus obtained
by applying the power method to B. In general, this requires either computing an inverse or at
least solving the equation (A Id)Un+1 = Un for Un+1, which is more costly than the direct
power method, but in some special cases the inverse is known in advance, making that method
very ecient.
Application to recurrences. The inverse of the map (un , ..., un+k 1) 7 (un+1, ...,
un+k) is simply the map shifting the indices backwards, which is computed by unrolling the
recurrence (8.2) backwards. Thus if the companion matrix of the recurrence has eigenvalues sat-
isfying 0 < |k | < |k 1| 6 6 |1|, then with artibrary initial conditions (uN +1, ..., uN +k)
outside of a strict subspace of Ck, the corresponding vector (u0, ..., uk 1) once normalized, con-
verges to the initial conditions of a solution of (8.2) of minimal growth. The condition 0 < |k |
simply means that ak = / 0, which in turns corresponds to the recurrence having exactly order k
rather than a smaller one.

Example 8.2. The Fibonnacci recurrence Fn+2 = Fn+1 + Fn has the sequence of Fibonnaci
numbers as solution with initial conditions F0 = 0, F1 = 1. Another solution is ((1 )n)n,
where is the golden ratio, solution together with 1 0.618033988 of the characteristic
polynomial X 2 X 1. Numerical use of this recurrence in the forward direction with initial
conditions (u0, u1) = (1, 0.618034) yields the successive values
(u10, u11) (0.008130, 0.005026), (u20, u21) (0.000010, 0.000164), (u30, u31) (0.09360,
0.015146), (u40, u41) (1.151270, 1.862794), (u50, u51) (141.596850, 229.108516).

Once scaled by dividing out by their rst coordinates, these become


(1, 0.6182041820), (1, 16.40000000), (1, 1.618162393), (1, 1.1618033997), (1, 1.618033989)
in complete agreement with Lemma 8.1: the dierence between the exact value of 1 and
the initial condition u1 makes the coordinate c1 =
/ 0. Even though this coordinate is very small,
ultimately this component of the vector takes over and the iterates converge to an eigenvector
for .
Conversely, the inverse power method suggests taking N = 20 and starting with (uN , uN +1) = (1,
0). This gives the sequence of values
(u0, ..., u20) = (10946, 6765, 4181, 2584, 1597, 987, 610, 377, 233, 144, 89, 55, 34, 21, 13,
8, 5, 3, 2, 1).
8.2 Asymptotics of linear recurrences 83

In particular, u1/u0 0.6180339850 which coincides with 1 up to 108.


This technique is the basis of Millers algorithm in the case of a recurrence whose coecients
are not necessarily constant, see Section 8.3.

8.1.3 Block Power Method


If the power method is applied not to one, but to m 6 k initial vectors, an analogous reasoning
shows that the iterates converge in a sense to Vect(v1, ..., vm). Here is a more precise statement
that generalizes Lemma 8.1.

Lemma 8.3. Let (v1, ..., vk) be the eigenvectors of A as above and (w1, ..., wm) be m vectors
of Ck such that B = (w1, ..., wm , vm+1, ...,vk) is a basis of Ck. If moreover, |m | > |m+1|, then
the sequence of m-tuples of vectors Wn = w1(n), ..., wm(n)
= An(w1, ..., wm) is such that Wn tends
* +
  
m+1 n
(n)
wi
to Vect(v1, ..., vm) in the sense that for any > m, v| (n) = O
.
wi m

 
F 0
Proof. First, the matrix of change of basis betwen B and (v1, ..., vk) has a block structure G Id
with F a m m matrix of rank m. We then consider the new vectors 0 = W F 1. Then we have
(0)
i = vi + cm+1vm+1 + + ckvk and the same reasoning as in the proof of Lemma 8.1 applies
(n) (0)
to i = Ani , showing that An0 tends to Vect(v1, ..., vm) in the sense given in the Lemma.
The result on Wn is obtained by multiplying An0 on the right by F . 

8.2 Asymptotics of linear recurrences


In the case of linear recurrences with constant coecients, the asymptotic behaviour of the solu-
tions provided by (8.3) has an impact on the numerical use of the recurrence as shown in the
previous section. It turns out that a similar phenomenon occurs for linear recurrences with variable
coecients. An idea due to Poincar is to view the variable coecients case as a perturbation
of the constant coecients case. We now review these ideas, before applying them to Chebyshev
recurrences in the next section.

8.2.1 Poincars theorem


Consider again the recurrence (8.1).

Theorem 8.4. (Poincar, 1885.) Assume that, for all i {1, ..., k }, the limit
lim ai(n) = i R
n

exists, and let (un) be a solution of ( 8.1). If the roots 1, ..., k of the characteristic polynomial

P (X) = X k + k1 X k1 + + a0
satisfy |1| > |2| > > |k |, then either un = 0 for all large enough n, or un =
/ 0 for all large
enough n and moreover un+1/un for some .

An elegant and more recent proof due to Mt and Nevai is based on the following generaliz-
ation.

Proposition 8.5. (Mt & Nevai, 1990.) Let A be a k k matrix with eigenvalues |1| > > |k |
and vi corresponding eigenvectors, with kvi k = 1. Let (Bn) be a sequence of k k matrices with
limn kBn k = 0, and let un satisfy Un+1 = (A + Bn) Un. Then either Un = 0 for large enough n, or

Un/kUn k











v
n
84 Linear recurrences and Chebyshev series

for some .

Before turning to the proof of the previous proposition, we show how it implies Poincars
theorem.

Proof of Poincars theorem. Write the recurrence in matrix form:



1
un
 



0
Un+1 = C Un , Un =
, C = + .
1 (i ai(n))i=k,...,1
un+k 1
k k 1 ... 1
||||||||||||||||||||||||||||||||||||||||||||||{z}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}
A

The eigenvalues of the matrix A are the is from the statement of Poincars theorem and
satisfy |1| > > |k |. By Proposition 8.5, either Un = 0 (and thus un = 0 too) for n suciently
large, or Un/kUn k v for some eigenvector v of A. In that case,
Un+1 Un U
= (A + Bn Id) n 0.
kUn k kUn k

If v1 denotes the rst coordinate of v, we have un/kUn k v1 and this is not 0, since, by the special
form of the companion matrix A, Av = v implies vi+1 = vi for i = 1, ..., k 1 so that v1 = 0
would imply v = 0. For n suciently large, multiplying (8.5) by kUn k/un is thus possible and then
extracting the rst coecient concludes. 

Proof of Proposition 8.5. If one P Un is 0, then all the following ones are 0 too. We assume that
it is not the case and write Un = i cin vi. Then we have
X
Un+1 = i cin vi + Bn Un
i
whence
|cin+1 i cin | 6 kBn k kUn k 6 kBn kk max |cin |. (8.4)
i

Let n be the smallest integer such that |cnn | = maxi |cin |.


For large enough n, the sequence (n) is nonincreasing. Indeed, if i > n, then we have the
sequence of inequalities

|cin+1| 6 |i | |cin | + k |cnn | kBn k 6 (|i | + k kBn k) |cnn | 6 (|n | k kBn k) |cnn |

for n large enough to ensure kBn k < (|n | |i |)/(2k), and thus |cin+1| 6 |cn+1
n
|.
Being bounded, this sequence has a limit = lim n for which cn is nonzero asymptotically.
For this coordinate, (8.4) implies cn+1/cn .
For any > 0, the other coordinates satisfy
i
cn+1 cin
c i c 6 2

n n

for suciently large n. This rewrites


i i i
c c
6 |i | cn + 2.

|i | n 2 6 n+1
c
cn cn n

By the previous point and since 0 6 |cin |/|cn | 6 1 for n suciently large we have
i i
cn i
c c
n+1
|i | 6 | | 6 |i | n + .

cn cn+1 cn

/ |i |, which implies Un/cn v and


From there follows that limsup |cin |/|cn | = 0 if | | =

kUn k/|cn | 1, thus concluding the proof. 
8.2 Asymptotics of linear recurrences 85

8.2.2 Perrons theorem


In Poincars theorem, all solutions might tend to the same root of the characteristic polynomial.
This is not the case, as shown by the following more precise result.

Theorem 8.6. (Perron, 1919.) With the same hypotheses as in Poincars theorem and assuming
/ 0, there exists a basis (1n , ..., kn) of solutions such that
k =

in+1
i , i = 1, ..., k.
in

Proof. The following proof, due to Evgrafov (1953) is by induction on k.


The case k = 1 is clear: we are looking at the recurrence un+1 + a1(n) un = 0 and claiming
that un+1/un lim a1(n) = a.
There exists a solution fn0 such that fn0 =
/ 0 for all n N. Indeed, consider a basis of solutions
fn1, ..., fnk. There exists
/ {fn1/fn2 : n N, fn2 =
c2 / 0},

and then fn1 + c2 fn2 = 0 i fn1 = fn2 = 0. Then there exists c3 {(fn1 + c2 fn2)/fn3: n N,
fn3 =
/ 0}, and so on. In the end fn0 := fn1 + c2 fn2 + + ck fnk does not have any zero, otherwise
all the fni would have a common zero. This would force a linear dependency on the initial
conditions and the dimension of the space of solutions could not be k.
0
Poincars theorem then implies that there exists such that fn+1/fn0 as n .
Set un = vnfn0/n , which is possible since =
/ 0. Then vn satises the recurrence
vn+k + a1 (n) vn+k1 + + ak (n) vn = 0 (8.5)
with
0
fn+ki
ai(n) = i 0 ai(n)











i.
fn+k n

By construction, (n )n is a solution of (8.5), i.e., that recurrence has (S ) as a right


factor, where S is the shift operator as in 6.6. Hence, wnQ:= vn+1 vn satises a linear
recurrence of order k 1, with characteristic polynomial i= /
(X i).
By the induction hypothesis, this recurrence admits a basis of solutions wni, i =
/ , such that
i
wn+1
i.
wni

There remains to prove that for each wni, there exists a solution vni of the recurrence equation
i
vn+1 vni = wni
i
such that vn+1/vni i. If |i | > | |, we take

vni = wn1
i i
+ wn2 + 2 wn3
i
+ ,
otherwise
vni = 1 wni + 2 wn+1
i
+ .
(The convergence of these sums is justied by Lemma 8.7 below.)
In the first case, the quotient v_{n+1}^i/v_n^i is

v_{n+1}^i / v_n^i = (w_n^i / w_{n−1}^i) · (1 + λ w_{n−1}^i/w_n^i + λ^2 w_{n−2}^i/w_n^i + ⋯) / (1 + λ w_{n−2}^i/w_{n−1}^i + λ^2 w_{n−3}^i/w_{n−1}^i + ⋯),

from which the limit λ_i follows with a bit of analysis (bounding the tails of the series and then
controlling the leading part by a geometric series). The second case is completely analogous.

The general solution to w_n = v_{n+1} − λ v_n is thus

Σ_{λ_i ≠ λ} c_i v_n^i + c λ^n.

From there, we obtain a basis of solutions of the original recurrence as

u_n = (f_n^0 / λ^n) Σ_{λ_i ≠ λ} c_i v_n^i + c f_n^0.  □

8.2.3 From Perron's theorem to asymptotic behaviour

A limiting behaviour for successive quotients gives a first taste of the actual asymptotic behaviour,
as described in the following.

Lemma 8.7. If (y_n) is a sequence of complex numbers such that y_n ≠ 0 for n large enough and
y_{n+1}/y_n → λ ≠ 0 as n → ∞, then y_n = λ^n e^{n ε_n} where ε_n → 0.

Proof. Exercise. [Hint: take the logarithm.] □

8.2.4 Birkhoff-Trjitzinsky's theorem

In the case of interest when the coefficients a_i(n) of the recurrence (8.1) are rational functions of n,
a much more precise result is known, whose proof is unfortunately out of the scope of this course.

Theorem 8.8. If the coefficients a_1(n), ..., a_k(n) of (8.1) belong to C(n), with a_k ≠ 0, then there
exists a basis (φ_n^1, ..., φ_n^k) of solutions with linearly independent asymptotic behaviours of the form

n!^α λ^n exp(P(n^{1/ρ})) n^θ ln^J n,

with α ∈ Q, ρ ∈ N\{0}, P ∈ C[n] of degree at most ρ − 1, J ∈ N, and λ, θ in C.

Finding the corresponding elements α, λ, P, θ, J, in this order, from (8.1) is not too difficult by
a sort of undetermined-coefficients method, but will not be detailed here.
Observe that the previous results were mainly discussing the case when α = 0. When α ≠ 0,
they can be used after proper scaling.
Example 8.9. The sequence (2^n/n!) satisfies u_{n+1} − (2/(n+1)) u_n = 0. The characteristic polynomial is X,
whose only root is 0, and the only information we obtain from Poincaré's theorem is that u_{n+1}/u_n
tends to 0. However, one can scale the sequence so as to obtain a more interesting characteristic
polynomial by setting u_n = v_n/n!^α. Clearly the appropriate α is 1, and then the new recurrence
v_{n+1} = 2 v_n has characteristic polynomial X − 2, and a 2^n e^{n ε_n} behaviour with ε_n → 0 is recovered.
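A quick Maple experiment illustrating this scaling (the names u and v are ours):

u := n -> 2^n/n!:
seq(evalf(u(n + 1)/u(n)), n = 20 .. 24);    # quotients tend to 0
v := n -> u(n)*n!:                          # scaled sequence, alpha = 1
seq(v(n + 1)/v(n), n = 20 .. 24);           # constant quotient 2, the root of X - 2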

8.3 Miller's algorithm

Perron's theorem and the more precise result of Birkhoff-Trjitzinsky show that the situation is very
similar to that of linear recurrences with constant coefficients. Any solution of (8.1) decomposes as

u_n = c_1 φ_n^1 + ⋯ + c_k φ_n^k,

where the coefficients c_i are independent of n and the basis elements can be numbered so
that φ_n^{i+1} = O(φ_n^i), for i = 1, ..., k − 1. Thus the same proof as that of the power method
gives the following result.

Lemma 8.10. If the coefficients a_1(n), ..., a_k(n) of (8.1) belong to C(n), with a_k ≠ 0, if u_n does
not belong to the subspace Vect(φ_n^2, ..., φ_n^k) and φ_n^2 = o(φ_n^1), then as n → ∞, u_n/φ_n^1 tends to a
nonzero constant.

The statement of the analogue of Lemma 8.3 for the block power method is clear and left as
an exercise.

In this setting, the inverse power method is known as Miller's algorithm. Consider a recurrence

y_n = A(n) y_{n−1},

where A(n) is a nonsingular matrix in C^{k×k} and A(n) = A + B_n with ‖B_n‖ → 0, but now used in
the backward direction: starting from

y_N = e = (1, 0, ..., 0)^t,

we compute y_{N−1}, ..., y_0 and normalize by setting

w_n := y_n / ‖y_0‖.

The natural question is: does (w_n) converge when N → ∞?
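In the scalar case, this backward process is the classical way of computing minimal solutions of three-term recurrences. Here is a minimal Maple sketch (ours) on the Bessel functions J_n(x), which satisfy y_{n+1} = (2n/x) y_n − y_{n−1} and are the minimal solution of that recurrence; the starting values and the normalization against BesselJ(0, x) are illustrative choices, not part of the course material:

millerJ := proc(x, nmax, N)
  local y, n;
  # backward recurrence started with arbitrary values at indices N+1 and N;
  # the minimal solution J_n dominates the result as the index decreases
  y[N + 1] := 0.;  y[N] := 1e-30;
  for n from N to 1 by -1 do y[n - 1] := 2*n/x*y[n] - y[n + 1] end do;
  # normalize using an independently computed value of J_0(x)
  [seq(y[n]*evalf(BesselJ(0, x))/y[0], n = 0 .. nmax)];
end proc:
millerJ(1.0, 5, 20);   # compare with seq(evalf(BesselJ(n, 1.0)), n = 0 .. 5)

Increasing N improves the computed values, which is precisely the convergence question analysed below.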
Let U_n be a basis of solutions defined by U_0 = Id and U_n = A(n) U_{n−1}. Then we can write
y_n = U_n T for some fixed vector T. In particular we have e = U_N T, so that T = U_N^{−1} e and

y_n = U_n U_N^{−1} e,   w_n = U_n U_N^{−1} e / ‖U_N^{−1} e‖.

The convergence of w_n is thus equivalent to that of

w_0 = U_N^{−1} e / ‖U_N^{−1} e‖.

Definition 8.11. The adjoint equation of the matrix recurrence above is

y_n = ᵗA(n)^{−1} y_{n−1}.

Lemma 8.12. The sequence (ᵗU_n^{−1}) is a solution of the adjoint equation.

Proof. Inverting and transposing U_n = A(n) U_{n−1} yields ᵗU_n^{−1} = ᵗA(n)^{−1} ᵗU_{n−1}^{−1}. □

From there, we obtain U_N^{−1} e as the transpose of the first row of this solution. This is not directly
a solution of an easily written recurrence, but a more careful analysis shows that the required
convergence can be deduced as in Máté and Nevai's result [26], and similarly (but more technically)
for the block extension of this method [2].

8.4 Computation of Chebyshev series

We now apply these techniques to the special case of recurrences satisfied by the coefficients of
Chebyshev series of solutions of linear differential equations. This section describes the result of very
recent and interesting work by Alexandre Benoit, Mioara Joldeş and Marc Mezzarobba [2].

8.4.1 Shape of the recurrence

We start from a differential equation expressed with a linear differential operator

L(x, ∂) = a_r(x) ∂^r + ⋯ + a_0(x).

Chapter 6.6 introduces the morphism defined by

X = (S + S^{−1})/2,   D = 2 (S − S^{−1})^{−1} k,   (8.6)

and uses it to show that the Chebyshev recurrence P(k, S) is given by

L(X, D) = Q(k, S)^{−1} P(k, S).
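For instance, for L = ∂ − 1 (the equation y′ = y), the numerator is, up to the sign conventions hidden in (8.6), P(k, S) = 2k + S − S^{−1}; it encodes the classical relation c_{k−1} − c_{k+1} = 2k c_k satisfied by the Chebyshev coefficients c_k = 2 I_k(1) of exp, where I_k is the modified Bessel function. A small Maple check (ours):

c := k -> 2*BesselI(k, 1):
seq(evalf(c(k - 1) - c(k + 1) - 2*k*c(k)), k = 1 .. 4);   # numerically zero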

Proposition 8.13. The operator P(k, S) is of the form

Σ_{j=−s}^{s} b_j(k) S^j   (8.7)

with b_{−j}(n) = b_j(−n) for all j, n.

Proof. For any polynomial P(x), the operator P((S + S^{−1})/2) is symmetric and satisfies the
property. The proof of Theorem 6.35 shows that the denominators of the fractions M(X, D), with M a
polynomial, are all powers of D. It follows that it is sufficient to prove that the property is preserved
by addition and by product by D^{−1}. Addition is clear. Expanding the product of such an operator by D^{−1}
gives

D^{−1} (b_{−s}(k) S^{−s} + ⋯ + b_s(k) S^s)
  = −(b_{−s}(k−1)/(2k)) S^{−s−1} − (b_{−s+1}(k−1)/(2k)) S^{−s}
  + ((b_{−s}(k+1) − b_{−s+2}(k−1))/(2k)) S^{−s+1} + ⋯ + ((b_{s−2}(k+1) − b_s(k−1))/(2k)) S^{s−1}
  + (b_{s−1}(k+1)/(2k)) S^s + (b_s(k+1)/(2k)) S^{s+1},

from where the property is visible. □

8.4.2 Problems

The recurrence operator (8.7) on the Chebyshev coefficients suffers from several drawbacks:
• Its order is too big: in general s = r + max_i deg a_i. This recurrence, besides the coefficients of
  Chebyshev series expansions of its solutions, possesses extra solutions that
  - do not tend to zero,
  - and/or are not symmetric.
• Also, the leading coefficient b_s(n) has zeros, which complicates the use of the recurrence
  both in the forward and backward directions.

8.4.3 Convergence

We want to isolate the solutions of the recurrence tending to 0, since only those can correspond to
Chebyshev series of solutions.

Lemma 8.14. The characteristic polynomial of the recurrence (8.7) is given by

P(X) = X^{deg a_r} a_r((X + X^{−1})/2).

Proof. The highest degree of the images of the monomials under the morphism (8.6) is reached when the
power of ∂ is maximal. □

Lemma 8.15. Assume that a_r ≠ 0 on the interval [−1, 1]. Then for any root φ of P, |φ| ∉ {0, 1}
and P(φ^{−1}) = 0.

Proof. The first property is clear: the constant term of P is the leading coefficient of a_r divided by a
power of 2. The constraint on the modulus comes from the fact that the image of the unit circle
under the mapping X ↦ (X + X^{−1})/2 is the interval [−1, 1]. The last one comes from the quasi-
invariance of P under X ↦ 1/X. □
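A quick Maple illustration of this lemma (the sample leading coefficient a_r(x) = x^2 + 2, which does not vanish on [−1, 1], is our choice):

ar := x -> x^2 + 2:
P := expand(X^2*ar((X + 1/X)/2));      # the polynomial of Lemma 8.14
evalf(map(abs, [solve(P = 0, X)]));    # moduli come in pairs (m, 1/m), none in {0, 1}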

Note that the hypothesis on a_r is quite natural from the analytic point of view: it implies that
all solutions of the linear differential equation are analytic in a neighborhood of [−1, 1] (by Cauchy's
theorem), hence have a (quickly) convergent Chebyshev series. A consequence of this lemma is
that the recurrence possesses many solutions that tend to infinity (those associated with |φ| > 1) and thus do
not correspond to Chebyshev series. The symmetry property of Proposition 8.13 can be used to
prove that there is a one-to-one correspondence between divergent and convergent solutions of the
recurrence, encompassing also the solutions growing like powers of n!.

8.4.4 Symmetry

The considerations of the previous sections suggest the use of Miller's method for a block of
dimension s, which is half the order of the recurrence, but still larger than the order of the
differential equation. Further constraints are obtained by considering symmetric solutions only. Let
S = {n > s : b_s(n) = 0} be the set of indices where the recurrence cannot be used backwards. Let then

E = {(u_{|n|})_{n∈Z} : (P u)_n = 0 for all n ∈ N\S}

denote the space of symmetric sequences that are solutions of the recurrence, except possibly
at n ∈ S.

Proposition 8.16. The symmetric, convergent solutions of P u = 0 are exactly the sequences
of E that satisfy the extra conditions

(P u)_n = 0,   n ∈ S ∪ {r, r + 1, ..., s}.

At this stage, we refer to [2] for the proof.

8.4.5 Algorithm

We conclude this chapter by stating without proof the (simple) algorithm due to Benoit, Joldeş
and Mezzarobba and their (not so simple) main theorem.

Algorithm Chebyshev Series.

Input:
• L(x, ∂) of order r;
• r linearly independent conditions λ_1, ..., λ_r of the form λ_i: Σ_{k∈Z} c_{i,k} y_k = ℓ_i (for instance, the
  special case c_{i,k} = δ_{k,0} corresponds to fixing y_0);
• d, the degree of approximation;
• N, a sufficiently large integer.

Output:
• an approximation Σ_{n=−d}^{d} ỹ_n T_n(x) of a solution with the given initial conditions λ_i.

Sketch of the algorithm:
• Compute P(n, S) = Σ_{k=−s}^{s} b_k(n) S^k, the numerator of L(X, D);
• Set S := {n > s : b_s(n) = 0};
• Set (T_N, ..., T_{N+2s}) := (Id_s 0), an s × 2s matrix of initial conditions;
• For n from N + s − 1 down to s do: if n ∈ S then T_{n−s} := 0, else T_{n−s} is computed with P;
• Use the λ_i and the equations in Prop. 8.16 with the values T_0, ..., T_N, and solve the resulting
  s × s linear system;
• Return (ỹ_n), the corresponding linear combination of T_0, ..., T_N.
Theorem 8.17. lim_{N→∞} ỹ_k = y_k, for −d ≤ k ≤ d.


Chapter 9
Rigorous Polynomial Approximations

Let p(x) be the degree-5 minimax approximation to e^{1/cos x} over [−1, 1]. We observed earlier
that obtaining a good enclosure of e^{1/cos x} − p(x) using the natural interval enclosure of this
expression on subintervals would require about 10^7 subintervals. In this chapter, we present some
tools that make it possible to get a certified enclosure, as sharp as wished, of this error function
e^{1/cos x} − p(x) in an efficient way.

Definition 9.1. Let f ∈ C([a, b]). A rigorous polynomial approximation to f is a pair (p, Δ) where
p ∈ R[x] and f(x) − p(x) ∈ Δ for all x ∈ [a, b].

9.1 Taylor models

9.1.1 Taylor series, automatic differentiation and Taylor models


Recall the following classical result.

Lemma 9.2. (Taylor-Lagrange formula.) Let n ∈ N, and let f ∈ C^{n+1}([a, b]). Let x_0 ∈ [a, b]. For all x ∈ [a, b],
we have

f(x) = Σ_{i=0}^{n} (f^{(i)}(x_0)/i!) (x − x_0)^i + ∫_{x_0}^{x} (f^{(n+1)}(t)/n!) (x − t)^n dt,

where the integral is denoted R_n(x). Using the mean value theorem, there exists ξ between x_0 and x such that

R_n(x) = (f^{(n+1)}(ξ)/(n + 1)!) (x − x_0)^{n+1}.

To obtain rigorous polynomial approximations based on the previous lemma, there are two
kinds of computations to perform:
• that of the Taylor coefficients;
• that of (1/(n + 1)!) f^{(n+1)}([a, b]).
When it comes to computing values of derivatives of a function f, there are several options.
• Divided differences: this gives rise to gross errors.
• Symbolic computation: time- (and memory?) consuming... It may also give rise to huge over-
  estimations for composite functions.
• Automatic differentiation (cf. Griewank's texts [9][8]): this is the idea we develop next.
We want to compute values of derivatives of f up to the order k. We associate to each function
u present in the expression tree/dag of f an array of size k + 1, [u_0, u_1, ..., u_k]. Depending on the
context, u_i may be either of u^{(i)}(x_0), u^{(i)}([x_0]) or u^{(i)}([a, b]). Then we define operations on these
arrays that overload the usual ones. For instance, given u = [u_0, u_1, ..., u_k] and v = [v_0, v_1, ..., v_k],
we define

u + v = [u_0 + v_0, u_1 + v_1, ...],
u × v = [u_0 v_0, u_0 v_1 + u_1 v_0, ...].


We can define differences, divisions, compositions the same way.
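A minimal Maple sketch of this overloading (the names addts and mults are ours; here the arrays contain the normalized coefficients u_i = u^{(i)}(x_0)/i!, for which the product rule is a plain convolution):

k := 4:
addts := (u, v) -> [seq(u[i] + v[i], i = 1 .. k + 1)]:
mults := (u, v) -> [seq(add(u[j]*v[i - j + 1], j = 1 .. i), i = 1 .. k + 1)]:
e := [seq(1/i!, i = 0 .. k)]:   # expansion of exp at 0
s := [0, 1, 0, -1/6, 0]:        # expansion of sin at 0
mults(e, s);                    # [0, 1, 1, 1/3, 0], i.e. exp(x)*sin(x) + O(x^5)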


This works very well to compute derivatives at a single point. Unfortunately, using the naive for-
mulas with interval arithmetic may lead to huge overestimations. Consider again e^{1/cos x} over [0, 1],
and let T be its degree-14 Taylor approximation. Using the method outlined above, we obtain
‖f − T‖ < 2.73 × 10^{−2}, whereas in fact ‖f − T‖ ≈ 2.6 × 10^{−3}.
Here again, we will distinguish between
• basic or standard functions, for which we assume we have a means to compute a tight Taylor
  remainder efficiently,
• and more general functions resulting from the composition of standard functions.
Instead of computing only the Taylor coefficients by induction on the shape of the expression tree,
we will apply the same idea to the remainders. Main principle: for bounding the remainders,
• for basic functions, we use the Taylor-Lagrange formula;
• for composite functions,
  - first compute a rigorous polynomial approximation (T, Δ) for each basic function in
    the expression tree of the function,
  - and then apply arithmetic rules overloading the operations.

Definition 9.3. We call Taylor model a rigorous polynomial approximation computed using this
scheme.

9.1.2 Arithmetic operations on Taylor models


Let (p_1, Δ_1) and (p_2, Δ_2) be two Taylor models with deg p_1 = deg p_2 = n, associated respectively to
f_1 and f_2.
Addition. We have for all x ∈ [a, b]

f_1(x) − p_1(x) ∈ Δ_1,   f_2(x) − p_2(x) ∈ Δ_2,

and hence

(f_1(x) + f_2(x)) − (p_1(x) + p_2(x)) ∈ Δ_1 + Δ_2.

So we define

(p_1, Δ_1) + (p_2, Δ_2) = (p_1 + p_2, Δ_1 + Δ_2).

Multiplication. Letting δ_{i,x} = f_i(x) − p_i(x), we have

f_1(x) f_2(x) = p_1(x) p_2(x) + δ_{1,x} p_2(x) + δ_{2,x} p_1(x) + δ_{1,x} δ_{2,x},

so

f_1(x) f_2(x) − p_1(x) p_2(x) ∈ I_1 = p_1([a, b]) Δ_2 + p_2([a, b]) Δ_1 + Δ_1 Δ_2.

A difference with the previous case is that in general deg(p_1 p_2) > n. So we write p_1 p_2 = p + q,
where p is the degree-n truncation of p_1 p_2, we let I_2 = q([a, b]), and we set

(p_1, Δ_1) × (p_2, Δ_2) = (p, I_1 + I_2).

Using naive polynomial multiplication, the cost of computing (p_1, Δ_1) × (p_2, Δ_2) is O(n^2) arithmetic
operations.
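A minimal Maple sketch of this product (the names iadd, imul, irange, tmmul are ours; it assumes [a, b] = [−1, 1], polynomials given as lists of n + 1 coefficients, remainders represented as pairs [lo, hi] with outward rounding ignored, and it bounds ranges by the crude inequality |p(x)| ≤ |a_0| + ⋯ + |a_n| on [−1, 1]):

iadd := (I1, I2) -> [I1[1] + I2[1], I1[2] + I2[2]]:
imul := proc(I1, I2) local c, a, b;
  c := [seq(seq(a*b, a in I1), b in I2)];   # products of all endpoint pairs
  [min(op(c)), max(op(c))];
end proc:
irange := c -> [c[1] - add(abs(c[i]), i = 2 .. nops(c)),
                c[1] + add(abs(c[i]), i = 2 .. nops(c))]:
tmmul := proc(p1, D1, p2, D2, n) local full, p, q, I1, I2, i, j;
  # exact product of the two degree-n polynomials (coefficient convolution)
  full := [seq(add(p1[j]*p2[i - j + 1], j = max(1, i - n) .. min(i, n + 1)),
               i = 1 .. 2*n + 1)];
  p := full[1 .. n + 1];                           # degree-n truncation
  q := [0 $ (n + 1), op(full[n + 2 .. 2*n + 1])];  # discarded high-degree part
  I1 := iadd(iadd(imul(irange(p1), D2), imul(irange(p2), D1)), imul(D1, D2));
  I2 := irange(q);
  [p, iadd(I1, I2)];
end proc:
tmmul([1, 1, 0], [-0.01, 0.01], [2, 0, 1], [-0.02, 0.02], 2);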
Composition. We reduce to doing multiplications by writing

f_2 ∘ f_1(x) − p_2(f_1(x)) = f_2 ∘ f_1(x) − Σ_{j=0}^{n} b_j f_1(x)^j ∈ Δ_2,

where the b_j are the coefficients of p_2.

Basic functions. We use Taylor formulae. As most basic functions we deal with are D-finite, we
can use linear recurrence relations to compute enclosures of their Taylor coefficients and (to some
extent) remainders.

9.1.3 Ranges of polynomials


Observe that we heavily used enclosures of ranges of polynomials. This raises several questions.
First, how do we compute these enclosures? A first option is Horner's scheme, i.e., computing
p(I) where p = a_0 + a_1(x − x_0) + ⋯ + a_n(x − x_0)^n as

((a_n (I − x_0) + a_{n−1}) (I − x_0) + a_{n−2}) (I − x_0) + ⋯ + a_0.
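Reusing the interval helpers iadd and imul from the sketch in 9.1.2 (ours), Horner's scheme over an interval reads:

hornerrange := proc(c, x0, I0) local r, J, i;
  # c = [a0, ..., an], coefficients in powers of (x - x0); I0 = [lo, hi]
  J := iadd(I0, [-x0, -x0]);                 # the interval I0 - x0
  r := [c[-1], c[-1]];
  for i from nops(c) - 1 to 1 by -1 do
    r := iadd([c[i], c[i]], imul(r, J));
  end do;
  r;
end proc:
hornerrange([0, 0, 1], 0, [-1, 1]);   # returns [-1, 1], overestimating the
                                      # true range [0, 1] of x^2 (dependency)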
Another possibility is to use Bernstein's basis: indeed, one can show that if

p(x) = Σ_{k=0}^{n} p_k B_{n,k}(x),

then for all x ∈ [0, 1], we have

min_{[0,1]} p ≥ min_k p_k   and   max_{[0,1]} p ≤ max_k p_k.
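The corresponding computation in Maple (our name bernsteinbound; the conversion from monomial coefficients a_i uses the classical formula p_k = Σ_{i≤k} (C(k, i)/C(n, i)) a_i, see [7]):

bernsteinbound := proc(a) local n, p, k, i;
  n := nops(a) - 1;
  p := [seq(add(binomial(k, i)/binomial(n, i)*a[i + 1], i = 0 .. k), k = 0 .. n)];
  [min(op(p)), max(op(p))];
end proc:
bernsteinbound([0, 0, 1]);   # for x^2 on [0, 1] the bound [0, 1] is even exact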

Second, why would this process yield tight enclosures? Our basic functions are analytic, and
hence have (fast) converging Taylor series.

9.2 Chebyshev models


Lemma 9.4. Let f ∈ C^{n+1}([a, b]), and let a ≤ x_0 < ⋯ < x_n ≤ b. Let p denote the interpolant of f
at the nodes x_i. There exists ξ_x ∈ (a, b) such that

f(x) − p(x) = (f^{(n+1)}(ξ_x)/(n + 1)!) Π_{i=0}^{n} (x − x_i),

where the product is denoted W_n(x). If [a, b] = [−1, 1] and the x_i are the Chebyshev nodes of the
first kind, we have

W_n(x) = T_{n+1}(x)/2^n.

In this case,

f(x) − p(x) ∈ f^{(n+1)}([−1, 1]) / ((n + 1)! 2^n).
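As a quick sanity check of this bound, here is a small Maple experiment (ours), interpolating exp at the 8 Chebyshev nodes of the first kind on [−1, 1] and comparing the measured error with the bound e/((n + 1)! 2^n) ≈ 5.3 × 10^{−7}:

n := 7:
nodes := [seq(evalf(cos((2*i + 1)*Pi/(2*n + 2))), i = 0 .. n)]:
p := CurveFitting:-PolynomialInterpolation(nodes, map(exp, nodes), x):
evalf(max(seq(abs(exp(t) - eval(p, x = t)), t = -1 .. 1, 0.01)));   # measured error
evalf(exp(1)/((n + 1)!*2^n));                                       # bound of Lemma 9.4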

We may define Chebyshev models similarly to the Taylor models. We express the polynomials in
the Chebyshev basis (T_k). For basic functions, we use an interpolant at the Chebyshev nodes of the
first kind. The remainder formula is computed using the recurrence satisfied by the corresponding
Taylor expansion. For composite functions, we resort to the same two-step process as in the Taylor
case, making use of the formula T_i T_j = (T_{i+j} + T_{|i−j|})/2. For bounding the ranges of polynomials,
we replace the Horner scheme with the Clenshaw scheme.
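A minimal Maple sketch of Clenshaw's scheme (ours); it is run on numbers here, but the same recurrence can be run on intervals, with the helpers of 9.1.2, to enclose the range of a polynomial written in the basis (T_k):

clenshaw := proc(c, x) local b, b1, b2, k;
  # evaluates c[1]*T_0(x) + c[2]*T_1(x) + ... + c[m]*T_{m-1}(x)
  b1 := 0;  b2 := 0;
  for k from nops(c) to 2 by -1 do
    b := 2*x*b1 - b2 + c[k];
    b2 := b1;  b1 := b;
  end do;
  x*b1 - b2 + c[1];
end proc:
clenshaw([0, 0, 1], 0.3) - (2*0.3^2 - 1);   # recovers T_2(0.3): the result is 0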

9.3 A little, little, little bit of fixed-point theory


We are given f as a solution of a linear ordinary differential equation. We are also given a poly-
nomial p, and we want to determine a bound B such that ‖f − p‖_∞ < B. Consider the case of an
equation of order 1, over [−1, 1]:

y′(x) = a(x) y(x),  a(x) ∈ Q(x),  y(0) = y_0,   (9.1)

where a(x) has no pole over [−1, 1]. We introduce

Φ: C([−1, 1]) → C([−1, 1]),
y ↦ (x ↦ y_0 + ∫_0^x a(t) y(t) dt).

A function f is a solution to (9.1) iff Φ(f) = f, a fixed-point equation in the Banach space C([−1, 1]).
For all u, v ∈ C([−1, 1]) and x ∈ [−1, 1], we have

|Φu(x) − Φv(x)| = |∫_0^x a(t) (u(t) − v(t)) dt| ≤ ‖a‖_∞ ‖u − v‖_∞,

and by induction one can show that

‖Φ^i u − Φ^i v‖_∞ ≤ (‖a‖_∞^i / i!) ‖u − v‖_∞ = γ_i ‖u − v‖_∞,  with γ_i → 0.

Hence there exists i_0 such that γ_{i_0} < 1, and then Φ^{i_0} is a contraction over C([−1, 1]). So there exists
a unique f ∈ C([−1, 1]) such that Φ(f) = f.
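To see the iteration at work, here is a small Maple illustration (ours) with a(x) = 1 and y_0 = 1, so that f = exp: iterating Φ from the zero function produces the partial sums of the Taylor series of exp, which converge uniformly on [−1, 1].

Phi := g -> unapply(1 + int(g(t), t = 0 .. x), x):
g := t -> 0:
for i from 1 to 5 do g := Phi(g) end do:
g(x);   # 1 + x + x^2/2 + x^3/6 + x^4/24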
Bibliography
[1] A. I. Aptekarev. Sharp constants for rational approximations of analytic functions. Russian Acad. Sci. Sb. Math., 193:1–72, 2002.
[2] Alexandre Benoit, Mioara Joldeş and Marc Mezzarobba. Rigorous uniform approximation of D-finite functions. In preparation, 2012.
[3] W. J. Cody, G. Meinardus and R. S. Varga. Chebyshev rational approximations to e^{−x} in [0, +∞) and applications to heat-conduction problems. J. Approximation Theory, 2:50–65, 1969.
[4] E. W. Cheney. Introduction to Approximation Theory, 2nd edition. AMS Chelsea Publishing, Providence, Rhode Island, 1982.
[5] Marc Daumas and Guillaume Melquiond. Certification of bounds on expressions involving rounded operators. Transactions on Mathematical Software, 37(1):1–20, 2010.
[6] Florent de Dinechin, Christoph Lauter and Guillaume Melquiond. Certifying the floating-point implementation of an elementary function using Gappa. Transactions on Computers, 60(2):242–253, 2011.
[7] Rida T. Farouki. The Bernstein polynomial basis: a centennial retrospective. Comput. Aided Geom. Design, 29(6):379–419, 2012. Available from http://mae.engr.ucdavis.edu/~farouki/bernstein.pdf.
[8] Andreas Griewank. A mathematical view of automatic differentiation. Acta Numer., 12:321–398, 2003.
[9] Andreas Griewank and Andrea Walther. Evaluating derivatives. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, second edition, 2008. Principles and techniques of algorithmic differentiation.
[10] Nicholas Hale and Alex Townsend. Fast and accurate computation of Gauss–Legendre and Gauss–Jacobi quadrature nodes and weights. SIAM Journal on Scientific Computing, 35(2), 2013.
[11] Thomas C. Hales. A proof of the Kepler conjecture. Ann. of Math. (2), 162(3):1065–1185, 2005.
[12] Thomas C. Hales, John Harrison, Sean McLaughlin, Tobias Nipkow, Steven Obua and Roland Zumkeller. A revision of the proof of the Kepler conjecture. Discrete Comput. Geom., 44(1):1–34, 2010.
[13] G. H. Hardy and E. M. Wright. An Introduction to the Theory of Numbers. Oxford University Press, fifth edition, 1979.
[14] Guillaume Melquiond. De l'arithmétique d'intervalles à la certification de programmes. PhD thesis, École Normale Supérieure de Lyon, Lyon, France, 2006.
[15] Oystein Ore. Linear equations in non-commutative fields. Annals of Mathematics, 32:463–477, 1931.
[16] Allan Pinkus. Weierstrass and approximation theory. J. Approx. Theory, 107(1):1–66, 2000. Available from http://www.math.technion.ac.il/hat/fpapers/wap.pdf.
[17] M. J. D. Powell. Approximation theory and methods. Cambridge University Press, Cambridge, 1981.
[18] A. Schönhage. Zur rationalen Approximierbarkeit von e^{−x} über [0, ∞). J. Approximation Theory, 7:395–398, 1973.
[19] H. Stahl. Best uniform rational approximation of |x| on [−1, 1]. Russian Acad. Sci. Sb. Math., 76:461–487, 1993.
[20] Warwick Tucker. The Lorenz attractor exists. C. R. Acad. Sci. Paris Sér. I Math., 328(12):1197–1202, 1999.
[21] Warwick Tucker. A rigorous ODE solver and Smale's 14th problem. Found. Comput. Math., 2(1):53–117, 2002.
[22] Lloyd N. Trefethen. Approximation Theory and Approximation Practice. SIAM, 2013. See http://www2.maths.ox.ac.uk/chebfun/ATAP/.
[23] Warwick Tucker. Validated Numerics: A Short Introduction to Rigorous Computations. Princeton University Press, Princeton, NJ, 2011.
[24] R. S. Varga and A. J. Carpenter. On the Bernstein conjecture in approximation theory. Constr. Approx., 1(4):333–348, 1985.
[25] L. Veidinger. On the numerical determination of the best approximations in the Chebyshev sense. Numer. Math., 2:99–105, 1960.
[26] R. V. M. Zahar. A mathematical analysis of Miller's algorithm. Numerische Mathematik, 27(4):427–447, 1976.
[27] J. von zur Gathen and J. Gerhard. Modern computer algebra. Cambridge University Press, New York, 2nd edition, 2003.

