
Copyright © 2012 by David G. Schaeffer and John W. Cain
All rights reserved

ODE: A BRIDGE BETWEEN UNDERGRAD AND GRADUATE MATH

by

David G. Schaeffer and John W. Cain
Contents

1 Introduction
1.1 Some simple ODEs
1.1.1 Examples
1.1.2 Descriptive concepts
1.2 Solutions of ODEs
1.2.1 Examples and discussion
1.2.2 Geometric interpretation of solutions
1.3 Three solution techniques from the elementary theory
1.3.1 Linear equations with constant coefficients
1.3.2 First-order linear equations
1.3.3 Separable equations
1.4 Examples of physically based ODEs
1.4.1 Mechanical systems
1.4.2 Physical equations with nonmechanical origins
1.5 Systems of ODEs
1.6 Topics covered in this book
1.6.1 General remarks
1.6.2 Qualitative behavior of some predator-prey models
1.7 Software for numerical solution of the IVP
1.8 Exercises
1.8.1 Exercises to consolidate your understanding
1.8.2 Exercises referenced elsewhere in this book
1.8.3 Computational Exercises
1.8.4 Exercises of independent interest
1.9 Additional notes
1.9.1 Miscellaneous
1.9.2 The concept "generic"

2 Linear systems with constant coefficients
2.1 Preview
2.2 Definition and properties of the matrix exponential
2.2.1 Preliminaries about norms
2.2.2 Convergence
2.2.3 The main theorem
2.3 Calculation of the matrix exponential
2.3.1 The role of similarity
2.3.2 Two problematic cases
2.3.3 Use of the Jordan form
2.4 Large-time behavior of solutions of homogeneous linear systems
2.4.1 The main results
2.4.2 Tests for negative eigenvalues
2.5 Solution of inhomogeneous problems
2.6 Exercises
2.6.1 Routine exercises
2.6.2 Exercises of independent interest
2.7 Additional notes

3 Nonlinear systems: local theory
3.1 Two counterexamples
3.2 The existence theorem
3.2.1 Statement of the theorem
3.2.2 Differentiability implies Lipschitz continuity
3.2.3 Reformulation of the IVP as an integral equation
3.2.4 The contraction-mapping principle
3.2.5 Proof of the existence theorem
3.2.6 An illustrative example
3.2.7 Concluding remark
3.3 The uniqueness theorem
3.3.1 Gronwall's Lemma
3.3.2 More on Lipschitz functions
3.3.3 The uniqueness theorem
3.4 Generalizations
3.4.1 Nonautonomous systems
3.4.2 Linear systems
3.5 Exercises
3.5.1 Exercises to consolidate your understanding
3.5.2 Exercises used elsewhere in this book
3.5.3 Computational exercise
3.5.4 Exercises of independent interest
3.6 Additional notes

4 Nonlinear systems: global theory
4.1 The maximal interval of existence
4.2 Two sufficient conditions for global existence
4.2.1 Linear growth of the RHS
4.2.2 Trapping regions
4.3 Nullclines and trapping regions
4.3.1 An activator-inhibitor system
4.3.2 Lotka-Volterra with a logistic modification
4.3.3 van der Pol's equation
4.3.4 The torqued pendulum and ODEs on manifolds
4.3.5 Michaelis-Menten kinetics
4.4 Continuous dependence of the solution
4.4.1 The main result
4.4.2 Some associated formalism
4.4.3 Continuity with respect to the equation
4.5 Differentiable dependence on initial data
4.5.1 Formulation of the main result
4.5.2 The order notation
4.5.3 Proof of Theorem 4.5.1
4.5.4 Further discussion
4.5.5 Generalizations
4.6 Exercises
4.6.1 Exercises to consolidate your understanding
4.6.2 Exercises referenced elsewhere in this book
4.6.3 Computational exercises
4.6.4 Exercises of independent interest
4.7 Additional notes
4.8 Appendix: Euler's method
4.8.1 Introduction
4.8.2 Theoretical basis for the approximation
4.8.3 Convergence of the numerical solution

5 Trajectories near equilibria
5.1 Stability of equilibria
5.1.1 The main theorem
5.1.2 An illustrative example
5.2 An orgy of terminology
5.2.1 Description of behavior near equilibria
5.2.2 Classification of eigenvalues of 2 × 2 Jacobians
5.2.3 Two-dimensional equilibria and slopes of nullclines
5.2.4 The Hartman-Grobman Theorem
5.3 Activator-inhibitor systems and the Turing instability
5.3.1 Equilibria of the activator-inhibitor system
5.3.2 The Turing instability: Destabilization by diffusion
5.4 Liapunov functions
5.4.1 The main result
5.4.2 LaSalle's invariance principle
5.4.3 Construction of Liapunov functions
5.5 Stable and unstable manifolds
5.5.1 A preparatory example
5.5.2 Statement of the main result
5.5.3 Proof of Theorem 5.5.1
5.5.4 Global behavior
5.5.5 Section 1.6 revisited
5.6 Exercises
5.6.1 Exercises to consolidate your understanding
5.6.2 Exercises referenced elsewhere in the book
5.6.3 Computational exercises
5.6.4 Exercises of independent interest
5.7 Ideas

6 Oscillations in ODEs
6.1 Periodic Solutions
6.1.1 Basic issues and examples
6.1.2 Contents of this chapter
6.2 Special behavior in two dimensions
6.2.1 The Poincaré-Bendixson Theorem: minimal version
6.2.2 Application to the van der Pol equation
6.2.3 Limit sets
6.2.4 The Poincaré-Bendixson Theorem: strong version
6.2.5 Dulac's Theorem
6.3 Limit cycles in the van der Pol equation for small β
6.3.1 Two illustrative examples of perturbation theory
6.3.2 Application to the van der Pol equation
6.4 Limit cycles in the van der Pol equation for large β
6.4.1 Setting up the problem
6.4.2 The limit-cycle solution
6.4.3 Relaxation oscillations in the van der Pol equation
6.5 Stability of periodic orbits: the Poincaré map
6.5.1 The basic construction
6.5.2 Discrete dynamical systems
6.5.3 Application of the Poincaré-map criterion
6.6 Exercises
6.7 Appendix: Index theory in two dimensions

7 Bifurcation from equilibria
7.1 Example 1: Pitchfork bifurcation
7.2 An outline of this chapter
7.3 Example 2: Transcritical bifurcation
7.4 Example 3: Saddle-node bifurcation
7.5 Theory for steady-state bifurcation: the Liapunov-Schmidt reduction
7.5.1 Bare bones of the reduction
7.5.2 Stability issues
7.5.3 Exploration of one-dimensional bifurcation problems
7.5.4 Symmetry and the pitchfork bifurcation
7.5.5 The two-cell Turing instability
7.5.6 Imperfect bifurcation
7.5.7 A bifurcation theorem
7.6 Example 4: The Hopf bifurcation
7.7 Hopf bifurcation: theory
7.8 Bifurcation in the FitzHugh-Nagumo equations
7.9 Exercises
7.10 Ideas

8 Global bifurcations
8.1 Mutual annihilation of two limit cycles
8.1.1 An academic example
8.1.2 The FitzHugh-Nagumo equations
8.1.3 Phase locking in coupled oscillators
8.2 Saddle-node bifurcation points on a limit cycle
8.2.1 An academic example
8.2.2 The overdamped torqued pendulum
8.2.3 Other examples
8.3 Homoclinic bifurcation
8.3.1 van der Pol with nonlinearity in the restoring force
8.3.2 The torqued pendulum with small damping
8.3.3 The Lotka-Volterra model with logistic growth and the Allee effect
8.3.4 Other examples
8.4 Hopf-like bifurcation to an invariant torus
8.4.1 An academic example
8.4.2 The forced van der Pol equation
8.5 Period doubling
8.5.1 An academic example
8.5.2 Rössler's equation
8.6 Appendix: ODEs on a torus
8.7 Appendix: What is chaos?
8.8 Ideas

A Guide to Commonly Used Notation

B Notions from Advanced Calculus
B.0.1 Regions with smooth boundaries

C Notions from Linear Algebra
C.1 Appendix: A compendium of results from linear algebra
C.1.1 How to compute Jordan normal forms
C.1.2 The Routh-Hurwitz criterion
C.1.3 Continuity of eigenvalues of a matrix with respect to its entries
C.1.4 Fast-slow systems
C.1.5 Exercises

D Nondimensionalization and Scaling
D.1 Classes of equations in applications
D.1.1 Mechanical models
D.1.2 Electrical models
D.1.3 "Bathtub" models
D.2 Scaling and nondimensionalization
D.2.1 Duffing's equation
D.2.2 Lotka-Volterra with logistic limits
D.2.3 Michaelis-Menten kinetics
D.3 Exercises

Bibliography
Chapter 1
Introduction
1.1 Some simple ODEs
1.1.1 Examples
An ordinary differential equation (ODE) is an equation involving an unknown function of one variable and some of its derivatives. We hasten to assure the reader that this bland phrase has meaning for us primarily through examples, so let's proceed to these immediately.
Most simply we have the equation for exponential growth or decay,

x′ = λx, (1.1)

where λ is a constant (let's say real), x(t) is the unknown function, and x′ denotes the derivative of x with respect to t. The logistic equation modifies this equation, in case λ > 0, by inclusion of a negative term that limits growth as x becomes large:

x′ = λx − εx², (1.2)

where ε is a positive constant.
The equation

x″ + x = 0 (1.3)

describes what is often called simple harmonic motion. In physical terms, which will be introduced in Section 1.4, equation (1.3) describes the motion of a mass pulled back towards equilibrium by a frictionless spring. Here of course x″ denotes the second derivative. A useful point of comparison for (1.3) is

x″ + sin x = 0, (1.4)
which describes the motion of a pendulum under gravity under some simplifying assumptions about units (also discussed in Section 1.4). Two other modifications of (1.3) are

(a) x″ + tx = 0    and    (b) x″ + (δ + ε cos t)x = 0, (1.5)

known as Airy's equation and Mathieu's equation, respectively.
We conclude our first round of examples with a Riccati equation

x′ = x² − t (1.6)

and a purely pedagogical example

1 + (x′)² = x². (1.7)
1.1.2 Descriptive concepts
The most basic concept used in describing ODEs is order, which refers to the order of the highest derivative that appears in the equation. Thus equations (1.1), (1.2), (1.6), and (1.7) are first order, while (1.3), (1.4), and (1.5) are second order. Here is an example of a third-order equation:
d³y/dx³ = α/y + β/y³, (1.8)
where y(x) is the unknown function of the variable x. This example also illustrates the following three points: (i) Usually the independent variable in the ODEs we study is time, but other choices also occur; in this equation x represents a spatial coordinate. (The dependent variable y(x) represents the thickness of a thin film as a function of position.) (ii) We have written the derivative using the d/dx-notation rather than with primes; no mathematical significance should be attached to this choice, it is only a matter of taste as to which notation seems more appropriate to us in a given situation. (iii) An ODE need not be defined for all values of either the dependent or independent variable. For example, (1.8) is not defined for y = 0. Incidentally, much information can be gained by focusing on such exceptional points of an equation, called singularities in the usual terminology.
Normally we will solve for the highest derivative of the dependent variable as a function of lower-order derivatives and of t: i.e., for an equation of order n,

x⁽ⁿ⁾ = f(x, x′, . . . , x⁽ⁿ⁻¹⁾, t). (1.9)

The value of this convention is illustrated by equation (1.7), which may be rewritten

x′ = ±√(x² − 1). (1.10)
Two problems in (1.7) become evident from rewriting the equation in this way: (i) Really two different ODEs are hidden in (1.7); to get a specific ODE we need to choose between the plus and minus signs in (1.10). (ii) For some values of x, specifically |x| < 1, equation (1.7) has no real-valued solutions. The restriction on values of x is not a problem in itself; as noted above, an ODE need not be defined for all values of its variables. But there is yet another problem in (1.7) that is less obvious but more serious. It relates to the fact that √(x² − 1) is nondifferentiable at |x| = 1, the boundary of its natural domain. Specifically, Exercise 3(c) shows that equation (1.7) suffers from a nonuniqueness pathology.
Next we define the very important notion of linearity. We shall call an ODE linear¹ if it may be written in the form

x⁽ⁿ⁾ = a₁(t)x⁽ⁿ⁻¹⁾ + a₂(t)x⁽ⁿ⁻²⁾ + · · · + aₙ₋₁(t)x′ + aₙ(t)x + f(t); (1.11)

i.e., the unknown function x and its derivatives appear only raised to the first power. Thus equations (1.1), (1.3), and (1.5) are linear. Equations (1.2), (1.6), and (1.7) are obviously nonlinear; (1.4) is also nonlinear because

sin x = x − x³/3! + x⁵/5! − · · ·

has many higher powers of x hidden in it. Likewise

x″ = x′x

is also nonlinear because of the product on the RHS of the equation.
The linear equation (1.11) is called homogeneous if f(t) ≡ 0, and it is said to have constant coefficients if all the functions aⱼ(t), j = 1, . . . , n, are actually independent of t. The latter concept for nonlinear equations has a different name: (1.9) is called autonomous if the function f does not depend on t.

¹ More formally, we say that equation (1.9) is linear if the function f is linear in its first n arguments; no restriction on the t-dependence is implied.
1.2 Solutions of ODEs
1.2.1 Examples and discussion
Here is a definition even more vapid than the definition of an ODE: A function x(t) is called a solution of (1.9) if the two sides of the equation become equal when this function is substituted into the equation. Let's proceed to examples.
For any constant C, x(t) = Ce^{λt} is a solution of (1.1). (In Exercise 1 we show the reader how to prove that this is the most general solution of (1.1).) The general solution of (1.2) will be determined in Section 1.3.3 below, using a technique introduced in that section.
For any constants C₁, C₂ ∈ ℝ,

x(t) = C₁ cos t + C₂ sin t (1.12)

is a solution of (1.3). This solution provides an instance of the principle of linear superposition for a linear, homogeneous ODE. Specifically, if x₁(t) and x₂(t) are solutions of (1.11) (with f(t) ≡ 0), then for any constants C₁, C₂, the linear combination

C₁x₁(t) + C₂x₂(t)

is also a solution. In the language of linear algebra, the set of solutions of a homogeneous linear ODE forms a vector space. As we shall see in Chapter 2, any solution of (1.3) can be written in the form (1.12); this means that the set of solutions of (1.3) is a two-dimensional vector space for which {cos t, sin t} is a basis.
The above examples of solutions illustrate one of the most fundamental points in the whole subject: ODEs have infinitely many solutions. Thus, some auxiliary information must be given to pick out exactly one solution from the infinite set of solutions. The most common such auxiliary information is an initial condition. For example, if one seeks a solution of (1.1) subject to the auxiliary condition

x(0) = b,

where b is a real constant, then x(t) = be^{λt} is the unique solution of this more specific problem. (To show it's unique, we need to know that Ce^{λt} is the general solution of (1.1), as is proved in Exercise 1.) Similarly, given real constants b₀, b₁, there is a unique function (1.12) that satisfies (1.3) plus the initial condition

x(0) = b₀,  x′(0) = b₁.

For a general problem, given constants b₀, b₁, . . . , bₙ₋₁, one seeks a solution of (1.9) such that

x(0) = b₀,  x′(0) = b₁,  . . . ,  x⁽ⁿ⁻¹⁾(0) = bₙ₋₁. (1.13)

We shall call this the initial-value problem for (1.9). The initial condition may be imposed at any point t = t₀, but we will usually impose this condition at t = 0 as in (1.13). Of course the general case is easily reduced to (1.13) without loss of generality.
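To make the initial-value problem concrete, here is a minimal numerical sketch (ours, not from the text; the values λ = −0.7 and b = 3 are arbitrary examples) comparing a computed solution of (1.1) with the formula be^{λt}:

# A hedged illustration: the unique solution of x' = lam*x with x(0) = b
# should match b*exp(lam*t) up to the integrator's tolerance.
import numpy as np
from scipy.integrate import solve_ivp

lam, b = -0.7, 3.0
t = np.linspace(0.0, 5.0, 50)
sol = solve_ivp(lambda t, x: lam * x, (0.0, 5.0), [b], t_eval=t, rtol=1e-10)
print(np.max(np.abs(sol.y[0] - b * np.exp(lam * t))))   # near round-off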
Warning: The symbol x may refer to a generic real variable, or it may refer to a function x(t) that satisfies an ODE. The first usage occurs in (the italicized part of) the phrase "The equation x′ = f(x), where f is a differentiable function of x, is a general first-order autonomous ODE," while the second occurs in "The general solution x of (1.1) is given by the function x(t) = Ce^{λt}."

Figure 1.1: Direction field for the Riccati equation (1.6), with several solution trajectories corresponding to different choices of initial condition x(0). The threshold value b∗ discussed below is marked on the x-axis.
In general, it is a rare and pleasant occurrence when one is able to find explicit solutions of an ODE. In Section 1.3 we shall describe three solution techniques from the elementary theory. Although much information about solutions of equations such as (1.4), (1.5), and (1.6) has been obtained through intensive study, simple formulas like the solutions of (1.1) and (1.3) are not readily available.
1.2.2 Geometric interpretation of solutions
The geometric interpretation of ODEs is an essential part of the subject. The in-
terpretation is clearest for rst-order equations. We illustrate this with the help of
Figure 1.1, which shows the direction eld for (1.6): i.e., imagine that at every point
in the half-plane (t, x) : t > 0, a line segment whose slope is x
2
t is drawn. The
basic geometrical fact is that a function x(t) is a solution of (1.6) i at every point
(t, x(t)) its graph is tangent to the line segment at that point. This interpretation
makes it seem natural that ODEs have many solutions and that a unique solution
may be selected by specifying a starting point for the curve (t, x(t)) at t = 0..
Much qualitative information about solutions of an ODE may be obtained from its direction field. For example, from Figure 1.1 we make a conjecture regarding the behavior of solutions of (1.6) as t → ∞: i.e., for all initial data such that b is negative, the solution of (1.6) asymptotes to a curve in the half-plane x < 0, while if b is large and positive, the solution grows without bound. Moreover, there is a special initial condition x(0) = b∗ that marks the threshold between these vastly different long-term behaviors. In Exercise 13 we invite the reader to explore this conjecture with the numerical software introduced in Section 1.7 below.
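For readers who wish to reproduce a picture like Figure 1.1, the following sketch (our own; the grid ranges and the normalization of the segments are arbitrary choices) plots the direction field of the Riccati equation (1.6) in Python:

# Direction field for x' = x^2 - t on the half-plane t > 0.
import numpy as np
import matplotlib.pyplot as plt

t, x = np.meshgrid(np.linspace(0.1, 4.0, 25), np.linspace(-3.0, 3.0, 25))
slope = x**2 - t                       # RHS of (1.6) evaluated on the grid
norm = np.hypot(1.0, slope)            # rescale segments to equal length
plt.quiver(t, x, 1.0 / norm, slope / norm, angles='xy', headwidth=1)
plt.xlabel('t'); plt.ylabel('x')
plt.show()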
1.3 Three solution techniques from the elementary theory
In this section we introduce three methods for finding explicit solutions of certain ODEs that come from the elementary theory. In this book we shall assume the reader can use these three techniques. No other prerequisites from the elementary theory of ODEs will be assumed.
1.3.1 Linear equations with constant coefficients
If the coefficients aⱼ in (1.11) are actually independent of t, then one may find explicit solutions of this equation using exponentials. This method will be developed extensively in Chapter 2, so here we limit ourselves to applying it in a simple example:

x″ + βx′ + x = 0. (1.14)

(This equation differs from (1.3) by the first-order term βx′. As discussed in Section 1.4, this new term represents friction, and normally β > 0.) Let us look for solutions of (1.14) of the form x(t) = e^{λt}. Substituting into (1.14) we see that e^{λt} satisfies (1.14) if

(λ² + βλ + 1)e^{λt} = 0.
In other words, because the derivative of the exponential is a multiple of itself, finding an exponential solution of (1.14) reduces to solving the algebraic equation λ² + βλ + 1 = 0, which of course has solutions

λ± = (−β ± √(β² − 4))/2.

Thus, e^{λ±t} is a solution of (1.14), and by linear superposition, for any constants C₊, C₋,

C₊e^{λ₊t} + C₋e^{λ₋t}

is also a solution. As we shall show in Chapter 2, this is the general solution of (1.14).
If the friction coefficient β is positive, then the solutions e^{λ±t} decay as t increases. If 0 < β < 2, then the roots λ± are complex. In this case we may separate real and imaginary parts of the roots,

λ± = −β/2 ± i√(1 − β²/4),

and use Euler's formula e^{iθ} = cos θ + i sin θ to rewrite the solution

e^{λ₊t} = e^{−βt/2} [cos(√(1 − β²/4) t) + i sin(√(1 − β²/4) t)]

and similarly for e^{λ₋t}. Moreover we may form linear combinations

e^{−βt/2} cos(√(1 − β²/4) t),  e^{−βt/2} sin(√(1 − β²/4) t)

to choose a different basis for the set of solutions of (1.14) whose elements are real-valued.
This discussion illustrates a tension that exists in this text: usually we are interested in real-valued solutions of an ODE, but often it is convenient to consider complex-valued solutions in order to take advantage of the complex exponential. In general, as here, a complex exponent indicates oscillatory behavior of real-valued solutions of an ODE.
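As a quick check of the preceding computation, the sketch below (ours; β = 1 and the initial data x(0) = 1, x′(0) = 0 are example choices) solves (1.14) numerically and compares with the combination C₊e^{λ₊t} + C₋e^{λ₋t} built from the characteristic roots:

# Damped oscillator x'' + beta*x' + x = 0 versus its closed-form solution.
import numpy as np
from scipy.integrate import solve_ivp

beta = 1.0
lam = np.roots([1.0, beta, 1.0])       # roots of lambda^2 + beta*lambda + 1
t = np.linspace(0.0, 10.0, 200)
sol = solve_ivp(lambda t, y: [y[1], -beta * y[1] - y[0]],
                (0.0, 10.0), [1.0, 0.0], t_eval=t, rtol=1e-9, atol=1e-12)

# Fix C+ and C- from x(0) = 1, x'(0) = 0 and form the exact solution.
C = np.linalg.solve(np.array([[1.0, 1.0], lam]), np.array([1.0, 0.0]))
exact = (C[0] * np.exp(lam[0] * t) + C[1] * np.exp(lam[1] * t)).real
print(np.max(np.abs(sol.y[0] - exact)))  # should be near the tolerance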
1.3.2 First-order linear equations
For a first-order linear ODE, say

x′ + a(t)x = f(t), (1.15)

one can find explicit solutions even if the coefficients are variable. In Exercise 1(b) we ask the reader to verify the following claim: Let A(t) = ∫₀ᵗ a(s) ds; then for any constant C,

x(t) = Ce^{−A(t)} + ∫₀ᵗ e^{A(s)−A(t)} f(s) ds (1.16)

satisfies (1.15). Moreover, since x(0) = C, (1.16) also provides a solution to the initial-value problem.
For the reader seeking a deeper understanding, the derivation of (1.16) from the equation using an integrating factor is given in most introductory differential equations textbooks; see for example [4].
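The claim is easy to test numerically. In the sketch below (ours; the coefficients a(t) = cos t and f(t) = t are arbitrary examples), formula (1.16) is evaluated by quadrature and compared with a direct numerical solution of (1.15):

# Verify (1.16): x(t) = C e^{-A(t)} + int_0^t e^{A(s)-A(t)} f(s) ds.
import numpy as np
from scipy.integrate import quad, solve_ivp

a = np.cos
f = lambda t: t
C = 2.0
A = lambda t: quad(a, 0.0, t)[0]       # A(t) = integral of a(s) from 0 to t

def x_formula(t):
    integrand = lambda s: np.exp(A(s) - A(t)) * f(s)
    return C * np.exp(-A(t)) + quad(integrand, 0.0, t)[0]

sol = solve_ivp(lambda t, x: f(t) - a(t) * x, (0.0, 5.0), [C], rtol=1e-10)
print(x_formula(5.0), sol.y[0, -1])    # the two values should agree closely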
1.3.3 Separable equations
A first-order ODE is called separable if the RHS may be factored:

x′ = f(x)g(t). (1.17)

For example, equations (1.2) and (1.7) are separable, where in both cases the factor g(t) is trivial. Let us illustrate how to exploit this property by solving (1.2). In the following derivation, we temporarily suspend all concerns of rigor; we shall freely perform manipulations that might be problematic in order to obtain a formula for the solution. After it has been derived, we may verify that the formula actually does provide a solution. Given such a verification, there is no need to justify intermediate steps.
We write the LHS of (1.2) using the notation x′ = dx/dt, and we treat dx and dt as separate factors. Let us bring all x-dependence in the equation to the LHS and all t-dependence to the RHS, obtaining

dx/(λx − εx²) = dt. (1.18)

The LHS of (1.18) may be expanded in partial fractions:

1/(λx − εx²) = (1/λ)(1/x) + (ε/λ)/(λ − εx).

Then multiplying both sides of (1.18) by λ and integrating, we derive

ln x − ln(λ − εx) = λt + C,

where C is an arbitrary constant of integration. Exponentiation of this equation yields

x/(λ − εx) = C∗e^{λt},

where C∗ = e^C, and this relation may be solved for x(t):

x(t) = λC∗e^{λt}/(1 + εC∗e^{λt}). (1.19)

In Exercise 1(c) we ask the reader to verify that the above, rather formal, manipulations actually produce solutions of (1.2).
Regarding the IVP, we seek a value of C∗ in (1.19) that will satisfy the initial condition x(0) = b. The reader may check that for any b ≠ λ/ε, the initial condition is satisfied if and only if

C∗ = b/(λ − εb). (1.20)

It is interesting that (1.19) fails to provide a solution to the IVP precisely in the case when the original ODE has the trivial solution x(t) ≡ λ/ε. This behavior arises from one of the gaps in rigor in the above derivation: if x(t) ≡ λ/ε, then the term dx/(λ − εx) in the derivation is undefined and hence cannot be integrated. This behavior reminds us that solutions obtained using separability always need to be checked.
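Such a check is easily automated. The sketch below (ours; λ = 1, ε = 0.5, and b = 0.2 are example values) integrates (1.2) numerically and compares with (1.19)-(1.20):

# Logistic equation: numerical solution versus the separable-equation formula.
import numpy as np
from scipy.integrate import solve_ivp

lam, eps, b = 1.0, 0.5, 0.2
Cstar = b / (lam - eps * b)            # formula (1.20)
t = np.linspace(0.0, 10.0, 100)
sol = solve_ivp(lambda t, x: lam * x - eps * x**2, (0.0, 10.0), [b],
                t_eval=t, rtol=1e-10, atol=1e-12)
exact = lam * Cstar * np.exp(lam * t) / (1.0 + eps * Cstar * np.exp(lam * t))
print(np.max(np.abs(sol.y[0] - exact)))  # near the integration tolerance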
Note that, unless C∗ ≥ 0, the solution (1.19) is not defined for all t. This warns us that an IVP may have a solution only for a limited time. (We explore this issue in a more serious way in Chapter 3.)

Figure 1.2: Direction field for (1.2), with solution trajectories corresponding to several different choices of initial condition x(0).

Despite the above non-existence problem, (1.2) gives acceptable predictions regarding the future evolution of a population, for which of course x ≥ 0. Specifically, in Exercise 1(c) we ask the reader to show that if b ≥ 0, then the solution of the IVP (obtained from (1.19) if b ≠ λ/ε and from x(t) ≡ λ/ε if b = λ/ε) exists for all t ≥ 0. The reader may also verify that, no matter what the initial conditions, the solution x(t) tends to λ/ε as t → ∞, as may be anticipated from considering the direction field shown in Figure 1.2.
Here is another point illustrated by the logistic equation: If x(t) is a solution of (1.2) for some time interval, then for any constant t₀, the shifted function x̃(t) = x(t − t₀) is also a solution of (1.2) on the appropriate translated interval. Such translational invariance will occur whenever the governing equation is autonomous.
1.4 Examples of physically based ODEs
1.4.1 Mechanical systems
Figure 1.3: Schematic diagram of the mass-spring-dashpot system corresponding to (1.21): a mass m attached to a spring (restoring force F_spring = −kx) and a dashpot (friction force F_friction = −βx′), with the displacement x measured from equilibrium.

The motion of spring-mass systems provides invaluable insight into many phenomena involving ODEs. Such systems are governed by Newton's second law of motion,

mass × acceleration = sum of all forces,
or more compactly and more famously, F = ma. Consider for example the system illustrated in Figure 1.3. The mass is constrained to move along a single axis. If we let x measure the displacement of the mass from a reference position, then the acceleration is simply the second derivative d²x/dt². There are two forces acting on the mass: (i) the restoring force F_spring from the spring and (ii) friction F_friction. The restoring force opposes any displacement from the equilibrium position. The simplest assumption, called Hooke's law, is that the force is proportional to the displacement from equilibrium. If we measure displacements relative to the equilibrium position, we have the simple formula for the restoring force F_spring = −kx, where k is the constant of proportionality. Regarding friction, the simplest assumption is that this force opposes the motion of the mass with a strength proportional to its speed: in symbols, F_friction = −β dx/dt, where β ≥ 0. Truth to tell, this formula is a rather poor approximation of dry friction²; i.e., friction of a mass sliding over a dry surface. Despite this inaccuracy, friction is widely approximated by such a term because of the appealing fact that this leads to a linear ODE.
Combining the above forces in Newton's equation we get the ODE for the motion of the mass

mx″ = −βx′ − kx. (1.21)

Equation (1.14) is a special case of this equation³. As for (1.14), the general solution of (1.21) is a linear combination of exponentials. Substituting into the equation, we find that e^{λt} is a solution of (1.21) if

λ = (−β ± √(β² − 4mk))/(2m). (1.22)

² This formula is more typical of the drag from a viscous fluid at moderate velocities; see [?]. For that reason in Figure 1.3 we have represented friction by a dashpot: i.e., a piston sliding through a viscous fluid.
³ In fact, (1.21) can be reduced to (1.14) by scaling. Specifically, if we let τ = (√(k/m)) t and divide (1.21) by k, we obtain d²x/dτ² + β̄ dx/dτ + x = 0, where β̄ = β/√(mk). Mathematically, such scalings can greatly simplify an ODE, and physically, much insight can be gained by relating such scalings to the units of the parameters in the equation. We shall develop these ideas systematically in Appendix D.
This formula for the roots contains interesting information about how friction changes the behavior of the system. If there is no friction (i.e., β = 0), the roots (1.22) are pure imaginary and the solutions of (1.21) are trigonometric functions with (angular) frequency ω = √(k/m); i.e., oscillations continue forever. As β increases from zero, the solutions retain their oscillatory character, with some decrease in the frequency, but are confined within a decaying exponential envelope. This behavior continues as β increases, the decay becoming more rapid, until β² = 4mk. After this point both roots (1.22) are real, and the solution x(t) will cross x = 0 at most once in the course of its decay. One calls the cases β² < 4mk, β² = 4mk, and β² > 4mk underdamped, critically damped, and overdamped, respectively.
These ideas have a practical consequence in the automotive world. The shock absorbers of a car can be crudely modeled by (1.21). As the name suggests, one wants shock absorbers to have a lot of damping: i.e., to be overdamped. This gives rise to the following quick test for whether shock absorbers are worn out. Depress the car and release it from rest. If the car returns monotonically to equilibrium, then the shock absorbers are OK. If on the other hand, the car oscillates up and down in its return to equilibrium, then the shock absorbers need to be replaced.
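The three damping regimes are determined entirely by the sign of the discriminant in (1.22), as the following sketch illustrates (ours; the sample values of m, β, k are arbitrary, and the exact-equality branch is only meaningful for exact inputs):

# Classify m x'' + beta x' + k x = 0 by the discriminant beta^2 - 4mk.
import numpy as np

def classify(m, beta, k):
    disc = beta**2 - 4.0 * m * k
    print("roots:", np.roots([m, beta, k]))  # the exponents in (1.22)
    if disc < 0:
        return "underdamped"       # complex roots: decaying oscillation
    elif disc == 0:
        return "critically damped"
    return "overdamped"            # two real roots: monotone decay

print(classify(1.0, 0.5, 1.0))     # underdamped
print(classify(1.0, 3.0, 1.0))     # overdamped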
It is easy to imagine springs in which the restoring force is not exactly proportional to the displacement; indeed exact linearity is the unlikely behavior. For example, although the restoring force is gravity rather than a spring, consider a pendulum as illustrated in Figure 1.4. The mass is confined to move on a circle by a rigid (massless) arm, say of length ℓ, and its position is specified by a single coordinate, an angle rather than a displacement. If the angle x that the pendulum makes with the vertical is measured in radians, then ℓx equals the displacement of the mass along the circumference, ℓ dx/dt equals its velocity, and ℓ d²x/dt² equals its tangential acceleration. The tangential component of gravity is F_tang = −mg sin x. Thus, if there is no friction, Newton's equation of motion may be written

x″ + (g/ℓ) sin x = 0, (1.23)

where we have divided both terms by mℓ. This derivation explains the origin of (1.4), but our purpose here is to illustrate the deviation of this restoring force from linearity. If x is small, then sin x ≈ x, so in this range the force is approximately linear, but as x increases, the force falls behind this linear growth (see Figure 1.5). Incidentally, (1.4), along with elaborations that include other effects, is one of the examples we recall throughout this book to illustrate the theory.

Figure 1.4: Schematic diagram of the pendulum corresponding to (1.4). The tangential component of gravity is F_tang = −mg sin x.
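The deviation from linearity is easy to see numerically: for small swings the pendulum (1.4) and its linearization (1.3) stay close, while for large swings they drift apart. A brief sketch (ours; the initial angles 0.1 and 2.0 radians are example choices, and we take g/ℓ = 1):

# Pendulum x'' + sin x = 0 versus the linearized equation x'' + x = 0.
import numpy as np
from scipy.integrate import solve_ivp

t = np.linspace(0.0, 20.0, 500)
for x0 in (0.1, 2.0):
    pend = solve_ivp(lambda t, y: [y[1], -np.sin(y[0])],
                     (0.0, 20.0), [x0, 0.0], t_eval=t, rtol=1e-9)
    lin = solve_ivp(lambda t, y: [y[1], -y[0]],
                    (0.0, 20.0), [x0, 0.0], t_eval=t, rtol=1e-9)
    print(x0, np.max(np.abs(pend.y[0] - lin.y[0])))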
Similarly, it is also possible for the force to grow more rapidly than linearly. A more extreme deviation from Hooke's law is illustrated by the cantilever beam (i.e., supported at one end only) placed between two magnets as in Figure 1.6. If the beam bends so that its tip is displaced slightly to the right, the beam is closer to the magnet on the right and hence more strongly attracted to it than to the magnet on the left. If the magnetic forces dominate the bending resistance of the beam then, following a small displacement, the net force on the beam will be to the right: i.e., the beam is pulled away from the centerline rather than towards it. However, if the beam moves beyond the magnet to the right, then both magnetic forces and the bending resistance all pull the beam back towards the center line. Suppose we naively describe the bending of the beam by a single variable x, say the displacement of the tip. The simplest force law⁴ reproducing the above behavior is

F(x) = +k₁x − k₂x³, (1.24)

where both k₁, k₂ are positive. Despite its crudeness as a physical model, (1.24) is often used in applications, and mathematically it provides a useful illustrative example. We shall call the equation

x″ + βx′ − x + x³ = 0 (1.25)

Duffing's equation⁵. The force law (1.24) is called a double-well potential, which leads us to the concept of potential energy, which we now discuss.

⁴ See Exercise 11 for another system exhibiting qualitatively similar behavior.

Figure 1.5: Comparison of the tangential component F_tang = −mg sin x with its linear approximation F_lin = −mgx.

Figure 1.6: Schematic diagram of a cantilever beam between two magnets.
The potential energy V(x) associated with a force law F(x) is defined as the work that must be done against the force to move the mass from a reference position, typically equilibrium, to the position specified by x; in symbols,

V(x) = −∫₀ˣ F(s) ds. (1.26)

The potential functions for Hooke's law, for the pendulum, and for (1.24) are graphed in Figure 1.7; the figure explains the name double-well potential for (1.24). Of course (1.26) is equivalent, apart from an additive constant, to F(x) = −∂V/∂x (x), so the equation of motion of a particle moving in a force field⁶ with potential energy V is, assuming linear friction,

mx″ + βx′ + ∂V/∂x (x) = 0. (1.27)
Note that this equation is nonlinear except in the special case when V(x) is quadratic.
One may attempt to visualize solutions of (1.27) as the motion of a marble rolling in the x, z-plane along a curve given by the equation z = V(x). As discussed in the Notes at the end of the chapter, this analogy is quantitatively inaccurate, but it makes useful qualitative predictions nonetheless⁷: For example, a particle moving according to (1.25), which has double-well potential

V(x) = −(1/2)x² + (1/4)x⁴, (1.28)

will indeed come to rest at the bottom of one of the wells, just as a rolling marble would do.
One technique for proving the previous statement, which will be studied in Chapter 5, is based on energy considerations, and we now introduce this important concept. The total energy of a mass is the sum of its potential energy and its kinetic energy; in symbols,

E = (m/2)(x′)² + V(x). (1.29)

We ask the reader to compute that if x(t) satisfies (1.27) then energy is dissipated at the rate

dE/dt = −β(x′)². (1.30)

To interpret: if there is no friction then energy is conserved (i.e., remains constant as time evolves), while if β > 0 energy decreases at a rate given by (1.30).

⁵ Strictly speaking, (1.25) should be called Duffing's equation without forcing. Duffing's equation with forcing would include a nonzero inhomogeneous term f(t) on the RHS of (1.25). Some effects of forcing are studied in Exercise 6.
⁶ Although we introduced forces in connection with spring-mass systems, we want to consider more general force laws than can reasonably be associated with any spring. For that reason we adopt the language of a particle in a force field.
⁷ One is reminded of the aphorism, "A simple lie may be more useful than a complicated truth."

Figure 1.7: The potential functions for (a) Hooke's law, V = (1/2)kx²; (b) the pendulum, V = mgℓ(1 − cos x); and (c) the double-well potential, V = −(k₁/2)x² + (k₂/4)x⁴.
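The energy balance (1.30) can be checked along a computed trajectory. The sketch below (ours; Duffing's equation (1.25) with β = 0.3 and initial data (2, 0) are example choices) differentiates the energy (1.29) numerically and compares with −β(x′)²:

# Energy dissipation check for x'' + beta x' - x + x^3 = 0 (mass m = 1).
import numpy as np
from scipy.integrate import solve_ivp

beta = 0.3
V = lambda x: -0.5 * x**2 + 0.25 * x**4          # double-well potential (1.28)
rhs = lambda t, y: [y[1], -beta * y[1] + y[0] - y[0]**3]

sol = solve_ivp(rhs, (0.0, 30.0), [2.0, 0.0], dense_output=True, rtol=1e-10)
t = np.linspace(0.0, 30.0, 1000)
x, v = sol.sol(t)
E = 0.5 * v**2 + V(x)                            # total energy (1.29)
dEdt = np.gradient(E, t)                         # numerical derivative of E
print(np.max(np.abs(dEdt + beta * v**2)))        # small, up to finite-diff error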
1.4.2 Physical equations with nonmechanical origins
We have assumed β ≥ 0 in the discussion of spring-mass systems since friction normally dissipates energy. However, in certain electrical circuits what amounts to negative friction can arise over a limited region of state space. The most famous equation exhibiting such behavior is van der Pol's equation

x″ + β(x² − 1)x′ + x = 0, (1.31)

where x measures a voltage in a circuit. If x is small (specifically, |x| < 1), the coefficient of x′ is negative, leading to the increase of energy, while if x is large this coefficient has the usual positive sign. We will analyze this equation in detail in Chapter 6. Historically van der Pol's equation arose in modeling circuits with vacuum tubes, but of much greater current interest, it also arises in modeling semiconductor circuits [?].
By way of background, a linear equation of the form (1.21) arises from the description of an electrical circuit containing linear elements: an inductor (L), a resistor (R), and a capacitor (C). Specifically, as may be derived from Kirchhoff's laws [?], the voltage x(t) across the capacitor in Figure 1.8 at time t satisfies⁸

Lx″ + (L/(RC))x′ + (1/C)x = 0. (1.32)

The van der Pol equation arises if the linear resistor in the figure is replaced by an appropriate nonlinear element in which current depends non-monotonically on voltage. (It may appear that, with negative friction, energy is being created out of nowhere, but the derivation [?] explains how the system is consistent with conservation of energy.) Electrical circuits are perhaps further from everyday intuition than spring-mass systems, and we do not develop the theory here.
Below we shall also consider ODEs derived from other applications, as for example the predator-prey population model (1.33) below.

⁸ A circuit with the elements in series, rather than parallel, is probably more familiar to most readers. We consider the parallel circuit since this configuration leads to van der Pol's equation.
Figure 1.8: An inductor (L), a resistor (R), and a capacitor (C) in parallel; x denotes the voltage across the capacitor. The voltage x across any of the elements satisfies (1.32). If the linear resistor is replaced by a suitably-chosen nonlinear resistor, then x satisfies van der Pol's equation (1.31).
1.5 Systems of ODEs
All of the examples of ODEs considered above contained a single unknown function. It will be crucial to also study systems of ODEs (i.e., several simultaneous equations) involving several unknown functions.
Some physical or biological systems⁹ are most naturally modeled by systems of ODEs. One of the best known such systems is the Lotka-Volterra model

x′ = αx − βxy
y′ = δxy − γy, (1.33)

where all parameters are positive. Let us describe the physical assumptions underlying (1.33) since, in our view, such understanding is an essential part of acquiring facility with ODEs. This system describes the evolution of two interacting populations, a predator (say foxes, represented by y) and a prey (say rabbits, represented by x). In the absence of predators (i.e., y = 0), the prey population satisfies x′ = αx, the equation for exponential growth. However, their growth rate is reduced by predation, which is assumed to occur at a rate proportional to each population¹⁰.

⁹ There is an unfortunate conflict between different fields in the use of the word "system." Here we mean system in the biological sense, "a group of interacting, interrelated, or interdependent elements forming a complex whole," while later in this same sentence we mean system in its more restricted mathematical meaning, "several simultaneous equations."
¹⁰ For maximal realism, the underlying process should be modeled probabilistically. An ODE model provides a useful approximation for the evolution of average populations provided the populations are large. The rate term proportional to xy may be derived from the probability that members of the two species encounter one another. In chemical kinetics, the corresponding approximation is called the Law of Mass Action.

Similarly,
the predator equation for y represents a balance between two effects: the predator population is increased by a term proportional to the amount of food the predators consume, and their population is decreased by a death term proportional to their population. Remarkably, in the full equation for the evolution of y, these two effects are simply added! This is an instance of a very general phenomenon: when several effects occur in a physical system, typically the ODE describing its evolution is obtained simply by adding the contributions of each effect in the ODE. This is the source of the power of ODEs. Of course, although the equations may be simple to formulate, solving them is anything but simple, and that's what these notes are about.
Besides arising naturally, systems of ODEs also arise as a mathematical convenience. For example, we claim that van der Pol's equation (1.31) is equivalent to the 2 × 2 first-order system

y₁′ = y₂
y₂′ = −β(y₁² − 1)y₂ − y₁. (1.34)

To see this, suppose x(t) is a solution of (1.31). Then let y₁(t) = x(t) and y₂(t) = x′(t); it is easily seen that the two-component vector y(t) satisfies (1.34). Conversely, if (y₁(t), y₂(t)) satisfies (1.34), then a trivial calculation shows that x(t) = y₁(t) satisfies (1.31).
This construction is quite general. Specifically, the nth-order ODE (1.9) is equivalent to the n × n system for functions y₁(t), . . . , yₙ(t):

y₁′ = y₂
y₂′ = y₃
  ⋮
yₙ₋₁′ = yₙ
yₙ′ = f(y₁, y₂, . . . , yₙ, t). (1.35)

The proof of this statement is completely analogous to the above calculation with van der Pol's equation.
It turns out that it is more convenient to study first-order systems of ODEs than to study a single, higher-order equation. Among other reasons, this convenience derives from the geometric interpretation of ODEs: although (1.35) requires working in more dimensions, the equation may still be interpreted in a fashion analogous to Figure 1.1. Indeed, geometric language is fundamental to the advanced theory. The presentation of the theory is also simplified by using vector notation. Thus, for example, if we write y = (y₁, y₂, . . . , y_d), then (1.35) can be written compactly as y′ = F(y), where the vector-valued function F(y) has the components on the RHS of (1.35).

Figure 1.9: The vector field associated with (1.34) with β = 1. One sample solution trajectory is shown. Like all other non-equilibrium solutions, it converges to the periodic solution of (1.34).
Let us illustrate this geometric interpretation for van der Pol's equation (1.34), where the two-dimensional geometry¹¹ simplifies visualization. Figure 1.9 shows the vector field

F(y) = ( y₂, −β(y₁² − 1)y₂ − y₁ ) (1.36)

defined by the RHS of (1.34). A curve y(t), T₁ < t < T₂, is a solution of (1.34) iff for every t the tangent to the curve at the point y(t) equals F(y(t)). Despite the transparency of this interpretation, it is not at all easy to deduce global behavior of solutions from this local information. The curve shown in the figure is a typical solution of (1.34): any non-zero solution converges to a periodic trajectory, i.e., a solution for which there exists a time T > 0 such that y(t + T) = y(t) for all t. We invite the reader to use the software of Section 1.7 to verify this claim numerically. Considerable theory, which we will develop in Chapter 6, is needed in order to verify it analytically.
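Here is one way to carry out that numerical verification (our sketch; β = 1, the two initial conditions, and the empirical period T ≈ 6.66 are example values, not from the text):

# Integrate the van der Pol system (1.34) onto its limit cycle, then test
# approximate periodicity of the settled trajectory.
import numpy as np
from scipy.integrate import solve_ivp

beta = 1.0
F = lambda t, y: [y[1], -beta * (y[0]**2 - 1.0) * y[1] - y[0]]

for y0 in ([0.1, 0.0], [3.0, 3.0]):    # one start near zero, one far away
    sol = solve_ivp(F, (0.0, 100.0), y0, dense_output=True, rtol=1e-9)
    print(np.abs(sol.sol(100.0) - sol.sol(100.0 - 6.66)))  # nearly periodic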
Let us conclude this section with some terminology. A system y′ = F(y, t) is called linear if F has the special form F(y, t) = A(t)y + a(t), where A(t) and a(t) are matrix-valued and vector-valued functions of time, respectively. It is called homogeneous if a(t) ≡ 0. A system y′ = F(y, t), linear or nonlinear, is called autonomous if F is actually independent of t.

¹¹ The phrase phase plane is used to describe graphs like Figure 1.9 that show trajectories of a two-dimensional autonomous first-order system of ODEs.
Given an autonomous system y′ = F(y), say of dimension d, we call a point b∗ ∈ ℝᵈ an equilibrium of this system if F(b∗) = 0. In particular, the constant function y(t) ≡ b∗ is one solution of this system.
Regarding geometrical interpretation, let us distinguish between trajectory and orbit. Both terms refer to the curve traced out by a solution of an autonomous ODE, say

y′ = F(y). (1.37)

By trajectory, we mean the curve with parametrization by time,

t ↦ (y₁(t), . . . , yₙ(t)),

where y(t) satisfies (1.37). By contrast, orbit refers to the point set

{(y₁(t), . . . , yₙ(t)) : t ∈ (a, b)},

where we assume the solution exists for times with a < t < b. The orbit is independent of any specific parametrization of the curve; for example, in the case of a two-dimensional system, an orbit could be written as the level set of some function Φ(y) of two variables, say

{(y₁, y₂) : Φ(y₁, y₂) = const}. (1.38)
1.6 Topics covered in this book
1.6.1 General remarks
In a first course in ODEs the focus is on finding explicit formulas to represent solutions of equations. This is a fascinating subject that offers boundless opportunities for ingenuity; it would be an interesting digression to describe the application of such methods just to equations (1.2), (1.5), (1.6). However, the sad fact is that for most equations explicit solutions cannot be found. Approximate solutions, obtained either from numerical computations or asymptotic analysis, frequently provide an adequate substitute for explicit formulas. We will touch briefly on both kinds of approximate solutions, but neither they nor explicit solutions are the main focus in this book.
In Part I of this book (Chapters 2-4) we address the holy trinity of theoretical questions regarding the initial value problem:

- Existence of solutions (local in Section 3.2, global in Section 4.2),
- Uniqueness of solutions (Section 3.3), and
- Continuous dependence on the initial data (Section 4.4).
The first two phrases are probably self-explanatory; we shall wait till Chapter 4 to flesh out the third. Our treatment in these chapters is completely rigorous.
In Part II (Chapters 5-8) we develop, with less concern about rigor, the qualitative theory of ODEs, especially bifurcation theory. The qualitative theory studies what can be said analytically about solutions of ODEs in the absence of explicit formulas. A central question in the theory is to characterize the asymptotic behavior of solutions as t → ∞. In the next unit we illustrate typical answers to this question by considering the Lotka-Volterra equation (1.33) as well as some of its elaborations.
1.6.2 Qualitative behavior of some predator-prey models
The large-time behavior of solutions of the Lotka-Volterra model is easily described. Let us simplify the Lotka-Volterra equations to

x′ = x − xy
y′ = ε(xy − y), (1.39)

where ε is a positive constant; as we will show in Appendix D, (1.39) can be derived from (1.33) by scaling. One particular solution of (1.39) is the constant, equilibrium, solution x(t) ≡ 1, y(t) ≡ 1, which describes a steady balance between the two species. Every other solution in the open first quadrant¹² {x > 0, y > 0} circles this equilibrium point in a periodic fashion, as indicated in Figure 1.10. Indeed, the orbits are level sets of the function

L(x, y) = ε(x − ln x) + y − ln y, (1.40)

which has a global minimum at (1, 1). (In Exercise 5 we discuss how to derive this conclusion using the solution technique of separability.)
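Conservation of L along orbits is easy to test numerically; in the sketch below (ours; ε = 1 and the initial point (2, 0.5) are example choices) the computed values of L stay constant to within the integration tolerance:

# Check that L(x, y) in (1.40) is constant along solutions of (1.39).
import numpy as np
from scipy.integrate import solve_ivp

eps = 1.0
rhs = lambda t, z: [z[0] - z[0] * z[1], eps * (z[0] * z[1] - z[1])]
L = lambda x, y: eps * (x - np.log(x)) + y - np.log(y)

sol = solve_ivp(rhs, (0.0, 20.0), [2.0, 0.5],
                t_eval=np.linspace(0.0, 20.0, 400), rtol=1e-11, atol=1e-12)
vals = L(sol.y[0], sol.y[1])
print(vals.max() - vals.min())         # tiny: L is conserved along the orbit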
However, the Lotka-Volterra model is far too simplistic for realistic modeling. Acknowledging that fact, let us also examine the large-time behavior of two of the many modifications of it that have been studied. Specifically, we correct two unsatisfactory consequences of the linear growth rate in the prey-only equation x′ = x:

- Solutions of this equation grow indefinitely large as time evolves. As we saw in (1.2), a more realistic equation is x′ = x(1 − x/K), where K is the carrying capacity of the environment.
- No matter how small x(0) may be, the prey population never goes extinct. This defect may be corrected in an ad hoc manner by assuming the growth rate¹³ equals x(x − μ)/(x + μ). Then for x < μ the growth rate is negative; thus if x(0) < μ, the prey will die out. On the other hand, for large x the growth rate is close to x, as in the original equations.

¹² At the boundary of the first quadrant, there are solutions with y ≡ 0, in which case x grows exponentially, and solutions with x ≡ 0, in which case y decays exponentially.
¹³ A growth rate that depends on the population size is called the Allee effect.

Figure 1.10: Several solution curves of the Lotka-Volterra system (1.39). (Here we have chosen ε = 1.) All non-equilibrium solutions are periodic and encircle the equilibrium at (1, 1).
Inserting both of these modifications¹⁴ of the prey growth rate into the system (1.39) gives us the equations

(a) x′ = x · ((x − μ)/(x + μ)) · (1 − x/K) − xy
(b) y′ = ε(xy − y). (1.41)

If μ = 0 and K = ∞, then we obtain the unmodified Lotka-Volterra equations (1.39).
To begin the discussion of (1.41), let us find the equilibrium solutions of this system. From (1.41b), we find that ε(xy − y) = 0 if either y = 0 or x = 1. Substituting y = 0 into (1.41a) gives the first three equilibria listed in Part (a) of Table 1.1, and substituting x = 1 gives the fourth, (1, y∗), where

y∗ = (1 − 1/K)(1 − μ)/(1 + μ). (1.42)

In studying (1.41) we assume that

0 < μ < min{K, 1}; (1.43)

we want μ < K so that the carrying capacity exceeds the threshold for extinction, and when K > 1, we want μ < 1 so that the prey population at coexistence exceeds the threshold for extinction.

¹⁴ Although (1.41) is physically motivated, we are not claiming that it is a realistic model for population growth.

Part (a)

Equilibrium    Description
(0, 0)         Extinction
(μ, 0)         Extinction threshold
(K, 0)         Prey-only equilibrium
(1, y∗)        Co-existence equilibrium

Part (b)

Region   Characterizing inequalities          Generic long-time behavior
I        (1 + 2μ − μ²)/(2μ) < K and μ < 1     Converges to extinction
II       1 < K < (1 + 2μ − μ²)/(2μ)           Converges to extinction or the
                                              co-existence equilibrium
III      μ < K < 1                            Converges to extinction or the
                                              prey-only equilibrium

Table 1.1: Part (a): Equilibria of (1.41), the Lotka-Volterra system augmented by logistic growth and the Allee effect. At the co-existence equilibrium, y∗ is given by (1.42). Part (b): Generic long-term behavior of solutions of (1.41), depending on μ, K. Regions I, II, and III refer to Figure 1.11(a). ("Generic" is a rough synonym for "typical"; see Additional Notes, Section 1.9.)

In contrast to (1.39), solutions of the perturbed equation (1.41) are almost never periodic. Rather, as illustrated in Figure 1.11(b)-(d), solutions converge to one of the equilibria of (1.41) as t → ∞. In Figure 1.11(a) we have identified three regions in the subset of the μ, K-plane defined by (1.43). In Figure 1.11(b)-(d) we show typical trajectories that occur for μ, K in each of the three regions, and in Table 1.1 we describe their behavior as t → ∞ in words.
Note that the following surprising behavior is contained in the above summary. Imagine starting with μ, K in Region II and initial conditions such that the solution of (1.41) converges to the co-existence equilibrium. Now increase K so that (μ, K) moves into Region I. Although increasing the carrying capacity seems like it should promote the overall health of the system, this parameter change leads to a worse fate: total extinction!
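The three behaviors are easy to observe in simulation. The following sketch (ours; it borrows the parameter values ε = 1, μ = 1/5 and the carrying capacities of Figure 1.11, with one of the two initial conditions) integrates (1.41) for a K in each region:

# Long-time behavior of (1.41) for K in Regions I, II, III.
import numpy as np
from scipy.integrate import solve_ivp

eps, mu = 1.0, 0.2

def rhs(t, z, K):
    x, y = z
    return [x * ((x - mu) / (x + mu)) * (1.0 - x / K) - x * y,
            eps * (x * y - y)]

for K in (4.0, 2.0, 0.4):              # Regions I, II, III respectively
    sol = solve_ivp(rhs, (0.0, 200.0), [0.5, 0.5], args=(K,), rtol=1e-9)
    print("K =", K, "final state:", sol.y[:, -1].round(4))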
1.7 Software for numerical solution of the IVP
Previously, we mentioned that it is rarely possible to produce explicit solutions of initial value problems. This begs the question: how might we describe solutions of differential equations that are resistant to analytical techniques such as those in Section 1.3? Two of the most common approaches are

- the qualitative approach: Given an analytically intractable DE, produce a linear, constant-coefficient DE (Section 1.3.1) which has qualitatively similar dynamics to the original DE (at least locally¹⁵); and
- the numerical approach: Use computer software/algorithms to approximate the solution of an initial value problem over some time interval.

This section concerns the latter approach and, although numerical methods are not the central theme of this textbook, we want the reader to be aware of their importance in the study of DEs.
There is a myriad of software available for the numerical solution of IVPs, some commercial and some free. One free program we urge you to download and install is XPP, which was developed by Professor G. Bard Ermentrout of the University of Pittsburgh. The purpose of XPP is to numerically solve differential equations, difference equations, delay equations, functional equations, boundary value problems, and stochastic equations. Because XPP is bundled with another program called AUTO (a tool for exploring bifurcations; see Chapters 7 and 8), XPP is also known as XPPAUT.

We have developed a website for readers who wish to supplement our textbook with the XPP software: please visit
15 This approach will be developed in detail in Chapter 5.
Figure 1.11: Panel (a): Three regions in (ε, K)-parameter space for which solutions of (1.41) have different dynamical behavior. The boundaries of the three regions are formed by the curves K = ε, K = 1, and K = (1 + 2ε − ε²)/(2ε). Panels (b) through (d): Solution trajectories of (1.41) with ρ = 1, ε = 1/5, and three different carrying capacities K. Each panel shows trajectories corresponding to two choices of initial conditions: x(0) = y(0) = 0.5 and x(0) = y(0) = 2.0. Panel (b): With K = 4, both species go extinct. Growing oscillations in the prey population ultimately lead to the prey's extinction as a result of the Allee effect. Panel (c): With K = 2, the initial conditions determine whether both species go extinct or whether their populations experience transient oscillations en route to the coexistence equilibrium. Panel (d): If K = 2/5, the predators go extinct, but the prey may either go extinct or equilibrate to K depending upon the initial conditions. In Panels (c) and (d) the prey-only equilibrium at (K, 0) is indicated, but in Panel (b) this equilibrium lies outside the range of x in the figure.
http://www.math.duke.edu/jcain/book/main.html
for (i) instructions on downloading and installing XPP as well as a link to the official XPP website; (ii) XPP code that we have written to solve DEs that appear in this textbook (including exercises); and (iii) step-by-step instructions on how to use our XPP code to explore solutions of initial value problems. For example, we have included XPP code for solving some of the equations in this chapter, including the Riccati equation (1.6), the Duffing equation (1.25), the van der Pol equation (1.31), the modified Lotka-Volterra system (1.41), and the vibrated pendulum equation (1.51).
A few tips, disclosures, and disclaimers that you should be aware of, some of which are specific to XPP:

1. The instructions and code that we posted on the above website assume that the reader will run XPP under the Windows(R) operating system. If you install and run XPP under a different operating system, there may be slight discrepancies between our instructions and what you see on your computer screen. We'll rely on you to adapt and improvise as needed.

2. Be sure to try out our first two examples of XPP code under the "Chapter 1" link on the website (the Riccati and van der Pol equations).

3. You will find that most of the syntax in XPP is easy, but there are a few conventions and quirks that you should be aware of. Examples of conventions: DEs that are second-order or higher must be written as systems of first-order DEs, and the default name for the independent variable is t. Examples of quirks: XPP has a couple of default settings that cause it to stop computing if either (i) a variable ever exceeds 100 in magnitude or (ii) more than 5000 data points are generated. Dealing with such issues is very straightforward as long as you are aware that they exist, and we have compiled a list of advice under the "XPP Installation and Syntax" link on the above website.

4. Caveat emptor! When using any software for the numerical solution of initial value problems, you need to be careful. Just because software uses mathematical algorithms that are intended to approximate solutions does not guarantee that those approximations will be satisfactory. You may be alarmed to learn that numerical methods can perform poorly even for DEs that are not rigged for pathological behavior. In the Chapter 4 exercises, you will see that the simplest numerical method (Euler's method) for approximating the solution of an IVP can be an utter disaster if applied to the seemingly innocent problem x' = −Mx, x(0) = 1, where M is a large positive constant. Conversely, you should be relieved to know that numerical methods work beautifully for most IVPs, and are a wonderful tool for gaining intuition regarding a system's behavior. Even if an IVP does not admit an explicit solution, it is usually possible to tune a numerical method so as to generate an approximation of the true solution that is accurate to within a user-specified error tolerance. The trade-off is that the less error you are willing to accept, the longer it will take for the software to generate the approximate solution. Readers interested in these and other issues should consult texts in numerical analysis, such as [1].
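To make the warning in item 4 concrete, here is a small Python sketch (ours; the values of M, the step size h, and the horizon T are illustrative choices) showing forward Euler succeeding or failing on x' = −Mx depending on the step size.

import numpy as np

# Forward Euler for x' = -M x, x(0) = 1; exact solution is exp(-M t).
# Each step multiplies x by (1 - M h), so the method is stable only
# when h < 2/M, however smooth the true solution may be.
def euler(M, h, T):
    x = 1.0
    for _ in range(int(round(T / h))):
        x += h * (-M * x)          # one Euler step
    return x

M, T = 50.0, 1.0
for h in (0.1, 0.05, 0.01):
    print(f"h = {h}: Euler gives {euler(M, h, T):.3e}, "
          f"exact is {np.exp(-M * T):.3e}")

With h = 0.1 the computed "solution" explodes to around 10^6 even though the true solution is essentially zero; with h = 0.01 the method is stable.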
1.8 Exercises

1.8.1 Exercises to consolidate your understanding

1. Supply details omitted in the text.

(a) Show that every solution of x' = −x is of the form Ce^{−t} for some constant C. (Hint: Show that, for a solution x(t), the derivative of e^t x is zero.)

(b) Verify that formula (1.16) solves the first-order linear equation (1.15).

(c) Prove the following claims about the logistic equation made in the text.
- Check that (1.19) satisfies (1.2).
- Verify that the choice (1.20) satisfies the initial condition x(0) = b.
- Show that the logistic equation has a solution for all positive time if the initial datum b is positive.

(d) Derive (1.30), the equation for energy dissipation in (1.27).
2. Construct ODEs with the following properties. The ODEs should be in the standard form where the highest derivative has been solved for.

(a) A third-order scalar ODE that is nonlinear and nonautonomous.

(b) A fifth-order, linear, homogeneous scalar equation with constant coefficients such that every solution tends to zero as t → ∞. (Hint: Start by making up a fifth-order polynomial with all its roots in the LHP.)

(c) A nonlinear autonomous three-dimensional system.^16

3. Find solutions for the following equations or IVPs using separability. (See also Exercise 5 below.)
16 Perhaps the most famous such system is the Lorenz equations, (7.5).
(a) The Gompertz model for tumor growth, in which the center is starved for oxygen (see p. 217 of Edelstein-Keshet [?]):

    dN/dt = λe^{−αt} N.

(b) The logistic equation with constant harvesting:

    x' = x(1 − x) − μ,

where μ is a positive constant.

Discussion: The cases 0 < μ < 1/4, μ = 1/4, and 1/4 < μ must be treated separately. Think about the equilibrium equation x(1 − x) − μ = 0 to understand why the behavior of this equation changes at μ = 1/4. In this equation it is assumed that constant harvesting continues even as x → 0, which of course is unsustainable. This faulty assumption is related to the fact that this equation can predict negative populations.
(c) The pedagogical example, (1.7), manipulated into standard form:

    x' = √(x² − 1),   x(0) = 1.                             (1.44)

Discussion: You will most likely find the solution x(t) = cosh t. However, x(t) ≡ 1 is also a solution! In Chapter 3 we will give conditions that guarantee that the initial-value problem has a unique solution. In the meantime, you may want to ponder what misbehavior of √(x² − 1) leads to this nonuniqueness.

(d) x' = −1/x, x(0) = 1.

Discussion: The RHS of this equation is singular at x = 0. Be alert to what behavior results from this singularity.
(e) A system that will be used as an illustration in later chapters:

    x' = x − y − (x² + y²)x,   x(0) = b_1,
    y' = x + y − (x² + y²)y,   y(0) = b_2.

Hint: Solving this system would be hopeless except for the fact that it may be rewritten in polar coordinates,

    r' = r(1 − r²),   θ' = 1,

in which the two equations are uncoupled.
4. (a) Show that if D_1, D_2 ∈ C, then

    x(t) = D_1 e^{it} + D_2 e^{−it}                         (1.45)

is a complex-valued solution of (1.3). Also show that if D_1 = D̄_2, where the bar indicates complex conjugation, then (1.45) is real-valued.

(b) Show that for any solution x(t) of the form (1.12), there exist real constants C, φ such that

    x(t) = C sin(t + φ),                                    (1.46)

and conversely.

Remark: This exercise illustrates that other representations of solutions of (1.3) are possible.
1.8.2 Exercises referenced elsewhere in this book

5. In this exercise the reader develops evidence that nonconstant solutions of the Lotka-Volterra equations are periodic.

(a) Although it is not possible to solve the Lotka-Volterra equations for x and y as functions of t, it is possible to eliminate time and derive an implicit relation between x and y for the orbits. We derive an ODE for y as a function of x by the chain rule:

    dy/dx = (dy/dt)/(dx/dt) = ρ(xy − y)/(x − xy),           (1.47)

where we have substituted (1.39) for the second equality. Let L(x, y) be defined by (1.40). Derive

    L(x, y) = const                                         (1.48)

as an implicit solution of (1.47), preferably working directly from (1.47) using the fact that this equation is separable, or alternatively simply by differentiating (1.40).

(b) Verify that the level sets of L(x, y) are closed curves.

Discussion: Combining (a) and (b), we see that the trajectories of (1.39) are contained in closed curves. To complete the proof that every nonconstant trajectory is periodic, we would have to rule out the possibility that a trajectory might not complete the circuit around the closed curve. Although this is not beyond our present capabilities, such arguments will be much easier after we have developed more theory, so we leave this gap open for the time being. (Cf. Chapter 6.)
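As a numerical sanity check of (1.48): equation (1.40) is not reproduced on this page, but assuming L has the familiar Lotka-Volterra form L(x, y) = ρ(x − ln x) + (y − ln y) (an assumption on our part), the Python sketch below integrates (1.39) and confirms that L stays constant to within the integration tolerance.

import numpy as np
from scipy.integrate import solve_ivp

# Sketch: check that L(x, y) = rho*(x - ln x) + (y - ln y), our assumed
# form of (1.40), is conserved along solutions of (1.39).
rho = 1.0
L = lambda x, y: rho * (x - np.log(x)) + (y - np.log(y))

def lotka_volterra(t, u):
    x, y = u
    return [x - x * y, rho * (x * y - y)]

sol = solve_ivp(lotka_volterra, (0, 50), [0.5, 0.5],
                rtol=1e-10, atol=1e-12)
values = L(sol.y[0], sol.y[1])
print("variation in L along the orbit:", values.max() - values.min())

A tiny variation (limited only by the solver tolerance) is consistent with the orbit lying on a single level set of L.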
6. (a) Consider an inhomogeneous, linear scalar ODE of order n, say

    x^{(n)} + a_1(t)x^{(n−1)} + a_2(t)x^{(n−2)} + ... + a_{n−1}(t)x' + a_n(t)x = f(t).   (1.49)

Let x_partic(t) be some solution of (1.49). (Such a solution is called a particular solution, which provides the mnemonic for the subscript.) Show that any solution x(t) of (1.49) can be written in the form

    x(t) = x_partic(t) + x_homog(t),

where x_homog(t) satisfies the homogeneous equation: i.e., (1.49) with the inhomogeneous term f(t) set equal to zero.

Remark: This idea (particular solution plus homogeneous solution) is taught in all elementary courses on ODE.

(b) Consider periodic forcing of a spring-mass system:

    mx'' + βx' + kx = C cos ωt.                             (1.50)

Find a particular solution of this equation by looking for a solution in the form x_partic(t) = A cos ωt + B sin ωt.

(c) Show that, provided β > 0, every solution of (1.50) tends to x_partic(t) as t → ∞.

(d) Graph the amplitude √(A² + B²) in x_partic(t) as a function of ω, both for large and small β. Note that in the latter case the amplitude has quite a large spike if ω is close to the frequency √(k/m) of the undamped oscillator.

Remark: This is our first encounter with the phenomenon of resonance.
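One quick way to explore part (d): substituting x_partic into (1.50) gives the closed-form amplitude √(A² + B²) = C/√((k − mω²)² + (βω)²). The Python sketch below (ours, using the reconstructed symbols m, β, k, C, ω) plots this formula for large and small damping.

import numpy as np
import matplotlib.pyplot as plt

# Sketch for Exercise 6(d): amplitude of the forced response of (1.50),
# amplitude(w) = C / sqrt((k - m w^2)^2 + (beta w)^2).
m, k, C = 1.0, 1.0, 1.0
w = np.linspace(0.01, 3, 500)
for beta in (1.0, 0.1):                       # large vs small damping
    amp = C / np.sqrt((k - m * w**2)**2 + (beta * w)**2)
    plt.plot(w, amp, label=f"beta = {beta}")
plt.axvline(np.sqrt(k / m), ls="--", lw=0.5)  # undamped frequency
plt.xlabel("forcing frequency")
plt.ylabel("amplitude")
plt.legend()
plt.show()

For β = 0.1 the curve shows the sharp resonance spike near ω = √(k/m) described in the exercise.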
7. Consider the two-dimensional system

    εx' = 1 − x,
    y' = x − y,

where ε is a small positive parameter, subject to initial conditions x(0) = a, y(0) = b.

(a) Note that the x-equation does not involve y. Use this observation to solve the above initial-value problem, treating the x-equation and the y-equation sequentially.

(b) Given that ε ≪ 1, it is tempting to consider an approximation setting ε = 0. This approximation has the alarming effect of transforming the differential equation εx' = 1 − x into a purely algebraic equation, x = 1. Ignoring the warning bells that such a violent approximation sets off, nevertheless substitute x ≡ 1 into the y-equation and solve the IVP

    y' = x − y,   y(0) = b.

Discussion: Observe that, apart from an initial transient during which x tends rapidly to 1, the two solutions closely track one another. This is the first instance of a theme that appears frequently in applied math. The original system has two widely separated time scales: the x-equation tends to an equilibrium in a short time on the order of ε, while the y-equation evolves more slowly. The approximation is to let the rapid variable proceed to equilibrium, which results in a simpler problem for the remaining variable(s). This is an exceedingly useful approximation in many contexts, but caution is required in using it.
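A sketch (ours) comparing the full and reduced solutions numerically; the values of ε, a, and b are our illustrative choices.

import numpy as np
from scipy.integrate import solve_ivp

# Sketch for Exercise 7: compare the full system (eps*x' = 1 - x,
# y' = x - y) with the reduced problem obtained by setting x = 1.
eps, a, b = 0.01, 0.0, 0.0

def full(t, u):
    x, y = u
    return [(1 - x) / eps, x - y]

sol = solve_ivp(full, (0, 5), [a, b], rtol=1e-9, atol=1e-11)
y_reduced = 1 + (b - 1) * np.exp(-sol.t)   # solves y' = 1 - y, y(0) = b
print("max |y_full - y_reduced| =", np.abs(sol.y[1] - y_reduced).max())

The discrepancy is of order ε, apart from the brief transient near t = 0 during which x climbs to 1.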
1.8.3 Computational Exercises

In addition to the explicit exercises below, we invite the reader to use the software to check any statements made in the text. For example, although we prove that every solution of (1.39) (in the first quadrant) is periodic, it may be reassuring to see this behavior in computed solutions. Incidentally, Exercise 13 also has a computational component.

8. Compare numerical solutions of the logistic equation (1.2) with the analytical solution (1.19).

Discussion: This exercise is more for practice in using the software than for any interesting math. One particular lesson is to see how the software behaves in case the solution (1.19) blows up, which happens for some positive time if the initial datum b is negative. The blowup may be seen in better detail if one plots y, or rather |y|, on a log scale.
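A sketch for this exercise (ours), assuming (1.2) is the logistic equation x' = x(1 − x) and that (1.19) is its standard solution x(t) = b/(b + (1 − b)e^{−t}); both identifications are our assumptions.

import numpy as np
from scipy.integrate import solve_ivp

# Sketch for Exercise 8: a negative initial datum makes the exact
# solution blow up when the denominator b + (1 - b) e^{-t} vanishes.
b = -0.1
t_blow = np.log((1 - b) / (-b))            # about 2.40 for b = -0.1
print("predicted blow-up time:", t_blow)

sol = solve_ivp(lambda t, x: x * (1 - x), (0, 3), [b],
                rtol=1e-8, atol=1e-10)
print("solver stopped at t =", sol.t[-1], "with x =", sol.y[0, -1])
# Plotting log10(|x|) against t makes the blowup easier to inspect.

The solver stalls as it approaches the predicted blow-up time, which is exactly the behavior the discussion above asks you to observe.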
9. This exercise is intended to show that typical solutions of the van der Pol equation (1.34) tend to periodic behavior as t → ∞.

(a) Set the parameter in (1.34) equal to 1 and solve the initial-value problem for several choices of initial conditions. As long as b ≠ 0, you will see the periodic solution in Figure 1.9 emerge.

(b) Choose some other values of this parameter and repeat the above computation. You will again see periodic behavior, but the exact orbit depends on the parameter.

Figure 1.12: Schematic of the vertically-vibrated pendulum of Exercise 11 (the support point oscillates vertically with height A cos ωt).
10. For the augmented equation (1.41), verify the behavior claimed in Figure 1.11.

11. Discussion: Consider a pendulum whose supporting pin is vibrated vertically (see Figure 1.12). The next computation demonstrates the amazing fact that rapid vibration of the pin can make the straight-up position of the pendulum stable! If the height of the pin is A cos ωt and if friction is small, then the displacement x of the pendulum approximately satisfies an equation of the form

    x'' + βx' + [1 + λω² cos ωt] sin x = 0,                 (1.51)

where λ is proportional to A, the amplitude of the vibrations. This differs from (1.4) by two terms: βx', which models friction, and the term proportional to the acceleration of the pin, Aω² cos ωt.
(a) Write equation (1.51) as a first-order system.

(b) Start with the pendulum at rest and nearly vertical, say x(0) = 3.1, and let λ = 0.1. Solve the equations for various ω's, say starting from ω = 1 and increasing it repeatedly. If ω > ??, the pendulum will come to rest in the straight-up position!
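A Python sketch of the computation (ours): the printed values of the friction and vibration parameters are ambiguous in our source, so the sketch assumes β = 0.1 and λ = 0.1; with these choices, stabilization sets in once ω is sufficiently large.

import numpy as np
from scipy.integrate import solve_ivp

# Sketch for Exercise 11, using the reconstructed symbols of (1.51);
# beta = 0.1 and lam = 0.1 are our assumed parameter values.
beta, lam = 0.1, 0.1

def pendulum(t, u, omega):
    x, v = u                        # first-order system for (1.51)
    return [v,
            -beta * v - (1 + lam * omega**2 * np.cos(omega * t)) * np.sin(x)]

for omega in (1, 5, 10, 20, 40):
    sol = solve_ivp(pendulum, (0, 500), [3.1, 0.0], args=(omega,),
                    rtol=1e-8, atol=1e-10, max_step=0.01)
    # distance of the final angle from the straight-up position x = pi
    gap = abs((sol.y[0, -1] % (2 * np.pi)) - np.pi)
    print(f"omega = {omega}: |x(500) - pi| = {gap:.3f}")

For small ω the pendulum falls over and settles near the hanging position, while for large ω the gap from π shrinks: the inverted state has become stable.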
1.8.4 Exercises of independent interest

12. A point b is called an equilibrium of x' = f(x) if f(b) = 0. In such a case x(t) ≡ b is a solution of the equation. Considering only scalar equations, make an educated guess (don't bother with a proof) as to which of the following two statements is associated with f'(b) > 0 and which with f'(b) < 0:

(a) If the initial datum x(0) is sufficiently close to b, then x(t) tends to b as t → ∞.

(b) No matter how close the initial datum x(0) may be to b, the solution x(t) moves further away from b as t increases.

Remark: This problem anticipates the concept of stability in Chapter 5.
13. (a) Verify numerically the behavior conjectured in Section 1.2.2 for the Riccati equation, (1.6). Specifically,

- Compute that for negative and small positive values of b the solution asymptotes to the parabola x² − t = 0.
- Compute that for large values of b the solution appears to blow up in finite time.
- Locate the initial datum b* that separates the two behaviors.

Discussion: In the remainder of this exercise we attempt to obtain analytical information about the asymptotic behavior as t → ∞ of solutions of (1.6).

(b) Show that there are formal series solutions of this equation,

    x_+(t) = √t + a_0/t + a_1/t^{5/2} + ...   and   x_−(t) = −√t + b_0/t + b_1/t^{5/2} + ...,

series in inverse powers of t^{3/2}.

Discussion: When we say "formal series", we are allowing the possibility that the series may not converge. Thus to show that a formal series solution exists, you need only derive a recursion relation for successive coefficients in the series. You should also calculate the first few coefficients in each series. The series x_+ separates the two asymptotic behaviors of solutions of (1.6); x_− characterizes the asymptotic behavior of all solutions that remain in the lower half-plane x < 0 as t → ∞.

(c) Since the series are based on inverse powers, they are useful in the limit t → ∞. Compare, for large t, say t > 10, the sum of the first few terms of these series with numerical solutions of (1.6).
1.9 Additional notes
1.9.1 Miscellaneous
There is a general construction to reformulate a nonautonomous system of ODEs in d dimensions, say

    y' = F(y, t),

where y(t) = (y_1(t), ..., y_d(t)) and F : R^d × R → R^d, as an autonomous system in one higher dimension. Form a new unknown z(t) = (z_1(t), ..., z_d(t), z_{d+1}(t)), a (d+1)-dimensional vector, by appending an additional variable. Using the notation

    z(t) = (ẑ(t), z_{d+1}(t)),   where ẑ(t) = (z_1(t), ..., z_d(t)),

we require that z(t) satisfy

    z' = G(z),

where G : R^{d+1} → R^{d+1} is defined by

    G(z) = ( F(ẑ(t), z_{d+1}(t)), 1 ).

We may connect the two systems by observing that z_{d+1}(t) is essentially equivalent to time. To see this, it may be helpful to write out both equations in components.
In Section 1.3 we suggested that one might attempt to visualize solutions of (1.27), a particle moving in a potential V(x), as the motion of a marble rolling in the x, z-plane along a curve given by the equation z = V(x). Quantitatively this analogy fails badly. In the first place, rolling introduces a whole new level of complexity: one needs to distinguish between rolling with and without slipping, which requires examining friction between the marble and the surface. Suppose we completely ignore rolling; perhaps solutions of (1.27) are analogous to a particle sliding (with minimal friction) in the x, z-plane along a curve given by the equation z = V(x). However, this analogy is also flawed, even if one ignores the possibility of the marble moving so rapidly that it lifts off the curve. Specifically, for sliding along a curve, motion in the z-direction influences the x-component. Precise equations for sliding along a curve are most easily derived with the Lagrangian formulation of mechanics [?].

Equation (1.41) is a system of ODEs that contains several parameters, and the behavior of solutions depends on the parameters. Situations of this type will arise frequently in this book.
1.9.2 The concept "generic"

Let us explore the meaning of the term "generic" used in Table 1.1. This is our first encounter with this useful, but perhaps over-used, concept. Although the term is a rough synonym for "typical", it carries a lot of mathematical associations that can be learned only by exposure over time. Let's get started now.

Not every solution of (1.41) has long-time behavior as listed in the table. For example, no matter what ε, K may be, the threshold equilibrium solution (x(t), y(t)) ≡ (ε, 0) does not move away from its initial condition (x(0), y(0)); in particular it does not converge to any of the equilibria listed in Part (b) of Table 1.1. On the other hand, with the slightest perturbation of the initial conditions from (ε, 0), the solution will evolve in time and (probably) converge to some other equilibrium, which one depending on the perturbation. Robustness is one association of "generic"; thus, we dismiss the equilibrium solution (x(t), y(t)) ≡ (ε, 0) as non-generic.

Here is a more interesting example of a non-generic solution. (Although we describe it in words, the message is more vivid if you check what we say with your own computations.) In Figure 1.11(b) imagine a one-parameter family of initial conditions lying on the line y = 0.5, say (x(0), y(0)) = (b, 0.5) where 0 < b < 1. On the one hand, if b is close to 0, the solution will converge to extinction, like the upper trajectory in the figure. On the other hand, if b is close to 1, the solution will converge to the prey-only equilibrium, like the lower trajectory in the figure. By continuity, somewhere in between these extremes is an initial condition (b*, 0.5) that separates these behaviors. As one might guess, the solution with initial condition (b*, 0.5) in fact converges to the threshold equilibrium, (ε, 0), as t → ∞. However, we again dismiss this as non-generic since perturbing x(0) to either side of b* leads to qualitatively different behavior.

Yet another example of generic/non-generic behavior is contained in Figure 1.1. Generically, solutions of the Riccati equation (1.6) either blow up in finite time or asymptote to the parabola x = −√t. The dividing case with x(0) = b* is a non-generic solution.

In the preceding paragraphs we have spoken of non-generic solutions of one specific equation. One may also speak of a non-generic equation. Indeed, this term describes the Lotka-Volterra system (1.39) perfectly: every non-constant solution of this system is periodic, but an arbitrarily small perturbation of the equation, such as (1.41) with ε ≪ 1 and K ≫ 1, can completely change the phase plane.

Stay tuned for more occurrences of this concept, but not till Chapter 5.
Chapter 2

Linear systems with constant coefficients

2.1 Preview

The bulk of this chapter is devoted to homogeneous linear systems of ODEs with real constant coefficients. Such a system may be written

    x_1' = a_{11}x_1 + a_{12}x_2 + ... + a_{1d}x_d
    x_2' = a_{21}x_1 + a_{22}x_2 + ... + a_{2d}x_d
     ...
    x_d' = a_{d1}x_1 + a_{d2}x_2 + ... + a_{dd}x_d.         (2.1)

(From now on we shall use d for the dimension of our systems so that the index n is available for other uses.) The written-out system (2.1) is awkward to read or write, and we shall normally use the vastly more compact linear-algebra notation

    x' = Ax,                                                (2.2)

where x = (x_1, x_2, ..., x_d) is a d-dimensional vector of unknown functions, A is a d × d matrix with real entries, and matrix multiplication is understood in writing Ax. In vector notation, an initial condition for (2.2) is

    x(0) = b,                                               (2.3)

where b ∈ R^d.
In Section 2.2 we show that the initial-value problem (2.2), (2.3) has the unique solution given by the formula

    x(t) = e^{At} b,                                        (2.4)

where the exponential of a matrix is defined in complete analogy with the exponential of a scalar,

    e^{At} = I + At + (1/2!)(At)² + (1/3!)(At)³ + ... .     (2.5)

Our first task, the goal of Section 2.2, is to show that the series (2.5) converges and to establish the basic properties of the matrix exponential. Note that each term in the series is a square matrix and hence, if it converges, the sum is a d × d matrix; thus the RHS of (2.4) is dimensionally consistent as a matrix product. Also note that, unlike in dimension one, the two factors in (2.4) must be written in the order in which they appear.

It turns out that using the series (2.5) is rarely the most convenient way to compute e^{At}. In Section 2.3 we discuss how to compute the exponential of a matrix by finding its eigenvalues and eigenvectors. Some linear-algebra background for this is reviewed in Appendix C.

The following simple calculation motivates the appearance of the eigenvalue problem in solving linear systems. We ask whether, in analogy with a scalar linear ODE, there might be some solutions of the vector equation (2.2) of the form e^{λt} times a constant. Of course the constant would have to be a vector in order to have a function of the appropriate dimension. Thus we refine our question to: are there any scalars λ and any vectors v ∈ R^d such that

    x(t) = e^{λt} v                                         (2.6)

is a solution of (2.2)? Making this substitution, we calculate for the two sides of (2.2):

    x'(t) = λe^{λt} v,   Ax(t) = e^{λt} Av.

The two sides of (2.2) are equal iff

    Av = λv,

where we have canceled the exponential factor, which is nonzero. In other words, (2.6) is a solution of (2.2) if and only if v is an eigenvector of A with eigenvalue λ.

In the two final sections of the chapter, we discuss the asymptotic behavior of solutions of (2.2) as t → ∞, and we give formulas for solving an inhomogeneous equation.
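Although Section 2.3 develops better methods, the series (2.5) is easy to test numerically. The following Python sketch (ours; the matrix A and time t are arbitrary choices) compares partial sums of (2.5) with SciPy's built-in matrix exponential.

import numpy as np
from scipy.linalg import expm

# Sketch: approximate e^{At} by partial sums of the series (2.5)
# and compare with a library computation.
A = np.array([[0.0, 1.0], [-2.0, -3.0]])
t = 1.5

term = np.eye(2)                 # the n = 0 term of (2.5)
partial = np.eye(2)
for n in range(1, 30):
    term = term @ (A * t) / n    # (At)^n / n! from the previous term
    partial += term

print("max entry difference:", np.abs(partial - expm(A * t)).max())

Thirty terms already agree with expm to near machine precision for this small matrix, illustrating the absolute convergence proved below.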
2.2 Definition and properties of the matrix exponential

2.2.1 Preliminaries about norms

To proceed, we need to define what it means for a series of matrices like (2.5) to converge. We could define convergence of a series of matrices in terms of the convergence of each entry, considered as a sequence of real numbers. However, proofs are simpler if we introduce a metric on the set of matrices and use it to define convergence. The metric is based on how matrices act when multiplying vectors, so we begin with some concepts related to vectors.

For any vector x ∈ R^d, the d-dimensional generalization of the Pythagorean Theorem suggests that we define the length of x, written |x|, by

    |x| = ( Σ_{j=1}^d x_j² )^{1/2}.                         (2.7)

In analysis it is more common to call the expression |x| the norm of x rather than its length, and we shall follow that usage. Note that the norm may be expressed as

    |x| = ⟨x, x⟩^{1/2},                                     (2.8)

where ⟨· , ·⟩ denotes the usual inner product on R^d,

    ⟨x, y⟩ = Σ_{j=1}^d x_j y_j.                             (2.9)

In two and three dimensions it is known that

    ⟨x, y⟩ = |x| |y| cos θ,                                 (2.10)

where θ is the angle between x and y. In d dimensions, (2.10) is used to define the angle between two vectors. The following lemma, the Cauchy-Schwarz inequality, supports this definition by guaranteeing that cos θ computed from (2.10) is at most unity in absolute value.

Lemma 2.2.1. For any vectors x, y ∈ R^d,

    |⟨x, y⟩| ≤ |x| |y|.

Proof. If y = 0, both sides of the inequality vanish and the result is trivial, so we assume y ≠ 0. Consider choosing a constant c to minimize

    |x + cy|² = |x|² + 2c⟨x, y⟩ + c²|y|².                   (2.11)
Figure 2.1: Illustrating the triangle inequality.

To find the minimum, we differentiate (2.11) with respect to c, set the derivative equal to zero, and solve the resulting trivial linear equation for c, obtaining

    c* = −⟨x, y⟩/|y|².

On substituting into (2.11) and combining terms, we find

    |x + c*y|² = |x|² − ⟨x, y⟩²/|y|².

Since |x + c*y|² ≥ 0, the lemma follows.

In the next lemma we collect several simple but fundamental properties of the norm function.

Lemma 2.2.2. For any vectors x, y ∈ R^d and scalar c ∈ R,
(i) |x| ≥ 0, and |x| = 0 iff x = 0.
(ii) |cx| = |c| |x|.
(iii) |x + y| ≤ |x| + |y|.

Inequality (iii) is called the triangle inequality, for reasons suggested by Figure 2.1.

Proof. We leave the derivation of properties (i) and (ii) to the reader; we merely call the reader's attention to the fact that the vertical bars in |c| and in |x| (the absolute value of a scalar and the norm of a vector) are subtly different. Regarding (iii), observe that

    |x + y|² = |x|² + 2⟨x, y⟩ + |y|².
Using Lemma 2.2.1 to bound the middle term, we compute that

    |x + y|² ≤ |x|² + 2|x| |y| + |y|² = (|x| + |y|)².

The result follows on taking a square root.

The norm of a vector specifies its size. The analogous quantity measuring size for matrices, written with double bars ‖A‖ but also called the norm, is defined in terms of the operation of a matrix on vectors; specifically, if A is a d_1 × d_2 matrix, we define

    ‖A‖ = max_{|x| ≤ 1} |Ax|.                               (2.12)

In this expression, for Ax to be defined, x must be a d_2-dimensional vector, while Ax has dimension d_1. Thus if d_1 ≠ d_2, |x| and |Ax| are computed with respect to different spaces, even though the notation does not indicate this explicitly. In Exercise 1 we ask the reader to justify that the maximum in (2.12) actually exists. This follows from compactness, a topic covered in real analysis courses and discussed below in Appendix B. Even if these ideas may seem rather abstract when first encountered in a theoretical course, we hope that seeing them used in specific applications will demystify them. In any case, the reader needs to become comfortable with them.

Let us collect useful properties of the matrix norm.

Lemma 2.2.3. For any matrices A, B of appropriate dimensions, for any vector x ∈ R^d, and for any scalar c ∈ R,
(i) ‖A‖ ≥ 0, and ‖A‖ = 0 iff A = 0.
(ii) ‖cA‖ = |c| ‖A‖.
(iii) ‖A + B‖ ≤ ‖A‖ + ‖B‖.
(iv) |Ax| ≤ ‖A‖ |x|.
(v) ‖AB‖ ≤ ‖A‖ ‖B‖.
(vi) ‖A²‖ ≤ ‖A‖².

These properties are of vital importance, and in the Exercises we ask you to verify them. Although this task probably seems less than exciting, we urge the reader not to skip over it lightly because it helps develop proficiency with the use of norms.

The next result relates the norm of a vector or matrix to information about the size of its entries.

Lemma 2.2.4. If x ∈ R^d, then

    max_{1≤j≤d} |x_j| ≤ |x| ≤ Σ_{j=1}^d |x_j|.
If A is a d_1 × d_2 matrix with entries a_{jk}, then

    max_{1≤j≤d_1} max_{1≤k≤d_2} |a_{jk}| ≤ ‖A‖ ≤ Σ_{j=1}^{d_1} Σ_{k=1}^{d_2} |a_{jk}|.

We refer the reader to Exercise 1 for hints on how to prove the matrix part of this lemma.

Note that properties (i)-(iii) in Lemmas 2.2.2 and 2.2.3 are the same. Indeed, any function ‖·‖ from a vector space to the non-negative reals satisfying these three properties is called a norm. Given a norm, one may define the distance between two vectors in the space.^1 Thus, for vectors in R^d and for matrices we define

    dist(x, y) = |x − y|,   dist(A, B) = ‖A − B‖,           (2.13)

respectively.
2.2.2 Convergence

Given the notion of distance, one may define convergence. Specifically, for matrices, we say that a sequence A_n of matrices converges to a limit L if

    lim_{n→∞} ‖A_n − L‖ = 0.

In the usual way we say that an infinite series of matrices Σ_0^∞ A_n converges if the sequence of partial sums Σ_0^N A_n converges as N → ∞. More carefully, Σ_0^∞ A_n converges to a limit matrix L if, for every ε > 0 there is an integer^2 N_0 such that

    N > N_0  implies  ‖ Σ_{n=0}^N A_n − L ‖ < ε.

Lemma 2.2.5. A sequence of matrices A_n converges if and only if each sequence of entries converges, and likewise for an infinite series Σ_0^∞ A_n.

Proof. This result is easily proved using Lemma 2.2.4; we leave the details to the reader.

1 In the next chapter, we shall explore this construction in an infinite-dimensional context.
2 Sometimes in definitions of this sort one appends a restriction like N_0 > 0. This is necessary, for example, in the usual ε, δ definition of the continuity of a function of a real variable (i.e., for every ε > 0 there is a δ > 0 such that |x − x_0| < δ implies that |f(x) − f(x_0)| < ε): if δ were negative, the implication would be valid by virtue of its hypothesis never being satisfied. In the present case, no such restriction on N_0 is needed to avoid trivialities.
We shall say that a series Σ_0^∞ A_n of matrices is absolutely convergent if Σ_0^∞ ‖A_n‖ < ∞.

Lemma 2.2.6. If Σ_0^∞ A_n is absolutely convergent, then the series converges; i.e., there exists a matrix L such that

    lim_{N→∞} Σ_{n=0}^N A_n = L.

Moreover, for any integer M,

    ‖ Σ_{n=0}^M A_n − L ‖ ≤ Σ_{n=M+1}^∞ ‖A_n‖.              (2.14)

Proof. In Exercise 1 we ask the reader to invoke the analogous result for scalars (Proposition B.0.8 from Appendix B) and to use Lemma 2.2.4 to reduce the proof of convergence of the matrix series to the scalar case. The inequality (2.14), a seemingly innocuous generalization of the triangle inequality to an infinite sum, actually requires a limiting argument that we outline in the Exercise.

Proposition 2.2.7. The series Σ_0^∞ (At)^n/n! in (2.5) converges absolutely.

Proof. For the general term in (2.5) we have the estimate

    ‖ (At)^n/n! ‖ ≤ (|t| ‖A‖)^n / n!,                       (2.15)

and by comparison with the series representation of the ordinary exponential e^{|t| ‖A‖}, we see that Σ_0^∞ ‖(At)^n/n!‖ < ∞.

We will write e^{At} for the sum (2.5), which is guaranteed to exist by the proposition.

Corollary 2.2.8. For any real number M, the series Σ_0^∞ (At)^n/n! converges uniformly on {t : |t| ≤ M}. I.e., given any M, for any ε > 0 there is an integer N_0 such that for any N > N_0 and all t satisfying |t| ≤ M,

    ‖ Σ_{n=0}^N (At)^n/n! − e^{At} ‖ < ε.

Proof. By (2.14) and (2.15),

    ‖ Σ_{n=0}^N (At)^n/n! − e^{At} ‖ < Σ_{n=N+1}^∞ (|t| ‖A‖)^n/n! ≤ Σ_{n=N+1}^∞ (M‖A‖)^n/n!.
Since the series for e^{M‖A‖} converges, if N is sufficiently large, the RHS of this inequality can be made arbitrarily small.

2.2.3 The main theorem

Having slogged through a lot of rather dry material, and with more of the same ahead of us, let us reward ourselves by jumping ahead to the main result of this section. We return to the case where A is a square matrix, say d × d.

Theorem 2.2.9. For any b ∈ R^d the solution to the IVP

    x' = Ax,   x(0) = b

is unique, and it is given by the formula

    x(t) = e^{At} b.                                        (2.16)

To prove the theorem, it's back to the salt mines. First we must study the dependence of e^{At} on t. If Φ(t) is a matrix-valued function, we shall say that Φ is continuous at t, or that Φ is differentiable at t with derivative L, if

    lim_{Δt→0} Φ(t + Δt) = Φ(t)   or   lim_{Δt→0} [Φ(t + Δt) − Φ(t)]/Δt = L,

respectively. It is natural to interpret these limits using norms. Of course, by Lemma 2.2.4, Φ is continuous or differentiable in this sense if and only if each entry of Φ is continuous or differentiable. In the obvious notation, we shall write Φ'(t) for the derivative of Φ at t, and we shall call Φ continuously differentiable on an interval if Φ is differentiable at every point in the interval and Φ'(t) is continuous with respect to t there.

Proposition 2.2.10. e^{At} is a continuously differentiable function of t and

    (d/dt) e^{At} = A e^{At} = e^{At} A.

Discussion: The proof of this result would be simple if we could differentiate an infinite series of functions term by term; in symbols,

    (d/dt) Σ_{n=0}^∞ f_n(t) = Σ_{n=0}^∞ (df_n/dt)(t).       (2.17)

However, while the derivative of a finite sum is the sum of the derivatives, this need not be true for an infinite sum. Corollary B.0.10 in Appendix B provides a sufficient condition justifying term-by-term differentiation; specifically, (2.17) is valid if the series on the RHS converges uniformly. The appendix also contains a counterexample in which (2.17) fails.
Proof. Each term (At)^n/n! = t^n (A^n/n!) in the series for e^{At} is continuously differentiable; we have

    (d/dt) (At)^n/n! = nA (At)^{n−1}/n! = n (At)^{n−1}/n! · A,

and we may simplify by observing that n/n! = 1/(n−1)! provided n ≥ 1. Taking a finite sum, we have

    (d/dt) Σ_{n=0}^N (At)^n/n! = A Σ_{n=1}^N (At)^{n−1}/(n−1)! = Σ_{n=1}^N (At)^{n−1}/(n−1)! · A.

Now

    Σ_{n=1}^N (At)^{n−1}/(n−1)! = Σ_{m=0}^{N−1} (At)^m/m!,

and as N → ∞ this series converges to e^{At}. Indeed, as in the proof of Corollary 2.2.8, the convergence is uniform for |t| ≤ M, so the proposition follows by applying Corollary B.0.10 in Appendix B.

In the next two results we suppress t in e^{At} for brevity. Since A is an arbitrary matrix, we lose no generality by doing this.

Proposition 2.2.11. The exponential of the zero matrix is the identity; in symbols, e^0 = I. For any matrix A, e^A is invertible and

    (e^A)^{−1} = e^{−A}.

Proof. It is readily seen from the series expansion (2.5) that e^0 = I. Regarding the claim about inverses, let Φ(t) = e^{At} e^{−At}. According to the previous proposition, each factor in Φ is continuously differentiable. In Exercise 1 we ask the reader to prove that the product of two continuously differentiable matrix-valued functions is continuously differentiable and its derivative is given by Leibniz' rule for differentiation of a product. Thus,

    (d/dt) Φ(t) = [(d/dt) e^{At}] e^{−At} + e^{At} [(d/dt) e^{−At}]
                = e^{At}(+A)e^{−At} + e^{At}(−A)e^{−At} = 0,

where we have applied Proposition 2.2.10. Hence Φ(t) = Φ(0) = I for all t, in particular for t = 1, and the result is proved.

One consequence of Proposition 2.2.10 is that A and e^{At} commute. More generally, we have:
Proposition 2.2.12. If AB = BA, then Ae^B = e^B A, e^A e^B = e^B e^A, and

    e^{A+B} = e^A e^B.

Proof. We prove only the displayed formula; the other two results are left for the Exercises. Let

    Φ(t) = e^{t(A+B)} e^{−tA} e^{−tB}.

By Leibniz' rule,

    (d/dt) Φ(t) = e^{t(A+B)}(A + B)e^{−tA}e^{−tB} + e^{t(A+B)}(−A)e^{−tA}e^{−tB} + e^{t(A+B)}e^{−tA}(−B)e^{−tB}.

In the third term we may commute the middle two factors, e^{−tA} and B, and then all three terms add up to zero. Thus Φ(t) = Φ(0) = I for all t.
It is now a simple matter to prove the main result of this section:

Proof of Theorem 2.2.9. It is obvious that x(t) = e^{At} b satisfies the initial condition. To show that it satisfies the equation, just differentiate and apply Proposition 2.2.10. To show uniqueness, suppose x(t) is one solution and let

    y(t) = e^{−At} x(t).                                    (2.18)

Differentiate (2.18) to show that

    (d/dt) y(t) = −e^{−At} A x(t) + e^{−At} (d/dt) x(t).

Since x(t) satisfies the ODE, the two terms in this equation cancel, yielding dy(t)/dt = 0. Thus

    y(t) = y(0) = x(0) = b,

and the result follows on multiplying (2.18) by e^{At}.
2.3 Calculation of the matrix exponential

2.3.1 The role of similarity

Suppose x(t) is a solution of the linear system

    x' = Ax.                                                (2.19)

Let us consider a linear change of coordinates for the unknown functions x_j(t); i.e., let S be a nonsingular matrix and define a new vector of unknown functions by y(t) = Sx(t). Then we may derive an ODE for y(t) as follows:

    y' = Sx'(t) = SAx = SAS^{−1} y.

In other words y also satisfies a linear homogeneous system of ODEs, and the coefficient matrix in the ODEs for y is SAS^{−1}, a matrix similar to A in the technical sense of linear algebra. The following proposition guarantees that the exponentials of similar matrices are themselves similar.

Proposition 2.3.1. If B = SAS^{−1}, then e^{Bt} = S e^{At} S^{−1}.

The reader is asked to prove this result in Exercise 1.

To illustrate the value of this result, let us suppose that the matrix A in (2.19) is diagonalizable over R. Specifically, suppose that A = SΛS^{−1}, where Λ = Diag(λ_1, λ_2, ..., λ_d) is the diagonal matrix^3 with the eigenvalues λ_j ∈ R of A along its diagonal. Hence by Proposition 2.3.1,

    e^{At} = S e^{Λt} S^{−1}.

Now, for any power n, Λ^n is also diagonal, simply Diag(λ_1^n, λ_2^n, ..., λ_d^n). Thus the series for e^{Λt} converges to the diagonal matrix

    e^{Λt} = Diag(e^{λ_1 t}, e^{λ_2 t}, ..., e^{λ_d t}).

Hence

    e^{At} = S Diag(e^{λ_1 t}, e^{λ_2 t}, ..., e^{λ_d t}) S^{−1}.   (2.20)
To take advantage of (2.20), we need to be able to calculate the similarity matrix that diagonalizes A, and the next result tells us how to do this. In this proposition, the notation Col(v_1, v_2, ..., v_d) denotes the matrix whose columns are the specified vectors v_1, v_2, ..., v_d. If A is diagonalizable over R, then there is a basis v_1, v_2, ..., v_d for R^d consisting of eigenvectors of A.
3 Unless Λ is a multiple of the identity, it is not unique because the eigenvalues may be enumerated in any order.
Proposition 2.3.2. Suppose A is diagonalizable over R with eigenvectors v_1, v_2, ..., v_d, and let S = Col(v_1, v_2, ..., v_d). Then^4

    S^{−1} A S = Λ,

where Λ is the diagonal matrix whose (j, j)-entry is the eigenvalue λ_j of A associated with the eigenvector v_j.

Proof. The proof of this proposition relies on one of the interpretations of matrix multiplication: specifically (see Assertion 1D(ii) on p. 25 of Strang [5]),

    the j-th column of AB = A times the j-th column of B.   (2.21)

Applying this interpretation to S^{−1}S = I, we deduce that

    e_j = S^{−1} v_j,

where e_j is the j-th column of the identity, or the j-th vector in the standard basis for R^d. Next, applying this interpretation to S^{−1}AS, we have

    j-th column of (S^{−1}AS) = (S^{−1}A) v_j.

By associativity of matrix multiplication,

    (S^{−1}A) v_j = S^{−1}(A v_j) = S^{−1}(λ_j v_j) = λ_j S^{−1} v_j = λ_j e_j.

In words, we have shown, column by column, that S^{−1}AS = Λ, as claimed.
In the Exercises the reader is asked to use this Proposition to compute the exponentials of various matrices.

It is instructive to re-interpret changes of variable in linear ODEs, as introduced at the start of this subsection. Suppose A is diagonalizable over R, and choose S as in Proposition 2.3.2 so that S^{−1}AS = Λ. Let us compare the ODE x' = Ax with y' = Λy obtained by the substitution y = S^{−1}x. In the x-equation, the rate of change of x_j depends on all the components of x, while in the y-equation, the rate of change of y_j, which equals λ_j y_j, depends only on the same component y_j. In other words, by diagonalizing A we are performing a change of coordinates on R^d such that the new coordinates y_j evolve uncoupled from one another.
4 Up to this point it would not have mattered whether we considered the basic equation expressing similarity as B = S^{−1}AS or B = SAS^{−1}. Here, however, there is a difference: the columns of the matrix S such that S^{−1}AS is diagonal are the eigenvectors of A, while neither the columns nor rows of S^{−1} are as easily described.
2.3.2 Two problematic cases

The hypothesis in Proposition 2.3.2 may fail in two ways (and both failures may occur together):

- A has multiple eigenvalues but not enough eigenvectors, or
- A has complex eigenvalues.

Let us consider simple examples of each case before dealing with the general case.

The following is the simplest example of a matrix that fails to have enough eigenvectors to span R^d:

    A = [ a  1 ]
        [ 0  a ].

It is readily seen that λ = a is the only possible eigenvalue of A, but the eigenspace associated with this eigenvalue, ker(A − aI), is only one-dimensional. However, let us write

    A = aI + N,   where N = [ 0  1 ]
                            [ 0  0 ].

Since I and N commute, we have from Proposition 2.2.12 that

    e^{(aI+N)t} = e^{aIt} e^{Nt} = e^{at} e^{Nt}.

Moreover, since N² = 0, the exponential series for e^{Nt} truncates to just two terms,

    e^{Nt} = I + Nt = [ 1  t ]        so   e^{At} = e^{at} [ 1  t ]
                      [ 0  1 ],                            [ 0  1 ].    (2.22)

The following matrix has complex eigenvalues λ_j = a ± bi:

    A = [ a  −b ]
        [ b   a ].                                          (2.23)

We may again apply Proposition 2.2.12 to calculate the exponential of A. Specifically, we write

    A = aI + bJ,   where J = [ 0  −1 ]
                             [ 1   0 ].

Since I and J commute,

    e^{(aI+bJ)t} = e^{at} e^{bJt}.                          (2.24)
Now, as with nilpotent matrices, the exponential of J can be computed conveniently using the power-series definition because of the fact that J² = −I, so that

    J^n = I    if n = 0 (mod 4)
          J    if n = 1 (mod 4)
          −I   if n = 2 (mod 4)
          −J   if n = 3 (mod 4).

Thus, grouping odd and even powers (this rearrangement of terms is to be justified in Exercise 1), we see

    e^{bJt} = [1 − (bt)²/2! + (bt)⁴/4! − ...] I + [bt − (bt)³/3! + (bt)⁵/5! − ...] J,   (2.25)

where the power series for cos bt and sin bt can be recognized. On substituting this formula into (2.24), we obtain

    e^{At} = e^{at} [ cos bt  −sin bt ]
                    [ sin bt   cos bt ].                    (2.26)

Incidentally, in Exercise 9 we ask the reader to prove, with hints, that every 2 × 2 matrix with nonreal eigenvalues is similar to (2.23) for some values of a, b. Equation (2.23) is called the real canonical form for 2 × 2 matrices with complex eigenvalues.
Let us show that the exponential of (2.23) may also be calculated by diagonalization. For this we need to work over the complex numbers, starting with the basic definitions. Temporarily, for a real vector x or a real matrix A we shall write |x|_R or ‖A‖_R for the norms defined above. Generalizing to complex vectors, if z ∈ C^d, we let

    |z|_C = ( Σ_{j=1}^d |z_j|² )^{1/2}.                     (2.27)

This norm may be calculated from the complex inner product, |z|_C = ⟨z, z⟩_C^{1/2}, where

    ⟨z, w⟩_C = Σ_{j=1}^d z̄_j w_j,                           (2.28)

with z̄_j denoting the complex conjugate of z_j. If x ∈ R^d, then |x|_R = |x|_C. If A is a matrix with complex entries, then let

    ‖A‖_C = max_{|z|_C ≤ 1} |Az|_C.                         (2.29)
If A has real entries, it is not obvious but still true that

    ‖A‖_R = ‖A‖_C,                                          (2.30)

which we ask the reader to verify in Exercise 1. Because of (2.30) we will omit the subscript R or C in writing norms: if A has complex entries, we understand ‖·‖_C, and if A has real entries, it doesn't matter which norm we choose. Moreover, in the Exercises we ask the reader to check that the various lemmas and propositions about norms all carry over to the complex case.

Now we calculate the exponential of (2.23) by diagonalization. Let

    S = [ 1   1 ]        S^{−1} = (1/2) [ 1   i ]
        [ −i  i ],                      [ 1  −i ].

The columns of S are eigenvectors of A, so by Proposition 2.3.2 we have S^{−1}AS = Λ, where

    Λ = [ a + bi    0    ]
        [ 0      a − bi ].

Therefore

    e^{At} = S e^{Λt} S^{−1} = S [ e^{(a+bi)t}      0      ] S^{−1}.
                                 [ 0           e^{(a−bi)t} ]

Recalling Euler's formula e^{ibt} = cos bt + i sin bt and multiplying out the product, we obtain (2.26).
2.3.3 Use of the Jordan form

By a Jordan block we mean a square matrix of the form

    B = [ λ  1  0  ...  0  0 ]
        [ 0  λ  1  ...  0  0 ]
        [ ...                ]
        [ 0  0  0  ...  λ  1 ]
        [ 0  0  0  ...  0  λ ].

In words, B has λ on the diagonal, 1 on the superdiagonal, and zeros elsewhere. B may be of any dimension, including 1 × 1, in which case B is simply the scalar λ. The diagonal entry λ is the only eigenvalue of B. No matter how large the dimension of B may be, there is only one linearly independent eigenvector.

The Jordan normal-form theorem asserts that any square matrix A is similar to a diagonal array of Jordan blocks; in symbols, S^{−1}AS = J, where, for m = 1, 2, ..., M, the matrix B_m is a d_m × d_m Jordan block, Σ_{m=1}^M d_m = d, the dimension of A, and J is the block-diagonal matrix with B_1, B_2, ..., B_M along its diagonal. To shorten the notation we shall generalize the notation for diagonal matrices and write J as

    J = Diag(B_1, B_2, ..., B_M).
The Jordan canonical-form theorem is spot on for computing the exponential of a matrix. First observe that

    if A = S Diag(B_1, ..., B_M) S^{−1}, then e^{At} = S Diag(e^{B_1 t}, ..., e^{B_M t}) S^{−1}.   (2.31)

The exponential of each Jordan block may be computed explicitly by the same method as was used for the computation of the 2 × 2 Jordan block above (cf. (2.22)). Specifically, to exponentiate a d × d Jordan block:

1. Write B = λI + N, where N is the d × d nilpotent matrix with ones on the superdiagonal.

2. Observe that by Proposition 2.2.12, e^{Bt} = e^{λt} e^{tN}.

3. Calculate e^{tN} with the truncated power series

    I + tN + (1/2!)(tN)² + ... + (1/(d−1)!)(tN)^{d−1}.

This procedure yields

    e^{Bt} = e^{λt} [ 1  t  t²/2!  ...  t^{d−2}/(d−2)!  t^{d−1}/(d−1)! ]
                    [ 0  1  t      ...  t^{d−3}/(d−3)!  t^{d−2}/(d−2)! ]
                    [ ...                                              ]
                    [ 0  0  0      ...  1               t              ]
                    [ 0  0  0      ...  0               1              ].   (2.32)
In order to use this method one needs to be able to find the Jordan normal form of a matrix. This may be done in a manner that generalizes Proposition 2.3.2, as is explained in Appendix C. We urge you to read this section now and to apply the method by doing Exercise ??? in the Appendix. We think you will find our approach to this topic refreshing. In particular, normal forms are determined with natural calculations finding generalized eigenvectors; although the minimal polynomial is needed to prove that transformation to the Jordan form is possible, it is not needed to calculate the Jordan form. Of course, in general both the Jordan normal form J and the similarity matrix S have complex entries.

The Jordan normal form is perfect for theoretical purposes because it exhibits the structure of the solution so clearly. However, this normal form is poorly suited to numerical computation because it is so sensitive to round-off errors. For example, consider the matrices

    A = [ a  1 ]        Ã = [ a    1   ]
        [ 0  a ],           [ 0  a + ε ].                   (2.33)

No matter how small ε > 0 may be, the structures of the Jordan normal forms of these two matrices are completely different: the first has a single 2 × 2 block, while the second is diagonalizable and hence has two 1 × 1 blocks. (In the Exercises we ask the reader to compare the exponentials of these matrices.)
2.4 Large-time behavior of solutions of homogeneous linear systems

2.4.1 The main results

We shall say that the origin in R^d is a sink (or attractor) for a linear system x' = Ax if for every initial condition b ∈ R^d,

    lim_{t→∞} e^{At} b = 0.

The eigenvalues of A provide an elegant test for such behavior.

Theorem 2.4.1. The origin is a sink for x' = Ax iff

    max_j Re λ_j < 0,

where the maximum is taken over the eigenvalues λ_j of A.

The ideas underlying the proof of this theorem are clearest if A is diagonalizable (over either the real or complex numbers), so we formulate a separate, stronger result for that case.

Proposition 2.4.2. If A = SΛS^{−1} where Λ = Diag(λ_1, ..., λ_d), then

    K^{−1} e^{αt} ≤ ‖e^{At}‖ ≤ K e^{αt},                    (2.34)

where K = ‖S‖ ‖S^{−1}‖ and

    α = max_{1≤j≤d} Re λ_j.                                 (2.35)

Proof. Regarding the upper bound in (2.34), observe that

    ‖e^{At}‖ = ‖S e^{Λt} S^{−1}‖ ≤ ‖S‖ ‖e^{Λt}‖ ‖S^{−1}‖.

In Exercise 1 we ask the reader to show that

    ‖e^{Λt}‖ = max_{1≤j≤d} |e^{λ_j t}|.

Of course |e^{λ_j t}| = e^{(Re λ_j)t}, so ‖e^{Λt}‖ = e^{αt}, from which the upper bound in (2.34) follows.

Conversely, regarding the lower bound,

    e^{αt} = ‖e^{Λt}‖ = ‖S^{−1} e^{At} S‖ ≤ K ‖e^{At}‖,

and the result follows on dividing by K.
The next result will be used in proving Theorem 2.4.1.

Proposition 2.4.3. For any ε > 0 there is a constant^5 K such that

    ‖e^{At}‖ ≤ K e^{(α+ε)t},                                (2.36)

where α is given by (2.35).

Proof. We may prove the proposition by examining the Jordan normal form of A. At first, the derivation of (2.36) exactly parallels the proof of Proposition 2.4.2:

    ‖e^{At}‖ ≤ ‖S‖ ‖e^{Diag(B_1,...,B_M)t}‖ ‖S^{−1}‖

and

    ‖e^{Diag(B_1,...,B_M)t}‖ ≤ max_{1≤m≤M} ‖e^{B_m t}‖.     (2.37)

However, because of the polynomial entries in (2.32), as t tends to infinity, ‖e^{B_m t}‖ may grow like t^p e^{(Re λ_m)t} for some power p. The increase in the exponent from α to α + ε in (2.36) compensates for this extra growth of ‖e^{B_m t}‖, provided one also increases the constant K by an appropriate factor. The reader is asked to supply the details of this argument in Exercise 1.

Proof of Theorem 2.4.1. It follows from Proposition 2.4.3 that if α < 0, then the origin is a sink for x' = Ax. The proof that if α > 0 then the origin cannot be a sink is similar to the derivation of the lower bound for ‖e^{At}‖ in (2.34); details are left for the reader.

5 In an exercise in Chapter 5 we ask you to show that there is a matrix B that is similar to A and satisfies ‖e^{Bt}‖ ≤ e^{(α+ε)t}; i.e., (2.36) with the constant K = 1.

2.4.2 Tests for negative eigenvalues

Because of Theorem 2.4.1, it is useful to be able to test whether a matrix has all its eigenvalues in the left half-plane without actually having to find the eigenvalues. For 2 × 2 and 3 × 3 matrices the following two results give a simple test. Please forgive us a homily:

    Use these results! Generations of students have ignored them, wasting their time by calculating eigenvalues when it was not actually necessary.

Proposition 2.4.4. If A is a 2 × 2 matrix with real entries, then Re λ_j < 0 iff
(i) tr A < 0 and
(ii) det A > 0.

Proposition 2.4.5. If A is a 3 × 3 matrix with real entries, then Re λ_j < 0 iff
(i) tr A < 0,
(ii) (1/2) tr A [(tr A)² − tr(A²)] < det A, and
(iii) det A < 0.

The proof of Proposition 2.4.4 is left as an Exercise. A proof of Proposition 2.4.5 is given in the Appendix, Section 2.7.2. Regarding the latter proposition, Conditions (i) and (iii) are clearly necessary for the eigenvalues of A to have negative real parts. In Exercise 11 we suggest a calculation that helps motivate Condition (ii).

The following result is sometimes useful to test for oscillatory behavior in a 2 × 2 system. It may be derived by examining the quadratic formula for the eigenvalues of A.

Proposition 2.4.6. A real 2 × 2 matrix has complex eigenvalues if and only if

    (tr A)² < 4 det A.
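The tests of Propositions 2.4.4 and 2.4.6 are one-liners in code; here is a sketch (ours) with an arbitrary test matrix.

import numpy as np

# Sketch: the 2x2 sink test (Proposition 2.4.4) and oscillation test
# (Proposition 2.4.6), implemented directly from trace and determinant.
def is_sink_2x2(A):
    tr, det = np.trace(A), np.linalg.det(A)
    return tr < 0 and det > 0          # both eigenvalues have Re < 0

def has_complex_eigenvalues_2x2(A):
    tr, det = np.trace(A), np.linalg.det(A)
    return tr**2 < 4 * det

A = np.array([[-1.0, 2.0], [-2.0, -1.0]])   # eigenvalues -1 +/- 2i
print(is_sink_2x2(A), has_complex_eigenvalues_2x2(A))   # True True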
2.5 Solution of inhomogeneous problems

In case x satisfies an inhomogeneous linear equation, say

    x' = Ax + f(t),                                         (2.38)

a solution to the IVP with x(0) = b is given by

    x(t) = e^{At} b + ∫_0^t e^{A(t−s)} f(s) ds.             (2.39)

The derivation of this result, which will be used in Chapter 4, is included in Exercise 1.

The formula (2.39) answers the existence question for the inhomogeneous equation. Uniqueness follows from Theorem 2.2.9 because if x_1 and x_2 are two solutions of the IVP for (2.38), then their difference y = x_1 − x_2 satisfies a homogeneous equation y' = Ay with initial condition y(0) = 0.

The similarity of (2.39) to the scalar case (1.16) is striking. However, no simple formula exists for solutions of a linear system of ODEs with variable coefficients,

    x' = A(t)x + f(t),                                      (2.40)

unless A(t_1) and A(t_2) commute for all t_1, t_2. A counterexample demonstrating this fact is contained in Exercise 12.
2.6 Exercises

2.6.1 Routine exercises

1. Supply details omitted in the text:

(a) Verify that the maximum in Equation (2.12) exists and is finite.

(b) Prove Lemmas 2.2.2, 2.2.3, 2.2.4, 2.2.5, and 2.2.6.

Hint for Lemma 2.2.4: In the matrix part of the lemma, the lower bound for ‖A‖ may be obtained by applying A to a basis vector e_k. The upper bound may be obtained by writing A as a sum of d_1 d_2 matrices, each of which has only one nonzero entry.

Hint for Lemma 2.2.6: Regarding (2.14), for any N > M add and subtract terms A_n for n between M and N to write

    Σ_{n=0}^M A_n − L = [ Σ_{n=0}^N A_n − L ] − [ Σ_{n=M+1}^N A_n ].

For any ε > 0, N may be chosen large enough so that the first term here is bounded by ε, and since the second term is a finite sum, it may be estimated using the triangle inequality.

(c) Prove Leibniz' rule for differentiation of the product of two matrix-valued functions of a scalar variable:

    (d/dt) Φ(t)Ψ(t) = Φ'(t)Ψ(t) + Φ(t)Ψ'(t),

and prove the first two assertions in Proposition 2.2.12.

(d) Prove Proposition 2.3.1.

Remark: One may prove this result either by comparing terms in the series for the two sides of the equation or by starting from t = 0 and differentiating.
(e) Justify the rearrangement of terms in Equation (2.25).

(f) Establish Equation (2.30).

(g) Show that if Λ = Diag(λ_1, ..., λ_d), then

    ‖e^{Λt}‖ = max_{1≤j≤d} |e^{λ_j t}|,

a result used in proving Proposition 2.4.2. Generalize this result to a diagonal array of Jordan blocks as in (2.37).

(h) Complete the proofs of Theorem 2.4.1 and Proposition 2.4.3.

Discussion: Supplying the missing details in the proof of Proposition 2.4.3 is a skill-building exercise; do it! Here is a practice problem that may help in this task: Prove that for any positive power p,

    max_{0≤t<∞} t^p e^{−t}

exists and is finite. The exact value of the maximum can in fact be computed using calculus, but it is useful training to do this practice exercise merely with estimation, as follows: Argue that

    lim_{t→∞} t^p e^{−t} = 0.

Deduce that there is a constant M such that t^p e^{−t} < 1 if t > M. Therefore

    max_{0≤t<∞} t^p e^{−t} ≤ max { max_{0≤t≤M} t^p e^{−t}, 1 },

and the maximum over [0, M] is finite by compactness.

(i) Prove Propositions 2.4.4 and 2.4.6.

(j) Verify that (2.39) satisfies (2.38). One may derive (2.39) by multiplying (2.38) by e^{−At} and manipulating the result, or one may just differentiate (2.39) and see that (2.38) is actually satisfied.

(k) Show that the theory of Sections 2.2 and 2.3 extends to matrices with complex entries. (Ugh!)
2. Suppose A is a square matrix with at least one eigenvalue λ such that Re λ < 0. Show that the linear system x' = Ax has at least one nonzero solution x(t) such that

    lim_{t→∞} x(t) = 0.

3. Show that if A is d × 1 (a column vector) or 1 × d (a row vector), then ‖A‖ is just the norm of the vector. Also show that for an invertible matrix, ‖S‖ ‖S^{−1}‖ ≥ 1.

4. Rederive (2.22) by explicitly solving the ODE x' = Ax.

Hint: The x_2-equation does not involve x_1; solve this equation first and then attack the x_1-equation.
5. Compute e^{tA} for the following matrices A:

(a) [ a    1   ]
    [ 0  a + ε ]

Hint: Subtract aI from this matrix, exponentiate the result, and then multiply by e^{at}.

Discussion: Recalling (2.33), you will want to compare your answer with (2.22), the exponential of the Jordan block.

(b) [ 1  0  0 ]
    [ 1  2  0 ]
    [ 1  0  1 ]

6. Find the 2 × 2 matrix A that has the indicated eigenvalues and eigenvectors:

    eigenvalue    eigenvector
    1/ε           (1, 1)
    1             (1, 1 + δ)

Discussion: This exercise is easy if one makes use of Proposition 2.3.2. The point of the exercise is to observe that ‖A‖ may become large if ε and/or δ tends to zero. This behavior is not surprising if ε → 0, since every eigenvalue of A is bounded by ‖A‖. It may be surprising for δ → 0, i.e., when the two eigenvectors of the matrix become nearly parallel. Question: The norm ‖A‖ need not blow up as δ → 0 if ε = 1; explain this.
2.6.2 Exercises of independent interest

7. (a) Prove the following slight improvement of the upper bound in Lemma 2.2.4:

    ‖A‖ ≤ max_i Σ_j |a_{ij}|.

(b) Derive the following exact formula for the norm of a matrix:

    ‖A‖ = √(λ_max(A^T A)).

Use this result to find the norm of

    [ 1  2 ]
    [ 0  1 ].

Compare this answer with the estimate of Part (a).

Hint for Part (b): Deduce from the definition of ‖A‖ that

    ‖A‖² = max_{|x|≤1} ⟨A^T Ax, x⟩,

and invoke the spectral theorem for symmetric matrices to estimate ⟨A^T Ax, x⟩.

8. Show that if, for all x ∈ R^d, ⟨x, Ax⟩ ≤ μ|x|², then

    ‖e^{At}‖ ≤ e^{μt}.
Discussion: The hypothesis in this Exercise implies that Re λ_j(A) ≤ μ (show this). Thus, the estimate (2.36) is available for e^{At}. The point of this exercise is that under the stronger hypothesis the constants ε and K in (2.36) are not needed. Incidentally, another approach in which K is not needed is described in Additional notes, Section 2.7.

Hint: For any x ∈ R^d, let u(t) = |e^{At}x|². Use the hypothesis to show that

    (d/dt) u(t) ≤ 2μ u(t).

Then estimate u(t) by differentiating e^{−2μt} u(t). Finally consider maximizing over |x| ≤ 1.
9. Suppose A is a 2 × 2 real matrix with eigenvalues a ± ib where b ≠ 0. Let u = v + iw be an eigenvector of A with eigenvalue a + ib; thus

    A(v + iw) = (a + ib)(v + iw).    (2.41)

Let S be the 2 × 2 matrix Col(v, w). Deduce from (2.41) that S⁻¹AS = C where

    C = [  a   b
          −b   a ].    (2.42)

    Remark: As mentioned in the text, the matrix C is called the real canonical form of A.
10. If A is a d × d matrix with real entries, define the Euclidean norm of A,

    ‖A‖_E = ( Σ_{j,k=1}^d a_{jk}² )^{1/2}.

Determine which of the following is true and prove it:

(a) For all A different from zero, ‖A‖ < ‖A‖_E.

(b) For all A, ‖A‖ ≤ ‖A‖_E, with equality occurring for at least one nonzero A.

(c) There is a matrix A such that ‖A‖ > ‖A‖_E.
11. This exercise is intended to make Condition (ii) in Proposition 2.4.5 seem less mysterious. Consider this inequality applied to a matrix with eigenvalues −a, λ ± iμ, where a > 0. First show that if λ = 0, then the two sides of the inequality are in fact equal. Then extend your calculation to show that Condition (ii) holds if λ < 0 and is violated if λ > 0.
12. (a) Consider the variable-coefficient ODE (2.40), supposing that A(t₁) and A(t₂) commute for all t₁, t₂. Let

    𝒜(t) = ∫₀ᵗ A(s) ds.

Show that

    x(t) = e^{𝒜(t)} b + ∫₀ᵗ e^{𝒜(t)−𝒜(s)} f(s) ds

solves the IVP for (2.40) with initial condition x(0) = b.

(b) Solve (2.40) in the case

    A(t) = [ 0   t
             0   1 ],

and show that this solution differs from what the above construction produces.
2.7 Additional notes

We urge you to re-read the paragraph at the end of Section 2.3.1 and to take to heart its message: two systems of ODEs with similar coefficient matrices, say x′ = Ax and y′ = S⁻¹ASy, differ only by the choice of coordinates on R^d; they describe exactly the same phenomena.

There are numerous alternative norms used to measure the size of vectors and matrices. For vectors, two common choices are

    |x|₁ = Σ_{j=1}^d |x_j|   and   |x|_∞ = max_{1≤j≤d} |x_j|,

which give rise to matrix norms

    ‖A‖₁ = max_{|x|₁≤1} |Ax|₁   and   ‖A‖_∞ = max_{|x|_∞≤1} |Ax|_∞.

If S is an invertible matrix, the norms

    |x|_S = |Sx|,   ‖A‖_S = max_{|x|_S≤1} |Ax|_S = ‖SAS⁻¹‖,

where | · | is the usual mean-square norm (2.7), are sometimes useful. Another choice for matrices is ‖A‖_E, discussed in Exercise 10.
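For readers who wish to experiment with these alternative norms numerically, numpy implements the induced 1- and ∞-norms directly; they reduce to the largest absolute column sum and row sum, respectively. A small illustrative sketch:

    import numpy as np

    A = np.array([[1.0, -2.0], [3.0, 4.0]])

    # Induced 1-norm = max absolute column sum;
    # induced infinity-norm = max absolute row sum.
    print(np.linalg.norm(A, 1), np.abs(A).sum(axis=0).max())       # 6.0 6.0
    print(np.linalg.norm(A, np.inf), np.abs(A).sum(axis=1).max())  # 7.0 7.0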
Chapter 3

Nonlinear systems: local theory

3.1 Two counterexamples

In this chapter we formulate the main existence and uniqueness theorems for the IVP for systems of nonlinear ODEs. For the moment we consider only autonomous systems, say

    x′ = F(x)    (3.1)

where F : R^d → R^d. More generally, we may assume that F is defined only on an open subset U ⊂ R^d. In Section 3.4, the theory is extended to nonautonomous systems.

To introduce the main theorems, we begin with two examples to emphasize what they do not say.
Example 1: Without special conditions on F, the RHS in (3.1), the IVP may possess a solution only for a finite, possibly very short, time. To see this, consider the IVP for a scalar unknown function x(t),

    x′ = x²,   x(0) = 1.    (3.2)

The equation may be solved using separability: dx/x² = dt, which integrates to

    −1/x = t − C.

Solving for x and imposing the IC to deduce that C = 1, we obtain

    x(t) = 1/(1 − t).    (3.3)

The reader may check that this formula satisfies both the equation and the initial condition, but this solution exists only for t < 1.

Strictly speaking, formula (3.3) makes sense provided t ≠ 1, but in most modeling contexts continuation of the solution beyond the blow-up time, i.e., to t > 1, is rejected on physical grounds. Suppose, for example, that x represents a population; re-emergence of x with large negative values after the singularity at t = 1 is nonsensical. We shall say that the solution ceases to exist at t = 1.

The blow-up of (3.3) at t = 1 may be understood as follows. From the equation we see that x′ > 0, so the solution is always increasing. As the solution grows, the equation forces x′ to increase ever more quickly, and the growth accelerates out of control in a finite time. It is instructive to compare (3.2) with the linear equation x′ = x, whose solutions also grow without bound, but in the latter case the cumulative growth up to time t remains finite no matter how large t may get. The key difference between these equations is that in (3.2) the RHS x² grows faster than linearly as x → ∞; finite blow-up time is associated with such superlinear growth. (See the Exercises for more examples relating the growth of F and blow-up in finite time.)

In Chapter 4 we give sufficient conditions to guarantee that the solution to an IVP exists for all time.
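The blow-up is easy to observe numerically. In the following sketch (solver and tolerances are our arbitrary choices), an adaptive solver tracks the solution of (3.2) up to t = 0.999 and agrees with the exact formula (3.3); any attempt to integrate across t = 1 fails as the step size collapses:

    import numpy as np
    from scipy.integrate import solve_ivp

    # x' = x^2, x(0) = 1; exact solution 1/(1 - t) blows up at t = 1.
    sol = solve_ivp(lambda t, x: x**2, (0.0, 0.999), [1.0], rtol=1e-10)
    t_end, x_end = sol.t[-1], sol.y[0, -1]
    print(x_end, 1.0 / (1.0 - t_end))   # both near 1000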
Example 2: Without special conditions on F, solutions of the IVP for (3.1) need not be unique. To see this, consider another scalar IVP,*

    x′ = √|x|,   x(0) = 0.    (3.4)

Note that x′ ≥ 0, so for positive t we have x(t) ≥ 0, and thus we may drop the absolute value in the equation. Again we may solve x′ = √x by separability to get the general solution

    x(t) = ((t − C)/2)².

Choosing C = 0 to satisfy the IC, we get the solution x(t) = t²/4. The reader may check that this formula satisfies both the equation and the initial condition.

However, note that x(t) ≡ 0 also solves both the equation and the initial condition. In other words, the solution to this IVP is not unique. Moreover, the situation is worse than is so far apparent: as we ask you to verify in the Exercises, for any constant t₀ ≥ 0 the function

    x(t) = { 0             if t ≤ t₀
           { (t − t₀)²/4   if t > t₀    (3.5)

is a continuously differentiable solution of the equation that also satisfies the initial condition. In other words, there are infinitely many solutions of the IVP.

The problem with this example stems from the singularity of √x at the origin. As we shall see below, uniqueness may be guaranteed if F is continuously differentiable. One might wish that physical problems always led to nonsingular, e.g., continuously differentiable, equations. Unfortunately this is not true: in the Exercises we introduce a physically based ODE with a singularity like (3.4) and discuss non-uniqueness issues in this example.

    *Footnote: Recall that in Exercise 1 of Chapter 1 we found multiple solutions to (1.44). The present example is simply a reworking of (1.44) to make the equation a little simpler and to make the RHS defined in a full neighborhood of the singularity at x = 0.
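The infinite family (3.5) can also be checked mechanically. The short sketch below verifies at sample points that each member satisfies the equation; this is a numerical illustration, not a proof:

    import numpy as np

    # The family (3.5): x(t) = 0 for t <= t0, (t - t0)^2/4 for t > t0.
    def x(t, t0):
        return np.where(t <= t0, 0.0, (t - t0)**2 / 4.0)

    def dx(t, t0):   # derivative of the formula above
        return np.where(t <= t0, 0.0, (t - t0) / 2.0)

    t = np.linspace(0.0, 5.0, 1001)
    for t0 in (0.0, 1.0, 2.5):
        # Each member satisfies x' = sqrt(|x|) and x(0) = 0.
        print(np.allclose(dx(t, t0), np.sqrt(np.abs(x(t, t0)))))  # True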
3.2 The existence theorem

3.2.1 Statement of the theorem

In this section we state and prove the fundamental existence theorem for the IVP for (3.1). We will assume that the function F on the RHS of the ODE is continuous, and in fact we will impose a stronger condition that we now describe.

If S is a subset of R^{d₁}, a function F : S → R^{d₂} is called Lipschitz continuous, or simply Lipschitz, if there is a constant L such that for all points x, y ∈ S,

    |F(x) − F(y)| ≤ L|x − y|.    (3.6)

Condition (3.6) is much more restrictive than mere continuity. For example, on the line F(x) = √|x| is continuous but not Lipschitz continuous. In fact, a continuous function can exhibit much more pathological behavior than a square root, such as the Cantor function, also called the devil's staircase.

Let U be an open subset of R^{d₁}, and let F : U → R^{d₂}. We shall call F locally Lipschitz if for every point x₀ ∈ U there is a neighborhood V of x₀ such that the restriction F|V is Lipschitz. For example, on the real line the function F(x) = x² is locally Lipschitz, even though it is not Lipschitz on the whole line. Sometimes, to emphasize the distinction with locally Lipschitz, we shall say that a function is globally Lipschitz on S to mean that it is Lipschitz on S.

We shall study (3.1) under the assumption that F is locally Lipschitz. As Proposition 3.2.2 below shows, a C¹ function is locally Lipschitz; in fact, being locally Lipschitz is only slightly less restrictive than being C¹. In most applications we consider in later chapters, F will actually be C¹; we consider the more general condition here only because, as we shall see, the local existence and uniqueness theory is so well suited to a Lipschitz condition.

Here is our fundamental existence theorem for the IVP: i.e., given b ∈ R^d, to find a continuously differentiable function x(t) such that

    x′ = F(x),   x(0) = b.    (3.7)
Theorem 3.2.1. Let U ⊂ R^d be open and contain the initial data b, and let F : U → R^d be locally Lipschitz. Then there is an interval (−β, β) and a C¹ function x : (−β, β) → U that satisfies (3.7).

The construction of x will show that the solution is unique on the possibly-very-short interval (−β, β) in the theorem. In fact, however, uniqueness holds in far greater generality, as we shall show in Section 3.3. Pending those stronger results, we ignore the information regarding uniqueness that may be obtained through proving Theorem 3.2.1.

Remark: As we discuss in the Exercises, the existence theorem may be easily generalized to nonautonomous equations, x′ = F(x, t).

We must develop substantial preliminaries before we are ready to prove Theorem 3.2.1. The results of the next subsection are not actually needed to prove the theorem, but they help elucidate Lipschitz continuity.
3.2.2 Differentiability implies Lipschitz continuity

Proposition 3.2.2. If U ⊂ R^{d₁} is open and F : U → R^{d₂} is C¹, then F is locally Lipschitz.

Incidentally, the converse of this result is not true: for example, on the real line, F(x) = |x| is locally Lipschitz (globally Lipschitz, in fact) despite not being differentiable at x = 0. In Exercise 9 we propose a more dramatic example illustrating the same point.

We offer two proofs of the proposition, the first only for scalar-valued functions of one variable and the second for the general case. The second proof is greatly to be preferred. We offer the first proof only because it illustrates a common bad habit among beginning analysis students, over-reliance on the mean-value theorem, and we want to have an identified target to shoot down.
Proof 1 (only for d₁ = d₂ = 1). Given x₀ ∈ U, choose a closed interval I such that

    x₀ ∈ Int I ⊂ I ⊂ U,

where Int means interior, and let

    L = max_{x∈I} |F′(x)|.

Given x, y ∈ I, we have from the mean-value theorem that there is a point ξ between x and y such that

    F(x) − F(y) = F′(ξ)(x − y),

so by the choice of L,

    |F(x) − F(y)| ≤ L|x − y|.  ∎

This proof generalizes easily to all d₁, but there are problems with it if d₂ > 1. The difficulty is that one needs to apply the mean-value theorem separately to each of the d₂ components of F. This is not impossible, but it results in a clunky proof. The following is a far more elegant alternative, and we strongly urge you to absorb the idea in Lemma 3.2.3 on which the proof is based.

Proof 2 (for general d₁, d₂). Given x₀ ∈ U, we choose as an appropriate neighborhood of x₀ a closed ball B̄(x₀, ρ) that is contained in U. We show that F is Lipschitz on this ball with Lipschitz constant

    L = max_{z∈B̄(x₀,ρ)} ‖DF(z)‖.    (3.8)

Let two points x, y ∈ B̄(x₀, ρ) be given. We isolate the following simple lemma as a separate result so that we can refer to it later.

Lemma 3.2.3. In the above notation,

    F(x) − F(y) = [ ∫₀¹ DF(y + s(x − y)) ds ] (x − y).    (3.9)

Proof. Note that the argument of DF in the above integrand,

    γ(s) = y + s(x − y),   0 ≤ s ≤ 1,    (3.10)

defines the line segment from y to x, which is entirely contained in B̄(x₀, ρ) ⊂ U. Thus, the composition F∘γ : [0, 1] → R^{d₂} is defined and C¹. By the Fundamental Theorem of Calculus,*

    F(x) − F(y) = ∫₀¹ (d/ds)[F∘γ](s) ds.    (3.11)

According to the chain rule, (d/ds)[F∘γ] = (DF∘γ) γ′, and differentiation of (3.10) yields γ′(s) = x − y. Thus (3.9) follows.  ∎

    *Footnote: This equation involves vector-valued functions, which might worry you. Roughly speaking, calculus for vector-valued functions of a single real variable is no more complicated than the usual one-variable calculus. For example, each component of (3.11) is simply the standard Fundamental Theorem of Calculus. By contrast, calculus of functions of several real variables introduces many new complications, such as divergence, curl, multiple integrals, Green's theorem, etc. The few ideas from multi-variable calculus needed for this book, such as the chain rule, are developed in Appendix D.

Proof 2 of Proposition 3.2.2, concluded. Recalling the definition (3.8), we estimate from (3.9) that

    |F(x) − F(y)| ≤ L|x − y| ∫₀¹ ds,

which yields the required bound.  ∎
Although we do not make this an explicit exercise, we ask you to verify that the exact same construction as in the above proof supports the following extension:

Corollary 3.2.4. If U ⊂ R^{d₁} is open, if F : U → R^{d₂} is C¹, and if K ⊂ U is compact and convex, then F|K is Lipschitz.
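The Lipschitz constant (3.8) can be illustrated numerically. For the scalar example F(x) = x² on the ball B̄(0, 2), Proposition 3.2.2 predicts the constant L = max |F′| = 4; the sketch below (our illustration) samples random pairs of points and confirms that no difference quotient exceeds this value:

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.uniform(-2.0, 2.0, 10_000)
    y = rng.uniform(-2.0, 2.0, 10_000)
    mask = x != y                      # avoid division by zero

    # |x^2 - y^2| / |x - y| = |x + y| <= 4 on [-2, 2].
    quotients = np.abs(x[mask]**2 - y[mask]**2) / np.abs(x[mask] - y[mask])
    print(quotients.max())             # close to, and never above, 4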
3.2.3 Reformulation of the IVP as an integral equation

The proof of Theorem 3.2.1 is based on analyzing the equivalent integral equation (3.12) that appears in the following Proposition. The integral equation is more tractable than (3.7) because (i) integration is a much less singular operation than differentiation and (ii) the two separate equations in (3.7) are combined in a single integral equation.

In the Proposition it suffices if F is merely continuous, so we temporarily weaken our hypotheses. Also, we consider the IVP on a general interval (α, β) containing t = 0.

Proposition 3.2.5. Let U ⊂ R^d be open, and let F : U → R^d be continuous. If x ∈ C¹((α, β), U) satisfies (3.7), then x satisfies the integral relation

    x(t) = b + ∫₀ᵗ F(x(s)) ds,   α < t < β.    (3.12)

Conversely, if x is continuous on (α, β) and satisfies (3.12), then x is C¹ and satisfies (3.7).

The proof of this result is a straightforward application of the Fundamental Theorem of Calculus, and we leave it as an Exercise for the reader.

Despite its appearance, equation (3.12) is not a formula that tells us what the solution is, because we need to know x in order to evaluate the integral.
3.2.4 The contraction-mapping principle

In Chapter 2 we encountered the concept of a norm on a vector space X: i.e., a function ‖ · ‖ : X → [0, ∞) such that for any vectors x, y ∈ X and scalar c ∈ R,

    (i)   ‖x‖ ≥ 0, and ‖x‖ = 0 iff x = 0;
    (ii)  ‖cx‖ = |c| ‖x‖;
    (iii) ‖x + y‖ ≤ ‖x‖ + ‖y‖.    (3.13)

There the norms were defined on finite-dimensional spaces of vectors or matrices. In the present chapter we are interested in norms on infinite-dimensional spaces, especially on C([−β, β], R^d), the set of continuous functions from the closed interval [−β, β] into R^d, where for x ∈ C([−β, β], R^d) we define

    ‖x‖ = max_{−β≤t≤β} |x(t)|.    (3.14)

Since [−β, β] is compact, the maximum exists. In the Exercises, the reader is asked to show that (3.14) satisfies the axioms (3.13). Convergence of a sequence in C([−β, β], R^d) with respect to this norm is simply uniform convergence of a sequence of functions.

Let X, ‖ · ‖ be a normed linear space. A sequence {x_n} ⊂ X is called Cauchy if for every ε > 0 there is an integer N such that

    m, n > N  ⟹  ‖x_m − x_n‖ < ε.

The space X is called complete if every Cauchy sequence converges: i.e., there exists an element x* such that

    lim_{n→∞} ‖x_n − x*‖ = 0.

Both these concepts should be familiar to the reader from their application to the real numbers.

Proposition 3.2.6. The space C([−β, β], R^d) is complete.

This theorem is merely a restatement of the result (Theorem B.0.5 in Appendix B) that the uniform limit of a sequence of continuous functions is itself continuous.

Incidentally, a complete normed linear space is called a Banach space. Thus, in this terminology, Proposition 3.2.6 asserts that C([−β, β], R^d) is a Banach space.

After two more definitions, we will be ready to state and prove the contraction-mapping principle. Let Ω be a subset of a normed linear space X, and let T : Ω → Ω be some mapping of that set into itself. We use a Gothic letter for the mapping as a warning that it is a more complicated mathematical object than others we have encountered so far: if for example X = C([−β, β], R^d), then T needs a vector-valued function x(t), −β ≤ t ≤ β, as its argument, and the result of applying T to x, which we write as T[x] with square brackets, is also a function on [−β, β]. We shall call T a contraction if there is a constant C < 1 such that for all x, y ∈ Ω,

    ‖T[x] − T[y]‖ ≤ C ‖x − y‖;    (3.15)

in words, for T to be a contraction, it must be Lipschitz continuous with a Lipschitz constant less than unity. (Although we originally defined Lipschitz continuity for functions on R^d, the definition generalizes to any metric space, even an infinite-dimensional one as here.) Finally, we shall call a point x a fixed point of T if T[x] = x.

Theorem 3.2.7. If Ω is a closed subset of a Banach space X and if T : Ω → Ω is a contraction, then T has a unique fixed point in Ω.

Proof. Choose a vector x₀ ∈ Ω arbitrarily. Define a sequence inductively as follows: having chosen x₀, x₁, ..., x_n, let x_{n+1} = T[x_n]. Observe that

    ‖x_{n+1} − x_n‖ = ‖T[x_n] − T[x_{n−1}]‖ ≤ C ‖x_n − x_{n−1}‖.

Iterating this inequality, we deduce that

    ‖x_{n+1} − x_n‖ ≤ Cⁿ ‖x₁ − x₀‖.    (3.16)

We ask the reader to show that, since C < 1, (3.16) implies that {x_n} is Cauchy. Then since X is complete, we conclude that {x_n} has a limit x* in X; moreover, since Ω is closed, x* ∈ Ω.

We claim that x* is a fixed point of T. To see this, observe that

    T[x*] = T[lim x_n] = lim T[x_n] = lim x_{n+1} = x*,

where we have used the continuity of T to pull the limit outside the argument of T.

To show that the fixed point is unique, suppose x, y are both fixed points of T. Then

    ‖x − y‖ = ‖T[x] − T[y]‖ ≤ C ‖x − y‖,

or

    (1 − C) ‖x − y‖ ≤ 0.

Since 1 − C > 0, we deduce ‖x − y‖ ≤ 0. Then by (3.13i) we obtain x = y.  ∎
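The proof is constructive, and it is worth seeing the iteration in action on a concrete contraction. In the sketch below (our example, not from the text), T[x] = cos x maps [0, 1] into itself with contraction constant sin 1 < 1, so the iterates converge geometrically to the unique fixed point:

    import math

    # T[x] = cos(x) maps [0, 1] into itself and |T'(x)| = |sin x| <= sin 1 < 1
    # there, so Theorem 3.2.7 applies.  Iterate from an arbitrary start.
    x = 0.0
    for n in range(100):
        x = math.cos(x)
    print(x, math.cos(x) - x)   # fixed point 0.739085..., residual near machine precision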
3.2.5 Proof of the existence theorem

To prove Theorem 3.2.1 we will construct a solution of (3.7) by finding a fixed point of a mapping based on the integral equation (3.12), as follows. Choose a ball B(b, r), a neighborhood of b in R^d, such that (a) the closure B̄(b, r) is contained in the open set U and (b) the restriction of F to B̄(b, r) is Lipschitz; in symbols,

    (a) B̄(b, r) ⊂ U   and   (b) F|B̄(b, r) is Lipschitz continuous.    (3.17)

Let Ω ⊂ C([−β, β], R^d) be defined by

    Ω = { x ∈ C([−β, β], R^d) : (∀t ∈ [−β, β]) |x(t) − b| ≤ r };

in other notation, we could also write

    Ω = { x ∈ C([−β, β], R^d) : ‖x − b‖ ≤ r },

in which formula b denotes the constant function that, at every point t, equals the vector b ∈ R^d. In words, B(b, r) is the ball in the Euclidean space R^d of radius r around the vector b, while Ω is the ball in the infinite-dimensional space C([−β, β], R^d) of radius r around the constant function b. Then for any x ∈ Ω, the integrand F(x(s)) in (3.12) makes sense and is a continuous function of s. Hence we may define a mapping from Ω into C([−β, β], R^d), in symbols T : Ω → C([−β, β], R^d), by the RHS of (3.12): i.e.,

    T[x](t) = b + ∫₀ᵗ F(x(s)) ds,   −β ≤ t ≤ β.    (3.18)

This formula may involve a way of thinking unfamiliar to the reader, since T is a mapping between subsets of (infinite-dimensional) function spaces. Thus the argument of T is a function, written x without any argument, and the result T[x] is also a function. To know what function T[x] is, we have to be told its value for every point t ∈ [−β, β], and that is what (3.18) gives us.

The following two claims will allow us to apply Theorem 3.2.7 to extract a fixed point of T in C([−β, β], R^d). Application of Proposition 3.2.5 above then shows that this fixed point is a solution of (3.7) on the open interval (−β, β). In fact, our proof of Theorem 3.2.1 shows a little more than what is claimed: i.e., the solution we obtain is actually continuous on the closed interval [−β, β].

Claim 1: If β is sufficiently small, then for any x ∈ Ω, the image T[x] belongs to Ω.

In other words, although as originally defined the range of T was C([−β, β], R^d), by reducing β if needed, we may regard T as a mapping into Ω.

Proof. We need to show that for any x ∈ Ω,

    ‖T[x] − b‖ ≤ r.

From (3.18) we compute that

    (T[x] − b)(t) = ∫₀ᵗ F(x(s)) ds,

so

    |T[x] − b|(t) ≤ ∫_{[0,t]} |F(x(s))| ds.

(We replace limits of integration by the interval [0, t] to cover the case when t may be negative; even if t < 0, in our notation ∫_{[0,t]} ds = |t| > 0.) Let

    K = max_{z∈B̄(b,r)} |F(z)|.

Since x ∈ Ω, the integrand above satisfies |F(x(s))| ≤ K. Thus, observing that ∫_{[0,t]} ds ≤ β, we conclude that |T[x] − b|(t) ≤ βK, so the claim will be satisfied provided β is chosen such that βK ≤ r.  ∎

Claim 2: If β is sufficiently small, then T is a contraction.

Proof. By (3.17b), there is a Lipschitz constant L for F over B̄(b, r). Let x, y ∈ Ω be given. From (3.18),

    |T[x] − T[y]|(t) ≤ ∫_{[0,t]} |F(x(s)) − F(y(s))| ds.

By the Lipschitz property,

    |T[x] − T[y]|(t) ≤ L ∫_{[0,t]} |x(s) − y(s)| ds.

Of course

    |x(s) − y(s)| ≤ ‖x − y‖,

and estimating ∫_{[0,t]} ds ≤ β, we deduce that

    ‖T[x] − T[y]‖ ≤ βL ‖x − y‖.

Thus the claim follows if βL < 1.  ∎
3.2.6 An illustrative example

Unscrambling the proof of the fixed-point theorem, we see that the construction of the solution of the IVP ultimately comes down to the limit of an iterated sequence: x₀ is chosen arbitrarily (e.g., x₀(t) ≡ b) and subsequent x's are chosen iteratively:

    x_{n+1} = T[x_n].    (3.19)

Let us compute the iterates for the simplest of scalar IVPs,

    x′ = x,   x(0) = 1.

Equation (3.19) becomes

    x_{n+1}(t) = 1 + ∫₀ᵗ x_n(s) ds.

If we choose x₀(t) ≡ 1, then we find

    x_n(t) = 1 + t + t²/2! + ⋯ + tⁿ/n!.

In other words, the nth iterate is just the polynomial approximation of degree n to the exponential eᵗ. Thus, the iteration works very well indeed for this simple example.
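These iterates are easy to generate symbolically. The following sketch (ours, using the sympy library) carries out the iteration (3.19) for x′ = x, x(0) = 1 and reproduces the partial sums of the exponential series:

    import sympy as sp

    t, s = sp.symbols("t s")

    # Picard iteration: x_{n+1}(t) = 1 + integral_0^t x_n(s) ds.
    x = sp.Integer(1)                  # x_0(t) = 1
    for n in range(5):
        x = 1 + sp.integrate(x.subs(t, s), (s, 0, t))
    print(sp.expand(x))   # 1 + t + t**2/2 + t**3/6 + t**4/24 + t**5/120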
Some authors prove Theorem 3.2.1 directly by iteration of the integral equation (3.12), which is called Picard iteration. This approach avoids the abstractness of the contraction-mapping principle. However, in our view the contraction-mapping formalism clarifies the proof by isolating exactly what one needs to show to guarantee that the iteration may be continued indefinitely and converges: i.e., Claims 1 and 2 above.
3.2.7 Concluding remark

The following two results are mildly interesting in their own right, but more important, they are useful as tools in certain proofs below. In Exercise 1 you are given hints for proving both of them.

The first shows that, subject to a continuity hypothesis, two solutions of an ODE on adjacent intervals may be pasted together to give a solution on a larger interval.

Lemma 3.2.8. Suppose x₁, x₂ are solutions of an ODE such that

    (i)  x₁ is continuous on (α, β] and satisfies x′ = F(x) on (α, β);
    (ii) x₂ is continuous on [β, γ) and satisfies x′ = F(x) on (β, γ).

If x₁(β) = x₂(β), then the definition

    x(t) = { x₁(t) if α < t ≤ β
           { x₂(t) if β ≤ t < γ

yields a solution of the ODE on the combined interval (α, γ).

Corollary 3.2.9. If x is a C¹ solution of an ODE in an interval (α, β) that is continuous on the closed interval [α, β], then x may be extended to a solution of the equation in a slightly larger interval (α − ε, β + ε).
3.3 The uniqueness theorem

3.3.1 Gronwall's Lemma

Our first result, Gronwall's Lemma, is a simple inequality, but it provides extremely useful estimates for solutions of an ODE. In particular, we will use it to derive the uniqueness result, Theorem 3.3.4, below.

Lemma 3.3.1. Let g : [0, T] → R be continuous, and suppose there are non-negative constants C, K such that

    g(t) ≤ C + K ∫₀ᵗ g(s) ds,   0 ≤ t ≤ T.    (3.20)

Then

    g(t) ≤ C e^{Kt},   0 ≤ t ≤ T.    (3.21)

Here is a corollary of Gronwall's Lemma whose hypotheses are more intuitive: if g is differentiable and satisfies

    g′ ≤ Kg,   g(0) ≤ C,    (3.22)

then g is bounded by an exponential as in (3.21). This could easily be proved directly, and it follows from the lemma because integration of condition (3.22) yields condition (3.20). However, Gronwall's inequality does not require that g be differentiable, and this makes application of the lemma much more flexible.

Proof of Lemma 3.3.1. We define a function by the RHS of (3.20),

    G(t) = C + K ∫₀ᵗ g(s) ds.

The function G is C¹, and it satisfies

    (a) g(t) ≤ G(t)   and   (b) G′(t) = Kg(t).    (3.23)

Applying the Leibniz rule to differentiate the product e^{−Kt}G(t), invoking (3.23b) to calculate G′, and recalling (3.23a), we compute

    (d/dt)[e^{−Kt} G(t)] = e^{−Kt} Kg(t) − K e^{−Kt} G(t) = K e^{−Kt} [g(t) − G(t)] ≤ 0.

Thus, e^{−Kt}G(t) is nonincreasing, so

    e^{−Kt} G(t) ≤ G(0) = C.    (3.24)

From this it follows that

    g(t) ≤ G(t) ≤ C e^{Kt},

where we have again invoked (3.23a) for the first inequality and multiplied (3.24) by e^{Kt} for the second.  ∎

Remark: It is a minor generalization of Gronwall's Lemma to relax the hypotheses to assume that g is merely piecewise continuous. (See Exercise 5.)
3.3.2 More on Lipschitz functions

Proposition 3.3.2. Suppose U ⊂ R^{d₁} is open and F : U → R^{d₂} is locally Lipschitz. Then for any compact set K ⊂ U, the restriction F|K is (globally) Lipschitz.

Proof. For each x ∈ K, choose a ball B(x, r_x) such that (i) its closure B̄(x, r_x) is contained in U and (ii) F is Lipschitz continuous on B̄(x, r_x). The collection

    { B(x, r_x/2) : x ∈ K }

is an open cover of K. (Note that the radii here have been halved.) Choose a finite subcover of K, say B(x_j, r_j/2), j = 1, 2, ..., J; let λ_j be a Lipschitz constant for F on B(x_j, r_j) (radius not halved); and let

    L₁ = max_{j=1,...,J} λ_j,   L₂ = 4 max_{x∈K} |F(x)| / min_j r_j.

Let us show that F is Lipschitz over K with Lipschitz constant L = max{L₁, L₂}.

To prove this, suppose x, y ∈ K. The first point x belongs to one of the balls in the finite subcover, say x ∈ B(x_k, r_k/2). We consider two cases: (i) If y belongs to the full-radius ball B(x_k, r_k) with the same index, then by construction

    |F(x) − F(y)| ≤ λ_k |x − y| ≤ L₁ |x − y|.

(ii) If y lies outside B(x_k, r_k), then it may be seen from Figure 3.1 that

    |x − y| ≥ r_k/2.    (3.25)

(In the Exercises we ask the reader to supply an analytical justification of this inequality.) Of course

    |F(x) − F(y)| ≤ 2 max_K |F|;    (3.26)

we multiply the RHS of (3.26) by 2|x − y|/r_k, a quantity that by (3.25) is at least unity, to obtain

    |F(x) − F(y)| ≤ 4 max_K |F| · |x − y|/r_k ≤ L₂ |x − y|.  ∎
For future use let us record a corollary of the compactness construction in the above argument.

Corollary 3.3.3. If K ⊂ U ⊂ R^d where K is compact and U is open, then there is a larger compact set K′ ⊂ U and a δ > 0 such that for every x ∈ K, the ball B(x, δ) is contained in K′.
[Figure 3.1: Illustrating the validity of (3.25). Shown: the ball B(x_k, r_k) about x_k, the half-radius ball B(x_k, r_k/2) containing x, and the point y outside the full ball.]

Proof. Exercise.
3.3.3 The uniqueness theorem

It is convenient to have a uniqueness theorem with minimal hypotheses. For that reason we introduce the following, apparently weaker, notion of a solution of an IVP: By a solution of (3.7) in forward time we mean a continuous function x : [0, β) → U with x(0) = b that is continuously differentiable on the open interval (0, β) and satisfies the ODE there.

The greater generality of the above definition is only apparent: given a solution x in forward time, it follows from Corollary 3.2.9 that x has an extension to a solution on an open interval (−ε, β) that contains the origin. Thus, in particular, if x is a solution of the IVP in forward time, then the derivative x′ is continuous even at t = 0.

This information is useful for the integral equation characterizing solutions of the IVP,

    x(t) = b + ∫₀ᵗ F(x(s)) ds,   0 ≤ t < β.    (3.27)

The example

    x(t) = t sin(1/t),   x′(t) = −(1/t) cos(1/t) + sin(1/t)

shows that, in general, if x is continuous on [0, β) and differentiable on (0, β), then the integral ∫₀ᵗ x′(s) ds may actually be an improper integral. However, if x is a solution of x′ = F(x) in forward time, then x′ is continuous at t = 0, and the interpretation of the integral in (3.27) raises no such difficulties.
Theorem 3.3.4. Suppose that F : U → R^d in (3.7) is locally Lipschitz. Let x₁, x₂ be two solutions in forward time of the initial value problem (3.7), say defined for 0 ≤ t < β_j, j = 1, 2. Then for all t in the range 0 ≤ t < min{β₁, β₂} where both solutions are defined, x₁(t) = x₂(t).

Remark: Although we proved existence only for a short time interval, the uniqueness result applies to any interval over which the IVP happens to have a solution, no matter how long.

Proof. We want to apply Gronwall's Lemma to the function*

    g(t) = |x₁(t) − x₂(t)|.

Both solutions satisfy integral equations as in (3.27). Subtracting these and canceling the constant terms, we have

    x₁(t) − x₂(t) = ∫₀ᵗ [F(x₁(s)) − F(x₂(s))] ds,   0 ≤ t < min{β₁, β₂}.    (3.28)

Thus

    g(t) = |x₁(t) − x₂(t)| ≤ ∫₀ᵗ |F(x₁(s)) − F(x₂(s))| ds.    (3.29)

Temporarily we restrict t to an interval [0, T] where T < min{β₁, β₂}. Since [0, T] is compact, so is the union of images, K = x₁([0, T]) ∪ x₂([0, T]). Therefore by Proposition 3.3.2 there is a Lipschitz constant L for F on K. Hence for 0 ≤ s ≤ T, the integrand on the RHS of (3.29) may be estimated

    |F(x₁(s)) − F(x₂(s))| ≤ L |x₁(s) − x₂(s)| = L g(s).

Substituting into (3.29) we see that

    g(t) ≤ L ∫₀ᵗ g(s) ds.    (3.30)

Thus from Gronwall's inequality with C = 0 we deduce that g(t) ≤ 0 for 0 ≤ t ≤ T. But g is non-negative, so g ≡ 0, and thus x₁(t) = x₂(t) for 0 ≤ t ≤ T. Finally, we may take T arbitrarily close to min{β₁, β₂}, so we have equality for all t where both solutions are defined.  ∎

    *Footnote: Note that the absolute-value function is not differentiable, so it is possible that g is not differentiable. The weak hypothesis in Gronwall's Lemma greatly simplifies the proof of the Theorem. This is just one instance of how proofs in ODE, which is an old subject, have been polished over the years. Indeed, beware of reading through this and other proofs too quickly and missing the cleverness. For example, in this proof, before applying Gronwall's Lemma, we prepare for it by (i) restricting t to a large, closed subinterval of [0, β) to obtain compactness and (ii) invoking Proposition 3.3.2 to derive a Lipschitz constant that works on all of [0, T]. In this way we are able to prove uniqueness for as long as a solution exists.
Remark 1: We could derive a uniqueness result for two-sided solutions of (3.7) by modifications in the above proof, but there is a trick that requires even less effort. Let us define a solution in backward time of an IVP by making the obvious modifications of the forward-time concept. Note that x(t) is a solution in backward time if and only if the function y(t) = x(−t) is a solution in forward time of the equation y′ = −F(y). Since −F is locally Lipschitz if F is, by applying Theorem 3.3.4 to the IVP for y′ = −F(y) in forward time, we may derive uniqueness for the original equation in backward time. Of course, uniqueness in both forward and backward time gives uniqueness of two-sided solutions.

Remark 2: Because of the uniqueness theorem, two solutions of x′ = F(x) can never cross one another.

Remark 3: The uniqueness theorem states that two solutions of x′ = F(x) that start from the same initial conditions coincide for as long as they both exist. Gronwall's Lemma may also be used to show that two solutions of x′ = F(x) that start from nearby initial conditions diverge from one another at most exponentially fast. (Cf. Theorem 4.4.1.) We wait until Chapter 4 to pursue this, since a more satisfactory result can be obtained using information about global existence that will be developed in Section 4.2.
3.4 Generalizations

3.4.1 Nonautonomous systems

Both the existence and uniqueness theorems generalize to nonautonomous IVPs, say

    x′ = F(x, t),   x(0) = b.    (3.31)

Suppose F is defined on U × I where U ⊂ R^d is open and I ⊂ R is an open interval. To avoid trivialities, we assume that b ∈ U and 0 ∈ I. We shall say that F is locally uniformly Lipschitz if for every (x₀, t₀) ∈ U × I there is a neighborhood V × J of (x₀, t₀) and a constant L such that

    (x₁, x₂ ∈ V) and (t ∈ J)  ⟹  |F(x₁, t) − F(x₂, t)| ≤ L |x₁ − x₂|.    (3.32)

Theorem 3.4.1. If F : U × I → R^d is locally uniformly Lipschitz, then there is an interval (−β, β) and a C¹ function x : (−β, β) → U that satisfies (3.31).

Theorem 3.4.2. Suppose that F : U × I → R^d is locally uniformly Lipschitz. Let x₁, x₂ be two solutions in forward time of the initial value problem (3.31), say defined for 0 ≤ t < β_j, j = 1, 2. Then for all t in the range 0 ≤ t < min{β₁, β₂}, x₁(t) = x₂(t).

Both these results may be proved by imitating the analogous proofs for the autonomous case. Indeed, adapting the proofs of Theorems 3.2.1 and 3.3.4 to nonautonomous problems is probably a better way to understand the autonomous case than just reading the proofs given in the text. The Exercises invite you to perform this task.

Note that, to shorten the notation in (3.31), we have imposed the initial condition at t = 0. Unlike for autonomous equations, solutions of nonautonomous equations do not have translational invariance. Thus, imposition of an initial condition at a different time, say x(t₀) = b, is strictly speaking a different problem. However, no real generality is lost by assuming t₀ = 0 in (3.31), since the general case can easily be reduced to (3.31) by an appropriate translation.
3.4.2 Linear systems

Stronger results are available for linear systems with variable coefficients, say an IVP

    x′ = A(t)x + g(t),   x(0) = b.    (3.33)

Theorem 3.4.3. Suppose the coefficient matrix A(t) and the inhomogeneous term g(t) in (3.33) are continuous in the (possibly infinite) open interval (T₁, T₂), which contains t = 0; then this IVP has a unique solution that exists for T₁ < t < T₂.

In particular, solutions of a linear system do not blow up in finite time, no matter how quickly ‖A(t)‖ or |g(t)| may grow with t. Also note that Lipschitz continuity need not be explicitly assumed in the theorem.

Like Theorem 3.2.1, this result may be proved with Picard iteration. It turns out that, for a linear problem, the iteration converges for arbitrarily large times! We give hints for showing this in Exercise 4. We recommend the exercise since it provides a useful perspective that will enhance your understanding of the proof of Theorem 3.2.1.
3.5 Exercises

3.5.1 Exercises to consolidate your understanding

1. Supply details omitted in the text:

(a) Consider the scalar IVP

    x′ = |x|^p,   x(0) = b,

where p > 1 and b > 0. By solving the problem explicitly, show that the solution blows up in finite time. What is the behavior of the solution as t → ∞ if b < 0?

(b) Verify that the functions (3.5) are C¹ solutions of (3.4).

(c) Show that F(x) = √|x| is not Lipschitz on R. Show that F(x) = |x| is (globally) Lipschitz on R but not differentiable. Show that F(x) = x² is locally Lipschitz but not globally Lipschitz.

(d) Prove Proposition 3.2.5.

(e) Show that the definition (3.14) satisfies the axioms (3.13).

(f) Show that, since C < 1, equation (3.16) implies that the sequence {x_n} is Cauchy.

(g) Prove Lemma 3.2.8.

    Hint: You may proceed in either of two ways: You can show that the one-sided limits of x′ are equal at β and proceed from there, or alternatively you can show that the integral equation (3.12) holds for all t ∈ (α, γ), including for t = β, and then invoke Proposition 3.2.5.

(h) Prove Corollary 3.2.9.

    Hint: It suffices to consider the upper end-point t = β. Apply Theorem 3.2.1 to solve an IVP y′ = F(y) on β − ε < t < β + ε with initial condition y(β) = x(β). Use Lemma 3.2.8 to obtain a solution on (α, β + ε).

(i) Give an analytical proof of (3.25).

(j) Prove Corollary 3.3.3.
2. Are the following functions Lipschitz continuous on the indicated sets? (All of these examples are scalar-valued functions. Vector-valued functions would not pose any additional difficulties except the need to examine more components.)

(a) (x + 2)^{1/3} on the interval [−1, 1]? On R?

(b) (x² + 2)^{1/3} on the interval [−1, 1]? On R?

(c) (x² + 2)^{1/3} sin(eˣ) on the interval [−1, 1]? On R?

(d) |x + y² − 2| on the square |x| ≤ 2, |y| ≤ 2? On R²?

    Hint: Observe that the function is the composition of x + y² − 2 with the absolute-value function. Show that the composition of two Lipschitz functions is Lipschitz continuous. Thus, |x + y² − 2| is Lipschitz on the indicated set if x + y² − 2 is.

(e) √(x² + 1)/(x² + y² − 1) on the bounded annulus 2 ≤ x² + y² ≤ 8? On the unbounded annulus 2 ≤ x² + y² < ∞?

    Discussion: The indicated function is certainly not Lipschitz on R²; indeed, because the denominator goes to zero on the unit circle, this function is not even continuous. Propositions 3.2.2 and 3.3.2 may be used to show this function is Lipschitz on the bounded annulus. More thought is required to analyze the unbounded annulus: have fun!
3. (a) Prove Theorem 3.4.1.

(b) Prove Theorem 3.4.2.

    Hint: Structure your proofs by following the proofs of Theorems 3.2.1 and 3.3.4. For Theorem 3.4.2 you will have to prove an analogue of Proposition 3.3.2 regarding the restriction of a locally uniformly Lipschitz function to a compact set. Please do these problems; they are more useful for understanding the proofs in the text than reading the text.
4. This exercise addresses the derivation of the special behavior of linear equations, Theorem 3.4.3. According to Proposition 3.2.5, a continuous function x : (T₁, T₂) → R^d satisfies (3.33) if and only if it satisfies the integral equation

    x(t) = b + ∫₀ᵗ [A(s)x(s) + g(s)] ds,   T₁ < t < T₂.    (3.34)

We outline an argument that (3.34) has a unique solution for t ∈ I, where I is an arbitrary compact subinterval of (T₁, T₂), which will establish the desired result.

Let us extract a linear operator L : C(I, R^d) → C(I, R^d) from (3.34),

    L[x](t) = ∫₀ᵗ A(s)x(s) ds.

We may then rewrite (3.34) as

    (I − L)x = b + G,

where G(t) = ∫₀ᵗ g(s) ds. The crux of the argument is to show that I − L is invertible on the space C(I, R^d) by proving that the infinite series

    (I − L)⁻¹ = I + L + L² + L³ + ⋯

converges. For this, first show that

    Lⁿ[x](t) = ∫₀ᵗ ds₁ ∫₀^{s₁} ds₂ ⋯ ∫₀^{s_{n−1}} ds_n A(s₁)A(s₂) ⋯ A(s_n) x(s_n).

Let K = max_{t∈I} ‖A(t)‖. Estimating maxima in the above equation, conclude that as an operator on C(I, R^d),

    ‖Lⁿ‖ ≤ Kⁿ ∫₀ᵗ ds₁ ∫₀^{s₁} ds₂ ⋯ ∫₀^{s_{n−1}} ds_n ≤ Kⁿ |I|ⁿ / n!,

where |I| denotes the length of I. Thus one may prove that the series for (I − L)⁻¹ converges by comparison with the series for e^{K|I|}.
3.5.2 Exercises used elsewhere in this book

5. In this Exercise we derive two generalizations of Gronwall's inequality, Lemma 3.3.1.

(a) Show that the estimate (3.21) continues to hold if g is assumed merely to be piecewise continuous.

    Discussion: A function g is called piecewise continuous on an interval I if there is a finite set of points {a_j : j = 1, ..., J} in I such that (i) g is continuous on I ∖ {a_j} and (ii) at each point a_j the one-sided limits of g exist (and are finite).

(b) Show that if g : [0, T] → R is continuous and if there are non-negative constants C, B, K such that

    g(t) ≤ C + Bt + K ∫₀ᵗ g(s) ds,   0 ≤ t ≤ T,

then

    g(t) ≤ C e^{Kt} + B (e^{Kt} − 1)/K,   0 ≤ t ≤ T.    (3.35)

    Hint: One approach for Part (b) is to imitate the proof of Lemma 3.3.1. A more elegant alternative, which does not require re-examining the proof of Lemma 3.3.1, involves applying Gronwall's inequality to the function h(t) = g(t) + B/K.
3.5.3 Computational exercise

6. Consider the linear system

    x′ = y
    y′ = −(1/4 + ε cos t) x,

which comes from writing Mathieu's equation (1.5b) as a first-order system, assuming δ = 1/4. Solve an IVP for this equation, say with ε = 0.02, over a long time interval, say 0 ≤ t ≤ 1000. Exact initial conditions don't matter greatly, but if you want a suggestion, try x(0) = 1, y(0) = 0.

    Discussion: This exercise is intended as an antidote to complacency: Based on Theorem 3.4.3 one might regard linear equations as boring, but contrast the behavior you find here with that of a constant-coefficient analogue of Mathieu's equation, x″ + x/4 = 0. Incidentally, note that the coefficients in the above system are periodic functions of time. Linear systems with periodic coefficients are analyzed in Floquet theory, which we discuss in Chapter 6.
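A minimal Python sketch for this computation (our choice of solver and tolerances; any comparable tool will do):

    import numpy as np
    from scipy.integrate import solve_ivp

    eps = 0.02

    def mathieu(t, z):
        x, y = z
        return [y, -(0.25 + eps * np.cos(t)) * x]

    sol = solve_ivp(mathieu, (0.0, 1000.0), [1.0, 0.0],
                    rtol=1e-9, atol=1e-12)

    # For the constant-coefficient analogue x'' + x/4 = 0 the amplitude
    # would remain equal to 1 forever; here one finds instead that it
    # grows slowly but relentlessly (parametric resonance; cf. the
    # Floquet theory of Chapter 6).
    print(np.abs(sol.y[0]).max())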
3.5.4 Exercises of independent interest

7. (a) Find the general solutions of the equations

    tx′ ± x = 0

for t > 0. These may be solved either as first-order linear equations or as separable equations.

    Discussion: Note that writing either equation in standard form yields x′ = F(x, t) where F(x, t) = ∓x/t is singular at t = 0. The remainder of this exercise illustrates how badly such equations may behave.

(b) Deduce from your solution in Part (a) that the IVP

    tx′ + x = 0,   x(0) = b

has no solutions if b ≠ 0.

(c) Deduce from your solution in Part (a) that the IVP

    tx′ − x = 0,   x(0) = b

has no solutions if b ≠ 0 and has infinitely many solutions if b = 0.
8. Here is a physical situation that gives rise to the backwards version of our nonuniqueness example, (3.4). Consider a partially filled bucket that has a hole in its bottom. Under certain simplifying assumptions, the height h(t) of the water in the bucket satisfies

    dh/dt = −C√h.

We refer to p. 191 of Hubbard and West for details, but briefly the derivation is (i) dh/dt is proportional to the speed v with which the water emerges from the bucket and (ii) if friction is neglected, the kinetic energy (i.e., v²) of the emerging water is proportional to the loss of potential energy (i.e., h). Without loss of generality we may scale time so that C = 1 in this equation.

(a) Apply separability to solve the IVP

    dx/dt = −√x,   x(0) = b,    (3.36)

where b > 0. Observe that after some finite time, x(t) reaches zero. On physical grounds one knows x(t) ≡ 0 for all later times, and this of course satisfies the equation.

(b) Argue from Theorem 3.3.4 that the solution of (3.36) is unique in backward time.

(c) Show that the solution of (3.36) is unique in forward time.

    Hint for Part (c): On physical grounds we need consider only non-negative solutions of this problem. Suppose that x(t) and y(t) are both C¹, non-negative solutions of (3.36). Then

        (d/dt)[x(t) − y(t)]² = −2[x(t) − y(t)][√x(t) − √y(t)] ≤ 0,

    where the inequality may be derived by considering x ≥ y and y ≥ x as two separate cases. Thus [x(t) − y(t)]² ≤ [x(0) − y(0)]² = 0.

    Remark: The IVP (in forward time) for (3.4) may be interpreted in the present context as, "Suppose you come into the room and see the bucket is empty; how full was the bucket an hour ago?" In other words, of course the solution in backwards time is not unique.
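As a numerical companion to Part (a): with b = 1 the separable solution is x(t) = (√b − t/2)² up to the finite emptying time t = 2√b, and zero thereafter. The sketch below (ours) compares an adaptive solver against this formula; the small max_step and the max() guard help the integration through the corner at t = 2√b:

    import numpy as np
    from scipy.integrate import solve_ivp

    b = 1.0   # initial height; the bucket empties at t = 2*sqrt(b) = 2

    sol = solve_ivp(lambda t, x: -np.sqrt(np.maximum(x, 0.0)),
                    (0.0, 4.0), [b], max_step=0.01)

    # Exact solution: (sqrt(b) - t/2)^2 before emptying, 0 afterwards.
    exact = np.maximum(np.sqrt(b) - sol.t / 2.0, 0.0) ** 2
    print(np.abs(sol.y[0] - exact).max())   # small; the corner limits accuracy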
9. Here is a more dramatic example showing that a Lipschitz function need not be continuously differentiable. Define the saw-tooth function S as the periodic extension (with period 2) of

    S(x) = |x| if |x| ≤ 1.

Let

    F(x) = Σ_{n=1}^∞ (1/n³) S(nx).

Show that F is Lipschitz continuous with Lipschitz constant L = Σ_{n=1}^∞ 1/n².

    Discussion: It is clear that F is not C¹; indeed, F′ fails to exist at all rational values of x. It might seem like F is nowhere differentiable, but in fact this is not true: the limit defining the derivative F′ exists at all irrational points. Although it is not important for ODE, you might enjoy proving this.
3.6 Additional notes

Although we have assumed Lipschitz continuity of F to prove the existence of solutions of x′ = F(x), this is not necessary; in fact, continuity of F suffices. This is proved, for example, in Birkhoff-Rota, Chapter 6, Section 14. (Regarding uniqueness, however, mere continuity does not suffice, as we saw in Section 3.1.)

Incidentally, Lipschitz continuity is only slightly weaker than differentiability. In particular, a Lipschitz continuous function is differentiable "almost everywhere"; this concept has a precise mathematical meaning in terms of the theory of Lebesgue integration. (Cf. Problem 9.)
Chapter 4

Nonlinear systems: global theory

Theorem 3.2.1 proved that a solution to the IVP exists for what might be an extremely short time. Typically the ODEs that arise in physical applications possess solutions for much larger times than can be deduced with the contraction-mapping principle, and in this chapter we introduce methods for demonstrating this behavior. The technique is based on extending a short-time solution obtained from Theorem 3.2.1. The main tool for such extensions, Theorem 4.1.2, is proved in Section 4.1. Two results that guarantee that an IVP in fact has a solution for all time are given in Section 4.2. In Section 4.3 we introduce nullclines, which are an important technique in their own right and are also useful in applying the existence theorem from Section 4.2.

In Sections 4.4 and 4.5 we show that the solution of an IVP depends continuously and differentiably on its initial conditions. While theorems of this type could have been formulated in Chapter 3, by using Theorem 4.1.2 we are able to obtain much more satisfactory results.
4.1 The maximal interval of existence

Our first result asserts that there is a maximal interval on which the solution of an IVP exists, a sort of gold standard for solutions.

Proposition 4.1.1. Let F : U → R^d be locally Lipschitz on the open set U ⊂ R^d. Given b ∈ U, there is a solution x* : (α*, β*) → U of the IVP

    x′ = F(x),   x(0) = b    (4.1)

that is maximal in the following sense: If x solves (4.1) for t in some open interval I, then

    (i) I ⊂ (α*, β*)   and   (ii) x(t) = x*(t) if t ∈ I.    (4.2)

Remark: It often happens that either α* or β*, or both, equals infinity. For example, for the IVP (3.2), the maximal interval of existence is (−∞, 1). Note that the maximal interval of existence is always open, even if α* or β* is finite.

Proof. We focus only on β* and t ≥ 0, leaving the analogous treatment of α* and t ≤ 0 for the dedicated reader. Let

    β* = sup{ β : (4.1) is solvable in forward time for 0 ≤ t < β }.

Of course by Theorem 3.2.1, β* > 0. We consider separately the cases when β* is finite or infinite.

Case 1: β* < ∞. For n = 1, 2, ..., choose solutions x_n of (4.1) that exist for times t ∈ [0, β_n) where β_n > β* − 1/n. To define x*, given t ∈ [0, β*) choose any n such that β_n > t and let

    x*(t) = x_n(t).    (4.3)

By Theorem 3.3.4, the uniqueness result, the definition (4.3) does not depend on the choice of n, and moreover x* is a solution of (4.1). It is readily checked that any solution x of (4.1) on some interval I satisfies properties (i) and (ii) of (4.2).

Case 2: β* = ∞. In this case one may choose a sequence of solutions existing for times β_n > n and proceed as above.  ∎
Despite its somewhat wimpy appearance, the following result is the main workhorse in extending solutions to larger times. Of course there is an analogous result for negative time.

Theorem 4.1.2. Suppose x*, the maximal solution of (4.1), exists only for times t < β* where β* < ∞. Then for any compact set K ⊂ U, there is a time t ∈ [0, β*) such that x*(t) ∉ K.

Proof. We argue by contradiction: Suppose that x*(t) ∈ K for all t ∈ [0, β*). Recall from Proposition 3.2.5 that x* satisfies the integral equation

    x*(t) = b + ∫₀ᵗ F(x*(s)) ds,   0 ≤ t < β*.

The integrand F(x*(s)) is bounded on [0, β*) by max_K |F|. The existence of the one-sided limit

    I = lim_{t→β*} ∫₀ᵗ F(x*(s)) ds    (4.4)

follows easily from this fact (see hint in Exercise 1a). Thus, the definition x*(β*) = b + I gives an extension of x* to the closed interval [0, β*] that is continuous there. By Corollary 3.2.9, the IVP (4.1) has a solution on a larger interval [0, β* + ε), contradicting the assumption that β* was maximal.  ∎
In the next result we strengthen the conclusion of Theorem 4.1.2 from "there exists a time ..." to "for all sufficiently large times ...".

Corollary 4.1.3. Suppose x*, the maximal solution of (4.1), exists only for times t < β* where β* < ∞. Then for any compact set K ⊂ U, there is a time τ < β* such that x*(t) ∉ K for all t with τ < t < β*. In particular, if F(x) is defined for all x ∈ R^d, then for the maximal solution |x*(t)| tends to infinity as t → β*.

In the Exercises we provide the reader hints for proving the corollary, but we challenge him/her to try to prove it without consulting the hints. In teaching the course, we have found that student efforts to do this are highly educational.
4.2 Two sufficient conditions for global existence

4.2.1 Linear growth of the RHS

Our first result gives existence for all times, positive and negative.

Theorem 4.2.1. If F : R^d → R^d is locally Lipschitz and if there exist nonnegative constants B, K such that

    |F(x)| ≤ K|x| + B,   x ∈ R^d,    (4.5)

then the solution x(t) of (4.1) exists for all time, −∞ < t < ∞, and moreover

    |x(t)| ≤ |b| e^{K|t|} + (B/K)(e^{K|t|} − 1),   −∞ < t < ∞.    (4.6)

Proof. This proof will use the generalization of Gronwall's Lemma given in Exercise 5 of Chapter 3. We consider only forward time, t ≥ 0. Suppose (4.1) has a solution for t ∈ [0, β), which of course satisfies the integral equation

    x(t) = b + ∫₀ᵗ F(x(s)) ds,   0 ≤ t < β.

Defining g(t) = |x(t)|, we deduce that

    g(t) ≤ |b| + ∫₀ᵗ [K g(s) + B] ds,   0 ≤ t < β.

Hence by the generalized Gronwall Lemma, x satisfies the estimate (4.6) on its entire domain of existence, 0 ≤ t < β.

Now let x* be the maximal solution of (4.1), and suppose β* < ∞. According to (4.6), for all t ∈ [0, β*) the point x*(t) belongs to the compact ball

    { z ∈ R^d : |z| ≤ |b| e^{Kβ*} + (B/K)(e^{Kβ*} − 1) }.

This estimate contradicts Theorem 4.1.2, so we must have β* infinite.  ∎

Like many other results in ODE, this theorem has an analogue for a nonautonomous equation, x′ = F(x, t); see Exercise 4. Indeed, adapting the proof of Theorem 4.2.1 to the nonautonomous case may be a better way to understand the proof than simply reading it.
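The bound (4.6) is sharp in simple cases. In the sketch below (our example), the scalar F(x) = Kx + B saturates hypothesis (4.5) for x ≥ 0, and the numerical solution matches the right-hand side of (4.6) to solver tolerance:

    import numpy as np
    from scipy.integrate import solve_ivp

    K, B, b = 1.0, 2.0, 0.5

    # x' = Kx + B with x(0) = b >= 0 has the exact solution
    # x(t) = b e^{Kt} + (B/K)(e^{Kt} - 1), which is the bound (4.6).
    sol = solve_ivp(lambda t, x: K * x + B, (0.0, 3.0), [b], rtol=1e-10)
    bound = b * np.exp(K * sol.t) + (B / K) * (np.exp(K * sol.t) - 1.0)
    print(np.abs(sol.y[0] - bound).max())   # ~ 0: the estimate is attained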
4.2.2 Trapping regions

Our second result gives global existence, but only in forward time, for an ODE that need be defined only on an open subset U of R^d. Let K be a compact subset of U whose boundary ∂K is a C¹ surface, and for any x ∈ ∂K, let N_x be an inward normal to ∂K at x. We shall call K a trapping region for x′ = F(x) if

    (∀x ∈ ∂K)   ⟨F(x), N_x⟩ ≥ 0,    (4.7)

where ⟨ , ⟩ is the inner product. In words, the direction of the flow of the ODE at ∂K is inward, or at worst tangential or zero. The following existence theorem explains the reason for the name "trapping region."

Theorem 4.2.2. Suppose that F : U → R^d is C¹ and that K is a trapping region for x′ = F(x). If the initial data b lies in the interior of K, then the solution x of equation (4.1) exists for all positive time and moreover lies in the interior of K.

In less formal language, we may restate the theorem: If in a picture it looks like the solution is confined to a region K, then it really is confined to that region. In the next section we use it to derive global existence for several specific ODEs. Other applications of the theorem are given in the Exercises. In Exercise 6 you will show that global existence is still obtained if b belongs to the boundary of a trapping region.

The proof is much easier if the inequality (4.7) is strict, so we begin by considering separately the special case with the augmented hypothesis

    (∀x ∈ ∂K)   ⟨F(x), N_x⟩ > 0.    (4.8)

Breaking the proof into cases makes it longer but (we hope) more readable.

Case A: Proof of Theorem 4.2.2 assuming (4.8). Let

    t* = sup{ t : (∀s ≤ t) x(s) ∈ Int K }.    (4.9)
[Figure 4.1: Schematic of a trapping region K and the inward normal vector N at a hypothetical exit point x* on the boundary of K, where the boundary is given locally by x₁ = ψ(x₂). The neighborhood V and the function ψ of (4.17) are also shown.]
By the local existence theorem, t* > 0. We suppose t* < ∞ and look for a contradiction. We ask you to prove that x(t*) ∈ ∂K. For brevity we write x* = x(t*) and N* = N_{x*}.

Since ∂K is smooth, it may be described locally as the zero set of a smooth function. Thus, there is a neighborhood V of x* and a function Φ on V such that

    ∂K ∩ V = { x ∈ V : Φ(x) = 0 }.    (4.10)

Now ∇Φ is a normal to ∂K, and replacing Φ by −Φ if necessary, we may assume that ∇Φ is an inward normal. Thus, Φ is non-negative on K, or in symbols,

    K ∩ V = { x ∈ V : Φ(x) ≥ 0 }.

By continuity, there is an interval [t* − δ, t*] such that for t in this interval x(t) lies in the neighborhood V of x*. We derive a contradiction by considering the function

    g(t) = Φ(x(t)),   t* − δ ≤ t ≤ t*.

On the one hand, g(t) ≥ 0 for t* − δ ≤ t < t* while g(t*) = 0, and differentiating,

    g′(t*) = lim_{ε→0⁺} [g(t*) − g(t* − ε)]/ε ≤ 0.    (4.11)

On the other hand, by the chain rule,

    g′(t*) = ⟨∇Φ(x*), x′(t*)⟩ = ⟨∇Φ(x*), F(x*)⟩.    (4.12)

Combining (4.11) and (4.12), we deduce that ⟨∇Φ(x*), F(x*)⟩ ≤ 0, which contradicts (4.8).  ∎
If the hypothesis (4.8) is eliminated, the strategy of the proof remains the same, but we have to work harder to derive a contradiction. To prepare for the general case, let us first prove the result in one dimension.

Case B: Proof of Theorem 4.2.2 in one dimension assuming only (4.7). The hypothesis that a closed interval K = [a₁, a₂] is a trapping region for a scalar ODE x′ = F(x) means that

    F(a₁) ≥ 0   and   F(a₂) ≤ 0.    (4.13)

In analogy with (4.9), we suppose that

    t* = sup{ t : (∀s ≤ t) a₁ < x(s) < a₂ } < ∞    (4.14)

and look for a contradiction.

First we focus on a₁. Observe from the Fundamental Theorem of Calculus that

    F(x) = F(a₁) + ∫_{a₁}^x F′(y) dy.

Using (4.13) to drop F(a₁), we deduce that

    F(x) ≥ −K(x − a₁)   for x ∈ [a₁, a₂],    (4.15)

where K = max_{[a₁,a₂]} |F′|. Then considering the solution x(t) on the interval 0 ≤ t ≤ t*, we compute with the Leibniz rule that

    (d/dt)[e^{Kt}(x(t) − a₁)] = e^{Kt} F(x(t)) + K e^{Kt}(x(t) − a₁) ≥ 0,

where for the final inequality we have invoked (4.15) to estimate the first term. Therefore*

    x(t*) − a₁ ≥ (x(0) − a₁) e^{−Kt*} > 0.

Applying similar considerations near x = a₂, we conclude that if t* < ∞, then

    a₁ < x(t*) < a₂.

This contradiction proves the one-dimensional result.  ∎

    *Footnote: This estimate has much in common with the proof of Gronwall's Lemma (3.3.1), but note that here we are estimating a solution from below rather than from above.

Case C: Proof of Theorem 4.2.2 in d dimensions assuming only (4.7). As in Case A, suppose that t*, defined by (4.9), satisfies t* < ∞.
Again we characterize ∂K near x* as the zero set of a function Φ as in (4.10). However, to carry out the necessary estimates it is desirable to massage Φ into a special form, as follows.

Relabeling the components of x and reflecting x₁ if necessary, we may assume without loss of generality that the first component of the inward normal ∇Φ(x*) to ∂K at x* is positive (see Figure 4.1); in symbols,

    ⟨∇Φ(x*), e₁⟩ = (∂Φ/∂x₁)(x*) > 0.    (4.16)

Then by the Implicit Function Theorem, we may solve the equation

    Φ(x₁, x₂, ..., x_d) = 0

for x₁ as a function of the remaining coordinates: i.e., there is a neighborhood V of x*, say with V ⊂ U, and a C¹ function ψ(x₂, ..., x_d) on V such that

    ∂K ∩ V = { (x₁, x̃) ∈ V : x₁ = ψ(x̃) },    (4.17)

where x̃ is a shorthand for (x₂, ..., x_d). Thus, if ∇̃ indicates a (d − 1)-component vector of partial derivatives with respect to x₂, ..., x_d, then N = (1, −∇̃ψ(x̃)) is normal to ∂K; and since by assumption the first component of the inward normal is positive, N is an inward normal. Moreover, the set K lies in the direction of increasing x₁; hence

    (Int K) ∩ V = { (x₁, x̃) ∈ V : x₁ > ψ(x̃) }.

Let

    g(t) = x₁(t) − ψ(x̃(t)),   t* − δ ≤ t ≤ t*,    (4.18)

where δ > 0 is chosen so that x(t) ∈ V for t in this interval. Then g(t) > 0 for t* − δ ≤ t < t* while g(t*) = 0. Now it follows from the chain rule that

    g′(t) = G(x(t)),    (4.19)

where

    G(x) = F₁(x) − Σ_{j=2}^d (∂ψ/∂x_j)(x̃) F_j(x).    (4.20)

Note that if x = (ψ(x̃), x̃) ∈ ∂K, then

    G(x) = ⟨F(x), N_x⟩ ≥ 0,    (4.21)

the latter inequality by the hypothesis (4.7).
Lemma 4.2.3. There is a constant K such that for all x ∈ K ∩ V,

    G(x) ≥ −K [x₁ − ψ(x̃)].

Proof. We add and subtract G(ψ(x̃), x̃) to G(x):

    G(x₁, x̃) = [G(x₁, x̃) − G(ψ(x̃), x̃)] + G(ψ(x̃), x̃).    (4.22)

By (4.21) the second term here is non-negative; dropping it we have

    G(x₁, x̃) ≥ G(x₁, x̃) − G(ψ(x̃), x̃).    (4.23)

By the Fundamental Theorem of Calculus,* the RHS of (4.23) may be rewritten

    G(x₁, x̃) − G(ψ(x̃), x̃) = ∫_{ψ(x̃)}^{x₁} (∂G/∂x₁)(s, x̃) ds ≥ −K [x₁ − ψ(x̃)],

where K is a bound for the integrand for x ∈ V, from which the lemma follows.  ∎

    *Footnote: Although G(x₁, x̃) is merely continuous with respect to x̃ (because of ∇̃ψ), it may be seen from (4.20) that G(x₁, x̃) is C¹ with respect to x₁. Indeed, we massaged Φ into the form (4.17) precisely to obtain this extra modicum of smoothness.

Proof of Theorem 4.2.2, concluded. Combining (4.18), (4.19), and Lemma 4.2.3, we deduce that g′ ≥ −Kg. Differentiating e^{Kt}g we may show that

    g(t*) ≥ e^{−Kδ} g(t* − δ) > 0,

which contradicts the above construction, in which g(t*) = 0.  ∎

Theorem 4.2.2 may be generalized by allowing trapping regions to have corners, which provides a much more versatile result. Restricting attention to two dimensions, for example, this generalization means there may be points where ∂K is defined by two different curves. At such a singular boundary point, we require that inequality (4.7) be satisfied for both normals. Sad to say, the proof of this generalization is neither trivial nor rewarding, but in the Exercises we shall explore it further in the context of specific examples.

Like many other results for autonomous equations, Theorem 4.2.2 may be generalized to non-autonomous equations. However, in practice such a result turns out not to be terribly useful, and we do not develop this idea.

4.3 Nullclines and trapping regions

We introduce nullclines through the following examples. After defining this term in the discussion of the first example, we apply nullclines to construct trapping regions.
Here and below, whenever feasible, we choose physically meaningful equations to
illustrate the theory, either including or referencing an explanation of the origins of
the equation. Most of these examples will re-occur in later chapters.
4.3.1 An activator-inhibitor system
Our rst example is the 2 2 system
(a) x

=
1
1+r
x
2
1+x
2
x
(b) r

= [
x
2
1+x
2
r]
(4.24)
where , , are positive parameters. The variables x and r represent the concen-
tration of two chemical species that evolve through their interaction inside a reaction
vessel. In physical terms, x promotes its own production, or in mathematical terms,
the factor x
2
/(1 + x
2
) in (4.24a) increases with x. At the same time x promotes
the production of r which in turn inhibits the production of x. Of course the linear
terms in each equation represent the decay of the two species. (See Problem ?? in
Appendix D for more information about this system.)
In the context of (4.24), the term nullcline refers to a curve where one of the two
components of the velocity vector (x′, r′) vanishes. By (4.24a), x′ vanishes if
x = 0   or   r = αx/(1 + x²) − 1,   (4.25)
while by (4.24b), r′ vanishes if
r = γ x²/(1 + x²).   (4.26)
Both curves are graphed in Figure 4.2(a); we have shown the more interesting case
when α is large enough (α² > 4(γ + 1)) that the two nullclines have multiple inter-
sections (bold dots in the figure). Such points (a, b) where the nullclines intersect
represent equilibrium solutions of the ODE: i.e., points such that the constant func-
tion (x(t), r(t)) ≡ (a, b) is a solution. Thus, for the case of large α as shown in the
figure, (4.24) has three equilibrium solutions, the trivial one (0, 0) and two nontrivial
ones.
To extract information from the nullclines, it is helpful to proceed in the three
stages represented in Figures 4.2(a), 4.2(b), and 4.2(c). Whenever you need to graph
nullclines, we urge you to proceed in these three stages; over the years we have found
doing so organizes one's thoughts for maximum efficiency.
In Figure 4.2(a), vertical lines have been drawn along the curve (4.25) because
the flow is vertical there: i.e., x′ = 0. Similarly, horizontal lines have been
drawn along (4.26).
Figure 4.2(b) augments the previous figure by specifying along (4.25) whether
the flow is up or down, and along (4.26) whether the flow is to the right or
left. Here is the thinking behind constructing this figure. The orientation of
the flow along (4.25) changes from up to down whenever this curve crosses a
nullcline of the other family. More particularly, on any segment of (4.25) that
does not intersect (4.26), the orientation of the flow does not change. Often
one can fill in these arrows starting from the determination of the sign of the
flow in some extreme case. For example, it may be seen from (4.24b) that r′ is
positive at the point where (4.25) crosses the positive x-axis (i.e., where r = 0),
and thus the flow is upward at this point. The flow remains upward as one
moves away from this point, as long as the other nullcline (4.26) is not crossed.
More generally, all the remaining arrows along (4.25) in Figure 4.2(b) may be
constructed from this starting case by reversing the direction every time (4.25)
crosses (4.26). Similarly, by (4.24a), x′ is negative near the origin (in the first
quadrant), and as before, all the arrows along (4.26) in Figure 4.2(b)
may be constructed by building on this starting case.
The nullclines partition the (x, r)-plane into regions. Within one region the flow
F(x, r) points into one of the four quadrants {±x > 0, ±r > 0}, and the
quadrant remains the same if (x, r) moves within this region. The quadrant
of the flow in each of the regions is indicated by a thick black arrow in Fig-
ure 4.2(c). Again, one can complete this figure by understanding an extreme
case and making appropriate reversals on crossing nullclines.
Construction of a trapping region: Consider a rectangular region
R = {(x, r) : 0 ≤ x ≤ A, 0 ≤ r ≤ B}   (4.27)
as illustrated in Figure 4.3. Pictorially, it may be seen from the figure that the flow
along each side of R is inward provided A and B are large enough. Analytically, in
Exercise 1 we ask you to determine explicit estimates for A, B to ensure this.
We now claim that, for any initial conditions (a, b) with a > 0, b > 0, the
IVP for (4.24) has a solution for all positive time that remains in the first quadrant.
Given (a, b), we may choose A, B large enough so that (i) the flow is inward along the
boundary of R and (ii) (a, b) belongs to R. The direct application of Theorem 4.2.2
to prove the claim is prevented by the fact that the boundary of R is only piecewise
smooth, but in Exercise 7 we indicate how to overcome this difficulty.
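For readers who like to confirm such geometric claims numerically, here is a minimal
Python sketch (our own illustration, not part of the proof) that samples the vector
field (4.24) along the four sides of R and records the smallest dot product with the
inward normal. The parameter values α = 4, γ = 1, ε = 1 and the box A = B = 5
are assumptions chosen for the experiment.

    import numpy as np

    # Vector field of (4.24); the parameter values are our illustrative choices.
    alpha, gamma, eps = 4.0, 1.0, 1.0

    def F(x, r):
        hill = x**2 / (1.0 + x**2)
        return np.array([alpha / (1.0 + r) * hill - x,
                         eps * (gamma * hill - r)])

    A, B = 5.0, 5.0          # candidate rectangle R = [0,A] x [0,B]
    s = np.linspace(0.0, 1.0, 201)

    # (points on a side, inward normal) for each of the four sides of R
    sides = [((0.0 * s, B * s),     np.array([ 1.0,  0.0])),  # left edge x = 0
             ((A + 0.0 * s, B * s), np.array([-1.0,  0.0])),  # right edge x = A
             ((A * s, 0.0 * s),     np.array([ 0.0,  1.0])),  # bottom edge r = 0
             ((A * s, B + 0.0 * s), np.array([ 0.0, -1.0]))]  # top edge r = B

    for (xs, rs), n in sides:
        dots = [np.dot(F(x, r), n) for x, r in zip(xs, rs)]
        print(f"inward normal {n}: min <F, N> = {min(dots):.4f}")
    # Non-negative minima confirm that the flow never exits through that side;
    # on the r-axis the minimum is 0, reflecting the tangential flow noted below.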
4.3.2 Lotka-Volterra with a logistic modification
In Exercise 5 in Chapter 1 we saw that all trajectories of the Lotka-Volterra equations
are periodic. In particular, solutions of the Lotka-Volterra equations (1.33) exist for
Figure 4.2: Nullclines of the activator-inhibitor equations (4.24) for the specific
choice of parameters α = 4 and γ = 1, in which case there are three equilibria (bold
dots). Insets show a blowup of a region near the origin which contains two of the
three equilibria. Panel (a): Horizontal and vertical segments along the nullclines
indicate the orientation of the flow. Panel (b): Arrows along the nullclines indicate
the direction of the flow. Panel (c): Several short, thick arrows indicate the direction
of the flow in regions partitioned by the nullclines.
Figure 4.3: A rectangular trapping region for solutions of the system (4.24) lying
in the first quadrant.
all time. On the other hand, the only trapping regions are those bounded by the
orbits themselves. Thus, trapping regions are not a useful technique for proving
global existence for the Lotka-Volterra system.
Let us modify the Lotka-Volterra equations by limiting the growth of the prey to
a finite carrying capacity:
(a) x′ = x(1 − x/K) − xy
(b) y′ = ρ(xy − y).   (4.28)
The nullclines of (4.28) consist of four straight lines in the (x, y)-plane (Figure 4.4).
Two of these come from Equation (4.28a), namely x = 0 and y = 1 − x/K, and two
from Equation (4.28b): y = 0 and x = 1. Provided that K > 1, the nullclines intersect
as shown in the left panel of Figure 4.4. Flow directions between the nullclines are
also shown in the figure.
For (4.28) we are unable to take a rectangle as a trapping region, but it is possible
to find a triangle. In the Exercises, the reader is asked to show that the triangular
region in the right panel of Figure 4.4 forms a trapping region provided A and B are
chosen appropriately large. Repeating the argument above, including the extension
of Theorem 4.2.2 to allow ∂K to have corners, we may show that the IVP for (4.28)
has a solution for all positive time for any initial conditions in the first quadrant.
4.3.3 van der Pol's equation
We now consider a technically more difficult example, the van der Pol system,
(a) x′ = y
(b) y′ = −μ(x² − 1)y − x,   (4.29)
Figure 4.4: Left panel: Nullclines of equations (4.28a) (thick, black lines)
and (4.28b) (thinner, lighter-colored lines), assuming K > 1. Right panel: If A, B
are chosen sufficiently large, the indicated trapezoid forms a trapping region.
which we encountered in Chapter 1. Let us first try to construct trapping regions
without using nullclines. For starters, we check whether a level curve of an energy
function like E = x²/2 + y²/2 (a circle) could form the boundary of a trapping
region. However, a simple calculation shows that
dE/dt = −μ(x² − 1)y²:
i.e., the flow is not inward on the portion of the circle within the vertical strip
−1 < x < 1.
To accommodate this behavior, we attempt to construct a trapping region with a
piecewise smooth boundary as shown in Figure 4.5(a): i.e., a line segment PQ with
the equation
y = −(μ + 1)x + A,   −2 ≤ x ≤ 2,   (4.30)
where A is a large constant, joined onto two circles. From our calculation above, we
know the flow is "inward" along the two circles³. By dividing the two equations in
(4.29) we calculate that
dy/dx = μ(1 − x²) − x/y;
therefore the flow is also inward along PQ provided the constant A is chosen large
enough so that |x/y| ≤ 1 along this line segment. However, we run into trouble
at large negative y in the problematic strip −1 < x < 1: it is not possible to
close up the figure in a way that maintains inward flow over the entire boundary.
Moreover, this difficulty is not easily resolved by moving the centers of the circles or
by adjusting their radii.
³ We put "inward" in quotes because we have not actually defined a closed region.
We now resort to nullclines, which are shown in Figure 4.5(b). Note that the
direction of the flow changes rapidly when it crosses the x-axis at large values of x:
it changes from vertical on the x-axis to horizontal at the nullcline where y ≈ −1/(μx).
This fact gives us great latitude in choosing the boundary of a trapping region in
the fourth quadrant. For our purposes it will suffice to have the estimate below the
nullcline:
x > 1, y < −x/(μ(x² − 1))   ⟹   x′ < 0, y′ > 0.   (4.31)
We use (4.31) to fix the problem encountered in Figure 4.5(a). Specifically, as
illustrated in Figure 4.5(c), we construct trapping regions with six sides, as follows.
Note that the flow (4.29) is odd under the reflection (x, y) ↦ (−x, −y); we construct
a region that is invariant under this reflection. We begin with the line segment PQ
as in (4.30). We continue with QR, a portion of a circle with center (0, 0) that
extends from Q to a point R just below the x-axis where the circle intersects the
nullcline. Then we proceed along the straight line from R to S, where S, which has
coordinates (−2, 2(μ + 1) − A), is the reflection of P through the origin. The remainder
of the boundary is the reflection of what we have already constructed. By previous
construction the flow is inward along PQ and QR, and by (4.31) it is inward along
RS.
To conclude, starting with A in (4.30) large enough, we may construct a trap-
ping region that encloses arbitrary initial conditions. Hence we have proved global
existence for all initial data.
We have guided the reader through the above construction because it is good
training to work through such a complicated case. After the fact, we confess that
there is an easier alternative based on a clever, nonstandard reduction of the scalar
van der Pol equation to a first-order system (see Exercise 8b).
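As a sanity check on the construction (not a substitute for it), one can integrate
(4.29) numerically from widely scattered initial conditions and observe that every
trajectory is eventually absorbed into a bounded set. The following sketch does this
with a fixed-step RK4 integrator; μ = 1 and the initial conditions are our own
illustrative choices.

    import numpy as np

    mu = 1.0  # illustrative value of the van der Pol parameter

    def F(u):
        x, y = u
        return np.array([y, -mu * (x**2 - 1.0) * y - x])

    def rk4_step(u, h):
        k1 = F(u); k2 = F(u + h/2*k1); k3 = F(u + h/2*k2); k4 = F(u + h*k3)
        return u + (h/6.0) * (k1 + 2*k2 + 2*k3 + k4)

    h, T = 0.01, 50.0
    for u0 in [(0.1, 0.0), (5.0, 5.0), (-8.0, 3.0), (0.0, -10.0)]:
        u = np.array(u0, dtype=float)
        radius = 0.0
        for _ in range(int(T / h)):
            u = rk4_step(u, h)
            radius = max(radius, np.hypot(*u))
        print(f"start {u0}: final point {u.round(3)}, max |(x,y)| = {radius:.2f}")
    # Every trajectory ends up circulating on the same bounded limit cycle,
    # consistent with the trapping regions constructed above.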
The above three examples provide an adequate introduction to using nullclines
to find trapping regions for ODEs, and more examples are given in the Exercises.
However, we present two more examples of constructing trapping regions because
there are more lessons to be learned.
4.3.4 The torqued pendulum and ODEs on manifolds
Consider a pendulum, as illustrated in Figure 4.6, that is subjected to a torque τ,
which tends to twist the unperturbed pendulum away from its stable equilibrium.
If friction is modeled by linear damping, then this problem is described by the first-
Figure 4.5: (a) An attempt to form a trapping region for the van der Pol
system (4.29), as described in the text. The points P and Q have coordinates
(2, A − 2(μ + 1)) and (−2, A + 2(μ + 1)), respectively. Although the flow is directed
inward along the solid, black curve, it is not feasible to close off the curve to
form a trapping region. (b) Nullclines of the van der Pol system (4.29), with arrows
indicating the direction of the flow. (c) Trapping region (enclosed by bold curve)
for the van der Pol system (4.29). The thin, solid curves are the nullclines, and
the dashed curves (included for reference) are the lines x = −2 and x = 2
and a circle of [sufficiently large] radius centered at the origin. The boundary of the
trapping region is a piecewise smooth curve consisting of six pieces as described in
the text.
Figure 4.6: Schematic diagram of the torqued pendulum.
order system of ODEs
x′ = y
y′ = −sin x − βy + τ.   (4.32)
Without loss of generality, it suffices to consider the case τ ≥ 0.
In the absence of torque, the energy E = y²/2 − cos x decreases along orbits.
With torque one calculates that
dE/dt = −βy² + τy;
this may have either sign, but if
|y| > τ/β   (4.33)
then dE/dt is negative. Note that if E = E₀ where E₀ > τ²/(2β²) + 1, then (4.33)
follows. Hence the flow along the boundary of the region
K = {(x, y) : y²/2 − cos x ≤ E₀}   (4.34)
is everywhere inward. Indeed K would be a trapping region except for the fact that it
is not bounded in x and therefore is not compact (see Figure 4.7). Thus, in principle,
a solution of (4.32) might stay inside K while x marches off to infinity in finite time.
In fact, this does not happen: the RHS of (4.32) satisfies the hypothesis (4.5) of
Theorem 4.2.1, so the solution may grow exponentially but it exists for all t ∈ R.
(Remark: In fact, in positive time x grows at most linearly in t; see Exercises.)
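The inequality (4.33) is easy to probe numerically. The sketch below integrates
(4.32) and monitors E = y²/2 − cos x, confirming that E decreases whenever |y|
exceeds τ/β. The values β = 0.5 and τ = 0.3 are illustrative choices of ours, not
values taken from the text.

    import numpy as np

    beta, tau = 0.5, 0.3   # illustrative damping and torque

    def F(u):
        x, y = u
        return np.array([y, -np.sin(x) - beta * y + tau])

    def energy(u):
        x, y = u
        return 0.5 * y**2 - np.cos(x)

    u = np.array([0.0, 4.0])   # start with |y| well above tau/beta = 0.6
    h, violations = 0.001, 0
    for _ in range(20000):
        E_old, y_old = energy(u), u[1]
        # one RK4 step
        k1 = F(u); k2 = F(u + h/2*k1); k3 = F(u + h/2*k2); k4 = F(u + h*k3)
        u = u + (h/6) * (k1 + 2*k2 + 2*k3 + k4)
        if abs(y_old) > tau / beta and energy(u) > E_old:
            violations += 1
    print("steps on which E increased while |y| > tau/beta:", violations)  # expect 0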
The above clunky proof of global existence may be replaced by a much more
elegant approach based on geometry. Note that the RHS of (4.32) is periodic in x.
Rather than considering (4.32) as an ODE on the Euclidean space R × R, we regard
it as an ODE on the cylinder S¹ × R, where S¹ is the circle. Then, considered as
Figure 4.7: Sketch of the region K in (4.34). In the left panel, K is the unbounded
region that lies between the two curves. The right panel is a rendering of the vertical
strip 0 ≤ x < 2π, −∞ < y < ∞ wrapped onto a cylinder so that x = 0 is identified
with x = 2π.
a subset of S¹ × R, the set K is compact (see right panel of Figure 4.7). Moreover
Theorem 4.2.2 generalizes to the manifold S¹ × R, as one may prove by working
with appropriate periodic functions. This technique establishes global existence in
forward time for (4.32).
Occasionally it will be convenient to regard a system of ODEs as describing flow
on a manifold. Although an extension of Theorem 4.2.2 is valid on general manifolds,
we do not formulate such a result here. For our purposes it suffices to
have this extension for two manifolds, the cylinder S¹ × R and the torus T² = S¹ × S¹,
for which the extension may be proved with simple, albeit ad hoc, considerations of
periodic functions of a real variable.
4.3.5 Michaelis-Menten kinetics
Michaelis-Menten kinetics arises in modeling the concentrations of certain chemical
species in an enzyme-mediated reaction. Applying suitable scaling and reasonable
assumptions on the underlying chemistry (see Appendix D), the equations take the
form
(a) r′ = −r(1 − c) + c
(b) εc′ = r(1 − c) − (1 + κ)c,   (4.35)
where κ, ε are positive parameters. Typically ε is very small indeed.
Global existence for (4.35) is trivial. By forming a linear combination of the
equations one sees that the derivative of r + εc is negative. Any triangular region
bounded by the r-axis, the c-axis, and a line r + εc = A can serve as a trapping
region.
Figure 4.8: Nullclines for the scaled Michaelis-Menten equations (4.35) form a
narrow trapping region.
However, the region between the nullclines
c = r/(r + 1)   and   c = r/(r + 1 + κ)
is a much more interesting trapping region (see Figure 4.8). The rapid flow (4.35b)
drives (r, c) into this trapping region, after which both variables tend slowly to zero
within the trapping region.
We ask the reader to recall Exercise 7 from Chapter 1, which also dealt with
two variables evolving at radically different rates. In the spirit of that exercise, we
consider the approximation of setting ε = 0 in (4.35b). In this approximation, we
may then solve (4.35b), now an algebraic equation, to obtain c = r/(r + 1 + κ);
substituting into (4.35a) we derive
dr/dt = −κr/(r + 1 + κ).   (4.36)
This equation is the (scaled) Michaelis-Menten approximation for the enzymatic reac-
tion rate arising from (4.35). One may worry about the violent approximation from
which it is derived, which changes the differential equation (4.35b) into an algebraic
equation, but the above analysis with nullclines shows that, apart from the initial
transient, it does not lead us astray.
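The claim that the reduction is accurate after an initial transient can be tested
directly. The sketch below integrates the full system (4.35) for a small ε and
compares r(t) with the solution of the reduced equation (4.36); the values κ = 1,
ε = 0.01 and the initial data are illustrative choices of ours.

    import numpy as np

    kappa, eps = 1.0, 0.01   # illustrative parameters; eps small

    def full(u):
        r, c = u
        return np.array([-r*(1 - c) + c,
                         (r*(1 - c) - (1 + kappa)*c) / eps])

    def reduced(r):
        return -kappa * r / (r + 1 + kappa)   # equation (4.36)

    def rk4(f, u, h):
        k1 = f(u); k2 = f(u + h/2*k1); k3 = f(u + h/2*k2); k4 = f(u + h*k3)
        return u + (h/6) * (k1 + 2*k2 + 2*k3 + k4)

    h = 1e-4                       # small step because the c-equation is fast
    u, r_red = np.array([1.0, 0.0]), 1.0
    for n in range(1, int(5.0/h) + 1):
        u = rk4(full, u, h)
        r_red = rk4(reduced, r_red, h)
        if n % int(1.0/h) == 0:
            print(f"t={n*h:4.1f}: full r={u[0]:.5f}, reduced r={r_red:.5f}")
    # After the O(eps) transient the two values of r agree closely.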
4.4 Continuous dependence of the solution
4.4.1 The main result
Theorem 4.4.1. Suppose F : U → R^d is locally Lipschitz, and let x₀(t), 0 ≤ t < β,
be a solution in forward time of x₀′ = F(x₀) with initial condition x₀(0) = b₀. Then:
(i) For any positive T that satisfies T < β, there is a neighborhood V of b₀ such
that for any b ∈ V, the IVP
x′ = F(x),   x(0) = b   (4.37)
has a solution for 0 ≤ t < T.
(ii) Moreover, there is a constant L such that for all b ∈ V,
|x(t) − x₀(t)| ≤ |b − b₀| e^{Lt},   0 ≤ t < T.   (4.38)
In words, Conclusion (ii) asserts that if the initial data for an IVP are altered
slightly, then the perturbed solution diverges from the original solution no faster
than at a controlled exponential rate. Conclusion (i), which guarantees that the
perturbed solution exists for nearly as long as x₀, gives the estimate more significance.
Incidentally, in Exercise 17 we give an example where x₀ blows up in finite time but
nearby solutions exist for much longer times; indeed, for all positive time.
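Estimate (4.38) is easy to observe numerically. For the scalar logistic equation
x′ = x(1 − x), the sketch below measures |x(t) − x₀(t)| for a slightly perturbed
initial condition and checks that it stays below |b − b₀|e^{Lt}, with L = 1 a bound
for |DF| on [0, 1]. The equation and the constants are our illustrative choices.

    import numpy as np

    def F(x):
        return x * (1.0 - x)      # logistic field; |DF| = |1 - 2x| <= 1 on [0, 1]

    def solve(b, h, n):
        xs = [b]
        for _ in range(n):
            x = xs[-1]
            k1 = F(x); k2 = F(x + h/2*k1); k3 = F(x + h/2*k2); k4 = F(x + h*k3)
            xs.append(x + (h/6)*(k1 + 2*k2 + 2*k3 + k4))
        return np.array(xs)

    b0, b, L = 0.2, 0.21, 1.0     # initial data differ by 0.01; Lipschitz bound L = 1
    h, n = 0.01, 500              # integrate on [0, 5]
    x0, x = solve(b0, h, n), solve(b, h, n)
    t = h * np.arange(n + 1)
    gap = np.abs(x - x0)
    bound = abs(b - b0) * np.exp(L * t)
    print("estimate (4.38) violated anywhere:", bool((gap > bound).any()))  # expect False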
Proof of Theorem 4.4.1. Let K₀ be the image of [0, T] under x₀, which is a compact
subset of U. By Corollary 3.3.3, there is a larger compact subset K ⊂ U and a δ > 0
such that for every point x₀(t) ∈ K₀, the closed ball B̄(x₀(t), δ) is contained in K; in
symbols,
t ≤ T ⟹ B̄(x₀(t), δ) ⊂ K.   (4.39)
By Proposition 3.3.2, F|K is Lipschitz continuous, say with Lipschitz constant L > 0.
Let V = B(b₀, δe^{−LT}) be the neighborhood in the theorem. If b ∈ V, let x be
the solution in forward time of (4.37), on whatever interval it exists, and define
t* = sup{t ∈ [0, T] : (∀s ≤ t) |x(s) − x₀(s)| < δ}.   (4.40)
Since |x(0) − x₀(0)| < δe^{−LT} < δ, we know t* > 0.
Let g(t) = |x(t) − x₀(t)|. From subtracting the integral equations for x and x₀
we deduce that
g(t) ≤ |b − b₀| + ∫₀ᵗ |F(x(s)) − F(x₀(s))| ds.   (4.41)
If t ≤ t*, both x(s) and x₀(s) in the integrand belong to B̄(x₀(s), δ) ⊂ K, so Lipschitz
continuity gives us
|F(x(s)) − F(x₀(s))| ≤ L|x(s) − x₀(s)|.
Substituting into (4.41) we have
g(t) ≤ |b − b₀| + L ∫₀ᵗ g(s) ds
and hence by Gronwall's Lemma
g(t) ≤ |b − b₀| e^{Lt},   0 ≤ t ≤ t*.   (4.42)
We claim that t* = T. Certainly t* ≤ T. But we have from (4.42) that
g(t*) ≤ (δe^{−LT}) e^{Lt*} = δe^{−L(T−t*)}.
If t* were strictly less than T, then we would have g(t*) = |x(t*) − x₀(t*)| < δ, contra-
dicting the definition (4.40) of t*. This completes the proof.
Corollary 4.4.2. In Theorem 4.4.1, if F : U → R^d is C¹, then in the estimate (4.38)
we may take the constant
L = max_{x∈K} ‖DF(x)‖
where K is the compact subset of U chosen in the above proof.
As an Exercise, we ask the reader to review the proof and verify that this choice
of L yields the desired estimate.
These results and related results below have analogues in backwards time, but
we do not bother to formulate these.
4.4.2 Some associated formalism
Sometimes, when it is convenient to focus on how the solution of an IVP depends on
the initial data, we shall use the flow notation. Specifically, we shall write
φ(t, b) = x(t)   (4.43)
where x is the solution of the IVP (4.37). This solution operator or flow function is
a mapping φ : Ω → U, where its domain Ω is given by
Ω = {(t, b) ∈ [0, ∞) × U : t ∈ maximal interval of existence for (4.37)}.   (4.44)
It follows from Theorem 4.4.1 that φ is Lipschitz continuous with respect to its
second argument b. We ask the reader to supply the trivial proof that it is Lipschitz
continuous with respect to both arguments simultaneously.
The solution operator satisfies the following relation, which is known as the semi-
group property.
Proposition 4.4.3. If (s, b) ∈ Ω and if (t, φ(s, b)) ∈ Ω, then (s + t, b) ∈ Ω and
φ(t, φ(s, b)) = φ(s + t, b).   (4.45)
This result follows easily from Lemma 3.2.8.
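Numerically, the semigroup property (4.45) says that integrating for time s and then
for time t lands at the same point as integrating for time s + t in one go. The sketch
below checks this for the logistic equation; the field and the times are illustrative
choices of ours.

    import numpy as np

    def F(x):
        return x * (1.0 - x)

    def flow(t, b, h=1e-3):
        """Approximate phi(t, b) with fixed-step RK4."""
        x = b
        for _ in range(int(round(t / h))):
            k1 = F(x); k2 = F(x + h/2*k1); k3 = F(x + h/2*k2); k4 = F(x + h*k3)
            x += (h/6) * (k1 + 2*k2 + 2*k3 + k4)
        return x

    s, t, b = 0.7, 1.3, 0.2
    lhs = flow(t, flow(s, b))     # phi(t, phi(s, b))
    rhs = flow(s + t, b)          # phi(s + t, b)
    print(f"phi(t, phi(s,b)) = {lhs:.10f}")
    print(f"phi(s+t, b)      = {rhs:.10f}")   # agree up to the integration error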
4.4.3 Continuity with respect to the equation
In Theorem 4.4.1 we showed that the solution of an IVP depends continuously on its
initial data. In the following result we show that the solution depends continuously
on the equation. More precisely, we compare solutions of the two IVPs
x′ = F(x), x(0) = b   and   y′ = G(y), y(0) = b.   (4.46)
Theorem 4.4.4. Suppose F : U → R^d and G : U → R^d are locally Lipschitz, and
suppose x(t), y(t) satisfy (4.46) for 0 ≤ t < β. Then for every T < β, there exist
constants L and δ > 0 such that if sup_U |F − G| < δ then
|x(t) − y(t)| ≤ [(e^{Lt} − 1)/L] sup_U |F − G|,   0 ≤ t ≤ T.   (4.47)
In this result, the supremum over U could be replaced by a supremum over an
appropriate compact subset of U. Note that the strength of the estimate (4.47)
derives from the fact that sup_U |F − G| might be much smaller than δ.
The estimate (4.47) is analogous to Conclusion (ii) of Theorem 4.4.1. We could
formulate an existence result for the perturbed equation analogous to Conclusion (i),
but this does not seem worth the bother.
The proof of this result, which involves just recycling ideas in the proof of Theo-
rem 4.4.1, is left as an Exercise.
4.5 Dierentiable dependence on initial data
4.5.1 Formulation of the main result
In the previous section we showed that the flow φ(t, b) is Lipschitz continuous in b.
In this section we show that φ is in fact C¹, provided of course that F is C¹.
The above phrasing is how a pure mathematician would describe the results of
this section. However, let us adopt the language of applied math, since we believe this
makes the discussion more intuitive. Thus we suppose x₀(t), 0 ≤ t < β, is a solution
in forward time of x′ = F(x) with initial condition x₀(0) = b₀, and we ask how the
solution changes if the initial condition is perturbed: let x(t, ε) be the solution of
x′ = F(x),   x(0) = b₀ + εb₁.   (4.48)
We look for an expansion⁴ of the solution in powers of ε:
x(t, ε) = x₀(t) + εx₁(t) + ⋯.   (4.49)
The size of the neglected terms, represented by the dots, will be estimated in Theo-
rem 4.5.1. For the moment we proceed formally. Substituting (4.49) into (4.48) we
obtain
x₀′(t) + εx₁′(t) + ⋯ = F(x₀(t) + εx₁(t) + ⋯).
Using a Taylor series to expand the RHS of this equation in powers of ε, we calculate
x₀′(t) + εx₁′(t) + ⋯ = F(x₀(t)) + εDF(x₀(t)) x₁ + ⋯.   (4.50)
Equation (4.50) must hold for all values of ε: i.e., the two power series on either side
of the equation define the same functions of ε. Thus the coefficients of each power
of ε must be equal. Matching corresponding powers of ε in (4.50) we obtain
O(ε⁰):  x₀′ = F(x₀)
O(ε¹):  x₁′ = DF(x₀(t)) x₁.
The O(ε⁰)-equation merely repeats the equation for our original solution. The O(ε¹)-
equation gives new information: i.e., an ODE for x₁, which is a linear equation with
time-dependent coefficients
x₁′ = A(t)x₁   (4.51)
where the coefficient matrix is given by A(t) = DF(x₀(t)). Of course matching
powers of ε in the initial conditions requires
x₁(0) = b₁.   (4.52)
Theorem 3.4.3 guarantees that the IVP (4.51), (4.52) has a unique solution x₁(t) for
t in the same interval [0, β) on which x₀ is defined.
Theorem 4.5.1. Let F : U → R^d be C¹. In the above notation,
lim_{ε→0} |x(t, ε) − x₀(t) − εx₁(t)| / ε = 0,   0 ≤ t < β.   (4.53)
Moreover, for any T < β, the limit is uniform for 0 ≤ t ≤ T.
⁴ We may describe (4.49) as an ansatz: i.e., an assumed form for the solution of a problem. This
term, which comes from German, is a useful one to add to your (mathematical) vocabulary.
4.5.2 The order notation
We first introduce the order notation (big-O and little-o) because this makes an
otherwise messy proof fairly straightforward.
The basic use of big-O, the simpler concept, is as follows. Given a quantity that
depends on a parameter, f(ε) where 0 < ε < ε₀ and f may be either vector or scalar,
we say that f is order-ε, written f(ε) = O(ε), if
(∃ε₁ > 0)(∃C) such that 0 < ε < ε₁ ⟹ |f(ε)| ≤ Cε.
(The formula f(ε) = O(ε) may also be read "f is big-O of ε".) The same notation
is used if ε can assume either sign: i.e., if f(ε) is defined for 0 < |ε| < ε₀, we write
f(ε) = O(|ε|) if |f(ε)| ≤ C|ε| provided |ε| is sufficiently small. The notation is
also used to estimate quantities that depend on multiple parameters. For example,
in Theorem 4.4.1 the solution x depends on the d parameters of the initial data,
b₁, ..., b_d, and we may paraphrase the conclusion of the theorem as
|x(t) − x₀(t)| = O(|b − b₀|).   (4.54)
Indeed we will say that (4.54) holds uniformly for t ∈ [0, T] because the single
constant e^{LT} makes the inequality work for all t in this interval.
Little-o is a more subtle concept. If f is defined for 0 < ε < ε₀, we say that f is
little-o of ε, written f = o(ε), if
(∀η > 0)(∃ε₁ > 0) such that 0 < ε < ε₁ ⟹ |f(ε)| ≤ ηε.
Of course this definition is equivalent to lim_{ε→0} f(ε)/ε = 0. In Exercise 1 we ask the
reader to show that for any constant C and any function ψ(ε) ≥ 0,
(a) If f(ε) = o(ε), then Cf(ε) = o(ε).
(b) If f(ε) = o(ε) and if ψ(ε) = O(ε), then f(ψ(ε)) = o(ε).   (4.55)
This result illustrates the usefulness of the order notation: reinterpreting (4.55) in
terms of the original C's, η's and ε's obscures the basic simplicity of the behavior in
question. The little-o concept and (4.55) also extend to functions defined for both
positive and negative ε and to functions of multiple parameters. The proofs of these
claims are left for the dedicated reader.
To relate these concepts to Theorem 4.5.1, let g(t, ε) = |x(t, ε) − x₀(t) − εx₁(t)|;
inequality (4.53) simply asserts that
g(t, ε) = o(ε),   (4.56)
uniformly in t. We shall show that there is a constant L such that
g(t, ε) ≤ o(ε) + L ∫₀ᵗ g(s, ε) ds,   0 ≤ t ≤ T,   (4.57)
where the o(ε)-term is uniformly small for 0 ≤ t ≤ T. (To be precise, (4.57) means
that there is a function η(ε), independent of t, with η(ε) = o(ε), such that
g(t, ε) ≤ η(ε) + L ∫₀ᵗ g(s, ε) ds for all 0 ≤ t ≤ T.) We ask the reader to combine
Gronwall's lemma with (4.55) to show that (4.53) follows from (4.57).
4.5.3 Proof of Theorem 4.5.1
Let K₀ be the image of [0, T] under x₀. By Corollary 3.3.3, there is a larger compact
subset K ⊂ U and a δ > 0 such that
0 ≤ t ≤ T ⟹ B̄(x₀(t), δ) ⊂ K.   (4.58)
We bound the location of the solution x(t, ε) of (4.48) with Theorem 4.4.1, using the
constant L of Corollary 4.4.2,
L = max_{x∈K} ‖DF(x)‖.   (4.59)
Note that |x(0, ε) − x₀(0)| = |ε||b₁|. Thus, if |ε| < ε₀, where ε₀ = δe^{−LT}/|b₁|, then
for all t ∈ [0, T], we have x(t, ε) ∈ B̄(x₀(t), δ) ⊂ K.
Starting from the integral relations
x(t, ε) = b₀ + εb₁ + ∫₀ᵗ F(x(s, ε)) ds
x₀(t) = b₀ + ∫₀ᵗ F(x₀(s)) ds
x₁(t) = b₁ + ∫₀ᵗ A(s)x₁(s) ds,
we form an appropriate linear combination in which the constant terms cancel to
deduce that
g(t, ε) ≤ ∫₀ᵗ |F(x(s, ε)) − F(x₀(s)) − εA(s)x₁(s)| ds.
Let us add and subtract A(s)(x(s, ε) − x₀(s)) in the integral on the RHS and use the
triangle inequality to obtain
g(t, ε) ≤ I₁(t, ε) + I₂(t, ε)   (4.60)
where
I₁(t, ε) = ∫₀ᵗ |F(x(s, ε)) − F(x₀(s)) − A(s)(x(s, ε) − x₀(s))| ds
I₂(t, ε) = ∫₀ᵗ |A(s)[x(s, ε) − x₀(s) − εx₁(s)]| ds.   (4.61)
The following two claims verify (4.57), which will complete the proof.
Claim 1: I₁(t, ε) = o(ε), uniformly for t ∈ [0, T].
Claim 2: If L is given by (4.59), then
I₂(t, ε) ≤ L ∫₀ᵗ g(s, ε) ds.
Proof of Claim 2. The integrand in the definition (4.61) of I₂ may be estimated by
‖A(s)‖ g(s, ε). Since x₀(s) ∈ K we have ‖A(s)‖ = ‖DF(x₀(s))‖ ≤ L.
Proof of Claim 1. For any z₁, z₂ ∈ K such that the entire line segment from z₁
to z₂ belongs to U, it was shown in Lemma 3.2.3 that
F(z₁) − F(z₂) = [∫₀¹ DF(z₂ + θ(z₁ − z₂)) dθ] (z₁ − z₂).
Subtracting DF(z₂)(z₁ − z₂) from both sides of the equation, we deduce
F(z₁) − F(z₂) − DF(z₂)(z₁ − z₂) = [∫₀¹ [DF(z₂ + θ(z₁ − z₂)) − DF(z₂)] dθ] (z₁ − z₂).   (4.62)
Now DF(z) is continuous on U, and its restriction to the compact set K is uniformly
continuous. Therefore, for every η > 0, there is a δ > 0 such that
|z₁ − z₂| < δ ⟹ ‖DF(z₁) − DF(z₂)‖ < η.
Substituting into (4.62) and observing that the distance between z₂ + θ(z₁ − z₂) and
z₂ is θ|z₁ − z₂| ≤ |z₁ − z₂|, we conclude that
|F(z₁) − F(z₂) − DF(z₂)(z₁ − z₂)| ≤ η|z₁ − z₂|,   (4.63)
or more compactly,
F(z₁) − F(z₂) − DF(z₂)(z₁ − z₂) = o(|z₁ − z₂|).
Letting z₁ = x(s, ε), z₂ = x₀(s), we deduce that the integrand of I₁ in (4.61) satisfies
F(x(s, ε)) − F(x₀(s)) − A(s)(x(s, ε) − x₀(s)) = o(|x(s, ε) − x₀(s)|).   (4.64)
Now we invoke Theorem 4.4.1:
|x(s, ε) − x₀(s)| = O(|x(0, ε) − x₀(0)|) = O(|ε||b₁|) = O(ε).
Substituting this estimate and (4.64) into (4.55), where ψ(ε) = |x(s, ε) − x₀(s)|, we
see that
F(x(s, ε)) − F(x₀(s)) − A(s)(x(s, ε) − x₀(s)) = o(ε).
Of course I₁ is bounded by T times the maximum of the integrand in (4.61), so we
are done.
4.5.4 Further discussion
To start, we record the pure mathematician's version of Theorem 4.5.1.
Theorem 4.5.2. If F : U → R^d is C¹, then the flow φ : Ω → U is C¹ with respect to
all its arguments.
Proof. We focus on the derivatives of φ with respect to b since the t derivative is
easily handled. First we must show that the partial derivatives of φ exist; i.e., show
the existence of the limits
(∂φ/∂b_j)(t, b) = lim_{h→0} [φ(t, b + he_j) − φ(t, b)] / h,
where e_j is a unit vector in the j-th coordinate direction. This follows from The-
orem 4.5.1, and in fact the limit equals the solution v(t) of a linear initial value
problem
dv/dt = DF(φ(t, b))v,   v(0) = e_j.   (4.65)
By Theorem 4.4.4, the solution of (4.65) depends continuously on b and t. (Note
that, as regards continuity with respect to t, the function v depends on t both
indirectly through the coefficient matrix in (4.65) and directly as the solution of an
ODE.) Therefore φ is C¹.
Equation (4.65) is a beautiful characterization of ∂φ/∂b_j. It is used in theoretical
analysis of ODEs, for example in the study of stability of periodic solutions in Sec-
tion 6.4 through the Poincaré map. Unfortunately, cases where (4.65) can be solved
explicitly are rather rare, not least because φ cannot be found explicitly.
Here is one example where explicit solution is possible. Consider the IVP for the
Lotka-Volterra equations,
(a) x′ = x − xy,   x(0) = b₁
(b) y′ = ρ(xy − y),   y(0) = b₂.   (4.66)
If b₂ = 0, then (4.66) has the explicit solution φ(t, b₁, 0) = (b₁eᵗ, 0). How is the
solution changed if initially there is a small population of predators? To estimate this
change we may calculate ∂φ/∂b₂, which is the goal of Exercise 10. This interpretation
of the derivative makes this primarily instructional exercise slightly less academic.
Here we calculate ∂φ/∂b₁ to prepare for the exercise.
The differential of (4.66) is
DF = [ 1 − y    −x
        ρy    ρ(x − 1) ].
Substituting φ(t, b₁, 0) into this matrix we find that v = ∂φ/∂b_j satisfies the linear
system v′ = A(t)v where
A(t) = [ 1    −b₁eᵗ
         0    ρ(b₁eᵗ − 1) ].
For v = ∂φ/∂b₁, the appropriate initial condition is v(0) = (1, 0), and solving the
system we calculate that (∂φ/∂b₁)(t, b₁, 0) = (eᵗ, 0). Of course this answer may be
checked by direct differentiation of φ(t, b₁, 0) = (b₁eᵗ, 0).
We refer the reader to Exercise 10 for the more interesting calculation of ∂φ/∂b₂.
(See also Exercise 14 for another example where a derivative of a solution with respect
to initial conditions may be computed.)
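The characterization (4.65) is also the practical way to compute ∂φ/∂b_j numerically:
integrate the original ODE and the variational equation together. The sketch below
does this for the Lotka-Volterra system (4.66) with b₂ = 0 and compares the result
with the exact answer ∂φ/∂b₁ = (eᵗ, 0); ρ = 1 and the other numbers are illustrative
choices of ours.

    import numpy as np

    rho = 1.0   # illustrative value of the parameter in (4.66)

    def F(u):
        x, y = u
        return np.array([x - x*y, rho*(x*y - y)])

    def DF(u):
        x, y = u
        return np.array([[1.0 - y, -x],
                         [rho*y, rho*(x - 1.0)]])

    def G(w):
        # w = (u, v) stacked: the original system plus the variational equation (4.65)
        u, v = w[:2], w[2:]
        return np.concatenate([F(u), DF(u) @ v])

    b1, t_final, h = 0.5, 2.0, 1e-3
    w = np.array([b1, 0.0, 1.0, 0.0])       # u(0) = (b1, 0), v(0) = e_1
    for _ in range(int(t_final / h)):
        k1 = G(w); k2 = G(w + h/2*k1); k3 = G(w + h/2*k2); k4 = G(w + h*k3)
        w += (h/6) * (k1 + 2*k2 + 2*k3 + k4)
    print("numerical dphi/db1 =", w[2:].round(6))
    print("exact     dphi/db1 =", np.array([np.exp(t_final), 0.0]).round(6))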
4.5.5 Generalizations
First let us extend Theorem 4.5.2 to nonautonomous IVPs. Let φ(t, t₀, b) denote the
solution of
x′ = F(x, t),   x(t₀) = b.   (4.67)
Theorem 4.5.3. If F : U × (T₁, T₂) → R^d is C¹, then the flow φ(t, t₀, b) is C¹ with
respect to all its arguments (in the domain where the IVP has a solution). Moreover,
as a function of time, (∂φ/∂b_j)(t, t₀, b) satisfies the linear system
dv/dt = DF(φ(t, t₀, b), t)v,   v(t₀) = e_j.   (4.68)
Remark: In (4.68), DF denotes the d × d matrix of partial derivatives ∂F/∂x_j,
not including the t derivative.
The proof of the theorem is posed for the dedicated reader in Exercise ??.
As in the discussion of continuity in Section 4.4, let us move on from differentiability
with respect to initial conditions to differentiability with respect to the equation.
The benign interpretation of this concept involves a parametrized family of ODEs,
say
x′ = F(x, μ₁, μ₂, ..., μ_k).   (4.69)
Naturally we simplify the notation in (4.69) with vector notation, writing μ =
(μ₁, μ₂, ..., μ_k). In the following theorem and below, DF denotes the d × d ma-
trix of derivatives of F with respect to the coordinates x_j; we indicate derivatives
with respect to one of the parameters explicitly, ∂F/∂μ_j.
Theorem 4.5.4. Let U be an open subset of R^d × R^k, and suppose F : U → R^d
is C¹. Let φ(t, b, μ) be the solution of the IVP for (4.69) with initial condition
φ(0, b, μ) = b. Then φ is C¹ with respect to all its arguments. Moreover the partial
derivative ∂φ/∂μ_j = v(t) where v(t) satisfies the linear, inhomogeneous IVP
v′ = DF(φ(t, b, μ))v + ∂F/∂μ_j,   v(0) = 0.
The proof of this result, which simply imitates the previous proof, is left as an
Exercise for the dedicated reader.
Our final result asserts that if F possesses more derivatives, so does the solution
of the IVP.
Theorem 4.5.5. Let U be an open subset of R^d × R^k, and suppose F : U → R^d
is C^m. Let φ(t, b, μ) be the solution of the IVP for (4.69) with initial condition
φ(0, b, μ) = b. Then φ is C^m with respect to all its arguments.
How to prove this result is discussed in Exercise 13.
4.6 Exercises
4.6.1 Exercises to consolidate your understanding
1. Supply details omitted in the text:
(a) Prove that the limit in (4.4) exists.
Hint: For t < β* let
I(t) = ∫₀ᵗ F(x*(s)) ds.
Observe that |I(t₁) − I(t₂)| ≤ K|t₁ − t₂| where K = max_K |F|, and
apply the continuous analogue of the Cauchy criterion: i.e., for every
η > 0 there is a δ > 0 such that
β* − δ < t₁, t₂ < β* ⟹ |I(t₁) − I(t₂)| < η.
(b) Before reading the hint, please try to prove Corollary 4.1.3 on your own.
Hint: Invoke Corollary 3.3.3 to obtain a larger compact set K′ such
that for every x ∈ K, B̄(x, δ) ⊂ K′. Apply Proposition 4.1.2 to K′.
Find a lower bound for the time required for the solution to move
from U ∖ K′ to K.
(c) If t* is defined by (4.9), show that x(t*) ∈ ∂K.
(d) Determine sufficient conditions on A, B that make the region in Figure 4.3
a trapping region for (4.24).
Remark: See Exercise 7 regarding the fact that ∂R is only piecewise
smooth.
(e) Determine sufficient conditions on A, B that make the region in Figure 4.4(b)
a trapping region for (4.28).
Hint: Choose A first; the condition on B depends on A.
(f) Verify the claim (4.31).
(g) Regarding (4.32), improve on Theorem 4.2.1 by proving that y remains
bounded and |x| ≤ A + B|t| for some constants A, B.
Hint: Let K be the (noncompact) region (4.34). Combine and modify
the proofs of Theorems 4.2.1 and 4.2.2.
(h) Prove Corollary 4.4.2.
(i) Show that the solution map φ defined in (4.43) is Lipschitz continuous with
respect to all arguments.
(j) Prove Proposition 4.4.3.
(k) Prove Theorem 4.4.4.
(l) Verify the claims made in (4.55).
(m) Show that (4.53) follows from (4.57).
2. Regarding Theorem 4.1.2, give an example to show that the conclusion need
not follow if β* = ∞.
3. Construct an IVP x′ = F(x), x(0) = b in one dimension for which F is bounded
but whose solution has a maximal interval of existence (α*, β*) with both end-
points finite.
Discussion: We have seen examples where the solution of an IVP for
x′ = F(x) does not exist for all time because the solution blows up
in finite time. As Theorem 4.2.1 shows, this behavior may be traced
to super-linear growth of F as x → ∞ and apparently cannot occur
if F is bounded. However, if the domain of F is a proper subset U
of R^d, then the solution of an IVP might cease to exist because the
solution tends to ∂U, which is what underlies the present exercise.
4. Generalize Theorem 4.2.1 to nonautonomous equations: i.e., prove that if there
exist nonnegative constants B, K such that
|F(x, t)| ≤ K|x| + B,   x ∈ R^d and T₁ < t < T₂,
then solutions satisfy
|x(t)| ≤ |x(0)|e^{K|t|} + (B/K)(e^{K|t|} − 1),   T₁ < t < T₂,
where either T₁ or T₂ (or both) may be infinite.
Discussion: Recall that Theorem 3.4.3 gave a complete existence
theory for linear systems such as (3.33), but its proof required delving
more deeply into the details of the integral-equation formulation of
the IVP. This exercise is part of an alternative approach to linear
equations that circumvents such analysis: i.e., start from the easily
proved local existence result, Exercise 3(a) in the previous chapter,
and use the present exercise to obtain a global solution.
Regarding perspective, it is instructive to recall Exercise 6, in which
you solved the special case of Mathieu's equation x″ + (1/4 + ε cos t)x =
0 (written as a linear first-order system with variable coefficients).
Naively, based on a comparison with x″ + x/4 = 0, one would not expect
solutions of Mathieu's equation to grow at all as t increased, but
computation showed otherwise. The present result does guarantee
that the growth is no worse than exponential.
5. Give an example of a locally Lipschitz function (or even C¹) that satisfies the
linear-growth estimate (4.5) but is not (globally) Lipschitz.
6. Under the hypotheses of Theorem 4.2.2, show that if b ∈ ∂K, then the solution
x to (4.1) exists for all positive time and moreover belongs to K.
[Note: this exercise is obsolete, given the rewriting of the proof of Theo-
rem 4.2.2.]
7. Explore how the proof of Theorem 4.2.2 may be simplified if one has strict
inequality in (4.7). Start with the one-dimensional case, and argue by contra-
diction: specifically, suppose x(t) > a₁ for t < t* but x(t*) = a₁, and deduce
from this that x′(t*) ≤ 0, which contradicts (4.13) if the inequality is strict.
Then extend this argument to d dimensions.
Discussion: In the proof in the text we invoked the ODE to esti-
mate, in a neighborhood of ∂K, how fast the solution may approach
the boundary. Such estimation is needed, for example, in analyz-
ing solutions of (4.24) along the r-axis, where the flow is tangential.
By contrast, when the inequality in (4.7) is strict, one may derive a
contradiction by focusing, as above, solely on a hypothetical point
where the solution crosses ∂K.
This simplification is pertinent to generalizing Theorem 4.2.2 by
allowing two-dimensional trapping regions with piecewise smooth
boundaries. For definiteness consider this issue in the existence proof
for (4.24) with the rectangular trapping region (4.27). It follows from
the proof of Theorem 4.2.2 that a solution curve x(t) cannot reach
∂R at a non-corner point. At each of the four corners, for at least one
of the two intersecting sides, the dot product of the flow vector F(x)
with the inward normal N̂_x is strictly positive. Thus one may derive
a contradiction by assuming there is a time t* such that the solution
lies in the interior of R if t < t* but passes through a corner at t = t*.
In this way the existence proof for (4.24) may be completed.
In fact, these considerations may be developed into a general proof
of Theorem 4.2.2 for two-dimensional trapping regions with corners.
Two key observations are: (i) if F(x) ≠ 0 at a corner point x, then
the dot product ⟨F(x), N̂_x⟩ must be positive for one of the sides, and
(ii) if F(x) = 0 at a corner point, then x is an equilibrium, and by
uniqueness no other solution can pass through this point. We invite
the dedicated reader to make an actual proof from these fragments.
The existence result may be extended to trapping regions with piece-
wise smooth boundaries in dimensions greater than 2. Unfortunately,
even the definition of a piecewise smooth boundary in higher di-
mensions is painfully technical. One tractable special case concerns
rectangular solids, i.e., regions of the form
K = {x ∈ R^d : a_j ≤ x_j ≤ b_j, j = 1, ..., d}
where a_j, b_j are real constants, but we do not pursue this.
8. Construct trapping regions and thereby prove global existence for the follow-
ing equations, possibly with restrictions on the initial data as indicated. Use
nullclines if helpful.
(a) The Lotka-Volterra equations including both the Allee effect and logistic
growth of the prey:
x′ = x [x/(x + a)] (1 − x/K) − xy
y′ = ρ(xy − y)
where x(0), y(0) ≥ 0.
(b) The FitzHugh-Nagumo equations,
x′ = y − (x³/3 − x)
y′ = −x.   (4.70)
Discussion: Show that the function x obtained from a solution of
(4.70) satisfies the second-order scalar van der Pol equation
x″ + (x² − 1)x′ + x = 0.
This alternative reduction of the van der Pol equation to a first-
order system provides a simpler existence proof than the one given
in Section 4.3.3.
This simpler existence proof is based on a very clever idea. It is easy
to get cowed into thinking, "I never would have thought of that in
a million years." It is important to remember that a lot of clever
people have been working in this field for a long time, and we are
the beneficiaries of their efforts. Many of us could not have come up
with this idea, either; and at the same time you might be surprised
by how resourceful you become after studying a problem intensely
over an extended period.
Incidentally, (4.70) is one of several similar equations that go by
this name.
(c) Duffing's equation without forcing, written as a first-order system:
x′ = y
y′ = −y + x − x³.   (4.71)
Hint: Use the energy (kinetic plus potential) of this system,
E(x, y) = y²/2 − x²/2 + x⁴/4,   (4.72)
to construct the trapping region.
(d) A bead sliding on a rotating loop,
x′ = y
may′ = −λy − mg sin x + m(a sin x)ω² cos x,   (4.73)
as shown in Figure 4.9. Here x is the angle the position of the bead makes with
the vertical, m is the mass of the bead, a is the radius of the hoop, and ω is
the angular speed of the hoop.
Discussion: These equations come from applying Newton's second
law to the motion, which is purely tangential. The tangential ac-
celeration is ax″. There are three forces acting on the bead: (i)
gravity, whose projection onto the tangential direction is −mg sin x;
(ii) friction, which we model as −λx′; and (iii) centrifugal force from
rotation about the vertical axis in a circle of radius equal to a sin x,
whose tangential projection is m(a sin x)ω² cos x. The first two forces
are exactly the same as for the pendulum without rotation; the third
is new. Alternatively, one may regard (4.73) as another variant of
the pendulum equation in which the pivot point rotates.
In this problem the energy to use in constructing a trapping region
is
E = my²/2 + m(aω sin x)²/2 − mga cos x.
The first term here is the kinetic energy of motion around the hoop;
the third is the gravitational potential energy; the middle term is an-
other contribution to kinetic energy, coming from the rotation around
the vertical axis.
It is quite easy to obtain global existence for (4.73) by constructing
a trapping region as a subset of the cylinder S¹ × R. Challenge:
Can you prove that, even if (4.73) is considered as an ODE on R²,
trajectories remain bounded? I.e., show that friction brings the bead
to rest after a finite number of revolutions around the loop, no matter
what the initial conditions.
(e) The Lorenz equations:
x′ = σ(y − x)
y′ = ρx − y − xz
z′ = −βz + xy.
Figure 4.9: Schematic of the bead on a rotating wire hoop in Exercise 8d.
Hint: Show that a region of the form
K = {(x, y, z) : x² + y² + (z − ρ − σ)² ≤ A²}
is a trapping region if A is sufficiently large.
9. Consider the equations for growth of two symbiotic species:
x′ = r₁x(1 − x/K₁ + y/K₃),   y′ = r₂y(1 + x/K₂ − y/K₄).
The constants K_j are all positive. For some parameter values, solutions of these
equations with x(0), y(0) ≥ 0 exist for all time; for other values, solutions blow
up in finite time. Examine the nullclines of this system and determine the
condition on the parameters that separates the two cases. Prove that the
solution exists for all time in the good case. Prove that the solution does blow
up in finite time when the nullclines suggest this possibility.
Remark: Note that each species has logistic growth that is enhanced
by the presence of the other.
10. For the Lotka-Volterra system (4.66), calculate (∂φ/∂b₂)(t, b₁, 0).
11. (a) Use separability to show that the solution of a scalar IVP
dx/dt = f(x),   x(0) = b
satisfies
F(x) = t + F(b)   (4.74)
where F(x) is an anti-derivative of 1/f(x).
(b) Differentiate (4.74) to show that
∂x/∂b = f(x)/f(b).
(c) Show that your answer to Part (b) satisfies the appropriate IVP for (4.65);
i.e.,
v′ = f′(x(t)) v,   v(0) = 1.
Discussion: This problem provides an independent check on (4.65).
A comment about notation: it might be more precise to use the
flow notation φ throughout the above problem, but when explicit
calculations are involved we usually find it more intuitive to use x
instead. Note that what we are calling x depends on both t and b,
and we are inconsistent about how many, if any, of the arguments of
x we choose to write. Again, we find this vagueness is helpful when
finding explicit solutions.
12. (a) Solve the IVP for the logistic equation with constant harvesting
dx/dt = x(1 − x) − μ,   x(0) = b.
(b) Compute the derivative ∂x/∂μ as a function of t, μ, and b.
(c) Verify that Theorem 4.5.4 gives the same answer as Part (b).
(d) Allowing harvesting to be time-dependent,
dx/dt = x(1 − x) − μf(t),   x(0) = b,
find ∂x/∂μ as a function of t for the special values μ = 0 and b = 1.
Discussion: Part (a) repeats Exercise 3(c) from Chapter 1. This ex-
ercise, which involves some messy calculations, has limited intrinsic
interest, but it has some pedagogical value: it helps make Theo-
rem 4.5.4 more concrete, it provides an independent check on the
formula of the theorem, and Part (d) invites the reader to generalize
the theorem to nonautonomous equations. As in the previous exer-
cise, we avoid the use of the flow notation φ and are sloppy about
indicating arguments of dependent variables.
13. The least painful way to prove Theorem 4.5.5 is to extend the order notation.
For any exponent p, define the concepts f(ε) = O(ε^p) and f(ε) = o(ε^p) with
the obvious modifications of the definitions in Section 4.5.2.
(a) Verify the generalization of (4.55):
(a) If f(ε) = O(ε^p) and g(ε) = o(ε^q), then f(ε)g(ε) = o(ε^{p+q}).
(b) If f(ε) = o(ε^p) and if ψ(ε) = O(ε^q) where q > 0, then f(ψ(ε)) = o(ε^{pq}).   (4.75)
(b) Show that a function of one variable f(t) is of class C^k iff
f(t + Δt) − Σ_{ℓ=0}^{k} (1/ℓ!) (d^ℓf/dt^ℓ)(t) (Δt)^ℓ = o((Δt)^k),
uniformly for t in compact sets. Extend this to an analogous result for functions
of several variables.
(c) Prove Theorem 4.5.5.
Discussion: It may be that the principal obstacle to Part (c) of the
Exercise is finding adequate notation for the higher derivatives of φ,
especially with respect to the initial conditions b. Struggle with this
part, or maybe with the whole exercise, for as long as you find it
instructive to do so, but no longer.
4.6.2 Exercises referenced elsewhere in this book
14. Let r(t, b), θ(t, b) be the solution of the IVP
r′ = f(r, θ)(r − 1),   r(0) = b
θ′ = 1,   θ(0) = 0.
Calculate (∂r/∂b)(2π, 1).
Discussion: Note that r(t, 1) ≡ 1 and that for all initial conditions b
the second component satisfies θ(t, b) = t. The solution with initial
conditions b = 1 traces out the unit circle (see Figure 4.10). The
above derivative may be interpreted as an approximate answer to
the following question: if a solution of this system starts out at a
point on the x-axis near r = 1, what will the value of r be when the
solution next crosses the (positive) x-axis? This apparently academic
exercise will have some value in Chapter 6 illustrating the Poincaré
map.
Figure 4.10: Schematic illustration of the dynamics of the system in Exercise 14.
Note that r(2π, b) ≈ 1 + (∂r/∂b)(2π, 1)(b − 1), assuming b is close to 1 as sketched.
The sample solution trajectory strays further from the [dashed] unit circle where
f(r, θ) > 0 (shaded wedge region) and becomes closer to the unit circle where
f(r, θ) < 0.
4.6.3 Computational exercises
Discussion: When seeking to understand an ODE through numerical solutions, it
usually is more efficient to rely on existing software than to write a routine from
scratch. However, even when using software, it is helpful to have some more direct
experience with writing numerical routines yourself, which is one objective of the
following two exercises. Another objective is to hint at how Euler's method may be
improved.
15. (a) Apply Euler's method to solve
x′ = x − y − (x² + y²)x
y′ = x + y − (x² + y²)y   (4.76)
for 0 ≤ t ≤ 10. Choose various initial conditions and mesh sizes h = 10^{−n/2},
n = 2, 3, ..., 10. Note that (4.76) may be solved explicitly (see Exercise 3(e)) be-
cause it simplifies when written in polar coordinates. Compare your numerical
solution with the exact solution; specifically, make a log-log plot of the errors
in x(10) and y(10) as a function of h over this range.
Discussion: As mentioned in the text, errors in Euler's method are
painfully large. The simplest method that achieves better accuracy is
the so-called improved Euler method. In this method, the basic step
of the iteration, advancing from y_n, an approximation of x(nh), to
y_{n+1}, is a two-stage process:
(a) y_{n+1/2} = y_n + (h/2) F(y_n)
(b) y_{n+1} = y_n + h F(y_{n+1/2}).   (4.77)
After y_{n+1} is computed, y_{n+1/2} is discarded.
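A minimal implementation of the two-stage step (4.77) in Python (our own sketch;
the test problem x′ = x is chosen so the exact solution eᵗ is available):

    import numpy as np

    def improved_euler(F, b, h, n_steps):
        """Apply the two-stage step (4.77) n_steps times, starting from y_0 = b."""
        y = np.atleast_1d(np.asarray(b, dtype=float))
        for _ in range(n_steps):
            y_half = y + (h / 2.0) * F(y)     # stage (4.77a): estimate at midstep
            y = y + h * F(y_half)             # stage (4.77b): midpoint-rule update
        return y

    # Test on x' = x, x(0) = 1, whose exact solution is e^t.
    F = lambda y: y
    for h in [0.1, 0.05, 0.025]:
        y_end = improved_euler(F, 1.0, h, int(round(1.0 / h)))[0]
        print(f"h = {h:6.3f}: error at t=1 is {abs(y_end - np.e):.2e}")
    # Halving h cuts the error by about 4, consistent with O(h^2) accuracy.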
(b) Repeat Part (a) using the improved Euler method. Again plot the error
in your numerical solution as a function of h.
Discussion: To motivate this method, as in Section 4.8.2, we focus on
the first step of the iteration, calculation of y₁ ≈ x(h). The improved
Euler method is based on the integral-equation formulation,
x(h) = x(0) + ∫₀ʰ F(x(s)) ds.
Equation (4.77b) represents a one-term Riemann-sum approximation
of the integral,
∫₀ʰ F(x(s)) ds ≈ h F(x(h/2)),   (4.78)
but with the integrand evaluated at the center of the interval. As
illustrated in Figure 4.11, this midpoint rule is substantially more ac-
curate than the left-endpoint rule on which the (unimproved) Euler
method is based. Equation (4.77a) makes an estimate for x(h/2) to
be used in (4.78). Of course y_{n+1/2} ≠ x(h/2), and this is yet another
source of error in the approximation. However, the key point is this:
in (4.78), F(x(h/2)) is multiplied by the small factor h, and because
of this, the errors generated by substitution of y_{n+1/2} into (4.78) are
no larger than those inherent in the midpoint-rule approximation.
It may be shown that, for small h, the improved Euler method sat-
isfies the estimate
|x(nh) − y_n| ≤ Ch² e^{Lnh}.
(For a proof, see Atkinson [1]; the proof does not comfortably fit
the Gronwall-based proof of Theorem 4.8.1.) Notice that the error is
O(h²), a substantial improvement over the original Euler method if h
is small. Based upon the exponents of h in these error estimates, we
say that the [original] Euler's method is first-order and the improved
Figure 4.11: The areas of the shaded rectangles are the left-endpoint (Euler) and
midpoint approximations of ∫₀ʰ F(x(s)) ds.
Euler method is second-order. Variants of the more sophisticated
numerical methods ode45 and Runge-Kutta-Fehlberg's rkf45, both
accessible in the Matlab® software package, are fourth-order (errors
on the order of h⁴).
16. Another weakness of Euler's method is that it performs poorly for stiff ODEs.
There is no rigorous definition of stiffness, but typically this behavior occurs
when different components of a system of ODEs evolve at radically different
rates. Here is an academic example of such a system:
(a) x′ = y
(b) y′ = −x − (2x² − z)y
(c) z′ = −M(z − 1 − x²)   (4.79)
where M is a large number. We invite the reader to attack (4.79) numerically
before we attempt to explain further.
(a) Solve (4.79) with Euler's method for M = 10³ and M = 10⁵. Warning:
You will find that h needs to be rather small for Euler's method to perform
appropriately.
Discussion: Some insight into stiffness may be gleaned from the
scalar IVP x′ = −Mx, x(0) = 1, where M > 0 is large. The exact
solution to this IVP is a rapidly decaying exponential function x(t) =
e^{−Mt}. Using Euler's method to generate the approximations y_n ≈
x(nh), we compute
y_{n+1} = y_n + h(−My_n) = (1 − Mh)y_n,
from which it follows that y_n = (1 − Mh)ⁿ for each n ≥ 0. Unless
h < 2/M, the iterates y_n do not tend to zero! Indeed, if h > 2/M,
then the sequence {y_n} grows exponentially, with consecutive ele-
ments alternating in sign: behavior that is totally unrelated to the
true solution. Euler's method is guaranteed to converge if h is suffi-
ciently small, but in this case, because of the rapid evolution in the
ODE x′ = −Mx, h₀ must be exceedingly small.
Stiffness occurs when, as in (4.79), an equation with rapid evolution
accompanies others with modest evolution rates. Thus, the restric-
tion on h required by (4.79c) means that prohibitively many steps
are required in order to solve (4.79a,b) for x(t) over an interval of any
appreciable length.
Since M is large, we might consider the approximation M → ∞,
in which case (4.79c) is transformed into an algebraic equation z =
1 + x², and substitution of this into (4.79b) reduces the original
three-dimensional system to the van der Pol system, which is two-
dimensional and has no stiffness problem.
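The instability threshold h = 2/M is easy to exhibit in a few lines of Python (a
sketch of our own, using the scalar model problem from the discussion):

    import numpy as np

    def euler_final(M, h, t_final=0.1):
        """Euler's method on x' = -Mx, x(0) = 1; returns the final iterate."""
        y, n = 1.0, int(round(t_final / h))
        for _ in range(n):
            y = (1.0 - M * h) * y      # the Euler recursion y_{n+1} = (1 - Mh) y_n
        return y

    M = 1000.0
    for h in [0.5 / M, 1.9 / M, 2.1 / M]:   # below, near, and above the threshold 2/M
        print(f"h*M = {h*M:4.1f}: final iterate = {euler_final(M, h):.3e}")
    # For h*M < 2 the iterates decay like the true solution e^{-Mt};
    # for h*M > 2 they grow with alternating signs, a purely numerical artifact.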
(b) Compare your solution in Part (a) with solutions obtained from the ap-
proximation M → ∞.
4.6.4 Exercises of independent interest
17. Give an example of an ODE for which one solution blows up in finite time but
perturbed solutions exist for all time:
x′ = x² − y²
y′ = 2xy.
The orbits of this system are the x-axis and the circles x² + (y − C)² = C².
On the x-axis the first equation reduces to x′ = x², whose solutions blow up in
finite time, while the nearby circular orbits are periodic and hence exist for all
time. The claim about the orbits can be proved in two ways: (1) write the
equation of the circles as
2C = x²/y + y
and differentiate; or (2) solve the ODE
dy/dx = 2xy/(x² − y²)
to obtain the above equation, with C as a constant of integration.
Hint: Multiply the equation by x² − y² and divide by y² to make it exact.
18. Prove global existence for
x′ = y
y′ = −βy − [1 + εω² cos(ωt)] sin x,
the model introduced in Exercise 11 for a pendulum vibrated at its base.
4.7 Additional notes
While Theorem 4.4.1 gives control of the solution of an IVP for any finite time, it
does not give control for infinite time. Indeed, as we will see in Chapter 5,
lim_{t→∞} φ(t, b)
may be discontinuous in b.
4.8 Appendix: Eulers method
4.8.1 Introduction
In practice, it is rarely possible to produce explicit solutions of ODEs, so one often
resorts to the use of numerical methods⁵ in order to approximate the solution of
an IVP. Indeed, on several occasions above we have already used software for this
purpose. In this section we introduce the simplest numerical method, known as
Euler's method. We do not propose to actually use this method to learn about
solutions of ODEs; the software employs methods that are far more accurate than
this, and their automated control of step size makes them a joy to use. Rather,
we study Euler's method for cultural reasons: i.e., it provides useful insight into
numerical methods in general, while its simplicity allows the conceptual issues to
come through more easily.
Euler's method is an iterative process for approximating the solution of (4.1), say
on a set of evenly spaced⁶ t-values. Given h > 0, for n = 0, 1, ..., we calculate the
approximation y_n for x(nh) recursively according to the rule
y₀ = b;   y_{n+1} = y_n + hF(y_n),   n = 0, 1, ....   (4.80)
⁵ Perturbation methods, some of which are discussed in Sections 6.3 and 6.4 (among other places),
offer another valuable way of approximating solutions.
⁶ Actually, we assume equal spacing only for simplicity; this is not necessary.
Although y_n depends on h, we follow the usual convention of not indicating this
dependence.
As an illustration, consider the scalar IVP x′ = x, x(0) = 1, which has exact
solution x(t) = eᵗ. Let us approximate the solution on the interval t ∈ [0, 1] with
Euler's method, say using a step size of h = 1/N where N is a positive integer.
Starting from y₀ = 1, we use (4.80) to generate the subsequent iterates recursively:
y_{n+1} = y_n + hy_n = (1 + 1/N) y_n,   (n = 0, 1, ..., N − 1)
so y_n = (1 + 1/N)ⁿ. In particular,
y_N = (1 + 1/N)^N → e = x(1)
as N → ∞ (and thus h → 0). In other words, the approximation works as desired for
the point t = 1. More generally, y_n provides an approximation for eᵗ for all t ∈ [0, 1],
but the formulation of this behavior is made slightly awkward by the following two
issues: (i) the number of iterations needed to reach time t increases as 1/h as h → 0,
and (ii) any specific time t need not belong to the set of grid points {nh : n = 0, 1, ...}
for which the approximations are computed. Specifically, the convergence result for
this example guarantees that
lim_{h→0} max_{0≤nh≤1} |e^{nh} − y_n| = 0.
A convergence result for the general case is given in Theorem 4.8.1 below.
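The recursion (4.80) translates directly into code. The following Python sketch
(ours, not from the text) runs Euler's method on the illustration above and reports
max_{0≤nh≤1} |e^{nh} − y_n| for several step sizes; the error shrinks roughly like h.

    import numpy as np

    def euler(F, b, h, n_steps):
        """Euler's method (4.80): y_0 = b, y_{n+1} = y_n + h F(y_n)."""
        ys = [b]
        for _ in range(n_steps):
            ys.append(ys[-1] + h * F(ys[-1]))
        return np.array(ys)

    F = lambda y: y                      # the test problem x' = x, x(0) = 1
    for N in [10, 100, 1000]:
        h = 1.0 / N
        ys = euler(F, 1.0, h, N)
        t = h * np.arange(N + 1)
        err = np.max(np.abs(np.exp(t) - ys))
        print(f"h = {h:7.4f}: max error on [0,1] = {err:.2e}")
    # Dividing h by 10 divides the error by about 10: first-order convergence.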
4.8.2 Theoretical basis for the approximation
We offer three motivations⁷ for Euler's method. All three motivations begin with
the limited goal of understanding the first step in (4.80), which we may rephrase as
x(h) ≈ x(0) + hF(x(0)).   (4.81)
(For simplicity, we assume temporarily that (4.1) is a scalar equation.)
Method 1: (Tangent line) Interpreting the derivative geometrically (see Figure 4.12),
we see from the ODE that the slope of the solution curve through (0, x(0)) equals
F(x(0)). Thus, we may estimate x(h) by following the tangent line, resulting in the
approximation (4.81).
⁷ As regards Euler's method, all three motivations produce the same approximation, but different
advanced numerical methods may result from starting with one or another of these three points
of view.
Figure 4.12: Schematic illustration of one iteration of Euler's method. Note the
discrepancy between the exact solution x(h) and its Euler's method approximation
y₁.
Method 2: (Finite differences) Using the difference quotient approximation
[x(h) − x(0)] / h ≈ x′(0) = F(x(0)),
we again obtain (4.81).
Method 3: (Integral equation) Reformulating the IVP as an integral equation,
x(h) = x(0) + ∫₀ʰ F(x(s)) ds,
we derive (4.81) from a one-term Riemann-sum approximation for the integral.
The continuation of Euler's method may seem like an act of desperation. It is
extremely unlikely that the point (h, y₁) will lie on the exact solution curve (see
Figure 4.12). Nevertheless, it is the best information we have about the solution.
Therefore, we will use that point as the starting point for another iteration of Euler's
method: i.e., we let y₂ equal the Euler approximation to the solution of x′ = F(x)
through the point (h, y₁). All subsequent steps are derived similarly. One may well
wonder about an approximation in which each step is based on increasingly faulty
information, especially since as h → 0 more and more steps are required to advance
a finite time. In fact, as we show in the next section, the accumulated error in the
numerical solution tends to zero with h.
4.8.3 Convergence of the numerical solution

If F is defined everywhere, then the definition (4.80) of y_n remains meaningful for arbitrarily large n, even if the solution x that is being approximated blows up in finite time. However, if F is defined only on a subset U ⊂ R^d, then some iterate y_n may lie outside U, so the iteration would halt. This possibility is addressed in Conclusion (i) of the following theorem, which has much in common with Theorem 4.4.1.

Theorem 4.8.1. Suppose F : U → R^d is locally Lipschitz, and let x(t), 0 ≤ t < β, be a solution in forward time of x′ = F(x) with initial condition x(0) = b. Then:

(i) For any positive T that satisfies T < β, there exists a positive constant h_0 such that, if h < h_0, the iterates y_n are defined for all n such that nh ≤ T.

(ii) There are constants C, L such that if h < h_0, then

    |x(nh) − y_n| ≤ C h e^{Lnh},   for 0 ≤ nh ≤ T.

Remark: Note that Conclusion (ii) implies the uniform estimate |x(nh) − y_n| ≤ C h e^{LT}.
Proof. Choose a compact subset K ⊂ U and a constant ε > 0 such that

    (∀ t ∈ [0, T])   B̄(x(t), ε) ⊂ K ⊂ U.

Regarding Conclusion (ii), we let C = max_K |F| and we let L be a Lipschitz constant for F|K; regarding Conclusion (i), we let h_0 = ε e^{−LT}/C. We compute y_n for as many iterations as nh ≤ T and y_n ∈ K, say n ≤ N. Note that y_N ∈ K, so it is possible to calculate at least one more iterate, y_{N+1}. We shall prove that if (N + 1)h ≤ T, then y_{N+1} ∈ K. Thus, the iteration stops only because nh exceeds T.

Now the solution x satisfies the integral equation

    x(t) = b + ∫_0^t F(x(s)) ds.

In order to derive an analogous equation for the approximate solution, we construct a piecewise constant function that is defined for (continuous) t ∈ [0, (N + 2)h):

    y^{(h)}(t) = y_n   for nh ≤ t < (n + 1)h.

We claim that at the grid points

    y^{(h)}(nh) = b + ∫_0^{nh} F(y^{(h)}(s)) ds,   n = 0, 1, 2, . . . , N + 1,

which may be verified by induction since

    ∫_{nh}^{(n+1)h} F(y^{(h)}(s)) ds = hF(y_n),

the integrand being constant. More generally, between the grid points, if nh ≤ t < (n + 1)h,

    y^{(h)}(t) = b + ∫_0^t F(y^{(h)}(s)) ds − ∫_{nh}^t F(y^{(h)}(s)) ds.

Suppose (N + 1)h ≤ T. For 0 ≤ t ≤ (N + 1)h, let g(t) = |x(t) − y^{(h)}(t)|. Subtracting integrals, we deduce that

    g(t) ≤ ∫_0^t |F(x(s)) − F(y^{(h)}(s))| ds + ∫_{nh}^t |F(y^{(h)}(s))| ds,   0 ≤ t ≤ (N + 1)h,

where n is the largest integer such that nh ≤ t. Note that in the integrands x(s), y^{(h)}(s) ∈ K. By the definition of C the second term here satisfies

    ∫_{nh}^t |F(y^{(h)}(s))| ds ≤ Ch,

and by Lipschitz continuity the first satisfies

    ∫_0^t |F(x(s)) − F(y^{(h)}(s))| ds ≤ L ∫_0^t g(s) ds.

Thus by Gronwall's inequality (more properly, by the extension to piecewise continuous functions in Exercise 5 in Chapter 3),

    g(t) ≤ C h e^{Lt}.    (4.82)

In particular, taking t = (N + 1)h, we see that

    |y_{N+1} − x((N + 1)h)| ≤ C h e^{L(N+1)h} ≤ C h e^{LT} < ε,

so y_{N+1} ∈ K, which completes the proof.
Although the iterates produced by Euler's method converge to the true solution of the IVP as h → 0, the error estimate provided by Theorem 4.8.1 actually alludes to one of the method's weaknesses: i.e., the error is on the order of h to the first power. By contrast, the error in the Matlab® routine ode45 is of the order of h^4. Thus if the step size is halved, the error is decreased by a factor of 16! (Of course, typically h is chosen by the software, not the user, so this behavior is not readily apparent to the user.)
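The first-order behavior is easy to observe experimentally. The sketch below is our illustration, not part of the original text; it integrates x′ = x on [0, 1] with Euler's method and, for contrast, with SciPy's adaptive Runge-Kutta solver solve_ivp (a modern analogue of ode45). Halving h roughly halves Euler's error, while the high-order solver is accurate to near its tolerance.

    import numpy as np
    from scipy.integrate import solve_ivp

    def euler_final(h):
        """Return the Euler approximation y_N to x(1) for x' = x, x(0) = 1."""
        N = round(1.0 / h)
        y = 1.0
        for _ in range(N):
            y += h * y
        return y

    exact = np.exp(1.0)
    for h in [0.1, 0.05, 0.025]:
        print(h, abs(euler_final(h) - exact))   # errors shrink roughly linearly in h

    sol = solve_ivp(lambda t, x: x, (0.0, 1.0), [1.0], rtol=1e-8, atol=1e-10)
    print(abs(sol.y[0, -1] - exact))            # high-order RK: far smaller error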
Much effort has gone into devising highly accurate numerical methods for solving ODEs. While this subject lies outside the scope of the present text, Exercises 15 and 16 hint at ways Euler's method can be improved. For a more thorough presentation, see Atkinson [1].
Chapter 5

Trajectories near equilibria

From here on in this book, unless otherwise stated, we shall assume in the ODE x′ = F(x) that the function F is C¹.
5.1 Stability of equilibria

A point b* ∈ R^d is called an equilibrium point for an ODE x′ = F(x) if at this point F(b*) = 0. If b* is an equilibrium point for an ODE, then x(t) ≡ b* is a solution of the equation. For a linear homogeneous equation x′ = Ax, the origin b* = 0 is always an equilibrium, and it is the only equilibrium if A is invertible.

Regarding nonlinear equations, consider the 2 × 2 system derived from Duffing's equation without forcing,

    x′ = y
    y′ = x − x³ − βy,    (5.1)

which has equilibrium points (0, 0) and (±1, 0); these correspond to the local maximum and the two global minima, respectively, of the potential function V(x) = −x²/2 + x⁴/4. Similarly, we saw in Section 4.3.1 that the activator-inhibitor system

    (a) x′ = σx²/[(1 + r)(1 + x²)] − x
    (b) r′ = γ[ρx²/(1 + x²) − r]    (5.2)

has multiple equilibria if σ² > 4(ρ + 1). Indeed, multiple equilibria are the more typical behavior for nonlinear equations.
5.1.1 The main theorem

Recall from Theorem 2.4.1 that for a linear homogeneous equation x′ = Ax, if the eigenvalues of the coefficient matrix satisfy

    ℜλ_j(A) < 0,   j = 1, 2, . . . , d,

then every solution x(t) of the equation decays to zero as t → ∞. The following theorem, a major and quite beautiful result, asserts that one may deduce similar behavior for solutions of a nonlinear equation x′ = F(x) near an equilibrium point b*, provided the eigenvalues of DF(b*), the differential of F at the equilibrium, satisfy this condition. Here and below we shall abbreviate DF(b*) to DF*.

Theorem 5.1.1. Suppose b* is an equilibrium point for x′ = F(x), where F is C¹, and assume that

    ℜλ_j(DF*) < 0,   j = 1, 2, . . . , d.    (5.3)

Then there is a neighborhood V of b* in R^d such that for any initial data b ∈ V, the IVP

    x′ = F(x),   x(0) = b    (5.4)

has a solution for all t ≥ 0, and moreover lim_{t→∞} x(t) = b*.

Remark. The theorem includes the linear case, because if F(x) = Ax, then at any point x the differential DF(x) = A. In general, if b* is an equilibrium of x′ = F(x) and if A = DF(b*), we shall call the equation y′ = Ay the linearization of x′ = F(x) at b*.
Proof of Theorem 5.1.1. Making an appropriate translation in R^d, we may assume without loss of generality that the equilibrium b* is located at the origin: i.e., F(0) = 0. Let's expand F in a Taylor series at the origin: F(x) = Ax + r(x), where the constant term is missing since F(0) vanishes, in the linear term A = DF(0), and, using the order-terminology of Section 4.5.2, the remainder r = o(|x|). We may rewrite the IVP (5.4) as

    x′ − Ax = r(x(t)),   x(0) = b.    (5.5)

We interpret (5.5) as a linear equation with constant coefficients with the inhomogeneous term r(x(t)). As we saw in (2.39), a solution of (5.5) satisfies the integral equation

    x(t) = e^{At} b + ∫_0^t e^{(t−s)A} r(x(s)) ds.    (5.6)

Like (3.12), used in proving the existence theorem, (5.6) appears to give a formula for the solution of the IVP, but actually it is only an integral equation that characterizes the solution. The advantage of (5.6) over (3.12) is that the assumption on the eigenvalues of A implies that e^{At} tends to zero as t → ∞, assisting the convergence of the integral for large t. Specifically, according to Proposition 2.4.2 there are constants K, β, where K ≥ 1 and β > 0, such that

    ‖e^{At}‖ ≤ K e^{−βt},   t ≥ 0.    (5.7)

Choose a positive constant ε < β/K. Since r(x) = o(|x|), there is a δ > 0 such that if |x| < δ, then |r(x)| < ε|x|.

Claim: If |x(0)| = |b| < δ/K, then the solution of (5.5) satisfies |x(t)| < δ for as long as it exists.

Proof. With the same β as in (5.7), let g(t) = e^{βt}|x(t)|. We seek to control the growth of g(t) as t → ∞ in order to prove that x decays. Now (5.6) implies that

    g(t) ≤ e^{βt}|e^{At}b| + e^{βt} ∫_0^t |e^{(t−s)A} r(x(s))| ds.

Applying (5.7) to estimate the exponentials in each term, we conclude that

    g(t) ≤ K|b| + K ∫_0^t e^{βs} |r(x(s))| ds.    (5.8)

Let us derive a contradiction by assuming that there is a time t* such that |x(t)| < δ for t < t* while |x(t*)| = δ. Then for t ≤ t* we may estimate the second term of (5.8):

    K ∫_0^t e^{βs} |r(x(s))| ds ≤ Kε ∫_0^t e^{βs} |x(s)| ds = Kε ∫_0^t g(s) ds.

Hence by Gronwall's Lemma, g(t) ≤ K|b| e^{Kεt}. Recalling the definition of g, we conclude that

    |x(t*)| = e^{−βt*} g(t*) ≤ K|b| e^{(Kε−β)t*}.

But K|b| < δ, and e^{(Kε−β)t*} < 1 by our choice of ε. Thus |x(t*)| < δ, contradicting our assumption above, and this proves the claim.

Proof of Theorem 5.1.1 concluded. Let V = {b ∈ R^d : |b| < δ/K}. By the claim, the solution of (5.4) stays inside B̄(0, δ) for as long as it exists, and it follows from Theorem 4.1.2 that the solution exists for all positive time. Moreover |x(t)| ≤ K|b| e^{(Kε−β)t}, so x tends to zero as t → ∞.

The following corollary of Theorem 5.1.1 makes the convergence to the equilibrium more quantitative.
Corollary 5.1.2. If in Theorem 5.1.1 the eigenvalues satisfy

    ℜλ_j(DF*) < −β*,   j = 1, 2, . . . , d,    (5.9)

where β* > 0, then V may be chosen with the property that there is a constant K such that for all b ∈ V, the solution of the IVP satisfies

    |x(t)| ≤ K e^{−β*t} |b|,   t ≥ 0.    (5.10)

This result may be derived by exercising a little more care in the proof of Theorem 5.1.1, a task we ask you to complete in the Exercises.
5.1.2 An illustrative example

Let us illustrate these ideas by applying them to Duffing's equation (5.1), which describes motion in the double-well potential V(x) = −x²/2 + x⁴/4. At the two minima of V we have

    DF(±1, 0) = [0  1; −2  −β].

To test whether the eigenvalues have negative real parts, we compute

    det DF = +2 > 0,   tr DF = −β < 0,

where we have assumed β > 0: i.e., normal friction. Hence by Proposition 2.4.4, the eigenvalues of DF both have negative real parts, and so Theorem 5.1.1 applies to the equilibria (±1, 0). Of course the term potential well suggests this behavior; really the point of this example is more about checking the theorem than gaining new information.
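As a quick numerical check (our addition, not part of the original text), one can confirm directly that both eigenvalues of DF(±1, 0) have negative real parts for a representative friction coefficient β > 0:

    import numpy as np

    beta = 0.25                                  # sample friction coefficient, beta > 0
    DF = np.array([[0.0, 1.0],
                   [-2.0, -beta]])               # Jacobian of (5.1) at (+1, 0) and (-1, 0)
    eigvals = np.linalg.eigvals(DF)
    print(eigvals)                               # both real parts negative
    print(np.linalg.det(DF), np.trace(DF))       # det = +2 > 0, trace = -beta < 0
    assert np.all(eigvals.real < 0)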
Examples that are more interesting will be studied below.
5.2 An orgy of terminology

Researchers have introduced a bewildering variety of terminology related to the hypotheses and conclusion of Theorem 5.1.1. Let's get through this, because these concepts will help us to describe the phenomena more precisely.
5.2.1 Description of behavior near equilibria

(a) The terminology: An equilibrium b* of a system x′ = F(x) is called Liapunov stable if for every neighborhood V of b* in R^d, there is a smaller neighborhood V₁ such that if initial data b are restricted to belong to V₁, then the IVP (5.4) is solvable for all positive times and moreover x(t) ∈ V for all t ≥ 0. The equilibrium b* is called asymptotically stable if it is Liapunov stable and if there is one neighborhood V′ of b* such that for all initial data in V′ the solution of (5.4) satisfies

    lim_{t→∞} x(t) = b*.    (5.11)

The results of Section 5.1 imply that b* is asymptotically stable if condition (5.3) is satisfied; note that we must invoke Corollary 5.1.2 to conclude that b* is Liapunov stable.

Unstable will mean the negation of Liapunov stability: i.e., there is some neighborhood V of b* in R^d such that for every smaller neighborhood V₁ there are initial conditions b ∈ V₁ such that the solution to the IVP (5.4) leaves V at some finite, positive time.

Two examples may explain the need for the careful language used in the definitions of stability and asymptotic stability. First, one might think that if property (5.11) held, then b* was surely Liapunov stable. This is false, as shown by the following 2 × 2 system, which we write in polar coordinates:

    r′ = r − r³
    θ′ = 1 − cos θ.    (5.12)

Trajectories for this system are illustrated in Figure 5.1. As may be seen from the figure, all solutions of this system converge to (r, θ) = (1, 0) as t → ∞. However, a trajectory that starts at a point (r, θ) = (1, ε), where ε is small and positive, proceeds all the way around the circle before it converges to (r, θ) = (1, 0); in particular, it leaves the ball around (r, θ) = (1, 0) of radius 1/2.
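This looping behavior is easy to see numerically. The sketch below is our illustration (the integrator, step size, and starting point are our choices); it integrates (5.12) from a point just above the equilibrium and tracks the planar distance from (1, 0). The trajectory wanders a distance of about 2 away before finally converging.

    import numpy as np

    # Simulate (5.12) with a fixed-step RK4 integrator.
    def rhs(u):
        r, th = u
        return np.array([r - r**3, 1.0 - np.cos(th)])

    def rk4_step(u, h):
        k1 = rhs(u); k2 = rhs(u + 0.5*h*k1)
        k3 = rhs(u + 0.5*h*k2); k4 = rhs(u + h*k3)
        return u + (h/6.0)*(k1 + 2*k2 + 2*k3 + k4)

    u, h, max_dist = np.array([1.0, 0.1]), 0.01, 0.0
    for _ in range(10_000):                      # integrate to t = 100
        u = rk4_step(u, h)
        x, y = u[0]*np.cos(u[1]), u[0]*np.sin(u[1])
        max_dist = max(max_dist, np.hypot(x - 1.0, y))
    print(u, max_dist)   # theta nears 2*pi; max distance from (1,0) is about 2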
Secondly, regarding the nested neighborhoods in the definition of Liapunov stable, consider the following linear 2 × 2 system:

    x′ = [−ε  1/M; −M  −ε] x,

where M is a large constant and ε > 0 a small one. For any constant C,

    x(t) = C e^{−εt} [cos t; −M sin t]

is a solution of this system. Even though they spiral into the origin, these orbits are very elongated, as shown in Figure 5.2. Suppose we are given a circular neighborhood V = {b ∈ R² : |b| < η} of the origin. We want to find another circular neighborhood V₁ = {b ∈ R² : |b| < δ} such that if x(0) ∈ V₁, then the trajectory remains confined to V. We have to choose δ < η/M because the orbits are so elongated. If we were smart enough to choose V₁ with a perfect shape adjusted to the orbits, we could arrange that x(t) ∈ V₁ for all t ≥ 0. In fact this is possible for linear systems (see Exercise ??), but completely impractical in general.

[Figure 5.1: The point (1, 0) is attracting but not Liapunov stable.]
(b) Unstable and borderline cases: In the Exercises we ask you to establish the following claim: If b* is an equilibrium of x′ = F(x) and if ℜλ_j(DF*) > 0 for some j, then b* is unstable. (Often much more specific information is available; see Section 5.5.) For example, regarding the equilibrium at the origin of Duffing's equation (5.1), we have det DF(0, 0) < 0. Thus the eigenvalues of DF have opposite signs, one of them being positive, so the origin is unstable. Of course this conclusion is entirely expected.

If at an equilibrium we have only ℜλ_j(DF*) ≤ 0, i.e., if the inequality (5.3) is not strict (Footnote 1), then no information can be deduced from the linearization. To justify this statement, consider the scalar ODE

    x′ = ±x³.    (5.13)

For either sign the origin is an equilibrium, and the lone eigenvalue of DF(0) vanishes. However, if the minus sign is chosen, the origin is asymptotically stable, while if the plus sign is chosen, it is unstable. Indeed, in the latter case, all solutions except x(t) ≡ 0 blow up in finite time, an extreme form of instability.

In Duffing's equation (5.1), if the friction coefficient β = 0, then the eigenvalues of DF at (±1, 0) are pure imaginary. Thus, in this case one cannot determine the stability of these equilibria from theory based on linearization. In fact, the equilibria are Liapunov stable but not asymptotically stable, as can easily be shown using a Liapunov function, a technique we will introduce in Section 5.4.

Footnote 1: For example, (5.12) suffers from this degeneracy.
[Figure 5.2: Choosing large M gives rise to an elongated spiral; the circular neighborhoods {b : |b| = η} and {b : |b| = δ} are indicated.]
5.2.2 Classification of eigenvalues of 2 × 2 Jacobians

Many authors use the term hyperbolic to describe an equilibrium b* of an ODE x′ = F(x) such that

    ℜλ_j(DF*) ≠ 0,   j = 1, . . . , d.    (5.14)

We regard this terminology as unfortunate, since the word hyperbolic already has so many uses in mathematics, but it is well established and we will also use it. The adjective hyperbolic derives from the simplest ODE with such an equilibrium,

    [x′; y′] = [0  1; 1  0] [x; y],

whose solutions move along the hyperbolas x² − y² = C, where C is a constant; in general, however, orbits near a hyperbolic equilibrium have only a weak, qualitative resemblance to hyperbolas.

In two dimensions there is an extensive vocabulary describing the eigenvalues of the Jacobian at an equilibrium. In Table 5.1 we have listed many terms classifying such equilibria that are hyperbolic. It will be useful below to have these available while considering specific examples, and we recommend that, during some captive time like on a long flight, you commit them to memory. Along with the memorization, you should make phase-plane plots for a linear ODE with a node, with a focus, and with a saddle. (Strictly speaking, there are three qualitatively different cases for a node, according as (i) λ₁ ≠ λ₂, (ii) λ₁ = λ₂ with Jacobian equal to λ₁I, or (iii) λ₁ = λ₂ with the Jordan normal form of the Jacobian being a 2 × 2 block.)
Name    | Eigenvalues of Jacobian                     | Characterizing inequalities
Node    | Both eigenvalues real, of same sign         | (tr A)² > 4 det A > 0
        |   (may be stable or unstable)               |
Focus   | Eigenvalues complex conjugates, ℜλ ≠ 0      | 0 < (tr A)² < 4 det A
        |   (may be stable or unstable)               |
Saddle  | Both eigenvalues real, of opposite sign     | det A < 0
        |   (unstable)                                |
Sink    | Both eigenvalues in LHP, ℜλ < 0             | tr A < 0, det A > 0
        |   (stable; may be node or focus)            |
Source  | Both eigenvalues in RHP, ℜλ > 0             | tr A > 0, det A > 0
        |   (unstable; may be node or focus)          |

Table 5.1: Types of hyperbolic equilibria in two dimensions.
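The trace-determinant tests in Table 5.1 translate directly into a small classifier. The following sketch is our illustration, not part of the original text; it labels a hyperbolic 2 × 2 equilibrium from tr A and det A:

    import numpy as np

    def classify(A, tol=1e-12):
        """Classify a hyperbolic equilibrium of x' = Ax, per Table 5.1."""
        tr, det = np.trace(A), np.linalg.det(A)
        if det < -tol:
            return "saddle"
        disc = tr**2 - 4*det               # discriminant of the characteristic polynomial
        kind = "node" if disc > tol else "focus"
        return ("stable " if tr < 0 else "unstable ") + kind

    print(classify(np.array([[0, 1], [1, 0]])))         # saddle
    print(classify(np.array([[-1, 0], [0, -3]])))       # stable node
    print(classify(np.array([[-0.1, 1], [-1, -0.1]])))  # stable focus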
5.2.3 Two-dimensional equilibria and slopes of nullclines

Consider an equilibrium (x*, y*) of a two-dimensional system

    [x′; y′] = [F₁(x, y); F₂(x, y)]    (5.15)

at which det DF* ≠ 0. Because the determinant is non-vanishing, both gradients ∇F_j(x*, y*), j = 1, 2, are nonzero, so both nullclines are nonsingular curves. Note that the slope of the x-nullcline is the quotient −(∂F₁/∂x)/(∂F₁/∂y), interpreted as ∞ if the denominator vanishes, and similarly for the slope of the y-nullcline.

Multiply and divide det DF* by the product (∂F₁/∂y)(∂F₂/∂y) to deduce

    det DF* = (∂F₁/∂y)(∂F₂/∂y) [ (∂F₁/∂x)/(∂F₁/∂y) − (∂F₂/∂x)/(∂F₂/∂y) ],    (5.16)

provided (∂F₁/∂y)(∂F₂/∂y) ≠ 0, i.e., provided neither nullcline has infinite slope. If either ∂F₁/∂y or ∂F₂/∂y vanishes, then only one of the two terms in det DF* is nonzero, so the sign of det DF* may be determined by inspection; hence we assume that (∂F₁/∂y)(∂F₂/∂y) ≠ 0.

Thus (5.16) represents det DF* as a factor times the difference in the slopes of the nullclines. This observation allows one to determine the sign of det DF* from the slopes of the nullclines, as given in Table 5.2. (Don't forget the minus sign in the formula for the slope.) Of course the equilibrium is a saddle if det DF* < 0; and it is a sink, a source, or nonhyperbolic according as tr DF* is negative, positive, or zero, respectively.
Signs of ∂F₁/∂y, ∂F₂/∂y | Slope of x-nullcline larger | Slope of y-nullcline larger
Same                    | det DF* < 0                 | det DF* > 0
Opposite                | det DF* > 0                 | det DF* < 0

Table 5.2: Types of equilibria and slopes of nullclines.
5.2.4 The Hartman-Grobman Theorem

Suppose a two-dimensional system x′ = F(x) has a hyperbolic equilibrium of one of the above types. A little computation with examples offers convincing evidence that the flow of the nonlinear system near the equilibrium qualitatively resembles the flow of the linearization. The resemblance is made precise in the Hartman-Grobman Theorem, which states that the flow of x′ = F(x) is topologically conjugate to that of its linearization. More precisely:

Theorem 5.2.1. Suppose b* is a hyperbolic equilibrium point for x′ = F(x), let φ be the flow map for this equation, and let A = DF*. Then there exists a neighborhood V of b* and a continuous map ψ : V → R^d, which is a homeomorphism onto its range, such that

    ψ(φ(t, b)) = e^{tA} ψ(b)    (5.17)

for all b ∈ V and all times t such that φ(t, b) is defined and belongs to V.

Note that the result asserts only the continuity of the homeomorphism. There are several different results giving additional conditions which guarantee that a smoother ψ exists. One of the more satisfying results is due to Hartman [need ref]: if the equilibrium b* is asymptotically stable (i.e., ℜλ_j(DF*) < 0), there is a C¹ diffeomorphism ψ such that (5.17) is satisfied, provided F is C². However, all these issues are beyond the scope of this book.

[Draft notes on references: Perko proves the Hartman-Grobman theorem and states Hartman's improvement. Chicone proves it. Meiss also proves it. Wiggins refers to Arnold and to Palis-de Melo for a proof, and also gives an example showing that ψ in Hartman's theorem cannot be C².]
5.3 Activator-inhibitor systems and the Turing instability

In Section 4.3.1 we proved global existence in forward time for the system

    (a) x′ = σx²/[(1 + r)(1 + x²)] − x
    (b) r′ = γ[ρx²/(1 + x²) − r],    (5.18)

where σ, ρ, γ are positive parameters. (Also we briefly described the physical significance of the equations.) In the first subsection below we determine the equilibria of (5.18) and their stabilities. In the second, we present the Turing instability: i.e., we use Theorem 5.1.1 to show that, if the reaction takes place in several vessels, a stable equilibrium of (5.18) can be destabilized as a result of chemicals diffusing between vessels. This is a lovely application of the theorem, and it introduces some fascinating mathematics.
5.3.1 Equilibria of the activator-inhibitor system

The origin (0, 0) is an obvious equilibrium of (5.18). This system has the Jacobian

    DF = [ σ/(1+r) · 2x/(1+x²)² − 1,   −σx²/[(1+r)²(1+x²)] ;
           γρ · 2x/(1+x²)²,            −γ ],    (5.19)

which at the origin equals

    DF* = [−1  0; 0  −γ],

so by Theorem 5.1.1, (0, 0) is asymptotically stable.

Turning our attention to the other equilibria, we set the RHS of (5.18b) to zero and solve for r to obtain

    r = ρx²/(1 + x²),    (5.20)

and we process (5.18a) similarly: excluding the zero solution, we divide by x and rewrite the equation as

    r = σx/(1 + x²) − 1.    (5.21)

Now substitute (5.20) for r in (5.21), clear 1 + x² from the denominator, and rearrange, yielding

    (ρ + 1)x² − σx + 1 = 0.    (5.22)

This relation is graphed in Figure 5.3. Thus, (5.18) has two nontrivial equilibria if

    σ > 2√(ρ + 1),    (5.23)

and none if σ < 2√(ρ + 1). We assume that the parameters satisfy (5.23).
Let us investigate the stabilities of the two equilibria determined by (5.22), say E± = (x_±, r_±). As shown in Section 5.2.3, we may determine the sign of det DF* by comparing the slopes of the two nullclines. Note from (5.19) that ∂F₁/∂r and ∂F₂/∂r always have the same sign, both negative. Thus, by Table 5.2, the equilibrium E_− is a saddle point since, as we see in Figure 5.4, the x-nullcline has larger slope than the r-nullcline there. By contrast, E_+ is either a sink or a source; to determine which, we must compute the sign of tr DF*.

[Figure 5.3: Nonzero equilibria x_± of (5.18), plotted against σ for σ ≥ 2√(ρ + 1); the two branches meet at the value x = 1/√(ρ + 1) when σ = 2√(ρ + 1).]
The sign of tr DF* depends on the parameters σ, ρ, γ. According to (5.22), given ρ, either quantity σ or x_+ determines the other. The calculations are more convenient if we regard x_+, ρ, γ as the independent parameters and determine σ from (5.22):

    σ = (ρ + 1)x_+ + 1/x_+,    (5.24)

remembering from Figure 5.3 that the range of x_+ is bounded from below, x_+ > 1/√(ρ + 1). Now

    tr DF* = a₁₁ − γ,   where a₁₁ = 2/(1 + x_+²) − 1,    (5.25)

where we have invoked (5.21) in calculating a₁₁, the 1,1-entry of the Jacobian (5.19). Manipulating (5.25), we find that E_+ is asymptotically stable (i.e., a sink) if

    √((1 − γ)/(1 + γ)) < x_+.    (5.26)

If γ > 1, i.e., if the radical is complex, then E_+ is asymptotically stable for all x_+ in the physical range x_+ > 1/√(ρ + 1). If (5.26) is not satisfied, then E_+ is a source.

In the next subsection we focus on the parameter range

    max{ √((1 − γ)/(1 + γ)), 1/√(ρ + 1) } < x_+ < 1.    (5.27)

In this case (5.18) has three equilibria and the top equilibrium is a sink. Moreover, of particular interest to us, the inequality x_+ < 1 means that at this equilibrium the 1,1-entry of the Jacobian (5.19) is positive, as may be seen from the formula a₁₁ = 2/(1 + x_+²) − 1.
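These formulas are easy to sanity-check numerically. The sketch below is our addition; the parameter values σ = 4, ρ = 5/2, γ = 1/2 are illustrative choices satisfying (5.23), and the Greek-letter names follow our reconstruction of (5.18). It locates E± from the quadratic (5.22), builds the Jacobian (5.19), and reports the trace/determinant tests:

    import numpy as np

    sigma, rho, gamma = 4.0, 2.5, 0.5          # assumed sample parameters; (5.23) holds

    def jac(x, r):
        """Jacobian (5.19) of the activator-inhibitor system (5.18)."""
        return np.array([[sigma/(1+r) * 2*x/(1+x**2)**2 - 1,
                          -sigma * x**2 / ((1+r)**2 * (1+x**2))],
                         [gamma * rho * 2*x/(1+x**2)**2, -gamma]])

    disc = np.sqrt(sigma**2 - 4*(rho + 1))     # discriminant of (5.22)
    for x in [(sigma - disc)/(2*(rho + 1)), (sigma + disc)/(2*(rho + 1))]:
        r = rho * x**2 / (1 + x**2)            # equation (5.20)
        A = jac(x, r)
        print(x, np.linalg.det(A), np.trace(A))
    # E_-: det < 0, a saddle.  E_+: det > 0 and trace < 0, a sink.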
[Figure 5.4: Nullclines and equilibria E± of (5.18) for σ = 4 and ρ = 5/2. The r- and x-nullclines are given by (5.20) and (5.21), respectively. Note that this figure differs from the nullclines in Figure 4.2 because the present figure is drawn assuming that the parameters satisfy (5.27).]
5.3.2 The Turing instability: Destabilization by diffusion

The Turing instability may arise if an activator and an inhibitor react in a spatially extended environment in which the chemicals may diffuse. It is believed that, in morphogenesis, this mechanism underlies the formation of periodic structures like hair on mammals, gills on fish, feathers on birds, etc. The full description of the Turing instability requires both space and time, i.e., a PDE, which is beyond the scope of this book. However, the essential phenomenon occurs in a toy model that we study in this section.

To develop intuition about the effect of diffusion, let us consider a hypothetical scalar reaction, say modeled by r′ = 1 − r, that takes place in two reaction vessels coupled by diffusion. This situation is described by the equations

    r₁′ = 1 − r₁ + D(r₂ − r₁)
    r₂′ = 1 − r₂ + D(r₁ − r₂).

The diffusive terms cause reactant to move from the cell with higher concentration to the cell with lower concentration, at a rate proportional to the difference in concentration. The original scalar ODE has a unique equilibrium at r = 1, and it is asymptotically stable. The system with diffusion has the unique equilibrium (1, 1); moreover, the eigenvalues of the 2 × 2 coefficient matrix are −1 and −1 − 2D, so this equilibrium is also asymptotically stable. Indeed, the extra eigenvalue −1 − 2D is even more negative than the original one. Thus diffusion is usually regarded as a stabilizing effect.
However, let us consider two reaction vessels, each containing an activator-inhibitor system modeled by (5.18). Suppose the inhibitor is allowed to diffuse between the two cells, which leads to the four-dimensional system

    (a) x₁′ = σx₁²/[(1 + r₁)(1 + x₁²)] − x₁
    (b) r₁′ = γ[ρx₁²/(1 + x₁²) − r₁] + D(r₂ − r₁)
    (c) x₂′ = σx₂²/[(1 + r₂)(1 + x₂²)] − x₂
    (d) r₂′ = γ[ρx₂²/(1 + x₂²) − r₂] + D(r₁ − r₂).    (5.28)

This system has the equal-concentration equilibrium (x_+, r_+, x_+, r_+), where (x_+, r_+) are the coordinates of the top equilibrium E_+ of (5.18). Let us apply Theorem 5.1.1 to determine the stability of this equilibrium. The Jacobian DF* of (5.28) at the equilibrium may be conveniently written in terms of the 2 × 2 blocks

    A = [a  −b; c  −d],   B = D [0  0; 0  1],

where A is the Jacobian of (5.18) at the equilibrium and B includes the diffusion terms; specifically,

    DF* = [A − B  B; B  A − B].    (5.29)

By applying a similarity transformation with

    S = [I  I; I  −I],

where I is the 2 × 2 identity matrix, we may reduce DF* to the block diagonal form

    S⁻¹ DF* S = [A  0; 0  A − 2B],

whose eigenvalues are the two eigenvalues of A and the two eigenvalues of A − 2B. Since E_+ is asymptotically stable, both eigenvalues of A have negative real parts, but let us consider those of

    A − 2B = [a  −b; c  −d − 2D].

The surprising behavior occurs when the parameters satisfy (5.27), which means that the 1,1-entry a is positive. Note from (5.19) that b, c, d are always positive. If D is large enough, the determinant of this matrix is negative; in symbols,

    −a(d + 2D) + bc < 0.

For such large D, one of the eigenvalues of DF* must be positive, meaning the equilibrium of the 4 × 4 system is unstable. This is the Turing instability: an otherwise stable equilibrium has been destabilized by diffusion!
What is the long-term behavior of solutions of (5.28) when the equal-concentration equilibrium is unstable? We can't stop the determined reader from firing up his/her computer to answer this question right now, but let us mention that in Chapter 7 we will develop analytical methods to answer this question.
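In that do-it-now spirit, here is a sketch (our addition; the parameter values match the earlier illustrative choices, and the block structure follows (5.29)) that assembles the 4 × 4 Jacobian and watches its largest eigenvalue real part cross zero as the diffusion coefficient D grows past the threshold D_thr recorded just below:

    import numpy as np

    sigma, rho, gamma = 4.0, 2.5, 0.5                 # assumed sample parameters
    x = (sigma + np.sqrt(sigma**2 - 4*(rho+1))) / (2*(rho+1))   # x_+ from (5.22)
    r = rho * x**2 / (1 + x**2)                       # r_+ from (5.20)

    A = np.array([[sigma/(1+r) * 2*x/(1+x**2)**2 - 1,
                   -sigma * x**2 / ((1+r)**2 * (1+x**2))],
                  [gamma * rho * 2*x/(1+x**2)**2, -gamma]])
    a, b = A[0, 0], -A[0, 1]
    c, d = A[1, 0], -A[1, 1]
    D_thr = (b*c - a*d) / (2*a)                       # threshold derived in the text

    for D in [0.5*D_thr, D_thr, 2*D_thr]:
        B = D * np.array([[0.0, 0.0], [0.0, 1.0]])
        DF = np.block([[A - B, B], [B, A - B]])       # the Jacobian (5.29)
        print(D, np.linalg.eigvals(DF).real.max())    # crosses zero at D = D_thr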
For reference when we return to this problem, we record the following information: one eigenvalue of the Jacobian DF* is positive if D > D_thr, where the threshold value of diffusion is

    D_thr = (bc − ad)/(2a).

For this value of D, the equilibrium of (5.28) is non-hyperbolic. If D = D_thr, the null eigenvector v of DF* is

    v = [w; −w],    (5.30)

where w spans the kernel of A − 2B.
5.4 Liapunov functions

5.4.1 The main result

Liapunov functions provide another approach to analyzing the stability of an equilibrium b* of x′ = F(x), where F : U → R^d. Suppose U₁ ⊂ U is an open neighborhood of b*, and suppose L : U₁ → R is continuous on U₁ and C¹ on U₁ \ {b*}. We shall call L a Liapunov function for b* if

    (i) for all x ∈ U₁ \ {b*}, ⟨∇L(x), F(x)⟩ ≤ 0, and
    (ii) for all x ∈ U₁, L(x) ≥ L(b*), with equality only if x = b*.    (5.31)

Condition (ii) requires that b* be a strict local minimum of L. Regarding Condition (i): by the chain rule, the derivative of L(x) along a trajectory x(t) of x′ = F(x) is given by

    (d/dt) L(x(t)) = ⟨∇L(x(t)), x′(t)⟩ = ⟨∇L(x(t)), F(x(t))⟩;

thus, the inequality in Condition (i) implies that L(x) is nonincreasing along any trajectory of x′ = F(x).

Given a Liapunov function L(x), if the interior of the level set

    {x ∈ U₁ : L(x) ≤ c}    (5.32)

is compact, then this set is a trapping region. Typically, this set is compact, at least provided c is sufficiently close to the minimum L(b*). Thus, usually a Liapunov function provides a one-parameter family of trapping regions surrounding b*.

If in (5.31)(i) less-than-or-equal-to is replaced by strict inequality, ⟨∇L(x), F(x)⟩ < 0, then L is called a strict Liapunov function. In this case, of course, L(x) is strictly decreasing along any trajectory.
Theorem 5.4.1. (a) If x′ = F(x) admits a Liapunov function L(x) near b*, then the equilibrium is Liapunov stable. (b) If the equation admits a strict Liapunov function near b*, then the equilibrium is asymptotically stable.

The proof is straightforward but rather dry; sorry.

Proof. (a) Let V ⊂ U, a neighborhood of b*, be given. Choose ε so small that B̄(b*, ε) ⊂ V. Let

    μ = min_{|x−b*|=ε} L(x);

by (5.31ii) and compactness, μ > L(b*). Let

    V₁ = {x ∈ B(b*, ε) : L(x) < μ}.

Suppose x(t) is a trajectory of x′ = F(x), defined for some interval of time, such that x(0) ∈ V₁. Then for t ≥ 0,

    L(x(t)) ≤ L(x(0)) < μ,

so x(t) does not cross the boundary of this ball, {x : |x − b*| = ε}. Thus, by Theorem 4.1.2 the solution x must exist for all positive time and moreover

    x(t) ∈ V₁ ⊂ V,

which shows that b* is stable.
(b) Let L be a strict Liapunov function near b*. By Part (a), if a trajectory starts near b*, then it exists for all positive time and stays within a compact neighborhood of b*. Suppose such a trajectory does not converge to b*. Then there exists a sequence t_n tending to infinity such that x(t_n) is bounded away from b*. By invoking compactness and passing to a subsequence, if necessary, we may assume without loss of generality that the sequence x(t_n) has a limit, say x(t_n) → b̄ for some point b̄. Since L is continuous,

    lim_{n→∞} L(x(t_n)) = L(b̄) = lim_{t→∞} L(x(t)),    (5.33)

the latter equality because L(x(t)) is a decreasing function. Now consider the IVP for x′ = F(x) with initial condition b̄; we write this solution as φ(s, b̄), using the flow notation of Section 4.3. Since L is a strict Liapunov function, for any s > 0,

    L(φ(s, b̄)) < L(φ(0, b̄)) = L(b̄).    (5.34)

On the other hand, by Theorem 4.4.1, giving the continuity of φ, we have φ(s, b̄) = lim_{n→∞} φ(s, x(t_n)), and by Proposition 4.4.3, giving the semigroup property, we have φ(s, x(t_n)) = x(t_n + s). Combining these, we conclude that

    L(φ(s, b̄)) = lim_{n→∞} L(x(t_n + s)) = L(b̄),

where we have invoked the second equality in (5.33). But this equation contradicts (5.34).
5.4.2 Lasalle's invariance principle

To illustrate a common difficulty in applying Liapunov functions, let us attempt to show that the equilibria (±1, 0) of Duffing's equation (5.1) are asymptotically stable. We propose the energy E(x, y) = y²/2 − x²/2 + x⁴/4 as our Liapunov function. The equilibria (±1, 0) are local minima of E, and

    dE/dt = −βy² ≤ 0;

thus E is indeed a Liapunov function. Unfortunately it is not a strict Liapunov function, because dE/dt vanishes along the x-axis. Thus Theorem 5.4.1 implies only that (±1, 0) are Liapunov stable, not asymptotically stable.

The following result, known as Lasalle's Invariance Principle, will allow us to extract the desired conclusion despite this difficulty. The analysis is based on information about the set where the Liapunov inequality fails to be strict,

    S = {x ∈ U₁ \ {b*} : ⟨∇L(x), F(x)⟩ = 0}.    (5.35)

Specifically, we require that:

    No trajectory that starts in S remains in S for all positive time.    (5.36)

Theorem 5.4.2. If, near an equilibrium b*, x′ = F(x) has a Liapunov function L that satisfies assumption (5.36), then b* is asymptotically stable.

Proof. The proof of this result closely follows that of Theorem 5.4.1(b). Suppose there is a trajectory x(t) starting close to b* that does not converge to b*. Then, proceeding as in the previous proof, we may choose a sequence t_n such that x(t_n) → b̄ and such that (5.33) holds. Again we consider the solution φ(s, b̄) of the IVP, but with the following difference: before, we could guarantee that (5.34) held for any s > 0. Now, however, if b̄ ∈ S, it might happen that L(φ(s, b̄)) = L(b̄) for a range of s, but the crucial point is this: by assumption (5.36), the trajectory φ(s, b̄) cannot remain in S indefinitely, and hence there must be some value of s such that (5.34) holds. The proof may now be completed as in the previous case.

In the Exercises we offer hints for how to use Lasalle's Invariance Principle to obtain asymptotic stability for the equilibria (±1, 0) of Duffing's equation.
5.4.3 Construction of Liapunov functions

The real mystery regarding Liapunov functions is not how to use them but how to find them. In mechanical problems, such as Duffing's equation above, energy is an obvious candidate. In the Exercises we introduce you to two classes of equations, gradient systems and Hamiltonian systems, whose structure automatically provides a Liapunov function. Failing these special cases, one is forced to rely on ingenuity. Here's one reason why you need not feel discouraged by this prospect: in studying an equation over an extended period of time, you will find that you develop intuition about it that may astonish those less familiar with the equation.

In support of these encouraging words, we present an example of a Liapunov function constructed by ingenuity, which builds on knowledge the student has already acquired in this book. We consider the logistic Lotka-Volterra system (4.28), which for convenience we repeat here:

    (a) x′ = x(1 − x/K − y)
    (b) y′ = ρ y(x − 1).    (5.37)

We assume that K > 1, so that the coexistence equilibrium of (5.37), located at (x, y) = (1, 1 − 1/K), lies in the open first quadrant. It is readily verified that this equilibrium is asymptotically stable, but nevertheless let us construct a Liapunov function. Note that setting K = ∞ in (5.37) yields the original Lotka-Volterra equation (1.33). As we saw in Chapter 1, the function

    ρ(x − ln x) + y − ln y

is constant on the (periodic) orbits of (1.33). Let us try to modify this function to obtain a Liapunov function for (5.37), say

    L(x, y) = Ax − B ln x + y − D ln y    (5.38)

for some constants A, B, D. (By scaling the Liapunov function, we have assumed without loss of generality that the coefficient of y in (5.38) is unity.) To determine the coefficients in (5.38), we first require that this function assume its minimum at the coexistence equilibrium (x, y) = (1, 1 − 1/K); this yields A = B and D = 1 − 1/K. Thus we may rewrite (5.38) as

    L(x, y) = A(x − ln x) + y − (1 − 1/K) ln y.

Calculate that

    dL/dt = A(x − 1)(1 − y − x/K) + ρ(y − 1 + 1/K)(x − 1).

If we choose A = ρ, then all terms that are not O(1/K) cancel, and this equation simplifies to

    dL/dt = −ρ(x − 1)²/K.

In particular, dL/dt ≤ 0, so L is a Liapunov function for (5.37).

It follows from Theorem 5.4.1 that the equilibrium (1, 1 − 1/K) of (5.37) is Liapunov stable. Although L is not a strict Liapunov function, by invoking Lasalle's Invariance Principle one may show this equilibrium is asymptotically stable. In fact, in Exercise 3 we outline how to use L to prove the equilibrium is globally attracting.
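The cancellation that makes this work is mechanical enough to delegate to a computer algebra system. Here is a quick symbolic check (our addition, not part of the original text) using SymPy:

    import sympy as sp

    x, y, K, rho = sp.symbols('x y K rho', positive=True)

    L = rho*(x - sp.log(x)) + y - (1 - 1/K)*sp.log(y)   # candidate with A = rho
    xdot = x*(1 - x/K - y)                              # (5.37a)
    ydot = rho*y*(x - 1)                                # (5.37b)

    dLdt = sp.diff(L, x)*xdot + sp.diff(L, y)*ydot      # chain rule along trajectories
    print(sp.simplify(dLdt))   # -rho*(x - 1)**2/K, possibly in an equivalent form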
5.5 Stable and unstable manifolds

5.5.1 A preparatory example

In this section we study solutions of an ODE near an equilibrium where the differential DF has eigenvalues with both positive and negative real parts. We shall assume the equilibrium b* is hyperbolic, and we will call b* a generalized saddle.

Here is some useful geometric terminology. We define the orbit of a solution of an ODE to be the curve traced out by the solution, considered merely as a subset of R^d, independent of any parametrization. This term is used in contrast to trajectory; the latter concept, which includes the specific parametrization that yields a solution of the ODE, is an approximate synonym for solution, although the term trajectory carries the suggestion of focusing on the geometry of the parametrized curve.

To introduce stable manifolds, as well as to illustrate this terminology, we consider an academic example, the ODE

    x′ = y
    y′ = x + x³.    (5.39)

This equation describes a particle subject to a nonlinear, purely repulsive force F(x) = x + x³, or potential V(x) = −x²/2 − x⁴/4. Because there is no friction, each orbit of (5.39)
is contained in (but not necessarily equal to) a level set of the energy function,

    {(x, y) : H(x, y) = C},    (5.40)

where H(x, y) = y²/2 + V(x). Below we will argue that if, for example, C > 0, then (5.40) consists of exactly two distinct orbits (see Figure 5.5),

    y = √(2C + x² + x⁴/2)   and   y = −√(2C + x² + x⁴/2),    (5.41)

where −∞ < x < ∞.

[Figure 5.5: A level set (5.40) in the case C > 0. The level set decomposes into two orbits.]

By contrast, infinitely many trajectories are associated with the single orbit y = √(2C + x² + x⁴/2), because for any given solution (x(t), y(t)) of the ODE and any real t₀, the shifted function (x(t − t₀), y(t − t₀)) is another solution, and this specifies a different trajectory. Thus, it is more convenient to enumerate the qualitatively different orbits of an ODE, because this concept eliminates irrelevant redundancy in trajectories.

Another difference between orbits and trajectories is related to the fact that solutions of (5.39) blow up in finite time, both forwards and backwards. By contrast, the orbits (5.41) extend to infinity. This behavior reflects a general phenomenon: if F is defined for all x ∈ R^d, either an IVP is solvable for all time, or by Corollary 4.1.3 the orbit extends out to infinity.
It is instructive to interpret the orbits of (5.39) physically in terms of the analogy of a marble rolling in the x, z-plane over the hill given by z = V(x) (see Figure 5.6). In case C > 0, the orbit y = √(2C + x² + x⁴/2) in the upper half plane derives from a particle that at large negative times is far to the left of the origin and is moving to the right towards the top of the hill at x = 0; it slows down as it approaches the top of the hill, but it has enough energy to clear it; after it passes x = 0, it sails off to the far right, at ever increasing speeds. In focusing on the orbit, we suppress information regarding exactly when the particle moves over the hill. Similarly, the orbit in the lower half plane may be interpreted in terms of a particle moving from the right to the left that clears the hill.

[Figure 5.6: Illustration of the rolling-marble analogy for (5.39) described in the text.]

If C < 0, (5.40) again consists of exactly two orbits (see Figure 5.7), one in the right half plane x > 0 and one in the left half plane x < 0. In the rolling-marble analogy, the orbit in the right half plane derives from a particle that at large negative times is far to the right of the origin and is moving towards the hill but does not have enough energy to clear it; thus the particle is turned around and sails back to the far right as time increases. The orbit in the left half plane is similarly described.

[Figure 5.7: A level set (5.40) in the case C < 0. The level set decomposes into two orbits.]

When C = 0, the level set (5.40) consists of two crossed curves (see Figure 5.8), y = ±x√(1 + x²/2). Let us show that this level set contains exactly five orbits, as follows:
    (i)   x = y = 0,
    (ii)  y = −x√(1 + x²/2),   0 < x < ∞,
    (iii) y = −x√(1 + x²/2),   −∞ < x < 0,
    (iv)  y = x√(1 + x²/2),    0 < x < ∞,
    (v)   y = x√(1 + x²/2),    −∞ < x < 0.    (5.42)

[Figure 5.8: The level set (5.40) in the case C = 0. The level set decomposes into five orbits, (5.42). (Draft note: different colors for M_s and M_u? Describe the color convention in the caption; likewise in the next two figures.)]
We continue to invoke the rolling-marble analogy. The equilibrium at the origin requires no comment. Orbit (ii) derives from a particle that at large negative times was far to the right and is moving to the left with just enough energy to converge to the top of the hill as t → ∞; this is a single orbit. Similarly, orbit (iii) derives from a particle moving to the right that converges to the top of the hill. Mathematically, orbits (iv) and (v) are quite similar to orbits (ii) and (iii), but the description in words is a little harder to swallow: the particle falls off the hilltop at time minus infinity and is moving away from the equilibrium for all time, initially at infinitesimal speeds but continuously accelerating; it takes an infinite amount of time to fall off an equilibrium at time minus infinity, just as it takes an infinite amount of time to converge to equilibrium as t tends to plus infinity. The motion is to the right or left for orbits (iv) or (v), respectively.

The orbits (5.42) are special in that they are the only orbits of (5.39) that make contact with the equilibrium at the origin. In the next subsection we study analogous behavior near an arbitrary hyperbolic equilibrium.
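Before the general theorem, a numerical aside (our addition; the starting point and tolerances are our choices): integrating (5.39) forward from a point on the curve y = −x√(1 + x²/2) confirms that the trajectory slides into the origin while (numerically) conserving H.

    import numpy as np
    from scipy.integrate import solve_ivp

    def f(t, u):
        x, y = u
        return [y, x + x**3]                       # the system (5.39)

    H = lambda x, y: y**2/2 - x**2/2 - x**4/4      # energy; constant along orbits

    x0 = 1.0
    b = [x0, -x0*np.sqrt(1 + x0**2/2)]             # a point on orbit (ii), where H = 0
    sol = solve_ivp(f, (0.0, 10.0), b, rtol=1e-10, atol=1e-12)
    x1, y1 = sol.y[:, -1]
    print(np.hypot(x1, y1))                        # small: the solution approaches the origin
    print(H(x1, y1) - H(*b))                       # energy drift, of the size of the integration error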
5.5.2 Statement of the main result

We are concerned with the IVP

    x′ = F(x),   x(0) = b    (5.43)

for initial conditions b close to a hyperbolic equilibrium b*. The notation M_s in the following theorem is mnemonic for stable manifold. An unstable manifold M_u, having similar asymptotic properties as time tends to negative infinity, will be introduced below.

Theorem 5.5.1. Suppose that the first d_s eigenvalues of the Jacobian DF(b*) have negative real parts and that the remaining d − d_s eigenvalues of DF(b*) have positive real parts. Then there is a (bounded) neighborhood V of b* in R^d and a differentiable manifold M_s ⊂ V of dimension d_s through b* such that:

(i) If b ∈ M_s, then the IVP (5.43) has a solution φ(t, b) for all positive time with φ(t, b) ∈ M_s, and moreover

    lim_{t→∞} φ(t, b) = b*.    (5.44)

(ii) If b ∈ V \ M_s, then φ(t, b) leaves V at some positive time.
First let us interpret the theorem for the saddle of (5.39). The neighborhood V may be chosen with great latitude; for definiteness let V = B(0, r) be a ball of some finite radius. You should verify that

    M_s = {(x, y) ∈ V : y = −x√(1 + x²/2)}    (5.45)

satisfies all the claims of the theorem. In words, M_s is the intersection of V with the union of the orbits (5.42 i, ii, iii) above. For (5.39), conservation of energy allowed us to derive an explicit formula for M_s. If we modified (5.39) slightly, for example by including a friction term, we could no longer parametrize M_s explicitly, but we could invoke the theorem to guarantee that a stable manifold still existed.

The theorem concerns what properly should be called a local stable submanifold (Footnote 2). For (5.39), the curve y = −x√(1 + x²/2), where −∞ < x < ∞, is a global stable manifold. We shall discuss global stable manifolds in the general case below.

If in the theorem d_s = d, then the stable manifold is the entire neighborhood V of b*. In this case, Theorem 5.5.1 is effectively just a restatement of Theorem 5.1.1.

Footnote 2: Some authors use a notation such as M_s^(loc) to indicate this idea explicitly. This notation seems unbearably heavy to us, and we shall resort to it only when being so completely explicit is necessary to avoid confusion.

To continue profitably reading this book you need to develop intuition about stable manifolds. This may be achieved, even without reading the proof of Theorem 5.5.1, through interpreting the conclusions of Theorem 5.5.1 in various examples, including (5.39), (5.56), the examples in the rest of Chapter 5, and a selection of exercises. About the proof of the theorem, there is good news and bad news. The bad news is that the proof is long and technical. The good news is that it does not involve new ideas, and it may be omitted on a first reading without serious loss of continuity.
5.5.3 Proof of Theorem 5.5.1

The construction of M_s is facilitated by two reductions. First, we translate coordinates so that the equilibrium is at the origin, b* = 0; and second, we rotate coordinates to separate the eigenvectors of DF(b*) associated with eigenvalues having positive and negative real parts. Specifically, regarding the second reduction, after performing an appropriate similarity transformation we may assume without loss of generality that DF(b*) has the block-diagonal form

    DF(b*) = [B  0; 0  −C],

where B and C are (square) matrices of dimension d_s and d − d_s, respectively, whose eigenvalues all have negative real parts. Thus, the eigenvalues of B are the eigenvalues of DF(b*) with negative real parts, and the eigenvalues of C are negatives of the eigenvalues of DF(b*) with positive real parts. Moreover, let us apply the conclusion of Exercise ??: i.e., at the expense of yet another similarity transformation, we may assume without loss of generality that there is a constant β > 0 such that

    ‖e^{Bt}‖ ≤ e^{−βt},   ‖e^{−Ct}‖ ≤ e^{−βt},   t ≥ 0;    (5.46)

note that there is no constant pre-factor on the RHSs of these estimates. Using these new coordinates, we may write x ∈ R^d as x = (y, z), where y = (x₁, . . . , x_{d_s}) contains the first d_s components of x and z contains the last d − d_s components. More formally, we decompose R^d = E_s ⊕ E_u, where E_s = R^{d_s} × {0} is the span of the eigenvectors of DF(b*) whose eigenvalues have negative real parts and E_u = {0} × R^{d−d_s} is the span of the eigenvectors with positive real parts.

The manifold M_s ⊂ E_s ⊕ E_u is tangent at the origin to E_s. Thus, we may describe M_s as the graph (Footnote 3) of a function Φ : E_s → E_u:

    M_s = {(y, Φ(y)) : y ∈ E_s},    (5.47)

where Φ(0) = 0. (Strictly speaking, Φ(y) will be defined only for y near 0.) The construction of Φ below is based on a fixed-point argument. We shall not prove that Φ is differentiable, nor of course verify the tangency condition DΦ(0) = 0.

Footnote 3: The present representation of M_s is different from (5.45), since in (5.45) we have not performed the preliminary similarity transformation to make the eigenvectors of DF(b*) parallel to the coordinate axes.

We expand F(x) = Ax + r(x) in a Taylor series, where A = DF(0) and Dr(0) = 0. By continuity we may choose δ > 0 such that

    |x| ≤ δ√2   ⟹   |Dr(x)| ≤ β/3,    (5.48)

where β is the decay rate in (5.46). The following lemma is less of a distraction now than later.

Lemma 5.5.2. If |x₁|, |x₂| ≤ δ√2, then

    |r(x₁) − r(x₂)| ≤ (β/3)|x₁ − x₂|.    (5.49)

Proof. The estimate (5.49) follows from integrating Dr along the line from x₁ to x₂, as in Lemma 3.2.3, and invoking (5.48).
Partial proof of Theorem 5.5.1: We start from the equivalent integral equation for solutions of the IVP (5.43),

    x(t) = e^{At} b + ∫_0^t e^{(t−s)A} r(x(s)) ds.

Let us write this equation in components (y, z). Since A is block diagonal, multiplication of a vector by e^{At} does not mix components. Thus, if we similarly decompose b = (c, d) and r(x) = (p(x), q(x)) into components, the integral equation may be rewritten

    y(t) = e^{Bt} c + ∫_0^t e^{(t−s)B} p(y(s), z(s)) ds,
    z(t) = e^{−Ct} d + ∫_0^t e^{−(t−s)C} q(y(s), z(s)) ds.    (5.50)

Because C appears in (5.50) with a minus sign, it is desirable in the second component of this equation to change from an initial condition to a terminal condition. In Exercise 1 we ask you to show that if T > 0, any solution of the IVP (5.43) satisfies

    y(t) = e^{Bt} c + ∫_0^t e^{(t−s)B} p(y(s), z(s)) ds,
    z(t) = e^{−C(T−t)} z(T) − ∫_t^T e^{−(s−t)C} q(y(s), z(s)) ds    (5.51)

for 0 ≤ t ≤ T.

Such integral equations are never formulas for the solution; here this is even more true, since z(T), about which we know nothing, appears on the RHS of (5.51). However, in the next proposition we show that this term disappears in the limit T → ∞.
Proposition 5.5.3. If x(t) satisfies (5.43) for all t ≥ 0 and if sup_{t≥0} |x(t)| < ∞, then

    y(t) = e^{Bt} c + ∫_0^t e^{(t−s)B} p(x(s)) ds,
    z(t) = −∫_t^∞ e^{−(s−t)C} q(x(s)) ds.    (5.52)

Proof. In (5.51) we hold t fixed and let T → ∞. Because x(t) is bounded, it follows from (5.46) that e^{−C(T−t)} z(T) tends to zero. Similarly, the integral from t to ∞ is absolutely convergent. Thus (5.52) results from (5.51) by taking this limit.
Let us use (5.52) to define a mapping on a space of functions, as follows. Let V ⊂ E_s ⊕ E_u be the direct product of δ-balls in Euclidean space:

    V = {(y, z) ∈ R^d : |y| < δ, |z| < δ}.

Note that V is contained in the ball of radius δ√2, so (5.49) holds for points in V̄. Let Γ be the set of functions

    Γ = {x ∈ C([0, ∞), R^d) : y(0) = c, (∀t) x(t) ∈ V̄}.
Define the map T : Γ → C([0, ∞), R^d) by

    T[x](t) = (T_s[x](t), T_u[x](t)),
    T_s[x](t) = e^{Bt} c + ∫_0^t e^{(t−s)B} p(x(s)) ds,
    T_u[x](t) = −∫_t^∞ e^{−(s−t)C} q(x(s)) ds,    (5.53)

where, for convenience below, we have indicated the decomposition of T[x] into stable and unstable components.

Claim 1: If |c| < δ, then for any x ∈ Γ, the image T[x] belongs to Γ.

Proof. It is clear that T_s[x](0) = c, so to show T[x] ∈ Γ we must estimate the components of T[x]. For the unstable component this is easy: by (5.46),

    |T_u[x](t)| ≤ ∫_0^∞ e^{−βs′} |q(x(t + s′))| ds′,

where we have made the substitution s′ = s − t in the integral. Restricting (5.49) to the case x₂ = 0, we observe that

    |q(x(t + s′))| ≤ (β/3)|x(t + s′)| ≤ (β/3)δ√2.

On substitution into the integral, we deduce that

    ‖T_u[x]‖ ≤ (β/3)(δ√2) ∫_0^∞ e^{−βs′} ds′ = δ√2/3 < δ.

For the stable component this is a little more delicate: we have

    |T_s[x](t)| ≤ e^{−βt}|c| + ∫_0^t e^{−βs′} |p(x(t − s′))| ds′,

where we have made the substitution s′ = t − s in the integral. The first term is strictly less than δe^{−βt}. For the second, invoking (5.49), we deduce that |p(x)| ≤ (β/3)δ√2, and evaluate the integral, finding

    |T_s[x](t)| < δ [ e^{−βt} + (√2/3)(1 − e^{−βt}) ] ≤ δ.

Remark: In fact we have shown that T[x](t) belongs to the open set V.
Claim 2: If |c| < δ, the mapping T : Γ → Γ is a contraction.

Proof. We estimate the components separately. For the unstable components,

    |T_u[x₁] − T_u[x₂]|(t) ≤ ∫_0^∞ e^{−βs} |q(x₁(t + s)) − q(x₂(t + s))| ds.

Using (5.49) we conclude that

    ‖T_u[x₁] − T_u[x₂]‖ ≤ ( (β/3) ∫_0^∞ e^{−βs} ds ) ‖x₁ − x₂‖,

which integrates to ‖x₁ − x₂‖/3. Similarly,

    ‖T_s[x₁] − T_s[x₂]‖ ≤ ‖x₁ − x₂‖/3,

and by adding the estimates for the components we see that T has Lipschitz constant at most 2/3.

For completeness let us record that C([0, ∞), R^d) (more precisely, the subspace of bounded continuous functions, equipped with the sup norm) is a Banach space and that Γ is a closed subset of it. Therefore, if c ∈ E_s and |c| < δ, then T has a unique fixed point x_c, which we decompose into components (y_c, z_c). Define the mapping Φ in (5.47) by

    Φ(c) = z_c(0).
Now consider the IVP

    x′ = F(x),   x(0) = (c, d),    (5.54)

where |c| < δ. Regarding Conclusion (i) of Theorem 5.5.1: if (c, d) ∈ M_s, i.e., if d = Φ(c), then x_c solves (5.54) and belongs to V̄ for all time. In Exercise 1 we ask you to show that if b ∈ M_s and t ≥ 0, then φ(t, b) ∈ M_s.

On the other hand, if the solution of (5.54) belongs to V̄ for all time, then by Proposition 5.5.3 this solution is a fixed point of T and by uniqueness equals x_c. Thus, regarding Conclusion (ii): if d ≠ Φ(c), the solution of (5.54) cannot belong to V̄ for all time.

We still need to verify (5.44), for which we introduce the following lemma.
Lemma 5.5.4. If |c| < δ and if x ∈ Γ, then

    limsup_{t→∞} |T[x](t)| ≤ (2/3) limsup_{t→∞} |x(t)|.

Proof. To begin, let us record a simple fact that we will use repeatedly: if 0 ≤ a ≤ b ≤ ∞, then

    β ∫_a^b e^{−βs} ds ≤ 1.    (5.55)

Let L = limsup_{t→∞} |x(t)|. For an arbitrary ε > 0 there is a time t₀ such that |x(t)| < L + ε if t > t₀. Choose τ such that, with δ as above, δe^{−βτ} < ε/6.

Now |T[x](t)| ≤ δe^{−βt} + I₁ + I₂, where

    I₁ = ∫_0^t e^{−β(t−s)} |p(x(s))| ds,   I₂ = ∫_t^∞ e^{−β(s−t)} |q(x(s))| ds.

We suppose that t ≥ t₀ + τ, so for the first term δe^{−βt} < ε/6. We write I₁ as the sum of integrals over [0, t − τ] and [t − τ, t]. On [0, t − τ], we have |p(x)| ≤ (β/3)δ√2 < βδ. Substituting into the integral and using (5.55), we deduce that

    ∫_0^{t−τ} e^{−β(t−s)} |p(x(s))| ds ≤ βδ ∫_0^{t−τ} e^{−β(t−s)} ds ≤ δ e^{−βτ} < ε/6.

On [t − τ, t], we have |x| ≤ L + ε, so

    ∫_{t−τ}^t e^{−β(t−s)} |p(x(s))| ds ≤ (β/3)(L + ε) ∫_{t−τ}^t e^{−β(t−s)} ds ≤ (L + ε)/3.

Regarding I₂: over the entire interval [t, ∞) we have |x| ≤ L + ε, so I₂ ≤ (L + ε)/3. Putting together the various pieces, we conclude that |T[x](t)| ≤ ε + 2L/3. Since ε was arbitrary, we are done.
The claim (5.44) follows immediately from this lemma because, if x is a fixed point of T,

    limsup_{t→∞} |x(t)| ≤ (2/3) limsup_{t→∞} |x(t)|,

so the lim sup, which is non-negative, must vanish.

This completes as much of the proof of Theorem 5.5.1 as we promised. See Meiss [] for the proof that Φ is differentiable and that DΦ(0) = 0.

Remark. Although we have not proved that M_s is tangent to E_s, nor even stated it in Theorem 5.5.1, this fact is one of the important properties of M_s that you should keep in mind as we use these manifolds to help understand the behavior of ODEs.

Under the hypotheses of Theorem 5.5.1 there is also an unstable manifold M_u of dimension d − d_s through b*. It is most easily described as the stable manifold of the time-reversed system x′ = −F(x). Thus the IVP (5.43) has a solution that belongs to V̄ for all negative times if and only if the initial data b lies in M_u. Moreover, if b ∈ M_u, then φ(t, b) ∈ M_u for all t ≤ 0 and tends to b* as t → −∞.

Incidentally, in the case of a saddle point in the plane (d = 2, d_s = 1), the stable and unstable manifolds (which are simply curves) were traditionally called separatrices.
5.5.4 Global behavior

First let us correct a possible misunderstanding about what Conclusion (ii) in Theorem 5.5.1 asserts: although solutions with initial conditions in V \ M_s eventually leave V, it is possible for them to return at some later time. To explore this assertion, let's consider Duffing's equation (without friction),

    x′ = y
    y′ = x − x³,    (5.56)

which differs from our example (5.39) above only in the sign of the nonlinear term in the force. The origin is an equilibrium of (5.56), and the Jacobian there is given by

    DF(0) = [0  1; 1  0],

which has eigenvalues ±1. Thus, by the theorem, M_s is a curve through the origin tangent to (1, −1), the eigenvector of DF with eigenvalue −1, and M_u is a curve tangent to (1, 1).

Energy is conserved in (5.56), so orbits are contained in level sets of the energy. M_s and M_u are contained in the zero-energy level set,

    S = {(x, y) : y²/2 − x²/2 + x⁴/4 = 0},    (5.57)

the only level set to make contact with the origin (see Figure 5.9). In Theorem 5.5.1, let us take V = {(x, y) : |x| < 1, |y| < 1}. Solving the equation in (5.57) for y, we obtain the description

    M_s = {(x, y) : y = −x√(1 − x²/2), −1 < x < 1},    (5.58)

and M_u is similarly described in terms of the function y = x√(1 − x²/2).

[Figure 5.9: Stable and unstable manifolds through the saddle point at the origin in (5.56), Duffing's equation without friction. Both manifolds are contained in the zero-energy level set (5.57). The local manifolds M_s and M_u are shown in bold, and the global manifolds actually coincide.]

Now consider nonzero initial data b ∈ M_u for (5.56); in particular, b ∈ V \ M_s. As time increases, the solution moves away from the origin along the level set (5.57). It does indeed leave V but, verifying the above claim, eventually it returns; in fact, it returns along M_s!
As noted above, for the example (5.39), the local stable manifold M_s^(loc) of Theorem 5.5.1 is a subset of a global stable manifold

    M_s^(glob) = {(x, y) ∈ R² : y = −x√(1 + x²/2)}.

Here are two properties of M_s^(glob) that are characteristic of global stable manifolds.

(i) M_s^(glob) is invariant (Footnote 4) under the flow.

(ii) M_s^(glob) contains all initial conditions b such that the solution φ(t, b) converges to the equilibrium as t → ∞.

Footnote 4: A set S is called invariant if for every initial condition b ∈ S and for all t such that the IVP has a solution, φ(t, b) ∈ S. A set is invariant iff it is a union of orbits.

At a general hyperbolic equilibrium point b* of an equation x′ = F(x), a global stable manifold may be defined by (Footnote 5)

    M_s^(glob) = { φ(t, b) : t ∈ R, b ∈ M_s^(loc) }.    (5.59)

Of course only negative times contribute anything new to this union. As we shall see below, in simple examples, M_s^(glob) is a nice submanifold of R^d. More generally, especially in cases where the equation x′ = F(x) exhibits chaos, the set M_s^(glob) may become hopelessly entangled with itself as t tends to infinity. In such cases one may obtain a nice submanifold only by restricting t to a compact range in (5.59). We refer to Meiss [] for a careful discussion of this construction.

Footnote 5: Strictly speaking, in this union we may take only those times t for which the initial value problem has a solution. This may be expressed quite precisely in the flow notation of Section 4.4.2, but such precision obscures more than it clarifies. In the precision-that-obscures department, let us also mention that if an equation has more than one saddle point, we may need to indicate the base point b* of a stable manifold, M_s^(glob)(b*), ugh.

Naturally we define a global unstable manifold M_u^(glob) with the obvious modification in (5.59).

For the saddle point of (5.56) at (0, 0), both M_s^(glob) and M_u^(glob) equal the level set (5.57). For the local manifolds, as constructed in Theorem 5.5.1, M_s^(loc) ∩ M_u^(loc) = {b*}, but as this example shows, the global manifolds can intersect nontrivially. If b ∈ M_s^(glob) ∩ M_u^(glob), then the entire orbit through b is contained in the intersection.

Such nontrivial intersections are not robust; the slightest perturbation of the equation is likely to remove them. For example, suppose we modify (5.56) by including a small amount of friction, as in (5.1). In that case the two halves of M_u^(glob) spiral into the equilibria at (±1, 0), as illustrated in Figure 5.10. In Exercise 3 you are asked to use the Liapunov function to show this. By contrast, M_s^(glob) spirals in from infinity and ends at (0, 0).

The exercises contain other saddle points for which M_s^(glob) and M_u^(glob) intersect one another. These examples are more typical in that no conserved quantity is involved.
5.5.5 Section 1.6 revisited

In Section 1.6 we described the behavior of the predator-prey system obtained by including logistic growth and the Allee effect in the Lotka-Volterra system, the system

    x′ = x · (x − ε)/(x + ε) · (1 − x/K) − xy
    y′ = ρ(xy − y).    (5.60)

We are now in a position to verify most of the claims made there. (See also Exercise 3 and Section 6.2.3.)

[Figure 5.10: Global stable and unstable manifolds through the saddle point at (0, 0) in Duffing's equation with friction (5.1), with β = 1/4. (Draft note: reduce the number of turns in the spirals; maybe put an arrowhead where the spirals end?)]

To begin, in Exercise 2 we ask you to verify the information in Table 5.3 about how the equilibria of (5.60) depend on the parameters in the equation. Note that the stability changes in the equilibria correlate with the three regions in the ε, K plane shown in Figure 1.11(a).

To visualize the phase portraits of (5.60), it is extremely helpful to plot the global stable manifold of the saddle point (ε, 0), as is done in Figure 5.11. Of course the behavior of M_s^(glob) as t → −∞ differs in Cases I, II, and III. (Cf. Figure 1.11(a) in Chapter 1.) In fact, two different behaviors occur within Case II, shown in panels (b) and (c) of the present figure. The precise location where the behavior shifts from IIa to IIb, which depends on ρ as well as on ε, K, can be located only using the computer. (Draft notes: compute by choosing a starting point along the stable eigenvector and integrating backwards. Q: describe as the shrinking basin of attraction of the coexistence equilibrium?)
5.6 Exercises
5.6.1 Exercises to consolidate your understanding
1. Supply details omitted in the text:
(a) Prove Corollary 5.1.2.
(b) Prove the claim made in Section 5.2.1: If b

is an equilibrium of x

= F(x)
and if
j
(DF

) > 0 for some j, then b

is unstable.
160
Equilibrium Description Stability
(0, 0) Extinction Always a stable node
(, 0) Extinction threshold Always a saddle
(K, 0) Prey-only equilibrium Stable node if K < 1
Saddle if K > 1
(1, y

) Co-existence equilibrium Unphysical saddle if K < 1


Sink if 1 < K < (1 + 2
2
)/2
Source if (1 + 2
2
)/2 < K
Table 5.3: Equilibria of (5.60), the Lotka-Volterra system augmented
by logistic growth and the Allee eect. In the co-existence equilibrium,
y

= (1 1/K)(1 )/(1 + ). The stability determinations were made assuming


that < K. Q: Is coexist eqlb always a focus?
I
K = 5
Region IIa Region
K = 3.35
III Region
K = 0.4
Region
K=1.5
IIb
(a)
(c)
(b)
(d)
2.5
x
1.5
y
0

0
2.5
x
1.5
y
0

0
2.5
x
1.5
y
0

0
2.5
x
1.5
y
0

0
K K
(1,11/K)
Figure 5.11: Global stable manifolds for the saddle point (, 0) of (5.60) for various
choices of the parameter K.
161
(c) Complete the analysis begun in Section 5.4.2: Use Lasalles invariance
principle to prove that the equilibria (1, 0) of (5.1) are asymptotically stable.
Hint: Calculate that the exceptional set (5.35) is the x-axis. Then
examine the equation (5.1) to show that (5.36) is satised.
(d) Show that, as claimed in the text, solutions of (5.39) blow up in nite time.
(e) Verify equation (5.51).
(f) Verify that the set /
s
constructed using in the proof of Theorem 5.5.1
has the following key property of stable manifolds: If b /
s
and t 0, then
(t, b) /
s
.
2. Miscellaneous applications of Theorem 5.5.1:
(a) The chemostat (refce: Edelstein-Keshet) is described by the two ODEs:
(a) x

= k
n
1 + n
x x
(b) n

=
n
1 + n
x n +
where k, are positive constants.
Find conditions on the constants k, so that these equations have an
equilibrium with x > 0.
If your condition is satised, show that the equilibrium with x > 0 is
asymptotically stable.
(b) Recall from Exercise ?? in Chapter 4 the equations for the evolution of
two symbiotic species. Show that, for values of the constants K
j
such that
this system has an equilibrium with both x and y positive, the equilibrium is
asymptotically stable.
(b) Competing species:
x

= r
1
x
_
1
x
K
1
b
y
K
1
_
, y

= r
2
y
_
1
y
K
2
c
x
K
2
_
.
Figure out conditions for there to be an equilibrium in the open rst quadrant.
Determine the stabilities of all equilibria in all cases. (The equilibrium in the
rst quadrant may be unstable.)
(c) Recall from Exercise 8 in Chapter 4 the equations for a bead sliding on a
rotating loop,
x

= y (5.61)
may

= y mg sin x + m(a sin x)


2
cos x.
162
Show that if a
2
< g the only equilibrium of the system is the origin
(x, y) = (0, 0) and it is asymptotically stable.
Show that if a
2
> g, there are a total of three equilibria, the origin is
unstable, and the other two equilibria are asymptotically stable.
(d) Recall the Lorenz equations from Exercise 8 in Chapter 4. Show that the
origin is an asymptotically stable equilibrium provided < 1.
(e) Verify the information given in Table 5.3.
3. Uses for a Liapunov function
(a) For Dungs equation without friction, show that the equilibria (1, 0) are
Liapunov stable but not asymptotically stable.
(b) For Dungs equation with friction, show that given any initial condition
in the region
b

R
2
: b
2
2
/2 b
2
1
/2 + b
4
1
/4 < 0 and b
1
> 0,
the solution to the IVP converges to the equilibrium (1, 0). (In words, the above
region is the subset of the right-half-plane where the energy is negative.)
(c) Consider a bead sliding on a rotating loop, (5.61). Show that the function
(??) decreases along trajectories. What information may be deduced from this
function if (i) m
2
< g or (ii) m
2
> g?
(d) Say that a function L : | R tends to innity at the boundary if | if for
every M > 0 there is a compact subset / | such that L(x) > M on | /.
Show that if x

= F(x) has a Liapunov function L : | [0, ) that tends


to innity at the boundary of |, then this ODE has a unique equilibrium
in | that moreover is globally attracting for any initial conditions in |.
Need hint
Use the Liapunov function of Section 5.4.3 to prove that the equilibrium
(1, 1 1/K) of (5.37) is globally attracting for initial data in the open
rst quadrant.
Show for the Lorenz equations (Exercise 8 in Chapter 4) that
L(x, y, z) = x
2
/ + y
2
+ z
2
is a Liapunov function if < 1. Use this to deduce that the origin is
globally asymptotically stable when < 1, an improvement over Exer-
cise 2(d).
163
4. Examples of stable and unstable manifolds
(a) In Dungs eqn with friction, show that the two sides of the unstable
manifold /
glob
u
through (0, 0) converge to the equilibria at (1, 0).
Hint: Part (b) of Exercise 3 is helpful for this problem.
(b) Recall the system in Exercise ?? of Chapter 4 for the evolution of symbi-
otically coupled species.
Show that the equilibrium (K
1
, 0) is a saddle point.
Locate the stable and unstable subspaces E
s
and E
u
at this saddle point.
Sketch the (global) stable and unstable manifolds of this saddle point,
both in case there is and there is not an equilibrium for which both x
and y are positive. (If you are reluctant to guess about /
u
, feel free to
compute.)
(c) In the bead-on-a-rotating-pendulum of Exercise 8 in Chapter 4, if m
2
> g
then the origin is a saddle point. Locate the unstable subspace of the saddle.
Sketch the (global) unstable manifold through this saddle point, using the
Liapunov function (??) to determine the behavior at innity. Also sketch the
stable manifold through the saddle.
(d) For the Lorenz equations (Exercise 8 in Chapter 4) with > 1, nd the
stable and unstable subspaces of the saddle point at the origin.
Discussion: In Chapter ?? we will study the onset of chaos in this
equation by tracking changes in the global unstable manifold through
this saddle point as increases. We are not addressing the global un-
stable manifold here because the behavior becomes very complicated
indeed. However, if you wish, you could perform such computations
now. For example, if = 10 and b = 8/3, the real complexity starts
when 24 < < 25.
5. Use the energy H(x, y) = y
2
/2x
2
/2+x
4
/4 to enumerate the orbits of (5.56),
Dungs equation without friction, as we did for the repulsive cubic (5.39).
Interpret the orbits physically, using rolling-marble analogy.
6. It is possible to prove Theorem 5.1.1 using a xed point theorem, like in the
proof of the stable manifold theorem, Theorem 5.5.1. Construct such a proof.
Doing so is a good way to get a handle on the proof of Theorem 5.5.1, which
otherwise may seem impenetrable.
164
7. Cases where the condition of Theorem 5.1.1 is not satised. (a) Is the equilib-
rium x = 0 of the scalar ODE
x

= x
2
.
asymptotically stable?
(b) Is the equilibrium x = 0 of the scalar ODE
x

= x
3
asymptotically stable?
Remark: See Exercise 10 for another example where the condition of
Theorem 5.1.1 is not satised.
8. Consider the three-dimensional system
x

= rx(x A) p
1
xy p
2
xz
y

=
1
xy d
1
y
z

=
2
xz d
2
z
(5.62)
where r, A, p
j
,
j
, d
j
are positive constants. These equations are analogous to
the Lotka-Volterra equations with logistic growth of the prey except there are
two species (y and z) that attack the third (x).
(a) Show that unless d
1
/
1
= d
2
/
2
, (5.62) has no equilibria at which all three
populations are non-zero.
Discussion: In the language of Chapter 1, the relation d
1
/
1
=
d
2
/
2
, is non-generic. In non-technical language, it could be sat-
ised only by accident, and even if it were satised, the slightest
perturbation of the system would undo it. In ecology, this behavior
is known as the Law of competitive exclusion: i.e., it is ecologically
unstable for two species to compete for exactly the same resources.
(b) Find the equilibria for which one of the prey populations is zero, and
determine their stabilities.
165
5.6.2 Exercises referenced elsewhere in the book
9. Let H(x
1
, . . . , x
d
, y
1
, . . . , y
d
) be a smooth function of 2d real variables. A system
of the form
x

j
=
H
y
j
j = 1, . . . , d
y

j
=
H
x
j
j = 1, . . . , d
is called Hamiltonian.
(a) Show that the Hamiltonian H(x, y) is constant along trajectories of such a
system.
(b) Consider the two-dimensional system (the torqued pendulum without fric-
tion)
x

= y
y

= sin x + ,
(5.63)
interpreted as an ODE on R
2
, not on the cylinder S
1
R as in Section 4.3.4.
Show that this system is Hamiltonian with respect to the Hamiltonian function
H(x, y) = y
2
/2 cos x x. (5.64)
Discussion: An attempt to dene H as a function of the cylinder
S
1
R would produce a multi-valued function. Incidentally, note that
the function (5.64) is not bounded from below. Thus, this function
may provide a Liapunov function near an equilibrium (if [[ < 1, but
it is of no value in proving global existence.
(c) Consider a bead sliding on a rotating hoop without friction, and scale y in
the equations of Exercise 8 so that they read
x

= my/a
2
y

= mga sin x + ma
2

2
sin xcos x.
Show that this system is Hamiltonian with the function
H(x, y) = my
2
/2a
2
mga cos x m[a sin x]
2
/2. (5.65)
Discussion: With the rescaling, y equals the angular momentum
of the bead moving around the loop. The function (5.65) makes
one think of the total energy of this systemthe term my
2
/2a
2
represents the kinetic energy of motion around the hoop, the term
166
mga cos x represents the potential energy of the bead, and the term
m[a sin x]
2
/2 represents the kinetic energy of the bead being whirled
around the rotation axis. However, note that the term m[a sin x]
2
/2
is subtracted in (5.65), not added!
As the above examples hint, Hamiltonian systems are frictionless.
Despite this fact, many mechanical systems can be understood as a
Hamiltonian system perturbed by the addition of a frictional term.
In such cases the Hamiltonian usually provides a Liapunov function,
for free, so to speak.
Gradient systems provide another case in which a Liapunov function
may be obtained for free: i.e., a system of the form
x

j
=
V
x
j
, j = 1, . . . , d
where V : | R is a smooth function. Please check that (i) V de-
creases along orbits and (ii) a point b

is an equilibrium i V (b

) =
0. Thus V provides a Liapunov function for an equilibrium if V has
a local minimum there.
5.6.3 Computational exercises
10. Consider the activator-inhibitor system (5.2) in borderline case when
= 2
_
+ 1;
i.e., (5.23) is just barely violated. Determine whether or not the equilibrium
at x = 2/ is asymptotically stable.
11. If (5.23) is satised, the activator-inhibitor system (5.18) has three equilibria,
one of which is a saddle point. Compute the (global) unstable manifold through
the saddle point of this system. Consider rst the case when (5.26) is satised
(so E
+
is asymptotically stable) and then the case when (5.26) is violated.
12. Consider a modication of van der Pols equation (4.29) in which there is a
quadratic non-linearity is the restoring force:
(a) x

= y
(b) y

= (x
2
1)y x x
2
.
(5.66)
(a) Show that provided ,= 0, (5.66) has an equilibrium that is a saddle point.
167
(b) Starting with small , nd the global unstable manifold through the saddle
point.
(c) Explore what happens to this unstable manifold if is increased (say to
= 1).
13. Recall the torqued pendulum
x

= y
y

= sin x y + .
(5.67)
Provided 0 < 1, this equation has two equilibria, solutions of sin x = .
(a) Show by calculating eigenvalues that the equilibrium with 0 < x < /2 is
asymptotically stable and that the equilibrium with /2 < x < is a saddle
point.
(b) Using the Liapunov function (5.64), give an alternative proof that the
equilibrium with 0 < x < /2 is asymptotically stable.
(b) First assuming the friction coecient is small, nd the (global) unstable
manifold through the saddle point.
(c) Increase and look for a qualitative change in this unstable manifold.
Interpret what you see.
Discussion: This equation (interpreted as an ODE on R
2
, not on the
cylinder S
1
R) has an energy function that decreases along orbits,
H(x, y) = y
2
/2 cos x x. (5.68)
See Exercise 3(c) above.
5.6.4 Exercises of independent interest
14. Recall from the Exercises of Chapter 4 the FitzHugh-Nagumo equations,
x

= x(1 x
2
) y + I
y

= (x y)
(5.69)
where I, , are parameters with , positive. These equations will be dis-
cussed later where?, including where they come from.
(a) If < 1, show that for every I, equation (5.69) has a unique equilibrium
solution.
(b) Deduce from Figure ?? and Table 5.2 that in such cases the unique equi-
librium is either a sink or a source.
168
(c) Consider the special case of these equations in which I = 0, still assuming
< 1. Compute the trace of DF and show that the unique equilibrium is a
sink or source according to the following inequalities:
Case 1: > 1 a sink
Case 2: < 1 a source
(d) If > 1, then (5.69) may have one, two, or three equilibria, depending on
the value of I. Draw appropriate graphs to demonstrate this behavior.
(e) Show that if (5.69) has only one equilibrium, it is either a sink or a source.
Rmk: In fact, it is globally attracting. Q: How to prove this? The
Liapunov function in Maria Ss paper? Then show that the trace of DF
is negative, so that the equilibrium must be a sinkasymptotically stable, in
fact.
(f) Show that if (5.69) has three equilibria, then the outside two are sinks while
the inner one is a saddle.
(g) Show that if (5.69) has two equilibria, then one of them is degenerate in
the sense that one of the eigenvalues of DF vanishes there.
15. feedback control to achieve stability.
5.7 Ideas
Mention that if e-values at stable eqlb are complex, traj circle eqlb innitely many
times.
Mention that there is result on structural stability (Hartman-Grobman) if allow
homeomorphism, but this seems too violent to me.
One may wonder how much of Theorem 5.5.1 survives at a nonhyperbolic equi-
librium. The ODE in Exercise 17 in Chapter 4 illustrates an extreme break down
of the conclusions of this theorem. Specically, all orbits except the positive x-axis
have the dening property (5.44) of the stable manifold! Similarly all orbits except
the negative x-axis have the dening property of the unstable manifold.
The ODE in Exercise 10 illustrates a less extreme break down of Theorem 5.5.1.
In this example the Jacobian DF

has one negative and one zero eigenvalue. In this


case there still is a one-dimensional stable manifold tangent to the subspace E

. One
may also dene a (local) center manifold /
c
that is tangent to the null space of DF

169
and has the following property: There is a neighborhood 1 of the equilibrium b

such
that, for any initial condition b 1, if (t, b) 1, then (t, b) /
c
. Indeed,
center manifolds are an extremely useful tool in bifurcation theory ref. However, in
this book we rely on more elementary techniques, even though this means we must
leave some results unproved.
Hidden material for exercises follows
170
Chapter 6
Oscillations in ODEs
As its title implies, this chapter is concerned with oscillatory solutions of ODEs.
Solutions of the van der Pol equation (1.34),
x

= y
y

= (x
2
1)y x.
(6.1)
are representative of the kind of behavior we focus on (see Figure 1.9). Up to now,
we have been forced to rely on the computer to study solutions of this equation.
Starting in this chapter, we introduce analytical techniques to characterize solutions
of this and other equations that have oscillatory behavior. We describe the specic
contents of this chapter more fully in Section 6.1.2.
6.1 Periodic Solutions
6.1.1 Basic issues and examples
A non-constant solution of x

= F(x), dened for < t < , is called periodic if


there exists a real number ,= 0 such that x(t +) = x(t) for all t R. Any number
for which this equation holds is called a period of x. Let us recall various ODEs
we have already encountered that have periodic solutions.
Example 1: (Linear equations) Recall the equation
x

= y, y

= x,
which comes from writing (1.3), the equation of motion for a simple harmonic oscil-
lator, as a rst-order system. Every nonconstant solution of this system is periodic
with period 2, the orbits being circles x
2
+ y
2
= C
2
. More generally, if A is a
171
d d matrix, then the linear system x

= Ax has periodic solutions if and only if A


has at least one complex-conjugate pair of pure imaginary eigenvalues.
The following lemma and its corollary will help in analyzing the next example.
Lemma 6.1.1. Let x(t) be a continuous function on a closed interval [t
1
, t
2
] that
satises x

= F(x) for t
1
< t < t
2
and moreover
x(t
1
) = x(t
2
). (6.2)
Then (i) the maximal interval of existence of x(t) is < t < , and (ii) x(t) is
periodic with period = t
2
t
1
.
Proof. Let us dene x(t) for all t by extending x periodically: thus, for all t
x(t + ) = x(t).
By (6.2) this extension is unambiguously dened and continuous on R. Since the
equation x

= F(x) is autonomous, the extension satises the ODE on any translate


of the original (open) interval, (t
1
+n, t
2
+n) where n is an integer. By Lemma 3.2.8,
the extension in fact satises the equation everywhere. By uniqueness, this periodic
extension equals the maximal solution derived from the original solution.
Corollary 6.1.2. If the trajectory of a solution x

of x

= F(x), dened for all t, is


contained in a closed (
1
curve and if there are no equilibria of x

= F(x) on ,
then x is periodic.
Proof. Since is compact and has no equilibria, the minimum speed along i.e.,
min

[F(x)[is nonzero. Thus x will complete the circuit of in some time less than
T, the length of divided by this minimum speed. This shows that there is a time
such that x() = x(0), and we may therefore apply the lemma.
Example 2: (An equation with a conserved energy) Dungs equation (1.25),
without friction or forcing, is the system
x

= y, y

= x x
3
. (6.3)
As we have seen, the energy E(x, y) = y
2
/2x
2
/2+x
4
/4 remains constant along so-
lution trajectories. There are two exceptional values of C for the level sets E(x, y) =
C: i.e., C = 0, the energy of the saddle point in Figure 6.1, and C = 1/4, the
energy of the two minima. Apart from these, every level set E(x, y) = C is a
closed curve (with two components if C < 0, but no matter). Applying the Corol-
lary, we conclude that all solutions of (6.3) with energy dierent from 0 and 1/4
are periodic.
172
E
x
y
Figure 6.1: Double-well potential energy function for Dungs equation (6.3). En-
ergy remains constant along solution trajectories. Q: Can we make a joke that
the graph of E looks like a molar? Q: Should we keep the trajectories
that the gure now contains?
Remark: The level set E(x, y) = 0 consists of three orbits, the equilibrium
(0, 0) plus two loops (see Figure 5.9 in Chapter 5). As we saw in Section 5.5.3, the
loops equal the (global) stable and/or unstable manifold of the saddle point, which in
fact coincide. The loops represent a limiting case of a periodic orbit: i.e., a trajectory
that closes up on itself, but only in an innite amount of time. More formally, if
x(t), < t < is a solution of an autonomous ODE such that x(t) converges
to the same point as t and t , then its orbit x(t) : t R is called a
homoclinic orbit. Both loops of the level set E(x, y) = 0 are homoclinic orbits.
In both of the preceding examples, there are innitely many periodic orbits. Of
greater interest to us will be isolated periodic orbits known as limit cycles. (This
name derives from the fact that, for planar systems, nearby trajectories approach
the periodic orbit in one of the limits t .) As our numerics show, van der
Pols system (6.1) has such a periodic orbit, and we shall derive this analytically
later in this chapter. In Chapter 7 we will see that, for certain parameter values, the
augmented Lotka-Volterra system (1.41) and the activator-inhibitor system (4.24)
also have such solutions. In the meantime, here are two examples for which, even
with just our present techniques, we can show analytically that such an orbit exists.
Example 3: (An academic example) Recall the system considered in Exer-
cise 3(e) from Chapter 1
x

= x y (x
2
+ y
2
)x
y

= x + y (x
2
+ y
2
)y.
(6.4)
The circle of radius 1 is a limit cycle of this equation. Indeed, rewriting the system
173
in polar coordinates
r

= r(1 r
2
),

= 1, (6.5)
we found explicit solutions of this system. Even without the explicit solutions, one
may see from (6.5) that the angular variable increases at constant rate and, unless
r(0) = 0, the radial variable r approaches 1 as t . Nearby trajectories are
attracted to the periodic orbit r = 1 as t .
Example 4: (The torqued pendulum) Recall from Section 4.3.4 the system de-
scribing the torqued pendulum,
x

= y
y

= sin x y + .
(6.6)
Suppose > 1; i.e., suppose the torque is large enough to overcome the pull of
gravity, no matter what the angle x of the pendulum may be. Under this hypothesis,
we will construct a solution x

(t), y

(t) of (6.6) such that the pendulum continues to


rotate indenitely in a periodic fashion. Strictly speaking, if (6.6) is regarded as an
ODE on R R, this solution is not a periodic function; rather it satises
x

(t + ) = x

(t) + 2, y

(t + ) = y

(t) (6.7)
for an appropriate period . However, the RHS of (6.6) is periodic in x, and as in
Section 4.3.4 we regard this equation as an ODE on the cylinder S
1
R. In this
sense a solution that satises (6.7) is periodici.e., its orbit is a closed curve
1
on
S
1
R.
To construct this solution, we rst eliminate time from (6.6); y as a function of
x satises the scalar ODE dy/dx = F(y, x) where
F(y, x) =
y

=
sin x
y
. (6.8)
Choose constants and M such that
0 < <
1

, M >
+ 1

.
Then F(, x) > 0 and F(M, x) < 0. Thus, as illustrated in Figure 6.2, if b M,
1
Similar issues are implicit in the transformation of (6.4) to polar coordinates. If (6.5) were
regarded as an ODE on (0, ) R, the orbit r = 1 would not be periodic. Of course because
(r, ) represent polar coordinates on R
2
, it is most natural to regard (6.5) as an ODE on the
manifold (0, ) S
1
.
174
b
*
2
M
0

0
y
x
Figure 6.2: Trajectories for (6.8) in the strip < y < M with = 1/2, = 3/2,
= 1/2, and M = 6. The bold trajectory is such that y(2) = y(0) = b

.
then the solution of the IVP
dy
dx
= F(y, x), y(0) = b (6.9)
is trapped between the lines y = and y = M. More formally, we have:
Claim 1: If b M, then the solution (x, b) b of the IVP (6.9) exists for
all x 0 and moreover satises (t, b) M.
Regarding an analytical proof of the claim, it does not actually follow from any
specic result we have articulated above. As a worthwhile review exercise, make
Exercise we invite you to construct a rigorous proof of the claim from techniques
developed in Chapter 4.
Claim 2: The derivative /b satises the estimate
0 <

b
(x, b) < 1.
Proof of Claim 2. According to Theorem ??, Chapter 4, need to write thm
there to cover variable coecients the solution of (6.9) depends dierentiably
on the initial condition b and moreover /b(x, b) satises the linear IVP
dv
dx
=
_
sin x

2
(x, b)
_
v, v(0) = 1,
where the RHS of the equation was obtained by dierentiation of (6.8) with respect
to y. The claim follows from the observation that the coecient of v in this equation
175
b
*

M 0
b

0
M
slope 1
P(b)
Figure 6.3: Graph of the map P : [, M] (, M) constructed from (6.6), using
the same parameters as in Figure 6.2.
is negative.
Dene a map P : [, M] (, M) by the formula P(b) = (2, b). (This map is a
special case of what is called the Poincare map, which we will study in Section 6.3.)
As illustrated in Figure 6.3, by our claims above there is a unique point b

(, M)
where the graph of P crosses the diagonal in [, M] [, M].
Now let x

(t), y

(t) be the solution of (6.6) with initial conditions x

(0) = 0, y

(0) =
b

. It follows from Theorem 4.2.1 that this IVP has a solution for all t R. Since
b

is a xed point of P, there is a time such that x

() = 2, y

() = b

. Now
x

(t + ) 2, y

(t + ) (6.10)
also satises (6.6) and has the same initial conditions as x

(t), y

(t). Thus, by unique-


ness, (6.10) coincides with x

(t), y

(t), which therefore provides our desired periodic


solution of (6.6).
To conclude this introduction, let us record a couple of simple properties of peri-
odic solutions of an ODE that almost dont require proof.
Proposition 6.1.3. If x
0
(t) is a periodic solution of x

= F(x), then (i) there are


no equilibria on the orbit of x
0
and (ii) x
0
has a minimal period.
Proof. Regarding Claim (i), if x
0
(t

) = b

where b

is an equilibrium of x

= F(x),
then y(t) b

and x
0
(t) are two dierent solutions of the IVP
x

= F(x), x(t

) = b

,
176
contradicting uniqueness. Regarding Claim (ii), let
S = > 0 : x
0
() = x
0
(0).
Suppose that there were a sequence
n
S converging to zero. Then
x

0
(0) = lim
n
x
0
(
n
) x
0
(0)

n
= 0;
in other words, this assumption contradicts Claim (i), so S must contain a minimal
element
min
> 0. By Lemma 6.1.1, every element of S is a period of x
0
, and
min
is
a minimal period.
Remark: If is the minimal period of x
0
, then the function x
0
: [0, ] R
d
denes
a closed curve that has no self-intersections: it is closed because x
0
() = x
0
(0) and
it has no self-intersections because if x
0
(t
1
) = x
0
(t
2
) for 0 t
1
< t
2
< , then by
Lemma 6.1.1, t
2
t
1
would be a smaller period of x
0
. In other words, x
0
: [0, ] R
d
is a simple closed curve, what in complex analysis (where d = 2) is called a Jordan
curve. The Jordan Curve Theorem ref ce is the basis of the special behavior of
ODEs in two dimensions, which is discussed in Section 6.2.
6.1.2 Contents of this chapter
The qualitative theory of ODEin particular, the question of asymptotic behavior
of solutions as t provides an instructive perspective on oscillatory solutions.
The simplest asymptotic behavior as t of a solution that remains bounded is to
converge to an equilibrium point. Indeed, in the previous chapter we saw that near
an asymptotically stable equilibrium every solution has this behavior, and that near
a hyperbolic equilibrium solutions belonging to the stable manifold /
s
have this
behavior. Limit cycles represent the next level of complexity in possible asymptotic
behavior of solutions.
Despite numerous analogies, limit cycles are more dicult to analyze than equi-
libria; even showing that they exist can be challenging. In Section 6.2, we introduce
one of the two general analytical tools
2
in this book for proving existence of periodic
solutionsi.e., the Poincare-Bendixson theoremand we use it to show that the van
der Pol equation has a periodic solution. Since this theorem is ultimately based on
the Jordan Curve Theorem, it applies only to two-dimensional systems. In more
general terms, Section 6.2 explores special properties of two dimensional systems,
including a criterion for non-existence of periodic solutions.
Naturally, we also want to describe limit cycles as opposed to merely proving
that they exist, which is the focus of Sections 6.3 and 6.4. There are three kinds of
2
The other tool, which applies in any dimension, is the Hopf bifurcation theorem in Section 7.6.
177
techniques for this task:
Numerical computation
Asymptotic perturbation theory
Rigorous mathematical analysis.
Virtually any problem is amenable to numerical solution; the limitation of this tech-
nique is that one may solve equations only with specic values of the parameters
in it, which can make it dicult to get an overview of the behavior of solutions.
Asymptotics, which works by deriving simpler, approximate problems that can be
solved explicitly, is applicable only if there is a small or large parameter that can be
exploited; on the other hand, it often provides an excellent overview of the behav-
ior of solutions. Rigorous analysis is the least general of the three methodsnew
arguments must be developed for each new problem, and many problems are too
complicated for complete analysis; however, the attraction of complete rigor is irre-
sistible for many.
Remark: It is instructive to ask yourself which method you nd most appealing
your preference provides guidance about possible career choices, or at least specializa-
tions within mathematics. If you like numerics best, consider scientic computation;
if you like asymptotics best, consider traditional applied mathematics; if you like
rigorous methods best, consider mathematical analysis.
In this book, our approach to these three techniques is as follows: Rather than
study the vast arsenal of numerical techniques that have been developed to solve
ODEs, we rely on existing software; if you wish to explore this fascinating subject
further, start for example with [1] best ref ce?. Likewise we slight rigorous anal-
ysis, the third technique; for the application of such methods to van-der-Pol-like
equations, we refer you to [?, ?]. Regarding the second, in two sections of this chap-
ter we illustrate the use of asymptotic methods to describe limit cycles in ODEs.
Specically, in Section 6.3 we study the van der Pol equation in the limit of small ,
and in Section 6.4 we study a more general class of problems that includes the van
der Pol equation in the opposite limit of large . However, asymptotics is only a sec-
ondary focus for this book, and we merely scratch the surface; [3] is an appropriate
reference going beyond our limited coverage.
As with equilibria, there is a notion of stability for limit cycles, driven by the
question, What happens if we start from initial conditions that are close to a
limit cycle? In Section 6.5, we dene stability notions for limit cycles and intro-
duce a general theoretical technique for analyzing their stability: the Poincare map.
Reminiscent of Theorem 5.1.1, the stability or instability of a limit cycle may be
determined from the eigenvalues of a certain matrix related to the Poincare map.
However, unlike Theorem 5.1.1, it is rarely possible actually to able to calculate this
178
matrix. Nevertheless, the conceptual framework provided by the Poincare map is
exceedingly useful for understanding limit cycles.
6.2 Special behavior in two dimensions
6.2.1 The Poincare-Bendixson Theorem: minimal version
The topology of the planein particular the Jordan curve Theoremgreatly con-
strains the possible dynamical behavior of two-dimensional systems of ODE. This
is captured most fully by the strong version of the Poincare-Bendixson Theorem in
Section 6.2.4. The following, simplied version of the theorem provides a sucient
condition for the existence of a periodic orbit.
Theorem 6.2.1. (Poincare-Bendixson) Let F : | R
2
be (
1
on the open set
| R
2
, and suppose that / | is a (compact) trapping region for x

= F(x) that
does not contain any equilibria. Then / contains at least one periodic orbit of the
ODE.
Give reference for proof.
At the risk of boring you through repetition, we emphasize: This theorem is valid
only for planar systems. No analogous result holds in higher dimensions.
Note that the trapping in the theorem might contain several periodic orbits. For
a rather contrived example, consider the system in polar coordinates
dr
dt
= r(1 r
2
)(4 r
2
)(9 r
2
),
d
dt
= 1.
The annulus 1/2 r 4 is a trapping region which contains no equilibria, but
it contains three periodic orbits: i.e., circles of radii 1, 2, and 3. corresponding to
r = 1, r = 2, and r = 3.
Incidentally, in Section 6.2.5 we will state sucient conditions for a periodic orbit
to be unique. Unfortunately, in practice the hypotheses of this result are dicult to
verify.
6.2.2 Application to the van der Pol equation
Recall that in Section 4.3 we constructed trapping regions for the van der Pol equation
(6.1) in order to demonstrate that solutions of this system exist for all positive time
(see Figure 4.5(c)). Let /
0
be one such trapping region, corresponding to a choice
of the parameter A in (4.30). We cannot apply Theorem 6.2.1 using /
0
because the
equilibrium (x, y) = (0, 0) lies inside /
0
. However, let us remove a disk of radius
around the origin from /
0
; thus we dene
/ = /
0
B(), (6.11)
179
K
Figure 6.4: Annular trapping region / of the sort used when applying Theorem 6.2.1
to prove existence of a periodic orbit. The inner boundary of the region encloses the
equilibria (bold dots) so that none are contained in / itself.Change from a general
gure to one specic to van der Pol.
and for deniteness we will let = 1/2 (see Figure 6.4). Now / = /
0
where
= r = is the circle of radius . We know from Section 4.3 that the ow of
(6.1) is inward along /
0
, the outer boundary of /. Let us parametrize , the inner
boundary of /, by . On the inward normali.e., the normal pointing into /is
^

= (cos , sin ), and we calculate that


F, ^

) =
_
1
2
cos
2

_
sin
2
0.
Hence / is a trapping region for (6.1) that contains no equilibria, so there must be
a periodic orbit of (6.1) inside /.
In fact, although we cannot conclude this from Theorem 6.2.1, there is a unique
periodic orbit of the van der Pol equation inside /. We have observed this fact in
computations; we shall derive it with asymptotics in the limit of large or small ;
and we refer you to [?] for a rigorous proof for all .
Remark: Because of the equilibrium of (6.1) at the origin, only an annular
trapping region can be equilibrium-free. This is a general phenomenon: by The-
orem ??, a periodic orbit of a planar system x

= F(x) must enclose at least one


equilibrium. Consequently, whenever we want to obtain a periodic orbit by applying
Theorem 6.2.1, the trapping region / will need to have one or more holes in it.
6.2.3 Limit sets
We now introduce a concept used in the formulation of the strong version of the
Poincare-Bendixson Theorem. Unlike the rest of Section 6.2, this concept makes
sense in arbitrary dimension.
180
Recall from Section 4.4.2 the ow notation (t, b) for the solution of an IVP
x

= F(x), x(0) = b. (6.12)


A point z is called an omega-limit point of b if (t, b) is dened for all t 0 and
there exists a sequence t
n
of real numbers tending to innity such that
lim
n
(t
n
, b) = z.
The set of all omega-limit points of b will be denoted (b). Incidentally, the alpha
limit set, consisting of points obtained in the limit as t , is dened analogously,
but we will not make much use of this latter concept.
The omega-limit set certainly can be empty, as illustrated by the scalar ODE
x

= x with x(0) = b ,= 0. Of course (b) is non-empty if the (forward) orbit


through b is bounded. Here are some examples of omega-limit sets.
Example 1: (A single point) If b

is an asymptotically stable equilibrium of


x

= F(x), then there exists an neighborhood 1 of b

such that (b) = b

for all
b 1. Similarly, if b

is a hyperbolic equilibrium and if b /


s
, the stable manifold
of b, then (b) = b

.
Example 2: (A limit cycle) Example 3 in Section 6.1 was the ODE in polar
coordinates
r

= r r
3

= 1.
We saw that , the unit circle, was a limit-cycle orbit of (6.4) and that every non-
equilibrium solution of (6.4) approaches . Thus, in the present terminology, (b) =
for all b ,= 0. Similar behavior occurs for van der Pols equation.
Example 3: (Homoclinic cycles) Let us generalize the -equation in the preced-
ing example to
r

= r r
3

= (r, ).
(6.13)
If as in (5.12) we have (r, ) = 1 cos , then every non-zero trajectory converges
to the equilibrium at (r = 1, = 0). However, addition of a term (r 1)
2
to (r, )
changes the omega-limits greatly. In need Figure ?? we show the ow for
(a) (r, ) = 1 cos + (r 1)
2
, and (b) (r, ) = 1 cos 2 + (r 1)
2
. (6.14)
In both cases (b) is the unit circle if b ,= 0 and [b[ , = 1. However, for (6.14a),
consists of the equilibrium (r = 1, = 0) and a homoclinic orbit connected to
it as t , while for (6.14b), consists of two equilibria, (r = 1, = 0) and
181
(r = 1, = ), and heteroclinic orbits (i.e., dierent limits as t ) connecting
these equilibria. A simple closed curve consisting of one or more equilibria of an ODE
and orbits connecting these equilibria is called a homoclinic cycle. More interesting
examples of this type of omega limits will arise naturally in Chapter 8.
Example 4: (Limit sets in higher dimensions) In the Exercises we ask you
to construct a three-dimensional system for which the typical omega-limit set is a
torus, the direct product of two circles. Actually, the omega-limit set can be far
more complicated than this in three or more dimensions. In fact, in the 1960s
mathematicians were so perplexed by limit sets they observed that they coined the
pejorative phrase, strange attractor. We will get a chance to dig into this rich
treasure in Chapter 8.
Primarily we dene omega limit sets in this chapter in order to formulate the
strong Poincare-Bendixson Theorem. Although it is a slight digression, let us pause
to develop derive a couple of simple properties of such sets. Let (t, b) be the
ow associated with an ODE x

= F(x) where F : | R
d
. We say that 1 |
is invariant with respect to the ow if (t, b) 1 for all b 1 and all t R.
(Incidentally, positive invariance of a set 1 is a less restrictive concept, requiring
only that (t, b) 1 for t 0.)
Proposition 6.2.2. Any omega-limit set (b) associated with a solution x(t) of an
ODE is a closed, invariant subset of |.
Proof. If (b) is empty, the assertion is trivial. Suppose that z
m
is a sequence of
points in (b) that converges to z. Then we must show that z (b): i.e., there
exists a sequence t
n
tending to innity such lim
n
x(t
n
) = z. Since z
m
(b), there
exist sequences s
(m)
k
, all tending to innity as k , such that lim
k
x(s
(m)
k
) = z
m
.
For n = 1, 2, . . ., choose t
n
= s
(n)
k(n)
, with t
n
n, such that
[x(t
n
) z
n
[ <
1
n
.
Then
[x(t
n
) z[ [x(t
n
) z
n
[ + [z
n
z[;
since both terms on the right tend to zero, the rst claim in the proposition is proved.
Regarding invariance, suppose z
0
(b); thus there is a sequence t
n
tending
to innity such that z
0
= lim
n
(t
n
, b). Given any point z = (t

, z
0
) on the
trajectory through z
0
, consider the sequence (t
n
+ t

, b). If t

< 0, then early


elements (t
n
+ t

, b) in the sequence might be undened if t


n
+ t

< 0, but let us


restrict n N to exclude these problematic elements. By the semi-group property
Proposition 4.4.3
(t

+ t
n
, b) = (t

, (t
n
, b)),
182
and by continuity
lim
n
(t

, (t
n
, b)) = (t

, lim
n
(t
n
, b)) = (t

, z
0
) = z.
Hence z (b), as claimed.
6.2.4 The Poincare-Bendixson Theorem: strong version
Theorem 6.2.3. (Poincare-Bendixson): Suppose that Let F : | R
2
be (
1
on
|, where | is positively invariant with respect to the ow and contains only nitely
many equilibria. If b |, then either (b)
(i) consists of a single point,
(ii) is a periodic orbit, or
(iii) is a homoclinic cycle.
Examples 13 of Section 6.2.3 illustrate possibilities (iiii) of the theorem.
Give reference
Incidentally, the hypotheses that F has only nitely many equilibria in | is
essential. A simple counterexample is provided by (6.13) with (r, ) = (r 1)
2
. In
this case most trajectories converge to the unit circle , but is an innite union of
equilibria. A more perverse example may be constructed using the following special
case of a result from [?] Malgrange, Ideals of dible fcn: For any closed subset
K , there is a non-negative, (

function : R such that () = 0 i K.


Now consider (6.13) with (r, ) = (r 1)
2
+(). Again most trajectories converge
to , but now may be a horrible jumble of innitely many equilibria plus orbits
connecting them.
6.2.5 Dulacs Theorem
We conclude our discussion of two-dimensional systems with a proposition that shows
non-existence of periodic solutions, plus an application of that result. Q: Make ex-
ercise about uniqueness result? Bad for text since there is hidden topo-
logical assumptionthe two periodic orbits need to be deformable to one
another. Not a problem for a specic application. Q: Apply to van der
Pol?
Proposition 6.2.4. (Dulac). Suppose that F : | R
2
is (
1
on the open, simply
connected set | R
2
. If there exists a (
1
function g : | R such that the divergence
(gF) is non-negative and is not identically zero on any open subset of |, then the
ODE x

= F(x) has no periodic solutions lying entirely within |.


Remarks: (i) The same conclusion follows if (gF) is non-positive and is not
identically zero on any open subset of |.
183
(ii) The proof below does not shed much light on why the proposition is true.
As we explore in Exercise??, such intuition can be derived from considering how the
area of regions evolve under the ow .
Proof of Proposition 6.2.4. Suppose to the contrary that there exists a simple, closed
orbit , and let denote the interior of . By Greens Theorem,
__

(gF) dA =
_

(gF) n ds,
where n, ds, and dA have their usual meanings. By our assumptions regarding
(gF), the double integral on the LHS is strictly positive. By contrast, the contour
integral on the RHS is zero since the velocity vector F(x) = x

is tangent to and
therefore orthogonal to the normal n.
As an interesting application of the two-dimensional theory, recall the modica-
tions of Lotka-Volterra model for a predator-prey system introduced in Section 1.6.
Specically, consider (1.41) with the carrying capacity K set equal to innity:
x

= x
_
x
x +
_
xy, y

= (xy y), (6.15)


where > 0 and 0. If = 0 trajectories of (6.15) in the (open) rst quadrant
are periodic. The seemingly innocent factor (x )/(x +), which is approximately
equal to 1 if x is large, changes the dynamics completelywe claim that, apart from
the equilibria, all trajectories converge to total extinction, (x, y) = (0, 0).
The rst step in proving the claim is to apply Dulacs theorem with g(x, y) = 1/xy
and F(x, y) given by the RHS of (6.15). We calculate
(gF) =
2
y(x + )
2
,
which is strictly positive throughout the rst quadrant. It follows that (6.15) has no
periodic solutions in the biologically meaningful regime x > 0, y > 0.
In Exercise?? we help the reader complete the proof of the claim by combining
the above information with the strong Poincare-Bendixson Theorem.
6.3 Limit cycles in the van der Pol equation for small
In perturbation theory, one calculates approximate solutions of problems whose exact
solutions are not easily computed. When used with appropriate care, perturbation
methods often produce approximations that are accurate well beyond what one has
any right to expect. In this section we use perturbation theory to describe the
184
limit cycle of the van der Pol equation in the limit of small . In Section 6.4.1 we
introduce perturbation theory through two examples, and in Section 6.4.2 we make
the application to the van der Pol equation.
6.3.1 Two illustrative examples of perturbation theory
Example 1: Consider the one-parameter family of initial value problems,
x

= x + x
2
, x(0) = 1, (6.16)
where is a small parameter. If = 0, then (6.16) has the solution x(t) = e
t
. Even
if ,= 0, (6.16) may be solved exactly because the equation is separable. However,
let us temporarily ignore this exact solution and use perturbation theory to obtain
the approximation
x(t) e
t
+ (e
t
e
2t
). (6.17)
In other words the small term x
2
changes the solution by (e
t
e
2t
), at least
approximately.
In perturbation theory, in attacking a one-parameter family of problems like
(6.16), one considers all small values of simultaneously. To emphasize this point
of view we write x(t; ) for the solution, indicating the dependence on , and we
suppose x(t; ) has a power-series expansion:
x(t; ) = x
0
(t) + x
1
(t) +
2
x
2
(t) + . . . . (6.18)
Even if the series might not converge, each term in the series should be small com-
pared to all terms that precede it. Inserting the expansion into (6.16) yields
x

0
+ x

1
+
2
x

2
+ . . . = [x
0
+ x
1
+
2
x
2
+ . . .] + [x
0
+ x
1
+
2
x
2
+ . . .]
2
where the dots indicate terms that are of order
3
or higher. Expanding out the
squared term we obtain
x

0
+ x

1
+
2
x

2
+ . . . = x
0
+ [x
1
+ x
2
0
] +
2
[x
2
+ 2x
1
x
0
] + . . . .
For each t, the LHS and RHS of this equation are functions of , and for them to
be the same functions, the coecient of each power of on the left must equal the
corresponding coecient on the right. This principle may be used to calculate ODEs
for every coecient x
n
(t) in (6.18). In particular, matching terms of corresponding
185
orders through
2
generates the ODEs
O(
0
) : x

0
+ x
0
= 0
O(
1
) : x

1
+ x
1
= x
2
0
O(
2
) : x

2
+ x
2
= 2x
1
x
0
.
Since the initial condition x(0, ) = 1 holds for all , it follows that
x
0
(0) = 1, x
1
(0) = 0, x
2
(0) = 0, . . . .
We attack the equations sequentially. First, x
0
(t) = e
t
satises the O(
0
)-IVP.
Given x
0
(t), the O()-problem is an inhomogeneous IVP whose solution is x
1
(t) =
e
t
e
2t
. In the Exercises we ask you to solve the O(
2
)-IVP for x
2
(t).
The rst two terms of (6.18) yield the approximation (6.17). To assess the ac-
curacy of this approximation, we solve (6.16) explicitly via separation of variables,
obtaining
x(t, ) =
1
+ (1 )e
t
.
Please check that
x(t, ) = x
0
(t) + x
1
(t) +O(
2
), (6.19)
as expected.
In point of fact, (6.19) holds uniformly for 0 t < . Such uniformity is rare
one would expect errors in the approximation to accumulate as time increases. Thus,
normally (6.19) would be uniform only over nite intervals, say 0 t T. Problem
(6.16) was hand-picked so that all coecients x
n
(t) in (6.18) decay as t ,
meaning that the estimate (6.19) is uniform over [0, ), but only because both sides
tend to zero for large t.
Example 2: Our next example illustrates the accumulation of errors in a power-
series approximation as t increases and how to cope with this diculty. Consider
the one-parameter family of initial value problems
x

+ (1 +)x = 0, x(0) = b, x

(0) = 0, (6.20)
where b ,= 0. The exact solution of the IVP is x(t) = b cos(

1 + t), which is
periodic; in particular it does not decay as t . Ignoring the exact solution and
seeking an approximation as above, we suppose x(t, ) = x
0
(t) + x
1
(t) + . . . and
insert this expansion into the ODE,
x

0
+ x

1
+ . . . + (1 +)[x
0
+ x
1
+ . . .] = 0.
186
Multiplying out the product and grouping like powers of we obtain ODEs
O(
0
) : x

0
+ x
0
= 0
O(
1
) : x

1
+ x
1
= x
0
subject to initial conditions
x
0
(0) = b, x

0
(0) = 0, x
1
(0) = x

1
(0) = 0.
The solution of the leading-order problem is x
0
(t) = b cos(t). Substitution of x
0
(t)
into the O()-equation leads to an inhomogeneous ODE for x
1
with a resonant forcing
term:
x

1
+ x
1
= b cos t.
Imposing the initial conditions, we nd
x
1
(t) =
b
2
t sin t,
which yields the two-term asymptotic approximation
x(t, ) x
0
(t) + x
1
(t) = b cos t
bt
2
sin t. (6.21)
The error in (6.21) is O(
2
), uniformly for t in any nite interval 0 t T.
However, this approximation fails miserably as t (see Figure 6.5). Indeed, for
large t, the supposedly small, rst correction to x
0
(t) in fact becomes large compared
to x
0
(t)!
The problem with the simple ansatz above, x(t, ) = x
0
(t) + x
1
(t) + . . ., is
that both x(t, ) and x
0
(t) are periodic, but they have dierent periods, with the
period of x(t, ) depending on . Because of this dierence, the two functions cannot
remain close to one another indenitely, no matter how small may be; in hindsight,
obviously the correction term x
1
(t) cannot stay small as t .
A bit of terminology: in an expansion x(t, ) = x
0
(t) + x
1
(t) + . . ., a term in a
coecients x
n
(t) that grows without bound as t is sometimes called a secular
term. Such terms typically come from resonant forcing, as above.
When dealing with an IVP such as (6.20) that has a periodic
3
solution, the
Poincare-Lindstedt method allows one to obtain an approximation that holds for
arbitrarily large times. In this method one introduces a scaled time
(t, ) = ()t = (1 +
1
+
2

2
+ . . .)t (6.22)
3
There is a more general approximation technique for nonperiodic problems: the method of multi-
ple scales, a.k.a. two-timing. See [3] for more details, or better yet, take a course in asymptotics,
which is not an easy subject to learn without guidance from a pro.
187
10 20
4
0
4
0
exact
approximation
Figure 6.5: Comparison of the exact solution of (6.20) with its two-term regular
perturbation expansion approximation (6.21) for b = 1 and = 0.1. To the naked
eye, the Poincare-Lindstedt approximation (6.24) is indistinguishable from the exact
solution for this choice of and time window.
and seeks a power-series expansion of x(t, )
x(t, ) = x
0
((t, )) + x
1
((t, )) +
2
x
2
((t, )) + . . . (6.23)
in which the coecients x
n
() depend on the scaled time. If the scaling factor ()
is chosen cleverly, it can compensate for the mismatch between the periods of the
exact and approximate solutions. Of course the $64 question is, how to choose the
scaling factor? In the calculation below we will see that by requiring at each order
that no secular terms arise, both the undetermined coecient
n
in (6.22) and the
next term x
n
() in the series are uniquely determined.
Lets get on with it. Invoking the chain rule d/dt = d/d, we may rewrite the
ODE in (6.20) as

2
()
d
2
x
d
2
+ (1 +)x = 0.
Inserting the expansions for () and x(t, ), we obtain
(1 + 2
1
+ . . . )
_
d
2
x
0
d
2
+
d
2
x
1
d
2
+ . . .
_
+ (1 +)[x
0
+ x
1
+ . . . ] = 0.
Expanding the products and grouping terms according to their order in , the leading-
order and next-order correction terms obey the equations
O(
0
) :
d
2
x
0
d
2
+ x
0
= 0
O(
1
) :
d
2
x
1
d
2
+ x
1
= 2
1
d
2
x
0
d
2
x
0
.
188
Since is proportional to t, these functions must satisfy the initial conditions at
= 0
x
0
(0) = b,
dx
0
d
(0) = 0, x
1
(0) =
dx
1
d
(0) = 0.
The solution of the leading-order equation is x
0
() = b cos . Given x
0
(), the O()-
equation becomes
d
2
x
1
d
2
+ x
1
= (2
1
1)b cos .
Now here is the key point: to avoid a secular term in x
1
(), we must require that

1
= 1/2, so that the RHS of this equation vanishes. The solution of the (now
homogeneous) IVP for x
1
is the trivial function x
1
() 0. Thus, modulo errors that
are of order
2
or higher, our approximation of the true solution x(t) = b cos(

1 +t)
is given by
x(t, ) b cos [(1 + /2)t] . (6.24)
This estimate is far more satisfactory than (6.21). As mentioned in the caption
of Figure 6.5, there is no visible discrepancy between (6.24) and the true solution
of (6.20) (at least not over the given viewing window and with b = 1 and = 0.1).
In Exercise ?? we ask you to compare the errors in (6.21) and (6.24) analytically.
In general terms, the error in (6.21) is small if t 1, while the error in (6.24) is
small if
2
t 1. Thus, (6.24) is accurate for a much larger, but still nite, range of
t. By carrying the Poincare-Bendixson approximation to successively higher orders,
one can obtain approximations that are accurate if
n
t 1 for any integer n.
Here is another perspective on (6.24): since

1 + = 1+/2+O(
2
), (6.24) may
be derived from the exact solution cos

1 + t by neglecting
2
t inside the argument
of the cosine.
A pessimist might complain that the above calculation seems a little mysterious,
and we would agree. On the other hand, an optimist might exclaim how wonderfully
it all works out, and we would again agree. Over time we have found that the mystery
in asymptotics resolves itself, while the wonder remains and even grows.
6.3.2 Application to the van der Pol equation
Let us now apply the Poincare-Lindstedt method to analyze the limit-cycle solution
of a nonlinear ODE for which exact solutions are not available: the van der Pol
equation
x

(t) + (x
2
(t) 1)x

(t) +x(t) = 0 (6.25)


where is small. In notable contrast with the linear equation in (6.20), for which
all solutions are periodic, the periodic solution of (6.25) is unique up to translation.
Without loss of generality we may perform a translation in t such that a local maxi-
mum of the periodic solution of (6.25) is located at t = 0. Then this periodic solution
189
will satisfy initial conditions
x(0) = b, x

(0) = 0 (6.26)
where b > 0 must be determined along with the solution itself.
Here goes. Dening scaled time as in (6.22), we may rewrite (6.25) as

2
()
d
2
x
d
2
+ (x
2
1)()
dx
d
+ x = 0.
Inserting the expansions for () and x into the equation, we obtain
[1 + 2
1
+ ]
_
d
2
x
0
d
2
+
d
2
x
1
d
2
+
_
+[(x
2
0
1) + ] [1 + ]
_
dx
0
d
+
_
+ [x
0
+ x
1
+ ] = 0,
where we have retained only those terms that contribute to orders
0
or
1
. Grouping
terms of like order, we calculate the equations
O(
0
) :
d
2
x
0
d
2
+ x
0
= 0
O(
1
) :
d
2
x
1
d
2
+ x
1
= 2
1
d
2
x
0
d
2
(x
2
0
1)
dx
0
d
subject to the initial conditions
x
0
(0) = b,
dx
0
d
(0) = 0, and x
1
(0) =
dx
1
d
(0) = 0.
The solution of the lowest-order problem is x
0
() = b cos , where b is yet to be
determined. We substitute x
0
() into the O() equation to obtain
d
2
x
1
d
2
+ x
1
= 2
1
b cos
_
b
2
cos
2
1
_
(b sin ).
The problematic resonant forcing terms 2
1
b cos and b sin are easy to spot, but
there is another troublemaker lurking here as well. Indeed, by use of the trigonomet-
ric identity
sin cos
2
=
1
4
[sin + sin 3],
the ODE for x
1
can be rewritten:
d
2
x
1
d
2
+ x
1
= 2
1
b cos +
_
b
3
4
b
_
sin +
b
3
4
sin 3.
190
The sin 3 term is harmless: i.e., it has the particular solution (b
3
/32) sin 3, which
is periodic. By contrast, to avoid secular terms, we must require that
1
b = 0 and
b
3
/4 b = 0. Since b > 0 by assumption, we conclude that
1
= 0 and b = 2. Thus
our calculation has shown that, to this order, (6.25,6.26) has a periodic solution only
if b = 2. Hence x
0
() = 2 cos . In other words, to lowest order, the periodic solution
of (6.25) is a trigonometric oscillation of amplitude 2 and period 2.
In Exercise ?? we help you continue the expansion to the next order. You will
nd the O(
2
)-correction to the period and calculate the O()-distortion of the orbit
from a perfect sine wave. Challenge: Can you predict whether the O(
2
)-correction
will make the period longer or shorter? Hint: It may be useful to reect on the
information in the next section about the large- behavior of solutions.
6.4 Limit cycles in the van der Pol equation for large
6.4.1 Setting up the problem
Consider the system
(a) x

= y (x
3
/3 x)
(b) y

= x
(6.27)
where > 0 is a small parameter. In Exercise ?? we ask you to show that if x(t), y(t)
satisfy this system, then with respect to the scaled time t =

t the rst component
satises the van der Pol equation
d
2
x
dt
2
+ (x
2
1)
dx
dt
+ x = 0 where = 1/

. (6.28)
Thus, we may study solutions of the van der Pol equation for large by analyzing
solutions of (6.27) for small . This reduction of van der Pols equation to a rst-order
system is similar to that of Exercise 8, but the scaling in (6.27) is more convenient
for analyzing the large- limit. Plug Appendix on scaling?
In Exercise ?? we ask you to carry out the following steps: (i) Following the ideas
in Exercise 8 in Chapter 4, construct a rectangular trapping region /
0
for (6.27).
(ii) Show that the origin is the only equilibrium of (6.27) in /
0
(or in R
2
), and it
is a source. (iii) By removing a small ball B around the origin from /
0
, obtain
a trapping region for (6.27) that contains no equilibria. (iv) Invoke the Poincare-
Bendixson Theorem to prove that (6.27) has a periodic solution in /
0
B.
6.4.2 The limit-cycle solution
Figure 6.6: A compendium of data for (6.27) with ε = 1/100. The vertical segment {(1, b) : −1 < b < 0} is shown in bold. The solution trajectory starting from the initial conditions (x(0), y(0)) = (1, −0.2) is shown, along with the graph of the x-nullcline (6.29).

Since ε is small, (6.27) is a fast-slow system. Specifically, the x-equation evolves rapidly compared to the y-equation. With such systems, it is natural to make the approximation of letting the fast equation proceed to equilibrium. Here this means solving

  y + (x³/3 − x) = 0    (6.29)

for x as a function of y and substituting the result into (6.27b). It might seem more convenient to solve (6.29) for y as a function of x, but conceptually it is clearer to have x as a function of y in order to substitute into an equation for the evolution of y. With (6.27) the fast-slow approximation faces a new difficulty, not present in previous instances of this approximation: i.e., solving (6.29) gives x as a multiple-valued function of y. For this reason it is more intuitive to consider (6.27) geometrically (i.e., through pictures) rather than analytically. In geometric terms, the fact that ε is small means that, as shown in Figure 6.6, the flow is nearly horizontal, except near the x-nullcline (6.29).
To explore the implications of this geometry, let us consider specific initial conditions for (6.27), say (x(0), y(0)) = (1, b), where −1 < b < 0. Such initial conditions lie on the line x = 1, below the local maximum of x − x³/3 at (x, y) = (1, 2/3). As indicated in Figure 6.6, the solution initially moves to the right, staying close to the horizontal line y = b, until it reaches the nullcline. After this, equation (6.27b) pushes the solution upward, but at a much slower rate. As y increases, x also evolves, keeping the solution close to the nullcline; any significant departure from the nullcline would be quickly counteracted by (6.27a). As long as x > 0, (6.27b) implies that y continues to increase. However, once the solution reaches (1, 2/3), there no longer is a nullcline to follow. At this point (6.27b) pushes the solution off the nullcline; it begins rapid motion, close to the horizontal line y = 2/3, towards the left branch of the nullcline.
After reaching the nullcline, similar behavior ensues. Specifically, the solution moves slowly down the left branch of the nullcline till it reaches (−1, −2/3), after which it moves rapidly to the right, approximately along the horizontal line y = −2/3. The key observation here is this: solutions with initial conditions (x(0), y(0)) = (1, b) may start out far from one another, but after their circuit around the origin they all return close to one another, clustered around the line y = −2/3.
We use this information to argue that (6.27) has a limit-cycle solution. Define a mapping P : [−1, 0] → [−1, 0] as follows. Given b ∈ [−1, 0], follow the solution with initial condition (x(0), y(0)) = (1, b) as it evolves and let P(b) be (the y-coordinate of) the point where the solution first crosses the line x = 1 after its circuit around the origin. As observed above, P(b) ≈ −2/3 so, as [need figure] makes clear, P will have a unique fixed point b* in [−1, 0]. Lemma 3.2.8 may be invoked to show that the solution with initial conditions (x(0), y(0)) = (1, b*) is periodic.
This limit cycle is close to the piecewise smooth curve Γ specified below. In the following description, the point (2, −2/3) in the first bullet arises as the intersection of the line y = −2/3 with the x-nullcline (6.29). Γ has four pieces (see [need figure]), as follows:
• Phase 1: A horizontal piece from the local minimum of the x-nullcline at (−1, −2/3), intersecting the nullcline at (2, −2/3). Here the speed is O(1).
• Phase 2: A piece that follows the x-nullcline upward from (2, −2/3) to the local maximum of the nullcline at (1, 2/3). Here the speed is O(ε).
• Phase 3: A horizontal piece from (1, 2/3), intersecting the x-nullcline at (−2, 2/3). Here the speed is O(1).
• Phase 4: A piece that follows the x-nullcline downward from (−2, 2/3), returning to (−1, −2/3). Here the speed is O(ε).
Although our discussion has been purely heuristic, [reference] makes this analysis completely rigorous. However, this theory is not light reading!
Let us contrast the solution of (6.27) with previous instances of fast-slow systems
we have encountered. During Phase 2, the solution of (6.27) is well described by
the usual fast-slow approximation: i.e., a single scalar ODE obtained by letting the
fast equation proceed to equilibrium. The new element here is that the fast-slow
approximation predicts its own breakdown.
Let us argue that the period of the above limit cycle is approximately

  (3 − 2 ln 2) ε⁻¹.    (6.30)

Most of the time required to complete the cycle around the origin is spent in Phases 2 and 4, and by symmetry both phases last the same time; thus

  Period of Γ ≈ 2 · [Time spent in Phase 2].
To estimate the time spent in Phase 2, it is convenient to alter the above description of the fast-slow approximation. Here we propose to solve (6.29) for y as a function of x and substitute the result on the left-hand side of (6.27b), giving

  d/dt ( x − x³/3 ) = εx.

Differentiating, dividing by x, and separating variables, we obtain

  (x⁻¹ − x) dx = ε dt.

Integrating this from x = 2 to x = 1, we deduce that the duration of Phase 2 is approximately

  (3/2 − ln 2) ε⁻¹,

from which (6.30) follows.
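This estimate is easy to test numerically. The sketch below (Python with SciPy; the event-based period measurement and the value ε = 0.01 are our choices) integrates (6.27), lets transients die out, and measures the time between successive upward crossings of the line x = 1 for comparison with (6.30):

```python
import numpy as np
from scipy.integrate import solve_ivp

eps = 0.01

def f(t, u):
    x, y = u
    return [-y - (x**3 / 3.0 - x), eps * x]

def cross(t, u):          # upward crossing of x = 1, once per circuit
    return u[0] - 1.0
cross.direction = 1.0

T_est = (3.0 - 2.0 * np.log(2.0)) / eps
sol = solve_ivp(f, (0.0, 6.0 * T_est), [1.0, -0.2],
                events=cross, rtol=1e-9, atol=1e-11)
period = np.diff(sol.t_events[0][-3:]).mean()
print(period, T_est)      # agreement up to the O(eps**(-1/3)) correction
```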
With a tour de force application of higher-order asymptotics, it has been shown that the period of the limit cycle equals

  (3 − 2 ln 2) ε⁻¹ + C ε^(−1/3) + O(|ln ε|)    (6.31)

where the constant C can be expressed in terms of a zero of the Airy function. The O(ε^(−1/3))-term is perhaps surprising, since it seems like the neglected durations of Phases 1 and 3 are only O(1) as ε → 0. The subtle point is that a substantial amount of time is required to make the transitions from Phase 2 to 3 and from Phase 4 to 1: i.e., the transitions at which the solution is pushed off the nullcline. The estimate (6.31) represents a good-news-bad-news kind of situation. The good news is that such precise estimates can be obtained through the careful application of asymptotics. The bad news is that lengthy calculations are needed to derive (6.31), and sloppy analysis might easily miss the O(ε^(−1/3))-term altogether.
6.4.3 Relaxation oscillations in the van der Pol equation
Figure 6.7 shows the graph of the periodic solution of (6.28) for μ = 10. As one would expect from the above piecewise smooth limit-cycle solution of (6.27), the figure shows long intervals of slow evolution of x separated by brief intervals in which x "relaxes" to a new metastable state. Such oscillations are described by the term relaxation oscillations. Undoing the scaling in (6.28), we see that the period of these oscillations is approximately (3 − 2 ln 2)μ; in particular, the period gets large as μ → ∞. Intuitively, the first-order term in (6.28) involves friction, and as friction gets large, all motion slows down.

Figure 6.7: Relaxation oscillations in the van der Pol equation (6.28) with μ = 10.
6.5 Stability of periodic orbits: the Poincaré map
6.5.1 The basic construction
The definition of stability for limit cycles is completely analogous to stability of equilibria. Thus, we say a limit cycle Γ of x′ = F(x) is Liapunov stable if for every neighborhood 𝒱 of Γ there is a smaller neighborhood 𝒱₁ ⊂ 𝒱 such that if initial data b are restricted to belong to 𝒱₁, then the IVP is solvable for all positive times and moreover x(t) ∈ 𝒱 for all t ≥ 0. Similarly, we say a limit cycle Γ is asymptotically stable if it is Liapunov stable and if there is one neighborhood 𝒱* of Γ such that for all initial data in 𝒱* the solution of the IVP satisfies

  lim_{t→∞} dist(x(t), Γ) = 0.

For completeness, let us record that the distance from a point x to a compact set 𝒦 is defined by

  dist(x, 𝒦) = min_{y∈𝒦} |y − x|.

For example, the limit cycles in Examples 3 and 4 of Section 6.1 and the limit cycle in the van der Pol equation, for any value of μ, are asymptotically stable. Intuitively this seems clear but, moving beyond intuition, in this section we introduce a rigorous technique for establishing the stability of a limit cycle: the Poincaré map. Similar to Theorem 5.1.1, which considers stability of an equilibrium point, the stability of a limit cycle is related to the eigenvalues of a certain matrix. Unfortunately, for limit cycles, this matrix is often difficult to calculate by hand.
It is customary to say "the" Poincaré map, but in fact there are many: the maps we are about to construct depend on arbitrary choices⁴, as follows:
• Choose a starting point on the trajectory, say its position γ(0) at time zero.
• Choose Σ, a small section of any smooth, (d − 1)-dimensional surface transverse to the periodic orbit at γ(0).
In symbols Σ may be written

  Σ = { b ∈ B(γ(0), η) : p(b) = 0 }    (6.32)

where B(γ(0), η) is the ball in R^d of radius η with center γ(0) and p : B(γ(0), η) → R is a C¹ function such that

  p(γ(0)) = 0  and  ⟨∇p(γ(0)), γ′(0)⟩ ≠ 0.    (6.33)
Now for initial conditions b ∈ Σ, consider the IVP

  x′ = F(x),  x(0) = b.    (6.34)

If b = γ(0), then the solution of (6.34) crosses Σ when t = T, the minimal period of γ(t), precisely at the point γ(0). The Poincaré map focuses on the question: starting from more general b ∈ Σ, when and especially where does the solution φ(t, b) of (6.34) next cross Σ? In mathematical terms the question "When does φ(t, b) cross Σ?" may be answered by solving

  p(φ(t, b)) = 0    (6.35)

for t, say t = τ(b); and the question "Where does φ(t, b) cross Σ?" is answered by the formula that defines the Poincaré map P,

  P(b) = φ(τ(b), b).    (6.36)
Theorem 6.5.1. There exists a neighborhood⁵ 𝒩 ⊂ R^d of γ(0) such that: (i) if b ∈ 𝒩 ∩ Σ, then (6.35) has a unique solution τ(b) ≈ T that is a C¹ function of b, and (ii) equation (6.36) defines a C¹ map P : 𝒩 ∩ Σ → Σ.
⁴The "the" is justified in the sense that all of these mappings may be transformed to one another by appropriate changes of coordinates.
⁵In many examples that we consider, in which γ is asymptotically stable, the neighborhood 𝒩 may seem like an unnecessary complication. See Exercise ?? for a case where γ is unstable and 𝒩 must be included.

Figure 6.8: If η is chosen too large, γ could cross Σ prematurely.

Proof. (i) Let us apply the Implicit Function Theorem to (6.35). Of course p is differentiable, and we know from Theorem 4.5.1 that φ is also differentiable. By
periodicity, φ(T, γ(0)) = γ(0), so if b = γ(0), then t = T solves (6.35). By the chain rule, the t-derivative of (6.35) at (T, γ(0)) equals

  ⟨∇p(γ(0)), ∂φ/∂t (T, γ(0))⟩;    (6.37)

by periodicity

  ∂φ/∂t (T, γ(0)) = ∂φ/∂t (0, γ(0)) = γ′(0);

and by (6.33) the derivative (6.37) is nonzero. This proves Claim (i).
(ii) The map P is C¹ because it is the composition of differentiable functions. □
Remark: If the radius η of the Poincaré section (6.32) is sufficiently small, then τ(b) represents the first time at which the trajectory φ(t, b) returns to Σ. Thus, the Poincaré map is sometimes called the first-return map. If η were too large, the periodic orbit γ could cross Σ prematurely, without completing a full cycle (see Figure 6.8). Even if η were too large, the requirement that τ(b) depend smoothly on b selects the "right" solution of (6.35), but we shall nevertheless suppose that η is appropriately small.
The Poincaré map allows us to recast questions regarding the stability of a periodic orbit γ. Suppose b₀ ∈ 𝒩 ∩ Σ is a point near γ(0), and follow the trajectory φ(t, b₀) forward in time. At time t = τ(b₀), the trajectory makes its first return to Σ, crossing the Poincaré section at the point b₁ = P(b₀). If b₁ happens to belong to 𝒩, then the trajectory crosses Σ a second time at b₂ = P(b₁). Continuing for as long as these iterates remain in 𝒩, we may recursively define a sequence of subsequent crossings, b_{n+1} = P(b_n). If b_n ∈ 𝒩 for all n and if the trajectory φ(t, b₀) converges to the periodic orbit, then b_n → γ(0). Conversely, in Exercise ?? we ask you to show that if b_n ∈ 𝒩 for all n and if b_n → γ(0), then the trajectory φ(t, b₀) converges to the periodic orbit.

In more picturesque language, the Poincaré map lets us examine trajectories under a strobe light.
If γ(t) is a periodic trajectory of a d-dimensional ODE, then the Poincaré map is defined on a portion of a (d − 1)-dimensional surface. Thus, given a choice of coordinates, the differential DP(γ(0)) is a (d − 1) × (d − 1) matrix. (Theorem 4.5.1 provides the means to calculate this matrix, but in practice this is usually messy and often intractable.) In the next subsection we shall relate the stability of γ to the eigenvalues of DP as follows:

Theorem 6.5.2. If every eigenvalue λ of DP(γ(0)) satisfies |λ| < 1, then γ is asymptotically stable. If any eigenvalue satisfies |λ| > 1, then γ is unstable.

Here is further useful information that is contained in the Poincaré map: if there are any periodic orbits of x′ = F(x) that pass close to γ(0), then these will show up as additional fixed points of P (besides the obvious fixed point γ(0)). [Give exercise.] This property will be important in studying the Hopf bifurcation in Chapter 7.
6.5.2 Discrete dynamical systems
By a discrete dynamical system we mean a mapping : | R
d
where | R
d
is
open. Time for such a system is a discrete quantity (n = 0, 1, 2, . . . ) which counts
the number of iterations of : i.e., we dene
0
(z) = z and

n+1
(z) = (
n
(z))
for as long as the iterates remain in |. Of course, if (|) |, the iteration
continues indenitely.
A xed point of a discrete dynamical system is a point z

| such that (z

) =
z

. This concept is analogous to an equilibrium of an ODE. Liapunov stability,


asymptotic stability, and instability of a xed point are dened with the obvious
modications of the denitions for an equilibrium. For example, consider the one-
dimensional discrete dynamical system : R R given by (z) = z
2
. has two
xed points, 0 and 1. The former is asymptotically stable, the latter is unstable.
Exercise
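A short computation illustrates the two fixed points (a sketch in Python; the starting values are arbitrary choices of ours):

```python
def phi(z):
    return z * z

# Iterate from points near each fixed point of phi(z) = z**2.
for z0 in (0.1, 0.9, 1.1):
    z = z0
    for _ in range(20):
        z = phi(z)
    print(z0, "->", z)
# 0.1 and 0.9 are driven to the stable fixed point 0;
# 1.1 blows up (inf), reflecting the instability of the fixed point 1.
```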
The following theorem provides a convenient test for asymptotic stability of a fixed point; the analogy with Theorem 5.1.1 for stability of equilibria of an ODE should be apparent.
Theorem 6.5.3. Let z* be a fixed point of the C¹ map Φ : 𝒰 → R^d. If every eigenvalue λ of the Jacobian DΦ(z*) satisfies |λ| < 1, then z* is asymptotically stable. If |λ| > 1 for some eigenvalue, then z* is unstable.

We leave the proof of this theorem, as well as its application to prove Theorem 6.5.2, as an exercise for the reader. Note that an eigenvalue λ = α + iβ of DΦ(z*) might be complex, in which case one recognizes that |λ| = √(α² + β²), but the theorem still holds.
[Mention one-dimensional maps that give chaos, here or in the Additional Notes.]
6.5.3 Application of the Poincaré-map criterion

Let's apply Poincaré maps to determine the stability of several of our limit-cycle examples. Since all these examples involve two-dimensional ODEs, the Poincaré maps are one-dimensional, and the only eigenvalue of the differential is the derivative P′ itself. In Exercise ?? we present one of the rare three-dimensional examples for which the supporting calculations are feasible.
Example 1: The academic equation (6.4).

Enjoying the artificial simplicity of this example, let us calculate the Poincaré map of the periodic solution r(t) ≡ 1, θ(t) = t. We work in polar coordinates, and we choose the section (6.32)

  Σ = { (r, θ) ∈ (1/2, 3/2) × S¹ : θ = 0 (mod 2π) },

which we represent more simply as the interval (1/2, 3/2). Given b ∈ (1/2, 3/2), we must solve (6.5) with initial conditions

  r(0) = b,  θ(0) = 0;    (6.38)

since θ(t) = t, solution of (6.35) gives τ(b) ≡ 2π; thus, P(b) = r(2π). Recalling the explicit solution of (6.5) in Exercise 3(e) from Chapter 1, we obtain

  P(b) = [1 + e^{−4π}(b^{−2} − 1)]^{−1/2}.    (6.39)

The graph of P(b) is essentially constant for b ∈ (1/2, 3/2), owing to the microscopic size of the constant e^{−4π}. Differentiation yields P′(1) = e^{−4π} < 1, so by Theorem 6.5.2 the limit cycle is asymptotically stable.
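Formula (6.39) can be checked by direct integration. In the sketch below (ours), the radial equation is taken to be ṙ = r(1 − r²), an assumption about (6.5) that is consistent with the explicit solution quoted above:

```python
import numpy as np
from scipy.integrate import solve_ivp

def radial(t, r):
    # Assumed radial equation of (6.5); theta decouples, theta(t) = t.
    return r * (1.0 - r**2)

def P_numeric(b):
    # First return to theta = 0 (mod 2*pi) occurs at t = 2*pi.
    sol = solve_ivp(radial, (0.0, 2*np.pi), [b], rtol=1e-12, atol=1e-14)
    return sol.y[0, -1]

def P_formula(b):
    return (1.0 + np.exp(-4*np.pi) * (b**-2 - 1.0))**-0.5

for b in (0.6, 1.0, 1.4):
    print(b, P_numeric(b), P_formula(b))
# P is nearly constant at 1; P'(1) = exp(-4*pi) ~ 3.5e-6.
```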
For pedagogical reasons, let us also calculate P′(1) by making use of Theorem 4.5.1, ignoring this explicit formula. Let φ(t, b) be the solution of (6.5) with initial conditions (6.38), which we write in components φ(t, b) = (φ_r(t, b), φ_θ(t, b)). Then

  P(b) = φ_r(τ(b), b)  where τ(b) is defined by  φ_θ(τ(b), b) = 2π.    (6.40)

By the chain rule

  P′(1) = (∂φ_r/∂t)(τ(1), 1) τ′(1) + (∂φ_r/∂b)(τ(1), 1),    (6.41)

and τ′(1) may be calculated by implicit differentiation of its defining equation (6.40). Now by Theorem 4.5.1, ∂φ/∂b(t, 1) satisfies a linear ODE with coefficients obtained from the differential of (6.5) along the periodic solution: i.e.,

  d/dt (∂φ_r/∂b) = −2 (∂φ_r/∂b),   d/dt (∂φ_θ/∂b) = 0.

The theorem also provides initial conditions at t = 0:

  (∂φ_r/∂b)(0, 1) = 1,   (∂φ_θ/∂b)(0, 1) = 0.

The solution of this IVP is

  (a) (∂φ_r/∂b)(t, 1) = e^{−2t},   (b) (∂φ_θ/∂b)(t, 1) ≡ 0.    (6.42)

Differentiating (6.40) and invoking (6.42b), we find that τ′(1) = 0. Substituting (6.42a) into (6.41) and recognizing from the original periodic solution that τ(1) = 2π, we calculate P′(1) = e^{−4π}, which of course agrees with our previous result.
Example 2: The torqued pendulum (6.6)

In Section 6.1 we constructed a periodic solution of (6.6) from a fixed point of a mapping P : (m, M) → (m, M). Adopting the framework of Poincaré maps, let us define a section

  Σ = { (x, y) ∈ S¹ × R : x = 0 (mod 2π) and m < y < M }.

In analyzing this equation, we showed there that if a trajectory of (6.6) started at a point (0, b) ∈ Σ, then P(b) equals the y-coordinate of the point where the trajectory next crosses Σ. Thus, if (m, M) and Σ are identified, then P is just the Poincaré map of the periodic solution we constructed. Since we calculated that P′(b) < 1, we may conclude that the limit cycle is asymptotically stable.
Example 3: The van der Pol equation for small ε

We will calculate an approximate Poincaré map with perturbation theory. Superficially the calculation resembles our calculation of the periodic solution in Section 6.3, but actually it is quite different, as we note below. Because the calculations are more convenient with a second-order scalar equation than with a first-order system, we use coordinates (x, x′) on the plane; thus, sometimes prime means differentiation, but it may also simply be a distinguishing mark for a second coordinate. Define a section Σ contained in the positive x-axis, say

  Σ = { (x, x′) ∈ R² : 1 < x < 3, x′ = 0 }.
Regarding the Poincaré map, given initial data (b, 0) ∈ Σ, consider the IVP

  x′′ + ε(x² − 1)x′ + x = 0,  x(0) = b,  x′(0) = 0,

whose solution we will denote x(t, ε). First difference from Section 6.3: here b is arbitrary, not the special value that leads to a periodic solution. The Poincaré map is given by

  P(b) = x(τ(b), ε)    (6.43)

where τ(b) satisfies the equation

  (∂x/∂t)(τ(b), ε) = 0.    (6.44)

We look for an expansion of x(t, ε) with the usual form

  x(t, ε) = x₀(t) + ε x₁(t) + ⋯ .
Second difference from Section 6.3: we do not rescale time here because we want only to follow the solution for one loop around the origin; there is no issue of errors accumulating over long times. Substituting the series into the equation and repeating some calculations from Section 6.3, we find ODEs

  O(ε⁰):  d²x₀/dt² + x₀ = 0

  O(ε¹):  d²x₁/dt² + x₁ = (b³/4 − b) sin t + (b³/4) sin 3t

subject to initial conditions

  x₀(0) = b,  (dx₀/dt)(0) = 0;  x₁(0) = (dx₁/dt)(0) = 0.

Third difference from Section 6.3: the solution of the IVP for x₁ will have secular terms, and in fact they are crucial: they are what make the solution converge to the circle r = 2 as t → ∞. The solutions of these IVPs are
  x₀(t) = b cos t

  x₁(t) = (b³/4 − b)( −(t/2) cos t + (1/2) sin t ) − (b³/32)( sin 3t − 3 sin t ).

Substitution into (6.44) gives the equation

  −b sin τ(b) + O(ε) = 0,

from which we conclude that τ(b) = 2π + O(ε). Then, substitution into (6.43) yields⁶

  P(b) = x₀(2π) + ε x₁(2π) + O(ε²) = b − πε (b³/4 − b) + O(ε²).

(At first blush one might expect an O(ε)-error in τ to contribute an O(ε)-error in x₀(τ), but because x₀′(2π) vanishes, this pushes the error to higher order.) Differentiating and substituting b = 2, we find

  P′(2) = 1 − 2πε + O(ε²).

Hence P′(2) < 1 provided ε is sufficiently small, so the limit cycle is asymptotically stable.
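The first-order formula for P is easy to check numerically. The sketch below (ours; the event logic for detecting the return to the section is an implementation choice) compares a numerically computed P(b) with b − πε(b³/4 − b):

```python
import numpy as np
from scipy.integrate import solve_ivp

eps = 0.01

def vdp(t, u):
    x, v = u
    return [v, -eps * (x**2 - 1.0) * v - x]

def sec(t, u):          # the section {x' = 0}; tau(b) is the return time
    return u[1]
sec.direction = -1.0    # next maximum of x, reached near t = 2*pi

def P_numeric(b):
    sol = solve_ivp(vdp, (0.0, 8.0), [b, 0.0], events=sec,
                    rtol=1e-11, atol=1e-13)
    ts, ys = sol.t_events[0], sol.y_events[0]
    return ys[ts > 1.0][0, 0]    # skip the spurious event at t = 0

for b in (1.5, 2.0, 2.5):
    print(b, P_numeric(b), b - np.pi * eps * (b**3 / 4.0 - b))
# Near b = 2 the two agree to O(eps**2), and P(2) is approximately 2.
```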
Example 4: The van der Pol equation for large μ

This example is similar to the torqued pendulum, considered above. Specifically, in Section 6.4 we defined a Poincaré map P before we knew there was a periodic solution, we obtained a periodic solution as a fixed point of P, and a posteriori we can see that P is the Poincaré map of the periodic solution so constructed. Moreover, P′ ≪ 1, so the solution is asymptotically stable.

[Remark: No calculation of P′ seems possible for μ between the two extremes.]
6.6 Exercises

• Rmk: If I < 0 (and if ε is small), then FH/N is an excitable system. Explain. Also true if I > 2/3. Actually, you need to make an exercise to introduce the FH/N system.
• Exercises verifying the claimed limit sets in Section 6.2.3.
• Also give an example of a homoclinic cycle.
• Generalize Section 6.4 to a larger class of FH/N equations.
⁶Neglecting the O(ε²)-term in this equation, we see that P(b) = b if and only if b = 2. This reproduces the Poincaré-Lindstedt result of Section 6.3 that the periodic solution has "radius" 2, perhaps with fewer technicalities. Indeed, one may ask why bother with the Poincaré-Lindstedt method at all? You may not find our answer terribly compelling: (i) it is traditional, and you will see it in books on asymptotics, and (ii) it is somewhat easier to determine the orbit to higher order with this method.
Example 6: The equation x′′ + x′ + x = C cos ωt models a periodically forced, damped oscillator. As we showed in Exercise 6, as t → ∞ every solution of this equation tends to

  x_partic(t) = A cos ωt + B sin ωt,    (6.45)

where the coefficients A, B were calculated in the exercise.

With the usual constructions, the above second-order, non-autonomous ODE can be written as an autonomous first-order system with three variables:

  x′ = y
  y′ = −x − y + C cos z    (6.46)
  z′ = ω.

Because the RHS of this system is periodic in z, we may regard it as an ODE on R² × S¹, a generalized cylinder. By comparison with (6.45), we deduce that the curve

  { (x, y, z) ∈ R² × S¹ : x = A cos z + B sin z, y = ω(−A sin z + B cos z) }

is a solution orbit of (6.46) and moreover every solution tends towards this orbit as t → ∞. [Make solution more explicit, or make an exercise to fill in details.]
Example 3: (Serendipity) In Exercise 5 in Chapter 1, we saw that every solution of the (scaled) Lotka-Volterra equations

  x′ = x − xy,  y′ = ρ(xy − y)    (6.47)

in the open first quadrant of the x, y-plane was contained in a level set of the function L(x, y) = ρ(x − ln x) + y − ln y. It follows from the corollary that, except for the equilibrium solution (x(t), y(t)) ≡ (1, 1), these solutions are all periodic.
Make Exercise:
Estimate the period of the relaxation oscillator. Parametrize the equation with x_eq = α, recalling that I(α) = α³ − α² + α/γ. Along the x-nullcline y = x²(1 − x) + I(α) we want to know how long it takes to move from (1, I) to (2/3, I + 4/27). The motion is driven by the y-equation, but the calculations are more tractable in terms of x. Along the branch we have

  (2x − 3x²) dx/dt = dy/dt = ε[ x/γ − (x² − x³ + I(α)) ].

Rework this to show that the cubic on the RHS equals

  RHS = ε (x − α) q(x)

where q(x) = x² − (1 − α)x + (α² − α + 1/γ). Write the equation as

  [ (3x² − 2x + 1/γ)/(x³ − x² + x/γ − I(α)) − (1/γ) · 1/((x − α)q(x)) ] dx = −ε dt.

The first term integrates to ln(x³ − x² + x/γ − I(α)); the second term may be expanded into partial fractions

  A/(x − α) + [B(x − β) + C]/[(x − β)² + δ²],

which integrates to

  A ln(x − α) + (B/2) ln[(x − β)² + δ²] + (C/δ) arctan((x − β)/δ).

If we combine the various terms and write the integrated equation as Ψ(x) = C − εt, then the time to move along the branch is, to lowest order,

  T = ε⁻¹ [Ψ(1) − Ψ(2/3)],

and the total period is twice that. Remark: the first correction is O(ε^(−1/3)), not O(1) as you might expect; the problem is the time required to get off the nullcline. [Ref?]

Computer exercise: Solve the equation numerically and plot ε times the period for a sequence of ε's tending to zero. (Q: Break up the period into four segments, two slow segments and two fast segments.) [End of exercise about the period of the relaxation oscillator.]
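A numerical sketch of the computer exercise (Python; the FH/N form x′ = −y + x²(1 − x) + I, y′ = ε(x/γ − y) is our assumption, chosen to be consistent with the nullcline y = x²(1 − x) + I(α) used above, and the values of γ and α are arbitrary choices):

```python
import numpy as np
from scipy.integrate import solve_ivp

gamma = 2.0
alpha = 1.0 / 3.0                         # equilibrium on the middle branch
I = alpha**3 - alpha**2 + alpha / gamma   # I(alpha), as defined above

def period(eps):
    def fhn(t, u):
        x, y = u
        return [-y + x**2 * (1.0 - x) + I, eps * (x / gamma - y)]
    def sec(t, u):        # upward crossing of x = alpha: once per cycle
        return u[0] - alpha
    sec.direction = 1.0
    sol = solve_ivp(fhn, (0.0, 100.0 / eps), [1.0, 0.0],
                    events=sec, rtol=1e-9, atol=1e-11)
    return np.diff(sol.t_events[0][-4:]).mean()

for eps in (0.05, 0.02, 0.01):
    print(eps, eps * period(eps))   # eps*period should settle to a constant
```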
Make exercise out of this:
Let us apply the theorem to classify the stability of fixed points of a particular system, say

  x_{n+1} = x_n + a x_n(1 − y_n²)
  y_{n+1} = y_n + b y_n(x_n² − 1),

where a and b are real parameters. Independent of a and b, there are five fixed points: (0, 0), (1, 1), (1, −1), (−1, 1) and (−1, −1). The Jacobian matrix of

  F(x, y) = ( x + a x(1 − y²), y + b y(x² − 1) )

is given by

  DF(x, y) = [ (1 + a) − a y²   −2axy ; 2bxy   b x² + (1 − b) ].

In particular,

  DF(0, 0) = [ 1 + a   0 ; 0   1 − b ]  and  DF(1, 1) = [ 1   −2a ; 2b   1 ].

If, for example, a = −1 and b = 1, then 0 is a repeated eigenvalue of DF(0, 0), and therefore (0, 0) is an asymptotically stable fixed point. The eigenvalues of DF(1, 1) would be 3 and −1, and the fact that the eigenvalue 3 has modulus larger than 1 implies that (1, 1) is unstable for this choice of a and b. Testing the stability of the other three fixed points is handled in a similar fashion.

Of course, the stability of the fixed points in the above example depends upon the choices of the parameters a and b and, for each fixed point, it is possible to characterize the ranges of a and b for which asymptotic stability occurs. Because DF(0, 0) is diagonal, the eigenvalues are the diagonal entries, and Theorem 6.5.3 implies that (0, 0) is asymptotically stable if a ∈ (−2, 0) and b ∈ (0, 2). The eigenvalues λ₁, λ₂ of the [non-triangular] matrix DF(1, 1) are less apparent, but the following counterpart to Proposition 2.4.4 allows us to test whether |λ₁| < 1 and |λ₂| < 1 without actually computing the eigenvalues. [End of exercise.]
[Q: Another exercise?]

Proposition 6.6.1. If A is a 2 × 2 matrix, then its eigenvalues have modulus less than 1 if and only if
(i) tr A + det A > −1,
(ii) tr A − det A < 1,
(iii) det A < 1.

Proof. See Exercises.
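Before proving the proposition, one can gain confidence in it numerically. The sketch below (ours, not from the text) compares the criterion with a direct eigenvalue computation on random 2 × 2 matrices:

```python
import numpy as np

rng = np.random.default_rng(0)

def criterion(A):
    # Conditions (i)-(iii) of Proposition 6.6.1
    tr, det = np.trace(A), np.linalg.det(A)
    return (tr + det > -1) and (tr - det < 1) and (det < 1)

def eig_test(A):
    return bool(np.all(np.abs(np.linalg.eigvals(A)) < 1))

disagreements = 0
for _ in range(100000):
    A = rng.uniform(-3, 3, size=(2, 2))
    if criterion(A) != eig_test(A):
        disagreements += 1
print(disagreements)   # expect 0
```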
• Show that a scalar, autonomous ODE x′ = f(x) cannot have periodic solutions. To see why, suppose indirectly that there exists a periodic solution of x′ = f(x), and let p denote its period. Then multiply both sides of the ODE by dx/dt, integrate over an interval of length p, and try to spot a contradiction.
• In Lotka-Volterra, all orbits are periodic.
• Apply Dulac's theorem to a Lotka-Volterra system in which predators can saturate:

  x′ = growth − xy/(1 + Sx),  y′ = ρ[ xy/(1 + Sx) − y ].

• Prove the first statement in Theorem 6.5.1.
• Prove that if A is a d × d matrix with eigenvalues λ₁, λ₂, ..., λ_d, then lim_{n→∞} Aⁿ = 0 if and only if |λᵢ| < 1 for each i = 1, 2, ..., d.
• Prove Theorem 6.5.3.
• Prove Proposition 6.6.1.
• Traveling wave of KdV (or maybe in an earlier chapter?)
6.7 Appendix: Index theory in two dimensions

The notion of the index of a closed curve Γ relative to a C¹ vector field F : 𝒰 → R² will help us analyze global behavior of planar systems. In this section, f and g will denote the components of F, and we shall analyze systems of the form

  x′ = f(x, y)  and  y′ = g(x, y).    (6.48)

Much of the theory in this section relies upon a famous result regarding Jordan curves in the plane:

Theorem 6.7.1. (Jordan Curve Theorem): Every Jordan curve Γ in the plane separates R² into two disjoint, open, connected sets, both of which have Γ as their boundary. One region is bounded and simply connected, while the other is neither bounded nor simply connected.

The Jordan Curve Theorem seems rather intuitive in that we might expect any simple, closed curve Γ to divide the plane into regions interior and exterior to Γ. However, the proof [ref] of Theorem 6.7.1 is surprisingly complicated, as Jordan curves can be very elaborate (e.g., labyrinths). [Ref, maybe to Olmsted's Counterexamples in Analysis or Bill Ross's survey paper?]

Let F be as described above, and suppose that Γ is a Jordan curve whose graph contains no zeros of F. The index of Γ relative to F is an integer I_F(Γ) that measures the winding of the vector field F as Γ is traversed exactly once in the counterclockwise direction. More explicitly (see Figure 6.9), the angle⁷

  θ = arctan( g(x, y)/f(x, y) )    (6.49)

formed by the vector F(x, y) and the positive x-axis varies continuously as Γ is traversed. If Δθ denotes the net change in θ over one counterclockwise cycle of Γ,

⁷Here, θ is not necessarily confined to [0, 2π). For example, if the vector F(x, y) spins clockwise four times during one counterclockwise trip along Γ, then θ has decreased by 8π and the index of Γ is −4.
Figure 6.9: A Jordan curve Γ in a smooth vector field F(x, y). The angle θ formed by F(x, y) relative to the positive x-axis varies continuously as Γ is traversed. Following the vectors 1 through 12 in increasing order, these vectors complete two clockwise cycles during one counterclockwise cycle along Γ. Hence, Δθ = −4π and the curve Γ has index −2.
then I_F(Γ) is defined as Δθ/(2π). Figure 6.9 illustrates this concept by showing the vectors F(x, y) (normalized for convenience) at 12 different chronologically labeled points during a single counterclockwise trip around Γ. Relocating the numbered vectors so that they are all anchored at the origin (right panel of the figure) makes it easier to observe that the vectors complete two clockwise cycles as we follow them in increasing order. In this case, Δθ = −4π, and I_F(Γ) = Δθ/(2π) = −2.
The index of Γ can also be computed analytically as follows:

Lemma 6.7.2. Let Γ be a C¹ Jordan curve contained in an open set 𝒰, and let F : 𝒰 → R² be a C¹ vector field. Then the index of Γ relative to the vector field F is given by

  I_F(Γ) = Δθ/(2π) = (1/2π) ∮_Γ (f dg − g df)/(f² + g²).    (6.50)

Proof. Using (6.49),

  Δθ = ∮_Γ dθ = ∮_Γ d arctan( g(x, y)/f(x, y) ) = ∮_Γ (f dg − g df)/(f² + g²).

Remark: This Lemma is easily extended to the case in which Γ is piecewise C¹.
It is illuminating to test out Lemma 6.7.2 on vector fields F associated with the linear systems covered in Chapter 2. Consider the vector field defined by

  F(x, y) = (x, y),

for which all vectors point radially outward from the origin, an unstable equilibrium of the system x′ = F(x). Let Γ be the circle parametrized by Γ(t) = (cos 2πt, sin 2πt), a Jordan curve which is traversed once in the counterclockwise direction as t increases from 0 to 1. By the lemma, the index of Γ relative to this vector field is

  (1/2π) ∮_Γ (x dy − y dx)/(x² + y²) = (1/2π) ∫₀¹ [2π cos²(2πt) + 2π sin²(2πt)]/[cos²(2πt) + sin²(2πt)] dt = 1.
Reversing the direction of all vectors in a field F does not affect the index of Γ. If Γ denotes the same Jordan curve as in the above example, then:
• If F(x, y) = (−x, −y), then I_F(Γ) = 1. The origin is a global attractor for the system (6.48).
• If F(x, y) = (−y, x), then I_F(Γ) = 1 once again. The origin is stable but not attracting, and Γ happens to correspond to a periodic orbit.
• If F(x, y) = (x, −y), then I_F(Γ) = −1. The origin is a saddle for the system (6.48).
• If

  F(x, y) = [ α  −β ; β  α ] (x, y)ᵀ = ( αx − βy, βx + αy ),

with β ≠ 0, then the origin is either a stable focus (if α < 0), a center (if α = 0), or an unstable focus (if α > 0) for the system (6.48). In any case, I_F(Γ) = 1.
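Each of the indices just listed can also be checked numerically from the definition. The sketch below (our own) accumulates the change of the angle θ = arctan(g/f) around a circle, using atan2 with unwrapping to keep a continuous branch:

```python
import numpy as np

def index(F, radius=1.0, n=4000):
    """Winding number of the field F around a circle centered at the origin."""
    t = np.linspace(0.0, 2*np.pi, n)
    x, y = radius*np.cos(t), radius*np.sin(t)
    f, g = F(x, y)
    theta = np.unwrap(np.arctan2(g, f))   # continuous branch of the angle
    return (theta[-1] - theta[0]) / (2*np.pi)

print(index(lambda x, y: ( x,  y)))   #  1.0  (source)
print(index(lambda x, y: (-x, -y)))   #  1.0  (sink)
print(index(lambda x, y: (-y,  x)))   #  1.0  (center)
print(index(lambda x, y: ( x, -y)))   # -1.0  (saddle)
```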
The values of I_F(Γ) computed in the above example would have been the same regardless of the radius of the circle Γ. In fact, the index would remain unchanged even if Γ were continuously deformed into any other Jordan curve enclosing the origin. These statements are made more precise by the following series of propositions which, in turn, offer considerable information regarding global behavior of (6.48).

Proposition 6.7.3. Let F be a C¹ vector field in the plane and let Γ be a piecewise smooth Jordan curve. If there are no zeros of F on Γ or in its interior, then I_F(Γ) = 0.
Proof. By Lemma 6.7.2 (see also the Remark that follows that Lemma),

  I_F(Γ) = (1/2π) ∮_Γ (f dg − g df)/(f² + g²) = (1/2π) ∮_Γ [(f g_x − g f_x)/(f² + g²)] dx + [(f g_y − g f_y)/(f² + g²)] dy.

By our hypotheses, f² + g² ≠ 0 on Γ or inside Γ, so we may apply Green's Theorem to obtain

  I_F(Γ) = (1/2π) ∬_{interior of Γ} { ∂/∂x [(f g_y − g f_y)/(f² + g²)] − ∂/∂y [(f g_x − g f_x)/(f² + g²)] } dA.    (6.51)

The integrand actually reduces to 0 (take our word for it) after the tedious process of computing the partial derivatives.
Proposition 6.7.4. Suppose F is a C¹ vector field in the plane, and that Γ₁ and Γ₂ are Jordan curves. If Γ₁ can be continuously deformed to Γ₂ without passing through any zeros of F, then I_F(Γ₁) = I_F(Γ₂).

Proof. Referring to Lemma 6.7.2, the index varies continuously as Γ₁ is continuously deformed to Γ₂. Since the index is integer-valued, the only way the index could vary continuously is if its value remains constant, implying that Γ₁ and Γ₂ have the same index.
If x* is an isolated equilibrium of x′ = F(x), the index I_F(x*) is defined as I_F(Γ), where Γ is any Jordan curve such that (i) x* is interior to Γ and (ii) there are no other equilibria on or inside Γ. Recall our above example with the vector field F(x, y) = (x, y), whose only zero is at the origin. If Γ is any circle centered at the origin, then I_F(Γ) = 1, and therefore we may write I_F(0) = 1.

If x* is non-degenerate in the sense that DF(x*) has no zero eigenvalues, then the index of x* is not affected by linearizing the system. Assuming without loss of generality that x* = 0, let us write F(x) = Ax + r(x), where A = DF(0) and r(x) is small (following the proof of Theorem 5.1.1). Let G(x) = Ax be the vector field defined by the linearization of F.

Proposition 6.7.5. Under the above hypotheses, I_F(0) = I_G(0).

Proof. [Exercise??? Easier to state for hyperbolic equilibria and avoid the possibility of a pair of complex, pure imaginary eigenvalues....]
Consequence: The index of an isolated, hyperbolic equilibrium x* is as follows:
• Stable nodes and foci: If both eigenvalues of DF(x*) have negative real part, then I_F(x*) = 1.
• Unstable nodes and foci: If both eigenvalues of DF(x*) have positive real part, then I_F(x*) = 1.
• Saddles: If both eigenvalues of DF(x*) are real and of opposite sign, then I_F(x*) = −1.
Examples corresponding to these three cases would be the vector fields F(x, y) = (−x, −y), F(x, y) = (x, y), and F(x, y) = (x, −y), respectively, for which the origin is the lone equilibrium.
Proposition 6.7.6. If Γ happens to be the orbit of a periodic solution of x′ = F(x), then I_F(Γ) = 1.

Proof. Here, we opt for a heuristic argument as opposed to a technical proof. At any point x on Γ, the vector F(x) is tangent to the graph of Γ. Therefore, during one counterclockwise trip around Γ, the vector F(x) must spin once counterclockwise, so that Δθ = 2π and I_F(Γ) = 1.
Proposition 6.7.7. Suppose x₁, x₂, ..., x_n are isolated equilibria associated with a C¹ vector field F in the plane. If Γ is a Jordan curve containing these equilibria in its interior, then

  I_F(Γ) = Σ_{i=1}^{n} I_F(x_i).

Proof. We sketch the proof for the special case of n = 2 equilibria, from which the general case follows immediately. Because the two equilibria are isolated and contained in the interior of Γ, it is possible to construct two disjoint circles centered at the equilibria and contained inside Γ (see Figure 6.10). Cut the Jordan curve Γ along the dashed lines and circles, resulting in two piecewise smooth Jordan curves as illustrated in the figure. Let J_upper = Γ_u ∪ A_u ∪ ⋯ ∪ E_u denote the Jordan curve in the upper half of the figure, and let J_lower (defined analogously) denote the Jordan curve in the lower half of the figure. By Proposition 6.7.3, both J_upper and J_lower have index zero because they enclose no equilibria. The indices of J_upper and J_lower are also equal to the sum of the changes in the angle θ over each of the smooth arcs whose unions form those curves:

  Δθ(J_upper) = Δθ(Γ_u) + Δθ(A_u) + Δθ(B_u) + Δθ(C_u) + Δθ(D_u) + Δθ(E_u) = 0
  Δθ(J_lower) = Δθ(Γ_l) + Δθ(A_l) + Δθ(B_l) + Δθ(C_l) + Δθ(D_l) + Δθ(E_l) = 0.

Now convince yourself of each of the following:
• Δθ(Γ) = Δθ(Γ_u) + Δθ(Γ_l);
• Δθ(A_u) = −Δθ(A_l), and similarly for the pairs C_u, C_l and E_u, E_l.
Combining the preceding facts, the change in θ during one trip around Γ must equal the negative of the change in θ around the circular arcs formed by B_u, B_l, D_u, and D_l. The circles formed by B_u, B_l and by D_u, D_l are oriented clockwise and, as we complete one clockwise trip around each circle, Δθ/(2π) measures the negative of the index of the equilibrium enclosed by the circle.
Figure 6.10: Illustrating the proof of Proposition 6.7.7. Cut the Jordan curve Γ along the dashed lines to get the (U)pper and (L)ower pieces.
Piecing everything together, the index of Γ must be the sum of the indices of the two equilibria. □
The previous two propositions combine to form a rather useful result.

Theorem 6.7.8. If γ(t) is a periodic solution of the planar system (6.48), then the interior of its orbit must contain equilibria whose indices sum to 1.

Theorem 6.7.8 has a host of consequences. A periodic orbit Γ must enclose at least one equilibrium and, if the interior of Γ contains exactly one equilibrium, it must be a stable node or an unstable node (which have index 1) as opposed to a saddle (index −1). A periodic orbit cannot enclose an even number of hyperbolic equilibria.
Index theory can sometimes be used to prove non-existence of periodic orbits. Consider the system

  x′ = αx − γxy,  y′ = βy − γxy,    (6.52)

where α, β, and γ are positive parameters. The system (6.52) can be interpreted as a crude model for populations of two species in competition for the same food source. If species x is absent, then y grows exponentially with growth constant β, and if y = 0 then x grows exponentially with growth constant α. Both species are penalized equally by the term γxy, which is proportional to the product of the two populations. There are four nullclines in the phase plane: x = 0, x = β/γ, y = 0, and y = α/γ, and two equilibria, the origin and (x*, y*) = (β/γ, α/γ). The Jacobian matrices associated with these equilibria are

  DF(0, 0) = [ α  0 ; 0  β ],   DF(x*, y*) = [ 0  −β ; −α  0 ].

The origin is an unstable node since both eigenvalues of DF(0, 0) are real and positive, and (x*, y*) is a saddle because det DF(x*, y*) = −αβ < 0. It follows that I_F(0, 0) = 1 and I_F(x*, y*) = −1. If a periodic orbit (call it Γ) exists, Theorem 6.7.8 implies that Γ must enclose equilibria whose indices sum to 1, and the only way this is possible is if Γ encloses the origin but not (x*, y*). Certainly, such Γ would fail to be biologically relevant since it would include points with negative x and y coordinates. In fact, we claim that there can be no periodic orbits even if we allow the possibility that x < 0 or y < 0. Any trajectory enclosing the origin would cross both coordinate axes, violating the existence and uniqueness theorem because the axes themselves form solution trajectories. It follows that (6.52) cannot have periodic solutions.
Chapter 7
Bifurcation from equilibria
In Chapter 5 we studied the behavior of solutions of an ODE near a hyperbolic equilibrium point. In this chapter we address behavior near a nonhyperbolic point. Both for theoretical reasons and for applications, it is natural to consider this problem in the context of a one-parameter family of ODEs, say

  x′ = F(x, μ)    (7.1)

where F : 𝒰 × ℐ → R^d is a vector-valued function on an open subset 𝒰 × ℐ of R^d × R. For example, suppose that for μ near some fixed value μ* in the interval ℐ, (7.1) has a smoothly varying equilibrium x_eq(μ) that is nonhyperbolic for μ = μ* but hyperbolic on either side of μ*. Bifurcation theory seeks to characterize the behavior of solutions of (7.1) for μ near μ*. Unlike near a hyperbolic point, in the present context the behavior of solutions depends crucially on nonlinear terms in the expansion of F at the equilibrium point.

In this chapter we study bifurcation phenomena, focusing especially on specific examples taken from applications. After presenting the most familiar type of bifurcation in Section 7.1, we pause to summarize the remainder of the chapter.
7.1 Example 1: Pitchfork bifurcation
(a) The rotating pendulum

In Exercise 8(d) of Chapter 4 we encountered the system

  x′ = y    (7.2)
  ma y′ = −βy − mg sin x + ma ω² sin x cos x,

which describes the motion of a pendulum of length a that is rotating about a vertical axis with constant angular speed ω, as illustrated in Figure 7.1. We fit (7.2) into the context of (7.1) by identifying ω as the bifurcation parameter μ. For any value of ω, (7.2) has the obvious equilibrium x = y = 0, in which the bead is located at the bottom of the hoop. (We ignore the precarious equilibrium at x = π.) To investigate stability we compute the 2 × 2 Jacobian matrix¹ of (7.2) at (0, 0):

  DF(0, 0, ω) = [ 0   1 ; (−g + ω²a)/a   −β/(ma) ].

The determinant of this matrix, (g − ω²a)/a, vanishes if ω = √(g/a), in which case the equilibrium is nonhyperbolic.

In fact, this example possesses additional structure that is typical for bifurcation problems. Specifically, as the reader may easily verify [Exercise], if ω < √(g/a), then the equilibrium (0, 0) is asymptotically stable, while if ω > √(g/a), it is unstable (more precisely, a saddle point). More colloquially, we say that the equilibrium loses its stability when ω crosses √(g/a).

¹In the context of a general equation (7.1), the notation DF denotes the matrix of derivatives of F with respect to the state variables x₁, ..., x_d only. We write out derivatives with respect to parameters explicitly, such as ∂F/∂ω in the case of (7.2).
The central message of bifurcation theory is this: When an equilibrium loses stability as a parameter is varied, expect new solutions of some type to appear. Acting on this message, we look for steady-state solutions of (7.2). The first equation implies that y = 0, and the second then yields the condition

  (−g + ω²a cos x) sin x = 0.    (7.3)

The sine factor vanishes if x = 0 or x = π: i.e., this factor gives the two equilibria noted above. The other factor vanishes if

  cos x = (g/a)/ω².    (7.4)

This equation has no real solutions if ω < √(g/a), but two real solutions appear as soon as ω crosses the critical value √(g/a). (Can you hear the spirit of bifurcation theory whispering smugly, "I told you so"?)

Figure 7.2, known as a bifurcation diagram, shows a graph of these various equilibrium solutions in the x, ω-plane. Intervals of ω where the equilibria are stable are indicated by a solid curve; unstable, by a dotted curve. (In the Exercises we ask you to show that the new equilibria given by (7.4) are stable, as is indicated in the figure.)
Bifurcation diagrams are usually interpreted in the context of what is called quasistatic variation of parameters. Imagine that, starting from the equilibrium x = 0 with ω < √(g/a), we increase ω by a small increment and wait until the system returns to equilibrium; then increase ω by another small increment and again wait for re-equilibration; etc. Nothing will happen as long as ω stays smaller than √(g/a): the system will remain at its stable equilibrium at x = 0. However, when ω crosses √(g/a), we expect the system to move away from this equilibrium. Strictly speaking, x = 0 is still an equilibrium when ω > √(g/a), but since it is now unstable, if the system is subjected to the slightest bit of noise, the solution will evolve away from x = 0. It is natural to conjecture that, for ω > √(g/a), the solution will tend to one of the equilibria (7.4). In fact, in Exercise ?? of Chapter 5 we already asked you to show this. The solution may evolve to either equilibrium, x = ±arccos(g/(aω²)), when ω first crosses √(g/a); which case occurs depends on accidents in the initial conditions and the noise. However, once one of the two branches has been selected, the system will follow that branch under further quasistatic increases of ω.

Various remarks: (i) The origin of the term bifurcation may be seen in Figure 7.2: as ω is increased, the unique stable solution x = 0 is replaced by the two stable solutions x = ±arccos(g/(aω²)). (The now-unstable equilibrium at x = 0 for ω > √(g/a) is not included in the counting; thus one does not speak of "trifurcation".) (ii) The particular bifurcation diagram in Figure 7.2 is known as a pitchfork, for obvious reasons. Pitchfork bifurcations are common in systems that exhibit reflectional symmetry. (See Section 7.5.3 for elaboration of this statement.) Note that solutions of the reduced equation (7.3) are unchanged by the reflection x ↦ −x. In the original ODE, symmetry is expressed as the following property: If (x(t), y(t)) is a solution of (7.2), then so is (−x(t), −y(t)). (iii) It follows from general principles, which will be developed below, that the bifurcating solutions in Figure 7.2 are stable; one need not do a specific calculation to derive this fact, although for pedagogical reasons we ask you to perform this exercise.
(b) The Lorenz equations

As a second example of a pitchfork bifurcation, recall from Exercise 8(e) the Lorenz equations

  x′ = σ(y − x)
  y′ = ρx − y − xz    (7.5)
  z′ = −βz + xy,

where σ, β, and ρ are positive parameters. We reverse the order of presentation from the previous example: here we first look for equilibrium solutions of (7.5), and then we make the connection with a loss of stability. The first equation implies that x = y at equilibrium, the third equation then implies that z = y²/β, and substitution into the second yields the equation

  y(ρ − 1 − y²/β) = 0.    (7.6)
Figure 7.1: Schematic diagram of the bead on a rotating wire hoop.

Figure 7.2: Bifurcation diagram for the system (7.2). The equilibrium x = 0 loses stability when the hoop's rotation speed reaches √(g/a); the bifurcating branches are x = ±arccos(g/(aω²)).

Figure 7.3: Convective rolling of fluid between two parallel plates of different temperature. The plates extend to infinity; the figure shows only four representative cells in an infinite array.
This equation can be satisfied by virtue of either factor vanishing, which yields a pitchfork bifurcation diagram as shown in Figure 7.4, where ρ is taken as the bifurcation parameter. As is conventional, we plot only the one variable y in the figure. This variable is sufficient to determine the equilibria of (7.5): given y, the other two variables may be obtained, as above, from the equations x = y and z = y²/β.

For any ρ, consider the trivial solution x = y = z = 0 of (7.5) derived from the solution branch y = 0 of (7.6). In the Exercises we ask you to show that this trivial solution is asymptotically stable² for ρ < 1 and unstable for ρ > 1. The bifurcating solutions, for which y = ±√(β(ρ − 1)), grow out of the trivial solution at the same point (the bifurcation point) where the trivial solution loses stability. This is another instance of the central phenomenon of bifurcation theory.

Note that the solutions of (7.6) are unchanged by the reflection y ↦ −y, and in the original ODE, if (x(t), y(t), z(t)) is a solution of (7.5), then so is (−x(t), −y(t), z(t)). As noted above, problems that exhibit a pitchfork bifurcation typically have such a reflectional symmetry.

In this example, as in the preceding one, the bifurcating equilibria are asymptotically stable in some neighborhood of the bifurcation point. Again this behavior follows from general principles, but while one is learning the subject, it is informative to verify it explicitly. As it happens, depending on parameters, the bifurcating solutions may become unstable at large values of ρ through another bifurcation; see Exercises.
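The equilibria and their stability are easy to tabulate numerically. The sketch below (ours; σ = 10 and β = 8/3 are the classical parameter values, an arbitrary choice here) evaluates the eigenvalues of the Jacobian of (7.5) at each equilibrium for several values of ρ:

```python
import numpy as np

sigma, beta = 10.0, 8.0 / 3.0   # classical values; our choice

def jacobian(x, y, z, rho):
    # Jacobian of (7.5) with respect to (x, y, z)
    return np.array([[-sigma, sigma, 0.0],
                     [rho - z, -1.0, -x],
                     [y, x, -beta]])

for rho in (0.5, 1.5, 5.0):
    eqs = [(0.0, 0.0, 0.0)]
    if rho > 1.0:                       # bifurcating branch y = +/- sqrt(beta*(rho-1))
        y = np.sqrt(beta * (rho - 1.0))
        eqs += [(y, y, rho - 1.0), (-y, -y, rho - 1.0)]
    for (x, y, z) in eqs:
        lam = np.linalg.eigvals(jacobian(x, y, z, rho))
        print(rho, (round(x, 3), round(y, 3), round(z, 3)),
              "stable" if np.all(lam.real < 0) else "unstable")
```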
Some discussion of physical interpretations may make this example more meaningful. E. Lorenz [ref] studied (7.5) as a model problem to shed light on the generation of weather patterns in the atmosphere. The system arises from a massive simplification of PDEs that describe Rayleigh-Bénard convection; this term refers to motion of fluid confined between parallel plates held at fixed temperatures. Specifically, if the lower plate is sufficiently hot, then thermal expansion of the fluid induces buoyant motion as low-density, expanded fluid rises through the denser layer above it.

In (7.5), the variable y specifies the amplitude of a velocity field in the fluid in the form of rolls, as indicated in Figure 7.3. (x and z specify the temperature.) Rolls in adjacent cells alternate in orientation, clockwise or counterclockwise. If the velocity field in the figure (cw, ccw, cw, ccw for the four cells shown) corresponds to y > 0, then y < 0 corresponds to ccw, cw, ccw, cw for the four cells: i.e., the reverse orientation. This reversal of orientation gives rise to the reflectional symmetry mentioned above.

²In Exercise ?? in Chapter 5 you were asked to show that, if ρ < 1, this trivial equilibrium is globally stable.
In (7.5), the bifurcation parameter ρ is a nondimensionalized Rayleigh number: i.e., it is proportional to the temperature difference between the two plates. The equilibrium x = y = z = 0, which corresponds to a state in which the fluid is stationary, is (globally) stable provided that the temperature difference between the two plates is small enough: i.e., ρ < 1. However, if ρ > 1, the trivial equilibrium loses stability and motion ensues.
(c) A laterally supported pendulum

In terms of the interpretation of bifurcation diagrams based on quasistatic variation of parameters, both Figures 7.2 and 7.4 show the same qualitative behavior: i.e., as forcing is increased, a trivial response of the system evolves into one of two possible nontrivial, steady-state responses. Our next example is a pitchfork bifurcation that exhibits different qualitative behavior.

Recall the laterally supported pendulum from Exercise ?? in Chapter 5, which is a Hamiltonian system with potential energy

  V(x) = −mgℓ cos x + k(ℓ sin x)²/2.    (7.7)

For variety let's analyze the bifurcations of this system using energy considerations rather than writing out equilibria of the ODEs. The equation for equilibria, ∂V/∂x = 0, is

  [mg + kℓ cos x] ℓ sin x = 0.    (7.8)

The two obvious equilibria x = 0, π of (7.8) are associated with the sine factor vanishing. Provided m < kℓ/g, the nontrivial factor of (7.8) has two solutions located symmetrically about the inverted equilibrium x = π. These solutions are graphed in the bifurcation diagram of Figure ??, where the mass m is taken as the bifurcation parameter. Stabilities of the equilibria, which may be determined from the sign of ∂²V/∂x², are also indicated in the figure. In words: the inverted equilibrium x = π is stable provided the mass m is not too great; otherwise only the straight-down equilibrium x = 0 is stable. In particular, we ask you to verify that in this case the nontrivial solutions of (7.8) are unstable.

Let us contrast the behavior of this system with that of the two preceding examples. Before, under quasistatic variation of parameters, when forcing was sufficient to de-stabilize the trivial equilibrium, the system evolved to a nontrivial response that was an equilibrium close to the trivial equilibrium. By contrast, consider the laterally supported pendulum when, starting from the inverted position x = π, the mass m is increased quasistatically: once the threshold kℓ/g is exceeded, the system evolves to states far removed from x = π. In other words, the response in the first two examples is continuous; in the present example, discontinuous.

The pitchfork bifurcations (a) and (b) above are called supercritical. This term refers to the fact that the nontrivial equilibria appear for forcing greater than required to destabilize the trivial solution.
Figure 7.4: Bifurcation diagram for the Lorenz system, assuming β + 1 − σ < 0. Solid curves correspond to (S)table equilibria and dashed curves correspond to (U)nstable equilibria. Note the pitchfork bifurcation at ρ = 1 and the subsequent bifurcation at ρ = ρ_crit, both of which are discussed in the text.
By contrast, bifurcation (c) is called subcritical because the nontrivial equilibria appear for forcing smaller than required to destabilize the trivial solution. (But see Exercise ?? for a subtlety concerning these definitions.)
7.2 An outline of this chapter

The term bifurcation seems natural for the phenomena described in Section 7.1. More generally, however, this term has come to be used to describe any change in the qualitative behavior of solutions of an ODE as a parameter changes. Such changes include local phenomena (the focus of the present chapter) and global phenomena (the focus of Chapter 8). Particularly in the latter case, the relevant behavior may have no association whatsoever with the term "bifurcation".

[Order of sections changed. Rewrite the following paragraph.]

The context of the local theory is a one-parameter family of ODEs x′ = F(x, μ) such that for some specific parameter value μ* the equation x′ = F(x, μ*) has an isolated non-hyperbolic equilibrium. The central phenomenon of local bifurcation theory is that there (usually) are additional, unexpected solutions of the equation near the nonhyperbolic equilibrium. In the previous section and in Sections 7.3-5, we present typical examples of such behavior. The examples of Sections 7.1 and 7.3-4 all share the property that the additional solutions are steady-state (equilibrium) solutions, which is called steady-state bifurcation; in Section 7.5, the unexpected solutions are (time-dependent) periodic solutions, which is called Hopf bifurcation.

In contrast to these examples-oriented sections, Sections 7.6-7 concern theory. Section 7.6 introduces the Liapunov-Schmidt reduction, a very useful tool for studying steady-state bifurcation. A full analysis of Hopf bifurcation is beyond the scope of this book, but Section 7.7 introduces some theoretical tools for studying these bifurcations.

Finally, in Section 7.8 we apply the tools of the preceding section to a Hopf bifurcation in a specific equation that arises in models for nerve cells, the FitzHugh-Nagumo equations.
  Equilibrium      Description     Stability
  (0, 0)           Extinction      A saddle for all K
  (K, 0)           Prey-only       A stable node if K < 1; a saddle if K > 1
  (1, 1 − 1/K)     Co-existence    An unphysical saddle if K < 1; a sink for K > 1

Table 7.1: Equilibria of (7.9), the Lotka-Volterra system augmented by logistic growth.
7.3 Example 2: Transcritical bifurcation

The Lotka-Volterra equations with logistic limits

In this section we encounter another type of bifurcation. Recall from Section 1.6 the Lotka-Volterra model of a predator-prey system (1.41), modified to have logistic growth for the prey (but, for simplicity, not including the Allee effect):

  (a)  x′ = x(1 − x/K) − xy
  (b)  y′ = ρ(xy − y).    (7.9)

The variables x and y represent prey and predator populations, respectively, while ρ, K are positive parameters; we regard the carrying capacity K as the bifurcation parameter. The equilibria of (7.9), along with their stabilities, are listed in Table 7.1, which is taken from Table 5.3; and the equilibria are graphed in the bifurcation diagram of Figure 7.5.

The example provides another illustration of the fundamental phenomenon of bifurcation theory. Specifically, the prey-only equilibrium loses stability as K crosses 1, and the co-existence equilibria bifurcate from the prey-only equilibrium at precisely this point. This kind of bifurcation is known as a transcritical bifurcation because the bifurcating solutions exist for K both below and above the bifurcation point.

It is useful to articulate the behavior implied by Figure 7.5 if, starting from K < 1, this parameter is increased quasistatically. As long as K < 1, equation (7.9) predicts that the predators will die out; but when K > 1, then any solution with a nonzero prey population at t = 0 will converge to the co-existence equilibrium. We may say that the two equilibria in Figure 7.5 experience an exchange of stability at K = 1. This idea will be developed considerably in Section 7.6.
Figure 7.5: Transcritical bifurcation in (7.9) at K = 1. The equilibrium (x, y) = (K, 0) switches from (S)table to (U)nstable as K increases past 1, just as (x, y) = (1, 1 − 1/K) switches from unstable to stable.
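The exchange of stability can be confirmed by checking eigenvalues of the Jacobian of (7.9) on either side of K = 1 (a sketch of ours; the value of ρ is an arbitrary choice, and any ρ > 0 gives the same pattern):

```python
import numpy as np

rho = 1.0   # our choice

def jac(x, y, K):
    # Jacobian of (7.9): x' = x(1 - x/K) - x*y, y' = rho*(x*y - y)
    return np.array([[1.0 - 2.0*x/K - y, -x],
                     [rho*y, rho*(x - 1.0)]])

for K in (0.5, 2.0):
    for (x, y, name) in [(K, 0.0, "prey-only"),
                         (1.0, 1.0 - 1.0/K, "co-existence")]:
        lam = np.linalg.eigvals(jac(x, y, K))
        print(K, name, "stable" if np.all(lam.real < 0) else "unstable")
# The two equilibria exchange stability as K crosses 1.
```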
7.4 Example 3: Saddle-node bifurcation

In pitchfork and transcritical bifurcations, a smoothly varying equilibrium solution of an ODE loses stability as a parameter varies. Saddle-node bifurcations, also known as limit-point bifurcations [] or blue-sky bifurcations [], differ in that a stable equilibrium disappears altogether.

(a) The torqued pendulum

The most easily visualized such bifurcation is the torqued pendulum, introduced in Section 4.3.4:

  x′ = y
  y′ = −sin x − βy + μ.    (7.10)

The bifurcation diagram for this equation (a graph of the equilibrium value of x vs. the bifurcation parameter μ) is shown in Figure 7.7. In the exercises we ask you to show that the solution satisfying 0 < x < π/2 is a stable node while the solution in (π/2, π) is a saddle, which of course is unstable. (In particular, this information motivates the name "saddle-node".) In words, the stable equilibrium disappears when it and the unstable equilibrium annihilate one another.

Once again, the bifurcation diagram of Figure 7.7 suggests a specific scenario under quasistatic increase of μ: While μ < 1, the system can follow its stable equilibrium in the interval (0, π/2), but when μ passes 1, the system evolves to states far removed from this equilibrium. Specifically, it converges to the periodic solution discussed in Example 4 of Section 6.1.

Figure 7.6: Torqued pendulum.
(b) Activator-inhibitor systems
Lets show that a saddle-node bifurcation also appears in the activator-inhibitor
system (??),
(a) x

=
1
1+r
x
2
1+x
2
x
(b) r

= [
x
2
1+x
2
r].
(7.11)
In Section 5.3 we were interested in cases where was large and (7.11) had three
equilibrium solutions, but here we remove such restrictions on , which we regard as
the bifurcation parameter. To enumerate equilibria of (7.11), rst we solve (7.11b)
to obtain
r =
x
2
1 + x
2
; (7.12)
then, excluding the zero solution from (7.11a), we divide by x and rewrite this
equation as r = x/(1 +x
2
) 1, substitute (7.12) for r in this formula, clear 1 +x
2
from the denominator, and rearrange, yielding
( + 1)x
2
x + 1 = 0.
Thus, recalling the zero solution, we see that (7.11) has three equilibria if >
2

+ 1 and one if 0 < < 2

+ 1. This information is shown graphically in the


bifurcation diagram of Figure 7.8.
Suppose, starting from > 2

+ 1 and assuming the system is in the top equi-


librium in Figure 7.8, the bifurcation parameter is decreased quasistatically. While
> 2

+ 1, the system can follow this top equilibrium branch, but when passes
2

+ 1, the system evolves to states far removed from this equilibrium. In this
Figure 7.7: Saddle-node bifurcation in the torqued pendulum equations (7.10).
Figure 7.8: Saddle-node bifurcation in the Turing equations (7.11) at ρ = 2√(σ + 1).
case, the system collapses to the state x = 0.
7.5 Theory for steady-state bifurcation: the Liapunov-Schmidt reduction
7.5.1 Bare bones of the reduction
With the Liapunov-Schmidt reduction, one may greatly reduce the number of variables in calculations of steady-state bifurcation. Indeed, in all the above examples, the reduced problem has only one state variable (plus of course the various parameters). For example, recall how we analyzed bifurcation in the Lorenz equations (7.5): first we solved (7.5a) to obtain x = y, using this we solved (7.5c) to obtain z = y²/β, and we finally substituted into (7.5b), yielding

    y[ρ − 1 − y²/β] = 0.

This one-dimensional equation is the relation graphed in Figure 7.4. The general reduction proceeds in pretty much the same way.
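For readers who like to see this done mechanically, the Lorenz computation can be carried out symbolically. A sketch, assuming the standard names σ, ρ, β for the Lorenz parameters and that sympy is available:

```python
import sympy as sp

x, y, z, rho, sigma, beta = sp.symbols('x y z rho sigma beta', positive=True)
F1 = sigma * (y - x)          # (7.5a)
F2 = rho * x - y - x * z      # (7.5b)
F3 = -beta * z + x * y        # (7.5c)

X = sp.solve(F1, x)[0]                  # x = y
Z = sp.solve(F3.subs(x, X), z)[0]       # z = y**2/beta
g = sp.simplify(F2.subs({x: X, z: Z}))  # reduced scalar equation in y alone
print(sp.factor(g))   # equivalent to y*(rho - 1 - y**2/beta) = 0
```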
To set the general context, consider a one-parameter family of ODEs, say

    x′ = F(x, μ).     (7.13)

Suppose that for μ = μ∗, equation (7.13) has an equilibrium solution x = x∗. If the equilibrium x∗ is hyperbolic, or even if the Jacobian matrix DF∗ merely satisfies

    det DF∗ ≠ 0,     (7.14)

then by the implicit function theorem one may solve the equilibrium equation

    F(x, μ) = 0     (7.15)

uniquely near (x∗, μ∗) for x as a smooth function of μ: i.e., no bifurcation occurs. Thus, to have bifurcation at (x∗, μ∗), (7.14) must be violated. The minimal failure of (7.14) occurs if

    0 is a simple eigenvalue of DF∗.     (7.16)

(Exercise: Check that (7.16) is satisfied in all the examples in Sections 7.1-3.) In this case, the Liapunov-Schmidt technique may be used to reduce (7.15) to a single scalar equation.
To see how this works, let us for simplicity rotate the coordinates so that the first coordinate vector

    e₁ = (1, 0, . . . , 0) spans ker DF∗.     (7.17)
Then the first column of DF∗ vanishes, so we may write in block notation

    DF∗ = ( 0  vᵀ )
          ( 0  A  )

where v and 0 are (d−1)-component vectors and A is the (d−1) × (d−1) matrix, the Jacobian of F₂, . . . , F_d with respect to x₂, . . . , x_d. By (7.16), the submatrix A is nonsingular. Thus by the implicit function theorem the last d−1 equations of (7.15),

    F₂(x₁, x₂, . . . , x_d, μ) = 0
      . . .
    F_d(x₁, x₂, . . . , x_d, μ) = 0,     (7.18)

may be solved near (x∗, μ∗) for x₂, . . . , x_d as functions of x₁ and μ, say

    (x₂, . . . , x_d) = X(x₁, μ).

Define the scalar function g(x₁, μ) by substituting this formula for x₂, . . . , x_d into the first component of F:

    g(x₁, μ) = F₁(x₁, X(x₁, μ), μ).
Here is the reduction: under the above hypotheses, there is a one-to-one correspondence between solutions near (x∗, μ∗) of the full problem (7.15) and the reduced problem

    g(x₁, μ) = 0.     (7.19)

Specifically, if (x₁, μ) is a solution of (7.19), then (x₁, X(x₁, μ), μ) is a solution of (7.15), and every solution of (7.15) near (x∗, μ∗) arises in this way. Indeed, the claim follows from observing that (7.19) results merely from processing the d equations in (7.15) sequentially.
The best way to understand these ideas is to re-examine the bifurcation problems considered above and interpret them as specific examples of the reduction. (Exercise.)
7.5.2 Stability issues
In fact, stability information may also be derived from the reduction, provided (7.16) is strengthened as follows:

    λ₁(DF∗) = 0,   Re λⱼ(DF∗) < 0,  j = 2, . . . , d.     (7.20)

(In words, (7.20) asserts that the condition for x∗ to be asymptotically stable misses by one dimension.) Let us put the reduced function (7.19) on the RHS of a scalar ODE

    x′ = g(x, μ).     (7.21)

Observe that if g(x̄, μ̄) = 0, i.e., if x̄ is an equilibrium of (7.21), then this equilibrium is asymptotically stable for (7.21) provided the derivative g_x(x̄, μ̄) < 0 and unstable provided g_x(x̄, μ̄) > 0.
Theorem 7.5.1. Given a solution (x̄, μ̄) of g(x, μ) = 0, let x̂ = (x̄, X(x̄, μ̄)) be the corresponding equilibrium of (7.13). Then regarding (7.13), x̂ is
• asymptotically stable if g_x(x̄, μ̄) < 0,
• unstable but hyperbolic if g_x(x̄, μ̄) > 0, and
• nonhyperbolic if g_x(x̄, μ̄) = 0.
While this result can be rigorously proved, the notation is mind numbing, and we prefer to give an informal discussion. The stability of equilibria of (7.13) may be computed from the signs of the eigenvalues of DF. Given (7.20), we deduce from Theorem ?? (appendix on eigenvalues) that (a) only λ₁(DF) could become positive, i.e., be a stability breaker, and (b) λ₁(DF) is a smooth, real-valued function of (x, μ) near (x∗, μ∗). The crux of the proof is to show that at any equilibrium of (7.21), the sign of g_x is the same as the sign of λ₁(DF) at the corresponding equilibrium of (7.13). We refer to [?] for the details of this proof. In the Exercises we ask you to verify the conclusions of this theorem for the bifurcation problems considered above.
7.5.3 Exploration of one-dimensional bifurcation problems
In this section we use the Liapunov-Schmidt reduction to introduce a partial hierarchy of steady-state bifurcation problems that satisfy (7.20). Without loss of generality we can translate coordinates so that the bifurcation point is located at x = 0, μ = 0. Thus we consider the ODE (7.13) supposing that

    F(0, 0) = 0     (7.22)

and that the Jacobian DF₀ satisfies condition (7.20). The reduced function g(x, μ), defined near (0, 0), then satisfies

    g(0, 0) = g_x(0, 0) = 0,     (7.23)

as follows from Theorem 7.5.1 since 0 is a non-hyperbolic equilibrium of x′ = F(x, 0). (Alternatively, in Exercise ?? we ask you to show directly that g_x(0, 0) = 0.)
One-dimensional bifurcation problems can be roughly classified by how many derivatives of g, beyond (7.23), vanish, as is done in Table 7.2. The phrase normal
Vanishing der.         Normal form        Name               Example
None                   x′ = ±x² + μ       saddle-node        Activator-inhibitor network, (7.11)
g_μ = 0                x′ = ±x² + μx      transcritical      Logistic Lotka-Volterra, (7.9)
g_xx = 0               x′ = ±x³ + μ       hysteresis point   CSTR, Exercise ??
g_μ = 0, g_xx = 0      x′ = ±x³ + μx      pitchfork          Lorenz equations, (7.5)

Table 7.2: Partial classification of one-dimensional bifurcation problems
form, which appears in the table, refers to a particularly simple version of a bifurcation problem that captures the essential behavior of a class of problems. (We shall use this phrase informally, shying away from a precise, technical definition.)
To gain intuition, let us consider the construction of one of these normal forms, the pitchfork. A somewhat more general scalar ODE³ with a degenerate equilibrium (i.e., satisfying (7.23)) for which g_μ = 0, g_xx = 0 is

    x′ = Ax³ + Bμx.

Suppose the coefficients A and B are nonzero. If we rescale x ↦ |A|^{1/2} x and μ ↦ |B| μ, then we may reduce this equation to

    x′ = ±x³ ± μx,     (7.24)

where the signs are those of A and B.
The different bifurcation diagrams for the four choices of sign are shown in Figure 7.9. We regard Cases 1 and 2, for which the cubic coefficient in (7.24) is negative, as essentially equivalent, for the following reasons. In Case 1 the trivial solution is stable for μ < 0 and unstable for μ > 0, while in Case 2 it is the other way around. However, in both cases, as the bifurcation parameter crosses zero, the trivial solution loses stability, to be replaced by two stable bifurcating solutions; the only difference is the reversal of the orientation of the parameter change that causes instability, a difference that does not seem important to us. (In the terminology of Section 7.1, the bifurcation of x′ = −x³ + μx is supercritical.) It is conventional to collapse these two cases into the one case −x³ + μx in the Table by orienting the bifurcation parameter so that the trivial solution loses stability as μ is increased⁴.
In Cases 3 and 4, which we similarly collapse into the normal form x³ + μx, the cubic coefficient in (7.24) is positive. This change in sign cannot be scaled away. Rather it indicates a real difference in behavior: in Cases 3 and 4 the bifurcation is subcritical.
Note that we have not included any higher-order terms in the normal forms in
³Below we will discuss examples with additional, higher-order, terms.
⁴This convention is natural in that in applications it is more common for instability to appear as the parameters in the problem are increased, as occurred in the examples of Section 7.1.
Figure 7.9: Pitchfork bifurcations in (7.24) for each of the possible pairs of sign choices: Case 1: x′ = −x³ + μx; Case 2: x′ = −x³ − μx; Case 3: x′ = x³ + μx; Case 4: x′ = x³ − μx. Per our usual convention, stable equilibria are indicated with bold, solid curves and unstable equilibria are indicated with thinner, dashed curves.
the Table. More generally, consider an ODE that does have higher-order terms, say

    x′ = Ax³ + Bμx + O(x⁴, μx², μ²x)

where A < 0 and B ≠ 0. We may of course scale the variables to obtain

    x′ = −x³ + μx + O(x⁴, μx², μ²x).     (7.25)

The higher-order terms in (7.25) do not affect the qualitative dynamics because, under an appropriate change of coordinates, they may be transformed away. We do not prove this statement, but we shall discuss it further, including references, in the Notes. Here we limit ourselves to proving the more modest claim that the qualitative structure of the equilibria of (7.25) is identical to that of the pitchfork normal form.
To prove this, let us factor the equilibrium equation for (7.25) as

    x[−x² + μ + O(x³, μx, μ²)] = 0.     (7.26)
Thus, if μ is given, then x is an equilibrium of (7.25) iff x = 0 or

    −x² + μ + O(x³, μx, μ²) = 0.

By the implicit function theorem one may solve this relation for μ to deduce that

    μ = x² + O(x³).

Therefore, near the origin, for μ < 0 equation (7.25) has the unique equilibrium x = 0, and for μ > 0 it has three equilibria. Moreover, in the Exercises we ask you to show that the trivial solution is stable for μ < 0 and unstable for μ > 0 while the two bifurcating solutions that appear for μ > 0 are stable. This proves the claim.
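The claim is easy to test numerically. The following minimal sketch appends an illustrative x⁴ term to the normal form, so the equilibria solve x(x³ − x² + μ) = 0, and counts solutions near the origin on either side of μ = 0:

```python
import numpy as np

for mu in (-0.1, 0.1):
    # equilibria of x' = -x^3 + mu*x + x^4 solve x*(x^3 - x^2 + mu) = 0;
    # keep only the roots near x = 0
    r = np.roots([1.0, -1.0, 0.0, mu])
    small = [v.real for v in r if abs(v.imag) < 1e-10 and abs(v) < 0.5]
    print(f"mu = {mu}: equilibria near 0 -> {sorted(set([0.0] + small))}")
# mu = -0.1 gives only x = 0; mu = 0.1 gives x = 0 plus two nonzero roots
# (near +/- sqrt(0.1), shifted by the quartic term), the pitchfork count.
```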
To conclude the discussion of normal forms and the pitchfork: Consider a family of d-dimensional ODEs

    x′ = F(x, μ)     (7.27)

that has a nonhyperbolic equilibrium (x∗, μ∗) where the Jacobian DF∗ satisfies (7.20). We shall say that equation (7.27) exhibits a pitchfork bifurcation if at the bifurcation point the reduced equation satisfies

    g = g_x = g_xx = g_μ = 0,  and  g_xxx ≠ 0, g_xμ ≠ 0.     (7.28)

As shown above, it then follows that the qualitative structure of the equilibria of (7.27) is identical to that of the normal form ±x³ + μx = 0. The bifurcation is supercritical if g_xxx < 0, subcritical if g_xxx > 0. Incidentally, general formulas for calculating these low-order derivatives of g are given in Section 1.3 of [2]. Regrettably, such calculations are often rather technical.
Similar discussions for the other normal forms in the table are given in [?], but
we shall not pursue this subject further.
7.5.4 Symmetry and the pitchfork bifurcation
By a reection on R
d
we mean a linear map R : R
d
R
d
such that R
2
= I. We say
that an ODE x

= F(x) is symmetric
5
under R if
F(Rx) = RF(x) for all x. (7.29)
Alternatively, if an ODE is symmetric under R, then for any solution x(t), the
reected function Rx(t) is also a solution, and conversely. Recall that all three ex-
amples of pitchfork bifurcation in Section 7.1 possessed such symmetry; the following
result indicates that such behavior is no accident.
Theorem 7.5.2. Suppose that: (i) for μ near μ∗ the ODE (7.27) is symmetric with respect to the reflection R; (ii) for μ near μ∗, (7.27) has an equilibrium solution x_eq(μ) that is invariant under R: i.e., Rx_eq(μ) = x_eq(μ); (iii) the Jacobian DF∗ satisfies condition (7.20) at (x_eq(μ∗), μ∗); and (iv) the null eigenvector v of DF∗ satisfies Rv = −v. Then at the bifurcation point the reduced function satisfies

    g_xx = 0,   g_μ = 0.
Of course (7.20) implies that g = g_x = 0. Thus, provided g_xxx ≠ 0 and g_xμ ≠ 0, it follows from the theorem that the bifurcation is a pitchfork.
This result is proved in [?]. More informative than reading the proof, however, is to verify that each of the examples in Section 7.1 satisfies the hypotheses of the theorem.
Remarks: (i) If the Liapunov-Schmidt reduction is performed in a manner that respects symmetry (and it would be perverse to do otherwise), then g is an odd function of x, so all derivatives of even order in x must vanish at x = 0. (ii) General considerations imply that the null eigenvector v satisfies either Rv = +v or Rv = −v. In words, the fourth condition in the theorem requires that the bifurcation break the symmetry. By contrast, if Rv = +v, symmetry has no implications for the bifurcation.
7.5.5 The two-cell Turing instability
Let us return to the Turing instability with two interacting cells, introduced in Section 5.3.2. We consider (5.28) as a bifurcation problem with the diffusion coefficient D as the bifurcation parameter. From our calculations in that section we know:
⁵The technical term used in [?] is equivariant.
• for all D, (5.28) has a trivial equilibrium (x₊, r₊, x₊, r₊) in which the concentrations are the same in both cells;
• there is a threshold value D_thr such that the trivial solution is stable if D < D_thr and unstable if D > D_thr. Moreover, at D = D_thr the bifurcation problem satisfies the minimal-degeneracy condition (7.20); and
• equation (5.28), as well as the trivial solution, is symmetric with respect to the interchange of concentrations in the two cells,

    R(x₁, r₁, x₂, r₂) = (x₂, r₂, x₁, r₁),     (7.30)

which is a reflection.
In Section 5.3.2 we left open the question of how solutions of (5.28) behave when D > D_thr. Now, however, Theorem 7.5.2 strongly suggests that the trivial solution undergoes a pitchfork bifurcation. To prove this (as well as to determine analytically whether the pitchfork is supercritical or subcritical) we would need to calculate g_xxx and g_xμ and show that they are non-zero. Although this is possible, it is tedious, and we prefer to answer these questions by the simulation shown in (need Figure ??). Although the computations do not prove anything, who can see the figure and doubt that the trivial solution of (5.28) undergoes a supercritical pitchfork bifurcation at D = D_thr?
7.5.6 Imperfect bifurcation
An ODE such as (7.13) represents an idealized description of some physical system, but real systems will differ from the idealized description in myriad ways that are impossible to enumerate. For example, in the bead equation (7.2), let us suppose the axis of rotation of the ring is very slightly off-center, say by a distance ε as in (need Figure ??). In this case, the length of the rotation arm is slightly changed, from a sin x to a sin x + ε, and the equations of motion for the bead will read

    x′ = y     (7.31)
    my′ = −βy − mg sin x + m(a sin x + ε)ω² cos x.

This perturbation splits the bifurcation diagram into two connected pieces, in different ways depending on the sign of ε. (See Figure ??.) If ω is increased quasistatically, say with ε > 0, then the equilibrium of the bead will evolve smoothly from an equilibrium near the bottom of the loop to an equilibrium with x > 0; similarly if ε < 0. In other words, making ε nonzero removes the indeterminacy of the idealized, perfectly symmetric, problem.
Note that even with ε ≠ 0, both nontrivial equilibria exist and are stable for ω sufficiently large. The difference is that you cannot reach one of them by quasi-static variation of ω; it takes a finite (noninfinitesimal) perturbation to reach the other one.
Similarly, in the Turing instability, if the perfect symmetry between the two cells is slightly perturbed, one of the two bifurcating solutions will be preferred, i.e., be reachable by quasi-static variation of the bifurcation parameter.
In an attempt to model deviations of a physical problem from an idealized description, one may consider subjecting the equation to an arbitrary small perturbation. (This kind of analysis goes by the name of imperfect bifurcation.) For problems satisfying (7.16), it suffices to consider one-dimensional examples because of the Liapunov-Schmidt reduction. For example, the perturbed supercritical pitchfork

    x′ = −x³ + μx + ε

exhibits the same qualitative behavior seen in the bead example. In the Exercises we ask the reader to construct perturbed bifurcation diagrams for other normal forms in Table 7.2.
7.5.7 A bifurcation theorem
To conclude our discussion of steady-state bifurcation, we record a bifurcation theorem of the traditional sort. This result doesn't add much for steady-state bifurcation, but it provides a useful point of comparison for Hopf bifurcation.
Consider a one-parameter family of ODEs such that for each μ, the equation

    x′ = F(x, μ)     (7.32)

has an equilibrium, x = x_eq(μ), which varies smoothly with μ. Suppose that for μ = μ∗, the Jacobian DF∗ of this equation at (x_eq(μ∗), μ∗) has a simple eigenvalue zero: i.e., assume (7.20). Let DF_μ be the Jacobian at (x_eq(μ), μ). By Theorem ?? in Appendix C, there is a smoothly varying eigenvalue λ(μ) of DF_μ such that λ(μ∗) = 0. The bifurcation theorem requires the following additional hypothesis:

    (dλ/dμ)(μ∗) ≠ 0.     (7.33)
Theorem 7.5.3. Under the above hypotheses, there is a smooth curve (X(a), M(a)), where −ε < a < ε, of nontrivial equilibrium solutions of (7.32) that bifurcates from the trivial solution at (x_eq(μ∗), μ∗); thus, (X(0), M(0)) = (x_eq(μ∗), μ∗). Moreover, X(a) = av + O(a²), where v spans the kernel of DF∗. There is a neighborhood N ⊂ R^d × R of (x_eq(μ∗), μ∗) such that any equilibrium of (7.32) in N lies on either the trivial branch or the bifurcating branch.
We do not prove this result. Provided that F ∈ C², the proof is just an application of the implicit function theorem to the Liapunov-Schmidt reduction of (7.32) (see []). Moreover, there are generalizations of the theorem that do not require even this degree of smoothness [].
Remarks: (i) The theorem makes no assertion about how M(a) depends on the parameter a (e.g., super- or subcritical) nor about the stability of the bifurcating solutions. In this connection consider the following one-dimensional example:

    x′ = −μx.

In this case the bifurcating branch of solutions is just the line μ = 0, i.e., M(a) ≡ 0, and every equilibrium on the bifurcating branch is neutrally stable. (ii) The following example shows the need for the hypothesis (7.33):

    x′ = −x³ ± μ²x.

If the minus sign is selected, then there are not any bifurcating solutions, and if the plus sign is selected, then there are two curves of bifurcating solutions. Incidentally, in (7.28) the inequality g_xμ ≠ 0 is equivalent to (7.33). (iii) Exercise ?? illustrates what can happen if DF∗ has a multiple eigenvalue 0.
7.6 Example 4: The Hopf bifurcation
(a) An academic example
To establish the ideas regarding this rather different bifurcation, let us begin with a simple, but purely academic, example of it:

    x′ = μx − y − (x² + y²)x
    y′ = x + μy − (x² + y²)y     (7.34)

which we may rewrite in matrix notation

    ( x )′   ( μ  −1 ) ( x )              ( x )
    ( y )  = ( 1   μ ) ( y )  − (x² + y²) ( y ).
For all μ, this equation has a (unique) equilibrium at x = y = 0. At the equilibrium the linearization is

    DF(0, 0) = ( μ  −1 )
                ( 1   μ )

which has eigenvalues μ ± i. In particular it is stable for μ < 0 and unstable for μ > 0. However, the mechanics for the loss of stability are different from the previous examples of bifurcation. In those examples, the trivial solution lost stability when a
single real eigenvalue of the Jacobian passed through zero. Here stability is lost as a pair of complex-conjugate eigenvalues crosses the imaginary axis⁶.
The bifurcation examples above lead us to expect some change in the set of solutions of (7.34) as μ crosses zero. However, no new equilibria appear: the only equilibrium of (7.34) is x = y = 0, no matter what the value of μ. To see what change does occur near μ = 0, let us rewrite the equations in polar coordinates

    r′ = μr − r³
    θ′ = 1.     (7.35)

Note that r′ vanishes if r = 0 or if r² = μ. The former solution just represents the equilibrium x = y = 0. By contrast, the latter solution represents a new type of orbit that appears as μ crosses zero: i.e., r = √μ, θ = t, where we have chosen the phase arbitrarily, or in Cartesian coordinates, x = √μ cos t, y = √μ sin t. To summarize, in Hopf bifurcation, periodic solutions appear when a pair of eigenvalues of the Jacobian crosses the imaginary axis.
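A direct simulation makes the new orbit visible. The sketch below (scipy assumed; μ = 0.25 is illustrative) shows a trajectory starting near the origin spiraling onto the bifurcating circle of radius √μ:

```python
import numpy as np
from scipy.integrate import solve_ivp

def f(t, u, mu):
    x, y = u
    r2 = x**2 + y**2
    return [mu * x - y - r2 * x, x + mu * y - r2 * y]

mu = 0.25
sol = solve_ivp(f, (0, 100), [0.01, 0.0], args=(mu,), rtol=1e-9)
r_final = np.hypot(sol.y[0, -1], sol.y[1, -1])
print(r_final, np.sqrt(mu))   # both approximately 0.5
```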
It is instructive to consider a more general, but still academic, example of Hopf bifurcation:

    r′ = μr − αr³
    θ′ = 1 + βr²     (7.36)

where α, β are parameters. If α > 0, then the bifurcating periodic solutions exist for μ > 0 and are stable (where the equilibrium is unstable) as in Figure 7.10; this case is called supercritical. On the other hand, if α < 0, then the periodic solutions exist for μ < 0 and are unstable, which is called subcritical. In the supercritical case, the bifurcating solutions describe the new behavior of solutions if μ is increased beyond zero. In the subcritical case, the bifurcating solutions constrict the domain of attraction of the equilibrium as μ tends to zero from below, but for μ > 0 the solution evolves to states far away from the equilibrium.
In fact, (7.36) has greater generality than may be apparent; indeed it is shown in [?] that under rather general circumstances, near a Hopf bifurcation of an equation such as (7.1), there is a coordinate transformation that reduces the general problem to the normal form (7.36), modulo higher-order corrections. Incidentally, note that the parameter β in (7.36) makes the period of the oscillations vary with their amplitude. (In the Exercises we study a nonlinear resonance phenomenon in which this parameter plays a key role.)
(b) The repressilator
Hopf bifurcations also occur in higher-dimensional ODEs. To illustrate this point, let us consider a gene network called the repressilator. The name tries to capture the idea that oscillations occur through the mutual repression, or inhibition, of three
⁶In connection with this behavior, we recommend that you revisit Exercise 11 from Chapter 2.
Figure 7.10: Bifurcation diagrams (r vs. μ) for the system (7.36) for the cases α > 0 (supercritical) and α < 0 (subcritical).
Figure 7.11: Diagram of the repressilator system: x inhibits z, z inhibits y, and y inhibits x.
genes, as indicated schematically in Figure 7.11. It is believed [] that oscillations in many biological systems (biological clocks, in informal parlance) are based on such networks.
Mathematically the repressilator is described by the system

    x′ = α/(1 + yⁿ) − x
    y′ = α/(1 + zⁿ) − y
    z′ = α/(1 + xⁿ) − z.     (7.37)

To simplify the analysis we have made the interactions in (7.37) completely symmetric, but this is not necessary for the phenomenon. Also we have scaled the equations to eliminate inessential parameters.
These equations have a steady-state solution with x = y = z = x_eq if x_eq(α) satisfies

    x_eq(1 + x_eqⁿ) = α.     (7.38)

It is easily seen that, for all α > 0, (7.38) has a unique solution with x_eq > 0.
Figure 7.12: Bifurcation diagram for (7.37) assuming a Hill coefficient of n = 4. The steady-state value of x (denoted by x_eq in Equation (7.38)) loses stability via a Hopf bifurcation at α = 2. This Hopf bifurcation spawns stable, periodic orbits, which are indicated in the diagram by plotting the minimum and maximum values of x for each periodic orbit; the gap between these values gives a visual representation of the amplitude of the periodic solutions as a function of α.
Figure 7.13: Sample trace of x versus t obtained by numerical solution of (7.37) with initial conditions x(0) = 0.6, y(0) = 0.4, z(0) = 0.2, and Hill coefficient n = 4. Left panel: With α = 1.8, transient oscillations occur before the system settles to equilibrium. Right panel: With α = 2.2, the solution trajectory approaches a limit cycle.
Moreover, this solution depends smoothly and monotonically on α, and it tends to infinity as α → ∞. The graph of this solution is the backbone of the bifurcation diagram of Figure 7.12.
How does the stability of this equilibrium depend on α? To answer this question, we compute the Jacobian of the system:

    DF = ( −1  −β   0 )
         (  0  −1  −β )
         ( −β   0  −1 )     (7.39)

where β = αnxⁿ⁻¹/(1 + xⁿ)², and at equilibrium, we have from (7.38) that

    β = n x_eqⁿ / (1 + x_eqⁿ).     (7.40)
Note that DF = −(I + βB) where

    B = ( 0 1 0 )
        ( 0 0 1 )
        ( 1 0 0 ).

The eigenvalues of B are cube roots of unity, so the eigenvalues of DF are

    −β − 1   and   (1/2 ± i√3/2)β − 1.

As α increases, the complex-conjugate roots cross the imaginary axis when β = 2, which by (7.40) corresponds to x_eqⁿ = 2/(n − 2) and

    α∗ = (2/(n − 2))^{1/n} · n/(n − 2).

Thus, the equilibrium (7.38) is stable if α < α∗ and unstable if α > α∗.
Calling on simulations, we find that the Hopf bifurcation at α = α∗ is supercritical: in Figure 7.13 periodic solutions appear after α crosses α∗.
(c) The augmented Lotka-Volterra equations
Another example of a Hopf bifurcation occurs in the Lotka-Volterra equations augmented to include logistic growth and the Allee effect for the prey, as considered in Section 1.6,

    (a) x′ = x ((x − ε)/(x + ε)) (1 − x/K) − xy
    (b) y′ = γ(xy − y).     (7.41)

Specifically, bifurcation occurs along the curve K = (1 + 2ε − ε²)/2ε that separates
Regions II and III in Figure 1.11(a). To simplify the calculations to show this⁷, we define the function

    φ(x) = ((x − ε)/(x + ε)) (1 − x/K)     (7.42)

so that the RHS of the first equation in (7.41) may be rewritten xφ(x) − xy. With this notation, the Jacobian of (7.41) at the co-existence equilibrium (1, φ(1)) equals

    DF = ( φ′(1)   −1 )
         ( γφ(1)    0 ).     (7.43)
Now det DF > 0 provided K > 1, while

    tr DF = φ′(1) = [2ε − (1 + 2ε − ε²)/K] / (1 + ε)².

If K = (1 + 2ε − ε²)/2ε, then tr DF = 0, and hence Re λ(DF) = tr DF/2 vanishes. Hence, the system (7.41) undergoes a Hopf bifurcation as K increases through (1 + 2ε − ε²)/2ε.
Unlike the preceding two examples, this bifurcation is subcritical. This assertion is supported by the fact that when (K, ε) belongs to Region III, where the co-existence equilibrium is unstable, solutions of (7.41) evolve to states far from the equilibrium: specifically, as shown in Figure 1.11(d), both populations go extinct. We may observe the periodic solutions directly in Figure 7.14, which shows a simulation of (7.41) with time run backwards.
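The backwards-in-time computation behind Figure 7.14 can be sketched as follows (ε, K, γ are the names used in the reconstruction of (7.41); scipy assumed):

```python
import numpy as np
from scipy.integrate import solve_ivp

eps, K, gamma = 0.2, 3.2, 1.0       # the values used in Figure 7.14

def backwards(t, u):
    x, y = u
    phi = ((x - eps) / (x + eps)) * (1 - x / K)
    # minus signs reverse time, so the unstable orbit becomes attracting
    return [-(x * phi - x * y), -gamma * (x * y - y)]

sol = solve_ivp(backwards, (0, 200), [1.0, 0.45], rtol=1e-9, max_step=0.1)
print(sol.y[:, -1])   # the trajectory ends up circulating on the orbit
```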
7.7 Hopf bifurcation: theory
The Liapunov-Schmidt reduction provides an effective tool for understanding steady-state bifurcation. By contrast, for Hopf bifurcation, no such simple tool is available. Thus, the following purely existential result, which is modeled on Theorem 7.5.3, assumes greater importance.
In the Hopf theorem we also consider a one-parameter family of ODEs (7.32) near a non-hyperbolic equilibrium, but (7.20) is altered as follows. Suppose that for μ = μ∗, the Jacobian DF∗ of this equation at (x_eq(μ∗), μ∗) satisfies:

    DF∗ has simple eigenvalues ±iω₀, where ω₀ ≠ 0, and
    has no other eigenvalues on the imaginary axis.     (7.44)

In particular, zero is not an eigenvalue of DF∗, so by the implicit function theorem, for μ near μ∗, there is a smooth branch of equilibria x_eq(μ) passing through (x∗, μ∗). Let DF_μ be the linearization of (7.32) at these nearby equilibria. By Theorem ?? in
⁷These calculations repeat work you may have already done in Exercise 2(d) in Chapter 5.
Figure 7.14: Reversing time in (7.41) with ε = 0.2, K = 3.2, and γ = 1 reveals an asymptotically stable periodic orbit (solid curve) that encloses the equilibrium (x, y) = (1, 0.5). (Curves with arrows schematically indicate the backwards-in-time flow.) Hence, the solid curve also represents an unstable periodic orbit of the forward-in-time equations (7.41).
Eigenvalue Appendix, there is a smoothly varying eigenvalue λ(μ) of DF(x_eq(μ), μ) such that λ(μ∗) = iω₀. In analogy with (7.33) we assume that

    (d/dμ) Re λ(μ∗) ≠ 0.     (7.45)
Theorem 7.7.1. Under hypotheses (7.44, 7.45), a one-parameter family γ(t, a) of periodic solutions of (7.32) bifurcates from (x∗, μ∗) and satisfies the following:
(i) γ(t, a), which exists for 0 ≤ a < δ where δ > 0, is smooth in both variables and satisfies

    γ(t, a) = x∗ + O(a).

(ii) γ(t, a) satisfies (??) for the specific parameter value μ = μ(a), where the smooth function μ(a) has an expansion μ(a) = μ∗ + μ₁a² + O(a⁴). In particular, the union of the orbits

    { (γ(t, a), μ(a)) : t ∈ R, 0 ≤ a < δ }

is a two-dimensional submanifold of R^d × R. There are no other periodic solutions of (??) in an appropriate neighborhood of (x∗, μ∗).
(iii) γ(t, a) has period equal to 2π/ω₀ + O(a²).
To help the reader understand this and other theorems in this section, we recommend that he/she check the conclusions of the theorems in the context of the example

    x₁′ = μx₁ − x₂ − αx₁(x₁² + x₂²)
    x₂′ = x₁ + μx₂ − αx₂(x₁² + x₂²)
    x₃′ = −x₃
      . . .
    x_d′ = −x_d,     (7.46)

which was constructed from (7.34) by inserting a general coefficient α in the cubic terms and appending d − 2 auxiliary variables whose evolution is trivial.
The bifurcation of (7.46) is supercritical if α > 0, and in this case the bifurcating solutions are stable. Likewise, if α < 0, the bifurcation is subcritical and the solutions are unstable. In the theorem, the bifurcation is supercritical if in Item (ii) the quadratic coefficient μ₁ > 0, and subcritical if μ₁ < 0. Moreover, we have:
Theorem 7.7.2. In Theorem 7.7.1, the periodic solutions are stable if μ₁ > 0; unstable if μ₁ < 0.
To describe these bifurcating solutions more quantitatively, it is useful to perform a preliminary reduction of (??). By translating and subtracting the steady-state solution from x, we may arrange without loss of generality that (x∗, μ∗) = (0, 0) and in fact x_eq(μ) ≡ 0. Next we move the essential coordinates for the bifurcation to the x₁, x₂ position. It follows from (7.44) that there is a similarity matrix S such that S⁻¹DF∗S has the block structure

    S⁻¹DF∗S = Λ = ( Ω₀  0  )     where   Ω₀ = ( 0   −ω₀ )
                  ( 0   A∗ )                  ( ω₀   0  )     (7.47)

and A∗ is a (d−2) × (d−2) matrix with no eigenvalues on the imaginary axis. If we define a new unknown Sx (without introducing a new name for it), then we have the transformed equation

    x′ = S⁻¹F(Sx) = Λx + R(x, μ)     (7.48)

where R(x, μ) = O(|x|², μ|x|).
Theorem 7.7.3. If in the above Theorem, the ODE has the form (7.48), then the bifurcating solutions have the form

    γ(t, a) = ( a cos ω(a)t, a sin ω(a)t, 0, . . . , 0 )ᵀ + O(a²)

where the frequency ω(a) has the expansion

    ω(a) = ω₀ + ω₁a + ω₂a² + . . . .

Connect theorem to the stable manifold theorem.
In general it is difficult to determine whether a Hopf bifurcation is subcritical or supercritical, but for a 2 × 2 system the following formula may be applied. Let us assume the system has been written in the form

    x′ = (Ω₀ + μL)x + ( f(x, μ) )
                       ( g(x, μ) )     (7.49)

where Ω₀ is given by (7.47) and L is a 2 × 2 matrix. Let us write the nonlinear terms

    ( f(x, μ) )
    ( g(x, μ) )  = Q(x) + C(x) + O(|x|⁴, μ|x|², μ²|x|)     (7.50)
where Q and C are pure quadratic and cubic, respectively. We will write f_x, f_xx, etc. for derivatives of f evaluated at x = 0, and similarly for g. Thus for example

    Q(x) = (1/2) ( f_xx x² + 2f_xy xy + f_yy y² )
                 ( g_xx x² + 2g_xy xy + g_yy y² )

where to avoid a proliferation of subscripts, we write x = (x, y); a similar formula holds for C(x).
Theorem 7.7.4. The bifurcation of (7.49) is supercritical if

    f_xxx + f_xyy + g_xxy + g_yyy + (1/ω₀)[ f_xy(f_xx + f_yy) − g_xy(g_xx + g_yy) − f_xx g_xx + f_yy g_yy ] < 0     (7.51)

and subcritical if this expression is positive.
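The criterion is easily packaged as a function and tested on the academic example (7.34), where f = −x(x² + y²), g = −y(x² + y²) and ω₀ = 1. A minimal sketch:

```python
def hopf_criterion(omega0, f3, g3, f2, g2):
    # f3 = (f_xxx, f_xyy), g3 = (g_xxy, g_yyy)
    # f2 = (f_xx, f_xy, f_yy), g2 = (g_xx, g_xy, g_yy)
    fxxx, fxyy = f3
    gxxy, gyyy = g3
    fxx, fxy, fyy = f2
    gxx, gxy, gyy = g2
    return (fxxx + fxyy + gxxy + gyyy
            + (fxy * (fxx + fyy) - gxy * (gxx + gyy)
               - fxx * gxx + fyy * gyy) / omega0)

# cubic derivatives of f = -x^3 - x*y^2 and g = -x^2*y - y^3; no quadratics
val = hopf_criterion(1.0, (-6.0, -2.0), (-2.0, -6.0), (0, 0, 0), (0, 0, 0))
print(val)   # -16.0
```

The value −16 is negative, consistent with the supercritical bifurcation found for this example in Section 7.6.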
We shall prove this result with a calculation similar to the calculation of the periodic orbit for van der Pol. Can scale the van der Pol calculation to make it completely parallel.
Q: Make this an exercise with signposts?
Must revise proof. Calculate formula for μ₁. This will involve the trace of L. Maybe remark the trace must be nonzero, maybe normalize so that it's positive? Also reduce to the case where ω₀ = 1.
Look for a solution of (7.49) in a series expansion

    x(t, μ) = μ^{1/2} x₁(τ) + μ x₂(τ) + μ^{3/2} x₃(τ) + . . .     (7.52)

where x_j = (x_j, y_j) is 2π-periodic in τ = ω(μ)t with

    ω(μ) = 1 + μ^{1/2} ω₁ + μ ω₂ + . . . .
Substitute (7.52) into (7.49) and collect powers of μ:

    (a) O(μ^{1/2}):  dx₁/dτ − Ω₀x₁ = 0
    (b) O(μ):        dx₂/dτ − Ω₀x₂ = −ω₁ dx₁/dτ + Q(x₁)
    (c) O(μ^{3/2}):  dx₃/dτ − Ω₀x₃ = −ω₁ dx₂/dτ − ω₂ dx₁/dτ + Lx₁ + DQ(x₁)x₂ + C(x₁)     (7.53)

where DQ(x₁) is the 2 × 2 matrix, the differential,

    DQ(x₁) = ( f_xx x₁ + f_xy y₁    f_xy x₁ + f_yy y₁ )
             ( g_xx x₁ + g_xy y₁    g_xy x₁ + g_yy y₁ ).     (7.54)
Solution of (7.53a):

    x₁(τ) = a ( cos(τ) )
              ( sin(τ) )     (7.55)

where a is to be determined from higher-order calculations.
Solution of (7.53b): First calculate that

    Q(x₁) = (a²/2) ( f_xx cos²(τ) + 2f_xy cos(τ) sin(τ) + f_yy sin²(τ) )
                   ( g_xx cos²(τ) + 2g_xy cos(τ) sin(τ) + g_yy sin²(τ) ).

Expressing cos² and sin² in terms of double angles, we write

    Q(x₁) = a² [ q₀ + q₁ cos(2τ) + q₂ sin(2τ) ]

where q_j is the constant vector given in the first column of Table 7.3.
is the constant vector given in the rst column of Table 7.3.
For (7.53b) to have a solution, the RHS of this equation must satisfy an orthog-
onality condition; specically, for any function w() belonging to the kernel of the
transpose operator, we must have
_
2
0
RHS(), w()) d = 0.
Now since
T
= , the transpose operator is

d
d

T
= (
d
d
);
i.e., the transpose operator is just the negative of the original operator. Thus x
1
and
dx
1
/d span the kernel of the adjoint. Note that Q(x
1
) is orthogonal to both x
1
and
j    q_j                        v_j                                p_j
0    ( (f_xx + f_yy)/4 )        ( −(g_xx + g_yy)/4 )               ( (f_xx + g_xy)/2 )
     ( (g_xx + g_yy)/4 )        (  (f_xx + f_yy)/4 )               ( (f_xy + g_yy)/2 )
1    ( (f_xx − f_yy)/4 )        ( (g_xx − g_yy − 4f_xy)/12 )       ( (f_xx − g_xy)/2 )
     ( (g_xx − g_yy)/4 )        ( (−f_xx + f_yy − 4g_xy)/12 )      ( (f_xy − g_yy)/2 )
2    ( f_xy/2 )                 ( (f_xx − f_yy + g_xy)/6 )         ( (f_xy + g_xx)/2 )
     ( g_xy/2 )                 ( (g_xx − g_yy − f_xy)/6 )         ( (f_yy + g_xy)/2 )

Table 7.3: Coefficients needed for the middle term of (7.52).
dx₁/dτ. Thus the RHS of (7.53b) satisfies the orthogonality condition if and only if ω₁ = 0. Given this relation, we may look for a particular solution of (7.53b) in the form

    x₂(τ) = a² [ v₀ + v₁ cos(2τ) + v₂ sin(2τ) ]     (7.56)

where v_j is a constant vector. Substituting into (7.53b) and matching coefficients, we get the equations

    −Ω₀ v₀ = q₀
    2v₂ − Ω₀ v₁ = q₁
    −2v₁ − Ω₀ v₂ = q₂.
These have solution

    v₀ = Ω₀ q₀
    v₁ = −(1/3) [ Ω₀ q₁ + 2q₂ ]
    v₂ = (1/3) [ 2q₁ − Ω₀ q₂ ].

Taking q_j from the first column of Table 7.3, we find that v_j equals the vectors given in the second column of the table. The general solution of (7.53b) will consist of this particular solution plus a solution of the homogeneous equation. In fact the homogeneous solution, whatever it may be, will have no influence on our calculations below. Justify this below.
Solution of (7.53c): Equation (7.53c), like the equation for x₂, can be solved only if the RHS is orthogonal to x₁ and dx₁/dτ. Orthogonality with respect to dx₁/dτ provides an equation for ω₂ that is pursued in the Exercises. Orthogonality with respect to x₁ will yield an equation that determines the amplitude a of x₁; in symbols this equation is

    (1/2π) ∫₀^{2π} ⟨Lx₁, x₁⟩ + ⟨DQ(x₁)x₂, x₁⟩ + ⟨C(x₁), x₁⟩ dτ = 0     (7.57)

where we have used the facts that ω₁ = 0 and that dx₁/dτ is orthogonal to x₁. For
the first term in (7.57) we calculate from (7.55) that

    (1/2π) ∫₀^{2π} ⟨Lx₁, x₁⟩ dτ = (a²/2) tr L.

For the third term in (7.57) we calculate that

    (1/2π) ∫₀^{2π} ⟨C(x₁), x₁⟩ dτ
      = (a⁴/2π) ∫₀^{2π} [ (f_xxx/6) cos³(τ) + (f_xxy/2) cos²(τ) sin(τ) + (f_xyy/2) cos(τ) sin²(τ) + (f_yyy/6) sin³(τ) ] cos(τ) dτ
      + (a⁴/2π) ∫₀^{2π} [ (g_xxx/6) cos³(τ) + (g_xxy/2) cos²(τ) sin(τ) + (g_xyy/2) cos(τ) sin²(τ) + (g_yyy/6) sin³(τ) ] sin(τ) dτ.     (7.58)

All integrals with an odd power of sin or cos vanish, and

    (1/2π) ∫₀^{2π} cos⁴(τ) dτ = (1/2π) ∫₀^{2π} sin⁴(τ) dτ = 6/16,
    (1/2π) ∫₀^{2π} cos²(τ) sin²(τ) dτ = 2/16.
Hence (7.58) reduces to

    (1/2π) ∫₀^{2π} ⟨C(x₁), x₁⟩ dτ = (a⁴/16)(f_xxx + f_xyy + g_xxy + g_yyy).     (7.59)
For the middle term in (7.57), it is convenient to take a transpose:

    (1/2π) ∫₀^{2π} ⟨DQ(x₁)x₂, x₁⟩ dτ = (1/2π) ∫₀^{2π} ⟨x₂, DQᵀ(x₁)x₁⟩ dτ.

We have already calculated x₂. Substituting into (7.54) we find

    DQᵀ(x₁)x₁ = a² ( f_xx cos(τ) + f_xy sin(τ)    g_xx cos(τ) + g_xy sin(τ) ) ( cos(τ) )
                   ( f_xy cos(τ) + f_yy sin(τ)    g_xy cos(τ) + g_yy sin(τ) ) ( sin(τ) ).
Multiplying this out and using the double-angle formulas, we obtain

    DQᵀ(x₁)x₁ = a² [ p₀ + p₁ cos(2τ) + p₂ sin(2τ) ]

where p_j is given in the third column of Table 7.3. Recalling (7.56) for x₂ and performing the trivial trigonometric integrals, we see that the middle term in (7.57) equals

    (1/2π) ∫₀^{2π} ⟨x₂, DQᵀ(x₁)x₁⟩ dτ = a⁴ [ ⟨v₀, p₀⟩ + ⟨v₁, p₁⟩/2 + ⟨v₂, p₂⟩/2 ].
Finally, taking v_j, p_j from Table 7.3 we may compute that

    (1/2π) ∫₀^{2π} ⟨x₂, DQᵀ(x₁)x₁⟩ dτ = (a⁴/16) [ f_xy(f_xx + f_yy) − g_xy(g_xx + g_yy) − f_xx g_xx + f_yy g_yy ].     (7.60)
This calculation isn't as bad as it might seem. Note that there are six independent coefficients in Q(x). In principle a quadratic form in these six coefficients might have 6 · 7/2 = 21 independent terms. However, note in the Table that in the inner product, coefficients in the group f_xx, g_xy, f_yy never multiply another coefficient in the same group, only coefficients from the complementary group g_xx, f_xy, g_yy. Thus there are only 3 × 3 = 9 possible independent terms, and it may be seen from (??) that three of the nine vanish and the others all reduce to ±1/16.
Put the three terms together to get the desired result.
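Readers who distrust the bookkeeping can verify (7.60) symbolically from the second and third columns of Table 7.3. A sketch, assuming sympy:

```python
import sympy as sp

fxx, fxy, fyy, gxx, gxy, gyy = sp.symbols('fxx fxy fyy gxx gxy gyy')
v = [sp.Matrix([-(gxx+gyy)/4, (fxx+fyy)/4]),
     sp.Matrix([(gxx-gyy-4*fxy)/12, (-fxx+fyy-4*gxy)/12]),
     sp.Matrix([(fxx-fyy+gxy)/6, (gxx-gyy-fxy)/6])]
p = [sp.Matrix([(fxx+gxy)/2, (fxy+gyy)/2]),
     sp.Matrix([(fxx-gxy)/2, (fxy-gyy)/2]),
     sp.Matrix([(fxy+gxx)/2, (fyy+gxy)/2])]

middle = sp.expand(v[0].dot(p[0]) + v[1].dot(p[1])/2 + v[2].dot(p[2])/2)
target = (fxy*(fxx+fyy) - gxy*(gxx+gyy) - fxx*gxx + fyy*gyy) / 16
print(sp.simplify(middle - target))   # -> 0
```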
7.8 Bifurcation in the FitzHugh-Nagumo equations
Add something about saddle-node bifurcation before discussing Hopf?
Let us recall the FitzHugh-Nagumo equations from Chapter 6:

    x′ = x²(1 − x) − y + I
    y′ = ε(x − γy).     (7.61)
We saw in that chapter that if γ < 3, then (7.61) has a unique equilibrium for every value of I. This equation exhibits relaxation oscillations if 0 < I < 2/27, provided ε is sufficiently small, and the periodic orbit encloses a repelling equilibrium. On the other hand, (7.61) has a globally attracting equilibrium if I < 0 or I > 2/27. Not proved that it's globally attracting, just stable. As I varies, the transition between these different behaviors is effected through a Hopf bifurcation; see the numerical simulations presented in Figure ??. Here we study the bifurcation that occurs for I close to zero, and our main result is:
Main claim: Equation (7.61) undergoes a Hopf bifurcation at I_bif = ε/2 + O(ε²) that is supercritical if 0 < γ < 3/2 and subcritical if 3/2 < γ < 3.
A completely analogous bifurcation occurs for I close to 2/27. Incidentally, as Figure ?? shows, the periodic solutions bifurcate from an equilibrium that varies (smoothly) with I; this also happened above with (7.41).
Proof of claim: We may eliminate y from the equilibrium equations associated with (7.61), obtaining

    x_eq³ − x_eq² + x_eq/γ = I.     (7.62)

This equation defines the equilibrium value of x as a function of I, and its graph is the backbone of Figure ??. Actually it is more convenient to regard x_eq as the independent variable and compute I from (7.62). Thus, we take α, the equilibrium value of x, as the bifurcation parameter. Following (7.62) we define

    I(α) = α³ − α² + α/γ     (7.63)
and rewrite (7.61) as

    x′ = x²(1 − x) − y + I(α)
    y′ = ε(x − γy).     (7.64)

In terms of incremental variables x − α, y − α/γ (still denoted x, y) we may rewrite (7.64) as

    x′ = (x² + 2αx) − (x³ + 3αx² + 3α²x) − y
    y′ = ε(x − γy);     (7.65)

all terms that are independent of x, y cancel, so for all α, x = y = 0 is an equilibrium solution of (7.65). The differential of the equation at this equilibrium is

    DF = ( 2α − 3α²   −1 )
         ( ε          −εγ ).     (7.66)
The determinant of this matrix is ε times γ(3α² − 2α) + 1; provided γ < 3 this quadratic form is positive definite; thus, det DF > 0. The system undergoes a Hopf bifurcation if

    tr DF = −3α² + 2α − εγ = 0,     (7.67)

which may be solved to yield

    α_bif = (1 − √(1 − 3εγ))/3 = εγ/2 + O(ε²),
where we have chosen the minus sign, which gives the bifurcation near I = 0. We define β = α − α_bif and substitute into (7.65) to obtain, modulo terms that are O(βx², β²x),

    x′ = (2α_bif − 3α_bif²)x − y + (2 − 6α_bif)βx + (1 − 3α_bif)x² − x³
    y′ = ε(x − γy);     (7.68)

here we have listed the terms on the RHS in the order

    linear in x, y;  O(βx);  O(x²);  O(x³).
In preparation for applying criterion (7.51), we rewrite (7.68) in matrix notation

    ( x )′              ( x )   ( (1 − 3α_bif)x² )   ( −x³ )
    ( y )  = (A + βB)  ( y ) + (       0        ) + (  0  )     (7.69)

where

    A = ( εγ   −1 )        B = ( 2 − 6α_bif   0 )
        ( ε   −εγ ),           ( 0            0 ).
We need to reduce A to real canonical form as in (7.49). Thus we define

    S = ( 1    0 )
        ( εγ   ω )

(Explain choice: we want the x-coordinate to be unchanged, to remain simple.) Compute

    S⁻¹AS = ( 0  −ω )
            ( ω   0 )

where ω = √(ε(1 − εγ²)), and

    S⁻¹BS = ( 2 − 6α_bif              0 )
            ( −(εγ/ω)(2 − 6α_bif)    0 ).
Thus application of this similarity transformation yields the normal-form equation (7.49) with Ω₀ = S⁻¹AS, L = S⁻¹BS,

    Q = (1 − 3α_bif) (     x²     )        C = (    −x³    )
                     ( −(εγ/ω)x² ),            ( (εγ/ω)x³  ).

Now ω = √ε (1 + O(ε)), and using the notation of Section 7.6, we have

    f_xx = 2 + O(ε),   g_xx = −2γ√ε (1 + O(ε)),   f_xxx = −6,   g_xxx = O(√ε),
while all other derivatives in (7.51) vanish. Substituting into this equation we compute

    f_xxx − (1/ω₀) f_xx g_xx = −6 + 4γ + O(√ε),

which proves the claim.
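The main claim can also be probed numerically. The sketch below (ε, γ, I as above; the numerical values are illustrative) integrates (7.61) just above and just below I_bif ≈ ε/2 for γ = 1 < 3/2, the supercritical case; scipy is assumed:

```python
import numpy as np
from scipy.integrate import solve_ivp

def fhn(t, u, I, eps, gamma):
    x, y = u
    return [x**2 * (1 - x) - y + I, eps * (x - gamma * y)]

eps, gamma = 0.01, 1.0            # I_bif ~ eps/2 = 0.005
for I in (0.008, 0.002):          # above and below the bifurcation
    sol = solve_ivp(fhn, (0, 4000), [0.0, 0.0], args=(I, eps, gamma),
                    rtol=1e-9, max_step=1.0)
    x_tail = sol.y[0, sol.t > 3000]
    print(f"I = {I}: x-amplitude = {x_tail.max() - x_tail.min():.4f}")
# sustained oscillations for I just above eps/2, decay to equilibrium for
# I just below it, as the supercritical claim predicts.
```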
Concluding remark: Canard explosion. Also note: the period is O(1/√ε) near the bifurcation point, grows to O(1/ε) in relaxation oscillations.
7.9 Exercises
Subcritical bifurcation: Euler strut supported from the side.
Q: introduce minimizing energy as a technique to derive the equilibrium equation.
Bioswitch example, Ellner and Guckenheimer page 120, making equations symmetric in u and v:

    x′ = α/(1 + yⁿ) − x
    y′ = α/(1 + xⁿ) − y.

Bistability: through a supercritical bifurcation if the parameter increases through positive values, through a saddle-node bifurcation if it decreases through large negative values.
For (7.11), verify the claimed value of ρ for the bifurcation point. To do this, write the equation for equilibria (with r eliminated) as

    [(σ + 1)x² − ρx + 1] / (1 + x²) = 0;

i.e., put 1 + x² in the denominator. When there is one equilibrium, the numerator is positive for all x; at the bifurcation, the numerator becomes a perfect square.
Rosenzweig-MacArthur (after scaling). Note: Different scaling used here, based on carrying capacity. Connect to the nonunique-scaling idea above.

    v′ = v(1 − v) − xv/(1 + Sv)
    x′ = Exv/(1 + Sv) − Dx.

(Change letters?) Find steady-state solutions: Forget (0, 0). Trivial solution (1, 0). Stable for large D, loses stability. Find the nontrivial solution branch that bifurcates at the point where the trivial solution loses stability. Show it is stable nearby. Caveat.
Show there is a Hopf bifurcation for still smaller D.
Give alternate conditions for an eigenvalue λ to be simple: ker ∩ range = {0}; dim ker(A − λI)² is one.
Note there is an energy function for the rotating bead, provided you include the rotational energy.
The CSTR: continuous stirred-tank reactor (make exercise). Start with the dimensional equation. Do dimensional analysis, scaling. Obtain scaled equations:

    c′ = β(1 − c) − c e^T
    T′ = −βT + H c e^T.

Assume H is large. Find equilibria as a function of β. Show a unique steady-state solution for β large or small, three steady-state solutions in between.
Rmk: If H = 4, there is a hysteresis point. If H > 4, there is an interval of β with 3 solutions; if H < 4, no such interval. Q: Make an Exercise out of this?
Hopf bifurcation for equilibria of Lorenz at large ρ.
Rossler ODE has a saddle-node bifurcation before all the period-doubling action starts.
Hopf bifurcation in the repressilator. (See junk at end of TeX file.)
7.10 Ideas
Other examples? E.g., Selkov's glycolysis model? Nowak's virus model?
Appendix: Do eigenvalues of a matrix depend continuously on the entries? Reference for proofs.
What about degenerate bifurcation, codim 2? Combine a couple of the above? Ex: buckling of a spring-beam. Something in Morris-Lecar?
Terms to give intuition about: unfolding, robust, generic (= what produces only robust behavior).
Construct a coordinate transformation to show one can push some terms out to higher order.
Give an intuitive argument about how higher-order terms (x³, 0) or (x², x²) stabilize the basic Hopf equation. Explain why quadratic terms are more potent at high rotation frequency.
Chapter 8
Global bifurcations
In this chapter we present examples of five types of global bifurcations. Some of these bifurcations are actually present in equations considered above, and we begin with these. Unfortunately, with this order of presentation, the examples that exhibit more interesting behavior are pushed to later on the list. For most cases, we begin the discussion of a type of bifurcation with an academic (pedagogical?) example and then proceed to examples with more physical interest.
Overall remark: Even more dependent on examples and calculations than in previous chapters. Can't prove much of anything.
8.1 Mutual annihilation of two limit cycles
8.1.1 An academic example

    r′ = (r − 1)² + μ
    θ′ = 1.     (8.1)

If −1 < μ < 0, there are two periodic orbits, r = 1 ± √(−μ), the inner one stable and the outer one unstable. If μ > 0, there are none. This bifurcation is completely analogous to saddle-node bifurcation of equilibria, even though it is nonlocal. Explain why we call it nonlocal.
8.1.2 The FitzHugh-Nagumo equations
If we have a subcritical Hopf bifurcation and the bifurcating limit cycle turns around, an unstable cycle becomes stable. Refer to Chapter 7.
8.1.3 Phase locking in coupled oscillators
Consider equations on the torus T², with coordinates φ, ψ. (Could of course pose on R⁴, using polar coordinates on each copy of R².)

    φ′ = ω₁ + K₁ sin(ψ − φ)
    ψ′ = ω₂ + K₂ sin(φ − ψ).

Interpret (from Strogatz): two runners on a circular track. Each has a natural speed, and these are different, but they place a value on running together.
Find solutions, phase-locked if the difference |ω₁ − ω₂| is small enough. Note there is a mutual-annihilation bifurcation at the edge of the inequality above.
Warning: If the inequality is not satisfied, the behavior is a lot more complicated than one might guess. See the Appendix on ODEs on the torus.
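A sketch of the phase-locking computation (ω_i, K_i as in the equations above; the numerical values are illustrative and satisfy |ω₁ − ω₂| < K₁ + K₂):

```python
import numpy as np
from scipy.integrate import solve_ivp

def runners(t, u, w1, w2, K1, K2):
    phi, psi = u
    return [w1 + K1 * np.sin(psi - phi), w2 + K2 * np.sin(phi - psi)]

w1, w2, K1, K2 = 1.0, 1.3, 0.2, 0.2   # 0.3 < 0.4: locking expected
sol = solve_ivp(runners, (0, 200), [0.0, 1.0], args=(w1, w2, K1, K2),
                rtol=1e-9)
print(sol.y[1, -1] - sol.y[0, -1])
# the phase difference settles to the constant arcsin(0.3/0.4) ~ 0.85
```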
8.2 Saddle-node bifurcation points on a limit cycle
8.2.1 An academic example

    r′ = 1 − r
    θ′ = μ − cos θ.

Bifurcation occurs at μ = 1. Describe.
Q: Add (r − 1)² to the θ-equation? Connect with the example of an equilibrium that is attracting but not stable.
8.2.2 The overdamped torqued pendulum
Recall the torqued pendulum from Chapter 7:

    x′ = y
    y′ = −sin x − βy + μ.     (8.2)

We showed that this system undergoes a saddle-node bifurcation at μ = 1: as μ crosses 1, the two solutions of

    sin x = μ

merge and disappear into the complex domain. Let us suppose that β is large: i.e., there is a lot of damping. Argue that for μ < 1 the two halves of the unstable manifold from the saddle point fall into the stable equilibrium, one being very short and the other making a nearly complete revolution. Together these make up a homoclinic cycle. When μ crosses 1, the equilibria disappear and the homoclinic cycle becomes a limit cycle.
8.2.3 Other examples
Show this exists in FitzHugh-Nagumo? In the dumbed-down Morris-Lecar

    v′ = C(v) − w + I
    w′ = ε[γ e(v − v₀) − w].

Here C is an appropriate cubic, perhaps C(v) = v²(1 − v) or C(v) = v(1 − v²), and e is an elbow function,

    e(v) = [√(v² + 1) + v]/2.

The model has 4 parameters: I, ε, γ, v₀.
8.3 Homoclinic bifurcation
8.3.1 van der Pol with nonlinearity in the restoring force
In this case there is a physical example that already is simple enough to exhibit the phenomenon, so there is no need to begin with an academic example:

    x′ = y
    y′ = −β(x² − 1)y − x − μx².     (8.3)

Difference from van der Pol: the linear restoring force is perturbed by the nonlinear term μx².
If μ is small, the periodic orbit of van der Pol continues to exist, being only slightly perturbed. Argue or compute to show: If μ is increased sufficiently, the periodic orbit disappears. Argue that it disappears through a homoclinic bifurcation; must define this term.
Do dimension counting to show a homoclinic orbit requires a special value of the parameter.
8.3.2 The torqued pendulum with small damping
The torqued pendulum considered above, (8.2), also exhibits a homoclinic bifurcation as the torque μ is increased, provided the damping β is small. Show this. Maybe it's clearer to consider decreasing μ, from a situation in which the periodic orbit exists to one where it disappears. Rmk: Phase space is S¹ × R.
Q: At what point does the transition between saddle-node-on-a-limit-cycle and homoclinic bifurcation occur? Show the diagram in Strogatz of the figure taken from the paper by Levi.
8.3.3 The Lotka-Volterra model with logistic growth and the Allee effect
Strictly speaking we should call this a homoclinic-cycle bifurcation. (Recall homoclinic cycles in the Poincare-Bendixson Theorem.)
Recall the equations.
We know an unstable limit cycle appears as K decreases, but where does it go? It is not present at K = 1 when the coexistence equilibrium meets the prey-only equilibrium at (K, 0). Show the homoclinic cycle, involving connecting orbits between the saddle points at (ε, 0) and (K, 0). Compute the value of K at the bifurcation.
8.3.4 Other examples
FitzHugh-Nagumo or dd-Morris-Lecar?
Appearance of a periodic orbit in Lorenz; part of the onset of chaos.
Example of bifurcation from a saddle point to an unstable limit cycle, like in Lorenz? Important! (There is an example in my notes, between lectures 24 and 25.)
8.4 Hopf-like bifurcation to an invariant torus
Can describe with the Poincare map, eigenvalue = e^{iθ}. Note the analogy with Hopf bifurcation of an equilibrium.
8.4.1 An academic example
Pose on R³. Define coordinates (create name), starting from cylindrical coordinates r, θ, z. Specifically, θ unchanged; use polar coordinates to describe the r, z-plane with origin at (r, z) = (1, 0):

    r − 1 = ρ cos χ,   z = ρ sin χ,

or the inverse

    ρ = √((r − 1)² + z²),   χ = arctan(z/(r − 1)).

Give figure. Specify range of variables. Consider

    θ′ = 1
    ρ′ = μρ − ρ³
    χ′ = ω + ρ².

Discuss behavior as μ crosses 0: the ρ, χ-subsystem undergoes a Hopf bifurcation. But orbits are located on a torus; i.e., θ evolves along with ρ, χ.
Discuss closed orbits and skew lines on T².
Warning: Typically (generically) the behavior of orbits on T² is not so simple; see the Appendix. For example, if we added a term K sin(χ − θ) to the χ-equation, we would get the kind of behavior described in the Appendix.
Rmk: Example of quasi-periodic behavior in Strogatz, motion in a central force field, Strogatz, problem 8.6.7 on p295.
8.4.2 The forced van der Pol equation

    x″ + β(x² − 1)x′ + x = a cos(ωt).

If the forcing is large, there is a unique stable periodic orbit with period 2π/ω. (Cf. forcing in the linear case.) Expect that if a decreases, the periodicity of the unperturbed system will play a role. Confirm with computations. (Use Section 2.1 of Guckenheimer-Holmes as a guide in parameter choice, at least if weakly nonlinear.)
This system is considered in much greater detail in the Appendix, ODEs on a torus. Is it really? Not just used to motivate the idea of an ODE on the torus?
Incidentally, this type of bifurcation occurs in many fluid-mechanics problems (PDE). Part of the evolution towards turbulence at large Reynolds number. Reference??
8.5 Period doubling
Note: can also describe with the Poincare map, eigenvalue = −1. (Strictly speaking there is a third case, in which the eigenvalue of the Poincare map equals +1. Not very interesting. See Exercises.)
8.5.1 An academic example
Cylindrical coordinates, but still with r = 1, z = 0 as a periodic orbit:

    θ′ = 1
    ( r − 1 )′
    (   z   )  = ( Ω − I + μA(θ) ) ( r − 1 )
                                   (   z   )

where I is the 2 × 2 identity matrix,

    Ω = (  0   −1/2 )        A(θ) = ( cos²(θ/2)            sin(θ/2) cos(θ/2) )
        ( 1/2    0  ),              ( sin(θ/2) cos(θ/2)    sin²(θ/2)         ).

Explain how one rotating direction is amplified while the orthogonal direction is damped. Write down explicit solutions.
Phenomenon: new periodic solutions with period 4π.
8.5.2 Rossler's equation
Should have introduced this in the Exercises in Chapter 7. It exhibits both saddle-node and Hopf bifurcations.
No physical basis, just the ingenuity of Rossler.

    x′ = −y − z
    y′ = x + ay
    z′ = b + z(x − c).     (8.4)

Numerical solutions. Period doubling after the Hopf bifurcation. In fact, an infinite sequence of these!
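A crude numerical probe of the first doubling in (8.4), using successive maxima of x(t). The values of c are illustrative, and a = b = 0.2 are Rossler's classical choices; scipy assumed:

```python
import numpy as np
from scipy.integrate import solve_ivp

def rossler(t, u, a, b, c):
    x, y, z = u
    return [-y - z, x + a * y, b + z * (x - c)]

for c in (2.5, 3.5):              # before and after the first doubling
    sol = solve_ivp(rossler, (0, 1000), [1, 1, 0], args=(0.2, 0.2, c),
                    rtol=1e-9, dense_output=True)
    t = np.linspace(800, 1000, 200000)
    x = sol.sol(t)[0]
    peaks = x[1:-1][(x[1:-1] > x[:-2]) & (x[1:-1] > x[2:])]
    print(f"c = {c}: distinct peak heights ~ {np.unique(np.round(peaks, 2))}")
# roughly one peak value for c = 2.5 (period-1), two for c = 3.5 (period-2)
```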
Q: What discussion of period doubling in 1-dim maps? An example one can handle explicitly:

    x_{n+1} = μ tanh(x_n),

where μ passes through −1. Mention period doubling in the logistic map? Do something with alternans?
8.6 Appendix: ODEs on a torus
Example: Forced van der Pol. Discuss this problem through the Poincare map.
Discuss Arnold's example in Wiggins, Chapter 21. Get Arnold tongues. (Rmk: are the boundaries of the tongues examples of mutual-annihilation bifurcation?)
8.7 Appendix: What is chaos?
Mention Fourier analysis. Definitely discuss the transition to chaos in Lorenz (here or somewhere else?). Mention chaos in the forced Duffing equation?
8.8 Ideas
Bursting.
What on fast-slow systems?
Appendix A
Guide to Commonly Used Notation

Symbol          Usual Meaning
R^d             d-dimensional Euclidean space
ε, μ            real-valued parameters
t               independent variable (often corresponds to time)
x               vector of dependent variables for a system of ODEs
b               vector of initial conditions for a system of ODEs
A, B, S, J      matrices (usually square and of dimension d × d)
Λ               a diagonal matrix
N               a nilpotent matrix
tr(A)           trace of the square matrix A
det(A)          determinant of the square matrix A
‖A‖             norm of a matrix A; see Equation (2.12)
λ               eigenvalue of a square matrix
F               vector-valued function of a vector (often F : R^d → R^d)
Appendix B
Notions from Advanced Calculus

Supremum and infimum: If E ⊂ R, a number b is called an upper bound for E if x ≤ b for all x ∈ E. We say that b is the supremum of E if b is the least upper bound for E in the sense that (i) b is an upper bound for E, and (ii) if b̃ is an upper bound for E, then b ≤ b̃. The infimum of E is defined analogously as the greatest lower bound for E. The supremum and infimum of E ⊂ R are denoted by sup(E) and inf(E), respectively. If, for example, E = {1, 1/2, 1/3, 1/4, . . .} ⊂ R, then sup(E) = 1 and inf(E) = 0. Notice that in this example, sup(E) ∈ E whereas inf(E) ∉ E. When a set E contains its supremum, we typically write sup(E) = max(E), the maximum of the set E. Likewise, if inf(E) ∈ E we may write inf(E) = min(E), the minimum of the set E. If a < b, then the open interval (a, b) and the closed interval [a, b] have the same infimum and supremum (a and b, respectively), whereas only the closed interval has a maximum and a minimum.
Compactness: Let E ⊂ R^d. An open cover of E is any collection {U_α}, α ∈ I, of open subsets of R^d with the property that

    E ⊂ ∪_{α∈I} U_α.

This union of sets need not be countable, and for that reason we have indexed the sets using real numbers α drawn from some index set I. The set E is called compact if every open cover of E has a finite subcover. That is, E is compact if from every open cover of E, we may select finitely many open sets whose union still contains E.
The same definition of compactness applies to subsets of spaces that are more abstract than Euclidean space R^d. Luckily for us, spotting compact subsets of R^d is much easier than the above definition would suggest:
Theorem B.0.1. (Heine-Borel) A subset E ⊂ R^d is compact if and only if E is
closed and bounded.
We emphasize that the Heine-Borel theorem is a luxury associated with working in Euclidean space. In any metric space, compactness always implies closedness and boundedness, but the reverse implication need not hold.
Many of the technical proofs throughout this text require us to estimate functions, their derivatives, or their integrals, over some given subset of R^d. Working with continuous functions over compact domains can facilitate such estimates, and the following two theorems are invaluable.
Theorem B.0.2. Suppose that E ⊂ R^{d₁} is compact (i.e., closed and bounded) and F : E → R^{d₂} is continuous. Then the image F(E) is a compact subset of R^{d₂}.
Phrased more compactly (bad pun intended), images of compact sets under continuous functions are compact.
Theorem B.0.3. (Extreme Value Theorem) Suppose that E ⊂ R^d is compact and F : E → R is continuous. Then there exist points x_min, x_max ∈ E such that

    F(x_min) = inf_{x∈E} F(x)  and  F(x_max) = sup_{x∈E} F(x).

The extreme value theorem guarantees that continuous functions F : E → R always achieve maximum and minimum values over compact sets E. Regarding the infimum and supremum in the statement of Theorem B.0.3, we typically refer to these values as the [absolute] minimum and maximum, respectively, of F over the compact set E.
For estimates involving continuous vector-valued functions, a slight adaptation of Theorem B.0.3 will aid us:
Corollary B.0.4. Suppose that E ⊂ R^{d₁} is compact and F : E → R^{d₂} is continuous. Then there exist points x_min, x_max ∈ E such that

    |F(x_min)| = inf_{x∈E} |F(x)|  and  |F(x_max)| = sup_{x∈E} |F(x)|.

To prove the Corollary, define the function G : R^{d₂} → R according to the rule G(y) = |y|. Since F and G are continuous, then so is the composition (G ∘ F) : E → R. The corollary follows upon applying the extreme value theorem to (G ∘ F).
Sequences: Suppose {a_n}, n = 1, 2, 3, . . . , is a sequence of points in R^d. We say that a_n converges to a limit L if for every ε > 0 there exists an integer N = N(ε) such that |a_n − L| < ε whenever n ≥ N.
There are many different notions of convergence for sequences of functions, two of which we single out for use throughout this text. Suppose that {f_n}, n = 1, 2, 3, . . . , is a sequence of functions defined over some set E ⊂ R^d. We say that f_n converges
pointwise on E to a function f if for every > 0 and x E, there exists an
integer N = N(x, ) such that [f
n
(x) f(x)[ < whenever n N;
uniformly on E to a function f if for every > 0, there exists an integer
N = N() such that whenever n N, [f
n
(x) f(x)[ < for all x E.
Take a moment to contrast the denitions of pointwise and uniform convergence.
Importantly, notice that in the denition of pointwise convergence, the integer N
can depend upon both and x, whereas in the denition of uniform convergence,
N depends only upon . For uniform convergence, the same integer N has to work
for all x E, and it is evident that uniform convergence automatically implies
pointwise convergence. On the other hand, pointwise convergence of a sequence of
functions does not imply uniform convergence. Consider, for example, the functions
f
n
: [0, 2] R dened by
f
n
(x) =
x
n
1 + x
n
, (n = 1, 2, 3, . . . ).
We claim that this sequence converges pointwise but not uniformly to the function
f(x) =
_

_
1 if x (1, 2],
1
2
if x = 1,
0 if x [0, 1).
First, suppose that $x \in (0, 1)$, where we have excluded the endpoints because pointwise convergence is obvious at those two points. Given $\epsilon > 0$, we must produce an integer $N = N(x, \epsilon)$ for which
$$|f_n(x) - f(x)| = \frac{x^n}{1 + x^n} < \epsilon \quad \text{for all } n \ge N.$$
To motivate our choice of $N$, use algebra to rewrite this inequality as
$$n > \frac{\ln\left(\epsilon/(1 - \epsilon)\right)}{\ln x},$$
where we have used the fact that $\ln x < 0$ since $x \in (0, 1)$. (We may as well assume $\epsilon < 1$, since $0 < f_n(x) < 1$ for all $x \in (0, 1)$; this allows us to dodge the possibility of dividing by zero when we choose $N(x, \epsilon)$.) Choosing
$$N(x, \epsilon) = \left\lceil \frac{\ln\left(\epsilon/(1 - \epsilon)\right)}{\ln x} \right\rceil,$$
where $\lceil y \rceil$ denotes the least integer not less than $y$, we have established pointwise convergence over the interval $0 \le x \le 1$. The same sort of algebraic manipulations can be used to prove that $f_n(x) \to f(x)$ pointwise for $x \in (1, 2]$. For $x$ in that interval, the reader is encouraged to produce an integer $N = N(x, \epsilon)$ such that
$$|f_n(x) - f(x)| = 1 - \frac{x^n}{1 + x^n} = \frac{1}{1 + x^n} < \epsilon \quad \text{for all } n \ge N.$$
(This time, keep in mind that ln x > 0.)
To see why the convergence is not uniform, refer to the above calculation of $N(x, \epsilon)$ for $x \in (0, 1)$. Our choice of $N = N(x, \epsilon)$ is best possible in the sense that $N$ is the least integer for which the inequality $|f_n(x) - f(x)| < \epsilon$ holds for each $n \ge N$. Again assuming $\epsilon < 1$ (as above), letting $x$ approach 1 would force us to choose $N$ larger and larger, implying that the convergence is not uniform.
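A quick numerical experiment makes the failure of uniform convergence vivid. The following sketch (our addition, not part of the original text) evaluates the sup-norm error $\sup_{x \in [0,2]} |f_n(x) - f(x)|$ on a fine grid; it hovers near $1/2$ no matter how large $n$ becomes, even though the error at any fixed $x$ tends to zero.

```python
import numpy as np

def f_n(x, n):
    # x^n/(1 + x^n), guarding against overflow of x**n for x > 1, large n
    with np.errstate(over="ignore", invalid="ignore"):
        t = x**n
        return np.where(np.isinf(t), 1.0, t / (1.0 + t))

def f(x):
    # the pointwise limit computed above
    return np.where(x > 1.0, 1.0, np.where(x < 1.0, 0.0, 0.5))

x = np.linspace(0.0, 2.0, 100001)
for n in (1, 10, 100, 1000):
    print(n, np.max(np.abs(f_n(x, n) - f(x))))   # stays near 0.5 for every n
```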
The limiting function $f(x)$ in the preceding example is discontinuous even though the functions $f_n(x)$ are continuous. The next theorem ensures that this cannot happen for uniformly convergent sequences of continuous functions.
Theorem B.0.5. If $f_n$ is a sequence of continuous functions which converges uniformly to $f(x)$ on a set $E \subseteq \mathbb{R}^d$, then $f(x)$ is continuous on $E$.
If the functions $f_n$ in the statement of Theorem B.0.5 happen to be differentiable, we might wonder whether the sequence $f_n'$ converges to $f'$. A cautionary example puts that line of questioning to rest: the sequence of differentiable functions
$$f_n(x) = \frac{\arctan(nx)}{\sqrt{n}}$$
converges uniformly to the constant function $f(x) = 0$ over the entire real line. However, the derivatives
$$f_n'(x) = \frac{\sqrt{n}}{1 + n^2 x^2}$$
have the unfortunate property that $f_n'(0) = \sqrt{n} \to \infty$ as $n \to \infty$, whereas $f'(0) = 0$.
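A short computation confirms both halves of the cautionary example (a sketch we have added; the bound $(\pi/2)/\sqrt{n}$ is immediate from $|\arctan| \le \pi/2$): the sup-norm of $f_n$ shrinks like $1/\sqrt{n}$ while $f_n'(0) = \sqrt{n}$ grows without bound.

```python
import numpy as np

x = np.linspace(-10.0, 10.0, 200001)
for n in (1, 100, 10000):
    f_n = np.arctan(n * x) / np.sqrt(n)
    # sup |f_n - 0| -> 0, while f_n'(0) = sqrt(n) -> infinity
    print(n, np.max(np.abs(f_n)), np.sqrt(n))
```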
With slightly stronger conditions on the sequence $f_n$, we can draw conclusions regarding uniform limits of sequences of differentiable functions:
Theorem B.0.6. Suppose that $f_n$ is a sequence of differentiable functions on $[a, b]$ and that for some point $x_0 \in [a, b]$, the sequence $f_n(x_0)$ converges. If $f_n'$ converges uniformly on $[a, b]$, then $f_n$ converges uniformly on $[a, b]$ to a function $f$, and
$$f'(x) = \lim_{n \to \infty} f_n'(x) \qquad (a \le x \le b).$$
Under weaker hypotheses, an analogous result holds for integrals:
Theorem B.0.7. Suppose that $f_n$ is a sequence of integrable functions on $[a, b]$ and that $f_n$ converges uniformly to $f$ on $[a, b]$. Then $f$ is integrable on $[a, b]$ and
$$\int_a^b f(x)\, dx = \lim_{n \to \infty} \int_a^b f_n(x)\, dx.$$
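For instance, the arctan sequence above converges uniformly to 0 on $[0, 1]$, so Theorem B.0.7 predicts that its integrals tend to 0. A quick check (an illustrative sketch we have added, using scipy's quadrature) agrees.

```python
import numpy as np
from scipy.integrate import quad

for n in (1, 10, 100, 1000):
    val, _ = quad(lambda x: np.arctan(n * x) / np.sqrt(n), 0.0, 1.0)
    print(n, val)   # tends to 0, the integral of the uniform limit
```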
Series: Suppose $a_n \in \mathbb{R}^d$, $n = 1, 2, 3, \dots$ and consider the infinite series
$$\sum_{n=1}^{\infty} a_n.$$
The $m$th partial sum of the series is the finite sum $S_m = \sum_{n=1}^{m} a_n$. We say that the infinite series converges to a limit $L$ if the sequence of partial sums $S_m$ converges to $L$, and in this case we shall write $\sum_{n=1}^{\infty} a_n = L$. The infinite series converges absolutely if the series
$$\sum_{n=1}^{\infty} |a_n|,$$
whose terms are non-negative scalars, converges.
Absolute convergence of a series is a stronger property than [ordinary] convergence, as we now state more precisely.
Proposition B.0.8. If the series
$$\sum_{n=1}^{\infty} a_n \qquad (a_n \in \mathbb{R}^d)$$
converges absolutely, then the series converges.
Proof. Assuming the hypothesis of the proposition, let $\sum_{n=1}^{\infty} |a_n| = L$ (a scalar). Given $\epsilon > 0$, choose an integer $N_\epsilon = N_\epsilon(\epsilon)$ large enough that
$$L - \sum_{n=1}^{N_\epsilon} |a_n| = \sum_{n=N_\epsilon+1}^{\infty} |a_n| < \epsilon.$$
We claim that the sequence of partial sums of $\sum_{n=1}^{\infty} a_n$ is Cauchy, and therefore convergent since $\mathbb{R}^d$ is complete. Choose integers $M, N$ larger than $N_\epsilon$ and consider the partial sums
$$S_M = \sum_{n=1}^{M} a_n \quad \text{and} \quad S_N = \sum_{n=1}^{N} a_n.$$
Assuming without loss of generality that $N > M$, the estimate
$$|S_N - S_M| = \left|\sum_{n=M+1}^{N} a_n\right| \le \sum_{n=M+1}^{N} |a_n| \le \sum_{n=N_\epsilon+1}^{\infty} |a_n| < \epsilon$$
shows that the sequence of partial sums is Cauchy, completing the proof.
Theorems B.0.5 through B.0.7 have immediate corollaries involving series of functions. If $\sum_{n=1}^{\infty} f_n(x)$ is an infinite series of functions defined over an interval $[a, b]$, the $m$th partial sum is the finite sum $S_m(x) = \sum_{n=1}^{m} f_n(x)$. We say that the infinite series $\sum_{n=1}^{\infty} f_n(x)$ converges pointwise to $S(x)$ on $[a, b]$ if the sequence $S_m(x)$ converges pointwise to $S(x)$. Likewise, we say that the series converges uniformly to $S(x)$ on $[a, b]$ if the sequence $S_m(x)$ converges uniformly to $S(x)$.
Corollary B.0.9. Suppose that $\sum_{n=1}^{\infty} f_n(x)$ converges uniformly to $S(x)$ on a set $E \subseteq \mathbb{R}^d$ and that each term $f_n(x)$ is continuous on $E$. Then $S(x)$ is continuous on $E$.
The next two corollaries provide conditions under which we are justified in differentiating or integrating an infinite series term-by-term.
Corollary B.0.10. Suppose that $\sum_{n=1}^{\infty} f_n(x)$ is a sum of differentiable functions on the interval $[a, b]$ and suppose that the partial sums $S_m(x)$ obey the hypotheses of Theorem B.0.6. That is, assume that the sequence $S_m(x_0)$ converges for some choice of $x_0 \in [a, b]$, and that $S_m'(x)$ converges uniformly on $[a, b]$. Then $S_m$ converges uniformly to a function $S(x)$, and $S'(x) = \lim_{m \to \infty} S_m'(x)$. Equivalently,
$$\left(\sum_{n=1}^{\infty} f_n(x)\right)' = \sum_{n=1}^{\infty} f_n'(x).$$
Corollary B.0.11. Suppose that $\sum_{n=1}^{\infty} f_n(x)$ is a sum of integrable functions on the interval $[a, b]$, and that $\sum_{n=1}^{\infty} f_n(x)$ converges uniformly on $[a, b]$. Then $\sum_{n=1}^{\infty} f_n(x)$ is integrable, and the order of integration and summation can be interchanged as
$$\int_a^b \left(\sum_{n=1}^{\infty} f_n(x)\right) dx = \sum_{n=1}^{\infty} \left(\int_a^b f_n(x)\, dx\right).$$
Derivatives of vector-valued functions: Suppose $E \subseteq \mathbb{R}^{d_1}$ is open and that $F : E \to \mathbb{R}^{d_2}$.
Definition B.0.12. We say that $F$ is differentiable at $x \in E$ if there exists a linear transformation $A$ from $\mathbb{R}^{d_1}$ into $\mathbb{R}^{d_2}$ such that
$$\lim_{h \to 0} \frac{|F(x + h) - F(x) - Ah|}{|h|} = 0.$$
We refer to $A$ as the derivative of $F$ at $x$, and write $A = DF(x)$.
There are several equivalent ways to state this definition; e.g., setting $h = y - x$ reveals that $F$ is differentiable at $x$ if and only if there exists a linear transformation $DF(x) : \mathbb{R}^{d_1} \to \mathbb{R}^{d_2}$ such that
$$\lim_{y \to x} \frac{|F(y) - F(x) - DF(x)(y - x)|}{|y - x|} = 0.$$
Alternatively, $F$ is differentiable at $x$ if and only if there exists a linear transformation $DF(x) : \mathbb{R}^{d_1} \to \mathbb{R}^{d_2}$ such that
$$F(x + h) - F(x) = DF(x)h + r(h),$$
where the remainder $r$ is small in the sense that
$$\lim_{h \to 0} \frac{|r(h)|}{|h|} = 0.$$
A function $F : E \subseteq \mathbb{R}^{d_1} \to \mathbb{R}^{d_2}$ can be written in component form as
$$F(x_1, x_2, \dots, x_{d_1}) = \begin{pmatrix} F_1(x_1, x_2, \dots, x_{d_1}) \\ F_2(x_1, x_2, \dots, x_{d_1}) \\ \vdots \\ F_{d_2}(x_1, x_2, \dots, x_{d_1}) \end{pmatrix},$$
where each of the functions $F_1, F_2, \dots, F_{d_2}$ is scalar-valued and $(x_1, x_2, \dots, x_{d_1}) \in E$.
If $F$ is differentiable on $E$, then the partial derivatives
$$\partial F_i/\partial x_j \qquad (i = 1, 2, \dots, d_2 \text{ and } j = 1, 2, \dots, d_1)$$
exist and, for each $x \in E$, the derivative $DF(x)$ has the $d_2 \times d_1$ matrix representation
$$\begin{pmatrix} \partial F_1/\partial x_1 & \partial F_1/\partial x_2 & \cdots & \partial F_1/\partial x_{d_1} \\ \partial F_2/\partial x_1 & \partial F_2/\partial x_2 & \cdots & \partial F_2/\partial x_{d_1} \\ \vdots & \vdots & \ddots & \vdots \\ \partial F_{d_2}/\partial x_1 & \partial F_{d_2}/\partial x_2 & \cdots & \partial F_{d_2}/\partial x_{d_1} \end{pmatrix} \qquad (B.1)$$
with respect to the standard bases for $\mathbb{R}^{d_1}$ and $\mathbb{R}^{d_2}$. The matrix in Equation (B.1) is called the Jacobian of $F$.
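The limit in Definition B.0.12 suggests a practical test: approximate the Jacobian by difference quotients and compare. The sketch below is an illustration we have added (the map $F$ and the step size $h$ are our own hypothetical choices); it assembles the $d_2 \times d_1$ matrix (B.1) column by column.

```python
import numpy as np

def numerical_jacobian(F, x, h=1e-6):
    """Forward-difference approximation to the Jacobian (B.1) at x."""
    x = np.asarray(x, dtype=float)
    Fx = np.asarray(F(x), dtype=float)
    J = np.zeros((Fx.size, x.size))
    for j in range(x.size):
        step = np.zeros_like(x)
        step[j] = h
        J[:, j] = (np.asarray(F(x + step)) - Fx) / h  # column j: dF/dx_j
    return J

# A hypothetical F: R^2 -> R^3, so the Jacobian is 3 x 2.
F = lambda x: np.array([x[0] * x[1], np.sin(x[0]), x[0] + x[1]**2])
print(numerical_jacobian(F, [1.0, 2.0]))
# exact Jacobian at (1, 2): [[2, 1], [cos 1, 0], [1, 4]]
```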
Remark: If $E$ is an open subset of $\mathbb{R}^{d_1}$ over which $F : E \to \mathbb{R}^{d_2}$ is differentiable, then the Jacobian matrix exists for each $x \in E$ and provides a convenient representation for $DF(x)$. However, existence of the partial derivatives in the matrix (B.1) is not enough to conclude that $F$ is differentiable. Consider, for example, the function $F : \mathbb{R}^2 \to \mathbb{R}$ defined by
$$F(x_1, x_2) = \begin{cases} 1 & \text{if } x_1 x_2 = 0 \\ 0 & \text{otherwise,} \end{cases}$$
which is discontinuous (and therefore non-differentiable) along both coordinate axes. In particular, $F$ is not differentiable at $(0, 0)$ despite the fact that both partial derivatives $\partial F/\partial x_1$ and $\partial F/\partial x_2$ exist (and are equal to 0) at that point. Such pathologies can be avoided if the partial derivatives in the Jacobian matrix are continuous on $E$:
Theorem B.0.13. Suppose $E \subseteq \mathbb{R}^{d_1}$ is open and $F : E \to \mathbb{R}^{d_2}$. If all partial derivatives $\partial F_i/\partial x_j$ in the Jacobian matrix (B.1) of $F$ are continuous on $E$, then $F$ is differentiable on $E$.
Compositions of differentiable functions are also differentiable, and the chain rule from single-variable calculus generalizes to higher dimensions as follows.
Theorem B.0.14. (Chain Rule) Suppose $E \subseteq \mathbb{R}^{d_1}$ is open and $F : E \to \mathbb{R}^{d_2}$ is differentiable at $x_0 \in E$. Further suppose that $G$ maps an open set containing $F(E)$ into $\mathbb{R}^{d_3}$ and that $G$ is differentiable at $F(x_0)$. Then the composition $H(x) = (G \circ F)(x) = G(F(x))$ mapping $E$ into $\mathbb{R}^{d_3}$ is differentiable at $x_0$ and
$$DH(x_0) = DG(F(x_0)) \circ DF(x_0).$$
Note: The composition $DG(F(x_0)) \circ DF(x_0)$ can be computed by multiplying the $d_3 \times d_2$ Jacobian matrix representation of $DG(F(x_0))$ with the $d_2 \times d_1$ Jacobian matrix representation of $DF(x_0)$, resulting in a $d_3 \times d_1$ Jacobian matrix representation of $DH(x_0)$.
The implicit function theorem: Under certain circumstances, an equation of the form $F(x, y) = 0$ implicitly defines a function $y = f(x)$. If the function $F(x, y)$ is scalar-valued, then it is not difficult to state conditions guaranteeing that the equation $F(x, y) = 0$ implicitly defines a function.
Theorem B.0.15. Suppose that $F(x, y)$ is continuously differentiable in the $xy$-plane. Then the equation $F(x, y) = 0$ can be solved for $y$ in terms of $x$ in a neighborhood of any point $(a, b)$ at which $F(a, b) = 0$ and $F_y(a, b) \ne 0$.
As an illustration, if $F(x, y) = x^2 + y^2 - 1$ then the equation $F(x, y) = 0$ implicitly defines a circle of radius 1 centered at the origin in the $xy$-plane. Since $F_y = 2y$ is nonzero unless $y = 0$, the only points in the $xy$-plane satisfying both $F(x, y) = 0$ and $F_y(x, y) = 0$ are $(x, y) = (\pm 1, 0)$. At those two points, the lines tangent to the graph of the circle $F(x, y) = 0$ are vertical. For all other pairs $(x, y)$ on the graph, the equation $F(x, y) = 0$ implicitly defines a function $y = f(x)$.
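Although Theorem B.0.15 is an existence statement, the function $y = f(x)$ it produces can be computed numerically. The sketch below is our addition (the base point $(0.6, 0.8)$ is an arbitrary illustrative choice, and the explicit branch $\sqrt{1 - x^2}$ is available as a check); it applies Newton's method in $y$ with $x$ frozen, which is exactly the step that $F_y \ne 0$ makes safe.

```python
import numpy as np

# Solve F(x, y) = x^2 + y^2 - 1 = 0 for y near (a, b) = (0.6, 0.8).
def f_of_x(x, y0=0.8, tol=1e-12):
    y = y0
    for _ in range(50):
        Fval = x**2 + y**2 - 1.0
        if abs(Fval) < tol:
            break
        y -= Fval / (2.0 * y)       # Newton step: divide by F_y = 2y
    return y

for x in (0.0, 0.3, 0.6, 0.9):
    print(x, f_of_x(x), np.sqrt(1.0 - x**2))  # agrees with the explicit branch
```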
Now let us establish notation that will aid in generalizing Theorem B.0.15 to higher dimensions. If $x = (x_1, x_2, \dots, x_{d_1}) \in \mathbb{R}^{d_1}$ and $y = (y_1, y_2, \dots, y_{d_2}) \in \mathbb{R}^{d_2}$, we shall write
$$(x, y) = (x_1, x_2, \dots, x_{d_1}, y_1, y_2, \dots, y_{d_2}) \in \mathbb{R}^{d_1 + d_2}.$$
Suppose that $E \subseteq \mathbb{R}^{d_1 + d_2}$ is open and $F : E \to \mathbb{R}^{d_2}$. The relationship $F(x, y) = 0$ can be written in component form as
$$F_1(x_1, x_2, \dots, x_{d_1}, y_1, y_2, \dots, y_{d_2}) = 0$$
$$F_2(x_1, x_2, \dots, x_{d_1}, y_1, y_2, \dots, y_{d_2}) = 0$$
$$\vdots$$
$$F_{d_2}(x_1, x_2, \dots, x_{d_1}, y_1, y_2, \dots, y_{d_2}) = 0.$$
The implicit function theorem provides criteria under which $(y_1, y_2, \dots, y_{d_2})$ is (at least locally) a function of $(x_1, x_2, \dots, x_{d_1})$. Of course, algebraically solving for the $y$ variables in terms of the $x$ variables is generally too much to hope for.
Theorem B.0.16. (Implicit Function Theorem) Suppose that $E \subseteq \mathbb{R}^{d_1 + d_2}$ is open and that $F : E \to \mathbb{R}^{d_2}$ is continuously differentiable. Let $(x_0, y_0) \in E$ be a point such that $F(x_0, y_0) = 0$, and form the square matrix
$$J = \begin{pmatrix} \partial F_1/\partial y_1 & \partial F_1/\partial y_2 & \cdots & \partial F_1/\partial y_{d_2} \\ \partial F_2/\partial y_1 & \partial F_2/\partial y_2 & \cdots & \partial F_2/\partial y_{d_2} \\ \vdots & \vdots & \ddots & \vdots \\ \partial F_{d_2}/\partial y_1 & \partial F_{d_2}/\partial y_2 & \cdots & \partial F_{d_2}/\partial y_{d_2} \end{pmatrix},$$
where each partial derivative is evaluated at $(x_0, y_0)$. If $J$ is invertible, then there exist open neighborhoods $\Omega_1 \subseteq \mathbb{R}^{d_1}$, $\Omega_2 \subseteq \mathbb{R}^{d_2}$ containing $x_0$ and $y_0$ (respectively), and a unique, continuously differentiable function $G : \Omega_1 \to \Omega_2$ such that
$$F(x, G(x)) = 0 \quad \text{for all } x \in \Omega_1.$$
B.0.1 Regions with smooth boundaries
Because we are interested only in trapping regions (see Chapter 4), we consider only closed regions. A closed set $\mathcal{K} \subseteq \mathbb{R}^d$ is said to have a $C^1$-boundary if for every point $x \in \partial\mathcal{K}$ there is a neighborhood $\mathcal{V}$ of $x$ and a $C^1$-function $\Psi : \mathcal{V} \to \mathbb{R}$ such that
$$\text{(i)} \ (\forall x \in \mathcal{V})\ \nabla\Psi(x) \ne 0, \quad \text{and} \quad \text{(ii)} \ \mathcal{K} \cap \mathcal{V} = \{x \in \mathcal{V} : \Psi(x) \ge 0\}. \qquad (B.2)$$
Of course
$$\partial\mathcal{K} \cap \mathcal{V} = \{x \in \mathcal{V} : \Psi(x) = 0\}$$
and $\nabla\Psi(x)$ equals the inward normal at $x$.
Fact: If $\partial\Psi/\partial x_j \ne 0$, then by the implicit function theorem, the equation $\Psi(x) = 0$ may be solved for $x_j$ as a function of the other variables.
Appendix C
Notions from Linear Algebra
C.1 Appendix: A compendium of results from linear algebra
C.1.1 How to compute Jordan normal forms
In a linear algebra text, one expects the author to prove that an arbitrary square matrix is similar to a Jordan canonical form, and this proof is a messy affair. (One of the best treatments of Jordan forms of which we are aware is in Appendix B of Strang [5].) We assume the reader has seen the definitions and the statement of the theorem but not really followed the proof. Here we accept that a Jordan normal form exists, and we ask, more simply, how to compute it. We break this problem into two sub-questions, focusing more on examples than theory: given a matrix $A$,
Q1: How can we decide what the normal form of $A$ is?
Q2: How can we find the similarity transformation that produces the normal form?
The first step in determining the normal form of $A$ is to find the eigenvalues of $A$. Of course finding eigenvalues analytically is an intractable problem in general. We work with hand-picked examples in which the eigenvalues are readily computed.
Example 1:
$$A = \begin{pmatrix} 5 & -2 \\ 2 & 1 \end{pmatrix}.$$
It is readily computed that $\det(A - \lambda I) = (\lambda - 3)^2$. Thus, there are two possible Jordan forms for $A$,
$$J_1 = \begin{pmatrix} 3 & 0 \\ 0 & 3 \end{pmatrix} \quad \text{and} \quad J_2 = \begin{pmatrix} 3 & 1 \\ 0 & 3 \end{pmatrix}.$$
If $A$ were similar to $J_1 = 3I$, then every vector in $\mathbb{R}^2$ would be an eigenvector. However $v \in \mathbb{R}^2$ is an eigenvector iff $(A - 3I)v = 0$, or writing this out,
$$\begin{pmatrix} 2 & -2 \\ 2 & -2 \end{pmatrix} v = 0.$$
Obviously not every vector satisfies this equation, so $J_2$ must be the normal form for $A$. Indeed, in hindsight we may see that if a $2 \times 2$ matrix has equal eigenvalues but is not equal to a multiple of the identity, then its Jordan normal form must be a $2 \times 2$ block.
Higher-dimensional examples in which there are double eigenvalues, but none of higher multiplicity, do not pose any additional difficulties, as we illustrate in some of the Exercises. Let us turn our attention to eigenvalues of multiplicity three.
Example 2: Consider
$$A_1 = \begin{pmatrix} a & 1 & 1 \\ 0 & a & 0 \\ 0 & 0 & a \end{pmatrix} \quad A_2 = \begin{pmatrix} a & 1 & 1 \\ 0 & a & 1 \\ 0 & 0 & a \end{pmatrix} \quad A_3 = \begin{pmatrix} a & 0 & 1 \\ 0 & a & 1 \\ 0 & 0 & a \end{pmatrix}.$$
By inspection, $\lambda = a$ is the only eigenvalue of $A_j$. Thus the possible normal forms for $A_j$ are
$$J_1 = \begin{pmatrix} a & & \\ & a & \\ & & a \end{pmatrix} \quad J_2 = \begin{pmatrix} a & 1 & \\ 0 & a & \\ & & a \end{pmatrix} \quad J_3 = \begin{pmatrix} a & 1 & 0 \\ 0 & a & 1 \\ 0 & 0 & a \end{pmatrix}$$
where, to facilitate visualization, entries that are zero but lie outside of any Jordan block are left blank. We distinguish between cases by examining the dimension of the eigenspaces. These dimensions may be computed most easily by applying the rank-plus-nullity theorem (see Strang [5]), which gives us
$$\dim\ker(J_j - aI) = 3 - \operatorname{rank}(J_j - aI).$$
Thus $J_1, J_2, J_3$ have eigenspaces of dimension 3, 2, 1, respectively. Proceeding similarly, we find that $A_1, A_2, A_3$ have eigenspaces of dimension 2, 1, 2, respectively. Since the dimension of eigenspaces is preserved under similarity transformations, we conclude that $A_1, A_2, A_3$ have Jordan forms $J_2, J_3, J_2$, respectively.
Example 3:
$$A_1 = \begin{pmatrix} a & 0 & 0 & 1 \\ 0 & a & 0 & 1 \\ 0 & 0 & a & 0 \\ 0 & 0 & 0 & a \end{pmatrix} \quad A_2 = \begin{pmatrix} a & 0 & 1 & 0 \\ 0 & a & 0 & 1 \\ 0 & 0 & a & 0 \\ 0 & 0 & 0 & a \end{pmatrix} \quad A_3 = \begin{pmatrix} a & 1 & 0 & 0 \\ 0 & a & 0 & 1 \\ 0 & 0 & a & 0 \\ 0 & 0 & 0 & a \end{pmatrix}.$$
The possible Jordan forms are
$$J_1 = \begin{pmatrix} a & & & \\ & a & & \\ & & a & \\ & & & a \end{pmatrix} \quad J_2 = \begin{pmatrix} a & 1 & & \\ 0 & a & & \\ & & a & \\ & & & a \end{pmatrix} \quad J_3 = \begin{pmatrix} a & 1 & 0 & \\ 0 & a & 1 & \\ 0 & 0 & a & \\ & & & a \end{pmatrix}$$
$$J_4 = \begin{pmatrix} a & 1 & 0 & 0 \\ 0 & a & 1 & 0 \\ 0 & 0 & a & 1 \\ 0 & 0 & 0 & a \end{pmatrix} \quad J_5 = \begin{pmatrix} a & 1 & & \\ 0 & a & & \\ & & a & 1 \\ & & 0 & a \end{pmatrix}.$$
Proceeding as above, we compute that $J_1, J_2, J_3, J_4, J_5$ have eigenspaces of dimension 4, 3, 2, 1, 2, respectively. We can see potential trouble here in that $J_3$ and $J_5$ both have two-dimensional eigenspaces. Now $A_1, A_2, A_3$ have eigenspaces of dimension 3, 2, 2, respectively. Thus we may conclude that $A_1$ has $J_2$ as its normal form, but the dimension of the eigenspace does not distinguish between $J_3$ and $J_5$ for $A_2$ and $A_3$. For this task we turn to generalized eigenvectors: a vector $v \in \mathbb{R}^d$ is called a generalized eigenvector of a matrix $A$ with eigenvalue $\lambda$ if for some power $p$
$$(A - \lambda I)^p v = 0.$$
Choosing $p = 2$, we compute that $(J_j - aI)^2$ has a three-dimensional null space for $j = 3$ and a four-dimensional null space for $j = 5$. On the other hand, $(A_j - aI)^2$ has a four-dimensional null space if $j = 2$ and a three-dimensional null space if $j = 3$. Thus the normal forms for $A_2, A_3$ are $J_5, J_3$, respectively.
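These nullity computations are easy to automate. The following sketch is our addition (the value $a = 2$ is arbitrary); it computes $\dim\ker(A_j - aI)^p$ for $p = 1, 2, 3$ and reproduces the signature that distinguishes $J_5$ from $J_3$.

```python
import numpy as np

a = 2.0   # any value of the eigenvalue a works here
A2 = np.array([[a, 0, 1, 0], [0, a, 0, 1], [0, 0, a, 0], [0, 0, 0, a]])
A3 = np.array([[a, 1, 0, 0], [0, a, 0, 1], [0, 0, a, 0], [0, 0, 0, a]])

def nullity(M):
    return M.shape[0] - np.linalg.matrix_rank(M)

for name, A in (("A2", A2), ("A3", A3)):
    N = A - a * np.eye(4)
    print(name, [nullity(np.linalg.matrix_power(N, p)) for p in (1, 2, 3)])
# A2 -> [2, 4, 4], matching J5;  A3 -> [2, 3, 4], matching J3
```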
Now we turn to the second question above, finding the similarity matrix $S$ such that $S^{-1}AS$ produces the Jordan form of $A$. As we shall see, the columns of $S$ are generalized eigenvectors of $A$ (cf. Proposition 2.3.2).
Recall Example 1, where
$$A = \begin{pmatrix} 5 & -2 \\ 2 & 1 \end{pmatrix}, \quad \text{with } J = \begin{pmatrix} 3 & 1 \\ 0 & 3 \end{pmatrix}$$
its Jordan form. Observe that, with respect to the standard basis $e_1, e_2$ for $\mathbb{R}^2$, the matrix $J$ satisfies
$$(J - 3I)e_1 = 0, \qquad (J - 3I)e_2 = e_1.$$
To match this behavior for $A$, we need to find vectors $v_1, v_2$ such that
$$(A - 3I)v_1 = 0, \qquad (A - 3I)v_2 = v_1,$$
and then the matrix $S = \operatorname{Col}(v_1, v_2)$ will achieve the required transformation. (Note that $(A - 3I)^2 v_2 = 0$, so $v_2$ is a generalized eigenvector.) One possible choice is
$$S = \begin{pmatrix} 1 & 1/2 \\ 1 & 0 \end{pmatrix}.$$
In the Exercises we ask the reader to check that this matrix performs the desired task. Incidentally, there is great latitude in the choice of $S$, more so than in the case of distinct eigenvalues.
More subtle issues may arise in cases of higher multiplicity. Let $A$ be the first of the three matrices considered in Example 2, and let $J$ be its Jordan form. Observe that $J$ satisfies
$$(J - aI)e_1 = 0, \quad (J - aI)e_2 = e_1, \quad (J - aI)e_3 = 0.$$
Thus we need to find vectors $v_1, v_2, v_3$ such that
$$(A - aI)v_1 = 0, \quad (A - aI)v_2 = v_1, \quad (A - aI)v_3 = 0 \qquad (C.1)$$
and let $S = \operatorname{Col}(v_1, v_2, v_3)$. Note that $v_1$ and $v_3$ are eigenvectors of $A$, but $v_1$ must be chosen with care in order that the middle equation in (C.1), which is inhomogeneous, has a solution. Now the eigenspace of $A$ is spanned by
$$\begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} 0 \\ 1 \\ -1 \end{pmatrix}.$$
Suppose $v_1$ is a linear combination of these vectors with coefficients $\alpha$, $\beta$. Writing out the middle equation in (C.1), we have
$$\begin{pmatrix} 0 & 1 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} \alpha \\ \beta \\ -\beta \end{pmatrix}.$$
To have a solution we need $\beta = 0$; to avoid trivialities we need $\alpha \ne 0$. Thus
$$S = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & -1 \end{pmatrix},$$
where we have chosen $\alpha = 1$, is one of the possible similarity matrices that transforms $A$ to its Jordan form.
In the Exercises we ask the reader to carry out this procedure and check that it works for several of the matrices considered above.
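It is also easy to let the computer confirm that these matrices $S$ do what is claimed. The sketch below is our addition (with $a = 2$ as an arbitrary value); it checks $S^{-1}AS$ for both examples.

```python
import numpy as np

# Example 1 (2 x 2): A has Jordan form J with the S constructed above.
A = np.array([[5.0, -2.0], [2.0, 1.0]])
S = np.array([[1.0, 0.5], [1.0, 0.0]])
print(np.linalg.solve(S, A @ S))        # [[3, 1], [0, 3]]

# Example 2 (3 x 3), with a = 2 and the S chosen in the text.
a = 2.0
A1 = np.array([[a, 1, 1], [0, a, 0], [0, 0, a]])
S3 = np.array([[1, 0, 0], [0, 1, 1], [0, 0, -1]], dtype=float)
print(np.linalg.solve(S3, A1 @ S3))     # the 2 x 2 Jordan block plus [a]
```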
C.1.2 The Routh-Hurwitz criterion
It is astonishingly easy to determine whether a polynomial with real coefficients has all its zeros in the left half plane. For example, for the two polynomials
$$Q_1(\lambda) = \lambda^4 + 2\lambda^3 + 3\lambda^2 + 2\lambda + 1$$
$$Q_2(\lambda) = \lambda^5 + 2\lambda^4 + 3\lambda^3 + 3\lambda^2 + 2\lambda + 1,$$
the calculations in Table C.1 show that the first has all its zeros in $\Re\lambda < 0$ while the second has at least one zero in $\Re\lambda \ge 0$. Let us explain these calculations in the context of a general polynomial
$$P(\lambda) = \lambda^n + c_1\lambda^{n-1} + c_2\lambda^{n-2} + \dots + c_{n-1}\lambda + c_n.$$
The algorithm is slightly different, depending on whether $n$ is even or odd. Reflecting this difference we define $\mu = [n/2]$ where $[\cdot]$ is the greatest-integer function: thus $n = 2\mu$ if $n$ is even and $n = 2\mu + 1$ if $n$ is odd. The algorithm forms an $(n+1) \times (\mu+1)$ matrix $A$ as follows. The first two rows of $A$ contain the coefficients of even and odd powers of $\lambda$:
$$a_{1l}: \quad 1 \quad c_2 \quad c_4 \quad \dots$$
$$a_{2l}: \quad c_1 \quad c_3 \quad c_5 \quad \dots$$
(If $n$ is even, then 0 is inserted as the last entry of the second row, as in the table on the left.) Subsequent rows, $3, 4, \dots, n+1$, are calculated inductively from products that resemble $2 \times 2$ determinants:
$$a_{k+1,l} = a_{k,l}\, a_{k-1,l+1} - a_{k,l+1}\, a_{k-1,l}. \qquad (C.2)$$
In words, computation of $a_{k+1,l}$ involves selecting entries from the two preceding rows and from the same column as $a_{k+1,l}$ and the one to the right. In calculating the last column ($l = \mu + 1$), entries $a_{k,\mu+2}$ or $a_{k-1,\mu+2}$ outside the appropriate range are assumed to be zero, as has been done in Table C.1. Then we have:
Theorem C.1.1. All the zeros of $P$ lie in the open left half plane iff all entries $a_{k1}$, $k = 1, \dots, n+1$, in the first column of the above matrix are positive.
If the calculation produces a zero row, as in the table on the right, then the calculation is stopped and there is at least one zero in the closed right half plane. Indeed, note that $Q_2(i) = 0$. A root of $P$ on the imaginary axis will cause a zero row, but a zero row may arise under other circumstances, also.
This theorem is proved in Section 4.2 of Engelberg's book. Although the proof requires careful reading, it is not terribly difficult, just clever. In cases where some of the zeros of $P(\lambda)$ lie in the right half plane, it is usually possible to deduce how many zeros lie there.
      l = 1   2   3                l = 1   2   3
k = 1     1   3   1        k = 1       1   3   2
k = 2     2   2   0        k = 2       2   3   1
k = 3     4   2   0        k = 3       3   3   0
k = 4     4   0   0        k = 4       3   3   0
k = 5     8   0   0        k = 5       0   0   0
                           k = 6

Table C.1: The matrices $a_{kl}$ in the Routh-Hurwitz calculations for $Q_1(\lambda) = \lambda^4 + 2\lambda^3 + 3\lambda^2 + 2\lambda + 1$ (left table) and $Q_2(\lambda) = \lambda^5 + 2\lambda^4 + 3\lambda^3 + 3\lambda^2 + 2\lambda + 1$ (right table). Values of $k$ from 1 to $n+1$ appear in the first column of each table; values of $l$ from 1 to $\mu+1$ appear in the top row. The two rows $a_{kl}$, $k = 1, 2$, which come directly from the coefficients of the polynomial, are separated from later rows that come from the calculation indicated in (C.2).
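The recurrence (C.2) is simple enough to implement directly. The sketch below is our addition, not code from the text; it builds the array row by row, stopping at a zero row, and reproduces both halves of Table C.1.

```python
import numpy as np

def routh_table(coeffs):
    """Rows a_{kl} of the Routh-Hurwitz array for a monic polynomial.

    coeffs = [1, c1, c2, ..., cn]; uses the cross-product recurrence (C.2)."""
    n = len(coeffs) - 1
    mu = n // 2
    rows = [np.zeros(mu + 2), np.zeros(mu + 2)]  # one spare column of zeros
    rows[0][:len(coeffs[0::2])] = coeffs[0::2]   # 1, c2, c4, ...
    rows[1][:len(coeffs[1::2])] = coeffs[1::2]   # c1, c3, c5, ...
    for k in range(2, n + 1):
        prev, pprev = rows[-1], rows[-2]
        new = np.zeros(mu + 2)
        for l in range(mu + 1):
            new[l] = prev[l] * pprev[l + 1] - prev[l + 1] * pprev[l]
        rows.append(new)
        if not new.any():   # zero row: stop; some zero lies in Re >= 0
            break
    return [r[:mu + 1] for r in rows]

for q in ([1, 2, 3, 2, 1], [1, 2, 3, 3, 2, 1]):   # Q1 and Q2
    for row in routh_table(q):
        print(row)
    print("---")
```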
Reference: Shlomo Engelberg, A Mathematical Introduction to Control Theory, Series in Electrical and Computer Engineering, Vol. 2, Imperial College Press, London, 2005. Duke Catalogue QA402.3.E527.
If $A$ is a $d \times d$ matrix with real entries, then in principle one could calculate the characteristic polynomial of $A$ and apply the Routh-Hurwitz criterion to it to determine whether the eigenvalues of $A$ lie in the left half plane. However, calculating the characteristic polynomial of a moderately large matrix by hand is not a pleasant task. (One could of course resort to symbolic computations to obtain the characteristic polynomial, but if the computer is involved, one might as well compute eigenvalues directly.) Thus we refrain even from formulating an analogue of Theorem 2.4.5 for $4 \times 4$ matrices. However, let us use the Routh-Hurwitz criterion to handle the $3 \times 3$ case.
Proof of Proposition 2.4.5. Let $A$ be a $3 \times 3$ matrix with characteristic polynomial
$$\det(A - \lambda I) = -[\lambda^3 + c_1\lambda^2 + c_2\lambda + c_3].$$
The table applying the Routh-Hurwitz criterion to this polynomial is shown in Table C.2. Thus the roots of this polynomial are all in the left half plane iff
$$\text{(a)} \ c_1 > 0, \quad \text{(b)} \ c_1 c_2 - c_3 > 0, \quad \text{(c)} \ c_3 > 0. \qquad (C.3)$$
      l = 1                    2
k = 1     1                    c_2
k = 2     c_1                  c_3
k = 3     c_1 c_2 - c_3        0
k = 4     c_3 (c_1 c_2 - c_3)  0

Table C.2: The matrix $a_{kl}$ in the Routh-Hurwitz calculations for the general cubic $\lambda^3 + c_1\lambda^2 + c_2\lambda + c_3$.
The coefficients $c_j$ are related to the eigenvalues of $A$ through
$$c_1 = -(\lambda_1 + \lambda_2 + \lambda_3), \quad c_2 = \lambda_1\lambda_2 + \lambda_2\lambda_3 + \lambda_3\lambda_1, \quad c_3 = -\lambda_1\lambda_2\lambda_3.$$
Thus it is apparent that (C.3a) and (c) are equivalent to Conditions (i) and (iii) of Theorem 2.4.5, and the equivalence of (C.3b) with Condition (ii) follows on observing that
$$c_2 = \tfrac{1}{2}\left[(\operatorname{tr} A)^2 - \operatorname{tr}(A^2)\right].$$
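A random test (an illustrative sketch we have added) confirms the equivalence: conditions (C.3), computed from the trace, the determinant, and the identity for $c_2$ just displayed, agree with a direct eigenvalue check.

```python
import numpy as np

rng = np.random.default_rng(0)
for _ in range(5):
    A = rng.standard_normal((3, 3))
    # det(A - lambda I) = -(lambda^3 + c1 lambda^2 + c2 lambda + c3)
    c1 = -np.trace(A)
    c2 = 0.5 * (np.trace(A)**2 - np.trace(A @ A))
    c3 = -np.linalg.det(A)
    rh = (c1 > 0) and (c1 * c2 - c3 > 0) and (c3 > 0)
    stable = bool(np.all(np.linalg.eigvals(A).real < 0))
    print(rh, stable)   # the two booleans always agree
```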
C.1.3 Continuity of eigenvalues of a matrix with respect to its entries
Near simple eigenvalues, the dependence of eigenvalues of a matrix on its entries is as nice as one could wish. Specifically, we have the following
Proposition C.1.2. Let $\lambda_1$ be a simple eigenvalue of a $d \times d$ matrix $A_0$. There is an $\epsilon > 0$ and a neighborhood $\mathcal{U}$ of $A_0$ in $\mathbb{R}^{d^2}$ such that any matrix $A \in \mathcal{U}$ has exactly one eigenvalue in the disk $|z - \lambda_1| < \epsilon$, and moreover this eigenvalue is a differentiable function on $\mathcal{U}$.
Proof. We prove this result by applying the implicit-function theorem to solve for $\lambda$ in the equation for eigenvalues,
$$f(\lambda, A) = \det(A - \lambda I) = 0.$$
Now $f(\lambda, A_0)$ is a product of eigenvalues $(\lambda_1 - \lambda) \cdots (\lambda_d - \lambda)$. Differentiation of this product with respect to $\lambda$ gives $d$ terms, but only one of them is nonzero at $\lambda = \lambda_1$:
$$\frac{\partial f}{\partial \lambda}(\lambda_1, A_0) = -(\lambda_2 - \lambda_1) \cdots (\lambda_d - \lambda_1).$$
Since $\lambda_1$ is a simple eigenvalue, this product is nonzero, which completes the proof.
Suppose that $A_0$ has a simple eigenvalue $\lambda_1$, and consider a one-parameter family of perturbations, $A_0 + \epsilon B$. It follows from the proposition that there is a smoothly varying eigenvalue $\lambda_1(A_0 + \epsilon B)$ that equals $\lambda_1$ when $\epsilon = 0$. In Exercise ?? we give a formula for calculating the derivative of this eigenvalue with respect to $\epsilon$ at $\epsilon = 0$.
Near multiple eigenvalues, the dependence of eigenvalues of a matrix on its entries is complicated by a difficulty familiar from complex function theory. For example, consider the matrix function
$$A(\alpha, \beta) = \begin{pmatrix} 0 & 1 \\ \alpha + i\beta & 0 \end{pmatrix},$$
which has eigenvalues $\lambda_j(\alpha, \beta) = \pm\sqrt{\alpha + i\beta}$. We claim it is impossible to define these square roots as continuous functions of $\alpha, \beta$ in a neighborhood of zero in $\mathbb{R}^2$. To see this, suppose otherwise that there are continuous eigenvalues $\lambda_j(\alpha, \beta)$. On the positive $\alpha$-axis, the eigenvalues are $\pm\sqrt{\alpha}$. Index the eigenvalues so that $\lambda_1$ is positive on the positive $\alpha$-axis, and let us restrict $\lambda_1$ to a small circle that encloses the origin: i.e., let
$$\lambda(\theta) = \lambda_1(\epsilon\cos\theta, \epsilon\sin\theta), \quad \text{where } 0 \le \theta < 2\pi.$$
Calculation then shows that
$$\lambda(\theta) = \sqrt{\epsilon}\, e^{i\theta/2}. \qquad (C.4)$$
If $\lambda_1$ were continuous we would have $\lambda(2\pi) = \lambda(0)$, but in fact (C.4) implies that $\lambda(2\pi) = -\lambda(0)$. This contradiction proves the claim.
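One can watch this happen numerically. The sketch below is our addition ($r = 0.01$ is an arbitrary small radius); it tracks an eigenvalue of $A(\alpha, \beta)$ continuously around the circle by always selecting the nearest eigenvalue at the next step, and after one full loop it returns to $-\sqrt{r}$ rather than $+\sqrt{r}$, as (C.4) predicts.

```python
import numpy as np

r = 0.01
thetas = np.linspace(0.0, 2.0 * np.pi, 2001)
lam = np.sqrt(r)                # start on the positive alpha-axis
for th in thetas:
    A = np.array([[0.0, 1.0], [r * np.exp(1j * th), 0.0]])
    evals = np.linalg.eigvals(A)
    lam = evals[np.argmin(np.abs(evals - lam))]  # follow the nearest branch
print(lam)   # approximately -sqrt(r) = -0.1, not +0.1
```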
The reader may protest that the matrix in this example has complex entries, but here is a $4 \times 4$ matrix with real entries that exhibits the same difficulty:
$$A(\alpha, \beta) = \begin{pmatrix} 0 & I \\ B(\alpha, \beta) & 0 \end{pmatrix}$$
where 0 is the $2 \times 2$ zero matrix, $I$ is the $2 \times 2$ identity matrix, and
$$B(\alpha, \beta) = \begin{pmatrix} \alpha & -\beta \\ \beta & \alpha \end{pmatrix}.$$
Although at a multiple eigenvalue one cannot define individual eigenvalues continuously, nonetheless the group of eigenvalues does vary continuously, in the sense of the following
Proposition C.1.3. Let $\lambda_1$ be an eigenvalue of a $d \times d$ matrix $A_0$ of multiplicity $k$. There is an $\epsilon_0 > 0$ with the property that for all $\epsilon < \epsilon_0$ there is a neighborhood $\mathcal{U}$ of $A_0$ in $\mathbb{R}^{d^2}$ such that any matrix $A \in \mathcal{U}$ has exactly $k$ eigenvalues in the disk $|z - \lambda_1| < \epsilon$.
The conclusion of this result is a standard epsilon-delta characterization of continuity, with three differences: (i) a set of eigenvalues, rather than a single eigenvalue, is being bounded, (ii) an upper bound is needed on epsilon, and (iii) delta is replaced by the neighborhood $\mathcal{U}$. Two remarks: (i) For $\epsilon_0$ one may use any number less than the minimum separation between $\lambda_1$ and the other eigenvalues of $A_0$. (ii) The maximum diameter of $\mathcal{U}$ scales like $\epsilon^{1/k}$ as $\epsilon \to 0$.
The proposition is easily proved with complex-function theory, but, since this subject is not a pre-requisite for this text, we do not give the proof here.
Corollary C.1.4. Suppose all the eigenvalues of $A$ lie in the left half plane $\Re\lambda < 0$. For any perturbation matrix $B$, for $\epsilon$ sufficiently small all the eigenvalues of $A + \epsilon B$ also lie in the left half plane.
The reader is asked to derive this corollary in the Exercises.
If one restricts attention to symmetric matrices, then all eigenvalues are real, and one may define individual eigenvalues continuously by ordering them. For example, we may define $\lambda_1(A)$ to be the smallest eigenvalue of $A$; $\lambda_2(A)$ to be the next smallest eigenvalue; etc. (Ties do not matter for these definitions.) However, even though with this convention the eigenvalues are continuous, they are not differentiable: this is demonstrated by the matrix
$$A(\alpha, \beta) = \begin{pmatrix} \alpha & \beta \\ \beta & -\alpha \end{pmatrix},$$
which has eigenvalues $\pm\sqrt{\alpha^2 + \beta^2}$.
C.1.4 Fast-slow systems
Recall problem from Chapter 1. More general system:
$$x' = \epsilon^{-1}(-x + b^T y), \qquad y' = xc + By.$$
Coefficient matrix:
$$A = \begin{pmatrix} -\epsilon^{-1} & \epsilon^{-1} b^T \\ c & B \end{pmatrix}.$$
Compare two approaches:
1. Solve for $x = b^T y$. Substitute into the $y$-equation, get the reduced system
$$y' = (B + cb^T)y.$$
2. Full system.
The full system has one eigenvalue approximately equal to $-\epsilon^{-1}$. Show the other eigenvalues are the same, modulo $\epsilon$ to some fractional power. In particular, the reduced system is stable iff the full system is.
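A numerical check of the last claim, with hypothetical $b$, $c$, $B$ chosen only for illustration (this sketch is our addition): the full matrix has one eigenvalue near $-1/\epsilon$, and its remaining eigenvalues are close to those of the reduced matrix $B + cb^T$.

```python
import numpy as np

eps = 1e-4
b = np.array([1.0, -0.5])
c = np.array([0.5, 2.0])
B = np.array([[-1.0, 0.3], [0.0, -2.0]])

A = np.zeros((3, 3))
A[0, 0] = -1.0 / eps
A[0, 1:] = b / eps
A[1:, 0] = c
A[1:, 1:] = B

print(np.sort(np.linalg.eigvals(A)))                   # one eigenvalue near -1/eps
print(np.sort(np.linalg.eigvals(B + np.outer(c, b))))  # the slow eigenvalues
```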
C.1.5 Exercises
Prove Corollary C.1.4.
Suppose that $A_0$ has a simple eigenvalue $\lambda_1$, and let $\lambda(\epsilon)$ be the eigenvalue of $A_0 + \epsilon B$ that equals $\lambda_1$ when $\epsilon = 0$. In Exercise ?? we give a formula for calculating $d\lambda/d\epsilon(0)$.
1. In case $A_0$ has block-diagonal form, show the derivative is the $(1,1)$-entry of $B$.
2. Suppose $S^{-1} A_0 S$ is block diagonal. Note that the first column of $S$ is an eigenvector, say $v$, of $A_0$, and the first row of $S^{-1}$ is an eigenvector, say $w$, of $A_0^T$. Argue that
$$d\lambda/d\epsilon(0) = \langle w, Bv \rangle.$$
Appendix D
Nondimensionalization and Scaling
D.1 Classes of equations in applications
The applications in this text may be grouped into three broad classes: mechanical, electrical, and bathtub models. ODEs from all three classes appeared already in Chapter 1. In this Appendix we briefly describe each class, and then we discuss an important technique for analyzing such equations.
D.1.1 Mechanical models
Mechanical systems that we encountered in Chapter 1 include spring-mass systems (see (1.21)), the pendulum (see (1.4)), and Duffing's equation (see (1.25)). The pendulum and models derived from it represent one of the three central examples in this text; extra effects added to the pendulum equation include (i) torque (??), (ii) rotation (??), and (iii) vertical vibration (??). The Lorenz system (see (7.5)) to be studied in Chapters 7 and 8 also has a basis in mechanics.
Usually the equations of motion for a mechanical system can be derived from a straightforward application of Newton's second law: mass times acceleration equals the sum of the forces. Let us illustrate this by deriving the equations of motion of a rotating pendulum, an example that occurs repeatedly in the text. Although we speak of a rotating pendulum, we find it more intuitive to view this system as a bead sliding on a wire hoop that is rotating about a vertical axis, as illustrated in Figure 7.1. Let $m$ be the mass of the bead; $a$ the radius of the hoop; $\omega$ the (constant) speed of rotation. We apply Newton's second law to describe the motion, which is purely tangential. If $x$ measures the angular position of the bead as a function of time (see figure), then the tangential acceleration is $ax''$. There are three forces acting tangentially on the bead: (i) gravity, whose projection onto the tangential direction is $-mg\sin x$; (ii) centrifugal force from rotation about the vertical axis in a circle of radius equal to $a\sin x$, whose tangential projection is $m(a\sin x)\omega^2\cos x$; and (iii) friction, which we model as $-\gamma x'$. Expressing Newton's law as a first-order system yields
$$x' = y \qquad (D.1)$$
$$may' = -\gamma y - mg\sin x + m(a\sin x)\omega^2\cos x.$$
Regarding global existence for ODEs describing mechanical systems, often a trapping region may be constructed as the region interior to a level set of the energy. We did this for the torqued pendulum in Section 4.3.4 and for the rotating pendulum in Section 5.4.?? Further examples are given in the Exercises.
D.1.2 Electrical models
Van der Pol's equation (1.31) originally arose in modeling an electrical circuit. ODEs with an electrical basis also arise in models for nerve cells. Specifically, in this book we study the FitzHugh-Nagumo equations (see (5.69)) and an elaboration of these equations in Chapter 8 (??) intended to mimic the Morris-Lecar model ref ce. Incidentally, as we show in chapter ref ce?, van der Pol's equation (4.29) is a special case of the FitzHugh-Nagumo equations. In Section 4.3.3 we worked rather hard to construct a trapping region for van der Pol's equation. Ironically, it is easier to construct a trapping region for the more general FitzHugh-Nagumo equations, which we ask the reader to do in Exercise ??.
Van der Pol's equation and equations for other circuit models may be derived from Kirchhoff's laws plus information about the current-voltage characteristics of the various devices in the circuit. For biologically based models, one needs to discuss physiological issues in order to understand current-voltage characteristics. We do not attempt to introduce this material in the text. On web site? For mathematicians not primarily interested in biology, the fast-slow analysis of Section 6.5 provides more useful intuition about the FitzHugh-Nagumo system than the derivation of the equations from fundamental laws. (Footnote: There is a reward for understanding the derivation of the van der Pol equation: this resolves the mystery of how this equation seems to involve the creation of energy out of nowhere. In biological processes, there is no issue: in fact energy is being consumed as time evolves.)
At end of this TeX file there is derivation of v.d.Pol from FitzHugh-Nagumo, junked.
D.1.3 Bathtub models
We borrow the phrase bathtub model from Ellner-Guckenheimer []. This tongue-in-cheek phrase describes models in which a population (of organisms, chemicals, proteins, enzymes, etc.) is divided into various categories (bathtubs) and the ODEs track the flow of individuals (water) from one category to another. Strictly speaking the populations in such models are usually integers, but in the case of large populations one may approximate an integer variable by a continuous variable. Two of our three central examples belong to this class: (i) the Lotka-Volterra equations with elaborations (1.41), in which the population consists of animals, and (ii) the activator-inhibitor equations (4.24), in which the population consists of chemicals proteins?. Several other examples occur in the text and the Exercises. Q: Worth listing these?
Typically in bathtub models there are several processes causing motion from one category to another, and it is a wonderful simplification that equations describing the overall evolution may be obtained simply by adding the rates associated with the various processes. For example, consider the activator-inhibitor equations (4.24): each equation has two terms, the first representing a nonlinear production rate and the second, a linear decay, and these terms are merely added.
Regarding the name activator-inhibitor: The variable $x$ in (4.24) is called an activator since its production rate increases with its own concentration. (Footnote: A function of the form $x^n/(1 + x^n)$ is called a Hill function. ref ce to Alon. Although the shape of this function suggests a hill, the name actually refers to ?? Hill, a ??-century biologist??. Describe: starts at zero, saturates.) The production rate of $r$ also increases with $x$; this variable is called an inhibitor or repressor because a high concentration of $r$ suppresses the production of $x$ through the factor $1 + r$ in the denominator of the first term in (4.24a). Biologists represent these interactions in the schematic graph shown in Figure ??. The vertices of the graph enumerate the chemical species undergoing reaction. The edges of the graph indicate that one concentration influences the production rate of another chemical. An edge that terminates in an arrowhead indicates promotion; one that terminates in a short cross bar, inhibition.
For many bathtub models it is not difficult to prove global existence. For example, in (4.24) each population has a death term that grows with the population, and the growth terms are bounded. As we showed in Section 4.3.1, a sufficiently large rectangle will serve as a trapping region. We ask the reader to carry out these steps in the Exercises for other models.
D.2 Scaling and nondimensionalization
Scaling and nondimensionalization are techniques for simplifying ODEs that arise in applications, extracting the most important features from them. In the hands of a skilled user, they are extremely powerful, as is illustrated by the following anecdote ref ce about G.I. Taylor, a distinguished twentieth-century British physicist/applied mathematician. Using this kind of analysis, he estimated the power of one of the early tests of the atomic bomb, based on only a series of photographs from the cover of Time Magazine of the mushroom cloud at several times closely following the explosion. His estimate was so accurate that it led to an investigation by the FBI!
Q: Where make the point that in scaling want to have all quantities be O(1) or smaller?
We introduce these techniques by three examples.
D.2.1 Duffing's equation
Duffing's equation (1.25) provides a simple example in which to introduce the technique. With all quantities retaining their units, the equation is
$$m x'' + \gamma x' - k_1 x + k_2 x^3 = 0. \qquad (D.2)$$
The technique focuses attention on two basic questions:
Q1: What dimensionless quantities can be constructed from the parameters in the problem by forming products of powers: i.e., quantities of the form
$$m^a \gamma^b k_1^c k_2^d$$
where the exponents $a, b, c, d$ may be chosen arbitrarily?
Q2: How can the form of the equations be simplified by introducing scaled variables,
$$\bar{x} = x/L, \qquad \tau = t/T \qquad (D.3)$$
where the scale factors $L, T$ may be chosen arbitrarily?
To address these questions, let's make a table of the dimensions of all the quantities in the equation, both variables and parameters. This information for (D.2) is given in Table D.1. To compile this information, one first determines the units of $t, x, m$ from the physical origins of the equation, and then one determines the other units by the requirement that the various terms in the equation must have the same dimensions. For example, we know that the units of the first term in (D.2) are mass times length divided by time squared; in symbols
$$U(mx'') = m\ell/t^2$$
where $U$ denotes "units of". Thus, from requiring consistent units between terms we deduce that
$$m\ell/t^2 = U(\gamma x') = U(\gamma)\, U(x') = U(\gamma)\, \ell/t,$$
from which the entry for the units of $\gamma$ in the table follows.
Variables    Description               Units
t            time                      t
x            length                    l

Parameters   Description               Units
m            mass                      m
gamma        friction                  m/t
k_1          linear spring constant    m/t^2
k_2          nonlinear spring constant m/(l^2 t^2)

Table D.1: Units of quantities in Duffing's equation, (D.2).
Dimensionless quantities appear explicitly in the first question, and they also arise in the second one, in the following way. In Question 2, the most convenient choice for $L$ has the units of length, so $\bar{x}$ is dimensionless; likewise $\tau$. It would be difficult to overemphasize the importance of dimensionless quantities in understanding physical equations. For example, it is meaningless to speak of a quantity with nontrivial units being either large or small. Let us support this statement with an apparently outrageous claim: one normally thinks of the speed of light as a very fast velocity, but in fact it is only $10^{-6}$ provided one mischievously measures velocity in terms of astronomical units per millisecond. The speed of light is fast compared with velocities encountered in most circumstances, like the speed of sound or the speed of an automobile, which mathematically means that the dimensionless ratio of the speed of light divided by another speed is large.
Turning to the first question, we find from the table that
$$U(m^a \gamma^b k_1^c k_2^d) = m^{a+b+c+d}\, t^{-b-2c-2d}\, \ell^{-2d}.$$
Requiring this to be dimensionless means that we must have
$$a + b + c + d = 0$$
$$b + 2c + 2d = 0$$
$$d = 0.$$
We have three equations in four unknowns, and the coefficient matrix has rank 3, so there is one linearly independent solution. We may take $b = 1$, $a = c = -1/2$, which gives the dimensionless quantity $\gamma/\sqrt{mk_1}$.
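Finding such exponent vectors is a nullspace computation, which one can delegate to the computer. The sketch below is our addition; it encodes the three linear equations above as a matrix (one row per base unit, one column per parameter) and recovers the single solution direction.

```python
import sympy as sp

# Columns correspond to (m, gamma, k1, k2), i.e., exponents (a, b, c, d).
# m: mass; gamma: mass/time; k1: mass/time^2; k2: mass/(length^2 time^2).
D = sp.Matrix([
    [1,  1,  1,  1],   # mass exponents:    a + b + c + d = 0
    [0, -1, -2, -2],   # time exponents:   -b - 2c - 2d = 0
    [0,  0,  0, -2],   # length exponents: -2d = 0
])
print(D.nullspace())
# one vector, (1, -2, 1, 0), proportional to (a, b, c, d) = (-1/2, 1, -1/2, 0),
# i.e., the quantity gamma/sqrt(m k1) up to a power
```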
To address the second question, let's substitute (D.3) into (D.2); after dividing the equation by $mL/T^2$ we have
$$\frac{d^2\bar{x}}{d\tau^2} + \frac{\gamma T}{m}\frac{d\bar{x}}{d\tau} - \frac{k_1 T^2}{m}\bar{x} + \frac{k_2 L^2 T^2}{m}\bar{x}^3 = 0.$$
We choose $T = \sqrt{m/k_1}$ to make the coefficient of the term linear in $\bar{x}$ equal to unity, so we have
$$\frac{d^2\bar{x}}{d\tau^2} + \frac{\gamma}{\sqrt{mk_1}}\frac{d\bar{x}}{d\tau} - \bar{x} + \frac{k_2 L^2}{k_1}\bar{x}^3 = 0.$$
Next we choose $L = \sqrt{k_1/k_2}$, and the equation simplifies to
$$\frac{d^2\bar{x}}{d\tau^2} + \frac{\gamma}{\sqrt{mk_1}}\frac{d\bar{x}}{d\tau} - \bar{x} + \bar{x}^3 = 0,$$
which is the simple form of the equation introduced in Chapter 1. All coefficients in the equation are either simple numbers (i.e., $\pm 1$) or the dimensionless quantity constructed in answering Question 1.
The task of nondimensionalization is not complete until one has interpreted the dimensionless quantities constructed in answering Question 1 and the scale parameters $T$ and $L$ arising in answering Question 2. One convenient interpretation is based on comparing (D.2) with the equation for simple harmonic motion,
$$m x'' + k_1 x = 0. \qquad (D.4)$$
The time scale $T = \sqrt{m/k_1}$ derives from the period of $\cos(\sqrt{k_1/m}\, t)$, i.e., the trigonometric solutions of (D.4). The length scale $L = \sqrt{k_1/k_2}$ is the displacement for which the nonlinear and linear forces are of the same order of magnitude. Finally, the dimensionless parameter $\gamma/\sqrt{mk_1}$ is a measure of the strength of friction in (D.2).
More precisely, the linear equation $mx'' + \gamma x' + k_1 x = 0$ has solutions of the form $e^{-\gamma t/2m}$ times a trigonometric function; in the characteristic time $T$ this exponential has decayed by
$$e^{-\gamma T/2m} = e^{-(\gamma/\sqrt{mk_1})/2}.$$
Here is another perspective on the importance of dimensionless quantities: no one parameter in (D.2) by itself determines the behavior of solutions of this equation, in contrast with the dimensionless combination $\gamma/\sqrt{mk_1}$: problems for which $\gamma/\sqrt{mk_1}$ is large and for which it is small are different systems, reflected in different behavior of solutions. For example, the difference between under-damped, over-damped, and critically damped depends on $\gamma/\sqrt{mk_1}$.
D.2.2 Lotka-Volterra with logistic limits
The fully dimensional form of the Lotka-Volterra equations with logistic limits on prey growth is
$$x' = \alpha x - \nu x^2 - \beta xy$$
$$y' = \gamma xy - \delta y \qquad (D.5)$$
Variables    Description                        Units
t            time                               t
x            number of prey                     N_x
y            number of predators                N_y

Parameters   Description                        Units
alpha        prey growth rate                   1/t
beta         predation coefficient              1/(N_y t)
gamma        growth coefficient from predation  1/(N_x t)
delta        predator death rate                1/t
nu           nonlinear limit to prey growth     1/(N_x t)

Table D.2: Units of quantities in the Lotka-Volterra equations with logistic limits, (D.5).
Table D.2 lists the units of variables and parameters in (D.5). In the Exercises, we ask the reader, after checking the information in the table, to show that there are exactly two dimensionless combinations of the parameters in (D.5), which we may take to be
$$\epsilon = \frac{\delta}{\alpha} \quad \text{and} \quad K = \frac{\alpha\gamma}{\delta\nu}. \qquad (D.6)$$
If we substitute
$$\bar{x} = x/X, \qquad \bar{y} = y/Y, \qquad \tau = t/T$$
into (D.5), we derive the equations
$$\frac{d\bar{x}}{d\tau} = T\alpha\bar{x} - T\nu X\bar{x}^2 - TY\beta\bar{x}\bar{y}$$
$$\frac{d\bar{y}}{d\tau} = TX\gamma\bar{x}\bar{y} - T\delta\bar{y}.$$
We choose $T = 1/\alpha$ to simplify the linear growth term in the $x$-equation. Similarly we choose $Y = \alpha/\beta$ to simplify the nonlinear predation term in the $x$-equation. Finally we choose $X = \delta/\gamma$ for reasons we will explain below. These substitutions yield the equations
$$\frac{d\bar{x}}{d\tau} = \bar{x}(1 - \bar{x}/K) - \bar{x}\bar{y}$$
$$\frac{d\bar{y}}{d\tau} = \epsilon(\bar{x}\bar{y} - \bar{y}) \qquad (D.7)$$
where $\epsilon$, $K$ are defined by (D.6).
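The substitution is mechanical enough that a computer algebra system can verify it. The sketch below is our addition (using sympy); it substitutes the scales $T = 1/\alpha$, $Y = \alpha/\beta$, $X = \delta/\gamma$ into (D.5) and checks that the residuals against (D.7) vanish identically.

```python
import sympy as sp

tau = sp.symbols("tau")
alpha, beta, gamma, delta, nu = sp.symbols("alpha beta gamma delta nu",
                                           positive=True)
xb = sp.Function("xb")(tau)
yb = sp.Function("yb")(tau)

T, Y, X = 1/alpha, alpha/beta, delta/gamma   # scales chosen above
x, y = X * xb, Y * yb

# original equations (D.5), with d/dt = (1/T) d/dtau
res_x = sp.diff(x, tau)/T - (alpha*x - nu*x**2 - beta*x*y)
res_y = sp.diff(y, tau)/T - (gamma*x*y - delta*y)

K = alpha*gamma/(delta*nu)
eps = delta/alpha
target_x = sp.diff(xb, tau) - (xb*(1 - xb/K) - xb*yb)
target_y = sp.diff(yb, tau) - eps*(xb*yb - yb)

print(sp.simplify(res_x*T/X - target_x))   # 0
print(sp.simplify(res_y*T/Y - target_y))   # 0
```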
It remains to interpret the dimensionless parameters and the various scales introduced in deriving (D.7). It is clear that $\epsilon$ is a ratio of time scales, specifically of prey growth to predator death. Likewise it is clear that $K$ is the nondimensional carrying capacity of the environment for the prey. The time scale $T$ is based on the prey growth rate. The predator scale $Y$, chosen to simplify the $x$-equation, may be interpreted as the population of predators that exactly balances the linear growth rate of the unperturbed prey. Finally, to interpret the prey scale $X$, we consider the steady-state solutions of (D.5). Provided the carrying capacity is large enough (in symbols, $\delta/\gamma < \alpha/\nu$), the equations have a steady-state solution in which both populations are nonzero. In this steady state, $x = \delta/\gamma$, and we chose this value for our prey scale.
Nondimensional reductions of an ODE are far from unique. In the first place, one might have chosen the prey scale $X = \alpha/\gamma$ so that the nonlinear term in the $y$-equation would have a coefficient of unity; our choice above is simply a matter of taste. On a more serious level, a different, but equally natural, nondimensionalization would result from choosing a prey scale based on the carrying capacity rather than predation: specifically $X = \alpha/\nu$. (In the Exercises we ask the reader to determine the equations in this scaling.) What normalization is most convenient depends on what problem one is studying. In our discussion of the Lotka-Volterra equation above, we are comparing solutions of the equation with finite carrying capacity to periodic solutions of the original Lotka-Volterra equations: i.e., with infinite carrying capacity. While it is easy to let the carrying capacity tend to infinity in (D.7), it is of course impossible to do so if the carrying capacity is chosen as the prey scale.
Incidentally, if one measures both prey and predator populations in terms of their biomass, then the ratio $\beta/\gamma$ may be regarded as an additional dimensionless parameter. Indeed, this parameter may be interpreted in terms of conversion efficiency: i.e., the mass of prey that must be consumed to produce a unit mass of predators. However, as we have seen above, in fact this parameter may be scaled out of the Lotka-Volterra model. It is unusual for a dimensionless parameter to be of so little consequence; and in certain more complicated predator-prey models it does play a decisive role (see the Rosenzweig-MacArthur problem).
D.2.3 Michaelis-Menten kinetics
An enzyme is a catalyst in biochemical reactions: i.e., it facilitates the reaction but is not consumed itself. Michaelis-Menten kinetics arise as an approximation of certain reaction rates where an enzyme is involved. Consider a chemical reaction of the form
$$\mathrm{R} + \mathrm{E} \rightleftharpoons [\mathrm{RE}] \to \mathrm{P} + \mathrm{E}. \qquad (D.8)$$
Here R is a reactant (often called a substrate), E is an enzyme, [RE] is a compound in which the reactant and the enzyme are bound to one another, and P is a product. The reaction R+E $\rightleftharpoons$ [RE] is reversible, while the production of the product is considered irreversible. (Footnote: Reaction (D.8) might seem to violate conservation of mass; otherwise how could P be different from R? The product P might be (what's this called?) an isomer: same chemical composition, different spatial structure. Also possible that [RE] $\to$ P+E represents a binary reaction [RE]+X $\to$ P+E where X is so plentiful that its concentration may be treated as constant during the reactions; its concentration could be included in the reaction constant $k_2$.)
According to the law of mass action, a binary reaction such as R+E $\to$ [RE] proceeds at a rate proportional to the product of the concentrations of R and E, while the unitary reactions [RE] $\to$ R+E and [RE] $\to$ P+E proceed at rates proportional to the concentration of [RE]. Let $r, e, c, p$ denote the concentrations of R, E, [RE], P, respectively (mnemonic: $c$ for compound). We assume that all concentrations are uniform over some region in space so that the concentrations are described by ODEs (rather than PDEs), specifically by the equations
$$r' = -k_{+1} re + k_{-1} c$$
$$e' = -k_{+1} re + k_{-1} c + k_2 c$$
$$c' = k_{+1} re - k_{-1} c - k_2 c$$
$$p' = k_2 c \qquad (D.9)$$
where we use a standard, natural notation for the reaction constants $k_j$. (Footnote: Note that the same reaction constants appear in different equations. This simplification arises because we measure all concentrations in moles per unit volume. Q: Is this called molar? This is like measuring concentration in number of molecules per unit volume, except to avoid excessively large numbers we count molecules in units of Avogadro's number, $N_a = ??$. This requires more explanation. Also remember Q: Why can't you measure predators and prey in terms of biomass and get an extra dimless constant in Lotka-Volterra?) Although this is a system of four equations, it may be reduced to two equations. In the first place, the equation for $p$ decouples from the other three equations, so it may be ignored until the other three have been solved. Less trivially, by adding the $e$ and $c$ equations we deduce that
Claim D.2.1. The sum $e + c$ is independent of time.
Let us denote by $E$ the constant value of $e + c$; for example, if initially none of the compound [RE] is present, then $E$ is the initial value $e(0)$. We may use the relation $e = E - c$ to eliminate $e$ from the equations. Thus, it suffices to consider the system
$$r' = -k_{+1} r(E - c) + k_{-1} c$$
$$c' = k_{+1} r(E - c) - k_{-1} c - k_2 c. \qquad (D.10)$$
In the typical situation R is very abundant while the amount of E is rather limited; it is entirely possible that at some time E will be almost completely bound in the compound [RE] while the concentration of R is decreased only modestly. This difference in concentrations could arise because E is physically located only on a surface bounding a three-dimensional region which contains R; however, for simplicity we shall consider both R and E to be distributed over some region, just that the concentration of E is vastly smaller than that of R.
With nondimensionalization, one can systematically analyze the mathematical consequences of the great difference in the concentrations of R and E. Table D.3 lists the units of all quantities in (D.10). In the Exercises we ask the reader to show that two dimensionless combinations may be constructed from these parameters, which may be chosen to be
$$\kappa = \frac{k_2}{k_{-1}} \quad \text{and} \quad \epsilon = \frac{E k_{+1}}{k_{-1}}. \qquad (D.11)$$

Variables    Description     Units
t            time            t
r, c         concentration   molar

Parameters   Description     Units
k_{+1}       reaction rate   1/(t molar)
k_{-1}       reaction rate   1/t
k_2          reaction rate   1/t
E            concentration   molar

Table D.3: Units of quantities in (D.10), the reactions leading to Michaelis-Menten kinetics.
We claim the parameter $\epsilon$ is exceedingly small. Now the concentration of the enzyme, which equals $E - c$, will never exceed $E$. On the other hand, $k_{-1}/k_{+1}$ is an appropriate scale for $r$; it may be interpreted as the concentration of R at which the forward and backward reactions in R+E $\rightleftharpoons$ [RE] proceed at equal rates if half of the enzyme is bound (i.e., $c = e = E/2$). Thus $\epsilon$ is bounded by the ratio of the concentration of the enzyme to that of R, from which the claim follows.
To nondimensionalize (D.10), we define
$$\bar{r} = \frac{k_{+1}}{k_{-1}}\, r, \qquad \bar{c} = c/E, \qquad \bar{t} = k_{-1} t, \qquad (D.12)$$
which yields
$$d\bar{r}/d\bar{t} = \epsilon\left[-\bar{r}(1 - \bar{c}) + \bar{c}\right]$$
$$d\bar{c}/d\bar{t} = \bar{r}(1 - \bar{c}) - (1 + \kappa)\bar{c}, \qquad (D.13)$$
the equations (4.35) that we discussed above. In particular we confirmed the validity of the fast-reactions-to-completion approximation for these equations. Because of the importance of the Michaelis-Menten approximation, let us return to unscaled variables to express the rate at which the product P is produced:
$$\frac{dp}{dt} = k\, \frac{r}{K + r} \qquad (D.14)$$
where $k = k_2 E$ and $K = (k_{-1} + k_2)/k_{+1}$. The key feature of (D.14) is that the reaction rate saturates at large $r$.
Q: Mention Hill function? Q: In Exercise give reaction scheme that leads to ODE with RHS like $1/(K + r)$?
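A simulation of the scaled system (D.13) shows the fast-slow structure behind the Michaelis-Menten approximation: $\bar{c}$ collapses quickly onto the quasi-steady state $\bar{c} = \bar{r}/(1 + \kappa + \bar{r})$, after which $\bar{r}$ drains slowly. (This sketch is our addition; the values of $\epsilon$, $\kappa$, and the initial data are illustrative only.)

```python
import numpy as np
from scipy.integrate import solve_ivp

eps, kappa = 1e-3, 2.0

def rhs(t, u):
    r, c = u
    return [eps * (-r * (1 - c) + c),        # slow: dr/dt, from (D.13)
            r * (1 - c) - (1 + kappa) * c]   # fast: dc/dt

sol = solve_ivp(rhs, [0.0, 5000.0], [1.0, 0.0],
                method="LSODA", rtol=1e-8, atol=1e-10)
r, c = sol.y
# after the transient, c tracks the quasi-steady state r/(1 + kappa + r)
print(np.max(np.abs(c[-10:] - r[-10:] / (1 + kappa + r[-10:]))))
```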
D.3 Exercises
List some exercises as must-do.
Extend theory (existence, uniqueness, differentiability) to nonautonomous equations.
Scale activator-inhibitor.
Scale Lotka-Volterra-logistic using K = 1 to scale prey.
A long one: nondimensionalize the FitzHugh-Nagumo equations. Start with (??), which has eight constants, reduce it to (4.70), which has three. You have five constants to play with in
$$\bar{x} = \frac{x + a}{X}, \qquad \bar{y} = \frac{y + b}{Y}, \qquad \bar{t} = \frac{t}{T}.$$
Q: Why doesn't multiplying the eqn by constants do anything? Is it like for a linear second-order eqn? I.e., although there are 3 possible scaling parameters, scaling x and scaling the equation do the same thing. Maybe need to put a warning when doing naive counting of parameters.
Choose a to move the local min to the origin.
Choose X to move the other root of the cubic to x = 1.
Choose T to make the coefficient of $x^3$ equal to unity.
Choose Y to make the coefficient of y in the first eqn equal to unity.
Choose b to make the constant term in the second eqn go to zero.
Lesson to learn: in some problems you can translate a variable and get further simplification. The count of dimensionless parameters is misleading in such a case.
Bibliography
[1] K. E. Atkinson, An Introduction to Numerical Analysis, 2nd ed., Wiley, New York, 1989.
[2] M. Golubitsky and D. G. Schaeffer, Singularities and Groups in Bifurcation Theory, Springer-Verlag, New York, 1988.
[3] M. H. Holmes, Introduction to Perturbation Methods, Springer-Verlag, New York, 1995.
[4] R. K. Nagle, E. B. Saff, and A. D. Snider, Fundamentals of Differential Equations and Boundary Value Problems, 6th ed., Addison Wesley, 2011.
[5] G. Strang, Introduction to Linear Algebra, 4th ed., Wellesley Cambridge Press, Wellesley, MA, 2009.
Index
absolutely convergent sequence, 42
Airy's equation, 2
alpha limit set, 181
asymptotically stable equilibrium, 134
attractor, 52
autonomous, 3, 19
Banach space, 67
bifurcation diagram, 214
blow-up of solutions, 62
cantilever beam, 12
capacitor, 16
Cauchy sequence, 67
Cauchy-Schwarz inequality, 38
center manifold, 169
circuit, electrical, 16
complete, 67
constant-coefficient system, 36
continuous, 43
continuous dependence on initial data, 20
continuously differentiable, 43
contraction, 67
contraction mapping principle, 68
convergence of a sequence, 41
diagonalizable matrices, 46
differentiable, 43
diffusion, 141
direction field, 5
distance, 41
double well potential, 14
Duffing's equation, 115, 130
eigenvalue problem, 37
energy, 16, 148
equilibrium point, 130
existence of solutions, 20, 64, 68, 76
existence, global, 86, 87
exponential of a matrix, 37
first return, 197
FitzHugh-Nagumo equations, 168
fixed point, 68
flow, 103
forward time, solution in, 74
fundamental existence theorem, 64, 68
fundamental existence theorem, nonautonomous case, 76
global existence, 86, 87
globally Lipschitz, 63
Gronwall's Lemma, 71, 86
Hamiltonian, 166
homogeneous, 3, 19
Hooke's Law, 10
hyperbolic equilibrium, 136
IC, see initial condition
index, 206
index of an equilibrium, 209
inductor, 16
inhomogeneous equation, 55
initial condition, 4
initial value problem, 4
inner product, 38
integral equation, 66
invariant, 158
IVP, see initial value problem
Jordan block, 50
Jordan curve, 177
Jordan Curve Theorem, 206
Jordan normal form, 50
kinetic energy, 16
Lasalle's Invariance Principle, 145
Leibniz rule, 44
level set, 20
Liapunov function, 143
Liapunov stability theorem, 144
Liapunov stable equilibrium, 133
limit cycle, 173
linear ODE, 3
linear system, 19
linearization, 131
Lipschitz continuity, 63, 73
locally Lipschitz, 63
locally uniformly Lipschitz, 76
logistic equation, 1, 7
Lotka-Volterra model, 17
magnets, 12
Mathieu's equation, 2
matrix exponential, 37
matrix norm, 40
maximal interval of existence, 84
Michaelis-Menten kinetics, 101
Newton's second law of motion, 9
non-degenerate equilibrium, 209
non-diagonalizable matrices, 48
norm, 41, 66
norm (for matrices), 40
norm (for vectors), 38
nullcline, 92
numerical methods, 124
ODE, see ordinary differential equation
omega limit point, 181
omega limit set, 181
orbit, 20, 147
order of a numerical method, 121
order of an ODE, 2
ordinary differential equation, 1
pendulum equation, 2, 12
periodic solution, 171
perturbation methods, 124
phase plane, 19
Poincaré map, 195
Poincaré-Lindstedt method, 187
positively invariant set, 182
potential energy, 14
potential function, 147
predator-prey, 17
Pythagorean Theorem, 38
Rayleigh number, 218
real canonical form, 49
relaxation oscillations, 194
resistor, 16
Riccati equation, 2
Routh-Hurwitz criterion, 271
Runge-Kutta-Fehlberg method, 122
saddle-node bifurcation, 221
secular term, 187
semi-group property, 104
separable ODE, 7
separatrices, 157
similar matrices, 46
simple harmonic motion, 1
simple, closed curve, 177
singularity, 2
sink, 52
solution of an ODE, 3
solution operator, 103
spring-mass system, 9
stable manifold, 151
stiff ODEs, 122
strict Liapunov function, 144
superposition principle, 4
system of ODEs, 17
total energy, 14
trace-determinant criteria, 54
trajectory, 20, 147
transcritical bifurcation, 220
trapping region, 87
triangle inequality, 39
uniqueness, 62
uniqueness of solutions, 20, 74
uniqueness theorem, 75
uniqueness theorem, nonautonomous case, 76
unstable manifold, 151
van der Pol's equation, 16, 19
variation of parameters, 55
well-posed, 21