
Classical mechanics: a minimal standard course

Sergei Winitzki
August 18, 2006
Contents
Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1. Introduction. Classical mechanics: from Newton to Arnold 6
1.1. What is classical mechanics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.2. Mathematical methods used . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.3. Newtonian mechanics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.3.1. From Newtonian mechanics to theoretical mechanics . . . . . . . . . . . . . . . . . . . . . . . 7
1.4. Overview: what is a minimal standard course . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.4.1. Prerequisites for this course . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.4.2. Core material . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.4.3. Extra material . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.5. Textbooks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
I. Core topics 10
2. A brief primer on differential equations 11
2.1. Integration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.2. Differentiation of integrals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.3. General solutions and particular solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.4. Method of variation of constants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.5. Method of separation of variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.6. Miscellaneous cases when solutions are guessed . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
3. Introducing Lagrangians 17
3.1. Mechanics considered using forces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
3.2. Introducing the action principle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
3.2.1. Questions and answers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
3.3. Variation of a functional . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
3.3.1. Intuitive calculation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
3.3.2. Calculation with variational calculus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
3.3.3. General formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
3.3.4. Euler-Lagrange equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
3.4. How to choose the Lagrangian . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3.4.1. Examples of Lagrangians . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3.5. Standard exercises in setting up Lagrangians . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
3.5.1. Advantage of Lagrangian approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
4. Further directions 25
4.1. Overview of our progress so far . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
4.2. What remains: applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
4.3. What remains: theoretical developments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
5. Motion in central force 27
6. Small oscillations 28
7. Rotation of rigid bodies 29
8. Hamiltonian formalism 30
9. Standard problems 31
9.1. Lagrangians . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
9.2. Central force . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
9.3. Small oscillations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
9.4. Rigid body rotation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
9.5. Hamiltonians . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
II. Optional topics 32
10. Scattering 33
11. More about Lagrangians 34
11.1. Why does the extremum of a functional determine motion? . . . . . . . . . . . . . . . . . . . . . . . . . 34
11.2. Why can we use arbitrary coordinates in the Lagrangian? . . . . . . . . . . . . . . . . . . . . . . . . . . 35
11.2.1. Formal derivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
11.2.2. Geometric picture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
11.3. Is the Lagrangian unique? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
12. Constrained systems: Lagrange formalism 38
12.1. What are constrained systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
12.2. Conditional minimization and Lagrange multipliers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
12.2.1. Example of using Lagrange multipliers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
12.2.2. General case . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
12.3. Motion constrained to a surface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
12.3.1. Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
12.3.2. Lagrange multipliers and constraining forces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
12.4. Constraints involving velocities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
13. Advanced canonical methods 43
13.1. Action evaluated on solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
13.2. Hamilton-Jacobi equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
13.2.1. Separation of variables in general . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
13.2.2. Separation of variables in HJ equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
13.2.3. Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
13.3. Action-angle variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
13.3.1. Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
13.4. Adiabatic invariants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
13.4.1. Change of adiabatic invariant . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
14. Perturbation theory and anharmonic oscillations 53
14.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
14.2. What is perturbation theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
14.2.1. A first example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
14.2.2. Limits of applicability of perturbation theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
14.2.3. Higher orders . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
14.2.4. Convergence and asymptotic expansions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
14.2.5. How to guess the perturbative ansatz . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
14.3. Perturbation theory for differential equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
14.3.1. Precision and limits of applicability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
14.3.2. Convergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
14.3.3. Perturbative expansions with ε = 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
14.4. Anharmonic oscillations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
14.4.1. Failure of a simple perturbation ansatz . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
14.4.2. Lindstedt-Poincaré method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
14.4.3. Precision of the approximation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
14.4.4. Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
14.5. Suggested literature . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
15. License for this text 61
15.1. GNU Free Documentation License . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
15.1.0. Applicability and definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
15.1.1. Verbatim copying . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
15.1.2. Copying in quantity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
15.1.3. Modifications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
15.2. Author's position on commercial publishing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
Preface
Copyright © 2006 by SERGEI WINITZKI. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License (version 1.2 or any later version published by the Free Software Foundation) with an Invariant Section being chapter 15, with no Front-Cover Texts, and no Back-Cover Texts (see Sec. 15.1 for the conditions). The GFDL permits, among other things, unrestricted verbatim copying of the text. The source files used to prepare a printable version of this text, as well as any updates, will be found at the author's home page.

This text is not yet finished. It is intended to be a minimal standard course of classical theoretical mechanics, providing sufficient material for advanced undergraduates or beginning graduate students who will then continue studying theoretical physics.

In this text, I show new concepts in boldface within or near the sentence where they are defined, and I italicize words for semantic emphasis only.
1. Introduction. Classical mechanics: from Newton to
Arnold
This chapter describes the place of theoretical mechanics within physics and gives an overview of this course. You can skip this chapter and start at Section 2 if you would like to begin studying new material right away.
1.1. What is classical mechanics
Classical mechanics is a branch of physics that describes the motion of point masses (very small things, or things whose size is unimportant for the present problem) and rigid bodies (large things that can rotate as a whole but cannot change their shape). This is very useful because many objects in real life can be approximately considered to be either point masses or rigid bodies in most situations.
To give a taste of the range of problems solved in classical mechanics, let me list some typical tasks:

• To determine the trajectory of a stone thrown into the air with known initial velocity. (The stone is considered to be a point mass.)

• To determine how much energy and how much time is needed to accelerate a car to a given speed. (Point mass.)

• To predict the motion of a spacecraft approaching some planet, if the initial position and the velocity of the spacecraft far from the planet are known. (The spacecraft is considered to be a point mass.)

• To determine the possible frequencies of oscillations of a system of point masses and rigid bodies connected by springs (such a system could be a single NH3 molecule or an entire Golden Gate bridge).

• A thin, heavy disc needs to rotate at 7,200 revolutions per minute. We need to determine the required strength of the engine driving the rotation, if we know the force of friction. (The disc is considered to be a rigid body rotating as a whole.)
Of course, one can also consider much more complicated problems than these. For example:

• A light spinning top stands at an angle on the surface of a heavy cylinder that can roll along a horizontal plane without sliding. Determine the initial conditions that would allow the top to avoid falling off the cylinder. (Both the cylinder and the top are treated as rigid bodies.)

• A spacecraft launched from the Earth needs to reach the surface of Mars within a certain time. Predict the most appropriate time of year for this mission and determine the smallest necessary amount of rocket fuel. (The spacecraft is considered a point mass moving in the gravitational field of the Sun, the Earth, and Mars.)
Theoretical mechanics studies general descriptions applicable to every such system from a somewhat abstract, mathematical perspective. Students of theoretical mechanics do not actually learn how to design a highway bridge or plan a spacecraft mission. Instead, one studies general principles that apply to every such situation and allow one to eventually build real, working devices of arbitrary complexity.
1.2. Mathematical methods used
Classical mechanics uses ordinary differential equations (ODEs) to formulate the laws of motion of bodies mathematically. Distances, x, y, z coordinates, angles, etc. are functions that depend on time, e.g. x(t), y(t), z(t), θ(t), φ(t), and satisfy certain (systems of) ODEs.

One can use several mathematical methods to solve such equations. In some cases, solutions can be found exactly; for instance, ẍ(t) + x(t) = 0 has the general solution A sin(t) + B cos(t). In other cases, solutions are found only in terms of integrals that one cannot evaluate in closed form. Sometimes, exact solutions cannot be found at all, but one can find approximate solutions by using a general method, such as perturbation theory. Such methods will yield an analytic formula for the approximate solution. Finally, any ODE with numerical coefficients can be solved numerically (using a computer program) up to a certain precision.
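To make the last point concrete, here is a minimal sketch in Python (my own illustration, not part of the original course) that integrates ẍ(t) + x(t) = 0 numerically with the classical fourth-order Runge-Kutta method and compares the result with the exact solution A sin(t) + B cos(t); the step size and final time are arbitrary illustrative choices.

```python
import math

def rk4_oscillator(x0, v0, t_end, h=1e-3):
    """Integrate x'' + x = 0 as the first-order system x' = v, v' = -x."""
    def f(x, v):
        return v, -x
    x, v = x0, v0
    n = int(round(t_end / h))
    for _ in range(n):
        k1x, k1v = f(x, v)
        k2x, k2v = f(x + 0.5 * h * k1x, v + 0.5 * h * k1v)
        k3x, k3v = f(x + 0.5 * h * k2x, v + 0.5 * h * k2v)
        k4x, k4v = f(x + h * k3x, v + h * k3v)
        x += h * (k1x + 2 * k2x + 2 * k3x + k4x) / 6
        v += h * (k1v + 2 * k2v + 2 * k3v + k4v) / 6
    return x

# x(0) = 1, x'(0) = 0 corresponds to A = 0, B = 1, i.e. x(t) = cos t:
numeric = rk4_oscillator(x0=1.0, v0=0.0, t_end=2.0)
exact = math.cos(2.0)
print(abs(numeric - exact))  # tiny: the global error scales as h^4
```

This is exactly the "up to a certain precision" caveat: the answer is not exact, but the error shrinks rapidly as the step size h decreases.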
Students of mechanics are expected to learn methods of solving certain standard differential equations that are exactly solvable, for instance: multidimensional harmonic oscillators, motion in a 1-dimensional force field, motion in a 3-dimensional central force field. Numerical methods for solving ODEs are important in practice but are usually not studied as part of classical mechanics, because these methods are not specific to mechanics and are equally applicable to every differential equation. Numerical methods for solving various equations are normally studied in a dedicated course that involves hands-on computer programming and problems from many branches of science and engineering.
1.3. Newtonian mechanics
The rst successful theory of classical mechanics is con-
tained in Newtons three laws of mechanics that govern
the motion of point masses:
1. There exist reference frames where a point mass not
interacting with any other bodies will move with
constant speedin a xeddirection. (If this is not true
in some reference frame, then that reference frame is
not inertial. Further laws are formulated in inertial
frames.)
2. A point mass interacting with other bodies moves
with the acceleration a found from

F = ma, where

F is the sumof all forces acting on the body, mis the


mass of the body, and a is the acceleration, i.e. the
second derivative of the position vector r with re-
spect to the time.
3. All the forces acting on a given point mass are
caused by other point masses, and there are known
formulas for each kind of force (gravitaitonal, elas-
tic, electric, magnetic, gas pressure, friction, etc.).
Whenever a point mass Aexerts a force

F on a point
mass B, the point mass B also exerts the force

F
on the point mass A.
Thus, the motion of every point mass is described by a differential equation, which can be solved directly as long as all the relevant forces can be predicted or measured. I assume that you are already familiar with these laws and with typical situations where they apply (e.g. motion of bodies thrown at an angle near the Earth) before you start studying theoretical mechanics.

In Newtonian mechanics, a rigid body is simply a collection of point masses connected by massless rigid sticks. These sticks are rigid in that they always produce exactly such forces at their ends as to keep a constant length and straight shape, regardless of any other forces or motions. Thus the motion of rigid bodies can be described on the basis of the mechanics of point masses, without introducing any other special rules. One derives the concepts of angular momentum, torque, etc., without any additional postulates.
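As a tiny illustration of the second law, the stone-throwing problem of Sec. 1.1 is solved by integrating F = ma twice: with gravity as the only force, m ẍ = 0 and m ÿ = −mg. The Python sketch below is my own illustration (not from the course); the value of g and the launch parameters are arbitrary choices.

```python
import math

g = 9.8  # gravitational acceleration in m/s^2 (illustrative value)

def stone_position(v0, angle_deg, t):
    """Position at time t from m x'' = 0, m y'' = -m g,
    integrated twice with x(0) = y(0) = 0 and speed v0 at angle_deg."""
    a = math.radians(angle_deg)
    x = v0 * math.cos(a) * t
    y = v0 * math.sin(a) * t - 0.5 * g * t * t
    return x, y

# The stone returns to y = 0 at the time of flight t = 2 v0 sin(a) / g:
v0, angle = 20.0, 45.0
t_flight = 2 * v0 * math.sin(math.radians(angle)) / g
x_land, y_land = stone_position(v0, angle, t_flight)
print(x_land, y_land)  # y_land is 0 up to rounding
```

Note that the mass m cancels out of the equations of motion, which is why it does not appear in the code at all.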
1.3.1. From Newtonian mechanics to
theoretical mechanics
The necessity to consider point masses is certainly inconvenient if one needs to describe liquids and gases, so a special branch of mechanics with its own formalism was developed for that purpose, called continuum mechanics (mechanics of continuous media). The formalism of continuum mechanics is further generalized to field theory, where the basic object is not a point mass but a field. A field is some abstract substance that is present at once at all points in space and shows its influence at every point. Examples are: the gravitational field, the electric field, the magnetic field, the pressure of gas. A field is described by a function of space and time; for example, the electric field is described by a vector-valued function E(t, r). (In field theory, one calls such a vector-valued function a vector field.) The behavior of fields is usually governed by partial differential equations; for example, the electric field E(t, x) and the magnetic field B(t, x) satisfy Maxwell's equations.
As more and more complicated problems needed to be solved, various mathematical tools were developed to simplify and to generalize the mathematical description of mechanics. Finally, the Lagrangian and the Hamiltonian formulations of mechanics were discovered. These two formulations still remain the cornerstones of classical mechanics, electrodynamics, Einstein's theory of gravitation (called General Relativity), and thus indirectly of all modern theoretical physics. These formulations of mechanics are not based on the assumption of "forces" between bodies and are equally applicable to point masses, rigid bodies, fields, and continuous media. The main subject of theoretical mechanics (sometimes also called analytical mechanics) is the study of these more refined and more general mathematical formulations of classical mechanics.
1.4. Overview: what is a minimal
standard course
The goal of this minimal standard course is to introduce the material that you absolutely need to learn if you would like to have a solid foundation for the study of theoretical physics. Much of theoretical physics is based on concepts such as the variation of the action, symmetry transformations, conservation laws, and the phase space. These concepts are normally studied in courses of theoretical mechanics (as opposed to ordinary classical mechanics that studies the motion of bodies in terms of forces, torques, and accelerations).

To study mechanics effectively, you should not only read this text and solve the exercises in it, but also take a textbook or a problem book in theoretical mechanics (or download exercise problems offered in actual courses posted on the Web) and solve at least five of those problems per chapter or topic. If you don't know how to solve standard problems, you haven't really learned mechanics. A list of sample standard problems on the core topics is found in Sec. 9. Please note that the exercises in this text were not selected to have nice-looking answers. From the point of view of physics, the number (3/√5) sin √8 is no more and no less "nice" than 2/3. Needless to say, answers in real-life problems are not always nice-looking.
1.4.1. Prerequisites for this course
• You can solve simple algebraic equations, e.g. x³ − x − 1 = 0.

• You can solve simple differential equations with initial conditions, e.g. ẍ(t) + 2x(t) = 3, x(0) = 1, ẋ(0) = 2.

• You can manipulate two-dimensional and three-dimensional vectors, compute vector sums, scalar products, vector (cross) products, projections on axes, and angles between lines.

• You are familiar with basic linear algebra (matrix multiplication, eigenvectors, diagonalization of matrices). For instance, you can easily determine the eigenvalues of the matrix

      ( 1  1  0 )
      ( 1  1  0 )
      ( 0  0  3 )

• You can compute (or quickly look up) elementary integrals such as ∫₃⁴ √(5x² + 8) dx.

• You are familiar with multivariate calculus and can compute partial derivatives, for example ∂_y √(x² + y²). (Here, ∂_y is a shorthand notation for the partial derivative ∂/∂y.)

• You are familiar with (European) school-level mechanics, including Newton's laws, motion in straight lines and in circles, and basic ideas about rotation of rigid bodies and torque.
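If you want to test yourself on the linear-algebra item, the eigenvalues of the matrix above can be checked against the characteristic equation det(A − λI) = 0. The block structure of the matrix gives the eigenvalues 0 and 2 (from the 2×2 block) together with 3. A small Python sketch of mine (not part of the course) verifies this with a hand-coded 3×3 determinant:

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def char_poly(A, lam):
    """Evaluate det(A - lam*I) for a 3x3 matrix A."""
    M = [[A[r][c] - (lam if r == c else 0) for c in range(3)] for r in range(3)]
    return det3(M)

A = [[1, 1, 0],
     [1, 1, 0],
     [0, 0, 3]]

print([char_poly(A, lam) for lam in (0, 2, 3)])  # → [0, 0, 0]
```

Each candidate eigenvalue makes the characteristic determinant vanish, confirming the answer without diagonalizing anything.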
1.4.2. Core material
The minimal standard course in theoretical mechanics consists of two interconnected tasks: the study of the Lagrangian and the Hamiltonian formalisms, and the study of mathematical methods for solving particular problems. The following are the main steps in the course:

• A general formulation of mechanics, applicable to almost all systems, can be achieved using a variational principle (also known as the action principle). All the information about the laws of motion of a particular mechanical system is encapsulated by its Lagrangian, which is a function L(qᵢ, q̇ᵢ, t). Given a Lagrangian for a mechanical system, one can straightforwardly derive the equations of motion for that system.

• You already know how simple mechanical problems can be solved by elementary methods (using forces, torques, etc.). Now the same problems are reformulated using Lagrangians. You will see that it is rather straightforward to find the Lagrangian for a particular system and to derive the equations of motion. Then it will become apparent that many problems are much easier to solve using the Lagrangian approach.

• You need to gain some experience finding Lagrangians for various situations in mechanics. Basically, you should be able to deal with any setup involving idealized objects: point masses, perfectly elastic springs, perfectly rigid sticks, unstretchable ropes, frictionless rails and pendulums, homogeneous discs, rigid cylinders rolling along perfectly smooth inclines, and so on. You need to learn how to best choose the coordinates and how to discover conservation laws in such systems. This comes only with practice; you will need to solve many practice problems.

• Small oscillations about an equilibrium point can be treated in the general case by making a harmonic approximation in the Lagrangian. You need to learn the corresponding mathematical techniques.

• Rotation of rigid bodies presents extra difficulties because of the complicated geometry. You need to learn some methods for dealing with these situations efficiently (tensor of inertia, Euler angles, rotating frames of reference).

• The Hamiltonian formulation of mechanics is derived from the Lagrangian formulation. (For solving practical problems, the Hamiltonian formalism is less useful than the Lagrangian one, although it is somewhat more elegant mathematically. However, the Hamiltonian formalism is of such extraordinary importance in theoretical physics that it is usually studied early on, within the course of theoretical mechanics.)

• Special relativity usually introduces 4-vectors (vectors in the four-dimensional spacetime). In the Lagrangian formulation, special relativity can be viewed as a perfectly ordinary mechanical theory with a different formula for the kinetic energy in the Lagrangian, without any 4-vectors. However, you absolutely need to develop some physical intuition for relativistic effects and also become familiar with the four-dimensional description of the world as a spacetime. Without special relativity and the four-dimensional view of the world, the road to much of theoretical physics is closed. Therefore, a Lagrangian description using 4-vectors is also studied at this point.
Regarding the mathematical methods needed to actually
solve practical problems:
• You need to learn some basic ideas of the calculus of variations. In particular, you need to understand the concept of a functional, and to learn to compute functional derivatives (i.e. variations of functionals). Otherwise you cannot really understand the use of the variational principle, which is very important in all of theoretical physics.

• You need to learn how to solve various differential equations and systems of such equations. The typical equation is that of the harmonic oscillator with a driving force, ẍ + ω²x = f(t). Typically, you need to be able to find general solutions as well as particular solutions for given initial conditions. At this point, we study only equations that can be solved exactly.

• You need to gain experience analyzing the behavior of solutions of various types of equations: linear systems, oscillations and resonance, and some simple kinds of solvable nonlinear equations such as the Kepler problem. These are very old problems that occur time and again in physics, so you should become very familiar with the qualitative features of their solutions.

You usually need to solve several practice problems for each mathematical method, so that you can gain experience and really understand how and when these methods work.
1.4.3. Extra material
The above topics are considered standard because one cannot continue studying theoretical physics without having mastered them. In a course of theoretical mechanics, one can also study more advanced topics that build upon the standard material and lead to other areas of physics. Here are some such topics:

• Elements of scattering theory: elastic/inelastic scattering; cross-sections in central potentials; relativistic scattering (preparation for particle physics).

• Elements of perturbation theory: applications to nonlinear oscillations, parametric resonance (useful for celestial mechanics and other things).

• A general definition of symmetry and the derivation of conservation laws from symmetry (Noether's theorem). This is an essential preparation for gauge field theory, which is the basis for particle physics.

• Advanced Hamiltonian methods: differential forms, integral invariants, symplectic structures (tools to discover conservation laws and to explain the structure of the theory more deeply; useful in field theory).

• Canonical transformations, Hamilton-Jacobi equation (tools to discover integrable systems, needed mostly in mathematical physics).

• The action-angle variables, questions of integrability, adiabatic invariants (preparation for chaos theory).

• Hamiltonian formalism for constrained degenerate systems (preparation for gauge field theory).

• Use of quaternions instead of Euler angles to describe the orientation of rigid bodies (useful e.g. for numerical simulations of aircraft motion).

• Symmetries and small oscillations in multi-dimensional systems (e.g. oscillations of molecules).

• Elements of continuum mechanics: continuity equations, stress-energy tensor, Euler equation (useful in hydrodynamics, electrodynamics of media, kinetics, etc.).

Some of these topics are usually included, depending on the lecturer's preference; none of these are absolutely essential for an initial study of theoretical mechanics. The history of theoretical mechanics is usually not studied; rather, students learn the contemporary, very much streamlined and simplified formulation of mechanics.
1.5. Textbooks
There exist an extraordinarily large number of textbooks on theoretical mechanics, because it is a fairly old and well-studied subject. There are also many online lecture notes on theoretical mechanics. You need any textbook on classical mechanics that you can understand and that talks about Lagrangians and Hamiltonians at length. (Books that only talk about accelerations, forces, and torques may be quite advanced, but they do not cover the subject of theoretical mechanics.) If you start reading a textbook and things don't make sense, consult another book. Here are some starting points:

• H. Goldstein, Classical Mechanics. A standard old classic.

• V. I. Arnold, Mathematical Methods of Classical Mechanics. Excellent for more mathematically minded students (e.g. it heavily relies on differential forms).
Part I.
Core topics
2. A brief primer on differential equations
You may skip this primer and go to the next section if you can easily solve the following differential equations with initial conditions:

    dx(t)/dt = 4x(t) + 8t,   x(0) = 0
    d²x(t)/dt² + 15x(t) = 10,   x(0) = 1, ẋ(0) = 0
    dx(t)/dt = 3t √x(t),   x(0) = 1

If these examples seem difficult, you need to refresh your knowledge of calculus and differential equations. This section is a crash course which serves as a refresher but does not substitute for a good background in calculus.
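For instance, for the first equation, dx/dt = 4x(t) + 8t with x(0) = 0, one candidate solution is x(t) = ½e^(4t) − 2t − ½ (this closed form is my own worked answer, not given in the text; it combines the homogeneous solution C e^(4t) with the particular solution −2t − ½). A quick Python sketch checks it by direct substitution:

```python
import math

def x(t):
    # Candidate solution of dx/dt = 4x + 8t with x(0) = 0:
    return 0.5 * math.exp(4 * t) - 2 * t - 0.5

def dxdt(t):
    # Derivative of the candidate solution, computed by hand:
    return 2 * math.exp(4 * t) - 2

assert x(0) == 0.0                     # initial condition holds
for t in (0.0, 0.3, 1.0):
    # the ODE dx/dt = 4x + 8t holds at sample points:
    assert abs(dxdt(t) - (4 * x(t) + 8 * t)) < 1e-9
print("ODE and initial condition check out")
```

Substituting a candidate solution back into the equation, as done here, is always a worthwhile habit, whether the check is done by hand or by machine.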
2.1. Integration
Integration is the operation which is inverse to differentiation. This operation is very important for theoretical physics, and you should become familiar with computing integrals.
Definite integrals are written as follows: ∫ₐᵇ f(x) dx. The indefinite integral, denoted by ∫ f(x) dx or by ∫ˣ f(x) dx, is merely a shorthand for the integral of f(x′) dx′ from x₀ to x, where x₀ is not used in later calculations and/or is chosen for convenience in some natural way.
Some basic integrals:

    ∫ tⁿ dt = tⁿ⁺¹/(n + 1);   ∫ eˣ dx = eˣ;   ∫ dx/x = ln x;
    ∫ cos x dx = sin x;   ∫ sin x dx = −cos x;
    ∫ dx/(1 + x²) = arctan x;   ∫ dx/√(1 − x²) = arcsin x.
Change of variables: suppose ∫ f(x) dx is known; then

    ∫ f(ax) dx = (1/a) ∫ f(ax) d(ax) = (1/a) ∫ f(z) dz.

More generally, introduce a new variable z defined by x = g(z), where g(z) is a function; then

    ∫ f(x) dx = ∫ f(g(z)) (dg(z)/dz) dz.
Examples of integrals computed by change of variables:

    ∫ dz/(a + bz) = (1/b) ln(a + bz)    [here x = g(z) = a + bz, f(x) = 1/x]
    ∫ (dz/z²) sin(1/z) = cos(1/z)    [here x = g(z) = 1/z, f(x) = sin x]
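The second example can be verified numerically: the integral of sin(1/z)/z² over, say, [1, 2] must equal cos(1/2) − cos(1). A quick Python check (my own illustration, not from the text; the interval and the midpoint rule are arbitrary choices):

```python
import math

def midpoint_integral(f, a, b, n=2000):
    """Simple midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return sum(f(a + (k + 0.5) * h) for k in range(n)) * h

# Antiderivative cos(1/z) comes from the substitution x = g(z) = 1/z:
numeric = midpoint_integral(lambda z: math.sin(1 / z) / z**2, 1.0, 2.0)
exact = math.cos(0.5) - math.cos(1.0)  # cos(1/z) at z = 2 minus at z = 1
print(abs(numeric - exact))  # small
```

Comparing a crude numerical quadrature against the closed form is a cheap sanity check on any change-of-variables computation.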
Integration by parts is another useful technique: one starts from the identity

    d/dx (f(x)g(x)) = f(x) (d/dx)g(x) + g(x) (d/dx)f(x)

and integrates in x from x = a to x = b. One finds

    ∫ₐᵇ f(x) g′(x) dx = f(x)g(x)|ₐᵇ − ∫ₐᵇ f′(x) g(x) dx.

For example (here we do not write the limits of integration):

    ∫ (ln x) dx = ∫ (x)′ (ln x) dx = x ln x − ∫ x (ln x)′ dx = x ln x − x.
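The result ∫ ln x dx = x ln x − x is easy to check numerically: integrate ln x over some interval and compare with the antiderivative. A Python sketch (my own illustration; the interval [1, 2] and the Simpson rule are arbitrary choices):

```python
import math

def simpson(f, a, b, n=1000):
    """Composite Simpson rule; n must be even."""
    h = (b - a) / n
    s = f(a) + f(b)
    for k in range(1, n):
        s += (4 if k % 2 else 2) * f(a + k * h)
    return s * h / 3

antiderivative = lambda x: x * math.log(x) - x  # from integration by parts
numeric = simpson(math.log, 1.0, 2.0)
exact = antiderivative(2.0) - antiderivative(1.0)  # = 2 ln 2 - 1
print(abs(numeric - exact))  # very small
```

The two numbers agree to many digits, which is strong evidence that no sign or factor was lost in the integration by parts.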
Note: trigonometric functions are sometimes more conveniently written in terms of complex exponentials (using the Euler formula):

    e^(ix) = cos x + i sin x    (Euler's formula)
    cos x = (e^(ix) + e^(−ix))/2;   sin x = (e^(ix) − e^(−ix))/(2i).
Exercises: Compute the following indefinite integrals. Check each answer by computing its derivative.
- $\int e^{2x}\sin x\,dx = ?$
- $\int t^2 \sin t\,dt = ?$
- $\int \sin^2 x\,dx = ?$
- $\int \frac{dx}{\sqrt{2x+1}} = ?$
- $\int x\sqrt{3x+1}\,dx = ?$
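Answers to such exercises can also be checked mechanically. The following Python sketch (not part of the original text; the helper name `check_antiderivative` is my own) compares a candidate antiderivative $F$ against the integrand $f$ by a central finite difference:

```python
import math

def check_antiderivative(F, f, points, h=1e-5, tol=1e-6):
    """Return True if F'(x) matches f(x) at the given points,
    using a central finite difference to approximate the derivative."""
    return all(abs((F(x + h) - F(x - h)) / (2 * h) - f(x)) < tol
               for x in points)

# The integration-by-parts example above: d/dx (x ln x - x) = ln x
assert check_antiderivative(lambda x: x * math.log(x) - x,
                            math.log, [0.5, 1.0, 2.0, 5.0])

# One of the basic integrals: d/dx (-cos x) = sin x
assert check_antiderivative(lambda x: -math.cos(x), math.sin, [0.1, 1.0, 2.5])
```

A wrong candidate fails the same check, which makes this a convenient way to catch sign errors.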
2.2. Differentiation of integrals
An indefinite integral is a function of its upper limit, for example:
$$F(x) = \int_0^x f(y)\,dy.$$
2. A brief primer on differential equations
By definition, integration is the operation inverse to differentiation; so this function $F(x)$ is such that
$$f(x) = \frac{dF(x)}{dx} = \frac{d}{dx}\int_0^x f(y)\,dy.$$
What if we integrate from $x$ to a fixed number, rather than from a fixed number to $x$? For example, what is
$$\frac{d}{dx}\int_x^1 f(y)\,dy = ?$$
To figure this out, we write
$$\int_0^1 f(y)\,dy = \left(\int_0^x + \int_x^1\right) f(y)\,dy,$$
where the choice of fixed bounds (0 and 1) could be different and is made just as an illustration. Since $\int_0^1 f(y)\,dy$ is a fixed, $x$-independent number, we have
$$\frac{d}{dx}\int_0^1 f(y)\,dy = 0,$$
hence
$$\frac{d}{dx}\int_x^1 f(y)\,dy = -\frac{d}{dx}\int_0^x f(y)\,dy = -f(x).$$
Now let us consider somewhat more complicated examples. The chain rule gives
$$\frac{d}{dx}\int_0^{g(x)} f(y)\,dy = g'(x)\,f(g(x)).$$
What if the variable $x$ is contained also in the integrand? For instance, if the function $F(x)$ is defined by
$$F(x) \equiv \int_0^x f(x, y)\,dy,$$
then what is $dF/dx$? To compute this, let us consider $F(x)$ as a function of two auxiliary variables, $A$ and $B$,
$$G(A, B) = \int_0^A f(B, y)\,dy.$$
If we now compute $G(A, B)$, then we will have $F(x) = G(x, x)$, and the derivative $dF/dx$ will be expressed as
$$\frac{dF}{dx} = \frac{d}{dx}\left(G(A, B)\big|_{A=B=x}\right) = \left(\frac{\partial G}{\partial A} + \frac{\partial G}{\partial B}\right)\bigg|_{A=B=x}.$$
So we obtain the following useful formula:
$$\frac{d}{dx}\int_0^x f(x, y)\,dy = f(x, x) + \int_0^x \frac{\partial f(x, y)}{\partial x}\,dy.$$
In this way, one can differentiate arbitrary expressions
involving integrals, even if one is unable to compute
these integrals analytically.
Exercise: Compute the following derivatives of in-
tegrals (but do not actually evaluate any of these inte-
grals!).
d
dx
_
4x+3
3x+1
exp
_
z
_
dz =?
d
dx
_
sinx
0
ln
_
cosh

1 + 2xz
_
dz =?
d
dx
_
g(x)
f(x)
F(x +y)dy =?
d
dx
_
_
R
1
x
g(t)dt
R
x
0
f(t)dt
F
__
y
x
G(z)dz
_
dy
_
=?
Here, F(x) and G(x) are considered to be known func-
tions. The nal expressions may contain derivatives of
F(x) and G(x) and integrals of these derivatives, but no
derivatives of integrals.
2.3. General solutions and particular solutions
In this section, differential equations are written for an unknown function $x(t)$. Derivatives are denoted by overdots: $\dot{x} \equiv \frac{dx}{dt}$, $\ddot{x} \equiv \frac{d^2 x}{dt^2}$, etc.
The general solution of a differential equation is a function that solves the equation and contains arbitrary constants, such that every solution can be expressed by selecting particular values of the constants. For equations with first derivatives (first-order equations) it suffices to have only one constant; for second-order equations, one needs two constants, etc.
It is easy to check that the general solution is correct: just substitute the function back into the equation.
Examples:
- Find the general solution of the equation $\dot{x}(t) = 0$. Answer: $x(t) = A$, where $A$ is an arbitrary constant.
- Find the general solution of $\dot{x} = 2$. Answer: $x(t) = A + 2t$.
- Find the general solution of $\dot{x} = 4t + \cos 2t$. Answer: $x(t) = A + 2t^2 + \frac{1}{2}\sin 2t$.
- Find the general solution of $\dot{x} + x = 0$. Answer: $x(t) = Ae^{-t}$.
- Find the general solution of the second-order equation $\ddot{x} = 0$. Answer: $x(t) = A + Bt$.
- Find the general solution of $\ddot{x} + 7x = 0$. Answer: $x(t) = A\cos(t\sqrt{7}) + B\sin(t\sqrt{7})$.
- Find the general solution of $\ddot{x} - 7x = 0$. Answer: $x(t) = Ae^{t\sqrt{7}} + Be^{-t\sqrt{7}}$.
- Find the general solution of $\ddot{x} + 5\dot{x} + 4x = 0$. Solution: we look for $x(t) = Ae^{\lambda t}$. Then $\lambda$ must be such that $\lambda^2 + 5\lambda + 4 = 0$. This quadratic equation has two solutions, $\lambda = -1$ and $\lambda = -4$. So the general solution is $x(t) = Ae^{-t} + Be^{-4t}$.
Exercises: Find the general solution of the following equations. Check each answer by substituting it into the equation.
- $\dot{x} = 2x$
- $\dot{x} = 3t + e^t$
- $\ddot{x} = x$
- $\ddot{x} = -10x$
- $\ddot{x} = 2006\,\dot{x} + 2006\,x$
Particular solutions are selected from general solutions by conditions, such as the initial conditions:
$$\ddot{x} = 1, \qquad x(0) = 1, \quad \dot{x}(0) = 2. \tag{2.1}$$
The general solution of $\ddot{x} = 1$ is $x(t) = A + Bt + \frac{1}{2}t^2$. The conditions $x(0) = 1$, $\dot{x}(0) = 2$ are satisfied only if $A = 1$, $B = 2$. Therefore, the particular solution of equation (2.1) is $x(t) = 1 + 2t + \frac{1}{2}t^2$.
To find particular solutions, we first find the general solution with arbitrary constants and then determine the values of these constants using the initial conditions.
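This procedure can be cross-checked by direct numerical integration. Below is a Python sketch (my own illustration; the integrator and its names are not from the text) that integrates Eq. (2.1) with a leapfrog step and compares the result with the particular solution found above:

```python
def integrate(acc, x0, v0, t_end, n=1000):
    """Integrate x'' = acc(t, x) from x(0) = x0, x'(0) = v0 (leapfrog scheme)."""
    dt = t_end / n
    x, v, t = x0, v0, 0.0
    for _ in range(n):
        v += 0.5 * dt * acc(t, x)   # half-step for the velocity
        x += dt * v                 # full step for the position
        t += dt
        v += 0.5 * dt * acc(t, x)   # second half-step for the velocity
    return x

# Eq. (2.1): x'' = 1 with x(0) = 1, x'(0) = 2; solution x(t) = 1 + 2t + t^2/2
t = 1.5
assert abs(integrate(lambda t, x: 1.0, 1.0, 2.0, t) - (1 + 2 * t + 0.5 * t * t)) < 1e-9
```

For a constant acceleration the leapfrog step reproduces the quadratic trajectory exactly, so the agreement here is limited only by rounding.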
Exercises: Solve the following equations with initial conditions. Draw rough, qualitative plots of the resulting functions $x(t)$.
- $\dot{x} + 3x = 0$, $x(0) = 2$
- $\dot{x} - x = 0$, $x(2) = 1$
- $\ddot{x} - 4x = 0$, $x(0) = 1$, $\dot{x}(0) = 1$
- $\ddot{x} = 4t$, $x(0) = 1$, $\dot{x}(0) = 1$
Note that there are usually infinitely many functions that satisfy a given differential equation. The general solution represents all these functions by means of a formula with arbitrary constants. A particular solution is selected by conditions, and one needs as many conditions as unknown constants. So, e.g. for a second-order differential equation, there are two arbitrary constants, and one needs two conditions to specify a unique solution. Instead of specifying two conditions at the same value of time, such as $x(0) = \dot{x}(0) = 0$, one can specify two conditions at different values of time, e.g. $x(0) = 1$, $x(1) = 2$, and then ask for the solution in the interval $0 < t < 1$ that satisfies these conditions. Such conditions are called boundary conditions. Boundary-value problems are differential equations with boundary conditions rather than initial conditions.
Exercises: Solve the following boundary-value problems.
- $3\ddot{x} + 4x = 0$, $x(0) = 0$, $x(2) = 5$
- $\ddot{x} = 4e^t$, $x(0) = 1$, $x(1) = 0$
- $\ddot{x} + 4x = 0$, $x(0) = 0$, $x(\pi) = \pi$
- $\ddot{x} + x = 0$, $x(0) = 2$, $x(2\pi) = 2$
Hint: The last two equations have tricky boundary conditions!
Some simple inhomogeneous equations
Equations of the form $\dot{x} = Bx$ have the general solution $x(t) = Ae^{Bt}$. Note that $x(t) = 0$ is a solution (although perhaps not a very interesting one). What about $\dot{x} = Bx + 10$? Now $x(t) = 0$ is not a solution any more. Equations for $x$ are called inhomogeneous if $x = 0$ is not a solution.
In fact, the general solution of $\dot{x} = Bx + 10$ is $x(t) = Ae^{Bt} - \frac{10}{B}$. How to guess this? Write $x(t) = Ae^{Bt} + C$ and substitute into $\dot{x} = Bx + 10$, then find the value of $C$ that fits.
Exercises: Solve the following equations with initial conditions.
- $\dot{x} + 3x = 2$, $x(0) = 1$
- $\ddot{x} + 5\dot{x} + 4x = 8$, $x(0) = 2$, $\dot{x}(0) = 0$
- $2\ddot{x} + 18x = 1$, $x(1) = 1$, $\dot{x}(1) = 0$
- $\ddot{x} + 9\dot{x} = 2$, $x(0) = 0$, $\dot{x}(0) = 3$
Hint: In the last equation, replace $\dot{x}(t)$ by an unknown function $v(t)$.
Let us now consider second-order equations. We know that $\ddot{x} + \omega^2 x = 0$ has the general solution $x(t) = A_1\cos\omega t + A_2\sin\omega t$. How to solve an inhomogeneous equation such as
$$\ddot{x} + \omega^2 x = \alpha + \beta t\,? \tag{2.2}$$
Try substituting $x(t) = C_1 + C_2 t$ into Eq. (2.2) and determine the correct values of $C_1$ and $C_2$. We find the solution $x(t) = (\alpha + \beta t)\,\omega^{-2}$. But this is not the general solution since it does not contain any arbitrary constants; this is just a particular solution. Now note that
$$x(t) = (\alpha + \beta t)\,\omega^{-2} + \sin\omega t$$
is also a solution of Eq. (2.2). Note also that $\sin\omega t$ is a solution of the homogeneous version of Eq. (2.2), i.e.
$$\ddot{x} + \omega^2 x = 0,$$
which is Eq. (2.2) without the right-hand side. Therefore in fact we may add the general solution, $A\cos\omega t + B\sin\omega t$, of the homogeneous equation to the particular solution $(\alpha + \beta t)\,\omega^{-2}$. The result is
$$x(t) = (\alpha + \beta t)\,\omega^{-2} + A\cos\omega t + B\sin\omega t.$$
This is the general solution of Eq. (2.2).
This is a manifestation of a general principle that helps with inhomogeneous equations. The general solution of the inhomogeneous equation is equal to the general solution of the equation without the right-hand side (the homogeneous equation) plus a particular solution of the equation with the right-hand side (the inhomogeneous equation). The particular solution of the inhomogeneous equation can be guessed; any such solution will do.
Exercises: Solve the following inhomogeneous equations with initial conditions.
- $\ddot{x} + 2x = 3t - 1$, $x(0) = \dot{x}(0) = 0$
- $\ddot{x} - 4x = t^2 - 4t$, $x(0) = \dot{x}(0) = 0$
- $\ddot{x} + \dot{x} + x = t^3 + t^2 + t$, $x(0) = \dot{x}(0) = 0$
2.4. Method of variation of constants
How to solve $\dot{x} = Bx + f(t)$, where $f(t)$ is some complicated function?
Such equations are solved with the method of variation of constants. The solution is found as $x(t) = A(t)e^{Bt}$, where $A(t)$ is an unknown function. Substituting into $\dot{x} = Bx + f(t)$, we have
$$\dot{A}e^{Bt} + BAe^{Bt} = BAe^{Bt} + f(t),$$
therefore the function $A(t)$ satisfies the equation $\dot{A} = e^{-Bt}f(t)$. Its general solution is
$$A(t) = \int_{t_0}^{t} e^{-Bt_1} f(t_1)\,dt_1,$$
where $t_0$ is an arbitrary constant. Therefore, the general solution for $x(t)$ is
$$x(t) = e^{Bt}\int_{t_0}^{t} e^{-Bt_1} f(t_1)\,dt_1.$$
This can also be rewritten as
$$x(t) = Ae^{Bt} + e^{Bt}\int_0^{t} e^{-Bt_1} f(t_1)\,dt_1 = Ae^{Bt} + \int_0^{t} e^{B(t - t_1)} f(t_1)\,dt_1,$$
where now $A$ is an arbitrary constant.
Example: Solve $\dot{x} + x = t$.
Solution: We first write $x(t) = A(t)e^{-t}$; the function $A(t)$ is found from $\dot{A} = te^{t}$, so $A(t) = \int te^{t}\,dt = te^{t} - e^{t} + C$. So the general solution is $x(t) = A(t)e^{-t} = t - 1 + Ce^{-t}$. We could also guess this general solution by writing $x(t) = C_1 t + C_2 + Ae^{-t}$ and finding the correct values of $C_1$, $C_2$.
Exercises: Solve the following equations with initial conditions.
- $\dot{x} - 2x = 3t^2 - 2t$, $x(0) = 1$
- $\dot{x} + 2x = e^{-t}$, $x(0) = 1$
- $\dot{x} + x = e^{-t}$, $x(0) = 1$
More general equations:
$$\dot{x} - x f(t) = 0 \quad\Longrightarrow\quad x = A\,e^{\int f(t)\,dt}.$$
For example: $\dot{x} + 2tx = 0$ has the general solution $x(t) = Ae^{-t^2}$.
Exercises: Find the general solution of the following equations.
- $\dot{x} + t^2 x = 0$
- $\dot{x} + e^t x = e^t$
The general solution for an equation of the form
$$\dot{x} - x f(t) = g(t)$$
is
$$x(t) = e^{\int_{t_0}^{t} f(u)\,du}\left[C + \int_{t_0}^{t} g(u)\,e^{-\int_{t_0}^{u} f(v)\,dv}\,du\right].$$
This solution can be obtained by variation of the constant in the ansatz $A(t)\,e^{\int f(t)\,dt}$. The value $t_0$ is arbitrary and can be chosen for convenience. We wrote the solution using two constants ($t_0$ and $C$) because this form of the solution is more convenient; in fact, only one of these constants is independent: a change in $C$ can be compensated by an appropriate change in $t_0$. An equivalent form of the general solution with just one constant ($t_0$) is
$$x(t) = \int_{t_0}^{t} g(u)\,e^{\int_{u}^{t} f(v)\,dv}\,du.$$
This form is convenient if the initial condition is $x(t_0) = 0$.
2.5. Method of separation of variables
A useful method applies to differential equations of the form
$$\dot{x} = f(x)\,g(t). \tag{2.3}$$
For example, after some rearrangement, the differential equations $\dot{x} = x^2 t^2$ and $\dot{x}\sqrt{x} = \cos 2t$ are of this form.
To solve these equations, we use the trick called separation of variables. We look for the solution of the form $F(x(t)) = G(t)$, where $F$ and $G$ are some unknown functions. (This is called an implicit form of the solution because the relation $F(x) = G(t)$ still needs to be solved for $x$ to find $x(t)$ explicitly.) If the solution were found in this form, then we would have
$$\frac{d}{dt}F(x(t)) = \frac{d}{dt}G(t),$$
which is equivalent to
$$\frac{dF(x)}{dx}\,\frac{dx}{dt} = \frac{dG}{dt}.$$
This will be equivalent to the original differential equation, $\dot{x} = f(x)g(t)$, if the functions $F(x)$, $G(t)$ satisfy
$$\frac{dF(x)}{dx} = \frac{1}{f(x)}, \qquad \frac{dG(t)}{dt} = g(t).$$
These equations are easy to solve:
$$F(x) = \int^x \frac{dx}{f(x)}, \qquad G(t) = \int^t g(t)\,dt.$$
If we can compute these functions $F(x)$ and $G(t)$, we will find the general solution of Eq. (2.3) as $F(x) = G(t)$. In any case, we may write the general solution of Eq. (2.3) in an implicit form as follows,
$$\int_{x_0}^{x} \frac{dx}{f(x)} = \int_{t_0}^{t} g(t)\,dt.$$
Here, $x_0$ and $t_0$ are arbitrary constants of integration (of which only one is independent). The above solution satisfies the initial condition $x(t_0) = x_0$.
Example: Consider the equation $\dot{x} = x^2 t^2$. We write
$$\frac{\dot{x}}{x^2} = t^2 \;\Longrightarrow\; \int\frac{dx}{x^2} = \int t^2\,dt \;\Longrightarrow\; -\frac{1}{x} + C = \frac{t^3}{3} \;\Longrightarrow\; x(t) = \frac{1}{C_1 - \frac{1}{3}t^3}.$$
Note that only one constant of integration is necessary, despite the presence of two indefinite integrals.
Exercises: Find the general solution of the following equations using the method of separation of variables.
- $\dot{x} = \frac{x}{t}$
- $\frac{dy(x)}{dx} = \frac{1+y}{1+x}$
- $\sqrt{x(t)}\,\frac{dx(t)}{dt} = \frac{1}{5t}$
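It is always worth checking an implicit solution by substitution. Here is a small Python sketch (my own check, with an arbitrary value of the integration constant) verifying that $x(t) = 1/(C_1 - t^3/3)$ from the worked example solves $\dot{x} = x^2 t^2$:

```python
C1 = 2.0   # arbitrary constant of integration

def x(t):
    return 1.0 / (C1 - t**3 / 3.0)

def xdot(t, h=1e-6):   # central finite difference for x'(t)
    return (x(t + h) - x(t - h)) / (2 * h)

# check x' = x^2 t^2 at several times (away from the pole at t = (3*C1)**(1/3))
for t in (0.0, 0.5, 1.0, 1.5):
    assert abs(xdot(t) - x(t)**2 * t**2) < 1e-5
```

Note that the solution blows up at a finite time, a typical feature of nonlinear first-order equations.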
2.6. Miscellaneous cases when solutions are guessed
One case is second-order equations with a source. (In physics, a source is a nonzero function on the right-hand side of a linear differential equation.) We guess the particular solution of the inhomogeneous equation by writing an ansatz with unknown coefficients. Here are some examples:
- $\ddot{x} + x = \cos(2t) \;\Rightarrow\; x(t) = C_1\cos(2t)$, then find $C_1$
- $\ddot{x} + 4\dot{x} + 3x = \sin t \;\Rightarrow\; x(t) = C_1\sin t + C_2\cos t$, then find $C_{1,2}$
- $\ddot{x} + x = \sin t \;\Rightarrow\; x(t) = C_1 t\cos t$, then find $C_1$
Note: in the last example, we need a term t cos t because
sin t and cos t are already solutions of the homogeneous
equation!
Another example:
$$\ddot{x} + 2\dot{x} + x = 0.$$
We look for solutions in the form $x(t) = e^{\lambda t}$, find $\lambda^2 + 2\lambda + 1 = 0$ and only one root, $\lambda = -1$. Then we again use the trick with an extra factor $t$: the general solution is not $Ce^{-t}$ with one constant, but $x(t) = C_1 e^{-t} + C_2 te^{-t}$ with two constants (as it should be for a second-order equation).
Exercises: Solve the following equations with initial conditions.
- $\ddot{x} + 4x = \sin t + 2t$, $x(0) = \dot{x}(0) = 1$ (First guess the particular solution for the source $\sin t$, then for the source $2t$.)
- $\ddot{x} + 4x = \sin(2t) + 1$, $x(0) = \dot{x}(0) = 1$
- $\ddot{x} - 4\dot{x} + 4x = 2$, $x(0) = \dot{x}(0) = 1$
More generally, suppose we need to find the solution of the inhomogeneous equation
$$A\ddot{x} + B\dot{x} + Cx = Ke^{\gamma t}, \tag{2.4}$$
where $A$, $B$, $C$, $K$, $\gamma$ are (perhaps complex-valued) constants. (This also covers sources of the form $\sin\omega t$.) First, we need to find the general solution of the homogeneous equation
$$A\ddot{x} + B\dot{x} + Cx = 0.$$
We look for the solution $x(t) = e^{\lambda t}$. Then $\lambda$ should satisfy the characteristic equation, $A\lambda^2 + B\lambda + C = 0$. This quadratic equation may have either two unequal roots $\lambda_1 \neq \lambda_2$, or just one root $\lambda_1$. The general solution will be
$$C_1 e^{\lambda_1 t} + C_2 e^{\lambda_2 t}, \quad \text{if } \lambda_1 \neq \lambda_2;$$
$$(C_1 + C_2 t)\,e^{\lambda_1 t}, \quad \text{if } \lambda_1 = \lambda_2.$$
Now we need to examine the source term, $Ke^{\gamma t}$, and to guess the particular solution of the inhomogeneous equation. We distinguish two cases, depending on the value of $\gamma$. If $\gamma$ is not equal to either of $\lambda_1$, $\lambda_2$, then $e^{\gamma t}$ is not a solution of the homogeneous equation; therefore we may look for an ansatz of the form $\alpha e^{\gamma t}$. By substitution we find
$$\alpha = \frac{K}{A\gamma^2 + B\gamma + C}.$$
Then the general solution of Eq. (2.4) is
$$x(t) = C_1 e^{\lambda_1 t} + C_2 e^{\lambda_2 t} + \alpha e^{\gamma t}, \quad\text{or}\quad x(t) = (C_1 + C_2 t)\,e^{\lambda_1 t} + \alpha e^{\gamma t}.$$
If $\gamma$ is equal to one of $\lambda_1$, $\lambda_2$, then $\gamma$ is a root of $A\lambda^2 + B\lambda + C = 0$, so the above formula for $\alpha$ will not work. In this case, if $\gamma = \lambda_1 \neq \lambda_2$, the ansatz is $x(t) = \alpha te^{\gamma t}$, where
$$\alpha = \frac{K}{2A\gamma + B}.$$
The general solution of Eq. (2.4) is
$$x(t) = \left(C_1 + \frac{Kt}{2A\gamma + B}\right)e^{\lambda_1 t} + C_2 e^{\lambda_2 t}.$$
Finally, if $\gamma = \lambda_1 = \lambda_2$, the ansatz is $x(t) = \alpha t^2 e^{\gamma t}$, where $\alpha = K/(2A)$, so the general solution is
$$x(t) = \left(C_1 + C_2 t + \frac{Kt^2}{2A}\right)e^{\gamma t}.$$
By combining terms obtained in this way, it is possible to construct general solutions for equations of the form (2.4) with arbitrary sources built from terms of the form $Ke^{\gamma t}$. Similarly, one can find solutions for sources of the form $te^{\lambda_1 t} + t^2 e^{\lambda_2 t}$, etc.
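The undetermined-coefficient formula above is easy to spot-check numerically. In this Python sketch (sample constants are my own; `g` stands for the exponent $\gamma$), we verify that $x(t) = \alpha e^{\gamma t}$ with $\alpha = K/(A\gamma^2 + B\gamma + C)$ satisfies Eq. (2.4) when $\gamma$ is not a characteristic root:

```python
import math

A, B, C, K, g = 1.0, 5.0, 4.0, 2.0, 1.0   # sample constants; g plays the role of gamma

alpha = K / (A * g**2 + B * g + C)         # ansatz coefficient: x(t) = alpha e^{g t}

def x(t):
    return alpha * math.exp(g * t)

def d1(f, t, h=1e-5):   # first derivative by central differences
    return (f(t + h) - f(t - h)) / (2 * h)

def d2(f, t, h=1e-4):   # second derivative by central differences
    return (f(t + h) - 2 * f(t) + f(t - h)) / h**2

# check A x'' + B x' + C x = K e^{g t}; the roots here are -1 and -4, so g = 1 is fine
for t in (0.0, 0.7, 1.5):
    assert abs(A * d2(x, t) + B * d1(x, t) + C * x(t) - K * math.exp(g * t)) < 1e-4
```

Choosing `g` equal to a root (here $-1$ or $-4$) makes the denominator vanish, which is exactly the resonant case requiring the extra factor of $t$.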
Systems of differential equations: for example,
$$\dot{x} = y, \qquad \dot{y} = 2x$$
may be solved either by differentiation, $\ddot{x} = \dot{y} \Rightarrow \ddot{x} = 2x$, or by directly guessing the solution in the form $x(t) = C_1 e^{\lambda t}$, $y(t) = C_2 e^{\lambda t}$. Note: since these equations are linear, you should add all the possible pieces of the general solution with different values of $\lambda$.
Exercise: By guessing solutions in the form $x(t) = C_1 e^{\lambda t}$, $y(t) = C_2 e^{\lambda t}$, find the general solution of the system
$$\dot{x} + x = y, \qquad \dot{y} + y = 3x.$$
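The exponential-ansatz recipe can be illustrated on the first system above, $\dot{x} = y$, $\dot{y} = 2x$: substituting $x = Ce^{\lambda t}$ gives $\lambda^2 = 2$, so $\lambda = \pm\sqrt{2}$. The following Python sketch (constants chosen arbitrarily, my own illustration) confirms that the resulting functions satisfy both equations:

```python
import math

s = math.sqrt(2.0)        # the two roots are lambda = +s and lambda = -s
C1, C2 = 0.6, -0.4        # arbitrary constants

def x(t):
    return C1 * math.exp(s * t) + C2 * math.exp(-s * t)

def y(t):                 # from the first equation, y = x'
    return s * C1 * math.exp(s * t) - s * C2 * math.exp(-s * t)

def deriv(f, t, h=1e-6):  # central finite difference
    return (f(t + h) - f(t - h)) / (2 * h)

for t in (0.0, 0.5, 1.2):
    assert abs(deriv(x, t) - y(t)) < 1e-6        # x' = y
    assert abs(deriv(y, t) - 2 * x(t)) < 1e-6    # y' = 2x
```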
3. Introducing Lagrangians
In this section I explain how a mechanical system can
be described using the Lagrangian formalism, instead of
using forces and accelerations.
3.1. Mechanics considered using forces
In Newtonian mechanics, a mechanical system is always made up of point masses or rigid bodies, and these are subject to known forces. One must therefore specify the composition of the system and the nature of the forces that act on the various bodies. Then one writes the equations of motion for the system. Here are some examples of how one describes mechanical systems in Newtonian mechanics (these examples are surely known to you from school-level physics).
Example: a free point mass
This is the most trivial of all mechanical systems: a point mass that does not interact with any other bodies and is subject to no forces. Introduce the coordinates $x, y, z$ to describe the position of the point mass. Since the force is always equal to zero, the equations of motion are $\ddot{x} = 0$, $\ddot{y} = 0$, $\ddot{z} = 0$. The general solution of these equations describes linear motion with constant velocity: $x = x_0 + v_x t$, etc.
Example: two point masses with springs
|
| \/\/\/ (m1) \/\/\ (m2) ---> x
|
Two masses can move along a straight horizontal line
(the x axis) without friction. The mass m
1
is attached
to a wall by a spring, and the mass m
2
is attached to
the mass m
1
by a spring (see gure). Both springs have
spring constant k and the unstretched length L.
Here is how one solves this problem using school-level methods. To write the equations of motion, we first introduce the two coordinates $x_1$, $x_2$ and then consider the forces acting on the two masses. The force on the mass $m_1$ is the sum of the leftward-pointing force $F_1$ from the left spring and the rightward-pointing force $F_2$ from the right spring. The force on $m_2$ is a leftward-pointing $F_2$. By definition of a spring we have $F_1 = k(x_1 - L)$ and $F_2 = k(x_2 - x_1 - L)$. Therefore we write the equations for the accelerations $a_1$, $a_2$ of the two masses:
$$m_1\ddot{x}_1 = F_2 - F_1 = k(x_2 - x_1 - L) - k(x_1 - L),$$
$$m_2\ddot{x}_2 = -F_2 = -k(x_2 - x_1 - L).$$
At this point we are finished describing the laws of motion of this system; we could now solve these equations for particular initial conditions and determine the actual motion of the system.
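For instance, these equations can be integrated numerically. The Python sketch below (parameter values are mine, for illustration) uses a leapfrog step and checks two expected physical features: the configuration $x_1 = L$, $x_2 = 2L$ is an equilibrium, and the total energy is conserved for a perturbed state:

```python
m1, m2, k, L = 1.0, 2.0, 3.0, 1.0      # illustrative parameter values

def accel(x1, x2):
    F1 = k * (x1 - L)                  # left spring
    F2 = k * (x2 - x1 - L)             # right spring
    return (F2 - F1) / m1, -F2 / m2

def energy(x1, x2, v1, v2):
    return (0.5 * m1 * v1**2 + 0.5 * m2 * v2**2
            + 0.5 * k * (x1 - L)**2 + 0.5 * k * (x2 - x1 - L)**2)

def simulate(x1, x2, v1, v2, t_end, n=20000):
    dt = t_end / n
    for _ in range(n):                 # leapfrog (velocity Verlet) steps
        a1, a2 = accel(x1, x2)
        v1 += 0.5 * dt * a1; v2 += 0.5 * dt * a2
        x1 += dt * v1;       x2 += dt * v2
        a1, a2 = accel(x1, x2)
        v1 += 0.5 * dt * a1; v2 += 0.5 * dt * a2
    return x1, x2, v1, v2

# x1 = L, x2 = 2L with zero velocities is an equilibrium: nothing moves
x1, x2, v1, v2 = simulate(L, 2 * L, 0.0, 0.0, 10.0)
assert abs(x1 - L) < 1e-9 and abs(x2 - 2 * L) < 1e-9

# a perturbed state oscillates, but its total energy stays (almost) constant
E0 = energy(L + 0.1, 2 * L, 0.0, 0.0)
assert abs(energy(*simulate(L + 0.1, 2 * L, 0.0, 0.0, 10.0)) - E0) < 1e-4
```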
3.2. Introducing the action principle
The Lagrangian description of a mechanical system is rather different from the Newtonian one. First, we do not ask for the evolution of the system from some initial conditions, but instead ask for the evolution between two nearby time moments $t_1$ and $t_2$ if the positions at $t_1$ and at $t_2$ are known. For convenience, let us collect all the coordinates (such as $x, y, z$ or $x_1, x_2$ above) into an array of generalized coordinates and denote them by $q_i$. So the boundary conditions that we impose on the system are $q_i(t_1) = A_i$ and $q_i(t_2) = B_i$, where $A_i$, $B_i$ are fixed numbers, and $t_1$, $t_2$ are two moments of time. We now ask: how does the system evolve during the (short) time interval between $t_1$ and $t_2$? The Lagrangian description answers: during that time, the system must move in such a way as to give the minimum value to the integral $\int_{t_1}^{t_2} L(q_i, \dot{q}_i, t)\,dt$, where $L(q_i, \dot{q}_i, t)$ is a known function called the Lagrange function or just the Lagrangian. For example, the Lagrangian for a free point mass $m$ (on which no forces act) is
$$L(x, y, z, \dot{x}, \dot{y}, \dot{z}) = \frac{m}{2}\left[\dot{x}^2 + \dot{y}^2 + \dot{z}^2\right].$$
So, according to the Lagrangian description, the free point mass moves in such a way that the functions $x(t)$, $y(t)$, $z(t)$ give the minimum value to the integral $\int_{t_1}^{t_2} \frac{m}{2}\left[\dot{x}^2 + \dot{y}^2 + \dot{z}^2\right]dt$, where the values of $x(t)$, $y(t)$, $z(t)$ at times $t_{1,2}$ are fixed. The Lagrangian for the above example with two masses attached to a wall is
$$L(x_1, x_2, \dot{x}_1, \dot{x}_2) = \frac{m_1}{2}\dot{x}_1^2 + \frac{m_2}{2}\dot{x}_2^2 - \frac{k}{2}(x_1 - L)^2 - \frac{k}{2}(x_2 - x_1 - L)^2.$$
The value of the integral $\int L(q_i, \dot{q}_i, t)\,dt$ is called the action corresponding to a particular trajectory $q_i(t)$. Therefore the requirement that the integral should have the smallest value on the correct trajectory $q^*_i(t)$ is often called the principle of least action or just the action principle.¹ (This is just terminology; nobody is "acting" here.)
¹We should note that it is only in simple cases (or for short time intervals $[t_1, t_2]$) that the trajectory $q^*_i(t)$ is the minimum of the action integral. In general, the trajectory $q^*_i(t)$ is a local extremum (could even be a maximum) of the action.
3.2.1. Questions and answers
If this is your first encounter with Lagrangians and the action principle, you may be feeling puzzled about some of the following questions:
1. How to determine the function $q^*_i(t)$ that minimizes some integral involving $q_i(t)$ and $\dot{q}_i(t)$? Does one have to try every possible function $q_i(t)$ to see which one gives the smallest value to the integral?
2. How can it be that the correct trajectory $q^*_i(t)$ is found not by considering forces acting on bodies, but by requiring that some strange-looking integral should have the minimum value? How does each point mass "know" that it needs to minimize some integral when it moves around?
3. Suppose we learn how to determine $q^*_i(t)$, and suppose that $q^*_i(t)$ is indeed the correct trajectory (i.e. the trajectory that has the correct accelerations according to Newton's laws). Then, is the Lagrangian description more useful or more convenient than the Newtonian description (forces and accelerations)?
4. Suppose we understand why the Lagrangian description is useful. How does one guess the correct Lagrangian $L(q_i, \dot{q}_i, t)$ for a given physical system?
Short answers:
1. We shall study an efficient mathematical formalism, called calculus of variations, and use it to determine the trajectory $q^*_i(t)$. One does not need to try every possible function $q_i(t)$.
2. Using calculus of variations, the condition that some integral should have the minimum value on the function $q_i(t)$ is translated into a differential equation for $q_i(t)$. If the Lagrangian $L(q_i, \dot{q}_i, t)$ is chosen correctly, the least-action requirement will be translated into differential equations that are equivalent to the correct Newtonian equations for the accelerations and forces. Thus, the least-action requirement is mathematically equivalent to the consideration of forces. The point masses perhaps "know" nothing about the action integral. It is simply more convenient to formulate the mechanical laws in one sentence rather than in many sentences. (A more detailed explanation is in section 11.1 below.) Obviously, the Lagrangian needs to be different for each mechanical system since the equations of motion are different.
3. The Lagrangian description is much more powerful than the Newtonian description. For instance, one is free to use arbitrary coordinates (not only Cartesian). In many cases, it is much easier to set up a Lagrangian and derive the equations of motion than to derive these equations by considering forces. You will see many examples of this as you proceed.
4. Basically, the Lagrange function is equal to the kinetic energy minus the potential energy, computed in an inertial system of reference. There are some other technical aspects of choosing the Lagrange function which we shall consider later. After studying several examples, you will gain sufficient command of Lagrangians so that you will be able to choose the correct Lagrange function for any situation.
In the following sections we shall first look at how the mathematical requirement of least action can be equivalent to Newtonian equations of motion, and then see how to choose correct Lagrange functions for various systems.
3.3. Variation of a functional
Finding the minimum of an integral expression, such as $\int \left(\dot{x}^2 - x^2\right) dt$, with respect to all functions $x(t)$, is similar to finding the minimum of a function $F(z_1, z_2, ...)$ with respect to all the variables $z_1, z_2, ...$ The difference is that $F(z_1, ..., z_n)$ depends on $n$ variables, whereas $\int \left(\dot{x}^2 - x^2\right) dt$ depends on the values of $x(t)$ at every point $t$, i.e. it depends on the whole function at once. Quantities that depend on the whole function at once are called functionals. Functionals can be visualized as functions of infinitely many variables (a functional depends on the value of $x(t)$ for every $t$).
Formally, a function is a map from numbers into numbers; a functional is a map from functions into numbers. An application of a functional to a function is usually denoted by square brackets, e.g. $S[f(x)]$ or, more briefly, $S[f]$.
Here are some random examples of functionals, just to illustrate the concept:
$$S[f(x)] = \int_0^{\infty}\left[e^{-f(x)} + \frac{d^3 f}{dx^3}\right]dx$$
$$S[f(x)] = \int_{-1}^{1}\frac{f(3\cos x)}{1 - x^2}\,dx$$
$$S[f(x)] = f(15) - 8f'(3) + \int_0^1 dx\int_0^1 dy\,\sin\left(e^{y-x} f(x-y)\right)$$
In principle, a functional can be any rule that assigns
numbers to functions. Of course, only some functionals
have interesting applications in physics.
Since the action integral maps trajectories $q_i(t)$ into numbers, we can call it the action functional. The action principle is formulated as follows: the trajectory $q_i(t)$ must be such that the action functional evaluated on this trajectory has the minimum value among all trajectories (or, more generally, a local extremum with respect to nearby trajectories).
This may appear to be similar to the familiar condition for mechanical equilibrium: the coordinates $x, y, z$ are such that the potential energy has the minimum value. However, there is a crucial (if technical) difference: when we minimize the potential energy, we vary the three numbers $x, y, z$ until we find the minimum value; but when we minimize a functional, we have to vary the whole function $q_i(t)$ until we find the minimum value of the functional.
The branch of mathematics known as calculus of variations studies the problem of minimizing (maximizing, extremizing) functionals. One needs to learn a little bit of variational calculus at this point. Let us begin by solving some easy minimization problems involving functions of many variables; this will prepare us for dealing with functionals, which can be thought of as functions of infinitely many variables. You should try the examples yourself before looking at the solutions.
Example 1: Minimize the function $f(x, y) = x^2 + xy + y^2$ with respect to $x, y$.
Solution: Compute the partial derivatives of $f$ with respect to $x, y$. These derivatives must both be equal to zero. This gives $2x + y = 0$, $x + 2y = 0$. The only solution is $x = 0$, $y = 0$.
Example 2: Minimize the function $f(x_1, ..., x_n) = x_1^2 + x_1 x_2 + x_2^2 + x_2 x_3 + ... + x_n^2$ with respect to all $x_j$.
Solution: Compute the partial derivatives of $f$ with respect to all $x_j$, where $j = 1, ..., n$. These derivatives must all be equal to zero. This gives a system of equations: $2x_1 + x_2 = 0$, $x_1 + 2x_2 + x_3 = 0$, ..., $x_{n-2} + 2x_{n-1} + x_n = 0$, $x_{n-1} + 2x_n = 0$. This system can be easily solved by substitution: one gets $x_1 = -\frac{1}{2}x_2$, ..., $x_{n-1} = -\frac{n-1}{n}x_n$; the last equation then gives $\frac{n+1}{n}x_n = 0$, and so all $x_j = 0$.
Example 3: Minimize the function $f(x_0, ..., x_n) = (x_1 - x_0)^2 + (x_2 - x_1)^2 + ... + (x_n - x_{n-1})^2$ with respect to all $x_j$ subject to the restrictions $x_0 = 0$, $x_n = A$.
Solution: Compute the partial derivatives of $f$ with respect to $x_j$, where $j = 1, ..., n-1$. These derivatives must all be equal to zero. One finds the equations $x_j - x_{j-1} = x_{j+1} - x_j$ for $j = 1, 2, ..., n-1$. The values $x_0$, $x_n$ are known; therefore we find $x_j = jA/n$.
3.3.1. Intuitive calculation
Let us now consider the problem of minimizing the functional $S[x(t)] = \int_0^1 (\dot{x}(t))^2\,dt$ with respect to all functions $x(t)$ subject to the restrictions $x(0) = 0$, $x(1) = L$. We shall first perform the minimization in a more intuitive but approximate way, and then we shall see how the same task is handled more elegantly by the calculus of variations.
Let us imagine that we are trying to minimize the integral $S[x(t)]$ with respect to all functions $x(t)$ using a digital computer. The first problem is that we cannot represent all functions $x(t)$ on a computer, because we can only store finitely many values $x(t_0), x(t_1), ..., x(t_N)$ in an array within the computer memory. So let us split the time interval $[0, 1]$ into a large number $N$ of intervals $[0, t_1], [t_1, t_2], ..., [t_{N-1}, 1]$, where the step size $t_j - t_{j-1} \equiv \Delta t = 1/N$ is small; in other words, $t_j = j/N$, $j = 1, ..., N-1$. We can approximately describe a function $x(t)$ by storing its values $x_j$ at the points $t_j$ and assuming that the function $x(t)$ is almost a straight line between these points. (Such functions are called piecewise-linear.) The time moments $t_1, ..., t_{N-1}$ are kept fixed, and the various values $x_j$ correspond to various possible functions $x(t)$. In this way we definitely will not describe all the possible functions $x(t)$, but the class of functions we do describe is broad enough so that we get the correct results in the limit $N \to \infty$. Basically, any reasonable function $x(t)$ can be sufficiently well approximated by piecewise-linear functions when the step size $\Delta t$ is small enough.
Since we have reduced our attention to piecewise-linear functions, we have
$$\dot{x} = \frac{x_j - x_{j-1}}{\Delta t}$$
within each interval $t \in [t_{j-1}, t_j]$. So we can express the integral $S[x]$ as the finite sum,
$$S[x] = \int_0^1 (\dot{x}(t))^2\,dt = \sum_{j=1}^{N} \frac{(x_j - x_{j-1})^2}{\Delta t^2}\,\Delta t,$$
where we have defined for convenience $t_0 = 0$, $t_N = 1$.
At this point one can perform the minimization of $S[x]$ quite straightforwardly. The functional $S[x]$ is now a function of $N-1$ variables $x_1, ..., x_{N-1}$, i.e. $S[x] = S(x_1, ..., x_{N-1})$, so the minimum is achieved at the values $x_j$ where the derivatives of $S(x_1, ..., x_{N-1})$ with respect to each $x_j$ are zero. This problem is now quite similar to Example 3 above, so the solution is $x_j = jL/N$, $j = 0, ..., N$. Now we recall that $x_j$ is the value of the unknown function $x(t)$ at the point $t_j = j/N$. Therefore the minimum of the functional $S[x]$ is found at the values $x_j$ that correspond to the function $x(t) = Lt$. As we increase the number $N$ of intervals, we still obtain the same function $x(t) = Lt$; therefore the same function is obtained in the limit $N \to \infty$. We conclude that the function $x(t) = Lt$ minimizes the functional $S[x]$ with the restrictions $x(0) = 0$, $x(1) = L$.
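The discrete minimization just described is easy to carry out in code. In this Python sketch (my own illustration), the stationarity condition $\partial S/\partial x_j = 0$, which reads $x_j = (x_{j-1} + x_{j+1})/2$, is applied repeatedly (a simple relaxation method); the result converges to the straight line $x_j = jL/N$:

```python
N, L = 10, 3.0
x = [0.0] * (N + 1)
x[N] = L                      # boundary conditions x(0) = 0, x(1) = L

# dS/dx_j = 0 gives x_j = (x_{j-1} + x_{j+1}) / 2; iterate until convergence
for _ in range(20000):
    for j in range(1, N):
        x[j] = 0.5 * (x[j - 1] + x[j + 1])

# the minimizer is the straight line x(t) = L t, i.e. x_j = j L / N
assert all(abs(x[j] - j * L / N) < 1e-9 for j in range(N + 1))
```

Increasing `N` does not change the limiting shape, in agreement with the argument above.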
3.3.2. Calculation with variational calculus
The above calculation has the advantage of being more intuitive and visual: it makes clear that minimization of a functional $S[x(t)]$ with respect to a function $x(t)$ is quite similar to minimization of a function $S(x_1, ..., x_N)$ with respect to a large number of variables $x_j$ in the limit of infinitely many such variables. However, the calculus of variations provides a much more efficient computational procedure to determine the function $x(t)$ that minimizes $S[x]$.
Let us consider a very small change $\varepsilon(t)$ in the function $x(t)$. To see how the functional $S[x]$ changes, we define another functional $\delta S$,
$$\delta S[x, \varepsilon] \equiv S[x(t) + \varepsilon(t)] - S[x(t)].$$
(In many textbooks, the change in $x(t)$ is denoted by $\delta x(t)$, and generally the change of any quantity $Q$ is denoted by $\delta Q$. We chose to write $\varepsilon(t)$ instead of $\delta x(t)$ for clarity.) The functional $\delta S[x, \varepsilon]$ is called the variation of the functional $S[x]$ with respect to the change $\varepsilon(t)$ of the function $x(t)$. The variation $\delta S$ is itself a functional depending on two functions, $x(t)$ and $\varepsilon(t)$.
To understand how to proceed further, let us now consider the similarity between the variation of a functional and the variation of an ordinary function under a small variation of its argument. We have
$$\delta f(t, \delta t) \equiv f(t + \delta t) - f(t) \approx f'(t)\,\delta t + \frac{1}{2}f''(t)\,\delta t^2 + ...,$$
and for small $\delta t$ we can neglect nonlinear terms in $\delta t$. So the variation $\delta f$ of the value of the function is approximately linear in the variation $\delta t$ of the argument. Similarly, when $\varepsilon(t)$ is very small, we expect that the variation $\delta S[x, \varepsilon]$ will be approximately linear in $\varepsilon(t)$, i.e. it will be a linear functional of $\varepsilon(t)$. To understand what a linear functional looks like, consider a linear function $g(\varepsilon_1, \varepsilon_2, ...)$ depending on several variables $\varepsilon_j$, $j = 1, 2, ...$ A linear function can always be written as
$$g(\varepsilon_1, \varepsilon_2, ...) = \sum_j A_j \varepsilon_j,$$
where $A_j$ are suitable constants, which can be expressed through derivatives of $g$, namely $A_j = \partial g/\partial\varepsilon_j$. Since a functional is like a function of infinitely many variables, we can write an analogous equation for $\delta S[x, \varepsilon]$. We replace the index $j$ by the continuous variable $t$; the variables $\varepsilon_j$ and the constants $A_j$ become functions $\varepsilon(t)$, $A(t)$, while the sum over $j$ becomes an integral over $t$. Thus, the functional $\delta S$ in the linear approximation (disregarding terms quadratic in $\varepsilon$) can be written as an integral,
$$\delta S[x, \varepsilon] = \int_0^1 A(t)\,\varepsilon(t)\,dt, \tag{3.1}$$
where $A(t)$ is a suitable function. The function $A(t)$ above is called the functional derivative of the functional and denoted by $\delta S[x]/\delta x(t)$. At this point, it is clear that the way to compute the functional derivative is to express the variation $\delta S[x, \varepsilon]$ explicitly as an integral of the form (3.1) and then to read off the coefficient $A(t)$.
A function has an extremum (minimum, maximum, or saddle point) at a point where its derivative vanishes. A function of many variables, $f(x_1, x_2, \dots, x_n)$, has an extremum at a point $(x_1, x_2, \dots, x_n)$ where all the partial derivatives vanish, $\partial f / \partial x_j = 0$ for all $j$. Similarly, a functional $S[x(t)]$ has an extremum at the point $x(t)$ where the functional derivative $\delta S / \delta x(t)$ vanishes for all $t$. Therefore, the condition for the extremum is $\delta S / \delta x(t) = 0$ for all $t$. (If you are not satisfied by this consideration, read the end of this section, where this statement will be justified more formally.)
To make these considerations more concrete, let us now determine the extremum of the functional

$$S[x(t)] = \int_0^1 \dot{x}^2\,dt \qquad (3.2)$$

under the boundary conditions $x(0) = 0$, $x(1) = L$. Substituting $x(t) + \eta(t)$ instead of $x(t)$ into the functional (3.2), we get

$$\delta S[x, \eta] = \int_0^1 \left[(\dot{x} + \dot{\eta})^2 - \dot{x}^2\right] dt = 2\int_0^1 \dot{x}\,\dot{\eta}\,dt + O(\eta^2),$$

where we did not write out the terms quadratic in $\eta(t)$ since we are going to neglect them. Now, we would like to get an integral of the form (3.1), but instead we have an integral involving the time derivative of $\eta(t)$. Therefore, we need to rewrite this integral so that no derivatives of $\eta(t)$ appear in it. In order to do that, we integrate by parts and find

$$\delta S[x(t), \eta(t)] = 2\eta(1)\dot{x}(1) - 2\eta(0)\dot{x}(0) - 2\int_0^1 \ddot{x}(t)\,\eta(t)\,dt.$$

Since in our case the values $x(0)$, $x(1)$ are fixed, the function $\eta(t)$ is such that $\eta(0) = \eta(1) = 0$, so the boundary terms vanish. We find

$$\delta S[x, \eta] = \int_0^1 \left[-2\ddot{x}(t)\right] \eta(t)\,dt,$$

therefore the variational derivative is what stands in the brackets above, i.e.

$$\frac{\delta S}{\delta x(t)} = -2\ddot{x}(t).$$

The condition for the extremum is

$$\frac{\delta S}{\delta x(t)} = -2\ddot{x}(t) = 0,$$

in other words, $\ddot{x}(t) = 0$. This differential equation has the general solution $x(t) = a + bt$, and with the additional restrictions $x(0) = 0$, $x(1) = L$ we immediately get the solution $x(t) = Lt$. This is, of course, the same solution as we found by more elementary considerations in Sec. 3.3.1.
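The result $x(t) = Lt$ can also be checked numerically in the spirit of the discretization mentioned in Sec. 3.3.1: replace the functional (3.2) by a finite sum over grid points and minimize that sum over the interior grid values by plain gradient descent. A minimal Python sketch (the grid size, step size, and iteration count are arbitrary choices):

```python
import numpy as np

# Discretize S[x] = \int_0^1 xdot^2 dt with fixed endpoints x(0) = 0,
# x(1) = L as  S = sum_k (x_{k+1} - x_k)^2 / dt  and minimize over the
# interior grid values by plain gradient descent.
L_end = 1.0
t = np.linspace(0.0, 1.0, 11)
dt = t[1] - t[0]
x = np.zeros_like(t)
x[-1] = L_end                   # boundary conditions x(0) = 0, x(1) = L
x[1:-1] = np.random.default_rng(0).uniform(0.0, 1.0, len(t) - 2)

for _ in range(2000):
    # dS/dx_k = (2/dt) * (2 x_k - x_{k-1} - x_{k+1}) at interior points
    grad = (2.0 / dt) * (2.0 * x[1:-1] - x[:-2] - x[2:])
    x[1:-1] -= 0.02 * grad      # small fixed step

print(np.max(np.abs(x - L_end * t)))   # deviation from the line x(t) = L t
```

The deviation from the straight line goes to zero (up to rounding), illustrating that the discretized functional has its minimum on the linear function.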
3. Introducing Lagrangians
3.3.3. General formulation
Let us now return to the question of why the condition for the extremum is $\delta S / \delta x(t) = 0$. The functional $S[x]$ has an extremum at a point $x(t)$ where the variation $\delta S[x(t), \eta(t)]$ is purely second-order (or higher) in $\eta$, under an arbitrary change $\eta(t)$. However, above we have obtained the variation $\delta S$ to first order in $\eta$; so this first-order quantity must vanish for the $x(t)$ where the functional has an extremum. An integral such as $\int_0^1 A(t)\eta(t)\,dt$ can vanish for arbitrary $\eta(t)$ only if the function $A(t)$ vanishes for all $t$. In our case, the function $A(t)$ is the variational derivative $\delta S / \delta x(t)$. Thus the condition for the extremum is $\delta S / \delta x(t) = 0$ for all $t$.
To summarize: the requirement that the functional $S[x(t)]$ must have an extremum at the function $x(t)$ leads to a differential equation for the unknown function $x(t)$. This differential equation is

$$\frac{\delta S[x]}{\delta x(t)} = 0.$$

The procedure is quite similar to finding an extremum of a function $f(t)$, where the point $t$ of the extremum is found from the equation $df(t)/dt = 0$.
Suppose that we are now asked to minimize the functional

$$S[x(t)] = \int_0^1 \left(\dot{x}^2 + x^2 - x^4 \sin t\right) dt$$

subject to the restrictions $x(0) = 0$, $x(1) = 1$; in mechanics we shall mostly be dealing with functionals of this kind. We might try to discretize the function $x(t)$ and compute partial derivatives, as we did in Sec. 3.3.1, but this is more difficult in the present case.² We can use the calculus of variations, but then everything will have to be computed anew for a different functional $S[x]$. Rather than go through the above procedures every time, let us now derive the formula for the functional derivative for any functional of the form

$$S[x_i(t)] = \int_a^b L(x_i, \dot{x}_i, t)\,dt,$$

where $L(x_i, v_i, t)$ is a given function of the coordinates $x_i$ and velocities $v_i \equiv \dot{x}_i$ (assuming that there are $n$ coordinates, so $i = 1, \dots, n$). This function $L(x_i, v_i, t)$ is called the Lagrangian.

We introduce infinitesimal changes $\eta_i(t)$ into the functions $x_i(t)$ and express the variation $\delta S$ through $\eta_i(t)$ and $\dot{\eta}_i(t)$,

$$\delta S[x_i(t), \eta_i(t)] = \int_a^b \sum_{i=1}^n \left[\frac{\partial L}{\partial x_i}\,\eta_i(t) + \frac{\partial L}{\partial v_i}\,\dot{\eta}_i(t)\right] dt.$$
² The words "solving XYZ is difficult" essentially mean: "The author has little or no idea how to do it, and in any case we are going to use a different approach now."
Then we integrate by parts,

$$\int_a^b \frac{\partial L}{\partial v_i}\,\dot{\eta}_i(t)\,dt = \left.\frac{\partial L}{\partial v_i}\,\eta_i(t)\right|_a^b - \int_a^b \left(\frac{d}{dt}\frac{\partial L}{\partial v_i}\right)\eta_i(t)\,dt,$$
discard the boundary terms and obtain

$$\delta S[x_i(t), \eta_i(t)] = \sum_{i=1}^n \int_a^b \left[\frac{\partial L}{\partial x_i} - \frac{d}{dt}\frac{\partial L}{\partial v_i}\right]\eta_i(t)\,dt.$$
Thus the variational derivatives can be written as

$$\frac{\delta S[x]}{\delta x_i(t)} = \frac{\partial L}{\partial x_i} - \frac{d}{dt}\frac{\partial L}{\partial v_i}. \qquad (3.3)$$
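Formula (3.3) lends itself to a direct numerical test: discretize the action on a grid, shift a single interior grid value of $x(t)$ by a small amount $\epsilon$, and compare the resulting change of $S$, divided by $\epsilon\,dt$, with $\partial L/\partial x - \frac{d}{dt}\partial L/\partial v$ at that grid point. A Python sketch for the sample Lagrangian $L = \frac{1}{2}mv^2 - \frac{1}{2}kx^2$ and the test path $x(t) = t^2$ (the Lagrangian, the path, and the grid are arbitrary choices made for illustration):

```python
import numpy as np

m, k = 1.0, 1.0
t = np.linspace(0.0, 1.0, 401)
dt = t[1] - t[0]
x = t**2                        # an arbitrary test path (not a solution)

def action(x):
    v = np.gradient(x, dt)      # centred finite-difference velocity
    return np.sum((0.5 * m * v**2 - 0.5 * k * x**2) * dt)

# numerical functional derivative at an interior grid point j:
# deltaS/deltax(t_j) ~ [S(x + eps e_j) - S(x - eps e_j)] / (2 eps dt)
j, eps = 200, 1e-6
xp, xm = x.copy(), x.copy()
xp[j] += eps
xm[j] -= eps
numeric = (action(xp) - action(xm)) / (2.0 * eps * dt)

# formula (3.3): dL/dx - d/dt dL/dv = -k x - m xddot; here xddot = 2
analytic = -k * x[j] - m * 2.0
print(numeric, analytic)        # the two values agree closely
```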
3.3.4. Euler-Lagrange equations
Let us now put together what we have learned. We saw that the condition for a functional $S[x_i]$ to have an extremum at $x_i(t)$ is $\delta S / \delta x_i(t) = 0$. We have derived the formula (3.3) for the functional derivative $\delta S[x_i] / \delta x_i(t)$. Therefore, the condition for an extremum is formulated as the following differential equation, called the Euler-Lagrange equation,

$$\frac{\partial L}{\partial x_i} - \frac{d}{dt}\frac{\partial L}{\partial v_i} = 0. \qquad (3.4)$$

These are the differential equations that express the mathematical requirement that the functional $S[x_i(t)]$ has an extremum at the set of functions $x_i(t)$. There are as many equations as unknown functions $x_i(t)$, one equation for each $i = 1, \dots, n$.
Note that the Euler-Lagrange equations involve partial derivatives of the Lagrangian with respect to the coordinates and velocities. The derivatives with respect to the velocities $v = \dot{x}$ are sometimes written as $\partial L / \partial \dot{x}$, which might at first sight appear confusing. However, all that is meant by this notation is the derivative of the function $L(x, v, t)$ with respect to its second argument, $v$, followed by the substitution $v = \dot{x}$.

The Euler-Lagrange equations also involve the derivative $d/dt$ with respect to the time. This is not a partial derivative with respect to $t$ but a total derivative. In other words, to compute $\frac{d}{dt}\frac{\partial L}{\partial \dot{x}_i}$, we need to substitute the functions $x_i(t)$ and $\dot{x}_i(t)$ into the expression $\partial L / \partial \dot{x}_i$, thus obtaining a function of time only, and then take the derivative of this function with respect to time.
Remark: if the Lagrangian contains higher derivatives (e.g. the second derivative), the Euler-Lagrange formula is different. For example, if the Lagrangian is $L = L(x, \dot{x}, \ddot{x}, t)$, then the Euler-Lagrange equation is

$$\frac{\partial L}{\partial x} - \frac{d}{dt}\frac{\partial L}{\partial \dot{x}} + \frac{d^2}{dt^2}\frac{\partial L}{\partial \ddot{x}} = 0.$$

Note that this differential equation may contain up to fourth-order time derivatives of $x(t)$! Usually, one does not encounter such Lagrangians in studies of classical mechanics because ordinary systems are described by Lagrangians containing only first-order derivatives.
Summary: in mechanics, one specifies a system by writing a Lagrangian and pointing out the unknown functions in it. From that, one derives the equations of motion using the Euler-Lagrange formula.
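This recipe is easy to mechanize with a computer-algebra system. The following sketch, assuming the sympy library is available, applies the Euler-Lagrange formula (3.4) to the Lagrangian $L = \frac{1}{2}m\dot{x}^2 - V(x)$ of a point mass in a potential and recovers Newton's second law:

```python
import sympy as sp

t, m = sp.symbols('t m', positive=True)
x = sp.Function('x')
V = sp.Function('V')            # an unspecified potential V(x)

xdot = sp.diff(x(t), t)
L = sp.Rational(1, 2) * m * xdot**2 - V(x(t))

# Euler-Lagrange expression: dL/dx - d/dt (dL/dxdot)
el = sp.diff(L, x(t)) - sp.diff(sp.diff(L, xdot), t)
print(el)   # el = -dV/dx - m x''; setting el = 0 is Newton's law
```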
3.4. How to choose the Lagrangian
To choose the Lagrangian for a given mechanical system, one follows these basic steps:

1. Choose coordinates that describe all possible positions of every part of the system. These coordinates are called generalized coordinates and may be a mixture of distances, angles, Cartesian coordinates, polar coordinates, or any other parameters, even area or volume. A generalized coordinate describing the position of a body does not have to be a distance between the body and a fixed axis. One can use distances between moving bodies, angles between moving lines, or whatever else makes the description of the system simpler. But it is important to choose exactly as many coordinates as necessary: not more and not less.
2. For each part of the system, compute the kinetic energy ($\frac{1}{2}mv^2$) and the potential energy ($mgh$ for gravity, $\frac{1}{2}k(\Delta x)^2$ for a spring, $qV$ for an electrostatic potential, etc.) as a function of the generalized coordinates and their time derivatives. (An easy way to compute the kinetic energy is to express the Cartesian coordinates through the generalized coordinates and then compute $v^2 = \dot{x}^2 + \dot{y}^2 + \dot{z}^2$ for each point mass.)
3. The Lagrangian is equal to the total kinetic energy minus the total potential energy. (Both should be computed in an inertial reference frame! In a non-inertial frame, e.g. in the rest frame of a rotating body, this rule will most likely fail.)

It can be shown that this rule works for an arbitrary mechanical system made up of point masses, springs, ropes, frictionless rails, etc., regardless of how one introduces the generalized coordinates. We shall not study the proof of this statement, but instead go directly to examples of Lagrangians for various systems.
3.4.1. Examples of Lagrangians
The Lagrangian for a free point mass moving along a straight line with coordinate $x$:

$$L = \frac{1}{2}m\dot{x}^2.$$

A point mass moving along a straight line with coordinate $x$, in a force field with potential energy $V(x)$:

$$L = \frac{1}{2}m\dot{x}^2 - V(x).$$

A point mass moving in three-dimensional space with coordinates $x_i \equiv (x, y, z)$, in a force field with potential energy $V(x, y, z)$:

$$L = \frac{1}{2}\sum_{i=1}^3 m\dot{x}_i^2 - V(x, y, z) = \frac{m}{2}\left|\dot{\vec{x}}\right|^2 - V(\vec{x}).$$
A point mass constrained to move along the circle $x^2 + z^2 = R^2$ in the gravitational field near the Earth (the $z$ axis is vertical). It is convenient to introduce the angle $\phi$ as the coordinate, with $z = R\cos\phi$, $x = R\sin\phi$. Then the potential energy is $U = mgz = mgR\cos\phi$, while the kinetic energy is $K = \frac{1}{2}mv^2 = \frac{1}{2}m(R\dot{\phi})^2 = \frac{1}{2}mR^2\dot{\phi}^2$. So the Lagrangian is

$$L = K - U = \frac{1}{2}mR^2\dot{\phi}^2 - mgR\cos\phi.$$

Note that we have written the Lagrangian (and therefore we can derive the equations of motion) without knowing the force needed to keep the mass moving along the circle. This shows the computational advantage of the Lagrangian approach. In the traditional Newtonian approach, the first step would be to determine the constraining force, which is initially unknown, from a system of equations involving also the unknown acceleration of the point mass. See also Sec. 3.5.1.
Two (equal) point masses connected by a spring with rest length $l$:

$$L = \frac{m}{2}\left(\dot{x}_1^2 + \dot{x}_2^2\right) - \frac{k}{2}\left(x_1 - x_2 - l\right)^2.$$
A mathematical pendulum, i.e. a massless rigid stick of length $l$ attached to the ceiling at one end and to a point mass $m$ at the other end. Suppose for simplicity that the pendulum can move only in the $x$-$z$ vertical plane in the gravitational field near the Earth (vertical $z$ axis). As the coordinate, we choose the angle $\phi$ between the stick and the $z$ axis. The Lagrangian is

$$L = \frac{m}{2}l^2\dot{\phi}^2 + mgl\cos\phi.$$
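For this pendulum Lagrangian, the Euler-Lagrange equation (3.4) gives $ml^2\ddot{\phi} = -mgl\sin\phi$, i.e. $\ddot{\phi} = -(g/l)\sin\phi$; for small angles this describes harmonic oscillations with period $2\pi\sqrt{l/g}$. This can be checked by direct numerical integration, as in the Python sketch below (the step size and the initial angle are arbitrary small values):

```python
import numpy as np

g, l = 9.81, 1.0
phi, omega = 0.01, 0.0              # small initial angle, released at rest
dt = 1e-4
T = 2.0 * np.pi * np.sqrt(l / g)    # small-angle period

# semi-implicit Euler integration of phi'' = -(g/l) sin(phi)
for _ in range(int(round(T / dt))):
    omega += -(g / l) * np.sin(phi) * dt
    phi += omega * dt

print(phi)   # close to the initial angle 0.01 after one full period
```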
A point mass $m$ sliding without friction along an inclined plane that makes an angle $\alpha$ with the horizontal, in the gravitational field of the Earth. As the coordinates, we choose $x$ and $y$, where the $y$ axis is parallel to the inclined plane. The height $z$ is then $z = x\tan\alpha$, so the potential energy is $U = mgz = mgx\tan\alpha$. The kinetic energy is computed as

$$K = \frac{m}{2}\left(\dot{x}^2 + \dot{y}^2 + \dot{z}^2\right) = \frac{m}{2}\left(\frac{\dot{x}^2}{\cos^2\alpha} + \dot{y}^2\right).$$

Hence, the Lagrangian is

$$L = K - U = \frac{m}{2}\left(\frac{\dot{x}^2}{\cos^2\alpha} + \dot{y}^2\right) - mgx\tan\alpha.$$
3.5. Standard exercises in setting up Lagrangians

Exercise: you should now determine the Euler-Lagrange equations that follow from each of the Lagrangians in Sec. 3.4.1. You should also verify that these equations are the same as would be obtained from school-level Newtonian considerations for the respective physical systems. This should occupy you for an hour or two. Only then will you begin to gain experience and at the same time to appreciate the power of the Lagrangian approach.

For more examples of setting up Lagrangians for mechanical systems and for deriving the Euler-Lagrange equations, ask your physics teacher or look them up in any theoretical mechanics problem book. Much of the time, the Euler-Lagrange equations for some complicated system (say, a pendulum attached to the endpoint of another pendulum) would be too difficult to solve, but the goal at this point is to gain experience deriving these equations from Lagrangians. Their derivation would be much less straightforward in the traditional Newtonian approach using forces and accelerations. (See Sec. 3.5.1 for an illustration.)

I recommend going through every exercise below (unless you know at once how to solve each of them). These exercises are not difficult but will give you experience in dealing with Lagrange functions. You will not really understand the Lagrangian formalism unless you can solve these standard problems.
Each of the following exercises poses the same questions: (a) Introduce generalized coordinates and write down a Lagrange function for the system described below. (b) Derive the Euler-Lagrange equations using Eq. (3.4). (It is not necessary to solve them!) Every situation is set up in the gravitational field near the Earth. All the mentioned sticks are massless and completely rigid. All the mentioned springs are massless, perfectly elastic (without friction), and have spring constant $k$. The lengths of the springs at rest are given in the exercises.
1. A point mass $m$ hangs from the ceiling on a stick of length $l$. It can move only in the $x$-$z$ vertical plane. (This is called a mathematical pendulum. The pendulum would be called physical if the stick were massive, but we postpone the consideration of this situation until we study rotations of a rigid body.)
Figure 3.1.: Two point masses connected by a spring.
2. Two point masses $m$ hang from the ceiling (far from each other) on two sticks of length $l$ each. They can both move only in the $x$-$z$ vertical plane.

3. Two point masses $m_1$ and $m_2$ hang from the ceiling on two sticks of length $l$ each. They can both move only in the $x$-$z$ vertical plane. In addition, there is a spring with rest length $l$ connecting the two point masses. The points where the sticks are attached to the ceiling are at a distance $l$ from each other. (See Fig. 3.1.)

4. A point mass $m$ hangs from the ceiling on a stick of length $l$. Another point mass $m$ is attached to the first one with a stick of length $L$. Both masses can move only in the $x$-$z$ vertical plane.

5. A point mass $m$ hangs from the ceiling on a stick of length $l$. Another point mass $m$ is attached to the first one with a spring of rest length $L$. Both masses can move only in the $x$-$z$ vertical plane.

6. Two point masses $m$ are attached to the two ends of a stick of length $2l$. The midpoint of the stick is attached to the ceiling by another stick of length $l$. The entire arrangement can move only in the $x$-$z$ vertical plane.

7. A point mass $m$ is attached to the ceiling by a stick of length $l$. The point mass can move in all directions (in three dimensions, not restricted to a vertical plane).

8. A point mass $m$ is attached to the ceiling by a spring of rest length $l$. The point mass can move in all directions (in three dimensions).

9. Two identical carts of mass $m$ can roll on a horizontal, frictionless rail along the $x$ axis. The carts are connected by a spring of rest length $l$.

10. A cart of mass $M$ can roll on a horizontal, frictionless rail along the $x$ axis. A pendulum, consisting of a stick of length $l$ and a point mass $m$, is mounted rigidly on the cart and can move freely within the $x$-$z$ vertical plane.
3.5.1. Advantage of the Lagrangian approach

To demonstrate the advantage of the Lagrangian approach over the traditional Newtonian one, let us solve a moderately complicated problem using both approaches. We shall see that the Lagrangian approach yields the equations of motion more directly and also more straightforwardly, that is, with fewer possibilities for making a mistake.

The problem is a slight modification of Problem 10 from the previous section. A cart of mass $M$ can roll on an inclined, frictionless plane at an angle $\alpha$ with the $x$ axis. A pendulum, consisting of a stick of length $l$ and a point mass $m$, is mounted rigidly on the cart and can move freely within the $x$-$z$ vertical plane (see Fig. 3.2). The task is to determine the equations of motion for $s(t)$ and $\phi(t)$, where $s$ is the displacement of the cart along the incline and $\phi$ is the angle with the cart's vertical, as shown in the figure.
Let us first apply the Lagrangian approach. The generalized coordinates $q_i$ are $s$ and $\phi$. The Cartesian coordinates of the cart (with a certain irrelevant choice of origin) are $x_M = s\cos\alpha$, $z_M = -s\sin\alpha$. The Cartesian coordinates of the mass $m$ are $x_m = x_M + l\sin(\phi - \alpha)$, $z_m = z_M - l\cos(\phi - \alpha)$, again up to an irrelevant choice of origin. The Lagrangian is constructed in the standard way from the kinetic and the potential energy,

$$L = \frac{1}{2}M\left(\dot{x}_M^2 + \dot{z}_M^2\right) + \frac{1}{2}m\left(\dot{x}_m^2 + \dot{z}_m^2\right) - Mgz_M - mgz_m.$$

It remains to express the Lagrangian through the generalized coordinates and velocities $\phi$, $\dot{\phi}$, $s$, $\dot{s}$. After a straightforward calculation, one finds

$$L = \frac{1}{2}(M+m)\dot{s}^2 + \frac{1}{2}ml^2\dot{\phi}^2 + ml\,\dot{s}\dot{\phi}\cos\phi + (M+m)gs\sin\alpha + mgl\cos(\phi-\alpha).$$
The equations of motion follow from the general formula (3.4) and can be written as

$$(M+m)\ddot{s} + ml\frac{d^2}{dt^2}\sin\phi = (M+m)g\sin\alpha, \qquad (3.5)$$

$$l\ddot{\phi} + \ddot{s}\cos\phi = -g\sin(\phi-\alpha). \qquad (3.6)$$
Let us now derive the same equations in the traditional Newtonian approach. First we need to consider the forces acting on the cart and on the point mass $m$. The force of tension $T$ in the stick is unknown. It is convenient to decompose vectors into components tangential and normal to the incline. Let us first consider the forces acting on the cart. The normal component of the force $T$ acting on the cart and the normal component of the gravitational force are balanced by a normal force $N$ from the incline. (To save time, we shall not determine this normal force $N$.) So we only need to compute the tangential components of the forces acting on the cart.
Figure 3.2.: A mathematical pendulum mounted on a cart rolling on an inclined plane.
The tangential component of the acceleration of the cart is $\ddot{s}$, and thus we obtain the equation

$$M\ddot{s} = Mg\sin\alpha + T\sin\phi. \qquad (3.7)$$

Now we consider the forces acting on the point mass $m$ and its acceleration. The displacement of the point mass $m$ along the incline is equal to $s + l\sin\phi$, therefore the tangential component of the acceleration of the point mass is

$$\frac{d^2}{dt^2}\left(s + l\sin\phi\right),$$

while the tangential components of the forces are $-T\sin\phi$ and $mg\sin\alpha$. Thus the tangential component of Newton's second law is

$$m\frac{d^2}{dt^2}\left(s + l\sin\phi\right) = -T\sin\phi + mg\sin\alpha. \qquad (3.8)$$

The normal component is

$$m\frac{d^2}{dt^2}\left(-l\cos\phi\right) = T\cos\phi - mg\cos\alpha. \qquad (3.9)$$
We have obtained a system of three equations (3.7)-(3.9) with three unknowns: $s(t)$, $\phi(t)$, $T(t)$. Now we need to eliminate $T$ and derive a closed system of two equations for $s(t)$ and $\phi(t)$. Adding Eqs. (3.7) and (3.8), we find one such equation,

$$(M+m)\ddot{s} + ml\frac{d^2}{dt^2}\sin\phi = (M+m)g\sin\alpha.$$

Then we multiply Eq. (3.8) by $\cos\phi$ and add Eq. (3.9) multiplied by $\sin\phi$. The result is

$$m\ddot{s}\cos\phi + ml\left(\cos\phi\,\frac{d^2}{dt^2}\sin\phi - \sin\phi\,\frac{d^2}{dt^2}\cos\phi\right) = m\ddot{s}\cos\phi + ml\ddot{\phi} = -mg\sin(\phi-\alpha).$$
It is easy to see that these equations are equivalent to Eqs. (3.5)-(3.6) derived in the Lagrangian approach. This example illustrates that the Lagrangian approach is significantly more straightforward and requires fewer intermediate steps. The equations of motion (the Euler-Lagrange equations) are found directly for the interesting quantities $s(t)$ and $\phi(t)$, without the need to consider the constraining forces $T$ and $N$.
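As an independent check of Eqs. (3.5)-(3.6), one can integrate them numerically and verify that the total energy $E = K + U$ is conserved, as it must be, since the Lagrangian does not depend explicitly on time. A Python sketch (the parameter values and the initial state are arbitrary choices):

```python
import numpy as np

M, m, l, g, alpha = 2.0, 1.0, 1.0, 9.81, 0.3

def accel(phi, phi_dot):
    # Eqs. (3.5)-(3.6) written as a linear system for (s'', phi''):
    #   (M+m) s'' + m l cos(phi) phi'' = (M+m) g sin(alpha) + m l phi_dot^2 sin(phi)
    #   cos(phi) s'' + l phi''         = -g sin(phi - alpha)
    A = np.array([[M + m, m * l * np.cos(phi)],
                  [np.cos(phi), l]])
    b = np.array([(M + m) * g * np.sin(alpha) + m * l * phi_dot**2 * np.sin(phi),
                  -g * np.sin(phi - alpha)])
    return np.linalg.solve(A, b)

def energy(s, s_dot, phi, phi_dot):
    K = (0.5 * (M + m) * s_dot**2 + 0.5 * m * l**2 * phi_dot**2
         + m * l * s_dot * phi_dot * np.cos(phi))
    U = -(M + m) * g * s * np.sin(alpha) - m * g * l * np.cos(phi - alpha)
    return K + U

def deriv(y):
    s, s_dot, phi, phi_dot = y
    s_dd, phi_dd = accel(phi, phi_dot)
    return np.array([s_dot, s_dd, phi_dot, phi_dd])

y = np.array([0.0, 0.0, 0.5, 0.0])   # start at rest, pendulum deflected
E0 = energy(*y)
dt = 1e-3
for _ in range(2000):                # classical Runge-Kutta steps, t = 0..2
    k1 = deriv(y)
    k2 = deriv(y + 0.5 * dt * k1)
    k3 = deriv(y + 0.5 * dt * k2)
    k4 = deriv(y + dt * k3)
    y = y + (dt / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

print(abs(energy(*y) - E0))          # the energy drift remains tiny
```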
4. Further directions
4.1. Overview of our progress so far
We have seen how to describe various mechanical systems in terms of Lagrangians. It is straightforward to find the Lagrangian for any system consisting of point masses, rigid sticks, ropes, frictionless rails, rolling wheels, etc. Given a Lagrangian, it is very easy to derive the equations of motion (the Euler-Lagrange equations). Solving these equations is a technical task that may be accomplished analytically or numerically (using computer codes). In principle, the theoretical description of mechanical systems is now complete.
The Lagrangian formalism has certain limitations. For instance, mechanical systems involving the force of friction generally cannot be easily described by the Lagrangian formalism. The Lagrangian formalism includes only conservative forces, i.e. forces which are gradients of a potential, or forces that perform no work (even if they depend on velocities, e.g. the Lorentz force). However, friction and viscosity are not conservative forces because they depend on velocity and position in a nontrivial way. In physics, the force of friction is not considered a fundamental force, but rather a force arising out of averaged interactions with a large number of particles in the environment. The interaction with each of the particles is described by a conservative force, e.g. the electromagnetic force. Thus, effects of friction can be derived, in principle, from a more fundamental picture that involves only conservative forces. Of course, in practice it is much more convenient to introduce the force of friction phenomenologically, i.e. by guessing or experimentally measuring the formula for the friction. One well-known formula is $|\vec{F}| = \mu|\vec{N}|$, where $\vec{N}$ is the normal force and $\mu$ is the friction coefficient; this formula approximately describes dynamic friction on rough surfaces. Another known formula, for viscous friction, is $\vec{F} = -A(v)\vec{v}$, where $\vec{v}$ is the velocity of a body moving through a medium, and $A(v)$ is a coefficient that usually depends on the velocity and on the shape of the body in some complicated way. This formula can be used for bodies moving through air or water.
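In the simplest case of a constant coefficient, $\vec{F} = -A\vec{v}$, the phenomenological equation for a body falling through a medium, $m\,dv/dt = mg - Av$, integrates in closed form to $v(t) = (mg/A)\left(1 - e^{-At/m}\right)$, which approaches the terminal velocity $mg/A$. A short numerical check in Python (all parameter values are arbitrary):

```python
import numpy as np

m, g, A = 1.0, 9.81, 2.0
v_term = m * g / A                  # terminal velocity m g / A

dt, v = 1e-4, 0.0
for _ in range(int(3.0 / dt)):      # integrate m v' = m g - A v up to t = 3
    v += (g - (A / m) * v) * dt

v_exact = v_term * (1.0 - np.exp(-A * 3.0 / m))
print(v, v_exact)                   # both are close to v_term = 4.905
```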
4.2. What remains: applications
You need to learn certain applications of the general the-
ory to various important cases. In each case, one can
derive the equations of motion and use suitable math-
ematical techniques to solve these equations and extract
physically meaningful answers. Here are the major areas
of interest.
Describing the motion of a point mass in a central field of force (Sec. 5). The most important example is the Kepler problem, which involves a force that decays as $1/r^2$ with distance. This setup describes, for instance, the motion of planets and comets around the sun. In this case, one can solve the equations of motion analytically and derive important properties of the motion in a central field, such as periodicity, orbit parameters, escape velocity, Kepler's laws, perihelion precession, etc. The theory of the Kepler problem is the foundation for celestial mechanics.
Describing small oscillations around a static configuration (Sec. 6). For a mechanical system that has oscillating degrees of freedom around a static equilibrium position (such systems range from molecules to bridges), we make the approximation that the system has only very small deviations from that position. In the limit when these deviations are small, it is usually possible to derive linear equations of motion for them. These equations of motion can be analyzed to verify that the position of the system is stable against small deviations, and to describe oscillations around the equilibrium, as well as the response to a small external perturbation. (The analysis proceeds most conveniently in the Lagrangian formalism.) The key concepts in this area are normal modes, normal frequencies, and resonance.
Describing the rotation of a rigid body under external forces (Sec. 7). A rigid body is a collection of a very large number of point masses which are spread in space and constrained to remain at fixed distances from each other. So, a rigid body may move as a whole or rotate as a whole, but it cannot be squeezed or deformed in any way. The concept of a rigid body is very important because it is an idealization of the behavior of rigid things that we use in real life. The motion of a rigid body is, of course, much simpler than the set of all possible motions of its constituent particles. So it is very useful to develop a special formalism describing the possible motions of a rigid body. This formalism involves such concepts as the tensor of inertia, torque, angular momentum, and rolling without sliding. I would like to stress that these concepts are not new fundamental axioms of mechanics; they are derived from a standard Lagrangian for the system consisting of many massive particles that are spread in space and constrained to remain at fixed distances from each other.
Describing elastic scattering of point masses (Sec. 10). More generally, one considers a point mass moving in a potential such that the force is appreciably nonzero only in a small portion of space. A typical problem is to describe the motion of a point mass that has a given velocity far away from the interaction area. An incoming particle is deflected (scattered) by the potential, and flies away with the same speed but in a slightly different direction. A typical experimental situation is when one has an initial beam of particles, containing a large number of particles with approximately the same initial velocities but slightly different initial positions. In that case, the interesting question is not to describe the precise trajectory of each particle, but to predict how many outgoing particles will be observed in a particular direction. In other words, one wants to characterize the final (outgoing) states at infinity in terms of the initial (ingoing) states at infinity. The key notion in this area is the scattering cross-section.
4.3. What remains: theoretical
developments
Besides the applications, there are certain theoretical de-
velopments that enrich the Lagrangian formalism and
provide essential foundations for other areas of theoret-
ical physics. At least some of these theoretical develop-
ments are usually included in courses of theoretical me-
chanics. We shall only study the most important of these
developments:
General properties of the Lagrangian formalism: invariance under coordinate changes, equivalence of systems with different Lagrangians, motivations for using the action principle, etc. (chapter 11), and a general description of constraints (chapter 12). These are more or less formal developments that clarify the structure of the theory but only rarely help solve particular problems. For instance, the invariance under coordinate changes is an important conceptual fact that justifies why we can choose arbitrary generalized coordinates.
Hamiltonian formalism (chapter 8). This is a very important mathematical development of the Lagrangian formalism, where a mathematical trick is used to promote the velocities to independent variables, thus making the equations of motion first-order in time derivatives. The Hamiltonian formalism involves such notions as the Legendre transformation, Poisson brackets, and canonical transformations. The most important application of the Hamiltonian formalism is in quantum theory; also, the Hamiltonian description has far-reaching consequences for the formal development of the theory of complicated mechanical systems. For instance, such features as the presence of constraints, integrability, and the transition to chaos are most naturally expressed using the Hamiltonian formalism. While studying relatively easy problems of classical mechanics, such as those found in introductory courses, the Hamiltonian formalism only occasionally has an advantage over the Lagrangian formalism. In the minimal standard course of mechanics, only some basic ideas of the Hamiltonian formalism are covered: the Poisson brackets, canonical transformations, and the Hamilton-Jacobi theory (chapter 13.2).
Perturbation theory. This is a method to solve an
equation approximately if that equation is only a
small deviation from another, exactly solvable equa-
tion. A large number of methods of perturbation
theory are applicable to various situations in me-
chanics. Only the most basic ones (anharmonic
oscillations) are studied in the minimal standard
course of mechanics (chapter 14).
Symmetries and conservation laws. The main fact is that the existence of a group of symmetry transformations is equivalent to the existence of a conservation law. This property is very important for particle physics and field theory. Keywords include: symmetry groups, infinitesimal transformations (also called generators), the Noether theorems, Galilei and Lorentz invariance. Only a qualitative understanding is intended at this point.
5. Motion in central force
To be written.
6. Small oscillations
To be written.
7. Rotation of rigid bodies
To be written.
8. Hamiltonian formalism
To be written.
9. Standard problems
This is a list of problems which should appear reason-
ably straightforward (not challenging!) to students who
understand the material in this course.
9.1. Lagrangians
To be written.
9.2. Central force
To be written.
9.3. Small oscillations
To be written.
9.4. Rigid body rotation
To be written.
9.5. Hamiltonians
To be written.
Part II.
Optional topics
10. Scattering
To be written.
11. More about Lagrangians
This section contains several theoretical developments
of the Lagrangian formalism that are not directly neces-
sary for solving problems. However, these considera-
tions help understand the theory more deeply and an-
swer certain important questions.
11.1. Why does the extremum of a
functional determine motion?
In the Lagrangian formulation of mechanics, the trajectory $q(t)$ is determined from the condition that the action functional $S[q(t)]$ should have an extremum. (It is not always the case that the trajectory is the minimum of the action; in some cases it might be merely an extremum.) The functional derivative $\delta S / \delta q(t)$ is equal to 0 on the actual trajectory $q(t)$ of the system. This condition is known as the action principle. By now, you should be familiar with the mathematical procedures used to derive the equations of motion from the action principle, and well used to the fact that the correct equations of motion for each mechanical system indeed follow if the Lagrangian is chosen appropriately. However, it might still feel like a mystery to you that Newton's laws are equivalent to the condition for the extremum of some functional. You might be asking yourself: why is this possible at all?
Here is one explanation that may help. Let us consider a simple mechanical system: a point mass $m$ moving in one dimension, with coordinate $x(t)$, in a potential $U(x)$. (The same considerations can be easily generalized to the case of more than one dimension and more than one coordinate.) Suppose that $x_0(t)$ is the correct trajectory according to Newton's law,

$$m\ddot{x}_0(t) = -\left.\frac{dU}{dx}\right|_{x = x_0(t)}.$$
How can we use a functional $S[x]$ to express the condition that the trajectory $x(t)$ is the correct one? One way is to demand that the deviation of $x(t)$ from $x_0(t)$ is everywhere zero. This can be expressed using the functional

$$S_1[x] = \int_{t_1}^{t_2} \left[x(t) - x_0(t)\right]^2 dt.$$
It is clear that the functional $S_1[x]$ has its minimum value (and the minimum value is obviously 0) if and only if $x(t) = x_0(t)$ for all $t$. This is an example of how to use a functional to express some condition on functions: the functional $S_1[x]$ measures the deviation of $x(t)$ from $x_0(t)$ all along the way. The smallest possible deviation is no deviation at all; thus, the minimum of the functional $S_1[x(t)]$ is at the trajectory $x(t)$ that does not deviate at all from $x_0(t)$.
Another similar way to specify the trajectory is to use the functional

$$S_2[x] = \int_{t_1}^{t_2} \left[\dot{x}(t) - \dot{x}_0(t)\right]^2 dt.$$

This functional, together with the boundary conditions $x(t_1) = x_0(t_1)$, $x(t_2) = x_0(t_2)$, has its minimum value if and only if $x(t) = x_0(t)$ for all $t$.
Admittedly, the functionals $S_1[x]$, $S_2[x]$ do not help us to formulate the laws of mechanics, because they already contain the correct trajectory $x_0(t)$ explicitly. We shall now construct another functional, $S_3[x]$, starting from $S_2[x]$ and trying to eliminate the explicit dependence on $x_0(t)$.
Let us rewrite $S_2[x]$ as

$$S_2[x] = \int_{t_1}^{t_2} \left[\dot{x}^2 - 2\dot{x}\dot{x}_0 + \dot{x}_0^2\right] dt.$$
The third term, \dot{x}_0^2, is a fixed function and does not vary when we vary x(t). Therefore we may omit that term from S_2[x]. Now, if we had \ddot{x}_0 rather than \dot{x}_0 in the functional, we could then use Newton's law for the correct trajectory. So let us integrate the second term by parts:
-2\int_{t_1}^{t_2} \dot{x}\dot{x}_0 \, dt = -2\, x\dot{x}_0 \Big|_{t_1}^{t_2} + \int_{t_1}^{t_2} 2x\ddot{x}_0 \, dt.
The boundary term x\dot{x}_0|_{t_1}^{t_2} does not vary with x(t) because the boundary values of x(t) are fixed. Therefore we may omit that term. Finally, we use Newton's law to replace \ddot{x}_0 by -m^{-1}U'(x_0),
\int_{t_1}^{t_2} 2x\ddot{x}_0 \, dt = -\int_{t_1}^{t_2} 2m^{-1} x\, U'(x_0) \, dt.
If we now assume that the trajectory x(t) deviates very little from the correct trajectory x_0(t), then we may approximately write

x\, U'(x_0) = (x - x_0)U'(x_0) + x_0 U'(x_0)
= U(x) - U(x_0) + O[(x - x_0)^2] + x_0 U'(x_0).

The term O[(x - x_0)^2] can be omitted under the above assumption. The terms -U(x_0) and x_0 U'(x_0) can also be omitted since they are independent of x(t).
Thus we find that the functional S_2 is equivalent to the following functional,

S_3[x] = \int_{t_1}^{t_2} [\dot{x}^2 - 2m^{-1} U(x)] \, dt,

up to inessential terms that do not vary with x(t) and up to terms of order O[(x - x_0)^2]. It is clear that S_3[x] is equal to the usual Lagrangian times the constant coefficient 2/m.
In this way, we obtained a functional S_3[x] which has an extremum when x(t) is very close to x_0(t); i.e. it is a local extremum. The new functional does not depend explicitly on x_0(t), just as we wanted. The price to pay is that this functional works only for small deviations from the correct trajectory. Indeed, the functional S_3 may have other minima or maxima which the original functional S_2 does not have. The only real justification for the correctness of S_3 is that the condition for an extremum of S_3[x] coincides with the correct equations of motion (derived from Newton's law, and ultimately from experiments).
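This local-extremum property can be checked numerically. The sketch below uses the illustrative choices U(x) = x^2/2 and m = 1 (so that x_0(t) = \cos t is a Newtonian trajectory); it discretizes S_3[x] and verifies that the gradient of S_3 with respect to the interior points of the path vanishes on the exact trajectory but not on a perturbed path with the same endpoints.

```python
import math

# Discretized S3[x] = sum_i [ ((x[i+1]-x[i])/dt)^2 - 2*U(x[i])/m ] * dt,
# with the illustrative choices U(x) = x^2/2 and m = 1 (so 2*U/m = x^2),
# for which x0(t) = cos(t) solves Newton's law.
N, T = 200, 1.0
dt = T / N
ts = [i * dt for i in range(N + 1)]

def S3(x):
    s = 0.0
    for i in range(N):
        v = (x[i + 1] - x[i]) / dt
        s += (v * v - x[i] ** 2) * dt
    return s

def grad(x, i, h=1e-6):
    # partial derivative of S3 with respect to the i-th point of the path
    xp, xm = list(x), list(x)
    xp[i] += h
    xm[i] -= h
    return (S3(xp) - S3(xm)) / (2 * h)

x0 = [math.cos(t) for t in ts]                  # exact trajectory
xpert = [x + 0.05 * math.sin(math.pi * t / T)   # perturbed trajectory,
         for x, t in zip(x0, ts)]               # same endpoints
g_exact = max(abs(grad(x0, i)) for i in range(1, N))
g_pert = max(abs(grad(xpert, i)) for i in range(1, N))
print(g_exact, g_pert)   # small for the exact path, not for the perturbed one
```

The vanishing gradient on the exact path is the discrete version of the statement that S_3[x] has an extremum at x_0(t).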
11.2. Why can we use arbitrary
coordinates in the Lagrangian?
In most cases, the Lagrangian is equal to the difference of
the kinetic and the potential energy terms. However, one
needs to select some coordinates to compute these energy
terms. It turns out that there is a wide choice of vari-
ables to be used as coordinates; these variables could
be lengths, angles, or any functions of lengths and an-
gles (but not velocities!). In other words, one can use
any coordinate systems or even just parts of some coor-
dinate systems, as long as the possible positions of every
point mass are adequately described by the chosen coor-
dinates and the appropriate constraints. For this reason,
the coordinates entering the Lagrangian are called gen-
eralized coordinates. Usually, one chooses generalized
coordinates for convenience, to minimize the required
computational work, or to decrease the number of con-
straints.
However, you may be asking yourself: why are we allowed to use arbitrary coordinates in the Lagrangian formalism? Certainly, as we know, Newton's laws are not the same in different coordinates. For instance, the mass times the acceleration is equal to the force only if the acceleration is computed as \ddot{\vec{x}}(t), where \vec{x}(t) is the vector of Cartesian coordinates (x, y, z). The formula m\ddot{\vec{x}} = \vec{F} would be incorrect if the vector \vec{x} = (x_1, x_2, x_3) were to consist of, say, the radius r = \sqrt{x^2 + y^2 + z^2}, the azimuthal angle \phi in the (x, y) plane, and the coordinate z. However, it turns out that the Lagrangian formalism works just fine when we express the kinetic energy and the potential energy through the variables (x_1, x_2, x_3) = (r, \phi, z) and write the corresponding Lagrange function L(x_j, \dot{x}_j). The correct equations of motion will be given by the Euler-Lagrange equation,

\frac{d}{dt}\frac{\partial L}{\partial \dot{x}} = \frac{\partial L}{\partial x},

as before. One says that the Lagrangian formalism is covariant with respect to coordinate transformations.
The reason for this can be explained in two ways: either more formally, by showing that the Euler-Lagrange equations remain the same under an arbitrary change of coordinates given by a function; or more visually, by approaching the situation from the geometrical point of view and by visualizing the Euler-Lagrange equations as conditions for an extremum of the action functional.
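As a quick numerical illustration of this covariance, the sketch below uses the illustrative choices m = 1 and potential U = (x^2 + y^2)/2 (a two-dimensional harmonic oscillator). It evaluates, by finite differences, the Euler-Lagrange expressions of the polar-coordinate Lagrangian L = (\dot{r}^2 + r^2\dot{\phi}^2)/2 - r^2/2 along the circular Cartesian solution x = \cos t, y = \sin t, i.e. r(t) = 1, \phi(t) = t, and finds that they vanish.

```python
def L(q, qd):
    # polar-coordinate Lagrangian; q = (r, phi), qd = (rdot, phidot)
    r, phi = q
    rd, phid = qd
    return 0.5 * (rd ** 2 + r ** 2 * phid ** 2) - 0.5 * r ** 2

def traj(t):
    # circular solution x = cos t, y = sin t in polar form: r = 1, phi = t
    return [1.0, t], [0.0, 1.0]

def dL(q, qd, k, wrt, h=1e-6):
    # partial derivative of L with respect to component k of q or qd
    a = [list(q), list(qd)]
    i = 0 if wrt == 'q' else 1
    a[i][k] += h
    up = L(a[0], a[1])
    a[i][k] -= 2 * h
    dn = L(a[0], a[1])
    return (up - dn) / (2 * h)

def el_residual(t, k, h=1e-5):
    # d/dt (dL/d qdot_k) - dL/dq_k, evaluated along the solution
    p = lambda s: dL(*traj(s), k, 'qd')
    return (p(t + h) - p(t - h)) / (2 * h) - dL(*traj(t), k, 'q')

res = [el_residual(0.3, k) for k in (0, 1)]
print(res)   # both residuals are ~ 0
```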
11.2.1. Formal derivation
For simplicity, we shall only consider a one-dimensional problem with a Lagrangian L(q, \dot{q}, t), where q(t) is a generalized coordinate. The same consideration is very easily generalized to the case of multiple coordinates.
Suppose that a new coordinate x(t) is chosen instead
of q(t). The new coordinate can be an arbitrary function
of the old coordinate. Let us consider an even more gen-
eral case where the change of coordinates depends on
time (i.e. we may choose different coordinates at differ-
ent times). Then the new coordinate is related to the old
one by a formula such as
q(t) = F(x(t), t),
where F(x, t) is a known function.
Now we need to express the old Lagrangian L(q, \dot{q}, t) through the new variable x and its derivative \dot{x}. We have

\dot{q} = \frac{\partial F}{\partial t} + \frac{\partial F}{\partial x}\frac{dx}{dt} \equiv F_{,t} + F_{,x}\dot{x},

where we denote partial derivatives by subscripts with commas, e.g. f_{,a} \equiv \partial f(a, b, c)/\partial a. (This is a condensed notation frequently used in physics.)
The Lagrangian expressed through the new variable x is therefore

L(q, \dot{q}, t) \rightarrow \tilde{L}(x, \dot{x}, t) = L(F(x, t),\, F_{,t} + F_{,x}\dot{x},\, t).

The new variable x is adequate only if the old coordinate q can be unambiguously expressed through x; the condition for this is F_{,x} \neq 0. So we shall assume that F_{,x} \neq 0 at least within some interval of x.
Let us now compare the equations of motion (EOM) that we would derive in the old coordinates and in the new coordinates. The old EOM can be written as

\frac{d}{dt} L_{,\dot{q}} = L_{,q}.

The new EOM is

\frac{d}{dt} \tilde{L}_{,\dot{x}} = \tilde{L}_{,x}.

Let us express this equation through L instead of \tilde{L}:
\tilde{L}_{,x} = L_{,q} F_{,x} + L_{,\dot{q}} (F_{,tx} + F_{,xx}\dot{x}),

\tilde{L}_{,\dot{x}} = L_{,\dot{q}} F_{,x},

\frac{d}{dt} \tilde{L}_{,\dot{x}} = F_{,x} \frac{d}{dt} L_{,\dot{q}} + L_{,\dot{q}} \frac{d}{dt} F_{,x}.
Therefore, the new EOM is equivalent to

F_{,x} \frac{d}{dt} L_{,\dot{q}} + L_{,\dot{q}} \frac{d}{dt} F_{,x} = L_{,q} F_{,x} + L_{,\dot{q}} (F_{,tx} + F_{,xx}\dot{x}).

Simplifying this equation by computing

\frac{d}{dt} F_{,x} = F_{,tx} + F_{,xx}\dot{x},

we find

F_{,x} \frac{d}{dt} L_{,\dot{q}} = F_{,x} L_{,q}.

So the new EOM is indeed equivalent to the old one, under the assumption that F_{,x} \neq 0.
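To make the formal computation concrete, here is a numerical check under illustrative assumptions: take L = \dot{q}^2/2 - (q-2)^2/2, whose solution is q_0(t) = 2 + \cos t, and change coordinates by q = F(x) = x^3 (so F_{,x} = 3x^2 \neq 0 along the trajectory). The transformed Lagrangian is \tilde{L}(x, \dot{x}) = (3x^2\dot{x})^2/2 - (x^3 - 2)^2/2, and the sketch below verifies by finite differences that x_0(t) = q_0(t)^{1/3} satisfies the Euler-Lagrange equation of \tilde{L}.

```python
import math

def Lt(x, v):
    # transformed Lagrangian Ltilde(x, xdot) = (3 x^2 xdot)^2/2 - (x^3 - 2)^2/2
    return 0.5 * (3 * x * x * v) ** 2 - 0.5 * (x ** 3 - 2) ** 2

def x0(t):
    # transformed solution: q0(t) = 2 + cos t, so x0 = q0^(1/3)
    return (2 + math.cos(t)) ** (1 / 3)

def xd0(t, h=1e-6):
    return (x0(t + h) - x0(t - h)) / (2 * h)

def el_residual(t, h=1e-4):
    # d/dt (dLt/dv) - dLt/dx along x0(t), everything by finite differences
    def p(s):
        x, v = x0(s), xd0(s)
        return (Lt(x, v + h) - Lt(x, v - h)) / (2 * h)
    dLdx = (Lt(x0(t) + h, xd0(t)) - Lt(x0(t) - h, xd0(t))) / (2 * h)
    return (p(t + h) - p(t - h)) / (2 * h) - dLdx

res = [el_residual(t) for t in (0.2, 1.0, 2.5)]
print(res)   # all residuals ~ 0
```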
11.2.2. Geometric picture
The computation presented above is straightforward
and explicit, but may leave you wondering why it works.
Here is a more visual explanation.
The Euler-Lagrange equations express the condition that the functional S[q(t)] has an extremum at the trajectory q(t). Let us imagine the space of all trajectories, which is a huge set where each point represents one entire trajectory q(t). The functional S[q(t)] has an extremum at some point q_0(t), which represents the actual trajectory of the mechanical system. When we change coordinates, q \rightarrow x, we merely change our description of the space of trajectories. We cannot change the fact that the functional S[q] has an extremum somewhere, at some point q = q_0. We may only change our description of this point. Therefore, after a change of variables the new functional \tilde{S}[x] = S[q] will again have an extremum at some point x_0, and this point x_0 will correspond to the point q_0 after the change of variables. The existence of the extremum is a geometric characteristic of the shape of the functional S; that's why it is independent of the way we choose to describe it with coordinates.
Let us illustrate this by considering a simple example involving functions instead of functionals. The function f(q) = (q-1)^2 has a minimum at q_0 = 1. We may change coordinates and use x instead of q, where e.g. q = F(x) \equiv 2\sin x. This is a well-defined change of variables within the interval x \in (-\pi/2, \pi/2), where F_{,x} \neq 0. In the new coordinates, the function f(q) becomes

f(q) \rightarrow \tilde{f}(x) = (2\sin x - 1)^2.

This function has a minimum at x_0 = \pi/6; note that F(x_0) = 2\sin x_0 = 1 = q_0. Geometrically speaking, \tilde{f} is exactly the same function as before, except viewed in different coordinates. Therefore, it is no surprise that the minimum x_0 = \pi/6 corresponds to the old minimum, q_0 = 1, after the change of coordinates.
This correspondence can be shown more formally. The condition for the minimum of the function \tilde{f}(x) is

\frac{d}{dx}\tilde{f}(x) = 0 = \frac{df(q)}{dq}\frac{dF}{dx}.

This condition is equivalent to the condition for the minimum of the function f(q), namely df(q)/dq = 0, as long as F_{,x} \neq 0. This is why the position of the minimum in the old coordinates, q_0 = 1, exactly corresponds to the position of the minimum in the new coordinates, x_0 = \pi/6.
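A quick numerical confirmation of this example, via a brute-force scan of the interval (-\pi/2, \pi/2):

```python
import math

f = lambda q: (q - 1) ** 2          # original function, minimum at q0 = 1
F = lambda x: 2 * math.sin(x)       # change of variables q = F(x)
ft = lambda x: f(F(x))              # the same function in the new coordinate

# brute-force scan of the interval (-pi/2, pi/2)
xs = [-math.pi / 2 + i * math.pi / 10000 for i in range(1, 10000)]
x_min = min(xs, key=ft)
print(x_min, F(x_min))   # x_min ~ pi/6 ~ 0.5236, and F(x_min) ~ 1 = q0
```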
Similarly, when we consider functionals, we may write the condition for the minimum of \tilde{S}[x] = S[F(x)] in new coordinates as

\frac{\delta \tilde{S}}{\delta x(t)} = 0 = \frac{\delta S}{\delta q(t)}\frac{dF}{dx}.

It is clear that the condition for the minimum remains the same under the change of variables, as long as the new variables are well-defined, i.e. F_{,x} \neq 0.
11.3. Is the Lagrangian unique?
Another important question is whether there is only one Lagrangian that yields the correct equations of motion for a given system. The answer is that there are infinitely many different Lagrangians that can be used for any given system.
First of all, one may always multiply the Lagrangian by a nonzero constant \lambda and also add an arbitrary fixed function of time, F(t), to the Lagrangian. The modified Lagrangian is then \tilde{L}(q, \dot{q}, t) = \lambda L(q, \dot{q}, t) + F(t). The term F(t) is fixed in the sense that it does not depend on q(t). Then we can integrate this term explicitly and express the modified action as

\tilde{S}[q] = \lambda S[q] + \int_{t_1}^{t_2} F(t) \, dt.

The last term above is simply a number. Clearly, this modification of the action is irrelevant: if q(t) is an extremum of S[q], then it is also an extremum of \tilde{S}[q]. Multiplying a function by a nonzero constant and adding a constant to it does not change the position of its extrema.
More generally, we may add an arbitrary total time derivative to the Lagrangian:

\tilde{L} = L + \frac{d}{dt}F(q, t).

The resulting modification of the action is

\tilde{S}[q] = S[q] + \int_{t_1}^{t_2} \frac{d}{dt}F(q, t) \, dt = S[q] + F(q_2, t_2) - F(q_1, t_1),

where q_1, q_2 are the boundary values of q(t). Since these values are fixed and do not vary when we vary q(t), the extra term in the action is again a constant. Therefore, this modification of the action does not change the equations of motion. One says that two Lagrangians differing by a total derivative are equivalent.
One may even allow functions F that depend on derivatives of q(t) as well as on q(t). However, in this case one would need to keep fixed also the values of the corresponding derivatives of q(t) at the boundary points t_1, t_2.
The variety of equivalent Lagrangians is not limited to those that differ by a total derivative or by a constant coefficient. For example, the Lagrangians

L(q, \dot{q}) = \dot{q}^2 q^4, \qquad \tilde{L}(q, \dot{q}) = \dot{q}^3 q^6,

lead to the same equation of motion, which can be written as

[q\ddot{q} + 2\dot{q}^2]\, q\dot{q} = 0,

even though one obviously cannot find a function F(q, t) and a constant \lambda such that \tilde{L} = \lambda L + dF/dt. (A term dF/dt will produce an extra F_{,q}\dot{q} term in the Lagrangian, which is linear in \dot{q}, and cannot produce terms that are nonlinear in derivatives, such as \dot{q}^4.)
So, as we can see, the Lagrangian for a given physical system is not unique. The recipe L = kinetic energy minus potential energy is merely a useful rule that yields a simple Lagrangian that correctly describes the dynamics of the system.
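The claimed equivalence can be checked numerically. For a time-independent L(x, v) (with x = q, v = \dot{q}, a = \ddot{q}), the Euler-Lagrange expression expands by the chain rule to E(x, v, a) = v L_{,xv} + a L_{,vv} - L_{,x}. The sketch below evaluates E for both Lagrangians by finite differences and confirms that the two expressions are proportional, with the nonvanishing factor 3x^2 v:

```python
def EL(L, x, v, a, h=1e-4):
    # Euler-Lagrange expression E = v*L_xv + a*L_vv - L_x for L(x, v),
    # with d/dt expanded by the chain rule; partials by finite differences
    Lx = (L(x + h, v) - L(x - h, v)) / (2 * h)
    Lvv = (L(x, v + h) - 2 * L(x, v) + L(x, v - h)) / h ** 2
    Lxv = (L(x + h, v + h) - L(x + h, v - h)
           - L(x - h, v + h) + L(x - h, v - h)) / (4 * h * h)
    return v * Lxv + a * Lvv - Lx

L1 = lambda x, v: v ** 2 * x ** 4   # L  = qdot^2 q^4
L2 = lambda x, v: v ** 3 * x ** 6   # L~ = qdot^3 q^6

x, v, a = 1.2, 0.7, 0.3
e1, e2 = EL(L1, x, v, a), EL(L2, x, v, a)
print(e2 / e1, 3 * x ** 2 * v)   # E2/E1 equals 3 x^2 v wherever E1 != 0
```

Since the two Euler-Lagrange expressions differ only by a factor that vanishes nowhere on a generic trajectory, they determine the same equation of motion.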
12. Constrained systems: Lagrange formalism
12.1. What are constrained systems
In many cases in mechanics, the motion of bodies is constrained in some way: for example, a bead (point mass) may be constrained to move along a bent wire of certain shape; a wheel (massive cylinder) may be rolling along a surface (but not sliding); or two masses may be connected by a rigid stick of fixed length.
In each of these cases, there are constraining forces acting on the constrained bodies. In the above examples, the wire produces a force on the bead, the plane acts by the force of friction on the cylinder, and the stick pulls or pushes on the two masses. These forces may vary in time; in fact, we do not know the magnitude of these forces in advance. We know, however, that these forces are at every time exactly such as to guarantee that the constraints hold. The bead would fly away if there were no forces acting on it, but the wire provides exactly the required constraining force that will always keep the bead on the wire. The two masses connected by a rigid stick experience a constraining force from the stick that is exactly necessary to keep them at a constant distance from each other. (This is what it means that the stick is rigid.)
In the Newtonian approach to mechanics, constrained systems are treated by introducing unknowns F_1, F_2, ..., representing the unknown forces, and by solving the system of equations for all the unknown forces and accelerations. This procedure might be complicated; moreover, we are not always interested in determining the magnitudes of these unknown forces.
In the Lagrangian approach, there are two straightfor-
ward ways to treat constrained systems.
The method of solving the constraints. One uses this method when one does not wish to compute the magnitudes of the restraining forces. In this method, one introduces the generalized coordinates in such a way that the constraints are automatically satisfied (this is what we have been doing, e.g. in the examples of Sec. 3.4.1 and in Sec. 3.5). For example, suppose a point mass is constrained to move along a circle of radius R. We might describe this situation by saying that the Cartesian coordinates x, y are constrained to satisfy the equation x^2 + y^2 = R^2. Now we can introduce the angle \phi as the generalized coordinate and express the Cartesian coordinates of the point mass as x = R\cos\phi, y = R\sin\phi. These coordinates solve the constraint for all \phi. The power of the Lagrangian approach is that any generalized coordinates are equally correct. Having introduced a generalized coordinate \phi(t), one can directly write the Lagrangian in terms of the function \phi(t) and forget about the fact that the system is constrained. The correct equations of motion for \phi(t) will be derived automatically.
The method of Lagrange multipliers is used when one does wish to compute the magnitudes of the restraining forces, and/or when it is difficult to solve the constraints. In this method, one does not try to introduce new generalized coordinates that solve the constraints. Instead, one solves the variational problem in the presence of constraints: the correct trajectory q_i(t) is such that the action functional has an extremum with respect to trajectories that satisfy the constraint equations. For example, suppose a point mass is constrained to move along a circle of radius R, but is otherwise unforced. Then the Lagrangian is L = \frac{m}{2}(\dot{x}^2 + \dot{y}^2) and the constraint is x^2 + y^2 = R^2. The correct trajectory x(t), y(t) will be such that the integral \int L \, dt has the minimum value while the constraint holds at every t. Thus we need to solve a conditional minimization problem.
These two methods have their advantages and disadvantages. The first method in most cases provides a direct and concise way to derive the correct equations of motion in the presence of constraints; however, one obtains no information about the magnitude of the constraining forces. In some cases, this information is in fact essential. For instance, a point mass gliding on top of a surface will fly off when the normal force becomes zero. The method of Lagrange multipliers provides the complete information about the constraining forces, at the expense of a larger number of equations to solve.
12.2. Conditional minimization and
Lagrange multipliers
A conditional minimization problem can be solved by the mathematical method involving the so-called Lagrange multipliers. Essentially, one considers a different, specially modified Lagrangian that describes the fact that the system is constrained. The modified Lagrangian is equal to the normal Lagrangian plus special terms. Let us now motivate and explain this method.
To gain a visual understanding, consider the task of minimization of a function F(x, y) with respect to variables x, y, subject to the constraint that G(x, y) = 0, where G is another given function (if this sounds too abstract, follow along with Sec. 12.2.1 where an explicit example is shown). First recall how the problem would be solved without the constraint. The minimum (or, more generally, an extremum) of F would be the point (x, y) where the partial derivatives of F(x, y) vanish:

\frac{\partial F(x, y)}{\partial x} = 0, \qquad \frac{\partial F(x, y)}{\partial y} = 0.
This would give a system of two equations that determines the two unknowns x_*, y_*. With the constraint, the above system of equations will not give the correct answer because the solution x_*, y_* most probably will not satisfy the constraint: G(x_*, y_*) \neq 0. Let us look at
the problem geometrically. The constraint G(x, y) = 0 determines a curve (or perhaps several curves) in the (x, y) plane; we are looking for the extremum of the function F as we go along the curve G = 0. Let us imagine the level lines of the function F, i.e. the lines F(x, y) = A for various values of the constant A. The constraint curve G(x, y) = 0 may cross the level lines of F; it means that the value of F changes along the curve. It is clear from this geometric consideration that the extremum of F along the constraint curve will be the point where the constraint curve is tangent to some level line of F. A condition for two curves to be tangent is that their normal vectors are parallel. The normal vector to a curve G(x, y) = 0 at a point (x, y) has components (\partial G/\partial x, \partial G/\partial y). The normal vector to the level line F(x, y) = A has components (\partial F/\partial x, \partial F/\partial y). These two vectors are parallel if there exists a number \lambda such that

\frac{\partial F(x, y)}{\partial x} = \lambda \frac{\partial G(x, y)}{\partial x}, \qquad \frac{\partial F(x, y)}{\partial y} = \lambda \frac{\partial G(x, y)}{\partial y}.
Together with the constraint G(x, y) = 0, this yields a system of three equations for the three unknowns x, y, \lambda:

\frac{\partial F(x, y)}{\partial x} = \lambda \frac{\partial G(x, y)}{\partial x}, \qquad \frac{\partial F(x, y)}{\partial y} = \lambda \frac{\partial G(x, y)}{\partial y}, \qquad G(x, y) = 0.

In this way one solves the conditional minimization problem.
Now note that the equations are the same as for minimization of the function K(x, y, \lambda) \equiv F(x, y) - \lambda G(x, y) with respect to the three variables x, y, \lambda without any constraints. Therefore, the conditional minimization problem is equivalent to an ordinary (unconstrained) minimization problem for a different function, K(x, y, \lambda). This new function is built by combining the original function F with the constraint G multiplied by an extra variable \lambda. This variable is called the Lagrange multiplier.
12.2.1. Example of using Lagrange multipliers
Here is a worked-out example. Suppose we need to maximize the function F(x, y) = 5x + 12y under the constraint x^2 + y^2 = 1. First, we need to write the constraint in the form G(x, y) = 0, where G is some function. For instance, we may set G(x, y) = x^2 + y^2 - 1. (As we shall see below, it does not matter how we choose the function G, as long as the constraint is equivalent to the equation G(x, y) = 0!) Then we build a new function called K,

K(x, y, \lambda) = F(x, y) - \lambda G(x, y) = 5x + 12y - \lambda(x^2 + y^2 - 1).

We now need to minimize this function with respect to the three variables x, y, \lambda. We obtain the system of equations:

5 - 2\lambda x = 0, \qquad 12 - 2\lambda y = 0, \qquad x^2 + y^2 - 1 = 0.

It is easy to solve these equations:

x = \frac{5}{13}, \qquad y = \frac{12}{13}, \qquad \lambda = \frac{13}{2}.

Thus we find the values of x and y. The value of the Lagrange multiplier \lambda is useless for us now, but it will be useful when we apply this method to problems in mechanics. (As we will see below, the values of Lagrange multipliers can be used to compute the magnitudes of each of the constraining forces.)
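A quick numerical verification of this solution (the maximum of 5x + 12y on the unit circle is |(5, 12)| = 13, attained at (5/13, 12/13)):

```python
import math

x, y, lam = 5 / 13, 12 / 13, 13 / 2
# the three stationarity equations are satisfied:
print(5 - 2 * lam * x, 12 - 2 * lam * y, x * x + y * y - 1)   # all ~ 0
# the constrained maximum of F = 5x + 12y:
print(5 * x + 12 * y)   # ~ 13

# brute-force comparison over the circle (x, y) = (cos t, sin t)
best = max(5 * math.cos(t) + 12 * math.sin(t)
           for t in (i * 2 * math.pi / 100000 for i in range(100000)))
print(best)   # ~ 13
```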
12.2.2. General case
The general form of the constrained optimization problem is the following. We need to find an extremum of a given function F(x_1, ..., x_n), where x \equiv \{x_1, ..., x_n\} is an array of variables satisfying m different constraints G_1(x) = 0, ..., G_m(x) = 0.
The geometric consideration that we used in a simple case (the example with functions F(x, y), G(x, y) above) can be generalized to many dimensions and many constraints. One considers level surfaces of F and the surfaces given by the constraints. In an n-dimensional space, these level surfaces are in fact multidimensional hypersurfaces. For brevity, I will call them simply surfaces.
A constrained extremum will occur at a point x if a level surface of F is tangent to the constraint surface at that point. The constraint surface is an intersection of m different surfaces G_j = 0, each having its own normal vector n_j = \partial G_j/\partial x (where j = 1, ..., m). It can be shown using elementary vector algebra (I omit the proof) that the level surface of F is tangent to the constraint surface if the normal vector \partial F/\partial x to the level surface of F is a linear combination of the m normal vectors n_j. Therefore, the conditions for the constrained extremum to be located at a point x are that (1) x must satisfy all the
constraints and (2) that there should exist m numbers \lambda_1, ..., \lambda_m such that

\frac{\partial F}{\partial x} = \lambda_1 \frac{\partial G_1}{\partial x} + ... + \lambda_m \frac{\partial G_m}{\partial x}.
It is easy to see that these conditions are equivalent to the conditions for an extremum of a new function

K(x, \vec{\lambda}) \equiv F(x) - \lambda_1 G_1(x) - ... - \lambda_m G_m(x),

with respect to the (m + n) variables x_1, ..., x_n, \lambda_1, ..., \lambda_m without any constraints.
Let us then formulate the recipe to solve the problem of constrained optimization. We introduce an array \vec{\lambda} of m different Lagrange multipliers \vec{\lambda} \equiv \{\lambda_1, ..., \lambda_m\} and build a new function

K(x, \vec{\lambda}) \equiv F(x) - \lambda_1 G_1(x) - ... - \lambda_m G_m(x). \qquad (12.1)

We then find an extremum of this function with respect to the total set of (m + n) variables x_1, ..., x_n, \lambda_1, ..., \lambda_m. In order to do that, we need to solve a system of (m + n) equations:

\frac{\partial K}{\partial x_1} = 0, \; ..., \; \frac{\partial K}{\partial x_n} = 0, \qquad \frac{\partial K}{\partial \lambda_1} = 0, \; ..., \; \frac{\partial K}{\partial \lambda_m} = 0. \qquad (12.2)

By solving these equations, we will obtain the set of values x_1, ..., x_n, which we are interested in. (The values of the auxiliary variables \lambda_j can be discarded.)
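As a small illustration of this recipe with two constraints (a hypothetical example, not from the text): extremize F(x, y, z) = z on the unit sphere G_1 = x^2 + y^2 + z^2 - 1 = 0 intersected with the plane G_2 = x + y = 0. The constraint set is a great circle, the maximum of z on it is the point (0, 0, 1), and there the gradient condition holds with \lambda_1 = 1/2, \lambda_2 = 0:

```python
import math

# brute-force maximum of z over the great circle
# (x, y, z) = cos(th)*(1/sqrt2, -1/sqrt2, 0) + sin(th)*(0, 0, 1)
best = max(([math.cos(th) / math.sqrt(2), -math.cos(th) / math.sqrt(2), math.sin(th)]
            for th in (i * 2 * math.pi / 100000 for i in range(100000))),
           key=lambda p: p[2])
print(best)   # ~ (0, 0, 1)

# gradient condition grad F = lam1*grad G1 + lam2*grad G2 at the maximum
x, y, z, lam1, lam2 = 0.0, 0.0, 1.0, 0.5, 0.0
gradF, gradG1, gradG2 = (0, 0, 1), (2 * x, 2 * y, 2 * z), (1, 1, 0)
resid = [gradF[i] - lam1 * gradG1[i] - lam2 * gradG2[i] for i in range(3)]
print(resid)   # all components are zero
```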
12.3. Motion constrained to a surface
Let us now consider the following general situation: A point mass is moving in a potential V(\vec{x}) and, additionally, is constrained to move along a surface given by an equation G(\vec{x}) = 0. This can be physically realized by a mass point sliding without friction on top of a curved surface.
According to the Lagrangian approach, we must find an extremum of the action,

S[\vec{x}] = \int L(\vec{x}, \dot{\vec{x}}) \, dt = \int \left[ \frac{m}{2}\dot{\vec{x}}^2 - V(\vec{x}) \right] dt,

under the condition that G(\vec{x}(t)) = 0 for all times t.
(Note that G is not a functional, so G(\vec{x}(t)) is still a function of t.) We can apply the method of Lagrange multipliers. We have, in effect, infinitely many constraints: one constraint, G(\vec{x}(t_1)) = 0, for each moment of time t_1. Therefore we need to introduce a set of infinitely many Lagrange multipliers, one Lagrange multiplier for each t. It is convenient to arrange this set of Lagrange multipliers into a function, \lambda(t).
According to the method of Lagrange multipliers, we need to build an extended action, which is equal to the original action S[\vec{x}] minus the sum of all the constraints multiplied by their respective Lagrange multipliers. Since we have infinitely many constraints, one for each t, the sum over the constraints becomes an integral over t. Therefore, the new action is

\tilde{S}[\vec{x}(t), \lambda(t)] = \int L(\vec{x}, \dot{\vec{x}}) \, dt - \int \lambda(t) G(\vec{x}(t)) \, dt.

Solving the constrained optimization problem is equivalent to finding an extremum of the functional \tilde{S} with respect to arbitrary \vec{x}(t) and \lambda(t).
In principle, we have reduced the constrained problem to an unconstrained problem with an extra function \lambda(t). What remains is some technical work:
Deriving the Euler-Lagrange equations from the extended action \tilde{S} and solving those equations for the unknown functions \vec{x}(t) and \lambda(t).
Interpreting the function \lambda(t). It will turn out (see Sec. 12.3.2 below) that \lambda(t) is related to the force that is needed to keep the point mass moving only along the surface G(\vec{x}) = 0. So the Lagrange multiplier has a direct physical interpretation in this case. More precisely, we shall show that the constraining force \vec{F}(t) exerted by the surface is equal to -\lambda(t)\, \partial G/\partial \vec{x}. In the language of Newtonian physics, the force \vec{F} is called the normal force; indeed, it is clear that the direction of \vec{F} is always normal to the surface.
12.3.1. Example
A massive bead is set on a wire curved in the vertical plane (coordinates x, z) in the shape of the function z = Cx^2, where C is a given constant. The only external force is the gravitational pull of the Earth. We would like to determine the equation of motion for the position of the bead.
Choose the constraint function as G(x, z) = z - Cx^2. Then the extended Lagrangian is

\tilde{L} = \frac{1}{2}(m\dot{x}^2 + m\dot{z}^2) - mgz - \lambda(z - Cx^2).

The Euler-Lagrange equations are derived in the standard way:
Variation w.r.t. x(t) gives: m\ddot{x}(t) = 2\lambda(t)Cx(t).
Variation w.r.t. z(t) gives: m\ddot{z}(t) = -mg - \lambda(t).
Variation w.r.t. \lambda(t) gives: 0 = z(t) - Cx^2(t).
It is not easy to solve these equations, but deriving them is straightforward and requires "no thinking," so to speak. (That is, we simply follow general rules, and we do not need to make any special decisions or find special tricks for each particular situation. A computer can be programmed to derive these equations from a given \tilde{L}.)
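For comparison, here is a numerical sketch using the first method (solving the constraint) for the same bead: substituting z = Cx^2 into L = \frac{m}{2}(\dot{x}^2 + \dot{z}^2) - mgz gives the reduced Lagrangian L = \frac{m}{2}\dot{x}^2(1 + 4C^2x^2) - mgCx^2, whose Euler-Lagrange equation is \ddot{x} = -(4C^2x\dot{x}^2 + 2gCx)/(1 + 4C^2x^2). Integrating this with RK4 (illustrative values C = 1, g = 9.8, m = 1), the conserved energy E = \frac{m}{2}\dot{x}^2(1 + 4C^2x^2) + mgCx^2 provides a consistency check on the equation of motion:

```python
C, g, m = 1.0, 9.8, 1.0   # illustrative parameter values

def acc(x, v):
    # xddot from the Euler-Lagrange equation of the reduced Lagrangian
    return -(4 * C * C * x * v * v + 2 * g * C * x) / (1 + 4 * C * C * x * x)

def energy(x, v):
    return 0.5 * m * v * v * (1 + 4 * C * C * x * x) + m * g * C * x * x

def rk4_step(x, v, dt):
    def f(x, v):
        return v, acc(x, v)
    k1 = f(x, v)
    k2 = f(x + dt / 2 * k1[0], v + dt / 2 * k1[1])
    k3 = f(x + dt / 2 * k2[0], v + dt / 2 * k2[1])
    k4 = f(x + dt * k3[0], v + dt * k3[1])
    return (x + dt / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]),
            v + dt / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]))

x, v = 0.5, 0.0           # bead released from rest at x = 0.5
E0 = energy(x, v)
for _ in range(20000):    # integrate for 20 seconds with dt = 1e-3
    x, v = rk4_step(x, v, 1e-3)
print(E0, energy(x, v))   # energy stays constant along the motion
```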
Exercise: There is an arbitrary choice in selecting the constraint function G. Namely, one can select any function of G instead of G, or multiply G by a function that is everywhere nonzero. For example, we could have selected G(x, z) = 2x^2 - 2z/C or G(x, z) = |x| - \sqrt{z/C}, and the constraint line G(x, z) = 0 would remain the same. More generally, one might choose \tilde{G}(\vec{x}) = A(\vec{x})B(G(\vec{x})) instead of G, where A and B are some functions such that A(\vec{x}) \neq 0 for all \vec{x}, B(0) = 0, and B'(z) \neq 0 for all z; under these assumptions, the equation G(\vec{x}) = 0 is equivalent to \tilde{G}(\vec{x}) = 0. Show that the equations of motion derived with the constraint \tilde{G} = 0 are equivalent to those derived with the constraint G = 0, with a suitable change in the definition of \lambda(t).
Solution: The equations of motion with the Lagrangian L - \lambda G are

\frac{d}{dt}\frac{\partial L}{\partial \dot{\vec{x}}} - \frac{\partial L}{\partial \vec{x}} + \lambda \frac{\partial G}{\partial \vec{x}} = 0.

Introduce the constraint function \tilde{G} into the Lagrangian with a multiplier \tilde{\lambda}. The new equations of motion are

\frac{d}{dt}\frac{\partial L}{\partial \dot{\vec{x}}} - \frac{\partial L}{\partial \vec{x}} + \tilde{\lambda} \frac{\partial}{\partial \vec{x}}(A B(G)) = 0, \qquad A B(G(\vec{x})) = 0.

Since by assumption A B(G) = 0 is equivalent to B(G) = 0, we find on the constraint surface

\tilde{\lambda} \frac{\partial}{\partial \vec{x}}(A B(G)) = \tilde{\lambda} B(G) \frac{\partial A}{\partial \vec{x}} + \tilde{\lambda} A B'(G) \frac{\partial G}{\partial \vec{x}} = \tilde{\lambda} A B'(G) \frac{\partial G}{\partial \vec{x}}. \qquad (12.3)

Therefore, the new equations of motion are equivalent to the old ones with \lambda = \tilde{\lambda} A B'(G).
12.3.2. Lagrange multipliers and constraining
forces
We see from the equation for z(t) in the above example that -\lambda(t) looks like the component of the normal force in the z direction. So it is clear that the Lagrange multiplier is somehow related to the unknown constraining force. We shall now derive this relationship in the general case.
We begin with an informal motivation. Consider again the extended Lagrangian \tilde{L} = L - \lambda(t)G(\vec{x}). The term \lambda(t)G(\vec{x}) looks like an extra amount of potential energy, although it depends on time through the factor \lambda(t). Let us examine this somewhat unusual kind of potential energy in more detail. As long as the point mass remains within the constraint surface, we have G(\vec{x}) = 0 and this extra potential energy is equal to zero. But if the point mass could move a little bit off the constraint surface, say in a direction given by a small vector \delta\vec{x}, then this extra potential energy would change by

\delta V = \delta\vec{x} \cdot \frac{\partial(\lambda G)}{\partial \vec{x}}.

This looks like work done by a force. The force is directed orthogonally to the constraint surface and is equal to

\vec{F} = -\lambda \frac{\partial G}{\partial \vec{x}}.

We expect precisely this kind of force to act on the mass point by a constraining device!
Let us verify more formally that \vec{F} is in fact equal to the constraining force we are looking for. The Euler-Lagrange equations that follow from the Lagrangian \tilde{L} are of the form

\underbrace{\frac{d}{dt}\frac{\partial L}{\partial \dot{\vec{x}}}}_{\text{mass}\,\times\,\text{acceleration}} = \underbrace{\frac{\partial L}{\partial \vec{x}} - \lambda(t)\frac{\partial G}{\partial \vec{x}}}_{\text{force}}.

The term \partial L/\partial \vec{x} describes the usual "free" forces, i.e. forces due to the potential energy in the original Lagrangian L. Now it is clear that the term -\lambda \frac{\partial G}{\partial \vec{x}} describes the additional force due to constraints.
Exercise: There is, of course, an arbitrary choice in selecting the constraint function G. For instance, we might choose the constraint function as 17G or G^3 or some other function f(G) instead of G. More generally, one may choose \tilde{G}(\vec{x}) \equiv A(\vec{x})B(G(\vec{x})) instead of G(\vec{x}), as long as A \neq 0 and B' \neq 0. Show that the constraining force will come out to be the same for any choice of the function \tilde{G}.
Answer: The value of \lambda will change appropriately for every different choice of G, as shown by Eq. (12.3), and the expression for the force -\lambda(\partial G/\partial \vec{x}) will stay the same.
Exercise: Figure out how to compute the constraining force \vec{F} if there are several independent constraints G_1(\vec{x}), ..., G_n(\vec{x}).
Answer: One needs to introduce n Lagrange multipliers \lambda_1, ..., \lambda_n. The force is equal to -\sum_{i=1}^{n} \lambda_i \, \partial G_i/\partial \vec{x}.
12.4. Constraints involving velocities
So far we considered only constraints that are expressed by functions of coordinates, such as G(x, y) = 0. This form of the constraint covers a wide range of applications. However, there exist important cases where physical constraints cannot be expressed in this way. For example, the motion of a massive ball that is rolling on a surface without sliding, or the motion of a skater who is sliding on ice, can be described only using complicated constraints that involve velocities and coordinates at the same time. Such constraints, which are not equivalent to a simple function of coordinates, are called nonintegrable or nonholonomic constraints, whereas the constraints of the type we considered up to now, G(\vec{x}) = 0, are called integrable or holonomic (even if G also depends on time). For example, the constraint \dot{x} = \dot{y} + 1 is integrable (holonomic) since it is equivalent to x(t) = y(t) + t + const, while the constraint \dot{x}(t) = y(t)\dot{z}(t) is nonintegrable (nonholonomic).
In the general case, there may be several constraints, some holonomic, say F_j(\vec{x}) = 0, and some nonholonomic, say G_j(\vec{x}, \dot{\vec{x}}) = 0, where j = 1, 2, ..., n. Trajectories \vec{x}(t) of mechanical systems with such constraints are found as extrema of the action \int L(\vec{x}, \dot{\vec{x}}) \, dt under the constraints F_j(\vec{x}) = 0 and G_j(\vec{x}, \dot{\vec{x}}) = 0. The holonomic constraints can be incorporated into the extended Lagrangian with Lagrange multipliers, as described above, but anholonomic constraints cannot. In other words, the conditional optimization problem in the presence of anholonomic constraints is not equivalent to an unconstrained optimization problem for another Lagrangian. The reason for this is that the velocities \dot{\vec{x}}(t) are not varied independently from the coordinates \vec{x}(t) when we determine the equations of motion from the Lagrangian. So the procedure for building an extended Lagrangian with Lagrange multipliers does not work any more.
The reason for this problem can be visualized as follows. Lagrangians can describe only conservative mechanical systems, i.e. systems with forces due to potentials, or due to interactions that perform no work (e.g. the magnetic force). A holonomic constraint, for example, the restriction of the motion to a surface, can be seen as an idealization of a potential that has a very steep minimum along the surface. Thus, constraining forces are effectively potential forces. For this reason, holonomic constraints are straightforwardly incorporated into the Lagrangian. However, nonholonomic constraints are not equivalent to any potentials or conservative forces. The forces arising from nonholonomic constraints depend on velocities in a nontrivial way and do perform work. Such forces cannot be described by any Lagrangian.
A special theory exists to derive the equations of
motion in the presence of arbitrary anholonomic con-
straints. However, this theory is beyond the scope of the
minimal standard course.
13. Advanced canonical methods
In this chapter we shall see that there exists a canonical transformation (q, p) \rightarrow (\tilde{Q}, \tilde{P}) such that the new variables (\tilde{Q}, \tilde{P}) are constant in time. This canonical transformation therefore yields a general solution of the equations of motion, expressing q(t) and p(t) through constants of integration. The Hamilton-Jacobi (HJ) equation allows one to find such a canonical transformation. Often the method of separation of variables can be used to find suitable solutions of the HJ equation. If a canonical pair of variables (q_1, p_1) is separable and if the values of these variables are bounded, one can perform a canonical transformation to action-angle variables, (q_1, p_1) \rightarrow (\phi, J), such that J = const and \dot{\phi} = const. These variables conveniently describe the motion in the (q_1, p_1) phase plane. The real usefulness of the action-angle variables is in the application to the case when the Hamiltonian is a slowly changing function of time. In that case, J is an adiabatic invariant with respect to slow changes in the Hamiltonian. The change in J is exponentially small, even if the total change of the Hamiltonian is large (but spread over a long time).
13.1. Action evaluated on solutions
Let us consider a mechanical system whose trajectories are completely known. We may consider a bunch of trajectories q(t) that start at a fixed initial time t_0 from a fixed initial point q_0 in all possible directions (i.e. with all possible initial velocities). It is clear that the initial portions of these trajectories will cover the entire neighborhood of the initial point q_0, and that there will be a unique trajectory that reaches a nearby point q at a later time t, for each q and for sufficiently small t − t_0. We can compute the action along the trajectories q(t),
\[
S(q, t; q_0, t_0) = \int_{t_0}^{t} L(q, \dot q, t)\, dt.
\]
Note that we are using the actual trajectories of the system, i.e. paths q(t) that solve the equations of motion (EOM).¹ We have just argued that (for small enough t − t_0) the function S(q, t; q_0, t_0) is well-defined for every q in a neighborhood of q_0. We can compute this function S if we know all the trajectories of the system. Note that
¹One can prove (although the proof is not simple) that the action functional S[q(t)] really has a minimum (not merely an extremum) at the solution q(t) of the EOM, if the neighbor trajectories emanating from (q_0, t_0) do not cross each other. In that case, the principle of least action holds literally and S(q, t; q_0, t_0) is in fact the value of the minimum of the action over all trajectories connecting (q_0, t_0) and (q, t).
at late times t, trajectories may turn around or cross each other, so that there will not be a unique trajectory reaching q at time t, and the function S will be undefined.
You may be wondering why the function S is interesting. Let us substitute Q instead of q_0, set t_0 to a fixed value, and treat S(q, q_0 ≡ Q, t) as a generating function of a canonical transformation, (q, p) → (Q, P). This canonical transformation defines new coordinates and a new Hamiltonian through the relations
\[
p = \frac{\partial S(q, q_0, t)}{\partial q}, \qquad
P = -\frac{\partial S(q, q_0, t)}{\partial q_0}, \qquad
H' = H + \frac{\partial S}{\partial t}. \tag{13.1}
\]
We shall now show by an explicit calculation that H' = 0, which is certainly a great simplification. This is why the function S is interesting.
To proceed, we need to compute ∂S/∂q, ∂S/∂q_0, and ∂S/∂t. Note that the derivative ∂S/∂q measures the variation δS under a change δq of the final point q of the trajectory q(t), while the initial point q_0 is held fixed. A change of the final point, of course, changes also the entire trajectory q(t). Suppose the trajectory is thereby changed by δq(t). Then we apply the familiar derivation of the variation δS,
\[
\delta S = \int_{t_0}^{t} \left( \frac{\partial L}{\partial q}\,\delta q + \frac{\partial L}{\partial \dot q}\,\delta \dot q \right) dt
= \left. \frac{\partial L}{\partial \dot q}\,\delta q \right|_{t_0}^{t}
+ \int_{t_0}^{t} \left( \frac{\partial L}{\partial q} - \frac{d}{dt}\frac{\partial L}{\partial \dot q} \right) \delta q\, dt
= p(t)\,\delta q(t) - p(t_0)\,\delta q(t_0),
\]
since the trajectory q(t) is a solution of the EOM.
Presently, δq(t_0) = 0 since the initial point q(t_0) ≡ q_0 is held fixed. It follows that ∂S/∂q = p(t), which is the canonical momentum p evaluated on the correct trajectory q(t) at the final time t. Similarly, we could vary the initial point q_0 and obtain ∂S/∂q_0 = −p(t_0). Finally, we need to compute ∂S/∂t, which is the derivative with respect to the change of the final time t while the initial and final coordinates, q_0 and q, are held fixed; under these conditions, a change of the final time will of course change the entire trajectory q(t) as well. To compute the quantity ∂S/∂t, we note that the total derivative of the action, dS/dt, is (by definition) equal to the Lagrangian:
\[
\frac{d}{dt}\, S(q(t), t; q_0) = L(q(t), \dot q(t), t).
\]
On the other hand,
\[
\frac{dS(q, t; q_0)}{dt} = \frac{\partial S(q, t; q_0)}{\partial t} + \frac{\partial S(q, t; q_0)}{\partial q}\,\dot q
= \frac{\partial S}{\partial t} + p \dot q.
\]
Therefore,
\[
\frac{\partial S}{\partial t} = L - p\dot q = -H,
\]
where H is the Hamiltonian, H = p q̇ − L.

Now we can rewrite the canonical transformation (13.1) as
\[
p = \frac{\partial S}{\partial q} = p(t), \qquad
P = -\frac{\partial S}{\partial q_0} = p(t_0), \qquad
H' = H + \frac{\partial S}{\partial t} = 0.
\]
We can draw two conclusions about this canonical transformation: (i) the new coordinates and momenta, (Q, P), are simply equal to the initial conditions q(t_0) and p(t_0) at a fixed time t_0, and (ii) the new Hamiltonian is identically zero, indicating trivial EOM,
\[
\frac{dQ}{dt} = \frac{dP}{dt} = 0.
\]
So the new variables (Q, P) are in a sense constants of motion. Writing the canonical transformation in the form q = q(Q, P, t), p = p(Q, P, t), we obtain a complete set of trajectories for the system, with arbitrary constants (Q, P) representing initial conditions at time t_0. This is an explicit general solution of the EOM. Thus, the knowledge of a canonical transformation (q, p) → (Q, P) such that H' = 0 gives us a complete solution of the mechanical problem.
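These derivatives can be checked on the simplest possible example, a free particle with L = q̇²/2 (this worked case and the sample numbers below are illustrative additions, not from the text): the actual trajectory from (q_0, t_0) to (q, t) is a straight line, so S = (q − q_0)²/(2(t − t_0)), and finite differences confirm ∂S/∂q = p(t), ∂S/∂q_0 = −p(t_0), ∂S/∂t = −H.

```python
# Finite-difference check (illustrative, free particle L = qdot^2/2;
# the sample values of q0, t0, q, t are arbitrary).
q0, t0, q, t = 0.5, 0.0, 2.0, 1.5
S = lambda q, t, q0, t0: (q - q0)**2 / (2.0*(t - t0))
p = (q - q0)/(t - t0)      # momentum (constant along the straight trajectory)
H = 0.5*p*p                # Hamiltonian H = p^2/2

h = 1e-6
dS_dq  = (S(q + h, t, q0, t0) - S(q - h, t, q0, t0)) / (2*h)
dS_dq0 = (S(q, t, q0 + h, t0) - S(q, t, q0 - h, t0)) / (2*h)
dS_dt  = (S(q, t + h, q0, t0) - S(q, t - h, q0, t0)) / (2*h)
assert abs(dS_dq - p) < 1e-6 and abs(dS_dq0 + p) < 1e-6 and abs(dS_dt + H) < 1e-6
print("dS/dq = p(t), dS/dq0 = -p(t0), dS/dt = -H verified")
```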
Note that a generating function S(q, Q, t) defines a valid canonical transformation only if the following determinant is nonzero,
\[
\det \left\| \frac{\partial^2 S(q, Q, t)}{\partial q_i\, \partial Q_j} \right\| \neq 0. \tag{13.2}
\]
One can show that this determinant is always nonzero for the action function S(q, q_0, t), where one sets q_0 ≡ Q, under the assumption that neighbor trajectories do not cross.²
Indeed, the formula
\[
p_0 = -\frac{\partial S(q, q_0, t)}{\partial q_0} \tag{13.3}
\]
determines the required initial momentum p_0 for reaching the final position q from the initial position q_0. If the neighbor trajectories do not cross, there is only one such initial momentum for every final position q, and, conversely, only one final position q for each initial momentum p_0. Therefore, the formula (13.3) can be viewed as a system of equations for determining q when q_0 and p_0 are known. This system is always solvable with respect to q and the solution is unique; therefore the corresponding determinant (13.2) is always nonzero.
Thus, we find that the generating function S yields a canonical transformation representing an explicit general solution of the EOM. Of course, the way we found this canonical transformation was by using the action function S(q, t; q_0, t_0), while one can only determine this function if one already knows the complete solution of the EOM. It would be useful if we could determine such a canonical transformation without knowing the solutions q(t). One method is to use the Hamilton-Jacobi equation.

²We shall not need to formulate the no-crossing condition in a rigorous mathematical way, although it can be done with some effort.
13.2. Hamilton-Jacobi equation
We have seen that the action function S(q, t; q_0, t_0) is a generating function of a canonical transformation that transforms the Hamiltonian into a zero. The new canonical variables are the initial values (q_0, p_0), which makes it very easy to find a complete solution of the EOM. It is true that we cannot find S(q, t; q_0, t_0) without first having a complete solution of the EOM. So at first the idea of finding this canonical transformation may seem hopeless.
However, now that we appreciate the idea that such a transformation could exist, we realize that we do not necessarily need the canonical transformation (q, p) → (q_0, p_0), where the new variables are the initial conditions. In fact, any canonical transformation,
\[
p = \frac{\partial F}{\partial q}, \qquad
P = -\frac{\partial F}{\partial Q}, \qquad
H' = H + \frac{\partial F}{\partial t},
\]
such that the new Hamiltonian H' = 0, would do just as well. Let us try to determine a generating function, say F(q, Q, t), for such a canonical transformation.
From the requirement H' = 0 we obtain the following condition for the generating function F(q, Q, t),
\[
\frac{\partial F(q, Q, t)}{\partial t} + H(q, p, t)
= \frac{\partial F(q, Q, t)}{\partial t} + H\!\left(q, \frac{\partial F(q, Q, t)}{\partial q}, t\right) = 0. \tag{13.4}
\]
This is a partial differential equation for F(q, Q, t) called the Hamilton-Jacobi equation. Additionally, the function F should define a valid canonical transformation, i.e. we must require the nondegeneracy condition
\[
\det \left\| \frac{\partial^2 F(q, Q, t)}{\partial q_i\, \partial Q_j} \right\| \neq 0. \tag{13.5}
\]
Note that the constants Q entering the function F(q, Q, t) are largely arbitrary parameters and could be arbitrarily redefined, Q → Q̃, as long as the nondegeneracy condition (13.5) holds. For instance, we could always replace Q_1 → f(Q_1), where f is an arbitrary function such that f′ ≠ 0, because this will not change the condition (13.5). At the same time, an additive constant (e.g. F̃ = F(q, Q, t) + Q_0) is not desired: since ∂F̃/∂Q_0 = 1, the matrix row ∂²F̃/∂Q_0∂q_i is entirely equal to zero, so the determinant (13.5) would be equal to zero.
For example, consider the Hamiltonian of a harmonic oscillator with mass m and frequency ω,
\[
H(p, q) = \frac{p^2}{2m} + \frac{m\omega^2 q^2}{2}.
\]
Then the generating function F depends on q, Q, t, and the Hamilton-Jacobi equation is
\[
\frac{\partial F(q, Q, t)}{\partial t} + \frac{1}{2m} \left( \frac{\partial F(q, Q, t)}{\partial q} \right)^2 + \frac{m\omega^2 q^2}{2} = 0.
\]
The variable Q is not being differentiated; one says that it enters into the equation as a parameter.
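A quick numerical sanity check (an illustrative sketch, not from the text): assuming the time dependence F = W(q) − Et with W′(q) = √(2mE − m²ω²q²), where E is a constant parameter, the oscillator HJ equation above is satisfied identically, since −E + (1/2m) W′² + mω²q²/2 = −E + (E − mω²q²/2) + mω²q²/2 = 0.

```python
# Check (illustrative) that the ansatz F = W(q) - E t with
# W'(q) = sqrt(2mE - m^2 w^2 q^2) solves the oscillator HJ equation.
import math

m, w, E = 2.0, 1.5, 3.0                    # arbitrary sample parameters
for q in (0.0, 0.3, 0.7):                  # sample points with 2mE - (m w q)^2 > 0
    dF_dq = math.sqrt(2*m*E - (m*w*q)**2)  # = W'(q)
    dF_dt = -E
    residual = dF_dt + dF_dq**2/(2*m) + m*w**2*q**2/2
    assert abs(residual) < 1e-9
print("HJ equation satisfied by the ansatz")
```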
Let us remark that the HJ equation can be simplified for conservative systems (∂H/∂t = 0). We can look for the generating function F in the form
\[
F = F(q, Q) + f(t, Q),
\]
which gives
\[
-\frac{\partial f(t, Q)}{\partial t} = H\!\left(q, \frac{\partial F(q, Q)}{\partial q}\right).
\]
Since H does not depend on t while f does not depend on q, a solution is possible only if both −∂f/∂t and H are equal to a function only of the parameters Q. (If you are unfamiliar with this logical step, see Sec. 13.2.1.) Let this function be E(Q). Therefore, we find
\[
f(t, Q) = -t\, E(Q),
\]
while the rest of the HJ equation is
\[
H\!\left(q, \frac{\partial F(q, Q)}{\partial q}\right) = E(Q). \tag{13.6}
\]
Since the choice of the parameters Q is arbitrary (subject only to the nondegeneracy condition), we may replace one of these parameters, say Q_n, with E. In this way we obtain a simpler HJ equation.
13.2.1. Separation of variables in general
The method of separation of variables is usually applied to partial differential equations. You may skip this section if you understand the idea of separating variables.

Let us start with a simple example. Suppose someone gives you the following equation:
\[
1 + a x^3 = e^{b y}, \tag{13.7}
\]
and asks to determine the values of a and b such that Eq. (13.7) holds for all values of x and y. It is easy to see that the only possible solution is a = b = 0. Indeed, if Eq. (13.7) holds for all x and y, let us fix the value of y, e.g. set y = 3. Then we find 1 + ax³ = e^{3b}, which should hold for all x, say for x = 0 and x = 1. Then we get 1 = e^{3b} and 1 + a = e^{3b}, so a = 0. But then Eq. (13.7) says that we have 1 = e^{by} for all y, say for y = 1. Then we have 1 = e^{b}, so b = 0.
Let us make this reasoning more general. Equation (13.7) is of the form f(x) = g(y), where f and g are some functions. Let us fix the value of y, say y = y_0; then we get f(x) = g(y_0). The right-hand side, g(y_0), is a fixed number, so the only way f(x) = g(y_0) can hold for all x is if the left-hand side, f(x), is in fact independent of x. Similarly, g(y) should be independent of y. In other words, the only way f(x) = g(y) can hold for all x, y is when both f(x) and g(y) are equal to a constant, f(x) = g(y) = C. This is the main logical step in the method of separation of variables.
As another example, consider the equation
\[
y\,(A + Bx) + (C + y)(1 + x) = 0. \tag{13.8}
\]
Our goal is to determine the values of the parameters A, B, C such that Eq. (13.8) holds for all x, y. Let us transform Eq. (13.8) to
\[
\frac{A + Bx}{1 + x} = -\frac{C + y}{y}.
\]
Since the left-hand side is a function only of A, B, x while the right-hand side is a function of C, y, we conclude that both sides must be equal to a constant, say D:
\[
\frac{A + Bx}{1 + x} = -\frac{C + y}{y} = D.
\]
It follows that A = −1, B = −1, C = 0, D = −1. One says that the variables x and y can be separated in Eq. (13.8).
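The values just found can be verified by brute force (an illustrative check, not part of the text): with A = −1, B = −1, C = 0 the left-hand side of Eq. (13.8) vanishes for every x and y, and both separated sides equal the same constant D = −1 (away from x = −1 and y = 0).

```python
# Numerical confirmation (illustrative) of the separation example (13.8).
A, B, C = -1.0, -1.0, 0.0

def lhs(x, y):
    return y*(A + B*x) + (C + y)*(1 + x)

for x in (0.5, 2.0, -3.0):
    for y in (1.0, -2.0, 0.7):
        assert abs(lhs(x, y)) < 1e-12
        assert abs((A + B*x)/(1 + x) - (-1.0)) < 1e-12   # function of x only
        assert abs(-(C + y)/y - (-1.0)) < 1e-12          # function of y only
print("Eq. (13.8) holds identically for A = -1, B = -1, C = 0")
```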
More generally, we may have an equation of the form
\[
f(x_1, ..., x_n, y_1, ..., y_n) = g(y_1, ..., y_n, z_1, ..., z_n),
\]
i.e. both sides share a set of variables (y_1, ..., y_n). Again, it is given that this equation holds for all values of the vectors x, y, z. In that case, it is clear that both sides must be equal to a function h(y) depending only on the shared variables, so we may write
\[
f(x, y) = h(y) = g(y, z).
\]
In this case, one says that the variables x and z can be separated from the other variables. The function h(y) is so far unknown; but the condition that f(x, y) is independent of x and that g(y, z) is independent of z is a very strong restriction which usually allows one to achieve significant progress in the calculations.
13.2.2. Separation of variables in HJ equation
The HJ equation (13.4) is an equation in partial derivatives and appears to be much more complicated than the set of EOM (q̇ = ∂H/∂p, ṗ = −∂H/∂q), which are ordinary differential equations. However, we do not need a general solution of the HJ equation; we only need to find a particular solution containing some constants Q. In many cases, this task turns out to be easier than solving the EOM directly. In fact, in many cases the solution can be guessed, and this turns out to be easier than guessing the solutions of the EOM directly. Since we are merely looking for one suitable solution, we are free to choose any ansatz and substitute it into the HJ equation. If that ansatz works, we will find a solution and be done; if that ansatz does not work, we can try another ansatz.
A common method to find suitable solutions of the HJ equation (with the constants) is to make a guess that the solution has the form
\[
F(q, Q, t) = S_1(q_1, Q_1) + W(q_2, q_3, ..., q_n, Q, t). \tag{13.9}
\]
This ansatz is called separating the variable q_1. Substituting the ansatz (13.9) into the HJ equation, we find
\[
H\!\left(q_1, \frac{\partial S_1}{\partial q_1}, q_2, \frac{\partial W}{\partial q_2}, ..., q_n, \frac{\partial W}{\partial q_n}, t\right) + \frac{\partial W}{\partial t} = 0. \tag{13.10}
\]
Note that the entire dependence on q_1 is contained in S_1.
Suppose that we can rewrite Eq. (13.10) in the form
\[
f_1\!\left(q_1, \frac{\partial S_1}{\partial q_1}\right) = g\!\left(q_2, \frac{\partial W}{\partial q_2}, ..., q_n, \frac{\partial W}{\partial q_n}, \frac{\partial W}{\partial t}, t\right), \tag{13.11}
\]
i.e. suppose we can solve for q_1 in this sense. The left-hand side of Eq. (13.11) is a function only of q_1 and Q_1, while the right-hand side is a function of q_2, ..., q_n, t, and Q. Therefore, both sides of Eq. (13.11) must be equal to some function of the parameter Q_1:
\[
f_1\!\left(q_1, \frac{\partial S_1}{\partial q_1}\right) = \tilde f(Q_1) = g\!\left(q_2, \frac{\partial W}{\partial q_2}, ..., q_n, \frac{\partial W}{\partial q_n}, \frac{\partial W}{\partial t}, t\right).
\]
We may redefine the constant Q_1 as Q_1 → f̃(Q_1), so that both sides of Eq. (13.11) are now equal to Q_1. This is a convenient way to introduce the necessary constant Q_1 into the solution. Then we find two equations,
\[
f_1\!\left(q_1, \frac{\partial S_1}{\partial q_1}\right) = Q_1, \tag{13.12}
\]
\[
g\!\left(q_2, \frac{\partial W}{\partial q_2}, ..., q_n, \frac{\partial W}{\partial q_n}, \frac{\partial W}{\partial t}, t\right) = Q_1. \tag{13.13}
\]
The first of these equations is an ordinary differential equation (since it contains only derivatives with respect to q_1) and can be easily integrated to compute S_1(q_1, Q_1). Thus we have separated the variable q_1, introduced the necessary constant Q_1, and obtained a simplified remaining part of the HJ equation that does not contain q_1.
Equation (13.12) is easy to solve if we invert the function f_1, so that
\[
\frac{\partial S_1}{\partial q_1} = \tilde f_1(q_1, Q_1), \qquad
S_1(q_1, Q_1) = \int^{q_1} \tilde f_1(q_1, Q_1)\, dq_1,
\]
where f̃_1 is the function inverse to f_1 with respect to its second argument. Since ∂F/∂q_1 = ∂S_1/∂q_1, which follows from Eq. (13.9), the function f̃_1(q_1, Q_1) is actually equal to the canonical momentum p_1. Therefore, the trajectory of the system in the phase plane (q_1, p_1) is fully determined by the equation p_1 = f̃_1(q_1, Q_1), independently of all the other variables p_j, q_j, j ≠ 1. It is in this sense that the variable q_1 is called separable from the other variables.
The remaining part (13.13) of the HJ equation can be treated similarly: we could try separating the variable q_2 (which would in turn introduce the constant Q_2 into the solution). If the Hamiltonian is such that all the variables can be separated in the HJ equation, the resulting function F has the form
\[
F(q, Q, t) = S_1(q_1, Q_1) + S_2(q_2, Q_2) + ... + S_n(q_n, Q_n) - E(Q)\, t.
\]
In this case the trajectories of the canonical pair (q_1, p_1) are independent of the trajectories of every other canonical pair (q_i, p_i), and all these trajectories can be found explicitly. Then the Hamiltonian is called completely integrable.
13.2.3. Examples
We shall now consider some examples where a mechanical system is solved using the method of the HJ equation. In these examples, we consider relatively simple systems that can be solved using ordinary methods. The purpose of these examples is to illustrate the Hamilton-Jacobi theory.³

Let us start with a system consisting of two noninteracting particles, described by the Hamiltonian
\[
H = \frac{p_1^2}{2} + V_1(q_1) + \frac{p_2^2}{2} + V_2(q_2). \tag{13.14}
\]
It is perhaps obvious by looking at this Hamiltonian that the variables (q_1, p_1) are separable from (q_2, p_2), since the two particles do not interact at all and their motions are independent. Let us check that the HJ equation indeed allows one to separate these variables, in the sense defined in Sec. 13.2.2 above. Since the Hamiltonian (13.14) is time-independent, we write the ansatz as
\[
F(q_1, q_2, t) = S_1(q_1) + S_2(q_2) - Et. \tag{13.15}
\]
(So far there is only one constant, E, so we are waiting to introduce another one.) The simplified HJ equation (13.6) then becomes
\[
\tfrac{1}{2} S_1'^2 + V_1(q_1) + \tfrac{1}{2} S_2'^2 + V_2(q_2) = E, \tag{13.16}
\]
³Of course, the real usefulness of the HJ formalism is in its power to solve problems that could not otherwise be solved. In fact, large classes of completely integrable systems were discovered using this formalism. But such complicated systems are beyond the scope of the present text.
where we simply write S_1′ instead of ∂S_1/∂q_1 since S_1 depends only on q_1, and similarly for S_2. Now we notice that only the first two terms in Eq. (13.16) depend on q_1, so we rewrite that equation as
\[
\tfrac{1}{2} S_1'^2(q_1) + V_1(q_1) = E - \tfrac{1}{2} S_2'^2(q_2) - V_2(q_2),
\]
which shows explicitly that the variable q_1 separates. Both sides of the above equation are equal to a constant which we may call E_1,
\[
\tfrac{1}{2} S_1'^2(q_1) + V_1(q_1) = E_1, \qquad
E - \tfrac{1}{2} S_2'^2(q_2) - V_2(q_2) = E_1.
\]
It is easy to obtain S_{1,2}(q) in the form of integrals,
\[
S_1(q_1) = \int^{q_1} \sqrt{2E_1 - 2V_1(q_1)}\, dq_1, \qquad
S_2(q_2) = \int^{q_2} \sqrt{2E - 2E_1 - 2V_2(q_2)}\, dq_2.
\]
Let us denote E − E_1 ≡ E_2 for convenience (these constants will appear in the solution more symmetrically). The solution of the HJ equation is thus
\[
F(q_1, E_1, q_2, E_2, t) = \sqrt{2} \int^{q_1} \sqrt{E_1 - V_1(q_1)}\, dq_1
+ \sqrt{2} \int^{q_2} \sqrt{E_2 - V_2(q_2)}\, dq_2 - (E_1 + E_2)\, t. \tag{13.17}
\]
This is the generating function of the canonical transformation (q_1, p_1, q_2, p_2) → (E_1, D_1, E_2, D_2), where E_{1,2} are the new coordinates and D_{1,2} the new momenta. The new Hamiltonian is
\[
H'(E_1, D_1, E_2, D_2) = H + \frac{\partial F}{\partial t} = 0,
\]
therefore the new canonical variables are constant in time. We can now obtain explicit formulas relating the old canonical variables (q_1, p_1, q_2, p_2) to the new ones. We write the definition of the canonical transformation generated by F,
\[
p_j = \frac{\partial F}{\partial q_j}, \qquad
D_j = -\frac{\partial F}{\partial E_j}, \qquad j = 1, 2.
\]
Substituting Eq. (13.17) for F, we get
\[
D_j = -\frac{\partial F}{\partial E_j} = -\frac{1}{\sqrt{2}} \int^{q_j} \frac{dq_j}{\sqrt{E_j - V_j(q_j)}} + t, \qquad
p_j = \sqrt{2}\, \sqrt{E_j - V_j(q_j)}.
\]
These relations can be viewed as explicit formulas determining q_j(t) and p_j(t) through the arbitrary constants E_j and D_j. If we can evaluate the integrals in closed form as certain functions, e.g.
\[
\frac{1}{\sqrt{2}} \int_0^{z} \frac{dz}{\sqrt{E_j - V_j(z)}} \equiv h_j(z), \qquad j = 1, 2
\]
(the initial value z = 0 was set arbitrarily), then we may write the solution as
\[
h_1(q_1(t)) = t - D_1, \qquad h_2(q_2(t)) = t - D_2,
\]
and determine q_j(t). Thus the equations of motion are completely integrated. The interpretation of the new canonical variables is that E_j is the energy of the particle j, while D_j is the moment of time where q_j = 0.
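This quadrature solution can be checked numerically for one particle with a sample potential (an illustrative sketch; the choice V(q) = q²/2 and all numbers below are assumptions, not from the text): then h(z) = (1/√2) ∫₀^z dz/√(E − z²/2) = arcsin(z/√(2E)), so h(q(t)) = t − D gives q(t) = √(2E) sin(t − D), which we compare with a direct integration of the EOM q̈ = −V′(q) = −q.

```python
# Check (illustrative) of h(q(t)) = t - D for V(q) = q^2/2 against a
# direct leapfrog (velocity Verlet) integration of q'' = -q.
import math

E, D = 0.5, 0.2                      # arbitrary constants of integration

def q_exact(t):
    return math.sqrt(2*E)*math.sin(t - D)

q = q_exact(0.0)
p = math.sqrt(2*E)*math.cos(-D)      # p = dq/dt at t = 0
t, dt = 0.0, 1e-4
while t < 1.0:
    p -= 0.5*dt*q                    # half kick (force = -q)
    q += dt*p                        # drift
    p -= 0.5*dt*q                    # half kick
    t += dt
assert abs(q - q_exact(t)) < 1e-6
print("quadrature solution matches direct integration")
```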
As a second example, consider the following Hamiltonian (describing a particle in a magnetic field),
\[
H = \frac{p_1^2}{2} + \frac{(p_2 + q_1)^2}{2} + V(q_1).
\]
This time it may not be obvious whether (q_1, p_1) are separable from (q_2, p_2). Let us try to substitute the ansatz (13.15) into the HJ equation,
\[
\tfrac{1}{2} S_1'^2(q_1) + \tfrac{1}{2} \left( S_2'(q_2) + q_1 \right)^2 + V(q_1) = E. \tag{13.18}
\]
This equation is not yet in an explicitly separable form because q_1 and q_2 are mixed together; for instance, we cannot integrate S_1 since
\[
S_1'^2(q_1) = 2E - 2V(q_1) - \left( S_2'(q_2) + q_1 \right)^2 \tag{13.19}
\]
depends on both q_1 and q_2. We would like to gather q_1 on one side of the equation and q_2 on the other side. So we solve Eq. (13.18) for S_2′,
\[
S_2'(q_2) = -q_1 + \sqrt{2E - 2V(q_1) - S_1'^2(q_1)}.
\]
Now the HJ equation is reduced to an explicitly separable form: both sides are equal to a constant that we may denote P_2,
\[
S_2'(q_2) = P_2, \tag{13.20}
\]
\[
\sqrt{2E - 2V(q_1) - S_1'^2(q_1)} - q_1 = P_2. \tag{13.21}
\]
The variables are separated, and it remains to integrate the equations. Equation (13.20) yields simply
\[
S_2(q_2) = P_2\, q_2.
\]
(Note that we do not need to add another constant to P_2 q_2 when we integrate; as we showed in Sec. 13.2, additive constants are not desired in the solution of the HJ equation.) Finally, we can express S_1′ from Eq. (13.21) or directly from Eq. (13.19) and integrate:
\[
S_1(q_1) = \int^{q_1} \sqrt{2E - 2V(q_1) - (P_2 + q_1)^2}\, dq_1.
\]
This integral is elementary (i.e. can be computed in terms of elementary functions such as exp, sin, etc.) for simple potentials; to keep the example fully explicit, let us set V = 0 (a pure magnetic field) from here on.
We can now put together the solution of the HJ equation,
\[
F(q_1, q_2, E, P_2, t) = \int^{q_1} \sqrt{2E - (P_2 + q_1)^2}\, dq_1 + P_2\, q_2 - Et.
\]
This generating function yields the canonical transformation to new variables (D, E, Q_2, P_2) defined by
\[
D = -\frac{\partial F}{\partial E} = -\int^{q_1} \frac{dq_1}{\sqrt{2E - (P_2 + q_1)^2}} + t,
\]
\[
Q_2 = -\frac{\partial F}{\partial P_2} = -q_2 + \int^{q_1} \frac{(P_2 + q_1)\, dq_1}{\sqrt{2E - (P_2 + q_1)^2}}.
\]
These integrals are elementary, so we find
\[
D = t - \arcsin \frac{P_2 + q_1}{\sqrt{2E}}, \qquad
Q_2 = -q_2 - \sqrt{2E - (P_2 + q_1)^2}.
\]
Finally, we invert these equations to obtain explicit solutions as functions of the four constants,
\[
q_1(t) = -P_2 + \sqrt{2E} \sin(t - D), \qquad
q_2(t) = -Q_2 - \sqrt{2E} \cos(t - D).
\]
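These formulas can be verified against Hamilton's equations for H = p_1²/2 + (p_2 + q_1)²/2 (with V = 0), namely q̇_1 = p_1, ṗ_1 = −(p_2 + q_1), q̇_2 = p_2 + q_1, ṗ_2 = 0 (an illustrative numerical sketch; the values of the four constants below are arbitrary).

```python
# Check (illustrative): the HJ solution satisfies the EOM of the
# magnetic-field Hamiltonian, integrated here with classical RK4.
import math

E, D, Q2, P2 = 0.8, 0.3, -1.1, 0.4   # arbitrary values of the four constants

def exact(t):
    q1 = -P2 + math.sqrt(2*E)*math.sin(t - D)
    q2 = -Q2 - math.sqrt(2*E)*math.cos(t - D)
    return q1, q2

def rhs(y):
    q1, p1, q2, p2 = y
    return [p1, -(p2 + q1), p2 + q1, 0.0]   # Hamilton's equations

# start from the exact solution at t = 0
y = [exact(0.0)[0], math.sqrt(2*E)*math.cos(-D), exact(0.0)[1], P2]
t, dt = 0.0, 1e-3
while t < 2.0:
    k1 = rhs(y)
    k2 = rhs([y[i] + 0.5*dt*k1[i] for i in range(4)])
    k3 = rhs([y[i] + 0.5*dt*k2[i] for i in range(4)])
    k4 = rhs([y[i] + dt*k3[i] for i in range(4)])
    y = [y[i] + dt*(k1[i] + 2*k2[i] + 2*k3[i] + k4[i])/6.0 for i in range(4)]
    t += dt
q1_ex, q2_ex = exact(t)
assert abs(y[0] - q1_ex) < 1e-8 and abs(y[2] - q2_ex) < 1e-8
print("HJ solution satisfies the EOM")
```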
These examples show how the HJ method yields an explicit solution of a Hamiltonian system. The HJ equation reveals the structure of the relationships between the canonical variables, which is not always easy to see by looking directly at the Hamiltonian. This is why the HJ method is one of the most powerful methods used to study Hamiltonian systems.
13.3. Action-angle variables
Suppose that we have a Hamiltonian H(q, p) for which at least one pair of canonical variables, say (q_1, p_1), is separable in the HJ equation. Then, as we found in Sec. 13.2.2, the trajectory in the phase plane (q_1, p_1) is independent of the other canonical variables and is determined by an equation of the form
\[
f_1\!\left(q_1, \frac{\partial S_1(q_1, Q_1)}{\partial q_1}\right) \equiv f_1(q_1, p_1) = Q_1. \tag{13.22}
\]
In other words, the trajectories in the phase plane are level surfaces of the function f_1(q_1, p_1). Moreover, let us now suppose that the motion in the (q_1, p_1) plane is bounded (i.e. the values of q_1 and p_1 during motion along a particular trajectory are always smaller than some maximal values q_max and p_max). Then it is clear that at least some of the trajectories are closed curves. In particular, suppose f_1(q_1, p_1) has a local minimum or a local maximum at some point in the phase plane. Then trajectories around that point will look like deformed circles or ovals. These trajectories correspond to periodic motion.⁴ Physically, such motion is interpreted as an oscillation with a frequency that in general depends on the amplitude of the oscillation.
⁴Of course, there may be other trajectories that are not periodic and/or not bounded. The study of the geometry of trajectories in phase space is a fascinating area of modern theoretical mechanics that serves as a basis of chaos theory. But this material is beyond the scope of the present text.
It is convenient to describe these periodic trajectories in terms of two canonical variables: one, denoted φ, that represents the angle along the circle, and another, denoted J, that labels the different ovals. Let us remark that the oval is already labeled by the value Q_1 = f_1(q_1, p_1), which is a constant of motion that varies only from one oval to another. However, one could redefine f_1 and replace the constant Q_1 by any function of Q_1, while we would like to have in some sense a standard choice of this variable, so that we could compare two different Hamiltonians in an unambiguous way. A convenient requirement that fixes the choice of canonical variables (φ, J) is that the angle φ should vary from 0 to 2π as the system traverses any oval once.
Here is how one could motivate this requirement. Let us assume for simplicity that the canonical variables (q_1, p_1) are the only variables present in the system, and that H(q_1, p_1) is the full Hamiltonian of the system. Suppose (q_1, p_1) → (φ, Q_1), H → H' is a canonical transformation such that Q_1 is a constant of motion. It means that the Hamilton equations for the variables (φ, Q_1) are
\[
\dot Q_1 = -\frac{\partial H'(\varphi, Q_1)}{\partial \varphi} = 0, \qquad
\dot \varphi = \frac{\partial H'(\varphi, Q_1)}{\partial Q_1}.
\]
It follows from the first equation that H' is a function only of Q_1, and then it follows from the second equation that φ̇ is a function only of Q_1, i.e. a constant. Therefore
\[
\varphi(t) = t\, \frac{\partial H'}{\partial Q_1} \equiv \omega(Q_1)\, t, \qquad
\omega(Q_1) \equiv \frac{\partial H'(Q_1)}{\partial Q_1},
\]
in other words, φ(t) grows linearly with t. Since the motion is along a closed curve (oval), it is natural to interpret φ as the "angle" and ω as the "angular velocity" (in quotation marks, since nothing is really rotating and we are merely interpreting the trajectory in the phase plane (q_1, p_1) in a geometrical way). A redefinition Q_1 → f(Q_1) will change the value of this angular velocity as ω → ω/f′(Q_1). Thus we could set ω to any function of Q_1 if we wish. Then it is natural to require that ω(Q_1) be such that the angle φ always varies from 0 to 2π as the system traverses any oval, for any fixed Q_1. It is clear that this can be arranged by a suitable redefinition Q_1 → f(Q_1).
It turns out that this requirement uniquely singles out a choice of canonical variables. The variables chosen in this way are commonly denoted by J and φ and called the action-angle variables. We shall now determine the required canonical transformation (q_1, p_1) → (φ, J), assuming that the function f_1(q_1, p_1) from Eq. (13.22) and the corresponding solution S_1(q_1, Q_1) are known.
Let us suppose that the generating function for the canonical transformation is F(q_1, J), where J is interpreted as the new momentum and φ as the new coordinate. Then we have
\[
\varphi(q_1, J) = \frac{\partial F(q_1, J)}{\partial J}, \qquad
p_1(q_1, J) = \frac{\partial F(q_1, J)}{\partial q_1}. \tag{13.23}
\]
We now use the following trick: we write the total change Δφ of the variable φ as the system traverses one oval as an integral over the closed curve, using q_1 as the integration variable,
\[
\Delta\varphi = \oint d\varphi = \oint \frac{d\varphi}{dq_1}\, dq_1
= \oint \frac{\partial^2 F(q_1, J)}{\partial q_1\, \partial J}\, dq_1
= \frac{\partial}{\partial J} \oint \frac{\partial F(q_1, J)}{\partial q_1}\, dq_1
= \frac{\partial}{\partial J} \oint p_1(q_1, J)\, dq_1.
\]
We now require that Δφ = 2π. Consider the function
\[
A(J) \equiv \oint p_1(q_1, J)\, dq_1.
\]
Since Δφ = dA/dJ = 2π, we have A(J) = 2πJ, or
\[
J = \frac{1}{2\pi} \oint p_1(q_1)\, dq_1.
\]
(Note that J has dimensions of action.) The above equation yields the value of J = J(Q_1) for the oval corresponding to the given value of Q_1, determined in the phase plane by the algebraic equation (13.22). We may invert that equation (at fixed Q_1) to obtain p_1 = f̃_1(q_1, Q_1). Then J(Q_1) is expressed as
\[
J(Q_1) = \frac{1}{2\pi} \oint \tilde f_1(q_1, Q_1)\, dq_1, \tag{13.24}
\]
where the integral is performed over the oval given by the equation p_1 = f̃_1(q_1, Q_1) in the phase plane, at a fixed value of Q_1. (Note that the integral is equal to 1/(2π) times the area inside the oval in the (q, p) plane.) Since J is now known as a function of Q_1, we can also express Q_1 as a function of J, i.e. Q_1 = Q_1(J).
Having defined the action variable J as a function of Q_1 (and thus, through Eq. (13.22), as a function of q_1 and p_1), it remains to define the angle φ. We note that we already have the function S_1(q_1, Q_1), which satisfies
\[
p_1 = \tilde f_1(q_1, Q_1) = \frac{\partial S_1(q_1, Q_1)}{\partial q_1}. \tag{13.25}
\]
However, we need the generating function F(q_1, J) of the canonical transformation (q_1, p_1) → (φ, J). We shall now argue that F equals S_1 after substituting Q_1 as a function of J, i.e. F(q_1, J) = S_1(q_1, Q_1(J)). To see this, let us compare Eqs. (13.23) and (13.25). We find that
\[
\frac{\partial S_1(q_1, Q_1)}{\partial q_1} = p_1(q_1, Q_1) \equiv p_1(q_1, J(Q_1)) = \frac{\partial F(q_1, J(Q_1))}{\partial q_1},
\]
where the middle identity reflects the fact that the canonical momentum p_1(q_1) is a fixed function of q_1 for a given Q_1, and thus p_1(q_1, J) must be the same function of q_1 at the fixed value of J = J(Q_1). We can now integrate the above equation and obtain F(q_1, J(Q_1)) = S_1(q_1, Q_1).
Having obtained F(q_1, J), we define the variable φ through
\[
\varphi = \frac{\partial F(q_1, J)}{\partial J}
= \frac{\partial S_1(q_1, Q_1)}{\partial Q_1} \left( \frac{\partial J}{\partial Q_1} \right)^{-1}
\]
as a function of q_1 and Q_1, and thus as a function of q_1 and p_1. This completes an explicit construction of the action-angle variables for the canonical pair (q_1, p_1).
The construction can be performed for every separable canonical pair (q_j, p_j) with bounded motion. For a completely integrable system, the result is a canonical transformation to n action and n angle variables (J_1, ..., J_n, φ_1, ..., φ_n), with the new Hamiltonian depending only on J_1, ..., J_n. The equations of motion then become
\[
\dot J_k = 0, \qquad \dot\varphi_k = \frac{\partial H(J)}{\partial J_k} \equiv \omega_k.
\]
Note that the total motion of the system with several separable variables may not be periodic, even though each pair executes independent periodic motion. The only case when the total motion is periodic is when all the frequencies ω_k are commensurate (i.e. every ratio ω_j/ω_k is equal to a ratio of integers).
13.3.1. Examples
As a first example, let us consider a simple harmonic oscillator, with the Hamiltonian
\[
H = \frac{p^2}{2} + \frac{\omega^2 q^2}{2}.
\]
Since the Hamiltonian is a constant of motion, H(q(t), p(t)) = E = const, the trajectories in the phase plane are curves determined by the equation
\[
\frac{p^2}{a^2} + \frac{q^2}{b^2} \equiv \frac{p^2}{2E} + \frac{\omega^2 q^2}{2E} = 1,
\]
that is, ellipses centered at (0, 0) with semiaxes a = √(2E) and b = (1/ω)√(2E). We see that all the trajectories are simple closed curves. Let us now carry out the construction of the action-angle variables. The result will be a canonical transformation, (q, p) → (φ, J), where the new variables will be determined as explicit functions of the old ones.
The simplified HJ equation (13.6) for the function S(q) is
\[
\tfrac{1}{2} S'^2(q) + \tfrac{1}{2} \omega^2 q^2 = E.
\]
This is already in the form (13.22) with Q_1 ≡ E and f_1(q, p) ≡ H(q, p). The solution S(q) is found as
\[
S(q) = \int^{q} \sqrt{2E - \omega^2 q^2}\, dq.
\]
(We shall not need to integrate this explicitly.) The next step is to determine the function J(E),
\[
J(E) = \frac{1}{2\pi} \oint p(q, E)\, dq = \frac{1}{2\pi} \oint \sqrt{2E - \omega^2 q^2}\, dq.
\]
Instead of computing this integral directly, let us use a geometric consideration. Since J(E) is equal to 1/(2π) times the area of the ellipse f_1(q, p) = E, whose area is equal to πab, we have
\[
J(E) = \frac{1}{2\pi}\,(\pi a b) = \frac{1}{2}\, \sqrt{2E}\, \frac{\sqrt{2E}}{\omega} = \frac{E}{\omega}.
\]
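The geometric result J(E) = E/ω can also be confirmed by computing the phase-space integral directly (an illustrative numerical sketch; the sample values of ω and E are arbitrary):

```python
# Direct quadrature check (illustrative) of J(E) = E/w for the oscillator:
# (1/2pi) * closed-integral of p dq over p = ±sqrt(2E - w^2 q^2),
# using the midpoint rule on the upper branch and doubling.
import math

w, E = 1.7, 2.3
a = math.sqrt(2.0*E)/w                 # turning points q = ±a
N = 200_000
dq = 2.0*a/N
total = 0.0
for k in range(N):
    q = -a + (k + 0.5)*dq
    total += math.sqrt(max(2.0*E - (w*q)**2, 0.0))*dq
J = 2.0*total/(2.0*math.pi)            # both branches of the oval
assert abs(J - E/w) < 1e-4
print(J, E/w)
```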
Therefore, E(J) = ωJ, which also means that the Hamiltonian in the new variables is simply H(J) = ωJ. The next step is to define the generating function F(q, J) of the canonical transformation (q, p) → (φ, J),
\[
F(q, J) = S(q, E(J)) = S(q, \omega J) = \int^{q} \sqrt{2\omega J - \omega^2 q^2}\, dq.
\]
The variable φ is then found as
\[
\varphi(q, J) = \frac{\partial F(q, J)}{\partial J}
= \omega \int^{q} \frac{dq}{\sqrt{2\omega J - \omega^2 q^2}}
= \arcsin\left( q \sqrt{\frac{\omega}{2J}} \right).
\]
Thus the old variables and the new ones are related by
\[
q = \sqrt{\frac{2J}{\omega}} \sin\varphi, \qquad
p = \sqrt{2\omega J - \omega^2 q^2} = \sqrt{2\omega J} \cos\varphi; \tag{13.26}
\]
\[
J = \frac{H(q, p)}{\omega} = \frac{1}{2\omega} \left( p^2 + \omega^2 q^2 \right), \qquad
\varphi = \arcsin\left( q \sqrt{\frac{\omega}{2J}} \right).
\]
One can check that the Poisson bracket is correct, $\{\phi, J\} = 1$. Since $H(\phi, J) = H(J) = \omega J$, the new equations of motion are
\[ \dot J = 0, \qquad \dot\phi = \omega, \]
describing uniform motion around a circle of fixed $J$.
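These relations can be checked numerically. The following sketch (with an arbitrarily chosen frequency and phase-space point, which are assumptions for illustration; an arctan2 form of the angle is used to fix the quadrant that the arcsin formula leaves ambiguous) verifies that the transformation inverts correctly and that $H = \omega J$:

```python
import numpy as np

# Numerical check of the action-angle transformation (13.26) for the
# harmonic oscillator H = p^2/2 + w^2 q^2/2.
w = 1.7           # oscillator frequency (hypothetical value)
q, p = 0.4, -0.9  # an arbitrary phase-space point

J = (p**2 + w**2 * q**2) / (2 * w)   # action variable J = H/w
phi = np.arctan2(w * q, p)           # angle variable (quadrant-safe form)

# Inverting the transformation must recover the original point:
q_back = np.sqrt(2 * J / w) * np.sin(phi)
p_back = np.sqrt(2 * w * J) * np.cos(phi)
print(np.allclose([q_back, p_back], [q, p]))  # True

# The Hamiltonian in the new variables is H = w J:
H = p**2 / 2 + w**2 * q**2 / 2
print(np.isclose(H, w * J))  # True
```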
Let us now consider another, somewhat unusual Hamiltonian,
\[ H = \sinh\left( \frac{1}{2}p^2 + \frac{1}{2}\omega^2 q^2 \right). \]
The simplified Hamilton-Jacobi equation for the function $S(q)$ is
\[ E = \sinh\left( \frac{1}{2}S'^2 + \frac{1}{2}\omega^2 q^2 \right). \]
We can rewrite this as
\[ \operatorname{arcsinh} E = \frac{1}{2}S'^2 + \frac{1}{2}\omega^2 q^2. \]
It is now clear that the entire construction of the action-angle variables for a harmonic oscillator can be repeated if we replace $E$ by $\operatorname{arcsinh} E$. In particular, Eq. (13.26) still holds, so it is straightforward to obtain the general solution $q(t)$, $p(t)$. The Hamiltonian in the new variables is
\[ H(J) = \sinh(\omega J), \]
and thus the equations of motion are
\[ \dot J = 0, \qquad \dot\phi = \frac{\partial H(J)}{\partial J} = \omega\cosh(\omega J). \]
Since the value of $J$ describes the amplitude of oscillations, we find that the frequency $\dot\phi$ of oscillations depends on the amplitude.
13.4. Adiabatic invariants
We have seen how to describe periodic motion in terms
of action-angle variables. The other significant use of
these variables is in situations when parameters of the
system change slowly with time. In that case, the mo-
tion of the system is only approximately periodic, since
both the frequency and the amplitude of oscillations will
slowly change with time. The action-angle variables are
particularly suitable for describing such systems with
good precision.
Consider for simplicity a one-dimensional system with a Hamiltonian of the form $H_0(q, p; \lambda)$, depending on a parameter $\lambda$ (such as mass or frequency). Assume that the motion is periodic, so that the trajectories in the $(q, p)$ plane are closed curves and the action-angle variables $(\phi, J)$ can be introduced. Now suppose that $\lambda$ is a slowly changing function of time. Quantitatively, $\lambda(t)$ is a slowly changing function when its change is small over a characteristic time $\Delta t$ of the order of one oscillation period,
\[ |\dot\lambda| \ll \omega\lambda. \]  (13.27)
This is called the adiabaticity condition, and the change of $\lambda$ is called adiabatic if the condition (13.27) holds.
The Hamiltonian is now explicitly time-dependent, $H(q, p; t) = H_0(q, p; \lambda(t))$, and is no longer a conserved quantity (although the change of $H$ is slow). Let us now perform the canonical transformation to the action-angle variables $(\phi, J)$ using the same generating function $F(q, J)$ as in the case of constant $\lambda$. Of course, since $\lambda$ is now a function of time, the generating function $F(q, J)$ will also be a function of time through its dependence on $\lambda$. The new Hamiltonian is
\[ H'(\phi, J) = H_0(J; \lambda) + \frac{\partial F}{\partial t} = H_0(J; \lambda) + \dot\lambda\,\frac{\partial F}{\partial\lambda} \equiv H_0 + \dot\lambda f, \]
where we introduced an auxiliary function
\[ f(\phi, J; \lambda) \equiv \left. \frac{\partial F(q, J; \lambda)}{\partial\lambda} \right|_{q = q(\phi, J)}, \]
which must be expressed as a function of $\phi$ and $J$ after evaluating the derivative $\partial F/\partial\lambda$. With the Hamiltonian $H'$, the equations of motion for $\phi$, $J$ are
\[ \dot J = -\frac{\partial H'}{\partial\phi} = -\dot\lambda\,\frac{\partial f(\phi, J)}{\partial\phi}, \qquad \dot\phi = \frac{\partial H'}{\partial J} = \frac{\partial H_0(J; \lambda)}{\partial J} + \dot\lambda\,\frac{\partial f(\phi, J)}{\partial J}. \]
It is clear that $J$ is not constant any more, although its time derivative is small due to the smallness of $\dot\lambda$. Also, the frequency $\dot\phi$ is not constant but slowly changing due to the dependence of $H_0$ on $\lambda$.
Note that the equation for $\dot J$ contains a small factor $\dot\lambda(t)$ multiplied by a periodically oscillating term $\partial f/\partial\phi$ (recall that $\phi$ is a circular variable with period $2\pi$). Qualitatively, one may expect that the oscillations of $\partial f/\partial\phi$ during one period approximately cancel, since $\dot\lambda$ changes very little during one period. For this reason, $J(t)$ is almost constant even when traced over a long period of time, during which $\lambda(t)$ changes considerably. Quantities that remain approximately constant under an adiabatic change of parameters are called adiabatic invariants.
One may ask why the Hamiltonian itself is not an adiabatic invariant; after all, $H(t)$ also changes slowly with time. The answer is that $J(t)$ changes much more slowly than $H(t)$, although this is perhaps not immediately obvious. To make this point more evident, in the next section we shall estimate the change of $J(t)$ for a harmonic oscillator with an adiabatically time-dependent frequency. The role of the parameter $\lambda(t)$ will be played by the frequency $\omega(t)$. A change of the frequency $\Delta\omega$ over a time $T$ is adiabatic if $\Delta\omega/\omega \ll \omega T$, which will hold for sufficiently large $T$. So the frequency may change appreciably, say $\Delta\omega = 1000\,\omega$, provided that the change is spread over a very long time $T$. The conclusion will be that the relative change of $J$ is exponentially small, $\Delta J/J \sim \exp(-\omega T)$, even if the total change in the frequency is not small. Since $J = H/\omega$ for the harmonic oscillator, it follows that both $\omega$ and $H$ may change significantly during the time $T$, while the adiabatic invariant $J$ will remain essentially constant if $\omega T$ is large.
13.4.1. Change of adiabatic invariant
Consider a harmonic oscillator with a time-dependent frequency $\omega(t) \neq 0$,
\[ H = \frac{1}{2}\left( p^2 + \omega^2(t)\, q^2 \right). \]
The action-angle variables are introduced as before, according to Eq. (13.26). The generating function of the canonical transformation is
\[ F(q, J) = \int^q \sqrt{2\omega J - \omega^2 q^2}\, dq \]
and is now time-dependent through its dependence on $\omega(t)$. The new Hamiltonian is
\[ H'(\phi, J, t) = \omega J + \dot\omega \left. \frac{\partial F(q, J)}{\partial\omega} \right|_{q = q(\phi, J)}, \]
where we need to substitute $q$ through $\phi$, $J$ using Eq. (13.26) only after evaluating the derivative $\partial F/\partial\omega$. We first compute
\[ \frac{\partial F(q, J)}{\partial\omega} = \int^q \frac{J - \omega q^2}{\sqrt{2\omega J - \omega^2 q^2}}\, dq. \]
Then we substitute $q = \sqrt{2J/\omega}\,\sin\phi$, change the variable as $dq = \sqrt{2J/\omega}\,\cos\phi\, d\phi$, and find
\[ \left. \frac{\partial F(q, J)}{\partial\omega} \right|_{q = q(\phi, J)} = \frac{J}{2\omega}\sin 2\phi. \]
Therefore, the new Hamiltonian is
\[ H'(\phi, J, t) = \omega(t)\, J + \frac{\dot\omega}{\omega}\,\frac{J}{2}\sin 2\phi. \]
The equations of motion in the action-angle variables are
\[ \dot J = -\frac{\partial H'}{\partial\phi} = -\frac{\dot\omega}{\omega}\, J\cos 2\phi, \qquad \dot\phi = \frac{\partial H'}{\partial J} = \omega + \frac{\dot\omega}{2\omega}\sin 2\phi. \]
Let us now analyze these equations. The adiabaticity condition (13.27) with $\lambda \equiv \omega$ becomes $|\dot\omega/\omega| \ll \omega$; therefore the equation for $\dot\phi$ can be approximately replaced by $\dot\phi \approx \omega(t)$ and integrated as
\[ \phi(t) \approx \int_{t_0}^{t} \omega(t)\, dt, \]  (13.28)
where $t_0$ is an arbitrary reference point. We now turn to the equation for $J(t)$. Since $\phi$ grows monotonically with time, it is possible to use $\phi$ instead of $t$ as the time variable ($d\phi = \omega\, dt$) and to consider $J$ as a function of $\phi$. We can express $\omega(t)$ as a function of $\phi$ as $\omega(t) \equiv \Omega(\phi(t))$, where $\Omega(x)$ is an auxiliary function. The equation for $J(\phi)$ is
\[ \frac{d}{d\phi}\ln J(\phi) = -\frac{\Omega'(\phi)}{\Omega(\phi)}\cos 2\phi, \]
which can be immediately integrated, yielding
\[ \ln\frac{J(\phi)}{J(\phi_0)} = -\int_{\phi_0}^{\phi} \cos 2\phi\;\frac{\Omega'}{\Omega}\, d\phi. \]  (13.29)
At this point, we cannot simplify the integral in Eq. (13.29) any more. To proceed further, we consider the case when the frequency $\omega(t)$ smoothly changes from a constant value $\omega_1$ at $t \to -\infty$ to another constant value $\omega_2$ at $t \to +\infty$, while the characteristic timescale of the change is $T$. One such function $\omega(t)$ is
\[ \omega(t) = \frac{\omega_1 + \omega_2}{2} + \frac{\omega_2 - \omega_1}{2}\tanh\frac{t}{T}. \]  (13.30)
This function $\omega(t)$ will be adiabatic if $|\omega_2 - \omega_1| \ll \omega_1^2 T$. So we do not need to assume that $\omega_1 \approx \omega_2$; we might even have $\omega_2 = 1000\,\omega_1$ if $T$ is sufficiently large. Note that $\omega(t)$ has the form $f(t/T)$, where $f(x)$ is a function that changes between $\omega_1$ and $\omega_2$ on scales $\Delta x \sim 1$. When we choose $\omega(t)$ in this way, we can adjust the slowness of change of $\omega(t)$ by adjusting the value of $T$ while keeping the same overall shape of the function $\omega(t)$.
Suppose that the system starts at $t = -\infty$ with a value $J = J_1$ of the adiabatic invariant. We can now use Eq. (13.29) to estimate the value $J_2$ at the time $t \to +\infty$ when the frequency has changed from $\omega_1$ to $\omega_2$. We rewrite Eq. (13.29) as
\[ \ln\frac{J_2}{J_1} = -\mathrm{Re}\int_{-\infty}^{+\infty} e^{2i\phi}\,\frac{\Omega'}{\Omega}\, d\phi, \]
so that we can use the methods of complex variable theory to evaluate the integral. The contour of integration can be closed in the upper half-plane of complex $\phi$. Then the integral will be equal to the sum of residues of the function $e^{2i\phi}\,\Omega'(\phi)/\Omega(\phi)$ in the upper half-plane. Note that each root or pole $\phi_*$ of $\Omega(\phi)$ leads to a simple pole of $e^{2i\phi}\,\Omega'/\Omega$ with residue of order $e^{2i\phi_*}$. A pole at $\phi_* = \phi_1 + i\phi_2$ (where we always have $\phi_2 > 0$) will yield a factor $e^{-2\phi_2}$; thus only the poles with the smallest value of $\phi_2$ will contribute significantly. Suppose that $\phi_*$ is the pole of $\Omega'/\Omega$ with the smallest value of $\phi_2 \equiv \mathrm{Im}\,\phi_*$; then we can estimate the integral as
\[ \ln\frac{J_2}{J_1} \approx -\mathrm{Re}\left[ 2\pi i\, e^{2i\phi_1} \right] e^{-2\phi_2} \approx C\, e^{-2\phi_2}, \]
where $C$ is a constant typically of order 1 (in any case, not large).
Finally, we need to estimate the value of $\phi_2 \equiv \mathrm{Im}\,\phi_*$, where $\phi_*$ corresponds to the complex pole or root of $\Omega(\phi)$ that is closest to the real axis. Since $\Omega(\phi)$ is simply $\omega(t)$ expressed through $\phi$, while $\phi$ is defined by Eq. (13.28), the complex roots of $\omega(t)$ are also roots of $\Omega(\phi)$, while the complex poles of $\omega(t)$ are mapped to poles of $\Omega(\phi)$ at $\phi = \infty$ and are irrelevant. Therefore,
\[ \phi_* = \int_{t_0}^{z_*} \omega(z)\, dz, \]
where $z_*$ is the complex root of $\omega(z) = 0$ such that the imaginary part of $\phi_*$ is the smallest among all such complex roots. If $\omega(t)$ is of the form $f(t/T)$, then we have
\[ \phi_* = T\int_{t_0/T}^{z_*'} f(z')\, dz', \qquad z' \equiv \frac{z}{T}, \]
where now $z_*'$ is the corresponding complex root of $f(z) = 0$. Since $f(z)$ is a function that changes on scales $\Delta z \sim 1$, we expect that $f(z)$ has complex roots of order 1 but not significantly smaller. (In a moment, we shall compute the complex roots of the function in Eq. (13.30) to see an explicit illustration of this statement.) Therefore, the change of the adiabatic invariant is estimated as
\[ \ln\frac{J_2}{J_1} = C\, e^{-2bT}, \qquad b \equiv \mathrm{Im}\int_{t_0/T}^{z_*'} f(z')\, dz'. \]
This is the main result of this section. It shows that the change in the adiabatic invariant is exponentially small in the timescale $T$.
Let us now find the specific value of the constant $b$ for the example (13.30). The roots of $\omega(t)$ satisfy $\tanh(t/T) = -(\omega_1 + \omega_2)/(\omega_2 - \omega_1)$, which has the complex solutions
\[ \frac{t}{T} = -\frac{1}{2}\ln\frac{\omega_2}{\omega_1} + i\pi\left( n + \frac{1}{2} \right), \qquad n = 0, \pm 1, \pm 2, \ldots \]
The root closest to the real axis in the upper half-plane is
\[ \frac{t_*}{T} = -\frac{1}{2}\ln\frac{\omega_2}{\omega_1} + \frac{i\pi}{2}. \]
Therefore, using $\int\omega\, dt = \frac{\omega_1 + \omega_2}{2}\,t + \frac{\omega_2 - \omega_1}{2}\,T\ln\cosh\frac{t}{T}$, we find
\[ b = \mathrm{Im}\int_{t_0}^{t_*} \omega(t)\,\frac{dt}{T} = \frac{\pi\omega_1}{2}, \qquad \ln\frac{J_2}{J_1} = C\exp\left( -\pi\omega_1 T \right). \]
(The quantity $\ln(J_2/J_1)$ can be estimated more precisely, but we have only performed an order-of-magnitude estimate.)
The final conclusion is that the change of the adiabatic invariant is of order $\exp(-\omega T)$, where $\omega$ is a typical frequency and $T$ is the typical timescale of the change of the frequency.
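The exponential smallness of the change in $J$ can be seen numerically. The sketch below (parameter values are assumptions chosen for illustration) integrates the oscillator through the tanh frequency change (13.30) with a fixed-step RK4 method and compares $J = H/\omega$ before and after; the frequency doubles, so $H$ roughly doubles, but $J$ stays nearly constant:

```python
import math

# Adiabatic invariance of J = H/w for H = (p^2 + w(t)^2 q^2)/2
# with the tanh frequency profile (13.30).
w1, w2, T = 1.0, 2.0, 30.0

def w(t):
    return 0.5 * (w1 + w2) + 0.5 * (w2 - w1) * math.tanh(t / T)

def J(t, q, p):
    return (p * p + w(t)**2 * q * q) / (2 * w(t))

def accel(t, q):
    return -w(t)**2 * q

# Fixed-step RK4 integration of q'' = -w(t)^2 q from t = -6T to t = +6T.
t, q, p, dt = -6 * T, 1.0, 0.0, 0.005
J1 = J(t, q, p)
for _ in range(round(12 * T / dt)):
    k1q, k1p = p, accel(t, q)
    k2q, k2p = p + dt/2*k1p, accel(t + dt/2, q + dt/2*k1q)
    k3q, k3p = p + dt/2*k2p, accel(t + dt/2, q + dt/2*k2q)
    k4q, k4p = p + dt*k3p, accel(t + dt, q + dt*k3q)
    q += dt/6 * (k1q + 2*k2q + 2*k3q + k4q)
    p += dt/6 * (k1p + 2*k2p + 2*k3p + k4p)
    t += dt
J2 = J(t, q, p)

# The relative change of J is far below the relative change of w and H:
print(abs(J2 / J1 - 1) < 1e-3)  # True
```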
14. Perturbation theory and anharmonic oscillations
In this chapter I explain the basics of perturbation theory and then show how to apply this theory to describe anharmonic oscillations.
14.1. Introduction
Consider a one-dimensional system described by a coordinate $x(t)$. This corresponds to a point mass moving in a potential $V(x)$. When the system is near a stable equilibrium point $x = x_0$, we have $V'(x_0) = 0$, and thus the equation of motion
\[ \ddot x = -\frac{1}{m}V'(x) \]
can be linearized as follows,
\[ \ddot x = -(x - x_0)\,\frac{V''(x_0)}{m} - \frac{1}{2}(x - x_0)^2\,\frac{V'''(x_0)}{m} - \ldots \]  (14.1)
If we only consider the linear term (harmonic approximation), we obtain the equation of a harmonic oscillator with the frequency
\[ \omega \equiv \sqrt{\frac{1}{m}V''(x_0)}. \]
Harmonic oscillations have a fixed period, $T = 2\pi\omega^{-1}$, which is independent of the amplitude. However, this is only an approximation which is valid for small enough $|x - x_0|$. It is also important to study the effect of further terms in Eq. (14.1). For instance, we would like to know how the period of oscillations depends on the amplitude when the amplitude becomes large. This will give us an idea about the limits of applicability of the harmonic approximation.
Let us introduce the coordinate $q(t) \equiv x(t) - x_0$. We are interested in describing an anharmonic oscillator,
\[ \ddot q + \omega_0^2 q = f(q), \]  (14.2)
where $f(q)$ is a nonlinear function which represents a small perturbation. The function $f(q)$ can be expanded in a power series in $q$ as follows,
\[ f(q) = f_2 q^2 + f_3 q^3 + \ldots, \]
where the parameters $f_2$, $f_3$, ..., are small in an appropriate sense. (Below we shall determine exactly how small they must be.)
In the general case, it is impossible to obtain an exact solution of Eq. (14.2). However, one can use the method of perturbation theory and obtain an approximate solution as a function of the small parameters $f_2$, $f_3$, ...
I shall now review the basic principles of perturbation theory. Please follow all the calculations with pen and paper.
14.2. What is perturbation theory
Perturbation theory is a method to obtain an approximate solution of an equation when that equation is only slightly different from some other equation whose exact solution is known.
14.2.1. A first example
Consider an algebraic (not differential) equation,
\[ x^{5.23} - 0.000001\,x - 1 = 0. \]  (14.3)
It is clear that this equation cannot be solved exactly. But we notice that Eq. (14.3) is almost the same as the equation
\[ x^{5.23} - 1 = 0, \]
which we can easily solve: $x = 1$. In such situations, one can apply perturbation theory with great success.
The method of perturbation theory is applied to Eq. (14.3) as follows. For brevity, let us set $a \equiv 5.23$. The first step is to introduce a small parameter into the equation. One says that Eq. (14.3) is a perturbation of the equation $x^a - 1 = 0$, and one describes this perturbation by introducing a small parameter $\varepsilon$, so that Eq. (14.3) becomes
\[ x^a - \varepsilon x - 1 = 0. \]  (14.4)
In the particular example, we have $\varepsilon = 10^{-6}$, but it is actually easier to keep the value of $\varepsilon$ arbitrary, as long as we remember that it is very small. The value $\varepsilon = 0$ describes the unperturbed equation which we can solve exactly; the unperturbed solution is $x_0 = 1$. Now we are ready for the second step. It is reasonable to suppose that the solution $x_*$ of Eq. (14.4) is a small perturbation of the unperturbed solution, i.e. that $x_*$ is only very slightly different from 1. We may imagine that we could solve Eq. (14.4) for every (sufficiently small) $\varepsilon$, so that the solution $x_*$ is a function of $\varepsilon$. Since we know that $x_*(\varepsilon = 0) = x_0 = 1$, it seems reasonable to assume that we may describe the solution $x_*(\varepsilon)$ as a series in $\varepsilon$,
\[ x_*(\varepsilon) = 1 + A_1\varepsilon + A_2\varepsilon^2 + \ldots, \]  (14.5)
where $A_1$, $A_2$, ... are some unknown constants. I would like to emphasize that the formula (14.5) is just a guess; at this point, we have no proof that the solution $x_*$ is indeed of this form. We need to check that this guess is correct. Let us substitute Eq. (14.5) into Eq. (14.4) and expand in powers of $\varepsilon$: the constants $A_1$, $A_2$, ... must be such that
\[ \left( 1 + A_1\varepsilon + A_2\varepsilon^2 + \ldots \right)^a - \varepsilon\left( 1 + A_1\varepsilon + A_2\varepsilon^2 + \ldots \right) - 1 = 0. \]  (14.6)
For the moment, let us keep only the terms linear in $\varepsilon$:
\[ 1 + aA_1\varepsilon - \varepsilon - 1 + O(\varepsilon^2) = 0. \]
Since $\varepsilon$ is arbitrary, the linear terms in $\varepsilon$ must cancel separately from the quadratic terms, so we must have
\[ A_1 = \frac{1}{a}. \]
Now let us look at the quadratic terms in Eq. (14.6) and disregard the cubic terms. We find
\[ aA_2\varepsilon^2 + \frac{a(a-1)}{2}A_1^2\varepsilon^2 - A_1\varepsilon^2 = 0, \]  (14.7)
therefore
\[ A_2 = -\frac{1}{2a} + \frac{3}{2a^2}. \]
Let us stop at this point and summarize what we have done. Using perturbation theory, we have found an approximate solution of Eq. (14.4),
\[ x_* \approx 1 + A_1\varepsilon + A_2\varepsilon^2 = 1 + \frac{1}{a}\varepsilon - \frac{a - 3}{2a^2}\varepsilon^2. \]  (14.8)
For the values $\varepsilon = 10^{-6}$ and $a = 5.23$, the term $A_2\varepsilon^2$ is only a tiny correction of order $10^{-13}$, so the precision obtained by using only the linear term, $x_* \approx 1 + \varepsilon/a$, is already sufficient for almost all practical purposes.
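The agreement can be checked directly. The following sketch compares the perturbative root (14.8) with a root found by Newton's method; a larger $\varepsilon$ than in the text is used (an assumption for illustration) so that the $\varepsilon^2$ term is visible above floating-point rounding:

```python
# Compare the perturbative root (14.8) of x^a - eps*x - 1 = 0
# with a numerical root obtained by Newton's method.
a, eps = 5.23, 0.01

x_pert = 1 + eps / a - (a - 3) / (2 * a**2) * eps**2

x = 1.0  # Newton iteration starting from the unperturbed solution x = 1
for _ in range(50):
    x -= (x**a - eps * x - 1) / (a * x**(a - 1) - eps)

# The difference is of the neglected order eps^3:
print(abs(x - x_pert) < 1e-5)  # True
```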
I would like to contrast the result obtained using perturbation theory, Eq. (14.8), with a numerical solution (i.e. a solution found by a numerical method, which essentially consists of substituting different numbers $x$ into Eq. (14.3) until a certain precision is achieved). By using a computer program, one could easily obtain an approximate solution of Eq. (14.3) with the particular numerical coefficients $\varepsilon = 10^{-6}$, $a = 5.23$. However, one would obtain a single number $x_*$; it would remain unclear how this solution depends on $a$ and $\varepsilon$. The numerical procedure would have to be repeated if a different value of $a$ or $\varepsilon$ were given. The perturbative result (14.8) is much more useful because it shows the behavior of $x_*$ as a function of $\varepsilon$ and $a$.
14.2.2. Limits of applicability of perturbation theory
The solution (14.8) is not exact but approximate; what is the precision of this approximation? Here is how one can get a qualitative answer without knowing the exact solution. The second term in Eq. (14.8) is linear in $\varepsilon$ and represents a small correction to the first term if $|\varepsilon| \ll a$. The third term represents a small correction to the second term if
\[ \frac{|a - 3|}{2a^2}\,\varepsilon^2 \ll \frac{|\varepsilon|}{a}, \]
which is also satisfied if $|\varepsilon| \ll a$. Therefore, the condition $|\varepsilon| \ll a$ is the condition of applicability of perturbation theory in our example. For very large $\varepsilon$ such that $|\varepsilon| \gtrsim a$, we should expect that Eq. (14.8) becomes a bad approximation to the exact solution. One says that perturbation theory breaks down at $|\varepsilon| \sim a$.
14.2.3. Higher orders
It is clear that by using the method shown above, we can calculate all the coefficients in the expansion (14.5), one by one. Let us describe the procedure in more general terms. We assume the expansion
\[ x_*(\varepsilon) = 1 + \sum_{n=1}^{N} A_n\varepsilon^n + O(\varepsilon^{N+1}), \]
which contains $N$ unknown constants $A_1$, ..., $A_N$. This is called a perturbative expansion since it describes a small perturbation of the unperturbed solution $x = 1$. We then substitute this expansion into Eq. (14.4), expand everything in $\varepsilon$, disregarding terms of order $\varepsilon^{N+1}$ or higher, and obtain an equation of the form
\[ 1 - 1 + (aA_1 - 1)\,\varepsilon + \left( aA_2 + \frac{a(a-1)}{2}A_1^2 - A_1 \right)\varepsilon^2 + \ldots + (\ldots)\,\varepsilon^N + O(\varepsilon^{N+1}) = 0. \]  (14.9)
Since $\varepsilon$ is arbitrary, this equation can be satisfied only if each of the $N$ coefficients at $\varepsilon$, $\varepsilon^2$, ..., $\varepsilon^N$ is equal to zero. To establish this more rigorously, we may divide Eq. (14.9) by $\varepsilon$ and take the limit as $\varepsilon \to 0$; we find $aA_1 - 1 = 0$. Thus the first nonvanishing term in Eq. (14.9) is of order $\varepsilon^2$. Then we can divide by $\varepsilon^2$ and take the limit $\varepsilon \to 0$, which yields Eq. (14.7), etc.
In this way, a single condition (14.9) yields a system of $N$ equations for the $N$ unknown constants. This system of equations is easy to solve because the first equation contains only $A_1$, the second equation contains only $A_1$ and $A_2$, and so on. Of course, the calculations will become more cumbersome at higher orders, since we will have to retain many more terms, but there is no difficulty in principle. These calculations are so straightforward that one can program a computer to perform them symbolically. (A calculation is called symbolic if the computer is programmed to print the resulting formula containing variables such as $\varepsilon$ and $a$, as opposed to a numerical calculation where the computer prints only a numerical value of $x_*$.)
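As a sketch of such a symbolic calculation (here using the sympy library, which is an assumption about tooling, not part of the course), one can determine $A_1$ and $A_2$ mechanically:

```python
import sympy as sp

# Symbolic determination of A1, A2 for x^a - eps*x - 1 = 0.
eps, a = sp.symbols('epsilon a', positive=True)
A1, A2 = sp.symbols('A1 A2')

x = 1 + A1 * eps + A2 * eps**2
expr = sp.expand(sp.series(x**a - eps * x - 1, eps, 0, 3).removeO())

# The coefficient of each power of eps must vanish separately;
# the first equation fixes A1, the second then fixes A2.
sol1 = sp.solve(expr.coeff(eps, 1), A1)[0]
sol2 = sp.solve(expr.coeff(eps, 2).subs(A1, sol1), A2)[0]
print(sol1)                                      # 1/a
print(sp.simplify(sol2 - (3 - a) / (2 * a**2)))  # 0, i.e. A2 = (3-a)/(2a^2)
```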
14.2.4. Convergence and asymptotic expansions
In principle, one could calculate all the coefficients $A_n$, $n = 1, 2, 3, \ldots$, by the method shown above. Then one may ask whether the resulting infinite series
\[ x_*(\varepsilon) = 1 + \sum_{n=1}^{\infty} A_n\varepsilon^n \]  (14.10)
will converge to the exact solution $x_*$. Unfortunately, the question of convergence is much more difficult to answer; to answer it, one needs some information about the exact solution of the perturbed equation. In some cases (such as Eq. (14.4) above), the series converges. However, in many cases, especially when applying perturbation theory to differential equations, the perturbative series does not converge. Nevertheless, a finite number of terms of the perturbative expansion always provides excellent precision when $\varepsilon$ is very small, and hence can be successfully used in practice to determine approximate solutions. This is a somewhat paradoxical situation: an infinite series is useful and gives a good approximation to $x_*(\varepsilon)$ at small $\varepsilon$, even if it does not actually converge to the exact value of $x_*(\varepsilon)$.
Let us consider this situation in some more detail. The sum of the first $N$ terms can be thought of as a function of $\varepsilon$ and $N$,
\[ x_*(\varepsilon, N) \equiv 1 + \sum_{n=1}^{N} A_n\varepsilon^n. \]
This function may have very different behavior as $N \to \infty$ at fixed $\varepsilon$, and as $\varepsilon \to 0$ at fixed $N$. In particular, the function $x_*(\varepsilon, N)$ provides a very precise approximation to the exact solution $x_*(\varepsilon)$ at fixed $N$ and small $\varepsilon$. The error of this approximation is of order $\varepsilon^{N+1}$; more rigorously, one can write
\[ \lim_{\varepsilon\to 0} \frac{x_*(\varepsilon, N) - x_*(\varepsilon)}{\varepsilon^N} = 0. \]  (14.11)
To describe this property, one says that such a series $x_*(\varepsilon, N)$ is an asymptotic expansion of the solution $x_*(\varepsilon)$. Nevertheless, it often happens that $\lim_{N\to\infty} x_*(\varepsilon, N) = \infty$. There is no contradiction, since these limits are taken in very different ranges of the variables $\varepsilon$, $N$. An asymptotic expansion may be either convergent or divergent.
The conclusion is that perturbation theory will give an asymptotic expansion, but not always a convergent expansion. However, in practice it is almost always impossible to obtain infinitely many coefficients $A_n$. Moreover, one usually computes only two or three terms of the series; that is, one computes $x_*(\varepsilon, N)$ with $N = 2$ or, at most, $N = 3$. So the lack of convergence as $N \to \infty$ is not important. It is much more important to obtain an estimate of the error and to determine the admissible values of $\varepsilon$ for which the approximate solution $x_*(\varepsilon, N)$ is still precise at a fixed, small value of $N$.
14.2.5. How to guess the perturbative ansatz
The starting point of perturbation theory is a perturbative expansion such as Eq. (14.5). In other words, we are already guessing the solution to the problem in the form of a certain formula, or ansatz, containing $\varepsilon$ in a certain way. We do not know in advance that this ansatz is correct; the justification comes later, when we actually arrive at the solution (14.8). But it might happen that we have not guessed the ansatz correctly; then the procedure will not work.
For example, consider the equation
\[ x^2 - 2(1 + \varepsilon)x + 1 = 0. \]  (14.12)
This equation looks like a perturbation of the equation $x^2 - 2x + 1 = 0$, which has the solution $x = 1$. Let us assume the ansatz
\[ x_*(\varepsilon) = 1 + A\varepsilon + O(\varepsilon^2) \]  (14.13)
and substitute it into Eq. (14.12), keeping only terms linear in $\varepsilon$. We find
\[ -2\varepsilon = O(\varepsilon^2). \]
This condition cannot be satisfied because nothing can cancel the linear term $-2\varepsilon$. Therefore, the ansatz (14.13) is incorrect.
The reason for this trouble will become clear if we look at the exact solution of Eq. (14.12),
\[ x_*(\varepsilon) = 1 + \varepsilon \pm \sqrt{2\varepsilon + \varepsilon^2}. \]  (14.14)
The leading correction is actually of order $\sqrt{\varepsilon}$ rather than of order $\varepsilon$, and the expansion of $x_*(\varepsilon)$ is of the form
\[ x_* = 1 + A_1\sqrt{\varepsilon} + A_2\varepsilon + A_3\varepsilon^{3/2} + \ldots \]
Therefore, the correct perturbative ansatz must be of this form. Upon substituting such an ansatz into Eq. (14.12), one gets $A_1 = \pm\sqrt{2}$, $A_2 = 1$, etc.
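A quick numerical check confirms that the square-root ansatz works. The sketch below (with an illustrative value of $\varepsilon$) compares the two-term ansatz $1 + \sqrt{2\varepsilon} + \varepsilon$ with the exact root (14.14):

```python
import math

# Check that the sqrt(eps) ansatz with A1 = sqrt(2), A2 = 1 matches
# the exact root (14.14) of x^2 - 2(1+eps)x + 1 = 0.
eps = 1e-4
x_exact = 1 + eps + math.sqrt(2 * eps + eps**2)
x_ansatz = 1 + math.sqrt(2) * math.sqrt(eps) + eps

# Agreement up to the neglected O(eps^(3/2)) terms:
print(abs(x_exact - x_ansatz) < 1e-5)  # True
```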
Now that we have the exact solution (14.14), it is easy to see that the perturbative expansion is merely a Taylor series of the analytic function $x_*(\sqrt{\varepsilon})$, and therefore the expansion actually converges for small enough $\varepsilon$. But it would be difficult to judge the convergence of the expansion if we had only computed the first few coefficients $A_1$, $A_2$, ... of the expansion and did not have a formula for the exact solution.
The conclusion is that one will run into trouble if one tries to use an incorrect perturbative expansion. When this happens, another expansion ansatz must be chosen, e.g. containing $\sqrt{\varepsilon}$ or $\ln\varepsilon$ instead of $\varepsilon$. One can often use physical intuition or qualitative considerations about the exact solution to guess the correct ansatz.
14.3. Perturbation theory for differential equations
Let us apply the method of perturbation theory to a differential equation. For example, consider the equation
\[ \dot x + 2.33\,x = 0.000001\,x^3, \quad x(0) = 15. \]  (14.15)
It is clear that this equation can be thought of as a small perturbation of
\[ \dot x + 2.33\,x = 0, \quad x(0) = 15, \]
whose exact solution can be easily found,
\[ x(t) = 15\,e^{-2.33\,t}. \]  (14.16)
Therefore, we rewrite Eq. (14.15) as
\[ \dot x + ax = \varepsilon x^3, \quad x(0) = A, \]  (14.17)
where we introduced a small parameter $\varepsilon \equiv 10^{-6}$ and (non-small) parameters $a \equiv 2.33$, $A \equiv 15$. Let us now apply the method of perturbation theory to this problem.
As the next step, we assume that the solution of Eq. (14.17) is a small perturbation of the solution (14.16), i.e.,
\[ x(t) = A e^{-at} + \varepsilon f_1(t) + O(\varepsilon^2), \]  (14.18)
where the function $f_1(t)$ needs to be determined. The initial condition for $f_1(t)$ is found by setting $t = 0$ in Eq. (14.18), which yields
\[ A + \varepsilon f_1(0) + O(\varepsilon^2) = A. \]
Since the term linear in $\varepsilon$ must vanish, we obtain $f_1(0) = 0$. We now substitute the ansatz (14.18) into Eq. (14.17) and expand to first order in $\varepsilon$:
\[ \varepsilon\dot f_1 + \varepsilon a f_1 = \varepsilon A^3 e^{-3at} + O(\varepsilon^2). \]
All the terms linear in $\varepsilon$ must cancel separately from the terms quadratic in $\varepsilon$. Therefore, we obtain the equation
\[ \dot f_1 + a f_1 = A^3 e^{-3at}, \quad f_1(0) = 0. \]
The solution is
\[ f_1(t) = A^3\,\frac{e^{-at} - e^{-3at}}{2a}. \]
Therefore, the approximate solution of Eq. (14.17) is
\[ x(t) \approx A e^{-at} + \varepsilon A^3\,\frac{e^{-at} - e^{-3at}}{2a}. \]
Let us now find the second-order correction. We start with the ansatz
\[ x(t) = A e^{-at} + \varepsilon f_1(t) + \varepsilon^2 f_2(t) + O(\varepsilon^3), \]  (14.19)
where $f_1(t)$ is already known and $f_2(t)$ is to be determined. The initial condition for $f_2(t)$ is $f_2(0) = 0$. Substituting this into Eq. (14.17), we find
\[ \dot f_2 + a f_2 = 3A^2 e^{-2at} f_1(t) = \frac{3A^5}{2a}\left( e^{-3at} - e^{-5at} \right), \quad f_2(0) = 0. \]
The solution is
\[ f_2(t) = 3A^5\,\frac{e^{-at} - 2e^{-3at} + e^{-5at}}{8a^2}. \]
Therefore, the second-order approximate solution is
\[ x(t) = A e^{-at} + \varepsilon A^3\,\frac{e^{-at} - e^{-3at}}{2a} + \varepsilon^2\, 3A^5\,\frac{e^{-at} - 2e^{-3at} + e^{-5at}}{8a^2}. \]  (14.20)
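The formula (14.20) can be checked against a direct numerical integration. In the sketch below, $\varepsilon$ is taken much larger than in the text so that both corrections are visible (the parameter values are assumptions chosen for illustration):

```python
import math

# Compare the second-order perturbative solution (14.20) of
# x' + a x = eps x^3, x(0) = A, with a fixed-step RK4 integration.
a, A, eps = 1.0, 0.5, 0.1

def x_pert(t):
    e1, e3, e5 = math.exp(-a*t), math.exp(-3*a*t), math.exp(-5*a*t)
    return (A * e1 + eps * A**3 * (e1 - e3) / (2*a)
            + eps**2 * 3 * A**5 * (e1 - 2*e3 + e5) / (8*a**2))

def f(x):
    return -a * x + eps * x**3

x, dt = A, 0.001
for _ in range(1000):        # integrate from t = 0 to t = 1
    k1 = f(x)
    k2 = f(x + dt/2 * k1)
    k3 = f(x + dt/2 * k2)
    k4 = f(x + dt * k3)
    x += dt/6 * (k1 + 2*k2 + 2*k3 + k4)

# The residual disagreement is of the neglected order (eps*A^2/a)^3:
print(abs(x - x_pert(1.0)) < 1e-5)  # True
```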
14.3.1. Precision and limits of applicability
As a rule, the perturbative expansion provides good precision if its successive terms rapidly diminish in magnitude. Therefore, the condition of applicability of perturbation theory to Eq. (14.17) is
\[ \left| \varepsilon A^3\,\frac{e^{-at} - e^{-3at}}{2a} \right| \ll \left| A e^{-at} \right| \quad \text{for all } t. \]
It straightforwardly follows that $|\varepsilon| \ll 2aA^{-2}$. The condition for the quadratic term in Eq. (14.20) is
\[ \varepsilon^2\, 3A^5\,\frac{\left| e^{-at} - 2e^{-3at} + e^{-5at} \right|}{8a^2} \ll \left| \varepsilon A^3\,\frac{e^{-at} - e^{-3at}}{2a} \right|, \]
which gives $|\varepsilon| \ll 4aA^{-2}/3$. It follows that perturbation theory is well within its limits of applicability for the values $\varepsilon = 10^{-6}$, $A = 15$, $a = 2.33$.
The precision of the approximate solution (14.20) can be estimated as follows. The first term is $x_0(t) = A e^{-at}$, the second term is of order
\[ \varepsilon A^3 a^{-1} e^{-at} \sim \frac{\varepsilon A^2}{a}\,x_0 \sim 10^{-4}\,x_0, \]
and the third term is of order
\[ \varepsilon^2 A^5 a^{-2} e^{-at} \sim \left( \frac{\varepsilon A^2}{a} \right)^2 x_0 \sim 10^{-8}\,x_0. \]
If we computed the next term of the perturbative series, it would be of order $10^{-12}\,x_0$, but we did not include that term in Eq. (14.20). Therefore, the error is of order $10^{-12}\,x_0$.
14.3.2. Convergence
In principle, we can compute all the unknown functions $f_1(t)$, $f_2(t)$, ..., in the perturbative expansion
\[ x(t) = A e^{-at} + \sum_{n=1}^{\infty} \varepsilon^n f_n(t). \]  (14.21)
To compute each $f_n(t)$, we would need to solve a simple linear equation of the form
\[ \dot f_n + a f_n = (\ldots), \]
where the omitted expression in the r.h.s. depends on the functions $f_1$, ..., $f_{n-1}$. Therefore, we can easily determine $f_n(t)$ if the previous functions $f_1$, ..., $f_{n-1}$ are already found. As before, the calculations become more cumbersome at higher orders, but there is no difficulty in principle. A computer can be programmed to compute the result symbolically to any order.
The expansion (14.21) is asymptotic as $\varepsilon \to 0$, i.e. the partial sum
\[ x^{(N)}(t) = A e^{-at} + \sum_{n=1}^{N} \varepsilon^n f_n(t) \]
satisfies the equation up to terms of order $\varepsilon^{N+1}$ and is therefore a good approximation to the exact solution when $\varepsilon$ is small. It can be shown that this expansion converges. It is easy to demonstrate convergence in this case because one can actually find the exact solution of Eq. (14.17) by separating the variables,
\[ \int_A^x \frac{dx}{\varepsilon x^3 - ax} = \int_0^t dt = t. \]
After some calculations, this gives
\[ x(t) = A e^{-at}\left[ 1 - \frac{\varepsilon A^2}{a}\left( 1 - e^{-2at} \right) \right]^{-\frac{1}{2}}. \]
This is an analytic function of $\varepsilon$ near $\varepsilon = 0$; therefore a Taylor expansion in $\varepsilon$ will converge for small enough $\varepsilon$ (more precisely, for $|\varepsilon| < aA^{-2}$). The expansion (14.21) is equivalent to a Taylor expansion of the above function in $\varepsilon$. Hence, we conclude that the perturbative expansion will converge for a range of $\varepsilon$.
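The equivalence of the two expansions can be verified symbolically (a sketch using the sympy library, which is an assumption about tooling): the Taylor coefficients of the exact solution in $\varepsilon$ must reproduce the functions $f_1(t)$ and $f_2(t)$ found earlier.

```python
import sympy as sp

# Verify that the Taylor expansion in eps of the exact solution of
# x' + a x = eps x^3, x(0) = A, reproduces f1(t) and f2(t).
t, a, A, eps = sp.symbols('t a A epsilon', positive=True)

x_exact = A * sp.exp(-a*t) / sp.sqrt(1 - eps * A**2 / a * (1 - sp.exp(-2*a*t)))
series = sp.expand(sp.series(x_exact, eps, 0, 3).removeO())

f1 = A**3 * (sp.exp(-a*t) - sp.exp(-3*a*t)) / (2*a)
f2 = 3 * A**5 * (sp.exp(-a*t) - 2*sp.exp(-3*a*t) + sp.exp(-5*a*t)) / (8*a**2)

print(sp.simplify(series.coeff(eps, 1) - f1))  # 0
print(sp.simplify(series.coeff(eps, 2) - f2))  # 0
```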
14.3.3. Perturbative expansions with $\varepsilon = 1$
In the example considered in this section, we saw that perturbation theory works well as long as $|\varepsilon| \ll aA^{-2}$. If this condition holds, the perturbative expansion is useful even if the parameter $\varepsilon$ is itself not small. Even if $\varepsilon = 10$, the perturbative expansion will be precise for sufficiently small $A$ (the initial value). For instance, we may set $a = \varepsilon = 1$ and conclude that the equation
\[ \dot x + x = x^3 \]  (14.22)
may be solved using perturbation theory if the initial value $x(0) = A$ is small, $|A| \ll 1$.
This may appear paradoxical at first sight: we constructed the perturbative expansion under the assumption that $\varepsilon$ is small, disregarded higher powers of $\varepsilon$, and now we want to use the result with $\varepsilon = 1$. The reason for the success of the perturbative expansion with $\varepsilon = 1$ can be seen by examining the formula (14.20). In that formula, every power of $\varepsilon$ is multiplied also by $A^2/a$. In other words, the perturbative expansion is a series not in powers of $\varepsilon$ but in powers of $\varepsilon A^2/a$. The value of $\varepsilon$ does not actually need to be small.
The lesson is that one may apply perturbation theory successfully even to an equation such as Eq. (14.22), where no small parameters are apparently present. To apply perturbation theory, we may introduce a parameter $\varepsilon = 1$ and rewrite the equation as
\[ \dot x + x = \varepsilon x^3, \quad x(0) = A. \]  (14.23)
The perturbative ansatz can still be formulated as in Eq. (14.19), as if $\varepsilon$ were small, even though in reality $\varepsilon = 1$. Corrections of order $\varepsilon^2$ in the final result (14.20) will actually be smaller than terms of order $\varepsilon$, not because $\varepsilon$ itself is small, but because every $\varepsilon$ is multiplied by a small number $A^2$. One says that the parameter $\varepsilon$ is formal; by this one means that $\varepsilon$ serves merely as a bookkeeping parameter that measures the smallness of corrections. Of course, it takes a certain amount of intuition to realize that the parameter $\varepsilon$ should be attached to the $x^3$ term rather than to some other term in Eq. (14.22). This intuition usually comes from physical considerations; in the case of Eq. (14.22), one expects that the $x^3$ term is insignificant when the initial value $x(0)$ is very small, especially since the solution of $\dot x + x = 0$ decays exponentially with time.
However, I would like to stress that perturbation theory does involve a certain amount of guessing, because one needs to start from a certain perturbative ansatz, and there are no absolute rules about choosing that ansatz. How does one know that the ansatz is guessed correctly? Proving the convergence of a perturbative expansion is difficult unless one actually knows something about the exact solution. In the absence of a proof of convergence, one can justify the perturbative expansion by verifying that successive terms of the expansion decrease in magnitude. This will show the limits of applicability of perturbation theory and also provide an estimate of the resulting precision.
14.4. Anharmonic oscillations
In the preceding sections, we have considered toy perturbation problems where the exact solution is in fact known and there is no need to use perturbation theory. The analysis of these examples helps understand possible pitfalls of guessing the perturbative expansion in various cases. We are now ready to consider the problem of anharmonic oscillations, where exact solutions are not known.
As a first example of an anharmonic oscillator, consider a point mass $m$ moving in the potential
\[ V(x) = \frac{1}{2}kx^2 + \frac{1}{4}m\lambda x^4, \]
where $\lambda$ is a small parameter, while $k$ is not small. This potential has a minimum at $x = 0$ (for all $\lambda$), so we expect oscillatory behavior near this stable equilibrium point. Of course, oscillations will not proceed according to a sine curve such as $x(t) = \sin\omega_0 t$. One says that in this case the oscillations are anharmonic (i.e. not harmonic).
It is clear that the potential $V(x)$ is a small perturbation of the harmonic potential $V_0(x) = \frac{1}{2}kx^2$. Suppose that the oscillator was released from rest at a position $x = A \neq 0$. Then the equation of motion is
\[ \ddot x + \omega_0^2 x + \lambda x^3 = 0, \quad x(0) = A, \quad \dot x(0) = 0, \]  (14.24)
where $\omega_0^2 \equiv k/m$ is the squared frequency of oscillations in the unperturbed potential $V_0(x)$.
14.4.1. Failure of a simple perturbation ansatz
Let us now apply perturbation theory to Eq. (14.24). The unperturbed solution is
\[ x_0(t) = A\cos\omega_0 t. \]
As a first attempt, let us assume a perturbative expansion of the form
\[ x(t) = x_0(t) + \lambda x_1(t) + \lambda^2 x_2(t) + \ldots, \]  (14.25)
where $x_1(t)$, $x_2(t)$, ... are unknown functions. Substituting this ansatz into Eq. (14.24), we find
\[ \ddot x_1 + \omega_0^2 x_1 = -x_0^3, \qquad \ddot x_2 + \omega_0^2 x_2 = -3x_0^2 x_1. \]
The initial conditions for the functions $x_1(t)$ and $x_2(t)$ are $x_1(0) = \dot x_1(0) = x_2(0) = \dot x_2(0) = 0$, since the initial condition $x(0) = A$, $\dot x(0) = 0$ is already exactly satisfied by the unperturbed solution $x_0(t)$. Note that
\[ x_0^3 = A^3\cos^3\omega_0 t = A^3\,\frac{\cos 3\omega_0 t + 3\cos\omega_0 t}{4}, \]
therefore we obtain
\[ x_1(t) = \frac{A^3}{32\omega_0^2}\left[ \cos 3\omega_0 t - \cos\omega_0 t - 12\,\omega_0 t\sin\omega_0 t \right]. \]
Thus the approximate first-order solution is
\[ x(t) \approx A\cos\omega_0 t + \frac{\lambda A^3}{32\omega_0^2}\left[ \cos 3\omega_0 t - \cos\omega_0 t - 12\,\omega_0 t\sin\omega_0 t \right]. \]
We omit the calculation of $x_2(t)$ because there is already
a problem with the above expression for $x_1(t)$. Namely,
it includes the term $t\sin\omega_0 t$, which describes oscillations
with a growing amplitude. Clearly, the term $\varepsilon t\sin\omega_0 t$
will become arbitrarily large at late times, thus breaking
down the perturbation theory, even if $\varepsilon$ is very small.
Such terms that grow at late times are called secular
terms (from the Latin saeculum = century; the mental image
is, perhaps, that a small contribution that persists
over centuries will eventually grow into a large contribution).
The reason for the appearance of the secular
term is, technically, the presence of $\cos\omega_0 t$ in the r.h.s. of
the equation for $x_1(t)$, which is analogous to the presence
of a periodic force whose frequency resonates with
the frequency of the oscillator. The resonance leads to a
linear growth of the amplitude of the oscillations.
The problem of secular terms is generic to any situation
with anharmonic oscillations. Secular terms may
not appear in all orders in $\varepsilon$; for instance, if we apply
perturbation theory to the equation
$$\ddot{x} + \omega_0^2 x = \varepsilon x^2,$$
secular terms will first appear only at the second order
in $\varepsilon$. This, of course, merely postpones the problem.
Secular terms appearing in a perturbative expansion
are unsatisfactory for two reasons. First, they limit the
applicability of perturbation theory to small values of the
time t: perturbation theory breaks down at $|t| \gtrsim \omega_0/(\varepsilon A^2)$.
Higher-order coefficients in the perturbative expansion
will also contain secular terms; thus, we cannot improve
the precision at late times, no matter how many orders
of $\varepsilon$ we compute. Second, an approximate solution containing
secular terms does not reproduce the correct behavior
of anharmonic oscillations: since the total energy
of the system is conserved, the amplitude of oscillations
cannot actually increase without bound. We conclude
that the perturbative ansatz (14.25) is incorrect. Let us
now come up with an improved ansatz.
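The breakdown can be made concrete by evaluating the first-order formula numerically (a small illustration of ours; the parameter values are arbitrary). For $\varepsilon > 0$ the oscillator is released from rest at a turning point, so energy conservation bounds the true motion by $|x(t)| \le A$; the secular term, however, makes the approximate formula grow without limit:

```python
import math

def x_first_order(t, A=1.0, eps=0.1, omega0=1.0):
    """Naive first-order solution from Sec. 14.4.1, containing
    the secular term proportional to t*sin(omega0*t)."""
    c = eps * A**3 / (32 * omega0**2)
    return (A * math.cos(omega0 * t)
            + c * (math.cos(3 * omega0 * t) - math.cos(omega0 * t)
                   - 12 * omega0 * t * math.sin(omega0 * t)))

# the secular term makes the predicted "amplitude" grow linearly with t:
peak = max(abs(x_first_order(0.01 * i)) for i in range(20001))  # t in [0, 200]
```

With $\varepsilon = 0.1$, $A = \omega_0 = 1$ the secular envelope is $(3\varepsilon A^3/8\omega_0)\,t \approx 0.0375\,t$, so by $t = 200$ the formula predicts $|x| \approx 7$ even though the true solution never exceeds 1.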
14.4.2. Lindstedt-Poincaré method
We shall begin with some qualitative considerations.
When the potential is unperturbed ($\varepsilon = 0$), the oscillations
are harmonic and their frequency $\omega_0$ is independent
of the amplitude A. In the presence of the perturbation
(i.e. with $\varepsilon \neq 0$), we would expect that the frequency
of oscillations will depend on the amplitude A.
Since we expect that the perturbed solution is still oscillatory
but with a slightly different frequency, it makes
sense to incorporate a change of frequency in the perturbative
ansatz:
$$x(t) = A\cos\omega t + \varepsilon x_1(t) + \varepsilon^2 x_2(t) + \ldots, \qquad (14.26)$$
where
$$\omega = \omega_0 + \varepsilon\omega_1 + \varepsilon^2\omega_2 + \ldots \qquad (14.27)$$
is the perturbed frequency of the oscillations, and the
constants $\omega_1, \omega_2, \ldots$ are to be determined together with
the unknown functions $x_1(t), x_2(t), \ldots$
The perturbation method we shall use is known as
the Lindstedt-Poincaré method. This method uses the
expansions (14.26)-(14.27) with a further mathematical
trick: one changes the time variable from t to the dimensionless
phase variable $\theta \equiv \omega t$, where $\omega$ is the
perturbed frequency. After this change of variable, the
perturbative ansatz becomes
$$x(\theta) = A\cos\theta + \varepsilon x_1(\theta) + \varepsilon^2 x_2(\theta) + \ldots, \qquad (14.28)$$
while Eq. (14.24) needs to be rewritten through the derivatives
$dx/d\theta$ and $d^2x/d\theta^2$. Let us denote these derivatives
by a prime; then we have
$$\ddot{x} = \frac{d^2x}{dt^2} = \omega^2\,\frac{d^2x}{d\theta^2} \equiv \omega^2 x'',$$
and finally Eq. (14.24) becomes
$$\left(\omega_0 + \varepsilon\omega_1 + \varepsilon^2\omega_2 + \ldots\right)^2 x'' + \omega_0^2 x + \varepsilon x^3 = 0. \qquad (14.29)$$
The initial conditions for this equation are $x(0) = A$ and
$x'(0) = 0$.
Now we can apply the perturbation ansatz (14.28)
straightforwardly to Eq. (14.29). Keeping only the linear
terms in $\varepsilon$, we obtain the following equation for $x_1(\theta)$:
$$x_1'' + x_1 = 2A\,\frac{\omega_1}{\omega_0}\cos\theta - \frac{A^3}{\omega_0^2}\cos^3\theta, \quad x_1(0) = x_1'(0) = 0. \qquad (14.30)$$
The solution of this equation will have secular terms if
the r.h.s. contains $\cos\theta$ or $\sin\theta$. Since
$$\cos^3\theta = \frac{\cos 3\theta + 3\cos\theta}{4},$$
the r.h.s. of Eq. (14.30) is
$$\left(2A\,\frac{\omega_1}{\omega_0} - \frac{3}{4}\,\frac{A^3}{\omega_0^2}\right)\cos\theta - \frac{A^3}{4\omega_0^2}\cos 3\theta.$$
A secular term will appear if the coefficient at $\cos\theta$ is
nonzero. However, the value of $\omega_1$ is still undetermined,
so we can choose the value of $\omega_1$ to avoid the appearance
of secular terms. This is the key point of the Lindstedt-Poincaré
method. It is clear that we need to choose
$$\omega_1 = \frac{3}{8}\,\frac{A^2}{\omega_0}.$$
The solution of Eq. (14.30) is then
$$x_1(\theta) = \frac{A^3}{32\omega_0^2}\left(\cos 3\theta - \cos\theta\right),$$
and thus the approximate first-order solution of the original
equation is
$$x(t) \approx A\cos\omega t + \frac{\varepsilon A^3}{32\omega_0^2}\left(\cos 3\omega t - \cos\omega t\right), \qquad (14.31)$$
$$\omega \approx \omega_0 + \frac{3}{8}\,\frac{\varepsilon A^2}{\omega_0}.$$
The solution (14.31) is well-behaved for all t and describes
oscillations with a slightly perturbed frequency $\omega$
and with slight deviations from the pure cosine shape.
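The frequency prediction of Eq. (14.31) can be tested against a direct numerical solution. The following sketch is our own check (the function names and tolerances are arbitrary): it integrates Eq. (14.24) with a classical Runge-Kutta step until x first crosses zero; since the potential is even, this happens at exactly one quarter period.

```python
import math

def rhs(x, v, eps, omega0):
    """Eq. (14.24) as a first-order system: (x, v)' = (v, -omega0^2 x - eps x^3)."""
    return v, -omega0**2 * x - eps * x**3

def quarter_period(eps, A=1.0, omega0=1.0, dt=1e-4):
    """Time for x to first reach 0, starting from rest at x = A.
    By the symmetry of the even potential this is T/4.  RK4 stepping."""
    x, v, t = A, 0.0, 0.0
    while x > 0:
        x_old, t_old = x, t
        k1x, k1v = rhs(x, v, eps, omega0)
        k2x, k2v = rhs(x + 0.5*dt*k1x, v + 0.5*dt*k1v, eps, omega0)
        k3x, k3v = rhs(x + 0.5*dt*k2x, v + 0.5*dt*k2v, eps, omega0)
        k4x, k4v = rhs(x + dt*k3x, v + dt*k3v, eps, omega0)
        x += dt * (k1x + 2*k2x + 2*k3x + k4x) / 6
        v += dt * (k1v + 2*k2v + 2*k3v + k4v) / 6
        t += dt
    return t_old + dt * x_old / (x_old - x)  # interpolate the zero crossing

eps, A, omega0 = 0.1, 1.0, 1.0
omega_numeric = 2 * math.pi / (4 * quarter_period(eps, A, omega0))
omega_lp = omega0 + 3 * eps * A**2 / (8 * omega0)  # first-order LP frequency
```

With $\varepsilon = 0.1$ and $A = \omega_0 = 1$ this yields $\omega_{\rm numeric} \approx 1.0367$, while $\omega_0 + \frac{3}{8}\varepsilon A^2/\omega_0 = 1.0375$; the small discrepancy is of second order in $\varepsilon$.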
14.4.3. Precision of the approximation
Of course, the solution (14.31) is still approximate. Since
the next correction to the frequency is of order $\varepsilon^2$, we
expect that the first-order solution will deviate significantly
from the exact solution after times $t \sim \varepsilon^{-2}$. Therefore,
the solution (14.31) can be expected to be precise
only for $t \ll \varepsilon^{-2}$. Moreover, the standard condition of
applicability of perturbation theory is that the higher-order
terms are smaller, which gives
$$\frac{|\varepsilon|\,A^3}{32\omega_0^2} \ll A, \qquad \frac{3}{8}\,\frac{|\varepsilon|\,A^2}{\omega_0} \ll \omega_0.$$
This condition is satisfied if $|\varepsilon|\,A^2 \ll \omega_0^2$, which means
that either the perturbation in the potential is small, or
the amplitude A of oscillations is small. Physically, this
condition means that the change in the energy due to the
perturbation of the potential, $\frac{1}{4}\varepsilon m A^4$, is much smaller than
the typical energy of the unperturbed oscillator, $\frac{1}{2}m\omega_0^2 A^2$.
Note also that for negative $\varepsilon$ such that $\varepsilon < -\omega_0^2/A^2$, the
oscillator does not reach the stable equilibrium point
x = 0 from its initial position, x(0) = A.
The precision of the approximation (14.31) can be
estimated as follows. The first correction is of order
$|\varepsilon|\,A^2/\omega_0^2$ times the unperturbed solution. Therefore,
one expects that the next correction will be of order
$|\varepsilon|^2 A^4/\omega_0^4$ times the unperturbed solution. So the relative
precision of the approximation (14.31) is of order
$|\varepsilon|^2 A^4/\omega_0^4$.
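This scaling can be verified numerically (a rough check of ours; the time window, step count, and parameter values are arbitrary choices): halving $\varepsilon$ should reduce the maximum deviation between the approximation (14.31) and an accurate numerical solution of Eq. (14.24) by a factor of about four.

```python
import math

def max_deviation(eps, A=1.0, omega0=1.0, t_end=20.0, n_steps=40000):
    """Maximum |x_LP(t) - x_exact(t)| over [0, t_end], where x_LP is the
    first-order Lindstedt-Poincare solution (14.31) and x_exact is
    computed with RK4 for Eq. (14.24)."""
    omega = omega0 + 3 * eps * A**2 / (8 * omega0)
    def f(x, v):
        return v, -omega0**2 * x - eps * x**3
    dt = t_end / n_steps
    x, v, t, worst = A, 0.0, 0.0, 0.0
    for _ in range(n_steps):
        k1x, k1v = f(x, v)
        k2x, k2v = f(x + 0.5*dt*k1x, v + 0.5*dt*k1v)
        k3x, k3v = f(x + 0.5*dt*k2x, v + 0.5*dt*k2v)
        k4x, k4v = f(x + dt*k3x, v + dt*k3v)
        x += dt * (k1x + 2*k2x + 2*k3x + k4x) / 6
        v += dt * (k1v + 2*k2v + 2*k3v + k4v) / 6
        t += dt
        x_lp = (A * math.cos(omega * t)
                + eps * A**3 / (32 * omega0**2)
                  * (math.cos(3 * omega * t) - math.cos(omega * t)))
        worst = max(worst, abs(x_lp - x))
    return worst

ratio = max_deviation(0.1) / max_deviation(0.05)   # expect roughly 4
```

The deviation is dominated by the slow phase drift due to the uncorrected second-order frequency shift, so over a fixed time window it scales as $\varepsilon^2$.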
Using the Lindstedt-Poincaré method, one can find
further terms in the perturbative expansion (14.28)-(14.29).
At the n-th step, the function $x_n(\theta)$ will be determined
from an equation of the form
$$x_n'' + x_n = (\ldots),$$
where the terms in the r.h.s. will depend on the previously
found constants $\omega_1, \ldots, \omega_{n-1}$ and functions
$x_1(\theta), \ldots, x_{n-1}(\theta)$. The value of $\omega_n$ will be determined
from the condition that no secular terms should arise in this
equation. The n-th order solution will be valid until times
$t \sim \varepsilon^{-n-1}$; for earlier times $t \ll \varepsilon^{-n-1}$, the error of the
n-th order solution remains small.
14.4.4. Discussion
Let us summarize the results we obtained. We have applied
perturbation theory to Eq. (14.24) with a straightforward
ansatz (14.25), but the attempt failed because of
the appearance of secular terms such as $\varepsilon t\sin\omega_0 t$. The
second attempt was to use an improved ansatz (14.26)-(14.27),
which takes into account a change in the oscillation
frequency,
$$\omega = \omega_0 + \varepsilon\omega_1 + \varepsilon^2\omega_2 + \ldots,$$
due to the perturbation of the potential. However,
perturbation theory based on Eq. (14.24) and the
ansatz (14.26)-(14.27) would have failed, had we not
used the Lindstedt-Poincaré method, which mandates a
change of variable, $t \to \theta \equiv \omega t$, involving the perturbed
frequency $\omega$. Equation (14.24) is rewritten in the new
variable as an equation for $x(\theta)$, yielding Eq. (14.29), and
straightforward perturbation theory can now be applied.
At each order in $\varepsilon$, one can choose the constants
$\omega_1, \omega_2, \ldots$, so that no secular terms appear. The result
is a well-behaved approximate solution for x(t). The n-th
order solution is valid until times $t \sim \varepsilon^{-n-1}$, as long as
the amplitude of oscillations A and the parameter $\varepsilon$ are
sufficiently small so that $|\varepsilon|\,A^2 \ll \omega_0^2$.
What is the reason for the failure of perturbation theory
without the Lindstedt-Poincaré method? We can
easily understand this by looking at the solution (14.31).
The straightforward perturbative ansatz (14.25) is a
Taylor expansion of that solution in $\varepsilon$. For instance, the
term $A\cos\omega t$ will be expanded as
$$A\cos\left[(\omega_0 + \varepsilon\omega_1)t\right] = A\cos\omega_0 t - A\varepsilon\omega_1 t\sin\omega_0 t + O(\varepsilon^2).$$
This expansion is only meaningful for $\varepsilon$ small enough
that $|\varepsilon|\,\omega_1 t \ll 1$; but the second term grows without
bound at late times, so the expansion breaks down
at $t \sim (|\varepsilon|\,\omega_1)^{-1}$. Now it becomes clear why the secular term
$t\sin\omega_0 t$ appears when one uses a simple perturbative expansion
in $\varepsilon$.
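A few lines of arithmetic make this concrete (our own illustration; the value $\omega_1 = 3A^2/8\omega_0 = 0.375$ corresponds to $A = \omega_0 = 1$):

```python
import math

A, omega0, eps, omega1 = 1.0, 1.0, 0.05, 0.375   # omega1 = 3*A**2/(8*omega0)

def exact_term(t):
    """The term A*cos(omega*t) with omega = omega0 + eps*omega1: bounded by A."""
    return A * math.cos((omega0 + eps * omega1) * t)

def taylor_term(t):
    """Its first-order Taylor expansion in eps, containing the secular term."""
    return A * math.cos(omega0 * t) - A * eps * omega1 * t * math.sin(omega0 * t)
```

For $\varepsilon\omega_1 t \ll 1$ the two expressions agree to $O(\varepsilon^2)$, while for $t \gtrsim (\varepsilon\omega_1)^{-1} \approx 53$ the truncated expansion grows without bound even though the exact term stays within $[-A, A]$.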
The Lindstedt-Poincaré method can also be applied to
any equation of the form
$$\ddot{x} + \omega_0^2 x = \varepsilon f(x, \dot{x}),$$
where $\varepsilon$ is the expansion parameter and $f(x, \dot{x})$ is some
expression involving x and $\dot{x}$.
14.5. Suggested literature
E. J. Hinch, Perturbation methods (Cambridge University
Press, 1995). This is a short introductory book with many
examples.
A. H. Nayfeh, Perturbation methods (Wiley, 1973). This
is a more advanced book showing a great multitude of
tricks one needs to apply to develop a successful pertur-
bation theory in different cases.
15. License for this text
This text is distributed under the GNU Free Documentation
License, which the author feels is the most suitable
license for an advanced science textbook such as
this one. The precise conditions of the license are given
below. Please note that the license permits printing of the
text by a commercial publisher, without a contract or ad-
ditional permission to be asked of the author and with-
out any royalties paid to the author, provided that cer-
tain conditions are met, as stipulated in sections 15.1.1
and 15.1.2 below. This text cannot be reproduced with-
out this chapter or in violation of the terms of the follow-
ing license.
15.1. GNU Free Documentation
License
Version 1.2, November 2002
Copyright (c) 2000,2001,2002 Free Software Founda-
tion, Inc. 59 Temple Place, Suite 330, Boston, MA 02111-
1307, USA
Everyone is permitted to copy and distribute verbatim
copies of this license document, but changing it is not
allowed.
Preamble
The purpose of this License is to make a manual, text-
book, or other functional and useful document free in
the sense of freedom: to assure everyone the effective
freedom to copy and redistribute it, with or without
modifying it, either commercially or noncommercially.
Secondarily, this License preserves for the author and
publisher a way to get credit for their work, while not
being considered responsible for modifications made by
others.
This License is a kind of "copyleft", which means that
derivative works of the document must themselves be
free in the same sense. It complements the GNU General
Public License, which is a copyleft license designed for
free software.
We have designed this License in order to use it for
manuals for free software, because free software needs
free documentation: a free program should come with
manuals providing the same freedoms that the software
does. But this License is not limited to software manuals;
it can be used for any textual work, regardless of sub-
ject matter or whether it is published as a printed book.
We recommend this License principally for works whose
purpose is instruction or reference.
15.1.0. Applicability and definitions
This License applies to any manual or other work, in any
medium, that contains a notice placed by the copyright
holder saying it can be distributed under the terms of
this License. Such a notice grants a world-wide, royalty-free
license, unlimited in duration, to use that work under
the conditions stated herein. The "Document", below,
refers to any such manual or work. Any member
of the public is a licensee, and is addressed as "you".
You accept the license if you copy, modify or distribute
the work in a way requiring permission under copyright
law.
A "Modified Version" of the Document means any
work containing the Document or a portion of it, either
copied verbatim, or with modifications and/or translated
into another language.
A "Secondary Section" is a named appendix or a front-matter
section of the Document that deals exclusively
with the relationship of the publishers or authors of the
Document to the Document's overall subject (or to related
matters) and contains nothing that could fall directly
within that overall subject. (Thus, if the Document
is in part a textbook of mathematics, a Secondary
Section may not explain any mathematics.) The relationship
could be a matter of historical connection with
the subject or with related matters, or of legal, commercial,
philosophical, ethical or political position regarding
them.
The "Invariant Sections" are certain Secondary Sections
whose titles are designated, as being those of Invariant
Sections, in the notice that says that the Document
is released under this License. If a section does
not fit the above definition of Secondary then it is not
allowed to be designated as Invariant. The Document
may contain zero Invariant Sections. If the Document
does not identify any Invariant Sections then there are
none.
The "Cover Texts" are certain short passages of text
that are listed, as Front-Cover Texts or Back-Cover Texts,
in the notice that says that the Document is released under
this License. A Front-Cover Text may be at most 5
words, and a Back-Cover Text may be at most 25 words.
A "Transparent" copy of the Document means a
machine-readable copy, represented in a format whose
specification is available to the general public, that is
suitable for revising the document straightforwardly
with generic text editors or (for images composed of
pixels) generic paint programs or (for drawings) some
widely available drawing editor, and that is suitable for
input to text formatters or for automatic translation to
a variety of formats suitable for input to text formatters.
A copy made in an otherwise Transparent file format
whose markup, or absence of markup, has been arranged
to thwart or discourage subsequent modification
by readers is not Transparent. An image format is not
Transparent if used for any substantial amount of text.
A copy that is not Transparent is called "Opaque".
Examples of suitable formats for Transparent copies
include plain ASCII without markup, Texinfo input format,
LaTeX input format, SGML or XML using a publicly
available DTD, and standard-conforming simple HTML,
PostScript or PDF designed for human modification. Examples
of transparent image formats include PNG, XCF
and JPG. Opaque formats include proprietary formats
that can be read and edited only by proprietary word
processors, SGML or XML for which the DTD and/or
processing tools are not generally available, and the
machine-generated HTML, PostScript or PDF produced
by some word processors for output purposes only.
The "Title Page" means, for a printed book, the title
page itself, plus such following pages as are needed to
hold, legibly, the material this License requires to appear
in the title page. For works in formats which do not have
any title page as such, "Title Page" means the text near
the most prominent appearance of the work's title, preceding
the beginning of the body of the text.
A section "Entitled XYZ" means a named subunit of
the Document whose title either is precisely XYZ or contains
XYZ in parentheses following text that translates
XYZ in another language. (Here XYZ stands for a specific
section name mentioned below, such as "Acknowledgements",
"Dedications", "Endorsements", or "History".)
To "Preserve the Title" of such a section when
you modify the Document means that it remains a section
"Entitled XYZ" according to this definition.
The Document may include Warranty Disclaimers
next to the notice which states that this License applies
to the Document. These Warranty Disclaimers are con-
sidered to be included by reference in this License, but
only as regards disclaiming warranties: any other impli-
cation that these Warranty Disclaimers may have is void
and has no effect on the meaning of this License.
15.1.1. Verbatim copying
You may copy and distribute the Document in any
medium, either commercially or noncommercially, pro-
vided that this License, the copyright notices, and the
license notice saying this License applies to the Docu-
ment are reproduced in all copies, and that you add no
other conditions whatsoever to those of this License. You
may not use technical measures to obstruct or control
the reading or further copying of the copies you make
or distribute. However, you may accept compensation
in exchange for copies. If you distribute a large enough
number of copies you must also follow the conditions in
section 15.1.2.
You may also lend copies, under the same conditions
stated above, and you may publicly display copies.
15.1.2. Copying in quantity
If you publish printed copies (or copies in media that
commonly have printed covers) of the Document, numbering
more than 100, and the Document's license notice
requires Cover Texts, you must enclose the copies in cov-
ers that carry, clearly and legibly, all these Cover Texts:
Front-Cover Texts on the front cover, and Back-Cover
Texts on the back cover. Both covers must also clearly
and legibly identify you as the publisher of these copies.
The front cover must present the full title with all words
of the title equally prominent and visible. You may add
other material on the covers in addition. Copying with
changes limited to the covers, as long as they preserve
the title of the Document and satisfy these conditions,
can be treated as verbatim copying in other respects.
If the required texts for either cover are too voluminous
to fit legibly, you should put the first ones listed (as
many as fit reasonably) on the actual cover, and continue
the rest onto adjacent pages.
If you publish or distribute Opaque copies of the Doc-
ument numbering more than 100, you must either in-
clude a machine-readable Transparent copy along with
each Opaque copy, or state in or with each Opaque
copy a computer-network location from which the gen-
eral network-using public has access to download using
public-standard network protocols a complete Transpar-
ent copy of the Document, free of added material. If you
use the latter option, you must take reasonably prudent
steps, when you begin distribution of Opaque copies in
quantity, to ensure that this Transparent copy will re-
main thus accessible at the stated location until at least
one year after the last time you distribute an Opaque
copy (directly or through your agents or retailers) of that
edition to the public.
It is requested, but not required, that you contact the
authors of the Document well before redistributing any
large number of copies, to give them a chance to provide
you with an updated version of the Document.
15.1.3. Modifications
You may copy and distribute a Modified Version of the
Document under the conditions of sections 15.1.1 and
15.1.2 above, provided that you release the Modified
Version under precisely this License, with the Modified
Version filling the role of the Document, thus licensing
distribution and modification of the Modified Version to
whoever possesses a copy of it. In addition, you must do
these things in the Modified Version:
A. Use in the Title Page (and on the covers, if any) a
title distinct from that of the Document, and from those
of previous versions (which should, if there were any, be
listed in the History section of the Document). You may
use the same title as a previous version if the original
publisher of that version gives permission.
B. List on the Title Page, as authors, one or more persons
or entities responsible for authorship of the modifications
in the Modified Version, together with at least
five of the principal authors of the Document (all of its
principal authors, if it has fewer than five), unless they
release you from this requirement.
C. State on the Title Page the name of the publisher of
the Modified Version, as the publisher.
D. Preserve all the copyright notices of the Document.
E. Add an appropriate copyright notice for your modifications
adjacent to the other copyright notices.
F. Include, immediately after the copyright notices, a
license notice giving the public permission to use the
Modified Version under the terms of this License, in the
form shown in the Addendum below.
G. Preserve in that license notice the full lists of In-
variant Sections and required Cover Texts given in the
Document's license notice.
H. Include an unaltered copy of this License.
I. Preserve the section Entitled "History", Preserve its
Title, and add to it an item stating at least the title, year,
new authors, and publisher of the Modified Version as
given on the Title Page. If there is no section Entitled
"History" in the Document, create one stating the title,
year, authors, and publisher of the Document as given
on its Title Page, then add an item describing the Modified
Version as stated in the previous sentence.
J. Preserve the network location, if any, given in the
Document for public access to a Transparent copy of the
Document, and likewise the network locations given in
the Document for previous versions it was based on.
These may be placed in the History section. You may
omit a network location for a work that was published at
least four years before the Document itself, or if the orig-
inal publisher of the version it refers to gives permission.
K. For any section Entitled "Acknowledgements" or
"Dedications", Preserve the Title of the section, and preserve
in the section all the substance and tone of each of
the contributor acknowledgements and/or dedications
given therein.
L. Preserve all the Invariant Sections of the Document,
unaltered in their text and in their titles. Section num-
bers or the equivalent are not considered part of the sec-
tion titles.
M. Delete any section Entitled "Endorsements". Such
a section may not be included in the Modified Version.
N. Do not retitle any existing section to be Entitled
"Endorsements" or to conflict in title with any Invariant
Section.
O. Preserve any Warranty Disclaimers.
If the Modified Version includes new front-matter sections
or appendices that qualify as Secondary Sections
and contain no material copied from the Document, you
may at your option designate some or all of these sections
as invariant. To do this, add their titles to the list of
Invariant Sections in the Modified Version's license notice.
These titles must be distinct from any other section
titles.
You may add a section Entitled "Endorsements", provided
it contains nothing but endorsements of your
Modified Version by various parties: for example, statements
of peer review or that the text has been approved
by an organization as the authoritative definition of a
standard.
You may add a passage of up to five words as a Front-Cover
Text, and a passage of up to 25 words as a Back-Cover
Text, to the end of the list of Cover Texts in the
Modified Version. Only one passage of Front-Cover
Text and one of Back-Cover Text may be added by (or
through arrangements made by) any one entity. If the
Document already includes a cover text for the same
cover, previously added by you or by arrangement made
by the same entity you are acting on behalf of, you may
not add another; but you may replace the old one, on explicit
permission from the previous publisher that added
the old one.
The author(s) and publisher(s) of the Document do
not by this License give permission to use their names
for publicity for or to assert or imply endorsement of any
Modified Version.
Combining documents
You may combine the Document with other documents
released under this License, under the terms dened in
section 4 above for modied versions, provided that you
include in the combination all of the Invariant Sections of
all of the original documents, unmodied, and list them
all as Invariant Sections of your combined work in its
license notice, and that you preserve all their Warranty
Disclaimers.
The combined work need only contain one copy of
this License, and multiple identical Invariant Sections
may be replaced with a single copy. If there are multi-
ple Invariant Sections with the same name but different
contents, make the title of each such section unique by
adding at the end of it, in parentheses, the name of the
original author or publisher of that section if known, or
else a unique number. Make the same adjustment to the
section titles in the list of Invariant Sections in the license
notice of the combined work.
In the combination, you must combine any sections
Entitled History in the various original documents,
forming one section Entitled History; likewise com-
bine any sections Entitled Acknowledgements, and
any sections Entitled Dedications. You must delete all
sections Entitled Endorsements.
Collections of documents
You may make a collection consisting of the Document
and other documents released under this License, and
replace the individual copies of this License in the vari-
ous documents with a single copy that is included in the
collection, provided that you follow the rules of this Li-
cense for verbatim copying of each of the documents in
all other respects.
You may extract a single document from such a col-
lection, and distribute it individually under this License,
provided you insert a copy of this License into the ex-
tracted document, and follow this License in all other
respects regarding verbatim copying of that document.
Aggregation with independent works
A compilation of the Document or its derivatives with
other separate and independent documents or works, in
or on a volume of a storage or distribution medium, is
called an "aggregate" if the copyright resulting from the
compilation is not used to limit the legal rights of the
compilation's users beyond what the individual works
permit. When the Document is included in an aggregate,
this License does not apply to the other works in the
aggregate which are not themselves derivative works of
the Document.
If the Cover Text requirement of section 15.1.2 is appli-
cable to these copies of the Document, then if the Doc-
ument is less than one half of the entire aggregate, the
Document's Cover Texts may be placed on covers that
bracket the Document within the aggregate, or the electronic
equivalent of covers if the Document is in electronic
form. Otherwise they must appear on printed covers
that bracket the whole aggregate.
Translation
Translation is considered a kind of modification, so you
may distribute translations of the Document under the
terms of section 15.1.3. Replacing Invariant Sections
with translations requires special permission from their
copyright holders, but you may include translations of
some or all Invariant Sections in addition to the original
versions of these Invariant Sections. You may include a
translation of this License, and all the license notices in
the Document, and any Warranty Disclaimers, provided
that you also include the original English version of this
License and the original versions of those notices and
disclaimers. In case of a disagreement between the trans-
lation and the original version of this License or a notice
or disclaimer, the original version will prevail.
If a section in the Document is Entitled Acknowl-
edgements, Dedications, or History, the require-
ment (section 15.1.3) to Preserve its Title (section 15.1.0)
will typically require changing the actual title.
Termination
You may not copy, modify, sublicense, or distribute the
Document except as expressly provided for under this
License. Any other attempt to copy, modify, sublicense
or distribute the Document is void, and will automati-
cally terminate your rights under this License. However,
parties who have received copies, or rights, from you under
this License will not have their licenses terminated
so long as such parties remain in full compliance.
Future revisions of this license
The Free Software Foundation may publish new, re-
vised versions of the GNU Free Documentation Li-
cense from time to time. Such new versions will be
similar in spirit to the present version, but may dif-
fer in detail to address new problems or concerns. See
http://www.gnu.org/copyleft/.
Each version of the License is given a distinguishing
version number. If the Document specifies that a particular
numbered version of this License "or any later
version" applies to it, you have the option of following
the terms and conditions either of that specied version
or of any later version that has been published (not as a
draft) by the Free Software Foundation. If the Document
does not specify a version number of this License, you
may choose any version ever published (not as a draft)
by the Free Software Foundation.
ADDENDUM: How to use this License for your
documents
To use this License in a document you have written, in-
clude a copy of the License in the document and put the
following copyright and license notices just after the title
page:
Copyright (c) <year> <your name>. Permission is
granted to copy, distribute and/or modify this docu-
ment under the terms of the GNU Free Documentation
License, Version 1.2 or any later version published by the
Free Software Foundation; with no Invariant Sections,
no Front-Cover Texts, and no Back-Cover Texts. A copy
of the license is included in the section entitled GNU
Free Documentation License.
If you have Invariant Sections, Front-Cover Texts and
Back-Cover Texts, replace the "with...Texts." line with
this:
with the Invariant Sections being <list their titles>,
with the Front-Cover Texts being <list>, and with the
Back-Cover Texts being <list>.
If you have Invariant Sections without Cover Texts, or
some other combination of the three, merge those two
alternatives to suit the situation.
If your document contains nontrivial examples of pro-
gram code, we recommend releasing these examples in
parallel under your choice of free software license, such
as the GNU General Public License, to permit their use
in free software.
Copyright
Copyright (c) 2000, 2001, 2002 Free Software Foundation,
Inc. 59 Temple Place, Suite 330, Boston, MA 02111-1307,
USA
Everyone is permitted to copy and distribute verbatim
copies of this license document, but changing it is not
allowed.
15.2. Authors position on commercial
publishing
Thanks to modern technology, one can prepare an en-
tire book electronically, in ready-to-print form, on a per-
sonal computer. Sending an electronic book across the
world takes at most a few minutes and costs as much
as a cup of tea. The author encourages everyone inter-
ested in reading the text to download and/or print it, in
whole or in part. The two-column formatting of the text
is designed to require the least possible amount of paper
when printed. Everyone is also entitled to commission a
print shop to produce bound copies of the text, in which
case the single-column formatting may be preferred. The
cost of one bound copy may be estimated as 10 to 20 US
dollars.
A commercial publisher may want to offer profession-
ally printed and bound copies for sale. Since the book
is distributed with the complete set of source files, it
will be a matter of minutes to reformat the book according
to the taste or constraints of a particular publisher,
even without the author's assistance. The author
welcomes commercial printing of the text, as long
as the publisher adheres to the conditions of the license
(the GNU FDL). Since the FDL disallows granting exclu-
sive distribution rights, the author cannot sign a stan-
dard exclusive-rights contract with a publisher. How-
ever, the author will consider signing any publishing
contract that leaves intact the conditions of the FDL.