Documente Academic
Documente Profesional
Documente Cultură
Encyclopedia of Physical Science and Technology EN005D-207 June 15, 2001 20:26
Electrodynamics, Quantum
W. P. Healy
RMIT University
I. Introduction
II. Nonrelativistic Quantum Electrodynamics
III. Relativistic Quantum Electrodynamics
199
P1: GLM Final Pages
Encyclopedia of Physical Science and Technology EN005D-207 June 15, 2001 20:26
and positrons. In its most accurate form, the theory com- was not formulated until 1621, when it was discovered
bines the methods of quantum mechanics with the prin- experimentally by Snell. Snell’s law was later derived
ciples of special relativity; often, however, it is sufficient theoretically from Fermat’s celebrated principle of least
to treat the charged particles in nonrelativistic approxima- time.
tion. Each part of the complete dynamical system of radi- From about the middle of the seventeenth century to the
ation and charges displays a characteristic wave–particle end of the nineteenth century there were two competing,
duality. Thus, electrons behave in many circumstances as and mutually contradictory, theories of light. The wave
particles, but they can also exhibit wave properties such theory was initiated by Hooke and Huygens following the
as interference and diffraction. Similarly, electromagnetic first observations of interference and diffraction. Huygens
radiation, which was considered classically as a wave field, enunciated a principle, based on the wave theory, from
may have particle properties ascribed to it under suitable which he derived the laws of reflection and refraction. He
conditions (e.g., in scattering experiments). The particles also discovered the polarization properties of light. These
or quanta associated with the electromagnetic field are properties, as well as the law of rectilinear propagation,
called photons. were difficult to explain by the wave theory, which at that
Quantum electrodynamics is a highly successful theory, time dealt only with longitudinal waves in a hypothetical
despite certain mathematical and interpretational difficul- “aether,” analogous to sound waves in air. These difficul-
ties inherent in its formulation. Its success is due in part ties led Newton to propose a corpuscular theory, according
to the weakness of the coupling between the radiation and to which light is emitted from luminous bodies in a stream
the charges, which makes possible a perturbative treat- of small particles or corpuscles. Newton’s views inhibited
ment of the interaction of the two parts of the system. The any further advances in the wave theory until about the
theory accounts for many phenomena, including the emis- beginning of the nineteenth century. In the meantime, the
sion or absorption of radiation by atoms or molecules, the fact that light has a finite speed was confirmed by Römer
scattering of photons or electrons, and the creation or an- through observations of eclipses of the moons of Jupiter.
nihilation of electron–positron pairs. Its most famous pre- This occurred in 1675, more than two millenia after the
dictions concern the electromagnetic shift of energy levels time of Empedocles. (The speed of light in empty space
observed in atomic spectra and the anomalous magnetic is denoted by c and is approximately 2.998 × 1010 cm/sec
moment of the electron; both of these predictions are in in cgs units.)
good agreement with experimental results. Quantum elec-
trodynamics also includes the interaction of photons with
B. Classical Electrodynamics
muons and tauons (which differ from electrons only in
mass) and their antiparticles. The validity of the theory has The revival of the wave theory began with Young’s inter-
been tested in high-energy collision experiments involving pretation of interference experiments. In particular, the de-
these particles down to distances less than 10−16 cm. structive interference of two light beams at certain points
in space seemed totally inexplicable on the corpuscular
hypothesis but was readily accounted for by the wave the-
ory. Young also suggested that light waves execute trans-
I. INTRODUCTION
verse rather than longitudinal vibrations, as this could then
explain the observed polarization properties. The wave
A. Early Theories of Light
theory was further developed by Fresnel, who applied it
Since quantum electrodynamics is the modern theory of to phenomena involving diffraction, interference of po-
electromagnetic radiation, including visible light, it is in- larized light, and crystal optics. An important test of the
structive to begin with a brief historical review of previ- theory was provided by the comparison of the speeds of
ous theories. The nature of light has long been a subject light in media with different refractive indexes. Accord-
of interest to philosophers and scientists. In the fifth cen- ing to the wave theory, light travels slower in an optically
tury B.C., Empedocles of Acragas held that light takes denser medium, but according to the corpuscular theory
time to travel from one place to another but that we can- it travels faster. The results of experiments carried out in
not perceive its motion. He knew that the moon shines 1850 agreed with the predictions of the wave theory.
by light reflected from the sun and was also aware of The wave theory was in a certain sense completed when
the cause of solar eclipses. Heron of Alexandria, who is Maxwell established his equations for the electromagnetic
thought to have lived in the first or second century A.D., field and showed that they have solutions corresponding to
discussed the rectilinear propagation properties of light. transverse electromagnetic waves in which both the elec-
In his book Catoptrica he derived the law of reflection us- tric and magnetic induction field vectors oscillate perpen-
ing a principle of minimal distance. The law of refraction dicularly to the direction of propagation. The speed of
P1: GLM Final Pages
Encyclopedia of Physical Science and Technology EN005D-207 June 15, 2001 20:26
C. Photons
Despite the success of classical electromagnetic theory in
dealing with the propagation, interference, and scattering
of light, experiments carried out about the end of the nine-
teenth century and the beginning of the twentieth century
led to the reintroduction of the corpuscular theory, though FIGURE 1 The horizontal lines represent the discrete energy lev-
in a form different to that proposed by Newton. The depar- els of a quantized harmonic oscillator of frequency ν. The energy
of the ground state is taken to be 0 and the equally spaced lev-
ture from classical concepts began in 1900 when Planck
els 0, hν, 2hν, . . . , nhν, . . . are labeled by the quantum numbers
published his law of black-body radiation. In this law the 0, 1, 2, . . . , n, . . . , respectively. Excitation of the radiation field at
quantum of action h (approximately 6.626 × 10−27 erg frequency ν to level nhν corresponds to the addition of n photons,
sec), now known as Planck’s constant, made its first ap- each with energy hν, to the field.
pearance in physics. Planck’s law for the variation with
frequency of the energy in black-body radiation at a given
temperature is closely related to the existence of discrete atoms. The light quantum hypothesis states not only that
energy levels for the electromagnetic field, even though the energy of monochromatic radiation of frequency ν is
Planck, in his original derivation of the law, did not con- made up of integral multiples of the quantum hν, but also
sider the field itself to be quantized. A black body is one that the momentum is made up of integral multiples of the
that absorbs all the electromagnetic energy incident on it. quantum h/λ, where λ is the wavelength of the radiation
It was shown by Kirchhoff in 1860 that when such a body (ν and λ are related by the equation νλ = c). This hypothe-
is heated, the emitted radiation does not depend on the sis contrasts sharply with the classical picture in which the
detailed composition of the body but only on its absolute energy and momentum are regarded as continuously vari-
temperature. Radiation confined in a state of thermal equi- able. The existence of discrete light quanta, or photons, is
librium in a cavity with perfectly reflecting walls behaves not immediately evident on a macroscopic scale, however.
as black-body radiation. According to classical electro- Due to the smallness of Planck’s constant, even in a weak
magnetic theory, the cavity radiation can undergo simple electromagnetic field there is an enormous number of pho-
harmonic motion at a number of certain allowed or char- tons, provided the frequency is not too high. For example,
acteristic frequencies ν, the values of which depend on black-body radiation at a temperature of 300 K (room
the shape and size of the enclosure. These so-called radia- temperature) contains about 5.5 × 108 photons/cm3 , most
tion oscillators may be quantized, as in ordinary quantum of which correspond to frequencies in the infrared part of
mechanics. Then for each nonnegative integer n, an os- the electromagnetic spectrum. At a temperature of 6000 K
cillator with frequency ν has a nondegenerate stationary (roughly that at the surface of the sun), the bulk of the radi-
state with energy nhν above the ground-state energy (see ation has frequencies in the visible spectrum, and there are
Fig. 1). The possible values of the energy at this frequency about 4.4 × 1012 photons/cm3 . (The total number of pho-
thus form a discrete set 0, hν, 2hν, 3hν, . . . instead of a tons in black-body radiation is proportional to the cube of
continuum. It can be shown that quantization of the os- the absolute temperature.)
cillators in this way for all the allowed frequencies leads Individual photons manifest themselves only through
directly to Planck’s law. their interaction with atomic systems. According to Ein-
In 1905, Einstein made use of the idea of light quanta in stein’s treatment of the absorption and emission of radia-
order to explain the photoelectric effect and later applied tion, for example, an atom in a stationary state can make a
it to the emission as well as the absorption of radiation by transition to a lower or a higher energy level accompanied
P1: GLM Final Pages
Encyclopedia of Physical Science and Technology EN005D-207 June 15, 2001 20:26
The wave properties of radiation can be adequately de- posed on ¯ and cannot be fully subtracted out, as its value
scribed by using Maxwell’s equations for the electromag- is somewhat uncertain (due to the uncertainty q in the
netic field, and these are retained as operator equations position of the test body). This field, and hence ¯ , can
in the quantum theory of radiation. Suppose, for exam- indeed be made as small as desired by making the product
ple, that the electromagnetic field has a node (i.e., a point Qq sufficiently small, but then ¯ becomes relatively
where the field amplitudes always vanish) due to interfer- large. The experimental conditions for measurements of
ence at P. Then an atom placed at P has, in so far as it ¯ and ¯ are complementary—those that serve to measure
can be regarded as a geometrical point, zero probability ¯ more precisely will meaeure ¯ less precisely, and vice
of absorbing a photon from the field. The field amplitudes versa. The order of magnitude of the uncertainly product
are, however, subject to uncertainty relations, involving is given by
Planck’s constant h, which are analogous to the Heisen-
berg uncertainty relations for the position and momentum ¯ ¯ ∼ h/L 3 T
of a particle in ordinary quantum mechanics. The origin of and is independent of both Q and q . Thus, only for well-
the uncertainty relations for the fields may be understood separated regions or over long intervals of time can both
by considering a simple example. averages be measured with unlimited accuracy.
Let ¯ denote the average value of a component of the
electric field over a volume V and a time interval T . (Since
E. Electrons and Positrons
a field component at a definite point in space and a definite
instant of time appears an abstraction from physical real- Dirac’s original radiation theory had to be modified to
ity, only such average values need be considered.) Now bring it into line with the special theory of relativity. This
¯ may be found by measuring the change produced by
was true particularly of the treatment of the charged par-
the field in the momentum of a charged test body occu- ticles with which the electromagnetic field interacts. In
pying the volume during this time. Although the position 1928, Dirac had developed a one-particle relativistic wave
and momentum of the test body are uncertain by amounts equation for the electron that automatically accounted for
q and p that satisfy the Heisenberg uncertainty rela- the observed electron spin and predicted values for the fine
tion q p ∼ h, it can be shown that this does not impair structure of the energy levels of the hydrogen atom and of
the accuracy of the field measurement, provided a suffi- hydrogen-like ions that were in agreement with the exper-
ciently massive and highly charged body (which is there- imental data of that time. The Dirac equation, however,
fore part of a macroscopic measuring instrument) is used. also has extraneous solutions corresponding to negative-
The charge Q on the body must be such that the product energy states. To eliminate these, Dirac introduced in 1930
Qq is large; ¯ can then be measured to any desired ac- the so-called hole theory, according to which most of the
curacy ¯ . However, in the measurement of two average negative-energy states are occupied, each having one elec-
field strengths ¯ and ¯ (taken over two space regions V tron. Any unoccupied states, or holes, may be interpreted
and V during two times intervals T and T , respectively), as particles with positive energy and positive charge. These
it may not be possible to make both ¯ and ¯ as small particles were at first thought by Dirac to be protons,
as desired. If the separation distance L between V and V but were later identified as positrons, or antiparticles of
(see Fig. 4) is such that most light signals emitted from V electrons.
during the time interval T will reach V during the time The experimental discovery of the positron by Ander-
interval T , then the measurement of ¯ will influence that son in 1932 lent support to Dirac’s hole theory. Never-
of ¯ in a way that is to some extent unknown. The field theless, difficulties remained, such as the infinite (but un-
produced by the test body used to measure ¯ is superim- observable) charge density associated with the “sea” of
negative-energy electrons. These difficulties can be re-
moved, however, by treating Dirac’s one-particle wave
equation for the electron as a field equation and subject-
ing it to a process of quantization, similar in some respects
to the quantization of the classical electromagnetic field.
This method, which is often referred to as second quanti-
zation, was applied to the Dirac equation by Heisenberg
and others and resulted in the appearance of electrons and
FIGURE 4 Regions V and V in which the average electric fields
¯ are measured during time intervals T and T , respec-
positrons, on an equal footing, as quanta of the Dirac field,
¯ and
tively. The separation distance L is such that most light signals just as photons appear as quanta of the Maxwell field.
emitted from V during the interval T will reach V during the inter- There are, however, some differences between the meth-
val T . ods of second quantization used for the Dirac and Maxwell
P1: GLM Final Pages
Encyclopedia of Physical Science and Technology EN005D-207 June 15, 2001 20:26
fields, which stem from the different characteristics of the positron pair creation gave rise to an infinite vacuum po-
associated particles or quanta. Photons have zero rest mass larization in an external field and also implied an infinite
(but have nonzero momentum because they travel at the self-energy for the photon.
speed of light), are electrically neutral, have spin 1 (in units The need to extract finite results from the formalism be-
✏
of h, which is Planck’s constant divided by 2π ), and are came acute when refinements in experimental technique
bosons (i.e., any number of photons can occupy a given revealed small discrepancies between the observed fine
state). Electrons and positrons have the same nonzero rest structure of the energy levels of atomic hydrogen and that
mass, carry equal but oppositely signed charges (by con- given by Dirac’s one-particle relativistic wave equation.
vention, this is negative for the electron and positive for These differences, whose existence had been suspected
the positron), have spin 12 , and are fermions (i.e., not more for some time, were measured accurately by Lamb and
than one electron or positron can occupy a given state). It Retherford in 1947. In the same year, Kusch and Foley
was shown by Pauli that there is a connection between the found that the value of the intrinsic magnetic moment of
spin of a particle and its so-called statistics—particles with an electron in an atom also differs slightly from that pre-
integer spin are bosons and are not subject to the exclusion dicted by the Dirac theory. For it to be shown that these
principle, whereas particles with half odd-integer spin are discrepancies could be explained as radiative effects in
fermions and are subject to the exclusion principle. It is quantum electrodynamics, it was necessary first to recog-
necessary for photons to be bosons in order that the quan- nize that the mass and charge of the bare electrons and
tized electromagnetic field may have a classical counter- positrons that appear in the formalism cannot have their
part, which is realized in the limit of large photon occu- experimentally measured values. Since the electromag-
pation numbers. The quantized Dirac field, on the other netic field that accompanies an electron, for example, can
hand, does not have a physically realizable classical limit. never be “switched off,” the inertia associated with this
field contributes to the observed mass of the electron;
the bare mechanical mass itself is unobservable. Simi-
F. Divergences and Renormalization
larly, an electromagnetic field is always accompanied by
In quantum electrodynamics, as in classical electrodynam- a current of electrons and positrons whose influence on
ics, there are no known exact solutions to the equations for the field contributed to the measured values of charges.
the complete dynamical system of radiation and charges. The parameters of mass and charge, therefore, had to be
Indeed, from a purely mathematical viewpoint, the ques- renormalized to express the theory in terms of observ-
tion of even the existence of such solutions is still an open able quantities. The results for the shift of energy levels
one. Approximate solutions may be found by assuming (now known as the Lamb shift) and the anomalous mag-
that the coupling between the two parts of the system is netic moment of the electron then turned out to be finite
weak and using perturbation theory. This is justified by and were, moreover, in good agreement with experimental
the smallness of the fine-structure constant α, which gives results. The use of explicitly relativistic methods of cal-
a measure of the strength of the coupling: culation, developed by Tomonaga and Schwinger, was es-
sential in avoiding possible ambiguities in this procedure.
α = e2 /4π h c ≈
✏ 1
137
,
Further important contributions were made by Dyson, who
where e is the magnitude of the charge on the electron showed that the renormalized theory gave finite results for
and rationalized cgs units are being used (e ≈ 1.355 × interaction processes of arbitrary order, corresponding to
10−10 g1/2 cm3/2 /sec). arbitrary powers of the coupling constant e, and by Feyn-
It was found in the 1930s and 1940s that the calcu- man, who introduced a diagrammatic representation of the
lations for many processes, when taken beyond the first mathematical expressions for these processes, which are
approximation, gave divergent results. Some divergences often of considerable complexity.
(the so-called infrared divergences) were due to deficien- The Feynman-diagram technique and Dyson’s pertur-
cies in the approximation method itself. Others (ultravio- bation theory are now part of the standard formulation of
let divergences) were associated with the problem of the quantum electrodynamics. This formulation and some of
structure and self-energy of the electron and other ele- its applications will be outlined in Sections II and III.
mentary particles. This problem had also arisen in classi- In this article only the electromagnetic interactions of
cal electrodynamics, where the electron was assumed to electrons and positrons (or, more generally, of charged
have a structure-dependent electromagnetic contribution leptons, which include muons and tauons and their antipar-
included in its inertial mass. In quantum electrodynam- ticles) are considered. These particles also participate in
ics, however, there occurred additional divergences of a the so-called weak interaction (and, of course, in the much
radically different nature, due to effects that have no clas- weaker gravitational interaction). A unified theory of the
sical analogues. For example, the possibility of electron– electromagnetic and weak interactions has been developed
P1: GLM Final Pages
Encyclopedia of Physical Science and Technology EN005D-207 June 15, 2001 20:26
in recent years. Many elementary processes, however, are ergy of order mc2 (where m is the rest mass of the lightest
dominated by electromagnetic effects, and these alone charged particle, namely the electron) and will therefore
form the subject matter of quantum electrodynamics. be excluded if ν
νc , where νc is defined by hνc = mc2
and is about 1020 Hz. (Here νc is the frequency associated
with the Compton wavelength of the electron given by
II. NONRELATIVISTIC QUANTUM λc = h/mc.) It follows that hard X-rays and high-energy
ELECTRODYNAMICS gamma rays are to be omitted from consideration in this
A. Approximations section.
dynamical variables (such as energy and momentum) by said to correspond to a pure rotation in the generalized
linear operators, which act on the vectors to produce other vector space. Indeed, the individual probabilities |c|2 do
vectors of the same kind. The vacuum state is that for not change with time, since the probability amplitudes c
✏
which every oscillator has its lowest energy. It can be as- change only through a phase factor exp(–i Et/ h ). This is
sumed, for convenience, that the energy of the vacuum consistent with the fact that for the free field, photons are
state is zero. The so-called zero-point energy 1/2hν of an neither created nor destroyed.
oscillator with frequency ν is therefore discarded, but this
amounts merely to a shift in the datum point for measuring C. The Quantized Electromagnetic Field
energies. (Nevertheless, changes in the zero-point energy While the treatment of the radiation field as an assembly of
can give rise to measurable forces, for example, between photons may seem to emphasize its corpuscular aspects,
conducting plates. This is called the Casimir effect.) the wave properties are, nevertheless, also contained in the
If a radiation oscillator of mode (k, λ) is excited to its formalism. In particular, Maxwell’s equations in empty
nth stationary state, with energy nhν, then this is taken to space remain valid, although they now appear as opera-
correspond physically to the presence of n photons, each tor equations rather than as equations for classical fields.
with energy hν, for that mode of the field. An occupation- Both the electric field and the magnetic induction field
number state is one with a specified number of photons in become operators that can be expressed as linear combi-
each mode. The number n kλ of photons in mode (k, λ) is nations of the photon annihilation and creation operators.
then called the occupation number for that mode. Only a If these expressions are inserted into the classical formula
finite number of occupation numbers can be nonzero and for the field energy, then the expansion of the Hamiltonian
the total numbers of photons is n kλ with the summa- operator in terms of the annihilation and creation operators
tion extending over occupied modes. Similarly, the total is recovered. Thus,
energy
E of the photons in an occupation-number state 1
is (n kλ hν). The vacuum state can be thought of as an HRAD = (2 + 2 ) d V.
occupation-number state for which every occupation num- 2
ber is zero. (It is true that an infinite zero-point energy also appears.
The operator that represents the total energy of the radia- This energy may, however, be discarded, as were the zero-
tion field is called the Hamiltonian operator and is denoted point energies of the individual oscillators.) Similarly, the
by HRAD . It can be expressed in terms of photon annihila- classical expression
tion and creation operators. The annihilation operator for
1
mode (k, λ), when acting on an occupation-number state × dV
c
vector, reduces the number of photons for that mode by
one, and when acting on the vacuum-state vector gives for the field momentum, obtained from Poynting’s theo-
the zero vector. Similarly the creation operator increases rem, implies, when reinterpreted in terms of annihilation
✏
the number of photons by one. (In the context of the ele- and creation operators, that a momentum h k is to be as-
mentary theory of the harmonic oscillator, these operators cribed to each photon of mode (k, λ). This is in agreement
are usually called lowering and raising operators, respec- with Einstein’s hypothesis, since h|k| = h/λ.
✏
greater magnitude than those in the vacuum state. Now the momentum operators p1 , p2 , . . . , p N . The Hamiltonian
occupation-number states resemble incoherent superposi- operator for this system is given by
tions of classical plane-wave states, since they are asso-
N
1 2
ciated with definite wave vectors and polarization vectors HATOM = pα + U,
but do not have well-defined phases (in the classical sense), α=1
2m α
whereas a classical plane-wave field has a simple harmonic
time dependence. There exist other states of the quantized where the first term represents the kinetic energy and the
field, however, called coherent or quasi-classical states, second the potential energy. The potential energy U de-
in which the phase is more well defined but the number pends on the positions and momenta of the particles, their
of photons, and hence the energy and momentum, is less charges, and the external fields, if any are present.
sharp than for the occupation-number states. To each state It is often a good approximation to treat the nuclei as
of the classical field there corresponds a unique coherent fixed and to regard the coordinates and momenta of the
state of the quantized field such that (a) the mean values of electrons alone as dynamical variables. This is possible
the quantized field components are equal to the classical because of the large mass of the protons and neutrons com-
field components and (b) the mean value of the quantized pared with that of the electrons (proton mass ≈ 1836 ×
energy is equal to the classical energy. The coherent states electron mass). The fixed-nuclei approximation involves,
are also remarkable in the following respect: the field fluc- among other things, the neglect of the recoil of the atoms
tuations for these states are exactly the same as those for which should accompany the absorption or emission of
the vacuum state. photons. The recoil velocity is, however, normally very
The operators representing the components of the elec- small. For example, the speed imparted to a hydrogen
tric field and the magnetic induction field satisfy cer- atom by a photon with a frequency in the visible spectrum
tain commutation relations, which may be derived from is of the order of 10−8 times the speed of light in vacuo.
those for the photon annihilation and creation operators. Such a speed results in only a very slight Doppler shift in
Just as in ordinary quantum mechanics the commutation the frequency of the emitted radiation. In the fixed-nuclei
relation approximation, the Hamiltonian operator HATOM has, in
general, a discrete set of energy levels Er , E s , . . . cor-
q p − pq = i h ✏
parts of the complete system. In the absence of HINT , the there is a probability that after a time t a photon of mode
product vectors of the type mentioned above represent sta- (k, λ) has been created and the atom has made a transition
tionary states in which the photon occupation numbers are to a state s with lower energy E s , where
constant and the atom has a fixed energy. Due to the pres-
hν ≈ Er − E s .
ence of HINT , however, transitions between these states
can occur, in which, for example, the atom loses or gains Since there are no photons present initially, this process is
energy and the number of photons is correspondingly in- known as spontaneous emission. It is represented graph-
creased or decreased. The interaction Hamiltonian may be ically by the Feynman diagram in Fig. 6. Single-photon
expressed as the sum of two parts, one proportional to e spontaneous emission involves, in the lowest order of per-
and the other to e2 : turbation theory, only that term in the interaction Hamil-
tonian that is proportional to e. Furthermore, the so-called
HINT = eH1 + e2 H2 ,
dipole approximation can be used for optical or lower fre-
where H1 is linear and H2 is quadratic in the photon anni- quencies and bound states of atoms or small molecules,
hilation and creation operators. As a consequence, these since then the wavelength of the emitted photon is much
two terms give rise to processes in which the number of larger than the dimensions of the region in which the
photons changes by one or two, respectively. atomic wave functions differ significantly from zero. The
The use of the so-called Coulomb gauge is very con- emission probability can sometimes be expressed in terms
venient in nonrelativistic theory. In this gauge only the of a constant transition rate (that is, a probability per unit
transverse electromagnetic field, which is a superposition time for the transition to occur) known as Einstein’s A
of modes with transverse polarization vectors (ê(λ) · k̂ = 0), coefficient. The total transition rate for emission of the
is quantized. The effect of the longitudinal field, responsi- photon in any direction and with any polarization is given
ble for the instantaneous Coulomb interaction between the in dipole approximation by
charges, is treated as a potential as in ordinary quantum
Ars = (16π 3 ν 3 /3hc3 )|µr s |2 ,
mechanics and is included in the expression for HATOM .
The time evolution of the complete system is governed where µr s denotes the dipole transition moment, which
by Schrödinger’s equation: can be calculated once the wave functions for the atomic
states r and s are known. Thus, in dipole approximation,
∂
ih✏
= H , Einstein’s A coefficient is proportional to the cube of the
∂t transition frequency and the square of the length of the
where now H is the total Hamiltonian and represents dipole transition moment.
the state of both the field and the atom at time t. No The reciprocal of Ars is the average lifetime of the upper
exact solutions of this equation are known. Fortunately, state r with respect to the lower state s. For example,
however, HINT is of order e and, hence, can be regarded for optical transitions with a photon wavelength of order
as a small perturbation to the unperturbed Hamiltonian 5000 Å (1 Å = 10−8 cm) and a dipole transition moment
HRAD + HATOM . Time-dependent perturbation theory can
then be used to calculate approximately the probabilities
for transitions between unperturbed states. The total en-
ergy is always exactly conserved in transitions between
initial and final states. Since the perturbation is small, the
unperturbed energy is approximately conserved in such
transitions.
E. Applications
Applications of the theory to the emission and absorption
of photons by atoms and the scattering of photons by free FIGURE 6 Feynman diagram for spontaneous emission. The left-
electrons will now be considered. hand and right-hand portions of the parallel horizontal lines repre-
sent the initial and final atomic states r and s, respectively. (Double
lines are used to indicate that the electrons are not free but are
1. Spontaneous Emission—Einstein’s bound to the atomic nucleus.) The dotted line represents the emit-
A Coefficient ted photon of mode (k, λ). This is created when the atom under-
goes the transition r → s. The vertex labeled e corresponds to the
If initially (a) the atom is in an excited state r with energy first-order term in the interaction Hamiltonian, which is responsible
Er and (b) the radiation field is in the vacuum state, then for this process in the lowest order of perturbation theory.
P1: GLM Final Pages
Encyclopedia of Physical Science and Technology EN005D-207 June 15, 2001 20:26
FIGURE 8 Feynman diagrams for Thomson scattering. Dotted lines represent photons and solid lines represent free
electrons.
P1: GLM Final Pages
Encyclopedia of Physical Science and Technology EN005D-207 June 15, 2001 20:26
arriving from the left, and the scattered photon and recoil 4. Other Applications
electron disappearing to the right. The contribution of
The field of application of nonrelativistic quantum electro-
Fig. 8a arises from the e2 term in the interaction Hamil-
dynamics has expanded considerably in recent years due
tonian; it may be said that here the incident photon is
to the development of lasers and their use as spectroscopic
annihilated and the scattered photon simultaneously cre-
tools for investigating a variety of physical and chemical
ated. In Figs. 8b and 8c, on the other hand, the annihilation
systems. Lasers are sources of highly coherent and very
and creation are represented by two different one-photon
intense beams of light. In the subject of quantum optics,
vertices, each arising from the e term in the interaction
the quantum statistical properties, such as the degree of
Hamiltonian. These diagrams differ in the order in which
coherence, of the light beam itself may be the object of in-
the creation and annihilation take place. Overall momen-
vestigation. For example, the quasi-classical states of the
tum is conserved at every vertex.
radiation field referred to earlier exhibit a higher degree
It must be emphasized, however, that the contributions
of coherence than that of the occupation-number states,
of Figs. 8a–8c cannot be physically separated, since only
and these in turn are less chaotic than fields in thermal
the initial and final states are observed. For this reason, the
equilibrium, in which the distribution of photons follows
intermediate states are often referred to as virtual states.
Planck’s radiation law.
Indeed, it may be shown that the contributions of Figs. 8b
The high intensity of the light beams that may be
and 8c effectively cancel, as their sum is of order h k /(mc)
✏
the cgs expressions h k, mc2 , and e2 /(4π hc) for the mo-
✏ ✏
convert from natural units to cgs units, by inserting appro- The difficulties of interpretation associated with the
✏
priate factors of h and c. negative-energy solutions of the Dirac equation have al-
If quantum electrodynamics is to satisfy the principles ready been mentioned. These difficulties disappear in the
of the special theory of relativity, its equations must be second-quantized version of the theory, in which electrons
covariant under Lorentz transformations. Lorentz trans- and positrons are treated on an equal footing and all have
formations relate the space–time coordinates x, y, z, and positive energy. Moreover, this version provides a calcu-
t of events as seen by observers using inertial frames of lus for processes involving annihilation and creation of
reference moving with uniform velocity relative to each electrons and positrons—in the high-energy regime, the
other. (The coordinates x, y, z, and t are the components number of these particles is no longer conserved.
of a four-dimensional vector, to be denoted simply by x.) The spinor ψ and its related adjoint spinor ψ̄ are first
A covariant equation has the same form for two such ob- expressed in terms of plane-wave solutions of the free-
servers. The fact that physical laws are expressible as co- particle equation. These solutions correspond to particles
variant equations means that these laws are the same for with energy E, momentum p, and rest mass e satisfying
all observers using inertial reference frames. the relativistic energy–momentum relation
The state of a quantum-mechanical system (for exam-
E 2 = |p|2 + m 2 .
ple, the electromagnetic field in vacuo) is specified in rel-
ativistic theory on a three-dimensional spacelike hyper- The coefficients in the expansions of ψ and ψ̄ may then
plane. In a given inertial frame, this consists of either all be interpreted as annihilation and creation operators for
the points in three-dimensional space at a particular in- electrons and positrons in definite momentum and spin
stant of time or all the events on a two-dimensional plane states. The spin states can be chosen to be helicity states
moving for all time perpendicularly to itself with constant of the electrons and positrons, that is, states in which the
speed greater than that of light. Two distinct events on a component of spin in the direction of motion is either 12
spacelike hyperplane cannot be connected by signals trav- (right-hand helicity) or − 12 (left-hand helicity). It is only
eling with speed less than or equal to the speed of light, the component of spin in the direction of motion that is
and so two measurements made in the vicinity of the cor- invariant under Lorentz transformations.
responding space–time points will not interfere. This is The algebra of the creation and annihilation operators
known as microscopic causality. The whole of the four- for electrons and positrons differs from that of the cre-
dimensional space–time manifold is filled with a set of ation and annihilation operators for photons. (Technically,
parallel spacelike hyperplanes, which may be labeled by it involves anticommutation instead of commutation rela-
an invariant timelike parameter τ , with τ ranging from tions.) The difference arises from the fact that, whereas
−∞ to ∞. The evolution of the system is described by photons are bosons, electrons and positrons are fermions
specifying the state for each hyperplane τ and is deter- and are subject to the exclusion principle. The only pos-
mined dynamically, through Schrödinger’s equation, on sible occupation numbers for electrons or positrons are
the interval [τ1 , τ2 ], if the state is specified at τ1 and no therefore 0 or 1. This is in agreement with Pauli’s spin-
measurements are made until τ2 . statistics theorem, since the spin of an electron or a
positron is half an odd integer. It can be shown that quantiz-
ing the Dirac field by using commutation relations instead
B. Electrons and Positrons
of anticommutation relations leads to a Hamiltonian op-
The one-particle relativistic theory of the electron is based erator with energy levels that are not bounded below and,
on the Dirac equation. This is a differential equation, with hence, one for which no stable vacuum (ground) state ex-
matrix coefficients, for a spinor wave function ψ(x) having ists. (Similarly, quantizing the Maxwell field, which has
four components ψ µ (x) (µ = 0, 1, 2, 3). The requirement intrinsic spin 1, by using anticommutation relations in-
that the Dirac equation be covariant determines the behav- stead of commutation relations leads to a breakdown of
ior of ψ under Lorentz transformations. Since the electron microscopic causality.)
is now described by a four-component spinor rather than
by a one-component scalar wave function, it has extra
C. Covariant Quantization of
degrees-of-freedom, over and above those allowed by the
the Electromagnetic Field
Schrödinger theory. These correspond to the spin or intrin-
sic angular momentum of magnitude 12 (in natural units). The quantization of the electromagnetic field in the
Hence the spin appears automatically in the Dirac the- Coulomb gauge, though very useful for dealing with
ory of the electron and does not have to be added on in bound systems in nonrelativistic approximation, is not a
an ad hoc fashion, as it does in nonrelativistic quantum manifestly covariant procedure. In the Coulomb-gauge
mechanics. formalism, only the transverse field is quantized, while
P1: GLM Final Pages
Encyclopedia of Physical Science and Technology EN005D-207 June 15, 2001 20:26
the longitudinal field gives rise to an instantaneous inter- to the calculation of probabilities for physically realizable
action between the charges. The division of the field into states.
transverse and longitudinal components, however, is not Despite the fact that timelike and longitudinal photons
Lorentz covariant—these components do not transform are unobservable in real states of the system, their presence
separately on going from one inertial frame to another. is important and cannot be neglected in intermediate or
So that the covariance of the theory can be exhibited, it is virtual states. For example, the Coulomb interaction may
necessary to use a guage condition on the electromagnetic be described in terms of the virtual exchange of timelike
potentials that is itself covariant. The most convenient and longitudinal photons by charged particles. The appear-
such condition is the so-called Lorentz condition. This ance of these photons in the formalism is also required, of
condition also leads to certain difficulties which are, course, if the theory is to be manifestly Lorentz covariant.
however, overcome in the formalism developed by Gupta
and Bleuler.
D. Symmetries and Conservation Laws
The electromagnetic field is, in the first instance, quan-
tized in a covariant way without reference to the Lorentz- The coupling between the quantized Maxwell and Dirac
gauge condition. In contrast to the noncovariant treatment, fields is represented by a Lorentz-invariant interaction
there are now, for each wave vector k, four types of pho- Hamiltonian density INT that links scalar and vector po-
ton, corresponding to timelike and longitudinal as well as tentials to the charge and current densities. Here INT is
two transverse polarization vectors, which are, in addition, linear in the electromagnetic potentials and bilinear in the
four-dimensional rather than three-dimensional vectors. spinor fields ψ and ψ̄ and is also of order e—there is no e2
Moreover, the inner (or scalar) product of the infinite- term as in nonrelativistic theory. The interaction Hamil-
dimensional vector space on which the photon creation tonian density may be derived from a Lagrangian density
and annihilation operators act is not positive definite; that known as the minimal-coupling Lagrangian density.
is, there exist nonzero vectors in this space the square of It is interesting to note that certain continuous symme-
whose length is zero or negative. (This is due to the metric tries of the coupled systems are reflected in the structure
of the space–time continuum, which distinguishes time- of the complete Lagrangian density , which is Lorentz
like from spacelike directions. Thus, the four-dimensional invariant and gauge invariant. According to a theorem of
vector x is spacelike, lightlike, or timelike, relative to the Noether, these symmetries must lead to conservation laws.
origin, according to whether as x 2 + y 2 + z 2 − t 2 is posi- For example, the invariance of under time displacements
tive, zero, or negative.) This constitutes a serious difficulty, implies the conservation of energy; its invariance under
since the quantum-mechanical statistical interpretation re- space displacements and rotations implies the conserva-
quires a positive-definite inner product. For the resolution tion of linear and angular momentum, respectively; and its
of this problem, the use of the Lorentz-gauge condition, invariance under gauge transformations implies the con-
which has yet to be imposed, is of decisive importance. servation of charge.
It may be shown that neither the Lorentz condition nor The complete system also has three discrete symme-
Maxwell’s equations are satisfied as operator equations in tries. It is invariant under (a) charge conjugation C, that
the covariant theory, because they are incompatible with is, the interchange of particles and antiparticles (which
the commutation relations. They are, however, satisfied affects only electrons and positrons, since the photon is
as equations for expectation values (and hence are sat- its own antiparticle); (b) the parity operation P, that is,
isfied in the classical limit), provided a subsidiary con- space inversion or the interchange of left and right; and
dition is imposed on those state vectors that are to rep- (c) time reversal T . This invariance under C, P, and T is
resent physically realizable states. The effect of the sub- not shared by all the laws of nature. The nonconservation
sidiary condition is to make the timelike and longitudinal of parity in the weak interaction, which is responsible for
photons unobservable in real states of the system. These the dynamics of beta emission, was suggested by Lee and
states have either no timelike or longitudinal photons at Yang in 1956 and subsequently confirmed experimentally.
all or only certain allowed admixtures of them. More- That the combined transformation of charge conjugation
over, changing the allowed admixtures is merely equiv- and parity is also not a symmetry follows from the decay
alent to carrying out a gauge transformation that main- of the long-lived neutral K meson into two charged pions,
tains the Lorentz condition. The allowed admixtures are a decay that is forbidden by CP conservation. Invariance
always such that the contributions of timelike and longitu- under the conbined CPT transformation, established on
dinal photons to, for example, the energy and momentum, very general assumptions (Lorentz covariance and local-
cancel out, and only the contributions of the transverse, ity), then implies that time reversal is also not a symmetry
observable photons remain. Similarly, the statistical inter- of the physical world. Hence, the separate conservation of
pretation of the theory is consistent, when this is restricted C, P, and T is only an approximation which is, however,
P1: GLM Final Pages
Encyclopedia of Physical Science and Technology EN005D-207 June 15, 2001 20:26
FIGURE 9 Virtual processes depicted by basic vertex diagrams (to be viewed from left to right). (a) Photon (γ )
annihilation and electron–positron (e− e+ ) pair production. (b) Electron scattering and photon creation. (c) Photon
annihilation and positron scattering. Note the convention for the sense of the arrows used on the fermion lines to
distinguish electrons and positrons.
valid for phenomena that are adequately described by elec- cesses such as those depicted in the Feynman diagrams of
trodynamics alone. Fig. 9. These diagrams are called basic vertex diagrams.
There are in all eight such diagrams, corresponding to
processes in which a photon is either annihilated or cre-
E. The S Matrix and Feynman Diagrams
ated and two fermions are annihilated or created or one is
The S matrix in quantum electrodynamics is used to cal- annihilated and the other created.
culate probability amplitudes for processes in which par- Every Feynman diagram is a combination of some or
ticles (electrons, positrons, or photons) that are initially all of the eight basic vertex diagrams—an nth-order dia-
free are allowed to interact and scatter. In the so-called gram contains n vertices. Energy and momentum (which
interaction picture of the motion, the state vector for together form a four-dimensional vector) are conserved
the complete system evolves under the influence of the at every vertex. (This is in contrast to the nonrelativistic
interaction Hamiltonian INT , alone, and the S operator theory, in which momentum but not energy is conserved
(or scattering operator) maps the state vector on the hyper- in virtual processes.) However, the relativistic relation be-
plane τ = −∞ (that is, long before the interaction takes tween energy and momentum need not be satisfied for vir-
place) onto the state vector on the hyperplane τ = ∞ (that tual particles. Now this relation cannot be satisfied by all
is, long after the interaction has ceased): the particles participating in a basic vertex process, which
must therefore be a virtual rather than a real process. For
(∞) = S(−∞). example, electron–positron annihilation with the produc-
The S operator can be developed as a power series in the tion of a single photon is forbidden by energy–momentum
coupling constant e. With the help of a theorem due to conservation, even though it is allowed by charge conser-
Wick, the structure of the nth-order contribution, corre- vation. Hence the basic vertex diagrams can appear only
sponding to the nth power of e in the expansion, may be as parts of larger Feynman diagrams depicting processes
systematically analyzed and represented by Feynman di- for which overall energy and momentum are conserved
agrams. It is usually convenient to use Feynman diagrams and the relativistic energy–momentum relation is satisfied
in energy–momentum space. These represent all possi- by the (real) particles in the initial and final states.
ble virtual processes that can take place for given initial As an example of a real process, consider the Compton
and final momentum and polarization or spin states of scattering of photons by electrons. This is allowed in the
the particles. The Feynman rules enable expressions for second order of perturbation theory, and the Feynman dia-
the probability amplitude or S-matrix element S f i for the grams, each containing two vertices, are shown in Fig. 10.
process i → f to be written down directly from the di- The corresponding polarized cross section for the labo-
agrams. From this the cross section for the process may ratory reference system, in which the target electron is
be calculated to a given order in e and compared with the initially at rest, is given by the Klein-Nishina formula:
experimentally obtained value.
dσ α2 ν 2 ν ν
The lowest order of perturbation theroy (n = 1) involves = + + 4(ε · ε) − 2 .
2
d 4m 2 ν ν ν
only the first power of the interaction Hamiltonia INT ,
which is linear in the photon annihilation and creation op- Here ν and ν are the frequencies of the incident and scat-
erators and bilinear in the fermion (electron or positron) tered photons, respectively, and ε and ε are their (four-
annihilation and creation operators. This gives rise to pro- dimensional) transverse polarization vectors, which in this
P1: GLM Final Pages
Encyclopedia of Physical Science and Technology EN005D-207 June 15, 2001 20:26
have been accelerated to speeds about 0.999988 times the to a test of the pointlike nature of photon–lepton interac-
speed of light. For energies much greater than the thresh- tions down to distances of the order of 1/± , that is, less
old energy (or speeds even closer to that of light), the than 10−16 cm.
unpolarized differential and total cross sections for muon
pair production in the center-of-mass system reduce to
SEE ALSO THE FOLLOWING ARTICLES
dσ α2
= (1 + cos2 θ )
d 16E 2 ATOMIC PHYSICS • ATOMIC SPECTROSCOPY • ELECTRO-
MAGNETICS • MICROOPTICS • OPTICAL DIFFRACTION •
and
QUANTUM OPTICS • QUANTUM THEORY • RELATIVITY,
πα 2 SPECIAL • UNIFIED FIELD THEORIES
σtot = ,
3E 2
where θ is the angle between the incoming electron and
the outgoing muon (or between the incoming positron and BIBLIOGRAPHY
the outgoing antimuon). The second of these formulas
Cohen-Tannoudji, C., Dupont Roc, J., and Grynberg, G. (1989). “Pho-
has been verified to within a few percent in experiments
tons and Atoms: An Introduction to Quantum Electrodynamics,”
using total center-of-mass energies of the order of 55 GeV. Wiley, New York.
(At very high center-of-mass energies, weak-interaction Craig, D. P., and Thirunamachandran, T. (1998). “Molecular Quantum
effects must also be taken into account.) Electrodynamics: An Introduction to Radiation–Molecule Interac-
The results of the high-energy experiments can be used tions,” Dover Mincola, N.Y.
Dowling, J. P., ed. (1997). “Electron Theory and Quantum Electrody-
to set bounds on possible deviations from exact quantum
namics: 100 Years Later,” Plenum New York.
electrodynamics. The existence of heavy photons, for ex- Feynman, R. P. (1985). “QED: The Strange Theory of Light and Matter,”
ample, would modify the structure of the theory. (Thus, in Princeton Univ. Press, Princeton, N.J.
the static limit, the Coulomb potential would no longer Greiner, W., and Reinhardt, J. (1994). “Quantum Electrodynamics,” 2nd
have a simple inverse-distance dependence). The total ed. Springer-Verlag, Berlin.
Healy, W. P. (1982). “Non-Relativistic Quantum Electrodynamics,” Aca-
cross section for muon pair production would be altered
demic Press, London.
according to Kinoshita, T., ed. (1990). “Quantum Electrodynamics,” World Scientific,
2 Singapore.
4E 2
σtot → σtot 1 ∓ , Milonni, P. W. (1994). “The Quantum Vacuum: An Introduction to Quan-
4E 2 − 2± tum Electrodynamics,” Academic Press, Boston.
Pike, E. R., and Sarkar, S. (1995). “The Quantum Theory of Radiation,”
where ± are cutoff parameters with the dimensions of Clarendon, Oxford.
energy. For consistency with the experimental results, ± Weinberg, S. (1995). “The Quantum Theory of Fields,” Vol. 1, Cam-
must be at least of the order of 200 GeV. This corresponds bridge Univ. Press, Cambridge.
P1: lEF/GJK P2: GTYFinal Pages Qu: 00, 00, 00, 00
Encyclopedia of Physical Science and Technology EN013L-624 July 26, 2001 19:37
Quantum Chemistry
Donald J. Kouri
University of Houston
301
P1: lEF/GJK P2: GTYFinal Pages
Encyclopedia of Physical Science and Technology EN013L-624 July 26, 2001 19:37
Variational method Computational method ensuring an “dual space” vector, χ|, used to compute scalar products
upper bound to the system energy. in quantum mechanics); Ĥ is the Hamiltonian operator
Wave function Projection of quantum state onto position (the generator of the state vector time evolution); and -h- is
eigenstate. Planck’s constant h divided by 2π . In quantum chemistry,
Wave packet Time-dependent wave function. Ĥ usually consists of the sum of all contributions to the
kinetic energy of the system, K̂ , and all contributions to
the potential energy of the system, V̂ . The former takes
QUANTUM CHEMISTRY deals with the detailed under- account of energy due to motion, and the second takes
standing of atomic and molecular structure, properties, account of energy due to position. Like all physical ob-
and dynamics based on the mathematical framework of servables in quantum mechanics, Ĥ is a “self-adjoint” or
quantum mechanics. This involves determining the quan- Hermitian operator, ensuring real eigenvalues. Particularly
tum mechanical “state vector” of the system, and most useful are state vectors associated with specific values of a
commonly in chemistry, this is done in the so-called “co- complete set of observables. Such sets of observables are
ordinate representation,” yielding the “wave function.” composed of the maximum number of dynamical vari-
The time evolution of the state vector or wave function ables whose quantum operators commute (since commut-
is governed by the time-dependent Schrödinger equation. ing operators are not subject to a Heisenberg uncertainty
A typical strategy expresses the behavior and properties condition). Two of the most useful sets of observables are
of a bulk system in terms of the smallest possible dynami- the Cartesian positions of the particles in the system and
cal system displaying the underlying physical or chemical their canonically conjugate momenta. As canonically con-
properties or processes. In the case of chemical reactions, jugate variables, the position and corresponding momen-
this involves synthesizing the bulk reaction in terms of tum operators of a particle do not commute. Consequently,
the smallest possible number of atomic and/or molec- it is customary to use one or the other, and in quantum
ular species needed for the basic reaction stoichiome- chemistry, the overwhelming importance of coulombic in-
try. For spectroscopy, magnetic, electrical, etc. properties, teractions makes the position variables most convenient.
it typically involves determining such properties for indi- States of well-defined position are eigenvectors of the po-
vidual atoms and/or molecules. sition operator, x̂, so that for a particle confined to the
Under the usual conditions, the most important interac- x-axis,
tions governing atomic and molecular structure, proper-
x̂ | x = x | x, (2)
ties, and dynamics are electromagnetic, and time is treated
nonrelativistically. There are, however, some important where the eigenvalue x is the point on the axis where
manifestations of special relativity, such as spin degrees the particle is exactly located and |x is the (improper)
of freedom and the Pauli exclusion principle. It should be eigenvector for this state. The wave function, χ (x), is a
clear that quantum chemistry has much in common with measure of how likely it is that a particle in the state |χ
atomic and molecular physics and theoretical physics. will be found at the point x. This is called a probability
Here, we focus on the areas of most intense research ac- amplitude and is computed as the scalar product of the
tivity in chemistry. state |χ with the state |x,
χ (x) ≡ x | χ . (3)
I. CALCULATING ATOMIC AND If one also represents the Hamiltonian operator in terms of
MOLECULAR WAVE FUNCTIONS its effect on χ (x), we expect the kinetic energy operator in
the coordinate representation to result in changes in where
A. Some Basics of Wave Mechanics the particle is likely to be found. The most general form
We assume that the reader has some familiarity with quan- is given by
∞
tum mechanics, and here we summarize only a few of the
most important aspects, primarily to establish notation. x| K̂ |χ = d x
x| K̂ |x
x
| χ . (4)
−∞
For any atomic or molecular system, there is an associ-
ated state vector, |χ , whose time evolution is governed The coordinate representation of the kinetic energy
by the abstract Schrödinger equation, operator is
∂ -h- 2 ∂2
Ĥ |χ = i -h- |χ . (1) x| K̂ |x
= − δ(x − x
) 2 , (5)
∂t 2m ∂x
We use the convenient Dirac “ket” notation, |χ , for the where δ(x − x
) is the Dirac delta function. For the poten-
state vector (along with the Dirac “bra” notation for the tial energy, we expect V̂ to be a function of the particle
P1: lEF/GJK P2: GTYFinal Pages
Encyclopedia of Physical Science and Technology EN013L-624 July 26, 2001 19:37
position operator, x̂, so in the coordinate representation it letters, and we shall suppress the explicit coordinate de-
is diagonal, with eigenvalues given by the classical poten- pendence, except when it is required for understanding.
tial energy function, V (x). Then the coordinate represen- The separation constant, E, is interpreted as the total en-
tation of the time-dependent Schrödinger equation [in one ergy of the system, and Eq. (10) is the “workhorse” of
dimension (1D)] is quantum chemistry. However, it is perhaps useful to point
-- 2 2 out that while the above is totally general for systems in
h ∂ ∂
− + V (x) χ (x, t) = i -h- χ (x, t). (6) which the preparation of the system is irrelevant (e.g.,
2m ∂ x 2 ∂t systems not involving sources), it is not for other prob-
For typical chemical systems, Eq. (6) applies to each lems. For example, in reactive scattering experiments one
Cartesian coordinate for each electron and nucleus in may desire to determine highly resolved information (e.g.,
the system, and there are also potential energy terms for state-to-state cross sections at specific energies), and one
the interparticle coulombic interactions. The general form must know the detailed initial conditions in order to make
is a comparison between experiment and theory. Although
seldom discussed, the special nature of Eq. (10) can be
L−1
1 N −1
Z a Z a
seen by deriving it in a more general manner.
+
2
e
|
r
l>l
l
=1 l
−
r l
|
a>a
a
=1 | R a − R a
|
We do this by noting that time and energy are conjugate
variables in the sense of Fourier analysis. A more general
N L
Za way to derive a time-independent wave function equation
− χ ({ R a , rl }, t)
a=1 l=1 | R a − r l | is to Fourier transform Eq. (7), written more compactly as
∂
-h- 2 N
1 2 1 L
H χ = i -h- χ . (11)
× − ∇ + ∇ 2
χ ({ R a , rl }, t) ∂t
2 a=1 m a a m e l=1 l
If we specify an experiment lasting from ti to t f , a natural
∂χ definition of a time-independent wave function is
= i -h- . (7)
∂t tf
1 -
≡√ dtei Et/hχ . (12)
Here, (a, a
) label nuclei, (l, l
) label electrons, { R a , rl } 2π ti
denotes the set of all vectors from a common origin to the
nuclei and electrons, ∇a2 is the three-dimensional Lapla- The equation determining is obtained, in general, by
cian for nucleus a (having charge Z a ), and ∇l2 is the applying H to , using Eq. (11), and integrating by parts
three-dimensional Laplacian for electron l. The enormous to yield
difficulty associated with solving Eq. (7) basically stems i -h-
i Et f /h- -
from the coulombic repulsions and attractions which pre- (E − H ) = − √ e χ (t f ) − ei Eti /hχ (ti ) . (13)
2π
vent a separation of variables. In recent years, quantum
chemists have become increasingly interested in solving Now t f is the end time of the experiment, and for scat-
the time-dependent Schrödinger equation directly. How- tering, the products, whatever they are, will have passed
ever, it remains true that the overwhelming majority of through the detector and exited the apparatus. Therefore,
computations have focussed on taking advantage of the χ (t f ) is essentially zero inside the apparatus. However, to
fact that (in the absence of external, time-varying per- obtain Eq. (10), one must also assume that χ (ti ) is also zero
turbations) Ĥ is independent of any explicit time de- everywhere within the apparatus, but this contradicts the
pendence. This makes possible solution by separation of fact that one must measure what it is in order to analyze
variables, the experimental data. Therefore, the time-independent
Schrödinger equation that corresponds to experiment must
χ ({ R a , rl }, t) = ({ R a , rl })
(t), (8) be
i -h- -
where (E − H ) = √ ei Eti /hχ (ti ). (14)
∂ 2π
i -h-
= E
, (9)
∂t It is convenient to define ti to be zero. Thus, the time-
H = E . (10) independent Schrödinger equation retains a memory of
the experimental details. This should not be surprising
Note that we shall always use Dirac notation when writing since it is well known in quantum statistical mechanics
abstract state vectors and denote abstract operators with that the time-independent von Neumann equation con-
a caret. Wave functions and coordinate representation op- tains the initial density matrix. Of course, the ultimate
erators will be indicated by ordinary Greek and Roman goal is to determine dynamical information characteristic
P1: lEF/GJK P2: GTYFinal Pages
Encyclopedia of Physical Science and Technology EN013L-624 July 26, 2001 19:37
of the chemical species involved, independent of the ex- where the Hamiltonian matrix contains the radial kinetic
perimental details. This means that the ultimate quantities energy and centrifugal energy for the relative nuclear mo-
calculated will have had all reference to the experiment tion, the centrifugal energy for rotation of the electron
removed. Individual definite energy, state-to-state cross about the interproton axis, and terms that are nondiagonal
sections from Eqs. (7), (10), or (14) will be the same. in the index (referred to as “coriolis coupling,” describ-
However, there can be computational advantages to solv- ing the tumbling in space of the three particle triangle).
ing equations containing specific experimental conditions. It is common to neglect the coriolis coupling so that one
solves a single uncoupled equation for each J :
B. General Aspects of Nuclear and Electronic
e2
Dynamics: Born-Oppenheimer H = E −
J J
J . (17)
R
Approximation
This equation is still nonseparable, and the Born-
Except for the hydrogen atom and its isotopic variants, Oppenheimer approximation consists of assuming a prod-
there are no closed form analytical solutions to the uct solution
Schrödinger equation for atoms and molecules. The sim-
plest molecules are H+ 2 and H2 (and isotopic variants), J (R, ξ, η) = ζJ (R)φ (R, ξ, η) (18)
and they provide convenient examples for sketching the
and neglecting ∂φ and ∂∂ Rφ2 terms. Here, ξ, η denote the
2
difficulties encountered. For H+ 2 , Eq. (10) is ∂R
remaining electron coordinates. Then one obtains an equa-
-h- 2
-h- 2 e2
− ∇ R2 + ∇ R2 − ∇r21 + tion of the form
2m p 1 2 2m e | R 1 − R2 |
φ Ha
J
(R)ζJ (R) + ζJ (R)He (ξ, η, R)φ
e2 e2
− − = E , (15) e2 J
| R 1 − r1 | | R 2 − r1 | = E− ζ (R)φ (R, ξ, η), (19)
R
and the coulombic attraction terms prevent separation of
J
the r1 dependence. However, the mass of the electron where Ha (R) is a radial kinetic energy operator for
is about 1837 times smaller than that of a proton, so at the relative dynamics of the two protons (nuclei); it
any given kinetic energy, electrons travel some 43 times also contains the centrifugal potential (if any) associated
more slowly. This suggests one can neglect changes in with the internuclear rotation. We solve for the eigen-
the nuclear wave function on the time scale for electron values and eigenfunctions of the electronic Hamiltonian
dynamics. Mathematically, this is manifested by the ap- He (ξ, η, R):
proximate separation of variables in Eq. (15), and one
He φn = n (R)φn . (20)
should be able to solve the electron dynamics at fixed
| R 2 − R 1 |, with the wave function written in a product Note that the electronic eigenenergies depend on R
form. In fact, one may do this in a much more careful through the electron-proton attraction terms of the poten-
way. The potential depends only on the three distances tial operator in He, . Its dependence on takes account
separating the two protons and the electron from each nu- of the electronic rotation about the internuclear vector.
cleus. It is independent of the three coordinates of the Also, the quantum number enters He as 2 , so each en-
molecular center of mass and the three (Eulerian) angles ergy level n is twofold degenerate. The potential energy
orienting the three particle triangle. Therefore, one rigor- for nuclear motion associated with the Born-Oppenheimer
ously separates out these six coordinates. The remaining state ζJ (R)φn (R, ξ, η) is
dynamics is then characterized by the total angular mo-
mentum quantum number J , the component of angular e2
Un (R) = n (R) + . (21)
momentum quantum number M with respect to a z-axis R
fixed at the system center of mass with arbitrary orien- The electronic energy n (R) and the associated wave
tation, and the component of angular momentum quan- function φn (R, ξ, η) can be thought of as “adiabatic”
tum number with respect to a “body-fixed” z-axis that energies and states. They essentially assume that the elec-
rotates to maintain a definite orientation with respect to tron adjusts adiabatically to the nuclear motion. Inclu-
sion of the nuclear derivatives, ∂∂R φ and ∂∂R 2 φ , leads to
2
the three particle triangle. The wave function becomes
a vector of functions, J , satisfying equations of the form states that are viewed as “diabatic.” More rigorous treat-
J
ments exist in which these nuclear kinetic energy terms
e2
= E− J ,
J J are eliminated by a mathematical transformation defining
H (16)
=−J
R the adiabatic and diabatic states.
P1: lEF/GJK P2: GTYFinal Pages
Encyclopedia of Physical Science and Technology EN013L-624 July 26, 2001 19:37
The last step in the procedure (at the Born-Oppenheimer also have an additional nodal surface (to give a shorter
level) is to solve the Schrödinger equation for the nuclear deBroglie wavelength). This can be ensured by writing
vibrational motion,
J J 1
Hn (R) + Un (R) ζnν (R) = J ν ζnν
J
(R), (22) φ2,0 (R → ∞) = √ [φ1s (A) − φ1s (B)] , (26)
2
and for scattering dynamics,
J J+ J+
since at any value of R this is zero when the electron is
Hn (R) + Un (R) ζn (E | R) = Eζn (E | R). (23) anywhere in the plane bisecting the internuclear distance.
Corrections to take account of the nonadiabatic effects of In the united atom limit, this plane contains the united
nuclear-electronic coupling require inclusion not only of nucleus, and φ1s (A) provides the positive lobe and
the ∂∂R φn , ∂∂R 2 φn terms, but also the coriolis coupling in
2 −φ1s (B) provides the negative lobe of the 2 pz wave
the quantum number, as well as inclusion of more than function. Similar ideas apply in general.
just a single product form for J (R, ξ, η). Thus, Eq. (18) In the case of the H2 molecule, there is an additional
is replaced by electron, which introduces additional complexity. This oc-
curs because of an effect of the special theory of relativity,
J (R, ξ, η) = ζn
J
(R)φn (R, ξ, η), (24) namely, electron spin and the requirements of Fermi-Dirac
n statistics and the Pauli exclusion principle. The relativistic
which is substituted into Eq. (16) to generate coupled requirement that time must be treated on an equal footing
equations for the ζn J
(R). The φn are members of the with the spatial coordinates was shown by Dirac to lead to
complete set of eigenfunctions of the Hamiltonian He , an additional matrix structure for relativistically covariant
in Eq. (20). The study of electronic nonadiabatic coupling quantum mechanics. It was also found that even in the non-
is currently of great interest in quantum chemistry. relativistic limit, this matrix structure (though somewhat
Finally, before going on, we display in Fig. 1 some simplified) did not go away. In this limit, Pauli showed that
examples of the potential energy functions, Un (R), gov- the matrix operators associated with the remaining rela-
erning the relative nuclear dynamics. The electronic states tivistic effects obey the commutation relations of a quan-
can be understood in part by using their behavior in two ex- tum mechanical angular momentum operator, with eigen-
treme limits. The “united atom” limit (R → 0) results in a value for the square of the spin being s(s +1)h -- 2 , s = 1 , and
2
1 --
total nuclear charge (Z 1 + Z 2 )e, so for H+ 2 , it gives a charge
sz = ± 2 h. As a result, it was found necessary to multiply
of +2e. The lowest H+ 2 electronic state then correlates with the usual nonrelativistic, single electron solutions of the
the ground state of He+ . When the nuclear repulsion, eR is Schrödinger equation by either of two spin states, |α and
2
-
added to obtain U (R), it results in a coulombic singularity |β. By convention, the α-state corresponds to sz = + 2h and
-h
at R = 0. At large R, H+ 2 yields a neutral H-atom and H ,
+ the β-state corresponds to sz = − 2 . The energy (Hamil-
so the electronic energy tends to the ground state of the tonian) is independent of the spin angular momentum if
2
hydrogen atom. Since eR → 0 in this limit, we see that one neglects electromagnetic interactions associated with
U → −0.5 a.u. Similar considerations apply to excited the electron’s magnetic moments arising due to its orbital
states. For H+2 , the first excited united atom state is the 2 pz and spin angular momentum. (Classically, any current
level of He+ , and the large R limit is again a single ground or charge circulation produces a magnetic field.) In an
state H-atom plus a proton. Of course, the protons are iden- analogous fashion, nuclei that are fermions also possess
tical and one cannot distinguish to which one the electron intrinsic (spin) angular momentum. For example, protons
is bound. This is reflected in the fact that the ground state, and neutrons are fermions, having half-integral spin. Con-
φn=1=0 (R → ∞), wave function has the form sequently, associated with the nuclear part of an atomic
1 or molecular wave function one must include nuclear spin
φ1,0 (R → ∞) = √ [φ1s (A) + φ1s (B)] , (25) states.
2 The second complication arising if a system contains
where φ1s (i) is the 1s-hydrogen orbital centered on nu- more than one of a given type of fermion (e.g., two protons
cleus i = A or B. The positive sign reflects the fact that the in H+2 ; two protons and two electrons in H2 ; three electrons
lowest electronic wave function should have the longest in Li; etc.) is that the wave function (or state vector) must
possible deBroglie wavelength and must be normalizable. change sign if any two identical particles are exchanged.
This latter condition implies that φ1,0 vanishes for the So long as the system only contains two of any identi-
electron infinitely far from both nuclei, and the former cal fermion particles, the effects of antisymmetry can be
condition implies that it have no other “nodal surfaces.” accounted for by forming a product of an appropriately
The first excited electronic state must likewise have the symmetrized and antisymmetrized spatial and spin wave
nodal surface at infinity (for normalizability), but it must functions.
P1: lEF/GJK P2: GTYFinal Pages
Encyclopedia of Physical Science and Technology EN013L-624 July 26, 2001 19:37
In the case of H2 , one approach is to use single electron Expanding this yields
molecular orbitals patterned after the electronic states of
H+ 1
2 . Then a symmetric spatial wave function consists of φ̃ =0,S=0 = √ [φ1s (A, 1)α(1)φ1s (B, 2)β(2)
each electron occupying a φ1,0 -like state, with the total 2
spatial wave function being a product φ1,0 (1)φ1,0 (2), with − φ1s (B, 1)β(1)φ1s (A, 2)α(2)]. (32)
the index in parentheses labeling electron 1 or 2. Obvi-
ously, this wave function is symmetric under exchange of Clearly, this wave function contains no ionic terms, but
the electrons. The spin part of the wave function must be rather assigns one electron to nucleus A and one electron
antisymmetric, to nucleus B. This is an example of a “valence bond”
electronic wave function. In general, both ionic and co-
1
χs=0,sz =0 (1, 2) = √ [α(1)β(2) − α(2)β(1)], (27) valent type terms contribute to the accurate description
2 of electronic states in molecules. In a general LCAO-MO
treatment, one would use a sum of a number of atomic
or in the usual determinantal form,
electronic configurations to form molecular orbitals, and
1 by doing so, arbitrarily accurate results could (in principle)
χs=0,sz =0 = √ |α(1)α(2)β(1)β(2)|. (28)
2 be obtained. Similarly, one could take a sum of many dis-
tinct valence bond wave functions so that many electronic
In general, interchanging the pair of electrons interchanges configurations contribute. Again (in principle), arbitrar-
the columns of the determinant, which ensures a change ily accurate results can be obtained. Both LCAO-MO and
-
of sign. This state corresponds to one electron with sz = 2h valence bond approaches are used by quantum chemists.
-h
and one with sz = − 2 . The total Sz is zero, as is also the When more than two identical Fermions are present in
magnitude Ŝ 2 . The complete molecular orbital description an atom or molecule (e.g., Li or LiH), one no longer can
of ground state H2 is then construct a properly antisymmetrized wave function as a
product of a symmetrized total spatial and an antisym-
φ=0,S=0 = φ1,0 (1)φ1,0 (2)χ S=0,Sz =0 (1, 2). (29) metrized total spin wave function. This essentially results
from the fact that there are only two possible single elec-
This is a simple example of the “linear combination tron spin states, α and β. If one tries to form a spin deter-
of atomic-molecular orbital” (LCAO-MO) description minant for three (or more) electrons, one always finds two
of the electronic structure of H2 . It is of interest to rows of the determinant are equal and it vanishes identi-
multiply out the spatial part of this LCAO-MO wave cally. The consequence is that one must form determinants
function: using one-electron products of a spin and spatial function.
1 Any particular spatial function can be multiplied by either
φ1,0 (1)φ1,0 (2) = [φ1s (A, 1)φ1s (A, 2) + φ1s (B, 1)φ1s an α or β spin state so that, at most, each spatial orbital can
2
be occupied by no more than two electrons. The spatial
× (B, 2) + φ1s (A, 1)φ1s (B, 2)
function can be either an LCAO-MO or a valence bond
+ φ1s (B, 1)φ1s (A, 2)]. (30) function.
One last important aspect of molecular wave functions
This illustrates a very important aspect of the LCAO-MO is the inclusion of the effect of identical nuclei. Nuclei
approach, since the terms φ1s (i, 1)φ1s (i, 2), i = A, B rep- with half-odd integral spin are Fermions, and the total
resent both electrons bound to the same nuleus, A or B. wave function (nuclear plus electronic) must be antisym-
These are called “ionic terms” since, e.g., the state metric under exchange of any two identical nuclei. If they
φ1s (A, 1)φ1s (A, 2) implies that B has lost an electron have integer spin, they are Bosons, and the total wave
which is now bound to A. function must be symmetric under exchange of two such
Another approach is to assign an electron to a spin- nuclei. In the case of H2 , we note that exchanging pro-
orbital so that, e.g., having electron 1 on nucleus A with tons A and B affects the nuclear wave function through
spin α is represented by φ1s (A, 1)α(1). Similarly, electron the polar and azimuthal angles of the internuclear vector.
2 on nucleus B with spin β is described by φ1s (B, 2)β(2). The symmetry is determined by whether J is even or odd,
We can construct an antisymmetrized total electron wave with odd (even) J yielding a nuclear rotational wave func-
function as the “Slater determinant” tion that is odd (even) under nuclear exchange. However,
it is also clear that the LCAO-MO φ1,0 [see Eq. (25)] is
1
φ̃ =0,S=0 = √ |φ1s (A, 1)α(1)φ1s (A, 2)α(2) even if A and B are exchanged, but φ2,0 [see Eq. (26)]
2 will change sign under this interchange. If both electrons
× φ1s (B, 1)β(1)φ1s (B, 2)β(2)|. (31) in H2 occupy either the φ1,0 or φ2,0 spatial orbitals, then
P1: lEF/GJK P2: GTYFinal Pages
Encyclopedia of Physical Science and Technology EN013L-624 July 26, 2001 19:37
the total electronic wave function is even under proton vibrational-rotational coupling, higher powers of (ωe + 12 )
exchange. If one occupies φ1,0 and the other occupies due to anharmonic effects, etc.
φ2,0 , then the electronic wave function is odd under the Again, analogous generalizations exist for systems with
exchange. Then the even (odd) J -rotational states times more than two nuclei. For systems with large numbers of
the even (odd) total electronic wave function requires an nuclei, it is not convenient nor particularly useful com-
antisymmetric nuclear spin wave function, while the odd putationally to separate the center-of-mass motion or the
(even) J -rotational states times even (odd) total electronic three Eulerian angles. This is essentially because of the
states require a symmetric nuclear spin state. We note that enormous number of possible relative coordinates and ro-
the odd nuclear spin state has the form tating frames for such systems.
1
χ S=0,Sz =0 (A, B) = √ [α(A)β(B) − β(A)α(B)] , (33)
2
C. Computational Tools for Bound States
and the symmetric spin states are
The two primary tools for computing energy levels and
χ S=1,Sz =1 = α(A)α(B), (34) wave functions in quantum chemistry are the variational
χ S=1,Sz =−1 = β(A)β(B), (35) and perturbation theoretic methods. A key aspect of the
wave functions for bound systems is the fact that since
1 the probability of observing system components at infi-
χ S=1,Sz =0 = √ [α(A)β(B) + α(B)β(A)]. (36)
2 nite separations from one another must be zero, the wave
Clearly, there are three states possible for S = 1(−1 ≤ function must vanish on the system boundaries. Further,
Sz ≤ 1) so it is the triplet spin state. The proper enumer- because the probability density for observing the system
ation of the possible spin states with even- and odd-J in any configuration equals the modulus of the wave func-
nuclear rotational states is crucial in quantum statistical tion squared, and the total probability of finding the system
mechanics. somewhere must equal 1, bound state wave functions must
If there are three or more nuclei in the molecule and be “quadratically integrable” or L2 -functions. In general,
some or all are identical, appropriate generalizations of the the value of the observable total energy of a system in state
above are necessary. In addition, there can be additional is given by
symmetry operations (for the nuclei in their equilibrium
∗
positions), and in general, it is computationally advan- E = dτ H dτ | |2 , (38)
tageous to take this into account both in the molecular
electronic wave function and in the nuclear wave func- where dτ is the differential volume element and the inte-
tion. The use of group theory plays a major role in atomic gral is over all space. The Rayleigh-Ritz variational prin-
and molecular spectroscopy. ciple states that the functional variation of E with respect
The final topic in this discussion is the nature of the to either or ∗ vanishes, and that the extremum solution
nuclear vibrational Schrödinger equation [Eq. (22)]. If the to the resulting Euler-Lagrange equations is a minimum.
potential Un (R) possesses a well sufficiently deep to sup- Rearranging Eq. (38), we calculate
port bound eigenstates, then one seeks to compute the
ζnν
J
and J ν . One commonly used technique is to expand
δ E dτ | |2 + dτ δ ∗ (E − H )
Un (R) in a Taylor series about the potential minimum at
Re :
+ dτ ∗ (E − H )δ ≡ 0, (39)
1 d2
Un (R) = Un (Re ) + Un | Re (R − Re )2 , (37)
2 dR 2
and we impose the condition δ E ≡ 0. Since complex and
since dRd
Un | Re ≡ 0. In addition, the centrifugal potential ∗ are linearly independent, so also are their variations
is approximated by its value at R = Re . This gives the and consequently the coefficients of both δ and δ ∗
energy J ν as a sum of the harmonic oscillator energy must separately vanish. This yields the usual Schrödinger
-h- ω (ν + 1 ) and the rigid rotor energy J (J + 1) -h- 2 /2µR 2 , equation, (E − H ) = 0, or its complex conjugate, and
e 2 e
where µ is the reduced mass of the two nuclei and ωe is establishes that a complete minimization of E yields the
the usual harmonic oscillator angular frequency. A more exact energy. Alternatively, we can expand in Eq. (38)
accurate description can be obtained by including more using the complete set of eigenfunctions of H , {φn },
terms in the Taylor expansion of both Un and the cen- satisfying
trifugal potential. This leads to a power series in the quan-
tum numbers ν and J , with cross terms that arise from H φn = E n φn , (40)
P1: lEF/GJK P2: GTYFinal Pages
Encyclopedia of Physical Science and Technology EN013L-624 July 26, 2001 19:37
this occurs, the procedure is modified by replacing the II. ELECTRONIC STRUCTURE
degenerate ψ 0j by linear combinations: AND PROPERTIES OF ATOMS
AND MOLECULES
ψ̃ 0k = bkj ψ 0j (63)
j=degenerate with k A. Variational Methods
from one-electron spin-orbitals. This yields a properly an- which there are twice as many STOs or GTOs as there are
tisymmetrized wave function, core and valence atomic orbitals. This gives greater flex-
ibility to the LCAO-MOs that ultimately appear in each
= Cl
l . (73)
l
of the Slater determinants (configurations). Typically, one
of the pair has a smaller exponent (α or ζ ) and one has a
The individual “configuration” Slater determinants,
l ,
larger exponent. A triple zeta basis has three times as many
are constructed from spin-orbitals, φ j , given by
STOs or GTOs as the number of core and valence atomic
φ j = |α Cλj χλ or |β Cλj χλ , (74) orbitals. Even larger basis sets are possible. One such basis
λ λ is denoted by (10s, 6 p/5s, 4 p), indicating that 10 s-type
where χλ is a spatial basis function. The χλ generally fall primitive GTOs were contracted to yield 5 distinct s-type
into two classes: CGTOs and 6 primitive p-type GTOs were contracted to
provide 4 distinct p-type CGTOs. Another Gaussian-type
r Slater-type orbitals (STOs) basis is denoted STO-3G, where three Gaussians each are
used in a least squares fit of STOs, which have been op-
χλ = Nnlmζ Ylm (θ, φ)r n−1 exp(−ζ r ), (75) timized to describe various atomic electronic states. Yet
where Nnlmζ is the normalization constant and ζ others are denoted as 4-31G and 5-31G bases. In this case,
determines the radial “size” of the STO (physically, it the core orbital space is treated with CGTOs containing
is associated with an “effective nuclear charge”). four or five Gaussians. The valence orbital space is treated
r Cartesian Gaussian-type orbitals (GTOs) at a double zeta level, with the first CGTO constructed
from three primitive GTOs and the second containing one
χλ = Nabcα x a y b z c exp(−αr 2 ), (76) primitive GTO.
where Nabcα is the normalization constant and α The framework of the Hartree-Fock approach is com-
determines the radial size of the orbital. Note, e.g., that mon to many of the most widely used electronic struc-
(a, b, c) = (1, 0, 0), (0, 1, 0), or (0, 0, 1) are px , p y , pz ture computational methods. Here we summarize some
orbitals; (a, b, c) = (2, 0, 0), (0, 2, 0), (0, 0, 2), of these and indicate how they are related. In the MCSCF
(1, 1, 0), (0, 1, 1), (1, 0, 1) are sufficient to describe version, the value of E = |H | / | is determined
five d-orbitals and an s-orbital [since the sum of variationally with respect to the CI coefficients, Cl , in
(2, 0, 0), (0, 2, 0), and (0, 0, 2) yields the function Eq. (73) and simultaneously with respect to the Cλj coef-
r 2 exp(−αr 2 )]. ficients in the individual spin-orbitals. In addition, the Cλj
are constrained to satisfy the ortho-normalization relations
In early work in quantum chemistry, STOs were the
∗
most commonly used basis, but as digital computers have Cλj χλ | χλ
Cλ
j
= δ j j
. (77)
λ,λ
ρ(r ), of the system. Note that this does not provide the ways to estimate the Hamiltonian matrix elements. These
explicit dependence of E 0 on ρ, but rather guarantees that can range from ignoring the effects of nonorthogonality
if one can determine ρ, this, in principle, determines the of the atomic orbitals centered on nonbonded atoms to
exact ground state energy. The resulting equations to be approximating matrix elements using experimental data.
solved are The resulting approaches are referred to as semi-empirical
methods, and there exist highly developed computer codes
-h- 2 Z n e2 r
)
ρ(
− ∇ −
2
+e 2
d r
and data banks for carrying out such computations. Details
2m e
n | Rn − r
| r − r
|
| can be found in some of the bibliographic references at the
end of this article.
+ U X α (r ) φi = i φi , (88)
D. Other Properties
where U X α (r ) is an effective one-particle potential that Finally, we comment on the calculation of properties other
accounts exactly (in principle) for all electron-electron than the electronic energy. These are usually expressed in
correlation and exchange effects, and the integral term in- terms of the response of the system to some appropriate
volving ρ is the coulombic interaction at the point r due to external perturbation, λV . These properties are typically
the total electron density. Note that U X α also must remove evaluated by doing a Taylor expansion of the total energy
the “self-interaction” of the electron occupying orbital φi of the perturbed system about λ = 0,
with its own contribution to ρ(r ). The DFT equations,
dE
like the Hartree approximation equations, must be solved E = E(λ = 0) + λ + ··· (92)
iteratively, since it is easily verified that dλ λ=0
where
ρ(r ) = n j |φ j |2 , (89)
j E(λ = 0) = (λ = 0)|H (λ = 0)| (λ = 0), (93)
where n j is the number of electrons occupying orbital φ j . dE
Thus, one begins with an initial guess of the φ j and com- = (λ = 0)|V | (λ = 0)
dλ λ=0
putes ρ. If one also has an estimate of U X α , then Eq. (88)
∂Cl
∂
is a linear eigenvalue problem for calculating a revised +2 |H (λ = 0)| (λ = 0)
estimate of the φ j and its single particle energy, i . Once l
∂λ λ=0 ∂Cl
Eq. (88) is solved, a new ρ(r ) is calculated from Eq. (89) ∂Cµj
∂
and the procedure is repeated until one reaches self- +2 |H (λ = 0)| (λ = 0)
consistency. The total energy of the system is computed j,µ
∂λ λ=0 ∂Cµj
as ∂χµ
∂
-h- 2
ρ( r )Z n e2 +2
∂λ λ=0 ∂χµ
|H (λ = 0)| (λ = 0) , (94)
E= n j φ j ∇ 2 φ j − d r µ
j
2m e | R − r| n n
etc. The value of ( dE )
dλ λ=0
depends not only on the nature
2
e ρ(r )ρ(r ) 1 1 of the perturbation, but also on the method used to obtain
+ dr dr
+ dr U X α (r ).
2 |r − r
| 2 2 (λ = 0). Thus, for the MCSCF method, derivatives of
with respect to the Cl and Cµj vanish identically since the
(90)
MCSCF functional is stationary with respect to such vari-
The greatest difficulty in the theory is obtaining an accu- ations. If the atomic orbitals, χµ , do not depend explicitly
∂χ
rate approximation for U X α , and this is a very active area of on the external field, then ( ∂λµ )λ=0 ≡ 0, and the first order
research. A popular choice is the so-called “local density effect is due solely to the average of V . In, e.g., the CI
∂ ∂
approximation” (LDA), for which method, ∂C l
≡ 0, but ∂C µj
= 0 since the Cµj are not varied
to minimize the CI energy functional.
U X α = −9α/[2(3ρ(r )/8π)1/3 ]. (91)
Theoretical work indicates that α = 2/3 is the optimum
III. QUANTUM CHEMICAL DYNAMICS
value to use.
All of the above discussion has been couched in purely
A. Early Transition State Theory
theoretical terms. When all of the relevant matrix elements
of the Hamiltonian are computed from first principles, Over the past 30 or so years, the field of rigorous quan-
the calculation is referred to as “ab initio.” However, a tum chemical dynamics has been created. Prior to this,
great deal of quantum chemical applications use various the dominant method used to calculate rates was the
P1: lEF/GJK P2: GTYFinal Pages
Encyclopedia of Physical Science and Technology EN013L-624 July 26, 2001 19:37
transition state approach. The role of quantum mechanics from the HF center of mass, R1 ; and the angle, γ1 , between
was mainly to determine the Born-Oppenheimer poten- these two distance coordinates. Analogous but distinct
tial surface for nuclear dynamics and the eigenenergies sets of three natural coordinates apply to the second and
of various bound degrees of freedom of the full system third rearrangements. For the last arrangement (termed the
at the transition state (for use in calculating the transition “breakup” arrangement), the natural coordinates are called
state partition function). The definition of the transition hyperspherical coordinates and typically consist of a hy-
state is most easily given for systems in which the colli- perradius ρ (whose square equals the sum of the squares
sion process has to pass through a so-called “bottleneck” of the two distance coordinates for any of the other three
(e.g., assumed in many cases to be the saddlepoint in the arrangements, i) and two angles. One of these can be the
Born-Oppenheimer nuclear potential surface, separating same as the γi in any of the other coordinate sets, and
reactants and products). Generally, it is assumed that in the second angle is typically defined as α ≡ arctan(ri /Ri ).
all collisions in which the system reaches the transition The boundary conditions are very different in each of the
state configuration, having positive kinetic energy (posi- limits Ri → ∞, i = 1, 2, 3 or ρ → ∞ (the latter requir-
tive velocity in the direction of the product region), the ing that both ri , Ri → ∞ simultaneously). We note that
reactive probability equals 1. The transition state is some- the breakup process has only been treated quantatively
times couched in terms of a “dividing surface” normal within a collinear model of chemical reaction; a major
to which a positive velocity inexorably leads to reaction. topic of research is the development of methods for treat-
An additional quantum modification is to introduce an ad- ing breakup in full dimensionality. Henceforth, we restrict
ditional “transition factor” which can account for effects our discussion to collisions well below the breakup energy
such as tunneling (for collision energies below the saddle- threshold. Computational methods that have been applied
point barrier height, which otherwise have a zero reaction include algebraic variational methods and direct numeri-
probability) as well as corrections to the assumption that cal solution of the discretized partial differential form of
all positive kinetic energy trajectories reaching the transi- the Schrödinger equation. For scattering, the most widely
tion state cross to products with unit probability. used variational principles are based on the generalized
Newton and the Kohn functionals, and they are stationary
rather than extremal. They are typically employed using
B. Rigorous State-to-State Quantum
basis set expansions, with the stationary condition lead-
Reactive Dynamics
ing to linear systems of inhomogeneous algebraic equa-
Although such transition state-based methods continue to tions. Either the basis functions explicitly include terms
play a major role in ab initio predictions of reaction rates, to satisfy the various asymptotic boundary conditions
there now exist several rigorous quantum treatments of or the variational functional itself explicitly includes them
atom-diatom and diatom-diatom reactive collisions, yield- through the appearance of “causal” or “anticausal” Green
ing detailed quantum state resolved probabilities, cross functions.
sections, and rates. Most of these employ a rotating, center- A major advance in dealing with the problem of co-
of-mass coordinate frame (rigorously separating out the ordinates in reactive scattering was the introduction of a
three coordinates of the center of mass and the three Euler practical method for eliminating the need to solve simulta-
angles characterizing the overall rotation of the three or neously for the dynamics in all three arrangements. This is
four atom system). In reactive scattering, the major com- done by introducing ad hoc, localized negative imaginary
plication is the result of the fact that the natural coordinates potentials (NIPs) designed to absorb the wave function in
for describing the atomic and molecular species in each ar- regions that are not of direct interest. This permits one to
rangement are different. This is easily seen by considering solve the Schrödinger equation only in the region leading
an atom-diatom collision system like D + HF. Depend- from reactants to the one transition state of interest. Both
ing on the collision energy, there can be as many as four a time-dependent and time-independent version of the ap-
arrangements: proach exist. It is also possible to resolve state-to-state re-
active scattering amplitudes at any energy contained suf-
r D + HF ficiently in the t = 0 wave packet. The approach yields
r H + DF results in quantitative agreement with those obtained by
r F + HD the more traditional methods. We note that in the tradi-
r F+H+D tional numerical integration approach, one is usually faced
with solving the equations subject to so-called “two-point
In the first, the three natural coordinates remaining after boundary conditions.” One condition is that the solution
separating off the center-of-mass and Euler angles of rota- be regular at the origin, and the others are that the solution
tion are the internuclear HF distance, r1 ; the distance of D have incident waves only in the initial arrangement and
P1: lEF/GJK P2: GTYFinal Pages
Encyclopedia of Physical Science and Technology EN013L-624 July 26, 2001 19:37
that there be only outgoing scattered waves in all possible an NIP. Finally, the time-dependent method automatically
product arrangements. Since the amplitudes for scattering yields scattering information at all energies suffiently con-
into the various possible final states are unknown (and are tained in χ (0), and χ (t) automatically satisfies the proper
what one is trying to compute), one only knows the form scattering boundary conditions.
of the scattering boundary conditions and not their nu- A particularly powerful and promising wave packet ap-
merical values. Consequently, it is necessary to generate proach makes use of the fact that while having a term −i A,
enough linearly independent solutions of the differential where A is a positive semi-definite function, added to the
equations to form linear combinations that possess the Hamiltonian absorbs the wave packet with a time history
correct asymptotic behavior. The general procedure is to given by −i Aχ (t), if one records this time evolution in-
expand the total wave function in internal basis states, giv- formation and changes the sign, +i Aχ (t) then constitutes
ing rise to coupled ordinary differential equations for the a source that feeds the wave back into the region where
expansion coefficients. If there are N basis functions in the A is nonzero. This makes it possible to decouple the dy-
expansion, there result N coupled second order differen- namics in all regions that can be defined using appropriate
tial equations, having in general 2N linearly independent dividing surfaces from transition state theory. We write
solutions. At most, only N of these can be regular at the the set of uncoupled regional time-dependent Schrödinger
origin, leading to the necessity of solving the coupled dif- equations as
ferential equations N separate times for these N sets of
∂χr
linearly independent regular solutions. The expansion in i -h- = (H − i Ar )χr , (96)
the N basis functions yields an N × N Hamiltonian ma- ∂t
trix operator, and propagating each N component regular ∂χ2 K
-
-
ih = H −i A j χ2 + i Ar χr , (97)
solution vector involves doing N 2 multiplications at each ∂t j=1
step of the propagation. Since this is done N times, there
is a total of N 3 multiplications at each step, for an overall ∂χ p
i -h- = H χ p + i A p χ2 , p = 1, . . . , K . (98)
scaling of MN 3 , where M is the number of steps needed ∂t
to escape the scattering interaction. This must be redone Here, χr (t = 0) is the initial packet; H is the full Hamil-
at each separate energy E for which one desires scattering tonian, and Ar , A p , p = 1, . . . , K are the functions that
information. By comparison, solving the time-dependent determine in what regions of configuration space sinks
Schrödinger equation formally, one gets and sources are located. The absorber −i Ar is located
-
χ (t) = e−i H t/hχ (0). (95) just beyond the dividing surface associated with the initial
(reactant) arrangement (just inside the so-called “strong
A popular way to evaluate Eq. (95) is to take t short enough interaction” region, 2). The −i A p are located just be-
that one can approximate exp(−i H t/h -- ) as the product
-- -
- -- ) (called the yond the dividing surface associated with arrangement
exp(−i V t/2h)exp(−i H0 t/h)exp(−i V t/2h p, p = 1, . . . , K (including the initial arrangement). Then
“symmetric split operator” approximation). Then the ex- it is easily seen that the dynamics of χr occurs only in the
ponential operators are evaluated in the representation in reactant arrangement, up to just within the reactant divid-
which they are diagonal (the coordinate representation for ing surface. The dynamics of χ2 (t) occurs solely within the
the potential and the momentum representation and in- strong interaction region, until it is completely absorbed
ternal eigenstates for the reference Hamiltonian H0 ). The by the various −i A p located just outside the respective
major computational effort is that of transforming between p arrangement dividing surface. The dynamics of the χ p
these two representations. For the scattering distance and wave packet occurs solely in the region outside the pth
its canonical momentum, this is typically done by Fast dividing surface. We note that if we add all the equations,
Fourier Transform, which scales as M log2 M, where M the result is
is the number of grid points in the discrete Fourier trans-
form. The other basis dependence is at most N 2 (and can
K K
-
ih- χ +χ +χ = H χ +χ +χ ,
be as low as N 3/2 in the rotating frame approach). The p r 2 p r 2
p=1 p=1
computational effort then scales more slowly with N than
the time-independent method discussed above, but more (99)
rapidly with M. Also, one must do this at each time step
until the scattering is completed. To avoid having to follow so that the exact solution of the time-dependent Schrö-
the continued time evolution of the portion of χ (t) that has dinger equation is
already escaped the range of the interaction causing the
K
scattering, one may analyze it for the scattering informa- χ (t) ≡ χ p + χr + χ2 . (100)
tion and then absorb it beyond the analysis region with p=1
P1: lEF/GJK P2: GTYFinal Pages
Encyclopedia of Physical Science and Technology EN013L-624 July 26, 2001 19:37
Notice that while we must solve for χr (t) and χ2 (t), they One then finds that on the target side of χ (0), in configu-
are nonzero for a shorter length of time than the total ration space,
overall collision, and they are nonzero only in limited re-
gions of configuration space. Therefore, computing them mC(k) +
is much less work than solving for χ (t) directly and re- ξ + (E) = -h- 2 k k (E). (109)
quires less storage. In addition, once we have recorded
the time evolution of the −i A p χ2 , p = 1, . . . , K , we only In general, if there are internal degrees of freedom and
need to solve for the final arrangements of immediate in- different possible chemical arrangements, one obtains
terest, and these are done completely separately (they are
totally uncoupled between two regions p = p
). We can
mCr knr
also do the calculations whenever computing resources are ξr+nr (E) = -- 2 r+knr (E), (110)
h knr
most available and least costly. We also can use whatever
coordinates are optimum in each separate region p. We
where Cr (knr ) characterizes the initial packet located in ar-
point out that a time-independent version of Eqs. (96)–(98)
rangement r , with initial- 2internal quantum numbers n r and
has been developed which appears very robust. Essen- h k2
relative kinetic energy 2mnr , and r+knr (E) is the complete
tially, it results from a half Fourier time-to-energy trans-
Lippmann-Schwinger solution, including all possible fi-
form of Eq. (95), to yield [in analogy with Eq. (14)]
nal arrangements. The state ξr+nr can be similarly split into
i localized pieces satisfying uncoupled time-independent
(E − H )ξ = χ (0), (101) dynamical equations analogous to Eqs. (96)–(98).
2π
A particularly convenient observation is that, just like
or in causal solution form, the homogeneous Schrödinger equation, a variational ex-
i pression for the S matrix at energy E can be derived,
ξ + (E) = χ (0). (102) based on Eqs. (101), (102), and (110). The result for a
2π(E − H + i)
general scattering amplitude, connecting state r, n r at to-
Note that ξ + (E) is not the usual definite energy, caus- tal energy E with final state p, n p at energy E is the robust
al solution of the Schrödinger equation (known as the expression
Lippmann-Schwinger scattering solution). However, in
certain well-defined regions of configuration space [name- i -h- 2 kn p knr
ly, on the “target side” of the initial wave packet, χ (0)], S pn p kn p r n r knr = χ ( pn p | t
m(2π )2 Cr knr C ∗p kn p
it is proportional to the Lippmann-Schwinger solution,
+ (E), satisfying 1
= 0) χ (r n r | t = 0) .
E − H + i
1
k+ (E) = φk (E) + V φk (E). (103) (111)
E − H + i
Here, φk (E) is the solution of the unperturbed Schrödinger The modulus squared of this S matrix element is the
equation, probability of the indicated state-to-state process, and
χ ( pn p | t = 0) is a wave packet located in the prod-
H0 φk (E) = Eφk (E), (104) uct region of configuration space, outside the collision
region. Clearly, this is equal to a known factor times
where, as usual, χ ( pn p |t = 0)|ξr+nr (E). It should be clear that knowledge
of the full Green function, 1/(E − H + i) ≡ G + (E), pro-
H = H0 + V, (105)
vides all one needs to know to calculate the state resolved
with V being the interaction responsible for causing the S matrix element at the energy E.
scattering. If we express χ (0) in terms of the complete set It is worthwhile to examine how one can implement
{φk (E)}, Eq. (111) computationally. A particularly effective way
∞ to do this is to expand 1/(E − H + i) in an appropri-
ate polynomial basis. A popular one is the Chebychev
χ (0) = dkC(k)φk (E), (106)
−∞ polynomials, denoted Tn . However, they require an ar-
1 gument bounded by ±1, while the exact spectrum of H
C(k) = φk (E) | χ (0), (107) is unbounded. However, in practice, one introduces a fi-
2π
nite dimensional matrix representation of H , and this can
φk (E) | φk
(E) = 2πδ(k − k
). (108) be normalized so that it possesses no eigenvalues larger
P1: lEF/GJK P2: GTYFinal Pages
Encyclopedia of Physical Science and Technology EN013L-624 July 26, 2001 19:37
in magnitude than 1. This leads to an expansion of the solution, one obtains η̃n (E) ≡ Tn (Hnor m )V φk (E), which
form must be recalculated at every new energy!
1
= gn+ (E nor m )Tn (Hnor m ), (112)
E − H + i n C. Rigorous Quantum Transition State Theory
where −1 ≤ E nor m ≤ +1; a similar bound holds for the The above discussion shows, as one might have expected,
eigenvalues of Hnor m ; and gn+ (E nor m ) is a simple, analyt- that knowledge of G + (E) is tantamount to having solved,
ically known function that explicitly builds in the causal subject to causal boundary conditions, the Schrödinger
behavior of G + (E). Computation of the S matrix then equation for any possible initial and final states and ar-
requires evaluating rangements. This suggests that G + (E) cannot only yield
1 state-to-state probability amplitudes, but also the sum over
χ (r n r | t = 0) all possible initial states in arrangement r and all possible
E − H + i
final states in product arrangement p, denoted as N pr (E).
= gn+ (E nor m )Tn (Hnor m )χ (r n r | t = 0). (113) This quantity is known as the “cummulative reaction prob-
n ability,” and it is central to an arbitrarily accurate, rigor-
Note that all of the dependence on the energy E is con- ous quantum transition state theory. This should also not
tained in the analytical coefficients, gn+ . Defining “Krylov be surprizing in light of the fact that by use of absorbing
vectors” potentials located at various dividing surfaces, one can
isolate the dynamics occurring in the strong interaction
ηn ≡ Tn (Hnor m )χ (r n r | t = 0), (114) region. One then expects that the total strong interaction
Green function for Eq. (97),
we note that they are totally independent of E, and
1 G 2 (E) =
1
χ (r n r | t = 0) = g + (E nor m )ηn . (115) K (120)
E − H + i E − H +i j=1 Aj
n
Therefore, once the {ηn } are calculated that have nonzero should be able to provide N pr (E) without having to carry
overlap with χ( pn p | t = 0), one has all the information out any dynamics outside of region 2. This has indeed
needed to compute the S matrix elements at any other been proved to be true, and it is found that
energy E contained in the initial packet. The Tn satisfy a
simple recursion relation, so that N pr (E) = trace 4Ar1/2 G ∗2 (E)A p G 2 Ar1/2
Quantum Chromodynamics
(QCD)
Taizo Muta
Hiroshima University
319
P1: GPA Final pages
Encyclopedia of Physical Science and Technology EN013K-625 July 27, 2001 11:32
1964 Gell-Mann and Zweig proposed that hadrons are that quark systems are required to have an extra symmetry
composed of quarks. It was then natural to look for the associated with the non-Abelian gauge field. In the mean-
dynamics obeyed by quark systems as the origin of strong time three facts suggested that quarks must have a new
interactions. quantum number called color and exhibit the color sym-
metry. We discuss these observations below.
The symmetry group in QCD is the color SU(3) group, (Dµ ψ) = U (Dµ ψ). (14)
which is a noncommutative (non-Abelian) group. A gauge
theory which is invariant under the non-Abelian gauge Thus ψ̄ Dµ ψ is invariant under the non-Abelian local
group is also called Yang–Mills theory. gauge transformation (12) and hence the Lagrangian (11)
We consider the fermion ψ(x) with mass m (the quark is also invariant.
field, to be more specific) which belongs to the N - The Lagrangian (11) describes the fermion ψ(x) in in-
dimensional fundamental representation of the group teraction with the gauge fields Aaµ (x). It is still to be sup-
G. Thus the field ψ(x) has N components: ψi (x), i = plemented by a kinetic term consisting purely of the gauge
1, 2, . . . , N . The group G in QCD is the color SU(3) group fields Aaµ (x). Inspired by the example of electrodynamics,
and so N = 3. The Lagrangian for the free fermion field we may try the form
ψ(x) is given by 1
− ∂µ Aaν − ∂ν Aaµ (∂ µ Aaν − ∂ νAaµ ). (15)
L = ψ̄ i iγ µ ∂µ − m ψi . (6) 4
However, such a term is not SU(3) gauge invariant. It is
The Lagrangian is invariant under the transformation a
not difficult to prove that Fµν F aµν is invariant with
ψ = Ui j ψ j , U = exp(−i T a θ a ), (7)
a
Fµν ≡ ∂µ Aaν − ∂ν Aaµ + g f abc Abµ Acν . (16)
where θ (a = 1, . . . , 8) are the transformation parameters
a
and T a are the SU(3) generators, which are subject to the Finally we arrive at the general form of the Lagrangian
commutation relations which is invariant under the non-Abelian local gauge
transformations (7) and (12),
[T a , T b ] = i f abc T c , (8)
1 a aµν
where the summation on the repeated indices should be L = − Fµν F + ψ̄ iγ µ Dµ − m ψ. (17)
understood as usual and f abc are the structure constants of 4
the SU(3) group. For θ a depending on x, the Lagrangian It should be noted that in the above Lagrangian there exists
(6) is no longer invariant under the transformation (7). The only one arbitrary parameter g owing to gauge invariance.
Lagranian may be made invariant under the non-Ablian This universal constant g is called the gauge coupling
local gauge transformation (7) if the derivative in Eq. (6) constant. We derived the above Lagrangian based on the
is replaced by the covariant derivative, gauge principle. Of course it should still be Lorentz invari-
Dµ = ∂µ − igT a Aaµ , (9) ant and invariant under space and time reversal. Unfortu-
nately, the Lagrangian (17) is not unique, in that we may
where Aaµ are the gauge fields and g is a constant rep- add to it other terms of higher power of Fµν a
and ψ within
resenting the coupling strength between ψ and Aaµ . In the requirement of local gauge invariance, Lorentz invari-
component form Eq. (9) reads ance, and invariance under space inversion and time rever-
(Dµ )i j = δi j ∂µ − igTiaj Aaµ , (10) sal. The additional requirement of renormalizability may
eliminate all the irrelevant terms and fix the Lagrangian
where Tiaj a
is the representation of T in the fundamental (17) in a unique way.
representation. We now replace the Lagrangian (6) by After fixing the Lagrangian we are ready to move on
to the quantization of the theory under consideration. The
L = ψ̄ i iγ µ (Dµ )i j − mδi j ψi
quantization can be conveniently performed in the Feyn-
= ψ̄ iγ µ Dµ − m ψ. (11) man functional-integral formalism. For simplicity we con-
The new Lagrangian (11) is invariant under the non-Ablian sider a system consisting only of a neutral scalar field φ(x)
local gauge transformation (7) provided Aaµ (x) obeys the with mass m. The Green function is given by the functional
transformation rule integral
i −1 [dφ]φ(x1 ) · · · φ(xn ) exp(i S)
T a Aa
µ = U T a a
A µ − U ∂ µ U U −1 . (12) 0|T [φ̂(x1 ) · · · φ̂(xn )]|0 = ,
g [dφ] exp(i S)
We now show that the Lagranian (11) is invariant under (18)
the local gauge transformation. Note that
where S is the classical action,
(Dµ ψ) = ∂µ − igT a Aaµ ψ
S = d 4 xL. (19)
= U ∂µ + U −1 ∂µ U − igU −1 T a U Aa
µ ψ. (13)
Because Aaµ satisfies the transformation rule (12), we have Here L is the classical Lagrangian density:
P1: GPA Final pages
Encyclopedia of Physical Science and Technology EN013K-625 July 27, 2001 11:32
=i n
[dφ] φ(x1 ) · · · φ(xn ) exp i d x(L + φ J ) .
4 Z [J ] = [d A] det MG
(23) 1 µ a 2
× exp i d 4 x L − G Aµ + Aaµ J aµ .
2α
Hence we obtain
(30)
(−i)n δ n Z [J ]
0|T [φ̂(x1 ) · · · φ̂(xn )]|0 = .
Z [0] δ J (x1 ) · · · δ J (xn ) J =0 In this way we succeeded in exponentiating the constraint.
The resulting exponent is the so-called gauge-fixing term
(24) with gauge parameter α. The following are examples of
The functional integral Z (J ) thus generates all the Green explicit expressions for the matrix MG :
functions. In this sense Z (J ) is called the generating func-
tional for Green functions. 1. Coulomb gauge G µ = (0, ∇):
A straightforward application of Eq. (21) to the case (MG (x, y))ab = (δ ab ∇ 2 − g f abc Ac · ∇)δ 4 (x − y). (31)
of gauge fields suggests that the generating functional for
gauge fields Aaµ is given by 2. Lorentz (covariant) gauge G µ = ∂ µ :
(MG (x, y))ab = δ ab − g f abc ∂ µ Acµ δ 4 (x − y). (32)
Z [J ] = [d A] exp i d x L + Aµ J
4 a aµ
, (25)
3. Axial gauge G µ = n µ (here n µ is a spacelike con-
where L is given by Eq. (11) and [A] is the shorthand stant 4-vector):
notation for
(MG (x, y))ab = (δ ab n · ∂ − g f abc n · Ac )δ 4 (x − y). (33)
d Aaµ . (26)
µ,a 4. Temporal gauge G µ = (1, 0, 0, 0):
(25), L and [A] are gauge invariant. The action
In Eq. (MG (x, y))ab = δ ab ∂0 − g f abc Ac0 δ 4 (x − y). (34)
S = d 4 xL is also gauge invariant. In the following we
set Aa
µ = Aµ
(θ )a
in order to emphasize its dependence on With the generating functional Z [J ] given by Eq. (30) the
θ . Starting with a fixed Aaµ , we obtain a set of A(θ
a
µ
)a
by quantization program for gauge fields is completed.
a
applying to Aµ all the transformations U (θ) belonging to The functional-integral quantization has been success-
group G. According to the above invariance, the action S fully performed for scalar and gauge fields in Eqs. (21) and
is constant for all A(θ)a
µ in this subset and the functional (30). The case of fermion fields needs special care and the
integral Z [0] on this subset of A(θ)a
µ diverges, as the region quantization is performed by the use of the Grassmann
of the integral is infinite. Hence it is sensible to integrate number, which is a set of anticommuting numbers
only once on such A(θ)a µ that belongs to the subset and to ψ j ( j = 1, . . . , N ),
P1: GPA Final pages
Encyclopedia of Physical Science and Technology EN013K-625 July 27, 2001 11:32
2
× exp i d 4 x L − (1/2α) ∂ µ Aaµ d 4 x d 4 yχ a (x)∗ (MG (x, y))ab χ b (y)
× exp i d 4 x(L + A J + χ ∗ ξ
This rule of doing the functional differentiation with
−η(x) should always be kept in mind in defining fermion
Green functions. + ξ ∗ χ + ψ̄η + η̄ψ) , (42)
The following formula for the integral over a Grassman
algebra will be useful in the following, where A(x, y) is a where ξ a and ξ a∗ are source functions (Grassmann num-
certain complex function: bers) for the ghosts, and A J , χ ∗ ξ , and ξ ∗ χ are shorthand
notations for
[dψ][d ψ̄] exp d x d y ψ̄(x)A(x, y)ψ(y) = det A.
4 4
g
= − f abc ∂µ Aaν − ∂ν Aaµ Abµ Acν d 4 z(x − z)S(z − y) = δ 4 (x − y). (65)
2
g 2 abe cde a b cµ dν
− f f Aµ Aν A A These functions Dµν ab
, D ab , and S are propagators of
4
the gluon, Feddeev–Popov ghost and quark, respectively.
− g f abc (∂ µ χ a∗ )χ b Acµ + g ψ̄ T a γ µ ψ Aaµ . (54) Solving the conditions (63)–(65) for Fourier coefficients,
Then Eq. (42) can be rewritten in the following form: we find
Z [J, ξ, ξ ∗ , η, η̄] d 4 k e−ik·x kµ kν
Dµν (x) = δ
ab ab
gµν − (1 − α) 2 ,
(2π )4 k 2 + iε k
= exp i d 4 xL1
(66)
δ δ δ δ δ 4
d k −1 −ik·x
× , , , , D ab (x) = δ ab e , (67)
iδ J aµ iδξ a∗ iδ(−ξ a ) iδ η̄ iδ(−η) (2π ) k + iε
4 2
Z 0 [J, . . .] = Z 0G [J ]Z 0FP [ξ, ξ ∗ ]Z 0F [η, η̄], (56) We can now perform the functional integrations of Aaµ , χ ,
χ ∗ , ψ, and ψ̄ in Eqs. (57)–(59), respectively. We obtain,
G apart from irrelevant constant multiples,
Z 0 [J ] = [d A] exp i d x L0 + A J ,
G 4
(57)
i
Z 0 [J ] = exp
G
d x d y J (x)Dµν (x − y)J (y) ,
4 4 aµ ab bν
Z 0FP [ξ, ξ ∗ ] = [dχ ][dχ ∗ ] 2
(69)
FP ∗ ∗
× exp i d x L0 + χ ξ + ξ χ , (58)
4
In order to obtain Z [J, . . .] perturbatively we first cal- We insert Eqs. (69)–(71) into Eq. (56) and use Eq. (55)
culate Z 0 [J, . . .] for the gluon, Faddeev–Popov ghost, to generate the perturbation series,
P1: GPA Final pages
Encyclopedia of Physical Science and Technology EN013K-625 July 27, 2001 11:32
Propagators
Gluons A k dµν (k)
aµ bν δab
k2
Ghosts χ k −1
a b δab 2
k
Quarks ψ p 1
i j δi j
m− p
Vertices
a1µ1
k1
Three-Gluon k2 −ig f a1 a2 a3 Vµ1 µ2 µ3 (k1 , k2 , k3 )
a2µ2 k3 a3µ3
a1µ1
a2µ2
Four-gluon a4µ4 −g 2 Wµa11 ...a
...µ4
4
a3µ3
aµ
k
Gluon–ghost b c −ig f abc kµ
aµ
Gluon–quark gγµ Tiaj
i j
Loops
d 4 k ab µν
Gluon aµ k δ δ
bν (2π )4 i
Ghost d 4 k ab
a k − δ
b (2π )4 i
Quark d 4 p i j αβ
iα p − δ δ
jβ (2π )4 i
k
Gluon–quark
d4k
(2π )4 i
Gluon–ghost
Symmetry factors
1 , 1
2! 3!
1 ,
2!
P1: GPA Final pages
Encyclopedia of Physical Science and Technology EN013K-625 July 27, 2001 11:32
λ k/
d 4k ∼ lim K . (75)
p k2k2 K →∞
The term LC is called the counter-term Lagrangian, and Equation (90) defines a set of finite renormalization
is used to subtract the divergences. The renormalization {z g (µ , µ)} for varying renormalization scales µ and µ.
constants Z 3 , Z̃ 3 , Z 2 , Z m , and Z g should be determined The finite renormalization (90) can be regarded as a trans-
by adjusting the counter terms so as to cancel overall di- formation. This set of transformations possesses group
vergences appearing in higher order Feynman amplitudes. properties. This group is Abelian. A similar property is
possessed by {z m (µ , µ)}, which is also an Abelian group.
We have yet another kind of finite renormalization
B. Renormalization Group Method which applies to Green functions. For simplicity, let us
According to the renormalization program, we substract consider here the truncated connected Green function for
all the divergences from the Green functions systemati- n gluon legs G̃ tcn ( p, g, m), which is defined by
cally order by order in perturbation theory. In the subtrac-
(2π )4 δ 4 ( p1 + · · · + pn )G̃ 2 ( p1 ) · · · G̃ 2 ( pn )G̃ tc
n ( p, g, m)
tion procedure there exists an arbitrariness of how to define
Fn ( p, gr (µ), m r (µ), µ) dg dm
= 0, = 0. (106)
= −i G̃ rn
tc
( p, gr (µ), m r (µ), µ). (101) dµ dµ
The finite renormalization for Fn is then given by According to Eq. (105), we obtain, from (106), the differ-
ential equations for the renormalized parameters gR and
Fn ( p, gr (µ ), m r (µ ), µ ) mR,
= z n (µ , µ)Fn ( p, gr (µ), m r (µ), µ), (102) dgR
µ =β (107)
where the renormalization factor z n (µ , µ) is defined by dµ
dm R
z n (µ , µ) = [Z 3 (µ )/Z 3 (µ)]n/2 . (103) µ = −m R γm , (108)
dµ
Thus the change of the renormalization scale µ → µ gen-
erates a finite multiplicative renormalization of the Feyn- where we have rewritten m r as m R , i.e., m R = m r , just to
man amplitudes which gives rise to an Abelian group balance the notation, and β and γm are given by
{z n (µ , µ)}. µ d Zg
We have obtained three sets of finite renormalizations β = −εgR − gR , (109)
Z g dµ
{z g (µ , µ)}, {z m (µ , µ)}, and {z n (µ , µ)} which constitute
Abelian groups generated by the scale change µ → µ . The µ d Zm
1/2
Hence the running coupling constant ḡ is given by The second term in the parentheses in the above equation
represents the next to leading order, which corresponds to
g2 1
ḡ 2 = = , (137) the two-loop effect.
1 + 2β0 g 2 t β0 ln (−q 2 /2 ) We find that the renormalized coupling constant tends
where the new momentum scale is defined by to be small as the relevant momentum scale grows. Ac-
cording to this property of asymptotic freedom, we realize
= µ exp −1 2β0 g 2 . (138) that our perturbative calculation is justified for the large-
The momentum scale is often referred to as the QCD momentum scale. Thus, in QCD, perturbation theory is
scale parameter and is the only adjustable parameter in perfectly legitimate in the large-momentum region.
QCD except for the quark masses. In fact, the free param-
eter g present in the original Lagrangian is replaced by C. Operator-Product Expansion
the scale parameter through Eq. (138). The scale pa-
rameter should be determined by comparing the QCD The product of fields at the same space-time point is called
predictions with experimental data. the composite field or composite operator. Strictly speak-
From Eq. (137), we can see that asymptotic freedom ing, the composite operator field is not well defined if one
occurs if β0 > 0. Equation (128) shows that the condition takes a product of fields in a naive way. To make the ar-
for the asymptotic freedom reads gument simpler we shall confine ourselves to the case of
the neutral scalar field φ(x).
Nf < 33/2. Even for free fields we can see show that the composite
Hence quantum chromodynamics enjoys the property of operator is not well defined. Let us consider the time-
asymptotic freedom in so far as the number of quark fla- ordered product of two fields T [φ(x)φ(y)]. Its vacuum
vors is less than 16. It should be noted that, for Nf = 0, expectation value is the propagator of the free field φ(x)
i.e., for a world made up only of gluons, the coefficient (times −i),
β0 is positive definite. It is the presence of quarks that 0|T [φ̂(x)φ̂(y)]|0 = −i(x − y)
can spoil asymptotic freedom. The fundamental origin of
fields, this simple manipulation cannot be generalized in and so on. The operator Ô 0 (x) is usually an identity op-
a straightforward manner. erator. Thus the operator-product expansion serves as a
For interacting fields we also have a simple argument means of defining the composite operator.
showing that the composite operator φ(x)2 is ill defined. Now we shall derive the operator-product expansion of
We take the vacuum expectation value of the composite Eq. (151). We exemplify it in free field theories. One of
operator φ̂(x)2 and find that the simplest examples of the operator-product expansion
and are unimportant in the short-distance region. wµν = d 4 x eiq·x 0|[ jµ (x), jν (0)]|0, (174)
We may transform Eq. (160) into a formula for the cur-
rent commutator [ jµ (x), jν (0)]. For this purpose we first Here p1 ( p2 ) is the momentum of the electron (positron),
note that and q = p1 + p2 . We insert Eq. (171) in Eq. (191). Since
OV , O A , and Oµν are of the form of a normal product, we
T [ jµ (x) jν (0)] − T [ jµ (x) jν (0)]† = ε(x0 )[ jµ (x), jν (0)],
realize that only the first term on the right-hand side of
(165) Eq. (171) contributes to wµν . Hence we have
P1: GPA Final pages
Encyclopedia of Physical Science and Technology EN013K-625 July 27, 2001 11:32
X
to discuss the high-energy e+ e− annihilation process. The
(188) R ratio is closely related to the current commutator at short
It is easily shown that wµν can be rewritten as distances. To show, this we use Eqs. (191), (192) and (196)
to reexpress the R ratio as
large s is found to be small; thus the validity of the per- in Eq. (202) and the expansion parameter ḡ(s) is smaller
turbative calculation of R is guaranteed. than g for large s (s µ2 ) according to the property of
The R ratio expressed in terms of the coupling con- asymptotic freedom.
stant g renormalized at scale µ contains, in general, large We shall show how to calculate a(1) in Eq. (203) in
logs, ln(s/µ2 ), for large s in each term of the perturbative perturbative QCD. The strategy for computing a(1) is first
expansion and hence the effectiveness of the perturbative to calculate the R ratio to order g 2 by using the coupling
calculation is spoiled. According to Eq. (200), however, constant g renormalized at the scale µ and then set µ2 = s
the calculation is drasticaly improved if we employ the to obtain a(1).
coupling constant
√ renormalized at the scale of the rele- The Feynman diagrams contributing to the total cross
vant energy s. section of the e+ e− annihilation are shown in Fig. 3. The
Let us look into the details of the above statement. The total cross section σ receives separate contributions from
R ratio R(s/µ2 , g) is given by the perturbative calculation the final states q q̄, q q̄G, q q̄q q̄, q q̄GG, . . . , with q, q̄, and
in the form G denoting the quark, antiquark, and gluon, respectively.
The contribution up to order g 2 may be represented as in
s
R ,g = Q i2 [1 + a(s/µ2 )g 2 + b(s/µ2 )g 4 + · · ·], Fig. 4 and is split into three parts: the Born cross section
µ2 i σB (Fig. 4a), the virtual (one-loop) gluon contribution σV
(202)
(Fig. 4b, c), and the real-gluon-emission cross section σR
where the index i runs over colors and flavors of quarks. (Fig. 4d). Denoting the full cross section to order g 2 by σ ,
The coefficients a, b, . . . in general include large logs, we have
ln (s/µ2 ). Equation (200) indicates that, if g is replaced
by ḡ(s), these large logs in the coefficient disappear, i.e., σ = Z 22 σB + σV + σR
s = σB + σ̃V + σR , (204)
R , g = Q i2 [1 + a(1)ḡ(s)2 + b(1)ḡ(s)4 + · · ·].
µ2 i where σ̃V = σV +(Z 22 −1)σB with Z 2 the field renormaliza-
(203) tion constant associated with the quark extrnal lines. The
The expression (203) is much better than Eq. (202) in two factor Z 22 in Eq. (204) is necessary since the field renormal-
1/2
respects: its expansion coefficients are smaller than those ization constant Z 2 for each quark external line should
G 2
q q G G
A A A A . . . .
e e
G
q q G G 2
A A A . . . . . . . .
e e
FIGURE 3
P1: GPA Final pages
Encyclopedia of Physical Science and Technology EN013K-625 July 27, 2001 11:32
k1 k2 2 k1 k2
* *
p1 p2 p1 p2
(a) (b) (c)
k3 2
k1 k2
p1 p2
(d)
FIGURE 4
be included in the expression of the renormalized S-matrix Note also that we use the Feynman gauge in the present
elements as a renormalized truncated Green function. calculation, and we calculate Z 2 on the mass shell of
The Born cross section σB can be obtained by neglecting quarks where quarks are massless. Here naturally we meet
electron and quark messes, with the infrared divergence (mass singularity) arising
from the vanishing quark mass. We shall regularize the
4πα 2 2
σB = Qi . (205) mass singularity by means of dimensional regularization.
3s i For p 2 = 0 the quark self-energy part ( p) in the Feynman
The one-loop contribution σV to the e+ e− → q q̄ cross gauge reads
1
q
d Dk 1 1 1
µ = g 2 CF γλ γµ γ λ . (208)
D
(2π) i k 2 k/ − k/1 k/ + k/2 FIGURE 5
P1: GPA Final pages
Encyclopedia of Physical Science and Technology EN013K-625 July 27, 2001 11:32
where µ is the mass scale introduced to make the coupling As can be seen in Eq. (220), Iµν depends only on qµ and
g dimensionless and satisfies the following condition corresponding to the con-
servation of the electromagnetic current:
q = p1 + p2 . (212)
q µ Iµν = 0. (222)
As expected, the ultraviolet divergences present in
Eq. (211) cancel out in σ̃V on account of Eq. (210). In- Hence the general form of Iµν is given by
serting Eq. (211) into Eq. (207) and taking account of
qµ qν
Eqs. (205) and Eq. (210), we find Iµν = I (q )
2
− gµν , (223)
q2
σ̃V = AV σB , (213) with I (q 2 ) = −g µν Iµν /(D − 1). By means of Eq. (223)
we obtain
where AV is given by
ε D − 2 2 µν
αs 4πµ2 cos πε L µν Iµν = q g Iµν . (224)
AV = CF D−1
π s (1 − ε) On the other hand, we find after some calculation
1 3 x12 + x22 − εx32
× − 2− − 4 + O(ε) , (214) g µν G µν = −8(1 − ε) ,
ε 2ε (1 − x1 )(1 − x2 )
(225)
1
Owing to asymptotic freedom, the running coupling con-
3
−ε
3
x12 + x22 − εx32
K = (1−xi ) d xi δ 2− xi . stant of QCD decreases logarithmically as the relevant
0 i=1 i=1
(1 − x1 )(1 − x2 )
mass scale grows. Accordingly, Eq. (237) tells us that the
(229) R ratio in QCD approaches the parton-model prediction
The constant K can be calculated analytically to order ε 0 , as s → ∞.
i.e.,
B. q q̄ Jets in e+ e− Annihilations
4 12
K = − + 10 − 4ε B(1 − ε, 2 − 2ε)B We shall show in the present section that hadronic jet phe-
ε2 ε
nomena are dominated by short-distance effects, so that
× (1 − ε, 1 − ε) + O(ε). (230) perturbative QCD may be safely applied to the discussion
We wish to express Eq. (228) in the form of of jet processes. For this purpose we present the proof
of the cancellation of the infrared divergences in the jet
σR = A R σB , (231) cross sections since the infrared divergences reflect the
long-distance nature of QCD.
with AR to be determined. For this purpose we need to find
Here we confine ourselves to hadronic jets arising from
the expression for the Born cross section σB in D dimen-
the quark–antiquark pair production in e+ e− annihila-
sions. We repeat the calculation of σB in D-dimensional
tions. In order that the hadronic jets flow from the quark–
space-time and obtain
antiquark pair, the quark and antiquark are required not to
4π α 2 2 4π ε 3(1 − ε)(2 − ε) lose too much energy in their direction by the emission
σB = Qi . of gluons and quark pairs. In other words it is necessary
3s i
s (3 − 2ε)(2 − 2ε)
to show that most of the annihilation energy is deposited
(232) along the direction of the quark and antiquark. The first
Comparing Eq. (228) with (232), we find for AR of step of the argument is to give a precise definition of the
Eq. (231) jet cross section and then to prove that the dangerous in-
ε frared (soft and collinear) divergences are controllable in
αs 4πµ2 cos πε size. The jet generated by the above QCD mechanism is
AR = C F
π s (1 − ε) often called the Sterman–Weinberg jet or the QCD jet.
We have calculated in Section IV.A the total cross sec-
1 3 19
× + + + O(ε) . (233) tion of e+ e− annihilations to which the QCD diagrams
ε2 2ε 4 shown in Fig. 3 contribute. The contribution up to or-
Substituting Eqs. (213) and (231) into Eq. (204), we finally der g 2 was represented in Fig. 4. Here we discuss the jet
obtain contibution up to the same order as above. The diagrams
contibuting to jets are the same as in Fig. 4 though the
3 αs
σ = (1 + AV + AR )σB = 1 + CF σB . (234) kinematical region in the final state is restricted. We define
4 π
the two-jet
√ event in such √ a way that most of the available
In this result we clearly see that the infrared divergences energy s, i.e., (1 − ) s with 1, is deposited on
present in AV and AR just cancel out leaving a finite one- two cones of small half-angle δ (see Fig. 6). We consider
loop effect. We thus finally obtain a(s/µ2 ), which was the angular distribution of the final quarks, i.e., the differ-
defined in Eq. (202). ential cross section dσ/d in the solid angle specified
It should be noted here that up to this order, a(s/µ2 ) is by polar and azimuthal angles θ and φ, respectively. The
independent of large logs, ln (s/µ2 ), and so Born contribution (dσ/d )B corresponding to Fig. 4a is
given by
a(s/µ2 ) = a(1). (235)
dσ α2 2
Under this circumstance we may, according to Eq. (200), = Q i (1 + cos2 θ ). (238)
d 4s
simply replace αs in Eq. (234) by ᾱ s , the running coupling B i
e e
q
FIGURE 6
k3x3
k1
x1
13
23
p1 p2
k2
x2
FIGURE 7
ζmax
dσ 3 2
1
ρ
= (1 + cos2 θ )α 2 αs CF Q i2 K R̄ = d x3 x3 (1 − x3 ) dζ1 , (251)
d R 16π s ζmin (1 − x3 ζ1 )2
i
2ε where ζmax and ζmin are respectively the maximum and
4πµ (1 − ε)2
× K R , (247) minimum values of ζ1 given in Eq. (246), and ρ defined
s (3 − 2ε)(2 − 2ε)
in Eq. (249) is rewritten in terms of ζ1 and x3 as
where
(1 − x3 )2 + (1 − x3 (2 − x3 )ζ1 )2
3
3
ρ=
x32 (1 − x3 )ζ1 (1 − ζ1 )
. (252)
−ε
KR = (1 − xi ) d xi δ 2 − xi ρ, (248)
R i=1 i=1 After some calculation we find
x12+ − x22 εx32 7 π2
ρ= . (249) K R̄ = 2 4 ln δ ln + 3 ln δ − + + O(δ, ).
(1 − x1 )(1 − x2 ) 4 3
(253)
Note that the slight deviation from the above angular
distribution is expected if a precise calculation is made.
Probably the easiest way of calculating K R in Eq. (248) is 1
the following: We first note that R
1
K R = K − K R̄ , (250)
1 sin2
where K is the quantity corresponding to K R integrated R
over the whole phase space and is already given in
Eq. (230) and K R̄ is obtained by integrating the integrand
of Eq. (248) over the region R̄ which is obtained by elim-
inating R from the whole phase space as shown in Fig. 8.
Since there is no infrared singularity in the region R̄, we sin2
x3
may put ε = 0 in K R̄ and then the calculation turns out to
be straightforward. Changing the variables to ζ1 and x3 , 0 1
we obtain FIGURE 8
P1: GPA Final pages
Encyclopedia of Physical Science and Technology EN013K-625 July 27, 2001 11:32
Hence we have and the hadronic pair jets appproximately in the direction
of the quark and antiquark. Thus the angular distribution
K R = 4B(1 − ε, 2 − 2ε)B(1 − ε, 1 − ε)
of the hadronic pair jets reflects the spin-1/2 nature of the
1 3 17 π 2 constituents.
× − + − − 4 ln δ ln − 3 ln δ .
ε2 ε 4 3 A similar argument as above may be made for hadronic
jets originating from gluon sources. It has been shown
(254)
that the same infrared cancellation as in the quark jets
Exactly in the same way as in Eq. (232) we calculate the takes place in the gluon jets. Hence the hadronic jet from
Born contribution to dσ/d in D dimensions, which re- the gluon should also be observed experimentally. In fact,
sults in clear signals of three jets from the quark, antiquark, and
ε gluon have been observed in e+ e− annihilation processes.
dσ α2 2 4π
= Q i (1 + cos θ )
2
d B 4s i
s
3(1 − ε)(2 − ε) SEE ALSO THE FOLLOWING ARTICLES
× . (255)
(3 − 2ε)(2 − 2ε)
GREEN’S FUNCTIONS • GROUP THEORY, APPLIED • PER-
Using Eqs. (247), (254), and (255) we finally obtain
TURBATION THEORY • QUANTUM MECHANICS
ε
dσ dσ αs 4πµ2 cos πε 1 3
= CF +
d R d B π s (1 − ε) ε 2 2ε
BIBLIOGRAPHY
13 π 2
+ − − 4 ln δ ln − 3 ln δ . (256)
2 3 Bromley, D. A., and Schäfer, A. (1994). “Quantum Chromodynamics,”
Springer-Verlag, Berlin.
Combining Eqs. (238), (239), and (256), we see that the
Close, F. (1997). “The Cosmic Onion: Quarks and the Nature of the
infrared divergences just cancel out, and find Universe,” Springer-Verlag, Berlin.
dσ dσ αs Fernandez, F. M. (2001). “Introduction to Perturbation Theory in Quan-
= 1 − CF 4 ln δ ln tum Mechanics,” CRS Press, Boca Raton, FL.
d d B π Forshaw, J. R., and Ross, D. A. (1997). “Quantum Chromodynamics and
the Pomeron,” Cambridge University Press, Cambridge.
π2 5
+ 3 ln δ + − . (257) Henyey, F. (1994). “Quantum Chromodynamics,” Springer-Verlag,
3 2 Berlin.
Manohar, A. V., and Wise, M. B. (2000). “Heavy Quark Physics,”
According to Eq. (257), we realize that the order-αs Cambridge University Press, Cambridge.
correction to the two jets from the quark pair is control- Reinhardt, H., and Alkofer, R. (1995). “Chiral Quark Dynamics,”
lable in size within the framework of perturbation theory Springer-Verlag, Berlin.
P1: GPB/GJK P2: FYD Final Pages Qu: 00, 00, 00, 00
Encyclopedia of Physical Science and Technology EN013A-949 July 26, 2001 19:39
345
P1: GPB/GJK P2: FYD Final Pages
Encyclopedia of Physical Science and Technology EN013A-949 July 26, 2001 19:39
A. Landau Levels
The Hamiltonian for a single electron moving in two di-
mensions in a perpendicular magnetic field is given by
2
1 e
H= p+ A , (4)
2m b c
where, A is given by ∇ × A = B = B ẑ and m b is the band
mass of the electron. The solution of this problem shows
that the electron kinetic energy is quantized into Landau
levels, as shown in Fig. 3B,C. The eigenenergies are given
by E n = −h ωc (n + 1/2), where ωc = eB/m b c is the cyclo-
tron frequency, and n = 0, 1, . . . is the Landau level index.
The degeneracy of each Landau level for a single spin is
given by B/φ0 per unit area, where φ0 = hc/e is called the
flux quantum.
Now consider many independent electrons, with density
ρ per unit area. The ground state is obtained by filling up
the lowest energy single particle orbitals, with the condi-
tion that no orbital is occupied by more than one electron,
as required by the Pauli principle. The number of filled
Landau bands,
ρφ0
ν= (5)
B
is called the filling factor (defined so that a Landau level
filled with both spins has a filling factor of 2). The plateau
with RH = h/ne2 is centered at ν = n.
The many-particle ground state is infinitely degenerate
in general, because all arrangements of electrons in the
topmost partially filled Landau level produce the same
energy. The exception is at an integral filling factor, ν = n.
The ground state here is unique (Fig. 3B), with a gap to
excitations. This fact is responsible for the IQHE plateau
with RH = h/ne2 . The IQHE is a consequence of the quan-
tization of the electron energy into Landau levels.
due to the applied Hall voltage. Let us now consider a full The first several members of each sequence are well estab-
Landau level, that is, a state in which all single particle lished, in that quantized plateaus have been observed. The
energy levels with momenta p− < px < p+ are occupied, last few are seen only through resistance minima, but there
where p− and p+ are the momenta at the edges of the sam- is little doubt that the corresponding Hall plateaus will de-
ple. The current in the x direction is obtained by adding velop upon further improvement in sample quality. FQHE
the individual currents: at f also implies FQHE at 1 − f due to particle-hole sym-
metry in the lowest Landau level. In addition to these frac-
dpx
I =e vx . (6) tions, FQHE has also been observed with f = 5/2, the
2π h
only exception to the “odd-denominator rule” in a single
With vx = (1/m b ) px + (e/c)A x = ∂H /∂px = ∂E/∂px , layer system.
it follows that I = (e/ h)(µ+ − µ− ) = (e2 / h)VH , where
µ+ and µ− are the chemical potentials at the edges.
The current is independent of the LL index, so the total B. Model Hamiltonian
current for n filled Landau levels is I = (ne2 / h)VH . Now Because only the integral QHE is possible for independent
consider a sample with disorder, but connected to the electrons, interactions are clearly responsible for produc-
current source via ideal, disorder-free leads. So long as ing gaps at fractional filling factors. One therefore must
the electrons at the chemical potential, which are at one look for the solutions of the more complete problem of in-
edge of the sample, are not back-scattered, the current is teracting electrons, defined by the Schrödinger equation
not degraded and the Hall resistance remains quantized at H = E with
RH = h/ne2 , independent of disorder. The back-scattering 1 −h e
2
e2 1
is strongly suppressed because the states carrying currents H= ∇ j + A(r j ) +
in opposite directions are localized on opposite edges, j
2m b i c j<k |r j − rk |
and therefore are exponentially weakly coupled.
+ U (r j ) + gµB · S. (8)
j
III. FRACTIONAL QUANTUM HALL The first term on the right-hand side is the kinetic energy,
EFFECT the second term is the Coulomb interaction energy, the
third term is a one-body potential incorporating the effects
A. Phenomenology of the uniform positive background and disorder, and the
In 1982, Tsui, Stormer, and Gossard observed a plateau last term is the Zeeman energy. The parameter is the
quantized at RH = h/ 13 e2 . In the subsequent years, as the dielectric constant of the background material.
result of a tremendous improvement in the quality of Insofar as the conceptual foundation of the FQHE is
samples, a host of new fractions were observed. Today, concerned, it is convenient to neglect disorder, and con-
the number of observed fractions below unity ( f < 1) sider the limit of large B, so both the cyclotron and the
stands at approximately 50 and increasing. The plateau at Zeeman energies are large compared to the interaction
RH = h/ f e2 is seen in the vicinity of filling factor ν ≈ f . energy. The electrons are then fully polarized and con-
The longitudinal resistance RL exhibits activated behav- fined to the lowest LL, making the kinetic and Zeeman
ior, as in IQHE, indicating the existence of a gap in the energies irrelevant constants. We thus end up with the ide-
excitation spectrum. The observed fractions appear in se- alized model of fully polarized electrons in the lowest LL
quences of the form with the Hamiltonian (suppressing the interaction with the
n background)
f = . (7)
2pn ± 1 e2 1
HLLL = . (9)
Some of the fractions observed to date are: j<k |r j − rk |
n 1 2 3 10 This is the simplest and the cleanest model containing the
f = = , , ,...
2n + 1 3 5 7 21 essential physics of the FQHE. The strongly coupled, non-
n 2 3 4 10 perturbative nature of the problem can already be seen by
f = = , , ,... noting that there is no small parameter in the problem:
2n − 1 3 5 7 19
HLLL contains only one energy scale, set by the Coulomb
n 1 2 6
f = = , ,... interaction. All states are degenerate in the absence of in-
4n + 1 5 9 25 teraction, and the FQHE results as soon as the interaction
n 2 3 6 is turned on, no matter how small its strength (or, how large
f = = , ,... .
4n − 1 7 11 23 ). Standard perturbative treatments are not useful here.
P1: GPB/GJK P2: FYD Final Pages
Encyclopedia of Physical Science and Technology EN013A-949 July 26, 2001 19:39
The FQHE is a true many body phenomenon, in which which is relevant to FQHE at f = 1/5 observed later on.
strongly interacting electrons behave in a correlated man- However, the exponent is not allowed to take noninteger
ner to produce rich and nontrivial, yet amazingly sim- or even-integer values, and the observations of numer-
ple behavior. The solution of this Schrödinger equation ous fractions other than 1/m subsequent to Laughlin’s
should clarify the physics responsible such behavior. By theory pointed to the existence of a more general
analogy to the IQHE, the plateau at RH = h/ f e2 with structure.
fractional f originates due to the opening of a gap at
ν = f . (For such a gapped state, when the filling factor is
moved away from ν = f , some “defects” are created, but D. Composite Fermions
they are pinned by disorder, thus giving rise to a plateau A more general theory was put forth by the author in 1989.
at RH = h/ f e2 .) The goal of theory is therefore to ex- The fundamental building block of this theory is called the
plain the origin of gaps in a partially filled Landau level, composite fermion, which is the bound state of an electron
specifically why they appear only at certain sequences of and an even number of quantum mechanical vortices. Ac-
odd-denominator fractions. A satisfactory explanation of cording to this theory, strongly interacting electrons in the
the odd-denominator rule must necessarily elucidate why lowest LL capture vortices to turn into weakly interacting
there is no FQHE at even-denominator fractions (with the composite fermions. The ensuing investigations revealed
exception of f = 5/2). Even though the problem looks that the FQHE was only one manifestation of composite
intractable at first, the trail of experimental clues has led fermions, which describe a superstructure encompassing
to a simple, yet extremely accurate solution to the prob- other phenomena as well. Experimenters observed their
lem that is also in good agreement with experimental Fermi sea, their Shubnikov-de Haas oscillations, and their
observations. semiclassical cyclotron orbits; they also measured the par-
ticles’ charge, spin, statistics, mass, and magnetic moment
C. Laughlin’s Theory (Stormer and Tsui, 1996).
The composite fermion (CF) theory proposes the fol-
The first observed fraction was f = 1/3. Soon there- lowing wave functions at any arbitrary filling factor ν:
after, in 1983, Laughlin noted that the single parti-
cle wave function in lowest Landau level has the form
N
move about, the vortices carried by them generate phases The unusual character of composite fermion ought to be
which partly cancel the Aharonov-Bohm phases due to emphasized. It is a collective particle, with the definition
the external magnetic field, and the composite fermions in of one composite fermion involving all particles in the
effect experience a much weaker magnetic field. Because system. A composite fermion may live only inside the CF
a closed path around a vortex produces, by definition, a liquid. It is an inherently quantum mechanical object, a
phase of 2π , a vortex is effectively equivalent to a flux “quantum particle,” because it is the product of the union
quantum, which also produces the same Aharonov Bohm of an electron and quantum mechanical phases (vortices).
phase for a closed path around it. (For this reason, the com- The fluids of composite fermions are quantum fluids not
posite fermion is often envisioned as an electron bound to only because composite fermions themselves are quantum
2 p flux quanta.) Therefore, each vortex cancels one flux particles, but also because they involve a quantization of
quantum of the external magnetic field, giving the effec- the composite fermion orbits into CF Landau levels. The
tive magnetic field: composite fermion also has a topological character due
to its integrally quantized vorticity (=2 p). It is indeed
B ∗ = B − 2 pρφ0 . (13) surprising that the composite fermions behave as ordinary
In effect, each electron absorbs 2 p flux quanta of the ex- particles to a large degree.
ternal field to become a composite fermion (Fig. 6). (It In 1991, A. Lopez and E. Fradkin implemented the
must be understood here that an external magnetometer physics of composite fermions in a Chern-Simons field
will measure B and not B ∗ . There is no real “Meissner ef- theoretical framework. It has been further developed by a
fect” in the FQHE. However, B ∗ is the real magnetic field number of groups over the years. The Hamiltonian formu-
for composite fermions; this is the field that would be ob- lation of R. Shankar and G. Murthy (1997) obtains many
tained if the composite fermions themselves are used to essential features at the Hartree-Fock level.
measure the field.) Treating composite fermions as inde- To summarize: The strongly interacting electrons in the
pendent, their filling factor is ν ∗ = ρφ0 /|B ∗ |, and Eq. (13) lowest Landau level transforms into weakly interacting
can be transcribed into the relation in Eq. (12). The − sign composite fermions in a reduced magnetic field B ∗ . The
in Eq. (12) corresponds to the situation when B ∗ points lowest electronic Landau level thus splits into energy lev-
opposite to B. els of composite fermions (Fig. 3).
Laughlin’s theory of inverse-odd-integer states falls
naturally within
the CF theory. The wave function E. FQHE
1/(2 p+1) = 1 j<k (z j − z k )2 p is identical to Laughlin’s
wave function, which can be seen by noting that the wave Plotting the FQHE data as a function of B ∗ brings out its
function of the fully occupied lowest LL is given by striking similarity to the IQHE data plotted as a function
of B, as seen in Fig. 7. The difference between the two
1 2 is no more significant than that between the IQHE traces
1 = (z j − z k ) exp − 2 |z i | . (14)
j<k
4l i in different samples demonstrating that the strongly
correlated liquid of interacting electrons at B behaves as
1/(2 p+1) is interpreted as one filled Landau level of com- a weakly interacting system of fermions at B ∗ . A similar
posite fermions. correspondence is obtained for negative values of B ∗ , and
also at lower electron filling factors, where composite
fermions have four or more vortices bound to them.
There is also a correspondence between the Hall
plateaus at B and B ∗ , but the Hall resistances are not the
same at ν ∗ and ν. For example, at ν = 1/2, where B ∗ = 0,
the Hall resistance is RH = h/( 12 e2 ), but at B = 0 we have
RH = 0. The difference is explained by noting that the
composite fermions respond to a combination of the ex-
ternal Hall voltage and the Hall voltage induced by the
vortex current tied to the charge current in the CF state,
but the latter is not measured by the external voltmeter.
(Just like the effective magnetic field, the induced Hall
electric field is also fully internal to composite fermions.)
FIGURE 6 Capturing two flux quanta converts each electron into The FQHE of electrons is understood as the IQHE
a composite fermion moving in a reduced effective magnetic field. for composite fermions. The integral fillings ν ∗ = n of
(Adapted from Jain, 2000.) composite fermions correspond to electron filling factors
P1: GPB/GJK P2: FYD Final Pages
Encyclopedia of Physical Science and Technology EN013A-949 July 26, 2001 19:39
FIGURE 7 The top panel shows the IQHE of electrons. The bot-
tom panel shows the FQHE of electrons plotted as a function of
B ∗ , starting from B ∗ = 0 (n = 1/2). A close correspondence be-
tween the prominent features is manifest. (Adapted from Clark,
1986, and Du, 1993.)
Kohmoto, M. (1985). “Topological Invariant and the Quantization of the Scarola, V. W., Park, K., and Jain, J. K. (2000). “Rotons of composite
Hall Conductance,” Ann. Phys. 160, 343–353. fermions: Comparison between theory and experiment,” Phys. Rev.
Koulakov, A. A., Fogler, M. M., and Shklovskii, B. I. (1996). “Charge B61, 13064–13072.
Density Wave in Two-Dimensional Electron Liquid in Weak Magnetic Shankar, R., and Murthy, G. (1997). “Towards a Field Theory of Frac-
Field,” Phys. Rev. Lett. 76, 499–502. tional Quantum Hall States,” Phys. Rev. Lett. 79, 4437–4440.
Lam, P. K., and Girvin, S. M. (1984). “Liquid-solid transition and the Sondhi, S. L., Karlhede, A., Kivelson, S. A., and Rezayi, E. H. (1993).
fractional quantum-Hall effect,” Phys. Rev. B30, 473–475. “Skyrmions and the crossover from the integer to fractional quantum
Laughlin, R. B. (1981). “Quantized Hall conductivity in two dimen- Hall effect at small Zeeman energies,” Phys. Rev. B47, 16419–16426.
sions,” Phys. Rev. B23, 5632–5633. Stormer, H. L., and Tsui, D. C. (1996). Chapter 10 in Das Sarma, S.,
Laughlin, R. B. (1983). “Anomalous Quantum Hall Effect: An In- Pinczuk, A. (eds.) “Perspectives in Quantum Hall Effects,” Wiley,
compressible Quantum Fluid with Fractionally Charged Excitations,” New York.
Phys. Rev. Lett. 50, 1395–1398. Suen, Y. W., et al. (1992). “Observation of a ν = 1/2 fractional quantum
Lilly, M. P., et al. (1999). “Evidence for an Anisotropic State of Two- Hall state in a double-layer electron system,” Phys. Rev. Lett. 68,
Dimensional Electrons in High Landau Levels,” Phys. Rev. Lett. 82, 1379–1382.
394–397. Thouless, D. J., Kohmoto, M., Nightingale, M. P., and den Nijs, M.
Lopez, A., and Fradkin, E. (1991). “Fractional quantum Hall effect and (1982). “Quantized Hall Conductance in a Two-Dimensional Periodic
Chern-Simons gauge theories,” Phys. Rev. B44, 5246–5262. Potential,” Phys. Rev. Lett. 49, 405–408.
Moore, G., and Read, N. (1991). “Nonabelions in the fractional quantum Tsui, D. C., Stormer, H. L., and Gossard, A. C. (1982). “Two-
Hall effect,” Nucl. Phys. B360, 360–396. Dimensional Magnetotransport in the Extreme Quantum Limit,” Phys.
Niu, Q., Thouless, D. J., and Wu, Y. S. (1985). “Quantized Hall conduc- Rev. Lett. 48, 1559–1562.
tance as a topological invariant,” Phys. Rev. B31, 3372–3377. Von Klitzing, K., Dorda, G., and Pepper, M. (1980). “New Method for
Park, K., and Jain, J. K. (2000). “Two-Roton Bound State in the Frac- High-Accuracy Determination of the Fine-Structure Constant Based
tional Quantum Hall Effect,” Phys. Rev. Lett. 84, 5576–5579. on Quantized Hall Resistance,” Phys. Rev. Lett. 45, 494–497.
Pinczuk, A., Dennis, B. S., Pfeiffer, L. N., and West, K. (1993). “Obser- Wen, X. G. (1990). “Chiral Luttinger Liquid and the Edge Excitations in
vation of collective excitations in the fractional quantum Hall effect,” the Fractional Quantum Hall States,” Phys. Rev. B41, 12828–12844.
Phys. Rev. Lett. 70, 3983–3986. Willett, R. L., Ruel, R. R., West, K. W., and Pfeiffer, L. N. (1993).
Prange, R. E., and Girvin, S. M. (eds.) (1990). “The Quantum Hall “Experimental demonstration of a Fermi surface at one-half filling of
Effect,” Springer-Verlag, New York. the lowest Landau level,” Phys. Rev. Lett. 71, 3846–3849.
Saminadayar, L., Glattli, D. C., Jin, Y., and Etienne, B. (1997). “Obser- Willett, R. L., et al. (1987). “Observation of an even-denominator quan-
vation of the e/3 fractionally charged Laughlin quasiparticle,” Phys. tum number in the fractional quantum Hall effect,” Phys. Rev. Lett.
Rev. Lett. 79, 2526–2529. 59, 1776–1779.
P1: GNH Final Pages Qu: 00, 00, 00, 00
Encyclopedia of Physical Science and Technology EN013A-626 July 26, 2001 19:40
Quantum Mechanics
Albert Thomas Fromhold, Jr.
Auburn University
. 359
P1: GNH Final Pages
Encyclopedia of Physical Science and Technology EN013A-626 July 26, 2001 19:40
Time-independent Schrödinger equation Eigenvalue are not merely perturbations, as is the case for the gravi-
equation used to obtain energy eigenvalues and energy tational forces between planets orbiting the sun.
eigenfunctions for a system. However, it was expected that the hydrogen atom, con-
Uncertainty relation Mathematical form relating the taining only a single electron and thus free of the repul-
maximum precision of measurement of some physical sive Coulomb force between electrons in two-electron and
quantity, such as position or energy, to the precision of many-electron atoms, should be easily amenable to treat-
some related quantity, such as momentum or time. ment by classical mechanics. The model is a simple picture
Wavefunction A mathematical form, usually complex, in which the single electron orbits the far more massive
used to deduce the probability density. proton nucleus under the action of a centripetal force pro-
Wavelike Nonlocalized and periodic, with the capability vided by the attractive electrical force between electron
of interacting constructively or destructively. and nucleus.
Wave–particle duality Coexistence of wavelike and par-
ticlelike aspects in a physical entity, such as an electron.
B. Classical-Mechanical Treatment
of the Single-Electron Ion
QUANTUM MECHANICS is a theory that is capable of Let us consider the more general case of a single electron
predicting the behavior of atomic and subatomic systems. atom or ion, where the charge of the nucleus is Z e, with
In fact, it is the only theory to date that is adequate for the the quantity e representing the magnitude of the electron
microscopic domain in nature. Quantum mechanics not charge. For the hydrogen atom, Z = 1. The electrical force
only correctly predicts the results of physical observations between a nucleus of charge Z e separated by a distance r
in the microscopic world where classical physics is often from an electron of charge −e has magnitude
quite unsuccessful but also leads to valid predictions in
the macroscopic world experienced by human senses. K Z e2
FE = , (1)
r2
where K is a constant that for SI units has the value
I. CLASSICAL MODEL OF THE ATOM
1
K = , (2)
A. Structure of the Atom 4π ε0
Ernest Rutherford’s (1871–1937) interpretation of his ex- where ε0 is the electric permittivity of free space having
tensive scattering experiments in 1911 gave overpowering the value 8.854 × 10−12 farad/meter (F/m). Let us make
evidence that atoms consist of a dense, positively charged the assumption here that the nucleus is stationary and the
nucleus surrounded by a cloud of electrons. The elec- electron travels around the nucleus. Although this view
tron had been discovered a few years earlier in 1887 by is absolutely correct in classical mechanics only if the
Joseph John Thomson (1856–1940), who attributed a def- nucleus is infinitely massive, it can be shown to be ap-
inite charge-to-mass ratio to the particle. Before the dis- proximately correct whenever the nucleus is much more
covery of the wave-like properties of the electron in 1927 massive than the electron, and this is satisfied for all one-
by George Paget Thomson (1892–1975), the son of J. J. electron ions. Thus, we consider the radius of the electron
Thomson, it was expected that the mechanical properties orbit to be equal to the distance separating the electron and
of atoms could readily be explained by applying classi- the nucleus; corrections for deviations caused by the finite
cal mechanics in a straightforward way to this model. In nuclear mass can easily be incorporated into the treatment
fact, the successful model of planetary motion around the at a later stage. The electrical force provides the centripetal
more massive sun, under the action of gravitational forces force mv 2 /r , which causes the electron to travel in the hy-
between the planets and the sun, and the perturbations to pothesized circular orbit, m being the electron mass and v
such motion due to the gravitational forces between plan- being the electron speed, so that
ets, provide the closely related classical analog model for
mv 2 K Z e2
describing the motion of electrons about a single, massive = . (3)
nucleus: the electrical forces between charged particles r r2
replace the gravitational forces in the solar system. This at once gives the following nonrelativistic expression
The forces between electrons are repulsive instead of for the kinetic energy K of the electron:
attractive, and these repulsive forces are of the same order
of magnitude as the attractive forces between an electron 1 2 K Z e2
K = mv = . (4)
and the nucleus. Therefore these forces between electrons 2 2r
P1: GNH Final Pages
Encyclopedia of Physical Science and Technology EN013A-626 July 26, 2001 19:40
The potential energy P associated with the Coulomb The kinetic energy in terms of the angular momentum is
force is given by then given by
−K Z e2 1 2 m K 2 Z 2 e4
P = , (5) K = mv = (14)
r 2 2L 2
so that Equations (1)–(14) are based entirely on classical
mechanics.
1
K = − P . (6)
2
The sum of potential and kinetic energies gives the total C. Electromagnetic Fields Produced
energy T : by an Orbiting Electron
Let us explore the classical viewpoint a bit further by
T = P + K = −2K + K = −K . (7)
means of the planetary model of the one-electron atom,
The kinetic energy is intrinsically positive, as can be noted where a single electronic charge −e is considered in mo-
from Eq. (4), so that the total energy is negative. This tion about an equal-magnitude charge of opposite sign due
means that the electron is bound to the nucleus. As the to the proton. This constitutes a rotating electric dipole,
electron separation from the nucleus increases, the po- although admittedly the center of rotation is located very
tential energy algebraically increases toward zero. If the near one end of the dipole. This rotating dipole produces
electron cannot separate to an arbitrarily large distance a time-varying electric field at distant points in space. The
from the nucleus, we say that it is bound to the nucleus. electric field so produced is cyclic, being the same as the
The centripetal force relation given by Eq. (3) relates rotation frequency of the dipole. An oscillating electric
the electron speed v to the radius r of the classical orbit, field is thus produced at the observation point, the fre-
so that quency of the field ν being equal to the frequency of rota-
tion of the dipole. According to classical electrodynamics,
K Z e2
r= (8) the energy associated with this oscillating electric field
mv 2 can propagate outward in space or be absorbed at the field
or, equivalently point (e.g., by accelerating the conduction electrons in a
1/2 metal).
K Z e2
v= . (9)
mr
D. Electromagnetic Radiation Predictions
The rotation angular frequency ω is given by of Classical Physics
v K Z e2 1/2 The energy radiated away or absorbed from a rotating
ω= = . (10)
r mr 3 dipole source, as previously described, must come at the
expense of the potential energy of the configuration of
This development not only allows the kinetic energy to be
the two charges if it is not supplied by some source con-
expressed in terms of the radius, as given by Eq. (4), it
nected with the rotation of the dipole. For a hydrogen
also allows the angular momentum L = pr = mvr for an
atom, there is no such source. The consequent decrease
electron in a circular orbit to be expressed in terms of the
in total energy of the electron in the one-electron atom
radius,
would mean that the total energy becomes more negative,
K Z e2 1/2 which corresponds to an increase in the kinetic energy of
L = pr = mvr = mr = (K Z e2 mr )1/2 . the electron in accordance with Eq. (7). This corresponds
mr
(11) to a decrease in the radius of the electron orbit and to an
Equivalently, this gives the radius in terms of the angular increase in the rotation frequency according to Eqs. (4)
momentum and (10). On the basis of this classical model of radiation,
the frequency of the radiated electromagnetic wave would
L2
r= , (12) then increase. The changes which occur would thus be
K Z e2 m continuous; that is, since there could be arbitrary values
which can then be substituted into Eq. (9) to give the speed for the radius of each electron orbit, the electron would be
in terms of the angular momentum capable of having any one of a continuous range of values
of kinetic energy. This speed could be changed continu-
K Z e2 K Z e2 m 1/2 K Z e2
v= 2
= . (13) ously by adding or extracting arbitrarily small quantities
m L L of energy. As the speed and frequency of rotation change,
P1: GNH Final Pages
Encyclopedia of Physical Science and Technology EN013A-626 July 26, 2001 19:40
corresponding continuous changes occur in the radius of all machines in the world could be run for a day, a month,
the orbit. According to classical electrodynamics, an ac- a year, or even as long as one wishes, by allowing an elec-
celerated charge radiates energy, so the potential energy of tron to come closer and closer to a proton in a controlled
the electron would steadily decrease. This ultimately leads manner. This means that in the original creation of the op-
to a catastrophe: there would be theoretically no limit to posite charges, an infinite amount of energy was expended.
the process, even as the total energy of the electron–proton Alternately, we could imagine an explosion dwarfing even
system approached negative infinity, with the consequence that of the most powerful fission or fusion weapon avail-
that an infinite amount of energy would be radiated away. able today coming about by merely allowing or triggering
the collapse of a configuration of two point charges of op-
posite sign. Such incomprehensible conclusions resulting
E. Logical Failure of Classical Mechanics
from pushing the classical model to its logical endpoint
Clearly, the results deduced on the basis of the classical are sufficient in themselves to force us into the realization
model are unreasonable. In addition, the predictions do not that nature actually must behave under constraints addi-
correspond in any way to the experimental observations of tional to those contained within the formalism of classical
optical spectra, which are not continuous, but instead con- mechanics.
tain many sharp spectral lines. Hence, this classical model To bring in evidence bearing on this matter, electron–
cannot be used to describe the mechanical properties of positron pairs can be produced from gamma rays, and it is
an atom. known from these experiments on pair production that an
The true state of affairs is that all atoms have a low- infinite amount of energy is not required to separate the op-
est energy state, labeled the ground state by Niels Henrik positely charged particles so produced. In fact, the γ -ray
Bohr (1885–1962), for which no further energy emission energy required is only of the order of the rest mass ener-
is possible. Furthermore, even while in some given higher- gies of the electron and the positron. Likewise, pair annihi-
energy configuration, the atom does not emit energy con- lation does not lead to the emission of an infinite quantity
tinuously; excited states emit energy only sporadically, of γ -ray energy, so again we must conclude that the nega-
each such event being accompanied by a sudden jump tive Coulomb energy must be restricted by nature to have
to a lower-energy configuration. Bohr called the various a finite magnitude. This end could be accomplished by re-
discrete energy states of an atom stationary states, since stricting the separation distance between opposite charges
such states are stable until the time when a transition to a to some minimum, nonzero value corresponding, perhaps,
lower-energy state actually takes place. to the electron–proton separation distance in the so-called
The logical deduction from the previous discussion only ground-state configuration of the hydrogen atom.
can be that the classical approach, which is so successful
for describing the motion of the planets, fails completely
F. New Foundations of Mechanics
for the hydrogen atom. So much for arguing by analogy!
Moreover, the failure of classical mechanics for the hy- It is important to note that classical mechanics does pro-
drogen atom is manifested not in our lack of ability to see vide a generally adequate theoretical description of motion
and follow the electron, because it might be too tiny to be for all objects that can be seen in an ordinary way traveling
seen with the eye or even with a microscope, but instead in at ordinary speeds. By “ordinary,” we mean observable to
experimental measurements yielding data such as the light the human eye and traveling at speeds low enough to be
spectrum emitted by excited gases of atoms. The observed able to follow the position of the object with the eye. In
spectra can in no way be explained without additional as- fact, the theory is found to apply even in the domain of far
sumptions quite foreign to classical mechanics. smaller particles, such as can be seen only with the aid of
Even considered statistically, an attractive, inverse an optical microscope, as long as the speed is well below
square force, as typified by Eq. (1) for the Coulomb elec- that of light propagation. However, the extremely versa-
tric force, leads to a negative potential energy that varies tile framework provided by classical mechanics yields in-
inversely with the separation distance between charges accurate results in two domains, which can be distinctly
(or of masses, in the case of gravitational forces). The po- different: (i) particles traveling at speeds approaching the
tential energy is conservative, meaning that a decrease in speed of light and (ii) particles having very tiny masses.
separation distance yields energy that can be extracted by First of all, objects traveling at speeds v of the order
an external agency or converted into heat. If the separation of 0.1c or larger manifest marked departures from Isaac
distance can be imagined to decrease to zero, then the po- Newton’s (1642–1727) predictions based on a fixed mass.
tential energy approaches negative infinity. This implies This is due to the relativistic dependence of mass on veloc-
that an infinite amount of energy simultaneously would ei- ity. Second, very small mass particles, such as the smaller
ther be extracted or be converted into heat. Conceptually, atoms and nuclei (e.g., the hydrogen and helium atoms,
P1: GNH Final Pages
Encyclopedia of Physical Science and Technology EN013A-626 July 26, 2001 19:40
proton, neutron, and α-particle) and especially the even for the present development are presented here. It is vital
less massive electrons, all exhibit diffraction properties to point out that the profound prediction by Einstein of the
typical of wavelike phenomena. variation of the measured mass of an object with its speed
In both of these domains—namely, the domain of very has immediate implications for motion which are quite
fast particles and the domain of very tiny particles— contrary to those predicted by Newton’s second law, ap-
entirely new forms of mechanics were erected that had plied in the restricted sense of a fixed, unvarying mass for
essentially different premises than those inherent in clas- any given object. Contained within the theory of special
sical mechanics. This was necessary because classical me- relativity is not only the experimentally observed mass
chanics simply fails to describe and correctly predict the variation with speed but also the concept of the equiva-
motions and trajectories of objects under such conditions. lence of mass and energy. The latter concept, which is di-
The domain listed in (i), very high-speed motion, requires rectly verified in electron–positron pair production, has of
the use of Albert Einstein’s (1879–1955) theory of special course far-reaching consequences with regard to present-
relativity (1905). The domain listed in (ii), very small- day energy production.
mass particles, requires the use of the theory of quantum Defining the velocity v as a vector with magnitude equal
mechanics, the subject of this article. to the speed and pointing in the direction of motion, the
Special relativity and quantum mechanics constitute a momentum p is then simply
pair of theories that enable one to make accurate calcula-
p = mv. (15)
tions in the separate domains listed. The relativistic me-
chanics provided by Einstein’s theory of special relativity Newton considered the mass m to have a fixed value m 0 .
accurately describes and predicts motions of particles and The acceleration a is defined as the time rate of change of
bodies moving with speeds approaching the speed of light; the velocity
quantum mechanics describes and predicts experimental
dv
observations for very tiny particles, atoms, and the prop- a= . (16)
erties of ensembles of atoms. Einstein’s theory of special dt
relativity alone, however, is unable to account for the prop- Newton’s second law F = m 0 a thus can be written in terms
erties of atoms and the discreteness of the emitted radiation of the time rate of change of the momentum
that constitute the optically observed atomic spectra. dp
For very small-mass particles traveling at relativistic F= . (17)
dt
speeds, more complicated theories (relativistic quantum
This form is valid even in special relativity, as are Newton’s
mechanics) have been formulated that have had their suc-
first and third laws. Time t, however, is not absolute in
cesses, one of the most prominent being Paul Adrien
special relativity as it is within the framework of Newton’s
Maurice Dirac’s (1902–1984) theory predicting the ex-
formalism. Also, we know from special relativity that mass
istence of the positron. Ideally, one would like to have a
m depends upon speed in accordance with
very general framework that not only predicted the motion
of particles correctly for each of these domains but also m = γ m0, (18)
gave the correct results under more ordinary conditions in
which classical mechanics already does a completely sat- where m 0 is referred to as the rest mass and the parameter
isfactory job. It must be admitted, however, that the basis γ is determined by v/c, the ratio of the speed of the particle
for a perfectly general theory of mechanics, if it is within to the speed of light, in accordance with
the province of human beings to conceive, still remains to 2 −1
v
be developed. Even Einstein’s general theory of relativity γ = 1− . (19)
does not contain the power to interpret the microscopic c
world but, instead, gives answers to cosmological ques- Energy is related to mass m in special relativity:
tions relating to the macroscopic world. Simply put, no
completely general theory exists today. = mc2 . (20)
This relation holds true for any speed v. When v = 0, this
reduces to the rest mass energy
II. SPECIAL RELATIVITY 0 = m 0 c2 . (21)
A. Essential Relations for Quantum Mechanics The energy versus momentum relation for a free particle
in special relativity is
Because special relativity is covered adequately in another
part of this encyclopedia, only the relationships essential 2 = 20 + p 2 c2 , (22)
P1: GNH Final Pages
Encyclopedia of Physical Science and Technology EN013A-626 July 26, 2001 19:40
which differs from the energy–momentum relation in and Dirac, on the contrary, believed that the uncertainty
Newtonian mechanics. inherent in quantum mechanics was in fact an intrinsic re-
Energy–momentum relations are needed to develop the ality in nature, never to be overcome by any better theory
de Broglie relation, which underlies the Schrödinger equa- purported to be more comprehensive or general. For these
tion of quantum mechanics. philosophically minded physical theorists, the important
uncertainty principle is elevated from its role as an integral
part of quantum mechanics theory to an even more lofty
B. Philosophical Implications
philosophical principle.
There is unquestionably a great philosophical differ- Philosophical speculation, of course, is not physics. One
ence between Einstein’s special relativity and modern-day should ask different questions, such as how to best describe
quantum mechanics. It is unlikely that Einstein could have and predict experimental observations in a logical manner
foreseen in 1905, the year of publication of his works on from models developed from a combination of observation
special relativity and the photoelectric effect, the great and intuition. To go beyond this end by inquiring why
philosophical differences that eventually would split the nature behaves in such a fashion can prove interesting and
quantum way of thinking from the relativity way of think- stimulating but does not, in itself, constitute the furthering
ing. Relativistic mechanics, like classical mechanics, leads of the subject proper of physics.
to a world view of an absolutely predictable future, at least The best approach is to develop formulations that rep-
in principle, based on the specified state of the universe resent a synthesis of all experimentally verified properties
at any given time. Quantum mechanics, on the contrary, of matter, including variation of mass with velocity and
contains an intrinsic margin of uncertainty regarding the wavelike interference of particles. The roots of the most
future evolution of the system, even if the state of the fundamental relation (the de Broglie relation, developed
universe is known as completely as possible at any given by Louis Victor de Broglie) in the wavelike description
time. For example, descriptions of the motion of an elec- of particles are to be found most naturally in Einstein’s
tron by Newtonian and Einsteinian mechanics are both explanation of the photoelectric effect and in the equa-
quite precise, although not necessarily in agreement with tions of special relativity. It is an enigma that the seeds of
each other, in predicting a specific position and velocity quantum theory lie within one of Einstein’s creative expla-
at a time t whenever the exact position and velocity at nations of a very puzzling experimental observation, the
any other time t are specified as well as all forces that photoelectric effect. The enigma arises from the fact that
act on the particle in the time interval between t and t. Einstein’s best-known work, the theory of special relativ-
Quantum mechanics, on the other hand, gives only relative ity, is an entirely separate and remarkably different theory
probabilities for a broad range of possibilities. philosophically from that of quantum mechanics.
One must be flexible enough to accept experimental
facts and not reject, for example, relativistic mechanics
simply because one dislikes the concept that the masses III. QUANTUM CONCEPTS
of objects vary with speed. In the same way, experimental
demonstration of the wave properties of matter requires A. Early Quantum Theory
that theoretical formulations be broadened to include these
properties. This leads in a natural way to the concept of Old quantum theory is essentially the patchwork of clas-
indeterminism. sical and quantum ideas that were pieced together to yield
The lack of determinism inherent in quantum mechan- Planck’s theory of black-body radiation, Einstein’s expla-
ics was the “Achilles heel” that eventually led Einstein to nation of the photoelectric effect, and Bohr’s theory of the
the conclusion that quantum mechanics provides, at best, one-electron atom. Old quantum theory included the con-
an incomplete description of the universe. His powerful cept of the particle nature of radiation (i.e., the photon)
intuition, which should not be lightly discounted, led him and the quantization of the radiation energy in elemental
to the belief that quantum mechanics more than likely units hν. However, it did not include the concept of the
would be superseded eventually by a more complete the- wave nature of matter.
ory, completely deterministic in form, which would allow
exact predictions of the complete future evolution of a sys-
B. Einstein’s Concept of Light
tem, given only the initial conditions at some prior point in
as an Energy Quantum
time. This viewpoint, which means that nothing is left to
chance, was hotly disputed by prominent contemporaries The origin of the word “quantum” in quantum mechanics
of Einstein. It would seem to belie even the possibility of can be understood from its use in the explanation of some
free will in humans. Max Karl Ernst Planck (1858–1947) of the properties of light. It was Einstein, in his explanation
P1: GNH Final Pages
Encyclopedia of Physical Science and Technology EN013A-626 July 26, 2001 19:40
total energy therefore always changes by discrete photon- equation. The Schrödinger formulation can be used to de-
energy increments hν. To the extent that the electron can duce a matrix formulation of quantum mechanics analo-
be viewed as changing speed in its orbital motion to yield gous to Heisenberg’s formulation, so in this sense the two
the emission of energy, that change in speed must occur theories are equivalent.
in a single-step process. Clearly, an entirely new type of
mechanics is required for the description of atomic sys- F. Bohr Quantized Energy Levels
tems having such properties. Due to the discreteness of for Hydrogen Atom
the stationary states and the quantum nature of the energy
emitted as photons, this new type of mechanics was called It is remarkable that Bohr was able to put together a suc-
quantum mechanics. cessful theory of the hydrogen atom, since its formulation
in 1913 predated by more than two decades the discov-
ery of the wave diffraction of particles and even predated
E. Ideas of de Broglie, Heisenberg, by a decade the postulate of wave–particle duality by de
and Schrödinger Broglie and his deduction that a particle of momentum
Matter was discovered in 1927 to have wavelike proper- p has a wavelength λ associated with it. Fundamentally,
ties by means of the electron diffraction experiments of Bohr utilized the data given by experimental optical spec-
Clinton Joseph Davisson (1881–1958) and Lester Halbert tra together with heuristic arguments based on the classical
Germer (1896–1971), and G. P. Thomson (1892–1975). limit. A somewhat different line of reasoning, based on the
This was preceded (1923) by de Broglie’s deduction from experimentally confirmed de Broglie relation, is followed
special relativity considerations that a particle of energy here in deducing the quantized energy levels of Bohr.
hν and momentum p had a wavelength λ associated with As a preliminary to the development of the Schrödinger
it. In the same way that electromagnetic radiation can be equation utilizing the experimentally verified de Broglie
observed to behave in a wavelike manner under certain relation
conditions and in a particlelike manner under other con- h
λ= (25)
ditions (witness radio-wave interference on the one hand p
and the photoelectric effect on the other), de Broglie pos-
between the wavelength λ associated with a particle and
tulated that there was a wave–particle duality for all of
the momentum p of that particle, where h is Planck’s
nature. Wave–particle duality asserts that nonzero mass
constant having the value 6.6262 × 10−34 Joule-second
particles, such as electrons, protons, neutrons, and atoms,
(J-s), let us use that same relation as a selection device
can be observed to behave in a wavelike manner under
for choosing a series of discrete orbits for an electron
the proper conditions and in a particlelike manner under
imagined to circle a proton from the continuous range of
different conditions.
orbits allowed in the purely classical planetary model of
This very important concept of particles possessing
the one-electron atom. The key addition is the requirement
an intrinsic wave character underlies the Schrödinger
that a wave associated with a trajectory must be a single-
equation—that cornerstone of present-day quantum me-
valued function of position on the trajectory. For a circular
chanics dating from 1926. The birth of quantum me-
orbit, this requires that the wavelength be commensurate
chanics, in fact, dates to the developments of Werner
with the circumference of the orbit, namely
Karl Heisenberg (1901–1976) in 1925 and of Erwin
Schrödinger (1887–1961) in 1926, both of which intrin- 2πr
= n, (26)
sically contain the wave properties of matter, although λ
somewhat differently. Whereas Schrödinger directly de- where r is the radius of the circular orbit, λ is the wave-
veloped a wave equation to describe the behavior of mat- length, and n is any integer ≥1. Note that no attempt is
ter, Heisenberg incorporated the wave properties into a made to interpret the meaning of the wave, which is as-
theory in a somewhat different way, setting up a noncom- sumed to exist around the circumference of the orbit; in-
muting matrix operator formulation. Heisenberg was able stead, only one of the most general properties of waves
to show that certain pairs of physical observables repre- (i.e., single-valuedness) is relied on when writing the con-
sented by noncommuting operators, in principle, could not dition given by Eq. (26). The logic of this approach is
be measured to arbitrary precision; rather, the more pre- merely that if the properties of matter are wavelike and if a
cise the measurement of one member of the pair was, the planetary orbit exists, then the condition given by Eq. (26)
less precise the knowledge of the other would be. Thus should be met. Later in the treatment of the hydrogen atom
was born the Heisenberg uncertainty principle. This type by means of the Schrödinger equation, the concept of a
of uncertainty follows naturally from a wavelike descrip- well-defined planetary orbit will be found to be too naive,
tion of matter such as that represented by the Schrödinger except in what is designated the classical limit. This is
P1: GNH Final Pages
Encyclopedia of Physical Science and Technology EN013A-626 July 26, 2001 19:40
due to the fact that a wave does not usually have a precise energy increases algebraically from negative values to ap-
localization. Nevertheless, there do exist well-defined val- proach the asymptotic limit of zero corresponding to an
ues of the most likely separation distance of the electron unbound state. At the other extreme of small values for n,
relative to the nucleus of the one-electron atom. the negative total energy becomes algebraically smaller,
Substituting Eq. (25) into (26) gives corresponding to tighter binding of the electron and more
localization of the electron in the neighborhood of the nu-
2π pr/ h = n (27)
cleus. The lowest energy state is given by n = 1, so this
which, in terms of the new constant h defined as is the ground state of the one electron atom. Denoting the
h total energy in this state by 0 gives
h= , (28)
2π −m Z 2 e4 −m Z 2 e4
0 = = . (34)
takes the form 32π 2 ε02 h 2 8ε02 h 2
pr = nh . (29) Substituting the values m = 9.1096 × 10−31 kg, e =
Because for a circular orbit, the product pr is the mag- 1.6022 × 10−19 Coulombs (C), h = h/2π = 1.0546 ×
nitude of the vector angular momentum L, Eq. (29) consti- 10−34 J-s, and ε0 = 8.854 × 10−12 F/m gives
tutes a restrictive condition on the angular momentum; that 0 = −Z 2 × 2.180 × 10−18 J
is, the values of the angular momentum L are restricted to
the discrete set of values L n given by = −Z 2 × 13.60 eV (35)
L n = nh (n = 1, 2, 3, . . .). (30) as the ground-state energy in electron volts (eV) for the
one-electron atom. For hydrogen, Z = 1, so the ground-
This is a statement of the quantization of angular momen-
state energy of the hydrogen atom is predicted to be
tum. Thus, one arrives at the startling conclusion that not
−13.6 eV. The corresponding Bohr radius, as deduced
only is angular momentum conserved for the one-electron
from Eq. (31), is 0.529 × 10−10 m. As will be deduced
atom, as can be readily deduced from classical mechanics,
shortly by using these results to examine the predictions
but also that the new quantum condition establishes that
for optical spectra, the theory yields results that are in good
the angular momentum must be quantized. The elemental
agreement with experiment. Thus, this amalgam of classi-
unit for the mechanical angular momentum is thus h . Be-
cal mechanics and the assumption of a wavelike character
cause angular momentum quantization proceeds entirely
for the electron in orbit, as given by the de Broglie relation,
from the wave description, it is a direct consequence of
leads to a new picture for electrons in atoms.
the wave nature of particles.
Needless to say, the electron is too small to be observed
Now let us show that the quantization of the angular
directly in its orbit, even if such a picture were tenable from
momentum leads to the quantization of the energy values
a fundamental standpoint, so the theory cannot be proved
for the one-electron atom. Introducing the quantum con-
or disproved in this way. On the other hand, optical spectra
dition given by Eq. (30) into Eqs. (12), (7), and (14) gives
can be measured, so optical measurements can serve as a
discrete values for the radius
point of contact between microscopic mechanical models
(nh )2 of the atom and the world of observation. The predictions
rn = (31)
K Z e2 m of the model therefore can be tested in this manner. In
and leads to the following quantized values for the total Section III.H, the Bohr theory will be shown to lead to a
energy of the Bohr atom reasonably accurate explanation for the optical spectra for
the one-electron atom.
−m K 2 Z 2 e4
T = −K = (n = 1, 2, 3, . . .). (32) Despite the success of the Bohr theory for one-electron
2n 2 h 2 atoms, it is incapable of giving realistic predictions for
Using 1/(4π ε0 ) for K , this expression for the total energy atoms having two or more electrons and, a fortiori, fails to
can be written as give a general explanation of the periodic table for the great
−m Z 2 e4 variety of elements to be found in nature. Therefore, it can
n = be concluded that a stronger wave theory is needed for
32π 2 ε02 n 2 h 2
understanding and predicting nature. The planetary model,
−m Z 2 e4 as modified in the simplest possible way by the concept
= (n = 1, 2, 3, . . .). (33) of the wave nature of matter, is sufficiently successful to
8ε02 n 2 h 2
indicate a promising way to proceed, since it gives some
Thus, the average value of the orbit radius is predicted insight into the types of thought processes and concepts
to increase with an increase in the integer n while the required to begin a better treatment.
P1: GNH Final Pages
Encyclopedia of Physical Science and Technology EN013A-626 July 26, 2001 19:40
G. The Heisenberg Uncertainty Relations For a meaningful electron orbit, the uncertainty in po-
sition of the electron must be less than the radius of the
There is a still more fundamental problem with the Bohr
orbit. Applying Eq. (37) to this situation yields
theory as deduced on the basis of the wave properties of
matter inherent in the de Broglie relation. This problem h
pn > (38)
has to do with the basic meaning of the position of an 2rn
electron on one of the Bohr orbits in the hydrogen atom. for a Bohr orbit of radius rn . The radius rn is given by
The philosophical question is whether or not a physical Eq. (31), so that for Z = 1, rn = 4π ε0 n 2 h 2 /me2 . Thus
quantity is actually meaningful if it cannot be measured.
The Heisenberg uncertainty principle actually implies that me2
pn > . (39)
the position of an electron on the smaller orbits in the Bohr 8π ε0 n 2 h
model cannot be measured without serious disruption of The corresponding total energy T given by Eq. (32) is
the electron trajectory itself, as we shall prove later. First n = −me4 /32π 2 ε02 n 2 h 2 . The momentum pn for state n
of all, let us examine one approach that can be used to obtained by means of the general relation p = (2mk )1/2
rationalize the uncertainty relation. is given by pn = (2m|n |)1/2 , since K = −T accord-
Let us consider the experimental aspects of a position ing to Eq. (32). Substituting the expression for n then
measurement of the location of a point mass by means gives pn = me2 /4π ε0 nh . Now let us establish the re-
of a microscope utilizing electromagnetic waves of quirement on n provided by pn evaluated above.
wavelength λphoton and frequency νphoton = c/λphoton . It Again using K = −T , with K = p 2 /2m, it is found that
is a well-known fact that the resolution of a microscope K = ( p/m)p, so that n = ( pn /m)pn . Substitut-
is limited by the wavelength of the light utilized for the ing the above expressions for pn and pn then gives
measurement such that the uncertainty xmass of the
position measurement will be of the order of (or greater me4
n > = |n |/n. (40)
than) the wavelength: 32π 2 ε02 n 3 h 2
xmass ≥ λphoton . (36) When n = 1, the ground-state Bohr orbit having radius r1
is denoted by a and the ground-state energy 1 is denoted
However, as already discussed, photons have momentum by 0 . For this case, 0 > |0 |. This surprising result
p = h/λ, and the conservation of momentum when the indicates that the uncertainty introduced into the energy
measuring photon scatters off the point mass will cause by an attempted measurement of the position of the elec-
an uncertainty in the final momentum of the point mass tron on its ground-state orbit exceeds the binding energy
of this order—namely, pmass ≈ h/λphoton . Therefore, itself, so the stable configuration will be seriously dis-
it must be concluded that, following the measurement, rupted by the measurement, if not totally destroyed. Thus,
xmass pmass ≥ (λphoton )(h/λphoton ) = h. Thus, there is if one subscribes to the debatable tenet that a quantity is
some lower limit to the precision with which the two not physically meaningful unless it can be experimentally
physical variables of momentum and position of a particle measured, then the very meaning of an electron orbit as-
can be measured. This conclusion is consistent with sociated with the ground-state energy of the Bohr model
a rigorous version of what is known as the position— is brought into serious question. In any event, an electron
momentum form of the Heisenberg uncertainty relation, cannot be observed directly in orbit about a proton, so it
which states that the minimum value of (x)(p) is of the is impossible to say exactly what the separation distance
order of 12 h . There is no particular limit to the maximum is between electron and proton at any given instant. How-
value. Therefore, ever, the energy absorption and emission due to transitions
of the hydrogen atom between various states of excitation
1
xp ≥ h. (37) can be measured. This is indeed an experimental point
2 of contact with the theory. Therefore, let us examine the
Two variables for which the uncertainty relation holds predictions of the Bohr theory for the optical spectrum of
are known as complementary variables. hydrogen.
An alternate pair of complementary variables is given
by the energy of a particle in a quantum state and the
H. Optical Spectrum of Hydrogen
lifetime of the particle in the state. This means that there
is also a lower limit to the product t, where is If the electron in the hydrogen atom with energy corre-
the uncertainty of the energy of the particle and t is the sponding to the integer n suddenly undergoes a transi-
uncertainty in the time the particle will remain in that state. tion to an energy corresponding to a different integer n ,
Thus, in analogy with Eq. (37), t ≥ 12 h . with the energy difference being positive so as to create a
P1: GNH Final Pages
Encyclopedia of Physical Science and Technology EN013A-626 July 26, 2001 19:40
photon of energy hν, then energy conservation gives the slightly smaller orbit radius than the one obtained above
relation (assuming the radius is the same as the separation dis-
2
1 2 1 tance between electron and proton). This is the so-called
hν = n − n = −0
− , (41) reduced mass effect, which is purely classical in nature.
n n
where 0 is given by Eq. (34), with Z = 1. Since for the
photon I. The Laser
hc c The absorption and emission of light due to electron tran-
hν = = 2π h (42)
λ λ sitions between quantized energy levels provide the basis
the relation given by Eq. (41) can be used to write for one of the most powerful research tools developed over
2 the past 50 years—namely, the laser. The key property of
1 hν −0 1 2 1
= =
− . (43) a laser beam is its phase coherence. The phase coherence
λ 2π h c 2π h c n n of the beam is to be contrasted with the random phase
This is usually written in the form relationships between the light emitted from various re-
2 gions of an ordinary light source, such as an incandescent
1 1 2 1 filament. The phase-coherent beam in a laser is produced
=R
− , (44)
λ n n by reflecting the emitted light back and forth between par-
where allel mirror surfaces bounding the emitting medium, so
−0 me4 that any light already emitted triggers additional emission
R= = (45) and influences the phase of such additional light emission
2π h c 64π 3 h 3 ε02 c
from the excited atoms of the medium. The phase of the
is known as the Rydberg constant. Thus, transitions from newly emitted light turns out to be the same as that of the
the various excited states (n > 1) to the ground state (n = already emitted light. It is not intuitive from a quantum
1) leads to a series of spectral lines with frequencies mechanical viewpoint that this should occur, but it is as
ν = c/λ given by one might expect on the basis of classical physics, since
2
1 the electric field provides the accelerating force for the
νn→1 = cR 1 − (n = 2, 3, 4, . . .). (46) atomic electrons. That phase coherence does occur is, in
n
fact, basic to the nature of the process referred to as stimu-
This series of lines is known as the Lyman series, which lated emission. Stimulated emission is to be distinguished
is in the ultraviolet region of the spectrum. Yet a second from spontaneous emission, which prevails in ordinary
series of spectral frequencies can be generated by transi- light sources which lack phase coherence.
tions from excited states with n > 2 to the state n = 2. It The emission of light in a laser can initiate due to prior
can be noted that these frequencies are given by electronic transitions in atoms in a gas (e.g., a helium—
2
1 2 1 neon mixture) or in a solid (e.g., ruby) that have been
νn→2 = cR − (n = 3, 4, 5, . . .). (47) preexcited by some process, such as the flash of an ordi-
2 n
nary discharge lamp. As the phase-coherent beam builds
The spectral lines in this series, known as the Balmer se- in intensity within the laser, a portion of it is allowed to
ries, have wavelengths in the near ultraviolet and visible emerge continuously from the active lasing medium by a
region of the spectrum. Similar series of lines determined partial transmission through one of the mirrored surfaces.
from n = 3, 4, and 5 can be written using the above pre- The intense phase-coherent beams produced by lasers
scription; these three series, labeled Paschen, Brackett, allow many enhanced experiments involving light inter-
and Pfund, respectively, have spectral lines with wave- ference to be carried out. The laser has enabled the speed
lengths in the infrared region of the spectrum. of light to be measured to much greater precision than
These various series of spectral lines were known from ever before, such that the wavelength of a laser beam can
experimental observation long before the invention of the be used as an accurate standard for the measurement of
Bohr model of the atom and indeed constituted some of the length. The angular divergence of a laser beam can be
major evidence that classical mechanics was inadequate held to quite small values, so that reflectance of a measur-
for the treatment of atomic systems. The frequency rela- able signal from very great distances (e.g., from earth to
tionships deduced from the Bohr theory agree quite well moon and return) becomes possible. By measuring the
with the experimental spectra. Even better agreement is time for round-trip light travel, large distances can be
obtained when the finite mass of the nucleus is included measured more accurately than was posible previously.
in the theory; the correction comes about because elec- The Michaelson–Morley experiment for determining the
tron and proton revolve about the center of mass, giving a speed of light in different directions relative to the earth’s
P1: GNH Final Pages
Encyclopedia of Physical Science and Technology EN013A-626 July 26, 2001 19:40
instantaneous velocity vector in its orbit about the sun also The net force Fy is given by the difference between the
can be performed with greater precision with the aid of the y-directed components of the uniform axial tension force
laser. T at the two ends of the element x. Considering θ to
measure the angle in radians between the wire and the
IV. WAVE MOTION horizontal line representing the wire when it has no trans-
verse displacement, θ can be visualized as varying with
A. Development of the Wave Equation position x along the wire and with time t as the wire vi-
for Transverse Vibrations brates. The tensile force T is considered to have a line of
action that is always parallel to the wire. This leads to a
The essential equations for classical wave motion consti- projected component in the y-direction given by (T sin θ).
tute the foundation for developing intuitive insight into the At the ends of the segment x the tensile force T acts in
fundamentals of quantum mechanical wave equations for opposite directions, as is required of a tensile force. If the
particles. It is well known that wave motion can describe segment is not to be accelerated in the x-direction, then
as varied phenomena as the transverse displacement of the x-components of these end forces must essentially
a wire in tension, the propagation of sound in gases, the cancel each other. The component of T projected in the
motion of electromagnetic radiation in free space, and the x-direction is given at any point by (T cos θ ), so this can-
longitudinal displacements associated with elastic waves cellation condition is met adequately if θ is small enough.
in a solid. All of these classical phenomena can be treated In the small θ limit, the net force in the y-direction on the
theoretically by means of the same differential equation segment x, namely,
technique, which moreover proves to be adequate for de-
scribing the wave properties of particles. Fy = T sin θx+x − T sin θx (49)
Consider, for example, the basic classical problem of reduces to the approximation
the time and position dependence of the transverse dis-
placement of a vibrating wire, string, or rope, as indicated Fy T θx+x − T θx (50)
in Fig. 3. The transverse displacement y(x, t) of a thin which, in turn, can be approximated by
wire under tension T , with mass per unit length along
Fy = T tan θx+x − T tan θx . (51)
the x-direction given by ρ, is determined as a function of
time t by a differential equation that can be obtained from Because tan θ is the slope ∂ y/∂ x, this last equation is
a straightforward application of Newton’s second law of equivalent to
motion, F = ma, where F is the force and a is the ac-
celeration. A net force Fy acting in the y-direction on a ∂y ∂y
Fy = T − (52)
length x produces an acceleration a y that is inversely ∂ x x+x ∂x x
proportional to the mass ρx,
2 Equating this expression for Fy to the partialderivative
d y equivalent of Eq. (48) gives
Fy = (ρx)a y = (ρx) . (48)
dt 2 2
∂ y ∂y ∂y
(ρx) =T − (53)
∂t 2 ∂ x x+x ∂x x
Dividing through by x and taking the limit x → 0 gives
∂2 y ∂2 y
ρ =T 2 (54)
∂t 2 ∂x
The unit of tension T in SI units is Nt and the units of ρ
are kg/m, so T /ρ has the dimensions of m2 /s2 , the square
of a speed. Denoting the square root of this quantity by
1/2
T
vp = (55)
ρ
where v p has the units of velocity, the equation for a vi-
brating wire takes the form
∂2 y 1 ∂2 y
= (56)
FIGURE 3 Transverse wave. ∂x2 v 2p ∂t 2
P1: GNH Final Pages
Encyclopedia of Physical Science and Technology EN013A-626 July 26, 2001 19:40
This is the well-known classical wave equation. It is a lin- where means “take the real part of.” It is expedient
ear, second-order differential equation that governs, in the at this point to restrict the constants k and ω to real val-
present example, the transverse displacement y(x, t) as a ues. However, it proves very useful, as will be shown,
function of position along the wire as time proceeds. Both to maintain the constant C in its most general complex
standing-wave modes and running-wave modes are possi- form,
ble, depending upon the boundary conditions imposed on
C = D exp(i), (61)
the wire.
The classical wave equation given by Eq. (56) also has where D and are real numbers. The trial solution there-
a three-dimensional form to describe motion in arbitrary fore can be written in the form
directions in space. There are various ways to generalize
y = D exp[i(kx − ωt + )], (62)
Eq. (56), but it is simplest first of all to think of wave
motion along the z-axis, in contrast to the x-axis. The thus giving real solutions of the form
simple change in dependent variable from x to z naturally
y = D cos(kx − ωt + ). (63)
yields the relevant equation. It is clear that to include the
three independent directions in space will require three The right-hand side of Eq. (63) has the sort of space de-
terms involving spatial derivatives in place of the sin- pendence and time dependence seen for vibrating wires in
gle term in Eq. (56). If these three Cartesian coordinates the laboratory. Larger values of ω yield shorter repetition
are denoted by x, y, and z, then it is necessary to use periods in time, and larger values of k yield shorter repe-
some alternate notation for the wave displacement. De- tition periods in space. Considering that a cosine function
noting the wave displacement by ψ(x, y, z, t), the classi- repeats itself when its phase is increased by 2π , it can be
cal three-dimensional wave equation then can be written concluded that kλ = 2π and ωr = 2π give the basic spatial
as unit λ and temporal unit τ for repetition. These parame-
2 2 ters are called wavelength and period, respectively. Note
1 ∂ ψ
∇ ψ=
2
, (57) that
vp ∂t 2
2π
where ∇ 2 symbolizes the differential operator (∂ 2 /∂ x 2 ) + k= (64)
λ
(∂ 2 /∂ y 2 ) + (∂ 2 /∂z 2 ).
and
2π
B. Solutions to the Classical Wave Equation ω= . (65)
τ
Let us attempt a trial solution for Eq. (56) of the form The temporal frequency ν is the reciprocal of the period
τ , or
y = C exp[i(kx − ωt)], (58)
1
where, at the moment, C, k, and ω are unspecified ν= , (66)
τ
constants, perhaps even complex numbers. Substituting so that
Eq. (58) into Eq. (56) gives a condition on the constants
ω = 2π ν. (67)
ω and k in terms of v p :
Here, the explicit assumption was made that the constants
ω2 ω and k are real. For wave motion, this is consistent with
k2 = . (59)
v 2p the physical interpretations relating these parameters to
The complex form of the trial solution is troublesome the temporal frequency and the reciprocal of the spatial
for those who are more concerned with physical phenom- periodicity.
ena than with mathematics, since the displacement is a real
quantity. In point of fact, the mathematical solution, in it- C. Phase Velocity of Waves
self, does not have any physical interpretation associated
with it. To be able to clothe the mathematical solution with It is a universal property of wave motion that
physical content, it is often necessary to restrict somewhat ω 2π ν λ
the range of solutions that can be accepted. = = λν = , (68)
k 2π/λ τ
The physically meaningful solutions for the present
problem of a vibrating wire are thus given by and this ratio gives the speed vphase of the sinusoidal wave,
where τ is the period for one temporal oscillation. This
[y] physically = {C exp[i(kx − ωt)]}. (60) is readily understood by visualizing a moving sinusoidal
meaningful
subject spatial wave pass by a given point in space, the length λ
P1: GNH Final Pages
Encyclopedia of Physical Science and Technology EN013A-626 July 26, 2001 19:40
passing in time τ . The speed of an individual sinusoidal Taking the vector curl of Eqs. (78) and (79) yields
wave is called phase velocity. and is denoted by ∂
±ω ∇ × ∇ × E = −µ0 (∇ × H) (80)
vphase = . (69) ∂t
k ∂
∇ × ∇ × H = ε0 (∇ × E). (81)
Therefore, Eq. (55) states that the quantity v p = (T /ρ)1/2 ∂t
is the speed with which the sinusoidal wave moves along Applying the vector identity
the wire. The phase velocity may depend upon frequency
or wavelength in some instances. Note that in the present ∇ × ∇ × V = ∇(∇ · V) − ∇ 2 V, (82)
example, however, the phase velocity depends directly
which is valid for any vector V, to the left-hand sides
upon the square root of the tension in the wire and in-
and simultaneously utilizing the relations ∇ · E = 0 and
versely upon the square root of the mass per unit length of
∇ · H = 0 given by Eqs. (76) and (77) yields the following
the wire, independent of the frequency or the wavelength
relations:
of the wave.
∂
∇ 2 E = µ0 (∇ × H) (83)
∂t
D. Development of the Wave Equation ∂
for Electromagnetic Waves ∇ 2 H = −ε0 (∇ × E). (84)
∂t
It can be readily shown that electromagnetic waves in free Substituting the expressions for ∇ × E and ∇ × H then
space also satisfy a wave equation analogous to Eq. (57) reduces Eqs. (83) and (84)
for transverse waves along a wire and thus have solutions
that are mathematically the same. To develop this wave ∂ 2E
∇ 2 E = µ 0 ε0 (85)
equation, consider Maxwell’s equations of electromag- ∂t 2
netic theory ∂ 2H
∇ 2 H = µ 0 ε0 . (86)
∇ ·D = ρ (70) ∂t 2
∇ ·B = 0 (71) These are vector equations; each represents three scalar
equations, one for each of the three vector components.
∂B By considering any one component of these fields (e.g.,
∇ ×E = − (72)
∂t E x , E y , E z , Hx , Hy , or Hz ) and calling the particular com-
∂D ponent ψ, Eqs. (85) and (86) lead to the three-dimensional
∇ ×H = + , (73)
∂t wave equation
2
where D is the electric displacement vector, B is the mag- 1 ∂ ψ
netic induction vector, E is the electric field vector, H is ∇ 2ψ = (87)
c2 ∂t 2
the magnetic field vector, is the electric current density
vector, and ρ is the electric charge density. In free space, where, in the present problem, the phase velocity c is de-
the linear relations termined from the physical constants µ0 and ε0 in ac-
cordance with c = (µ0 ε0 )−1/2 . This is the propagation
D = ε0 E (74) velocity for electromagnetic waves in free space. Sub-
B = µ0 H (75) stituting the values ε0 = 8.854 × 10−12 F/m and µ0 =
4π × 10−7 henry/m (H/m) gives the speed of light in free
are applicable, where ε0 is the electric permittivity of free space as c = 2.998 × 108 m/s.
space and µ0 is the magnetic permeability of free space. Equation (87) has particularly simple solutions depend-
In the absence of free charge and with no electric current ing on position r and time t. For example, plane waves
density, the following relations are obtained: represented by A cos(k · r − ωt + α) satisfy the equation,
where A, k, ω, and α are constants. The magnitude of the
∇ ·E = 0 (76)
vector k determines the wavelength λ, so that
∇ ·H = 0 (77) 2π
1/2
λ= = 2π k x2 + k 2y + k z2 . (88)
∂H |k|
∇ × E = −µ0 (78)
∂t The wave is traveling at phase velocity c = ω/|k| in the di-
∂E rection of k. The amplitude A of the wave is the maximum
∇ × H = ε0 . (79)
∂t value of the transverse field component represented by ψ.
P1: GNH Final Pages
Encyclopedia of Physical Science and Technology EN013A-626 July 26, 2001 19:40
E. Dispersion Relations for Waves boundary conditions as time progresses, but two such so-
lutions having the same magnitude k but differing in sign
The ω versus k relation for any wave is called the disper-
of k, with properly chosen phases, can be superimposed
sion relation for that wave. The dispersion relation always
to yield a solution that can satisfy the condition. For ex-
gives the magnitude of the phase velocity vphase of the wave
ample, the superposition solution y = D[sin(kx − ωt) +
in accordance with vphase = ω/k. With this generalization,
sin(kx + ωt)] = 2D sin(kx) cos(ωt) satisfies the bound-
Eq. (57) can be viewed as a more general equation for
ary condition at x = 0; the boundary condition at x = L
one-dimensional classical wave motion. The same form of
is satisfied if the wavelength is chosen to have any value
the equation holds for electromagnetic wave propagation
λ = L/n, where n can be any positive integer. The am-
in free space, according to Eq. (87). The phase velocity
plitude of this wave is 2D. The nodes (y = 0) and ex-
is then the speed of light, and the appropriate dispersion
tremum values (y = ±2D) of the wave do not change po-
relation is
sition x in time t, so this solution is called a standing
ω = ck. (89) wave.
On the other hand, for an infinitely long wire (or an
Light propagates in a dielectric medium according to es- unbounded medium for electromagnetic wave propaga-
sentially the same wave equation as that for free space. In tion), there may be no constraints on the displacement
that case, however, the phase velocity depends upon the at any particular position, so the solution described by
refractive index n of the medium, so that Eq. (63) may be quite acceptable. The particular values of
c position x for which the wave has nodes and extremum
vphase = , (90)
n values change with time t. This wave is not stationary
where n usually varies with the wavelength. The action of a and thus is called a running wave. For positive values
prism in dispersing the colors from incident white light, for of ω and k, Eq. (63) shows that a given value of the
example, is primarily due to the fact that the phase velocity phase is maintained as time increases if one focuses on
is color dependent. Red light travels approximately 0.5% an observation point x that moves along the x-axis lin-
faster than blue light in fused quartz. Both colors travel in early with t. Thus, the phase velocity ω/k is said to be
fused quartz at approximately two-thirds the speed of light constant and the wave moves in the positive x-direction.
in free space, but since the speed of blue light is decreased If k is negative but ω is again chosen to be positive, as
slightly more than the speed of red light as the white light is conventional, then the phase velocity is negative and
enters the quartz from the vacuum, the blue light is bent the wave moves in the negative x-direction. The stand-
more toward the normal direction than the red light. The ing wave constructed to satisfy the fixed-boundary condi-
colors thus become dispersed as the white light enters the tions situation can be viewed simply as the superposition
prism at an angle with respect to the normal to the sur- of two equal-amplitude running waves having the same
face of the prism. The dispersion relation for this situation frequency and wavelength but moving in opposite direc-
is tions. Considerations of more complicated superpositions
ck follow.
ω = vphase k = , (91)
n(k)
where n explicitly depends on k, denoting a wavelength- G. Superposition Solutions
dependence since λ = 2π/k. This example also will be An arbitrary superposition of solutions to the linear, time-
useful in understanding group velocity in the development dependent wave equation considered above also satisfies
that follows. the same equation. There is no way for the terms to mix
as products if the equation is linear, so the terms for each
solution can individually add to zero. The superposition
F. Boundary Conditions
could be written as a sum of terms having different weight-
The next consideration is the type of boundary conditions ing coefficients D j , so that
that are imposed by the physical situation. If the endpoints
of a wire of length L are fixed at positions x = 0 and x = L, y(x, t) = D j cos(k j x − ω j t + j ), (92)
j
then the fact that the displacements must be zero at these
points leads to the requirement that the wave must have where each ratio ω j /k j has the appropriate value of vphase .
stationary nodes at these points. A similar type of bound- The superposition could yield a shape for y(x) at a given
ary condition holds for electromagnetic waves trapped in a time t that differs markedly from any of the sinusoidal
highly conducting metal box, since the metal walls cannot components. If each component moves in the same di-
support an electric field. Equation (63) cannot satisfy such rection at the same speed, corresponding to a value of v p ,
P1: GNH Final Pages
Encyclopedia of Physical Science and Technology EN013A-626 July 26, 2001 19:40
which is a fixed constant, then motion would be anticipated Considering any reasonable shape for χ (k), the super-
but not distortion of shape. In wave packets considered position can be written
below, it may happen that different component waves have ∞
different phase velocities. That is, the phase velocity may ψ(x, t) = χ (k) exp[i(kx − ωt)] dk. (93)
−∞
be a function of the frequency (or wavelength). Variations
in the phase velocity can lead to a change in shape, since This form has the character of a complex Fourier integral.
phase relationships among the individual components Alternately, the same problem could be treated in terms
continually change if they are traveling at different of integrals involving the real functions, sine and cosine,
speeds. with the real argument (kx − ωt) replacing the imaginary
argument [i(kx − ωt)] of the exponential form. To obtain
general properties of the superposition, no specific choice
H. Group Velocity of Waves is made regarding the functional form of χ (k), except that
It is known from the theory of Fourier series and Fourier it be chosen to be moderate in functional behavior and
integrals that the superposition of sinusoidal waves can smooth. In general, it is complex, with the roles of the
yield almost any physically reasonable periodic or nonpe- real and imaginary parts being exactly the same as the
riodic function. Thus, by superposing solutions of differ- roles of the real and imaginary parts of C = D exp(i)
ent wavelengths, nearly any function shape can be gen- utilized in Section B—namely, to supply both amplitude
erated at any given time. The superposition of waves and phase information. In this situation, both are supplied
bearing a harmonic relationship to one another yields a as a function of k. Let us assume that the dispersion rela-
spatially periodic function, whereas the superposition of tion ω versus k is known for the wave in question in the
waves having amplitudes over a continuous narrow band neighborhood of k0 , where the components are presumed
of frequencies yields a spatially localized function. The to have larger amplitudes. These conditions are then suf-
waves in the superposition may or may not have the same ficient to use a Taylor series expansion of χ (k) on k about
phase velocity. Consider, for example, the case of electro- the value k0 :
magnetic wave motion in dielectrics, where the physical dω(k)
properties of the medium determining the wave speed are ω(k) = ω(k0 ) + (k − k0 )
dk k=k0
frequency- or wavelength-dependent. Whether or not the
shape changes over time, it will generally move. The prob- 1 d 2 ω(k)
+ (k − k0 )2 + · · · . (94)
lem to address now is the evaluation of the speed of motion 2! dk 2 k=k0
of the shape.
Substituting the first two terms of this expansion yields the
To simplify the problem, let us confine our attention to
following result for the wave packet under consideration:
a narrow band of wavelengths, corresponding to a narrow
∞
range of k values centered about some central value k0 .
For example, a pulse of light having a certain shape could ψ(x, t) = χ (k) exp i kx − ω(k0 )
−∞
have a range of wavelengths closely centered about the
green 5461 Å line of mercury. If the amplitude is a contin- dω(k)
uous function of the wavelength over the band, in contrast + (k − k0 ) t dk. (95)
dk k=k0
to the superposition of a finite number of discrete wave-
length components, then an amplitude function χ (k) can In terms of the defined quantities
be defined that peaks at k0 and has very low values outside
dω(k)
the narrow band of interest. It is known from the theory of vgroup = (96)
dk k=k0
Fourier integrals that the shape of χ (k) as a function of k
depends directly upon the position-dependent shape of the β0 = ω(k0 ) − k0 vgroup (97)
pulse. If ψ(x, 0) denotes the pulse in space at time t = 0,
Eq. (95) becomes
then a Fourier inversion directly yields χ (k). It can be ∞
shown that the widths of the two functions are related in- ψ(x, t) = χ (k) exp{i[k(x − vgroup t) − β0 t]} dk.
versely: the broader χ (k) is, the more narrow ψ(x, 0) will −∞
be, and vice versa. This is a manifestation of a relationship (98)
between the spread (or uncertainty) in wavelength versus
To find out where ψ is peaked in space at any given t, one
the spread (or uncertainty) in position of the wave packet.
can consider the real and imaginary parts of this expression
This point is crucial for the Heisenberg uncertainty princi-
individually. Writing
ple but is not directly relevant to the present development
for pulse velocity. χ (k) = χr (k) + iχi (k), (99)
P1: GNH Final Pages
Encyclopedia of Physical Science and Technology EN013A-626 July 26, 2001 19:40
given direction. Since ω is a scalar quantity, ω(k) maps In 1926 Walter Maurice Elsasser suggested that the de
out a type of three-dimensional surface for the dependence Broglie hypothesis could be tested by aiming an elec-
of frequency on direction and magnitude of the k-vector. tron beam at the surface of a crystalline solid to see if
The k-vector represents the direction of propagation of one diffraction spots were obtained in the same way as ob-
component wave of the packet, and the magnitude of the served in X-ray diffraction. Experiments were carried out
k-vector still has the interpretation of being 2π/λ, where shortly thereafter by Davisson and Germer and also by
λ is the wavelength of the component wave in its propaga- G. P. Thomson, so that by 1927, the de Broglie hypoth-
tion direction. The group velocity components in the x, y, esis had been experimentally verified. Thus, the electron
and z directions in space can be obtained by generalizing discovered by J. J. Thomson in 1987 enjoyed its status
Eq. (96), which was derived on the assumption that there as a classical particle for a period of less than 30 years
was only x-motion. A proper way to carry out the deriva- before its schizophrenic nature surfaced. Amazingly, it
tion is to use the Taylor series expansion of ω(k) in three was later confirmed not only that electrons had this dual
dimensions. Three first-derivative terms can be obtained nature but also that other particles, including neutrons
in place of one, and the multipliers of k x , k y , and k z are, and atoms, could be diffracted. Particles somehow have
(y)
respectively, [x − vgroup
(x)
t], [y − vgroup t], and [z − vgroup
(z)
t], the inherent property of being able to experience mutual
where interference even while maintaining such an extremely
high degree of localization in space that they would not
∂ω(k)
vgroup =
(x)
(104) appear to have any physical overlap. What boggles the
∂k x k=k0
mind even more, single particles themselves behave statis-
tically in accordance with the same diffraction probability
∂ω(k)
vgroup =
(y)
(105) distribution.
∂k y k=k0
Because electromagnetic radiation can manifest dis-
creteness properties, with the photon energy related to
∂ω(k)
vgroup
(z)
= . (106) frequency according to
∂k z k=k0
= hν = h ω (108)
The conclusion to be reached is that the group velocity
is a vector with components given by Eqs. (104)–(106). de Broglie reasoned that, in analogy, any wave proper-
These results can be abbreviated by writing ties of particles would also have some associated fre-
quency. Moreover, in accordance with the concept of
vgroup = ∇k ω(k)|k=k0 (107) wave–particle duality and the similarity of particle and
as the general expression for the group velocity for wave electromagnetic radiation in nature, he assumed that the
motion in three-dimensional space. energy–frequency relation for particles would be the same
as the energy–frequency relation for photons given by
Eq. (108).
The term particle generally is utilized to refer to those
V. WAVE–PARTICLE DUALITY
fundamental entities for which the rest mass m 0 is nonzero.
Exactly what the frequency of a particle refers to is not
A. Ideas of de Broglie
known, but it can be imagined that there is some internal
The discovery that particles diffract like waves was one of mode of oscillation.
the most important in physics, since the entire discipline
of quantum mechanics is based on a wave description of
B. Development of the de Broglie Relation
matter. Preceding that discovery, de Broglie carried out
an analysis of the hypothetical wave properties of matter If a particle can ever be localized to any extent in space, as
by employing special relativity theory, with an insight- there is certainly every right to expect, and simultaneously
ful hunch that matter is not all that different in its funda- is to have its experimentaly observed wavelike charac-
mental wave behavior from electromagnetic radiation. His ter described in some fashion by waves, then it is quite
idea was that since radiation had been shown to have both logical to consider a wave packet of the sort treated in
wavelike and particlelike properties, perhaps nonzero rest Section IV.H. The waves forming the packet necessarily
mass “particles” also could manifest both wavelike and would be matter waves, but the diffraction results lead to
particlelike properties. However, different experimental the belief that these waves have the same linear superpo-
conditions might be required for matter to manifest one sition properties of all other familiar types of waves. The
property or the other. De Broglie called his idea wave– waves must be associated with the presence of the par-
particle duality. ticle in some way, so it is reasonable to expect that the
P1: GNH Final Pages
Encyclopedia of Physical Science and Technology EN013A-626 July 26, 2001 19:40
For low values of the particle momentum, k is small and But vparticle = dr/dt, so dr = vparticle dt, and the result is
the frequency does not depart very much from the rest-
−d p
mass frequency. This is merely a reflection of the fact that dU (r ) = vparticle dt = −vparticle d p
= mc2 = γ m 0 c2 = h ω predicts that dt
ω −p
= γ, (130) = dp (135)
ω0 m
where γ is the parameter defined by Eq. (19) in Section II This relation cannot be integrated directly because m de-
on special relativity. This might be described by consid- pends upon particle velocity and, hence, upon p. However,
ering the particle as initially created in a zero potential substituting m = /c2 , with given by the usual energy–
energy region with a proper frequency, which it then main- momentum relation of special relativity in Eq. (22), leads
tains always during its lifetime. Any perceived difference to a perfect differential:
in frequency due to motion is then merely due to the rel-
−1/2
ativistic shift in energy with Galilean reference frame. dU (r ) = −(c2 p d p) 20 + p 2 c2
This is analogous to the increase in mass with the par-
1/2
= −d 20 + p 2 c2 . (136)
ticle speed; that is, m = γ m 0 , so = mc2 = γ m 0 c2 . Thus,
= h ω = γ h ω0 , thereby giving ω = γ ω0 in Eq. (130). Integrating from a classical turining point, defined as a
This does maintain the de Broglie premise that a particle position where p = 0 so the potential energy at the point
has a frequency associated with it even when it is station- is the total energy (T ) in classical Newtonian physics,
ary, that frequency being dependent upon the rest mass m 0 we obtain
in accordance with the relation 0 = m 0 c2 = h ω0 .
1/2
U (r ) − (T ) = m 0 c2 − 20 + p 2 c2 . (137)
E. Energy–Momentum Relations for The classical total energy is conserved as long as no rest
Particles with Potential Energy mass is created or annihilated. A new constant (T ) thus
It may be appreciated that developing the dispersion rela- can be defined as
tion ω(k) versus k can be sufficient preparation for eval- (T )
= T( ) + m 0 c2 (138)
uating the phase velocity vphase = ω(k)/k, according to
Eq. (69). Because most applications of quantum mechan- and Eq. (137) can be written as
ics involve particles with some type of potential energy, it
1/2
is necessary to consider this situation. (T ) − U (r ) = 20 + p 2 c2 . (139)
A force F changes the momentum p of a free particle in
This is the energy–momentum relation for a particle hav-
accordance with F = dp/dt, or, equivalently, the change
ing a potential energy. Note from this relation that when
dp in particle momentum is given by F dt. Since the work
the potential energy is zero, T( ) becomes the total rel-
done by a force in moving a particle through a vector dis-
ativistic energy of a free particle. The constant T( ) ,
tance dr is dW = F·dr, the power input = d W/dt from
in lieu of , is the conserved quantity when a potential
the source is F · dr/dt = F · v. In one dimension, = Fv.
energy is included. Let us also define a quantity (K )
In our case, the power to change the motion energy of
the particle comes from the internal potential energy. For )
(K = T( ) − U (r ) (140)
a conservative system, the potential energy change with
position leads to an internal force on the particle
or, alternatively, T( ) = (K ) + U (r ). By employing
F = −∇U (r) (131) Eq. (138), Eq. (140) can be written in the form
which, for one dimension, is simply (K ) = T( ) − m 0 c2 − U (r )
−dU (r )
F= . (132) = T( ) − U (r ) − m 0 c2 (141)
dr
Thus and, by employing Eq. (139), this expression can then be
dp −dU (r ) written in the form
=F= (133)
1/2
dt dr (K ) = 20 + p 2c2 − m 0 c2 . (142)
or, equivalently
Equation (141), in the form
−d p
dU (r ) = dr. (134) ) )
dt T( (
= K + U (r ) + m 0 c2 (143)
P1: GNH Final Pages
Encyclopedia of Physical Science and Technology EN013A-626 July 26, 2001 19:40
seems intuitive from a scalar-energy viewpoint. This form pho = λpho /n, where λpho is the wavelength in the
with λmed vac med
has its primary usefulness in the low-velocity limit. In the medium and λpho is the wavelength in free space. The
vac
relativistic limit, the best form for T( ) is that obtained photon velocity is still given by the product of photon fre-
from Eq. (139): quency and the appropriate photon wavelenght, whether
1/2 the medium be a dielectric or free space. The momen-
T( ) = 20 + p 2c2 + U (r ) (144) vac
tum ppho = h/λvacpho of the photon in free space changes to
since this form is intuitive from a free-particle viewpoint. ppho = h/λmed
med
pho as it enters the dielectric medium. Thus,
Squaring Eq. (139) and solving for p 2 gives the constant of motion for the photon is the frequency
1 2 ν = ω/2π , with wavelength λ and momentum p being
p 2 = 2 T( ) − U (r ) − 20 . (145) medium-dependent. It is an interesting observation that,
c
Substituting Eq. (141) then gives from the viewpoint of Eq. (142), all energy of the photon
is kinetic.
1 ( ) 2
2
p 2 = 2 K + m 0 c2 − m 0 c2 , (146) One might be tempted to assume, in analogy with the
c photon, that the frequency of a particle located in a purely
( )
which relates the momentum to the quantity K . Rear- conservative potential energy region is a constant of mo-
( )
ranging Eq. (146) to obtain K explicitly in terms of p tion. This would give rise to the picture of a particle enter-
gives ing a region of varying potential energy, where it could
p 2 c2 1/2
be accelerated. Its speed would change with its accel-
( ) 2
K = −m 0 c + m 0 c 1 + 2 4
2
. (147) eration. Its momentum would change, so its wavelength
m0c would change also. However, the total energy of the par-
Employing the binomial expansion then gives the series ticle would not change. Both remaining unchanged, the
approximation frequency would then maintain a direct relationship to the
total energy. The analogy with the behavior of a photon as
p2 p2
(K ) 1− + · · · . (148) it enters a region of varying refractive index would, in this
2m 0 4m 20 c2 way, be exploited to the fullest. However, this might seem
In the low-velocity limit defined by γ 2 1, which requires to do great violence to the earlier conclusion that the fre-
(v/c)2 1 [see Eq. (19)], p m 0 v, and the quantity (K ) quency of the wave associated with a free particle depends
thus reduces in lowest order to the Newtonian expression upon the speed of the particle. In the present case, one
for the kinetic energy would be considering the particle changing speed due to
1 the change in internal potential energy as it moves through
( )
K = m 0v2. (149) the potential energy region, yet one would be proposing to
2
hold its frequency ω to be a constant. Moreover, the con-
stant frequency would not be equal to the proper frequency
F. Dispersion Relations for Particles
ω0 for the particle in the relativistic sense but would have
with Potential Energy
some value that would necessarily be related to U (r ), since
To develop a wave dispersion relation applicable to parti- even the choice of the zero point for measuring U (r ) would
cles having a potential energy, the wavelength is again affect the value of ω.
associated with the particle momentum by means of This strange state of affairs might be viewed as unac-
the de Broglie relation λ = h/ p. Equivalently, p = h/λ = ceptable. The seeming enigma can be resolved, however,
(h/2π )(2π/λ) = h k, in accordance with Eq. (117). in the following way. The total energy of a particle in
According to the de Broglie hypothesis, a frequency ω a potential energy U (r) can be viewed, apart from some
must be associated with the particle energy. Although the constant reference energy, as the energy of a quantum sys-
nature of the frequency of a particle is not yet understood, tem consisting of the particle in question plus the source
a guideline can be the familiar results from the particu- medium responsible for the potential energy. As the par-
lar limiting case of a zero rest-mass quantum particle— ticle moves about in the potential-energy region, its ki-
namely, the photon. One must remain alert, however, to netic and potential energy changes in such a way that the
avoid introducing subtle errors in using the analogy. total energy remains invariant. Thus, an increase in the
The photon has a wavelength λ and a frequency ν. As kinetic energy of the particle comes at the expense of its
the photon travels in free space, it has a speed c = λν. potential energy, so the chemical binding energy of the
When the photon enters a dielectric material medium of system is increased. Similarly, a decrease in the kinetic en-
refractive index n, its speed decreases to c/n but its fre- ergy of the particle means that its potential energy is alge-
quency ν does not change. The photon energy = hν thus braically increased, which corresponds to a decrease in the
does not change. The wavelength decreases in accordance chemical binding energy of the system. The relationship
P1: GNH Final Pages
Encyclopedia of Physical Science and Technology EN013A-626 July 26, 2001 19:40
2
between energy and frequency for the particle in the sys- 1 1 1 ∂ T
tem can be postulated to involve the kinetic energy of the ∇ 2 = . (155)
T ∂t 2
particle and the attendant local modification in the bind-
The usual argument for the separation of variables holds.
ing energy of the system. This is consistent with choosing
The right-hand side is a function (at most) of the variable
h ω = (T ) instead of (T ) − U (r ), so that
t, the left-hand side is a function (at most) of the variable
)
T( r. Since the two sides are required to be equal, then neither
ω= , (150) can vary with t or r. Setting the right-hand side equal to
h
the constant and employing the trial solution
)
where (T is defined by Eq. (144).
T (t) = T0 exp(−it) (156)
yields a solution, provided that
G. Phase Velocity of Particles = −2 . (157)
with Potential Energy
Since T0 is unspecified, it can be conveniently chosen to
Taking the frequency of the particle wave from Eq. (150) be unity. The right-hand side of Eq. (155) equals , so
and using the de Broglie relation k = p/ h given by the left-hand side must equal also. Thus, the differential
Eq. (117) leads to an evaluation of the phase velocity equation for (r) is
)
ω hω ( 1 1
∇ 2 = −2 (158)
vphase = = = T . (151)
k hk p
or, equivalently
∇ 2 (r) + 2 (r) = 0. (159)
VI. WAVE EQUATIONS FOR PARTICLES There still remains the task of evaluating the constant .
The approach used is to consider the limiting case of a
A. Master Equations for Particles constant potential energy U (r) = U0 , so that the momen-
with Potential Energy tum p is a constant. There is then no position-dependence
In this section, it is assumed at the outset that the reader of . The quantity 2 is always a constant [cf. Eq. (157)],
already has studied carefully the material presented ear- so that the product 2 is then a constant. As a conse-
lier, especially the fundamentals of classical wave motion quence, Eq. (159) has particularly simple solutions. The
in Section IV and the ideas of wave–particle duality in trial solution
Section V. This section relies heavily on the concepts of = 0 exp(ik · r)
dispersion relations and phase velocity in wave motion
= 0 exp[i(k x x + k y y + k z z)] (160)
covered in those sections.
Utilizing the three-dimensional classical wave equation satisfies the equation, with k being a vector constant. This
[Eq. (57)] and substituting Eq. (151) for the phase velocity solution represents a wave having wavelength 2π/k, with
gives the wave equation for particles k = (k x2 + k 2y + k z2 )1/2 , which is traveling in the direction
2 2 of the fixed vector k. In one dimension (e.g., a particle
p ∂ ∂ 2 moving along the x-direction), the exponential function
∇ 2 = ( )
= (152)
T ∂t 2 ∂t 2 reduces simply to exp(ikx). The k-vector involves the
wavelength for the particle wave and therefore is asso-
or, equivalently
ciated with the particle momentum p in accordance with
1 2 ∂ 2 the de Broglie relation p = h k. The momentum in this
∇ = (153)
∂t 2 case of a constant potential energy is a constant of mo-
tion. Substituting this solution into Eq. (159) gives
with ≡ [ p/(T ) ]2 . It should be noted from Eq. (145)
for p that will depend upon position r, but it does not −k 2 = 2 = 0 (161)
depend explicitly upon the time t. Therefore, the variables so that
2
can be separated in Eq. (152), so a solution can be carried k2 k ( ) 2
out by the separation-of-variables technique. Substituting = 2
= T . (162)
p2
the product trial solution
Since p = h k, this leads to the conclusion that
(r, t) = (r)T (t) (154) T( )
= . (163)
into Eq. (153) and dividing through by ψ yields h
P1: GNH Final Pages
Encyclopedia of Physical Science and Technology EN013A-626 July 26, 2001 19:40
However, a time rate of change of the probability density ∇ψ, these two quantities can be subtracted to give the
at any given point in space requires a difference between relation
the particle currents flowing into and out of the differential ∇ · (ψ ∗ ∇ψ − ψ∇ψ ∗ ) = ψ ∗ ∇ 2 ψ − ψ∇ 2 ψ ∗ . (190)
volume surrounding the point in question. The mathemat-
ical statement of this fact is the well-known microscopic The right-hand side of Eq. (190) can be identified with the
equation of continuity factor in the right-hand side of Eq. (187), so that
∂ρ ∗ ∂ψ ∂ψ ∗ h2
= −∇ · J (181) ih ψ +ψ =− ∇ · (ψ ∗ ∇ψ − ψ∇ψ ∗ ).
∂t ∂t ∂t 2m
(191)
where ∇ · J is the divergence of the particle current J at Substituting into Eq. (182) for ∇ · J gives
the point in question.
Equating Eqs. (180) and (181) for ∂ρ/∂t gives the h
∇ ·J= ∇ · (ψ ∗ ∇ψ − ψ∇ψ ∗ ). (192)
relation 2mi
Within an arbitrary constant, then
∂ψ ∂ψ ∗
∇ · J = − ψ∗ + ψ (182) h
∂t ∂t J= (ψ ∗ ∇ψ − ψ∇ψ ∗ ). (193)
that must be obeyed by the quantum mechanical analog of 2mi
the particle current density J. For further development of The arbitrary constant is zero if J = 0 whenever ψ = 0,
this expression, the time-dependent Schrödinger equation as one would expect. Knowledge of the wavefunction
(179) and its complex conjugate can be employed: ψ therefore allows one to calculate the particle current
density J in quantum mechanics. For the case of electrons,
∂ψ h2 2 the charge per particle is −e, so that the charge density
ih =− ∇ ψ + U (r)ψ (183)
∂t 2m follows immediately from
∂ψ ∗ h2 2 ∗ = −eJ.
−i h =− ∇ ψ + U (r)ψ ∗ . (184) (194)
∂t 2m
It is illuminating to apply the relation given by Eq. (193)
Taking the complex conjugate utilizes the relations r∗ = r, to the specific case of plane waves exp(ik · r), which can
t ∗ = t, and U (r)∗ = U (r) due to the fact that the concern be shown to be eigenfunctions of the so-called momen-
is with real positions, real times, and real potential ener- tum operator = −i h ∇. Thus, ∇ψ = (i/ h )pψ. Also
gies. Multiplying the Schrödinger equation by ψ ∗ and its ∇ψ ∗ = (−i/ h )pψ ∗ , so that
complex conjugate by ψ gives
h ∗ i (−i) ∗
∂ψ h2 ∗ 2 J= ψ pψ − ψ pψ
ih ψ∗ =− ψ ∇ ψ + ψ ∗ U (r)ψ (185) 2mi h h
∂t 2m p
= (ψ ∗ ψ + ψ ∗ ψ) = ψ ∗ ψv. (195)
∂ψ ∗ h2 2m
−i h ψ =− ψ∇ 2 ψ ∗ + ψU (r)ψ ∗ (186)
∂t 2m This is simply the product of the particle probability den-
Subtracting Eq. (186) from Eq. (185) gives sity ψ ∗ ψ and the particle velocity v, which is readily in-
terpreted as the particle current density on the basis of
∂ψ ∂ψ ∗ h2
ih ψ∗ +ψ = (ψ∇ 2 ψ ∗ − ψ ∗ ∇ 2 ψ), physical considerations.
∂t ∂t 2m
(187)
B. Piecewise Constant Potential
where, in equating ψ ∗ U (r)ψ with ψU (r)ψ ∗ , the property
Energy Problems
of U (r) is utilized merely as a multiplicative operator, so
that the factors in the product ψ ∗ U (r)ψ commute. The It is worthwhile to work through the details of an illus-
right-hand side of Eq. (187) involves the Laplacian oper- trative example that typifies the quantum treatment of a
ator, so that it is reasonable to expect that perhaps it can particle-beam incident on potential energy steps, rectan-
be expressed as the divergence of some vector quantity. gular potential barriers and wells, and arrays of config-
Taking the divergence of ψ ∗ ∇ψ yields urations that may be useful in understanding currents in
modern solid-state devices. Figure 4 illustrates the three
∇ · (ψ ∗ ∇ψ) = ∇ψ ∗ · ∇ψ + ψ ∗ ∇ 2 ψ, (188)
regions defined by the potential energy function chosen
∗
whereas taking the divergence of ψ∇ψ gives for this example:
∇ · (ψ∇ψ ∗ ) = ∇ψ · ∇ψ ∗ + ψ∇ 2 ψ ∗ . (189)
0 (x < 0)
Recognizing that the dot product of two vectors such U (x) = U0 (0 ≤ x ≤ L). (196)
W
as ∇ψ · ∇ψ ∗ is commutative, so that ∇ψ · ∇ψ ∗ = ∇ψ ∗ · (x > L)
P1: GNH Final Pages
Encyclopedia of Physical Science and Technology EN013A-626 July 26, 2001 19:40
∗
B B
=
A A
2 2 2 2 2 2 2 2
1 − βk 1 + β + 1 + βk 1 − β + e −2iβ L 1 − βk 1− β + e2iβ L 1 − βk 1− β
= 2 2 2 2 2 2 2 2
β β −2iβ β β
1+ k 1+ β + 1− k 1− β +e L 1− k 1− β +e 2iβ L 1− k 1− β
(227)
or, equivalently
{[1 − (β/k) + ( /β) − ( /k)]2 + [1 + (β/k) − ( /β) − ( /k)]2 + 2{cos 2β L}[1 − (β/k)2 − ( /β)2 + ( /k)2 ]}
=
{[1 − (β/k) + ( /β) + ( /k)]2 + [1 − (β/k) − ( /β) + ( /k)]2 + 2{cos 2β L}[1 − (β/k)2 − ( /β)2 + ( /k)2 ]}
(228)
P1: GNH Final Pages
Encyclopedia of Physical Science and Technology EN013A-626 July 26, 2001 19:40
This rectangular barrier solution contrasts markedly with As before, the wavefunction ψ1 for region I is
the classical result = 1. The cosine factor in the denom-
ψI = ψinc + ψref
inator leads to a decided oscillatory dependence of the
transmission coefficient on energy of the incident parti- = (Aeikx + Be−ikx )e−iωt , (239)
cles, as will be demonstrated later in a graphical example.
where the constants and k have their previously defined
If, in addition, the barrier height U0 is taken to be zero,
values
then β = k and = 1, as expected. This limit is also well
approximated for U0 , since then β k. = h −1 [2m( − W )]1/2 (240)
In the alternate limit, W → U0 , so that → β, and
k = h −1 (2m)1/2 . (241)
Eq. (224) yields
4 k The boundary conditions at x = 0 of continuity of the
= . (231) wavefunction ψI (0) = ψII (0) and continuity of the first
(k + )2 derivative of the wavefunction
This is the transmission coefficient for a step potential of dψI dψII
height U0 for the case > U0 . In this same limit W → U0 , =
dx x=0 dx x=0
Eq. (228) for the reflection coefficient reduces to
lead directly to the following two relations:
(k − )2
= . (232)
(k + )2 A+B = D+E (242)
ik A − ik B = −α D + α E. (243)
D. Incident Beam with Particle Energy
below First Step Only Rewriting this pair of equations in the form
Let us now consider a case for which W < < U0 alge- A+B = D+E (244)
braically. For < U0 , the wavefunction in region II (see −α
A−B = (D − E) (245)
Fig. 4) cannot be of the plane-wave type, because the ki- ik
netic energy would be negative and the momentum con- makes it easy to obtain expressions for A and B in terms
sequently would be imaginary. It is easy to show that the of D and E. Adding the two equations gives
wavefunction
1 α α
ψII = (De−αx + Eeαx )e−iωt (233) A= D 1− + E 1+ (246)
2 ik ik
satisfies the Schrödinger equation
2 and subtracting the two equations gives
h2 ∂ ψ ∂ψ
− + U0 ψ = i h (234) 1 α α
2m ∂x 2 ∂t B= D 1+ + E 1− . (247)
2 ik ik
appropriate for this region, where α is obtained by substi-
tuting ψII into the equation Next, let us apply boundary conditions at the barrier
h2 2 discontinuity at x = L. The conditions of continuity of the
− α ψII + U0 ψII = i h (−iω)ψII , (235) wavefunction ψII (L) = ψIII (L) and continuity of the first
2m
derivative of the wavefunction
which gives
dψII dψIII
h2 2 =
− α = h ω − U0 (236) d x x=L d x x=L
2m
or, since h ω = lead directly to the two additional relations:
1/2
2m De−αL + EeαL = Cei L
(248)
α= (U 0 − ) . (237)
h2
The positive sign is conventionally chosen for α; the −α De−αL + α EeαL = i Cei L
. (249)
choice of a negative sign would simply interchange co- Rewriting this pair of equations in the form
efficients D and E.
Since > W , the solution in region III (see Fig. 4) is De−αL + EeαL = Cei L
(250)
again of the propagating type:
−i
i x −iωt De−αL − EeαL = Cei L
. (251)
ψIII = ψtrans = Ce i( x−ωt)
= Ce e . (238) α
P1: GNH Final Pages
Encyclopedia of Physical Science and Technology EN013A-626 July 26, 2001 19:40
Since, for arbitrary θ, cosh(θ ) = 12 (eθ + e−θ ) and sinh(θ ) = 12 (eθ − e−θ ), it follows that cosh2 (θ) =1 + sinh2 (θ ) and
cosh2 (θ) + sinh2 (θ ) = cosh(2θ ). Thus
The transmission coefficient is readily evaluated from this expression for the reflection coefficient:
= 1−
[1 + ( /k)]2 + [1 + ( /α)2 ][1 + (α/k)2 ] sinh2 (αL) − [1 − ( /k)]2 − [1 + ( /α)2 ][1 + (α/k)2 ] sinh2 (αL)
=
[1 + ( /k)]2 + [1 + ( /α)2 ][1 + (α/k)2 ] sinh2 (αL)
4( /k)
= . (260)
[1 + ( /k)]2 + [1 + ( /α)2 ][1 + (α/k)2 ] sinh2 (αL)
If the limit W → 0, then → k and the transmission the rectangular barrier is chosen to have a thickness of
coefficient reduces to that for a rectangular barrier: 10 Å and a height of 10 eV. Figure 5 illustrates the variation
of the transmission coefficient with incident electron en-
4
= ergy. The remarkable oscillatory behavior is due to the
4 + [1 + (k/α)2 + (α/k)2 + 1] sinh2 (αL) wavelike nature of the particle; the peaks coincide with
1 certain relationships between the de Broglie wavelength
= . (261)
1 + (1/4)[(k/α) + (α/k)]2 sinh2 (αL)
This is the transmission coefficient for particles having
energy < U0 through a rectangular barrier of height U0 .
If the further limit αL 1, then sinh(αL) 12 eαL and
the approximate form is
1
1 + (1/4)[(k/α) + (α/k)]2 (1/4)e2αL
16e−2αL 4(α/k) 2 −2αL
= e . (262)
[(k/α) + (α/k)]2 1 + (α/k)2
If the barrier thickness L approaches infinity, then the po-
tential energy becomes a step potential and the transmis-
sion coefficient can be noted from Eq. (262) to approach
zero for the presently considered case of < U0 .
To recapitulate, the remarkable quantum mechanical re-
sult has been derived that particles can, in some cases, pen-
etrate potential energy barriers that are even higher than
the particle energy. This result, which contrasts markedly
with the classical result that = 0, has the greatest rele-
vance for quantum electronic devices. It likewise provides FIGURE 5 Transmission coefficient versus incident-particle en-
the explanation for the decay of radioactive nuclei by α- ergy for a rectangular potential energy barrier. [Fig. 1.35 in
Quantum Mechanics for Applied Physics and Engineering by
particle emission.
Albert Thomas Fromhold, Jr. (Academic Press, Inc., New York,
An example calculation spanning the domains for < 1981; Dover Publications, Inc., New York, 1991); reproduced with
U0 and > U0 , with W = 0 for both cases, has been the permission of Academic Press, Dover Publications, and the
carried out. The value of the electronic mass is utilized, and author.]
P1: GNH Final Pages
Encyclopedia of Physical Science and Technology EN013A-626 July 26, 2001 19:40
and the barrier thickness, which can be deduced from this same limit. The continuity in at the limiting energy
Eq. (230). Note that approaches unity whenever of 10 eV can be noted in Fig. 5.
cos(2β L) = 1, which occurs when 2β L = 2mπ (m =
1, 2, 3, . . .). Since β = 2π/λII , the condition is met when
m(λII /2) = L (i.e., whenever there are an integer number E. Incident Beam with Particle
of half wavelengths over the barrier distance L). On the Energy Below Both Steps
other hand, for (2m + 1)(λII /4) = L, which corresponds As a final consideration, let us examine the situation in
to an odd-quarter wavelength over the barrier distance L, which the particle energy is less than the potential energy
then cos(2β L) = −1 and the transmission coefficient goes in region II and in region III. Algebraically, this is the case
through the value 4/[(β/k) + (k/β)]2 . In this odd-quarter when < U0 and < W . At the limit W → U0 , this case
wavelength condition, for β k corresponding to particle reduces to the single step potential. In any event, since
energies not far above the barrier maximum, 4(β/k)2 , region III extends to x = ∞, the probability density in
which is very small in value. Thus, there are transmission region III can be expected to approach zero as x → ∞.
resonances as the particle energy continuously increases The wavefunction for region III thus may be chosen to be
above the barrier height U0 . For the odd-quarter wave-
length condition with the particle energy far exceeding ψIII = H e−γ x e−iωt , (263)
the barrier height, however, β is not too different from k
where
and so is not much less than unity. In short, the depth
of the transmission resonances decreases as increases. γ = h −1 [2m(W − 0 )]1/2 . (264)
It should be pointed out that although the odd-quarter
wavelength condition usually provides a very good Matching the wavefunctions and the first derivatives at
approximation for minimum , it does not provide x = 0 yields the same results for the relationships among
the exact minimum. A quantitative evaluation based on A, B, D, E as given by Eqs. (242)–(247), since the wave-
d /d = 0 for fixed U0 , utilizing Eq. (230), leads to the functions in regions I and II are exactly the same as for
condition the case W < < U0 . However, the matching of the wave-
functions and their first derivatives at x = L is different,
βL
tan(β L) = since ψIII is now different. The result obtained from this
1 + (β/k)2 matching is
and the roots of this equation yield the distinct values of De−αL + EeαL = H e−γ L (265)
for minimum . This transcendental equation can be
solved graphically. The problem is simplified in the limit −α De−αL + α EeαL = −γ H e−γ L . (266)
where (β/k)2 1, since then the roots depend only on
Writing this pair of equations in the form
the energy difference above the barrier, thereby leading to
pure wavelength conditions in the barrier region itself for De−αL + EeαL = H e−γ L (267)
the transmission minima. Further, in the large β L limit, γ
the equation requires tan(β L) to be large, so that the roots De−αL − EeαL = H e−γ L (268)
α
approach values that are approximately the same as the
values leading to the odd-quarter wavelength condition facilitates the algebra. Adding these two equations leads
just discussed. to
Finally, care must be taken not to draw the erroneous 1 γ
D= 1+ H e−γ L (269)
conclusion from a casual inspection of Eq. (230) that 2 α
is unity when β = 0, which corresponds to the par-
ticle energy being equal to the barrier height. A sim- and subtracting these two equations leads to
ilar erroneous conclusion also could be reached from 1 γ
Eq. (261) in the same limit, since then α is zero. Care- E= 1− H e−γ L . (270)
2 α
ful inspection of these two equations reveals indetermi-
nate forms in the denominators of the equations for this Substituting these two expressions for D and E into the
limit. In Eq. (230), for example, the two terms in the group previously derived expressions for A and B—namely,
[(k/β)2 − (k/β)2 cos(2β L)] approach ∞ → ∞ as β → 0,
1 α α
but by noting the equivalence to 2(k/β)2 sin2 (β L), the in- A= D 1− + E 1+ (271)
2 ik ik
determinate form yields 2(k L)2 as β → 0. As given by Eq.
(230), therefore approaches [1 + (k L/2)2 ]−1 as β → 0. 1 α α
B= D 1+ + E 1− (272)
In a similar manner, Eq. (261) can be shown to approach 2 ik ik
P1: GNH Final Pages
Encyclopedia of Physical Science and Technology EN013A-626 July 26, 2001 19:40
feature of bound-state problems in quantum mechanics is = −i h ∇, since it can be noted by direct substitution
the fact that the application of realistic boundary condi- that
tions to the time-independent Schrödinger equation, given
by Eq. (169), forces a restriction on the energy values, so ψk (r, t) = (−i h )[∇(ik · r)]ψk (r) = h kψk (r). (281)
that the eigenvalues for the total energy are confined to a
discrete set. This feature is independent of the particular Even though such plane-wave solutions formally do sat-
functional form of the potential energy, although the de- isfy the Schrödinger equation for this problem, they do
tails of the energy-level spectrum are dependent upon the not satisfy the constraint that the probability density be
form of U (x). In the following discussion, three different continuous at the walls of the box, since continuity of the
bound-state problems, for which the potential is given by wavefunction requires the wavefunction to be zero inside
squarewell, harmonic oscillator, and Coulomb potentials, the box at the walls in order to equal the zero wavefunc-
are considered individually. tion exterior to the box. This underscores the vital role
of boundary conditions in quantum mechanical solutions.
As described in Section IV.F, however, oppositely directed
B. Three-Dimensional Potential Energy but equal-amplitude plane waves can be superimposed to
Square-Well Problem give standing-wave solutions that may satisfy the desired
Let us consider a particle of mass m confined to a re- boundary conditions. Let us consider the six infinitely high
gion of space having the shape of a rectangular paral- potential energy barriers delimiting the rectangular paral-
lelepiped. The particle confinement is due to infinite po- lelepiped box to be perpendicular to the x, y, and z axes and
tential energy barriers at the faces of the parallelepiped to be located at x = 0, x = L x , y = 0, y = L y , z = 0, z = L z .
box. This is sometimes referred to as a square-well po- The box confining the particle thus has edges of length
tential because the potential energy rises so sharply (with L x , L y , and L z in the x, y, and z directions, respectively.
infinite slope) at the boundaries of the parallelepiped. If One form of the three-dimensional, normalized standing
the potential energy inside the parallelepiped is taken to waves providing solutions to the Schrödinger equation for
be zero, then the Hamiltonian for the time-independent this problem is given by
and time-dependent Schrödinger equations [φ = φ and 1/2
8 nx π x nyπ y nz π z
ψ = i h ∂ψ/∂t, given by Eqs. (169) and (179)] takes the φn r = sin sin sin ,
V Lx Ly Lz
form = −(h 2 /2m)∇ 2 within the box but is undefined
outside the box, where the potential energy is considered (282)
infinite. Thus, a wavefunction ψ exists for the interior but where n x , n y , and n z are a triplet of positive integers rep-
is zero for the exterior, where the particle cannot pene- resented by the symbol n. This product of sine functions
trate. The time-independent Schrödinger equation for the satisfies the fixed boundary condition that the wavefunc-
interior of the box is formally the same as that for a free tion vanish on six faces of a rectangular parallelepiped,
particle, namely, and direct substitution shows that it satisfies the three-
dimensional Schrödinger equation given by Eq. (278). The
h 2 ∂2 ∂2 ∂2
− + + φk (r) = k φk (r), (278) corresponding quantized energy eigenvalues n thereby
2m ∂ x 2 ∂ y2 ∂z 2 obtained are
which has solutions given by the normalized plane-wave 2 2 2 2 2
h π nx ny nz
spatial functions n = + + . (283)
1/2 2m Lx Ly Lz
1
φk (r) = exp[ik · r], (279) Only the positive integers are chosen for the triplet n x ,
V
n y , n z in the wavefunction given by Eq. (282), since the
where V is the volume of the box. These spatial functions, corresponding negative values yield the same wavefunc-
when combined with the time factor exp[−(i/ h )k t], rep- tion to within a factor of −1. This would represent exactly
resent running waves the same state for all practical quantum mechanical cal-
ψk (r, t) = φk (r) exp[(−i/ h)k t], (280) culation purposes, because the particle probability den-
sity ψ ∗ ψ would be the same. It is a very general aspect
which constitute the stationary-state wavefunctions for the of quantum mechanics that linearly dependent eigenfunc-
particle that satisfy the time-dependent Schrödinger equa- tions are redundant whereas linearly independent eigen-
tion. These traveling-wave eigenfunctions of the Hamil- functions are not.
tonian having the form exp[(i/ h )(p · r − t)] are simul- The standing-wave solutions of the Schrödinger equa-
taneous eigenfunctions of the linear momentum operator tion given by Eq. (282) do not represent states having
P1: GNH Final Pages
Encyclopedia of Physical Science and Technology EN013A-626 July 26, 2001 19:40
definite momentum values, but instead are linear com- sical frequency given by the reciprocal of the period is
binations of plane-wave states having oppositely directed ν0 = ω0 /2π = (2π )−1 (K /M)1/2 . Other examples of clas-
momenta. Because the particle undergoes reversals in mo- sical harmonic oscillators also are characterized by some
mentum associated with reflection in the neighborhood of specific frequency (with corresponding angular frequency
the classical turning points, it is not surprising that the ω0 = 2π ν0 ) determined by the physical parameters of the
most appropriate solutions to the problem are not those system in question. The analogous quantum problem has
states that represent definite momentum values. solutions that can be expressed in terms of this classical
Because the standing-wave solutions can be viewed as frequency ν0 .
the superposition of traveling-wave solutions of equal am- The Hamiltonian for the harmonic oscillator problem is
plitude traveling in opposite directions, plane-wave solu- 2
h2 d 1
tions can be utilized for the present problem by means of = − 2
+ K x2 (286)
2m dx 2
a construct known as periodic boundary conditions:
and the time-independent Schrödinger equation given by
φ(x + L x , y, z) = φ(x, y, z) Eq. (169) for this problem is
φ(x, y + L y , z) = φ(x, y, z) (284) 2
h2 d φn (x) 1
− + K x 2 φn (x) = n φn (x). (287)
φ(x, y, z + L z ) = φ(x, y, z). 2m dx 2 2
This represents a substitute for the fixed boundary con- The complete time-dependent solutions ψn (x, t) then are
ditions previously invoked to determine the allowable k given as usual, by taking the product of the spatial func-
values. This is the approach generally used in the develop- tions with the corresponding time-dependent function
ment of the free-electron theory of metals. It has the utility θn (t) = exp[(−i/ h )n t], so that
of simplifying the mathematics a bit while retaining the
i
essential property of the correct number of quantum states ψn (x, t) = φn (x) exp − n t . (288)
h
per unit energy range required for quantum statistical
calculations. In the present case
φn (x) = n e−αx /2
2
Hn (α 1/2 x) (289)
C. The Harmonic Oscillator Potential with
Another very important bound-state potential energy func- mω0
α≡ . (290)
tion is the harmonic oscillator potential h
1 The functions Hn (y) appearing in Eq. (289), with y ≡
U (x) = K x2 (285) α 1/2 x, are the Hermite polynomials. These polynomials
2
illustrated in Fig. 7. For a one-dimensional harmonic are readily generated by means of the differential relation
oscillator consisting of a mass M attached to a fixed dn
spring (with force constant K ), which is set into mo- Hn (y) = (−1)n exp (y 2 ) [exp(−y 2 )]. (291)
dy n
tion on a horizontal; frictionless planar surface, the clas-
Some examples of the lower-order Hermite polynomials
are
H0 (y) = 1
H1 (y) = 2y
H2 (y) = 4y 2 − 2
H3 (y) = 8y 3 − 12y
H4 (y) = 16y 4 − 48y 2 + 12
H5 (y) = 32y 5 − 160y 3 + 120y
H6 (y) = 64y 6 − 480y 4 + 720y 2 − 120
FIGURE 7 Harmonic oscillator potential. [Fig. 1.39 in Quantum
Mechanics for Applied Physics and Engineering by Albert Thomas H7 (y) = 128y 7 − 1344y 5 + 3360y 3 − 1680y. (292)
Fromhold, Jr. (Academic Press, Inc., New York, 1981; Dover Pub-
lications, Inc., New York, 1991); reproduced with the permission The normalization factors N for the eigenfunctions, as
of Academic Press, Dover Publications, and the author.] found from the normalization integral
P1: GNH Final Pages
Encyclopedia of Physical Science and Technology EN013A-626 July 26, 2001 19:40
mathematical one, although it ties in very well with the cor- =0 P00 (cos θ) = 1
responding physics since the square of the z-component
=1 P10 (cos θ) = cos θ
of the orbital angular momentum—namely, m 2 h 2 —
cannot exceed the square of the total orbital angular P11 (cos θ) = sin θ
momentum—namely, (+1)h 2 —which, in turn requires =2 P20 (cos θ) = 12 (3 cos2 θ − 1)
|m| ≤ .
The differential equation for R(r ) representing the r - P21 (cos θ) = 3 sin θ cos θ
component of the wavefunction ψ(r, t) contains the quan- P22 (cos θ) = 3 sin2 θ
tum number as well as a third separation constant. The
solutions can be expressed in terms of the associated =3 P30 (cos θ) = 12 (cos θ )(5 cos2 θ − 3)
Laguerre polynomials. The requirement that the solu- P31 (cos θ) = 32 (sin θ )(5 cos2 θ − 1) (308)
tions not diverge in order to have a physically meaningful
probability density ψ ∗ ψ once again places severe restric- P32 (cos θ) = 15 sin θ cos θ
2
tions on the separation constant. This, in turn, requires P33 (cos θ) = 15 sin3 θ
an integer quantum number n, known as the principal
=4 P40 (cos θ) = 18 (35 cos4 θ − 30 cos2 θ + 3)
quantum number, together with the condition n > . In
a straightforward way, this leads to the quantized energy P41 (cos θ) = 52 (sin θ )(7 cos3 θ − 3 cos θ )
eigenvalues P42 (cos θ) = 15
(sin2 θ)(7 cos2 θ − 1)
2
tional form of the Coulomb potential energy) and g(θ, φ) = am Ym (θ, φ). (310)
=0 m=−
Y m (θ, φ) = m (θ ) m (φ) is one of the spherical har-
monics. The normalized spherical harmonics Y m can be The coefficients am in the linear combination are
written in terms of the associated Legendre polynomials determined from
Pm (cos θ ):
∗
am = Ym (θ, φ)g(θ, φ) d (311)
1/2
(2 + 1) ( − m)!
Y m (θ, φ) = Pm (cos θ )eimφ . with the differential d denoting the differential surface
4π ( + m)!
area on a unit sphere d = sin θ dθ dφ, with the limits
(307) of integration being 0 to π for θ and 0 to 2π for φ.
Several of the lower-order associated Legendre polyno- The normalized radial portion of the energy eigenfunc-
mials are tions for hydrogenlike atoms can be written in the form
P1: GNH Final Pages
Encyclopedia of Physical Science and Technology EN013A-626 July 26, 2001 19:40
which must be added to the Hamiltonian. Other energy tum numbers (n, , m , and m s ) is required to specify an
terms also can arise, such as the energy of the interaction electronic state. The wavefunction specified by a given set
between the two magnetic moments µorb and µs that is of quantum numbers is called an orbital or, more specif-
designated the spin–orbit interaction energy. ically, an atomic orbital, in analogy with the older Bohr
theory in which electrons were considered to travel in plan-
E. Many-Electron Atoms and Ions etary orbits in accordance with classical mechanics.
The Pauli exclusion principle requires that no two elec-
First of all, it is important to recognize that the potential en- trons have the same set of quantum numbers [viz., the
ergy in a multi-electron atom depends upon the electron– principal quantum number n characteristic of the total en-
electron Coulomb energies as well as on the Coulomb ergy of the electron, the angular momentum (azimuthal)
energy of interaction of each electron with the nucleus. quantum number characteristic of the total orbital an-
Although the electron–electron interaction can be viewed gular momentum of the electron, the magnetic quantum
as a perturbation for purposes of very crude estimates, it, in number m characteristic of the orientation of the magnetic
fact, is much too large to be treated within the framework moment with respect to the z-azis, and the spin quantum
of perturbation theory. In the helium atom, for example, number m s characteristic of the orientation of the electron-
the potential energy involves the Coulomb potential en- spin magnetic moment]. The orbital angular momentum
ergy of interaction of the attractive force between each and magnetic quantum numbers and m for the multi-
electron and the nucleus plus the Coulomb energy of in- electron atom or ion are the same as the quantum numbers
teraction of the repulsive force between the two electrons. and m in the hydrogen atom, since the variables sepa-
The wavefunction ψ then must be taken at minimum as the ration with the more general central potential U (r ) pro-
product of the wavefunctions for each individual electron ceeds in exactly the same way as for the one-electron atom,
in the atom. The problem becomes enormously more com- for which U (r ) = −Z e2 /4π ε0r , thereby yielding the same
plicated to solve, requiring a numerical solution because equations for the (θ) and (φ) factors in ψ(r, t). The
an exact analytical solution cannot be found. Nevertheless, electron-spin quantum number m s = ±1/2 is likewise the
the Schrödinger equation yields numerical results for the same as in the hydrogen atom. The radial equation, con-
two-electron atom problem that agree with experimental taining as it does the generalized central potential U (r ) in-
optical spectra. In this way, the Schrödinger equation pro- stead of simply the electron–nucleus Coulomb potential,
vides the more general theory required to supplant the requires a generalized total quantum number n analogous
more limited Bohr theory. to the principal quantum number n for the hydrogen atom.
The complexity of an exact numerical solution of the One very important difference between the results for
Schrödinger equation naturally increases greatly as the the general central potential problem, in which the po-
number of electrons increases from two three. If only a tential no longer varies as 1/r , and the hydrogen atom
few electrons are involved, as in the case for the lighter problem, in which the potential varies strictly as 1/r , is
atoms, a variational treatment often can be used to obtain the fact that electronic states characterized by different
approximate solutions to the many-electron Schrödinger values of the orbital angular momentum quantum num-
equation. ber with the same total quantum number n generally
For the heavier atoms, wherein a larger number of ele- correspond to different energy eigenvalues, whereas in
crons are involved, the starting point for the most calcula- the hydrogen-atom problem, the energy eigenfunctions
tions is the approximation that the total potential energy for a given n but different values are degenerate. In
of interaction of a given electron with the nucleus and multi-electron atoms, states of lower -value consistent
the other electrons can be represented by a spherically with a fixed n-value lie at a lower energy. The combined
symmetric potential U (r), referred to as the central-field values of and n for a given eigenfunction determine
approximation. In practice, then, one of the most difficult the radial nodes, these being n − − 1 in number. As
parts of the problem is to estimate or calculate the poten- in the hydrogen atom, n must be a positive integer, and
tial. The details are beyond the scope of the present treat- the magnitude of the integer cannot exceed n − 1. An
ment; however, the results justify utilizing one-electron atomic shell is specified by a given value for n, and an
eigenvalues and eigenfunctions as a semantic framework atomic subshell is specified by a given set of values for
for describing the multi-electron atom or ion. Thus, the both n and . Taking into account the two possible spin
product wavefunction quantum numbers m s = ±1/2 and the (2 + 1) values for
−i m [m = −, − + 1, . . . 0, 1, . . . ], one deduces the re-
ψ(r, t) = R(r )(θ ) (φ) exp t (319)
h sult that a given subshell contains 2(2 + 1) degenerate
is chosen to have the same form as that given by Eq. (301) electronic states. In standard spectroscopic notation, the
for the one-electron atom. Once again, a set of four quan- series of shells are denoted by K , L , M, N , . . . .
P1: GNH Final Pages
Encyclopedia of Physical Science and Technology EN013A-626 July 26, 2001 19:40
The ground state of a many-electron atom is the one energy levels at one time, so the quantum statistics gov-
in which a sufficient number of electrons populates the erning the behavior of many electrons in the same sys-
orbitals of lowest energy consistent with the Pauli exclu- tem again plays an important role. The treatment of the
sion principle to give a neutral entity. The ground-state periodic-potential problem leads to energy band theory—
configuration of the electrons in an atom is specified by an interesting and important topic, since it is so successful
the number of electrons in each shell. The chemical prop- in explaining the difference between metals, semiconduc-
erties of the different atoms (or elements) are determined tors, and insulators.
principally by the uppermost filled energy levels, since
these higher-energy electrons, being less tightly bound to
IX. ELECTRON TRANSPORT IN SOLIDS
the atomic core, most easily share themselves with adja-
cent atomic cores for the formation of chemical bonds in
A. Failure of Classical Physics for
molecules and solids. If the uppermost occupied shell is
Electrical Currents in Solids
full, there is generally an appreciable difference in energy
between the occupied and next-higher unoccupied state, Another important difficulty encountered in classical me-
and the atom then tends to be chemically inert. chanics is in the area of electron transport in solids. View-
It is standard in spectroscopic notation to give the n- ing condensed matter as merely an agglomeration of hard-
value of a shall as a number and the -value as a low- sphere atoms packed so closely together that they are in
ercase letter, with = 0, 1, 2, 3, 4, . . . being denoted, re- contact, it seems intuitively clear that any particle, how-
spectively, by the letters s, p, d, f, g, . . . . The periodic ever small, while moving through the agglomerate in a
filling of successive shells as Z increases explains the use straight-line motion, would rebound from one or another
of a periodic table for listing the chemical elements. The of the atoms before traveling very far. Even allowing for
number of electrons in a given shell generally is denoted the fact that the atoms in the solid usually order into a
by a superscript. For example, sodium has two electrons in lattice configuration, there still will be very few direc-
the 1s shell, two electrons in the 2s shell, six electrons in tions through the ordered array in which a particle could
the 2 p shell, and one electron in the 3s shell; this ground- travel unimpeded on the basis of this purely classical pic-
state configuration for sodium (Z = 11) would be denoted ture. Despite this, it can be deduced from experimental
by Na: 1s 2 2s 2 2 p 6 3s. The rule that the maximum number measurements that under conditions of very low strain,
of electrons in a shell be 2(2 + 1), with ≤ n − 1, can be very high purity, and quite low temperatures, the con-
consulted in conjunction with this configuration to illus- duction electrons can travel distances involving hundreds
trate that atomic sodium consists of two filled shells (or of atoms without being scattered. Devising an accept-
three filled subshells) containing the core electrons and an able explanation for such easy flow of electrons in met-
outermost partly filled shell containing the single valence als thus constituted a problem that could not be resolved
electron. by means of a mechanics based on a purely classical
The use of one-electron states to characterize the elec- viewpoint.
tronic states of the multi-electron atom is a good illustra- Before delving into the quantum mechanical explana-
tion of how physical models for complicated systems are tion of easy electron transport in metals, let us first ask
constructed. In this section, a simple framework, supple- how it is known that atoms in a solid are actually in con-
mented by the additional requirements of the problem in tact. Next, let us ask how it is known that electrons have
question, has enabled an understanding of a much more such long, mean-free paths in metals for which the atoms
complex problem to be developed. In the present case, the are in contact. Classical radii for atoms may be deduced
simplifying assumption is that of a spherically symmetric in a variety of ways. Scattering experiments initiated by
potential and the additional condition is that of the Pauli Rutherford in the early 1900s provided direct evidence
exclusion principle. The overall approach permits a rudi- that an atom has a tiny, dense nucleus surrounded by a
mentary understanding of the entire periodic table for the cloud of electrons extending for distances of the order of
chemical elements. angstroms (1 Å = 10−10 m). The viscosity of fluids and the
Section IX examines the predictions of the Schrödinger molecular flow of gases also provide some data on atom
equation for an electron in the presence of a periodic po- and molecule sizes. X-ray diffraction by crystalline solids
tential energy. The periodic potential represents another yields lattice distances that are of the same order as the
example of a simple framework used as the basis for mod- atom sizes. Compressibility data for solids lend credence
els of a very complicated physical system. This case relies to the view that forces between atoms increase as a high
on the fact that periodicity in the potential is a common power of the separation distance, as might be expected
attribute of the atom array in a crystalline solid. The prob- from a hard-sphere picture of atoms in contact. These in-
lem of electrons in a solid involves the population of many dications, together with a variety of other types of data,
P1: GNH Final Pages
Encyclopedia of Physical Science and Technology EN013A-626 July 26, 2001 19:40
lead us to picture a solid as an array of hard-sphere atoms due to the partial cancellation of wavefronts having ran-
in contact. dom phase with respect to one another; in the periodic
The second point—namely, the existence of the long, array case, propagation of the wavefront becomes quite
mean-free path of electrons in metals—can be deduced possible.
from the simple picture that resistance is due to elec- Even in the periodic case, however, there are situations
tron scattering, coupled with experimental data on the in which propagation is retarded, as when a portion of the
temperature-dependence of the resistivity of metals and wavefront reflected from one plane of the crystalline array
the dependence of the resistivity on purity and crystal- is superimposed upon and has a 180◦ phase difference
preparation techniques. Perhaps the most salient point is with respect to another portion of the wavefront reflected
that the resistivity decreases by many orders of magnitude from a different plane of the array. Such waves interfere
when the purity of the metal is increased and the metal is destructively. Propagation, on the other hand, is enabled
grown as a strain-free single crystal. by a constructive interference of the scattered waves in the
The limiting factor on the electron mean free path in direction of propagation.
metals thus actually appears to depend upon residual im- These facts of classical wave propagation are applica-
perfections, impurities, and grain boundaries in the sample ble immediately to electron propagation in solids once
instead of on atom density. There is hardly any way a clas- it is admitted that electrons have a wavelike character.
sical picture can explain the fact that the ordinary atoms Thus, it can be stated that due to the wavelike properties
that make up the ordered solid themselves provide so lit- of electrons, the perfectly periodic array of atoms in a solid
tle resistance to electron flow. The classical picture of a may not scatter electrons out of their straight-line path. In
pointlike electron scattering in a billiard-ball manner from this sense, the periodic array may be considered to of-
a hardsphere atom fails completely. fer no resistance whatsoever to electron motion, thereby
rationalizing the long, mean free paths for electrons in
single, strain-free crystals of high purity held at low
B. Quantum Mechanics Approach
temperatures.
Quantum mechanics permits a rationalization of the clas- The emergent picture is that electrical resistance is not
sically unexplainable observations just described. Even due to the scattering of electrons by the atoms of the
neglecting the ordinary Coulomb repulsion between elec- periodic array per se but by the departures from period-
trons, there remains a quantum mechanical tendency for icity in the crystalline array. Such departures from peri-
electrons to remain separated. This tendency can be treated odicity are provided by impurities, vacancies, strained re-
within the framework of what is called the Pauli exclusion gions, dislocations, and grain boundaries and also by ther-
principle, which states that no two electrons in a system mal fluctuations of the atom array. Increased scattering at
can have the same set of quantum numbers. Practically higher temperatures due to temperature-dependent ther-
speaking, this requires higher and higher average kinetic mal fluctuations in the lattice can be shown to lead to the
energies for the electrons as the electron density increases. linear temperature-dependence of the resistivity of met-
This explains why adjacent atoms resist electron-cloud als. The residual resistance at extremely low temperatures
overlap, even though the electron cloud otherwise would is due to scattering from the impurity atoms and struc-
be expected to be rather soft and easily deformed under tural defects. A quantum mechanical approach involving
compression, and so accounts for the hard-sphere view of the Schrödinger equation, based as it is on the wavelike
atoms in a crystal lattice. behavior of particles, provides a suitable framework for
The unimpeded motion of electrons moving through a rationalizing and treating these varied contributions to the
lattice of such hard-sphere atoms in a solid can be under- electron resistivity of metals.
stood from the wavelike properties of the electron. Even
classically, it can be shown that the collective scattering
of waves from a periodic array of scattering centers dif- C. Periodic Potential for Crystalline Solids
fers quite dramatically from the scattering of waves from The Schrödinger equation can be applied to describe con-
a random array of scattering centers. The difference be- duction electrons in metals, each conduction electron be-
tween these two situations is that a random array leads to ing considered to be under the influence of a potential
random phases between the scattered wavefronts whereas energy function that has the same periodicity as the lat-
phase coherence between the scattered wavefronts is pos- tice. An illustration of a periodic potential in one di-
sible if the scattering centers are located in a periodic array. mension is given in Fig. 9. If the lattice positions in a
(Indeed, X-ray diffraction by crystalline solids hinges on three-dimensional array are
phase coherence.) In the random-array case, movement
of an incident wave through the array is grossly impeded Rj = j1 d1 + j2 d2 + j3 d3 , (320)
P1: GNH Final Pages
Encyclopedia of Physical Science and Technology EN013A-626 July 26, 2001 19:40
where
cell = d1 · (d2 × d3 ) (325)
is the volume of a unit cell in the real lattice. In terms of
these vectors, the vectors Gn appearing in Eq. (324) are
given by
Gn = 2π (n 1 b1 + n 2 b2 + n 3 b3 ) (326)
FIGURE 9 Periodic potential. [Fig. 7.8 in Quantum Mechanics for where n 1 , n 2 , and n 3 are integers. The symbol n is used
Applied Physics and Engineering by Albert Thomas Fromhold, Jr. to represent the integer triplet n1 , n2 , n3 . The set Gn maps
(Academic Press, Inc., New York, 1981; Dover Publications, Inc.,
New York, 1991); reproduced with the permission of Academic
out a lattice of points in the same manner that the set Rj
Press, Dover Publications, and the author.] maps out a lattice, but the two lattices are generally quite
different. The vectors Rj are said to map out a real or
direct lattice, whereas the vectors Gn are said to map out
where j1 , j2 , and j3 are integers and d1 , d2 , and d3 are ele- the reciprocal lattice. The vectors Gn are referred to as
mental vectors denoting the basic three units of periodicity reciprocal lattice vectors. The functions exp(iGn · r) can
in a three-dimensional crystalline solid, then a satisfactory be shown to have the properties required of basis functions
potential energy U (r), denoted by V (r) in this special case, for a Fourier series representation of arbitrary functions
has the periodicity requirement having the lattice periodicity.
Once the periodic potential energy is defined, then the
V (r + Rj ) = V (r). (321) Schrödinger equation given in Eq. (322) can be solved by
The time-independent Schrödinger equation given in various methods. One way leading to great insight into
Eq. (169) then takes the form this problem is to assume a general form for the eigen-
functions φ by utilizing a Fourier series description with
h2 2 the periodicity of the entire solid—namely,
− ∇ φ + V (r)φ = φ . (322)
2m
φm = Bm eikm ·r . (327)
The next step is to utilize some mathematical function for m
V (r). This can be assumed to be a simple form, such as
a one-dimensional, periodic step function (the Krönig– This leads to a coupled set of algebraic equations for the
Penney model), or it may be quite complex, as when one unknown coefficients Bm that contain the energy eigenval-
attempts to simulate mathematically the actual potential ues . The vectors km can be obtained similarly to the way
energy that would be sensed by an electron that probes the vectors Gn were constructed. The algebraic equations
different positions within each atom and between atoms so derived constitute a homogeneous set. Self-consistency
in the periodic array. One very general method that lends then requires the determinant of the coefficients of the un-
great insight into the general problem of the motion of knowns Bm to be zero. This determinant, called the sec-
an electron in a periodic solid is to express the potential ular determinant, leads directly to an algebraic equation
energy as a type of Fourier series known as the secular equation for the energy eigenvalues
. Choice of a specific eigenvalue for solution of the
V (r) = Vn eiGn ·r (323) set of algebraic equations containing the lattice potential
n energy coefficients Vn yields the eigenfunction φ corre-
with the amplitudes Vn for the various harmonics chosen sponding to that energy. Repeating the procedure for each
so that the function V (r) reproduces any periodic poten- energy eigenvalue, in principle, will yield the complete set
tial energy of interest. For a lattice in which the basic of energy eigenfunctions for that periodic potential energy.
spatial periodicity vectors d1 , d2 , and d3 are orthogonal, The energy eigenfunctions obtained for a periodic
the vectors Gn turn out to be especially simple in form. For potential energy are known as Bloch functions, after
the more general case of a nonorthogonal triad d1 , d2 , d3 , Felix Bloch (1905–1983). Bloch functions have the gen-
however, it greatly facilitates the problem to define a triad eral form
b1 , b2 , b3 , called the reciprocal lattice vectors φm (r) = u m (r)eikm ·r , (328)
−1
b1 = (cell ) d2 × d3 where u m (r) are functions having the periodicity of the
b2 = (cell )−1 d3 × d1 (324) real lattice. This periodicity condition is
b3 = (cell )−1 d1 ×, d2 u m (r) = u m (r + Rj ). (329)
P1: GNH Final Pages
Encyclopedia of Physical Science and Technology EN013A-626 July 26, 2001 19:40
The vectors km are propagation-type vectors for plane- in value between m ∗ and m is a measure of the average
wave functions having wavelengths greater than the basic conduction electron interaction with the periodic potential
unit of lattice periodicity but less than the length of the of the lattice. Although a perfectly periodic lattice does not
crystal in the propagation direction. offer a resistance by scattering the conduction electrons,
A Bloch function for an electron in a solid may be it does offer a resistance to the acceleration of the electron
likened to an individual harmonic of sound in a musical under an applied force (e.g., an electric field) by affecting
cabinet. Bloch functions can be shown to have a number its inertial response to the force.
of very interesting properties, such as completeness of the In considering the population of the various energy lev-
set, linear independence, and orthogonality. els, it is necessary to add in as an essential component
the Pauli exclusion principle—that very soul of quantum
D. Energy Bands and Energy Gaps mechanics that disallows any two electrons to occupy the
same state, the state being denoted by the specification of
The picture that thereby emerges is of groups of closely values for the complete set of quantum numbers, including
spaced, allowed discrete energies that can be populated by electron spin. Once this is woven into the fabric, it must be
electrons, with the groups of allowed levels being sepa- considered how the energy eigenstates are occupied by the
rated by energy ranges called gaps that contain no allowed electrons available for conduction. Under thermal equi-
energy values for conduction electrons. Each group of librium condition at temperatures near absolute zero, the
closely allowed discrete energies is called an energy band. lower-energy states will certainly be occupied. The lower-
Each allowed energy value within a band is characterized energy states in any of the energy bands represent states
by a set of quantum numbers. With the additional con- having low kinetic energies. Because the Pauli exclusion
sideration of electron spin, these are four in number. An principle does not allow more than one conduction elec-
electron in one of the allowed levels characterized by spec- tron to crowd into any lower-energy state, higher-energy
ified values of these quantum numbers travels unscattered states will be populated to the degree required for all con-
by the atoms of the crystal lattice, the straight-line motion duction electrons to be accommodated.
being allowed due to a wavelike propagation through the The requirement by quantum mechanics that all elec-
spatially periodic lattice potential energy. This provides trons be in different states has no analog in a classical
the quantum mechanical explanation of the long, mean- mechanics description. Classically, therefore, there is no
free path for conduction electrons in metals. lower limit to the energy of the conduction electrons; in
As a rule, the electrons in a solid are characterized by fact, in a classical description, the kinetic energy of the
the vector k, which denotes the propagation direction. The conduction electrons decreases to zero as the absolute
k-vector is the analog of the momentum for a particle in a temperature approaches zero. In a quantum mechanical
solid. (In free space, p = h k, according to the de Broglie description, the average kinetic energy of the conduction
relation derived in Section V.B.) The energy of the electron electrons in a metal decreases asymptotically to a still rel-
state is denoted by (k). The goal of solid-state band- atively high value as the absolute temperature approaches
structure calculations is to evaluate (k) for a specified zero.
periodic potential energy. This periodic potential may be
considered to be available at the outset, although the prob- E. Metals
lem is best approached from the standpoint of computing The average kinetic energy for the highest populated en-
the potential self-consistently with the electron states de- ergy band can be estimated by invoking a “particle in a
duced in the calculation. Since = h ω, the function (k) box” model of a metal holding its conduction electrons
obtained by means of a band-structure calculation repre- within the boundary walls, with no consideration given
sents the dispersion relation for electrons in the solid. As to the actual periodic potential energy of the lattice. This
recalled from Section IV.H, the dispersion relation pro- simple approach, known as the free-electron model, of-
vides the basis for determining the group velocity of the ten yields surprisingly accurate quantitative values for a
particle. Thus, from Eq. (107) number of physical properties of metals associated with
1 the conduction electrons. In such cases, a full solution of
vgroup = ∇k ω(k) = ∇k (k). (330)
h the Schrödinger equation for the actual periodic potential
This relation is very useful for obtaining the conduction may not be required.
electron velocity as a function of energy for any particular The kinetic energy of the highest filled state in a given
direction in the crystal. In fact, this approach must be used energy band at 0 Kelvin (K) is designated the Fermi energy.
in lieu of the free-space relation vparticle = p/m because the A computation of how the average energy changes with
inertia of an electron in a crystal is governed by an effec- increases in the thermodynamic temperature of the system
tive mass m ∗ instead of its actual mass m. The difference yields the specific heat of the conduction electrons. The
P1: GNH Final Pages
Encyclopedia of Physical Science and Technology EN013A-626 July 26, 2001 19:40
accurate predictions obtained by quantum mechanics for A force for directed electron motion invariably leads to
the specific heat of metals at low temperatures represents the prediction of a nonzero electrical current from a purely
a remarkable success for the theory, that is to be sharply classical viewpoint; however, it does not necessarily lead
contrasted with the total failure on the part of the classical to such a consequence in quantum mechanics. The rea-
approach to provide an adequate quantitative estimate of son is that acceleration of electrons by a force leads to a
this physical property of metals. change in electron momentum and, generally, to an ac-
Quantum mechanics gives great insight into the scatter- companying change in the electron energy. A change in
ing of conduction electrons by imperfections in a metal. electron momentum is synonymous with a change in the
The quantum nature of the scattering of conduction elec- quantum numbers characterizing the occupied electronic
trons places the restriction on the process that scattering state; that is, the acceleration of an electron in quantum
can take place only to vacant quantum states of the system. mechanics is described by the electron vacating the state
This means that at 0 K, an elastic scattering event can occur it initially occupied as it simultaneously enters a different
only for a conduction electron having an energy equal to allowed state, which, by the requisites of the Pauli ex-
the Fermi energy, because that is the only energy at which clusion principle, must necessarily be vacant before any
both filled and empty states simultaneously exist. The sit- occupation can occur. Quantum mechanically, one views
uation is not quite so restrictive at higher temperatures, the electron as being induced to enter a succession of ad-
where there is a statistical probability that nearby states jacent allowed states by electric-field-induced transitions.
are occupied or unoccupied over a range of energy at least This view can be contrasted sharply with the classical pic-
k B T in width in the neighborhood of the Fermi energy F ture of a continuous acceleration of the electron through
[Boltzmann constant k B = 1.38×10−23 J/K; T = absolute a continuous sequence of momentum vectors. For a metal
(Kelvin) temperature]. Nevertheless, electron scattering having a partly filled energy band, there indeed exists the
and, hence, the electrical resistivity are still severely re- requisite sequence of nearby unoccupied states adjacent
stricted in metals by the requirements of the Pauli exclu- to the filled states, so that transitions can be induced by
sion principle. the electric force acting on the conduction electrons.
For the specific case of electrical insulators, consider
the seemingly unlikely situation that there are precisely
F. Insulators
enough conduction electrons to fill every energy state up
After accepting the preceding reasons for the success of to the start of a given energy gap. Both the scattering of
quantum mechanics in describing the properties of the the conduction electrons and the electric-field excitation
long, mean-free path in metals, one then must feel quite of electrons to nearby empty states then are impossible
puzzled when confronted with the experimental fact that at 0 K and, for all practical purposes, nearly impossible
some crystals—even in the limit of high purity, low tem- even at nonzero temperatures for which k B T is far smaller
perature, and perfect periodicity—do not allow the free than the energy gap gap extending to the next-higher
and easy motion of electrons. These materials are called empty energy level. Although zero scattering might seem
“electrical insulators.” to constitute the ideal situation leading to a low (or even
What is it about insulators that causes them to differ so zero) resistivity, there nevertheless is no existing electrical
drastically from metals in the ability to conduct electrons? current in the presently described situation, nor can there
The answer is again quantum mechanical in origin. It is be any induced electric current. There is no existing cur-
hardly more abstruse than the answer to the question of rent because, in such a situation, all electrons occupy pairs
the existence of the long, mean-free path in metals. The of states representing equal magnitude but oppositely di-
solution of the Schrödinger equation for a periodic poten- rected momentum, so that there is no net charge transport.
tial energy in the way previously outlined yields a divi- There can be no induced current because the electric force
sion of the energy scale into interspersed allowable and cannot change the momentum of the electrons in any of
forbidden regions of energy. Over the allowable regions the filled states since there are no nearby unoccupied states
(the energy bands), very closely spaced discrete energy for the transitions. This is the required situation for an
eigenvalues are found, but within the forbidden regions insulator.
(the energy gaps), there are no such energy eigenvalues. In actuality, the seemingly unlikely situation of there
However, this property of the energy-eigenvalue spectrum being exactly enough electrons to fill a band, with none left
characteristic of the periodic potential is, of itself, insuf- over for the next-higher empty band, is not too unlikely.
ficient to explain the basic nature of insulators. As in the The reason for this is again a bit abstruse; in simplest
situation for electron scattering by impurities, allowance terms, an energy band is found to contain one allowed state
must be made for the consequences of the all-pervading per atom in the solid for each degree of freedom of the
Pauli exclusion principle. electron spin. If there are two degrees of freedom for the
P1: GNH Final Pages
Encyclopedia of Physical Science and Technology EN013A-626 July 26, 2001 19:40
electron spin (viz., two spin states, designated “up” and ductivity in metals, which is more or less linear with in-
“down”), then the requirement for a filled energy band is creasing temperature, is due to the increase in the thermal
simply that two conduction electrons be furnished by each vibrations of the atoms of the lattice at higher tempera-
atom in the crystal. Since the valence electrons become the tures. Thermal vibrations yield greater departures of the
conduction electrons in a solid, this is not an improbable atom array from perfact periodicity, thereby leading to
condition. more random scattering of the electrons and a consequent
decrease in the electron current.
The exponential increase in conductivity with tempera-
G. Semiconductors
ture described for excitation of electrons across an energy
The energy-level distribution for a solid can be quite a bit gap in an intrinsic semiconductor is paralleled in another
more complex than just described, since there is the pos- type of solid, called an extrinsic semiconductor. In ex-
sibility of overlapping energy bands (i.e., energy bands trinsic semiconductors, however, there is no band-to-band
unseparated by the usual energy gap). In addition, there excitation. Instead, the source of electrons for the empty
can be energy gaps that are quite small relative to labo- band is a doping concentration of impurities that have
ratory values of k B T . It can be readily appreciated that outer electrons at energies just below the empty band (so-
overlapping bands promote metallic conduction or else called donor impurities, or simply donors) or, alternately,
can lead to what is known as a semimetal, whereas narrow empty levels at energies just above the filled band (so-
energy gaps can lead to what is called a semiconductor. In called acceptor impurities, or simply acceptors). Semicon-
semiconductors, an increase in temperature leads to more duction then takes place by means of electrons donated to
excitation of electrons from the highest-energy filled band the empty band by donor impurities or, alternately, by elec-
across the gap to the adjacent empty band, thereby yielding tron holes created in the filled band as a result of electrons
electrons capable of conducting in what otherwise would accepted from that band by the acceptor impurities.
be an empty band. Those electrons excited across the gap
leave behind empty states (called electron holes) in the oth-
H. Superconductors
erwise filled band. These empty states also can promote
conduction in the following sense. Filled states nearby in It is another well-known experimental fact that a number
momentum and energy to the newly provided empty states of metals go into a state of zero resistance at very low
can undergo transitions to the empty states by means of temperatures, below the so-called transition temperature
electric-field excitation, all such transitions being impos- characteristic of the material in question. The concept of
sible in the 0 K equilibrium situation where all states in energy gaps likewise turns out to play an important role in
the band are filled. In this way, two distinct carrier types understanding this incredible phenomenon, although in a
are simultaneously provided—namely, electrons in an al- different way from that just described for semiconductors.
most empty band and electron holes in an almost filled The energy gap in the case of superconductors is attributed
band. to a condensation of the electrons carrying the charge into
The number of electrons excited across the energy gap so-called Cooper pairs, the binding energy of the pairs
increases nearly exponentially with increasing tempera- being attributed to indirect Coulomb-force-induced inter-
ture. To the extent that electron transport increases with the action between electrons as mediated by the intervening
number of carriers, the conductivity of the semiconduc- ion cores on the lattice sites in the metal.
tor increases almost exponentially with the temperature. Let us consider, for example, pairs of electrons passing
A material having the properties just described is called one another while traveling in opposite directions through
an intrinsic semiconductor. the lattice of ion cores surrounded by the attendant elec-
The prediction of the experimentally observed exponen- tron clouds. One electron exerts a force on the nearby
tial increase of conductivity with increasing temperature ionic lattice, which responds to that force. The resulting
in intrinsic semiconductors represents another triumph of disturbance in the periodic potential sensed by the second
quantum mechanics. In a classical description, there are electron of the pair can lead to an effective lowering in the
no energy gaps and, hence, no parallel to predictions of total energy relative to the situation of a rigid, nonrespond-
an exponentially increasing conductivity due to excitation ing lattice. The energy lowering is the greatest for pairs of
across an energy gap. electrons having equal magnitudes but oppositely directed
The exponential increase of conductivity with temper- momentum values, so Cooper pairs are characterized by
ature in semiconductors also contrasts markedly with the two electrons having this property. The lowering of energy
temperature-dependence of the conductivity of metals. In leads to a superconducting energy gap. The energy gap so
that case, the conductivity decreases (instead of increas- produced is quite small, so that very low temperatures
ing) with increasing temperature. This decrease of con- are usually required for the pairs to remain unbroken by
P1: GNH Final Pages
Encyclopedia of Physical Science and Technology EN013A-626 July 26, 2001 19:40
thermal fluctuations. The temperature range of supercon- systems. Heisenberg and Schrödinger played prominent
ductivity has been greatly extended, however, by the dis- roles in the development of this theory, which proves to be
covery of the so-called “high temperature superconduc- the only formulation adequate for the microscopic domain
tors” initiated with a series of compounds based on a of nature. Heisenberg stressed the importance of includ-
copper oxide matrix. ing physical observables and experimental observations
As the temperature is reduced through the supercon- of optical spectral lines in his formulation. Schrödinger
ducting transition, one speaks of the condensation of the based his work on a differential equation for the wave-
electron system into the paired state. It is evident that like behavior of small mass particles that includes the
electrons with oppositely directed momentum values will possibility of constructive and destructive interference of
become separated spatially in a short time, so the pairing waves presumably associated with the presence of a par-
process must be statistical in nature. Pairs must continu- ticle. Inherent in quantum mechanics is the germ concept
ally exchange partners (as required, e.g., in some forms of that accurate predictions of future trajectories of parti-
folk dancing). cles and the time evolution of a system involving one or
The dance of the conduction electrons, while main- more particles are at best statistical, involving a range of
taining this property of electron pairing, is a many-body possibilities specified exactly only in terms of precise val-
problem of some complexity. The wavefunction for the ues for the relative probabilities. This precludes the de-
entire system of electrons as a unit must be considered, terministic prediction of an exactly specified future path,
not merely the single-particle wavefunctions individually. regardless of how accurately the initial conditions of the
The establishment of a net electric current requires a suit- system are specified. Also inherent in the theory is the
able modification in the zero-current wavefunction for the impossibility of exactly measuring even the initial con-
system. ditions, such as initial position and initial linear momen-
The importance of electron pairing for electrical resis- tum, due to some inherent uncertainty in the value of one
tance is that the scattering of conduction electrons in the of these variables following determinations of the values
paired state will be ineffective in randomizing the net elec- of the other variables to some specified degree of preci-
tron momentum. Thus, there can be a zero resistivity state sion. The philosophical implications of an inherent uncer-
as long as Cooper pairs exist in the system. tainty in quantum mechanical predictions, contrasted with
Thermal fluctuations can break Cooper pairs to yield the absolute determinism inherent in classical mechani-
electrons in the normal state. The breaking of electron cal predictions, initially led many (including Einstein) to
pairs by this means is tantamount to an excitation across doubt whether the discipline had any fundamental merit
the energy gap. In contrast to the case of semiconductors, beyond that of being an elegant and elaborate computa-
the excitation across the energy gap in the present instance tion tool for obtaining predictions of a statistical nature. To
leads to an increase in the resistivity. Raising the temper- date, no better theory for the microscopic world has been
ature of a superconductor through the superconducting developed.
transition temperature means that the thermal fluctuations
become so large that essentially no Cooper pairs remain
in the metal to provide superconductivity. SEE ALSO THE FOLLOWING ARTICLES
Physics and Engineering,” Academic Press, New York; Dover Publi- Mandl, F. (1957). “Quantum Mechanics,” Butterworths Scientific Pub-
cations, New York. lications, Stoneham, MA.
Heisenberg, W. (1930). “Physical Principles of Quantum Theory,” Dover Pauli, W. (1973). “Wave Mechanics,” Pauli Lectures on Physics, Vol. 5.
Publications, New York. MIT Press, Cambridge, MA.
Ikenberry, E. (1962). “Quantum Mechanics for Mathematicians and Pauling, L., and Wilson, E. B., Jr. (1935). “Introduction to Quantum
Physicists,” Oxford University Press, New York. Mechanics,” McGraw-Hill, New York.
P1: ZBU/GRD P2: GQT Final Pages Qu: 00, 00, 00, 00
Encyclopedia of Physical Science and Technology EN013C-627 July 26, 2001 19:42
Quantum Optics
J. H. Eberly P. W. Milonni
University of Rochester Los Alamos National Laboratory
I. Introduction
II. Induced Atomic Coherence Effects
III. Radiation Coherence and Statistics
IV. Quantum Interactions and Correlations
V. Recent Developments
409
P1: ZBU/GRD P2: GQT Final Pages
Encyclopedia of Physical Science and Technology EN013C-627 July 26, 2001 19:42
emission and single photon absorption to the highly non- theory of light amplification and laser action; and mani-
linear processes induced by laser fields and has connec- festations of nonlinearity, bistability, and chaos in optical
tions with laser physics, nonlinear optics, quantum elec- contexts. A wide variety of quantum optical phenomena
tronics, quantum statistics, and quantum electrodynamics. that bear on one or another of these issues are now known
and widely studied.
I. INTRODUCTION
B. The A and B Coefficients of Einstein
A. Central Issues of Quantum Optics The second half of the twentieth century saw remarkable
Planck’s quantum, announced to the Prussian Academy on advances in our understanding of light, of its generation,
October 19, 1900, as a solution to the blackbody puzzle propagation, and detection. The laser is one manifesta-
reopened the wave–particle question in optics, a question tion of these advances. Lasers generally depend on the
that Fresnel and Young had settled in favor of waves al- quantum mechanical properties of atoms, molecules, and
most two centuries earlier. Planck’s quantum could not be solids because quantum properties determine the ways that
confined to light fields. Within three decades, all of particle matter absorbs light and emits light. Conversely, the prop-
mechanics had been quantized and rewritten in wave me- erties of laser beams have made optical studies of quantum
chanical form, and wave–particle duality was understood mechanics possible in a variety of new ways. It is this in-
to be both universal and probabilistic. terplay that has created the field of quantum optics since
Quantum optics is fundamentally concerned with co- about 1960.
herence and interference of both photons and atomic prob- From a different historical perspective, however, quan-
ability amplitudes. For example, it provides one of the tum optics is much older than the laser and even older than
main avenues at the present time for detailed study of quantum mechanics. The quantum concept first entered
wave–particle duality. The central issues of quantum op- physics in 1900 when Planck invented the light quantum
tics deal with light itself, with quantum mechanical states to help understand black body radiation. The understand-
of matter excited by light, and with the process of inter- ing of other quantum optical phenomena, such as the pho-
action of light and matter. toelectric effect, first explained by Einstein in 1905, was
Questions arising in the description of a single atom and well underway almost two decades before a quantum the-
its associated radiation field, as the atom makes a transition ory of mechanics was properly formulated in 1925 and
between two energy states and either emits or absorbs a 1926 by Heisenberg and Schrödinger. Indeed, these early
photon, are among the most central questions in quantum developments in quantum optics played an essential role
optics. Observations of individual optical emission and in the first quantum pictures of atomic matter given by
absorption events are possible, and the interpretation of Bohr and others in the period from 1913 to 1923.
such observations is at the heart of quantum theory. Only two parameters are needed to understand the in-
Various elements of these central considerations are to teraction of light with atomic (and molecular) matter, ac-
a degree independent of each other and are understood cording to Einstein. These two parameters are called A and
separately. Among these are (1) the probability that an B coefficients. These coefficients are important because
atomic electron occupies one or another state and the rate they control the rates of photon emission and absorption
at which these occupation probabilities change, (2) the processes in atoms, as follows. Let the probability that a
statistical nature of the photons emitted during transitions, given atom is in its nth energy level be written Pn . Sup-
(3) correlations between atom and photon states, (4) the pose there are photons present in the form of radiation
characteristic parameters that control the light–matter in- with spectral energy density (J/m3 Hz) denoted by u(ω).
teraction, and (5) the intrinsically quantum mechanical Then the rate at which the probability Pn changes is due
features of the atom’s response to the radiation. to three fundamental processes:
Quantum optics also concerns itself with problems that (dPn /dt)absorption of light = +Bu(ω)Pm (1a)
grow out of these central considerations and whose an-
swers can be expressed within the conceptual frame- (dPn /dt)spontaneous emission of light = −APn (1b)
work established by the central problem. Areas related (dPn /dt)stimulated emission of light = −Bu(ω)Pn . (1c)
in this way to the core of quantum optics deal, for exam-
ple, with correlated many-atom light–matter interactions; Here Pm is the probability that the atom is in a lower level,
near-resonant transitions among three and more states of the mth, which is related to the nth through the energy re-
an atom or molecule or solid; optical tests of quantum elec- lation E n − E m = hω, where h = h/2π and h is Planck’s
trodynamics and measurement theory; multiphoton pro- famous quantum constant. Einstein’s great insight was to
cesses; quantum limits to noise and linewidth; quantum include stimulated emission [Eq. (1c)] among the three
P1: ZBU/GRD P2: GQT Final Pages
Encyclopedia of Physical Science and Technology EN013C-627 July 26, 2001 19:42
elementary processes, in effect, to recognize that an atom In these formulas we have separated the factor 1/4π ε0 =
in an upper energy state could be encouraged by the pres- 8.9874 × 109 N m2 /C2 to display A and B in atomic units
ence of photons [the existence of u(ω)] to hasten the rate as well as SI units, and D denotes the quantum mechan-
at which it would drop down to a lower state. ical “dipole matrix element” associated with the m → n
The three contributions to the rate of change of Pn transition under consideration.
shown in Eqs. (1a–c) can be added to make an overall The values of these important coefficients can be ob-
single equation for the total rate of change of Pn : tained for transitions of interest in quantum optics by as-
suming that the dipole matrix element is approximately
dPn /dt = +Bu(ω)Pm − APn − Bu(ω)Pn . (2a)
equal to the product of the electron’s charge and a “typi-
Einstein applied this equation to an examination of cal” electron displacement from the nucleus. Thus, we
blackbody light. He showed that the steady-state solution take ω = 2π ν and e and h from Table I and D = er ,
with r equal to about 1 to 3 Å (1–3 × 10−10 m). In this
Pm /Pn = 1 + A/Bu(ω) (2b)
case the values are A ≈ 108 s−1 and B ≈ 1022 m2 /J s2 ,
implies the validity of Planck’s formula for u(ω): respectively.
hω3 1 The advantage of Einstein’s approach, and the reason
u(ω, T ) = , (3) it still provides one basis for understanding light–matter
π c exp{hω/kT } − 1
2 3
interactions, is that it breaks the interaction process into its
and the value of the prefactor is just the ratio A/B: separate elements, as identified above in Eqs. (1a–c). To
A hω3 repeat this important identification, these processes are (a)
= 2 3, (4) absorption, (b) spontaneous emission, and (c) stimulated
B π c
emission.
where k is Boltzmann’s constant and T the temperature
However, it must be pointed out that Einstein’s formu-
in degrees Kelvin. Table I contains the values of physical
las are not universally valid, and Eqs. (1) and (2) can be
constants used in evaluating various radiation formulas.
seriously misleading in some cases, particularly for laser
For typical optical radiation, the value of the funda-
light. Laser light typically has a very high spectral energy
mental ratio A/B is approximately 10−14 J/m3 Hz. The
density u(ω). In this case different formulas and equations,
corresponding intensity, namely c A/B, is approximately
and even entirely different concepts with their origins in
3 × 10−6 J/m2 , or 6π × 10−6 W/m2 per Hz of bandwidth.
wave mechanics, may be required.
The value of the spectral intensity of thermal radiation
A large body of experimental evidence has accumu-
is usually many orders of magnitude lower than this be-
lated since 1960 showing that many aspects of the inter-
cause of the second factor in Eq. (3). At optical wave-
action between light and matter depend on electric radia-
lengths, the second factor is much smaller than one for all
tion field strength E directly, not only on energy density
temperatures less than about 5000 K.
u ≈ E 2 . Just those aspects of the light–matter interaction
After the development of a fully quantum mechanical
that depend directly on E also depend directly on quantum
theory of light by Dirac in 1927, it was possible to give
mechanical state amplitudes ψ, not only on their asso-
expressions for A and B separately:
ciated probabilities |ψ|2 . Issues of coherence and inter-
1 4D 2 ω3 ference of both radiation fields and probability ampli-
A= (5)
4πε0 3 hc3 tudes are fundamental to these studies. It is principally
the experiments and theories that deal with light and mat-
1 4π 2 D 2 ter in this domain that make up the field of quantum
B= . (6) optics.
4πε0 3 h 2
level probability between levels 1 and 2. They have no For many purposes in quantum optics the semiclassi-
steady state. Figure 2 shows graphs of P2 (t). One al- cal dipole variables u and v, and the atomic inversion
ready sees, therefore, that the quantum amplitude equa- w, are the primary atomic variables. They obey equa-
tions [Eqs. (12a–d)] make strikingly different predictions tions which are equivalent to Eq. (8) and take the place
from the two-level equations of Einstein [recall the steady- of Schrödinger’s equation for the level’s probability am-
state solution Eq. (2b)]. plitudes a1 and a2 (or C1 and C2 ):
Within the RWA, the dynamics remain unitary or prob-
du/dt = −
v (19a)
ability conserving: P1 + P2 = 1 for all values of t. As a
result, the compact version [Eq. (12)] of Schrödinger’s dv/dt =
u + χ w (19b)
equation remains valid even for very strong interactions
dw/dt = −χ v. (19c)
between the atom and the radiation. This is important
when considering the effects on atoms of very intense All of these considerations assume that the field is mono-
laser fields. The limits of validity of Eq. (12) are deter- chromatic or nearly so.
P1: ZBU/GRD P2: GQT Final Pages
Encyclopedia of Physical Science and Technology EN013C-627 July 26, 2001 19:42
Although the field E(r, t) is not considered an oper- transitions are sufficiently restrictive, and optical reso-
ator in the semiclassical formulation, it can still have a nances can be sufficiently sharp that very good approxi-
dynamical character, which means it is not prescribed in mations to two-level atoms can be found in nature. A good
advance, but obeys its own equation of motion. It is natu- example is found in a pair of levels in atomic sodium, and
rally taken to obey the Maxwell wave equation, with the they have been used in quantum optical experiments.
usual source term µ0 d 2 /dt 2 P(z, t). In the semiclassical In sodium the nucleus has spin I = 3/2, and the low-
theory P(z, t) = N d, where N is the density of two-level est electronic energy level is 3S1/2 with hyperfine splitting
atoms in the source volume and d the average dipole mo- (F = 1 and F = 2). The first excited levels 3P1/2 and 3P3/2
ment of a single atom, already calculated in Eq. (16). are responsible for the well-known strong sodium D lines
As Eq. (16) indicates, d is quasi-monochromatic, es- of Fraunhofer in the yellow region of the optical spectrum
sentially because a two-level atom is characterized by a at wavelengths 589.0 nm and 589.6 nm. The “two-level”
single transition frequency. Therefore the E that it gener- transition is between the m F = +2 magnetic sublevel of
ates may also be regarded as monochromatic or nearly so, F = 2 of the ground state and the m F = +3 magnetic sub-
as Eq. (10) assumed. This internal consistency is an im- level of F = 3 in the 3P3/2 excited state. Circularly po-
portant consideration in the semiclassical theory. In many larized dye laser light can be tuned within the 15-MHz
cases it is also suitable to regard the field as having a def- natural linewidth of the upper level, and the
m = +1
inite direction of propagation, say z, and to neglect its selection rule for circular polarization prevents excitation
dependence on x and y (plane wave approximation): of the other magnetic sublevels.
Spontaneous decay from the upper state, which obeys
E(z, t) = ê(z, t)e−i(ωt−kz) + c.c., (20)
no resonance condition, is also restricted in this example.
where k = ω/c and the amplitude or envelope function The final state of spontaneous decay could, in princi-
(z, t) is a complex generalization of 0 (t) in Eq. (10). It ple, have m F = 4, 3, 2. However, in sodium there are no
obeys a “reduced” wave equation in variables z and t: m F = 4 or 3 states below the 3P3/2 state, and only one
m F = 2 state, namely the one from which the excitation
[∂/∂z + ∂/∂ct](z, t) = i(π N dω/4πε0 c)[u − iv].
process began. Thus, the state of the sodium atom is very
(21)
effectively constrained to this two-state subset out of the
The reduced wave equation Eq. (21) and Eqs. (19) for u, v, infinitely many quantum states of the atom.
and w are called the semiclassical coupled Maxwell–Bloch
equations.
Inspection of the semiclassical Eqs. (19) and (21) re- II. INDUCED ATOMIC COHERENCE
veals their major flaw. One easily sees that the semiclas- EFFECTS
sical approach to radiation theory does not include the
process of spontaneous emission. A completely excited The ability of an atomic system to have a coherent dipole
atom will not emit a photon in this theory. That is, if the moment during an extended interaction with a radiation
atom is in its excited state, then a2 = 1 and a1 = 0, so field is a necessary condition for a wide variety of effects
w = 1 and u = v = 0. Thus, according to Eq. (21) no field associated with quantum optical resonance. The dipole
can be generated. By the same token, if = 0 then χ = 0 moment should be coherent in the sense that it retains a
and dw/dt = 0, and no evolution toward the ground state stable phase relationship with the radiation field. Long-
can occur. The flaw in the semiclassical Maxwell–Bloch term phase memory may be difficult to achieve, for exam-
approach arises from the assumption that all of the dynam- ple, because spontaneous emission and collisions destroy
ics can be reduced to a consideration of average values, phase memory at a rate that is typically in the range 108 s−1
that is, averages of dipole moment and inversion and field or much greater. Coherence is also lost if the radiation
strength. In reality, fluctuations about average values are bandwidth is too broad. The importance of dipole coher-
an important ingredient of quantum theory and essential ence effects shows that light–matter interactions do de-
for spontaneous emission. pend most fundamentally on the dipole moment and elec-
Spontaneous emission does not play a dominant tric field strength, not on radiation intensity and the B
role in many quantum optical processes, particularly coefficient.
those involving strong radiation fields. The semiclassical
Maxwell–Bloch equations give an entirely satisfactory ex-
A. p Pulses and Pulse Area
planation of these effects, and the next sections describe
some of them. The solution for the level probabilities given in Eq. (14)
In nature there are no actual two-level atoms, of course. is the single most important example in quantum optics
However, the selection rules for allowed optical dipole of the coherent response of an atom to a monochromatic
P1: ZBU/GRD P2: GQT Final Pages
Encyclopedia of Physical Science and Technology EN013C-627 July 26, 2001 19:42
radiation field. “Coherence” in this context has several then 0 is not too rapidly varying if τp is long enough,
connected meanings, all associated with the well-phased namely if 1/τp ω. Pulse lengths in the range 1 ns ≥ τp
steady oscillation of the probabilities considered as a ≥10 fs are of interest and are compatible with the RWA
function of time. This time dependence was shown in [1 fs (femtosecond) = 10−15 s].
Fig. 2 for several values of the parameters
and χ . The The significance of φ(t) is evident in Eq. (24). The term
significance of the Rabi frequency χ is clear—it is the φ is just the angle of rotation of the on-resonance Bloch
frequency at which the inversion oscillates when the atom vector S = [u, v, w] during the passage of the light pulse.
and the radiation are at exact resonance, when
= 0, since A light pulse with φ = π is called a π pulse, that is, a
pulse that rotates the initial vector [0, 0, −1], which points
w(t) = −cos χ t. (22)
down, through 180◦ to the final vector [0, 0, +1], which
Fruitful connections to the spin vector formalism of points up. Thus, a π pulse completely inverts the atomic
magnetic resonance physics are obtained by regarding the probability, taking the atom from the ground state to the
triplet [u, v, w] as a vector S. Equations (19) for u, v, and excited state. A 2π pulse is one that returns the atom via
w can then be written in compact vector form: a 360◦ rotation of its Bloch vector to its initial state, after
passing through the excited state. The remarkable nature of
dS/dt = Q × S, (23) this rotation is not so much that an inversion of the atomic
where Q is the vector of length with components state is possible, but that it can be done fully coherently and
[−χ , 0,
]. The vector Q can be called the torque vec- without regard for the pulse shape. Only the total integral
tor for S, which is variously called the pseudo-spin vector, of 0 (t) is significant.
the atomic coherence vector, and the optical Bloch vector. The physics behind Bloch vector rotation is essentially
This vector formulation in Eq. (23) of Eqs. (19) shows the same in magnetic resonance, but perhaps less remark-
that the evolution of the two-level atom in the presence able since the Bloch vector in that case is a physically
of radiation is simply a rotation in a three-dimensional “real” magnetic moment. In optical resonance there is no
space. The space is only mathematical in quantum optics “real” electric moment vector whose Cartesian compo-
because the components of the optical Bloch vector are not nents can be identified with [u, v, w]. Early evidence of
the components of a single real physical vector, whereas in the response of the optical Bloch vector to coherent pulses
magnetic resonance they are the components of a real mag- was obtained in experiments of Tang, Gibbs, Slusher,
netic moment. In both cases the nature of the torque equa- Brewer, and others (see Sections II.B and II.C), and fur-
tion leads to a useful conservation law: d/dt(S · S) = 0. ther experiments probing these properties continue to be
That is, the length of the Bloch vector is constant. In of interest.
the u, v, w notation this means u 2 + v 2 + w 2 = 1, and it
implies that the vector [u, v, w] traces out a path on a B. Photon Echoes
unit sphere as the two-level atom changes its quantum
state. Photon echoes are an example of spontaneous recovery
Further consideration of the on-resonant atoms (
→ 0 of a physical property that has been dephased after many
and → χ ) gives information about the interaction of relaxation times have elapsed. In the case of echoes the
atoms with (nonmonochromatic) pulsed fields. According physical property is the macroscopic polarization of a sam-
to Eq. (19a) u(t) can be neglected if
= 0 and a new form ple of two-level atoms. Recovery of the polarization means
of solutions to Eqs. (19b and c) follows immediately: recovery of the ability to emit radiation, and the signature
of a photon echo is the appearance of a burst of radiation
v(t) = −sin φ(t) (24a) from a long-quiescent sample of atoms. The burst occurs
w(t) = −cos φ(t), (24b) at a precisely predictable time, not randomly, and is due to
a hidden long-term memory. The echo principle was dis-
where φ(t) is called the “area” of the electric field be- covered and spin echoes were observed by Hahn in 1950
cause it is related to the time integral of the electric field in magnetic resonance experiments. Photon echoes were
envelope: first observed by Hartmann and co-workers in 1965.
t t Photon echoes are possible when a sample of atoms
φ(t) = χ (t ) dt = (d/ h) 0 (t ) dt . (25) is characterized by a broad distribution g(
) of detun-
0 0 ings. This may occur in a gas, for example, because
In the monochromatic limit when 0 = constant, then of Doppler broadening or in a solid because of crys-
φ(t) → χt. talline inhomogeneities. For the latter reason it is said that
Recall that the rotating wave approximation (RWA) will g(
) indicates the presence of inhomogeneous broaden-
not permit 0 to vary too rapidly. If τp is the pulse length, ing. The Maxwellian distribution of velocities in a gas is
P1: ZBU/GRD P2: GQT Final Pages
Encyclopedia of Physical Science and Technology EN013C-627 July 26, 2001 19:42
equivalent to a Maxwellian distribution of detunings since (u − iv)T = ie−i
T . Only the sum of these u and v values
each atom’s Doppler shift is proportional to its velocity. If is zero due to their different
values. The torque vector
the number of atoms in the sample is N , then the fraction describing the second pulse is Q = [−χ , 0,
], which can
with detuning
is given by Ng(
) d
, where again be approximated by [−χ , 0, 0] if χ
. The effect
1 of the second pulse is again to rotate the Bloch vectors
e(−1/2)[(
−
) /(δωD ) ] .
¯ 2 2
g(
) = √ (26) about the u axis. The ideal second pulse is a π pulse, in
(2π)δωD
which case u → u, v → −v and w → −w, that is, a rota-
Here
¯ is the average detuning and δωD is the Doppler tion by 180◦ . Thus, for times t ≥ T after the π pulse the
linewidth, δωD = ωL (kT /mc2 )1/2 . u − v components are
The Bloch vector picture is well suited for describ-
ing photon echoes. Assume that the Bloch vectors for (u − iv)t = (u − iv)T e−i
(t−T ) ,
a collection of N atoms all lie in the equatorial (u − v)
plane of the unit sphere along the negative v axis, that where T signifies the rotated coherence vector immedi-
is, S = [0, −1, 0]. This arrangement can be accomplished ately following the π pulse at time T :
by excitation from the ground state [0, 0, −1] with a strong
π/2 pulse for which Q = [−χ , 0,
] ≈ [−χ , 0, 0] if (u − iv)T = (u + iv)T = −iei
T .
χ
. The total Bloch vector is then S N = [U, V,
W ] = [0, −N , 0], which corresponds to a macroscopic The remarkable feature of a π pulse is that it accom-
dipole moment of magnitude N d. After this excitation plishes an effective reversal of time. Following it the Bloch
pulse the sample begins to radiate coherently at a rate ap- vectors do not continue to dephase, but begin to rephase:
propriate to the dipole moment N d. However, following
the excitation pulse we again have χ = 0 and Q = (u − iv)t = −iei
T e−i
(t−T )
[0, 0,
], and according to Eq. (23) the individual Bloch
vectors immediately begin to process freely about the w = −ie−i
(t−2T ) . (28)
axis (in the u − v plane) at rates depending on their individ-
ual detunings
. Specifically, (u − iv)t = (u − iv)0 e−i
t Thus, at the exact time t = 2T , the individual u’s and v’s all
for an atom with detuning
, if χ = 0. rephase perfectly: [u, v, w] = [0, 1, 0]. Their Bloch vec-
As a consequence, the total Bloch vector will rapidly tors are merely rotated 180◦ from their positions after the
shrink to zero in a time δt ≈ 1/δω, where δω is the original π/2 pulse, and they again constitute a macro-
spread in angular velocities in the N atom collection. scopic dipole moment S N = [0, N , 0], and therefore the
That is, the coherent sum of N dipoles rapidly dephases collection will begin to radiate again.
and [0, N , 0] → [0, 0, 0], with the result that the sample Because of the timing of this radiation burst, exactly
quickly stops radiating. This is called free precession de- as long (T ) after the π pulse as the π pulse was after the
cay, or free induction decay after the similar effect in original π/2 pulse, it is natural to call the signal a photon
magnetic resonance, because the decay is due only to the echo. Because of the separation by the intervals T and
fact that the individual dipole components u − iv get out 2T from the π and π/2 excitation pulses, the observation
of phase with each other due to their different preces- of an echo can be in practice an observation that is very
sion speeds, not because any individual dipole moment is noise free. Following the echo pulse, the coherence vectors
decaying. again immediately begin to dephase, but they can again be
If the distribution of
’s is determined by the Doppler rephased using the same method, and a sequence of echoes
effect, as in Eq. (26), then since (u − iv)0 = i, one finds can be arranged.
Because of collisions with other atoms, the individual u
U − i V = i N d
g(
)e−i
t and v values will actually get smaller during the course of
an echo experiment, independent of their
values. Thus,
2 the rephased Bloch vector is not quite as large as the origi-
= i N e−i
t exp − 12 δωD2 t .
¯
(27)
nal one. One of the possible uses of an echo experiment is
The decay is very rapid if the Doppler width is large. to measure the rate of collisional decay of u and v, say as
Typically δωD ≈ 109 to 1010 s−1 , thus, within a few tenths a function of gas pressure, by measuring the echo inten-
of a nanosecond U − i V → 0 and radiation ceases. sity in a sequence of experiments with different values of
The echo method consists of applying a second pulse T since the echo intensities will get smaller as collisions
to the collection of atoms after U and V have vanished, reduce the length of the rephased Bloch vector. The way
that is, at some time T (δωD )−1 . Each single atom in which Eq. (23) is rewritten to account for collisions is
with its detuning
, after the time interval T , still has taken up in Section II.D.
P1: ZBU/GRD P2: GQT Final Pages
Encyclopedia of Physical Science and Technology EN013C-627 July 26, 2001 19:42
C. Self-Induced Transparency and Short propagation this would correspond to an index of refrac-
Pulse Propagation tion n ≈ 1000. All of these remarkable features were dis-
covered by McCall and Hahn in the 1960s and labeled by
The polarization of a dielectric medium of two-level
the term self-induced transparency to indicate that a light
atoms, such as any atomic vapor excited near to resonance,
pulse could manipulate the atoms in a dielectric in such a
is linearly related to the incident electric field strength
way that the atoms cannot absorb any of the light.
0 at low-light intensities but becomes nonlinear in the
Self-induced transparency is an example of soliton be-
strong-field, short-pulse regime. This is most evident in
havior. The nonlinearity of the coupled Maxwell–Bloch
Eq. (18), where the sine and cosine functions contain all
equations opposes the dispersive character of normal light
powers of χ = 2d0 / h. These nonlinearities have striking
transmission in a polarizable medium to permit a steady
consequences for optical pulse propagation.
nondispersing solitary wave (or soliton) [Eq. (31)] to
If a 2π pulse is injected into a collection of on-resonant
propagate unchanged. This happens only for fields suffi-
two-level atoms, it cannot give any energy to them because
ciently strong that the 2π pusle condition can be met. The
after its passage the atoms have been dynamically forced
Maxwell–Bloch equations can in many cases be shown
back into their initial state. Thus, a 2π pulse has a certain
to be equivalent to the sine–Gordon soliton equation or
energy stability and so does a 4π pulse and every 2nπ
generalizations of it.
pulse for the same reason. However, all other pulses are
obviously not stable since they must give up some energy
to the atoms if they do not rotate the atomic choherence D. Relaxation
vectors all the way back to their initial positions.
Both photon echoes and self-induced transparency
The effect on the injected pulse due to atomic absorp-
demonstrate the existence of optical phenomena depend-
tions is given by the Maxwell equation (21). When there is
ing on χ ≈ , and not on 2 , that is, on the Maxwell–Bloch
a broad distribution g(
) of detunings among the atoms,
equations and not on the Einstein rate equations. How
then Eq. (21) leads to a so-called area theorem, a nonlinear
are these two approaches to light–matter interactions con-
propagation equation for a pulse of area φ:
nected? To answer this question it is necessary to extend
dφ(z)/dz = − 12 N σ sin φ(z). (29) the scope of the Bloch equations and include the effects
of line-broadening and relaxation processes.
Here φ(z) means the total pulse area χ (z, t ) dt , where The upper levels of any system have a finite lifetime, and
the integral extends over the duration of the pulse. The so |a2 |2 cannot oscillate indefinitely as Eq. (14b) implies,
solution is tan φ(z)/2 = e−αz/2 , and the attenuation coef- but must relax to zero. This is most fundamentally due
ficient is α = N σ , where N is the density of atoms and σ to the possibility of spontaneous emission of a photon,
is the inhomogeneous absorption cross section: accompanied by a transition in the system to the lower
level. Such transitions occur at the rate A, as in Einstein’s
σ = g(
)σa (
) d
, (30) equation (1b).
Other relaxation processes also occur. For example,
where σa (
) is the single-atom cross section (see Sec- collisions with other atoms cause unpredictable changes
tion II.E). For very weak pusles with φ π, one can re- in the state of a given two-level atom. These collisional
place sin φ(z) by φ(z) and recover from Eq. (29) the usual changes typically affect the dipole coherence of the two-
linear law for pulse propagation: dφ(z)/dz = −(α/2)φ(z), level atom instead of the level probabilities, that is, they
which predicts exponential attenuation of the pulse: affect u and v instead of w. We suppose that the rate
φ(z) = φ(0) exp[−αz/2]. The factor of 12 arises because of such processes is γ . Although γ does not appear in
φ is proportional to the electric field amplitude, not the Einstein’s rate equations, its existence is implied. This
intensity. will be clarified later.
Remarkably, one of the “magic” pulses with φ = 2π n, The fundamental equations of optical resonance,
which does not lose energy while propagating, also pre- Eqs. (19), can be rewritten to include these relaxations
serves its shape. This is the 2π pulse, for which there is as follows:
a constant-shape solution of the reduced Maxwell–Bloch
equations: du/dt = −
v − (γ + A/2)u (32a)
are said to be coherent. In the absence of the radiation ± i
), and both ρ21 and ρ12 decay rapidly. If in addition
field (χ = 0), the on-resonance solutions are completely A |γ + A/2 ± i
|, then ρ22 and ρ11 change relatively
nonoscillatory: slowly, and so ρ21 and ρ12 rapidly adjust themselves to the
small quasi-steady values:
u = u 0 e−(γ +A/2)t
−(γ +A/2)t
ρ21 ≈ −(i/2)χ [γ + A/2 + i
]−1 (ρ22 − ρ11 ) (35a)
v = v0 e
ρ12 ≈ (i/2)χ [γ + A/2 − i
]−1 (ρ22 − ρ11 ). (35b)
w = −1 + (w0 + 1)e−At , all for χ = 0.
These solutions show that the off-diagonal elements of the
These solutions are said to be incoherent. In each case “co- density matrix can be determined from constant numerical
herence” refers to the existence of oscillations with a well- factors and combinations of the diagonal density matrix
defined period and phase. In Bloch’s notation, the relax- elements. They can then be eliminated from Eqs. (34c
ation rates are written γ + A/2 = 1/T2 , where A = 1/T1 , and d).
and T1 and T2 are called the “longitudinal” and “trans- This procedure is referred to as adiabatic elimination of
verse” rates of relaxation. off-diagonal coherence because the remaining equations
Relaxation theory is a part of statistical physics, and for ρ11 and ρ22 no longer exhibit coherence. That is, they
in quantum theory statistical properties of atoms and no longer have oscillatory solutions. The term adiabatic is
fields are usually discussed with the aid of the quantum appropriate in the sense that ρ21 and ρ12 are entrained by
mechanical density matrix ρ. The density matrix for the slower ρ11 and ρ22 . The reverse procedure, the elimi-
a two-level atom has four elements, ρ11 , ρ12 , ρ21 , and nation of ρ11 and ρ22 in favor of ρ21 and ρ12 , is not possible
ρ22 . These are related to u, v, and w by the equations because the reverse inequality |γ + A/2 ± i
| A is not
u = ρ12 + ρ21 , v = −i(ρ12 − ρ21 ), and w = ρ22 − ρ11 , possible.
which have the inverse forms: These adiabatic off-diagonal solutions, once inserted
into Eqs. (34) lead to an equation for the slowly changing
ρ12 = 12 (u + iv) = a1 a2∗ (33a)
∗
ρ22 as follows:
ρ21 = 2 (u − iv) = a2 a1
1
(33b)
dρ22 1 2 γ + A/2
= −Aρ22 − χ 2 (ρ22 − ρ11 ).
ρ11 = 12 (1 − w) = a1 a1∗ (33c) dt 2
+ (γ + A/2)2
(36)
ρ22 = 12 (1 + w) = a2 a2∗ . (33d)
Recall ρ22 = |a2 |2 is the probability that the atom is in
Here the brackets . . . are understood to refer to an aver- its upper state, and thus plays the same role as Pn in
age over an ensemble of parameters and variables inacces- the Einstein equation (2a). Similarly ρ11 plays the role
sible to direct and deterministic evaluation, such as the ini- here of Pm there. By comparing Eqs. (2a) and (36) one
tial positions and velocities of all the atoms in a collection sees that they are identical in form and content if one
that may collide with and disturb a typical two-level atom. identifies the coefficient of ρ11 in Eq. (36) with the co-
The equations given in Eqs. (19) for u, v, and w can also efficient of Pm in Eq. (2a). In other words, the density
be obtained from ρ via the Liouville equation of quantum matrix equations [Eqs. (34a–d)] of quantum optics con-
statistical mechanics: i hdρ/dt = [H, ρ]. The equations tain Einstein’s equation in the weak-field and adiabatic
for the density matrix elements [Eqs. (33)] are limits χ |γ + A/2 ±
| and A |γ + A/2 ± i
|.
The B coefficient can be derived in this limit (see
dρ21 /dt = −(γ + A/2 + i
)ρ21 Section II.E) if one properly interprets the factor
− (iχ /2)(ρ22 − ρ11 ) (34a) 2
χ (γ + A/2)/[
2 + (γ + A/2)2 ].
1 2
Equation (37a) is Poynting’s theorem for a “one- (Rabi frequency). It holds in situations much more general
dimensional” medium. It expresses the conservation of than the present example and allows rapid and accurate
photon flux in terms of atomic excitations. However, in the translation of formulas from one domain to the other.
absence of the resonant two-level atoms (N = 0), Eqs. (37) With the use of χ = 2d/h and I = 2cε0 2 , Eq. (39a)
predicts (z, t) = 0 (z−ct), which means that the photon becomes
flux has the constant value (z 0 ) at every point z = z 0 + ct D2 I β
that travels with the pulse. This is contradictory to ordinary abs. rate = . (39b)
3ε0 ch 2
2 + β 2
experience in two respects. The medium that is host to the The introduction of the new dipole parameter D here is
two-level atoms (e.g., other gas atoms, a solvent, or a crys- based on the assumption that all orientations of the atomic
tal lattice) always causes both dispersion and absorption. dipole matrix element d21 are possible (in case, e.g., all
They can be taken into account by modifying Eq. (37) magnetic sublevels of the main levels 1 and 2 are degener-
slightly: ate). Then d 2 ≡ |e · d21 |2 must be replaced by its spherical
[∂/∂z + κ + ∂/∂vg t](z, t) = −N ∂P2 /∂t, (37b) average, that is, by D 2 /3, where D 2 = |d21 · d12 |. We as-
sume this is appropriate in the remainder of Section II.
where vg is the group velocity for light pulses in the The atomic absorption cross section σa is, by definition,
medium and κ its linear attenuation coefficient. the ratio of the rate of energy absorption, hω21 × (abs.
This form of the flux equation implies a similar alter- rate), to the energy flux (intensity) I of the photons being
ation of Maxwell’s equation (21): absorbed. From Eq. (39b) this ratio is
[∂/∂z+κ/2 + ∂/∂vg t] = i(N dω/4ε0 c)[u − iv]. (38) D2ω β
σa (ω; ω21 ) = , (40)
This form of Maxwell’s equation is useful in describing 3ε0 hc (ω21 − ω)2 + β 2
the elements of laser theory (see Section II.G). where γ = β − A/2 is the specifically collisional con-
tribution to the halfwidth. Formula (40) shows that a
E. Cross Section and the B Coefficient quantum optical approach to light absorption through the
How does one use quantum optical expressions to obtain weak field limit of Eqs. (34) gives conventional results
basic spectroscopic formulas, such as for the absorption of atomic spectroscopy. If Doppler broadening is present,
cross section and B coefficient? That is, given expressions then Eq. (40) must be integrated over the Doppler distri-
derived from Eqs. (34), which are based on the Rabi fre- bution of detunings Eq. (26) as was done in Eq. (30).
quency χ instead of the more familiar radiation intensity Lineshape plays a key role in understanding the rela-
I or spectral energy density u(ω), how does one recover a tionship of Eqs. (39) to the Einstein expression (1a) re-
cross section, for example? Consider the quantum optical lating absorption rate to the B coefficient. The absorption
derivation of the Einstein formula in Eq. (36). The transi- cross section can be written σa = σt Sa (ω; ω21 ), where σt
tion rate (absorption rate or stimulated emission rate) can is the total frequency-integrated cross section:
be identified readily. With the abbreviation β = γ + A/2, σt = π D 2 ω21 3ε0 hc (41a)
one obtains: and Sa is the atomic lineshape
1 β
abs. rate = χ 2 2 . (39a) β/π
2
+ β2 Sa (ω; ω21 ) = , (41b)
(ω21 − ω)2 + β 2
The absorption rate is a peaked function of
whose value
drops to 12 of the maximum value at
= ± β. Thus, β is which is normalized according to dω Sa = 1, and in
called the halfwidth at halfmaximum (HWHM) of the ab- this case has a Lorentzian shape. A lineshape also ex-
sorption lineshape. This shows that relaxation leads to ists for the radiation field and is expressed by u(ω),
line broadening, and since β applies equally and individ- the spectral energy density function. One connects u(ω)
ually to every atom, it is an example of a homogeneous with I by the frequency integral c u(ω ) dω = I . In the
linewidth. Recall (Section I.B) that inhomogeneous broad- monochromatic case cu(ω ) takes the idealized singular
ening is not a characteristic of individual atoms but of a form cu(ω ) = I δ(ω − ω ), and Eq. (39b) is the result for
collection of them. From expression (39a) at exact reso- the absorption rate.
nance one obtains the relationship: In the general nonmonochromatic case, the expression
resonant transition rate = χ 2 /full linewidth. for absorption rate involves the integrated overlap of u(ω)
and the atomic lineshape function:
This is the single most concise relationship between the
cσ1
parameters of incoherent optical physics (transition rate abs. rate = Sa (ω ; ω21 )u(ω ) dω . (42)
and linewidth) and the central parameter of coherence hω21
P1: ZBU/GRD P2: GQT Final Pages
Encyclopedia of Physical Science and Technology EN013C-627 July 26, 2001 19:42
The intensity of the light beam associated with this elec- = I¯ + I¯ + 4cε0
tric field is given by:
× Re[ε ∗ (t)ε(t + δ/c)e−iωδ/c ], (64)
I (r, t) = cε0 E(r, t)2
where Re means “real part” and the angular brackets in-
= cε0 [2|(r, t)|2 + 2 (r, t)e−2iωt + c.c.], (62) dicate the time average. If δ = 0, then I¯a+b = 4 I¯, because
where c.c. means complex conjugate. In principle I is the two beams interfere fully constructively. The quantity
rapidly time dependent, but the factors e±2iωt oscillate (δ/c) = ε∗ (t)ε(t + δ/c)e−iωδ/c is called the mutual co-
too rapidly to be observed by any realistic detector. That herence function of the electric field, and the appearance
is, a photodetector can respond only to the average of of fringes at the output plane of the interferometer is due
Eq. (62) over a finite interval, say of length T beginning to the variation of with δ. From the factor e−iωδ/c it is
at t, where T 2π/ω. The e±2iωt terms average to zero clear that a fringe shift (a shift from one maximum of I¯a+b
and one obtains: to the next) corresponds to a shift of δ by 2πc/ω = λ.
T The mutual coherence function is conveniently normal-
I¯T (t) = (1/T ) I(t + t ) dt = 2cε0 |(t)|2 . (63) ized to its maximum value which occurs when δ = 0, and
0 the normalized function γ (δ/c) is called the complex de-
The overbar thus denotes a coarse-grained average value gree of coherence. That is,
that is not sensitive to variations on the scale of a few
optical periods. We have dropped the r dependence as a γ (δ/c) = ε(t)ε ∗ (t + δ/c)e+iωδ/c /ε(t)ε ∗ (t) (65)
simplification. If the beam is steady and the averaging
interval T is long enough, then both the length T and the and the output intensity can be written:
beginning value I (t) are unimportant, and I¯T (t) is also
independent of t. This is the property of a class of fields I¯a+b = 2 I¯[1 + Re γ (δ/c)], (66)
called stationary. Obviously even nonstationary fields can
where Re γ must satisfy −1 Re γ 1.
be considered stationary in the sense of a long T average,
It is common experience that if the path difference δ is
and we will adopt stationarity as a simplification of our
made too great in the interferometer, the fringes are lost
discussion and write I¯T (t) simply as I¯.
and the output intensity is simply the sum of the intensities
Consider now the operation of a Michelson interferom-
in the two beams. This can be accounted for by introducing
eter (Fig. 6). An incident beam is split into two beams
a coherence time τ for the light by writing
a and b, and after traveling different path lengths, say
and + δ, the beams are recombined and the beam inten- γ (δ/c) = γ (0)e−|δ|/cτ e+iωδ/c . (67)
sity Ia+b = cε0 [E a (t) + E b (t)]2 is measured. If the beam
splitter sends equal beams with field strength E(t) and in- This representation for γ has the correct behavior since
tensity I¯ into each path and there is no absorption during it vanishes whenever δ is large enough, specifically
the propagation, then one measures whenever δ cτ . For obvious reasons, cτ is called the
P1: ZBU/GRD P2: GQT Final Pages
Encyclopedia of Physical Science and Technology EN013C-627 July 26, 2001 19:42
coherence length of the light. This does not explain the = 1.6 fs and a coherence length cτ = 0.5 µm that are six
fundamental origin of the coherence time τ , but the lack orders of magnitude smaller. In the latter case,
ν ≈ ν, and
of such a deep understanding of the light beam can be cτ ≈ λ, so sunlight cannot be called quasi-monochromatic
one reason that a statistical description is necessary in the in any sense.
first place. Typically one adopts Eq. (67) as a convenient A hierarchy of correlation functions can be defined for
empirical relation and interprets τ from it. statistical fields. One denotes by the degree of first-order
The fringe visibility is usually the important quan- coherence the normalized first-order correlation of posi-
tity that describes an interference pattern, not the ab- tive and negative frequency parts of the field:
solute level of intensity. The visibility is defined by (+)
The hermitean operator a † a (we drop the mode index Thus, the dispersion is equal to the mean number, a prop-
k temporarily) has integer eigenvalues and is called the erty already noticed for the Poisson distribution (77). If
photon number operator: n 1, then the relative dispersion (
n)2 /n2 is very
small, and the number of photons in the state is well de-
a † a|n = n|n, n = 0, 1, 2, . . . , (84) termined in a relative sense. At the same time, a relatively
and the eigenstate |n is called a Fock state or photon well-defined phase can also be associated with the state.
number state. The operators a † and a are called the pho- The clearest interpretation of the amplitude and phase
ton creation operator and destruction operator because of associated with a coherent state |α is obtained by com-
their effect on photon number states: puting the expected value of E in the state |α. Since
α|a † |α = α|a|α∗ = α ∗ , one easily determines that
a † |n = (n + 1)|n + 1,
Nk = (hωk /2ε0 V ), (88)
and (85)
√ where V is the mode volume, and then Nk can be inter-
a|n = n|n − 1.
preted loosely as the electric field amplitude
√ associated
That is, a † and a respectively increase and decrease by 1 with one photon. This shows that Nk α = (hωk /2ε0 V )α
the number of photons in the field state. The photon num- is the mean amplitude of the expected electric field, or
ber states are mutually orthonormal: m|n = δmn , and what is sometimes called the classical field amplitude. The
they form a complete set for the mode, and all other states phase of α thus determines the phase of the field described
can be expressed in terms of them. They have very attrac- by the state |α.
tive simple properties. For example, the photon number is The coherent states |α can be expressed in terms of the
exactly determined, that is, the dispersion of the photon photon number states:
number operator a † a is zero in the state |n for any n: am
(
n)2 = n|(a † a)2 |n − n| a † a|2 |α = exp[−|α| /2]
2
√ |m, (89)
m m!
= n 2 − n 2 = 0. where the sum runs over all integers m 0. From Eq. (89)
However, the nice properties of the |n states are in some one can compute the probability pn (α) that a given coher-
respects not well suited to the most common situations ent state |α contains exactly n photons. According to the
in quantum optics. For example, loosely speaking, their principles of quantum theory this is given by |n|α|2 , and
phase is completely undefined because their photon num- we find:
ber is exactly determined. As a consequence, the expected
pn (α) = e−n nn /n!, (90)
value of the mode field is exactly zero in a photon number
state. That is n|E(r)|n = 0, because Eq. (85) and the or- which is exactly the “purely random” Poisson probability
thogonality property together give n|a|n = n|a † |n = 0. distribution shown in Fig. 7. This is interpreted as meaning
In most laser fields there are so many photons per mode that photon-counting measurements of a radiation field in
that it is difficult to imagine an experiment to count the ex- the coherent state |α would find exactly n photons with
act number of them. Thus, a different kind of state, called probability (90).
a coherent state, is usually more appropriate to describe The photon creation and destruction operators are not
laser fields than the photon number states. These states |α hermitean and are generally considered not observable,
are right eigenstates of the photon destruction operator a: but their real and imaginary parts (essentially the electric
a|α = α|α. (86) and magnetic field strengths, or in other terms, the in-
phase and quadrature components of the optical signal)
They are normalized so that α|α = 1. Since a is not her- are in principle observable. Thus, one can introduce the
mitean, its eigenvalue α is generally complex. The ex- definitions
pected number of photons in the mode in a coherent state
is n = a † a = α|a † a = 12 (a1 − ia2 ), a † = 12 (a1 + ia2 ) (91)
√ a|α = |α| . Thus, α can be inter-
2
preted loosely as n, the square root of the mean num- and their inverses
ber of photons in the mode. The number of photons in the
mode is not exactly determined in the state |α. This can a1 = a + a † , a2 = i(a − a † ). (92)
be seen by computing the dispersion in the number:
What are the quantum limitations on measurement
(
n)2 = α|(a † a)2 |α − α|a † a|α2 of the hermitean operators a1 and a2 ? They must,
of course, obey the Heisenberg uncertainty relation
= α|(a † (a † a − 1)a|α − α|a † a|2
(
a1 )2 (
a2 )2 | 12 [a1 , a2 ]|2 , where . . . indicates
= α|(a † a|α = |α|2 = n. (87) expectation value in any given quantum state, and
P1: ZBU/GRD P2: GQT Final Pages
Encyclopedia of Physical Science and Technology EN013C-627 July 26, 2001 19:42
IV. QUANTUM INTERACTIONS Here E j and | j are the quantized energies and eigenstates
AND CORRELATIONS of the atom including level-shifting and level-splitting due
to static fields that give rise to Zeeman and Stark effects,
It should be remembered that the highly successful semi- etc. The dipole coupling constant is h f ikj = Nk ê · d i j , in the
†
classical version of the quantum theory of light (Sections I notation of Eqs. (9) and (39). Also, ak and ak are the
and II) does not ignore quantum principles, or put h → 0, photon creation and destruction operators introduced in
but it does ignore quantum fluctuations and correlations. Section III.C.
It is a theory of coupled quantum expectation values. The two-level version of this Hamiltonian is the most
In the following sections of a number phenomena de- used in quantum optics. It is obtained by restricting the
pending directly on quantum fluctuations and correlations sums over i and j to the values 1 and 2. It is not necessary
are described. None of them have been successfully treated that |1 and |2 be the two lowest energy levels. In pho-
by the semiclassical theory or by any other theory than toionization, which is the physical process underlying the
the generally accepted and fully quantized version of the operation of photon counters, the upper state is not even a
quantum theory of light. For this reason the observation discreate state but lies in the continuum of energies above
of these and similar quantum optical effects can offer a the ionization threshold (see Section IV.B). In its two-level
means of testing the accepted theory. version H becomes
†
Such tests are of great interest for two related reasons. H = E 1 |11| + E 2 |22| + hωk ak ak
Because the quantum theory of light (or quantum electro- k
dynamics) is the most carefully studied quantum theory, †
and because it serves as a fundamental guide to all field − h k
f 12 {ak + ak }{σ + σ † }, (96)
k
theories, it plays a key role in our present understanding
of quantum principles and should be tested as rigorously where σ = |12| and σ † = |21|, and f 12
k
has been taken
as possible. Moreover, tests of the effects described be- to be real for simplicity. Note that σ has the effect of a
low play a special role because they bear on the theory in lowering transition when it acts on the two-level atomic
a different way compared with traditional tests, such as state |! = C1 |1 + C2 |2. That is, σ |! = C2 |1, so σ
high-precision measurements of the Lamb shift, the fine takes the amplitude C2 of the upper state |2 and assigns
structure constant, the Rydberg, and the electron’s anoma- it to the lower state |1. By the same argument σ † causes
lous moment. a raising transition.
P1: ZBU/GRD P2: GQT Final Pages
Encyclopedia of Physical Science and Technology EN013C-627 July 26, 2001 19:42
†
The term ak σ † in Eq. (96) is difficult to interpret because the photons associated with a transition in the two-level
it has σ † raising the atom into its upper state together with atom.
†
ak creating a photon. One expects photon creation to be as- The σ operators also change in time in the course of
sociated only with a lowering of the atomic state. The term emission and absorption processes. Their time depen-
ak σ presents similar difficulties. It can be shown, however, dence is also determined by Heisenberg’s equation and
that these two terms are the source of the very rapid oscil- one finds the equations:
lations exp[±2iωt] which were discussed above Eq. (12) dσ/dt = −iω21 σ − i f k ak σ z (102a)
and were eliminated by the rotating wave approximation k
(RWA). The adoption of the RWA eliminates them here †
also. With the RWA and the convenient convention that dσ † /dt = iω21 σ † + i f k ak σ z (102b)
k
E 1 = 0 (and therefore that E 2 = hω21 ), the working two- †
level Hamiltonian is dσz /dt = 2i f k (σ † ak − ak σ ). (102c)
1 † k
H = hω21 (σz + 1) + hωk ak ak
2 k
These are the operator equations underlying the semiclas-
†
sical equations for the Bloch vector components given in
− h f 12 {ak σ + σ † ak }.
k
(97) Eq. (19).
k The correspondence between the quantum and semi-
The operators σ, σ † , and σz are closely related to the classical sets of variables is
2 × 2 Pauli spin matrices: σ → 12 (u − iv)e−iωt (103a)
0 0
|12| = σ → 12 (σx − iσ y ) = (98a) σ † → 12 (u + iv)eiωt (103b)
1 0
σz → w. (103c)
0 1
|21| = σ † → 12 (σx − iσ y ) = (98b) This identification is precise if the radiation field is both
0 0
intense and classical enough. This means that one retains
†
1 0 only one mode and replaces ak and ak by their coher-
|22| − |11| → σz = . (98c)
0 −1 ent state expectation values α = α0 e −iφ
and α0∗ eiφ . Then
Eqs. (102) are identical with Eq. (19) under the previous
One can easily confirm that the matrix representation of
assumptions, that is, the field is quasi-monochromatic so
σ is a “lowering” operator in the two-dimensional space
φ = ωt, and Nk α is the slowly varying electric field ex-
with basis vectors
pectation value, which is interpreted as the classical field
1 0 amplitude. Thus, one has
|2 → and |1 → (98d)
0 1 2 f k α0 → 2d0 / h = χ . (104)
and σ † is represented by the corresponding “raising” op- The part of ak that depends on σ in Eq. (100) acts as a
erator. The σ ’s obey the commutator relations: radiation reaction field when substituted into Eq. (102). It
causes damping and a small frequency shift in the atomic
[σ, σz ] = 2σ, [σ † , σz ] = −2σ † , [σ † , σ ] = σz . (99) operator equations even if there are no external photons
Changes in the radiation field occur as a result of emis- present and thus is associated with spontaneous emission.
sion and absorption of photons during transitions in the The damping constant is exactly the correct Einstein A
atom. One consequence is that ak obeys an equation ob- coefficient for the transition, because the A coefficient is
tained from the Heisenberg equation i h∂ O/∂t = [O, H ] a two-level parameter.
that is valid for all operators O: The frequency shift is only a primitive two-level ver-
sion of a more general many level radiative correction
∂ak /∂t = −iωk ak + i f k σ. (100) such as the Lamb shift. One observes here a natural limi-
k tation of any two-level model. It is intrinsically incapable
Here we have simplified the notation, rewriting f 12 as f k .
The solution, of Eq. (100) is: of dealing with any precision with effects, such as radia-
t tive level shifts, that depend strongly on the contributions
ak (t) = ak (0)e−iωk t + i f k dt e−iωk (t−t ) σ (t ). (101) of many levels. However, in the cases of interest described
0 in the remainder of Section IV, the effects of other levels
The first term represents photons that are present in the are negligible and the numerical value of the frequency
mode k but not associated with the two-level atoms (e.g., shift is irrelevant. It can be assumed to be included in the
from a distant laser), and the second term represents definition of ω21 .
P1: ZBU/GRD P2: GQT Final Pages
Encyclopedia of Physical Science and Technology EN013C-627 July 26, 2001 19:42
B. Quantum Light Detection and Statistics the same average discussed from a classical standpoint in
Sections III.A and III.B. Thus, we finally have
The quantum theory of light detection is based on the
†
quantum theory of photoionization because photon coun- rate ≈ ak ak , (106)
ters are triggered by an ionizing absorption of a photon.
Photoionization is a weak field phenomenon because the where the angular brackets now mean an average over the
effective γ is so large [recall Eq. (45)]. Thus, perturba- initial field, that is, the quantum mechanical expectation
tive methods are adequate and one computes the absolute value in state |!.
square of the ionization matrix element, in this case given The significance of Eq. (106) is in the ordering of
by F| − er · E(r)|I , where |I and |F are the initial and the field operators. The nature of the photoionization
final states of the photoionization. The initial state consists process mandates that they be in the given order and
of an atom in its ground state, described by the electronic not the reverse. Since a † a is not equal to aa † for a
orbital function φ0 (r), and the initial state of the radiation quantum field, the order makes a difference. The order
field |!. The final state consists of the atom in an ion- given, in which the destruction operator is to the right,
ized state described by an electronic orbital function φ f (r) is called normal order. Photoionization is a normally or-
appropriate to a free electron with energy above the ion- dered process by its nature, and therefore so is photode-
ization threshold, and another state of the radiation field tection and photon counting. This has fundamental conse-
|! . Then the matrix element becomes quences for quantum statistical measurements, as we now
explain.
Let us consider the degree of second-order coherence
F| − er · E|I
g (2) in quantum theory. This was written in Eq. (73a) as
= − φ ∗f (r)er · ! |E(r)|!φ0 (r)d 3r an intensity correlation: g (2) (τ ) = I (t)I (t + τ )/I 2 . Be-
cause of the normally ordered character of photoioniza-
tion, if g (2) is measured with photodetectors as usual, its
= −d f 0 · êk Nk ! |ak |!, (105)
k
correct definition according to the quantum theory of ion-
ization is normally ordered:
where d f 0 is the so-called dipole matrix element for the (−)
the same limit
= 0 and under the same circumstances, This combination of properties is unique in atomic ra-
namely, with the field intensity I ≈ χ 2 distributed with diation theory. They indicate, for example, that a sys-
some probability p(I ) over a range of values: tem obeying Hamiltonian [Eq. (117)] would permit some
fundamental questions in quantum electrodynamics to be
w(t) = − p(I ) cos χ (I ) t d I (semiclassical). studied independently of the restrictions of the usual per-
(119b) turbation methods that are based on short-time expan-
sions and a small coupling constant. Experimental real-
The apparent similarity of these results disguises fun- ization of the model is unlikely in the optical frequency
damental differences in dynamical behavior between range because of the restriction to a single radiation mode.
Eqs. (119a and b). These differences arise from the dis- However, quantum optical techniques, including the de-
creteness of the allowed photon numbers in Eq. (119a), tection of single photons, are rapidly being extended to
which are discrete precisely because the field is quantized, much lower frequencies, where single-mode cavities can
that is, because the mode contains only whole photons and be built. Observation of the Jaynes–Cummings model is
never fractional units of the energy hω. By contrast, in the expected to play a guiding role in microwave single-mode
semiclassical theory (recall Section I.C) the intensity I experiments with Rydberg atoms.
and Rabi frequency χ can have any values.
For the quantum photon distribution associated with a
coherent state, for which pn is given by Eq. (90), a plot E. AC Stark Effect and Resonance
of σz (t) is shown in Fig. 10. The nearly immediate dis- Fluorescence
appearance “collapse” of the signal shortly after t = 0 can Just as the Bloch vector provides a powerful descriptive
be explained on the basis of the interference of many fre- framework for a wide variety of quantum optical phenom-
quencies χ (n) in the quantum sum of Eq. (119a). Such ena, so does the Jaynes–Cummings model. The Hamilto-
a collapse can be expected for any broad distribution pn nian [Eq. (117)] can be regarded as a zero-order approx-
and would be predicted by the semiclassical Eq. (119b) as imation to a “true” Hamiltonian, in which the atom is
well, if p(I ) is a broad distribution function. However, the allowed more than two levels or the field has more than
predicted reappearances or “revivals” of the signal are a one mode. It is an unsual zero-order approximation be-
sign that the field is quantized. They occur, and at regular cause it includes the interacting Hamiltonian as well as
intervals, only because pn is a discrete distribution. The the noninteracting atomic and radiation Hamiltonians.
semiclassical expression (119b) leads inevitably to an ir- If the atom interacts resonantly with a single strong
reversible collapse. Only quantum theory can provide the mode of the field, then its interactions with other modes,
stepwise discontinuous photon number distribution that is perhaps involving other levels of the atom, can be treated
the basis for the revivals. approximately. The energy spectrum of the Jaynes–
The revivals and other quantum mechanical predictions Cummings Hamiltonian makes it clear how this can be
implied by the truncated Hamiltonian [Eq. (117)] are of done. In Fig. 11 the RWA energy spectrum is shown in
interest because this Hamiltonian is simple enough to per- the absence of a strong resonant interaction (i.e., with
mit exact calculations, without further approximations of λ = 0), and also with λ = 0. The spectrum shows that the
the kind familiar in most of radiation theory. For example, state |n ∗ |1, which corresponds to the atom in its lower
the expression for quantum inversion in Eq. (119a) has the level and n photons in the mode when λ = 0, is pushed
following unusual properties: down to become the state n, − when λ = 0. That is, for
Eq. (118a) we obtain
1. it is not restricted to any finite range of t values;
E n,− = E 1 + nω − 12 [n −
]. (120)
√ constant λ,
2. it holds for all values of the coupling
which is contained in χ (n) = 2λ n; Since n
, the lower level is pushed down. Similarly,
3. it is completely free of decorrelations, such as the the corresponding upper state |n ∗ |2 is pushed up by the
commonly used approximation a † aσz ≈ a † aσz ; same amount. This shift is called the AC Stark shift because
P1: ZBU/GRD P2: GQT Final Pages
Encyclopedia of Physical Science and Technology EN013C-627 July 26, 2001 19:42
Einstein, Podolsky, and Rosen (EPR) gave a precise particle b, even though the particles may be arbitrarily far
meaning to the concept of reality in this context: “If, with- apart and not interacting in any way.
out in any way disturbing a system, we can predict with Motivated by the EPR argument, one may ask whether
certainty [i.e., with probability equal to unity] the value of it is possible to formulate a theory in which physical
a physical quantity, then there exists an element of physical quantities do have objectively real values “out there,”
reality corresponding to this physical quantity.” It was of independently of whether any measurements are made.
primary concern to EPR whether quantum theory can be These objectively real values may be imagined to be deter-
considered to be a “complete” theory. A necessary condi- mined by certain hidden variables which may themselves
tion for completeness of a theory, according to EPR, is that be stochastic. One can ask, is it possible, in principle,
“every element of the physical reality must have a coun- to construct a hidden variable theory in full agreement
terpart in the physical theory.” Using these definitions of with the statistical predictions of quantum mechanics, but
reality and completeness, and the properties of correlated which allows for an objective reality in the EPR sense?
quantum states, EPR concluded that quantum theory does In the early 1960s Bell considered the most palatable
not provide a complete description of physical reality. class of hidden variable theories, the so-called local the-
An illuminating example due to Bohm is provided by ories. He demonstrated that such theories cannot fully
the singlet state of two spin− 12 particles: agree with quantum mechanics. In particular, certain Bell
√ inequalities distinguish any local hidden variable theory
S = (1/ 2)[|a+, n̂ ∗ |b−, n̂ − |a−, n̂ ∗ b+, n̂], from quantum mechanics. Another way of stating “Bell’s
(122) theorem” is that no local realistic theory can be in full
agreement with the predictions of quantum theory.
where |a±, n̂ is the state for which particle a has spin Bell inequalities are not difficult to derive for the Bohm
up (+) or down (−) in the direction n̂. The unit vector n̂ thought experiment. A local hidden variable theory is pos-
can point in any direction. In light of the EPR argument, tulated to give ± 12 for each spin component, as determined
one notes that the spin of particle b in, say, the x̂ direction, by the hidden variables. The theory is also supposed to ac-
can be predicted with certainty from a measurement of the count for the spin correlations: if one is up the other must
spin of particle a in that direction. If the spin of particle a be down. The difference between such a theory and quan-
is found to be up, then the spin of particle b must, for the tum mechanics is that the spins are predetermined (by the
singlet state, be down, and vice versa. Thus, the spin of hidden variables) before any measurement, that is, they are
particle b in the x̂ direction can be predicted with certainty objectively real. The condition of locality enters through
“without in any way disturbing” that particle. According the additional assumption that a measurement of the spin
to EPR, therefore, the x̂ component of the spin of particle of each particle is not affected by the direction in which
b is an element of physical reality. the spin of the other particle is measured. This is certainly
Of course, one may choose instead to measure the ŷ reasonable if all the spin components are predetermined,
component of the spin of particle a, in which case the ŷ since the two particles may be very far apart when a mea-
component of particle b can be predicted with certainty. surement is made.
It follows therefore that both the x̂ and ŷ components of The question now is whether any measurements can
the spin of particle b (and, of course, particle a) are ele- distinguish such a theory from quantum mechanics. Bell
ments of physical reality. However, according to quan- considered E(m̂, n̂), the expectation value of the product
tum mechanics, the x̂ and ŷ components of spin can- of the spin components of particles a and b in the m̂ and
not have simultaneously predetermined values, because n̂ directions, respectively. He obtained the inequality
the associated spin operators do not commute. Therefore,
|E(m̂, n̂) − E(m̂, p̂)| 14 + E(n̂, p̂), (123)
quantum theory does not account for these elements of
physical reality, and so, according to EPR, it is not a which must be satisfied by the entire class of local hidden
complete theory. variable theories. This inequality is violated by quantum
The EPR experiment can be criticized on the ground theory, as can be seen from the quantum mechanical pre-
that “the system” must be understood in its totality and diction E(m̂, n̂) = − 14 m̂ · n̂. It is therefore possible to test
cannot refer to just one of the particles, for instance, of a experimentally the predictions of quantum theory vis-à-
correlated two-particle system. In the Bohm gedanken ex- vis the whole class of plausible “realistic” theories.
periment just considered, a measurement on particle a does Although Bell’s theorem promoted philosophical ques-
disturb the complete (two-particle) system because the tions about hidden variables to the level of experimental
quantum mechanical description of the system is changed verifiability, it remained difficult to conceive of specific
by the measurement. One cannot consistently associate an experiments that could be undertaken. In 1969, however,
element of physical reality with each spin component of Clauser and co-workers suggested that Bell inequalities
P1: ZBU/GRD P2: GQT Final Pages
Encyclopedia of Physical Science and Technology EN013C-627 July 26, 2001 19:42
could be tested by measuring photon polarization corre- cal realistic theories. In quantum optics it is often possi-
lations if certain additional but reasonable assumptions ble to address such questions from essentially first prin-
about the measurement process were made. The spin ciples and to carry out accurate tests of theory in the
considered by Bell is replaced by photon polarization, an- laboratory.
other two-state phenomenon. Correlated two-photon po-
larization states are produced in atomic cascade emissions,
and efficient polarizers and detectors are available for op- V. RECENT DEVELOPMENTS
tical photons.
Consider a J = 0 → 1 → 0 atomic cascade decay, A. Substantial Squeezing
with polarizer–detector systems on the ±z axes. Linear In Section III.B the first observation of a squeezed state
polarization filters may be employed to distinguish the of light was mentioned. Since then, considerably greater
photons by their energy so that each polarizer–detector degrees of squeezing have been reported. As much as a
system records photons of one frequency but not the other. 60% noise reduction has been observed using a three-wave
It may be shown that the two-photon prolarization state has parametric down-conversion technique, in which a photon
the form of frequency ω is converted into two photons of frequency
√
! = (1/ 2[|a, x̂ ∗ |b, ŷ + |a, ŷ ∗ |b, x̂]), (124) ω/2 in a crystal.
where |a, x̂ is the single-photon state in which photon a
is linearly polarized along the x̂ direction, etc. A similar B. Two-Photon Correlated Quantum States
form applies if a circular polarization basis is used. Studies of photon coherence and interference have en-
The correlated photon state of Eq. (124) is obviously tered a new domain with experiments being undertaken
analogous to the spin- 12 correlated state of Eq. (122). A on new types of two-photon quantum states, going beyond
hidden variable theory of such polarization correlations the detection of two-photon polarization correlations men-
leads to a Bell inequality analogous to Eq. (123): tioned in Section IV.F. For example, quantum beats have
|E(α, β) − E(α, γ )| ≤ 1 − E(β, γ ), (125) been observed in the joint probability of two-photon de-
tection as a function of the path difference and frequency
where E(α, β) now refers to photon polarization compo- difference of two photons produced by parametric down
nents and α and β are the filter orientations with respect to conversion.
some reference axis. The differences between Eqs. (125)
and (123) arise because we are now dealing with spin-one
particles and because Eq. (124) describes a positive cor- C. Cavity QED in the Optical Domain
relation. The quantum mechanical prediction for E(α, β) Despite our prediction to the contrary at the end of Sec-
is simply cos 2(α − β), and Eq. (125) becomes tion IV.D, experimental observations of effects associ-
|cos 2(α − β) − cos 2(α − γ )| 1 − cos 2(β − γ ), ated with the single-atom single-mode Jaynes–Cummings
(126) quantum model have been successful in the optical do-
main, as well as at microwave frequencies. Collapse and
which, in fact, is not satisfied for all angles α, β, and γ . revival signals (see Fig. 10) have been reported in ex-
Such violations of Bell inequalities have been observed in periments using rubidium atoms traversing a microwave
independent experiments led by Clauser, Fry, and Aspect. cavity, and optical observations of line-narrowing below
The results of such experiments are in agreement with the free-space limit and cavity line-spilitting have been
quantum theory, and appear to rule out any local hidden reported.
variable theory. There are possible loopholes in the in-
terpretation of the experiments, but at the present time
D. Two-Photon Laser
none of them seem very plausible. According to Clauser
and Shimony, “The conclusions are philosophically un- After more than two decades of theoretical interest, sub-
settling: Either one most totally abandon the realistic phi- stantial experimental progress toward the realization of
losophy of most working scientists, or dramatically revise a two-photon laser is occurring. A two-photon laser has
our present concept of space–time.” many fundamental differences with the conventional one-
From the viewpoint of quantum optics, the photon po- photon laser, including special noise properties, multista-
larization correlation experiments measure a second-order bility, and a different phase transition at threshold. An
field correlation function. Such a correlation function not important step has been the successful operation of a
only distinguishes between classical and quantum radi- quantum oscillator working on a microwave two-photon
ation theories, but also between quantum theory and lo- Rydberg transition in rubidium.
P1: ZBU/GRD P2: GQT Final Pages
Encyclopedia of Physical Science and Technology EN013C-627 July 26, 2001 19:42
!
12 = 2 (|H1 |H2 + |V1 |V2 ),
√1
anomalously strong very high-order harmonic generation
(above 30th order). (B)
!
12 = 2 (|H1 |H2 − |V1 |V2 ),
√1
(C)
!
12 = 2 (|H1 |V2 + |V1 |H2 ),
√1
F. Cooling below the Doppler Limit (D)
!
12 = 2 (|H1 |V2 − |V1 |H2 ).
√1
Resonant excitation of two-level atoms can give rise to
center of mass motion as well as internal transitions, and It is not hard to check that our three-photon state |!123 ,
various schemes for trapping and cooling collections of can be written:
atoms by laser light are based on this. The spontaneous (A)
line-width of the transition places a so-called Doppler |!123 = 12 !12 ∗ (−b|H3 + a|V3 )
atoms allow this limit to be evaded, and much lower tem- + 12 !12 ∗ (−a|H3 + b|V3 )
peratures, in the neighborhood of 50 µK (0.000050 de-
|!3 = a|H1 + b|V1 . Note that this is the same as the Bennett, C. H., Brassard, G., Crepeau, C., Jozsa, R., Peres, A., and
original state of particle 1, and the teleportation is com- Wooters, W. K. (1993). Phys. Rev. Lett. 70, 1895.
plete. The reader can check this by choosing the state 3 Boschi, D., Branca, S., De Martini, F., Hardy, L., and Popescu, S. (1997).
Phys. Rev. Lett. 80, 1121.
partner of any one of the Bell components in |!123 and Bouwmeester, B., Pan, J. W., Mattle, K., Eibl, M., Weinfurter, H., and
performing the rotation indicated by the table. Zeilinger, A. (1997). Nature 390, 575.
At this point Bob has the original state while Alice has Delone, N. B., and Krainov, V. P. (1985). “Atoms in Strong Light Fields.”
the entangled Bell pair resulting from her Bell projec- Springer-Verlag, Berlin.
tion, but nothing of her original state of photon 1. The Fontana, P. (1982). “Atomic Radiative Processes,” Academic Press, San
Diego.
original state is destroyed in the process of teleporting it. Furusawa, A., Sorenson, J., Braunstein, S. L., Fuchs, C. A., Kimble,
Alice’s inability to know in advance which entangled Bell H. J., and Polzik, E. S. (1998). Science 282, 706.
state will result from her Bell measurement is mirrored Knight, P. L., and Allen, L. (1983). “Concepts of Quantum Optics,”
in Bob’s uncertainty about the unitary operation (polar- Pergamon, Oxford.
ization basis rotation) required of him before receiving Knight, P. L., and Milonni, P. W. (1980). The Rabi frequency in optical
spectra. In “Physics Reports,” Vol. 66, pp. 21–107. North-Holland,
Alice’s phone call. This uncertainty necessitates the phone Amsterdam.
call, and this eliminates any imagined superluminal as- Loudon, R. (1983). “The Quantum Theory of Light,” 2nd ed. Oxford
pect of the process. The locations of Alice and Bob are ir- Univ. Press, Oxford, UK.
relevent to the process and they may be separated by a great Mandel, L. (1976). The case for and against semiclassical radiation the-
distance. ory. In “Progress in Optics” (E. Wolf, ed.), Vol. XIII. North-Holland,
Amsterdam.
Milonni, P. W. (1976). Semiclassical and quantum-electrodynamical
approaches in nonrelativistic radiation theory. In “Physics Reports,”
SEE ALSO THE FOLLOWING ARTICLES Vol. 25, pp. 1–81. North-Holland, Amsterdam.
Perina, J. (1984). “Quantum Statistics of Linear and Nonlinear Phenom-
ATOMIC AND MOLECULAR COLLISIONS • ELECTROMAG- ena,” D. Reidel, Dordrecht.
NETICS • LASERS • NONLINEAR OPTICAL PROCESSES • Rosen, H. J., and Gustafson, T. K. (eds.) (1989). Quantum electronic
OPTICAL INTERFEROMETRY • QUANTUM THEORY applications and optical studies of high-Tc superconductors. Quantum
Electron 25(11), 2357–2409.
Stenholm, S. (1984). “Foundations of Laser Spectroscopy,” Wiley,
New York.
BIBLIOGRAPHY Yoo, H. I., and Eberly, J. H. (1985). Dynamical theory of an
atom with two or three levels interacting with quantized cavity
Allen, L., and Eberly, J. H. (1975). “Optical Resonance and Two-Level fields. In “Physics Reports,” Vol. 118, pp. 239–337. North-Holland,
Atoms,” Wiley, New York, reprinted by Dover (1987). Amsterdam.
P1: LLL/GJK P2: GSS Final Pages Qu: 00, 00, 00, 00
Encyclopedia of Physical Science and Technology EN013A-628 July 27, 2001 10:24
Quantum Theory
David W. Cohen
Smith College
441
P1: LLL/GJK P2: GSS Final Pages
Encyclopedia of Physical Science and Technology EN013A-628 July 27, 2001 10:24
all of these theories, as well as the formulation of the a derived quantity, force, by setting particles of known
philosophical questions behind them, were merged into mass into motion. When an object of mass m is accelerated
one mathematical formalism called quantum mechanics. from rest, along a straight line, to an acceleration a, we say
Our discussion of the quantum theory will trace the the force F causing the acceleration acts in the direction
main ideas in chronological order from the classical setting of the motion and is given by
in the 19th century through the birth of quantum mechan-
F = ma. (1)
ics in 1925. Then we will look at some of the controversy
surrounding the modern theoretical foundations of quan- Notice that force, like acceleration, is described in terms of
tum mechanics. And finally we will look at an example of direction, and so it is a vector quantity. The magnitude of
how the probabilistic view of nature arising from quantum force necessary to accelerate 1 kg of mass to a magnitude
theory might actually be used to advantage to build high- of 1 m sec−2 is called 1 newton (N). Equation (1) is a
speed quantum computers. In our discussions we won’t formulation of Newton’s second principle of motion.
hesitate to paraphrase original ideas and substitute mod- Let us look more closely at the meaning of Eq. (1).
ern notation when it’s necessary for clear understanding. We think of the letters in the equation as place holders for
numbers. That is, if we assign numbers to two of the letters,
we can manipulate the equation to compute the number
I. THE CLASSICAL SETTING that must be assigned to the third letter, if the equation is
to be true. So we call the letters variables. A variable in
In this section we review the classical physics that we shall an equation is called continuous if the numbers that can
need for our discussion of quantum theory. be assigned to it comprise an open (possibly unbounded)
interval. For example, if we assume that we can accelerate
an object of given mass to any magnitude of acceleration
A. Physical Laws and Variables
between values −x and x, then the variable a in Eq. (1) is
One of Sir Isaac Newton’s key contributions to science continuous with domain of assignability (−x, x).
in the 17th century was the concept of the physical law. Some variables, however, can be assigned values only
He codified events, like the fall of an apple, into a system from a discrete set of numbers. A set of numbers S is
of words and mathematical expressions that brought such called discrete if every number in S belongs to an open
order, clarity, and economy to thought that he was able interval that contains no other members of S. The set of
to reveal hidden relationships among the events. In fact, integers, for example, is a discrete set of numbers. In the
Newton’s laws provided experimenters with the means to equation cos 2N π = 1, the set of values for N for which
predict with uncanny certainty how some events would the equation is true can come only from the set of integers.
affect the future. Thus, N is a discrete variable in that equation.
The success of the predictive value of Newton’s laws The word discrete, as we have defined it, is relatively
was so great that it became a universally accepted princi- modern. Physicists traditionally have used the word dis-
ple of science that it is possible, in theory, to determine continuous instead. One of the revolutionary ideas of
simultaneously the values of all physical quantities asso- quantum theory was the assumption of the discontinuity
ciated with a given physical event and with those values of some variables that had always been considered con-
predict, with 100% certainty, all future events caused by tinuous in classical physics.
the first. We shall see later how quantum theory challenged
this principle, but first let us examine the law of Newto-
B. Mechanical Energy
nian mechanics that we shall need for our discussion. We
use modern terminology. Energy is the capacity to do work. We present here, briefly,
Let us call a physical quantity fundamental if its values the classical definitions of potential and kinetic energy for
are determined by comparing measurements of the quan- a particle moving in a field of force.
tity against some arbitrarily selected standard. Examples Let R denote a region of three-dimensional space and
of such quantities are periods of time (measured in sec- suppose that at every point p in R there is a three-
onds, perhaps) and length (measured in meters, feet, etc.). dimensional vector F( p) assigned to that point. We call F
We shall call derived quantities those obtained from funda- a vector field. Suppose also that f is a real-valued function
mental quantities by mathematical calculations. Examples on R, that ∇ f is its gradient, and that
of these are instantaneous velocity and acceleration.
F( p) = −∇ f ( p) (2)
One fundamental quantity defined by Newton is the
mass of an object. Mass measures the resistance of objects for every point p of R. Any function f that satisfies Eq. (2)
to acceleration. An object used to determine a standard is called a potential function for F. If F( p) represents a
value of mass, 1 kg, is kept in a vault in Paris. We can define force at every point p, we call F a force field.
P1: LLL/GJK P2: GSS Final Pages
Encyclopedia of Physical Science and Technology EN013A-628 July 27, 2001 10:24
If γ is any differentiable path in R from point p = γ (a) where m is the mass of the particle and v( p) the magnitude
to point q = γ (b), and if f is a potential function for force of the velocity of the particle when it is at point p. The
field F, then we define the work done by moving a particle kinetic energy is equal to the amount of work that must
along γ through force field F by the line integral be done to accelerate the particle from a state of rest to a
state of motion with velocity of magnitude v( p).
W = F. (3) The law of conservation of energy states that, through-
γ out the motion of the particle, the total energy
The work depends only on the end points of the path, and
not on γ itself. We can express this by the equations E( p) = PE( p) + KE( p) (7)
b has the same value for every point p.
F= F(γ (t)) · γ (t) dt = f (q) − f ( p), (4) A useful example is that of the kinetic energy of a parti-
γ a
cle of mass m rotating at constant speed in a circular orbit
which can be proved using Eq. (2). of radius r . Suppose that at every point the particle is sub-
We have, then, that f (q) − f ( p) is the work done if we ject to a force toward the center of the orbit of magnitude
move a particle through force field F from point p to point F and that its velocity at every point has magnitude v. It
q. Since any two potential functions for F must differ only follows from some routine calculations that at each point
by a constant, that amount of work can be calculated from the particle is subject to an acceleration toward the center
Eq. (4) using any potential function for F. We call that of the orbit of magnitude a = v 2 /r . Then we have from
amount of work the potential energy difference between Eq. (1) that F = ma = mv 2 /r , so that we can obtain the
point p and point q. kinetic energy of the particle at every point in its orbit as
It is possible to define an absolute potential energy func-
tion PE by selecting any point s in R and deciding what KE = 12 mv 2 = 12 Fr. (8)
the potential energy P is to be at the point. Then for any Later, we shall make use of this equation.
potential function f for F, we can define the potential
energy at every point p to be
C. Electromagnetic Energy
PE( p) = P + f ( p) − f (s). (5)
In the late 19th century James Clerk Maxwell combined
This function PE differs from f by a constant, so it is a theories of electricity and magnetism into a theory of elec-
potential function for F, and its definition does not depend tromagnetic fields. Such a field is a region of space at every
on the choice for f . point of which is a vector of electromagnetic force that acts
An example of a potential energy function arises when on any charged particle placed at the point. The work done
a particle moves about R, all of three-dimensional space, as the force moves the particle is a measure of the electro-
in a field of a force of attraction between the particle and magnetic energy of the field. Such fields surround every
the point O at the origin of R. The potential energy can be charged particle, and if a charged particle is accelerated,
negative at every point p, with large negative values near some of the field surrounding it leaves the vicinity of the
O and the potential energy approaching zero at points very particle and travels off into space. We call this traveling
far from O. Then to move the particle from a distant point field a wave of electromagnetic radiation. We can mea-
p along a straight line toward O to a point s closer to O sure the energy of the radiation by measuring the work the
requires negative work, because it is movement in the same wave can perform in moving a charged particle placed in
direction as the force exerted by the field. This results in its path.
a change of potential energy from one negative value to It will not be necessary for our discussion of quan-
a more negative value, which is considered a decrease in tum theory to examine the quantitative relationships in-
potential energy. As we shall see, an electron in an atom volved with electromagnetic theory, but it is important to
is subject to a potential energy function of this type. know that these relationships, known as Maxwell’s equa-
Now let us consider kinetic energy. A force field F that tions, require fields of electromagnetic force to be continu-
satisfies Eq. (2) for some potential function f is called ous vector fields. Electromagnetic energy calculated from
a conservative force field. A particle placed in a conser- Maxwell’s equations is necessarily a continuous variable.
vative force field and allowed to react solely to the force
exerted by the field will move from point to point, always
in the direction of decreasing potential energy. During the D. Entropy
motion, we ascribe to the particle at each point p a kinetic
Entropy was originally defined as a measure of the rela-
energy defined by
tionship between temperature and energy. The total me-
KE( p) = mv( p)2 /2, (6) chanical energy due to the motion of the molecules in a
P1: LLL/GJK P2: GSS Final Pages
Encyclopedia of Physical Science and Technology EN013A-628 July 27, 2001 10:24
given substance is called the thermal energy of the sub- different kinds, or a region of space containing items of
stance. Some of this thermal energy can be made to do unknown nature. Even the entire universe can be consid-
work if the substance is allowed to cool. ered a giant physical system. Extending the theory of en-
In the early part of the 19th century Sadi Carnot pro- tropy was not easy. Among the most difficult tasks was
vided a theoretical model of an engine that transforms a assigning meaning to the notion of the “state” of various
fall in temperature of a substance into mechanical energy. physical systems. We shall return to that problem later, but
Some years later Rudolf Clausius and William Thomson let us continue our discussion of entropy supposing that
(Lord Kelvin) observed the following about Carnot’s en- “states” have been defined.
gine: Of the total difference in thermal energy resulting As in the theory of gases, changes in the state of a physi-
from a drop in temperature from T1 (degrees Kelvin) to T2 cal system can be classified as reversible or nonreversible.
in a Carnot engine, only the fractional portion 1 − T2 /T1 If we consider a sequence w1 , w2 , . . . , wn of reversible
can be transformed into mechanical energy to do work, changes in state for a system at absolute temperature T ,
even under the most ideal theoretical conditions. The re- as we did to obtain Eq. (9), then the incremental change
maining portion of energy is redistributed in the substance in entropy from state wk to wk+1 is denoted
of the engine. Note that no work can be obtained from ther-
mal energy without a strict decrease in temperature. Sk = E k /T, (10)
The principle of Carnot’s engine applies to a gas at tem- where E k is the incremental change in the total energy
perature T1 occupying a volume V1 and exerting pressure of the system.
P1 on the walls of its container. Let us say that this gas is in We examine our next step carefully. It reveals an as-
a macroscopic state w1 = (P1 , V1 , T1 ). Suppose we wish to sumption taken so much for granted in the physics of the
change the state of the gas to w2 = (P2 , V2 , T2 ). We could, 19th century that it was seldom, if ever, explicitly dis-
if we wished, change states without changing temperature cussed then. The challenge to that assumption was at the
by allowing the gas to expand slowly against one movable very heart of quantum theory in the early 20th century.
side of its container, decreasing pressure and increasing We have in Eq. (10) a relationship between differences
volume. Since work, or mechanical energy, is obtained in entropy and differences in energy. For a gas in a con-
from this process (assuming the side resists movement), trolled experiment, precise values for these differences can
we can add thermal energy to the gas to counteract the be measured. If we assign a particular value of entropy to
temperature drop necessary to supply the work. On the a particular value of energy in an experiment, it is possible
other hand, we could, at least theoretically, allow the gas to use Eq. (10) to write an explicit relationship between
to expand in a vacuum, increasing volume without chang- absolute entropy and energy at temperature T for states
ing pressure. In this case, no work is done and temperature w1 , w2 , . . . , wn :
can remain constant without an addition of thermal energy.
The first type of change from w1 to w2 , requiring an ad- Sk = E k /T. (11)
dition (or subtraction) of thermal energy, if temperature is Next comes the assumption. We assume that the states
to remain constant, is called a reversible change of state. w1 , w2 , . . . , wn were chosen from a continuum of states
Now suppose we wish to pass in tiny reversible incre- so that Eq. (11) provides a discrete set of values in what is
mental steps w1 , w2 , . . . , wn from some initial state w1 to really a continuous, in fact, differentiable function of en-
some final state wn . Denote by E k the change in thermal tropy in terms of energy. Time and again physicists iden-
energy for the gas as it goes from state wk to wk+1 . Now tified one physical quantity as a function of another one,
consider the number and assumed that the functional relationship was continu-
n
E k ous and that the ability to measure arbitrarily small differ-
S(w1 , wn ) = . (9)
k=1
Tk ences in the quantities was limited only by the ability to
build arbitrarily precise and accurate measuring devices.
Clausius proved that for reversible changes of state, the Although this assumption will be reexamined in our dis-
sum of Eq. (9) depends only on w1 and wn and is inde- cussion of quantum theory, it allows us to take our next
pendent of the sequence used in changing from the initial step to obtain an expression involving entropy that we
to the final state. He called this intrinsic difference between shall need in the sequel.
the two states a difference in the entropy of the gas. Notice We rewrite Eq. (10) as Sk / E k = 1/T and use our
that, as with potential energy, we do not have a value for the assumption to pass to infinitesimally small incremental
absolute entropy until we assign a value to one state. changes to obtain an expression for the derivative of en-
The theory of entropy was eventually extended to phys- tropy with respect to energy:
ical systems other than fixed amounts of gas. A physical
system can consist of a mixture of gases or particles of d S/d E = 1/T. (12)
P1: LLL/GJK P2: GSS Final Pages
Encyclopedia of Physical Science and Technology EN013A-628 July 27, 2001 10:24
This equation was a basic building block in the original numbers according to laws of probability, and this notion
construction of the quantum theory. was a key inspiration to the developers of quantum theory.
Let us consider a gas in a closed container and the
macroscopic states of the gas defined as w = (P, V, T )
E. Specific Heat
when the gas is at pressure P, volume V , and temperature
As we mentioned in the preceding section, the thermal T . We shall restrict our attention to changes in macro-
energy of a substance is transformed into mechanical en- scopic state due solely to changes in temperature, leav-
ergy when the substance undergoes a drop in temperature. ing pressure and volume fixed. Consider one macroscopic
Conversely, energy supplied to a substance can raise its state w, and suppose that the gas consists of N molecules,
temperature. Let us consider cases in which a substance each with a specific energy, and that in state w the total
does not change pressure (if it is a gas) or volume while it thermal energy of the gas is E.
undergoes changes in temperature. Next we shall partition the real-number interval [0, E]
Throughout the 19th century experiments were per- into p subintervals J1 , . . . , J p , each of length ε = E/ p.
formed to determine the amount of energy required to We then assign each of the N molecules to one subinter-
raise fixed amounts of various substances one degree on val according to the amount of energy it has. For a given
a temperature scale. It was discovered that the relation assignment we consider a p-tuple of numbers (l1 , . . . , l p ),
between the amount of energy added to a substance and where li is the number of molecules assigned to subinterval
the resulting rise in temperature depends not only on the Ji . We call such a p-tuple a microscopic state of the gas.
material of the substance but also on the temperature of For example, if l1 = N and l2 = l3 = · · · = l p = 0, then the
the material during the addition of energy. gas is in microscopic state (N , 0, . . . , 0), and we can write
Let us consider a fixed substance. Consider a sequence the total energy of the gas as E = pε ≈ N ε = l1 ε. The ap-
T1 , . . . , Tn of temperatures. Denote by Tk the change in proximation follows since all N molecules have energy
temperature from Tk to Tk+1 , and by E k the correspond- values in interval J1 , which has the number ε as an end
ing change in thermal energy of the substance. Now we point. So each molecule has energy approximately equal
define the heat capacity of the substance at temperature Tk to ε. For an arbitrary microscopic state (l1 , . . . , l p ), we
as have an approximation of E given by
C(Tk ) = E k / Tk . (13)
p
E≈ li x i , (16)
Using the assumption of differentiability mentioned in our i=1
discussion of entropy, we can then pass to infinitesimally
small increments in Eq. (13) to arrive at the definition for where each xi is an arbitrary value in subinterval Ji .
the heat capacity of the substance at temperature T as It is assumed that the molecules are distinguishable,
so that there are many ways a specific microscopic state
C(T ) = dE(T )/dT. (14) can be achieved. In other words, interchanging one of
The specific heat of a substance at temperature T is the molecules in subinterval Jm with one in subinterval
defined as its heat capacity per unit mass. Thus, a substance Jn does not change the microscopic state, which depends
of mass m has specific heat at temperature T given by only on the p-tuple (l1 , . . . , l p ). It can be computed from
standard methods in combinatorics that the total number
c(T ) = C(T )/m. (15) of ways the molecules can be assigned among the subin-
Although we have defined heat capacity and specific tervals to achieve a given microscopic state (l1 , . . . , l p ) is
heat as functions of temperature, one often hears a phrase W (l1 , . . . , l p ) = N !/l1 !l2 ! · · · l p ! (17)
such as “the specific heat of copper” without mention of
temperature. That is because the values of the heat capacity Now if W (l1 , . . . , l p ) is a high number, many assign-
of most substances are nearly constant over a range of ments of the molecules to the subintervals result in state
room temperatures, and it is this constant value that is (l1 , . . . , l p ). If it is a small number, few assignments result
often called the heat capacity of the substance. in that state. The common way to express this in physics is
to say that W (l1 , . . . , l p ) is the probability for microscopic
state (l1 , . . . , l p ) to occur. A state where W (l1 , . . . , l p )
F. Boltzmann Statistics
achieves a maximum value is a state “most likely to occur,”
In the late 19th century Ludwig Boltzmann undertook and that is called a state of statistical thermal equilibrium.
the task of providing a more precise relationship between Another way to describe W is to call it a measure of the
changes in mechanical motions of molecules and changes “disorder” of the molecules. States highly likely to occur
in entropy. His theory rested on notions of distributing are those having coordinates nearly equal. They are said
P1: LLL/GJK P2: GSS Final Pages
Encyclopedia of Physical Science and Technology EN013A-628 July 27, 2001 10:24
to be in a state of high disorder. That is why it is often said Suppose we build such a furnace and heat the interior of
that entropy is a measure of the disorder of the universe. it and hold it at constant temperature T (degrees Kelvin).
Now if we have two systems that we combine into one The particles in the furnace, dust, for example, and the par-
(e.g., two gases in the same vessel), then an increase in ticles making up the walls of the furnace will absorb and
entropy S1 and S2 of the systems individually must result radiate electromagnetic energy until the interior reaches
in an increase S1 + S2 of the entropy of the combined a state of equilibrium. That is when each particle is ab-
system. On the other hand, if W1 is the probability for sorbing exactly the same amount of energy it is radiating.
a given state for one system with entropy S1 to occur, In reality, if a tiny bit of radiation energy escapes through
and W2 is the probability for a given state for the other the hole, more energy is being emitted than absorbed by
system with entropy S2 to occur, then the probability for some particles, but we can ignore that tiny difference and
the combined system to be in both states simultaneously consider the electromagnetic energy in the furnace to be a
is the product W1 W2 . So the relationship between entropy field of electromagnetic energy in a state of equilibrium.
S and the probability W of a microscopic state of a system We now wish to measure the intensity per unit volume
must be the one that relates multiplication to addition, of the electromagnetic energy in the furnace. In practice
we can do this by letting a small amount of electromag-
S = k ln W, (18) netic energy radiate from the box through the hole. When
where k is a constant. This is Boltzmann’s key entropy this was done in the late 19th century, it was found that
equation, although he never wrote it this way. The constant the intensity varied with the temperature T and the fre-
k, now known as Boltzmann’s constant, was the subject quency ν of the radiation. This can be observed, for ex-
of much research at the beginning of the 20th century. ample, by shining the light from the furnace through a
Observe that, at thermal statistical equilibrium for a prism and measuring the intensity of the light at different
given energy, W is a maximum and hence S is also. A frequencies. Other variables were considered. For a given
macroscopic state of statistical thermal equilibrium, there- temperature and frequency the size and the shape of the
fore, is a state of maximum entropy for a given energy, and furnace were varied. Furnaces were built from a variety of
that macroscopic state can occur for many different mi- materials, and particles of different materials were placed
croscopic states. in the furnaces. None of these changes affected the in-
Let us conclude our introduction to Boltzmann statis- tensity of the electromagnetic energy, which appeared to
tics by mentioning a step that Boltzmann took whenever depend only on frequency and temperature. Thus, it was
he applied his calculations. By using the assumption of hypothesized that there exists an expression I for the en-
continuity we discussed in Section I.D, he let the number ergy intensity in a unit volume in the furnace as a function
of subintervals of the energy interval go to infinity and the of frequency ν and temperature T that is fundamental to
size of each subinterval go to zero. As we shall see, the blackbody radiation.
success of quantum theory rested on not taking that step. In the late 19th century there began an intense and frus-
trating search for the function I based on a sound theoreti-
cal model and verified by experiment. Well-known laws of
classical electromagnetic radiation, thermodynamics, and
II. BLACKBODY RADIATION: THE BIRTH
mechanics were blended together to give formulations for
OF QUANTUM THEORY
I that were patently absurd. On the other hand, expressions
for I extrapolated from experimental data were accurate
A. The Blackbody Furnace
to various extents but stood wholly without theoretical
Particles absorb and radiate electromagnetic energy. Some foundations. The search for an explanation of blackbody
absorb very little incident radiation (reflectors), and others radiation led to the birth of quantum theory.
a great deal (absorbers). Objects can be made to radiate
electromagnetic energy when they are heated. We see this
B. The Extrapolated Formulas
when we watch glowing coals in a dying fire, for light
is a form of electromagnetic radiation. The most intense Let us begin the search for the function I in the same way
energy is radiated by objects that are the best absorbers. it was begun by physicists around the turn of the century:
A perfect absorber is called a blackbody. by examining experimental data. For various temperatures
Physicists can simulate a blackbody by building a box T1 , . . . , Tn , we plot the observed values for I (ν, Tk ) as a
and placing a tiny hole in one side of the box. The hole is function of ν. Four such plots are illustrated in Fig. 1.
the blackbody, because nearly all electromagnetic energy For a specific temperature T , if I (ν1 , T ) and I (ν2 , T )
falling on the hole will enter into the box and not emerge are the energy intensity per unit volume at frequencies
again. We shall call such a box a blackbody furnace. ν1 and ν2 , respectively, then the total energy intensity in
P1: LLL/GJK P2: GSS Final Pages
Encyclopedia of Physical Science and Technology EN013A-628 July 27, 2001 10:24
a wide range of temperatures and frequencies. Convinced To simplify notation we shall now drop the symbols ν and
that he had the “right” radiation formula, Planck then set T in what follows, bearing in mind that we have N oscil-
about finding the precise mechanical model for the furnace lators, all at equilibrium at temperature T and vibrating
and a theory about how it operated to yield this result. with frequency ν.
According to Boltzmann’s theory, if a system is in a
D. The Quantum Theory given microscopic state with probability W , then the total
of Blackbody Radiation entropy S is proportional to ln W ,
If our basic assumption is that the furnace and the particles S = k ln W, (27)
in it are composed of linear oscillators, two questions arise
where k is a constant. The task now is to calculate W for
immediately: How many oscillators are there, and how
our system of oscillators. This will be the total number of
are the total radiant energy and total entropy of the furnace
ways the p energy elements can be distributed among the
distributed among the oscillators?
N oscillators. Note that, although we are taking a cue from
We shall answer the first question, as did Planck, in the
Boltzmann, our approach differs in that our “states” reflect
simplest way, by assuming that there are finitely many os-
distributions of p numbers among N oscillators, whereas
cillators for each frequency ν, say Nν . If we write S(ν, T )
Boltzmann’s states reflect distributions of N molecules
for the entropy of a linear oscillator in thermal equilib-
over p intervals of real numbers.
rium at temperature T and frequency ν, and E(ν, T ) for
Let us count the possible distributions of energy ele-
its average energy, then we have for the total entropy and
ments among oscillators. Suppose, first, that one oscillator
total energy of all oscillators having frequency ν
is assigned all p energy elements. There are N ways that
Stot,ν = Nν S(ν, T ) and E tot,ν = Nν E(ν, T ). (24) this distribution could occur. That is each of the N oscil-
lators in turn could be assigned all the energy. Another
Our search for E continues on the basis of a relation be-
distribution might be to assign all but one energy unit to
tween E and S.
one oscillator and the remaining one to another. There are
If we solve Eq. (23) for T and use Eq. (12), which relates
N (N − 1) ways to achieve this distribution. It is a routine
energy to entropy, by writing d S(ν, T )/d E(ν, T ) = 1/T ,
calculation in combinatorics to show that the total number
we can perform an integration to obtain the entropy for a
of ways to achieve all possible distributions is
single oscillator,
W = (N + p − 1)!/ p!(N − 1)! (28)
h [1 + E(ν, T )/ hν]1+E(ν,T )/ hν
S(ν, T ) = ln + D,
B [E(ν, T )/ hν] E(ν,T )/ hν Since N is very large for practical purposes, one can ne-
(25) glect the subtraction of 1 in Eq. (28) to obtain
where D is a constant of integration. We shall return to
W = (N + p)!/ p!N ! (29)
this equation later.
The next step was described by Planck as “an act We simplify W further by using Stirling’s formula x! ≈ x x
of desperation.” He had been a dedicated opponent of to obtain
Boltzmann’s view that a total amount of energy in a gas
was distributed among a finite number of molecules ac- W = (N + p) N + p /N N p p . (30)
cording to laws of probability and combinations. Laws We now have two expressions for the total entropy of the
of probability permit exceptions, and Planck thought that system. After an algebraic manipulation using Eq. (25),
such laws had no place in a theory of thermodynamics, we obtain the total entropy (for frequency ν and tempera-
which he held to be valid without exceptions. Yet now ture T ):
he saw that such a view point could be taken to move
him along toward a derivation of his own radiation law. Stot = NS
With a flexibility that often distinguishes the creative ge-
Nh E E
nius from the pedant, Planck adopted ideas of Boltzmann = 1+ ln 1 +
B hν hν
that he had previously rejected. Let us continue following
Planck’s reasoning. E E
− ln + D. (31)
To obtain discrete quantities of energy to distribute over hν hν
our oscillators, we divide the total energy E tot,ν into p
At the same time we have directly from Eq. (27), using
elements of size ε, so that
Eq. (30), a bit of algebraic manipulation, and the substi-
E tot,ν = pε = Nν E(ν, T ). (26) tution p/N = E/ε from Eq. (26):
P1: LLL/GJK P2: GSS Final Pages
Encyclopedia of Physical Science and Technology EN013A-628 July 27, 2001 10:24
revolutionized physics. The “discontinuity” (more accu- function of ν, but it will make our subsequent calculations
rately, the “discreteness”) of the energy variable and the easier to follow.
statistical nature of the behavior of discrete energy quanta We shall follow Einstein and use Wien’s expression
were ideas that were to become the foundations of a new for I ,
and controversial view of the universe.
I (ν, T ) = aν 3 exp(−bν/T ) (21)
Albert Einstein, of course, was as important to quan-
tum theory as he was to nearly every other development of instead of Planck’s, even though it was well known that
physics in the early 20th century. Sometimes a friend and Wien’s law was valid only for large values of ν/T . Solving
sometimes a foe of the rapidly evolving quantum theory, he Eq. (21) for 1/T , we obtain
made important contributions to it and, merely by paying
1/T = −ln[I (ν, T )/aν 3 ]/bν. (36)
attention to it, helped to spur the interest of the scien-
tific community. Let us now discuss two ideas of Einstein Then, using the fact that the derivative of entropy with
that were instrumental in placing the “quantum” in the respect to energy is equal to 1/T , we get
forefront of physics.
∂ S(I, ν)/∂ I = 1/T. (37)
We then substitute Eq. (36) into Eq. (37) and integrate
B. Einstein’s Theory of Light Quanta
with respect to I to obtain the entropy function
Einstein set forth his theory of light in one of three papers
−I (ν, T )[ln(I (ν, T )/aν 3 ) − 1]
published in 1905. He won the Nobel prize for that paper, S(I, ν) = . (38)
and although it is commonly referred to as his paper on bν
the “photoelectric effect,” it really was much more. Now consider a volume V in the cavity, and suppose that
Einstein was disturbed by the dualism in physics be- the radiation is nearly monochromatic, say of frequency
tween particle mechanics and the electromagnetic wave ν. The radiation energy in volume V for frequency ν is
theory of Maxwell. The fundamental difference was the given by another spectral density function,
discrete nature of the former as opposed to the continuous E(ν, T ) = VI(ν, T ). (39)
nature of the latter. The continuous wave theory of light
was inadequate to explain some experimental phenomena. The entropy in volume V for frequency ν is then found
For example, it was known that ultraviolet light incident by solving Eq. (39) for I (ν, T ) and substituting into
on a piece of metal causes electrons to be emitted from the Eq. (38):
metal. Contrary to electromagnetic wave theory, however,
S(E, ν) = VS (I, ν)
the velocity of an emitted electron does not depend on the
intensity (wave energy determined by amplitude) of the −E(ν, T )[ln(E(ν, T )/V aν 3 ) − 1]
= . (40)
incident light, but instead is a function of its frequency. bν
This phenomenon is known as the photoelectric effect. Next we recalculate a new entropy for a volume V
One of Einstein’s motives for investigating light was smaller than V . For the same energy E(ν, T ), substitute
to explain this effect, although his bold explanation was V into Eq. (40) to calculate S . Then the entropy decreases
such a violent departure from accepted theory of the wave by an amount
nature of light that it had implications far beyond consid-
−E(ν, T ) ln(V /V )
eration of the photoelectric effect. In particular, it had a S(E, ν) − S (E, ν) = . (41)
profound influence on the development of quantum the- bν
ory, although, ironically, Planck rejected Einstein’s theory Now we relate Eq. (41) to the key relation of Boltzmann
of light. statistics, S = k ln W , which applies to an ideal gas in a
We begin our development of the theory of light by closed container. (Einstein used this equation in his paper
returning to light radiation in the cavity of a blackbody of 1905 but did not express it using the letter k.) For a
furnace in thermal equilibrium at temperature T . As be- change of the gas from one state W to a state W , the
fore, consider the spectral density function I (ν, T ), which corresponding change in entropy is
gives the energy intensity per unit volume in the cavity
S − S = −k ln(W /W ), (42)
due to the portion of radiation having frequency ν.
Let us write S(I, ν) for an entropy density function, where W is interpreted as the “probability” or likelihood
which gives the entropy per unit volume as a function of for the given state to occur.
energy intensity and frequency ν, and consider that our Boltzmann statistics applies to a finite number of
first task is to find the correct expression for S(I, ν) in molecules in a gas. We are not considering a gas in black-
terms of I and ν. Our notation is a bit redundant, since I is a body radiation, nor are we explicitly using Planck’s model
P1: LLL/GJK P2: GSS Final Pages
Encyclopedia of Physical Science and Technology EN013A-628 July 27, 2001 10:24
of finitely many “oscillators.” What then shall we consider radiation law, was limited to the range of frequencies and
as the meaning of a “state” in our context? Returning to temperatures for which Wien’s law was valid.
our derivation of Eq. (42) in terms of a change in volume Einstein’s approach to quantization differed from
from V to V , let us recall that, if N particles (of any- Planck’s in a fundamental way. Planck assumed the quan-
thing) are allowed to travel freely in a vessel of volume V , tization of energy to derive his radiation law, therefore
then the probability of finding, at any given instant, all N showing that the quantum hypothesis was sufficient to
particles in a portion of the vessel having volume V < V obtain the law. Einstein, on the other hand, started with
is Wien’s radiation law and showed that one of its necessary
consequences was the quantization of monochromatic ra-
(V /V ) N . (43) diant energy.
The corresponding decrease in entropy of the system of It is also worth mentioning the advantage Einstein’s
particles from the state of random distribution to the state theory had over Planck’s in that it did not involve oscil-
of all particles in the smaller portion is thus lators and so did not directly contradict the equipartition
theorem.
S − S = −k ln(V /V ) N Let us now look briefly at the data predicted by
= −N k ln(V /V ). (44) Einstein’s theory and verified to a high degree of accuracy
by R. A. Millikan in 1916, about 10 years after Einstein
Now we argue backward from Eqs. (41) and (44). If we published his paper. We shall denote the maximum ki-
take the view that Eq. (41) gives the change in entropy netic energy of an electron emitted by a light quantum of
accompanying a decrease in the volume of the blackbody frequency ν as
cavity from V to V , and we assume that the radiation
energy E(ν, T ) is distributed among N independent par- KE = kbν − E , (46)
ticles of some sort, which are allowed to move freely in
where E is the amount of energy necessary to remove the
the cavity, then the right side of Eq. (41) must equal the electron from the metal. Note that this kinetic energy is a
right side of Eq. (45). We then conclude that linear function of ν. When accurate date are plotted as a
linear graph of KE versus ν, we can measure the slope of
N k = E(ν, T )/bν or E(ν, T ) = N kbν. (45)
the line, which turns out to be Planck’s constant h. Since
That is, monochromatic radiant energy behaves as if it the nature of the metal affects the kinetic energy of the
were composed of independent energy “quanta” of mag- electron only in the additive constant E , the slope h ap-
nitude kbν. pears to be a universal constant. In other words, kb = h in
Einstein immediately suggested that the same reasoning Eq. (46), so that the energy of a light quantum of frequency
could be applied to the radiation of light and thus fired ν is hν.
another shot in the 20th century revolution in physics by Although Einstein and Planck were at odds over the fun-
proposing a return to the particle theory of light: a view damentals of each other’s work, this constant h provided
considered dead nearly a century. This proposal was so an undeniable link between their theories and a powerful
radical that even Planck, who never considered himself a motivating force for further investigations into its role in
revolutionary anyway, rejected it. nature.
Let us see how Einstein’s theory of light quanta pro-
vided an explanation of the photoelectric effect. If an elec-
C. The Quantum Theory of Specific Heat
tron in a metal strip is set free by the energy of a quantum
of light incident on the metal, then the kinetic energy of At the same time Einstein put forth his theory of light he
the electron is equal to at most the energy of the incident also proposed a theory of specific heat of solids that was to
quantum, which is proportional to the frequency of the become another pillar of quantum theory. His work further
light. This kinetic energy is less than the energy of the in- widened the range of applicability of quantum concepts.
cident light quantum, according to the amount of work re- Let us return to Planck’s radiation law,
quired to overcome the forces tending to keep the electron
E(ν, T ) = hν/[exp(hν/kT ) − 1] (34)
bound to the metal. Moreover, increasing light intensity
increases the number of light quanta incident on the metal for the average (over time) energy of an oscillator of
and hence increases the number of electrons freed; but the frequency ν in a blackbody furnace in equilibrium at
energy of the freed electrons is dependent solely on the temperature T . Recall that a consequence of Planck’s
frequency of the light. derivation of his radiation law is that the oscillators in the
Einstein was careful to point out that his light quan- radiation field can have energy values only in the discrete
tum hypothesis, which was motivated in part by Wien’s range 0, hν, 2hν, . . . , for these are the values of the energy
P1: LLL/GJK P2: GSS Final Pages
Encyclopedia of Physical Science and Technology EN013A-628 July 27, 2001 10:24
elements he uses to distribute over the oscillators to make as a key energy value with an existence not immediately
his statistical calculations. Einstein understood that this tied to linear oscillators. Second, as we shall see later, this
implied a revolutionary principle that in turn implied a generalized derivation of Eq. (34) enabled Peter Debye to
modification of all theories of physical phenomena dealing use it as a point of departure to obtain Planck’s radiation
with exchanges of energy between radiation and matter. law without the fundamental inconsistencies contained in
In particular, the theory of heat, based on the equiparti- Planck’s derivation. Finally, the formulation of φ as a prin-
tion law, was at odds with experimental evidence on the ciple applying to all systems involving interaction between
specific heat of solids. He sought a new theory of spe- matter and radiation elevated the status of quantum theory
cific heat without the equipartition law, similar to the way from an ad hoc assumption about oscillators to a scientific
Planck had disregarded equipartition in his theory leading principle.
to Eq. (34). Let us apply Eq. (34) to vibrating atoms, which deter-
Although Einstein used Eq. (34) in his theory of specific mine the heat capacity of a solid. If an atom has three
heat, it is important to note that he provided his own deriva- independent vibrational degrees of freedom, all vibrating
tion of it. To get at the heart of Einstein’s derivation, let with the same frequency ν, then it can be considered to be
us consider a vibrating physical system (e.g., a linear os- three independent vibrating systems, and so the energy of
cillator) that can exist in various states of thermodynamic each atom is calculated from Eq. (34):
equilibrium, while vibrating at frequency ν. Let us denote
E(ν, T ) = 3hν/[exp(hν/kT ) − 1]. (48)
by φ(E, ν) the energy density function, which, when in-
tegrated between limits E 1 and E 2 , gives the number of The energy of 1 g-atom of the solid is therefore
states that have energy between those limits. Let P be a
E(ν, T ) = 3N hν/[exp(hν/kT ) − 1], (49)
probability density function: P(E, T ) is the probability
that the system is in a state of energy E when at equilib- where N , known as Avogadro’s number, is the number of
rium at temperature T . Then the energy of the system is atoms in 1 g-atom of the solid in question. From Eq. (14),
given by the expected value the heat capacity of the solid is then
∞
EP(E, T )φ(E, ν) dE dE tot (ν, T ) 3N k(hν/kT )2 exp(hν/kT )
E(ν, T ) = 0 ∞ . (47) = . (50)
[exp(hν/kT ) − 1]2
0 P(E, T )φ(E, ν) dE
dT
We now arrive at Einstein’s characterization of φ, which The formula for the specific heat of the solid then follows
defines a principle applicable to all systems involving in- by dividing Eq. (50) by the mass of the solid. This re-
teraction between vibrating matter and electromagnetic sult was recognized by Einstein as subject to correction
radiation. because it rested on serveral simplification assumptions.
There exists a positive number r , very small compared In spite of its need for correction, Eq. (50) represents
with hν (h is Planck’s constant), and a sequence of inter- the first application of quantum theory to solids. What is
vals of numbers In = [nhν, nhν + r ] such that: more important, historically, is that it established that the
principle of quantum theory reached beyond one or two
specific physical phenomena. This was the view of Walther
1. φ(E, v) = 0 only for values of E lying in the intervals
Nernst, who obtained solid evidence of Einstein’s formula
I0 , I1 ,. . ..
for specific heat and whose eloquent praise of quantum
2. In φ (E, ν) dE is equal to the same constant for all
theory around 1910 was instrumental in the organization
integers n = 0, 1, 2, . . . .
of the first Solvay Congress in 1911. That was a conference
of distinguished physicists gathered together in Brussels
Figure 2 shows intervals In for n = 0, 1, 2, 3.
for the purpose of discussing the “new” quantum theory.
From this formulation of φ and the use of expres-
sion P(E) = exp(−E/kT ) (from statistical mechanics)
D. Debye’s Derivation of Planck’s
in Eq. (47), Einstein was able to derive Eq. (34).
Radiation Law
What is accomplished by this formulation of φ is the
restriction of nonzero energy values to a set of small inter- As we have noted, Planck’s theory leading to his radiation
vals: again, almost a “quantization” of energy. Much more law contains a fundamental inconsistency made clear
is also accomplished, however. First, the value hν emerges by Einstein: the requirement that oscillator energy by
FIGURE 2 Intervals of nonzero values of energy density φ(E, ν) in Einstein’s derivation of the law of blackbody radiation.
P1: LLL/GJK P2: GSS Final Pages
Encyclopedia of Physical Science and Technology EN013A-628 July 27, 2001 10:24
continuously variable for the formula I (ν, T ) = we can write that the number of integer points (k, l, m) in
aν 2 E(ν, T ) and that it be discontinuously variable for the Q(r, r ) (one-eighth of a spherical shell) is
formula E(ν, T ) = hν/[exp(hν/kT ) − 1]. Nevertheless,
Nr = 4πr 2 r/8. (54)
the unqualified success of the radiation formula itself in
matching experimental results, together with the success- Each point with integer coordinates determines a mode
ful application of the fundamental idea of quantization to of vibration of a particular wavelength, although it is clear
other realms of physics, prompted great efforts to remove from Eq. (51) that several points (modes) can correspond
the inconsistency. This was finally accomplished by Peter to the same wavelength. In addition, in electromagnetic
Debye in 1910. Let us trace Debye’s argument. radiation each mode can be polarized in one of two ways,
The key to our success will be the elimination alto- so that we must double the right-hand side of Eq. (54) to
gether of the need for oscillators. Instead, we consider the account for all possible modes of vibration corresponding
cavity of a blackbody furnace to be a resonating chamber to Q(r, r ). Now let us use Eq. (53) to rewrite Eq. (54)
and draw an analogy between the radiation waves in the in terms of frequency, replacing r with (2L/c) ν, to
electromagnetic field in the cavity and the vibrations of obtain
an elastic fluid in an enclosed container. This idea was,
N ν = 8π v 2 L 3 ν/c3 . (55)
in fact, proposed in 1900 by J. W. Strutt, better known as
Lord Rayleigh, who was an expert in sound and likened Finally, we allow ν to become infinitesimally small
blackbody radiation to sound waves. and divide by L 3 to obtain a density function giving the
It is known from the theory of sound that a cubic box density of modes of vibration of the electromagnetic field
of volume L 3 supports standing sound waves of vibration per unit volume in the blackbody cavity as a function of
only in modes of wavelength frequency:
λ = 2L/ k 2 + l 2 + m 2 , (51) φ(ν) = 8π ν 2 /c3 . (56)
where k, l, and m are positive integers. The next step is to consider a unit volume of the cavity
Let R 3 stand for three-dimensional space. For each posi- in thermal equilibrium at temperature T and the total elec-
tive number r , consider the sphere of radius r centered at tromagnetic energy I (ν, T ) due to radiation of frequency
the origin and let Sr denote the segment of the sphere in ν. Then we partition this total energy into discrete packets
the first octant. Now if a mode of vibration in the cubic box of value E(ν, T ) and distribute these packets among the
with wavelength λ given by Eq. (51)√is associated with a modes associated with ν according to Eq. (56) to obtain a
point (k, l, m) in R 3 of distance r = k 2 + l 2 + m 2 from spectral density function for energy:
the origin, we have from Eq. (51) that I (ν, T ) = φ(ν)E(ν, T ) = 8π ν 2 E(ν, T )/c3 . (57)
λ = 2L/r. (52)
Thus, we arrive once again at Eq. (22), this time with-
In terms of frequency ν = c/λ (c is the speed of light, the out recourse to classical theory of linear oscillators. To be
speed at which we assume electromagnetic waves propa- sure, this step is no less bold than Planck’s original “act
gate), we can rewrite Eq. (52) as of desperation” cited in Section III.D. Debye was encour-
aged to use this assumption of quantized energy by the
r = 2Lν/c. (53)
derivation of Eq. (34) in Einstein’s theory of specific heat.
Let us now consider an interval of numbers [r, r + r ], To obtain Planck’s radiation law it is now necessary
and ask how many points (k, l, m) with integer coordi- only to consider each mode of vibration to contribute the
nates lie in the region Q(r, r ) between Sr and Sr + r . If quantized value E(ν, T ) = hν/[exp(hν/kT − 1].
r is very large, which it is for the very short wavelengths As we mentioned above, Debye’s derivation of the radi-
in blackbody radiation, then this number of points can be ation law is fundamentally more attractive than Planck’s
approximated reasonably by the volume of Q(r, r ). To because, by avoiding arguments based on classical har-
see this, imagine a large region of space filled with unit monic oscillators, it is possible to avoid the inconsistency
cubes centered at the points in the region with integer co- in Planck’s theory cited by Einstein. By the same token,
ordinates. These cubes fill space and each contains exactly this derivation is noteworthy for its lack of any mechni-
one point with integer coordinates. The larger the region, cal model to account for the radiation. It is necessary in
the better we can approximate its volume with such unit Debye’s approach only to break up the variable, energy,
cubes. In other words, points with integer coordinates oc- into discrete quantum values and distribute those values
cupy space with a density of one per unit volume. over modes of vibration of a “vibrating” electromagnetic
Now the volume of Q(r, r ) can be approximated by field. The quantum theory, the assumption that the physical
the surface area of Sr times the thickness of Q(r, r ). So quantity, energy, is a discontinuous variable, appears here
P1: LLL/GJK P2: GSS Final Pages
Encyclopedia of Physical Science and Technology EN013A-628 July 27, 2001 10:24
as a fundamental law of nature, and the role of the clas- plicated interaction among the orbiting electrons, but no
sical mechanical model as nature’s fundamental building such escape clause can exist in Rutherford’s theory for the
block is thereby diminished. hydrogen atom.
Debye’s work is noteworthy for another reason. It out- Niels Bohr, a Danish physicist who had visited
lines the basic pattern that characterizes all applications Rutherford’s laboratory in Manchester in 1912, applied the
of what we have called the quantum theory. That is, to quantum theory to the planetary model of the atom. Bohr’s
apply the quantum theory to a physical phenomenon, first theory can be conveniently outlined in four postulates.
find a description of the phenomenon that involves both Like Planck’s postulate about harmonic oscillators, some
energy E and frequency ν. Second, set energy propor- of Bohr’s postulate were controversial because they were
tional to frequency, E = hν, where h is Planck’s constant. ad hoc, without explanation based on classical physics.
All applications of early-20th-century quantum theory are Like Planck’s postulate, however, they resulted in experi-
variations on this theme. If E = mc2 has become the inter- mentally verifiable calculations, and so they were difficult
nationally recognized equation symbolizing the theory of to ignore. Furthermore, his postulates not only addressed
relativity, then E = hν might well deserve the same status the question of why atoms do not collapse but also intro-
as the symbol of quantum theory. duced new mixtures of ideas relating waves to particles
that were to become general principles of physics. Let us
now examine Bohr’s theory of the atom by paraphrasing
his postulates.
IV. THE BOHR ATOM
a bundle of radiant energy emitted from an atom, racing We then apply classical laws of mechanics [Eq. (8)] to
through space with many other bundles in a wavelike beam arrive at the kinetic energy of the electron:
of radiation, and each bundle carries a message about the
frequency of the beam according to Eq. (58). This idea KE = Z e2 /2r. (61)
was already well known for Einstein’s light quanta, of
course, but now it appears again in another context. The The potential energy of the electron at every point in the
generalization of the idea of these “wave–particles” is one orbit due to the attraction of the nucleus is
of the cornerstones of quantum mechanics. PE = −Z e2 /r. (62)
Bohr’s fourth postulate is directly related to the appli-
cation of Rutherford’s model to the study of atomic spec- This is the negative of the amount of work required to
tra: radiation emitted from an electrically charged gas. remove the electron from its orbit to a theoretical infi-
When an electric charge is passed through a gas, neon, nite distance from the nucleus. Thus, the total mechanical
for example, the gas emits electromagnetic radiation, pre- energy is
dominantly red light in the case of neon. The radiation
emitted covers a continuous set of frequencies, but it is E = KE + PE = −Z e2 /2r. (63)
much stronger at a certain discrete set of frequencies than
it is at others, giving rise to the terms continuous and dis- The absolute value of E is the (positive) amount of energy
continuous spectra, the latter being the set of frequency required to bind an electron in its orbit, and we shall denote
values at which radiation is very strong. We shall again it, as is customary, by W . We then observe that the value
use the more traditional word discontinuous in place of of W equals the kinetic energy of the electron given by
discrete. If we pass the beam of radiation from a glowing Eq. (61). Then, writing that kinetic energy in its classical
gas through a prism that separates the waves according to form in terms of the rotational frequency ω of a particle
frequency, we can observe part of the spectrum, the visible of mass m moving in circular orbit of radius r , we have
part, as in Fig. 3. The light lines indicate intense light at
frequencies in the discontinuous part of the spectrum. W = |E| = KE = 2π 2 mr 2 ω2 . (64)
The spectral lines were the object of much research near
This formula for binding energy was adjusted by Bohr
the end of the 19th century. This research resulted in em-
to account for noncircular orbits and rotating nuclei, but
pirical formulas relating the frequencies of the discontinu-
our simplified form is sufficient to continue with our story
ous spectra to sets of positive integers. One such formula,
of the Bohr atom.
developed by a Swiss schoolteacher, Johann Balmer, and
The next step is based on the pattern characteristic of
reformulated by Janne Rydberg in 1890 (Rydberg claimed
applications of quantum theory mentioned at the end of
originality) can be written
Section III. We seek a relation between energy W and
ν = R(1/n 2 − 1/m 2 ), (59) orbital frequency ω and Planck’s constant h. Bohr actu-
ally provided several plausibility arguments leading to the
where R is a constant, known as Rydberg’s constant, n fourth postulate, some more convincing than others. We
and m are positive integers (with m > n), and ν is the shall follow two of those arguments, one because it is sim-
frequency of a line in the discontinuous spectrum. For ple, the other because it is based on an important philo-
n = 2 and m = 3, 4, 5, . . . , frequencies given by Eq. (59) sophical principle. Remember, however, that these are just
had been observed for the spectrum of hydrogen gas. heuristic arguments. All of Bohr’s postulates are ad hoc
Let us now move to Bohr’s fourth postulate and see assumptions.
how it leads to an explanation of empirical formula (59). We begin the first heuristic argument by establishing a
Consider an electron in circular orbit of radius r in an atom relation between orbital frequency ω and the frequency ν
of atomic number Z . At each point in the orbit the force of radiation emitted during a change between stationary
on the electron due to the attraction of the nucleus is orbits. Assume that in the process of binding a free electron
to the nucleus a binding energy quantum of frequency
F = Z e2 /r 2 . (60) ν is required. If the orbital frequency resulting from the
P1: LLL/GJK P2: GSS Final Pages
Encyclopedia of Physical Science and Technology EN013A-628 July 27, 2001 10:24
binding process is ω, then ν should be the average of ω for (with smaller orbit), which corresponds to a decrease in
the orbiting electron and zero for the free electron. Thus, mechanical energy from a negative value E n+1 to a greater
negative value E n .
ν = ω/2. (65)
Then from Eq. (58) we have that
Now we can state Bohr’s fourth postulate.
ν = (E n+1 − E n )/ h
4. Postulate IV = (−Wn+1 + Wn )/ h, (72)
An atom with one electron can exist in stationary states of which from Eq. (71) gives us
equilibrium indexed by natural numbers n = 1, 2, 3, . . . .
π 2 me4 Z 2 1 1
In the state associated with number n, the electron orbits ν= −
about the nucleus and the mean value of its kinetic energy 2α 2 h 3 n2 (n + 1)2
in that orbit is given by π 2 me4 Z 2 2n + 1
= . (73)
KE = nhω/2, (66) 2α 2 h 3 n 2 (n + 1)2
where ω is its orbital frequency and h is Planck’s constant. This brings us to the correspondence principle:
Note that this postulate is again a variation on the theme
E = hν. For very large quantum numbers n, the quantum theory fre-
quency of radiation ν should correspond to the classical theory
frequency ω of radiation from a charged particle in a circular
C. The Correspondence Principle
orbit of orbital frequency ω.
Next we shall examine a second argument leading to
Eq. (66) based on what is now called the correspondence In other words, the frequency ν given by Eq. (73) should
principle. We begin by assuming that the energy W for equal the frequency ωn given by Eq. (70) for large values
binding an electron into stationary orbit of frequency ω is of n.
proportional to hω. We then suppose that there is an orbit Now for large values of n, (2n + 1)/n 2 (n + 1)2 is
of lowest binding energy W = αhω, where α is a propor- approximately equal to 2/n 3 . If we therefore replace
tionality constant, and that all other orbits have binding (2n + 1)/n 2 (n + 1)2 in Eq. (73) with 2/n 3 and use the cor-
energies W1 , W2 , . . . given by respondence principle to set ν = ωn , we arrive at α = 12 .
We now take another bold step by declaring that what is
Wn = nαhω, (67)
true for large n must be true for all n. This will complete
where n is a natural number called the quantum number our second heuristic argument leading to postulate IV, for
for the stationary state. The state with n = 1 is called the if the kinetic energy KE of a rotating electron in an orbit
ground state of the atom. of quantum number n is to equal the binding energy Wn ,
Let us solve Eq. (64) for ω2 to obtain we have from Eq. (67) (recall that α = 12 ) that
ω2 = W/2π 2 mr 2 (68) KE = nhω/2. (74)
and replace r using Eq. (61) and the fact that W = KE
2
The correspondence principle has been generalized and
to obtain restated many times since it was first applied by Bohr
in 1913. A form of this principle is another one of the
ω2 = 2W 3 /π 2 me4 Z 2 . (69)
cornerstones of quantum mechanics. That is why we have
Now if we index each orbit by its quantum number n traced this last argument leading to Eq. (74).
to rewrite Eq. (69) using ωn and Wn in place of ω and W ,
we can substitute Eq. (67) into Eq. (69) and solve for ωn :
D. Consequences of Bohr’s Theory
ωn = π 2 me4 Z 2 /2α 3 n 3 h 3 (70)
First, note that we can rewrite Eq. (74) in terms of angu-
Using Eq. (67) once more, we obtain lar momentum l of an orbiting electron with kinetic en-
Wn = nαhωn = π 2 me4 Z 2 /2α 2 n 2 h 2 . (71) ergy KE. From Newtonian dynamics it can be shown that
l = KE/π ω for orbital frequency ω, and so from Eq. (74)
Now let us use postulate III and consider the quantum we obtain
of radiation with frequency ν emitted during a change of
state from one with quantum number n + 1 to the state with l = nh/2π (75)
quantum number n. This is a change from a state of smaller for stationary orbit with quantum number n. In other
binding energy Wn+1 to one of higher binding energy Wn words, the quantization of the kinetic energy is equivalent
P1: LLL/GJK P2: GSS Final Pages
Encyclopedia of Physical Science and Technology EN013A-628 July 27, 2001 10:24
to the quantization of the angular momentum of the or- in our discussion of Newton’s contributions to science
biting electron. Though it would be much easier to derive (Section I) we cited, as one of the most important, the no-
Eq. (74) by simply assuming the quantization of angular tion that explanations of hidden, unobserved events come
momentum, that derivation completely avoids the corre- from precise measurements of observed events. In fact, it
spondence principle and obscures a major philosophical is this deductive power that some people equate with sci-
point about Bohr’s contribution to quantum theory. ence itself. Yet Werner Heisenberg in 1925 saw that it was
Let us note also the close resemblance between Bohr’s the universal application of this notion that stymied the de-
formula (73) and Rydberg’s formula (59). As we men- velopment of a quantum mechanics. By first accepting the
tioned, Bohr adjusted his value for mass m in Eq. (73) to philosophical viewpoint that the only quantities of physics
account for the combined mass of nucleus and electron. are the observable ones, he was able to produce his kine-
Setting α = 12 in Eq. (73) gave Bohr an equation of the matics of quantum theory, a calculus of “observables.” We
form of Eq. (59) from which he could calculate the value now call it matrix mechanics, although Heisenberg never
R = 2π 2 me4 Z 2 / h 3 . used the word matrix to describe the rectangular arrays of
For hydrogen, where Z = 1, this formula provided ex- numbers that appeared in his equations as the observables.
cellent agreement with the experimentally determined One of the results of Heisenberg’s mechanics was a
value for R, using the best-known value for h at the calculation that involved Planck’s constant in a profound
time. Another major victory had been scored for quan- way. With each observable Heisenberg defined a quantity
tum theory. called the uncertainty of the observable. He then showed
that, for certain pairs of observables, the product of their
uncertainties is at least as large as Planck’s constant. Two
V. TRANSITION TO QUANTUM observables related in this way are called incompatible
MECHANICS observables. A consequence of this “uncertainty princi-
ple” is that the reduction of the uncertainty in one of the
There were several important victories for quantum the- observables necessarily implies an increase in the uncer-
ory between 1913 and 1925. None of them, however, pro- tainty of the other. Heisenberg took these uncertainties to
vided new fundamental principles. So it is at this point be a fundamental fact of nature, not a consequence of the
that we conclude our discussion of quantum theory with inaccuracy of the measuring devices of physicists. Thus,
a brief look at some of the steps that were required to he took one step farther Planck’s reluctant acceptance
make the transition from quantum theory to quantum of the fundamentally probabilistic behavior of oscillators
mechanics. and the implied uncertainty of that behavior. Heisenberg’s
Recall that what we have been calling quantum theory complete break with the classical Newtonian physics of
is really a collection of theories applied to different phe- certainty sparked years of research that continues to this
nomena by mixing variable amounts of classical physics day, not only in physics, but also in philosophy, logic, and
and quantum hypotheses. The three themes of quantum mathematics.
theory—the quantization of energy and the probabilistic Another step in the transition to quantum mechanics
behavior of energy quanta, the wave–particle nature of was the theory of wave mechanics developed by Erwin
some matter, and Planck’s constant—formed an interre- Schrödinger. Working at the same time as Heisenberg in
lated set of ideas that lacked a universality and coherence late 1925, but completely independently, Schrödinger pro-
necessary for them to constitute a scientific theory. Also duced a parameterized set of partial differential equations
lacking was a system of mathematical expressions com- that had solutions only for a discrete set of values of the
mon to all applications of quantum theory from which parameter. The equations involved a differential operator,
one could calculate the values of quantities observed in and Schrödinger was able to show that his equations could
experiments. be applied to any physical system by choosing appropriate
Quantum mechanics, like Newtonian mechanics, was differential operators. He then claimed that, after an oper-
born of the necessity to bring mathematical clarity and ator was correctly chosen to represent a particular system,
order to the chaos of observations of the physical universe. the discrete parameter values for which the resulting equa-
Although Newtonian mechanics brought order to a set of tion had solutions were all the possible values of energy
observations of the continuous, predictable macroscopic for that system. Here, then, was a universally applicable
world, it was inadequate to deal with the new chaos created mathematical formula that helped change quantum theory
by quantum theories of the discontinuous, unpredictable into quantum mechanics.
microscopic world. Although Schrödinger did not supply a convincing the-
One step in the transition from quantum theory to oretical foundation for his equations, he was able to show
quantum mechanics was a philosophical one. Recall that in 1926 that his equations were mathematically equivalent
P1: LLL/GJK P2: GSS Final Pages
Encyclopedia of Physical Science and Technology EN013A-628 July 27, 2001 10:24
to Heisenberg’s matrix mechanics. There was no doubt in quantum mechanics was incomplete because it contained
anyone’s mind that the resulting combined theory, quan- no counterpart for an “element of reality” present in the
tum mechanics, was a major scientific achievement. There objects in their apparatus. They were quite precise in their
was considerable doubt in the minds of some, however, definition of an “element of reality”:
Einstein most prominently, that the new mechanics, with
its philosophical roots in the physics of uncertainty, was If, without in any way disturbing a system, we can predict with
as universally applicable as its proponents claimed. Al- certainty . . . the value of a physical quantity, then there exists
though controversy over its range of applicability persists an element of physical reality corresponding to this physical
to this day, quantum mechanics was the final step that quantity.
brought Max Planck’s “desperate act” to explain black-
body radiation to the status of a full-fledged scientific We shall describe a simplified version of the EPR thought
theory. experiment. Although our presentation is faithful to the
ideas in the EPR paper, we have taken advantage of the
65 years scientists have had to study the paper to reframe
VI. AFTER QUANTUM MECHANICS: the discussion for greater clarity.
MODERN DEVELOPMENTS IN Two quantum mechanical objects (photons or electrons,
FOUNDATIONS OF QUANTUM for example) are sent hurtling through space in opposite
PHYSICS directions by a firing device. A certain measurement is
performed on Object 1 (traveling to the left in the diagram)
The quantum mechanics developed by Schrödinger and at collector A, and a similar measurement on Object 2 at
Heisenberg provided mathematical formulas with which collector B (see Fig. 4).
to calculate physical values but lacked theoretical under- At each collector is a dial which can be set at any angle
pinnings. These were supplied by Max Born, Niels Bohr, from 0 to 360 degrees. We shall explain the use for the dial
and many others in the late 1920s and early 1930s. The the- later. The measurements are arranged so that for every
ories, however, were controversial, and remain so to this setting of the dial, every time a pair of objects is fired,
day. While quantum mechanics has been successful in pre- when an object reaches its collector one of two outcomes
dicting the outcomes of some delicate experiments with is recorded: “yes” or “no.”
photons, electrons, and other nuclear particles, some cru- It is theoretically possible to design the EPR appara-
cial predictions have not been experimentally confirmed. tus using special pairs of objects so that the outcomes
These predictions lie at the heart of the quantum theory of the measurements at the two collectors are corre-
controversy that we will explore in this section. lated if the predictions of quantum mechanics are correct.
The most famous criticism of the theoretical founda- Specifically, quantum mechanics predicts the following
tions of quantum mechanics was published in 1935 by correlation:
Albert Einstein, Boris Podolsky, and Nathan Rosen in a
paper now called the “EPR paper” after the names of its when the dials are set at the same angle, for each pair of ob-
authors. The paper describes a “thought experiment”—an jects fired, if a measurement on Object 1 at collector A registers
experiment using an apparatus which could not be built at “yes,” then one can predict with certainty that the measurement
the time but which could test the theoretical predictions of on Object 2 at B will also register “yes,” whether or not the
quantum mechanics. The authors argued that the theory of measurement at B is actually made.
This implies that Object 2 has an “element of physical that the objects in the EPR apparatus carry codes which
reality.” It is carrying a code of some sort that determines determine how the “yes–no” measurements at every an-
how the measurement at B will turn out before the mea- gle setting of the dials will come out, whether or not the
surement takes place. The roles of Objects 1 and 2 could measurements are actually made. A “local” theory is one
have been interchanged in our description, so, if we con- that prohibits a measurement at one collector from send-
clude that Object 2 is carrying a code, it is safe to assume ing a signal across space at speeds greater than the speed
that both objects carry a code as they race toward their of light to the other object.
respective collectors. We emphasize that the inconsistency demonstrated by
Einstein, Podolsky, and Rosen cited a basic tenet of Bell is not between realistic, local theories and the current
quantum mechanics, which asserts that the objects in the theory of quantum mechanics, but rather between realis-
EPR apparatus cannot carry codes. The theory of quan- tic theories and the experimental results predicted by the
tum mechanics requires that these objects behave proba- current theory of quantum mechanics. Thus, even if the
bilistically, and that only a measurement can reduce the current theory of quantum mechanics is discarded, if EPR-
probabilism to a state of certainty. The assumption that type experiments result in the correlations predicted by the
the objects cannot carry outcome-determining codes be- current theory, then Bell’s proof shows that no realistic,
fore they are measured is crucial to the theory of quantum local theory can be used to explain those correlations.
mechanics; without it the entire theory collapses. Bell’s result can be understood by considering the di-
We can see why the EPR authors called the theory of als on the EPR apparatus. Suppose we assume that there
quantum mechanics “incomplete.” Because, after measur- is a realistic, local theory describing the behavior of the
ing Object 1, one can predict with certainty the value of object pairs. By that we mean that for each pair fired an
the “yes–no” property of Object 2 without in any way element of reality is accounted for by a code carried by
disturbing it, this property has an element of physical re- the objects which determines for every setting of the dials
ality which has no counterpart in the theory of quantum which outcome (“yes” or “no”) each collector will record
mechanics. when the object gets to it. Let us set both dials at 0 degrees.
There was one response to the EPR paper which We know that quantum mechanics predicts that with these
Einstein dismissed immediately, but which has lingered settings the outcomes at the two collectors will be per-
to this day as an intriguing possibility. Perhaps the objects fectly correlated: each firing will result either in “yes” at
in the EPR apparatus have no codes when they are fired, both collectors or “no” at both.
but by measuring Object 1 at A we somehow endow both Now suppose we move the A dial to a setting +n degrees
objects with a code. This possibility is consistent with the for some small number n, say 2 or 3, and leave the B dial
theory of quantum mechanics, because it allows that the at 0. Then if 100 pairs of objects are fired, there might
codes are produced by the act of a measurement. Einstein be a loss of correlation. Let us denote by M1 the number
called this notion “spooky action at a distance.” This is of mismatches (“yes” at A and “no” at B, or vice versa)
because the apparatus could theoretically be arranged so in 100 firings. Next, let us leave the A dial at +n, and
that Object 2 was far away from Object 1 at the time of move the B dial to −n, (say 357 or 358 degrees), and
measurement at A, requiring a “spooky action” to carry fire another 100 pairs with exactly the same codes as the
a message about the measurement at A faster than the first 100 had. Bell’s results establish that as long as we
speed of light to Object 2 in time to be measured at B, assume that the objects are carrying codes, and that what
contradicting the theory of relativity. is recorded at one collector cannot affect what is recorded
Einstein, Podolsky, and Rosen ended their paper by as- at the other, then when the dials are set at −n and +n
serting that while they had shown the theory of quantum during 100 firings, the number of mismatches M2 in this
mechanics to be an incomplete description of physical re- new situation cannot exceed 2M1 . That is M2 ≤ 2M1 . The
ality, they believed that a complete, realistic theory could theory of quantum mechanics predicts, however, that the
be found eventually. number of mismatches in the second set of firings will be
Hundreds of books and papers have been written about more than twice the number of mismatches in the first set.
the EPR dilemma, and for years theoreticians searched That is, M2 > 2M1 . The predictions of quantum mechan-
for the complete, realistic theory Einstein and others ics and realistic, local theories are then in irreconcilable
believed to exist. In 1964, however, nine years after conflict.
Einstein’s death, a Scottish mathematician, J. S. Bell, Much modern experimental research is devoted to
poured cold water on the search. He showed that no “real- turning the EPR-type thought experiments into real
istic, local” theory to explain the EPR experiment can ex- experiments to establish whether or not the predictions
ist that is consistent with the experimental outcomes pre- of quantum mechanics actually attain. One such series of
dicted by quantum mechanics. A “realistic” theory asserts experiments was performed by a team of scientists headed
P1: LLL/GJK P2: GSS Final Pages
Encyclopedia of Physical Science and Technology EN013A-628 July 27, 2001 10:24
by Alain Aspect at Orsay, France, between 1982 and 1988. notion—one which requires a realistic, local theory of the
These experiments provide evidence that the results pre- physics governing the bit. As we saw in Sections V and VI,
dicted by quantum mechanics do in fact attain in the phys- however, realistic, local theories are at odds with quantum
ical world as we know it. The evidence, however, is not mechanics. Instead, on the quantum level it is essential to
incontrovertible. The Aspect experiments used object de- accept the fact that a computer bit exists in a probabilistic
tectors which are the best that can be built with mod- state. So the best we might be able to say is that a bit
ern technology, yet are only about 15% efficient. That is “in a zero state with a given probability.” While this
is, only about 15% (or fewer) of the measurements at situation might seem to spell doom to the idea of building
each collector can be guaranteed to be correct. To rule computers on the quantum level, it turns out instead to
out the possibility that the results in the Aspect exper- present a promising new direction for computing.
iments were merely a coincidence produced by experi- To see how uncertainty can be used to advantage let’s
mental error, we would have to build detectors that are look at a geometrical conceptualization of the Uncertainty
at least 80% efficient. Most experimenters do not see the Principle. Recall from Section V that Heisenberg’s matrix
possibility of building such detectors in the foreseeable mechanics identifies pairs of incompatible observables.
future. Two observables are incompatible if at every instant the
We see then that 90 years after Max Planck put forth his more certain we are about the value of one of them, the
“desperate” quantum hypothesis, the theory behind quan- more uncertain we must be about the value of the other. To
tum behavior is still not fully formulated. Current formu- make precise this notion of uncertainty of values let’s con-
lations are fraught with ambiguities and counter-intuitive sider a physical system (a spinning electron, for example)
hypotheses. They are at odds with centuries of Western and an observable O1 which can take on two values a and
thought, including Platonism, classical physics, and most b (say, “spin-up” and “spin-down” in a certain direction).
religions. It is understandable that many people are re- We can represent the two values with a perpendicular set
luctant to throw out all of this tradition merely because of axes. We say that at every instant in time the system
one peculiar model for describing subatomic phenomena “exists in a state ,” which we represent as a vector of
is enjoying high predictive value at the moment. It is little length one on our axis system (see Fig. 5).
wonder that the theories of quantum physics are the ob- Then we make the statement which lies at the heart of the
ject of such intense scrutiny and debate in the worlds of probabilistic description of quantum mechanics. We say
science, mathematics, and philosophy. that when the system is in state , then a measurement of
observable O1 will yield the value a with probability |λ1 |2 ,
which is the square of the length of the projection of the
VII. AN ADVANTAGE OF UNCERTAINTY: state vector onto the a axis. At the same time the mea-
QUANTUM COMPUTING surement will yield value b with probability |λ2 |2 , which is
the square of the length of the projection of the state vector
Even though the problems with the quantum theory ex- onto the b axis (see Fig. 6). (Note that |λ1 |2 + |λ2 |2 = 1,
posed by the EPR paper and Bell’s Theorem caused a because has length one. So the probability that a mea-
great stir among philosophers and those fascinated by surement will yield “either a or b” is one.)
the foundations of science, they did not stop physicists If lies along the positive a axis, we’ll say that the
from using quantum mechanics to make many important system is in state ā, and if it’s along the positive b axis,
discoveries throughout the second half of the 20th cen-
tury. As the century drew to a close, however, a troubling
clash between quantum theory and high-speed comput-
ers suddenly turned into what promises to be an exciting
marriage.
In the 1980s and 1990s high-speed computers became
faster and faster, in part because their components became
smaller and smaller. Computer “bits,” the devices used to
store the “zero-one” information at the heart of all com-
putations are becoming so small that they’re beginning
to approach atomic scale. And on that scale the laws of
quantum physics begin to apply.
The most troublesome quantum law facing computer
designers is Heisenberg’s Uncertainty Principle. To say FIGURE 5 Two axes, representing possible values a and b for
that a computer bit is “in a state 0 or state 1” is a classical observable O 1, and a state vector .
P1: LLL/GJK P2: GSS Final Pages
Encyclopedia of Physical Science and Technology EN013A-628 July 27, 2001 10:24
and
0
a2 =
1
FIGURE 7 Two pairs of axes, representing possible values for
two incompatible observables, and a state vector on the positive in two-dimensional space. Then we’ll represent our func-
a-axis. tion f as a 2 × 2 matrix F with all off-diagonal entries
P1: LLL/GJK P2: GSS Final Pages
Encyclopedia of Physical Science and Technology EN013A-628 July 27, 2001 10:24
(a) (a)
(b) (b)
FIGURE 8 The function evaluator F flips state a1 only if f (a1) = 1. FIGURE 9 The function evaluator F flips state a2 only if
f (a 2 ) = 1.
Relativity, General
James L. Anderson
Stevens Institute of Technology
93
P1: GNH/GUR P2: GLM Final Pages
Encyclopedia of Physical Science and Technology En014A-658 July 28, 2001 17:6
94 Relativity, General
Riemannian geometry Geometry in which the distance can be mapped in a one-to-one manner onto a connected
between neighboring points is defined by a metric and region of the four-dimensional Euclidean plane. Other-
is quadratic in the coordinate differences between the wise, these points are featureless and indistinguishable
points. from each other, and the manifold as a whole is character-
Space–time manifold Four dimensional manifold that ized only by its topological properties. While this manifold
underlies the construction of the various space–time is not itself associated with any physical entity, it serves as
descriptions of physical systems. the basis for the construction of the geometrical structures
Space–time theories Physical theories that make use of that are to be associated with such objects.
the space–time manifold in the formulation. Since the points of the space–time manifold can be
Special relativity The invariance of the laws describing mapped onto the four-dimensional Euclidean plane, one
physical systems with respect to Lorentz transforma- can coordinatize the manifold by assigning to each point
tions between observers moving with uniform velocity the coordinates of its image point in the Euclidean plane,
relative to each other. x µ , where the index µ takes on the values 0, 1, 2, 3. Be-
cause the points of the manifold are assumed to be indis-
tinguishable, this mapping is to a large extent arbitrary
THE GENERAL THEORY OF RELATIVITY is cur- and hence the coordinatization is also arbitrary. Depend-
rently accepted as our best macroscopic description of the ing on the topological structure of the manifold, it may be
gravitation interaction that exists between all physical sys- necessary to cover it with several overlapping coordinate
tems. It is also universal in that all physical systems are “patches” to avoid singularities in the coordinates. If, for
held to interact gravitationally. The predictions of gen- example, the manifold has the topology of the surface of
eral relativity differ both quantitatively and qualitatively a ball, it is necessary to employ two such patches to avoid
from those of the Newtonian theory of gravity. Although the coordinate singularity one encounters at the pole when
the quantitative differences between the two theories is using the customary polar coordinates.
for the most part small, so that general relativity contains If the manifold is coordinatized in two different ways,
Newtonian gravity as an approximation, these differences for example, by using Cartesian or spherical coordinates,
have been extensively tested in the solar system and other the coordinates used for one such coordinatization must
astrophysical systems. The agreement between observa- be functions of those used for the other and vice versa.
tion and theory is better than 0.5%. However, the qual- This relation is called a coordinate transformation. In or-
itative predictions of the theory are its most exciting der to preserve the continuity and differentiability of the
and challenging feature. Among others, the theory pre- manifold it must be continuous, nonsingular, and differ-
dicts the existence of gravitational radiation. Although entiable.
this radiation has not been directly observed, the effects
of its emission have been observed in the binary pul-
sar PSR 1913 + 16 and agree with the predictions of the B. Geometrical Structures
theory to within 3%. The theory also predicts the phe- In space–time theories, physical entities are associated
nomenon of gravitational collapse leading to the creation with geometrical objects that are constructed on the space–
of black holes. There is strong observational evidence time manifold. These objects can be of many different
that such objects exist in the universe. And finally, the types. A curve can be associated with the trajectory of a
general theory serves as the basis for our best descrip- particle and is specified by designating the points of the
tion of the universe as a whole, the so-called hot big-bang manifold through which it passes. This can be done by
cosmology. giving the coordinates of these points as functions x µ (λ)
of a monotonically varying parameter λ along the curve.
Likewise, a two-dimensional surface could be specified
I. SPACE–TIME THEORIES OF PHYSICS
by giving the coordinates of the surface as functions of
two monotonic parameters, and similarly for three- and
A. The Space–Time Manifold
four-dimensional regions.
To understand the revolution wrought by the general the- In addition to collections of points, one can intro-
ory it is useful to set it in a framework that encompasses duce geometrical objects that consist of a set of numbers
it as well as the other two major structures of physics, assigned to a point. These numbers are said to constitute
Newtonian mechanics and special relativity. Basic to each the components of the geometrical object. The compo-
of these structures is the notion of the space–time mani- nents of the velocity of a particle at a particular point in
fold consisting of a four-dimensional continuum of points. its trajectory would constitute such a collection. If the
It is assumed only that any finite piece of this manifold components are specified along a trajectory, a surface,
P1: GNH/GUR P2: GLM Final Pages
Encyclopedia of Physical Science and Technology En014A-658 July 28, 2001 17:6
Relativity, General 95
or any other part of the space–time manifold, they are planet, one would have to discover a system of equations
said to constitute a field. The temperature in a room can, such as those obtained from Newton’s laws of motion to
for example, be associated with a one-component field. select from the totality of all curves the subset that would
Likewise, the electromagnetic field surrounding a mov- correspond to actual planetary trajectories.
ing charge can be associated with a field consisting of six One requirement that one would like to be fulfilled by all
components. laws of motion is that of completeness—every set of values
The basic requirement that must be met in order that allowed by them must, at least in principle, be observable.
an object be geometrical is a consequence of the indistin- It is, after all, the purpose of the laws of motion to rule
guishability of the points of the space–time manifold. It out unobservable sets of values. Nevertheless, there are
is that under a coordinate transformation, the transformed problems associated with the imposition of such a require-
components of the object must be functions solely of its ment. There may, for example, be practical limitations on
original components and the coordinate transformation. our ability to observe all the values allowed by a given set
This requirement is simply met in the case of curves, sur- of laws. It is unlikely that we will ever be able to attain
faces, and so forth; for example, given a curve, the trans- the energies needed to verify some of the predictions of
formed curve can be immediately calculated given the the grand unified theories that are being considered today.
coordinate transformation. However, if one of these theories correctly described all
An especially useful group of geometrical objects for that we can observe about elementary particle interactions,
associating with physical entities are those whose trans- we would not discard it because we could not directly ob-
formed components are linear, homogeneous functions serve its other predictions. More troubling, however, are
of the original components. The simplest example is the limitations in principle on what we can observe. When
single-component object called a scalar φ(x). Under a applied to the universe as a whole, the general theory of
coordinate transformation, its transformed value is just relativity allows for many different possibilities, yet by its
equal to its original value. The other linear, homoge- very nature we can observe only the universe in which we
neous objects constitute the vectors, tensors, and pseu- live. In the strict sense, then, the general theory should
doscalars, pseudovectors, and pseudotensors. Vectors and be considered incomplete. Nevertheless, it does correctly
pseudovectors are four-component objects (they come in describe a vast range of phenomena, and so far there does
two varieties called covectors and contravectors), while not exist a more restrictive theory that does so. Therefore,
tensors and pseudotensors have larger numbers of com- probably, the best we can do is to require that a law not
ponents. There are also objects whose transformed com- admit values that could be observed if they existed but do
ponents are linear but not homogeneous functions of not.
the original components. Finally, there are objects whose
transformed components are nonlinear functions of the
D. Principle of General Covariance
original components, although such objects have not been
used to any great extent. In all cases, however, the nature However the laws of motion are formulated, they must be
of the object is characterized by its transformation law. such as to be independent of a particular coordinatization
It should be pointed out that not all objects one can of the space–time manifold. This requirement is called the
construct are geometrical. The gradient of a scalar is a ge- principle of general covariance and was one of the basic
ometrical object while the gradients of vectors and tensors principles employed by Einstein when he formulated the
in general are not. general theory. For the principle to hold, the laws of motion
for a given set of geometrical objects must be such that all
of the transforms of a set of values of these objects that
C. Laws of Motion
satisfy the laws of motion must also satisfy these laws.
The basis for associating geometrical objects with physical The principle of general covariance is not, as has some-
entities is purely utilitarian—there is no general procedure times been suggested, an empty principle that can be sat-
for making this association. The numerical values that isfied by any set of physical laws. If, for example, the
these objects can assume are taken to correspond to the geometrical object chosen to be associated with a given
observed values of the physical entities with which they physical entity is a scalar field φ(x), then the only generally
are associated. Since not all such values can in general covariant law that can be formulated involving only this
be observed, it is necessary to formulate a set of rules, object is the trivial equation
called here laws of motion, that select from the totality of
φ(x) = const. (1)
values a given set of geometrical objects can have, a subset
that corresponds to possible observed values. Thus, if it In order to formulate a nontrivial law of motion for φ
is decided to associate a curve with the trajectory of a it is necessary to introduce other geometrical objects in
P1: GNH/GUR P2: GLM Final Pages
Encyclopedia of Physical Science and Technology En014A-658 July 28, 2001 17:6
96 Relativity, General
addition to the scalar field. One possibility is to introduce gµν = diag(1, −1, −1, −1). If these values are substituted
a symmetric tensor field gµν (x) and its inverse g µν (x), into the other laws of motion they will no longer be gener-
which are related by the equation ally covariant, but rather they will be covariant with respect
to some subgroup of coordinate transformations. This sub-
g µρ gρν = δνµ , (2)
group will leave invariant the chosen values of the geomet-
where δνµ is the Kronecker delta, with values given by rical object (or objects) and will be called the invariance
group of the theory. The structure of this group will be
δνµ = 1 µ=ν independent of which particular set of values allowed by
(3)
=0 µ = ν the laws of motion is chosen for the absolute objects. If
and where the appearance of a double index such as ρ there are no absolute objects then the invariance group is
implies a summation over its range of values. One can just the group of all allowed coordinate transformations.
then take as the law of motion for φ the equation Absolute objects are seen to play a preferred role in a
√ theory—their values are independent of the values of the
−gg µν φ,µ ,ν = 0, (4) dynamical objects of the theory while the converse is in
general not the case. (If it is, the absolute objects become
where g is the determinant of gµν and ,µ := ∂/∂ x µ . The
superfluous and can be ignored.) A theory with absolute
tensor field gµν cannot itself be given as a function of the
objects thus violates a kind of general law of action and
coordinates directly since in that case Eq. (4) would not be
reaction. We will see that both Newtonian mechanics and
generally covariant. Rather, it must in turn satisfy a law of
special relativity contain absolute objects while the gen-
motion that is itself generally covariant. If one requires that
eral theory does not.
this law involve no higher than second derivatives of gµν ,
it can be shown that there are in fact only three essentially
different such laws for this object. One can form laws of II. NEWTONIAN MECHANICS
motion for φ other than Eq. (4), but in each case it is
necessary to introduce other geometrical objects for the A. Absolute Time and Space
purpose and to formulate laws of motion for them. In the
In his formulation of the laws of motion, Newton in-
general theory, gµν is associated with the gravitational
troduced a number of absolute objects, chief of which
field and hence couples to all other physical quantities
were his absolute space and absolute time. Absolute time
through their equations of motion.
corresponds to the foliation of the space–time mani-
fold by a one-parameter family of nonintersecting three-
E. Absolute and Dynamical Objects
dimensional hypersurfaces, which we call planes of ab-
To understand the revolutionary nature of the general the- solute simultaneity. All of the points in a given plane are
ory it is necessary to distinguish between two essentially taken to be simultaneous with respect to each other. Fur-
different types of objects that appear in the various space– thermore, these planes are such that the curves associated
time theories. We call them absolute and dynamical, re- with the trajectories of particles intersect each plane once
spectively. If the totality of values allowed by the laws of and only once. The “time” at which such an intersection
motion for some geometrical object, such as the tensor gµν takes place is characterized by the value of the parameter
introduced above are such that they can all be transformed associated with the plane being intersected. These planes
into each other by coordinate transformations, we say that are absolute in that their existence and structure are as-
that object is an absolute object in the theory. This can sumed to be independent of the existence or behavior of
occur if the law of motion for the object does not involve any other physical system in the space–time.
any of the other objects in the theory. The remaining ob- In Newtonian mechanics the interaction of particles is
jects in the theory are called dynamical objects. The elec- assumed to be instantaneous as in Newton’s action-at-a-
tric fields associated with different charge distributions, distance theory of gravity. Consequently, such interactions
for example, cannot in general be transformed into one take place between the points on the trajectories that lie in
another and hence must be associated with a dynamical the same plane of absolute simultaneity. As a consequence,
object. these planes can be observed by giving an impulse to one
Given a theory with absolute objects, it is possible to member of a group of charged particles and noting where,
coordinatize the space–time manifold so that they take on on the trajectories of the other particles, the transmitted
a specific set of values. In the case of the tensor gµν , one impulse acts.
of the three possible laws of motion mentioned above is Newton’s absolute space corresponds to a unique three-
such that every set of values allowed by it can be trans- parameter congruence of nonintersecting curves that fill
formed so that, for every point of the space–time manifold, the space–time manifold; that is, through each point of
P1: GNH/GUR P2: GLM Final Pages
Encyclopedia of Physical Science and Technology En014A-658 July 28, 2001 17:6
Relativity, General 97
the manifold there passes one and only one such curve. tories of the free bodies are linear in t; that is, they are of
Furthermore, each curve passes through one and only one the form
point of each plane of absolute simultaneity. The existence
xi = vi t + x0i , (6)
of such a congruence would therefore imply that there ex-
ists a unique one-to-one relation between the points in any where the index i takes on the values 1, 2, 3, and vi and
two planes of absolute simultaneity. The “location” of a x0i are constants. The constants vi are the components of
space–time point would be characterized by the parame- the “velocity” of the free body whose trajectory is asso-
ters associated with the curve of the congruence passing ciated with this curve and the x0i are its initial positions.
through it. When expressed in terms of these coordinates, the laws
The notion of absolute space brings with it the notion of motion of Newtonian mechanics take on their usual
of absolute rest: a particle is absolutely at rest if its tra- form.
jectory can be associated with one of the curves of the Since the planes of absolute simultaneity and the
congruence. However, unlike the planes of absolute si- straight lines constitute the absolute objects of Newtonian
multaneity that are needed in the formulation of the laws mechanics and enter into the formulation of the laws of
of motion of material particles, these laws do not require motion of all Newtonian systems, the subgroup of coordi-
the existence of the space–time congruence of curves that nate transformations that leave them invariant as a whole
constitute Newton’s absolute space, nor do they afford any constitutes the invariance group of Newtonian mechanics.
way of detecting a state of absolute rest. This property of In addition to the group of spatial rotations and translations
the Newtonian laws of motion is known as the principle and time translations, this group consists of the Galilean
of Galilean relativity. Furthermore, since the congruence transformations given by
is not needed in the formulation of these laws, we can
xi = xi + Vi t (7a)
dispense with it and hence with Newton’s absolute space
altogether as an unobservable element of the theory. and
t = t, (7b)
B. Free Bodies
where the Vi are the components of the velocity that char-
In his setting down of the three laws of motion, Newton acterize a particular transformation of the group. The re-
was careful to give the first law, “Every body continues quirement of invariance under this group of transforma-
in its state of rest, or of uniform motion in a right line, tions is called the principle of Galilean relativity.
unless it is compelled to change that state by forces im- In terms of the primed coordinates, we see that the equa-
pressed upon it,” as separate and distinct from the second tions of a straight line (6) take the form
law. He clearly did not consider it, as it is sometimes taken
xi = vi t + x0i , (8)
to be, a special case of the second law. In effect, the first
law supposes a class of curves, the straight (right) lines, vi
where the transformed velocity components are given
to exist in the space–time manifold. Furthermore, these by the Galilean law of addition for velocities:
curves correspond to the trajectories of a class of objects
vi = vi + Vi . (9)
on which no forces act, namely, free bodies. As a conse-
quence, these curves are absolute objects of the theory.
Furthermore, they, like the planes of absolute simultane-
III. SPECIAL RELATIVITY
ity, are needed to formulate the laws of motion for bodies
on which forces act.
A. Light Cones
The transition from Newtonian mechanics to special rel-
C. Galilean Invariance
ativity in the early part of this century involved the aban-
One can always coordinatize the space–time manifold in donment of the Newtonian planes of absolute simultaneity
such a way that the parameter t, which labels the different and their replacement by a new set of absolute objects, the
planes of absolute simultaneity, is taken to be one of these light cones. With the completion of the laws of electro-
coordinates. When this is done, the equation that defines dynamics by Maxwell in the middle of the last century it
these planes is simply became evident that electromagnetic interactions between
charged particles were not instantaneous but rather were
t = const. (5)
transmitted with a finite velocity, the speed of light. This
Furthermore, the remaining coordinates can be chosen so fact, coupled with the Galilean velocity addition law, made
that the equations of the curves associated with the trajec- it appear possible that some electromechanical experiment
P1: GNH/GUR P2: GLM Final Pages
Encyclopedia of Physical Science and Technology En014A-658 July 28, 2001 17:6
98 Relativity, General
could be devised for the detection of a state of absolute the straight lines, that are associated with the trajectories
rest and thus reinstate Newton’s absolute space. However, of free bodies. There is, however, an important difference
all attempts to do so, such as those of Michelson and between the two theories. In Newtonian mechanics any
Morley and Trouton and Noble, proved fruitless. In one straight line that intersects all of the planes of absolute
way or another, these experiments sought to measure the simultaneity is assumed to correspond to the trajectory of
absolute velocity of the earth with respect to this absolute a free body. In special relativity, on the other hand, only the
space. Even though they were sensitive enough to detect straight lines that correspond to free bodies with velocities
a velocity as small as 30 km/sec, which is much less than less than or equal to the speed of light are assumed to
the known velocity of the earth with respect to the galaxy, correspond to observable free bodies. These straight lines
no such motion was ever detected. are such that, given a point lying on one of them, the other
Einstein realized that if all interactions were transmitted points lying on it are either interior to or lie on the light
with a finite velocity there was no way objectively to ob- cone associated with that point.
serve the Newtonian planes of absolute simultaneity and
that they, like Newton’s absolute space, should be elimi-
nated from the theory. It was his analysis of the meaning C. Lorentz Invariance
of absolute simultaneity and its rejection by him that dis- Together, the light cones and straight lines constitute the
tinguished his approach to special relativity from those absolute objects of special relativity. It can be shown that
of Lorentz and Poincaré. Since, however, unlike absolute one can coordinatize the space–time manifold in such a
space, the planes of absolute simultaneity were needed in way that the points with coordinates x µ lying on a straight
the formulation of the laws of motion for material bodies, line are given by the equations
it was necessary to replace them by some other structure.
µ
The key to doing this lay in Einstein’s postulate that the ve- x µ = v µ λ + x0 , (10)
locity of light is independent of the motion of the source. If µ
this is the case, and to date all experimental evidence sup- where the v µ and x0 are constants and λ is a monotone
ports this postulate, the totality of all light ray trajectories increasing parameter along the line. For the straight lines
form an invariant structure and can be associated with a that correspond to the trajectories of free bodies, the v µ
corresponding family of three-dimensional surfaces in the are constrained by the condition that
space–time manifold, the light cones. Just as in Newtonian
ηµν v µ v ν 0 (11)
mechanics, where through each point there passes a unique
plane of absolute simultaneity, in special relativity through where ηµν = diag(1, −1, −1, −1). Provided that ηµν v µ v ν
each point there passes a light cone that consists of all of > 0, it is always possible to choose the parameter λ such
the points on the curves passing through this point that that ηµν v µ v ν = 1. In this case the v µ are said to constitute
correspond to the trajectories of light rays. the components of the four-velocity of the particle and
In Newtonian mechanics the interaction of particles was τ = λ is called the proper time along the line.
assumed to take place between the points on the curves as- In addition to the form (10) for the straight lines, coor-
sociated with their trajectories that lay in the same plane of dinates can be chosen so that the points x µ lying on the
absolute simultaneity. In special relativity this interaction µ
light cone associated with the point x0 satisfy the equation
is assumed to take place, depending on the type of in- µ
teraction that exists between the particles, either between ηµν x µ − x0 x ν − x0ν = 0. (12)
points that lie in the same light cone or between one such
For points interior to this light cone the quantity on the
point and points in the interior of the light cone associated
left side of this equation is greater than zero, while for
with this point. The electromagnetic interaction, for ex-
points exterior to it it is less than zero. In what follows,
ample, takes place between points lying in the same light
coordinates in which the straight lines and light cones
cone. Consequently, these light cones can be observed by
are described by Eqs. (10) and (12) will be called inertial
giving an impulse to one member of a group of charged
coordinates.
particles and noting where, on the trajectories of the other
Since the light cones and straight lines are absolute ob-
particles, the transmitted impulse acts.
jects in special relativity, the coordinate transformations
that leave these structures invariant constitute the invari-
ance group of special relativity. In an inertial coordinate
B. Free Bodies
system, these transformations have the form
In addition to the absolute light cones, special relativity
assumes, like Newtonian mechanics, a family of curves, x µ = ανµ x ν + bµ , (13)
P1: GNH/GUR P2: GLM Final Pages
Encyclopedia of Physical Science and Technology En014A-658 July 28, 2001 17:6
Relativity, General 99
where ανµ and bν are constants. The bµ are arbitrary while These quantities constitute the components of a geomet-
the ανµ are constrained to satisfy the conditions rical object that is linear but not homogeneous. With their
help, the equations for the coordinates x µ (λ) of the points
ηµν αρµ ασν = ηρσ . (14)
lying on a straight line can now be written as
These transformations form a group, the inhomoge- µ ρ
neous Lorentz group, each member of which is charac- d 2 x µ /dλ2 + ρσ (d x /dλ)(d x σ /dλ) = 0, (19)
terized by the 10 arbitrary values one can assign to the where again λ is a monotone increasing parameter along
ανµ and bµ . This group contains, as subgroups, the three- the curve. One can choose λ so that gµν d x µ /dλ d x ν /
dimensional rotation group and the group of spatial and dλ = 1, in which case it is the proper time along the line.
temporal translations. It also includes the group of Lorentz One can also use Eq. (19) to characterize the trajectories
transformations, now called Lorentz boosts. A boost along of light rays if one adds the condition that gµν d x µ /dλ
the x axis takes the form d x ν /dλ = 0. Such rays have the property that they serve as
x 0 = γ (x 0 + βx 1 ) the generators of the light cones. Equation (19) is usually
x 1 = γ (βx 0 + x 1 ) referred to as the geodesic equation since it has the same
(15) form as the equation for a geodesic curve, that is, a curve of
x 2 = x2
minimum length connecting two points in a Riemannian
x 3 = x 3, space with a metric gµν .
where γ = (1 − β 2 )−1/2 and β is a parameter that char- Having introduced the tensor gµν , it now becomes nec-
acterizes the boost. These transformation equations take essary to construct a law of motion for it. In special rela-
their more familiar form if we set x µ = (ct, x, y, z) and tivity this law is taken to be
similarly for x µ and β = v/c, where c is the velocity of Rµνρσ = 0, (20)
light, in which case v is the velocity associated with the
transformation. In special relativity, the Lorentz boosts re- where the tensor Rµνρσ , called the Riemann–Christoffel
place the Galilean transformations of Newtonian mechan- tensor, is constructed from the tensor gµν according to
ics just as the light cones replace the planes of absolute
Rµνρσ = 12 (gµρ,νσ + gνσ,µρ − gµσ,νρ − gνρ,µσ )
simultaneity and the requirement of invariance under this α β α β
group is called the principle of Special relativity. Also, + gαβ µρ νσ
− µσ νρ . (21)
the Galilean law of addition for velocities, Eq. (9), is no
It appears in the equation of geodesic deviation that
longer valid. For a boost in the x direction, the transformed
governs the separation between two neighboring, freely
components vi of the velocity of a body are related to its
falling bodies. When all of the components of Rµνρσ van-
original components vi by the equations
ish, this separation remains constant.
vi = δ(v1 + v) v2 = γ −1 δv2 v3 = γ −1 δv3 , It can be shown that every solution of Eq. (20) can be
(16) transformed so that gµν = ηµν everywhere on the space–
where δ = (1 + v1 v/c2 )−1 . time manifold, in which case gµν is said to take on its
Minkowski values. In a coordinate system in which gµν
has this form, Eq. (17) becomes
D. The Space–Time Metric
ηµν φ,µ φ,ν = 0 (17a)
Equations (10) and (12) for straight lines and light cones
are given in a special coordinate system in which they It is seen that the surface defined by Eq. (12) satisfies
assume these simple forms. It is possible to write gener- this equation. Likewise, in this coordinate system, Eq. (19)
ally covariant equations for these objects by introducing a reduces to
symmetric second rank tensor gµν of signature −2 to-
gether with its inverse g µν . With its help, the light cones d 2 x µ /dλ2 = 0 (19a)
can be characterized by the surfaces φ(x) = 0, where φ and has, as its solution, the curves defined by Eq. (16).
satisfies the covariant equation Since gµν can always be transformed to its Minkowski
g µν φ,µ φ,ν = 0. (17) values, it is seen to be an absolute object in the theory and
the group of transformations that leave it invariant is again
To construct an equation for a straight line we first in- the inhomogeneous Lorentz group. In all respects gµν is
µ } defined by
troduce the Christoffel symbols {ρσ equivalent to the straight lines and light cones of the theory.
µ 1 µν Furthermore, one can either construct laws of motion that
ρσ
= 2 g (gρν,σ + gνρ,σ − gρσ,ν ). (18) employ inertial coordinates and that are covariant with
P1: GNH/GUR P2: GLM Final Pages
Encyclopedia of Physical Science and Technology En014A-658 July 28, 2001 17:6
respect to the inhomogeneous Lorentz group or construct different ways—as inertial mass in the second law of mo-
generally covariant laws with the help of the gµν . When tion and as gravitational mass in the law of gravitational
gµν is transformed to take on the values ηµν , the latter interaction. Logically, these two masses have nothing to
equations reduce to the former. do with one another. Inertial mass measures the resistance
The Riemann–Christoffel tensor arose in the study of of a body to forces imposed on it while gravitational mass
the geometry of manifolds with Riemannian metrics. In determines, in the same way as electric charge determines
such a geometry one defines a distance ds between neigh- the strength of the electrical force between charged bod-
boring points of the manifold with coordinates x µ and ies, the strength of the gravitational force between massive
x µ + d x µ to be bodies. Galileo was the first to demonstrate this constancy
by observing that the acceleration experienced by objects
ds 2 = gµν (x) d x µ d x ν , (22) in the earth’s gravitational field was independent of their
where gµν , a symmetric tensor field, is the metric of the mass. In 1891, Eötvös demonstrated it to an accuracy of
manifold. The vanishing of the Riemann–Christoffel ten- one part in 108 . More recent determinations by groups in
sor can be shown to be the necessary and sufficient con- Princeton and Moscow have established this constancy to
dition for the geometry to be flat; that is, the metric can better than one part in 1011 .
always be transformed to a constant tensor everywhere on While there was no explanation for this constancy,
the manifold. Since the tensor gµν introduced above is an Einstein realized that it called into question the existence
absolute object it is sometimes referred to as the metric of of one of the absolute objects of both Newtonian mechan-
the flat space–time of special relativity. Although we will ics and special relativity, the free bodies. If indeed this
not need to make use of this geometrical interpretation, it ratio was a universal constant, then there could be no such
will sometimes prove convenient to give an expression for thing as a gravitationally uncharged body since zero gravi-
ds as a way of specifying the components of gµν . tational mass would then imply zero inertial mass. Einstein
also realized that this constancy meant that it would be
impossible to distinguish locally, that is, in a sufficiently
small region of space–time, between inertial and gravita-
IV. GENERAL RELATIVITY
tional effects through their action on material bodies. An
observer in an elevator being accelerated upward with an
A. The Principle of Equivalence
acceleration equal to that produced by the earth’s gravity
After Einstein formulated the special theory of relativity would see objects fall to the floor of the elevator in ex-
he turned his attention to, among other things, the prob- actly the same way that they fall on earth, that is, with an
lem of constructing a Lorentz-invariant theory of grav- acceleration that is independent of their mass.
ity. Newton thought of gravity as an action-at-a-distance After this realization, Einstein made a characteristic
force between massive bodies and as transmitted instanta- leap of imagination. He postulated that it is impossible
neously between them. Since special relativity required a to distinguish locally between inertial and gravitational
finite speed of transmission, Einstein sought to construct effects by any means. One of the consequences of this
a relativistic field theory of gravity. The simplest object postulate, called by him the principle of equivalence, is
to associate with the gravitational field was a scalar field. that light should be bent in a gravitational field just as
However, a difficulty presented itself when he came to con- it would appear to be to an observer in an accelerat-
struct a source for this field. In the Newtonian theory, the ing elevator. But if this is the case, the light cones of
gravitational attraction between bodies was proportional special relativity would no longer be absolute objects
to their masses. In special relativity, however, energy has either, and this in turn would mean that the metric of
associated with it an equivalent mass through the relation special relativistic space–time would not be an absolute
E = mc2 . Consequently, Einstein argued, mass density by object.
itself could not be the sole source of the gravitational field. The principle of equivalence however, implied even
At the same time, energy density could not be used since more. If inertial and gravitational effects are indistinguish-
it is not, by itself, associated with a geometrical object able from each other locally, then one and the same ob-
in special relativity but rather with one component of a ject could be used to characterize both effects. Since it is
tensor field. the metric gµν that is responsible for the inertial effects
While thinking about the problem of gravity, Einstein one observes in special relativity, gµν should also be as-
was struck by a peculiarity of the gravitational interaction sociated with the gravitational field. In effect, geometry
between bodies, namely, the constancy of the ratio of the and gravity became simply different aspects of the same
inertial to the gravitational mass of all material bodies. thing. Actually, one never needs to interpret gµν as a met-
In Newtonian mechanics, mass enters in two essentially ric. One can identify it solely with the gravitational field.
P1: GNH/GUR P2: GLM Final Pages
Encyclopedia of Physical Science and Technology En014A-658 July 28, 2001 17:6
This identification has, as a consequence, that gµν must number of laws of motion for the vector potential Aµ that
be a dynamical object since the gravitational field clearly satisfy these requirements.
must be such. Having recognized this fact, Einstein then To formulate the law of motion for gµν , we first con-
turned his attention to the problem of constructing a law struct from the Riemann–Christoffel tensor (21) the Ricci
of motion for this object. tensor Rµν , where
Rµν = g ρσ Rσ µρν (23)
B. The Principle of General Invariance and the curvature scalar R, where
In his attempts to construct a law of motion for gµν , R = g µν Rµν . (24)
Einstein proposed that these laws should be generally co-
variant. However, we have already seen that the laws of In terms of these quantities this law can be written as
motion of special relativity could be cast in generally co- Rµν − 12 gµν R + gµν = κ Tµν , (25)
variant form with the introduction of a metric satisfying
Eq. (20). But such a metric was absolute and Einstein where and κ are constants and Tµν is the energy–
wanted a law of motion for a dynamical gµν . Conse- momentum tensor associated with the sources of the grav-
quently, what Einstein was really requiring was not general itational field.
covariance but rather general invariance, that is, that the In general, the components of the Riemann–Christoffel
invariance group of the laws of motion should be the same tensor will not vanish even when all of the components of
as their covariance group, namely, the group of all arbitrary the Ricci tensor do. As a consequence, it follows from the
coordinate transformations. As we have argued above, this equation of geodesic deviation that the separation between
can be the case only if there are no absolute objects in the neighboring freely falling masses will change with time.
theory. The absence of absolute objects in the theory sat- Since such changes appear due to tidal forces in many-
isfies a version of Mach’s principle which states that there practicle systems, the Riemann–Christoffel tensor is thus
should be no absolute objects in any physical theory. a measure of such forces and vice versa.
Although Einstein did not use precisely the reason- The so-called cosmological term gµν was originally
ing outlined above, it was his recognition of the pre- not present in the Einstein field equations. It was later
ferred role played by the inhomogeneous Lorentz group added by him to obtain a static cosmological model with
in special relativity that was crucial to the development matter. When it was discovered that the universe was ex-
of the general theory of relativity. And although he for- panding and that there were solutions of the field equations
mulated his argument in terms of the relativity of motion, without the cosmological term that fit the then available
it is clear that he was referring to the invariance proper- data, Einstein dropped this term and for the most part it
ties of the laws of motion. His argument that all motion has not been included in the field equations. However,
should be relative—hence the term general relativity— recent data suggest that the expansion of the universe is
was really a requirement, in modern terms, that these laws accelerating instead of slowing down as it does in the older
should be generally invariant. This is not an empty require- cosmological models. One way to take into account such a
ment, as some authors have suggested, but rather severely discovery would be to reintroduce the cosmological term
limits the possible laws of motion one can formulate in the field equations.
for gµν . In addition to the law of motion for gµν it is necessary
to formulate generally invariant laws of motion for the
other geometrical objects that are to be associated with
C. Dynamic Laws the physical quantities being observed. One way to do this
1. Field Equations is simply to take over the generally covariant form of the
laws formulated for these objects in special relativity. The
The search for a generally invariant law of motion for gµν law of motion for the electromagnetic field, when this field
occupied a considerable portion of Einstein’s time prior is associated with a vector Aµ , can, for example, be written
to the year 1915. At one point he even argued that such in the form
a law could not exist. However, he did succeed in that √
year in finally formulating this law. If one requires that −g g µρ g νσ Fρσ ,µ = 4πj ν , (26)
this law contain no higher than second derivatives of the where g is the determinant of gµν , j µ the current density
gµν and furthermore that it be derivable from a variational associated with the sources of the electromagnetic field,
principle, then there is, infact, essentially only one law that and
fills these requirements. This is in marked contrast to the
situation in electrodynamics, where there are an infinite Fµν = Aν,µ − Aµ,ν . (27)
P1: GNH/GUR P2: GLM Final Pages
Encyclopedia of Physical Science and Technology En014A-658 July 28, 2001 17:6
Such laws of motion are said to involve minimal coupling tion for compact bodies moving in external fields that are
to the gravitational field. It is also possible to construct slowly varying in space and time. In particular one can
laws of motion that do not couple minimally to the gravi- derive an approximate version of the geodesic Eq. (19)
tational field. In the case of the electromagnetic field, for for a single body, making it unnecessary to postulate the
example, one could include a factor of 1 + R, where R latter equations.
is the curvature scalar, inside the parentheses in Eq. (26). To better appreciate the significance of the EIH results
The only requirement these laws of motion should satisfy it is useful to compare the problem of motion in general
is that they reduce to their special relativistic form when relativity to that in special relativity and Newtonian me-
the Riemann–Christoffel tensor vanishes. chanics. In those theories one must postulate not only the
force law between bodies, such as the Lorentz force law
in electrodynamics, but also the form of the inertial terms
2. Particle Dynamics
in the equations of motion for these bodies. General rela-
When Einstein proposed the general theory in 1915 he tivity is unique among field theories in that both the force
took the geodesic Eq. (19) to be the equation of mo- laws and inertial terms follow from the field equations of
tion for a test particle moving in an external gravitational the theory.
field. Such a particle was assumed to have no effect on
the sources of this external field. Since, however, even
D. Clocks, Rods, and Coordinates
a test particle has a finite mass it cannot avoid having
some small but finite effect on these sources. Hence, the It has been argued that some kind of postulate concerning
geodesic equation yields at best an approximation to the the behavior of clocks and measuring rods is required in
true motion of both the test particle. Furthermore, this general relativity. For example, it has been suggested that
equation cannot be applied at all in the important case a class of objects, ideal clocks, measure proper time along
of the mutual interaction of bodies of comparable mass. their trajectories, where the proper time along a trajectory
And finally it cannot deal with gravitational radiation is defined to be the integral of the distance ds, given by
effects. Eq. (22), along this trajectory. In this view, clocks, and
In 1939 Einstein together with Banesh Hoffmann and also measuring rods, are assumed to be primitive objects
Leopold Infeld (hereafter referred to as EIH) made a fun- in the theory.
damental advance in the problem of particle dynamics Actually, all such postulates are unnecessary, in both the
in general relativity. They were able to derive approxi- special and general theories. Clocks, and similarly mea-
mate equations of motion directly from the Einstein field suring rods, are, in fact, composite physical systems with
equations for slowly moving, compact bodies without laws of motion governing their behavior. Once these laws
the need of additional assumptions. In the lowest, so- have been established, there is no need to add additional
called Newtonian approximation, they obtained equations postulates governing their behavior. It can be shown, for
of motion identical in form to those of Newton for com- example, that if one takes, as a model for a clock, a clas-
pact bodies interacting according to his inverse square law sical hydrogen atom, then as long as the forces acting on
of gravitation. This result shows incidentally that general this clock are small compared to the internal forces acting
relativity encompasses all of the results of Newtonian me- on its constituents and its dimensions are small compared
chanics. The EIH calculations also yielded directly the to the curvature of its trajectory, it will indeed measure
equivalence of inertial and gravitational masses. approximately the proper time along this trajectory. How-
Higher-order, post-Newtonian corrections to these laws ever, if the forces acting on it are sufficiently strong, the
correctly predict, among other observed effects, the per- atom will be ionized and cease to measure any kind of
ihelion advance of the planetary orbits. It is also possi- time along its trajectory. Thus, the behavior of clocks is
ble to extend the EIH approximation method to obtain seen to be a dynamical question that cannot be decided a
corrections that include the radiation reaction effects that priori from any kinematic postulate.
have been observed in binary pulsar systems. Also, this In this view, then, clocks and measuring rods, and in-
author was able to derive the electromagnetic interaction deed all measuring devices, are considered to be physi-
force law between charged particles from the combined cal systems with the geometrical objects associated with
Einstein–Maxwell field equations. them obeying their own laws of motion. Furthermore, a
In their original work EIH derived their equations of physical description would have to be considered incom-
motion for bodies moving in an everywhere constant plete if it did not supply these laws of motion. To avoid
(gµν = ηµν ) external gravitational field. One can also use the necessity of having to formulate and solve the laws of
their approach to derive approximate equations of mo- motion for a particular kind of clock, one may assume that
P1: GNH/GUR P2: GLM Final Pages
Encyclopedia of Physical Science and Technology En014A-658 July 28, 2001 17:6
it does satisfy the conditions for measuring proper time, constant. For a point particle of mass m this equation has
with the proviso that if these conditions are violated it will the well-known solution
no longer do so. If this assumption results in inconsisten-
φ = −Gm/r. (29)
cies it does not mean that a principle of general relativity
has been violated but only that these conditions have not In the general theory we assume that, in the case of
been met. weak fields, there exists a coordinate system such that
While clocks and rods can be used to measure times gµν = ηµν + h µν , where h µν 1. We also assume that the
and distances, it should be emphasized that these mea- velocities of the sources of the gravitational field are all
surements bear no direct relation to the coordinates em- vanishingly small compared to the velocity of light. In this
ployed in the formulation of the laws of motion. Since, in case, the only nonvanishing component of Tµν is T00 = ρc2
all space–time descriptions, these laws are generally co- and Eq. (25) can be shown to reduce to Eq. (28) if we set
variant, there are no preferred coordinate systems. Conse- = 0, κ = −8π G/c4 , where G is the Newtonian gravita-
quently, it follows that the predictions of a theory cannot tional constant, and take
depend on a particular coordinatization. In effect, the co-
φ = (c2 /2)h 00 . (30)
ordinates play the same role in space–time theories as do
the indices that characterize the various components of a
geometrical object and hence, like these indices, are not B. The “Flat” Field
associated with any physical objects.
It is, however, often convenient to choose a particu- The simplest exact solution of the empty-space Einstein
lar coordinatization. Thus in Newtonian mechanics one equations with vanishing cosmological constant (Eq. (25)
usually chooses coordinates such that one of them is the with Tµν = 0 and = 0) is the so-called “flat” field. It sat-
parameter that characterizes the planes of simultaneity, isfies Eq. (20), whence the appellation “flat,” and in appro-
and likewise in special relativity one usually employs in- priate coordinates takes the form gµν = ηµν . By itself this
ertial coordinates. In general relativity one also can em- solution would be rather uninteresting if it were not for the
ploy a coordinatization that is particularly convenient for fact that it is the only stationary solution that is asymptot-
some purpose. One can, for example, choose coordinates ically flat and everywhere regular (no singularities) on the
in such a way that one of them corresponds to the time whole space–time manifold −∞ ≤ x µ ≤ ∞ as was first
and distance intervals measured by a particular family of shown by A. Lichnerowicz. One consequence of this re-
clocks and rods. Alternatively, one can choose coordi- sult is that there do not exist regular, stationary solutions
nates so that certain components of the gravitational field of the empty-space field equations that are asymptotically
have simple values. For example, one can choose coor- flat and that could be interpreted as “particle” solutions,
dinates so that g00 = 1 and g01 = g02 = g03 = 0. But in all that is, solutions that are nonflat only in some localized
cases such a choice is arbitrary and devoid of physical region of space. Einstein had hoped that such solutions of
content. his field equations existed so that he could dispense with
the matter stress–energy tensor T µν that appears in these
equations.
V. GRAVITATIONAL FIELDS
field. The condition that the field be independent of x 0 u = a cosh(x 0 /4M) v = a sinh(x 0 /4M) (32)
was originally imposed by Schwarzschild in obtaining his
solution of the field equations but has since been shown to with a = [(r/2M) − 1]1/2 exp(r/4M), such that the trans-
be a consequence of the condition of spherical symmetry. formed components of the Schwarzschild field were given
The importance of the Schwarzschild solution lies in by
the fact that it is the general relativistic analog of the g00 = f g11 = −f g22 = −r 2
Newtonian field (29) of a point mass. By making use of
g33 = −r 2 sin2 φ, (33)
Eq. (30) and Eq. (31) for g00 we see that M = Gm/c2 .
The constant 2M is refferred to as the Schwarzschild ra- where f = (8M/r ) exp(−r/2M). In these latter expres-
dius of the mass m. The Schwarzschild radius of the sun sions, r is a function of u and v obtained by solving
is 2.9 km and of the earth is 0.88 cm. For comparison, the Eq. (32) for this quantity. In this form the field is seen
Schwarzschild radius of a proton is 2.4 × 10−52 cm and to be singular only at r = 0, which is a true physical sin-
that of a typical galaxy of mas ∼1045 g is ∼1017 cm. gularity.
The Schwarzschild field has a property that distin- Figure 1 depicts a number of the features of the Kruskal
guishes it from the corresponding Newtonian field: at transformation in what is called a Kruskal diagram. In this
r = 2M it becomes singular. Indeed, at this radius g11 diagram, curves of constant Schwarzschild r correspond
is infinite! However, this is not a physical singularity, as to the right hyperbolas u 2 − v 2 = const, while curves of
Eddington first showed in 1924, but rather what is called a constant x 0 correspond to straight lines passing through
coordinate singularity. A final clarification of the structure the origin. The region I to the right of the two r = 2M lines
of the Schwarzschild field came in 1960 with the work of corresponds to the entire region r > 2M, −∞ < x 0 < ∞.
M. Kruskal. He found a coordinate transformation from Hence it follows that the transformation from Kruskal
the Schwarzschild coordinates (x 0 , r, θ, φ) to the set (u, v, to Schwarzschild coordinates must be a singular trans-
θ, φ), where formation, and indeed it is the singular nature of this
FIGURE 1 Kruskal diagram for the Schwartzschild field. The hyperbola r = 0 represents a real singularity of the
field.
P1: GNH/GUR P2: GLM Final Pages
Encyclopedia of Physical Science and Technology En014A-658 July 28, 2001 17:6
transformation that is responsible for the singularity of After the discovery of the Kerr–Newman solutions it
gµν at r = 2M in Schwarzschild coordinates. was shown by the work of Israel, Carter, Hawking, and
Many of the properties of the Schwarzschild field can be Robinson that the Kerr–Newman solutions are the only
understood by referring to the Kruskal diagram. In Kruskal asymptotically flat, stationary black hole solutions, that
coordinates, light rays propagate along straight lines at 45◦ is, solutions with event horizons that depend continuously
with respect to the u, v axes. As a consequence, it is seen on a finite set of parameters. This result is a version of a
that all of the rays emitted at a point P1 in region II above conjecture of Wheeler to the effect that “a black hole has
the two lines r = 2M will ultimately reach the singular no hair.” What is still lacking is a proof of uniqueness of
curve r = 0, so no information can be transmitted from this the Kerr–Newman solutions, that is, that not more than
point into region I. Likewise, inward-directed rays emitted three parameters is sufficient to characterize completely
at a point P2 in region I will also reach the r = 0 curve all stationary, asymptotically flat black hole solutions. To
while outward-directed rays will continue their outward date, the best one can do in this respect is to show that
propagation forever. And finally, some of the rays emitted the only such static solutions and the only such neutral
from a point P3 in region IV below the two lines r = 2M solutions belong to the family of Kerr–Newman solutions.
will ultimately reach points in region I while others will In addition to the Kerr–Newman family of solutions
reach the region III to the left of the two lines r = 2M. there are solutions of the empty-space field equations that
Because of these features, the surface r = 2M is said to do not have event horizons surrounding the physical sin-
form an event horizon: an observer in region I can never gularity in the solution. One such solution is obtained by
receive information about events taking place in regions letting the parameter M in the Schwarzschild solution be-
II and III of the diagram. In addition, it is seen to be a light come negative. Also, a charged Schwarzschild field has
cone. no horizon if the charge Q < M. In both cases the only
Finally, we note that a material body starting from rest singularity occurs at r = 0. Because of the absence of a
at P2 will fall into the singularity at r = 0 along the dashed horizon surrounding this point, it is referred to as a naked
path indicated in Fig. 1. Even though it reaches the event singularity.
horizon at x 0 = ∞, the proper time along the curve from P2
to the point where it reaches the singularity is finite. A lo-
E. Fields with Matter—Gravitational Collapse
cal observer falling with the particle would notice nothing
peculiar as he passed through the horizon. As long as the In addition to the empty-space solutions discussed here
particle is outside this horizon it is possible to reverse its there are numerous solutions of the Einstein equations
motion so that it does not cross it. However, once it does with nonvanishing energy–momentum tensors. One can,
its fate is sealed; its motion can no longer be reversed for example, construct a spherically symmetric, nonsin-
and it will ultimately reach the r = 0 singularity in a fi- gular interior solution that joins on smoothly to an exte-
nite amount of proper time. For this reason the source of a rior Schwarzschild solution. The combined solution would
Schwarzschild field is said to be a black hole—nothing that then correspond to the field of a normal star, a white dwarf,
falls into it can ever get out again and any radiation gener- or a neutron star. What distinguishes these objects is the
ated inside the horizon can never be seen from the outside. equation of state for the matter comprising them. It is sur-
prising that no interior solutions that could be joined to a
Kerr–Newman field have been found.
D. Kerr–Newman Fields, Naked Singularities
Normal stars, of course, are not eternal objects. They are
In addition to the Schwarzschild family of solutions, each supported against gravitational collapse by pressure forces
member of which is characterized by a value of the pa- whose source is the thermonuclear burning that takes place
rameter M, other stationary solutions of the empty-space at the center of the star. Once such burning ceases due to
Einstein field equations have been found by Kerr and the depletion of its nuclear fuel, a star will begin to con-
Newman. Like the Schwarzschild solution, the Kerr– tract. If its total mass is less than approximately two solar
Newman solutions are asymptotically flat; that is, the value masses it will ultimately become a stable white dwarf or
of the Riemann–Christoffel tensor goes to zero as the co- a neutron star, supported by either electron or neutron
ordinate r approaches infinity and also the physical sin- degeneracy pressure. If, however, its total mass exceeds
gularity in the solution is surrounded by an event hori- this limit, the star will continue to contract under its own
zon. This family of solutions is characterized by three gravity down to a point, a result first demonstrated by
continuously variable parameters, which, because of the Oppenheimer and Snyder in 1939. Although they treated
asymptotic form of these solutions, can be interpreted as only cold matter, that is, matter without pressure, their
the mass, angular momentum, and electric charge of the result would still hold once the star has passed a certain
black hole. critical stage even if the repulsion of the nuclei comprising
P1: GNH/GUR P2: GLM Final Pages
Encyclopedia of Physical Science and Technology En014A-658 July 28, 2001 17:6
the star were infinite. This critical stage is reached once galaxy NGC 4258, Makoto Miyoshi and his collaborators
the radius of the star becomes less than its Schwarzschild were able to establish in 1995 that the mass of the central
radius. When this happens, the surface r = 2M becomes object is 4 × 107 M and its diameter is a half a light year,
an event horizon and the matter is trapped inside, forming thereby establishing its identity as a black hole. Our own
a black hole. Even an infinite pressure would then be un- galaxy has been shown to contain a black hole at its center
able to halt the continued contraction to a point because of mass 2.6 × 106 M . To date, there are about 15 masses
such a pressure would contribute an infinite amount to that have been determined for black holes in the centers
the energy–momentum tensor of the matter, which in turn of nearby galaxies. The most massive such candidate for
would result in an even stronger gravitational attraction. a black hole is in the giant elliptic galaxy M87 with an
Because of the disquieting features of black hole for- estimated mass of 3 × 109 M although it is not certain
mation (neither Eddington nor Einstein was prepared that this object is indeed a black hole. Finally it is now
to accept their existence), theorists looked for ways to believed that the energy source of quasars is the inflow of
avoid their formation. However, theorems of Penrose and vast amounts of gas into massive black holes in the centers
Hawking show that collapse to a singularity is inevitable of very young galaxies.
once the gravitational field becomes strong enough to drag
back any light emitted by the star, that is, when the escape
F. Gravitational Radiation,
velocity at the surface of the star exceeds the velocity of
the Quadrupole Formula
light.
What has not been proved to date is what Penrose calls Shortly after he formulated the general theory, Einstein
the hypothesis of “cosmic censorship.” This hypothesis as- showed that, when linearized around the flat field gµν =
serts that matter will never collapse to a naked singularity, ηµν , the empty-space field equations with zero cosmolog-
but rather that the singularity will always be surrounded ical constant possessed plane wave solutions similar to the
by an event horizon and hence not be visible to an external plane wave solutions of electromagnetism. These waves
observer. While the hypothesis is suported by both numer- propagate along the light cones defined by ηµν , that is, at
ical and perturbation calculations, it has so far not been the speed of light. Like their electromagnetic counterparts,
shown to be a rigorous consequence of the laws of motion they are transverse waves and possess two independent
of the general theory. states of polarization. However, unlike the dipole struc-
The detection of black holes is complicated by the fact ture of plane electromagnetic waves, gravitational waves
that they are black—by themselves they can emit no ra- have a quadrupole structure, as would be expected from the
diation. If they exist at all then, they can be detected only tensor nature of the gravitational field. Using coordinates
through the effects of their gravitational field on nearby such that gµν = ηµν + h µν and the direction of propagation
matter. If a black hole were a member of a double star sys- as the z axis, the two states of polarization are character-
tem, it would become the source of intense X rays when ized by h + + × ×
x x = h yy and h x y = h yx with, in the two cases,
its companion expanded during the later stages of its own the other components having the value zero.
evolution. As matter from the companion fell onto the Although the empty-space field equations have been
black hole it would become compressed and thus heated to shown to possess exact solutions with wavelike proper-
temperatures high enough for it to emit such high-energy ties, the so-called plane fronted waves, neither they nor the
radiation. Of course, it is not enough to find an x-ray- approximate wave solutions found by Einstein are associ-
emitting binary system in order to prove the existence ated with sources. In electromagnetic theory it is possible
of a black hole. It is also necessary that the mass of the to find exact solutions of the field equations associated
x-ray-emitting component be larger than the upper limit with sources with arbitrary motions. So far, no such solu-
on the mass of a stable neutron star, that is, larger than tions have been found for the highly nonlinear equations
3 M (M denotes a solar mass of 1.99 × 1030 kg). of the general theory. In order to construct radiative so-
The first object to be definitely identified as a black hole lutions associated with sources it is necessary to employ
in 1971 was a member of a binary system Cygnus X-1 in approximation methods.
the constellation Cygnus which was an intense emitter The first attempts to carry out such approximations were
of x-rays. Measurements established that the mass of the made by Einstein and Arthur Eddington. Einstein tried
compact component of the system was about 8 M . Since to calculate the energy radiated by a localized system of
then eight more black holes have been found in binary masses by introducing a pseudo-energy-momentum tensor
systems. In addition to stellar mass black holes there is for the gravitational field. However, this object was not a
now compelling evidence that massive black holes exist true tensor and in fact it was later recognized that it was in
in the centers of galaxies. By measurements of the orbital general not possible to introduce a true energy-momentum
speed of the gaseous disk around the nucleus of the spiral tensor for the gravitational field. As a consequence, even if
P1: GNH/GUR P2: GLM Final Pages
Encyclopedia of Physical Science and Technology En014A-658 July 28, 2001 17:6
frequency νrec of the received radiation. Then it can be use it alone to determine the result of the Vessot–Levine
shown that experiment. In this case it is necessary to make direct use
of Eq. (36). The derivation of Eq. (36), however, makes
Z := (νrec − νem )/νem = (1/c2 )(φem − φrec ), (36)
use of Eq. (19) for a light ray, which, in turn, is a con-
where φem and φrec are the gravitational potentials at the sequence of the field equations (25) of general relativity.
locations of the emitter and receiver, respectively. If φem Also, although the present red shift measurements are not
is less than φrec then the quantity Z is negative, hence sufficiently accurate to distinguish between different pos-
the emitted light appears to be red shifted. This effect can sible equations for the gravitational field, there is nothing
be understood by noting that, in going from the emiter in principle that would preclude such a test.
to the receiver, a photon will, in this case, gain potential
energy. Since its total energy is conserved, its kinetic en- B. Solar System Tests—The PPN Formalism
ergy, which is proportional to its frequency, will decrease
The first arena used for testing general relativity was the
and hence so will its frequency. If either the emitter or re-
solar system and it remains so to this date. What has
ceiver is moving with respect to the background field then
changed dramatically over the years, due to the rapid
Eq. (36) must be amended to take account of the Doppler
growth of technology, is the degree of accuracy with which
shift produced by such motion.
the theory can be tested. It is in the solar system that the
The first attempts to observe the gravitational red shift
gravitational field of the sun is of sufficient strength that
were made on the spectral lines of the sun and known
deviations from the Newtonian theory are observable, but
white dwarfs. For the sun, Z = −2.12 × 10−6 while for
just barely.
white dwarfs it would have values 10–100 times as large.
In calculating the size of these effects one assumes that
In the case of the sun the shift was masked by the Doppler
the trajectories of planets and light rays obey Eqs. (19).
broadening of the spectral lines due to thermal motion.
However, rather than take the sun’s gravitational field to be
However, observations near its edge are consistent with a
the Schwarzschild field in evaluating the Christoffel sym-
red shift of the magnitude predicted by Eq. (36). In the case
bols appearing in these equations, it is useful to use what
of the white dwarfs, it was not possible to measure their
has become known as the parametrized post-Newtonian
masses and radii with sufficient accuracy to determine φem ,
(PPN) formalism. In this formalism, first developed by
although again red shifts were observed whose magnitudes
Eddington and extended by Robertson, Schiff, Will, and
were consistent with estimates of these quantities.
others, one assumes a more general form for the post-
The first accurate test of the red shift prediction was
Newtonian corrections to the gravitational field of the sun
carried out in a series of terrestrial experiments by Pound
than those given by general relativity. These corrections
and Rebka in 1960, using the Mössbauer effect. The emit-
are allowed to depend on a number of unknown parame-
ter and detector used by them were separated by a vertical
ters that one hopes to determine by solar system and other
height of 74 ft. In this case the gravitational potential can
observations. The reason for proceeding in this manner is
be taken equal to gz, where g is the local acceleration
that it allows one to test the validity of other competing
due to gravity and z is the height above ground level.
theories of gravity in which these parameters have differ-
Equation (36) then yields the value Z = 2.5 × 10−15 . In
ent values from those they would have if general relativity
spite of its small value, Pound and Rebka were able to
were valid.
observe a shift equal to 1.05 ± 0.10 times the predicted Z .
In the most extreme versions of this formalism as many
Later experiments with cesium beam and rubidium clocks
as 10 parameters are employed. However, a number of
on jet aircraft yielded similar results.
these parameters can be eliminated if one requires that
The possibility of testing the red shift prediction has
the equation of motion (19) are a consequence of the field
improved dramatically with the development of high-
equations for the gravitational field, as they are in general
precision clocks. In 1976 Vessot and Levine used a rocket
relativity. Also, some of these parameters are known to be
to carry a hydrogenmaser clock to an altitude of about
small from other experiments. In what follows we will use
10,000 km. Their result verified the theoretical value to
an abbreviated version of the PPN formalism employed by
within 2 parts in 10−4 .
Hellings in his analysis of the solar system data. In this
It has been argued that red shift observations do not
version the components of the gravitational field have the
bear on the question of the validity of general relativity
form
but rather only on the validity of the principle of equiva-
lence. This is, in fact, only partially true. If one assumes g00 = 1 − 2U 1 − J2 (R /r )2 P2 (θ)
that the gravitational field of the earth is uniform over + 2βU 2 + α1 U (w/c) (37a)
the 74 ft that separated the emitter and receiver of the
Pound–Rebka experiment, then indeed their result can be g0i = α1 U wi c (37b)
calculated using only this principle. However, one cannot gi j = −(1 + 2γ U )δi j , (37c)
P1: GNH/GUR P2: GLM Final Pages
Encyclopedia of Physical Science and Technology En014A-658 July 28, 2001 17:6
where β, γ , and α1 are PPN parameters U = GM /r c2 is terminations, and the most recent, in 1984, agrees with the
the Newtonian gravitational potential of the sun, and R general relativistic prediction to within 1%.
and M are the radius and mass of the sun. Included in
Eq. (37a) is a term proportional to J2 , a dimensionless
2. Time Delay
measure of the quadrupole moment of the sun. In this
term P2 is the Legendre polynomial of order 2 and θ is the In passing through a strong gravitational field, light not
angle between the radius vector from the sun’s center and only will be red shifted but also will take longer to traverse
the normal to the sun’s equator. The so-called preferred a given distance than it would if Newtonian theory were
frame velocity wi is taken to be the average of the determi- valid. The reason for this delay is that the gravitational field
nations of the solar system velocity relative to the cosmic acts like a variable index of refraction, so the velocity of
blackbody background. In general relativity β = γ = 1 and light will vary as it passes through such a field. This effect
α1 = 0. was first proposed by Shapiro in 1964 as a means of testing
general relativity. It can be observed by bouncing a radar
signal off a planet or artificial satellite and measuring its
1. Bending of Light
round-trip travel time. At superior conjunction, when the
One of the more spectacular confirmations of the general planet or satellite is on the far side of the sun from the
theory came in 1919, when the solar eclipse expedition earth, the effect is a maximum, in which case the amount
headed by Eddington announced that they had observed of delay is given by
bending of light from stars as they passed near the edge
of the sun that was in agreement with the prediction of δt = 2(1 + γ ) G M c3 ln 4rerp d 2 , (40)
the theory. Derivations of the bending that use only the where rc , rp , and d are, respectively, the distance from the
principle of equivalence or the corpuscular theory of light sun to the earth, the distance from the sun to the target, and
predict just half of the bending predicted by the general the distance of closest approach of the signal to the center
theory. of the sun. Since one does not have a Newtonian value
The angle of bending for light passing at a distance d for the round-trip travel time with which to compare the
from the center of the sun can be computed by using the measured time it is necessary to monitor the travel time
equation of motion (19) with gµν d x µ /d λ d x ν /d λ = 0. as the target passes through superior conjunction and look
Using the form for the gravitational field given by Eq. (37), for a logarithmic dependence.
the angle of bending is given by The use of a planet such as Mercury or Venus as a tar-
= (1 + γ )2G M c2 d . (38) get is complicated by the fact that its topography is largely
unknown. As a consequence, a signal could be reflected
For light that just grazes the edge of the sun = 1 .75 from a valley or a mountaintop without our being able to
when γ = 1. Because of this small value it is necessary to detect the difference. Such differences can introduce er-
observe stars whose light passes very close to the edge of rors of as much as 5 µsec in the round-trip travel time.
the sun, and this can be done only during a total eclipse. Artificial satellites such as Mariners 6 and 7 have been
The apparent positions of these stars during the eclipse used to overcome this difficulty. Furthermore, since they
are then compared to their positions when the sun is no are active retransmitters of the radar signal they permit an
longer in the field of view in order to measure the amount accurate determination of their true range. Unfortunately,
of bending. Unfortunately, such measurements are beset fluctuations in the solar wind and solar radiation pressure
with a number of uncertainties. Thus the measurements produce random accelerations that can lead to uncertain-
made by Eddington and his co-workers had only 30% ties of up to 0.1 µsec in the travel time. Finally, spacecraft
accuracy. The most recent such measurements were made such as the Mariner 9 Mars orbiter and the Viking Mars
during the solar eclipse of June 30, 1973, and yielded the landers and orbiters have been used as targets. Since they
value are anchored to the planet they will not suffer such accel-
1
2
(1 + γ ) = 0.95 ± 0.11. (39) erations. The most recent measurements by Reasenberg
et al. in 1979 have yielded a value
The use of long-baseline and very-long-baseline in-
terferometry, which is capable in principle of measur- (1 + γ )/2 = 1.000 ± 0.001. (41)
ing angular separations and changes in angle as small as
3 × 10−4 , has made possible much more accurate tests of
3. Planetary Motion
the bending of light. These techniques have been used to
observe a number of quasars such as 3C273 that pass very Long before the general theory was proposed, it was
close to the sun in the course of a year. Beginning in 1970, known that there was an anomalous precession of the per-
these observations have yielded increasingly accurate de- ihelion (distance of closest approach to the sun) of the
P1: GNH/GUR P2: GLM Final Pages
Encyclopedia of Physical Science and Technology En014A-658 July 28, 2001 17:6
planet Mercury that could not be accounted for on the ba- inconsistent with the observed value by about two stan-
sis of Newtonian theory by taking into consideration the dard deviations. Unfortunately, the present measurements
perturbations on Mercury’s orbit due to the other planets. of the orbit of Mercury are not sufficiently accurate to
At the end of the last century, Newcomb calculated this separate the post-Newtonian and quadrupole effects.
residual advance to have a value of 41 .24 ± 2 .09 of arc A resolution of this difficulty has come from an anal-
per century. ysis of the ranging data for the planet Mars. Since the
The field values given by Eq. (37) and the equation of quadrupole contribution to the perihelion advance has a
motion (19) together yield an expression for the perihelion different dependence on the semimajor axis from the grav-
advance per period that is given, to an accuracy commen- itational effect, it is in principle possible to separate the
surate with the accuracy of the observations, by two by observing the advance for different planets. In spite
of the smallness of these effects on the orbit of Mars, the
δ ω̄ = (6π GM /c2 p) 13 (2 + 2γ − β) accuracy of the Viking data from Mars, which are accu-
2 2
+ J2 R c 12GM p , (42) rate to within 7 km, combined with the radar data from
Mercury allows such a determination. Using a solar system
where p = a(1 − e2 ) is the semi-latus rectum of the orbit, model that includes 200 of the largest asteroids. Hellings
a its semimajor axis, and e its eccentricity. Using the best has found, with J2 = 0, that
current values for the orbital elements and physical con-
stants for Mercury and the sun, one obtains from Eq. (38) β − 1 = (−0.2 ± 1.0) × 10−3 (44a)
a perihelion advance of 42 .95λp of arc per century, where γ − 1 = (−1.2 ± 1.6) × 10 −3
(44b)
λp = [ 13 (2 + 2γ − β) + 3 × 103 J2 ].
The measured value of the perihelion advance of Mer- α1 = (2.2 ± 1.8) × 10−4 . (44c)
cury is known to a precision of about 1% from optical mea- When J2 was allowed to have a finite value, he found that
surements made over the past three centuries and of about
0.5% from radar observations made over the past two J2 = (−1.4 ± 1.5) × 10−6 (45a)
decades. If one assumes that J2 has the value ∼1 × 10−7 , and
which it would have if it were the consequence of centrifu-
gal flattening due to a uniform rotation of the sun equal to β − 1 = (−2.9 ± 3.1) × 10−3 (45b)
−3
its observed surface rate of rotation, then, using this value, γ − 1 = (−0.7 ± 1.7) × 10 (45c)
Shapiro gives
α1 = (2.1 ± 1.9) × 10−4 . (45d)
1
3
(2 + 2γ − β) = 1.003 ± 0.005, (43)
Hellings also used these data to analyze the nonsymmetric
which is in excellent agreement with the prediction of gravitational theory of Moffat, which was consistent with
general relativity. the Mercury data and Hill’s value for J2 . The result was
This agreement has been called into question by some that
researchers, notably Dicke and Hill. Observations of
J2 = (1.7 ± 2.4) × 10−7 . (46)
the solar oblateness by Dicke and Goldenberg in 1966
led them to conclude that J2 actually has a value of From these results it appears that the predictions of gen-
(2.47 ± 0.23) × 10−5 , leading to a contribution of about eral relativity are confirmed to about 0.1%. However, by a
4 per century to the overall perihelion advance. If true, suitable adjustment of parameters, several competing the-
this would put the prediction of general relativity into se- ories also share this property. What distinguishes general
rious disagreement with the observations. On the other relativity from these other theories is that, aside from the
hand, it would agree with the prediction of the Brans– value for the gravitational constant G, it contains no other
Dicke scalar tensor theory of gravity if an adjustable pa- adjustable parameters.
rameter in that theory were suitably chosen. However, a
number of authors have disagreed with the interpretation
4. Time Varying G
of their observations by Dicke and Goldenberg. These
authors argue that the observations could equally well In addition to the tests discussed above, the solar system
be explained by assuming a standard solar model with data can be used to test the possibility that the gravitational
J2 ∼ 10−7 and a surface temperature difference of about constant varies with time. Such a possibility was first sug-
1◦ between the pole and the equator. More recently Hill gested by Dirac in 1937 on the basis of his large number
has given a value of J2 = 6 × 10−6 , based on his mea- hypothesis. He observed that one could form, from the
surements of normal mode oscillations of the sun. If true, atomic and cosmological constants, several dimensionless
the general relativistic prediction for Mercury would be numbers whose values were all of the order of 1040 . Rather
P1: GNH/GUR P2: GLM Final Pages
Encyclopedia of Physical Science and Technology En014A-658 July 28, 2001 17:6
than being a coincidence that was valid only at the present PSR 1913 + 16 by Hulse and Taylor in 1974. It consists
time, Dirac proposed that the equality of these numbers of a pulsar in orbital motion about an unseen companion
was the manifestation of some underlying physical prin- with a period of 7.75 hr. Its relevance for general rela-
ciple and that they held at all times. Since one of these tivity is twofold: because v 2 /c2 ∼ 5 × 10−7 is a factor 10
numbers involves the present age of the universe through larger than for Mercury, relativistic effects are consider-
its dependence on the Hubble “constant” and hence de- ably larger than any that have been observed in the solar
creases as one moves back in time, the other constants system. Also, the short period amplifies secular changes in
must also change with time in order to maintain the equal- the orbit. Thus the observed periastron advance amounts
ity between the large numbers. One of these numbers, to 4◦ .2261 ± 0.0007 of arc per year compared to the 43 of
however, involves only atomic constants, being the ratio arc per century for Mercury. Furthermore, the pulsar car-
of the electrical to the gravitational force between an elec- ries its own clock with a period that is accurate to better
tron and a proton. Hence the Dirac hypothesis requires than one part in 1012 . As a consequence, measurements of
that one of these atomic constants must be changing on post-Newtonian effects can be made with unprecedented
a cosmic time scale. The constant that is usually taken to accuracy. If this were all, the binary pulsar would still be
vary with time in theoretical implementations of the large an invaluable tool for testing general relativistic orbit ef-
number hypothesis is the gravitational constant. fects. However, it also provides us for the first time with a
There are several ways of constructing a theory with means for testing an essentially different kind of prediction
an effective time-varying gravitational constant. In the of general relativity, namely the existence of gravitational
Brans–Dicke theory, the effective gravitational constant radiation.
itself varies with time: Considerable effort has gone into identifying the pul-
sar companion. It was soon found that the pulsar radio
G eff = G[1 + (Ġ/G)(t − t0 )]. (47)
signals were never eclipsed by the companion. Also, the
An alternative proposal by Dirac assumed that cosmic ef- dispersion of the pulsed signal showed little change over
fects couple to local atomic physics so that the ratio of an orbit, implying the absence of a dense plasma in the
atomic to gravitational time is not constant. The rate of system. These two facts together ruled out the possibil-
change of gravitational time τG with respect to atomic ity of the companion being a main sequence star. Another
time τA is then given as possibility is that it is a helium star. However, since the
dτG /dτA = 1 + φ̇(t − t0 ), (48) pulsar is at a distance of only about 5 kpc from us, such
a star would have been seen. In spite of intense efforts,
where φ is some cosmological field that is supposed to be no such star has been observed in the neighborhood of the
responsible for the effect. In both cases, the net effect is pulsar. The remaining possibility is that it is a compact ob-
to produce an anomalous acceleration in the equations of ject, either a white dwarf, another neutron star, or a black
motion for material bodies. hole.
Since the change in atomic constants is tied to cosmic In the case of conventional spectroscopic binaries, it is
evolution in the large number hypothesis, the expected usually possible to measure only two parameters of the
rate of change in G should be proportional to the inverse system, the so-called mass function of the two masses
Hubble time: M1 and M2 of the components and the product of the
Ġ/G − H0 ∼ = 5 × 10−11 year−1 (49) semimajor axis a1 and the sine of the angle i of inclination
of the plane of the orbit to the line of sight. However, in the
On the basis of the Viking lander data, Hellings concludes case of the binary pulsar one can use general relativity to
that determine all four of these parameters from measurements
Ġ/G = (0.2 ± 0.4) × 10−11 year−1 (50a) of the periastron advance and the combined second-order
−11 −1
Doppler shift and gravitational red shift of the emitted
φ̇ = (0.1 ± 0.8) × 10 year (50b) signals. One finds from these combined measurements
Since these limits are an order of magnitude smaller than that M1 = M2 = (1.41 ± 0.06) M . From the fact that the
what one would expect from simple cosmic scale ar- Chandrasekhar limit on the mass of a nonrotating white
guments, they cast serious doubt on the large number dwarf is about 1.4 M , it appears likely that the unseen
hypothesis. companion is either a neutron star or a black hole.
In addition to the measurements discussed above, it was
discovered that the orbital period P was decreasing with
C. The Binary Pulsar
time. Later measurements gave a value for Ṗ = (−2.30 ±
A new, and essentially unique, opportunity for testing gen- 0.22) × 10−12 sec/sec−1 or about 7 × 10−5 sec/year. If one
eral relativity came with the discovery of the binary pulsar computes the period change due to loss of energy by the
P1: GNH/GUR P2: GLM Final Pages
Encyclopedia of Physical Science and Technology En014A-658 July 28, 2001 17:6
emission of gravitational radiation using the quadrupole Weber bars is thermal noise. As a consequence, second-
formula (34) one obtains a value for Ṗ = −2.40 × generation Weber bars are being constructed that will be
10−12 sec/sec−1 , in excellent agreement with the observed cooled to liquid helium temperatures. Such bars are esti-
value. mated to have sensitivities of h ∼ 10−19 . It is technically
Of course, there are other effects that could change the feasible to construct bars for which h ∼ 10−21 , although
orbital period, such as tidal dissipation, mass loss or accre- that would require cooling to the millidegree level. The
tion onto the system, or acceleration relative to the solar latter value appears to be a lower limit to what can be
system. Furthermore, there could be other contributions to attained with presently available technology.
the periastron advance such as rotational or tidal deforma- One of the drawbacks of Weber bar detectors is that
tion of the companion. Only in the case of a helium-star they are only sensitive to the Fourier component of the
companion would any of these effects contribute signifi- incoming signal whose frequency is equal to the resonant
cantly to the calculated or observed period change. Fur- frequency of the bar. Furthermore, most bars have res-
thermore, it would be truly remarkable if some combina- onant frequencies in the kiloherts range with a smallest
tion of these effects should conspire to give a value for the reported frequency of 60.2 Hz. Unfortunately, most con-
period change equal to that predicted by the quadrupole tinuous wave sources such as binary star systems have
formula. It therefore appears safe to say that for the first much lower frequencies. In an attempt to overcome this
time we have evidence of a qualitatively new prediction of difficulty and to increase sensitivity, a number of groups
general relativity, namely gravitational radiation. Finally, have undertaken the construction of laser interferometer
the data from the binary pulsar seem to rule out a num- detectors. In these devices, a gravitational wave would
ber of competing theories of gravity such as the Rosen change the lengths of the interferometer arms and one
bimetric theory. In such theories this system can radiate would measure the resulting fringe shifts. Such detectors
dipole gravitational waves that result in a period increase. can, in principle, record the entire waveform of an in-
Only a very artificial mechanism could then give rise to coming wave rather than a single Fourier component. It
the observed period decrease. is possible that sensitivities as low as h ∼ 10−22 might be
achieved. The most ambitious of these projects to date is
the LIGO (for laser interferometer gravitational wave ob-
D. Gravitational Wave Detection
servatory) project which is building two such detectors in
The first comprehensive attempt to detect gravitational the United States and is expected to go on line in the next
waves impinging on the earth was begun by J. Weber in few years. It has also been suggested that gravitational ra-
1961. His antenna consisted of a large aluminum cylin- diation could be detected by the accurate Doppler tracking
der ∼1.5 m long with a resonant frequency of ∼1660 Hz. of spacecraft. Such a scheme would, in principle, be able
In the early 1970s Weber announced the detection of co- to detect waves with frequencies in the 1 to 10−4 range.
incident pulses on two of these antennae separated by a Present technology is within one or two orders of magni-
distance of ∼1000 mile. However, attempts to duplicate tude of the sensitivity needed to detect possible signals in
these results by a number of other groups, using some- this frequency range.
what more sensitive detectors than those used by Weber, Possible sources of gravitational waves can be divided
proved fruitless, and it is now generally agreed that the into two groups, those that emit continuously and those
events recorded by Weber were not caused by gravita- that emit in bursts. Possible continuous wave sources are
tional waves. binary stars and vibrating or rotating stars. In the case of
Since Weber’s pioneering efforts, about 15 different binary stars, the strongest emitter known is µ Scorpii,
groups from around the world have undertaken the con- for which h = 2.1 × 10−20 . However, its frequency is
struction of gravitational wave detectors. The sensitivity 1.6 × 10−5 Hz. The largest binary frequency known is
of a detector can be expressed in terms of the smallest 1.9 × 10−3 Hz. However, for this system h ∼ 5 × 10−22 .
strain L/L, where L is a change in the length L of the Other possible continuous wave sources have values for
detector, that can just be measured. This change in length h that are this small or smaller. Thus, estimates of h for
is produced by tidal forces associated with the incident waves from the Crab and Vela pulsars are of order 10−24 to
gravitational wave and hence its measurement leads to a 10−27 , at frequencies between 10 and 100 Hz. In spite of
determination of the Riemann–Christoffel tensor of the these low amplitudes, signal integration over an extended
wave. Since this strain is approximately equal to the di- time can effectively increase the sensitivity of a detector
mensionless amplitude h of a gravitational wave incident by an order of magnitude or more, so the detection of such
on the detector, the sensitivity of a detector is usually given signals is not totally out of the question.
as the minimum value of h that can be detected. Bursts of gravitational radiation can be expected to
The original Weber bars had a sensitivity h ∼ 10−16 . At accompany cataclysmic events such as supernova explo-
present one of the main limitations on the sensitivity of sions, stellar collapse to form neutron stars or black holes,
P1: GNH/GUR P2: GLM Final Pages
Encyclopedia of Physical Science and Technology En014A-658 July 28, 2001 17:6
or coalescence of the neutron stars or black holes in a bi- tronomers to calculate the mass distribution in this cluster
nary system at the end stage of its evolution. One of the including the contribution due to ‘dark matter.’
problems in dealing with such systems is the determina-
tion of the efficiency with which other forms of energy
VII. GRAVITY AND QUANTUM
can be converted into gravitational radiation. Estimates
MECHANICS
range from a maximum of 0.5 to as low as 0.001. Such
events would have characteristic frequencies in the range
A. Hawking Radiation and Black
102 to 105 Hz and those occurring in our galaxy would
Hole Thermodynamics
have amplitudes estimated to be in the range h ∼ 10−18
to 10−17 . Here the problem for detection is not so much For most of its history, general relativity has stood apart
the frequency or the intensity, as it is in the case of con- from quantum mechanics. Early attempts to quantize the
tinuous emitters, but rather the scarcity of such events. gravitational field proved to be largely unsuccessful and
Thus, the supernova rate in our galaxy has been estimated quantum theory usually neglected the presence of gravi-
to be 0.03 per year. If one includes such events in other tational fields. For most problems one could justify this
galaxies the rate increases. For example, at a distance of neglect. The radius of the first Bohr orbit of a hydrogen
10 Mpc the estimated supernova rate is one per year. How- atom held together by gravitational rather than electrical
ever, the corresponding amplitude would be h ≈ 3 × 10−21 forces, for example, would be about 5 × 1030 m, which
to 3 × 10−20 . is almost four orders of magnitude larger than the ra-
By combining the sensitivity estimates for gravitational dius of the visible universe! However, one is not justi-
wave detectors now under construction and the expected fied in ignoring the effects of strong gravitational fields,
amplitudes and frequencies of possible sources we see such as those that occur near the Schwarzschild radius of
that the possibility for detection in the near future is good. a black hole, on the behavior of quantum systems since
Furthermore, as the technology improves, an era of grav- gravity couples universally to all physical systems. When
itational wave astronomy may soon be possible. Since one takes account of the gravitational field in the quan-
gravitational waves are not absorbed by intervening mat- tum description of a system, qualitatively new features
ter as is electromagnetic radiation, such an astronomy emerge. One such feature is the phenomenon of Hawking
may allow us to explore regions of the universe, such radiation.
as the centers of galaxies, that are now blocked to our Even in the vacuum, where there are no real quanta,
view. pairs of virtual quanta of the various matter fields observed
in nature are being continually created and destroyed in
equal numbers. Their presence is manifested in such phe-
E. Gravitational Lenses nomena as the Lamb shift in hydrogen and the Casimir
In many of its effects, a gravitational field acts like a effect. According to Hawking, a black hole can absorb
medium with a variable index of refraction. Thus, two one member of such a virtual pair, leaving its partner to
of the observed effects discussed above, the bending of propagate as a real quantum of the field. The energy needed
light and the time delay of signals as they pass through a for this process to occur is supplied by the gravitational
gravitational field, can be understood on this basis. A fur- energy of the black hole. Hawking was able to show that,
ther consequence of this notion is that there should exist as a black hole formed, such a flux of real quanta should be
gravitational lenses with properties similar to those of or- produced and that it would be equal to the flux produced
dinary optical lenses. The most likely candidates for such by a hot body of temperature T given by
lenses are galaxies. If placed between us and a distant
kT = hg/2π c, (51)
point source such as a quasar, a galaxy can, in effect, pro-
vide more than one path along which light from the source where k is Boltzmann’s constant, h is Planck’s constant,
may reach the observer. As a consequence, one would see and g is the gravitational acceleration at the Schwarzschild
multiple images of the source. Applied to a galaxy, the radius of the black hole and is equal to c4 /4G M, where
bending formula (38) (with γ = 1) gives typical bending M is its mass.
angles of about 1 arcsec. For a solar mass black hole this temperature would be
The first gravitational lens, predicted by Einstein in 2.5 × 10−6 K. However, for a 1012 kg mass black hole it
1936, was observed in 1979. In 1998 a complete ‘Einstein’ would be 5 × 1012 K. Such a black hole would emit energy
ring was observed and the Californian-Arizona Space at a rate of about 6000 MW, mainly in gamma rays, neutri-
Telescope Lens survey lists 43 probable examples of lens- nos, and electron–positron pairs. Hawking has suggested
ing including the most distant galaxy so far observed. In that “primordal black holes” with such masses might have
addition, a study of the multiple images produced by the been formed by the collapse of inhomogeneities in the
light passing through a galactic cluster has enabled as- very early stages of the universe and that some of them
P1: GNH/GUR P2: GLM Final Pages
Encyclopedia of Physical Science and Technology En014A-658 July 28, 2001 17:6
might have survived to the present day. If so, they prob- simply cease to exist. In deriving the emission from black
ably represent our only hope of observing Hawking ra- holes, the gravitational field was treated classically while
diation. However, measurements of the cosmic-ray back- the matter fields were treated quantum mechanically. It
ground around 100 MeV place an upper limit for black has been suggested that when the mass of the black hole
holes with masses around 1015 kg of about 200 per cubic becomes comparable to the Planck mass (M P ), that is,
light-year. the mass one can form from the constants c, h, and G,
The fact that a black hole can radiate real quanta might namely, (hc/G)1/2 ∼ 5 × 10−15 g, the gravitational field
seem to contradict the fact that no radiation can escape can no longer be treated as a classical field. If so, the fate
from a black hole. However, that restriction is only true of a black hole will be decided only when we have a con-
classically. One can think of the emitted radiation as hav- sistent theory of quantum gravity.
ing come from inside the event horizon surrounding the
black hole by quantum mechanically tunneling through
B. Quantum Gravity
the potential barrier created by its gravitational field. Ac-
tually, it is possible for a black hole to emit almost any The search for a consistent quantum theory of gravity still
configuration of quanta, including macroscopic objects. stands as a major challenge for physics. Because of the
Since we cannot have direct knowledge of the interior of extreme weakness of the gravitational force compared to
a black hole, all we can determine are the probabilities for the other fundamental forces in nature, the electro-weak
the emission of such configurations. The overwhelming and the strong, it is clear that the quantum effects of grav-
probability is that the emitted radiation is thermal with a ity would manifest themselves only at extremely short
temperature given by Eq. (50). distances and correspondingly high energies. It has been
That a black hole should have associated with it a argued that one measure of such a distance is the so-called
temperature fits in with some analogies between black Plank length L P formed from G, h, and c and given by
holes and thermodynamics discovered by Bardeen, Carter,
1
Gh 2
Hawking, and Bekenstein. If the energy density of the LP = 10−35 m, (53)
matter that went to make up the black hole is nonnega- c3
tive, it can be shown that, classically, the surface area of since this is the radius at which the Compton wavelength of
the event horizon surrounding it can never decrease with a black hole is equal to its Schwarzschild radius. To probe
time. Moreover, if two black holes coalesce to form a sin- such a small length would, it is further argued, require
gle black hole, the area of its event horizon is greater than energies of the order of a Plank mass which, in energy
the sum of the areas of the event horizons surrounding the units is about 1018 GeV. Thus one might expect that the
two original black holes. These properties are very similar effects of “quantum gravity” would become important for
to those of ordinary entropy. Furthermore, when a black instance when the radius of the early universe was of the
hole forms, all information concerning its structure ex- order of a Plank length. Put another way, one would expect
cept its mass, charge, and angular momentum is lost, and that the laws of classical gravity would no longer be ap-
when ordered energy is absorbed by a black hole it too plicable within such small distances so that any attempt to
is forever lost to the outside world. These considerations describe the evolution of the universe during this “Plank”
led Bekenstein to associate with a black hole an entropy era would require some kind of quantum theory of gravity.
S given by Furthermore the search for a “unified” theory of elemen-
tary particles could not be considered complete without
S = ckA/4h, (52)
the inclusion of the gravitational field and since such a
where A is the surface area of the event horizon surround- theory must, of necessity be a quantum theory, it must in-
ing the hole. clude a quantum theory of gravity. The other argument for
The existence of Hawking radiation raises the question quantizing the gravitational field is that, if it were strictly
of the ultimate fate of a black hole. As it radiates, its mass a classical field, one would be able to determine both the
decreases. Its temperature therefore increases, and hence position and momentum of its sources simultaneously and
so does the rate of emission of radiation. What is left is, at thus violate the uncertainty principle.
this point, speculation. It might disappear completely, it One possible approach to the construction of a quan-
might cease radiating when its mass reaches some critical tum theory of gravity is to proceed along the lines used
value, or it might continue radiating indefinitely, creating a to quantize other fields such as the electromagnetic field.
negative-mass naked singularity. While the latter two pos- One constructs a Hamiltonian version of the theory and
sibilities seem unlikely, the first one implies that whatever then applies the rules of quantum mechanics. The fields
matter went into making the black hole initially would and their conjugate momenta are taken to be operators
P1: GNH/GUR P2: GLM Final Pages
Encyclopedia of Physical Science and Technology En014A-658 July 28, 2001 17:6
and one writes down an appropriate Schrödinger equation. and since one of these excitations corresponds to a mass-
However, when one attempts to apply this prescription one less spin 2 state, that state was identified with a graviton,
runs into apparently insurmountable difficulties. The clas- the putative quanta of the gravitational field. Also, the
sical Hamiltonian theory possesses a number of so-called metric used to construct the string or p-brane action must
constraint equations, algebraic relations between the field satisfy the Einstein equations, albeit in eleven or more di-
variables and their conjugate momenta, which have been mensions, in order to insure the cancellation of a dilational
studied by Peter Bergmann and his students and by Paul anomaly in the theory. Most recently it was shown by
Dirac. These constraints are generators of the infinitesimal A. Strominger and C. Vafa that by counting the energy
coordinate transformations under which the theory is in- levels of the superstring gravitational field for a black hole
variant and as such must satisfy a specific Poisson bracket one obtains the Bekenstein formula (52) for the entropy of
algebra. Unfortunately, in the quantized version of the the- a black hole. What is still lacking in the theory at this time
ory using the gravitational field as the basic field variables however is a reduction, in some classical limit, of the full
no factor ordering of the constraints satisfies an analogous Einstein field equations. Such a derivation would show just
commutation algebra. Furthermore, only “observables,” how the classical gravitational field arises as an ‘emerging’
i.e., quantities that are themselves invariant under the in- property of a more basic underlying structure in much the
variant transformations of the theory are quantized. The same way as the hydrodynamic variables of fluid mechan-
problem of finding such observables in general relativity ics arise from the Boltzmann equation, which itself arises
is a still an unsolved problem. from classical many-body theory. It might even show how
If, on the other hand, one proceeds to quantize the lin- the very notion of a space-time point arises from such a
ear Einstein equations and treat the nonlinear terms as structure.
perturbations one also runs into serious difficulties. As in The Ashtakar approach to constructing a theory of
all quantum field theories one encounters divergent inte- quantum gravity is much less radical than that taken by
grals. In successful theories such as quantum electrody- string theory. Ashtakar starts from the classical Einstein
namics where there only a finite number of such integrals, equations but rather than working with the gravitational
they can be “renormalized” away. However, in the gravi- field as the fundamental variable he tries to recast the the-
tational field case one encounters new divergent integrals ory into a form resembling a Yang-Mills gauge quantum
at each new order of approximation making the theory field theory. In effect he uses the connection which, in
non-renormalizable. terms of the field gµν is given by Eq. (18) and an addi-
In recent times there have been several new attempts to tional tetrad field. In addition the connection is complexi-
construct a quantum theory of gravity. One such approach fied, that is, treated as a complex field. They resulting field
is supergravity. equations, though equivalent to the Einstein equations as
It is an extension of supersymmetric quantum field the- far as their physical content is concerned, have the form
ory in which every boson in the theory has associated of Yang-Mills equations. As in the hamiltonian version
with it a fermion and vice versa. Furthermore, the theory of the Einstein equations, the Ashtakar variables satisfy a
is invariant under the interchange of bosons and fermions. number of algebraic constraints. These latter constraints
Such theories have been used to construct a unified theory however are not as formidable as those between the gµν
of the strong and electroweak interactions between ele- and their conjugate momenta and it appears that they can
mentary particles. In supergravity, one associates a spin by solved nonperturbatively.
3/2 particle called the gravitino with the quantum of the The two approaches, string theory and canonical quanti-
gravitational field. This theory has been shown to have no zation using Ashtakar variables, appear to be incompatible
logarithmic divergences in the lowest two orders of per- since the former is based on elementary excitations (gravi-
turbation theory even when gravity is coupled to matter. tons) while the latter, being nonperturbative, does not. At
Whether supergravity or one of its extensions proves to be present, however, the physical consequences of neither
a viable theory is a matter for future investigation. theory have been explored sufficiently to allow one to de-
The two most recent attempts to construct a quantum cide if indeed they are equivalent or whether in fact either
theory of gravity are Superstring theory and canonical one agrees with observation.
(Hamiltonian) quantization using so-called Ashtekar vari-
ables. Both approaches show considerable promise at this SEE ALSO THE FOLLOWING ARTICLES
time. Superstring theory replaces the point particles of
quantum field theory by strings, membranes or higher di- CELESTIAL MECHANICS • COSMOLOGY • GRAVITA-
mensional extended structures called p-branes. In these TIONAL WAVE PHYSICS • MANIFOLD GEOMETRY •
theories p-brane excitations correspond to particle states MECHANICS, CLASSICAL • MÖSSBAUER SPECTROSCOPY
P1: GNH/GUR P2: GLM Final Pages
Encyclopedia of Physical Science and Technology En014A-658 July 28, 2001 17:6
• QUANTUM MECHANICS • QUANTUM THEORY • RELA- Hawking, S. W., and Israel, W. (eds.) (1979). “General Relativity,”
TIVITY, SPECIAL • STELLAR STRUCTURE AND EVOLUTION Cambridge University Press, Cambridge, U.K.
Hawking, S. W., and Israel, W. (eds.) (1987). “300 Years of Gravitation,”
• X-RAY ASTRONOMY
Cambridge University Press, Cambridge, U.K.
Kaufman, W. J., III. (1977). “The Cosmic Frontiers of General Relativ-
ity,” Little, Brown, Boston, MA.
BIBLIOGRAPHY Misner, C. W., Thorne, K. S., and Wheeler, J. A. (1971). “Gravitation,”
Freeman, San Francisco.
Anderson, J. L. (1967). “Principles of Relativity Physics,” Academic Rindler, W. (1977). “Essential Relativity,” 2nd ed., Springer Verlag, New
Press, New York. York.
Bergmann, P. G. (1968). “The Riddle of Gravitation,” Scribner’s, New Sexl, R., and Sexl, H. (1979). “White Dwarfs, Black Holes,” Academic
York. Press, New York.
Bertotti, B., de Felice, F., and Pascolini, A. (eds.) (1984). “General Rel- Smarr, L. (ed.) (1979). “Sources of Gravitational Radiation,” Cambridge
ativity and Gravitation,” Reidel, Dordrecht, Netherlands. University Press, Cambridge, U.K.
Blair, D. G., and Buckingham, M. J. (eds.) (1989). “Fifth Marcel Gross- Thorne, K. S. (1994). “Black Holes and Time Warps: Einstein’s Outra-
man Meeting on Recent Developments in General Relativity,” World geous Legacy,” Norton, New York.
Scientific, New Jersey. Wald, R. M. (1984). “General Relativity,” Univ. of Chicago Press,
Cooperstock, F. I. (ed.) (1990). “Developments in General Relativity,” Chicago.
I.O.P., Bristol, U.K. Weinberg, S. (1972). “Gravitation and Cosmology,” Wiley, New York.
Evans, C. R., Finn, L. S., and Hobill, D. W. (eds.) (1989). “Frontiers in Wheeler, J. A. (1989). “A Journey into Gravity and Spacetime,” Freeman,
Numerical Relativity,” Cambridge University Press, Cambridge, U.K. New York.
Hawking, S. W. (1988). “A Brief History of Time,” Bantam Books, New Will, C. (1981). “Theory and Experiment in Gravitational Physics,”
York. Cambridge University Press, Cambridge, U.K.
P1: GSR/GLT P2: GLM Final Pages Qu: 00, 00, 00, 00
Encyclopedia of Physical Science and Technology EN014D-657 July 31, 2001 16:50
Relativity, Special
Kathleen A. Thompson
Stanford University
117
P1: GSR/GLT P2: GLM Final Pages
Encyclopedia of Physical Science and Technology EN014D-657 July 31, 2001 16:50
the motion of objects, even when they are moving at ex- I. OBSERVERS, REFERENCE FRAMES,
tremely high speeds with respect to each other. At the be- AND OTHER PRELIMINARIES
ginning of the 20th century many physicists were aware
that there were some difficulties in existing physics the- The process of making measurements and observations
ories. On the one hand there was a theory of mechan- involves some subtleties in relativity. In everyday life we
ics (the theory of motion of bodies) that had been devel- don’t normally worry about the distinction between “when
oped by Galileo, Newton, and others in the 17th century. an event happens” and “when we see an event happen”
This theory was well established and had many successes. because the speed of light is so high that the time for it
There was also a newer but also very successful theory of to travel to our eyes is undetectably small. When dealing
electromagnetism that had been put forth by the Scottish with velocities near the speed of light, as in relativity, we
physicist James Clerk Maxwell in 1861. However, these must be more careful. So to begin, we need to discuss
two theories were hard to reconcile with certain experi- observers in the context of relativity.
mental observations, and they did not seem to fit together We define an observer’s reference frame to be a spa-
consistently. tial coordinate system in which the observer is at rest,
One symptom of the problems was the fact that the along with clocks at locations of interest. These clocks
speed of light in empty space seemed to be always the are at rest in the spatial coordinate system, and we as-
same, no matter how fast the source of the light and/or sume a clock is conveniently located in the vicinity of any
the observer were moving. Based on Newtonian mechan- event whose time of occurrence we need to measure. An
ics (and on everyday experience with velocities), it was event is the mathematical idealization of the concept of
expected that if an observer is traveling in the same direc- something happening at a particular place at a particular
tion as a beam of light, the light should appear to him/her time. Obviously an event (e.g., a handclap) has an exis-
to have a slower speed than it does to an observer mov- tence independent of anyone’s frame of reference. But to
ing opposite to the direction of the light. Of course, since specify an event in a given reference frame, we can give
light moves at extremely high speed, it requires very sensi- three spatial coordinates to tell where it occurs and a time
tive experiments to accurately measure its speed and look coordinate to tell when.
for a dependence on the motion of the source and/or the The observer can specify where an object is located
observer. by giving its spatial coordinates (x, y, z) in his coordi-
As we shall see, it turns out that Newtonian mechanics is nate system. We will use bold-faced notation to denote
really only an approximate theory, useful when the veloc- such three-dimensional vectors, in particular we use x as
ities involved are not too high. Special relativity replaces a short-hand for the position vector (x, y, z). Our observer
Newtonian mechanics and is consistent with Maxwell’s also needs a fourth coordinate so that he can specify the
theory of electromagnetism. Although special relativity is time t when the object is at the location (x, y, z). The ob-
elegant, logical, and one of the cornerstones of modern server in a given reference frame may specify the motion
physics, some of its consequences can seem bizarre and of an object by giving its three spatial coordinates (x, y, z)
paradoxical at first. This is because our everyday expe- in that frame as a function of the time. We imagine that
rience and perceptions involve objects that have speeds the time of arrival at a given spatial location is to be read
much less than the speed of light. The speed in light from a nearby (“local”) clock.
in empty space, denoted by c, is equal to 2.99792458 × The observer (or observers) in a given reference frame
108 m/sec (this number is exact, since the meter is now are assumed to have access to the data in a given frame,
defined to make it so). At this speed, a beam of light can even if the events being measured are distantly located.
travel all the way around the earth about seven times in a However, if the space and time coordinates of a number
second. of events are recorded in that frame, it may take some time
When velocities are much lower than c, the differ- for an observer located at a given place to gather together
ences between the predictions of special-relativistic and all that data. The observer cannot instantaneously find out
Newtonian mechanics are very slight. However, in situa- what is happening at distant locations—it takes time to
tions where the differences between the two theories are transmit information to him. For instance, if a lightning
large enough to be measured, special relativity has always flash occurs at time t at a distance D from the observer, he
turned out to be right. Special relativity is now an essential will not actually see the lightning flash with his eyes until
tool in many scientific and technical fields, including nu- time t + D/c, where c is the speed of light. Nevertheless,
clear physics and reactors, astrophysics, acceleration and we say he observes it to occur at time t, since he is assumed
control of high-energy particles, and our fundamental the- to be able to measure how far it is away, and he can take
ories of the elementary particles from which the universe the light travel time into account. Alternatively, another
is made. observer in the same frame could record the time on a
P1: GSR/GLT P2: GLM Final Pages
Encyclopedia of Physical Science and Technology EN014D-657 July 31, 2001 16:50
Before special relativity, it was generally assumed that of potential energy is the energy a ball has by virtue of
the Galilean transformation is exactly true. For the train being held above the ground, in the gravitational field of
and ball example, it is an extremely good approximation. the Earth. A soon as the ball is let go, it accelerates toward
But suppose the train were instead a very futuristic rocket the Earth, gaining kinetic energy and losing an (approxi-
moving away from some star at 90% of the speed of light. mately) equal amount of potential energy. The reason we
Suppose also that the ball were moving toward the front say “approximately” is that the friction of the air through
of the rocket at 30% of the speed of light (with respect to which the ball falls dissipates a small amount of the energy
the rocket). Then, as we shall see, it would be completely as heat. Heat is “internal” kinetic energy, i.e., the energy of
wrong to conclude that the ball is moving away from the motion of the molecules comprising the ball and the sur-
star at 120% of the speed of light. rounding air. In an isolated system, the total sum of all the
different forms of energy is a constant in any given frame
2. Mass, Energy, and Momentum of reference. This is the law of conservation of energy.
in Newtonian Mechanics Newton’s Second Law tells how the momentum p of an
object changes with time when a force F is applied:
Before proceeding further into relativity we need to briefly
F = dp/dt. (8)
review the concepts of mass, energy, and momentum in
Newtonian mechanics. The mass is a measure of the inertia In Newtonian mechanics, this can also be written F = ma,
of a body, which may be thought of as its “resistance” to where a = dv/dt is the acceleration of the object.
the action of a force. In Newtonian mechanics, the total
amount of mass in an isolated system (that is, a system that 3. Galilean Relativity Principle
has negligible interaction or exchange with the rest of the
universe) is a conserved quantity, i.e., it does not change A principle of relativity had been formulated by Galileo
with time. Also, the mass of an object does not depend on long before Einstein developed the special theory of rela-
how fast it is moving or on its temperature. tivity. Galileo considered the example of a ship in smooth
In Newtonian mechanics, the momentum p of an object waters, carrying passengers who cannot see out, but who
is defined to be mv where v is the velocity of the object have with them a bowl of fish, a bottle dripping water into
and m is its mass. Thus momentum is a vector with three a bowl, and some birds and butterflies. From their obser-
components vations, the passengers cannot tell whether the ship is at
rest or moving forward in a straight line at constant speed.
px = m d x/dt, p y = m dy/dt, pz = m dz/dt. (6) As Galileo said,
The total amount of momentum in an isolated physical sys-
tem is also conserved. However, the amount of momentum . . . the little animals fly with equal speed to all sides of the cabin.
depends on the reference frame. As a very simple exam- The fish swim indifferently in all directions, the drops fall into
ple, consider a single object with mass m that is at rest the vessel beneath, and in throwing something to your friend you
need throw it no more strongly in one direction than another. . . .
in some reference frame. If it is isolated from all outside
influences, it remains at rest—its momentum is zero. In
More formally, this means that the if the fundamental laws
a reference frame moving with velocity v with respect to
of motion hold in one reference frame, then they also hold
the object’s rest frame, the object has momentum equal
in any reference frame moving at constant velocity with
to −mv, also unchanging with time.
respect to the first frame. This is the Galilean relativity
A system consisting of several objects interacting with
principle.
each other can have the objects’ momenta redistributed,
To see one example of this, consider the law of momen-
but the sum of the momenta remains constant. For exam-
tum conservation as expressed in Eq. (7). If we change
ple, in a system consisting of two bodies, the Newtonian
to a frame of reference moving with velocity V with re-
law of momentum conservation says
spect to the original frame, then according to the Galilean
m 1 v1,i + m 2 v2,i = m 1 v1, f + m 2 v2, f , (7) transformation we simply subtract V from each of the ve-
locities in Eq. (7). to get the velocities in the new frame
where the subscript “i” labels quantities before the col-
(v1,i = v1,i − V, etc.). The masses m 1 and m 2 are not
lision and the subscript “ f ” labels quantities after the
changed by transforming to a different reference frame.
collision.
Thus, transforming to the new frame is equivalent to sub-
There are two basic forms of energy: (1) kinetic energy,
tracting (m 1 + m 2 )V from both sides of Eq. (5). So, mo-
the energy of motion of a body, which is defined in Newto-
mentum conservation also holds in the new frame:
nian mechanics by 12 mv 2 ; and (2) potential energy, which
may be regarded as “stored” energy. A common example m 1 v1,i + m 2 v2,i = m 1 v1, f + m 2 v2, f . (9)
P1: GSR/GLT P2: GLM Final Pages
Encyclopedia of Physical Science and Technology EN014D-657 July 31, 2001 16:50
As another example, we can check that Newton’s Second exerts very strong restoring forces when displaced from
Law is invariant under Galilean transformations. Differ- equilibrium and yet there was no direct evidence for its
entiating Eq. (5) gives d 2 x /dt 2 = d 2 x/dt 2 , since t = t existence). Nevertheless, it was assumed by many people
and v is constant. This just says that a = a , i.e., the accel- that there does exist a “luminiferous ether” permeating all
eration of an object is the same as measured in any two space, to play this role. It was expected that the speed of
frames moving at constant velocity with respect to each light should appear different depending on the observer’s
other. Furthermore, in Newtonian mechanics, the masses motion with respect to the ether.
and forces are the same in the two frames. Thus Newton’s The ether hypothesis ran into difficulties, although it
Second Law is valid in the primed frame, if it is valid in managed to survive a number of experimental tests by
the unprimed frame. introducing further hypotheses. For example, in order to
avoid contradictions with some of the experiments, it was
necessary to assume that the ether is partially dragged
B. Propagation of Light
along with moving material media in a very specific way
Maxwell’s theory of electromagnetism showed how elec- that depends on the index of refraction of the medium.
tric and magnetic fields are related to each other and to
the presence and motion of electric charges. The heart
of the theory is Maxwell’s equations for the fields, along 2. The Michelson–Morley Experiment
with an equation (Lorentz electromagnetic force law) that It was important to look for direct evidence of the pre-
gives the force on a charge in terms of the local elec- ferred frame of reference that the ether was supposed to
tric and magnetic field and the velocity of the charge. In provide (if it existed at all). The most famous such at-
Maxwell’s theory, light is a wave consisting of undulating tempts were carried out by A. A. Michelson and E. W.
patterns of electric and magnetic fields. Visible light is Morley. Their goal was to detect an effect on the measured
just a small slice of a spectrum of electromagnetic waves speed of light due to motion of the observer through the
ranging from long wavelength radio waves to very short ether. The experiments used an interferometer invented by
wavelength gamma rays. We use the term “light” to refer Michelson (1881), and later refined in collaboration with
to electromagnetic waves of any wavelength, even if they Morley (1887).
are outside the range visible to our eyes. The basic design of the interferometer is shown in Fig. 2.
Special relativity was originally motivated by the ob- It is essentially a device in which a beam of monochro-
servation of several fairly subtle effects involving light. matic (single-wavelength) light is split, follows two
To understand the issues involved, we must review some
of what was known about light. Early in the 19th century
(1801–1804), Thomas Young carried out a quantitative
demonstration of interference in light, using a double-
slit experiment. Interference is characteristic of waves—it
simply means that when two waves interact, they reinforce
or cancel each other depending on their relative phases of
oscillation. Young’s experiments were followed by de-
tailed studies of interference, diffraction, and polarization
of light, by A. J. Fresnel and others. Thus by Maxwell’s
time, it was well established that light exhibited many of
the properties of a wave.
velocity with respect to the rest of the universe. The freely C. Synchronization of Clocks
falling elevator is an example of an inertial frame that is
In order to make meaningful comparisons of time at differ-
not moving at constant velocity—it is accelerating toward
ent locations in a given frame, it is necessary to synchro-
the center of the earth. All that we require of an inertial ref-
nize the clocks being used in that frame. Since the speed
erence frame is that Galileo’s law of inertia hold to within
of light c is constant in any frame if Postulate 2 is true,
the sensitivity of our measurements, in the region of space
we could proceed as follows. Choose one of the clocks
and for the duration of time that we are concerned with.
to be the reference clock, and set its time to zero. Set the
Hereafter when we say “reference frame” or just
time on each of the other clocks to time Di /c, where Di
“frame,” we shall mean an inertial reference frame unless
is the distance from the reference clock to clock i, and
otherwise specified.
hold clock i’s time at this value. Now let a flash of light
be emitted from the reference clock, and at the same time
B. Postulates of Relativity let it begin to run starting at time t = 0. At the moment
when the flash arrives at any other clock, let it begin to
According to the Galilean relativity principle, the laws run starting at its preset value. In this way, all the clocks
of motion are the same in all inertial frames. Einstein will be synchronized, since the time lag needed for the
extended the relativity principle to ALL the laws of flash to reach each clock will be exactly the preset value
physics, not just the laws of motion, and took it as one of for that clock. It is important to note that this procedure
the basic postulates of his special theory of relativity. synchronizes the clocks according to observers at rest in
a particular reference frame. As we shall see, the clocks
1. Postulate 1 (Principle of Relativity): The funda- will be noticeably out of synchronization according to ob-
mental laws of physics are the same in every inertial ref- servers in a frame moving at high speed with respect to
erence frame. the first frame.
Einstein, in recalling a paradox he had first thought about
when he was sixteen, wrote:
D. Time Dilation in Relativity
If I pursue a beam of light with the velocity c (velocity of light in The postulate that the speed of light in empty space is
a vacuum). I should observe such a beam of light as a spatially constant is the basis for many of the results in relativity that
oscillatory electromagnetic field at rest. However, there seems to go against our normal intuition. As one example of this,
be no such thing, whether on the basis of experience or according
consider the following situation. Suppose that there are
to Maxwell’s equations.
two parallel mirrors a distance D apart. We can use them
to make a clock consisting of a light flash that bounces
In other words, Einstein began to question whether it was
back and forth between the two mirrors. The clock “ticks”
possible even in principle to “catch up with” a beam of
each time the light flash hits either mirror. Suppose that
light. This, along with the failure of all experiments to
this clock is put on a high-speed rocket.
detect any changes of the speed of light in empty space,
First consider how the clock looks to an observer (whom
motivated him to assume.
we shall call O ) who is moving with the rocket and is at rest
with respect to the clock. Observer O sees the light moving
2. Postulate 2 (Invariance of the Speed of Light):
straight up and down along the distance D between mirrors
The speed of light in vacuum is the same in all reference
as shown in Fig. 4(a). According to observer O , each tick
frames.
of the clock takes a time T = D/c.
In particular, the observed speed of light depends neither Let the rocket be moving to the right at speed v with
on the speed of the source of light nor on the speed of respect to another observer whom we shall call observer
the observer. As noted earlier, Newton’s Laws were as- O (see Fig. 4(b)). Common sense would lead us to expect
sumed to be valid in all inertial frames, and these laws that each tick of the clock takes the same time for observer
of motion are invariant under Galilean transformations. O as it does for O , but from the following arguments we
Special relativity modifies this by assuming that ALL cor- see that this cannot be true if relativity is valid.
rect fundamental laws of physics are valid in all inertial First, the dimensions transverse to the direction of mo-
frames, and the fundamental laws of physics are invari- tion are invariant. In the present situation this means that
ant under Lorentz transformations. Newton’s Laws and observer O agrees with observer O that the distance be-
the Galilean transformation are good approximations in tween the two mirrors is D.
many situations, but fail badly when dealing with speeds Next, we note that while the light is going from one
close to the speed of light. mirror to the other, the rocket moves to the right in observer
P1: GSR/GLT P2: GLM Final Pages
Encyclopedia of Physical Science and Technology EN014D-657 July 31, 2001 16:50
the postulate that the laws of physics are the same in all
inertial frames.
Furthermore, there is no inherent asymmetry between
observers O and O . From the point of view of O , it is O
who is moving (at velocity −v along the x-axis). By the
same kind of argument as given above, observer O will
observe clocks that are moving with O to run slow (by the
factor γ ) compared to his own clocks.
same place in the rest frame of the proper clock, that is,
the spatial separation (x)2 + (y)2 + (z)2 between the
two events is zero. Therefore the proper time τ is simply
related to the space–time interval between two time-like
events by
τ = s/c. (24)
Carrying a clock at constant velocity from one event to
another means that the portion of the clock’s world line
between the two events is a straight line in Minkowski
space. If the Minkowski diagram is plotted in the rest frame
of the clock, the line is vertical—there is no change in any
of the spatial coordinates anywhere along the path—not
only is x = y = z = 0 for the path as a whole, but
d x = dy = dz = 0 for any short segment along the path.
FIGURE 5 A Minkowski diagram (including only one spatial di-
mension), showing the world-line of a particle and the past and
Thus, in this frame
future light-cones of the event at O. [Reproduced with permission
dτ = dt (25)
from Jackson, J. D. (1975). “Classical Electrodynamics,” 2nd ed.,
p. 519, Wiley, New York.] for each short segment along the path.
Suppose the clock instead takes a less direct route, but
is still present at both events, i.e., the clock still departs
a curve in space–time. This curve is called the particle’s from x1 , y1 , z 1 at time t1 and arrives at x2 , y2 , z 2 at time t2 ,
world line. An example of a world line starting at event A but it does not travel at constant velocity. In this case it will
and proceeding to event B is shown in Fig. 5. not be true that d x = dy = dz = 0 for all segments along
Note that the absolute value of the slope of a world line the path. The total elapsed time on the clock is obtained by
must never be less than one (when plotted with time on the integrating the proper time along the new path. We assume
vertical axis as shown), since the particle may not exceed that the proper time along each short space–time segment
the speed of light. For a particle which passes through of the path may be calculated treating each sufficiently
O, all points in the white region marked “future” are in small segment of the world-line as a straight line, and we
principle reachable by the particle at later times, since it calculate the proper time assuming the clock moves with
could get to any of these events without exceeding the constant velocity along each short segment. The proper
speed of light. Similarly, a particle at O could in principle time along each such segment is then
have taken a path that allowed it to have been present at
any event in the white region marked “past.” However, all dτ = (cdt)2 − (d x)2 − (dy)2 − (dz)2 /c. (26)
events in the cross-hatched region are inaccessible in the
Comparing segments that range over the same values of
sense that they cannot causally influence, nor be influenced
the time coordinates, it is obvious that the proper time
by, an event at O. This region is sometimes referred to as
along the indirect path (Eq. (26)) is less than the proper
“elsewhere” with respect to event O.
time along the direct path (Eq. (25)) We see that taking
a more circuitous route in spacetime will make the inte-
H. Proper Time grated proper time less than it is if the clock takes the
most direct route! This peculiar feature of the geometry of
We saw earlier that an observer who is in motion with space-time is due to the relative minus sign between the
respect to a clock will always observe it to run slower space and time coordinates.
than clocks at rest in his own frame. This leads us to the
notion of proper time. Suppose that the time between two
events is to be measured. Assume also that the space–
IV. CONSEQUENCES OF THE
time interval between the two events is timelike, so that
LORENTZ TRANSFORMATION
is possible for a clock to be present at both events. If
the clock is carried at constant velocity from Event 1 to
A. Proper Length and Lorentz Contraction
Event 2, then the time between the events as read by that
clock is called the proper time between the events, i.e., τ A well-known result of relativity is that objects are ob-
is equal to t2 − t1 in that frame (we use the Greek letter served to be shorter along their direction of motion than
tau (τ ) to represent proper time). Both events occur at the when they are at rest. This follows directly from the
P1: GSR/GLT P2: GLM Final Pages
Encyclopedia of Physical Science and Technology EN014D-657 July 31, 2001 16:50
Lorentz transformation, but may also be derived by the The closer the velocities are to the speed of light, the more
following simple argument involving time dilation. this expression disagrees with simple addition of veloci-
The proper length of an object is defined to be its length ties. Note that if either of the velocities v or V is equal to
as measured in its own rest frame, Suppose the object is c, then V is equal to c. So this is consistent with the postu-
a straight stick having proper length L 0 . Let a bee fly late that the speed of light in vacuum looks the same in all
directly from one end of the stick to the other at constant reference frames. We also see that so long as both v and V
speed v with respect to the rest frame of the stick. Then are less than c, the magnitude V of the combined velocity
according to an observer in the stick rest frame, the trip will also be less than c. More general velocity combination
will take a time t0 = L 0 /v. The bee sees the stick going formulas, where the velocities are not all along the same
backwards, also at speed v. But according to the bee, from axis, may similarly be derived from the Lorentz transfor-
our previous discussion of time dilation, the trip will take mation, and these conclusions still hold.
less time—the time as measured in the frame of the bee
is tbee = t0 /γ = t0 1 − v 2 /c2 . Therefore, in the frame of
the bee, the distance travelled in going from one end of C. Relativistic Doppler Effect
the stick to the other is only L = vtbee , i.e., It is familiar from our everyday experience that the pitch
of a siren is higher when it approaches and lower when it
L = L 0 /γ . (27)
recedes. This is an example of the Doppler effect, which
Of course, for real bees, the factor γ is so close to 1 that is a change in the observed frequency of a periodic dis-
tbee and t0 are for all practical purposes indistinguishable. turbance, arising from the motion of the source and/or the
However, subatomic particles are often accelerated in high observer. In the case of sound waves, the observed speed
energy physics laboratories to speeds very close to c, so of the waves, as well as the observed frequency, depends
that γ 1. For example at Stanford University, electrons on the motion of the observer with respect to the transmit-
are accelerated in a 3-km (approximately 2-mile)-long ting medium. In other words, sound waves have a “pre-
straight machine called a linac. By the time an electron ferred” frame, namely, the frame in which the transmitting
gets to the end of the linac, its speed is so close to c that medium, typically air, is at rest. Light and other electro-
the Lorentz factor γ ≈ 105 . In the rest frame of an elec- magnetic waves, however, do not have a preferred frame—
tron with this speed, the apparent length of the linac is we have seen that attempts to find an “ether” failed.
only about three centimeters (little more than an inch!). For comparison, we will first derive the nonrelativistic
Doppler formula for sound waves. Then we will derive
the relativistic Doppler effect for light, i.e., for electro-
B. Velocity Combination Formula magnetic waves. Throughout the discussion we use the
general relationship
Consider two inertial frames such that the velocity of one
frame (the primed frame) with respect to the other frame V = λf (30)
(the unprimed frame) is v along the x-axis. We now sup-
pose there is an object having velocity V , also along the between the propagation speed V, wavelength λ, and fre-
x-axis, as observed in the primed frame. What is the ve- quency f of a wave or other periodic disturbance. Also,
locity V of the object as observed in the unprimed frame? the frequency of the wave is just the reciprocal of its
Commonsense experience would lead us to expect that period T :
the velocity in the unprimed frame is just V = v + V . f = 1/T. (31)
As shown earlier, this is what the Galilean transformation
gives. However, relativity tells us that the correct trans-
formation between frames is the Lorentz transformation, 1. Nonrelativistic Doppler Effect for Sound Waves
which gives
The speed of sound is much less than c, so if we assume
the speeds of the source and observer are also much less
d x = γ (d x + v dt )
than c, then Newtonian mechanics is valid to an excellent
v
dt = γ dt + 2 d x . (28) approximation. Suppose that a source of sound waves S
c and a receiver R are moving along the same line which we
Dividing the first equation by the second, and using the take to be the x-axis, and suppose that their velocities with
fact that V = d x /dt and V = d x/dt, we obtain respect to the air are v S and v R , respectively (we speak of
velocities v R and v S here rather than just speeds, since
v + V these quantities do have a sign according to whether they
V = . (29)
1 + vV /c2 are in the direction of the positive or negative x-axis). Let
P1: GSR/GLT P2: GLM Final Pages
Encyclopedia of Physical Science and Technology EN014D-657 July 31, 2001 16:50
of sound waves, the result depends only on the relative is that the speed of light is observed to be the same in any
velocity of the source and receiver. frame, regardless of the state of motion of the observer.
So the front and back flashes also both travel with speed
c in the woman’s frame. A consequence of this is that the
D. Relativity of Simultaneity: Einstein’s
two observers do not agree about whether the two flashes
Train Paradox
are simultaneous.
It is necessary to be extremely careful about using the This may seem paradoxical at first, because it is not
phrase “at the same time” when dealing with relativity. something we are familiar with from everyday experience.
The Newtonian and Galilean concept of an absolute time, We see that accepting Postulate 2 (invariance of the speed
flowing uniformly and at the same rate for all observers, is of light) forces us to revise the concept of simultaneity.
not strictly true. The notion of an absolute time is an excel- Relativity of simultaneity applies to any two events which
lent approximation in our everyday experience. However, are separated along the line of relative motion of two dif-
it breaks down when considering situations involving ferent inertial frames: If the two events are simultaneous
relative motion at very high speed. in one of the frames, they will not be simultaneous in the
To see this, we examine a situation sometimes referred other—this follows directly from the Lorentz transforma-
to as “Einstein’s train paradox.” Consider a (very high- tion. The train “paradox” is just one illustration of this
speed!) train moving along a straight track. Let a woman general statement.
be at sitting on the train exactly equidistant from its two Most “paradoxes” in relativity can in the end be reduced
ends, and let a man be standing on the ground right next to some confusion arising from the fact that whether or not
to the tracks. Suppose that the man sees the flashes from two events are simultaneous depends on the observer’s
two bolts of lightning striking the ends of the train at the frame of reference. With this in mind, it is instructive to
exact same moment when the woman is passing him on the look at two more famous relativistic paradoxes and how
train. At this moment the man knows he is also equidistant they are resolved.
from both ends of the train (from symmetry, assuming he
knows the woman is sitting equidistant from both ends).
E. The Twin Paradox
So, knowing that the speed of light is c, he concludes that
the light flashes from each end will take the same amount According to the principle of relativity, all processes in a
of time to reach him and therefore the strikes at each end given frame, including the biological workings of a human
of the train occurred simultaneously. He also concludes body, must undergo the same amount of time dilation as
that the flash from the front of the train will arrive at the all the other clocks and physical processes in that frame.
woman before the flash from the back arrives, because she Any inertial frame is supposed to follow the same physical
is moving toward the front flash and away from the back laws as any other inertial frame, and this would be vio-
flash. lated if a person saw a clock in his own frame run slower
Now, what does the woman say about these events? with respect to his own heartbeat when he changed his
The mere fact of changing reference frames, e.g., from speed (of course, if this happens to a race car driver, it is
the man’s frame to the woman’s frame, cannot change the a psychological and not a relativistic effect!)
time ordering of events which occur at the same location A very famous paradox involving relativistic time dila-
in some frame (in this case, at the location of the woman tion is the twin paradox. Suppose there are twins, Astro
in her frame). To see this, suppose for example that the and Homer, and that Astro makes a round trip journey to
woman is carrying a device which does nothing if it re- a star 20 light-years away while Homer remains on the
ceives the front flash first, and explodes if it receives the Earth. A light-year is the distance light travels in 1 year.
back flash first. Obviously, the functioning of the device [Here we use the convenient device of measuring time and
must be independent of whose frame the situation is ana- distance in the same unit, the year.] To make the analy-
lyzed from—the explosion either occurs or does not occur sis simple, we assume that Earth and the destination star
in both frames. So the device must receive the flashes in are both at rest in the same inertial frame, and that Astro
the same order in both frames. travels at constant speed (say 80% of the speed of light)
At first glance there is nothing surprising about the straight to the star, instantaneously reverses direction upon
fact that the woman sees the front flash before the back getting there, and returns at the same speed. Since the one-
flash. Based on our ordinary experience with velocities way distance L is 20 light-years in Homer’s rest frame and
we would be tempted to say that in the woman’s reference β = v/c = 0.8, Homer calculates that the total time T for
frame, the front flash travels faster than c and the back the trip will be 2L/v = 50 years.
flash travels slower than c, if c is the speed of light in the However, from the Lorentz time dilation, Homer knows
man’s frame. However, one of the postulates of relativity that Astro’s clock will be running slow by the factor
P1: GSR/GLT P2: GLM Final Pages
Encyclopedia of Physical Science and Technology EN014D-657 July 31, 2001 16:50
event, according to Astro while he is in the outbound takes another 20 years to get back to Earth. According to
frame. the Doppler formula, Homer receives pulses originating
This line of simultaneity for the outbound frame S in- during the outbound leg at the rate
tersects the t-axis, where x = 0 (i.e., at the location of
1 − β 1/2
Earth), at t = 9 years. This is in agreement with the asser- · (1 flash per year) = 1/3 flash per year.
1+β
tion that Astro sees Homer’s clock run slow by the factor (42)
γ —the outbound trip takes 15 years according to Astro,
Therefore he receives a total of
thus 15/γ = 9 years should elapse on Homer’s clock, ac-
cording to Astro’s definition of simultaneity. (45 years)(1/3 flash per year) = 15 flashes. (43)
By similar reasoning, the lines of simultaneity for frame
During the inbound leg he receives pulses at the rate
S , the frame of Astro during his inbound journey, have
slope −v/c. The downward-sloping dotted line is the line 1 + β 1/2
· (1 flash per year) = 3 flashes per year.
of all events simultaneous with the turnaround event, ac- 1−β
cording to Astro when he is in the inbound frame. The (44)
intercept on the t-axis is at t = 41 = 50 − 9 years, again He receives these flashes over the course of the remaining
consistent with Astro’s expectation that 9 years should 5 years, for a total of 5 · 3 = 15 more flashes. Thus he has
elapse on Homer’s clock during Astro’s return journey. received a total of 15 + 15 = 30 flashes, as he should since
However, Astro’s definition of simultaneity with events on Astro ages 30 years and therefore sent Homer 30 flashes
Earth jumps ahead by 41 − 9 = 32 years when he changes during the trip.
frames! Of course, no sudden jump occurs on the clocks Now let us calculate how many flashes Astro sees. Ac-
in any frame! It is just that the definition of simultaneity cording to him the outbound leg of the trip takes 15 years,
of events is relative—it depends on the observer’s frame during which time he receives flashes at the rate of 1/3 per
of reference. Clocks at a given location (say at the earth) year, for a total of 5 flashes. The inbound trip also takes
which have been synchronized in different frames will 15 years, during which time he receives flashes at the rate
in general have different readings. Suppose that the twins of 3 per year, for a total of 45 flashes. Thus over the course
were exactly 30 years old when Astro departed on his jour- of the entire trip he receives 45 + 5 = 50 flashes, in agree-
ney. If Astro has just reached the turnaround point, he will ment with the fact that Homer has aged 50 years when
say that it is his own 45th birthday and if he is still in the they are reunited.
outbound frame, he will say that back on Earth it is “now” If a real human were going on such a journey, he or
Homer’s 39th birthday. However, if an instant later he has she would need to be accelerated gradually rather than
made the transition to the inbound frame, he will say that instantaneously. The calculation of the age difference of
Homer is “now” celebrating his 71st birthday. According the twins, although more complicated, could still be done
to Astro’s clocks it takes 15 years to return, making him and would still show Astro to end up younger than Homer.
60 years old at the twins reunion. As noted before, ac- Our civilization does not yet have the technology and re-
cording to Astro, 15/γ = 9 years will elapse for Homer sources to send living beings on such high-speed journeys,
during the return journey, making Homer 80 years old at so this particular experimental test has not been done!
their reunion. However, time-dilation effects have been experimentally
We can gain further insight by looking at how the sit- verified by observing subatomic particles. For example,
uation evolves if each twin is periodically sending out a there is a type of particle called a muon, which has only a
light flash at some frequency f as measured by his own short lifetime before it disappears, producing other parti-
clock, say one flash at each birthday. Then each twin can cles. (The produced particles are the familiar electron and
count the flashes received from the other, and by the time particles called neutrinos.)
the twins come back together again, each twin must have It cannot be predicted exactly when this “decay” of a
received all the signals sent out by the other during the trip. given muon will occur, but the average lifetime before de-
To be consistent with Homer’s faster aging, Astro should cay is about 2 × 10−6 sec. This is the lifetime of a muon as
receive a total of 20 more flashes than Homer does over observed in its rest frame. However, muons often travel at
the course of the trip. speeds very close to the speed of light relative to observers
First consider what Homer sees. He says each leg of the on the Earth. When a large number of muons having a
trip takes 25 years. Thus the flash from the turnaround given speed are observed, it is found that the average of
event originates 25 years after Astro’s departure from their lifetimes does indeed increase by the factor γ .
Earth, according to Homer. However, he doesn’t see it un- Time dilation is the reason that significant numbers of
til 45 years after Astro’s departure from Earth because the muons from cosmic rays are observed at the Earth’s sur-
turnaround point is 20 light-years away and so this flash face. Muons are produced high in the atmosphere of the
P1: GSR/GLT P2: GLM Final Pages
Encyclopedia of Physical Science and Technology EN014D-657 July 31, 2001 16:50
Earth when cosmic rays enter from outer space.If their life-
times were not increased due to travelling near the speed
of light, very few of them would reach the surface of the
Earth before decaying.
back door is opened, allowing the head of the pole to Other stopping methods are of course possible. All seg-
pass through before the tail of the pole passes through the ments of the pole could be brought to rest with respect to
front door. In this frame, there is never a time when both the barn by stopping each piece simultaneously as mea-
doors are simultaneously closed. There had better not be, sured in the initial rest frame of the pole. This means that
because in this frame the pole is twice as long as the barn, each segment of the pole is given a backwards velocity
so it cannot fit completely inside at any time. In neither of −v as observed in the initial rest frame of the pole. The
frame does the pole touch either door. Something would length of the pole in the initial pole rest frame would still
be wrong if our reasoning told us the pole could dent a be L but this would no longer be its proper length since it
door in one frame but not in the other. We could look ends up moving at −v with respect to that frame, and hence
at the doors after the fact, and whether or not there is a is Lorentz-contracted by the factor γ = 2. Once again, the
dent cannot depend on the observer’s frame of reference. pole has been deformed to a new proper length—this time
However, the answer to the question “Was the pole ever it is stretched or pulled apart to a new proper length of
entirely inside the barn?” does depend on the observer’s γ L = 2L.
frame of reference. Stopping the pole while maintaining its original proper
Suppose that an observer in the barn rest frame decides length throughout the transition from the original pole
she will “prove” that the pole is “really” shorter than the rest frame to the barn rest frame is more complicated to
barn by suddenly bringing it to rest while (according to analyze. In this case, the pole moves smoothly through an
her) it is totally inside the barn. However, “suddenly bring- infinite sequence of different Lorentz frames. Einstein’s
ing the pole to rest” means that all pieces of it are brought general theory of relativity is better suited to handling
to rest simultaneously. As we already know, events along such situations involving continuous acceleration than is
the direction of motion—in this case along the length of special relativity.
the pole—that are simultaneous in the barn frame are not Obviously, runners carrying poles do not really run fast
simultaneous in the pole frame. Furthermore, no causal enough for relativistic effects to be significant! But the
influence can go faster than c. It would not be sufficient electrons in the Stanford linac typically travel down the
for the barn-frame observer to just grab the pole at its cen- linac in groups (“bunches”) that are about a centimeter
ter with a single short clamp. There is no immediate effect long as observed in the rest frame of the linac. For a
on the rest of the pole—it takes a nonzero amount of time Lorentz factor γ = 105 , the proper length of a bunch (that
for the ends of the pole to even “know” that the clamp has is, the length as observed in the rest frame of the bunch) is
taken hold of the middle, since no causal influence can go therefore about a kilometer. As noted earlier, in the frame
faster than c. Thus we need many clamps stationed along of an electron with this value of γ the linac is only about
the path of the pole if we want to stop all parts of the pole three centimeters long, obviously much shorter than the
simultaneously. proper length of the bunch. Whether or not the bunch “fits”
So, let us assume that the observer in the barn rest frame inside the linac depends on your reference frame!
times her clamps so that all parts of the pole stop at the
same time in her frame. When the pole has been brought to
rest in the barn, it has been crushed to a length of L/2. This V. RELATIVISTIC TREATMENT OF
is now the proper length of the pole since the pole is now at ENERGY AND MOMENTUM
rest in the barn frame. The structure of the pole really has
been changed—the barn observer has crushed the pole to Two very fundamental laws of mechanics are conservation
half its original proper length using her stopping method. of energy and momentum. These laws are extremely use-
How does this look to an observer in the original rest ful in analyzing physical situations. They say that the total
frame√of the pole? His frame continues to move at velocity amount of energy and the total amount of momentum in an
v = 3c/2 with respect to the barn. According to him, isolated physical system do not change with time. Since
clamps near the head of the pole take hold before clamps they are so useful, we would hope to have similar conser-
near the tail of the pole. So he agrees that the pole will get vation laws in special relativity. We will also want the new
crushed to a shorter length. But the final length as observed relativistic definitions of mass, energy, and momentum to
in his frame is not its proper length, since the pole does reduce to the old Newtonian ones in situations where the
not end up at rest with respect to his frame. He continues velocities involved are much less than the speed of light.
to move at speed v with respect to the barn frame, which All this can be done, but some modifications to our notions
is the new rest frame of the pole. Therefore the pole is of mass, energy, and momentum are required. To motivate
Lorentz contracted from its new proper length of L/2 by the new definitions, we will look at situations involving
another factor of γ = 2, to an apparent length of L/4 in collisions between two objects. Even in these simple situ-
his frame. ations the arguments are a little more complicated than in
P1: GSR/GLT P2: GLM Final Pages
Encyclopedia of Physical Science and Technology EN014D-657 July 31, 2001 16:50
A. Relativistic Momentum
Momentum conservation would not be preserved under
Lorentz transformations, if we used the Newtonian def-
inition of momentum p = mv, where m is the mass. Let
us try introducing a multiplicative function f (v) that is a
function of the speed v of the object (remembering that
speed v is just the magnitude of the velocity v, i.e., v = |v|).
We start by guessing that the momentum in a frame where
the object moves with velocity v is given by
m f (v)v. (45)
Our first goal is to see if we can find a functional form
for the dependence of f (v) that satisfies both momentum FIGURE 9 Elastic collision of particles A and B. (a) As viewed
conservation and the principle of relativity (i.e., momen- in center of mass frame (chosen to be unprimed frame). (b) As
tum conservation holds in all inertial frames). For v c viewed in frame moving with longitudinal velocity of particle A
we want the momentum to agree with the Newtonian def- (primed frame).
inition, thus we require f (v) → 1 as v → 0.
To find out what f (v) must be, we consider an elastic
collision between two objects of equal mass m. “Elastic” component during the collision, the y component of mo-
means that the two objects bounce apart without dissipat- mentum change in the collision is −2m f (v)D/TC M for
ing any of their incoming kinetic energy as other forms A and +2m f (v)D/TC M for B. From the symmetry of the
of energy. We will look at this collision from two differ- collision as viewed in this frame, it is obvious that the total
ent inertial frames. Choose one frame to be the center of momentum is conserved—it is zero both before and after
mass (CM) frame, which is the frame in which the total the collision.
momentum is zero. In this frame the momenta of A and B Now we impose the requirement that momentum con-
are equal in magnitude and opposite in direction. We can servation should hold for any other inertial frame as well.
choose the spatial coordinate axes in this frame, which we In particular, we consider a (primed) frame that is mov-
will label the unprimed frame, so that the collision looks ing to the right at speed vx with respect to the unprimed
as shown in Fig. 9(a). [Note this is not a diagram in space– frame. We choose vx to be the velocity component of A
time; it is simply a diagram of the paths of the objects in along the x-axis, as observed in the unprimed frame. Thus
two space dimensions.] Since the masses are equal and in the primed frame A just moves straight up the y axis
the collision is symmetric in this frame, A has an equal and back down, while B follows a longer path, as shown in
and opposite velocity to B, both before and after the colli- Fig. 9(b), with velocity component vx,B along the x-axis.
sion. Since the collision is elastic, the speed of each object [Note that vx,B is NOT the value −2vx that we would get
before the collision is the same as its speed afterwards. if the Galilean transformation, and hence straightforward
In this frame, we assume that A departs from the line addition of velocities, were valid.]
y = −D/2 at the same time B departs from the line We know that Lorentz transformations along a given
y = +D/2, and they have equal speeds v. At a time TC M /2 direction do not change the observed distances transverse
later, they collide at the origin and each reverses its compo- to that direction, so the total distance in the y direction
nent of velocity along the y-axis. After an additional time travelled by each object is also D in the primed frame.
TC M /2, A returns to the line y = −D/2 and B returns to We also know that events which are simultaneous in one
the line y = +D/2. Since the total distance travelled by frame are not simultaneous in another, when the events
each object along the y direction during the time TC M is are separated along the direction of relative motion of the
D, the magnitude of the y velocity component is D/TC M . two frames. By symmetry, the departure events were si-
Since each object reverses the direction of its y velocity multaneous in the unprimed frame and so were the return
P1: GSR/GLT P2: GLM Final Pages
Encyclopedia of Physical Science and Technology EN014D-657 July 31, 2001 16:50
(1 − v 2 /c2 )2 Substituting for V and 1 − V 2 /c2 from Eqs. (60) and (61)
1 − V 2 /c2 = . (61) we find
(1 + v 2 /c2 )2
2m
Therefore the energy of A in the unprimed frame is M= . (66)
1 − v 2 /c2
mc2 (1 + v 2 /c2 ) Clearly mass is not conserved in the collision, since the
EA = = mc2 . (62)
1 − V 2 /c2 (1 − v 2 /c2 ) total mass of the incoming objects was 2m. The final mass
P1: GSR/GLT P2: GLM Final Pages
Encyclopedia of Physical Science and Technology EN014D-657 July 31, 2001 16:50
initial kinetic energy of the incoming particles may be ab- VI. SPECIAL RELATIVITY AND
sorbed as an increase in the internal energy of motion of ELECTROMAGNETISM
the molecules comprising the composite body formed in
the collision (and perhaps its environment). This means Maxwell’s equations are one example of the postulate that
that the temperature is very slightly increased. What is fundamental laws of physics should be invariant with re-
different in relativity is that when the internal energy and spect to changes from one inertial frame to another via
temperature change, so does the mass. The change in mass a Lorentz transformation. Relativity shows that electric
is extremely tiny in situations we are familiar with in ev- and magnetic fields are aspects of the same entity—the
eryday life, since the conversion factor c2 between “mass” electromagnetic field. Indeed Einstein said,
and “energy” is so huge. As we have already noted, mass
(and thus rest energy) is not a conserved quantity in rela- What led me more or less directly to the special theory of rela-
tivity. tivity was the conviction that the electromotive force acting on
Conversion of rest energy to other forms of energy oc- a body in motion in a magnetic field was nothing else but an
curs on a significant scale in nuclear reactions. These are electric field.
of two basic types, fission reactions and fusion reactions.
In a fission reaction, a single atomic nucleus dissociates More precisely, when a pure electric field in one frame
into two or more pieces, where the sum of the masses of (e.g., the unprimed frame) is viewed from another frame
the pieces is less than the mass of the original nucleus. (the primed frame) in motion with respect to the first, there
The difference in rest energy is transformed into kinetic can be a nonzero magnetic field in the second frame.
energy of the pieces. In a fusion reaction, two nuclei fuse to As usual we will assume the primed frame is moving
form a nucleus that has less mass than the sum of original with speed v along the x-axis, and that the x and x axes
nuclei. The corresponding leftover rest energy is released are in the same direction. In this case, the transformation
partly as high energy electromagnetic radiation (including of the components of the electric field and the magnetic
ordinary visible light). This is the mechanism by which field is given by
the sun and other stars shine.
E x = E x , Bx = Bx
On a much smaller scale, the same type of thing hap-
E y = γ [E y − (v/c)Bz ], B y = γ [B y + (v/c)E z ] (75)
pens when energy is released or absorbed in normal chem-
ical reactions. The amount of energy involved is so small E z = γ [E z + (v/c)B y ], Bz = γ [Bz − (v/c)E y ].
however, that the fractional difference in mass between We shall not derive this result here. However, it follows
the initial and final constituents is extremely tiny. from the Lorentz transformation plus one additional as-
sumption.
This additional ingredient is Coulomb’s Law, which was
D. Force and Newton’s Second formulated in the 18th century. Coulomb’s Law gives the
Law in Relativity force F exerted by one charge (with charge q1 ) upon an-
other charge (with charge q2 ), assuming both charges are
Newton’s Second Law, which relates the change in mo-
at rest:
mentum of an object to the force F acting upon it, may be q1 q2
written in the form F =k 2 . (76)
r
F = dp/dt. (74) Here r is the distance between the two charges, k is a
constant, and the force F acts along the line through the
two charges. If q1 and q2 have the same sign, the force
It turns out to be useful to define force such that this
is repulsive; if they have the opposite sign, the force is
equation may still be used in relativity. In non-relativistic
attractive. Coulomb’s Law may be rewritten as
physics, Eq. (74) is equivalent to both F = ma and
F = mv/dt. This is not the case in relativity since p = γ mv F = q2 E, (77)
and γ depends on the velocity. In fact, in relativity the force
where E is the magnitude of the electric field at q2 :
and the acceleration are not necessarily even in the same
direction. q1
E = k 2. (78)
As v approaches c, the Lorentz factor γ approaches r
infinity and thus so do the momentum and the energy. The magnetic field is an effect that may arise simply be-
This is consistent with the observation that an object with cause one has changed reference frames. This is apparent
nonzero mass can never quite reach the speed of light, no from the above transformation. For example, suppose that
matter how much force is exerted on it. in the unprimed frame there is no magnetic field, that is,
P1: GSR/GLT P2: GLM Final Pages
Encyclopedia of Physical Science and Technology EN014D-657 July 31, 2001 16:50
Bx = B y = Bz = 0. From Eq. (75) we see that in the primed zero as v approaches c, because the electric and magnetic
frame there are nonzero magnetic field components B y forces tend to cancel each other in this limit. The calcu-
and Bz transverse to the direction of motion of the primed lation of the magnetic force on a charge is slightly more
frame if E y or E z is non-zero. Furthermore the transverse complicated than the calculation of the electric force—the
components of the electric field are different in the primed magnetic force is perpendicular both to the direction of B
and unprimed frames. and the direction of the velocity of the charge, and it de-
Consider, for example a single point charge at rest in pends on the magnitude of the velocity. However, the net
the unprimed frame. From Coulomb’s law, we know that result is that the electric and magnetic forces between two
the electric field is radially symmetric as shown on the left electrons having essentially the same x coordinate nearly
hand side of Fig. 11. The field is stronger where the field cancel, if they are moving at sufficiently high speed along
lines are closer together. Suppose the charge is in motion the x-axis.
with velocity v as shown on the right hand side of Fig. 11. Another way to see that the electrons are not much
Then the electric field is “flattened” along the direction affected by each others’ electric and magnetic fields is
of motion as shown. The field strength is increased trans- to take account of Lorentz contraction of the bunch. In
verse to the direction of motion and decreased parallel the frame where the compact bunch is moving at very
to the the
√ direction of motion. In this particular example, high speed, it is Lorentz contracted by a factor γ com-
v/c = 8/9, so that γ = 3. For larger v (and thus larger pared to its proper length. In the rest frame of the bunch,
γ ), the field would be flattened even more. the bunch length (its proper length) is much longer so
This transformation of the electric and magnetic fields that the forces (given by Coulomb’s Law) are reduced due
is relevant in high energy particle accelerators, such as the to the increased distance between particles.
linac discussed earlier. A linac accelerates a large num-
ber of charged particles, for example electrons, in a single
short bunch. The mutual electrical repulsion of the elec- VII. SPECIAL RELATIVITY AND
trons would be extremely strong if such a bunch were so QUANTUM MECHANICS
short when at rest. However when the speed of the bunch
is high enough, the forces between the electrons become Other developments occurring in parallel to Einstein’s for-
almost negligible. This is due to two effects, both arising mulation of special relativity eventually led to an entirely
from the Lorentz transformation of the fields. One effect new picture of the behavior of matter at the subatomic
is that the electric field is mostly transverse as shown, so level, the realm of quantum phenomena. Quantum me-
that it becomes small for electrons having different x co- chanics, a branch of physics that was motivated by studies
ordinates. The other effect is that even for electrons having of atomic structure and radiating bodies, eventually led to
essentially the same x coordinate, the net force approaches modifications of mechanics, in addition to those required
FIGURE 11 Electric field of a charged particle at rest (left), and of a charged particle moving with uniform velocity
v such that γ = 3 (right). [Reproduced with permission from Jackson, J. D. (1975). “Classical electrodynamics,”
2nd ed., p. 555, Wiley, New York.]
P1: GSR/GLT P2: GLM Final Pages
Encyclopedia of Physical Science and Technology EN014D-657 July 31, 2001 16:50
in going from Newtonian mechanics to special relativ- some other frame it can look like a particle originated
ity. According to quantum mechanics, light exhibits both at x2 and was absorbed at x1 . If the particle was electri-
particle properties and wave properties, and so do other cally charged, then from charge conservation its “time-
“objects” (e.g., electrons). reversed” form must appear to have the opposite charge.
Special relativity retains in common with Newtonian The mass is invariant, the same in all frames. Each type of
mechanics the assumptions that (1) light is a wave, and charged subatomic particle is therefore required to have
(2) Newtonian determinism is valid (that is, there is a corresponding type of antiparticle with opposite charge
no limitation in principle on how precisely we can si- and identical mass.
multaneously measure positions and velocities in a sys- A characteristic feature of very high energy collisions of
tem, and we can use these measurements to predict the subatomic particles is that new particle–antiparticle pairs
state of the system at later times). Quantum mechanics, are sometimes created. In some experiments, an electron
on the other hand, says that there is an inherent uncertainty may collide with its antiparticle (the positron) after both
in the measurement process. The minimum uncertainty in particles have been accelerated to very high energy. The
the product of the position uncertainty x and the momen- electron and positron can annihilate each other—all of
tum uncertainty p is given by the Heisenberg uncertainty their rest energy plus their incoming kinetic energy is then
principle available to produce other particles, so long as the sum
of the rest energies of the produced particles is less than
xp ≥ h, (79)
the original total energy and all other conservation laws
where h is Planck’s constant. Planck’s constant is a number (momentum, charge, etc.) are satisfied.
so small that this uncertainty is irrelevant in everyday life
where the objects we deal with are much larger than atoms. VIII. GENERAL RELATIVITY
The combination of special relativity and quantum me-
chanics has been very fruitful, leading to the formulation The term “special” in special relativity refers to the fact
of quantum field theory. This synthesis was achieved by that it is really only a restricted version of a more gen-
the middle of the 20th century, and is our current frame- eral theory of relativity which was put forth by Einstein in
work for describing and understanding the behavior of 1916. General relativity treats all reference frames—not
the most elementary particles observed in nature. As we just all inertial reference frames—on an equal basis. Gen-
have seen, the main motivation driving the development eral relativity can readily be used to analyze situations
of special relativity was the desire to account correctly for in which observers are accelerating and/or gravitational
electromagnetic phenomena. When electrodynamics was fields are present.
extended into the quantum domain, the result was quantum
electrodynamics, the prototype quantum field theory. The
underlying principles were further generalized to allow SEE ALSO THE FOLLOWING ARTICLES
the incorporation of additional phenomena—in addition to
electromagnetism, there exist other interactions between COSMOLOGY • ELECTRODYNAMICS, QUANTUM • ELEC-
certain particles. These include the “strong” and “weak” TROMAGNETICS • MECHANICS, CLASSICAL • OPTICAL
forces, that are important, for example, in understanding INTERFEROMETRY • PARTICLE PHYSICS, ELEMENTARY •
how the nuclei of atoms are constructed. QUANTUM THEORY • RELATIVITY, GENERAL • STELLAR
STRUCTURE AND EVOLUTION • TIME AND FREQUENCY
A. Antiparticles
BIBLIOGRAPHY
One consequence of quantum field theory is the neces-
sity for the existence of antiparticles. This comes about Feynman, R. P., Leighton, R. B., and Sands, M. (1963). “The Feynman
because it is possible for Lorentz transformations to re- Lectures on Physics,” Addison-Wesley, Reading, Massachusetts.
verse the time order of two events x1 and x2 that have a French, A. P. (1968). “Special Relativity,” W. W. Norton and Company,
space-like separation. “Space-like separation” means that New York.
Jackson, J. D. (1975). “Classical Electrodynamics,” 2nd ed., Chapters
the spatial separation of the two events is greater than c 11 and 12, Wiley, New York.
times their time separation. Thus, special relativity would Rindler, W. (1969). “Essential Relativity,” Van Nostrand Reinhold, New
say that there is no way for a particle to propagate from York.
one event to the other, if quantum mechanics is ignored. Sartori, L. (1996). “Understanding Relativity,” University of California
But due to the uncertainty principle in quantum mechan- Press, Berkeley and Los Angeles, CA.
Stachl, J. (1998). “Einstein’s Miraculous Year,” Princeton University
ics, there is a nonzero probability for a particle originating Press, Princeton, NJ.
at x1 to be absorbed at x2 even when the two events have Taylor, E. F., and Wheeler, J. A. (1992). “Spacetime Physics: Introduction
space-like separation. If the separation is space-like, in to Special Relativity,” 2nd ed., W. H. Freeman, New York.