Pre-U Physics

Cambridge Pre-U Physics
S1: Kinematics
S1.1 Using x instead of s for displacement

You will see that the Coursebook refers to displacement as ‘s’, and change in displacement as ‘∆s’.
In Pre-U, displacement may also be referred to as ‘x’ and change in displacement as ‘∆x’.
The equation for velocity then becomes v = ∆x/∆t.
Summary
■ The equation for velocity may also be represented as v = ∆x/∆t.
S2: Accelerated motion
Learning Outcomes
■ use calculus to describe motion, with differentials corresponding to gradients of graphs and
integrals corresponding to areas under graphs
S2.1 Describing motion using calculus
Later in your studies of maths and physics you will learn more about calculus (differentiation and
integration).
The two methods in calculus are differentiation, which corresponds to calculating the gradient
of the graph of a function, and integration, which corresponds to calculating the area under
the graph. We have seen already that gradients and areas are important in
calculating displacement, velocity and acceleration. So why is calculus needed?
Calculus can be applied directly to mathematical functions (equations) and this means we
do not need to plot graphs. Calculus can also give us calculated answers when the graph of a
function is curved and difficult to measure visually. For example, if we have a mathematical
function for the position of an object (such as s = ut + ½at²) we can differentiate this function
to get the velocity.
We write the function and the differentiated function as follows:
s = ut + ½at²
v = ds/dt = u + at
Here, ‘ds/dt’ means ‘the function of displacement s differentiated with respect to time t’.
Using calculus allows us to solve problems with complicated and non-uniform functions for
acceleration. Calculus is especially useful when making calculations involving friction or air
resistance, where the acceleration changes depending on the size and direction of the velocity.
Note that differentiation of velocity produces a function that is closely related to an expression
we have already seen for acceleration, a = ∆v/∆t. For the differentiation of velocity v to give us the
acceleration a, we write a = dv/dt.
So what is the difference between a = ∆v/∆t and a = dv/dt?
Differentiation provides an instantaneous value for a gradient, for example a true
instantaneous acceleration, whereas using ∆v/∆t gives the average acceleration over the time
interval ∆t, which is only an approximation
to the true instantaneous value.
The same applies to integration when compared to counting squares on a graph. When we
count squares under a graph, we are determining an approximate value. Integration of a
function gives us an exact, calculated answer instead. For example, integrating the function
of velocity, v = u + at, gives us a function for the displacement s (taking s = 0 when t = 0), which is written as
s = ∫v dt = ∫(u + at) dt = ut + ½at²
The integrated function for displacement should look familiar!
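The comparison between ∆-based estimates and calculus can be checked numerically. The sketch below is illustrative only: the values u = 2 m s−1 and a = 3 m s−2 and the function names are assumptions chosen for the example, not values from the Coursebook. It shows the average velocity ∆s/∆t approaching ds/dt as ∆t shrinks, and a sum of thin rectangles under the velocity graph approaching the integrated result.

```python
def displacement(t, u=2.0, a=3.0):
    """s = ut + ½at² (u and a are assumed example values)."""
    return u * t + 0.5 * a * t**2

def velocity(t, u=2.0, a=3.0):
    """v = ds/dt = u + at, the exact derivative of the displacement."""
    return u + a * t

def average_velocity(t, dt, u=2.0, a=3.0):
    """∆s/∆t over the interval from t to t + dt."""
    return (displacement(t + dt, u, a) - displacement(t, u, a)) / dt

def area_under_velocity(t_end, steps, u=2.0, a=3.0):
    """Approximate the area under the v-t graph from 0 to t_end with thin rectangles."""
    dt = t_end / steps
    return sum(velocity(i * dt, u, a) * dt for i in range(steps))

# Gradient: the smaller the time interval, the closer ∆s/∆t gets to ds/dt.
for dt in (1.0, 0.1, 0.001):
    print(dt, average_velocity(4.0, dt), velocity(4.0))

# Area: the more rectangles we use, the closer the sum gets to the integral.
for steps in (10, 100, 10000):
    print(steps, area_under_velocity(4.0, steps), displacement(4.0))
```

In both loops the approximate value only matches the calculus result in the limit of very small intervals, which is the sense in which differentiation and integration are exact.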

Calculus can be used to derive the equations of motion and also to produce more advanced
functions such as the rate of change of acceleration – how quickly an acceleration is applied or
removed – which is important in many engineering situations, such as designing motor vehicles
or roller coasters.
Summary
■ Calculus can be used to describe motion, with differentials corresponding to the
gradients of graphs and integrals corresponding to areas under graphs.
End-of-chapter questions
S2.1
A cheetah sets off in pursuit of its prey, an antelope. The antelope is initially 100 m
ahead of the cheetah and running at 20 m s−1. The cheetah accelerates at 15 m s−2 to its
top speed of 30 m s−1 and can keep that speed up for 10 s. Does the cheetah catch the
antelope, and if so, how long does it take?
S4: Forces – vectors and moments
Learning Outcomes
■ understand how to add three or more forces using a diagram
S4.1 Adding three or more forces

To add more than two forces using a diagram, we use the same principle, adding vectors by
drawing arrows end-to-end. The vectors do not need to be at right angles, and each vector
added is drawn on the end of the previous one. The starting point and the order of adding
the vectors are not important, as shown in Figure S4.1. The total or resultant is the vector
that joins the starting point to the end point.
Figure S4.1 The three vectors a, b and c add up to a resultant R, which is the same no matter whether
we start with a, b or c.
If the vectors form a closed polygon, the resultant force is zero. This can occur in a
number of situations. For example:
• When an object is in equilibrium (not accelerating) because all the forces on it balance.
This follows from Newton’s second law: zero acceleration means zero resultant force.
• When only internal forces act within a system, and no external forces. Every internal
force is part of a pair. The other force of the pair is equal in magnitude and opposite in
direction, so every internal pair of forces cancels out. This leaves zero resultant and,
again, a closed polygon.
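Drawing arrows end-to-end is equivalent to adding the components of the vectors. The sketch below uses three made-up force vectors (the component values are assumptions for illustration) to show that the resultant does not depend on the order of addition, and that adding a fourth force equal and opposite to the resultant closes the polygon.

```python
from itertools import permutations

def resultant(vectors):
    """Add (x, y) force vectors by summing their components."""
    x = sum(v[0] for v in vectors)
    y = sum(v[1] for v in vectors)
    return (x, y)

# Three example forces in newtons, given as (x, y) components (assumed values).
a = (3.0, 1.0)
b = (-1.0, 2.0)
c = (1.5, -4.0)

# Every order of drawing the arrows gives the same resultant R.
for order in permutations([a, b, c]):
    print(order, "->", resultant(order))

# A fourth force equal and opposite to R closes the polygon: zero resultant.
R = resultant([a, b, c])
closing_force = (-R[0], -R[1])
print(resultant([a, b, c, closing_force]))
```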
Learning Outcomes
■ understand that frictional forces depend upon the surfaces, the normal force and whether the
surfaces are in motion
S4.2 Friction
In Worked Example 1 in Chapter 4 of the Coursebook we are told the frictional force, but it
is often possible to calculate it. Consider the following: a book is placed on a table and the
table is tilted upwards. At some angle the book begins to slide. Once the book has started
sliding, it will move at a constant acceleration down the slope. How can we explain this, and
calculate the forces that act?
Frictional forces depend on two things:
• the nature of the two surfaces in contact with each other
• the strength of the force normal to those surfaces.
Here, ‘normal’ means ‘at right angles to’. The force that acts at right angles to a surface
is often called the normal contact force. You can observe this easily – it is harder to make
an object slide over a surface if you push down firmly on the object while trying to slide it.
If you only push down lightly, the object is easier to slide. Pushing down hard produces a
larger normal contact force than pushing down lightly.
We can write an equation for the frictional force, F, that acts parallel to the surface in
contact in terms of the normal contact force, N:
F ≤ µN
where µ is a constant called the coefficient of friction. The value of μ depends on the
properties of the two surfaces that are in contact with each other. The coefficient of friction is
a number – it has no units because the frictional force and the normal contact force are both
measured in newtons. The most important part of this equation is the ‘≤’ symbol, which
means ‘less than or equal to’. The equation tells us that the frictional force between two
particular surfaces can increase to exactly balance any applied force up to a limit.
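Figure S4.2 A heavy book placed on a table. The forces marked are the normal contact force
N = 20 N, the weight W = 20 N, the frictional force F and the applied force X.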
In Figure S4.2, a heavy book of weight W = 20 N is placed on a table. The coefficient of
friction µ between the book and the table is 0.3. A horizontal force, X, is gradually increased
until the book starts to move.
The book is not accelerating vertically, so from Newton’s first law of motion we know that
the normal contact force N will be 20 N. As long as the applied horizontal force X is less than
µN = 0.3 × 20 = 6 N, then the frictional force will exactly balance the value of X. Once X is
greater than 6 N, F cannot be greater than 6 N and the horizontal forces become unbalanced.
Friction is a variable force, one that balances and opposes other forces –
up to a limit.
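The meaning of F ≤ µN for the book in Figure S4.2 can be sketched as a short calculation. The function name below is hypothetical, and the model covers only the situation described so far: a single coefficient of friction and a book that starts at rest.

```python
def friction_force(applied, mu, normal):
    """Friction balances the applied force exactly, but cannot exceed mu * normal."""
    limit = mu * normal
    return min(applied, limit)

# The book from Figure S4.2: mu = 0.3, normal contact force N = 20 N.
for X in (2.0, 4.0, 6.0, 8.0):
    F = friction_force(X, mu=0.3, normal=20.0)
    print("X =", X, "N  friction =", F, "N  resultant =", X - F, "N")
```

Up to X = 6 N the resultant horizontal force is zero. Beyond that, friction stays at its limit and the book is left with an unbalanced force.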
We have one other experience to explain. If the force applied is a small amount greater
than the maximum frictional force µN, we would expect the book to accelerate very slowly.
In fact, it accelerates more than we would expect, so that a smaller force is enough to keep it
moving. This suggests that two different explanations of friction are needed, one for before
an object starts moving, and another for after the object has started moving. There are two
different coefficients of friction:
• µS is the coefficient of static friction which applies when two surfaces are not moving
relative to each other
• µK is the coefficient of kinetic friction which applies when two surfaces are moving
relative to each other.
Generally, µS is greater than µK between the same surfaces.
The worked example shows a typical calculation.
Worked Example S4.1
A book of mass 0.8 kg is placed on a surface with coefficient of static friction 0.4.
a The surface is gradually tilted up until the book begins to slide. Find the angle at which it
begins to slide.
b Given that the coefficient of kinetic friction is 0.25, find the acceleration of the book down
the slope.
Figure S4.3 For Worked Example S4.1.
Step 1 We find the angle, θ, at which the book just begins to slide. We do this by taking
components of the forces parallel and perpendicular to the slope:
Perpendicular to the slope N = W cos θ
Parallel to the slope F = W sin θ
Dividing the second equation by the first we get sin θ/cos θ = tan θ = F/N
However, we also know that at the limit of static friction F = µSN and so F/N = µS
Hence the book begins to slide when tan θ = µS.
θ = tan−1(µS) = tan−1(0.4) = 21.8°
Note that the normal force and the component of the weight down the slope are
both proportional to the weight. This means that the value of F/N at the limit of
static friction stays the same for any weight. Therefore, for a particular value of the
coefficient of static friction, the angle at which the book starts to slide is the same
for any weight.
Step 2 Once the angle is very slightly greater than 21.8°, the static frictional force reaches
its maximum value and the book begins to slide. Once it is moving, the coefficient of
friction reduces to μK = 0.25 and so the frictional force drops to μKN.
The force down the slope is now given by W sin θ − F = W sin θ − µKN
Step 3 We need to calculate W and N.

W = mg = 0.8 × 9.81 = 7.85 N
N = W cos θ = 7.85 × cos 21.8° = 7.29 N
Resultant force = 7.85 × sin 21.8° − 0.25 × 7.29 = 2.91 − 1.82 = 1.09 N
Step 4 We use Newton’s second law to find the acceleration:

F = ma
1.09 = 0.8 × a
So acceleration a = 1.37 m s−2 (keeping unrounded values throughout; the rounded figures above give 1.36 m s−2)
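The steps of this worked example can be checked with a short script. This is a sketch, not part of the Coursebook method: the function names are made up, and it uses the fact that dividing the resultant force W sin θ − µKW cos θ by the mass gives a = g(sin θ − µK cos θ), so the mass cancels.

```python
import math

def slide_angle(mu_static):
    """Angle in degrees at which sliding starts, from tan θ = µS."""
    return math.degrees(math.atan(mu_static))

def acceleration_on_slope(theta_deg, mu_kinetic, g=9.81):
    """a = g(sin θ − µK cos θ) for an object sliding down a slope."""
    theta = math.radians(theta_deg)
    return g * (math.sin(theta) - mu_kinetic * math.cos(theta))

theta = slide_angle(0.4)
a = acceleration_on_slope(theta, 0.25)
print(round(theta, 1), "degrees")
print(round(a, 2), "m s^-2")
```

Because no intermediate value is rounded, the script reproduces the 1.37 m s−2 quoted in Step 4.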
Summary
■ The value of the frictional force is given by the expression F ≤ μN where μ is the
coefficient of friction.
■ There are two coefficients of friction:
■ The coefficient of static friction μS, which applies when the two surfaces in
contact are not moving relative to each other.
■ The coefficient of kinetic friction μK, which applies when the two surfaces in
contact are moving relative to each other.
■ Generally, μK is less than μS for the same pair of surfaces.
S4.1
A large shopping trolley is filled up and has a mass of 80 kg. The coefficient of static
friction is 0.2.
a Find the acceleration of the trolley if it is initially stationary and a force of 100 N is
applied to it.
b Find the acceleration of the trolley if it is initially stationary and a force of 200 N is
applied to it.
c Find the acceleration under the same forces if the trolley is already moving. The
coefficient of kinetic friction is 0.08.
S5: Work, energy and power
Learning Outcomes
■ understand that a heat engine is a device that is supplied with thermal energy and converts
some of this energy into useful work
S5.1 Heat engines

A gas trapped in a cylinder by a piston can be used to do work. When the gas is heated, it expands (the volume of the gas
increases) and pushes the piston outwards. If the gas is then cooled, the volume of the gas
decreases and the piston is pulled back inwards. This process can be repeated over and over
in what is called a cycle. A device that uses this cycle is called a heat engine. A heat engine
uses thermal energy to do mechanical work. One example is the combustion engine used in
most cars. Fuel such as petrol (gasoline) is mixed with air to form a gas. This gas is ignited
(combusted) and the heat causes air in the mixture to expand. The expansion pushes a
piston, and this mechanical work turns a crankshaft that connects to the wheels that push
the car. However, such a device can never be 100% efficient as some of the energy used to
heat and expand the gas (to do work) has to be withdrawn when the gas is cooled again. This
is always wasteful and limits the efficiency of engines, which is why petrol and diesel engines
are often only about 30–40% efficient, but electric motors can be over 90% efficient. You will
look at heat engines in much more detail in Chapter 22.
The force produced by an expanding gas is not always constant. For example, when a gas
expands in a combustion engine, the pressure of the gas reduces so that the force on the
piston reduces. We cannot simply multiply the force by the displacement to calculate
the total work done. Instead, we must plot a graph of force (y-axis) against displacement (x-axis).
The work done can then be found by measuring or calculating the area under the force–
displacement graph.
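When the graph is known only as a set of measured points, the area under it can be estimated by splitting it into trapezia. The sketch below assumes a made-up set of force readings for a piston moving through 0.08 m; the numbers are illustrative, not data from a real engine.

```python
def area_under_graph(xs, ys):
    """Trapezium rule: add the areas of the strips between neighbouring points."""
    total = 0.0
    for i in range(1, len(xs)):
        strip_width = xs[i] - xs[i - 1]
        mean_height = 0.5 * (ys[i - 1] + ys[i])
        total += mean_height * strip_width
    return total

# Assumed readings: the force on the piston falls as the gas expands.
displacement_m = [0.00, 0.02, 0.04, 0.06, 0.08]
force_N = [400.0, 300.0, 240.0, 200.0, 170.0]

work_done = area_under_graph(displacement_m, force_N)
print("Work done is approximately", round(work_done, 1), "J")
```

For a constant force the same function reduces to force × displacement, which is a useful check on the method.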
Learning Outcomes
■ understand that gravitational potential is the energy per unit mass of a system
S5.2 Gravitational potential

Another way to write the equation for gravitational potential energy (g.p.e.) is:
EP = m × (g∆h)
We can think of the right-hand side of the equation as containing two terms: m, the mass,
and g∆h, the change in gravitational potential. Potential is a term you will meet later in the
book and is a measure of energy per unit mass, so it has units of J kg−1. In the presence of a
field such as the gravitational field around a planet, each point in space has a different value
for the potential. We can calculate the difference in potential between any two points.
If the distances involved are small, then the gravitational field strength g is approximately
constant and we can use the simple formula g∆h. If we then want to find the change in
potential energy for any object placed in the field and moved between these two points, we
multiply the difference in potential by the mass of the object placed in the field. The concept
of potential is useful because we can calculate the potential and differences in potential
without needing to use the mass of an object placed in the field.
Worked Example S5.1
A car park has floors spaced 3 m apart. Calculate the change in gravitational potential in
going from:
a Level 1 to level 2
b Level 2 to level 7
c Level 5 to level 4
Calculate the change in g.p.e. of a person of mass 60 kg and a car of mass 1500 kg
in each case.
Step 1 Use the equation change of gravitational potential = g∆h
a change of gravitational potential = 9.81 × (2 − 1) × 3 = 29.4 J kg−1
b change of gravitational potential = 9.81 × (7 − 2) × 3 = 147 J kg−1
c change of gravitational potential = 9.81 × (4 − 5) × 3 = −29.4 J kg−1
The change of g.p.e. is found by multiplying the mass by the change of gravitational
potential:
a change of g.p.e. = 60 × 29.4 = 1770 J for the person and 1500 × 29.4 = 44 100 J
for the car
b change of g.p.e. = 60 × 147 = 8830 J for the person and 1500 × 147 = 221 000 J
for the car
c change of g.p.e. = 60 × −29.4 = −1770 J for the person and 1500 × −29.4 = −44 100 J
for the car
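The same calculation can be laid out as a short script, which makes the separation between potential (no mass needed) and potential energy (potential × mass) explicit. The function names are hypothetical. The script prints unrounded values, so they differ slightly from the figures above, which are given to three significant figures.

```python
G_FIELD = 9.81        # gravitational field strength in N kg^-1
FLOOR_SPACING = 3.0   # height between car park levels in m

def potential_change(level_from, level_to):
    """Change in gravitational potential g∆h, in J kg^-1."""
    return G_FIELD * (level_to - level_from) * FLOOR_SPACING

def gpe_change(mass, level_from, level_to):
    """Change in g.p.e. = mass × change in potential, in J."""
    return mass * potential_change(level_from, level_to)

for start, end in ((1, 2), (2, 7), (5, 4)):
    print("Level", start, "to level", end)
    print("  potential change:", potential_change(start, end), "J/kg")
    print("  person (60 kg):  ", gpe_change(60, start, end), "J")
    print("  car (1500 kg):   ", gpe_change(1500, start, end), "J")
```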
Learning Outcomes
■ understand that the efficiency equation can also be written in terms of power as well as energy
S5.3 Calculating efficiency using power

The equation for efficiency that you have seen written in terms of energy can also be written
in terms of power:
efficiency = (useful output power/total input power) × 100%
We can show this is identical to the equation using energy if we multiply top and bottom by
the same time period.
Summary
■ A heat engine is supplied with thermal energy and converts some of this energy into
useful work.
■ Change in gravitational potential is given by g∆h and is measured in J kg−1.

■ Efficiency can be calculated using power instead of energy.
S6: Momentum
Learning Outcomes
■ calculate impulses and relate them to change in momentum
S6.1 Impulse
Newton’s second and third laws of motion can be used to explain why momentum is
conserved. Consider a collision between two objects, A and B. Object A exerts a force FA on
object B with mass mB, causing B to accelerate with acceleration aB = FA/mB. The change in
velocity of B is given by ∆vB = aBt, where t is the time the collision takes (we call this time the
duration of the collision).
Hence ∆vB = FAt/mB by Newton’s second law.
By a similar argument, the change in velocity of A is ∆vA = FBt/mA. (See if you can use the
logic of the explanation above for object B to write down the explanation for object A.) If we
multiply each side of the equation for each object’s change in velocity by the relevant object’s
mass we get:
mB∆vB = FAt and mA∆vA = FBt
The two objects collide with each other, and are in contact with each other for the same
length of time, t, so t is the same in both equations. Newton’s third law tells us that FA = −FB,
as the force exerted by object A on object B has the same magnitude but the opposite direction to the
force exerted by object B on object A. It follows that:
mB∆vB = − mA∆vA so that mA∆vA + mB∆vB = 0
In other words, during any interaction (for example, a collision), the total change in
momentum of any pair of interacting objects is zero.
This analysis has another point to it. The change in momentum of an object, m∆v, can be
calculated by multiplying the force F acting on it by the time t over which the force acts.
The quantity Ft is called impulse and it equals the change in momentum. Impulse can be
measured in kg m s−1, the same units as momentum. The units that are more often used for
impulse are N s.
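The argument above can be followed through with numbers. In this sketch the masses, the force and the collision time are all assumed values chosen for illustration. The only physics used is ∆v = Ft/m for each object and FA = −FB.

```python
def velocity_change(force, time, mass):
    """∆v = Ft/m, from Newton's second law with a constant force."""
    return force * time / mass

m_A, m_B = 2.0, 0.5      # masses in kg (assumed)
F_A = 12.0               # force exerted by A on B, in N (assumed)
F_B = -F_A               # Newton's third law: force exerted by B on A
t = 0.05                 # duration of the collision in s (assumed)

dv_B = velocity_change(F_A, t, m_B)
dv_A = velocity_change(F_B, t, m_A)

impulse_on_B = F_A * t
print("Impulse on B:", impulse_on_B, "N s; momentum change of B:", m_B * dv_B, "kg m/s")
print("Total change in momentum:", m_A * dv_A + m_B * dv_B, "kg m/s")
```

The lighter object changes velocity by more, but the two momentum changes are equal and opposite, so the total change is zero.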
Worked Example S6.1
A ball of mass 100 g is travelling due west at 50 m s−1. It is struck by a racket that exerts a
force due east. The ball and racket are in contact with each other for 160 ms, after which the
ball travels east with a speed of 30 m s−1. Find the average force exerted by:
a the racket on the ball
b the ball on the racket.
Step 1 First we find the change in momentum of the ball. We shall use west as the
positive direction. The ball’s momentum initially is 0.1 kg × 50 m s−1 = 5 kg m s−1.
After the collision the ball has momentum 0.1 kg × 30 m s−1 east = −3 kg m s−1, where
the negative sign indicates that the direction has reversed. Hence the change in
momentum is:
final momentum − initial momentum = (−3) − 5 = −8 kg m s−1
The momentum change is negative, so it is to the east.

Step 2 We now work out the force using the fact that the change in momentum is equal to
the impulse
−8 = F × t
where t = 160 ms = 0.16 s. Hence

F = −8/0.16 = −50 N
Again, the minus sign shows that the force on the ball is to the east.
For part (b) we use Newton’s third law, which tells us that the force exerted on the racket by
the ball is 50 N to the west.
S6.2 Determining impulse from a force–time graph

In a real collision, the forces between two objects are unlikely to be constant. In Worked
Example S6.1, as the ball hits the racket and the strings on the racket stretch, the forces on both
the ball and the racket will increase. As the ball leaves the racket, and the strings on the
racket return to their normal length, the forces will decrease. A graph of force against time
would look like Figure S6.1. In this situation, the change in momentum can be found by
determining the area under the graph. This can either be calculated, if the shape of the graph
is known, or be found by counting squares.
Figure S6.1 When the force varies with time, the impulse is found by taking the area under the
graph (shaded region). The graph plots Force / N against Time / s: the force rises from zero at the
start of the collision to a maximum value and falls back to zero at the end of the collision, 160 ms later.
We can still calculate a value for the force, by taking Ft = m∆v, with t being the total collision
time. This will be the average force.
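The link between the area under the graph and the average force can be sketched for the ball in Worked Example S6.1. The triangular shape of the force pulse and its 100 N peak are assumptions made for illustration; the real shape of the graph is not given. West is taken as positive, so forces to the east are negative.

```python
def momentum_change(mass, v_initial, v_final):
    """m∆v, with west as the positive direction."""
    return mass * (v_final - v_initial)

def triangular_force(t, peak, duration):
    """Assumed pulse: rises linearly to `peak` at mid-collision, then falls to zero."""
    half = duration / 2
    if t < 0 or t > duration:
        return 0.0
    if t <= half:
        return peak * t / half
    return peak * (duration - t) / half

def impulse_from_graph(force, duration, strips=1000):
    """Area under a force-time graph, summed over thin strips (like counting squares)."""
    dt = duration / strips
    return sum(force((i + 0.5) * dt) * dt for i in range(strips))

duration = 0.16                                   # collision time in s
dp = momentum_change(0.1, 50.0, -30.0)            # ball goes from 50 west to 30 east
average_force = dp / duration
impulse = impulse_from_graph(lambda t: triangular_force(t, -100.0, duration), duration)

print("Change in momentum:", dp, "kg m/s")
print("Average force:", average_force, "N")
print("Area under assumed graph:", impulse, "N s")
```

With this assumed shape the peak force is twice the average force, yet the area under the graph still equals the change in momentum.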
question
6.1 A toy cart of mass 0.5 kg is first pushed by a force of 2 N for 4 s, and then a force of 6 N for
1 s. Find:
a the total impulse acting on the cart
b the change in momentum of the cart
c the change in velocity of the cart
d the average force acting on the cart.
Summary
■ Impulse is a vector equal to the change of momentum. It is given by Ft and measured in N s.
■ If the force varies, the impulse can be found from the area under a force–time graph.
S6.1
A toy rocket has a spring in its base which is used to launch it sideways from a wall.
Figure S6.2 shows a force–time graph of the spring during the launch. The rocket has
mass 25 g. Using the graph find the speed of the rocket just after it launches.
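Figure S6.2 Force–time graph for the spring during the launch (Force / N, with 0.06 marked on the
axis; Time / s, with 0.5 and 0.7 marked on the axis).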
S6.2
A tennis ball hits a wall at 30 m s−1, reversing its motion. The ball has mass 80 g. If the
average force between the ball and the wall is 60 N, find the contact time.
S7: Matter and materials
Learning Outcomes
■ distinguish between elastic and plastic deformation of a material
■ recall the terms brittle, ductile, hard, malleable, stiff, strong and tough, explain their meaning
and give examples of materials exhibiting such behaviour
■ explain the meaning of strength, breaking stress and stiffness
■ draw force–extension, force–compression and tensile/compressive stress–strain graphs, and
explain the meaning of the limit of proportionality, elastic limit, yield point, breaking force
and breaking stress
■ state Hooke’s law and identify situations in which it is obeyed
■ account for the stress–strain graphs of metals and polymers in terms of the microstructure
of the material
S7.1 Describing materials

If we plot a stress–strain graph it reveals a lot about the behaviour of the material being
investigated. For example, you have already seen that many metals show a linear behaviour
only up to strains of about 0.1%, whereas rubber has almost no linear part to the graph but
remains elastic up to strains of several hundred percent. We can conclude that both the
shape of a stress–strain graph and the numerical values on the axes indicate the type of
material being investigated.
To begin with, contrast the behaviour of two different metals in Figure S7.1.
Figure S7.1 Stress–strain graph for two different metals, A and B (Stress / MPa from 0 to 300
against Strain / % from 0 to 1.0).
Both metals obey Hooke’s law, although they have different Young moduli, with A being
stiffer than B. However, the graphs tell us much more.
• Metal A obeys Hooke’s law but then breaks suddenly, with only a very small region
beyond the straight-line section. If metal A were to be tested in the experiment shown
in Figure 7.10 (in the Coursebook), it would extend steadily by a fixed amount with each
weight added, and then snap. Metal B would behave very differently. As with A, at first it
would extend a fixed amount with each weight. Then it would extend much, much more
with each weight – perhaps 10 times as much – and it would be possible to see the wire
getting thinner. At some point the wire would continue to stretch even with no more
weight added, and then snap.
• Metal A will break at a larger load than B – it is said to be stronger. It extends up to the
limit of proportionality (the end of the straight-line region of the graph) and then a
little beyond, to the elastic limit. Remember that when a material extends past the elastic
limit, it will not return to its original length because it is deformed. Rather than then
deforming, metal A snaps. A material showing this behaviour is described as brittle.
Among metals, a good example would be cast iron. Among non-metals, most types of
glass show the same behaviour.
• Metal B will extend past the limit of proportionality, past the elastic limit and then will
deform substantially before breaking. When loaded in this way it will be drawn out into a
thinner and thinner wire. We call this behaviour ductility. In fact, electrical wire is often
made this way – by being drawn through a small hole and stretched. A good example
would be copper.
Stretching a metal wire is something that involves deformation in one dimension. A metal
can also be deformed in two dimensions by hammering or rolling and stretching it out flat.
Metals that deform easily in this fashion are called malleable (from the Latin word for a
hammer). The most malleable metal is gold, which is often made into exceptionally thin
sheets called gold leaf, used for decoration. Lead is also very malleable; in the past, lead was
often used as a roofing material.
Some metals such as steel and titanium are not very malleable unless they are heated. If
they are hammered when at normal temperatures they show very little or no deformation.
Sometimes a sample of these metals may shatter into pieces when hammered. These metals
are said to be hard.
More generally, a harder material will scratch the surface of a less hard material.
Geologists use this relative hardness to test and identify minerals. By using a small number
of items with different hardness, such as a piece of glass, a steel penknife and a couple of
different stones, it is easy to determine the hardness of a mineral relative to those materials.
Figure S7.2 summarises some of the important terms discussed so far.
Figure S7.2 A typical stress–strain graph for a metal (Stress / MPa against Strain / %), with the
limit of proportionality, elastic limit, yield point, plastic region, ultimate tensile strength and
breaking stress marked.
The behaviour of a metal like copper can be explained using this graph. Once it is loaded past
the elastic limit, it begins to deform. Beyond that point the graph’s gradient is very shallow,
showing that the material is much less stiff and so the wire extends much more. Eventually
the graph starts to curve downwards (at the ‘ultimate tensile strength’ point). Beyond this
point, the load on the wire is sufficient to keep the wire extending – it actually takes less and
less force to continue to stretch the wire. Eventually the wire reaches its breaking force and
snaps. By knowing the cross-sectional area of the wire at this moment, the breaking stress
can be calculated using the equation stress = force/area.
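As a minimal sketch of the last step, the script below applies stress = force/area to a wire of circular cross-section. The breaking force of 45 N and the diameter of 0.5 mm at the moment of breaking are assumed values for illustration, not measurements for any particular metal.

```python
import math

def cross_sectional_area(diameter):
    """Area of a circular cross-section in m^2, from the diameter in m."""
    return math.pi * (diameter / 2) ** 2

def stress(force, area):
    """stress = force/area, in Pa when force is in N and area in m^2."""
    return force / area

# Assumed values: the wire has thinned to 0.5 mm diameter when it snaps at 45 N.
area = cross_sectional_area(0.5e-3)
breaking_stress = stress(45.0, area)
print("Breaking stress is about", round(breaking_stress / 1e6), "MPa")
```

Using the thinner cross-section at the moment of breaking, rather than the original one, gives a larger stress for the same force.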
Testing Materials
You may wonder how a graph like that in Figure S7.2 is plotted, given that a wire will extend
rapidly once the load exceeds the ultimate tensile strength. Professional testing apparatus
works differently – the test sample is trapped between jaws and stretched (see Figure S7.3). Both
the force applied and the extension are measured. As the material stretches, the apparatus can
alter the force as necessary in order to stretch the wire by constant increments.
Figure S7.3 A materials testing rig and samples.
The importance of deformation

Most people would consider steel to be stronger than glass because glass is much easier to
break. But if glass is prepared as a fresh fibre and then loaded with weights until it breaks, the
ultimate tensile strength can be as high as 4100 MPa, compared to about 500 MPa for steel.
So why is glass easier to break? It is all to do with small cracks.
The surface of a material is never perfectly smooth, but will contain lots of small cracks.
At the tip of a crack the stress is very concentrated:
• in the main part of the material, an applied load is shared across a number of chemical
bonds
• at the tip of a crack, an applied load is concentrated on a single bond
• the material at the tip of a crack breaks, passing the load on to the next bond and so the
crack is said to propagate across the material (see Figure S7.4).
A crack can propagate across a material at the speed of sound.
Figure S7.4 In a brittle material stress can be concentrated at the tip of a crack.
A brittle material, such as glass, does not deform plastically. Brittle materials are much more
likely to form cracks that propagate. A ductile material can deform plastically, so the atoms
can rearrange as the material deforms. This acts to ‘blunt’ the crack and share the load
among more bonds. Resistance to cracking is called toughness. However, a tough material is
not necessarily strong – for example, polythene is tough but it is not very strong.
Cracks will only propagate when a material is placed under tension. Under compression,
a crack will close up. Some materials can be very much stronger under compression than
under tension – examples include stone and concrete. Such materials are often used in
construction. They are wonderful at supporting walls, where they are compressed, but less
good at spanning gaps, where they are stretched (see Figure S7.5).
Figure S7.5 A horizontal structure under load will have areas both of compression and of tension.
To solve this, concrete lintels (horizontal supports such as those over windows or doors) are
often reinforced by the addition of steel rods. The steel performs well under tension, and
prevents the concrete from cracking.
S7.2 Explaining stress–strain graphs

In the previous section we began to consider not only what properties a material has, but why
it has them. The elastic behaviour of a material is explained by the elastic behaviour of the
bonds between its atoms. When a force is applied to an object, the load is shared between all
the bonds. Each bond stretches like a small spring. If the bonds between atoms obey Hooke’s
law, so will the material. When the load is removed, the bonds return to their normal state.
However, if those bonds are placed under too great a load they may break.
If all materials were perfect crystals then their properties would depend only on the
strength of the bonds between the atoms. Strong bonds would provide a high breaking stress
but the materials would be brittle and unable to deform, with bonds breaking when the
material reached the breaking stress. Also, the breaking strain (the percentage increase in
length) of a material would be the breaking strain of its bonds – which can be as little as 1%
of the original length.
Metals
Although metals are crystalline (they are made up of atoms in a regular arrangement), there
are two important reasons why they do not behave in the ‘ideal’ way just described:
1 A crystal is never perfect – the planes (layers) of atoms do not always align. The point where
one plane does not align with the next is called a dislocation. Dislocations allow whole
planes of atoms to slide over one another, and so the material can deform without breaking.
However, this ability to deform is limited. If there are too many dislocations, they can
tangle and restrict the movement of planes of atoms. This leads to something called work-
hardening, where a material that is repeatedly stretched will eventually go brittle and snap.
This is easily demonstrated: a steel paperclip has been bent into shape and can be bent out of
shape; however, if this is done too many times, the paperclip will break.
2 Most metals are polycrystalline. Instead of being formed from a single crystal, most
metals are made up of many grains (crystallites or small crystals). Within any single
grain the atoms will be nicely ordered (apart from dislocations), but each grain is
randomly arranged relative to the others. The presence of boundaries between the grains
(called grain boundaries) also limits the movement of the planes of atoms.
These two factors mean that a sample of a metal made from a single, large crystal will be
brittle. However, a sample containing many much smaller grains will also be brittle.
For metals, the behaviour under compression is very similar to the behaviour under
tension, as the atomic bonds behave the same way in each case. The planes of atoms can slide
over each other in the same fashion.
Polymers
The molecules in polymers consist of many repeated units of atoms bonded together. We call
these long-chain molecules, and their length means they are often found coiled and wound
around each other. When a force of tension is applied, the molecules begin to straighten out.
This requires much less force than stretching the bonds between atoms, and so polymers
(such as polythene or rubber) are much less stiff than metals.
The amount of ‘unwinding’ of long-chain molecules is usually not proportional to the force
applied, so polymers often do not obey Hooke’s law. However, the unwinding means that the
maximum strain can be much greater than that of individual bonds between atoms. Some
polymers can withstand a strain of several hundred percent. Once the molecules are fully
stretched, though, a polymer can become much stiffer. You may have noticed that an elastic
band will extend significantly up to a point, but then stiffen and sometimes break.
Different polymers behave differently under compression and tension. The amount of
compressive and tensile strain a polymer may undergo before strain acts directly on the
bonds within molecules depends on how straight or coiled up the polymer molecules are.
Stretching molecules is not necessarily an elastic process. In some polymers, such as
hardened rubber, there are many cross-links, which are weak bonds either between curved
sections of one molecule, or between one molecule and another (see Figure S7.6). These
cross-links have to be broken before the molecules can stretch. Having more cross-links
makes a polymer stiffer, and it means that more energy is needed for the stretching to take
place. After stretching, cross-links can reform, so the material returns to its original length.
Stretching and shrinking a material made from polymers can cause the material to give off
heat. This is due to energy being released when the cross-links re-form.
Figure S7.6 Cross-links in a polymer.
Amorphous materials
Not all materials are crystalline. In some materials, the atoms or molecules are arranged in
an apparently random pattern. These are amorphous materials, such as glass and ceramics.
Amorphous materials are brittle, because there are no crystal planes able to move, no
dislocations and no cross-links to absorb energy.
Summary
■ A ductile material can be drawn out into a wire, a malleable material can be
flattened into a sheet.
■ Brittle materials break cleanly without deforming.
■ Tough materials deform and so resist cracking.
■ A strong material requires a large stress to break it. This can be measured by a
quantity called the ultimate tensile stress, the breaking stress or the yield stress.
■ A material can show different properties depending on whether forces applied are
tensile or compressive.
■ The characteristics of a stress–strain graph can be explained by the small-scale
structure of a material.
S7.1
Two identical steel wires are tested. The first wire is heated and quenched (placed
quickly in cold liquid) so that it becomes brittle. The second wire is left untreated. Each
wire in turn is loaded with equal masses, one at a time, until they break. Predict how
each of the two wires would behave. Highlight any similarities and differences.
S7.2
A wire of diameter 0.2 mm is gradually loaded with masses. Once a total mass of 2.3 kg
is loaded, the wire starts stretching rapidly and then breaks. Calculate the ultimate
tensile force and thus the stress of the wire.
S13: Waves
Learning Outcomes
■ describe sound waves in terms of the displacement of molecules or changes in pressure
■ explain what is meant by a plane-polarised wave, and use Malus’ law to calculate the
amplitude and intensity of transmission through a polarising filter
■ understand refraction of waves at the interface between two media, and relate the refractive
index to the wave speeds in those media
■ derive the equation for the critical angle and use it to solve problems
■ recall that total internal reflection occurs when a wave is incident at an angle greater than the
critical angle, and that optical fibres use total internal reflection to transmit signals
■ recall that, in general, waves are partially transmitted and partially reflected at an interface
between media
S13.1 Terminology
Another name for a progressive wave is a travelling wave.
We can use the terms frequency and period to describe other periodic (or cyclic)
phenomena too, as you will see in later chapters on oscillations and rotation. The period,
T, is the time for one cycle, and the frequency, f, is the number of cycles per unit time. They
are always related by the reciprocal relationship $f = 1/T$.
Figure 13.8 in the Coursebook (Chapter 13) shows how we can represent longitudinal and
transverse waves. The high pressure regions of a longitudinal wave are called compressions,
and the low pressure regions are called rarefactions.
The sine graph used to represent a longitudinal wave may be plotted as pressure change
against distance, with zero on the pressure axis corresponding to the equilibrium pressure.
Alternatively it may be plotted as displacement against distance, where the displacement
refers to the displacement of the particles from their equilibrium position. The maximum
displacement does not correspond to the maximum pressure, though! At the centre of a
compression (maximum pressure) or rarefaction (minimum pressure), the displacement is
zero. The largest displacements correspond to the points that are between and equidistant
(equal distances) from the compressions and rarefactions. We can describe the displacement
and pressure in a sound wave as being 90° out of phase with each other. Phase difference is
discussed further in the next section.
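The 90° phase difference between displacement and pressure can be checked numerically. The sketch below is illustrative only: it assumes a sinusoidal displacement at one instant and uses the standard result (not derived in this text) that the pressure change is proportional to minus the gradient of the displacement. The function names, the unit amplitude and the arbitrary constant `scale` are assumptions made for the example.

```python
import math

def displacement(x, amplitude=1.0, wavelength=1.0):
    """Particle displacement s(x) = A sin(2*pi*x/lambda) at one instant."""
    return amplitude * math.sin(2 * math.pi * x / wavelength)

def pressure_change(x, amplitude=1.0, wavelength=1.0, scale=1.0):
    """Pressure change, taken as -scale * ds/dx (scale is an arbitrary positive constant)."""
    k = 2 * math.pi / wavelength
    return -scale * amplitude * k * math.cos(k * x)

# Sample the wave every eighth of a wavelength.
for i in range(9):
    x = i / 8
    print(f"x = {x:.3f}  s = {displacement(x):+.3f}  dp = {pressure_change(x):+.3f}")
```

In the printed values, the pressure change is largest where the displacement is zero (the centre of a compression or rarefaction), and the displacement is largest where the pressure change is zero.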
So far, we have described waves in terms of how their displacement varies with distance
along the direction of travel of the wave. The graphs we have been plotting are a ‘snapshot’ of
what the wave looks like at a particular instant in time. If we were to take a second ‘snapshot’
half a period later, we would see that the wave had moved half a wavelength to the right
(along the distance axis). This is shown in Figure S13.1a. Instead of plotting displacement
against position (at a given time), we could plot displacement against time, at a given position.
This produces the graph shown in Figure S13.1b, on which we can identify the period of the
wave. We measured this period in Box 13.1 and the accompanying worked example.
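As a rough numerical check of the 'snapshot' idea, the sketch below models a progressive sine wave travelling in the positive direction along the distance axis. The wavelength and period are illustrative values chosen for the example, not values from the Coursebook.

```python
import math

def wave_displacement(x, t, amplitude=1.0, wavelength=2.0, period=0.5):
    """Displacement of a progressive sine wave travelling in the +x direction."""
    return amplitude * math.sin(2 * math.pi * (x / wavelength - t / period))

wavelength, period = 2.0, 0.5      # illustrative values (m, s)
frequency = 1 / period             # f = 1/T
speed = frequency * wavelength     # v = f * lambda

# The snapshot at t = T/2 should equal the t = 0 snapshot shifted by lambda/2.
x = 0.3
later = wave_displacement(x, period / 2)
shifted = wave_displacement(x - wavelength / 2, 0.0)
print(later, shifted)

# At a fixed position, the displacement repeats after one period.
print(wave_displacement(x, 0.0), wave_displacement(x, period))
```

The two values printed on each line agree (to within rounding), which is the same information carried by the two graphs: a shift of half a wavelength in half a period, and a repeat after one full period.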
Figure S13.1 a A progressive wave travels along the direction of propagation, so at later times
the graph of displacement against distance will be shifted along the distance axis. After time
t = T/2, the wave drawn at time t = 0 has advanced half a wavelength (λ/2). b A graph of
displacement against time, for a fixed point along the direction of travel of the wave. The time for
one complete oscillation to pass that point is known as the period, T.
Note that phase difference can be measured in radians, where a complete cycle of
360° = 2π radians. (See also Chapter 17.) For example, this means that a phase difference
of 90° is π/2 radians.
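The conversion between degrees and radians can be written as a one-line function. This is a minimal sketch; the function name is chosen for the example.

```python
import math

def degrees_to_radians(angle_deg):
    """Convert a phase difference in degrees to radians (360 degrees = 2*pi radians)."""
    return angle_deg * 2 * math.pi / 360

# Express some common phase differences as multiples of pi.
for angle in (90, 180, 360):
    print(angle, "degrees =", degrees_to_radians(angle) / math.pi, "x pi radians")
```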
S13.2 Waves at boundaries

When waves meet a boundary between two different materials, they may be reflected,
absorbed or transmitted. We will not deal specifically with the case where they are absorbed
here – when this happens, energy is transferred from the wave into the medium, heating it up.
Reflection
You will be very familiar with the phenomenon of reflection from your everyday life. You
probably see your own reflection in a mirror or a reflective surface several times daily, and
often you will hear the reflection of sound as echoes. The reflection of seismic waves can
be used to investigate the structure of rocks beneath the surface and search for oil. Police
radar speed guns reflect radio waves off vehicles. If the vehicle is moving, the reflected wave
undergoes a Doppler shift, which can be used to calculate the speed of the vehicle.
All types of waves can be reflected, although the properties of the surface required to
reflect them vary depending on the type of wave. When waves are reflected, they obey the
law of reflection, illustrated in Figure S13.2:
The incident and reflected rays are at equal angles to the normal at the reflection point.
Figure S13.2 a The law of reflection: the angle of incidence i equals the angle of reflection r
(i = r). The normal is a line drawn at right angles to the surface. For
curved surfaces, at any point the normal is a line at right angles to the tangent to the curve at that
point. b Reflections from a curved surface – a parabolic mirror.
We have already met the idea of wavefronts, which ‘join up’ points of equal phase on the
wave. A ray is a line that is at right angles to the wavefront. If we start a ray from the wave
source in a given direction, it will follow a path that is at right angles to all the wavefronts
it crosses. You will already be familiar with the idea of light rays from your earlier physics
courses, but we can extend the use of rays to any other types of wave.
We can use a ray diagram to analyse the properties of the reflection. Figure S13.3 is a ray
diagram showing how an image is formed from a reflection in a plane mirror. This image
is known as a virtual image since no real rays of light actually cross (or converge) at the
image location. To find the image, we have to project the reflected rays backwards behind the
mirror to the point where they meet (the dotted lines in the diagram). The reflected light rays
are said to be diverging (spreading apart) in front of the mirror. They diverge in the same
way as light would if it travelled directly from an object placed at the image location (if the
mirror were not there).
Figure S13.3 Image formation in a mirror. The object is placed in front of the mirror; the blue
arrow is the image, in the position where it is seen. The dotted lines are the continuation of the
reflected light rays to the position where they appear to come from (as if they had come directly
from a light source rather than being reflected). The image can be seen when viewed from the position
marked by the eye symbol.
Questions
13.1 Use a ray diagram to prove that the image of a point in a plane mirror is the same
distance behind the mirror as the object point is in front.
13.2 Using the result in question 13.1, explain what the image looks like when a three-
dimensional object is placed in front of the mirror – use diagrams to help you.
Change of phase on reflection

Imagine you stretch out a Slinky spring between you and a friend. Then, while they firmly hold
one end, you send a transverse wave pulse down the spring, by quickly moving your end of the
Slinky up and down. What happens to the pulse when it meets the other end? You should see
that it is inverted or ‘flips’ as it is reflected – an ‘upwards’ pulse is returned as a ‘downwards’
pulse. This is a phase change of π radians (180°). We call this phase change inversion.
Then hold the Slinky vertically, so that it is extended but only held at one end. If you send
a transverse wave pulse down the Slinky like this, you will see that it is again reflected when
it gets to the bottom, but in this case there is no inversion on reflection – there is no phase
change on reflection. Figure S13.4 illustrates this effect.
Figure S13.4 A wave pulse passing along a string. In a the end of the string is fixed, and the pulse
undergoes a phase change of π radians (180°) on reflection. In b the end of the string is free and the
pulse is not inverted on reflection.
We can explain the phase change on reflection using our knowledge of Newton’s laws of
motion. Think about the case where the Slinky was held fixed at one end (Figure S13.4a), and
imagine that an upward pulse is arriving at the fixed point. The upward movement of the
Slinky exerts an upward force on the fixed point as it arrives. Therefore, by Newton’s third
law, the fixed point must exert an equal downwards force on the Slinky. This accelerates this
part of the Slinky downwards, and so the pulse is inverted.
The same phase change on reflection can happen with light. A light ray travelling through
air and reflecting off the surface of a piece of glass undergoes a phase shift of π radians on
reflection. However, a ray travelling through glass and reflecting off the interface between the
glass and air, does not undergo a phase shift on reflection.
The general principle can be summarised as:
• when a wave travels through a more dense medium and reflects off a less dense medium,
there is no phase shift
• when a wave travels through a less dense medium and reflects off a more dense medium,
the wave is inverted.
For light, we say that the medium with the higher refractive index (see below) is more
optically dense. In the case of a mechanical wave on a spring or a rope, we are referring to
density in the usual sense of mass per unit volume, assuming that the tension in the spring or
rope remains the same across the boundary.
Refraction
You may have noticed that when you put a straw in a glass of water, the straw appears bent
(Figure S13.5). Of course, the straw itself is not bent, but light rays travelling from the straw
change direction as they leave the water. This phenomenon is called refraction, and occurs
whenever a wave travels through a boundary between two different materials and changes
speed. The human eye uses refraction to form an image of the world around us on the retina.
If you wear spectacles or use contact lenses, the refraction of light in the lens provides the
correction necessary for an image to be formed in focus on the retina.
Figure S13.5 A straw placed in a glass of water appears bent because the light rays reflected from
the bottom of the straw are refracted when they leave the water.
Imagine a car driving along a straight road with a hard surface. At the edge of the road there
is soft mud. If the wheels on the left side of the car roll off the road into the mud, then they
will be slowed down compared to the wheels that remain on the road. The car will turn to
the left: its velocity vector will change from being almost parallel to the road, to pointing
to the left of the road. This models what happens when waves are refracted.
The laws of refraction

The diagram in Figure S13.6 shows what happens when a wave is refracted at a boundary
between two materials. We can also use the diagram to derive the law of refraction, which is
also known as Snell’s law.
Figure S13.6 The speed of a wave depends on the medium (material) through which it travels.
When a wave is transmitted across a boundary between two media that have a different wave
speed, it is refracted. a shows how the ray and wavefronts are refracted as the wave passes from
medium 1 (speed v1, wavelength λ1) into medium 2 (speed v2, wavelength λ2), where v1 > v2 and
λ1 > λ2. b shows a close-up view of a (the dashed square), with the normal to the boundary, the
angles θ1 and θ2 and the points A, C and D marked, and allows us to derive the law of refraction
(see text).
In Figure S13.6, the wavefronts are continuous across the boundary between the two
materials – that is, although the wavefronts change direction, each is an unbroken line as it
crosses the boundary. The wavefronts are continuous because the frequency of the waves is
the same on either side of the boundary. The frequency of the wave is set at the moment it
leaves the source. The frequency cannot change as the wave crosses the boundary (otherwise
a number of wavefronts would disappear completely). However, the wavelength does change
as the wave crosses the boundary, because the speed of the wave is different in the two
materials. Earlier in Chapter 13 of the Coursebook, we used the equation v = f λ to relate the
wavespeed, frequency and wavelength. If the frequency remains constant but v decreases as
the wave moves from medium 1 to medium 2, the wavelength λ must decrease. To allow this
while keeping the wavefronts continuous across the boundary, the wavefronts have to change
direction as they cross the boundary: they are refracted and the ray appears to bend.
We can use the geometry of the two right-angled triangles shown on the diagram to
produce two different expressions for the length of line AC, in terms of the wavelengths on
each side of the boundary:
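$$AC = \frac{\lambda_1}{\sin\theta_1}$$
$$AC = \frac{\lambda_2}{\sin\theta_2}$$
Equating the two expressions:
$$\frac{\lambda_1}{\sin\theta_1} = \frac{\lambda_2}{\sin\theta_2}$$
Using v = f λ:
$$\frac{v_1}{f \sin\theta_1} = \frac{v_2}{f \sin\theta_2}$$
then rearranging:
$$\frac{v_1}{v_2} = \frac{\sin\theta_1}{\sin\theta_2} = n$$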
Here n is called the refractive index of medium 2 with respect to medium 1, and is the
ratio of the wavespeeds in the two media, $v_1/v_2$. We could also call this the boundary refractive
index when travelling from medium 1 to medium 2. The line at right angles to the boundary,
at the point at which the ray crosses the boundary, is called the normal. The angles θ1 and
θ 2 are the angles the ray makes with the normal on either side of the boundary. Notice that
the incident ray, refracted ray and normal are all in the same plane. Because we measure the
angles to the normal, we can use the same law for curved surfaces.
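The relationship between the wave speeds and the angles can be tried out numerically. The sketch below uses assumed values (a water wave slowing from 0.40 m s⁻¹ to 0.20 m s⁻¹ at a frequency of 5.0 Hz); these numbers and the function name are illustrative, not taken from the Coursebook.

```python
import math

def refract_by_speed(theta1_deg, v1, v2):
    """Angle of refraction in degrees, from sin(theta1) / sin(theta2) = v1 / v2."""
    sin_theta2 = math.sin(math.radians(theta1_deg)) * v2 / v1
    if sin_theta2 > 1:
        raise ValueError("no refracted ray exists for this angle of incidence")
    return math.degrees(math.asin(sin_theta2))

v1, v2 = 0.40, 0.20      # assumed wave speeds in medium 1 and medium 2 (m/s)
f = 5.0                  # assumed frequency (Hz), the same in both media
lambda1 = v1 / f         # wavelength in medium 1, from v = f * lambda
lambda2 = v2 / f         # wavelength in medium 2
n = v1 / v2              # refractive index of medium 2 with respect to medium 1

theta1 = 30.0
theta2 = refract_by_speed(theta1, v1, v2)
print(f"n = {n:.2f}, theta2 = {theta2:.1f} degrees")

# Both triangles in the derivation should give the same length AC.
ac_from_medium1 = lambda1 / math.sin(math.radians(theta1))
ac_from_medium2 = lambda2 / math.sin(math.radians(theta2))
print(ac_from_medium1, ac_from_medium2)
```

The wave slows down, so the wavelength shortens and the ray bends towards the normal, while the length AC computed from either side of the boundary is the same.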
Absolute refractive index

The definition of refractive index above depends on both media – it is the refractive index of
one medium relative to another medium.
We can instead define the absolute refractive index of a medium as:
$$n_{\mathrm{abs}} = \frac{c}{v}$$
where $n_{\mathrm{abs}}$ is the (absolute) refractive index, c is the speed of light in a vacuum (3.00 × 10⁸ m s⁻¹),
and v is the speed of light in the medium. You will see in most cases the absolute refractive
index is simply called the refractive index of a medium. The speed of light in any medium can
never be greater than the speed of light in a vacuum (empty space), so the refractive index is
never less than 1 (it equals 1 only for a vacuum). For light travelling in air, the speed of light is very close to the speed of
light in a vacuum, so we often approximate the refractive index of air to be 1. Table S13.1 gives
some refractive indices for common media.
We can use this new definition to write a new expression for Snell’s law. If the refractive
index of medium 1 is n1 and that of medium 2 is n2, we can write:
$$v_1 = \frac{c}{n_1}$$
and
$$v_2 = \frac{c}{n_2}$$
So the equation we derived above can be re-written as
$$\frac{v_1}{v_2} = \frac{\sin\theta_1}{\sin\theta_2} = \frac{n_2}{n_1}$$
When re-arranged this gives us a simpler form of Snell’s law:
$$n_1 \sin\theta_1 = n_2 \sin\theta_2$$
Note that we can write the refractive index of medium 2 with respect to medium 1, n, as
$$n = \frac{n_2}{n_1}$$
The use of the absolute refractive indices makes it easier to solve problems. To see how this
works in practice, look at Worked example S13.1.
Material                          Refractive index
Vacuum                            1 (by definition)
Air at 0 °C and 1 atm pressure    1.000293 (usually taken to be 1)
Water at 20 °C                    1.3330
Water ice                         1.31
Crown glass                       1.50 – 1.54
Flint glass                       1.60 – 1.62
Pyrex                             1.47
Perspex (acrylic glass)           1.49
Sapphire                          1.76 – 1.78
Diamond                           2.42
Table S13.1 Refractive indices of different media, for yellow light with a wavelength of 589 nm.
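The definition of absolute refractive index can be turned around to estimate the speed of light in a medium. The sketch below uses a few of the single-valued entries from Table S13.1; the dictionary and function names are chosen for the example.

```python
C_VACUUM = 3.00e8  # speed of light in a vacuum, m/s

# A few single-valued entries from Table S13.1 (yellow light, 589 nm).
REFRACTIVE_INDEX = {
    "water at 20 °C": 1.3330,
    "Pyrex": 1.47,
    "Perspex": 1.49,
    "diamond": 2.42,
}

def speed_in_medium(n_abs, c=C_VACUUM):
    """Speed of light in a medium, from n_abs = c / v."""
    return c / n_abs

def relative_index(n1, n2):
    """Refractive index of medium 2 with respect to medium 1, n = n2 / n1."""
    return n2 / n1

for material, n_abs in REFRACTIVE_INDEX.items():
    print(f"{material}: v = {speed_in_medium(n_abs):.3g} m/s")

# Going from water into diamond, the boundary refractive index is n2 / n1.
print(relative_index(REFRACTIVE_INDEX["water at 20 °C"], REFRACTIVE_INDEX["diamond"]))
```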
Worked Example S13.1
A ray of light falls on a glass block at an angle of incidence (angle to the normal) of 45°.
The angle of refraction inside the block is measured to be 30°. What is the refractive index
of the glass?
Step 1 Decide which material is medium 1 and which is medium 2. You may wish to draw a
labelled diagram to show the materials and the angles. In this case, we are going from
air into glass, and so medium 1 is air and medium 2 is glass. The refractive index of air is
1.00 (to 3 s.f.).
Step 2 We need to find the refractive index of medium 2 (n2). Rearrange Snell’s law to find this
quantity, then substitute in the values given in the question.
$$n_1 \sin\theta_1 = n_2 \sin\theta_2$$
$$\Rightarrow n_2 = \frac{n_1 \sin\theta_1}{\sin\theta_2}$$
$$n_2 = \frac{1.00 \times \sin 45^\circ}{\sin 30^\circ} = 1.41$$
Worked Example S13.2
A diver is working underwater and a ray of light from his lamp strikes the surface of the water
at an angle of 55° to the horizontal. At what angle to the horizontal will the ray travel after it
leaves the water? The refractive index of water is 1.33.
Step 1 Read the question carefully! Here we are given and asked for angles to the horizontal, but
remember that Snell’s law works with angles to the normal. Draw a labelled diagram with
the given quantities and angles marked, and work out the angles to the normal.
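Here medium 1 is water (n1 = 1.33) and medium 2 is air (n2 = 1.00). The ray makes an angle
of 55° with the horizontal surface, so the angle of incidence (measured from the normal) is
θ1 = 90° − 55° = 35°.
Step 2 Rearrange Snell’s law to find θ2.
$$\Rightarrow \theta_2 = \sin^{-1}\left(\frac{n_1 \sin\theta_1}{n_2}\right)$$
$$= \sin^{-1}\left(\frac{1.33 \times \sin 35^\circ}{1.00}\right)$$
$$= 49.7^\circ = 50^\circ \text{ (2 s.f.)}$$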
Step 3 Give the answer in the form asked for in the question:
The ray leaves the water at an angle of 90° − 50° = 40° to the horizontal.
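Both worked examples can be checked with a short script. This is a sketch rather than a required method; the function names are chosen for the example, and returning `None` when no refracted ray exists is simply one way of handling that case.

```python
import math

def refractive_index_from_angles(theta1_deg, theta2_deg, n1=1.00):
    """Find n2 from Snell's law: n2 = n1 sin(theta1) / sin(theta2)."""
    return n1 * math.sin(math.radians(theta1_deg)) / math.sin(math.radians(theta2_deg))

def refraction_angle(theta1_deg, n1, n2):
    """Find theta2 in degrees from n1 sin(theta1) = n2 sin(theta2), or None if there is no solution."""
    sin_theta2 = n1 * math.sin(math.radians(theta1_deg)) / n2
    if sin_theta2 > 1:
        return None
    return math.degrees(math.asin(sin_theta2))

# Worked Example S13.1: air into glass, 45 degrees incident, 30 degrees refracted.
n_glass = refractive_index_from_angles(45, 30)
print(f"n_glass = {n_glass:.2f}")

# Second worked example: water into air, ray at 55 degrees to the horizontal.
theta1 = 90 - 55                       # angle to the normal in the water
theta2 = refraction_angle(theta1, 1.33, 1.00)
print(f"theta2 = {theta2:.1f} degrees to the normal")
print(f"angle to the horizontal = {90 - theta2:.0f} degrees")
```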
SUMMARY OF THE LAWS OF REFRACTION
1 The incident ray, refracted ray and normal to the point of incidence are all in the same plane.
2 If light travels from a medium of refractive index n1 into a medium of refractive index n2, then
the angles that the rays make to the normal to the boundary are given by the relationship
$$n_1 \sin\theta_1 = n_2 \sin\theta_2$$
θ1 is the angle between the ray and the normal in the medium of refractive index n1 and
θ2 is the angle between the ray and the normal in the medium of refractive index n2
(see Figure S13.7).
Figure S13.7 Snell’s law. The ray makes an angle θ1 with the normal in medium 1 (refractive
index n1) and an angle θ2 with the normal in medium 2 (refractive index n2).
Apparent depth
You may have noticed that when you look down into a pool of water, it appears to be less
deep than it actually is. This effect is due to refraction – if you look back at the photograph
of the straw at the start of this section (Figure S13.5) you will notice that the straw looks bent
upwards in the water.
In fact, if we look directly down into the water (at right angles to the surface), the
refractive index of the water is given by the ratio
$$n = \frac{\text{real depth}}{\text{apparent depth}}$$
If you look at the water at a smaller angle than a right angle, you will find that the apparent
depth is reduced, so this formula only applies if you are looking directly down.
We can explain this using our knowledge of refraction (see Figure S13.8).
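Figure S13.8 Refraction means that water appears to be less deep than it really is. Points A and O
lie on the surface of the water (refractive index n) and point C is on the bottom of the container;
θ1 is the angle to the normal in the air and θ2 is the angle to the normal in the water.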
A ray of light coming from the bottom of the container of water at an angle to the normal
is refracted away from the normal as it leaves the water and passes into air (ray CO). A ray
of light that comes from the bottom of the container but is normal to the surface passes
through without changing angle (ray CA). If we trace the first ray back into the water (dotted
line OB), then it meets the ray that came out along the normal at point B. The distance AB is
the apparent depth (the real depth is the distance AC).
If you redraw Figure S13.8 with a larger angle θ1, then you will notice that the rays cross
higher up in the water and the apparent depth is reduced. So to find the maximum possible
apparent depth, we need to work out what happens as θ1 tends to zero.
Snell’s law tells us that:
$$\sin\theta_1 = n \sin\theta_2$$
Trigonometry tells us that in triangle AOB
$$\sin\theta_1 = \frac{OA}{OB}$$
and that in triangle AOC
$$\sin\theta_2 = \frac{OA}{OC}$$
Combining these two equations with Snell’s law tells us that:
$$\frac{OA}{OB} = n \frac{OA}{OC}$$
However, as we make θ1 smaller, length OB tends to length AB and OC tends to AC
(it doesn’t make sense to say what they are when θ1 = 0, but just before it becomes zero, these
pairs of lengths are nearly equal). Applying this to our equation above tells us that
$$\frac{OA}{AB} = n \frac{OA}{AC}$$
$$n = \frac{AC}{AB} = \frac{\text{real depth}}{\text{apparent depth}}$$
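The limiting argument above can be explored numerically. The sketch below follows the geometry of Figure S13.8: a ray leaves point C at angle θ2 to the normal, is refracted at O, and is traced back to meet the normal at B. The function name and the values (a real depth of 2.0 m and n = 1.33) are assumptions made for the example, and the angle in the water must be greater than zero for the calculation to work.

```python
import math

def apparent_depth(real_depth, n, theta2_deg):
    """Distance AB for a ray leaving C at theta2 (> 0) to the normal in the water."""
    theta2 = math.radians(theta2_deg)
    sin_theta1 = n * math.sin(theta2)          # Snell's law, with air taken as n = 1
    if sin_theta1 >= 1:
        raise ValueError("the ray does not leave the water at this angle")
    theta1 = math.asin(sin_theta1)
    oa = real_depth * math.tan(theta2)         # OA = AC tan(theta2)
    return oa / math.tan(theta1)               # AB = OA / tan(theta1)

real_depth, n = 2.0, 1.33
for angle in (30, 20, 10, 1, 0.01):
    print(f"theta2 = {angle:>5} degrees: apparent depth = {apparent_depth(real_depth, n, angle):.4f} m")
print(f"real depth / n = {real_depth / n:.4f} m")
```

As the angle shrinks, the apparent depth rises towards real depth / n, which is its maximum value; at larger angles the apparent depth is smaller.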
This equation can be used to measure the refractive index of a rectangular block of solid
or a liquid, using a travelling microscope (a microscope that moves up and down on a scale).
1. Focus the microscope on a mark on a piece of paper laid on the bench. Call this
measurement on the microscope scale a.
2. Put the block or liquid in place and refocus the microscope so it is again focused on the
mark. Call this measurement on the microscope scale b.
3. Focus the microscope on the top of the block. Call this measurement on the microscope
scale c.
4. The real depth is (c – a), and the apparent depth is (c – b).
5. You can use these measurements in the formula above to calculate the refractive index.
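The steps above can be sketched as a short calculation. The scale readings used here are hypothetical values chosen for illustration, not measurements from an experiment, and the function name is an assumption.

```python
def refractive_index_from_microscope(a, b, c):
    """Refractive index from travelling-microscope scale readings.

    a: reading focused on the mark with no block in place
    b: reading focused on the mark seen through the block or liquid
    c: reading focused on the top of the block or liquid
    """
    real_depth = c - a
    apparent_depth = c - b
    if apparent_depth <= 0:
        raise ValueError("reading c must be greater than reading b")
    return real_depth / apparent_depth

# Hypothetical readings in mm.
a, b, c = 12.0, 22.0, 42.0
n = refractive_index_from_microscope(a, b, c)
print(f"real depth = {c - a} mm, apparent depth = {c - b} mm, n = {n:.2f}")
```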
Question
13.3 If you stand at one end of a swimming pool of constant depth, as you look to the far end
it looks like the swimming pool gets shallower. Explain this effect using a ray diagram
and your knowledge of refraction.
Dispersion and the prism

So far, we have assumed that the refractive index is the same for all wavelengths (colours) of
light. For many materials, however, this is not the case – the refractive index, and therefore
the wavespeed, varies depending on the wavelength. This property is known as dispersion.
Figure S13.9 White light entering a prism. Glass has a different refractive index for different
colours (wavelengths), so the colours are refracted differently.
White light was known to be split into colours by a prism before Isaac Newton’s experiments
with light, but the colours were thought to originate from the prism in some way. To test this
idea, Newton took the coloured light from the prism and tried to split it further. Since no further
colours were produced, he deduced that the white light was made up of a mixture of colours.
Total internal reflection

Our studies of refraction and Snell’s law have shown us that when light passes from a
more optically dense medium (high refractive index) into a less optically dense medium
(lower refractive index), the light bends away from the normal. However, once the angle of
incidence in the high refractive index medium reaches a value where the angle of refraction
would be greater than 90°, refraction can no longer happen. If you tried to solve Snell’s law
for such a case, you would be trying to find the inverse sine of a number greater than 1, so
there is no solution.
The angle of incidence required for the angle of refraction to be 90° is known as the
critical angle. A critical angle only exists for a ray going from one medium into another
medium with a lower refractive index. (Think about the opposite situation: if the ray were
going into a medium with a higher refractive index, it would be bent towards the normal, and
we would not reach an angle of refraction of 90° before the angle of incidence reached 90°).
For a ray of light going from a medium of refractive index n into air (which we will take
to have a refractive index of 1), the critical angle can be found by using Snell’s law, with the
angle of refraction set to 90°. We will call the critical angle c.
$$n \sin c = 1.0 \times \sin 90^\circ$$
$$\Rightarrow c = \sin^{-1}\left(\frac{1}{n}\right)$$
More generally, if light travels from a medium with refractive index n1 into a medium with
refractive index n2, where n1 > n2, then the critical angle is:
$$c = \sin^{-1}\left(\frac{n_2}{n_1}\right)$$
Once the angle of incidence becomes greater than or equal to the critical angle, no
refraction takes place and the ray undergoes total internal reflection (see Figure S13.10),
which obeys the laws of reflection discussed earlier in the chapter.
Figure S13.10 a As the angle of incidence i in the medium with refractive index n increases, the
ray is first refracted into the air (with a weak reflection), then meets the boundary at the critical
angle c, and finally (i > c) undergoes total internal reflection. Total internal reflection occurs
when the angle of incidence is greater than the
critical angle. b Photograph showing total internal reflection in an acrylic block.
The critical angle is defined as the angle of incidence for a ray crossing the boundary from
a medium of higher refractive index to one of lower refractive index for which the law of
refraction predicts an angle of refraction of 90°. No refracted ray can form and the incident
ray undergoes total internal reflection at all angles greater than or equal to the critical angle.
Diamond has a very high refractive index and therefore a small critical angle. Diamonds
used for jewellery are cut so that light entering through the top surface is totally internally
reflected and comes back out of the top, so it looks like light is streaming out of the diamond.
Getting the cut right is critical to this – if one cut is not correct, then the light will exit through
the sides of the diamond after being internally reflected. The small critical angle means that
most light entering the diamond is totally internally reflected, and a small movement of the
diamond can cause the light to illuminate a different facet – the diamond appears to sparkle.
Many optical instruments such as binoculars and periscopes use total internal reflections
in 45° prisms. Since the critical angle for glass with a refractive index of 1.5 is around 42°, the
light is incident on the internal face of the prism at an angle greater than the critical angle,
and is totally internally reflected (see Figure S13.11).
Figure S13.11 Light is totally internally reflected inside 45° prisms in a periscope.
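The critical angles quoted in this section can be checked with a short calculation. This is a minimal sketch: the function names are chosen for the example, and the second medium is taken to be air (n = 1.0) unless stated otherwise.

```python
import math

def critical_angle(n1, n2=1.0):
    """Critical angle in degrees for light going from index n1 into index n2 (needs n1 > n2)."""
    if n1 <= n2:
        return None                      # no critical angle exists
    return math.degrees(math.asin(n2 / n1))

def is_totally_internally_reflected(incidence_deg, n1, n2=1.0):
    """True if the angle of incidence is at or beyond the critical angle."""
    c = critical_angle(n1, n2)
    return c is not None and incidence_deg >= c

print(f"glass (n = 1.5):    c = {critical_angle(1.5):.1f} degrees")
print(f"diamond (n = 2.42): c = {critical_angle(2.42):.1f} degrees")
print(f"water (n = 1.333):  c = {critical_angle(1.333):.1f} degrees")

# A ray meeting the internal face of a 45 degree glass prism:
print(is_totally_internally_reflected(45, 1.5))
```

The critical angle for glass with n = 1.5 comes out just under 42°, so a ray meeting the internal face of a 45° prism is totally internally reflected, and diamond's much smaller critical angle matches the description above.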
Fibre optics
Transparent glass fibres (often called optical fibres) guide light along them by total internal
reflection. Light rays that pass into one end of a fibre meet the inner surface at an angle
greater than the critical angle, and are therefore totally internally reflected. This continues
to occur even when the fibre is bent, as long as the radius of the bend in the fibre is much
greater than the radius of the fibre. Most optical fibres produced have a diameter less than a
millimetre, so the condition for total internal reflection is easy to achieve. The fibre used to
transmit the light is usually clad (coated) in a layer of glass with a lower refractive index. This
means that the critical angle is quite large, so the rays travel very close to the axis of the fibre.
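To see why cladding makes the critical angle large, the sketch below compares a clad fibre with a bare fibre in air. The refractive indices (1.48 for the core and 1.46 for the cladding) are assumed, illustrative values and are not taken from the text.

```python
import math

def critical_angle(n1, n2):
    """Critical angle in degrees for light going from index n1 into a lower index n2."""
    return math.degrees(math.asin(n2 / n1))

n_core, n_cladding, n_air = 1.48, 1.46, 1.00    # assumed illustrative values

c_clad = critical_angle(n_core, n_cladding)     # core-cladding boundary
c_bare = critical_angle(n_core, n_air)          # the same core surrounded by air

# A ray meeting the wall at angle c to the normal travels at (90 - c) to the fibre axis.
max_angle_to_axis = 90 - c_clad

print(f"critical angle with cladding: {c_clad:.1f} degrees")
print(f"critical angle in air:        {c_bare:.1f} degrees")
print(f"largest angle to the axis inside the clad fibre: {max_angle_to_axis:.1f} degrees")
```

With these assumed values, only rays within about 9° of the axis are totally internally reflected in the clad fibre, which is consistent with the rays travelling close to the axis.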
Optical fibres are mainly used for communication. In some cities, optical fibre is used
instead of copper wire for high-speed internet communications. It is possible to send
information down an optical fibre much more quickly, and with less signal loss, than sending
electrical pulses down a copper cable (see also Chapter 20 on communications systems). This
is because the high frequency of light (>10¹⁴ Hz) means that very short pulses can be used
and detected. Light with a single frequency (monochromatic light) is used since the glass
is dispersive, and light with a mix of different frequencies (colours) would travel at different
speeds. If the fibres were used to communicate over long distances, then the different frequency
components in non-monochromatic light would spread out and cause the signal to degrade.
Optical fibres are also used in medicine. A device called an endoscope can be inserted
into the body and used to see inside. An endoscope contains one bundle of optical fibres to
transmit light inside the body to illuminate the area under investigation, and another bundle
of fibres to transmit the image back to the physician. Endoscopes are used for diagnosis –
determining the nature of a medical condition. They are also used in operations with special
surgical instruments that can be inserted through a small incision in the patient’s tissue.
This minimises the need to cut through large amounts of tissue to perform an operation and
helps to reduce the patient’s recovery time.
Figure S13.12 a An optical fibre used for digital audio connections between devices. b Diagram
showing transmission of light through an optical fibre with a core of refractive index n1 and
cladding of refractive index n2. This shows the maximum possible angle to
the axis at which light can be incident (θmax), as it meets the fibre boundaries at the critical angle (θc).
Partial reflection
Figure S13.13 Partial reflection. a In this photograph, you can see a reflection from the buildings
on the surface of the water – however, if you were viewing from under the surface of the water,
you would be able to see a refracted image of the buildings, too. We can also see a refracted
image of the bottom of the lake, but from within the water, you would be able to see a reflection
of the pebbles on the upper surface of the water. b When a ray of light is incident on a boundary
between media, some of the light is transmitted (refracted) and some is reflected.
So far, we have discussed refraction and total internal reflection. When light is incident on an
interface between two media at less than the critical angle, most of the light is refracted and
transmitted, but some is reflected too (see Figure S13.13). The amount of light that is reflected
depends on the angle of incidence in a complicated way, but once we get beyond the critical
angle, we know that no light is transmitted – it is all reflected, hence the name total internal
reflection.
You will have experienced partial reflection on a daily basis, but may not have thought
about it. If you look out of a window when it is dark outside, you will see your reflection
in the window. From the outside, though, a passer-by will be able to see you clearly. That’s
one of the reasons we usually draw curtains or blinds across windows at night (although
of course blinds or curtains also have other uses, such as thermal insulation). If there are
streetlights nearby, you may be able to see the reflection and the view outside superimposed
in the window (Figure S13.14). In fact, there will always be some reflection, but we usually
do not notice the reflection so much when it is bright outside. This is because only a small
fraction (<10%) of the light is reflected – so that when it is bright outside, the transmitted
light is much brighter than the reflection.
The same effect is used in so-called ‘two-way glass’. If you set up one side of the glass
with much brighter lighting than the other, it appears to be mirror-like on that side, while
allowing an observer on the dimly lit side to see through. This effect can be enhanced by
partly silvering the side you wish to be reflective (coating the glass with a thin layer of
reflective paint).
We will look again at partial reflection when we discuss thin-film interference.
Figure S13.14 Partial reflection in a window. The inside of the room is clearly visible in the right-
hand half of the window, but it is also possible to see outside through the left-hand half.
S13.3 Polarisation of waves

In Figure 13.16 in Chapter 13 of the Coursebook, the electric field of an electromagnetic
wave is shown as oscillating in the vertical plane. This wave can be described as being plane-
polarised in the vertical plane. If the electric field oscillated in the horizontal plane instead,
the wave would be described as being plane-polarised in the horizontal plane (the magnetic
field would now be in the vertical plane). We could also have a case where the electric
field oscillated in some intermediate direction between vertical and horizontal. This is
another plane polarisation of the wave. You may see plane polarisation referred to as linear
polarisation.
Figure S13.15 shows us how the concept of plane polarisation applies to a transverse wave
on a string. Only transverse waves can be polarised – longitudinal waves cannot be polarised.
[Diagram, with axes x, y and z: a a vertically polarised wave passes through a vertical slit. b A horizontally polarised wave is unable to pass through the vertical slit. c For a wave with intermediate linear polarisation, a component of the wave passes through; this component is vertically polarised.]
Figure S13.15 Transverse waves on a string can be polarised. The polarisation plane of each wave
is shown by the orange line. In a, we see a vertically plane-polarised wave, which passes through a
vertical slit. In b, a horizontally plane-polarised wave is unable to pass through the vertical slit, so
there is no transmission (the wave would be absorbed or reflected). In c, a wave with a polarisation
between horizontal and vertical is partially transmitted, as it has a component which is in the
vertical plane.
question
13.4 Explain why longitudinal waves cannot be polarised.
Since electromagnetic waves are transverse, they can have a polarisation. For example,
light from the Sun or from a light bulb is described as unpolarised, since it consists of
light in all possible polarisation states superposed. This light can be polarised by a linear
polarising filter, which is often called a Polaroid sheet. This filter only allows light in one
plane of polarisation through. If two Polaroid sheets are placed so that the directions of
polarisation are at right angles to each other, then no light is transmitted (all polarisations
of light are blocked). A Polaroid sheet consists of a transparent polymer in which all of the
long-chain molecules have been aligned in the same direction. The action of this is similar
to the slit in Figure S13.15, except that the waves are absorbed if they are polarised parallel
to the chains of molecules, and transmitted if they are at right angles to the chains. Figure
S13.16 shows the effect of Polaroid sheets on unpolarised light.
Figure S13.16 No light is transmitted in the region where these polarising filters overlap, because
their directions of polarisation are at right angles to each other. Where they are not overlapped,
the light passing through is plane-polarised. There is a reduction in the intensity of light because
not all the light incident on the filter is able to pass through. When unpolarised light of intensity I
is incident on a polarising filter, the transmitted intensity is I/2.
Light reflected from the surface of a still lake is partially plane-polarised. This means more
of the reflected light is plane-polarised parallel to the lake’s surface than would be expected in
unpolarised light. At an angle of approximately 37° to the lake’s surface, the reflected light is
fully plane-polarised parallel to the surface. Reflections from other transparent media, such as
glass, are also partially polarised (the angle at which the reflection is fully polarised is different:
it depends on the refractive index). Why this happens is explained in Figure S13.17. If we take a
photograph of a lake or a window through a polarising filter, we can reduce the intensity of the
reflected light compared to the intensity of the transmitted light. If you have polarising sunglasses
you may have noticed this effect: you may be able to see through a car window when without the
polarising sunglasses you would have just seen your reflection in the window (see Figure S13.18).
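[Diagram: unpolarised light is incident from air on a water surface (refractive index n = 1.33) at 53° to the normal. The reflected ray, also at 53° to the normal, is polarised; the refracted ray is partially polarised. The key shows one symbol for a polarisation direction in the plane of the paper and another for a polarisation direction out of the plane of the paper.]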
Figure S13.17 Light polarisation at the surface of a lake.
When light is incident on water at 53° to the normal, the fraction of the light that is reflected
from the water is fully plane-polarised. When light enters the surface, it causes electrons in
the surface to oscillate. These oscillations are in the two directions that are perpendicular
to the refracted ray, and are the source of both the refracted and the reflected rays. However,
the oscillations that are in the plane of the diagram (represented by the bars across the ray)
are parallel to the reflected ray. They cannot therefore contribute to it, since the oscillations
making up the reflected ray must be perpendicular to the ray.
For this reason, the reflected ray is polarised and only consists of the oscillations out of the
plane of the diagram (represented by the circles on the ray). The full polarisation of the reflected
ray only occurs when the reflected and refracted ray are at right angles to each other. At other
angles, the oscillations in the plane of the diagram have a component that is perpendicular to
the reflected ray, and so can contribute to it. This gives rise to a partially polarised reflected ray.
worked example S13.3
Show that the angle of incidence required for the reflected and refracted ray to be at right
angles to each other in Figure S13.17 is 53°.
Step 1 Set up the problem and use a diagram.
Let the angle of incidence (and therefore angle of reflection) be θ. The diagram below shows
all the angles in our problem:
[Diagram: the ray diagram of Figure S13.17, showing the incident, reflected and refracted rays at the boundary between air and water (refractive index n = 1.33).]
We know that the angles along one side of the normal must add up to 180°, hence we can
work out that the angle of refraction is 90° – θ.
Step 2 Use Snell’s law to write down an equation to solve for θ.
$$1.00 \sin\theta = 1.33 \sin(90^\circ - \theta)$$
Step 3 Recall that $\sin(90^\circ - \theta) = \cos\theta$ and $\tan\theta = \frac{\sin\theta}{\cos\theta}$, and hence solve the equation for θ.
$$\sin\theta = 1.33 \cos\theta$$
$$\tan\theta = 1.33$$
$$\theta = \tan^{-1}(1.33) = 53.1^\circ$$
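The result of this worked example can be checked numerically. The following is a minimal sketch in Python; the function name full_polarisation_angle is our own choice for illustration. It evaluates tan θ = n2/n1 and then uses Snell's law to confirm that the angles of reflection and refraction add up to 90°.

```python
import math

def full_polarisation_angle(n_incident: float, n_refracting: float) -> float:
    """Angle of incidence (degrees, measured from the normal) at which the
    reflected and refracted rays are at right angles: tan(theta) = n2 / n1."""
    return math.degrees(math.atan(n_refracting / n_incident))

theta = full_polarisation_angle(1.00, 1.33)

# Angle of refraction from Snell's law: n1 sin(theta) = n2 sin(refraction)
refraction = math.degrees(math.asin(1.00 * math.sin(math.radians(theta)) / 1.33))

# If the rays are perpendicular, theta + refraction should be 90 degrees
print(round(theta, 1), round(theta + refraction, 1))
```

Changing the second argument to the refractive index of glass shows how the angle depends on the medium, as the text states.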
Figure S13.18 An example of photographs taken a without and b with a polarising filter. With the
filter, most of the light reflected from the surface of the water is absorbed, allowing the refracted
light from beneath the water to be seen.
Polarisation of microwaves
Microwaves are another form of electromagnetic radiation, so they can also be polarised. You
can use the equipment shown in Figure S13.19 to investigate the polarisation of microwaves.
The microwaves used have a wavelength of a few centimetres, much larger than the
wavelengths of visible light. Microwaves therefore need a different kind of polarising filter.
The metal grids shown in Figure S13.19 are used as polarising filters for microwaves. When the
electric field of the electromagnetic wave oscillates parallel to the metal bars, it makes electrons
in the metal move up and down the bars. This absorbs the wave energy and the microwaves
are not transmitted. However, when the electric field of the waves oscillates at right angles to
the bars, then the electrons are not moved up and down the bars. (The electrons do not move
across the bars provided the bars are thin compared to the wavelength of the microwaves.) The
wave energy is not absorbed and hence the wave is transmitted.
If the grids are placed so that the metal bars cross each other at right angles, then no microwaves
will be transmitted through the combination of grids. This is the same effect as in Figure S13.16,
where we observed that no light was transmitted through two crossed polarising filters.
[Diagram labels: receiver R, transmitter T, 20 cm, polarisation grid]
Figure S13.19 Microwave transmitter, receiver and polarising grids.
Another source of light that is partially polarised is the light from the daytime sky. Sunlight
is scattered by molecules in the atmosphere and this scattered light is partially plane-
polarised. It is completely plane-polarised when the scattered light is at 90° to the incident
light, as the oscillations in the scattered ray cannot have a component in the direction of the
incident ray. This means that there is only one possible oscillation direction for the scattered
ray in this case. Light reflected from the clouds is unpolarised, so by using a polarising filter
on a camera, we are able to reduce the intensity of the light from the sky and increase the
contrast with the clouds.
Malus’ law
When plane-polarised light falls on a linear polarising filter at an angle θ to the polarisation
direction of the filter, the component of the light that is parallel to the filter’s polarisation
direction is transmitted. The remainder of the light is absorbed by the filter – this energy
must be transferred to thermal energy in the filter.
If the incident polarised light has an amplitude A0 (see Figure S13.20), then the component
of light that is transmitted must have an amplitude given by:
$$A_t = A_0 \cos\theta$$
Remember that intensity is proportional to (amplitude)² (see Chapter 13 of the Coursebook),
so the transmitted intensity is:
$$I_t = I_0 \cos^2\theta$$
This result is called Malus’ law, and the graph of the function It is plotted in Figure S13.21.
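As a quick illustration of Malus’ law, the sketch below tabulates the transmitted intensity at a few angles. It is written in Python, the function name malus_intensity is our own, and it assumes an ideal filter that absorbs nothing along its polarisation direction.

```python
import math

def malus_intensity(i0: float, theta_deg: float) -> float:
    """Transmitted intensity for plane-polarised light of intensity i0 that
    meets an ideal filter at theta_deg to its polarisation direction."""
    return i0 * math.cos(math.radians(theta_deg)) ** 2

# Tabulate the transmitted fraction at a few angles
for angle in (0, 30, 45, 60, 90):
    print(angle, round(malus_intensity(1.0, angle), 3))
```

The values fall from the full incident intensity at 0° to zero at 90°, passing through one half at 45°.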
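[Diagram: the incident amplitude A0, at angle θ to the polarisation direction of the filter, is resolved into a transmitted amplitude A = A0 cos θ along the polarisation direction of the filter and an absorbed amplitude at right angles to it.]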
Figure S13.20 When plane-polarised light is incident on a linear polarising filter, the component
in the direction of polarisation of the filter is transmitted.
We can use this idea to show that the intensity of polarised light transmitted through a
Polaroid sheet is half the intensity of the unpolarised light incident on the sheet:
• Let the intensity of the unpolarised light be I0. Since the light is unpolarised, this
intensity is equally distributed over all polarisation angles.
• To work out how much light is transmitted, we need to add up the contributions
transmitted at each possible polarisation angle in the incident light. The size of the
contribution at a particular angle is the value read from the graph, and we have already
said that the intensity is equally distributed over all angles. If we work out the area under
the graph from 0 to π radians, this will be π times the total transmitted intensity. So,
considering the graph of It in Figure S13.21a, if we find the area under the curve between 0
and π radians, we get a value of I0π/2 (see Figure S13.21b), so the transmitted intensity is I0/2.
• We do not need to calculate this over the full circle from 0 to 2π because polarisation
angles of, say, π/2 and 3π/2 refer to the same polarisation. However, we would get the same
result if we did the calculation between 0 and 2π (and divided by 2π rather than π – look
at the graph and convince yourself of this).
We can also approach this problem more formally using calculus:
• The intensity is distributed evenly over all angles, so the incident intensity over a small
range of polarisation angles θ to (θ + dθ) is:
$$dI = \frac{I_0\,d\theta}{\pi}$$
(where the possible range of polarisation angles is 0 to π).
• The transmitted intensity over a small range of polarisation angles θ to (θ + dθ) is:
$$dI_t = \frac{I_0 \cos^2\theta\,d\theta}{\pi}$$
• Therefore the total transmitted intensity is the integral (the sum) of this over all possible
polarisation angles, that is:
$$I_t = \int_0^{\pi} dI_t = \int_0^{\pi} \frac{I_0 \cos^2\theta}{\pi}\,d\theta$$
• We use the trigonometric identity $\cos^2\theta = \frac{1}{2}(1 + \cos 2\theta)$ to do this integral, so:
$$I_t = \int_0^{\pi} \frac{I_0 (1 + \cos 2\theta)}{2\pi}\,d\theta = \left[\frac{I_0\,\theta}{2\pi}\right]_0^{\pi} = \frac{I_0}{2}$$
(Note that the cos 2θ term gives zero when integrated over this range: you can either do
the calculation to show this or use symmetry considerations.)
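The integral above can also be checked numerically by adding up the contributions from many narrow ranges of angle. The sketch below is in Python, with a function name of our own choosing. It uses a simple midpoint sum, which is one possible method rather than a required one.

```python
import math

def average_transmitted_fraction(steps: int = 100_000) -> float:
    """Midpoint-rule estimate of (1/pi) * integral of cos^2(theta) d(theta)
    from 0 to pi: the fraction of unpolarised light that is transmitted."""
    d_theta = math.pi / steps
    # Sum cos^2 at the midpoint of each narrow strip of width d_theta
    total = sum(math.cos((k + 0.5) * d_theta) ** 2 for k in range(steps))
    return total * d_theta / math.pi

print(round(average_transmitted_fraction(), 6))
```

The printed fraction should agree with the value of one half obtained from the calculus argument.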
[Graphs of intensity against θ (angle to polarisation axis of filter), from 0 to 2π. a shows the curve I0 cos² θ, with maximum value I0. b shows the curves I0 cos² θ and I0 sin² θ, with the level I0/2 marked.]
Note on graph b: Since cos² θ + sin² θ = 1, at any point the sum of these graphs is I0. So the area
under both graphs from 0 to π is I0π. Since the area under each graph is equal in this range,
the area under the I0 cos² θ graph is I0π/2. The average transmitted intensity through the
Polaroid filter is therefore I0/2.
Figure S13.21 a Graph of the function It = I0 cos² θ (Malus’ law). b Calculating the area under
the graph.
You can investigate Malus’ law in the laboratory using some Polaroid sheets and a light level
meter. Take two Polaroid sheets and place them so that their directions of polarisation are at
90° to each other (‘crossed Polaroids’). Place another sheet between them and vary the angle
of this middle sheet’s polarisation direction to the polarisation direction of the top sheet
(see Figure S13.22). Use the light meter to investigate how the intensity of transmitted light
changes as you change the angle.
Figure S13.22 Two crossed Polaroid sheets with a third polariser inserted between them at an angle.
Two pieces of polaroid sheet are placed with their directions of polarisation at 90° to each
other, so that no light passes through. A third piece is placed between them so that its
polarisation direction is at 45° to the polarisation direction of the top sheet. Unpolarised
light of intensity I0 is incident on the stack of sheets. What fraction of this light is
transmitted through the stack?
Step 1 Determine how much light passes through the first sheet.
We know that the intensity of the plane-polarised light transmitted by the Polaroid
sheet is half the intensity of the unpolarised light incident on it. So after the first
sheet, we are left with an intensity:
$$I_1 = \frac{I_0}{2}$$
Step 2 Determine how much of that light passes through the second sheet.
Malus’ law tells us that if polarised light is incident on a polarising filter, the
transmitted intensity is given by:
$$I_t = I_0 \cos^2\theta$$
The second sheet has a polarisation direction at 45° to the polarisation direction of
the light transmitted through the first sheet. So the intensity after the second sheet is
given by:
$$I_2 = I_1 \cos^2 45^\circ = \frac{I_1}{2} = \frac{I_0}{4}$$
Step 3 Determine how much of that light passes through the third sheet.
The polarisation of light passing through the second sheet is also at 45° to the
polarisation direction of the third sheet, so the same reasoning as in Step 2 applies,
and the intensity passing through the third sheet is:
$$I_3 = I_2 \cos^2 45^\circ = \frac{I_2}{2} = \frac{I_1}{4} = \frac{I_0}{8}$$
So one-eighth of the incident light intensity is transmitted through the stack of sheets.
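The three steps above follow the same pattern, so they can be sketched as a short calculation that works for any stack of sheets. The Python below is an illustration under stated assumptions: the function name transmitted_fraction is ours, the filters are taken to be ideal, and the list must contain at least one filter.

```python
import math

def transmitted_fraction(filter_angles_deg):
    """Fraction of unpolarised light transmitted by a stack of ideal filters.
    filter_angles_deg lists each filter's polarisation direction, in order."""
    fraction = 0.5  # the first filter transmits half of the unpolarised light
    # Each later filter applies Malus' law, using the angle to the previous filter
    for previous, current in zip(filter_angles_deg, filter_angles_deg[1:]):
        fraction *= math.cos(math.radians(current - previous)) ** 2
    return fraction

print(round(transmitted_fraction([0, 45, 90]), 3))  # crossed sheets with a 45° sheet between
print(round(transmitted_fraction([0, 90]), 3))      # crossed sheets alone
```

Trying other angles for the middle sheet shows how the transmitted fraction changes as that sheet is rotated.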
Applications of polarisation
Certain complex molecules have a ‘handedness’. This means that they exist in two forms
which are mirror images of each other, called enantiomers. These mirror images cannot be
superimposed by rotating the molecules, in the same way as you cannot lay one hand on top
of the other and match up the fingers. If the substance is dissolved to form a solution:
• a solution of one of the two enantiomers of a molecule will rotate the plane of polarisation
of plane-polarised light clockwise
• a solution of the other enantiomer will rotate plane-polarised light anticlockwise.
Molecules that do this are said to be optically active. A solution containing a 1:1 mixture
of the two enantiomers will not rotate the plane of polarisation of the light, since the two
contributions to the rotation from each enantiomer cancel out. The amount of rotation of the
light that passes through a solution can be measured. If we have a solution consisting of only
one of the enantiomers, then this measurement can be used to determine its concentration.
Glucose (a sugar) is an optically active molecule, and its concentration in solution can be
determined in this way. This is used in the food industry.
The polymer molecules in certain plastics such as Perspex can also rotate the plane of
polarisation of light (see Figure S13.23a). The extent to which a plastic rotates the plane of
polarisation depends on the stress exerted on the polymer sample and the colour of the light.
This leads to concentrations of stress showing up as colourful patterns under plane-polarised
light. This is used by engineers to investigate stresses in structures, by building and loading a
Perspex model and examining it under plane-polarised light.
Thin slices of rocks, thin enough for light to be transmitted through the mineral crystals,
allow us to investigate the optical properties of those crystals (see Figure S13.23b). Many
mineral crystals rotate the plane of polarised light in a similar way to Perspex, and the extent
of that rotation is frequency (colour) dependent. This property is known as birefringence. By
examining the crystals under polarised light, it is possible to identify the minerals present in
the rock. Usually, polarised light is shone through the crystals and a second polarising filter,
placed at 90° to the plane of polarisation of the first one, is placed at the eyepiece or detector
of a microscope. Only light that has had its plane of polarisation rotated can be seen at the
eyepiece or detector.
Figure S13.23 a A plastic protractor viewed in plane-polarised light. The patterns of stress
that were locked in to the structure of the plastic as the shape was formed and cooled are visible.
b The lower image shows a thin-section (a very fine slice) of the rock in the upper image, viewed in
cross-polarised light. Birefringence colours are visible.
Summary
■ The law of reflection: when rays are reflected from a surface, the incident and
reflected rays are at equal angles to the normal at the reflection point.
■ When a wave travels through a less dense medium and is reflected off a more dense
medium, there is a 180° (π radians) phase shift in the reflected wave (the wave is
inverted). This also applies to light, where a more optically dense medium is one
with a higher refractive index.
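■ When a wave travels between two media with different wave speeds, it is refracted
according to the law of refraction (Snell’s law): n1 sin θ1 = n2 sin θ2, where θ1 and θ2 are
the angles to the normal in the media with refractive indices n1 and n2.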
■ Many transparent materials, such as glass or plastic, are dispersive, which means
that different frequencies of light travel through them at different speeds – they have
a different refractive index for different colours of light.
■ The critical angle is the angle of incidence for a ray crossing the boundary from a
medium of higher refractive index to one of lower refractive index for which the law
of refraction predicts an angle of refraction of 90°.
■ No refracted ray can form and the incident ray undergoes total internal reflection at
all angles greater than or equal to the critical angle.
■ When a wave is refracted, not all of the wave is transmitted – some is reflected too.
This is known as partial reflection.
■ Transverse waves may be polarised, which means that their oscillations are confined
to a particular plane.
■ Polarising filters only allow one polarisation of light to pass through. If light is
incident on them at an angle θ to the filter’s polarisation direction, then only a
component of that light will pass through. If the incident wave has intensity I0 the
transmitted wave has intensity:
$$I_t = I_0 \cos^2\theta$$
S13: Waves and Optics
S13.1
A semi-circular convex mirror is attached to the wall.
[Diagram: the mirror on the wall and the viewer’s position A]
Using a ray diagram, explain why a viewer positioned at A has almost a 180° field of view in the mirror
(that is, they can almost see directly along the wall). [4]
S13.2
a Define the term critical angle. [2]
b Explain what happens when light is incident on an interface at an angle greater than the critical
angle. [2]
A diamond has a refractive index of 2.4.
c Calculate the critical angle for diamond. [2]
The diagrams below show three possible cuts for a diamond, to be used in a ring.
[Diagrams: three diamond cuts, labelled ‘too shallow’, ‘perfect’ and ‘too deep’]
d Using these diagrams, and your knowledge of total internal reflection, explain why the cut of a
diamond is very important if you wish to have a ‘brilliant’ gemstone (one which appears to be
illuminated from within). [3]
e A ray of light meets a flat surface of the diamond at an angle of 85° to the surface. Calculate the
angle at which the light leaves the diamond into the air. Take the refractive index of the air to
be 1.00. [3]
S13.3
a Explain how an optical fibre is able to transmit light efficiently over a long distance. Use a diagram in
your answer. [3]
b An optical fibre consists of an inner cylindrical core, through which the light is transmitted, and a
cladding with a lower refractive index. The core has a refractive index of 1.50, and the cladding a
refractive index of 1.45. Calculate the critical angle within the fibre. [2]
c Calculate the maximum angle θmax, relative to the axis of the fibre, at which the light may enter and be
transmitted down the core of the fibre. [3]
[Diagram labels: θc, 90° – θc, θmax; cladding: n2; core: n1]
S13.4
In a seismic survey, an explosion is triggered. The P-waves (compressional waves, like sound) generated by
the explosion travel through the Earth, and are detected by seismometers (sensors). The diagram below
shows some possible paths the wave can take.
[Diagram labels: explosion; 10 km; direct wave; 1 km; 10°; reflected wave; wave speed 6 km/s; refracted wave; α; wave speed 8 km/s]
a At a distance of 10 km from the source, calculate the difference in arrival time between the direct
wave and the reflected wave. [3]
b When a light wave travels from one medium to another, it obeys Snell’s law relating the angles and the
refractive indices. Write down Snell’s law and then express it in terms of the speed of light in the two
media, v1 and v2. [3]
c Seismic waves also obey a version of Snell’s law. Using the expression you derived in b, calculate
the angle of refraction α in the diagram. [2]
S13.5
An interview room in a police station is set up as shown in the diagram below.
[Diagram: a dimly lit viewing room and a brightly lit interview room, separated by a glass window]
Explain carefully why someone in the viewing room is able to see into the interview room, but someone in the
interview room cannot see into the viewing room and only sees their own reflection in the glass. [3]
S13.6

a Explain why light can be polarised but sound cannot. [3]
b Explain why Polaroid sunglasses allow you to see beneath the surface of a lake, but without
the sunglasses, you can only see a reflection. [3]
S13.7
A Polaroid sheet, when held in front of an unpolarised light source, reduces the intensity of the light by
a factor of 2.
Two Polaroid sheets are placed ‘crossed’, so that their directions of polarisation are perpendicular to each
other. No light passes through the two sheets.
A third Polaroid sheet is placed in between these two sheets at an angle θ.
a If the angle θ is set at 45°, what fraction of incident light does the combination of Polaroid sheets
allow through? [4]
b Sketch how the light intensity varies with θ. [2]
$$\cos^2(45^\circ) = \frac{1}{2}$$
S14: Superposition of waves
Learning Outcomes
■ determine the resultant amplitude when two waves superpose, making use of phasor
diagrams
■ recall that waves can be diffracted and that substantial diffraction occurs when the size of the
gap or obstacle is comparable to the wavelength
■ recall qualitatively the diffraction patterns for a slit, a circular hole and a straight edge
■ recognise and use the equation nλ = b sin θ to locate the positions of destructive interference
for single-slit diffraction
■ recognise and use the Rayleigh criterion θ ≈ λ/b for the resolving power of a single aperture
S14.1 Representing waves using phasor diagrams

In Chapters 13 and 14 of the Coursebook, we introduced the concepts of phase and phase
difference. The phase of a wave motion tells us where we are in the cycle (in general, the
word phase can be applied to the same concept for oscillations and for other periodic
phenomena, for instance ‘the phases of the Moon’). Two waves at the same position in their
cycle are said to be in phase, and two waves which are half a cycle different in phase are said
to be out of phase or in antiphase.
When two waves meet at a point, we obtain the resultant displacement by adding up the
displacements of those two waves at that point. The fact that we are able to do this is known as the
principle of superposition, as we have seen in the Coursebook. In order to obtain a measurable
and regular interference pattern, the waves must be coherent, i.e. the sources producing them
must have a constant phase difference between them (which may be zero). This implies that
both sources have the same frequency, and this frequency must remain constant.
We can use the concept of a phasor to keep track of where a wave is in its cycle. A phasor
is simply a rotating arrow, which turns once per cycle at a constant angular speed (like a
clock). The angle between two phasor arrows gives us the phase difference between them.
The length of the arrow, a, is proportional to the amplitude of the wave, which means that
the displacement of the wave at a given phase angle, θ, is given by a sin θ. We can quote the
phase angle in either degrees or radians (but remember, if you are doing any calculations,
make sure that your calculator is in the correct angle mode!). Figure S14.1 shows how we can
use phasor arrows to track where we are in the wave’s cycle, and to calculate the displacement
of the wave at that point.
The idea of phasor diagrams can also be used for other periodic phenomena, such as
oscillations, and can be used in Feynman’s ‘sum over histories’ interpretation of quantum
mechanics (see Chapter S33) to obtain the probability of a particular observation by adding
up phasors for possible paths that give that observation.
[Diagram: sinusoidal variation with time, showing phasor and phase angle, with the cycle marked from 0 to 2π in steps of π/4. The phasor has radius a and phase angle θ. The period is time T, so the phasor rotates through 2π over the period: angle θ = 2πt/T = 2πft, so displacement = a sin θ = a sin 2πft.]
Figure S14.1 Phasor arrows indicate the point in the cycle that we have reached, and allow us to
calculate the wave displacement at that point.
If we wish to add up the displacements from two different waves, then we can do this very
straightforwardly with phasors. All we need to do is find the vector sum of the two phasor
arrows (add them ‘tip to tail’). The resultant phasor gives us the amplitude and phase of the
superposition of the two waves. If the two waves are in phase, the resultant wave has the
same phase as the two original waves but the amplitude is the sum of their amplitudes. If
the two waves are in antiphase, then the amplitude of the resultant wave is the difference
between the amplitudes of the two original waves, and the phase is the same as that of the wave
with the larger amplitude. In the case where the waves have the same amplitude, the
two waves ‘cancel out’. For waves with a phase difference in between these two extremes,
then we go through the procedure of adding the phasor arrows. If the two waves have the
same amplitude, then you will find that the phase of the resultant wave is the average of the
phases of the two original waves. The phasor diagrams for these three cases are shown in
Figure S14.2.
[Phasor diagrams, each showing III = I + II: a Oscillations in phase. b Oscillations in antiphase. c Oscillations with π/2 phase difference.]
Figure S14.2 Using phasor arrows to determine the superposition of two waves.
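The ‘tip to tail’ addition can be tried out numerically. The sketch below is in Python and treats each phasor as a complex number, which is a bookkeeping choice made here for convenience; the function name add_phasors is our own. It covers the three cases of phasors in phase, in antiphase, and with a π/2 phase difference.

```python
import cmath
import math

def add_phasors(phasors):
    """Add phasors given as (amplitude, phase in radians) pairs.
    Returns the (amplitude, phase) of the resultant phasor."""
    # cmath.rect converts an (amplitude, phase) pair into a complex number;
    # adding complex numbers is the same as adding the arrows tip to tail
    total = sum(cmath.rect(a, phase) for a, phase in phasors)
    return abs(total), cmath.phase(total)

in_phase = add_phasors([(1.0, 0.0), (1.0, 0.0)])
antiphase = add_phasors([(1.0, 0.0), (1.0, math.pi)])
quarter = add_phasors([(1.0, 0.0), (1.0, math.pi / 2)])

print("in phase amplitude:", round(in_phase[0], 3))
print("antiphase amplitude:", round(antiphase[0], 3))
print("pi/2 difference:", round(quarter[0], 3), "at phase", round(quarter[1], 3))
```

For two equal amplitudes with a π/2 phase difference, the resultant phase comes out as the average of the two original phases, as described above. The same function accepts any number of phasors.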
This idea of adding phasor arrows as vectors is very useful when considering situations where
there are multiple contributions to the wave displacement at a particular point, for example
in a diffraction grating where we are adding up contributions that come from many of the
lines in the grating. We will use phasor diagrams to analyse the diffraction pattern for N-slits
(N > 2) and the diffraction grating later.1
S14.2 Double-slit interference revisited

In Chapter 14 of the Coursebook we gave an expression for the separation of the maxima in
double-slit interference. Here we are going to derive that expression, and in addition, using
the phasor diagrams we have just described, investigate the form of the interference pattern
that would be seen.
Remember that to obtain a clear interference pattern:
• The two sources of waves must be coherent (there must be a constant phase difference
between them, which means that they must have the same frequency) and of the same type.
• The sources must be of equal (or almost equal) amplitude.
When we do the double-slit experiment with light, we can produce coherent light either
by using a laser (which always produces coherent light) or by passing light from a light-bulb
through a single slit first: the diffracted light from the single slit is coherent.
We are going to be working in the far-field approximation, which means that, relative to
the slit separation a, the screen on which we are viewing the pattern is a long way away (at a
distance D such that D >> a). This means that the two light rays which meet at the screen and
interfere have such a small angle between them that we can treat them as parallel. The path
difference (the difference in distance travelled for the two waves) between the two rays then
just comes from the difference in the lengths of the path near the slit, as shown in Figure S14.3.
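[Diagram: two parallel rays leaving the slits towards the screen at angle θ. The path difference between them is marked d sin θ in the diagram, which is a sin θ in the notation used in the text.]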
Figure S14.3 Path difference between two interfering rays in double-slit diffraction.
If constructive interference occurs, then the path difference between the two rays must be
an integer multiple of the wavelength: i.e. for an integer n, the condition for constructive
interference is:
path difference = nλ = a sinθ
If destructive interference occurs, then the path difference between the two rays must be
(n + ½) wavelengths (leading to the waves being out of phase by half a cycle). The
condition for destructive interference is therefore:
$$\text{path difference} = \left(n + \tfrac{1}{2}\right)\lambda = a \sin\theta$$
1 For those of you who have come across complex numbers in your mathematics, the idea of phasor diagrams
leads neatly into representing waves by complex numbers (which also add as vectors in the Argand plane).
The equation for constructive interference above allows us to calculate the angular
separation of two maxima in the interference pattern. If we consider the geometry of the
situation, we can derive the equation for the separation on the screen of the maxima, as given
in Chapter 14 of the Coursebook. We set up the double slits so that they are a distance D
away from the screen. The geometry is shown in Figure S14.4.
Figure S14.4 The geometry of double-slit diffraction. (The figure marks the centre of the double
slits, the angle θ, the distance D to the screen and the position x on the screen.)
The position of the nth order maximum is given by nλ = a sinθ. However, since distance D is
large, angle θ is relatively small, and therefore (with θ in radians):
$\sin\theta \approx \tan\theta \approx \dfrac{x}{D}$
Substituting this into the equation giving the position of our nth order maximum, we get:
$n\lambda = \dfrac{ax}{D}$
n increases by 1 between successive maxima, so the separation between maxima is given by:
$x = \dfrac{D\lambda}{a}$
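As a quick numerical check of this result, the short Python sketch below evaluates x = Dλ/a. The function name and the sample values (600 nm light, slits 0.20 mm apart, screen 2.0 m away) are illustrative assumptions, not values taken from the Coursebook.

```python
def fringe_separation(wavelength, slit_separation, screen_distance):
    """Spacing between adjacent maxima, x = lambda * D / a (all lengths in metres)."""
    return wavelength * screen_distance / slit_separation

# Assumed example values: 600 nm light, a = 0.20 mm, D = 2.0 m
x = fringe_separation(600e-9, 0.20e-3, 2.0)
print(f"fringe separation = {x * 1e3:.1f} mm")  # prints: fringe separation = 6.0 mm
```

Note that the far-field condition D >> a is comfortably satisfied by these numbers, so the small-angle result should be a good approximation here.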
What does the interference pattern look like?

We can use the phasor diagrams that we introduced earlier to work out the form of the
interference pattern in between maxima and minima.
If we know the path difference between two waves, we can calculate the phase difference φ,
in radians:
$\text{phase difference} = \text{path difference} \times \dfrac{2\pi}{\lambda}$
Therefore, the phase difference between the two rays that have travelled at an angle θ to the
normal to the slits is:
$\varphi = a\sin\theta \times \dfrac{2\pi}{\lambda} = \dfrac{2\pi a x}{\lambda D}$
where in the second equality we have used the approximation $\sin\theta \approx \tan\theta \approx \dfrac{x}{D}$.
Constructive interference occurs when the phase difference is an integer multiple of
2π (φ = 2mπ ), and if the amplitude of one of the waves is A0, the amplitude here will be 2A0.
Destructive interference occurs when the phase difference is an odd integer multiple of
π (φ = ( 2m + 1) π ), and the amplitude at these points will be 0. At phase differences between
these two extremes, the amplitude will lie between 0 and 2A0. These situations are shown in
Figure S14.5.
Figure S14.5 Adding up phasors for double-slit interference. a shows constructive interference
(phase difference φ = 2mπ, amplitude A = 2A0), b shows destructive interference (phase
difference φ = (2m + 1)π, amplitude A = 0) and c shows a phase difference φ which lies between
constructive and destructive interference.
If we use the cosine rule to calculate the amplitude A of the addition of two phasors, each of
amplitude A0, with a phase difference φ between them, we find:
$A^2 = A_0^2 + A_0^2 - 2A_0^2\cos(180^\circ - \varphi) = 2A_0^2(1 + \cos\varphi)$
Using the trigonometric identity $\cos\theta = 2\cos^2\dfrac{\theta}{2} - 1$, we can simplify this to:
$A^2 = 4A_0^2\cos^2\dfrac{\varphi}{2} = 4A_0^2\cos^2\left(\dfrac{\pi a x}{\lambda D}\right)$
where in the second equality we have substituted our approximation for φ .

Remembering that the intensity is proportional to the square of the amplitude, we can
therefore write down an expression for the intensity of the pattern on the screen, as a
function of position x from the centre of the screen:
$I = I_0\cos^2\left(\dfrac{\pi a x}{\lambda D}\right)$
where I0 is the maximum intensity seen on the screen. This function is shown in Figure
S14.6. In reality, the finite width of the slits means that the pattern decays in intensity as you
move away from the centre, so the central maximum does in fact have the largest intensity.
The actual pattern is a combination of the double-slit pattern we have derived here
and the single-slit pattern we will investigate shortly.
Figure S14.6 The intensity of the double-slit pattern as a function of position on the screen.
(Vertical axis: relative intensity I/I0, from 0 to 1. Horizontal axis: position x, marked at 0,
±λD/a and ±2λD/a.)
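The phasor argument can also be checked numerically. The sketch below, in Python, adds two unit phasors as complex numbers (the representation mentioned in the footnote) and compares the result with the cos² expression for the intensity. The function names and the sample values of λ, a and D are assumptions chosen only for illustration.

```python
import cmath
import math

def phase_difference(x, wavelength, a, D):
    """Phase difference (radians) between the two rays at screen position x (small-angle form)."""
    return 2 * math.pi * a * x / (wavelength * D)

def intensity_from_phasors(phi):
    """Add two unit phasors separated by phase phi and return I/I0."""
    resultant = 1 + cmath.exp(1j * phi)   # first phasor lies along the real axis
    return abs(resultant) ** 2 / 4        # largest possible |resultant|^2 is 4

def intensity_from_formula(x, wavelength, a, D):
    """I/I0 = cos^2(pi a x / (lambda D))."""
    return math.cos(math.pi * a * x / (wavelength * D)) ** 2

# Assumed example values: 600 nm light, a = 0.20 mm, D = 2.0 m
wavelength, a, D = 600e-9, 0.20e-3, 2.0
spacing = wavelength * D / a              # fringe separation
for fraction in (0.0, 0.25, 0.5, 0.75, 1.0):
    x = fraction * spacing
    phasor = intensity_from_phasors(phase_difference(x, wavelength, a, D))
    formula = intensity_from_formula(x, wavelength, a, D)
    print(f"x = {fraction:.2f} fringe spacings: phasors {phasor:.3f}, formula {formula:.3f}")
```

The two columns should agree at every position, with the intensity falling to zero halfway between neighbouring maxima.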
S14.3 Multiple-slit interference and diffraction gratings
Now we will investigate what happens when we have more than two slits. Again we will use
the idea of phasor diagrams to work out the intensity of the pattern that will be seen on the
screen.
Imagine that we now have three slits, the centre of each slit being separated from the
centre of the next slit by a distance d (note that in the section above we called this a to match
with the notation given in Chapter 14 of the Coursebook; here we call the slit separation d to
match with the equation for the diffraction grating given in Chapter 14 of the Coursebook).
The maxima of the pattern that results correspond to the phasors from all three slits being
lined up at a particular point on the screen (i.e. the phase difference between the rays is a
multiple of 2π). However, in between these primary maxima, there is a secondary maximum,
which corresponds to two of the phasors being in phase, and the other being out of phase.
There are also two minima between the primary maxima. These occur when the phasors add
up as a closed figure: with three slits, this is a triangle. The phasor diagrams and the graph of
the resulting intensity on the screen for three-slit interference are shown in Figure S14.7.
Figure S14.7 The diffraction pattern for three-slit interference. φ is the phase difference between
neighbouring slits, and D is the distance to the screen. (Horizontal axis: position x, marked at
±λD/2d and ±λD/d. Phasor diagrams are drawn for the numbered points on the curve:
1 φ = 0; 2 φ = 2π/3; 3 φ = π; 4 φ = 4π/3.)
Now consider what happens as we add more slits. With four slits, we get two secondary
maxima between primary maxima, and minima between each of the maxima. However,
because the amplitude at the primary maximum now comes from four phasors lined up in
phase, but at the secondary maxima the in phase contribution is smaller (see the question
below), we notice that the primary maxima are brighter compared to the secondary maxima.
The first minimum is also closer to the primary maximum than it was in the three-slit
pattern. This trend continues, and as we add more and more slits, the primary maxima
become brighter and narrower and the secondary maxima gradually disappear. When we
have a large number of slits, we have a diffraction grating, which has bright, narrow maxima
at the positions of the primary maxima in the multiple-slit pattern. This leads to the diffraction
grating equation, relating the angle θ of the nth order maximum to the slit separation d:
nλ = d sinθ
Since the peaks are so narrow for a grating, it means that we are better able to distinguish
lines of different wavelength. So a diffraction grating can be used in an instrument such as a
spectrometer to measure the wavelength of spectral lines to a high degree of accuracy.
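The trend described above can be explored by adding the phasors numerically. The Python sketch below sums N unit phasors, each rotated by the same phase difference φ from its neighbour, and reports the relative intensity. The function name and the chosen numbers of slits are illustrative assumptions.

```python
import cmath
import math

def n_slit_relative_intensity(phi, n_slits):
    """I/I_max from adding n_slits unit phasors, each rotated by phi from the previous one."""
    resultant = sum(cmath.exp(1j * k * phi) for k in range(n_slits))
    return abs(resultant) ** 2 / n_slits ** 2   # all phasors aligned gives |resultant| = n_slits

# The first minimum next to a primary maximum occurs when the phasors close up
# into a regular polygon, i.e. when phi = 2*pi / n_slits.
for n_slits in (2, 3, 4, 10, 100):
    first_minimum = 2 * math.pi / n_slits
    residual = n_slit_relative_intensity(first_minimum, n_slits)
    print(f"{n_slits:3d} slits: first minimum at phi = {first_minimum:.3f} rad "
          f"(I/I_max there = {residual:.1e})")

# Secondary maximum for three slits (point 3 in Figure S14.7, phi = pi)
print(f"three slits, phi = pi: I/I_max = {n_slit_relative_intensity(math.pi, 3):.3f}")
```

For three slits the secondary maximum at φ = π has a resultant of one phasor length, so its intensity is one-ninth of the primary maximum. The first minimum moves closer to φ = 0 as slits are added, which is the narrowing of the primary maxima described in the text.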
question
14.1 Use the ideas shown in Figure S14.7 for three-slit interference to sketch the phasor
diagrams and intensity of the diffraction pattern for four-slit interference.
S14.4 Single-slit diffraction

We have already seen how light diffracts as it passes through a single slit in Chapter 14 of the
Coursebook. In order for diffraction through an aperture or around an object (see later) to be
significant, the aperture or object must be similar in size to the wavelength of the wave being
diffracted. This is why, for instance, we find that light is not diffracted through a window, but
sound is diffracted through the same window aperture (when it is open).
Here we are going to determine the position of the minima in the single-slit diffraction
pattern. To analyse the pattern, we can use an idea called Huygens’ principle: each point on
a wavefront is a point source of wavelets (semi-circular waves). These wavelets superpose to
form future wavefronts (see Figure S14.8).
Figure S14.8 Huygens’ principle: each point on a wavefront is a point source of wavelets (semi-
circular waves). These wavelets superpose to form future wavefronts.
This means that we can treat points along the single slit as being sources of secondary
wavelets, or equivalently we can consider the effect of many rays, each equidistant from the
next ray, coming from the slit. The result of this analysis is shown in Figure S14.9. In order
to get destructive interference, we have to have a path difference of a wavelength (a phase
difference of 2π) across the width of the slit.
Figure S14.9 Diffraction at a single aperture. (Annotations in the figure: when the phasors are all
in the same direction, the resultant is large. When each phasor is at the same angle to the next,
the net effect of the complete set of phasors is that they form a circle. This means these phasors
add to zero, producing zero amplitude. The path difference across the whole slit, b sin θ, must
then equal one wavelength, so the first zero of intensity is at the angle θ given by λ = b sin θ.)
We find that we get minima at angle θ from the centre of the pattern, where
nλ = bsinθ
In this equation, b is the width of the slit. If you look at the pattern in Figure S14.9 you
will see that there is a maximum at the centre of the pattern, as you might expect. So this
equation is only valid when n ≠ 0. Remember, of course, that in an experiment, you might
measure the angular distance between the first minima on either side of the central peak
(since this is easier to determine than the position of the centre of the pattern). This angle
would be twice the angle from the centre of the pattern. Therefore, you must be careful to
use the correct angle when doing calculations. You must also be careful with calculations
involving double slits or diffraction gratings.
A more detailed analysis of the addition of the phasors for each ray, in the limit where the
spacing between the rays goes to zero, gives us the following expression for the intensity of
the pattern as viewed on the screen (relative to the maximum intensity I0):
$I = I_0\,\dfrac{\sin^2\left(\dfrac{\pi b x}{\lambda D}\right)}{\left(\dfrac{\pi b x}{\lambda D}\right)^2}$
If you have a graphical calculator you could plot this function to verify that it looks like what
we have drawn in Figure S14.9. You are not expected to know this formula.
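If you do not have a graphical calculator, a few lines of Python will tabulate the same function. This is a minimal sketch: the function name and the sample values (600 nm light, a 0.050 mm slit, screen 2.0 m away) are assumptions for illustration only.

```python
import math

def single_slit_relative_intensity(x, wavelength, b, D):
    """I/I0 = sin^2(beta) / beta^2 with beta = pi * b * x / (lambda * D)."""
    beta = math.pi * b * x / (wavelength * D)
    if beta == 0:
        return 1.0                      # limit of (sin beta / beta)^2 as beta -> 0
    return (math.sin(beta) / beta) ** 2

# Assumed example values: 600 nm light, b = 0.050 mm, D = 2.0 m
wavelength, b, D = 600e-9, 0.050e-3, 2.0
first_minimum = wavelength * D / b      # n = 1 in n*lambda = b*sin(theta), small angles
for multiple in (0.0, 0.5, 1.0, 1.5, 2.0):
    x = multiple * first_minimum
    value = single_slit_relative_intensity(x, wavelength, b, D)
    print(f"x = {multiple:.1f} x (lambda D / b): I/I0 = {value:.4f}")
```

The zeros fall at whole-number multiples of λD/b, in agreement with nλ = b sin θ for small angles, and the secondary maximum near 1.5 λD/b is already below 5% of the central intensity.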
A real double-slit interference pattern uses slits that have a finite width (our previous
analysis assumed that the slits were point sources of waves). The slits are narrower than the
separation between them, and this means that the distance between minima of the single-slit
pattern of one of these slits would be much wider than the separation between minima of the
double-slit pattern we derived earlier. In order to work out what the pattern actually looks
like, we use the single-slit pattern as an ‘envelope’ over the double-slit pattern. So the double-
slit pattern ends up brighter where the single-slit pattern has a maximum, and disappears
where the single-slit pattern would have a minimum. Since the single-slit pattern decays
away quite quickly off-axis, this explains why, if you do the double-slit experiment, you will
only see a bright diffraction pattern near the centre of the screen. You should be able to make
out the minima which correspond to the single-slit pattern and work out the width of the
slits in the apparatus you are using. You will see the pattern again outside of these minima,
but it will be fainter. The question below asks you to work through what you might see in
such a case.
question
14.2 Light of wavelength 500 nm is incident on a double slit. The slit separation is 0.50 mm,
and the width of each slit is 0.10 mm. The diffraction pattern is viewed on a screen at a
distance of 5.0 m from the slits.
a Calculate the fringe separation in the double slit interference pattern, assuming that
the slits are point sources of light.
b Calculate the position of the first minimum of the diffraction pattern of a single slit of
width 0.10 mm.
c Use your answers to a and b to sketch the diffraction pattern for these double slits.
In the case of a diffraction grating, the single-slit envelope to the diffraction pattern can
cause us to have ‘missing orders’ – which is where one of the maxima of the pattern we
would expect from the grating lines up with one of the minima of the single-slit pattern. The
expected maximum disappears and we get a ‘gap’ in the pattern.
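One way to see which orders go missing is to divide the grating equation nλ = d sinθ by the single-slit minimum condition mλ = b sinθ (writing m for the single-slit integer to keep it distinct from the grating order n). This gives n = m d/b, so order n is missing whenever n divided by d/b is a whole number. The Python sketch below applies this test; the function name and the example ratios are assumptions, and it does not check whether the order exists at all (sinθ ≤ 1).

```python
from fractions import Fraction

def missing_orders(d_over_b, max_order):
    """Grating orders n (1..max_order) that coincide with a single-slit minimum.

    A maximum (n*lambda = d*sin(theta)) lands on a minimum (m*lambda = b*sin(theta))
    when m = n / (d/b) is a whole number.
    """
    ratio = Fraction(d_over_b)
    return [n for n in range(1, max_order + 1) if (n / ratio).denominator == 1]

# Assumed example: slit spacing three times the slit width
print(missing_orders(3, 10))               # prints: [3, 6, 9]
# Assumed example: slit spacing 2.5 times the slit width
print(missing_orders(Fraction(5, 2), 10))  # prints: [5, 10]
```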
S14.5 The Rayleigh criterion

The diffraction of light as it passes through an aperture, such as the single slit discussed
above, has profound implications for the maximum possible resolution of optical
instruments, such as telescopes. Light from a point source will be diffracted as it passes
through the aperture of the instrument, and this means that there is a limit on how close two
objects can be (in terms of angular distance) and still be distinguished by the instrument.
The wider the aperture, the less diffraction there is (since the diffraction pattern gets
narrower). So instruments with a wider aperture have a higher resolution.
We can only resolve two point sources after the light has been diffracted through an
aperture if the maximum of source 1 is at least as far away from the maximum of source 2
as the first minimum of source 2 (see Figure S14.10). Since we know that for a single slit, the
distance between the maximum and the first minimum is given by n = 1 in nλ = bsinθ , we
require that
$\sin\theta = \dfrac{\lambda}{b}$
However, since we are resolving two objects that are close together, the angle θ is small, so
we can use the small angle approximation sinθ ≈ θ . This gives us the Rayleigh criterion: an
aperture of size b allows us to resolve two point sources of light of wavelength λ if they are
separated by an angle greater than
$\theta \approx \dfrac{\lambda}{b}$
Figure S14.10 The light from two point sources which has passed through an aperture of width
b can be resolved if the sources are separated by a minimum angular distance of θ = λ/b. This
corresponds to the maximum of the diffraction pattern of one source lining up with the first
minimum of the other source.
There are other factors which affect the resolution of a telescope, such as the quality of
the optical components (lenses and mirrors), and on Earth, the effects of the atmosphere.
Telescopes are therefore often placed on mountains to reduce distortion due to the
atmosphere. The Hubble Space Telescope is in orbit to avoid all atmospheric effects. This
telescope is quite close to being limited in its resolution only by the diffraction limit.
Most optical instruments have a circular aperture. This changes the analysis presented above,
but it turns out that it only changes the Rayleigh criterion slightly: $\theta \approx 1.22\,\dfrac{\lambda}{b}$, where b
is now the diameter of the aperture. Remember
that this equation gives us the angular distance between the centre of the pattern and the
first minimum. The diffraction pattern from a circular aperture is also circular: it has a
central maximum, surrounded by a minimum that takes the form of a ring. This in turn
is surrounded by further maxima and minima, with the maxima getting much less bright
the further you go from the centre. An example of the diffraction pattern from a circular
aperture is shown in Figure S14.11.
Figure S14.11 The diffraction pattern from a circular aperture.
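The two forms of the Rayleigh criterion can be wrapped in a single helper. The following Python sketch is illustrative only: the function name, the `circular` flag and the sample instrument (a 0.10 m aperture used with 550 nm light) are assumptions, not data from this supplement.

```python
def rayleigh_limit(wavelength, aperture, circular=False):
    """Smallest resolvable angle in radians.

    Slit of width b:           theta ~ lambda / b
    Circular aperture, dia. b: theta ~ 1.22 * lambda / b
    """
    factor = 1.22 if circular else 1.0
    return factor * wavelength / aperture

# Assumed example: 550 nm light through a 0.10 m aperture
theta_slit = rayleigh_limit(550e-9, 0.10)
theta_circular = rayleigh_limit(550e-9, 0.10, circular=True)
print(f"slit:     {theta_slit * 1e6:.2f} microradians")      # prints: slit:     5.50 microradians
print(f"circular: {theta_circular * 1e6:.2f} microradians")  # prints: circular: 6.71 microradians
```

Both lengths must be in the same units (metres here), and the result is in radians because the small-angle approximation sinθ ≈ θ has been used.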

questions
14.3 The pupil in your eye has a diameter of about 5 mm. The wavelength of light is
approximately 500 nm.
a What is the limit on the angular resolution of your eye set by the size of the pupil?
b What width does this correspond to on the retina, which is approximately 25 mm
behind the pupil?
c The cones on your retina are separated by about 0.003 mm. Comment on this value,
in light of your answer to part (b).
14.4 The 300 m diameter Arecibo radio dish in Puerto Rico is used with 100 mm radio waves.
Estimate the angular resolution that can be achieved at this wavelength.
S14.6 Diffraction at an edge

When we have a plane wave arriving at a barrier, as shown in Figure S14.12a, Huygens’
principle tells us that the wave will diffract into the region behind the barrier (despite the fact
that we would expect this region to be in the geometric shadow).
If we analyse this in more depth (the details of this analysis are beyond the scope of the
course), we find that as well as getting a non-zero intensity in the geometric shadow, there are
maxima and minima in the non-shadow region (see Figures S14.12a and b).
Figure S14.12 a Huygens’ principle (treating each wavefront as a source of spherical wavelets)
tells us that when a wavefront meets an edge, there will be some diffraction into the shadow
region. b Graph showing the wave intensity as we cross the geometric edge. Notice that as well as
having some intensity in the shadow region, we also get maxima and minima in intensity in the
non-shadowed region – a diffraction pattern. c The diffraction pattern for a straight edge.
(Panel a shows the incident wave, the wavefronts, the secondary sources, the absorbing screen
and the shadow region; panel b plots intensity against distance, with the edge of the
geometrical shadow marked.)
Summary
■ Phasor diagrams can be used to track the phase and amplitude of a wave. Phasors
are added like vectors, and the sum of the phasors from two or more different waves
gives us the amplitude and phase of the superposition of those waves.
■ Using phasor diagrams, we can work out the positions of constructive and
destructive interference in the diffraction patterns from double slits, multiple slits
and single slits.
■ When a wave passes through a single slit that is of comparable size to its wavelength,
it is diffracted. The positions of the minima in the diffraction pattern are given by the
equation nλ = bsinθ , where b is the width of the slit.
■ The diffraction patterns produced by double slits and diffraction gratings are a
combination of the interference pattern for that arrangement of slits and the single-slit
diffraction pattern.
■ Diffraction through an aperture limits the resolution of optical instruments such
as telescopes. We can use the Rayleigh criterion $\theta \approx \dfrac{\lambda}{b}$ to work out the minimum
angular distance that can be resolved by an optical instrument.
■ Diffraction also happens at the edge of a barrier. There is both diffraction into the
geometric shadow region and a series of maxima and minima in the non-shadow
region, close to the edge.
S16: Radioactivity
Learning Outcomes
■ show an awareness of the existence and main sources of background radiation
■ recall that the standard model classifies matter into three families: quarks (including up and
down), leptons (including electrons and neutrinos) and force carriers (including photons
and gluons)
■ recall that matter is classified as baryons and leptons, and that baryon numbers and lepton
numbers are conserved in nuclear transformations
question
16.1 A proton has about 2000 times the mass of an electron and so the mass of a hydrogen
atom can be assumed to be the same as the mass of the proton. Use the data given for
the radius of a proton and an atom to find the ratio of the densities of hydrogen and the
bare proton.
S16.1 Background radiation
We are constantly surrounded by radiation, emitted by radioactive substances in the
environment. This naturally occurring radiation is called background radiation and it
comes from a number of sources including the following.
• Cosmic rays – These are high energy particles from the Sun and other stars which hit our
atmosphere. Some reach ground level, and others interact with atoms in the atmosphere,
changing their nuclei. This is how radioactive carbon-14 (used in carbon dating) is formed.
• Radon – This is a radioactive gas, present in very small quantities in the air and which
can also build up in rocks such as granite. Radon levels vary greatly from place to place
around the world, depending on the underlying geology.
• Terrestrial – Most rocks and soil contain radioactive substances such as uranium in small
quantities. These substances also find their way into building materials.
• Biological – There are radioactive isotopes of many of the atoms of elements that plants
and animals use. As a result, our own bodies and the food we eat are slightly radioactive.
Carbon-14 and potassium-40 are specific examples. The high levels of potassium in nuts
and bananas have led to the joking suggestion that they be used as a unit for radioactivity!
• Nuclear testing and accidents – Open-air tests of nuclear weapons through the 1950s and
1960s and the small number of leaks from nuclear power stations have released radioactive
substances into the atmosphere or environment.
• Medical – Some people count exposure to radiation through medical procedures as part
of background radiation, although it is unevenly distributed and people are usually aware
of the exposure, whereas for the other forms they are not.
Because living things have evolved in an environment of low-level nuclear radiation, all
living things have a certain tolerance for very low doses. In addition, the background level
allows us to set a gauge to measure other radiations by. For example, you may have seen
experiments that demonstrate radioactivity. If you were told that the additional exposure
to radiation was less than 1% of the annual total background radiation you would normally
experience, you would probably find that acceptable. If the exposure turned out to double the
annual background radiation, you may think it was not worth the risk. As it is impossible to
have zero radiation exposure anywhere on Earth, the average background radiation sets a
reasonable level for safe working.
Table S16.1 shows that Cornwall in the UK has a particularly high level of radon, which
otherwise contributes about 50% of a typical person’s background exposure. A transatlantic
flight increases the exposure to cosmic radiation, and working in a nuclear power station
adds very little more to one’s exposure than two flights.
Source | Radiation dose / mSv
UK overall annual average | 2.7
100 g Brazil nuts | 0.01
One transatlantic flight | 0.08
Nuclear power station worker annual exposure | 0.18
UK annual average due to radon | 1.3
Cornish annual average due to radon | 7.8
Table S16.1 A chart showing some different contributions to background radiation. The unit used,
the millisievert (mSv), is a measure of the potential biological effect of radiation. Data sourced
from UK Government publication ‘Ionising Radiation: Dose Comparisons’.
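The comparisons made in the text can be reproduced directly from the values in Table S16.1. The short Python sketch below does the arithmetic; the dictionary and variable names are assumptions introduced for this example.

```python
# Doses in mSv, copied from Table S16.1
doses_mSv = {
    "UK overall annual average": 2.7,
    "100 g Brazil nuts": 0.01,
    "One transatlantic flight": 0.08,
    "Nuclear power station worker annual exposure": 0.18,
    "UK annual average due to radon": 1.3,
    "Cornish annual average due to radon": 7.8,
}

# Fraction of the average UK annual dose that is due to radon
radon_share = doses_mSv["UK annual average due to radon"] / doses_mSv["UK overall annual average"]

# How many times larger the Cornish radon dose is than the UK average radon dose
cornwall_ratio = doses_mSv["Cornish annual average due to radon"] / doses_mSv["UK annual average due to radon"]

# Number of transatlantic flights equivalent to a power station worker's annual exposure
flights_per_worker_year = (doses_mSv["Nuclear power station worker annual exposure"]
                           / doses_mSv["One transatlantic flight"])

print(f"radon share of UK average dose: {radon_share:.0%}")   # prints: radon share of UK average dose: 48%
print(f"Cornwall radon / UK radon: {cornwall_ratio:.1f}")
print(f"worker's annual exposure in flights: {flights_per_worker_year:.2f}")
```

The radon share of about 48% is the "about 50%" quoted in the text, and a worker's annual exposure comes out at a little over two flights.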
S16.2 Conservation laws

You should notice that in the beta decay example, one quark turns into another and no
quarks appear from nowhere or disappear. This is an example of a conservation law in
particle physics, the conservation of baryon number. If we count every baryon (such as a
proton or a neutron) as having a baryon number of +1, then each quark has a baryon number
of +⅓. Antiparticles such as antiprotons have a baryon number of −1 and antiquarks are
−⅓. Thus a pair of particles, such as a proton and an antiproton, can be created from non-
baryons, because the total baryon number remains unchanged. For example, the Large
Electron Positron collider at CERN (LEP, which preceded the Large Hadron Collider)
collided high-energy electrons and positrons to create hadrons:
$e^+ + e^- \rightarrow p^+ + p^-$
where e+ is the positron and p− (which you may see written as $\bar{p}$) is the antiproton. Neither of
the particles on the left is a baryon, so their baryon number is zero. On the right, the baryon
numbers are +1 and −1, for a total of zero as well.
We see a second example of conservation in the beta-decay equation. When an electron (a
lepton) is produced, so is an antineutrino. As there are no leptons on the left of the equation
(just one quark) there must be zero total lepton number on the right. This is achieved by
having two particles produced as well as a change of quark. One is the electron, a lepton with
lepton number of +1, and the other is an antineutrino, with a lepton number of −1. Again,
the total lepton number remains zero. In β+ decay, the positron, with a lepton number of
−1 (an antiparticle) is accompanied by a neutrino (lepton number of +1). This then ensures
conservation of lepton number as well.
Mesons are hadrons with zero baryon number. They consist of a quark and an antiquark
(+⅓ and −⅓) and so can be created from the energy of collisions or decays. An example is
the π+ meson, which is an up quark (u, charge +⅔) and an anti-down quark ($\bar{d}$, charge +⅓).
Mesons are all short-lived because they can decay into lighter lepton–antilepton pairs whilst
still conserving baryon and lepton number.
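The bookkeeping behind these conservation laws is easy to automate. The Python sketch below stores the charge, baryon number and lepton number of a few particles and checks whether the totals match on both sides of a reaction. The particle labels and function names are assumptions made for this example, and matching totals are only a necessary condition: the check says nothing about energy or any other conservation law.

```python
from fractions import Fraction as F

# (charge in units of e, baryon number, lepton number)
PARTICLES = {
    "u":       (F(2, 3),  F(1, 3),  0),
    "d":       (F(-1, 3), F(1, 3),  0),
    "anti-d":  (F(1, 3),  F(-1, 3), 0),
    "p":       (1,  1,  0),
    "anti-p":  (-1, -1, 0),
    "n":       (0,  1,  0),
    "e-":      (-1, 0,  1),
    "e+":      (1,  0, -1),
    "nu":      (0,  0,  1),
    "anti-nu": (0,  0, -1),
}

def totals(names):
    """Total (charge, baryon number, lepton number) of a list of particle labels."""
    return tuple(sum(PARTICLES[name][i] for name in names) for i in range(3))

def conserves_numbers(before, after):
    """True if charge, baryon number and lepton number all match on both sides."""
    return totals(before) == totals(after)

# Beta-minus decay at the quark level: d -> u + e- + antineutrino
print(conserves_numbers(["d"], ["u", "e-", "anti-nu"]))   # prints: True
# Electron-positron collision producing a proton and an antiproton
print(conserves_numbers(["e+", "e-"], ["p", "anti-p"]))   # prints: True
# Neutron decay without the antineutrino: lepton number is not conserved
print(conserves_numbers(["n"], ["p", "e-"]))              # prints: False
# Quark content of the pi+ meson: charge +1, baryon number 0
print(totals(["u", "anti-d"]) == (1, 0, 0))               # prints: True
```

Exact fractions are used for the quark values so that thirds add up without rounding error.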
S16.3 Force carriers

The theories developed to explain the strong and weak nuclear forces introduced a third class
of particle: force carriers. Each of the fundamental forces is associated with a particle or set of
particles that ‘carry’ the force. Interactions between particles can be modelled as fundamental
particles such as leptons or quarks exchanging these force carriers, which carry momentum
and energy. The force carriers and their properties are summarised in Table S16.2.
Force | Particle | Symbol | Range | Notes
electromagnetic | photon | γ | infinite | has no mass, so has infinite range
weak nuclear | W and Z vector bosons | W+, W−, Z0 | short – across a nucleus | high mass, so range is short; three types of force carrier were predicted, two charged and one neutral, and all three were discovered at CERN in the 1980s
strong nuclear | gluons | g | very short – within a nucleus | zero mass, but the range is short because gluons are confined within hadrons; there are eight different gluons
gravity | graviton | G | infinite | zero mass, so infinite range; predicted by theory but yet to be discovered or measured
Table S16.2 Fundamental forces and force carriers
Summary
■ Hadrons are particles made of quarks, which are affected by the strong force. They
include baryons and mesons.
■ Leptons are fundamental particles that are not affected by the strong force.
■ Baryon number is conserved, with baryons having a baryon number of +1, antibaryons
−1 and mesons 0. Quarks have a baryon number of +⅓ and antiquarks −⅓.
■ Lepton number is conserved, with leptons (such as electrons and neutrinos) having a
lepton number of +1 and antileptons (such as positrons and antineutrinos) −1.
■ Background radiation is present everywhere on Earth from natural sources
(including cosmic rays and some types of rocks) and artificial sources (including
nuclear weapons tests and medical devices such as X-ray machines).
■ The standard model classifies matter into three families: quarks (including up and
down), leptons (including electrons and neutrinos) and force carriers (including
photons and gluons).
■ Matter can be classified into baryons, mesons and leptons.
■ Baryon numbers and lepton numbers are conserved in nuclear transformations.
S17: Circular motion
Learning Outcomes
■ describe qualitatively the motion of a rigid solid object under the influence of a single force in
terms of linear acceleration and rotational acceleration
■ recall and use $I = \sum mr^2$ to calculate the moment of inertia of a body consisting of three or
fewer point particles fixed together
■ use integration to calculate the moment of inertia of a ring, a disk and a rod
■ understand the concept of angular momentum
■ deduce equations for rotational motion by analogy with Newton’s laws for linear motion,
including $E = \tfrac{1}{2}I\omega^2$, $L = I\omega$ and $\Gamma = I\dfrac{d\omega}{dt}$
■ apply the laws of rotational motion to perform kinematic calculations for a rotating object
when the moment of inertia is given
S17.1 Rotational motion

So far in the course, we have worked on the kinematics and dynamics of point bodies. We
have applied Newton’s laws and the equations of constantly accelerated motion to objects
while only considering their mass and centre of mass (gravity). We have not considered
the size and shape of the objects. This treatment gives correct results if a force is applied
along a vector that passes through the centre of mass: the force acts to accelerate the centre
of mass and we can continue treating the object as a point body. However, if the force is
applied along a vector that does not pass through the centre of mass, not only will the
object move linearly in the direction of the force, but also the object will start rotating
about its centre of mass.
You can try this for yourself by loosely holding a ruler in one hand and (carefully!)
hitting it out of that hand with your other hand. You will see that the ruler both rotates and
moves away from your hand. It will also fall, of course, because of its weight; but remember
that an object’s weight acts vertically downwards through its centre of mass, so the weight
will not contribute to the rotational or horizontal motion. This combined horizontal and
rotational motion of a rigid (stiff) object under the influence of a single force is illustrated in
Figure S17.1.
v
ω
Figure S17.1 A force that acts on an object in a line that does not pass through the centre of mass
causes the object to undergo linear and angular acceleration.
This combination of linear and rotational acceleration requires some thought to analyse in
detail. First, we will look at how we can describe and explain the rotation of a rigid body
about a fixed axis or pivot point.
Describing rotational motion

We need to develop equations of rotational motion using calculus. We can use some of the
definitions and formulae from circular motion, and extend them to include the rotation of
solid objects:
• Period of rotation, T: the time taken to complete one rotation about an axis (measured in
seconds)
• Frequency, f: the number of complete rotations per second (measured in Hz)
• Angular displacement, θ: the angle an object rotates through (measured in radians).
Frequency and period are related in the same way as we would expect from our study of
waves and circular motion:
$f = \dfrac{1}{T}$
We can define the instantaneous angular velocity as the rate of change of angular
displacement, measured in radians per second:
$\omega = \dfrac{d\theta}{dt}$
If we plot a graph of angular displacement against time, the instantaneous angular velocity at
a particular time is the gradient of the graph taken at that time. Note that this has the same
form as the equation for linear velocity:
$v = \dfrac{dx}{dt}$
If the angular displacement changes by ∆θ over a time ∆t, then we can calculate the average
angular velocity as:
$\omega_{\text{av}} = \dfrac{\Delta\theta}{\Delta t}$
Just as with linear velocity, if the angular velocity is constant, then the graph of angular
displacement against time is a straight line with the gradient equal to the average angular
velocity. If the angular velocity is changing, the graph of angular displacement against time
is curved and the instantaneous angular velocity is the gradient of a tangent to the curve.
Each complete rotation corresponds to an angular displacement of 2π, so if there are

f rotations per second, the angular velocity in radians/second is given by 2πf. Therefore we
can write down the following formulae:
$\omega = \dfrac{2\pi}{T} = 2\pi f$
If we want to calculate the linear velocity of a particular point on a rotating object

(see Figure S17.2), then we need to know its distance r from the axis of rotation or pivot
point. The moving point traces a circle of radius r around the pivot point. The linear
speed of the point is the rate at which it moves around the circumference of this circle, in
the direction of a tangent to the circle. Using calculus, we can find the derivative of the
displacement with respect to time:
$\dfrac{ds}{dt} = r\dfrac{d\theta}{dt} = r\omega$
or as we had before:
v = rω
This is the same equation as we used in circular motion.
Figure S17.2 A rotating object. (The figure marks the angle θ, the arc length s = rθ and the linear
speed v = rω.)
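The relations f = 1/T, ω = 2πf and v = rω chain together naturally. The Python sketch below converts a rotation rate in revolutions per minute into standard units and finds the speed of a point at a given radius. The function name and the sample values (45 revolutions per minute, a point 0.10 m from the axis) are assumptions for illustration.

```python
import math

def rotation_quantities(revs_per_minute, radius):
    """Return (period in s, frequency in Hz, angular velocity in rad/s, linear speed in m/s)."""
    frequency = revs_per_minute / 60.0      # rotations per second
    period = 1.0 / frequency                # T = 1 / f
    omega = 2 * math.pi * frequency         # omega = 2*pi*f = 2*pi / T
    speed = radius * omega                  # v = r * omega
    return period, frequency, omega, speed

# Assumed example: 45 revolutions per minute, point 0.10 m from the axis
T, f, omega, v = rotation_quantities(45, 0.10)
print(f"T = {T:.3f} s, f = {f:.2f} Hz, omega = {omega:.3f} rad/s, v = {v:.3f} m/s")
```

Every point on the rigid object shares the same ω, but v grows in proportion to the distance r from the axis.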
question
17.1 a A turntable rotates at 33 revolutions per minute. Determine the period, frequency
and angular frequency for this rotation (in standard units).
b The diameter of the turntable is 30 cm. Calculate the speed of a point on the edge of
the turntable.
Torque and angular acceleration

Remember that in linear motion, a non-zero resultant force causes a linear acceleration. In
rotational motion, a non-zero resultant torque causes an angular acceleration. Remember
also from Chapter 4 that we can calculate the resultant torque on a body that is free to rotate
by finding the sum of all the torques, or moments, on a body:
moment (in N m) = force (in N) × perpendicular distance from the pivot (in m)
The angular acceleration is the rate of change of angular velocity. In this book, we give
angular acceleration the symbol α. It is measured in radians per second per second
(radians/second²):
$\alpha = \dfrac{d\omega}{dt}$
Note that since $\omega = \dfrac{d\theta}{dt}$, angular acceleration $\alpha = \dfrac{d\omega}{dt} = \dfrac{d^2\theta}{dt^2}$
We call this the second derivative of angular displacement with respect to time.
If a rotating body changed its angular velocity by an amount ∆ω in time ∆t , then we

could calculate the average angular acceleration during that time as
$\alpha = \dfrac{\Delta\omega}{\Delta t}$
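The link between these derivatives and gradients can be illustrated numerically. The Python sketch below takes an assumed angular displacement θ(t) = ½αt² (rotation from rest with a constant angular acceleration of 2.0 rad s⁻²), estimates ω and α from small finite differences, and compares them with the average angular acceleration over an interval. The function names, the step size h and the chosen θ(t) are all assumptions made for this example.

```python
def theta(t, alpha=2.0):
    """Assumed angular displacement (rad): rotation from rest with constant alpha."""
    return 0.5 * alpha * t ** 2

def angular_velocity(theta_fn, t, h=1e-3):
    """Estimate d(theta)/dt as the gradient across a small interval centred on t."""
    return (theta_fn(t + h) - theta_fn(t - h)) / (2 * h)

def angular_acceleration(theta_fn, t, h=1e-3):
    """Estimate the second derivative of theta with respect to time at t."""
    return (theta_fn(t + h) - 2 * theta_fn(t) + theta_fn(t - h)) / h ** 2

omega_1 = angular_velocity(theta, 1.0)
omega_3 = angular_velocity(theta, 3.0)
average_alpha = (omega_3 - omega_1) / (3.0 - 1.0)    # delta omega / delta t

print(f"omega at t = 3 s: {omega_3:.3f} rad/s")
print(f"alpha at t = 3 s: {angular_acceleration(theta, 3.0):.3f} rad/s^2")
print(f"average alpha between 1 s and 3 s: {average_alpha:.3f} rad/s^2")
```

Because the angular acceleration is constant in this example, the instantaneous and average values agree; for a non-uniform rotation they would generally differ.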
We have seen how we can determine equations of motion for rotational motion that have
the same form as the equations for linear motion. However, before we can write down the
equivalent of Newton’s second law (F = ma) for rotational motion, we need to answer the
following question: what is the rotational equivalent to mass? This is not a straightforward
question to answer!
S17.2 Moment of inertia, kinetic energy and torque

If you have ever used a shopping trolley at a supermarket, you may have noticed that the
trolley handles differently depending on how you load it (see Figure S17.3). Most of the
large trolleys in British supermarkets have a partition near the front of the trolley, and it
is tempting to load this up with any bottles you have in your shopping, to stop them from
rolling around. However, usually bottles containing liquid are the heaviest part of your
shopping, and you’ll find that if you do this, it becomes more difficult to steer the trolley
around the corner, and once it is around the corner, it is difficult to stop it rotating. When
you steer a supermarket trolley you usually pivot it about a point close to the back, so it seems
that positioning the mass in the trolley far from the pivot point makes it harder to start and
stop any rotation.
Figure S17.3 Loading supermarket trolleys: a It is easier to make this supermarket trolley turn
around a corner. b It is much harder to get this supermarket trolley to turn around a corner. Once
you have started the trolley rotating, it is also harder to stop.
We can do a simple experiment in the lab to show the same thing (Figure S17.4). Take a metre
rule, and tape equal masses on either side of the centre, as shown in a. Try to rotate the rule.
Now move the masses further away from the centre towards each end, as shown in b, and try
to rotate the rule again. It should be harder to start and stop the rotation in b than in a.
Figure S17.4 A metre rule with masses attached. It is easier to start and stop the ruler rotating
with the masses in position a than with them in position b.
Figure S17.5 A rotating rigid body.
Figure S17.5 shows a rigid body rotating about an axis O at angular velocity ω. Imagine it as
being made up of a series of point particles, of masses m1, m2, m3 . . . , each at a distance of r1,
r2, r3 . . . from the rotation axis. Particle 1 is moving at a speed
$$v_1 = r_1\omega$$
and therefore particle 1 has kinetic energy
$$KE_1 = \frac{1}{2}m_1v_1^2 = \frac{1}{2}m_1r_1^2\omega^2$$
We can write down similar equations for the rest of the particles. The total kinetic energy of
the rotating body is the sum of the kinetic energies of the particles:
$$KE = \frac{1}{2}m_1r_1^2\omega^2 + \frac{1}{2}m_2r_2^2\omega^2 + \frac{1}{2}m_3r_3^2\omega^2 + \dots$$
$$KE = \frac{1}{2}\omega^2\left(m_1r_1^2 + m_2r_2^2 + m_3r_3^2 + \dots\right)$$
We call the quantity in brackets the moment of inertia and give it the symbol I. Note that
the angular velocity ω is the same for all the particles. We can write this quantity using
mathematical notation:
$$I = m_1r_1^2 + m_2r_2^2 + m_3r_3^2 + \dots = \sum_i m_ir_i^2$$
(the ‘Σ’ means ‘sum over all values of the index i’)
This means that our kinetic energy equation for a rotating body becomes
$$KE = \frac{1}{2}I\omega^2$$

This equation has a similar form to the kinetic energy of linear motion, where the moment
of inertia is the rotational equivalent of mass (measured in kg m²) and the angular velocity is
equivalent to velocity.
Look again at the formula for the moment of inertia. We can see that if the mass is
distributed further from the pivot point, the moment of inertia is larger. (In fact, if you
double the distance from the pivot, you increase the moment of inertia by a factor of 4.) Just
as a more massive object is more difficult to accelerate, an object with a larger moment of
inertia is harder to rotate. This explains why the supermarket trolley we discussed is hard to
get around a corner with the mass distributed towards the front of the trolley – its moment
of inertia is much larger, and so a larger torque is required to produce a given angular
acceleration. The same logic applies to the experiment with the metre rule.
We can now write down the equivalent to Newton’s second law for angular acceleration:
torque = moment of inertia × angular acceleration

Γ = Iα

WORKED EXAMPLE S17.1

Figure S17.6 For Worked example S17.1. (Labels: pivot O; 0.25 kg mass; 1.00 kg mass; lengths 70 cm and 20 cm.)
A pendulum is constructed by attaching two masses to a light rod, as shown in
Figure S17.6. Calculate the moment of inertia of the pendulum when it is rotated about point O.
Treat the two masses as point masses, each located at its centre of mass.
The formula for the moment of inertia is:
$$I = m_1r_1^2 + m_2r_2^2 + m_3r_3^2 + \dots = \sum_i m_ir_i^2$$
The 0.25 kg mass is at 0.5 m from the pivot, and the 1.00 kg mass is at 0.7 m from the pivot.
Therefore the moment of inertia is
$$I = 0.25\,\text{kg} \times (0.5\,\text{m})^2 + 1.00\,\text{kg} \times (0.7\,\text{m})^2 = 0.55\,\text{kg m}^2$$
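The sum over point masses is easy to check numerically. The sketch below applies I = Σmr² to the two masses of this worked example. The function name `moment_of_inertia` and the extra one-mass comparison at the end are illustrative assumptions.

```python
def moment_of_inertia(point_masses):
    """Sum m * r**2 over (mass in kg, distance from pivot in m) pairs."""
    return sum(m * r**2 for m, r in point_masses)

# The two masses of Worked example S17.1: (mass, distance from pivot O).
pendulum = [(0.25, 0.5), (1.00, 0.7)]
I_pendulum = moment_of_inertia(pendulum)
print(f"I = {I_pendulum:.2f} kg m^2")

# Assumed extra check: doubling the distance of a single 1 kg mass
# from 0.2 m to 0.4 m should quadruple its moment of inertia.
I_near = moment_of_inertia([(1.0, 0.2)])
I_far = moment_of_inertia([(1.0, 0.4)])
print(f"ratio = {I_far / I_near:.1f}")
```

Because each term depends on r², the 1.00 kg mass at 0.7 m dominates the total.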

Moment of inertia by integration

In Worked example S17.1 we modelled the rigid body as being made up of a number of point
masses, and added up the moments of inertia of those point masses about the pivot to get
the total moment of inertia. Most objects are not so easily modelled as point masses. An
approximation to the moment of inertia could be made by dividing the object up into small
regions and using the centre of mass of each region in the moment of inertia calculation. As
the regions get smaller, the approximation becomes more and more accurate.
If we could divide the object up into infinitesimally small regions and then sum up these
moments of inertia, then we would have the exact moment of inertia of the object. We can, in
fact, do this – using the calculus technique of integration. The equivalent formula to the sum
we had before is:
$$I = \int r^2\,dm$$

We will demonstrate how to use this formula in three examples: a rod, a disk and a ring.
Moment of inertia of a uniform rod

Figure S17.7 Calculating the moment of inertia of a uniform rod. (The rod lies along the x-axis from –L/2 to L/2; the element of length dx at position x has mass M dx/L.)
We will calculate the moment of inertia of a rod about its centre point (see Figure S17.7).
The rod is uniform, has mass M and total length L. Since the rod is uniform, it will have a
constant mass per unit length:
$$\rho = \frac{M}{L}$$
We will divide the rod up into small elements, each of length dx. Each element therefore
has mass:
$$dm = \frac{M}{L}\,dx$$
The x coordinate is the displacement from the pivot point. We can say that the element of
length dx at position x has the moment of inertia:
$$dI = x^2\,dm$$
and so, substituting the formula for dm from above:

$$dI = \frac{x^2M}{L}\,dx$$
Of course, the contribution of each element to the total moment of inertia depends on
how far it is from the pivot point; the x² term takes account of this.
In order to find the total moment of inertia, we need to sum up these
contributions over the entire length of the rod. The x-axis is defined as being along the rod,
with the origin of coordinates at the centre of the rod. So the rod extends from:
x = − L / 2 to x = L / 2
We can find the total moment of inertia by integrating over x , from –L/2 to L/2.
$$I = \int dI = \int_{x=-L/2}^{x=L/2} \frac{x^2M}{L}\,dx = \left[\frac{x^3M}{3L}\right]_{-L/2}^{L/2} = \frac{M}{3L}\left(\frac{L^3}{8} + \frac{L^3}{8}\right) = \frac{ML^2}{12}$$
You may want to check you have followed each step in obtaining this result by doing the full
calculation yourself.
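One way to check the result without redoing the algebra is to add up the contributions x² dm from many small elements, which is what the integral does in the limit. The sketch below is a minimal numerical check using a midpoint sum. The helper `integrate_midpoint` and the values M = 2.0 kg and L = 1.5 m are assumptions chosen for illustration.

```python
def integrate_midpoint(f, a, b, n=100_000):
    """Approximate the integral of f from a to b with the midpoint rule."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

# Assumed example rod: mass 2.0 kg, length 1.5 m, pivoted at its centre.
M, L = 2.0, 1.5
I_numeric = integrate_midpoint(lambda x: x**2 * M / L, -L / 2, L / 2)
I_formula = M * L**2 / 12

print(f"numerical sum: {I_numeric:.6f} kg m^2")
print(f"M L^2 / 12:    {I_formula:.6f} kg m^2")
```

Changing the limits passed to `integrate_midpoint` corresponds to moving the pivot, since x is always measured from the pivot point.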
When we calculate a moment of inertia, we always put the origin of coordinates at the
pivot point. In the example of the rod, if we instead pivot the rod about one end then we
should place the origin at that end of the rod, and x will take values between 0 and L in the
integration.
question
17.2 Prove that the moment of inertia of a uniform rod about one of its ends is $\frac{ML^2}{3}$. Hint:
follow the same steps we used above, but change the origin of coordinates to one end
of the rod.
Moment of inertia of a hoop or thin ring

A hoop (or thin ring) of radius R and mass M is rotated about an axis perpendicular to the
hoop and through its centre. It has all its mass concentrated at a distance R from the pivot.
So we can simply write down that its moment of inertia about this axis is
$$I = MR^2$$
This result is also the moment of inertia of a thin-walled, hollow cylinder about its axis, as
the distribution of mass about the rotation axis is identical.
Moment of inertia of a solid disk
Figure S17.8 Calculating the moment of inertia of a disk.
Figure S17.8 shows a solid disk, with total mass M and radius R. To calculate its moment of
inertia about an axis through the centre and perpendicular to the disk, we need to divide it
up into infinitesimally small rings (annuli). Each ring has a different radius, so to add up all
the infinitesimally small rings we integrate over radii from 0 to R.
The mass per unit area of the disk is
$$\rho = \frac{M}{\pi R^2}$$
Consider an element of this disk: a thin ring, of width dr and at radius r from the centre
of the disk. Its circumference is 2πr. (We can ignore the fact that the inner and outer
circumferences are very slightly different, because taking this into account would only
contribute terms containing (dr)² to the expression. As dr tends to zero in the integration,
these terms go to zero much faster than terms where dr is the only small quantity.)
The area of the thin ring is therefore
$$dA = 2\pi r\,dr$$
and the mass of this ring is

$$dm = \frac{M}{\pi R^2}\,dA = \frac{M}{\pi R^2}\,2\pi r\,dr$$
Note carefully the difference between R, which is the radius of the whole disk, and r, which is
the radius of the thin ring whose moment of inertia we are adding to the total.
The formula for the moment of inertia of a thin ring is:
$$I = mr^2$$
so each ring contributes the following to the moment of inertia:

$$dI = r^2\,dm = r^2 \times \frac{M}{\pi R^2}\,2\pi r\,dr = \frac{2M}{R^2}\,r^3\,dr$$
We now need to add up these contributions for the whole disk, so we integrate over r from 0
to R:
$$I = \int dI = \int_{r=0}^{r=R} \frac{2M}{R^2}\,r^3\,dr = \frac{2M}{R^2}\left[\frac{r^4}{4}\right]_{r=0}^{r=R} = \frac{2M}{R^2}\left(\frac{R^4}{4}\right) = \frac{MR^2}{2}$$
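The ring-by-ring argument can be mirrored directly in code: split the disk into many thin rings, give each ring the mass ρ × 2πr dr, and add up r² dm. The sketch below does this for an assumed disk of mass 0.5 kg and radius 0.2 m. The function name and the number of rings are illustrative choices.

```python
import math

def disk_moment_of_inertia(mass, radius, n_rings=100_000):
    """Sum r**2 * dm over thin rings of width dr, with dm = rho * 2*pi*r*dr."""
    rho = mass / (math.pi * radius**2)   # mass per unit area
    dr = radius / n_rings
    total = 0.0
    for i in range(n_rings):
        r = (i + 0.5) * dr               # mid-radius of ring number i
        total += r**2 * rho * 2 * math.pi * r * dr
    return total

# Assumed example disk, for illustration only.
M, R = 0.5, 0.2
I_numeric = disk_moment_of_inertia(M, R)
I_formula = M * R**2 / 2

print(f"sum over rings: {I_numeric:.6f} kg m^2")
print(f"M R^2 / 2:      {I_formula:.6f} kg m^2")
```

The outer rings contribute most of the total, both because they are further from the axis and because they contain more mass.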
question
17.3 Without doing any further calculation, write down the moment of inertia of a solid
cylinder. Justify your answer.
We can use the result for a disk to calculate the moment of inertia of a sphere. We can
consider a sphere as being made up of lots of thin disks, and their radius varies as a function
of how far they are above the centre of the sphere. Doing this is beyond the scope of this
course, but you might like the challenge! The result is given in the table of moments of
inertia below.
Moment of inertia of a ring (annulus)
Figure S17.9 Calculating the moment of inertia of a ring (annulus).
Figure S17.9 shows a ring, or annulus, with inner radius R1, outer radius R2 and mass M. If
we want to calculate its moment of inertia about an axis through the centre of the ring and
perpendicular to the ring, then our calculation is very similar to that for the disk. In fact, the
only changes we need to make are:
• adjusting the mass per unit area, so it takes account of the missing central part of the ring
• changing the limits of integration.
The new mass per unit area is (subtracting the area of the central missing part of the disk
from the area of a solid disk):
$$\rho = \frac{M}{\pi\left(R_2^2 - R_1^2\right)}$$
When we divide the ring up into infinitesimally thin rings, the moment of inertia of each
thin ring is:
$$dI = r^2\,dm = r^2 \times \frac{M}{\pi\left(R_2^2 - R_1^2\right)}\,2\pi r\,dr = \frac{2M}{\left(R_2^2 - R_1^2\right)}\,r^3\,dr$$
To calculate the moment of inertia of the whole ring, we need to integrate this result from
r = R1 to r = R2:
$$I = \int dI = \int_{r=R_1}^{r=R_2} \frac{2M}{\left(R_2^2 - R_1^2\right)}\,r^3\,dr = \frac{2M}{\left(R_2^2 - R_1^2\right)}\left[\frac{r^4}{4}\right]_{r=R_1}^{r=R_2} = \frac{2M}{\left(R_2^2 - R_1^2\right)}\left(\frac{R_2^4 - R_1^4}{4}\right)$$
However, since

$$R_2^4 - R_1^4 = \left(R_2^2 - R_1^2\right)\left(R_2^2 + R_1^2\right)$$
we can simplify our result to:
$$I = \frac{M}{2}\left(R_2^2 + R_1^2\right)$$
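A useful sanity check on the annulus formula is that it should reproduce the earlier results in two limiting cases: an inner radius of zero gives a solid disk, and an inner radius equal to the outer radius gives a thin hoop. The sketch below tests both limits. The function name and the values M = 3.0 kg and R = 0.4 m are assumptions for illustration.

```python
def annulus_moment_of_inertia(mass, r_inner, r_outer):
    """I = (M / 2) * (R2**2 + R1**2) for a flat ring about its central axis."""
    return 0.5 * mass * (r_outer**2 + r_inner**2)

# Assumed example values.
M, R = 3.0, 0.4

I_disk_limit = annulus_moment_of_inertia(M, 0.0, R)   # no hole: solid disk
I_hoop_limit = annulus_moment_of_inertia(M, R, R)     # all mass at radius R: hoop

print(f"R1 = 0:  {I_disk_limit:.3f} kg m^2 (compare M R^2 / 2 = {M * R**2 / 2:.3f})")
print(f"R1 = R2: {I_hoop_limit:.3f} kg m^2 (compare M R^2 = {M * R**2:.3f})")
```

For the same mass and outer radius, any annulus therefore has a moment of inertia between that of the disk and that of the hoop.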
S17.3 Using moments of inertia
Object | Moment of inertia
Point mass M at distance R from the axis of rotation | $I = MR^2$
Rod of length L and mass M, with the axis of rotation at the end of the rod | $I = \frac{ML^2}{3}$
Rod of length L and mass M, with the axis of rotation at the centre of the rod | $I = \frac{ML^2}{12}$
Thin circular hoop of radius R and mass M, and thin cylindrical shell with open ends, of radius R and mass M, about its axis | $I = MR^2$
Thin, solid disk of radius R and mass M, and solid cylinder of radius R and mass M, about its axis | $I = \frac{MR^2}{2}$
Hollow sphere of radius R and mass M | $I = \frac{2MR^2}{3}$
Solid sphere of radius R and mass M | $I = \frac{2MR^2}{5}$
Table S17.1 Moments of inertia of a number of different solid objects.
A vinyl record rotates on a turntable with an angular speed of 3.49 radians per second. The
record’s diameter is 0.305 m and its moment of inertia is 1.28 × 10⁻³ kg m².
a Calculate the mass of the record.
b Calculate its rotational kinetic energy.
c The record is brought to a standstill in 0.50 s by the application of a constant torque.
Calculate the torque exerted on the record.
a The record is a solid disk, so its moment of inertia is
$$I = \frac{MR^2}{2}$$
Step 1 We have been given the moment of inertia and radius of the disk, so rearrange
the formula for the moment of inertia to make mass the subject:
$$M = \frac{2I}{R^2}$$
Step 2 Substitute the values given in the question, remembering to divide the
diameter of the record by two to get its radius, to calculate the total mass:
$$M = \frac{2 \times 1.28 \times 10^{-3}\,\text{kg m}^2}{\left(\frac{0.305\,\text{m}}{2}\right)^2} = 0.110\,\text{kg}$$
b The rotational kinetic energy is given by the formula
$$KE = \frac{1}{2}I\omega^2$$
Step 1 Substitute the given values to determine the rotational kinetic energy:
$$KE = \frac{1}{2} \times 1.28 \times 10^{-3}\,\text{kg m}^2 \times \left(3.49\,\text{rad s}^{-1}\right)^2 = 7.80 \times 10^{-3}\,\text{J}$$
c
Step 1 Calculate the angular acceleration of the record as it slows:
$$\alpha = \frac{\Delta\omega}{\Delta t} = \frac{-3.49\,\text{rad s}^{-1}}{0.50\,\text{s}} = -7.0\,\text{rad s}^{-2}$$
Step 2 Use this value to calculate the magnitude of the torque exerted on the record:
$$\Gamma = I\alpha = 1.28 \times 10^{-3}\,\text{kg m}^2 \times 7.0\,\text{rad s}^{-2} = 8.9 \times 10^{-3}\,\text{N m}$$
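The three parts of this example can be reproduced in a few lines, which is a convenient way to check the powers of ten. The sketch below uses only the values given in the question. The variable names are illustrative.

```python
# Values given in the question.
moment_of_inertia = 1.28e-3   # kg m^2
omega = 3.49                  # rad/s
diameter = 0.305              # m
stop_time = 0.50              # s

# a: rearrange I = M R^2 / 2 for the mass of a solid disk.
radius = diameter / 2
mass = 2 * moment_of_inertia / radius**2

# b: rotational kinetic energy.
kinetic_energy = 0.5 * moment_of_inertia * omega**2

# c: angular acceleration while stopping, then the torque magnitude.
alpha = (0 - omega) / stop_time
torque = moment_of_inertia * abs(alpha)

print(f"mass   = {mass:.3f} kg")
print(f"KE     = {kinetic_energy:.2e} J")
print(f"alpha  = {alpha:.1f} rad/s^2")
print(f"torque = {torque:.1e} N m")
```

The torque comes out at a few thousandths of a newton metre, as expected for a light record spinning slowly.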

Analogies between linear and rotational motion

We have already seen a number of analogies between the equations for linear and rotational
motion. To get from the linear equation to the rotational version, we have swapped quantities
in the equation for their rotational analogue:
• velocity has been replaced by angular velocity (v → ω )
• mass has been replaced by moment of inertia (m → I )
• force has been replaced by torque (F → Γ).
Table S17.2 summarises what we can deduce from these quantities.
Linear quantity | Rotational quantity
Kinetic energy: $KE = \frac{1}{2}mv^2$ | Rotational kinetic energy: $KE = \frac{1}{2}I\omega^2$
Momentum: $p = mv$ | Angular momentum: $L = I\omega$
Force: $F = \frac{dp}{dt} = ma$ | Torque: $\Gamma = \frac{dL}{dt} = I\frac{d\omega}{dt} = I\alpha$
Table S17.2 Analogous formulae for linear and rotational motion.
Angular momentum, L
Remember that in a system where no external force acts, momentum is conserved. This is a
powerful law in mechanics. The rotational equivalent is that in a system where no external
torque (moment) acts, angular momentum is conserved.
angular momentum = moment of inertia × angular velocity
L = Iω
You may have experienced this if you have watched or taken part in ice skating or ballet.
An ice dancer who starts spinning with his arms outstretched will increase his rotation rate
as he brings his hands in (see Figure S17.10). As he brings his arms in, the mass of his arms
moves closer to his rotation axis. This means that his moment of inertia is reduced. Since
no external torque has acted, angular momentum must be conserved. The reduction in his
moment of inertia must be balanced by an increase in his angular velocity. He therefore
spins faster. Interestingly, his kinetic energy increases during this process. Think about
where this energy might come from before reading on. As the skater pulls his arms in, he
causes them to accelerate – they do not follow the path that they would follow if no force
acted on them. He therefore has to do work to bring the arms in, and that work increases the
kinetic energy stored in his rotating body.
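The skater argument can be made quantitative with L = Iω and KE = ½Iω². The sketch below keeps the angular momentum fixed while the moment of inertia falls, and compares the kinetic energy before and after. The numbers (I falling from 4.0 kg m² to 1.6 kg m², starting at 3.0 rad s⁻¹) are invented for illustration and are not measurements of a real skater.

```python
def spin_up(I_initial, omega_initial, I_final):
    """Return (omega_final, KE_initial, KE_final) with L = I * omega conserved."""
    L = I_initial * omega_initial            # angular momentum, unchanged
    omega_final = L / I_final
    ke_initial = 0.5 * I_initial * omega_initial**2
    ke_final = 0.5 * I_final * omega_final**2
    return omega_final, ke_initial, ke_final

# Assumed illustrative values: arms out, then arms pulled in.
omega_f, ke_i, ke_f = spin_up(I_initial=4.0, omega_initial=3.0, I_final=1.6)

print(f"final angular velocity: {omega_f:.2f} rad/s")
print(f"kinetic energy: {ke_i:.1f} J -> {ke_f:.1f} J")
```

With L fixed, the kinetic energy can be written as L²/(2I), so reducing I raises the kinetic energy; the difference is the work done by the skater in pulling his arms in.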
Figure S17.10 An ice skater speeds up his rotation as he pulls his arms in. Angular momentum is
conserved, so reducing his moment of inertia means that his angular velocity must increase.
The attitude indicator on an aircraft may use a device called a gyroscope to maintain
an artificial horizon (see Figure S17.11). The gyroscope contains a rotating disk, which is
mounted in a framework containing three gimbals so that it is able to rotate freely in three
dimensions. The rotating disk has angular momentum, and since no external torque acts on it
(the gimbals have little friction), the disk remains horizontal while the aircraft and gyroscope
gimbals tilt around it. This enables the pilot to see what angle the aircraft is tilted at.
Figure S17.11 A gyroscope is used in the attitude indicator on an aircraft.
Energy of a rolling solid cylinder

Calculate the kinetic energy of a solid cylinder rolling on a flat surface with its centre of mass
moving at a linear speed v. The cylinder has mass M.
Step 1 Determine the moment of inertia
This is a solid cylinder, so it has the same moment of inertia as a solid disk. The question
does not state a radius, so the end result is probably independent of radius, but for now we
will call the radius R.
$$I = \frac{MR^2}{2}$$
Step 2 Determine the angular velocity
The centre of mass is moving with linear speed v. If the cylinder rolls without slipping, the
point at which it touches the ground is momentarily at rest, so relative to the centre of mass
the edge of the cylinder moves with speed v.
$$\omega = \frac{v}{R}$$

Step 3 Determine the rotational kinetic energy and the kinetic energy of the centre of mass.
The total kinetic energy is the sum of the rotational kinetic energy of the cylinder and
the (linear) kinetic energy of the centre of mass.
Rotational KE:
$$KE = \frac{1}{2}I\omega^2 = \frac{MR^2v^2}{4R^2} = \frac{Mv^2}{4}$$
KE of the centre of mass:

$$KE = \frac{1}{2}Mv^2$$
Total KE:
$$KE = \frac{3}{4}Mv^2$$
Think about what this result means for rolling a cylinder down a slope. At a given speed, it
has greater kinetic energy than would be expected from its centre of mass alone, as there
is also energy stored in the rotational motion. If the cylinder was dropped from a height h,
or slid down a frictionless slope from that height, it would achieve the same final velocity –
all of the initial potential energy would have been converted to kinetic energy of the centre
of mass. However, if it rolls down a slope from this same height, its centre of mass will
end up moving more slowly. The energy is now partitioned (split) between the KE of the
linear motion of the centre of mass and the rotational KE. You could test this for yourself
by rolling a full and an empty cylindrical jam jar or food tin down a slope, and seeing whether
the difference in times you measure is the same as the difference you calculate. You will need to
work through the same calculation steps for a hollow cylinder (its moment of inertia is the
same as that of a thin hoop).
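The energy argument can be turned into a short calculation. Writing the moment of inertia as I = kMR² (k = 1/2 for a solid cylinder, k = 1 for a thin-walled hollow cylinder, and k = 0 for an object that slides without rotating), energy conservation gives Mgh = ½Mv²(1 + k). The sketch below compares the final speeds. The parameter k, the height of 0.50 m and the value g = 9.81 m s⁻² are assumptions for illustration, and rolling without slipping and negligible energy losses are assumed.

```python
import math

G_FIELD = 9.81  # gravitational field strength in N/kg (assumed value)

def final_speed(height, k):
    """Speed after descending `height` when I = k * M * R**2 (k = 0 for sliding).

    From M g h = (1/2) M v**2 + (1/2) I omega**2 with omega = v / R.
    """
    return math.sqrt(2 * G_FIELD * height / (1 + k))

h = 0.50  # assumed drop in height, in metres
v_slide = final_speed(h, 0.0)    # frictionless slide or free fall
v_solid = final_speed(h, 0.5)    # solid cylinder
v_hollow = final_speed(h, 1.0)   # thin-walled hollow cylinder

for name, v in [("sliding", v_slide), ("solid", v_solid), ("hollow", v_hollow)]:
    print(f"{name:8s} v = {v:.2f} m/s")
```

Neither the mass nor the radius appears in the result, so in this model only the way the mass is distributed affects the final speed.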
Summary
■ The circular motion of a rigid solid object under the influence of a single force can be
modelled in terms of linear acceleration and rotational acceleration.
■ The moment of inertia of a body consisting of point particles fixed together is given
by $I = \sum mr^2$.
■ The moment of inertia of a ring, a disk and a rod can be calculated using integration.
■ Angular momentum is defined by the equation
angular momentum = moment of inertia × angular velocity.
■ In a system where no external torque (moment) acts, angular momentum is
conserved.
■ The equations for rotational motion can be remembered by analogy with Newton’s
laws for linear motion, including $E = \frac{1}{2}I\omega^2$, $L = I\omega$ and $\Gamma = I\frac{d\omega}{dt}$.
■ When given the moment of inertia for a rotating object, the equations of rotational
motion and the conservation of angular momentum can be used to perform
kinematic calculations.
S17.1
A car is travelling up one side of a hill and down the other side. The crest of the hill is a circular arc with
a radius of 45.0 m. Determine the maximum speed that the car can have while moving over the crest
without losing contact. [6]
S17.2
Find the moment of inertia of an equilateral triangle consisting of three point masses of mass m joined
by light rods of length L, about the midpoint of one of the sides. [5]
S17.3
Explain why a tightrope walker uses a long pole to maintain their balance as they are walking. [3]
S17.4
A vehicle called a Gyrobus was developed in the 1950s. It used a flywheel to store the energy required to
power the bus: the wheel was spun up at a charging stop before setting off, and was then used to drive a
generator and an electric motor.
a When fully ‘charged’, the flywheel rotates about a vertical axis at 3000 revolutions per minute.
Calculate the angular speed ω of the disc. [2]
b Laws of rotational motion can be deduced by comparison with Newton’s laws of linear motion.
Copy out and complete the table below by stating the equivalent formulae, in words, for rotational
motion. [2]
Linear motion | Rotational motion
work = force × displacement |
momentum = mass × velocity |
c The diagram below shows a flywheel of mass M and thickness t with radius R. The uniform density of the
flywheel is ρ.
(Diagram: a flywheel of radius R and thickness t, rotating with angular velocity ω.)
(i) Use integration to derive an expression for the moment of inertia I of the disc. You may wish to
draw a diagram to illustrate your working. [4]
(ii) The flywheel has a mass of 1500 kg and a moment of inertia of 4.8 × 10² kg m². Calculate the radius
of the flywheel. [2]
(iii) Determine the rotational kinetic energy of the disc, when rotating at 3000 rpm. [3]
d The drivers of the Gyrobus found that it did not handle as expected, particularly when the bus tilted
during a turn (for example on a slightly banked turn). Suggest why they found this. [2]
S18: Gravitation
Learning Outcomes
■ state Kepler’s laws of planetary motion:
■ planets move in elliptical orbits with the Sun at one focus
■ the Sun-planet line sweeps out equal areas in equal times
■ the orbital period squared of a planet is proportional to its mean distance from the
Sun cubed
■ understand energy transfer by analysis of the area under a gravitational force–distance graph
■ calculate escape velocity using the ideas of gravitational potential energy (or area under a
force–distance graph) and energy transfer
S18.1 Kepler’s laws

For thousands of years, the motions of stars and planets have been studied by people who
we would now call scientists. Leading philosophers from ancient Greece (including Eudoxus
and Aristotle) developed a model of the Universe in which the planets were fixed on rotating
spheres centred on the Earth, and then these spheres were in turn surrounded by a ‘sphere of
the fixed stars’. This Earth-centred model is called a ‘geocentric’ theory.
This basic geocentric model predicts that the planets move in circular paths around the
Earth. When seen from the Earth, the planets should always appear to move in the same
direction across the sky. This is not what is observed: the path of the planet, seen from the
Earth, sometimes changes direction and the planet appears to move backwards (retrograde
motion) across the sky, before resuming its original direction of motion. The first Greek
geocentric models of the Universe could not explain this retrograde motion. The model was
modified by two more Greek philosophers, Apollonius and Hipparchus. They introduced the
idea of epicycles: the planet moves on a smaller circle with a centre that orbits the Earth. The
path that the centre of the epicycle took was called the deferent, and the centre of the deferent
was offset from the position of the Earth. These terms are illustrated in Figure S18.1. This new
model could produce retrograde motion.
Another Greek philosopher, Ptolemy, further modified the model to predict the motions
of the planets more accurately. He found that if the deferent rotated about a point other than
its centre (the epicycle containing the planet being at a fixed point on this rotating deferent),
then it matched observations much more accurately. However, he could only do this by
making the model much more complicated than the original geocentric model.
Figure S18.1 The Ancient Greek model of planetary motion. (Labels: planet; epicycle; centre of the epicycle, the point about which the planet rotates; deferent, the circle on which the epicycle moves; centre of the deferent; point about which the deferent rotates; Earth.)
A key principle in science is that of ‘Occam’s Razor’, named after the English monk and
philosopher William of Occam. This states that ‘among competing hypotheses, the one
with the fewest assumptions should be selected’. In other words, if there are two competing
theories that make exactly the same predictions (and match the experimental data), the
simpler one is better. The Polish astronomer and mathematician Nicolaus Copernicus
(1473–1543) developed a different model using another Greek idea from the philosopher
Aristarchus. This model had the Sun at the centre (a heliocentric model), where the planets
move in circular orbits around the Sun, and the Moon orbits the Earth. In this model, the
stars remained on a fixed sphere but at a very great distance from the Sun. This was a much
simpler theory than that of Ptolemy and explained some, but not all, of the measurements of
the motion of the planets.
The Danish astronomer Tycho Brahe (1546–1601) made a large number of extremely
accurate observations of the apparent movements of the planets and stars. Many of these
observations could not be explained by a Copernican model using circular orbits. The
German astronomer Johannes Kepler (1571–1630) inherited Brahe’s data after Brahe’s death.
Kepler accepted Copernicus’ idea of a heliocentric solar system (which was controversial at
the time for philosophical and religious reasons), but he realised that in order to fit the data,
the planets had to move in elliptical orbits. The key point here is that the uncertainties in the
observations were small enough to distinguish between these two similar models, which was
incredible given that they were taken without the aid of a telescope. Kepler developed the
following three laws.
Kepler’s laws of planetary motion

• All the planets move in elliptical orbits with the Sun at one focus of the ellipse.
• A line drawn from the Sun to the planet will sweep out equal areas in equal times as the
planet moves in its orbit (see Figure S18.2a).
• The period of a planet’s orbit squared is proportional to its mean distance from the Sun
cubed: $T^2 \propto r^3$ (see Figure S18.2b).
Figure S18.2 a Kepler’s first and second laws – the planets follow elliptical orbits with the Sun at
one focus (Kepler’s first law), and the line joining the planet to the Sun sweeps out equal areas in
equal times (Kepler’s second law): the planet moves from A to B and from C to D in the same time t,
and area OAB = area OCD, where O is the Sun at the focus. b Kepler’s third law for our solar system:
a graph of period, T / years, against mean distance, r / 10⁹ km, for Mars, Jupiter, Saturn, Uranus,
Neptune and Pluto, with the curves $T \propto r$, $T \propto r^{3/2}$ and $T \propto r^2$ shown for comparison.
Kepler’s laws were empirical – which means that they were developed from observations
without being based on a physical theory. The English scientist Isaac Newton (1642–1726)
proposed just such a theory, which suggested (as we have seen) that the force of gravity
between two objects is inversely proportional to the square of the distance between them.
Newton showed that this ‘universal theory of gravitation’ could be used to explain all of
Kepler’s laws. We have already used Newton’s theory to derive Kepler’s third law for the case
of a circular orbit in Chapter 18 of the Coursebook. Kepler’s third law can also be derived for
a more general, elliptical orbit, but that is beyond the scope of this course.
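Kepler’s third law is easy to test against data: if T² ∝ r³, then T²/r³ should have the same value for every planet. The sketch below does this using approximate mean distances and periods. The values are rounded figures supplied here as assumptions rather than data from this book, and distances are given in astronomical units (the mean Earth–Sun distance) instead of km so that the ratio for the Earth is exactly 1.

```python
# Approximate (mean distance from the Sun in AU, orbital period in years).
# These rounded values are assumptions for illustration.
planets = {
    "Earth": (1.000, 1.000),
    "Mars": (1.524, 1.881),
    "Jupiter": (5.203, 11.86),
    "Saturn": (9.537, 29.46),
    "Uranus": (19.19, 84.01),
    "Neptune": (30.07, 164.8),
}

# If T^2 is proportional to r^3, this ratio is the same for every planet.
ratios = {name: T**2 / r**3 for name, (r, T) in planets.items()}

for name, ratio in ratios.items():
    print(f"{name:8s} T^2 / r^3 = {ratio:.3f}")
```

The ratios agree to within about 1%, even though the periods span more than two orders of magnitude.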
question
18.1 The Earth’s orbit is not very elliptical – the Earth’s closest approach to the Sun is
1.47 × 10⁸ km and its greatest distance from the Sun is 1.52 × 10⁸ km.
a Draw a sketch of the orbit and indicate the points of closest approach (A) and
greatest distance to the Sun (B). Exaggerate your sketch so that the ellipticity is
apparent.
b By considering the time taken to sweep out a small area ∆A, use Kepler’s second law
to estimate the ratio of the Earth’s orbital speeds at points A and B.
c Repeat the calculation in part b for Pluto, where the distance of closest approach to
the Sun is 4.44 × 10⁹ km and the greatest distance from the Sun is 7.38 × 10⁹ km.
S18.2 Gravity is always attractive

The gravitational and electrostatic forces both follow an inverse square law. The gravitational
force between two masses is always attractive, but the electrostatic force between two like
charges is repulsive. You will sometimes see the equation for force from Newton’s law of
gravitation written with a minus sign to take account of this. Newton’s law of gravitation
is really a relationship between vector quantities, i.e. quantities with both magnitude and
direction. The force on mass 2 due to the presence of mass 1 is directed from mass 2 to mass
1, that is to say in the opposite direction to the displacement of mass 2 from mass 1. The
difference between the direction of the force and the displacement is indicated by the minus
sign in the equation.
S18.3 Potential energy and gravitational force–distance graphs
Remember that we defined gravitational potential at a point as the work done per unit mass
in bringing a mass from infinity to the point. Since the gravitational force is always attractive,
in the opposite direction to the displacement from the object with mass M, the expression for
gravitational potential contains a minus sign:
$$\phi = -\frac{GM}{r}$$
The minus sign means that even though the magnitude of the potential decreases as you
move the test mass away from the mass M, the change in potential as you move the test mass
away is positive (i.e. work is done to separate the masses).
Two objects have gravitational potential energy because they are each within the other
object’s gravitational field. We define the objects to have zero potential energy when they
are infinitely far apart. Using the expression for the gravitational potential given above, we
can calculate the potential energy of one object within the gravitational field of another. For
example, if we know the mass of a satellite orbiting a planet, and the gravitational potential
of the planet at the position of the satellite, we multiply the potential by the mass of the
satellite to get the gravitational potential energy E. This quantity is the equivalent of the work
done in bringing the satellite from infinity to that point within the planet’s gravitational
field. For two objects of mass m1 and m2, the gravitational potential energy is given by the
equation below. The GPE is negative because the force is attractive:
$$E = -\frac{Gm_1m_2}{r}$$
Another way of deriving this result is from Newton’s law of gravitation. We can do this in
two ways – graphically, or by integration. Figure S18.3 shows a force–distance graph for a
mass in a gravitational field.
Figure S18.3 Force–distance graph for a mass in a gravitational field. The force has a minus sign
because it is in the opposite direction to the displacement r of the mass.
The shaded area on the graph represents the change in gravitational potential energy as the
mass is moved from one position to another. Remember that if we are moving the masses
together, the change in potential energy will be negative, and if we are moving them apart, it
will be positive. It is always worth double-checking whether you have this the right way round!
Let’s try doing this by integration. We are going to bring mass m2 into the gravitational
field of mass m1, and see how much work is done. This will be the gravitational potential
energy that these masses have in that particular configuration (compared to when they are
infinitely far apart). Since the force changes as the mass is moved, we must move the mass a
small increment dx and multiply by the force at that radius, and then add up contributions
from the range of radii we are interested in. The work done in moving the mass by dx (we
will take dx as being positive moving away from mass m1) is:
$$dW = \frac{Gm_1m_2}{x^2}\,dx$$
Let us double check the signs: we are moving the masses away from each other as we increase
x, so because gravity is attractive we expect to have to do work to do this, so we expect dW to
be positive, as it is.
Now, to get the potential energy in moving the mass from infinity to r, we need to
integrate between limits. Notice that infinity is the lower limit as we are starting there.
$$E_{\text{grav}} = \int_{\infty}^{r} dW = \int_{\infty}^{r} \frac{G m_1 m_2}{x^2}\,dx = \left[-\frac{G m_1 m_2}{x}\right]_{\infty}^{r} = -\frac{G m_1 m_2}{r}$$
Now let us check that this still makes sense. If we’re bringing the object in from infinity to
a point in the field, then because the field is attractive we expect the potential energy to be
negative – giving us the minus sign that we indeed have!
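The integration above can also be checked numerically. The following Python sketch adds up the contributions dW = (Gm1m2/x²) dx step by step from a very large distance in to r, and compares the total with −Gm1m2/r. The function names, the value used for G, the 1000 kg satellite mass and the choice of 10¹² m as a stand-in for infinity are all assumptions made for illustration.

```python
import math

G = 6.67e-11  # gravitational constant in N m^2 kg^-2 (assumed value)

def gpe_analytic(m1, m2, r):
    """Gravitational potential energy E = -G m1 m2 / r (zero at infinity)."""
    return -G * m1 * m2 / r

def gpe_numeric(m1, m2, r, r_far=1e12, steps=10_000):
    """Add up dW = (G m1 m2 / x^2) dx from r_far inwards to r.

    r_far stands in for infinity. The steps shrink in proportion to x, so
    the small radii, where the force changes fastest, are well resolved.
    """
    ratio = (r / r_far) ** (1.0 / steps)  # less than 1, so x decreases
    total = 0.0
    x = r_far
    for _ in range(steps):
        x_next = x * ratio
        x_mid = math.sqrt(x * x_next)      # geometric midpoint of the step
        dx = x_next - x                    # negative: moving inwards
        total += G * m1 * m2 / x_mid**2 * dx
        x = x_next
    return total

# Hypothetical 1000 kg satellite at the Earth's surface (values from question 18.2)
m_earth, m_sat, r_surface = 6.0e24, 1000.0, 6.4e6
print(gpe_numeric(m_earth, m_sat, r_surface))
print(gpe_analytic(m_earth, m_sat, r_surface))
```

The two printed values agree closely. The small remaining difference arises because the sum starts at a large but finite distance rather than at infinity.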
question
18.2 The Earth has a radius of 6400 km, and a mass of 6.0 × 10²⁴ kg.
a Calculate the change in gravitational potential in moving from the surface of the
Earth, at a distance of 6400 km from the centre of the Earth, to the orbit of the
International Space Station (ISS), at 410 km above the Earth. Explain the sign of the
change in gravitational potential that you calculated.
b An astronaut of mass 75 kg travels to the ISS. What is the change in her potential
energy between the start and end of the journey?
S18.4 Escape velocity

We can think of the gravitational field around a planet as a ‘well’ (look again at the graph
of gravitational potential in Figure S18.3). To escape from the gravitational field, an object
needs to ‘climb out’ of this well. The escape velocity is the velocity the object needs to achieve
to escape from the gravitational well without any further acceleration. This is the velocity
at which the object’s kinetic energy is equal to the magnitude of the gravitational potential
energy.
Using the expression we just derived for gravitational potential energy and the formula for
kinetic energy, we can write down that at escape velocity ve, the following relationship is true
for a body of mass m escaping from the gravitational field of a body of mass M:
$$\frac{GMm}{r} = \frac{1}{2} m v_e^2$$
Rearranging gives us the escape velocity:

$$v_e = \sqrt{\frac{2GM}{r}}$$
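As a rough illustration of how this formula is used, the Python sketch below evaluates the escape velocity for the rounded Earth values quoted in question 18.2, and confirms that the kinetic energy at that speed matches the magnitude of the gravitational potential energy. The function name, the value of G and the 1 kg test mass are assumptions for the example.

```python
import math

G = 6.67e-11  # gravitational constant in N m^2 kg^-2 (assumed value)

def escape_velocity(M, r):
    """Escape velocity v_e = sqrt(2GM/r) from distance r of a body of mass M."""
    return math.sqrt(2 * G * M / r)

M_earth = 6.0e24   # kg, rounded value from question 18.2
r_earth = 6.4e6    # m, rounded value from question 18.2

v_e = escape_velocity(M_earth, r_earth)
print(f"{v_e / 1000:.1f} km/s")   # prints 11.2 km/s

# Energy check for a 1 kg test mass: kinetic energy equals |GPE|
m = 1.0
kinetic = 0.5 * m * v_e**2
potential_magnitude = G * M_earth * m / r_earth
print(math.isclose(kinetic, potential_magnitude))
```

Note that the mass m of the escaping body cancels, so the escape velocity depends only on M and r.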
questions
18.3 Calculate the escape velocity at the surface of each of these objects:
The Earth (mass 5.97 × 10²⁴ kg, radius 6370 km)
The Moon (mass 7.35 × 10²² kg, radius 1740 km)
The Sun (mass 1.99 × 10³⁰ kg, radius 6.96 × 10⁵ km)
18.4 A star three times the mass of our Sun can collapse to form a black hole after all the
resources it needs for nuclear fusion to occur have been used up. A black hole is a
region of space where the escape velocity from the gravitational field is greater than
the speed of light. Calculate the radius of an object with three times the mass of the
Sun, where the escape velocity at the surface would be the speed of light (this radius
is known as the Schwarzschild radius).
Summary
■ Kepler’s first law of planetary motion: all the planets move in elliptical orbits with the
Sun at one focus of the ellipse.
■ Kepler’s second law of planetary motion: a line drawn from the Sun to the planet will
sweep out equal areas in equal times as the planet moves in its orbit.
■ Kepler’s third law of planetary motion: the period of a planet’s orbit squared is
proportional to its mean distance from the Sun cubed: T² ∝ r³.
■ The area under a gravitational force–distance graph provides a way to analyse
changes in gravitational potential energy of a mass in a gravitational field.
■ Escape velocity can be determined by calculating the energy required to take a mass
from its initial position in the gravitational field to infinity (by using the expression
for gravitational potential or the area under a force-distance graph). The kinetic
energy that the body has at escape velocity is equal to the potential energy it gains
when it is taken out of the gravitational potential well.
■ The velocity required to escape from the gravitational field of a body of mass M is
given by $v_e = \sqrt{\frac{2GM}{r}}$.
S19: Oscillations
Learning Outcomes
■ show that the condition for simple harmonic motion leads to a differential equation of the
form $\frac{d^2x}{dt^2} = -\omega^2 x$ and that $x = A\cos\omega t$ is a solution to this equation
■ use differential calculus to derive the expressions $v = -A\omega\sin\omega t$ and $a = -A\omega^2\cos\omega t$ for
simple harmonic motion
■ recognise and use the expressions $x = A\cos\omega t$, $v = -A\omega\sin\omega t$, $a = -A\omega^2\cos\omega t$ and
$F = -m\omega^2 x$ to solve problems
■ understand the phase differences between displacement, velocity and acceleration in simple
harmonic motion
■ show that the total energy of an undamped simple harmonic system is given by
$E = \frac{1}{2}mA^2\omega^2$ and recognise that this is a constant
■ recognise and use $E = \frac{1}{2}mA^2\omega^2$ to solve problems
S19.1 A more mathematical approach to s.h.m.
In this section, we are going to work from what we already know about the conditions for
simple harmonic motion, and derive the differential equation that governs it. We can then
show that the solutions to this equation are the sinusoidal oscillations that we have come to
expect for simple harmonic oscillations.
Remember that to have s.h.m. we require a restoring force which is directly proportional
to the displacement from the equilibrium position and acts in the opposite direction to
the displacement (towards the equilibrium point). In a mechanical system we will have
an oscillating mass; if you study physics further you will come across many other examples
where a system can be modelled as a simple harmonic oscillator (or where this model is a
good approximation).
Consider a mass hanging from a spring, as shown in Figure S19.1.
[Figure: a spring shown in equilibrium, extended by x0 beyond its original length so that the force from the spring balances the weight of the mass m, and displaced a further distance x from equilibrium]
Figure S19.1 Mass m suspended from a spring with spring constant k. Displacing the mass from
its equilibrium position results in simple harmonic motion.
Once this system is set up, the mass will rest in equilibrium with the spring extended by an
extension x0. Hooke’s law tells us that if the spring is extended by a distance x0, the restoring
force exerted by the spring is given by F = kx0. In equilibrium, this is balanced by the weight
of the mass, mg. So we can calculate the equilibrium position as:
$$kx_0 = mg \;\Rightarrow\; x_0 = \frac{mg}{k}$$
If we displace the mass by a distance x downwards from its equilibrium position, the
restoring force from the spring increases to k(x + x0). Remember that in equilibrium, the
restoring force was balanced by the weight, and there was no net force on the mass. Therefore
we know that the unbalanced restoring force is, in fact:
F = − kx
We include the negative sign because the force is in the opposite direction to the
displacement.
Since we know the unbalanced force on the mass, by using F = ma we can calculate the
acceleration. The equation of motion for the mass is therefore:
ma = − kx
Remember, however, that acceleration is the time derivative (rate of change) of velocity,
and velocity is the time derivative of displacement. We say that acceleration is the second
derivative of displacement with respect to time (we differentiate twice). So in fact,
$$a = \frac{d^2x}{dt^2}$$
and we can express the equation of motion as
$$m\frac{d^2x}{dt^2} = -kx \;\Rightarrow\; \frac{d^2x}{dt^2} = -\frac{k}{m}x$$
This is a differential equation, and we can solve it for x to determine how the displacement
of the mass changes with time. Since it is a second-order differential equation (it contains
a second derivative), to solve it we must integrate twice. This means that our solution will
contain two arbitrary constants. This makes sense, because we know that the motion will
depend on the initial position (first constant) and velocity (second constant) of the mass.
In other systems undergoing s.h.m., we may end up with an equation that has a different
coefficient for the term in x. The general form of the simple harmonic motion equation is:
$$\frac{d^2x}{dt^2} = -\omega^2 x$$
It has the general solution:
$$x = \alpha\cos\omega t + \beta\sin\omega t$$
where α and β are constants that depend on the initial conditions (position and velocity at
time t = 0). ω is the angular frequency of the oscillation: ω = 2π f . If we compare this general
form of the s.h.m. equation to the equation we derived for the mass on a spring, we can see
that for this system, the angular frequency of oscillation is
$$\omega = \sqrt{\frac{k}{m}}$$
and therefore the frequency of oscillation is
$$f = \frac{1}{2\pi}\sqrt{\frac{k}{m}}$$
When we use these equations for the mass on a spring, in order to get ω in the correct units
of rad s−1, we must express the stiffness k in N m−1.
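The relationships above can be sketched in a few lines of Python. The example assumes a hypothetical 0.20 kg mass on a spring of stiffness 50 N m−1. It also assumes the identification α = x(0) and β = v(0)/ω, which follows from setting t = 0 in the general solution and in its derivative but is not spelled out in the text. All function and variable names are illustrative.

```python
import math

def angular_frequency(k, m):
    """omega = sqrt(k/m), with k in N m^-1 and m in kg, giving rad s^-1."""
    return math.sqrt(k / m)

def general_solution(k, m, x_start, v_start):
    """Return omega, alpha, beta and the function x(t) = alpha cos(wt) + beta sin(wt).

    Assumes alpha = x(0) and beta = v(0) / omega.
    """
    omega = angular_frequency(k, m)
    alpha = x_start
    beta = v_start / omega

    def x(t):
        return alpha * math.cos(omega * t) + beta * math.sin(omega * t)

    return omega, alpha, beta, x

# Hypothetical values: released from rest, 3.0 cm from equilibrium
omega, alpha, beta, x = general_solution(k=50.0, m=0.20, x_start=0.03, v_start=0.0)
f = omega / (2 * math.pi)

print(f"omega = {omega:.2f} rad/s, f = {f:.2f} Hz")
print(f"x after one period = {x(1 / f):.4f} m")
```

Because the mass is released from rest, β = 0 and the solution reduces to a pure cosine with amplitude equal to the starting displacement.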
In the case where the oscillator starts at its maximum displacement (as is often the case),
the solution can be written as:
x = A cos(ω t )
Here, A is the amplitude of the oscillations and ω is the angular frequency discussed above.
In order to show that this is the correct solution to the s.h.m. equation, we need to
differentiate it twice, since the second derivative of x appears in the differential equation. As
we are doing this, we will also produce equations for the velocity and the acceleration.
If we differentiate the equation for x with respect to t we get the equation for the velocity
of the simple harmonic oscillator at time t.
$$\frac{dx}{dt} = v = -A\omega\sin(\omega t)$$
In deriving this equation, we have used the mathematical technique called the chain rule and
the standard result for the derivative of the cosine function. We can then differentiate this
velocity equation again to get an equation for the acceleration of the oscillator at time t.
$$\frac{d^2x}{dt^2} = a = -A\omega^2\cos(\omega t) = -\omega^2 x$$
Since the acceleration is −ω²x, this is clearly the correct solution for our original differential
equation.
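The same check can be made numerically, without using any calculus rules. The Python sketch below estimates the second derivative of x = A cos(ωt) from three closely spaced values of x and compares it with −ω²x at a few sample times. The amplitude, angular frequency, step size and sample times are arbitrary values chosen for illustration.

```python
import math

A = 0.05      # amplitude in m (hypothetical)
omega = 4.0   # angular frequency in rad s^-1 (hypothetical)

def x(t):
    """Proposed solution x = A cos(omega t)."""
    return A * math.cos(omega * t)

def second_derivative(func, t, h=1e-4):
    """Central-difference estimate of the second derivative of func at time t."""
    return (func(t + h) - 2 * func(t) + func(t - h)) / h**2

for t in (0.0, 0.3, 1.0, 2.5):
    lhs = second_derivative(x, t)   # numerical estimate of d2x/dt2
    rhs = -omega**2 * x(t)          # value required by the s.h.m. equation
    print(f"t = {t:.1f} s: d2x/dt2 = {lhs:.5f}, -omega^2 x = {rhs:.5f}")
```

At each sample time the two columns agree to within the small error of the numerical estimate.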
Figure S19.2 shows sketch graphs of the displacement, velocity and acceleration for a
simple harmonic oscillator. We can use the following trigonometric identity
$$\cos(A + B) = \cos A\cos B - \sin A\sin B$$
to show that
$$\cos\left(\theta + \frac{\pi}{2}\right) = \cos\theta\cos\frac{\pi}{2} - \sin\theta\sin\frac{\pi}{2} = -\sin\theta$$
So looking at our expressions for v and x, we can say that the phase of v leads x by π/2 radians
(90°). This means that we obtain the graph of v by shifting the graph of x by π/2 radians along
the axis in the negative direction.
Similarly, a leads v by π/2 radians, and a and x are π radians (180°) out of phase.
[Figure: three graphs plotted against ωt from 0 to 7π/2: displacement x (between −A and A), velocity v (between −Aω and Aω) and acceleration a (between −Aω² and Aω²)]
Figure S19.2 The relationship between displacement, velocity and acceleration
for a simple harmonic oscillator.
question
19.1 Show that x = α cosω t + β sin ω t is also a solution to the s.h.m. equation.
Worked example S19.1
A 500 g mass is hung from a spring with spring constant 0.1 N cm−1. Assume the acceleration
due to gravity, g, is 10 m s−2.
a Calculate the extension of the spring when it is at equilibrium.
b The mass is displaced to 5.0 cm below its equilibrium position and released at time t = 0 s. In
the motion that follows, if the displacement below the equilibrium position is x, determine
the equation that describes the motion.
c Calculate the speed of the mass as it passes through the equilibrium position.
d Calculate the magnitude of the maximum acceleration experienced by the mass.
a At the equilibrium extension x0, the restoring force balances the weight:
kx 0 = mg
Therefore
$$x_0 = \frac{mg}{k} = \frac{0.5\ \text{kg} \times 10\ \text{N kg}^{-1}}{0.1\ \text{N cm}^{-1}} = 50\ \text{cm}$$
b Start from the solution to the s.h.m. equation:
$$x = A\cos(\omega t)$$
When t = 0, x = 5.0 cm, so A = 5.0 cm.
We can either derive the differential equation and compare it to the standard form to
work out ω, or remember that for a mass m on a spring of stiffness k,
$$\omega = \sqrt{\frac{k}{m}} = \sqrt{\frac{10\ \text{N m}^{-1}}{0.5\ \text{kg}}} = 4.47\ \text{rad s}^{-1} \approx 4.5\ \text{rad s}^{-1}$$
Remember that to get ω in rad s−1, we need to put m and k into SI units: m in
kg and k in N m−1. Note that we have to do this even though we are measuring A and x in
centimetres.
Putting all of this together, the equation describing the motion is
$$x = 5.0\cos(4.5t)$$
where x is in cm.
c The mass reaches its maximum speed at the moment it passes through the equilibrium
position. If we look at the equation for the velocity, we can see that the maximum possible
value is
$$v = \omega A = 4.47\ \text{rad s}^{-1} \times 5.0\ \text{cm} = 22\ \text{cm s}^{-1}$$
d The magnitude of the maximum acceleration is
$$a = \omega^2 A = (4.47\ \text{rad s}^{-1})^2 \times 5.0\ \text{cm} = 100\ \text{cm s}^{-2}$$
question
19.2 Write down the equation describing the motion in the following cases:
a An oscillator which starts from a maximum displacement of 0.2 m and has a
frequency of 10.0 Hz.
S19.2 The simple pendulum

A simple pendulum consists of a point mass suspended from a light, inextensible string. This
set-up is shown in Figure S19.3.
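[Figure: a pendulum of length L at angle θ to the vertical, with horizontal displacement L sin θ and displacement x; the mass m is acted on by the string tension FT and its weight mg, which is resolved into components mg cos θ and mg sin θ]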
Figure S19.3 A free-body force diagram of a simple pendulum. The dotted lines represent the
components of the weight resolved in directions parallel and perpendicular to the string.
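Applying the angular form (τ = Iα) of Newton’s second law to the pendulum, and noting that
the torque due to the weight always acts to reduce θ, we get:
$$-Lmg\sin\theta = mL^2\frac{d^2\theta}{dt^2}$$
Rearranging and cancelling, we can write this as:
$$\frac{d^2\theta}{dt^2} + \frac{g}{L}\sin\theta = 0$$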
This is the equation of motion for the pendulum. Notice that this equation is non-linear
(because of the sine term) and does not represent s.h.m.
However, for small angles θ (say, less than 10°), we can use the approximation sinθ ≈ θ ,
and then the equation becomes:
$$\frac{d^2\theta}{dt^2} + \frac{g}{L}\theta = 0$$
This is now the s.h.m. equation, with angular frequency $\omega = \sqrt{\frac{g}{L}}$.
Note that we could also express the equation in terms of the arc length, by using x = Lθ :
$$\frac{1}{L}\frac{d^2x}{dt^2} + \frac{g}{L^2}x = 0$$
which simplifies to the s.h.m. equation in x, with the same angular frequency:
$$\frac{d^2x}{dt^2} + \frac{g}{L}x = 0$$
If we wanted to determine how good an approximation s.h.m. is to the motion of a real
pendulum, we could make a computer model of the original equation and examine how
different it is to s.h.m. for a range of given swing angles.
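One possible version of such a computer model is sketched below in Python. It steps the full equation d²θ/dt² = −(g/L) sin θ forward in time using the fourth-order Runge–Kutta method (a standard numerical technique, chosen here as an assumption; any reliable integrator would do), times a quarter of a swing, and compares the resulting period with the s.h.m. value 2π√(L/g). The 1.0 m length, g = 9.81 m s−2, the time step and the release angles are illustrative choices.

```python
import math

def shm_period(L=1.0, g=9.81):
    """Period predicted by the small-angle (s.h.m.) approximation."""
    return 2 * math.pi * math.sqrt(L / g)

def pendulum_period(theta0, L=1.0, g=9.81, dt=1e-4):
    """Period of the full equation, released from rest at angle theta0 > 0 (radians)."""
    def accel(theta):
        return -(g / L) * math.sin(theta)

    theta, omega, t = theta0, 0.0, 0.0
    while True:
        # One Runge-Kutta (RK4) step for the pair (theta, omega)
        k1t, k1w = omega, accel(theta)
        k2t, k2w = omega + 0.5 * dt * k1w, accel(theta + 0.5 * dt * k1t)
        k3t, k3w = omega + 0.5 * dt * k2w, accel(theta + 0.5 * dt * k2t)
        k4t, k4w = omega + dt * k3w, accel(theta + dt * k3t)
        new_theta = theta + dt * (k1t + 2 * k2t + 2 * k3t + k4t) / 6
        new_omega = omega + dt * (k1w + 2 * k2w + 2 * k3w + k4w) / 6
        if new_theta <= 0.0:
            # The bob has reached the vertical: a quarter period has elapsed.
            # Interpolate linearly to find the crossing time within this step.
            frac = theta / (theta - new_theta)
            return 4 * (t + frac * dt)
        theta, omega, t = new_theta, new_omega, t + dt

for degrees in (5, 10, 30, 60):
    ratio = pendulum_period(math.radians(degrees)) / shm_period()
    print(f"release angle {degrees:2d} deg: period / s.h.m. period = {ratio:.4f}")
```

In this model the period for a 10° release is within about 0.2% of the s.h.m. value, whereas for a 60° release it is roughly 7% longer.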
question
19.5 Determine the length of a pendulum that completes one oscillation per second,
when displaced by a small angle.
S19.3 Energy of an undamped simple harmonic oscillator
We consider a horizontal spring that may be compressed or stretched from its natural
length and obeys Hooke’s law with stiffness k (see Figure S19.4). The s.h.m. equation takes
the same form as before, but by making the spring horizontal we do not need to include the
gravitational potential energy when we are considering the potential energy of the system.
Figure S19.4 A horizontal spring.
Consider the work done in stretching or compressing the spring. Work is done against
the restoring force F = kx. Since the force changes depending on the extension, we cannot
just substitute this simple equation for force into W = Fd. There are two possible ways to
proceed. One is to plot a graph of F against x: the area under the graph is the work done.
By considering the graph in Figure S19.5, we can see that the work done in stretching or
compressing the spring by a distance x is
$$W = E_p = \frac{1}{2}kx^2$$
This energy is stored as potential energy in the spring (assuming the spring is ‘ideal’, meaning
that it does not heat up when stretched). We can also obtain this result by integration. If the
spring is stretched by a small increment dx, then a small amount of work, dW, is done:
dW = Fdx = kx dx
Integrating this with respect to x gives us the same equation as we found from plotting the
graph and taking the area under it.
[Graph: force plotted against extension x between −x0 and +x0; a straight line through the origin with gradient −k]
Figure S19.5 A graph of force vs. extension for a spring.
The system also has kinetic energy due to the motion of the mass:
$$E_k = \frac{1}{2}mv^2$$
The total energy of the oscillator is the sum of the kinetic and potential energies:
$$E = E_p + E_k = \frac{1}{2}kx^2 + \frac{1}{2}mv^2$$
However, we already have expressions for x and v for a simple harmonic oscillator:
x = A cos(ω t + δ )
v = − Aω sin(ω t + δ )
Substituting these expressions into the energy equation, we get
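$$E = \frac{1}{2}kA^2\cos^2(\omega t + \delta) + \frac{1}{2}mA^2\omega^2\sin^2(\omega t + \delta)$$
and using $\omega^2 = \frac{k}{m}$, this becomes
$$E = \frac{1}{2}mA^2\omega^2\cos^2(\omega t + \delta) + \frac{1}{2}mA^2\omega^2\sin^2(\omega t + \delta)$$
Since $\cos^2\theta + \sin^2\theta = 1$, this simplifies to
$$E = \frac{1}{2}mA^2\omega^2$$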
This total energy is constant at all times during the oscillations (for undamped oscillations).
Over the course of one oscillation, the energy is transferred from kinetic to potential and
back. All of the energy is in the form of kinetic energy at the point when the mass passes
through the equilibrium point, and all of the energy is in the form of potential energy when
the mass is at its maximum displacement from the equilibrium point. Figures 19.22 and
19.23 in the Coursebook illustrate this graphically.
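The constancy of the total energy can be illustrated numerically. The Python sketch below evaluates the potential and kinetic energies at several times during the motion and compares their sum with ½mA²ω². The mass, stiffness, amplitude and phase are hypothetical values chosen for the example.

```python
import math

m = 0.50      # mass in kg (hypothetical)
k = 10.0      # spring stiffness in N m^-1 (hypothetical)
A = 0.05      # amplitude in m (hypothetical)
delta = 0.3   # phase constant in rad (hypothetical)
omega = math.sqrt(k / m)

def energies(t):
    """Return (potential energy, kinetic energy) in joules at time t."""
    x = A * math.cos(omega * t + delta)
    v = -A * omega * math.sin(omega * t + delta)
    return 0.5 * k * x**2, 0.5 * m * v**2

E_total = 0.5 * m * A**2 * omega**2   # predicted constant total energy

for i in range(6):
    t = 0.25 * i
    Ep, Ek = energies(t)
    print(f"t = {t:.2f} s: Ep = {Ep:.5f} J, Ek = {Ek:.5f} J, sum = {Ep + Ek:.5f} J")
print(f"(1/2) m A^2 omega^2 = {E_total:.5f} J")
```

The individual energies change from line to line, but their sum stays equal to the predicted total.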
Although we have derived this result for the case of a mass on a spring, it is in fact a
general result for mechanical simple harmonic oscillators. Certain problems are more easily
solved by first considering the energy of the system, so this equation is a useful problem-
solving tool – see the Worked example.
When a 100 g mass is placed on the pan of a spring balance, the scale reads 100 g and the pan
is displaced downwards by 0.5 cm. The 100 g mass is removed, and then dropped onto the
spring balance from a height of 2 cm above the pan. What is the maximum reading observed
on the scale during the resulting oscillations? Assume that the scale reading and the pan’s
displacement are linearly related, and use g = 10 N kg−1. Also assume that the pan’s mass is
negligible compared to the mass that is dropped into it.
Step 1 Calculate the spring constant for the balance. A force of 1.0 N gives a compression of
0.5 cm, so
$$k = \frac{F}{x} = \frac{1.0\ \text{N}}{0.005\ \text{m}} = 200\ \text{N m}^{-1}$$
Step 2 Calculate the angular frequency of oscillations for the 100 g mass on the balance.
$$\omega = \sqrt{\frac{k}{m}} = \sqrt{\frac{200\ \text{N m}^{-1}}{0.1\ \text{kg}}} = 45\ \text{rad s}^{-1}$$

Step 3 Calculate the total energy of the oscillations. Since the mass of the pan is much less
than the mass that is landing in the pan, we do not need to include the effects of the
collision and can assume that the mass retains all its kinetic energy. (Note that if the
mass of the pan was significant compared to the dropped mass, we would have to
analyse this as an inelastic collision.) So, the total energy is equal to the potential
energy that the mass had at the start of the drop
$$E = mgh = 0.1\ \text{kg} \times 10\ \text{N kg}^{-1} \times 0.02\ \text{m} = 0.02\ \text{J}$$

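Step 4 Use the formula for the total energy of a simple harmonic oscillator to work out the
amplitude of the oscillations. Rearranging the formula, we get
$$A = \sqrt{\frac{2E}{m\omega^2}} = \sqrt{\frac{2 \times 0.02\ \text{J}}{0.1\ \text{kg} \times (45\ \text{rad s}^{-1})^2}} = 0.014\ \text{m}$$
This is the amplitude of the oscillations about the equilibrium position, where the pan is
already displaced by 0.5 cm and the scale reads 100 g. The maximum displacement of the pan
is therefore about 0.5 cm + 1.4 cm = 1.9 cm, which corresponds to a maximum reading on the
scale of about 380 g. (Step 3 ignores the small amount of potential energy the oscillator has
when the mass lands 0.5 cm above the equilibrium position; including it raises the amplitude
to 1.5 cm and the maximum reading to 400 g.)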
Summary
■ The condition for simple harmonic motion leads to a differential equation of the
form $\frac{d^2x}{dt^2} = -\omega^2 x$.
■ $x = A\cos\omega t$ is a solution to this equation.
■ The expressions for velocity, $v = -A\omega\sin\omega t$, and acceleration, $a = -A\omega^2\cos\omega t$, can
be derived by differentiating the solution to the s.h.m. equation.
■ In simple harmonic motion, the restoring force is $F = -m\omega^2 x$.
■ Phase differences arise between displacement, velocity and acceleration; these arise
naturally from the solutions to the differential equation.
■ The total energy of an undamped simple harmonic system is constant and is given by
$E = \frac{1}{2}mA^2\omega^2$.
S22: Ideal gases
Learning Outcomes
■ explain how empirical evidence leads to the gas laws and to the idea of an absolute scale of
temperature
■ understand that a model will begin to break down when the assumptions on which it is based
are no longer valid, and explain why this applies to kinetic theory at very high pressures or
very high or very low temperatures
■ recall and use the first law of thermodynamics expressed in terms of the change in internal
energy, the heating of the system and the work done on the system
■ recognise and use W = pΔV for the work done on or by a gas
■ understand qualitatively how the random distribution of energies leads to the Boltzmann
factor $e^{-E/kT}$ as a measure of the chance of a high energy
■ apply the Boltzmann factor to activation processes including rate of reaction, current in a
semiconductor and creep in a polymer
■ describe entropy qualitatively in terms of the dispersal of energy or particles and realise
that entropy is related to the number of ways in which a particular macroscopic state can be
realised
■ recall that the second law of thermodynamics states that the entropy of an isolated system
cannot decrease and appreciate that this is related to probability
■ understand that the second law provides a thermodynamic arrow of time that distinguishes
the future (higher entropy) from the past (lower entropy)
■ understand that systems in which entropy decreases (e.g. humans) are not isolated and that
when their interactions with the environment are taken into account their net effect is to
increase the entropy of the Universe
■ understand that the second law implies that the Universe started in a state of low entropy
and that some physicists think that this implies it was in a state of extremely low probability.
S22.1 Investigating the gas laws

Investigating the different gas laws experimentally is straightforward, but an interesting
challenge because two of the four variables (mass, temperature, pressure and volume) have to
be kept constant.
Investigating Boyle’s law

Boyle’s law relates the pressure and volume of a gas. It can be investigated by attaching
a digital pressure meter to a syringe that initially holds a certain volume of air. Pressing
the syringe plunger down reduces the volume of the gas and raises the pressure. If the
compression of the gas is carried out sufficiently slowly, then the air in the syringe remains at
room temperature and no gas leaks out (meaning that the mass is also kept constant). Data for
pressure (from the meter) and volume (from the markings on the syringe) can be recorded.
Using this method, the pressure can be more than doubled in value. At higher pressures it is
harder to compress the air sufficiently and the seals on the syringe are likely to leak.
Investigating Charles’s law

A similar arrangement can be used to investigate Charles’s law, again using a syringe.
The temperature can be varied using a water bath in which the syringe is fully immersed,
so that all of the gas is at the same temperature. A thermometer in the water records the
temperature, which can be varied from 0 °C (273 K) to 100 °C (373 K) by using an ice-water
mix initially and then a heater or a Bunsen burner. Again the volume markings on the
syringe are useful.
Investigating temperature and pressure

The relationship between temperature and pressure at fixed volume is slightly more
complicated to observe. A pressure sensor is attached to a rigid container full of air, such as
a round-bottomed flask, by using a tube to the pressure sensor connected through a tightly-
fitted rubber bung with a hole in it for the tube. The flask can then be placed in water baths
at different temperatures, for example ice-water, warm water and boiling water, and the
pressure recorded. Again no gas can escape, so the mass of gas is fixed. The container is rigid
so the volume is also fixed. It is very important to leave the flask in the water for long enough
that all the gas reaches the same temperature, as this can be a slow process.
S22.2 Non-ideal gases

When a gas is at either a very low or a very high temperature or pressure, there are two
reasons its behaviour may deviate from that expected of an ideal gas. To derive the kinetic
theory model, we assumed that the gas molecules themselves occupy no volume. This is a
reasonable assumption at low pressure, because the space between the molecules is very
much larger than the size of the molecules themselves. At room temperature and pressure,
every molecule has empty space around it in which about 1000 more molecules could fit. At
very high pressures, the actual volume of the molecules is significant compared to the space
between them and so the model begins to break down. We also assumed that the molecules
do not interact with each other except when they collide – meaning that there are no
intermolecular forces (forces between molecules). At high pressures and low temperatures, the
molecules are sufficiently close together that intermolecular forces have an effect – and at
sufficiently low temperatures or high pressures it is these forces which are responsible for the
gas condensing to form a liquid.
At very high temperatures another assumption breaks down. The kinetic theory model
assumes that collisions between molecules are elastic – in other words, that the kinetic
energy of the molecules cannot be transferred into other forms. No energy is ‘lost as heat’
because heat is the random kinetic energy of the molecules. However, under the right
conditions kinetic energy can be transferred into other forms. For example, in a sufficiently
energetic collision, electrons within the atoms can be excited and even the bonds holding
the atoms together in a molecule can be broken. Again, in these circumstances one of the
assumptions of the kinetic model is no longer true and so the predicted relationship for an
ideal gas breaks down.
S22.3 Doing work on a gas

In Chapter 21 (Thermal physics) the first law of thermodynamics was introduced as
ΔU = q + w
where ΔU is the change in the internal energy of a gas. This law states that internal energy can
be changed either by supplying energy through heating (q) or by doing work on the gas (w).
It is now time to look at the work done on or by a gas in more detail. Consider a piston
containing a gas (Figure S22.1). It has cross-sectional area A and the gas is at pressure p. If
the piston is slowly pushed in to compress the gas, then work is done by the force applied to
the piston (force, F = pA) and it moves through a small distance, x. Hence the work done will
be given by:
w = Fx = pAx
But Ax is the change in volume of the gas, ΔV, so this work done can be written as
w = p∆V
Figure S22.1 A piston of cross-sectional area A and containing a gas at pressure p
It is important to keep track of the signs in this equation. If the volume decreases then the
gas is compressed, work is done on the gas and its internal energy increases. If the volume
of the gas increases, then work is done by the gas in pushing the piston out and the internal
energy of the gas decreases.
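The sign bookkeeping can be made explicit in a short Python sketch. Here the change in volume is treated as a signed quantity (final volume minus initial volume), so the work done on the gas is written as −pΔV; this signed convention, the function names and the 60 cm³ starting volume are assumptions made for the illustration. The pressure is taken to stay constant during the change.

```python
def work_done_on_gas(p, V_initial, V_final):
    """Work done ON the gas at constant pressure p (Pa), volumes in m^3.

    Uses the signed change in volume, so compression (V_final < V_initial)
    gives a positive result and expansion gives a negative result.
    """
    return -p * (V_final - V_initial)

def change_in_internal_energy(q, w):
    """First law of thermodynamics: delta U = q + w."""
    return q + w

# Compression by 10 cm^3 at 200 kPa (hypothetical starting volume of 60 cm^3)
w_compress = work_done_on_gas(200e3, 60e-6, 50e-6)
# The reverse expansion at the same pressure
w_expand = work_done_on_gas(200e3, 50e-6, 60e-6)

print(f"compression: w = {w_compress:+.1f} J")   # prints +2.0 J
print(f"expansion:   w = {w_expand:+.1f} J")     # prints -2.0 J

# Fast compression: no time for heat flow, so q = 0 and delta U = w
print(f"delta U for a fast compression = {change_in_internal_energy(0.0, w_compress):+.1f} J")
```

With this convention a positive w always means that the internal energy of the gas tends to increase.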
In order to apply this equation, the change in volume must be small enough that the
pressure does not change. It is also important to measure any heating or cooling that occurs.
• If a gas is compressed very slowly, then there is time for energy to flow out into the
environment as work is being done on the gas. Hence there can be a positive w (work
done on the gas) and a negative q (heat flows out of the gas), leading to no change in
internal energy and hence no change in temperature.
• If a gas expands quickly, it does work (large negative w) but there is no time for heat flow
(q = 0) so the gas cools. This rapid expansion is used in refrigerators.
When solving problems that involve the first law of thermodynamics, it is important
to understand how the description of a situation can be interpreted using the relevant
thermodynamic variables. The effects on key variables of particular conditions are
summarised in Table S22.1.
Physical description      Effects on thermodynamic variables
constant temperature      ΔU = 0, q = −w
constant volume           w = 0, ΔU = q
fast                      q = 0, ΔU = w
insulated                 q = 0, ΔU = w
Table S22.1 Effects on key thermodynamic variables of particular conditions
Look at Worked examples S22.1 and S22.2.
Worked example S22.1
A gas in a syringe is compressed by the piston. Its volume is reduced by 10 cm³, by applying a
pressure of 200 kPa.
a Find the work done on the gas.
b Does the internal energy of the gas increase or decrease?
c How could the gas remain at constant temperature even though work is done?
a Work done = pΔV = 200 × 10³ Pa × 10 × 10⁻⁶ m³ = 2 J
b The internal energy of the gas increases, as work is done on it.
c If the gas is compressed slowly and is in surroundings that can absorb the heat flow
without warming significantly (a ‘heat sink’), then the gas will remain at constant
temperature.
Worked example S22.2
The pV graph below shows how the pressure and volume of the gas in a cylinder change
around a cycle.
a Use the ideal gas equation, pV = nRT, to explain why returning to the same point on the
graph indicates no change in internal energy
b Describe in words what happens along each of the lines AB, BC, CD and DA
c What is the significance of the area enclosed by the box?
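[pV graph: a rectangular cycle ABCD with pressure on the vertical axis and volume on the horizontal axis; A and B lie at the higher pressure, D and C at the lower pressure, with A and D at the smaller volume]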
a If p and V are the same then pV is unchanged. This means that nRT is unchanged. As no gas is
added or lost, T must be constant. In a gas the internal energy, U, depends only on T and so
the internal energy must also be constant.
b Along AB the gas is expanding at constant pressure. It is doing work, and because pV
increases its temperature rises, so heat must be taken in. Along BC the gas remains at the
same volume as the pressure drops. It is not doing work so it must be cooling: energy must
be being taken out of the gas in the form of heat. Along CD the gas is compressed at
constant pressure, and work is done on it; its temperature falls, so heat must be given out.
Along DA the pressure increases again, so heat must be taken in while no work is done.
Along CD the pressure is lower, so less work is done on the gas than the work done by the
gas during expansion.
Around the loop ABCD the sum wAB + wCD + qnet = 0 (as ∆U = 0), where qnet is the net heating
of the gas over the whole cycle, and wCD is smaller in magnitude than wAB (and of opposite
sign). This means that the total amount of heat taken in (along AB and DA) has to be greater
than that given out (along BC and CD). The net effect is that thermal energy (q) is put in
and work (w) is taken out.
This is an example of a thermodynamic cycle. A thermodynamic cycle can be followed
repeatedly to do work. The combustion engines in vehicles follow similar cycles to extract
work from the thermal energy released by burning fuel. The cycle can also be reversed to
use mechanical work (e.g. from an electric motor) to extract heat, and this is used in the
cooling unit of an air conditioner or refrigerator.
c The area enclosed is (pH – pL)∆V where pH and pL are the high and low pressures on the
graph. This is the difference between the work done by the gas and the work done on the
gas. In other words, this is the net energy transferred from heat to work by the cycle.
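The link between the enclosed area and the net work can be checked with a short Python sketch for a rectangular cycle of this kind. The pressures and volumes used are hypothetical values chosen only to illustrate the arithmetic, and the function name is an assumption.

```python
def rectangular_cycle(p_high, p_low, V_small, V_large):
    """Work terms for a rectangular pV cycle (pressures in Pa, volumes in m^3).

    Returns the work done BY the gas along AB (expansion at p_high), the work
    done ON the gas along CD (compression at p_low), the net work done by the
    gas, and the area enclosed by the cycle.
    """
    dV = V_large - V_small
    work_by_gas_AB = p_high * dV
    work_on_gas_CD = p_low * dV
    net_work_by_gas = work_by_gas_AB - work_on_gas_CD
    area = (p_high - p_low) * dV
    return work_by_gas_AB, work_on_gas_CD, net_work_by_gas, area

# Hypothetical cycle: 200 kPa and 100 kPa, between 100 cm^3 and 150 cm^3
w_AB, w_CD, net, area = rectangular_cycle(200e3, 100e3, 100e-6, 150e-6)
print(f"work by gas along AB = {w_AB:.1f} J")
print(f"work on gas along CD = {w_CD:.1f} J")
print(f"net work by gas = {net:.1f} J, enclosed area = {area:.1f} J")
```

No work is done along BC or DA because the volume does not change there, which is why only the two horizontal sides contribute.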
question
22.1 A gas of volume 100 cm³ is at temperature 300 K and pressure 100 kPa. It is compressed
slowly to a volume of 90 cm³. It then expands rapidly back to 100 cm³.
a Determine the temperature and pressure after the initial compression.
b What happens to the work done on the gas?
c How much work does the gas do in expanding?
d Describe what happens to the gas (i) as it expands and (ii) in the following few
minutes.
S22.4 Distribution of energy

The kinetic theory of gases allows us to calculate the root-mean-square (r.m.s.) speed of the
molecules in a gas but it is very important to recognise that not all the molecules have the same
kinetic energy – there is a distribution of energies. The r.m.s. speed enables us to calculate the
average energy per molecule only. The continuous and random collisions of the molecules
mean that any one molecule may have a kinetic energy that is continuously changing.
However, as one molecule speeds up following a collision, another will slow down. It is the
distribution of energies across large numbers of particles that remains unchanged. Although
there are random processes involved, the very large number of molecules in a typical gas
means the distribution of energies stays constant and we can perform reliable statistical
analysis.
We can represent the full distribution of energies as a graph of the fraction of molecules
against kinetic energy. The general shape of this distribution is the same for all gases; the
height and width of the peak changes with temperature as shown in Figure S22.2. This
distribution is often referred to as the Boltzmann distribution.
[Graph: fraction of molecules against kinetic energy, with one curve for a lower temperature and one for a higher temperature, and the threshold energy ET marked on the kinetic energy axis]
Figure S22.2 Distribution of energy across molecules in a gas at different temperatures
The peak of the distribution represents the highest fraction of molecules with a particular
energy. The value of the kinetic energy corresponding to the peak is the most probable
energy for any individual molecule. At higher temperatures, the peak of the distribution
shifts to the right, meaning the most probable kinetic energy is higher, and overall the
distribution gets wider. The area under the curve represents the total number of molecules.
The distribution also shows that all possible energies are represented: some of the molecules
move very slowly, others move much faster.
This distribution is crucial to understanding a wide range of physical phenomena from
evaporation to chemical reactions. For such processes to take place, some molecules must
have an energy greater than a threshold value, ET, as shown in Figure S22.2. The number
of molecules with an energy greater than this threshold depends on the temperature. We
can see this using the area under the curve, which corresponds to the number of molecules.
The area under the red curve beyond the threshold (higher temperature) is much greater
than that under the blue curve (lower temperature). The number of molecules beyond the
threshold energy is proportional to a quantity called the Boltzmann factor:
$$N \propto e^{-E/kT}$$
Here T is the absolute temperature and k is the Boltzmann constant, which has the value
$k = 1.38 \times 10^{-23}$ J K$^{-1}$.
We can use this quantity to determine the effect of changing temperature on physical and
chemical processes, for example to find the effect of a 10 °C rise in temperature on the rate of
a chemical reaction – see Worked example S22.3.
WORKED EXAMPLE S22.3
A particular chemical reaction requires an activation energy of $3 \times 10^{-19}$ J and only molecules
with that energy or greater will take part in the reaction. The rate of reaction is proportional to
the number of molecules with an energy greater than this activation energy. Find the ratio of
the number of molecules which can be involved at 30 °C compared to 20 °C. Hence determine
the effect of changing temperature on the rate of reaction.
First convert the temperatures to kelvin: 20 °C = 293 K and 30 °C = 303 K. Then the ratio will be:
$$\frac{e^{-E/(k \times 303)}}{e^{-E/(k \times 293)}} = \frac{e^{-71.7}}{e^{-74.2}} \approx 11.6$$
(using the unrounded exponents). Therefore a 10 K rise in temperature leads to more than a
tenfold increase in the reaction rate.
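The arithmetic in this worked example can be checked with a few lines of Python. This is only a sketch: the function names `boltzmann_factor` and `rate_ratio` are our own labels, and the value of k is the rounded one quoted in the text.

```python
import math

K_BOLTZMANN = 1.38e-23  # Boltzmann constant in J K^-1 (rounded value quoted in the text)


def boltzmann_factor(energy_j, temperature_k):
    """Return e^(-E/kT) for a threshold energy in joules and a temperature in kelvin."""
    return math.exp(-energy_j / (K_BOLTZMANN * temperature_k))


def rate_ratio(energy_j, t_low_k, t_high_k):
    """Ratio of the Boltzmann factors at two temperatures (higher over lower)."""
    return boltzmann_factor(energy_j, t_high_k) / boltzmann_factor(energy_j, t_low_k)


# Values from the worked example: E = 3e-19 J, 20 degrees C = 293 K, 30 degrees C = 303 K.
ratio = rate_ratio(3e-19, 293, 303)
print(f"Rate at 303 K / rate at 293 K = {ratio:.1f}")
```

Changing the threshold energy passed to `rate_ratio` shows how strongly the temperature sensitivity depends on E: the larger the threshold, the larger the ratio for the same 10 K rise.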
Other important processes that depend on a threshold energy are the current in a
semiconductor and creep in polymers.
A semiconductor relies on a small number of electrons being excited to a conduction
band which lies above the valence band, separated by a large energy gap. (See the section
‘Electron energies in solids’ in Chapter 30.) The number of electrons able to enter the
conduction band is determined by the Boltzmann distribution and hence the conductivity
of a semiconductor is very dependent on temperature. This is the basis of the thermistor (see
Chapter 11). Most semiconductors are doped, meaning atoms of other elements are added to
provide conduction electrons. As a result, the conductivity of doped semiconductors is much
less dependent on temperature.
When a material, especially a polymer, is placed under tension, it will extend. The amount
of extension depends on two factors:
• the magnitude of the applied tension, which causes an instantaneous extension
• a quantity called creep, which causes a material to extend more depending on the time for
which the tension is applied.
The amount of creep depends on the material. Even under a constant load, a material
may continue to stretch over time. For many materials this is such a slow process, it may
take hundreds of years to be measurable. However, for many polymers creep is significant,
even at room temperature. For example, very thin plastic shopping bags are easily stretched.
Creep is very important in materials used to make the fan blades of aircraft jet engines,
which operate at very high temperatures and under huge loads. If the blades extend even by
a tiny amount they can hit the outer casing of the engine and cause serious damage. Creep is
again dependent on the Boltzmann distribution: for a material to undergo creep, individual
molecules must exceed a certain threshold energy before they can move within the structure
of the material.
question
22.2 The creep rate of a polymer is proportional to the Boltzmann factor. For a given
polymer the threshold energy for creep to occur is $4.5 \times 10^{-18}$ J. Compare the creep rate
at 0 °C and 25 °C.
question
22.3 Semiconductor A has a band gap of $1.01 \times 10^{-19}$ J and semiconductor B has a band gap
of $2.23 \times 10^{-19}$ J. Which will show a greater temperature-dependent current over a
range of 10 °C to 60 °C?
S22.5 Entropy
Entropy is a very important concept in physics, but a difficult one to understand at first. Just
because a reaction can happen energetically, does not mean it will happen. We can determine
only the probability that a reaction will occur. A particular reaction may be possible, but also
highly unlikely.
First consider this example. A box has a divider in the middle. To the left of the divider
are molecules of gas A; to the right are molecules of gas B. When the divider is removed,
the gases will mix, but what causes that? The answer is simply that mixing is the most likely
thing to happen. What we observe is called the macrostate – that the gases are mixed. Other
macrostate observables include thermodynamic variables such as pressure and temperature.
To understand the reasons for this, we need to think about the microstates – the
arrangements of all the individual molecules, which is obviously something we cannot
observe and measure directly for each individual molecule. Look at Figure S22.3, which
shows possible arrangements of just 10 molecules – shown in the diagram as black and white.
Only one arrangement has all of the black on the left and all of the white on the right. There
are 5 ways to have a 4:1 mix on each side as any one of the 5 molecules could cross into the
other half. There are 5 × 4 = 20 ways for there to be a 3:2 mix on each side (any of the 5 can
first cross and then any one of the remaining 4). A 2:3 mix has 20 ways and so on.
Figure S22.3 Possible arrangements of ten molecules in two sections of a box separated by a divider (5 molecules either side can only be arranged one way; a 4:1 split can be arranged five ways; a 3:2 split can be arranged twenty ways)
We can represent this as a bar chart in Figure S22.4.
Figure S22.4 Chart of the possible arrangements of ten molecules in a box separated into two sections by a divider (number of ways plotted against arrangement, from 5:0 through 4:1, 3:2, 2:3 and 1:4 to 0:5)
This is just for 5 molecules of each gas; even in this very limited example we can see it is
20 times more likely that the gases will mix more or less evenly than that they will separate.
With a mole of molecules, the probability of the gases separating is so small that you could
wait for the entire lifetime of the Universe and still not observe that state. It is not completely
impossible – it is just overwhelmingly improbable.
The quantity called entropy measures the number of possible microstates when the
conditions of a particular macrostate (such as temperature and pressure) are applied.
Entropy is sometimes described as the amount of disorder within a system. In our example
of a small number of molecules forming a mixture, the entropy is highest for there being a
3:2 mix of A and B on either side of the divider when it is removed. The entropy is lowest for
the states in which A and B remain separate. So the most probable outcomes are the ones
with the highest entropy.
The mixture macrostate described above is easy to visualise. However, the main use of
entropy is in understanding the distribution of energy, using the same reasoning as our
mixture example to explain why heat is transferred from hot places to cold places: it results
in a situation that is considerably more probable. One way to picture energy distribution is
to think of energy as little packets, distributed among molecules much like the molecules
in our mixture example were distributed within the box. The most probable outcome is a
distribution of different energies across the molecules rather than all the energy being with
just a few molecules. Mixing hot (high energy) and cold (low energy) is far more likely to
lead to an even distribution of energies producing an equalised temperature, rather than any
other outcome.
In many ways this is one of the most satisfying results possible. Rather than an arbitrary
law that forces something to produce a particular outcome, the most likely outcome happens
because of chance.
As an example, consider three units of energy to distribute across three molecules
(A, B and C). The possible arrangements are:
A  B  C    Microstate    Arrangement
1  1  1        1            111
3  0  0        2            300
0  3  0        3            300
0  0  3        4            300
0  1  2        5            210
0  2  1        6            210
1  0  2        7            210
1  2  0        8            210
2  1  0        9            210
2  0  1       10            210
Among the ten possible microstates, there are three different arrangements of energy. The
macrostate with the energy arranged evenly can actually only happen in one way. Having all
the energy in one molecule can happen in three ways and the third macrostate, with 2 units
of energy in one molecule and 1 in another, can happen in six different ways. It is important
to remember that the energy will constantly redistribute between these ten microstates. The
most probable macrostate is the third arrangement, because there are six different ways to
achieve this distribution.
It is worth re-reading this example and explanation. At first, it may seem surprising. You
might think that the even distribution is, logically, the most likely. However, this distribution
of energy and the probabilities tell us that the ‘2 1 0’ arrangement is much more likely to
occur: if we could stop time and take a snapshot of the energies, there is a 6 in 10 chance we
will find the ‘2 1 0’ arrangement, a 3 in 10 chance we will find ‘3 0 0’ and just a 1 in 10 chance
we will find ‘1 1 1’.
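The counting in this example can be reproduced by listing every way of sharing the energy and grouping the results. The sketch below assumes, as the text does, that every microstate is equally likely; the function name `count_macrostates` is our own.

```python
from collections import Counter
from itertools import product


def count_macrostates(units, molecules):
    """Count the microstates belonging to each macrostate (energy pattern)."""
    counts = Counter()
    # Try every possible energy for every molecule, keeping only the
    # combinations whose energies add up to the total available.
    for state in product(range(units + 1), repeat=molecules):
        if sum(state) == units:
            # Sorting removes the information about which molecule holds which energy.
            pattern = tuple(sorted(state, reverse=True))
            counts[pattern] += 1
    return counts


counts = count_macrostates(units=3, molecules=3)
total = sum(counts.values())
for pattern, ways in sorted(counts.items(), key=lambda item: -item[1]):
    print(pattern, f"{ways} in {total}")
```

Calling `count_macrostates` with more units and more molecules shows how quickly a small number of patterns comes to dominate the total.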
The power of entropy becomes apparent when we consider energy distributed across very
many more molecules. Particular arrangements become overwhelmingly likely. We see that
the distribution of energy which led to the Boltzmann factor arises simply from the laws of
probability.
Entropy measures the number of ways in which something can be arranged. Because the
most likely macroscopic state is the one with the largest number of microstates, systems
tend to evolve towards higher entropy states just by chance.
S22.6 The second law of thermodynamics

This idea is formalised in the second law of thermodynamics which is usually expressed
as ‘in an isolated system, entropy must increase or stay the same’. It is one of the most
fundamental and powerful ideas in physics. It is often expressed incorrectly as the idea
that ‘disorder increases’ in any situation, because one might see a certain state to be more
‘ordered’ or ‘organised’. There are many more ways in which a room can be untidy than
properly arranged; many more ways the books on a shelf can be out of order than in
alphabetical order. But these are not proper examples of entropy, which really only applies to
the distribution of energy.
The second law of thermodynamics strictly only applies to isolated systems – ones where
there is no energy transfer in or out. It is possible to decrease the entropy of a system, but to
do so requires another system to do work on it, and this second system generates heat and
increases the entropy of the surroundings. An example is a refrigerator: the contents can be
cooled but only if another system, the motor on the outside, extracts the heat and transfers it
to the surrounding room. The refrigerator is not an isolated system.
Another way of expressing the second law of thermodynamics is that in an isolated
system, entropy cannot decrease. This is important to remember when we consider how
living things appear to decrease entropy, arranging molecules into useful structures,
concentrating energy in non-random ways. However, living things are not isolated systems –
the fact that they are alive means there is a constant transfer of energy in and out. For plants,
that energy transfer comes from the Sun and nutrients; for animals it comes mainly from
the chemical energy released in respiration, fuelled by food. It is true that living creatures can
reduce entropy locally – for themselves – but only at the cost of increasing entropy globally.
The largest isolated system known is the entire Universe. No energy is transferred into or
out of the Universe and so its entropy cannot decrease. The second law of thermodynamics
implies that the Universe must have begun in a very low entropy state in order for its entropy
to be increasing continually. Some physicists think that this, in turn, suggests that the state
in which the Universe formed was one of very low probability, which raises interesting
philosophical questions beyond the scope of this text.
The arrow of time

The second law of thermodynamics has very important implications for our concept of time.
When we say that entropy increases, this is a physical law that depends on the direction of
the flow of time – from low entropy to high entropy. Other laws of physics are symmetrical
in time. For example, all of the laws of mechanics work exactly the same way in reverse as
forwards. If you were to play a video of two snooker balls colliding, you would not be able to
tell if it was running forwards or backwards. The collision obeys the same laws of physics,
such as conservation of momentum, whether time runs forwards or backwards. The idea
of a direction for time – a so-called ‘arrow of time’ – may seem odd when we are used to
living our lives ‘forwards’ through time, but there is a fundamental question here: how can
this direction of time arise when all the other laws of physics are symmetrical in time? For
example, the distribution of energy or the spreading out of gas molecules depends only on
basic Newtonian mechanics – collisions and motion.
Of course, we instinctively seem to know that entropy increases. For example, if we saw a
video of a heavy object ‘falling upwards’ (away from the ground) and slowing down as it does
so, we know that either someone is playing a trick on us, or that the video must be playing in
reverse. Things that we observe in our local gravitational field – that objects fall and accelerate
near the surface of Earth – confirm the ‘arrow of time’. Yet it is curious to notice that in physics
we need the second law of thermodynamics as a way of explaining a ‘direction’ for time.
One of the hypotheses resulting from entropy forever increasing in an isolated system is
that there is a cosmological arrow of time. We know from the work of the astronomer Edwin
Hubble that the Universe is expanding, and has been since its origin in the Big Bang (see
Chapter S35). The cosmological arrow of time points in the direction of the expansion of
the Universe, which itself is (as far as we know!) an isolated system. Some theories suggest
that the second law of thermodynamics is itself a result of the initial conditions in the early
Universe. The implication of the arrow of time is that, at some time unimaginably far into
the future, entropy will reach a maximum so that nowhere in the Universe can useful work
be done. All energy will have been transferred to heat energy, and there will be a thermal
equilibrium everywhere in the Universe. This is referred to as the concept of ‘heat death’ of
the Universe.
Summary
■ The ideal gas laws can only be explored by controlling two of the variables (mass,
pressure, volume and temperature) while changing one and measuring the fourth.
■ The work done on a gas by compressing is p∆V
■ On a p-V graph, a cycle which returns to the same point will return the state to the
same internal energy, but the area enclosed by the graph shows the amount of
energy exchanged between work and heat
■ In a gas, the molecules have a distribution of energies
■ The Boltzmann factor $e^{-E/kT}$ gives the proportion of molecules in a gas which have
energy above a certain value E at absolute temperature T
■ The distribution of energy amongst molecules is purely due to chance, with the
likelihood of a given state being measured by its entropy
■ The Second Law of Thermodynamics states that in an isolated system, the entropy
cannot decrease over time
■ This gives an “arrow of time” to physical systems where the individual laws of
physics show no asymmetry with time
end-of-chapter questions
S13: Waves and optics
S22.1
Explain the sequence of events in the thermodynamic cycle ABCD shown below:
[Figure: graph of pressure against volume showing the cycle ABCD; one stage is labelled ‘isothermal (constant temperature)’]
S22.2
A gas is compressed by 1500 cm³ at a pressure of 100 kPa. The internal energy of the gas increases by 100 J.
Determine the amount of heat transferred into or out of the gas.
S22.3
The physicist James Clerk Maxwell suggested a thought experiment, in which two containers A and B are filled
with a gas. A and B are connected by a tiny door. A tiny creature controls the door. The creature only allows fast
molecules to pass the door into box A. Slow molecules are only allowed into box B. Gradually, the gas in box A
will increase in temperature, and the gas in box B will decrease in temperature. These two boxes could then run
a ‘heat engine’ that can provide useful work, and the gas will be mixed again. Suggest how the second law of
thermodynamics affects this thought experiment.
S22.4
A teacup falls to the floor and smashes. Is this a reversible process in principle? What about in practice?
S23: Coulomb’s law
Learning Outcomes
■ understand the relationship between electric field and potential gradient, and recall
and use $E = -\frac{dV}{dx}$
■ use integration to derive $W = \frac{Q_1 Q_2}{4\pi\varepsilon_0 r}$ from $F = \frac{Q_1 Q_2}{4\pi\varepsilon_0 r^2}$ for point charges
■ recognise and use $W = \frac{Q_1 Q_2}{4\pi\varepsilon_0 r}$ for the electrostatic potential energy for point charges
S23.1 More on electric potential

In Chapter 23 of the AS and A Level Coursebook, in the section on electric potential, we
defined electric potential and related it to electric field strength.
The electric potential (V) at a point is equal to the work done in bringing unit positive
charge from infinity to that point.
field strength = –(potential gradient)
In the language of calculus, we can write this as the derivative of potential with respect to
distance:
$$E = -\frac{dV}{dx}$$
However, when we use this relationship to find the field strength at a given point, we need
to remember that the electric field strength is a vector – so at any point in space it has both
a magnitude and a direction. In which direction must we take our step dx in order that the
corresponding potential gradient gives us the correct field strength? It turns out that we must
move in the direction of fastest change in electric potential (the steepest slope). This direction
is perpendicular to the equipotential lines – although we need to remember that in three
dimensions, these equipotentials are actually surfaces, not just lines. Figure S23.1 illustrates
this idea.
Figure S23.1 The electric field strength at any point is a vector which is perpendicular to the
equipotential lines or surfaces, with a magnitude equal to the negative of the potential gradient in
that direction. (The diagram shows the equipotentials around a charge +Q, with the field
E = –dV/dx drawn perpendicular to the equipotentials.)
Worked example 2 in Chapter 23 of the AS & A Level Coursebook shows how to calculate
the electric field strength from a graph of electric potential against the distance moved
perpendicular to the equipotential lines. If we know a function V(x) that describes how
the potential changes along such a graph, we can use calculus and the expression above to
calculate the electric field strength.
We already know the functions V(x) for particular situations. For example, for a point
charge Q we know that the potential at a distance r from Q is
$$V = \frac{Q}{4\pi\varepsilon_0 r}$$
The equipotentials are spheres, centred on the charge. This means that the electric field,
which is at right angles to the equipotentials, must be radial. So we can differentiate our
expression for the potential V with respect to r to obtain the electric field strength:
$$E = -\frac{dV}{dr} = -\frac{d}{dr}\left(\frac{Q}{4\pi\varepsilon_0 r}\right) = \frac{Q}{4\pi\varepsilon_0 r^2}$$

You will recognise this as the correct expression for the electric field strength at a distance r
from a point charge Q.
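We can check this result numerically by estimating the gradient of the potential over a very small step and comparing it with the inverse square expression. The sketch below is an illustration only: the charge and distance are hypothetical values, the function names are our own, and the value of ε0 is rounded.

```python
import math

EPSILON_0 = 8.854e-12  # permittivity of free space in F m^-1 (rounded)


def potential(q, r):
    """Potential V = Q / (4 pi eps0 r) at a distance r from a point charge q."""
    return q / (4 * math.pi * EPSILON_0 * r)


def field_from_gradient(q, r, dr=1e-6):
    """Estimate E = -dV/dr from the change in V across a small step either side of r."""
    return -(potential(q, r + dr) - potential(q, r - dr)) / (2 * dr)


def field_exact(q, r):
    """Field strength E = Q / (4 pi eps0 r^2) for a point charge q."""
    return q / (4 * math.pi * EPSILON_0 * r**2)


q, r = 2e-9, 0.10  # hypothetical values: a 2 nC charge, observed 10 cm away
print(field_from_gradient(q, r), field_exact(q, r))
```

The two printed values agree closely because the step dr is tiny compared with r; a much larger step would show the estimate drifting away from the exact gradient.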
In fact, any field where the force changes according to an inverse square law has the same
property: the field strength vector can be expressed as the gradient of a scalar potential
function. In Section S18.3 we saw this with the gravitational field. This has the consequence
that in any such field, when moving from one position to another there is a change in
potential energy that is independent of the path taken. You can follow any path from one
point to another, whether the path is short and direct or long and taking many turns, and the
net work done between the two points against the force produced by the field is the same. In
your physics studies, you have been making use of this idea for a long time. For example, you
know that the same amount of work is done by gravity regardless of whether you jump off a
cliff to reach the bottom or take a gentle, winding path down (nevertheless, you may prefer
one route for other reasons!). Remember, though, that not all forces have this property –
if you push a box against a frictional force, you do more work if you take a longer path.
Another, less obvious, example where the work done is not independent of path is the force
on an electrically charged particle in a magnetic field.
Electrostatic potential energy for point charges

We can use Coulomb’s law to determine the potential energy of two point charges Q1 and
Q2, separated by a distance r. Coulomb’s law allows us to calculate the force between the two
charges, which is directed radially along the line between the two charges:
$$F = \frac{Q_1 Q_2}{4\pi\varepsilon_0 r^2}$$
(when F is positive, the force is repulsive).

Since the force changes with r, in order to calculate the work done against this force, we
have to integrate. As the charge Q2 is moved radially away by a displacement dx, the work
done is dW:
$$dW = -F\,dx = -\frac{Q_1 Q_2}{4\pi\varepsilon_0 x^2}\,dx$$
Notice the minus sign. When both charges have the same sign, the work done is negative.
This is to be expected – when both charges have the same sign, the force is repulsive, so the
charges are in a lower energy configuration when they are moved further apart. When the
charges have opposite signs, the work done is positive: the charges attract each other, so work
has to be done to separate them.
In order to calculate the total work done, we need to integrate the expression for dW
from infinity to radius r. We use the variable x in our expression for the derivative to
avoid confusion between the radius r (which is one of the limits for x) and the variable of
integration. The integral we need to evaluate is:
$$W = \int_{\infty}^{r} -\frac{Q_1 Q_2}{4\pi\varepsilon_0 x^2}\,dx = \left[\frac{Q_1 Q_2}{4\pi\varepsilon_0 x}\right]_{\infty}^{r} = \frac{Q_1 Q_2}{4\pi\varepsilon_0 r} - 0$$
So the potential energy associated with two point charges Q1 and Q2, separated by
distance r is:
$$W = \frac{Q_1 Q_2}{4\pi\varepsilon_0 r}$$

You need to remember how to produce this derivation.
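The integration can also be carried out numerically, by adding up the work done over many short steps as one charge is brought in from far away. The sketch below makes several assumptions of its own: ‘infinity’ is replaced by a distance a million times larger than r, the charges and separation are hypothetical, and the function names are illustrative.

```python
import math

EPSILON_0 = 8.854e-12  # permittivity of free space in F m^-1 (rounded)


def coulomb_force(q1, q2, x):
    """F = Q1 Q2 / (4 pi eps0 x^2); a positive value means repulsion."""
    return q1 * q2 / (4 * math.pi * EPSILON_0 * x**2)


def work_from_far_away(q1, q2, r, far_factor=1e6, steps=20_000):
    """Approximate W = integral from 'infinity' to r of -F dx.

    'Infinity' is replaced by far_factor * r, and the range is cut into
    strips whose widths grow geometrically, because F falls off quickly.
    """
    ratio = far_factor ** (1 / steps)
    total = 0.0
    x_near = r
    for _ in range(steps):
        x_far = x_near * ratio
        x_mid = math.sqrt(x_near * x_far)  # representative point inside the strip
        # Moving inwards, dx is negative, so -F dx equals +F times the strip width.
        total += coulomb_force(q1, q2, x_mid) * (x_far - x_near)
        x_near = x_far
    return total


def potential_energy(q1, q2, r):
    """W = Q1 Q2 / (4 pi eps0 r), the result of the derivation."""
    return q1 * q2 / (4 * math.pi * EPSILON_0 * r)


q1, q2, r = 3e-6, 5e-6, 0.80  # hypothetical charges (C) and separation (m)
print(work_from_far_away(q1, q2, r), potential_energy(q1, q2, r))
```

The small difference between the two values comes from stopping the sum at a finite distance; making `far_factor` larger brings them closer together. Giving one charge a negative sign makes both results negative, as expected for charges that attract.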
If we consider the potential energy of a unit positive charge by setting Q2 to 1 C, then we
get the expression for electric potential:
$$V = \frac{Q_1}{4\pi\varepsilon_0 r}$$
Remember that the units of potential V are J C$^{-1}$ and the units of potential energy
(work done) W are J, so the dimensions are consistent.
questions
23.1 What is the potential energy associated with a +40 µC charge at a distance of 1.5 m from
a +20 µC charge?
23.2 What is the potential energy associated with a +40 µC charge at a distance of 1.5 m from
a –20 µC charge?
Summary
■ We can express the relationship between field strength and potential gradient in
mathematical terms as $E = -\frac{dV}{dx}$. When using this, we must calculate $\frac{dV}{dx}$ in a direction
at right angles to the equipotential lines or surfaces.
■ By integrating an expression for work done against the Coulomb force as we move
a charged particle a distance dx in the electric field of another charged particle, we
can obtain the electrostatic potential energy associated with two charged particles
separated by a distance r. This expression is $W = \frac{Q_1 Q_2}{4\pi\varepsilon_0 r}$.
S24: Capacitance
Learning Outcomes
■ analyse graphs of the variation with time of potential difference, charge and current for a
capacitor discharging through a resistor
■ define and use the time constant of a discharging capacitor τ = RC
■ analyse the discharge of a capacitor using equations of the form $x = x_0 e^{-t/RC}$
S24.1 Capacitor discharge

Chapter 24 of the AS & A Level Coursebook covers the fundamentals of capacitance. Here we
are going to look at how potential difference, charge and current change over time as a capacitor
discharges through a resistor. The discharge of a capacitor follows an exponential decay curve;
you may recognise the form of the mathematical expressions, as they are similar to those used
when considering radioactive decay (see Coursebook Chapter 31 ‘Nuclear physics’).
Imagine that we charge a capacitor, of capacitance C, so that it has a potential difference
across it of V0 . We then allow the capacitor to discharge through a resistor of resistance R
(Figure S24.1).
Figure S24.1 A capacitor is discharged through a resistor.
The capacitor is connected to the resistor at time t = 0. The charge, Q(t), potential
difference, $V_C(t)$, and current, I(t), all vary with time as the capacitor discharges. The equation
relating the charge to the potential difference is:
$$Q(t) = C V_C(t)$$
The equation that governs the potential difference across the resistor is:
$$V_R(t) = I(t) R$$
Remember that the current in the capacitor is the rate of flow of charge, so we can write
$$I(t) = \frac{dQ}{dt}$$
and thereby express the equation governing the potential difference across the resistor in
terms of charge:
$$V_R(t) = R \frac{dQ}{dt}$$
Kirchhoff’s second law tells us that the sum of the potential differences around a loop in a
circuit must be zero, therefore
$$V_C(t) + V_R(t) = 0$$
So the potential differences across the capacitor and resistor are of the same magnitude but
opposite in sign. Now, we can substitute in the equations governing the capacitor and the
resistor, to get a differential equation for the charge, Q(t):
$$\frac{Q}{C} + R \frac{dQ}{dt} = 0$$
$$\Rightarrow \frac{dQ}{dt} + \frac{Q}{RC} = 0$$
You may be familiar with this equation: its solution (provided 1/RC is positive, which it is)
is an exponential decay. We will solve this equation now; this form of equation comes up so
often in physics that it is well worth remembering the differential equation and its solution.
We start by separating the variables and then integrating with respect to time t.
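$$\frac{1}{Q}\frac{dQ}{dt} = -\frac{1}{RC}$$
$$\Rightarrow \int \frac{1}{Q}\frac{dQ}{dt}\,dt = \int -\frac{1}{RC}\,dt$$
$$\Rightarrow \int \frac{1}{Q}\,dQ = -\frac{t}{RC} + k$$
$$\Rightarrow \ln Q = -\frac{t}{RC} + k$$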
Remember that integrating 1/Q gives us ln Q, the natural logarithm of Q; k is a constant
of integration. For the next step, we need to remember that the natural logarithm is the
logarithm to the base e, where e is Euler’s number. This means that $e^{\ln Q} = Q$. Therefore we
exponentiate both sides of the equation:
$$e^{\ln Q} = e^{-\frac{t}{RC} + k}$$
And this gives us the result
$$Q(t) = A e^{-t/RC}$$
which we have written in terms of a new constant A, such that $A = e^k$.

Now, since we know that the initial charge was $Q(0) = Q_0$,
$$Q(t) = Q_0 e^{-t/RC}$$
This means we can also determine the potential difference and the current. The potential
difference is given by
$$V(t) = \frac{Q_0}{C} e^{-t/RC} = V_0 e^{-t/RC}$$
In order to find the current, we need to differentiate the expression for charge with respect to
time:
$$I(t) = \frac{dQ}{dt} = -\frac{Q_0}{RC} e^{-t/RC} = I_0 e^{-t/RC}$$
(the minus sign produced by differentiation tells us that the charge flows out in the opposite
direction to which it flowed in).
Now we have the expressions for charge, current and potential difference as functions of
time; all three exhibit an exponential decay from their initial value. Figure S24.2 illustrates
these functions.
$$Q(t) = Q_0 e^{-t/RC}$$
$$I(t) = I_0 e^{-t/RC}$$
$$V(t) = V_0 e^{-t/RC}$$
Figure S24.2 Charge, potential difference and current as a capacitor is discharged. (Panels: a Q(t) / C against t / s, starting at Q0; b V(t) / V against t / s, starting at V0; c I(t) / A against t / s, starting at I0. Each panel marks t = RC, the time at which the quantity has fallen to 1/e of its initial value.)
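To see that the exponential really is the solution of the differential equation for the charge, we can step the equation forward in time numerically and compare the result with the formula. The sketch below uses the simple Euler method, which is our choice for illustration rather than anything prescribed by the syllabus; the component values and function names are hypothetical.

```python
import math


def discharge_analytic(q0, r_ohm, c_farad, t):
    """Charge, p.d. and size of the current at time t from the exponential solution."""
    decay = math.exp(-t / (r_ohm * c_farad))
    return q0 * decay, (q0 / c_farad) * decay, (q0 / (r_ohm * c_farad)) * decay


def discharge_euler(q0, r_ohm, c_farad, t_end, steps=100_000):
    """Step dQ/dt = -Q/(RC) forward in small time steps (Euler method)."""
    dt = t_end / steps
    q = q0
    for _ in range(steps):
        q += -q / (r_ohm * c_farad) * dt
    return q


# Hypothetical component values chosen only for illustration.
R, C, V0 = 10e3, 100e-6, 5.0  # 10 kilohm, 100 microfarad, 5.0 V, so RC = 1.0 s
Q0 = C * V0
q_exact, v_exact, i_exact = discharge_analytic(Q0, R, C, t=R * C)
q_numeric = discharge_euler(Q0, R, C, t_end=R * C)
print(q_exact / Q0, q_numeric / Q0)
```

Both printed fractions are close to 1/e, because the comparison is made after exactly one interval of RC.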
WORKED EXAMPLE S24.1
In terms of R and C, how long does it take for the charge in a capacitor to drop to half its
initial value?
We need to solve the equation:

$$\frac{Q_0}{2} = Q_0 e^{-t/RC}$$
Cancelling the $Q_0$, and taking natural logs of both sides, we get:
$$\ln \frac{1}{2} = -\frac{t}{RC}$$
$$t = RC \ln 2$$
In the worked example we showed that the time taken for the charge in a capacitor to drop
to half its initial value was RC ln 2. The quantity RC is known as the time constant for the
circuit containing a capacitor connected to a resistor. The time to discharge to a certain level
is, as we have seen, proportional to RC.
Time constant for a capacitor: τ = RC
In fact, the time RC is the time that it takes for the charge in, current in and potential
difference across a discharging capacitor to drop to a factor of 1/e of their original values.
This is shown on the graphs in Figure S24.2.
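The same rearrangement works for any fraction, not just one half: setting $x/x_0$ equal to a chosen fraction and taking natural logs gives t = −RC ln(fraction). A minimal sketch, with a function name and component values of our own choosing:

```python
import math


def time_to_fraction(fraction, r_ohm, c_farad):
    """Time for Q, V or I to fall to `fraction` of its initial value: t = -RC ln(fraction)."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be greater than 0 and no more than 1")
    return -r_ohm * c_farad * math.log(fraction)


R, C = 2.2e3, 470e-6  # hypothetical 2.2 kilohm resistor and 470 microfarad capacitor
tau = R * C           # time constant in seconds

half_time = time_to_fraction(0.5, R, C)            # should equal RC ln 2
one_over_e_time = time_to_fraction(1 / math.e, R, C)  # should equal RC
print(half_time / tau, one_over_e_time / tau)
```

Dividing each time by the time constant shows the pattern clearly: the results depend only on the fraction chosen, not on the particular values of R and C.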
question
24.1 A capacitor with a capacitance of 1000 µF is used in a time-delay circuit. The capacitor
is charged to 4.0 V and discharged through a 47 kΩ resistor. When the potential
difference across the capacitor drops to 0.7 V, a transistor circuit is switched off.
a Calculate the time taken for the circuit to switch off (i.e. for the capacitor to
discharge to 0.7 V).
b An electrical engineer swaps the capacitor for a 2500 µF capacitor, but wants the
time taken for the circuit to switch off to remain the same. What value of resistance
should they substitute for the 47 kΩ resistor?
S24.2 Capacitor charging

What happens to the charge, current and potential difference as we charge a capacitor? The
circuit must still obey Kirchhoff’s laws, but we now have an additional voltage source in the circuit
(Figure S24.3).
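Figure S24.3 Charging a capacitor. (The circuit shows a cell, Vcell, connected in series with a capacitor C, with potential difference VC across it, and a resistor R, with potential difference VR across it.)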
Therefore we have:
$$V_C(t) + V_R(t) = V_{\text{cell}}$$
which means that
$$\frac{Q}{C} + R \frac{dQ}{dt} = V_{\text{cell}}$$
$$\Rightarrow Q + RC \frac{dQ}{dt} = C V_{\text{cell}}$$
In Section S24.1, we showed that when the left-hand side of this differential equation equals
zero, it has a solution of $Q = A e^{-t/RC}$. Any multiple of $e^{-t/RC}$ put into this equation will always
give zero. So we need to add something of a different form to the solution in order to get a
non-zero right-hand side. (This form of differential equation with a non-zero right-hand
side is known as an inhomogeneous differential equation; you may have seen this type
of equation in your mathematics studies, where you will have seen its solution called the
particular integral.)
One method of solving this type of equation is to try different forms of solution to see
whether a particular function works. It turns out that if we have a function of the form
$$Q = A e^{-t/RC} + C V_{\text{cell}}$$
we will get the correct right-hand side of the differential equation. The initial conditions for
charging are somewhat different. When the capacitor is fully charged, its potential difference
will be equal (and opposite) to that of the cell. Therefore it will have a charge Q = CVcell when
t is large. If we charge up a capacitor that is initially completely discharged, we know that the
initial charge is zero. This information tells us that in this case, the constant of integration A
must be −CVcell . The solution is therefore that the charge increases according to the following
equation:
$$Q = C V_{\text{cell}} \left(1 - e^{-t/RC}\right)$$

We can deduce that the potential difference across the capacitor will follow a similar
relationship (increasing until it reaches the same potential difference as the cell):
$$V = V_{\text{cell}} \left(1 - e^{-t/RC}\right)$$

To calculate the current, we need to consider the potential difference across the resistor,
which is
$$V_R(t) = V_{\text{cell}} - V_C(t) = V_{\text{cell}}\, e^{-t/RC}$$
We can use Ohm’s law to find an expression for the current:

$$I(t) = \frac{V_{\text{cell}}}{R} e^{-t/RC} = I_0 e^{-t/RC}$$
We can test whether this expression is reasonable. Initially, the current will be at a
maximum; once the capacitor is fully charged, the current drops to zero. Another way to
produce this relationship for the current is to differentiate the charge equation with respect
to time, since $I(t) = \frac{dQ(t)}{dt}$.
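Both claims can be tested numerically: that the charging solution satisfies the differential equation, and that differentiating the charge gives the same current as the Ohm’s law argument. The sketch below estimates dQ/dt from two closely spaced times; the component values and function names are hypothetical.

```python
import math


def charging_charge(v_cell, r_ohm, c_farad, t):
    """Q(t) = C V_cell (1 - e^(-t/RC)) for a capacitor that starts uncharged."""
    return c_farad * v_cell * (1 - math.exp(-t / (r_ohm * c_farad)))


def charging_current(v_cell, r_ohm, c_farad, t):
    """I(t) = (V_cell / R) e^(-t/RC), from the p.d. across the resistor."""
    return (v_cell / r_ohm) * math.exp(-t / (r_ohm * c_farad))


def ode_residual(v_cell, r_ohm, c_farad, t, dt=1e-6):
    """Return (Q + RC dQ/dt - C V_cell, dQ/dt), with dQ/dt estimated numerically."""
    dq_dt = (charging_charge(v_cell, r_ohm, c_farad, t + dt)
             - charging_charge(v_cell, r_ohm, c_farad, t - dt)) / (2 * dt)
    q = charging_charge(v_cell, r_ohm, c_farad, t)
    return q + r_ohm * c_farad * dq_dt - c_farad * v_cell, dq_dt


# Hypothetical values: 6.0 V cell, 4.7 kilohm resistor, 220 microfarad capacitor.
V_CELL, R, C = 6.0, 4.7e3, 220e-6
residual, dq_dt = ode_residual(V_CELL, R, C, t=0.5)
print(residual, dq_dt, charging_current(V_CELL, R, C, 0.5))
```

A residual very close to zero means the charging equation balances at that instant, and the last two printed values should agree because they are the same current found in two different ways.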
Summary
■ When a capacitor discharges through a resistor, the potential difference, current and
charge follow the exponential form $x = x_0 e^{-t/RC}$
■ The time constant for a capacitor is τ = RC

S27: Charged particles
Learning Outcomes
■ explain the Hall effect, and derive and use VH = Bvd
■ derive, recall and use $r = \dfrac{mv}{BQ}$ for the radius of curvature of a charged particle moving
in a magnetic field
S27.1 Radius of curvature of a charged particle

In the section ‘Orbiting charges’ of Chapter 27 of the A-level Coursebook, there is a
derivation of the following equation for the radius of curvature of an electron as it moves in a
magnetic field:
$r = \dfrac{mv}{Be}$
This equation can be applied to other charged particles, if we consider a charge Q in place of
the electron. The equation becomes:
$r = \dfrac{mv}{BQ}$

Remember, though, that a positively charged particle will travel in the opposite direction
around the path compared to the negatively charged electron.
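As a rough illustration of how the radius depends on the particle, here is a short sketch that applies r = mv/(BQ) to an electron and a proton. The speed and flux density are hypothetical values chosen only for the example, and the function name is an assumption.

```python
def radius_of_curvature(m, v, B, Q):
    """Radius r = mv / (BQ) of a charged particle's circular path in a magnetic field."""
    # Only the size of the charge affects r; its sign sets the direction of travel.
    return m * v / (B * abs(Q))

m_e = 9.11e-31    # electron mass / kg
m_p = 1.67e-27    # proton mass / kg
e = 1.60e-19      # elementary charge / C

# Hypothetical speed and magnetic flux density, for illustration only
v, B = 2.0e6, 0.010   # m s^-1, T

r_electron = radius_of_curvature(m_e, v, B, -e)
r_proton = radius_of_curvature(m_p, v, B, +e)
print(f"electron: r = {r_electron:.2e} m")
print(f"proton:   r = {r_proton:.2e} m")
```

For the same speed and field, the proton's radius is larger than the electron's by the ratio of their masses.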
S27.2 More about the Hall effect

In the section ‘The Hall effect’ of Chapter 27 of the A-level Coursebook an equation is
presented for the Hall voltage:
$\dfrac{eV_H}{d} = Bev$
We can re-arrange this equation to express the Hall voltage in the form:
VH = Bvd
This form of equation for the Hall voltage may be more appropriate when solving particular
types of problems.
Summary
■ The Hall voltage can be expressed in the form: VH = Bvd
■ The radius of curvature of a charged particle with charge Q as it moves in a magnetic
field is given by $r = \dfrac{mv}{BQ}$
S28: Electromagnetic induction
Learning Outcomes
■ recognise and use $E = -\dfrac{d(N\phi)}{dt}$ and explain how it is an expression of Faraday’s and
Lenz’s laws
S28.1 Combining Faraday’s law and Lenz’s law

Chapter 28 of the Coursebook describes both Faraday’s law and Lenz’s law of
electromagnetic induction. Faraday’s law is expressed as:
$E = \dfrac{\Delta(N\phi)}{\Delta t}$
Expressed in words, this means that the magnitude of the induced e.m.f. is proportional to
the rate of change of magnetic flux linkage ( Nφ ) . We can also write this law as a derivative:
$E = \dfrac{d(N\phi)}{dt}$
If we have a formula that expresses the flux linkage as a function of time, we can use calculus
to determine the magnitude of the induced e.m.f. (One example of such a function is when
we have a coil that turns at a known rate.)
We can also combine Faraday’s law and Lenz’s law into a single equation:
$E = -\dfrac{d(N\phi)}{dt}$
This tells us that the induced e.m.f. and the change in magnetic flux linkage have opposite
signs. This is a mathematical way of expressing Lenz’s law: the induced e.m.f. will be established
in a direction so as to produce effects which oppose the change that is producing it.
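To illustrate the derivative form, the sketch below assumes a simple model of a rotating coil in which the flux linkage varies as Nφ = NBA cos(ωt). This model and all the coil values are assumptions for illustration, not taken from the text. The e.m.f. is estimated from E = −d(Nφ)/dt using a small time step and compared with the result of differentiating by hand, NBAω sin(ωt).

```python
import math

def flux_linkage(t, N, B, A, omega):
    """Assumed model: flux linkage N*phi = N B A cos(omega t) for a rotating coil."""
    return N * B * A * math.cos(omega * t)

def induced_emf(t, N, B, A, omega, dt=1e-7):
    """E = -d(N phi)/dt, estimated with a central difference over a small time step."""
    before = flux_linkage(t - dt, N, B, A, omega)
    after = flux_linkage(t + dt, N, B, A, omega)
    return -(after - before) / (2 * dt)

# Hypothetical coil: 200 turns, 0.050 T field, 4.0e-3 m^2 area, turning at 50 Hz
N, B, A = 200, 0.050, 4.0e-3
omega = 2 * math.pi * 50    # angular frequency / rad s^-1

t = 0.002                   # time / s
E_numeric = induced_emf(t, N, B, A, omega)
E_exact = N * B * A * omega * math.sin(omega * t)   # derivative worked out by hand
print(f"numerical: {E_numeric:.4f} V, by differentiation: {E_exact:.4f} V")
```

The minus sign in the definition is what turns the derivative of cos(ωt) into a positive sin(ωt) term here.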
Summary
■ The equation for the e.m.f. induced across a coil when the magnetic flux linking the
coil changes is $E = -\dfrac{d(N\phi)}{dt}$, which combines Faraday’s and Lenz’s laws.
S30: Quantum physics
Learning Outcomes
■ explain atomic line spectra in terms of photon emission and transitions between discrete
energy levels
■ apply E = hf to radiation emitted in a transition between energy levels
■ show an understanding of the hydrogen line spectrum, photons and energy levels as
represented by the Lyman, Balmer and Paschen series
■ recognise and use the energy levels of the hydrogen atom as described by the empirical
equation $E_n = -\dfrac{13.6}{n^2}$ eV
■ explain energy levels using the model of standing waves in a rectangular one-dimensional
potential well
■ derive the hydrogen atom energy level equation $E_n = -\dfrac{13.6}{n^2}$ eV algebraically using the
model of electron standing waves, the de Broglie relation and the quantisation of angular
momentum
■ understand the use of stopping potential to find the maximum kinetic energy of
photoelectrons
■ plot a graph of stopping potential against frequency to determine the Planck constant, work
function and threshold frequency
S30.1 The hydrogen atom

We now need to look more carefully at the spectrum of the hydrogen atom. Historically this
is of great importance. The Swiss scientist and teacher Johann Balmer in 1885 discovered
a simple mathematical formula to describe the wavelengths of the hydrogen spectrum.
This had a major impact on a range of sciences including chemistry and astronomy, but its
full significance only became clear in 1913 when Niels Bohr developed a quantum theory
of hydrogen (see Section S30.2). Bohr’s theory was revolutionary but it matched Balmer’s
formula very closely, which was enough to start putting quantum theory on a strong
mathematical basis.
Hydrogen series
The wavelengths of light emitted by hydrogen atoms to form the lines of an emission
spectrum are best understood by thinking about different series of lines. All the lines in a
given series involve transitions that end on the same energy level, and which start at each of
the higher levels. These series are named after the various scientists involved in measuring
them – see Figure S30.1.
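[Figure: energy-level diagram of hydrogen. Levels shown: n = 1 (ground state), −13.6 eV; n = 2, −3.40 eV; n = 3, −1.51 eV; n = 4, −0.85 eV; n = 5, −0.54 eV; n = 6, −0.38 eV; ionisation level, 0.00 eV. Levels above n = 1 are the excited states. Transitions ending on n = 1 form the Lyman series (UV), on n = 2 the Balmer series and on n = 3 the Paschen series (IR).]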
Figure S30.1 Energy levels of the hydrogen atom with some of the transitions between them that
give rise to the spectral lines indicated.
In Figure S30.1, a new notation is introduced. Alongside the energies in the diagram there is
also a numerical label for each energy level, called the principal quantum number, n. The
Lyman series of lines are all transitions to the lowest energy level, n = 1, called the ground
state. All of these transitions have a minimum energy of 13.6 − 3.4 = 10.2 eV, which is the
energy difference between n = 2 and n = 1. The lowest frequency photon emitted in the Lyman
series has an energy of 10.2 eV and hence a wavelength of $\lambda = \dfrac{hc}{E}$ = 122 nm, which is in the
ultraviolet. This is the energy calculated in the section ‘Photon energies’ of Chapter 30 of the
Coursebook.
All the other lines in the Lyman series are of greater energy, and so greater frequency and
shorter wavelength, but they converge towards a limit. No transition will have an energy
greater than 13.6 eV, as this would involve transitions from an energy level above zero
(an electron with such energy would not be bound to the hydrogen atom). The observed
spectrum of hydrogen shows many lines getting closer and closer together, converging to a
limit corresponding to an energy of 13.6 eV.
The next series involves transitions to the level n = 2. This Balmer series is one of the most
important for observations, because the transitions largely fall into the visible spectrum and so
were amongst the first observed (see Worked example S30.1). The next series to level n = 3, the
Paschen series, involves transitions of much lower energy and longer wavelength, in the infrared
area of the electromagnetic spectrum.
Worked example S30.1
Find the wavelength of the light emitted due to a transition from n = 3 to n = 2. This is called
the Balmer alpha line.
The energy gap is −1.51 − (−3.40) eV = 1.89 eV = $1.89 \times 1.6 \times 10^{-19}$ J = $3.02 \times 10^{-19}$ J
Using $\lambda = \dfrac{hc}{E}$ we find λ = $6.58 \times 10^{-7}$ m = 658 nm, which is in the red part of the visible
spectrum.
Question
30.1 Find the longest wavelength and the shortest wavelength lines in the Paschen series.
The energy levels of hydrogen

The principal quantum number is more than just a useful label for the hydrogen energy
levels. It also serves to help us calculate the energies of each level, En; we can use the relation
$E_n = -\dfrac{13.6\ \text{eV}}{n^2}$ where n is a positive integer
As we have seen, n = 1 corresponds to the ground state. Higher values of n correspond to the
excited states, which get closer and closer to the ionisation energy as n tends towards infinity.
For n = 1 this clearly gives the value E1 = −13.6 eV. The other energies are also straightforward
to calculate (see Table S30.1).
n E/eV
1 −13.6
2 −3.40
3 −1.51
4 −0.85
5 −0.54
6 −0.38
7 −0.28
Table S30.1 Energy levels of hydrogen compared to the ionisation energy.
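The table values and the Balmer alpha wavelength can be reproduced with a few lines of code. This is a sketch only: the function names are illustrative, and the constants are the rounded values h = 6.63 × 10⁻³⁴ J s, c = 3.00 × 10⁸ m s⁻¹ and 1 eV = 1.6 × 10⁻¹⁹ J.

```python
H = 6.63e-34         # Planck constant / J s
C = 3.00e8           # speed of light / m s^-1
J_PER_EV = 1.6e-19   # joules per electronvolt

def energy_level(n):
    """Energy of hydrogen level n in eV, from E_n = -13.6 / n^2."""
    return -13.6 / n**2

def transition_wavelength(n_upper, n_lower):
    """Wavelength in metres of the photon emitted in a transition n_upper -> n_lower."""
    gap_eV = energy_level(n_upper) - energy_level(n_lower)   # positive for emission
    return H * C / (gap_eV * J_PER_EV)

# Reproduce the energy-level table for n = 1 to 7
for n in range(1, 8):
    print(f"n = {n}: E = {energy_level(n):.2f} eV")

# Balmer alpha line (n = 3 to n = 2)
balmer_alpha = transition_wavelength(3, 2)
print(f"Balmer alpha wavelength = {balmer_alpha * 1e9:.0f} nm")
```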
Questions
30.2 Calculate the energy levels of hydrogen for n = 8 and n = 10. From these results,
calculate the transition energy.
30.3 Find the equivalent formula to $E_n = -13.6\ \text{eV}/n^2$ for the hydrogen energy levels, but in J
instead of eV.
S30.2 Explaining the energy levels of hydrogen

Why does the electron in hydrogen only possess specific energies? And why those particular
energies (−13.6 eV, −3.4 eV, etc.) and not any others? The answer to this question lies in the
dual nature of electrons: they can behave as waves and as particles (see Chapter 30 of the
Coursebook, section ‘The nature of light – waves or particles?’). This dual nature limits the
behaviour of electrons in a very similar way to the string on a guitar, where the length and
nature of the string limits the frequencies it can oscillate at. For a guitar, the waves on a
string have to be standing waves, which means they can only have certain wavelengths (see
Coursebook Chapter 14 ‘Superposition of waves’). This in turn means they can have only
specific frequencies (in a properly tuned guitar, these are the musical notes). In a similar way,
an electron within an atom forms a type of standing wave, so the de Broglie wavelength of
the electron can only take on specific values. This in turn limits the possible values for the
momentum and hence the energy of the electron.
A one-dimensional potential well

Before considering the hydrogen atom, first we look at an electron confined in an infinite
potential well. This means that the electron has no potential energy but is trapped by
‘walls’ of infinite potential energy at a distance of ±a from the centre. Figure S30.2 shows a
representation of this type of well and three allowed electron waves, which are exactly like
the standing waves on a string.
infinite walls of the well
–a a
Figure S30.2 Representation of an infinite potential well and three electron (standing) waves
within it.
Because the walls of the well are infinitely high the electron wave has a value of zero at
±a. This means the electron can have a wavelength of 4a (blue line), 2a (red line), 4a/3
(orange line) and so on. In general, the allowed electron waves have wavelength 4a/n
where n = 1, 2, 3…
De Broglie’s formula relates the wavelength to the momentum:
$p = \dfrac{h}{\lambda} = \dfrac{nh}{4a}$
and we can also write the kinetic energy, $KE = \tfrac{1}{2}mv^2$, as $KE = \dfrac{p^2}{2m}$
Hence the electron’s KE is given by
$KE = \dfrac{n^2h^2}{32a^2m}$
As the potential energy at the bottom of the well is 0, the formula above represents the
electron’s total energy. It predicts that the electron in this (artificial) potential well can have
only specific energies governed by the integer values of n. This is an example of quantisation
of energy and it arises from the wave-like behaviour of electrons.
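To get a feel for the size of these energies, the sketch below works through the steps for an electron: allowed wavelength 4a/n, momentum from p = h/λ, then kinetic energy from KE = p²/(2m). The half-width a = 1.0 × 10⁻¹⁰ m is a hypothetical value of roughly atomic size, chosen only for illustration.

```python
H = 6.63e-34         # Planck constant / J s
M_E = 9.11e-31       # electron mass / kg
J_PER_EV = 1.6e-19   # joules per electronvolt

def well_energy(n, a, m=M_E):
    """Energy in joules of standing wave n in an infinite well with walls at -a and +a."""
    wavelength = 4 * a / n       # allowed standing-wave wavelengths
    p = H / wavelength           # de Broglie relation
    return p**2 / (2 * m)        # kinetic energy, which is the total energy here

a = 1.0e-10   # hypothetical half-width of the well / m

for n in (1, 2, 3):
    print(f"n = {n}: E = {well_energy(n, a) / J_PER_EV:.1f} eV")
```

The energies scale as n², so the second level has four times the energy of the first and the third has nine times.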
Electrons in a hydrogen atom

Of course, an atom is not an infinite well and not one-dimensional, so the details of the
calculation will be different from the example above, but the principle of quantisation is
the same.
In deriving his formula for the energy levels in hydrogen, Niels Bohr made a similar
calculation to that of the infinite well, using the de Broglie formula and the idea of standing
waves. However, there is a significant difference: the waves do not fall to zero at the edge of
the box, because the atom is not a box. Instead, the electron waves spread around a circular
orbit and have to complete a fixed number of wavelengths within the circumference of the
orbit (Figure S30.3).
Another way to write this same rule is to say (as Bohr did) that the orbital angular
momentum of the electron can only take on fixed values.
Figure S30.3 Sketched wave functions around a hydrogen atom.
The requirement that there are a fixed number of wavelengths in an orbit means that:
nλ = 2π r where n = 1, 2, 3…
This is the quantum part of the calculation – the rest is classical mechanics. In order for the
electron to follow a circular orbit there must be a centripetal force, which is provided by the
attraction between the electron and the nucleus:
$\dfrac{mv^2}{r} = \dfrac{Ze^2}{4\pi\varepsilon_0 r^2}$
where e is the fundamental electron charge and Z is the proton (atomic) number. Although
the equation is only true for hydrogen, with one electron, by including Z we can make
predictions for hydrogen-like atoms, such as He+ and Li++ (doubly ionised lithium).
This equation can be rearranged to make v the subject:
$v = \sqrt{\dfrac{Ze^2}{4\pi\varepsilon_0 m r}}$
Hence we can calculate the kinetic energy of the electron:
$KE = \tfrac{1}{2}mv^2 = \dfrac{Ze^2}{8\pi\varepsilon_0 r}$
The potential energy of the electron is not zero, but is given by the laws of electrostatics:
$PE = -\dfrac{Ze^2}{4\pi\varepsilon_0 r}$
and so the total energy is
$E = KE + PE = -\dfrac{Ze^2}{8\pi\varepsilon_0 r}$
That the energy is negative is a sign that the electron is bound within the atom. The energy of the
electron is fixed by its radius. We now use the quantisation rule relating r and the wavelength:
$n\lambda = 2\pi r$
$p = \dfrac{h}{\lambda} = \dfrac{nh}{2\pi r}$
$KE = \dfrac{p^2}{2m} = \dfrac{n^2h^2}{8\pi^2r^2m}$
But we also have derived that $KE = \dfrac{Ze^2}{8\pi\varepsilon_0 r}$
Putting these two expressions equal to each other gives an equation for r:
$\dfrac{n^2h^2}{8\pi^2r^2m} = \dfrac{Ze^2}{8\pi\varepsilon_0 r}$
So $r = \dfrac{\varepsilon_0 n^2h^2}{\pi m Z e^2}$
There are only specific values of r allowed. Putting those values back into the expression for
the total electron energy, E:
$E = -\dfrac{Ze^2}{8\pi\varepsilon_0 r} = -\dfrac{Z^2e^4m}{8\varepsilon_0^2h^2n^2} = \dfrac{R}{n^2}$ where $R = -\dfrac{Z^2e^4m}{8\varepsilon_0^2h^2}$
For hydrogen (Z = 1), R = $-21.7 \times 10^{-19}$ J = −13.6 eV
Bohr’s analysis produces the empirical (found by experiment) formula for the energy levels
of hydrogen. It also predicts that the energy levels of ionised helium (He+) will be 4 times
greater.
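The numerical value of the constant can be checked by substituting rounded values of the fundamental constants into the final expression. In this sketch the value ε₀ = 8.85 × 10⁻¹² F m⁻¹ is an assumed standard value not quoted in the text, and the function name is illustrative.

```python
E_CHARGE = 1.60e-19   # elementary charge / C
M_E = 9.11e-31        # electron mass / kg
H = 6.63e-34          # Planck constant / J s
EPS0 = 8.85e-12       # permittivity of free space / F m^-1 (assumed standard value)

def bohr_energy(n, Z=1):
    """Total energy in joules of level n for a one-electron atom with proton number Z."""
    return -(Z**2 * E_CHARGE**4 * M_E) / (8 * EPS0**2 * H**2 * n**2)

E1_hydrogen = bohr_energy(1)
E1_helium_ion = bohr_energy(1, Z=2)

print(f"hydrogen ground state: {E1_hydrogen:.3e} J = {E1_hydrogen / E_CHARGE:.1f} eV")
print(f"He+ / H ratio: {E1_helium_ion / E1_hydrogen:.1f}")
```

With these rounded constants the hydrogen ground state comes out close to −13.6 eV, and the He⁺ energies are four times larger in magnitude because of the Z² factor.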
Angular momentum
An alternative and equivalent approach to deriving Bohr’s formula (in fact, the one Bohr
himself used) is to start from the assumption that the angular momentum (L) of the
electrons is quantised – it can only take on specific values given by:
$L = \dfrac{nh}{2\pi}$
As L = mvr (see Chapter 17 of the Coursebook ‘Circular motion’ Sections S17.1 to S17.4), this
is equivalent to saying that:
$mv = p = \dfrac{L}{r} = \dfrac{nh}{2\pi r}$
which is identical to the quantisation rule given by the standing wave argument. (In
fact, the standing wave version we saw earlier was developed in 1924 by de Broglie as an
interpretation of this angular momentum rule.)
You should be able to use both methods to derive Bohr’s formula.
Question
30.4 A generalised formula for the energy levels of an atom with one electron is:
$E_n = -Z^2 \times 13.6\ \text{eV}/n^2$
where Z is the proton (atomic) number. Find the ionisation energy of a lone electron
orbiting the nucleus of a silicon atom ( Z = 14 , so this would be Si13+).
S30.3 Measuring the work function

In order to measure the work function of a material, we can make use of the Einstein
equation:
hf = Φ + k.e.max
rearranged to give:
k.e.max = hf − Φ
We shine light of different frequencies onto a metal surface and measure the kinetic energy
of the emitted electrons. A graph of k.e.max against f will have an intercept of −Φ.
[Figure: circuit diagram with labels: monochromatic radiation, photocell, anode, cathode, V]
Figure S30.4 Experimental set-up for measuring the work function.
Monochromatic radiation of different wavelengths can be generated from a bright white light
source such as a slide projector, with different coloured filters placed in front. The wavelength
passed by the filters is marked on them and so the frequency of the light can be calculated.
The photocell is in a vacuum so the electrons emitted from the cathode do not lose any
energy in collisions. For each colour of light available, the voltage of the supply is gradually
increased until the microammeter registers zero current. This voltage, called the stopping
potential, is noted, and the experiment repeated with a new colour of light.
The stopping potential is related to the maximum kinetic energy of the electrons by the
following equation:
e × Vstopping = k.e.max
To see why this is, think about the electrons emitted from the cathode. They have kinetic
energies from zero to k.e.max and, with no applied voltage, would travel freely to the anode. However, the anode is at
a negative potential due to the power supply and so the electrons are repelled from it. The
energy they need to cross a potential difference V is eV. The current will drop to zero once
there are no electrons with sufficient energy to reach the anode, that is, once eV is equal to
k.e.max.
Interpreting the results

The photoelectric experiment can be used to find values for the work function of the material
in the photocell, Planck’s constant and the threshold frequency. First we plot a graph of the
stopping potential against the frequency of the light. This should give a straight line with
positive gradient and negative y-intercept (see Figure S30.5).
[Graph: stopping potential/V (y-axis) against light frequency/Hz (x-axis); a straight line with gradient = h/e, x-intercept = threshold frequency and y-intercept = −Φ/e]
Figure S30.5 A graph of stopping potential against light frequency enables us to determine the
work function.
Instead of plotting k.e.max on the y-axis, we have plotted the stopping potential, V = k.e.max /e.
Again by considering Einstein’s equation:
k.e.max = hf − Φ = eV
V = (h / e ) f − Φ / e
We can see that the y-intercept is −Φ/e and the gradient is h/e. The x-intercept shows the
point at which electrons would just be emitted with zero kinetic energy, the threshold
frequency.
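The analysis of the graph can also be carried out numerically by fitting a straight line to the readings. The sketch below uses invented readings (they are not measured data) and a simple least-squares fit written out by hand; the function and variable names are illustrative.

```python
E_CHARGE = 1.60e-19   # elementary charge / C

def fit_line(xs, ys):
    """Least-squares gradient and intercept of the best straight line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    gradient = sxy / sxx
    return gradient, mean_y - gradient * mean_x

# Invented readings for illustration only (not measured data)
frequencies = [6.0e14, 6.5e14, 7.0e14, 7.5e14, 8.0e14]    # light frequency / Hz
stopping_potentials = [0.19, 0.39, 0.60, 0.81, 1.01]      # stopping potential / V

gradient, intercept = fit_line(frequencies, stopping_potentials)
planck_estimate = gradient * E_CHARGE         # gradient = h/e
work_function_eV = -intercept                 # y-intercept = -(work function)/e
threshold_frequency = -intercept / gradient   # x-intercept

print(f"h = {planck_estimate:.2e} J s")
print(f"work function = {work_function_eV:.2f} eV")
print(f"threshold frequency = {threshold_frequency:.2e} Hz")
```

Because the stopping potential is in volts, the size of the y-intercept gives the work function directly in electronvolts.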
Summary
■ The energy levels of hydrogen are given by $E_n = -13.6\ \text{eV}/n^2$ where n is the principal
quantum number and is a positive integer.
■ Electrons create standing waves and so can only have fixed de Broglie wavelengths
in a potential well or an atom.
■ The de Broglie wavelength is linked to electron momentum by $p = \dfrac{h}{\lambda}$ and
momentum is linked to kinetic energy by $KE = \dfrac{p^2}{2m}$.
■ The formula for energy levels in an atom can be derived either by using electron
standing waves or by using the quantisation of angular momentum.
S30.1
The Balmer series starts with a red line of wavelength 658 nm. Further lines in the series
are of shorter wavelength. A transition from which excited state is the first one to have a
wavelength below 400 nm, i.e. in the ultraviolet?
S30.2
Find the radius of orbit of the ground state of hydrogen and hence the orbital velocity
and angular momentum (mvr). Express this in units of (h/2π).
S31: Nuclear physics
Learning Outcomes
■ show that the random nature of radioactive decay leads to the differential equation
$\dfrac{dN}{dt} = -\lambda N$ and that $N = N_0 e^{-\lambda t}$ is a solution to this equation
■ recognise and use the equation $I = I_0 e^{-\mu x}$ as applied to attenuation losses
■ recall that radiation emitted from a point source and travelling through a non-absorbent
material obeys an inverse square law and use this to solve problems
■ estimate the size of a nucleus from the distance of closest approach of a charged particle
■ relate the equation $\Delta E = \Delta m c^2$ to the creation or annihilation of particle–antiparticle pairs
■ understand how the conservation laws for energy, momentum and charge in beta-minus
decay were used to predict the existence and properties of the anti-neutrino
S31.1 Rate of decay

You have seen how every isotope decays at a different rate characterised by its half-life,
the mean time for half of the active nuclei to decay. You also met the term activity which
measures the rate of decay, given by:
$A = -\dfrac{\Delta N}{\Delta t}$
You were also introduced to the decay constant, λ, which relates A to N:

A = λN
We shall now use these definitions to take a more mathematical approach to radioactive
decay.
The two equations for A must be the same and so we can write that:
$A = -\dfrac{\Delta N}{\Delta t} = \lambda N$
However, strictly this relationship is only exact over infinitesimally short times, so we need to
replace the $\dfrac{\Delta N}{\Delta t}$ term by an exact differential:
$-\dfrac{dN}{dt} = \lambda N$
Moving the negative sign across:
$\dfrac{dN}{dt} = -\lambda N$
This is a differential equation and there are several ways in which to solve it. We meet
equations of this form often in physics, so it is easiest simply to recall that the solution is of
the form:
$N(t) = N_0 e^{-\lambda t}$
where N(t) is the number of undecayed nuclei after a time t and N0 is the number of
undecayed nuclei at time t = 0. We can substitute these values of N into the differential
equation to show that these values do indeed solve the equation:
$N(t) = N_0 e^{-\lambda t}$
$\dfrac{dN}{dt} = -\lambda N_0 e^{-\lambda t} = -\lambda N(t)$
The number of radioactive atoms decays exponentially with time, as shown in the graph in
the Coursebook (Figure 31.10).
From the definition of activity, $A = -\dfrac{dN}{dt}$, it follows that:
$A = \lambda N_0 e^{-\lambda t} = \lambda N(t)$
and
$A = A_0 e^{-\lambda t}$
as N decays exponentially, and A is always proportional to N.
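The claim that the exponential solves the differential equation can also be tested numerically. This sketch assumes a hypothetical sample (N₀ = 1.0 × 10⁶ nuclei and λ = 0.10 s⁻¹, values invented for illustration) and compares a numerical estimate of dN/dt with −λN at one instant.

```python
import math

def undecayed(t, N0, lam):
    """Number of undecayed nuclei after time t: N = N0 e^(-lambda t)."""
    return N0 * math.exp(-lam * t)

# Hypothetical sample, for illustration only
N0, lam = 1.0e6, 0.10   # initial number of nuclei, decay constant / s^-1
t, dt = 5.0, 1e-5       # time and small time step / s

# Central-difference estimate of dN/dt at time t
dN_dt = (undecayed(t + dt, N0, lam) - undecayed(t - dt, N0, lam)) / (2 * dt)
activity = lam * undecayed(t, N0, lam)   # A = lambda N

print(f"dN/dt     = {dN_dt:.1f} per second")
print(f"-lambda N = {-activity:.1f} per second")
```

The two printed values should agree, and the activity at any time equals A₀ = λN₀ multiplied by the same exponential factor.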
S31.2 Intensity of radiation

The intensity of radiation is measured, like the intensity of other forms of radiant energy, by
the power per unit area. Intensity has the symbol I and is usually measured in W m−2.
Attenuation of radiation
The radiation emitted by unstable nuclei ionises matter. This means that the radiation
steadily loses energy as it progresses through matter. Therefore, the intensity of the radiation
also reduces as it progresses through matter. This absorption is called attenuation of the
radiation and is expressed mathematically as follows (see Figure S31.1):
$I = I_0 e^{-\mu x}$
In this equation, I is the intensity of the radiation, I0 is the intensity just before the radiation
enters the matter and x is the distance travelled through the matter. The quantity μ is called
the attenuation coefficient, which depends both on the type and the energy of the radiation
as well as the nature of the matter itself. The attenuation coefficient μ has units of m−1 (see
Worked example S31.1).
[Figure: radiation of intensity I0 enters matter with absorption coefficient µ and emerges with intensity $I_0 e^{-\mu x}$]
Figure S31.1 The transmission and absorption of radiation as it passes through matter.
Worked example S31.1
Gamma rays of energy 1.0 MeV are fired at a sheet of lead of thickness 5.0 cm. Lead has an
attenuation coefficient of 80 m−1. If the incident gamma rays have an intensity of 10 mW m−2,
find the intensity after passing through the lead.
$I = I_0 e^{-\mu x}$
$= 10 e^{-80 \times 0.05}$
$= 10 e^{-4}$
= 0.18 mW m−2
Notice in this calculation that the intensity was left in units of mW m−2, so the answer should
be in the same units, but that the thickness of the lead had to be converted to metres in
order to match the unit of the attenuation coefficient.
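The same calculation can be written as a short function, which makes the unit conversion explicit. This is a minimal sketch; the function name is an illustrative choice.

```python
import math

def attenuated_intensity(I0, mu, x):
    """Intensity after a distance x through matter: I = I0 e^(-mu x).

    mu must be in m^-1 and x in m; the result has the same unit as I0.
    """
    return I0 * math.exp(-mu * x)

thickness_cm = 5.0
I = attenuated_intensity(I0=10, mu=80, x=thickness_cm / 100)   # convert cm to m
print(f"I = {I:.2f} mW m^-2")
```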
The inverse square law

Even when no matter is present to absorb the radiation, the intensity of radiation will reduce
with distance from the source. This is because the total energy carried by the radiation is
unchanged, but the energy is spread over a larger and larger area the further the radiation
travels from the source. We can observe this effect with visible light: a lamp looks dimmer as
we move further away from it. This is because our eyes are sensitive only to the intensity of
light that reaches the eye, not the total energy emitted by the source.
Consider a point source of radiation. At a distance x, the radiation will be spread uniformly
over a sphere with a surface area of 4πx² if the radiation is emitted equally in all directions.
Remember that intensity is the power divided by the area, so that the intensity reduces as $\dfrac{1}{x^2}$.
This is called an inverse square law, and it applies to any type of radiated energy that is not
absorbed. Worked example S31.2 shows how an inverse square law can be applied. Usually,
the inverse square law for nuclear radiation is applied only to gamma rays. This is because
alpha and beta radiation are more strongly absorbed by any matter, including air, and these
attenuation effects are more significant than the reduction in intensity due to distance.
Worked example S31.2
Two sources of gamma radiation are of equal power. A detector is placed 1.0 cm from the first
source and 5.0 cm from the second. What is the ratio of the intensity of radiation from the
first compared to the second?
Let the power of each source be P. The intensity at 1.0 cm distance will be
$I_1 = \dfrac{P}{4\pi \times (0.010)^2}$
and at 5.0 cm will be
$I_5 = \dfrac{P}{4\pi \times (0.050)^2}$
Then the ratio
$\dfrac{I_1}{I_5} = \dfrac{(0.050)^2}{(0.010)^2} = 25$
This could also have been solved by simply squaring the ratio of the distances, (5:1)² = 25:1, and
recalling that I1 must be greater than I5.
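A short sketch of the same reasoning in code is given below. The source power is a hypothetical value; it cancels in the ratio, so any value gives the same answer.

```python
import math

def intensity(P, x):
    """Intensity at distance x from a point source of power P, with no absorption."""
    return P / (4 * math.pi * x**2)   # power spread over a sphere of area 4 pi x^2

P = 1.0   # hypothetical source power / W (cancels in the ratio)
ratio = intensity(P, 0.010) / intensity(P, 0.050)
print(f"I1 / I5 = {ratio:.0f}")
```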
In practice, we do not rely on just attenuation or distance alone to protect ourselves from
sources of radiation. For example, a medical radiographer taking an X-ray image will stand
at some distance from the source and stand behind an attenuating screen. In school and
college laboratories, all radioactive sources are kept in lead boxes. These are then stored a
long way from where people usually work.
Questions
31.1 A material has an attenuation coefficient of 65 m−1 for gamma rays of energy 3.0 MeV.
a Express the attenuation coefficient in units of cm−1.
b Find the fractional reduction in intensity after i 1 cm and ii 30 cm.
31.2 The maximum safe level of a particular radiation is deemed to be 100 nW cm−2. How far
from a source of power 10 W would it be safe to stand, assuming no attenuation by the
surrounding medium?
S31.3 Properties of the nucleus

In Chapter 16 of the Coursebook, we saw how the existence of the nucleus was revealed
by the alpha-particle experiment conceived by Rutherford and carried out by Geiger and
Marsden. As well as demonstrating that the atom had a small, positive core that carried
the majority of the mass, the scientists were also able to measure a maximum size for the
nucleus. When an alpha particle makes a head-on collision with a nucleus and is reflected
through 180°, we know that it does not come close enough to the nucleus to merge or fuse.
This means that the outer edge of the nucleus can be no further out from the centre of the
nucleus than the closest distance to which the alpha particle approaches (Figure S31.2).
[Figure labels: nucleus, alpha particle, radius of closest approach]
Figure S31.2 An alpha particle reflected back along its path of approach cannot approach closer
than distance r from the centre of the nucleus.
We can use the idea of electrostatic potential (see Chapter 23 in the Coursebook) to work
out that closest distance. If the alpha particle has a kinetic energy E initially and zero at the
instant it turns around, at that same instant it must have electrostatic potential energy E
because energy is conserved:
kinetic energy + electrostatic potential energy = constant
Using the equation for potential energy from Chapter 20 (Electric fields) we can write:
$E = \dfrac{Q_1 Q_2}{4\pi\varepsilon_0 r}$
Rutherford’s experiment used alpha particles with a charge Q1 = + 2e = 3.2 × 10−19 C and
gold nuclei with a charge Q2 = + 79e = 1.3 × 10−17 C. The alpha particles had kinetic energy
E = 1.07 × 10−12 J and so the distance of closest approach can be found to be 3.4 × 10 −14 m.
The importance of this result is that it is many times smaller than the radius of an atom,
which Rutherford estimated to be about 10−10 m, and so he could prove that an atomic
nucleus is very small. Note that this gives an upper limit to the size of the nucleus –
Rutherford realised that it could, in fact, be still smaller. In order to investigate the actual
size of a nucleus, it was necessary to use higher and higher energy alpha particles or protons
from particle accelerators. However, particle accelerators were only developed over 20 years
after Rutherford’s alpha-scattering experiment.
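The distance of closest approach quoted above can be reproduced by rearranging the potential energy equation for r. In this sketch, ε₀ = 8.85 × 10⁻¹² F m⁻¹ is an assumed standard value that is not quoted in the text, and the function name is illustrative.

```python
import math

EPS0 = 8.85e-12       # permittivity of free space / F m^-1 (assumed standard value)
E_CHARGE = 1.60e-19   # elementary charge / C

def closest_approach(Q1, Q2, kinetic_energy):
    """Distance r at which all the initial kinetic energy has become potential energy."""
    return Q1 * Q2 / (4 * math.pi * EPS0 * kinetic_energy)

r = closest_approach(Q1=2 * E_CHARGE,         # alpha particle
                     Q2=79 * E_CHARGE,        # gold nucleus
                     kinetic_energy=1.07e-12) # joules
print(f"distance of closest approach = {r:.1e} m")
```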
S31.4 Particles, anti-particles and conservation laws
You will recall from Chapter 16 of the Coursebook that there are three types of radiation
emitted from nuclei, labelled α, β and γ. You will also recall that particles have antimatter
‘cousins’, for example the positron has the same mass as an electron and shares many similar
properties, but has an opposite electrical charge. A particle and its antiparticle may collide
and annihilate each other, producing radiation.
We can calculate the energy released in an annihilation using Einstein’s mass–energy
equation:
$\Delta E = \Delta m c^2$
See Worked example S31.3.
Worked example S31.3
A positron and an electron, each of mass $9.11 \times 10^{-31}$ kg, annihilate each other to produce
two gamma rays. In order to conserve momentum, the gamma rays are emitted in opposite
directions with equal energy. We will assume that the electron and positron were both
initially at rest.
The energy of each photon is given by $\Delta E = \Delta m c^2$, where ∆m is the mass of an
electron. So
$\Delta E = 9.11 \times 10^{-31} \times c^2 = 9.11 \times 10^{-31} \times (3.00 \times 10^{8})^2 = 8.20 \times 10^{-14}$ J
Using the Einstein relation $E = \dfrac{hc}{\lambda}$ (see Chapter 30 of the Coursebook), we can find the
wavelength of these gamma rays:
$\lambda = \dfrac{hc}{E} = \dfrac{6.63 \times 10^{-34} \times 3.00 \times 10^{8}}{8.20 \times 10^{-14}} = 2.43$ pm
Note that we could have left out a step in this calculation by using the de Broglie equation
$\lambda = \dfrac{h}{mc}$.
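The two routes to the wavelength can be compared directly in code. This sketch takes h = 6.63 × 10⁻³⁴ J s and c = 3.00 × 10⁸ m s⁻¹; the function names are illustrative.

```python
H = 6.63e-34     # Planck constant / J s
C = 3.00e8       # speed of light / m s^-1
M_E = 9.11e-31   # mass of an electron (or positron) / kg

def photon_energy_from_mass(m):
    """Energy released when a mass m is converted to radiation: E = m c^2."""
    return m * C**2

def photon_wavelength(E):
    """Wavelength of a photon of energy E: lambda = hc / E."""
    return H * C / E

E_photon = photon_energy_from_mass(M_E)       # each photon carries one electron mass
wavelength_two_steps = photon_wavelength(E_photon)
wavelength_one_step = H / (M_E * C)           # the shortcut lambda = h / (mc)

print(f"photon energy = {E_photon:.2e} J")
print(f"wavelength (two steps) = {wavelength_two_steps * 1e12:.2f} pm")
print(f"wavelength (one step)  = {wavelength_one_step * 1e12:.2f} pm")
```

Both routes give the same wavelength, because substituting E = mc² into λ = hc/E cancels one factor of c.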
In particle accelerators, such as the Large Hadron Collider, new particles can be created from the
kinetic energy of the colliding beams of particles. Particles and antiparticles are created together.
Question
31.3 A proton has a rest mass of $1.67 \times 10^{-27}$ kg. A proton and an antiproton are created at
rest by colliding a beam of electrons and a beam of positrons head-on, so that one
electron annihilates one positron. Calculate the kinetic energy of each beam.
S31.5 Fusion and fission

The three types of radioactive decay discussed so far are not the only ways in which a nucleus
can emit radiation. Some very heavy atoms, including isotopes of uranium and plutonium,
can split into two approximately equal halves. This process is called spontaneous fission and
it usually results in the emission of several neutrons. These neutrons can be slowed down
and strike further nuclei, causing them to split – this is called induced fission. Because one
fission event can release several neutrons (see worked example S31.4), there is the possibility
of a chain reaction, in which each fission triggers at least one more and no further stimulus is
needed to keep the reaction going, until the fuel is used up. This is the process used in nuclear
reactors, which generate as much power as a large coal-fired power station but using around
20 000 times less mass of fuel. It is also used in nuclear bombs, where the release of energy is
very rapid, leading to enormous temperatures, comparable with the core of the Sun. At these
high temperatures the nuclei of light elements can be fused together to make heavier elements,
releasing further energy. This is the same process as the Sun uses to generate heat and light, but
powered by a nuclear fission bomb. It is known as thermonuclear fusion which, because it is
uncontrolled, has so far only been used by humans in weapons. Large research teams around
the world are trying to find ways to harness the energy of fusion in a controlled manner.
Plutonium
Plutonium-239 can undergo a similar reaction to uranium-235 but because it is both
more fissile (more likely to undergo fission) and produces more neutrons per reaction, less
plutonium-239 is needed to start a chain reaction than uranium-235. Uranium-238 is not
useful for nuclear fission, but makes up about 99% of natural uranium. Plutonium-239 can be
created from uranium-238 when fast neutrons strike uranium, creating uranium-239. Beta
decay rapidly turns this isotope of uranium into first neptunium-239 and then plutonium-239.
Like uranium, plutonium will decay into a variety of possible products, most of which are
radioactive. One example is:
${}^{1}_{0}\text{n} + {}^{239}_{94}\text{Pu} \rightarrow {}^{100}_{40}\text{Zr} + {}^{137}_{54}\text{Xe} + 3\,{}^{1}_{0}\text{n}$
Worked example S31.4
A nucleus of uranium-235 undergoes induced fission when struck by a neutron. It splits into
nuclei of krypton-89 and barium-144. How many neutrons are emitted?
You will need to use a Periodic Table to look up the atomic numbers of krypton (36) and barium
(56) and then write the equation:
${}^{235}_{92}\text{U} + {}^{1}_{0}\text{n} \rightarrow {}^{89}_{36}\text{Kr} + {}^{144}_{56}\text{Ba} + x\,{}^{1}_{0}\text{n}$
We have to find x, the number of neutrons. The atomic number (proton number) is equal on
both sides but the mass number has to be as well. There is a total mass number of 236 on the
left and so x = 3 for it to be the same on the right. So this reaction emits three neutrons.
Don’t forget to include the original neutron that caused the fission in the first place.
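The bookkeeping in this example can be sketched as a small function that balances the mass numbers. Nuclei are represented here as (A, Z) pairs, which is just a convenient convention for the sketch, and the function name is illustrative.

```python
def neutrons_emitted(parent, fragments, incoming_neutrons=1):
    """Number of neutrons released in an induced fission.

    parent and each fragment are (A, Z) pairs: nucleon number, proton number.
    """
    A_parent, Z_parent = parent
    A_left = A_parent + incoming_neutrons        # include the neutron that caused the fission
    A_fragments = sum(A for A, _ in fragments)
    Z_fragments = sum(Z for _, Z in fragments)
    if Z_fragments != Z_parent:                  # neutrons carry no charge
        raise ValueError("proton numbers do not balance")
    return A_left - A_fragments

# Uranium-235 splitting into krypton-89 and barium-144
x = neutrons_emitted(parent=(235, 92), fragments=[(89, 36), (144, 56)])
print(f"neutrons emitted: {x}")

# Plutonium-239 splitting into zirconium-100 and xenon-137
print(neutrons_emitted(parent=(239, 94), fragments=[(100, 40), (137, 54)]))
```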
S31.6 Nuclear equations

In all reactions, chemical or nuclear, we can work out what happens by applying
conservation laws. In chemical reactions, we know that the number of atoms of each element
is conserved, so for example if there are two sodium atoms in the reactants, there must be
two sodium atoms in the products.
Similarly in nuclear reactions there are properties which must be the same on both sides
of a reaction equation. For radioactive decays, the relevant quantities are mass and charge.
When we write the symbol for an isotope as ${}^{A}_{Z}\mathrm{X}$, the nucleon number A represents the
mass and the proton number Z represents the charge. As long as the total mass and total
charge on the left and right of the reaction equation balance then the conservation laws are
obeyed.
Consider the alpha decay of americium-241. Americium has proton number 95 and the
isotope name tells us the nucleon number, 241. An alpha particle is a helium nucleus and
has proton number 2 and nucleon number 4. We can write americium-241 as ${}^{241}_{95}\mathrm{Am}$ and an
alpha particle as ${}^{4}_{2}\alpha$, so the decay can be represented as:
$$ {}^{241}_{95}\mathrm{Am} \rightarrow {}^{4}_{2}\alpha + {}^{A}_{Z}\mathrm{X} $$
where A and Z have to be found. Once Z is known we can identify the element written as ‘X’. The
conservation of mass means A = 241 – 4 = 237. Conservation of charge means Z = 95 – 2 = 93.
The Periodic Table then tells us that the element is neptunium, Np and the full equation is:
$$ {}^{241}_{95}\mathrm{Am} \rightarrow {}^{4}_{2}\alpha + {}^{237}_{93}\mathrm{Np} $$
The conservation laws simply mean that the top line of numbers adds up to be equal on both
sides of the reaction arrow, and similarly with the bottom line of numbers.
Applying the conservation laws to beta decay is a little more difficult because the beta
particle does not have a nucleon number or proton number. However, if we remember that
we are looking to balance mass and charge we can write the beta particle as ${}^{\;0}_{-1}\beta$ and the
positron as ${}^{\;0}_{+1}\beta$. We give them zero mass because we are only using whole numbers and the
mass of a beta particle is about 1/2000 that of a proton or neutron.
For example, strontium-90 decays by emitting a beta-minus particle:
$$ {}^{90}_{38}\mathrm{Sr} \rightarrow {}^{\;0}_{-1}\beta + {}^{90}_{39}\mathrm{Y} $$
This time, in order to balance the bottom line, the proton number of the daughter nucleus is
one higher than that of the parent, so strontium produces yttrium, proton number 39. The
mass number remains unchanged.
Sodium-22 undergoes beta-plus decay:
$$ {}^{22}_{11}\mathrm{Na} \rightarrow {}^{\;0}_{+1}\beta + {}^{22}_{10}\mathrm{Ne} $$
Once again the top line, the mass, remains unchanged, but this time the daughter nucleus
has a lower proton number, because the positive charge of the positron is lost from the parent
nucleus.
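The three decays above all follow the same rule: subtract the top and bottom numbers of the emitted particle from those of the parent. A minimal Python sketch of that rule is given below. The dictionary `EMITTED` and the function `daughter` are illustrative names only, and the whole-number convention (zero mass number for beta particles) is the one used in the text.

```python
# (nucleon number A, proton number Z) carried away by the emitted particle,
# using the whole-number convention described in the text.
EMITTED = {
    "alpha": (4, 2),
    "beta-minus": (0, -1),
    "beta-plus": (0, +1),
}


def daughter(A, Z, decay):
    """Return (A, Z) of the daughter nucleus so that both lines balance."""
    dA, dZ = EMITTED[decay]
    return A - dA, Z - dZ


print(daughter(241, 95, "alpha"))       # (237, 93)  neptunium-237
print(daughter(90, 38, "beta-minus"))   # (90, 39)   yttrium-90
print(daughter(22, 11, "beta-plus"))    # (22, 10)   neon-22
```

The element itself still has to be looked up in the Periodic Table from the proton number that is returned.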
The principles of conservation of mass, energy and momentum can have very important
consequences in physics, as we will see.
One such example comes from the discovery of neutrinos. In alpha decay, the alpha
particle is always emitted with the same amount of energy and momentum (measured in a
cloud chamber by the length and curvature of the track) for a given isotope. Each radioactive
decay produces the same amount of energy and only one particle is produced, so it carries
off that full amount as kinetic energy. In beta decay it was observed that the beta particle
can vary in energy and momentum (including direction). For energy and momentum to
be conserved, scientists suggested the existence of a new particle which shared the KE and
momentum with the beta particle. For charge to be conserved, this particle had to be neutral.
As the electron sometimes carried nearly all the energy of the decay there was little energy
left to create the mass of this new particle so it must be very light. Hence it was called the
neutrino, meaning “little neutral one” in Italian.
Summary
■ The intensity of radiation is reduced when passing through matter, according to the
equation $I = I_0 e^{-\mu x}$.
■ Radiation is reduced in intensity by the inverse square law as it spreads over a larger
area.
■ The Rutherford scattering experiment reveals a maximum size for the nucleus, which
is known to be around $10^{-15}$ m.
■ Nuclear fission can happen spontaneously or be induced by a neutron colliding with
the nucleus.
■ Fission can result in the release of further neutrons, causing more fission events and
a chain reaction.
■ Fusion can be caused by the high temperatures generated in a fission explosion.
■ Fission and fusion can both release enormous amounts of energy.
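The first two summary points can be tried out numerically. The sketch below is only an illustration: the function names are invented for this example, the values are hypothetical, and the second function assumes a point source radiating equally in all directions.

```python
import math


def attenuated_fraction(mu, x):
    """Fraction I/I0 remaining after thickness x (m) of material
    with attenuation coefficient mu (m^-1)."""
    return math.exp(-mu * x)


def inverse_square_fraction(r_near, r_far):
    """Intensity at r_far divided by intensity at r_near (point source)."""
    return (r_near / r_far) ** 2


# Hypothetical values chosen only for illustration
print(round(attenuated_fraction(mu=5.0, x=0.10), 3))  # 0.607
print(round(inverse_square_fraction(1.0, 3.0), 3))    # 0.111
```

Because both results are simple fractions of the original intensity, they can be multiplied together when a shield and a change of distance act at the same time.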
End-of-chapter questions
S31.1
A narrow beam of gamma rays is attenuated by 20 cm of material with an attenuation
coefficient of 1.2 m$^{-1}$. What is the fractional reduction in intensity?
S31.2
Safety rules recommend that no one should work within 2.0 m of a stored radioactive
source. A desk is placed 1.0 m from the source but a lead shield with an attenuation
coefficient of 15.0 m$^{-1}$ is added. What thickness of lead should be used to offer the same
reduction in intensity?
S31.3
a Potassium-40 undergoes beta decay. Write a balanced equation for the process.
b Thorium-232 (atomic number 90) decays by a sequence of alpha and beta decays to
lead-208 (atomic number 82), which is stable. How many of each of alpha and beta
particles are emitted?
S33: Interpreting quantum theory
Learning Outcomes
■ interpret the double-slit experiment using the Copenhagen interpretation (and collapse of
the wavefunction), Feynman’s sum-over-histories and Everett’s many-worlds theory
■ describe and explain Schrödinger’s cat paradox and appreciate the use of a thought
experiment to illustrate and argue about fundamental principles
■ recognise and use ∆p∆x ≥ h/2π as a form of the Heisenberg uncertainty principle and
interpret it
■ recognise that the Heisenberg uncertainty principle places limits on our ability to know the
state of a system and hence to predict its future
■ recall that Newtonian physics is deterministic, but quantum theory is indeterministic
■ understand why Einstein thought that quantum theory undermined the nature of reality
by being:
● indeterministic (initial conditions do not uniquely determine the future)
● non-local (for example, wave-function collapse)
● incomplete (unable to predict precise values for properties of particles)
S33.1 The paradox of the double-slit experiment

You met Young’s double-slit experiment in Chapter 14 ‘Superposition of waves’ from the
Coursebook, where it was used as strong evidence for the wave-like nature of light. Similar
experiments have been carried out with particles, from electrons to C$_{60}$ (buckminsterfullerene
or ‘buckyballs’), which all confirm the evidence of electron diffraction. This all implies that
particles have a wave-like nature, which controls how they travel.
However, this so-called ‘wave-particle duality’ raises as many questions as it answers.
Specifically, we use a particle model to describe light and electrons (and other particles) when
they interact with matter – for example, when photons are emitted or absorbed, or when
electrons ionise a gas. However, we use a wave model to describe these objects in motion,
which is why they diffract through slits, interfere and superpose. How does a photon or an
electron ‘know’ when to be a wave and when to be a particle?
Secondly, how does a photon ‘know’ when it should act like a wave, for example when it
interacts with matter in a double-slit experiment, and when it should act like a particle, for
example when it is detected? Is it possibly something to do with the person observing it?
These sorts of questions are beyond the realm of experimental physics, which
can determine what happens, but not why it happens. There are several theoretical
interpretations that set out to explain the double-slit experiment and the other results of
experiments in quantum physics.
The wave-function
Fundamental to the idea of interpreting quantum mechanics is the concept of the wave-
function. This is a mathematical function that contains all the information about a system
or particle. How it changes with time then depends on the surroundings. To calculate the
outcome of an experiment we use the wave-function, just as we used the wave nature of light
to calculate interference effects. However, the wave-function associated with a particle such
as a photon or electron is not a physical wave that we can measure and display. Instead, it is
a mathematical model of what happens. We can calculate the intensity of the wave-function
much as we would do for other types of wave, using the square of the amplitude; the intensity
gives us the probability of finding the particle at a given position. This is very significant:
it suggests that, until a particle arrives and is detected, there is uncertainty associated with
the outcome of an experiment. A particle could arrive at one of a number of different places,
and we do not know with certainty where it is going to arrive until it actually arrives; an
interpretation of this is that until we detect the particle, it actually is in a number of places.
In terms of the double-slit experiment, the wave-function for a single electron behaves
like a wave that passes through both slits and interferes to create maxima and minima.
However, these are variations in probability, not the measured intensity of a single electron.
The important fact is that we cannot ‘see’ a single electron or photon split up into pieces; in
the end, one particle enters the apparatus and one particle arrives at the detector – not 10%
of a particle at one point and 90% somewhere else. This is what is meant by something being
detected as a particle – when measured, it is definitely in one place.
The important thing to remember is that a single particle could arrive at any point where
the predicted intensity of the wave-function is not zero, so at any of the ‘peaks’ we can
calculate. Once the particle is detected, then the outcome of the experiment is known and
there is only one possible outcome for a single particle; yet until it is detected, mathematically
we have to consider that it could be at one of a number of different places. This is one of the
strangest things about quantum physics, and can take some time to get used to.
In learning about the double-slit experiment with light we discussed the idea of light
waves interfering. So is this a case of the wave-functions of particles interfering with each
other? A beautiful experiment in 1909 by G.I. Taylor reduced the light intensity in the
double-slit experiment so much that only a single photon was present in the system at a
time, yet interference fringes still appeared. Such experimental evidence suggests that the
photon interferes with itself rather than with other photons. Somehow, the wave-function
simultaneously and instantly ‘knows about’ both slits.
The process by which a wave-function changes from probability and uncertainty to a
definite outcome is the source of much of the disagreement that arose about interpreting
quantum theory.
The Copenhagen interpretation

One of the first ways of interpreting the wave-function was suggested by Niels Bohr and co-
workers, based in Copenhagen – hence the name, the Copenhagen interpretation. There is no
single, complete statement that defines this interpretation, but the essential idea is that before
detecting a particle, the results of an experiment remain uncertain and so the wave-function
does not give a definitive answer to questions such as ‘where is the particle now?’. After
the detection occurs (in effect, a measurement is made of the particle), one of the possible
answers is known to be true. The wave-function is said to ‘collapse’ when a measurement is
made: all of the possibilities collapse into one certainty.
However, experiments show that this is not simply a case of the particle having been
in that single, definite state all along and us not knowing about it. Let us think about the
double-slit experiment: if each particle travelled through one slit, one at a time, but we did
not know which slit an individual particle took until after detecting it, we would expect to
see a pattern of two peaks. One peak would represent all the particles that travelled through
one slit, and the second peak would represent all the particles that travelled through the
second slit – the sum of two single slits. However, as we know, we get multiple peaks that
show interference has occurred (Figure S33.1). This interference pattern suggests that each
particle and its wave-function appear to ‘pass through’ both slits and then later collapse into
a definite state when detected (Figure S33.2).
[Figure S33.1: panel (a) ‘Sum of two single slits’; panel (b) ‘2 slit pattern’]
Figure S33.1 Pattern due to (a) particles passing through two separate slits and (b) particles passing
through two slits and interfering.
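The difference between the two patterns comes down to what is added together. The Python sketch below is an idealised illustration, not a model of a real experiment. It assumes each slit contributes a wave-function of equal size at a point on the screen, differing only by a phase difference, and it ignores the single-slit envelope. The function names are invented for this example.

```python
import cmath
import math


def which_slit_intensity(phase_difference):
    """Add the two single-slit probabilities: no interference term."""
    psi1 = 1 + 0j
    psi2 = cmath.exp(1j * phase_difference)
    return abs(psi1) ** 2 + abs(psi2) ** 2


def both_slits_intensity(phase_difference):
    """Add the two wave-functions first, then square: interference appears."""
    psi1 = 1 + 0j
    psi2 = cmath.exp(1j * phase_difference)
    return abs(psi1 + psi2) ** 2


# Phase differences of 0, a quarter cycle and half a cycle
for phi in (0.0, math.pi / 2, math.pi):
    print(round(which_slit_intensity(phi), 3), round(both_slits_intensity(phi), 3))
```

The first column stays at 2 wherever we look, while the second swings between 4 (a bright fringe) and 0 (a dark fringe). Only the second behaviour matches what is observed.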
[Figure S33.2: probability distribution. Annotation: when detected, the particle could be found, for example, at any of these places, at which point the wave-function collapses and the probability distribution becomes 1 at that point and zero elsewhere.]
Figure S33.2 The wave-function collapses when the particle is detected.
The ‘many worlds’ interpretation

If the wave-function collapses by detection, then somehow all the other possibilities
disappear. This disturbed some physicists, because it seemed rather arbitrary – what is
special about an act of measurement that makes the wave-function collapse? After all, a
measurement is just an interaction with the outside world (or, in some interpretations, a
conscious mind), but a particle undergoes many interactions before it is finally detected.
What is special about the one where it is detected? In an attempt to answer this, in 1957
Hugh Everett suggested a different interpretation in which the wave-function never collapses.
Instead, at every moment in time when there is a choice (for example, for a photon to arrive at
one point or a different point in an interference pattern), reality itself splits. In one version
of reality, the particle arrives at point A; in another version, it arrives at point B. In the
reality where it arrives at point A, the wave-function has not collapsed, it is just now 100%
correlated with us knowing that it is at A. Yet there is another version of reality in which the
wave-function is 100% correlated with another version of ourselves knowing that the particle
is at point B (see Figure S33.3).
The difference with the Copenhagen interpretation is that in the ‘many worlds’
interpretation, the wave-function never collapses and all possibilities remain true. What
changes is that in each reality we can only know of one of the outcomes. This means
there are multiple different realities being generated all the time.
The different realities are sometimes referred to as different worlds or Universes, which is
why this is called the ‘many worlds’ or ‘multiple Universes’ interpretation.
[Figure S33.3: before measurement there is no knowledge of the outcome; the particle is 50% likely to be at A or at B and we do not know which. After measurement, in one reality the particle is known to be at A, in the other it is known to be at B.]
Figure S33.3 The many-worlds interpretation of the possible results of a measurement.
Sum over histories

A third interpretation is due to the American physicist, Richard Feynman. In this way of
looking at quantum mechanics, all possibilities happen. In the example of the double-slit
experiment a photon does not pass through one slit or the other – we must treat the system
as if the photon passes through both. We calculate the contribution of each of the different
paths and add them together just as we add waves together. Some paths cancel each other out,
and some reinforce. The combined result gives us the probability for each possible outcome of
where the photon will arrive, and these probabilities make up the interference pattern.
A particularly interesting example of this is to consider light travelling in a straight line. In
the ‘sum over histories’ approach we calculate the effects of the photon travelling along every
possible path – straight and curved. When the contributions are summed, the curved
paths cancel each other out, leaving just the straight path as the most probable (see
Figure S33.4).
[Figure S33.4: a photon and its possible paths. Annotation: when adding up the probability for the photon travelling along the black paths the wave-functions add up and give a very high probability amplitude. Along the red paths the wave-functions cancel and give a very low probability.]
Figure S33.4 In the ‘sum over histories’ approach, the sum is taken over all possible paths.
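The cancellation of the curved paths can be explored with a toy calculation. The Python sketch below is a heavily simplified illustration and not Feynman's full method. It assumes each path is made of two straight segments, bent at a point halfway between source and detector, and that every path contributes an arrow (phasor) of the same length whose direction depends only on the path length. The function `path_sum`, the wavelength and the distances are all hypothetical choices.

```python
import cmath
import math


def path_sum(y_start, y_stop, wavelength=1.0, distance=1000.0, step=0.1):
    """Size of the summed phasors for paths bent through (distance/2, y).

    Each path runs in two straight segments from the source at (0, 0) to the
    detector at (distance, 0) and contributes a unit phasor whose phase is
    set by the total path length.
    """
    total = 0j
    n = int(round((y_stop - y_start) / step)) + 1
    for i in range(n):
        y = y_start + i * step
        length = 2 * math.hypot(distance / 2, y)
        total += cmath.exp(2j * math.pi * length / wavelength)
    return abs(total)


near_straight = path_sum(-10.0, 10.0)    # paths close to the straight line
strongly_bent = path_sum(100.0, 120.0)   # equally many strongly bent paths
print(near_straight > strongly_bent)     # True
```

Both calls add up the same number of paths. For the nearly straight paths the arrows point in almost the same direction and reinforce, whereas for the strongly bent paths the direction changes rapidly from one path to the next and the arrows largely cancel.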
S33.2 Schrödinger’s cat

As quantum theory was being first devised, the ideas it generated caused a great deal of
controversy. The suggestion that the behaviour of particles was controlled by waves that
acted in a random fashion was totally counter to people’s intuition of how the world worked.
Many scientists accepted that small particles could behave in this fashion, but imagined there
must be some ‘missing theory’ awaiting discovery that linked this microscopic, probabilistic
behaviour to the macroscopic, ‘classical’ physics with absolute predictions and certainty. The
Austrian physicist Erwin Schrödinger devised a thought experiment that challenged this
position (see Figure S33.5).
A thought experiment is a way of taking a physical situation to a logical extreme in order
to see what implications it might have. Schrödinger tied an undeniably
random, microscopic quantum event to a macroscopic, real world effect. He imagined the
following situation: put a cat and a radioactive isotope together into a box and leave the
box closed long enough that the isotope has a 50% chance to decay – this is a truly random,
quantum process, described by a wave-function. Place a detector, a Geiger–Müller tube,
in the box linked to a vial of poison. If there is a decay, the detector triggers a circuit that
releases the poison and the cat dies. (Remember, this was just a thought experiment – this
was not carried out for real!)
In this arrangement, when we open the box we make a measurement and know the
outcome – in the Copenhagen interpretation it means the wave-function collapses into
a definite state, with the cat either dead or alive. But what if we don’t open the box?
According to quantum theory and the wave-function, we must consider that the particle is
simultaneously both decayed and not decayed. Just as a photon in a double-slit experiment
can pass through both slits at the same time, and we can only know the state of the photon
when it is detected, so the only way of knowing whether the isotope has decayed is to open
the box. Until that moment, we must consider all possible outcomes and must treat the
mathematics as fact – in other words, the cat is dead and alive at the same time until we
open the box. What Schrödinger showed was that people who agree that the microscopic,
quantum, random behaviour of particles is correct, must also accept that large, macroscopic
systems and even living creatures can be affected by that same behaviour. We cannot have
one without the other.
The Schrödinger’s cat thought experiment is sometimes called a paradox – it seems
ridiculous that a cat could be both dead and alive at the same time. Yet this is what quantum
theory tells us, and all the scientific evidence we have built up since Schrödinger’s time tells
us that quantum theory does exactly predict what we observe. Such a thought experiment is
set up as a challenge to our ideas about a theory. Thought experiments are highly valuable in
making us think carefully about physics and the implications of some of our ideas.
Figure S33.5 Schrödinger's cat thought experiment: inside the box, the radioactive isotope both
has and has not decayed, and so the cat is both dead and alive.
S33.3 Uncertainty
If we think about the double-slit experiment for an electron, all we know is that the electron
arrived in a particular place at a particular time. Because it travelled as a wave that passed
through both slits, we don’t know where the electron was at the moment it arrived at the slits.
If we move our detector to the slits, the electron doesn’t pass through them because it has
been detected.
You might suggest that we could work out the exact path of the electron if, at the same
time we detected its position on the screen, we also measured its velocity or momentum.
Another surprising aspect of quantum theory is that this combination of measurements –
knowing exactly the position and momentum of a particle at the same time – is impossible.
In fact, quantum theory teaches us that even asking such a question is meaningless – we
cannot know which slit the electron passed through, nor can we measure quantities later that
would enable us to calculate exactly where it was and how fast it was moving.
This is a difficult concept to grasp, as it seems very different from what we observe in the
‘real’, macroscopic world of objects, position, momentum and collisions. Let us think about
the microscopic electron, and how we might try to measure exactly where it is in the double-
slit experiment. We need to appreciate that the wave-function of a particle does not describe
a single, easily measured wave with precise wavelength. The correct description is a ‘wave
packet’, meaning a number of wavelengths superposed onto each other.
In order to deduce exactly where an electron is, according to quantum theory we need to
localise the electron’s wave-function – meaning that the spread of the electron wave-packet
would have to be known to sufficient precision that we can assign it only a very narrow
range of position. The nature of the wave-function is such that by narrowing down the range
of position, the range of momentum the electron can have gets broader. In other words,
the more precisely we know the position, the less precisely we can know the momentum.
Similarly, if we know the momentum more precisely, we know how fast the electron is going,
but we don’t know where it has been! This is a mathematical property of the wave-function.
We call position and momentum conjugate variables, because knowing either one with
more precision means the other must be known less precisely.
There are other pairs of conjugate variables affected in exactly the same way, for example
energy and time, and angular momentum and angular displacement.
This turns out to be a fundamental problem not just of quantum mechanics, but of these
types of conjugate variables more generally. At the microscopic scales involved, there is a
trade-off between knowledge of position and knowledge of momentum that is impossible to
get round – it is deep-rooted in nature and is not a limitation of our measuring instruments.
This was first understood in quantum theory by Werner Heisenberg and he expressed it
mathematically as the ‘uncertainty principle’:
$$ \Delta p\,\Delta x \geq \frac{h}{2\pi} $$
In this equation Δp is the uncertainty in the momentum – the spread of possible values
the momentum might have. Δx is the uncertainty in the position. If the uncertainty in one
variable is small then the uncertainty in the other variable must be large, because the product
has to be at least as large as the constant on the right: Planck’s constant divided by 2π.
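To get a feel for the sizes involved, the relationship can be rearranged to give the smallest possible uncertainty in momentum for a chosen uncertainty in position. The Python sketch below uses the form of the principle quoted in this section, with h divided by 2π on the right-hand side. The function name and the example value of Δx are hypothetical.

```python
import math

PLANCK_H = 6.63e-34  # Planck's constant in J s (rounded)


def min_momentum_uncertainty(delta_x):
    """Smallest delta_p (kg m/s) allowed by delta_p * delta_x >= h / (2*pi)."""
    return PLANCK_H / (2 * math.pi * delta_x)


# Hypothetical case: an electron located to within 1.0e-10 m (roughly one atom wide)
delta_p = min_momentum_uncertainty(1.0e-10)
print(f"{delta_p:.2e} kg m/s")  # 1.06e-24 kg m/s
```

Making `delta_x` ten times smaller makes the minimum `delta_p` ten times larger, which is the trade-off described above.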
Heisenberg’s uncertainty principle is especially significant when we consider how we
calculate what happens to a particle in the future. At a microscopic level, to be able to
calculate exactly what will happen to every particle in a room at every moment thereafter,
we would need to know precisely the position and momentum of every single particle in the
room. However, we cannot know both the position and the momentum of any one particle –
if we know where a particle is, we cannot know precisely how fast and in which direction
it is travelling, and vice versa. This means the future of any individual microscopic system
cannot be predicted with absolute certainty, which is a profoundly different situation from
the classical physics that came before quantum mechanics. What can usually be predicted,
however, is the general behaviour of the macroscopic system, because we can sum across all
the probabilities of the individual microscopic parts.
Heisenberg’s uncertainty principle is sometimes confused with something called the
‘observer effect’. Even Heisenberg himself first thought about his uncertainty principle in
these terms, although he soon realised his mistake. In the observer effect, we consider how we
might find the position of an electron – for example, in a double-slit experiment, we consider
how we might know which slit it passed through. To do that we must look at it (observe it) in
some way – for example, we might shine light on it. To get a sufficiently precise observation
of the position, we need to shine light waves of very short wavelength and high energy. To
get an observation of an electron, one of these high-energy photons needs to ‘bounce off’ the
electron, and this collision would cause the electron to change speed and/or direction. So
any attempt to observe the electron with high precision will in itself change the momentum
and/or position of the electron. In other words, the act of observing the electron changes the
very things we were trying to observe. This ‘observer effect’ is different from the uncertainty
principle, although both affect our ability to observe and predict quantum effects.
The observer effect seems to be a limitation on our abilities to make measurements. One
can try to think of clever ways around it. However, Heisenberg’s uncertainty principle is
much more basic than this – there is no clever way round it. This is illustrated by double-slit
experiments that attempt to measure which slit the electron passes through on its way to
the detector. Any experiment sensitive enough to locate the electron at the slits destroys the
interference pattern and gives a pattern of electrons at the detector which is the sum of that
due to two separate single slits. If the electron’s position is known, it cannot pass through
both slits so the interference pattern disappears.
S33.4 Quantum theory and classical physics

Quantum theory is not just a little bit different from classical physics. It completely changes
our understanding of the Universe. The ideas introduced in the previous section are the
clearest indication of that. In classical physics the only limitations are human or mechanical
– our ability to know the exact state of a system and predict how it will evolve (develop)
is generally only limited by the quality of our measuring instruments. In principle, using
classical physics the assumption was that we could pin down the position and momentum
of every particle and determine exactly where all those particles will be in one second,
one hour, even one year’s time. We say classical physics is deterministic. Even apparently
random events such as the roll of a die or the selection of a lottery ball could be determined if
we knew enough about the initial state of every particle involved.
However, we have discovered that at a microscopic level, the Universe does not work in
this way. Quantum physics, in contrast to classical physics, is indeterministic – the future of
a system is not uniquely determined by its current state. Two electrons from the same source
and with the same wave-function will not necessarily end up in the same place. With exactly
the same initial conditions and passing through the same apparatus, one electron may end
up at one point and a second electron may end up somewhere else. There is no way, even in
principle, to predict the precise outcome of all the variables in a quantum experiment, only
the probabilities of particular states occurring.
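This can be imitated with a short simulation. The Python sketch below is only an illustration of the idea. It assumes an idealised two-slit probability pattern proportional to cos² of the screen position, uses a handful of hypothetical positions, and relies on Python's pseudo-random numbers to stand in for genuine quantum randomness. The function name `detect` is invented for this example.

```python
import math
import random


def detect(n_particles, positions, seed=None):
    """Draw one detection position per particle from the same |psi|^2."""
    rng = random.Random(seed)
    # Idealised two-slit probabilities: proportional to cos^2 of the position
    weights = [math.cos(x) ** 2 for x in positions]
    return rng.choices(positions, weights=weights, k=n_particles)


# Hypothetical screen positions, in units where maxima sit at 0, pi and 2*pi
positions = [i * math.pi / 4 for i in range(9)]
hits = detect(2000, positions, seed=1)

print(hits[:5])                                            # individual arrivals differ
print(hits.count(positions[0]), hits.count(positions[2]))  # at a maximum, at a minimum
```

Every particle is drawn from exactly the same probabilities, yet the individual arrival positions differ. Only the overall pattern, with many arrivals at the maxima and essentially none at the minima, can be predicted.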
A second difference between quantum theory and classical theory is that in classical
physics we can know everything about a particle, whereas in quantum theory we have
already seen that the Heisenberg uncertainty principle prevents that. We say quantum theory
is incomplete.
A third difference is that quantum theory appears to be non-local – effects happen over
a distance. In the Copenhagen interpretation, the collapse of the wave-function happens
instantly. This is particularly important when two particles become entangled – their wave-
functions linked together. Once a property of one particle is measured (its position for
example) the whole wave-function of the system collapses and the other particle’s position is
fixed, even though the position of this second entangled particle was not itself being measured!
These differences call into question our very understanding of reality. The Universe
does not follow the simple rules we used to expect. Many 20th century physicists were
uncomfortable with the way in which quantum theory appeared to undermine reality,
including Albert Einstein who famously said ‘God does not play dice’, meaning that nature
cannot be truly random. He was convinced that particles must have had properties that
did determine the outcome of experiments, but those properties were not measurable
directly – so-called ‘hidden variables’. Einstein spent a great deal of his later life trying to
make quantum theory deterministic, local and complete by adding hidden variables. Recent
experiments, based on a theorem by John Bell developed in the 1960s, have ruled out
local hidden variables of the kind Einstein hoped for. Quantum theory is every bit as strange as it seems!
Summary
■ The double-slit experiment tells us what happens as a result of wave-particle duality,
but not why it happens.
■ Different interpretations of quantum theory explain this and other experiments in
different ways. These interpretations include the Copenhagen interpretation (and
collapse of the wave-function), Feynman’s sum-over-histories and Everett’s many-
worlds theory.
■ A thought experiment is a way of viewing a new or challenging scientific theory to
highlight its conclusions or prompt discussion of its consequences.
■ Schrödinger’s cat thought-experiment shows how apparently microscopic quantum
effects can affect the macroscopic ‘real world’.
■ Quantum theory is indeterministic, meaning that the outcome of an experiment is
not fully determined by the state of the particles and the system.
■ Quantum theory is incomplete because we cannot fully determine the values of all
the variables at the same time.
■ Quantum theory is non-local because wave-function collapse appears to happen
instantly, affecting all entangled wave-functions in a system.
■ Heisenberg’s uncertainty principle tells us that the precision with which we can
measure the position and momentum of a particle is limited by the equation
$$ \Delta p\,\Delta x \geq \frac{h}{2\pi} $$
S34: The special theory of relativity
Learning Outcomes
■ recall that Maxwell’s equations describe the electromagnetic field and predict the existence
of electromagnetic waves that travel at the speed of light
■ recall that at the end of the 19th century, most physicists assumed that these electromagnetic
waves were vibrations in a medium called the aether, filling absolute space
■ recall that experiments looking for variations in the speed of light caused by the Earth’s
motion through this aether gave null results
■ understand that Einstein’s theory of special relativity dispensed with the idea of the aether
■ state the postulates of Einstein’s special principle of relativity
■ explain how Einstein’s postulates lead to the idea of time dilation and length contraction, and
therefore undermine the idea of absolute time and space
■ understand the idea of a frame of reference (an inertial frame)
■ recognise the equations for time dilation and length contraction
■ understand that two events which are simultaneous in one frame of reference may not be
simultaneous in another, and explain this in terms of the fundamental postulates of relativity;
distinguish this from the phenomenon of time dilation
The derivations of the time dilation and length contraction formulae are beyond the requirements
of the syllabus, but the formulae themselves must be known. The mathematical treatment
of the loss of simultaneity is also beyond the requirements of the syllabus, as is the detailed
explanation of the twin paradox. The Lorentz transformations are also not required. This
material is included here to allow a more complete understanding of the topic.
S34.1 Introduction
At the end of the 19th century and the beginning of the 20th century, many physicists believed
that they had discovered most of the laws of the Universe. A quote attributed to Lord Kelvin
(perhaps erroneously) in 1900 was: “There is nothing new to be discovered in physics now.
All that remains is more and more precise measurement.” His sentiments were echoed by
Albert Michelson, an American physicist about whom we will learn more in this chapter.
He said “The more important fundamental laws and facts of physical science have all
been discovered, and these are so firmly established that the possibility of their ever being
supplanted in consequence of new discoveries is exceedingly remote…”. There were, however,
a number of loose ends remaining, which would ultimately lead to the theories of relativity
and quantum mechanics (as discussed in earlier chapters). These topics are often referred
to as ‘modern physics’, and earlier physics as ‘classical physics’ or ‘Newtonian physics’. We
can solve many problems in physics with purely classical physics, but as we start to consider
things moving close to the speed of light, classical physics begins to break down and we must
use relativity.
S34.2 Towards Einstein’s theory of relativity

Looking at the wave nature of light
The greatest achievement of 19th century physics was James Clerk Maxwell’s theory of
electromagnetism and the pioneering experimental work of Michael Faraday that led to it.
Maxwell described all of the electrical and magnetic phenomena that had been discovered in
just four equations, framed in the language of vector calculus and showing that electrical and
magnetic fields and forces had common origins in the unified concept of electromagnetism.
Maxwell discovered that these equations predicted the existence of electromagnetic waves,
and that the speed of these waves was equal to the speed of light. This discovery combined
light (and the rest of what we now call the electromagnetic spectrum, much of which
remained to be discovered at the time) with electromagnetism.
You will be familiar with the idea of waves travelling through a medium. For mechanical
waves such as sound waves in air, if we take the medium away, the wave no longer propagates
(travels). So it is impossible for sound and other such mechanical waves to travel in a
vacuum. Clearly, though, light does travel in a vacuum. In the early part of the 19th century,
physicists thought that there must be a medium through which the light travelled, that
existed throughout space, even in a vacuum. They called this medium the ‘luminiferous
aether’. At the time, it was thought that if you were stationary relative to the aether, light
would be seen to travel at the measured speed of light within the aether. However, if you were
moving relative to the aether, the idea was that you should be able to measure a different
speed of light. It was thought that this would be much the same as the observations of the
speed of sound as being higher on a windy day, if the wind was blowing from the source of
the sound towards the observer.
In order to confirm the idea of an aether, it would be necessary to observe a change in
the speed of light when we were moving relative to that aether. Since the Earth orbits the
Sun in an elliptical (almost circular) orbit, its velocity vector will reverse in direction every
six months. Therefore, over the course of a year, the Earth would change the direction of its
motion through this aether, and some change in the speed of light ought to be observed. The
Earth moves at approximately 30 km s−1 in its orbit, while the speed of light is 300 000 km s−1.
The change in velocity relative to the aether therefore would be only 0.01% of the speed of
light, so sensitive equipment would be needed to detect it. The motion of the Earth through
the aether was described as producing an ‘aether wind’. Figure S34.1 shows what we should
expect to observe if light propagates through an aether, and we observe it from a position
moving relative to the aether.
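The size of the expected effect can be checked with a short calculation. The following is a minimal sketch using the rounded values quoted above; the function name is chosen here purely for illustration.

```python
# Values quoted in the text above.
EARTH_ORBITAL_SPEED_KM_S = 30.0    # approximate orbital speed of the Earth
SPEED_OF_LIGHT_KM_S = 300_000.0    # speed of light, rounded as in the text


def aether_wind_fraction(v_km_s, c_km_s=SPEED_OF_LIGHT_KM_S):
    """Return the aether-wind speed as a fraction of the speed of light."""
    return v_km_s / c_km_s


fraction = aether_wind_fraction(EARTH_ORBITAL_SPEED_KM_S)
print(f"v/c = {fraction:.1e}, i.e. {fraction * 100:.2f}% of the speed of light")
```

The result, one part in ten thousand, shows why only a very sensitive method could hope to detect the aether wind.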
Figure S34.1 Effects of the ‘aether wind’ on Earth. (Left: the Earth travelling with velocity v relative to the aether. Right: in the Earth frame of reference, we would experience an aether wind with velocity –v.)
It should also be possible to change the motion of the light relative to the aether by rotating
the equipment in the laboratory, so that the light moved in the opposite direction compared
to the Earth’s motion. This is the approach that Albert Michelson and Edward Morley took
in an experiment they set up in 1887 to determine the effects of the aether.
A diagram of the set-up used in the Michelson-Morley experiment is shown in
Figure S34.2: this equipment is known as the Michelson interferometer. Light enters the
interferometer and is split into two beams, which travel at right angles to each other, are
then each reflected from a mirror, and return to the point at which they were split. The two
beams are then recombined and this recombined beam is observed on a screen (or through
an eyepiece). Depending on the optical path difference between the two paths the light took,
there may be constructive or destructive interference observed on the screen, or something
in between. There would be constructive interference if the path difference were equal to
a complete wavelength or a multiple of a wavelength (nλ), and destructive interference if
the path difference were an odd multiple of half a wavelength ((2n+1)λ/2). Of course, since
v = fλ, and because the frequency remains constant, if we change the velocity of the light,
we will change the wavelength. So if an aether wind exists, the equipment can be arranged
so that one of the light paths is parallel to this wind, and the other is perpendicular. If
the two different paths are exactly the same physical distance, the aether wind should
cause an optical path difference to arise between the two paths, and so interference effects
should be observed. If the apparatus were to be rotated, so the speed of light along the two
paths changed due to the altered direction relative to the aether wind, then the optical
path difference and thus the interference effects would change. This should be particularly
noticeable if a white light source were to be used, as this produces a range of wavelengths.
Changing the optical path length would change the colour pattern produced, much as the
colours change when you view an oil slick on a puddle from different angles.
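To get a feel for the size of the effect that was being sought, the classical aether prediction can be estimated numerically. The sketch below assumes that, in the aether picture, light travels at c – v and c + v along the wind and at an effective speed of √(c² – v²) across it. The arm length (11 m) and wavelength (500 nm) are illustrative assumptions, not values given in this text.

```python
import math

C = 3.0e8            # speed of light in m/s (rounded, as in the text)
V_AETHER = 3.0e4     # assumed aether-wind speed: the Earth's orbital speed, m/s
ARM_LENGTH = 11.0    # assumed effective arm length in m (illustrative)
WAVELENGTH = 500e-9  # assumed wavelength of visible light in m (illustrative)


def round_trip_time_parallel(length, v, c=C):
    """Classical round trip along the wind: out at c - v, back at c + v."""
    return length / (c - v) + length / (c + v)


def round_trip_time_perpendicular(length, v, c=C):
    """Classical round trip across the wind: effective speed sqrt(c^2 - v^2)."""
    return 2 * length / math.sqrt(c**2 - v**2)


def expected_fringe_shift(length, v, wavelength, c=C):
    """Fringe shift expected when the apparatus is rotated through 90 degrees.

    Rotation swaps the roles of the two arms, so the change in optical
    path difference is twice the path difference for one orientation.
    """
    dt = round_trip_time_parallel(length, v, c) - round_trip_time_perpendicular(length, v, c)
    path_difference = c * dt
    return 2 * path_difference / wavelength


shift = expected_fringe_shift(ARM_LENGTH, V_AETHER, WAVELENGTH)
print(f"Expected shift on rotation: about {shift:.2f} of a fringe")
```

Under these assumptions the aether model predicts a shift of a sizeable fraction of a fringe, which is the kind of change the experiment was designed to reveal.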
Figure S34.2 Diagram of the Michelson interferometer. (Labels: laser; beam splitter (BS); mirror 1 (M1) at distance L1 along path (1); mirror 2 (M2) at distance L2 along path (2); screen, where beams (1) + (2) recombine.)
Michelson and Morley’s experiment showed no significant change in the interference pattern
when the equipment was rotated. They repeated the experiment six months later, just in case
they had happened to perform the original experiment at a point in the Earth’s orbit where
there was no motion relative to the aether. They still found no change in the pattern as the
equipment was rotated. Their equipment was sensitive enough to detect changes of the size
of those expected (it was sensitive enough to detect an aether wind of just a few km s−1). They
had to conclude that they could not detect any motion relative to the aether. Either there
was no aether, or it was being ‘dragged along’ by the moving Earth. This experiment is often
called the ‘most famous null result in history.’ It carried serious implications for classical
physics, as we will see.
Other experiments (such as those by Fizeau, earlier in the 19th century) had shown that
in water, light was ‘dragged along’ by the water, but not completely – the measured speed
of travel of the light was less than the sum of the speed of the water and the speed of light
in stationary water. So, if the concept of an aether was correct, we have two experiments
showing apparently contradictory results. The solution, as we will see, is that there is no
aether. Light is unlike mechanical waves, in that it does not require a medium to travel
through.
Could treating light as a stream of particles (photons) provide an answer?
You will remember from the chapter on quantum physics that light and matter show both
wavelike and particle-like properties. Let us examine what happens to photons emitted from
a source moving close to the speed of light. We have the benefit of hindsight, knowing that
Albert Einstein in 1905 had shown in his papers on the photoelectric effect and the special
theory of relativity that the speed of light in empty space is absolute, and that photons can
show the properties of both wave and particle behaviour. Remember that physicists in the
19th century did not have the correct theories to explain these effects, nor did they have the
sophisticated equipment to prove such theories.
In fact, since Einstein’s papers there have been particle physics experiments that enable
us to investigate the speed of photons emitted from a moving source. In 1964 Alväger and
co-workers at CERN (a European nuclear and particle physics facility that is now the largest
such facility in the world) fired protons at a beryllium target to produce fast-moving neutral
particles called pions (π0), travelling at 0.9998c. These pions quickly decay into two gamma-
ray photons. The experimental team measured the speed of these photons in the laboratory
rest frame and found the speed to be c to within 0.005%. A similar experiment had been
conducted in 1963 with neutral pions at a speed of 0.2c by Filippas and Fox. Both of these
sets of experiments confirmed Einstein’s special theory of relativity.
Let us investigate what such a pion experiment would reveal if we used classical,
‘Newtonian’ physics. Figure S34.3 shows a model of the decay. We would have expected the
two emitted photons to each have a different momentum and hence velocity, due to the initial
high velocity of the pion before the decay. A ‘forward-emitted’ photon would travel faster
than the ‘backward emitted’ photon. However, this is certainly not what is observed. Both
photons are measured as having speed c, even though they are emitted from a moving source.
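The Newtonian expectation for the pion decay can be written as a short calculation. This is a minimal sketch of classical velocity addition along the line of motion, with speeds expressed in units of c; the function name is chosen here for illustration.

```python
PION_SPEED = 0.9998   # speed of the pion in units of c (from the text)
PHOTON_SPEED = 1.0    # speed of a photon relative to its source, in units of c


def newtonian_lab_speeds(source_speed, emitted_speed=PHOTON_SPEED):
    """Classical (Galilean) velocity addition along the line of motion.

    Returns the laboratory-frame speeds of the forward-emitted and
    backward-emitted photons, in units of c.
    """
    forward = source_speed + emitted_speed
    backward = abs(source_speed - emitted_speed)
    return forward, backward


forward, backward = newtonian_lab_speeds(PION_SPEED)
print(f"Newtonian prediction: forward {forward:.4f}c, backward {backward:.4f}c")
print("Measured: both photons travel at c (to within 0.005%)")
```

The classical prediction of two very different speeds is in clear conflict with the measured result.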
Figure S34.3 The pion experiment, showing what we would expect to observe according to Newton’s laws of motion. (Before decay: the π0 moves at v = 0.9998c. After decay, in a Newtonian framework: the backward-emitted photon moves at 0.0002c and the forward-emitted photon at 1.9998c.)
Classical physics cannot explain the results of these experiments. Neither the wave model
nor the particle model of classical physics is sufficient to give us an explanation, even when
we take all the potential effects of imprecise or inaccurate measurements into account. To
produce an explanation, we need a new model that can be applied to light and other particles
travelling close to the speed of light.
S34.3 The postulates of relativity

In 1905, Einstein published his paper ‘Zur Elektrodynamik bewegter Körper’ (‘On the
electrodynamics of moving bodies’). This proposed changes to the laws of mechanics for
objects moving at speeds close to that of light, to extend the laws of mechanics and make
them consistent with Maxwell’s equations of electromagnetism.
Note that Einstein’s 1905 paper was the ‘special theory of relativity’, which applies in
particular situations. Einstein also realised that there were even greater consequences of his
ideas, which would later require a very different type of mathematics to explain. In 1915,
he published his paper on the ‘general theory of relativity’, which took his ideas further still
and made us view gravitation, space and time in a wholly new way. This is well beyond the
scope of this book; for now, we are considering just the special theory and how it explains the
Michelson-Morley and pion experiments.
The special theory of relativity is summarised by two postulates (statements that are
assumed to be true):
First postulate (the principle of relativity): The laws of physics are the same in all inertial
frames of reference.
Second postulate: The speed of light in free space (in a vacuum) has the same value c in
all inertial frames of reference.
In order to understand these postulates, we need to understand what is meant by an inertial
frame of reference.
Inertial frames of reference

First of all, what is a ‘frame of reference’? You have already seen frames of reference being
used. For example, when we measure the displacement and velocity of a moving object, we
are measuring these quantities relative to our own frame of reference. In an experiment,
we consider ourselves in our laboratory to be stationary, and we set up a three-dimensional
frame of reference in which we can measure the components of a displacement or velocity,
including their magnitude and direction.
Recall Newton’s first law of motion – a body moves with constant velocity (which may
be zero) unless a resultant external force acts upon it. An inertial frame of reference is any
frame of reference in which Newton’s first law holds. Such a frame is not itself accelerating, so
it must be stationary or moving at a constant speed in a straight line. So our laboratory frame
of reference, in which we the observer are stationary, is one example of an inertial frame of
reference. Another frame of reference, this time moving with constant velocity with respect
to our first inertial frame, is itself another inertial frame.
For example, imagine you set up your laboratory experiment on a train moving at
constant velocity. You could still carry out your experiment and measure the same results
as if you were in a stationary laboratory; Newton’s first law applies both in your stationary
frame of reference in the lab and in your moving frame of reference on the train. Both the
stationary laboratory and the train moving at constant velocity are inertial frames of
reference. This idea is illustrated in Figure S34.4.
Figure S34.4 Inertial frame A is moving at a constant velocity v with respect to inertial frame B.
In the case shown, the frames do not occupy the same point in space at time t = 0. Note that we
could choose any perpendicular axes xA, yA and zA for frame A, and a different set of perpendicular
axes for frame B, xB, yB and zB, and the constant relative velocity of the two frames can be in any
direction, and they will both still be inertial frames of reference.
What if the train starts to accelerate? A frame that is accelerating is a non-inertial frame of
reference. We can observe the effects of this. Imagine you as an observer place a ball in the
centre of an otherwise empty train carriage with a smooth floor. Assume there is no friction
between the ball and the floor. As the train accelerates, the velocity of the train carriage
increases. However, the ball is free to move and the concept of inertia tells us that the ball
does not accelerate.
Yet from the point of view of you, the observer sitting in the carriage, the ball moves to the
back of the carriage. From your point of view, you are stationary relative to the train, and it
would appear to you that a force instead must be acting on the ball, accelerating it towards
the back of the train. From your frame of reference, the ball appears to be in a non-inertial
frame of reference. However, someone measuring the motion from the side of the train
track would observe the train accelerating beneath a ball that continued moving at constant
velocity. To them, you and the train are in a non-inertial frame of reference, not the ball. You
can see that considering non-inertial frames of reference can get complicated!
Obviously, if you were sitting in the train carriage, it is not correct to think you would be
unaware of the force acting on the train. For example, as the train accelerated you would feel
yourself being pushed back against your seat, and you might see from the countryside passing
by outside that you were moving faster. The important concept to grasp is that an external
resultant force causes an acceleration, and an accelerating frame of reference is non-inertial.
A rotating frame of reference is also non-inertial. An object that rotates at a constant
speed is accelerating, because although its speed is constant, the direction of its velocity is
constantly changing.
In special relativity, it is important to remember that we are going to deal exclusively
with inertial frames of reference. We only consider objects that are stationary or moving
at constant speed in a straight line relative to each other. Einstein extended his theory of
relativity later to deal with accelerating frames of reference: the general theory of relativity.
The word ‘special’ indicates that we are dealing with this special case of inertial frames.
In classical physics, we can easily take into account the differences between inertial
frames. In Figure S34.4 the two frames are labelled A and B. The frame A is moving at
speed v relative to frame B, in the direction of both frames’ x axes. Imagine that two events
happened one after the other in different places in frame A, with a time difference ∆tA.
The same two events are observed in the stationary frame B. If these events as measured
in frame A are separated by a distance ∆xA between the x co-ordinates, ∆yA between the y
co-ordinates, ∆zA between the z co-ordinates, then the separations in frame B are given by:
∆xB = ∆xA + v∆tA

∆yB = ∆yA
∆zB = ∆zA
∆tB = ∆tA
In the time between the first and second events, the frame A will have moved to the right by
v∆tA, so we need to add this term when calculating the x co-ordinate in frame B. This move
from one inertial frame to another is called a Galilean transformation when it is done in
classical physics, named after the physicist and astronomer Galileo. At low speeds, this is all
reasonably straightforward. Galileo realised that the laws of motion are the same in both
frame A and frame B; he proposed that the laws of motion are the same in all inertial frames.
For hundreds of years this was a basic assumption applied to all physics. The consequence of
Einstein’s postulates of relativity is that this simple transformation from one inertial frame
to another is actually not correct. Einstein realised that Galileo’s idea of relativity, when
applied to light, would result in the speed of light being different in different inertial frames.
Einstein determined that a different type of transformation is needed, one that ensures
the speed of light will be the same in all inertial frames. This transformation is named the
Lorentz transformation. At low speeds, this transformation produces almost exactly the
same mathematical results as the Galilean transformation. However, as the speed of a frame
of reference approaches the speed of light, the results of the transformation are very different.
So thankfully in most circumstances, we can still add velocities in the way we are used to
from classical physics!
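The Galilean transformation given above is simple enough to express as a short function. The sketch below applies it to a pulse of light to show the problem Einstein identified: the speed of the pulse comes out differently in the two frames. The relative speed of 0.5c and the time interval are illustrative assumptions.

```python
C = 3.0e8  # speed of light in m/s (rounded)


def galilean_transform(dx_a, dy_a, dz_a, dt_a, v):
    """Map the separations between two events from frame A to frame B.

    Frame A moves at speed v along the shared x direction relative to B.
    """
    dx_b = dx_a + v * dt_a
    return dx_b, dy_a, dz_a, dt_a


# Hypothetical example: a light pulse travels along x for 1 microsecond in frame A.
dt_a = 1.0e-6
dx_a = C * dt_a
v = 0.5 * C  # illustrative relative speed of the two frames

dx_b, dy_b, dz_b, dt_b = galilean_transform(dx_a, 0.0, 0.0, dt_a, v)
speed_in_b = dx_b / dt_b
print(f"Speed of the pulse in frame B: {speed_in_b / C:.2f}c")
```

A result greater than c in frame B contradicts the second postulate, which is why a different transformation is needed at high speeds.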
A good resource for exploring the ideas of frames of reference is a video entitled ‘Frames
of Reference’, produced in 1960 and presented by University of Toronto professors Patterson
Hume and Donald Ivey (available on YouTube at the time of writing).
S34.4 Consequences of the postulates of relativity

The two postulates of special relativity have some interesting consequences, but importantly
they are consistent with both the null result of the Michelson-Morley experiment and the
pion experiment.
• If the speed of light is a constant in all inertial frames, then we wouldn’t expect anything
other than a null result in the Michelson-Morley experiment. When the equipment is
rotated to a new position and the experiment repeated, there should be no difference in
the results as the time taken for light to travel down each leg of the interferometer will
remain constant. In fact, Einstein’s special theory of relativity meant that the whole idea
of the aether was no longer needed.
• In the pion experiments, considering Einstein’s theory means we would actually expect
the two photons emitted from the decay to be measured as travelling at the speed of light
in the laboratory frame of reference. The speed at which the pion travels before it decays
makes absolutely no difference to the speeds of the emitted photons.
Another consequence of the postulates is that in order for the speed of light in free space
to be measured as having the same value, c, in all inertial frames, it means that no single
inertial frame is ‘better’ or ‘worse’ than any other. There is no particular frame of reference
that we can say is the absolute stationary frame for the Universe, from which we should
measure everything else. We have to abandon our ideas of absolute space and time. Each
inertial frame must have its own space and time coordinates, and they are equally valid
compared to any other inertial frame. Another way of looking at this is if two ‘events’ are
separated by an interval in space and time, these measurements are tied to the inertial frame
in which the measurements are made. In a different inertial frame, the separation in space
and in time of those two events will be different. At first, this idea may seem very strange; we
will see how this works below.
Time dilation
Now, we will look at the first of the unexpected consequences of the postulates of relativity –
time dilation. We are used to thinking of time as absolute, but what we are about to show
is that it is not! The idea of absolute time is something we take for granted: for example,
imagine you and a friend had identical clocks that are extremely precise and never run out
of power. You then spend a long time apart – it could be minutes, hours, days or even many
years – and when you meet up again you compare your clocks. The idea of absolute time is
that those clocks would show exactly the same time. A consequence of the special theory of
relativity is that these clocks may not show the same time.
The following is a classic thought experiment, due to the Nobel Prize-winning physicist
Richard Feynman. Einstein used the German word gedankenexperiment, which translates
as ‘thought experiment’, to describe the conceptual experiments he used in creating the
theory of relativity. In 1905, the fastest way to travel as a passenger was in a train, so just like
Einstein, let’s set our thought experiment on a train.
Figure S34.5 Feynman’s light clock thought experiment. (a) In A’s frame, the light travels straight up to the mirror and back down the same path. (b) In B’s frame, the carriage moves with velocity v and the light travels a different, longer path, but it still travels at speed c.
Figure S34.5 shows the set-up. We have a device called a ‘light clock’ on a moving train.
Observer A is in the train carriage; observer B is at rest by the side of the track. We will
call A’s frame of reference the train frame, and B’s frame of reference the Earth frame. The
clock consists of a light source and receiver which are in the same position. The light source
flashes, and the flash is reflected from a mirror on the roof of the carriage, back down to the
receiver. Let’s call the time from emission to reception of the light one ‘tick’ of the clock. We
are going to look at the time taken for one tick of the clock from the point of view of observer
A, and then from the point of view of observer B.
In the train, we will call the time taken for one tick ∆tA. The carriage is of height y, so the
time for the light to reach the ceiling and return to the detector is (using time = distance/speed):
$$ \Delta t_A = \frac{2y}{c} $$
For observer B, in the Earth frame, the light follows the path shown in Figure S34.5b. The
light must be observed by B (as well as A) to travel at speed c (the second postulate of special
relativity), but you can see from the diagram that in B’s frame, the Earth frame, it travels a
greater distance. If the speed has not changed but the distance travelled is greater, then the
time elapsed for one tick is longer in the Earth frame. We can actually work out exactly
how much further it travels, and thus work out the time for one tick in frame B. Let’s call the
time taken for the light to travel from the source to the mirror and back to the receiver in B’s
frame ∆tB. During that time, the carriage travels a distance
x = v∆tB
Using Pythagoras’ theorem, the total distance travelled by the light is twice the hypotenuse
of the right angled triangle with sides y and x/2. Therefore the total distance travelled, 2d, is
$$ 2d = 2\sqrt{y^2 + \left(\frac{v\,\Delta t_B}{2}\right)^2} $$
But we also know that the time taken must be such that the speed of light is measured to be c.
So we can write that:
$$ \Delta t_B = \frac{2d}{c} = \frac{2\sqrt{y^2 + \left(\frac{v\,\Delta t_B}{2}\right)^2}}{c} $$
After re-arrangement, we can write this expression as:

$$ \Delta t_B = \frac{2y}{c}\,\frac{1}{\sqrt{1 - \frac{v^2}{c^2}}} = \Delta t_A\,\frac{1}{\sqrt{1 - \frac{v^2}{c^2}}} $$
question
34.1 Prove that the expression above follows from the previous expression for ∆tB.
In relativity, the expression

$$ \gamma = \frac{1}{\sqrt{1 - \frac{v^2}{c^2}}} $$
appears a lot, so we give it the symbol γ , and often call it the γ -factor (gamma factor). Think
carefully about this expression: you can see that γ is always greater than 1 for any non-zero speed, and that it is
approximately equal to 1 for speeds that are small compared to c. It becomes very large
(tending to infinity) for speeds close to the speed of light.
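The behaviour of γ is easy to explore numerically. The following minimal sketch evaluates γ for a range of speeds, written as a fraction β = v/c; the particular speeds listed are illustrative choices.

```python
import math


def gamma(beta):
    """Return the gamma factor for a speed v = beta * c (0 <= beta < 1)."""
    if not 0 <= beta < 1:
        raise ValueError("beta must satisfy 0 <= beta < 1")
    return 1.0 / math.sqrt(1.0 - beta**2)


# Print gamma for a range of speeds, from rest up to very close to c.
for beta in (0.0, 0.001, 0.1, 0.5, 0.9, 0.99, 0.9998):
    print(f"v = {beta:<6}c  gamma = {gamma(beta):.6f}")
```

Running this shows that γ stays very close to 1 until the speed becomes a substantial fraction of c, and then rises steeply.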
Let’s write our expression relating the times in the two frames using γ :
∆tB = γ∆tA
Now let’s think about what this means. Since γ is always greater than 1, more time elapses
between ticks of the clock in frame B (the Earth frame) than in frame A (the train frame). If
we think carefully about this, it means that time is running more slowly in the train frame –
since the time between the emission and reception of the light is shorter. This phenomenon is
known as time dilation. It is often quoted as ‘moving clocks run slow’. Remember of course
that moving in this case means moving relative to another frame of reference!
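The light clock result can be checked numerically without doing the algebra. The sketch below solves the Earth-frame equation for ∆tB by repeated substitution and compares the answer with γ∆tA. The carriage height (3 m) and train speed (0.6c) are illustrative assumptions.

```python
import math

C = 3.0e8  # speed of light in m/s (rounded)


def tick_in_train_frame(height, c=C):
    """Time for one tick in frame A: straight up and back down."""
    return 2 * height / c


def tick_in_earth_frame(height, v, c=C, iterations=200):
    """Time for one tick in frame B, found by repeated substitution.

    Solves dt = 2 * sqrt(height**2 + (v * dt / 2)**2) / c numerically,
    starting from the train-frame value.
    """
    dt = tick_in_train_frame(height, c)
    for _ in range(iterations):
        dt = 2 * math.sqrt(height**2 + (v * dt / 2) ** 2) / c
    return dt


height = 3.0   # illustrative carriage height in m
v = 0.6 * C    # illustrative train speed

dt_a = tick_in_train_frame(height)
dt_b = tick_in_earth_frame(height, v)
gamma = 1 / math.sqrt(1 - (v / C) ** 2)
print(f"dt_A = {dt_a:.3e} s, dt_B = {dt_b:.3e} s")
print(f"dt_B / dt_A = {dt_b / dt_a:.4f}, gamma = {gamma:.4f}")
```

The ratio of the two tick times agrees with γ, as the time dilation formula requires.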
You may be thinking: “From observer A’s point of view, B is moving past him at velocity
–v. So, since –v gives the same γ-factor as +v, we could write the time-dilation equation as
∆tA = γ∆tB. This is an apparent contradiction unless γ = 1.” What we have forgotten is that the
equation we derived assumes that the light clock remains at the same x coordinates in frame
A, so the journey of the light in frame A is straight up and down. This assumption breaks
the symmetry between the frames, so we can’t just switch frames as suggested. In fact the
equation ∆tA = γ∆tB would only be valid if the light clock were instead stationary in B’s frame,
and ∆tA and ∆tB referred to times measured on this clock. So in fact, both observers see the
other’s frame as being time dilated, but there is no contradiction! Also, if time is running
more slowly, observer A in the train is aging less quickly than observer B in the Earth
frame. Later we will look at the famous thought experiment where one observer sets off on a
relativistic journey and returns having aged less than people who stayed on Earth. Again, the
situation is not as symmetrical as it might first appear.
Evidence for time dilation: Muon decay

The muon (μ-) is a fundamental particle, a lepton (like the electron, but with more mass).
It is unstable, and decays to an electron, an electron anti-neutrino and a muon neutrino.
The half-life of the muon is 1.56 μs, in its own rest frame (a frame that is moving with the
same velocity as the muon). Muons can be observed on Earth as a component of cosmic
radiation, the natural radiation that is present throughout our galaxy. Most muons observed
on Earth are thought to be created at altitudes of 15 km from other highly energetic particles
making up the cosmic radiation. We can measure the number of muons that decay as they
travel through the atmosphere, by comparing the density of muons detected high up in the
atmosphere (e.g. up a mountain) with the density observed at ground level.
The muons travel at high relativistic speeds, i.e. close to the speed of light. A muon with an
energy of 20 GeV has a γ-factor of approximately 190. This means that its speed is 0.999986c.
We know the half-life of these muons because of very precise laboratory measurement. We
can therefore calculate what fraction of all the muons should remain undecayed after a 15 km
trip through the atmosphere. The time taken to travel 15 km is:
$$ T = \frac{15\,000\ \mathrm{m}}{0.999\,986 \times 3 \times 10^{8}\ \mathrm{m\,s^{-1}}} = 5.00 \times 10^{-5}\ \mathrm{s} = 32.1\ \text{half-lives} $$
Therefore we would expect $2^{-32.1} \approx 2.2 \times 10^{-10}$ to be the fraction of muons that reach ground
level, i.e. less than one in a billion. When we make measurements of what actually takes place
in the atmosphere, many more muons than this are observed. This is because the muon,
moving at high relativistic speed, experiences less time passing in its frame of reference than
the observer in the Earth frame of reference. We need to take into account relativistic time
dilation. The half-life of a muon moving at this speed, observed from the Earth frame, is
γ × 1.56 μs = 190 × 1.56 μs = 0.296 ms. Now our travel time becomes 0.17 half-lives, and therefore the fraction
that is able to reach the ground is $2^{-0.17} = 0.89$. So, in fact, after taking relativity into account,
most of the muons reach the ground. This prediction is consistent with experimental
measurements.
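The muon calculation above can be reproduced in a few lines. This is a minimal sketch using the values quoted in the text (a half-life of 1.56 μs, a distance of 15 km and γ = 190); the function and variable names are chosen here for illustration.

```python
import math

C = 3.0e8             # speed of light in m/s (rounded, as in the text)
HALF_LIFE = 1.56e-6   # muon half-life in its rest frame, in s (from the text)
DISTANCE = 15_000.0   # distance travelled through the atmosphere, in m
GAMMA = 190.0         # gamma factor of a 20 GeV muon (from the text)


def surviving_fraction(travel_time, half_life):
    """Fraction of particles remaining undecayed after travel_time."""
    return 2.0 ** (-travel_time / half_life)


beta = math.sqrt(1.0 - 1.0 / GAMMA**2)   # speed as a fraction of c
travel_time = DISTANCE / (beta * C)      # measured in the Earth frame

classical = surviving_fraction(travel_time, HALF_LIFE)
relativistic = surviving_fraction(travel_time, GAMMA * HALF_LIFE)

print(f"beta = {beta:.6f}")
print(f"travel time = {travel_time:.3e} s")
print(f"fraction surviving without time dilation: {classical:.1e}")
print(f"fraction surviving with time dilation:    {relativistic:.2f}")
```

The two surviving fractions differ by about nine orders of magnitude, which is why the observed muon flux at ground level is such strong evidence for time dilation.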
Length contraction
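Figure S34.6 Thought experiment for length contraction. (In A’s frame the carriage has length lA. In B’s frame the carriage has length lB, and the panels show the start, the moment the light reaches the mirror, and the finish.)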
Another phenomenon associated with travel at relativistic speeds is length contraction.

We use another thought experiment to investigate this. Consider the situation shown in
Figure S34.6. Again observer A is positioned in a train carriage, and the train is moving at
velocity v past observer B, who is in the Earth frame of reference. This time, our ‘light clock’
is arranged so that it sends a pulse of light along the direction of motion of the carriage, to a
mirror at the far end, and receives it back. We measure the time taken for the pulse to travel
to the mirror and back.
If we call the length of the carriage in A’s frame of reference (the train frame) lA and the
travel time of the pulse ∆tA, then since the light travels distance 2lA in time ∆tA:
$$ \Delta t_A = \frac{2 l_A}{c} $$
In B’s frame of reference (the Earth frame), after the light pulse is emitted, the mirror
is moving away from the light: the light is travelling at speed c, so the relative speed of
approach of the light to the mirror is c – v. Once the light reflects off the mirror, and reverses
its direction, in B’s frame it is moving towards the detector at relative speed c + v. We can
calculate the travel times for the light to get to the mirror, t1, and the time for the light to
return from the mirror to the detector, t2:
$$ t_1 = \frac{l_B}{c - v}; \qquad t_2 = \frac{l_B}{c + v} $$
Here, lB is the length of the carriage as measured in B’s frame. We can sum these times and
re-arrange to get the total travel time of the pulse in frame B:
$$ \Delta t_B = l_B\left(\frac{1}{c - v} + \frac{1}{c + v}\right) = \frac{2 l_B c}{c^2 - v^2} = \frac{2 l_B}{c}\,\frac{1}{1 - \frac{v^2}{c^2}} = \frac{2 l_B}{c}\,\gamma^2 $$
However, since the emission and reception of the light occur at the same spatial co-ordinates
in A’s frame, we can use our earlier time dilation result to relate ∆tA and ∆tB too:
∆tB = γ∆tA
Combining our two expressions for ∆tB , and the expression for ∆tA, we can deduce:
$$ \frac{2 l_B}{c}\,\gamma^2 = \gamma\,\Delta t_A = \frac{2 l_A}{c}\,\gamma \quad\Rightarrow\quad l_B = \frac{l_A}{\gamma} $$
What does this mean? Remembering that γ is always > 1, then it tells us that observer B
measures the length of the carriage to be shorter than observer A. More generally, if we
make a length measurement of an object in a frame where the object is stationary, otherwise
known as the rest frame of the object, then we are measuring the longest possible length for
the object. The length of the object in its rest frame is called the proper length. In any other
frame its length will be less than or equal to its proper length: we say it is length contracted.
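The length contraction result can be checked for consistency with time dilation. The sketch below takes a contracted length lB = lA/γ, computes the round-trip time of the light pulse in the Earth frame, and confirms that it equals γ∆tA. The carriage length (20 m) and train speed (0.8c) are illustrative assumptions.

```python
import math

C = 3.0e8  # speed of light in m/s (rounded)


def gamma(v, c=C):
    """Gamma factor for speed v."""
    return 1.0 / math.sqrt(1.0 - (v / c) ** 2)


def round_trip_time_earth_frame(length_b, v, c=C):
    """Round-trip time in frame B: closing speeds c - v out and c + v back."""
    return length_b / (c - v) + length_b / (c + v)


length_a = 20.0   # illustrative proper length of the carriage in m
v = 0.8 * C       # illustrative train speed

g = gamma(v)
length_b = length_a / g                          # contracted length in frame B
dt_a = 2 * length_a / C                          # round trip in the train frame
dt_b = round_trip_time_earth_frame(length_b, v)  # round trip in the Earth frame

print(f"gamma = {g:.4f}, l_B = {length_b:.2f} m")
print(f"dt_B / dt_A = {dt_b / dt_a:.4f}")
```

The ratio of the two times matches γ only if the carriage is contracted by exactly the factor 1/γ in the Earth frame.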
You could reason that from A’s point of view, B is moving past with velocity –v, so if A
measures an object which is at rest in B’s frame, A will measure it as shorter than an observer
in B’s frame would measure it. You would be right, but this does not contradict the idea
that the longest possible length for the object is in its rest frame – since we are considering
measuring two different objects.
When we measure an object, it means that we determine the coordinates of the two ends
of the object simultaneously (at exactly the same time). We can consider the act of measuring
coordinates to be an ‘event’. Therefore, measuring the two ends of the object means there are
two events. Although the two events are simultaneous in one frame of reference, we will see
below that if they are separated in space, they will not be simultaneous in another frame
of reference that is moving with respect to the first frame. For example, if an object’s length
is measured in its rest frame (by taking the coordinates of the ends simultaneously in that
frame), the two events involved in the measurement are not simultaneous in any other frame,
so they are not a measurement of length in any other frame!
question
34.2 Look back at the previous section, where we used the time dilation formula to show
that the lifetime of the muon in the Earth frame was long enough that approximately
90% of the muons reach the surface of the Earth. Analyse the situation again in a
frame travelling at the same velocity as the muon (the muon’s rest frame), where the
half-life is 1.56 µs. From this frame, the distance the muons have to travel is length
contracted.
a Calculate the length-contracted distance that the muons have to travel.
b Use this length and the muon’s half-life of 1.56 µs to calculate the fraction of muons
that reach the surface of the Earth. This should be the same as the answer we
arrived at by considering the effect of time dilation on the muon’s lifetime.
Loss of simultaneity
[Diagram, two panels. A’s frame: light emitted at the centre of the carriage (length lA) travels at speed c in both directions and reaches the two ends of the carriage at the same time. B’s frame: the carriage (length lB) moves at velocity v; light emitted from a source in the centre of the carriage travels at speed c in both directions and reaches the rear of the carriage before it reaches the front.]
Figure S34.7 Loss of simultaneity: events which are simultaneous (have the same time coordinate)
in one frame are not simultaneous in a second frame that is moving relative to the first frame.
Here is another thought experiment. In Figure S34.7, observer A is once again in a train
carriage, moving at velocity v, relative to observer B in the Earth frame. In the centre of the
carriage is a light source, which emits a flash of light. In observer A’s frame, the flash of light
reaches the two ends of the carriage simultaneously – it travels at speed c and has to travel
an equal distance to each end. However, in observer B’s frame, the front of the carriage is
moving away from the point at which the light was emitted, and the back of the carriage
towards it. Since the light has to travel at speed c in B’s frame, it therefore reaches the back
of the carriage first. The two events – light reaching the front of the carriage and light
reaching the back of the carriage – which happen simultaneously in A’s frame, do not happen
simultaneously in B’s frame.
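The argument can be made concrete with a few lines of Python. This is a sketch only: it assumes the flash is emitted from the centre of a carriage whose length in B's frame is l_B, and it uses illustrative values (20 m and 0.6c) that are not from the text. The function name is hypothetical.

```python
C = 299_792_458.0  # speed of light in m/s


def arrival_times_in_B(l_B, v):
    """Times in B's frame for a flash from the carriage centre to reach each end."""
    t_rear = (l_B / 2) / (C + v)   # the rear wall moves towards the emission point
    t_front = (l_B / 2) / (C - v)  # the front wall moves away from it
    return t_rear, t_front


t_rear, t_front = arrival_times_in_B(l_B=20.0, v=0.6 * C)

# The two arrivals are simultaneous in A's frame but not in B's frame.
print(t_rear, t_front, t_front - t_rear)
```

The difference between the two times works out algebraically as l_B v/(c² − v²), which is zero only when v = 0.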
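[Diagram, two panels. Frame A: the carriage has length L; the light source is a distance L(c + v)/2c from the rear and L(c – v)/2c from the front. Event 1 is the emission of the light; the light reaching the front and the light reaching the back are separate events, with the back reached later. Frame B: the carriage has length LB and moves at velocity v; the source is LB(c + v)/2c from the rear and LB(c – v)/2c from the front, and the light reaches the front and the back simultaneously. The clocks show the time in frame A when the light hits the ends of the carriage; they are illuminated simultaneously in frame B.]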
Figure S34.8 Positioning the light source so that it illuminates the two ends of the carriages
simultaneously in the Earth (B’s) frame: in the carriage (A’s) frame, it now reaches the back later.
So if it illuminates a clock moving with the carriage (in A’s frame), the rear clock will be ahead
(show a larger reading) when illuminated.
Now, imagine that we position the light source further forward in the carriage, so that in B’s
frame the light now reaches both ends of the carriage simultaneously, and illuminates a clock
at each end. The clocks are synchronised in A’s frame. In A’s frame, the light takes longer to
reach the rear of the carriage, so when the clocks are illuminated by the light, the clock at the
rear of the carriage will be ahead of the clock at the front (that is, the time elapsed since it
was set will be larger) – see Figure S34.8.
If we continued to emit pulses of light, the rear clock would continue to be ahead, but
always by the same amount. The rate of passage of time on the two clocks is the same – the
rear clock is just ahead by a constant amount. This effect is therefore a completely different
effect to time dilation. From B’s perspective, the passage of time in the train carriage will
be slower, that is, it will be time dilated, but this time dilation affects both clocks equally. It
is worth restating this, as it is important: the fact that the rear clock is ahead by a constant
amount is unrelated to any time dilation effect. The effect we are dealing with here is called
loss of simultaneity. The clock at the rear is illuminated later after the emission of the light
in A’s frame, but both clocks are illuminated simultaneously in B’s frame. Since the clocks
show the time elapsed in A’s frame, when they are illuminated, the rear clock shows the
higher reading (is ahead).
Extension: Quantitative treatment of loss of simultaneity
With a bit of further consideration, we can work out how much the rear clock is ahead, in
a carriage of proper length L and moving at velocity v relative to B’s frame. A light source,
stationary in the carriage frame, emits photons. A photon travelling backwards approaches the
rear wall of the carriage at speed c + v in B’s frame. A photon travelling forward approaches the
front wall of the carriage at speed c – v. If we divide the train in this ratio in B’s frame, as shown in
Figure S34.8, then in B’s frame the photons will reach the walls simultaneously. The ratio is the
same in A’s frame, because length contraction contracts all lengths by the same factor. We can
work out the required position of the light source by knowing that the lengths are divided in this
ratio and must add up to L. Figure S34.8 shows the position required.
Now, the light travelling to the rear clock travels an extra distance of
$$\frac{L(c + v)}{2c} - \frac{L(c - v)}{2c} = \frac{Lv}{c}$$
in A’s frame. If we divide this by the speed of light, we get the extra time taken for the light to
travel to the rear of the carriage. So the rear clock is ahead by a time
$$\frac{Lv}{c^2}$$
We will refer to this difference in our analysis of the twin ‘paradox’. This effect also has nothing
to do with the travel time of light – there is a true difference between the time coordinates at the
two different locations in space.
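The steps of this extension can be followed numerically. The sketch below places the source so that the carriage is divided in the ratio (c + v) : (c – v) and then computes how far ahead the rear clock is. The carriage length (300 m) and speed (0.5c) are illustrative assumptions, and the function name is hypothetical.

```python
C = 299_792_458.0  # speed of light in m/s


def source_position_and_offset(L, v):
    """Return (distance to rear wall, distance to front wall, rear-clock lead).

    All quantities are measured in A's (the carriage) frame, where L is the
    proper length of the carriage.
    """
    d_rear = L * (C + v) / (2 * C)
    d_front = L * (C - v) / (2 * C)
    # Extra light-travel time to the rear wall, equal to L v / c^2.
    offset = (d_rear - d_front) / C
    return d_rear, d_front, offset


d_rear, d_front, offset = source_position_and_offset(L=300.0, v=0.5 * C)
print(d_rear, d_front, offset)
```

For these values the source sits three-quarters of the way along the carriage from the rear, and the rear clock leads by about half a microsecond.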
Relative versus absolute time and space

We have seen through the examples here that lengths and time intervals are not absolute –
the values they take depend on which frame they were measured in. We have also seen that
two events that are simultaneous in one frame of reference are not simultaneous in another
frame moving with non-zero velocity relative to the first frame, unless they take place at
the same coordinates on the axis along the direction of relative motion of the frames (they
could be separated by some distance perpendicular to the relative motion vector and still be
simultaneous in the two frames).
All of this work indicates that we must abandon any concept of absolute space or time. All
distance and time measurements depend on which frame you are in.
S34.5 The twin ‘paradox’

The twin ‘paradox’ is another classic thought experiment in special relativity. We put
‘paradox’ in inverted commas because it is not, in fact, a paradox – but we do need to think
carefully about why this is the case.
The set-up is this: two identical twins start on Earth, the same age. One takes a flight at a
speed v close to the speed of light to a nearby star, and returns. On his return, owing to the
time dilation effect, he has aged less and is younger than his twin who remained on Earth.
So far, so good. But from the perspective of the twin who is travelling, he could see himself
as stationary, and on the outward journey see his brother moving away from him at speed v,
and on the inward journey towards him at speed v. So he might try to argue that the brother
who remained on Earth should be the one who has aged less.
In fact, it is the brother who remained on Earth who is older when the travelling brother
returns. The problem with reasoning the ‘paradox’ the other way round, i.e. the argument
that the brother who stays on Earth is younger, is that only the twin on Earth remains in the
same inertial frame for both legs of the trip. The travelling twin switches from a frame that
is moving away from Earth at speed v, to a frame that is moving towards Earth at speed v.
The time dilation result only applies to one inertial frame. It turns out that if we analyse the
situation carefully, in the turn around and switch to the new inertial frame, the Earth clock
suddenly jumps ahead. This is related to the ideas of loss of simultaneity that we have been
discussing. On both the outward and return legs, the travelling twin ‘sees’ time passing more
slowly in the Earth frame (he could determine this from a transmission from Earth, taking
into account the effects of the time a radio signal would take to reach him), but the change of
frame on turn-around means that in the end, the Earth-bound twin is older.
The sudden ‘jump’ in the Earth clock is an effect of the change in inertial frames, and is
not an effect of the acceleration (although if we also tried to take the required acceleration
into account, it would get more complicated to calculate, as we must introduce the general
theory of relativity into the argument).
We can set the experiment up differently to avoid having to include the effects of the
acceleration. As a spaceship containing a clock travels at velocity v past the Earth, it
synchronises its clock with a clock on the Earth. It travels to a nearby star, maintaining its
velocity. When it gets there, another ship, with velocity v in the opposite direction passes it,
heading for Earth. As they pass, they synchronise their clocks. When the second spaceship
passes the Earth, they compare the reading on its clock and the clock that remained on
Earth. More time has passed on the Earth clock. This scenario gives us the same change in
frame as in the classic ‘twin paradox’, but without the acceleration.
Extension: Quantitative treatment of the twin ‘paradox’
Let’s analyse what happens more quantitatively, in the situation where the spacecraft is
travelling at 3c/5, to a star 4 light years (ly) away, Alpha Centauri. (You do not need to be
able to remember the steps of this worked example; it is included to give you another way of
understanding the twin ‘paradox’.)
The γ-factor for 3c/5 is:
$$\gamma = \frac{1}{\sqrt{1 - \left(\frac{3}{5}\right)^2}} = \frac{5}{4}$$
In the Earth frame, the return journey distance is 8 ly, so a journey at the speed of light would
take 8 years. At 3c/5, the journey takes
$$\frac{5}{3} \times 8 = \frac{40}{3}\ \text{years}$$
This is the time that will elapse on the clock that is left on Earth during the journey. As the
outgoing spaceship reaches the ‘turn-around’ point, its clock is synchronised with the incoming
spaceship’s clock while they’re at the same point in space (avoiding any problems from lack
of simultaneity). From the point of view of the observer on Earth, the clock on the spaceship
is time-dilated. The outgoing and incoming journeys take the same time (in both frames), and
therefore the total time elapsed on the spaceship clock as it returns to Earth is
$$\frac{1}{\gamma} \times \frac{40}{3}\ \text{years} = \frac{32}{3}\ \text{years}$$
Now let’s look at what happens in the spaceships’ frames. In those frames, the distance that the
spaceship needs to travel in each direction is length contracted. The distance to Alpha Centauri
is therefore, in this frame:
$$\frac{4\ \text{ly}}{\gamma} = \frac{16}{5}\ \text{ly}$$
The travel time to Alpha Centauri, in the astronaut’s frame, is:
$$\frac{\frac{16}{5}\ \text{ly}}{\frac{3}{5}c} = \frac{16}{3}\ \text{years}$$
Since the return journey will take the same amount of time, we can already see that this matches
up with our calculation in the Earth frame: the clock on the spaceship will have advanced by
32/3 years.
Now, let’s look at what happens to the clock on Earth, in the ships’ frames. During the
outward journey, the astronaut sees the Earth’s clock as running slow (reads less), due to time
dilation. So as he arrives at Alpha Centauri, the Earth clock reads
$$\frac{16}{3}\ \text{years} \times \frac{1}{\gamma} = \frac{16}{3} \times \frac{4}{5}\ \text{years} = \frac{64}{15}\ \text{years}$$
Now, imagine a clock on Alpha Centauri which was synchronised with the Earth clock at the time
the journey started. From the astronaut’s point of view during the outward journey, the Alpha
Centauri clock is the ‘rear clock’ (look back to our analysis of loss of simultaneity). So it is ahead
of the Earth clock by a constant amount Lv/c². So on arrival at Alpha Centauri, the Alpha Centauri
clock reads:
$$\frac{64}{15}\ \text{years} + \frac{Lv}{c^2} = \frac{64}{15}\ \text{years} + 4 \times \frac{3}{5}\ \text{years} = \frac{100}{15}\ \text{years}$$
Now, when the incoming ship arrives, it is in an inertial frame moving in the opposite direction to
the original outgoing ship. It is also at the same spatial location as the Alpha Centauri clock, so
it must see the same reading on that clock as the outgoing ship. However, from its point of view,
now the Earth clock is the rear clock (as Alpha Centauri is moving away at the front of the ‘train’).
So, in this change of frames, instantaneously the Earth clock advances by Lv/c². The reading on
the Earth clock from the point of view of the ship has now become:
$$\frac{100}{15}\ \text{years} + \frac{Lv}{c^2} = \frac{100}{15}\ \text{years} + 4 \times \frac{3}{5}\ \text{years} = \frac{136}{15}\ \text{years}$$
Now, on the return journey, the Earth clock is again time-dilated, so running slow from the
astronaut’s point of view. During the journey it advances the same amount as it did in the
outward journey, 64/15 years. Therefore the reading on the Earth clock, as the spaceship arrives
at Earth, is:
$$\frac{136}{15}\ \text{years} + \frac{64}{15}\ \text{years} = \frac{200}{15}\ \text{years} = \frac{40}{3}\ \text{years}$$
This is the same as our calculation in the Earth frame. All is consistent, and the clock on the
spaceship has advanced less than the clock on Earth. There is, indeed, no paradox!
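The arithmetic of this worked example can be reproduced exactly using rational numbers. The Python sketch below works in units where c = 1 ly per year and follows the same steps in the same order; the variable names are hypothetical and are chosen only for this illustration.

```python
from fractions import Fraction

beta = Fraction(3, 5)   # v/c for the spaceships
distance = Fraction(4)  # Earth to Alpha Centauri in the Earth frame (ly)
gamma = Fraction(5, 4)  # gamma-factor for v = 3c/5

# --- Earth frame ---
earth_total = 2 * distance / beta  # time elapsed on the Earth clock
ship_total = earth_total / gamma   # time elapsed on the (time-dilated) ship clock

# --- Ships' frames ---
leg_ship = (distance / gamma) / beta      # travel time for one leg
earth_advance_per_leg = leg_ship / gamma  # Earth clock runs slow during each leg
offset = distance * beta                  # L v / c^2, in years

alpha_cen_reading = earth_advance_per_leg + offset  # 'rear clock' is ahead
earth_after_switch = alpha_cen_reading + offset     # Earth is now the rear clock
earth_on_return = earth_after_switch + earth_advance_per_leg

print(ship_total, earth_total, earth_on_return)
```

Both routes give the same reading for the Earth clock on return, which is the consistency the worked example sets out to show.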
S34.6 Experimental evidence supporting relativity

We have already discussed a number of pieces of experimental data that support special
relativity:
• The null result of the Michelson-Morley experiment
• Measurement of the speed of photons emitted by neutral pion decay
• The time-dilated lifetime of muons generated by cosmic rays in the lower atmosphere.
Usually we move at speeds where the effects of relativity are virtually unnoticeable. However,
atomic clocks are accurate enough to measure time dilation at the speed that jet airliners
travel. Hafele and Keating did an experiment in 1971 where they flew four caesium atomic
beam clocks around the world on scheduled airline flights, both eastwards and westwards.
They found that the results were consistent with the predictions of relativity to within the
experimental error. They needed to take both special and general relativity into account, as
at altitude, the gravitational field is weaker. Their paper states that ‘these results provide an
unambiguous empirical resolution of the famous clock “paradox” with macroscopic clocks’.
What they refer to as the ‘clock “paradox”’ is what we have called the ‘twin “paradox”’.
Similar, more accurate, experiments conducted later have also confirmed the predictions of
relativity.
The Global Positioning System (GPS), used for satellite navigation, relies on accurate
timing to determine your position on the Earth. The GPS satellites also use atomic clocks,
and these must be corrected for the effects of relativity.
S34.7 Some hints to remember how to apply relativistic effects
Remember that the γ-factor is always greater than or equal to 1.
$$\gamma = \frac{1}{\sqrt{1 - \frac{v^2}{c^2}}}$$
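To get a feel for the size of the γ-factor, it can help to evaluate it at a few speeds. The sketch below does this in Python; the airliner speed of 250 m/s is an assumed typical value, not a figure from the text, and the function name is hypothetical.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def gamma(v):
    """gamma-factor for a speed v (in m/s); valid for 0 <= v < c."""
    return 1.0 / math.sqrt(1.0 - (v / C) ** 2)


speeds = [
    ("jet airliner (assumed 250 m/s)", 250.0),
    ("0.1c", 0.1 * C),
    ("0.6c", 0.6 * C),
    ("0.98c", 0.98 * C),
]

for label, v in speeds:
    # Print gamma - 1 so that tiny departures from 1 remain visible.
    print(f"{label:>32}: gamma - 1 = {gamma(v) - 1:.3e}")
```

At everyday speeds γ differs from 1 by less than one part in a million million, which is why atomic clocks are needed to detect the effect.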
Time dilation
Moving clocks run slow – less time elapses between events in a frame that is moving with
respect to you. So if frame A is moving at velocity v with respect to frame B, then more time
elapses between events in frame B – so the γ-factor must multiply ∆tA:
$$\Delta t_B = \gamma \Delta t_A$$
Length contraction
Moving objects are measured as being shorter – an object is longest in its rest frame. If
an object is stationary in frame A, with length lA in that frame, and frame A is moving with
velocity v with respect to frame B, the object will be measured as having a shorter length in
frame B:
$$l_B = \frac{l_A}{\gamma}$$
Don’t forget, though, that it’s equally valid for observer A, for whom frame B is moving
at velocity –v, to say that lengths in frame B are length contracted. So the equation is
equally valid with A and B exchanged: but in this case we are measuring, in frame A, an object
which is stationary in frame B, so there is no contradiction!
Loss of simultaneity
Rear clock ahead – if you observe two clocks separated in space that are both in the same
inertial frame, which itself is moving relative to you, then the rear clock (the one that would
pass you second if they were approaching) is a constant amount ahead (whenever you
observe them). This comes about because two observations that are simultaneous in your
frame are not simultaneous in the frame that is moving relative to you. Often, apparent
paradoxes in relativity can be answered by considering the loss of simultaneity.
Traditional notation
In many relativity textbooks, you will often see the transformations expressed between a
primed frame (Δx', Δy', Δz', Δt') and an un-primed frame (Δx, Δy, Δz, Δt). Conventionally,
the primed frame is the frame moving with velocity v along the x-axis with respect to the un-
primed frame. Often, books also drop the Δ (but it is implicitly there).
So this means that we can write our time-dilation and length contraction effects in the
following form:
$$\Delta t' = \gamma \Delta t$$
$$l' = \frac{l}{\gamma}$$
EXTENSION: Lorentz transformations
If we combine our knowledge of all of these effects and the conditions under which they apply,
we can write down coordinate transformations for going from one frame to another. This is the
relativistic equivalent of the Galilean transformation we discussed initially. Using the prime/
un-primed frame notation, the transformations are:
$$\Delta x = \gamma\left(\Delta x' + v\,\Delta t'\right)$$
$$\Delta t = \gamma\left(\Delta t' + \frac{v\,\Delta x'}{c^2}\right)$$
You do not need to remember these now, but they are presented for completeness. They allow
us to work with events that do not fit the restrictions that we built into our derivations of time
dilation, length contraction and loss of simultaneity – i.e. cases where we would expect a
combination of these effects.
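One way to see that these transformations contain the earlier results is to feed them the special cases used in the derivations. The Python sketch below works in units where c = 1, with an illustrative speed of 0.6c and a carriage of proper length 1; these values and the function names are assumptions made for the example.

```python
import math

C = 1.0  # work in units where c = 1


def gamma(v):
    return 1.0 / math.sqrt(1.0 - v ** 2 / C ** 2)


def to_unprimed(dx_p, dt_p, v):
    """Transform an interval (dx', dt') in the primed frame to (dx, dt)."""
    g = gamma(v)
    dx = g * (dx_p + v * dt_p)
    dt = g * (dt_p + v * dx_p / C ** 2)
    return dx, dt


v = 0.6

# Time dilation: two ticks of a clock at rest in the primed frame (dx' = 0).
_, dt_ticks = to_unprimed(0.0, 1.0, v)

# Carriage of proper length L at rest in the primed frame. Take the two
# 'clock illuminated' events, with the rear clock ahead by L v / c^2.
L = 1.0
dx_ends, dt_ends = to_unprimed(L, -v * L / C ** 2, v)

print(dt_ticks)  # gamma times the primed interval
print(dt_ends)   # zero: the two events are simultaneous in the un-primed frame
print(dx_ends)   # L / gamma: the contracted length
```

The second case shows loss of simultaneity and length contraction appearing together from a single pair of events.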
Summary
■ In the late 19th century, most physicists thought that electromagnetic waves travelled
in a medium that they called the aether. However, experiments to measure the
variation in the speed of light due to Earth’s motion through the aether all yielded
null results.
■ Einstein’s two postulates of relativity are:
■ The laws of physics are the same in all inertial frames of reference (frames of
reference/coordinate systems moving at a constant velocity with respect to each
other).
■ The speed of light in free space has the same value c in all inertial frames of
reference.
■ Einstein’s postulates of relativity dispense with the idea of the aether – light does not
require a medium in the same way as a mechanical wave.
■ The postulates of relativity give rise to time dilation and length contraction: space
and time are no longer absolute quantities: distances and times between events
change depending on which inertial frames they are measured in.
■ Time dilation: $\Delta t' = \gamma \Delta t = \dfrac{\Delta t}{\sqrt{1 - \frac{v^2}{c^2}}}$
■ Length contraction: $l' = \sqrt{1 - \dfrac{v^2}{c^2}}\; l$
■ Two events that are simultaneous in one frame of reference may not be
simultaneous in another frame of reference.
end-of-chapter questions
S34.1.
a What does Einstein’s special theory of relativity state about the laws of physics? [1]
b What does Einstein’s special theory of relativity state about the speed of light? [1]
c Filippas and Fox conducted an experiment to test special relativity. They measured the speed of the
gamma rays emitted when a particle called a neutral pion decays into a pair of gamma rays. The
gamma rays are emitted in opposite directions, and there are no other products of the decay.
i Explain why the gamma rays are expected to travel at the speed of light. [1]
ii Explain why a stationary pion could not decay to a single gamma ray photon. [2]
d The pions used in the experiment in (c) were moving at a speed of 0.20 c in the laboratory frame of
reference. The gamma rays were emitted parallel to the motion, as shown in the diagram below.
[Diagram: a neutral pion with velocity 0.20c; one γ-ray photon is emitted forward (in the direction of the pion’s velocity) and one backward.]
i The results of the experiment showed that the velocities of the photons relative to the laboratory
were equal to c in both directions, to within the limits of the experimental uncertainty. What
conclusion can be drawn from this? [1]
ii What is the velocity of the forward photon relative to the pion, i.e. seen from a reference frame
moving with the same velocity as the pion when it decays? [1]
iii The momentum of a photon is related to its energy by the formula E = pc. What can be said about
the frequency of the two photons emitted in this decay? [4]
iv In the laboratory, the half-life for the decay of a stationary neutral pion is 0.18 ns. Calculate the
half-life of the pion when it is moving at 0.2 c. [2]
S34.2.
The principle of relativity states that the laws of physics are the same for all uniformly moving observers.
a State what is meant by uniformly moving. [1]
b What does this imply about c, the speed of light in a vacuum? [1]
c Explain what is meant by time dilation (it is not necessary to derive any formulae). [3]
d A muon has a mean lifetime of 2.2 µs when it is stationary in the laboratory. Sketch a graph to show
how the particle’s observed lifetime in the laboratory depends on its velocity through the laboratory.
Label your graph carefully. [4]
S34.3.
One of the consequences of special relativity is that if an astronaut were to take a lengthy journey at
speeds close to the speed of light, leaving and returning to the Earth, for her the journey would take a
relatively short length of time, but several generations may have passed on Earth.
a Using your knowledge of special relativity, explain the statement above. [3]
b The total distance travelled on such a trip is 50 light years and the astronaut travels at a speed of 0.98c.
i Calculate how much time has passed during the journey for the people remaining on Earth.
ii Calculate how much time has passed for the astronaut during her journey. [3]
c Explain why this could be considered to be an example of time travel. [2]
d The calculations you have done are in the frame of reference of the Earth. Explain why it would not be
justified to carry out the same analysis in the same way from the reference frame of the astronaut. [2]
S34.4

The Global Positioning System is used for satellite navigation. GPS receivers pick up and compare time
signals from orbiting satellites and use them to calculate positions relative to the satellites. In order for
the system to work accurately, the clocks on board the satellites must be corrected for two relativistic
effects that affect the rate of the atomic clocks.
The first effect comes from the special theory of relativity, and arises because the clock is in motion
with respect to an observer on Earth.
a Explain why a ‘moving’ clock runs slow compared to a clock at rest beside the observer. Ignore the
effects of gravity. You may wish to use a diagram in your answer. [4]
b The satellite’s relative velocity is typically 3.5 × 10³ m s⁻¹. Calculate the time that the clock ‘loses’ each
second due to time dilation. You may wish to use the following approximation to the time dilation
equation:
$$t' = \frac{t}{\sqrt{1 - v^2/c^2}} \approx t\left(1 + \frac{v^2}{2c^2}\right)$$
[3]
c How long would it take to accumulate an error of 100 m in position (given that the signals travel at
the speed of light), if this error were not corrected for? [2]
(The second effect on the clock comes from the general theory of relativity, and is due to gravitational
time dilation.)
S34.5
Two trains, A and B, each have proper length (length in their rest frame) L, and move in the same direction.
A’s speed is 4c/5, and B’s speed is 3c/5. A starts immediately behind B (see diagram below).
[Diagram: train A, moving at 4c/5, is immediately behind train B, moving at 3c/5 in the same direction; person C stands on the ground.]
a How long, as viewed by person C on the ground, does it take for A to overtake B? This is the time
elapsed between them being in the position shown in the diagram until the back of A is level with the
front of B. [6]
b Explain why we cannot use the time dilation result to calculate the time taken for the trains to
overtake in A’s frame (or B’s frame). [3]
S34.6
Two painters stand on a train platform, a distance L apart. As a train passes by at speed v, both painters
simultaneously (in the platform frame) make a mark with their brushes on the train. Due to the length
contraction of the train, we know that the marks on the train are a distance γ L apart when viewed in
the train’s frame of reference, because this distance is the distance that is length contracted down to a
distance L in the platform frame.
a How would someone on the train qualitatively explain why the marks are a distance γL apart, even
though in their frame the painters stood a distance of L/γ apart? [2]
b Can you explain part (a) quantitatively (harder!)? [5]
S35: Astronomy and cosmology
Learning Outcomes
■ understand the terms luminosity and luminous flux
■ recall and use the inverse square law for flux F = L/(4πd²)
■ understand the need to use standard candles to help determine distances to galaxies
■ recognise and use Wien’s displacement law λmax ∝ 1/T to estimate the peak surface
temperature of a star either graphically or algebraically
■ recognise and use Stefan’s law for a spherical body L = 4πr²σT⁴
■ use Wien’s displacement law and Stefan’s law to estimate the radius of a star
■ understand that the successful application of Newtonian mechanics and gravitation to the
Solar System and beyond indicated that the laws of physics apply universally and not just
on Earth
■ recognise and use Δλ/λ ≈ Δf/f ≈ v/c for a source of electromagnetic radiation moving relative
to an observer
■ state Hubble’s law and explain why galactic redshift leads to the idea that the Universe is
expanding and to the Big Bang theory
■ explain how microwave background radiation provides empirical support for the Big Bang
theory
■ understand that the theory of the expanding Universe involves the expansion of space-time
and does not imply a pre-existing empty space into which this expansion takes place or a
time prior to the Big Bang
■ recall and use the equation v ≈ H0d for objects at cosmological distances
■ derive an estimate for the age of the Universe by recalling and using the Hubble time t = 1/H0
S35.1 Introduction
Since ancient times, humans have sought to understand and explain what they have seen in
the night sky. Earlier we discussed how the ancient Greek geocentric (Earth centred) model
of the Universe gave way in the Renaissance to a heliocentric model consisting of elliptical
orbits. Empirically described by Kepler’s Laws, the elliptical orbits of the solar system were
explained by Newton’s theory of gravity. Newton’s theory, and the modifications made
by Einstein in his general theory of relativity, apply across the entire visible Universe. The
same physical laws that have been experimentally determined on Earth and within the
solar system can be seen to apply universally. Astronomical phenomena offer us a natural
laboratory, the observation of which allows us to test our physical theories under extreme
conditions and large scales not available in a laboratory on Earth.
S35.2 How bright is that star?

The total power radiated by a star is known as its luminosity, L (units, W). We cannot
measure its luminosity directly. However, we can measure the intensity of radiation received
from the star at the surface of the Earth, which is known as the luminous flux, F.
This is defined as the power per unit area of surface perpendicular to the radiation at a
distance d from the star, and has units W m−2 (it is an intensity).
We can relate the star’s luminosity and the luminous flux by the equation:
$$F = \frac{L}{4\pi d^2}$$
The flux follows an inverse square law. The equation assumes that all the radiation of the
star is spread out evenly in all directions. At a distance d, the total radiation from the star
is spread out over the surface of a sphere of radius d, which has area 4πd² (see Figure S35.1). This law means
that if we have a star of a known luminosity, and can measure the luminous flux on Earth,
we can work out how far away the star is. Alternatively, for some stars there are other ways
of determining the distance, in which case we can use the equation to determine the star’s
luminosity.
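The inverse square law can be used in both directions, as the following Python sketch illustrates. It uses the Sun's luminosity of 3.85 × 10²⁶ W quoted later in this chapter; the Earth–Sun distance of 1.496 × 10¹¹ m is an assumed value that does not appear in the text, and the function names are hypothetical.

```python
import math


def flux(luminosity_W, distance_m):
    """Luminous flux F = L / (4 pi d^2), in W m^-2."""
    return luminosity_W / (4 * math.pi * distance_m ** 2)


def distance_from_flux(luminosity_W, flux_W_m2):
    """Invert the inverse square law to find the distance d, in m."""
    return math.sqrt(luminosity_W / (4 * math.pi * flux_W_m2))


L_SUN = 3.85e26  # luminosity of the Sun (W)
AU = 1.496e11    # assumed mean Earth-Sun distance (m), not from the text

F_sun = flux(L_SUN, AU)
print(f"Flux at Earth: {F_sun:.0f} W m^-2")

# Going the other way: a known luminosity and a measured flux give the distance.
print(f"Recovered distance: {distance_from_flux(L_SUN, F_sun):.3e} m")
```

The flux at the Earth comes out at a little under 1.4 kW per square metre, and feeding that flux back in recovers the assumed distance.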
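[Diagram: a star of luminosity L at the centre of two concentric spheres of radii r1 and r2, with surface areas 4πr1² and 4πr2². The intensity at the inner sphere is L/(4πr1²) and at the outer sphere is L/(4πr2²).]
Figure S35.1 The relationship between luminosity and luminous flux.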
S35.3 The spectrum of stars

If you gradually increase the current through a filament light bulb, the filament heats up,
and you will notice that it begins to glow a deep, dull red colour. As the current is increased
further, the filament gets hotter and will glow a brighter orange-yellow. When the current is
sufficiently high, the filament produces a bright white light. It has gone from being ‘red hot’
to ‘white hot’. There are two effects we are seeing here:
• The intensity of the radiation emitted at all wavelengths increases as the temperature
increases, which is why the filament shines more brightly.
• If we plot the intensity of radiation against wavelength, we see a distribution of
wavelengths – a spectrum – with a peak at a particular wavelength. As the temperature
of the filament increases, the peak of the spectrum moves to shorter wavelengths (higher
frequencies), which is why the colour changes.
Black body radiation

A ‘black body’ is a term used to describe an object that can absorb electromagnetic radiation
evenly at all wavelengths. This type of ‘perfect object’ does not exist in reality. It is an
idealised object used as a device in physical theories, much as we use the concept of an
ideal gas to develop our thinking about kinetic theory. If such a body is allowed to come
into thermal equilibrium with its surroundings, so that it reaches a constant temperature,
then it is also an ideal emitter of radiation, emitting that radiation equally in all directions.
The spectrum of the radiation that is emitted from this black body follows Planck’s law,
which means that its spectrum only depends on its temperature (see Figure S35.2). At room
temperature, the spectrum of a black body peaks in the infrared, so to a human eye the
object would appear matt black at visible wavelengths.
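[Graph: intensity (arbitrary units, 0 to 10) against wavelength in µm (0 to 3.0), with the ultraviolet, visible and infrared regions marked. Black body curves are shown for T = 3000 K, 4000 K, 5000 K and 6000 K, with the peak wavelength λmax marked on the curves.]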
Figure S35.2 The spectrum of a black body at various temperatures.
It may seem surprising, but the spectrum of a filament bulb as it is heated, and the spectrum
of a star, are close to that of an ideal black body, even though they are not in thermal
equilibrium with their surroundings. The black body spectrum is a good first approximation
to the spectrum of these objects. (The observed spectrum closest to a perfect black body
spectrum is that of the cosmic microwave background radiation, which we will discuss later.)
The temperature of the black body spectrum that most closely matches a star’s spectrum
is known as the effective temperature of the star. This temperature is a good estimate for the
peak surface temperature of a star; the star will be hotter inside. We can estimate the surface
temperature of a star by using Wien’s displacement law, which relates the wavelength at the
maximum of a black body spectrum to the temperature of the black body.
$$\text{wavelength of maximum (m)} = \frac{\text{Wien's displacement constant (m K)}}{\text{absolute temperature (K)}}$$
$$\lambda_{\max} = \frac{B}{T}$$
where Wien’s displacement constant B = 2.898 × 10⁻³ m K
Wien’s law was developed by Wilhelm Wien several years before Max Planck derived the
general form of the black body spectrum.
WORKED EXAMPLE
Estimate the surface temperature of a red-orange star with a spectrum that peaks at 700 nm.
$$T = \frac{B}{\lambda} = \frac{2.898 \times 10^{-3}\ \text{m K}}{700 \times 10^{-9}\ \text{m}} = 4100\ \text{K}$$
The estimated surface temperature is 4100 K (2 s.f.).

This enables us to classify stars according to their colour. Red stars are (relatively!) cool,
with surface temperatures of around 3000 K. Yellow stars such as our Sun have surface
temperatures closer to 6000 K. Some blue stars will have surface temperatures greater than
20 000 K (in fact the peak of their spectrum falls in the ultraviolet).
Calculating luminosity from the temperature

The Stefan–Boltzmann law gives us the power radiated per unit area (I) for a black body:
\[
I = \sigma T^4
\]
where $\sigma = 5.7 \times 10^{-8}$ W m$^{-2}$ K$^{-4}$ is the Stefan constant.
If we multiply this power per unit area by the surface area of the star, we get the luminosity,
L, for the star. For a star of radius r, the luminosity is:
\[
L = 4\pi r^2 \sigma T^4
\]
This equation has some important consequences.

• Doubling the radius of the star increases luminosity by a factor of four.
• Doubling the temperature increases the luminosity by a factor of 16.
When we increase the temperature, we increase both the intensity of the emitted radiation
and the frequency at which the spectrum peaks (remember that increased frequency means
shorter wavelength). This means that the emitted light is more intense, and its photons
carry more energy on average because of the increased frequency.
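The two scaling rules above can be checked with a few lines of Python. This is a minimal sketch: the radius and temperature of the reference star are arbitrary, hypothetical values chosen only so that ratios can be formed, and the function name is our own.

```python
import math

SIGMA = 5.7e-8  # Stefan constant in W m^-2 K^-4 (value quoted in the text)


def luminosity(radius_m, temperature_k):
    """Luminosity (W) of a spherical black body: L = 4 pi r^2 sigma T^4."""
    return 4 * math.pi * radius_m**2 * SIGMA * temperature_k**4


# Hypothetical reference star (illustrative numbers only).
r0, t0 = 7.0e8, 5800.0
base = luminosity(r0, t0)

ratio_double_radius = luminosity(2 * r0, t0) / base
ratio_double_temperature = luminosity(r0, 2 * t0) / base

print(ratio_double_radius)       # close to 4
print(ratio_double_temperature)  # close to 16
```

Because only ratios are printed, the result does not depend on the values chosen for the reference star.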
WORKED EXAMPLE
The spectrum of Sirius A (the brightest star in the night sky) has its maximum at 292 nm. Its
luminosity is 25.4 times the luminosity of our Sun, which has a luminosity of $3.85 \times 10^{26}$ W. Use
this data to estimate the radius of Sirius A.
Step 1 Use Wien’s displacement law to estimate the surface temperature of the star:
\[
T = \frac{B}{\lambda_{\max}} = \frac{2.898 \times 10^{-3}\ \text{m K}}{292 \times 10^{-9}\ \text{m}} = 9920\ \text{K}
\]
Step 2 Use Stefan’s Law for a spherical body to calculate the radius:
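\[
L = 4\pi r^2 \sigma T^4
\]
\[
r = \sqrt{\frac{L}{4\pi\sigma T^4}} = \sqrt{\frac{25.4 \times 3.85 \times 10^{26}\ \text{W}}{4\pi \times 5.7 \times 10^{-8}\ \text{W m}^{-2}\,\text{K}^{-4} \times (9920\ \text{K})^4}} = 1.2 \times 10^{9}\ \text{m}
\]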

For comparison, the radius of the Sun is $6.96 \times 10^{8}$ m, so the radius of Sirius A is
approximately 1.7 times larger.
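The two steps of this worked example can be combined into one short calculation. The Python sketch below is illustrative only (the function name is our own) and uses the constants quoted in the text.

```python
import math

WIEN_B = 2.898e-3  # Wien's displacement constant in m K (from the text)
SIGMA = 5.7e-8     # Stefan constant in W m^-2 K^-4 (from the text)
L_SUN = 3.85e26    # luminosity of the Sun in W (from the text)


def radius_from_peak_and_luminosity(peak_wavelength_m, luminosity_w):
    """Return (temperature in K, radius in m) for a star treated as a black body."""
    # Step 1: Wien's displacement law gives the surface temperature.
    temperature = WIEN_B / peak_wavelength_m
    # Step 2: rearrange L = 4 pi r^2 sigma T^4 for r.
    radius = math.sqrt(luminosity_w / (4 * math.pi * SIGMA * temperature**4))
    return temperature, radius


t_sirius, r_sirius = radius_from_peak_and_luminosity(292e-9, 25.4 * L_SUN)
print(f"T = {t_sirius:.0f} K, r = {r_sirius:.2e} m")  # roughly 9900 K and 1.2e9 m
```

Note that the code carries the unrounded temperature into step 2, so its radius may differ slightly in the third significant figure from a calculation that uses the rounded value of 9920 K.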
QUESTIONS
35.1 The background radiation, a remnant from the Big Bang, has the spectrum of thermal
radiation from a black body at a temperature of 2.7 K.
a Calculate the peak wavelength of this spectrum.
b What region of the electromagnetic spectrum does this peak wavelength belong to?
35.2 Mintaka is a star system at a distance of 1200 light years from Earth, in the
constellation Orion. One of the component stars is a class O star with a surface
temperature of 29 500 K. Its luminosity is 190 000 times the luminosity of our Sun
(which has a luminosity of $3.85 \times 10^{26}$ W).
a Calculate the peak wavelength in this star’s spectrum.
b Calculate an estimate for the radius of the star, using the Stefan–Boltzmann law.
35.3 Using the data for the Mintaka star in question 35.2, determine how far it would have
to be from the Earth for the luminous flux of radiation arriving from it to be equal to the
luminous flux from the Sun. Leave your answer in terms of the mean distance between
the Earth and the Sun, which is called 1 AU (astronomical unit). (1 AU = $1.496 \times 10^{11}$ m)
EXTENSION: Hertzsprung–Russell diagram
According to the Stefan–Boltzmann law, a star could be very luminous either because its radius
is very large, or because it is very hot, or a combination of these two factors. Astronomers
classify different types of stars into categories using the spectral class, a classification system
based on the elements observed in a star’s absorption spectrum (see Chapter 30), which is
closely connected to the temperature of a star’s outer layers. Astronomers have observed
that there is a clear relationship between the spectral class of a star and its luminosity. This
relationship is shown in a plot known as the Hertzsprung–Russell diagram (named after the
two scientists who independently discovered this relationship). The diagram is shown in Figure
S35.3. The y-axis of the diagram is the luminosity of the star, relative to the Sun, on a logarithmic
scale. On the x-axis is effective surface temperature, also on a logarithmic scale.
You may also see this diagram presented in terms of the stars’ magnitudes. Astronomers often
describe the luminosities of stars in terms of magnitudes. The apparent magnitude is related
to the luminous flux (the brightness as it appears in the sky), while the absolute magnitude is
related to the luminosity (the total power output of the star). Magnitudes are expressed on a
logarithmic scale, and the lower the absolute magnitude, the more luminous the star.
The most notable feature of the Hertzsprung–Russell diagram is the main sequence, along
which luminosity rises with surface temperature. Our Sun is currently on the main sequence,
and is labelled on the diagram. The relationship between luminosity and mass for a main
sequence star can be modelled by an approximate power law
\[
\frac{L}{L_\text{Sun}} = \left( \frac{M}{M_\text{Sun}} \right)^{3.5}
\]

where L is the luminosity and M is the mass. Conventionally, we have divided each by the
value for the Sun, as often values of these quantities are quoted in terms of the Sun’s luminosity
and mass. Note that if we look at a particular part of the main sequence, we can work out the
particular power law for that type of star, which will fit the particular trend for that part of the
main sequence more precisely than this approximate power law.
Inside the core of a star on the main sequence, hydrogen nuclei fuse together to form helium
nuclei. This is a nuclear fusion reaction (see Chapter 31) that generates huge amounts of thermal
energy. This energy spreads outwards, creating a thermal pressure outwards from the
core, which balances the gravitational pressure caused by the mass of the star pulling inwards.
This balance of forces means that the star maintains a particular radius.
The gravitational pressure is greater for a star with larger mass, so more massive stars
are hotter – a greater thermal pressure is needed to balance this gravitational pressure. The
more massive star generates more power, and so has a larger luminosity. Since the luminosity
increases as approximately $M^{3.5}$, but the amount of fuel a star has for fusion depends on its mass,
it follows that more massive stars burn their fuel more quickly and have a shorter lifetime on the
main sequence.
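The argument about lifetimes can be made quantitative with a rough sketch. The Python example below assumes, for illustration only, that the fuel available is proportional to the star’s mass and that the star shines at a constant luminosity given by the approximate power law, so that lifetime is proportional to M/L. The function names are our own, and real stellar lifetimes depend on more than this simple scaling.

```python
def relative_luminosity(mass_solar, exponent=3.5):
    """L / L_Sun from the approximate main sequence power law (M / M_Sun)^3.5."""
    return mass_solar**exponent


def relative_lifetime(mass_solar, exponent=3.5):
    """Main sequence lifetime relative to the Sun, assuming lifetime is proportional to M / L."""
    return mass_solar / relative_luminosity(mass_solar, exponent)


for mass in (0.5, 1.0, 2.0, 10.0):
    print(f"M = {mass:4.1f} M_Sun: "
          f"L = {relative_luminosity(mass):8.2f} L_Sun, "
          f"lifetime = {relative_lifetime(mass):.4f} x Sun's")
```

Under these assumptions the lifetime scales as M to the power −2.5, so a star of ten solar masses would last only a fraction of a percent of the Sun’s main sequence lifetime.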
Stars in other regions of the diagram exist under different conditions. Above and to the right
of the main sequence are the red giant stars. These are very luminous, but comparatively cool.
The Stefan–Boltzmann law tells us that a cooler star emits much less power per unit area. In
order to be more luminous than stars on the main sequence, a red giant must therefore have
a much larger radius. A red giant is typically formed when a main sequence star of average
mass has used up the supply of hydrogen in its core. The star is now fusing hydrogen in a shell
surrounding the core. The core has contracted under gravity, bringing this additional shell of
hydrogen into a zone where it can undergo fusion. The temperature is higher and the reaction
rate of nuclear fusion is increased, increasing the star’s luminosity. This causes the outer layers
of the star to expand greatly, but because the radius is much larger, the surface temperature of
the star drops.
The supergiants evolve from more massive stars on the main sequence. A supergiant is
massive enough that when it runs out of hydrogen in its core, the additional gravitational forces
almost immediately cause helium nuclei to fuse in the core. This means that the luminosity does
not increase in the same way as a red giant star, and so supergiants move horizontally across the
Hertzsprung–Russell diagram.
Below the main sequence are the white dwarfs. These are hot stars (by their colour) but they
are not particularly luminous. A white dwarf is typically an older star that no longer produces
energy by nuclear fusion; the luminosity comes from stored thermal energy. Since there is
no longer any outward pressure from the nuclear reactions, a white dwarf contracts until it
reaches a state in which the inward gravitational pressure is balanced by electron degeneracy
pressure. This is a quantum mechanical effect, and occurs because each quantum state can
only contain one electron.
[Figure S35.3 plot: luminosity compared to the Sun (logarithmic scale from $10^{-5}$ to $10^{6}$) against surface temperature (in degrees, decreasing from 30 000 to 3 000), showing the main sequence, giants, supergiants, white dwarfs and the position of the Sun.]
Figure S35.3 The Hertzsprung–Russell diagram.
QUESTIONS
35.4 Why must a cool star be large in order to have a large luminosity?
35.5 Explain why a very massive star on the main sequence is likely to have a large
luminosity. Why is it likely to have a very short life?
35.6 Why is there a lower limit to the mass of a star?
35.7 Our Sun is approximately 300 times more luminous than 40 Eridani B, a white
dwarf star (the first to be discovered). The mass of 40 Eridani B is approximately half
that of the Sun. Does 40 Eridani B obey the mass–luminosity relationship for the main
sequence?
S35.4 How far to the stars?

In order to work out how far away various objects in our Universe are, we need to draw on
a range of astronomical techniques. We build a cosmic distance ladder to take us from
the distances of nearby objects to work out (based on certain assumptions) the distance of
objects that are further away.
[Figure S35.4 diagram: the rungs of the cosmic distance ladder. Distance scales shown: Solar system ($10^{-4}$ ly), nearby stars ($10^{2}$ ly), Milky Way ($10^{5}$ ly), nearby galaxies ($10^{7}$ ly) and galaxy clusters ($10^{10}$ ly). Methods shown: radar ranging, parallax, main-sequence fitting, Cepheids, the Tully–Fisher relation, distant standards (white dwarf supernovae) and Hubble’s law, $d = v / H_0$.]
Figure S35.4 The cosmic distance ladder. We measure nearby objects using direct measurements,
such as parallax, and then use ‘standard candles’ to extend our distance scale to more distant objects.
The astronomical unit (AU)

The most important distance measurement when measuring distances within the solar
system, on which all of our other distance measurements are based, is the astronomical unit.
This is the average distance from Earth to the Sun. From Kepler’s third law (see Chapter
18), we can work out the distance to all of the planets in terms of the AU: so we know the
relative distances, but not the overall scale. If we can determine the distance to one of the
planets, though, we can determine the distance to all of them. Historically, the AU has been
determined by a method suggested by Halley in 1716 – observations of the transit of Venus
across the face of the Sun from two different points on the Earth’s surface. The orbit of Venus
is less than 1 AU in radius. Therefore occasionally, it is possible from Earth to observe Venus
as it passes between the Sun and the Earth, in an event called a transit. Observers at two
different points on the Earth see Venus follow slightly different paths across the Sun. Measuring
the angle subtended at the Earth between those paths, together with knowledge of the distance
between the two observation points, allows us to calculate the distance to Venus. The measurement
was first made in 1761, the first transit of Venus after Halley developed the method, but
unfortunately after his death. It led to a measurement of the AU that was respectably close to
our current best estimate. We now determine the distance using radar signals bounced off
Venus and received by radio telescopes.
In fact, in 2012 the astronomical unit was redefined to be exactly
149 597 870 700 m. This definition means the AU is no longer exactly the same as the mean
distance between the Earth and the Sun; it is based around other constants. However, the
original definition is important in order that we understand how other measurements of
distance have been based upon it.
EXTENSION: Proper motion of the stars
From the Earth, the stars appear fixed in place over long periods of time: we can observe the
same constellations as the ancient Greeks or Babylonians. However, the reality is that nothing
is fixed in space – everything moves relative to other objects. We orbit the Sun, the Sun moves
about the galactic centre, the other stars in the galaxy are moving relative to the Sun, and so on.
The stars are far enough away that although they are moving relative to us, on the timescales
that we observe them, these motions, termed ‘proper motions’ by astronomers, are very small
(but can be measured). For the closest star system, the Alpha Centauri system, these motions
are on the order of 1/1000th of a degree per year.
Although 1/1000th of a degree sounds tiny, remember that this difference is measured from
Earth according to the relative movement across the sky that we observe. Given the enormous
distances between Earth and the stars and galaxies we observe, that 1/1000th of a degree can
mean a very large distance has been moved by the object itself.
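To get a feel for the sizes involved, we can use the small-angle relation s = dθ (with θ in radians) to convert a proper motion into a distance moved across our line of sight. The Python sketch below is illustrative only. The distance to the Alpha Centauri system (taken here as about 4.37 light years) and the length of a light year in metres are assumed values that are not given in this section, and the function name is our own.

```python
import math

LIGHT_YEAR_M = 9.46e15  # metres in one light year (assumed value)
AU_M = 1.496e11         # metres in one astronomical unit (from the text)


def transverse_distance(distance_m, angle_deg):
    """Distance moved across the line of sight for a small angular shift: s = d * theta."""
    return distance_m * math.radians(angle_deg)


# Assumed distance to the Alpha Centauri system: about 4.37 light years.
d_alpha_cen = 4.37 * LIGHT_YEAR_M

# A proper motion of 1/1000th of a degree in one year.
s_per_year = transverse_distance(d_alpha_cen, 1 / 1000)
print(f"{s_per_year:.2e} m per year, or about {s_per_year / AU_M:.1f} AU per year")
```

On these assumptions the star system moves several astronomical units across our line of sight each year, even though the angle involved is tiny.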
Astronomical parallax
Once we have a measurement for the AU (the Earth’s mean orbital distance), we can use it
to measure the distance to nearby stars. The trigonometric or astronomical parallax is the
amplitude of annual shift in position of a star as the Earth moves around its orbit, measured
as an angle (see Figure S35.5).
[Figure S35.5 diagram: the Earth’s motion around the Sun (orbit radius 1 AU), a near star with parallax angle P, and the apparent parallax motion of the near star against the distant stars.]
Figure S35.5 Astronomical parallax
A star with a larger parallax is closer to the Earth than a star with a smaller parallax. You can
observe this easily for yourself. Try standing inside a building near a window and placing an
object on the windowsill. Now bring your eyes level with the object and look past it through
the window. If you move your head from side to side, the object will appear to move further
through your field of view than a distant object outside the window.
Often parallax is a small fraction of a degree, so we use the arcsecond as the basic unit
of measurement. Just as one hour of time is divided into 60 minutes, and each minute into
60 seconds:
1 degree of arc = 60 minutes of arc = 3600 seconds of arc.
EXTENSION: Seconds of arc and the parsec
A star that has a parallax of one second of arc is defined as being at a distance of one parsec
(pc). Therefore:
\[
\text{distance, } d\ (\text{pc}) = \frac{1}{\text{parallax, } p\ (\text{seconds of arc})}
\]

Using trigonometry (see Figure S35.5), we can work out how the parsec is related to the
astronomical unit:
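\[
\tan(p) = \frac{1\ \text{AU}}{d}
\]
\[
1\ \text{pc} = \frac{1\ \text{AU}}{\tan(p)} = \frac{1\ \text{AU}}{\tan(1'')} = 2.06 \times 10^{5}\ \text{AU} = 3.09 \times 10^{16}\ \text{m} = 3.26\ \text{light years (ly for short)}
\]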
The first successful use of this measurement to measure the distance to a star was by German
astronomer Friedrich Bessel in 1838, when he determined the distance to 61 Cygni to be 10.4 ly.
Current estimates place it at 11.4 ly. This method is limited to relatively close stars (up to 100 pc),
since as the stars get further away, the parallax gets too small to measure accurately (although it
can be averaged for clusters of stars that are close together).
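The definition of the parsec and the parallax formula can both be checked numerically. The Python sketch below is illustrative only: the function names are our own, and the parallax of 0.25 seconds of arc is a made-up value for a hypothetical star.

```python
import math

LY_PER_PC = 3.26  # light years in one parsec (from the text)


def parsec_in_au():
    """Distance, in AU, at which 1 AU subtends an angle of one second of arc."""
    one_arcsecond_rad = math.radians(1 / 3600)  # 1 degree = 3600 seconds of arc
    return 1 / math.tan(one_arcsecond_rad)


def distance_pc(parallax_arcsec):
    """Distance in parsecs from a parallax in seconds of arc: d = 1 / p."""
    return 1 / parallax_arcsec


print(f"1 pc = {parsec_in_au():.3e} AU")  # about 2.06e5 AU

# Hypothetical star with a parallax of 0.25 seconds of arc.
d = distance_pc(0.25)
print(f"d = {d:.1f} pc = {d * LY_PER_PC:.1f} ly")
```

The first result agrees with the value of $2.06 \times 10^{5}$ AU obtained from the trigonometry above.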
QUESTION
35.8 Calculate the distance to the following stars, given their trigonometric parallax:
a Proxima Centauri (our nearest star): 0.772 arc seconds
b Wolf 359: 0.419 arc seconds
c Alpha Cephei (Alderamin): 0.067 arc seconds
Standard candles
As we mentioned earlier, if we know the luminosity of an object and can measure its
luminous flux, then we can calculate the distance it is away from us. Astronomical objects
for which the luminosity is well known are described as standard candles, and they can be
used for distance measurements.
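The calculation behind every standard candle is the same. As a minimal sketch, assume the source radiates equally in all directions and that nothing absorbs the radiation on the way, so that the luminous flux F at distance d obeys the inverse square relation F = L / (4πd²). The Python functions below (whose names are our own) apply this relation in both directions, using the Sun as a check.

```python
import math

L_SUN = 3.85e26  # luminosity of the Sun in W (from the text)
AU_M = 1.496e11  # one astronomical unit in m (from the text)


def luminous_flux(luminosity_w, distance_m):
    """Flux (W m^-2) at a given distance, assuming F = L / (4 pi d^2)."""
    return luminosity_w / (4 * math.pi * distance_m**2)


def distance_from_flux(luminosity_w, flux_w_m2):
    """Distance (m) to a standard candle of known luminosity and measured flux."""
    return math.sqrt(luminosity_w / (4 * math.pi * flux_w_m2))


# Check with the Sun: the flux arriving at the Earth's mean orbital distance.
f_sun = luminous_flux(L_SUN, AU_M)
print(f"Flux from the Sun at 1 AU: {f_sun:.0f} W m^-2")

# Running the calculation backwards recovers the distance.
print(f"Distance recovered: {distance_from_flux(L_SUN, f_sun) / AU_M:.3f} AU")
```

In practice the difficult part is not this arithmetic but knowing the luminosity reliably, which is why the calibration of each standard candle matters so much.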
By examining the shape of a star’s spectrum, we can determine the surface temperature
(using Wien’s displacement law). The width of the spectral lines gives us information
that lets us set limits on the luminosity. We can use this information to compare
the star to known stars and determine its luminosity (if you have read the extension
box entitled Hertzsprung–Russell diagram, you might be interested to know that we use
that diagram to do this). Then by measuring its luminous flux, we can work out
the distance to the star. This technique is, confusingly, called spectroscopic parallax
(confusing because there is no parallax involved!).
If we have a cluster of stars, we can plot their apparent magnitude against surface temperature.
Assuming the cluster contains typical stars, we know the distribution of luminosities that we
might expect. We can therefore use their measured luminous flux to work out the distance to the
stars (if you have read the extension box entitled Hertzsprung–Russell diagram, we in fact compare
the main sequence of the cluster to the main sequence on the Hertzsprung–Russell diagram). This
technique is called main sequence fitting. We can use spectroscopic parallax and main sequence
fitting to get the distances to stars within our galaxy, the Milky Way.
Another commonly used standard candle is the Cepheid variable, a type of star named
after Delta Cephei in the constellation Cepheus. These have a periodic luminosity – the
luminosity increases and decreases in a regular pattern over time. Astronomers discovered
that there is a direct relationship between the luminosity of a Cepheid variable and the period
over which it oscillates. Therefore, by measuring the period, we can determine the luminosity
(Figure S35.6). From the luminosity and measured luminous flux, we can calculate the
distance to the star. Of course, in order to calibrate our scale, we need some nearby Cepheids
for which we can use parallax to determine the luminosity. The relationship between period
and luminosity for Cepheid variables was first recognised by Henrietta Leavitt in 1912, and
this was later calibrated by Harlow Shapley. Cepheid variables were used as standard candles
by Edwin Hubble to find the distances to nearby galaxies (see section S35.5). In the 1950s, it
was determined that there is more than one type of Cepheid variable – and so the distance
scale had to be recalibrated. Cepheids are used as a standard candle for nearby galaxies.
At greater distances (to more distant clusters of galaxies), Type Ia supernovae can be used
as standard candles. A supernova is a violently exploding star. Such an explosion can briefly
be as luminous as an entire galaxy, so supernovae can be spotted at great distances. A Type Ia
supernova is thought to have a consistent luminosity and is therefore useful as a standard
candle. However, as they are relatively rare and short-lived, we have to spot one before we
can use it as a standard candle. Type Ia supernovae involve a white dwarf star in what is
called a binary system with another star close by.
There is an upper mass limit for a white dwarf star, of 1.44 times the mass of the Sun. This
is known as Chandrasekhar’s limit, and it is the point at which the gravity of the star can
no longer be matched by the electron degeneracy pressure mentioned earlier. If material
flows rapidly from the white dwarf’s companion star onto the white dwarf, its mass may
exceed the Chandrasekhar limit, which causes a supernova. The white dwarf is destroyed in
a sudden burst of fusion and no remnant is left behind. Since we know how much material is
undergoing fusion (1.44 times the Sun’s mass), we can calculate the luminosity and therefore
can work out, from the luminous flux, the distance to the supernova.
The distance to even more distant objects can be calculated using redshift and Hubble’s
Law (see section S35.5).
[Figure S35.6 plots: a absolute magnitude (from 0 to −7) against period (0.3 to 100 days, logarithmic scale) for type I (classical) Cepheids, type II Cepheids and RR Lyrae stars; b apparent magnitude (axis marked 6.0 and 6.5) against time (0 to 50 days) for delta-Cephei.]
Figure S35.6 a The relationship between luminosity and period for variable stars. Three classes
of variable star are shown, type I Cepheids, type II Cepheids and RR Lyrae stars. The scale here is
given as absolute magnitude – the lower the magnitude (more negative) the more luminous the
star. b The periodic variation in luminosity of a Cepheid variable star.
S35.5 The expanding Universe

As an emergency vehicle drives past you, you will notice that as it is coming towards you, the
pitch of the siren is higher than when it is stationary, and as it is going away, the pitch of the
siren is lower than when it is stationary. This is known as the Doppler effect (see Chapter 13
in the Coursebook). As the vehicle drives away, the wavelength of the sound emitted is
increased, as the vehicle has moved away between wavecrests.
Light can also be Doppler shifted. If a light source is moving away at a speed v, the light is
redshifted – its wavelength has been increased. The fractional change in wavelength, z, due
to this motion is given by:
\[
z = \frac{\delta\lambda}{\lambda} = \frac{\delta f}{f} = \frac{v}{c}
\]
In the formula, δλ is the change in wavelength, λ is the unshifted wavelength, δf is the
change in frequency, f is the unshifted frequency and c is the speed of light. If the value of
z is positive, the light is redshifted (i.e. the wavelength has been increased). If instead the
source of the light was moving towards us, and v in this formula was negative, then the value
of z would also be negative and the wavelength of the light would have been reduced, or
blueshifted.
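The redshift formula can be applied in a few lines. The Python sketch below is illustrative only: the function names are our own, the speed of light is taken as $3.00 \times 10^{8}$ m s$^{-1}$, and the spectral line at 500 nm with a redshift of 0.02 is a made-up example.

```python
C = 3.00e8  # speed of light in m/s (assumed value)


def observed_wavelength(rest_wavelength_m, z):
    """Observed wavelength for redshift z, from z = (change in wavelength) / (rest wavelength)."""
    return rest_wavelength_m * (1 + z)


def recession_speed(z):
    """Speed of the source from z = v / c (only valid for small z)."""
    return z * C


# Hypothetical example: a line emitted at 500 nm from a source with z = 0.02.
lam_obs = observed_wavelength(500e-9, 0.02)
v = recession_speed(0.02)
print(f"observed wavelength = {lam_obs * 1e9:.0f} nm, v = {v:.1e} m/s")
```

A negative value of z passed to the same functions describes a blueshift: the observed wavelength is shorter and the speed comes out negative, meaning the source is approaching.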
Of course, we need to know the frequency of the light that was emitted in order to
determine the change in frequency and therefore the Doppler shift. We know the spectra
of the various elements found on stars, and by comparing the spectral lines observed from
the Sun and other stars we are able to work out which elements are present. So we can
compare the known frequencies of these spectral lines with the measured frequencies from
astronomical objects to determine the Doppler shift. Remember that the spectra we observe
will be absorption spectra, as the light emitted by a star excites the hydrogen in the star’s
outer layers of atmosphere, which then re-radiates that light in all directions. This leads to a
reduction in the intensity of light at frequencies where transitions between energy levels in
the atom have been excited, and a black line in the spectrum. Please refer to the discussion of
the quantum atom (Chapter 30 in the Coursebook) for more detail about spectra.
In 1912, an astronomer at the Lowell Observatory in Arizona, Vesto Slipher, was using
Doppler shifts to study the rotation rates of galaxies. He discovered that as well as effects due
to rotation, there was also a Doppler shift due to relative motion of the galaxy and the Earth.
Relative motion was to be expected, but in a Universe that is not expanding, it should be
equally likely that a galaxy is either moving towards us or away from us. By the early 1920s,
Slipher had discovered that the majority (36 out of 41) of galaxies on which he had made
measurements were in fact redshifted – they were moving away from us.
At the same time, Edwin Hubble had been working on techniques to measure the
distances to galaxies using Cepheid variables. When the redshift data of Slipher and Hubble’s
distance data were put together, it was found that the redshift increases in direct proportion
to the distance to the galaxy. This leads to Hubble’s law:
\[
v = H_0 d
\]
where v is the recession velocity of the galaxy, d is the distance to the galaxy and $H_0$ is known
as Hubble’s constant. $H_0$ is usually measured in km s$^{-1}$ Mpc$^{-1}$, so distance d is given in Mpc
(megaparsecs), and the equation will give us a recession velocity in km s$^{-1}$. Figure S35.7 shows
a recent plot of velocity vs distance for Type Ia supernovae, which fit Hubble’s law.
[Figure S35.7 plot: Hubble diagram for Type Ia supernovae, velocity (km s$^{-1}$, from 0 to $4 \times 10^{4}$) against distance (Mpc, from 0 to 700).]
Figure S35.7 Hubble’s law for Type Ia supernovae.
The value of Hubble’s constant lies between 60 and 80 km s$^{-1}$ Mpc$^{-1}$. A number of recent
measurements of Hubble’s constant are shown in Figure S35.8.
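Hubble’s law is straightforward to apply once a value of the constant has been chosen. The Python sketch below is illustrative only. It assumes a value of 70 km s$^{-1}$ Mpc$^{-1}$, chosen simply because it lies in the middle of the range quoted above, and the function names are our own. The last function combines Hubble’s law with v = zc, so it should only be trusted for small redshifts.

```python
H0 = 70.0       # Hubble's constant in km s^-1 Mpc^-1 (assumed mid-range value)
C_KM_S = 3.00e5 # speed of light in km/s (assumed value)


def recession_velocity_km_s(distance_mpc, h0=H0):
    """Recession velocity (km/s) of a galaxy at the given distance (Mpc): v = H0 d."""
    return h0 * distance_mpc


def distance_mpc_from_velocity(velocity_km_s, h0=H0):
    """Distance (Mpc) of a galaxy with the given recession velocity (km/s): d = v / H0."""
    return velocity_km_s / h0


def distance_mpc_from_redshift(z, h0=H0):
    """Distance (Mpc) from a small redshift, using v = z c and then d = v / H0."""
    return z * C_KM_S / h0


print(recession_velocity_km_s(100.0))               # velocity at 100 Mpc, in km/s
print(f"{distance_mpc_from_redshift(0.02):.1f} Mpc")  # distance for z = 0.02
```

Changing `h0` anywhere in the quoted range shows how strongly the inferred distances depend on the value of the constant.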
QUESTION
35.9 A galaxy has a redshift of 0.1. Calculate:

a The observed wavelength of the hydrogen alpha line, which has a non-redshifted
wavelength of 656 nm.
b The speed of the galaxy relative to us.
What does Hubble’s law mean?

Hubble’s law combined with Einstein’s theory of general relativity indicates that our entire
Universe is expanding. If the scale of the Universe is increasing now, then it must have
been smaller in the past. If we run the evolution of the Universe backwards, it is reasonable
to assume there must have been a point in the distant past where the whole Universe was
packed into a tiny volume – it implies a start point for the Universe, a ‘time zero’. At this
point, there would be infinite density and temperature – physicists call this a singularity.
Since Hubble’s time, theories have been developed, tested and modified that indicate
that the whole Universe began at this point, in an explosion known as the Big Bang. Our
understanding of particle physics allows us to explain broadly what happened in the first
fractions of a second after the Big Bang, and the subsequent evolution of the Universe. Some
people think that it is also important to propose and analyse ideas about what came ‘before’
the Big Bang, but applying the ideas of physical science to this question does not really
make sense – the Big Bang was itself the origin of time and space. Time only started when it
occurred. Cosmologists do not think of the Universe as being a vast empty space before the
Big Bang – space itself was created in the Big Bang.
[Figure S35.8 chart: Hubble constant calculated using different survey methods (vertical axis from 64 to 78): Hubble (2011), Spitzer (2012), WMAP9 (2012) and Planck (2013).]
Figure S35.8 Values of Hubble’s constant from different experiments.
What does the ‘expansion of space-time’ mean?

Imagine taking a piece of paper, on which a number of dots have been drawn to represent
galaxies (a 1D version of this is shown in Figure S35.9): the red dot represents us.
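[Figure S35.9 diagram: a dots 1 and 2 at distances d and 2d from the red dot; b after the expansion, the same dots at distances 2d and 4d.]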
Figure S35.9 The expansion of the Universe, modelled in one dimension. The red dot represents
our observer, the black dots other galaxies. a An observed configuration. b A Universe that is
expanding at a constant rate has a ‘scale factor’ that is increasing with time.
If the Universe doubles in scale compared to its initial configuration in Figure S35.9a, over a time
interval Δt, then the distance between each black dot and the red dot doubles (Figure S35.9b).
This means the distance to dots that were initially further away has increased by a larger amount,
and those dots appear to be moving away faster (as the rate of motion is the distance divided
by the time taken for the expansion to happen, Δt). For instance, the dot (labelled 2) that was
initially a distance 2d away moved a distance 2d in time Δt (so that it is now 4d away from us). A
dot which was initially a distance d away from us (labelled 1) has moved an additional distance d
away in time Δt. So the recession velocity of dot 2 is twice that of dot 1.
During every time interval Δt, the scale of the Universe increases by a constant factor (a
factor of 2 is used in Figure S35.9). It is a bit like using the ‘enlarge’ setting repeatedly on a
‘cosmic photocopier’: during each time interval, the scale is increased in all directions by the
same amount. The result of this is that galaxies that are further away (from us, i.e. the red
dot) appear to be moving away from us faster. In other words, this theory of expansion from
the Big Bang matches Hubble’s law, and the observed redshifts of galaxies.
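The one-dimensional model can be turned into a short calculation. The Python sketch below is a toy model only: the function name is our own, and distances and times are in arbitrary units. It multiplies every distance from the observer by the same scale factor and works out the apparent recession velocity of each dot.

```python
def recession_velocities(distances, scale_factor, dt):
    """Apparent recession velocity of each dot when every distance from the
    observer is multiplied by scale_factor during a time interval dt."""
    return [(scale_factor * d - d) / dt for d in distances]


d = 1.0   # arbitrary unit of distance
dt = 1.0  # arbitrary unit of time

# Dots 1 and 2 start at distances d and 2d; the scale doubles in time dt.
v1, v2 = recession_velocities([d, 2 * d], scale_factor=2.0, dt=dt)
print(v1, v2, v2 / v1)

# The same proportionality holds for any set of distances.
distances = [0.5, 1.0, 3.0, 7.5]
velocities = recession_velocities(distances, scale_factor=2.0, dt=dt)
print([v / x for v, x in zip(velocities, distances)])  # the same ratio for every dot
```

The constant ratio of velocity to distance in the last line plays the role of Hubble’s constant in this toy Universe.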
There are a number of things to point out about this idea. Firstly, the galaxies need not
be moving through space as they recede from us – the space itself expands and the gaps
between the galaxies therefore get bigger. This means that the observed redshift is actually
not due to a Doppler shift from classical physics, but in fact because the scale of the Universe
itself has increased since the light was emitted. The redshift, z, is directly linked to this
change in scale by the equation
\[
z = \frac{R_2}{R_1} - 1
\]
where $R_1$ was the scale factor of the Universe when the light was emitted, and $R_2$ is the scale
factor when it is received. If you look back to the formula for redshift in terms of recession
velocity, a redshift greater than 1 would imply v > c, but in fact this formula breaks down
close to the speed of light, and should not be used for redshifts greater than about 0.1.
However, the new interpretation of space itself expanding gives us an interpretation of
redshifts greater than 1 – for example, a redshift of 3 means that the Universe is now 4 times
larger than it was when the light was emitted. (It does not imply something is travelling
through space faster than light!)
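The link between redshift and scale factor is a one-line calculation in either direction. The Python sketch below is illustrative only, with function names of our own choosing.

```python
def expansion_factor(z):
    """Ratio R2 / R1 of the scale factors at reception and emission, from z = R2/R1 - 1."""
    return 1 + z


def redshift_from_scale(r_emitted, r_received):
    """Redshift of light emitted at scale factor r_emitted and received at r_received."""
    return r_received / r_emitted - 1


print(expansion_factor(3))            # 4: the Universe is now 4 times larger
print(redshift_from_scale(1.0, 4.0))  # 3.0
```

Unlike the formula z = v/c, this relation does not break down for redshifts greater than 1.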
Secondly, imagine that our red dot in Figure S35.9 was in a different position. The same
result would apply, that the recession velocity was proportional to the distance between that
position and the object whose recession velocity was being measured. This means that the
Earth is not in a ‘special place’ at the centre of the Universe in order for Hubble’s law to apply
– the same expansion can be observed at any other point in the Universe. The idea that the
Universe should look the same if viewed from any other point in the Universe (apart from
local small-scale structure) is known as the cosmological principle. We can write this as a
formal definition:
Viewed at a sufficiently large scale, the properties of the Universe are the same for all
observers.
The theory that the Universe is expanding, and that it originated in a Big Bang, does
not imply that there has to be anything for it to expand into. As we discussed above, the
expansion can be viewed as an expansion of space-time: the scale factor of the Universe is
increasing with time. Similarly, the theory does not imply that there was a time before the
Big Bang.
The age of the Universe

If Hubble’s law has applied for the entire age of the Universe, then the scale factor of the
Universe has been increasing at a constant rate for all time. If a galaxy is now a distance d
away from you and is moving at a recession velocity of v, then at a time t = d/v in the past, it
would have been right next to you. Therefore, using the assumption that the Hubble constant
has been the same for all time, we could estimate the age of the Universe as:
\[
t = \frac{d}{v} = \frac{d}{H_0 d} = \frac{1}{H_0}
\]
This time is known as the Hubble period. With Hubble’s constant being in the range 60
to 80 km s$^{-1}$ Mpc$^{-1}$, we can calculate a range of possible ages for the Universe. We need to
convert $H_0$ to s$^{-1}$ first, and then we find that the Hubble period is between 12 and 16 billion
years. This is a very approximate figure for the age of the Universe.
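The unit conversion is the only awkward part of this calculation. The Python sketch below shows one way of setting it out for a single illustrative value of 70 km s$^{-1}$ Mpc$^{-1}$ (an assumed mid-range value). The function name and the length of a year in seconds are our own assumptions, and the parsec is taken as $3.09 \times 10^{16}$ m, as quoted earlier.

```python
KM_PER_MPC = 3.09e19        # 1 pc = 3.09e16 m, so 1 Mpc = 3.09e22 m = 3.09e19 km
SECONDS_PER_YEAR = 3.156e7  # assumed length of one year in seconds


def hubble_period_years(h0_km_s_mpc):
    """Hubble period 1 / H0 in years, for H0 given in km s^-1 Mpc^-1."""
    # Dividing by the number of km in a Mpc cancels the distance units, leaving s^-1.
    h0_per_second = h0_km_s_mpc / KM_PER_MPC
    return 1 / h0_per_second / SECONDS_PER_YEAR


t_hubble = hubble_period_years(70.0)
print(f"{t_hubble / 1e9:.1f} billion years")  # about 14 billion years
```

A larger value of the constant gives a shorter Hubble period, because a faster expansion needs less time to reach the present separations.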
QUESTION
35.10 Show that the range of Hubble’s constant, between 60 and 80 km s$^{-1}$ Mpc$^{-1}$, leads to a
Hubble period of between 12 and 16 billion years.
There are some reasons why we might not expect the rate of expansion to be constant,
though. For example, gravity, as an attractive force between all objects with mass, would
slow down the expansion. This suggests our estimate of the age of the Universe from the
present value of Hubble’s constant would be an overestimate. Hubble’s constant may not in
fact be a constant for all of time; it is possible that it has changed as the Universe has evolved.
Therefore, cosmologists often refer to it now as the Hubble parameter, and define it as the
fractional rate of change of the scale factor of the Universe.
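[Figure S35.10 plots: a expansion velocity against distance, with lines labelled $H_\text{past}$, $H_\text{now}$ and $H_\text{future}$; b size of the Universe against time (billion years), with curves for accelerating, empty, low-density, critical-density and high-density (closed) universes. All curves above the critical-density curve correspond to an open universe. The present time and the interval $1/H_0$ are marked.]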
Figure S35.10 a The expansion velocity of the Universe vs. time. If we assume that the current
expansion rate is constant, we get an estimate for the age of the Universe from Hubble’s law. This
estimate is an overestimate if the rate of expansion has been decreasing. b Possibilities for the
expansion rate of the Universe. If the density is higher than the critical density, then the final fate
of the Universe is to collapse in on itself. If the density is lower than the so-called ‘critical density’,
then the Universe will continue expanding forever. An empty Universe would continue expanding
at the current rate forever. Extrapolating this rate backwards in time would give us an upper limit
on the age of the Universe: this age is given by 1/H0.
There are a number of possibilities for the rate of expansion of the Universe (see Figure S35.10b).
• If there is sufficient mass in the Universe, i.e. the density of the Universe is high enough,
then gravitation will eventually cause the Universe to collapse in on itself. The rate of
expansion decreases and then becomes negative – the Universe is described as ‘closed’,
and will end in a ‘Big Crunch.’
• There is a density of matter, known as the critical density, at which the Universe’s
expansion will slow to zero after an infinite amount of time.
• If there is insufficient matter in the Universe (less than the critical density), then the
gravitational attraction will never cause the rate of expansion to reach zero, and the
Universe will continue expanding forever – an ‘open’ Universe.
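The text does not give a formula for the critical density. As a rough illustration only, the sketch below uses the standard result from general relativity, ρc = 3H0²/(8πG). This formula is an assumption brought in from outside this section, as are the value H0 = 70 km s−1 Mpc−1 and the physical constants used.

```python
import math

G = 6.674e-11               # gravitational constant, N m^2 kg^-2 (assumed standard value)
METRES_PER_MPC = 3.086e22   # 1 Mpc in metres (assumed standard value)
PROTON_MASS = 1.673e-27     # kg (assumed standard value)


def critical_density(h0_km_s_mpc):
    """Critical density in kg m^-3 from rho_c = 3 H0^2 / (8 pi G) (assumed formula)."""
    h0_per_second = h0_km_s_mpc * 1000.0 / METRES_PER_MPC
    return 3.0 * h0_per_second ** 2 / (8.0 * math.pi * G)


rho_c = critical_density(70)  # 70 km s^-1 Mpc^-1 is an assumed mid-range value
print("critical density:", rho_c, "kg per cubic metre")
print("equivalent proton masses per cubic metre:", rho_c / PROTON_MASS)
```

Under these assumptions the critical density is of the order of 10^−26 kg m−3, equivalent to only a few proton masses per cubic metre. Whether the real Universe is above or below this value is what separates the closed and open cases in the list above.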
Note that we may not be able to ‘see’ all the matter in the Universe. We can estimate the
masses of galaxies and other very large objects in the Universe in two different ways.
• We can examine the amount of light and other electromagnetic radiation given off by
an object, and estimate the mass of the object based on what we know of the physical
processes that produce the radiation.
• We can measure the gravitational effects of our target object on other large objects.
When we compare the masses of galaxies and clusters of galaxies produced using these
methods, we find they are significantly different. Cosmologists have therefore proposed the
existence of dark matter, which does not produce or interact with electromagnetic radiation,
but which does create gravitational effects. Estimates based on all the observations we have
made so far suggest that there is not sufficient mass (either normal matter or ‘dark matter’) to
cause the Universe to be closed.
Perhaps surprisingly, recent observations suggest that the rate of expansion of the
Universe is in fact increasing. For this to be the case, there must be something driving the
expansion. After all, we know that gravitation is an attractive force, so gravitation would
tend to slow the expansion down, not increase it. Cosmologists have therefore proposed the
existence of dark energy – a type of energy that again does not produce or interact with
electromagnetic radiation. It is thought to be present everywhere in space, and is estimated
to provide the majority (70%) of all mass–energy in the Universe. Note that dark energy and
dark matter are not the same thing.
The development of cosmological ideas provides many examples of the scientific approach
in action. For example, when Einstein was developing his general theory of relativity in
the years leading up to 1915, the equations he used predicted an expanding Universe. He
felt that this must be an error in his formula, so in order to make the Universe static (not
expanding), he added a term which he called the ‘cosmological constant’. Many years later,
when observations confirmed that the Universe was expanding, Einstein referred to his
suggestion of a cosmological constant as his ‘greatest blunder.’ However, a modified form of
the cosmological constant may now be needed to model the accelerating expansion of the
Universe. We still have many observations to make and theories to develop in order to find
all the answers.
Evidence for the Big Bang

We have already discussed how the redshift of galaxies allows us to deduce that the Universe
is expanding, and this leads to the idea that it originated in a Big Bang.
There is other convincing evidence supporting the Big Bang theory. If we were to predict
the state of the Universe one second after the Big Bang, our current thinking is that we would
see a ‘hot sea’ of hadrons (including baryons such as the proton and the neutron) and leptons
(including the electron and the electron neutrino). This ‘sea’ would include both matter and
antimatter particles (see the section ‘Families of particles’ in Chapter 16). Before this time,
even these particles were not able to form.
As time went on, the Universe cooled, and eventually it reached a temperature
where electrons combined with nuclei to form neutral atoms. This process is known as
recombination, and is thought to have occurred around 380 000 years after the Big Bang. At
the end of this process, most protons and electrons were bound in atoms. Before this point,
the free electrons and protons scattered photons (light) as they travelled, much as sunlight
scatters from the water droplets in clouds. This means that until recombination, the Universe
was opaque; any light produced was quickly scattered or absorbed. After recombination, with
the charged particles becoming bound together in atoms, the photons were scattered much
less and the Universe became transparent. From that point on, the photons present in
the Universe were able to travel freely through space. In fact, we can still detect some
of those early photons today. We call these photons the cosmic microwave background
radiation (often given the abbreviation CMB). The CMB has the spectrum of a black body at
a temperature of 2.73 K. The peak of the spectrum is in the microwave range, at 160.2 GHz.
The CMB was first detected in 1964 by two American radio astronomers, Arno Penzias
and Robert Wilson. They discovered it accidentally while testing the horn antenna shown
in Figure S35.11. They were using this antenna to detect weak radio signals reflected from
balloon satellites. To do this, they needed to eliminate interference from their receiver.
After taking many steps to reduce the interference (including clearing a pigeon’s nest
and its droppings from the horn!) they found that there was a source of noise they could
not eliminate. They concluded that the noise was coming from outside our galaxy. Other
astrophysicists at Princeton University (Dicke, Peebles and Wilkinson), in the United States,
were preparing to search for microwave radiation from the Big Bang. News of their work
reached Penzias via a friend, and the two radio astronomers realised the significance of
their discovery. The two groups published their results in companion papers. Penzias and
Wilson won the 1978 Nobel Prize in Physics for their discovery.
More recently, NASA (the United States’ space agency) has launched two missions to
study the CMB. The first was the Cosmic Background Explorer (COBE), the results of which
were published in 1992. COBE mapped variations in the CMB, which are related to the
gravitational fields present in the early Universe. These variations are thought to be evidence
for the gravitational forces that eventually drew together the galaxies and clusters of galaxies
that we observe today. The second experiment was the Wilkinson Microwave Anisotropy
Probe (WMAP). This had greater resolution, and surveyed the entire sky. Figure S35.12
shows the results of the WMAP mission.
A third mission, Planck, led by the European Space Agency with participation from
NASA, was launched in 2009 and has made even more accurate maps of the CMB. It has
mapped the polarisation of the CMB, and results from this suggest that the first stars formed
much later than was previously thought.
Figure S35.11 The horn antenna with which Penzias and Wilson first detected the cosmic
microwave background radiation.
Figure S35.12 WMAP’s map of the temperature of the cosmic microwave background radiation.
Hot spots show as red, and cold spots as dark blue. The variation in the temperature of the CMB is
only over a range of a few microdegrees (10^−6 K); the equipment used to map these variations has
to be very sensitive and placed in space.
Further evidence for the Big Bang theory comes from the composition of very distant
galaxies and old stars. The amount of each element in these astronomical objects agrees
with predictions derived from the Big Bang theory. However, we can only make this
comparison by looking at very old objects, which tend to be very far away. Stars that formed
more recently have a different composition, because they contain elements that were made by
nuclear fusion in previous generations of stars.
Unanswered questions
Cosmology is still an active field of research, and there are many unanswered questions. One
is the question of why there is an imbalance between matter and antimatter in the Universe.
The Big Bang theory predicts that there should have been equal amounts of matter and
antimatter produced, but our Universe is dominated by matter. There are various proposed
theories to explain this imbalance, but as yet no scientific consensus.
Another interesting question is related to the CMB. The variations in temperature across
the sky are remarkably small: although the variations exist and have been mapped, it is
surprising that regions of the Universe that have apparently never been in contact with each
other have come into thermal equilibrium at very nearly the same temperature. This,
and some other cosmological problems, can be solved by postulating a very short period after
the Big Bang in which there was a huge burst of expansion, called ‘inflation’. For this inflation
to happen, an unknown form of energy must have been present, which
has so far not been detected. This energy would have been unevenly distributed in space due
to quantum fluctuations when the Universe was very small, and it is thought that this should
give rise to the patterns that are seen in the COBE and WMAP images of the Universe.
We cannot see back beyond the time of recombination by observing photons, as the
Universe was opaque before then. One possibility for investigating the Universe at the time
when inflation was taking place is by detecting gravitational waves. These were predicted as
part of Einstein’s general theory of relativity, but they have proven to be extremely difficult
to observe. On 11th February 2016, physicists at the Laser Interferometer Gravitational-Wave
Observatory (LIGO) in the United States announced the first observation of a gravitational
wave, which was produced by a collision between two black holes.
Summary
■ Luminosity is defined as the total power emitted by a star. Luminous flux is the
power per unit area of surface perpendicular to the radiation at a distance d from the
star, and is given by the equation $F = \frac{L}{4\pi d^2}$.
■ The peak of the spectrum of a star allows us to estimate its surface temperature,
using Wien's displacement law.
■ Stefan's law $L = 4\pi r^2 \sigma T^4$ allows us to calculate the luminosity of a spherical body.
By using Stefan's law and Wien's displacement law together, we can estimate the
radius of a star based on the peak wavelength in its spectrum and its luminosity.
■ A source of electromagnetic radiation moving relative to an observer undergoes
a shift in wavelength (and therefore frequency) given by the equation
$z = \frac{\delta\lambda}{\lambda} = \frac{\delta f}{f} = \frac{v}{c}$.
■ The vast majority of galaxies in our Universe are observed to be 'redshifted', which
implies that they are moving away from us. This leads to the idea that the Universe
is expanding. The fact that it is expanding leads to the idea that it originated in a
singularity known as the Big Bang. An expanding Universe does not imply that there
is pre-existing empty space for the Universe to expand into: space itself was created
in the Big Bang.
■ There is other evidence to support the Big Bang theory, such as the detection of the
cosmic microwave background radiation.
■ The equation $v = H_0 d$ can be used to relate the speed of recession of distant objects
to their distance from us.
■ The constant H0 in Hubble's equation gives rise to the Hubble time $1/H_0$, which gives
us a first estimate for the age of the Universe.
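The relationships in this summary can be combined in a short numerical sketch. It is an illustration under stated assumptions rather than a worked answer: the input values (a 500 nm spectral peak, a luminosity of 3.8 × 10^26 W, and a line shifted from 656.3 nm to 670.0 nm) are hypothetical, the function names are invented for this example, and the physical constants are assumed standard values. The value of H0 is the approximate figure 2.3 × 10^−18 s−1 quoted in the end-of-chapter questions.

```python
import math

SIGMA = 5.67e-8     # Stefan-Boltzmann constant, W m^-2 K^-4 (assumed standard value)
WIEN_B = 2.898e-3   # Wien's displacement constant, m K (assumed standard value)
C = 3.00e8          # speed of light, m s^-1


def star_temperature_and_radius(peak_wavelength_m, luminosity_w):
    """Estimate surface temperature (K) and radius (m) of a star."""
    # Wien's displacement law: peak wavelength x temperature = constant
    temperature = WIEN_B / peak_wavelength_m
    # Stefan's law L = 4 pi r^2 sigma T^4, rearranged for r
    radius = math.sqrt(luminosity_w / (4 * math.pi * SIGMA * temperature ** 4))
    return temperature, radius


def redshift_speed_distance(lab_wavelength_m, observed_wavelength_m, h0_per_second):
    """Return redshift z, recession speed (m/s) and distance (m)."""
    z = (observed_wavelength_m - lab_wavelength_m) / lab_wavelength_m
    speed = z * C                     # z = v/c, valid for speeds well below c
    distance = speed / h0_per_second  # Hubble's law v = H0 d, rearranged for d
    return z, speed, distance


# Hypothetical star: spectrum peaks at 500 nm, luminosity 3.8e26 W
temperature, radius = star_temperature_and_radius(500e-9, 3.8e26)
print("temperature (K):", temperature, " radius (m):", radius)

# Hypothetical galaxy: a line measured at 656.3 nm in the lab is observed at 670.0 nm
z, speed, distance = redshift_speed_distance(656.3e-9, 670.0e-9, 2.3e-18)
print("z:", z, " speed (m/s):", speed, " distance (m):", distance)
```

With these inputs the star comes out at a little under 6000 K with a radius close to 7 × 10^8 m, and the galaxy has a redshift of about 0.02. The redshift relation z = v/c used here only holds for speeds much smaller than the speed of light.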
S35.1
a The Sun has a surface temperature of 5700 K. It has a radius of 6.96 × 10^8 m. Use Stefan’s law to find
the luminosity of the Sun. [2]
b Use your answer to (a) to estimate the luminous flux at the radius of the Earth’s orbit,
1.496 × 10^8 km from the Sun. [2]
c Use Wien’s law to calculate the peak wavelength of the electromagnetic radiation from the Sun. [2]
S35.2
a Define, for a star, the following terms:
i Luminosity [1]
ii Luminous flux. [1]
b Explain carefully how astronomers can estimate the luminosity of a star from its colour. [3]
c What information can be gained from the absorption spectrum of a star? [2]
d An ultraviolet line from the hydrogen spectrum has a wavelength of 121.6 nm when measured
in the laboratory. The same line measured in the radiation from a distant galaxy has a wavelength
of 130.5 nm.
i Calculate the velocity of recession of the galaxy. [2]
ii Estimate the distance of the galaxy from the Earth. The Hubble constant is approximately
2.3 × 10^−18 s−1. [2]
S35.3
The binary system of stars 61 Cygni is observed to have a parallax of 0.286 arcseconds.
a Show that the parallax is 0.000 0794°. [1]
b Given that the radius of the Earth’s orbit around the Sun is 1.50 × 10^11 m, calculate the distance to
61 Cygni. [2]
c The luminous flux on the Earth of 61 Cygni A (one member of the binary) is observed to be
4.0 × 10^−10 W m−2. Calculate the luminosity of 61 Cygni A. [2]
S35.4
The table below shows the distance to a number of galaxies and their speeds as used by Hubble in 1929.

Galaxy      Distance (Mpc)   Speed (km s−1)
NGC-5357    0.45             200
NGC-3627    0.9              650
NGC-5236    0.9              500
NGC-4151    1.7              960
NGC-4472    2.0              850
NGC-4486    2.0              800
NGC-4649    2.0              1090
a Plot a graph of the speed of the galaxies (on the vertical axis) against the distance to each galaxy
(on the horizontal axis). Draw a line of best fit and calculate its gradient. [4]
b State Hubble’s law. [1]
c Determine a value for Hubble’s constant from your graph. [1]
d Use your value of Hubble’s constant to estimate the age of the Universe. How does this compare
to current estimates of the age of the Universe? [2]
Questions adapted from http://spacemath.gsfc.nasa.gov/universe/5Page1.pdf
S35.5

a A typical Milky Way star has a speed within our Galaxy of 20 km s−1. Estimate the maximum shift of a
line of wavelength 486 nm in the hydrogen spectrum of the star which results from such a speed. [2]
b The spectrum of the Andromeda galaxy (the nearest spiral galaxy beyond the Milky Way) shows
blue shift. Why is this observation unusual? [2]
S35.6
Explain how redshift leads to the ideas of the expanding Universe and to the Big Bang theory.
S35.7
Explain the origin of the Cosmic Microwave Background Radiation, and how it provides significant
evidence for the Big Bang theory.