A Simple Derivation of Einstein's Field Equations

Written by: Joshua Pilipovsky Editor in Chief: Karol Woloszyn Presenter in Chief:
Siddique Shafi Midwood High School

This is going to be an introduction to, and a simple derivation of, Einstein's Field Equations. Geared toward readers with a relatively novice math background, this derivation will be neither rigorous nor ambiguous. It will be a clear and concise way to understand the genius that Einstein showed when he came up with his theory of relativity.
Before we begin, I must outline the two fundamental ideas that you need before we can derive these equations: the principle of equivalence and curved spacetime. Imagine that you are inside a closed box, far from any outside forces, and the box is accelerating upward with an acceleration of g. According to the principle of equivalence, this is exactly the same as standing still on Earth's surface, subject to a downward gravitational acceleration of g. You will feel absolutely no difference in your reference frame between these two scenarios. This becomes important, as you will see. Imagine you are again in this accelerating box and there is a beam of light traveling across it toward the left edge. As you are moving up with an acceleration of g, the light will, according to you, drift downward because you are going up. If you consider three instants and draw the positions of the light in one diagram, you will notice, surprisingly, that the light actually bends along a parabola. This is a very surprising result from the classical point of view, because light should always follow a straight-line path. However, this is not quite the case: light actually does bend in an accelerating reference frame. Since this bending occurs in the accelerating frame, by the principle of equivalence light must also bend when you are stationary and subject to gravity. We cannot see this with our eyes, however, because light travels so fast that it crosses the box before it has time to fall by any perceptible amount.
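To get a feel for how small the bending is, here is a rough estimate. The box width of 3 m and the rounded value of c are assumptions made purely for illustration; they are not figures from the derivation.

```latex
% Time for light to cross an (assumed) 3 m wide box
t = \frac{L}{c} \approx \frac{3\ \mathrm{m}}{3\times10^{8}\ \mathrm{m/s}} = 10^{-8}\ \mathrm{s}

% Distance the light appears to fall in that time, with g = 9.8 m/s^2
\Delta y = \tfrac{1}{2}\,g\,t^{2} \approx \tfrac{1}{2}\,(9.8\ \mathrm{m/s^2})\,(10^{-8}\ \mathrm{s})^{2} = 4.9\times10^{-16}\ \mathrm{m}
```

A drop of that size is smaller than the width of a proton, which is why the bend goes unnoticed in everyday life.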
This is quite a result, but how can we account for it? One might say it has to do with gravitational attraction, so let's begin there. The gravitational attraction between two masses is

$$F = \frac{GMm}{r^2}$$

However, one immediately runs into a problem: light, according to quantum theory, is made of quanta called photons, which have a mass of 0. So the whole expression reduces to 0, which explains nothing. Einstein knew this, and he came up with a new approach. He said that all motion takes place in curved spacetime. Now what exactly does this mean? He postulated that light bends not because of a gravitational attraction between it and the Earth, but because the Earth curves the spacetime around it, and light simply follows this curve along the straightest path available. Think about it: say you walk on a curved surface. You are following a straight-line path as far as you can tell, but the surface itself is curved, so your path is curved too.
This explanation of how light travels also accounts for the Newtonian gravitational attraction, that is, the $GMm/r^2$. Consider a bead on a trampoline. It would only make a little kink in it. However, if I am standing on that trampoline, there will be a large bend in it, much greater than that of the bead. Therefore, due to my mass, the bead will start rolling towards me, or towards the dip in the trampoline. This analogy accounts for the gravitational attraction between two masses. They attract each other because of the dip in spacetime, following the path of least energy. Analogously, Earth also follows this principle of curved spacetime. When Earth orbits the Sun, it is actually moving along a straight line; however, the curve in spacetime due to the Sun forces that straight line to bend into an orbit. If this curve were flattened, Earth would just go off in a straight line after all. It is the curved spacetime that makes for the curved path of the Earth, which, I repeat, is just following a straight path.
As a little digression, this pivotal fact of curved spacetime has some useful applications in the large-scale world of astrophysics, namely gravitational lensing. It is indeed possible to see galaxies that lie behind other galaxies, which, if you think about it, is completely remarkable. Using this fact, astrophysicists have devised ingenious methods to measure the distance between the two galaxies, and even to spot supernovae that went off in the galaxy hidden behind the nearer one. Imagine this in our world: we look at someone, and we can see the thing that is behind them. It is an absolutely mind-blowing fact. And as I said, this is all due to curved spacetime, which allows light to bend, letting us view through our telescopes the planets and galaxies that lie behind others.
Now, you may be asking, what exactly is spacetime? Spacetime is, intuitively, the combination of our 3-D space and time into a 4-dimensional model of the universe. This combination of space and time constitutes a space called Minkowski space, which is very handy for physicists to work with, but we won't get into that for our purposes. Minkowski space is similar to Euclidean space in that both behave in similar ways under transformations, rotations, or anything else (although in Minkowski space the quantities of interest are invariant under these transformations, meaning independent of frame of reference), other than the obvious difference that Minkowski space has 3+1 dimensions while Euclidean space has 3. Great! Now that we are done with the basics, we can finally start deriving the Einstein Field Equations, which read:

$$R_{\mu\nu} - \frac{1}{2}\,g_{\mu\nu}R + g_{\mu\nu}\Lambda = \frac{8\pi G}{c^4}\,T_{\mu\nu}$$
Let us first begin by asking ourselves: what do these equations actually mean? All these hieroglyphics (to some) can be extremely confusing. Starting from the far left, we encounter $R_{\mu\nu}$, which is called the Ricci tensor. The Ricci tensor is essentially a measure of the curvature of an object, in layperson's terms. More technically, it measures the deviation of a geodesic in a Riemannian manifold (which is a curved space) from one in a standard Euclidean n-space, which is just the usual rectangular system of n coordinate axes. Next, we bump into $g_{\mu\nu}$, which is known as the metric tensor. This tensor is a correction to the Pythagorean theorem in curved spacetime. If you imagine a triangle in a curved space, say on a sphere, the hypotenuse will most definitely not be a straight line as in regular Euclidean space; instead it will be curved, and the metric tensor accounts for this, so that you are able to use the Pythagorean theorem in curved space. Right next to this, we have $R$, which is the Ricci scalar. This is basically the Ricci tensor contracted down to a scalar (more about this later; a scalar is actually a tensor of rank 0). Lastly, on the far right, we have $T_{\mu\nu}$, which is the stress-energy-momentum tensor. This tensor packs all the energy and mass in the universe into a nice compact matrix, which is how any rank-2 tensor can be written out.
Now we come to the fundamental conceptual goal of this paper, answering the question: what does this equation really mean? The terms on the left-hand side (LHS) of the equation all represent space and time, while the terms on the right-hand side (RHS) all represent mass and energy. What this is fundamentally saying is that mass tells spacetime how to curve, and curved spacetime tells mass how to move. This is the essence of Einstein's Field Equations. If you don't understand anything from here on out, this should be your main take-away message. With that understood, we can continue on to derive this equation.

To start our derivation, we will begin with the very basics of differential geometry and see how they lead us to define the metric tensor. Consider a field in our, let's call it, 3-dimensional space. Let's say that we want to find the height $\phi$ at any point in this field (call it a cow field), and that we also want to see how our height will change if we move in the x or y directions. Suppose we are standing on top of a little bump in this cow field, where there is essentially a ridge. A ridge is defined to be a maximum of this field, meaning that if we move along the x direction, our height won't change because we are already at the maximum, but if we move along the y direction, we take a big dip down, so we will have a negative change in height.
Now consider a gradient in this field. A gradient, in our case, is the ratio between the change in height and the distance traveled, whether in the x or y direction. For example, consider a gradient of 1:10, meaning that for every 1 meter of height, we go 10 meters in (for example) the x direction. To show this mathematically over an infinitesimal length, we say that the change in height is modeled by:

$$d\phi = \frac{d\phi}{dx}\,dx$$

This equation means that the change in height $d\phi$ is equal to the gradient $\frac{d\phi}{dx}$ (see footnote 1) multiplied by the distance traveled $dx$. So, as an example, let's say that we moved 5 meters horizontally along this gradient. What would be our change in height? Well, $d\phi = \frac{1}{10}(5) = 0.5$, so we have moved half a meter up our field.
This equation is, if you think about it mathematically, just the chain rule, where $\phi$ is a function of either x or y (in our case both, as we will see later), and therefore when we take its differential, we have to multiply by the differential of x. However, there is one little nuance that we missed: direction is extremely important here. If we take the change in height on our ridge in the x direction, it would be 0, while in the y direction it would be a negative number. Therefore, we will have two separate equations for the change in height in the x and y directions of our field.

$$d\phi_x = \frac{d\phi}{dx}\,dx \qquad d\phi_y = \frac{d\phi}{dy}\,dy$$
These two are clearly not the same, as the ridge shows. Now that we understand gradients and changes in height, let us move on to the Pythagorean theorem. Everyone knows the Pythagorean theorem, but let's apply it to infinitesimal lengths, so that we use differentials. If we have lengths dx and dy, then the third side forming the right triangle, namely ds, will, according to Pythagoras, be governed by this formula:

$$dx^2 + dy^2 = ds^2$$

If, however, we treat these lengths (which are scalars) as vectors (which have magnitude and direction), then we will just have

$$d\vec{s} = d\vec{x} + d\vec{y},$$

as simple as that, because of basic vector addition, where we use the tip-to-tail method of adding two vectors. Now ds is pretty general, but for our purposes we just want the change in $\phi$ of our field, so that equation turns into

$$d\phi = d\phi_x + d\phi_y$$

Recall that we previously found equations for $d\phi_x$ and $d\phi_y$, so we can just plug them in:

$$d\phi = \frac{d\phi}{dx}\,dx + \frac{d\phi}{dy}\,dy,$$

and if you think about it, this is just the chain rule applied twice, once for each of our two variables x and y. However, to be proper, we should be writing partial derivatives instead of regular ones, for the following reason. We are working in a three-dimensional space, so when our height $\phi$ changes, we can be moving in either the x or the y direction. The partial derivative symbolically means keeping the other variable constant; it's a partial, not a total, derivative. For example, if we only want to see our rate of change in the x direction, we take the partial with respect to x, keeping y constant, so we are not even considering y. In single-variable calculus, there was only one direction you could move in to affect the function, so there was no need for partial derivatives. Thus, our equation should properly be written as:

$$d\phi = \frac{\partial\phi}{\partial x}\,dx + \frac{\partial\phi}{\partial y}\,dy.$$

Footnote 1: This is a very watered-down version of the gradient, for mathematical simplicity. The gradient is actually a term used in multivariable calculus, and is defined as $\nabla f = \langle f_x, f_y, f_z \rangle$. This means that the gradient is made up of the partial derivatives (explained in the main text) of our function f in every dimension that we are working with, in this case x, y, and z. Also, the gradient is a vector, so it should be written with angle brackets listing its components.

Here we stop for a much-needed change of nomenclature. In the context of our field we are only working in three dimensions, but in the context of general relativity we must generalize to n dimensions. If we keep writing x and y and z, we will eventually run out of letters of the alphabet, so from here on out we will write $x = x^1$, $y = x^2$, $z = x^3$, and so on (the superscripts are labels, not powers), so that when we generalize to n dimensions we will have consistent results. As a result, our corrected equation for the change in height of a field now reads:

$$d\phi = \frac{\partial\phi}{\partial x^1}\,dx^1 + \frac{\partial\phi}{\partial x^2}\,dx^2 + \dots$$

If we have more dimensions, the terms simply keep adding together, so we can easily generalize this by saying:

$$d\phi = \sum_n \frac{\partial\phi}{\partial x^n}\,dx^n \qquad (1)$$
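Equation (1) can be checked numerically. The sketch below is only an illustration: the ridge-shaped function `height`, the sample point, and the step size are all made-up assumptions, and the partial derivatives are estimated with finite differences rather than computed exactly.

```python
def height(x, y):
    """Hypothetical ridge-shaped field: flat along x, falling away along y."""
    return 2.0 - y * y

def partial(f, point, axis, h=1e-6):
    """Central-difference estimate of the partial derivative of f along one axis."""
    forward = list(point)
    backward = list(point)
    forward[axis] += h
    backward[axis] -= h
    return (f(*forward) - f(*backward)) / (2.0 * h)

def total_differential(f, point, step):
    """Equation (1): sum over n of (partial f / partial x^n) * dx^n."""
    return sum(partial(f, point, n) * step[n] for n in range(len(point)))

point = (0.0, 0.5)    # a spot slightly off the crest of the ridge (assumed)
step = (1e-3, 1e-3)   # small displacement (dx^1, dx^2) (assumed)

predicted = total_differential(height, point, step)
actual = height(point[0] + step[0], point[1] + step[1]) - height(*point)
print(predicted, actual)
```

The two printed numbers agree to within roughly the square of the step size, which is what one expects when a differential is used to approximate a small but finite change. Moving only along x (a step of `(1e-3, 0.0)`) would give a predicted change of zero, matching the ridge picture.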

However, we have a very big problem on the horizon. The calculations we have just made all stem from the coordinate axes that we set on the field in the beginning. What if this coordinate system were rotated by 90 degrees, or 40 degrees? How would our results change? We want our rules to be invariant of reference frame, meaning that it doesn't matter what coordinate system we have; the results will always be the same. This is extremely important because later we will have tensors, which are defined to be invariant of frame as well, so we need to be consistent in our methodology. Thus, we need to repeat this process for another coordinate system, say $y^1$ and $y^2$, and see whether the gradient in this coordinate system is the same as in the x coordinate system or different, and, if different, how the two are related.
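So let's begin. For our y coordinate system, consisting of the $y^1$ and $y^2$ coordinates, we can use the chain rule as we did previously to find the gradient in the $y^1$ direction. So,

$$\frac{\partial\phi}{\partial y^1} = \frac{\partial\phi}{\partial x^1}\frac{\partial x^1}{\partial y^1} + \frac{\partial\phi}{\partial x^2}\frac{\partial x^2}{\partial y^1}$$

You can easily see that for the gradient along one y coordinate, you need to know the gradients along all x coordinates. Expanding this equation to n dimensions, we get

$$\frac{\partial\phi}{\partial y^1} = \frac{\partial\phi}{\partial x^1}\frac{\partial x^1}{\partial y^1} + \frac{\partial\phi}{\partial x^2}\frac{\partial x^2}{\partial y^1} + \dots$$

or more generally

$$\frac{\partial\phi}{\partial y^n} = \sum_m \frac{\partial\phi}{\partial x^m}\frac{\partial x^m}{\partial y^n}, \qquad (2)$$

for any n that we would like to choose. Just to reiterate, WE choose the n value, while ALL of the m values are summed up.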
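As a sanity check on equation (2), the sketch below compares the gradient computed directly in a rotated y system with the sum over m on the right-hand side. The 40 degree rotation, the function `field`, and the sample point are illustrative assumptions, and all derivatives are finite-difference estimates.

```python
import math

ANGLE = math.radians(40.0)  # assumed rotation between the two coordinate systems

def x_from_y(y1, y2):
    """Old coordinates (x^1, x^2) written in terms of the rotated ones (y^1, y^2)."""
    c, s = math.cos(ANGLE), math.sin(ANGLE)
    return (c * y1 - s * y2, s * y1 + c * y2)

def field(x1, x2):
    """Hypothetical height field, chosen only for illustration."""
    return x1 * x1 + 3.0 * x2

def derivative(f, point, axis, h=1e-6):
    """Central-difference partial derivative of a scalar function f."""
    up, down = list(point), list(point)
    up[axis] += h
    down[axis] -= h
    return (f(*up) - f(*down)) / (2.0 * h)

y_point = (0.7, -0.2)
x_point = x_from_y(*y_point)

# Left-hand side of equation (2): differentiate the field directly in the y system.
field_in_y = lambda y1, y2: field(*x_from_y(y1, y2))
direct = [derivative(field_in_y, y_point, n) for n in range(2)]

# Right-hand side: sum over m of (d field / d x^m) * (d x^m / d y^n).
grad_x = [derivative(field, x_point, m) for m in range(2)]
jacobian = [[derivative(lambda a, b, m=m: x_from_y(a, b)[m], y_point, n)
             for n in range(2)] for m in range(2)]
via_chain_rule = [sum(grad_x[m] * jacobian[m][n] for m in range(2))
                  for n in range(2)]
print(direct, via_chain_rule)
```

Each entry of `direct` matches the corresponding entry of `via_chain_rule` up to rounding: n is picked once per entry, while m is summed inside.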
This equation is very important in that it represents the change in the height of the field in the y frame of reference in terms of the x frame of reference, and as you can see the two are only separated by the term $\frac{\partial x^m}{\partial y^n}$. Now that we have that finished, we can move on to the instrumental topic of tensors. So, this is how the story goes: consider a scalar, something that has only magnitude and no direction. A scalar is called a tensor of rank 0. Now consider a vector, which has a magnitude and a direction. A vector is a tensor of rank 1. If we keep following this pattern, we get to rank 2 and beyond, and these are the objects formally known as tensors.
Definition:
1. A combination of vectors that have a fixed relationship among themselves.
2. If a tensor is 0 in one reference frame, then it is 0 in all reference frames. (Invariant)
The latter fact is extremely important, and therefore we say that a tensor is invariant under coordinate transformations. So let us recap for a moment what we have done so far. We first found the change in height of a field. Next, we found how the height of a field transforms under coordinate transformations, from the x coordinate system to the y. Naturally, we would also like to know how tensors transform between coordinate systems, because, as I repeatedly stress, we want everything to be invariant under transformations. However, let's first start off by asking ourselves: how does a vector transform?
This is a very easy task, as you shall see. Consider a vector in the x frame of reference, $V_x^m$. We would like to see how this vector transforms into the y frame of reference, $V_y^n$, and establish a relationship between them. Well, recall equation (1):

$$d\phi = \sum_n \frac{\partial\phi}{\partial x^n}\,dx^n$$

Now, we just replace $d\phi$ with $V_y^n$, $\phi$ with $y^n$, and $dx$ with $V_x^m$ (relabeling the summed index as m), and we've got our equation:

$$V_y^n = \sum_m \frac{\partial y^n}{\partial x^m}\,V_x^m \qquad (3)$$

This is how vectors transform between two coordinate systems, with their relationship being the term $\frac{\partial y^n}{\partial x^m}$. Let me remind you again of this rather confusing but important concept: we pick the n value (there is only one), but all m values are summed over (see footnote 2), as in our previous equation (2).
So what does a tensor actually look like mathematically? We defined a tensor qualitatively before, so now let's take a quantitative look:

$$T^{mn} \equiv A^m B^n$$

Notice that the tensor has 2 indices, one from each of the vectors that make it up. Also notice how many components the tensor has: if, for example, m and n both range from 0 to 3, then each index takes 4 values, one per dimension, so the tensor contains 4 x 4 = 16 components. From this definition, you can easily see how a tensor transforms, because we now simply have to plug equation (3) into this tensor formula and see what happens from there:

$$A_y^m B_y^n = \left(\sum_r \frac{\partial y^m}{\partial x^r}\,A_x^r\right)\left(\sum_s \frac{\partial y^n}{\partial x^s}\,B_x^s\right),$$

where r and s are the so-called dummy variables, because they are just indices that are being summed over and have no significance of their own in this equation; basically they act as placeholders because the indices m and n were already taken. Therefore,

$$T_y^{mn} = \sum_r \sum_s \frac{\partial y^m}{\partial x^r}\frac{\partial y^n}{\partial x^s}\,A_x^r B_x^s$$

$$T_y^{mn} = \sum_r \sum_s \frac{\partial y^m}{\partial x^r}\frac{\partial y^n}{\partial x^s}\,T_x^{rs} \qquad (4)$$
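Equations (3) and (4) can be tried out with plain lists. In the sketch below the coordinate change is assumed to be a 30 degree rotation, so that the matrix `jac` of partial derivatives $\partial y^m/\partial x^r$ is constant; the vectors `a_x` and `b_x` are made-up numbers. It checks that transforming the tensor with equation (4) gives the same result as transforming each vector with equation (3) and multiplying afterwards.

```python
import math

def transform_vector(jac, v):
    """Equation (3): V_y^n = sum over m of (dy^n/dx^m) * V_x^m."""
    return [sum(jac[n][m] * v[m] for m in range(len(v)))
            for n in range(len(jac))]

def transform_tensor(jac, t):
    """Equation (4): T_y^{mn} = sum over r, s of (dy^m/dx^r)(dy^n/dx^s) T_x^{rs}."""
    size = len(jac)
    return [[sum(jac[m][r] * jac[n][s] * t[r][s]
                 for r in range(size) for s in range(size))
             for n in range(size)] for m in range(size)]

def outer(a, b):
    """Build T^{mn} = A^m B^n from two vectors."""
    return [[a_m * b_n for b_n in b] for a_m in a]

angle = math.radians(30.0)                      # assumed rotation angle
jac = [[math.cos(angle), math.sin(angle)],      # jac[m][r] = dy^m/dx^r
       [-math.sin(angle), math.cos(angle)]]

a_x = [1.0, 2.0]     # hypothetical vector A in the x frame
b_x = [3.0, -1.0]    # hypothetical vector B in the x frame

t_x = outer(a_x, b_x)
t_y = transform_tensor(jac, t_x)
t_y_from_vectors = outer(transform_vector(jac, a_x), transform_vector(jac, b_x))
print(t_y)
print(t_y_from_vectors)
```

The two printed matrices agree up to rounding, which is the content of the step from the product of two sums to equation (4).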

Now this is a real tensor-calculus-like equation! (See footnote 3.) But what is it actually saying? Our whole goal was to see how tensors transform when we change coordinate systems, specifically in our case from the x to the y coordinate system, and this formula shows the relationship between them, explaining mathematically how they transform. In technical terms, this transformation, where the indices are upstairs (on the top of the tensor), is called a contravariant transformation. There is an equivalent form of this equation called the covariant transformation, where the indices are downstairs and all of the partial-derivative terms are flipped. We will be needing that too, so it is useful to have it:

$$T^y_{mn} = \sum_r \sum_s \frac{\partial x^r}{\partial y^m}\frac{\partial x^s}{\partial y^n}\,T^x_{rs} \qquad (5)$$

Footnote 2: In general, if you have an equation with repeated indices, they are always summed over, no matter what. This is why Einstein, when deriving these equations, actually dropped the summation sign, a practice that later came to be called the Einstein Summation Convention, because it is automatically assumed that repeated indices are always summed. For an example of repeated indices, consider $g_{\mu\nu}\,dx^\mu\,dx^\nu$. The repeated indices are $\mu$ and $\nu$, so they are automatically summed over, and we technically do not need to write the summation.
Footnote 3: As you are beginning to see, we will soon need to follow the Einstein Summation Convention, because there will be way too many summations to write explicitly. All repeated indices are summed over!

In essence, all of this was preparation for deriving the metric tensor, as we said we would do at the beginning of this section. So without further ado, let's begin. Consider again the Pythagorean theorem, with no vectors this time, just magnitudes.

$$ds^2 = (dx^1)^2 + (dx^2)^2 + (dx^3)^2 + \dots = \sum_m dx^m\,dx^m = \sum_m \sum_n dx^m\,dx^n\,\delta_{mn}$$

Whoa... what happened there? Why is there this really weird delta term? This $\delta_{mn}$ is called the Kronecker delta. The Kronecker delta is 1 if the indices m and n are equal to each other and 0 if they are not, so

$$\delta_{mn} = \begin{cases} 1 & \text{if } m = n \\ 0 & \text{if } m \neq n \end{cases}$$
So, for all the terms where m is equal to n (the only terms needed in the Pythagorean theorem), the Kronecker delta is 1, and we are just left with $(dx^m)^2$, which is what we want. This unusually complicated way of writing the Pythagorean theorem is needed, as you shall see (see footnote 4). So, if you recall equation (1) (see footnote 5),

$$d\phi = \sum_n \frac{\partial\phi}{\partial x^n}\,dx^n,$$

we can rewrite it for $dx^m$ and $dx^n$ in terms of the y coordinates and plug the results into this unusually complicated Pythagorean theorem to achieve our result:

$$dx^m = \frac{\partial x^m}{\partial y^r}\,dy^r \qquad dx^n = \frac{\partial x^n}{\partial y^s}\,dy^s$$

$$ds^2 = \delta_{mn}\,\frac{\partial x^m}{\partial y^r}\,dy^r\,\frac{\partial x^n}{\partial y^s}\,dy^s$$

$$ds^2 = \delta_{mn}\,\frac{\partial x^m}{\partial y^r}\frac{\partial x^n}{\partial y^s}\,dy^r\,dy^s$$

and finally, the term with the partial derivatives (which are summed over m and n by the Einstein convention) and the Kronecker delta is defined to be the metric tensor!

$$g_{rs} \equiv \delta_{mn}\,\frac{\partial x^m}{\partial y^r}\frac{\partial x^n}{\partial y^s} \qquad (6)$$

Footnote 4: From now on, we are going to assume the Einstein Summation Convention.
Footnote 5: Or, if you prefer, just think of it as the chain rule from multivariable calculus extended to n dimensions.

Notice that the metric tensor reduces to the Kronecker delta in flat Euclidean space with rectangular coordinates, but when we are in curved space we need this metric tensor. Its free indices are r and s, because m and n have been summed away. Thus, our equation, with the metric tensor, reads:

$$ds^2 = g_{rs}\,dy^r\,dy^s$$

Think of a curved space, say a sphere. Imagine a right triangle on this sphere: the hypotenuse (as I said in the intro to this paper) will be curved, so the metric tensor accounts for this curvature through its two partial-derivative factors. So there we have it, the metric tensor of Einstein's Field Equations is finished. Let's move on to the arcane Christoffel symbols, which we will very much need when we derive the Ricci tensor.
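To see equation (6) produce actual numbers, the sketch below evaluates it for one simple change of coordinates. The choice of polar coordinates $(r, \theta)$ as the y system on an ordinary flat plane is an assumption made for illustration, as are the sample point and the use of finite differences for the partial derivatives. Even though the plane is flat, the y coordinates are not rectangular, so the result is not simply the Kronecker delta.

```python
import math

def cartesian_from_polar(r, theta):
    """x^1, x^2 written in terms of the assumed y coordinates (r, theta)."""
    return (r * math.cos(theta), r * math.sin(theta))

def jacobian(transform, point, h=1e-6):
    """jac[m][r] approximates dx^m/dy^r by central differences."""
    dim = len(point)
    jac = [[0.0] * dim for _ in range(dim)]
    for r in range(dim):
        up, down = list(point), list(point)
        up[r] += h
        down[r] -= h
        x_up, x_down = transform(*up), transform(*down)
        for m in range(dim):
            jac[m][r] = (x_up[m] - x_down[m]) / (2.0 * h)
    return jac

def metric(transform, point):
    """Equation (6): g_rs = delta_mn (dx^m/dy^r)(dx^n/dy^s).

    The Kronecker delta keeps only the m == n terms, so one sum over m is left.
    """
    jac = jacobian(transform, point)
    dim = len(point)
    return [[sum(jac[m][r] * jac[m][s] for m in range(dim))
             for s in range(dim)] for r in range(dim)]

g = metric(cartesian_from_polar, (2.0, 0.3))   # sample point r = 2, theta = 0.3
print(g)
```

At r = 2 the matrix comes out close to [[1, 0], [0, 4]], so the line element reads $ds^2 = dr^2 + r^2\,d\theta^2$, the familiar arc-length formula in polar coordinates.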
Recall the definition of a tensor: a tensor is something that is invariant under coordinate transformations. Therefore, if we have

$$W^x_{nm} = V^x_{nm}$$

in the x frame of reference, then this statement would be true in all frames of reference. For example, relating to our cow field, if the height at a point is 2 in one frame of reference, then it is 2 in all frames of reference, regardless of the placement of the reference point. Now, we just found how the Pythagorean theorem transforms in curved space. Sticking with this idea, let's see how tensors, specifically the derivatives of tensors, transform between coordinate systems. The result will be truly interesting, as you shall see.
Say we have a tensor $T^x_{mn}$ such that it is the derivative of some vector $V^x_m$.

$$T^x_{mn} = \frac{\partial V^x_m}{\partial x^n}$$

Now consider a tensor $T^y_{mn}$, which is this same tensor, but now in the y frame of reference. Our fundamental question is: does this tensor in the y frame of reference equal the derivative of the vector in the y frame of reference? Or, mathematically,

$$T^y_{mn} \overset{?}{=} \frac{\partial V^y_m}{\partial y^n}$$
The answer to this is no, for reasons that we will see. To show it, we essentially need a counterproof: we carry out the calculation, and if the two sides do not match, the equality is false. Consider equation (5), which shows how a tensor transforms between reference frames (using the covariant form):

$$T^y_{mn} = \frac{\partial x^r}{\partial y^m}\frac{\partial x^s}{\partial y^n}\,T^x_{rs}$$

Rewriting this, replacing the tensor on the right-hand side with the partial derivative of the vector, we get

$$T^y_{mn} = \frac{\partial x^r}{\partial y^m}\frac{\partial x^s}{\partial y^n}\frac{\partial V^x_r}{\partial x^s}$$

Notice that the second and third factors are actually the chain rule run in reverse, so we can contract them together to form one term:

$$T^y_{mn} = \frac{\partial x^r}{\partial y^m}\frac{\partial V^x_r}{\partial y^n}$$

Our question is whether this is equal to $\frac{\partial V^y_m}{\partial y^n}$. Summed up in one line,

$$T^y_{mn} = \frac{\partial x^r}{\partial y^m}\frac{\partial V^x_r}{\partial y^n} \overset{?}{=} \frac{\partial V^y_m}{\partial y^n}$$
Well, if we rewrite equation (5) for a single index, we get:

$$V^y_m = \frac{\partial x^r}{\partial y^m}\,V^x_r \quad\Longrightarrow\quad \frac{\partial V^y_m}{\partial y^n} = \frac{\partial}{\partial y^n}\left(\frac{\partial x^r}{\partial y^m}\,V^x_r\right)$$

The only thing we did was drop the s index from that equation, contracting, in a way, the tensor down to a vector. Now we have a derivative of a product, so we use the good ol' product rule!

$$\frac{\partial V^y_m}{\partial y^n} = \frac{\partial x^r}{\partial y^m}\frac{\partial V^x_r}{\partial y^n} + \frac{\partial^2 x^r}{\partial y^n\,\partial y^m}\,V^x_r$$

$$\frac{\partial V^y_m}{\partial y^n} = T^y_{mn} + \frac{\partial^2 x^r}{\partial y^n\,\partial y^m}\,V^x_r$$

You can see that we have our result, but with an additional term, and this combination of derivatives we call (in this simplified treatment) a Christoffel symbol, denoted as such:

$$\Gamma^r_{nm} = \frac{\partial^2 x^r}{\partial y^n\,\partial y^m}$$

So, we see that tensors which are derivatives of vectors do not truly transform invariantly under coordinate transformations, and that in doing this we get an extra term, which is the Christoffel symbol. Again, just as $g_{mn}$ was a correction to the Pythagorean theorem in curved space, $\Gamma^r_{mn}$ is a correction for the transformation of derivatives of tensors between coordinate systems! So, in conclusion, $T^y_{mn} \neq \frac{\partial V^y_m}{\partial y^n}$.
Here, we introduce some more notation, namely the covariant derivative. The covariant derivative is a very useful notation to adopt when studying general relativity because, as you just saw, ordinary derivatives of tensors do not transform invariantly. On the contrary, they transform with this gamma correction term stuck in the equation. To alleviate our major stresses with this correction term, we use the covariant derivative, which, when applied to a vector, does transform properly into any reference frame (yay!). The notation is as follows:

$$T^y_{mn} = \nabla_n V^y_m,$$

and this $\nabla_n$ is the covariant derivative. See, in this case, we don't need to write the correction term separately, because it is already contained inside the covariant derivative. Since the product rule gave $\frac{\partial V^y_m}{\partial y^n} = T^y_{mn} + \Gamma^r_{nm} V^x_r$, solving for $T^y_{mn}$ puts the correction on the other side with a minus sign. Summing this all up in a nice neat formula:

$$T^y_{mn} = \nabla_n V^y_m = \frac{\partial V^y_m}{\partial y^n} - \Gamma^r_{nm}\,V^x_r \qquad (7)$$

So what is this Christoffel symbol? Simply put, it is the compensation needed so that the derivative of a vector, as part of a tensor, transforms invariantly.
OK, this is great, but this is only how derivatives of vectors transform; we want to know how derivatives of tensors transform! Luckily, this is very easy: all we have to do is add one more gamma term for the extra index:

$$\nabla_p T_{mn} = \frac{\partial T_{mn}}{\partial y^p} - \Gamma^r_{pm}\,T_{rn} - \Gamma^r_{pn}\,T_{mr} \qquad (8)$$

With this knowledge, let's try to understand conceptually the answer to the following: what is $\nabla_r g^x_{mn}$, where $g^x_{mn}$ is of course the metric tensor and $\nabla_r$ is the covariant derivative with a dummy index r? Let's take two cases, flat space and curved space. We have already shown that the metric tensor is equal to the Kronecker delta in flat space: since the space is not curved, there need not be any correction, so the tensor reduces to just this constant delta term,

$$g^x_{mn} = \delta_{mn} = \begin{cases} 1 & \text{if } m = n \\ 0 & \text{if } m \neq n \end{cases}$$

And we all know (well, I hope we do) that the derivative of any constant, whether covariant, partial, or regular, is always 0. Therefore, in flat space, the covariant derivative of the metric is 0. In addition, if we know that this derivative is 0 in the x frame of reference that we were working with, we know that it is 0 in all frames of reference, because of the definition of a tensor! Thus, from equation (8), we must have:

$$\nabla_p g_{mn} = \frac{\partial g_{mn}}{\partial y^p} - \Gamma^r_{pm}\,g_{rn} - \Gamma^r_{pn}\,g_{mr} = 0$$

What have we just done, you may ask? This may look like just an obvious consequence of taking the covariant derivative of a constant, but it is more than that. We now have an equation containing the metric, the derivative of the metric, and the gamma correction term. What if we want to solve for gamma in terms of the metric and its derivatives? For this, we (and so did Einstein) go to our handy mathematicians so they can solve this for us! The result (brace yourself) gives us:

$$\Gamma^a_{bc} = \frac{1}{2}\,g^{ad}\left(\frac{\partial g_{dc}}{\partial x^b} + \frac{\partial g_{db}}{\partial x^c} - \frac{\partial g_{bc}}{\partial x^d}\right), \qquad (9)$$

where a, b, and c are indices that we choose, and d is the dummy index that is being summed over, I remind you, by the Einstein Summation Convention. A few remarks: you now have an equation for $\Gamma$ in terms of $g_{mn}$ and $\partial g_{mn}$. Also, $\Gamma$ itself is not a tensor, but rather a correction term that is made of partial derivatives. In addition, $\Gamma = 0$ in flat space described with rectangular coordinates, because there the metric is constant and $\Gamma$ is built from derivatives of the metric, so if those derivatives are 0, then $\Gamma$ is too. This will become important, because the Ricci tensor, the fundamental tensor for curvature, has Christoffel symbols buried inside it. Up next: the Ricci tensor!
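For a concrete feel for equation (9), the sketch below evaluates it numerically for the surface of a unit sphere. The metric used, $g = \mathrm{diag}(1, \sin^2\theta)$ in the coordinates $(\theta, \varphi)$, is the standard one for a unit sphere but is an assumption brought in from outside this derivation; the sample point and the finite-difference derivatives are likewise illustrative choices.

```python
import math

def sphere_metric(point):
    """Assumed metric of a unit sphere in coordinates (theta, phi)."""
    theta, _phi = point
    return [[1.0, 0.0], [0.0, math.sin(theta) ** 2]]

def inverse_2x2(g):
    """Inverse of a 2x2 matrix, giving the upper-index metric g^{ad}."""
    det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    return [[g[1][1] / det, -g[0][1] / det],
            [-g[1][0] / det, g[0][0] / det]]

def metric_derivative(metric, point, axis, h=1e-6):
    """Central-difference estimate of d g_ij / d x^axis, returned as a matrix."""
    up, down = list(point), list(point)
    up[axis] += h
    down[axis] -= h
    g_up, g_down = metric(up), metric(down)
    return [[(g_up[i][j] - g_down[i][j]) / (2.0 * h) for j in range(2)]
            for i in range(2)]

def christoffel(metric, point):
    """Equation (9): gamma[a][b][c] = 1/2 g^{ad} (d_b g_dc + d_c g_db - d_d g_bc)."""
    g_inv = inverse_2x2(metric(point))
    dg = [metric_derivative(metric, point, k) for k in range(2)]  # dg[k][i][j] = d_k g_ij
    return [[[0.5 * sum(g_inv[a][d] * (dg[b][d][c] + dg[c][d][b] - dg[d][b][c])
                        for d in range(2))
              for c in range(2)] for b in range(2)] for a in range(2)]

theta0 = 1.0                                   # sample latitude angle (assumed)
gamma = christoffel(sphere_metric, (theta0, 0.5))
print(gamma[0][1][1], -math.sin(theta0) * math.cos(theta0))
print(gamma[1][0][1], math.cos(theta0) / math.sin(theta0))
```

Each printed pair compares a numerical symbol with its known closed form on the sphere, $\Gamma^\theta_{\varphi\varphi} = -\sin\theta\cos\theta$ and $\Gamma^\varphi_{\theta\varphi} = \cot\theta$. Unlike the flat rectangular case, these do not vanish, because the metric changes from point to point.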
Consider a curved space (if you want to think of a cone, be my guest). Take a vector in this curved space and parallel transport this vector around the circumference of the curved space. Parallel transport is the act of moving a geometrical object (such as a vector in our case) along a smooth manifold without turning it. In flat space, if we carry a vector around a loop from a point A back to a point B that is the same as A, it will have the same magnitude and direction when it arrives, because they are intuitively the same point. Now we consider a curved space, like a cone. If we unwind the curved space into a flat one, A and B will be different points. If we parallel transport the vector starting at A, it will end at point B, but its direction will be much different because of the geometry of the cone. The deviation from the direction that the vector should have when there are no discontinuities is an angle, and this angle measures the curvature of the geometrical object. Now we must digress to explain a little more notation.
Definition: The commutator is $[A, B] \equiv AB - BA$. In the classical sense, where A and B are ordinary numbers, the commutator is always equal to 0, but quantum mechanically, for example when we consider spin operators, this is not the case. Thus, we say that spin operators do not commute.
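Now let us try

$$\left[\frac{\partial}{\partial x},\, f(x)\right]$$

and see the result. We will find that it does not actually equal 0, because we are working with partial derivatives.

$$\left[\frac{\partial}{\partial x},\, f(x)\right]V = \frac{\partial}{\partial x}\big(f(x)\,V\big) - f(x)\,\frac{\partial V}{\partial x}$$

For the former, we can use the product rule for differentiation:

$$= V\,\frac{\partial f(x)}{\partial x} + f(x)\,\frac{\partial V}{\partial x} - f(x)\,\frac{\partial V}{\partial x}$$

$$= V\,\frac{\partial f(x)}{\partial x}$$

Therefore,

$$\left[\frac{\partial}{\partial x},\, f(x)\right] = \frac{\partial f(x)}{\partial x}$$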
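The commutator result can also be checked numerically. In the sketch below, the choices f(x) = sin(x) and V(x) = exp(x), the sample point, and the finite-difference step are all assumptions made for illustration; any smooth functions would do.

```python
import math

def derivative(func, x, h=1e-5):
    """Central-difference estimate of d func / dx at the point x."""
    return (func(x + h) - func(x - h)) / (2.0 * h)

def commutator_applied(f, v, x):
    """([d/dx, f] V)(x) = d/dx (f V) - f dV/dx, each derivative taken numerically."""
    product = lambda t: f(t) * v(t)
    return derivative(product, x) - f(x) * derivative(v, x)

f = math.sin    # hypothetical choice of f(x)
v = math.exp    # hypothetical test function V(x)
x0 = 0.4        # sample point (assumed)

lhs = commutator_applied(f, v, x0)
rhs = derivative(f, x0) * v(x0)    # (df/dx) V, the claimed result
print(lhs, rhs)
```

The two printed values agree up to the accuracy of the finite differences, and neither is zero, so the derivative operator and multiplication by f(x) do not commute.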
