Sunteți pe pagina 1din 79

CALIFORNIA INSTITUTE OF TECHNOLOGY

PHYSICS MATHEMATICS AND ASTRONOMY DIVISION

Freshman Physics Laboratory (PH003)

VadeMecum for
Data Analysis Beginners
(October 3, 2007)

c
Copyright Virgnio
de Oliveira Sannibale, June, 2001

Acknowledgments
I started this work with the aim of improving the course of Physics Laboratory for Caltech freshmen students, the so called ph3 course . Thanks
to Donald Skelton, ph3 was already a very good course, well designed to
satisfy the needs of news students eager to learn the basics of laboratory
techniques and data analysis.
Because of the need of introducing new experiments, and new topics in
the data analysis notes, I decided to rewrite the didactical material trying
to keep intact the spirit of the course, i.e emphasis on techniques and not
on the details of the theory.
Anyway, I believe and hope that this attempt to reorganize old experiments and introduce new ones constitutes an improvement of the course.
I would like to thank, in particular, Eugene W. Cowan for his incommensurable help he gave to me with critiques, suggestions, discussions,
and corrections to the notes. His experience as professor at Caltech for
several years were really valuable to make the content of these notes suitable for students at the first year of the undergraduate course.
I would like to thank also all the teaching assistants that make this
course work, for their patience and valuable comments that I constantly
received during the academic terms.
Sincerely,
Virgnio de Oliveira Sannibale

Contents
1

Physical Observables
1.1 Random Variables and Measurements . . . . . . . . . . . . .
1.2 Uncertainties on Measurements . . . . . . . . . . . . . . . . .
1.2.1 Accuracy and Precision . . . . . . . . . . . . . . . . .
1.3 Measurement and Probability Distribution . . . . . . . . . .
1.3.1 Gaussianity . . . . . . . . . . . . . . . . . . . . . . . .
1.3.2 Gaussian Distribution Parameter Estimation for a Single Variable . . . . . . . . . . . . . . . . . . . . . . . .
1.3.3 Gaussian Distribution Parameter Estimation for the
Average Variable . . . . . . . . . . . . . . . . . . . . .
1.3.4 Gaussian Distribution Parameter Estimation for the
Weighted Average . . . . . . . . . . . . . . . . . . . .
1.3.5 Example (Unweighted Average) . . . . . . . . . . . .
1.3.6 Example (Weighted Average) . . . . . . . . . . . . . .

9
9
12
13
14
16

Propagation of Errors
2.1 Propagation of Errors Law . . . . . . . . . . . . .
2.2 Statistical Propagation of Errors Law (SPEL) . .
2.2.1 Example 1: Area of a Surface . . . . . . .
2.2.2 Example 2: Power Dissipated by a Circuit
2.2.3 Example 4: Improper Use of the Formula
2.3 Relative Uncertainties . . . . . . . . . . . . . . .
2.3.1 Example 1: . . . . . . . . . . . . . . . . .
2.4 Measurement Comparison . . . . . . . . . . . . .
2.4.1 Example . . . . . . . . . . . . . . . . . . .

21
21
22
23
23
24
24
25
26
26

.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.

17
17
18
18
19

Graphical Representation of Data


27
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
5

CONTENTS
3.2

.
.
.
.
.
.
.
.
.
.
.

29
29
30
31
34
34
35
35
36
36
36

Probability Distributions
4.1 Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
4.1.1 Probability and Probability Density Function (PDF) .
4.1.2 Distribution Function (DF) . . . . . . . . . . . . . . .
4.1.3 Probability and Frequency . . . . . . . . . . . . . . . .
4.1.4 Continuous Random Variable v.s. Discrete Random
Variable . . . . . . . . . . . . . . . . . . . . . . . . . .
4.1.5 Expectation Value . . . . . . . . . . . . . . . . . . . . .
4.1.6 Intuitive Meaning of the Expectation Value . . . . . .
4.1.7 Variance . . . . . . . . . . . . . . . . . . . . . . . . . .
4.1.8 Intuitive Meaning of the Variance . . . . . . . . . . .
4.1.9 Standard Deviation . . . . . . . . . . . . . . . . . . . .
4.2 Uniform Distribution . . . . . . . . . . . . . . . . . . . . . . .
4.2.1 Random Variable Uniformly Distributed . . . . . . .
4.2.1.1 Example: Ruler Measurements . . . . . . . .
4.2.1.2 Example: Analog to Digital Conversion . .
4.3 Gaussian Distribution (NPDF) . . . . . . . . . . . . . . . . . .
4.3.1 Standard Probability Density Function . . . . . . . .
4.3.2 Probability Calculaltion with the Error Function . . .
4.4 Exponential Distribution . . . . . . . . . . . . . . . . . . . . .
4.4.1 Random Variable Exponentially Distributed . . . . .
4.5 Binomial/Bernoulli Distribution . . . . . . . . . . . . . . . .
4.6 Poisson Distribution . . . . . . . . . . . . . . . . . . . . . . .
4.6.1 Example: Silver Activation Experiment . . . . . . . .

39
39
39
39
40

3.3

3.4
3.5

Graphical Fit . . . . . . . . . . . . . . . . . . . . . . . .
3.2.1 Linear Graphic Fit . . . . . . . . . . . . . . . .
3.2.2 Theoretical Points Imposition. . . . . . . . . .
Linear Plot and Linearization . . . . . . . . . . . . . .
3.3.1 Example 1: Square Function . . . . . . . . . . .
3.3.2 Example 2: Power Function . . . . . . . . . . .
3.3.3 Example 3: Exponential Function . . . . . . . .
Logarithmic Scales . . . . . . . . . . . . . . . . . . . .
3.4.1 Linearization with Logarithmic Graph Sheets
Difference Plots . . . . . . . . . . . . . . . . . . . . . .
3.5.1 Difference Plot of Logarithmic Scales . . . . .

.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.

.
.
.
.
.
.
.
.
.
.
.

41
41
41
42
43
43
43
45
45
45
46
47
48
49
50
51
52
54

CONTENTS
5

Parameter Estimation
5.1 The Maximum Likelihood Principle (MLP) . . . . . .
5.1.1 Example: and of a Normally Distributed
dom Variable . . . . . . . . . . . . . . . . . . .
5.1.2 Example: of a set of Normally Distributed
dom Variables . . . . . . . . . . . . . . . . . . .
5.2 The Least Square Principle (LSP) . . . . . . . . . . . .
5.2.1 Geometrical Meaning of the LSP . . . . . . . .
5.2.2 Example: Linear Function . . . . . . . . . . . .
5.2.3 The Reduced 2 (Fit Goodness) . . . . . . . . .
5.3 The LSP with the Effective Variance . . . . . . . . . .
5.4 Fit Example (Thermistor) . . . . . . . . . . . . . . . . .
5.4.1 Linear Fit . . . . . . . . . . . . . . . . . . . . .
5.4.2 Quadratic Fit . . . . . . . . . . . . . . . . . . .
5.4.3 Cubic Fit . . . . . . . . . . . . . . . . . . . . . .
5.5 Fit Example (Offset Constant) . . . . . . . . . . . . . .

7
57
. . . . 57
Ran. . . . 58
Ran. . . . 59
. . . . 60
. . . . 61
. . . . 61
. . . . 62
. . . . 63
. . . . 64
. . . . 65
. . . . 66
. . . . 67
. . . . 69

A Central Limit Theorem

71

B Statistical Propagation of Errors

73

C NPDF Random Variable Uncertainties

75

D The Effective Variance

77

CONTENTS

Chapter 1
Physical Observables
1.1

Random Variables and Measurements

During an experiment, the result of a measurement of a physical quantity1


x, (a number or a set of numbers) is always somewhat indeterminate. In
other words, if we repeat the measurement we can get a different result.
Apart from any philosophical point of view, the reasons for this indetermination can be explained considering that we are able to control
or measure just a few of the physical quantities involved in the experiment and we dont completely know the dependency of each one of them.
Moreover, all those variables can change with time and it becomes impossible to measure their evolution.
Fortunately and quite often, this ignorance does not preclude measurement with the required precision.
Lets consider as example, a physical system ( see figure 1.1) made of a
thermally isolated liquid, a heater, and a paddle wheel turning at constant
velocity. Lets assume that we want to measure the average liquid temperature versus time using a mercury thermometer as instrument. Lets then
try to list some of the potential perturbations mechanisms that can affect
the measurement:
During the measurement process the liquid temperature changes not
uniformly because the system is not perfectly isolated and looses
heat.
1 Any

measurable quantity is a physical quantity.

10

CHAPTER 1. PHYSICAL OBSERVABLES


Current
Motor
Thermometer

1111111111111111111111
0000000000000000000000
0000000000000000000000
1111111111111111111111
0000000000000000000000
1111111111111111111111
0000000000000000000000
1111111111111111111111
0000000000000000000000
1111111111111111111111
0000000000000000000000
1111111111111111111111
0000000000000000000000
1111111111111111111111
0000000000000000000000
1111111111111111111111
0000000000000000000000
1111111111111111111111
0000000000000000000000
1111111111111111111111
0000000000000000000000
1111111111111111111111
0000000000000000000000
1111111111111111111111
0000000000000000000000
1111111111111111111111
0000000000000000000000
1111111111111111111111
0000000000000000000000
1111111111111111111111
0000000000000000000000
1111111111111111111111
Paddlewheel
0000000000000000000000
1111111111111111111111
0000000000000000000000
1111111111111111111111
0000000000000000000000
1111111111111111111111
0000000000000000000000
1111111111111111111111
Liquid
0000000000000000000000
1111111111111111111111
0000000000000000000000
1111111111111111111111
0000000000000000000000
1111111111111111111111
0000000000000000000000
1111111111111111111111
Dewar
0000000000000000000000
1111111111111111111111
0000000000000000000000
1111111111111111111111
0000000000000000000000
1111111111111111111111
0000000000000000000000
1111111111111111111111
0000000000000000000000
1111111111111111111111
Figure 1.1: Example of a physical system under measurement , i.e a variant of the
Joules Experiment (1845). The isolated liquid is heated up by the paddle wheel
movement driven by an electric motor. The mercury thermometer measures the
temperature changes.

During the measurement the liquid is irregularly heated up because


of the location of the paddle wheel.
The measurement should be taken when the liquid and the instrument are in thermal equilibrium (same temperature, no heat exchange),
and this cannot really happen because the liquid is heated up and
because of the thermometer heat capacity. In other words, the temperature is changing constantly impeding the thermal equilibrium.
The instrument reading is affected by the parallax error, i.e. the position of the mercury column cannot be accurately read.
The accuracy of the thermometer scale divisions is not perfect, i.e.
the scale calibration is not perfect (the scale origin and/or the average divisions distance is not completely right, the divisions dont
have the same distance, et cetera...).
Liquid currents produced by the paddle wheel movement are turbulent (chaotic) affecting the uniformity of the liquid temperature.

1.1. RANDOM VARIABLES AND MEASUREMENTS

11

The heat flow due to the paddle wheel movement is not completely
constant and is affected by small unpredictable fluctuations. For example, measuring the current flowing through the electric motor we
see small random fluctuations around a constant value.
Some of those perturbations are probably completely negligible (the instrument is unable to see them), some others can be estimated and minimized, and some others cannot2 .

Disturbance
Disturbance

111111111111111
000000000000000
Excitation
000000000000000
111111111111111
000000000000
111111111111
000000000000000
111111111111111
000000000000
000000000000000
111111111111111
x+ 111111111111
000000000000
111111111111
000000000000000
111111111111111
000000000000
111111111111
Ideal Physical
000000000000000
111111111111111
Instrument
000000000000
111111111111
000000000000000
111111111111111
System

000000000000
111111111111
000000000000000
111111111111111
000000000000
111111111111
000000000000000
111111111111111
000000000000
111111111111
000000000000000
111111111111111
Disturbance
Response
000000000000000
111111111111111
1

Ambient

Figure 1.2: Model of a physical system under measurement.

Figure 1.2 shows a quite general model of the experimental situation.


The physical system as an ideal system follows a known theoretical model.
The instrument interacting with the system allows us to measure the physical quantity x. External unpredictable disturbances 1 , 2 , ..., n , ... perturb the system and the instrument. The instrument itself perturbs the
physical system, when we perform the measurement.
These considerations can be incorporated by a simple but widely used
linear model for any physical quantity x. If we call x (t) the value of a
2 One can argue that we are trying to measure something with something worse than a

kludge. In other words, if we want the average liquid temperature, we need for example
a more sophisticated apparatus that allows us to map and average very accurately the
liquid temperature. Anyway, what we can do is only minimize perturbations, but never
get rid of them.

12

CHAPTER 1. PHYSICAL OBSERVABLES

physical quantity with no disturbances at the time t, and (t) its random
fluctuations, the measured value of x at the time t will be
x ( t ) = x ( t ) + ( t ).
Any physical quantity x is indeed a random variable or a stochastic variable.

1.2

Uncertainties on Measurements

We can distinguish types of uncertainties based on their nature, i.e. uncertainties that can be in principle eliminated, and uncertainties that cannot
be eliminated.
Starting from this criterion, we can divide the source of uncertainties
also called errors into two categories:
random errors: any errors which are not or do not appear to be directly connected to any cause (the cause and effect principle doesnt
work), and are indeed not repeatable but random. Random errors
cannot be completely eliminated.
systematic error: any errors in the measurement which are not random.
Quite often, this kind of error algebraically adds to the measurement
a constant unknown value. This value can also change/drift with
time.
A typical systematic error comes from a wrong calibration of the instrument used. This kind of error is hard to minimize and quite often
difficult to detect. Sometimes, it can be found by repeating the measurement with different procedures and/or instruments.
We can have also systematic errors due to the measurement procedure or definition. Lets consider as example, the measurement of
the thickness of an elastic material. Lack in the procedure definition: measurement with a micrometer without defining the instrument applied pressure, the temperature, the humidity, etc...
Lack in the procedure execution: drift of physical quantities supposed to be stationary such as pressure, temperature, etc...
Systematic errors in principle can be completely removed.

1.2. UNCERTAINTIES ON MEASUREMENTS

x1

x*

13

x2

Figure 1.3: Accuracy and precision comparison. The gray strips represent
the uncertainty associated with the measurement xi , x represent the value
of the measured physical quantity with no perturbations. Measurement
x1 is more accurate but less precise than measurement x2 .

Figure 1.4: Accuracy and precision comparison. The shooter of the target
on the left is clearly more precise that the shooter of the right target. Anyway, the former shooter is more accurate than the other shooter. Which
shooter would you like to have as your bodyguard?

1.2.1

Accuracy and Precision

Here we explain two definitions that allow to compare different measurements and establish their most important relative qualities:
Accuracy: a measurement is said to be accurate if it is not affected by
systematic errors. This characteristic does not preclude the presence
of small or large random error.
Precision: a measurement is said to be precise if it is not affected by
random errors. This characteristic does not preclude the presence of
any type of systematic errors.

14

CHAPTER 1. PHYSICAL OBSERVABLES

Even if those properties definitions are absolute, real measurements can


only approximate the concept of precision and accuracy and therefore we
can only establish if one measurement is more accurate or precise than
others.
Lets analyze the two examples shown in Figures 1.3, and 1.4. In Figure
1.3, for some mysterious reasons we know the unperturbed value of the
physical quantity to be x ; then we can conclude that measurement x2
is more precise but less accurate than measurement x1 , and x1 is more
accurate but less precise than x2 . In Figure 1.4, the left shooter is quite
precise but not accurate as the right shooter, and the right shooter is more
accurate but less precise than the left shooter.
In general, accuracy and precision are antithetic properties. A measurement with very large random errors can be extremely precise because the
systematic error is negligible with respect to the random error. An analogous statement can be formulated for extremely accurate measurements.

1.3

Measurement and Probability Distribution

It is experimentally well known (or accepted) that physical quantities are


random variables, whose distribution follows quite often, and with good
precision the so called Gaussian or Normal Distribution3 . In other words,
if we are very good to keep the experimental conditions unchanged, we
perform an experiment several times, and then histogram the results we
will probably find a bell-like curve which is proportional to the Gaussian
probability density function.
The Gaussian distribution
p( x ) =

1
2

( x )2
22

is a continuous function (see figure 1.5) with one peak, symmetric around
a vertical axis crossing the value , and with tails exponentially decreasing
to 0. It has therefore one absolute maximum at x = . The peak width is
defined by the parameter .
3 In

general, statistics are not able to provide a necessary and sufficient test to check if
a random variable follows the Gaussian distribution (Gaussianity)[4].

1.3. MEASUREMENT AND PROBABILITY DISTRIBUTION

15

Pobability Density (1/A.U.)

0,4

0,3

0,2

2
0,1

0
-4

2
2
Normally distributed physical quantity (A.U.)

Figure 1.5: Probability density of a physical quantity following the


Gauss/Normal distribution. The x axis units are normalized to , i.e. the
square root of the variance of the distribution. The hashed area represents
a probability of 68.3%, which correspond to the probability of having x in
the symmetric interval ( x , x + ).
The probability dP to measure a value in the interval ( x, x + dx ) is
dP( x ) =

1
2

( x )2
22

dx ,

and in general the probability to measure a value in the interval ( a, b) is


P( a < x < b) =

Z b
a

1
2

( x )2
22

dx .

The most probable value lies indeed inside an interval centered around
, and the probability to have a value in the interval ( , + ) is 68.3%
and is represented by the dashed area of figure 1.5. The statistical result of
a measurement with a probability/confidence level of 68.3% is written as
x = ( x0 )Units,

[68.3% confidence]

16

CHAPTER 1. PHYSICAL OBSERVABLES

The half width of the interval is the uncertainty, or the experimental


error, or simply the error that we associate to the measurement of x.
Quite often, the Gaussian distribution parameters are unknown, and
the problem arises to experimentally estimate and . The theory of
statistics comes into play to give us the tools to estimate parameter distributions, which allow us to define the uncertainty of the measurement.
The basic idea of statistics is to use a large number of measurements of
the physical quantity (samples) to estimate the parameters of the distribution. In the next paragraphs we will just state some basic results with
some naive but quite intuitive explanations. They will be studied in more
detail in chapter 5.

1.3.1

Gaussianity

As it has been said before, when we perform measurements physical quantities arise that behave as random variables following, to a good approximation, the NPDF.
This experimental evidence is theoretically corroborated by the so called
central limit theorem (see appendix A). Under a reasonably limited number
of hypothesis, this theorem states that the average x of any random variable x is a random variable normally distributed, when the number of
averages tends to infinity. Often, the measurement of a physical quantity
is the result of a intentional or unintentional average of several measurements and therefore tends to follows the Gaussian distribution.
Deviations from a Gaussian distribution (gaussianity) are quite often
time dependent. In other words, a physical quantity behaves as a Gaussian random variable for a given period of time. This happens mainly
because it is always difficult to keep the experimental conditions constant
and controlled during the time needed to perform all the measurements.
Sudden or slow uncontrolled changes of the system can easily modify the
parameters or the PDF of the physical quantity we are measuring.
Anyway, it is important to stress that the gaussianity of a random variable or more in general the type of PDF should be always investigated.

1.3. MEASUREMENT AND PROBABILITY DISTRIBUTION

1.3.2

17

Gaussian Distribution Parameter Estimation for a Single Variable

Lets suppose that we measure a normally distributed physical quantity


x N times, obtaining the results x1 , x2 , ..., x N . If no systematic errors are
present, it is probable that the average x
x1 + x2 + ... + x N
N
becomes closer to the value , when the number of measurement N increases4 . We can assume indeed that the average x is the so called estimator of . To distinguish between the parameter and its estimator we will
use the symbol ^, i.e.
x =

= x,

' .

Averaging the square of the distance of each single measurement xi


from , we will have

( x1 x )2 + ( x2 x )2 + ... + ( x N x )2
.
N
Because we are averaging the distance squared between the theoretical
and experimental data-points, it is reasonable to assume that the squareroot of this value is an estimator of the uncertainty of each single measurement xi . A rigorous approach shows that is an estimator of the of the
distribution, and a more rigorous approach will show that an even better
estimator is the sum of the square of the distances divided by N 1, i.e.
2 =

2 =

1.3.3

N
1
( xi )2 ,
N 1 i
=1

' .

Gaussian Distribution Parameter Estimation for the


Average Variable

The average x of a normally distributed variable x, is a random variable


itself, and it must follow the normal distribution. What is the x of the
normal distribution associated to x ? It can be proved that
4 It

is possible and probable also to have a single measurement much closer to than
the average, but this does not help to find an estimate of .

18

CHAPTER 1. PHYSICAL OBSERVABLES

x2 =

1.3.4

2
N

x ' x .

Gaussian Distribution Parameter Estimation for the


Weighted Average

Lets suppose now that N measurements of the same physical quantity


x1 , x2 , ..., x N , are still normally distributed, but each measurement has known
2.
variance 12 , 22 , ..., N
Considering that the variance is a statistical parameter of the measurement precision, to calculate the average we can use the i as weight for
each measurement, i.e.
x =

1
iN=1 wi

wi x i .

weighted average

i =1

where
wi =

1
,
i2

i = 1, 2, ..., N .

The uncertainty associated with the weighted random variable x is


x2 '

1
iN=1 1/i2

It is left as exercise to check what happen to the previous equation


when 1 = 2 = ... = N .
To try to digest these new fundamental formulas, Lets see two examples.

1.3.5

Example (Unweighted Average)

The voltage difference V across a resistor directly measured 10 times, gives


the following values:
i
Vi [mV]

1
123.5

2
125.3

3
124.1

4
123.9

5
123.7

6
124.2

7
123.2

8
123.7

9
124.0

10
123.2

1.3. MEASUREMENT AND PROBABILITY DISTRIBUTION

19

The voltage difference average V is, indeed


1 10
V =
Vi = 123.880 mV
10 i
=1
Assuming that V is a random variable normally distributed, we will
have that uncertainty sv on each single measurement Vi is
v
u
u
sV = t

10
1
(Vi V )2 = 0.6070 mV.
10 1 i
=1

and the uncertainty on V will be


sV
sV = = 0.1919 mV.
10
Finally, we will have5
V1 = (123.5 0.6)mV, V2 = (125.3 0.6)mV, ..., V10 = (123.2 0.6)mV,
V = (123.880 0.192)mV.

1.3.6

Example (Weighted Average)

The reflectivity R 6 of a dielectric mirror, measured with 5 sets of measurements , gives the following table

i
Ri
si

1
0.4932
0.0021

2
0.4947
0.0025

3
0.4901
0.0032

4
0.4921
0.0018

5
0.4915
0.0027

5 The cumbersome notation of each single measurement is used here to avoid any kind

of ambiguity. Whenever possible results should be tabulated.


6 The reflectivity of a dielectric mirror for a polarized monochromatic light of wavelength , with a incident angle can be defined as the ratio of the reflected light power
and the impinging power.

20

CHAPTER 1. PHYSICAL OBSERVABLES

Assuming that R is a normally distributed random variable, we will


have that the uncertainty s R on the weighted average R is
s
1
= 0.001037.
s R =
5
i=1 1/s2i
The weighted average R
R = s2R

i =1

Ri
= 0.492517
s2i

Finally, we will have


R = 0.49252 0.00103.

Chapter 2
Propagation of Errors
2.1

Propagation of Errors Law

f(x0)+df

f(x 0)
f(x 1) +df
f(x1)

x1

x1+dx

x0

x0+dx

Figure 2.1: Variation of f ( x ) at two points x0 , and x1 . The derivative accounts for the difference in magnitude variation of the function f ( x ) at
different points.
We want to find a method to approximate the uncertainty of a physical
quantity f , which has been indirectly determined, i.e f is a function of a a
set of random variables that are physical quantities.
To focus the problem it is better to consider the case of f as a function
21

22

CHAPTER 2. PROPAGATION OF ERRORS

of a single variable x with uncertainty x . The differential


 
df
df =
dx
dx x= x0
represents the variation of f for the corresponding variation dx around x0
(see figure 2.1). Imposing
dx = x ,
we can interpret the following expression as an estimation of the uncertainty on f

df
f ' x ,
dx x0
where x0 is the measured value. The absolute value is calculated to take
into account a possible negative sign of the derivative. This formula can be
easily extended, considering the definition of the differential of a function
of n variables x1 , x2 , ..., xn , i.e.






f
f
f
x +
x ... +

f '
(2.1)
xn xn .
x1 1 x2 2
The previous expression is called the law of propagation of errors, which
gives an estimation of the maximum uncertainty on f for a given set of
uncertainties x1 , ..., xn . The derivatives are calculated using the measured
values of the physical quantities x1 , x2 , ..., xn .
This law is rigorously exact (but not statistically correct as shown in the
next section) in the case of a linear function, where the Taylor expansion
coincides with the function itself.
This formula is quite useful during the design stages of an experiment.
In fact, because of its linearity it is easy to apply and estimate the contributions to the error of measured physical quantities. Moreover, this estimate
can be used to minimize the error and optimize the measurement (see example 2.3.1)

2.2

Statistical Propagation of Errors Law (SPEL)

A more orthodox approach, which starts from a Taylor expansion formula


of the variance, gives the statistically correct formula for the propagation

2.2. STATISTICAL PROPAGATION OF ERRORS LAW (SPEL)

23

of errors (see appendix B) . For the case of a function f of two random


variables x, and y and the two data points ( x0 x0 ), (y0 y0 ), we have
2f

f
x

2

x20

f
y

2

y20

+2

f
x



f
y


E [( x x0 ) (y y0 )]

where the partial derivatives are calculated at ( x0 , y0 ).


This expression is called the law of statistical propagation of errors (SPEL).
In the special case of uncorrelated random variables, i.e. variables having
independent PDF, the previous equation becomes
2f

f
x

2

x20

f
y

2

y20 .

The most general expression for SPEL and its derivation can be found
in appendix B.

2.2.1

Example 1: Area of a Surface

Lets suppose that the area A of a rectangular surface having side lengths
a and b is indirectly measured by measuring the sides. We will have
A = ab,

2
' b2 a2 + a2 b2 .
A

If the surface is a square, we could write


A = a2 ,

2
A
' 2a2 a2 ,

which implies that we are assuming that the square is perfect, and it is
sufficient to measure one side of the square.

2.2.2

Example 2: Power Dissipated by a Circuit

Suppose we want to know the uncertainty on the power


P = V Icos ,
dissipated by an AC circuit where V and I are the sinusoidal voltage and
current of the circuit, and the phase difference between the V and I.

24

CHAPTER 2. PROPAGATION OF ERRORS

Applying the SPEL and supposing that there is no correlation among


the variables, we will have
P2 ' ( Icos)2 V2 + (Vcos)2 I2 + (V Isin)2 2

 
V 2  I 2
2
2
= P
+
+ (tan ) .
V
I

(2.2)
(2.3)

Considering the following experimental values with the respective estimates of their expectation values and uncertainties, we get

V = (77.78 0.71)V,

P = (90.37 5.39)W.
I = (1.21 0.071)A,

= (0.283 0.017)rad,

2.2.3

Example 4: Improper Use of the Formula




Let consider a random variable x which follows a NPDF with

E[ x ] = 0,
V [ x ] = ,

and a function f of x, f ( x ) = x2 .
Applying the SPEL to f ( x ), we obtain
f = [2x ] x=0 x = 0,
which leads to a wrong result, because this approximation (see eq. B.1
in Appendix B) is not legitimate. In fact, there is no need to expand the
function up to the second order term to understand that the second order
expansion (i.e. the function itself) is not negligible at all.
Considering the definition of variance instead, and with the aid of the
integration by parts formula, we get the correct result
f = V [ f ( x )] = V [ x2 ] = 2.

2.3

Relative Uncertainties

Let f be a physical quantity and f its uncertainty. The ratio


f
f =
f

2.3. RELATIVE UNCERTAINTIES

25

is said to be the relative uncertainty or fractional error of the physical


quantity f .
The importance of this quantity arises mainly, when we have to analyze the contribution of each uncertainty to the uncertainty of the physical
quantity f .
Expression (2.1) becomes quite useful for a quick estimate of the relative uncertainty .

2.3.1

Example 1:

We want to measure the gravity constant g with a precision of 0.1% ( g =


.001) using the equation of the resonant angular frequency of a simple pendulum of length l, i.e.
r
0 =

g
.
l

Applying the logarithm in the previous expression, and making the


derivative of both sides , we get
1
0
=
0
2

g l
+
g
l


.

Considering that uncertainties cannot exactly cancel out, we will have


g = 2

0 l
+ ,
0
l

which means that the contribution of the uncertainty on 0 and on l are


linear and different just by a factor two.
Supposing that we are able to measure 0 within less than 0.1%, we
have to be able to measure l at least with the same precision to guarantee
the required precision . If we have
l = 1m

l < 0.001m.

This formula tells us that the accuracy on the knowledge of l must be


smaller than 1mm to measure g within 0.1%.

26

2.4

CHAPTER 2. PROPAGATION OF ERRORS

Measurement Comparison

We can use the SPEL to determine if two different measurements of the


same physical quantity x are statistically the same. Lets suppose that the
following measurements
x1 x1 ,

x2 x2 ,

are two independent measurements of the physical quantity x. The difference and the uncertainty of the difference will be, indeed
q
x = | x2 x1 |,
x = x21 + x22 .
We can assume as test of confidence that the two measurements are
statistically the same if
x < 2x .

2.4.1

Example

Suppose that the following measurements


I1 = (4.398 1.256) 102 kg m2
I2 = (4.431 1.324) 102 kg m2
are two measurements of the moment of inertia of a cylinder. We will have
I = (0.033 1.825) 102 kg m2 ,
which shows that I = | I2 I1 |, is less than 2I . Statistically, the two
measurements must be considered two determinations of the same physical quantity.

Chapter 3
Graphical Representation of Data
3.1

Introduction

A good way to analyze a set of experimental data, and review the results,
is to plot them in a graph. It is important to provide all the information
necessary to correctly and easily read the graph. The choice of the proper
scale and the type of scale is also important.
For a reasonably good understanding of a graph, the following information should be included:
a title,
axis labels to define the plotted physical quantities,
the physics units of the plotted physical quantities,
a dot corresponding to each experimental point, and error bars or
error rectangles,
graphical analysis made on the graph, in particular the data-points
used clearly labeled,
a legend if more than one data set is plotted.
Figure 3.1 shows an example of a graph containing all the information
needed to properly read the plot.
Judgment of the curve fit goodness is quite often done by inspection
of the graph, the data points and the theoretical fitted curve, or better, by
analyzing the so called difference plot.
27

28

CHAPTER 3. GRAPHICAL REPRESENTATION OF DATA

Current -Voltage Characteristic for a Carbon Resistor

12

y(x)=ax+b

a=(996.3+-12.9)

Voltage difference across the Resistor, (V)

10

b=(-0.103+-0.094)mA

0
0

Current through the Resistor, (mA)

Figure 3.1: Example of Graph.

10

12

3.2. GRAPHICAL FIT

3.2

29

Graphical Fit

Nowadays, graphic curve fitting of experimental data can be considered


a romantic or nostalgic way to obtain an estimate of function parameters.
Anyway, we might still face a situation where the omnipresent computer
cannot be accessed to perform a statistical numerical fit. Moreover, drawing a graph with data points is an exercise with a lot of pedagogical value,
and therefore deserves to be studied.
Statistical curve fit techniques will be explained in chapter 5.

3.2.1

Linear Graphic Fit

Lets consider the case of the fit of a straight line


y = ax + b

(3.1)

where the two parameters, the slope a, and the intercept b must be
graphically determined.
Lets assume that we are able to trace a straight line, which fits the
experimental points reasonably well. Considering then, two points A =
( x1 , y1 ), and B = ( x2 , y2 ) belonging to the straight line, eq. (3.1), and some
trivial algebra, we will have the two estimators of a, and b
a =

y2 y1
,
x2 x1

x2 y1 x1 y2
,
b =
x2 x1

x1 < x2 .

Anyway, because of the previous assumption we still need an objective


criterion to trace the straight line.
If we draw a rectangle centered on each data point, having sides corresponding to the data-point uncertainties, we will obtain a plot similar to
that shown in figure 3.2. Using a ruler, and applying the following two
rules we are able to trace two straight lines with maximum and minimum
slopes:
The maximum slope is such that if we try to draw a straight line with
steeper slope, not all the rectangles will be intercepted,
The minimum slope is such that if we try to draw a straight line with
less steep slope, not all the rectangles will be intercepted,

30

CHAPTER 3. GRAPHICAL REPRESENTATION OF DATA

We will then have two estimations of a, and b, whose averages will give
estimated values of the slope and the intercept. Their semi-differences will
give the maximum uncertainties associated with them
a max + a min
,
2
b max + b min
b =
,
2

a =

a max a min
,
2
b max b min
b =
.
2
a =

(3.2)
(3.3)

Figure 3.2 shows an example of graphic fit applying the above-stated


max/min slope criteria. Using the data point A, B, C, and D we obtain
12.4 (0.9)
1.0
= 1.108k
a min = 12.0
13.50.0 = 0.815
12.0 0.0
12.0 (0.9) 0.0 12.4
1.00.012.0
= 1.0V
b max =
= 0.9V b min = 13.513.5
0.0
12.0 0.0
a max =

and finally
a = (1.0 0.1)k
b = (0 1)V
If we cannot have all the points intercepted by a straight line, and we
really need to give some numbers for the slope and intercept, we could
use this additional but very subjective thumb rule:
The maximum and the minimum slope straight lines are those lines,
which make the straight line computed using eq.s.(3.2) and (3.3) intercept at least 2/3 of the rectangles.
This rule tries to empirically take into account the results of statistics,
when applied to a curve fit. It is indeed better to use a statistical fitting
methods as explained in chapter 5.

3.2.2

Theoretical Points Imposition.

Imposing theoretical points to the fit curve implies that we are assuming
that the uncertainty of each experimental point is not dominated by any

3.3. LINEAR PLOT AND LINEARIZATION

31

systematic error (which is negligible compared to random errors). In fact,


if we have a systematic error, the theoretical points will be not be aligned
properly with the experimental points. This is likely to cause a large systematic error on the parameter estimation, especially in the case of statistical fits.
Figure 3.3 shows an examples of graphic fit that imposes a zero point
crossing on the max/min straight lines.
Using the data-points A and B we obtain
a max =

12.1
= 1.008k
12.0

a min =

12.0
= 0.916
13.1

and finally
a = (0.96 0.05)k
Statistically, this new measurement of a agrees with the previous one
within their uncertainty. Which measurement is more accurate is difficult
to say. A statistical analysis can reduce the uncertainty, giving a more
precise measurement.

3.3

Linear Plot and Linearization

The graphical fitting methods for straight lines can be extended to apply
to non-linear functions through the so called process of the linearization.
In other words, if we have a function, which is not linear we can apply
functions to linearise it.
We can mathematically formulate the problem in the following terms.
Lets suppose that
y = y( x; a, b)
is a non-linear function, with two parameters a and b. If we can find transformations
Y = Y ( x, y),
X = X ( x, y),
that allow us to write the following relation
Y = XF ( a, b) + G ( a, b),
where F, and G are known expression that only depend on a, and b, then
we have linearized y. Once the F and G values are found with graphical

32

CHAPTER 3. GRAPHICAL REPRESENTATION OF DATA


Voltage-Current Characteristic for a Carbon Resistor

12

Voltage Difference Across the Resistor (V)

10

8
Exp. Points
Limiting the
Min Slope

Exp. Points
Limiting the
Max Slope

A=( 0.0mA, 1.1V)

B=(13.5mA, 12.0V)

C=( 0.0mA, -0.9V)


D=(12.0mA, 12.4V)

C
-2
0

10

12

14

Current through the Resistor (mA)

Figure 3.2: Maximum an minimum slope straight lines intercepts all the
experimental points. If we try to draw a straight line that is steeper than
the maximum slope line, some of the points will not be intercepted; the
same is true for the minimum slope line, if we try to draw a line with a
slope that is less steep.

3.3. LINEAR PLOT AND LINEARIZATION

33

Voltage-Current Characteristic for a Carbon Resistor

Exp. Point
Limiting the
Max. Slope

12

Voltage Difference Across the Resistor (V)

10

8
Exp. Point
Limiting the
Min Slope

A=(12.0mA, 12.1V)

B=(13.1mA, 12.0V)

-2
0

10

12

14

Current through the Resistor , (mA)

Figure 3.3: Maximum an minimum slope straight lines with zero crossing
point imposed. The comments in the previous graphs also apply to this
figure.

34

CHAPTER 3. GRAPHICAL REPRESENTATION OF DATA

or numerical methods a and b can be found by inversion of F and G. The


uncertainties on a and b can be calculated using the SPEL. Quite often,
the complexity of inverting F and G can make the linearization method
impractical.
Sometimes the linearization of a function can be achieved with nonlinear scales, as shown in the next sections.

3.3.1

Example 1: Square Function

Lets suppose that

y = ax2 .

If we define (the two functions to linearise the equation)


X ( x ) = x2 ,

Y (y) = y

then we will have the linear function


Y ( X ) = aX,
that can be plotted in a linear graph. The parameter a is now the slope of a
straight line, and it can be estimated using the method already explained.

3.3.2

Example 2: Power Function

If we have

y = bx a ,

(3.4)

applying the logarithm to both sides of the previous expression, we will


have
log y = a log x + log b.
If we then define the following functions
Y = log y,

X = log x,

we will have the linear function


Y = aX + log b.
In this case the slope a and the intercept b of the straight line are respectively the exponent and the coefficient of function y.

3.4. LOGARITHMIC SCALES

3.3.3

35

Example 3: Exponential Function

If we have
y = be ax ,
applying the logarithm to the right-hand sides, we have
log y = ax + log b.
If we define the following functions Y = log y,
finally have the linear function to plot

X = x, we will

Y = aX + log b.

3.4

Logarithmic Scales

A logarithmic scale is an axis whose scale is proportional to the logarithm


of the plotted value. The base of the logarithm is arbitrary for logarithmic
scales, and for the sake of simplicity we will assume base 10.
For example, if we plot on a logarithmic scale the numbers log10 0.1 =
1, log10 1 = 0,and log10 10 = 1, they will be exactly equally-spaced. In
general any multiple of power of 10 will be also equally-spaced. This
distance is a characteristic of the logarithmic scale sheet and is called a
decade.
One of the main advantages of using logarithmic scales is that we are
able to plot ranges of several orders of magnitude on a manageable sheet.
The inconvenience is the great distortion of the plotted curves, which sometimes can lead to a wrong interpretation of the results. This distortion becomes small when the range amplitude of the scale x = (b a), is smaller
than the magnitude a of the range. In fact, if x  a, we will have

x
x
x
log( a + x ) = log a(1 +
) = log( a) + log(1 +
) ' log( a) +
,
a
a
a


and the logarithmic scale is then approximated well by the linear scale.
Another advantage of using logarithmic scales is for the linearization
of functions as briefly discussed in the next subsection.

36

CHAPTER 3. GRAPHICAL REPRESENTATION OF DATA

3.4.1

Linearization with Logarithmic Graph Sheets

There are essentially two cases that can be linearized using logarithmic
graph sheets:
If the experimental points follow a power law ( y = ax b ) we will obtain
a straight line if we plot y and x on a logarithmic scale.
If the experimental points follow an exponential law ( y = ab x , as per
the linearization procedure of example 2 ), we will obtain a straight line if
we plot the logarithm of y versus the logarithm of x.

3.5

Difference Plots

The ability to see how data points scatter from the theoretically fit curve in
a graph is quite important for assessing the quality of the fit. Quite often, if
we plot experimental points and the fit curve together it becomes difficult
to appreciate and analyze the difference between the experimental points
and the curve. In fact, if the measurement range of a physical quantity is
greater than the average distance between the experimental points and the
curve, the data points and the curve become indistinguishable.
One way to avoid this problem is to produce the so called difference
plot, i.e the plot of the difference between the theoretical and the measured
points. In a difference plot, the goodness of the fit and/or the poorness of
the theoretical model can be better analyzed.

3.5.1

Difference Plot of Logarithmic Scales

The difference of an experimental point y0 and a theoretical point y in a


logarithmic scale is the dimensionless quantity
 
y
= log(y) log(y0 ) = log
.
y0
Supposing that
y = y0 + y,

y
 1,
y0

and considering the first order expansion


log(1 + e) ' e

e1

3.5. DIFFERENCE PLOTS

37

we get


y
= log 1 +
y0

'

y
y y0
.
y0
y0

Assuming small deviations between experimental and theoretical data,


a difference plot with a vertical logarithmic scale is the relative uncertainy
plot.

38

CHAPTER 3. GRAPHICAL REPRESENTATION OF DATA

Chapter 4
Probability Distributions
This chapter reports the basic definitions describing the probability density functions (PDF) and some of the frequently used distributions with
their main properties. To comprehend the next chapters it is important to
become familiar mainly with the first section, where some new concepts
are introduced. The understanding of each distribution is required since
they will be used in the next chapters.

4.1
4.1.1

Definitions
Probability and Probability Density Function (PDF)

Lets consider a continuous random variable x. We define the probability


that x assumes a value in the interval ( a, b) the following integral
P{ a < x < b} =

Z b
a

p( x )dx,

where p( x ) is a function, which is called probability density function (PDF)


of the random variable x. From the previous definition p( x )dx represents
the probability of having of x in the interval ( x, x + dx ). It is worth noticing
that in general p( x ) has dimensions [ p( x )] = [ x 1 ].

4.1.2

Distribution Function (DF)

The following function


39

40

CHAPTER 4. PROBABILITY DISTRIBUTIONS

F(x) =

Z x

p( x 0 )dx 0

(4.1)

is called the distribution function (DF) or cumulative distribution function


of p( x ).
Considering the properties of the integral, and equation (4.1), we will
have
P { a < x < b } = F ( b ) F ( a ).
The two following relations must be satisfied also
lim F ( x ) = 1,

(4.2)

lim F ( x ) = 0.

(4.3)

x +
x

The first limit, the so called normalization condition, represents the probability that x assumes anyone of its possible values. The second limit represents the probability that x does not assume any value.

4.1.3

Probability and Frequency

Lets suppose that we measure N times a random variable x, which can


assume any value inside the interval ( a, b). Partitioning the interval into n
intervals, and counting how may times k i the measured value is inside the
i-th interval, we will have
fi =

ki
,
N

i = 1, ..., n

which is the called frequency of the random variable x . The limit


Pi ( x ) = lim

ki
,
N

i = 1, ..., n

is the probability of obtaining a measurement of x in the i-th interval. This


experimental (unpractical) definition will be used in the next subsections.

4.1. DEFINITIONS

4.1.4

41

Continuous Random Variable v.s. Discrete Random


Variable

We can always make a continuous random variable x become a discrete


random variable X. Defining a partition1 of the domain [ a, b) of x
= { a = x0 , x1 , x2 , ..., xn = b},
we can compute the probability Pi associated with each interval [ x, xi+1 ).
If we define the discrete variable X, assuming one value per each interval2 X = { X1 , X2 , ...}, then we will have defined a new discrete random
variable X, with probability Pi .

4.1.5

Expectation Value

The following integral


E[ x ] =

Z +

x p( x )dx

is defined to be the expectation value of the random variable x.


Because of the linearity of the operation of integration, we have
E[x + y] = E[ x ] + E[y],
where and are constant values.

4.1.6

Intuitive Meaning of the Expectation Value

The intuitive meaning of the expectation value can be easily understood


in the case of a discrete random variable.
In this case, we have that the expectation value becomes3
n

E[ X ] =

Xi P ( Xi ) ,

i =1

general, the partition can be numerable. In other words, it can have infinite
number of intervals but we can associate an integer number to each one of them.
2 There is an arbitrariness on choice of the values of X because we are assuming that
i
any value in each given interval is equiprobable.
3 In general, when we change between the continuous to discrete variable, we have
R
that p( x )dx P( xi )
1 In

42

CHAPTER 4. PROBABILITY DISTRIBUTIONS

where P( Xi ) is the probability that X assumes the value Xi .


Lets suppose that measuring X N times, we obtain M different values
of X. Let k i be the number of times that we measure the value Xi , with
i = 1, 2, ..., M.
If we estimate P( Xi ) with the frequency k i /N of having the event X =
Xi
k
P ( Xi ) ' i ,
N
we will have
1 M
E[ X ] '
Xi k i ,
N i
=1
which corresponds to the average of the N measurements of the variable
X.
We can conclude that experimentally, the expectation value of a random variable is estimated by the average of the measured values of the
random variable.

4.1.7

Variance

The expectation value of the square of the difference between the random
variable x and its expectation value is called variance of x, i.e.
V [ x ] = E[( x E[ x ])2 ].
A more explicit expression gives
V [x] =

( x )2 p( x )dx

Using the properties of the expectation value and of the distribution


function, we obtain
V [ x ] = E [ x 2 ] E [ x ]2 .
A common symbol used as a shortcut for the variance is the Greek
letter sigma, i.e.
2 = V [ x ].
The variance has the so called pseudo linearity property. Considering
two random variables x, and y, we will have
V [x + y] = 2 V [ x ] + 2 V [y],
where and are constant values.

4.2. UNIFORM DISTRIBUTION

4.1.8

43

Intuitive Meaning of the Variance

To provide an intuitive meaning of the variance, we can still make use of


a discrete random variable X.
In this case, we will have
n

V [X] =

(Xi E[X ])2P(Xi ),

i =1

which shows that the variance is just the sum of the square of the distance of each experimental point to the expectation value weighted by the
probability to obtain the measurement. Estimating the probability with
the frequency, we will have that V [ X ] is just the average of the the square
of the distance of the experimental points Xi from their average, i.e.
V [X] '

1
N

(Xi E[X ])2 ki ,

i =1

We can conclude that the variance is estimated by the average of the


square of the distances of the experimental values from their expectation value.

4.1.9

Standard Deviation

The square root of the variance is defined as the standard deviation of the
random variable x, i.e.
rZ
=

4.2

( x )2 p( x )dx

Uniform Distribution

The following PDF of the random variable x



1/(b a) x [ a, b]
p( x; a, b) =
0
x
/ [ a, b]
p( x ) is called a uniform probability density function, and x is said to be
uniformly distributed in the interval [ a, b] (see figure 4.1). This PDF dictates that any value in the interval [ a, b] has the same probability.

44

CHAPTER 4. PROBABILITY DISTRIBUTIONS


0.5
a = 1 b
a = 2 b
a = 3 b
a = 4 b

p(x;a,b)

[AU1]

0.4
0.3

=1
=2
=3
=4

0.2
0.1
0
5

1
x

0
[AU]

P(x;a,b)

[#]

0.8
0.6
0.4
a = 1 b
a = 2 b
a = 3 b
a = 4 b

0.2
0
5

1
x

0
[AU]

=1
=2
=3
=4
5

Figure 4.1: Uniform probability density function p( x; a, b) and its cumulative function P( x; a, b) for different intervals [ a, b].
The cumulative distribution function is

x < a,
0
xa
axb
P( x; a, b) =
b a
1
x>b
The expectation value of x is
E[ x ] =
and the variance is

a+b
,
2

1
( b a )2 .
12
The calculation of E[ x ], and V [ x ], are left as exercise.
V [x] =

4.2. UNIFORM DISTRIBUTION

4.2.1

45

Random Variable Uniformly Distributed

Lets suppose that measuring N times a given physical quantity x we always obtain the same result x0 . In this case, we cannot study how x is statistically distributed. With this limited knowledge, a reasonable
hypothh
i

x
esis is that x is uniformly distributed in the interval x0 x
2 , x0 + 2 ,
where x is the instrument resolution. Under this assumption, the best
estimate of the uncertainty on x is indeed

x
ba
= .
x =
2 3
12
This is a case where the statistical uncertainty cannot be evaluated from
the measurements, and has to be estimated from the instrument resolution.
4.2.1.1

Example: Ruler Measurements

The measurement of a distance d with a ruler with a resolution x =


0.5mm (half of the smallest division) is repeated several times, and gives
always the same value of 12.5mm. Assuming that d is uniformly distributed in the interval [12.25, 12.75], the statistical uncertainty associated
to d will be
0.5
d = = 0.144mm .
2 3
4.2.1.2

Example: Analog to Digital Conversion

The conversion of an analog signal to a number through an Analog to Digital Converter (ADC), is another example of creation of a uniformly distributed random variable. In fact, the conversion rounds the analog value
to a given integer number. The integer value depends on which interval
the analog value lies in, and the interval length V is the ADC resolution. Then, it is reasonable to assume that the converted value follows the
uniform PDF.
If the ADC numerical representation is 12bit long, and the input/dynamic
range is from -10V to 10V the interval length is
V =

10 (10)
' 4.88mV ,
212

46

CHAPTER 4. PROBABILITY DISTRIBUTIONS

and the uncertainty associated to each conversion will be


4.88
V = ' 1.4mV .
2 3
Anyway, the previous statement about the distribution followed by the
converted value is not general, and not applicable to any ADC. For example, there are some numerical techniques applied to converters that change
the statistical distribution of the converted values. As usual, we have to
investigate which PDF the random variable follows.

4.3

Gaussian Distribution (NPDF)

The following PDF of the random variable x


p( x ) =

1
22

( x )2
22

xR

(4.4)

is said Gauss/Normal probability density function (NPDF) centered in


and with variance 2 . The variable x is said to be normally distributed
around (see figure 4.2).
Lets make a list of some important properties of the NPDF:
The variance and the expectation values are respectively
V [ x ] = 2 .

E[ x ] = ,

Calculation of E[ x ], and V [ x ], are left as exercise.


(4.4) is symmetric around the vertical axis crossing the point x = .
(4.4) has one absolute maximum in x = .
(4.4) has exponentially decaying tails

| x |  | |,

p( x ) e

x2
22

4.3. GAUSSIAN DISTRIBUTION (NPDF)

47

p(x;,)

[AU1]

0.4
=1
=2
=3
=4
=5

0.3

0.2

0.1

0
8

2
x

0
[AU]

P(x;,)

[#]

0.8
0.6
=1
=2
=3
=4
=5

0.4
0.2
0
8

2
x

0
[AU]

Figure 4.2: Gaussian probability density function p( x; , ) and its cumulative function P( x; , ) for different values of .
the analytical cumulative function
P( x ) =

1
22

Z x

dx 0 e

( x 0 )2
22

(4.5)

which is not an elementary function.


It is left to the student to verify the normalization condition (4.2) for
the equation (4.4).

4.3.1

Standard Probability Density Function

Applying the following transformation to the NPDF

48

CHAPTER 4. PROBABILITY DISTRIBUTIONS

x
,

we obtain the so called standard probability density function


t=

t2
1
p(t) = e 2 ,
2

x R,

with
E[t] = 0,
and

4.3.2

1
P(t) =
2

V [t] = 1 ,
Z t

dt0 et

2 /2

Probability Calculaltion with the Error Function

Considering the definition of error function


2
erf(t) =

Z t
0

dx e x ,

then
1
P {t1 t t2 } =
2


erf

t
1
2

+ erf

t
2
2


,

0 t1 t2 .

The error function is usually available in most of modern programming


languages implementations and even in some pocket calculators.
Calculating the probability of a normally distributed variable x using
P(t) , and for an arbitrary interval containing , is straightforward. In fact,
because of the properties of the integral operator we have





1
a
b

P { a x b; , } =
erf
+ erf
,
a b.
2
2
2
For a symmetric interval about , we have


a

P { a x a ; , } = erf
,
2

a .

The demonstration of the three previous formulas are left as exercise.

4.4. EXPONENTIAL DISTRIBUTION

49

1
x0 = 1
x0 = 2

p(x;x0)

[AU1]

0.8

x0 = 3
0.6

x0 = 4
x0 = 5

0.4
0.2
0

5
x [AU]

10

P(x;x0)

[#]

0.8
0.6

x0 = 1
x0 = 2

0.4

x0 = 3
x =4

0.2

x0 = 5
0

5
x [AU]

10

Figure 4.3: Exponential probability density function p( x; x0 ) and its cumulative distribution function P( x; x0 ) for different values of x0 .

4.4

Exponential Distribution

The following PDF of the random variable x


1 x/x
0
0 x < ,
x0 e
p( x; x0 ) =

0
x < 0,

x0 > 0

is the exponential probability density function, and x is therefore exponentially distributed in the interval [0, ) (see figure 4.3).
The cumulative distribution function is
Z x
1 x0 /x0 0
e
dx = 1 e x/x0 ,
P( x; x0 ) =
0 x0

50

CHAPTER 4. PROBABILITY DISTRIBUTIONS

the expectation value of x is


Z +
x x/x0
E[ x ] =
e
dx = x0 ,
0

x0

and the variance


V [x] =

Z +
0

( x x0 )2

1 x/x0
e
dx = x02 .
x0

The calculation of E[ x ], and V [ x ], are left as exercise.

4.4.1

Random Variable Exponentially Distributed

The decay time of an unstable particle measured in its rest frame is a


random variable that follows the exponential distribution whose PDF is
p( ) =

1 /0
e
.
0

The quantity τ_0 is called the mean lifetime of the particle. In other words, the previous formula gives the probability for an unstable particle to decay in the time interval (τ, τ + dτ), measured in its rest frame.
Let's demonstrate the previous formula. If N(t) is the number of unstable particles at the time t, then the number of decayed particles ΔN after a time Δt will be

\frac{\Delta N}{\Delta t} = -\lambda N, \qquad \lambda > 0,

where λ is the decay probability per unit time. The assumption here is that the decay of each single particle is an independent random process, and therefore the rate of particles that decay must be proportional to the number of particles. The minus sign is necessary because the number of particles is decreasing (ΔN ≤ 0). Considering N very large (lots of decays per unit time, so that the variation in the particle number is almost continuous), we can approximate ΔN with dN and write

\frac{dN}{N} = -\lambda\, dt.

Integrating the previous differential equation we get

N(t) = N_0\, e^{-\lambda t}, \qquad N(t = 0) = N_0.

The previous formula gives us the number of particles that have survived after a time t. The PDF for the decay time of one single particle is then

p(t) = -\frac{d}{dt}\frac{N(t)}{N_0} = -\frac{d}{dt}\, e^{-\lambda t} = \lambda\, e^{-\lambda t}, \qquad \tau_0 = \frac{1}{\lambda}.

Experiment 26 of the ph7 sophomore lab is an interesting study of unstable decay, which uses a californium-252 (252Cf) neutron source to activate silver atoms.
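The derivation above can also be checked with a small Monte Carlo: simulate many independent particles, each decaying in a short time step dt with probability λ·dt, and compare the surviving fraction with N_0 e^{-λt}. The snippet is only an illustrative sketch; numpy is assumed available, and λ, dt, and N0 are arbitrary choices.

import numpy as np

rng = np.random.default_rng(1)
lam, dt, n_steps, N0 = 0.5, 0.01, 1000, 50_000

alive = np.ones(N0, dtype=bool)
survivors = []
for _ in range(n_steps):
    # each surviving particle decays with probability lam*dt in this step
    decays = rng.random(N0) < lam * dt
    alive &= ~decays
    survivors.append(alive.sum())

t = dt * np.arange(1, n_steps + 1)
print(survivors[-1], N0 * np.exp(-lam * t[-1]))   # simulated vs. analytical N(t)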

4.5 Binomial/Bernoulli Distribution

Let's suppose that we perform an experiment which only allows us to obtain two results/events⁴, A and Ā. The first event has probability P(A) = p and the other event has necessarily probability P(Ā) = (1 − p).
If we repeat the experiment N times, the probability of having k times the event A (and indeed having N − k times Ā) is

P_{N,p}(k) = \binom{N}{k}\, p^k (1-p)^{N-k}, \qquad 0 \le p \le 1, \qquad (4.6)
where

\binom{n}{k} = \frac{n!}{(n-k)!\, k!}

is the binomial coefficient. Equation (4.6) is called the binomial or Bernoulli distribution.
The expectation value of k and its variance are respectively

E[k] = Np, \qquad V[k] = Np(1-p).

The calculations of E[k] and V[k] are left as an exercise.
⁴ Any kind of experiment which has more than one (numerable or infinite) result can be arbitrarily arranged into two sets of results, and indeed into two possible events.
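To make the formulas above concrete, the binomial probabilities can be evaluated directly and compared with the expressions for E[k] and V[k]. The short sketch below uses only the Python standard library (math.comb); the parameter values N = 20 and p = 0.3 are arbitrary.

from math import comb

def binomial_pmf(k, N, p):
    # P_{N,p}(k) = C(N, k) p^k (1-p)^(N-k)
    return comb(N, k) * p**k * (1 - p)**(N - k)

N, p = 20, 0.3
pk = [binomial_pmf(k, N, p) for k in range(N + 1)]

mean = sum(k * pk[k] for k in range(N + 1))
var = sum((k - mean)**2 * pk[k] for k in range(N + 1))
print(mean, N * p)            # E[k] = Np
print(var, N * p * (1 - p))   # V[k] = Np(1-p)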

Figure 4.4: Poisson distribution P_m(k) for different values of the parameter m (5, 10, 15, 25).
For large values of N (N > 20, and especially for p ≃ 1/2), P_{N,p}(k) becomes well approximated by the NPDF.
The Poisson distribution, which is described in the next section, is an asymptotic limit of the binomial distribution when N → ∞.

4.6 Poisson Distribution

Imposing the following conditions on the binomial distribution

N \to \infty, \qquad p \to 0, \qquad Np = m = \mathrm{const.},

we obtain the so called Poisson distribution (see figure 4.4)

P_m(k) = \frac{m^k}{k!}\, e^{-m}.

The expectation value of k and its variance are respectively

E[k] = m, \qquad V[k] = m.

The calculations of E[k] and V[k] are left as an exercise.
The meaning of this distribution is essentially the same as that of the binomial distribution. It represents the probability of measuring an event k times when the number of measurements N is very large, i.e. N ≫ m.
Let's mention some important qualitative properties of the Poisson distribution.
• For m < 20, P_m(k) is quite asymmetric and the expectation value does not coincide with the maximum.
• For m ≥ 20, P_m(k) is quite symmetric, the expectation value is very close to the maximum, and the curve is quite well approximated by a Gaussian distribution with μ = m and σ = √m.
If k_1, k_2, ..., k_n are n measurements of k, a good estimator of m is the average of the measurements, i.e.

\hat{m} = \frac{1}{n}\sum_{i=1}^{n} k_i,

and the estimated uncertainty on each single measurement is indeed

\sigma_k = \sqrt{\hat{m}}.

The uncertainty on the estimator \hat{m} is

\sigma_{\hat{m}} = \frac{\sigma_k}{\sqrt{n}} = \sqrt{\frac{\hat{m}}{n}}.

The demonstration of the validity of these estimators is based on concepts explained in chapter 5.
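A simple numerical illustration of the Gaussian approximation mentioned above is to evaluate P_m(k) for a moderately large m and compare it with a Gaussian of μ = m and σ = √m at the same points. This sketch uses only the Python standard library; m = 25 is an arbitrary choice.

from math import exp, factorial, pi, sqrt

def poisson_pmf(k, m):
    # P_m(k) = m^k e^{-m} / k!
    return m**k * exp(-m) / factorial(k)

def gaussian(k, mu, sigma):
    return exp(-(k - mu)**2 / (2 * sigma**2)) / (sqrt(2 * pi) * sigma)

m = 25
for k in (15, 20, 25, 30, 35):
    print(k, poisson_pmf(k, m), gaussian(k, m, sqrt(m)))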

4.6.1 Example: Silver Activation Experiment

A classical example of application of the Poisson distribution is the statistical analysis of atomic decay. In this case, the Poisson variable is the total number of decays measured during a given time Δt. The number N is the number of atoms that can potentially decay, which is normally quite difficult to make very small, making the approximation N → ∞ quite good.
Let's consider here some real data taken from the activation of silver with a radioactive source:
• Number of measurements n = 20
• Measurement time Δt = 8 s
• Table of measurements (with the average radioactive background decays already removed):
i             1   2   3   4   5   6   7   8   9   10
k_i (Counts)  52  46  61  60  48  55  53  59  56  53

i             11  12  13  14  15  16  17  18  19  20
k_i (Counts)  59  48  63  50  55  56  55  61  49  39

Using the average as the estimator of m we get

\hat{m} = \frac{1}{n}\sum k_i = \frac{1078}{20} = 53.90 counts.

The uncertainty on each single measurement is

\sigma_k = \sqrt{\hat{m}} = 7.3417 counts,

and the uncertainty on \hat{m} is

\sigma_{\hat{m}} = \frac{\sigma_k}{\sqrt{n}} = 1.6416 counts.

The mean number of decaying atoms during a period of Δt = 8 s is indeed

\hat{m} = (53.90 ± 1.64) counts.

Finally, neglecting the uncertainty on the measurement time Δt, the statistical measurement of the decay rate, obtained by dividing \hat{m} by Δt, is

R = (6.74 ± 0.21) decays/s.

It is important to notice that a single long measurement gives the same result. In fact, considering the overall number of counts, we will have

\hat{m} = 1078 counts, \qquad \sigma_{\hat{m}} = \sqrt{1078} = 32.83 counts, \qquad \hat{m} = (1078.0 ± 32.8) counts.

Because the integration time is now nΔt = 8 × 20 s = 160 s, we will have

R = (6.74 ± 0.21) decays/s.

The calculation of R using a single long measurement has essentially two issues:
• It does not allow us to check the assumption about the statistical distribution.
• There is no way to check for any anomalies during the data taking with just one single datum.
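The whole analysis above can be reproduced in a few lines. The sketch below uses only the Python standard library and the twenty counts listed in the table; the variable names are of course not part of the original notes.

from math import sqrt

counts = [52, 46, 61, 60, 48, 55, 53, 59, 56, 53,
          59, 48, 63, 50, 55, 56, 55, 61, 49, 39]
n, dt = len(counts), 8.0          # 20 measurements of 8 s each

m_hat = sum(counts) / n           # estimator of m
sigma_k = sqrt(m_hat)             # uncertainty on a single measurement
sigma_m = sigma_k / sqrt(n)       # uncertainty on m_hat

print(m_hat, sigma_k, sigma_m)    # 53.90, 7.34..., 1.64...
print(m_hat / dt, sigma_m / dt)   # decay rate R and its uncertainty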

Chapter 5
Parameter Estimation
When a function depends on a set of parameters, we are faced with the problem of estimating the values of those parameters. Starting from a finite set of measurements, we can make a statistical determination of the parameter set.
In the following sections we will examine two standard methods, the
maximum-likelihood and least-square methods, to estimate the parameters of a PDF and of a general function of one independent variable.

5.1 The Maximum Likelihood Principle (MLP)

Let x be a random variable and f its PDF, which depends on a set of unknown parameters \vec{\theta} = (\theta_1, \theta_2, ..., \theta_n):

f = f(x; \vec{\theta}).

Given N independent samples of x, \vec{x} = (x_1, x_2, ..., x_N), the quantity

L(\vec{x}; \vec{\theta}) = \prod_{i=1}^{N} f(x_i; \vec{\theta})

is called the likelihood of f. L is proportional to the probability of obtaining the set of samples \vec{x}, assuming that the N samples are independent.
The maximum likelihood principle (MLP) states that the best estimate of the parameters \vec{\theta} is the set of values which maximizes L(\vec{x}; \vec{\theta}).
The MLP reduces the problem of parameter estimation to that of maximizing the function L. Because, in general, it is not possible to find the parameters \vec{\theta} that maximize L analytically, numerical methods implemented on computers are often used.

5.1.1 Example: μ and σ of a Normally Distributed Random Variable

Let's suppose we have N independent samples of a normally distributed random variable x, whose μ and σ are unknown. Experimentally, this case corresponds to measuring the same physical quantity x several times with the same instrument.
In this case L is

L(\vec{x}; \mu, \sigma) = \left(\frac{1}{\sqrt{2\pi\sigma^2}}\right)^{N} \exp\left[-\sum_{i=1}^{N}\frac{(x_i - \mu)^2}{2\sigma^2}\right],

and we have to maximize it.
Considering that the exponential function is monotone, we have just to study the argument of the exponential. Imposing the following conditions

\frac{\partial}{\partial\mu}\left[\sum_{i=1}^{N}\frac{(x_i - \mu)^2}{2\sigma^2}\right] = 0, \qquad \frac{\partial}{\partial\sigma}\left[\sum_{i=1}^{N}\frac{(x_i - \mu)^2}{2\sigma^2}\right] = 0,

is sufficient to determine the absolute maximum of L.
Solving the first equation with respect to μ, we obtain the estimator of μ

\hat{\mu} = \frac{1}{N}\sum_{i=1}^{N} x_i,

which, substituted into the second equation, gives the estimator of σ

\hat{\sigma}^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \hat{\mu})^2.

The estimator of the variance is biased, i.e. the expectation value of the estimator is not the parameter itself; in fact

E[\hat{\sigma}^2] = \left(1 - \frac{1}{N}\right)\sigma^2.

Because of this it is preferable to use the following unbiased estimator

s^2 = \frac{1}{N-1}\sum_{i=1}^{N}(x_i - \hat{\mu})^2, \qquad E[s^2] = \sigma^2.

What is the variance associated with the average \hat{\mu}?
To answer this question, let's consider the average variable

\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i,

which must be a Gaussian random variable. Using the pseudo-linearity property, its variance can be computed directly, i.e.

V[\bar{x}] = \frac{1}{N}\,\sigma^2.
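The estimators above translate directly into code. The sketch below computes \hat{\mu}, the unbiased estimate s^2, and the uncertainty on the mean s/\sqrt{N} for a set of repeated measurements; the sample numbers are invented purely for illustration.

from math import sqrt

x = [9.98, 10.02, 10.05, 9.97, 10.01, 10.03, 9.99, 10.00]   # hypothetical repeated measurements
N = len(x)

mu_hat = sum(x) / N
s2 = sum((xi - mu_hat)**2 for xi in x) / (N - 1)   # unbiased variance estimator
s_mean = sqrt(s2 / N)                              # uncertainty on the average

print(mu_hat, sqrt(s2), s_mean)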

5.1.2 Example: μ of a Set of Normally Distributed Random Variables

Let's suppose we have N independent samples of N normally distributed random variables x_1, x_2, ..., x_N, having the same unknown expected value μ but different known variances σ_1², σ_2², ..., σ_N². Experimentally, this case corresponds to measuring the same physical quantity x several times with different instruments.
In this case L is

L(\vec{x}; \mu) = \prod_{i=1}^{N} \frac{1}{\sqrt{2\pi}\,\sigma_i}\, \exp\left[-\frac{(x_i - \mu)^2}{2\sigma_i^2}\right],

and we have to maximize it.
Imposing the following condition

\frac{d}{d\mu}\left[\sum_{i=1}^{N}\frac{(x_i - \mu)^2}{2\sigma_i^2}\right] = 0,

is sufficient to determine the absolute maximum of L.
Solving this equation with respect to μ, we obtain the estimator of μ

\hat{\mu} = \sum_{i=1}^{N} \frac{1/\sigma_i^2}{\sum_{k=1}^{N} 1/\sigma_k^2}\, x_i,

which is the weighted average of the random variables. What is the variance associated with the weighted average?
To answer this question, let's consider the weighted average of a random variable,

\bar{x} = \sum_{i=1}^{N} \frac{1/\sigma_i^2}{\sum_{k=1}^{N} 1/\sigma_k^2}\, x_i.

Using the pseudo-linearity property, its variance can be directly computed, i.e.

V[\bar{x}] = \frac{1}{\sum_{i=1}^{N} 1/\sigma_i^2}.
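A compact numerical sketch of the weighted average and its uncertainty, for measurements of the same quantity taken with instruments of different precision, might look as follows; the values and uncertainties are invented for illustration only.

from math import sqrt

x     = [10.1, 9.9, 10.4]      # hypothetical measurements of the same quantity
sigma = [0.1, 0.2, 0.5]        # their (different) known uncertainties

w = [1 / s**2 for s in sigma]            # weights 1/sigma_i^2
x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sum(w)
sigma_x_bar = sqrt(1 / sum(w))           # V[x_bar] = 1 / sum(1/sigma_i^2)

print(x_bar, sigma_x_bar)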

5.2 The Least Square Principle (LSP)

Let's suppose that we have a function y of one variable x and of a set of parameters \vec{\theta} = (\theta_1, \theta_2, ..., \theta_n),

y = y(x; \vec{\theta}).

If we have N pairs of values (x_i, y_i ± \sigma_i) with i = 1, ..., N, then the following quantity

\chi^2(\vec{x}; \vec{\theta}) = \sum_{i=1}^{N}\left[\frac{y_i - y(x_i; \vec{\theta})}{\sigma_i}\right]^2,

is called the chi square¹ of y.
The Least Square Principle (LSP) states that the best estimate of the parameters \vec{\theta} is the set of values which minimizes \chi^2.
If \sigma_i = \sigma for i = 1, ..., N, the \chi^2 expression can be simplified because \sigma can be eliminated, so the search for the \chi^2 minimum becomes easier.

¹ \chi^2 must be considered a symbol, i.e. we cannot write \chi = \sqrt{\chi^2}.

It is important to notice that in this formulation the LSP requires knowledge of the \sigma_i (the uncertainties of the measurements y_i) and assumes no uncertainties are associated with the x_i. In a more general and correct formulation the uncertainties of the x_i should be taken into account (see appendix D). In general, uncertainties in the independent variable x can be neglected if

\left|\frac{\partial y}{\partial x}\right|_{x_i} \sigma_{x_i} \ll \sigma_{y_k}, \qquad i = 1, 2, ..., N, \quad k = 1, 2, ..., N.

It is easy to prove that in the case of a normally distributed random variable the MLP and the LSP are equivalent, i.e. applying the two principles to the Gaussian function yields the same estimators for μ and σ.

5.2.1 Geometrical Meaning of the LSP

Neglecting the uncertainties \sigma_i, \chi^2 is just the sum of the squares of the distances between the curve points (x_i, y(x_i)) and the points y_i. The minimization of \chi^2 corresponds to the search for the best curve, i.e. the curve which minimizes the distance between the points y_i and the curve points as the parameters \theta_k are varied.
The introduction of the uncertainties is necessary if we want to perform
a statistical analysis instead of just solving a pure geometrical problem.

5.2.2 Example: Linear Function

Let's suppose that the function we want to fit is a straight line

y(x) = ax + b,

where a and b are the parameters that we have to determine. For the sake of simplicity, we assume that all the experimental values y_1, y_2, ..., y_N have the same uncertainty \sigma_y, and that the uncertainties on x_1, x_2, ..., x_N are negligible.
Applying the LSP to y(x), we get

\chi^2(\vec{x}; a, b) = \frac{1}{\sigma_y^2}\sum_{i=1}^{N}(y_i - ax_i - b)^2.


Computing the partial derivatives with respect to the parameters a and b and equating them to zero,

\frac{\partial\chi^2}{\partial a} = 0, \qquad \frac{\partial\chi^2}{\partial b} = 0,

and solving the linear system for a and b, we finally get²

\hat{a} = \frac{\sum_{i=1}^{N}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{N}(x_i - \bar{x})^2}, \qquad \hat{b} = \frac{\bar{y}\sum_{i=1}^{N}x_i^2 - \bar{x}\sum_{i=1}^{N}x_i y_i}{\sum_{i=1}^{N}(x_i - \bar{x})^2},

where

\bar{x} = \frac{1}{N}\sum_{i=1}^{N}x_i, \qquad \bar{y} = \frac{1}{N}\sum_{i=1}^{N}y_i.

The functions \hat{a} and \hat{b} are the best estimators of the parameters a and b given by the LSP.
Uncertainties on \hat{a} and \hat{b} can be estimated by applying the SPEL to the \hat{a} and \hat{b} expressions. After some tedious algebra we get

\sigma_a \simeq \sigma_y\sqrt{\sum_{i=1}^{N}\left(\frac{\partial\hat{a}}{\partial y_i}\right)^2}, \qquad \sigma_b \simeq \sigma_y\sqrt{\sum_{i=1}^{N}\left(\frac{\partial\hat{b}}{\partial y_i}\right)^2},

where the partial derivatives with respect to x1 , ...x N are neglected because
we assumed that their uncertainties are negligible.
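The closed-form estimators for the straight line can be coded directly. The snippet below is a minimal sketch (numpy assumed available); the synthetic data, the true slope and intercept, and σ_y = 0.1 are invented for illustration, and the uncertainties on \hat{a} and \hat{b} use the standard closed forms that result from carrying out the SPEL sums above.

import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(0.0, 10.0, 20)
sigma_y = 0.1
y = 2.0 * x + 1.0 + rng.normal(0.0, sigma_y, x.size)   # synthetic straight-line data

xm, ym = x.mean(), y.mean()
Sxx = np.sum((x - xm)**2)

a_hat = np.sum((x - xm) * (y - ym)) / Sxx
b_hat = (ym * np.sum(x**2) - xm * np.sum(x * y)) / Sxx

sigma_a = sigma_y / np.sqrt(Sxx)                             # SPEL applied to a_hat
sigma_b = sigma_y * np.sqrt(np.sum(x**2) / (x.size * Sxx))   # SPEL applied to b_hat

print(a_hat, sigma_a)
print(b_hat, sigma_b)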

5.2.3 The Reduced \chi^2 (Fit Goodness)

The reduced \chi^2 is defined as

\chi^2/(N - d) = \frac{1}{N - d}\sum_{i=1}^{N}\left[\frac{y_i - y(x_i; \vec{\theta})}{\sigma_i}\right]^2,
² As already mentioned, the use of the hat symbol is just to distinguish the parameter from its estimator, i.e. \hat{a} is the estimator of a.


where d is the number of parameters to be estimated, and N − d is called the number of degrees of freedom.
The meaning of the reduced \chi^2 can be intuitively understood as follows. Because the difference between the fitted value y(x_i; \vec{\theta}) and the experimental value y_i is statistically close to \sigma_i, we can naively estimate the \chi^2 value to be equal to \sum_{i=1}^{N}(\sigma_i/\sigma_i)^2 = N. Dividing \chi^2 by N, we expect to obtain a number close to 1. A rigorous theoretical approach can explain the need to subtract d from N, but that is outside the scope of this introductory note. It is worthwhile to notice that this subtraction is not negligible for small values of N which are comparable to d. For N − d < 20, the reduced \chi^2 should be slightly less than one.
When a reduced \chi^2 is not close to one, the most likely causes are:
• uncertainties on x_i and/or on y_i are too small ⇒ \chi^2/(N-d) > 1,
• uncertainties on x_i and/or on y_i are too large ⇒ \chi^2/(N-d) < 1,
• a small number of data points too much scattered from the theoretical curve ⇒ \chi^2/(N-d) > 1,
• wrong fitting function or poor physical model ⇒ \chi^2/(N-d) > 1,
• poor Gaussianity of the data-point distribution ⇒ \chi^2/(N-d) > 1.
A more rigorous statistical interpretation of the reduced \chi^2 value implies the study of its PDF.
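As a concrete illustration, the reduced χ² of any fit can be evaluated in one line once the fitted model is known. The sketch below is generic (numpy assumed available); the model values, data, and uncertainties are placeholders invented for illustration.

import numpy as np

def reduced_chi2(y, y_fit, sigma, d):
    # chi^2 / (N - d): y are the measurements, y_fit the fitted values,
    # sigma their uncertainties, d the number of fitted parameters.
    N = len(y)
    return np.sum(((np.asarray(y) - np.asarray(y_fit)) / np.asarray(sigma))**2) / (N - d)

# Example with made-up numbers: a two-parameter fit of five points.
y     = [1.0, 2.1, 2.9, 4.2, 5.0]
y_fit = [1.0, 2.0, 3.0, 4.0, 5.0]
sigma = [0.1, 0.1, 0.1, 0.1, 0.1]
print(reduced_chi2(y, y_fit, sigma, d=2))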

5.3 The LSP with the Effective Variance

To generalize the LSP to the case of appreciable uncertainties on the independent variable x, we can use the so-called effective variance, i.e.

\sigma_i^2 = \left(\frac{\partial f}{\partial x}\right)^2 \sigma_{x_i}^2 + \sigma_{y_i}^2,

where \sigma_{x_i} and \sigma_{y_i} are the uncertainties associated with x_i and y_i respectively. The derivative is evaluated at x_i.
Substituting this new definition of \sigma_i into the previous definition of the \chi^2 takes into account the effect of the uncertainty on x.
The proof of this formula is given in appendix D.
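In practice the effective variance is just an extra preprocessing step before a standard χ² minimization. A hedged sketch follows (numpy assumed available); the model f and its derivative are placeholders chosen only for illustration.

import numpy as np

def effective_sigma(x, sigma_x, sigma_y, dfdx):
    # sigma_i^2 = (df/dx)^2 sigma_x_i^2 + sigma_y_i^2, derivative evaluated at x_i
    return np.sqrt(dfdx(x)**2 * np.asarray(sigma_x)**2 + np.asarray(sigma_y)**2)

# Example: f(x) = a exp(bx) with assumed parameters a = 1.0, b = 0.5
a, b = 1.0, 0.5
dfdx = lambda x: a * b * np.exp(b * np.asarray(x))

x       = [0.0, 1.0, 2.0]
sigma_x = [0.05, 0.05, 0.05]
sigma_y = [0.1, 0.1, 0.1]
print(effective_sigma(x, sigma_x, sigma_y, dfdx))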



Figure 5.1: Linear fit of the thermistor data (1/T vs. log R) and the corresponding difference plot. The difference plot clearly shows the bad approximation of the linear fit.

5.4 Fit Example (Thermistor)

Thermistors, which are devices made of a semiconductor material [5], are remarkably good candidates to study fitting techniques. In fact, the temperature dependence of their energy band gap E_g(T) makes their resistance vs. temperature characteristic ideal for demonstrating some of the aspects and issues of fitting.
The thermistor response is described by the equation

R(T) = R_0\, e^{E_g(T)/2k_bT},

where R is the resistance, T is the temperature in kelvin, and k_b is the Boltzmann constant.


Taking the logarithm of the previous expression, we have

\log R = \log R_0 + \frac{E_g(T)}{2 k_b T}.

Neglecting the temperature dependence of E_g, we can linearize the thermistor response as follows

\frac{1}{T} = \frac{2k_b}{E_g}\log R - \frac{2k_b}{E_g}\log R_0, \qquad y = \frac{1}{T}, \quad x = \log R.

Following what has been done in a published article [6], the corrections to the linear fit can be introduced empirically using a polynomial expansion in \log R, i.e.

\frac{1}{T} = C_0 + C_1\log R + C_2\log^2 R + C_3\log^3 R.

5.4.1 Linear Fit

Applying a linear curve fit, we obtain

\frac{1}{T} = C_0 + C_1\log R,

C_0 = (2.67800 ± 0.0005) × 10^{-3} a.u.
C_1 = (3.0074 ± 0.00024) × 10^{-4} a.u.
\chi^2/(N-2) = 4727.5

It is clear from the difference plot in figure 5.1 and the reduced \chi^2 value that there is more than just a linear trend in the data set. In fact, the difference plot shows a quadratic curve instead of a random distribution of data points around the horizontal axis. It is noteworthy to observe how difficult it is to see any difference between the straight line and the experimental points in the curve fit of figure 5.1.



Figure 5.2: Quadratic fit of the thermistor data and the corresponding difference plot. The plot of the experimental points and the theoretical curve seems to show a good agreement. The difference plot, however, still clearly shows the coarse approximation of the quadratic fit.

5.4.2 Quadratic Fit

Applying a quadratic curve fit, we obtain

\frac{1}{T} = C_0 + C_1\log R + C_2\log^2 R,

C_0 = (2.68800 ± 0.0005) × 10^{-3} a.u.
C_1 = (2.7717 ± 0.00067) × 10^{-4} a.u.
C_2 = (5.460 ± 0.014) × 10^{-6} a.u.
\chi^2/(N-3) = 34.2

Again, it is difficult to see any non-linear trend in the data points of the curve fit shown in figure 5.2. However, the reduced \chi^2 is still larger than

one, and the difference plot clearly shows a residual trend in the data.

Figure 5.3: Cubic fit of the thermistor data and the corresponding difference plot. Neither the plot of the experimental points with the theoretical curve nor the difference plot shows any clear systematic difference between the experimental points and the theoretical curve.

5.4.3 Cubic Fit

Applying a cubic curve fit, we obtain

\frac{1}{T} = C_0 + C_1\log R + C_2\log^2 R + C_3\log^3 R,

C_0 = (2.6874 ± 0.0003) × 10^{-3} a.u.
C_1 = (2.8021 ± 0.0012) × 10^{-4} a.u.
C_2 = (3.376 ± 0.066) × 10^{-6} a.u.
C_3 = (2.88 ± 0.09) × 10^{-7} a.u.
\chi^2/(N-4) = 0.403
In this final case the reduced \chi^2 value is smaller than one, and the scatter of the data about the horizontal axis in the difference plot of figure 5.3 suggests that the assumed uncertainty on the temperature, σ_T = 0.02 K, is probably a little too large. Apart from the somewhat overestimated data-point uncertainty, no special trend seems to be visible in the difference plot.
The following table contains the data points used for the thermistor characteristic fits.

Point  Resist.  Temp.    σ_T    σ_R        Point  Resist.  Temp.    σ_T    σ_R
 (#)    R (Ω)    T (K)    (K)    (Ω)        (#)    R (Ω)    T (K)    (K)    (Ω)
  1     0.76     383.15   0.02   0           17     10.00    298.15   0.02   0
  2     0.86     378.15   0.02   0           18     12.09    293.15   0.02   0
  3     0.97     373.15   0.02   0           19     14.68    288.15   0.02   0
  4     1.11     368.15   0.02   0           20     17.96    283.15   0.02   0
  5     1.45     358.15   0.02   0           21     22.05    278.15   0.02   0
  6     1.67     353.15   0.02   0           22     27.28    273.15   0.02   0
  7     1.92     348.15   0.02   0           23     33.89    268.15   0.02   0
  8     2.23     343.15   0.02   0           24     42.45    263.15   0.02   0
  9     2.59     338.15   0.02   0           25     53.39    258.15   0.02   0
 10     3.02     333.15   0.02   0           26     67.74    253.15   0.02   0
 11     3.54     328.15   0.02   0           27     86.39    248.15   0.02   0
 12     4.16     323.15   0.02   0           28     111.3    243.15   0.02   0
 13     4.91     318.15   0.02   0           29     144.0    238.15   0.02   0
 14     5.83     313.15   0.02   0           30     188.4    233.15   0.02   0
 15     6.94     308.15   0.02   0           31     247.5    228.15   0.02   0
 16     8.31     303.15   0.02   0           32     329.2    223.15   0.02   0
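The three fits above can be reproduced from the table with an ordinary weighted polynomial least-squares fit of 1/T versus log R. The sketch below (numpy assumed available) propagates the temperature uncertainty to y = 1/T via σ_y = σ_T/T² and uses the natural logarithm of R; up to rounding and the details of the fitting routine, the results should land in the neighborhood of the values quoted above, but the snippet is only an illustration, not the original analysis code.

import numpy as np

R = np.array([0.76, 0.86, 0.97, 1.11, 1.45, 1.67, 1.92, 2.23, 2.59, 3.02,
              3.54, 4.16, 4.91, 5.83, 6.94, 8.31, 10.00, 12.09, 14.68, 17.96,
              22.05, 27.28, 33.89, 42.45, 53.39, 67.74, 86.39, 111.3, 144.0,
              188.4, 247.5, 329.2])
T = np.array([383.15, 378.15, 373.15, 368.15, 358.15, 353.15, 348.15, 343.15,
              338.15, 333.15, 328.15, 323.15, 318.15, 313.15, 308.15, 303.15,
              298.15, 293.15, 288.15, 283.15, 278.15, 273.15, 268.15, 263.15,
              258.15, 253.15, 248.15, 243.15, 238.15, 233.15, 228.15, 223.15])
sigma_T = 0.02

x, y = np.log(R), 1.0 / T
sigma_y = sigma_T / T**2                 # propagate sigma_T to y = 1/T

deg = 3                                  # cubic fit; use 1 or 2 for the other fits
coeffs = np.polyfit(x, y, deg, w=1.0 / sigma_y)
y_fit = np.polyval(coeffs, x)
red_chi2 = np.sum(((y - y_fit) / sigma_y)**2) / (len(x) - (deg + 1))

print(coeffs[::-1])                      # C0, C1, C2, C3
print(red_chi2)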

5.5 Fit Example (Offset Constant)

The following data³ (?) are the measurements of the voltage-current (V versus I) characteristic of a silicon diode [5], i.e.

I(V) = I_0\left(e^{qV/\eta k_b T} - 1\right),

where I_0 is the reverse bias current, q is the electron charge, k_b the Boltzmann constant, T the absolute temperature, and η a parameter, which is equal to 2 for silicon diodes.
Figure (?), which has been made using the following fitting curve

y = a\, e^{bx},

shows a systematic quadratic residue in the difference plot.
Fitting the experimental points with the following curve

y = a\, e^{bx} + c,

the quadratic trend in the difference plot disappears.
This effect is typical of fits for which the correct offset constant is not introduced, thus producing systematic trends in the difference plots.

³ Please let us know if you notice that the data and plot are missing in this section.


Appendix A
Central Limit Theorem
Let's consider a set of N independent random variables ξ_1, ξ_2, ..., ξ_N, all having the same PDF p(ξ_i) with the following parameters¹

E[\xi_i] = \mu, \qquad V[\xi_i] = \sigma_i^2, \qquad i \in \{1, 2, ..., N\}.

Let's then consider the following random variable

x = \frac{1}{N}\sum_i \xi_i.

The central limit theorem states that, for N → ∞, x is normally distributed around μ and its variance is the sum of the random variable variances divided by N², i.e.

p(x) \simeq \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \qquad N \gg 1,

E[x] = \mu, \qquad V[x] = \sigma^2 = \sum_{i=1}^{N}\sigma_i^2/N^2.
The proof of this theorem is rather complicated and is outside the scope
of this work.
The central limit theorem tells us that a random variable that is the sum of random variables following an unknown PDF behaves as a normally distributed random variable.

¹ This theorem has a more general formulation. It was first proved by Laplace and then extended by other mathematicians, including P. L. Chebyshev, A. A. Markov, and A. M. Lyapunov.


It is noteworthy that this theorem suggests a simple way to generate values for a normally distributed variable. In fact, a normally distributed value can be computed just by adding a large number of values of a given random variable. The μ and the σ can be estimated from the set of generated values.
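A quick numerical illustration of this remark: average many uniformly distributed values and look at the distribution of the averages. This is only a sketch (numpy assumed available); N = 50 and the number of trials are arbitrary choices.

import numpy as np

rng = np.random.default_rng(3)
N, trials = 50, 20_000

# each row is one set of N uniform samples; x is their average
xi = rng.uniform(0.0, 1.0, size=(trials, N))
x = xi.mean(axis=1)

# the averages should be approximately Gaussian with
# mu = 0.5 and sigma^2 = (1/12)/N (variance of a uniform variable, divided by N)
print(x.mean(), x.std(ddof=1), np.sqrt(1.0 / 12.0 / N))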

Appendix B
Statistical Propagation of Errors
We want to find an approximate formula that computes the variance of a function of random variables by using the standard Taylor series expansion.
Let f be a function of n random variables \vec{x} = (x_1, x_2, ..., x_n).
The first-order Taylor expansion of f(\vec{x}) about \vec{\mu} = (\mu_1, \mu_2, ..., \mu_n), where \mu_i = E[x_i], i = 1, 2, ..., n, is

f(\vec{x}) = f(\vec{\mu}) + \sum_{i=1}^{n}\left.\frac{\partial f(\vec{x})}{\partial x_i}\right|_{x_i=\mu_i}(x_i - \mu_i) + O(x_i - \mu_i), \qquad (B.1)

where O(m) indicates terms of order higher than m.


Substituting (B.1) into the definition of the variance of f(\vec{x}), we get

V[f(\vec{x})] = E[(f(\vec{x}) - f(\vec{\mu}))^2] \simeq E\left[\left(\sum_{i=1}^{n}\left.\frac{\partial f(\vec{x})}{\partial x_i}\right|_{x_i=\mu_i}(x_i - \mu_i)\right)^2\right]. \qquad (B.2)

With the aid of the expected value properties, we obtain

V[f(\vec{x})] = \sum_{i=1,\,j=1}^{n,\,n}\left.\frac{\partial f(\vec{x})}{\partial x_i}\right|_{x_i=\mu_i}\left.\frac{\partial f(\vec{x})}{\partial x_j}\right|_{x_j=\mu_j} E[(x_i - \mu_i)(x_j - \mu_j)]. \qquad (B.3)



Defining the covariance matrix Vij as follows

Vij = E[( xi i ) x j j ],

i, j = 1, 2, ....n,

the previous expression of the variance of f finally becomes




N,N
f ( x )
f ( x )
Vij
V [ f ( x )] =

xi xi =i x j
i =1,j=1
x j = j

which is the law of statistical propagation of errors.

(B.4)
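Equation (B.4) is straightforward to evaluate numerically once the partial derivatives and the covariance matrix are known. The sketch below does it for the simple case f(x1, x2) = x1·x2 with uncorrelated variables; the function, the values, and the uncertainties are invented for illustration (numpy assumed available).

import numpy as np

# f(x1, x2) = x1 * x2, evaluated at mu = (mu1, mu2)
mu = np.array([2.0, 3.0])
grad = np.array([mu[1], mu[0]])          # (df/dx1, df/dx2) at mu

# covariance matrix: uncorrelated variables with sigma1 = 0.1, sigma2 = 0.2
V = np.diag([0.1**2, 0.2**2])

var_f = grad @ V @ grad                  # equation (B.4)
print(var_f, np.sqrt(var_f))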

Appendix C
NPDF Random Variable Uncertainties

Let's consider a set of independent measurements X = \{x_1, x_2, ..., x_N\} of the same physical quantity x following the NPDF.

Measurement Set with no Uncertainties (Unweighted Case)

The uncertainty s of each single measurement x_i, and the way to report it, are

s^2 = \frac{1}{N-1}\sum_{i=1}^{N}(x_i - \bar{x})^2, \qquad \bar{x} = \frac{1}{N}\sum_{i=1}^{N}x_i, \qquad x = (x_i \pm s)\ \mathrm{units.}

For the mean of the measurements \bar{x} and its uncertainty s_{\bar{x}}, we have

\bar{x} = \frac{1}{N}\sum_{i=1}^{N}x_i, \qquad s_{\bar{x}}^2 = \frac{s^2}{N}, \qquad x = (\bar{x} \pm s_{\bar{x}})\ \mathrm{units.}

Measurement Set with Uncertainties (Weighted Case)

If each measurement has an uncertainty \sigma = \{\sigma_1, \sigma_2, ..., \sigma_N\}, we will trivially have

x = (x_i \pm \sigma_i)\ \mathrm{units.}

For the mean of the measurements and its uncertainty, we have

s_{\bar{x}}^2 = \frac{1}{\sum_{i=1}^{N}1/\sigma_i^2}, \qquad \bar{x} = s_{\bar{x}}^2\sum_{i=1}^{N}\frac{x_i}{\sigma_i^2}, \qquad x = (\bar{x} \pm s_{\bar{x}})\ \mathrm{units.}

Appendix D
The Effective Variance
Let x be a normally distributed random variable and let y = f(x) be another random variable. Let's suppose that each data point (x_i \pm \sigma_{x_i}, y_i \pm \sigma_{y_i}), i = 1, ..., N, is normally distributed around (\bar{x}_i, \bar{y}_i).
Applying the MLP to y and x we obtain

L(x, y) = \prod_{i=1}^{N}\frac{1}{\sigma_{x_i}\sqrt{2\pi}}\exp\left[-\frac{(x_i - \bar{x}_i)^2}{2\sigma_{x_i}^2}\right]\frac{1}{\sigma_{y_i}\sqrt{2\pi}}\exp\left[-\frac{(y_i - \bar{y}_i)^2}{2\sigma_{y_i}^2}\right] = \left(\prod_{i=1}^{N}\frac{1}{2\pi\,\sigma_{x_i}\sigma_{y_i}}\right)\exp(-S),

where

S = \sum_{i=1}^{N}\left[\frac{(x_i - \bar{x}_i)^2}{2\sigma_{x_i}^2} + \frac{(y_i - \bar{y}_i)^2}{2\sigma_{y_i}^2}\right].
Making the following approximation

y(\bar{x}_i) \simeq f(x_i) + (\bar{x}_i - x_i)\, f'(x_i),

S becomes

S \simeq \sum_{i=1}^{N}\left[\frac{(x_i - \bar{x}_i)^2}{2\sigma_{x_i}^2} + \frac{[f(x_i) + (\bar{x}_i - x_i)f'(x_i) - y_i]^2}{2\sigma_{y_i}^2}\right].


In this case, the condition of maximization of L( x, y)
S
= 0,
xi

gives

i = 1, ..., N,

2( xi xi ) 2{yi [ f ( xi ) ( xi xi f 0 ( xi )} f 0 ( xi )
+
= 0.
x2
x2

Solving with respect to the parameters xi we obtain the expression


xi = xi
where

x2i
i2

[ f ( xi ) yi )] f 0 ( xi ),

i2 = [ f 0 ( xi )]2 x2i + y2i .

Replacing this expression in the definition of S we obtain

S = \sum_{i=1}^{N}\frac{[y_i - f(x_i)]^2}{2\sigma_i^2},

which is the new S that must be minimized.

Bibliography

[1] P. R. Bevington and D. K. Robinson, Data Reduction and Error Analysis for the Physical Sciences, second edition, WCB McGraw-Hill.

[2] J. Orear, "Least squares when both variables have uncertainties," Am. J. Phys. 50(10), Oct. 1982.

[3] S. G. Rabinovich, Measurement Errors and Uncertainties: Theory and Practice, second edition, Springer.

[4] C. L. Nikias and A. P. Petropulu, Higher-Order Spectral Analysis, PTR Prentice Hall.

[5] V. de O. Sannibale, Basics on Semiconductor Physics, Freshman Laboratory Notes, http://www.ligo.caltech.edu/~vsanni/ph3/

[6] Deep Sea Research, 1968, Vol. 15, pp. 497-501, Pergamon Press (printed in Great Britain).
