
Some definitions

experiment
e.g. throwing a die twice
realization: outcome of one specific experiment
event space/sample space:
all possible outcomes
random variable (German: Zufallsvariable)
function of the outcome of an experiment (e.g. sum of
the dice)
event space and realization are defined accordingly
for random variables
a random variable can be either discrete or
continuous

Modern Methods of Data Analysis - WS 07/08 Stephanie Hansmann-Menzemer


Example
experiment: throwing two dice
event space of experiment
{11,12,13,14,15,16,21,22,23,...,64,65,66}
random variable x = sum of the dice
event space of x: {2,3,4,5,6,7,8,9,10,11,12}
realization of the experiment: e.g. 25; the corresponding
realization of x is 7

cumulative distribution u(x): probability to observe x or a smaller value
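The two-dice example can be enumerated directly. The following is a minimal sketch, assuming fair dice (an assumption not stated on the slide); the names `outcomes`, `prob` and `cumulative` are chosen here for illustration only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Event space of the experiment: all ordered pairs of faces (36 outcomes).
outcomes = list(product(range(1, 7), repeat=2))

# Random variable x = sum of the two dice; count how often each value occurs.
counts = Counter(a + b for a, b in outcomes)

# Probability of each value of x, assuming fair dice.
prob = {x: Fraction(n, len(outcomes)) for x, n in sorted(counts.items())}

# Cumulative distribution: probability to observe x or a smaller value.
cumulative = {}
running = Fraction(0)
for x, p in prob.items():
    running += p
    cumulative[x] = running

print(prob[7])        # probability of x = 7
print(cumulative[7])  # probability of x <= 7
```

Exact fractions are used so that the cumulative distribution reaches exactly 1 at x = 12, without rounding effects.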



Probability density function (p.d.f)
Repeat an experiment whose outcome is characterized by a
single continuous variable x
Definition: the probability to measure a value in the
interval [x, x+dx] is given by the probability density function f(x)
(p.d.f., German: Wahrscheinlichkeitsdichte):

$P(x \le x' \le x + dx) = f(x)\,dx$

P is a measure of how often a value of x occurs in a given sample

The p.d.f. satisfies $f(x) \ge 0$ and is normalized: $\int_{-\infty}^{+\infty} f(x)\,dx = 1$

The p.d.f. f(x) is NOT a probability; it has dimension 1/[x]!


Cumulative distribution function (c.d.f.)

Cumulative distribution function F(x), also known as
probability distribution function
(German: Wahrscheinlichkeitsverteilung = Verteilungsfunktion)
F(x') is interpreted as the probability to find a value x <= x'
F(x) is a continuous, non-decreasing function
$F(-\infty) = 0$ and $F(+\infty) = 1$
F(x) is directly related to the probability density
function f(x) by:

$F(x') = \int_{-\infty}^{x'} f(x)\,dx$

f(x) is given (for well-behaved distributions) by:

$f(x) = \frac{dF(x)}{dx}$
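The relation between the p.d.f. and the c.d.f. can be checked numerically. The sketch below assumes an exponential p.d.f., f(x) = exp(-x) for x >= 0, chosen purely as an illustration (it does not appear on the slides); the function names and step sizes are likewise assumptions.

```python
import math

def f(x):
    """Example p.d.f. (an assumption): exponential, f(x) = exp(-x) for x >= 0."""
    return math.exp(-x) if x >= 0 else 0.0

def F(x_prime, steps=100_000):
    """c.d.f. obtained by midpoint-rule integration of f from 0 to x_prime."""
    if x_prime <= 0:
        return 0.0
    h = x_prime / steps
    return sum(f((i + 0.5) * h) for i in range(steps)) * h

def dF_dx(x, h=1e-4):
    """Central-difference derivative of F, which should reproduce f(x)."""
    return (F(x + h) - F(x - h)) / (2 * h)

print(F(1.0), 1 - math.exp(-1.0))  # numerical vs. analytic c.d.f. at x' = 1
print(F(50.0))                     # close to 1: the p.d.f. is normalized
print(dF_dx(1.0), f(1.0))          # derivative of the c.d.f. vs. the p.d.f.
```

For this p.d.f. the integral starts at 0 rather than at minus infinity, because f(x) vanishes for negative x.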

Average
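arithmetic mean of data set:

$\bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i$

geometric mean of data set:

$\bar{x}_{\mathrm{geo}} = \left( \prod_{i=1}^{N} x_i \right)^{1/N}$

harmonic mean of data set:

$\bar{x}_{\mathrm{harm}} = N \Big/ \sum_{i=1}^{N} \frac{1}{x_i}$

These are the 3 Pythagorean means; for positive data:
harmonic mean <= geometric mean <= arithmetic mean
weighted mean of data set:

$\bar{x}_w = \frac{\sum_{i=1}^{N} w_i x_i}{\sum_{i=1}^{N} w_i}$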
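The four means can be written as short functions. This is a minimal sketch using the standard textbook definitions; the function names and the sample values in `data` are hypothetical.

```python
import math

def arithmetic_mean(xs):
    return sum(xs) / len(xs)

def geometric_mean(xs):
    # Computed via logarithms to avoid overflow; requires all values > 0.
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

def harmonic_mean(xs):
    # Requires all values to be non-zero.
    return len(xs) / sum(1.0 / x for x in xs)

def weighted_mean(xs, ws):
    return sum(w * x for x, w in zip(xs, ws)) / sum(ws)

data = [1.0, 2.0, 4.0]  # hypothetical positive data set

h, g, a = harmonic_mean(data), geometric_mean(data), arithmetic_mean(data)
print(h, g, a)                               # ordering of the Pythagorean means
print(weighted_mean(data, [1.0, 1.0, 1.0]))  # equal weights: arithmetic mean
```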



Example: Arithmetic Mean
Average number of children per family in
Germany is 2.3
Average life expectancy is 74 years for men and
78 years for women
Average number of semesters for physics
studies in Heidelberg is 11.2
...



Example: Geometric Mean
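Needed to average multiplicative quantities, e.g.
interest rates per year over the last 5 years
2002: 2.5 %
2003: 2.5 %
2004: 3.0 %
2005: 3.5 %
2006: 3.5 %
after five years the capital has grown by the factor:

$1.025 \cdot 1.025 \cdot 1.030 \cdot 1.035 \cdot 1.035 \approx 1.1592$

comparable average interest:

$\sqrt[5]{1.1592} - 1 \approx 2.999\,\%$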
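The interest example can be reproduced in a few lines. This sketch uses the five yearly rates listed above; the variable names are chosen here for illustration.

```python
rates = [0.025, 0.025, 0.030, 0.035, 0.035]  # yearly interest, 2002 to 2006

# Total growth factor after five years: product of the yearly factors.
growth = 1.0
for r in rates:
    growth *= 1.0 + r

# Comparable average interest: geometric mean of the yearly factors, minus 1.
average_rate = growth ** (1.0 / len(rates)) - 1.0

# For comparison: the arithmetic mean of the rates is slightly larger.
arithmetic_rate = sum(rates) / len(rates)

print(f"growth factor:   {growth:.4f}")
print(f"geometric mean:  {100 * average_rate:.3f} %")
print(f"arithmetic mean: {100 * arithmetic_rate:.3f} %")
```

Applying `average_rate` five times in a row reproduces the same final capital, which the arithmetic mean of the rates does not do exactly.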



Example: Harmonic Mean

Travel half of the distance at 40 km/h and
half of the distance at 60 km/h.
What is the average speed?

Travel half of the time at 40 km/h and half
of the time at 60 km/h.
What is the average speed?
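Both questions can be worked out from average speed = total distance / total time. The sketch below is one way to do this; the function names are hypothetical.

```python
def average_speed_equal_distance(v1, v2):
    # Distance d at v1 and distance d at v2: total time is d/v1 + d/v2,
    # so 2d / (d/v1 + d/v2) is the harmonic mean of the two speeds.
    return 2.0 / (1.0 / v1 + 1.0 / v2)

def average_speed_equal_time(v1, v2):
    # Time t at v1 and time t at v2: total distance is v1*t + v2*t,
    # so (v1*t + v2*t) / (2t) is the arithmetic mean of the two speeds.
    return (v1 + v2) / 2.0

print(average_speed_equal_distance(40.0, 60.0))  # km/h, half of the distance each
print(average_speed_equal_time(40.0, 60.0))      # km/h, half of the time each
```

With equal distances more time is spent at the lower speed, so the harmonic mean (48 km/h) lies below the arithmetic mean (50 km/h).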



Example: Weighted Mean

5 measurements {x_i} with
different uncertainties {σ_i}

The arithmetic mean is a special case of the weighted
mean (same weight for each measurement)
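A common choice for combining measurements with different uncertainties is inverse-variance weighting, w_i = 1/σ_i². The slide does not show its weights, so this choice is an assumption, and the measurement values below are hypothetical.

```python
def weighted_mean(values, sigmas):
    # Assumed weighting: w_i = 1 / sigma_i**2 (inverse-variance weights).
    weights = [1.0 / s ** 2 for s in sigmas]
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)

# Hypothetical example: 5 measurements with different uncertainties.
values = [10.1, 9.8, 10.4, 10.0, 9.6]
sigmas = [0.1, 0.2, 0.4, 0.1, 0.5]

print(weighted_mean(values, sigmas))

# Equal uncertainties give equal weights, i.e. the arithmetic mean.
print(weighted_mean(values, [0.3] * len(values)), sum(values) / len(values))
```

Measurements with small uncertainty dominate the result, which is why the weighted mean stays close to the two values measured with σ = 0.1.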



... even more averages
Mode (German: Modus):
most probable value (highest bin in distribution)
definition not unique: there are unimodal and bimodal distributions

Median:
smallest value which is greater than or equal to 50% of the events
(the median is more robust against outliers than the arithmetic mean)

For unimodal, symmetric functions centered around some value:
median = mode = arithmetic mean

else, for unimodal functions, empirical rule:
mean - mode = 3 x (mean - median)
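The robustness of the median against outliers can be illustrated with Python's standard `statistics` module. The two small data sets are hypothetical and differ only in their largest value.

```python
import statistics

sample = [1, 2, 2, 3, 4]          # hypothetical data set
with_outlier = [1, 2, 2, 3, 100]  # same data, largest value replaced by an outlier

for name, data in (("sample", sample), ("with outlier", with_outlier)):
    print(
        name,
        "mean =", statistics.mean(data),
        "median =", statistics.median(data),
        "mode =", statistics.mode(data),
    )
```

The outlier pulls the arithmetic mean from 2.4 to 21.6, while median and mode both stay at 2.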



Examples:

Give median, mode, arithmetic mean of:



Energy loss in Material
Assume tracking stations with 12 layers
The energy loss per unit of traversed material
(dE/dx) of a particle traversing a layer
follows a Landau distribution

[Figure: Landau distribution of dE/dx with the mode marked]
Energy loss in Material
The mode of the dE/dx distribution as a
function of particle momentum is used to
separate different particle species

[Figure: dE/dx versus momentum [GeV/c], momentum axis from 1 to 10]

How to get an estimate of the mode from 12 measurements?


Truncated Mean
Discard the 10% or 20% of measurements with the largest
values (symmetrizes the distribution)
Take the arithmetic mean of the remaining ones

This is all about estimating the true mode/median/mean
of a distribution from a given data set. More about
estimators later.
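The truncated mean can be sketched as follows. The 12 dE/dx values are hypothetical (arbitrary units, with two large values imitating the Landau tail), and rounding the number of discarded measurements down is a choice made here, not something the slide specifies.

```python
def truncated_mean(values, discard_fraction=0.2):
    """Arithmetic mean after discarding the largest values.

    The number of discarded measurements is discard_fraction * N,
    rounded down (an assumption of this sketch).
    """
    ordered = sorted(values)
    n_discard = int(discard_fraction * len(ordered))
    kept = ordered[: len(ordered) - n_discard]
    return sum(kept) / len(kept)

# Hypothetical dE/dx measurements from 12 layers (arbitrary units).
dedx = [1.0, 1.1, 1.2, 1.0, 0.9, 1.3, 1.1, 1.0, 1.2, 1.1, 3.5, 5.0]

print(sum(dedx) / len(dedx))      # plain arithmetic mean, pulled up by the tail
print(truncated_mean(dedx, 0.2))  # mean of the 10 smallest values
```

With 12 measurements, a fraction of 0.2 discards the 2 largest values and 0.1 discards only the largest one.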
Citation ....

"The rate of people living in poverty in the USA has
risen dramatically.
Half of all people have an income below the average."

from "So lügt man mit Statistik"



Measure the Spread
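How to characterise width/spread?
First thought: mean deviation from the mean:

$\frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x}) = 0$

(vanishes by construction, so it carries no information)

Could consider the average absolute deviation:

$\frac{1}{N} \sum_{i=1}^{N} |x_i - \bar{x}|$

However, it is hard to handle mathematically.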
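A quick numerical illustration of the two candidate measures of spread, using a small hypothetical data set:

```python
data = [1.0, 2.0, 3.0, 6.0]  # hypothetical data set
mean = sum(data) / len(data)

# Mean deviation from the mean: positive and negative deviations cancel.
mean_deviation = sum(x - mean for x in data) / len(data)

# Average absolute deviation: a usable, but less convenient, measure of spread.
mean_abs_deviation = sum(abs(x - mean) for x in data) / len(data)

print(mean, mean_deviation, mean_abs_deviation)
```

The mean deviation is zero for any data set, which is why it cannot serve as a measure of spread.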


Variance
A much better quantity:

$V(x) = \frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})^2$

the mean square deviation, called the sample variance

For any function g(x):

$V(g) = \frac{1}{N} \sum_{i=1}^{N} \left( g(x_i) - \bar{g} \right)^2$



Variance
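For data analysis, preferably loop only once over the data:

$V(x) = \overline{x^2} - \bar{x}^2$

(mean of the squares minus the square of the mean)

For large numbers, it is safer to shift the distribution by an estimated mean $x_e$:

$V(x) = \frac{1}{N} \sum_{i=1}^{N} (x_i - x_e)^2 - \left( \frac{1}{N} \sum_{i=1}^{N} (x_i - x_e) \right)^2$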
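The single-loop variance and the effect of shifting can be sketched as below. The function names are hypothetical, the data are constructed for illustration, and the first data value is used as the estimated mean (one possible choice among many).

```python
def variance_two_pass(xs):
    # Reference: first loop for the mean, second loop for the squared deviations.
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)

def variance_one_pass(xs, shift=0.0):
    # Single loop: accumulate sums of (x - shift) and (x - shift)**2, then use
    # V = mean of the squares - square of the mean (unchanged by the shift).
    n = 0
    s = 0.0
    s2 = 0.0
    for x in xs:
        d = x - shift
        n += 1
        s += d
        s2 += d * d
    return s2 / n - (s / n) ** 2

# Hypothetical data: small spread on top of a very large offset.
data = [1e9 + v for v in (4.0, 7.0, 13.0, 16.0)]

print(variance_two_pass(data))                  # reference value
print(variance_one_pass(data))                  # unshifted: may lose precision
print(variance_one_pass(data, shift=data[0]))   # shifted by an estimated mean
```

Without the shift, two numbers of order 1e18 are subtracted to obtain a result of order 10, so rounding errors in the sums can dominate; after shifting, the accumulated sums stay small.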



Standard Deviation (R.M.S.), FWHM
standard deviation or RMS (root mean square deviation):

$\sigma = \sqrt{V(x)}$

[standard is a joke, there are several standards in literature ...]


FWHM: full width at half maximum
more robust against outliers, but harder to determine at low statistics because of fluctuations
for a Gaussian distribution: FWHM = $2\sqrt{2 \ln 2}\,\sigma \approx 2.35\,\sigma$
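The Gaussian relation between FWHM and standard deviation can be verified numerically. This sketch locates the half-maximum point by bisection; the function names and the search interval of 10 standard deviations are assumptions made for the example.

```python
import math

def gaussian(x, mu=0.0, sigma=1.0):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def fwhm_numeric(sigma, mu=0.0):
    """Full width at half maximum of a Gaussian, found by bisection."""
    half_max = 0.5 * gaussian(mu, mu, sigma)
    lo, hi = mu, mu + 10.0 * sigma  # the right-hand half-maximum point lies in between
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if gaussian(mid, mu, sigma) > half_max:
            lo = mid
        else:
            hi = mid
    # The Gaussian is symmetric around mu, so the full width is twice the half width.
    return 2.0 * (0.5 * (lo + hi) - mu)

print(fwhm_numeric(1.0))                   # numerical FWHM for sigma = 1
print(2.0 * math.sqrt(2.0 * math.log(2)))  # analytic factor, about 2.35
```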

