
Refreshing Statistics

Some useful concepts

Maria Molina-Domene
LSE

October, 2014


REVIEW LECTURE 1


Experiments - Outcomes - Probability

Back in Stats, we defined experiments as processes carried out under controlled conditions. The emblematic example we used was rolling a die and recording its value.
The results of the experiment were called outcomes.
We defined probability as the proportion of trials in which a specific outcome A occurs if the experiment is repeated a large number of times.
Example:
Experiment: toss a coin
Outcomes: head or tail (mutually exclusive)
Probability: the proportion of tosses that come up heads


Random Variables
Experiment: toss a coin 10,000 times
Outcomes: 5023 of the tosses come up heads
Probability: P(heads) = 0.5 for that coin

P(A) = (outcomes in event A) / (total outcomes in the sample space) = Heads / Total tosses ≈ 0.5

Random variables are the numeric values assigned to these "random outcomes".
Ex: tossing two coins, S = {(1, 1), (1, 0), (0, 1), (0, 0)}


Random Variables, Probability and Expectation


For instance, the probability distribution of a discrete random variable X consists of its possible values x and the probabilities of each of these values, i.e. P(X = x).
Then the expected value of a discrete r.v., E(X), is its long-run value (i.e. under repeated trials), computed as the probability-weighted average of the possible outcomes of the r.v.:
E(X) = Σ_{xi ∈ S} xi Pr(xi)
where each xi is weighted by its probability Pr(xi).
A special case is E(b) = b: the expected value of a constant b is the constant itself.
The expected value (central tendency) and the variance (spread) help describe the typical outcomes of a random variable.
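As a minimal sketch (not from the slides), the expected value of a fair die roll can be computed as the probability-weighted average of its outcomes; the values, probabilities and simulation size below are illustrative.

```python
# E(X) for a fair six-sided die: the probability-weighted average of outcomes.
import numpy as np

values = np.array([1, 2, 3, 4, 5, 6])   # possible outcomes x_i
probs = np.full(6, 1 / 6)               # Pr(x_i) for a fair die

expected_value = np.sum(values * probs)  # E(X) = sum_i x_i Pr(x_i)
print(expected_value)                    # 3.5

# Long-run interpretation: the average of many simulated rolls approaches E(X).
rolls = np.random.choice(values, size=100_000, p=probs)
print(rolls.mean())                      # close to 3.5
```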

Expected Values of Linear Transformations of Random Variables

Now we focus on a function, say a linear transformation, of a discrete r.v.:
Y = a + bX
where X and Y are r.v. and a and b are two constants. Our new expected value is:
E(a + bX) = a + bE(X)
If two r.v. X and Y are independent (we go back to independence in depth later on):
E(XY) = E(X)E(Y)
N.B. this analysis can be generalized to continuous r.v.


Some rules for the expected values of linear transformations of r.v.

For arbitrary r.v. X1, X2, ..., Xn it holds that:
E(X1 + X2 + ... + Xn) = E(X1) + E(X2) + ... + E(Xn)
And applying this to linear transformations of r.v. we get:
E(c0 + c1 X1 + ... + cn Xn) = c0 + c1 E(X1) + ... + cn E(Xn)
where X1, X2, ..., Xn are r.v. and c0, c1, ..., cn are arbitrary constants.
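A short simulation sketch of the rule above, with arbitrarily chosen distributions and constants (none of these numbers come from the lectures):

```python
# Check: E(c0 + c1*X1 + c2*X2) = c0 + c1*E(X1) + c2*E(X2).
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
x1 = rng.exponential(scale=2.0, size=n)       # E(X1) = 2
x2 = rng.normal(loc=5.0, scale=3.0, size=n)   # E(X2) = 5
c0, c1, c2 = 1.0, 0.5, -2.0

lhs = np.mean(c0 + c1 * x1 + c2 * x2)         # sample analogue of the left-hand side
rhs = c0 + c1 * x1.mean() + c2 * x2.mean()    # right-hand side
print(lhs, rhs)                               # both close to 1 + 0.5*2 - 2*5 = -8
```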


Variance of one r.v.

The variance or the "spread" measures how far an outcome of X is from the expected value of X. Recall that E(X) = μX is the expected value.
var(X) = E[(X − μX)²] = E(X²) − E(X)² = σ²X
Variance is always nonnegative.
If the r.v. X is a constant, its variance is zero: var(X) = 0.
Adding a constant to a r.v. does not change the variance; multiplying the r.v. by a constant changes its dispersion.


Variance of one r.v. - Linear transformations

Considering one r.v. and applying a linear transformation Y = aX + b we get:
var(aX + b) = a² var(X)
where X is a r.v. and a and b are constants.
var(aX + b) = var(Y)
var(Y) = E(Y²) − E(Y)²
= E(a²X² + 2abX + b²) − [aE(X) + b]²
= a²E(X²) + 2abE(X) + b² − a²E(X)² − 2abE(X) − b²
= a²[E(X²) − E(X)²]
= a² var(X)
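A quick simulation check of this result; the distribution of X and the constants a and b below are arbitrary illustrative choices:

```python
# Adding a constant leaves the variance unchanged; scaling multiplies it by a^2.
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(loc=10.0, scale=2.0, size=1_000_000)   # var(X) = 4
a, b = 3.0, 7.0

print(np.var(x))           # ~ 4
print(np.var(a * x + b))   # ~ a^2 * var(X) = 36
print(np.var(x + b))       # ~ 4, unchanged by the shift
```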

Linear Transformations of Random Variables: an example


Let's see in practice what a linear transformation of a r.v. means.
Example:
E(X) = $2.30 (average price of an ice cream to go)
sd(X) = $0.30 (its standard deviation)
a = 1.1 (add 10% for a tip if you stay)
b = $0.50 (right-to-stay charge each time you get an ice cream to stay)
E(Y) = ? (average price of an ice cream to stay?)
sd(Y) = ? (standard deviation to stay?)
Our new expected value and standard deviation are:
E(Y) = aE(X) + b
E(Y) = 1.10 × 2.30 + 0.50 = $3.03
sd(Y) = |a| sd(X)
sd(Y) = 1.10 × 0.30 = $0.33

Two or more independent r.v. - Variance and linear transformation

Assume now that X and Y are independent discrete or continuous r.v. (we go back to independence in depth later on), then:
var(X + Y) = var(X) + var(Y)
var(X − Y) = var(X) + var(Y)
And applying a linear transformation we get:
var(aX + bY) = a² var(X) + b² var(Y)
var(c + aX + bY) = a² var(X) + b² var(Y)
where cov(X, Y) = 0, X and Y are r.v. and a, b, c are constants.
N.B. These results also hold whenever cov(Xi, Xj) = 0 for all i ≠ j, even if the r.v.s are not independent.

Two or more not independent r.v. - Variance and linear transformation

If X and Y are not independent discrete or continuous r.v., we must allow cov(X, Y) ≠ 0 (recall that if two r.v. are independent they are also uncorrelated):
var(aX + bY) = a² var(X) + b² var(Y) + 2ab cov(X, Y)
var(aX − bY) = a² var(X) + b² var(Y) − 2ab cov(X, Y)
and we know that covariance is an indicator of how, on average, variation in one r.v. X is associated with variation in another r.v. Y:
cov(X, Y) = E[(X − μX)(Y − μY)] = E(XY) − E(X)E(Y)
N.B. If X and Y are independent, then cov(X, Y) = 0; however, the converse is not true. That is, from cov(X, Y) = 0 alone the independence of X and Y should not be inferred.

Covariance calculation: an example


Economic Growth % (xi)    Dow Jones Returns % (yi)
2.1                       8
2.5                       12
4                         14
3.6                       10
x̄ = 3.1                   ȳ = 11

Substituting in the formula cov(X, Y) = Σ(xi − x̄)(yi − ȳ) / (n − 1):

cov(X, Y) = [(2.1 − 3.1)(8 − 11) + (2.5 − 3.1)(12 − 11) + (4 − 3.1)(14 − 11) + (3.6 − 3.1)(10 − 11)] / (4 − 1)
= 4.6 / 3
= 1.53

Since the covariance is positive, the variables are positively related and move together in the same direction.
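The same number can be reproduced with NumPy; ddof=1 gives the n − 1 denominator used above (a sketch, not part of the original slides):

```python
# Sample covariance of the growth/Dow example.
import numpy as np

x = np.array([2.1, 2.5, 4.0, 3.6])   # economic growth, %
y = np.array([8, 12, 14, 10])        # Dow Jones returns, %

cov_xy = np.cov(x, y, ddof=1)[0, 1]  # off-diagonal entry of the covariance matrix
print(cov_xy)                        # 1.533... = 4.6 / 3
```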

More rules on covariance and correlation of r.v.


The covariance of a r.v. with itself is the variance of the variable:
cov(X, X) = E(XX) − E(X)E(X) = var(X)
Assume that X and Y are r.v.s and a, b, c and d are constants:
cov(a, X) = E(aX) − E(a)E(X) = 0
cov(aX, bY) = ab cov(X, Y)
Regarding linear transformations of the r.v.s, the covariance becomes:
cov(aX, bY + c) = ab cov(X, Y) + a cov(X, c) = ab cov(X, Y)
And the correlation is:
corr(aX + c, bY + d) = corr(X, Y) if ab > 0
corr(aX + c, bY + d) = −corr(X, Y) if ab < 0

Two or more not independent r.v. - Covariance and linear transformation

Suppose Y = a + bX. What is cov(X, Y)?
μY = a + bμX
and
var(Y) = E[(Y − μY)²]
Y − μY = a + bX − (a + bμX) = b(X − μX)
E[(Y − μY)²] = E[b²(X − μX)²] = b² E[(X − μX)²] = b² var(X)

cov(X, Y) = cov(X, a + bX)
= E[(X − μX)(Y − μY)]
= E[(X − μX)(a + bX − (a + bμX))]
= E[(X − μX) b(X − μX)]
= b E[(X − μX)²]
= b var(X)

Covariance and correlation


The correlation is an alternative measure of the dependence between X and Y:
corr(X, Y) = cov(X, Y) / √(var(X) var(Y)) = σXY / (σX σY)
The advantage with respect to the covariance is that it solves the units issue of the covariance (i.e. the units of X multiplied by the units of Y).
When cov(X, Y) = 0, also corr(X, Y) = 0. When this is the case, we say that X and Y are uncorrelated.
If two r.v. are independent, they are also uncorrelated. The reverse is not true: two r.v. can be dependent even when their correlation is 0 (i.e. they may have a non-linear association).
Note that correlation and covariance are measures of the strength of the linear association between X and Y (the correlation varies between −1 and 1).
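A small sketch (reusing the growth/Dow data from the previous example) showing that the correlation is unit-free while the covariance is not:

```python
# Correlation is invariant to rescaling; covariance is not.
import numpy as np

x = np.array([2.1, 2.5, 4.0, 3.6])
y = np.array([8, 12, 14, 10])

print(np.corrcoef(x, y)[0, 1])         # ~ 0.66, between -1 and 1

x_scaled = 100 * x                     # change the units of X
print(np.cov(x_scaled, y)[0, 1])       # covariance scales by 100
print(np.corrcoef(x_scaled, y)[0, 1])  # correlation is unchanged
```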

REVIEW LECTURE 2


Independence of r.v.

Summarizing some basics of independence:
If X and Y are independent, then cov(X, Y) = 0.
If cov(X, Y) = 0 then corr(X, Y) = 0 (X and Y are uncorrelated).
Knowing only that cov(X, Y) = 0, the independence of X and Y should not be inferred. Indeed, two r.v. can be dependent even when their correlation is 0, if they have a non-linear relationship (see the plots on the "Different correlations" slide later in this review).
If the conditional mean of Y does not depend on X, then X and Y are uncorrelated:
if E(Y|X) = μY, then cov(X, Y) = 0 and corr(X, Y) = 0.
It is not necessarily true that if X and Y are uncorrelated, then the conditional mean of Y given X does not depend on X.
Ex: let X and Z be two independently distributed standard normal r.v.s and define
Y = X² + Z
Then E(Y|X) = X² depends on X, but corr(X, Y) = 0.
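A simulation sketch of this example, assuming X and Z are independent standard normal draws (sample size and seed are arbitrary):

```python
# Y = X^2 + Z: zero correlation with X, yet clearly dependent on X.
import numpy as np

rng = np.random.default_rng(2)
n = 1_000_000
x = rng.standard_normal(n)
z = rng.standard_normal(n)
y = x**2 + z

print(np.corrcoef(x, y)[0, 1])   # ~ 0: uncorrelated

# But the conditional mean of Y differs across X-bins, so X and Y are not independent.
print(y[x < -1].mean(), y[np.abs(x) < 0.5].mean(), y[x > 1].mean())
# roughly 2.5, 0.1 and 2.5 -- far from constant in X
```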

Conditional probability and independence of r.v.


Consider the probability of two events, and suppose that you are told that X has occurred. How does this affect the probability of event Y? The answer is given by the conditional probability of Y given that X has occurred:
Pr(Y = y | X = x) = Pr(X = x, Y = y) / Pr(X = x)
However, if X and Y are independent, learning that X has occurred does not change the probability of Y, and learning that Y has occurred does not change the probability of X:
Pr(Y = y | X = x) = Pr(X = x) Pr(Y = y) / Pr(X = x) = Pr(Y = y)

Conditional probability and conditional expectations

Recall the definition of conditional probability associated with Bayes' Theorem:
Pr(Yi | X) = Pr(X | Yi) Pr(Yi) / Σ_{i=1..k} Pr(X | Yi) Pr(Yi)

Now we have additional information (the event X occurred). How should we modify our estimate to take this new information into account? Now the "weights" depend on the outcome of the random variable X. The resulting formula for the conditional expectation of Y given X (also called the conditional mean) is:
E(Y | X = x) = Σ_{i=1..k} yi Pr(Y = yi | X = x)

Conditional expectations

Our first line of attack on the "causality problem" is randomized trials.
We are interested in the effect of a variable X on the expected value of Y.
This is why conditional expectations, closely related to the LLN and to expectation, are core concepts for us.
E(Y | X = xi) is the average that would be obtained if everyone in the population who has X = xi were to be sampled.
If Y and X are independent r.v.s, then the conditional expectation of one random variable given the other is the same as the unconditional expectation: E(Y | X) = E(Y).
(See Angrist and Pischke, Mastering 'Metrics, Chapter 1.)

Conditional expectations
E(Y) is the weighted average of E(Y | X = xi), where the weights are the probabilities that X takes on the values x1, x2, ..., xl:
E(Y) = E(Y | X = x1) Pr(x1) + E(Y | X = x2) Pr(x2) + ... + E(Y | X = xl) Pr(xl)
E(Y) = Σ_{i=1..l} E(Y | X = xi) Pr(X = xi)
In other words, the law of iterated expectations states that:
E(Y) = E[E(Y | X)]
This also holds for expectations that are conditioned on multiple r.v.
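A minimal numerical check of the law of iterated expectations with an illustrative discrete X; the probabilities and conditional means below are made up for the sketch:

```python
# E(Y) = E[E(Y|X)] as a weighted average of conditional means.
import numpy as np

p_x = np.array([0.2, 0.5, 0.3])             # Pr(X = x_i)
e_y_given_x = np.array([10.0, 20.0, 30.0])  # E(Y | X = x_i)

e_y = np.sum(e_y_given_x * p_x)             # weighted average of conditional means
print(e_y)                                  # 10*0.2 + 20*0.5 + 30*0.3 = 21.0
```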

Conditional variance

Similarly, if we are considering a conditional distribution Y | X, we define the conditional variance:
var(Y | X = x) = Σ_{i=1..l} [yi − E(Y | X = x)]² Pr(Y = yi | X = x)
which is the variance of a conditional distribution (i.e. we replace the probability of Y with the probability of Y | X).


Conditional expectation and correlation


If the conditional mean of Y does not depend on X, then Y and X are uncorrelated:
if E(Y | X) = μY, then cov(X, Y) = 0 and corr(X, Y) = 0.
If X and Y have mean zero:
cov(Y, X) = E[(Y − μY)(X − μX)] = E(YX)
E(YX) = E[E(YX | X)] = E[E(Y | X) X] = 0
because E(Y | X) = 0, so cov(X, Y) = 0.
Note it is not necessarily true that if X and Y are uncorrelated, then the conditional mean of Y given X does not depend on X. That is, mean independence is a stronger concept than uncorrelatedness, so uncorrelated does not imply mean independent (see more about this in Stock and Watson, p. 74).

Different correlations: some plots

It is a good idea to plot the data (i.e. knowing that the correlation coefficient is zero does not mean the two variables are independent or unrelated).
[Four scatter plots with fitted lines: correlation between writing and reading score; correlation between X and Y; correlation between math and writing score; correlation between price and repair record (1978).]

Joint and marginal distributions

Suppose the values of the joint probability function Pr(X, Y) are:

         X
Y        1      2      Pr(Y)
1        0.1    0.2    0.3
2        0.2    0.3    0.5
3        0.1    0.1    0.2
Pr(X)    0.4    0.6    1

Note that Σx Σy Pr(X = x, Y = y) = 0.1 + 0.2 + ... + 0.1 = 1: the sum of all the joint probabilities is 1.


Joint and marginal distributions


Applying what we have learnt so far...
From the joint probability function Pr(X, Y) we can get Pr(1, 1) = 0.1, which is the probability that x = 1 and y = 1.
The edges of the table show the values of the univariate marginal probability functions, for instance:
PrX(1) = 0.1 + 0.2 + 0.1 = 0.4
PrY(1) = 0.1 + 0.2 = 0.3
The expected values are E(X) = Σ x Pr(x) and E(Y) = Σ y Pr(y):
E(X) = 1 × 0.4 + 2 × 0.6 = 1.6
E(Y) = 1 × 0.3 + 2 × 0.5 + 3 × 0.2 = 1.9
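A sketch reproducing these calculations with NumPy, encoding the joint table above with Y on the rows and X on the columns:

```python
# Marginals and expectations from the joint probability table.
import numpy as np

joint = np.array([[0.1, 0.2],
                  [0.2, 0.3],
                  [0.1, 0.1]])   # Pr(X = x, Y = y); rows: Y = 1, 2, 3; cols: X = 1, 2
x_vals = np.array([1, 2])
y_vals = np.array([1, 2, 3])

p_x = joint.sum(axis=0)          # marginal of X: [0.4, 0.6]
p_y = joint.sum(axis=1)          # marginal of Y: [0.3, 0.5, 0.2]

print(p_x, p_y)
print(np.sum(x_vals * p_x))      # E(X) = 1*0.4 + 2*0.6 = 1.6
print(np.sum(y_vals * p_y))      # E(Y) = 1*0.3 + 2*0.5 + 3*0.2 = 1.9
```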

Conditional probability and expectation

         X
Y        1      2      Pr(Y)
1        0.1    0.2    0.3
2        0.2    0.3    0.5
3        0.1    0.1    0.2
Pr(X)    0.4    0.6    1

Conditional probability: an example with our data
Pr(Y | X) for x = 1 equals Pr(Y = y | X = 1) = Pr(1, y) / Pr(X = 1),
and when y = 2 it becomes Pr(Y = 2 | X = 1) = Pr(1, 2) / Pr(X = 1) = 0.2 / 0.4 = 0.5
Conditional expectation: an example with our data
E(Y | X = 1) = Σ y Pr(Y = y | X = 1) = 1 × 0.25 + 2 × 0.5 + 3 × 0.25 = 2

Correlation and independence


Pr(Y | X):

Y          X = 1    X = 2
1          0.25     0.33
2          0.5      0.5
3          0.25     0.17
Sum        1        1
E(Y | X)   2        1.84

Are X and Y independent? If we find at least one counterexample (recall that under independence the joint probability equals the product of the marginal probabilities) we can conclude that X and Y are not independent:
Pr(Y = 1, X = 1) = 0.1 ≠ Pr(Y = 1) Pr(X = 1) = 0.3 × 0.4 = 0.12
Are X and Y correlated? If two r.v. are independent, then their correlation is zero.
However, if two r.v. have zero correlation, they are not necessarily independent (i.e. they might be non-linearly associated).
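Continuing the NumPy sketch from the joint table: the conditional distribution Pr(Y|X), the conditional expectations E(Y|X), and the independence check above:

```python
# Conditional probabilities, conditional expectations, and an independence check.
import numpy as np

joint = np.array([[0.1, 0.2],
                  [0.2, 0.3],
                  [0.1, 0.1]])   # rows: Y = 1, 2, 3; columns: X = 1, 2
y_vals = np.array([1, 2, 3])

p_x = joint.sum(axis=0)               # [0.4, 0.6]
p_y = joint.sum(axis=1)               # [0.3, 0.5, 0.2]

cond_y_given_x = joint / p_x          # Pr(Y = y | X = x), column by column
print(cond_y_given_x)                 # columns [0.25, 0.5, 0.25] and [0.33, 0.5, 0.17]

e_y_given_x = y_vals @ cond_y_given_x  # E(Y | X = 1) = 2, E(Y | X = 2) ~ 1.83
print(e_y_given_x)

# Independence check: one counterexample is enough.
print(joint[0, 0], p_y[0] * p_x[0])   # 0.1 vs 0.12, so X and Y are not independent
```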

Hypothesis Testing, p-values, confidence intervals

Connecting with the lectures, how do we decide if:
Two group averages are the same (e.g. expenses for the groups with catastrophic and free coverage)? Test two means.
The variance of school achievements for a small class should not be greater than a given value? Test variances.
There is a linear relationship between family size and mother's education? Test correlation coefficients.
Statistics provides us with different tools to make a binary decision on a hypothesis H0, that is, to reject or not reject H0:
1. Hypothesis testing: we subjectively decide the significance level α: 10%, 5%, 1%.
2. P-values: the probability of observing our estimate (or one more extreme) under H0.
3. Confidence intervals (CI): the set of values for which we cannot reject H0.

Hypothesis Testing, p-values, confidence intervals

Mechanics for taking our decision to reject H0
First step:
Find a t-statistic, e.g. for the difference of two means: t = (Ȳn − Ȳm) / SE
Second step:
1. Identify a critical value tcrit(α) for a t-test of significance level α: if the observed t-statistic is bigger (in absolute value) than the critical value (e.g. 1.65, 2, ...), H0 is rejected.
2. Compute the p-value: if this probability is less than the significance level α (0.05, 0.10, ...), H0 is rejected.
3. Having decided the significance level for the CI, H0 is rejected if the value hypothesised by H0 is not within the confidence interval.
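A hedged sketch of these mechanics with SciPy on made-up data (the group samples below are illustrative and not from the lectures); stats.ttest_ind returns the t-statistic and two-sided p-value for a difference-of-means test:

```python
# Two-sample t-test: compute the t-statistic and p-value, then compare to alpha.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
group_a = rng.normal(loc=10.0, scale=2.0, size=100)
group_b = rng.normal(loc=11.0, scale=2.0, size=100)

t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(t_stat, p_value)

alpha = 0.05
print("reject H0" if p_value < alpha else "do not reject H0")
```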

Hypothesis Testing, p-values, confidence intervals

Conclusions
If our tested value looks extreme (e.g. more than two standard errors away from our original target), we reject H0.
If instead our tested value is not that far away from our target, we do not have significant evidence to reject H0.
N.B. - Remember that we are always more conclusive when we can reject H0.
- We can commit two types of error: Type I and Type II.
- We subjectively decide (if not told, we use 5%) the probability of making a Type I error. This way, that type of error is under control.

