
Refreshing Statistics

Some useful concepts

Maria Molina-Domene
LSE

October, 2014


REVIEW LECTURE 1


Experiments - Outcomes - Probability

Back in Stats, we defined experiments as processes carried out under controlled conditions. The emblematic example we used was rolling a die and recording its value.
The results of the experiment were called outcomes.
We defined probability as the proportion of trials in which a specific outcome A occurs if the experiment is repeated a large number of times.
Example:
Experiment: toss a coin
Outcomes: head or tail (mutually exclusive)
Probability: the proportion of tosses that come up heads


Random Variables
Experiment: toss a coin 10,000 times
Outcomes: 5023 of the tosses come up heads
Probability: P(heads) = 0.5 for that coin

P(A) = (outcomes in event A) / (total outcomes in the sample space) = Heads / Total tosses ≈ 0.5

Random variables are the numeric values assigned to these "random outcomes".
Ex: tossing two coins, S = {(1, 1), (1, 0), (0, 1), (0, 0)}


Random Variables, Probability and Expectation


For instance, the probability distribution of a discrete random variable X consists of its possible values x and the probabilities of each of these values, i.e. P(X = x).
Then the expected value of a discrete r.v., E(X), is its long-run value (i.e. under repeated trials), computed as the probability-weighted average of the possible outcomes of the r.v.:
E(X) = Σ_{xi ∈ S} xi Pr(xi)
where each xi is weighted by its probability Pr(xi).
A special case is E(b) = b: the expected value of a constant b is the constant itself.
The expected value (central tendency) and the variance (spread) help describe the typical outcomes of a random variable.
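As a minimal sketch (not from the slides), the expected value of a fair die roll can be computed as the probability-weighted average of its outcomes; the values, probabilities and simulation size below are illustrative.

```python
# E(X) for a fair six-sided die: the probability-weighted average of outcomes.
import numpy as np

values = np.array([1, 2, 3, 4, 5, 6])   # possible outcomes x_i
probs = np.full(6, 1 / 6)               # Pr(x_i) for a fair die

expected_value = np.sum(values * probs)  # E(X) = sum_i x_i Pr(x_i)
print(expected_value)                    # 3.5

# Long-run interpretation: the average of many simulated rolls approaches E(X).
rolls = np.random.choice(values, size=100_000, p=probs)
print(rolls.mean())                      # close to 3.5
```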

Expected Values of Linear Transformations of Random Variables

Now we focus on a function, say a linear transformation, of a discrete r.v.:
Y = a + bX
where X and Y are r.v. and a and b are two constants. Our new expected value is:
E(a + bX) = a + bE(X)
If two r.v. X and Y are independent (we go back to independence in depth later on):
E(XY) = E(X)E(Y)
N.B. this analysis can be generalized to continuous r.v.


Some rules for the expected values of linear transformations of r.v.

For arbitrary r.v. X1, X2, ..., Xn it holds that:
E(X1 + X2 + ... + Xn) = E(X1) + E(X2) + ... + E(Xn)
And applying this to linear transformations of r.v. we get:
E(c0 + c1 X1 + ... + cn Xn) = c0 + c1 E(X1) + ... + cn E(Xn)
where X1, X2, ..., Xn are r.v. and c0, c1, ..., cn are arbitrary constants.
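A short simulation sketch of the rule above, with arbitrarily chosen distributions and constants (none of these numbers come from the lectures):

```python
# Check: E(c0 + c1*X1 + c2*X2) = c0 + c1*E(X1) + c2*E(X2).
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
x1 = rng.exponential(scale=2.0, size=n)       # E(X1) = 2
x2 = rng.normal(loc=5.0, scale=3.0, size=n)   # E(X2) = 5
c0, c1, c2 = 1.0, 0.5, -2.0

lhs = np.mean(c0 + c1 * x1 + c2 * x2)         # sample analogue of the left-hand side
rhs = c0 + c1 * x1.mean() + c2 * x2.mean()    # right-hand side
print(lhs, rhs)                               # both close to 1 + 0.5*2 - 2*5 = -8
```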


Variance of one r.v.

The variance or the "spread" measures how far an outcome of X is from the expected value of X. Recall that E(X) = μX is the expected value.
var(X) = E[(X − μX)²] = E(X²) − E(X)² = σ²X
Variance is always nonnegative.
If the r.v. X is a constant, its variance is zero: var(X) = 0.
Adding a constant to a r.v. does not change the variance; multiplying the r.v. by a constant changes its dispersion.


Variance of one r.v. - Linear transformations

Considering one r.v. and applying a linear transformation Y = aX + b we get:
var(aX + b) = a² var(X)
where X is a r.v. and a and b are constants.
var(aX + b) = var(Y)
var(Y) = E(Y²) − E(Y)²
= E(a²X² + 2abX + b²) − [aE(X) + b]²
= a²E(X²) + 2abE(X) + b² − a²E(X)² − 2abE(X) − b²
= a²[E(X²) − E(X)²]
= a² var(X)
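A quick simulation check of this result; the distribution of X and the constants a and b below are arbitrary illustrative choices:

```python
# Adding a constant leaves the variance unchanged; scaling multiplies it by a^2.
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(loc=10.0, scale=2.0, size=1_000_000)   # var(X) = 4
a, b = 3.0, 7.0

print(np.var(x))           # ~ 4
print(np.var(a * x + b))   # ~ a^2 * var(X) = 36
print(np.var(x + b))       # ~ 4, unchanged by the shift
```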

Linear Transformations of Random Variables: an example


Let's see in practice what a linear transformation of a r.v. means.
Example:
E(X) = $2.30 (average price of an ice cream to go)
sd(X) = $0.30 (its standard deviation)
a = 1.1 (add 10% for a tip if you stay)
b = $0.50 (right-to-stay charge each time you get an ice cream to stay)
E(Y) = ? (average price of an ice cream to stay?)
sd(Y) = ? (standard deviation to stay?)
Our new expected value and standard deviation are:
E(Y) = aE(X) + b
E(Y) = 1.10 × 2.30 + 0.50 = $3.03
sd(Y) = |a| sd(X)
sd(Y) = 1.10 × 0.30 = $0.33

Two or more independent r.v. - Variance and linear transformation

Assume now that X and Y are independent discrete or continuous r.v. (we go back to independence in depth later on), then:
var(X + Y) = var(X) + var(Y)
var(X − Y) = var(X) + var(Y)
And applying a linear transformation we get:
var(aX + bY) = a² var(X) + b² var(Y)
var(c + aX + bY) = a² var(X) + b² var(Y)
where cov(X, Y) = 0, X and Y are r.v. and a, b, c are constants.
N.B. These results also hold whenever cov(Xi, Xj) = 0 for all i ≠ j, even if the r.v.s are not independent.

Two or more not independent r.v. - Variance and linear transformation

If X and Y are not independent discrete or continuous r.v., we must allow cov(X, Y) ≠ 0 (recall that if two r.v. are independent they are also uncorrelated):
var(aX + bY) = a² var(X) + b² var(Y) + 2ab cov(X, Y)
var(aX − bY) = a² var(X) + b² var(Y) − 2ab cov(X, Y)
and we know that covariance is an indicator of how, on average, variation in one r.v. X is associated with variation in another r.v. Y:
cov(X, Y) = E[(X − μX)(Y − μY)] = E(XY) − E(X)E(Y)
N.B. If X and Y are independent, then cov(X, Y) = 0; however, the converse is not true. That is, from cov(X, Y) = 0 alone the independence of X and Y should not be inferred.

Covariance calculation: an example


Economic Growth % (xi)    Dow Jones Returns % (yi)
2.1                       8
2.5                       12
4                         14
3.6                       10
x̄ = 3.1                   ȳ = 11

Substituting in the formula cov(X, Y) = Σ(xi − x̄)(yi − ȳ) / (n − 1):

cov(X, Y) = [(2.1 − 3.1)(8 − 11) + (2.5 − 3.1)(12 − 11) + (4 − 3.1)(14 − 11) + (3.6 − 3.1)(10 − 11)] / (4 − 1)
= 4.6 / 3
= 1.53

Since the covariance is positive, the variables are positively related and move together in the same direction.
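The same number can be reproduced with NumPy; ddof=1 gives the n − 1 denominator used above (a sketch, not part of the original slides):

```python
# Sample covariance of the growth/Dow example.
import numpy as np

x = np.array([2.1, 2.5, 4.0, 3.6])   # economic growth, %
y = np.array([8, 12, 14, 10])        # Dow Jones returns, %

cov_xy = np.cov(x, y, ddof=1)[0, 1]  # off-diagonal entry of the covariance matrix
print(cov_xy)                        # 1.533... = 4.6 / 3
```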

More rules on covariance and correlation of r.v.


The covariance of a r.v. with itself is the variance of the variable:
cov(X, X) = E(XX) − E(X)E(X) = var(X)
Assume that X and Y are r.v.s and a, b, c and d are constants:
cov(a, X) = E(aX) − E(a)E(X) = 0
cov(aX, bY) = ab cov(X, Y)
Regarding linear transformations of the r.v.s, the covariance becomes:
cov(aX, bY + c) = ab cov(X, Y) + a cov(X, c) = ab cov(X, Y)
And the correlation is:
corr(aX + c, bY + d) = corr(X, Y) if ab > 0
corr(aX + c, bY + d) = −corr(X, Y) if ab < 0

Two or more not independent r.v. - Covariance and linear transformation

Suppose Y = a + bX. What is cov(X, Y)?
μY = a + bμX
and
var(Y) = E[(Y − μY)²]
Y − μY = a + bX − (a + bμX) = b(X − μX)
E[(Y − μY)²] = E[b²(X − μX)²] = b² E[(X − μX)²] = b² var(X)

cov(X, Y) = cov(X, a + bX)
= E[(X − μX)(Y − μY)]
= E[(X − μX)(a + bX − (a + bμX))]
= E[(X − μX) b(X − μX)]
= b E[(X − μX)²]
= b var(X)

Covariance and correlation


The correlation is an alternative measure of the dependence between X and Y:
corr(X, Y) = cov(X, Y) / √(var(X) var(Y)) = σXY / (σX σY)
The advantage with respect to the covariance is that it solves the units issue of the covariance (i.e. the units of X multiplied by the units of Y).
When cov(X, Y) = 0, also corr(X, Y) = 0. When this is the case, we say that X and Y are uncorrelated.
If two r.v. are independent, they are also uncorrelated. The reverse is not true: two r.v. can be dependent even when their correlation is 0 (i.e. they may have a non-linear association).
Note that correlation and covariance are measures of the strength of the linear association between X and Y (the correlation varies between −1 and 1).
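A small sketch (reusing the growth/Dow data from the previous example) showing that the correlation is unit-free while the covariance is not:

```python
# Correlation is invariant to rescaling; covariance is not.
import numpy as np

x = np.array([2.1, 2.5, 4.0, 3.6])
y = np.array([8, 12, 14, 10])

print(np.corrcoef(x, y)[0, 1])         # ~ 0.66, between -1 and 1

x_scaled = 100 * x                     # change the units of X
print(np.cov(x_scaled, y)[0, 1])       # covariance scales by 100
print(np.corrcoef(x_scaled, y)[0, 1])  # correlation is unchanged
```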

REVIEW LECTURE 2


Independence of r.v.

Summarizing some basics of independence:
If X and Y are independent, then cov(X, Y) = 0.
If cov(X, Y) = 0 then corr(X, Y) = 0 (X and Y are uncorrelated).
Knowing only that cov(X, Y) = 0, the independence of X and Y should not be inferred. Indeed, two r.v. can be dependent even when their correlation is 0, if they have a non-linear relationship (see the plots on the "Different correlations" slide later in this review).
If the conditional mean of Y does not depend on X, then X and Y are uncorrelated:
if E(Y|X) = μY, then cov(X, Y) = 0 and corr(X, Y) = 0.
It is not necessarily true that if X and Y are uncorrelated, then the conditional mean of Y given X does not depend on X.
Ex: let X and Z be two independently distributed standard normal r.v.s and define
Y = X² + Z
Then E(Y|X) = X² depends on X, but corr(X, Y) = 0.
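A simulation sketch of this example, assuming X and Z are independent standard normal draws (sample size and seed are arbitrary):

```python
# Y = X^2 + Z: zero correlation with X, yet clearly dependent on X.
import numpy as np

rng = np.random.default_rng(2)
n = 1_000_000
x = rng.standard_normal(n)
z = rng.standard_normal(n)
y = x**2 + z

print(np.corrcoef(x, y)[0, 1])   # ~ 0: uncorrelated

# But the conditional mean of Y differs across X-bins, so X and Y are not independent.
print(y[x < -1].mean(), y[np.abs(x) < 0.5].mean(), y[x > 1].mean())
# roughly 2.5, 0.1 and 2.5 -- far from constant in X
```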

Conditional probability and independence of r.v.


Consider the probability of two events, and suppose that you are told that X has occurred. How does this affect the probability of event Y? The answer is given by the conditional probability of Y given that X has occurred:
Pr(Y = y | X = x) = Pr(X = x, Y = y) / Pr(X = x)
However, if X and Y are independent, learning that X has occurred does not change the probability of Y, and learning that Y has occurred does not change the probability of X:
Pr(Y = y | X = x) = Pr(X = x) Pr(Y = y) / Pr(X = x) = Pr(Y = y)

Conditional probability and conditional expectations

Recall the definition of conditional probability associated with Bayes' Theorem:
Pr(Yi | X) = Pr(X | Yi) Pr(Yi) / Σ_{i=1..k} Pr(X | Yi) Pr(Yi)

Now we have additional information (the event X occurred). How should we modify our estimate to take this new information into account? Now the "weights" depend on the outcome of the random variable X. The resulting formula for the conditional expectation of Y given X (also called the conditional mean) is:
E(Y | X = x) = Σ_{i=1..k} yi Pr(Y = yi | X = x)

Conditional expectations

Our first line of attack on the "causality problem" is randomized trials.
We are interested in the effect of a variable X on the expected value of Y.
This is why conditional expectations, closely related to the LLN and to expectation, are core concepts for us.
E(Y | X = xi) is the average that would be obtained if everyone in the population who has X = xi were to be sampled.
If Y and X are independent r.v.s, then the conditional expectation of one random variable given the other is the same as the unconditional expectation: E(Y | X) = E(Y).
(See Angrist and Pischke, Mastering 'Metrics, Chapter 1.)

Conditional expectations
E(Y) is the weighted average of E(Y | X = xi), where the weights are the probabilities that X takes on the values x1, x2, ..., xl:
E(Y) = E(Y | X = x1) Pr(x1) + E(Y | X = x2) Pr(x2) + ... + E(Y | X = xl) Pr(xl)
E(Y) = Σ_{i=1..l} E(Y | X = xi) Pr(X = xi)
In other words, the law of iterated expectations states that:
E(Y) = E[E(Y | X)]
This also holds for expectations that are conditioned on multiple r.v.
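A minimal numerical check of the law of iterated expectations with an illustrative discrete X; the probabilities and conditional means below are made up for the sketch:

```python
# E(Y) = E[E(Y|X)] as a weighted average of conditional means.
import numpy as np

p_x = np.array([0.2, 0.5, 0.3])             # Pr(X = x_i)
e_y_given_x = np.array([10.0, 20.0, 30.0])  # E(Y | X = x_i)

e_y = np.sum(e_y_given_x * p_x)             # weighted average of conditional means
print(e_y)                                  # 10*0.2 + 20*0.5 + 30*0.3 = 21.0
```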

Conditional variance

Similarly, if we are considering a conditional distribution Y | X, we define the conditional variance:
var(Y | X = x) = Σ_{i=1..l} [yi − E(Y | X = x)]² Pr(Y = yi | X = x)
which is the variance of a conditional distribution (i.e. we replace the probability of Y with the probability of Y | X).


Conditional expectation and correlation


If the conditional mean of Y does not depend on X, then Y and X are uncorrelated:
if E(Y | X) = μY, then cov(X, Y) = 0 and corr(X, Y) = 0.
If X and Y have mean zero:
cov(Y, X) = E[(Y − μY)(X − μX)] = E(YX)
E(YX) = E[E(YX | X)] = E[E(Y | X) X] = 0
because E(Y | X) = 0, so cov(X, Y) = 0.
Note it is not necessarily true that if X and Y are uncorrelated, then the conditional mean of Y given X does not depend on X. That is, mean independence is a stronger concept than uncorrelatedness, so uncorrelated does not imply mean independent (see more about this in Stock and Watson, p. 74).

Different correlations: some plots

It is a good idea to plot the data (i.e. knowing that the correlation coefficient is zero does not mean the two variables are independent or unrelated).
[Four scatter plots with fitted lines: correlation between writing and reading score; correlation between X and Y; correlation between math and writing score; correlation between price and repair record (1978).]

Joint and marginal distributions

Suppose the values of the joint probability function Pr(X, Y) are:

         X
Y        1      2      Pr(Y)
1        0.1    0.2    0.3
2        0.2    0.3    0.5
3        0.1    0.1    0.2
Pr(X)    0.4    0.6    1

Note that Σx Σy Pr(X = x, Y = y) = 0.1 + 0.2 + ... + 0.1 = 1: the sum of all the joint probabilities is 1.


Joint and marginal distributions


Applying what we have learnt so far...
From the joint probability function Pr(X, Y) we can get Pr(1, 1) = 0.1, which is the probability that x = 1 and y = 1.
The edges of the table show the values of the univariate marginal probability functions, for instance:
PrX(1) = 0.1 + 0.2 + 0.1 = 0.4
PrY(1) = 0.1 + 0.2 = 0.3
The expected values are E(X) = Σ x Pr(x) and E(Y) = Σ y Pr(y):
E(X) = 1 × 0.4 + 2 × 0.6 = 1.6
E(Y) = 1 × 0.3 + 2 × 0.5 + 3 × 0.2 = 1.9
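A sketch reproducing these calculations with NumPy, encoding the joint table above with Y on the rows and X on the columns:

```python
# Marginals and expectations from the joint probability table.
import numpy as np

joint = np.array([[0.1, 0.2],
                  [0.2, 0.3],
                  [0.1, 0.1]])   # Pr(X = x, Y = y); rows: Y = 1, 2, 3; cols: X = 1, 2
x_vals = np.array([1, 2])
y_vals = np.array([1, 2, 3])

p_x = joint.sum(axis=0)          # marginal of X: [0.4, 0.6]
p_y = joint.sum(axis=1)          # marginal of Y: [0.3, 0.5, 0.2]

print(p_x, p_y)
print(np.sum(x_vals * p_x))      # E(X) = 1*0.4 + 2*0.6 = 1.6
print(np.sum(y_vals * p_y))      # E(Y) = 1*0.3 + 2*0.5 + 3*0.2 = 1.9
```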

Conditional probability and expectation

         X
Y        1      2      Pr(Y)
1        0.1    0.2    0.3
2        0.2    0.3    0.5
3        0.1    0.1    0.2
Pr(X)    0.4    0.6    1

Conditional probability: an example with our data
Pr(Y | X) for x = 1 equals Pr(Y = y | X = 1) = Pr(1, y) / Pr(X = 1),
and when y = 2 it becomes Pr(Y = 2 | X = 1) = Pr(1, 2) / Pr(X = 1) = 0.2 / 0.4 = 0.5
Conditional expectation: an example with our data
E(Y | X = 1) = Σ y Pr(Y = y | X = 1) = 1 × 0.25 + 2 × 0.5 + 3 × 0.25 = 2

Correlation and independence


Pr(Y | X):

Y          X = 1    X = 2
1          0.25     0.33
2          0.5      0.5
3          0.25     0.17
Sum        1        1
E(Y | X)   2        1.84

Are X and Y independent? If we find at least one counterexample (recall that under independence the joint probability equals the product of the marginal probabilities) we can conclude that X and Y are not independent:
Pr(Y = 1, X = 1) = 0.1 ≠ Pr(Y = 1) Pr(X = 1) = 0.3 × 0.4 = 0.12
Are X and Y correlated? If two r.v. are independent, then their correlation is zero.
However, if two r.v. have zero correlation, they are not necessarily independent (i.e. they might be non-linearly associated).
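Continuing the NumPy sketch from the joint table: the conditional distribution Pr(Y|X), the conditional expectations E(Y|X), and the independence check above:

```python
# Conditional probabilities, conditional expectations, and an independence check.
import numpy as np

joint = np.array([[0.1, 0.2],
                  [0.2, 0.3],
                  [0.1, 0.1]])   # rows: Y = 1, 2, 3; columns: X = 1, 2
y_vals = np.array([1, 2, 3])

p_x = joint.sum(axis=0)               # [0.4, 0.6]
p_y = joint.sum(axis=1)               # [0.3, 0.5, 0.2]

cond_y_given_x = joint / p_x          # Pr(Y = y | X = x), column by column
print(cond_y_given_x)                 # columns [0.25, 0.5, 0.25] and [0.33, 0.5, 0.17]

e_y_given_x = y_vals @ cond_y_given_x  # E(Y | X = 1) = 2, E(Y | X = 2) ~ 1.83
print(e_y_given_x)

# Independence check: one counterexample is enough.
print(joint[0, 0], p_y[0] * p_x[0])   # 0.1 vs 0.12, so X and Y are not independent
```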

Hypothesis Testing, p-values, confidence intervals

Connecting with the lectures, how do we decide if:
Two group averages are the same (e.g. expenses for the groups with catastrophic and free coverage)? Test two means.
The variance of school achievements for a small class should not be greater than a given value? Test variances.
There is a linear relationship between family size and mother's education? Test correlation coefficients.
Statistics provides us with different tools to make a binary decision on a hypothesis H0, that is, to reject or not reject H0:
1. Hypothesis testing: we subjectively decide the significance level α: 10%, 5%, 1%.
2. P-values: the probability of observing our estimate (or one more extreme) under H0.
3. Confidence intervals (CI): the set of values for which we cannot reject H0.

Hypothesis Testing, p-values, confidence intervals

Mechanics for taking our decision to reject H0
First step:
Find a t-statistic, e.g. for the difference of two means: t = (Ȳn − Ȳm) / SE
Second step:
1. Identify a critical value tcrit(α) for a t-test of significance level α: if the observed t-statistic is bigger (in absolute value) than the critical value (e.g. 1.65, 2, ...), H0 is rejected.
2. Compute the p-value: if this probability is less than the significance level α (0.05, 0.10, ...), H0 is rejected.
3. Having decided the significance level for the CI, H0 is rejected if the value hypothesised by H0 is not within the confidence interval.
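A hedged sketch of these mechanics with SciPy on made-up data (the group samples below are illustrative and not from the lectures); stats.ttest_ind returns the t-statistic and two-sided p-value for a difference-of-means test:

```python
# Two-sample t-test: compute the t-statistic and p-value, then compare to alpha.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
group_a = rng.normal(loc=10.0, scale=2.0, size=100)
group_b = rng.normal(loc=11.0, scale=2.0, size=100)

t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(t_stat, p_value)

alpha = 0.05
print("reject H0" if p_value < alpha else "do not reject H0")
```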

Hypothesis Testing, p-values, confidence intervals

Conclusions
If our tested value looks extreme (e.g. more than two standard errors away from our original target), we reject H0.
If instead our tested value is not that far away from our target, we do not have significant evidence to reject H0.
N.B. - Remember that we are always more conclusive when we can reject H0.
- We can commit two types of error: Type I and Type II.
- We subjectively decide (if not told, we use 5%) the probability of making a Type I error. This way, that type of error is under control.

