
Review of Probability

References

Hogg, R. V., and A. T. Craig, 1995. Introduction to Mathematical Statistics. Prentice-Hall.
Stock, J. H., and M. W. Watson, 2003. Introduction to Econometrics. Addison-Wesley. (Chapter 2; advanced undergraduate level)
Zivot, E., 2002. Lecture Notes on Applied Econometric Modeling in Finance. http://faculty.washington.edu/ezivot/econ483/483notes.htm
Greene, W. H., 2000. Econometric Analysis. Prentice-Hall. (Chapters 3, 4; introductory graduate level)

1 Random Variables and Probability Distribution

We view an observation on some aspect of the economy (or of real life) as the outcome of a random experiment. The probability of an outcome is the proportion of the time that the outcome occurs in the long run. If the probability of your computer not crashing while you are doing a problem set is 90%, then over the course of doing many problem sets you will complete 90% of them without a crash. The set of all possible outcomes is called the sample space, denoted S_X. An event is a subset of the sample space, i.e. an event is a set of one or more outcomes. A random variable is a numerical summary of a random outcome. Some random variables are discrete and some are continuous.

Definition. A random variable X is a variable that can take on a given set of values, called the sample space and denoted S_X, where the likelihood of the values in S_X is determined by X's probability distribution.

1.1 Probability Distribution of a Discrete Random Variable

The probability distribution of a discrete random variable, denoted f(x), is the list of all possible values of the variable and the probability that each value will occur, i.e. f(x) = Pr(X = x). The pdf must satisfy (i) f(x) ≥ 0 for all x ∈ S_X; (ii) f(x) = 0 for all x ∉ S_X; and (iii) Σ_{x∈S_X} f(x) = 1.

The cumulative probability distribution (c.d.f.), denoted F, is the probability that the random variable is less than or equal to a particular value:

    F(x) = Pr(X ≤ x)                                                    (1)

The cdf has the following properties:

1. If x1 < x2 then F(x1) ≤ F(x2)
2. F(−∞) = 0 and F(∞) = 1
3. Pr(X > x) = 1 − F(x)
4. Pr(x1 < X ≤ x2) = F(x2) − F(x1)

Table 1: Probability of Your Computer Crashing M Times

    Outcome (number of crashes)      0      1      2      3      4
    Probability distribution         0.80   0.10   0.06   0.03   0.01
    Cumulative prob. distribution    0.80   0.90   0.96   0.99   1.00
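As a quick numerical check (the numbers come from Table 1; the code itself is a minimal sketch, not part of the original notes), the pdf and cdf properties above can be verified directly:

```python
import numpy as np

# pdf of M (number of crashes) from Table 1
pdf = np.array([0.80, 0.10, 0.06, 0.03, 0.01])   # Pr(M = 0), ..., Pr(M = 4)

assert np.isclose(pdf.sum(), 1.0)   # property (iii): probabilities sum to 1
cdf = np.cumsum(pdf)                # F(m) = Pr(M <= m)
print(cdf)                          # [0.80 0.90 0.96 0.99 1.00]

print(1 - cdf[1])                   # property 3: Pr(M > 1) = 1 - F(1) = 0.10
print(cdf[3] - cdf[1])              # property 4: Pr(1 < M <= 3) = 0.09
```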

An important special case of a discrete random variable is a binary one, i.e. the outcomes are 0 and 1. A binary random variable is called a Bernoulli random variable and its probability distribution is called the Bernoulli distribution.

1.2 Probability Distribution of a Continuous Random Variable

Unlike a discrete random variable, a continuous random variable takes on a continuum of possible values, so it is not possible to list the probability of each possible value. Instead, the probability is summarized by the probability density function (p.d.f.), denoted f(x). The area under the probability density function between any two points is the probability that the random variable falls between those two points. Formally, the pdf of a continuous random variable X is a nonnegative function f(x), defined on the real line, such that for any interval A:

    Pr(X ∈ A) = ∫_A f(x) dx                                             (2)

That is, Pr(X ∈ A) is the area under the probability curve over the interval A. The pdf f(x) must satisfy (i) f(x) ≥ 0; and (ii) ∫_{−∞}^{∞} f(x) dx = 1.

Example 1. Let the random variable X of the continuous type have the pdf f(x) = 2/x³, 1 < x < ∞, zero elsewhere. The distribution function of X is

    F(x) = ∫_{−∞}^{x} 0 dw = 0,                 x < 1
    F(x) = ∫_1^x (2/w³) dw = 1 − 1/x²,          1 ≤ x
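The closed-form cdf in Example 1 can be confirmed by numerical integration; a minimal sketch, not part of the original notes:

```python
import numpy as np
from scipy import integrate

f = lambda x: 2.0 / x**3          # pdf from Example 1 on (1, inf)

total, _ = integrate.quad(f, 1, np.inf)
print(total)                      # ~1.0: the pdf integrates to one

num, _ = integrate.quad(f, 1, 3)  # F(3) by quadrature
print(num, 1 - 1 / 3**2)          # both ~0.8889, matching F(x) = 1 - 1/x^2
```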

Figure 1: Pdf and cdf of the random variable in Example 1.

Example 2. Let f(x) = 1/2, −1 < x < 1, zero elsewhere, be the pdf of the random variable X. Define the random variable Y by Y = X². We wish to find the pdf of Y. If y ≥ 0, the probability Pr(Y ≤ y) is equivalent to

    Pr(X² ≤ y) = Pr(−√y ≤ X ≤ √y)                                       (3)

Accordingly, the distribution function of Y, G(y) = Pr(Y ≤ y), is given by

    G(y) = 0,                                   y < 0
    G(y) = ∫_{−√y}^{√y} (1/2) dx = √y,          0 ≤ y < 1
    G(y) = 1,                                   1 ≤ y

Since Y is a random variable of the continuous type, the pdf of Y is g(y) = G′(y) at all points of continuity of g(y). Therefore,

    g(y) = 1/(2√y),   0 < y < 1
    g(y) = 0,         elsewhere
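A Monte Carlo check of the change of variables in Example 2; a minimal sketch with an arbitrary seed and sample size, not part of the original notes:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 1_000_000)   # X ~ Uniform(-1, 1), pdf 1/2 on (-1, 1)
y = x**2                            # Y = X^2

# Empirical cdf of Y versus G(y) = sqrt(y) on [0, 1)
for q in (0.04, 0.25, 0.81):
    print(q, (y <= q).mean(), np.sqrt(q))
```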

2 Expected Values, Mean, and Variance

The expected value of a random variable Y, denoted E(Y), is the long-run average value of the random variable over many repeated trials or occurrences. The expected value of Y is also called the expectation of Y or the mean of Y. The terminology of expectation or expected value has its origin in games of chance. This can be illustrated as follows: four similar chips, numbered 1, 1, 1, and 2, respectively, are placed in a bowl and mixed. A player is blindfolded and draws one chip from the bowl. If she draws one of the three chips numbered 1, she receives one dollar; if she draws the chip numbered 2, she receives two dollars. It seems reasonable to assume that the player has a 3/4 claim on the $1 and a 1/4 claim on the $2. Her total claim is 1×(3/4) + 2×(1/4) = 1.25. Thus the expectation of X is precisely the player's claim in this game.

Suppose that the random variable Y takes on k possible values, y1, y2, ..., yk, where y1 denotes the first value, y2 denotes the second value, and so forth, and that the probability that Y takes on y1 is p1, the probability that Y takes on y2 is p2, and so forth. The expected value of Y is:

    E(Y) = y1 p1 + y2 p2 + ... + yk pk = Σ_{i=1}^k yi pi                (4)

The expected value of a continuous random variable is

    E[X] = ∫_a^b x f(x) dx                                              (5)

where f(x) is the probability density function and x takes on values between points a and b. As an example, consider the number of computer crashes M with the probability distribution given in Table 1. The expected value of M is the average number of crashes over many problem sets, weighted by the frequency with which a crash of a given size occurs:

    E(M) = 0×0.80 + 1×0.10 + 2×0.06 + 3×0.03 + 4×0.01 = 0.35            (6)

That is, the expected number of computer crashes while doing a particular problem set is 0.35. Obviously, the actual number of crashes is an integer; the calculation above just means that the average number of crashes over many problem sets is 0.35.

Example 3. Let the random variable X of the discrete type have the pdf given by the table

    x      1      2      3      4
    f(x)   4/10   1/10   3/10   2/10

Here f(x) = 0 if x is not equal to one of the first four positive integers. This illustrates the fact that there is no need to have a formula to describe a pdf. We have

    E(X) = 1×(4/10) + 2×(1/10) + 3×(3/10) + 4×(2/10) = 2.3              (7)

Example 4. Let X have the pdf

    f(x) = 4x³,   0 < x < 1
    f(x) = 0,     elsewhere

Then

    E(X) = ∫_0^1 x(4x³) dx = ∫_0^1 4x⁴ dx = [4x⁵/5]_0^1 = 4/5           (8)
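Both expectations can be reproduced numerically; a minimal sketch, not part of the original notes:

```python
import numpy as np
from scipy import integrate

# Example 3: discrete expectation E(X) = sum of x * f(x)
x = np.array([1, 2, 3, 4])
fx = np.array([0.4, 0.1, 0.3, 0.2])
print(x @ fx)                                  # 2.3

# Example 4: continuous expectation E(X) = integral of x * 4x^3 over (0, 1)
ex, _ = integrate.quad(lambda t: t * 4 * t**3, 0, 1)
print(ex)                                      # 0.8 = 4/5
```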

2.1 Variance, Standard Deviation, Moments, Skewness and Kurtosis

The variance and standard deviation measure the dispersion or spread of a probability distribution. The variance of a random variable Y is the expected value of the square of the deviation of Y from its mean:

    Var(Y) = E[(Y − E(Y))²]
           = Σ_y (y − μ_Y)² f(y),      if Y is discrete
           = ∫ (y − μ_Y)² f(y) dy,     if Y is continuous

The standard deviation is the square root of the variance. The mean of Y, E(Y), is also called the first moment of Y, and the expected value of the square of Y, E(Y²), is called the second moment of Y. In general, the expected value of Y^r is called the r-th moment of the random variable Y.

Example 5. Let X have the pdf

    f(x) = (x + 1)/2,   −1 < x < 1
    f(x) = 0,           elsewhere

Then the mean value of X is

    μ = ∫_{−1}^1 x f(x) dx = ∫_{−1}^1 x (x + 1)/2 dx = 1/3              (9)

while the variance of X is

    σ² = ∫_{−1}^1 x² f(x) dx − μ² = ∫_{−1}^1 x² (x + 1)/2 dx − (1/3)² = 2/9    (10)

Example 6. It is known that the series

    1/1² + 1/2² + 1/3² + ...                                            (11)

converges to π²/6. Then

    f(x) = 6/(π² x²),   x = 1, 2, 3, ...                                (12)
    f(x) = 0,           elsewhere

is the pdf of a discrete type of random variable.

The skewness of a random variable X, denoted skew(X), measures the symmetry of a distribution about its mean. The skewness is defined as:

    skew(X) = E[(X − μ_X)³] / σ_X³
            = Σ_{x∈S_X} (x − μ_X)³ Pr(X = x) / σ_X³,   for a discrete random variable
            = ∫ (x − μ_X)³ f(x) dx / σ_X³,             for a continuous random variable

If the random variable X has a symmetric distribution then skew(X) = 0. If skew(X) > 0 then the distribution of X has a long right tail (positive values are more likely than negative ones), and if skew(X) < 0 the distribution of X has a long left tail.

The kurtosis of a random variable X, denoted kurt(X), measures the thickness of the tails of a distribution. The kurtosis is defined as:

    kurt(X) = E[(X − μ_X)⁴] / σ_X⁴
            = Σ_{x∈S_X} (x − μ_X)⁴ Pr(X = x) / σ_X⁴,   for a discrete random variable
            = ∫ (x − μ_X)⁴ f(x) dx / σ_X⁴,             for a continuous random variable

The normal distribution has a kurtosis of 3. If a distribution has a kurtosis greater than 3 then it has thicker tails than the normal distribution, and if a distribution has a kurtosis less than 3 then it has thinner tails than the normal distribution.
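These moments are easy to estimate from simulated draws; a minimal sketch using scipy with an arbitrary seed, not part of the original notes:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
z = rng.standard_normal(1_000_000)

print(stats.skew(z))                        # ~0: the normal is symmetric
print(stats.kurtosis(z, fisher=False))      # ~3 (fisher=False gives raw kurtosis)

y = np.exp(z)                               # log-normal: skewed, fat-tailed
print(stats.skew(y) > 0)                    # True: long right tail
print(stats.kurtosis(y, fisher=False) > 3)  # True: thicker tails than the normal
```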

2.2 Mean and Variance of a Linear Function of a Random Variable

Suppose that after-tax earnings Y are related to pre-tax earnings X by the equation:

    Y = 2,000 + 0.8X                                                    (13)

where $2,000 is the amount of a grant. Suppose an individual's pre-tax earnings next year are a random variable with mean μ_X and variance σ_X². Since pre-tax earnings are random, so are after-tax earnings. The mean and variance of Y are:

    μ_Y = 2,000 + 0.8 μ_X
    σ_Y² = 0.8² σ_X²                                                    (14)

In general, if Y depends on X with an intercept a and a slope b, so that:

    Y = a + bX                                                          (15)

then the mean and variance of Y are:

    μ_Y = a + b μ_X
    σ_Y² = b² σ_X²                                                      (16)
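A simulation check of (16), with hypothetical values for the earnings distribution; a minimal sketch, not part of the original notes:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(30_000, 5_000, 1_000_000)   # hypothetical pre-tax earnings X

a, b = 2_000, 0.8
y = a + b * x                              # after-tax earnings Y = 2,000 + 0.8X

print(y.mean(), a + b * x.mean())          # mu_Y = a + b * mu_X
print(y.var(), b**2 * x.var())             # sigma_Y^2 = b^2 * sigma_X^2
```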

3 Two Random Variables

The joint probability distribution of two discrete random variables, X and Y, is the probability that the random variables simultaneously take on certain values, x and y. Consider the example of a joint distribution of two variables in Table 2. Let Y be a binary random variable that equals one if the commute is short (less than 20 minutes) and zero otherwise, and let X be a binary random variable that equals zero if it is raining and one if not. The joint distribution is the frequency with which each of these four outcomes occurs over many repeated commutes.

Table 2: Joint Distribution of Weather Conditions and Commuting Times

                           Rain (X=0)   No Rain (X=1)   Total
    Long Commute (Y=0)     0.15         0.07            0.22
    Short Commute (Y=1)    0.15         0.63            0.78
    Total                  0.30         0.70            1.00

Formally, the joint density function for two random variables X and Y, denoted f(x, y), is defined so that:

    Pr(a ≤ X ≤ b, c ≤ Y ≤ d) = Σ_{a≤x≤b} Σ_{c≤y≤d} f(x, y)              (17)

if X and Y are discrete, and

    Pr(a ≤ X ≤ b, c ≤ Y ≤ d) = ∫_a^b ∫_c^d f(x, y) dy dx                (18)

if X and Y are continuous.

The marginal probability distribution of a random variable Y is just another name for its probability distribution. The term is used to distinguish the distribution of Y alone (the marginal distribution) from the joint distribution of Y and another random variable. The marginal distribution of Y can be computed from the joint distribution of X and Y by adding up the probabilities of all possible outcomes for which Y takes on a specified value. For example, in Table 2, the probability of a long rainy commute is 15% and the probability of a long commute with no rain is 7%, so the probability of a long commute (rainy or not) is 22%. Formally, to obtain the marginal distribution from the joint density, it is necessary to sum or integrate out the other variable:

    f_X(x) = Σ_{y∈S_Y} f(x, y),   in the discrete case
    f_X(x) = ∫ f(x, s) ds,        in the continuous case

and similarly for f_Y(y).
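In code, marginals are just row and column sums of the joint table; a minimal sketch using the numbers from Table 2, not part of the original notes:

```python
import numpy as np

# Joint distribution from Table 2: rows = Y (commute), columns = X (rain)
joint = np.array([[0.15, 0.07],    # Y=0: long commute
                  [0.15, 0.63]])   # Y=1: short commute

marginal_y = joint.sum(axis=1)     # sum out X: [0.22, 0.78]
marginal_x = joint.sum(axis=0)     # sum out Y: [0.30, 0.70]
print(marginal_y, marginal_x)
```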

Example 7. Let X1 and X2 have the joint pdf

    f(x1, x2) = x1 + x2,   0 < x1 < 1, 0 < x2 < 1                       (19)
    f(x1, x2) = 0,         elsewhere                                    (20)

The marginal pdf of X1 is

    f1(x1) = ∫_0^1 (x1 + x2) dx2 = x1 + 1/2,   0 < x1 < 1               (21)

zero elsewhere, and the marginal pdf of X2 is

    f2(x2) = ∫_0^1 (x1 + x2) dx1 = x2 + 1/2,   0 < x2 < 1               (22)

zero elsewhere. A probability like Pr(X1 ≤ 1/2) can be computed from either f1(x1) or f(x1, x2), because

    ∫_0^{1/2} ∫_0^1 f(x1, x2) dx2 dx1 = ∫_0^{1/2} f1(x1) dx1 = 3/8      (23)

However, to find a probability like Pr(X1 + X2 ≤ 1), one must use the joint pdf.

3.1 Conditional Distributions

The distribution of a random variable Y conditional on another random variable X taking on a specific value is called the conditional distribution of Y given X. What is the probability of a long commute (Y = 0) if you know it is raining (X = 0)? From Table 2, the joint probability of a rainy short commute is 15% and the joint probability of a rainy long commute is 15%, so if it is raining, a long commute and a short commute are equally likely. Thus the probability of a long commute (Y = 0), conditional on it being rainy (X = 0), is 50%, i.e. Pr(Y = 0|X = 0) = 0.50. In general, the conditional distribution of Y given X = x is

    Pr(Y = y|X = x) = Pr(X = x, Y = y) / Pr(X = x)                      (24)

or

    f(y|x) = f(x, y) / f_X(x)                                           (25)

3.2 Conditional Expectation

The conditional expectation of Y given X, also called the conditional mean of Y given X, is the mean of the conditional distribution of Y given X. That is, the conditional expectation is the expected value of Y, computed using the conditional distribution of Y given X. If Y takes on values y1, ..., yk, then the conditional mean of Y given X = x is

    E(Y|X = x) = Σ_{i=1}^k yi Pr(Y = yi|X = x)                          (26)

or

    E(Y|x) = Σ_y y f(y|x),     if Y is discrete                         (27)
    E(Y|x) = ∫ y f(y|x) dy,    if Y is continuous                       (28)

Consider the example in Table 3. The expected number of computer crashes, given that the computer is old, is E(M|A = 0) = 0×0.70 + 1×0.13 + 2×0.10 + 3×0.05 + 4×0.02 = 0.56. The expected number of computer crashes, given that the computer is new, is E(M|A = 1) = 0.14.

Table 3: Joint and Conditional Distributions of Computer Crashes (M) and Computer Age (A)

A. Joint Distribution

                           M=0    M=1     M=2    M=3     M=4    Total
    Old computer (A=0)     0.35   0.065   0.05   0.025   0.01   0.5
    New computer (A=1)     0.45   0.035   0.01   0.005   0.00   0.5
    Total                  0.80   0.10    0.06   0.03    0.01   1.0

B. Conditional Distributions of M given A

                  M=0    M=1    M=2    M=3    M=4    Total
    Pr(M|A=0)     0.70   0.13   0.10   0.05   0.02   1.0
    Pr(M|A=1)     0.90   0.07   0.02   0.01   0.00   1.0

3.3 The Law of Iterated Expectations

The mean of Y is the weighted average of the conditional expectations of Y given X, weighted by the probability distribution of X. For example, the mean number of crashes M is the weighted average of the conditional expectation of M given that the computer is old and the conditional expectation of M given that it is new, so E(M) = E(M|A = 0) Pr(A = 0) + E(M|A = 1) Pr(A = 1) = 0.56×0.50 + 0.14×0.50 = 0.35. This is the mean of the marginal distribution calculated before (Table 1).

Formally, the expectation of Y is the expectation of the conditional expectation of Y given X, that is:

    E(Y) = E_X[E(Y|X)]                                                  (29)

where the notation E_X[·] indicates the expectation over the values of X.
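The conditional means and the law of iterated expectations can be verified directly from Table 3; a minimal sketch, not part of the original notes:

```python
import numpy as np

m = np.array([0, 1, 2, 3, 4])
# Joint distribution from Table 3: rows = computer age A, columns = crashes M
joint = np.array([[0.35, 0.065, 0.05, 0.025, 0.01],   # A=0 (old)
                  [0.45, 0.035, 0.01, 0.005, 0.00]])  # A=1 (new)

pr_a = joint.sum(axis=1)          # Pr(A=0) = Pr(A=1) = 0.5
cond = joint / pr_a[:, None]      # Pr(M|A), panel B of Table 3
e_m_given_a = cond @ m            # E(M|A=0) = 0.56, E(M|A=1) = 0.14
print(e_m_given_a)

# Law of iterated expectations: E(M) = sum over a of E(M|A=a) Pr(A=a)
print(e_m_given_a @ pr_a)         # 0.35, the marginal mean from Table 1
```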

3.4 Conditional Variance

The variance of Y conditional on X is the variance of the conditional distribution of Y given X:

    Var(Y|x) = E[(Y − E[Y|x])² | x]
             = ∫ (y − E[Y|x])² f(y|x) dy,   if Y is continuous

and

    Var(Y|x) = Σ_y (y − E[Y|x])² f(y|x),    if Y is discrete            (30)

3.5 Decomposition of Variance

In a joint distribution,

    Var(Y) = Var_X[E(Y|X)] + E_X[Var(Y|X)]                              (31)

where the notation Var_X[·] indicates the variance over the distribution of X.

3.6 Independence

Two random variables X and Y are independently distributed, or independent, if knowing the value of one of the variables provides no information about the other. In other words, X and Y are independent if the conditional distribution of Y given X equals the marginal distribution of Y:

    Pr(Y = y|X = x) = Pr(Y = y)   (independence of X and Y)             (32)

or

    f(y|x) = f(y)                                                       (33)

Substitute equation (33) into equation (25) and one can see that the joint distribution of two independent random variables is the product of their marginal distributions:

    f(x, y) = f(x) f(y)                                                 (34)

3.7 Covariance and Correlation

Covariance is a measure of the extent to which two random variables move together. The covariance between X and Y is the expected value E[(X − μ_X)(Y − μ_Y)], where μ_X is the mean of X and μ_Y is the mean of Y:

    Cov(X, Y) = Σ_x Σ_y (x − μ_X)(y − μ_Y) f(x, y),      if X and Y are discrete
    Cov(X, Y) = ∫ ∫ (x − μ_X)(y − μ_Y) f(x, y) dy dx,    if X and Y are continuous

To interpret these formulas, suppose that when X is greater than its mean (so that X − μ_X is positive), then Y tends to be greater than its mean (so that Y − μ_Y is positive), and that when X is less than its mean (so that X − μ_X < 0), then Y tends to be less than its mean (so that Y − μ_Y < 0). In both cases the product (X − μ_X)(Y − μ_Y) > 0, so the covariance is positive and we know that X and Y tend to move in the same direction. When the covariance is negative, X and Y tend to move in opposite directions. If the random variables X and Y are independent then, regardless of their joint distribution, σ_XY = 0. Note that the converse is not always true.

Properties of covariance:

1. Cov(X, X) = Var(X)
2. Cov(X, Y) = Cov(Y, X)
3. Cov(aX, bY) = ab Cov(X, Y)
4. In any bivariate distribution, Cov(X, Y) = Cov(X, E[Y|X])
5. If X and Y are independent then Cov(X, Y) = 0 (no linear association). However, if Cov(X, Y) = 0, then X and Y are not necessarily independent.
6. If X and Y are jointly normally distributed and Cov(X, Y) = 0, then X and Y are independent.

3.8 Correlation

The correlation is an alternative measure of dependence between X and Y. The correlation between X and Y is the covariance between X and Y divided by the product of their standard deviations:

    Corr(X, Y) = ρ_XY = Cov(X, Y) / √(Var(X) Var(Y)) = σ_XY / (σ_X σ_Y)    (35)

The random variables X and Y are said to be uncorrelated if Corr(X, Y) = 0. The properties of Corr(X, Y) are:

1. −1 ≤ ρ_XY ≤ 1
2. If ρ_XY = −1 then X and Y are perfectly negatively linearly related. That is, Y = aX + b, where a < 0.
3. If ρ_XY = 1 then X and Y are perfectly positively linearly related. That is, Y = aX + b, where a > 0.
4. If ρ_XY = 0 then X and Y are not linearly related, but they may be nonlinearly related.
5. Corr(aX, bY) = Corr(X, Y) if a > 0 and b > 0; Corr(aX, bY) = −Corr(X, Y) if a > 0, b < 0 or a < 0, b > 0.
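A simulation illustrating covariance, correlation, and the fact that zero correlation does not imply independence; a minimal sketch with arbitrary parameters, not part of the original notes:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.standard_normal(1_000_000)
y = 2.0 * x + rng.standard_normal(1_000_000)   # Y moves with X, plus noise

print(np.cov(x, y)[0, 1])        # ~2 = cov(X, 2X + e)
print(np.corrcoef(x, y)[0, 1])   # ~0.894 = 2 / sqrt(5)

z = x**2                         # Z is a deterministic function of X ...
print(np.corrcoef(x, z)[0, 1])   # ... yet corr(X, Z) ~ 0: no *linear* relation
```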

3.9 The Mean and Variance of Sums of Random Variables

Let X, Y, and V be random variables, let μ_X and σ_X² be the mean and variance of X, let σ_XY be the covariance between X and Y (and so forth for the other variables), and let a, b, and c be constants. The following facts follow from the definitions of the mean, variance, and covariance:

    E(a + bX + cY) = a + b μ_X + c μ_Y                                  (36)
    Var(a + bY) = b² σ_Y²
    Var(aX + bY) = a² σ_X² + 2ab σ_XY + b² σ_Y²                         (37)
    E(Y²) = σ_Y² + μ_Y²
    Cov(a + bX + cV, Y) = b σ_XY + c σ_VY
    E(XY) = σ_XY + μ_X μ_Y
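Formula (37) for Var(aX + bY) can be verified by simulation; a minimal sketch with arbitrary parameters, not part of the original notes:

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(1.0, 2.0, 1_000_000)
y = 0.5 * x + rng.normal(0.0, 1.0, 1_000_000)   # correlated with X
a, b = 3.0, -2.0

lhs = np.var(a * x + b * y)
rhs = a**2 * np.var(x) + 2 * a * b * np.cov(x, y)[0, 1] + b**2 * np.var(y)
print(lhs, rhs)    # agree up to sampling error, as in (37)
```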

4 The Normal, Chi-Squared, F_{m,∞}, and Student t Distributions

4.1 The Normal Distribution

A continuous random variable with a normal distribution has the familiar bell-shaped density. The general form of the normal distribution with mean μ_X and standard deviation σ_X is:

    f(x | μ_X, σ_X²) = (1 / (σ_X √(2π))) exp(−(x − μ_X)² / (2σ_X²))     (38)

This is usually denoted X ~ N(μ_X, σ_X²). The normal density is symmetric around its mean μ_X and has 95% of its probability between μ_X − 1.96σ_X and μ_X + 1.96σ_X.

Using numerical approximations, it can be shown that:

    Pr(μ_X − σ_X < X < μ_X + σ_X) ≈ 0.67
    Pr(μ_X − 2σ_X < X < μ_X + 2σ_X) ≈ 0.95
    Pr(μ_X − 3σ_X < X < μ_X + 3σ_X) ≈ 0.99

The standard normal distribution is the normal distribution with mean μ = 0 and variance σ² = 1, denoted N(0, 1), with density:

    φ(z) = (1/√(2π)) exp(−z²/2)                                         (39)

The specific notation φ(z) is often used for this density and Φ(z) for its cdf.
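The 0.67/0.95/0.99 approximations can be confirmed with the normal cdf; a minimal sketch using scipy, not part of the original notes:

```python
from scipy import stats

# Pr(mu - 1.96 sigma < X < mu + 1.96 sigma) ~ 0.95 for any normal
print(stats.norm.cdf(1.96) - stats.norm.cdf(-1.96))   # ~0.950

for k in (1, 2, 3):   # 1, 2, 3 standard deviations around the mean
    print(k, stats.norm.cdf(k) - stats.norm.cdf(-k))  # 0.683, 0.954, 0.997
```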

Figure 2: The pdf of the normal distribution.

4.2 The Log-Normal Distribution

A random variable Y is said to be log-normally distributed with parameters μ_Y and σ_Y² if

    ln Y ~ N(μ_Y, σ_Y²)                                                 (40)

Equivalently, let X ~ N(μ_X, σ_X²) and define Y = exp(X). Then Y is log-normally distributed, denoted Y ~ ln N(μ_Y, σ_Y²), and it can be shown that:

    μ_Y = E[Y] = exp(μ_X + σ_X²/2)
    σ_Y² = Var(Y) = exp(2μ_X + σ_X²)(exp(σ_X²) − 1)
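The log-normal mean and variance formulas can be checked by simulation; a minimal sketch with arbitrary parameter values, not part of the original notes:

```python
import numpy as np

rng = np.random.default_rng(5)
mu_x, sig_x = 0.1, 0.4
x = rng.normal(mu_x, sig_x, 1_000_000)
y = np.exp(x)                                  # Y = exp(X) is log-normal

print(y.mean(), np.exp(mu_x + sig_x**2 / 2))   # E[Y] = exp(mu_X + sigma_X^2 / 2)
print(y.var(),
      np.exp(2 * mu_x + sig_x**2) * (np.exp(sig_x**2) - 1))
```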

4.3 The Multivariate Normal Distribution

The normal distribution can be generalized to describe the joint distribution of a set of random variables. In this case the distribution is called the multivariate normal distribution or, if only two variables are being considered, the bivariate normal distribution. The multivariate normal distribution has three important properties:

1. If X and Y have a bivariate normal distribution with covariance σ_XY, and if a and b are two constants, then

    aX + bY ~ N(a μ_X + b μ_Y, a² σ_X² + b² σ_Y² + 2ab σ_XY)            (41)

In general, if n random variables have a multivariate normal distribution, then any linear combination of these variables results in a random variable that is normally distributed. A simulation check appears after this list.

2. If a set of variables has a multivariate normal distribution, then the marginal distribution of each of the variables is normal.

3. If variables with a multivariate normal distribution have covariances that equal zero, then the variables are independent. Therefore, if X and Y have a bivariate normal distribution and σ_XY = 0, then X and Y are independent.
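The simulation check of property 1 referenced above; a minimal sketch with arbitrary means and covariances, not part of the original notes:

```python
import numpy as np

rng = np.random.default_rng(10)
mean = [1.0, 2.0]
cov = [[1.0, 0.5],
       [0.5, 2.0]]                        # sigma_XY = 0.5
xy = rng.multivariate_normal(mean, cov, 1_000_000)

a, b = 2.0, 3.0
s = a * xy[:, 0] + b * xy[:, 1]           # linear combination aX + bY

print(s.mean())   # ~8  = a*mu_X + b*mu_Y
print(s.var())    # ~28 = a^2 sigma_X^2 + b^2 sigma_Y^2 + 2ab sigma_XY
```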

4.4 The Chi-Squared and F_{m,∞} Distributions

The chi-squared and F_{m,∞} distributions are used when testing certain types of hypotheses in statistics and econometrics.

The chi-squared distribution is the distribution of the sum of m squared independent standard normal random variables. The distribution depends on m, which is called the degrees of freedom. For example, let Z1, Z2, Z3, and Z4 be independent standard normal random variables. Then Z1² + Z2² + Z3² + Z4² has a chi-squared distribution with 4 degrees of freedom, denoted χ²(4).

Figure 3: The pdf of the bivariate standard normal distribution.

The pdf of the chi-squared distribution with r degrees of freedom has the following form:

    f(x) = (1 / (Γ(r/2) 2^{r/2})) x^{r/2 − 1} e^{−x/2},   0 < x < ∞     (42)

Figure 4: The pdfs of chi-squared distributions with 2, 4, and 6 degrees of freedom (means 2, 4, 6; variances 4, 8, 16).

The F_{m,∞} distribution is the distribution of a random variable with a chi-squared distribution with m degrees of freedom, divided by m. Continuing with the previous example, (Z1² + Z2² + Z3² + Z4²)/4 has an F_{4,∞} distribution.
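A simulation of the chi-squared example above; a minimal sketch, not part of the original notes:

```python
import numpy as np

rng = np.random.default_rng(6)
z = rng.standard_normal((1_000_000, 4))
w = (z**2).sum(axis=1)      # Z1^2 + ... + Z4^2 ~ chi-squared(4)

print(w.mean(), w.var())    # a chi-squared(m) has mean m, variance 2m: ~4, ~8
print((w / 4).mean())       # w/4 ~ F(4, inf); its mean is ~1
```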

4.5 The Student t Distribution

The Student t distribution with m degrees of freedom is defined to be the distribution of the ratio of a standard normal random variable to the square root of an independently distributed chi-squared random variable with m degrees of freedom divided by m. Let Z be a standard normal random variable, i.e. Z ~ N(0, 1), let W be a random variable with a chi-squared distribution with m degrees of freedom, i.e. W ~ χ²(m), and let Z and W be independently distributed. Then the random variable Z/√(W/m) has a Student t distribution with m degrees of freedom, denoted t_m.
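The defining ratio can be simulated and compared against scipy's t distribution; a minimal sketch with m = 5 chosen arbitrarily, not part of the original notes:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
m = 5
z = rng.standard_normal(1_000_000)      # Z ~ N(0, 1)
w = rng.chisquare(m, 1_000_000)         # W ~ chi-squared(m), independent of Z
t = z / np.sqrt(w / m)                  # Z / sqrt(W/m) ~ t_m

print((t > 2.0).mean())                 # empirical tail probability
print(1 - stats.t.cdf(2.0, df=m))       # ~0.051, the exact t_5 tail
```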

5 Random Sampling and the Distribution of the Sample Average

Suppose that you have selected 20 UNCC students and measured their height in order to learn something about the average height of students at UNCC. Simple random sampling is a situation in which n objects are selected at random from a population (here, the population of all UNCC students) and each member of the population is equally likely to be included. The n observations in the sample are denoted Y1, ..., Yn, where Y1 is the first observation, Y2 is the second observation, and so forth. Because the members of the population included in the sample are selected at random, the values of the observations Y1, ..., Yn are themselves random. If different members of the population are chosen, their values of Y will differ.

5.1 i.i.d. Draws

Because Y1, ..., Yn are randomly drawn from the same population, the marginal distribution of Yi is the same for each i = 1, ..., n. When Yi has the same marginal distribution for i = 1, ..., n, then Y1, ..., Yn are said to be identically distributed. When Y1, ..., Yn are drawn from the same distribution and are independently distributed, they are said to be independently and identically distributed, or i.i.d.

5.2 The Sampling Distribution of the Sample Average

The sample average, Ȳ, of the n observations Y1, ..., Yn is

    Ȳ = (1/n) Σ_{i=1}^n Yi                                              (43)

An essential concept is that the act of random sampling makes the sample average Ȳ a random variable: because the sample Y1, ..., Yn is random, its average is random, i.e. Ȳ depends on the sample that is realized. Because Ȳ is random, it has a probability distribution. The distribution of Ȳ is called the sampling distribution of Ȳ.
Suppose that the observations Y1, ..., Yn are i.i.d., and let μ_Y and σ_Y² denote the mean and variance of Yi, i = 1, ..., n. Applying formula (36), the mean of the sample average is:

    E(Ȳ) = E((1/n) Σ_{i=1}^n Yi) = (1/n) E(Y1 + ... + Yn) = μ_Y         (44)

Applying formula (37), the variance of the sample mean is:

    Var(Ȳ) = Var((1/n) Σ_{i=1}^n Yi) = (1/n²) Var(Y1 + ... + Yn) = σ_Y²/n    (45)

In summary, the mean, variance, and standard deviation of Ȳ are:

    E(Ȳ) = μ_Y
    Var(Ȳ) = σ_Y²/n
    std.dev(Ȳ) = σ_Y/√n

Suppose that Y1, ..., Yn are i.i.d. draws from an N(μ_Y, σ_Y²) distribution. Using the property of the multivariate normal distribution in (41), the sum of n normally distributed random variables is itself normally distributed. Therefore the sample average is normally distributed with mean μ_Y and variance σ_Y²/n, i.e. Ȳ ~ N(μ_Y, σ_Y²/n).

There are two approaches to characterizing sampling distributions: an exact approach and an approximate approach. The exact approach entails deriving a formula for the sampling distribution that holds exactly for any value of n. The sampling distribution that describes the distribution of Ȳ for a given n is called the exact distribution or finite-sample distribution of Ȳ.

The approximate approach uses approximations to the sampling distribution that rely on the sample size being large. The large-sample approximation to the sampling distribution is often called the asymptotic distribution. The term asymptotic refers to the fact that the approximations become exact in the limit n → ∞. There are two key tools used to approximate sampling distributions when the sample size is large:

1. The law of large numbers
2. The central limit theorem

5.3 The Law of Large Numbers and Consistency

The law of large numbers states that Ȳ will be near μ_Y with very high probability when n is large. The property that Ȳ is near μ_Y with probability approaching one as n increases is called convergence in probability, or consistency, written Ȳ →p μ_Y.

The law of large numbers says that if Yi, i = 1, ..., n, are independently and identically distributed with E(Yi) = μ_Y and Var(Yi) = σ_Y² < ∞, then Ȳ →p μ_Y.

Formally, the random variable Zn converges in probability to a constant c if lim_{n→∞} Pr(|Zn − c| > ε) = 0 for any positive ε.
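The law of large numbers in action; a minimal sketch using Bernoulli draws with p = 0.78 (the short-commute probability from Table 2), not part of the original notes:

```python
import numpy as np

rng = np.random.default_rng(8)
for n in (10, 100, 10_000, 1_000_000):
    y_bar = rng.binomial(1, 0.78, n).mean()
    print(n, y_bar)    # the sample average settles near 0.78 as n grows
```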

5.4 The Central Limit Theorem

The central limit theorem says that, under general conditions, the distribution of Ȳ is well approximated by a normal distribution when n is large.

Central Limit Theorem. If Y1, ..., Yn are a random sample from a probability distribution with finite mean μ_Y and finite variance σ_Y², and Ȳ = (1/n) Σ_{i=1}^n Yi, then

    √n (Ȳ − μ_Y) →d N(0, σ_Y²)                                          (46)
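A simulation of (46) for Bernoulli(0.78) draws; a minimal sketch with arbitrary sample sizes, not part of the original notes:

```python
import numpy as np

rng = np.random.default_rng(9)
p = 0.78
mu, sig2 = p, p * (1 - p)                     # mean and variance of a Bernoulli(p)

n, reps = 100, 100_000
samples = rng.binomial(1, p, (reps, n))
z = np.sqrt(n) * (samples.mean(axis=1) - mu)  # sqrt(n)(Ybar - mu), one per sample

print(z.mean(), z.var())   # ~0 and ~0.1716 = sigma^2, matching N(0, sigma^2)
```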
