References

Hogg, R. V., and A. T. Craig, 1995. Introduction to Mathematical Statistics. Prentice-Hall.
Stock, J. H., and M. W. Watson, 2003. Introduction to Econometrics. Addison-Wesley. (Chapter 2; advanced undergraduate level book)
Zivot, E., 2002. Lecture Notes on Applied Econometric Modeling in Finance. http://faculty.washington.edu/ezivot/econ483/483notes.htm
Greene, W. H., 2000. Econometric Analysis. Prentice Hall. (Chapters 3, 4; introductory graduate level book)
We view the observation on some aspect of the economy (or real life) as the outcome of a random experiment. The probability of an outcome is the proportion of the time that the outcome occurs in the long run. If the probability of your computer not crashing while you are doing a problem set is 90%, then over the course of doing many problem sets, you will complete 90% of them without a crash. The set of all possible outcomes is called the sample space, denoted S_X. An event is a subset of the sample space, i.e. an event is a set of one or more outcomes. A random variable is a numerical summary of a random outcome. Some random variables are discrete and some are continuous.

Definition A random variable X is a variable that can take on a given set of values, called the sample space and denoted S_X, where the likelihood of the values in S_X is determined by X's probability distribution (pdf).
1.1 Discrete Random Variables
The probability distribution of a discrete random variable, denoted f(x), is the list of all possible values of the variable and the probability that each value will occur, i.e. f(x) = Pr(X = x). The pdf must satisfy

(i) f(x) ≥ 0 for all x ∈ S_X;
(ii) f(x) = 0 for all x ∉ S_X;
(iii) Σ_{x∈S_X} f(x) = 1.

The cumulative probability distribution (cdf), denoted F, is the probability that the random variable is less than or equal to a particular value:

F(x) = Pr(X ≤ x)   (1)

The cdf has the following properties:

1. If x1 < x2 then F(x1) ≤ F(x2)
2. F(−∞) = 0 and F(∞) = 1
3. Pr(X > x) = 1 − F(x)
4. Pr(x1 < X ≤ x2) = F(x2) − F(x1)

Table 1: Probability of Your Computer Crashing M Times

Outcome (number of crashes)      0     1     2     3     4
Probability distribution        0.80  0.10  0.06  0.03  0.01
Cumulative prob. distribution   0.80  0.90  0.96  0.99  1.00
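The relationship between the pdf and cdf in Table 1 is easy to check mechanically: the cdf is the running sum of the pdf. A quick sketch in Python (the variable names are ours):

```python
from itertools import accumulate

# pdf of the number of crashes M, taken directly from Table 1
pdf = {0: 0.80, 1: 0.10, 2: 0.06, 3: 0.03, 4: 0.01}

# Property (iii): the probabilities sum to one.
assert abs(sum(pdf.values()) - 1.0) < 1e-12

# F(x) = Pr(M <= x) is the running sum of the pdf over outcomes up to x.
cdf = dict(zip(pdf, accumulate(pdf.values())))
assert abs(cdf[2] - 0.96) < 1e-9  # matches the last row of Table 1
```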
An important special case of a discrete random variable is a binary one, i.e. the outcomes are 0 and 1. A binary random variable is called a Bernoulli random variable and its probability distribution is called the Bernoulli distribution.
1.2 Continuous Random Variables
Unlike a discrete random variable, a continuous random variable takes on a continuum of possible values, and it is not possible to list the probability of each possible value of the random variable. The probability is summarized by the probability density function (pdf), denoted f(x). The area under the probability density function between any two points is the probability that the random variable falls between those two points. Formally, the pdf of a continuous random variable X is a nonnegative function f(x), defined on the real line, such that for any interval A:

Pr(X ∈ A) = ∫_A f(x) dx   (2)
That is, Pr(X ∈ A) is the area under the probability curve over the interval A. The pdf f(x) must also satisfy ∫_{−∞}^{∞} f(x) dx = 1.

Example 1 Let the random variable X of the continuous type have the pdf f(x) = 2/x³, 1 < x < ∞, zero elsewhere. The distribution function of X is

F(x) = ∫_{−∞}^{x} 0 dw = 0, x < 1
F(x) = ∫_1^x (2/w³) dw = 1 − 1/x², 1 ≤ x
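The closed form F(x) = 1 − 1/x² can be sanity-checked by integrating f(x) = 2/x³ numerically. A small sketch (the midpoint rule and the name F_numeric are our choices, not part of the notes):

```python
# Numerically integrate f(w) = 2/w**3 over [1, x] with the midpoint rule
# and compare against the closed form F(x) = 1 - 1/x**2.
def F_numeric(x, steps=100_000):
    h = (x - 1.0) / steps
    return sum(2.0 / (1.0 + (i + 0.5) * h) ** 3 for i in range(steps)) * h

for x in (2.0, 5.0, 10.0):
    assert abs(F_numeric(x) - (1.0 - 1.0 / x**2)) < 1e-6
```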
Figure 1: Pdf and cdf of the random variable in Example 1.

Example 2 Let f(x) = 1/2, −1 < x < 1, zero elsewhere, be the pdf of the random variable X. Define the random variable Y by Y = X². We wish to find the pdf of Y. If y ≥ 0, the probability Pr(Y ≤ y) is equivalent to

Pr(X² ≤ y) = Pr(−√y ≤ X ≤ √y)   (3)

Accordingly, the distribution function of Y, G(y) = Pr(Y ≤ y), is given by

G(y) = 0, y < 0
G(y) = ∫_{−√y}^{√y} (1/2) dx = √y, 0 ≤ y < 1
G(y) = 1, 1 ≤ y

Since Y is a random variable of the continuous type, the pdf of Y is g(y) = G′(y) at all points of continuity of g(y). Therefore,

g(y) = 1/(2√y), 0 < y < 1
     = 0, elsewhere
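The change-of-variable result can also be checked by simulation: draw X uniformly on (−1, 1), square it, and compare the empirical distribution of Y = X² with G(y) = √y. A sketch (sample size and tolerance are our choices):

```python
import random

random.seed(0)
n = 200_000
# X uniform on (-1, 1), i.e. pdf f(x) = 1/2 there; transform Y = X**2
ys = [random.uniform(-1.0, 1.0) ** 2 for _ in range(n)]

# Empirical G(y) = Pr(Y <= y) should be close to sqrt(y) on [0, 1).
for y in (0.25, 0.5, 0.81):
    G_hat = sum(v <= y for v in ys) / n
    assert abs(G_hat - y ** 0.5) < 0.01
```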
The expected value of a random variable Y, denoted E(Y), is the long-run average value of the random variable over many repeated trials or occurrences. The expected value of Y is also called the expectation of Y or the mean of Y.

The terminology of expectation or expected value has its origin in games of chance. This can be illustrated as follows: four similar chips, numbered 1, 1, 1, and 2, respectively, are placed in a bowl and are mixed. A player is blindfolded and is to draw a chip from the bowl. If she draws one of the three chips numbered 1, she will receive one dollar. If she draws the chip numbered 2, she will receive two dollars. It seems reasonable to assume that the player has a 3/4 claim on the $1 and a 1/4 claim on the $2. Her expectation is therefore

1 · (3/4) + 2 · (1/4) = 5/4 = $1.25
Suppose that the random variable Y takes on k possible outcomes, y1, y2, ..., yk, where y1 denotes the first value, y2 denotes the second value, etc., and the probability that Y takes on y1 is p1, the probability that Y takes on y2 is p2, and so forth. The expected value of Y is:

E(Y) = y1 p1 + y2 p2 + ... + yk pk = Σ_{i=1}^{k} yi pi   (4)

For a continuous random variable, the expected value is the integral

E[X] = ∫_a^b x f(x) dx   (5)
where f(x) is the probability density function and x takes on values between points a and b.

As an example, consider the number of computer crashes M with the probability distribution given in Table 1. The expected value of M is the average number of crashes over many problem sets, weighted by the frequency with which a crash of a given size occurs:

E(M) = 0 · 0.80 + 1 · 0.10 + 2 · 0.06 + 3 · 0.03 + 4 · 0.01 = 0.35   (6)
That is, the expected number of computer crashes while doing a particular problem set is 0.35. Obviously, the actual number of crashes is an integer. The calculation above just means that the average number of crashes over many problem sets is 0.35.

Example 3 Let the random variable X of the discrete type have the pdf given by the table

x       1      2      3      4
f(x)   4/10   1/10   3/10   2/10

Here f(x) = 0 if x is not equal to one of the first four positive integers. This illustrates the fact that there is no need to have a formula to describe a pdf. We have

E(X) = 1 · (4/10) + 2 · (1/10) + 3 · (3/10) + 4 · (2/10) = 2.3   (7)

Example 4 Let X have the pdf

f(x) = 4x³, 0 < x < 1
     = 0 elsewhere

Then

E(X) = ∫_0^1 x (4x³) dx = ∫_0^1 4x⁴ dx = [4x⁵/5]_0^1 = 4/5   (8)
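Both expectation computations are easy to reproduce: Example 3 is a probability-weighted sum and Example 4 a one-dimensional integral. A sketch (exact fractions for the discrete case, a midpoint rule for the integral; all names are ours):

```python
from fractions import Fraction

# Example 3: E(X) as the probability-weighted sum of outcomes.
pdf = {1: Fraction(4, 10), 2: Fraction(1, 10), 3: Fraction(3, 10), 4: Fraction(2, 10)}
EX_discrete = sum(x * p for x, p in pdf.items())
assert EX_discrete == Fraction(23, 10)  # i.e. 2.3

# Example 4: E(X) = integral of x * 4x^3 over (0, 1), midpoint rule.
def integrand(x):
    return x * 4 * x**3

steps = 100_000
h = 1.0 / steps
EX_cont = sum(integrand((i + 0.5) * h) for i in range(steps)) * h
assert abs(EX_cont - 4 / 5) < 1e-6
```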
2.1 Variance and Standard Deviation
The variance and standard deviation measure the dispersion or the spread of a probability distribution. The variance of a random variable Y is the expected value of the square of the deviation of Y from its mean:

Var(Y) = E[(Y − μ_Y)²]
       = Σ_y (y − μ_Y)² f(y)   (for discrete Y)
       = ∫ (y − μ_Y)² f(y) dy   (for continuous Y)
The standard deviation is the square root of the variance. The mean of Y, E(Y), is also called the first moment of Y, and the expected value of the square of Y, E(Y²), is also called the second moment of Y. In general, the expected value of Y^r is called the r-th moment of the random variable Y.

Example 5 Let X have the pdf

f(x) = (x + 1)/2, −1 < x < 1
     = 0 elsewhere

Then

μ = E(X) = ∫_{−1}^{1} x f(x) dx = ∫_{−1}^{1} x (x + 1)/2 dx = 1/3   (9)

σ² = ∫_{−1}^{1} x² f(x) dx − μ² = ∫_{−1}^{1} x² (x + 1)/2 dx − (1/3)² = 1/3 − 1/9 = 2/9   (10)
Similarly,

f(x) = …, x = 1, 2, 3, ...,   (12)
     = 0 elsewhere

is the pdf of a discrete type of random variable.

The skewness of a random variable X, denoted skew(X), measures the symmetry of a distribution about its mean. The skewness is defined:

skew(X) = E[(X − μ_X)³] / σ_X³
        = Σ_{x∈S_X} (x − μ_X)³ f(x) / σ_X³   (for discrete X)
        = ∫ (x − μ_X)³ f(x) dx / σ_X³   (for continuous X)
If the random variable X has a symmetric distribution then skew(X) = 0. If skew(X) > 0 then the distribution of X has a long right tail (positive values are more likely than negative ones) and if skew(X) < 0 the distribution of X has a long left tail. The kurtosis of a random variable X, denoted kurt(X), measures the thickness in the tails of a distribution. The kurtosis is defined:

kurt(X) = E[(X − μ_X)⁴] / σ_X⁴
        = Σ_{x∈S_X} (x − μ_X)⁴ f(x) / σ_X⁴   (for discrete X)
        = ∫ (x − μ_X)⁴ f(x) dx / σ_X⁴   (for continuous X)
The normal distribution has a kurtosis of 3. If a distribution has a kurtosis greater than 3 then the distribution has thicker tails than the normal distribution and if a distribution has kurtosis less than 3 then the distribution has thinner tails than the normal distribution.
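One way to make these definitions concrete is to estimate the sample skewness and kurtosis of simulated standard normal draws, which should come out near 0 and 3 respectively. A sketch (sample size and tolerances are our choices):

```python
import random

random.seed(1)
data = [random.gauss(0.0, 1.0) for _ in range(200_000)]

n = len(data)
mean = sum(data) / n
var = sum((x - mean) ** 2 for x in data) / n
sd = var ** 0.5
skew = sum((x - mean) ** 3 for x in data) / n / sd**3
kurt = sum((x - mean) ** 4 for x in data) / n / sd**4

# A (symmetric) normal sample: skewness near 0, kurtosis near 3.
assert abs(skew) < 0.05
assert abs(kurt - 3.0) < 0.1
```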
2.2 Linear Functions of a Random Variable
Suppose that after-tax earnings Y are related to pre-tax earnings X by the equation:
Y = 2,000 + 0.8X   (13)
where $2,000 is the amount of the grant. Suppose an individual's pre-tax earnings next year are a random variable with mean μ_X and variance σ_X². Since pre-tax earnings are random, so are after-tax earnings. The mean and variance of Y are:

μ_Y = 2,000 + 0.8 μ_X
σ_Y² = 0.8² σ_X²   (14)
In general, if Y depends on X with an intercept a and a slope b, so that:

Y = a + bX   (15)

then the mean and variance of Y are:

μ_Y = a + b μ_X
σ_Y² = b² σ_X²   (16)
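The identities μ_Y = a + bμ_X and σ_Y² = b²σ_X² also hold exactly for sample moments, which gives a quick way to check them. A sketch using the earnings example's constants (the N(50000, 10000²) distribution of pre-tax earnings is our invention for illustration):

```python
import random

random.seed(2)
a, b = 2_000.0, 0.8  # grant and retention rate from equation (13)

# Hypothetical pre-tax earnings (the distribution is an assumption)
X = [random.gauss(50_000.0, 10_000.0) for _ in range(100_000)]
Y = [a + b * x for x in X]

def mean(v):
    return sum(v) / len(v)

def var(v):
    mu = mean(v)
    return sum((x - mu) ** 2 for x in v) / len(v)

# mu_Y = a + b*mu_X and sigma_Y^2 = b^2*sigma_X^2, up to float round-off
assert abs(mean(Y) - (a + b * mean(X))) < 1e-6 * abs(mean(Y))
assert abs(var(Y) - b**2 * var(X)) < 1e-6 * var(Y)
```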
The joint probability distribution of two discrete random variables, X and Y, is the probability that the random variables simultaneously take on certain values, x and y. Consider an example of a joint distribution of two variables in Table 2. Let Y be a binary random variable that equals one if the commute is short (less than 20 minutes) and equals zero otherwise, and let X be a binary random variable that equals zero if it is raining and one if not. The joint distribution is the frequency with which each of these four outcomes occurs over many repeated commutes.

Table 2: Joint Distribution of Weather Conditions and Commuting Times

                       Rain (X=0)   No Rain (X=1)   Total
Long Commute (Y=0)       0.15           0.07         0.22
Short Commute (Y=1)      0.15           0.63         0.78
Total                    0.30           0.70         1.00
Formally, the joint density function for two random variables X and Y, denoted f(x, y), is defined so that:

Pr(a ≤ X ≤ b, c ≤ Y ≤ d) = Σ_{a≤x≤b} Σ_{c≤y≤d} f(x, y)   (17)

if X and Y are discrete, and

Pr(a ≤ X ≤ b, c ≤ Y ≤ d) = ∫_a^b ∫_c^d f(x, y) dy dx   (18)

if X and Y are continuous.
The marginal probability distribution of a random variable Y is just another name for its probability distribution. The term is used to distinguish the distribution of Y alone (the marginal distribution) from the joint distribution of Y and another random variable. The marginal distribution of Y can be computed from the joint distribution of X and Y by adding up the probabilities of all possible outcomes for which Y takes on a specified value. For example, in Table 2, the probability of a long rainy commute is 15% and the probability of a long commute with no rain is 7%, so the probability of a long commute (rainy or not) is 22%. Formally, to obtain the marginal distribution from the joint density, it is necessary to sum or integrate out the other variables:
f_X(x) = Σ_{y∈S_Y} f(x, y)   (19)

if Y is discrete, and

f_X(x) = ∫_y f(x, y) dy   (20)

if Y is continuous.

For example, let f(x1, x2) be a joint pdf that is positive on 0 < x1 < 1, 0 < x2 < 1 and zero elsewhere. Its marginals are

f1(x1) = ∫_0^1 f(x1, x2) dx2   (21)

f2(x2) = ∫_0^1 f(x1, x2) dx1   (22)

A probability like Pr(X1 ≤ 1/2) can be computed from either f1(x1) or f(x1, x2), because

Pr(X1 ≤ 1/2) = ∫_0^{1/2} f1(x1) dx1 = ∫_0^{1/2} ∫_0^1 f(x1, x2) dx2 dx1   (23)

However, to find a probability like Pr(X1 + X2 ≤ 1), one must use the joint pdf.
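For a discrete joint distribution such as Table 2, "integrating out" reduces to summing over the other variable. A sketch (the dictionary layout is our choice):

```python
# Joint distribution from Table 2: keys are (x, y) with
# X = rain indicator (0 = rain) and Y = commute length (0 = long).
joint = {(0, 0): 0.15, (0, 1): 0.15, (1, 0): 0.07, (1, 1): 0.63}

# Marginals: sum the joint probabilities over the other variable.
fx = {x: sum(p for (xx, y), p in joint.items() if xx == x) for x in (0, 1)}
fy = {y: sum(p for (x, yy), p in joint.items() if yy == y) for y in (0, 1)}

assert abs(fx[0] - 0.30) < 1e-12  # Pr(rain)
assert abs(fy[0] - 0.22) < 1e-12  # Pr(long commute), rainy or not
```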
3.1 Conditional Distributions
The distribution of a random variable Y conditional on another random variable X taking on a specific value is called the conditional distribution of Y given X. What is the probability of a long commute (Y = 0) if you know it is raining (X = 0)? From Table 2, the joint probability of a rainy short commute is 15% and the joint probability of a rainy long commute is 15%, so if it is raining a long commute and a short commute are equally likely. Thus the probability of a long commute (Y = 0), conditional on it being rainy (X = 0), is 50%, i.e. Pr(Y = 0|X = 0) = 0.50. In general, the conditional distribution of Y given X = x is

Pr(Y = y|X = x) = Pr(X = x, Y = y) / Pr(X = x)   (24)

or

f(y|x) = f(x, y) / f_X(x)   (25)
3.2 Conditional Expectation
The conditional expectation of Y given X, also called the conditional mean of Y given X, is the mean of the conditional distribution of Y given X. That is, the conditional expectation of Y given X is the expected value of Y, computed using the conditional distribution of Y given X. If Y takes on values y1, ..., yk, then the conditional mean of Y given X = x is

E(Y|X = x) = Σ_{i=1}^{k} yi Pr(Y = yi|X = x)   (26)

or

E(y|x) = Σ_y y f(y|x), if Y is discrete   (27)

E(y|x) = ∫ y f(y|x) dy, if Y is continuous   (28)
Consider the example in Table 3. The expected number of computer crashes, given that the computer is old, is E(M|A = 0) = 0 · 0.70 + 1 · 0.13 + 2 · 0.10 + 3 · 0.05 + 4 · 0.02 = 0.56. The expected number of computer crashes, given that the computer is new, is E(M|A = 1) = 0.14.

Table 3: Joint and Conditional Distributions of Computer Crashes (M) and Computer Age (A)

A. Joint Distribution
                      M=0    M=1    M=2    M=3    M=4    Total
Old computer (A=0)    0.35   0.065  0.05   0.025  0.01   0.5
New computer (A=1)    0.45   0.035  0.01   0.005  0.00   0.5
Total                 0.80   0.10   0.06   0.03   0.01   1.0

B. Conditional Distributions of M given A
              M=0    M=1    M=2    M=3    M=4    Total
Pr(M|A=0)     0.70   0.13   0.10   0.05   0.02   1.0
Pr(M|A=1)     0.90   0.07   0.02   0.01   0.00   1.0
3.3 The Law of Iterated Expectations
The mean of Y is the weighted average of the conditional expectations of Y given X, weighted by the probability distribution of X. For example, the mean number of crashes M is the weighted average of the conditional expectation of M given that the computer is old and the conditional expectation of M given that it is new: E(M) = E(M|A = 0) · Pr(A = 0) + E(M|A = 1) · Pr(A = 1) = 0.56 · 0.50 + 0.14 · 0.50 = 0.35. This is the mean of the marginal distribution calculated before (Table 1).
Formally, the expectation of Y is the expectation of the conditional expectation of Y given X, that is:

E(Y) = E_X[E(Y|X)]   (29)

where the notation E_X[·] indicates the expectation over the values of X.
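Both the conditional expectations of Table 3 and the law of iterated expectations can be verified directly from the joint distribution. A sketch:

```python
# Joint distribution of age A (0 = old, 1 = new) and crashes M from Table 3.
joint = {
    (0, 0): 0.35, (0, 1): 0.065, (0, 2): 0.05, (0, 3): 0.025, (0, 4): 0.01,
    (1, 0): 0.45, (1, 1): 0.035, (1, 2): 0.01, (1, 3): 0.005, (1, 4): 0.00,
}

# Marginal of A, then E(M|A = a) = sum_m m * Pr(M = m, A = a) / Pr(A = a)
pA = {a: sum(p for (aa, m), p in joint.items() if aa == a) for a in (0, 1)}
E_M_given_A = {
    a: sum(m * p for (aa, m), p in joint.items() if aa == a) / pA[a]
    for a in (0, 1)
}
assert abs(E_M_given_A[0] - 0.56) < 1e-9
assert abs(E_M_given_A[1] - 0.14) < 1e-9

# Law of iterated expectations: E(M) = E_A[E(M|A)]
EM = sum(E_M_given_A[a] * pA[a] for a in (0, 1))
assert abs(EM - 0.35) < 1e-9
```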
3.4 Conditional Variance
The conditional variance of Y given X is the variance of the conditional distribution of Y given X:

Var[Y|X = x] = E[(Y − E[Y|X = x])² | X = x]
             = E[Y²|X = x] − (E[Y|X = x])²   (30)
3.5 Decomposition of Variance
In a joint distribution,

Var[Y] = Var_X[E[Y|X]] + E_X[Var[Y|X]]   (31)

where the notation Var_X[·] indicates the variance over the distribution of X.
3.6 Independence
Two random variables X and Y are independently distributed, or independent, if knowing the value of one of the variables provides no information about the other. In other words, X and Y are independent if the conditional distribution of Y given X equals the marginal distribution of Y:
Pr(Y = y|X = x) = Pr(Y = y)   (independence of X and Y)   (32)

or

f(y|x) = f(y)   (33)

Substituting equation (33) into equation (25), one can see that the joint distribution of two independent random variables is the product of their marginal distributions:

f(x, y) = f(x) f(y)   (34)
3.7 Covariance
Covariance is a measure of the extent to which two random variables move together. The covariance between X and Y is the expected value E[(X − μ_X)(Y − μ_Y)], where μ_X is the mean of X and μ_Y is the mean of Y.
Cov(X, Y) = σ_XY = E[(X − μ_X)(Y − μ_Y)]
          = Σ_x Σ_y (x − μ_X)(y − μ_Y) f(x, y)   (for discrete X and Y)
          = ∫_x ∫_y (x − μ_X)(y − μ_Y) f(x, y) dx dy   (for continuous X and Y)
To interpret these formulas, suppose that when X is greater than its mean (so that X − μ_X is positive), then Y tends to be greater than its mean (so that Y − μ_Y is positive), and when X is less than its mean (so that X − μ_X < 0), then Y tends to be less than its mean (so that Y − μ_Y ≤ 0). In both cases the product (X − μ_X)(Y − μ_Y) tends to be positive, so the covariance is positive and we know that X and Y tend to move in the same direction. When the covariance is negative, X and Y tend to move in opposite directions. If the random variables X and Y are independent then, regardless of their joint distribution, σ_XY = 0. Note that the converse is not always true.

Properties of Covariance:
1. Cov(X, X) = Var(X)
2. Cov(X, Y) = Cov(Y, X)
3. Cov(aX, bY) = ab Cov(X, Y)
4. In any bivariate distribution, Cov(X, Y) = Cov(X, E[Y|X])
5. If X and Y are independent then Cov(X, Y) = 0 (no linear association). However, if Cov(X, Y) = 0 then X and Y are not necessarily independent.
6. If X and Y are jointly normally distributed and Cov(X, Y) = 0, then X and Y are independent.
3.8 Correlation
The correlation is an alternative measure of dependence between X and Y. The correlation between X and Y is the covariance between X and Y, divided by their standard deviations:

Corr(X, Y) = ρ_XY = Cov(X, Y) / √(Var(X) Var(Y)) = σ_XY / (σ_X σ_Y)   (35)
The random variables X and Y are said to be uncorrelated if Corr(X, Y) = 0. Properties of Corr(X, Y) are:

1. −1 ≤ ρ_XY ≤ 1
2. If ρ_XY = −1 then X and Y are perfectly negatively linearly related. That is, Y = aX + b, where a < 0.
3. If ρ_XY = 1 then X and Y are perfectly positively linearly related. That is, Y = aX + b, where a > 0.
4. If ρ_XY = 0 then X and Y are not linearly related but may be nonlinearly related.
5. Corr(aX, bY) = Corr(X, Y) if a > 0 and b > 0; Corr(aX, bY) = −Corr(X, Y) if a > 0, b < 0 or a < 0, b > 0.
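Covariance and correlation for the binary variables of Table 2 can be computed directly from the joint distribution; the positive sign says that "no rain" and "short commute" tend to occur together. A sketch:

```python
# Joint distribution from Table 2: X = 1 means no rain, Y = 1 means short commute.
joint = {(0, 0): 0.15, (0, 1): 0.15, (1, 0): 0.07, (1, 1): 0.63}

EX = sum(x * p for (x, y), p in joint.items())  # Pr(X = 1) = 0.70
EY = sum(y * p for (x, y), p in joint.items())  # Pr(Y = 1) = 0.78

cov = sum((x - EX) * (y - EY) * p for (x, y), p in joint.items())
varX = sum((x - EX) ** 2 * p for (x, y), p in joint.items())
varY = sum((y - EY) ** 2 * p for (x, y), p in joint.items())
corr = cov / (varX * varY) ** 0.5

assert abs(cov - 0.084) < 1e-9   # sigma_XY = E[XY] - EX*EY = 0.63 - 0.546
assert 0 < corr < 1              # positive, but well short of perfect
```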
3.9 The Mean and Variance of Sums of Random Variables
Let X, Y, and V be random variables, let μ_X and σ_X² be the mean and variance of X, let σ_XY be the covariance between X and Y (and so forth for the other variables), and let a, b, and c be constants. The following facts follow from the definitions of the mean, variance and covariance:

E(a + bX + cY) = a + b μ_X + c μ_Y   (36)
Var(a + bY) = b² σ_Y²
Var(aX + bY) = a² σ_X² + 2ab σ_XY + b² σ_Y²   (37)
E(Y²) = σ_Y² + μ_Y²
Cov(a + bX + cV, Y) = b σ_XY + c σ_VY
E(XY) = σ_XY + μ_X μ_Y
4 Special Distributions

4.1 The Normal Distribution

A continuous random variable with a normal distribution has the familiar bell-shaped form. The general form of a normal distribution with mean μ_X and standard deviation σ_X is:

f(x|μ_X, σ_X²) = (1 / (σ_X √(2π))) exp(−(x − μ_X)² / (2σ_X²))   (38)
This result is usually denoted X ∼ N(μ_X, σ_X²). The normal density is symmetric around its mean. Using numerical approximations, it can be shown that:

Pr(μ_X − σ_X < X < μ_X + σ_X) ≈ 0.67
Pr(μ_X − 2σ_X < X < μ_X + 2σ_X) ≈ 0.95
Pr(μ_X − 3σ_X < X < μ_X + 3σ_X) ≈ 0.99

The standard normal distribution is the normal distribution with mean μ = 0 and variance σ² = 1 and is denoted N(0, 1), with density:

φ(z) = (1 / √(2π)) exp(−z² / 2)   (39)

The specific notation φ(z) is often used for this density and Φ(z) for its cdf.
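The interval probabilities above can be reproduced from the standard normal cdf, Φ(z) = (1 + erf(z/√2))/2. A sketch (the loose tolerance reflects the rounding in the quoted approximations):

```python
import math

# Standard normal cdf via the error function
def Phi(z):
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# one-, two-, and three-sigma interval probabilities,
# compared against the rounded values quoted in the text
for k, approx in ((1, 0.67), (2, 0.95), (3, 0.99)):
    p = Phi(k) - Phi(-k)
    assert abs(p - approx) < 0.02
```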
4.2 The Log-Normal Distribution
A random variable Y is log-normally distributed if its logarithm is normally distributed:

ln Y ∼ N(μ_X, σ_X²)   (40)

Or, let X ∼ N(μ_X, σ_X²) and define Y = exp(X). Then Y is log-normally distributed, denoted Y ∼ ln N(μ_Y, σ_Y²), and it can be shown that:

μ_Y = E[Y] = exp(μ_X + σ_X²/2)
σ_Y² = Var(Y) = exp(2μ_X + σ_X²)(exp(σ_X²) − 1)
4.3 The Multivariate Normal Distribution
The normal distribution can be generalized to describe the joint distribution of a set of random variables. In this case, the distribution is called the multivariate normal distribution, or, if only two variables are being considered, the bivariate normal distribution. The multivariate normal distribution has three important properties:

1. If X and Y have a bivariate normal distribution with covariance σ_XY, and if a and b are two constants, then
aX + bY ∼ N(a μ_X + b μ_Y, a² σ_X² + 2ab σ_XY + b² σ_Y²)   (41)
In general, if n random variables have a multivariate normal distribution, then any linear combination of these variables results in a random variable that is normally distributed.

2. If a set of variables has a multivariate normal distribution, then the marginal distribution of each of the variables is normal.

3. If variables with a multivariate normal distribution have covariances that equal zero, then the variables are independent. Therefore, if X and Y have a bivariate normal distribution and σ_XY = 0, then X and Y are independent.
4.4 The Chi-Squared and F Distributions

The chi-squared and F_{m,∞} distributions are used when testing certain types of hypotheses in statistics and econometrics.

The chi-squared distribution is the distribution of the sum of m squared independent standard normal random variables. The distribution depends on m, which is called the degrees of freedom. For example, let Z1, Z2, Z3 and Z4 be independent standard normal random variables. Then Z1² + Z2² + Z3² + Z4² has a chi-squared distribution with 4 degrees of freedom, denoted χ²_4.
The pdf of the chi-squared distribution with r degrees of freedom has the following form:

f(x) = (1 / (Γ(r/2) 2^{r/2})) x^{r/2−1} e^{−x/2}, 0 < x < ∞   (42)
[Figure: pdfs of chi-squared distributions χ²(2) (μ = 2, σ² = 4), χ²(4) (μ = 4, σ² = 8), and χ²(6) (μ = 6, σ² = 16).]

Figure 4: The pdf of the bivariate standard normal distribution.

The F_{m,∞} distribution is the distribution of a random variable with a chi-squared distribution with m degrees of freedom, divided by m. Continuing with the previous example, (Z1² + Z2² + Z3² + Z4²)/4 has an F_{4,∞} distribution.
4.5 The Student t Distribution

The Student t distribution with m degrees of freedom is defined to be the distribution of the ratio of a standard normal random variable to the square root of an independently distributed chi-squared random variable with m degrees of freedom divided by m. Let Z be a standard normal random variable, i.e. Z ∼ N(0, 1), let W be a random variable with a chi-squared distribution with m degrees of freedom, i.e. W ∼ χ²_m, and let Z and W be independently distributed. Then the random variable Z/√(W/m) has a Student t distribution with m degrees of freedom, denoted t_m.
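The definition suggests a direct simulation: build t draws as Z/√(W/m) and check symmetry and the variance m/(m − 2) for m > 2 (a standard fact, not derived in these notes). A sketch (sample size and tolerances are our choices):

```python
import random

random.seed(3)
m, n = 10, 100_000

def t_draw():
    # Z standard normal; W chi-squared with m degrees of freedom
    z = random.gauss(0.0, 1.0)
    w = sum(random.gauss(0.0, 1.0) ** 2 for _ in range(m))
    return z / (w / m) ** 0.5

ts = [t_draw() for _ in range(n)]
mean = sum(ts) / n
var = sum((t - mean) ** 2 for t in ts) / n

assert abs(mean) < 0.02               # symmetric around zero
assert abs(var - m / (m - 2)) < 0.05  # Var(t_m) = m/(m-2) for m > 2
```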
Suppose that you have selected 20 UNCC students and measured their height in order to learn something about the average height of students at UNCC. Simple random sampling is a situation in which n objects are selected at random from a population (here, the population of all UNCC students) and each member of the population is equally likely to be included. The n observations in the sample are denoted Y1, ..., Yn, where Y1 is the first observation, Y2 is the second observation, and so forth. Because the members of the population included in the sample are selected at random, the values of the observations Y1, ..., Yn are themselves random. If different members of the population are chosen, their values of Y will differ.
5.1 i.i.d. Draws
Because Y1, ..., Yn are randomly drawn from the same population, the marginal distribution of Yi is the same for each i = 1, ..., n. When Yi has the same marginal distribution for i = 1, ..., n, then Y1, ..., Yn are said to be identically distributed. When Y1, ..., Yn are drawn from the same distribution and are independently distributed, they are said to be independently and identically distributed, or i.i.d.
5.2 The Sampling Distribution of the Sample Average

The sample average, denoted Ȳ, is

Ȳ = (1/n) Σ_{i=1}^{n} Yi   (43)

An essential concept is that the act of drawing a random sample has the effect of making the sample average a random variable. Because the sample Y1, ..., Yn is random, the average Ȳ is random, i.e. the average Ȳ depends on the sample that is realized. Because Ȳ is random, it has a probability distribution. The distribution of Ȳ is called the sampling distribution of Ȳ.
Suppose that the observations Y1, ..., Yn are i.i.d., and let μ_Y and σ_Y² denote the mean and variance of Yi, i = 1, ..., n. Apply formula (36) to find that the mean of the sample average is:

E(Ȳ) = E((1/n) Σ_{i=1}^{n} Yi) = (1/n) E(Y1 + ... + Yn) = μ_Y   (44)
Apply formula (37) to find the variance of the sample mean:

Var(Ȳ) = Var((1/n) Σ_{i=1}^{n} Yi) = (1/n²) Var(Y1 + ... + Yn) = σ_Y²/n   (45)

In summary, the mean, the variance and the standard deviation of Ȳ are:

E(Ȳ) = μ_Y
Var(Ȳ) = σ_Y²/n
std.dev(Ȳ) = σ_Y/√n
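Equations (44) and (45) are easy to see in a simulation: draw many samples of size n and look at the mean and variance of the resulting sample averages. A sketch (the N(1, 2²) population and the number of replications are our arbitrary choices):

```python
import random

random.seed(4)
mu, sigma, n = 1.0, 2.0, 25
reps = 40_000

# Many independent samples of size n; keep each sample's average.
ybars = [sum(random.gauss(mu, sigma) for _ in range(n)) / n for _ in range(reps)]

mean = sum(ybars) / reps
var = sum((y - mean) ** 2 for y in ybars) / reps

assert abs(mean - mu) < 0.02           # E(Ybar) = mu_Y
assert abs(var - sigma**2 / n) < 0.01  # Var(Ybar) = sigma_Y^2 / n
```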
Suppose that Y1, ..., Yn are i.i.d. draws from the N(μ_Y, σ_Y²) distribution. Using the property of the multivariate normal distribution in (41), the sum of n normally distributed random variables is itself normally distributed. Therefore, the sample average is normally distributed with mean μ_Y and variance σ_Y²/n, i.e. Ȳ ∼ N(μ_Y, σ_Y²/n).

There are two approaches to characterizing sampling distributions: an exact and an approximate approach. The exact approach entails deriving a formula for the sampling distribution that holds exactly for any value of n. The sampling distribution that describes the distribution of Ȳ for any n is called the exact distribution or finite-sample distribution of Ȳ.

The approximate approach uses approximations to the sampling distribution that rely on the sample size being large. The large-sample approximation to the sampling distribution is often called the asymptotic distribution. The term asymptotic refers to the fact that the approximations become exact in the limit n → ∞. There are two key tools used to approximate sampling distributions when the sample size is large:

1. The law of large numbers
2. The central limit theorem
5.3 The Law of Large Numbers

The law of large numbers states that Ȳ will be near μ_Y with very high probability when n is large. The property that Ȳ is near μ_Y with increasing probability as n increases is called convergence in probability or consistency, written Ȳ →p μ_Y.
The law of large numbers says that if Yi, i = 1, ..., n are independently and identically distributed with E(Yi) = μ_Y and Var(Yi) = σ_Y² < ∞, then Ȳ →p μ_Y.

Formally, the random variable Zn converges in probability to a constant c if lim_{n→∞} Pr(|Zn − c| > ε) = 0 for any positive ε.
5.4 The Central Limit Theorem

The central limit theorem says that, under general conditions, the distribution of Ȳ is well approximated by a normal distribution when n is large.

Central Limit Theorem. If Y1, ..., Yn are a random sample from a probability distribution with finite mean μ_Y and finite variance σ_Y², and Ȳ = (1/n) Σ_{i=1}^{n} Yi, then

√n (Ȳ − μ_Y) →d N(0, σ_Y²)   (46)
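A quick illustration of (46): averages of skewed Exp(1) draws, once standardized, should fall inside ±1.96 standard deviations about 95% of the time, just as a normal variable would. A sketch (the Exp(1) population and sample sizes are our choices):

```python
import random

random.seed(5)
n, reps = 100, 20_000
mu, sigma = 1.0, 1.0  # Exp(1) has mean 1 and variance 1, and is skewed

# Standardized sample averages sqrt(n) * (Ybar - mu)
zs = [
    n**0.5 * (sum(random.expovariate(1.0) for _ in range(n)) / n - mu)
    for _ in range(reps)
]

# Under the N(0, sigma^2) approximation, about 95% of the draws
# should fall within +/- 1.96 * sigma.
inside = sum(abs(z) <= 1.96 * sigma for z in zs) / reps
assert abs(inside - 0.95) < 0.02
```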