$y = b_0 + b_1 x$ is a linear equation in one variable or, equivalently, a straight line. $b_0$ and $b_1$ are constants: $b_0$ is the y-intercept and $b_1$ is the slope of the line; $y$ is the dependent variable, and $x$ is the independent variable.
The conditional mean of $Y$ given $X = x$:
\[ E[Y \mid X = x] = \sum_{y} y \, P(Y = y \mid X = x) \]
Example 2, cont'd

Table: Conditional distribution of Y (savings) given X (income), P(Y | X):

Y \ X (in 1000)    1.4     3.0     4.9     7.8     14.2
 0.45              0.112   0.149   0.125   0.110   0.200
 0.18              0.142   0.183   0.264   0.435   0.382
 0.05              0.440   0.377   0.329   0.277   0.297
-0.11              0.172   0.200   0.208   0.152   0.091
-0.25              0.134   0.091   0.074   0.026   0.030
Column sum         1       1       1       1       1

E[Y | X = x]       0.045   0.074   0.079   0.119   0.156

What is the relationship between the conditional mean and the income level?
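As a check, the conditional mean row can be recomputed directly from the table. A small numerical sketch (numpy assumed; the tiny discrepancies come from the rounded probabilities in the table):

```python
import numpy as np

# Rows: Y values (savings rate); columns: X values (income, in 1000).
# Probabilities are the conditional distribution P(Y | X) from the table.
y_vals = np.array([0.45, 0.18, 0.05, -0.11, -0.25])
x_vals = np.array([1.4, 3.0, 4.9, 7.8, 14.2])
p_y_given_x = np.array([
    [0.112, 0.149, 0.125, 0.110, 0.200],
    [0.142, 0.183, 0.264, 0.435, 0.382],
    [0.440, 0.377, 0.329, 0.277, 0.297],
    [0.172, 0.200, 0.208, 0.152, 0.091],
    [0.134, 0.091, 0.074, 0.026, 0.030],
])

# Each column sums to 1, as required for a conditional distribution.
assert np.allclose(p_y_given_x.sum(axis=0), 1.0)

# E[Y | X = x] = sum_y y * P(Y = y | X = x), one value per column.
cond_means = y_vals @ p_y_given_x
print(np.round(cond_means, 3))
```

The computed means increase with income, matching the E[Y | X = x] row of the table up to rounding.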
Empirical and Theoretical Relationships
Prediction: we look for a function $c(X)$ that predicts $Y$ well, in the sense that the mean squared prediction error
\[ \sum_{y \in Y} (y - c(X))^2 P(y) \]
is minimized.
Prediction: The best constant prediction

$\sum_{y \in Y} (y - c)^2 P(y)$ is minimized by $c = E[Y] = \mu_Y$.

In the case that we have bivariate data $(Y, X)$, say savings rate and income levels, we can make use of the relationship between them to predict $Y$.

So, we need to find $c_0$ and $c_1$ such that
\[ E[U^2] = \sum_{y \in Y} (y - c(X))^2 P(y) \]
is minimized.
Prediction: The best linear prediction

By replacing $c(X)$ by $c_0 + c_1 X$ and solving the minimization problem, we get
\[ c_0 = \beta_0 = E(Y) - \beta_1 E(X) = \mu_Y - \beta_1 \mu_X \]
\[ c_1 = \beta_1 = \frac{C(X, Y)}{V(X)} = \frac{\sigma_{XY}}{\sigma_X^2} \]

The function $\beta_0 + \beta_1 X$ is the linear projection (or the best linear prediction) of $Y$ given $X$:
\[ L(Y \mid X) = \beta_0 + \beta_1 X \]
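These formulas follow from the first-order conditions of the mean-squared-error minimization; a sketch of the standard derivation (not taken verbatim from the slides):

```latex
\begin{align*}
&\min_{c_0,\, c_1}\; Q(c_0, c_1) = E\!\left[(Y - c_0 - c_1 X)^2\right] \\
% FOC for c_0:
&\frac{\partial Q}{\partial c_0} = -2\, E[Y - c_0 - c_1 X] = 0
  \;\Longrightarrow\; c_0 = E[Y] - c_1 E[X] \\
% FOC for c_1:
&\frac{\partial Q}{\partial c_1} = -2\, E[X (Y - c_0 - c_1 X)] = 0
  \;\Longrightarrow\; E[XY] - c_0 E[X] - c_1 E[X^2] = 0 \\
% Substitute c_0 = E[Y] - c_1 E[X]:
&E[XY] - E[X] E[Y] = c_1 \left( E[X^2] - (E[X])^2 \right)
  \;\Longrightarrow\; c_1 = \frac{C(X, Y)}{V(X)}
\end{align*}
```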
Example 2, cont'd

For $E[XY]$:

XY      P(XY)     XY      P(XY)     XY      P(XY)     XY      P(XY)
-3.55   0.005    -0.75    0.016     0.07    0.059     0.54    0.032
-1.95   0.008    -0.54    0.045     0.15    0.066     0.63    0.015
-1.56   0.015    -0.35    0.018     0.25    0.071     0.71    0.049
-1.23   0.016    -0.33    0.035     0.25    0.019     0.88    0.057
-0.86   0.047    -0.15    0.023     0.39    0.086     1.35    0.026

\[ E[XY] = \sum_{i=1}^{5} \sum_{j=1}^{5} X_i Y_j \, P(XY = X_i Y_j) = 0.783 \]
Example 2, cont'd

\[ E[X^2] = 1.4^2 \times 0.134 + 3.0^2 \times 0.175 + 4.9^2 \times 0.216 + 7.8^2 \times 0.310 + 14.2^2 \times 0.165 = 59.155 \]
\[ V(X) = E[X^2] - (E[X])^2 = 59.155 - 6.5322^2 = 16.488 \]
\[ c_0 = \beta_0 = E(Y) - \beta_1 E(X) = 0.0433 \]
\[ c_1 = \beta_1 = \frac{C(X, Y)}{V(X)} = 0.00845 \]
Example 2, cont'd

\[ L(Y \mid X) = \begin{cases}
0.043278 + 0.008451 \times 3.0 = 0.069 & \text{if } X = 3.0 \\
0.043278 + 0.008451 \times 4.9 = 0.085 & \text{if } X = 4.9 \\
0.043278 + 0.008451 \times 7.8 = 0.1092 & \text{if } X = 7.8 \\
0.043278 + 0.008451 \times 14.2 = 0.1633 & \text{if } X = 14.2
\end{cases} \]
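The whole example can be reproduced numerically. The sketch below (numpy assumed) rebuilds the joint distribution as P(Y | X) times P(X), using the marginal P(X) = (0.134, 0.175, 0.216, 0.310, 0.165) that appears in the E[X^2] line; the last digits differ slightly from the slides because the tabulated probabilities are rounded:

```python
import numpy as np

# Conditional distribution P(Y | X) from the table, and the marginal P(X)
# used in the E[X^2] computation on the slide.
y_vals = np.array([0.45, 0.18, 0.05, -0.11, -0.25])
x_vals = np.array([1.4, 3.0, 4.9, 7.8, 14.2])
p_y_given_x = np.array([
    [0.112, 0.149, 0.125, 0.110, 0.200],
    [0.142, 0.183, 0.264, 0.435, 0.382],
    [0.440, 0.377, 0.329, 0.277, 0.297],
    [0.172, 0.200, 0.208, 0.152, 0.091],
    [0.134, 0.091, 0.074, 0.026, 0.030],
])
p_x = np.array([0.134, 0.175, 0.216, 0.310, 0.165])

joint = p_y_given_x * p_x                # P(X=x, Y=y) = P(Y=y | X=x) P(X=x)
EX = x_vals @ p_x
EX2 = (x_vals ** 2) @ p_x
EY = y_vals @ joint.sum(axis=1)          # marginal P(Y=y) = row sums of joint
EXY = np.outer(y_vals, x_vals).ravel() @ joint.ravel()

b1 = (EXY - EX * EY) / (EX2 - EX ** 2)   # c1 = C(X, Y) / V(X)
b0 = EY - b1 * EX                        # c0 = E(Y) - c1 E(X)
fitted = b0 + b1 * x_vals                # L(Y | X) at each income level
print(b0, b1)                            # close to the slides' 0.0433 and 0.00845
```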
Best Prediction

$\beta = (\beta_0, \beta_1)$: the parameter vector, the population parameters

$\varepsilon$: the error term
Simple Linear Regression: Assumptions

A1: Linear in parameters.

A2: For all values of X, the mean of the error is the same and equal to 0: $E[\varepsilon \mid X] = 0$.

In particular, $C(X, \varepsilon) = 0$:
\[ C(X, \varepsilon) = E[X \varepsilon] - E[X] E[\varepsilon] \]
\[ E[X \varepsilon] = E[E[X \varepsilon \mid X]] = E[X \, E[\varepsilon \mid X]] = 0 \]
\[ E(X) E(\varepsilon) = 0 \]
Simple Linear Regression: Assumptions

$E[Y \mid X] = \beta_0 + \beta_1 X$: the conditional expectation function is linear, hence
\[ C(Y, X) = C(E[Y \mid X], X) = \beta_1 V(X) \quad \Longrightarrow \quad \beta_1 = \frac{C(Y, X)}{V(X)} \]
\[ E[Y] = E[E[Y \mid X]] = \beta_0 + \beta_1 E[X] \quad \Longrightarrow \quad \beta_0 = E[Y] - \beta_1 E[X] \]

With homoskedasticity (A3), $V(\varepsilon \mid X) = \sigma^2$, this implies $V(\varepsilon) = \sigma^2$:
\[ V(\varepsilon \mid X) = E[\varepsilon^2 \mid X] - (E[\varepsilon \mid X])^2 = E[\varepsilon^2 \mid X] = \sigma^2 \]
\[ E[\varepsilon^2] = E\left[E[\varepsilon^2 \mid X]\right] = \sigma^2 \]
\[ V(\varepsilon) = E[\varepsilon^2] - (E[\varepsilon])^2 = \sigma^2 \]

This implies $\beta_1 = \dfrac{\partial E[Y \mid X]}{\partial X}$. That is, when X increases by one unit, Y, on average, increases by $\beta_1$ units.
Simple Linear Regression: Interpretation of Coefficients

\[ E[Y \mid X = 0] = E[\beta_0 + \beta_1 X + \varepsilon \mid X = 0] = \beta_0 \]

$\beta_0$ is the mean value of Y when X = 0; $\beta_0$ is the intercept term.

Our Model: $Y = \beta_0 + \beta_1 X + \varepsilon$, where $E[\varepsilon \mid X] = 0$ and $V(\varepsilon \mid X) = \sigma^2$.

The parameters of the model are $\beta = (\beta_0, \beta_1)$ and $\sigma^2$.
Estimation: The Analogy Principle

Use the sample mean $\bar{Y} = \dfrac{\sum_{i=1}^{n} y_i}{n}$ to estimate the population mean.

Use the sample variance $s^2 = \dfrac{\sum_{i=1}^{n} \left(y_i - \bar{Y}\right)^2}{n - 1}$ to estimate the population variance.

Use $\dfrac{\sum_{k=1}^{n} (x_k)^i}{n}$ to estimate the $i$-th population moment.
Estimation: The Analogy Principle

\[ \beta_0 = E(Y) - \beta_1 E(X) \]
\[ \beta_1 = \frac{C(X, Y)}{V(X)} \]
Estimation: The Analogy Principle

\[ \hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X} \]
\[ \hat{\beta}_1 = \frac{\sum_{i}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i}^{n} (X_i - \bar{X})^2}
 = \frac{\frac{1}{n} \sum_{i}^{n} (X_i - \bar{X}) Y_i}{\frac{1}{n} \sum_{i}^{n} (X_i - \bar{X})^2}
 = \frac{S_{XY}}{S_X^2} \]
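The sample analogs can be verified on simulated data; the sketch below (numpy assumed, with an illustrative made-up line) checks that the equivalent expressions for $\hat{\beta}_1$ coincide:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
X = rng.uniform(1, 15, size=n)                 # e.g. income levels
eps = rng.normal(0, 0.05, size=n)
Y = 0.04 + 0.008 * X + eps                     # illustrative "true" line

x = X - X.mean()                               # deviations from the mean
# The equivalent expressions for beta1_hat on the slide:
b1_a = (x * (Y - Y.mean())).sum() / (x ** 2).sum()
b1_b = (x * Y).sum() / (x ** 2).sum()          # sum x_i (Y_i - Ybar) = sum x_i Y_i
S_xy = (x * (Y - Y.mean())).mean()
S2_x = (x ** 2).mean()
b1_c = S_xy / S2_x                             # S_XY / S_X^2
b0 = Y.mean() - b1_a * X.mean()

assert np.isclose(b1_a, b1_b) and np.isclose(b1_a, b1_c)
print(b0, b1_a)
```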
Estimation: The Ordinary Least Squares (OLS)

$\min E[\varepsilon^2]$, or equivalently, $\min E\left[(Y - \beta_0 - \beta_1 X)^2\right]$

\[ \varepsilon_i = Y_i - E[Y_i \mid X_i] \]
The Ordinary Least Squares (OLS)

\[ \hat{\varepsilon}_i = Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i \]
\[ \min_{\hat{\beta}_0, \hat{\beta}_1} \frac{1}{n} \sum_{i=1}^{n} \hat{\varepsilon}_i^2 \]
The Ordinary Least Squares (OLS)

The FOC are:
\[ \sum_{i=1}^{n} \hat{\varepsilon}_i = 0 \qquad \sum_{i=1}^{n} \hat{\varepsilon}_i X_i = 0 \]

Solving these gives
\[ \hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X} \]
\[ \hat{\beta}_1 = \frac{\sum_{i=1}^{n} x_i Y_i}{\sum_{i=1}^{n} x_i^2} = \sum_{i=1}^{n} c_i Y_i,
\quad \text{where } x_i = (X_i - \bar{X}) \text{ and } c_i = \frac{x_i}{\sum_{i=1}^{n} x_i^2} \]
The Ordinary Least Squares (OLS)

Predicted values: $\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i$

\[ \sum_i \hat{Y}_i \hat{\varepsilon}_i = 0 \]
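The first-order conditions translate into exact, finite-sample properties of the residuals; a simulated check (numpy assumed, illustrative model):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
X = rng.normal(5, 2, size=n)
Y = 1.0 + 0.5 * X + rng.normal(0, 1, size=n)   # illustrative model

x = X - X.mean()
b1 = (x * Y).sum() / (x ** 2).sum()
b0 = Y.mean() - b1 * X.mean()

Y_hat = b0 + b1 * X                 # predicted values
resid = Y - Y_hat                   # residuals eps_hat_i

# FOC: residuals sum to zero, are orthogonal to X, and hence
# orthogonal to the fitted values as well (all zero up to rounding).
print(resid.sum(), (resid * X).sum(), (Y_hat * resid).sum())
```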
Estimation Terminology

An estimator is the rule itself, e.g. $\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}$; an estimate is the numerical value the rule produces for a particular sample, e.g. $\hat{\beta}_1 = 0.003$.
The Properties of The OLS Estimators

P1 Linear in Y:
\[ \hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}, \qquad \hat{\beta}_1 = \sum_i c_i Y_i \]
The Properties of The OLS Estimators

P2 Unbiasedness: the expected value of the estimator is equal to the parameter itself:
\[ E[\hat{\beta}_0] = \beta_0, \qquad E[\hat{\beta}_1] = \beta_1 \]
The Properties of The OLS Estimators

The unbiasedness property indicates that if we have infinitely many samples of size n from the same population and estimate the same model with each of the samples, the estimates average out to the parameter: $E[\hat{\beta}_1] = \beta_1$.

\[ \hat{\beta}_1 = \sum_i c_i Y_i = \sum_i c_i (\beta_0 + \beta_1 X_i + \varepsilon_i) \]
\[ = \beta_0 \sum_i c_i + \beta_1 \sum_i c_i X_i + \sum_i c_i \varepsilon_i \]
\[ = \beta_1 \sum_i c_i X_i + \sum_i c_i \varepsilon_i \quad \left(\text{because } \sum_i c_i = 0\right) \]
\[ = \beta_1 + \sum_i c_i \varepsilon_i \quad \left(\text{because } \sum_i c_i X_i = 1\right) \]
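The identities $\sum_i c_i = 0$ and $\sum_i c_i X_i = 1$ used above, and the unbiasedness of $\hat{\beta}_1$, can be checked by simulation (numpy assumed; parameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
beta0, beta1, sigma = 0.5, 2.0, 1.0
n, reps = 50, 2000

X = rng.uniform(0, 10, size=n)        # X held fixed across samples
x = X - X.mean()
c = x / (x ** 2).sum()                # the weights c_i from the slide

# The two identities used in the unbiasedness derivation:
assert np.isclose(c.sum(), 0.0)
assert np.isclose((c * X).sum(), 1.0)

# Monte Carlo: average beta1_hat over many samples of size n.
b1_draws = np.empty(reps)
for r in range(reps):
    eps = rng.normal(0, sigma, size=n)
    Y = beta0 + beta1 * X + eps
    b1_draws[r] = (c * Y).sum()       # beta1_hat = sum_i c_i Y_i
print(b1_draws.mean())                # close to beta1 = 2.0
```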
The Properties of The OLS Estimators

\[ E[\hat{\beta}_1 \mid X] = E\left[\beta_1 + \sum_i c_i \varepsilon_i \,\Big|\, X\right]
 = \beta_1 + \sum_i c_i \underbrace{E[\varepsilon_i \mid X]}_{=0} = \beta_1 \]
\[ E[\hat{\beta}_1] = E\left[E[\hat{\beta}_1 \mid X]\right] = E[\beta_1] = \beta_1 \]
The Properties of The OLS Estimators

The Variance

From A1, A2, and A3, we have
\[ V(\hat{\beta}_0) = \frac{\sum_i X_i^2}{n} V(\hat{\beta}_1) \]
\[ V(\hat{\beta}_1) = \frac{\sigma^2}{n} E\left[\frac{1}{S_X^2}\right] \]
The Properties of The OLS Estimators

Let's show that $V(\hat{\beta}_1) = \frac{\sigma^2}{n} E\left[\frac{1}{S_X^2}\right]$:

\[ V(\hat{\beta}_1) = E\left[\left(\hat{\beta}_1 - \beta_1\right)^2\right]
 = E\left[\left(\sum_i c_i \varepsilon_i\right)^2\right]
 = \sum_i E\left[c_i^2 \varepsilon_i^2\right] \quad \text{(cross terms vanish since the errors are uncorrelated)} \]
\[ = E\left[\sum_i E\left[c_i^2 \varepsilon_i^2 \mid X\right]\right]
 = E\left[\sum_i c_i^2 \, E\left[\varepsilon_i^2 \mid X\right]\right]
 = \sigma^2 E\left[\sum_i c_i^2\right] \quad \text{(by A3)} \]
\[ = \sigma^2 E\left[\sum_i \left(\frac{x_i}{\sum_j x_j^2}\right)^2\right]
 = \sigma^2 E\left[\frac{1}{\sum_i x_i^2}\right]
 = \sigma^2 E\left[\frac{1/n}{\frac{1}{n}\sum_i x_i^2}\right]
 = \frac{\sigma^2}{n} E\left[\frac{1}{S_X^2}\right] \]
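The variance formula can also be checked by Monte Carlo. Holding X fixed across samples makes $E[1/S_X^2]$ equal to $1/S_X^2$ (a sketch with illustrative parameters, numpy assumed):

```python
import numpy as np

rng = np.random.default_rng(3)
beta0, beta1, sigma = 0.5, 2.0, 1.0
n, reps = 50, 20000

X = rng.uniform(0, 10, size=n)        # X held fixed, so E[1/S_X^2] = 1/S_X^2
x = X - X.mean()
S2_X = (x ** 2).mean()
theory = sigma ** 2 / (n * S2_X)      # (sigma^2 / n) * (1 / S_X^2)

b1_draws = np.empty(reps)
for r in range(reps):
    Y = beta0 + beta1 * X + rng.normal(0, sigma, size=n)
    b1_draws[r] = (x * Y).sum() / (x ** 2).sum()
print(b1_draws.var(), theory)         # the two should be close
```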
The Properties of The OLS Estimators: Gauss-Markov Theorem

$\hat{\beta}_0$ and $\hat{\beta}_1$ are the most efficient estimators among all linear unbiased estimators: OLS is the Best Linear Unbiased Estimator (BLUE).
The Properties of The OLS Estimators: Consistency

$\hat{\beta}_0$ and $\hat{\beta}_1$ are consistent if
\[ \operatorname{plim}_{n \to \infty} \hat{\beta}_j = \beta_j, \quad j = 0, 1 \]
in other words,
\[ \lim_{n \to \infty} P\left(\left|\hat{\beta}_j - \beta_j\right| < \epsilon\right) = 1 \quad \forall \epsilon > 0 \]

$\hat{\beta}_0$ and $\hat{\beta}_1$ become more and more concentrated near the true values $\beta_0$ and $\beta_1$ as the sample size increases, so that the probability of $\hat{\beta}_0$ and $\hat{\beta}_1$ being arbitrarily close to $\beta_0$ and $\beta_1$ converges to one.

The Variance of The OLS Estimators

The variances of $\hat{\beta}_0$ and $\hat{\beta}_1$ depend on $\sigma^2 = V(\varepsilon) = E[\varepsilon^2]$. The residual is
\[ \hat{\varepsilon} = Y - \hat{\beta}_0 - \hat{\beta}_1 X
 = \beta_0 + \beta_1 X + \varepsilon - \hat{\beta}_0 - \hat{\beta}_1 X
 = \left(\beta_0 - \hat{\beta}_0\right) + \left(\beta_1 - \hat{\beta}_1\right) X + \varepsilon \]
Even though $E[\hat{\beta}_j] = \beta_j$ for $j = 0, 1$, $\hat{\varepsilon} \neq \varepsilon$.
The Variance of The OLS Estimators

Ideally, we would use $\sigma^2 = E[\varepsilon^2]$ itself; however, this is not feasible, because $\varepsilon$ is unobserved. We could instead use $\frac{1}{n} \sum_i \hat{\varepsilon}_i^2$. However, this is biased. The bias comes from unmatched degrees of freedom and denominator; the unbiased estimator is
\[ \hat{\sigma}^2 = \frac{\sum_i \hat{\varepsilon}_i^2}{n - 2} \]
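The degrees-of-freedom correction can be seen in a simulation: with the $1/n$ denominator the estimator centers on $\sigma^2 (n-2)/n$, while the $1/(n-2)$ version centers on $\sigma^2$ (illustrative parameters, numpy assumed):

```python
import numpy as np

rng = np.random.default_rng(4)
beta0, beta1, sigma2 = 0.5, 2.0, 1.0
n, reps = 10, 40000                   # small n makes the bias visible

X = rng.uniform(0, 10, size=n)
naive = np.empty(reps)
unbiased = np.empty(reps)
for r in range(reps):
    Y = beta0 + beta1 * X + rng.normal(0, np.sqrt(sigma2), size=n)
    x = X - X.mean()
    b1 = (x * Y).sum() / (x ** 2).sum()
    b0 = Y.mean() - b1 * X.mean()
    resid = Y - b0 - b1 * X
    naive[r] = (resid ** 2).sum() / n        # biased: mean is sigma2 * (n-2)/n
    unbiased[r] = (resid ** 2).sum() / (n - 2)
print(naive.mean(), unbiased.mean())  # roughly 0.8 vs 1.0 for n = 10
```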
The Variance of The OLS Estimators

Recall that $V(\hat{\beta}_1) = \frac{\sigma^2}{n} E\left[\frac{1}{S_X^2}\right]$.

We estimate $V(\hat{\beta}_1)$ by
\[ \hat{V}(\hat{\beta}_1) = \frac{\hat{\sigma}^2}{n S_X^2}, \]
where $\sigma^2$ is estimated by $\hat{\sigma}^2$ and $E\left[\frac{1}{S_X^2}\right]$ is estimated by $\frac{1}{S_X^2}$.

Recall $V(\hat{\beta}_0) = \frac{\sum_i X_i^2}{n} V(\hat{\beta}_1)$.

We estimate $V(\hat{\beta}_0)$ by
\[ \hat{V}(\hat{\beta}_0) = \frac{\hat{\sigma}^2 \sum_i X_i^2}{n^2 S_X^2} \]
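Putting the pieces together, estimated variances (and standard errors) follow directly from $\hat{\sigma}^2$ and the formulas above; a sketch with simulated data (numpy assumed, illustrative model):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 100
X = rng.uniform(0, 10, size=n)
Y = 0.5 + 2.0 * X + rng.normal(0, 1, size=n)   # illustrative data

x = X - X.mean()
b1 = (x * Y).sum() / (x ** 2).sum()
b0 = Y.mean() - b1 * X.mean()
resid = Y - b0 - b1 * X

sigma2_hat = (resid ** 2).sum() / (n - 2)      # sigma^2 hat with n - 2 d.o.f.
S2_X = (x ** 2).mean()
var_b1_hat = sigma2_hat / (n * S2_X)           # V-hat(beta1_hat)
var_b0_hat = sigma2_hat * (X ** 2).sum() / (n ** 2 * S2_X)   # V-hat(beta0_hat)
print(np.sqrt(var_b1_hat), np.sqrt(var_b0_hat))   # standard errors
```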
The Goodness of Fit Measures

The Coefficient of Determination, $R^2$, is a very common measure of goodness of fit:
\[ R^2 = \frac{\sum_i \hat{y}_i^2}{\sum_i y_i^2} = 1 - \frac{\sum_i \hat{\varepsilon}_i^2}{\sum_i y_i^2} \tag{1} \]
where $y_i = Y_i - \bar{Y}$, $\hat{y}_i = \hat{Y}_i - \bar{Y}$, and $\hat{\varepsilon}_i = Y_i - \hat{Y}_i$.

Notice that $y_i = \hat{y}_i + \hat{\varepsilon}_i$. This implies that
\[ 0 \le \frac{\sum_i \hat{y}_i^2}{\sum_i y_i^2} \le 1 \]

If $R^2 = 1$, the model explains all of the variation (100%) in Y.

If $R^2 = 0$, the model explains none of the variation (0%) in Y.
The Goodness of Fit Measures

A low $R^2$ implies that the model fails to explain a significant proportion of the variation of Y.

A low $R^2$ does not necessarily mean that the estimates are unhelpful or inappropriate.

The $R^2$ can be used for comparing different models for the same dependent variable Y.

Denoting the correlation between $Y$ and $\hat{Y}$ by $\rho_{Y, \hat{Y}}$, we have the following relationship:
\[ R^2 = \left(\rho_{Y, \hat{Y}}\right)^2 \]
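All three characterizations of $R^2$ can be confirmed numerically on simulated data (numpy assumed, illustrative model):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 300
X = rng.normal(0, 1, size=n)
Y = 1.0 + 2.0 * X + rng.normal(0, 2, size=n)   # illustrative data

x = X - X.mean()
b1 = (x * Y).sum() / (x ** 2).sum()
b0 = Y.mean() - b1 * X.mean()
Y_hat = b0 + b1 * X
resid = Y - Y_hat

y = Y - Y.mean()                               # y_i = Y_i - Ybar
y_hat = Y_hat - Y.mean()                       # y_hat_i = Y_hat_i - Ybar
r2_explained = (y_hat ** 2).sum() / (y ** 2).sum()
r2_residual = 1 - (resid ** 2).sum() / (y ** 2).sum()
r2_corr = np.corrcoef(Y, Y_hat)[0, 1] ** 2     # squared correlation of Y, Y_hat
print(r2_explained, r2_residual, r2_corr)      # all three agree
```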