For example, consider the following data on advertisement expenditure (X) and sales revenue (Y).
  X      Y      X²      XY       Y²          (X−X̄)   (Y−Ȳ)   (X−X̄)²    (Y−Ȳ)²     dx·dy     Ŷ      e
  0    3000      0        0     9000000     −42.5    −650    1806.25    422500     27625    3826   −826
 20    3600    400    72000    12960000     −22.5     −50     506.25      2500      1125    3743   −143
 50    5500   2500   275000    30250000       7.5    1850      56.25   3422500     13875    3619   1881
100    2500  10000   250000     6250000      57.5   −1150    3306.25   1322500    −66125    3412   −912
Σ 170 14600  12900   597000    58460000        0       0       5675    5170000    −23500
Means: X̄ = 42.5, Ȳ = 3650
To solve for α̂ and β̂ we need to compute ΣXᵢ, ΣXᵢ², ΣXᵢYᵢ and ΣYᵢ.
β̂ = Σ(Xᵢ − X̄)(Yᵢ − Ȳ) / Σ(Xᵢ − X̄)² = (−23500)/(5675) = −4.141 and
α̂ = Ȳ − β̂X̄ = 3650 + 4.141(42.5) ≈ 3826
The residuals (estimated errors) are then given by
eᵢ = Yᵢ − α̂ − β̂Xᵢ = Yᵢ − 3826 + 4.141Xᵢ
which can be calculated for each observation in our sample, and are presented in the last column in the first
table above.
The residuals tell us how far off we would be if we tried to predict the value of Y on the basis of the estimated regression equation and the values of X, within the range of the sample. We do not get residuals for X outside this range simply because we do not have the corresponding values of Y. By virtue of the first condition we put in the normal equations, Σeᵢ = 0. The sum of squares of these residuals, Σeᵢ², is the quantity minimised by the method of least squares, described next.
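These hand calculations can be verified with a short script. The notes' examples use Stata, but the same arithmetic in Python (an illustrative sketch, not part of the original material) is:

```python
# OLS by hand for the advertising (X) / sales (Y) example in the table above.
X = [0, 20, 50, 100]
Y = [3000, 3600, 5500, 2500]
n = len(X)
xbar = sum(X) / n                      # 42.5
ybar = sum(Y) / n                      # 3650
Sxy = sum((x - xbar) * (y - ybar) for x, y in zip(X, Y))   # -23500
Sxx = sum((x - xbar) ** 2 for x in X)                      # 5675
beta = Sxy / Sxx                       # slope, about -4.141
alpha = ybar - beta * xbar             # intercept, about 3826
e = [y - alpha - beta * x for x, y in zip(X, Y)]           # residuals
print(round(beta, 3), round(alpha), round(sum(e), 6))
```

The residuals sum to zero (up to floating-point error), exactly as the first normal equation requires.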
2.2b. Method of least squares
The method of least squares, often termed Ordinary Least Squares (OLS), requires us to choose α̂ and β̂ such that the sum of squared errors is minimised. Thus, given the population regression function
Yᵢ = α + βXᵢ + εᵢ,  i = 1, 2, …, n    (1)
we substitute α̂, β̂ and eᵢ for α, β and εᵢ to get
Yᵢ = α̂ + β̂Xᵢ + eᵢ,  i = 1, 2, …, n    (2)
where the eᵢ are called the residuals and the equation is the sample regression equation.
From 2, we can write
eᵢ = Yᵢ − α̂ − β̂Xᵢ
Square the eᵢ and sum over the observations to get
Σeᵢ² = Σ(Yᵢ − α̂ − β̂Xᵢ)²    (3)
Σeᵢ² is called the residual sum of squares.
The intuitive idea behind the procedure of least squares is given by looking at the following figure.
[Figure: scatter plot of Y against X with the fitted regression line ("Fitted values") superimposed.]
We want the regression line to pass through points in such a way that it is ‘as close as possible’ to the
points of the data. Closeness could mean different things. The minimisation procedure of the OLS implies
that we minimise the sum of squares of the vertical distances of the points from the line.
Minimisation of equation 3,
min_{α̂,β̂} Σeᵢ² = min_{α̂,β̂} Σ(Yᵢ − α̂ − β̂Xᵢ)²,
requires that we differentiate it with respect to α̂ and β̂ and equate the derivatives to zero. The equations so obtained are known as the first order conditions for minimisation. This procedure yields
∂Σeᵢ²/∂α̂ = 0 → −2Σ(Yᵢ − α̂ − β̂Xᵢ) = 0
→ Σeᵢ = 0
→ ΣYᵢ = nα̂ + β̂ΣXᵢ    (3a)
Dividing by n: Ȳ = α̂ + β̂X̄ → α̂ = Ȳ − β̂X̄, as before.
Similarly,
∂Σeᵢ²/∂β̂ = 0 → −2ΣXᵢ(Yᵢ − α̂ − β̂Xᵢ) = 0
→ ΣXᵢeᵢ = 0
→ ΣXᵢYᵢ = α̂ΣXᵢ + β̂ΣXᵢ²    (3b)
Equations 3a and 3b are known as the normal equations. Substituting α̂ = Ȳ − β̂X̄ into 3b and solving for β̂ gives
ΣXᵢYᵢ = (Ȳ − β̂X̄)ΣXᵢ + β̂ΣXᵢ²
ΣXᵢYᵢ = ȲΣXᵢ − β̂X̄ΣXᵢ + β̂ΣXᵢ² →
ΣXᵢYᵢ − nX̄Ȳ = β̂(ΣXᵢ² − nX̄²) →
β̂ = (ΣXᵢYᵢ − nX̄Ȳ)/(ΣXᵢ² − nX̄²) = (nΣXᵢYᵢ − ΣXᵢΣYᵢ)/(nΣXᵢ² − (ΣXᵢ)²)    (4)
Equation 4 can be simplified as follows.
1st we take the numerator to get
nΣXᵢYᵢ − ΣXᵢΣYᵢ = n(ΣXᵢYᵢ − nX̄Ȳ) = nΣ(Xᵢ − X̄)(Yᵢ − Ȳ)
since
Σ(Xᵢ − X̄)(Yᵢ − Ȳ) = Σ(XᵢYᵢ − X̄Yᵢ − XᵢȲ + X̄Ȳ)
= ΣXᵢYᵢ − X̄ΣYᵢ − ȲΣXᵢ + nX̄Ȳ
= ΣXᵢYᵢ − nX̄Ȳ − nX̄Ȳ + nX̄Ȳ
= ΣXᵢYᵢ − nX̄Ȳ
2nd we similarly take the denominator to get
nΣXᵢ² − (ΣXᵢ)² = n(ΣXᵢ² − nX̄²) = nΣ(Xᵢ − X̄)²
since
ΣXᵢ² − nX̄² = Σ(Xᵢ − X̄)²
Therefore, equation 4 reduces to
β̂ = Σ(Xᵢ − X̄)(Yᵢ − Ȳ) / Σ(Xᵢ − X̄)² = Σxᵢyᵢ / Σxᵢ² = Cov(X, Y)/Var(X)
where xᵢ = Xᵢ − X̄ and yᵢ = Yᵢ − Ȳ denote deviations from the means, and
α̂ = Ȳ − β̂X̄
For the wheat yield data analysed below, these formulas give
Ŷᵢ = 1075.8 + 4.03Xᵢ, or Yᵢ = 1075.8 + 4.03Xᵢ + eᵢ
i.e., α̂ = 1075.8 and β̂ = 4.03.
2.3 Residuals and goodness of fit
Given the residuals
eᵢ = Yᵢ − α̂ − β̂Xᵢ
we have
Σeᵢ = 0 → ē = 0    (6a)
ΣXᵢeᵢ = 0    (6b)
In this formulation
Ŷᵢ = α̂ + β̂Xᵢ
is the estimated (fitted) value of Yᵢ. Equations 6a and 6b imply that
1. the mean of the residuals is zero, and
2. the residuals and the explanatory variable are uncorrelated.
Given this, it follows that
ΣŶᵢeᵢ = 0
that is, the residuals and the estimated values of Y are uncorrelated.
Proof:
ΣŶᵢeᵢ = Σ(α̂ + β̂Xᵢ)eᵢ = α̂Σeᵢ + β̂ΣXᵢeᵢ = 0
Recall
Yᵢ = Ŷᵢ + eᵢ    (7)
Observed value = predicted value + residual.
This holds for all observations.
Sum equation 7 over i, the sampled observations, to get
ΣYᵢ = ΣŶᵢ + Σeᵢ
Thus
ΣYᵢ = ΣŶᵢ, because Σeᵢ = 0
Now given equation 7, we subtract Ȳ from its left-hand side and right-hand side to get
Yᵢ − Ȳ = (Ŷᵢ − Ȳ) + eᵢ
yᵢ = ŷᵢ + eᵢ    (8)
where yᵢ = Yᵢ − Ȳ and ŷᵢ = Ŷᵢ − Ȳ. Squaring both sides and then summing over i we get
yᵢ² = (ŷᵢ + eᵢ)²
Σyᵢ² = Σ(ŷᵢ + eᵢ)²
Σyᵢ² = Σŷᵢ² + Σeᵢ² + 2Σŷᵢeᵢ
Now, Σŷᵢeᵢ = 0, which implies that
Σyᵢ² = Σŷᵢ² + Σeᵢ²    (9)
Note that the left-hand side of equation 9 is the Total Sum of Squares (TSS) of the dependent variable. The first component of the right-hand side is the Explained Sum of Squares (ESS) and the second element is the Residual Sum of Squares (RSS). Thus
Σyᵢ² = Σŷᵢ² + Σeᵢ²
TSS = ESS + RSS
The ratio of ESS to TSS is called the coefficient of determination, denoted R², i.e.,
R² = ESS/TSS = Σŷᵢ²/Σyᵢ²; note that by definition 0 ≤ R² ≤ 1.
We know that
β̂ = Σxᵢyᵢ/Σxᵢ², and
ŷᵢ = Ŷᵢ − Ȳ
= α̂ + β̂Xᵢ − Ȳ
but we know that
α̂ = Ȳ − β̂X̄
Therefore
ŷᵢ = Ȳ − β̂X̄ + β̂Xᵢ − Ȳ
= β̂(Xᵢ − X̄)
= β̂xᵢ
Therefore
Σŷᵢ² = Σβ̂²xᵢ² = β̂²Σxᵢ²
Equivalently, since yᵢ = ŷᵢ + eᵢ,
Σŷᵢ² = β̂Σxᵢŷᵢ
= β̂Σxᵢ(yᵢ − eᵢ) = β̂Σxᵢyᵢ − β̂Σxᵢeᵢ (= 0)
= β̂Σxᵢyᵢ    (10)
or equivalently, since
R² = Σŷᵢ²/Σyᵢ²
it follows that
R² = β̂²Σxᵢ²/Σyᵢ² = (Σxᵢyᵢ/Σxᵢ²)²(Σxᵢ²/Σyᵢ²)
= (Σxᵢyᵢ)² / (Σxᵢ² Σyᵢ²)    (11)
Since β̂ = Σxᵢyᵢ/Σxᵢ², it also follows that
R² = β̂Σxᵢyᵢ/Σyᵢ²    (12)
Denote the correlation coefficient between Y and Ŷ, i.e., the observed and predicted values of Y, by r; thus
r = Σyᵢŷᵢ / √(Σyᵢ² Σŷᵢ²)
Proposition:
R² = r²
Proof:
Start with the fact that
yᵢ = ŷᵢ + eᵢ
Multiply this equation throughout by ŷᵢ and sum over i to get
Σyᵢŷᵢ = Σŷᵢ² + Σŷᵢeᵢ (= 0)
= Σŷᵢ²    (13)
Now
r = Σyᵢŷᵢ / √(Σyᵢ² Σŷᵢ²)
= Σŷᵢ² / √(Σyᵢ² Σŷᵢ²)
= √(Σŷᵢ²) / √(Σyᵢ²)
= √(Σŷᵢ²/Σyᵢ²)
= √R²
Therefore
r² = R²
Moreover, R² also equals the square of the correlation coefficient between X and Y. The proof goes as follows:
r_XY = Σxᵢyᵢ / √(Σxᵢ² Σyᵢ²)
so that
r²_XY = (Σxᵢyᵢ)² / (Σxᵢ² Σyᵢ²)
= R²
by equation 11.
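The decomposition TSS = ESS + RSS and the identity R² = r²_XY can be confirmed numerically; a Python sketch using the advertising data from the earlier example (illustrative only):

```python
# Verify TSS = ESS + RSS and R^2 = (corr(X, Y))^2 on the advertising data.
X = [0, 20, 50, 100]
Y = [3000, 3600, 5500, 2500]
n = len(X)
xbar, ybar = sum(X) / n, sum(Y) / n
Sxx = sum((x - xbar) ** 2 for x in X)
Syy = sum((y - ybar) ** 2 for y in Y)
Sxy = sum((x - xbar) * (y - ybar) for x, y in zip(X, Y))
beta = Sxy / Sxx
alpha = ybar - beta * xbar
yhat = [alpha + beta * x for x in X]
ESS = sum((yh - ybar) ** 2 for yh in yhat)          # explained sum of squares
RSS = sum((y - yh) ** 2 for y, yh in zip(Y, yhat))  # residual sum of squares
TSS = Syy                                           # total sum of squares
R2 = ESS / TSS
r = Sxy / (Sxx * Syy) ** 0.5                        # correlation between X and Y
print(R2, r * r)
```

Both quantities agree, and the decomposition holds to floating-point precision.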
Example: Consider the yield data
use "C:\Users\6440\Documents\DocA\Econometrics\Econ352\Kefyalew\Data_wheat_Yield.dta", clear
keep if yield>0
(6 observations deleted)
drop if yield ==.
(8 observations deleted)
twoway (scatter yield fertha) (lfit yield fertha)
[Figure: scatter of yield against fertha with the fitted regression line.]
Ȳ = ΣYᵢ/n = 1681.074;  X̄ = ΣXᵢ/n = 150.2757
use "C:\Users\6440\Documents\DocA\Econometrics\Econ352\Kefyalew\Data_wheat_Yield.dta", clear
keep if yield>0
drop if yield ==.
gen ymybar = yield - 1681.0734
gen xmxbar = fertha-150.27568
gen ymbtxmb = ymybar*xmxbar
gen xmxbsq = xmxbar*xmxbar
gen ymybsq = ymybar*ymybar
Σyᵢ² = Σ(Yᵢ − Ȳ)² = 1.03×10⁹
Σxᵢ² = Σ(Xᵢ − X̄)² = 5291075
Σxᵢyᵢ = Σ(Xᵢ − X̄)(Yᵢ − Ȳ) = 2.13×10⁷
Thus, we have
β̂ = Σxᵢyᵢ/Σxᵢ² = 2.13×10⁷/5291075 = 4.02565
Using the unrounded sums, β̂ = 4.027833.
Since the slope is about 4, the equation implies that yield will increase by this amount if fertilizer use increases by 1 unit. We can go further and estimate
r = Σxᵢyᵢ / √(Σxᵢ² Σyᵢ²) = 2.13×10⁷ / √((5291075)(1.03×10⁹)) = 0.28852871
. display 2.13e+07/(5291075*1.03e+09)^0.5
.28852871
Therefore
. display .28852871*.28852871
.08324882
r² = .08324882 ≈ R²
Or, equivalently,
R² = β̂Σxᵢyᵢ/Σyᵢ² = (4.027833)(2.13×10⁷)/(1.03×10⁹) = .08329402
. display 4.027833 * 2.13e+07/ 1.03e+09
.08329402
(The small difference between the two values is due to rounding of the sums.)
2.4 Properties of LS estimates and Gauss-Markov theorem
Given our regression model
Yᵢ = α + βXᵢ + εᵢ;  i = 1, 2, …, n    (1)
The classical assumptions we put forward earlier can be divided into two parts: those that are made on Xᵢ and those made on εᵢ.
a) Assumptions imposed on X
a1) The values of X: X₁, X₂, …, Xₙ are fixed in advance (i.e., they are non-stochastic).
a2) Not all Xᵢ are equal.
b) Assumptions on the error term
b1) E(εᵢ|Xᵢ) = 0 ∀i
b2) Var(εᵢ|Xᵢ) = σ² ∀i (homoskedasticity)
b3) Cov(εᵢ, εⱼ) = 0 ∀i ≠ j (assumption of no autocorrelation)
Note: so far we have not introduced normality.
Given these assumptions, we propose that the Least Squares Estimators are Best Linear
Unbiased Estimators (BLUE)
Recall: an estimator of β, say β̂, is BLUE if it is:
1. a linear function of the random variable Y,
2. unbiased, and
3. among all the linear unbiased estimators it has minimum variance.
This result is known as the Gauss-Markov theorem, which is stated as follows:
Gauss-Markov theorem: Given the assumptions of the classical linear regression model, the least-squares
estimators, in the class of unbiased linear estimators, have minimum variance, i.e., they are BLUE.
Proof: we shall provide a proof for β̂; try to prove this for α̂ as an exercise.
1. β̂ is a linear estimator of β
We know that
β̂ = Σxᵢyᵢ/Σxᵢ² = Σxᵢ(Yᵢ − Ȳ)/Σxᵢ² = ΣxᵢYᵢ/Σxᵢ² − ȲΣxᵢ/Σxᵢ² = ΣxᵢYᵢ/Σxᵢ²    (2a)
(since Σxᵢ = 0), and similarly
α̂ = Ȳ − β̂X̄ = ΣYᵢ/n − X̄ΣxᵢYᵢ/Σxᵢ² = Σ(1/n − X̄xᵢ/Σxᵢ²)Yᵢ    (2b)
Take equation 2a and let
wᵢ = xᵢ/Σxᵢ²
then
β̂ = ΣwᵢYᵢ    (3)
This shows that β̂ is linear in Yᵢ; it is in fact a weighted average of the Yᵢ, with the wᵢ serving as weights. Note the following properties of wᵢ:
a) since the X variable is assumed non-stochastic, the wᵢ are also fixed in advance and are not random;
b) Σwᵢ = 0
c) Σwᵢ² = 1/Σxᵢ²
d) Σwᵢxᵢ = ΣwᵢXᵢ = 1
Assignment: show that properties b, c and d are true.
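A quick numerical check of properties b, c and d (not a proof; the assignment still asks for the algebra), in Python with an arbitrary illustrative set of X values:

```python
# Check the properties of the OLS weights w_i = x_i / sum(x_i^2).
X = [0, 20, 50, 100]          # any fixed design points will do
n = len(X)
xbar = sum(X) / n
Sxx = sum((x - xbar) ** 2 for x in X)
w = [(x - xbar) / Sxx for x in X]
sum_w = sum(w)                                    # property b: 0
sum_w2 = sum(wi * wi for wi in w)                 # property c: 1 / Sxx
sum_wX = sum(wi * x for wi, x in zip(w, X))       # property d: 1
print(sum_w, sum_w2 * Sxx, sum_wX)
```

All three properties hold to floating-point precision for any choice of non-identical X values.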
2. β̂ is an unbiased estimator of β
If we substitute equation 1 into equation 3 we get
β̂ = ΣwᵢYᵢ
= Σwᵢ(α + βXᵢ + εᵢ)
= αΣwᵢ + βΣwᵢXᵢ + Σwᵢεᵢ
= β + Σwᵢεᵢ
since b and d above are true. If we now take the expectation of this result, we get
E[β̂] = E[β + Σwᵢεᵢ]
= E[β] + E[Σwᵢεᵢ]
= β + ΣwᵢE[εᵢ] = β
since E[εᵢ] = 0. Therefore, β̂ is an unbiased estimator of β.
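Unbiasedness can also be illustrated by simulation: with X fixed (assumption a1) and fresh errors drawn each replication, the average of β̂ across replications should be close to the true β. A Python sketch, where the design points and parameter values are arbitrary choices for illustration:

```python
import random

random.seed(0)
X = [0, 20, 50, 100, 150, 200]     # fixed regressors (assumption a1)
beta_true, alpha_true, sigma = 2.0, 5.0, 3.0
n = len(X)
xbar = sum(X) / n
Sxx = sum((x - xbar) ** 2 for x in X)
reps, betas = 20000, []
for _ in range(reps):
    # fresh normal errors each replication (assumptions b1-b3)
    Y = [alpha_true + beta_true * x + random.gauss(0, sigma) for x in X]
    ybar = sum(Y) / n
    Sxy = sum((x - xbar) * (y - ybar) for x, y in zip(X, Y))
    betas.append(Sxy / Sxx)
mean_beta = sum(betas) / reps
print(mean_beta)                   # close to beta_true = 2.0
```

Individual β̂ draws scatter around 2.0, but their average converges to it, which is exactly what E[β̂] = β says.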
3. Among the set of linear unbiased estimators of the parameters in a regression model, the least squares estimators have minimum variance. To show this we need to derive the variances and covariances of the least squares estimators. We do this for the slope parameter, β̂. At the same time, we shall also need the least squares estimator of σ².
a) By the definition of variance, we have the variance of β̂ as
Var(β̂) = E[(β̂ − E[β̂])²]
= E[(β̂ − β)²]
= E[(Σwᵢεᵢ)²]
= E[(w₁ε₁ + w₂ε₂ + ⋯ + wₙεₙ)²]
= E[w₁²ε₁² + ⋯ + wₙ²εₙ² + 2w₁w₂ε₁ε₂ + ⋯ + 2wₙ₋₁wₙεₙ₋₁εₙ]
= w₁²E[ε₁²] + ⋯ + wₙ²E[εₙ²] + 2w₁w₂E[ε₁ε₂] + ⋯ + 2wₙ₋₁wₙE[εₙ₋₁εₙ]
= w₁²E[ε₁²] + ⋯ + wₙ²E[εₙ²] (the cross terms vanish by assumption b3)
= w₁²σ² + ⋯ + wₙ²σ²
= σ²Σwᵢ²
= σ²/Σxᵢ²    (4)
b) By the definition of covariance we have
Cov(α̂, β̂) = E[(α̂ − E[α̂])(β̂ − E[β̂])] = E[(α̂ − α)(β̂ − β)]    (5)
We know that
α̂ = Ȳ − β̂X̄ and Ȳ = α + βX̄ + ε̄
so that
α̂ − α = ε̄ − (β̂ − β)X̄
Using this result and the ones in a) we can write equation 5 as follows:
Cov(α̂, β̂) = E[(ε̄ − (β̂ − β)X̄)(β̂ − β)]
= E[ε̄(β̂ − β)] − X̄E[(β̂ − β)²]
= 0 − X̄Var(β̂) = −X̄σ²/Σxᵢ²    (6)
(The first term is zero because E[ε̄Σwᵢεᵢ] = (σ²/n)Σwᵢ = 0.)
c) To derive the least squares estimator of σ², which is unbiased, we proceed as follows.
Given the population regression equation, equation 1,
Yᵢ = α + βXᵢ + εᵢ,  i = 1, 2, …, n    (1)
averaging over the sample gives
Ȳ = α + βX̄ + ε̄    (7)
Subtracting equation 7 from 1, we get
Yᵢ − Ȳ = β(Xᵢ − X̄) + (εᵢ − ε̄),  i = 1, 2, …, n
yᵢ = βxᵢ + (εᵢ − ε̄),  i = 1, 2, …, n    (a)
Moreover, from the sample regression we have
Yᵢ = α̂ + β̂Xᵢ + eᵢ,  i = 1, 2, …, n
Then Ȳ = α̂ + β̂X̄ (since Σeᵢ = 0), implying Yᵢ − Ȳ = β̂(Xᵢ − X̄) + eᵢ, i.e.,
yᵢ = β̂xᵢ + eᵢ,  i = 1, 2, …, n    (b)
Subtract b from a to get
eᵢ = (εᵢ − ε̄) − (β̂ − β)xᵢ,  i = 1, 2, …, n    (8)
Squaring equation 8, one gets
eᵢ² = (εᵢ − ε̄)² + (β̂ − β)²xᵢ² − 2(β̂ − β)xᵢ(εᵢ − ε̄)
Summing this over the sample and taking expectations we get
E[Σeᵢ²] = E[Σ(εᵢ − ε̄)²] + E[(β̂ − β)²]Σxᵢ² − 2E[(β̂ − β)Σxᵢ(εᵢ − ε̄)]    (9)
Equation 9 has three component parts which can be reduced as follows.
a) The 1st element in the equation could be written as follows:
E[Σ(εᵢ − ε̄)²] = E[Σεᵢ² + nε̄² − 2ε̄Σεᵢ]
= E[Σεᵢ² − nε̄²] (since Σεᵢ = nε̄)
= ΣE[εᵢ²] − nE[ε̄²]
= nσ² − n(σ²/n)
E[Σ(εᵢ − ε̄)²] = (n − 1)σ²
b) Using the variance result derived earlier,
E[(β̂ − β)²]Σxᵢ² = (σ²/Σxᵢ²)Σxᵢ² = σ²
c) Since β̂ − β = Σwᵢεᵢ = Σxᵢεᵢ/Σxᵢ² and Σxᵢ(εᵢ − ε̄) = Σxᵢεᵢ,
−2E[(β̂ − β)Σxᵢ(εᵢ − ε̄)] = −2E[(Σxᵢεᵢ)²]/Σxᵢ² = −2σ²Σxᵢ²/Σxᵢ² = −2σ²
Collecting the results obtained in a), b) and c) above, we get
E[Σeᵢ²] = (n − 1)σ² + σ² − 2σ² = (n − 2)σ²
It easily follows that if we set
σ̂² = Σeᵢ²/(n − 2)    (11)
we have an unbiased estimator of σ².
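The result E[Σeᵢ²] = (n − 2)σ² can likewise be illustrated by simulation: the average of Σeᵢ²/(n − 2) over many replications should approach σ². A Python sketch with arbitrary illustrative parameters:

```python
import random

random.seed(1)
X = list(range(10))                # n = 10 fixed design points
n = len(X)
alpha_t, beta_t, sigma = 1.0, 0.5, 2.0
xbar = sum(X) / n
Sxx = sum((x - xbar) ** 2 for x in X)
reps, rss_sum = 20000, 0.0
for _ in range(reps):
    Y = [alpha_t + beta_t * x + random.gauss(0, sigma) for x in X]
    ybar = sum(Y) / n
    b = sum((x - xbar) * (y - ybar) for x, y in zip(X, Y)) / Sxx
    a = ybar - b * xbar
    rss_sum += sum((y - a - b * x) ** 2 for x, y in zip(X, Y))
avg = rss_sum / reps / (n - 2)     # average of sigma-hat^2 across replications
print(avg)                         # close to sigma**2 = 4.0
```

Dividing the residual sum of squares by n (rather than n − 2) would bias the estimate downward, which is exactly why the two estimated parameters cost two degrees of freedom.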
To show that the LS estimator of β is BLUE, in addition to the conditions of linearity and unbiasedness we need to show that it has minimum variance. To show this, we proceed as follows: we already know that
β̂ = ΣwᵢYᵢ
where wᵢ = xᵢ/Σxᵢ².
Now define an alternative linear estimator of β, say
β* = ΣcᵢYᵢ
This makes β* linear, in that it is a linear function of Yᵢ. For this estimator to be unbiased its expected value must be equal to β, i.e.,
E[β*] = ΣcᵢE[Yᵢ] = ΣcᵢE[α + βXᵢ + εᵢ] = Σcᵢ(α + βXᵢ) = αΣcᵢ + βΣcᵢXᵢ
Therefore, for β* to be unbiased,
αΣcᵢ + βΣcᵢXᵢ = β
For this to be true,
Σcᵢ = 0 and ΣcᵢXᵢ = 1
Note also that the variance of β* is
Var(β*) = Var(ΣcᵢYᵢ)
= Σcᵢ²Var(Yᵢ) = σ²Σcᵢ²
We now compare the variances of β̂ and β*. To do so, let
dᵢ = cᵢ − wᵢ; note that Σdᵢ = Σcᵢ − Σwᵢ = 0
cᵢ = wᵢ + dᵢ, implying that cᵢ² = (wᵢ + dᵢ)²
Therefore
Σcᵢ² = Σwᵢ² + Σdᵢ² + 2Σwᵢdᵢ
Σcᵢ² = Σwᵢ² + Σdᵢ²
because
Σwᵢdᵢ = 0,
as
Σwᵢdᵢ = Σwᵢ(cᵢ − wᵢ) = Σwᵢcᵢ − Σwᵢ², and
Σwᵢcᵢ = Σxᵢcᵢ/Σxᵢ² = (ΣcᵢXᵢ − X̄Σcᵢ)/Σxᵢ² = 1/Σxᵢ² = Σwᵢ²
Therefore
σ²Σcᵢ² = σ²Σwᵢ² + σ²Σdᵢ²
so that
Var(β*) = Var(β̂) + σ²Σdᵢ²
Therefore
Var(β*) ≥ Var(β̂), since σ²Σdᵢ² ≥ 0
This establishes that the OLS estimator, β̂, is the Best Linear Unbiased Estimator (BLUE) of β.
2.5. Confidence intervals and hypothesis testing
In this section, we shall discuss issues of interval estimation and hypothesis testing—what is known as
statistical inference in the statistics literature. For this and related aspects we need:
1. the variances of the OLS estimators,
2. the covariance between the OLS estimators,
3. the unbiased estimator of σ², all of which were derived earlier, and
4. the implications of the normality assumption on the error terms. This assumption is particularly crucial for inference: without it we cannot do any statistical testing on the parameters, nor can we construct interval estimates.
The variances and covariances of the OLS estimators
In our earlier discussions we showed that the variance of β̂ is
Var(β̂) = σ²/Σxᵢ²    (1a)
and you must have obtained the variance of α̂ to be
Var(α̂) = σ²(1/n + X̄²/Σxᵢ²)    (1b)
The covariance between α̂ and β̂ is given as
Cov(α̂, β̂) = −X̄σ²/Σxᵢ²    (1c)
The least squares estimator of σ², which is unbiased, is given by
σ̂² = Σeᵢ²/(n − 2)
For our example raised earlier we obtained
Yield = 1075.788 + 4.027833 fertha + e,  R² = 0.0835
Recall that
Σyᵢ² = Σŷᵢ² + Σeᵢ²
Σeᵢ² = Σyᵢ² − Σŷᵢ²
Σeᵢ² = Σyᵢ² − β̂Σxᵢyᵢ
= 1028163547.315 − (4.027833)(21311566.26134)
= 942324102.28
Now
σ̂² = Σeᵢ²/(n − 2) = 942324102.28/(516 − 2) = 1833315.4
Implications of the normality assumption on the distribution of the parameters of interest
Assume now that the errors are normally distributed: εᵢ ~ N(0, σ²), i = 1, 2, …, n. Comparing the population and sample regression equations gives εᵢ = eᵢ + (α̂ − α) + (β̂ − β)Xᵢ, so that
Σεᵢ² = Σeᵢ² + n(α̂ − α)² + (β̂ − β)²ΣXᵢ² + a cross term in (α̂ − α)(β̂ − β)
(the terms involving eᵢ drop out because Σeᵢ = 0 and ΣXᵢeᵢ = 0). Dividing the whole equation by σ², the left-hand side is a χ² variable with n degrees of freedom, and the quadratic terms in (α̂ − α) and (β̂ − β) absorb two of them. It follows that
Σeᵢ²/σ² = (n − 2)σ̂²/σ² ~ χ²ₙ₋₂
and hence that (α̂ − α)/se(α̂) and (β̂ − β)/se(β̂) each follow a t distribution with n − 2 degrees of freedom.
This result is used for both estimating confidence intervals and hypothesis testing. Notice the switch from the variance of β̂, which involves the unknown σ², to the estimator of that variance, which uses σ̂².
We shall use the data in our previous example to calculate the variances and standard errors of the estimators:
Yield = 1075.788 + 4.027833 fertha + e,  R² = 0.0835
The standard errors are obtained by
1. calculating the variances of α̂ and β̂,
2. substituting σ̂² for σ², and
3. taking the square root of the resulting expression.
Now
Var(β̂) = σ̂²/Σxᵢ² = σ̂²/5291074.3
Var(α̂) = σ̂²(1/n + X̄²/Σxᵢ²) = σ̂²(1/516 + (150.2757)²/5291074.3) = σ̂²(0.00620605834363549)
Recall
σ̂² = Σeᵢ²/(n − 2) = 942324102.28/514 = 1833315.4
Therefore
Var(α̂) = 1833315.4(0.00620606) = 11377.684
se(α̂) = √11377.684 = 106.66623
Var(β̂) = 1833315.4/5291074.3 = 0.34649209
se(β̂) = √0.34649209 = 0.5886
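The same standard errors can be reproduced from the summary statistics reported in the text; a Python cross-check of formulas 1a and 1b (illustrative only):

```python
# Standard errors of the OLS estimators from the reported summary statistics.
n = 516
xbar = 150.2757          # mean of fertha
Sxx = 5291074.3          # sum of squared deviations of fertha
sigma2_hat = 1833315.4   # RSS / (n - 2)
var_b = sigma2_hat / Sxx                            # formula 1a
var_a = sigma2_hat * (1 / n + xbar ** 2 / Sxx)      # formula 1b
se_b, se_a = var_b ** 0.5, var_a ** 0.5
print(se_a, se_b)        # about 106.67 and 0.5886
```

These match the hand calculations above up to rounding of the inputs.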
Usually, the complete result of the regression is written as follows:
Yield = 1075.788 + 4.0278833 fertha,  R² = 0.0835
        (106.66623)  (0.5886)
with standard errors in parentheses.
It is usually prefered to put the results on a table when we have a large number of explanatory variables;
given in stata as follows
Source SS df MS Number of obs = 516
F(1, 514) = 46.82
Model 85839433.6 1 85839433.6 Prob > F = 0
Residual 942324116 514 1833315.4 R-squared = 0.0835
Adj R-squared = 0.0817
Total 1.03E+09 515 1996434.08 Root MSE = 1354
We can then easily obtain the confidence intervals for α̂ and β̂ by using the t distribution with n − 2 degrees of freedom.
For instance, in our example we know that (α̂ − α)/se(α̂) has a t distribution with n − 2 degrees of freedom; therefore
Pr[−t < (α̂ − α)/se(α̂) < t] = 1 − γ
where γ is the significance level. If we let our significance level be 0.05, we read the following critical value for 516 − 2 (= 514) degrees of freedom from the statistical tables:
Pr[−1.96 < (α̂ − α)/se(α̂) < 1.96] = 0.95
Therefore
Pr[−α̂ − 1.96 se(α̂) < −α < −α̂ + 1.96 se(α̂)] = 0.95
Pr[α̂ + 1.96 se(α̂) > α > α̂ − 1.96 se(α̂)] = 0.95
Pr[1075.788 + 1.96(106.66623) > α > 1075.788 − 1.96(106.66623)] = 0.95
Pr[1285 > α > 866] = 0.95
Thus, α's 95% confidence interval is (866, 1285).
Similarly, since (β̂ − β)/se(β̂) has a t distribution with n − 2 degrees of freedom, it follows that
Pr[−t < (β̂ − β)/se(β̂) < t] = 1 − γ
where γ is the significance level. With a significance level of 0.05 and 516 − 2 (= 514) degrees of freedom, we have
Pr[−1.96 < (β̂ − β)/se(β̂) < 1.96] = 0.95
Therefore
Pr[−β̂ − 1.96 se(β̂) < −β < −β̂ + 1.96 se(β̂)] = 0.95
Pr[β̂ + 1.96 se(β̂) > β > β̂ − 1.96 se(β̂)] = 0.95
Pr[4.027833 + 1.96(0.5886358) > β > 4.027833 − 1.96(0.5886358)] = 0.95
Pr[5.18 > β > 2.87] = 0.95
Thus, β's 95% confidence interval is (2.87, 5.18).
The confidence interval can also be used to test whether the parameter of interest is statistically different from zero. If zero lies within the interval, then the parameter is not statistically significantly different from zero; if zero lies outside the interval, then the parameter is statistically significantly different from zero.
Hypothesis testing
The most common hypotheses tested regarding the parameters in the simple linear regression model are whether the parameters of interest are different from zero or not. Such hypotheses can be tested using the following null and alternative hypotheses:
H₀: β = 0 against H₁: β ≠ 0, and
H₀: α = 0 against H₁: α ≠ 0.
Interpretation: the 1st set of hypotheses tests whether the slope parameter is different from zero. If the data support the null, whereby we say we accept it, then the alternative is rejected. This implies that the relationship we formulated is not supported by the data; as a result there is no relationship between the explanatory and dependent variables.
The 2nd set of hypotheses likewise tests whether the intercept is different from zero or not. However, the interpretation is different, in that we are now asking whether the function passes through the origin or through a different point on the y axis.
Now (β̂ − β)/se(β̂) ~ t with n − 2 degrees of freedom.
Therefore, if the null hypothesis (H₀: β = 0) is true, it follows that β̂/se(β̂) follows this t distribution:
t = 4.027833/0.5886358 = 6.84 → |t| = 6.84
Now, from the t table for 514 degrees of freedom we read that
Pr[t > 0.674] = 0.25, Pr[t > 1.282] = 0.10, Pr[t > 1.645] = 0.05, Pr[t > 1.96] = 0.025, and so on. In fact, Pr[t > 6.84] ≈ 0. Since this is a very low probability, we reject the null hypothesis. Thus, the slope parameter we calculated is statistically different from zero. People customarily say that their parameters have been found to be significant. It is customary to use probability levels of 0.05 and 0.01 as cut-offs for rejecting the null hypothesis; i.e., we reject the null hypothesis if the probability obtained is less than 0.05 or 0.01.
Assignment:
1. Test whether the intercept is significantly different from zero.
2. Given the following
Ŷ = 30 + 6X
     (5.33) (2.55)
with standard errors in parentheses, n = 200 and R² = 0.95.
a) interpret the results
b) test the hypotheses that the parameters are different from zero
2.6. Analysis of variance
The analysis of variance is yet another way of presenting results in regression analysis that complements
statistical inference. It uses the decomposition of variation in Y into the explained and residual sum of
squares. Under the assumption of normality, we obtained the following facts:
RSS/σ² = Σeᵢ²/σ² ~ χ²ₙ₋₂
and
ESS/σ² = β̂²Σxᵢ²/σ² ~ χ²₁
Of course, the latter holds if the true β is zero, i.e., the null hypothesis holds. It can be shown further that these two distributions are independent. Thus, under the assumption that β = 0, dividing both quantities by their respective degrees of freedom and taking their ratio one gets
(ESS/1)/(RSS/(n − 2)) ~ F(1, n − 2)
This result can be used to test whether β = 0. The sketch for presenting the analysis of variance is given in the following table.

Source of variation | Sum of Squares | df    | Mean Square
Model               | ESS = β̂²Σxᵢ²  | 1     | ESS/1
Residual            | RSS = Σeᵢ²     | n − 2 | RSS/(n − 2)
Total               | TSS = Σyᵢ²     | n − 1 |

Since (ESS/1)/(RSS/(n − 2)) ~ F(1, n − 2), the F statistic we obtain from this data is
F = 85839433.6/1833315.4 = 46.82
Recall the t statistic we obtained for testing the significance of the slope parameter:
t = β̂/se(β̂) = 4.027833/0.5886358 = 6.84
Note that t² = 6.84² ≈ 46.8 = F, as expected with a single regressor.
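A quick numeric check of the relationship between the two test statistics, using the figures reported above (illustrative only):

```python
# With one regressor, the F statistic equals the square of the t statistic.
beta_hat, se_beta = 4.027833, 0.5886358
t_stat = beta_hat / se_beta                   # about 6.84
ms_model, ms_resid = 85839433.6, 1833315.4    # from the ANOVA table above
F_stat = ms_model / ms_resid                  # about 46.82
print(t_stat ** 2, F_stat)
```

The two agree up to rounding of the reported inputs, so the t test and the F test of β = 0 are equivalent in the simple regression model.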
2.7. Prediction in the simple regression model
Let the given value of X be X*; then we predict the corresponding value Y* of Y by solving
Ŷ* = α̂ + β̂X*    (2)
where Ŷ* is the predicted value of Y*.
Now, the true value of Y* is
Y* = α + βX* + ε*    (3)
where ε* is the disturbance (error) term. We now try to look at the desirable properties of Ŷ*.
First, we note that Ŷ* is a linear function of Y₁, Y₂, …, Yₙ, since α̂ and β̂ are linear in Yᵢ. Thus Ŷ* is a linear predictor of Y*.
Second, Ŷ* is unbiased. This follows from considering the prediction error
u = Ŷ* − Y*
But Ŷ* = α̂ + β̂X*
and Y* = α + βX* + ε*.
Therefore, the prediction error is
u = Ŷ* − Y* = (α̂ − α) + (β̂ − β)X* − ε*
Thus
E[u] = E[Ŷ* − Y*] = E[α̂ − α] + E[β̂ − β]X* − E[ε*] = 0
Thus,
E[Ŷ* − Y*] = 0 → E[Ŷ*] = E[Y*]
Thus it is unbiased.
Third, though we shall not show this here, Ŷ* has the smallest variance among the linear unbiased predictors of Y*. Thus, Ŷ* is the Best Linear Unbiased Predictor (BLUP) of Y*.
We will, however, derive the variance of the prediction error, Var[u], which is obtained as follows.
Now
u = (α̂ − α) + (β̂ − β)X* − ε*
Thus
Var[u] = Var(α̂) + X*²Var(β̂) + 2X*Cov(α̂, β̂) + Var(ε*)
= σ²(1/n + X̄²/Σxᵢ²) + X*²σ²/Σxᵢ² − 2X*X̄σ²/Σxᵢ² + σ²
= σ²(1 + 1/n + (X*² − 2X*X̄ + X̄²)/Σxᵢ²)
= σ²(1 + 1/n + (X* − X̄)²/Σxᵢ²)
As a result, we observe that
1. The variance of the prediction error increases as X* moves further away from the mean of X, X̄ (i.e., the mean of the observations on the basis of which α̂ and β̂ have been computed): as the distance between X* and X̄ increases, the variance of the error of prediction increases.
2. The variance of the error in prediction increases with the variance of the regression, σ².
3. It decreases with n (the number of observations used in the estimation of the parameters).
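Observation 1 can be made concrete: evaluating the prediction-error variance formula at X* = X̄ and at a point away from the mean shows the variance growing with distance. A Python sketch using the summary statistics of the yield example (illustrative only):

```python
# Prediction-error variance: sigma^2 * (1 + 1/n + (X* - Xbar)^2 / Sxx).
def pred_var(x_star, sigma2, n, xbar, Sxx):
    return sigma2 * (1 + 1 / n + (x_star - xbar) ** 2 / Sxx)

# Summary statistics from the yield regression in the text.
sigma2, n, xbar, Sxx = 1833315.4, 516, 150.2757, 5291074.3
v_at_mean = pred_var(xbar, sigma2, n, xbar, Sxx)   # smallest possible variance
v_at_200 = pred_var(200, sigma2, n, xbar, Sxx)     # larger, X* is off the mean
print(v_at_mean ** 0.5, v_at_200 ** 0.5)
```

The standard error is smallest at the mean of X and widens steadily as X* moves away from it, which is why prediction bands fan out at the edges of the data.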
Interval prediction
Given the assumption of normality, it can be shown that Ŷ* follows a normal distribution with mean
E[Ŷ*] = α + βX*
and, from the previous result, the variance of the prediction error is
Var(Ŷ* − Y*) = σ²(1 + 1/n + (X* − X̄)²/Σxᵢ²)
Substituting σ̂² for σ², it follows that
(Ŷ* − Y*)/se(Ŷ* − Y*)
follows a t distribution with n − 2 degrees of freedom. This result can be used for both interval estimation and other inference purposes. Let us try to obtain confidence intervals for the predictor using the results in our earlier example. Recall that the regression results were
Yield = 1075.788 + 4.0278833 fertha
        (106.66623)  (0.5886)
σ̂² = 1833315.4; X̄ = 150.2757; Σxᵢ² = 5291074.3. Recall also that fertha ranges from 0 to 700.
Now, suppose we are interested in predicting Yield* for fertha* = 150.2757, i.e., the mean. In this case
Yield* = α̂ + β̂ fertha*
= 1075.788 + 4.0278833(150.2757)
= 1681.0736
And given
Var(Ŷ* − Y*) = σ̂²(1 + 1/n + (X* − X̄)²/Σxᵢ²)
the standard error of the predictor is
se(Ŷ*) = √[σ̂²(1 + 1/n + (X* − X̄)²/Σxᵢ²)]
se(Ŷ*) = √[1833315.4(1 + 1/516 + (150.2757 − 150.2757)²/5291074.3)]
se(Ŷ*) = √[1833315.4(1 + 1/516)] = 1355.311
The t value for 95% confidence with 514 degrees of freedom is 1.960. Thus, the 95% confidence interval for Y* is
1681.0736 ± 1.96(1355.311) = (−975, 4337)
Now, suppose we want to predict the value of Yield* at a point away from the mean of fertha, say fertha* = 200. Then the predicted value of Yield* is
Yield* = α̂ + β̂ fertha*
= 1075.788 + 4.0278833(200)
= 1881.36
Then, the standard error of the predictor is
se(Ŷ*) = √[1833315.4(1 + 1/516 + (200 − 150.2757)²/5291074.886233)]
= 1355.6 > 1355.311
confirming that the prediction becomes less precise as X* moves away from X̄.