
Linear Regression Analysis

Correlation
Simple Linear Regression
The Multiple Linear Regression Model
Least Squares Estimates
R2 and Adjusted R2
Overall Validity of the Model (F test)
Testing for individual regressor (t test)
Problem of Multicollinearity

Smoking and Lung Capacity

Suppose, for example, we want to investigate the relationship between cigarette smoking and lung capacity.
We might ask a group of people about their smoking habits and measure their lung capacities.

  Cigarettes (X):      0   5  10  15  20
  Lung Capacity (Y):  45  42  33  31  29

Scatter plot of the data
[Scatter plot: Cigarettes (X) on the horizontal axis, Lung Capacity (Y) on the vertical axis]

We can see that as smoking goes up, lung capacity tends to go down.
The two variables change in opposite directions.

Height and Weight

Consider the following data on the heights and weights of 5 women swimmers:

  Height (inch):     62   64   65   66   68
  Weight (pounds):  102  108  115  128  132

We can observe that weight increases with height.
[Scatter plot: Height on the horizontal axis, Weight on the vertical axis]

Sometimes two variables are related to each other.
The values of the two variables are paired.
A change in the value of one affects the value of the other.
Usually these two variables are two attributes of each member of the population.
For example:
  Height and Weight
  Advertising Expenditure and Sales Volume
  Unemployment and Crime Rate
  Rainfall and Food Production
  Expenditure and Savings

We have already studied one measure of relationship between two variables: Covariance.
The covariance between two random variables X and Y is given by
  Cov(X, Y) = σ_XY = E(XY) − E(X) E(Y)

For paired observations on variables X and Y,
  Cov(X, Y) = s_XY = (1/n) Σ (xi − x̄)(yi − ȳ)

Properties of Covariance:
  Cov(X+a, Y+b) = Cov(X, Y)   [not affected by change in location]
  Cov(aX, bY) = ab Cov(X, Y)   [affected by change in scale]
  Covariance can take any value from −∞ to +∞.
  Cov(X,Y) > 0 means X and Y change in the same direction.
  Cov(X,Y) < 0 means X and Y change in opposite directions.
  If X and Y are independent, Cov(X,Y) = 0 [the converse need not be true].
  It is not unit free, so it is not a good measure of the strength of the relationship between two variables.
  A better measure is the correlation coefficient.
  It is unit free and takes values in [−1, +1].

Correlation
Karl Pearson's correlation coefficient is given by
  r_XY = Corr(X, Y) = Cov(X, Y) / √(Var(X) Var(Y))

When the joint distribution of X and Y is known:
  Cov(X, Y) = E(XY) − E(X) E(Y)
  Var(X) = E(X²) − [E(X)]²,   Var(Y) = E(Y²) − [E(Y)]²

When observations on X and Y are available:
  Cov(X, Y) = (1/n) Σ (xi − x̄)(yi − ȳ)
  Var(X) = (1/n) Σ (xi − x̄)²,   Var(Y) = (1/n) Σ (yi − ȳ)²

Properties of the Correlation Coefficient
  Corr(aX+b, cY+d) = Corr(X, Y) when a and c have the same sign (the sign of r flips if ac < 0).
  It is unit free.
  It measures the strength of linear relationship on a scale of −1 to +1.
  So, it can be used to compare the relationships of various pairs of variables.
  Values close to 0 indicate little or no correlation.
  Values close to +1 indicate very strong positive correlation.
  Values close to −1 indicate very strong negative correlation.

Scatter Diagram
[Scatter plots of Y against X illustrating: positively correlated, weakly correlated, negatively correlated, strongly correlated, and not correlated data]

The correlation coefficient measures the strength of the LINEAR relationship.
r = 0 does not necessarily imply that there is no relationship; a relationship may be there, but it is not a linear one.

Worked example: computing the correlation coefficient

    x      y    (x − x̄)  (y − ȳ)  (x − x̄)²  (y − ȳ)²  (x − x̄)(y − ȳ)
   1.25   125    −0.90      45      0.8100     2025        −40.50
   1.75   105    −0.40      25      0.1600      625        −10.00
   2.25    65     0.10     −15      0.0100      225         −1.50
   2.00    85    −0.15       5      0.0225       25         −0.75
   2.50    75     0.35      −5      0.1225       25         −1.75
   2.25    80     0.10       0      0.0100        0          0.00
   2.70    50     0.55     −30      0.3025      900        −16.50
   2.50    55     0.35     −25      0.1225      625         −8.75
  17.20   640                       1.560      4450        −79.75
                                     SSX        SSY          SSXY

SSX = Σ(x − x̄)²,  SSY = Σ(y − ȳ)²,  SSXY = Σ(x − x̄)(y − ȳ)

r = Cov(X, Y) / √(Var(X) Var(Y)) = SSXY / √(SSX · SSY) = −79.75 / √(1.56 × 4450) = −0.957

Alternative Formulas for Sums of Squares

  SSX = Σx² − (Σx)²/n,   SSY = Σy² − (Σy)²/n,   SSXY = Σxy − (Σx)(Σy)/n

    x      y      x²        y²       x·y
   1.25   125    1.5625    15625    156.25
   1.75   105    3.0625    11025    183.75
   2.25    65    5.0625     4225    146.25
   2.00    85    4.0000     7225    170.00
   2.50    75    6.2500     5625    187.50
   2.25    80    5.0625     6400    180.00
   2.70    50    7.2900     2500    135.00
   2.50    55    6.2500     3025    137.50
  17.20   640   38.54      55650   1296.25

  SSX = 38.54 − 17.20²/8 = 1.56
  SSY = 55650 − 640²/8 = 4450
  SSXY = 1296.25 − (17.20 × 640)/8 = −79.75

r = Cov(X, Y) / √(Var(X) Var(Y)) = SSXY / √(SSX · SSY) = −79.75 / √(1.56 × 4450) = −0.957

Smoking and Lung Capacity Example

  Cigarettes (X)   Lung Capacity (Y)    X²     Y²     XY
        0                 45             0    2025      0
        5                 42            25    1764    210
       10                 33           100    1089    330
       15                 31           225     961    465
       20                 29           400     841    580
       50                180           750    6680   1585

r_XY = [n ΣXY − (ΣX)(ΣY)] / √{[n ΣX² − (ΣX)²][n ΣY² − (ΣY)²]}
     = [(5)(1585) − (50)(180)] / √{[(5)(750) − 50²][(5)(6680) − 180²]}
     = (7925 − 9000) / √[(3750 − 2500)(33400 − 32400)]
     = −1075 / √(1250 × 1000)
     = −0.9615
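The arithmetic above can be checked with a short Python sketch (numpy assumed; this is a verification aid, not part of the original slides):

import numpy as np

x = np.array([0, 5, 10, 15, 20], dtype=float)      # cigarettes
y = np.array([45, 42, 33, 31, 29], dtype=float)    # lung capacity
n = len(x)

# Sums of squares using the "alternative" formulas from the slides
ssx = np.sum(x**2) - np.sum(x)**2 / n
ssy = np.sum(y**2) - np.sum(y)**2 / n
ssxy = np.sum(x*y) - np.sum(x) * np.sum(y) / n

r = ssxy / np.sqrt(ssx * ssy)
print(round(r, 4))   # -0.9615, matching the slide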

Regression Analysis
Having determined the correlation between X and Y, we
wish to determine a mathematical relationship between
them.
Dependent variable: the variable you wish to explain
Independent variables: the variables used to explain the
dependent variable
Regression analysis is used to:
Predict the value of dependent variable based on the
value of independent variable(s)
Explain the impact of changes in an independent
variable on the dependent variable

Types of Relationships
[Scatter plots of Y against X illustrating: linear vs. curvilinear relationships, strong vs. weak relationships, and no relationship]

Simple Linear Regression Analysis

The simplest mathematical relationship is
  Y = a + bX + error   (linear)
Changes in Y are related to changes in X.
What are the most suitable values of a (intercept) and b (slope)?
[Plot of the line y = a + b·x: a is the intercept on the Y axis; b is the slope, the change in y for a one-unit change in x]

Method of Least Squares

[Plot: the line a + bX and a data point (xi, yi); the vertical distance yi − (a + b·xi) is the error]

The best fitted line is the one for which all the ERRORS are minimum.
We want to obtain values of a and b in Y = a + bX + error for which all the errors are minimum.
To minimize all the errors together, we minimize the sum of squares of errors (SSE):

  SSE = Σ_{i=1}^{n} (Yi − a − b·Xi)²

To get the values of a and b which minimize SSE, we proceed as follows:

  ∂SSE/∂a = 0  ⇒  −2 Σ (Yi − a − b·Xi) = 0
              ⇒  Σ Yi = n·a + b Σ Xi                 ... (1)

  ∂SSE/∂b = 0  ⇒  −2 Σ (Yi − a − b·Xi) Xi = 0
              ⇒  Σ Xi·Yi = a Σ Xi + b Σ Xi²          ... (2)

Equations (1) and (2) are called the normal equations.
Solve the normal equations to get a and b.

Solving the above normal equations, we get

  b = [n Σ Xi·Yi − (Σ Xi)(Σ Yi)] / [n Σ Xi² − (Σ Xi)²] = SSXY / SSX

  a = Ȳ − b·X̄

The values of a and b obtained using the least squares method are called the least squares estimates (LSE) of a and b.
Thus, the LSE of a and b are given by

  b = SSXY / SSX,   a = Ȳ − b·X̄

Also, the correlation coefficient between X and Y is

  r_XY = Cov(X, Y) / √(Var(X) Var(Y)) = SSXY / √(SSX · SSY) = b · √(SSX / SSY)

Fitting the line to the worked data

Recall from the worked table: X̄ = 2.15, Ȳ = 80, SSX = 1.56, SSY = 4450, SSXY = −79.75,
and r = SSXY / √(SSX · SSY) = −0.957.

  b = SSXY / SSX = −79.75 / 1.56 = −51.12
  a = Ȳ − b·X̄ = 80 − (−51.12)(2.15) = 189.91

Fitted line:  Ŷ = 189.91 − 51.12 X

[Scatter plot of the data with the fitted line Ŷ = 189.91 − 51.12 X]
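The same estimates can be reproduced numerically; a minimal sketch (numpy assumed), using the SSXY/SSX formulas from the slides:

import numpy as np

x = np.array([1.25, 1.75, 2.25, 2.00, 2.50, 2.25, 2.70, 2.50])
y = np.array([125, 105, 65, 85, 75, 80, 50, 55], dtype=float)

ssx  = np.sum((x - x.mean())**2)                  # 1.56
ssxy = np.sum((x - x.mean()) * (y - y.mean()))    # -79.75

b = ssxy / ssx               # slope, about -51.12
a = y.mean() - b * x.mean()  # intercept, about 189.91
print(round(b, 2), round(a, 2))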

Fitted line:  Ŷ = 189.91 − 51.12 X

189.91 is the estimated mean value of Y when the value of X is zero.
−51.12 is the change in the average value of Y as a result of a one-unit increase in X.
We can predict the value of Y for a given value of X.
For example, at X = 2.15 the predicted value of Y is 189.91 − 51.12 × 2.15 = 80.002.

Residuals:  ei = Yi − Ŷi
The residual is the unexplained part of Y.
The smaller the residuals, the better the utility of the regression.
The sum of the residuals is always zero; the least squares procedure ensures that.
Residuals play an important role in investigating the adequacy of the fitted model.
We obtain the coefficient of determination (R²) using the residuals.
R² is used to examine the adequacy of the fitted linear model to the given data.

Coefficient of Determination

[Diagram: for each observation, the deviation Yi − Ȳ splits into the explained part Ŷi − Ȳ and the residual Yi − Ŷi]

  Total Sum of Squares:       SST = Σ (Yi − Ȳ)²
  Regression Sum of Squares:  SSR = Σ (Ŷi − Ȳ)²
  Error Sum of Squares:       SSE = Σ (Yi − Ŷi)²

Also, SST = SSR + SSE

The fraction of SST explained by the regression is given by R²:
  R² = SSR / SST = 1 − (SSE / SST)
Clearly, 0 ≤ R² ≤ 1.
When SSR is close to SST, R² will be close to 1.
This means that the regression explains most of the variability in Y (the fit is good).
When SSE is close to SST, R² will be close to 0.
This means that the regression does not explain much of the variability in Y (the fit is not good).
In simple linear regression, R² is the square of the correlation coefficient between X and Y (proof omitted).

  R² = 1 (r = +1 or r = −1): perfect linear relationship; 100% of the variation in Y is explained by X.
  0 < R² < 1: weaker linear relationship; some but not all of the variation in Y is explained by X.
  R² = 0: no linear relationship; none of the variation in Y is explained by X.

Worked example: computing R² for the fitted line Ŷ = 189.91 − 51.12 X

    x      y      Ŷ     (Y − Ȳ)  (Y − Ŷ)  (Ŷ − Ȳ)  (Y − Ȳ)²  (Y − Ŷ)²  (Ŷ − Ȳ)²
   1.25   125   126.0      45      −1.0     46.0      2025      1.00    2116.00
   1.75   105   100.5      25       4.5     20.5       625     20.25     420.25
   2.25    65    74.9     −15      −9.9     −5.1       225     98.01      26.01
   2.00    85    87.7       5      −2.2      7.7        25      4.84      59.29
   2.50    75    62.1      −5      12.9    −17.7        25    166.41     313.29
   2.25    80    74.9       0       5.1     −5.1         0     26.01      26.01
   2.70    50    51.9     −30      −1.9    −28.1       900      3.61     789.61
   2.50    55    62.1     −25      −7.1    −17.9       625     50.41     320.41
  17.20   640                                         4450    370.54    4079.46

Coefficient of Determination: R² = SSR/SST = (4450 − 370.54)/4450 = 0.916
Correlation Coefficient: r = −0.957
Coefficient of Determination = (Correlation Coefficient)²
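Continuing the numerical sketch (numpy assumed, not part of the slides), R² can be obtained from the residuals of the fitted line:

import numpy as np

x = np.array([1.25, 1.75, 2.25, 2.00, 2.50, 2.25, 2.70, 2.50])
y = np.array([125, 105, 65, 85, 75, 80, 50, 55], dtype=float)

y_hat = 189.91 - 51.12 * x            # fitted values from the estimated line
sst = np.sum((y - y.mean())**2)       # total sum of squares = 4450
sse = np.sum((y - y_hat)**2)          # error sum of squares
r_squared = 1 - sse / sst             # about 0.916, close to (-0.957)**2
print(round(r_squared, 3))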

Example:
Watching television also reduces the amount of physical exercise, causing weight gains.
A sample of fifteen 10-year-old children was taken.
The number of pounds each child was overweight was recorded (a negative number indicates the child is underweight).
Additionally, the number of hours of television viewing per week was also recorded. These data are listed here.

  TV (hours/week): 42 34 25 35 37 38 31 33 19 29 38 28 29 36 18
  Overweight (lb): 18  6  0 −1 13 14  7  7 −9  8  8  5  3 14 −7

Calculate the sample regression line and describe what the coefficients tell you about the relationship between the two variables.

  Predicted Y = −24.709 + 0.967 X   and   R² = 0.768

[Plot of observed Y and predicted Y for the 15 children]

Standard Error
Consider a dataset.
All the observations cannot be exactly the same as the arithmetic mean (AM).
Variability of the observations around the AM is measured by the standard deviation.
Similarly, in regression, all the Y values cannot be the same as the predicted Y values.
Variability of the Y values around the prediction line is measured by the STANDARD ERROR OF THE ESTIMATE.
It is given by

  S_YX = √[ SSE / (n − 2) ] = √[ Σ (Yi − Ŷi)² / (n − 2) ]
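For instance, under this definition the standard error of the estimate for the fitted line above could be computed as follows (a sketch, numpy assumed):

import numpy as np

x = np.array([1.25, 1.75, 2.25, 2.00, 2.50, 2.25, 2.70, 2.50])
y = np.array([125, 105, 65, 85, 75, 80, 50, 55], dtype=float)
n = len(y)

y_hat = 189.91 - 51.12 * x          # predictions from the fitted line
sse = np.sum((y - y_hat)**2)        # error sum of squares
s_yx = np.sqrt(sse / (n - 2))       # standard error of the estimate, roughly 7.9 here
print(round(s_yx, 2))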

Assumptions
  The relationship between X and Y is linear.
  Error values are statistically independent.
  All the errors have a common variance (homoscedasticity): Var(ei) = σ², where ei = Yi − Ŷi.
  E(ei) = 0.
No distributional assumption about the errors is required for the least squares method.

Linearity
[Residual plots: patternless residuals around zero indicate a linear relationship; a systematic curve in the residuals indicates a non-linear relationship]

Independence
[Residual plots: independent errors show no pattern over observation order; dependent errors show a systematic pattern]

Equal Variance
[Residual plots: equal variance (homoscedastic) residuals have constant spread; unequal variance (heteroscedastic) residuals fan out]

TV Watching and Weight Gain Example

[Scatter plot of X (TV hours) against Y (pounds overweight), and scatter plot of X against the residuals]

The Multiple Linear Regression Model

In simple linear regression analysis, we fit a linear relation between one independent variable (X) and one dependent variable (Y).
We assume that Y is regressed on only one regressor variable X.
In some situations, the variable Y is regressed on more than one regressor variable (X1, X2, X3, ...).
For example:
  Cost    -> Labor cost, Electricity cost, Raw material cost
  Salary  -> Education, Experience
  Sales   -> Cost, Advertising Expenditure

Example:
A distributor of frozen dessert pies wants to
evaluate factors which influence the demand
Dependent variable:
Y: Pie sales (units per week)

Independent variables:
X1: Price (in $)
X2: Advertising Expenditure ($100s)

Data are collected for 15 weeks



  Week   Pie Sales   Price ($)   Advertising ($100s)
    1       350        5.50             3.3
    2       460        7.50             3.3
    3       350        8.00             3.0
    4       430        8.00             4.5
    5       350        6.80             3.0
    6       380        7.50             4.0
    7       430        4.50             3.0
    8       470        6.40             3.7
    9       450        7.00             3.5
   10       490        5.00             4.0
   11       340        7.20             3.5
   12       300        7.90             3.2
   13       440        5.90             4.0
   14       450        5.00             3.5
   15       300        7.00             2.7

Using the given data, we wish to fit a linear function of the form:
  Yi = β0 + β1 X1i + β2 X2i + εi,   i = 1, 2, ..., 15.
where
  Y: Pie sales (units per week)
  X1: Price (in $)
  X2: Advertising Expenditure ($100s)
Fitting means we want to get the values of the regression coefficients, denoted by β.
The original values of the βs are not known.
We estimate them using the given data.

The Multiple Linear Regression Model

Examine the linear relationship between one dependent variable (Y) and two or more independent variables (X1, X2, ..., Xk).
Multiple linear regression model with k independent variables:
  Yi = β0 + β1 X1i + β2 X2i + ... + βk Xki + εi,   i = 1, 2, ..., n.
  (β0 is the intercept; β1, ..., βk are the slopes; εi is the random error)

Multiple Linear Regression Equation

The intercept and slopes are estimated using the observed data.
Multiple linear regression equation with k independent variables:
  Ŷi = b0 + b1 X1i + b2 X2i + ... + bk Xki,   i = 1, 2, ..., n.
  (Ŷi is the estimated value; b0 is the estimate of the intercept; b1, ..., bk are the estimates of the slopes)

Multiple Regression Equation

Example with two independent variables:
  Ŷ = b0 + b1 X1 + b2 X2
[The fitted equation defines a plane in the (X1, X2, Y) space]

Estimating Regression Coefficients

The multiple linear regression model is
  Yi = β0 + β1 X1i + β2 X2i + ... + βk Xki + εi,   i = 1, 2, ..., n
In matrix notation:
  Y = Xβ + ε
where Y is the n×1 vector of responses, X is the n×(k+1) matrix whose first column is all ones and whose remaining columns hold the regressor values, β = (β0, β1, ..., βk)' and ε = (ε1, ..., εn)'.

Assumptions
  The number of observations (n) is greater than the number of regressors (k), i.e., n > k.
  Random errors are independent.
  Random errors have the same variance (homoscedasticity): Var(εi) = σ².
  In the long run, the mean effect of the random errors is zero: E(εi) = 0.
No assumption on the distribution of the random errors is required for the least squares method.

In order to find the estimate of β, we minimize

  S(β) = Σ εi² = (Y − Xβ)'(Y − Xβ) = Y'Y − 2β'X'Y + β'X'Xβ

We differentiate S(β) with respect to β and equate to zero, i.e., ∂S/∂β = 0.
This gives

  b = (X'X)⁻¹ X'Y

b is called the least squares estimator of β.
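A small numerical sketch of b = (X'X)⁻¹X'Y for the pie-sales data (numpy assumed; column order: intercept, price, advertising):

import numpy as np

sales = np.array([350, 460, 350, 430, 350, 380, 430, 470, 450, 490, 340, 300, 440, 450, 300], dtype=float)
price = np.array([5.50, 7.50, 8.00, 8.00, 6.80, 7.50, 4.50, 6.40, 7.00, 5.00, 7.20, 7.90, 5.90, 5.00, 7.00])
adv   = np.array([3.3, 3.3, 3.0, 4.5, 3.0, 4.0, 3.0, 3.7, 3.5, 4.0, 3.5, 3.2, 4.0, 3.5, 2.7])

X = np.column_stack([np.ones_like(sales), price, adv])   # design matrix with a column of ones
b = np.linalg.solve(X.T @ X, X.T @ sales)                # least squares estimate (X'X)^-1 X'Y
print(np.round(b, 2))   # approximately [306.53  -24.98   74.13]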

Example: Consider the pie example.
We want to fit the model Yi = β0 + β1 X1i + β2 X2i + εi.
The variables are
  Y: Pie sales (units per week)
  X1: Price (in $)
  X2: Advertising Expenditure ($100s)
Using the matrix formula, the least squares estimates (LSE) of the βs are obtained as below:

  LSE of intercept β0:   Intercept    (b0)   306.53
  LSE of slope β1:       Price        (b1)   −24.98
  LSE of slope β2:       Advertising  (b2)    74.13

Pie Sales = 306.53 − 24.98 × Price + 74.13 × Adv. Expend.

Sales = 306.53 − 24.98 (X1) + 74.13 (X2)

b1 = −24.98: sales will decrease, on average, by 24.98 pies per week for each $1 increase in selling price, while advertising expenses are kept fixed.

b2 = 74.13: sales will increase, on average, by 74.13 pies per week for each $100 increase in advertising, while the selling price is kept fixed.

Prediction:
Predict sales for a week in which the selling price is $5.50 and the advertising expenditure is $350:

  Sales = 306.53 − 24.98 X1 + 74.13 X2
        = 306.53 − 24.98 (5.50) + 74.13 (3.5)
        = 428.62

Predicted sales is 428.62 pies.
Note that advertising is in $100s, so X2 = 3.5.

Ŷ = 306.52619 − 24.97509 X1 + 74.13096 X2

    Y     X1    X2   Predicted Y   Residual
   350   5.5   3.3     413.77       −63.80
   460   7.5   3.3     363.81        96.15
   350   8.0   3.0     329.08        20.88
   430   8.0   4.5     440.28       −10.31
   350   6.8   3.0     359.06        −9.09
   380   7.5   4.0     415.70       −35.74
   430   4.5   3.0     416.51        13.47
   470   6.4   3.7     420.94        49.03
   450   7.0   3.5     391.13        58.84
   490   5.0   4.0     478.15        11.83
   340   7.2   3.5     386.13       −46.16
   300   7.9   3.2     346.40       −46.44
   440   5.9   4.0     455.67       −15.70
   450   5.0   3.5     441.09         8.89
   300   7.0   2.7     331.82       −31.85

[Plot of observed Y and predicted Y across the 15 weeks]

Coefficient of Determination
The coefficient of determination (R²) is obtained using the same formula as in simple linear regression.

  Total Sum of Squares:       SST = Σ (Yi − Ȳ)²
  Regression Sum of Squares:  SSR = Σ (Ŷi − Ȳ)²
  Error Sum of Squares:       SSE = Σ (Yi − Ŷi)²
  Also, SST = SSR + SSE
  R² = SSR/SST = 1 − (SSE/SST)

R² is the proportion of variation in Y explained by the regression.

Since SST = SSR + SSE and all three quantities are non-negative,
  0 ≤ SSR ≤ SST,  so  0 ≤ SSR/SST ≤ 1,  or  0 ≤ R² ≤ 1.
When R² is close to 0, the linear fit is not good, and the X variables do not contribute much to explaining the variability in Y.
When R² is close to 1, the linear fit is good.
In the previously discussed example, R² = 0.5215.
If we consider Y and X1 only, R² = 0.1965.
If we consider Y and X2 only, R² = 0.3095.

Adjusted R²
If one more regressor is added to the model, the value of R² will increase.
This increase occurs regardless of the contribution of the newly added regressor.
So, an adjusted value of R² is defined, called the adjusted R²:

  R²_Adj = 1 − [SSE / (n − k − 1)] / [SST / (n − 1)]

This adjusted R² will only increase if the additional variable contributes to explaining the variation in Y.
For our example, adjusted R² = 0.4417.
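Under this definition, the adjusted R² for the pie example can be recovered from R², n and k (a sketch; R² = 0.5215 and the counts are the values reported on the earlier slides):

n, k = 15, 2            # observations and regressors in the pie example
r2 = 0.5215             # R-squared reported earlier

adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
print(round(adj_r2, 4))   # about 0.4417, matching the slide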

F-Test for Overall Significance

We check whether there is a linear relationship between the set of regressors (X1, X2, ..., Xk) and the response (Y).
Use the F test statistic.
To test:
  H0: β1 = β2 = ... = βk = 0 (no regressor is significant)
  H1: at least one βi ≠ 0 (at least one regressor affects Y)
The technique of Analysis of Variance is used.
Assumptions:
  n > k, Var(εi) = σ², E(εi) = 0.
  The εi's are independent. This implies that Corr(εi, εj) = 0 for i ≠ j.
  The εi's have a Normal distribution: εi ~ N(0, σ²). [NEW ASSUMPTION]

The Total Sum of Squares (SST) is partitioned into the Sum of Squares due to Regression (SSR) and the Sum of Squares due to Residuals (SSE), where

  SST = Σ (Yi − Ȳ)²
  SSE = Σ ei² = Σ (Yi − Ŷi)²
  SSR = SST − SSE

The ei's are called the residuals.

Analysis of Variance Table

  Source               df        SS     MS     Fc
  Regression           k         SSR    MSR    MSR/MSE
  Residual or Error    n−k−1     SSE    MSE
  Total                n−1       SST

Test Statistic:  Fc = MSR / MSE ~ F(k, n−k−1)

For the previous example, we wish to test
  H0: β1 = β2 = 0   against   H1: at least one βi ≠ 0

ANOVA Table
  Source               df     SS          MS          Fc        F(2,12)(0.05)
  Regression            2     29460.03    14730.01    6.5386    3.89
  Residual or Error    12     27033.31     2252.78
  Total                14     56493.33

Since Fc = 6.5386 exceeds the critical value 3.89, H0 is rejected at the 5% level of significance.
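The F statistic in this table follows directly from the sums of squares; a short verification sketch (scipy assumed only for the critical value):

from scipy import stats

n, k = 15, 2
ssr, sse = 29460.03, 27033.31

msr = ssr / k                              # 14730.01
mse = sse / (n - k - 1)                    # about 2252.78
fc = msr / mse                             # about 6.54
f_crit = stats.f.ppf(0.95, k, n - k - 1)   # about 3.89
print(round(fc, 4), round(f_crit, 2), fc > f_crit)   # True: reject H0 at the 5% level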

Individual Variables: Tests of Hypothesis

We test whether there is a linear relationship between a particular regressor Xj and Y.
Hypotheses:
  H0: βj = 0 (no linear relationship)
  H1: βj ≠ 0 (a linear relationship exists between Xj and Y)
We use a two-tailed t-test.
If H0: βj = 0 is accepted, this indicates that the variable Xj can be deleted from the model.

Test Statistic:
  Tc = bj / √(σ̂² Cjj)
Tc ~ Student's t with (n − k − 1) degrees of freedom.
  bj is the least squares estimate of βj.
  Cjj is the (j, j)th element of the matrix (X'X)⁻¹.
  σ̂² = MSE (MSE is obtained in the ANOVA table).

In our example, σ̂² = MSE = 2252.7755 and the elements of (X'X)⁻¹ have magnitudes

  (X'X)⁻¹ :  5.7946   0.3312   1.0165
             0.3312   0.0521   0.0038
             1.0165   0.0038   0.2993

(only the diagonal elements C11 = 0.0521 and C22 = 0.2993 enter the test statistics).

To test H0: β1 = 0 against H1: β1 ≠ 0:  Tc = −2.3057
To test H0: β2 = 0 against H1: β2 ≠ 0:  Tc = 2.8548
Two-tailed critical values of t at 12 d.f. are
  3.0545 for the 1% level of significance
  2.6810 for the 2% level of significance
  2.1788 for the 5% level of significance
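These t values can be reproduced from the estimates, MSE and the diagonal elements of (X'X)⁻¹ (a sketch; scipy assumed only for the critical value):

import numpy as np
from scipy import stats

mse = 2252.7755
b = {"price": -24.98, "advertising": 74.13}    # least squares estimates
c = {"price": 0.0521, "advertising": 0.2993}   # diagonal elements of (X'X)^-1

for name in b:
    tc = b[name] / np.sqrt(mse * c[name])
    print(name, round(tc, 3))      # about -2.31 and 2.85

t_crit = stats.t.ppf(0.975, df=12)
print(round(t_crit, 4))            # 2.1788: both |Tc| exceed this, so both slopes are significant at 5%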

Standard Error
As in simple linear regression, the variability of the Y values around the prediction equation is measured by the STANDARD ERROR OF THE ESTIMATE.
With k regressors it is given by

  S_YX = √[ SSE / (n − k − 1) ] = √[ Σ (Yi − Ŷi)² / (n − k − 1) ]

Assumption of Linearity
[Residual plots: patternless residuals indicate a linear relationship; a systematic curve indicates non-linearity]

Assumption of Equal Variance
We assume that Var(εi) = σ², i.e., the variance is constant for all observations.
This assumption is examined by looking at the plot of the predicted values Ŷi against the residuals ei = Yi − Ŷi.
[Residual plots: equal variance (homoscedastic) vs. unequal variance (heteroscedastic)]

Assumption of Uncorrelated Residuals

The Durbin–Watson statistic is a test statistic used to detect the presence of autocorrelation.
It is given by

  d = Σ_{i=2}^{n} (ei − e_{i−1})² / Σ_{i=1}^{n} ei²

The value of d always lies between 0 and 4.
d = 2 indicates no autocorrelation.
Small values (d < 2) indicate that successive error terms are positively correlated.
If d > 2, successive error terms are negatively correlated.
Values of d greater than 3 or less than 1 are alarming.
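A direct implementation of this statistic (a sketch, numpy assumed; the residual values plugged in below are the rounded pie-example residuals from the earlier table):

import numpy as np

def durbin_watson(residuals):
    # Durbin-Watson: sum of squared successive differences over the sum of squared residuals
    e = np.asarray(residuals, dtype=float)
    return np.sum(np.diff(e)**2) / np.sum(e**2)

e = [-63.80, 96.15, 20.88, -10.31, -9.09, -35.74, 13.47, 49.03,
     58.84, 11.83, -46.16, -46.44, -15.70, 8.89, -31.85]
print(round(durbin_watson(e), 2))   # a value between 0 and 4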

Residual Analysis for Independence (Uncorrelated Errors)
[Residual plots: independent residuals show no pattern; dependent residuals show a systematic pattern]

Assumption of Normality
When we use the F test or t test, we assume that ε1, ε2, ..., εn are normally distributed.
This assumption can be examined by a histogram of the residuals.
[Histograms: bell-shaped (normal) vs. skewed (not normal)]
Normality can also be examined using a Q-Q plot or normal probability plot.
[Q-Q plots: points along the reference line (normal) vs. systematic departures (not normal)]

Standardized Regression Coefficients

In a multiple linear regression, we may want to know which regressor contributes the most.
We obtain standardized estimates of the regression coefficients.
For that, we first standardize the observations using

  Ȳ  = (1/n) Σ Yi,     sY  = √[ Σ (Yi − Ȳ)² / (n − 1) ]
  X̄1 = (1/n) Σ X1i,    sX1 = √[ Σ (X1i − X̄1)² / (n − 1) ]
  X̄2 = (1/n) Σ X2i,    sX2 = √[ Σ (X2i − X̄2)² / (n − 1) ]

Standardize all Y, X1 and X2 values as follows:

  Standardized Yi  = (Yi − Ȳ) / sY
  Standardized X1i = (X1i − X̄1) / sX1
  Standardized X2i = (X2i − X̄2) / sX2

Fit the regression on the standardized data and obtain the least squares estimates of the regression coefficients.
These coefficients are dimensionless (unit-free) and can be compared.
Look for the regression coefficient having the largest magnitude; the corresponding regressor contributes the most.

Standardized Data

  Week   Pie Sales   Price   Advertising
    1      −0.78     −0.95      −0.37
    2       0.96      0.76      −0.37
    3      −0.78      1.18      −0.98
    4       0.48      1.18       2.09
    5      −0.78      0.16      −0.98
    6      −0.30      0.76       1.06
    7       0.48     −1.80      −0.98
    8       1.11     −0.18       0.45
    9       0.80      0.33       0.04
   10       1.43     −1.38       1.06
   11      −0.93      0.50       0.04
   12      −1.56      1.10      −0.57
   13       0.64     −0.61       1.06
   14       0.80     −1.38       0.04
   15      −1.56      0.33      −1.60

Fitted equation on the standardized data:  Ŷ = 0 − 0.461 X1 + 0.570 X2
Since |−0.461| < 0.570, X2 (Advertising) contributes the most.
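A sketch of this procedure (numpy assumed): standardize each column with the (n − 1) denominator and refit by least squares.

import numpy as np

sales = np.array([350, 460, 350, 430, 350, 380, 430, 470, 450, 490, 340, 300, 440, 450, 300], dtype=float)
price = np.array([5.50, 7.50, 8.00, 8.00, 6.80, 7.50, 4.50, 6.40, 7.00, 5.00, 7.20, 7.90, 5.90, 5.00, 7.00])
adv   = np.array([3.3, 3.3, 3.0, 4.5, 3.0, 4.0, 3.0, 3.7, 3.5, 4.0, 3.5, 3.2, 4.0, 3.5, 2.7])

def standardize(v):
    return (v - v.mean()) / v.std(ddof=1)   # sample standard deviation, (n - 1) in the denominator

ys, x1s, x2s = standardize(sales), standardize(price), standardize(adv)
X = np.column_stack([np.ones_like(ys), x1s, x2s])
beta = np.linalg.solve(X.T @ X, X.T @ ys)
print(np.round(beta, 3))   # roughly [0, -0.461, 0.570]: advertising contributes the most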

Note that:

  R²_Adj = 1 − (1 − R²)(n − 1) / (n − k − 1)
  Fc = [(n − k − 1) R²] / [k (1 − R²)]

Adjusted R² can be negative.
Adjusted R² is always less than or equal to R².
Inclusion of an intercept term is not necessary; it depends on the problem, and the analyst may decide on this.

Example: The following data were collected on the sales, number of advertisements published, and advertising expenditure for 12 weeks. Fit a regression model to predict the sales. (Some of the Ads counts were lost in extraction and are shown as –.)

  Sales (0,000 Rs)   Ads (Nos.)   Adv Ex (000 Rs)
       43.6              12            13.9
       38.0              11            12.0
       30.1               –             9.3
       35.3               –             9.7
       46.4              12            12.3
       34.2               –            11.4
       30.2               –             9.3
       40.7              13            14.3
       38.5               –            10.2
       22.6               –             8.4
       37.6               –            11.2
       35.2              10            11.1

ANOVA (b)
  Model 1       Sum of Squares   df   Mean Square     F       Sig.
  Regression        309.986       2     154.993      9.741    .006 (a)
  Residual          143.201       9      15.911
  Total             453.187      11
  a. Predictors: (Constant), Ex_Adv, No_Adv
  b. Dependent Variable: Sales

p-value < 0.05, so H0 is rejected: not all βs are zero.

Coefficients (a)
  Model 1       Unstandardized B   Std. Error   Standardized Beta      t      Sig.
  (Constant)         6.584            8.542                           .771    .461
  No_Adv              .625            1.120           .234            .558    .591
  Ex_Adv             2.139            1.470           .611           1.455    .180
  a. Dependent Variable: Sales

All p-values > 0.05, so no H0 is rejected: β0 = 0, β1 = 0, β2 = 0.

CONTRADICTION: the F test says at least one regressor is significant, but none of the individual t tests indicates a useful variable.

Multicollinearity
When we regress Y on regressors X1, X2, ..., Xk, we assume that the regressors are independent variables:
  all regressors X1, X2, ..., Xk are statistically independent of each other;
  all the regressors affect the values of Y;
  one regressor does not affect the values of another regressor.
Sometimes, in practice, this assumption is not met, and we face the problem of multicollinearity.
The correlated variables contribute redundant information to the model.

Including two highly correlated independent variables can adversely affect the regression results and can lead to unstable coefficients.
Some indications of strong multicollinearity:
  Coefficient signs may not match prior expectations.
  A large change in the value of a previous coefficient when a new variable is added to the model.
  A previously significant variable becomes insignificant when a new independent variable is added.
  The F test says at least one variable is significant, but none of the t tests indicates a useful variable.
  The standard error is large, yet the corresponding regressor is still significant.
  MSE is very high and/or R² is very small.

Examples in which this might happen:
  Miles per gallon vs. horsepower and engine size
  Income vs. age and experience
  Sales vs. number of advertisements and advertising expenditure

Variance Inflationary Factor:
VIFj is used to measure the multicollinearity generated by variable Xj.
It is given by

  VIFj = 1 / (1 − Rj²)

where Rj² is the coefficient of determination of a regression model that uses Xj as the dependent variable and all other X variables as the independent variables.

If VIFj > 5, Xj is highly correlated with the other independent variables.
Mathematically, the problem of multicollinearity occurs when the columns of the matrix X have near linear dependence.
The LSE b cannot be obtained when the matrix X'X is singular.
The matrix X'X becomes singular when the columns of X have exact linear dependence, i.e., when an eigenvalue of X'X is zero.
Thus, a near-zero eigenvalue is also an indication of multicollinearity.
Methods of dealing with multicollinearity:
  Collecting additional data
  Variable elimination
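A sketch of computing VIF for each regressor (numpy assumed): regress Xj on the remaining X columns and apply 1/(1 − Rj²).

import numpy as np

def vif(X):
    # Variance inflation factor of each column of X (regressor columns only, no intercept column)
    X = np.asarray(X, dtype=float)
    out = []
    for j in range(X.shape[1]):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        Z = np.column_stack([np.ones(len(y)), others])    # regress Xj on the other regressors
        coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
        resid = y - Z @ coef
        r2_j = 1 - resid @ resid / np.sum((y - y.mean())**2)
        out.append(1.0 / (1.0 - r2_j))
    return out

With only two regressors this reduces to 1/(1 − r²), where r is the correlation between them, so both columns get the same VIF (5.022 in the SPSS output that follows).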

Coefficients (a)
  Model 1       Unstandardized B   Std. Error   Standardized Beta      t      Sig.    Tolerance    VIF
  (Constant)         6.584            8.542                           .771    .461
  No_Adv              .625            1.120           .234            .558    .591      .199      5.022
  Ex_Adv             2.139            1.470           .611           1.455    .180      .199      5.022
  a. Dependent Variable: Sales

Tolerance = 1/VIF; both VIF values are greater than 5.

Collinearity Diagnostics (a)
  Model 1   Dimension   Eigenvalue   Condition Index   Variance Proportions (Constant, No_Adv, Ex_Adv)
                1          2.966          1.000              .00    .00    .00
                2           .030          9.882              .33    .17    .00
                3           .003         30.417              .67    .83   1.00
  a. Dependent Variable: Sales

The smallest eigenvalue (.003) is negligible and the corresponding condition index (30.417) is large, indicating multicollinearity.

We may use the method of variable elimination.
In practice, if Corr(X1, X2) is more than 0.7 or less than −0.7, we eliminate one of them.
Techniques:
  Stepwise             (based on ANOVA)
  Forward Inclusion    (based on Correlation)
  Backward Elimination (based on Correlation)

Stepwise Regression
Y = β0 + β1 X1 + β2 X2 + β3 X3 + β4 X4 + β5 X5 + ε

Step 1: Run 5 simple linear regressions:
  Y = β0 + β1 X1
  Y = β0 + β2 X2
  Y = β0 + β3 X3
  Y = β0 + β4 X4   <== has the lowest p-value (ANOVA) < 0.05
  Y = β0 + β5 X5

Step 2: Run 4 two-variable linear regressions:
  Y = β0 + β4 X4 + β1 X1
  Y = β0 + β4 X4 + β2 X2
  Y = β0 + β4 X4 + β3 X3   <== has the lowest p-value (ANOVA) < 0.05
  Y = β0 + β4 X4 + β5 X5

Step 3: Run 3 three-variable linear regressions:
  Y = β0 + β3 X3 + β4 X4 + β1 X1
  Y = β0 + β3 X3 + β4 X4 + β2 X2
  Y = β0 + β3 X3 + β4 X4 + β5 X5

Suppose none of these models has a p-value < 0.05.
STOP: the best model is the one with X3 and X4 only.
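A minimal sketch of this forward selection idea (numpy and scipy assumed; the entry criterion here is the overall-F p-value of each candidate model, as in the steps above):

import numpy as np
from scipy import stats

def model_p_value(y, cols):
    # Overall-F p-value of an OLS fit of y on the given columns (plus an intercept)
    n, k = len(y), len(cols)
    X = np.column_stack([np.ones(n)] + list(cols))
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    sse = np.sum((y - X @ b) ** 2)
    sst = np.sum((y - y.mean()) ** 2)
    f = ((sst - sse) / k) / (sse / (n - k - 1))
    return 1 - stats.f.cdf(f, k, n - k - 1)

def forward_stepwise(y, variables, alpha=0.05):
    # variables: dict mapping name -> 1-D array; greedily add the best significant candidate
    selected = []
    while True:
        remaining = [name for name in variables if name not in selected]
        if not remaining:
            break
        pvals = {name: model_p_value(y, [variables[s] for s in selected] + [variables[name]])
                 for name in remaining}
        best = min(pvals, key=pvals.get)
        if pvals[best] >= alpha:
            break                      # no candidate gives a significant model: stop
        selected.append(best)
    return selected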

Example (continued): Using the data on sales, number of advertisements published, and advertising expenditure for the 12 weeks given earlier, we fit regression models to predict the sales, first with one regressor at a time and then with both together.

Summary Output 1: Sales vs. No_Adv

Model Summary
  R = .781,  R Square = .610,  Adjusted R Square = .571,  Std. Error of the Estimate = 4.20570
  a. Predictors: (Constant), No_Adv

ANOVA (b)
  Source        Sum of Squares   df   Mean Square      F       Sig.
  Regression        276.308       1     276.308      15.621    .003 (a)
  Residual          176.879      10      17.688
  Total             453.187      11
  a. Predictors: (Constant), No_Adv
  b. Dependent Variable: Sales

Coefficients (a)
  Model 1       Unstandardized B   Std. Error   Standardized Beta      t      Sig.
  (Constant)        16.937            4.982                           3.400   .007
  No_Adv             2.083             .527            .781           3.952   .003
  a. Dependent Variable: Sales

Summary Output 2: Sales vs. Ex_Adv

Model Summary
  R = .820,  R Square = .673,  Adjusted R Square = .640,  Std. Error of the Estimate = 3.84900
  a. Predictors: (Constant), Ex_Adv

ANOVA (b)
  Source        Sum of Squares   df   Mean Square      F       Sig.
  Regression        305.039       1     305.039      20.590    .001 (a)
  Residual          148.148      10      14.815
  Total             453.187      11
  a. Predictors: (Constant), Ex_Adv
  b. Dependent Variable: Sales

Coefficients (a)
  Model 1       Unstandardized B   Std. Error   Standardized Beta      t      Sig.
  (Constant)         4.173            7.109                            .587   .570
  Ex_Adv             2.872             .633            .820           4.538   .001
  a. Dependent Variable: Sales

Summary Output 3: Sales vs. No_Adv & Ex_Adv

Model Summary
  R = .827,  R Square = .684,  Adjusted R Square = .614,  Std. Error of the Estimate = 3.98888
  a. Predictors: (Constant), Ex_Adv, No_Adv

ANOVA (b)
  Source        Sum of Squares   df   Mean Square      F       Sig.
  Regression        309.986       2     154.993       9.741    .006 (a)
  Residual          143.201       9      15.911
  Total             453.187      11
  a. Predictors: (Constant), Ex_Adv, No_Adv
  b. Dependent Variable: Sales

Coefficients (a)
  Model 1       Unstandardized B   Std. Error   Standardized Beta      t      Sig.
  (Constant)         6.584            8.542                            .771   .461
  No_Adv              .625            1.120           .234             .558   .591
  Ex_Adv             2.139            1.470           .611            1.455   .180
  a. Dependent Variable: Sales

Qualitative Independent Variables

Johnson Filtration, Inc., provides maintenance service for water filtration systems throughout southern Florida.
To estimate the service time and the service cost, the managers want to predict the repair time necessary for each maintenance request.
Repair time is believed to be related to two factors:
  Number of months since the last maintenance service
  Type of repair problem (mechanical or electrical)

Data for a sample of 10 service calls are given:

  Service Call   Months Since Last Service   Type of Repair   Repair Time (Hours)
       1                    2                  electrical            2.9
       2                    6                  mechanical            3.0
       3                    8                  electrical            4.8
       4                    3                  mechanical            1.8
       5                    2                  electrical            2.9
       6                    7                  electrical            4.9
       7                    9                  mechanical            4.2
       8                    8                  mechanical            4.8
       9                    4                  electrical            4.4
      10                    6                  electrical            4.5

Let Y denote the repair time and X1 the number of months since the last maintenance service.
The regression model that uses X1 only to regress Y is
  Y = β0 + β1 X1 + ε

Using the least squares method, we fitted the model as
  Ŷ = 2.1473 + 0.3041 X1,   R² = 0.534
At the 5% level of significance, we reject
  H0: β0 = 0 (using the t test)
  H0: β1 = 0 (using the t and F tests)
X1 alone explains 53.4% of the variability in repair time.
To introduce the type of repair into the model, we define a dummy variable
  X2 = 0, if the type of repair is mechanical
  X2 = 1, if the type of repair is electrical
The regression model that uses X1 and X2 to regress Y is
  Y = β0 + β1 X1 + β2 X2 + ε
Is the new model improved?
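A sketch of fitting both models in Python (numpy assumed); the data are the 10 service calls above, with the dummy coded 1 for electrical and 0 for mechanical:

import numpy as np

months = np.array([2, 6, 8, 3, 2, 7, 9, 8, 4, 6], dtype=float)
repair_type = ["electrical", "mechanical", "electrical", "mechanical", "electrical",
               "electrical", "mechanical", "mechanical", "electrical", "electrical"]
hours = np.array([2.9, 3.0, 4.8, 1.8, 2.9, 4.9, 4.2, 4.8, 4.4, 4.5])

x2 = np.array([1.0 if t == "electrical" else 0.0 for t in repair_type])   # dummy variable

# Model with X1 only, then with X1 and the dummy X2
for X in (np.column_stack([np.ones(10), months]),
          np.column_stack([np.ones(10), months, x2])):
    b, *_ = np.linalg.lstsq(X, hours, rcond=None)
    sse = np.sum((hours - X @ b) ** 2)
    r2 = 1 - sse / np.sum((hours - hours.mean()) ** 2)
    print(np.round(b, 4), round(r2, 3))
# The first fit reproduces roughly (2.1473, 0.3041) with R^2 of about 0.534;
# comparing R^2 across the two fits shows whether adding the repair-type dummy improves the model.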

Summary
  Multiple linear regression model: Y = Xβ + ε
  Least squares estimate of β: b = (X'X)⁻¹ X'Y
  R² and adjusted R²
  Using ANOVA (F test), we examine whether all βs are zero or not.
  A t test is conducted for each regressor separately; using it, we examine whether the β corresponding to that regressor is zero or not.
  Problem of multicollinearity: VIF, eigenvalues
  Dummy variables
  Examining the assumptions: common variance, independence, normality
