Chapter 03

IOM 530: Intro.
to Statistical Learning
LINEAR REGRESSION
Chapter 03
IOM 530: Intro. to Statistical Learning
Outline
The Linear Regression Model
Least Squares Fit
Measures of Fit
Inference in Regression
Other Considerations in Regression Model
Qualitative Predictors
Interaction Terms
Potential Fit Problems
Linear vs. KNN Regression
Outline
Least Squares Fit
Measures of Fit
Interaction Terms

Yi = b0 + b1X1 + b2 X2 +
+ bp Xp +e
The parameters in the linear regression model are very
easy to interpret.
0 is the intercept (i.e. the average value for Y if all the
Xs are zero), j is the slope for the jth variable Xj
j is the average increase in Y when Xj is increased by
one and all other Xs are held constant.
Least Squares Fit

We estimate the
parameters using
least squares i.e.
minimize
1
MSE = Yi - Yi
n i=1
1 n
= Yi - b0 - b1 X1 n i=1
- b p X p
Relationship between population and

least squares lines
Population
line
Yi = b0 + b1X1 + b2 X2 +
Least Squares
line
Yi = b0 + b1 X1 + b2 X2 +
+ bp Xp +e
+ b p X p
We would like to know 0 through p i.e. the population line.

Instead we know 0 through p i.e. the least squares line.
Hence we use 0 through p as guesses for 0 through p and Yi
as a guess for Yi. The guesses will not be perfect just as X is
not a perfect guess for .
Measures of Fit: R2
Some of the variation in Y can be explained by variation
in the Xs and some cannot.
R2 tells you the fraction of variance that can be explained
by X.
RSS
Ending Variance
R 1
1
2
Starting Variance
(Yi Y )
2
R2 is always between 0 and 1. Zero means no variance has

been explained. One means it has all been explained (perfect
fit to the data).
Estimated (least squares) line.
The regression line from the
sample is not the regression line

from the population.
What we want to do:
Assess how well the line

describes the plot.
Guess the slope of the
population line.
Guess what value Y would take
for a given X value
14
12
10
8
6
4
2
0
-10
-5
0
X
10
True (population) line. Unobserved
Some Relevant Questions

1.
Is j=0 or not? We can use a hypothesis test to answer

this question. If we cant be sure that j0 then there is
no point in using Xj as one of our predictors.
1.
Can we be sure that at least one of our X variables is a

useful predictor i.e. is it the case that 1= 2== p=0?
10
1. Is j=0 i.e. is Xj an important variable?

We use a hypothesis test to answer this question
H0: j=0
Calculate
vs
t
Ha: j0
Number of standard deviations
away from zero.
SE( j )
If t is large (equivalently p-value is small) we can be sure
that j0 and that there is a relationship

Regression coefficients
Coefficient
Constant
7.0326
TV
0.0475
1 is 17.67 SEs from 0
Std Err
0.4578
0.0027
SE( 1 )
t-value
15.3603
17.6676
p-value
0.0000
0.0000
P-value
11
Testing Individual Variables

Is there a (statistically detectable) linear relationship between
Newspapers and Sales after all the other variables have been
accounted for?
Coefficient
Constant
2.9389
TV
0.0458
Radio
0.1885
Newspaper
-0.0010
Coefficient
Constant
12.3514
Newspaper
0.0547
Std Err
0.3119
0.0014
0.0086
0.0059
Std Err
0.6214
0.0166
t-value
9.4223
32.8086
21.8935
-0.1767
t-value
19.8761
3.2996
p-value
0.0000
0.0000
0.0000
0.8599
p-value
0.0000
0.0011
No: big p-value

Small p-value in
simple regression
Almost all the explaining that Newspapers could do in simple

regression has already been done by TV and Radio in multiple
regression!
12
2. Is the whole regression explaining

anything at all?
Test for:
H0: all slopes = 0
(1=2==p=0),
Ha: at least one slope 0

ANOVA Table
Source
Explained
Unexplained
df
2
197
SS
4860.2347
556.9140
MS
2430.1174
2.8270
F
859.6177
p-value
0.0000
Answer comes from the F test in the ANOVA (ANalysis Of

VAriance) table.
The ANOVA table has many pieces of information. What we
care about is the F Ratio and the corresponding p-value.
Outline
Least Squares Fit
Measures of Fit
Interaction Terms
13
14
How do you stick men and women (category listings)
into a regression equation?

Code them as indicator variables (dummy variables)
For example we can code Males=0 and Females= 1.
15
Interpretation
Suppose we want to include income and gender.
Two genders (male and female). Let
0 if male
Genderi =
1 if female
then the regression equation is
b0 + b1Incomei if male
Yi b0 + b1Incomei + b2Genderi =
b0 + b1Incomei + b2 if female
2 is the average extra balance each month that females
have for given income level. Males are the baseline.
Constant
Income
Gender_Female
Coefficient
233.7663
0.0061
24.3108
Std Err
39.5322
0.0006
40.8470
t-value
5.9133
10.4372
0.5952
p-value
0.0000
0.0000
0.5521
16
Other Coding Schemes

There are different ways to code categorical variables.
Two genders (male and female). Let
-1 if male
Genderi =
1 if female
then the regression equation is
b0 + b1Incomei -b2, if male

Yi b0 + b1Incomei + b2Genderi =
b0 + b1Incomei + b2 , if female
2 is the average amount that females are above the
average, for any given income level. 2 is also the

average amount that males are below the average, for
any given income level.
Other Issues Discussed

Interaction terms
Non-linear effects
Multicollinearity
Model Selection
17
18
Interaction
When the effect on Y of increasing X1 depends on
another X2.
Example:
Maybe the effect on Salary (Y) when increasing Position (X1)
depends on gender (X2)?
For example maybe Male salaries go up faster (or slower)
than Females as they get promoted.
Advertising example:
TV and radio advertising both increase sales.
Perhaps spending money on both of them may increase
sales more than spending the same amount on one alone?
19
Interaction in advertising
Sales = b0 + b1 TV + b2 Radio+ b3 TV Radio
Sales = b0 +(b1 + b3 Radio)TV + b2 Radio
Spending $1 extra on TV increases average sales
Interaction Term
by 0.0191 + 0.0011Radio
Sales = b0 + (b2 + b3 TV) Radio + b2 TV
Spending $1 extra on Radio increases average
sales by 0.0289 + 0.0011TV

Parame ter Estimates
Term
Intercept
TV
Radio
TV*Radio
Estimate
6.7502202
0.0191011
0.0288603
0.0010865
Std Error t Ratio Prob>|t|

0.247871 27.23 <.0001*
0.001504 12.70 <.0001*
0.008905
3.24 0.0014*
5.242e-5 20.73 <.0001*
20
Parallel Regression Lines

Line for women
Expanded Estimates
170
Nominal factors expanded to all levels

Term
Estimate Std Error
t Ratio
Prob>|t|
Intercept
Gender[female]
Gender[m ale]
Pos ition
77.52
3.53
-3.53
21.60
<.0001
0.0005
0.0005
<.0001
112.77039
1.8600957
-1.860096
6.0553559
1.454773
0.527424
0.527424
0.280318
160
150
140
Regression equation
130
female: salary = 112.77+1.86 + 6.05 position
120
males: salary = 112.77-1.86 + 6.05 position
Different
intercepts
Same
slopes
2.5
7.5
Position
Line for men

Parallel lines have the same slope.
Dummy variables give lines
different intercepts, but their
slopes are still the same.
10
21
Interaction Effects
Our model has forced the line for men and the line for
women to be parallel.
Parallel lines say that promotions have the same salary
benefit for men as for women.

If lines arent parallel then promotions affect mens and
womens salaries differently.
22
Should the Lines be Parallel?

170
160
150
140
Expanded Estimate s
130
Nominal factors expanded to all levels

Term
Estimate Std Error
t Ratio
Prob>|t|
Intercept
112.63081
Gender[female]
1.1792165
Gender[m ale]
-1.179216
Pos ition
6.1021378
Gender[female]*Position0.1455111
Gender[m ale]*Pos ition -0.145511
75.85
0.79
-0.79
20.58
0.49
-0.49
<.0001
0.4280
0.4280
<.0001
0.6242
0.6242
1.484825
1.484825
1.484825
0.296554
0.296554
0.296554
Interaction between gender and position
120
110
0 1 2 3 4
5 6 7
8 9 10
Position
Interaction is not significant
Outline
Least Squares Fit
Measures of Fit
Interaction Terms
23
24

There are a number of possible problems that one may
encounter when fitting the linear regression model.
1. Non-linearity of the data
2. Dependence of the error terms
3. Non-constant variance of error terms
4. Outliers
5. High leverage points
6. Collinearity
See Section 3.3.3 for more details.
Outline
Least Squares Fit
Measures of Fit
Interaction Terms
25
26
KNN Regression
kNN Regression is similar to the kNN classifier.
To predict Y for a given value of X, consider k closest
points to X in training data and take the average of the

responses. i.e.
f (x) =
1
yi
K xi Ni
If k is small kNN is much more flexible than linear
regression.
Is that better?
KNN Fits for k =1 and k = 9
27
28
KNN Fits in One Dimension (k =1 and k =

9)
Linear Regression Fit
29
KNN vs. Linear Regression
30
Not So Good in High Dimensional

Situations
31

Chapter 03

Încărcat de

Informații document

Drepturi de autor

Formate disponibile

Partajați acest document

Partajați sau inserați document

Opțiuni de partajare

Vi se pare util acest document?

Este necorespunzător acest conținut?

Drepturi de autor:

Formate disponibile

Chapter 03

Încărcat de

Drepturi de autor:

Formate disponibile

IOM 530: Intro.

IOM 530: Intro. to Statistical Learning

IOM 530: Intro. to Statistical Learning

IOM 530: Intro. to Statistical Learning

The Linear Regression Model

The parameters in the linear regression model are very

Xs are zero), j is the slope for the jth variable Xj

j is the average increase in Y when Xj is increased by

one and all other Xs are held constant.

IOM 530: Intro. to Statistical Learning

Least Squares Fit

IOM 530: Intro. to Statistical Learning

Relationship between population and

We would like to know 0 through p i.e. the population line.

IOM 530: Intro. to Statistical Learning

in the Xs and some cannot.

R2 tells you the fraction of variance that can be explained

R2 is always between 0 and 1. Zero means no variance has

IOM 530: Intro. to Statistical Learning

The regression line from the

sample is not the regression line

Assess how well the line

True (population) line. Unobserved

IOM 530: Intro. to Statistical Learning

Some Relevant Questions

Is j=0 or not? We can use a hypothesis test to answer

Can we be sure that at least one of our X variables is a

IOM 530: Intro. to Statistical Learning

1. Is j=0 i.e. is Xj an important variable?

If t is large (equivalently p-value is small) we can be sure

that j0 and that there is a relationship

1 is 17.67 SEs from 0

IOM 530: Intro. to Statistical Learning

Testing Individual Variables

No: big p-value

Almost all the explaining that Newspapers could do in simple

IOM 530: Intro. to Statistical Learning

2. Is the whole regression explaining

H0: all slopes = 0

Ha: at least one slope 0

Answer comes from the F test in the ANOVA (ANalysis Of

IOM 530: Intro. to Statistical Learning

IOM 530: Intro. to Statistical Learning

into a regression equation?

For example we can code Males=0 and Females= 1.

IOM 530: Intro. to Statistical Learning

then the regression equation is

have for given income level. Males are the baseline.

IOM 530: Intro. to Statistical Learning

Other Coding Schemes

then the regression equation is

b0 + b1Incomei -b2, if male

average, for any given income level. 2 is also the

IOM 530: Intro. to Statistical Learning

Other Issues Discussed

IOM 530: Intro. to Statistical Learning

IOM 530: Intro. to Statistical Learning

sales by 0.0289 + 0.0011TV

Std Error t Ratio Prob>|t|

IOM 530: Intro. to Statistical Learning

Parallel Regression Lines

Nominal factors expanded to all levels

female: salary = 112.77+1.86 + 6.05 position

males: salary = 112.77-1.86 + 6.05 position