Statistics for Managers
using Microsoft Excel
3rd Edition
Chapter 11
Simple Linear Regression
2002 Prentice-Hall, Inc.
Chapter Topics
Types of regression models
Determining the simple linear regression
equation
Measures of variation
Assumptions of regression and correlation
Residual analysis
Measuring autocorrelation
Inferences about the slope
Chapter Topics (continued)
Correlation - measuring the strength of the association
Estimation of mean values and prediction of individual values
Pitfalls in regression and ethical issues
Purpose of Regression Analysis
Regression analysis is used primarily to
model causality and provide prediction
Predicts the value of a dependent (response)
variable based on the value of at least one
independent (explanatory) variable
Explains the effect of the independent variables
on the dependent variable
Types of Regression Models
Positive Linear Relationship
Negative Linear Relationship
Relationship NOT Linear
No Relationship
Simple Linear Regression Model
Relationship between variables
is described by a linear function
The change of one variable
causes the change in the
other variable
A dependency of one variable
on the other
Population Linear Regression
Population regression line is a straight line that describes the dependence of the average value (conditional mean) of one variable on the other
$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
where $Y_i$ is the dependent (response) variable, $\beta_0$ is the population Y intercept, $\beta_1$ is the population slope coefficient, $X_i$ is the independent (explanatory) variable, and $\varepsilon_i$ is the random error
Population regression line (conditional mean): $\mu_{Y|X} = \beta_0 + \beta_1 X_i$
Sample regression line (fitted regression line, predicted value): $\hat{Y}_i = b_0 + b_1 X_i$
Sample Linear Regression
(continued)
$b_0$ and $b_1$ are obtained by finding the values of $b_0$ and $b_1$ that minimize the sum of the squared residuals
$\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = \sum_{i=1}^{n} e_i^2$
$b_0$ provides an estimate of $\beta_0$
$b_1$ provides an estimate of $\beta_1$
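A minimal sketch of the least-squares idea above, assuming the standard closed-form solution ($b_1 = \sum (X_i - \bar{X})(Y_i - \bar{Y}) / \sum (X_i - \bar{X})^2$, $b_0 = \bar{Y} - b_1 \bar{X}$), which the slides do not spell out. The function names and the toy data are hypothetical and only illustrate that moving either coefficient away from the fitted values increases the sum of squared residuals.

```python
def fit_line(x, y):
    """Return (b0, b1) that minimize the sum of squared residuals."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ssxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ssx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = ssxy / ssx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def sum_squared_residuals(x, y, b0, b1):
    """Sum of e_i^2 where e_i = Y_i - (b0 + b1 * X_i)."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))


# Toy data (an assumption for illustration, not from the slides).
x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 3.9, 6.2, 7.8]

b0, b1 = fit_line(x, y)
best = sum_squared_residuals(x, y, b0, b1)
# Nudging either coefficient away from the fit makes the sum larger.
nudged_intercept = sum_squared_residuals(x, y, b0 + 0.1, b1)
nudged_slope = sum_squared_residuals(x, y, b0, b1 + 0.1)
print(b0, b1, best, nudged_intercept, nudged_slope)
```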
Sample Linear Regression
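(continued)
[Figure: Y plotted against X, showing an observed value; the population regression line $\mu_{Y|X} = \beta_0 + \beta_1 X_i$ with intercept $\beta_0$, slope $\beta_1$ and random error $\varepsilon_i$, where $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$; and the sample regression line $\hat{Y}_i = b_0 + b_1 X_i$ with intercept $b_0$, slope $b_1$ and residual $e_i$, where $Y_i = b_0 + b_1 X_i + e_i$]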
Interpretation of the
Slope and the Intercept
$\beta_0 = E(Y \mid X = 0)$ is the average value of Y when the value of X is zero.
$\beta_1 = \dfrac{\Delta E(Y \mid X)}{\Delta X}$ measures the change in the average value of Y as a result of a one-unit change in X.
Interpretation of the
Slope and the Intercept
(continued)
$b_0 = \hat{E}(Y \mid X = 0)$ is the estimated average value of Y when the value of X is zero.
$b_1 = \dfrac{\Delta \hat{E}(Y \mid X)}{\Delta X}$ is the estimated change in the average value of Y as a result of a one-unit change in X.
Simple Linear Regression:
Example
You want to examine
the linear dependency
of the annual sales of
produce stores on their
size in square footage.
Sample data for seven
stores were obtained.
Find the equation of
the straight line that
fits the data best.
Store   Square Feet   Annual Sales ($1000)
1       1,726         3,681
2       1,542         3,395
3       2,816         6,653
4       5,555         9,543
5       1,292         3,318
6       2,208         5,563
7       1,313         3,760
Scatter Diagram: Example
[Figure: Excel output, scatter diagram of Annual Sales ($000) against Square Feet]
Equation for the Sample
Regression Line: Example
$\hat{Y}_i = b_0 + b_1 X_i = 1636.415 + 1.487 X_i$
From Excel Printout:
                Coefficients
Intercept       1636.414726
X Variable 1    1.486633657
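As a cross-check on the Excel coefficients, the fit can be reproduced from the seven-store data. This is a sketch using the usual closed-form least-squares formulas in plain Python rather than Excel or PHStat; the variable names are illustrative.

```python
# Seven produce stores from the example: size and annual sales.
square_feet = [1726, 1542, 2816, 5555, 1292, 2208, 1313]
annual_sales = [3681, 3395, 6653, 9543, 3318, 5563, 3760]  # in $1000

n = len(square_feet)
x_bar = sum(square_feet) / n
y_bar = sum(annual_sales) / n

# Closed-form least-squares estimates of slope and intercept.
ssxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(square_feet, annual_sales))
ssx = sum((x - x_bar) ** 2 for x in square_feet)
b1 = ssxy / ssx
b0 = y_bar - b1 * x_bar

print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")
```

The printed values should agree with the Intercept and X Variable 1 coefficients in the printout up to rounding.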
Graph of the Sample
Regression Line: Example
[Figure: scatter diagram of Annual Sales ($000) against Square Feet with the sample regression line]
Interpretation of Results:
Example
$\hat{Y}_i = 1636.415 + 1.487 X_i$
The slope of 1.487 means that for each increase of one unit in X, we predict the average of Y to increase by an estimated 1.487 units.
Because sales are recorded in thousands of dollars, the model estimates that for each increase of one square foot in the size of the store, the expected annual sales are predicted to increase by $1,487.
Simple Linear Regression in
PHStat
In Excel, use PHStat | regression | simple linear regression
Excel spreadsheet of the regression of sales on footage
Measure of Variation:
The Sum of Squares
SST = SSR + SSE
Total Sample Variability = Explained Variability + Unexplained Variability
Measure of Variation:
The Sum of Squares (continued)
SST = total sum of squares
Measures the variation of the $Y_i$ values around their mean $\bar{Y}$
SSR = regression sum of squares
Explained variation attributable to the relationship between X and Y
SSE = error sum of squares
Variation attributable to factors other than the relationship between X and Y
Measure of Variation:
The Sum of Squares
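(continued)
[Figure: Y plotted against X with the fitted line and the mean $\bar{Y}$, marking the three deviations for an observation at $X_i$]
SST = $\sum (Y_i - \bar{Y})^2$
SSE = $\sum (Y_i - \hat{Y}_i)^2$
SSR = $\sum (\hat{Y}_i - \bar{Y})^2$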
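The decomposition SST = SSR + SSE can be checked numerically on the produce store data. The sketch below refits the line with the usual closed-form least-squares formulas and then computes the three sums of squares directly from their definitions; variable names are illustrative.

```python
square_feet = [1726, 1542, 2816, 5555, 1292, 2208, 1313]
annual_sales = [3681, 3395, 6653, 9543, 3318, 5563, 3760]  # in $1000

n = len(square_feet)
x_bar = sum(square_feet) / n
y_bar = sum(annual_sales) / n
ssx = sum((x - x_bar) ** 2 for x in square_feet)
b1 = sum((x - x_bar) * (y - y_bar)
         for x, y in zip(square_feet, annual_sales)) / ssx
b0 = y_bar - b1 * x_bar

fitted = [b0 + b1 * x for x in square_feet]

sst = sum((y - y_bar) ** 2 for y in annual_sales)               # total
ssr = sum((y_hat - y_bar) ** 2 for y_hat in fitted)             # explained
sse = sum((y - y_hat) ** 2 for y, y_hat in zip(annual_sales, fitted))  # unexplained

print(f"SST = {sst:.2f}, SSR = {ssr:.2f}, SSE = {sse:.2f}")
print(f"SSR + SSE = {ssr + sse:.2f}")
```

The three values should match the SS column of the Excel ANOVA output for the produce stores up to rounding.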
Venn Diagrams and
Explanatory Power of Regression
[Figure: Venn diagram of two overlapping circles, Sales and Sizes]
Overlap (SSR): Variations in sales explained by sizes or variations in sizes used in explaining variation in sales
Sales only (SSE): Variations in sales explained by the error term
Sizes only: Variations in store sizes not used in explaining variation in sales
The ANOVA Table in Excel
ANOVA
             df      SS    MS                  F         Significance F
Regression   p       SSR   MSR = SSR/p         MSR/MSE   P-value of the F Test
Residuals    n-p-1   SSE   MSE = SSE/(n-p-1)
Total        n-1     SST
Measures of Variation
The Sum of Squares: Example
Excel Output for Produce Stores
ANOVA
             df   SS            MS          F          Significance F
Regression   1    30380456.12   30380456    81.17909   0.000281201
Residual     5    1871199.595   374239.92
Total        6    32251655.71
Degrees of freedom (df column): Regression (explained) df = 1, Error (residual) df = 5, Total df = 6
SS column: SSR = 30380456.12, SSE = 1871199.595, SST = 32251655.71
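The MS and F entries follow from the SS and df columns by the formulas in the generic ANOVA table. A short sketch of that arithmetic, taking SSR and SSE from the Excel output as given (the significance level of F is not recomputed here, since that needs an F distribution routine):

```python
# Values read from the Excel ANOVA output for the produce stores.
ssr = 30380456.12
sse = 1871199.595
n = 7   # observations
p = 1   # independent variables

df_regression = p
df_residual = n - p - 1
df_total = n - 1

msr = ssr / df_regression
mse = sse / df_residual
f_stat = msr / mse

print(f"MSR = {msr:.2f}, MSE = {mse:.2f}, F = {f_stat:.5f}")
```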
The Coefficient of Determination
Measures the proportion of variation in Y
that is explained by the independent
variable X in the regression model
$r^2 = \dfrac{SSR}{SST} = \dfrac{\text{Regression Sum of Squares}}{\text{Total Sum of Squares}}$
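Applied to the produce store output, the ratio can be evaluated directly. This sketch takes SSR and SST from the Excel ANOVA table and gives r the sign of the fitted slope, which is the usual convention in simple linear regression; variable names are illustrative.

```python
import math

ssr = 30380456.12   # regression sum of squares (Excel output)
sst = 32251655.71   # total sum of squares (Excel output)
b1 = 1.486633657    # fitted slope (Excel output)

r_squared = ssr / sst
# In simple linear regression r takes the sign of the slope.
r = math.copysign(math.sqrt(r_squared), b1)

print(f"r^2 = {r_squared:.4f}, r = {r:.4f}")
```

About 94% of the variation in annual sales is explained by store size in this sample.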
Venn Diagrams and
Explanatory Power of Regression
[Figure: Venn diagram of two overlapping circles, Sales and Sizes]
$r^2 = \dfrac{SSR}{SSR + SSE}$
Coefficients of Determination ($r^2$)
and Correlation (r)
[Figure: four scatter plots of Y against X, each with the fitted line $\hat{Y}_i = b_0 + b_1 X_i$]
Panel 1: $r^2 = 1$, r = +1
Panel 2: $r^2 = 1$, r = -1
Panel 3: $r^2 = .8$, r = +0.9
Panel 4: $r^2 = 0$, r = 0
Standard Error of Estimate
The standard deviation of the variation of
observations around the regression line
$S_{YX} = \sqrt{\dfrac{SSE}{n-2}} = \sqrt{\dfrac{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{n-2}}$
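For the produce stores, the standard error of the estimate follows from the SSE in the Excel ANOVA output. A small sketch of the arithmetic, with SSE and n taken as given:

```python
import math

sse = 1871199.595   # error sum of squares (Excel ANOVA output)
n = 7               # number of stores

# Standard error of the estimate: residual variation around the fitted line.
s_yx = math.sqrt(sse / (n - 2))

print(f"S_YX = {s_yx:.4f}")
```

The result should agree with the Standard Error entry in Excel's regression statistics for this example.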
The Durbin-Watson statistic should be close to 2.
If not, examine the model for autocorrelation.
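The formula for the statistic does not survive in this excerpt. The sketch below assumes the standard definition, $D = \sum_{i=2}^{n} (e_i - e_{i-1})^2 / \sum_{i=1}^{n} e_i^2$, applied to residuals in time order. The function name and the toy residual series are hypothetical.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals listed in time order.

    Assumes the standard definition: squared successive differences
    divided by the sum of squared residuals.
    """
    numerator = sum((residuals[i] - residuals[i - 1]) ** 2
                    for i in range(1, len(residuals)))
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Toy residual series (assumptions for illustration only).
alternating = [1.0, -1.0, 1.0, -1.0]   # sign flips every period
persistent = [1.0, 1.0, 1.0, 1.0]      # same sign every period

d_alternating = durbin_watson(alternating)
d_persistent = durbin_watson(persistent)
print(d_alternating, d_persistent)
```

Residuals that flip sign each period push the statistic toward 4, while residuals that persist push it toward 0, which is why values near 2 are taken as the sign of no autocorrelation.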
Durbin-Watson Statistic
in PHStat
PHStat | regression | simple linear regression
Check the box for Durbin-Watson Statistic
Obtaining the Critical Values of
Durbin-Watson Statistic
$\alpha = .05$
        p=1              p=2
n       d_L     d_U      d_L     d_U
15      1.08    1.36     .95     1.54
16      1.10    1.37     .98     1.54
Table 13.4 Finding critical values of Durbin-Watson Statistic
Using the
Durbin-Watson Statistic
$H_0$: No autocorrelation (error terms are independent)
$H_1$: There is autocorrelation (error terms are not independent)
Decision regions for the statistic on the scale from 0 to 4:
0 to $d_L$: Reject $H_0$ (positive autocorrelation)
$d_L$ to $d_U$: Inconclusive
$d_U$ to $4 - d_U$: Accept $H_0$ (no autocorrelation)
$4 - d_U$ to $4 - d_L$: Inconclusive
$4 - d_L$ to 4: Reject $H_0$ (negative autocorrelation)
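The decision regions can be written as a small lookup. This is a sketch only: the function name and the returned strings are hypothetical, the treatment of values that fall exactly on a boundary is an arbitrary choice, and the critical values in the usage lines are the n = 15, p = 1 entries from the critical value table.

```python
def durbin_watson_decision(d, d_lower, d_upper):
    """Classify a Durbin-Watson statistic d given critical values d_L and d_U."""
    if d < d_lower:
        return "reject H0: positive autocorrelation"
    if d < d_upper:
        return "inconclusive"
    if d <= 4 - d_upper:
        return "accept H0: no autocorrelation"
    if d <= 4 - d_lower:
        return "inconclusive"
    return "reject H0: negative autocorrelation"


# Critical values for n = 15, p = 1 at alpha = .05 (from the table).
d_l, d_u = 1.08, 1.36
for d in (0.9, 1.2, 2.0, 2.8, 3.5):
    print(d, durbin_watson_decision(d, d_l, d_u))
```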
Residual Analysis
for Independence
Graphical Approach
[Figure: two plots of residuals e against Time. Not Independent: Cyclical Pattern. Independent: No Particular Pattern.]
Residual is plotted against time to detect any autocorrelation
Inference about the Slope:
t Test
t test for a population slope
Is there a linear dependency of Y on X ?
Null and alternative hypotheses
$H_0$: $\beta_1 = 0$ (no linear dependency)
$H_1$: $\beta_1 \neq 0$ (linear dependency)
Test statistic
$t = \dfrac{b_1 - \beta_1}{S_{b_1}}$ where $S_{b_1} = \dfrac{S_{YX}}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2}}$
$d.f. = n - 2$
Example: Produce Store
Data for Seven Stores:
Store   Square Feet   Annual Sales ($000)
1       1,726         3,681
2       1,542         3,395
3       2,816         6,653
4       5,555         9,543
5       1,292         3,318
6       2,208         5,563
7       1,313         3,760
Estimated Regression Equation: $\hat{Y}_i = 1636.415 + 1.487 X_i$
The slope of this model is 1.487.
Is square footage of the store affecting its annual sales?
Inferences about the Slope:
t Test Example
$H_0$: $\beta_1 = 0$
$H_1$: $\beta_1 \neq 0$
$\alpha = .05$
df = 7 - 2 = 5
Critical Value(s): $t = \pm 2.5706$ (.025 in each rejection region)
Test Statistic: From Excel Printout
            Coefficients   Standard Error   t Stat   P-value
Intercept   1636.4147      451.4953         3.6244   0.01515
Footage     1.4866         0.1650           9.0099   0.00028
In the Footage row, Coefficients is $b_1$, Standard Error is $S_{b_1}$, and t Stat is $t$.
Decision: Reject $H_0$
Conclusion: There is evidence that square footage affects annual sales.
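The Footage row of the printout can be reproduced from the raw data with the test statistic formula for the slope. This sketch refits the line with the closed-form least-squares formulas, then computes $S_{YX}$, $S_{b_1}$ and $t$; the critical value 2.5706 is taken from the slide rather than computed, and the variable names are illustrative.

```python
import math

square_feet = [1726, 1542, 2816, 5555, 1292, 2208, 1313]
annual_sales = [3681, 3395, 6653, 9543, 3318, 5563, 3760]  # in $1000

n = len(square_feet)
x_bar = sum(square_feet) / n
y_bar = sum(annual_sales) / n
ssx = sum((x - x_bar) ** 2 for x in square_feet)
b1 = sum((x - x_bar) * (y - y_bar)
         for x, y in zip(square_feet, annual_sales)) / ssx
b0 = y_bar - b1 * x_bar

# Standard error of the estimate and of the slope.
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(square_feet, annual_sales))
s_yx = math.sqrt(sse / (n - 2))
s_b1 = s_yx / math.sqrt(ssx)

# t statistic under H0: beta_1 = 0, compared with the two-tail critical value.
t_stat = (b1 - 0.0) / s_b1
t_critical = 2.5706   # t with 5 df, .025 in each tail (from the slide)
reject_h0 = abs(t_stat) > t_critical

print(f"S_b1 = {s_b1:.4f}, t = {t_stat:.4f}, reject H0: {reject_h0}")
```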
Inferences about the Slope:
Confidence Interval Example
Confidence Interval Estimate of the Slope:
$b_1 \pm t_{n-2} S_{b_1}$
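The numerical interval for this example does not survive in the excerpt. As a sketch, it can be computed by plugging the printed values for the produce stores into $b_1 \pm t_{n-2} S_{b_1}$: slope 1.4866, standard error 0.1650, and $t_5 = 2.5706$ for 95% confidence. The bounds inherit the rounding of those printed inputs.

```python
b1 = 1.4866          # slope (Excel printout, Footage row)
s_b1 = 0.1650        # standard error of the slope (Excel printout)
t_critical = 2.5706  # t with n - 2 = 5 df for 95% confidence (from the slides)

margin = t_critical * s_b1
lower = b1 - margin
upper = b1 + margin

print(f"95% confidence interval for the slope: ({lower:.4f}, {upper:.4f})")
```

Because the interval lies entirely above zero, it points the same way as the t test: the data are not consistent with a zero slope at the 5% level.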
Example: Produce Stores
From Excel Printout
Regression Statistics
Multiple R          0.9705572    (r)
R Square            0.94198129
Adjusted R Square   0.93037754
Standard Error      611.751517
Observations        7
Is there any evidence of a linear relationship between the annual sales of a store and its square footage at .05 level of significance?
$H_0$: $\rho = 0$ (No association)
$H_1$: $\rho \neq 0$ (Association)
$\alpha = .05$
df = 7 - 2 = 5
Example:
Produce Stores Solution
Test Statistic: $t = \dfrac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}} = \dfrac{.9706}{\sqrt{\dfrac{1 - .9420}{5}}} = 9.0099$
Critical Value(s): $t = \pm 2.5706$ (.025 in each rejection region)
Decision: Reject $H_0$
Conclusion: There is evidence of a linear relationship at 5% level of significance
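The test statistic for the correlation coefficient can be checked with the unrounded r and r-squared from Excel's regression statistics. A minimal sketch, with the critical value taken from the slide:

```python
import math

r = 0.9705572           # Multiple R (Excel printout)
r_squared = 0.94198129  # R Square (Excel printout)
n = 7

# t statistic for H0: rho = 0, with n - 2 degrees of freedom.
t_stat = r / math.sqrt((1 - r_squared) / (n - 2))
t_critical = 2.5706     # t with 5 df, .025 in each tail (from the slide)
reject_h0 = abs(t_stat) > t_critical

print(f"t = {t_stat:.4f}, reject H0: {reject_h0}")
```

The value matches the t statistic for the slope in the same example, as expected in simple linear regression.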
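Confidence interval for the mean of Y at a particular $X_i$
$\hat{Y}_i \pm t_{n-2} S_{YX} \sqrt{\dfrac{1}{n} + \dfrac{(X_i - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}}$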
Prediction of Individual Values
Prediction interval for individual response $Y_i$ at a particular $X_i$
$\hat{Y}_i \pm t_{n-2} S_{YX} \sqrt{1 + \dfrac{1}{n} + \dfrac{(X_i - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}}$
The additional 1 under the square root makes this interval wider than the one for the mean of Y
Confidence Interval Estimate for $\mu_{Y|X=X_i}$
Predicted Sales $\hat{Y}_i = 1636.415 + 1.487 X_i = 4610.45$ ($000)
$\bar{X} = 2350.29$, $S_{YX} = 611.75$, $t_{n-2} = t_5 = 2.5706$
$\hat{Y}_i \pm t_{n-2} S_{YX} \sqrt{\dfrac{1}{n} + \dfrac{(X_i - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}} = 4610.45 \pm 612.66$
Prediction Interval for Y :
Example
Find the 95% prediction interval
for the annual sales of a 2,000 square-foot store
Predicted Sales $\hat{Y}_i = 1636.415 + 1.487 X_i = 4610.45$ ($000)
$\bar{X} = 2350.29$, $S_{YX} = 611.75$, $t_{n-2} = t_5 = 2.5706$
$\hat{Y}_i \pm t_{n-2} S_{YX} \sqrt{1 + \dfrac{1}{n} + \dfrac{(X_i - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}} = 4610.45 \pm 1687.68$
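Both interval half-widths for a 2,000 square-foot store can be recomputed from the raw data. The sketch below refits the line with the closed-form least-squares formulas and uses the slide's critical value 2.5706. Because it keeps the unrounded coefficients, its point prediction differs slightly from the 4610.45 shown on the slides, while the half-widths agree with 612.66 and 1687.68 to within rounding. Variable names are illustrative.

```python
import math

square_feet = [1726, 1542, 2816, 5555, 1292, 2208, 1313]
annual_sales = [3681, 3395, 6653, 9543, 3318, 5563, 3760]  # in $1000

n = len(square_feet)
x_bar = sum(square_feet) / n
y_bar = sum(annual_sales) / n
ssx = sum((x - x_bar) ** 2 for x in square_feet)
b1 = sum((x - x_bar) * (y - y_bar)
         for x, y in zip(square_feet, annual_sales)) / ssx
b0 = y_bar - b1 * x_bar
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(square_feet, annual_sales))
s_yx = math.sqrt(sse / (n - 2))

x_new = 2000          # store size of interest, in square feet
t_critical = 2.5706   # t with 5 df for 95% confidence (from the slides)
y_hat = b0 + b1 * x_new

# Shared term: 1/n plus the squared distance of x_new from the mean of X.
h = 1 / n + (x_new - x_bar) ** 2 / ssx

half_width_mean = t_critical * s_yx * math.sqrt(h)            # mean of Y
half_width_individual = t_critical * s_yx * math.sqrt(1 + h)  # single store

print(f"predicted sales = {y_hat:.2f}")
print(f"95% CI for the mean: +/- {half_width_mean:.2f}")
print(f"95% PI for an individual store: +/- {half_width_individual:.2f}")
```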