2002 Prentice-Hall, Inc.

Statistics for Managers using Microsoft Excel, 3rd Edition
Chapter 11
Simple Linear Regression

Chapter Topics
Types of regression models
Determining the simple linear regression equation
Measures of variation
Assumptions of regression and correlation
Residual analysis
Measuring autocorrelation
Inferences about the slope

Chapter Topics (continued)
Correlation - measuring the strength of the association
Estimation of mean values and prediction of individual values
Pitfalls in regression and ethical issues

Purpose of Regression Analysis
Regression analysis is used primarily to model causality and provide prediction
Predicts the value of a dependent (response) variable based on the value of at least one independent (explanatory) variable
Explains the effect of the independent variables on the dependent variable

Types of Regression Models
Positive Linear Relationship
Negative Linear Relationship
Relationship NOT Linear
No Relationship

Simple Linear Regression Model
Relationship between variables is described by a linear function
The change of one variable causes the change in the other variable
A dependency of one variable on the other

Population Linear Regression
Population regression line is a straight line that describes the dependence of the average value (conditional mean) of one variable on the other
$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$
$Y_i$: dependent (response) variable
$X_i$: independent (explanatory) variable
$\beta_0$: population Y intercept
$\beta_1$: population slope coefficient
$\varepsilon_i$: random error
$\mu_{Y|X} = \beta_0 + \beta_1 X_i$: population regression line (conditional mean)

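Population Linear Regression (continued)
[Figure: observed values of Y plotted against X around the population regression line $\mu_{Y|X} = \beta_0 + \beta_1 X_i$ (conditional mean), with intercept $\beta_0$ and slope $\beta_1$]
(Observed Value of Y) = $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
$\varepsilon_i$ = Random Error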

Sample Linear Regression
Sample regression line provides an estimate of the population regression line as well as a predicted value of Y
$$Y_i = b_0 + b_1 X_i + e_i$$
$b_0$: sample Y intercept
$b_1$: sample slope coefficient
$e_i$: residual
Sample regression line (fitted regression line, predicted value):
$$\hat{Y}_i = b_0 + b_1 X_i$$

Sample Linear Regression (continued)
$b_0$ and $b_1$ are obtained by finding the values of $b_0$ and $b_1$ that minimize the sum of the squared residuals
$$\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = \sum_{i=1}^{n} e_i^2$$
$b_0$ provides an estimate of $\beta_0$
$b_1$ provides an estimate of $\beta_1$

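Sample Linear Regression (continued)
[Figure: an observed value of Y plotted against X, with the population line $\mu_{Y|X} = \beta_0 + \beta_1 X_i$ (intercept $\beta_0$, slope $\beta_1$) and the sample line $\hat{Y}_i = b_0 + b_1 X_i$ (intercept $b_0$, slope $b_1$)]
Relative to the population line: $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
Relative to the sample line: $Y_i = b_0 + b_1 X_i + e_i$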

Interpretation of the Slope and the Intercept
$\beta_0 = E(Y \mid X = 0)$ is the average value of Y when the value of X is zero.
$\beta_1 = \dfrac{\Delta E(Y \mid X)}{\Delta X}$ measures the change in the average value of Y as a result of a one-unit change in X.

Interpretation of the Slope and the Intercept (continued)
$b_0 = \hat{E}(Y \mid X = 0)$ is the estimated average value of Y when the value of X is zero.
$b_1 = \dfrac{\Delta \hat{E}(Y \mid X)}{\Delta X}$ is the estimated change in the average value of Y as a result of a one-unit change in X.

Simple Linear Regression: Example
You want to examine the linear dependency of the annual sales of produce stores on their size in square footage. Sample data for seven stores were obtained. Find the equation of the straight line that fits the data best.

Store   Square Feet   Annual Sales ($1000)
1       1,726         3,681
2       1,542         3,395
3       2,816         6,653
4       5,555         9,543
5       1,292         3,318
6       2,208         5,563
7       1,313         3,760

Scatter Diagram: Example
[Figure: Excel output, scatter diagram of Annual Sales ($000), 0 to 12000, against Square Feet, 0 to 6000]

Equation for the Sample Regression Line: Example
$$\hat{Y}_i = b_0 + b_1 X_i = 1636.415 + 1.487 X_i$$
From Excel Printout:
               Coefficients
Intercept      1636.414726
X Variable 1   1.486633657
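The coefficients in the Excel printout can be checked by hand. The following is a minimal sketch in plain Python, not the Excel or PHStat procedure itself; the function name `fit_line` and the variable names are illustrative assumptions, and it uses the usual closed-form least-squares formulas $b_1 = \sum (X_i - \bar{X})(Y_i - \bar{Y}) / \sum (X_i - \bar{X})^2$ and $b_0 = \bar{Y} - b_1 \bar{X}$.

```python
# Produce store data from the example: size in square feet, annual sales in $1000.
square_feet = [1726, 1542, 2816, 5555, 1292, 2208, 1313]
annual_sales = [3681, 3395, 6653, 9543, 3318, 5563, 3760]


def fit_line(x, y):
    """Return (b0, b1), the intercept and slope minimizing the sum of squared residuals."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross-products of deviations, and sum of squared deviations of X.
    ssxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ssx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = ssxy / ssx
    b0 = y_bar - b1 * x_bar
    return b0, b1


b0, b1 = fit_line(square_feet, annual_sales)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")
```

Rounded to three decimals, the result should agree with the Intercept and X Variable 1 entries of the printout.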

Graph of the Sample Regression Line: Example
[Figure: Annual Sales ($000), 0 to 12000, against Square Feet, 0 to 6000, with the sample regression line drawn through the data]

Interpretation of Results: Example
$$\hat{Y}_i = 1636.415 + 1.487 X_i$$
The slope of 1.487 means that for each increase of one unit in X, we predict the average of Y to increase by an estimated 1.487 units.
The model estimates that for each increase of one square foot in the size of the store, the expected annual sales are predicted to increase by $1,487.

Simple Linear Regression in PHStat
In Excel, use PHStat | regression | simple linear regression
Excel spreadsheet of the regression of sales on footage

Measure of Variation: The Sum of Squares
SST = SSR + SSE
Total Sample Variability = Explained Variability + Unexplained Variability

Measure of Variation: The Sum of Squares (continued)
SST = total sum of squares
Measures the variation of the $Y_i$ values around their mean $\bar{Y}$
SSR = regression sum of squares
Explained variation attributable to the relationship between X and Y
SSE = error sum of squares
Variation attributable to factors other than the relationship between X and Y

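Measure of Variation: The Sum of Squares (continued)
[Figure: Y plotted against X with the fitted line and the mean $\bar{Y}$, showing the three deviations at a point $(X_i, Y_i)$]
$$SST = \sum (Y_i - \bar{Y})^2$$
$$SSE = \sum (Y_i - \hat{Y}_i)^2$$
$$SSR = \sum (\hat{Y}_i - \bar{Y})^2$$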

Venn Diagrams and Explanatory Power of Regression
[Figure: Venn diagram of two overlapping circles, Sales and Sizes]
Overlap (SSR): variations in sales explained by sizes, or variations in sizes used in explaining variation in sales
Sales outside the overlap (SSE): variations in sales explained by the error term
Sizes outside the overlap: variations in store sizes not used in explaining variation in sales

The ANOVA Table in Excel
ANOVA
             df      SS    MS                  F         Significance F
Regression   p       SSR   MSR = SSR/p         MSR/MSE   P-value of the F test
Residuals    n-p-1   SSE   MSE = SSE/(n-p-1)
Total        n-1     SST

Measures of Variation, The Sum of Squares: Example
Excel Output for Produce Stores
ANOVA
             df   SS            MS          F          Significance F
Regression   1    30380456.12   30380456    81.17909   0.000281201
Residual     5    1871199.595   374239.92
Total        6    32251655.71
Degrees of freedom (df column): regression (explained) df, error (residual) df, total df
SS column: SSR, SSE, SST
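The decomposition SST = SSR + SSE and the ANOVA entries can be recomputed from the seven data points. This is a sketch under the assumption that the least-squares line is fitted with the standard closed-form formulas; the variable names are illustrative, and none of this is the Excel implementation.

```python
# Produce store data: size in square feet, annual sales in $1000.
square_feet = [1726, 1542, 2816, 5555, 1292, 2208, 1313]
annual_sales = [3681, 3395, 6653, 9543, 3318, 5563, 3760]

n = len(square_feet)
x_bar = sum(square_feet) / n
y_bar = sum(annual_sales) / n

# Least-squares slope and intercept, then the fitted values.
ssx = sum((x - x_bar) ** 2 for x in square_feet)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(square_feet, annual_sales)) / ssx
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * x for x in square_feet]

# The three sums of squares.
sst = sum((y - y_bar) ** 2 for y in annual_sales)                 # total
ssr = sum((f - y_bar) ** 2 for f in fitted)                       # explained
sse = sum((y - f) ** 2 for y, f in zip(annual_sales, fitted))     # unexplained

# Mean squares and F statistic for p = 1 independent variable.
msr = ssr / 1
mse = sse / (n - 2)
f_stat = msr / mse

print(f"SST = {sst:.2f}, SSR = {ssr:.2f}, SSE = {sse:.2f}")
print(f"MSR = {msr:.2f}, MSE = {mse:.2f}, F = {f_stat:.4f}")
```

The printed values can be compared line by line with the SS, MS and F columns of the Excel output.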

The Coefficient of Determination
$$r^2 = \frac{SSR}{SST} = \frac{\text{Regression Sum of Squares}}{\text{Total Sum of Squares}}$$
Measures the proportion of variation in Y that is explained by the independent variable X in the regression model

Venn Diagrams and Explanatory Power of Regression
[Figure: Venn diagram of two overlapping circles, Sales and Sizes]
$$r^2 = \frac{SSR}{SST} = \frac{SSR}{SSR + SSE}$$

Coefficients of Determination ($r^2$) and Correlation ($r$)
[Figure: four scatter plots of Y against X, each with the fitted line $\hat{Y}_i = b_0 + b_1 X_i$]
$r^2 = 1$, $r = +1$
$r^2 = 1$, $r = -1$
$r^2 = .8$, $r = +0.9$
$r^2 = 0$, $r = 0$

Standard Error of Estimate
$$S_{YX} = \sqrt{\frac{SSE}{n-2}} = \sqrt{\frac{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{n-2}}$$
The standard deviation of the variation of observations around the regression line

Measures of Variation: Produce Store Example
Excel Output for Produce Stores
Regression Statistics
Multiple R          0.9705572
R Square            0.94198129
Adjusted R Square   0.93037754
Standard Error      611.751517   ($S_{YX}$)
Observations        7
$r^2 = .94$
94% of the variation in annual sales can be explained by the variability in the size of the store as measured by square footage
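The regression statistics follow directly from the sums of squares in the ANOVA table. The sketch below copies SSR, SSE and SST from the Excel output; the adjusted $r^2$ formula, $1 - (1 - r^2)(n-1)/(n-p-1)$, is the standard definition and is an assumption here, since the slides do not state it.

```python
import math

# Values copied from the Excel ANOVA table for the produce stores.
ssr = 30380456.12
sse = 1871199.595
sst = 32251655.71
n = 7  # observations
p = 1  # independent variables

r_squared = ssr / sst
multiple_r = math.sqrt(r_squared)
# Standard adjusted r-squared formula (assumed, not shown on the slides).
adjusted_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - p - 1)
# Standard error of the estimate, S_YX = sqrt(SSE / (n - 2)) when p = 1.
s_yx = math.sqrt(sse / (n - p - 1))

print(f"Multiple R        = {multiple_r:.7f}")
print(f"R Square          = {r_squared:.8f}")
print(f"Adjusted R Square = {adjusted_r_squared:.8f}")
print(f"Standard Error    = {s_yx:.6f}")
```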

Linear Regression Assumptions
1. Normality
Y values are normally distributed for each X
Probability distribution of error is normal
2. Homoscedasticity (Constant Variance)
3. Independence of Errors

Variation of Errors around the Regression Line
Y values are normally distributed around the regression line.
For each X value, the spread or variance around the regression line is the same.
[Figure: error distributions f(e) drawn at $X_1$ and $X_2$ along the sample regression line, plotted over X and Y]

Residual Analysis
Purposes
Examine linearity
Evaluate violations of assumptions
Graphical Analysis of Residuals
Plot residuals vs. $X_i$, $Y_i$ and time

Residual Analysis for Linearity
[Figure: two pairs of plots, "Not Linear" and "Linear", each showing Y against X and the residuals e against X]

Studentized Residual
$$SR_i = \frac{e_i}{S_{YX}\sqrt{1 - h_i}} \quad \text{where} \quad h_i = \frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}$$
Residual divided by its standard error
Standardized residual adjusted for the distance from the average X value
Allows us to normalize the magnitude of the residuals in units reflecting the variation around the regression line

Residual Analysis for Homoscedasticity
[Figure: two pairs of plots, "Heteroscedasticity" and "Homoscedasticity", each showing Y against X and the studentized residuals SR against X]

Residual Analysis: Excel Output for Produce Stores Example
[Figure: Excel residual plot, residuals against Square Feet, 0 to 6000]
Excel Output
Observation   Predicted Y   Residuals
1             4202.344417   -521.3444173
2             3928.803824   -533.8038245
3             5822.775103   830.2248971
4             9894.664688   -351.6646882
5             3557.14541    -239.1454103
6             4918.90184    644.0981603
7             3588.364717   171.6352829
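The residual table, and the studentized residuals defined earlier as $SR_i = e_i / (S_{YX}\sqrt{1 - h_i})$, can be reproduced from the data. This is an illustrative sketch with assumed variable names; it refits the line with the closed-form least-squares formulas rather than reading values from Excel.

```python
import math

# Produce store data: size in square feet, annual sales in $1000.
square_feet = [1726, 1542, 2816, 5555, 1292, 2208, 1313]
annual_sales = [3681, 3395, 6653, 9543, 3318, 5563, 3760]

n = len(square_feet)
x_bar = sum(square_feet) / n
y_bar = sum(annual_sales) / n
ssx = sum((x - x_bar) ** 2 for x in square_feet)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(square_feet, annual_sales)) / ssx
b0 = y_bar - b1 * x_bar

predicted = [b0 + b1 * x for x in square_feet]
residuals = [y - f for y, f in zip(annual_sales, predicted)]

# Standard error of the estimate, S_YX = sqrt(SSE / (n - 2)).
s_yx = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))

# h_i grows with the distance of X_i from the mean of X.
leverage = [1 / n + (x - x_bar) ** 2 / ssx for x in square_feet]
studentized = [e / (s_yx * math.sqrt(1 - h)) for e, h in zip(residuals, leverage)]

for i, (f, e, sr) in enumerate(zip(predicted, residuals, studentized), start=1):
    print(f"{i}: predicted = {f:10.3f}  residual = {e:9.3f}  SR = {sr:6.3f}")
```

The predicted values and residuals should match the Excel columns; the SR column is extra output that the slide does not list.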

Residual Analysis for Independence
The Durbin-Watson Statistic
Used when data are collected over time to detect autocorrelation (residuals in one time period are related to residuals in another period)
Measures violation of independence assumption
$$D = \frac{\sum_{i=2}^{n} (e_i - e_{i-1})^2}{\sum_{i=1}^{n} e_i^2}$$
Should be close to 2. If not, examine the model for autocorrelation.
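The Durbin-Watson formula translates almost directly into code. In the sketch below the function name `durbin_watson` and both residual series are hypothetical illustrations; the series are made up to show the two extremes and are not taken from the produce store example, which is not time-ordered.

```python
def durbin_watson(residuals):
    """D = sum over i=2..n of (e_i - e_{i-1})^2, divided by sum over i=1..n of e_i^2."""
    numerator = sum(
        (residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Hypothetical residuals in time order (illustration only).
slowly_drifting = [2.0, 1.5, 1.0, 0.5, -0.5, -1.0, -1.5, -2.0]   # neighbors are similar
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]       # neighbors flip sign

print(f"slowly drifting: D = {durbin_watson(slowly_drifting):.3f}")
print(f"alternating:     D = {durbin_watson(alternating):.3f}")
```

The drifting series gives a D well below 2 (positive autocorrelation) and the alternating series a D well above 2 (negative autocorrelation).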

Durbin-Watson Statistic in PHStat
PHStat | regression | simple linear regression
Check the box for Durbin-Watson Statistic

Obtaining the Critical Values of Durbin-Watson Statistic
Table 13.4 Finding critical values of Durbin-Watson Statistic
$\alpha = .05$
       p=1             p=2
n      d_L     d_U     d_L     d_U
15     1.08    1.36    .95     1.54
16     1.10    1.37    .98     1.54

Using the Durbin-Watson Statistic
$H_0$: No autocorrelation (error terms are independent)
$H_1$: There is autocorrelation (error terms are not independent)
[Figure: decision regions for D on a scale from 0 to 4, centered at 2]
0 to $d_L$: Reject $H_0$ (positive autocorrelation)
$d_L$ to $d_U$: Inconclusive
$d_U$ to $4 - d_U$: Do not reject $H_0$ (no autocorrelation)
$4 - d_U$ to $4 - d_L$: Inconclusive
$4 - d_L$ to 4: Reject $H_0$ (negative autocorrelation)
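The decision regions can be written as a small lookup. This is a sketch only: the function name `durbin_watson_decision` and its return strings are hypothetical, and treating values that fall exactly on a boundary as inconclusive is an assumption, since the slide does not say how boundaries are handled.

```python
def durbin_watson_decision(d, d_lower, d_upper):
    """Classify a Durbin-Watson statistic d using the critical values d_L and d_U."""
    if d < d_lower:
        return "reject H0: positive autocorrelation"
    if d > 4 - d_lower:
        return "reject H0: negative autocorrelation"
    if d_upper < d < 4 - d_upper:
        return "do not reject H0: no autocorrelation"
    # Between d_L and d_U, or between 4 - d_U and 4 - d_L (boundaries included here).
    return "inconclusive"


# Critical values for n = 15, p = 1, alpha = .05 from the table of critical values.
d_lower, d_upper = 1.08, 1.36
for d in (0.5, 1.2, 2.0, 2.8, 3.5):
    print(f"D = {d}: {durbin_watson_decision(d, d_lower, d_upper)}")
```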

Residual Analysis for Independence
Graphical Approach
Residual is plotted against time to detect any autocorrelation
[Figure: two plots of the residuals e against Time. Not Independent: cyclical pattern. Independent: no particular pattern]

Inference about the Slope: t Test
t test for a population slope
Is there a linear dependency of Y on X?
Null and alternative hypotheses
$H_0$: $\beta_1 = 0$ (no linear dependency)
$H_1$: $\beta_1 \neq 0$ (linear dependency)
Test statistic
$$t = \frac{b_1 - \beta_1}{S_{b_1}} \quad \text{where} \quad S_{b_1} = \frac{S_{YX}}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2}}$$
d.f. = n - 2

Example: Produce Store
Data for Seven Stores:

Store   Square Feet   Annual Sales ($000)
1       1,726         3,681
2       1,542         3,395
3       2,816         6,653
4       5,555         9,543
5       1,292         3,318
6       2,208         5,563
7       1,313         3,760

Estimated Regression Equation: $\hat{Y}_i = 1636.415 + 1.487 X_i$
The slope of this model is 1.487.
Is square footage of the store affecting its annual sales?

Inferences about the Slope: t Test Example
$H_0$: $\beta_1 = 0$
$H_1$: $\beta_1 \neq 0$
$\alpha = .05$
df = 7 - 2 = 5
Critical Value(s): $\pm 2.5706$ (reject in either tail, .025 in each)
Test Statistic, From Excel Printout:
            Coefficients   Standard Error   t Stat   P-value
Intercept   1636.4147      451.4953         3.6244   0.01515
Footage     1.4866         0.1650           9.0099   0.00028
In the Footage row, $b_1 = 1.4866$, $S_{b_1} = 0.1650$ and $t = 9.0099$.
Decision: Reject $H_0$
Conclusion: There is evidence that square footage affects annual sales.
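The t statistic in the printout can be rebuilt from the data using $t = (b_1 - \beta_1)/S_{b_1}$ with $S_{b_1} = S_{YX}/\sqrt{\sum (X_i - \bar{X})^2}$. The sketch below is an illustration with assumed variable names; the critical value 2.5706 is taken from the slide rather than computed, because the Python standard library has no t distribution.

```python
import math

# Produce store data: size in square feet, annual sales in $1000.
square_feet = [1726, 1542, 2816, 5555, 1292, 2208, 1313]
annual_sales = [3681, 3395, 6653, 9543, 3318, 5563, 3760]

n = len(square_feet)
x_bar = sum(square_feet) / n
y_bar = sum(annual_sales) / n
ssx = sum((x - x_bar) ** 2 for x in square_feet)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(square_feet, annual_sales)) / ssx
b0 = y_bar - b1 * x_bar

# Standard error of the estimate and standard error of the slope.
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(square_feet, annual_sales))
s_yx = math.sqrt(sse / (n - 2))
s_b1 = s_yx / math.sqrt(ssx)

beta1_null = 0.0                    # hypothesized slope under H0
t_stat = (b1 - beta1_null) / s_b1
t_critical = 2.5706                 # two-tailed, alpha = .05, df = 5 (from the slide)
reject_h0 = abs(t_stat) > t_critical

print(f"S_b1 = {s_b1:.4f}, t = {t_stat:.4f}, reject H0: {reject_h0}")
```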

Inferences about the Slope: Confidence Interval Example
Confidence Interval Estimate of the Slope:
$$b_1 \pm t_{n-2}\, S_{b_1}$$
Excel Printout for Produce Stores
               Lower 95%    Upper 95%
Intercept      475.810926   2797.01853
X Variable 1   1.06249037   1.91077694
At 95% level of confidence, the confidence interval for the slope is (1.062, 1.911). Does not include 0.
Conclusion: There is a significant linear dependency of annual sales on the size of the store.
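As a rough check of the interval, $b_1 \pm t_{n-2} S_{b_1}$ can be evaluated with the numbers already printed by Excel. This sketch assumes the slope from the coefficients printout, the standard error 0.1650 as shown to four decimals, and the tabulated critical value 2.5706, so the result agrees with Excel only to about three decimals.

```python
b1 = 1.486633657     # slope, from the Excel coefficients printout
s_b1 = 0.1650        # standard error of the slope, as printed (4 decimals)
t_critical = 2.5706  # t value for 95% confidence with df = n - 2 = 5 (from the slide)

margin = t_critical * s_b1
lower = b1 - margin
upper = b1 + margin
contains_zero = lower <= 0.0 <= upper

print(f"95% CI for the slope: ({lower:.3f}, {upper:.3f})")
print(f"interval contains 0: {contains_zero}")
```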

Inferences about the Slope: F Test
F test for a population slope
Is there a linear dependency of Y on X?
Null and alternative hypotheses
$H_0$: $\beta_1 = 0$ (no linear dependency)
$H_1$: $\beta_1 \neq 0$ (linear dependency)
Test statistic
$$F = \frac{SSR / 1}{SSE / (n - 2)}$$
Numerator d.f. = 1, denominator d.f. = n - 2

Relationship between a t Test and an F Test
Null and alternative hypotheses
$H_0$: $\beta_1 = 0$ (no linear dependency)
$H_1$: $\beta_1 \neq 0$ (linear dependency)
$$(t_{n-2})^2 = F_{1,\,n-2}$$

Inferences about the Slope: F Test Example
$H_0$: $\beta_1 = 0$
$H_1$: $\beta_1 \neq 0$
$\alpha = .05$
numerator df = 1
denominator df = 7 - 2 = 5
Critical value: $F_{1,\,n-2} = 6.61$ (reject in the upper tail, $\alpha = .05$)
Test Statistic, From Excel Printout:
ANOVA
             df   SS            MS            F        Significance F
Regression   1    30380456.12   30380456.12   81.179   0.000281
Residual     5    1871199.595   374239.919
Total        6    32251655.71
Decision: Reject $H_0$
Conclusion: There is evidence that square footage affects annual sales.
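The F statistic and its link to the t test, $(t_{n-2})^2 = F_{1,\,n-2}$, can be checked with the printed values. The sketch below copies SSR and SSE from the ANOVA table and the t statistic and critical values from the earlier printouts; agreement is limited by the rounding of those printed numbers.

```python
# Values copied from the Excel printouts for the produce stores.
ssr = 30380456.12
sse = 1871199.595
n = 7
t_stat = 9.0099       # t Stat for the slope (4 decimals on the printout)
t_critical = 2.5706   # two-tailed t critical value, alpha = .05, df = 5
f_critical = 6.61     # F critical value, alpha = .05, df = (1, 5)

# F = (SSR / 1) / (SSE / (n - 2))
f_stat = (ssr / 1) / (sse / (n - 2))
reject_h0 = f_stat > f_critical

print(f"F = {f_stat:.3f}, reject H0: {reject_h0}")
print(f"t^2 = {t_stat ** 2:.3f} (compare with F)")
print(f"t critical^2 = {t_critical ** 2:.3f} (compare with F critical)")
```

Both the statistics and the critical values square up to rounding, which is why the two tests reach the same decision in simple linear regression.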

Purpose of Correlation Analysis
Correlation analysis is used to measure strength of association (linear relationship) between two numerical variables
Only concerned with strength of the relationship
No causal effect is implied

Purpose of Correlation Analysis (continued)
Population correlation coefficient $\rho$ (rho) is used to measure the strength of the relationship between the variables
Sample correlation coefficient r is an estimate of $\rho$ and is used to measure the strength of the linear relationship in the sample observations

Sample of Observations from Various r Values
[Figure: five scatter plots of Y against X, for r = -1, r = -.6, r = 0, r = .6 and r = 1]

Features of $\rho$ and r
Unit free
Range between -1 and 1
The closer to -1, the stronger the negative linear relationship
The closer to 1, the stronger the positive linear relationship
The closer to 0, the weaker the linear relationship

Test for a Linear Relationship
Hypotheses
$H_0$: $\rho = 0$ (no correlation)
$H_1$: $\rho \neq 0$ (correlation)
Test statistic
$$t = \frac{r - \rho}{\sqrt{\dfrac{1 - r^2}{n - 2}}} \quad \text{where} \quad r = \sqrt{r^2} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2 \sum_{i=1}^{n} (Y_i - \bar{Y})^2}}$$

Example: Produce Stores
Is there any evidence of a linear relationship between the annual sales of a store and its square footage at .05 level of significance?
From Excel Printout
Regression Statistics
Multiple R          0.9705572   (r)
R Square            0.94198129
Adjusted R Square   0.93037754
Standard Error      611.751517
Observations        7
$H_0$: $\rho = 0$ (No association)
$H_1$: $\rho \neq 0$ (Association)
$\alpha = .05$
df = 7 - 2 = 5

Example: Produce Stores Solution
$$t = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}} = \frac{.9706}{\sqrt{\dfrac{1 - .9420}{5}}} = 9.0099$$
Critical Value(s): $\pm 2.5706$ (reject in either tail, .025 in each)
Decision: Reject $H_0$
Conclusion: There is evidence of a linear relationship at 5% level of significance
The value of the t statistic is exactly the same as the t statistic value for the test on the slope coefficient
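The sample correlation and its t statistic can be computed from the raw data. This sketch uses illustrative variable names and the test statistic under $H_0$: $\rho = 0$, namely $t = r / \sqrt{(1 - r^2)/(n - 2)}$; the critical value 2.5706 is again taken from the slide.

```python
import math

# Produce store data: size in square feet, annual sales in $1000.
square_feet = [1726, 1542, 2816, 5555, 1292, 2208, 1313]
annual_sales = [3681, 3395, 6653, 9543, 3318, 5563, 3760]

n = len(square_feet)
x_bar = sum(square_feet) / n
y_bar = sum(annual_sales) / n

ssxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(square_feet, annual_sales))
ssx = sum((x - x_bar) ** 2 for x in square_feet)
ssy = sum((y - y_bar) ** 2 for y in annual_sales)

# Sample correlation coefficient.
r = ssxy / math.sqrt(ssx * ssy)

# t statistic for H0: rho = 0, with n - 2 degrees of freedom.
t_stat = r / math.sqrt((1 - r ** 2) / (n - 2))
t_critical = 2.5706  # two-tailed, alpha = .05, df = 5 (from the slide)
reject_h0 = abs(t_stat) > t_critical

print(f"r = {r:.7f}, t = {t_stat:.4f}, reject H0: {reject_h0}")
```

The t value printed here should coincide with the t Stat reported for the slope, as the slide points out.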

Estimation of Mean Values
Confidence interval estimate for $\mu_{Y|X=X_i}$: the mean of Y given a particular $X_i$
$$\hat{Y}_i \pm t_{n-2}\, S_{YX} \sqrt{\frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}}$$
$t_{n-2}$: t value from table with df = n - 2
$S_{YX}$: standard error of the estimate
Size of interval varies according to distance away from mean, $\bar{X}$

Prediction of Individual Values
Prediction interval for individual response $Y_i$ at a particular $X_i$
$$\hat{Y}_i \pm t_{n-2}\, S_{YX} \sqrt{1 + \frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}}$$
Addition of one increases width of interval from that for the mean of Y

Interval Estimates for Different Values of X
[Figure: Y against X with the fitted line and two bands around it. At a given X, the prediction interval for an individual $Y_i$ is shown together with the confidence interval for the mean of Y; $\bar{X}$ is marked on the X axis]

Example: Produce Stores
Data for seven stores:

Store   Square Feet   Annual Sales ($000)
1       1,726         3,681
2       1,542         3,395
3       2,816         6,653
4       5,555         9,543
5       1,292         3,318
6       2,208         5,563
7       1,313         3,760

Regression Model Obtained: $\hat{Y}_i = 1636.415 + 1.487 X_i$
Predict the annual sales for a store with 2000 square feet.

Estimation of Mean Values: Example
Confidence Interval Estimate for $\mu_{Y|X=X_i}$
Find the 95% confidence interval for the average annual sales for a 2,000 square-foot store.
Predicted Sales $\hat{Y}_i = 1636.415 + 1.487 X_i = 4610.45$ ($000)
$\bar{X} = 2350.29$, $S_{YX} = 611.75$, $t_{n-2} = t_5 = 2.5706$
$$\hat{Y}_i \pm t_{n-2}\, S_{YX} \sqrt{\frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}} = 4610.45 \pm 612.66$$

Prediction Interval for Y: Example
Prediction Interval for Individual Y
Find the 95% prediction interval for the annual sales of a 2,000 square-foot store
Predicted Sales $\hat{Y}_i = 1636.415 + 1.487 X_i = 4610.45$ ($000)
$\bar{X} = 2350.29$, $S_{YX} = 611.75$, $t_{n-2} = t_5 = 2.5706$
$$\hat{Y}_i \pm t_{n-2}\, S_{YX} \sqrt{1 + \frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}} = 4610.45 \pm 1687.68$$
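Both interval half-widths for a 2,000 square-foot store can be recomputed from the data. The sketch below uses assumed variable names and takes the critical value 2.5706 from the slide. Because it keeps the unrounded coefficients, its predicted value comes out slightly below the 4610.45 shown on the slide, presumably because the slide works with the rounded slope 1.487.

```python
import math

# Produce store data: size in square feet, annual sales in $1000.
square_feet = [1726, 1542, 2816, 5555, 1292, 2208, 1313]
annual_sales = [3681, 3395, 6653, 9543, 3318, 5563, 3760]

n = len(square_feet)
x_bar = sum(square_feet) / n
y_bar = sum(annual_sales) / n
ssx = sum((x - x_bar) ** 2 for x in square_feet)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(square_feet, annual_sales)) / ssx
b0 = y_bar - b1 * x_bar
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(square_feet, annual_sales))
s_yx = math.sqrt(sse / (n - 2))

x_new = 2000          # store size of interest, in square feet
t_critical = 2.5706   # t value for 95% confidence, df = n - 2 = 5 (from the slide)

y_hat = b0 + b1 * x_new
h = 1 / n + (x_new - x_bar) ** 2 / ssx

# Half-width for the mean of Y, and for an individual Y (note the extra 1).
ci_half_width = t_critical * s_yx * math.sqrt(h)
pi_half_width = t_critical * s_yx * math.sqrt(1 + h)

print(f"predicted sales: {y_hat:.2f} ($000)")
print(f"95% CI for the mean:      +/- {ci_half_width:.2f}")
print(f"95% PI for an individual: +/- {pi_half_width:.2f}")
```

The prediction interval is always the wider of the two, since it adds the variation of a single observation around the line to the uncertainty in the estimated mean.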

Estimation of Mean Values and Prediction of Individual Values in PHStat
In Excel, use PHStat | regression | simple linear regression
Check the "confidence and prediction interval for X=" box
Excel spreadsheet of the regression of sales on footage

Pitfalls of Regression Analysis
Lacking an awareness of the assumptions underlying least-squares regression
Not knowing how to evaluate the assumptions
Not knowing the alternatives to least-squares regression if a particular assumption is violated
Using a regression model without knowledge of the subject matter

Strategies for Avoiding the Pitfalls of Regression
Start with a scatter plot of Y against X to observe a possible relationship
Perform residual analysis to check the assumptions
Use a histogram, stem-and-leaf display, box-and-whisker plot, or normal probability plot of the residuals to uncover possible non-normality

Strategies for Avoiding the Pitfalls of Regression (continued)
If there is violation of any assumption, use alternative methods to least-squares regression (e.g., least absolute deviation regression or least median of squares regression) or alternative least-squares models (e.g., curvilinear or multiple regression)
If there is no evidence of assumption violation, then test for the significance of the regression coefficients and construct confidence intervals and prediction intervals

Chapter Summary
Introduced types of regression models
Discussed determining the simple linear regression equation
Described measures of variation
Addressed assumptions of regression and correlation
Discussed residual analysis
Addressed measuring autocorrelation

Chapter Summary (continued)
Described inference about the slope
Discussed correlation - measuring the strength of the association
Addressed estimation of mean values and prediction of individual values
Discussed possible pitfalls in regression and recommended a strategy to avoid them