
Business Statistics, 5th ed.
by Ken Black



Chapter 14

Simple Regression Analysis
PowerPoint presentations prepared by Lloyd Jaisingh,
Morehead State University
Learning Objectives
Compute the equation of a simple regression line from a sample of data, and interpret the slope and intercept of the equation.
Understand the usefulness of residual analysis in testing the assumptions underlying regression analysis and in examining the fit of the regression line to the data.
Compute a standard error of the estimate and interpret its meaning.
Compute a coefficient of determination and interpret it.
Test hypotheses about the slope of the regression model and interpret the results.
Estimate values of Y using the regression model.
Regression and Correlation
Regression analysis is the process of constructing a mathematical model or function that can be used to predict or determine one variable by another variable.

Correlation is a measure of the degree of relatedness of two variables.


Simple Regression Analysis

Bivariate (two-variable) linear regression: the most elementary regression model
Dependent variable: the variable to be predicted, usually called Y
Independent variable: the predictor or explanatory variable, usually called X

Airline Cost Data
Number of Passengers (X)    Cost in $1,000 (Y)
61                          4.280
63                          4.080
67                          4.420
69                          4.170
70                          4.480
74                          4.300
76                          4.820
81                          4.700
86                          5.110
91                          5.130
95                          5.640
97                          5.560
Scatter Plot of Airline Cost Data
[Figure: scatterplot of Cost($1,000) vs. Number of Passengers. Note: the scales start at 0.]
Scatter Plot of Airline Cost Data
[Figure: scatterplot of Cost($1,000) vs. Number of Passengers. Note: the scales do not start at 0.]
Regression Models
Deterministic Regression Model

$$Y = \beta_0 + \beta_1 X$$

Probabilistic Regression Model

$$Y = \beta_0 + \beta_1 X + \varepsilon$$

$\beta_0$ and $\beta_1$ are population parameters.

$\beta_0$ and $\beta_1$ are estimated by the sample statistics $b_0$ and $b_1$.
Equation of the Simple Regression Line

$$\hat{Y} = b_0 + b_1 X$$

where:
$b_0$ = the sample intercept
$b_1$ = the sample slope
$\hat{Y}$ = the predicted value of Y
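Least Squares Analysis

$$b_1 = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2} = \frac{\sum XY - n\bar{X}\bar{Y}}{\sum X^2 - n\bar{X}^2} = \frac{\sum XY - \frac{(\sum X)(\sum Y)}{n}}{\sum X^2 - \frac{(\sum X)^2}{n}}$$

$$b_0 = \bar{Y} - b_1 \bar{X} = \frac{\sum Y}{n} - b_1 \frac{\sum X}{n}$$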
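Least Squares Analysis

$$SS_{XY} = \sum (X - \bar{X})(Y - \bar{Y}) = \sum XY - \frac{(\sum X)(\sum Y)}{n}$$

$$SS_{XX} = \sum (X - \bar{X})^2 = \sum X^2 - \frac{(\sum X)^2}{n}$$

$$b_1 = \frac{SS_{XY}}{SS_{XX}}$$

$$b_0 = \bar{Y} - b_1 \bar{X} = \frac{\sum Y}{n} - b_1 \frac{\sum X}{n}$$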
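
The computational formulas above translate almost line for line into code. The following is a minimal sketch in plain Python, assuming the twelve airline observations listed in the deck; the function name `least_squares` is a hypothetical choice for illustration, not something the slides define.

```python
# Airline cost data from the slides: passengers (X) and cost in $1,000 (Y).
X = [61, 63, 67, 69, 70, 74, 76, 81, 86, 91, 95, 97]
Y = [4.28, 4.08, 4.42, 4.17, 4.48, 4.30, 4.82, 4.70, 5.11, 5.13, 5.64, 5.56]


def least_squares(x, y):
    """Return (b0, b1) using the SS_XY / SS_XX computational formulas."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    # SS_XY = sum(XY) - (sum X)(sum Y) / n
    ss_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum_x * sum_y / n
    # SS_XX = sum(X^2) - (sum X)^2 / n
    ss_xx = sum(xi ** 2 for xi in x) - sum_x ** 2 / n
    b1 = ss_xy / ss_xx
    b0 = sum_y / n - b1 * sum_x / n
    return b0, b1


b0, b1 = least_squares(X, Y)
print(f"b0 = {b0:.4f}, b1 = {b1:.4f}")
```

The results should agree with the slide values of roughly 1.57 for the intercept and .0407 for the slope.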

Solving for $b_1$ and $b_0$ of the Regression Line: Airline Cost Example (Part 1)

Number of Passengers (X)    Cost in $1,000 (Y)    X^2      XY
61                          4.28                  3,721    261.08
63                          4.08                  3,969    257.04
67                          4.42                  4,489    296.14
69                          4.17                  4,761    287.73
70                          4.48                  4,900    313.60
74                          4.30                  5,476    318.20
76                          4.82                  5,776    366.32
81                          4.70                  6,561    380.70
86                          5.11                  7,396    439.46
91                          5.13                  8,281    466.83
95                          5.64                  9,025    535.80
97                          5.56                  9,409    539.32

$\sum X = 930$
$\sum Y = 56.69$
$\sum X^2 = 73{,}764$
$\sum XY = 4{,}462.22$
Solving for $b_1$ and $b_0$ of the Regression Line: Airline Cost Example (Part 2)

$$SS_{XY} = \sum XY - \frac{(\sum X)(\sum Y)}{n} = 4{,}462.22 - \frac{(930)(56.69)}{12} = 68.745$$

$$SS_{XX} = \sum X^2 - \frac{(\sum X)^2}{n} = 73{,}764 - \frac{(930)^2}{12} = 1689$$

$$b_1 = \frac{SS_{XY}}{SS_{XX}} = \frac{68.745}{1689} = .0407$$

$$b_0 = \frac{\sum Y}{n} - b_1 \frac{\sum X}{n} = \frac{56.69}{12} - (.0407)\frac{930}{12} = 1.57$$

$$\hat{Y} = 1.57 + .0407X$$
Graph of Regression Line for the Airline Cost Example
[Figure: scatterplot with fitted line, Cost($1,000) vs. Number of Passengers. Plot label: Cost($1,000) = 1570 + 40.70 Number of Passengers]
Airline Cost: Excel Summary Output
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.94820033
R Square             0.89908386
Adjusted R Square    0.88899225
Standard Error       0.17721746
Observations         12

ANOVA
              df    SS         MS         F            Significance F
Regression    1     2.79803    2.79803    89.092179    2.7E-06
Residual      10    0.31406    0.03141
Total         11    3.11209

                        Coefficients    Standard Error    t Stat     P-value
Intercept               1.56979278      0.33808           4.64322    0.0009175
Number of Passengers    0.0407016       0.00431           9.43887    2.692E-06
Airline Cost: MINITAB Summary Output
Residual Analysis: Airline Cost Example

Number of Passengers (X)    Cost in $1,000 (Y)    Predicted Value (Ŷ)    Residual (Y - Ŷ)
61                          4.28                  4.053                   .227
63                          4.08                  4.134                  -.054
67                          4.42                  4.297                   .123
69                          4.17                  4.378                  -.208
70                          4.48                  4.419                   .061
74                          4.30                  4.582                  -.282
76                          4.82                  4.663                   .157
81                          4.70                  4.867                  -.167
86                          5.11                  5.070                   .040
91                          5.13                  5.274                  -.144
95                          5.64                  5.436                   .204
97                          5.56                  5.518                   .042

$\sum (Y - \hat{Y}) = -.001$
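
A residual table like this one can be regenerated from the raw data. The sketch below is one way to do it in plain Python, assuming the airline data from the deck and refitting the line at full precision, so the last digit may differ slightly from values computed with the rounded coefficients 1.57 and .0407.

```python
# Airline cost data from the slides: passengers (X) and cost in $1,000 (Y).
X = [61, 63, 67, 69, 70, 74, 76, 81, 86, 91, 95, 97]
Y = [4.28, 4.08, 4.42, 4.17, 4.48, 4.30, 4.82, 4.70, 5.11, 5.13, 5.64, 5.56]
n = len(X)

# Least squares coefficients via the SS_XY / SS_XX formulas.
ss_xy = sum(x * y for x, y in zip(X, Y)) - sum(X) * sum(Y) / n
ss_xx = sum(x * x for x in X) - sum(X) ** 2 / n
b1 = ss_xy / ss_xx
b0 = sum(Y) / n - b1 * sum(X) / n

# Predicted values and residuals (observed minus predicted).
predicted = [b0 + b1 * x for x in X]
residuals = [y - y_hat for y, y_hat in zip(Y, predicted)]

for x, y, y_hat, e in zip(X, Y, predicted, residuals):
    print(f"{x:3d}  {y:5.2f}  {y_hat:6.3f}  {e:7.3f}")
print(f"sum of residuals = {sum(residuals):.6f}")
```

With unrounded coefficients the residuals of a least squares line sum to zero up to floating-point error; the -.001 in the table comes from rounding each residual to three decimals.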
Excel Graph of Residuals for the Airline Cost Example
[Figure: residuals plotted against Number of Passengers]
MINITAB Graph of Residuals for the Airline Cost Example
[Figure: scatterplot of Residuals vs. Number of Passengers, with a reference line at 0]
Nonlinear Residual Plot
Nonconstant Error Variance
Graphs of Nonindependent Error Terms
Healthy Residual Plot
Demonstration Problem 14.2: MINITAB Computations for Residuals
Demonstration Problem 14.2: MINITAB Graphical Display of Residuals
[Figure: Residual Plots for Cost($1,000), four panels: Normal Probability Plot of the Residuals; Residuals Versus the Fitted Values; Histogram of the Residuals; Residuals Versus the Order of the Data]
Standard Error of the Estimate

Sum of Squares Error:

$$SSE = \sum (Y - \hat{Y})^2 = \sum Y^2 - b_0 \sum Y - b_1 \sum XY$$

Standard Error of the Estimate:

$$S_e = \sqrt{\frac{SSE}{n - 2}}$$
Determining SSE for the Airline Cost Example

Number of Passengers (X)    Cost in $1,000 (Y)    Residual (Y - Ŷ)    (Y - Ŷ)^2
61                          4.28                   .227                .05153
63                          4.08                  -.054                .00292
67                          4.42                   .123                .01513
69                          4.17                  -.208                .04326
70                          4.48                   .061                .00372
74                          4.30                  -.282                .07952
76                          4.82                   .157                .02465
81                          4.70                  -.167                .02789
86                          5.11                   .040                .00160
91                          5.13                  -.144                .02074
95                          5.64                   .204                .04162
97                          5.56                   .042                .00176

$\sum (Y - \hat{Y}) = -.001$
$\sum (Y - \hat{Y})^2 = .31434$
Sum of squares of error = SSE = .31434
Determining SSE for the Airline Cost Example: MINITAB Output
SSE = 0.3141
Standard Error of the Estimate for the Airline Cost Example

Sum of Squares Error:

$$SSE = \sum (Y - \hat{Y})^2 = 0.31434$$

Standard Error of the Estimate:

$$S_e = \sqrt{\frac{SSE}{n - 2}} = \sqrt{\frac{0.31434}{10}} = 0.1773$$
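
To make the two SSE formulas concrete, here is a minimal Python sketch that computes SSE both from the residuals and from the computational shortcut, then takes the standard error of the estimate. It assumes the airline data from the deck and unrounded coefficients, so it lands on the 0.3141 reported by MINITAB rather than the .31434 obtained from residuals rounded to three decimals.

```python
import math

# Airline cost data from the slides: passengers (X) and cost in $1,000 (Y).
X = [61, 63, 67, 69, 70, 74, 76, 81, 86, 91, 95, 97]
Y = [4.28, 4.08, 4.42, 4.17, 4.48, 4.30, 4.82, 4.70, 5.11, 5.13, 5.64, 5.56]
n = len(X)

sum_x, sum_y = sum(X), sum(Y)
sum_xy = sum(x * y for x, y in zip(X, Y))
sum_x2 = sum(x * x for x in X)
sum_y2 = sum(y * y for y in Y)

# Least squares coefficients.
b1 = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
b0 = sum_y / n - b1 * sum_x / n

# SSE as the sum of squared residuals.
sse_direct = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(X, Y))
# SSE via the shortcut: sum(Y^2) - b0 * sum(Y) - b1 * sum(XY).
sse_shortcut = sum_y2 - b0 * sum_y - b1 * sum_xy

# Standard error of the estimate with n - 2 degrees of freedom.
s_e = math.sqrt(sse_direct / (n - 2))
print(f"SSE = {sse_direct:.4f} (shortcut {sse_shortcut:.4f}), S_e = {s_e:.4f}")
```

The shortcut subtracts large, nearly equal quantities, so it is more sensitive to rounded coefficients than the direct sum of squared residuals.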
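Coefficient of Determination

$$SS_{YY} = \sum (Y - \bar{Y})^2 = \sum Y^2 - \frac{(\sum Y)^2}{n}$$

$$SS_{YY} = \text{explained variation} + \text{unexplained variation}$$

$$SS_{YY} = SSR + SSE$$

$$1 = \frac{SSR}{SS_{YY}} + \frac{SSE}{SS_{YY}}$$

$$r^2 = \frac{SSR}{SS_{YY}} = 1 - \frac{SSE}{SS_{YY}} = 1 - \frac{SSE}{\sum Y^2 - \frac{(\sum Y)^2}{n}}$$

$$0 \leq r^2 \leq 1$$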
Coefficient of Determination for the Airline Cost Example

$$SSE = 0.31434$$

$$SS_{YY} = \sum Y^2 - \frac{(\sum Y)^2}{n} = 270.9251 - \frac{(56.69)^2}{12} = 3.11209$$

$$r^2 = 1 - \frac{SSE}{SS_{YY}} = 1 - \frac{.31434}{3.11209} = .899$$

89.9% of the variability of the cost of flying a Boeing 737 is accounted for by the number of passengers.
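
The decomposition of total variation into explained and unexplained parts can be checked numerically. This is a small Python sketch under the same assumptions as the slides (the twelve airline observations, a least squares fit); the variable names are illustrative.

```python
# Airline cost data from the slides: passengers (X) and cost in $1,000 (Y).
X = [61, 63, 67, 69, 70, 74, 76, 81, 86, 91, 95, 97]
Y = [4.28, 4.08, 4.42, 4.17, 4.48, 4.30, 4.82, 4.70, 5.11, 5.13, 5.64, 5.56]
n = len(X)

# Least squares coefficients.
ss_xy = sum(x * y for x, y in zip(X, Y)) - sum(X) * sum(Y) / n
ss_xx = sum(x * x for x in X) - sum(X) ** 2 / n
b1 = ss_xy / ss_xx
b0 = sum(Y) / n - b1 * sum(X) / n

# Total variation SS_YY, unexplained variation SSE, explained variation SSR.
ss_yy = sum(y * y for y in Y) - sum(Y) ** 2 / n
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(X, Y))
ssr = ss_yy - sse

r_squared = 1 - sse / ss_yy
print(f"SS_YY = {ss_yy:.5f}, SSR = {ssr:.5f}, SSE = {sse:.5f}")
print(f"r^2 = {r_squared:.4f}")
```

The SSR, SSE, and SS_YY values should line up with the Regression, Residual, and Total rows of the ANOVA table in the Excel output.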
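Hypothesis Tests for the Slope of the Regression Model

$$H_0: \beta_1 = 0 \qquad H_1: \beta_1 \neq 0$$

$$H_0: \beta_1 \leq 0 \qquad H_1: \beta_1 > 0$$

$$H_0: \beta_1 \geq 0 \qquad H_1: \beta_1 < 0$$

$$t = \frac{b_1 - \beta_1}{S_b}$$

where:

$$S_b = \frac{S_e}{\sqrt{SS_{XX}}}$$

$$S_e = \sqrt{\frac{SSE}{n - 2}}$$

$$SS_{XX} = \sum X^2 - \frac{(\sum X)^2}{n}$$

$\beta_1$ = the hypothesized slope
$df = n - 2$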
Hypothesis Test: Airline Cost Example

$$H_0: \beta_1 = 0 \qquad H_1: \beta_1 \neq 0$$

$$\alpha = .05$$

$$df = n - 2 = 12 - 2 = 10$$

$$t_{.025,10} = 2.228$$

If $|t| > 2.228$, reject $H_0$.
If $-2.228 \leq t \leq 2.228$, do not reject $H_0$.
Hypothesis Test: Airline Cost Example
$|t| = 9.44 > 2.228$, so reject $H_0$.

Note: P-value = 0.000
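
The observed t value can be reproduced from the data. The sketch below assumes the airline observations and a two-tailed test of a zero slope at the .05 level; the critical value 2.228 is taken from the slide rather than computed, since the standard library has no t distribution.

```python
import math

# Airline cost data from the slides: passengers (X) and cost in $1,000 (Y).
X = [61, 63, 67, 69, 70, 74, 76, 81, 86, 91, 95, 97]
Y = [4.28, 4.08, 4.42, 4.17, 4.48, 4.30, 4.82, 4.70, 5.11, 5.13, 5.64, 5.56]
n = len(X)

# Least squares coefficients.
ss_xy = sum(x * y for x, y in zip(X, Y)) - sum(X) * sum(Y) / n
ss_xx = sum(x * x for x in X) - sum(X) ** 2 / n
b1 = ss_xy / ss_xx
b0 = sum(Y) / n - b1 * sum(X) / n

# Standard error of the estimate, then standard error of the slope.
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(X, Y))
s_e = math.sqrt(sse / (n - 2))
s_b = s_e / math.sqrt(ss_xx)

HYPOTHESIZED_SLOPE = 0.0   # beta_1 under H0
T_CRITICAL = 2.228         # t(.025, 10) as quoted on the slide

t_stat = (b1 - HYPOTHESIZED_SLOPE) / s_b
reject_h0 = abs(t_stat) > T_CRITICAL
print(f"S_b = {s_b:.6f}, t = {t_stat:.2f}, reject H0: {reject_h0}")
```

The slope standard error and t statistic correspond to the Number of Passengers row in the Excel and MINITAB coefficient tables.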
Testing the Overall Model

$$H_0: \beta_1 = 0 \qquad H_1: \beta_1 \neq 0$$

$$\alpha = .05$$

$$df_{reg} = k = 1$$

$$df_{err} = n - k - 1 = 12 - 1 - 1 = 10$$

$$F_{.05,1,10} = 4.96$$

If $F > 4.96$, reject $H_0$.
If $F \leq 4.96$, do not reject $H_0$.
Testing the Overall Model
$F = 89.09 > 4.96$, so reject $H_0$.

Note: P-value = 0.000
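
As a rough check on the F statistic, the following Python sketch rebuilds the mean squares from the airline data. It assumes one predictor (k = 1) and uses the slide's critical value 4.96 as a constant rather than computing it.

```python
# Airline cost data from the slides: passengers (X) and cost in $1,000 (Y).
X = [61, 63, 67, 69, 70, 74, 76, 81, 86, 91, 95, 97]
Y = [4.28, 4.08, 4.42, 4.17, 4.48, 4.30, 4.82, 4.70, 5.11, 5.13, 5.64, 5.56]
n = len(X)
k = 1  # number of independent variables

# Least squares coefficients.
ss_xy = sum(x * y for x, y in zip(X, Y)) - sum(X) * sum(Y) / n
ss_xx = sum(x * x for x in X) - sum(X) ** 2 / n
b1 = ss_xy / ss_xx
b0 = sum(Y) / n - b1 * sum(X) / n

# Sums of squares: total, error, and regression.
ss_yy = sum(y * y for y in Y) - sum(Y) ** 2 / n
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(X, Y))
ssr = ss_yy - sse

# Mean squares and the F ratio.
ms_reg = ssr / k
ms_err = sse / (n - k - 1)
f_stat = ms_reg / ms_err

F_CRITICAL = 4.96  # F(.05, 1, 10) as quoted on the slide
print(f"MSR = {ms_reg:.5f}, MSE = {ms_err:.5f}, F = {f_stat:.2f}")
print("reject H0" if f_stat > F_CRITICAL else "do not reject H0")
```

In simple regression the F statistic equals the square of the slope's t statistic, which is why 89.09 is approximately 9.44 squared.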
Point Estimation for the Airline Cost Example

$$\hat{Y} = 1.57 + 0.0407X$$

For X = 73:

$$\hat{Y} = 1.57 + 0.0407(73) = 4.5411$$

or $4,541.10
Confidence Interval to Estimate $E(Y_X)$: Airline Cost Example

$$\hat{Y} \pm t_{\alpha/2,\,n-2}\, S_e \sqrt{\frac{1}{n} + \frac{(X_0 - \bar{X})^2}{SS_{XX}}}$$

where:
$X_0$ = a particular value of X
$SS_{XX} = \sum X^2 - \frac{(\sum X)^2}{n}$

For $X_0 = 73$ and a 95% confidence level:

$$4.5411 \pm (2.228)(0.1773)\sqrt{\frac{1}{12} + \frac{(73 - 77.5)^2}{73{,}764 - \frac{(930)^2}{12}}} = 4.5411 \pm .1220$$

$$4.4191 \leq E(Y_{73}) \leq 4.6631$$
Confidence Interval to Estimate the Average Value of Y for some Values of X: Airline Cost Example

X     Ŷ ± margin         Confidence Interval
62    4.0934 ± .1876     3.9058 to 4.2810
68    4.3376 ± .1461     4.1915 to 4.4837
73    4.5411 ± .1220     4.4191 to 4.6631
85    5.0295 ± .1349     4.8946 to 5.1644
90    5.2330 ± .1656     5.0674 to 5.3986
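
The rows of this table can be regenerated with a few lines of Python. The sketch below deliberately uses the rounded values the slides work with (intercept 1.57, slope .0407, standard error 0.1773, critical t of 2.228) so that the output is comparable to the table; `mean_ci` is a hypothetical helper name.

```python
import math

# Passenger counts from the airline data; only X is needed for X-bar and SS_XX.
X = [61, 63, 67, 69, 70, 74, 76, 81, 86, 91, 95, 97]
n = len(X)
x_bar = sum(X) / n
ss_xx = sum(x * x for x in X) - sum(X) ** 2 / n

# Rounded values as used on the slides (assumed here to match the table).
B0, B1 = 1.57, 0.0407
S_E = 0.1773
T_CRITICAL = 2.228  # t(.025, 10)


def mean_ci(x0):
    """Return (point estimate, half-width) of the 95% CI for E(Y) at x0."""
    y_hat = B0 + B1 * x0
    half = T_CRITICAL * S_E * math.sqrt(1 / n + (x0 - x_bar) ** 2 / ss_xx)
    return y_hat, half


for x0 in (62, 68, 73, 85, 90):
    y_hat, half = mean_ci(x0)
    print(f"{x0}  {y_hat:.4f} +/- {half:.4f}  "
          f"{y_hat - half:.4f} to {y_hat + half:.4f}")
```

The half-width is smallest near the mean of X (77.5) and grows as the chosen value moves away from it, which is why the rows for 62 and 90 have the widest intervals.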
Confidence Interval to Estimate the Average Value of Y for some Values of X: Airline Cost Example
[Figure: MINITAB fitted line plot with regression line and 95% CI bands, Cost($1,000) vs. Number Passengers. Cost($1,000) = 1.570 + 0.04070 Number Passengers; S = 0.177217, R-Sq = 89.9%, R-Sq(adj) = 88.9%]
Prediction Interval to Estimate Y for a given value of X

$$\hat{Y} \pm t_{\alpha/2,\,n-2}\, S_e \sqrt{1 + \frac{1}{n} + \frac{(X_0 - \bar{X})^2}{SS_{XX}}}$$

where:
$X_0$ = a particular value of X
$SS_{XX} = \sum X^2 - \frac{(\sum X)^2}{n}$
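
The only difference from the confidence interval for the mean is the extra 1 under the square root. The Python sketch below illustrates the effect at X0 = 73, assuming the rounded slide values (1.57, .0407, 0.1773, 2.228); the slides do not print a numeric prediction interval, so the result here is computed from the formula only.

```python
import math

# Passenger counts from the airline data; only X is needed for X-bar and SS_XX.
X = [61, 63, 67, 69, 70, 74, 76, 81, 86, 91, 95, 97]
n = len(X)
x_bar = sum(X) / n
ss_xx = sum(x * x for x in X) - sum(X) ** 2 / n

# Rounded values as used on the slides (an assumption of this sketch).
B0, B1 = 1.57, 0.0407
S_E = 0.1773
T_CRITICAL = 2.228  # t(.025, 10)


def interval_half_width(x0, single_value):
    """Half-width for a single Y (prediction) or for the mean of Y."""
    extra = 1.0 if single_value else 0.0
    spread = extra + 1 / n + (x0 - x_bar) ** 2 / ss_xx
    return T_CRITICAL * S_E * math.sqrt(spread)


x0 = 73
y_hat = B0 + B1 * x0
pi_half = interval_half_width(x0, single_value=True)
ci_half = interval_half_width(x0, single_value=False)
print(f"95% PI: {y_hat - pi_half:.4f} to {y_hat + pi_half:.4f}")
print(f"95% CI for the mean: {y_hat - ci_half:.4f} to {y_hat + ci_half:.4f}")
```

The prediction interval is always wider than the confidence interval at the same X0, because it covers the scatter of an individual observation as well as the uncertainty in the fitted line.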
Confidence & Prediction Intervals for Estimation
[Figure: MINITAB fitted line plot with regression line, 95% CI bands, and 95% PI bands, Cost($1,000) vs. Number Passengers. Cost($1,000) = 1.570 + 0.04070 Number Passengers; S = 0.177217, R-Sq = 89.9%, R-Sq(adj) = 88.9%]
MINITAB Regression Analysis of the Airline Cost Example
The regression equation is
Cost = 1.57 + 0.0407 Number of Passengers
Predictor Coef StDev T P
Constant 1.5698 0.3381 4.64 0.001
Number o 0.040702 0.004312 9.44 0.000
S = 0.1772 R-Sq = 89.9% R-Sq(adj) = 88.9%
Analysis of Variance
Source DF SS MS F P
Regression 1 2.7980 2.7980 89.09 0.000
Residual Error 10 0.3141 0.0314
Total 11 3.1121
Obs Number o Cost Fit StDev Fit Residual St Resid
1 61.0 4.2800 4.0526 0.0876 0.2274 1.48
2 63.0 4.0800 4.1340 0.0808 -0.0540 -0.34
3 67.0 4.4200 4.2968 0.0683 0.1232 0.75
4 69.0 4.1700 4.3782 0.0629 -0.2082 -1.26
5 70.0 4.4800 4.4189 0.0605 0.0611 0.37
6 74.0 4.3000 4.5817 0.0533 -0.2817 -1.67
7 76.0 4.8200 4.6631 0.0516 0.1569 0.93
8 81.0 4.7000 4.8666 0.0533 -0.1666 -0.99
9 86.0 5.1100 5.0701 0.0629 0.0399 0.24
10 91.0 5.1300 5.2736 0.0775 -0.1436 -0.90
11 95.0 5.6400 5.4364 0.0912 0.2036 1.34
12 97.0 5.5600 5.5178 0.0984 0.0422 0.29
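Pearson Product-Moment Correlation Coefficient

$$r = \frac{SS_{XY}}{\sqrt{(SS_{XX})(SS_{YY})}} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \sum (Y - \bar{Y})^2}} = \frac{\sum XY - \frac{(\sum X)(\sum Y)}{n}}{\sqrt{\left[\sum X^2 - \frac{(\sum X)^2}{n}\right]\left[\sum Y^2 - \frac{(\sum Y)^2}{n}\right]}}$$

$$-1 \leq r \leq 1$$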
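
For a concrete instance of the formula, this short Python sketch computes r for the airline data using the sums-of-squares form. It assumes the twelve observations listed in the deck and uses only the standard library.

```python
import math

# Airline cost data from the slides: passengers (X) and cost in $1,000 (Y).
X = [61, 63, 67, 69, 70, 74, 76, 81, 86, 91, 95, 97]
Y = [4.28, 4.08, 4.42, 4.17, 4.48, 4.30, 4.82, 4.70, 5.11, 5.13, 5.64, 5.56]
n = len(X)

# Sums of squares and cross products (computational forms).
ss_xy = sum(x * y for x, y in zip(X, Y)) - sum(X) * sum(Y) / n
ss_xx = sum(x * x for x in X) - sum(X) ** 2 / n
ss_yy = sum(y * y for y in Y) - sum(Y) ** 2 / n

# Pearson product-moment correlation coefficient.
r = ss_xy / math.sqrt(ss_xx * ss_yy)
print(f"r = {r:.4f}, r^2 = {r * r:.4f}")
```

The value should match the Multiple R entry in the Excel summary output, and its square should match R Square.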
Pearson Product-Moment Correlation Coefficient: MINITAB Output for Airline Cost Example
Three Degrees of Correlation
r < 0 r > 0
r = 0
Using Regression to Develop a Forecasting Trend Line: Scatter Plot of Huntsville Chemicals
[Figure: scatterplot of Sales($millions) vs. Year, year axis from 1996 to 2006]
Using Regression to Develop a Forecasting Trend Line: Huntsville Chemicals
Sales = - 5320 + 2.669 Year
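
Once a trend line is fitted, forecasting is just evaluating it at a future value of the time variable. The sketch below is purely illustrative: it uses the coefficients exactly as printed on the slide, and both the function name `forecast_sales` and the year 2007 are assumptions of this example. Because the printed intercept is rounded to four significant digits while the year is in the thousands, results from these coefficients are only approximate.

```python
def forecast_sales(year):
    """Trend-line estimate of sales in $ millions, using the printed coefficients."""
    intercept = -5320.0   # as printed on the slide (rounded)
    slope = 2.669         # change in sales ($ millions) per year
    return intercept + slope * year


# Hypothetical forecast year, one step past the plotted 1996-2006 axis range.
print(f"Forecast for 2007: {forecast_sales(2007):.3f} ($ millions)")
```

Refitting from the underlying sales data, or coding the years as 1, 2, 3 and so on, would avoid the rounding problem.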
Using Regression to Develop a Forecasting Trend Line: Fitted Line Plot of Huntsville Chemicals
[Figure: MINITAB fitted line plot of Sales($millions) vs. Year. Sales($millions) = - 5320 + 2.669 Year; S = 1.68131, R-Sq = 96.3%, R-Sq(adj) = 95.8%]
Using Regression to Develop a Forecasting Trend Line: Huntsville Chemicals
[Figure: scatterplot of Sales($millions) and fitted values (FITS1) vs. Year]
Copyright 2008 John Wiley & Sons, Inc.
All rights reserved. Reproduction or translation
of this work beyond that permitted in section 117
of the 1976 United States Copyright Act without
express permission of the copyright owner is
unlawful. Request for further information should
be addressed to the Permissions Department, John
Wiley & Sons, Inc. The purchaser may make
back-up copies for his/her own use only and not
for distribution or resale. The Publisher assumes
no responsibility for errors, omissions, or damages
caused by the use of these programs or from the
use of the information herein.
