
EM 561 GW DePuy 286


8-1 Introduction
In the previous chapter we illustrated how a parameter can be estimated from sample data.
Example: estimate the mean length, μ, of an adult foot. Measure 28 students and find X̄ = 26.322 cm. Is this a good estimate for μ?
Do we believe μ EXACTLY equals 26.322 cm (= X̄)?
Rather than estimate μ with a single value (called a point estimate), let's develop a range or interval of values (called an interval estimate) that we think contains the true value of μ.
Now, rather than saying μ = 26.322 cm, we say 25.77 ≤ μ ≤ 26.87 cm
8-1 Introduction
A confidence interval for an unknown parameter is an interval that contains a set of plausible or believable values of the parameter.
It is associated with a confidence level, 1-α, which measures the probability that the confidence interval actually contains the unknown parameter.
A 100(1-α) percent confidence interval on the unknown parameter θ:
P(lower limit ≤ θ ≤ upper limit) = 1-α
8-1 Introduction
For example, a 99% confidence interval on the parameter θ
Find the lower and upper limit such that:
P(lower limit ≤ θ ≤ upper limit) = 0.99
Common 100(1-α)% CI:
  99% CI: 1-α = 0.99, α = 0.01
  95% CI: 1-α = 0.95, α = 0.05
  90% CI: 1-α = 0.90, α = 0.10
8-1 Introduction
We use a sample statistic to estimate the population parameter
  X̄ to estimate μ, S² to estimate σ²
General procedure to form a confidence interval
  Take a sample
  Find the sample statistic of interest (e.g. X̄ or S²)
  Form an interval around the sample statistic that we think contains the true population parameter value
  Calculate the upper and lower CI limits
8-1 Introduction
How to calculate upper and lower CI limits?
Using the sampling distribution of the appropriate sample statistic, choose the CI such that:
P(lower limit ≤ θ ≤ upper limit) = 1-α
  Use Normal distribution to find CI for μ when n ≥ 40
  Use Student t distribution to find CI for μ when n < 40
  Use Chi-square distribution to find CI for σ²
8-1 Introduction
How to calculate upper and lower CI limits?
P(lower limit ≤ θ ≤ upper limit) = 1-α
As we will soon see, the width (upper limit − lower limit) of a CI is a function of:
  Confidence level, 1-α
    As confidence level ↑, CI width ↑
  Sample size (i.e. # observations)
    As sample size ↑, CI width ↓
  Variance of data (as measured by s²)
    As variance ↑, CI width ↑
8-1 Introduction
The width of a confidence interval is a measure of the precision of estimation.
The width of the confidence interval is a measure of the quality of information obtained from the sample.
The wider the confidence interval, the more confident we are that the interval actually contains the true value of θ.
  Remember: as confidence level ↑, CI width ↑
However, the wider the confidence interval, the less information we have about the true value of θ.
In an ideal situation, we obtain a relatively narrow interval with high confidence. This is possible when the sample size is large and/or the variance is small.
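The three width relationships stated in this introduction can be checked numerically. The sketch below assumes the large-sample formula width = 2·Z_{α/2}·s/√n that Section 8-2 develops; the function name `ci_width` and the input values are illustrative and do not come from the slides.

```python
from math import sqrt
from statistics import NormalDist


def ci_width(confidence, s, n):
    """Width of a two-sided large-sample CI on the mean: 2 * z_{alpha/2} * s / sqrt(n)."""
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # upper alpha/2 point of N(0,1)
    return 2.0 * z * s / sqrt(n)


# Hypothetical baseline: 95% confidence, s = 1.0, n = 40
base = ci_width(0.95, s=1.0, n=40)
print("higher confidence -> wider:", ci_width(0.99, 1.0, 40) > base)
print("larger sample -> narrower:", ci_width(0.95, 1.0, 160) < base)
print("larger variance -> wider:", ci_width(0.95, 2.0, 40) > base)
```

Quadrupling the sample size halves the width, because n enters through √n.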
8-1 Introduction
Precision of CI
Do not want CI that is too narrow
  Low probability it will contain true population parameter
Do not want CI that is too wide
  Not useful to make decisions
Mean length of foot example
  50% CI: 26.22 ≤ μ ≤ 26.42 cm
  99% CI: 16.0 ≤ μ ≤ 36.0 cm
  99% CI: 25.92 ≤ μ ≤ 26.72 cm
8-1 Introduction
Two-sided confidence interval specifies both a lower and upper limit on θ
  P(lower limit ≤ θ ≤ upper limit) = 1-α
One-sided confidence interval may be more appropriate for some applications.
  One-sided lower confidence interval on θ
    P(lower limit ≤ θ) = 1-α
  One-sided upper confidence interval on θ
    P(θ ≤ upper limit) = 1-α
8-1 Introduction
We will find confidence intervals for several population parameters:
  Mean, μ, of any distribution, large sample size (n ≥ 40)
  Mean, μ, of normal distribution, small sample size (n < 40)
  Variance, σ², of normal distribution
  Population proportion, p
8-2 Confidence Interval on the Mean, μ, for Large Sample
Want to estimate the population mean, μ, of data from any distribution with unknown variance, σ²
Take a sample of size n ≥ 40 (Note: data can have any distribution)
Find sample mean, X̄, and sample variance, S²
X̄ will be approximately Normally distributed with mean μ and variance σ²/n
  How do I know this? Remember the Central Limit Theorem from Chpt 7. As the sample size, n, gets large, the distribution of sample means approaches a normal distribution
8-2 Confidence Interval on the Mean, μ, for Large Sample
[Figure: distribution plot (density) of the data alongside the sampling distribution of X̄]
Data from UNKNOWN distribution with UNKNOWN mean, μ, and UNKNOWN variance, σ²: X ~ ??(μ, σ²)
By CLT, sample means are normally distributed with mean, μ, and variance σ²/n: $\bar{X} \sim N(\mu, \sigma^2/n)$
Both distributions have the same mean, μ. What is it? Form an interval we think contains μ
8-2 Confidence Interval on the Mean, μ, for Large Sample
Use sample mean, X̄, to estimate μ
Sample mean, X̄, will be center of CI for μ
Calculate upper and lower CI limit using X̄ ~ N(μ, σ²/n) and 1-α such that
P(lower limit ≤ μ ≤ upper limit) = 1-α
How to calculate lower limit and upper limit?
8-2 Confidence Interval on the Mean, μ, for Large Sample
[Figure: sampling distribution X̄ ~ N(μ, σ²/n) centered at μ, with three possible sample means X̄ marked at different positions under the curve]
Our sample mean is a single draw from this distribution. We do not know which part of the curve our sample mean was drawn from, so we do not know how close the sample mean is to μ
8-2 Confidence Interval on the Mean, μ, for Large Sample
We want the CI we form around our sample mean to have a high (1-α) probability of including μ
All samples from this distribution will have the same width CI. What should that width be?
[Figure: sampling distribution X̄ ~ N(μ, σ²/n) centered at μ, with CIs drawn around three possible sample means X̄]
8-2 Confidence Interval on the Mean, μ, for Large Sample
We choose the CI limits such that
P(lower limit ≤ μ ≤ upper limit) = 1-α
We choose upper and lower limits such that repeated sampling will result in 100(1-α)% of the CIs containing μ
  95% CI example: choose upper and lower limit such that when we repeatedly take a sample of size n and form a CI, 95% of these CIs will contain μ
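The repeated-sampling interpretation can be illustrated with a small simulation. This is a sketch under stated assumptions: the population is taken to be Normal with a hypothetical mean of 26.3 and standard deviation of 1.5 (neither value is from the slides), each interval uses the large-sample form X̄ ± Z_{α/2}·S/√n, and `coverage` is an illustrative helper name.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev


def coverage(true_mean, true_sd, n, confidence, trials, seed=0):
    """Fraction of repeated-sample CIs (xbar +/- z * s / sqrt(n)) that contain true_mean."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
        xbar = mean(sample)
        half_width = z * stdev(sample) / sqrt(n)
        if xbar - half_width <= true_mean <= xbar + half_width:
            hits += 1
    return hits / trials


# Hypothetical population: mean 26.3, s.d. 1.5; samples of size 40
print(coverage(26.3, 1.5, n=40, confidence=0.95, trials=2000))
```

The printed fraction should land close to 0.95. Any single interval either contains μ or it does not; the 95% describes the long-run behavior of the procedure.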
8-2 Confidence Interval on the Mean, μ, for Large Sample
Figure 8-1 Repeated construction of a confidence interval for μ.
How wide to make this interval so that it includes μ 100(1-α)% of the time?
8-2 Confidence Interval on the Mean, μ, for Large Sample
[Figure: sampling distribution X̄ ~ N(μ, σ²/n) centered at μ, central area 1-α with area α/2 in each tail]
100(1-α)% of our samples will have a sample mean in this range.
If we choose the upper and lower CI limits so that these samples' CIs include μ, then 100(1-α)% of the CIs will include μ
8-2 Confidence Interval on the Mean, μ, for Large Sample
[Figure: sampling distribution X̄ ~ N(μ, σ²/n) centered at μ, central area 0.95 with area 0.025 in each tail; CIs (Lower Limit to Upper Limit) drawn around the two most extreme sample means X̄ in the central range]
For example, 95% of samples will have a sample mean in this range
Choose the CI width so that the CIs of all sample means in this range will include μ. Consider the most extreme cases
8-2 Confidence Interval on the Mean, μ, for Large Sample
Using the standard normal distribution, N(0,1), we find 95% of the area under the curve within ±1.96 standard deviations of the mean
[Figure: Z ~ N(0,1) density, area 0.95 between -1.96 and 1.96, area 0.025 in each tail]
So, for X̄ ~ N(μ, σ²/n), 95% of the area under the curve is within ±1.96 s.d. of the mean, where s.d. = σ/√n
[Figure: sampling distribution X̄ ~ N(μ, σ²/n), area 0.95 between μ − 1.96σ/√n and μ + 1.96σ/√n, area 0.025 in each tail]
So our 95% CI on μ has a width of 2(1.96)(σ/√n)
8-2 Confidence Interval on the Mean, μ, for Large Sample
[Figure, 95% CI: sampling distribution X̄ ~ N(μ, σ²/n) with central area 0.95 and 0.025 in each tail; the interval around a sample mean runs from X̄ − 1.96σ/√n to X̄ + 1.96σ/√n]
[Figure, 100(1-α)% CI: same distribution with central area 1-α and α/2 in each tail; the interval runs from X̄ − Z_{α/2}σ/√n to X̄ + Z_{α/2}σ/√n]
8-2 Confidence Interval on the Mean, μ, for Large Sample
100(1-α)% Confidence Interval on μ (2-sided):
$\bar{X} - Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$
But we have unknown variance, σ²
When the sample size is large (n ≥ 40), we can replace σ² with the sample variance, S²
100(1-α)% Confidence Interval on μ (2-sided):
$\bar{X} - Z_{\alpha/2}\,\frac{S}{\sqrt{n}} \le \mu \le \bar{X} + Z_{\alpha/2}\,\frac{S}{\sqrt{n}}$
8-2 Confidence Interval on the Mean, μ, for Large Sample
100(1-α)% Confidence Interval on μ (2-sided):
$\bar{X} - Z_{\alpha/2}\,\frac{S}{\sqrt{n}} \le \mu \le \bar{X} + Z_{\alpha/2}\,\frac{S}{\sqrt{n}}$, i.e. $\bar{X} \pm Z_{\alpha/2}\,\frac{S}{\sqrt{n}}$
where Z_{α/2} is the upper 100(α/2) percentage point of the standard normal distribution
  Z_{α/2} = z for which P(Z > z) = α/2
  For a 90% CI, α = 0.10 and Z_{0.05} = 1.645
  For a 95% CI, α = 0.05 and Z_{0.025} = 1.96
  For a 99% CI, α = 0.01 and Z_{0.005} = 2.576
These Z values come from the N(0,1) table
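As an alternative to the printed N(0,1) table, the same percentage points can be computed. This minimal sketch uses `statistics.NormalDist` from the Python standard library; the helper name `z_upper` is illustrative.

```python
from statistics import NormalDist


def z_upper(tail_area):
    """Return z such that P(Z > z) = tail_area for Z ~ N(0, 1)."""
    return NormalDist().inv_cdf(1.0 - tail_area)


for confidence in (0.90, 0.95, 0.99):
    alpha = 1.0 - confidence
    print(f"{confidence:.0%} CI: alpha = {alpha:.2f}, Z_(alpha/2) = {z_upper(alpha / 2):.3f}")
```

For a one-sided bound, pass the whole α instead of α/2, for example `z_upper(0.05)` for a 95% bound.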
8-2 Confidence Interval on the Mean, μ, for Large Sample
100(1-α)% Confidence Interval on μ (2-sided):
$\bar{X} - Z_{\alpha/2}\,\frac{S}{\sqrt{n}} \le \mu \le \bar{X} + Z_{\alpha/2}\,\frac{S}{\sqrt{n}}$, i.e. $\bar{X} \pm Z_{\alpha/2}\,\frac{S}{\sqrt{n}}$
Notice: CI centered around sample mean
Notice: CI width is a function of confidence level, sample size, and variance
8-2 Confidence Interval on the Mean, μ, for Large Sample
Kellogg's is interested in estimating the mean fill weight of their corn flakes cereal boxes.
They take a sample of 60 boxes and calculate the sample mean to be 12.05 oz. and the sample standard deviation to be 0.14 oz.
1. Find a 90% CI for the mean fill weight, μ
Believable μ = 12.00 oz? μ = 12.10 oz? μ = 12.025 oz?
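The slide leaves this exercise unworked. One way to check an answer is sketched below using the large-sample formula X̄ ± Z_{α/2}·S/√n with the numbers given in the problem; the helper name `z_interval` is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(xbar, s, n, confidence):
    """Two-sided large-sample CI on the mean: xbar +/- z_{alpha/2} * s / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * s / sqrt(n)
    return xbar - half_width, xbar + half_width


# Values from the problem statement: n = 60, xbar = 12.05 oz, s = 0.14 oz
lo, hi = z_interval(12.05, 0.14, 60, 0.90)
print(f"90% CI: {lo:.3f} <= mu <= {hi:.3f}")
for mu in (12.00, 12.10, 12.025):
    print(mu, "inside CI" if lo <= mu <= hi else "outside CI")
```

The interval is roughly 12.020 to 12.080 oz, so of the three candidate values only 12.025 oz falls inside it.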
8-2 Confidence Interval on the Mean, μ, for Large Sample
One-sided Confidence Intervals
Often only interested in upper or lower CI limit (not both)
  Lower limit of concrete strength
  Upper limit on automated fill volume
One-sided lower confidence interval on μ
  P(lower limit ≤ μ) = 1-α
One-sided upper confidence interval on μ
  P(μ ≤ upper limit) = 1-α
8-2 Confidence Interval on the Mean, μ, for Large Sample
One-sided lower confidence interval on μ
We choose the Lower CI limit such that
P(lower limit ≤ μ) = 1-α
We choose the lower limit such that repeated sampling will result in 100(1-α)% of the one-sided lower CIs containing μ
  95% CI example: choose lower limit such that when we repeatedly take a sample of size n and form a one-sided lower CI, 95% of these Lower CIs will contain μ
  Only those 5% of sample means that are very large will not include μ in their Lower CI
8-2 Confidence Interval on the Mean, μ, for Large Sample
Example of 95% Lower CI
[Figure: sampling distribution X̄ ~ N(μ, σ²/n) centered at μ, area 0.95 to the left and 0.05 in the upper tail; a one-sided CI starting at its Lower Limit is drawn from the most extreme sample mean X̄ in the 0.95 region]
95% of samples will have a sample mean in this range
Choose the CI lower limit so that the CIs of all sample means in this range will include μ. Consider the most extreme case
8-2 Confidence Interval on the Mean, μ, for Large Sample
Using the standard normal distribution, N(0,1), we find 95% of the area under the curve is less than +1.645 standard deviations
[Figure: Z ~ N(0,1) density, area 0.95 below 1.645 and area 0.05 above]
So, for X̄ ~ N(μ, σ²/n), 95% of the area under the curve is less than 1.645 s.d. above the mean, where s.d. = σ/√n
[Figure: sampling distribution X̄ ~ N(μ, σ²/n), area 0.95 below μ + 1.645σ/√n and area 0.05 above]
8-2 Confidence Interval on the Mean, μ, for Large Sample
100(1-α)% Lower Confidence Bound on μ (1-sided):
$\bar{X} - Z_{\alpha}\,\frac{S}{\sqrt{n}} \le \mu$
One-sided Upper CI calculated in much the same way as One-sided Lower CI
100(1-α)% Upper Confidence Bound on μ (1-sided):
$\mu \le \bar{X} + Z_{\alpha}\,\frac{S}{\sqrt{n}}$
8-2 Confidence Interval on the Mean, μ, for Large Sample
Kellogg's is interested in estimating the mean fill weight of their corn flakes cereal boxes.
They take a sample of 60 boxes and calculate the sample mean to be 12.05 oz. and the sample standard deviation to be 0.14 oz.
2. Find a 90% Lower Confidence Bound for the mean fill weight, μ
Believable μ > 12.00 oz? μ > 12.10 oz?
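This part is also left unworked on the slide. A sketch of the computation, using the one-sided form X̄ − Z_α·S/√n ≤ μ with the problem's numbers, is given below; `z_lower_bound` is an illustrative name.

```python
from math import sqrt
from statistics import NormalDist


def z_lower_bound(xbar, s, n, confidence):
    """One-sided large-sample lower confidence bound: xbar - z_alpha * s / sqrt(n)."""
    z = NormalDist().inv_cdf(confidence)  # z_alpha, with alpha = 1 - confidence
    return xbar - z * s / sqrt(n)


# Values from the problem statement: n = 60, xbar = 12.05 oz, s = 0.14 oz
bound = z_lower_bound(12.05, 0.14, 60, 0.90)
print(f"90% lower confidence bound: mu >= {bound:.3f} oz")
```

The bound is about 12.027 oz. Every value in the one-sided interval exceeds 12.00 oz, so μ > 12.00 oz is believable at this confidence level. The interval still contains values below 12.10 oz, so the sample does not support μ > 12.10 oz. Note that the one-sided bound uses Z_{0.10}, not Z_{0.05}.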
8-2 Confidence Interval on the Mean, μ, for Large Sample
Summary of Procedure
Data Distribution: Any, i.e. X ~ ??(μ, σ²)
Population parameters:
  Mean, μ: unknown, to be estimated
  Variance, σ²: unknown, estimate using S²
Sample Size: large, i.e. n ≥ 40
Sample Statistic: Sample Mean
Sampling Distribution: Normal Distribution
CIs:
  2-sided CI: $\bar{X} \pm Z_{\alpha/2}\,\frac{S}{\sqrt{n}}$
  Upper 1-sided CI: $\mu \le \bar{X} + Z_{\alpha}\,\frac{S}{\sqrt{n}}$
  Lower 1-sided CI: $\bar{X} - Z_{\alpha}\,\frac{S}{\sqrt{n}} \le \mu$
8-2 Confidence Interval on the Mean, μ, for Large Sample
How to do CI on μ, for large sample, in Minitab?
Can either enter each observation in a worksheet column or input summarized data
Stat > Basic Statistics > 1-sample Z
[Screenshot: 1-sample Z dialog; the summarized-data fields are labeled n, X̄, and σ]
Default is 95% two-sided CI
Options to change confidence level and one-sided CI
8-2 Confidence Interval on the Mean, μ, for Large Sample
One-Sample Z
The assumed standard deviation = 100
 N   Mean  SE Mean          95% CI
17  215.0     24.3  (167.5, 262.5)
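The Minitab output above can be reproduced by hand from its own inputs (N = 17, mean 215.0, assumed standard deviation 100). The sketch below assumes the report is the two-sided interval X̄ ± Z_{0.025}·σ/√n; variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Inputs as printed in the Minitab output
n, xbar, sigma = 17, 215.0, 100.0

se_mean = sigma / sqrt(n)               # standard error of the mean
z = NormalDist().inv_cdf(0.975)         # Z_{0.025} for a 95% two-sided CI
lo, hi = xbar - z * se_mean, xbar + z * se_mean

print(f"SE Mean = {se_mean:.1f}")
print(f"95% CI = ({lo:.1f}, {hi:.1f})")
```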
8-2 Confidence Interval on the Mean, μ, for known population variance, σ²
Use Normal distribution to find CI for μ when population variance, σ², is known
  2-sided CI: $\bar{X} \pm Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$
  Upper 1-sided CI: $\mu \le \bar{X} + Z_{\alpha}\,\frac{\sigma}{\sqrt{n}}$
  Lower 1-sided CI: $\bar{X} - Z_{\alpha}\,\frac{\sigma}{\sqrt{n}} \le \mu$
8-3 Confidence Interval on the Mean, μ, of Normal Distribution with Small Sample
Want to estimate the population mean, μ, of data that is Normally distributed with unknown variance (i.e. small sample)
Take a sample of size n from Normally distributed data (Note: n < 40)
Find sample mean, X̄
$T = \frac{\bar{X} - \mu}{S/\sqrt{n}}$ will be Student-t distributed with n-1 degrees of freedom
  How do I know this? Remember from Chpt 7: the Student t Distribution is the sampling distribution of the standardized sample mean when the sample size, n, is small and the underlying distribution is Normal (or close to Normal)
Calculate upper and lower CI limit using Student-t distribution and 1-α such that P(lower limit ≤ μ ≤ upper limit) = 1-α
8-3 Confidence Interval on the Mean, μ, of Normal Distribution with Small Sample
Remember: Student-t distribution similar to Standard Normal Distribution, N(0,1)
  Both symmetric, bell-shaped curves
  Both have a mean of zero, μ_T = 0, μ_Z = 0
We use tables (t-tables or Z-table) to evaluate the area under the curve for standardized values
  Example: standardized sample means
  $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$    $T = \frac{\bar{X} - \mu}{S/\sqrt{n}}$
8-3 Confidence Interval on the Mean, μ, of Normal Distribution with Small Sample
Using Student-t distribution, we find 100(1-α)% of area under curve within ±t_{α/2,n-1} of mean
[Figure: t density centered at 0, area 1-α between -t_{α/2,n-1} and t_{α/2,n-1}, area α/2 in each tail]
So for X̄, 100(1-α)% of area under the curve within ±t_{α/2,n-1} s.d. of mean where s.d. = S/√n
[Figure: sampling distribution of X̄ centered at μ, area 1-α between μ − t_{α/2,n-1}S/√n and μ + t_{α/2,n-1}S/√n, area α/2 in each tail]
8-3 Confidence Interval on the Mean, μ, of Normal Distribution with Small Sample
100(1-α)% Confidence Interval on μ (2-sided):
$\bar{X} - t_{\alpha/2,\,n-1}\,\frac{S}{\sqrt{n}} \le \mu \le \bar{X} + t_{\alpha/2,\,n-1}\,\frac{S}{\sqrt{n}}$, i.e. $\bar{X} \pm t_{\alpha/2,\,n-1}\,\frac{S}{\sqrt{n}}$
where t_{α/2,n-1} is the upper 100(α/2) percentage point of the Student-t distribution with n-1 d.f.
  Use t table in back of book to find appropriate t value for n and α
8-3 Confidence Interval on the Mean, μ, of Normal Distribution with Small Sample
100(1-α)% Upper Confidence Bound on μ (1-sided):
$\mu \le \bar{X} + t_{\alpha,\,n-1}\,\frac{S}{\sqrt{n}}$
100(1-α)% Lower Confidence Bound on μ (1-sided):
$\bar{X} - t_{\alpha,\,n-1}\,\frac{S}{\sqrt{n}} \le \mu$
8-3 Confidence Interval on the Mean, μ, of Normal Distribution with Small Sample
Suppose I measure my systolic blood pressure 5 times and get the following values:
98, 117, 132, 105, 121
Assume my systolic blood pressure is normally distributed
1. Find a 95% CI on my mean systolic blood pressure
Believable my mean systolic BP is 100?
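A sketch of this calculation follows. The Python standard library has no Student-t quantile function, so the critical value t_{0.025,4} = 2.776 is hardcoded from a standard t table; that constant is an input assumption, not something computed here.

```python
from math import sqrt
from statistics import mean, stdev

bp = [98, 117, 132, 105, 121]      # data from the slide
n = len(bp)
xbar = mean(bp)                    # sample mean
s = stdev(bp)                      # sample standard deviation (n - 1 divisor)

T_CRIT = 2.776                     # t_{0.025, 4} from a t table (hardcoded assumption)
half_width = T_CRIT * s / sqrt(n)
lo, hi = xbar - half_width, xbar + half_width

print(f"xbar = {xbar:.2f}, s = {s:.2f}, SE = {s / sqrt(n):.2f}")
print(f"95% CI: {lo:.2f} <= mu <= {hi:.2f}")
print("mu = 100 inside CI:", lo <= 100 <= hi)
```

The interval is roughly 97.98 to 131.22, so a mean of 100 lies inside it. The tiny difference from a software result comes from rounding the table value.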
8-3 Confidence Interval on the Mean, μ, of Normal Distribution with Small Sample
2. Believable my mean systolic BP is less than 100? Find appropriate 95% CI.
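The slide does not say which one-sided interval it considers appropriate, so this sketch computes both 95% bounds for the blood pressure data 98, 117, 132, 105, 121. The critical value t_{0.05,4} = 2.132 is hardcoded from a standard t table and is an assumption of the example.

```python
from math import sqrt
from statistics import mean, stdev

bp = [98, 117, 132, 105, 121]      # blood pressure readings from the example
n = len(bp)
xbar, s = mean(bp), stdev(bp)

T_CRIT = 2.132                     # t_{0.05, 4} from a t table (hardcoded assumption)
margin = T_CRIT * s / sqrt(n)

lower_bound = xbar - margin        # 95% lower confidence bound: lower_bound <= mu
upper_bound = xbar + margin        # 95% upper confidence bound: mu <= upper_bound

print(f"95% lower bound: mu >= {lower_bound:.2f}")
print(f"95% upper bound: mu <= {upper_bound:.2f}")
```

The lower bound is about 101.83, which is above 100, so with 95% one-sided confidence a mean below 100 is not believable. The upper bound, about 127.37, is far above 100 and so cannot by itself establish that the mean is below 100.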
8-3 Confidence Interval on the Mean, μ, of Normal Distribution with Small Sample
Summary of Procedure
Data Distribution: Normal, i.e. X ~ N(μ, σ²)
Population parameters:
  Mean, μ: unknown, to be estimated
  Variance, σ²: unknown, estimate using S²
Sample Size: small (i.e. n < 40)
Sample Statistic: Sample Mean
Sampling Distribution: Student-t Distribution
CIs:
  2-sided CI: $\bar{X} \pm t_{\alpha/2,\,n-1}\,\frac{S}{\sqrt{n}}$
  Upper 1-sided CI: $\mu \le \bar{X} + t_{\alpha,\,n-1}\,\frac{S}{\sqrt{n}}$
  Lower 1-sided CI: $\bar{X} - t_{\alpha,\,n-1}\,\frac{S}{\sqrt{n}} \le \mu$
8-3 Confidence Interval on the Mean, μ, of Normal Distribution with Small Sample
How to do CI on μ, with n small, in Minitab?
Can either enter each observation in a worksheet column or input summarized data
Stat > Basic Statistics > 1-sample t
[Screenshot: 1-sample t dialog]
Default is 95% two-sided CI
Options to change confidence level and one-sided CI
8-3 Confidence Interval on the Mean, μ, of Normal Distribution with Small Sample
One-Sample T: BP
Variable  N    Mean  StDev  SE Mean           95% CI
BP        5  114.60  13.39     5.99  (97.97, 131.23)
Confidence Intervals on μ
We discussed several different CIs on μ. How to know which CI is appropriate for a particular problem?
  Depends on data distribution, whether population variance, σ², is known, and sample size
  2-sided or 1-sided CI?

data distribution     σ²        sample size   CI distribution
any distribution      unknown   large         Z
Normal Distribution   unknown   small         t
Normal Distribution   known     any           Z

Stop here on Saturday?
Turn in HW#1
Review HW#1 solutions
Test #1 on Monday!
8-4 Confidence Interval on the Variance, σ², of a Normal Distribution
Want to estimate the population variance, σ², of data that is Normally distributed
Take a sample of size n from Normally distributed data (Note: n is any size)
Find sample variance, S²
S² will be Chi-squared distributed (multiplied by a constant) with n-1 degrees of freedom
  How do I know this? Remember from Chpt 7: Chi-square Distribution is the sampling distribution of the sample variance, S²
Calculate upper and lower CI limit using $S^2 \sim \frac{\sigma^2}{n-1}\,\chi^2_{n-1}$ and 1-α such that P(lower limit ≤ σ² ≤ upper limit) = 1-α
8-4 Confidence Interval on the Variance, σ², of a Normal Distribution
Using Chi-squared distribution, we find 100(1-α)% of area under curve in range $\chi^2_{1-\alpha/2,\,n-1}$ to $\chi^2_{\alpha/2,\,n-1}$
100(1-α)% of our samples will have a Chi-squared value, χ², in this range where $\chi^2 = \frac{(n-1)S^2}{\sigma^2}$
[Figure: chi-squared density for $S^2 \sim \frac{\sigma^2}{n-1}\chi^2_{n-1}$, starting at 0, with area 1-α between $\chi^2_{1-\alpha/2,\,n-1}$ and $\chi^2_{\alpha/2,\,n-1}$ and area α/2 in each tail]
8-4 Confidence Interval on the Variance, σ², of a Normal Distribution
100(1-α)% Confidence Interval on σ² (2-sided):
$\frac{(n-1)S^2}{\chi^2_{\alpha/2,\,n-1}} \le \sigma^2 \le \frac{(n-1)S^2}{\chi^2_{1-\alpha/2,\,n-1}}$
Notice: CI width is a function of confidence level, sample size, and sample variance
Use Chi-squared table in back of book to find appropriate Chi-squared value for n and α
To find CI on σ, take square root of both CI limits
8-4 Confidence Interval on the Variance, σ², of a Normal Distribution
100(1-α)% Upper Confidence Bound on σ² (1-sided):
$\sigma^2 \le \frac{(n-1)S^2}{\chi^2_{1-\alpha,\,n-1}}$
100(1-α)% Lower Confidence Bound on σ² (1-sided):
$\frac{(n-1)S^2}{\chi^2_{\alpha,\,n-1}} \le \sigma^2$
To find CI on σ, take square root of CI limits
8-4 Confidence Interval on the Variance, σ², of a Normal Distribution
A rivet is to be inserted into a hole. A random sample of n = 15 parts is selected, and the hole diameter is measured. The sample standard deviation of the hole diameter measurements is S = 0.008 mm. If the variance of the hole diameter is too large, an unacceptable proportion of rivets will not fit properly in the hole. Assume the hole diameter is normally distributed.
Construct a 99% upper confidence bound for σ²
Believable σ < 0.010 mm? σ < 0.015 mm?
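A sketch of the upper bound calculation is shown below. The Python standard library has no chi-squared quantile function, so the value χ²_{0.99,14} = 4.66 is hardcoded from a standard chi-squared table; that constant is an assumption of the example.

```python
from math import sqrt

n = 15
s = 0.008                          # sample standard deviation in mm, from the problem

CHI2_CRIT = 4.66                   # chi-squared_{0.99, 14} from a table (hardcoded assumption)

var_upper = (n - 1) * s**2 / CHI2_CRIT   # 99% upper confidence bound on sigma^2
sd_upper = sqrt(var_upper)               # corresponding bound on sigma

print(f"sigma^2 <= {var_upper:.6f} mm^2")
print(f"sigma   <= {sd_upper:.5f} mm")
```

The bound on σ is about 0.0139 mm. That supports σ < 0.015 mm at 99% confidence, but not σ < 0.010 mm, since values between 0.010 and 0.0139 mm remain plausible.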
8-4 Confidence Interval on the Variance, σ², of a Normal Distribution
Summary of Procedure
Data Distribution: Normal, i.e. X ~ N(μ, σ²)
Population parameters:
  Variance, σ²: unknown, to be estimated
Sample Size: any
Sample Statistic: Sample variance
Sampling Distribution: Chi-squared Distribution
CIs:
  2-sided CI: $\frac{(n-1)S^2}{\chi^2_{\alpha/2,\,n-1}} \le \sigma^2 \le \frac{(n-1)S^2}{\chi^2_{1-\alpha/2,\,n-1}}$
  Upper 1-sided CI: $\sigma^2 \le \frac{(n-1)S^2}{\chi^2_{1-\alpha,\,n-1}}$
  Lower 1-sided CI: $\frac{(n-1)S^2}{\chi^2_{\alpha,\,n-1}} \le \sigma^2$
8-4 Confidence Interval on the Variance, σ², of a Normal Distribution
How to do CI on σ² in Minitab?
Can either enter each observation in a worksheet column or input summarized data
Stat > Basic Statistics > 1 Variance
[Screenshot: 1 Variance dialog; the summarized-data fields are labeled n and S]
Choose to enter S or S²
Default is 95% two-sided CI
Options to change confidence level and one-sided CI
8-4 Confidence Interval on the Variance, σ², of a Normal Distribution
How to do CI on σ², in Minitab?
[Screenshot: 1 Variance options dialog]
Change to 99% CI
Change to 1-sided upper bound CI
8-4 Confidence Interval on the Variance, σ², of a Normal Distribution
Test and CI for One Standard Deviation
Statistics
 N    StDev  Variance
15  0.00800  0.000064
99% One-Sided Confidence Intervals
          Upper Bound   Upper Bound
Method      for StDev  for Variance
Standard      0.01387      0.000192
8-5 Large Sample Confidence Interval for a Population Proportion
Suppose a random sample of size n has been taken from a large population and X observations belong to a class of interest.
  Example: take a sample of 500 people. Count the number with long hair
Sample proportion, p̂ = X/n, is the point estimator of the proportion of the population that belongs to this class.
Can form CI around sample proportion
8-5 Large Sample Confidence Interval for a Population Proportion
Want to estimate the population proportion, p, of data from any distribution
Take a large sample of size n (Note: n ≥ 40)
Find sample proportion, $\hat{p} = \frac{X}{n}$
P̂ will be approximately normally distributed
  How do I know this? See Chpt 4: Normal Approximation to the Binomial Distribution. As the sample size gets large, the binomial distribution can be approximated with a Normal distribution.
Calculate upper and lower CI limit using Normal Distribution and 1-α such that P(lower limit ≤ p ≤ upper limit) = 1-α
8-5 Large Sample Confidence Interval for a Population Proportion
100(1-α)% Confidence Interval on p (2-sided):
$\hat{p} - Z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \le p \le \hat{p} + Z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$, i.e. $\hat{p} \pm Z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
where Z_{α/2} is the upper 100(α/2) percentage point of the standard normal distribution
  Z_{α/2} = z for which P(Z > z) = α/2
  For a 90% CI, α = 0.10 and Z_{0.05} = 1.645
  For a 95% CI, α = 0.05 and Z_{0.025} = 1.96
  For a 99% CI, α = 0.01 and Z_{0.005} = 2.576
These Z values come from the N(0,1) table
8-5 Large Sample Confidence Interval for a Population Proportion
100(1-α)% Confidence Interval on p (2-sided):
$\hat{p} - Z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \le p \le \hat{p} + Z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$, where $\hat{p} = \frac{X}{n}$
Notice: CI centered around sample proportion
Notice: CI width is a function of confidence level, sample size, and variance
  variance of binomial is np(1-p)
8-5 Large Sample Confidence Interval for a Population Proportion
100(1-α)% Upper Confidence Bound on p (1-sided):
$p \le \hat{p} + Z_{\alpha}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
100(1-α)% Lower Confidence Bound on p (1-sided):
$\hat{p} - Z_{\alpha}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \le p$
8-5 Large Sample Confidence Interval for a Population Proportion
A manufacturer of electronic calculators takes a random sample of 1200 calculators and finds there are 8 defective units.
a) Construct a 95% confidence interval on the population proportion
Believable that 1% defective?
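A sketch of part (a) using the normal-approximation formula p̂ ± Z_{α/2}·√(p̂(1-p̂)/n) from this section follows; the helper name `proportion_interval` is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def proportion_interval(x, n, confidence):
    """Two-sided large-sample (normal approximation) CI on a proportion."""
    p_hat = x / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


# Values from the problem: 8 defectives in a sample of 1200
lo, hi = proportion_interval(8, 1200, 0.95)
print(f"95% CI: {lo:.5f} <= p <= {hi:.5f}")
print("p = 0.01 inside CI:", lo <= 0.01 <= hi)
```

The interval is roughly 0.0021 to 0.0113, so 1% defective is believable. The Minitab output shown later in this section reports (0.002882, 0.013094), which does not match this formula. A likely explanation is that Minitab used an exact binomial method instead of the normal approximation, but the slides do not say.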
8-5 Large Sample Confidence Interval for a Population Proportion
b) Is there evidence to support a claim that the fraction of defective units produced is 1% or less?
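One way to approach part (b) is a one-sided upper confidence bound on p, since the claim would be supported only if the whole interval sat at or below 1%. The sketch assumes a 95% confidence level, which the slide does not specify for this part, and reuses the counts from part (a): 8 defectives in 1200.

```python
from math import sqrt
from statistics import NormalDist

x, n = 8, 1200                     # defectives and sample size from the problem
confidence = 0.95                  # assumed level; not stated on the slide for part (b)

p_hat = x / n
z_alpha = NormalDist().inv_cdf(confidence)               # Z_alpha for a one-sided bound
upper_bound = p_hat + z_alpha * sqrt(p_hat * (1 - p_hat) / n)

print(f"95% upper confidence bound: p <= {upper_bound:.5f}")
print("claim p <= 0.01 supported:", upper_bound <= 0.01)
```

The upper bound is about 0.0105, slightly above 0.01, so under the normal approximation the sample falls just short of supporting the claim at 95% confidence.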
8-5 Large Sample Confidence Interval for a Population Proportion
Summary of Procedure
Data Distribution: Any, i.e. X ~ ??(μ, σ²)
Population parameters:
  Proportion, p: unknown, to be estimated
Sample Size: large (n ≥ 40)
Sample Statistic: Sample proportion
Sampling Distribution: Normal Distribution
CIs:
  2-sided CI: $\hat{p} \pm Z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
  Upper 1-sided CI: $p \le \hat{p} + Z_{\alpha}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
  Lower 1-sided CI: $\hat{p} - Z_{\alpha}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \le p$
8-5 Large Sample Confidence Interval for a Population Proportion
How to do CI on p, in Minitab?
Can either enter each observation in a worksheet column or input summarized data
Stat > Basic Statistics > 1 Proportion
[Screenshot: 1 Proportion dialog; the summarized-data fields are labeled X and n]
Default is 95% two-sided CI
Options to change confidence level and one-sided CI
8-5 Large Sample Confidence Interval for a Population Proportion
Test and CI for One Proportion
Sample  X     N  Sample p                95% CI
1       8  1200  0.006667  (0.002882, 0.013094)
Confidence Intervals
We discussed several different CIs. How to know which CI is appropriate for a particular problem?
What population parameter is being estimated? μ? σ²? p?
  If μ being estimated, is σ² known? Sample size?
    If σ² known, then use Z
    If σ² unknown and large sample, then use Z
    If σ² unknown and small sample, then use t
  If σ² being estimated, use Chi-squared
  If p being estimated, use Z
2-sided or 1-sided CI?
More Chapter 8 Examples
A drink machine is adjusted to release a certain amount of syrup into a chamber where it is mixed with carbonated water. A random sample of 25 drinks was found to have a mean syrup content of 1.10 fluid ounces and a standard deviation of 0.015 fluid ounces.
Believable that the mean syrup dispensed is 1.15 fl oz? 1.05 fl oz? Find a 95% CI.
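With n = 25 the small-sample t interval applies, which requires assuming the syrup content is approximately Normal (the slide does not state this). The sketch hardcodes t_{0.025,24} = 2.064 from a standard t table as an input assumption.

```python
from math import sqrt

n, xbar, s = 25, 1.10, 0.015       # summary statistics from the problem
T_CRIT = 2.064                     # t_{0.025, 24} from a t table (hardcoded assumption)

half_width = T_CRIT * s / sqrt(n)
lo, hi = xbar - half_width, xbar + half_width

print(f"95% CI: {lo:.4f} <= mu <= {hi:.4f} fl oz")
for mu in (1.15, 1.05):
    print(mu, "inside CI" if lo <= mu <= hi else "outside CI")
```

The interval is roughly 1.0938 to 1.1062 fl oz, so neither 1.15 nor 1.05 fl oz is believable.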
More Chapter 8 Examples
The percentage of titanium in an alloy used in aerospace castings is measured in 51 randomly selected parts. The sample standard deviation is S = 0.37 percent.
Believable that the standard deviation is 0.50 percent? Find a 95% CI.
Believable that the variance is 0.13 percent? Find a 95% CI.
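A sketch of the two-sided chi-squared interval for this example follows. It assumes the titanium percentage is Normally distributed (required by the Section 8-4 procedure but not stated on the slide) and hardcodes the table values χ²_{0.025,50} = 71.42 and χ²_{0.975,50} = 32.36.

```python
from math import sqrt

n, s = 51, 0.37                    # sample size and sample s.d. from the problem

CHI2_UPPER = 71.42                 # chi-squared_{0.025, 50} from a table (hardcoded assumption)
CHI2_LOWER = 32.36                 # chi-squared_{0.975, 50} from a table (hardcoded assumption)

var_lo = (n - 1) * s**2 / CHI2_UPPER
var_hi = (n - 1) * s**2 / CHI2_LOWER
sd_lo, sd_hi = sqrt(var_lo), sqrt(var_hi)   # square roots give the CI on sigma

print(f"95% CI on variance: {var_lo:.4f} to {var_hi:.4f}")
print(f"95% CI on std dev:  {sd_lo:.4f} to {sd_hi:.4f}")
```

The interval on σ is roughly 0.310 to 0.460, so a standard deviation of 0.50 is not believable. The interval on σ² is roughly 0.096 to 0.212, so a variance of 0.13 is believable.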
More Chapter 8 Examples
Need more Chapter 8 examples?
  READ THE BOOK!
We covered all sections 8-1 thru 8-6 (not 8-7)
9-1 Hypothesis Testing
In the previous chapter we learned to construct a CI estimate of a population parameter from sample data
In this chapter we'll learn how to accept or reject a statement or hypothesis about a parameter
  Useful since we can formulate many types of decision-making problems, tests, or experiments in the engineering world as hypothesis-testing problems
CI and Hypothesis Testing
Both CI (Chpt 8) and Hypothesis testing (Chpt 9) are used to make a statistical inference
  Use sample statistic to guess population parameter
In CI we formed an interval around the sample statistic and used the interval to answer questions
  Believable that μ = 12? Believable that μ < 3.2?
In hypothesis testing we are making a statement about the population parameter, then using the sample statistic to decide whether the statement is believable or not
9-1 Hypothesis Testing
Suppose we are interested in the average length of an adult's foot
Take sample
  Can use sample to form CI we think contains population mean, μ
  OR can use sample to decide whether a statement or hypothesis about the population mean is believable
    e.g. Hypothesis is μ = 27.0 cm
In both CI and hypothesis testing we use the sample statistic (e.g. sample mean) to make an inference about the unknown population parameter (e.g. μ)
9-1 Hypothesis Testing
State our hypothesis in terms of Null Hypothesis, H0, and Alternative Hypothesis, H1
Two-sided hypothesis
  H0: μ = 27.0 cm
  H1: μ ≠ 27.0 cm
One-sided hypotheses
  H0: μ ≥ 27.0 cm, H1: μ < 27.0 cm
  OR
  H0: μ ≤ 27.0 cm, H1: μ > 27.0 cm
9-1 Hypothesis Testing
General hypotheses for population mean
Two-sided hypothesis
  H0: μ = μ0
  H1: μ ≠ μ0
One-sided hypotheses
  H0: μ ≥ μ0, H1: μ < μ0
  OR
  H0: μ ≤ μ0, H1: μ > μ0
9-1 Hypothesis Testing
Use sample statistics to decide which statement, H0 or H1, is more believable
Example
  H0: μ = 27.0 cm
  H1: μ ≠ 27.0 cm
If sample mean = 26.8 cm, I would believe H0: μ = 27.0 cm
If sample mean = 23.1 cm, I would believe H1: μ ≠ 27.0 cm
9-1 Hypothesis Testing
How close does the sample mean need to be to 27.0 cm for me to believe H0: μ = 27.0 cm?
How far away from 27 cm does the sample mean need to be for me to believe H1: μ ≠ 27.0 cm?
Depends on:
  Variance, σ²
  Sample size, n
  Significance level, α
9-1 Hypothesis Testing
Example: H0: μ = 27.0, H1: μ ≠ 27.0
If 26.0 ≤ sample mean ≤ 28.0, Accept H0
If sample mean < 26.0 or if sample mean > 28.0, Reject H0
[Figure: X̄ axis marked at 26.0, 27.0, and 28.0; "Accept H0: μ = 27.0" between 26.0 and 28.0, "Reject H0: μ ≠ 27.0" below 26.0 and above 28.0]
9-1 Hypothesis Testing
Example: H0: μ = 27.0, H1: μ ≠ 27.0
[Figure: X̄ axis marked at 26.0, 27.0, and 28.0; the Acceptance Region (Accept H0: μ = 27.0) lies between the critical values 26.0 and 28.0, with a Rejection Region (Reject H0: μ ≠ 27.0) on each side]
How to choose critical values?
9-1 Hypothesis Testing
How to choose critical values?
Use the sampling distribution of the appropriate sample statistic to determine critical values and therefore whether to accept or reject H0
  Use Normal distribution for a test on μ when n ≥ 40
  Use Student t distribution for a test on μ when n < 40
  Use Chi-square distribution for a test on σ²
9-1 Hypothesis Testing
We either Accept or Reject H0
  Always in terms of accepting/rejecting H0, NOT H1
When we reject H0 we are saying H0 is not plausible or believable
When we accept H0 we are saying H0 is plausible or believable. However, accepting H0 does not prove H0 to be true.
  There may be other plausible H0
9-1 Hypothesis Testing
The strongest inference is available when the null hypothesis is rejected.
This point is important when an experimenter decides what should be the null hypothesis and what should be the alternative hypothesis for one-sided problems.
In order to prove or establish a statement (say μ > μ0) it is necessary to make it the alternative hypothesis (H1).
  For a one-sided hypothesis test, make H1 whatever you are trying to prove
9-1 Hypothesis Testing
We either Accept or Reject H0 based on our sample statistic
Our decision to accept or reject H0 will either be right or wrong.
  We'll never know for certain whether we're right or wrong (because we do not know the population parameter), but we can find the probability we're wrong
Two ways we could be wrong:
  Reject H0 when H0 is true (type I error)
  Accept H0 when H0 is false (type II error)
9-1 Hypothesis Testing
Probability of Type I error = α
  also called significance level
Probability of Type II error = β
  1-β called power of the hypothesis test
9-1 Hypothesis Testing
Power, 1-β, can be interpreted as the probability of correctly rejecting a false null hypothesis.
We often compare statistical tests by comparing their power properties.
9-1 Hypothesis Testing
General procedure for hypothesis test
1. From the problem context, identify the parameter of interest.
2. State the null hypothesis, H0.
3. Specify an appropriate alternative hypothesis, H1.
4. Choose a significance level, α.
5. Determine an appropriate test statistic.
6. State the rejection region for the statistic.
7. Calculate test statistic.
8. Decide to accept or reject H0 and report conclusion in the
context of the problem
9-1 Hypothesis Testing
We will conduct hypothesis tests for
several population parameters:
Mean, μ, of any distribution, large sample
size (n ≥ 40)
Mean, μ, of normal distribution, small
sample size (n < 40)
Variance, σ², of normal distribution
Population proportion, p
9-2 Hypothesis Test on the Mean, μ,
for Large Sample
Null Hypothesis: H0: μ = μ0
Test Statistic:
$$Z_0 = \frac{\bar{X} - \mu_0}{S / \sqrt{n}}$$
Alternative Hypothesis    Rejection criteria
H1: μ ≠ μ0                |Z0| > z_{α/2}
H1: μ > μ0                Z0 > z_α
H1: μ < μ0                Z0 < -z_α
Rejection criteria in box on p.307 are incorrect
State hypothesis
Take sample
Calculate test stat
Compare to appropriate rejection criteria
Accept or reject H0
9-2 Hypothesis Test on the Mean, μ,
for Large Sample
Calculate Test Statistic (i.e. standardize sample
mean), Z0
a) Reject H0: μ = μ0 (i.e. believe H1: μ ≠ μ0) if test statistic very large or very small
b) Reject H0: μ = μ0 (i.e. believe H1: μ > μ0) if test statistic very large
c) Reject H0: μ = μ0 (i.e. believe H1: μ < μ0) if test statistic very small
How does this Hypothesis Test Work?
Suppose H0: μ = 10.0 H1: μ ≠ 10.0
We want to find critical values for sample of
size 25 with σ = 2
[Figure: distribution of X̄ centered at 10 with the two critical values still unknown (???). Acceptance Region between the critical values: Accept H0, μ = 10. Rejection Region in each tail: Reject H0, μ ≠ 10.]
How does this Hypothesis Test Work?
For α = 0.05, critical values are 1.96
standard deviations from μ0 = 10.0
Remember from CLT
$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$
So critical values are
$$10 - 1.96\left(\frac{2}{\sqrt{25}}\right) = 9.216$$
$$10 + 1.96\left(\frac{2}{\sqrt{25}}\right) = 10.784$$
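The critical-value calculation for H0: μ = 10 with σ = 2, n = 25, and α = 0.05 can be sketched in a few lines of Python. This is only an illustration; the function name `critical_values` is an assumption made for this example, and the normal quantile comes from the standard library rather than a z table.

```python
from math import sqrt
from statistics import NormalDist

def critical_values(mu0, sigma, n, alpha):
    """Two-sided critical values for the sample mean under H0: mu = mu0."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}, about 1.96 for alpha = 0.05
    half_width = z * sigma / sqrt(n)         # z times the standard error sigma / sqrt(n)
    return mu0 - half_width, mu0 + half_width

lower, upper = critical_values(mu0=10.0, sigma=2.0, n=25, alpha=0.05)
print(f"Accept H0 if {lower:.3f} <= sample mean <= {upper:.3f}")
```

A sample mean outside this interval falls in the rejection region.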
How does this Hypothesis Test Work?
Take a sample. If sample mean between 9.216
and 10.784 then we believe μ = 10 at α = 0.05
[Figure: distribution of X̄ centered at 10 with critical values 9.216 and 10.784. Acceptance Region between the critical values: Accept H0, μ = 10. Rejection Region in each tail: Reject H0, μ ≠ 10.]
How does this Hypothesis Test Work?
Similarly we can work with the standardized values.
H0: μ = 10 is equivalent to H0: Z0 = 0
H1: μ ≠ 10 is equivalent to H1: Z0 ≠ 0
Known population st. dev.:
$$Z_0 = \frac{\bar{X} - \mu_0}{\sigma / \sqrt{n}}$$
Large sample size (n ≥ 40):
$$Z_0 = \frac{\bar{X} - \mu_0}{S / \sqrt{n}}$$
How does this Hypothesis Test Work?
Take a sample. Find test statistic, Z0 (i.e. standardize
sample mean). If Z0 between -1.96 and 1.96 then we
believe Z0 = 0 or equivalently μ = 10 at α = 0.05
[Figure: distribution of Z0 centered at 0 with critical values -1.96 and 1.96. Acceptance Region between the critical values: Accept H0. Rejection Region in each tail: Reject H0.]
9-2 Hypothesis Test on the Mean, μ,
for Large Sample
The percent yield of a chemical process is being studied.
75 observations on yield have been taken and their sample
mean is 90.68% and sample variance 11.24%
Is it believable the true mean yield is 90.0%? Use α = 0.05
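One possible way to work the chemical yield example is sketched below, assuming a two-sided test (H0: μ = 90 versus H1: μ ≠ 90) since the question asks whether 90.0% is believable. The function name `z_test_two_sided` is hypothetical, and the p-value is included only for comparison with the critical-value decision.

```python
from math import sqrt
from statistics import NormalDist

def z_test_two_sided(xbar, mu0, s, n, alpha):
    """Large-sample two-sided Z test of H0: mu = mu0."""
    z0 = (xbar - mu0) / (s / sqrt(n))               # standardized sample mean
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)    # z_{alpha/2}
    p_value = 2 * (1 - NormalDist().cdf(abs(z0)))   # two-sided p-value
    return z0, z_crit, p_value, abs(z0) > z_crit

# Sample variance is 11.24, so S = sqrt(11.24)
z0, z_crit, p_value, reject = z_test_two_sided(
    xbar=90.68, mu0=90.0, s=sqrt(11.24), n=75, alpha=0.05
)
print(f"Z0 = {z0:.3f}, critical value = {z_crit:.3f}, p-value = {p_value:.3f}")
print("Reject H0" if reject else "Accept H0")
```

Under these assumptions |Z0| is below the critical value, so a true mean yield of 90.0% remains believable at α = 0.05.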
Stop here Tuesday?
Chapter 8 extra credit
EM 561 GW DePuy 388
9-2 Hypothesis Test on the Mean, μ,
for Large Sample
How to do hypothesis test on μ, with σ² known, in Minitab?
Can either enter each observation in a
worksheet column or input summarized data
Stat
Basic Statistics
1-sample Z
[Figure: Minitab 1-Sample Z dialog with entries for n, X̄, σ, and μ0. Default is H1: μ ≠ μ0; use Options to change to H1: μ > μ0 or H1: μ < μ0.]
9-2 Hypothesis Test on the Mean, μ,
for Large Sample
One-Sample Z
Test of mu = 90 vs not = 90
The assumed standard deviation = 3
N   Mean    SE Mean   95% CI           Z      P
5   90.68   1.34      (88.05, 93.31)   0.51   0.612
P-value
The hypothesis test can quickly show
decisions (accept or reject) for a variety of
significance levels, α
We can calculate the p-value associated
with a hypothesis test
P-value is the smallest level of significance
(i.e. α) that would lead to rejection of the null
hypothesis H0 with the given data.
P-value
The P-value is the probability that the test
statistic will take on a value that is at least as
extreme as the observed value of the statistic
when the null hypothesis H0 is true.
Thus, a P-value conveys much information
about the weight of evidence against H0, and
so a decision maker can draw a conclusion at
any specified level of significance.
P-value indicates how strongly we believe H0 or H1
P-value
We reject H0 when p-value < α
Suppose p-value = 0.0316
At α = 0.05 accept or reject H0?
At α = 0.01 accept or reject H0?
So when p-value very small (i.e. near 0) we
STRONGLY reject H0.
We REALLY do not believe H0
We REALLY do believe H1
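The p-value rule can be checked with a short sketch. It recomputes the two-sided p-value for the One-Sample Z output shown earlier (N = 5, mean 90.68, assumed σ = 3, μ0 = 90) and then applies the rule "reject H0 when p-value < α" to the p-value 0.0316 from this slide. The helper names `two_sided_p_value` and `decision` are assumptions for illustration.

```python
from math import sqrt
from statistics import NormalDist

def two_sided_p_value(z0):
    """Probability of a standard normal value at least as extreme as z0."""
    return 2 * (1 - NormalDist().cdf(abs(z0)))

def decision(p_value, alpha):
    """Apply the rule: reject H0 when p-value < alpha."""
    return "reject H0" if p_value < alpha else "accept H0"

# Test statistic from the One-Sample Z output: N = 5, mean 90.68, sigma = 3
z0 = (90.68 - 90.0) / (3.0 / sqrt(5))
p = two_sided_p_value(z0)
print(f"Z0 = {z0:.2f}, p-value = {p:.3f}")

# The same p-value can lead to different decisions at different alpha
for alpha in (0.05, 0.01):
    print(f"p-value 0.0316 at alpha = {alpha}: {decision(0.0316, alpha)}")
```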
Relationship between
Hypothesis Test and CI
Close relationship between CI and hypothesis test
H0: μ = μ0
H1: μ ≠ μ0
For the same sample and significance level, α, H0
will be rejected if and only if μ0 is NOT in the 100(1-α)% CI on μ
This relationship between hypothesis test and CI is
same whether population parameter is μ, σ², or p
Relationship between
Hypothesis Test and CI
For a specific significance level, α, a CI is
more informative than performing a
hypothesis test.
The decision made by the hypothesis test can
be deduced from the confidence interval by
looking to see whether μ0 is inside the
confidence interval or not.
Thus the confidence interval portrays the
decisions made by hypothesis test for all
possible values of μ0 (for a given α)
Relationship between
Hypothesis Test and CI
However, for a specific sample, a hypothesis
test is more informative than performing a CI
because the hypothesis test can quickly
show decisions (accept or reject) for a
variety of significance levels, α.
9-3 Hypothesis Test on the Mean, μ, of
Normal Distribution with Small Sample
Null Hypothesis: H0: μ = μ0
Test Statistic:
$$T_0 = \frac{\bar{X} - \mu_0}{S / \sqrt{n}}$$
Alternative Hypothesis    Rejection criteria
H1: μ ≠ μ0                |T0| > t_{α/2,n-1}
H1: μ > μ0                T0 > t_{α,n-1}
H1: μ < μ0                T0 < -t_{α,n-1}
State hypothesis
Take sample
Calculate test stat
Compare to appropriate rejection criteria
Accept or reject H0
9-3 Hypothesis Test on the Mean, μ, of
Normal Distribution with Small Sample
Figure 9-8 The reference distribution for H0: μ = μ0 with critical
region for (a) H1: μ ≠ μ0, (b) H1: μ > μ0, and (c) H1: μ < μ0.
a) Reject H0: μ = μ0 (i.e. believe H1: μ ≠ μ0) if test statistic very large or
very small
b) Reject H0: μ = μ0 (i.e. believe H1: μ > μ0) if test statistic very large
c) Reject H0: μ = μ0 (i.e. believe H1: μ < μ0) if test statistic very small
9-3 Hypothesis Test on the Mean, μ, of
Normal Distribution with Small Sample
The tar content in cigars is being studied. A sample of 30
cigars is taken and the sample mean is 1.529 mg and
S = 0.0566 mg. Assume the tar content is normally distributed.
Can you support a claim the mean tar content exceeds
1.5 mg? Use α = 0.05
How can we test to see if data is normally distributed?
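A minimal sketch of the cigar tar example follows, with H1: μ > 1.5 because that is the claim to be supported. The standard library has no t quantile function, so the critical value t_{0.05,29} = 1.699 is entered by hand from a t table; that constant and the function name `t_statistic` are assumptions of this example.

```python
from math import sqrt

def t_statistic(xbar, mu0, s, n):
    """Test statistic T0 for H0: mu = mu0 with a small sample from a normal population."""
    return (xbar - mu0) / (s / sqrt(n))

T_CRIT = 1.699  # t_{0.05, 29}, read from a t table (assumed, not computed here)

t0 = t_statistic(xbar=1.529, mu0=1.5, s=0.0566, n=30)
reject = t0 > T_CRIT  # one-sided test, H1: mu > 1.5

print(f"T0 = {t0:.3f}, critical value = {T_CRIT}")
print("Reject H0" if reject else "Accept H0")
```

Here T0 exceeds the critical value, so under these assumptions the data support the claim that mean tar content exceeds 1.5 mg.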
9-4 Hypothesis Test on the Variance,
σ², of a Normal Distribution
Null Hypothesis: H0: σ² = σ0²
Test Statistic:
$$\chi_0^2 = \frac{(n-1)S^2}{\sigma_0^2}$$
Alternative Hypothesis    Rejection criteria
H1: σ² ≠ σ0²              χ0² > χ²_{α/2,n-1} or χ0² < χ²_{1-α/2,n-1}
H1: σ² > σ0²              χ0² > χ²_{α,n-1}
H1: σ² < σ0²              χ0² < χ²_{1-α,n-1}
Rejection criteria in box on p.323 are incorrect
State hypothesis
Take sample
Calculate test stat
Compare to appropriate rejection criteria
Accept or reject H0
9-4 Hypothesis Test on the Variance,
σ², of a Normal Distribution
a) Reject H0: σ² = σ0² (i.e. believe H1: σ² ≠ σ0²) if test statistic very large
or very small
b) Reject H0: σ² = σ0² (i.e. believe H1: σ² > σ0²) if test statistic very large
c) Reject H0: σ² = σ0² (i.e. believe H1: σ² < σ0²) if test statistic very small
9-4 Hypothesis Test on the Variance,
σ², of a Normal Distribution
An engineer is testing the tire life for a new rubber
compound. 16 tires were tested to end-of-life on a road test.
The sample mean is 60139.7 km and S = 3645.94 km.
Assume the tire life is normally distributed.
Can you conclude that the standard deviation of tire life
exceeds 3500 km? Use α = 0.05
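The tire life example can be sketched as an upper-tailed chi-square test of H0: σ² = 3500² versus H1: σ² > 3500². The critical value χ²_{0.05,15} = 24.996 is typed in from a chi-square table, and the function name `chi_square_statistic` is made up for this illustration.

```python
def chi_square_statistic(n, s, sigma0):
    """Test statistic (n - 1) S^2 / sigma0^2 for H0: sigma^2 = sigma0^2."""
    return (n - 1) * s**2 / sigma0**2

CHI2_CRIT = 24.996  # chi-square_{0.05, 15}, read from a table (assumed, not computed here)

chi0 = chi_square_statistic(n=16, s=3645.94, sigma0=3500.0)
reject = chi0 > CHI2_CRIT  # upper-tailed test, H1: sigma > 3500

print(f"Chi-square statistic = {chi0:.3f}, critical value = {CHI2_CRIT}")
print("Reject H0" if reject else "Accept H0")
```

Under these assumptions the statistic is well below the critical value, so the data do not support the conclusion that the standard deviation exceeds 3500 km.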
9-5 Hypothesis Test on a
Population Proportion
Null Hypothesis: H0: p = p0
Test Statistic:
$$Z_0 = \frac{X - np_0}{\sqrt{np_0(1 - p_0)}}$$
Alternative Hypothesis    Rejection criteria
H1: p ≠ p0                |Z0| > z_{α/2}
H1: p > p0                Z0 > z_α
H1: p < p0                Z0 < -z_α
Rejection criteria in box on p.326 are incorrect
State hypothesis
Take sample
Calculate test stat
Compare to appropriate rejection criteria
Accept or reject H0
9-5 Hypothesis Test on a
Population Proportion
The fraction of defective integrated circuits is being studied.
A random sample of 300 circuits is tested, revealing 13
defectives.
Do the data support the claim the fraction of defective units
produced is less than 0.05? Use α = 0.05
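A short sketch of the defective-circuit example, treating it as a lower-tailed test (H0: p = 0.05 versus H1: p < 0.05). The function name `proportion_z_statistic` is hypothetical; the normal quantile is taken from the standard library.

```python
from math import sqrt
from statistics import NormalDist

def proportion_z_statistic(x, n, p0):
    """Test statistic (X - n p0) / sqrt(n p0 (1 - p0)) for H0: p = p0."""
    return (x - n * p0) / sqrt(n * p0 * (1 - p0))

alpha = 0.05
z0 = proportion_z_statistic(x=13, n=300, p0=0.05)
z_crit = NormalDist().inv_cdf(1 - alpha)  # z_alpha, about 1.645
reject = z0 < -z_crit                     # lower-tailed test, H1: p < 0.05

print(f"Z0 = {z0:.4f}, reject H0 if Z0 < {-z_crit:.3f}")
print("Reject H0" if reject else "Accept H0")
```

Z0 is negative but not below the critical value, so in this sketch the data do not support the claim that the defective fraction is less than 0.05.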
Hypothesis Testing
We discussed several different hypothesis tests.
How do we know which test is appropriate for a
particular problem?
What population parameter is being estimated?
μ? σ²? p?
If μ being estimated, is σ² known?
If σ² known or large sample, then use Z test statistic
If σ² unknown and small sample, then use t test
statistic
If σ² being estimated, use Chi-squared test statistic
If p being estimated, use Z test statistic
2-sided or 1-sided hypothesis test?
9-7 Testing for Goodness of Fit
Is a particular distribution a good fit to the data?
H0: data is well fit by proposed distribution
H1: data is not well fit by proposed distribution
1. Divide proposed distribution into intervals
a. # intervals ≥ 5
b. # expected observations in each interval ≥ 5
c. Therefore # observations ≥ 25
2. Count number of actual observations in each interval
3. Find number of expected observations in each interval for
proposed distribution
4. Do hypothesis test to determine goodness of fit
9-7 Testing for Goodness of Fit
The test is based on the Chi-square distribution.
Assume there is a sample of size n from a
population whose probability distribution is
unknown.
Let Oi be the observed frequency in the ith class
interval.
Let Ei be the expected frequency in the ith class
interval.
Test statistic:
$$\chi_0^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$
9-7 Testing for Goodness of Fit
The test statistic is
$$\chi_0^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$
Reject H0 if
$$\chi_0^2 > \chi^2_{\alpha,\,k-p-1}$$
Where k = # intervals
p = # distribution parameters estimated by
sample statistics
9-7 Testing for Goodness of Fit
Is it believable the following data (n=50) came
from a U(36,78) distribution?
77.72 45.64 45.90 72.69 43.69
58.19 57.30 59.59 76.35 69.98
43.12 61.87 74.61 57.66 45.00
44.22 77.51 66.22 74.85 68.82
43.96 36.19 76.19 46.39 55.71
57.95 55.96 43.93 63.46 54.54
59.95 76.79 54.92 63.73 55.46
36.93 62.64 63.31 46.34 67.13
48.90 61.37 67.64 57.10 76.26
57.02 42.78 42.89 56.96 62.77
EM 561 GW DePuy 410
9-7 Testing for Goodness of Fit
H0: data well fit by U(36,78)
H1: data not well fit by U(36,78)
1. Divide U(36,78) into intervals
2. Count # of actual observations in each interval
3. Find expected # observations in each interval for
U(36,78) distribution
4. Do Goodness-of-Fit Hypothesis test
EM 561 GW DePuy 411
64
9-7 Testing for Goodness of Fit
1. Divide U(36,78) into intervals
2. Count # of actual observations in each interval
interval actual # obs
[36.0, 44.4) 9
[44.4, 52.8) 6
[52.8, 61.2) 14
[61.2, 69.6) 11
[69.6, 78.0] 10
EM 561 GW DePuy 412
9-7 Testing for Goodness of Fit
3. Find expected # observations in each interval for
U(36,78) distribution
P(36.0<X<44.4) = (44.4 - 36.0)(1/42) = 0.20
So, E = nP = 50(0.20) = 10
Repeat for each interval.
[Figure: U(36,78) density of height 1/42 between 36.0 and 78.0, with the interval from 36.0 to 44.4 marked.]
9-7 Testing for Goodness of Fit
interval        actual # obs (O)   Expected # obs (E)
[36.0, 44.4)    9                  10
[44.4, 52.8)    6                  10
[52.8, 61.2)    14                 10
[61.2, 69.6)    11                 10
[69.6, 78.0]    10                 10
$$\chi_0^2 = \sum_{i} \frac{(O_i - E_i)^2}{E_i}$$
$$\chi_0^2 = \frac{(9-10)^2}{10} + \frac{(6-10)^2}{10} + \frac{(14-10)^2}{10} + \frac{(11-10)^2}{10} + \frac{(10-10)^2}{10} = 3.4$$
9-7 Testing for Goodness of Fit
χ0² = 3.4
Critical value (at α = 0.05) = χ²_{0.05,k-p-1} = χ²_{0.05,5-0-1} = 9.49
Note: 0 distribution parameters estimated from data;
we were told which distribution to try, U(36,78)
Since test statistic = 3.4 < critical value = 9.49, we
ACCEPT H0
We do believe a U(36,78) distribution is a good fit to
this data
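The U(36,78) goodness-of-fit calculation can be reproduced with the sketch below. It starts from the observed counts in the five intervals, derives the expected counts from the uniform distribution, and uses the critical value 9.49 quoted on the slide. The function name `chi_square_gof` is an assumption for illustration.

```python
def chi_square_gof(observed, expected):
    """Goodness-of-fit statistic: sum of (O - E)^2 / E over the intervals."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

edges = [36.0, 44.4, 52.8, 61.2, 69.6, 78.0]  # five equal-width intervals
observed = [9, 6, 14, 11, 10]                 # actual counts from the data (n = 50)
n = sum(observed)

# Expected count in each interval for U(36, 78): n * (interval width) * (1 / 42)
a, b = 36.0, 78.0
expected = [n * (hi - lo) / (b - a) for lo, hi in zip(edges, edges[1:])]

chi0 = chi_square_gof(observed, expected)
CHI2_CRIT = 9.49  # chi-square_{0.05, 5-0-1} as given on the slide

print(f"Chi-square statistic = {chi0:.2f}, critical value = {CHI2_CRIT}")
print("Reject H0" if chi0 > CHI2_CRIT else "Accept H0")
```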
9-7 Testing for Goodness of Fit
Number of customers arriving at an ATM is
hypothesized to follow a Poisson distribution.
Data is collected over several days and the
number of customers to arrive at the ATM each
minute is summarized as follows
# arrivals per minute   Observed Frequency
0                       146
1                       132
2                       72
3                       36
4                       11
5                       3
9-7 Testing for Goodness of Fit
Estimate mean of hypothesized Poisson Distribution
from the observed data
E(X) = λ = 1.1075 Remember: E(X) = Σ x P(x)
# arrivals per minute, x   Observed Frequency   Observed P(X)
0                          146                  0.3650
1                          132                  0.3300
2                          72                   0.1800
3                          36                   0.0900
4                          11                   0.0275
5                          3                    0.0075
Total                      400                  E(X) = 1.1075
9-7 Testing for Goodness of Fit
H0: Data is well fit by Poisson with λ = 1.1075
H1: Data is not well fit by Poisson with λ = 1.1075
Now calculate expected frequency (for Poisson
with λ = 1.1075) in each interval
# arrivals per minute, x   P(X) for Poisson with λ = 1.1075   Expected Frequency with n = 400
0                          0.3304                             132.15
1                          0.3659                             146.36
2                          0.2026                             81.05
3                          0.0748                             29.92
4                          0.0207                             8.28
5                          0.0056                             2.24
9-7 Testing for Goodness of Fit
Since last interval has < 5 expected observations,
combine it with previous interval
# arrivals per minute, x   P(X) for Poisson with λ = 1.1075   Expected Frequency with n = 400
0                          0.3304                             132.15
1                          0.3659                             146.36
2                          0.2026                             81.05
3                          0.0748                             29.92
4                          0.0263                             10.52
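The estimate of λ and the expected frequencies can be reproduced with a short sketch. It assumes that, after combining intervals, the last class represents "4 or more arrivals", so its probability is taken as one minus the sum of the others; this matches the 0.0263 and 10.52 shown in the table. The helper name `poisson_pmf` is hypothetical.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson distribution with mean lam."""
    return exp(-lam) * lam**x / factorial(x)

# Observed arrivals per minute and their frequencies
observed = {0: 146, 1: 132, 2: 72, 3: 36, 4: 11, 5: 3}
n = sum(observed.values())

# Estimate the Poisson mean from the data: E(X) = sum of x * P(x)
lam = sum(x * count for x, count in observed.items()) / n

# Probabilities for 0..3 arrivals, then "4 or more" as the remainder
probs = [poisson_pmf(x, lam) for x in range(4)]
probs.append(1 - sum(probs))
expected = [n * p for p in probs]

print(f"lambda = {lam:.4f}")
for x, (p, e) in enumerate(zip(probs, expected)):
    print(f"{x}: P = {p:.4f}, expected frequency = {e:.2f}")
```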
9-7 Testing for Goodness of Fit
Now perform hypothesis test
H0: Data is well fit by Poisson with λ = 1.1075
H1: Data is not well fit by Poisson with λ = 1.1075
Calculate test stat, χ0²
Compare to appropriate rejection criteria
Accept or reject H0
# arrivals per minute   Observed Frequency, O   Expected Frequency, E
0                       146                     133.82
1                       132                     146.53
2                       72                      80.22
3                       36                      29.28
4                       14                      10.15
9-7 Testing for Goodness of Fit
χ0² = 6.3950
Critical value (at α = 0.05) = χ²_{0.05,k-p-1} = χ²_{0.05,5-1-1} = 7.81
$$\chi_0^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$
# arrivals per minute   Observed Frequency, O   Expected Frequency, E   χ0²
0                       146                     133.82                  1.1094
1                       132                     146.53                  1.4405
2                       72                      80.22                   0.8431
3                       36                      29.28                   1.5413
4                       14                      10.15                   1.4606
Total                                                                   6.3950
9-7 Testing for Goodness of Fit
Since test statistic χ0² = 6.3950 < critical
value χ²_{0.05,5-1-1} = 7.81, we ACCEPT H0
We do believe a Poisson λ = 1.1075
distribution is a good fit to this data
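The test statistic for the ATM example can be checked with the sketch below, which takes the observed and expected frequencies exactly as printed in the final table and the critical value 7.81 from the slide. Because the table values are rounded, the result agrees with 6.3950 only to about two decimal places. The variable names are assumptions for illustration.

```python
# Observed and expected frequencies as printed in the final table
observed = [146, 132, 72, 36, 14]
expected = [133.82, 146.53, 80.22, 29.28, 10.15]

# Contribution of each class to the statistic: (O - E)^2 / E
terms = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi0 = sum(terms)

k = len(observed)        # number of intervals
p = 1                    # one parameter (lambda) estimated from the data
df = k - p - 1           # degrees of freedom = 3
CHI2_CRIT = 7.81         # chi-square_{0.05, 3} as given on the slide

print(f"Chi-square statistic = {chi0:.4f} with {df} degrees of freedom")
print("Reject H0" if chi0 > CHI2_CRIT else "Accept H0")
```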
More Chapter 9 examples
A particular type of gasoline is supposed to have a mean
octane rating greater than 90%. Five measurements are
taken of the octane rating as follows: 90.8%, 88.4%,
89.2%, 91.6%, 92.1%
Can you conclude that the mean octane rating is greater
than 90%? Use α = 0.10
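One way to set up the octane example from the raw measurements is sketched below, assuming the ratings are normally distributed so that the small-sample t test applies. The critical value t_{0.10,4} = 1.533 is entered from a t table, and the variable names are chosen for this illustration only.

```python
from math import sqrt
from statistics import mean, stdev

octane = [90.8, 88.4, 89.2, 91.6, 92.1]
mu0 = 90.0

n = len(octane)
xbar = mean(octane)   # sample mean
s = stdev(octane)     # sample standard deviation (n - 1 in the denominator)

t0 = (xbar - mu0) / (s / sqrt(n))
T_CRIT = 1.533        # t_{0.10, 4}, read from a t table (assumed, not computed here)
reject = t0 > T_CRIT  # one-sided test, H1: mu > 90

print(f"mean = {xbar:.2f}, S = {s:.4f}, T0 = {t0:.3f}")
print("Reject H0" if reject else "Accept H0")
```

In this sketch T0 is below the critical value, so the five measurements are not enough to conclude that the mean octane rating is greater than 90%.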
More Chapter 9 examples
A rivet is to be inserted into a hole. A random sample of n
= 15 parts is selected, and the hole diameter is measured.
The sample standard deviation of the hole diameter
measurements is S = 0.008 mm. If the variance of the hole
diameter is too large, an unacceptable proportion of rivets
will not fit properly in the hole. Assume the hole diameter
is normally distributed.
Is it believable the population standard deviation is less than
0.012 mm? Use α = 0.01
More Chapter 9 examples
A new concrete mix is being designed to provide adequate
compressive strength for concrete blocks. The specs call
for the blocks to have a mean compressive strength
greater than 1350 kPa. A sample of 100 blocks is
produced and tested. The mean strength of the sample is
1360 kPa and the standard deviation is 70 kPa.
Is it believable the blocks meet spec? Use α = 0.05
Stop here Wednesday
Chapter 9 extra credit
Regression Models
An experimenter is often interested in how a particular
variable depends upon one or more of the other variables.
Modeling is often performed by finding a functional
relationship between the expected value of a dependent
variable, Y, and a set of independent variables, Xi.
How does X affect Y? What is the effect of X on Y?
Linear Regression is a modeling technique in which the
expected value of a dependent variable is modeled as a
linear combination of a set of independent variables.
Many problems in engineering and science involve exploring
the relationships between two or more variables.
IE 360 GW DePuy 429
73
Regression Analysis
Example
A company that makes paper grocery bags is
interested in improving the tensile strength of their
bags. Specifically they are interested in the
relationship between the hardwood concentration in
the pulp and the tensile strength of the bag.
Regression analysis can be used to build a model
to relate tensile strength (Y) to hardwood
concentration (X).
EM 561 GW DePuy 430
Linear Regression
Fit a simple linear regression model to paper bag example
[Figure: Scatterplot of strength vs hardwood. x-axis: hardwood (5.0 to 20.0); y-axis: strength (5 to 25).]
11-1 Empirical Models
Based on the scatter diagram, it is probably reasonable to
assume that the variable Y is related to X by the following
simple linear regression model:
$$Y = \beta_0 + \beta_1 X + \epsilon$$
where the slope, β1, and intercept, β0, of the line are
called regression coefficients and where ε is the random
error term. We assume the mean and variance of ε are 0
and σ²
11-2 Simple Linear Regression
The case of simple linear regression considers
a single regressor or predictor x and a
dependent or response variable Y.
How to fit line to data? How to find values for
slope, β1, and intercept, β0?
Find slope, β1, and intercept, β0, of best fitting line.
Simple Linear Regression
Used when demand is linearly increasing or
decreasing over time
Find best fitting line to data
[Figure: plot of Demand (25 to 95) versus Time (0 to 25).]
11-2 Simple Linear Regression
The method of least squares is used to
estimate the parameters, β0 and β1, by minimizing
the sum of the squares of the vertical deviations in
Figure 11-3.
Figure 11-3 Deviations of the data from the estimated regression model.
11-2 Simple Linear Regression
The sum of the squares of the deviations of the
observations from the true regression line is
$$L = \sum_{i=1}^{n} \epsilon_i^2 = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2$$
How to find β0 and β1 that minimize L?
Simple Linear Regression
Fitted or estimated regression line is
$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$$
Simple Linear Regression
Fit a simple linear regression model to paper bag example
Hardwood   Strength
x          y      x²     y²     xy
5          7      25     49     35
5          8      25     64     40
5          15     25     225    75
5          11     25     121    55
5          9      25     81     45
5          10     25     100    50
10         12     100    144    120
10         17     100    289    170
10         13     100    169    130
10         18     100    324    180
10         19     100    361    190
10         15     100    225    150
15         14     225    196    210
15         18     225    324    270
15         19     225    361    285
15         17     225    289    255
15         16     225    256    240
15         18     225    324    270
20         19     400    361    380
20         25     400    625    500
20         22     400    484    440
20         23     400    529    460
20         18     400    324    360
20         20     400    400    400
Total      300    383    4500   6625   5310
Avg        12.5   15.96
Simple Linear Regression
$$\hat{\beta}_1 = \frac{5310 - \frac{(383)(300)}{24}}{4500 - \frac{300^2}{24}} = 0.697$$
$$\hat{\beta}_0 = 15.96 - 0.697(12.5) = 7.25$$
Y = 7.25 + 0.697*X
Strength = 7.25 + 0.697*Hardwood
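The hand calculation of the slope and intercept for the paper bag data can be sketched as follows. The data are copied from the hardwood/strength table, and the function name `least_squares` is an assumption for this example.

```python
def least_squares(x, y):
    """Return (intercept, slope) of the least squares line through (x, y)."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    slope = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x**2 / n)
    intercept = sum_y / n - slope * sum_x / n
    return intercept, slope

# Hardwood concentration (x) and tensile strength (y) from the table
hardwood = [5] * 6 + [10] * 6 + [15] * 6 + [20] * 6
strength = [7, 8, 15, 11, 9, 10,
            12, 17, 13, 18, 19, 15,
            14, 18, 19, 17, 16, 18,
            19, 25, 22, 23, 18, 20]

b0, b1 = least_squares(hardwood, strength)
print(f"Strength = {b0:.2f} + {b1:.3f} * Hardwood")
```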
Regression Analysis in Minitab
Data in 2 columns: dependent (Y) & independent(X)
Stat
Regression
Regression
IE 360 GW DePuy 440
Regression Analysis in Minitab
Regression Analysis: strength versus hardwood
The regression equation is
strength = 7.25 + 0.697 hardwood
Predictor Coef SE Coef T P
Constant 7.250 1.301 5.57 0.000
hardwood 0.69667 0.09501 7.33 0.000
S = 2.60201 R-Sq = 71.0% R-Sq(adj) = 69.6%
Analysis of Variance
Source DF SS MS F P
Regression 1 364.01 364.01 53.76 0.000
Residual Error 22 148.95 6.77
Total 23 512.96
EM 561 GW DePuy 441
79
Finding the regression equation in
Excel
Put data in 2 columns
dependent (Y)
Independent (X)
EM 561 GW DePuy 442
Regression in Excel
In Excel
> Data Analysis
>Regression
EM 561 GW DePuy 443
80
Regression in Excel
strength = 7.25 + 0.697 hardwood
EM 561 GW DePuy 444
Prediction of New Observations
If x0 is the value of the regressor variable of interest,
$$\hat{Y}_0 = \hat{\beta}_0 + \hat{\beta}_1 x_0$$
is the point estimator of the new or future value of
the response, Y0.
Predict bag strength for a hardwood concentration of
8%
Strength = 7.25 + 0.697(8) = 12.826
How did I know to use 8 and not 0.08?
Prediction of New Observations
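A 100(1-α)% prediction interval on a future
observation Y0 at the value x0 is given by
$$\hat{Y}_0 \pm t_{\alpha/2,\,N-2} \sqrt{\hat{\sigma}^2 \left[ \frac{N+1}{N} + \frac{(x_0 - \bar{x})^2}{\sum x^2 - \frac{(\sum x)^2}{N}} \right]}$$
where
$$\hat{Y}_0 = \hat{\beta}_0 + \hat{\beta}_1 x_0$$
and where
$$\hat{\sigma}^2 = \frac{\sum Y^2 - \hat{\beta}_0 \sum Y - \hat{\beta}_1 \sum XY}{N - 2} = MS_E$$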
Prediction of New Observations
Find a 95% PI for the tensile strength at a
hardwood concentration of x0 = 8%
$$\hat{Y}_0 = 12.826$$
$$\hat{\sigma}^2 = \frac{6625 - 7.25(383) - 0.697(5310)}{24 - 2} = 6.69$$
$$12.826 \pm 2.074 \sqrt{6.69 \left[ \frac{25}{24} + \frac{(8 - 12.5)^2}{4500 - \frac{300^2}{24}} \right]}$$
(7.28, 18.37)
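The prediction interval can also be computed from the column totals of the data table, as in the sketch below. It keeps the coefficients unrounded, so the result lands closer to the Minitab interval (7.245, 18.402) than to the hand calculation, which uses the rounded slope 0.697. The critical value t_{0.025,22} = 2.074 is the one used on the slide; the function name `prediction_interval` is hypothetical.

```python
from math import sqrt

# Column totals from the paper bag data table
n = 24
sum_x, sum_y = 300.0, 383.0
sum_x2, sum_y2, sum_xy = 4500.0, 6625.0, 5310.0

# Least squares estimates (unrounded)
sxx = sum_x2 - sum_x**2 / n
b1 = (sum_xy - sum_x * sum_y / n) / sxx
b0 = sum_y / n - b1 * sum_x / n

# Estimate of the error variance (MS_E)
sigma2_hat = (sum_y2 - b0 * sum_y - b1 * sum_xy) / (n - 2)

T_CRIT = 2.074  # t_{0.025, 22}, the value used on the slide

def prediction_interval(x0):
    """95% prediction interval for a future observation at x0."""
    y0 = b0 + b1 * x0
    half_width = T_CRIT * sqrt(
        sigma2_hat * ((n + 1) / n + (x0 - sum_x / n) ** 2 / sxx)
    )
    return y0 - half_width, y0 + half_width

lower, upper = prediction_interval(8.0)
print(f"MS_E = {sigma2_hat:.2f}, 95% PI at x0 = 8: ({lower:.3f}, {upper:.3f})")
```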
Regression Analysis in Minitab
Prediction intervals in Minitab
Options menu within Regression
Enter X0 value for PI
Predicted Values for New Observations
New Obs   Fit      SE Fit   95% CI             95% PI
1         12.823   0.682    (11.409, 14.237)   (7.245, 18.402)
Values of Predictors for New Observations
New Obs   hardwood
1         8.00
Regression Analysis in Minitab
Difference between CI and PI:
CI refers to the true mean response at X0. CI based only on data used to fit
the regression model.
PI for future observation which is independent of observations used to
develop regression model. Therefore more error (fitted model error and
error associated with projecting into future) and wider interval.
Linear Regression
A few questions to answer about this linear
regression analysis
How good a fit is this line to the data?
Does the independent variable have a significant
effect on the dependent variable?
Does X have a significant effect on Y?
Does hardwood concentration have a significant effect
on paper tensile strength?
EM 561 GW DePuy 450
Adequacy of the Regression Model
Coefficient of Determination (R²)
The quantity R² is called the coefficient of
determination and is often used to judge the fit of
a regression model
0 ≤ R² ≤ 1 or 0% ≤ R² ≤ 100%
We often refer to R² as the amount of variability in
the data explained or accounted for by the
regression model.
R²
Higher R² values indicate a better fit of line to data
We often refer to R² as the amount of variability in
the data explained or accounted for by the
regression model
No universal cut-off or threshold R² value to define
good fitting model versus bad fitting model
[Figure: two plots of Y versus X, one labeled Higher R² and one labeled Lower R².]
Regression Analysis: strength versus hardwood
The regression equation is
strength = 7.25 + 0.697 hardwood
Predictor Coef SE Coef T P
Constant 7.250 1.301 5.57 0.000
hardwood 0.69667 0.09501 7.33 0.000
S = 2.60201 R-Sq = 71.0% R-Sq(adj) = 69.6%
Analysis of Variance
Source DF SS MS F P
Regression 1 364.01 364.01 53.76 0.000
Residual Error 22 148.95 6.77
Total 23 512.96
Regression Analysis in Minitab
EM 561 GW DePuy 453
85
Regression in Excel
EM 561 GW DePuy 454
Significance of Regression
An important part of assessing the adequacy of a linear
regression model is testing statistical hypotheses about the
model parameters.
We can form a CI on both the slope, β1, and the intercept, β0
The most important CI is on the slope, β1, since this CI tests
the significance of regression (i.e. Does the independent
variable have a significant effect on the dependent
variable?).
A slope of 0 indicates there is no linear relationship between
X and Y.
Significance of Regression
Figure 11-5 β1 = 0 indicating no linear relationship
between X and Y.
Significance of Regression
Figure 11-6 β1 ≠ 0 indicating a linear relationship
between X and Y.
Significance of Regression
H0: β1 = 0
H1: β1 ≠ 0
In Excel
given CI for slope. Believe β1 ≠ 0 (i.e. X DOES have
significant effect on Y) if 0 not in CI.
given t test statistic for slope. Reject H0 (i.e. X DOES have
significant effect on Y) if |T| > t_{α/2,n-2} or if P-value < α
In Minitab
given t test statistic for slope. Reject H0 (i.e. X DOES have
significant effect on Y) if |T| > t_{α/2,n-2} or if P-value < α
Regression in Excel
β1 ≠ 0? Remember: when β1 ≠ 0 then X significantly affects Y
Use CI - does CI contain 0?
If CI does not contain 0, then we believe X significantly affects Y
Regression Analysis: strength versus hardwood
The regression equation is
strength = 7.25 + 0.697 hardwood
Predictor Coef SE Coef T P
Constant 7.250 1.301 5.57 0.000
hardwood 0.69667 0.09501 7.33 0.000
S = 2.60201 R-Sq = 71.0% R-Sq(adj) = 69.6%
Analysis of Variance
Source DF SS MS F P
Regression 1 364.01 364.01 53.76 0.000
Residual Error 22 148.95 6.77
Total 23 512.96
460
Regression Analysis in Minitab
IE 360 GW DePuy 460
Hardwood concentration DOES have a significant effect on
bag strength
Another Example
How does weight (in lbs) affect time to run 100
yards?
dependent (Y)?
Independent (X)?
weight run time
120 16
142 37
137 27
129 22
137 31
112 12
148 39
132 26
126 17
EM 561 GW DePuy 461
89
Plot data in Excel
Straight line good fit?
Do Excel Scatter plot
[Figure: scatter plot of run time (sec), 0 to 45, versus weight (lbs), 80 to 160.]
Find Regression Equation
By Hand
In Excel
> Tools
> Data Analysis
>Regression
In Minitab
EM 561 GW DePuy 463
90
Excel Regression Output
Time = -82.1 + 0.82(weight)
Line a good fit? R² = 0.94
95% CI on β1
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.969596224
R Square 0.940116838
Adjusted R Square 0.931562101
Standard Error 2.453310621
Observations 9
ANOVA
df SS MS F Significance F
Regression 1 661.4244245 661.4244 109.8943 1.5663E-05
Residual 7 42.13113102 6.018733
Total 8 703.5555556
Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept -82.0970885 10.2700248 -7.993855 9.16E-05 -106.38184 -57.81234
weight 0.816461366 0.077883967 10.48305 1.57E-05 0.63229505 1.0006277
EM 561 GW DePuy 464
Excel Regression Output
Is this line a good fit? R² = 0.94
Does weight significantly affect run time?
Is weight useful in predicting run time?
95% CI on β1 = (0.63, 1.00)
Since CI does not contain 0, we believe
β1 ≠ 0, therefore weight does significantly
affect run time.
Predictions
Predict run time for weight of 135 lbs.
Predict run time for weight of 50 lbs.
EM 561 GW DePuy 466
IE 563 Dr. G.W. DePuy, UofL
Predictions
Predictions only meaningful in general
range of original data.
Be very careful extrapolating regression
model
[Figure: plot of Y versus X.]
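The two predictions asked for above can be sketched by refitting the line to the weight and run time data. The weight of 135 lbs lies inside the range of the data (112 to 148 lbs), while 50 lbs lies far outside it, which is where extrapolation breaks down. The function name `fit_line` and the variable names are assumptions for this illustration.

```python
def fit_line(x, y):
    """Least squares fit; returns (intercept, slope, R-squared)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((a - xbar) ** 2 for a in x)
    syy = sum((b - ybar) ** 2 for b in y)
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    r_squared = sxy**2 / (sxx * syy)
    return intercept, slope, r_squared

weight = [120, 142, 137, 129, 137, 112, 148, 132, 126]   # lbs
run_time = [16, 37, 27, 22, 31, 12, 39, 26, 17]          # seconds

b0, b1, r2 = fit_line(weight, run_time)
print(f"Time = {b0:.1f} + {b1:.2f}(weight), R-squared = {r2:.2f}")

for w in (135, 50):
    inside = min(weight) <= w <= max(weight)
    note = "within data range" if inside else "extrapolation, not meaningful"
    print(f"weight {w} lbs: predicted time = {b0 + b1 * w:.1f} sec ({note})")
```

The prediction at 50 lbs comes out negative, which is a concrete reminder that the fitted line should not be trusted outside the range of the original data.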
Another Example
How does outdoor temperature affect time
(in min) to unload truck?
dependent (Y)?
Independent (X)?
obs # temp unload time
7 40 37
6 40 29
12 40 35
2 50 39
13 50 40
5 50 44
16 60 47
11 60 44
1 60 46
9 70 49
4 70 53
14 70 55
3 80 46
10 80 47
18 80 43
15 90 36
17 90 37
8 90 34
IE 563 Dr. G.W. DePuy, UofL
469
Truck Unload Example
Regression Analysis: unload versus temp
The regression equation is
unload = 36.8 + 0.0848 temp
Predictor Coef SE Coef T P
Constant 36.768 6.442 5.71 0.000
temp 0.08476 0.09586 0.88 0.390
S = 6.94574 R-Sq = 4.7% R-Sq(adj) = 0.0%
Good regression model for this data? Why?
Does outdoor temperature affect time (in min) to
unload truck?
93
IE 563 Dr. G.W. DePuy, UofL
470
Truck Unload Example
Plot data: is a straight line a good fit?
[Figure: Scatterplot of unload vs temp. x-axis: temp (40 to 90); y-axis: unload (30 to 55).]
Truck Unload Example
Include higher order terms in model, e.g.
temp² term
Unload time = β0 + β1 temp + β2 temp²
Now regression model has more than one
term: called Multiple Linear Regression
Multiple Linear Regression
General multiple linear regression model with k
regressors
Y = β0 + β1 X1 + β2 X2 + … + βk Xk + ε
For our truck unload example
X1 = temp
X2 = temp²
Model a linear function of parameters β0, β1, β2, …, βk
Testing for significance of regression now becomes:
H0: β1 = β2 = … = βk = 0
H1: βj ≠ 0 for at least one j
Multiple Linear Regression
Do not add too many terms to model
Adding terms will always make R² increase
If we had 8 data points, a model with 7 higher
order terms will fit perfectly, i.e. R² = 100%
Y = β0 + β1 X + β2 X² + β3 X³ + … + β7 X⁷
But too many terms will be difficult to interpret and
will not give good predictions
So keep model as low order as possible
Multiple Linear Regression
For Multiple Linear Regression use Adjusted R²
to measure model adequacy
Adjusted R² will not automatically increase as
terms added to model.
R² adjusted by the number of terms in model
$$R^2_{adj} = 1 - \left(\frac{n-1}{n-p}\right)(1 - R^2)$$
Where n = # data points and p = # terms in
model (including constant)
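The adjusted R² formula is easy to check numerically. The sketch below applies it to the quadratic truck unload model reported in the regression output of this deck (R² = 0.811335052, n = 18 data points, p = 3 terms including the constant). The function name `adjusted_r_squared` is an assumption for illustration.

```python
def adjusted_r_squared(r_squared, n, p):
    """Adjusted R^2 for n data points and p model terms (including the constant)."""
    return 1 - (n - 1) / (n - p) * (1 - r_squared)

# Quadratic truck unload model: constant, temp, temp^2
r2_adj = adjusted_r_squared(r_squared=0.811335052, n=18, p=3)
print(f"Adjusted R-squared = {r2_adj:.4f}")
```

The result agrees with the R-Sq(adj) of 78.6% shown in the Minitab output for that model.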
Truck Unload Example
Include temp² term
Add temp² column to Minitab worksheet
Calc
Calculator
Truck Unload Example
Include temp² term in Regression model
Truck Unload Example
Regression Analysis: unload versus temp, temp^2
The regression equation is
unload = - 55.7 + 3.14 temp - 0.0235 temp^2
Predictor Coef SE Coef T P
Constant -55.71 12.22 -4.56 0.000
temp 3.1413 0.3945 7.96 0.000
temp^2 -0.023512 0.003015 -7.80 0.000
S = 3.19108 R-Sq = 81.1% R-Sq(adj) = 78.6%
Analysis of Variance
Source DF SS MS F P
Regression 2 656.87 328.43 32.25 0.000
Residual Error 15 152.75 10.18
Total 17 809.61
Model a good fit
to data?
Does temperature
affect truck
unload time?
H0: β1 = β2 = 0
Truck Unload Example
Predict unload time for temperature of 55
Unload time = -55.7 + 3.14(55) - 0.0235(55²) = 45.94 min
95% PI on unload time for temperature of 55
In Minitab, Options menu of Regression Menu, input values
for each model term
temp = 55
temp² = 55² = 3025
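For readers without Minitab or Excel, the quadratic model can be fit with a small pure-Python sketch that solves the least squares normal equations directly. This is not how Minitab computes the fit; it is only one simple way to reproduce the coefficients and the prediction at 55 degrees. The helper names `solve` and `fit_linear_model` are made up for this example.

```python
def solve(a, b):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def fit_linear_model(columns, y):
    """Least squares coefficients (constant first) via the normal equations."""
    rows = [[1.0] + [float(v) for v in vals] for vals in zip(*columns)]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)

# Truck unload data: three observations at each temperature
temp = [40] * 3 + [50] * 3 + [60] * 3 + [70] * 3 + [80] * 3 + [90] * 3
unload = [37, 29, 35, 39, 40, 44, 47, 44, 46,
          49, 53, 55, 46, 47, 43, 36, 37, 34]

# Regressors: temp and temp^2
b0, b1, b2 = fit_linear_model([temp, [t**2 for t in temp]], unload)
predicted = b0 + b1 * 55 + b2 * 55**2

print(f"unload = {b0:.1f} + {b1:.2f} temp + {b2:.4f} temp^2")
print(f"Predicted unload time at 55 degrees = {predicted:.2f} min")
```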
Truck Unload Example
Predicted Values for New Observations
New
Obs Fit SE Fit 95% CI 95% PI
1 45.937 1.046 (43.708, 48.166) (38.779, 53.094)
Values of Predictors for New Observations
New
Obs temp temp^2
1 55.0 3025
Truck Unload Example in Excel
Columns for temp, temp
2
,
unload time
Include columns for both
temp & temp
2
EM 561 GW DePuy 481
99
Truck Unload Example in Excel
EM 561 GW DePuy 482
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.90074139
R Square 0.811335052
Adjusted R Square 0.786179726
Standard Error 3.191083809
Observations 18
ANOVA
df SS MS F Significance F
Regression 2 656.865873 328.4329365 32.25301233 3.69559E-06
Residual 15 152.7452381 10.18301587
Total 17 809.6111111
Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept -55.7119048 12.22389608 -4.55762258 0.000377434 -81.7665224 -29.6572871
temp 3.141309524 0.394454127 7.96368782 9.10675E-07 2.300550459 3.982068589
temp^2 -0.0235119 0.003015291 -7.79755802 1.179E-06 -0.02993884 -0.01708496
Model a good fit
to data?
Does temperature
affect truck
unload time?
483
Multiple Linear Regression
Another example
A pastry chef is interested in
how number of eggs and
amount of milk affect cake
height.
Dependent (Y)?
Independent (X)?
eggs milk height
1 0.5 2.3
1 0.5 2.1
1 0.5 2.5
2 0.5 3.4
2 0.5 3.3
2 0.5 3
3 0.5 4.2
3 0.5 3.9
3 0.5 4.3
1 1 2.4
1 1 2.7
1 1 2.3
2 1 2.8
2 1 2.9
2 1 2.5
3 1 2.9
3 1 3
3 1 3.2
100
Multiple Linear Regression
EM 561 GW DePuy 484
Include terms for Eggs, Milk, and Eggs*Milk
Columns for all terms and response in Minitab
Multiple Linear Regression
Regression Analysis: height versus eggs, milk, eggs*milk
The regression equation is
height = 0.600 + 1.55 eggs + 1.58 milk - 1.27 eggs*milk
Predictor Coef SE Coef T P
Constant 0.6000 0.3630 1.65 0.121
eggs 1.5500 0.1680 9.22 0.000
milk 1.5778 0.4592 3.44 0.004
eggs*milk -1.2667 0.2126 -5.96 0.000
S = 0.184089 R-Sq = 93.2% R-Sq(adj) = 91.8%
What terms significantly affect cake height?
Is this model a good fit?
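The same fit can be reproduced outside Minitab. The sketch below is a minimal least-squares fit in plain Python using the normal equations; it is not how Minitab computes the result internally, and the helper names solve and fit_ols are made up for this example. The data are the 18 cakes listed earlier.

```python
# Cake data from the slides: (eggs, milk, height)
DATA = [
    (1, 0.5, 2.3), (1, 0.5, 2.1), (1, 0.5, 2.5),
    (2, 0.5, 3.4), (2, 0.5, 3.3), (2, 0.5, 3.0),
    (3, 0.5, 4.2), (3, 0.5, 3.9), (3, 0.5, 4.3),
    (1, 1.0, 2.4), (1, 1.0, 2.7), (1, 1.0, 2.3),
    (2, 1.0, 2.8), (2, 1.0, 2.9), (2, 1.0, 2.5),
    (3, 1.0, 2.9), (3, 1.0, 3.0), (3, 1.0, 3.2),
]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def fit_ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


# Design matrix columns: constant, eggs, milk, eggs*milk
X = [(1.0, e, m, e * m) for e, m, _ in DATA]
y = [h for _, _, h in DATA]
coefs = fit_ols(X, y)

# R-Sq = 1 - SSE/SST
fitted = [sum(b * x for b, x in zip(coefs, row)) for row in X]
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
y_bar = sum(y) / len(y)
sst = sum((yi - y_bar) ** 2 for yi in y)
r_sq = 1 - sse / sst

for name, b in zip(("Constant", "eggs", "milk", "eggs*milk"), coefs):
    print(f"{name:10s} {b:8.4f}")
print(f"R-Sq = {100 * r_sq:.1f}%")
```

Building the eggs*milk column explicitly mirrors what the slides do in Minitab and Excel: the interaction is just one more predictor column.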
Multiple Linear Regression
Predict cake height for 1.5 eggs and 3/4 cup of milk.
height = 0.600 + 1.55 eggs + 1.58 milk - 1.27 eggs*milk
height = 0.600 + 1.55(1.5) + 1.58(0.75) - 1.27(1.5)(0.75) = 2.6833
(2.6833 uses the unrounded coefficients; the rounded ones shown give about 2.681.)
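The prediction is a direct substitution into the fitted equation. The sketch below evaluates it with both the rounded coefficients from the regression equation and the unrounded ones from the Excel summary output; predict_height is a made-up helper name.

```python
# Coefficients for: height = b0 + b1*eggs + b2*milk + b3*eggs*milk
ROUNDED = (0.600, 1.55, 1.58, -1.27)               # as printed in the equation
UNROUNDED = (0.6, 1.55, 1.57777778, -1.26666667)   # from the Excel output


def predict_height(eggs, milk, coefs):
    """Evaluate the interaction model at the given eggs and milk (hypothetical helper)."""
    b0, b1, b2, b3 = coefs
    return b0 + b1 * eggs + b2 * milk + b3 * eggs * milk


height_rounded = predict_height(1.5, 0.75, ROUNDED)
height_unrounded = predict_height(1.5, 0.75, UNROUNDED)

print(f"rounded coefficients:   {height_rounded:.4f}")
print(f"unrounded coefficients: {height_unrounded:.4f}")
```

Note that 1.5 eggs and 0.75 cup of milk both fall inside the ranges used in the experiment (1 to 3 eggs, 0.5 to 1 cup), so this is interpolation rather than extrapolation.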
Multiple Linear Regression
Interaction of #eggs and milk
[Figure: Interaction Plot for height, Fitted Means. x-axis: eggs (1, 2, 3); y-axis: Mean (about 2.5 to 4.0); one line per milk level (0.5 and 1.0).]
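The interaction can also be read off numerically. The sketch below assumes the plotted fitted means come from the eggs, milk, eggs*milk regression model, and tabulates the fitted height at each combination along with the slope in eggs at each milk level. Coefficients are the unrounded values from the Excel output; the names are illustrative.

```python
# Unrounded coefficients: constant, eggs, milk, eggs*milk
B0, B1, B2, B3 = 0.6, 1.55, 1.57777778, -1.26666667

EGG_LEVELS = (1, 2, 3)
MILK_LEVELS = (0.5, 1.0)


def fitted_height(eggs, milk):
    """Fitted mean height from the interaction model (hypothetical helper)."""
    return B0 + B1 * eggs + B2 * milk + B3 * eggs * milk


fitted_means = {
    (eggs, milk): fitted_height(eggs, milk)
    for milk in MILK_LEVELS
    for eggs in EGG_LEVELS
}

# With an interaction term, the effect of one more egg depends on milk:
# d(height)/d(eggs) = B1 + B3 * milk
slope_by_milk = {milk: B1 + B3 * milk for milk in MILK_LEVELS}

for milk in MILK_LEVELS:
    row = "  ".join(f"{fitted_means[(e, milk)]:.3f}" for e in EGG_LEVELS)
    print(f"milk = {milk}: {row}   (slope per egg = {slope_by_milk[milk]:.3f})")
```

The two lines have different slopes, which is what non-parallel lines on the interaction plot indicate: adding eggs raises height much more at 0.5 cup of milk than at 1 cup.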
Multiple Linear Regression in Excel
Columns for eggs, milk, eggs*milk, height
Include columns for eggs, milk, eggs*milk
Multiple Linear Regression in Excel
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.96564146
R Square 0.93246342
Adjusted R Square 0.9179913
Standard Error 0.18408935
Observations 18
ANOVA
df SS MS F Significance F
Regression 3 6.550555556 2.18351852 64.431694 1.9533E-08
Residual 14 0.474444444 0.03388889
Total 17 7.025
Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept 0.6 0.363029095 1.65276009 0.12062076 -0.17861997 1.37861997
eggs 1.55 0.168049816 9.22345549 2.5181E-07 1.18956899 1.91043101
milk 1.57777778 0.459199518 3.43593082 0.00401543 0.59289277 2.56266279
eggs*milk -1.26666667 0.212568072 -5.95887546 3.4936E-05 -1.72257984 -0.8107535
What terms significantly affect cake height?
Is this model a good fit?
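One way to answer the significance question is to recompute each t Stat and 95% interval from the coefficient and its standard error. The sketch below assumes a t-table critical value for 14 residual degrees of freedom (an assumption from a standard table, not from the slides) and flags a term as significant at the 5% level when |t| exceeds it.

```python
T_CRIT = 2.1448  # assumed t-table value: alpha/2 = 0.025, 14 df

# (coefficient, standard error) copied from the Excel output
TERMS = {
    "Intercept": (0.6, 0.363029095),
    "eggs": (1.55, 0.168049816),
    "milk": (1.57777778, 0.459199518),
    "eggs*milk": (-1.26666667, 0.212568072),
}

results = {}
for name, (coef, se) in TERMS.items():
    t_stat = coef / se     # t Stat = coefficient / standard error
    half = T_CRIT * se     # half-width of the 95% interval
    results[name] = (t_stat, coef - half, coef + half)

# A term is significant at the 5% level when |t| > T_CRIT,
# which is equivalent to its 95% interval excluding zero.
significant = [name for name, (t, _, _) in results.items() if abs(t) > T_CRIT]

for name, (t, lo, hi) in results.items():
    print(f"{name:10s} t = {t:7.3f}   95% CI = ({lo:.4f}, {hi:.4f})")
print("significant at 5%:", significant)
```

Under these assumptions eggs, milk, and eggs*milk are all significant, while the intercept is not, which agrees with the P-values in the table.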
Test #2 here
Take home exam
You may use book, notes, calculator
You may not discuss test with anyone other
than me.
Due
Covers Chapters 6, 7, 8, 9, 11
Good-Bye!
Thanks!
Keep working hard and you will be successful!!
Good luck!!
