
Inference on a Single Mean

L. Wang, Department of Statistics


University of South Carolina

Use Calculation from Sample to Estimate Population Parameter
[Diagram: Population -> (select) -> Sample -> (calculate) -> Statistic -> (estimate) -> Parameter -> (describes) -> Population]
Parameter: $p = ?$
Statistic: $\hat{p} = 63\%$

Use Calculation from Sample to Estimate Population Parameter
[Diagram: Population -> (select) -> Sample -> (calculate) -> Statistic -> (estimate) -> Parameter -> (describes) -> Population]
Parameter: $\mu = ?$
Statistic: $\bar{y} = 2{,}200$ hrs

Statistic
Describes a sample.
Always known.
Changes upon repeated sampling.
Examples: $\bar{y}$, $s^2$, $s$, $\hat{p}$

Parameter
Describes a population.
Usually unknown.
Is fixed.
Examples: $\mu$, $\sigma^2$, $\sigma$, $p$

A Statistic is a Random Variable

Upon repeated sampling of the same population, the value of a statistic changes.

While we don't know what the next value will be, we do know the overall pattern over many, many samplings.

The distribution of possible values of a statistic for repeated samples of the same size from a population is called the sampling distribution of the statistic.

Sampling Distribution of $\bar{y}$

If a random sample of size n is taken from a normal population having mean $\mu_y$ and variance $\sigma_y^2$, then $\bar{y}$ is a random variable which is also normally distributed with mean $\mu_y$ and variance $\sigma_y^2/n$.

Sampling Distribution of $\bar{y}$
[Figure: four density panels on a common axis from 80 to 120. Original Population: N(100, 5). Averages, Sample Size = 2: N(100, 3.54). Averages, Sample Size = 10: N(100, 1.58). Averages, Sample Size = 25: N(100, 1). The second argument of each N is the standard deviation.]
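The standard deviations quoted in the four panels follow from $\sigma/\sqrt{n}$ with $\sigma = 5$. A minimal sketch that recomputes them; the function name `standard_error` is only an illustrative choice, not something taken from the slides:

```python
import math


def standard_error(sigma: float, n: int) -> float:
    """Standard deviation of the mean of n independent readings."""
    return sigma / math.sqrt(n)


sigma = 5.0  # population standard deviation from the N(100, 5) panel
for n in (2, 10, 25):
    # Prints the sample size next to the standard deviation of the average.
    print(n, round(standard_error(sigma, n), 2))
```

The printed values can be compared with the N(100, 3.54), N(100, 1.58) and N(100, 1) panel labels.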

Light Bulbs
The life of a light bulb is normally
distributed with a mean of 2000 hours
and standard deviation of 300 hours.
What is the probability that a
randomly chosen light bulb will have a
life of less than 1700 hours?
What is the probability that the mean
life of three randomly chosen light
bulbs will be less than 1700 hours?
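
Both questions can be checked numerically. The sketch below assumes Python's standard-library `statistics.NormalDist`; the variable names are illustrative only:

```python
import math
from statistics import NormalDist

mu, sigma = 2000.0, 300.0  # mean and standard deviation of bulb life, in hours

# One bulb: Y ~ N(2000, 300)
p_single = NormalDist(mu, sigma).cdf(1700)

# Mean of three bulbs: ybar ~ N(2000, 300 / sqrt(3))
p_mean3 = NormalDist(mu, sigma / math.sqrt(3)).cdf(1700)

print(round(p_single, 4), round(p_mean3, 4))
```

The single-bulb probability is roughly 0.16, while the probability for the mean of three bulbs is roughly 0.04, because the average varies less than a single reading.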


Why Averages Instead of Single Readings?

Suppose we are manufacturing light bulbs. The life of these bulbs has historically followed a normal distribution with a mean of 2000 hours and standard deviation of 300 hours.
We change the filament material and, unbeknown to us, the average life of the bulbs decreases to 1500 hours. (We will assume that the distribution remains normal with a standard deviation of 300 hours.)
If we randomly sample 1 bulb, will we realize that the average life has decreased? What if we sample 3 bulbs? 9 bulbs?

Why Averages Instead of Single Readings?
[Figures: three pairs of normal curves centered at $\mu = 1500$ and $\mu = 2000$, on an axis from 800 to 2800.]
Single Readings: $\sigma = 300$; $Y < 1400$ would signal shift.
Averages of n = 3: $\sigma_{\bar{y}} = 173$; $\bar{Y} < 1650$ would signal shift.
Averages of n = 9: $\sigma_{\bar{y}} = 100$; $\bar{Y} < 1800$ would signal shift.
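To see why averages help, one can compute how likely each rule is to signal once the mean has really dropped to 1500 hours. This is a sketch under the slides' assumptions (normal lifetimes, $\sigma = 300$, and the cutoffs 1400, 1650 and 1800); `detection_probability` is a hypothetical helper name:

```python
import math
from statistics import NormalDist

shifted_mu, sigma = 1500.0, 300.0            # true mean after the filament change
cutoffs = {1: 1400.0, 3: 1650.0, 9: 1800.0}  # sample size -> signal cutoff


def detection_probability(n: int, cutoff: float) -> float:
    """P(average of n bulbs falls below the cutoff) when the mean is 1500."""
    se = sigma / math.sqrt(n)
    return NormalDist(shifted_mu, se).cdf(cutoff)


for n, cutoff in cutoffs.items():
    print(n, round(detection_probability(n, cutoff), 3))
```

The chance of catching the shift rises from a bit over one third with a single reading to nearly certain with nine.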

What if the original distribution is not normal?
Consider the roll of a fair die:
[Figure: "Rolling A Fair Die", bar chart of Probability (0.00 to 0.20) against # of Dots.]

Suppose the single measurements are not normally distributed.

Let Y = life of a light bulb in hours.
Y is exponentially distributed with $\lambda = 0.0005 = 1/2000$.
[Figure: density of Y, with 0.0005 marked.]

[Figure: four panels showing the distributions of single measurements, averages of 2 measurements, averages of 4 measurements, and averages of 25 measurements.]
Source: Lawrence L. Lapin, Statistics in Modern Business Decisions, 6th ed., 1993, Dryden Press, Ft. Worth, Texas.

[Figure: sampling distributions of the mean for n = 1, n = 2, n = 4, and n = 25.]
As n increases, what happens to the variance?
A. Variance increases.
B. Variance decreases.
C. Variance remains the same.

[Figure: sampling distributions of the mean for n = 1, n = 2, n = 4, and n = 25.]

Central Limit Theorem

If n is sufficiently large, the sample means of random samples from a population with mean $\mu$ and standard deviation $\sigma$ are approximately normally distributed with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$.

Random Behavior of Means Summary

If Y is distributed $N(\mu, \sigma)$, then $\bar{y}_n$ is distributed $N(\mu, \sigma/\sqrt{n})$.

If Y is distributed non-$N(\mu, \sigma)$, then $\bar{y}_{n \ge 30}$ is distributed approximately $N(\mu, \sigma/\sqrt{n})$.
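A small simulation can illustrate the summary for the exponential light-bulb model used earlier (mean life 2000 hours, so $\sigma = 2000$ as well). This is only a sketch: the sample size 25, the 5000 repetitions, the seed and the name `simulate_sample_means` are all assumptions made for illustration.

```python
import random
import statistics


def simulate_sample_means(n, reps, mean_life=2000.0, seed=0):
    """Draw `reps` samples of size n from an exponential population
    and return the list of sample means."""
    rng = random.Random(seed)
    rate = 1.0 / mean_life
    return [
        statistics.fmean(rng.expovariate(rate) for _ in range(n))
        for _ in range(reps)
    ]


means = simulate_sample_means(n=25, reps=5000)
# Theory says: mean near 2000 and standard deviation near 2000 / sqrt(25) = 400.
print(round(statistics.fmean(means)), round(statistics.stdev(means)))
```

A histogram of `means` would look far more symmetric than the exponential population itself, which is the point of the Central Limit Theorem.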

If We Can Consider $\bar{y}$ to be Normal

Recall: If Y is distributed normally with mean $\mu$ and standard deviation $\sigma$, then
$$Z = \frac{Y - \mu}{\sigma}$$

So if $\bar{y}$ is distributed normally with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$, then
$$Z = \frac{\bar{Y} - \mu}{\sigma/\sqrt{n}}$$

If the time between adjacent accidents in an industrial plant follows an exponential distribution with an average of 700 days, what is the probability that the average time between 49 pairs of adjacent accidents will be greater than 900 days?
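One way to approach the accident question is the normal approximation from the Central Limit Theorem, using the fact that an exponential distribution's standard deviation equals its mean. A sketch under those assumptions:

```python
import math
from statistics import NormalDist

mu = 700.0     # mean time between accidents, in days
sigma = 700.0  # exponential distribution: standard deviation equals the mean
n = 49

se = sigma / math.sqrt(n)  # standard deviation of the sample mean (100 days)
p_greater = 1 - NormalDist(mu, se).cdf(900)
print(round(p_greater, 4))
```

Since 900 is two standard errors above 700, the answer is the upper-tail area beyond z = 2, a little over 2%.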

XYZ Bottling Company claims that the distribution of fill on its 16 oz bottles averages 16.2 ounces with a standard deviation of 0.1 oz. We randomly sample 36 bottles and get $\bar{y} = 16.15$. If we assume a standard deviation of 0.1 oz, do we believe XYZ's claim of averaging 16.2 oz?
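A sketch of the calculation this question calls for, assuming the claimed mean of 16.2 oz is true and asking how unusual a sample mean of 16.15 or less would be:

```python
import math
from statistics import NormalDist

claimed_mu, sigma, n = 16.2, 0.1, 36
y_bar = 16.15

z = (y_bar - claimed_mu) / (sigma / math.sqrt(n))  # standardized sample mean
p_at_most = NormalDist().cdf(z)                    # P(ybar <= 16.15 | mu = 16.2)
print(round(z, 2), round(p_at_most, 5))
```

The sample mean sits three standard errors below the claim, and the lower-tail probability is on the order of one in a thousand, so the data cast doubt on the 16.2 oz claim.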

Up Until Now We Have Been Assuming that We Knew the True Standard Deviation ($\sigma$), But Let's Face Facts

When we use s to estimate $\sigma$, then the calculated value
$$\frac{\bar{y} - \mu}{s/\sqrt{n}}$$
follows a t-distribution with n-1 degrees of freedom.
Note: we must be able to assume that we are sampling from a normal population.

Let's take another look at XYZ Bottling Company. If we assume that fill on the individual bottles follows a normal distribution, does the following data support the claim of an average fill of 16.2 oz?
16.1 16.0 16.3 16.2 16.1
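A sketch of the t calculation for these five bottles. The slide does not state a significance level or direction, so the two-sided 5% comparison below is an assumption; 2.776 is the standard t-table value for an upper-tail area of 0.025 with 4 degrees of freedom.

```python
import math
import statistics

fills = [16.1, 16.0, 16.3, 16.2, 16.1]  # data from the slide
claimed_mu = 16.2

n = len(fills)
y_bar = statistics.mean(fills)
s = statistics.stdev(fills)  # sample standard deviation (n - 1 divisor)
t_stat = (y_bar - claimed_mu) / (s / math.sqrt(n))

T_CRIT = 2.776  # assumed: two-sided alpha = 0.05, df = 4, from a t table
reject_claim = abs(t_stat) > T_CRIT
print(round(y_bar, 2), round(s, 4), round(t_stat, 3), reject_claim)
```

Under these assumptions the statistic is well inside the critical values, so this small sample does not contradict the 16.2 oz claim.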

In Summary
When we know $\sigma$:
$$Z = \frac{\bar{y} - \mu}{\sigma/\sqrt{n}}$$
When we estimate $\sigma$ with s:
$$t_{df = n-1} = \frac{\bar{y} - \mu}{s/\sqrt{n}}$$
We assume we are sampling from a normal population.

Relationship Between Z and t Distributions
[Figure: density curves for Z, t with df = 3, and t with df = 1, on a common horizontal axis.]

Internal Combustion Engine

The nominal power produced by a student-designed internal combustion engine is 100 hp. The student team that designed the engine conducted 10 tests to determine the actual power. The data follow:
98, 101, 102, 97, 101, 98, 100, 92, 98, 100

Assume data came from a normal distribution.

Internal Combustion Engine

Summary Data:
Column   n    Mean   Std. Dev.
hp       10   98.7   2.9

What is the probability of getting a sample mean of 98.7 hp or less if the true mean is 100 hp?

Internal Combustion Engine

$$P(\bar{y} \le 98.7 \mid \mu = 100) = P\left(t_{df=9} \le \frac{98.7 - 100}{2.9/\sqrt{10}}\right) = P(t_{df=9} \le -1.418) = 0.0949$$

What did we assume when doing this analysis?
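The tail probability above normally comes from a t table or statistical software. As a rough check that needs only the standard library, the sketch below integrates the t density numerically with Simpson's rule; `t_pdf` and `t_cdf` are hypothetical helper names, and a dedicated statistics package would be the usual choice in practice.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """P(T <= x), using symmetry about 0 and Simpson's rule on [0, x]."""
    if x == 0:
        return 0.5
    h = x / steps  # negative when x < 0, so the integral carries the right sign
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    return 0.5 + total * h / 3


t_stat = (98.7 - 100) / (2.9 / math.sqrt(10))
p_lower = t_cdf(t_stat, df=9)
print(round(t_stat, 3), round(p_lower, 4))
```

The result should agree with the slide's 0.0949 to about three decimal places.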

Can We Assume Sampling from a Normal Population?

If data are from a normal population, there is a linear relationship between the data and their corresponding Z values.
$$Z = \frac{Y - \mu}{\sigma} \quad\Rightarrow\quad Y = \mu + \sigma Z$$

If we plot y on the vertical axis and z on the horizontal axis, the y intercept estimates $\mu$ and the slope estimates $\sigma$.

How to Calculate Corresponding Z-Values
Order data.
Estimate percent of population below each data point:
$$P_i = \frac{i - 0.5}{n}$$
where i is a data point's position in the ordered set and n is the number of data points in the set.
Look up the Z-value that has $P_i$ proportion of the distribution below it.
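The steps above can be sketched in a few lines. This assumes Python's `statistics.NormalDist.inv_cdf` in place of a printed Z table; `qq_z_values` is an illustrative name.

```python
from statistics import NormalDist


def qq_z_values(n):
    """Z-values for the ordered positions i = 1..n, using P_i = (i - 0.5) / n."""
    std_normal = NormalDist()
    return [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]


# For a data set of four points, P_i is .125, .375, .625, .875.
print([round(z, 2) for z in qq_z_values(4)])
```

Plotting the ordered data against these values gives the normal QQ plot; a roughly straight line supports the normality assumption.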

Normal Probability (QQ) Plot

Z        P_i     y_i
-1.15    .125    2
-0.32    .375
+0.32    .625
+1.15    .875    10

[Figure: "Normal QQ Plot", Data (0 to 12) on the vertical axis against Z values (-1.5 to 1.5) on the horizontal axis.]

Normal Probability (QQ) Plot
[Figure: QQ plot with data on the vertical axis (0 to 16) and Z values on the horizontal axis.]
This data is a random sample from a N(10,2) population.

Normal Probability (QQ) Plot
[Figure: QQ plot with data on the vertical axis (0 to 16) and Z values on the horizontal axis.]

Estimation of the Mean

Point Estimators

A point estimator is a single number calculated from sample data that is used to estimate the value of a parameter.
Recall that statistics change value upon repeated sampling of the same population while parameters are fixed, but unknown.
Examples:
$\hat{p}$ estimates $p$
$\bar{y}$ estimates $\mu$
$s$ estimates $\sigma$
$s^2$ estimates $\sigma^2$

In General: $\hat{\theta}$ is an estimator of the arbitrary parameter $\theta$

What makes a "Good" estimator?

(1) Accuracy: An unbiased estimator of a parameter is one whose expected value is equal to the parameter of interest.
(2) Precision: An estimator is more precise if its sampling distribution has a smaller standard error*.

*Standard error is the standard deviation for the sampling distribution.

Unbiased Estimators
For normal populations, both the sample mean and sample median are unbiased estimators of $\mu$.
[Figure: "Sampling Distributions for Mean and Median", two density curves labeled mean and median.]

Most Efficient Estimators

If you have multiple unbiased estimators, then you choose the estimator whose sampling distribution has the least variation. This is called the most efficient estimator.
[Figure: "Sampling Distributions for Mean and Median", two density curves labeled mean and median.]
For normal populations, the sample mean is the most efficient estimator of $\mu$.

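Interval Estimate of the Mean

$\frac{\bar{Y} - \mu}{\sigma/\sqrt{n}}$ follows a standard normal distribution, so

$$P\left(-1.96 \le \frac{\bar{Y} - \mu}{\sigma/\sqrt{n}} \le 1.96\right) = 0.95$$

(with a little algebra)

$$P\left(\bar{Y} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{Y} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = (1 - \alpha)$$

So we say that we are 95% sure that $\mu$ is in the interval
$$\bar{Y} \pm 1.96\frac{\sigma}{\sqrt{n}}$$

What assumptions have we made?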

Interval Estimate of the Mean
[Figure: standard normal density with central area 0.95 between -1.96 and 1.96 and area .025 in each tail.]

Interval Estimate of the Mean

Let's go from 95% confidence to the general case.
The symbol $z_\alpha$ is the z-value that has an area of $\alpha$ to the right of it.

$$P\left(-z_{\alpha/2} \le \frac{\bar{Y} - \mu}{\sigma/\sqrt{n}} \le z_{\alpha/2}\right) = (1 - \alpha)$$

$$P\left(\bar{Y} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{Y} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = (1 - \alpha)$$
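The z-value with area $\alpha$ to its right can be looked up in a table or computed directly. A small sketch using the standard library's inverse normal CDF; the helper name `z_upper` is an assumption for illustration:

```python
from statistics import NormalDist


def z_upper(alpha):
    """z-value with area alpha to its right under the standard normal curve."""
    return NormalDist().inv_cdf(1 - alpha)


for alpha in (0.05, 0.025, 0.005):
    print(alpha, round(z_upper(alpha), 3))
```

For a $(1 - \alpha)100\%$ interval the value to use is `z_upper(alpha / 2)`, since the area $\alpha$ is split between the two tails.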

Interval Estimate of the Mean
[Figure: standard normal density with central area $1 - \alpha$ between $-Z_{\alpha/2}$ and $+Z_{\alpha/2}$ and area $\alpha/2$ in each tail.]
$(1 - \alpha)$ 100% Confidence Interval

What Does $(1 - \alpha)$ 100% Confidence Mean?
[Figure: sampling distribution of $\bar{y}$, $N(\mu, \sigma/\sqrt{n})$, with several sample means $\bar{y}$ marked and a $(1 - \alpha)$100% confidence interval drawn around each.]

If $Z_{0.05} = 1.645$, we are _____% confident that the mean is between
$$\bar{y} \pm 1.645\frac{\sigma}{\sqrt{n}}$$
A. 99%
B. 95%
C. 90%
D. 85%

Which z-value would you use to calculate a 99% confidence interval on a mean?
A. $Z_{0.10} = 1.282$
B. $Z_{0.01} = 2.326$
C. $Z_{0.005} = 2.576$
D. $Z_{0.0005} = 3.291$

Plastic Injection Molding Process

A plastic injection molding process for a part that has a critical width dimension historically follows a normal distribution with a standard deviation of 8.
Periodically, clogs from one of the feeder lines cause the mean width to change. As a result, the operator periodically takes random samples of size 4.

Plastic Injection Molding

A recent sample of four yielded a sample mean of 101.4.
Construct a 95% confidence interval for the true mean width.
Construct a 99% confidence interval for the true mean width.
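A sketch of both intervals, assuming the known standard deviation of 8 from the process description and the formula $\bar{Y} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$; `z_interval` is an illustrative helper name.

```python
import math
from statistics import NormalDist


def z_interval(y_bar, sigma, n, confidence):
    """Confidence interval for the mean when sigma is known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * sigma / math.sqrt(n)
    return y_bar - half_width, y_bar + half_width


ci95 = z_interval(101.4, 8.0, 4, 0.95)
ci99 = z_interval(101.4, 8.0, 4, 0.99)
print([round(v, 2) for v in ci95], [round(v, 2) for v in ci99])
```

The 99% interval is wider than the 95% interval because a larger z-value multiplies the same standard error.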

When going from a 95% confidence interval to a 99% confidence interval, the width of the interval will
A. Increase.
B. Decrease.
C. Remain the same.

Interval Width, Level of Confidence and Sample Size
At a given sample size, as level of confidence increases, interval width __________.
At a given level of confidence, as sample size increases, interval width __________.

Calculate Sample Size Before Sampling!

The width of the interval is determined by:
$$z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$

Suppose we wish to estimate the mean to a maximum error of d:
$$\text{Max error } d = z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \quad\Rightarrow\quad n = \left(\frac{z_{\alpha/2}\,\sigma}{d}\right)^2$$

Plastic Injection Molding

A plastic injection molding process for a part that has a critical width dimension historically follows a normal distribution with a standard deviation of 8.
What sample size is required to estimate the true mean width to within ± 2 units at 95% confidence?
What sample size is required to estimate the true mean width to within ± 2 units at 99% confidence?
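A sketch of the sample-size calculation, solving $d = z_{\alpha/2}\,\sigma/\sqrt{n}$ for n and rounding up to the next whole unit; `required_n` is a hypothetical helper name.

```python
import math
from statistics import NormalDist


def required_n(sigma, d, confidence):
    """Smallest n whose margin of error z * sigma / sqrt(n) is at most d."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * sigma / d) ** 2)


n95 = required_n(sigma=8.0, d=2.0, confidence=0.95)
n99 = required_n(sigma=8.0, d=2.0, confidence=0.99)
print(n95, n99)
```

Rounding up rather than to the nearest integer keeps the margin of error at or below the target.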

If we don't have prior knowledge of the standard deviation, but can assume we are sampling from a normal population

Instead of using a z-value to calculate the confidence interval, use a t-value with n-1 degrees of freedom:
$$P\left(-t_{\alpha/2} \le \frac{\bar{Y} - \mu}{s/\sqrt{n}} \le t_{\alpha/2}\right) = (1 - \alpha)$$
$$P\left(\bar{Y} - t_{\alpha/2}\frac{s}{\sqrt{n}} \le \mu \le \bar{Y} + t_{\alpha/2}\frac{s}{\sqrt{n}}\right) = (1 - \alpha)$$

Interval Estimate of the Mean
[Figure: t density with df = n-1, with central area $1 - \alpha$ between $-t_{\alpha/2}$ and $+t_{\alpha/2}$ and area $\alpha/2$ in each tail.]
$(1 - \alpha)$ 100% Confidence Interval

Plastic Injection Molding Reworded

A plastic injection molding process for a part that has a critical width dimension historically follows a normal distribution.
A recent sample of four yielded a sample mean of 101.4 and sample standard deviation of 8.
Estimate the true mean width with a 95% confidence interval.
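With the standard deviation estimated from the sample, the interval uses a t-value instead of a z-value. The sketch below hard-codes 3.182, the table value for an upper-tail area of 0.025 with 3 degrees of freedom that the deck itself uses later; the function name is illustrative.

```python
import math

T_CRIT_DF3_025 = 3.182  # t-table value: upper-tail area 0.025, df = 3


def t_interval(y_bar, s, n, t_crit):
    """Confidence interval for the mean when sigma is estimated by s."""
    half_width = t_crit * s / math.sqrt(n)
    return y_bar - half_width, y_bar + half_width


low, high = t_interval(101.4, 8.0, 4, T_CRIT_DF3_025)
print(round(low, 2), round(high, 2))
```

Compared with the known-sigma interval for the same numbers, this one is wider, reflecting the extra uncertainty from estimating the standard deviation with only four readings.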

Hypothesis Testing


Statistical Hypothesis
A statistical hypothesis is an assertion or conjecture concerning one or more population parameters.
Examples:
More than 7% of the landings for a certain airline exceed the runway.
The defective rate on a manufacturing line is less than 10%.
The mean lifetime of the bulbs is above 2200 hours.

The Null and Alternative Hypotheses

Null Hypothesis, $H_0$, represents what we assume to be true. It is always stated so as to specify an exact value of the parameter.
Alternative (Research) Hypothesis, $H_1$ or $H_a$, represents the alternative to the null hypothesis and allows for the possibility of several values. It carries the burden of proof.
In most situations, the researcher hopes to disprove or reject the null hypothesis in favor of the alternative hypothesis.

Steps to a Hypothesis Test
(1) Determine the null and alternative hypotheses.
(2) Collect data and calculate test statistic, assuming null hypothesis is true.
(3) Assuming the null hypothesis is true, calculate the p-value or use rejection region method.
(4) Draw conclusion and state it in English.

Two types of mistakes

(1) Type I error
Reject null hypothesis when it is true.
(2) Type II error
Fail to reject the null hypothesis when the alternative hypothesis is true.
Let $\alpha$ = P(type I error), $\beta$ = P(type II error)
Power of the test is $1 - \beta$.

Combustion Engine
The nominal power produced by a student-designed combustion engine is assumed to be at least 100 hp. We wish to test the alternative that the power is less than 100 hp.
Let $\mu$ = nominal power of engine.
A QQ plot shows it is reasonable to assume data came from a normal distribution.
Sample Data: $n = 10$, $\bar{y} = 98.7$, $s = 2.8694$

Combustion Engine
(1) State hypotheses, set alpha.
(2) Choose test statistic.
(3,4) Designate critical value for test (if using the rejection region method) and draw conclusion, or calculate p-value and draw conclusion.

(3) Designate Rejection Region
[Figure: t density with df = 9, assuming $H_0$: $\mu = 100$ is true, with a rejection region of area 0.05 to the left of -1.833. Axes: Y = avg hp (centered at 100) and $t_{df=9}$ (-4 to +4).]

Draw conclusion:
$$t_{df=9} = \frac{\bar{y} - \mu_0}{s/\sqrt{n}} = \frac{98.7 - 100}{2.8694/\sqrt{10}} = -1.4327$$
[Figure: on the $t_{df=9}$ axis, the test statistic -1.4327 lies to the right of the critical value -1.833.]
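The rejection-region test can be reproduced from the ten raw power readings given earlier in the deck. This is a sketch: the critical value -1.833 is taken from the slides (lower-tail area 0.05, df = 9) rather than computed, and the variable names are illustrative.

```python
import math
import statistics

power_hp = [98, 101, 102, 97, 101, 98, 100, 92, 98, 100]  # test data from the deck
mu0 = 100.0       # value of the mean under H0
T_CRIT = -1.833   # lower-tail critical value for alpha = 0.05, df = 9

n = len(power_hp)
y_bar = statistics.mean(power_hp)
s = statistics.stdev(power_hp)
t_stat = (y_bar - mu0) / (s / math.sqrt(n))

reject_h0 = t_stat < T_CRIT  # one-sided test of Ha: mu < 100
print(round(y_bar, 1), round(s, 4), round(t_stat, 4), reject_h0)
```

Because the statistic does not fall below the critical value, the test fails to reject $H_0$ at the 0.05 level.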

p-value
The p-value is the probability of getting the sample result we got or something more extreme.
[Figure: t density with df = 9, with area 0.0928 to the left of -1.4327.]

p-value
$P(t_{df=9} < -1.4327) = 0.0928$

Note:
If p-value < $\alpha$, reject $H_0$.
If p-value > $\alpha$, fail to reject $H_0$.
[Figure: t density with df = 9, with area 0.0928 to the left of -1.4327 and area 0.05 to the left of -1.833.]

Average Life of a Light Bulb

Historically, a particular light bulb has had a mean life of no more than 2000 hours. We have changed the production process and believe that the life of the bulb has increased.
Let $\mu$ = mean life.
(1) Set Up Hypotheses
$\alpha = 0.05$
$H_0$:
$H_a$:

Average Life of a Light Bulb

(2) Collect Data and calculate test statistic:
$\bar{y} = 2141$, $s = 216$, $n = 15$
$$t_{df=14} = \frac{\bar{y} - \mu_0}{s/\sqrt{n}} = \frac{2141 - 2000}{216/\sqrt{15}} = 2.5282$$
[Figure: t density with df = 14, with area 0.05 to the right of 1.761 and area 0.0121 to the right of 2.5282.]
p-value = $P(t_{df=14} > 2.5282) = 0.0121$

Average Life of a Light Bulb

State Conclusion:
A. At 0.05 level of significance there is insufficient evidence to conclude that $\mu > 2000$ hours.
B. At 0.05 level of significance there is sufficient evidence to conclude that $\mu > 2000$ hours.
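The light-bulb test statistic and decision can be sketched as follows, using the summary data and the upper-tail critical value 1.761 (area 0.05, df = 14) shown in the deck; the variable names are illustrative.

```python
import math

y_bar, s, n = 2141.0, 216.0, 15
mu0 = 2000.0
T_CRIT = 1.761  # upper-tail critical value for alpha = 0.05, df = 14

t_stat = (y_bar - mu0) / (s / math.sqrt(n))
reject_h0 = t_stat > T_CRIT  # one-sided test of Ha: mu > 2000
print(round(t_stat, 4), reject_h0)
```

The statistic exceeds the critical value, which matches the deck's p-value of 0.0121 being below 0.05.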

Mean Width of a Manufactured Part

Test the theory that the mean width of a manufactured part differs from 100 cm.
Let $\mu$ = mean width.
(1) Set up Hypotheses
$\alpha = 0.05$

Mean Width of a Manufactured Part

(2,3) Collect data and calculate test statistic.
$\bar{y} = 105$, $s = 6$, $n = 20$
$t_{df=19} =$
p-value = 2 * P($t_{df=19}$ > ....
(4) State conclusion.
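One way to carry out the two-sided calculation is sketched below. It assumes $H_0$: $\mu = 100$ against $H_a$: $\mu \ne 100$, and it approximates the t tail area by numerical integration (Simpson's rule) so that only the standard library is needed; `t_pdf` and `upper_tail` are hypothetical helper names.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0, via Simpson's rule on [0, t]."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    return 0.5 - total * h / 3


y_bar, s, n, mu0 = 105.0, 6.0, 20, 100.0
t_stat = (y_bar - mu0) / (s / math.sqrt(n))
p_value = 2 * upper_tail(abs(t_stat), df=n - 1)  # two-sided: double one tail
print(round(t_stat, 4), p_value < 0.05)
```

Under these assumptions the p-value is far below 0.05, so the data support a mean width different from 100 cm.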

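Given population parameter $\mu$ and value $\mu_0$:
For $H_0$: $\mu = \mu_0$
$H_a$: $\mu \ne \mu_0$ [Figure: rejection regions ($H_a$) of area $\alpha/2$ in each tail, with $H_0$ in the center.]
$H_a$: $\mu > \mu_0$ [Figure: $H_0$ region with the rejection region ($H_a$) in the upper tail.]
$H_a$: $\mu < \mu_0$ [Figure: $H_0$ region with the rejection region ($H_a$) in the lower tail.]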

Focus on the two types of errors in hypothesis test
1) Reject $H_0$ when $H_0$ is true. This is called a type I error.
P(Rej $H_0$ | $H_0$ is true) = $\alpha$
2) Fail to Reject $H_0$ when $H_a$ is true at some value. This is called a type II error.
P(Fail to Rej $H_0$ | $H_a$ is true at some value) = $\beta$

Avg Life of Light Bulb - Type I Error
$H_0$: $\mu \le 2000$
$H_a$: $\mu > 2000$
[Figure: sampling distribution assuming $H_0$ is true, split into a "Fail to reject $H_0$" region and an upper-tail rejection region of area $\alpha$.]
$\alpha$ = Probability that we will reject $H_0$ when $H_0$ is true.

Type I and Type II Errors
$H_0$: $\mu = 2000$
What if $\mu = 2200$
[Figure: two sampling distributions, one centered at 2000 under $H_0$ and one centered at 2200.]
$\beta$ = Probability we will fail to reject $H_0$ when $H_a$ is true at $\mu = 2200$.
$\alpha$ = Probability that we will reject $H_0$ when $H_0$ is true.

How can we control the size of $\beta$?
The value of $\alpha$.
Location of our point of interest.
Sample size.

Calculating $\beta$
If $\mu = 2200$, what is the probability of a type II error?
Given: $\alpha = 0.05$ and we are assuming $\mu = 2000$. We will also assume we know $\sigma = 216$.
$$P(Z > 1.645) = 0.05$$
$$\frac{\bar{y} - 2000}{216/\sqrt{15}} = 1.645 \quad\Rightarrow\quad \bar{y} \approx 2091$$

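Calculating $\beta$
$H_0$: $\mu = 2000$
What if $\mu = 2200$
[Figure: sampling distributions centered at 2000 and 2200 with the cutoff at 2091; "Fail to Reject $H_0$" to the left of the cutoff and "Reject $H_0$" to the right.]
$\beta = P(\bar{y} < 2091 \mid \mu = 2200)$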

Calculating $\beta$
$$\beta = P(\bar{y} < 2091 \mid \mu = 2200) = P\left(z < \frac{2091 - 2200}{216/\sqrt{15}}\right) = P(z < -1.9544) = 0.0254$$
$$P(\text{Fail to Reject } H_0 \mid \mu = 2200) = 0.0254$$

$\alpha$, $\beta$ and Power

$\alpha$ = P(Reject $H_0$ | $\mu = 2000$) = 0.05
$\beta$ = P(Fail to Rej $H_0$ | $\mu = 2200$) = 0.0254

We say that the power of this test at $\mu = 2200$ is $1 - 0.0254 = 0.9746$

Power = $1 - \beta$
Power = P(Rej $H_0$ | $\mu$ is at some $H_a$ level)
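The type II error and power calculation can be sketched as a small function. It follows the same steps as the slides (find the rejection cutoff under $H_0$, then find the chance of landing below it under the alternative) but keeps the unrounded cutoff of about 2091.7 instead of 2091, so $\beta$ comes out slightly larger than the slides' 0.0254. The name `type_ii_error` is illustrative.

```python
import math
from statistics import NormalDist


def type_ii_error(mu0, mu_alt, sigma, n, alpha):
    """Return (cutoff, beta) for an upper-tailed z test of H0: mu = mu0."""
    se = sigma / math.sqrt(n)
    cutoff = mu0 + NormalDist().inv_cdf(1 - alpha) * se  # reject H0 above this
    beta = NormalDist(mu_alt, se).cdf(cutoff)            # P(fail to reject | mu_alt)
    return cutoff, beta


cutoff, beta = type_ii_error(mu0=2000.0, mu_alt=2200.0, sigma=216.0, n=15, alpha=0.05)
power = 1 - beta
print(round(cutoff, 1), round(beta, 4), round(power, 4))
```

Calling the function with a larger `n`, a larger `alpha`, or an alternative mean farther from 2000 shows how each of the three controls listed earlier reduces $\beta$.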

Plastic Injection Molding

A plastic injection molding process for a part that has a critical width dimension historically follows a normal distribution.
A recent sample of n = 4 yielded a sample mean of 101.4 and sample standard deviation of 8.
Does this data support the statement: "The true average width is greater than 95"?

Plastic Injection Molding

Confidence Interval Approach
95% confidence interval on $\mu$:
$$\bar{y} \pm t_{df=3,\,0.025}\frac{s}{\sqrt{n}} = 101.4 \pm 3.182\frac{8}{\sqrt{4}} = 101.4 \pm 12.728$$
(88.67, 114.13)

Plastic Injection Molding

Hypothesis Test Approach
$\alpha = 0.05$
$H_0$:
$H_a$:
Test statistic is
p-value =
Conclusion:
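One possible way to work this exercise is sketched below. The slide leaves the hypotheses blank, so $H_0$: $\mu = 95$ against $H_a$: $\mu > 95$ is an assumption based on the statement being tested, and 2.353 is the standard t-table value for an upper-tail area of 0.05 with 3 degrees of freedom.

```python
import math

y_bar, s, n = 101.4, 8.0, 4
mu0 = 95.0      # assumed null value, from the statement "greater than 95"
T_CRIT = 2.353  # upper-tail critical value for alpha = 0.05, df = 3

t_stat = (y_bar - mu0) / (s / math.sqrt(n))
reject_h0 = t_stat > T_CRIT  # one-sided test of Ha: mu > 95
print(round(t_stat, 2), reject_h0)
```

Under these assumptions the statistic falls short of the critical value, so with only four readings the data do not establish that the true average width exceeds 95.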
