
Chapter 11

Sampling Distributions

BPS - 5th Ed.

Outline of Chapter 11
Parameters and Statistics
Statistical Estimation and the Law of Large Numbers
Sampling Distribution
The Sampling Distribution of the Sample Mean
The Central Limit Theorem

Statistical Inference
Data are generated from random sampling or randomized comparative experiments.
We do a statistical analysis on the sample data.
We use the laws of probability to answer the question "What would happen if we did this very many times?"

Statistical Inference:
Using the sample to infer something about the population, based on the above.
The reasoning rests on asking "How often would this method give a correct answer if I used it very many times?"

Sampling Terminology
Parameter
a number that describes the population
in practice, its value is an unknown number
For example:
μ, population mean
σ, population standard deviation
p, population proportion

Statistic
a known value calculated from a sample
a statistic is often used to estimate a parameter
For example:
x̄, sample mean
s, sample standard deviation
p̂, sample proportion


Sampling Terminology (continued)

A statistic estimates a parameter:
x̄, sample mean, estimates μ, population mean
s, sample standard deviation, estimates σ, population standard deviation
p̂, sample proportion, estimates p, population proportion

Statistics come from samples.
Parameters come from the population.


Sampling Terminology (continued)


Variability
different samples from the same population may yield different values of the sample statistic

Sampling Distribution
tells what values a statistic takes and how often it takes those values in repeated sampling

Sampling Distribution of a Statistic
The distribution of values taken by the statistic in all possible samples of the same size from the same population.


Parameters and Statistics

We want to use the sample proportion, p̂, to estimate the population proportion, p.
If we want to estimate the proportion of people in the U.S. who watch a certain television program, we can proceed as follows:
A properly chosen sample of 1600 people across the United States was asked if they regularly watch a certain television program.
The parameter of interest here is the true proportion of all people in the U.S. who watch the program (unknown), while the statistic is the value 24% obtained from the sample of 1600 people.
So, p̂ = 24% said yes; use this as our estimate for p (p̂ is the statistic, p is the parameter).

Parameters and Statistics


We want to use the sample mean x̄ to estimate the population mean μ.
If we want to estimate the mean height of eight-year-old girls, we can proceed as follows:
Randomly select 100 eight-year-old girls.
Compute the sample mean of the 100 heights.
Use that as our estimate.

This is using the sample mean (a statistic) to estimate the population mean (a parameter).


Parameter vs. Statistic


The mean of a population is denoted by μ; this is a parameter.
The mean of a sample is denoted by x̄; this is a statistic. x̄ is used to estimate μ.
The true proportion of a population with a certain trait is denoted by p; this is a parameter.
The proportion of a sample with a certain trait is denoted by p̂ (p-hat); this is a statistic. p̂ is used to estimate p.


The Law of Large Numbers

Law of Large Numbers: as the sample size increases, the sample mean gets closer to the population mean (x̄ gets closer to μ). That is, the difference between the sample mean and the population mean tends to become smaller (i.e., approaches zero).
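The law of large numbers stated above can be watched in a small simulation. The sketch below is only an illustration: it assumes a population that is not from the slides (rolls of a fair six-sided die, whose population mean is 3.5) and tracks the running sample mean as more observations arrive. The function name `running_means` and the seed are arbitrary choices.

```python
import random

def running_means(draws):
    """Return the sample mean after each successive observation."""
    means, total = [], 0.0
    for i, x in enumerate(draws, start=1):
        total += x
        means.append(total / i)
    return means

rng = random.Random(11)            # fixed seed so the run is repeatable
population_mean = 3.5              # mean of a fair die (assumed population)
draws = [rng.randint(1, 6) for _ in range(100_000)]
means = running_means(draws)

# Report the gap between x-bar and mu at increasing sample sizes.
for n in (10, 100, 1_000, 10_000, 100_000):
    gap = abs(means[n - 1] - population_mean)
    print(f"n = {n:>6}: sample mean = {means[n - 1]:.4f}, gap = {gap:.4f}")
```

The gap does not have to shrink at every single step, but it tends toward zero as n grows, which is what the law describes.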

The Law of Large Numbers


Gambling
The house in a gambling operation is not gambling at all:
the games are defined so that the gambler has a negative expected gain per play (the true mean gain after all possible plays is negative)
each play is independent of previous plays, so the law of large numbers guarantees that the average winnings of a large number of customers will be close to the (negative) true average
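As a rough illustration of the gambling argument, the sketch below assumes one specific game that the slides do not name: a $1 bet on red in American roulette (38 pockets, 18 of them red). It computes the true mean gain per play and then simulates many independent plays; the helper `average_gain` is a hypothetical name used only here.

```python
import random

# Assumed game (not from the slides): a $1 bet on red in American roulette,
# which has 38 pockets, 18 of them red.
P_WIN = 18 / 38
EXPECTED_GAIN = P_WIN * 1 + (1 - P_WIN) * (-1)   # = -2/38, about -0.0526

def average_gain(plays, rng):
    """Average gambler gain per play over `plays` independent $1 bets."""
    total = sum(1 if rng.random() < P_WIN else -1 for _ in range(plays))
    return total / plays

rng = random.Random(2024)
for plays in (100, 10_000, 1_000_000):
    print(f"{plays:>9} plays: average gain = {average_gain(plays, rng):+.4f}")
print(f"true mean gain per play = {EXPECTED_GAIN:+.4f}")
```

With few plays the average can land on either side of zero; with very many plays it settles near the negative true mean, which is why the house's long-run result is nearly certain.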


Distribution of the Sample Mean


Suppose we take a series of different random samples.
Sample 1: we compute sample mean x̄₁
Sample 2: we compute sample mean x̄₂
Sample 3: we compute sample mean x̄₃
etc.

Each time we sample, we may get a different result. The sample mean x̄ is a random variable! Therefore, the sample mean has a mean, a standard deviation, and a probability distribution.
Since the sample mean is determined by chance, there is variability in our point estimates.
This variability leads to uncertainty as to whether our estimates are correct.
So, we need some way to indicate the reliability of statements made about a population based on sample data.

Distribution of the Sample Mean

Example of a Sampling Distribution of the Sample Mean

Population
Sample 1, n₁ = 10, x̄₁ = 23.3
Sample 2, n₂ = 10, x̄₂ = 22.7
Sample 3, n₃ = 10, x̄₃ = 23.8
...
Sample 15, n₁₅ = 10, x̄₁₅ = 23.2

Since we do not know the value of x̄ in advance when we take a sample of 10 from our population, x̄ is a random variable. And x̄ will have a distribution with a mean, μ_x̄, and a standard deviation, σ_x̄. This distribution is called the Sampling Distribution of the Sample Mean.
We can estimate μ_x̄ by taking the mean of the 15 x̄ values from our 15 different samples above ((x̄₁ + x̄₂ + ... + x̄₁₅)/15).

Distribution of the Sample Mean


Show example in EXCEL.
For more practice, look at 11.7 on page 298 (instead of using Table B, use MINITAB to generate the 10 simple random samples, SRS).


Distribution of the Sample Mean


(cont.)
Terminology: The probability distribution of a statistic is called a sampling distribution.
Use the sampling distribution to make probability
statements about the values the sample mean takes on
and the likelihood that our estimates of the population
mean based on the sample mean are accurate.
The sampling distribution of the sample mean depends
upon:

Sample size n.
Mean of the population.
Standard deviation of the population.
Shape of the population distribution.


Distribution of the Sample Mean


(cont.)
Description of the Sampling Distribution:
If you repeatedly take simple random samples of size n from a population and calculate the sample mean for each sample, the distribution of these accumulated sample means is the sampling distribution of the sample mean.

Important Property:
The sampling distribution of the sample mean does not
necessarily look like the population distribution.
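The description above translates almost directly into a simulation. The following is a minimal sketch under stated assumptions: the population (`VALUES` with `WEIGHTS`) is made up for illustration and is not the population used in the slides, and draws are made with replacement, which stands in for sampling from a large population.

```python
import random
from statistics import mean

# Hypothetical right-skewed population (values and weights are made up for
# illustration; they are not the population shown on the slides).
VALUES = [0, 1, 2, 3, 4, 5]
WEIGHTS = [30, 25, 20, 12, 8, 5]
POP_MEAN = sum(v * w for v, w in zip(VALUES, WEIGHTS)) / sum(WEIGHTS)  # 1.58

def sample_means(n, repeats, rng):
    """Draw `repeats` random samples of size n and return their sample means."""
    return [mean(rng.choices(VALUES, weights=WEIGHTS, k=n)) for _ in range(repeats)]

rng = random.Random(374)
xbars = sample_means(n=10, repeats=5_000, rng=rng)

# Crude text histogram of the simulated sampling distribution of x-bar.
for lo in (0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0):
    count = sum(lo <= xb < lo + 0.5 for xb in xbars)
    print(f"[{lo:.1f}, {lo + 0.5:.1f}) {'#' * (count // 50)}")
print(f"population mean = {POP_MEAN:.2f}, mean of sample means = {mean(xbars):.3f}")
```

The histogram of sample means is centered near the population mean and is much less lopsided than the population weights themselves, which is the "does not necessarily look like the population" property.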


Example: Sampling Distribution


Let's illustrate the concept of the sampling distribution of the sample mean with an example.
We will be taking samples from this population distribution:
[Figure: Population Distribution From Which We Will Take Samples. Vertical axis: Population Proportion (0.00 to 0.28); horizontal axis: Values of X (0 to 11). Population Mean = 2.56]
This distribution is skewed to the right and looks nothing like a normal curve.

Example: Sampling Distribution


(cont.)
First a look at the Law of Large Numbers
Here are the distributions of the measurements in four different random samples from our population on the previous slide. Each sample is of size 25.
[Figure: four histograms, Sample Distribution for Sample 1, Sample 2, Sample 3, and Sample 4. Vertical axis: Number of Data Points; horizontal axis: Values of X.]
Each of these distributions looks somewhat like the parent population distribution (i.e., the distribution being sampled from), but none look exactly like it.
They all look different from each other as well.

Example: Sampling Distribution


(cont.)
First a look at the Law of Large Numbers

Now here is the distribution of the measurements in a sample of size 1000.
This distribution looks much more like the parent population.
[Figure: Sample Distribution for Sample of Size 1000 (distribution within a single sample). Vertical axis: Number of Data Points; horizontal axis: Values of X.]
Conclusion: The distribution of measurements in a large sample looks like the distribution in the parent population. (Think about the Law of Large Numbers)

Example: Sampling Distribution (cont.)


A Sampling Distribution of the Sample Mean for Samples of Size 10

Now suppose that we took a sample of size 10 and computed the sample mean as a sample statistic and did this many times. The sampling distribution of this random variable looks like this.
[Figure: Probability Dist'n of Sample Mean for Samples of Size 10 (distribution across many samples), μ = 2.56. Vertical axis: Probability Density; horizontal axis: Values of the Sample Mean.]
This looks somewhat like a normal curve and not at all like the parent population.
Skewness is still evident.
Values of the sample mean range from approximately 0 to 6. (Values in original population are 0 to 11.)

Example: Sampling Distribution (cont.)


A Sampling Distribution of the Sample Mean for Samples of Size 40 and 160

[Figure: two sampling distributions of the sample mean, each marked μ = 2.56.]
The sampling distribution for the sample mean for a sample size of 40: this looks more like a normal curve than for a sample of size 10. Values of the sample mean range from approximately 1.5 to 4.
The sampling distribution for the sample mean for a sample size of 160: this looks almost exactly like a normal curve. Values of the sample mean range from approximately 1.75 to 3.25. (A much tighter distribution; variation decreases as sample size gets larger.)

Behavior of Sampling Distribution


What can we take from our sampling distribution example above?
1) The distribution of measurements in a sample looks like the distribution in the parent population, NOT necessarily like a Normal curve.
2) The sampling distribution of the sample mean looks more and more like a Normal curve as the sample size increases, even though the parent population is definitely NOT Normal.
3) As the sample size increases, the sample mean gets closer to the population mean, i.e., the difference between the sample mean and the population mean tends to become smaller (approaches zero). (Law of Large Numbers!)
4) The spread in the histograms for the sampling distribution of the sample mean gets smaller for larger sample sizes. (Law of Large Numbers!)

Mean and Standard Deviation of the Sampling Distribution of the Sample Means

If numerous samples of size n are taken from a population with mean μ and standard deviation σ, and the mean, x̄, is taken for each of these samples of size n, then x̄ is a random variable (its distribution is the sampling distribution of the sample means).
The mean of this sampling distribution of the sample mean, μ_x̄, equals μ (where μ is the population mean), and the standard deviation, σ_x̄, of the sampling distribution of the sample mean, called the standard error, is σ_x̄ = σ/√n (where σ is the population standard deviation).
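One way to see the standard error formula at work is to compare σ/√n with the spread of simulated sample means. The sketch below assumes a population chosen only for convenience (exponential with mean 1, so σ = 1; it is not the slides' population) and uses the sample sizes 10, 40, and 160 from the earlier example. `simulated_se` is a hypothetical helper name.

```python
import math
import random
from statistics import fmean, pstdev

rng = random.Random(725)

# Assumed population for illustration: exponential with mean 1, which is
# right-skewed and has standard deviation sigma = 1.
SIGMA = 1.0

def simulated_se(n, repeats=4_000):
    """Standard deviation of `repeats` simulated sample means of size n."""
    xbars = [fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(repeats)]
    return pstdev(xbars)

for n in (10, 40, 160):
    print(f"n = {n:>3}: sigma/sqrt(n) = {SIGMA / math.sqrt(n):.4f}, "
          f"simulated = {simulated_se(n):.4f}")
```

Each fourfold increase in n should roughly halve both columns, since the standard error shrinks with the square root of the sample size.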

Summary of Properties of Sampling Distribution

The sampling distribution of the sample mean has several important properties:
If a simple random sample of size n is drawn from any large population, then the sampling distribution of the sample mean has:
Mean: μ_x̄ = μ
(The mean of the sampling distribution of the sample mean equals the population mean.)
Standard deviation, called the standard error of the mean: σ_x̄ = σ/√n
(As the sample size increases, the standard error of the sample mean gets smaller.)

In addition, if the population is normally distributed, then the sampling distribution is normally distributed.
Terminology: The Standard Deviation of a statistic is called its Standard Error.

Mean and Standard Deviation of the Sampling Distribution of the Sample Means

Since the mean of the random variable x̄ is μ (that is, μ_x̄ = μ), we say that x̄ is an unbiased estimator of μ.
Individual observations have standard deviation σ, but sample means from samples of size n have standard deviation σ/√n (called the standard error; that is, σ_x̄ = σ/√n).
Averages are less variable than individual observations.
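The √n in the standard error can be justified in one line. The derivation below is a sketch that assumes the n observations x_1, ..., x_n are independent and share the population variance σ², which is the situation for a simple random sample from a large population.

```latex
\operatorname{Var}(\bar{x})
  = \operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} x_i\right)
  = \frac{1}{n^{2}}\sum_{i=1}^{n}\operatorname{Var}(x_i)
  = \frac{n\sigma^{2}}{n^{2}}
  = \frac{\sigma^{2}}{n},
\qquad\text{so}\qquad
\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} .
```

Independence is what allows the variance of the sum to split into a sum of variances in the second step.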


Notation for Sampling Distribution of Sample Means

If individual observations have the N(μ, σ) distribution, then the sampling distribution of the sample mean, x̄, of n independent observations has the N(μ, σ/√n) distribution.

If measurements in the population follow a Normal distribution, then so does the sample mean.


Sample Mean Example #1


Suppose a simple random sample is obtained from a population that is normally distributed with a mean of 20 and a standard deviation of 12.
a) Describe the sampling distribution of the sample mean for a sample of size n = 4. (When we describe a distribution, we state the mean, the standard deviation, and the shape.)
b) Calculate the probability that a random sample of size n = 4 will have a mean between 16 and 24.
c) If another sample of size n = 9 is taken, will the sampling distribution change? What is the probability that a random sample of size n = 9 will have a mean between 16 and 24?
d) What effect does increasing the sample size have on the probabilities? Why do you think this is the case?
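The questions in this example can be checked numerically. The sketch below is one possible way to do so, using Python's standard-library `NormalDist` rather than the MINITAB used elsewhere in these slides; the function name `prob_mean_between` is a hypothetical helper. Because the population is stated to be Normal, x̄ is treated as N(20, 12/√n) for any n.

```python
import math
from statistics import NormalDist

MU, SIGMA = 20, 12   # population mean and standard deviation from the example

def prob_mean_between(low, high, n):
    """P(low < x-bar < high) when x-bar ~ N(MU, SIGMA / sqrt(n))."""
    xbar = NormalDist(mu=MU, sigma=SIGMA / math.sqrt(n))
    return xbar.cdf(high) - xbar.cdf(low)

for n in (4, 9):
    se = SIGMA / math.sqrt(n)
    print(f"n = {n}: x-bar ~ N({MU}, {se:.0f}), "
          f"P(16 < x-bar < 24) = {prob_mean_between(16, 24, n):.4f}")
```

The standard error drops from 6 (n = 4) to 4 (n = 9), so the probability of landing between 16 and 24 rises from roughly 0.50 to roughly 0.68: a larger sample concentrates x̄ more tightly around μ = 20.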

Sample Mean Example #2


Based on tests of the Chevrolet Cobalt, engineers
have found that the miles per gallon in highway
driving are normally distributed, with a mean of
32 miles per gallon and a standard deviation of
3.5 miles per gallon.
a) What is the probability that a randomly selected Cobalt gets
more than 34 miles per gallon?
b) Suppose that 10 Cobalts are randomly selected and the miles
per gallon for each car are recorded. What is the probability
that the mean miles per gallon exceeds 34 miles per gallon?
c) Suppose that 20 Cobalts are randomly selected and the miles
per gallon for each car are recorded. What is the probability
that the mean miles per gallon exceeds 34 miles per gallon?
Would this result be unusual?
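A possible numerical check of the three parts, again as a sketch with Python's `NormalDist` (the helper `prob_mean_exceeds` is a hypothetical name). Part a) concerns a single car, which is the same calculation with n = 1.

```python
import math
from statistics import NormalDist

MU, SIGMA = 32, 3.5   # highway mpg mean and standard deviation from the example

def prob_mean_exceeds(value, n):
    """P(x-bar > value) for the mean of n cars; n = 1 is a single car."""
    xbar = NormalDist(mu=MU, sigma=SIGMA / math.sqrt(n))
    return 1 - xbar.cdf(value)

for part, n in (("a", 1), ("b", 10), ("c", 20)):
    print(f"({part}) n = {n:>2}: P(mean mpg > 34) = {prob_mean_exceeds(34, n):.4f}")
```

The probabilities come out to roughly 0.28, 0.035, and 0.005. A mean above 34 mpg for 20 cars would therefore be quite unusual, even though a single car exceeding 34 mpg is not.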

Central Limit Theorem


If our population has a normal distribution, then the sampling distribution of the sample mean is normal.
However, what if the population does not have a normal distribution? What can we do?
Wouldn't it be very nice if the sampling distribution for the sample mean were normal, even when the population distribution is not? This is almost true.
The Central Limit Theorem states:
Regardless of the shape of the population distribution (provided it has a finite standard deviation σ), the sampling distribution of the sample mean becomes approximately normal as the sample size n increases.

Central Limit Theorem (cont.)


Summary:
If the random variable X (i.e., the population) is normally
distributed, then the sampling distribution of the sample mean is
normally distributed for any sample size.
For all other random variables X (i.e., other populations), the
sampling distribution of the sample mean is approximately
normally distributed if n is 30 or higher. (The convention in our
class for n large enough)
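How "approximately normal" the sample mean is can be gauged by its skewness, which is 0 for a Normal curve. The sketch below assumes a strongly right-skewed population chosen for illustration (exponential with mean 1, not a population from the slides) and measures the skewness of simulated sample means for several n. For this particular population the theoretical skewness of x̄ is 2/√n.

```python
import random
from statistics import fmean

rng = random.Random(896)

def skewness(data):
    """Sample skewness: third central moment divided by sd cubed."""
    m = fmean(data)
    m2 = fmean((x - m) ** 2 for x in data)
    m3 = fmean((x - m) ** 3 for x in data)
    return m3 / m2 ** 1.5

def sample_mean_skewness(n, repeats=10_000):
    """Skewness of `repeats` simulated sample means of size n."""
    # Assumed population: exponential with mean 1 (strongly right-skewed).
    xbars = [fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(repeats)]
    return skewness(xbars)

for n in (1, 5, 30, 100):
    print(f"n = {n:>3}: skewness of x-bar = {sample_mean_skewness(n):.3f}")
```

Some skewness remains at n = 30, which is a reminder that the n ≥ 30 rule is a working convention rather than a sharp boundary.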


Old Faithful Example


The most famous geyser in the world, Old Faithful in Yellowstone National Park, has a mean time between eruptions of 85 minutes and a standard deviation of 21.25 minutes. The distribution of the time interval between eruptions is not normal.
a) What is the probability that a randomly selected time interval will be less than 75 minutes?
b) What is the probability that a random sample of 20 time intervals will have a mean less than 75 minutes?
c) What is the probability that a random sample of 30 time intervals will have a mean less than 75 minutes?
d) What is the probability that a random sample of 30 time intervals will have a mean greater than 100 minutes?
e) What is the probability that a random sample of 30 time intervals will have a mean between 75 and 90 minutes?
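For part c), where n = 30 meets the class convention for the Central Limit Theorem, the setup can be sketched as a worked step, treating x̄ as approximately Normal:

```latex
\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{21.25}{\sqrt{30}} \approx 3.88,
\qquad
z = \frac{75 - \mu}{\sigma_{\bar{x}}} = \frac{75 - 85}{3.88} \approx -2.58 .
```

A z-score near -2.58 corresponds to a probability of roughly 0.005. Parts a) and b) appear to be a different matter: the population is stated to be non-normal, and n = 1 or n = 20 falls below the n ≥ 30 convention, so under that rule the Normal calculation would not be justified for them.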


Old Faithful Example Using MINITAB


Calculate Standard Error
MTB > let k1=21.25/sqrt(30)
MTB > print k1
Data Display
K1    3.87970

c)
MTB > cdf 75 k2;
SUBC> norm 85 k1.
MTB > print k2
Data Display
K2    0.00497564

d)
MTB > cdf 100 k2;
SUBC> norm 85 k1.
MTB > let k3=1-k2
MTB > print k3
Data Display
K3    0.000055255

e)
MTB > cdf 75 k2;
SUBC> norm 85 k1.
MTB > cdf 90 k3;
SUBC> norm 85 k1.
MTB > let k4=k3-k2
MTB > print k4
Data Display
K4    0.896283

MINITAB Solution for P(x̄ < x)

Calculate a cumulative probability for the sample mean (either sampling from a normal population or n ≥ 30):
MTB > let k1=sigma/sqrt(n)
MTB > cdf x k2;
SUBC> norm mu k1.
MTB > print k2

Note: Substitute the correct numbers from the specific problem for the items in italics (x, mu, sigma, and n).

where:
x is the value of sample mean given in the question.
mu and sigma are the population mean and standard deviation given in the question.
n is the sample size given in the question.
k1 is the storage location for the standard error (i.e., standard deviation of the sampling distribution of the sample mean).
k2 is the storage location for the desired probability.

How to use MINITAB to calculate a standard error (regardless of whether sampling from normal and sample size):
MTB > let k1= sigma / sqrt(n)
MTB > print k1

MINITAB Solution for P(x̄ > x)

Calculate a complementary cumulative probability for the sample mean (either sampling from a normal population or n ≥ 30):
MTB > let k1=sigma/sqrt(n)
MTB > cdf x k2;
SUBC> norm mu k1.
MTB > let k3=1-k2
MTB > print k3

Note: Substitute the correct numbers from the specific problem for the items in italics (x, mu, sigma, and n).

where:
x is the value of sample mean given in the question.
mu and sigma are the population mean and standard deviation.
n is the sample size.
k3 is the storage location for the desired probability.

MINITAB Solution for P(x < x̄ < y)

Calculate an in-between cumulative probability for the sample mean (either sampling from a normal population or n ≥ 30):
MTB > let k1=sigma/sqrt(n)
MTB > cdf x k2;
SUBC> norm mu k1.
MTB > cdf y k3;
SUBC> norm mu k1.
MTB > let k4=k3-k2
MTB > print k4

Note: Substitute the correct numbers from the specific problem for the items in italics (x, y, mu, sigma, and n).

where:
x and y are the values of sample mean given in the question.
mu and sigma are the population mean and standard deviation.
n is the sample size.
k4 is the storage location for the desired probability.
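For readers without MINITAB, the three calculations (cumulative, complementary, and in-between) can be sketched with Python's standard-library `NormalDist`. This is an alternative offered here, not part of the original slides, and the function names are hypothetical. The same condition applies: the population is normal or n ≥ 30.

```python
import math
from statistics import NormalDist

def xbar_dist(mu, sigma, n):
    """Normal model for x-bar: mean mu, standard error sigma / sqrt(n)."""
    return NormalDist(mu=mu, sigma=sigma / math.sqrt(n))

def prob_below(x, mu, sigma, n):
    """P(x-bar < x), the cumulative probability."""
    return xbar_dist(mu, sigma, n).cdf(x)

def prob_above(x, mu, sigma, n):
    """P(x-bar > x), the complementary cumulative probability."""
    return 1 - prob_below(x, mu, sigma, n)

def prob_between(x, y, mu, sigma, n):
    """P(x < x-bar < y), the in-between probability (expects x <= y)."""
    return prob_below(y, mu, sigma, n) - prob_below(x, mu, sigma, n)

# Usage with the Old Faithful numbers (mu = 85, sigma = 21.25, n = 30):
print(prob_below(75, 85, 21.25, 30))
print(prob_above(100, 85, 21.25, 30))
print(prob_between(75, 90, 85, 21.25, 30))
```

The three printed values should agree, up to rounding, with K2 = 0.00497564, K3 = 0.000055255, and K4 = 0.896283 from the Old Faithful MINITAB session.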
