
Lecture 30

Please turn off cell phones, pagers, etc. The lecture will begin shortly.

Today's lecture will cover material related to Sections 19.1-19.3:

1. Sampling variability (Section 19.1)

2. Variability of a proportion (Section 19.2)

3. Variability of a mean (Section 19.3)
1. Sampling variability

Statistical inference

The goal of statistical inference is to make scientifically defensible statements about a population based on information from a single sample.

Sampling error

The difference between the value of a quantity in the population and the estimate we get from a sample is called sampling error.

Example

Suppose that the true value of a relative risk in the population is 1.2. (In a real problem, of course, you would never know this value.)

Then suppose you take a random sample from this population and compute the relative risk in your sample. Will it be exactly 1.2?

Probably not. Most likely, it will be fairly close to 1.2, especially if the sample size is large. But it will not be exactly 1.2.

Suppose that the sample gives an estimated relative risk of 1.4. Then the sampling error in this relative risk is 1.4 – 1.2 = 0.2.

Sampling variability

In any real problem, we cannot know what the sampling error is, because the true value of the quantity in the population is unknown.

For purposes of statistical inference, it would be great if we knew what the sampling error is. Unfortunately, we will never know it. So we will never be able to say with certainty what the population value is.

However, we can say something about the population value if we know something about sampling variation.

Sampling variability describes how large the sampling error would be over hypothetical repeated samples from the same population.

Example

Suppose that 57.0% of likely voters in America would support the creation of a national guest worker program.

Suppose we take a sample of n=500 likely voters from this population and find the proportion of support in the sample. What can we say about the sampling variability?

Using my computer, I generated ten samples. Here are the ten sample proportions:

57.0 60.6 53.2 59.8 56.4
55.2 54.2 55.4 58.0 58.2

The sampling errors are:

 0.0  3.6 -3.8  2.8 -0.6
-1.8 -2.8 -1.6  1.0  1.2

Notice that these ten sampling errors range from -3.8 to +3.6.

Next, using my computer, I drew 100 random samples from this same population and computed the sample proportion from each one. A boxplot and histogram of the proportions are shown here.

[Figure: boxplot and histogram of the 100 sample proportions, ranging from about 50% to 64%.]

Notice that the distribution of these proportions is centered around the true value of 57%. But there is considerable variability in these estimates. The lowest estimate is 50.2%, and the highest is 63.8%.

Here is a boxplot of the sampling errors.

[Figure: boxplot of the 100 sampling errors, ranging from about -6 to +6.]

The average of these sampling errors is -0.04, which is very close to zero. The standard deviation is 2.4.
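This repeated-sampling experiment can be sketched in a few lines of Python. This is only a sketch: the fixed seed and the pure-standard-library approach are my choices, not part of the lecture, so the simulated numbers will differ from the ones shown above.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

p_true = 0.57   # true population proportion of support (57%)
n = 500         # sample size
num_samples = 100

# Draw repeated samples; record each sample proportion as a percentage.
proportions = []
for _ in range(num_samples):
    yes = sum(1 for _ in range(n) if random.random() < p_true)
    proportions.append(100 * yes / n)

# Sampling error = sample estimate minus the true population value.
errors = [phat - 100 * p_true for phat in proportions]

print(f"average sampling error: {statistics.mean(errors):.2f}")
print(f"SD of sampling errors:  {statistics.stdev(errors):.2f}")
```

As in the lecture's experiment, the average sampling error comes out near zero and the standard deviation near 2 percentage points.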
Is there a rule about this?

2. Variability of a proportion

There is a mathematical rule that tells us how the sample proportion will behave over repeated samples from a population.

This rule is called the Central Limit Theorem.

It was demonstrated first in 1733 and again in 1812. But it was not really known or used much until the Russian mathematician Lyapunov proved it in general terms in 1901.

The Central Limit Theorem is the cornerstone of modern statistical inference.

Now we will learn what the Central Limit Theorem says about the behavior of a sample proportion.

Central Limit Theorem for a proportion

Suppose that you take many random samples of size n from a large population in which the true proportion of "Yes" is p.

And, for each sample, suppose you find the estimated value of p in the usual way (number of "yes" divided by n).

These sample estimates of p will be

• approximately normally distributed, with
• mean equal to p, and
• standard deviation equal to the square root of p × (1 – p) / n.

Example

Earlier, we took 100 random samples of size 500 from a population in which the true proportion of "yes" was 57% or 0.57.

What does the Central Limit Theorem say about what should happen?

The CLT says that, over many samples, the sample proportions will be approximately normally distributed with

• mean equal to p = 0.57 or 57%
• standard deviation equal to the square root of (0.57 × 0.43) / 500
  = square root of 0.0004902
  = 0.0221
  = 2.21%

Is that what actually happened?

[Figure: histogram of the 100 sample proportions, ranging from about 50% to 64%.]

The histogram of the 100 sample proportions does look like a normal curve.

The average of the 100 sample percentages was 56.8, which is quite close to the population value of 57%.

The standard deviation of the 100 sample percentages was 2.4, which is close to what the CLT said it should be (2.21).

What we got is close to, but not exactly equal to, what the CLT says it ought to be. Is there something wrong?

No. The CLT says what will happen in a very large number of samples. We only took 100 samples.
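The standard deviation the CLT predicts for this example can be checked with a couple of lines (a minimal sketch using only the standard library):

```python
import math

p = 0.57  # true population proportion
n = 500   # sample size

# CLT standard deviation of the sample proportion: sqrt(p * (1 - p) / n)
variance = p * (1 - p) / n
sd = math.sqrt(variance)

print(f"p(1-p)/n = {variance:.7f}")        # 0.0004902
print(f"SD = {sd:.4f} = {100 * sd:.2f}%")  # 0.0221 = 2.21%
```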
Results for one million samples

Using my computer, I repeated the experiment. But this time, I took one million samples. A boxplot of the sample percentages is shown below.

[Figure: boxplot of the 1,000,000 sample percentages, ranging from about 45% to 65%.]

The average of these one million percentages is 57.00, and the standard deviation is 2.21. This is precisely what the CLT said it should be.

Hooray for the Central Limit Theorem!

Restating the CLT for a sample proportion

The Central Limit Theorem says that if you

• take many random samples of size n
• from a population with true proportion = p

then the many sample proportions will be

• approximately normally distributed, with
• mean equal to p, and
• standard deviation equal to the square root of p × (1 – p) / n.

Why does it matter?

The Central Limit Theorem tells us about the behavior of a sample proportion over many repeated samples.

But in real problems, we usually take only one sample.

And, of course, we usually don't know what the true p is. The true p is what we are trying to estimate!

So, how does the CLT help us to make inferences about the true population proportion?

That will be the subject of Chapter 20.

3. Variability of a sample mean

The sample proportion estimates the proportion of "Yes" responses in the population. It applies to a variable that is binary or dichotomous.

Suppose we want to estimate the mean of a continuous variable (height, cholesterol, test score, etc.) in a population.

The natural estimate is the sample mean or sample average.

The sample mean will not be exactly equal to the population mean. But the CLT tells us how it behaves over repeated samples.

Central Limit Theorem for a mean

Suppose that you take many random samples of size n from a large population in which the true mean is µ and the true standard deviation is σ.

And, for each sample, suppose you find the sample mean.

These sample means will be

• approximately normally distributed, with
• mean equal to µ, and
• standard deviation equal to σ divided by the square root of n.

Comment #1

To understand the CLT, you need to understand the difference between the population mean and the sample mean.

The population mean, which we call µ, is the average value of the variable for all subjects in the population. It does not vary. It is a fixed number.

The sample mean is the average value of the variable measured for all subjects in the sample.

If you were to take a new random sample, you would get a different sample mean. But the population mean does not change.
Comment #2

You also need to understand the difference between the population standard deviation and the standard deviation of the sample mean.

The population standard deviation, which we call σ, measures the spread or variability of all the measurements in the population. It does not vary. It is a fixed number.

The standard deviation of the sample mean measures how much the sample mean tends to vary over repeated samples. It is equal to σ divided by the square root of n.

The standard deviation of the sample mean is much smaller than the population standard deviation, especially when n is large.
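The claim that the sample mean varies by only σ divided by the square root of n can be checked with a short simulation. This is a sketch under my own assumptions (a uniform population on [0, 10], a fixed seed, and sample sizes I chose; none of these come from the lecture):

```python
import math
import random
import statistics

random.seed(2)  # fixed seed for reproducibility

# Hypothetical population: uniform on [0, 10]. Its true mean is µ = 5 and
# its true standard deviation is σ = 10 / sqrt(12) ≈ 2.887.
mu = 5.0
sigma = 10 / math.sqrt(12)

n = 100             # size of each sample
num_samples = 2000  # number of repeated samples

# For each repeated sample, record the sample mean.
sample_means = []
for _ in range(num_samples):
    sample = [random.uniform(0, 10) for _ in range(n)]
    sample_means.append(statistics.mean(sample))

print(f"average of the sample means: {statistics.mean(sample_means):.3f}")   # near µ = 5
print(f"SD of the sample means:      {statistics.stdev(sample_means):.3f}")  # near σ/√n ≈ 0.289
```

The simulated standard deviation of the sample means comes out close to σ/√n ≈ 0.289, far smaller than the population standard deviation of about 2.887, just as Comment #2 describes.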
