1. Sampling variability (Section 19.1)
However, we can say something about the population value if we know something about sampling variation. Sampling variability describes how large the sampling error would be over hypothetical repeated samples from the same population.

For example, here are ten sample proportions (in %) drawn from a population in which the true proportion is 57%:

57.0 60.6 53.2 59.8 56.4
55.2 54.2 55.4 58.0 58.2

The sampling errors are:

 0.0  3.6 -3.8  2.8 -0.6
-1.8 -2.8 -1.6  1.0  1.2

Notice that these ten sampling errors range from -3.8 to +3.6.

Next, using my computer, I drew 100 random samples from this same population and computed the sample proportion from each one. A boxplot and histogram of the proportions are shown here.

Notice that the distribution of these proportions is centered around the true value of 57%. But there is considerable variability in these estimates. The lowest estimate is 50.2%, and the highest is 63.8%. Here is a boxplot of the sampling errors.
[Figure: Histogram of 100 sample proportions (x-axis: p, 50 to 64; y-axis: Frequency, 10 to 30)]
[Figure: Boxplot of 100 sample proportions (values roughly 50 to 64)]
[Figure: Boxplot of 100 sampling errors (values roughly -6 to +6)]
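The 100-sample simulation above can be reproduced along these lines. This is a sketch: the per-sample size n = 400 is an assumption, since the slides do not state it, so the exact minimum and maximum will differ from the 50.2% and 63.8% reported above.

```python
import random

random.seed(1)

p_true = 0.57  # true population proportion (57%)
n = 400        # per-sample size (assumed; the slides do not state it)

# Draw 100 random samples and record each sample proportion, in percent.
proportions = []
for _ in range(100):
    yes = sum(1 for _ in range(n) if random.random() < p_true)
    proportions.append(100 * yes / n)

# Sampling error = sample proportion minus the true value of 57%.
errors = [phat - 57.0 for phat in proportions]

print(f"lowest: {min(proportions):.1f}%, highest: {max(proportions):.1f}%")
print(f"average of the 100 proportions: {sum(proportions) / 100:.1f}%")
```

As in the figures, the 100 proportions cluster around 57% but individual samples can miss by several percentage points.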
Is there a rule about this?
There is a mathematical rule that tells us how the sample proportion will behave over repeated samples from a population. This rule is called the Central Limit Theorem.

It was demonstrated first in 1733 and again in 1812. But it was not really known or used much until the Russian mathematician Lyapunov proved it in general terms in 1901. The Central Limit Theorem is the cornerstone of modern statistical inference.

Now we will learn what the Central Limit Theorem says about the behavior of a sample proportion.

Suppose that you take many random samples of size n from a large population in which the true proportion of “Yes” is p. And, for each sample, suppose you find the estimated value of p in the usual way (number of “yes” divided by n).

These sample estimates of p will be
• approximately normally distributed, with
• mean equal to p, and
• standard deviation equal to the square root of p×(1–p) / n.
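The standard-deviation formula can be evaluated directly. A minimal sketch, using p = 0.57 from the running example; the sample sizes below are illustrative:

```python
import math

p = 0.57  # true proportion of "Yes"

# The sd of the sample proportion is sqrt(p * (1 - p) / n),
# so it shrinks like 1 / sqrt(n) as the sample size grows.
for n in (100, 400, 1600):  # illustrative sample sizes
    sd = math.sqrt(p * (1 - p) / n)
    print(f"n = {n:4d}: sd of sample proportion = {sd:.4f}")
```

Note that quadrupling the sample size only halves the standard deviation.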
Example

In the simulation earlier, the true population proportion was 57%, or 0.57.

[Figure: Histogram of 100 sample proportions (x-axis: p, 50 to 64)]

Is this what the Central Limit Theorem says should happen?
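One way to answer is to compare the CLT's prediction with a fresh run of the simulation. A sketch; n = 400 is an assumed sample size, since the slides do not state it:

```python
import math
import random

random.seed(4)

p = 0.57  # true population proportion from the example
n = 400   # assumed sample size (not stated in the slides)

# What the CLT predicts for the sample proportion.
sd_clt = math.sqrt(p * (1 - p) / n)

# Redo the simulation: 100 sample proportions.
phats = [sum(1 for _ in range(n) if random.random() < p) / n
         for _ in range(100)]
avg = sum(phats) / 100
sd = math.sqrt(sum((x - avg) ** 2 for x in phats) / 100)

print(f"CLT:       mean = {p:.3f}, sd = {sd_clt:.4f}")
print(f"simulated: mean = {avg:.3f}, sd = {sd:.4f}")
```

The simulated mean and standard deviation land close to the CLT's predicted values, which is exactly the behavior the histogram displays.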
But in real problems, we usually take only one sample. And, of course, we usually don’t know what the true p is. The true p is what we are trying to estimate! So, how does the CLT help us to make inferences about the true population proportion?

The version of the CLT stated above applies to a variable that is binary or dichotomous. Suppose instead that we want to estimate the mean of a continuous variable (height, cholesterol, test score, etc.) in a population. The natural estimate is the sample mean or sample average.

The population mean, which we call µ, is the average value of the variable for all subjects in the population. It does not vary. It is a fixed number. The sample mean is the average value of the variable measured for all subjects in the sample. If you were to take a new random sample, you would get a different sample mean. But the population mean does not change.

Suppose that you take many random samples of size n from a large population whose mean is µ and whose standard deviation is σ. And, for each sample, suppose you find the sample mean.

These sample means will be
• approximately normally distributed, with
• mean equal to µ, and
• standard deviation equal to σ divided by
the square root of n.
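These three claims about the sample mean can also be checked numerically. A minimal sketch, using an assumed Uniform(0, 1) population (which is decidedly non-normal, so the approximate normality of the means is the CLT at work) and an illustrative n = 25:

```python
import math
import random

random.seed(3)

# Illustrative population: Uniform(0, 1), which is not normal.
mu = 0.5                   # population mean
sigma = 1 / math.sqrt(12)  # population standard deviation
n = 25                     # sample size (illustrative)
draws = 2000               # number of repeated samples

# Simulate many sample means of size n.
means = [sum(random.random() for _ in range(n)) / n for _ in range(draws)]

avg = sum(means) / draws
sd = math.sqrt(sum((x - avg) ** 2 for x in means) / draws)

print(f"mean of sample means: {avg:.4f}  (CLT says mu = {mu})")
print(f"sd of sample means:   {sd:.4f}  "
      f"(CLT says sigma/sqrt(n) = {sigma / math.sqrt(n):.4f})")
```

The simulated mean and standard deviation of the sample means match µ and σ/√n closely, even though the underlying population is far from normal.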
Comment #2