
SAMPLING DISTRIBUTIONS

Population Distribution
Definition
The population distribution is the probability distribution of the
population data.

Population Distribution
Suppose there are only five students in an advanced
statistics class and the midterm scores of these five
students are
70
78
80
80
95

Let x denote the score of a student

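Table 7.1 Population Frequency and Relative Frequency Distributions

x     f     Relative Frequency
70    1     1/5 = .20
78    1     1/5 = .20
80    2     2/5 = .40
95    1     1/5 = .20
      N = 5 Sum = 1.00

Table 7.2 Population Probability Distribution

x     P(x)
70    .20
78    .20
80    .40
95    .20
      Sum of P(x) = 1.00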

Sampling
In statistical inference, our goal is to determine something about a
population based only on a sample. The population is the entire group
of individuals or objects under consideration (a survey that includes
every member of the population is a census), and the sample is a part
of that population.
Reasons to Sample
a) Contacting the whole population would be time consuming.
b) The cost of studying all the items in a population may be
prohibitive.
c) It may be physically impossible to check all items in the
population.
d) Some tests are destructive.
e) The sample results are adequate.

Types of Sampling

1. Simple Random Sampling
A sample selected so that each item or person in the population has the same
chance of being included.

2. Systematic Random Sampling
A random starting point is selected, and then every kth member of the
population is selected.

3. Stratified Random Sampling
A population is divided into subgroups, called strata, and a sample is randomly
selected from each stratum.

4. Cluster Sampling
A population is divided into clusters using naturally occurring geographic or
other boundaries. Then clusters are randomly selected, and a sample is
collected from each selected cluster.
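The four sampling schemes above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population of 100 numbered units, the two strata, the five clusters of 20 units, and all function names are hypothetical and do not come from the slides.

```python
import random


def simple_random_sample(population, n, rng):
    # Every unit has the same chance of being included.
    return rng.sample(population, n)


def systematic_sample(population, k, rng):
    # Random starting point among the first k units, then every kth unit.
    start = rng.randrange(k)
    return population[start::k]


def stratified_sample(strata, n_per_stratum, rng):
    # Draw a simple random sample from each stratum.
    return {name: rng.sample(units, n_per_stratum) for name, units in strata.items()}


def cluster_sample(clusters, n_clusters, n_per_cluster, rng):
    # Randomly pick clusters first, then sample within each chosen cluster.
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: rng.sample(clusters[name], n_per_cluster) for name in chosen}


rng = random.Random(0)                 # fixed seed so the run is repeatable
population = list(range(1, 101))       # hypothetical population of 100 units

srs = simple_random_sample(population, 10, rng)
sys_s = systematic_sample(population, 10, rng)
strat = stratified_sample({"low": population[:50], "high": population[50:]}, 5, rng)
clusters = {f"block_{i}": population[i * 20:(i + 1) * 20] for i in range(5)}
clus = cluster_sample(clusters, 2, 4, rng)

print(srs, sys_s, strat, clus, sep="\n")
```

Note the difference between the last two schemes: stratified sampling takes units from every stratum, while cluster sampling takes units only from the clusters that were selected.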

Sampling Distribution
Definition
The probability distribution of \(\bar{x}\) is called its sampling
distribution. It lists the various values that \(\bar{x}\) can assume
and the probability of each value of \(\bar{x}\).
In general, the probability distribution of a sample statistic
is called its sampling distribution.

Sampling Distribution
Reconsider the population of midterm scores of five
students given in Table 7.1
Consider all possible samples of three scores each that
can be selected, without replacement, from that
population. The total number of possible samples is

\[ {}_5C_3 = \frac{5!}{3!\,(5-3)!} = \frac{5 \cdot 4 \cdot 3 \cdot 2 \cdot 1}{3 \cdot 2 \cdot 1 \cdot 2 \cdot 1} = 10 \]

Sampling Distribution
Suppose we assign the letters A, B, C, D, and E to the
scores of the five students so that
A = 70
B = 78
C = 80
D = 80
E = 95

Then, the 10 possible samples of three scores each are


ABC, ABD, ABE, ACD, ACE, ADE, BCD, BCE,
BDE, CDE

Table 7.3 All Possible Samples and Their Means When the Sample Size Is 3

Table 7.4 Frequency and Relative Frequency Distributions of \(\bar{x}\) When the Sample Size Is 3

Table 7.5 Sampling Distribution of \(\bar{x}\) When the Sample Size Is 3
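The construction behind these three tables can be reproduced directly from the five scores. The following is a minimal sketch, not the textbook's own procedure: it enumerates the 10 samples of size 3, computes each sample mean, tallies the relative frequencies, and then computes the mean and standard deviation of the resulting sampling distribution. Variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations
from math import sqrt

scores = {"A": 70, "B": 78, "C": 80, "D": 80, "E": 95}

# All samples of three students, drawn without replacement.
samples = list(combinations(scores, 3))

# Sample mean for each sample, kept exact with Fraction.
means = {"".join(s): Fraction(sum(scores[k] for k in s), 3) for s in samples}

# Frequency of each distinct mean, converted to a probability.
freq = Counter(means.values())
dist = {m: Fraction(f, len(samples)) for m, f in sorted(freq.items())}

# Mean and standard deviation of the sampling distribution of the sample mean.
mu_xbar = sum(m * p for m, p in dist.items())
var_xbar = sum(m * m * p for m, p in dist.items()) - mu_xbar ** 2
sigma_xbar = sqrt(var_xbar)

for label, m in means.items():
    print(label, f"{float(m):.2f}")
for m, p in dist.items():
    print(f"{float(m):.2f}", float(p))
print(f"mean = {float(mu_xbar):.2f}, sd = {sigma_xbar:.2f}")
```

The mean of the sampling distribution comes out to 80.60, the same as the population mean, and the standard deviation rounds to 3.30.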

SAMPLING AND NONSAMPLING ERRORS


Definition
Sampling error is the difference between the value of a
sample statistic and the value of the corresponding
population parameter. In the case of the mean,
Sampling error = \(\bar{x} - \mu\)
assuming that the sample is random and no nonsampling
error has been made.



Definition
The errors that occur in the collection, recording, and
tabulation of data are called nonsampling errors.

Reasons for the Occurrence of Nonsampling Errors
1. If a sample is nonrandom (and, hence, nonrepresentative), the
sample results may be too different from the census results.
2. The questions may be phrased in such a way that they are not
fully understood by the members of the sample or population.
3. The respondents may intentionally give false information in
response to some sensitive questions.
4. The poll taker may make a mistake and enter a wrong number
in the records or make an error while entering the data on a
computer.

Example 7-1
Reconsider the population of five scores given in Table
7.1. Suppose one sample of three scores is selected from
this population, and this sample includes the scores 70,
80, and 95. Find the sampling error.

Example 7-1: Solution


\[ \mu = \frac{70 + 78 + 80 + 80 + 95}{5} = 80.60 \]
\[ \bar{x} = \frac{70 + 80 + 95}{3} = 81.67 \]
Sampling error = \(\bar{x} - \mu = 81.67 - 80.60 = 1.07\)
That is, the mean score estimated from the sample is
1.07 higher than the mean score of the population.



Now suppose, when we select the sample of three scores,
we mistakenly record the second score as 82 instead of
80.
As a result, we calculate the sample mean as

\[ \bar{x} = \frac{70 + 82 + 95}{3} = 82.33 \]



The difference between this sample mean and the
population mean is

\[ \bar{x} - \mu = 82.33 - 80.60 = 1.73 \]


This difference does not represent the sampling error.
Only 1.07 of this difference is due to the sampling error.



The remaining portion represents the nonsampling error.

It is equal to 1.73 - 1.07 = .66


It occurred due to the error we made in recording the
second score in the sample.

Also,
Nonsampling error = Incorrect \(\bar{x}\) - Correct \(\bar{x}\) = 82.33 - 81.67 = .66
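The split of the total difference into a sampling part and a nonsampling part can be checked numerically. This is a small sketch using the scores from Example 7-1; the variable names are illustrative.

```python
population = [70, 78, 80, 80, 95]
correct_sample = [70, 80, 95]      # scores actually drawn
recorded_sample = [70, 82, 95]     # second score mis-recorded as 82

mu = sum(population) / len(population)
correct_mean = sum(correct_sample) / len(correct_sample)
recorded_mean = sum(recorded_sample) / len(recorded_sample)

sampling_error = correct_mean - mu                  # due to chance alone
nonsampling_error = recorded_mean - correct_mean    # due to the recording mistake
total_difference = recorded_mean - mu

print(f"sampling error    = {sampling_error:.2f}")
print(f"nonsampling error = {nonsampling_error:.2f}")
print(f"total difference  = {total_difference:.2f}")
```

Without intermediate rounding, the nonsampling error is 2/3, which prints as 0.67; the slides obtain .66 because they subtract the already rounded means 82.33 and 81.67.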

Figure 7.1 Sampling and nonsampling errors.

MEAN AND STANDARD DEVIATION OF \(\bar{x}\)

Definition
The mean and standard deviation of the
sampling distribution of \(\bar{x}\) are called the
mean and standard deviation of \(\bar{x}\) and
are denoted by \(\mu_{\bar{x}}\) and \(\sigma_{\bar{x}}\), respectively.

Mean of the Sampling Distribution of \(\bar{x}\)
The mean of the sampling distribution of \(\bar{x}\) is always equal to the
mean of the population. Thus,
\[ \mu_{\bar{x}} = \mu \]

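We can obtain the mean, \(\mu_{\bar{x}}\), and the standard deviation,
\(\sigma_{\bar{x}}\), of \(\bar{x}\) from the given table.

Alternatively, we can calculate the mean and standard deviation of the
sampling distribution of \(\bar{x}\) from the given table by the formulas:

\[ \mu_{\bar{x}} = \sum \bar{x}\, P(\bar{x}) = 80.60 \]

\[ \sigma_{\bar{x}} = \sqrt{\sum \bar{x}^2 P(\bar{x}) - \mu_{\bar{x}}^2} = 3.30 \]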

If we calculate the mean of the population probability distribution of x
and the mean of the sampling distribution of \(\bar{x}\) by using the
formula Mean = \(\sum x P(x)\), both equal 80.60, so \(\mu_{\bar{x}} = \mu\).

Standard Deviation of the Sampling Distribution of \(\bar{x}\)
The standard deviation of the sampling distribution of \(\bar{x}\) is
\[ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \]
where \(\sigma\) is the standard deviation of the
population and n is the sample size. This formula
is used when \(n/N \le .05\), where N is the
population size.

If the condition \(n/N \le .05\) is not satisfied,
we use the following formula to calculate \(\sigma_{\bar{x}}\):
\[ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}} \]
where the factor \(\sqrt{\frac{N - n}{N - 1}}\) is called the finite
population correction factor.
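The two formulas can be combined into one helper that applies the finite population correction only when n/N exceeds .05. This is a sketch under that rule; the function name `std_error_of_mean` and its parameters are hypothetical. It is checked against the five-score population, where n/N = 3/5 and the correction is needed.

```python
from math import sqrt


def std_error_of_mean(sigma, n, N=None, threshold=0.05):
    """Standard deviation of the sample mean.

    Applies the finite population correction factor when the population
    size N is given and n/N is larger than the threshold.
    """
    se = sigma / sqrt(n)
    if N is not None and n / N > threshold:
        se *= sqrt((N - n) / (N - 1))
    return se


population = [70, 78, 80, 80, 95]
N = len(population)
mu = sum(population) / N
sigma = sqrt(sum((x - mu) ** 2 for x in population) / N)  # population sd

se = std_error_of_mean(sigma, n=3, N=N)
print(f"sigma = {sigma:.2f}, sd of sample mean = {se:.2f}")
```

The result rounds to 3.30, which agrees with the value obtained from the sampling distribution itself.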

Two Important Observations


1. The spread of the sampling distribution
of \(\bar{x}\) is smaller than the spread of the
corresponding population distribution,
i.e., \(\sigma_{\bar{x}} < \sigma\) when n > 1.
2. The standard deviation of the sampling
distribution of \(\bar{x}\) decreases as the
sample size increases (so \(\bar{x}\) is a consistent
estimator of \(\mu\)).

Example 7-2
The mean wage per hour for all 5000 employees
who work at a large company is $27.50
and the standard deviation is $3.70.
Let \(\bar{x}\) be the mean wage per hour for a
random sample of certain employees
selected from this company. Find the
mean and standard deviation of \(\bar{x}\) for a
sample size of
(a) 30
(b) 75
(c) 200

Example 7-2: Solution


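(a) N = 5000, \(\mu\) = $27.50, and \(\sigma\) = $3.70. In
this case, n/N = 30/5000 = .006 < .05.
\[ \mu_{\bar{x}} = \mu = \$27.50 \]
\[ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{3.70}{\sqrt{30}} = \$.676 \]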

Example 7-2: Solution


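(b) N = 5000, \(\mu\) = $27.50, and \(\sigma\) = $3.70. In
this case, n/N = 75/5000 = .015 < .05.
\[ \mu_{\bar{x}} = \mu = \$27.50 \]
\[ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{3.70}{\sqrt{75}} = \$.427 \]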

Example 7-2: Solution


(c) In this case, n = 200 and
n/N = 200/5000 = .04, which is less than .05.
\[ \mu_{\bar{x}} = \mu = \$27.50 \]
\[ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{3.70}{\sqrt{200}} = \$.262 \]
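The three parts of Example 7-2 follow the same two steps: confirm that n/N is at most .05, then divide the population standard deviation by the square root of n. A short sketch (variable names are illustrative):

```python
from math import sqrt

N, mu, sigma = 5000, 27.50, 3.70

results = {}
for n in (30, 75, 200):
    ratio = n / N
    # All three sample sizes satisfy n/N <= .05, so no correction factor is needed.
    assert ratio <= 0.05
    results[n] = round(sigma / sqrt(n), 3)
    print(f"n = {n:3d}  n/N = {ratio:.3f}  mean = {mu:.2f}  sd = {results[n]:.3f}")
```

The mean of the sample mean stays at $27.50 for every n, while its standard deviation shrinks as n grows.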

SHAPE OF THE SAMPLING DISTRIBUTION OF \(\bar{x}\)

Two cases are considered:
1. The population from which samples are
drawn has a normal distribution.
2. The population from which samples are
drawn does not have a normal
distribution.

Sampling From a Normally Distributed Population


If the population from which the samples
are drawn is normally distributed with mean
\(\mu\) and standard deviation \(\sigma\), then the
sampling distribution of the sample mean,
\(\bar{x}\), will also be normally distributed with
the following mean and standard deviation,
irrespective of the sample size:
\[ \mu_{\bar{x}} = \mu \quad \text{and} \quad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \]

Figure 7.2 Population distribution and sampling distributions of \(\bar{x}\).

Example 7-3
In a recent SAT, the mean score for all
examinees was 1020. Assume that the
distribution of SAT scores of all examinees is
normal with a mean of 1020 and a standard
deviation of 153. Let \(\bar{x}\) be the mean SAT score
of a random sample of certain examinees.
Calculate the mean and standard deviation of \(\bar{x}\)
and describe the shape of its sampling
distribution when the sample size is
(a) 16
(b) 50
(c) 1000

Example 7-3: Solution


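(a) \(\mu\) = 1020 and \(\sigma\) = 153.
\[ \mu_{\bar{x}} = \mu = 1020 \]
\[ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{153}{\sqrt{16}} = 38.250 \]
Because the population is normally distributed, the sampling
distribution of \(\bar{x}\) is normal for every sample size.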

Figure 7.3

Example 7-3: Solution


(b)
\[ \mu_{\bar{x}} = \mu = 1020 \]
\[ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{153}{\sqrt{50}} = 21.637 \]

Figure 7.4

Example 7-3: Solution


(c)
\[ \mu_{\bar{x}} = \mu = 1020 \]
\[ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{153}{\sqrt{1000}} = 4.838 \]

Figure 7.5

Sampling From a Population That Is Not Normally Distributed
Central Limit Theorem
According to the central limit theorem, for a large
sample size, the sampling distribution of \(\bar{x}\) is
approximately normal, irrespective of the shape of
the population distribution. The mean and standard
deviation of the sampling distribution of \(\bar{x}\) are
\[ \mu_{\bar{x}} = \mu \quad \text{and} \quad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \]
The sample size is usually considered to be large if
\(n \ge 30\).

Figure 7.6 Population distribution and sampling distributions of \(\bar{x}\).
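The central limit theorem can be illustrated by simulation. The sketch below is not from the slides: it assumes a hypothetical right-skewed population (50,000 exponential values with mean near 1000), draws repeated samples with replacement (reasonable here because n/N is tiny), and compares the mean and standard deviation of the simulated sample means with \(\mu\) and \(\sigma/\sqrt{n}\).

```python
import random
from math import sqrt
from statistics import fmean, pstdev

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical right-skewed population (exponential, mean about 1000).
population = [rng.expovariate(1 / 1000) for _ in range(50_000)]
mu = fmean(population)
sigma = pstdev(population)


def sample_means(n, reps=2_000):
    # Means of `reps` random samples of size n from the population.
    return [fmean(rng.choices(population, k=n)) for _ in range(reps)]


for n in (2, 30, 100):
    means = sample_means(n)
    print(
        f"n = {n:3d}  mean of means = {fmean(means):8.1f}  "
        f"sd of means = {pstdev(means):7.1f}  sigma/sqrt(n) = {sigma / sqrt(n):7.1f}"
    )
```

Plotting a histogram of `sample_means(2)` against `sample_means(30)` would show the skew fading as n grows, which is the behaviour Figure 7.6 depicts.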

Example 7-4
The mean rent paid by all tenants in a small city
is $1550 with a standard deviation of $225.
However, the population distribution of rents for
all tenants in this city is skewed to the right.
Calculate the mean and standard deviation of \(\bar{x}\)
and describe the shape of its sampling
distribution when the sample size is
(a) 30
(b) 100

Example 7-4: Solution


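(a) Let \(\bar{x}\) be the mean rent paid by a sample
of 30 tenants.
\[ \mu_{\bar{x}} = \mu = \$1550 \]
\[ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{225}{\sqrt{30}} = \$41.079 \]
Although the population is skewed to the right, n = 30 is large, so by
the central limit theorem the sampling distribution of \(\bar{x}\) is
approximately normal.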

Figure 7.7

Example 7-4: Solution


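(b) Let \(\bar{x}\) be the mean rent paid by a sample
of 100 tenants.
\[ \mu_{\bar{x}} = \mu = \$1550 \]
\[ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{225}{\sqrt{100}} = \$22.500 \]
Because n = 100 is large, the sampling distribution of \(\bar{x}\) is
again approximately normal.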

Figure 7.8

APPLICATIONS OF THE SAMPLING DISTRIBUTION OF \(\bar{x}\)
1. If we take all possible samples of the same (large)
size from a population and calculate the mean for
each of these samples, then about 68.26% of the
sample means will be within one standard deviation
(\(\sigma_{\bar{x}}\)) of the population mean.
Alternatively, \(P(\mu - 1\sigma_{\bar{x}} \le \bar{x} \le \mu + 1\sigma_{\bar{x}}) = 0.8413 - 0.1587 = 0.6826\)

Figure 7.9 \(P(\mu - 1\sigma_{\bar{x}} \le \bar{x} \le \mu + 1\sigma_{\bar{x}})\)

2. If we take all possible samples of the
same (large) size from a population and
calculate the mean for each of these
samples, then about 95.44% of the
sample means will be within two standard
deviations (\(2\sigma_{\bar{x}}\)) of the population mean.
Alternatively, \(P(\mu - 2\sigma_{\bar{x}} \le \bar{x} \le \mu + 2\sigma_{\bar{x}}) = 0.9772 - 0.0228 = 0.9544\)

Figure 7.10 \(P(\mu - 2\sigma_{\bar{x}} \le \bar{x} \le \mu + 2\sigma_{\bar{x}})\)

3. If we take all possible samples of the
same (large) size from a population and
calculate the mean for each of these
samples, then about 99.74% of the
sample means will be within three
standard deviations (\(3\sigma_{\bar{x}}\)) of the population
mean.
Alternatively, \(P(\mu - 3\sigma_{\bar{x}} \le \bar{x} \le \mu + 3\sigma_{\bar{x}}) = 0.9987 - 0.0013 = 0.9974\)

Figure 7.11 \(P(\mu - 3\sigma_{\bar{x}} \le \bar{x} \le \mu + 3\sigma_{\bar{x}})\)
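The three percentages are areas under the standard normal curve and can be recomputed with Python's standard library. A brief sketch (the name `coverage` is illustrative):

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Area between -k and +k standard deviations for k = 1, 2, 3.
coverage = {k: z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)}

for k, p in coverage.items():
    print(f"within {k} standard deviation(s): {p:.4f}")
```

Computed without table rounding, the areas are about .6827, .9545 and .9973. The slide values .6826, .9544 and .9974 differ in the last digit because they subtract table entries that are already rounded to four decimals.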

Example 7-5
Assume that the weights of all packages of
a certain brand of cookies are normally
distributed with a mean of 32 ounces and a
standard deviation of .3 ounce. Find the
probability that the mean weight, \(\bar{x}\), of a
random sample of 20 packages of this
brand of cookies will be between 31.8 and
31.9 ounces.

Example 7-5: Solution

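\[ \mu_{\bar{x}} = \mu = 32 \text{ ounces} \]
\[ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{.3}{\sqrt{20}} = .06708204 \text{ ounce} \]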

z Value for a Value of \(\bar{x}\)
The z value for a value of \(\bar{x}\) is calculated as
\[ z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}} \]

Example 7-5: Solution

For \(\bar{x}\) = 31.8:
\[ z = \frac{31.8 - 32}{.06708204} = -2.98 \]

For \(\bar{x}\) = 31.9:
\[ z = \frac{31.9 - 32}{.06708204} = -1.49 \]

P(31.8 < \(\bar{x}\) < 31.9) = P(-2.98 < z < -1.49)
= P(z < -1.49) - P(z < -2.98)
= .0681 - .0014 = .0667

Figure 7.12
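The same probability can be obtained without a z table by treating the sample mean as a normal variable with mean \(\mu\) and standard deviation \(\sigma/\sqrt{n}\). A minimal sketch for Example 7-5 (variable names are illustrative):

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 32, 0.3, 20

# Sampling distribution of the mean weight of 20 packages.
xbar_dist = NormalDist(mu, sigma / sqrt(n))

z_low = (31.8 - mu) / xbar_dist.stdev
z_high = (31.9 - mu) / xbar_dist.stdev
p = xbar_dist.cdf(31.9) - xbar_dist.cdf(31.8)

print(f"z values: {z_low:.2f}, {z_high:.2f}")
print(f"P(31.8 < xbar < 31.9) = {p:.4f}")
```

Because the z values are not rounded to two decimals here, the result can differ from the table-based .0667 in the fourth decimal place.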

Example 7-6
According to Sallie Mae surveys and Credit Bureau data, college
students carried an average of $3173 credit card debt in 2008. Suppose
the probability distribution of the current credit card debts for all
college students in the United States is unknown, but its mean is $3173
and the standard deviation is $750. Let \(\bar{x}\) be the mean credit card debt
of a random sample of 400 U.S. college students.
a) What is the probability that the mean of the current credit card debts
for this sample is within $70 of the population mean?
b) What is the probability that the mean of the current credit card debts
for this sample is lower than the population mean by $50 or more?

Example 7-6: Solution


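\(\mu\) = $3173 and \(\sigma\) = $750. The shape of
the probability distribution of the
population is unknown. However, the
sampling distribution of \(\bar{x}\) is
approximately normal because the
sample is large (n > 30).
\[ \mu_{\bar{x}} = \mu = \$3173 \]
\[ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{750}{\sqrt{400}} = \$37.50 \]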

Example 7-6: Solution


(a)

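For \(\bar{x}\) = $3103: \(z = \frac{3103 - 3173}{37.50} = -1.87\)
For \(\bar{x}\) = $3243: \(z = \frac{3243 - 3173}{37.50} = 1.87\)

\(P(\$3103 \le \bar{x} \le \$3243)\)
= \(P(-1.87 \le z \le 1.87)\) = .9693 - .0307
= .9386

Figure 7.13 \(P(\$3103 \le \bar{x} \le \$3243)\)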

Prem Mann, Introductory Statistics, 7/E
Copyright 2010 John Wiley & Sons. All rights reserved

Example 7-6: Solution


(a) Therefore, the probability that the mean
of the current credit card debts for this
sample is within $70 of the population
mean is .9386.

Example 7-6: Solution


(b)

For \(\bar{x}\) = $3123:
\[ z = \frac{3123 - 3173}{37.50} = -1.33 \]

\(P(\bar{x} \le 3123) = P(z \le -1.33)\)
= .0918

Figure 7.14 \(P(\bar{x} \le \$3123)\)

Example 7-6: Solution


Therefore, the probability that the mean of
the current credit card debts for this
sample is lower than the population mean
by $50 or more is .0918.
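Both parts of Example 7-6 can be checked the same way, relying on the central limit theorem to treat the sample mean as approximately normal. A minimal sketch (variable names are illustrative):

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 3173, 750, 400

# Approximate sampling distribution of the mean debt (n = 400 is large).
xbar_dist = NormalDist(mu, sigma / sqrt(n))

# (a) Sample mean within $70 of the population mean.
p_within_70 = xbar_dist.cdf(mu + 70) - xbar_dist.cdf(mu - 70)

# (b) Sample mean lower than the population mean by $50 or more.
p_lower_50 = xbar_dist.cdf(mu - 50)

print(f"sd of sample mean = {xbar_dist.stdev:.2f}")
print(f"(a) {p_within_70:.4f}")
print(f"(b) {p_lower_50:.4f}")
```

The values agree with .9386 and .0918 to about three decimals; the small gap comes from the slides rounding z to 1.87 and -1.33 before reading the table.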
