We often use samples instead of the entire population because measuring every
item in the population would cost too much time and money. Also, in some cases
measurement requires the destruction of individual items. In general, we
achieve greater accuracy by carefully obtaining a random sample of the
population instead of spending the resources to measure every item. An
important reason for this result is that it is often very difficult to obtain
and measure every item in a population, and even if it were possible, the cost
would be very high for a large population.
Sampling
Sampling is the process of selecting a sample.
Simple random sampling can be implemented in many ways. We can place the
N population items—for example, colored balls—in a large barrel and mix them
thoroughly. Then from this well-mixed barrel we can select individual balls
from different parts of the barrel. In practice, we often use random numbers to
select objects that can be assigned some numerical value. Various statistical
computer software and spreadsheets have routines for obtaining random
numbers, and these are generally used for most sampling studies.
To see how to use a random number table, suppose that we have 100 employees
in a company and wish to interview a randomly chosen sample of 10. We could
get such a random sample by assigning every employee a number from 00 to 99,
consulting a random number table, and picking a systematic method of
selecting two-digit numbers. In this case, let's do the following:
Go from the top to the bottom of the columns beginning with the left-hand
column, and read only the first two digits in each row.
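In practice the random number table is usually replaced by a software routine. The sketch below draws 10 of the 100 employee numbers without replacement; the function name `simple_random_sample` and the seed value are illustrative assumptions, not part of the original notes.

```python
import random

def simple_random_sample(population_size, sample_size, seed=None):
    """Draw sample_size distinct IDs from 0 .. population_size - 1."""
    rng = random.Random(seed)  # a seed makes the draw reproducible
    return sorted(rng.sample(range(population_size), sample_size))

# Employees are numbered 00 to 99; pick 10 of them at random.
sample = simple_random_sample(100, 10, seed=2024)
labels = [f"{i:02d}" for i in sample]  # two-digit labels, as in the table method
print(labels)
```

Every employee has the same chance of being selected, which is the defining property of simple random sampling.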
Systematic Sampling
In systematic sampling, elements are selected from the population at a uniform
interval that is measured in time, order, or space. If we want to interview every
twentieth student on a college campus, we would choose a random starting
point in the first 20 names in the student directory and then pick every twentieth
name thereafter.
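The directory example can be sketched as follows. The directory contents, the function name `systematic_sample`, and the seed are hypothetical and serve only to illustrate the procedure.

```python
import random

def systematic_sample(items, interval, seed=None):
    """Pick a random start within the first `interval` items, then every interval-th item."""
    rng = random.Random(seed)
    start = rng.randrange(interval)  # random starting point among the first `interval` names
    return items[start::interval]

# Hypothetical student directory with 1,000 names.
directory = [f"student_{i:04d}" for i in range(1, 1001)]
chosen = systematic_sample(directory, 20, seed=7)
print(len(chosen), chosen[:3])
```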
Stratified Sampling
To use stratified sampling, we divide the population into relatively
homogeneous groups, called strata. Then we use one of two approaches. Either
we select at random from each stratum a specified number of elements
corresponding to the proportion of that stratum in the population as a whole,
or we draw an equal number of elements from each stratum and weight the
results according to the stratum's proportion of the total population. With
either approach, stratified sampling guarantees that every element in the
population has a chance of being selected.
Suppose a physician wants to find out how many hours his patients sleep. To
obtain an estimate of this characteristic of the population, he could take a
random sample from each of four age groups and weight each sample according to
the percentage of patients in that group. This would be an example of a
stratified sample.
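The weighting step can be sketched as below. The age groups, population shares, and sleep-hour samples are invented purely for illustration; only the weighting logic follows the text.

```python
# Hypothetical strata: share of all patients in each age group, and a small
# random sample of nightly sleep hours drawn from that group.
strata = {
    "0-19":  {"share": 0.30, "sample": [9.0, 8.0, 10.0]},
    "20-39": {"share": 0.40, "sample": [7.0, 8.0, 6.0]},
    "40-59": {"share": 0.20, "sample": [6.0, 7.0, 6.5]},
    "60+":   {"share": 0.10, "sample": [6.0, 5.0, 7.0]},
}

def stratified_mean(strata):
    """Weight each stratum's sample mean by the stratum's population share."""
    total = 0.0
    for group in strata.values():
        sample_mean = sum(group["sample"]) / len(group["sample"])
        total += group["share"] * sample_mean
    return total

estimate = stratified_mean(strata)
print(round(estimate, 2))
```

Because the same number of patients is drawn from each stratum here, this corresponds to the second approach described above: equal-sized samples, weighted by stratum proportion.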
Cluster Sampling
In cluster sampling, we divide the population into groups, or clusters, and then
select a random sample of these clusters. We assume that these individual
clusters are representative of the population as a whole. If a market research
team is attempting to determine by sampling the average number of television
sets per household in a large city, they could use a city map to divide the
territory into blocks and then choose a certain number of blocks (clusters) for
interviewing. Every household in each of these blocks would be interviewed. A
well-designed cluster sampling procedure can produce a more precise sample at
considerably less cost than simple random sampling.
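A minimal sketch of the television example, assuming a made-up city of six blocks. The block names, household counts, and the function name `cluster_sample_mean` are hypothetical.

```python
import random

# Hypothetical data: number of TV sets in each household, grouped by city block.
blocks = {
    "block_A": [1, 2, 2, 3],
    "block_B": [0, 1, 1],
    "block_C": [2, 2, 4, 1, 3],
    "block_D": [1, 1, 2],
    "block_E": [3, 2, 2, 2],
    "block_F": [1, 0, 2],
}

def cluster_sample_mean(clusters, n_clusters, seed=None):
    """Select whole clusters at random and survey every household in them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    households = [tv for name in chosen for tv in clusters[name]]
    return chosen, sum(households) / len(households)

chosen_blocks, estimate = cluster_sample_mean(blocks, n_clusters=2, seed=1)
print(chosen_blocks, round(estimate, 2))
```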
Sampling Distributions
Consider a random sample selected from a population that is used to make an
inference about some population characteristic, such as the population mean,
$\mu$, using a sample statistic, such as the sample mean, $\bar{x}$. The
inference is based on the realization that every random sample has a different
number for $\bar{x}$, and, thus, $\bar{x}$ is a random variable. The sampling
distribution of this statistic is the probability distribution of the sample
means obtained from all possible samples of the same number of observations
drawn from the population.
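Consider a population of six employees whose years of experience are
2 4 6 6 7 8
Two of these employees are to be chosen randomly for a particular work group.
The mean of the years of experience for this population of six employees is
$$\mu = \frac{2 + 4 + 6 + 6 + 7 + 8}{6} = 5.5$$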
Now, let us consider the mean number of years of experience of the two
employees chosen randomly from the population of six. Fifteen possible
different random samples could be selected:
$${}_{6}C_{2} = \frac{6!}{2!\,4!} = \frac{6 \cdot 5 \cdot 4!}{2!\,4!} = 15$$
Table 1 shows all of the possible samples and associated sample means.
Table 1: Samples and sample means from the worker population, sample size
n = 2.
Sample    Sample mean    Sample    Sample mean
2, 4      3.0            4, 8      6.0
2, 6      4.0            6, 6      6.0
2, 6      4.0            6, 7      6.5
2, 7      4.5            6, 8      7.0
2, 8      5.0            6, 7      6.5
4, 6      5.0            6, 8      7.0
4, 6      5.0            7, 8      7.5
4, 7      5.5
Each of the 15 samples in Table 1 has the same probability, 1/15, of being
selected. Note that there are several occurrences of the same sample mean. For
example, the sample mean 5.0 occurs three times, and, thus, the probability of
obtaining a sample mean of 5.0 is 3/15. Table 2 presents the sampling
distribution for the various sample means from the population, and the
probability function is
graphed in Figure 1.
Table 2: Sampling distribution of the sample means from the worker population,
sample size n = 2.
Sample mean $\bar{x}$    Probability of $\bar{x}$
3.0                      1/15
4.0                      2/15
4.5                      1/15
5.0                      3/15
5.5                      1/15
6.0                      2/15
6.5                      2/15
7.0                      2/15
7.5                      1/15
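The sampling distribution for the six-worker population can be checked by brute-force enumeration of every possible sample. This is only a sketch; the function name `sampling_distribution` is an illustrative choice. The two workers with 6 years of experience are treated as distinct people, so there are 15 samples of size 2.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

population = [2, 4, 6, 6, 7, 8]  # years of experience of the six workers

def sampling_distribution(values, n):
    """Map each possible sample mean to its exact probability."""
    samples = list(combinations(values, n))  # all equally likely samples
    counts = Counter(Fraction(sum(s), n) for s in samples)
    total = len(samples)
    return {mean: Fraction(c, total) for mean, c in sorted(counts.items())}

dist = sampling_distribution(population, 2)
for mean, prob in dist.items():
    print(float(mean), prob)  # fractions print in lowest terms, e.g. 3/15 as 1/5

# The mean of the sampling distribution equals the population mean.
expected_mean = sum(m * p for m, p in dist.items())
print(float(expected_mean))
```

Calling `sampling_distribution(population, 5)` gives the corresponding distribution for samples of five workers.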
We see that, while the number of years of experience for the six workers ranges
from 2 to 8, the possible values of the sample mean have a range from only 3.0
to 7.5. In addition, more of the values lie in the central portion of the range.
Table 3: Samples and sample means from the worker population, sample size
n = 5.
Sample       $\bar{x}$    Probability
2,4,6,6,7    5.0          1/6
2,4,6,6,8    5.2          1/6
2,4,6,7,8    5.4          1/6
2,4,6,7,8    5.4          1/6
2,6,6,7,8    5.8          1/6
4,6,6,7,8    6.2          1/6
Sample Mean
Let the random variables $X_1, X_2, \ldots, X_n$ denote a random sample from a
population. The sample mean value of these random variables is defined as
$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$$
Consider the sampling distribution of the random variable $\bar{X}$. At this
point we cannot determine the shape of the sampling distribution, but we can
determine the mean and variance of the sampling distribution. We know that the
expectation of a linear combination of random variables is the linear
combination of the expectations:
$$E(\bar{X}) = E\left[\frac{1}{n}(X_1 + X_2 + \cdots + X_n)\right] = \frac{E(X_1) + E(X_2) + \cdots + E(X_n)}{n} = \frac{\mu + \mu + \cdots + \mu}{n} = \frac{n\mu}{n} = \mu$$
Thus, the mean of the sampling distribution of the sample means is the
population mean. If samples of n random and independent observations are
repeatedly and independently drawn from a population, then as the number of
samples becomes very large, the mean of the sample means approaches the true
population mean.
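The variance of the sampling distribution of the sample mean is
$$\sigma_{\bar{X}}^2 = \frac{\sigma^2}{n}$$
and its standard deviation, called the standard error of the sample mean, is
$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$
If the population is normally distributed, the standardized sample mean
$$Z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}}$$
has a standard normal distribution.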
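The variance of the sample mean can be justified in one line. This is a sketch that assumes the observations $X_1, \ldots, X_n$ are independent with common variance $\sigma^2$, as is the case when the population is very large relative to the sample:

$$\operatorname{Var}(\bar{X}) = \operatorname{Var}\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(X_i) = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}$$

The variance of the sample mean therefore shrinks as the sample size grows, which is why larger samples give sample means concentrated more tightly around $\mu$.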
Example: Suppose that the annual percentage salary increases for the chief
executive officers of all midsize corporations are normally distributed with
mean 12.2% and standard deviation 3.6%. A random sample of nine
observations is obtained from this population and the sample mean computed.
What is the probability that the sample mean will be less than 10%?
Solution
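We know that
$$\mu = 12.2 \qquad \sigma = 3.6 \qquad n = 9$$
Let $\bar{x}$ denote the sample mean, and compute the standard error of the
sample mean
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{3.6}{\sqrt{9}} = 1.2$$
Then we compute
$$P(\bar{x} < 10) = P\left(\frac{\bar{x} - \mu}{\sigma_{\bar{x}}} < \frac{10 - 12.2}{1.2}\right) = P(Z < -1.83) = 0.0336$$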
From this analysis we conclude that the probability that the sample mean will be
less than 10% is only 0.0336.
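The same calculation can be done in software instead of with a normal table. The sketch below uses Python's standard-library `statistics.NormalDist`; the function name `prob_sample_mean_below` is an illustrative choice. Because the code does not round z to two decimals before looking up the probability, its result differs slightly from the table value 0.0336.

```python
from math import sqrt
from statistics import NormalDist

def prob_sample_mean_below(threshold, mu, sigma, n):
    """Return (z, P(sample mean < threshold)) for a normal population."""
    standard_error = sigma / sqrt(n)        # sigma / sqrt(n)
    z = (threshold - mu) / standard_error   # standardized threshold
    return z, NormalDist().cdf(z)           # standard normal CDF at z

z, p = prob_sample_mean_below(10.0, mu=12.2, sigma=3.6, n=9)
print(round(z, 2), round(p, 4))
```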
Example
A spark plug manufacturer claims that the lives of its plugs are normally
distributed with mean 36,000 miles and standard deviation 4,000 miles. A
random sample of 16 plugs had an average life of 34,500 miles. If the
manufacturer's claim is correct, what is the probability of finding a sample
mean of 34,500 or less?
Solution
To compute the probability, we first need to obtain the standard error of the
sample mean
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{4{,}000}{\sqrt{16}} = 1{,}000$$
The desired probability is
$$P(\bar{x} < 34{,}500) = P\left(\frac{\bar{x} - \mu}{\sigma_{\bar{x}}} < \frac{34{,}500 - 36{,}000}{1{,}000}\right) = P(Z < -1.50) = 0.0668$$
We find that the probability that the sample mean is less than 34,500 is
0.0668. This probability suggests that, if the manufacturer's claims
($\mu = 36{,}000$ and $\sigma = 4{,}000$) are true, then a sample mean of
34,500 or less has a small probability. As a result we are doubtful about the
manufacturer's claims.
The sample proportion is
$$\hat{P} = \frac{X}{n}$$
X is the sum of a set of n independent Bernoulli random variables, each with
probability of success P. As a result, P̂ is the mean of a set of independent
random variables. The central limit theorem can be used to argue that the
probability distribution for P̂ can be modeled as a normally distributed random
variable.
The mean and variance of the sampling distribution of the sample proportion
$\hat{P}$ can be obtained from the mean and variance of the number of
successes, X.
$$E(X) = nP \qquad \operatorname{Var}(X) = nP(1 - P)$$
And, thus,
$$E(\hat{P}) = E\left(\frac{X}{n}\right) = \frac{1}{n}E(X) = P$$
We see that the mean of the distribution of P̂ is the population proportion, P.
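The variance of $\hat{P}$ is
$$\sigma_{\hat{P}}^2 = \operatorname{Var}\left(\frac{X}{n}\right) = \frac{1}{n^2}\operatorname{Var}(X) = \frac{P(1 - P)}{n}$$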
The standard deviation of P̂ , which is the square root of the variance, is called
its standard error.
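These two results can be checked numerically from the exact binomial distribution of X. The sketch below is illustrative only: the values n = 10 and P = 0.3 and the function name `proportion_moments` are assumptions chosen for the demonstration.

```python
from math import comb

def proportion_moments(n, p):
    """Exact mean and variance of the sample proportion X/n, with X ~ Binomial(n, p)."""
    pmf = [comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)]
    mean = sum((x / n) * w for x, w in enumerate(pmf))
    var = sum((x / n - mean) ** 2 * w for x, w in enumerate(pmf))
    return mean, var

mean, var = proportion_moments(n=10, p=0.3)
print(mean, var)                  # close to P and P(1 - P)/n, up to rounding
print(0.3, 0.3 * (1 - 0.3) / 10)
```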
Example: A random sample of 250 homes was taken from a large population
of older homes to estimate the proportion of homes with unsafe wiring. If, in
fact, 30% of the homes have unsafe wiring, what is the probability that the
sample proportion will be between 25% and 35% of homes with unsafe
wiring?
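$$P = 0.30 \qquad n = 250$$
We can compute the standard deviation of the sample proportion, $\hat{P}$, as
$$\sigma_{\hat{P}} = \sqrt{\frac{P(1 - P)}{n}} = \sqrt{\frac{0.30(1 - 0.30)}{250}} = 0.029$$
The required probability is
$$P(0.25 < \hat{P} < 0.35) = P\left(\frac{0.25 - P}{\sigma_{\hat{P}}} < \frac{\hat{P} - P}{\sigma_{\hat{P}}} < \frac{0.35 - P}{\sigma_{\hat{P}}}\right)$$
$$= P\left(\frac{0.25 - 0.30}{0.029} < Z < \frac{0.35 - 0.30}{0.029}\right)$$
$$= P(-1.72 < Z < 1.72)$$
$$= F(1.72) - [1 - F(1.72)]$$
$$= 0.9573 - [1 - 0.9573]$$
$$= 0.9573 - 0.0427$$
$$= 0.9146$$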
Thus, we see that the probability that the sample proportion is within the
interval 0.25 to 0.35, given P = 0.30, is 0.9146. This interval is called a
91.46% acceptance interval.
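As a cross-check, the acceptance-interval probability can be computed without a normal table. This sketch uses the standard-library `statistics.NormalDist`; the function name `prob_proportion_between` is illustrative. It keeps full precision for the standard error and z values, so the result is close to, but not exactly, the table-based 0.9146.

```python
from math import sqrt
from statistics import NormalDist

def prob_proportion_between(low, high, p, n):
    """Normal approximation to P(low < sample proportion < high)."""
    standard_error = sqrt(p * (1 - p) / n)
    z = NormalDist()  # standard normal distribution
    return z.cdf((high - p) / standard_error) - z.cdf((low - p) / standard_error)

prob = prob_proportion_between(0.25, 0.35, p=0.30, n=250)
print(round(prob, 4))
```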
Example: It has been estimated that 43% of business graduates believe that
a course in business ethics is very important for imparting ethical values to
students. Find the probability that more than one-half of a random sample of
80 business graduates have this belief.
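Solution
We are given P = 0.43 and n = 80, so the standard error of the sample
proportion is
$$\sigma_{\hat{P}} = \sqrt{\frac{P(1 - P)}{n}} = \sqrt{\frac{0.43(1 - 0.43)}{80}} = 0.055$$
The required probability is
$$P(\hat{P} > 0.50) = P\left(Z > \frac{0.50 - 0.43}{0.055}\right)$$
$$= P(Z > 1.27)$$
$$= 1 - P(Z < 1.27)$$
$$= 1 - 0.8980$$
$$= 0.1020$$
The probability of having more than one-half of the sample believing in the
value of business ethics courses is approximately 0.1.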
Estimation
Investigating the whole population (the totality of all elements) is often not
feasible, because it may be time consuming, the cost may be very high, and a
large number of skilled persons may be needed.
In business, economics, and managerial problems we frequently deal with
population parameters such as the population mean ($\mu$) and the population
variance ($\sigma^2$). In most business and economics problems such
information is not available. In that case sampling is used to estimate these
unknown parameters from sample information, in order to make inferences and
formulate policy.
For example, we draw a random sample from a population and use the
sample mean ($\bar{x}$) to estimate the population mean $\mu$.
Parameter
Numerical measures such as the population mean ($\mu$) and the population
variance ($\sigma^2$) that describe a population are called parameters. Thus
any parameter is a function of the population observations.
For example, the average monthly salary of all the teachers of BRAC in 2016 is
a parameter.
Statistic
On the other hand, numerical measures such as the sample mean ($\bar{x}$) and
the sample variance ($s^2$) that describe a sample are called statistics. Thus
any statistic is a function of the sample observations.
For example, the average monthly salary of all the teachers of Business School
of BRAC in 2016 is a statistic.
Estimator
Any statistic, that is, any function of the sample observations, that is used
to estimate a parameter of the population from which the sample is drawn is
called an estimator of that parameter. Thus, an estimator is a random variable
because its value varies from sample to sample.
For example, the sample mean $\bar{x}$ and the sample variance $s^2$ are
estimators of the population mean $\mu$ and the population variance $\sigma^2$.
Estimate
Any specific value of an estimator computed from a particular sample is called
an estimate of the parameter.
Example:
Consider a random sample of the Statistics marks obtained by 25 students in a
final exam at BUP: 65, 45, 46, 47, 48, 65, 70, 71, 75, 45, 35, 32, 68, 65,
63, 55, 48, 49, 44, 56, 44, 53, 60, 70, and 67. Here all the students are the
population and X indicates the marks of the students; $\mu$ and $\sigma^2$ are
the mean and variance of the population and can be estimated by using the
sample mean $\bar{x}$ and the sample variance $s^2$. The sample mean and the
sample variance are the estimators. The specific values of these estimators
are called the estimates.
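The estimates for this sample can be computed directly from the 25 marks. This is a minimal sketch using Python's standard `statistics` module; `variance` here is the sample variance with divisor n - 1, which is assumed to be the definition of $s^2$ intended in these notes.

```python
from statistics import mean, variance

marks = [65, 45, 46, 47, 48, 65, 70, 71, 75, 45, 35, 32, 68,
         65, 63, 55, 48, 49, 44, 56, 44, 53, 60, 70, 67]

x_bar = mean(marks)    # point estimate of the population mean
s2 = variance(marks)   # point estimate of the population variance (divisor n - 1)

print(len(marks), x_bar, round(s2, 2))
```

The formulas (mean and sample variance) are the estimators; the numbers printed for this particular sample are the estimates.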
Point estimate
A point estimate is a single value of an estimator, computed from a
particular sample and used to estimate a parameter of the population from
which the sample is drawn at random.
Interval estimate
An interval estimate is a range of numbers, computed from a particular
sample, that has a specified probability of containing the true value of
the population parameter from which the sample is drawn. The probability is
called the confidence level.
interval for the population parameter is given by . The
confidence level is 0.95 or 95%.
Estimation methods
Several methods have been developed for constructing good estimators of
unknown population parameters. The most popular and widely used methods of
estimation are given below:
- Least squares method
- Maximum likelihood method
- Method of moments
Problem
The monthly consumption expenditure (Tk.) and the level of monthly income (Tk.) of 25
employees of a garment factory are given below:
Consumption Level of Income Consumption Level of Income
Expenditure (y) (x) Expenditure (y) (x)
7880 8750 11589 12520
8025 8900 11969 12990
8055 9000 11905 13500
8225 9100 12545 13580
8435 9275 12869 13975
8725 9550 13255 14200
9205 10200 13518 14525
9578 10515 13689 15250
9855 10825 13700 15500
10513 11025 14255 15980
10599 11450 14555 16700
10719 11650 15500 16925
10987 12100
Fit the equation between income and consumption expenditure:
$$Y = \alpha + \beta X + \varepsilon$$
using the least squares method.
Solution
The equation between monthly consumption expenditure and income of the
employees of a garment factory is given by:
$$Y = \alpha + \beta X + \varepsilon$$
Here, the variable Y indicates the monthly consumption expenditure and the
variable X indicates the income level of the employees, $\alpha$ is the
regression constant, $\beta$ is the regression coefficient, which indicates
the impact of a one-unit change in income on consumption expenditure, and
$\varepsilon$ is the random error term.
For sample observations the equation is given by
…………….. (1)
And
………….. (2)
y         x         xy            x^2
13689     15250     208757250     232562500
13700     15500     212350000     240250000
14255     15980     227794900     255360400
14555     16700     243068500     278890000
15500     16925     262337500     286455625
Total:
280150    307985    3.5960e+09    3.9570e+09
Now, ,
Putting the values in the equation we have
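The fit can be reproduced from the 25 observations in the problem table. The sketch below uses the standard closed-form least squares formulas for a straight line, slope = S_xy / S_xx and intercept = y-bar minus slope times x-bar; this may be presented differently from the equations used in the original solution, and the names `fit_line`, `intercept`, and `slope` are illustrative.

```python
# Monthly consumption expenditure (y) and income (x), in Tk., for the 25 employees.
y = [7880, 8025, 8055, 8225, 8435, 8725, 9205, 9578, 9855, 10513, 10599,
     10719, 10987, 11589, 11969, 11905, 12545, 12869, 13255, 13518, 13689,
     13700, 14255, 14555, 15500]
x = [8750, 8900, 9000, 9100, 9275, 9550, 10200, 10515, 10825, 11025, 11450,
     11650, 12100, 12520, 12990, 13500, 13580, 13975, 14200, 14525, 15250,
     15500, 15980, 16700, 16925]

def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope

intercept, slope = fit_line(x, y)
print(f"fitted line: y = {intercept:.2f} + {slope:.4f} x")
```

The slope estimates the change in monthly consumption expenditure associated with one additional taka of monthly income.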
Exercises