

7: Sampling and Inferential Statistics

Presented by
Hussein Walid Hussein Alkhawaja

Supervised by
Vahid Nimehchisalem, PhD
Important statistical terms
Population: a set which includes all measurements of interest to the researcher
(the collection of all responses, measurements, or counts that are of interest)

Sample: a subset of the population
Why sampling?

Get information about large populations

• Lower cost
• Less field time
• More accuracy, i.e., a better job of data collection can be done
• Necessary when it is impossible to study the whole population
Target population:
The population to be studied, i.e., the population to which the investigator wants to generalize the results
Sampling unit:
The smallest unit from which a sample can be selected
Sampling frame:
List of all the sampling units from which the sample is drawn
Sampling scheme:
Method of selecting sampling units from the sampling frame
Steps in sampling?
1) Identification of the target population
- the large group to which the researcher wishes to generalize the
results of the study
- All students in the Faculty of Modern Languages and
- all Malaysian boys and girls in the age range of 12 to 21 years

Note: the accessible population is the population of subjects accessible to the researcher for drawing a sample, e.g., a sample of adolescents from one state.
- Drawing from the accessible population saves time and money,
- but we could then generalize results only to adolescents in the chosen state, not to all Malaysian adolescents.
Steps in sampling?
2) Sample selection

• Probability sampling
- Subjects/elements are included by chance
- Random selection
- Supports the use of inferential statistics to generalize findings
Criterion: every member or element of the population has a known probability of being chosen in the sample.

• Nonprobability sampling
- The elements are not chosen by chance procedures.
- Its success depends on the knowledge, expertise, and judgment of the researcher.
- Used when the application of probability sampling is not feasible.
- Its advantages are convenience and economy.
Probability samples

• Random sampling
• Each subject has a known probability of being selected
• Allows application of statistical sampling theory to results in order to:
  - Generalise
  - Test hypotheses
Methods used in probability sampling

• Simple random sampling
• Systematic sampling
• Stratified sampling
• Multi-stage sampling
• Cluster sampling
Simple random sampling
• Random means “without purpose or by accident.”
• Chance alone determines which elements in the population will be in the sample.

• When random sampling is used, the researcher can employ inferential statistics to estimate how much the population is likely to differ from the sample.

1. Define the population.

2. List all members of the population.

3. Select the sample by employing a procedure where sheer chance determines which
members on the list are drawn for the sample.


Chance procedures for the draw include:
- drawing names from a container
- a table of random numbers (can be absolutely without bias)
- Research Randomizer (
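
The three steps above can be sketched in Python. This is a minimal illustration, not a prescribed tool: the population of 500 ID numbers, the sample size of 50, and the function name `simple_random_sample` are all hypothetical, and the standard-library pseudo-random generator stands in for a table of random numbers.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n distinct members from a listed population by chance alone."""
    rng = random.Random(seed)  # the seed only makes the example repeatable
    return rng.sample(population, n)


# Steps 1 and 2: define and list the population (hypothetical ID numbers).
population = list(range(1, 501))

# Step 3: let chance decide which members enter the sample.
sample = simple_random_sample(population, 50, seed=1)
print(sorted(sample)[:10])
```

Every member of the list has the same chance of being drawn, and no member can be drawn twice.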
Table of random numbers
Stratified Sampling
• The population consists of a number of subgroups, or strata, that may differ in the characteristics being studied, e.g., age, neighborhood, and occupation.
• As with simple random sampling, inferential statistics can be employed to estimate how much the population is likely to differ from the sample.
• When the population to be sampled is not homogeneous but consists of several subgroups, stratified sampling may give a more representative sample than simple random sampling.

1. identify the strata of interest
2. randomly draw a specified number of subjects from each stratum with exact ratio of

Possible bias
- Geographic
- characteristics of the population (income, occupation, gender, age, or year in college)

• Proportional stratified sampling

Take equal numbers from each stratum, or select in proportion to the size of the stratum in the population.
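
A minimal sketch of proportional stratified sampling, assuming each stratum is available as a separate list. The strata names, their sizes, and the function name `proportional_stratified_sample` are hypothetical.

```python
import random


def proportional_stratified_sample(strata, n, seed=None):
    """Randomly draw from each stratum in proportion to its size.

    strata maps a stratum name to the list of its members.
    """
    rng = random.Random(seed)
    total = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # Rounding can make the stratum sizes sum to slightly more or less than n.
        k = round(n * len(members) / total)
        sample[name] = rng.sample(members, k)
    return sample


# Hypothetical population of 500 students split by year in college.
strata = {
    "first_year": [f"F{i}" for i in range(300)],
    "second_year": [f"S{i}" for i in range(150)],
    "third_year": [f"T{i}" for i in range(50)],
}
sample = proportional_stratified_sample(strata, 50, seed=1)
print({name: len(members) for name, members in sample.items()})
# {'first_year': 30, 'second_year': 15, 'third_year': 5}
```

Taking equal numbers from each stratum instead would only require replacing `k` with a fixed value.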
Cluster Sampling
• Cluster: a naturally occurring group of sampling units close to each other, i.e., crowding together in the same area or neighborhood
e.g.,
- choosing a number of schools randomly from a list of schools and then including all the students in those schools in the sample
- choosing 50 blocks from a city map and then polling all the adults living on those blocks
- using intact classrooms as clusters
• Used when it is very difficult, if not impossible, to list all the members of a target population and select the sample from among them
• Used when it would be very expensive to study a sample that is widely scattered

1. Randomly select the clusters.
2. Include all the members of each selected cluster in the sample.
Possible problems
- sampling error, especially when the number of clusters is small
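
A minimal sketch of the two steps, assuming the clusters are schools and every student in a chosen school enters the sample. The 20 schools of 30 students each and the function name `cluster_sample` are hypothetical.

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose whole clusters, then keep every member of each one."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # step 1: random clusters
    members = [m for name in chosen for m in clusters[name]]  # step 2: all members
    return chosen, members


# Hypothetical sampling frame: a list of schools, not a list of students.
schools = {f"school_{i}": [f"s{i}_{j}" for j in range(30)] for i in range(20)}
chosen, members = cluster_sample(schools, 4, seed=1)
print(chosen, len(members))
```

Only the list of schools is needed up front, which is why the method helps when listing every individual is impractical.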
Cluster sampling (figure: an area divided into Sections 1 to 5)
Systematic Sampling
Drawing a sample by taking every Kth case from a list of the population.
Sampling fraction: ratio between sample size and population size

• Randomize the sample
• Decide how many subjects you want in the sample (n)
• Divide N (the total number of members in the population) by n to determine the sampling interval (K)
• Select the first member randomly from the first K members of the list, and then select every Kth member of the population for the sample by adding the K value each time
e.g., N=500 subjects and a desired sample size n is 50:
K = N/n = 500/50 = 10.
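
The worked example can be sketched as follows. The list of 500 ID numbers and the function name `systematic_sample` are hypothetical, and integer division is an assumption for the case where N is not an exact multiple of n.

```python
import random


def systematic_sample(population, n, seed=None):
    """Take every Kth member of the list after a random start."""
    rng = random.Random(seed)
    k = len(population) // n   # sampling interval K = N / n
    start = rng.randrange(k)   # random position within the first K members
    return population[start::k][:n]


population = list(range(1, 501))  # N = 500
sample = systematic_sample(population, 50, seed=1)  # K = 500 / 50 = 10
print(sample[:5])
```

Only the starting point is left to chance; every later member is fixed by adding K.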

Note: you could use cluster sampling if you were studying a very large and widely
dispersed population. At the same time, you might be interested in stratifying the
sample to answer questions regarding its different strata.
Systematic sampling
Nonprobability samples
• Probability of being chosen is unknown
• Cheaper, but findings cannot be generalised
• Potential for bias

• Convenience sampling (ease of access)
• Snowball sampling (friend of a friend, etc.)
• Purposive sampling (judgemental)
• Quota sampling
Convenience sampling
• The sample is selected from elements of a population that are easily accessible
• The weakest of all sampling procedures
• E.g.,
- interviewing the first individuals you encounter on campus
- using a large undergraduate class
- using the students in your own classroom as a sample
- taking volunteers to be interviewed in survey research
• A convenience sample is perhaps better than nothing at all
• Be extremely cautious in interpreting the findings, and know that you cannot generalize them
Purposive/judgment sampling

• Sample elements judged to be typical, or representative, are chosen from the population
• E.g., in forecasting national elections:

1. In each state, choose a number of small districts whose returns in previous elections have been typical of the entire state.
2. Interview all the eligible voters in these districts.
3. Use the results to predict the voting patterns of the state.
4. Using similar procedures in all states, the pollsters forecast the national results.
• There is no reason to assume that the units judged to be typical of the population will continue to be typical over a period of time.
• The results of a study using purposive sampling may be misleading.

Note: useful in attitude and opinion surveys because it is cheap

Quota Sampling
• Involves selecting typical cases from diverse strata of a population, based on known characteristics of the population to which you wish to generalize.
• For example, if census results show that 25 percent of the population of an urban area lives in the suburbs, then 25 percent of the sample should come from the suburbs.
• Determine a number of variables related to the question to be used as bases for stratification (gender, age, education, and social class).
• Using census or other available data, determine the size of each segment of the population.
• Compute quotas for each segment of the population that are proportional to the size of each segment.
• Select typical cases from each segment, or stratum, of the population to fill the quotas (problematic).
Why? The selection of elements is likely to be based on accessibility and convenience.
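
The quota computation itself is simple arithmetic. In this small sketch, the 25 percent share for the suburbs comes from the example above, while the second segment, the sample size of 200, and the function name `compute_quotas` are hypothetical.

```python
def compute_quotas(segment_shares, n):
    """Return how many cases to recruit from each segment of the population."""
    return {segment: round(share * n) for segment, share in segment_shares.items()}


# Shares would normally come from census or other available data.
segment_shares = {"suburbs": 0.25, "rest_of_urban_area": 0.75}
quotas = compute_quotas(segment_shares, 200)
print(quotas)  # {'suburbs': 50, 'rest_of_urban_area': 150}
```

The code fixes only how many cases each segment needs. Who fills each quota is still left to the researcher, which is where accessibility and convenience enter.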
Random Assignment
• Definition:
A procedure used after we have a sample of participants and before we expose them to a treatment.
• E.g., to compare the effects of two treatments on the same dependent variable, we use random assignment to put our available participants into groups. A chance procedure (such as a table of random numbers or tossing a coin) is used to decide which group gets which treatment.
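
A minimal sketch of random assignment, assuming 20 available participants and two treatments. The participant labels, group names, and the function name `randomly_assign` are hypothetical, and shuffling stands in for a table of random numbers or a coin toss.

```python
import random


def randomly_assign(participants, groups=("treatment_A", "treatment_B"), seed=None):
    """Shuffle the available participants, then deal them into groups in turn."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)  # chance alone decides the order
    assignment = {group: [] for group in groups}
    for i, person in enumerate(shuffled):
        assignment[groups[i % len(groups)]].append(person)
    return assignment


participants = [f"P{i:02d}" for i in range(1, 21)]
assignment = randomly_assign(participants, seed=1)
print({group: len(people) for group, people in assignment.items()})
```

Random assignment happens after the sample exists: it decides who gets which treatment, not who is in the study.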
Sample Size
• How large should a sample be?
• Answer:
The most important characteristic of a sample is its representativeness, not its size.
A random sample of 200 is better than a random sample of 100, but a random sample of 100 is better than a biased sample of 2.5 million.

Power Calculations

Sample size

Quantitative:
$$n = \frac{Z^2 \sigma^2}{D^2}, \qquad n = \frac{(\sigma_1^2 + \sigma_2^2) \times F}{D^2}$$

Qualitative:
$$n = \frac{Z^2 \pi (1 - \pi)}{D^2}, \qquad n = \frac{2 P (1 - P) F}{D^2}$$
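
As a hedged illustration, the first row of formulas on this slide can be read as n = Z²σ²/D² (quantitative) and n = Z²π(1 − π)/D² (qualitative). The slide does not define its symbols, so the sketch below assumes that Z is the standard normal critical value for the chosen confidence level, σ the standard deviation, π the expected proportion, and D the acceptable margin of error. The function names and input values are hypothetical, and the second row (with F) is not covered.

```python
import math
from statistics import NormalDist


def n_for_mean(sigma, d, confidence=0.95):
    """n = Z^2 * sigma^2 / D^2, rounded up to a whole participant."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    return math.ceil(z**2 * sigma**2 / d**2)


def n_for_proportion(pi, d, confidence=0.95):
    """n = Z^2 * pi * (1 - pi) / D^2, rounded up to a whole participant."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z**2 * pi * (1 - pi) / d**2)


print(n_for_mean(sigma=15, d=2))          # 217
print(n_for_proportion(pi=0.5, d=0.05))   # 385
```

Halving D roughly quadruples n, because D enters the formulas squared.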

• Probability samples are the best

• They ensure
  - Representativeness
  - Precision
Errors in sample

Systematic error (or bias)

Inaccurate response (information bias)
Selection bias

Sampling error (random error)

“the difference between a population parameter and a sample statistic.” For example, if you know the mean of the entire population (symbolized μ) and also the mean of a random sample (symbolized X̄) from that population, the difference between these two (X̄ − μ) represents sampling error (symbolized e).

Thus, e = X̄ − μ. For example, if you know that the mean intelligence score for a population of 10,000 fourth-graders is μ = 100 and a particular random sample of 200 has a mean of X̄ = 99, then the sampling error is X̄ − μ = 99 − 100 = −1.
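
The definition can be tried out on simulated data. In this sketch the population of 10,000 scores is generated artificially (normal, mean 100, standard deviation 15), so the numbers are illustrative only and are not the fourth-grader data from the example.

```python
import random
from statistics import mean

rng = random.Random(1)  # fixed seed so the example is repeatable

# Hypothetical population of 10,000 scores (simulated, not real data).
population = [rng.gauss(100, 15) for _ in range(10_000)]
mu = mean(population)  # population parameter

sample = rng.sample(population, 200)  # one random sample of 200
x_bar = mean(sample)  # sample statistic

e = x_bar - mu  # sampling error
print(f"mu = {mu:.2f}, x_bar = {x_bar:.2f}, e = {e:.2f}")
```

A different seed gives a different sample and therefore a different e, which is what makes sampling error a random error rather than a bias.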
Type 1 error
• The probability of finding a difference between our sample and the population when no such difference actually exists

• The investigator will either retain or reject the null hypothesis. Either decision may be correct or wrong. If the null hypothesis is true, the investigator is correct in retaining it and in error in rejecting it. The rejection of a true null hypothesis is labeled a Type I error.

• Known as α (or “type 1 error”)

• Usually set at 5% (or 0.05)
Type 2 error

• The probability of not finding a difference that actually exists between our sample and the population
• If the null hypothesis is false, the investigator is in error in retaining it and correct in rejecting it. The retention of a false null hypothesis is labeled a Type II error.

• Known as β (or “type 2 error”)

• Power is (1 − β) and is usually set at 80%
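
Both error rates can be illustrated with a small simulation. This is a sketch under stated assumptions, not a method from the slides: it uses a two-sided z-test with a known standard deviation, and the hypothesized mean of 100, the standard deviation of 15, the sample size of 200, the true mean of 103, and the function name `rejection_rate` are all hypothetical.

```python
import random
from statistics import NormalDist


def rejection_rate(true_mean, mu0=100, sigma=15, n=200, alpha=0.05,
                   trials=2000, seed=1):
    """Share of simulated samples in which the null hypothesis (mean = mu0) is rejected."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    standard_error = sigma / n ** 0.5
    rejections = 0
    for _ in range(trials):
        x_bar = sum(rng.gauss(true_mean, sigma) for _ in range(n)) / n
        if abs(x_bar - mu0) / standard_error > z_crit:
            rejections += 1
    return rejections / trials


# Null hypothesis true: every rejection is a Type I error, so the rate is near alpha.
type1_rate = rejection_rate(true_mean=100)

# Null hypothesis false: the rejection rate estimates power, and 1 - power estimates beta.
power = rejection_rate(true_mean=103)

print(f"Type I error rate = {type1_rate:.3f}, power = {power:.3f}, beta = {1 - power:.3f}")
```

With these assumed values the first rate should land close to 0.05 and the second close to 0.80, matching the conventional settings named on the slides.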