
Sampling Design, Sample Size, and Their Importance

Prof. Bhisma Murti, dr, MPH, MSc, PhD

Institute of Health Economic and Policy Studies (IHEPS),


Department of Public Health, Faculty of Medicine,
Universitas Sebelas Maret
Types of Population
Target population is the population about which a researcher wants to make inferences.
Source population (accessible population) is the subset of the target population that is accessible to the researcher and from which the sample is drawn.
Study sample is the group of subjects chosen from the source population for study, to represent the target population.
External population is a population larger than the target population to which the researcher may still want to generalize the results.
[Figure: external population, target population, source population, and sample, with labels for sampling, statistical inference, internal validity, and external validity.]
Internal Validity and
External Validity
Internal validity refers to the extent to which the sample estimate reflects the true value of the association/effect under study in the target population.
External validity refers to the extent to which the sample estimate is generalizable to the (larger) external population.
Internal validity is a prerequisite for external validity.
What is Sampling and Why
Sampling is the selection of a
subset of individuals from within
a population to estimate
characteristics of the whole
population, e.g.
Prevalence of tuberculosis
The relationship between smoking
and the risk of stroke
Researchers rarely study the
entire population because the
cost of a census is too high.
Properties of Good Research
Good research makes a valid, precise, and consistent estimate of the characteristics, or of the difference/association/effect of the variables under study, in the population.
The validity of a study is inversely related to the degree of systematic error.
The precision and consistency of an estimate are inversely related to the degree of random error.
Systematic Error
A systematic error, or bias, is a deviation of the observed value (in the study sample) from the true value (in the target population) that does not arise by chance.
A systematic error results from an error in the selection of the sample (selection bias), faulty measurement of variables (information bias), and/or the mixing of effects by a third variable (confounding).
Random Error
Random error occurs due to random variation in sampling and/or in the measurement of variables.
Random error is always present in a measurement. It is caused by inherently unpredictable fluctuations in measuring the variables under study.
Random errors are typically modeled as following a Gaussian ("bell-shaped") distribution. They are scattered about the true value and tend to average out to zero when a measurement is repeated several times with the same instrument.
Therefore, increasing the sample size can reduce random error.
Systematic Error
[Figure: systematic error. Per cent (y-axis) against size of induration in mm (x-axis), comparing the observed values of the characteristics in the sample with the true values of the characteristics in the target population.]
Random Error
[Figure: random error. Per cent (y-axis) against size of induration in mm (x-axis), comparing the true values of the characteristics in the target population with the observed values of the characteristics in the sample.]
Why is Sampling Design
Important?
Incorrect selection of a sample leads to a biased estimate.
Analysis of data from a sample that is biased or unrepresentative of the population will result in wrong conclusions about the characteristics of the population.
Why is Sample Size Important?
A sample size that is too small may yield neither a statistically significant conclusion nor a precise estimate of the difference/relationship/effect of the variables under study.
A sample size that is too large is wasteful and sometimes impossible to complete.
Sample Size, Systematic Error,
and Random Error
The larger the sample size, the smaller the random error.
But sample size does not affect systematic error: a larger sample does not reduce it.
Systematic error is more serious than random error, as it cannot be corrected by increasing the sample size.
[Figure: systematic error and random error (y-axis) against sample size (x-axis).]
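The contrast between the two kinds of error can be illustrated with a small simulation. This is a minimal sketch, not part of the original slides: the true mean (10), standard deviation (4), and constant measurement bias (+2) are hypothetical values, and the function name is made up for illustration.

```python
import random
import statistics

def simulate_mean_estimates(n, bias=0.0, true_mean=10.0, sd=4.0,
                            repeats=500, seed=1):
    """Draw `repeats` samples of size n from a Gaussian population,
    add a constant (systematic) bias to every measurement, and return
    (average of the sample means, standard deviation of the sample means)."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    estimates = []
    for _ in range(repeats):
        sample = [rng.gauss(true_mean, sd) + bias for _ in range(n)]
        estimates.append(statistics.fmean(sample))
    return statistics.fmean(estimates), statistics.stdev(estimates)

# Hypothetical scenario: true mean 10, every measurement reads 2 units too high.
for n in (10, 100, 1000):
    avg, spread = simulate_mean_estimates(n, bias=2.0)
    print(f"n={n:5d}  average estimate={avg:6.2f}  spread of estimates={spread:5.2f}")
```

Under these assumptions the spread of the estimates (random error) shrinks as n grows, while the average estimate stays about 2 units above the true mean (systematic error) at every sample size.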
Sample Size and Random Error
(Sampling Error, Margin of Error)

Larger sample size reduces random variation and therefore increases precision.
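As a rough illustration of how precision improves with sample size, the sketch below computes the margin of error of an estimated proportion. It assumes a simple random sample and the usual normal approximation, z * sqrt(p(1 - p) / n); this formula and the example proportion of 0.5 are assumptions added here, not taken from the slides.

```python
import math
from statistics import NormalDist

def margin_of_error(p, n, confidence=0.95):
    """Normal-approximation margin of error for a proportion p
    estimated from a simple random sample of size n."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    return z * math.sqrt(p * (1 - p) / n)

# Hypothetical proportion of 0.5 (the value that gives the widest margin).
for n in (100, 400, 1600):
    print(f"n={n:5d}  margin of error={margin_of_error(0.5, n):.3f}")
```

Because the margin shrinks with the square root of n, the sample size has to be quadrupled to halve the margin of error.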
Sampling Design
Random sampling:
Simple random sampling
Stratified random sampling
Cluster random sampling
Non-random sampling:
A. Convenience sampling
B. Purposive (judgmental) sampling:
Fixed disease sampling
Fixed exposure sampling, etc.
Types of Random Sampling
Random sampling is a sampling method in which all members of a population (universe) have a known and independent chance of being selected.
Simple random sampling is a sampling method in which all members of a population have an equal chance of being selected.
Stratified random sampling selects independent samples at random from subpopulations (groups or strata) within the population.
Cluster (random) sampling selects the sample units at random in groups (called clusters, e.g. neighborhoods).
[Figure: cluster sampling. Choose groups (clusters) at random; study all members of the groups selected.]
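The three random designs can be sketched with Python's standard `random` module. This is an illustrative sketch only: the population of 200 people, the field names, and the helper function names are hypothetical, and the cluster variant follows the one-stage approach of studying every member of each selected cluster.

```python
import random

def simple_random_sample(population, n, rng):
    """Every member has an equal chance of selection."""
    return rng.sample(population, n)

def stratified_random_sample(population, stratum_of, n_per_stratum, rng):
    """Draw an independent random sample from each stratum."""
    strata = {}
    for unit in population:
        strata.setdefault(stratum_of(unit), []).append(unit)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, min(n_per_stratum, len(members))))
    return sample

def cluster_random_sample(population, cluster_of, n_clusters, rng):
    """Choose whole clusters at random, then keep all of their members."""
    clusters = {}
    for unit in population:
        clusters.setdefault(cluster_of(unit), []).append(unit)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for c in chosen for unit in clusters[c]]

# Hypothetical population: 200 people spread over 10 neighborhoods.
rng = random.Random(42)
population = [{"id": i, "neighborhood": i % 10, "sex": "F" if i % 2 else "M"}
              for i in range(200)]

srs = simple_random_sample(population, 20, rng)
strat = stratified_random_sample(population, lambda u: u["sex"], 10, rng)
clus = cluster_random_sample(population, lambda u: u["neighborhood"], 3, rng)
print(len(srs), len(strat), len(clus))
```

In the stratified sample each sex contributes exactly 10 subjects by construction, whereas the cluster sample contains everyone from the 3 selected neighborhoods.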
Types of Non-Random Sampling
Purposive sampling uses expert
judgment to select a sample that
adequately represents the target
population on factors that might
influence the population: e.g.
socio-economic status,
intelligence, access to education,
environmental factors, etc.
Convenience sampling is a non-probability sampling technique in which subjects are selected because of their convenient accessibility and proximity to the researcher. This sampling design is poor: it is very unlikely to give a representative sample.
Fixed Exposure Sampling and
Fixed Disease Sampling
Fixed exposure sampling selects a fixed number of subjects from each exposure category (exposed and non-exposed groups). This design is primarily used in cohort studies, but can also be used in cross-sectional studies.
Fixed disease sampling selects a fixed number of subjects from each disease category (case and control groups). This design is primarily used in case-control studies, but can also be used in cross-sectional studies. When cases are rare, it is efficient to include all available cases in the study, while subjects in the control group can be selected at random from the available non-diseased population.
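Fixed disease sampling as described here (keep all available cases, draw controls at random from the non-diseased) might be sketched as follows. The roster of 1000 subjects, the `diseased` field, and the ratio of two controls per case are hypothetical choices made for illustration; the slides do not specify a control-to-case ratio.

```python
import random

def fixed_disease_sample(subjects, is_case, controls_per_case, rng):
    """Keep every available case and draw controls at random
    from the subjects without the disease."""
    cases = [s for s in subjects if is_case(s)]
    non_cases = [s for s in subjects if not is_case(s)]
    n_controls = min(len(non_cases), controls_per_case * len(cases))
    controls = rng.sample(non_cases, n_controls)
    return cases, controls

# Hypothetical roster: 1000 subjects, of whom 1 in 25 has the disease.
rng = random.Random(7)
subjects = [{"id": i, "diseased": i % 25 == 0} for i in range(1000)]

cases, controls = fixed_disease_sample(
    subjects, lambda s: s["diseased"], controls_per_case=2, rng=rng)
print(len(cases), len(controls))
```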
Minimum Sample Size Formulas
Formula for Testing/
Estimating One Population:
1. Mean
2. Proportion
3. Correlation coefficient
Formula for Testing/
Estimating Two Populations:
1. Difference in Two (or More)
Population Means
2. Difference in Two (or More)
Population Proportions
Examples of Sample Size Formula
Sample size for a study that tests proportion
difference between two (or more) populations:

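$$
n = \frac{\left[ Z_{1-\alpha/2}\sqrt{2\bar{P}(1-\bar{P})} + Z_{1-\beta}\sqrt{P_1(1-P_1) + P_2(1-P_2)} \right]^2}{(P_1 - P_2)^2}
$$
where $\bar{P} = (P_1 + P_2)/2$ and $n$ is the sample size per group.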
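A minimal sketch of the two-proportion calculation in Python follows. It implements the standard formula built from the critical values for 1 - α/2 and 1 - β, the pooled proportion (P1 + P2) / 2, and the squared difference P1 - P2 in the denominator. The function name and the example proportions (0.20 and 0.40) are hypothetical.

```python
import math
from statistics import NormalDist

def n_two_proportions(p1, p2, alpha=0.05, power=0.80):
    """Minimum sample size per group for a two-sided test of the
    difference between two population proportions."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # Z for 1 - alpha/2
    z_beta = NormalDist().inv_cdf(power)           # Z for 1 - beta
    p_bar = (p1 + p2) / 2                          # pooled proportion
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p1 - p2) ** 2)   # round up to whole subjects

# Hypothetical expected proportions taken from an earlier study.
print(n_two_proportions(0.20, 0.40))
```

The result is rounded up because a fractional subject cannot be recruited. A smaller expected difference between the proportions gives a larger required sample.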

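Sample size for a study that tests mean difference between two (or more) populations:
$$
n = \frac{2\sigma^2 \left( Z_{1-\alpha/2} + Z_{1-\beta} \right)^2}{(\mu_1 - \mu_2)^2}
$$
where $\sigma^2$ is the common population variance and $n$ is the sample size per group.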
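The mean-difference calculation can be sketched the same way. The code assumes the standard form n = 2σ²(Z for 1 - α/2 plus Z for 1 - β)² / (μ1 - μ2)² with a common standard deviation in both groups; the function name and the example values (means 120 and 125, standard deviation 10) are hypothetical.

```python
import math
from statistics import NormalDist

def n_two_means(mu1, mu2, sigma, alpha=0.05, power=0.80):
    """Minimum sample size per group for a two-sided test of the
    difference between two population means with common SD sigma."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # Z for 1 - alpha/2
    z_beta = NormalDist().inv_cdf(power)           # Z for 1 - beta
    n = 2 * sigma ** 2 * (z_alpha + z_beta) ** 2 / (mu1 - mu2) ** 2
    return math.ceil(n)                            # round up to whole subjects

# Hypothetical values: expected means 120 and 125, common SD 10.
print(n_two_means(120, 125, sigma=10))
```

Only the difference between the means matters, so swapping the two groups gives the same answer, and a larger standard deviation increases the required sample size.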
Determinants of a Sample Size
Estimation
The minimum sample size calculated by any formula is only a statistical estimate. It depends on the researcher's choice of acceptable random error and on findings from previous studies. Time, cost, and ethics should also be considered.

The researcher's choice of acceptable random error:
1. Type I error (α). Arbitrary, but conventional choice: α = 0.05
2. Type II error (β) or statistical power (1 − β). Arbitrary, but conventional choice: β = 0.20
3. Degree of precision or margin of error (e.g. +/- 5%)
Findings from previous or preliminary studies:
1. Difference in population means and their variances
2. Difference in population proportions
3. Correlation coefficient from one population
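To see how these determinants move the required sample size, the sketch below varies α, power, and the expected effect in a two-group comparison of means. It uses the standard per-group formula expressed with a standardized effect size d = (μ1 - μ2) / σ, which is an assumption added here for compactness. The function name and the scenarios are hypothetical.

```python
import math
from statistics import NormalDist

def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Per-group sample size for comparing two means, where
    effect_size = (mu1 - mu2) / sigma (standardized difference)."""
    z = NormalDist()
    z_sum = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return math.ceil(2 * z_sum ** 2 / effect_size ** 2)

# Hypothetical scenarios: (alpha, power, standardized effect size).
scenarios = [
    (0.05, 0.80, 0.50),  # conventional choices
    (0.05, 0.90, 0.50),  # higher power (smaller Type II error)
    (0.01, 0.80, 0.50),  # stricter Type I error
    (0.05, 0.80, 0.25),  # smaller expected difference
]
for alpha, power, d in scenarios:
    print(f"alpha={alpha}  power={power}  d={d}  n per group={n_per_group(d, alpha, power)}")
```

Tightening either error rate raises the required n moderately, while halving the expected difference roughly quadruples it, which is why findings from previous or preliminary studies weigh so heavily on the estimate.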
Using Statistical Program to
Calculate Minimum Sample Size

[Figure: use of OpenEpi to calculate sample size.]
Final Words: Important Reminder
The sample should be selected by a correct (unbiased) sampling design so that it accurately represents the population. An incorrect sampling design causes systematic error, which leads to an invalid estimate of the characteristics, or of the association/effect of variables, in the population.

The sample size should be large enough to achieve statistically significant results (i.e. consistency) and a precise estimate. A small sample size increases random error and is therefore likely to produce non-significant and imprecise results.
