
Nonresponse

Sometimes, in survey sampling, individuals chosen for the sample are
unwilling or unable to participate in the survey.
Nonresponse bias is the bias that results when respondents differ in
meaningful ways from nonrespondents.
Nonresponse is often a problem with mail surveys, where the
response rate can be very low.
Nonresponse
Miss Schuster-Slatt said she thought English
husbands were lovely, and that she was preparing a
questionnaire to be circulated to the young men of
the United Kingdom, with a view to finding out their
matrimonial preferences.
“But English people won’t fill up questionnaires,”
said Harriet.
“Won’t fill up questionnaires?” cried Miss Schuster-Slatt,
taken aback.
“No,” said Harriet, “they won’t. As a nation we are
not questionnaire-conscious.”
—Dorothy Sayers, Gaudy Night
Nonresponse
The best way to deal with nonresponse is to prevent
it.
After nonresponse has occurred,
it is sometimes possible to construct models to
predict the missing data, but predicting
the missing observations is never as good as
observing them in the first place.

Nonrespondents often differ in critical ways from respondents;
if the nonresponse rate is not negligible, inference
based only upon the respondents may be seriously
flawed.
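A rough illustration of why respondent-only inference can mislead. The counts and means below are made up (not from any survey), and the function name is hypothetical. The sketch shows that the bias of the respondent mean equals the nonrespondent share of the population times the difference between the respondent and nonrespondent means.

```python
def respondent_mean_bias(n_resp, mean_resp, n_nonresp, mean_nonresp):
    """Return (population mean, bias of using only the respondent mean)."""
    n_total = n_resp + n_nonresp
    pop_mean = (n_resp * mean_resp + n_nonresp * mean_nonresp) / n_total
    bias = mean_resp - pop_mean
    return pop_mean, bias

# Hypothetical population: 700 would respond (mean 50), 300 would not (mean 30).
pop_mean, bias = respondent_mean_bias(700, 50.0, 300, 30.0)
print(pop_mean)  # 44.0
print(bias)      # 6.0, i.e. 0.3 * (50 - 30)
```

The bias vanishes only if nonresponse is negligible or the two groups have the same mean, and neither condition can be checked from the respondents alone.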
Types of Nonresponse
We discuss two types of nonresponse in this chapter:
unit nonresponse, in which
the entire observation unit is missing, and
item nonresponse, in which some
measurements are present for the observation unit
but at least one item is missing.
In a survey of persons, unit nonresponse means that
the person provides no information for the
survey;
item nonresponse means that the person does not
respond to a particular item on the questionnaire.
Example
In the Current Population Survey (CPS) and the
National Crime Victimization Survey (NCVS), unit
nonresponse can arise for a variety of reasons:
The interviewer may not be able to contact the
household; the person may be ill and
unable to respond to the survey; the person may refuse
to participate in the survey.
In these surveys, the interviewer tries to get
demographic information about the nonrespondent
such as age, sex, and race, as well as characteristics
of the dwelling unit such as urban/rural status; this
information can be used later to try to adjust for the
nonresponse.
Item nonresponse occurs largely because of
refusals: a household may
decline to give information about income, for
example.
Example
In agriculture or wildlife surveys, the term missing
data is generally used instead of nonresponse, but the
concepts and remedies are similar.
In a survey of breeding ducks, for example, some
birds will not be found by the researchers; they are,
in a sense, nonrespondents. The nest may be raided
by predators before the investigator can determine
how many eggs were laid; this is comparable to item
nonresponse.
Dealing with Nonresponse
1. Prevent it.
Design the survey so that nonresponse is low. This is
by far the best method.
2. Take a representative subsample of the
nonrespondents;
use that subsample to make inferences about the
other nonrespondents.
3. Use a model to predict values for the
nonrespondents.
Weighting class adjustment methods implicitly use a
model to adjust for unit nonresponse. Imputation
often adjusts for item nonresponse, and parametric
models may be used for either type of nonresponse.
4. Ignore the nonresponse
(not recommended, but unfortunately common in
practice).
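The weighting class adjustment named in item 3 can be sketched as follows. This is a minimal illustration, not a definitive procedure: it assumes sampled units are grouped into classes using information known for everyone (for example urban/rural status), and that respondents and nonrespondents within a class are similar. The function name, record layout, and data are hypothetical.

```python
from collections import defaultdict

def weighting_class_adjust(units):
    """units: list of dicts with keys 'cls', 'weight', 'responded'.
    Returns adjusted weights for the respondents, in input order.
    Assumes every class contains at least one respondent."""
    total = defaultdict(float)       # sum of sampling weights, all sampled units
    resp_total = defaultdict(float)  # sum of sampling weights, respondents only
    for u in units:
        total[u["cls"]] += u["weight"]
        if u["responded"]:
            resp_total[u["cls"]] += u["weight"]
    adjusted = []
    for u in units:
        if u["responded"]:
            # Respondents carry the weight of the nonrespondents in their class.
            factor = total[u["cls"]] / resp_total[u["cls"]]
            adjusted.append(u["weight"] * factor)
    return adjusted

sample = [
    {"cls": "urban", "weight": 10.0, "responded": True},
    {"cls": "urban", "weight": 10.0, "responded": False},
    {"cls": "rural", "weight": 20.0, "responded": True},
    {"cls": "rural", "weight": 20.0, "responded": True},
]
adjusted_weights = weighting_class_adjust(sample)
print(adjusted_weights)  # [20.0, 20.0, 20.0]
```

The adjusted respondent weights sum to the same total as the original sample weights, which is the point of the adjustment.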
Dealing with Nonresponse
A common feature of poor surveys is a lack of time
spent on the design and nonresponse
follow-up in the survey. Many persons new to
surveys (and some, unfortunately,
not new) simply jump in and start collecting data
without considering potential
problems in the data collection process; they mail
questionnaires to everyone in the
target population and analyze those that are
returned. It is not surprising that such
surveys have poor response rates.
Example
Some surveys reported in academic journals on
purchasing, for example, have response rates
between 10 and 15%. It is difficult to
see how anything can be concluded about the
population in such a survey.
Example
A researcher who knows the target population well
will be able to anticipate some of the reasons for
nonresponse and prevent some of it. Most
investigators, however, do not know as much about
reasons for nonresponse as they think they do. They
need to discover why the nonresponse occurs and
resolve as many of the problems as possible before
commencing the survey.
Books on quality improvement or design
of experiments such as Montgomery (2000) or
Oehlert (2000) will tell you how to collect your data.
And, of course, you can rely on previous researchers’
experiments to help you
minimize nonsampling errors. The references on
design of experiments and quality
control in Chapter 15 are a good place to start;
Hidiroglou et al. (1993) give a general
Example
The 1990 U.S. decennial census attempted to survey each of the over 100
million households in the United States. The response rate for the mail
survey was 65%; households that did not mail in the questionnaire needed
to be contacted in person, adding millions of dollars to the cost of the
census. Increasing the mail response rate for future censuses would result
in tremendous savings.
Dillman et al. (1995) report results of a factorial experiment employed in
the 1992 Census Implementation Test, designed to explore the individual
effects and interactions of three experimental factors on response rates.
The three factors were:
(1) a prenotice letter alerting the household to the impending arrival of the
census form, (2) a stamped return envelope included with the census form,
and (3) a reminder postcard sent a few days after the census form. The
results were dramatic, as shown in Figure 8.1. The experiment established
that while all three factors influenced the response rate, the letter and
postcard led to greater gains in response rate than the
stamped return envelope.
Example
Nonresponse can have many different causes; as a
result, no single method can be recommended for
every survey.
Platek (1977) classifies sources of nonresponse
as related to
(1) survey content,
(2) methods of data collection, and
(3) respondent characteristics,

and illustrated various sources using the diagram in


Groves (1989) and Dillman et al. (2009) discuss
additional sources of nonresponse. The
following are some factors that may influence
response rate and data accuracy.
Example
Missing items may occur in surveys for several
reasons: An interviewer may fail to
ask a question; a respondent may refuse to answer
the question or cannot provide
the information; a clerk entering the data may skip
the value. Sometimes, items with
responses are changed to missing when the data set
is edited or cleaned: a data editor
may not be able to resolve the discrepancy in the record of an
individual 3-year-old who voted
in the last election, and may set both values to
missing.
Example
Imputation is commonly used to assign values
to the missing items. A replacement
value, often from another person in the survey who
is similar to the item nonrespondent
on other variables, is imputed (filled in) for the
missing value. When
imputation is used, an additional variable should be
created for the data set that indicates
whether the response was measured or imputed.
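A minimal sketch of this idea, assuming a very simple hot-deck rule: the donor is the first person in the file who answered the item and matches the item nonrespondent on the chosen variables. The record layout, field names, and matching rule are hypothetical, and real surveys select donors more carefully.

```python
def hot_deck_impute(records, item, match_on):
    """Fill missing `item` values (None) from a donor with the same values
    on the `match_on` variables, and flag each record as imputed or not."""
    flag = item + "_imputed"
    result = []
    for rec in records:
        new = dict(rec)          # copy, so the input records are not modified
        new[flag] = False
        if rec[item] is None:
            for donor in records:
                same = all(donor[v] == rec[v] for v in match_on)
                if donor[item] is not None and same:
                    new[item] = donor[item]
                    new[flag] = True
                    break
        result.append(new)
    return result

people = [
    {"age_group": "30-39", "sex": "F", "income": 52000},
    {"age_group": "30-39", "sex": "F", "income": None},
    {"age_group": "60-69", "sex": "M", "income": None},
]
for row in hot_deck_impute(people, "income", ["age_group", "sex"]):
    print(row)
```

The third person has no matching donor, so the value stays missing and the flag stays False; the flag variable lets an analyst tell measured values from imputed ones.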
Example
Imputation procedures are used not only to reduce
the nonresponse bias but to
produce a “clean,” rectangular data set, one without
holes for the missing values. We
may want to look at tables for subgroups of the
population, and imputation allows us
to do that without considering the item nonresponse
separately each time we construct
a table.
What Is an Acceptable
Response Rate?
Many references give advice on cut-offs for acceptability of
response rates. Babbie, for example, says: “A review of the
published social research literature suggests that a response rate
of at least 50 percent is considered adequate for analysis and
reporting. A response of 60 percent is good; a response rate of 70
percent is very good” (2007, 262).
I believe that giving such absolute guidelines for acceptable
response rates is dangerous and has led many survey
investigators to unfounded complacency about nonresponse;
many examples exist of surveys with a 70% response rate whose
results are flawed.
The NCVS needs corrections for nonresponse bias even with a
response rate of about 95%.
Be aware that response rates can be manipulated.
What Is an Acceptable
Response Rate?
Very different results for response rate accrue depending on
which definition of response rate is used; all of the following have
been used in surveys:
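\[ \frac{\text{number of completed interviews}}{\text{number of units in sample}}, \]
\[ \frac{\text{number of completed interviews}}{\text{number of units contacted}}, \]
\[ \frac{\text{completed interviews} + \text{ineligible units}}{\text{contacted units}}, \]
\[ \frac{\text{completed interviews}}{\text{contacted units} - \text{ineligible units}}, \]
\[ \frac{\text{completed interviews}}{\text{contacted units} - \text{ineligible units} - \text{refusals}}. \]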
Note that a “response rate” calculated using the last formula will
be much higher than
one calculated using the first formula because the denominator is smaller.
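To see how far the five definitions listed above can diverge, the sketch below evaluates each of them on the same fieldwork counts. The counts and the function name are hypothetical, chosen only for illustration.

```python
def response_rates(sampled, contacted, completed, ineligible, refusals):
    """Return the five ratios, in the order they are listed, as floats."""
    return [
        completed / sampled,
        completed / contacted,
        (completed + ineligible) / contacted,
        completed / (contacted - ineligible),
        completed / (contacted - ineligible - refusals),
    ]

# Hypothetical counts for one survey.
rates = response_rates(sampled=1000, contacted=800, completed=400,
                       ineligible=50, refusals=150)
for r in rates:
    print(f"{r:.1%}")
```

With these counts the same survey can report anything from 40% (first definition) to about 67% (last definition), which is why a reported response rate means little without the definition behind it.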
What Is an Acceptable
Response Rate?
The American Association for Public Opinion Research (2008b)
gives guidelines for classifying units in the sample as eligible,
complete or partial interviews, refusals, or other categories,
and gives six definitions for different response rates.
They recommend that the quantities used in calculating response
rate should be defined for every survey.
The AAPOR guidelines are available online at www.aapor.org; these are
widely accepted as the standards for reporting response rates,
and using them allows
response rates reported by different surveys to be compared.
What Is an Acceptable
Response Rate?
The U.S. Office of Management and Budget (2006) guidelines
require that a
nonresponse bias assessment be performed when the expected
unit response rate is
below 80%, or the expected item response rate is below 70%,
based on the definitions
given in the document for calculating response rate. The following
recommendations
from the U.S. Office of Management and Budget’s Federal
Committee on Statistical
Methodology, reported in González (1994), are helpful:
What Is an Acceptable
Response Rate?
Recommendation 1. Survey staffs should compute response rates
in a uniform fashion over time and document response rate
components on each edition of a survey.
Recommendation 2. Survey staffs for repeated surveys should
monitor response rate components (such as refusals, not-at-
homes, out-of-scopes, address not locatable, postmaster
returns, etc.) over time, in conjunction with routine
documentation of cost and design changes.
Recommendation 3. Response rate components should be
published in survey reports; readers should be given definitions of
response rates used, including actual counts, and
commentary on the relevance of response rates to the quality of
the survey data.
Recommendation 4. Some research on nonresponse can have real
payoffs. It should be encouraged by survey administrators as a
way to improve the effectiveness of data
collection operations.
PROBABILITY SAMPLES

• each unit in the population has a known, nonzero probability of
selection
• a random number table or other
randomization mechanism is used to
choose the specific units to be
included in the sample.
MAJOR DESIGN COMPONENTS

• simple random sampling,
• stratified sampling, and
• cluster sampling.
• (systematic sampling)
1.
SIMPLE RANDOM SAMPLE (SRS)

• the simplest form of probability sample,
• an SRS of size n is taken when every
possible subset of n units in the
population has the same chance of being
the sample,
• the investigator is in effect mixing up
the population before grabbing n units,
• each individual is chosen entirely by
chance and each member of the
population has an equal chance of being
included in the sample.
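A minimal sketch of drawing an SRS in code, assuming the sampling frame is simply the labels 1 to 100. Python's `random.sample` selects without replacement; the frame, the function name, and the seed are illustrative assumptions.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct units so that every subset of size n is equally likely."""
    rng = random.Random(seed)
    return rng.sample(list(population), n)

# Hypothetical frame: units labelled 1..100; the seed only makes the run repeatable.
srs = simple_random_sample(range(1, 101), 20, seed=2024)
print(sorted(srs))
```

The pseudorandom generator plays the role of the hat or the table of random numbers.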
1.
SIMPLE RANDOM SAMPLE (SRS)

Examples:
• Drawing names from a hat
• Random Numbers
TABLE OF RANDOM NUMBERS
        1     2     3     4     5     6     7     8     9    10
 1  49486 93775 88744 80091 92732 38532 41506 54131 44804 43637
 2  94860 36746 04571 13150 65383 44616 97170 25057 02212 41930
 3  10169 95685 47585 53247 60900 20097 97962 04267 29283 07550
 4  12018 45351 15671 23026 55344 54654 73717 97666 00730 89083
 5  45611 71585 61487 87434 07498 60596 36255 82880 84381 30433
 6  89137 30984 18842 69619 53872 95200 76474 67528 14870 59628
 7  94541 12057 30771 19598 96069 10399 50649 41909 09994 75322
 8  89920 28843 87599 30181 26839 02162 56676 39342 95045 60146
 9  32472 32796 15255 39636 90819 54150 24064 50514 15194 41450
10  63958 47944 82888 66709 66525 67616 75709 56879 29649 07325
SIMPLE RANDOM SAMPLE (SRS)
(EXAMPLE)

• The investigator does not need to
examine every member of the
population.
• A medical technician does not need
to drain you of blood to measure your
red blood cell count.
• Your blood is sufficiently well mixed
that any sample should be
representative.
Advantages:
• Minimal knowledge of population needed
• External validity high; internal validity high;
statistical estimation of error
• Easy to analyze data
Disadvantages:
• High cost; low frequency of use
• Requires sampling frame
• Does not use researchers’ expertise
• Larger risk of random error than stratified sampling
classes.uleth.ca/.../Sampling%20and%20Sample%20Size%20Determinati...
2.
STRATIFIED RANDOM SAMPLE

• the population is divided into
subgroups called strata,
• then an SRS is selected from each
stratum, and
• the SRSs in the strata are selected
independently,
• elements in the same stratum often
tend to be more similar than
randomly selected elements from the
whole population, so stratification
often increases precision.
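The steps above can be sketched as follows, assuming the frame is already split into strata and the same number of units is wanted from each. The function name, the firm labels, and the equal allocation are hypothetical choices made for illustration.

```python
import random

def stratified_sample(strata, n_per_stratum, seed=None):
    """strata: dict mapping stratum name -> list of units.
    Draws a separate SRS of n_per_stratum units from every stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(units, n_per_stratum)
            for name, units in strata.items()}

# Hypothetical frame: firms grouped by size class.
firms = {
    "small": [f"S{i}" for i in range(50)],
    "medium": [f"M{i}" for i in range(20)],
    "large": [f"L{i}" for i in range(5)],
}
picked = stratified_sample(firms, 2, seed=7)
print(picked)
```

Every stratum is guaranteed to appear in the sample, which an SRS of six firms from the pooled list would not guarantee.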
STRATIFIED RANDOM SAMPLE
(EXAMPLES)

• different regions of the country in a
survey of people,
• different types of terrain in an
ecological survey, or
• sizes of firms in a business survey.
Advantages:
• Assures representation of all groups in
the sample
• Characteristics of each stratum can be
estimated and comparisons made
• Reduces variability from systematic
Disadvantages:
• Requires accurate information on
proportions of each stratum
• Stratified lists costly to prepare
3.
CLUSTER SAMPLE

• observation units in the population
are aggregated into larger sampling
units, called clusters,
CLUSTER SAMPLE
(EXAMPLE)

• you want to survey Lutheran church
members in Minneapolis but do not
have a list of all church members in
the city,
• you do have a list of all the Lutheran
churches,
• take an SRS of the churches (clusters)
and then subsample all or some church
members (observation units) in the
selected churches.
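The church example can be sketched as a two-stage draw. The church and member labels, the function name, and the sample sizes below are hypothetical; the sketch assumes an SRS at each stage.

```python
import random

def two_stage_cluster_sample(clusters, n_clusters, n_within=None, seed=None):
    """clusters: dict mapping cluster name -> list of observation units.
    Stage 1: SRS of n_clusters clusters.
    Stage 2: all units in each chosen cluster, or an SRS of n_within of them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = {}
    for name in chosen:
        members = clusters[name]
        if n_within is None or n_within >= len(members):
            sample[name] = list(members)
        else:
            sample[name] = rng.sample(members, n_within)
    return sample

# Hypothetical frame: only the list of churches is needed up front;
# member lists are compiled just for the churches that are selected.
churches = {f"church_{c}": [f"member_{c}_{m}" for m in range(30)]
            for c in range(12)}
picked = two_stage_cluster_sample(churches, n_clusters=3, n_within=10, seed=1)
print({name: len(members) for name, members in picked.items()})
```

Leaving `n_within` unset takes every member of each selected church, which corresponds to the "all" case in the last bullet.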
CLUSTER SAMPLE
(EXAMPLE)
Advantages:
• Low cost/high frequency of use
• Requires list of all clusters, but only of
individuals within chosen clusters
• Can estimate characteristics of both cluster
and population
• For multistage, has strengths of used methods
Disadvantages:
• Larger error for comparable size than other
probability methods
• Multistage very expensive and validity
depends on other methods used
4.
SYSTEMATIC SAMPLE

• a starting point is chosen from a list
of population members using a
random number,
• that unit, and every kth unit
thereafter, is chosen to be in the
sample,
• the sample consists of units that are equally
spaced in the list.
4.
SYSTEMATIC SAMPLE
(EXAMPLE)

• a type of probability sampling in
which every kth member of the
population is selected
• k = N/n, where
N = size of the population
n = sample size
4.
SYSTEMATIC SAMPLE
(EXAMPLE)

You want to obtain a sample of 100
from a population of 1,000. You would
select every 10th (or kth) person from
the list.
k = N/n
= 1000 / 100
= 10
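The same calculation as a short sketch, assuming N is an exact multiple of n and that the random start is drawn from the first k positions of the list. The list of people and the function name are hypothetical.

```python
import random

def systematic_sample(units, n, seed=None):
    """Take every kth unit after a random start, with k = N // n.
    Assumes N is a multiple of n so exactly n units are returned."""
    N = len(units)
    k = N // n
    rng = random.Random(seed)
    start = rng.randrange(k)          # random start among the first k positions
    return k, [units[i] for i in range(start, N, k)]

people = list(range(1, 1001))         # hypothetical list of 1,000 people
k, chosen = systematic_sample(people, 100, seed=5)
print(k)            # 10
print(chosen[:4])
```

Only the starting point is random; once it is fixed, the rest of the sample is determined by the list order.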
4.
SYSTEMATIC SAMPLE
(EXAMPLE)
Advantages:
• Moderate cost; moderate usage
• External validity high; internal validity
high; statistical estimation of error
• Simple to draw sample; easy to verify
Disadvantages:
• Periodic ordering
• Requires sampling frame
20 INTEGERS
FROM THE POPULATION {1, 2, . . . , 100}.
(EXAMPLE)
• For the stratified sample, the population
was divided into the 10 strata
{1, 2, . . . , 10}, {11, 12, . . . , 20}, . . . , {91, 92, . . . , 100},
and an SRS of 2 numbers was drawn
from each of the 10 strata. This ensures that
each stratum is represented in the sample.
• For the cluster sample, the population was
divided into 20 clusters {1, 2, 3, 4, 5},
{6, 7, 8, 9, 10}, . . . , {96, 97, 98, 99, 100}; an SRS
of 4 of these clusters was selected.
• For the systematic sample, the random
starting point was 3, so the sample contains
units 3, 8, 13, 18, and so on.
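The three designs described above can be reproduced in a few lines. This is a sketch: the random draws will not match the slide's figure, and the seed is an arbitrary choice made only for repeatability. The systematic sample is fully determined by the stated starting point of 3.

```python
import random

rng = random.Random(0)               # seed chosen only for repeatability
population = list(range(1, 101))

# Stratified: 10 strata {1..10}, ..., {91..100}; SRS of 2 from each stratum.
strata = [population[i:i + 10] for i in range(0, 100, 10)]
stratified = [u for s in strata for u in rng.sample(s, 2)]

# Cluster: 20 clusters {1..5}, ..., {96..100}; SRS of 4 whole clusters.
clusters = [population[i:i + 5] for i in range(0, 100, 5)]
cluster = [u for c in rng.sample(clusters, 4) for u in c]

# Systematic: start at 3, then every 5th unit (k = 100 / 20 = 5).
systematic = list(range(3, 101, 5))

print(sorted(stratified))
print(sorted(cluster))
print(systematic)
```

All three samples contain 20 integers, but the stratified sample is spread across every block of ten, the cluster sample is concentrated in four blocks of five, and the systematic sample is evenly spaced.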
Classification of Sampling Methods
Sampling Methods
• Probability: Simple Random, Systematic, Cluster, Stratified
• Non-probability: Convenience, Snowball, Judgment, Quota

NON-PROBABILITY SAMPLE
• units of the sample are chosen on the basis of
personal judgment or convenience,
• samples are selected based on the subjective
judgement of the researcher, rather than
random selection (i.e., probabilistic methods),
• there are NO statistical techniques for
measuring random sampling error in a
non-probability sample. Therefore, generalizability
is never statistically appropriate.
1.
CONVENIENCE SAMPLE
• Use subjects that are easily accessible
Examples:
• Using family members or students in a
classroom
• Mall shoppers
1.
CONVENIENCE SAMPLE
(EXAMPLE)
Advantages:
• Very low cost
• Extensively used/understood
• No need for list of population elements
Disadvantages:
• Variability and bias cannot be measured or
controlled
• Projecting data beyond sample not justified.
2.
JUDGEMENT OR PURPOSIVE
SAMPLE
• an experienced researcher selects
the sample based on some
appropriate characteristic of
sample members… to serve a
purpose
• is a type of
nonrandom sample that is
selected based on the opinion of
an expert
3.
SNOWBALL SAMPLE
• the initial respondents are chosen by
probability or non-probability
methods, and then additional
respondents are obtained by
information provided by the initial
respondents
• is a non-probability sampling technique where
existing study subjects recruit future
subjects from among their
acquaintances.
3.
SNOWBALL SAMPLE
(EXAMPLE)
Advantages:
• low cost
• useful in specific circumstances
• useful for locating rare populations
Disadvantages:
• bias because sampling units not independent
• projecting data beyond sample not justified.


4.
QUOTA SAMPLE
• ensures that a certain characteristic of
a population sample will be
represented to the exact extent that
the investigator desires
• is a non-probability sampling technique
wherein the assembled sample has
the same proportions of individuals
as the entire population with respect
to known characteristics, traits or
focused phenomenon
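A minimal sketch of filling quotas, assuming people are accepted in the order they happen to arrive until each group's quota is met. The stream of passers-by, the 60/40 population split, and the function name are hypothetical.

```python
def quota_sample(arrivals, key, quotas):
    """Accept people in arrival order until each group's quota is full.
    arrivals: iterable of dicts; key: field defining the group;
    quotas: dict mapping group -> number wanted."""
    remaining = dict(quotas)
    chosen = []
    for person in arrivals:
        group = person[key]
        if remaining.get(group, 0) > 0:
            chosen.append(person)
            remaining[group] -= 1
        if not any(remaining.values()):
            break                      # every quota is filled
    return chosen

# Hypothetical stream of passers-by; the population is assumed to be 60% F, 40% M,
# so a sample of 5 gets quotas of 3 and 2.
passers_by = [{"id": i, "sex": s} for i, s in enumerate("MMMFMFFFMF")]
sample = quota_sample(passers_by, "sex", {"F": 3, "M": 2})
print([p["id"] for p in sample])  # [0, 1, 3, 5, 6]
```

The proportions match the population, but who gets in within each group depends on arrival order rather than on a random mechanism, which is why this remains a non-probability design.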
4.
QUOTA SAMPLE
(EXAMPLE)
Advantages:
• moderate cost
• very extensively used/understood
• no need for list of population elements
• introduces some elements of stratification
Disadvantages:
• variability and bias cannot be measured or
controlled (classification of subjects)
• projecting data beyond sample not justified.
