Survey Research Methods
Editorial Board
Editor
Paul J. Lavrakas
Independent Consultant and Former Chief Research
Methodologist for The Nielsen Company
Managing Editor
Jody Smarr
The Nielsen Company
Advisory Board
Michael P. Battaglia
Abt Associates, Inc.
Daniel M. Merkle
ABC News
Trent D. Buskirk
Saint Louis University
Peter V. Miller
Northwestern University
Edith D. de Leeuw
Methodika
Linda Piekarski
Survey Sampling International
Carroll J. Glynn
Ohio State University
Elizabeth A. Stasny
Ohio State University
Allyson Holbrook
University of Illinois at Chicago
Jeffery A. Stec
CRA International
Michael W. Link
The Nielsen Company
Michael W. Traugott
University of Michigan
Encyclopedia of Survey Research Methods

EDITOR
Paul J. Lavrakas
Independent Consultant and Former Chief Research
Methodologist for The Nielsen Company
VOLUMES 1 & 2
Contents

Volume 1

List of Entries vii
Reader's Guide xix
About the Editor xxvii
Contributors xxviii
Introduction xxxv

Entries
A 1
B 47
C 73
D 177
E 217
F 259
G 297
H 309
I 321
J 403
K 407
L 413
M 441

Volume 2

List of Entries vii
Reader's Guide xix

Entries
N 495
O 547
P 563
Q 649
R 671
S 675
T 873
U 915
V 937
W 955
Z 967

Index 969
List of Entries
Abandoned Calls. See Predictive Dialing
ABC News/Washington Post Poll
Absolute Agreement. See Reliability
Absolute Frequency. See Relative Frequency
Access Lines
Accuracy. See Precision
Acquiescence Response Bias
Action Research. See Research Design
Active Listening Skills. See Refusal Avoidance
Training (RAT)
Active Screening. See Screening
Adaptive Sampling
Add-a-Digit Sampling
Adding Random Noise. See Perturbation Methods
Address-Based Sampling
Address Matching. See Matched Number
Adjustment Errors. See Total Survey Error (TSE)
Advance Contact
Advance Letter
Agenda Setting
Agree–Disagree Scale. See Likert Scale
Aided Recall
Aided Recognition
Algorithm
Alpha, Significance Level of Test
Alternative Hypothesis
American Association for Public Opinion Research
(AAPOR)
American Community Survey (ACS)
American Statistical Association Section on Survey
Research Methods (ASA-SRMS)
Analysis of Variance (ANOVA)
Analysis Unit. See Unit
Anonymity
Answering Machine Messages
Approval Ratings
Area Frame
Area Probability Sample
Bias
Bilingual Interviewing
Bipolar Scale
Biweight Midvariance. See Variance
Blocking. See Random Assignment
Blurring. See Perturbation Methods
Bogus Pipeline. See Sensitive Topics
Bogus Question
Bootstrapping
Bounded Recall. See Bounding
Bounding
Boxplot Rules. See Variance
Branching
Breakoff. See Partial Completion
Bureau of Labor Statistics (BLS)
Busies
Buying Cooperation. See Noncontingent Incentive
Calibration. See Weighting
Callbacks
Call Center. See Research Call Center
Caller ID
Call Forwarding
Calling Rules
Call-In Polls
Call Screening
Call Sheet
Capture–Recapture Sampling
Case
Case Control Form. See Control Sheet
Case-Control Study
Case Outcome Rates. See Standard Definitions
Case Number. See Case
Case Study. See Research Design
Categorical Variable. See Nominal Measure
Causal-Comparative Research. See Research Design
Cell Phone Only Household
Cell Phone Sampling
Cell Suppression
Census
Certificate of Confidentiality
Check All That Apply
Chi-Square
Choice Questionnaire. See Public Opinion
City Directory. See Reverse Directory
Clarification Probe. See Probing
Closed-Ended Question
Closed Rings. See Snowball Sampling
Closeness Property. See Raking
Clustering
Cluster Sample
Cochran, W. G.
Codebook
Codec. See Voice over Internet Protocol (VoIP) and
the Virtual Computer-Assisted Telephone Interview
(CATI) Facility
Coder Variance
Code Value Labels. See Precoded Question
Coding
Coefficient of Variation. See Sample Size
Coercion. See Voluntary Participation
Cognitive Aspects of Survey Methodology (CASM)
Cognitive Burden. See Respondent Burden
Cognitive Interviewing
Cohen's Kappa. See Test–Retest Reliability
Cohort Panel Survey. See Panel Survey
Cold Call
Cold-Deck Imputation. See Hot-Deck Imputation
Common Rule
Completed Interview
Completion Rate
Complex Sample Surveys
Composite Estimation
Comprehension
Computer-Assisted Personal Interviewing (CAPI)
Computer-Assisted Self-Interviewing (CASI)
Computer-Assisted Telephone Interviewing (CATI)
Computer Audio-Recorded Interviewing.
See Quality Control
Computerized-Response Audience Polling (CRAP)
Computerized Self-Administered Questionnaires
(CSAQ)
Concordance Correlation Coefficient. See Test–Retest Reliability
Conditional Probability. See Probability
Confidence Interval
Confidence Level
Confidentiality
Consent Form
Constant
Construct
Construct Validity
Consumer Sentiment Index
Contactability
Contact Rate
Contacts
Content Analysis
Context Effect
Contingency Question
Contingency Table
Election Polls
Elements
Eligibility
Email Survey
Encoding
EPSEM Sample
Equal Probability of Selection
Error of Nonobservation
Error of Observation. See Errors of Commission
Errors of Commission
Errors of Omission
Establishment Survey
Estimated Best Linear Unbiased Prediction
(EBLUP). See Small Area Estimation
Estimation. See Bias
Ethical Principles
Ethnographic Research. See Research Design
European Directive on Data Protection. See Privacy
Event-Based Diary. See Diary
Event History Calendar
Event Location Matching. See Reverse Record Check
Excellence in Media Coverage of Polls Award. See
National Council on Public Polls (NCPP)
Exhaustive
Exit Polls
Experimental Design
Expert Sampling. See Nonprobability Sampling
External Validity
Extreme Response Style
Face-to-Face Interviewing
Factorial Design
Factorial Survey Method (Rossi's Method)
Fallback Statements
False Negatives. See Errors of Omission
False Positives. See Errors of Commission
Falsification
Family-Wise Error Rate. See Type I Error
Fast Busy
Favorability Ratings
Fear of Isolation. See Spiral of Silence
Federal Communications Commission (FCC)
Regulations
Federal Trade Commission (FTC) Regulations
Feeling Thermometer
Fictitious Question. See Bogus Question
Field Coding
Field Director
Field Interviewer. See Interviewer
Field Period
Hansen, Morris
Hard Refusal. See Unit Nonresponse
Hausman Test. See Panel Data Analysis
Hawthorne Effect. See External Validity
Hidden Population. See Respondent-Driven
Sampling (RDS)
Historical Research. See Research Design
Hit Rate
Homophily Principle. See Respondent-Driven
Sampling (RDS)
Horse Race Journalism
Horvitz-Thompson Estimator. See Probability
Proportional to Size (PPS) Sampling
Hot-Deck Imputation
Household Panel Survey. See Panel Survey
Household Refusal
HTML Boxes
Ignorable Nonresponse
Ignorable Sampling Mechanism. See Model-Based
Estimation
Implicit Stratification. See Systematic Sampling
Imputation
Inbound Calling
Incentives
Incidental Truncation. See Self-Selected Sample
Incorrect Stratum Allocation. See
Stratified Sampling
Incumbent Rule. See Undecided Voters
Independent Variable
Index of Inconsistency. See Test–Retest Reliability
Index of Reliability. See Test–Retest Reliability
Indirect Costs. See Survey Costs
Ineligible
Inference
Inferential Population. See Population of Inference
Inferential Statistics. See Statistic
Informant
Informed Consent
Injunctive Norms. See Opinion Norms
In-Person Survey. See Field Survey
Institute for Social Research (ISR)
Institutional Review Board (IRB)
Interaction Analysis. See Behavior Coding
Interaction Effect
Interactive Voice Response (IVR)
Intercept Polls/Samples. See Mall Intercept Survey
Intercoder Reliability
Internal Consistency. See Cronbach's Alpha
Internal Validity
Log-In Polls
Longitudinal Studies
Mail Questionnaire
Mail Survey
Main Effect
Maintaining Interaction. See Refusal Avoidance
Training
Mall Intercept Survey
Manifest Variable. See Variable
Mapping. See Respondent-Related Error
Marginals
Margin of Error (MOE)
Mark-Release-Recapture Sampling. See Capture–Recapture Sampling
Masking. See Variance
Masking Effect. See Outliers
Mass Beliefs
Matched Number
Maximum Abandonment Rate. See Predictive Dialing
Maximum Required Sample Size. See
Statistical Power
Mean
Mean Imputation. See Imputation
Mean Square Error
Mean Substitution. See Missing Data
Measured Reliability. See Reliability
Measurement Error
Measure of Size (MOS). See Area Probability Sample
Median
Median Absolute Deviation (MAD). See Variance
Media Polls
M-Estimation. See Outliers
Meta-Analysis. See Research Design
Metadata
Method of Random Groups. See Variance Estimation
Methods Box
Microaggregation. See Disclosure; Perturbation
Methods
Minimal Risk
Misreporting
Missing at Random (MAR). See Missing Data
Missing by Design. See Missing Data
Missing Completely at Random (MCAR). See
Missing Data
Missing Data
Mitofsky-Waksberg Sampling
Mixed-Methods Research Design. See Research
Design
Mixed-Mode
Nondifferentiation
Nondirectional Hypothesis. See Research Hypothesis
Nondirective Probing
Nonignorable Nonresponse
Nonobservational Errors. See Total Survey Error (TSE)
Nonprobability Sampling
Nonresidential
Nonresponse
Nonresponse Bias
Nonresponse Error
Nonresponse Rates
Nonsampling Error
Nontelephone Household
Nonverbal Behavior
Nonzero Probability of Selection. See Probability
of Selection
NORC. See National Opinion Research Center
(NORC)
Normative Crystallization. See Opinion Norms
Normative Intensity. See Opinion Norms
Not Missing at Random (NMAR). See Missing Data
Null Hypothesis
Number Changed
Number of Strata. See Stratified Sampling
Number Portability
Number Verification
Nuremberg Code. See Ethical Principles
Observational Errors. See Total Survey Error (TSE)
One-and-a-Half-Barreled Question. See Double-Barreled Question
Open-Ended Coding. See Content Analysis
Open-Ended Question
Opinion Norms
Opinion Question
Opinions
Optimal Allocation
Optimum Stratum Allocation. See Stratified
Sampling
Optimum Stratum Boundaries. See Strata
Ordinal Measure
Original Sample Member. See Panel Survey
Other [Specify]. See Exhaustive.
Outbound Calling
Outcome Rates. See Response Rates
Outliers
Out of Order
Out of Sample
Overcoverage
Overreporting
Ratio Measure
Raw Data
Reactive Dependent Interviewing. See Dependent
Interviewing
Reactivity
Recall Loss. See Reference Period
Recency Effect
Recoded Variable
Recognition. See Aided Recognition
Recontact
Record Check
Reference Period
Reference Survey. See Propensity Scores
Refusal
Refusal Avoidance
Refusal Avoidance Training (RAT)
Refusal Conversion
Refusal Rate
Refusal Report Form (RRF)
Registration-Based Sampling (RBS)
Regression Analysis
Regression Estimation. See Auxiliary Variable
Regression Imputation. See Imputation
Reinterview
Relative Frequency
Reliability
Reminder Mailings. See Mail Survey
Repeated Cross-Sectional Design
Replacement
Replacement Questionnaire. See Total Design Method
Replicate. See Sample Replicates
Replicate Methods for Variance Estimation
Replication
Replication Weights. See Replicate Methods for
Variance Estimation
Reporting Unit. See Unit
Representative Sample
Research Call Center
Research Design
Research Hypothesis
Research Management
Research Question
Residence Rules
Respondent
Respondent Autonomy. See Informed Consent
Respondent Burden
Respondent Debriefing
Respondent-Driven Sampling (RDS)
Respondent Fatigue
Respondent–Interviewer Matching. See Sensitive Topics
Respondent–Interviewer Rapport
Respondent Number. See Case
Respondent Refusal
Respondent-Related Error
Respondent Rights. See Survey Ethics
Response
Response Alternatives
Response Bias
Response Error. See Misreporting
Response Latency
Response Order Effects
Response Propensity
Response Rates
Retrieval
Return Potential Model. See Opinion Norms
Reverse Directory
Reverse Directory Sampling
Reverse Record Check
ρ (Rho)
Role Playing
Rolling Averages
Roper Center for Public Opinion Research
Roper, Elmo
Rotating Groups. See Rotating Panel Design
Rotating Panel Design
Rotation Group Bias. See Panel Conditioning
Rounding Effect. See Response Bias
Round-Robin Interviews. See Role Playing
Sales Waves. See SUGing
Saliency
Salting. See Network Sampling
Sample
Sample Design
Sample Management
Sample Precinct
Sample Replicates
Sample Size
Sample Variance. See Variance
Sampling
Sampling Bias
Sampling Error
Sampling Fraction
Sampling Frame
Sampling Interval
Sampling Paradox. See Sampling
Sampling Pool
Sampling Precision. See Sampling Error
Sampling Unit. See Unit
Sampling Variance
Sampling With Replacement. See Replacement
Sampling Without Replacement
SAS
Satisficing
Screening
Seam Effect
Secondary Sampling Unit (SSU). See Segments
Secondary Telephone Numbers. See Mitofsky-Waksberg Sampling
Segments
Selectivity Bias. See Self-Selection Bias
Self-Administered Questionnaire
Self-Coding. See Coding
Self-Disqualification. See Social Isolation
Self-Reported Measure
Self-Selected Listener Opinion Poll (SLOP)
Self-Selected Sample
Self-Selection Bias
Self-Weighting Sample. See EPSEM Sample
Semantic Differential Technique
Semantic Text Grammar Coding.
See Question Wording as Discourse Indicators
Semi-Structured Interviews. See Interviewer
Sensitive Topics
Sequential Retrieval. See Event History Calendar
Sequential Sampling
Serial Position Effect. See Primacy Effect
Sheatsley, Paul
Show Card
Significance Level
Silent Probe. See Probing
Simple Random Sample
Single-Barreled Question. See Double-Barreled Question
Single-Stage Sample. See Multi-Stage Sample
Skip Interval. See Systematic Sampling
Skip Pattern. See Contingency Question
Small Area Estimation
Snowball Sampling
Social Barometer. See Opinion Norms
Social Capital
Social Desirability
Social Exchange Theory
Social Isolation
Social Well-Being. See Quality of Life Indicators
Soft Refusal. See Unit Nonresponse
Solomon Four-Group Design
Specification Errors. See Total Survey Error (TSE)
Spiral of Silence
Split-Half
Standard Definitions
Standard Error
Standard Error of the Mean
Standardized Survey Interviewing
STATA
Statistic
Statistical Disclosure Control. See Perturbation
Methods
Statistical Inference. See Inference
Statistical Package for the Social Sciences (SPSS)
Statistical Perturbation Methods. See Perturbation
Methods
Statistical Power
Statistics Canada
Step-Ladder Question
Straight-Lining. See Respondent Fatigue
Strata
Stratification. See Post-Stratification
Stratified Cluster Sampling. See Ranked-Set
Sampling (RSS)
Stratified Element Sampling. See Ranked-Set
Sampling (RSS)
Stratified Random Assignment. See Random
Assignment
Stratified Sampling
Stratum Allocation. See Stratified Sampling
Straw Polls
Stringer. See Sample Precinct
Structured Interviews. See Interviewer
Subclasses. See Population
Subgroup Analysis
Subsampling. See Perturbation Methods
Substitution. See Replacement
SUDAAN
Suffix Banks
SUGing
Summer Institute in Survey Research Techniques.
See Institute for Social Research (ISR)
Superpopulation
Supersampling. See Perturbation Methods
Supervisor
Supervisor-to-Interviewer Ratio
Suppression. See Cell Suppression
Survey
Survey Costs
Survey Ethics
Survey Methodology
Survey Packet. See Mail Survey
Survey Population. See Population; Target
Population
Survey Sponsor
Synthetic Estimate. See Small Area Estimation
Systematic Error
Systematic Sampling
Taboo Topics. See Sensitive Topics
Tailored Design Method. See Total Design Method
Tailoring
Targeting. See Tailoring
Target Population
Taylor Series Linearization
Technology-Based Training
Telemarketing
Teleological Ethics. See Ethical Principles
Telephone Computer-Assisted Self-Interviewing
(TACASI). See Interactive Voice Response (IVR)
Telephone Consumer Protection Act of 1991
Telephone Households
Telephone Interviewer. See Interviewer
Telephone Penetration
Telephone Surveys
Telescoping
Telesurveys. See Internet Surveys
Temporary Dispositions
Temporary Sample Member. See Panel Survey
Temporary Vacancy. See Residence Rules
Test–Retest Reliability
Text Fills. See Dependent Interviewing
Think-Aloud Interviews. See Cognitive Interviewing
Third-Person Effect
Threatening Question. See Sensitive Topics
Time-Based Diary. See Diary
Time Compression Theory. See Telescoping
Time-in-Panel Bias. See Panel Conditioning
Time-Space Sampling. See Rare Populations
Tolerance Interval. See Outliers
Topic Saliency
Total Design Method (TDM)
Total Survey Error (TSE)
Touchtone Data Entry
Tracking Polls
Training Packet
Trend Analysis
Trial Heat Question
Trimmed Means. See Variance
Troldahl-Carter-Bryant Respondent Selection
Method
True Value
Trust in Government
t-Test
Reader's Guide
The Reader's Guide is provided to assist readers in locating articles on related topics. It classifies articles into nine general topical categories: (1) Ethical Issues in Survey Research; (2) Measurement; (3) Nonresponse; (4) Operations; (5) Political and Election Polling; (6) Public Opinion; (7) Sampling, Coverage, and Weighting; (8) Survey Industry; and (9) Survey Statistics.
Measurement
Interviewer
Conversational Interviewing
Dependent Interviewing
Interviewer Effects
Interviewer Neutrality
Interviewer-Related Error
Interviewer Variance
Nondirective Probing
Probing
Standardized Survey Interviewing
Verbatim Responses
Mode
Mode Effects
Mode-Related Error
Questionnaire
Aided Recall
Aided Recognition
Attitude Measurement
Attitudes
Attitude Strength
Aural Communication
Balanced Question
Behavioral Question
Bipolar Scale
Bogus Question
Bounding
Branching
Check All That Apply
Closed-Ended Question
Codebook
Cognitive Interviewing
Construct
Construct Validity
Context Effect
Contingency Question
Demographic Measure
Dependent Variable
Diary
Don't Knows (DKs)
Double-Barreled Question
Double Negative
Drop-Down Menus
Event History Calendar
Exhaustive
Factorial Survey Method (Rossi's Method)
Feeling Thermometer
Forced Choice
Gestalt Psychology
Graphical Language
Guttman Scale
HTML Boxes
Item Order Randomization
Item Response Theory
Knowledge Question
Language Translations
Likert Scale
List-Experiment Technique
Mail Questionnaire
Mutually Exclusive
Open-Ended Question
Paired Comparison Technique
Precoded Question
Priming
Psychographic Measure
Questionnaire
Questionnaire Design
Questionnaire Length
Questionnaire-Related Error
Question Order Effects
Question Stem
Radio Buttons
Randomized Response
Random Order
Random Start
Ranking
Rating
Reference Period
Response Alternatives
Response Order Effects
Self-Administered Questionnaire
Self-Reported Measure
Semantic Differential Technique
Sensitive Topics
Show Card
Step-Ladder Question
True Value
Unaided Recall
Unbalanced Question
Unfolding Question
Vignette Question
Visual Communication
Respondent
Coder Variance
Coding
Content Analysis
Field Coding
Focus Group
Intercoder Reliability
Interrater Reliability
Interval Measure
Level of Measurement
Litigation Surveys
Measurement Error
Nominal Measure
Ordinal Measure
Pilot Test
Ratio Measure
Reliability
Replication
Split-Half
Nonresponse
Item-Level
Missing Data
Nonresponse
Outcome Codes and Rates
Busies
Completed Interview
Completion Rate
Contactability
Contact Rate
Contacts
Cooperation Rate
e
Fast Busy
Final Dispositions
Hang-Up During Introduction (HUDI)
Household Refusal
Ineligible
Language Barrier
Noncontact Rate
Noncontacts
Noncooperation Rate
Nonresidential
Nonresponse Rates
Number Changed
Out of Order
Out of Sample
Partial Completion
Refusal
Refusal Rate
Respondent Refusal
Response Rates
Standard Definitions
Temporary Dispositions
Unable to Participate
Unavailable Respondent
Unknown Eligibility
Unlisted Household
Unit-Level
Advance Contact
Attrition
Contingent Incentives
Controlled Access
Cooperation
Differential Attrition
Differential Nonresponse
Economic Exchange Theory
Fallback Statements
Gatekeeper
Ignorable Nonresponse
Incentives
Introduction
Leverage-Saliency Theory
Noncontingent Incentives
Nonignorable Nonresponse
Nonresponse
Nonresponse Bias
Nonresponse Error
Refusal Avoidance
Refusal Avoidance Training (RAT)
Refusal Conversion
Refusal Report Form (RRF)
Response Propensity
Saliency
Social Exchange Theory
Social Isolation
Tailoring
Total Design Method (TDM)
Unit Nonresponse
Operations
General
Advance Letter
Bilingual Interviewing
Case
Data Management
Dispositions
Field Director
Field Period
Mode of Data Collection
Multi-Level Integrated Database Approach (MIDA)
Paper-and-Pencil Interviewing (PAPI)
Paradata
Quality Control
Recontact
Reinterview
Research Management
Sample Management
Sample Replicates
Supervisor
Survey Costs
Technology-Based Training
Validation
Verification
Video Computer-Assisted Self-Interviewing
(VCASI)
In-Person Surveys
Interviewer
Interviewer Characteristics
Interviewer Debriefing
Interviewer Monitoring
Interviewer Monitoring Form (IMF)
Interviewer Productivity
Interviewer Training
Interviewing
Nonverbal Behavior
Respondent–Interviewer Rapport
Role Playing
Training Packet
Usability Testing
Telephone Surveys
Access Lines
Answering Machine Messages
Callbacks
Caller ID
Call Forwarding
Calling Rules
Call Screening
Call Sheet
Cold Call
Computer-Assisted Telephone
Interviewing (CATI)
Do-Not-Call (DNC) Registries
Federal Communications Commission (FCC)
Regulations
Federal Trade Commission (FTC) Regulations
Hit Rate
Inbound Calling
Interactive Voice Response (IVR)
Listed Number
Matched Number
Nontelephone Household
Number Portability
Number Verification
Outbound Calling
Predictive Dialing
Prefix
Privacy Manager
Research Call Center
Reverse Directory
Suffix Banks
Supervisor-to-Interviewer Ratio
Telephone Consumer Protection Act of 1991
Telephone Penetration
Telephone Surveys
Touchtone Data Entry
Unmatched Number
Unpublished Number
Videophone Interviewing
Voice over Internet Protocol (VoIP) and the Virtual
Computer-Assisted Telephone Interview (CATI)
Facility
Mail Surveys
Cover Letter
Disk by Mail
Mail Survey
Public Opinion
Agenda Setting
Consumer Sentiment Index
Issue Definition (Framing)
Knowledge Gap
Mass Beliefs
Opinion Norms
Opinion Question
Opinions
Perception Question
Political Knowledge
Public Opinion
Public Opinion Research
Quality of Life Indicators
Question Wording as Discourse Indicators
Social Capital
Spiral of Silence
Third-Person Effect
Topic Saliency
Trust in Government
Survey Industry
American Association for Public Opinion
Research (AAPOR)
Survey Statistics
Algorithm
Alpha, Significance Level of Test
Alternative Hypothesis
Analysis of Variance (ANOVA)
Attenuation
Auxiliary Variable
Balanced Repeated Replication (BRR)
Bias
Bootstrapping
Chi-Square
Composite Estimation
Confidence Interval
Confidence Level
Constant
Contingency Table
Control Group
Correlation
Covariance
Cronbach's Alpha
Cross-Sectional Data
Data Swapping
Design-Based Estimation
Design Effects (deff)
Ecological Fallacy
Effective Sample Size
Experimental Design
Factorial Design
Finite Population Correction (fpc) Factor
Frequency Distribution
F-Test
Hot-Deck Imputation
Imputation
Independent Variable
Inference
Interaction Effect
Internal Validity
Interval Estimate
Intracluster Homogeneity
Jackknife Variance Estimation
Level of Analysis
Main Effect
Marginals
Margin of Error (MOE)
Mean
Mean Square Error
Median
Metadata
Mode
Model-Based Estimation
Multiple Imputation
Noncausal Covariation
Null Hypothesis
Outliers
Panel Data Analysis
Parameter
ρ (Rho)
Sampling Bias
Sampling Error
Sampling Variance
SAS
Seam Effect
Significance Level
Contributors
Sowmya Anand
University of Illinois
René Bautista
University of Nebraska–Lincoln
Mindy Anderson-Knott
University of Nebraska–Lincoln
Patricia C. Becker
APB Associates, Inc.
H. Öztaş Ayhan
Middle East Technical University
Robert F. Belli
University of Nebraska–Lincoln
Janice Ballou
Mathematica Policy Research
Mildred A. Bennett
The Nielsen Company
Badi H. Baltagi
Syracuse University
Pazit Ben-Nun
State University of New York,
Stony Brook University
Laura Barberena
University of Texas at Austin
Sandra H. Berry
RAND
Kirsten Barrett
Mathematica Policy Research
Marcus Berzofsky
RTI International
Allen H. Barton
University of North Carolina at
Chapel Hill
Jonathan Best
Princeton Survey Research
International
Danna Basson
Mathematica Policy Research
Jelke Bethlehem
Statistics Netherlands
Michael P. Battaglia
Abt Associates, Inc.
Matthew Beverlin
University of Kansas
Joseph E. Bauer
American Cancer Society
David A. Binder
Statistics Canada
Christopher W. Bauman
Northwestern University
George F. Bishop
University of Cincinnati
Jonathan E. Brill
University of Medicine and
Dentistry of New Jersey
Sandra L. Bauman
Bauman Research
Steven Blixt
Bank of America
Trent D. Buskirk
Saint Louis University
Kathryn A. Cochran
University of Kansas
Sarah Butler
National Economic Research
Associates
Jon Cohen
The Washington Post
Mario Callegaro
Knowledge Networks
Pamela Campanelli
The Survey Coach
Michael P. Cohen
Bureau of Transportation
Statistics
Marjorie Connelly
The New York Times
David DesRoches
Mathematica Policy Research,
Inc.
Dennis Dew
Loyola University Chicago
Isaac Dialsingh
Pennsylvania State University
Lillian Diaz-Hoffmann
Westat
Patrick J. Cantwell
U.S. Census Bureau
Matthew Courser
Pacific Institute for Research
and Evaluation
Bryce J. Dietrich
University of Kansas
Xiaoxia Cao
University of Pennsylvania
Brenda G. Cox
Battelle
Wil Dijkstra
Free University, Amsterdam
Lisa Carley-Baxter
RTI International
Douglas B. Currivan
RTI International
Don A. Dillman
Washington State University
Richard T. Curtin
University of Michigan
Charles DiSogra
Knowledge Networks
Sylvia Dohrmann
Westat
Wolfgang Donsbach
Technische Universität Dresden
Robert P. Daves
Daves & Associates Research
Katherine A. Draughon
Draughon Research, LLC
Bonnie D. Davis
Public Health Institute, Survey
Research Group
Karen E. Davis
National Center for Health
Statistics
Natalie E. Dupree
National Center for Health
Statistics
Matthew DeBell
Stanford University
Jennifer Dykema
University of Wisconsin
Femke De Keulenaer
Gallup Europe
Asia A. Eaton
University of Chicago
Edith D. de Leeuw
Methodika
Murray Edelman
Rutgers University
Woody Carter
University of Chicago
Barbara L. Carvalho
Marist College
Rachel Ann Caspar
RTI International
Jamie Patrick Chandler
City University of New York
Haiying Chen
Wake Forest University
Young Ik Cho
University of Illinois at Chicago
Leah Melani Christian
Pew Research Center
James R. Chromy
RTI International
M. H. Clark
Southern Illinois University
Carbondale
Mansour Fahimi
Marketing Systems Group
Ryan Gibb
University of Kansas
Dirk Heerwegh
Katholieke Universiteit Leuven
Moshe Feder
RTI International
Sean O. Hogan
RTI International
Karl G. Feld
D3 Systems, Inc.
Jason E. Gillikin
QSEC Consulting Group, LLC
Allyson Holbrook
University of Illinois at Chicago
Howard Fienberg
CMOR
Lisa M. Gilman
University of Delaware
Gregory G. Holyk
University of Illinois at Chicago
Agnieszka Flizik
BioVid
Patrick Glaser
CMOR
Adriaan W. Hoogendoorn
Vrije Universiteit, Amsterdam
Amy Flowers
Market Decisions
Carroll J. Glynn
Ohio State University
Lew Horner
Ohio State University
E. Michael Foster
University of North Carolina at
Chapel Hill
John Goyder
University of Waterloo
Joop Hox
Utrecht University
Kelly N. Foster
University of Georgia
Paul Freedman
University of Virginia
Marek Fuchs
University of Kassel
Siegfried Gabler
Universität Mannheim
Matthias Ganninger
Gesis-ZUMA
Ingrid Graf
University of Illinois at Chicago
Eric A. Greenleaf
New York University
Thomas M. Guterbock
University of Virginia
Erinn M. Hade
Ohio State University
Sabine Häder
Gesis-ZUMA
Michael Huge
Ohio State University
Larry Hugick
Princeton Survey Research
International
Li-Ching Hung
Mississippi State University
Ronaldo Iachan
Macro International
John Hall
Mathematica Policy Research
Susan S. Jack
National Center for Health
Statistics
Janet Harkness
University of Nebraska–Lincoln
Annette Jäckle
University of Essex
Jane F. Gentleman
National Center for Health
Statistics
Chase H. Harrison
Harvard University
Matthew Jans
University of Michigan
Amy R. Gershkoff
Princeton University
Rachel Harter
University of Chicago
Sharon E. Jarvis
University of Texas at Austin
Malay Ghosh
University of Florida
Douglas D. Heckathorn
Cornell University
Guillermina Jasso
New York University
Cecilie Gaziano
Research Solutions, Inc.
E. Deborah Jay
Field Research Corporation
Hyunshik Lee
Westat
Timothy Johnson
University of Illinois at Chicago
Gerald M. Kosicki
Ohio State University
Sunghee Lee
University of California,
Los Angeles
Phillip S. Kott
USDA/NASS
John Kovar
Statistics Canada
Sema A. Kalaian
Eastern Michigan University
Tom Krenzke
Westat
William D. Kalsbeek
University of North Carolina
Rafa M. Kasim
Kent State University
Randall Keesling
RTI International
Scott Keeter
Pew Research Center
Jenny Kelly
NORC at the University of
Chicago
Courtney Kennedy
University of Michigan
John M. Kennedy
Indiana University
Timothy Kennel
U.S. Census Bureau
Kate Kenski
University of Arizona
SunWoong Kim
Dongguk University
Irene Klugkist
Utrecht University
Thomas R. Knapp
University of Rochester and Ohio
State University
Frauke Kreuter
University of Maryland
Parvati Krishnamurty
NORC at the University of
Chicago
Karol Krotki
RTI International
Dale W. Kulp
Marketing Systems Group
Richard Kwok
RTI International
Jennie W. Lai
The Nielsen Company
Dennis Lambries
University of South Carolina
Jason C. Legg
Iowa State University
Stanley Lemeshow
Ohio State University
Gerty Lensvelt-Mulders
Universiteit Utrecht
James M. Lepkowski
University of Michigan
Tim F. Liao
University of Illinois at Urbana–Champaign
Michael W. Link
The Nielsen Company
Jani S. Little
University of Colorado
Cong Liu
Hofstra University
Kamala London
University of Toledo
Gary Langer
ABC News
Geert Loosveldt
Katholieke Universiteit Leuven
Michael D. Larsen
Iowa State University
Mary E. Losch
University of Northern Iowa
Paul J. Lavrakas
Independent Consultant and
Former Chief Research
Methodologist for The Nielsen
Company
Thomas Lumley
University of Washington
Geon Lee
University of Illinois at Chicago
Tina Mainieri
Survey Sciences Group, LLC
Lars Lyberg
Statistics Sweden
David W. Moore
University of New Hampshire
Kristen Olson
University of Michigan
Donald J. Malec
U.S. Census Bureau
Jeffrey C. Moore
U.S. Census Bureau
Diane O'Rourke
University of Illinois at
Chicago
Allan L. McCutcheon
University of Nebraska–Lincoln
Richard Morin
Pew Research Center
Daniel G. McDonald
Ohio State University
Patricia Moy
University of Washington
John P. McIver
University of Colorado
Mary H. Mulry
U.S. Census Bureau
Douglas M. McLeod
University of Wisconsin–Madison
Ralf Münnich
University of Trier
Daniel M. Merkle
ABC News
Joe Murphy
RTI International
Philip Meyer
University of North Carolina at
Chapel Hill
Gad Nathan
Hebrew University of Jerusalem
Peter V. Miller
Northwestern University
Lee M. Miringoff
Marist College
Shannon C. Nelson
University of Illinois at
Chicago
Thomas E. Nelson
Ohio State University
Larry Osborn
Abt Associates, Inc.
Ronald E. Ostman
Cornell University
Mary Outwater
University of Oklahoma
Linda Owens
University of Illinois at
Urbana-Champaign
Michael Parkin
Oberlin College
Jennifer A. Parsons
University of Illinois at
Chicago
Jeffrey M. Pearson
University of Michigan
Steven Pedlow
NORC at the University of
Chicago
William L. Nicholls
U.S. Census Bureau (Retired)
Matthew C. Nisbet
American University
Andy Peytchev
RTI International
Andrew Noymer
University of California, Irvine
Linda Piekarski
Survey Sampling International
Barbara C. O'Hare
Arbitron, Inc.
Geraldine M. Mooney
Mathematica Policy Research
Robert W. Oldendick
University of South Carolina
Kathy Pilhuj
Scarborough Research
Danna L. Moore
Washington State University
Randall Olsen
Ohio State University
Stephen R. Porter
Iowa State University
Michael Mokrzycki
Associated Press
J. Quin Monson
Brigham Young University
Jill M. Montaquila
Westat
Christopher Z. Mooney
University of Illinois at
Springfield
Frank Potter
Mathematica Policy Research
Matthias Schonlau
RAND
Tom W. Smith
NORC at the University of Chicago
Kevin B. Raines
Corona Research, Inc.
Paul Schroeder
Abt SRBI
Jolene D. Smyth
University of Nebraska–Lincoln
Susanne Rässler
Otto-Friedrich-University Bamberg
Tricia Seifert
University of Iowa
Elizabeth A. Stasny
Ohio State University
Bryce B. Reeve
National Cancer Institute
William R. Shadish
University of California at Merced
Jeffery A. Stec
CRA International
Lance J. Rips
Northwestern University
Dhavan V. Shah
University of Wisconsin–Madison
David Steel
University of Wollongong
Jacob Shamir
Hebrew University of Jerusalem
Sonya K. Sterba
University of North Carolina
at Chapel Hill
Gary M. Shapiro
Westat
Jennifer M. Rothgeb
U.S. Census Bureau
Joel K. Shapiro
Rockman et al.
Donald B. Rubin
Harvard University
Carol Sheets
Indiana University
Tamás Rudas
Eötvös Loránd University
Sarah Shelton
Saint Louis University
Pedro Saavedra
ORC Macro
Charles D. Shuttles
The Nielsen Company
Adam Safir
RTI International
Samuel Shye
Hebrew University of Jerusalem
Joseph W. Sakshaug
University of Michigan
Richard Sigman
Westat
Charles T. Salmon
Michigan State University
Trevor N. Tompson
Associated Press
Carla R. Scanlan
Independent Researcher
N. Clayton Silver
University of Nevada, Las Vegas
Jeff Toor
San Diego State University
Fritz Scheuren
NORC at the University of Chicago
Jody Smarr
The Nielsen Company
Roger Tourangeau
University of Maryland
Michael F. Schober
New School for Social Research
Michael W. Traugott
University of Michigan
Kenneth W. Steve
Abt SRBI
John Stevenson
University of Wisconsin
James W. Stoutenborough
University of Kansas
John Tarnai
Washington State University
Charles Tien
City University of New York,
Hunter College
Lois E. Timms-Ferrara
University of Connecticut
Alberto Trobia
University of Palermo
Norm Trussell
The Nielsen Company
Clyde Tucker
U.S. Bureau of Labor Statistics
Geoffrey R. Urland
Corona Research
Akhil K. Vaish
RTI International
Melissa A. Valerio
University of Michigan
Wendy Van de Kerckhove
Westat
Patrick Vargas
University of Illinois at
Urbana-Champaign
Timothy Vercellotti
Rutgers University
Herbert F. Weisberg
Ohio State University
Eric White
University of Wisconsin
Rand R. Wilcox
University of Southern
California
Rick L. Williams
RTI International
Gordon B. Willis
National Cancer Institute
Michael B. Witt
RTI International
Jonathan Wivagg
PTV DataSource
Douglas A. Wolfe
Ohio State University
Daniel B. Wright
University of Sussex
Changbao Wu
University of Waterloo
Ting Yan
NORC at the University of
Chicago
Y. Michael Yang
University of Chicago
Elaine L. Zanutto
National Analysts Worldwide
Elizabeth R. Zell
Centers for Disease Control
and Prevention
Weiyu Zhang
University of Pennsylvania
Ana Villar
University of Nebraska–Lincoln
James Wolf
Indiana University at
Indianapolis
Sonja Ziniel
University of Michigan
Shapard Wolf
Arizona State University
Mary B. Ziskin
Indiana University
Introduction
error, question-related measurement error, interviewer-related measurement error, respondent-related measurement error, and mode-related measurement error.
5. How will the data be processed, weighted, and analyzed? This concerns adjustment error and processing error.
via interviewers (such as interviewer training, interviewer monitoring, and interviewer debriefing)
Political and Election Polling. This group includes
survey methods that are specific to election-related
and other types of political polling. These entries
include measurement topics (such as approval ratings,
convention bounce, leaning voters, and probable electorate), media-related topics (such as election night
projections, horse race journalism, and precision journalism), and types of election or political surveys
(such as deliberative polls, exit polls, pre-primary
polls, and tracking polls).
Public Opinion. The entries in the public opinion
grouping focus on a wide range of theoretical matters
that affect the understanding of public opinion, with
special attention to the methodological issues that are
related to each theoretical concept. This set of entries
includes agenda setting, knowledge gap, spiral of
silence, third-person effect, and trust in government.
Sampling, Coverage, and Weighting. This group covers a large and broad set of entries, many of them interrelated, on sampling, coverage, and weighting, such as address-based sampling, cell phone sampling, coverage error, designated respondent, finite population, interpenetrated design, Neyman allocation, post-stratification, quota sampling, replacement, sample size, undercoverage, and zero-number banks.
Survey Industry. The entries in the survey industry
grouping include ones describing major survey professional organizations (such as AAPOR, CMOR, and
CASRO), major academic-based survey organizations
and government-based survey agencies (such as
NORC, ISR, Bureau of Labor Statistics, and Statistics
Canada), major figures in the history of survey
research (such as Elmo Roper, Leslie Kish, Morris
Hansen, and George Gallup), major U.S. government
surveys (such as the Behavioral Risk Factor
Surveillance System, the Current Population Survey,
and the National Health Interview Survey), and major
survey research periodicals (such as Public Opinion
Quarterly, the Journal of Official Statistics, and the
International Journal of Public Opinion Research).
Survey Statistics. The survey statistics grouping covers a diverse spectrum of statistical concepts and procedures that survey researchers use to help analyze and interpret the data their surveys yield.
Acknowledgments
I would like to begin by thanking Sage Publications
for believing that there should be an Encyclopedia of
Survey Research Methods and that I was a good
choice to serve as its editor. Here Lisa Cuevas Shaw,
acquisitions editor at Sage, played a major role. I am
indebted to Diana Axelsen, the developmental editor
at Sage with whom I worked most closely during the
final 2 years of the project, for her intelligence, guidance, encouragement, patience, and friendship. I also
thank Letty Gutierrez, reference systems manager at
Sage, for the numerous occasions that she fixed
things in the SRT that I was not able to. At the copyediting and production stages, I am especially grateful
to the conscientiousness, editing abilities, commitment,
and flexibility of Tracy Buyan (production editor),
Colleen Brennan (copy editor), and Pam Suwinsky
(copy editor). There were many others at Sage who
worked hard and intelligently to make this encyclopedia possible, but I am especially thankful to those who
created, maintained, and updated the SRT, which provided the Web-based platform that managed almost all
the invitations, submissions, reviews, and revisions.
I also am indebted to Jody Smarr, the administrative staff member at The Nielsen Company who was assigned to work with me during the last 2 years of the project, including the last 13 months after I ended my employment with the company. Ms. Smarr's intelligence, organization, reliability, and calm demeanor will always be remembered and appreciated. I also
thank Paul Donato, chief research officer at Nielsen,
for committing that the company would be supportive of the project.
A
ABC NEWS/WASHINGTON POST POLL

ABC News and The Washington Post initiated their polling partnership on February 19, 1981, announcing an 18-month agreement to jointly produce news surveys on current issues and trends. More than 25 years, 475 surveys, and 500,000 individual interviews later, the partnership has proved an enduring one. Their first shared survey (known as the ABC/Post poll to viewers of ABC News, and the Post/ABC survey to readers of the Post) focused on newly elected President Ronald Reagan's tax- and budget-cutting plans. While their work over the years has covered attitudes on a broad range of social issues, ABC and the Post have focused their joint polling primarily on politics and elections.

The two organizations consult to develop survey subjects, oversee methodology and research, and write questionnaires; each independently analyzes and reports the resulting data. Sampling, field work, and tabulation for nearly all ABC/Post polls have been managed from the start by the former Chilton Research Services, subsequently acquired by the multi-national research firm Taylor Nelson Sofres.

In addition to full-length, multi-night surveys, ABC and the Post have shared other polls designed to meet news demands, including one-night surveys (e.g., immediately after the terrorist attacks of September 11, 2001); daily pre-election tracking polls, in which the Post joined ABC as of 2000; and a weekly consumer confidence survey, in which the Post in 2005 joined an ABC effort ongoing since 1985.

The Post has been polling on its own since 1975, ABC since 1979. Their partnership was created by Dick Wald, senior vice president of ABC News, and his friend Ben Bradlee, the Post's editor. Wald pitched the idea at lunch. "Bradlee said, 'Okay. You have a deal,'" he recalled. "We just shook hands. There was no contract, no paper, no anything else."

Jeffrey Alderman was longtime director of the survey for ABC, replaced in 1998 by Gary Langer. Barry Sussman directed for the Post, replaced in 1987 by Richard Morin, who in turn was succeeded in 2006 by Jonathan Cohen, then ABC's assistant polling director.

The news organizations also conduct polls on their own and with other partners. In 2005, ABC won the first news Emmy Award to cite a public opinion poll, for its second national survey in Iraq, on which it partnered with the BBC, the German network ARD, and USA Today. ABC also won the 2006 Iowa/Gallup award and 2006 National Council on Public Polls award for its polling in Iraq and Afghanistan; the Post won the 2007 Iowa/Gallup award for its survey focusing on black men in America, a poll it conducted with the Henry J. Kaiser Family Foundation and Harvard University.

Their joint polling nonetheless has been the most consistent feature of both organizations' efforts to
ACCESS LINES
An access line is a telecommunications link or telephone line connecting the central office or local
switching center of a telephone company to the end
user. Access lines are sometimes referred to as local
routing numbers (LRNs), wireline loops, or switched
access lines, and they do not include telephone numbers used for wireless services. Access lines provide
access to a residence or business over twisted-pair
copper wire, coaxial cable, or optical fiber. The
Federal Communications Commission reported that
as of December 31, 2005, there were approximately
175.5 million switched access lines in the United
States. Access lines are normally assigned in prefixes
or 1000-blocks classified by Telcordia as POTS
(Plain Old Telephone Service), and most frames
used for generating telephone samples are restricted
to POTS prefixes and 1000-blocks.
Approximately two thirds of all access lines connect to a residence, which suggests that two thirds of
working numbers in a telephone sample should be
residential. Many business access lines are in dedicated prefixes or banks and do not appear in a list-assisted random-digit dialing (RDD) telephone sample. However, since a single business will frequently
have multiple access lines, such as rollover lines,
direct inward dial lines, fax lines, and modem lines,
those access lines that are not in dedicated banks will
appear in an RDD sample, substantially increasing the
number of ineligible units.
A household also may have more than one access
line. Over the years some households added additional
access lines for children or home businesses. The
increased use of home computers and residential fax
machines in the 1990s further increased the number of
residences with two or more access lines. Because
multiple lines meant multiple probabilities of selection
ADAPTIVE SAMPLING
Adaptive sampling is a sampling technique that is implemented while a survey is being fielded; that is, the sampling design is modified in real time, as data collection continues, based on what has been learned from previous sampling already completed. Its purpose is to improve the selection of elements during the remainder of the sampling, thereby improving the representativeness of the data that the entire sample yields.
Background
The purpose of sampling is to learn about one or more
characteristics of a population of interest by investigating a subset, which is referred to as a sample, of
that population. Typical population quantities of interest include the population mean, total, and proportion.
For example, a population quantity of interest might
be the total number of people living in New York
City, their average income, and so on. From the sample collected, estimates of the population quantities of
interest are obtained. The manner in which the sample
is taken is called a sampling design, and for a sampling design various estimators exist. There is a multitude of sampling designs and associated estimators.
Figure 1. A final sample using an ACS design with an initial simple random sample without replacement of size n = 4 from a population of size N = 56
edge units, are networks of size 1. Figure 1 is an example of a final sample from an ACS, with an initial simple random sample without replacement taken from a forest partitioned into N = 56 units. The objective is to estimate the number of wolves in the forest. The condition to adaptively add neighboring units is finding one or more wolves in the unit sampled. The neighborhood is spatial and defined as north, south, east, and west. The initial sample is of size n = 4, represented by the dark-bordered units. The units with a dotted border are adaptively added units. The adjacent units with the values 2, 6, 3 form a network of size 3. The units with a dotted border and a value of zero are edge units. The edge units plus the latter network of size 3 form a cluster. The edge units and the other units in the sample with a value of zero are networks of size 1.

In ACS, networks are selected with unequal probability. In typical unequal probability sampling, the probability of units included in the sample is determined before sampling begins. The typical estimators in ACS can be viewed as a weighted sum of networks, where the size of the network and whether the network was intersected in the initial sample are used to calculate the weights. Networks that are also edge units can enter into the final sample by being intersected in the initial sample or by being adaptively added, whereas other networks must be intersected in the initial sample. For the latter reason, the typical estimators do not incorporate edge units not intersected in the initial sample. Some estimators have been derived using the Rao-Blackwell theorem, which can incorporate edge units in the final sample but not in the initial sample.
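To make this network-weighting logic concrete, the following is a minimal Python simulation sketch. The grid, wolf counts, and function names are illustrative assumptions, and the estimator shown is the modified Hansen-Hurwitz form (the average, over the initial draws, of the mean of each intersected network), one of the typical ACS estimators; it ignores edge units not intersected by the initial sample, as described above.

```python
import random
from collections import deque

def networks(grid, condition=lambda y: y > 0):
    """Map each grid unit to its ACS network: a maximal 4-connected
    (north/south/east/west) group of units meeting the condition.
    Units failing the condition, including edge units, are networks
    of size 1."""
    rows, cols = len(grid), len(grid[0])
    label = {}
    for r in range(rows):
        for c in range(cols):
            if (r, c) in label:
                continue
            if not condition(grid[r][c]):
                label[(r, c)] = {(r, c)}       # network of size 1
                continue
            net, queue = set(), deque([(r, c)])
            while queue:                       # flood-fill the network
                i, j = queue.popleft()
                if (i, j) in net:
                    continue
                net.add((i, j))
                for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ni, nj = i + di, j + dj
                    if 0 <= ni < rows and 0 <= nj < cols and \
                            condition(grid[ni][nj]):
                        queue.append((ni, nj))
            for unit in net:
                label[unit] = net
    return label

def acs_total(grid, n, seed=1):
    """Modified Hansen-Hurwitz estimate of the population total:
    N times the average, over the n initial SRSWOR draws, of the
    mean y-value of the network each draw intersects."""
    rng = random.Random(seed)
    units = [(r, c) for r in range(len(grid)) for c in range(len(grid[0]))]
    label = networks(grid)
    draws = rng.sample(units, n)               # initial SRSWOR sample
    means = [sum(grid[r][c] for r, c in label[u]) / len(label[u])
             for u in draws]
    return len(units) * sum(means) / n

# Toy forest of N = 56 units with one size-3 wolf network, as in Figure 1.
grid = [[0] * 8 for _ in range(7)]
grid[2][3], grid[2][4], grid[3][4] = 2, 6, 3
print(acs_total(grid, n=4))                    # estimate of the wolf total
```

Averaging over repeated draws (varying the seed) would show the estimator centering near the true total of 11 wolves, which is what makes this weighting scheme design-unbiased despite the adaptive expansion.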
For various reasons, when taking an ACS, it is
often not feasible to sample the entire cluster; for
ADD-A-DIGIT SAMPLING
Add-a-digit sampling is a method of creating a sample
of telephone numbers to reach the general public
within some geopolitical area of interest. This method
is related to directory sampling in that the first step
involves drawing a random sample of residential
directory-listed telephone numbers from a telephone
directory that covers the geographic area of interest.
In add-a-digit sampling, the selected directory-listed
telephone numbers are not called. Rather, they form
the seeds for the list of numbers that will be called.
For each directory-listed telephone number drawn
from the telephone directory, the last digit of the telephone number is modified by adding one to the last
digit. The resulting number is treated as one of the
telephone numbers to be sampled. This is the simplest
form of add-a-digit sampling. When it was originally
devised in the 1970s, it was an important advancement over directory-listed sampling in that the resulting sample of telephone numbers included not only
listed numbers but also some numbers that were
unlisted residential telephone numbers.
Another practice is to take a seed number and generate several sample telephone numbers by adding 1, 2,
3, 4, 5, and so on to the last digit of the telephone
number. However, in the application of this technique,
it was found that the higher the value of the digit added
to the last digit of the seed telephone number, the less
likely the resulting telephone number would be a residential number. Still another method involves drawing
the seed telephone numbers and replacing the last two
digits with a two-digit random number.
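A short Python sketch of these generation rules follows; the seed numbers and function names are hypothetical, and the simplest method's behavior on a trailing 9 is shown here as a carry, which is one common implementation choice.

```python
import random

def add_a_digit(seeds, k=1):
    """For each directory-listed seed number, generate k sample numbers
    by adding 1, 2, ..., k to the number (the last digit increments,
    carrying when it passes 9)."""
    return [str(int(s) + add).zfill(len(s))
            for s in seeds for add in range(1, k + 1)]

def randomize_last_two(seeds, rng=random.Random(0)):
    """Variant: replace the last two digits of each seed with a random
    two-digit number (00-99)."""
    return [s[:-2] + f"{rng.randrange(100):02d}" for s in seeds]

seeds = ["6145550123", "6145554299"]   # hypothetical listed numbers
print(add_a_digit(seeds, k=3))          # ...0124, ...0125, ...0126, ...
print(randomize_last_two(seeds))
```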
Add-a-digit sampling originated as a method for
including residential telephone numbers that are not
listed in the telephone directory in the sample. These
unlisted numbers are given a zero probability of selection in a directory-listed sample. In add-a-digit sampling, some unlisted telephone numbers will be
included in the sample, but it is generally not possible
to establish that all unlisted residential telephone numbers have a nonzero probability of selection. Moreover,
it is difficult to determine the selection probability of
each telephone number in the population, because the
listed and unlisted telephone numbers may exhibit
different distributions in the population of telephone
numbers. For example, one might encounter 500 consecutive telephone numbers that are all unlisted numbers. Because of these and other limitations, add-a-digit
ADDRESS-BASED SAMPLING
Address-based sampling (ABS) involves the selection
of a random sample of addresses from a frame listing
of residential addresses. The technique was developed
in response to concerns about random-digit dialed
(RDD) telephone surveys conducted in the United
States because of declining landline frame coverage
brought on by an increase in cell phone only households and diminishing geographic specificity as a result
of telephone number portability. The development and
maintenance of large, computerized address databases
can provide researchers with a relatively inexpensive
alternative to RDD for drawing household samples. In
the United States, address files made available by the
U.S. Postal Service (USPS) contain all delivery
addresses serviced by the USPS, with the exception of
general delivery. Each delivery point is a separate
record that conforms to all USPS addressing standards,
making the files easy to work with for sampling
purposes.
Initial evaluations of the USPS address frame
focused on using the information to reduce the costs
associated with enumeration of primarily urban households in area probability surveys or in replacing traditional counting and listing methods altogether. These
studies showed that for a survey of the general population, the USPS address frame offers coverage of
approximately 97% of U.S. households. The frame's
standardized format also facilitates geocoding of
addresses and linkage to other external data sources,
such as the U.S. Census Zip Code Tabulation Areas
ADVANCE CONTACT
Advance contact is any communication to a sampled respondent prior to requesting cooperation and/or presenting the respondent with the actual survey task, in order to raise the likelihood (i.e., increase the response propensity) of the potential respondent cooperating with the survey. As explained by Leverage-Saliency Theory, a respondent's decision to participate in research is influenced by several factors, including his or her knowledge of and interest in the survey research topic and/or the survey's sponsor. A researcher can improve the likelihood of a respondent agreeing to participate through efforts to better inform the respondent about the research topic and sponsor through the use of advance contact. Factors in considering the use of advance contacts are (a) the goals of the advance contact and (b) the mode of contact.
The goals of advance contact should be to educate and motivate the respondent to the survey topic and the sponsor in order to improve the likelihood of cooperation with the survey task. The cost and additional effort of advance contact should be balanced against the cost effects of reducing the need for refusal conversion and lessening nonresponse. The first goal of educating respondents is to help them better understand or identify with the topic and/or the sponsor of the research through increasing awareness and positive attitudes toward both. Respondents are more likely to participate when they identify with the research topic or sponsor. Additionally, it is an opportunity to inform the respondent of survey dates, modes of survey participation (e.g., "Watch your U.S. mail for our questionnaire that will be arriving in a first-class [color and size description of mailer] around [anticipated arrival date]"), and contact information to answer questions or concerns (e.g., "Feel free to contact us toll-free at [contact number] or via the Web at [Web site address]"). The second goal is
ADVANCE LETTER
Advance letters (sometimes referred to as prenotification letters) are a means of providing potential
respondents with positive and timely notice of an
impending survey request. The letters often address
issues related to the purpose, topic, and sponsor of the
survey and a confidentiality promise. In some surveys,
advance letters include a token cash incentive. Letters
should be sent by first-class mail and timed to arrive
only days to a week ahead of the actual survey contact. They also may be accompanied by other informational materials, such as study-related pamphlets,
which are typically designed to address questions
about survey participation frequently asked by respondents and, in the case of ongoing or longitudinal
surveys, provide highlighted results from previous
administrations of the survey.
Long used in survey research efforts, advance letters require only that a mailable address be associated
with the sampled unit, regardless of whether that unit
is a dwelling, telephone number, or name on a listing.
Advance letters are used in conjunction with nearly
all survey modes, including face-to-face, telephone,
mail, and some Web-based surveys. For example,
with random-digit dialed (RDD) telephone surveys,
sampled telephone numbers are often cross-referenced
with electronic telephone directories and other commercially available databases to identify addresses. In
a typical RDD sample, addresses can usually be identified for 50% to 60% of the eligible telephone numbers.
Unfortunately, advance letters cannot be used with
survey designs when an identifiable address cannot be
determined, such as when respondents in the United
States are sampled from a frame of cellular telephone
numbers or email addresses. Typically, such frames
do not include mailable address information.
In terms of content, most of the research literature
and best practice recommendations suggest that an
advance letter be brief, straightforward, simple, and
honest, providing general information about the survey
topic without too much detail, especially if the topic is
sensitive. The letter should build anticipation rather than
provide details or conditions for participation in the survey. Highlighting government sponsorship (e.g., state),
emphasizing confidentiality of the data, expressing
advance appreciation, and supplying a toll-free telephone number are typically seen as desirable features.
Advance letters can also be used to adjust a variety of
other influences known to affect survey participation,
including use of official stationery of the sponsoring
organization to convey legitimacy; having the letter
signed by a person in authority; personalizing the name
(when available) and address of the sample household
and salutation of the letter to convey the importance
of the survey; and providing basic information about
the nature of the survey questionnaire to educate the
household with regard to the task being requested.
Additionally, by alerting a household in advance to an
upcoming survey request, the letter can be consistent
with the norms of politeness that most unannounced
contacts from salespersons (or even criminals or
scam artists) often violate. Furthermore, advance letters
can have a positive effect on the interviewers conducting surveys, enhancing their own confidence in seeking
a household's participation in a survey.
Postcards are sometimes used in place of actual
letters and are considerably less expensive to produce.
Researchers should evaluate the use of advance letters thoroughly within their particular research context to determine whether the gains from the reduction of nonresponse error outweigh the costs or potential for survey bias.
Michael Link
See also Advance Contact; Incentives; Nonresponse;
Nonresponse Error
AGENDA SETTING
Agenda setting refers to the media effects processes
that lead to what are perceived as the most important
problems and issues facing a society. It is an important component of public opinion, and thus measuring
it accurately is important to public policy deliberation
and formation and to public opinion research.
The power to set the public agenda, determining the most important problems for discussion and action, is an essential part of any democratic system.
This is so because agenda control is a fundamental
lever of power and it is necessary to achieve citizen
desires. If democracy is to be a meaningful concept, it
must include the right of citizens to have their preferred agenda of topics taken up by policymakers.
Leaders who ignore the topics that citizens consider
important are not representing the people adequately.
12
Agenda Setting
Concepts
Popularized in the mass communication and public
opinion literature, agenda setting has for many years
been nearly synonymous with studying public issues
in a public opinion context. In the study of public
opinion, agenda setting refers to a type of media
effect that occurs when the priorities of the media
come to be the priorities of the public. Broadly speaking, the agenda-setting process has three parts:
1. Public agenda setting examines the link between
issues portrayed in the mass media and the issue
priorities of the public.
2. Policy agenda setting studies are those examining the activities of public officials or legislatures,
and sometimes the link between them and media
content.
3. Media agenda setting examines the antecedents of
media content that relate to issue definition, selection, and emphasis. This can typically include the
individual and organizational factors that influence
decision making in newsrooms and media organizations generally.
Agenda setting deals fundamentally with the importance or salience of public issues as measured in the
popular public opinion polls. Issues are defined similarly to what the polls measure (the economy, trust in government, the environment, and so on), and this
ensures comparability to the polling data. The innovation of conceptualizing all the complexity and controversy of a public issue in an abstract manner makes it
possible to study issues over long periods of time. But
it also tends to produce studies that are quite removed
from the very things that made the issues controversial
and interesting. Removing details also removes most
conflict from the issue. What is left is really just the
topic or shell of the issue, with very little content.
Most of the early agenda-setting research focused
on the correspondence of aggregate media data and
aggregated public opinion data. The rank-order correlations between the two sets of agendas measured the
agenda-setting effect. This trend continues to the present day. While it is important to try to understand
the connections between media and social priorities,
agenda-setting research as it is presently constituted
does not do a very good job of explaining how social
priorities are really determined. This is so because
most agenda-setting research focuses on media as the independent variable.
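As a rough sketch of this aggregate approach, the rank-order (Spearman) correlation between a media agenda and a public agenda can be computed directly. The issue list and ranks below are invented for illustration, not taken from any actual study, and the snippet assumes the scipy library is available.

```python
from scipy.stats import spearmanr

# Hypothetical ranks of the same five issues on the media agenda
# (by coverage volume) and the public agenda (by "most important
# problem" poll mentions); illustrative values only.
media_rank = [1, 2, 3, 4, 5]    # economy, crime, environment, health, immigration
public_rank = [1, 3, 2, 4, 5]

rho, p_value = spearmanr(media_rank, public_rank)
print(rho)  # the rank-order correlation read as the agenda-setting effect
```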
Background
A famous case study of agenda setting that was developed by Christopher Bosso illustrates this concern
with identifying the correct independent and control
variables in agenda-setting research. In the case of the
Ethiopian famine in 1984, the problem had been at
a severe level for some time. Some BBC journalists
traveling in Africa filmed sympathetic reports of
starving Ethiopians and interested a major American
television network in them because of the personal
interest of one news anchor. American television
news aired the British footage and attracted tremendous interest and more coverage by the other networks and eventually the world. The Ethiopian
famine became the subject of worldwide headlines
and media attention, from which followed a number
of very high-profile food relief efforts and other innovations in fundraising in a global attempt to solve the
problem. Of course, the problem had existed long
before the media spotlight focused on the problem
and continued long after the media tired of the story
and moved on. While the audience might conclude
that the problem was solved, it was not. But the
abrupt spike of interest, as measured by public opinion polls, and subsequent decline and its lack of correlation with the real-world conditions is a classic
example of media agenda setting as a unique force.
Looking Forward
Studying agenda setting in the new information
environment where "Search Is Everything" will be
increasingly challenging. One way to address this
14
Aided Recall
issue is to focus more research attention on the political economy of search engines that are delivering
news to many people and the agenda-setting power
of their methods to determine who sees what news.
Search engines operate via proprietary algorithms
that they apply to the portion of the Internet that they
are able to map and index. When a user enters a topic
into a search engine, the search engine returns a prioritized list (an agenda) of results. Unfortunately,
how this agenda is set is anything but transparent. In
fact, search results vary, sometimes dramatically,
from search engine to search engine based on the
nature of the formulae used to find the results and
prioritize them. Most search engines collect fees from
clients who want their search terms to appear higher
on the prioritized order of results. Some disclose that
a given site's result is a "sponsored link," but this is
not a universal practice. In other words, commercial
interests often buy the answer to a given search.
Search results can also be influenced without anyone
making a payment directly to a search engine.
Results are "gamed" by firms known as "optimizers,"
which collect fees in exchange for figuring out ways
to move certain results higher on the list. They do
this through painstaking attempts to learn key elements of the algorithms used to determine the agenda
order and then making sure their clients' sites meet
these criteria.
In an information environment that increasingly
depends on search technology, the political economy
of search is an understudied but key component of
what the public knows and thinks is important: the
public agenda. In today's fracturing media environment, consumers and citizens rely increasingly on
standing orders for customized information that meets
certain specifications. How that information is searched
and delivered will be an increasingly significant issue
for political and commercial interests as well as public
opinion researchers seeking to understand the public's
priorities. A challenge to survey researchers will be to
understand this process and use it to design studies that
incorporate an up-to-date understanding of the media
system. This can help assure the relevance of the
agenda-setting model for years to come.
Gerald M. Kosicki
See also Issue Definition (Framing); Multi-Level Integrated
Database Approach (MIDA); Priming; Public Opinion;
Public Opinion Research
Further Readings
AIDED RECALL
Aided recall is a question-asking strategy in which
survey respondents are provided with a number of
cues to facilitate their memory of particular responses
that are of relevance to the purpose of the study.
Typically such cues involve asking respondents separate questions that amount to a list of subcategories of
some larger phenomenon. The purpose of listing each
category and asking about it separately is to assist the
respondent by providing cues that will facilitate memory regarding that particular category.
Applications
This question technique is most appropriate when the
researcher is most concerned about completeness and
accuracy and more worried about underreporting
answers than about overreporting. Aided recall question
strategies structure the range of possible answers completely and simplify the task for the respondent. They
also simplify the investigator's work in gathering and analyzing the data, since no recording or coding of open-ended answers is required.
Further Readings
AIDED RECOGNITION
Within the context of survey research, aided recognition is a form of aided recall in which a survey
respondent is asked if she or he was aware of something prior to being asked about it in the survey questionnaire. The stimulus that the respondent is asked
about typically is the name of a company or of a product or service. In some cases, other than in telephone
surveys, a picture can be shown as the stimulus. In
telephone, Internet, and in-person surveys, audio can
serve as the stimulus for the respondent.
The common form for measuring aided recognition
is to use a closed-ended survey question along the
following lines:
Further Readings
ALGORITHM
Figure 1   [Network diagram linking Florentine families: Acciaiuoli, Albizzi, Barbadori, Bischeri, Castellani, Ginori, Guadagni, Lamberteschi, Medici, Pazzi, Peruzzi, Pucci, Ridolfi, Salviati, Strozzi, Tornabuoni]
Further Readings
ALPHA, SIGNIFICANCE LEVEL OF TEST
Note that if 20 statistical models are run, for example, then one should expect one of them to produce
a significant result when alpha is set at 0.05, merely by
chance. When multiple tests are performed, investigators sometimes use corrections, such as the Bonferroni
correction, to adjust for this. In and of itself, specifying
a stringent alpha (e.g., 0.01 or 0.001) is not a guarantee
of anything. In particular, if a statistical model is misspecified, alpha does not change that.
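A quick calculation, with illustrative numbers only, makes the point:

```python
# Chance of at least one false positive across m independent tests
# when every null hypothesis is true (illustrative numbers).
alpha, m = 0.05, 20
p_any_false_positive = 1 - (1 - alpha) ** m
print(round(p_any_false_positive, 3))   # ~0.642

# Bonferroni correction: test each hypothesis at alpha / m instead.
print(alpha / m)                        # 0.0025
```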
Only models in which a given alpha is satisfied tend
to reach consumers, who tend to be exposed to scientific
studies via refereed journal articles. This phenomenon
is known as publication bias. The reader of a study
may find it persuasive because the p-value is smaller
than alpha. The persuasion derives from the small likelihood (alpha) of the data having arisen by chance if the
null hypothesis is correct (the null hypothesis is therefore rejected). But even at a small level of alpha, any
given result may be likely by sheer chance if enough
models have been run, whether or not these models are
reported to the reader. Even an arbitrarily small alpha
is meaningless as a probability-based measure if many
models are run and only the successful ones revealed. A
small level of alpha, taken by itself, is therefore not an
indicator that a given piece of research is persuasive.
Statistical models are sometimes used for purely
descriptive purposes, and in such contexts no level of
alpha need be specified.
Andrew Noymer
See also Null Hypothesis; Probability; p-Value; Standard
Error; Type I Error
Further Readings
ALTERNATIVE HYPOTHESIS
An alternative hypothesis is one in which a difference
(or an effect) between two or more variables is anticipated by the researcher.
Further Readings
AMERICAN ASSOCIATION FOR PUBLIC OPINION RESEARCH (AAPOR)

Founding of AAPOR
The redeployment of U.S. industrial power to the production of consumer goods after World War II stimulated interest in a wide variety of survey applications,
particularly market and media research. The economy
20
needed mass media to sell the output of mass production, and survey research made the marketing process
efficient.
Harry Field, who had founded the National Opinion
Research Center (NORC) at the University of Denver
in 1941, saw the war's end as an opportunity to assemble the diverse strands of survey research. He organized a national conference to open on July 29, 1946.
The site was Central City, Colorado, 42 miles of winding mountain road from downtown Denver and 8 hours
by reciprocating-engine airliner from New York
City. Field invited 264 practitioners, and 73 attended.
Don Cahalan, who coordinated the event, classified
the attendees: 19 from media, 18 academics, 13 commercial researchers, 11 from nonprofits, 7 government
employees, 3 from advertising agencies, and 2 others.
A key session on technical and ethical standards in
public opinion research was led by George Gallup,
Clyde Hart of the Office of Price Administration, Julian
Woodward of Elmo Roper's organization, and Field. In
a paper that Paul Sheatsley would later describe as
"remarkably prescient," Woodward foresaw expanded
use of polls to provide feedback for elected officials
and to test public knowledge. Competition among polls
would create pressure to minimize costs, but because
such polls would play an important role in public service by providing a continuing referendum on policy
and consumer issues, they would require standards of
quality that would justify "the responsibilities which will increasingly be theirs."
After 3 days of discussion, the conference decided
that a second meeting should be held in 1947. Harry
Field was to lead it, but he died in a plane crash in
France only a month later. Clyde Hart became director of NORC and organizer of the second conference.
For the second meeting, Hart and the sponsoring committee chose Williamstown, Massachusetts, in
the northwest corner of the state. Julian Woodward
assembled a program that drew 194 participants.
While the Central City meeting had envisioned an
international confederation of existing survey research
organizations, the Williamstown meeting took the
unexpected step of forming a membership organization
instead. A constitution was drafted, and the name
American Association for Public Opinion Research
was approved after assurances were made that an international organization would be formed the next day.
Since that time, AAPOR and the World Association
for Public Opinion Research (WAPOR) have
combined their meetings in even-numbered years.
Mission of AAPOR
One function of a professional association is to codify
the profession's self-definition by setting standards of
ethics and technical competence. When AAPOR was
founded, the main technical debate was between the
advocates of quota sampling and those who preferred
probability sampling. It quickly became clear that setting rules of scientific orthodoxy was not practical,
but there was support for setting moral standards, particularly regarding transparency in research methods.
The other key aspect of professionalism is advancement of the profession's body of knowledge. The constitution adopted at Williamstown provided for the
dissemination of opinion research methods, techniques and findings through annual conferences and an
official journal and other publications. Public Opinion
Quarterly had been started in 1937 at Princeton
University, and AAPOR designated it the official journal of the association, paying a fee to have its conference proceedings published there. In 1968, the journal
was acquired by Columbia University, and title was
transferred to AAPOR in 1985.
AMERICAN COMMUNITY
SURVEY (ACS)
The American Community Survey (ACS) is an ongoing national survey conducted by the U.S. Census
Bureau. Part of the federal decennial census program,
the ACS was designed to replace the "long form," or sample portion, of the decennial census, starting in 2010.
By conducting monthly surveys of a sample of the
U.S. population, the ACS collects economic, social,
and housing information continuously rather than every
10 years. The ACS does not replace the decennial enumeration, which is constitutionally mandated for apportioning congressional seats. It is expected that the ACS
program will improve the quality of the decennial census, because the elimination of long-form questions
should increase response and allow more focused nonresponse follow-up.
Eventually, the ACS will supply data for the same
geographic levels that have traditionally been available
from the census long form, including sub-county areas
such as census tracts and block groups. The ACS sample sizes are not large enough to support annual releases
for all geographic areas. For smaller areas, the ACS
data are averaged over multiple years. Annual data are
available for populations of 65,000 or more. Annual
estimates from the 2005 ACS were released in 2006.
Three-year averages will be released for areas with
20,000 or more people, and 5-year averages will be
available for the remaining areas. Three-year averaged
data will be available starting in 2008, and the 5-year
averaged data will first be available in 2010. After
2010, data for all geographic areas will be updated annually, using the rolling 3- or 5-year averages for the
smaller areas.
The Census Bureau has conducted ACS tests in
select counties since the mid-1990s. In 2005, the
housing unit sample was expanded to its full size,
which includes all U.S. counties and equivalents, the
District of Columbia, and Puerto Rico. The ACS was
expanded to include group quarters facilities in 2006.
As an ongoing program, funding for the American
Community Survey must be approved by Congress
annually as part of the federal budget process. Current
for the CAPI phase. The responses from the nonresponse follow-up are weighted up to account for the
nonrespondents who are not contacted.
Currently, standard ACS questionnaires are produced in English and in Spanish. English forms are
mailed to homes in the United States, and Spanish
forms are mailed to homes in Puerto Rico. ACS questionnaires include phone numbers that recipients can
call for assistance in filling out the questionnaire.
English forms include these phone assistance instructions in both English and Spanish. Persons in the
United States may request the Spanish language form.
For the 2005 ACS, group quarters were not sampled because of budget restrictions. Thus, the published data contain only the household population.
Some data users did not understand these universe
differences and made direct comparisons to decennial
data that represented the total population.
Although there are a number of considerations for
ACS data users, when used properly, the ACS supplies
reliable and timely information to help users make
better decisions. Many of these issues should be worked
out over time as more information is released and data
users become more familiar with the data limitations.
Christine Pierce
See also Census; Computer-Assisted Personal Interviewing
(CAPI); Computer-Assisted Telephone Interviewing
(CATI); Nonresponse; Sampling Error; U.S. Bureau of
the Census
Further Readings
ANONYMITY
Anonymity is defined somewhat differently in survey
research than in its more general use. According to
the American Heritage Dictionary, anonymity is "the quality or state of being unknown or unacknowledged."
However, in survey research, the concept is more
complex and open to interpretation by the various
organizations that conduct surveys.
In the form closest to the standard definition, anonymity refers to data collected from respondents who
are completely unknown to anyone associated with
the survey. That is, only the respondent knows that
he or she participated in the survey, and the survey
researcher cannot identify the participants. More
often, anonymity refers to data collected in surveys in
which the respondents are de-identified and all possible identifying characteristics are separated from the
publicly available data. Many survey research organizations provide data and data summaries to individuals outside their organizations. These data are
first questionnaire did not reach the survey organization and respond a second time.
A similar problem that leads to inappropriate
follow-up requests occurs with Internet surveys that
do not use authentication. These surveys are open to
anyone with Internet access. While some limitations
can be applied to prevent unauthorized access, they
are minimally effective. The survey data and results
are harmed if those not selected for the sample are
included in the survey data or respondents participate
more than once.
Many survey organizations conduct random checks
on survey interviewers to determine whether the interview was conducted and/or was conducted correctly.
Survey procedures that ensure anonymity simultaneously rule out these important verification and quality-monitoring procedures.
Anonymity is important for the success of surveys
under certain conditions. Anonymity can help to protect privacy so that respondents can reveal information that cannot be identified to them. When the
survey poses exceptional risks for participants, anonymity may improve cooperation. When a survey
asks especially sensitive questions, anonymity will
likely improve reporting of stigmatizing behaviors or
unpopular attitudes and opinions. Surveys of sexual
behaviors, illegal drug use, excessive alcohol use, illegal activities such as tax evasion, and other possibly
stigmatizing activities can benefit from providing anonymity to the respondents.
Some participants would be reluctant to discuss attitudes and opinions on such topics as race, politics, and
religion unless they believed their responses could not
be identified to them. Similarly, respondents have a
reduced impetus to provide socially desirable responses
in anonymous surveys. For example, respondents may
be more willing to admit to negative attitudes toward
minority groups if the survey is anonymous.
For these surveys, the risk of exposure or harm to
respondents needs to be balanced against the loss of
quality control procedures needed to ensure survey
integrity. Little empirical evidence is available to
indicate the overall importance of anonymity to survey cooperation and survey quality, but survey
researchers regularly attempt to use procedures that
can ensure anonymity in data collection.
John Kennedy
See also Confidentiality; Ethical Principles; Verification
Further Readings
APPROVAL RATINGS
Approval ratings are a particularly versatile class of
survey questions that measure public evaluations of
a politician, institution, policy, or public figure as well
as judgments on public issues. This type of question
was first developed by the Gallup Organization in the
late 1930s to measure public support for the U.S.
president. Today, the presidential job approval question is believed to be the single most frequently asked
question in political surveys. Many members of the
political community, journalists, and academics consider the job approval question to be among the most
reliable and useful barometers of a president's public
standing.
The basic form reads: "Do you approve or disapprove of the way (name of president) is handling his job as president?" Some polling organizations use
slightly different wording, but most have adopted
the Gallup language, in part so they can compare
the results with Gallup's historic data without
having to worry about the effect of wording differences. A variation of the question is frequently used
to measure a presidents performance in specific
domains, as with this trend question asked by The
Los Angeles Times: "Do you approve or disapprove of the way George W. Bush is handling the war on terrorism?"
The question's basic format is easily altered to
evaluate the performance of other public officials or
institutions, such as Congress, individual members of
a president's cabinet, or state and local officials, as
well as other prominent leaders. It also is a useful
measure of public attitudes toward government programs or policies and frequently is used to measure
attitudes toward a range of nonpolitical issues, such
as this question by USA Today and Gallup: "Do you approve or disapprove of marriage between blacks and whites?"
Polling organizations often include language that
measures the intensity of approval or disapproval, as
with this approval question asked in 2005 by the Pew
Center for the People and the Press: "There is now a new Medicare law that includes some coverage of prescription drug costs. Overall, would you say you strongly approve, approve, disapprove, or strongly disapprove of the way Medicare will now cover prescription drug costs?" These strength-of-support measures allow survey respondents to indicate a degree of
approval or disapproval, and thus are more sensitive
to change in public attitudes. For example, declining
public support for elected officials is often first seen
as a decline among those who strongly approve of
him or her and a comparable increase in those who
somewhat support the official, with little or no decline
in the overall support.
Retrospective Judgments
Approval questions sometimes are used to measure
the public's retrospective judgments. USA Today and Gallup asked this question in 1995 on the 50th anniversary of the end of World War II: "As you may know, the United States dropped atomic bombs on Hiroshima and Nagasaki in August 1945 near the end of World War II. Looking back, would you say you approve or disapprove of using the atomic bomb on Japanese cities in 1945?" Such a format has provided
an interesting view of the American public's retrospective judgment of its presidents. When Gallup
asked the public in 2002 if they approved or disapproved of the job done by each of the presidents in the
post-World War II era, President John F. Kennedy
topped the list with 83% approval, followed by
Ronald Reagan (73%), and Jimmy Carter (60%).
The retrospective approval question is regularly
asked by Gallup. The results over time suggest that an
elected official's job approval rating can change significantly even after he or she leaves office. In 2002,
Gallup found that 69% of the public approved, in retrospect, of the job that George H. W. Bush had done
as president. But in 2006, the elder Bush's job rating had declined from 69%, third-highest behind
Kennedy and Reagan, to 56%. Conversely, President
Clinton's retrospective job approval rating increased
from 51% in 2002 to 61% four years later.
Further Readings
AREA FRAME
An area frame is a collection of well-defined land
units that is used to draw survey samples. Common
land units composing an area frame include states,
provinces, counties, zip code areas, or blocks. An area
frame could be a list, map, aerial photograph, satellite
image, or any other collection of land units. Area
frames play an important part in area probability samples, multi-stage samples, cluster samples, and multiple frame samples. They are often used when a list of
ultimate sampling units does not exist, other frames
have coverage problems, a geographically clustered
sample is desired, or a geographic area is the ultimate
sampling unit.
Terminology
Several terms used in relation to area probability sampling occur infrequently outside area probability and other multi-stage sampling designs. In area probability samples, the units
formed for selection at the first stage are called primary sampling units (PSUs) and those for the second
stage of selection are called secondary sampling units
(SSUs). The units that are actually selected at these
stages are called, respectively, primary and secondary
selections. If there are more than two stages, the units for the third stage may be called tertiary selection units or third-stage selection units. The final unit
to be selected is called the ultimate sampling unit.
PSUs, SSUs, and perhaps other units are often
selected using probability proportional to size (PPS)
methods. In these cases, each selection unit is
assigned a measure of size (MOS). The MOS usually
represents the size of the study population found in
the unit. The MOS may be known or estimated or
may be a function such as the square root of the population total or a composite (e.g., the sum of the total
number of males plus 1.5 times the total number of
females).
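For instance, a composite MOS of this kind can be computed directly; the counts below are invented for illustration.

```python
# Composite measure of size (MOS) from the example above,
# with hypothetical PSU counts of males and females.
males, females = 4200, 4600
mos = males + 1.5 * females
print(mos)   # 11100.0
```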
Disadvantages of
Area Probability Samples
There are two major disadvantages to using an area
probability sample: (1) the increase in variance, often
called a design effect (deff) that comes from the use of
multi-stage or clustered designs, and (2) the increased
cost that is mostly associated with using in-person data
collection (although not all studies with area probability sample designs use in-person data collection).
The design effect due to clustering arises from the
fact that the units of observation in the study, be they
individuals, households, or businesses, are not selected
independently, but rather their selection is conditional
on the cluster (in this case a geographic area) in which
they are found being selected. In area probability studies, if the PSUs are too large, the sample may not
be able to include a desirable number of selections.
On the other hand, small PSUs may be more homogeneous than desired. A good approach is to have PSUs
large enough that sampling the SSUs and subsequent
units can introduce heterogeneity into the sample
within each PSU.
After defining the PSUs, at least in general terms,
strata are defined. Part of the stratification process
involves defining certainty selections, that is, PSUs
that are large enough that they are certain to be
selected. Each certainty PSU becomes its own stratum. One can think of certainty selections in terms of
a sampling interval for systematic selection. To this
end, define the interval (I) as the sum of the MOS for
all PSUs in the population (MOSTOT) divided by the
number of PSUs to be selected (n_PSU):
$I = \text{MOSTOT} / n_{\text{PSU}}$
Thus, any PSU with an MOS at least as large as I
would be certain to be selected. If there are certainty
selections, then it is advisable to set the cutoff for designating a PSU as a certainty selection as a fraction of
I (perhaps 0.8 times I). The reason for this is that
once the certainty PSUs are removed from the population, the sum of the MOS becomes smaller, and possibly additional PSUs will become large enough to be
certainty selections: the sum of the remaining MOS
can be designated MOSTOT* and the number of
PSUs to be selected after the certainty selections are
made as n_PSU_noncert. If one calculates a new sampling interval $I^* = \text{MOSTOT}^* / n_{\text{PSU\_noncert}}$, it is possible that there will be new certainty selections whose MOS is equal to or greater than $I^*$. Setting the certainty cutoff as a fraction of I usually avoids
the problem of having to go through several iterations
of removing certainty PSUs from the pool.
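The iterative rule can be sketched as follows; the MOS values are invented, and the 0.8 cutoff is the one suggested above.

```python
import numpy as np

# Sketch: designating certainty PSUs with a 0.8 * I cutoff,
# iterating until no new certainty selections appear.
mos = np.array([500, 120, 90, 300, 60, 80, 150, 40], dtype=float)
n_psu = 4                        # total primary selections to make

certain = np.zeros(len(mos), dtype=bool)
while True:
    remaining = ~certain
    # sampling interval over the non-certainty pool
    interval = mos[remaining].sum() / (n_psu - certain.sum())
    new = remaining & (mos >= 0.8 * interval)
    if not new.any():
        break
    certain |= new

print(np.where(certain)[0], mos[certain])   # indices and MOS of certainty PSUs
```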
Once all certainty selections have been defined, the
other PSUs on the frame are grouped into strata. As
for any study, the strata should be related to study
objectives, especially if subgroups of the population
are to be oversampled. Area probability samples are
often stratified geographically. The number of strata
for the first stage is limited by the number of primary
selections to be made. To estimate sampling variance,
each stratum should be allocated at least two primary
selections. Some deeply stratified designs call for one
selection per stratum, but in such a design, strata will
have to be combined for variance estimation.
Hypothetical Example
of an Area Probability Design
In the United States, many large ongoing surveys
operated or funded by the federal government use
area probability designs. These include surveys of
households or individuals as well as studies of businesses and other establishments. The subject areas of
these surveys range from labor force participation to
health status to energy consumption and other topics.
Rather than try to examine the details of such sample
designs, what follows is a hypothetical (generic)
example of a sample design for a survey in which
adults living in households comprise the target population and in-person data collection is required.
Although there could be more stages of sampling, this
example deals with four: (1) at the first stage, PSUs
will be defined as large geographic areas; (2) in the
second stage, somewhat smaller geographic areas will be selected within each sampled PSU.
Further Readings
ATTENUATION
Attenuation is a statistical concept that refers to
underestimating the correlation between two different
measures because of measurement error. Because no
test or other measurement of any construct has perfect
reliability, the validity of the scores between predictor and criterion variables is underestimated. The correction for attenuation estimates what the correlation would be after correcting for the unreliability of the predictor variable. For example, suppose the correlation between scores from a personnel test (x) and the number of interviews completed in a week (y) is equal to .20 and the score reliability of the personnel test is .60. The correction for attenuation would equal .26, using the following equation for correcting only for the score reliability of the predictor variable:

$r_{xy_c} = r_{xy} / \sqrt{r_{xx}}$
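A quick check of the arithmetic with the values from the example:

```python
from math import sqrt

# Correction for attenuation, correcting only the predictor's reliability:
# r_xy_c = r_xy / sqrt(r_xx), using the values given in the text.
r_xy, r_xx = 0.20, 0.60
print(round(r_xy / sqrt(r_xx), 2))   # 0.26
```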
Paul Muchinsky summarized the recommendations
for applying the correction for attenuation. First, the
corrected correlations should neither be tested for statistical significance nor should they be compared with
uncorrected validity coefficients. Second, the correction for attenuation does not increase predictive validity of test scores. Donald Zimmerman and Richard
Williams indicated that the correction for attenuation
is useful given high score reliabilities and large sample sizes. Although the correction for attenuation
has been used in a variety of situations (e.g., meta-analysis), various statisticians have suggested caution
in interpreting its results.
N. Clayton Silver
See also Correlation; Cronbach's Alpha; Reliability; Validity
Further Readings
ATTITUDE MEASUREMENT
Researchers from a variety of disciplines use survey
questionnaires to measure attitudes. For example,
political scientists study how people evaluate policy
alternatives or political actors. Sociologists study how
one's attitudes toward a social group are influenced by one's personal background. Several different methods, including multi-item measures, are used to measure attitudes.

Question Format

People hold attitudes toward particular things, or attitude objects. In question format, an attitude object is presented as the stimulus in an attitude question, and respondents are asked to respond to this stimulus. Consider the following question:

Do you approve, disapprove, or neither approve nor disapprove of the way the president is handling his job?
The attitude object in this question is the president's handling of his job. The respondents must consider what they know about how the president is
handling his job and decide whether they approve, disapprove, or neither approve nor disapprove. Another
possible closed-ended format is to turn the question
into a statement, and ask the respondents whether they
agree or disagree with a declarative statement, for
example, "The president is doing a good job." However,
some research indicates that the agree-disagree format
produces acquiescence bias or the tendency to agree
with a statement regardless of its content. Yet another
closed-ended format is to ask the respondents to place
themselves on a continuum on which the endpoints
are labeled. For example, one could ask, "How do you feel the president is handling his job?" and ask the
respondents to place their opinions on a scale, from
0 being poor to 10 being excellent.
Researchers measuring attitudes must decide how
many scale points to use and how to label them. Five
to seven scale points are sufficient for most attitude
measures. Assigning adjectives to scale points helps
define their meaning, and it is best if these adjectives
are evenly spaced across the continuum.
Sometimes a researcher wants to understand the
preferences of respondents in more depth than a single
closed-ended question will allow. One approach for
this purpose is to ask the question in an open-ended
format such as, "If the Democratic Party were a person, what traits would you use to describe it?" Here, the
Democratic Party is the attitude object or stimulus.
An advantage of the open format is that the answers
are not limited to the researcher's own categories.
The answers to such a question will provide insights
into whether or not the respondent holds positive,
negative, or conflicted attitudes toward the attitude
object (the Democratic Party, in this example).
However, open-ended responses can be very time-consuming to code and analyze.
Multi-Item Scales
Another way to measure attitude strength is by using
multi-item scales. All scaling procedures require the
creation of a pool of items from which a final set is selected according to some criteria.
For example, Thurstone scaling first requires a set of
judges to rate or compare several statements on a continuum from unfavorable to favorable toward the attitude object. The judges' scores for each statement are
then averaged to align the statements along the attitude continuum. These average scores from the judges
become the scale values for each statement. Next, the
statements are administered to the respondents. The
respondents are asked whether they agree with the
statements. The respondent's score is then a function of the scale values for the statements that the respondent agreed with.
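A compact sketch of this scoring logic follows; the judge ratings are invented, and scoring by the mean of the endorsed scale values is one common convention (medians are also used in practice).

```python
import numpy as np

# Thurstone-style scoring with hypothetical data: each statement's scale
# value is the mean of the judges' ratings (0 = unfavorable, 10 = favorable),
# and a respondent's score is the mean scale value of endorsed statements.
judge_ratings = np.array([[2, 3, 2],    # statement 1, rated by 3 judges
                          [7, 6, 8],    # statement 2
                          [10, 9, 9]])  # statement 3
scale_values = judge_ratings.mean(axis=1)

endorsed = np.array([False, True, True])   # one respondent's agreements
score = scale_values[endorsed].mean()
print(scale_values, score)
```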
Guttman scaling is similar, except that it requires
an assumption about the pattern of responses that is
rarely met in practice. The assumption is that the data
set associated with a Guttman scale has a cumulative
structure, in the following sense: For any two persons, the set of statements endorsed by one person contains the set endorsed by the other, so that respondents and statements can be ordered along a single continuum.
Further Readings
ATTITUDES
Attitudes are general evaluations that people hold
regarding a particular entity, such as an object, an
issue, or a person. An individual may hold a favorable
or positive attitude toward a particular political candidate, for example, and an unfavorable or negative attitude toward another candidate. These attitudes reflect
the individual's overall summary evaluations of each
candidate.
Attitude measures are commonplace in survey
research conducted by political scientists, psychologists, sociologists, economists, marketing scholars,
media organizations, political pollsters, and other academic and commercial practitioners. The ubiquity of
attitude measures in survey research is perhaps not
surprising given that attitudes are often strong predictors of behavior. Knowing a person's attitude toward
a particular product, policy, or candidate, therefore,
enables one to anticipate whether the person will
purchase the product, actively support or oppose the
policy, or vote for the candidate.
What Is an Attitude?
An attitude is a general, relatively enduring evaluation
of an object. Attitudes are evaluative in the sense that
they reflect the degree of positivity or negativity that
a person feels toward an object. An individual's attitude toward ice cream, for example, reflects the extent
to which he or she feels positively toward ice cream,
with approach tendencies, or negatively toward ice
cream, with avoidance tendencies. Attitudes are general in that they are overall, global evaluations of an
object. That is, a person may recognize various positive and negative aspects of ice cream, but that person's attitude toward ice cream is his or her general
assessment of ice cream taken as a whole. Attitudes
are enduring in that they are stored in memory and
they remain at least somewhat stable over time. In
this way, attitudes are different from fleeting, momentary evaluative responses to an object. Finally, attitudes
are specific to particular objects, unlike diffuse evaluative reactions like moods or general dispositions.
Given this conceptualization, attitudes are most
commonly measured by presenting respondents with
a bipolar rating scale that covers the full range of
potential evaluative responses to an object, ranging
from extremely negative to extremely positive, with
a midpoint representing neutrality. Respondents are
asked to select the scale point that best captures their
own overall evaluation of a particular attitude object.
In the National Election Studies, for example,
respondents have often been asked to express their attitudes toward various groups using a "feeling thermometer" ranging from 0 (very cold or unfavorable) to 100
(very warm or favorable), with a midpoint of 50 representing neither warmth nor coldness toward a particular
group (e.g., women). By selecting a point on this scale,
respondents reveal their attitudes toward the group.
An Important Caveat
It is important to note, however, that attitudes do not
always exert such powerful effects. In fact, attitudes
sometimes appear to have negligible influence on
thought and behavior. Recently, therefore, a central
focus within the attitude literature has been on identifying the conditions under which attitudes do and do
not powerfully regulate cognition and behavior. And
indeed, great strides have been made in this effort.
It has been established, for example, that attitudes
influence thought and behavior for some types of people more than others, and in some situations more
than others. More recently, attitude researchers have
determined that some attitudes are inherently more
powerful than others. These attitudes profoundly
influence our perceptions of and thoughts about the
world around us, and they inspire us to act in attitude-congruent ways. Further, these attitudes tend to be tremendously durable, remaining stable across time and
in the face of counter-attitudinal information. Other
attitudes do not possess any of these qualities: they
exert little influence on thought and behavior, they
fluctuate over time, and they change in response to
persuasive appeals.
The term attitude strength captures this distinction,
and it provides important leverage for understanding
and predicting the impact of attitudes on thought and
behavior. That is, knowing an individual's attitude
toward a particular object can be tremendously useful
in predicting his or her behavior toward the object,
but it is just as important to know the strength of the
attitude.
Fortunately, several attitudinal properties have
been identified that differentiate strong attitudes from
weak ones, enabling scholars to measure these properties and draw inferences about the strength of a given
attitude (and therefore about its likely impact on
thought and behavior). For example, strong attitudes
tend to be held with great certainty, based on a sizeable store of knowledge and on a good deal of prior
thought, and considered personally important to the
attitude holder. Thus, measures of attitude certainty,
attitude-relevant knowledge, the extent of prior
thought about the attitude object, and attitude importance offer valuable insights regarding the strength of
individuals' attitudes.
Further Readings
ATTITUDE STRENGTH
Attitude strength refers to the extent to which an attitude is consequential. Compared to weak attitudes,
strong attitudes are more likely to remain stable over
time, resist influence, affect thought, and guide
behavior.
Researchers have identified several attributes
related to attitude strength. Several frequently studied
attributes are well suited for survey research because
they can be assessed directly using a single self-report
survey item. For example, attitude extremity can be
conceptualized as the absolute value of an attitude
score reported on a bipolar scale that is centered at
zero and ranges from strongly negative to strongly
positive. Attitude importance is the significance people
perceive a given attitude to have for them. Attitude
certainty refers to how sure or how confident people
are that their attitude is valid. Each of these attributes
can be measured with straightforward questions, such
as, "To what extent is your attitude about X positive or negative?"; "How important is X to you personally?"; and "How certain are you about your attitude about X?"
Recent research suggests that attitude strength also is
related to the extent that individuals subjectively associate an attitude with their personal moral convictions.
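As a minimal sketch, extremity under the definition given above is simply an absolute value; the responses are invented for illustration.

```python
# Attitude extremity as the absolute value of a bipolar score centered
# at zero (here, a -3 strongly negative to +3 strongly positive scale).
scores = [-3, -1, 0, 2, 3]
extremity = [abs(s) for s in scores]
print(extremity)   # [3, 1, 0, 2, 3]
```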
Other attributes can be assessed directly, with self-report survey items, or indirectly, with survey measures that allow researchers to infer the level of the attribute without relying on people's ability to introspect. For example, knowledge is the amount of information people associate with an attitude. Knowledge
often is assessed by quizzes or by asking people to
recall and list facts or experiences they relate to the
attitude object. In a similar way, ambivalence, or the
extent that people feel conflicted about a target, can
be measured by asking people to list both positive
and negative thoughts about the attitude object.
Most attitude strength research has assessed the
association between attributes and characteristics of
Further Readings
ATTRITION
Unit nonresponse is a problem for any type of survey;
however, unit nonresponse in panel studies can be
a more severe problem than in cross-sectional studies.
The main goal of panel management or panel maintenance is to maintain participation of all sample members in the panel study after the initial wave. The
specific techniques to reduce attrition in panel studies
are focused on locating the sample unit and establishing sufficient rapport with the sample units to secure
their continued participation. Panel studies can keep
contact with the sample units and keep them interested
in participating in the panel study by adopting a good
panel maintenance plan and employing techniques of
tracking and tracing. Acquiring detailed contact information, organizing contact efforts, hiring
skilled interviewers, and retaining staff over time are
important components of a good panel maintenance
plan. Tracking procedures aim to maintain contact with
sample units in the period between waves in order to
update addresses between interviews so that a current
or more recent address is obtained for each sample unit
prior to conducting the interview. Tracing procedures
are adopted in an attempt to find the missing sample
units and are used at the point of data collection when
the interviewer makes his or her first call, discovers the
sample member has moved, and tries to find a new
address or telephone number.
The second approach to addressing attrition is to
calculate adjustment weights to correct for possible
attrition bias after the panel study has been conducted.
Since nonresponse may occur at each successive wave
of data collection, a sequence of nonresponse adjustments must be employed. A common procedure is
first to compute adjustment weights for nonresponse
in the initial wave. At Wave 2, the initial weights are
adjusted to compensate for the sample units that
dropped out because of attrition in Wave 2; at Wave
3, the Wave 2 weights are adjusted to compensate for
the Wave 3 nonrespondents; and so on. Adjustment
weighting is based on the use of auxiliary information
available for both the sample units that are retained
and the sample units that dropped out because of attrition. However, for the second and later waves of
a panel study, the availability of suitable auxiliary
information is very different than in cross-sectional
studies or in the initial wave because responses from
the prior waves can be used in making the adjustments for nonresponse in subsequent waves.
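One such wave-on-wave adjustment can be sketched as a weighting-class procedure; everything below (cell labels, weights, the response indicator, and the function name adjust_weights) is invented for illustration, not taken from any particular survey.

```python
import numpy as np

# Weighting-class attrition adjustment for one wave (hypothetical data).
# base_w: prior-wave adjusted weights; responded: True if the unit stayed
# in at the current wave; cells: adjustment cells built from prior-wave
# responses (assumes every cell contains at least one respondent).
def adjust_weights(base_w, responded, cells):
    w = base_w.copy()
    for c in np.unique(cells):
        in_cell = cells == c
        total = w[in_cell].sum()                      # all prior-wave units in cell
        resp_total = w[in_cell & responded].sum()     # current-wave respondents
        w[in_cell & responded] *= total / resp_total  # spread dropouts' weight
    w[~responded] = 0.0                               # attriters carry no weight
    return w

base_w = np.array([1.2, 0.8, 1.0, 1.5, 0.9, 1.1])
responded = np.array([True, False, True, True, False, True])
cells = np.array(["A", "A", "A", "B", "B", "B"])
print(adjust_weights(base_w, responded, cells))
```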
Femke De Keulenaer
See also Differential Attrition; Nonresponse Bias;
Nonresponse Rates; Panel; Panel Data Analysis; Panel
Survey; Post-Survey Adjustments; Unit Nonresponse
Further Readings
AUDIO COMPUTER-ASSISTED
SELF-INTERVIEWING (ACASI)
Audio computer-assisted self-interviewing (ACASI) is
a methodology for collecting data that incorporates
a recorded voice into a traditional computer-assisted
self-interview (CASI). Respondents participating in
an ACASI survey read questions on a computer
screen and hear the text of the questions read to them
through headphones. They then enter their answers
directly into the computer either by using the keyboard or a touch screen, depending on the specific
hardware used. While an interviewer is present during
the interview, she or he does not know how the
respondent answers the survey questions, or even
which questions the respondent is being asked.
Typically the ACASI methodology is incorporated
into a longer computer-assisted personal interview
(CAPI). In these situations, an interviewer may begin
the face-to-face interview by asking questions and
recording the respondents answers into the computer
herself or himself. Then in preparation for the ACASI
questions, the interviewer will show the respondent
how to use the computer to enter his or her own
answers. This training may consist solely of the interviewer providing verbal instructions and pointing to
various features of the computer but could also include
a set of practice questions that the respondent completes prior to beginning to answer the actual survey
questions. Once the respondent is ready to begin
answering the survey questions, the interviewer moves
to a place where she or he can no longer see the computer screen but where she or he will still be able to provide assistance if the respondent requests it.
Further Readings
Auxiliary Variable
AURAL COMMUNICATION
Aural communication involves the transmission of
information through the auditory sensory system, the
system of speaking and hearing. It usually encompasses both verbal communication and paralinguistic
communication to convey meaning. Aural communication can be used to transmit information independently or in combination with visual communication.
When conducting surveys, the mode of data collection
determines whether information can be transmitted
aurally, visually, or both. Whether survey information
is transmitted aurally or visually influences how
respondents first perceive and then cognitively process information to provide their responses.
Aural communication relies heavily on verbal language when information is transmitted through spoken
words. Additionally, paralinguistic or paraverbal communication, in which information is conveyed through
the speakers voice, is also an important part of aural
communication. Paralinguistic communication can
convey additional information through voice quality,
tone, pitch, volume, inflection, pronunciation, and
accent that can supplement or modify the meaning of
verbal communication. Paralinguistic communication
is an extremely important part of aural communication,
especially in telephone surveys, where visual communication is absent.
Since aural and visual communication differ in
how information is presented to survey respondents,
the type of communication impacts how respondents
initially perceive survey information. This initial step
of perception influences how respondents cognitively
process the survey in the remaining four steps
(comprehension, retrieval, judgment formation, and
reporting the answer). Whereas telephone surveys rely
solely on aural communication, both face-to-face and
Internet surveys can utilize aural and visual communication. Face-to-face surveys rely extensively on aural
communication, with the occasional use of visual aids.
AUXILIARY VARIABLE
In survey research, there are times when information
is available on every unit in the population. If a variable that is known for every unit of the population is used in the design or estimation of a survey, it is called an auxiliary variable.
Stratification
It is often advantageous to divide a population into
homogeneous groups called strata and to select a sample independently from each stratum. Auxiliary information on all population units is needed in order
to form the strata. The auxiliary information can be
a categorical variable (e.g., the county of the unit), in
which case the categories or groups of categories
form the strata. The auxiliary information could also
be continuous, in which case cut points define the
strata. For example, the income of a household or revenue of an establishment could be used to define
strata by specifying the upper and lower limits of
income or revenue for each stratum.
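A brief sketch of forming strata from a continuous auxiliary variable with cut points; the revenue values and boundaries below are hypothetical.

```python
import numpy as np

# Strata from a continuous auxiliary variable (establishment revenue,
# in $1,000s) using cut points; all values are illustrative.
revenue = np.array([40, 250, 900, 75, 3100, 560, 15, 1200])
cuts = [100, 500, 1000]                  # stratum boundaries
stratum = np.digitize(revenue, cuts)     # 0: <100, 1: 100-500, 2: 500-1000, 3: >=1000
print(stratum)
```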
Post-Stratification
If specific auxiliary information is not used in forming
strata or as a measure of size, it can still be used to
adjust the sample weights to improve estimation in
a process called post-stratification.
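In the simplest case, post-stratification rescales the weights within each group so that the weighted totals match known population counts; a toy sketch with invented numbers:

```python
import numpy as np

# Post-stratification: rescale weights so each group's weighted total
# matches a known population count (all numbers invented).
w = np.array([1.0, 1.0, 1.0, 1.0, 1.0, 1.0])
group = np.array(["urban", "urban", "urban", "rural", "rural", "rural"])
pop_totals = {"urban": 700, "rural": 300}     # known from the frame or census

for g, pop in pop_totals.items():
    mask = group == g
    w[mask] *= pop / w[mask].sum()

print(w)   # post-stratified weights
```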
Michael P. Cohen
See also Bias; Imputation; Post-Stratification; Probability of
Selection; Probability Proportional to Size (PPS)
Sampling; Strata; Stratified Sampling; Systematic
Sampling
Further Readings
B
BALANCED QUESTION
A balanced question is one that has a question stem that
presents the respondent with both (all reasonably plausible) sides of an issue. The issue of balance in a survey question also can apply to the response alternatives
that are presented to respondents. Balanced questions
are generally closed-ended questions, but there is nothing inherently wrong with using open-ended questions
in which the question stem is balanced.
For example, the following closed-ended question
is unbalanced for several reasons and will lead to
invalid (biased) data:

. . . remain in Iraq until the country is more stable. What is your opinion on whether the troops should be withdrawn as soon as possible? Do you Strongly Agree, Somewhat Agree, Somewhat Disagree, or Strongly Disagree?
Paul J. Lavrakas
Further Readings
BALANCED REPEATED
REPLICATION (BRR)
Balanced repeated replication (BRR) is a technique
for computing standard errors of survey estimates. It
is a special form of the replicate weights technique.
The basic form of BRR is for a stratified sample with
two primary sampling units (PSUs) sampled with
replacement in each stratum, although variations have
been constructed for some other sample designs. BRR
is attractive because it requires slightly less computational effort than the jackknife method for constructing replicate weights and is valid for a wider range
of statistics. In particular, BRR standard errors are
valid for the median and other quantiles, whereas the
jackknife method can give invalid results.
A sample with two PSUs in each stratum can be
split into halves consisting of one PSU from each stratum. The PSU that is excluded from a half-sample is
given weight zero, and the PSU that is included is
given weight equal to 2 times its sampling weight.
Under sampling with replacement or sampling from an
infinite population, these two halves are independent
stratified samples. Computing a statistic on each half
and taking the square of the difference gives an
unbiased estimate of the variance of the statistic.
Averaging this estimate over many possible ways of
choosing one PSU from each stratum gives a more precise estimate of the variance.
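A numerical sketch of this half-sample logic, using invented PSU totals for a four-stratum design and, for simplicity, enumerating all 2^L splittings rather than a balanced subset:

```python
import itertools
import numpy as np

# Half-sample variance estimation for a stratified design with two PSUs
# per stratum; the weighted PSU totals below are hypothetical.
y = np.array([[12.0, 10.0],   # stratum 1: PSU 1, PSU 2
              [ 8.0,  9.0],   # stratum 2
              [15.0, 13.0],   # stratum 3
              [ 7.0,  6.0]])  # stratum 4

theta_full = y.sum()          # full-sample estimate of the total

# Each half-sample keeps one PSU per stratum at double weight; averaging
# the squared deviations over all 2^L splittings estimates the variance.
reps = []
for pick in itertools.product([0, 1], repeat=len(y)):
    half = 2.0 * y[np.arange(len(y)), pick]   # doubled weight, one PSU per stratum
    reps.append(half.sum())

var_est = np.mean((np.array(reps) - theta_full) ** 2)
print(theta_full, var_est)
```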
If the sample has L strata, there are 2^L ways to take
one PSU from each stratum, but this would be computationally prohibitive even for moderately large L.
The same estimate of the variance of a population
mean or population total can be obtained from a much
smaller set of splittings as long as the following
conditions are satisfied:
BANDWAGON AND
UNDERDOG EFFECTS
Bandwagon and underdog effects refer to the reactions that some voters have to the dissemination of
information from trial heat questions in pre-election
polls. Based upon the indication that one candidate is
leading and the other trailing, a bandwagon effect
indicates the tendency for some potential voters with
low involvement in the election campaign to be
attracted to the leader, while the underdog effect
refers to the tendency for other potential voters to be
attracted to the trailing candidate.
Background
Bandwagon and underdog effects were a concern of
the earliest critics of public polls, and the founders of
polling had to defend themselves against such effects
from the start. The use of straw polls was common by
the 1920s, and by 1935 a member of Congress had
introduced an unsuccessful piece of legislation to
limit them by constraining the use of the mails for
surveys. A second piece of legislation was introduced
in the U.S. Senate after the 1936 election, following
on the heels of an editorial in The New York Times
that raised concerns about bandwagon effects among
the public as well as among legislators who saw poll
results on new issues (even while the Times acknowledged such effects could not have been present in
the 1936 election). A subsequent letter to the editor
decried an underdog effect instead, and the debate
was off and running.
Limitations
A second consideration is that the likely size of the
effects is small. This is due to the fact that as Election
Day approaches and preferences crystallize, it is the
strongest partisans who are most likely to participate.
And their preferences are the most stable in the electorate. As a result, there is a relatively small proportion
of the likely electorate, as opposed to the entire registered or voting age population, that could be subject to
such effects. This implies that very large sample sizes
are needed to detect such effects with confidence.
A third consideration is that these two effects do not occur in isolation; because they reflect responses in opposing directions, they may offset each other. This is another difficulty in searching for their occurrence in single cross-sectional surveys, and it was in fact the main evidence that the public pollsters used to refute bandwagon and underdog effects in the early defense of their work. The historical record bears this out: the differences between final pre-election poll estimates at the national level and the popular vote for president have been very small, with an average deviation from the final election outcome of about 2 percentage points (excluding the 1948 election).
It should also be noted that fully specified models that predict candidate preference involve a large number of factors, a further complication for isolating published poll results as a cause. For all of these reasons, researchers interested in these phenomena have turned to alternative designs involving variations on experiments. The experimental approach has a number of advantages, including isolating exposure to poll results as the central causal factor, because randomization of subjects to various treatment groups and a control group makes all other things equal. An experimental design can also assess temporal order, verifying that candidate preference occurred (or changed) after exposure to the poll results. A well-designed experimental study requires many fewer subjects than the sample size for a survey-based design. At the same time, the kind of subjects used in many experiments, such as college students, can limit how well the findings generalize to the electorate.
BEHAVIORAL QUESTION
Behavioral questions are survey questions that ask about respondents' factual circumstances. They contrast with attitude questions, which ask about respondents' opinions. Typical behavioral questions target the respondent's household composition, sources of income, purchases, crime victimizations, hospitalizations, and many other autobiographical details. The Current Population Survey (CPS), for example, asks: "Have you worked at a job or business at any time during the past 12 months?"
Questions about events that are distant in time, that occurred frequently, or that can be confused with irrelevant ones are all subject to inaccuracy because of the burden they place on memory.
People's difficulty in recalling events, however, can lead them to adopt other strategies for answering behavioral questions. In deciding when an event happened, for example, respondents may estimate the time of occurrence using the date of a better-remembered neighboring event ("The burglary happened just after Thanksgiving, so it occurred about December 1"). In deciding how frequently a type of event happened, respondents may base their answer on generic information ("I usually go grocery shopping five times a month"), or they may remember a few incidents and extrapolate to the rest ("I went grocery shopping twice last week, so I probably went eight times last month"). These strategies can potentially compensate for recall problems, but they can also introduce error. In general, the accuracy of an answer to a behavioral question will depend jointly, and in potentially complex ways, on both recall and estimation.
Answers to behavioral questions, like those to attitude questions, can depend on details of question wording. Linguistic factors, including choice of words, grammatical complexity, and pragmatics, can affect respondents' understanding of the question and, in turn, the accuracy of their answers. Because behavioral questions sometimes probe frequencies or amounts, they can depend on the respondent's interpretation of adverbs of quantification, such as usually, normally, or typically ("How often do you usually/normally/typically go grocery shopping each month?"), or quantifiers of amounts, such as a lot, some, or a little (as in the NHIS example). Similarly, answers to these questions are a function of respondents' interpretation of the response alternatives. Respondents may assume, for example, that the response options reflect features of the population under study and base their response choice on this assumption.
Lance J. Rips
See also Measurement Error; Respondent-Related Error;
Satisficing; Telescoping
BEHAVIOR CODING
Behavior coding concerns the systematic assignment of codes to the overt behavior of interviewer and respondent in survey interviews.
Pretesting Questionnaires
If a particular question is often read incorrectly, this
may be due to interviewer error, but it may also be a
result of the wording of the question itself. Perhaps
the question has a complex formulation or contains
words that are easily misunderstood by the respondent. To prevent such misunderstandings, the interviewer may deliberately change the formulation of
the question.
To gain more insight into the quality of the questions, the behavior of the respondent should be coded too. Typical codes for respondent behavior include "asks repetition of the question," "asks for clarification," "provides uncodeable response" (e.g., "I watch television most of the days," instead of an exact number), or "expresses doubt" (e.g., "About six, I think," "I'm not sure"). Most behavior coding studies use codes both for the respondent and the interviewer. The number of different codes may range between 10 and 20.
Unlike evaluating interviewer performance, pretesting questionnaires by means of behavior coding
requires a pilot study conducted prior to the main data
collection. Such a pilot study should reflect the main
study as closely as possible with respect to interviewers and respondents. At least 50 interviews are
necessary, and even more if particular questions are
asked less often because of skip patterns.
Compared to other methods of pretesting questionnaires, such as cognitive interviewing or focus groups,
pretesting by means of behavior coding is relatively
expensive. Moreover, it primarily points to problems
rather than causes of problems. However, the results
of behavior coding are more trustworthy, because the
data are collected in a situation that mirrors the data
collection of the main study. Moreover, problems that
appear in the actual behavior of interviewer and respondent are real problems, whereas in other cases, for
example in cognitive interviewing, respondents may
report pseudo-problems with a question just to please
the interviewer.
Interviewer-Respondent Interaction
If one codes both the behavior of interviewer and
respondent and takes the order of the coded utterances
into account, it becomes possible to study the course of
the interaction. For example, one may observe from a
pretesting study that a particular question yields a disproportionately high number of suggestive probes from
the interviewer. Such an observation does not yield
much insight into the causes of this high number.
However, if one has ordered sequences of codes available, one may observe that these suggestive probes
almost invariably occur after an uncodeable response
to that question. After studying the type of uncodeable
response and the available response alternatives in more
detail, the researcher may decide to adjust the formulation of the response alternatives in order to decrease the
number of uncodeable responses, which in turn should
decrease the number of suggestive probes.
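The kind of sequence analysis described above can be automated once each utterance carries a code. The following Python sketch, using entirely hypothetical code labels and coded interviews, counts how often a suggestive probe directly follows an uncodeable response.

from collections import Counter

# One coded utterance sequence per interview for a single question
# (all labels here are hypothetical).
sequences = [
    ["question_read", "uncodeable_response", "suggestive_probe", "adequate_answer"],
    ["question_read", "adequate_answer"],
    ["question_read", "uncodeable_response", "suggestive_probe", "adequate_answer"],
    ["question_read", "request_clarification", "question_repeated", "adequate_answer"],
]

# Count ordered pairs of consecutive codes across all interviews.
transitions = Counter()
for seq in sequences:
    for prev, curr in zip(seq, seq[1:]):
        transitions[(prev, curr)] += 1

probes = sum(n for (prev, curr), n in transitions.items()
             if curr == "suggestive_probe")
after_uncodeable = transitions[("uncodeable_response", "suggestive_probe")]
print(f"{after_uncodeable} of {probes} suggestive probes follow an uncodeable response")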
In contrast, if the researcher merely looked at the sheer number of suggestive probes, he or she might never uncover the underlying cause of the problem.
In interviewer-monitoring studies, it may be sufficient to code the utterances of the interviewer only; moreover, the researcher may confine himself or herself to particular interviewer utterances, like question reading, probing, or providing clarification. Other types of utterances (for example, repeating the respondent's answer) are neglected. In pretesting studies, it is
sometimes decided to code only behavior of the
respondent. Also, in interaction studies, the researcher
may use a form of such selective coding, neglecting all utterances after the answer of the respondent
(e.g., if the respondent continues to elucidate the
answer, this would not be coded). Alternatively, each
and every utterance is coded. Especially in the case of
interaction studies, this is the most common strategy.
All these procedural decisions have time and cost
implications. Selective live coding is the fastest and
cheapest, while full audio-recorded coding using transcriptions is the most tedious and costly but also
yields the most information.
Wil Dijkstra
See also Cognitive Interviewing; Interviewer Monitoring;
Questionnaire Design
BENEFICENCE
The National Research Act (Public Law 93-348) of 1974 created the National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research, which, among other duties, was charged with the responsibility of identifying, articulating, and fully explaining those basic ethical principles that should underlie the conduct of biomedical and behavioral research involving human subjects.
It is the principle of beneficence, along with the principles of justice and respect for persons, that stands as the foundation upon which the government-mandated rules for the conduct of research (Title 45, Part 46, Subpart A, of the Code of Federal Regulations) have been created under the auspices of the U.S. Department of Health and Human Services, Office for Human Research Protections.
Jonathan E. Brill
See also Confidentiality; Ethical Principles
BIAS
Bias is a constant, systematic form or source of error, as opposed to variance, which is random, variable error. The nature and the extent of bias in survey estimates can arise from several distinct sources, which are discussed below.
Overview
Survey researchers often rely upon estimates of population statistics of interest derived from sampling the relevant population and gathering data from that sample. To the extent that the sample statistic differs from the true value of the population statistic, that difference is the error associated with the sample statistic. If the error of the sample statistic is systematic (that is, the errors from repeated samples using the same survey design do not balance each other out), the sample statistic is said to be biased. Bias is the difference between the average, or expected value, of the sample estimates and the target population's true value for the relevant statistic. If the sample statistic derived from an estimator is more often larger, in repeated samplings, than the target population's true value, then the sample statistic exhibits a positive bias. If the majority of the sample statistics from an estimator are smaller, in repeated samplings, than the target population's true value, then the sample statistic shows a negative bias.
Bias of a survey estimate differs from the error of a survey estimate because the bias of an estimate relates to the systematic and constant error the estimate exhibits in repeated samplings. In other words, simply drawing another sample using the same sample design does not attenuate the bias of the survey estimate. The error of a survey estimate, however, can take a different value each time a new sample is drawn.
Graphically, this can be represented by a bull's-eye target in which the center of the bull's-eye is the true value of the relevant population statistic and the shots at the target represent the sample estimates of that population statistic. Each shot at the target represents an estimate of the true population value from a sample using the same survey design. For any given sample, the difference between the sample estimate (a shot at the target) and the true value of the population (the bull's-eye) is the error of the sample estimate.
Estimation Bias
Estimation bias, or sampling bias, is the difference between the expected value, or mean of the sampling distribution, of an estimator and the true value of the population statistic. More specifically, if θ is the population statistic of interest and θ̂ is the estimator of that statistic that is used to derive the sample estimate of the population statistic, the bias of θ̂ is defined as:

Bias(θ̂) = E(θ̂) − θ.

The estimation bias of the estimator is the difference between the expected value of that statistic and the true value. If the expected value of the estimator, E(θ̂), is equal to the true value, then the estimator is unbiased.
Estimation bias is different from estimation, or
sampling, error in that sampling error is the difference
between a sample estimate and the true value of the
population statistic based on one sampling of the sample frame. If a different sample were taken, using the
same sample design, the sampling error would likely
be different for a given sample statistic. However, the
estimation bias of the sample statistic would still be
the same, even in repeated samples.
Often, a desirable property of an estimator is that it is unbiased, but this must be weighed against other desirable properties that a survey researcher may want, such as low sampling variance; a slightly biased estimator with small variance can yield a smaller total error than an unbiased but highly variable one.
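A small simulation can make the definition concrete. The sketch below, written in Python with invented parameters, compares two variance estimators in repeated samples: dividing by n produces a systematic (negative) estimation bias, while dividing by n - 1 does not.

import numpy as np

rng = np.random.default_rng(42)
true_var = 4.0
n, reps = 10, 100_000

# Repeated samples of size n drawn under the same design.
samples = rng.normal(0.0, np.sqrt(true_var), size=(reps, n))
biased = samples.var(axis=1, ddof=0)      # divides by n
unbiased = samples.var(axis=1, ddof=1)    # divides by n - 1

# Bias = E(estimator) - true value, approximated by averaging over samples.
print(biased.mean() - true_var)    # about -0.4: a constant negative bias
print(unbiased.mean() - true_var)  # about 0.0: unbiased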
Coverage Bias
Coverage bias is the bias associated with the failure
of the sampling frame to cover the target population.
If the sampling frame does not allow the selection
of some subset of the target population, then a survey
can be susceptible to undercoverage. If a sampling
frame enumerates multiple listings for a given member of the target population, then a survey can suffer
from overcoverage.
In the case of undercoverage, a necessary condition
for the existence of coverage bias is that there are
members of the target population that are not part of
the sampling frame. However, this is not a sufficient
condition for coverage bias to exist. In addition, the
members of the target population not covered by the
sampling frame must differ across the population statistic of interest in some nonignorable way from the
members of the target population covered by the sampling frame. To the extent that there is not a statistically significant nonignorable difference between the
members of the target population covered by the sampling frame and the members of the target population
not covered by the sampling frame, the coverage bias
is likely to be small, even in instances when there is
significant noncoverage of the population by the sampling frame.
If one defines the following:

μC = the population mean for the relevant variable for all members of the population covered by the sampling frame,

μNC = the population mean for the relevant variable for all members of the population not covered by the sampling frame, and

pC = the proportion of the target population covered by the sampling frame,
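then the coverage bias of a mean computed only from the covered portion of the population is commonly written as

Bias = (1 − pC)(μC − μNC),

which restates the conditions above: the bias shrinks toward zero either as the coverage proportion pC approaches 1 or as the covered and noncovered means converge.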
Nonresponse Bias
Nonresponse bias is the bias associated with the failure of members of the chosen sample to complete one or more questions from the questionnaire, or the questionnaire in its entirety. Item nonresponse involves sampled members of the target population who fail to respond to one or more survey questions. Unit nonresponse is the failure of sample members to respond to the entire survey. This can be due to respondents' refusals or inability to complete the survey, or to the failure of the researchers to contact the appropriate respondents to complete the survey.
Like coverage bias, to the extent that there is
not a statistically significant nonignorable difference
between the sample members who respond to the survey and the sample members who do not respond to
the survey, the nonresponse bias is likely to be small
(negligible), even in instances when there is significant item or unit nonresponse.
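A parallel expression is commonly used for unit nonresponse. If pR denotes the proportion of the sample that responds, and μR and μNR denote the means for responders and nonresponders, respectively, then the nonresponse bias of an estimate based only on responders is approximately

Bias = (1 − pR)(μR − μNR),

so the bias grows with both the nonresponse rate and the size of the respondent-nonrespondent difference.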
Survey researchers attempt to reduce unit nonresponse in several ways, including identifying interviewers who are skilled in overcoming refusals and training these interviewers to hone these skills further, and developing motivational incentive systems to coax reluctant respondents into participation.
Another approach that adjusts survey data to attempt
to account for possible nonresponse bias is the use of
post-stratified weighting methods, including the use of
raking adjustments. With these methods, auxiliary
information is used about the target population to bring
the sample, along selected metrics, in line with that
population. Imputation methods can also be used to
insert specific responses to survey questions suffering
from item nonresponse.
Measurement Bias
Measurement bias is the bias associated with the
failure to measure accurately the intended variable or
construct. The bias results from the difference
between the true value for what the question or questionnaire intends to measure and what the question or
questionnaire actually does measure. The source of
the bias can be the interviewer, the questionnaire, the
respondent, the mode of data collection, or a combination of all of these.
Measurement bias can be particularly difficult
to detect. The problem with detection stems from the
possibility that the bias can originate from so many possible sources. Respondents can contribute to measurement bias due to limitations in cognitive ability,
including recall ability, and due to motivational shortcomings in the effort required to answer the survey
questions properly. To combat measurement bias from
respondents, surveys can be designed with subtle redundancy in the questions asked for variables and constructs where the survey researcher suspects some
problem. This redundancy allows the researcher to
examine the survey results for each respondent to determine whether internal inconsistencies exist that would
undermine the data integrity for a given respondent.
The questionnaire can contribute to measurement
bias by having questions that inadequately address
or measure the concepts, constructs, and opinions that
make up the subject matter of the study. The questionnaire can also contribute to measurement bias if the question wording and order of questions impact the quality of respondents' answers. Typically, the amount of measurement bias introduced by the questionnaire will be difficult to gauge without controlled experimentation, such as split-ballot comparisons of alternative wordings or question orders.
BILINGUAL INTERVIEWING
Bilingual interviewing refers to in-person and telephone surveys that employ interviewers who have the ability to administer the survey in more than one language, depending on the respondent's preference.
Interviewer Support
Additionally, bilingual populations may show hesitation in answering particular questions that may not be
problematic for non-bilingual populations. For example, many Spanish-speaking respondents tend to routinely hesitate when asked to provide their names and
addresses. Each bilingual group may have its own set
of questions that are considered sensitive when
asked by an outsider (i.e., the survey interviewer).
Thus the interviewer will need to find ways to minimize respondent hesitation and reluctance in order to gain cooperation and complete the interview.
Translation Process
The quality of the bilingual questionnaire translation
will depend on the time and resources the survey
researcher can devote to the task. It is in the best interest of the survey researcher to provide the group that is
doing the translation with information on the background of the study, information about the questionnaire topics, country-of-origin statistics of the target
population, acculturation level of the target population,
effective words or phrases that may have been used in
prior studies, and the format in which the survey will
be conducted (i.e., phone, mail, in person, etc.). All of
this information provides the translators with the tools
to tailor the questionnaire translation to the bilingual
target population(s). The preferred method of translation is to allow at least two translators to independently
develop their own translated versions of the survey
questionnaire. Next, the two translators use their
independent versions to develop a single version and
review the new version with the project lead to make
sure the concepts have been conveyed correctly and
effectively. The team then finalizes the version for use
in bilingual interviewing pilot testing. Even though this process demands more time and resources than a single translation, it substantially improves the quality of the final instrument.
Interviewing
In order to interview bilingual populations, the survey research organization must employ bilingual interviewers and bilingual support staff who are fluent in all the languages in which respondents will be recruited and interviewed. Interviewers and support staff should be able to show mastery of the relevant languages, and their abilities (including their ability to speak English or the dominant language in which the survey will be administered) should be evaluated through use of a language skills test that measures spoken fluency, reading ability, and comprehension in the other language(s).
During data collection, it is important for interviewers
and support staff to be able to communicate with the
researchers and project supervisors to work together to
address any culturally specific problem that may arise.
Depending on the level of funding available to the
survey organization, there are a few areas of additional
training that are useful in improving bilingual staff
interviewing skills: listening techniques, language and
cultural information about bilingual respondents, and
accent reduction techniques.
The researcher may want to have bilingual interviewers trained to listen for important cues from the respondent, that is, the respondent's dominant language, level of acculturation, culture or country of origin, immigration status, gender, age, education level, socioeconomic status, individual personality, and situation or mood. The bilingual interviewer can use these cues proactively to tailor the survey introduction and address any respondent concerns, leading to a smooth and complete interview.
Survey researchers can provide interviewers with
information on language patterns, cultural concepts,
and cultural tendencies of bilingual respondents.
Understanding communication behavior and attitudes
can also be helpful in tailoring the introduction and
addressing respondent concerns. Survey researchers
need to train bilingual interviewers to use a standard conversational form of the foreign language,
remain neutral, and communicate in a professional
public-speaking voice. The use of a professional voice
helps reduce the tendency of both the interviewer and
respondent to judge social characteristics of speech,
especially when the interviewer has the same regional
language style as the respondent.
For those bilingual interviewers who will also be
conducting interviews in English but have trouble
with English consonant and vowel pronunciation, a
training module that teaches accent reduction will
help the interviewer produce clearer speech so that
English-language respondents do not have to strain to
understand.
Kimberly Brown
See also Fallback Statements; Interviewer Debriefing;
Interviewer Training; Language Barrier; Language
Translations; Nonresponse Bias; Questionnaire Design;
Respondent-Interviewer Rapport; Sensitive Topics
BIPOLAR SCALE
Survey researchers frequently employ rating scales to
assess attitudes, behaviors, and other phenomena having a dimensional quality. A rating scale is a response
format in which the respondent registers his or her
position along a continuum of values. The bipolar
scale is a particular type of rating scale characterized
by a continuum between two opposite end points. A
central property of the bipolar scale is that it measures
both the direction (side of the scale) and intensity
(distance from the center) of the respondents position
on the concept of interest.
The construction of bipolar scales involves numerous design decisions, each of which may influence
how respondents interpret the question and identify
their placement along the continuum. Scales typically
feature equally spaced gradients between labeled end
points. Data quality tends to be higher when all of the
gradients are assigned verbal labels than when some
or all gradients have only numeric labels or are unlabeled. Studies that scale adverbial expressions of
intensity, amount, and likelihood may inform the
researcher's choice of verbal labels that define relatively equidistant categories.
Both numeric and verbal labels convey information to the respondent about the meaning of the scale points. As shown in Figure 1, negative-to-positive numbering (e.g., −3 to +3) may indicate a bipolar conceptualization with the middle value (0) as a balance point. By contrast, low-to-high positive numbering (e.g., 0 to +7) may indicate a unipolar conceptualization, whereby the low end represents the absence of the concept of interest and the high end represents a great deal. The choice of gradient labels should therefore match whether the researcher intends a bipolar or a unipolar conceptualization.
[Figure 1: A bipolar satisfaction scale anchored by "Extremely dissatisfied" and "Extremely satisfied," with numeric gradient labels running from −3 to +3.]
BOGUS QUESTION
A bogus question asks about something that does not exist, in order to gauge how often respondents report familiarity with, or experience of, things they cannot actually know. In pre-election polls, for example, candidate name recognition is critical for understanding the intentions of voters; the name of a fictitious candidate can therefore be added to the list of real candidates the survey is asking about, to learn how many respondents claim to know the fictitious (bogus) candidate. Similarly, when people (e.g., teenagers) are asked about illegal substances they may have used in the past, it is advisable to add one or more bogus substances to the list in order to estimate the proportion of respondents who may well be answering the real survey questions erroneously.
Past experience has shown that in some cases as many as 20% of respondents answer affirmatively when asked whether they have ever heard about X before today, where X is something that does not exist. That is, these respondents do not merely answer that they are uncertain; they actually report "Yes," that they have heard of the entity being asked about. Past research has suggested that respondents with lower educational attainment are the most likely to answer affirmatively to bogus questions.
The data from bogus questions, especially if several
bogus questions are included in the questionnaire, can
be used by researchers to (a) filter out respondents who
appear to have answered wholly unreliably, and/or (b)
create a scaled variable based on the answers given to
the bogus questions and then use this variable as a covariate in other analyses. Researchers need to explicitly
determine whether or not the needs of the survey justify
the costs of adding bogus questions to a questionnaire.
When a new topic is being studied (that is, one that people are not likely to know much about), it is especially prudent to consider the use of bogus questions.
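As an illustration of the two uses described above, the short Python sketch below, with invented respondents and an arbitrary screening threshold, scores each respondent by the number of bogus items answered affirmatively and filters accordingly.

# Hypothetical answers to two bogus items per respondent.
respondents = {
    "r1": {"bogus_candidate": "no",  "bogus_substance": "no"},
    "r2": {"bogus_candidate": "yes", "bogus_substance": "yes"},
    "r3": {"bogus_candidate": "yes", "bogus_substance": "no"},
}

# (b) Scaled variable: the count of affirmative answers to bogus questions,
# usable as a covariate in later analyses.
bogus_score = {rid: sum(ans == "yes" for ans in answers.values())
               for rid, answers in respondents.items()}

# (a) Filtering rule (the threshold is the analyst's choice): drop anyone
# who affirmed every bogus item.
kept = [rid for rid, score in bogus_score.items() if score < 2]
print(bogus_score, kept)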
Paul J. Lavrakas
BOOTSTRAPPING
Bootstrapping is a computer-intensive, nonparametric
approach to statistical inference. Rather than making
assumptions about the sampling distribution of a statistic, bootstrapping uses the variability within a sample to estimate that sampling distribution empirically.
This is done by randomly resampling with replacement from the sample many times in a way that
mimics the original sampling scheme. There are various approaches to constructing confidence intervals
with this estimated sampling distribution, which can then be used to make statistical inferences.
Goal
The goal of statistical inference is to make probability statements about a population parameter, θ, from a statistic, θ̂, calculated from sample data drawn randomly from a population. At the heart of such analysis is the statistic's sampling distribution, which is the range of values it could take on in a random sample of a given size from a given population and the probabilities associated with those values. In the standard parametric inferential statistics that social scientists learn in graduate school (with the ubiquitous t-tests and p-values), a statistic's sampling distribution is derived using basic assumptions and mathematical analysis. For example, the central limit theorem gives
one good reason to believe that the sampling distribution of a sample mean is normal in shape, with an
expected value of the population mean and a standard
deviation of approximately the standard deviation of
the variable in the population divided by the square
root of the sample size. However, there are situations
in which either no such parametric statistical theory
exists for a statistic or the assumptions needed to
apply it do not hold. In analyzing survey data, even
using well-known statistics, the latter problem may
arise. In these cases, one may be able to use bootstrapping to make a probability-based inference to the
population parameter.
Procedure
Bootstrapping is a general approach to statistical
inference that can be applied to virtually any statistic.
The basic procedure has two steps: (1) estimating the sampling distribution of the statistic empirically, by computing the statistic on many resamples drawn with replacement from the observed sample, and (2) using that empirical distribution to construct confidence intervals or otherwise make inferences about the population parameter.
Table 1 shows an original sample of 30 observations drawn from a standard normal distribution, N(0,1), together with three resamples of size 30 drawn from it with replacement; individual cases appear once, more than once, or not at all in any given resample. Its summary rows are:

         Original Sample (N(0,1))   Resample #1   Resample #2   Resample #3
Mean     0.282                      0.121         0.361         0.349
StDev    1.039                      1.120         1.062         1.147
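A minimal Python sketch of the procedure, under the simplifying assumption of a simple random sample and using the percentile method for the confidence interval, looks like this:

import numpy as np

rng = np.random.default_rng(1)
sample = rng.normal(size=30)            # stands in for the original sample

# Resample with replacement many times, mimicking the original sampling
# scheme, and compute the statistic (here, the mean) on each resample.
B = 2000
boot_stats = np.array([
    rng.choice(sample, size=sample.size, replace=True).mean()
    for _ in range(B)
])

# A 95% percentile interval from the empirical sampling distribution.
lower, upper = np.percentile(boot_stats, [2.5, 97.5])
print(lower, upper)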
Useful Situations
There are two situations in which bootstrapping is most
likely to be useful to social scientists. First, the bootstrap may be useful when making inferences using a
statistic that has no strong parametric theory associated
with it, such as the indirect effects of path models,
eigenvalues, the switch point in a switching regression,
or the difference between two medians. Second, the
bootstrap may be useful for a statistic that may have
strong parametric theory under certain conditions, but
those conditions do not hold. Thus, the bootstrap may
be useful as a check on the robustness of parametric
statistical tests in the face of assumption violations.
Christopher Z. Mooney
See also Confidence Interval; Dependent Variable;
Independent Variable; Relative Frequency; Simple
Random Sample
BOUNDING
Bounding is a technique used in panel surveys to reduce
the effect of telescoping on behavioral frequency
reports. Telescoping is a memory error in the temporal
placement of events; that is, an event is remembered,
but the remembered date of the event is inaccurate.
This uncertainty about the dates of events leads respondents to report events mistakenly as occurring earlier or
later than they actually occurred. Bounding reduces telescoping errors in two ways. First, bounding takes
advantage of the information collected earlier to eliminate the possibility that respondents report events that
occurred outside a given reference period. Second,
bounding provides a temporal reference point in respondents' memory, which helps them correctly place an event in relation to that reference point.
A number of specific bounding procedures have
been discussed in the survey literature. The bounding
interview procedure was first developed by John Neter
and Joseph Waksberg in the 1960s in a study of
recall of consumer expenditures (they called it "bounded recall"). The general methodology involves completing
an initial unbounded interview in which respondents are
asked to report events that occurred since a given date.
In the subsequent bounded interviews, the interviewer
tells the respondents the events that had been reported
during the previous interview and then asks for additional events occurring since then. In other words, the
information collected from each bounded interview is
compared with information collected during previous
interviews to ensure that the earlier reported events are
not double counted.
For example, suppose panel respondents are interviewed first in June and then in July. The June interview is unbounded, where respondents are asked to
report events that occurred in the previous month.
The July interview is bounded. Interviewers would
first inform respondents of the data they had provided
in June and would then inquire about events that happened since then. Often the data from the initial unbounded interview are not used for estimation but serve only to bound the reports collected in the subsequent interviews.
BRANCHING
Branching is a questionnaire design technique used
in survey research that utilizes skip patterns to ensure
that respondents are asked only those questions that
apply to them. This technique allows the questionnaire to be tailored to each individual respondent so
that respondents with different characteristics, experiences, knowledge, and opinions are routed to applicable questions (e.g., questions about a treatment for
diabetes are asked only of respondents who have been diagnosed with diabetes).
BUSIES
Busies are a survey disposition that is specific to
telephone surveys. They occur when the interviewer
or a predictive dialer dials a number in the sampling
pool and encounters a busy signal. Busies can be considered a positive outcome because they often indicate
(a) that the telephone number is in service, and
(b) that a person likely can eventually be reached at
the number.
Busies can usually be considered a temporary disposition code because the presence of a busy signal
is not sufficient to establish whether the respondent or
household is eligible for the survey (i.e., busies are
cases of unknown eligibility). As a result, it is important to have the interviewer redial the number. One
common sample management strategy is to have the
number redialed immediately, thus ensuring that the
number was dialed correctly and making it possible to
reach the person using the phone if he or she was in
the process of finishing the call. However, depending on the organization's calling rules, the redial may instead be scheduled for a later time in the field period.

C
CALLBACKS

Callbacks are a survey disposition that is specific to telephone surveys. They are a common temporary survey disposition because fewer than half of all completed interviews occur on the first dialing of a case. Callbacks happen for a number of reasons. For example, an interviewer might dial a telephone number in the sampling pool and be told that the designated respondent is not available to complete the interview at the time of the call. In other cases, the interviewer might reach the designated respondent but learn that he or she would prefer to complete the interview at another time. A callback might also occur if an interviewer dials a telephone number and reaches an answering machine or a voicemail service. Callbacks are considered a positive outcome because they usually indicate that the household or designated respondent is eligible and that an interview is likely to be completed with the respondent if the interviewer is able to reach him or her at a good time. Cases coded with the callback disposition usually are considered eligible cases in calculating survey response rates because the interviewer has been able to determine that the household or designated respondent meets the qualifications set by the survey researcher for completing the interview.

Callbacks can occur for multiple reasons, and as a result the callback disposition often is further categorized into a general callback disposition and a specific callback disposition. In a general callback, the interviewer learns that the designated respondent is not available at the time of the call but does not learn anything that would help him or her determine the best time to reach the designated respondent. In other cases coded with a general callback disposition, the interviewer may obtain some information about when to next make a call attempt on the case (such as "evenings only" or "before 2:30 p.m.") but is not able to make an appointment to contact the designated respondent at a definite day or time. In a specific callback, however, the interviewer learns enough to set a definite day and time for the next call attempt (such as "appointment set for 2:30 p.m. tomorrow"). Aside from learning the day and time for subsequent call attempts, interviewers also should attempt to obtain other information that might increase the chances of converting the callback into a completed interview. This information might include the name and/or gender of the designated respondent, or any other information that might help the interviewer reach the designated respondent on subsequent call attempts.

Because cases coded with the callback disposition are eligible and continue to be processed in the sampling pool, information learned during previous call attempts about when to contact the designated respondent can be used to better target subsequent call attempts by the interviewer. For a specific callback, additional call attempts should occur at the appointment time set by the respondent; additional call attempts on a general callback in which little is known might be made at a variety of other days and times in order to increase the chances of reaching the designated respondent and/or to learn more about how to target additional call attempts.
Matthew Courser
CALL FORWARDING
Call forwarding is a feature on most U.S. and international telephone networks that allows an incoming
call to be redirected to one or more other telephone
numbers as directed by the subscriber. This feature is
popular with individuals who want or need to be
reached when they are not at home or want to avoid the
delays inherent in answering machines and voicemail. The use of call forwarding features can cause
problems for telephone survey researchers. When an
incoming call has been forwarded to another location,
the called party may be less willing to participate in
a survey at that location. When a call is forwarded to
a cell phone in the United States, the called party will
incur a cost in terms of dollars or minutes and may be
in a location or other circumstance that is incompatible
with survey participation.
Standard call forwarding transfers all calls from
phone number A to phone number B. Special types of
call forwarding are also available. Call forwarding
can automatically route calls that are not answered
within a designated number of rings or when the line
is busy to another telephone number. Finally, call
CALLING RULES
Telephone survey researchers often utilize a set of
guidelines (or calling rules) that dictate how and when
a sample unit should be contacted during the survey's field period. These rules are created to manage the
sample with the goal of introducing the appropriate
sample elements at a time when an interviewer is most
likely to contact a sample member and successfully
complete an interview. In telephone surveys, calling
rules are typically customized to the particular survey
organization and to the particular survey and should be
crafted and deployed with the survey budget in mind.
Calling rules are a primary mechanism that researchers can use to affect a survey's response rate. All else equal, making more dialing attempts will lower noncontact-related nonresponse, thereby yielding a higher response rate. In general, the more call attempts placed to a telephone number, the more likely someone will eventually answer the phone, thereby giving the survey organization's interviewers the opportunity to try to complete an interview. However, the trade-off to making more and more phone calls is the additional cost incurred with each call, both in terms of interviewers' labor and the toll charges related to the calls.
Since all surveys have finite budgets and resources
that must be allocated for dialing attempts, resources
allocated for these purposes cannot be used for other
important purposes, such as additional questionnaire
testing or development or gathering data from larger
sample sizes. This competition for survey resources,
along with the tension between achieving higher
response rates with more calls made and the added
expenditure of these additional call attempts illustrates
the importance of a well-thought-out approach to the
development and implementation of calling rules to
manage a telephone survey sample.
When examining calling rules, an important distinction is often made between first call attempts to a sample unit and subsequent call attempts, which can draw on what was learned from earlier dialings.
Ranked Category
In the case of ranked category calling rules, the sample is categorized into independent (nonoverlapping)
cohorts, based on sample member characteristics and/
or previous call outcomes, and then ranked in order of
the most likely categories to lead to a contacted sample member. For example, a simple ranked category
calling rules system might suggest that previously
reached sample members, answering machines, and
ring-no-answers are categorized as such and then
should be called in that order. More complicated
ranked category systems would classify the sample
into more specialized categories and, therefore, have
more elaborate calling rules to process the sample. As
an example, for sample members who have yet to be
contacted, categories could be created that take into
account the time and day that previous calls had been
made. Calling rules could then dictate that future calls
should be made at times and days on which previous
calls had not been attempted. Once a call attempt is
made under a ranked category calling rules system,
assuming that the sample member remains part of the
active sample, the information gained from the last
call is incorporated into the information set for that
sample member. This additional information collected
from the last call is used to recategorize the sample
member, possibly into a different sample category.
Ranked category calling rules can be implemented using computer-assisted telephone interviewing
(CATI), but they can also be implemented without
the use of computers, making them an effective
means by which to control and process the sample.
However, a drawback to the use of ranked category
calling rules is the multitude of different categories
that may be needed and the correspondingly elaborate system of calling rules that must be developed to rank these categories.
Priority Scoring
Priority scoring calling rules differ from ranked category calling rules in that, with priority scoring, it is not
necessary to categorize the sample into discrete, nonoverlapping categories. Instead, the information collected for each sample member is used in a multivariate
model, typically a logistic regression model, to estimate
the probability of the next call attempt leading to a contact and/or completion, conditioned on relevant information. Using the estimated coefficients from this
multivariate model, the probability of contact or completion can be calculated for any possible permutation
of the conditioning information set. These probabilities
are then used to order the sample, from the highest
probability calls to the lowest, with the highest probability calls being made first.
For example, a sample member who has been called
three times previously, once in the afternoon and twice
in the evening, with the outcomes of one ring-no-answer, one busy signal, and one callback, may have
a contact probability of 0.55 if the next call attempt is
placed in the evening. Another sample member who
has been called five times previously, once in the
morning, twice in the afternoon, and twice in the evening, with the outcomes of three ring-no-answers, one busy signal, and one callback, may have a contact probability of 0.43 if the next call attempt is placed in the
evening. Although both contact probabilities indicate
a fairly high likelihood of reaching these sample members in the evening, the contact probability for the first
sample member is higher, so that priority scoring calling rules would rank that sample member higher in the
calling queue.
Once the call attempt is made, assuming that the
sample member continues to be part of the active
sample, the information gained from this call attempt
updates the sample member's information set. This updated information is used to calculate an updated contact probability, which in turn re-ranks the sample member in the calling queue.
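The following Python sketch illustrates the idea with entirely invented call-history features and outcomes from a hypothetical prior survey; a real implementation would fit the logistic regression to actual paradata.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
n = 500
# Invented features: number of prior calls, evening-call indicator,
# prior-callback indicator.
X = np.column_stack([
    rng.integers(0, 6, n),
    rng.integers(0, 2, n),
    rng.integers(0, 2, n),
])
contacted = rng.integers(0, 2, n)       # outcomes from a prior, similar survey

# Multivariate model estimating the probability of contact.
model = LogisticRegression().fit(X, contacted)

# Score the active sample and dial the highest-probability numbers first.
p_contact = model.predict_proba(X)[:, 1]
calling_queue = np.argsort(-p_contact)
print(calling_queue[:10])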
Conditioning Information
In order to develop ranked category calling rules or
priority scoring calling rules, some prior understanding
of the likelihood of contacting sample members, given
the conditioning information, must be available. Typical
conditioning information that is used can be classified
into external information about sample members (for example, demographics, or telephone number and exchange information) and call history information about sample members. Call history information that has been
used for initial calls includes the time of day and the
day of the week the first call is made. Call history
information that has been used for subsequent calls
includes not only the information used for first calls
but also the number of previous calls that have been
made, the length of time between the last call and the
next call, the disposition of the previous call, the entire
history of call dispositions, and the time and days that
previous calls were made.
Typically, previous survey experience governs not
only the use of conditioning information either to categorize or to score the sample, but also how this conditioning information impacts the contact probabilities.
To the extent that the population for a survey has been
studied before, the conditioning information from that prior survey can be used to develop calling rules for subsequent surveys of that same population. However, when the survey researcher is studying a population for the first time, the only avenue may be to adapt calling rules developed for similar populations and refine them as call outcomes accumulate during the field period.
CALL-IN POLLS
A call-in poll is an unscientific attempt to measure
public preferences by having radio or television audience members or newspaper readers call a telephone
number and register their opinions. Usually a single
question is posed, and people are asked to call one
phone number in support of a viewpoint and another
number in opposition. Call-in polls are used by some
media organizations as a way to measure public opinion and get the audience involved. But they are very
problematic from a data quality standpoint and should
not be referred to as polls.
A major problem with call-in polls is that the participants are entirely self-selected. Only those people
who tuned in to that particular broadcast at that time,
or read that newspaper, can be included. Further, those
who make the effort to participate are often very different from those who do not. That is because participants
are usually more interested in the topic or feel very
strongly about it. For these reasons, survey researcher
Norman Bradburn of the University of Chicago coined
the term "SLOP," which stands for "self-selected listener opinion poll," to refer to call-in polls.
Another big problem is that call-in polls are open to
manipulation by any individual or group with a vested
interest in the topic. With no limit on the number of
calls that can be placed, people can call multiple times
and groups can set up more elaborate operations to
flood the phone lines with calls in support of their point
of view. As a result, call-in polls often produce biased
results, and their findings should be ignored. Legitimate survey researchers avoid the types of bias inherent in call-in polls by selecting respondents using
probability sampling techniques.
There are many examples of call-in polls producing distorted results. In one famous example, USA
Today conducted a call-in poll in 1990 asking its
readers whether Donald Trump "symbolizes what is wrong with the United States" or "symbolizes what makes the United States great." USA Today reported overwhelming support for Trump, with 81% of calls saying he symbolizes what makes the United States great. Later, USA Today investigated the results and
found that 72% of the 7,802 calls came from a company owned by a Trump admirer.
Another example comes from a 1992 CBS television program called America on the Line, where
viewers were asked to call in and register their opinions after President George H. W. Bush's State of the Union address. The views registered in the approximately 317,000 calls that were tallied were much more pessimistic about the economy than what was measured in
a traditional scientific poll conducted by CBS News
at the same time. For example, 53% of those who
called in to the program said their personal financial
situation was worse than 4 years ago, compared with
32% in the scientific poll. The views of those who
called in were quite different from those of the general public on a number of measures.
Although those with survey research training know that call-in polls should not be taken seriously, many members of the public do not make a distinction between these unscientific pseudo-polls and legitimate surveys based on probability sampling.
CALL SCREENING
Call screening is a practice in which many people
engage whereby they listen to an incoming message on
their answering machine or look on their caller ID to
see who is calling before deciding whether or not to
answer the call. This behavior is thought to negatively
affect survey response rates. Over time, respondents
have become increasingly unwilling to participate in
surveys or even answer unsolicited telephone calls.
This desire for privacy has resulted in legislation such
as do-not-call lists and the use of a variety of technological barriers such as answering machines, caller ID,
and call blocking to screen incoming calls. These
screening devices allow individuals to determine which
calls they will answer, making it more difficult for
researchers to contact them. Further, individuals who
always screen may also be more likely to refuse to participate if and when they are contacted.
More than two thirds of U.S. households have
answering machines, and about 18% report always
using their answering machine to screen calls. Telephone companies improved on the answering machine
as a screening device with the development of caller
ID technology. This service displays the caller's name and/or telephone number on a person's phone or caller ID device. It is estimated that more than half of all
U.S. households now have caller ID and that nearly
30% always use it to screen calls. Call-blocking services that allow subscribers simply to reject certain
numbers or classes of numbers are also growing in
popularity.
Owners of these devices and those who regularly
use them to screen calls have been shown to be demographically different from the general population. It
is not always easy to identify a screening household,
particularly if the dialing always results in a noncontact.
CALL SHEET
A call sheet is a record-keeping form that is used by
telephone survey interviewers to keep track of information related to the calls they make to reach survey
respondents. As paper-and-pencil interviewing (PAPI)
was replaced by computer-assisted telephone interviewing (CATI), these call sheets moved from being printed
on paper to being displayed on the interviewer's computer monitor. The name "call sheet" refers to the days when thousands of such call
sheets (each one was a piece of paper) were used to
control sampling for a single telephone survey.
The information that is recorded on a call sheet
(also called paradata) captures the history of the
various call attempts that are made to a sampled telephone number. Typically these forms are laid out in
matrix format, with the rows being the call attempts
and the columns being the information recorded about
each call. For each call attempt, the information
includes (a) the date; (b) the time of day; (c) the outcome of the call (disposition), for example, ring-no-answer, busy, disconnected, completed interview, and
so on; and (d) any notes the interviewer may write
about the call attempt that would help a subsequent
interviewer and/or a supervisor who is controlling the
sample, for example, "The respondent is named Virginia and she is only home during daytime hours."
Since most telephone interviews are not completed on the first calling attempt, the information that interviewers record about what occurred on previous call attempts is invaluable for processing the sample efficiently and effectively.
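The record layout described above maps naturally onto a simple data structure. The Python sketch below is a hypothetical illustration of fields (a) through (d), not the format of any particular CATI system.

from dataclasses import dataclass, field

@dataclass
class CallAttempt:
    date: str          # (a) date of the attempt
    time: str          # (b) time of day
    disposition: str   # (c) outcome, e.g., "ring-no-answer", "busy", "complete"
    notes: str = ""    # (d) interviewer notes to guide later attempts

@dataclass
class CallSheet:
    phone_number: str
    attempts: list[CallAttempt] = field(default_factory=list)

# One row is appended per call attempt, mirroring the matrix layout.
sheet = CallSheet("555-0123")
sheet.attempts.append(CallAttempt("2008-03-01", "18:40", "ring-no-answer"))
sheet.attempts.append(CallAttempt("2008-03-02", "10:15", "callback",
                                  "Respondent named Virginia; home daytime only"))
print(len(sheet.attempts))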
It is through the use of the call outcome information recorded on the call sheet (described in detail in the American Association for Public Opinion Research's Standard Definitions) that the sample is
managed. In the days when PAPI surveys were routinely conducted and the call sheets were printed on
paper, supervisory personnel had to sort the call
sheets manually in real time while interviewing was
ongoing. When a questionnaire was completed, the
interviewer manually stapled the call sheet to the top
of the questionnaire and then the supervisor removed
that case from further data collection attempts. For
call sheets that did not lead to completed interviews
but also did not reach another final disposition (e.g.,
disconnected or place of business), the supervisor followed a priori calling rules to decide when next to
recycle a call sheet for an interviewer to try dialing it
again. With the shift to CATI and computer control of
the sampling pool (i.e., the set of numbers being
dialed), all this processing of the information recorded
on call sheets has been computerized. The CATI software serves up the call sheet on the interviewer's
monitor at the end of the call for pertinent information
to be entered. That information drives other logic in
the CATI software that determines whether, and
when, to serve up the telephone number next to an
interviewer. The information captured on the call
sheet is used for many other purposes after the survey
ends, including helping to create interviewer performance metrics and calculating survey response rates.
Paul J. Lavrakas
See also Callbacks; Calling Rules; Computer-Assisted
Telephone Interviewing (CATI); Interviewer Productivity;
Paper-and-Pencil Interviewing (PAPI); Paradata;
Response Rates; Sampling Pool; Standard Definitions;
Telephone Surveys
CAPTURE-RECAPTURE SAMPLING
Capture-recapture sampling (also referred to as capture-mark-recapture sampling or mark-release-recapture sampling) is a method used to estimate the unknown size of a population. In practice, it is often not feasible to count every individual element in a population because of time, budget, or other constraints, and in many situations capture-recapture sampling can produce a statistically valid estimate of
a population size in a more efficient and timely manner
than a census.
The most basic application of capture-recapture sampling consists of two stages. The first stage involves drawing (or capturing) a random sample of elements from a population of unknown size, for example, fish in a pond. The sampled elements are then marked, or tagged, and released back into the population. The second stage consists of drawing another random sample of elements from the same population. The second-stage sample must be obtained without dependence on the first-stage sample. Information from both stages is used to obtain an estimate of the population total.
The capture-recapture technique assumes that the ratio of the total number of population elements to the total number of marked elements is equal, in expectation, to the ratio of the number of second-stage sample elements to the number of marked elements in that sample. Letting N denote the unknown population size, C the number of elements captured and marked in the first stage, n the size of the second-stage sample, and R the number of marked elements found in the second-stage sample, this relationship can be expressed as follows:

N/C = n/R. (1)

Solving for N yields an estimator of the population total:

N̂ = nC/R. (2)
Example
A classic example comes from the field of ecology.
Suppose the goal is to estimate the size of a fish population in a pond. A first-stage sample of 20 fish is
drawn, tagged, and released back into the pond. A
second-stage sample of 30 fish is subsequently drawn.
Tags are found on 12 of the 30 sampled fish, indicating that 12 fish captured in the first sample were
recaptured in the second sample. This information can
be used to assign actual quantities to the variables of
interest in Equation 1, where n = 30, C = 20, and
R = 12. Solving for N using Equation 2 yields the
following estimate of the population total:
N̂ = nC/R = (30 × 20)/12 = 50.
Therefore, the estimated size of the pond's fish
population is 50. A more stable estimate of the population total, subject to less sampling variability, can be
obtained if multiple second-stage samples are drawn,
and estimated totals, computed from each sample, are
averaged together.
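The arithmetic of the example, and the averaging of estimates across multiple second-stage samples, can be expressed in a few lines of Python; the (n, R) pairs below are invented for illustration.

def capture_recapture_estimate(n, C, R):
    """Equation 2: estimated population size N = nC / R."""
    return n * C / R

# The fish-pond example: n = 30 sampled, C = 20 marked, R = 12 recaptured.
print(capture_recapture_estimate(n=30, C=20, R=12))   # 50.0

# Averaging over several second-stage samples for a more stable estimate.
second_stage = [(30, 12), (30, 10), (30, 15)]          # invented (n, R) pairs
estimates = [capture_recapture_estimate(n, 20, R) for n, R in second_stage]
print(sum(estimates) / len(estimates))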
Assumptions
In order for the capturerecapture sampling technique
to produce a valid estimate of a population size, three
assumptions must hold:
1. Every population element has an equal probability
of being selected (or captured) into both samples.
2. The ratio between marked and unmarked population elements remains unchanged during the time
interval between samples.
3. Marked elements can be successfully matched from
first-stage sample to second-stage sample.
Further Readings
Chao, A. (1987). Estimating the population size for capture-recapture data with unequal catchability. Biometrics, 43(4), 783–791.
Hogan, H. (1993). The 1990 Post-Enumeration Survey: Operations and results. Journal of the American Statistical Association, 88, 1047–1060.
Le Cren, E. D. (1965). A note on the history of mark-recapture population estimates. Journal of Animal Ecology, 34(2), 453–454.
Shapiro, S. (1949). Estimating birth registration completeness. Journal of the American Statistical Association, 45, 261–264.
CASE
The term case refers to one specific element in the
population of interest that has been sampled for a survey. A completed case contains the responses that
were provided by that respondent for the questionnaire
used in that survey. A case may be an individual,
a household, or an organization. Being able to identify
each individual respondent can be critical for the conduct of the survey. Assignment of a unique case number identifier associated with each individual sampled
element should be done in every survey. Although most
computer-assisted surveys assign a respondent number,
it should not be confused with assignment of a case
number. As a general rule, case numbers are assigned
before a questionnaire is distributed, while respondent
numbers are assigned when a respondent is contacted
and an attempt is made to complete the survey.
Prior to data collection, a simple case number may
be assigned sequentially to each questionnaire before
being distributed for completion. The case number
can also be used to identify any number of background characteristics of the individual or household to which the survey was distributed, such as census block, zip code, or apartment or single-family home.
Assignment of a case number should not be used to
compromise the confidentiality of either those who
complete the survey or the information they provide.
During data processing, the case number can be used
to assist in coding open-ended responses and conducting edit checks on the data set, such as verifying
information that is outside the normal response range
or that is inconsistent with other data in the case
record. In those designs for which respondents may
be contacted at a future date, the unique case number
can be used to ensure that responses to future surveys
are linked to the correct respondent.
Dennis Lambries
See also Coding; Completion Rate; Element; Respondent
CASE-CONTROL STUDY
Case-control studies measure the association between
the exposure to particular risk factors and the occurrence of a specific disease. These types of studies are
common in public health and medical research. The
basic premise of such studies is the comparison of
two groups: cases, individuals who have a particular
disease of interest to the researcher, and controls,
who do not have the disease.
In case-control studies, individuals in the case group
are selected and matched to persons in the control
group on a common set of characteristics that are not
considered to be risk factors for the disease being studied. These characteristics are frequently demographic
variables such as age, gender, education, income, and
area of residence. Comparisons across the case-control
pairs are made, examining hypothesized risk factors for
a particular disease. For example, a case-control study
of heart disease among women may compare cases
and controls on their level of exposure to factors
thought to influence the risk of heart disease, such as
family history of heart disease, smoking, cholesterol,
high blood pressure, diet, and exercise. These differences are usually assessed using statistical tests.
Data for case-control studies is typically collected
by interviewing or surveying the cases and the controls. Individuals in both groups are asked the same
series of questions regarding their medical history and
exposure to factors that are considered to increase the
risk of developing the disease in question. Data may
also be collected from medical records.
The advantages of case-control studies include the following:

• Data collection does not typically require medical tests or other intrusive methods.
• The studies are typically inexpensive to conduct in comparison to other methods of data collection.
• They are good for examining rare diseases, because the investigator must identify cases at the start of the research rather than waiting for the disease to develop.
• Case-control studies allow for the examination of several risk factors for a particular disease at the same time.
As with all research studies, there are some significant disadvantages as well, including the following:

• Data on exposure and past history is subject to the individual's memory of events.
Katherine A. Draughon
See also Case; Control Group; Research Design
CELL PHONE ONLY HOUSEHOLD

A central concern for telephone survey researchers is whether persons living in cell phone only households differ on the survey variables of interest from persons living in households with landline telephones.
The National Health Interview Survey (NHIS)
provides the most up-to-date estimates regularly available from the U.S. federal government concerning the
prevalence and characteristics of cell phone only
households. This cross-sectional, in-person, household
survey of the U.S. civilian noninstitutionalized population, conducted annually by the National Center for
Health Statistics of the Centers for Disease Control
and Prevention, is designed to collect information on
health status, health-related behaviors, and health care
utilization. However, the survey also includes information about household telephones and whether anyone in the household has a working cell phone.
Approximately 40,000 household interviews are completed each year.
NHIS data permit an analysis of trends in the prevalence of cell phone only households in the United
States since 2003. The percentage of cell phone only
households doubled from 2003 to 2005, and as of
2006, approximately 11% of U.S. households were
cell phone only. The rate of growth in the size of this
population has not slowed, increasing at a compound
growth rate of more than 20% every 6 months. Cell
phone only households now compose the vast majority of non-landline households. More than 80% of
non-landline households have cell phone service in
the household, and this proportion also continues to
increase; the proportion was 62% during the first 6
months of 2003. This largely reflects the fact that the
percentage of households without any telephone service has remained unchanged, whereas the percentage
of cell phone only households has increased.
Since the NHIS began collecting data on cell
phone only households and the persons who live in
such households, the prevalence of cell phone only
adults has been greatest for adults 18–24 years of age,
adults renting their homes, and adults going to school.
Men are more likely than women to be living in cell
phone only households. Hispanic adults are slightly
more likely to be living in cell phone only households
than are non-Hispanic white adults or non-Hispanic
black adults. Adults living in the Midwest, South, or
West are more likely to be living in cell phone only
households than are adults living in the Northeast.
Adults living in urban households are more likely
than adults living in rural households to be in cell
phone only households.
CELL PHONE SAMPLING
Table 1   Cell phone subscriptions per 100 inhabitants, selected countries

Country        Subscriptions per 100 inhabitants
Australia      …
Austria        103
Belgium         93
Canada          53
Denmark        107
Finland        108
France          85
Germany        102
Greece         100
Hong Kong      131
Italy          124
Japan           79
Netherlands     97
Portugal       116
Russia          84
Spain          106
Sweden         105
Turkey          71
Taiwan          97
U.K.           116
U.S.            77
Diversification of Telephone Sampling Frames

As a result of the new presence of cell phone subscribers, the telephone subscriber universe as we know it is changing and can be best described in four parts: (1) cell phone only (CPO), (2) landline only (LLO), (3) cell and landline (C&L), and (4) no phone service of any kind (NPS), as depicted in Figure 1.

In Table 2, the distribution of the population within each of these four subsets is provided for several countries. These data were obtained via nationwide probability samples using face-to-face interviews. A common theme among industrialized countries is the continued rise in the number of inhabitants who fall into the cell phone only category; this increase poses a growing coverage challenge for surveys that sample from landline frames alone.
Figure 1   The household telephone universe: landline phone only households, landline and cell phone households, and cell phone only households
Table 2   Percentage distribution of the population by telephone service, selected countries

Country    Cell Only    Cell and Landline    Landline Only    No Phone    Month/Year
Canada        5.1             61.3                32.4           1.2       12/2006
Finland      52.2             44.3                 3.1           0.4       08/2006
France       16.7             61.6                20.8           1.0       07/2006
U.K.          9.0             84.0                 7.0           1.0       07/2007
U.S.         14.0             72.3                12.3           1.4       06/2007
As the surveying of cell phone numbers continues to evolve in the United States and worldwide, additional information will become available to confirm and possibly reform the current methods.
Trent D. Buskirk and Mario Callegaro
See also Calling Rules; Cell Phone Only Household;
Design Effect (deff); Dual-Frame Sampling; Federal
Trade Commission (FTC) Regulations; Hit Rate;
Latest-Birthday Selection; List-Assisted Sampling;
Mitofsky-Waksberg Sampling; Number Portability;
Prefix; Suffix Banks; Telephone Surveys; Weighting;
Within-Unit Selection
Further Readings
Brick, J. M., Dipko, S., Presser, S., Tucker, C., & Yuan, Y. (2006). Nonresponse bias in a dual frame sample of cell and landline numbers. Public Opinion Quarterly, 70, 780–793.
Callegaro, M., & Poggio, T. (2004). Espansione della telefonia mobile ed errore di copertura nelle inchieste telefoniche [Mobile telephone growth and coverage error in telephone surveys]. Polis, 18, 477–506. English version retrieved March 24, 2008, from http://eprints.biblio.unitn.it/archive/00000680
Callegaro, M., Steeh, C., Buskirk, T. D., Vehovar, V., Kuusela, V., & Piekarski, L. (in press). Fitting disposition codes to mobile phone surveys: Experiences from studies in Finland, Slovenia, and the United States. Journal of the Royal Statistical Society, Series A (Statistics in Society).
International Telecommunication Union. (2006). World telecommunication indicators database (9th ed.). Geneva: Author.
Kennedy, C. (2007). Evaluating the effects of screening for telephone service in dual frame RDD surveys. Public Opinion Quarterly, 71(5), 750–771.
Kuusela, V., Callegaro, M., & Vehovar, V. (2007). Mobile phones' influence on telephone surveys. In M. Brick, J. Lepkowski, L. Japec, E. de Leeuw, M. Link, P. J. Lavrakas, et al. (Eds.), Telephone surveys: Innovations and methodologies (pp. 87–112). Hoboken, NJ: Wiley.
Lavrakas, P. J., & Shuttles, C. D. (2005). Cell phone sampling, RDD surveys, and marketing research implications. Alert!, 43, 4–5.
Lavrakas, P. J., Shuttles, C. D., Steeh, C., & Fienberg, H. (2007). The state of surveying cell phone numbers in the United States: 2007 and beyond. Public Opinion Quarterly, 71(5), 840–854.
Lohr, S., & Rao, J. N. K. (2000). Inference from dual frame surveys. Journal of the American Statistical Association, 95, 271–280.
Steeh, C., Buskirk, T. D., & Callegaro, M. (2007). Using text messages in U.S. mobile phone surveys. Field Methods, 19, 59–75.
CELL SUPPRESSION
Under certain circumstances, it is considered necessary
to withhold or suppress data in certain cells in a published statistical table. This is often done when particular estimates are statistically unreliable or when the
information contained could result in public disclosure
of confidential identifiable information. Suppression for
reasons of statistical reliability involves consideration
of sampling error as well as the number of cases upon
which the cell estimate is based. Suppression to avoid
the disclosure of confidential information in tabular
presentations involves many additional considerations.
Cell suppression may involve primary suppression,
in which the contents of a sensitive cell are withheld;
or if the value for that cell can be derived from other
cells in the same or other tables, secondary or complementary suppression. In the latter instance, the contents
of nonsensitive cells as well those of the sensitive cells
are suppressed. Sensitive cells are identified as those
containing some minimum number of cases. In an
establishment survey, for example, a cell size of 2
would be regarded as sensitive because it could reveal
to one sample establishment (included in the tabulation
and knowing its contribution to an estimate reported in
the table) the value of a variable reported by another
establishment known to have participated in the survey.
Often, the minimum cell size for suppression is considerably higher than 2, depending upon such factors as
total sample size, sampling ratio, and potential harm to
survey participants resulting from disclosure.
Once sensitive cells have been identified, there are
some options to protect them from disclosure: (a)
restructure the table by collapsing rows or columns
until no sensitive cells remain, (b) use cell suppression,
(c) apply some other disclosure limitation method, or
(d) suppress the entire planned table.
When primary and complementary suppressions are
used in any table, the pattern of suppression should be
audited to check whether the algorithms that select the
suppression pattern permit estimation of the suppressed
cell values within too close of a range. The cell suppression pattern should also minimize the amount of
data lost as measured by an appropriate criterion, such
as minimum number of suppressed cells or minimum
total value suppressed. If the information loss from cell
suppression is too high, it undermines the utility of the
data and the ability to make correct inferences from
the data. Cell suppression does create missing data in
tables in a nonrandom fashion, and this harms the utility of the data.
In general, for small tables, it is possible to manually select cells for complementary suppression and to
apply audit procedures to guarantee that the selected
cells adequately protect the sensitive cells. However,
for large-scale survey publications having many interrelated, higher-dimensional tables, the selection of
a set of complementary suppression cells that are optimal is an extremely complex problem. Optimality in
cell suppression is achieved by selecting the smallest
number of cells to suppress (to decrease information
loss) while ensuring that confidential information is
protected from disclosure.
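The logic of primary suppression and of the audit it triggers can be sketched as follows (a deliberately simplified Python illustration; the threshold and counts are invented, and production systems rely on optimization routines):

    MIN_CELL = 3  # hypothetical sensitivity threshold

    table = [[12,  2, 40],
             [ 8, 25, 30]]

    # Primary suppression: flag any cell below the minimum size.
    flags = [[count < MIN_CELL for count in row] for row in table]

    # Audit: a row or column with exactly one suppressed cell lets a
    # user recover its value from the published marginal total, so a
    # complementary (secondary) suppression is required there.
    def exposes_a_cell(lines):
        return any(sum(line) == 1 for line in lines)

    needs_complementary = (exposes_a_cell(flags)
                           or exposes_a_cell(list(zip(*flags))))
    print(needs_complementary)  # True: the lone flagged cell is exposed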
Stephen J. Blumberg
See also Confidentiality; Disclosure Limitation
CENSUS
A census is an attempt to list all elements in a group
and to measure one or more characteristics of those
elements. The group is often an actual national population, but it can also be all houses, businesses, farms,
books in a library, cars from an assembly line, and so
on. A census can provide detailed information on all or
most elements in the population, thereby enabling
totals for rare population groups or small geographic
areas. A census and a sample survey have many features in common, such as the use of a questionnaire to
collect information, the need to process and edit the
data, and the susceptibility to various sources of error.
Unlike a sample survey, in which only a subset of the
elements is selected for inclusion and enumeration,
a census generally does not suffer from sampling error.
However, other types of errors may remain. The decision to take a census versus a sample survey, if not mandated by statute, is often based on an assessment of the coverage, cost, errors in the data, and other qualitative factors.
Aspects of a census include the types and historical
purposes for taking a census, its statistical properties,
the differences between a census and a sample survey,
and errors that can occur in a census.
General Background
Perhaps the most well-known type of census is one
that enumerates the population or housing characteristics of a specified country or other politically defined
region. Others measure the output in a specified sector
of the economy, such as agriculture, transportation,
manufacturing, or retail sales. These censuses are
typically authorized and funded by the central government of the region covered.
Censuses were first conducted hundreds (Canada,
Sweden) and even thousands (China) of years ago in
some parts of the world. In many countries, a census
is repeated in a fixed cycle, often every 5th (the
United Kingdom, Canada, Australia, New Zealand) or
10th (Portugal, Spain, Italy, Poland, Turkey) year. In
the United States, the census of population and housing has been conducted every 10th year, beginning
in 1790. The U.S. economic census is taken every
5th year.
Historically, the purpose of the census has varied.
At first, governing bodies wanted to know the number
of people for assessing taxes or determining the number of men eligible for the military. Currently, governments use census data to apportion their legislative
bodies, set boundaries for political districts, distribute
government funds for social programs, track the
nations economy, measure crops to predict food supplies, and monitor peoples commute to work to
determine where to improve the regions infrastructure. As a by-product, census lists of households, businesses, or farms are often used as frames for surveys
or follow-up studies. Further, the detailed information
collected in the census allows for more efficient sample designs and improved estimation in the surveys.
limited to the names, ages, and a few other characteristics of the people living in the household. At the same
time, a sample of about 1 in 6 U.S. households
received a long form that solicited the basic information as well as more detailed data on the residents' demographic and educational background, the housing unit's physical size and structure, and other characteristics. Plans for the U.S. Census in 2010 call for only
a short form. The detailed data formerly solicited in
the long-form census are now collected in the American Community Survey, a large survey conducted by
the U.S. Census Bureau designed to produce estimates
at the county level every year. In an economic census,
dozens of different forms may be used to tailor the
questions to specific types of business.
Traditionally, census takers went door to door asking questions, an approach still used in many countries, especially in the developing world. In the
developed world, one or several modes of enumeration may be used. People or businesses are often contacted by mail or in person, perhaps by telephone if
a current number is available. When no response is
received from a mailing, a census representative may
be sent to a housing unit or establishment to follow
up. Where feasible, especially when canvassing businesses, an electronic questionnaire might be provided
on a disk. In some censuses, respondents may be
encouraged to reply via the Internet.
Alternative or combination approaches can be used
to solicit or collect data. As an example, in the U.S.
Census of Retail Trade in 2002, all of the larger establishments and a sample of the smaller ones were
mailed a complete questionnaire. For the smaller firms
not selected into the sample, the basic economic information was collected through available tax records.
Such an approach can lessen the reporting burden of
the respondents and, in some cases, provide valuable
auxiliary data. However, combining alternative methods of data collection usually requires an examination
of several key aspects: the coverage of the population,
differences in the definitions of data items, the consistency of information collected via different modes, and
the accuracy of the data.
Errors in a Census
While the results from a census typically do not suffer from sampling error (those errors introduced by canvassing only a sample of the entire population), censuses are susceptible to the nonsampling errors found in sample surveys. A common problem is missing data.
CERTIFICATE OF CONFIDENTIALITY
In order to collect sensitive information, researchers
need to be able to ensure for themselves that identifiable research data will remain confidential and assure
respondents that this is the case. However, neither
legislatures nor courts have granted researchers an absolute privilege to protect the confidentiality of their
research data. Despite this, several federal protections for the confidentiality of research data are available, among them the Certificate of Confidentiality.
Certificates of Confidentiality allow the investigator and others who have access to research records to
refuse to disclose identifying information on research
participants in any civil, criminal, administrative, legislative, or other proceeding, whether at the federal,
state, or local level. Certificates of Confidentiality
may be granted for studies collecting information that,
if disclosed, could have adverse consequences for
subjects or damage their financial standing, employability, insurability, or reputation (such as drug use,
sexual behavior, HIV status, mental illness).
Research need not be federally supported to be eligible for this privacy protection. Certificates of Confidentiality are issued by various Public Health Service
component agencies, the Food and Drug Administration, the Health Resources and Services Administration, and the National Institutes of Health. Researchers
are expected to inform subjects in the consent form
about the Certificate of Confidentiality protections and
the circumstances in which disclosures would be made
to protect the subject and others from harm (such as
suicidal intention, child abuse, elder abuse, intention to
harm others) and certain types of federal audits.
There is very little legal precedent considering the
scope of the protections afforded by Certificates of
Confidentiality. However, in at least one case from
1973 (People v. Newman), a New York state court of
appeals found that a certificate provided a substance
abuse program with a proper basis for refusing to turn
over the names of program participants.
CHECK ALL THAT APPLY

What race or races are you? (Please check all that apply)
___ Asian
___ Black
___ Native American
___ Pacific Islander
___ White

Figure 1   Example of a check-all-that-apply question
Adam Safir
See also Forced Choice; Primacy Effect; Questionnaire
Design; Response Order Effects; Satisficing; Show Card
CHI-SQUARE
The chi-square (χ²) is a test of significance for categorical variables. Significance tests let the researcher know what the probability is that a given sample estimate actually mirrors the entire population. The chi-square statistic is computed as

χ² = Σ [(f_o − f_e)² / f_e],

where f_o is the observed frequency in a cell of a cross-tabulation and f_e is the frequency expected in that cell under the null hypothesis.
Support/Oppose U.S.
Involvement in Iraq     Female    Male    Total
Support                   170      200      370
Oppose                    250      150      400
Total                     420      350      770
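As a sketch, the chi-square statistic for the table above can be computed directly from the observed counts in plain Python (no statistical package is assumed):

    observed = [[170, 200],   # Support: female, male
                [250, 150]]   # Oppose:  female, male

    row_totals = [sum(row) for row in observed]        # 370, 400
    col_totals = [sum(col) for col in zip(*observed)]  # 420, 350
    grand_total = sum(row_totals)                      # 770

    chi_square = 0.0
    for i, row in enumerate(observed):
        for j, f_o in enumerate(row):
            # Expected count under independence of gender and opinion.
            f_e = row_totals[i] * col_totals[j] / grand_total
            chi_square += (f_o - f_e) ** 2 / f_e

    print(round(chi_square, 2))  # about 21.24, with 1 degree of freedom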
CLOSED-ENDED QUESTION
A closed-ended survey question is one that provides respondents with a fixed number of response options from which to choose their answer.
CLUSTERING
In broad terms, clustering, or cluster analysis, refers
to the process of organizing objects into groups whose
members are similar with respect to a similarity or
distance criterion. As such, a cluster is a collection of
similar objects that are distant from the objects of
other clusters. Unlike most classification techniques
that aim to assign new observations to one of the
many existing groups, clustering is an exploratory
procedure that attempts to group objects based on
their similarities or distances without relying on any
assumptions regarding the number of groups.
Applications of clustering are many; consequently,
different techniques have been developed to address
the varying analytical objectives. There are applications
(such as market research) in which clustering can be
used to group objects (customers) based on their behaviors (purchasing patterns). In other applications (such
as biology), clustering can be used to classify objects
(plants) based on their characteristics (features).
Depending on the application and the nature of data
at hand, three general types of data are typically used in
clustering. First, data can be displayed in the form of an O × C matrix, where C characteristics are observed on O objects. Second, data can be in the form of an N × N
similarity or distance matrix, where each entry represents a measure of similarity or distance between the
two corresponding objects. Third, data might represent
presumed group membership of objects where different
observers may place an object in the same or different
groups. Regardless of data type, the aim of clustering is
to partition the objects into G groups where the structure and number of the resulting natural clusters will be
determined empirically. Oftentimes, the input data are
converted into a similarity matrix before objects are
partitioned into groups according to one of the many
clustering algorithms.
It is usually impossible to construct and evaluate all
clustering possibilities of a given set of objects, since
there are many different ways of measuring similarity or dissimilarity among a set of objects. Moreover, similarity and dissimilarity measures can be univariate or multivariate.
With iterative partitioning algorithms, after each reassignment of an object the center points for the donating and receiving clusters are recalculated to identify the structure of the resulting clusters.
Aside from the algorithm chosen for clustering, several guidelines have been developed over the years
regarding the number of clusters. While a few of these
guidelines rely on visual clues such as those based on
sizable change in dendrograms, others incorporate formal
statistical tests to justify further bisecting of clusters. It
has been suggested that visual guidelines can be somewhat ad hoc and result in questionable conclusions.
Test-based approaches, on the other hand, might require
more distributional conformity than the data can afford.
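As one concrete illustration, an agglomerative (hierarchical) clustering of a small O × C data matrix can be run with SciPy (the data are invented, and Ward's method is only one of many available linkage criteria):

    import numpy as np
    from scipy.cluster.hierarchy import fcluster, linkage

    # An O x C matrix: 6 objects measured on 2 characteristics.
    X = np.array([[1.0, 2.0], [1.2, 1.9], [0.9, 2.2],
                  [8.0, 8.5], [7.8, 8.9], [8.3, 8.1]])

    # Build the hierarchy from pairwise distances (Ward's method).
    Z = linkage(X, method="ward")

    # Cut the dendrogram into 2 groups; as noted above, the number of
    # clusters is ultimately an empirical judgment call.
    labels = fcluster(Z, t=2, criterion="maxclust")
    print(labels)  # e.g., [1 1 1 2 2 2]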
Mansour Fahimi
See also SAS; Statistical Package for the Social
Sciences (SPSS)
CLUSTER SAMPLE
Unlike stratified sampling, where the available information about all units in the target population allows
researchers to partition sampling units into groups
(strata) that are relevant to a given study, there are situations in which the population (in particular, the sampling
frame) can only identify pre-determined groups or clusters of sampling units. Conducive to such situations,
a cluster sample can be defined as a simple random sample in which the primary sampling units consist of clusters. As such, effective clusters are those that are
heterogeneous within and homogeneous across, which is
a situation that reverses when developing effective strata.
In area probability sampling, particularly when
face-to-face data collection is considered, cluster samples are often used to reduce the amount of geographic
dispersion of the sample units that can otherwise result
from applications of unrestricted sampling methods,
such as simple or systematic random sampling. This is
how cluster samples provide more information per unit
cost as compared to other sample types.
COCHRAN, W. G. (1909–1980)
William Gemmell Cochran was an early specialist
in the fields of applied statistics, sample surveys,
99
experimental design, observational studies, and analytic techniques. He was born in Rutherglen, Scotland,
to Thomas and Jeannie Cochran on July 15, 1909,
and he died on Cape Cod, Massachusetts, on March
29, 1980, at the age of 70. In 1927, Cochran participated in the Glasgow University Bursary competition
and took first place, winning enough funds to finance
his education. After taking a variety of classes, he
was awarded an M.A. in mathematics and physics at
the University of Glasgow in 1931. He then received
a scholarship for a Cambridge University doctoral
program, where he studied mathematics, applied
mathematics, and statistics. He began his professional
career at the Rothamsted Experimental Station in England after being persuaded by Frank Yates to leave
Cambridge prior to the completion of his doctorate.
Cochran remained at Rothamsted until 1939, working
on experimental designs and sample survey techniques, including a census of woodlands with colleague
and mentor Yates. During his years at Rothamsted,
Cochran remained in touch with R. A. Fisher and was
heavily influenced by Fisherian statistics. In his 5 years
at Rothamsted (1934–1939), he published 23 papers.
Also during his time at Rothamsted, Cochran met and
married Betty I. M. Mitchell.
In 1939 Cochran accepted a post in statistics at Iowa
State University, where he taught from 1939 to 1946.
His task at Iowa was to develop their graduate program
in statistics. During his years at Iowa he both served on
and chaired the advisory panel to the U.S. Census and
published a number of papers on experimental design.
Cochran joined Samuel Wilks and the Statistical
Research Group at Princeton University in 1943, examining probabilities of hits in naval warfare and the efficacy of bombing raid strategies. Shortly after World
War II, he joined Gertrude Cox at the North Carolina
Institute of Statistics, where he assisted in developing
graduate programs in statistics. Cochran chaired the
Department of Biostatistics at Johns Hopkins University
from 1949 until 1957. During this time he authored two
books, Sampling Techniques and (in collaboration with
Gertrude Cox) Experimental Designs. In 1957 Harvard
University established a Department of Statistics and
appointed Cochran to head the department. Cochran
remained at Harvard until his retirement in 1976.
During his career, Cochran was lauded with many
honors. He was the president of the Institute of
Mathematical Statistics in 1946, the 48th president of
the American Statistical Association in 1953–1954, and president of the International Biometric Society in 1954–1955.
CODEBOOK
Codebooks are used by survey researchers to serve two main purposes: to provide a guide for coding responses and to document the contents and layout of the resulting data file.
CODER VARIANCE
Coder variance refers to nonsampling error that arises
from inconsistencies in the ways established classification schemes are applied to the coding of research
observations. In survey research, coder variance is
associated with the process of translating the raw or
verbatim data obtained from open-ended survey items
into a quantitative format that can be analyzed by
computers.
To appreciate how coder variance can occur, it
is useful to review the process of preparing open-ended survey item data for analysis.
CODING
Coding is the procedural function of assigning concise
and specific values (either alpha or numeric) to data
elements collected through surveys or other forms of
research so that these data may be quickly and easily
counted or otherwise processed and subjected to statistical analyses, most often using a computer. These
values may be alphanumeric in format, although it is
common practice to use entirely numeric characters or
entirely alphabetical characters when assigning labels.
Numeric character values are almost universally referred to as "numeric codes," while alphabetical character values (and sometimes alphanumeric labels) are commonly referred to in several fashions, including "strings," "string codes," and "alpha codes," among others.
Inasmuch as data processing and analysis is typically accomplished through the use of specialized
computer application software programs (e.g., Statistical Package for the Social Sciences [SPSS] or SAS),
the assignment of designated values permits data to
be transferred from the data collection instrument
(which itself may be an electronic system, such as
a computer-assisted telephone interviewing network)
into a compact, computer-readable, database form.
The process of value development and specification may occur at any of several points in time during
the conduct of the research project.
Precoding refers to code development and specification that occurs prior to the commencement of data
collection activities. Precoding is appropriate for those
data elements of the study where observations (e.g.,
respondent responses to survey questions) can be
anticipated and exhaustively (or nearly exhaustively)
specified before the research data are collected.
As such, in survey research, precoding is routinely
employed for all closed-ended items, all partly
closed-ended items, and certain open-ended questions
with which the investigator can anticipate the exhaustive range or set of possible responses. In addition,
precoding occurs naturally and virtually automatically
for open-ended items where clear constraints pertaining to the respondent's answer are implied by the question itself (for example, "How many times, if any, in the past year did you visit a dentist for any type of dental care?") and, for this reason, such questions are said to be "self-coding."
In contrast, postcoding refers to code development
and assignment that occur after data collection activities have begun. Most often, postcoding refers to code
development and specification procedures implemented after the completion of data collection. However,
to reduce the length of time between the data collection and subsequent data analysis activities of a study,
postcoding might be initiated during data collection
whenever a reliable subset of the full data set has
been collected or when there is prior experience with
similar questions.
Precoded labels are typically assigned in a manner
that coincides with the measurement level implied by the item. For postcoded items, categories typically must first be developed from the collected responses, with code labels then assigned to correspond to these categories. Then, once categories and corresponding labels have been established, item data for each interview are reviewed and
one or more of these code labels are assigned to represent the information that was collected.
Standard research practice is to document the
coded label values for each planned research observation (i.e., survey interview item) in a codebook. This
document is more than just a listing of coded values,
however; it is a blueprint for the layout of all information collected in a study. As such, the codebook not
only identifies the value assigned to each research
datum (i.e., survey answer, observation, or measurement) and the name of that value (i.e., the value
label), but it also documents each label's meaning,
specifies the name used to identify each item (i.e.,
variable name), includes a description of each item
(variable label), and defines the data structure and
reveals the specific location within that structure in
which coded label values are stored.
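The kind of information a codebook records for a single item can be sketched as a small data structure (purely illustrative; the variable name, column positions, and labels are invented):

    # A hypothetical codebook entry for one survey item.
    codebook_entry = {
        "variable_name": "DENTVIS",
        "variable_label": "Number of dental visits in the past year",
        "location": (45, 46),  # columns in a fixed-width data record
        "value_labels": {
            "00": "No visits",    # values 01-96 are self-coding counts
            "97": "Don't know",
            "98": "Refused",
            "99": "Missing",
        },
    }

    print(codebook_entry["variable_label"])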
Jonathan E. Brill
See also Closed-Ended Question; Codebook; Content
Analysis; Interval Measure; Nominal Measure;
Open-Ended Question; Ordinal Measure; Precoded
Question; Ratio Measure; SAS; Statistical Package
for the Social Sciences (SPSS)
Examples of Basic
CASM Research Studies
Some of this experimentation has concerned issues of
response order effects, or how the respondent's tendency to select a particular response category (e.g., choice of a vague quantifier such as "excellent," "very good," "good," "fair," "poor," or "very poor") may depend on
the order in which these options appear. Experiments
by Jon Krosnick and colleagues have determined that
response order effects depend on factors such as survey
administration mode, for reasons having a cognitive
basis. When response categories appear visually, as on
a self-administered instrument, a primacy effect is often
observed, where respondents are more likely to select
items early in the list, presumably due to motivational
factors such as satisficing that lead to fuller processing
of earlier items than later ones. On the other hand,
when the same response categories are read aloud under
interviewer administration, a recency effect is obtained,
in which later items in the list are more likely to be
selected. From a cognitive point of view, recency
effects are hypothesized to occur due to short-term
memory limitations, where the items read most recently
(those later in the list) are better represented in the
respondent's memory and are therefore favored.
As a further example of experimentally oriented
basic CASM research, Norbert Schwarz and colleagues
cited in Tourangeau et al. have considered the effects
of open-ended versus closed response categories for
questions that ask about the frequency and duration of everyday behaviors.
COGNITIVE INTERVIEWING
Cognitive interviewing is a psychologically oriented
method for empirically studying the ways in which
individuals mentally process and respond to survey
questionnaires. Cognitive interviews can be conducted
for the general purpose of enhancing the understanding of how respondents carry out the task of answering survey questions. However, the technique is more
commonly conducted in an applied sense, for the purpose of pretesting questions and determining how
they should be modified, prior to survey fielding, to
make them more understandable or otherwise easier
to answer.
The notion that survey questions require thought
on the part of respondents is not new and has long
been a central premise of questionnaire design. However, cognitive interviewing formalizes this process,
as it approaches the survey response task from the
vantage point of cognition and survey methodology
(CASM), an interdisciplinary association of cognitive
psychologists and survey methodologists. The cognitive interview is generally designed to elucidate four
key cognitive processes or stages: (1) comprehension
of the survey question; (2) retrieval from memory
of information necessary to answer the question;
(3) decision or estimation processes, especially relating to the adequacy of the answer or the potential
threat it may pose due to sensitive content or demands
of social desirability; and (4) the response process, in
which the respondent produces an answer that satisfies the task requirements (e.g., matching an internally
generated response to one of a number of qualitative
response categories on the questionnaire).
For example, answering the survey question "In the past week, on how many days did you do any work for pay?" requires that the respondent comprehend the key elements "week" and "work for pay," as well as the
overall intent of the item. He or she must retrieve relevant memories concerning working and then make
a judgment concerning that response (for instance, the
individual may have been home sick all week, but in
keeping with the desire to express the notion that he or
she is normally employed, reports usual work status).
Finally, in producing a response, the respondent will
provide an answer that may or may not satisfy the
requirements of the data collector (e.g., "Four"; "Every day"; "Yes, I worked last week"). The cognitive model
proposes that survey questions may exhibit features that
preclude successful cognitive processing and that may
result in survey response error (in effect, answers that
are incorrect). In the preceding example, the question
may contain vague elements ("week"; "work for pay")
that create divergent interpretations across respondents;
or it may induce biased responding (e.g., the socially
desirable impulse to provide a nonzero response).
The think-aloud procedure was adapted from psychological laboratory experiments and requires subjects to
verbalize their thoughts as they answer survey questions. The interviewer prompts the subject as necessary by providing feedback such as "Tell me what you are thinking" or "Keep talking." The researchers
then analyze the resulting verbatim verbal stream to
identify problems in answering the evaluated questions that have a cognitive origin. For example, the
subject's verbal protocol relating to the preceding question on work status might include a segment stating, "Besides my regular job, last Saturday I, uh, did help a friend of a friend move into a new apartment; he gave me pizza and beer, and a gift card that was lying around with a little money on it still, so I guess you could call that working for pay, but I'm not sure if that's supposed to count." Given this accounting,
the investigators might surmise that the meaning of
"work for pay" is unclear, in this case concerning
irregular work activities that result in noncash remuneration. Especially if this finding were replicated
across multiple cognitive interviews, the questionnaire
designer could consider revising the question to more
clearly specify the types of activities to be included or
excluded.
However, practitioners have observed that some
subjects are unable to think aloud effectively, and that
the pure think-aloud approach can be inefficient for
purposes of testing survey questions. Therefore, an
alternative procedure, labeled "verbal probing," has
increasingly come into prominence and either supplements or supplants think aloud. Probing puts relatively
more impetus on the interviewer to shape the verbal
report and involves the use of targeted probe questions
that investigate specific aspects of subjects' processing
of the evaluated questions. As one common approach,
immediately after the subject answers the tested question, the interviewer asks probes such as "Tell me more about that" and "What does the term work for pay make you think of?" Probe questions are sometimes designed to tap a specific cognitive process (e.g.,
comprehension probes assess understanding of the
question and its key terms; retrieval probes assess
memory processes). However, probes also lead the subject to provide further elaboration and clarify whether
the answer provided to the evaluated question is consistent with and supported by a picture gleaned through
a more thorough examination of the subject's situation.
Verbal probing can be used to search for problems,
proactively, when probes are designed prior to the
interview, based on the anticipation of particular problems. Or, probes may be reactive, when they are
unplanned and are elicited based on some indication by
the subject that he or she has some problem answering the question as intended (e.g., a delay in answering or a response
that seems to contradict a previous answer). The proactive variety of probing allows the cognitive interviewer
to search for covert problems that otherwise do not surface as a result of the normal interchange between interviewer and subject. Conversely, reactive probes enable
follow-up of unanticipated overt problems that emerge.
Further, the type of probing that is conducted
depends fundamentally on variables such as survey
administration mode. For interviewer-administered
questions (telephone or in person), probes are often
administered concurrently, or during the conduct of the
interview, immediately after the subject has answered
each tested question. For self-administered questionnaires in particular, researchers sometimes make use of
retrospective probes, or those administered in a debriefing step after the main questionnaire has been completed, and that direct the subject to reflect on the
questions asked earlier. Concurrent probing provides
the advantage of eliciting a verbal report very close to
the time the subject answers the tested questions, when
relevant information is likely to remain in memory.
The retrospective approach risks the loss of such memories due to the delay between answering the question
and the follow-up probes. On the other hand, it more
closely mirrors the nature of the presentation of the targeted questions during a field interview (i.e., uninterrupted by probes) and prompts the subject to reflect
over the entire questionnaire. Cognitive interviewing
approaches are flexible, and researchers often rely both
on concurrent and retrospective probing, depending on
the nature of the evaluated questionnaire.
Variation in Practice
Although cognitive interviewing is a common and well-established pretesting and evaluation method, the precise activities that are implemented by its practitioners
vary in key respects. Cognitive testing of questionnaires
used in surveys of businesses and other establishments
places significant emphasis on information storage and
retrieval, especially because relevant information is often contained in records rather than in the respondent's memory.
COLD CALL
A cold call refers to the circumstance that takes place
in many surveys when a respondent is first called or
contacted in person by a survey interviewer without
any advance knowledge that he or she has been sampled to participate in the survey, and thus does not
know that the call or contact is coming. This circumstance contrasts to other instances in which some
form of advance contact has been made with the sampled respondent to alert him or her (that is, to "warm up" him or her) that he or she has been sampled and that an interviewer soon will be in contact. Survey
response rates consistently have been found to be
lower for those sampled respondents that receive cold
calls than for those that receive advance contact.
For many people who are sampled in telephone
surveys, there is no way that researchers can use an
advance mail contact technique because all that is
known about the sampled household is the telephone
number. This occurs even after the researchers have
run matches of sampled telephone numbers against
address databases and no address match is identified.
Granted, an advance telephone contact attempt could
be made in which a recorded message is left alerting the household that it has been sampled for the survey.
COMMON RULE
The Common Rule refers to a set of legal and ethical
guidelines designed for protection of human subjects
in research either funded by federal agencies or taking
place in entities that receive federal research funding.
The term Common Rule technically refers to all the regulations contained in Subpart A of Title 45 of the Code
of Federal Regulations Part 46 (45 CFR 46). As applied
to survey research, the most important elements of the Common Rule are its requirements for informed consent and for review of research by an institutional review board.
Background
In the early 1970s, a number of high-profile cases of
clearly unethical research made headlines and resulted
in calls for congressional hearings. A few of the most
striking examples include the following:
The Tuskegee Syphilis Study (1932–1972). Begun in
1932 to test syphilis treatments, the federal Public
Health Service enrolled hundreds of African American
men to participate. Deception was a key feature of the
research from the start, but it was taken to new levels
in the 1940s, after penicillin was proven an effective
cure for syphilis. The researchers prevented their subjects from obtaining beneficial medical treatment and
maintained their deception until 1972, when details of
the study first came out in the press. The study directly
caused 28 deaths, 100 cases of disability, and 19 cases
of congenital syphilis and was in direct violation of
several elements of the Nuremberg Code (1945), developed after World War II in response to Dr. Joseph
Mengele's infamous experiments on Nazi concentration camp victims.
Milgram's Experiments on Obedience to Authority. In
attempting to determine the extent to which typical
Americans might be willing to harm others simply
because an authority figure told them to, psychologist Stanley Milgram designed an experiment in the
early 1960s in which the subjects believed that they
were delivering ever-stronger electrical shocks to
a learner who was actually part of the research
team. A large majority of subjects continued to comply even after they believed they were causing severe
pain, unconsciousness, and even, potentially, death.
Very early on, subjects showed clear signs of severe
psychological stress, but Milgram continued his
experiments to the end, even adding an especially
cruel treatment condition in which the subject had to
physically hold the victim's hand in place. (The ethics of Milgram's work has been debated for years,
but many believe that it served a very positive role in
showing the power and danger of authoritarianism and
also served as an important warning to the scientific
community for the need to make more formal and
stringent ethical procedures for all social research.)
Zimbardo's Prison Experiment. As part of a research study, and after randomly assigning student volunteers to be either "prisoners" or "guards" in the early 1970s, psychologist Philip Zimbardo found that the participants internalized their assigned roles so quickly, with the "guards" becoming increasingly abusive, that the experiment had to be ended early.
The Milgram and Zimbardo experiments, in particular, served as wake-up calls to social science researchers who, until that point, had generally considered
research ethics a topic of interest to medical research
but not to the social sciences. In both cases the unethical behavior occurred not so much with regard to the
research designs but rather with regard to the choices
the researchers made after their studies went in unanticipated harmful directions. The principal investigators
decided to continue their experiments long after they
were aware of the harm they were causing their
research subjects, a fact that made comparisons to the
Tuskegee Experiment both inevitable and appropriate.
Indeed, by failing to balance the anticipated benefits of
the research with the risks to their subjects, they were
in violation of a key provision of the Nuremberg Code.
Congressional and
Regulatory Action
As a result of press reports and resultant public outcries
about these cases, Congress held hearings in 1973 titled
"Quality of Health Care–Human Experimentation."
The hearings led to the passage of the National
Research Act of 1974, which established the National
Commission for the Protection of Human Subjects of
Biomedical and Behavioral Research and required the
creation of institutional review boards (IRBs) at all
institutions receiving funding from the Department of
Health, Education, and Welfare (HEW).
The commission was charged to "identify the basic ethical principles that should underlie the conduct of biomedical and behavioral research involving human subjects" and to "develop guidelines . . . to assure that such research is conducted in accordance with those principles." The first regulations were issued as 45
CFR 46, Regulations for the Protection of Human
Subjects of Biomedical and Behavioral Research, in
1974 by HEW (now Health and Human Services, or
HHS); these were revised and expanded on after the
release of the commission's report in April 1979. The Belmont Report first laid out three Basic Ethical Principles: (1) respect for persons, (2) beneficence, and (3) justice. Then it detailed specific ways in which these principles should be applied to the conduct of research.
COMPLETED INTERVIEW
The completed interview survey disposition is used in
all types of surveys, regardless of mode. In a telephone
or in-person interview, a completed interview results when the interviewer has administered all applicable questions and the respondent has provided answers to them.
COMPLETION RATE
The term completion rate has been used often in the
survey research literature to describe the extent of
cooperation with and participation in a survey. However, it is an ambiguous term because it is not used
consistently. Therefore readers of the literature should
interpret the term with caution.
Completion rate is often used to describe the portion
of a questionnaire that has been completed. In self-administered surveys, it is used widely to differentiate
between the number of eligible individuals who do not
complete a questionnaire and those who do. In this context, the completion rate is the number of questionnaires
completed divided by all eligible and initially cooperating sample members. Researchers using completion rate
in this sense should state so explicitly. This rate is an
important indicator of item nonresponse in self-administered surveys. It has implications for the visual layout
of a self-administered instrument, since the layout may
affect how willing sample members are to complete the
questionnaire. It also has implications for
the content and the placement of critical questions in
the questionnaire.
Completion rate is also an umbrella term used to
describe the extent of sample participation in a survey, including the response rate, the contact rate, and
the cooperation rate. Since these outcome rates are
often used as criteria for evaluating the quality of survey data, analysts and other data users should know
which rate is being referred to by the term completion
rate. The response rate indicates the proportion of the
total eligible sample that participates in the survey, the
contact rate indicates the proportion of those contacted
out of all eligible sample members, and the cooperation
rate indicates the proportion of the contacted sample
that participates in (or consents to participate in) the
survey. The American Association for Public Opinion
Research (AAPOR) recommends that researchers define
how they are using the terms response rate, contact
rate, and cooperation rate and offers standard definitions for these terms and how they should be calculated.
AAPOR recommends that researchers explain in detail
how they calculated the rates and how they categorized
the disposition codes. Of note, AAPOR does not define
the calculation of the term completion rate.
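The differences among these rates can be sketched with hypothetical case counts (illustrative only; actual calculations should follow the formulas in the AAPOR Standard Definitions):

    eligible = 1000      # eligible sample members
    contacted = 800      # eligible members who were reached
    cooperating = 600    # contacted members who agreed to participate
    completed = 540      # questionnaires fully completed

    response_rate = completed / eligible          # 0.54
    contact_rate = contacted / eligible           # 0.80
    cooperation_rate = cooperating / contacted    # 0.75

    # One common usage of "completion rate": completes divided by all
    # eligible and initially cooperating sample members.
    completion_rate = completed / cooperating     # 0.90

    print(response_rate, contact_rate, cooperation_rate, completion_rate)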
In addition to responding to a survey, people may
participate in studies in other ways as well, and instruments other than questionnaires are often used to collect data. For instance, a screener interview may be
used to determine an individual's eligibility for a study
before he or she is asked to participate in the full survey. In addition to self-reported information collected
during an interview, other data may be collected from
participants, such as biomeasure data (height and
weight measures, hair samples, or saliva samples). In
epidemiological or randomized controlled studies,
sample members may be asked to participate in
a health regimen, in special education programs, or in
an employment development program. The term completion rate may therefore be used to indicate the
extent to which any or all of these activities have been
completed. This more or less universal nature of
the term underscores the importance of defining how
it is being used in any given context. For example, in
reporting findings based on biomeasure data, researchers should be clear about whether completion means
completing the questionnaire only or if they are referring to completing the additional data collection.
Because it is impossible to assign a term to every
possible permutation of a survey, it is critical for
researchers to fully explain the sense in which they are
using terms such as completion rate. It is equally important to use the terminology defined by the standard-setting organization(s) in a given discipline so as to
promote a common understanding and use of terms.
Danna Basson
See also Completed Interview; Cooperation Rate; Final
Dispositions; Partial Completion; Response Rates;
Standard Definitions
Further Readings
COMPLEX SAMPLE SURVEYS
Stratification
One aspect of a complex sampling design may involve
stratification, defined as a partition of the population
into mutually exclusive and collectively exhaustive subsets called strata. One primary reason for using stratification is usually associated with the recognition that
members of the same stratum are likely to be more similar to each other than members of different strata.
Other reasons for using stratification include the desire
to have every part of the population represented, or the
desire to reduce sampling variance by using a larger
sampling fraction in strata when the unit variance is
larger than in more homogeneous strata, or it may
reflect a strategy based on differential data collection
costs from stratum to stratum. Stratification could
also be used if stratum-specific domain estimates are
desired. As previously alluded to, the sampling fractions
used within the different strata may or may not be the
same across all the strata. Strata may be explicit, and
the number of units to be selected from each stratum may be determined beforehand. Or stratification may be implicit, as when the units are sorted by stratum and a systematic sample is drawn.
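The variance-reduction rationale, assigning a larger sampling fraction where the unit variance is larger, is captured by Neyman allocation, sketched here with invented stratum sizes and standard deviations:

    # Neyman allocation: n_h is proportional to N_h * S_h.
    strata = {"urban": (60000, 12.0),     # (N_h, S_h), hypothetical
              "suburban": (30000, 8.0),
              "rural": (10000, 20.0)}
    n_total = 1000

    denom = sum(N * S for N, S in strata.values())
    allocation = {h: round(n_total * N * S / denom)
                  for h, (N, S) in strata.items()}
    print(allocation)  # the high-variance rural stratum gets an outsized share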
Cluster Designs
While stratification attempts to partition the population
into sets that are as similar to each other as possible,
clustering tries to partition the population into sets that
are as heterogeneous as possible, but where data collection is less expensive by selecting a number of clusters
that contain population units. One example is in a survey
of students in which a given number of schools are
selected, and then students are sampled within each of
those chosen schools or clusters. In this case, the schools
are called the primary sampling units (PSUs), while
the students within the schools are referred to as the
secondary sampling units (SSUs). It is possible to
take either a sample or census of the secondary sampling units contained within each of the selected clusters. This would be the case when sampling additional
units is extremely inexpensive, such as sampling entire
classrooms from selected schools. More common, however, is to select clusters as a first sampling stage and
then to select a subset of units within the clusters as
a second stage. Sometimes there are more than two
stages within a design, such as when school districts are
selected first, then schools within the districts, and then
intact classrooms within the schools.
Another variant of cluster design is the multi-phase
design. In this instance the clusters are selected as in
a multi-stage design, but instead of selecting units
within each cluster, units are selected from the union
of all units within the selected clusters. Of course,
depending on the assigned probabilities and selection
method, some multi-phase designs are strictly equivalent to multi-stage designs.
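A two-stage selection can be sketched as follows, using simple random sampling at both stages for clarity (the school names and enrollments are invented; many real designs would instead select the schools with probability proportional to size):

    import random

    random.seed(1)  # for a reproducible illustration

    # Stage 1: select 2 of 5 schools (the PSUs).
    enrollments = {"A": 400, "B": 250, "C": 600, "D": 150, "E": 300}
    sampled_schools = random.sample(sorted(enrollments), k=2)

    # Stage 2: select 25 students (the SSUs) within each sampled school.
    sample = {school: random.sample(range(enrollments[school]), k=25)
              for school in sampled_schools}
    for school, students in sample.items():
        print(school, sorted(students)[:5], "...")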
Weighting
The purpose of sampling in a survey is to obtain an
estimate of a parameter in the population from which
the sample was drawn. In order to do this, one must
know how to weight the sampled units. The most common approach to weighting is to calculate a probability
of selection and then take its multiplicative inverse.
This yields the Horvitz-Thompson estimator, and
though it seems straightforward, there are many designs
for which this estimator is difficult or impossible to
obtain. Dual-frame estimators represent a case in which
the straightforward Horvitz-Thompson estimators have
to be modified to incorporate the probability of being
included into the sample via multiple frames. It is often
the case that the initial weights (i.e., inverse of selection
probability) are not the final versions used to produce
the final estimates. Rather, the weights are often
adjusted further to account for population sizes and/or
nonresponse using a variety of techniques, including
post-stratification, trimming of the weights, and the use
of ratio or regression estimators.
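To make the weighting arithmetic concrete, here is a minimal Python sketch (all values and names are invented for illustration, not taken from any particular survey) of Horvitz-Thompson estimation with weights formed as the multiplicative inverse of the selection probabilities:

# Horvitz-Thompson estimation: each base weight is the multiplicative
# inverse of the unit's selection probability (hypothetical values).
y = [12.0, 7.5, 9.0, 14.2, 6.3]        # observed values for the sampled units
pi = [0.01, 0.02, 0.01, 0.05, 0.02]    # selection probability of each unit

weights = [1.0 / p for p in pi]        # initial (design) weights
ht_total = sum(w * v for w, v in zip(weights, y))
print(ht_total)                        # estimated population total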
Variance Estimation
Survey weights as well as the design upon which the
weights are computed play an important role in both
the parameter estimates and variance computations.
Whereas estimating the variance of simple survey estimates is rather straightforward, variance estimation in
complex sample surveys is much more complicated.
Some sampling approaches have variance formulas that
may be applied, but a multi-stage approach in which
clusters are sampled with PPS and weight adjustments
are made can be far more complex. There are two basic
sets of methods that may be used: (1) Taylor series linearization and (2) replicate methods. In each of these
methods it is important, although not always obvious,
that the design be properly specified. One important
consideration is that if a PSU is sampled with certainty,
it must be treated as a stratum, and the units at the next
level of sampling should be treated as PSUs.
Taylor series linearization has the advantage of
using a straightforward approach that is available in
many standard statistical packages. Replicate methods, such as the jackknife and balanced half sample
pseudo-replications, allow one to reproduce aspects of
the design, taking imputation into account. These methods are also available in many packages, but it is also
easy to fail to specify the design properly. A more complex method is the bootstrap, which needs to be programmed specifically for each design but allows for a closer
reproduction of the initial sample.
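The replicate idea can be sketched in a few lines of Python; this is a simplified delete-one-PSU jackknife for a single stratum with invented data, not a substitute for properly specifying a complex design in a survey package:

# Delete-one-PSU jackknife: each replicate drops one PSU; the spread of the
# replicate estimates around the full-sample estimate yields the variance.
psu_totals = [104.0, 98.5, 110.2, 95.8, 101.1]   # estimated y-totals per PSU
psu_counts = [50, 48, 52, 47, 49]                # estimated unit counts per PSU

n = len(psu_totals)
full_est = sum(psu_totals) / sum(psu_counts)     # full-sample ratio mean

replicates = []
for d in range(n):
    t = sum(v for i, v in enumerate(psu_totals) if i != d)
    c = sum(v for i, v in enumerate(psu_counts) if i != d)
    replicates.append(t / c)   # the n/(n-1) rescaling cancels in a ratio

jk_var = (n - 1) / n * sum((r - full_est) ** 2 for r in replicates)
print(full_est, jk_var ** 0.5)                   # estimate and standard error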
Pedro Saavedra
COMPOSITE ESTIMATION
Composite estimation is a statistical estimation procedure that combines data from several sources, for
example, from different surveys or databases or from
different periods of time in the same longitudinal survey. It is difficult to describe the method in general,
as there is no limit to the ways one might combine
data when various useful sources are available. Composite estimation can be used when a survey is conducted using a rotating panel design with the goal of
producing population estimates for each point or
many points in time. If the design incorporates rotating groups, composite estimation can often reduce the variances of estimates of level variables (e.g., totals, means, proportions). In addition, composite estimation can reduce the variances of estimates of variables dealing with changes over time, depending on the structure of
the sample design, the strength of the correlations
between group estimates over time, and other factors.
Specific Examples
of Composite Estimators
A specific example of a composite estimator is the
one used in the Current Population Survey, jointly
sponsored by the U.S. Bureau of Labor Statistics and
the Census Bureau, to measure the U.S. labor force.
In each month, separate estimates of characteristic
totals are obtained from the eight rotation groups. Six
of these groups contain households that were interviewed the prior month. The composite estimator
implemented in 1998 combines the estimates from
current and prior months to estimate the number of
unemployed using one set of compositing coefficients,
and the number of employed using a different set that
reflects the higher correlations over time among estimates of employed:
\hat{Y}_t^{CE} = (1 - K)\,\hat{Y}_t^{AVG} + K\,(\hat{Y}_{t-1}^{CE} + \hat{\Delta}_{t-1,t}) + A\,\hat{\beta}_t,

where \hat{Y}_t^{AVG} is the average of the estimates of total from the eight rotation groups; \hat{\Delta}_{t-1,t} is an estimate of change based only on the six rotation groups canvassed at both times t-1 and t; \hat{\beta}_t is an adjustment term inserted to reduce the variance of \hat{Y}_t^{CE} and the bias arising from panel conditioning; and (K, A) = (0.4, 0.3) when estimating unemployed, and (0.7, 0.4) when estimating employed. For researchers, a problem with composite estimates is producing them from public use
microdata files, because computing the composite estimate for any period generally requires one to composite
recursively over a number of past periods. This problem
has been addressed for the Current Population Survey,
which now produces and releases a set of composite
weights with each month's public use file. First, for each characteristic of interest, a composited quantity of the form

P_t = (1 - b)\,\hat{X}_t^A + b\,(P_{t-1} + \hat{\Delta}_{t-1,t})

is computed recursively.
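To show how the recursion in such a composite estimator unfolds, here is a minimal Python sketch; the coefficients follow the unemployed setting quoted above, but every monthly input is invented:

# Recursive AK-style compositing: Y_t = (1-K)*Y_avg_t + K*(Y_{t-1} + delta_t)
# + A*beta_t, seeded with the first month's rotation-group average.
K, A = 0.4, 0.3                            # coefficients for unemployed
y_avg = [7500.0, 7620.0, 7580.0, 7710.0]   # hypothetical 8-group averages
delta = [0.0, 130.0, -50.0, 140.0]         # hypothetical 6-group change estimates
beta = [0.0, 12.0, -4.0, 9.0]              # hypothetical adjustment terms

y_ce = [y_avg[0]]                          # no prior month to composite with
for t in range(1, len(y_avg)):
    y_ce.append((1 - K) * y_avg[t] + K * (y_ce[t - 1] + delta[t]) + A * beta[t])
print(y_ce)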
COMPREHENSION
Survey researchers, in developing questions, must bear
in mind the respondent's ability to correctly grasp the
question and any response categories associated with the
question. Comprehension, which is defined in this
Timothy Vercellotti
See also Cognitive Aspects of Survey Methodology
(CASM); Cognitive Interviewing; Focus Groups; Pilot
Test; Questionnaire Design; Reliability; Response
Alternatives
COMPUTER-ASSISTED PERSONAL
INTERVIEWING (CAPI)
Computer-assisted personal interviewing (CAPI)
refers to survey data collection by an in-person interviewer (i.e., face-to-face interviewing) who uses
a computer to administer the questionnaire to the
respondent and captures the answers onto the computer. This interviewing technique is a relatively new
development in survey research that was made possible by the personal computer revolution of the 1980s.
Background
To understand the evolution of CAPI it is necessary
to understand the history that led to its development
and widespread implementation. In the late 1980s,
many surveys used early versions of computer-assisted
telephone interviewing (CATI). The early CATI systems ran as terminal applications on a mainframe or
minicomputer. Computer applications typically used
compilers; the central computer had to handle many
simultaneous processes to service a CATI research facility. The cost of mainframes and more capable minicomputer systems was so high that the economic case that
CATI should replace paper-and-pencil interviewing
(PAPI) was tenuous. In addition, CATI facilities tended
to use interviewers quite intensively and with close
supervision, so interviewers tended to make fewer errors
of the sort that computerized systems suppress, at least
relative to face-to-face interviewers. With computing
costs high, CATI was not a strong value proposition.
As personal computers (PCs) started to penetrate the
market, they offered only modest processing power
but CATI interviews did not require much power. An
intensively used PC could be cost-effective, and its capabilities matched the CATI task better than a mainframe
or minicomputer did. There was no strong need to have
a networked solution for PC computing, since CATI
facilities could use low-tech case management and
scheduling systems and still get the work done.
119
Usage
When it comes to the field effort, it is important
to remember that, more and more, survey efforts are
multi-modal. Face-to-face surveys frequently work
many of their cases over the phone, self-administered
on the Web, by mail, or even self-administered on
a personal digital assistant (PDA) or some other
device. Unless the technical approach handles multimodal surveys efficiently, the survey preparation
phase will require a separate programming effort for
each mode. Apart from the multi-modal aspects,
120
COMPUTER-ASSISTED
SELF-INTERVIEWING (CASI)
Computer-assisted self-interviewing (CASI) is a
technique for survey data collection in which the
respondent uses a computer to complete the survey
questionnaire without an interviewer administering it
to the respondent. This assumes the respondent can
read well (enough) or that the respondent can hear the
questions well in cases in which the questions are prerecorded and the audio is played back for the respondent one question at a time (audio computer-assisted self-interviewing, or ACASI).
A primary rationale for CASI is that some questions
are so sensitive that if researchers hope to obtain an
accurate answer, respondents must use a highly confidential method of responding. For a successful CASI effort, the researcher must consider three factors: (1)
the design of the questions, (2) the limitations of the
respondent, and (3) the appropriate computing platform.
Unless one has a remarkable set of respondents, the
sort of instrument needed for CASI (or any self-administered interview) will be different from what one uses
when a trained interviewer is administering the questionnaire. Having a seasoned interviewer handling the
questioning offers a margin of error when designing
questions. When the researcher is insufficiently clear,
she or he essentially counts on the interviewers to save
the day. Their help comes in a variety of forms. The
interviewer can explain a question the respondent asks
about. Good interviewing technique requires the interviewer to avoid leading the respondent or suggesting
what the expected or correct response is. The interviewer can also help salvage bad questions when the
respondent's answer reveals that, although the respondent showed no overt confusion about the question, it is
clear that the respondent either did not understand the
question or took the question to be something other
than what was asked. The interviewer can also help out
the questionnaire designer when, during a complex
interview, it becomes clear to the interviewer that something has gone wrong with the programming and the
item occurs in a branch of the questionnaire where it
should not be. The interviewer can then try to put things
right or at least supply a comment that will help the
central office sort out the problem.
In all of these cases, the interviewer plays a crucial
role in improving data quality. With a self-administered
fills and alternate language versions to adapt to respondents whose first language is not English. As equipment
becomes smaller and more capable, survey researchers
are beginning to set up ACASI interviews on Palm
Pilots or other handheld devices. Ohio State University
recently deployed an ACASI interview on Palm Pilots
using an interview that took about an hour to complete, an interview full of very sensitive questions.
The process went very smoothly; the respondent wore
headphones to hear the question and tapped on the
answer with a stylus on the Palm's screen. Respondents
whose reading skills are strong can choose to turn off
the audio. Because no one can tell whether the respondent is using audio, there is no stigma to continuing to
use it.
Most interviews, however, do not consist only of
sensitive questions. By putting the sensitive questions in
one section, one can often switch between modes, using
a CASI or ACASI method only where it is necessary.
In fact, interviewers have been utilizing CASI since
they first walked into a household with a computer, just
as interviewers have turned a paper-and-pencil interviewing (PAPI) document into a self-administered
questionnaire when they thought circumstances required
it. For example, when a questionnaire asked about former spouses and the current spouse was in the house,
savvy interviewers would simply point to the question
in the booklet and say something like, "And how would you answer this one?" With a laptop, the interviewer
would simply twist the machine around and have the
respondent enter an answer.
There are differences when using CASI within
a telephone interview. One can conceal the line of
sensitive questioning from another person in the room
by structuring the questionnaire to require simple
"Yes" or "No" responses, simple numbers, and the
like. While this affords some confidentiality from
eavesdropping, it does nothing to conceal the respondent's answers from the interviewer. To work on this
problem there has been some limited experimentation
at Ohio State using Voice over Internet Protocol (VoIP)
methods. With some sophisticated methods, one can
transfer the respondent to a system that speaks the questions by stringing together voice recordings and then
interprets the respondent's answers and branches to the
appropriate question. When done with the sensitive
questions, the robot reconnects the interviewer and
the interview continues. This approach works quite well
and allows telephone interviews to achieve a measure
of the security attained with other closed methods,
COMPUTER-ASSISTED TELEPHONE
INTERVIEWING (CATI)
Computer-assisted telephone interviewing (CATI) in
its simplest form has a computer replacing the paper
questionnaire on a telephone interviewer's desk.
Advantages of Computer-Assisted
Telephone Interviewing
CATI provides the following advantages:

• More efficient data collection, because the interviewer enters answers directly into the computer rather than sending a paper questionnaire for a separate data capture step.

• More efficient and more accurate questionnaire administration, because the computer delivers the questions to the interviewer in the correct programmed sequence, including any required rotations, randomizations, or insertions of information from a separate data file or from earlier in the interview.
While this has been the basic model for CATI systems since they were first introduced in the 1970s,
and some CATI systems still have only this questionnaire administration component, technological developments during the past 30 years have provided many
more ways in which the computer can assist the telephone interviewing process.
Sample Management
and Call Scheduling
Most CATI programs now have at least two modules,
one being the questionnaire administration tool already
described, the other providing sample management and
call scheduling functions, such as the following:
The sample management module often has a separate supervisor interface, which enables the supervisor
to execute additional sample management functions,
such as stopping particular numbers from being delivered to interviewers or increasing, for a limited period of time, the priority of numbers in strata where the survey is lagging.
While some dialers have some sample management and basic questionnaire administration capabilities, at the time of writing few systems offer the sophistication in questionnaire administration or sample management that is typically needed in survey work.
COMPUTERIZED-RESPONSE AUDIENCE
POLLING (CRAP)
A number of survey designs deviate from the parameters of a scientific probability design, with significant
consequences for how the results can be characterized.
Computerized-response audience polling (CRAP) is an
example of such a design. In this kind of poll, a sample
of telephone numbers is typically purchased and loaded
into a computer for automatic dialing. The questionnaire is produced through computer software that
employs the digitized voice of someone assumed to be
known to many of those who are sampled, such as the
voice of a newscaster from a client television station.
After an introduction, the computerized voice goes
through the questionnaire one item at a time, and the
respondent uses the key pad on a touchtone phone to
enter responses to each question asked, as in an interactive voice response (IVR) system.
A major problem with CRAP polls is that the methodology does not allow for specific respondent selection, meaning that the basic premise of probability
sampling, namely that each respondent has a known,
nonzero probability of selection, is violated. Interviews
are conducted with whoever answers the phone, and
there is no guarantee that the person answering is eligible by age or other personal characteristics. Although
information can be gathered about the household composition, there is no random selection of a designated
respondent from the household. The computer can dial
a large set of telephone numbers in a short period of
time, working through a purchased sample quickly but
producing a low contact or cooperation rate as a result.
There also is no attempt to recontact a household to
obtain an interview with a designated respondent who
is not at home at the time of the first call. Because of
these considerations, it is inappropriate to calculate
a margin of error around any estimates produced from
such a poll.
This method shares many characteristics with self-selected listener opinion polls (SLOP) and other designs
that employ volunteer samples. A true response rate
cannot be calculated, although a version of a cooperation
rate can. The data can be collected rapidly and at
low cost. Although post-stratification weighting can be
applied to the resulting set of respondents, it is difficult
to interpret its meaning when information about respondent selection is missing.
Michael Traugott
See also Interactive Voice Response (IVR); Mode of Data
Collection; Self-Selected Listener Opinion Poll (SLOP)
COMPUTERIZED SELF-ADMINISTERED
QUESTIONNAIRES (CSAQ)
Computerized self-administered questionnaires (CSAQ)
are a method of collecting survey data that takes advantage of computer technology to create an instrument
(the questionnaire) that allows respondents to complete
CONFIDENCE INTERVAL
A probability sample can provide a point estimate of an
unknown population parameter and the standard error
of that point estimate. This information can be used to
Detailed Definition
of a Confidence Interval
In statistical terms, a confidence interval (two-sided) is defined as a random interval [L, U] enclosing the unknown population parameter value \theta (such as a population mean, variance, or proportion) with a given probability (1 - \alpha). That is, Probability(L \le \theta \le U) = 1 - \alpha, where 0 < \alpha < 1 and \alpha generally takes small values, such as 0.01, 0.05, or 0.1. The interval [L, U] is known as the 100(1 - \alpha)% confidence interval for \theta, and the probability (1 - \alpha) is known as the confidence level or the coverage probability of the interval [L, U]. In certain applications, only a lower or upper bound may be of interest, and such confidence intervals are known as one-sided confidence intervals.
If a sampling process is repeated a large number of times, and for each selected sample a confidence interval is obtained using the same confidence level and statistical technique, and if the population parameter were known, then approximately 100(1 - \alpha)% of the confidence intervals would be seen to enclose the population parameter. In reality, \theta is unknown, and owing to budget and time constraints, only one sample is selected; hence, it is not possible to know with certainty whether the calculated 100(1 - \alpha)% confidence interval encloses the true value of \theta. It is hoped, with the chances at 100(1 - \alpha)%, that it does enclose the true value of \theta. The lower and upper end points of the confidence interval depend upon the observed sample values, the selected confidence level, the statistical technique, the sample design, and population distributional characteristics, as illustrated by the following examples. The confidence interval definition given earlier comes from the frequentist school of thought. The alternative, Bayesian inference, is not yet commonly used in survey data analysis and hence is not covered here.
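As a simple illustration of the frequentist construction (a sketch with invented numbers, using the large-sample normal approximation), a two-sided interval can be computed as follows:

# Normal-theory two-sided confidence interval: estimate +/- z_{alpha/2} * SE.
from statistics import NormalDist

alpha = 0.05
point_estimate = 354.8            # e.g., an estimated mean (hypothetical)
standard_error = 1.9              # its estimated standard error (hypothetical)

z = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 when alpha = 0.05
lower = point_estimate - z * standard_error
upper = point_estimate + z * standard_error
print(lower, upper)               # the 95% confidence interval [L, U]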
Stratified Sampling
Suppose that N = 120,000 cans were produced in three batches of 40,000 cans each. In order to account for batch-to-batch variability in the estimation process, it would make sense to select a random sample from each of the three batches. This is known as stratified sampling. To find a confidence interval for the average amount of soda in the 120,000 cans (\bar{Y}), suppose the inspector took a SRSWOR sample of 40 cans from each of the three batches. In stratified sampling notation (from William Gemmell Cochran), the three batches are known as strata, with N_1 = N_2 = N_3 = 40,000 denoting the stratum sizes and n_1 = n_2 = n_3 = 40 denoting the stratum sample sizes. From Cochran, an unbiased estimator of \bar{Y} is

\bar{y}_{st} = \sum_{h=1}^{L} \frac{N_h}{N} \bar{y}_h,

where \bar{y}_h denotes the sample mean for stratum h, and the estimated variance of \bar{y}_{st} is built from the stratum variances S_h^2. If the stratum sample sizes are not large enough, then a Student's t-distribution with n* degrees of freedom is used to approximate the sampling distribution of \bar{y}_{st}. The calculation of n* is not straightforward and should be done under the guidance of an experienced survey statistician.
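The stratified computation above can be sketched in Python; the stratum means and variances below are invented, while the design quantities match the example:

# Stratified mean and an approximate 95% CI under SRSWOR within strata,
# with the finite population correction (1 - n_h/N_h) in each stratum.
N_h = [40000, 40000, 40000]          # stratum sizes
n_h = [40, 40, 40]                   # stratum sample sizes
ybar_h = [355.1, 354.6, 355.4]       # hypothetical stratum sample means (ml)
s2_h = [4.4, 5.1, 3.9]               # hypothetical stratum sample variances

N = sum(N_h)
y_st = sum(Nh * yb for Nh, yb in zip(N_h, ybar_h)) / N
v_st = sum((Nh / N) ** 2 * (1 - nh / Nh) * s2 / nh
           for Nh, nh, s2 in zip(N_h, n_h, s2_h))
z = 1.96                             # large-sample z_{alpha/2} for alpha = .05
print(y_st, (y_st - z * v_st ** 0.5, y_st + z * v_st ** 0.5))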
Cluster Sampling
Now suppose that the N = 120,000 cans were packed in 12-can packs and shipped to a retailer. The retailer is interested in knowing the average amount of soda in the 120,000 cans (\bar{Y}). A stratified sampling would not be feasible unless all of the 12-can packs (M = 10,000) are opened. Similarly, for SRSWOR, it would require one to list all the 120,000 cans, which in turn may require opening all the packs. Because each pack can be regarded as a cluster of 12 cans, a cluster sample design is most suitable here. To obtain an approximate 100(1 - \alpha)% confidence interval for \bar{Y}, the retailer decided to select a SRSWOR sample of m = 20 packs from the population of M = 10,000 packs and measure the amount of soda in each of the cans. This is known as a single-stage (or one-stage) cluster sample, in which all the clusters (packs) have the same number (12 cans in each pack)
of elements (soda cans). An unbiased estimator of \bar{Y} is

\bar{y}_{cluster} = \frac{M}{Nm} \sum_{i=1}^{m} \sum_{j=1}^{r} y_{ij} = \frac{M}{Nm} \sum_{i=1}^{m} t_i,

where r is the common number of elements per cluster and t_i = \sum_{j=1}^{r} y_{ij} is the total for the i-th selected cluster. The variance of \bar{y}_{cluster} is estimated by

\hat{v}(\bar{y}_{cluster}) = \frac{M^2}{N^2} \left(1 - \frac{m}{M}\right) \frac{s_t^2}{m},

where s_t^2 = \frac{1}{m-1} \sum_{i=1}^{m} (t_i - \bar{t})^2 and \bar{t} is the mean of the selected cluster totals. If the number of clusters in the sample is large, then an approximate 100(1 - \alpha)% confidence interval for \bar{Y} is given by \bar{y}_{cluster} \pm z_{\alpha/2} \sqrt{\hat{v}(\bar{y}_{cluster})}.
In the preceding example, r = 12 (because all 12 cans in a pack are examined) and t_i represents the total amount of soda in the i-th selected pack. Because a SRSWOR sample of m = 20 packs is not large enough to assume a normal probability distribution, a Student's t-distribution with t_{m-1, \alpha/2} = t_{19, 0.025} = 2.093 could be used by the retailer to obtain a 95% confidence interval for \bar{Y} (i.e., \bar{y}_{cluster} \pm t_{19, 0.025} \sqrt{\hat{v}(\bar{y}_{cluster})}).
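The same cluster computation can be sketched in Python; the pack totals below are simulated rather than observed:

# Single-stage cluster sample of m = 20 packs from M = 10,000; estimate the
# mean per can and a t-based 95% CI from the cluster (pack) totals.
import random
random.seed(1)

M, N, m, r = 10000, 120000, 20, 12
t_i = [sum(random.gauss(355, 2) for _ in range(r)) for _ in range(m)]

y_cl = M * sum(t_i) / (N * m)                        # estimated mean per can
t_bar = sum(t_i) / m
s2_t = sum((t - t_bar) ** 2 for t in t_i) / (m - 1)  # variance of pack totals
v_cl = (M ** 2 / N ** 2) * (1 - m / M) * s2_t / m
t_crit = 2.093                                       # t_{19, 0.025}
half = t_crit * v_cl ** 0.5
print(y_cl, (y_cl - half, y_cl + half))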
It is a common mistake to analyze the data obtained
from a cluster sample as if it were obtained by
SRSWOR. In the preceding cluster sampling example,
if clustering is ignored at the analysis phase and a 95%
confidence interval is constructed by assuming that the 240 (20 packs × 12 cans) soda cans were selected using the
SRSWOR method, then the actual coverage probability
may be less than 95%, depending upon the size of the
intracluster (or intraclass) correlation coefficient. Generally, the point estimate of a population parameter will
be the same whether the data are analyzed with or
without incorporating the survey design information in
the estimation process. However, the standard errors
may be quite different if the survey design information
is ignored in the estimation process, which in turn will
result in erroneous confidence intervals.
The examples given deal with constructing a confidence interval for the population mean for some of
the basic survey designs. In practice, survey designs
are generally more complex, and a confidence interval
for other population parameters, such as population proportions and quantiles; linear, log-linear, and nonlinear regression model parameters; or the survival function at a given time (Cox's proportional hazard model or Kaplan-Meier estimator), may be needed.
Akhil K. Vaish
See also Alpha, Significance Level of Test; Bias; Cluster
Sample; Confidence Level; Finite Population; Inference;
Margin of Error (MOE); Point Estimate; Population
Parameter; ρ (Rho); Sampling Without Replacement;
Simple Random Sample; Standard Error; Stratified
Sampling; Target Population; Variance Estimation
CONFIDENCE LEVEL
In statistical inference, it is common practice to report
the point estimate of a population parameter along
CONFIDENTIALITY
The confidentiality of survey data is expected by both
survey researchers and survey participants. Survey
researchers have multiple meanings for confidentiality
that are not quite the same as the common definition.
Dictionary definitions use terms such as private, intimate, and trusted, and some refer to national security
concerns.
However, in survey research, the definition is more
complex and can be used differently by different
researchers and survey organizations. For the most part,
confidentiality in survey research refers to the methods
for protecting the data that are collected. It refers both
to the promises made to survey participants that they
will not be identified in any way to those outside the
organization without their specific permission and to
the techniques that organizations use to ensure that
publicly available survey data do not contain information that might identify survey respondents.
For respondents, the promise of confidentiality is
the agreement on the methods to prevent others from
accessing any data that might identify them. Confidentiality of data is important for the success of survey research because survey participants would be
much less willing to participate if they thought the
survey organization would disclose who participated
in the research and/or their identified responses to
questions. The confidentiality protections provided to
participants are not as strong as for anonymously collected data, but both anonymity and confidentiality
are used for the same reasons.
The confidentiality of survey responses is important
for the success of surveys under certain conditions.
When the survey poses some risks for participants,
promises of confidentiality may improve cooperation.
Promises of confidentiality are also important to allow
respondents to feel comfortable providing answers,
especially to sensitive questions. When a survey asks
especially sensitive questions, respondents may be more
willing to share their thoughts if they know their
be compelled in any federal, state, or local civil, criminal, administrative, legislative, or other proceedings to
identify them by name or other identifying characteristic. However, some skepticism exists about whether
this protection would survive a serious legal challenge.
The rules on privacy and confidentiality appear to be changing with the widespread use of computer networks and the analysis of large-scale databases. Yet survey researchers and survey participants still expect that survey data will remain confidential and protected. The long-term success of the survey industry in protecting its data is important to the profession's overall success.
John Kennedy
See also Anonymity; Cell Suppression; Certificate of
Confidentiality; Ethical Principles; Sensitive Topics
CONSENT FORM
4. A statement describing the extent to which confidentiality of any answers or data identifying the
respondent will be maintained by researchers
Although consent forms usually are required for surveys of youth populations, federal regulations and IRBs
often provide some flexibility for surveys of adult
populations. For adult populations, participation in surveys rarely puts respondents at more than the minimal
risks of everyday life. Moreover, depending on the
mode of a survey, documentation of consent may not
CONSTANT
CONSTRUCT
In the context of survey research, a construct is the
abstract idea, underlying theme, or subject matter that
one wishes to measure using survey questions. Some
constructs are relatively simple (like political party
affiliation) and can be measured using only one or
a few questions, while other constructs are more complex (such as employee satisfaction) and may require
CONSTRUCT VALIDITY
In survey research, construct validity addresses the
issue of how well whatever is purported to be measured actually has been measured. That is, merely
because a researcher claims that a survey has measured
presidential approval, fear of crime, belief in extraterrestrial life, or any of a host of other social constructs
does not mean that the measures have yielded reliable
or valid data. Thus, it does not mean the constructs
claimed to be measured by the researcher actually are
the ones that have been measured.
In most cases, for survey measures to have high
construct validity they also should have good face
validity. Face validity is a commonsensical notion
that something should at least appear on the surface
(or at face value) to be measuring what it purports
to measure. For example, a survey item that is supposed to be measuring presidential approval and that asks, "How well is the country being run by the current administration?" has only some face validity and not much construct validity. Its face validity, and thus its construct validity, would be enhanced by adding the name of the president into the question. Otherwise it is a stretch to claim that the original wording is measuring the president's approval. One reason for this is that there could be many members of the current administration other than the president who are affecting the answers being given by respondents.
The single best way to think about the likely construct validity of a survey variable is to see the full
wording, formatting, and the location within the questionnaire of the question or questions that were used to
gather data on the construct. In this way one can exercise informed judgment on whether or not the question
is likely to have high construct validity. In exercising
this judgment, one should also consider how the question was administered to the respondents and if there is
anything about the respondents themselves that would
make it unlikely for them to answer accurately. Unfortunately, too few consumers of survey results have
access to this detailed type of information or take the
time to think critically about this. This applies to too
many journalists who disseminate survey information
without giving adequate thought to whether or not it is
likely to have solid construct validity.
For researchers and others who have a greater need
to judge the construct validity of variables on the
basis of empirical evidence, there are a number of
statistical analyses that can (and should) be performed. The simpler of these analyses is to investigate
whether answers given by various demographic
groups are within reasonable expectations. For example, if it is reasonable to expect gender differences,
are those gender differences actually observed in the
data? Additional, correlational analyses should be
conducted to determine if the variables of interest correlate with other variables they should relate to. For
example, if a Democrat is president, do respondents
who are strong Republicans give considerably lower
approval ratings than respondents who are strong
Democrats? A final consideration: variables that are
created from multiple survey items, such as scaled
variables, should be tested to learn if they have strong
internal consistency using procedures such as factor
analyses and calculating Cronbach's alpha. If they do
not, then one should suspect their construct validity.
Paul J. Lavrakas
See also Construct; Cronbach's Alpha; Interviewer-Related
Error; Measurement Error; Mode-Related Error;
Questionnaire-Related Error; Reliability;
Respondent-Related Error; Validity
CONSUMER SENTIMENT INDEX
Early Development
An economic behavior research program at the University of Michigan began as part of the postWorld
War II planning process. Its agenda was focused on
understanding the role of the consumer in the transition from a wartime economy to what all hoped
would be a new era of peace and prosperity. The primary purpose of the first survey in 1946 was to collect in-person data on household assets and debts. The
sponsor of the survey, the Federal Reserve Board, initially had little interest in the attitudes and expectations of consumers. Their goal was a financial balance
sheet, the hard currency of economic life, not the soft
data of consumer sentiment. George Katona, the
founder of the survey program, convinced the sponsor
that few respondents would be willing to cooperate if
the first question asked was, "We are interested in knowing the amount of your income and assets. First, how much do you have in your savings account?"
Instead, sound survey methodology required that
other, more general, and less threatening questions
were first needed to build respondent rapport and to
establish a sense of trust and confidence with the
respondents.
Katona devised a conversational interview that
introduced each new area of interest with questions
that first elicited general opinions before asking the
detailed questions on dollar amounts. Although the
sponsor was convinced that such attitudinal questions
were needed for methodological reasons, Katona was
told that he did not need to report any of these results
since the Federal Reserve had no interest in the attitudinal findings. Ultimately, the Federal Reserve Board,
as well as many others, became as interested in the findings on consumers' expectations as on consumers' balance sheets. Although the first measures of consumer expectations may seem serendipitous, they were in reality no happenstance. Katona had clear and unmistakable intentions and seized this opportunity to give life to an innovative research agenda. Katona had long been interested in the interaction of economic and psychological factors, what he termed "the human factor" in economic affairs. When Katona advocated his theory
of behavioral economics, few economists listened;
50 years later behavioral economics is at the center
of new theoretical developments.
When the sentiment measure was first developed in
the late 1940s, it was intended to be a means to directly
incorporate empirical measures of income expectations
Adaptation to Change
In the late 1940s, most consumers viewed all aspects
of the economy through the single dimension of how
it affected their jobs and income prospects. In the 21st
century, while job and income prospects are still
important, there are many other aspects of the economy that are just as important to consumers. For
example, consumer expectations for interest rates,
inflation, stock prices, home values, taxes, pension
and health care entitlements as well as jobs and
incomes have moved independently, and often in
opposite directions. Furthermore, consumers are now
more likely to make distinctions between the near- and longer-term prospects for inflation and stock
prices as well as between near- and longer-term job
and income prospects. Moreover, consumers have
also recognized the importance of the global economy
in determining wage and job prospects as well as
Sample Design
The monthly survey is based on a representative sample of all adult men and women living in households
in the coterminous United States (48 states plus
the District of Columbia). A one-stage list-assisted
random-digit design is used to select a probability
sample of all telephone households; within each
household, probability methods are used to select one
adult as the designated respondent. The probability
design permits the computation of sampling errors for
statistics estimated from the survey data.
The sample is designed to maximize the study of
change by incorporating a rotating panel in the sample
design. An independent cross-section sample of
households is drawn each month. The respondents
chosen in this drawing are then reinterviewed six
months later. A rotating panel design results, with the
total of 500 monthly interviews made up of 60% new
respondents and 40% being interviewed for the
second time. The rotating panel design has several
distinct advantages. This design provides for the regular assessment of change in expectations and behavior
both at the aggregate and at the individual level. The
rotating panel design also permits a wide range of
research strategies made possible by repeated measurements. In addition, pooling the independent cross-section samples permits the accumulation of as large
a case count as needed to achieve acceptable standard
errors for critical analysis variables.
Richard Curtin
CONTACTABILITY
The ease or difficulty with which a sampled respondent can be contacted by a survey organization is
referred to as her or his contactability. It can be
expressed as a quantity (or contact propensity) and
ranges from 0.0 to 1.0, with 0.0 meaning it is impossible to contact the respondent and 1.0 meaning it is
certain that the respondent will be contacted.
Contactability will vary by the mode that is used
to attempt to contact a respondent in order to recruit
her or his cooperation and/or gather data. Contactability also will vary according to the effort a survey
organization expends to reach the respondent and
what days and times these contact attempts are tried.
For example, take the case of young adult males,
ages 18 to 24 years, who are among the hardest of
demographic groups for survey organizations to make
contact with. The mode of contact that is used will
affect the contactability of this cohort, as they are far
less likely to be contacted via a traditional random-digit
dialed (RDD) landline survey. If the telephone mode is
used, then researchers trying to contact this cohort also
need to sample cell phone numbers, as nearly one third
of these adults in the United States were cell phone
only in 2007 and their proportion is growing each
year. If the mode of contact is the postal service (mail),
CONTACT RATE
Contact rate measures the proportion of eligible cases
in the sampling pool in which a member of a sampled
household was contacted, that is, reached by an interviewer (in telephone and in-person surveys) or received
the survey request (in the case of mail and Internet surveys). Contact rates can be computed for all surveys,
regardless of the mode in which the data are gathered.
The contact rate is a survey outcome rate that can be
cited in survey reports and in research literature.
Contact Rate 1
The numerator of this rate comprises all of the kinds of contacts (e.g., completions, refusals, language barriers, and so on) a survey or interviewer (depending on the mode) might make with a person in a sampled household or unit (or with the respondent, if a respondent-level contact rate is being computed). The denominator includes all known eligible cases and all cases of indeterminate eligibility. As such, this rate is the most conservative contact rate.
Contact Rate 2
As before, the numerator of this rate comprises all of the kinds of contacts a survey or interviewer (depending on the mode) might make with a person in a sampled household or unit (or with the respondent, if a respondent-level contact rate is being computed). However, the denominator of this rate includes all known eligible cases and a proportion of the cases of indeterminate eligibility that is based on the researcher's best estimate of how many of the cases of indeterminate eligibility actually are eligible.
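As a small illustration (all counts and the eligibility estimate e are hypothetical), Contact Rates 1 and 2 differ only in how the denominator treats cases of indeterminate eligibility:

# Contact Rate 1 uses all unknown-eligibility cases in the denominator;
# Contact Rate 2 discounts them by an estimated eligibility proportion e.
contacted = 620          # completions, refusals, language barriers, etc.
known_eligible = 900     # all known eligible cases
unknown = 200            # cases of indeterminate eligibility
e = 0.6                  # researcher's estimate of eligibility among unknowns

con1 = contacted / (known_eligible + unknown)       # most conservative
con2 = contacted / (known_eligible + e * unknown)
print(round(con1, 3), round(con2, 3))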
Contact Rate 3
As with Contact Rates 1 and 2, the numerator of this rate comprises all of the kinds of contacts a survey or interviewer (depending on the mode) might make with a person in a sampled household or unit (or with the respondent, if a respondent-level contact rate is being computed). The denominator of this rate
Matthew Courser
CONTACTS
Contacts are a broad set of survey dispositions that
are used with all surveys (telephone, in-person, Internet, and mail), regardless of mode. The set of contact
dispositions includes all the kinds of contacts a survey
or interviewer (depending on the mode) might make
with a person or sampled household or unit.
Many of the most common types of contacts occur
in all surveys, regardless of the mode in which they
are conducted. These include completed interviews,
partial interviews, refusals, and breakoffs. Other, less
common types of contacts include cases in which contact is made with a respondent or sampled unit or
household but an interview is never started because
the sampled respondent is physically or mentally
unable to participate, or an interviewer is told the
respondent is unavailable to complete the questionnaire
during the entire field period. Contacts also include
cases involving language barriers (with a telephone or
in-person survey) and literacy issues relating to respondents not being able to read and understand the questionnaire, in the case of mail and Internet surveys. A
final type of contact occurs when it is determined that
the person or household is ineligible for the survey.
Of note, in many cases in mail and Internet surveys,
the researcher has no idea whether or not contact ever
was made with anyone at the sampling unit.
Contacts are used for computing contact rates for
surveys. A contact rate measures the proportion of all
cases in the sampling pool in which a member of a sampled household was reached by an interviewer (in
CONTENT ANALYSIS
As it relates to survey research, content analysis is
a research method that is applied to the verbatim
responses given to open-ended questions in order to
code those answers into a meaningful set of categories
that lend themselves to further quantitative statistical
analysis. In the words of Bernard Berelson, one of the early scholars explaining this method, "Content analysis is a research technique for the objective, systematic, and quantitative description of the manifest content of communication." By coding these verbatim
responses into a relatively small set of meaningful
categories, survey researchers can create new variables in their survey data sets to use in their analyses.
Figure 1   Coded categories from an open-ended "biggest problem" question, including affordable housing; honesty in government; immigration/illegal aliens; moral decline; housing; war in Iraq; national security/terrorism; and miscellaneous other mentions
Analytic Considerations
A general rule of thumb that many survey researchers
have found in doing content analyses of open-ended
answers is to code as many as three new variables for
each open-ended question. For example, if the open-ended question is Q21 in the questionnaire, then the
three new variables might be named Q21CAT1,
Q21CAT2, and Q21CAT3. This follows from experience that indicates that nearly all respondents will
give at least one answer to an open-ended question
(since most of these open-ended questions do not ask
for only one answer). Many respondents will give two
answers, and enough will give three answers to justify
coding up to three answers from each respondent.
When this approach is used, the researcher also may want to create other new dichotomous ("dummy") variables coded "0" or "1" to indicate whether each respondent did or did not mention a certain answer category. Thus, for the earlier example using the "biggest problem" question, new dichotomous variables could be created for each category (BUSH, CONGRESS, HONESTY, IMMIGRATION, etc.). For each of these new variables, the respondent would be assigned the value of "0" if she or he did not mention this category in the open-ended verbatim response and "1" if this category was mentioned.
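A sketch of this coding scheme in Python, using the Q21 naming from the example (the respondent mentions and category labels are invented):

# Build Q21CAT1-Q21CAT3 from up to three coded mentions per respondent,
# plus one 0/1 dummy variable per category.
responses = {
    1: ["HONESTY", "IMMIGRATION"],
    2: ["HOUSING"],
    3: ["HONESTY", "HOUSING", "MISC"],
}
categories = ["HONESTY", "IMMIGRATION", "HOUSING", "MISC"]

rows = []
for rid, mentions in responses.items():
    row = {"id": rid}
    for k in range(3):                      # up to three coded answers
        row["Q21CAT%d" % (k + 1)] = mentions[k] if k < len(mentions) else None
    for cat in categories:                  # dichotomous mention indicators
        row[cat] = 1 if cat in mentions else 0
    rows.append(row)
print(rows)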
Paul J. Lavrakas
See also Content Analysis; Open-Ended Question; Verbatim
Responses
CONTEXT EFFECT
The term context effect refers to a process in which
prior questions affect responses to later questions in
surveys. Any survey that contains multiple questions
is susceptible to context effects. Context effects have
the potential to bias the thinking and answers of
survey respondents, which reduces the accuracy of
answers and increases the error in survey measurement. Psychologists refer to context effects as the
general effect of priming. Priming occurs when the
previous activation of one type of information in
active memory affects the processing of subsequent
related information. For example, the prior presentation of the word doctor reduces the time it takes to
subsequently recognize the word nurse in comparison
to an unrelated word. This priming effect is thought
to occur because the activation of one concept spreads
and activates related concepts in the brain. Similarly,
for example, attempting to remember a list of words that all relate to "bed" (e.g., "sleep," "pillow," etc.) increases the likelihood that a person will falsely
remember that the related word was present in the list
during recall. In both cases, the previous context consistently primes, or biases, thinking in a certain direction by increasing the saliency of that information.
Context effects are most noticeable in attitude surveys. These context effects may occur (a) within a question and (b) between questions (also referred to as question order effects). An example of a within-question context effect is how the label "anti-abortion" instead of "pro-choice" affects attitudes toward abortion.
The wording choice leads the respondent to frame
a question in a certain way or increases the saliency
and importance of some information over other information within a question. A between-question context
effect occurs, for example, when previous questions
regarding attitudes toward an ongoing war influence
a subsequent question regarding presidential performance. Question order effects are evident in the fact
that answers to questions on related themes are more
similar and consistent when the questions are asked in
a group than when these questions are separated and
scattered throughout a questionnaire. Effects of question order are also evident when questions regarding
a negative life event lead to more negative attitudes for
subsequent questions regarding present feelings.
It is possible to control for context effects by counterbalancing question order across several versions of
CONTINGENCY QUESTION
Questions that are limited to a subset of respondents
for whom they are relevant are called contingency
questions. Relevancy is sometimes based on a respondent characteristic such as gender or age. For example,
it is typical to ask only women of childbearing age if
they are currently pregnant; conversely, only men are
asked if they have ever had a prostate cancer
CONTINGENCY TABLE
A contingency table (or cross-tabulation) is an effective way to show the joint distribution of two variables, that is to say, the distribution of one variable within the different categories of another. Data in the
table are organized in rows and columns. Each row
corresponds to one category of the first variable (usually considered as the dependent variable), while each
column represents one category of the second variable
(usually considered as an independent variable). The
intersection of a row and a column is called a cell.
Each cell contains the cases that have a certain combination of attributes corresponding to that row and column (see Table 1). Inside each cell a variety of
information can be displayed, including (a) the total
count of cases in that cell, (b) the row percentage
represented by the cell, (c) the column percentage
represented by the cell, and (d) the proportion of the
total sample of cases represented by that cell.
Generally, a contingency table also contains the
sums of the values of each row and column. These
sums are called the marginals of the table. The
sum of column or row marginals corresponds to the
sample size or grand total (in the lower right-hand cell
of the table).
The product of the number of rows by the number of columns is called the order of the table (Table 1, for example, is a 2 × 2 table), while the number of variables shown in the table represents its dimension.
Table 1   Education by Gender (each cell shows the count, row %, column %, and % of total)

                         Gender
Education       Category 1    Category 2     Total
Low                  32            87          119
                   26.9%         73.1%        100%
                   17.0%         37.5%        28.3%
                    7.6%         20.7%
High                156           145          301
                   51.8%         48.2%        100%
                   83.0%         62.5%        71.7%
                   37.1%         34.5%
Total               188           232          420
                   44.8%         55.2%        100%
                    100%          100%
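The percentages in Table 1 can be reproduced with a short Python sketch; the two gender categories are labeled generically here, and only the cell counts are taken from the table:

# Compute row %, column %, and % of total for each cell of a 2 x 2 table.
counts = {("Low", "Gender 1"): 32, ("Low", "Gender 2"): 87,
          ("High", "Gender 1"): 156, ("High", "Gender 2"): 145}
rows, cols = ["Low", "High"], ["Gender 1", "Gender 2"]

row_tot = {r: sum(counts[(r, c)] for c in cols) for r in rows}
col_tot = {c: sum(counts[(r, c)] for r in rows) for c in cols}
grand = sum(counts.values())

for r in rows:
    for c in cols:
        n = counts[(r, c)]
        print(r, c, n,
              "row %.1f%%" % (100 * n / row_tot[r]),
              "col %.1f%%" % (100 * n / col_tot[c]),
              "total %.1f%%" % (100 * n / grand))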
CONTINGENT INCENTIVES
Past research has shown that contingent incentives can
be used in survey research as a way of increasing survey response rates. The concept of contingent versus
noncontingent incentives is that a noncontingent incentive is given to the respondent regardless of whether the
survey task is completed, whereas giving a contingent
incentive is dependent on the respondent's completion
of the survey task, such as completing and returning the
questionnaire in a mail survey. Contingent incentives
are most commonly used with phone and Internet
surveys, although they can be used with any mode of
The decision to use a contingent incentive is somewhat independent from the decision to use a noncontingent incentive. If the survey budget can afford both,
researchers will still be somewhat at a loss as to how to
distribute the total value that will be used across the noncontingent and contingent incentives. That is, there is no
definite guidance provided by the research literature indicating the optimal balance between the value of a contingent incentive and a noncontingent
incentive when both are used in the same survey.
When considering which type of contingent incentive, if any, to use in a particular survey, the researcher
should consider the type of survey instrument (mailed,
phone, Internet, in-person), the relative importance of
the response rate, the level of effort required to complete the survey, the probable motivation of the sample
to comply without any incentive, and the need possibly
to differentially incent certain hard-to-reach demographic cohorts. For simple, short mailed surveys, short
phone interviews, and short Internet surveys, an incentive may not be needed. As the length and complexity
of the survey increases or respondent engagement (e.g.,
level of interest) decreases, the need to consider the use
of a noncontingent incentive is likely to increase.
The amount of contingent incentive offered to the
respondent should not be out of proportion to the
effort required to complete the survey. When a promised contingent incentive amount is the sole motivating
factor in the decision of a respondent to cooperate, the
respondent may put in a less than adequate effort in
accurately and completely answering the questions in
the survey. Researchers should be aware of this "buying cooperation" phenomenon. Some research organizations offer points for completing surveys that can
later be redeemed for prizes. Some firms form panels
of households that will complete numerous surveys
and can accumulate points over time and redeem them
for larger prizes.
Another use for contingent incentives is to persuade
the participants to return all materials and do so in
a timely manner. Participants may be motivated to meet a deadline for returns if they are aware that the
amount of the contingent incentive is at least partially
dependent on returning the materials by the cutoff date.
A concern to some researchers who are considering use of a contingent versus noncontingent incentive
with a mail or Internet survey (ones not administered by an interviewer) is the possibility of confusion
about whether the survey task (e.g., questionnaire)
was fully completed and returned in a timely manner.
CONTROL GROUP
In experimental designs, a control group is the
untreated group with which an experimental group
(or treatment group) is contrasted. It consists of units
of study that did not receive the treatment whose effect
is under investigation. For many quasi-experimental
studies, treatments are not administered to participants,
as in true experimental studies. Rather, treatments are
broadly construed to be the presence of certain characteristics of participants, such as female gender, adolescence, and low socioeconomic status (SES), or features
of their settings, such as private schools or participation
in a program of interest. Thus, the control group in
quasi-experimental studies is defined to be those lacking these characteristics (e.g., males, respondents who
are older or younger than adolescence, those of high
and medium SES) or absent from selected settings
(e.g., those in public schools, nonparticipants in a program of interest). Control groups may alternatively be
called baseline groups.
In a true experiment, control groups are formed
through random assignment of respondents, as in
between-subject designs, or from the respondents
themselves, as in within-subject designs. Random
assignment supports the assumption that the control
group and the experimental group are similar enough
(i.e., equivalent) in relevant ways so as to be genuinely
comparable. In true experimental studies and between-subject designs, respondents are first randomly selected
from the sampling frame; then they are randomly
assigned into either a control group or an experimental
group or groups. At the conclusion of the study, outcome measures (such as responses on one or more
dependent variables, or distributions on survey items)
are compared between those in the control group and
those in the experimental group(s). The effect of a treatment (e.g., a different incentive level administered to
each group) is assessed on the basis of the difference
(or differences) observed between the control group
and one or more experimental group.
Similarly, in within-subject designs, respondents
are randomly selected from the sampling frame. However, in such cases, they are not randomly assigned
into control versus experimental groups. Instead, baseline data are gathered from the respondents themselves. These data are treated as control data to be
compared with outcome measures that are hypothesized to be the result of a treatment after the respondents are exposed to the experimental treatment.
Thus, the respondents act as their own control group
in within-subject designs.
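Random assignment for a between-subject design can be sketched in a few lines of Python (the respondent IDs are hypothetical):

# Randomly assign 20 sampled respondents to a control or treatment group.
import random
random.seed(42)

respondents = list(range(1, 21))
random.shuffle(respondents)
half = len(respondents) // 2
control, treatment = respondents[:half], respondents[half:]
print(sorted(control))
print(sorted(treatment))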
Control groups are often used in evaluation studies
that use surveys, and they are also relevant to methodological research on surveys. Research that examines
the effects of questionnaire design, item wording, or of
other aspects of data collection often uses a classical
split-ballot design or some variant. In these studies,
respondents are assigned at random to receive one of
two versions of a questionnaire, each version varying
on a single point of question order, wording, or presentation. In practice, these studies often depart from the
conception of presence versus absence that typically
marks the contrast between treatment and control
CONTROLLED ACCESS
Any sampled housing unit to which access by a data
collector is physically blocked or impeded is considered
to be a situation of controlled access. Impediments may
include people (e.g., a gatekeeper), structures, and/or
animals. Controlled access situations are encountered
only in studies using the in-person field data collection
methodology. Dealing effectively with these impediments is necessary to further the objectives of a field
data collection operation.
their supervisors before using a covert method of gaining entry to a controlled access environment.
Researchers should include in their procedural manuals and training programs material on how to deal
effectively with various controlled access situations.
Strategies and tools for dealing with locked facilities,
complexes, and neighborhoods should be developed,
utilized, and continually enhanced in an effort to negotiate past these impediments. This is particularly important so that data collectors do not find themselves
taking unnecessary risks. They must be prepared to
exercise good judgment to avoid legal issues such as
trespassing or being injured attempting to surmount a
physical barrier or outrun an aggressive animal.
As our society becomes increasingly security- and
privacy-minded, the presence of controlled access
situations and facilities will similarly increase. It is
important for researchers to recognize this trend and
the potential negative effect on survey nonresponse
that controlled access situations represent.
Randall Keesling
See also Face-to-Face Interviewing; Field Survey; Field
Work; Gatekeeper
CONTROL SHEET
A control sheet, also called a case control form, is
used by interviewers in in-person (face-to-face) surveys to record information about the contact attempts
they make with households or persons who have been
sampled.
Similar in purpose to the call sheet used by telephone interviewers, the control sheet captures key
paradata about each contact attempt an interviewer
makes with the household or person. This includes (a)
the date of the contact attempt, (b) the time of day
of the contact attempt, (c) the outcome (disposition)
of the contact attempt, and (d) any additional information that is pertinent about the effort to make contact
(e.g., the name of the designated respondent if she or
he is not home at the time the attempt is made and
the best time to recontact her or him).
The information recorded on control sheets serves
several important purposes. First, it allows the interviewers and supervisory field staff to better control
the processing of the sample according to the a priori
CONVENIENCE SAMPLING
Convenience sampling is a type of nonprobability
sampling in which people are sampled simply because
they are convenient sources of data for researchers.
In probability sampling, each element in the population
has a known nonzero chance of being selected through
the use of a random selection procedure. Nonprobability
sampling does not involve known nonzero probabilities
of selection. Rather, subjective methods are used to
decide which elements should be included in the sample. In nonprobability sampling, the population may
not be well defined. Nonprobability sampling is often
divided into three categories: purposive sampling, convenience sampling, and quota sampling.
Convenience sampling differs from purposive sampling in that expert judgment is not used to select a
representative sample of elements. Rather, the primary
selection criterion is the ease of obtaining the sample, which in turn depends on the cost of locating elements of the population, the geographic distribution of the sample, and the cost of obtaining interview data from the selected elements. Examples of convenience samples include mall intercept interviewing,
unsystematically recruiting individuals to participate in
the study (e.g., what is done for many psychology studies that use readily available undergraduates), visiting
a sample of business establishments that are close to
the data collection organization, inviting individuals visiting a Web site to participate in a survey, and including a brief questionnaire in a coupon mailing. In convenience sampling, the representativeness of the sample is generally less of a concern
than in purposive sampling.
For example, in the case of a mall intercept survey
using a convenience sample, a researcher may want
data collected quickly using a low-cost method that
does not involve scientific sampling. The researcher
sends out several data collection staff members to
interview people at a busy mall, possibly on a single
day or even across a weekend. The interviewers may,
for example, carry a clipboard with a questionnaire
that they may administer to people they stop in the
mall or give to people to have them fill out. This variation in convenience sampling does not allow the
researcher (or the client) to have any sense of what
target population is represented by the sample.
Although convenience samples are not scientific samples, they do on occasion have value to researchers
and clients who recognize their severe limitations; for
example, they may allow some quick exploration of
a hypothesis that the researcher may eventually plan
to test using some form of probability sampling.
Mike Battaglia
See also Mall Intercept Survey; Nonprobability Sampling;
Probability Sample; Purposive Sample
CONVENTION BOUNCE
Support for presidential candidates usually spikes during their nominating conventions, a phenomenon so reliable that its measurement has become a staple of pre-election polling and commentary. Some of these convention bounces have been very short-lived, the race quickly reverting to its pre-convention standing between the candidates. Others have been more profound: a coalescing of voter preferences that has charted the course for the remaining campaign.
While convention bounces have been apparent since
1968 (previous election polling was too infrequent for
reliable identification of such bounces), focus on the
convention bounce owes much to Bill Clinton, who
soared from a dead heat against Republican presidential incumbent George H. W. Bush before the 1992
Democratic convention to nearly a 30-point lead after
it. While the race later tightened, Clinton never again
trailed in pre-election polls.
No bounce has matched Clinton's, but others are
impressive in their own right. Jimmy Carter rode
a 16-point bounce to a 33-point lead after the 1976
Democratic convention, lending authority to his challenge and underscoring incumbent Gerald Ford's
weakness. Ford in turn mustered just a 7-point bump
following the 1976 Republican convention; while the
race tightened at the close, Carter's higher bounce
foretold his ultimate victory.
If a solid and durable bounce suggests a candidate's
strength, its absence can indicate the opposite. Neither
Hubert Humphrey nor George McGovern took significant bounces out of their nominating conventions in
1968 and 1972, both en route to their losses to
Richard Nixon.
Assessment
Standards for assessing the bounce differ. While it
sometimes is reported among likely voters, it is
more meaningfully assessed among all registered
voters, which is a more stable and more uniformly
defined population. And the fullest picture can be
drawn not by looking only at change in support for
the new nominee, but (offense sometimes being the best defense in politics) at the change in the margin
between the candidates, to include any drop in support
for the opposing candidate. For example, the 1968
Republican convention did more to reduce Humphrey's support than to bolster Nixon's.
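The margin-based assessment is simple arithmetic. A minimal sketch, with hypothetical poll numbers rather than figures from any actual race, of how a bounce measured as the change in the margin can differ from one measured as the change in the nominee's own support alone:

    def bounce_in_margin(pre_nominee, pre_opponent, post_nominee, post_opponent):
        # Change in the nominee-minus-opponent margin, in percentage points
        return (post_nominee - post_opponent) - (pre_nominee - pre_opponent)

    # Hypothetical: the nominee gains 4 points while the opponent loses 3,
    # so the margin-based bounce is 7 points, not 4.
    print(bounce_in_margin(45, 45, 49, 42))  # 7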
Timing can matter as well; surveys conducted
closer to the beginning and end of each convention
better isolate the effect. In 2004, Gallup polls figured
John Kerry's bounce from a starting point measured 5
Causes
The basis for the bounce seems clear: a specific candidate dominates political center stage for a week, laying
out his or her vision, burnishing his or her credentials
and, directly or through surrogates, criticizing his or her opponent. It takes a problematic candidate, an off-key convention, or an unusually immovable electorate not to turn the spotlight into support.
But exposure is not the sole cause; while airtime
for network coverage of the conventions has declined
sharply over the years, the bounces haven't. The two
national conventions received a total of 73 hours
of broadcast network coverage in 1968, declining
sharply in ensuing years to a low of 6 hours in 2004
(as reported by Harold Stanley and Richard Niemi in
Vital Statistics on American Politics 2003–2004).
Audience ratings likewise dropped. Yet there is no
significant relationship between hours of network
coverage and size of convention bounces. Indeed, the
CONVERSATIONAL INTERVIEWING
Conversational interviewing is also known as flexible interviewing or conversationally flexible interviewing. These terms refer to an alternative style of
survey interviewing that allows deviations from the
norms of standardized interviewing. Under conversational interviewing procedures, interviewers are
allowed to ask respondents whether they understood a question and to provide unscripted feedback to clarify the meaning of questions as necessary.
Proponents of conversational interviewing techniques argue that standardized procedures may reduce
the accuracy of survey responses because standardization precludes conversational interactions that may be
required for respondents to understand some questions. A key distinction between standardized and
conversational interviewing is that standardization
requires the interpretation of questions to be accomplished entirely by respondents. A central tenet of
standardized interviewing is that interviewers must
always read questions, response options, and instructions to respondents exactly as they are scripted. Further definitions, clarifications, or probes can only be
read in standardized interviews if these elements are
included in the interview script. A second tenet of
standardized interviewing is that any probes used by
interviewers must be nondirective, so that the probes
do not lead respondents to give particular answers. As
a result, standardized interviewers can only provide
clarification when respondents request it, and can then
only provide standardized forms of assistance such as
nondirective probes.
In conversational interviewing, interviewers can
provide whatever information is needed to clarify
question meaning for respondents, and they can provide these clarifying statements whenever they perceive respondents are having difficulty understanding
a question. Proponents of conversational interviewing
hypothesize that these more flexible techniques can
produce more accurate survey responses by standardizing the meaning of questions, not the wording or
exact procedures used to administer the questions.
Because the same terms can have different meanings
to different respondents, conversational interviewing
COOPERATION
Cooperation is a term used by survey researchers that
refers to the degree to which persons selected (sampled) to participate in research accept (agree to) their
invitation and engage (cooperate) in the research process. The composition of the group under study is
a fundamental (and vitally important) consideration in
the design, execution, and interpretation of a survey.
A researcher must both identify and collect information from an appropriate sample in order to successfully and validly answer the research question.
Ideally, the rate of cooperation among those sampled
will be very high.
Applied to a specific study, cooperation refers to the
breadth of participation that researchers are able to elicit
from those that they have chosen to study. To help
objectively measure levels of cooperation within a study,
the American Association for Public Opinion Research
(AAPOR) developed a series of standard definitions
that include how to define and compute cooperation
rates. AAPOR's cooperation rates are mathematical
formulae that reflect the proportion of respondents who
actually participate in a survey divided by all of the
sampled cases that are ever contacted, and are eligible,
to participate in the survey.
Together with the response, refusal, and contact
rates, the cooperation rate is included in a category
of formulas collectively known as outcome rates.
These rates are calculated by survey researchers in
order to better understand the performance of surveys.
Methods sections of survey reports typically include
at least some information regarding these rates.
research and the public. To this end, CMOR has published and encourages all researchers to adhere to the
Respondent Bill of Rights. It also encourages members
of the profession to use the same outcome rate calculations to ensure that there are consistent measures in
the profession.
Patrick Glaser
See also American Association for Public Opinion Research
(AAPOR); Cooperation Rate; Council for Marketing and
Opinion Research (CMOR); Incentives; LeverageSaliency Theory; Nonresponse; Nonresponse Error;
Respondent Burden; Standard Definitions
Further Readings
COOPERATION RATE
The cooperation rate to a survey indicates the extent
to which contacted individuals cooperate with a
request to participate in a survey. It is often mistakenly reported or interpreted as the response rate. Generally, the cooperation rate is the ratio of all cases
interviewed out of all eligible units ever contacted,
whereas a response rate is the ratio of all cases interviewed out of all eligible sample units in the study,
not just those contacted.
The American Association for Public Opinion
Research (AAPOR), which has established a standard
definition of the cooperation rate, offers at least four
ways to calculate it. The numerator includes all completed interviews but may or may not include partial
interviews. The denominator includes all eligible sample units that were contacted (including refusals and
other non-interviews that may have been contacted),
but may or may not include sample units that are
incapable of cooperating (e.g., because of health or
language barriers).
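As a rough illustration of such a calculation, the sketch below computes a cooperation rate from simplified case counts; the argument names and the toggles are assumptions for illustration, and the authoritative formulas are those in AAPOR's Standard Definitions:

    def cooperation_rate(completes, partials, refusals, other_contacted,
                         incapable, count_partials=False, count_incapable=False):
        # Numerator: completed interviews, optionally including partials
        numerator = completes + (partials if count_partials else 0)
        # Denominator: all eligible sample units ever contacted; units
        # incapable of cooperating may be included or excluded
        denominator = (completes + partials + refusals + other_contacted
                       + (incapable if count_incapable else 0))
        return numerator / denominator

    # Hypothetical counts: 600 completes, 50 partials, 250 refusals,
    # 100 other contacted non-interviews, 40 incapable of cooperating
    print(round(cooperation_rate(600, 50, 250, 100, 40), 3))  # 0.6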
CORRELATION
Correlation is a statistical measure of the relationship,
or association, between two or more variables. There
are many different types of correlations, each of which
measures particular statistical relationships among and
between quantitative variables. Examples of different
types of correlations include Pearson's correlation (sometimes called product-moment correlation), Spearman's correlation, Kendall's correlation, intraclass correlation, point-biserial correlation, and others.
The nature of the data (e.g., continuous versus dichotomous), the kind of information desired, and other factors can help determine the type of correlation measure
that is most appropriate for a particular analysis.
The value of the correlation between any two variables is typically given by a correlation coefficient,
which can take on any value between and including −1.00 (indicating a perfect negative relationship) up to and including +1.00 (indicating a perfect positive relationship). A positive correlation between two
variables means that as the value of one variable
increases, the value of the second variable tends to
increase. A negative correlation means that as the
value of one variable increases, the value of the second variable tends to decrease. A correlation that is
equal to zero means that as one variable increases or
decreases, the other does not exhibit a tendency to
change at all.
One frequently used measure of correlation is Pearson's correlation; it measures the linearity of the relationship between two variables. The Pearson correlation coefficient is calculated by dividing the covariance of the two variables by the product of their standard deviations. That is, for n pairs of values of the variables x and y, the value of Pearson's correlation is
$$r = \frac{\dfrac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\dfrac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2}\,\sqrt{\dfrac{1}{n}\sum_{i=1}^{n}(y_i - \bar{y})^2}}$$
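For illustration, the coefficient can be computed directly from this definition. A minimal sketch, assuming two equal-length lists of numeric values:

    import math

    def pearson_r(x, y):
        # Pearson's r: covariance divided by the product of the standard deviations
        n = len(x)
        mean_x, mean_y = sum(x) / n, sum(y) / n
        cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
        sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / n)
        sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / n)
        return cov / (sd_x * sd_y)

    print(round(pearson_r([1, 2, 3, 4], [2, 4, 5, 9]), 3))  # 0.965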
Table 1   Average Number of Cigarettes Smoked/Day
Figure 1   [Scatterplot; horizontal axis: Income ($1,000s)]
Organizational Structure
A volunteer board of directors and volunteer committees set CMOR's policy and vision and determine the direction of CMOR's initiatives. Members are drawn
from all sectors of the research industry: full-service
research firms, data collection companies, research
analysts, and end users.
CMOR is structurally organized into two separate
departments: Respondent Cooperation and Government
Affairs. Each department maintains a permanent volunteer committee in order to drive the organization's
work. Additional committees are formed on an ad hoc
basis. A professional staff person oversees both departments and acts as a liaison with counterparts in the
other research organizations. Further, professional staffers are hired to head each department and support staff
assist in implementing the initiatives.
Respondent Cooperation and Government Affairs
are inextricably related due to government's influence,
through legislation, over what methods for conducting
research are deemed legal and how this may affect the
validity of research and the ability of the researcher to
achieve respondent cooperation. Conversely, Respondent Cooperation is partially a reflection of the public's perceptions and attitudes toward research, and it
may play a very strong role in the types of legislation
that are proposed and adopted as law.
Respondent Cooperation
With regard to respondent cooperation, CMOR's mission is to evaluate the public's perceptions of the
research process, to measure the effects of alternative
methods of improving respondent cooperation, and
to provide a foundation upon which to build an
improved set of industry guidelines. Since its formation, CMOR has worked to increase respondent cooperation and has advocated the importance and
necessity of marketing and opinion research to the
general public. Objectives related to respondent cooperation include the following:
• Provide objective information about the level of cooperation in surveys
• Monitor the ever-changing research environment
• Develop industry-accepted and -supported solutions to improve respondent relations
• Educate and develop training programs for our members and members of the research community about the issues affecting respondent cooperation and about CMOR's efforts to improve participation
• Educate the research community's external audiences, including the public, media, and businesses, about the value of research and their participation in legitimate research surveys and polls
• Promote the social utility and value of survey research
• Act quickly to provide guidance to our members and the research community about environmental issues that may affect cooperation
Government Affairs
In terms of government affairs, CMOR's mission is to
monitor relevant legislative and regulatory activity, to
ensure that the interests of the research community are
protected, and to educate industry members regarding
relevant legislative, statutory, and regulatory issues.
The following are among the objectives in this area:
• Monitor and respond to legislative and regulatory activities that affect the research industry
• Educate CMOR members and members of the research community about the legislative and regulatory measures that threaten research and about CMOR's efforts to protect the research industry
• Educate CMOR members and members of the research community about existing statutes and regulations that impact the research industry
• Educate lawmakers and policymakers about the value of research, the distinction between research and sales-related activities, and the negative implications restrictive measures have on research
• Respond to abuses of the research process and work with lawmakers and government officials to regulate and prosecute such activities
• Act proactively on legislative and regulatory measures
• Build coalitions with other organizations to use as resources of information and to strengthen our ability to act on restrictive and proactive legislative and regulatory measures
Kathy Pilhuj
See also Council of American Survey and Research
Organizations (CASRO); Federal Communication
Commission (FCC) Regulations
Professional Development
CASRO University is a professional development curriculum that provides certificates in Survey Research
Practice, Business Management, Project Management,
and Privacy Management. CASRO University includes
an annual series of conferences, workshops, Webcasts,
and other professional development and educational programs that contribute to the career development of survey researchers. CASRO and CASRO University work
in cooperation with academic programs as well, including the graduate degree programs in survey research at
the University of Georgia (Athens), University of Texas
(Arlington), University of Wisconsin (Madison), Southern Illinois University (Edwardsville), and the Market
Research Institute International (MRII). CASRO Financial Reports include annual Financial and Compensation
Surveys, as well as an annual Data Collection Survey.
Self-Regulation
The CASRO Government and Public Affairs (GPA)
program monitors, lobbies as appropriate, and provides guidance on compliance with legislation and
regulations that impact survey research. In addition,
the CASRO GPA proactively protects professional
COVARIANCE
Covariance is a measure of association between two
random variables. It has several applications in the
design and analysis of surveys.
The covariance of two random variables, X and Y,
is equal to the expected product of the deviations
between the random variables and their means:
$$\mathrm{Cov}(X, Y) = E\big[(X - \mu_X)(Y - \mu_Y)\big].$$
Under a design-based perspective on surveys, the
sample inclusion indicators are random variables, and
covariance is present when the probabilities of inclusion are correlated.
For a simple random sample of n units from a population of size N, the covariance between the means $\bar{x}$ and $\bar{y}$ is estimated as

$$\mathrm{cov}(\bar{x}, \bar{y}) = \left(1 - \frac{n}{N}\right)\frac{1}{n}\cdot\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y}).$$
This is equivalent to the variance formula when xi
and yi are the same for each unit in the sample. For
complex sample surveys, standard variance estimation
techniques, such as Taylor series linearization, balanced repeated replication, or jackknife replication,
can be used to compute covariance.
Covariance can be written as a function of the correlation $\rho(\bar{x}, \bar{y})$:

$$\mathrm{cov}(\bar{x}, \bar{y}) = \rho(\bar{x}, \bar{y})\sqrt{\mathrm{var}(\bar{x})}\,\sqrt{\mathrm{var}(\bar{y})}.$$
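A minimal sketch of the simple-random-sampling estimator above, assuming paired numeric observations and a known population size N:

    def cov_of_means(x, y, N):
        # Estimated covariance of the sample means under simple random
        # sampling, including the finite population correction (1 - n/N)
        n = len(x)
        mean_x, mean_y = sum(x) / n, sum(y) / n
        s_xy = sum((a - mean_x) * (b - mean_y)
                   for a, b in zip(x, y)) / (n - 1)
        return (1 - n / N) * s_xy / n

    # Hypothetical sample of 4 units drawn from a population of 100
    print(cov_of_means([2, 4, 6, 8], [1, 3, 2, 6], N=100))  # 1.12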
The percentage of females in the population is estimated as 48% based on respondents only but as 50% from the full sample, for a bias of 2 percentage points. Using the
appropriate variance estimation method, the variances
are found to be 1.2 for the estimate from respondents
and 1.0 for the full sample, with a covariance of 0.9.
Taking into consideration the correlation between
estimates from the full sample and estimates from
respondents only, the variance of the bias is 0.4
= 1:2 + 1:0 2 0:9. Using a t-test to test the null
hypothesis that the bias is equal to zero, the p-value is
found to be < 0:001, indicating significant bias in the
estimate of females. However, if the covariance term
is ignored, the variance of the bias is calculated as
2.2, and the bias is no longer determined to be statistically significant.
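The arithmetic of the example can be checked directly:

    # Variance of the difference of two positively correlated estimates:
    # var(bias) = var(respondents) + var(full sample) - 2 * covariance
    var_resp, var_full, cov = 1.2, 1.0, 0.9
    print(round(var_resp + var_full - 2 * cov, 1))  # 0.4, the correct variance
    print(round(var_resp + var_full, 1))            # 2.2, overstated if cov is ignored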
Ignoring the covariance term leads to an overestimation of the variance of the difference of the
estimates, given the two estimates are positively
correlated. This result is important in other survey
contexts, such as comparing estimates between two
time periods for a longitudinal survey or from different subdomains involving clustering. Covariance also
has several other applications in surveys, including
COVERAGE
The term coverage, as used in survey research, indicates how well the sampling units included in a particular sampling frame account for a surveys defined
target population. If a sampling frame does not contain
all the units in the target population, then there is
undercoverage of the population. If the frame contains
duplicate units or other units beyond those contained in
the population, then there is overcoverage. Undercoverage and overcoverage do not necessarily mean there
will be coverage error associated with the frame.
Overcoverage occurs when members of the survey population are included in the survey sampling frame more than once or when ineligible units are erroneously included. Noncoverage (including undercoverage) occurs
when members of the targeted population are erroneously excluded from the survey sampling frame. The
meaning of the term noncoverage is not the same as
the meaning of unit nonresponse, which is the failure
to obtain complete survey data because of issues such
as noncontacts, refusals, lost questionnaires, and so on.
Both overcoverage and noncoverage can occur
at several junctures during the survey process. For
example, in population surveys in which the sample is
selected in two or more stages to obtain estimates of
persons within households, coverage errors may occur
at any or all stages when creating the sampling frame
of primary sampling units, during field listing of housing units, or when creating a household roster of persons within a given family. Noncoverage that occurs
during field listing can result if members of the survey
sample are excessively expensive to locate or are part
Noncoverage
Noncoverage can occur when sampling units are
omitted or missing from the sampling frame. For
example, a sampling frame of business establishments
may omit newly created businesses, or an administrative system may exclude units that failed to submit
reports, or newly constructed buildings may be omitted from a housing survey. This will result in an
incomplete frame from which the sample is drawn.
Biases in the resulting survey estimates can occur
when it is incorrectly assumed that the frame is complete or, when units are known to be missing from the sampling frame, that the missing units are similar to those included in the frame.
A special case of noncoverage can be attributed to
sampling units that are misclassified with respect to
key variables of interest, such as a person's race/ethnicity or a household's vacancy status. When these
key variables are missing, the sampling units cannot be
properly classified in order to determine their eligibility
status for the survey. In population household surveys,
groups such as homeless persons or constant travelers
are generally excluded from coverage. Special procedures may be necessary to account for these groups to
prevent understating these populations in the survey
estimates. Alternatively, if this is not feasible, it is
important that published survey results document the
limitations in coverage and possible errors in the survey estimates associated with imperfect coverage.
Overcoverage
Overcoverage can occur when the relationship between
sampling units is not properly identified, resulting in
duplicate or erroneous entries on the sampling frame.
For instance, use of lists to develop the survey sampling frame might overlook events such as business
COVERAGE ERROR
Coverage error is a bias in a statistic that occurs when
the target population does not coincide with the population actually sampled. The source of the coverage
error may be an inadequate sampling frame or flaws
in the implementation of the data collection. Coverage
error results because of undercoverage and overcoverage. Undercoverage occurs when members of the target population are excluded. Overcoverage occurs
when units are included erroneously. The net coverage error is the difference between the undercoverage
and the overcoverage.
Bias in Descriptive and Analytical Statistics
Both undercoverage and overcoverage are biases and
therefore may distort inferences based on descriptive or
analytical statistics. Weaknesses in the sampling frame
or the survey implementation create coverage error by
compromising the random selection and thus how
representative the resulting sample is of the target population. This is particularly the case if the cause of the
coverage error is correlated with the characteristics
being measured.
The amount of bias in descriptive statistics, such as
means and totals, from undercoverage depends on the
proportion of the population not covered and whether
the characteristics of individuals not covered differ from
those who are. If those not covered are merely a simple
random sample of the population, then means will not
be biased, although totals may be. For example, when
estimating the mean, excluding individuals in the target
population will not bias the mean if the mean of those
covered equals the mean of those not covered. However, usually the exclusion of individuals is not random.
More often, the excluded individuals are difficult to
identify and to contact for interviews because of their
characteristics. For example, a telephone survey measuring income would exclude individuals with low
incomes who could not afford a telephone.
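One common way to write the resulting bias in a mean estimated from the covered population, with symbols supplied here for illustration, is

$$\bar{Y}_c - \bar{Y} = \frac{N_{nc}}{N}\left(\bar{Y}_c - \bar{Y}_{nc}\right),$$

where $\bar{Y}$ is the target-population mean, $\bar{Y}_c$ and $\bar{Y}_{nc}$ are the means among covered and noncovered units, and $N_{nc}/N$ is the proportion of the population not covered. The bias vanishes only when no units are excluded or when the two group means coincide.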
Coverage error also may affect analytical statistics,
such as regression coefficients. The amount of bias in
a regression coefficient from undercoverage depends
on the ratio of the dependent variables variance in
the target population to that in the covered population
and the quality of the fit of the regression model in
the target population. If the variance of the dependent
Longitudinal surveys that interview a sample periodically over a period of years have the potential for
coverage error due to attrition, in addition to coverage
concerns at the time of the initial sample selection.
One approach is to estimate the attrition rate and then
draw an initial sample large enough to produce
a desired sample size at the end. Adjustments for the
attrition may be made in the estimation.
Statistics estimated with survey data may be compared to statistics estimated with the auxiliary data. Although the
auxiliary data source may be available for only some
of the characteristics the survey measures, such a comparison provides guidance regarding coverage error.
When using an auxiliary source for estimating coverage error, the researcher also has to be concerned
about the coverage error in the auxiliary source. Even
a census, which is often used to judge whether coverage error exists, may have coverage error itself. For
the U.S. Population Census in 2000, two different
methods estimated coverage error. Both found the net
coverage error for the population overall to be very
close to zero, but also found that the net coverage
error rate was not uniform across the population. To
illustrate the differential coverage error within groups,
both methods estimated undercoverage for black males
and overcoverage for nonblack females.
Coverage Error in Establishment Surveys
Establishment surveys have their own unique sources
of coverage error. Miscoding of industry, size, geographic location, or company structure may lead to
frame errors that result in coverage error. The list
frame may not be updated often enough to reflect the
population corresponding to the survey reference
period. Changes that make frames out of date include
acquisitions, mergers, and growth in one line of business. In addition, the maintenance process for the list
may not enter new businesses in the frame in a timely
manner. Businesses that are no longer operating may
remain on the list for some time after they close.
There may be a delay in recording changes in a
business that would cause its industry or size coding
to change.
For the United States, Dun & Bradstreet has a list
of businesses that is publicly available. These listings
have addresses and telephone numbers. When a business has more than one location, researchers have to
decide whether the target population is establishments
or a more aggregated level within the company. The
U.S. Census Bureau maintains its own list of businesses for its surveys, but the list is not available to
the public.
Small businesses pose more difficult coverage
error concerns because they are less stable than
larger businesses. The process for forming the large
lists is unable to keep up with the start-ups and failures in small businesses. Sometimes researchers use
multiple-frame methodology that relies on a list
frame and an area frame to reduce the potential for
coverage error.
Mary H. Mulry
See also Area Frame; Attrition; Auxiliary Variable;
Face-to-Face Interviewing; Frame; Half-Open Interval;
Mail Survey; Multiple-Frame Sampling; Overcoverage;
Raking; Random-Digit Dialing (RDD); Target Population
COVER LETTER
A cover letter accompanies or transmits another document such as a survey questionnaire. Its purpose is to
alert the respondent about the questionnaire it accompanies and to provide the details of requested actions
on the part of the respondent. When used as a part of
multiple communications or overall research strategy,
such as an advance contact or future reminder mailings, it can help increase response by conveying important information (e.g., research topic, survey sponsor,
incentives) that is likely to influence a respondents
decision to cooperate and/or to comply fully and accurately with the survey task. As with all communications (including the questionnaire), the cover letter
should be written in a way that maximizes the likelihood of participation and minimizes or eliminates any
possible objectionable content.
Cover letters are an accepted and commonly used
part of good survey design. There is a large amount
of experimental research available on cover letter
style, layout, elements, wording, and so on.
Elements
The elements listed following are used commonly in professional letters; they assume the use of common word processing and mail merge software.

Date
The date that the questionnaire is mailed is important to include. Giving no date or just month and year
would be conspicuous and would fail to convey the
timing of the request you are making to get the completed questionnaire returned.
Name of Addressee
Listing the address helps convey the personalization of the survey request. Be sure to include all relevant addressing elements to assist with accurate
delivery, such as apartment number, lot or unit number, and the ZIP+4 extension if available.
Salutation
Complimentary Close
Real Signature
The complimentary close is followed by the signature, four lines down from the close, which states the
writer's full name and, below that, her or his title. The
use of an actual signature, using a ballpoint pen or a blue-ink digital signature, has been found to raise response
rates compared to no signature or a machine-imprinted
signature. However, the use of an actual signature is
judged to be impractical by most researchers when
sample sizes are large. The actual (real) name of a person at the survey organization should be used, as it is
unethical to use a fictitious name.
Postscript
Usually, a postscript (P.S.) is read by the respondent. Careful consideration of what might or should be
included in the postscript is important.
Charles D. Shuttles and Mildred A. Bennett
See also Advance Letter; Confidentiality; Economic
Exchange Theory; Informed Consent; Leverage-Saliency
Theory; Refusal Avoidance; Social Exchange Theory;
Total Design Method (TDM)
Further Readings
CRONBACH'S ALPHA
Cronbach's alpha is a statistic that measures the internal consistency among a set of survey items that (a)
a researcher believes all measure the same construct,
(b) are therefore correlated with each other, and
(c) thus could be formed into some type of scale. It
belongs to a wide range of reliability measures.
A reliability measure essentially tells the researcher
whether a respondent would provide the same score on
a variable if that variable were to be administered again
(and again) to the same respondent. In survey research,
the possibility of administering a certain scale twice to
the same sample of respondents is quite small for many
reasons: costs, timing of the research, reactivity of the
cases, and so on. An alternative approach is to measure
internal consistency among the items. In its standardized form, Cronbach's alpha can be computed from the number of items, n, and the average correlation among the items, $\bar{r}$:

$$\alpha = \frac{n\bar{r}}{1 + \bar{r}(n - 1)}.$$
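A minimal sketch of this standardized computation, assuming each item is given as a sequence of respondent scores (statistics.correlation requires Python 3.10 or later):

    from statistics import correlation  # Pearson's r; Python 3.10+

    def standardized_alpha(items):
        # items: a list of k item-score sequences, one per survey item
        k = len(items)
        pairs = [(i, j) for i in range(k) for j in range(i + 1, k)]
        # Average inter-item correlation across all item pairs
        r_bar = sum(correlation(items[i], items[j]) for i, j in pairs) / len(pairs)
        return k * r_bar / (1 + r_bar * (k - 1))

    # Three hypothetical items scored by five respondents
    items = [[3, 4, 2, 5, 1], [2, 4, 3, 5, 2], [3, 5, 2, 4, 1]]
    print(round(standardized_alpha(items), 2))  # 0.93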
CROSSLEY, ARCHIBALD
(1896–1985)
Archibald Maddock Crossley was born on December
7, 1896, in Fieldsboro, New Jersey. His love for the
state of his birth carried him to Princeton University in
1917; he later worked for a small advertising firm
based in Philadelphia. Crossley's research career began
soon afterward, in 1918, when he was asked by an
executive in his firm to create a research department,
something he knew nothing about. Once the department was created, Crossley began work on Crossley
Rating, which many believe is the first ratings system.
Using this rating, one could estimate the number of
telephone subscribers tuned in to any radio show at
any given time. Creating the ratings was no easy task,
requiring various Crossley aides to thumb through telephone books covering more than 80 U.S. cities. From
these telephone books, researchers were able to randomly call individuals and determine to what programs
they were listening. For 16 years, people were asked
one by one until May 1942, when Crossley's rating
system was replaced with a simpler Hooper telephone
poll. Even though Crossley's measure gave no indication about what people thought of a program, it was
still used to get a sense of what programs people were
listening to, which soon became synonymous with
good and bad programming, similar to the Nielsen and
Arbitron ratings systems of today.
Crossley's work in radio ratings served as a catalyst
for other research endeavors, leading him to form
Crossley, Inc., in 1926, a company that still operates
today under the name Crossley Surveys, created in
1954 when Crossley, Inc., merged with another firm.
During this time, Crossley collaborated with George
Gallup and Elmo Roper and successfully predicted
the 1936 presidential election, which was made infamous in public opinion circles after the Literary
Digest incorrectly predicted Alfred Landon would
defeat Franklin D. Roosevelt, an error that Crossley
and others attributed to sample bias and the misanalysis of poll returns. This experience led Crossley
to participate actively in the establishment of the
Market Research Council, the National Council on
Public Polls, and the American Association for Public Opinion Research (AAPOR).
CROSS-SECTIONAL DATA
Cross-sectional data are data that are collected from
participants at one point in time. Time is not considered one of the study variables in a cross-sectional research design.
However, there also are disadvantages with cross-sectional data. For example, cross-sectional data are
not appropriate for examining changes over a period
of time. Thus, to assess the stability of social or psychological constructs, longitudinal data are required.
Sociologists, in particular, made significant contributions to the early design and conduct of cross-sectional studies. One of the major contributors in
cross-sectional design and the use of cross-sectional
data was Paul Lazarsfeld. Leslie Kish made significant contributions on how to sample subjects from
a target population for cross-sectional data.
Cong Liu
See also Attrition; Cross-Sectional Survey Design; Field
Period; Focus Group; Interviewer; Longitudinal Studies;
Sampling; Survey
Further Readings
CROSS-SECTIONAL SURVEY DESIGN

Design Considerations
The principles of cross-sectional survey design are
those that one would normally think of for survey
design in general. Designing a panel survey would be
similar, except that provisions would need to be made
in sampling, operations, and questionnaire design in
light of the need to maintain contact with respondents
and collect repeated measurements on variables of
interest. Some of the considerations particular to panel
surveys could apply to a cross-sectional survey that is
to be repeated in the future.
The steps in designing a cross-sectional survey may
be thought of as (a) conceptualization (or research
design), (b) sample design, (c) questionnaire (or other
data collection instrument) design, and (d) operations
planning.
Conceptualization
The sample design builds on the process of conceptualization. Steps in designing the sample include
the following:
1. Selecting (or planning to construct) a sampling
frame
2. Defining the strata, if any, to be employed
3. Deciding whether the sample is to be a single-stage, clustered, or multi-stage design, and
4. Determining the sample size
The questionnaire design also flows from the conceptualization process. The questionnaire or other
instrument translates the dependent and independent
variables into specific measurements. Often, questions
available from previous studies can be used or
adapted; sometimes new items must be developed.
Scales to measure attitudes or psychological constructs may be available from the survey research or
psychological literature. New items will require cognitive testing and pretests. The form of the questions
will depend in part on the mode of data collection: for
example, show cards cannot be used in a telephone
survey.
Other considerations in questionnaire design
include the overall length of the instrument, skip patterns, and the possibility of question ordering effects.
CUTOFF SAMPLING
Cutoff sampling is a sampling technique that is most
often applied to highly skewed populations, such as
business establishments that vary considerably in
employee size, gross revenues, production volume, and
so on. Data collected on establishment surveys (from
businesses or other organizations, including farms) are
often heavily skewed. For any variable of interest there would be a few large values and many more progressively smaller ones. Therefore, most of the volume
for a given data element (variable) would be covered
by a small number of observations relative to the number of establishments in the universe of all such establishments. If a measure of size is used, say, number of
employees or a measure of industrial capacity or some
other appropriate measure, then the establishments can
be ranked by that size measure. A cutoff sample would
not depend upon randomization, but instead would
generally select the largest establishments, those at or
above a cutoff value for the chosen measure of size.
This is the way cutoff sampling is generally defined,
but the term has other interpretations. Four methods
are discussed here.
Cutoff sampling is used in many surveys because of
its cost-effectiveness. Accuracy concerns (for example, noncoverage bias from excluding part of the population) differ from those in design-based sampling and are mentioned below. Note that cutoff sampling
could be used for other than establishment surveys, but
these are where it is generally most appropriate.
Of the following methods, the first two are probably
more universally considered to be cutoff sampling:
Method 1. Assign a probability of one for sample
selection for any establishment with a measure of size
at or above (or just above) a cutoff value, and a zero
probability of selection for all establishments with
a measure of size below (or at or below) that cutoff.
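A minimal sketch of Method 1, assuming a frame of (establishment, size measure) pairs and a chosen cutoff value:

    def cutoff_sample(frame, cutoff):
        # Probability 1 of selection at or above the cutoff; probability 0 below it
        return [(name, size) for name, size in frame if size >= cutoff]

    # Hypothetical frame: establishments with employee counts as the size measure
    frame = [("A", 1200), ("B", 85), ("C", 430), ("D", 12), ("E", 250)]
    print(cutoff_sample(frame, cutoff=100))  # [('A', 1200), ('C', 430), ('E', 250)]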
Further Readings
Elisson, H., & Elvers, E. (2001). Cut-off sampling and estimation. Statistics Canada International Symposium Series: Proceedings. Retrieved March 29, 2008, from http://www.statcan.ca/english/freepub/11-522-XIE/2001001/session10/s10a.pdf
Harding, K., & Berger, A. (1971, June). A practical approach to cutoff sampling for repetitive surveys. Information Circular, IC 8516. Washington, DC: U.S. Department of the Interior, Bureau of Mines.
Knaub, J. R., Jr. (2007, April). Cutoff sampling and inference. InterStat: Statistics on the Internet. Retrieved May 27, 2007, from http://interstat.statjournals.net
Madow, L. H., & Madow, W. G. (1978). On link relative estimators. Proceedings of the Survey Research Methods Section, American Statistical Association (pp. 534–539). Retrieved February 19, 2007, from http://www.amstat.org/Sections/Srms/Proceedings
Madow, L. H., & Madow, W. G. (1979). On link relative estimators II. Proceedings of the Survey Research Methods Section, American Statistical Association (pp. 336–339). Retrieved February 19, 2007, from http://www.amstat.org/Sections/Srms/Proceedings
Royall, R. M. (1970). On finite population sampling theory under certain linear regression models. Biometrika, 57, 377–387.
Särndal, C.-E., Swensson, B., & Wretman, J. (1992). Model assisted survey sampling. New York: Springer-Verlag.
Sweet, E. M., & Sigman, R. S. (1995). Evaluation of model-assisted procedures for stratifying skewed populations using auxiliary data. Proceedings of the Survey Research Methods Section, American Statistical Association, I (pp. 491–496). Retrieved February 19, 2007, from http://www.amstat.org/Sections/Srms/Proceedings
D
DATA MANAGEMENT
Longitudinal projects and other large surveys generate
large, complex data files on thousands of persons that
researchers must effectively manage. The preferred
data management strategy for such large, complex
survey research projects is an integrated database
facility built around modern relational databases. If
one is dealing with a relatively small, simple questionnaire, many carefully implemented methods for
data collection and data management will work. What
needs to be done for technologically complex surveys
touches upon all the considerations that can be given
to less complex survey data sets.
As the scale, scope, and complexity of a survey
project grow, researchers need to plan carefully for the
questionnaire, how the survey collects the data, the
management of the data it produces, and making the
resultant data readily available for analysis. For
these steps to run smoothly and flow from
one to the other, they need to be integrated. For these
reasons, relational database management systems
(RDBMS) are effective tools for achieving this integration. It is essential that the data file preserve the
relationships among the various questions and among
the questionnaire, respondent answers, the sampling
structure, and respondent relationships. In birth cohort
or household panel studies there are often complex
relationships among persons from the same family
structure or household. In longitudinal surveys there
are also complex relationships among the answers in various waves that result from pre-fills (i.e., data carried forward) from previous surveys and bounded interviewing techniques that create event histories by integrating lines of inquiry over multiple rounds of interviewing. Failure to use an RDBMS strategy for a longitudinal survey can be considered a serious error that increases administrative costs, but not using RDBMS methods in large and complex cross-sectional surveys can be considered just as big an error.
Structure
Questionnaires often collect lists, or rosters, of people,
employers, insurance plans, medical providers, and so
on and then cycle through these lists asking sets of
questions about each person, employer, insurance
plan, or medical provider in the roster. These sets of
related answers to survey questions constitute some of
the tables of a larger relational database in which the
connections among the tables are defined by the
design of the questionnaire. One can think of each
question in a survey as a row within a table, with
a variety of attributes that are linked in a flexible
manner with other tables. The attributes (or columns)
within a question table would contain, at a minimum,
the following:
The question identifier and the title(s) associated with
the variable representing the question's answer, with
the facility to connect the same question asked in
question. For example, metadata include which questions lead into a particular question and questions to
which that question branches. These linkages define
the flow of control or skip pattern in a questionnaire.
With a sophisticated set of table definitions that
describes virtually any questionnaire, one can join
tables and rapidly create reports that are codebooks,
questionnaires, and other traditional pieces of survey
documentation. The questionnaire itself is not programmed but rather is formed by the successive display on the screen of the question's characteristics,
with the next question determined either by direct
branching or by the execution of internal check items
that are themselves specified in the question records.
Sequential queries to the instrument database display
the questions using an executable that does not change
across surveys but guides the interview process
through successive question records.
By breaking down the survey into a sequence of
discrete transactions (questions, check items, looping
instructions, data storage commands, etc.) stored in
a relational database, with each transaction being
a row in a database table and the table having a set of
attributes as defined in the relational database, one
can efficiently manage survey content, survey data,
data documentation, and even public user data extraction from a single integrated database structure.
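As a toy illustration of storing questions as rows in a relational table, using Python's built-in sqlite3 module (the schema and question are hypothetical, not those of any particular survey system):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE question (
            question_id   TEXT PRIMARY KEY,  -- question identifier
            variable_name TEXT,              -- title of the answer variable
            question_text TEXT,              -- wording displayed on screen
            branch_to     TEXT               -- direct branching target, if any
        )""")
    conn.execute(
        "INSERT INTO question VALUES "
        "('Q1', 'NUM_EMPLOYERS', 'How many employers have you had this year?', 'Q2')")
    # An interview engine would fetch rows like this one in sequence,
    # displaying each question and following branch_to to the next record.
    for row in conn.execute("SELECT * FROM question"):
        print(row)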
Web Integration
When the tools that reference the master database are
Web enabled, staff at any field organization in the
world can access this resource and share it. Access
control and security measures are necessary, of
course. Some users can be given access to some parts
of the data set with varying read/write permissions.
One person might only be able to edit database fields
related to documentation and so on.
When the data capture system is built for the Web,
multi-modal surveys on the Web (including cell
phone Internet connections), computer-assisted telephone interview (CATI), or computer-assisted personal interviewing (CAPI) become simple to execute.
(CAPI is done either by putting a client and server on
the laptop or tapping into the cellular network with
a wireless modem and using the Web.) The organizations involved in survey data collection are increasingly keen on multi-modal surveys in order to
accommodate difficult users who have very particular
Software
Relational database software is a major software
industry segment, with vendors such as Oracle,
Sybase, IBM, and Microsoft offering competitive products. Many commercial applications use relational
database systems (inventory control; accounting systems; Web-based retailing; administrative records systems in hospitals, welfare agencies, and so forth, to
mention a few), so social scientists can piggyback on
a mature software market. Seen in the context of relational databases, some of the suggested standards for
codebooks and for documenting survey data, such as
the Data Documentation Initiative (DDI), are similar to
relational database designs but fail to use these existing professional tool sets and their standard programming conventions. Superimposing a DDI structure for
documentation also fails to make an organic connection among the management of the instrument, management of the data, and the dissemination of the
data. Rather than including the questionnaire specification in an RDBMS at the outset, the DDI approach
requires the instrument to be retrofitted into DDI form
with additional labor time and its attendant costs and
fails to exploit the economies of scope RDBMS methods provide.
Either one plans for survey complexity at the outset of the effort or one retrofits the data from the field
into an RDBMS, which amounts to paying for the
same work twice or three times because of all the
steps taken to manage these projects. For example,
the designers must write down the questionnaire
specifications. This sounds simple, but it is virtually
always the case that the document the design team
produces does not cover every contingency that can
occur or specify where the instrument must branch in each case. For example, one needs to specify not only what
is to happen if the respondent refuses to answer each
question or says, "I don't know"; one must also
decide how to handle any internal check item that
encounters an answer with an item nonresponse. This
means the questionnaire programmer needs to go back
and forth with the design team to ensure the instrument is faithful to their intentions. Once designed, the
instrument must be tested, and one needs a testing
protocol that can test out the many pathways through
Further Readings
DATA SWAPPING
Data swapping, first introduced by Tore Dalenius and
Steven Reiss in the late 1970s, is a perturbation
method used for statistical disclosure control. The
objective of data swapping is to reduce the risk that
anyone can identify a respondent and his or her
responses to questionnaire items by examining publicly released microdata or tables while preserving the
amount of data and its usefulness.
In general, the data swapping approach is implemented by creating pairs of records with similar attributes and then interchanging identifying or sensitive
data values among the pairs. For a simplistic example,
suppose two survey respondents form a swapping
pair by having the same age. Suppose income categories are highly identifiable and are swapped to
reduce the chance of data disclosure. The first respondent makes between $50,000 and $60,000 annually,
and the other makes between $40,000 and $50,000.
After swapping, the first respondent is assigned the
income category of $40,000 to $50,000, and the second respondent is assigned $50,000 to $60,000.
One benefit of data swapping is that it maintains
the unweighted univariate distribution of each variable that is swapped. However, bias is introduced in
univariate distributions if the sampling weights are
different between the records of each swapping pair.
One can imagine the impact on summaries of income
categories if, in the example given, one survey respondent has a weight of 1, while the other has a weight
of 1,000. A well-designed swapping approach incorporates the sampling weights into the swapping algorithm in order to limit the swapping impact on
univariate and multivariate statistics.
There are several variations of data swapping,
including (a) directed swapping, (b) random swapping, and (c) rank swapping.
Directed swapping is a nonrandom approach
in which records are handpicked for swapping. For
instance, a record can be identified as having a high
risk of disclosure, perhaps as determined through
a matching operation with an external file, and then
chosen for swapping. Random swapping occurs when
all data records are given a probability of selection
and then a sample is selected using a random
approach. The sampling can be done using any
approach, including simple random sampling, probability proportionate to size sampling, stratified random
sampling, and so on. Once the target records are
selected, a swapping partner is found with similar
attributes. The goal is to add uncertainty to all data
records, not just those that can be identified as having
a high risk of disclosure, since the records identified for directed swapping may not cover all possible high-risk situations. Finally,
rank swapping is a similar method that involves the
creation of pairs that do not exactly match on the
selected characteristics but are close in the ranking of
the characteristics. This approach was developed for
swapping continuous variables.
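A minimal sketch of random swapping, assuming records are dictionaries, pairs are matched exactly on a single key, and, for simplicity, selection probabilities ignore the sampling weights discussed earlier:

    import random

    def random_swap(records, match_key, swap_field, p=0.1, seed=42):
        # Select target records with probability p, then exchange the
        # sensitive field's value with a partner that matches on match_key
        rng = random.Random(seed)
        for target in [r for r in records if rng.random() < p]:
            partners = [r for r in records
                        if r is not target and r[match_key] == target[match_key]]
            if partners:
                partner = rng.choice(partners)
                target[swap_field], partner[swap_field] = (
                    partner[swap_field], target[swap_field])
        return records

    people = [{"age": 34, "income": "$50-60K"}, {"age": 34, "income": "$40-50K"},
              {"age": 51, "income": "$70-80K"}, {"age": 51, "income": "$90-100K"}]
    random_swap(people, "age", "income", p=0.5)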
The complexities of sample surveys add to the
challenge of maintaining the balance of reducing disclosure risk and maintaining data quality. Multi-stage
sample designs with questionnaires at more than one
level (e.g., prisons, inmates) give rise to hierarchical
data releases that may require identity protection for
DEBRIEFING
Debriefing in survey research has two separate meanings. It is used to refer to the process whereby qualitative feedback is sought from the interviewers and/or
respondents about interviews conducted and surrounding survey processes. It also is used to refer to the
process whereby justified deception has been used
by the researchers, and, following ethical research
practices, respondents are then debriefed after the
study ends to explain the deception to them and try to
undo any harm that may have been caused by the
deception.
DECEPTION
According to Webster's Dictionary, deception is the
act of making a person believe what is not true; that
DELIBERATIVE POLL
A deliberative poll is a methodology for measuring
public preferences that combines small group discussions and traditional scientific polling. It was created
by James Fishkin, political science and communications professor, with the goal of improving the quality
of public opinion expression and measurement.
Fishkin argues that traditional polls often do not
provide good measures of public opinion because
weekend of deliberations. However, selection differences were less prevalent on the issue questions.
Another concern is whether group discussions
are the best approach for disseminating information.
Deliberative poll participants generally take the group
discussion task seriously, but criticisms have been
raised about the quality of the discussions and the
accuracy of information exchanged in them. A related
criticism of the discussions is the potential impact of
group dynamics. In group situations, people can be
influenced by normative factors unrelated to the
strength or merits of the arguments. In addition, differences in discussion participation rates can also
have an impact on opinions. Not everyone is equally
motivated or has the same ability to participate in
group discussions. The more vocal and persuasive
members of the group may have a disproportionate
influence on the outcome of the deliberative poll.
There also has been debate about the amount
of opinion change that is produced by deliberative
polling. For example, in the 1996 National Issues Convention, Fishkin pointed to a number of statistically significant shifts in aggregate opinion as a result
of participation in that deliberative poll. Other
researchers have argued that there were relatively few
meaningful changes in aggregate opinion after this
significant effort to educate members of the public
and have them participate in extensive deliberations.
This was taken as evidence of the robustness of public
opinion as measured by traditional public opinion
polls that can be conducted at a fraction of the cost
of a project like the National Issues Convention
Deliberative Poll. Larger shifts in aggregate opinion
have been found, for example, in deliberative polls
conducted for utility companies on esoteric issues for
which opinions are weakly held or nonexistent and
public interest and knowledge are very low.
Daniel M. Merkle
See also Focus Group; Poll; Public Opinion
DEMOGRAPHIC MEASURE
Demographic measures are questions that allow pollsters and other survey researchers to identify non-opinion characteristics of a respondent, such as age, race,
and educational attainment. Demographic measures
typically are used to identify key respondent characteristics that might influence opinion and/or are correlated with behaviors and experiences. These questions
are usually found at the end of a questionnaire.
Reasons for this are (a) to engage or otherwise build
rapport with the respondent by asking substantive
questions of interest earlier in the questionnaire; (b) to
lessen the likelihood that asking these personal questions will lead to a refusal to continue completing the
questionnaire (i.e., a breakoff); (c) to prevent priming
the respondent; and (d) to allow the respondent to
answer the core questions before possibly boring him
or her with the mundane demographic details.
Demographic measures are important because
numerous studies have demonstrated that opinions are
formed primarily through an individual's environment.
This environment socializes us to think and behave in
accordance with community norms and standards. As
a result, by identifying these demographic measures,
pollsters are better suited to understand the nature of
public opinion and possibly how it might be formed
and modified.
Demographic measures are also very important
because they allow researchers to know how closely
the sample resembles the target population. In a national sample of U.S. citizens, for example, researchers
know what the population looks like, demographically,
because the federal government conducts a census
every 10 years and updates those data annually thereafter until the next census. As such, researchers know
the percentages of the population based on race, gender, age, education, and a whole host of other demographic characteristics. A simple random sample of
the population ideally should resemble the population,
and demographic measures allow researchers to see
how well it does. For example, because survey nonresponse often correlates with educational attainment,
most surveys of the public gather data from proportionally far too many respondents who earned college
degrees and far too few respondents who did not graduate from high school. Knowing the demographic
characteristics of the sample respondents (in this case,
DEPENDENT INTERVIEWING
Dependent interviewing is a method of scripting computer-assisted survey questionnaires, in which information about each respondent known prior to the
interview is used to determine question routing and
wording. This method of personalizing questionnaires
can be used to reduce respondent burden and measurement error. The prior information can be incorporated
reactively, for in-interview edit checks, or proactively,
to remind respondents of previous answers.
Dependent interviewing exploits the potential of
scripting computer-assisted questionnaires such that
each interview is automatically tailored to the respondent's situation. This can be done using routing instructions and text fills, such that both the selection of questions and their wording are adapted to the respondent's situation. Both routing and text fills are
usually based on responses to earlier questions in the
questionnaire. Dependent interviewing in addition
draws on information known to the survey organization about the respondent prior to the interview.
In panel surveys, where dependent interviewing is
mainly used, this information stems from previous
waves of data collections. For each panel wave, prior
survey responses are exported and stored together
with identifying information (such as name, address,
and date of birth) used by interviewers to locate sample members eligible for the round of interviewing.
The previous information can be incorporated
into the questionnaire script to reduce respondent burden and measurement error. In panel surveys, a set of
core questions are repeated at every interview. For
respondents whose situation has not changed between
earnings differ by more than ±10% from the previous interview, the respondent could be asked:
May I please just check? So your earnings have
changed from <fill: amount a> to <fill: amount
b> since we last interviewed you on <fill: date of
interview>?
In addition, the respondent could be asked to clarify the reason for the difference, and this information
could later be used for data editing.
With proactive dependent interviewing, the previous
response is incorporated into the question text. This
can be used as a boundary before asking the independent question. For example, respondents may be asked:
Last time we interviewed you on <fill: date of interview>, you reported receiving <fill: $amount of
unemployment benefits> each month. Have you continued to receive <fill: $amount unemployment benefits> each month since <fill: date of interview>?
Alternatively, the respondent can be asked to confirm the prior information before being asked about
the current situation. For example:
According to our records, when we last interviewed
you on <fill: date of interview>, you were <fill:
employment status>. Is that correct?
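A minimal sketch of how such a text fill can be scripted follows (hypothetical field names; illustrative only, not an actual computer-assisted interviewing system). Prior-wave data are substituted into a question template:

```python
def proactive_di_question(record):
    """Build a proactive dependent-interviewing question by filling
    prior-wave responses into a question template."""
    template = ("Last time we interviewed you on {date}, you reported "
                "receiving {amount} each month. Have you continued to "
                "receive {amount} each month since {date}?")
    return template.format(date=record["prior_interview_date"],
                           amount=record["prior_benefit_amount"])

respondent = {"prior_interview_date": "May 12, 2007",
              "prior_benefit_amount": "$640"}
print(proactive_di_question(respondent))
```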
DEPENDENT VARIABLE
A dependent variable is a variable that is explained by
one or more other variables, which are referred to as
independent variables. The decision to treat a variable as a dependent variable may also imply a claim
that an independent variable does not merely predict
this variable but also shapes (i.e., causes) the dependent variable. For example, in a survey studying news
consumption, exposure to television news could serve
as a dependent variable. Other variables, such as
demographic characteristics and interest in public
affairs, would serve as the independent variables.
These independent variables can be used to predict
DESIGNATED RESPONDENT
Designated respondents are the individuals chosen
specifically to be interviewed for a survey. Surveys
often are conducted in two stages: first, selecting
a sample of household units and, second, selecting
persons within the households with whom to speak.
Survey researchers' and interviewers' jobs would be easier if they could question the person who first answers the phone or first comes to the door, or simply any adult resident in the unit who is willing to talk.
This usually is an acceptable idea only if the researchers simply need to know the basic characteristics of
the household; however, much of the time researchers
need to gather data from one specifically chosen person in the household; that is, they must translate the sample of
units into a sample of individuals. In contrast, if the
respondent is merely the most likely person to answer
the phone or to be home, his or her characteristics
may be overrepresented in the sample, meaning that
the sample will be biased. These more willing or
available individuals tend to be older and/or female.
Such biases mean that survey researchers are likely to
get an inaccurate picture of their samples and can
come to some incorrect conclusions. Information
quality depends on who is providing it.
Researchers try to avoid such bias by using
a within-household selection procedure likely to produce a more representative sample at the person level.
These tend to be more expensive than interviewing
any available person in the household, but they are
also more precise. It takes more time to find the
DESIGN-BASED ESTIMATION
Design-based estimation methods use the sampling
distribution that results when the values for the finite
population units are considered to be fixed, and the
variation of the estimates arises from the fact that
statistics are based on a random sample drawn from
the population rather than a census of the entire
population.
Survey data are collected to estimate population
quantities, such as totals, means, or ratios of certain
characteristics. Other uses include comparing subpopulations; for example, estimating the average
difference between males and females for certain
characteristics. In addition to these descriptive quantities, for many surveys the data are used to fit statistical models, such as linear regression models, to
explain relationships among variables of interest
for the particular population. In any case, statistics
derived from the sample are used to estimate these
population quantities, or parameters. The basis for
assessing the statistical properties of such estimates is
the sampling distribution (the probability distribution)
of the estimates: the distribution of the estimates that
would arise under hypothetical repetitions using the
same randomization assumptions and the same form
of the estimate.
In design-based estimation, the probabilities used
to select the sample are then used as the basis for statistical inference, and such inference refers back to
the finite population from which the random sample
was selected. These selection probabilities are derived
Survey-Weighted Estimates
One common outcome in design-based methods is the
generation of point estimates that serve to estimate
the finite population parameters of interest, such as
a population mean, total, proportion, and so on. Such estimates are derived using the sampling weights; for example, a survey-weighted estimate of a population total takes the form $\sum_{i=1}^{n} w_i y_i$, where the $w_i$ are the sampling weights and the $y_i$ are the observed values.
Under a combined model-design-based framework, the model-design-based variance for the design-based estimate of the
model parameter will be close to the design-based
variance when the sampling fractions are small and
the sample size is large. Therefore, design-based
inference for the model parameters of interest would
also be valid in the model-design-based or combined
framework. Modifications to the design-based variance would be required for cases where the sampling
fractions are not negligible.
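As a concrete, simplified illustration of design-based point estimation (a minimal sketch; the data and selection probabilities are hypothetical), each observation is weighted by the inverse of its selection probability:

```python
def weighted_total_and_mean(y, weights):
    """Design-based (survey-weighted) point estimates: each observation
    contributes in proportion to the inverse of its selection probability."""
    total = sum(wi * yi for wi, yi in zip(weights, y))
    mean = total / sum(weights)
    return total, mean

# Hypothetical sample: selection probabilities differ across strata.
y = [10.0, 12.0, 8.0, 15.0]
probs = [0.10, 0.10, 0.02, 0.02]      # selection probabilities
weights = [1 / p for p in probs]      # sampling weights
print(weighted_total_and_mean(y, weights))
```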
There are some statistical models for which
design-based estimation will not be consistent under
the model-design-based framework. These include
estimates of the variance components associated with
random effects models, mixed effects models, structural equation models, and multi-level models. The
fixed effects in these models can usually be estimated
consistently, but not the variance components associated with the random effects, unless certain conditions
on the sample sizes apply. However, for most models,
such as generalized linear models (including linear
regression and logistic regression) and proportional
hazards models, the parameters of interest can be estimated consistently.
Informative Sampling
The issue of whether a pure model-based estimation
approach, as opposed to a design-based estimation
approach, is appropriate when estimating quantities
from a sample that has been obtained from a complex
design is related to whether or not the sampling
design is informative. If the sampling distribution of
the observations is the same under the model-based
randomization assumptions as the sampling distribution under the model-design-based (or combined)
randomization assumptions, then the sampling is noninformative. Stratification and clustering in the sample
design can lead to informative samples.
When the sampling is informative, the observed
outcomes may be correlated with design variables not
included in the model, so that model-based estimates
of the model parameters can be severely biased, thus
leading possibly to false inferences. On the other
hand, if the sampling is noninformative and a design-based estimation approach is used, then the variances
of the estimates will usually be larger than the variances of the estimates using a model-based approach.
David A. Binder
DESIGN EFFECTS (DEFF)

The design effect is defined as the ratio

$$\mathrm{deff} = \frac{V_{CSD}(\bar{y})}{V_{SRS}(\bar{y})},$$

the variance of an estimate under the complex sampling design divided by the variance of the same estimate under a simple random sample of equal size, where $V_{SRS}(\bar{y}) = S^2/n$. For a cluster sample consisting of $C$ clusters of equal size $M$, this ratio takes the familiar form $\mathrm{deff} = 1 + (M - 1)\rho$, where $\rho$ is the intraclass correlation coefficient measuring the homogeneity of elements within clusters.
Alternatively, one may wish to estimate the population distribution and from this an estimator of $S^2$.

Model-Based Approach to Design Effects

Under a commonly used model (M1), the observations $y_{cj}$, for clusters $c = 1, \ldots, C$ and units $j = 1, \ldots, b_c$, are assumed to satisfy

$$\operatorname{Var}_{M1}(y_{cj}) = \sigma^2, \qquad \operatorname{Cov}_{M1}(y_{cj}, y_{c'j'}) = \begin{cases} \rho\,\sigma^2 & \text{if } c = c' \text{ and } j \neq j', \\ 0 & \text{otherwise.} \end{cases}$$

A second model (M2), under which $\operatorname{Var}_{M2}(y_{cj}) = \sigma^2$ and the observations are uncorrelated, corresponds to simple random sampling. The model-based design effect is the ratio $\operatorname{Var}_{M1}(\bar{y}_w) / \operatorname{Var}_{M2}(\bar{y})$, which, for sampling weights $w_{cj}$, takes the form

$$\mathrm{DEFF}_M = n\,\frac{\sum_{c=1}^{C}\sum_{j=1}^{b_c} w_{cj}^{2}}{\left(\sum_{c=1}^{C}\sum_{j=1}^{b_c} w_{cj}\right)^{2}}\,\bigl(1 + (b^{*} - 1)\,\rho\bigr),$$

where

$$b^{*} = \frac{\sum_{c=1}^{C}\left(\sum_{j=1}^{b_c} w_{cj}\right)^{2}}{\sum_{c=1}^{C}\sum_{j=1}^{b_c} w_{cj}^{2}}.$$

The quantity $\rho$ again serves as a measure of homogeneity. The usual analysis of variance (ANOVA) estimator of $\rho$ is given by

$$\hat{\rho}_{\mathrm{ANOVA}} = \frac{MSB - MSW}{MSB + (K - 1)\,MSW},$$

where

$$MSB = \frac{SSB}{C - 1}, \quad \text{with } SSB = \sum_{c=1}^{C} b_c\,(\bar{y}_c - \bar{y})^{2},$$

$\bar{y}_c$ the sample mean of the $y$ values in the $c$th cluster,

$$MSW = \frac{SSW}{n - C}, \quad \text{with } SSW = \sum_{c=1}^{C}\sum_{j=1}^{b_c} (y_{cj} - \bar{y}_c)^{2},$$

and

$$K = \frac{n - \sum_{c=1}^{C} b_c^{2}/n}{C - 1}.$$
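These formulas translate directly into code. As a rough illustration (a minimal sketch with made-up data, not part of the original entry), the following computes $\hat{\rho}_{\mathrm{ANOVA}}$ and a Kish-type design effect for equal-weight data:

```python
def anova_rho(clusters):
    """ANOVA estimator of the intraclass correlation rho, computed from
    a list of clusters, each a list of observations y_cj."""
    C = len(clusters)
    n = sum(len(c) for c in clusters)
    ybar = sum(sum(c) for c in clusters) / n
    ssb = sum(len(c) * (sum(c) / len(c) - ybar) ** 2 for c in clusters)
    ssw = sum(sum((y - sum(c) / len(c)) ** 2 for y in c) for c in clusters)
    msb = ssb / (C - 1)                              # between-cluster mean square
    msw = ssw / (n - C)                              # within-cluster mean square
    k = (n - sum(len(c) ** 2 for c in clusters) / n) / (C - 1)
    return (msb - msw) / (msb + (k - 1) * msw)

clusters = [[4, 5, 6], [1, 2, 2], [7, 8, 9, 8]]      # hypothetical cluster data
rho = anova_rho(clusters)
b_bar = sum(len(c) for c in clusters) / len(clusters)  # average cluster size
deff = 1 + (b_bar - 1) * rho                          # Kish-type design effect
print(round(rho, 4), round(deff, 4))
```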
Software
Today, most of the popular statistical software
packages offer an option for data analyses to allow for complex designs, either by providing an estimate
of the design effect or by their capability to account
for the complex design in the variance estimation.
These include STATA, SUDAAN, and WesVar PC.
Siegfried Gabler, Matthias Ganninger,
Sabine Häder, and Ralf Münnich
See also Bootstrapping; Cluster Sample; Complex Sample
Surveys; Design-Based Estimation; Effective Sample Size;
Jackknife Variance Estimation; Model-Based Estimation;
ρ (Rho); Sample Design; Systematic Sampling; Unbiased
Statistic; Variance Estimation; WesVar
DIARY
A diary is a type of self-administered questionnaire
often used to record frequent or contemporaneous
events or experiences. In diary surveys, respondents
are given the self-administered form and asked to fill
in the required information when events occur (event-based diaries) or at specified times or time intervals
(time-based diaries). Data from diary studies can be
used to make cross-sectional comparisons across people, track an individual over time, or study processes
within individuals or families. The main advantages
of diary methods are that they allow events to be
recorded in their natural setting and, in theory, minimize the delay between the event and the time it is
recorded.
Diaries are used in a variety of domains. These
include studies of expenditure, nutrition, time use,
travel, media exposure, health, and mental health.
Expenditure surveys usually have a diary component
in which the respondent has to enter expenditures on
a daily basis for a short period of time, such as a week
or 2 weeks. An example of this is the Consumer
Expenditure Survey in the United States, in which
one household member is assigned two weekly diaries
in which to enter household expenditures. Food and
nutrition surveys use diaries to record food consumption over a fixed period of time. An example is the
1996 Food Expenditure Survey in Canada.
Types of Diaries
Time-use diaries usually have shorter reference periods than expenditure diaries. The most common
methodology is a diary where the respondent accounts
DIFFERENTIAL ATTRITION
Panel studies are subject to attrition, which is unit
nonresponse after the initial wave of data collection.
Attrition affects the results of analyses based on panel
data by reducing the sample size and thereby diminishing the efficiency of the estimates. In addition, and
more important, attrition also may be selective; differential or selective attrition occurs when the characteristics of the panel members who drop out of the panel
because of attrition differ systematically from the
characteristics of panel members who are retained in
the panel. Differential attrition may introduce bias in
survey estimates. However, the amount of bias
depends both on the amount of attrition and on the
selectivity of attrition, or in other words, on the association between the variables from which the estimate
is constructed and the attrition propensity of the panel
units. If an estimate is not associated at all with the
attrition propensity, then the data are not biased.
However, if an estimate is associated with the propensity to participate in the panel, the data are biased.
The propensity to participate in a panel survey
(or alternatively, the propensity to be contacted, and
given contact, the propensity to agree to participate in
the panel survey) is influenced by many different factors, from characteristics of the survey design, surveytaking climate, and neighborhood characteristics to
sociodemographic characteristics of the sample persons, the sample persons' knowledge of the survey
topic, and their prior wave experiences. For example,
the at-home patterns of a household and its members, and thus also their propensity to be contacted,
are a function of sociodemographic attributes (e.g.,
number of persons in household) and lifestyle (e.g.,
working hours, social activities). If one person lives
alone in a housing unit, contact is completely dependent on when he or she is at home. Likewise, the lifestyles of younger people may involve more out-of-home activities than those of other groups, and this
also means that they will be harder to contact.
Consequently, for example, when studying the extent
of and changes in social contacts as teenagers grow
into adulthood and later when they start their own
DIFFERENTIAL NONRESPONSE
Differential nonresponse refers to survey nonresponse
that differs across various groups of interest. For
example, for many varied reasons, minority members
of the general population, including those whose first language is not the dominant language of the country in which the survey is being conducted,
are generally more likely to be nonresponders when
sampled for participation in a survey. Thus, their
response propensity to cooperate in surveys is lower,
on average, than that of whites. The same holds true
for the young adult cohort (18–29 years of age) compared to older adults. This holds true in all Western
societies where surveys are conducted.
Ultimately, the concern a researcher has about this
possible phenomenon should rest on whether there is
reason to think that differential nonresponse is related
to differential nonresponse error. If it is not, then
there is less reason for concern. However, since nonresponse error in itself is difficult to measure, differential nonresponse error is even more of a challenge.
In considering what a researcher should do about
the possibility of differential nonresponse, a researcher
has two primary options. First, there are things to do
to try to avoid it. Given that noncontacts and refusals
are typically the main causes of survey nonresponse,
researchers can give explicit thought to the procedures
they use to make contact with respondents (e.g.,
advance letters) and those they use to try to avoid
refusals from respondents (e.g., refusal conversion attempts), in particular as these procedures apply to
key groups from whom lower levels of contact and/or
cooperation can be expected. For example, the use of
differential incentives to persons or households known
from past research to be harder to contact and/or gain
cooperation from has been shown to be effective in
lowering differential nonresponse. However, some
have argued that it is not equitable to provide
higher incentives to groups that traditionally have low
response rates because it fails to fairly reward those
who readily cooperate in surveys.
DIRECTORY SAMPLING
Directory sampling is one of the earliest versions of
telephone sampling. Telephone directories consist of
listings of telephone numbers. The residential numbers are generally placed in a section of the directory
separate from business numbers. Each telephone
listing is generally accompanied by a name and an
address, although the address is not always present.
Households may choose not to have their telephone
number published in the directory. These are referred
to as unpublished numbers, most of which also are
unlisted numbers.
In the original application of directory sampling,
a set of telephone directories covering the geopolitical
area of interest to the survey were assembled. After
the sample size of telephone numbers was determined, a random selection procedure was used to
draw the required number of residential directory-listed telephone numbers for each directory. The
actual selection method ranged from using systematic
random sampling of listed telephone numbers to first
selecting a sample of pages from the directory and
then sampling one or more telephone numbers from
the selected pages.
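A minimal sketch of the systematic selection method follows (a toy directory and an interval-based rule; illustrative only, not the exact procedure of any particular survey organization):

```python
import random

def systematic_sample(listed_numbers, n):
    """Systematic random sample of n directory-listed numbers:
    a random start followed by every k-th listing."""
    k = len(listed_numbers) / n          # sampling interval
    start = random.random() * k          # random start in [0, k)
    picks = [int(start + i * k) for i in range(n)]
    return [listed_numbers[p] for p in picks]

directory = [f"555-{i:04d}" for i in range(5000)]  # toy directory listing
print(systematic_sample(directory, 10))
```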
Directory samples provide samples only of telephone numbers that are directory listed. Directory
samples will yield biased samples of a population,
because all unlisted households are given a zero probability of selection, and unlisted households generally
differ from listed households on key characteristics.
For example, persons with unlisted numbers are more
likely to be minorities, recent movers, and single
female adults. In some geographic areas, a substantial
percentage of households may have unlisted telephone
numbers, for example, larger central city areas and
Western states.
Today, directory-listed sampling is rarely used
alone, having been replaced by list-assisted random-digit dial sampling. But in other ways, directory sampling has made a comeback. Telephone directories are
now entered into national databases of listed residential telephone numbers that are updated on an ongoing
basis. A fairly common random-digit dialing sample
design involves forming two strata. The first stratum
consists of directory-listed residential telephone numbers. The second stratum consists of telephone numbers in the list-assisted sampling frame that are not
DISCLOSURE
Within the context of survey research, disclosure can
be used with two distinct meanings. In the first meaning, a researcher is required to provide full disclosure
of his or her own identity and purpose in collecting
data. In the second meaning, a researcher is required
to prevent disclosure of information that could be
used to identify respondents, in the absence of specific
and explicit informed consent allowing the researcher
to disclose such information.
However, in some research settings, full disclosure
of the research objectives may jeopardize the objectivity of results or access to research participants.
Observational research of behavior in public settings,
for example, may be exempt from rules of informed
consent, since the public nature of the behavior itself
implies consent. Nevertheless, in this situation, the
researcher ideally should provide detailed justification
DISCLOSURE LIMITATION
Survey researchers in both the public and private sectors are required by strong legal and ethical considerations to protect the privacy of individuals and
establishments who provide them with identifiable
information. When researchers publish or share this
information, they employ statistical techniques to
ensure that the risk of disclosing confidential information is negligible. These techniques are often
referred to as disclosure limitation or disclosure
avoidance techniques, and they have been developed and implemented by various organizations for
more than 40 years.
The choice of disclosure limitation methods
depends on the nature of the data product planned for
release. There are specific disclosure limitation methods for data released as micro-data files, frequency
(count) tables, or magnitude (point estimates) tables.
Online query systems may require additional disclosure limitation techniques, depending on whether the
data underlying these systems are in the form of
micro-data files or tables.
The first step in limiting disclosures in data products is to delete or remove from the data any
personal or direct identifiers, such as name, street
address, telephone number, or Social Security number. Once this is done, statistical disclosure limitation
methods are then applied to further reduce or limit
disclosure risks.
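A minimal sketch of this first step (hypothetical field names; illustrative only):

```python
def strip_direct_identifiers(records, identifiers=("name", "street_address",
                                                   "telephone", "ssn")):
    """First step of disclosure limitation: delete direct identifiers
    before applying statistical disclosure limitation methods."""
    return [{k: v for k, v in rec.items() if k not in identifiers}
            for rec in records]

micro_data = [{"name": "A. Smith", "ssn": "000-00-0000",
               "street_address": "1 Main St", "telephone": "555-0100",
               "age": 42, "income": 52_000}]
print(strip_direct_identifiers(micro_data))
```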
After direct identifiers are deleted from a micro-data
file, there is still a possibility that the data themselves
could lead to a disclosure of the individual, household,
DISK BY MAIL
Disk by mail is a survey administration technique in
which a selected respondent is mailed a computer disk
that contains a questionnaire and a self-starting interview program. The respondent runs the program on
204
Dispositions
his or her own computer and returns the disk containing the completed questionnaire. In some instances,
the disk may provide an option for the person to transmit his or her responses over the Internet. Although
disk-by-mail surveys can be conducted with the general public, the approach is most effective for targeted
populations such as professional or business groups for
whom computer access is nearly universal.
Disk by mail is one of a variety of computer-assisted self-interview (CASI) techniques. As such, it
has some of the advantages of a computerized survey.
These surveys have the capability of guiding the
respondent interactively through the questionnaire and
including very complex skip patterns or rotation logic.
This approach can also offer many innovative features
beyond traditional mail and telephone surveys, but it
does require costs and time in terms of programming
and distribution of the survey. Because the approach
is computer based, it allows the researcher to enhance
the survey forms with respect to the use of color,
innovative screen designs, question formatting, and
other features not available with paper questionnaires.
They can prohibit multiple or blank responses by not
allowing the participant to continue on or to submit
the survey without first correcting the response error.
Disk by mail also shares some of the advantages of
mail surveys. It is less expensive than telephone surveys, since there are no interviewer costs incurred; eliminates the potential for interviewer bias; provides respondents with greater perceived anonymity, which may lead to more truthful answers, especially on sensitive questions; and allows respondents to complete
the survey on their own time, that is, when it is most
convenient.
Disk by mail does have some drawbacks as a survey technique. It is restricted to those having access
to a computer and limited by the technological capacity or make of the respondent's computer. Although
disk-by-mail surveys allow for much more innovative
features than paper-and-pencil mailed surveys, some
respondents may have difficulty accessing the survey
due to poor computer skills and will not be able to
respond. Furthermore, some people are not accustomed to the process used to respond to an electronic survey (e.g., selecting from a pull-down menu,
clicking a radio button, scrolling from screen to
screen) and will need specific instructions that guide
them through each question and the manner in which
they should respond. As with other computer-based
survey tools, respondents are often concerned about
DISPOSITIONS
Sample dispositions (codes or categories used by survey researchers to track the outcome of contact
attempts on individual cases in the sample) provide
survey researchers with the status of each unit or case
within the sampling pool and are an important quality
assurance component in a survey, regardless of the
mode in which the survey is conducted. Sample dispositions are used for three reasons: (1) to help the
survey researcher control the sampling pool during
the field period, (2) to calculate response rates, and
(3) to help assess whether the sample might contain
nonresponse error. Sample dispositions usually are
tracked through the use of an extensive system of
numeric codes or categories that are assigned to each
unit in the sampling pool once the field period of the
survey has begun. Common sample dispositions
include completed interviews, partial interviews, refusals, noncontacts, and ineligible cases.
DISPROPORTIONATE ALLOCATION
TO STRATA
One type of random sampling employed in survey
research is the use of disproportionate allocation to
strata. Disproportionate allocation to strata sampling
involves dividing the population of interest into mutually exclusive and exhaustive strata and selecting
elements (e.g., households or persons) from each
stratum.
Commonly used strata include geographic units;
for example, high-minority-density census tracts in
a city are put into one stratum and low-minority-density census tracts are put into another stratum. In
epidemiology case-control studies, strata are used
where persons in one stratum have a condition of interest (e.g., Type I diabetes) and persons without the
condition are put into a second stratum. After dividing
the population into two or more strata, a disproportionate number of persons are selected from one stratum
relative to others. In other words, the persons in one
stratum have a higher probability of being included in
the sample than are persons in the other strata.
This type of sampling can be used to create a more
efficient sample design with more statistical power to
detect key differences within a population than a simple random sample design or a proportionate stratified
sample design. An example of a difference within
a population is the comparison of older and younger
persons with respect to some characteristic, such as
having health insurance. However, a disproportionate
allocation can also produce some results that are
much more inefficient than a simple random sample
or a proportionate stratified sample design.
Disproportionate allocation to strata as a technique
can be more efficient than a simple random sample
design. Efficiency is determined by whether the sample variances are smaller or larger than they would
have been if the same number of cases had been sampled using a simple random sample.
Researchers use disproportionate allocation to
strata in order to increase the number of persons with
important characteristics within their final study sample and to increase the efficiency of the sample design
over simple random sampling. When making estimates using a sample that has used disproportionate
allocation to strata sampling, it is important to control
for the differences in the probabilities of selection into
the sample. Persons from some strata will have been
more likely to be included than persons from other
strata. To accomplish this task, survey weights are
used to adjust each person for their probability of
selection into the sample when making estimates of
specific characteristics for the entire population.
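A minimal sketch of this weighting step (hypothetical stratum sizes and sample data) shows how stratum weights of the form N_h/n_h undo the disproportionate selection probabilities:

```python
def stratified_estimate(strata):
    """Weighted estimate of a population proportion from a disproportionate
    stratified sample: each stratum's sample mean is weighted by the
    stratum's population size N_h."""
    total_pop = sum(s["N"] for s in strata)
    weighted_sum = sum(s["N"] * (sum(s["sample"]) / len(s["sample"]))
                       for s in strata)
    return weighted_sum / total_pop

# Hypothetical health insurance survey: the 65+ stratum is oversampled
# relative to its population share (equal sample sizes, unequal N).
strata = [
    {"N": 40_000_000, "sample": [1] * 450 + [0] * 50},    # 65+, n = 500
    {"N": 190_000_000, "sample": [1] * 350 + [0] * 150},  # 18-64, n = 500
]
print(round(stratified_estimate(strata), 3))
```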
Disproportionate allocation to strata can make
some estimates more (or less) efficient than if the
same number of cases had been selected using simple
random sampling. Efficiency is gained to the extent
that the variables used to stratify the target population
are related to the characteristic being studied. For
example, when stratifying a health insurance survey
by age into two stratathose 65 years of age and
older and those under 65 yearsthe outcome variable
of interest, health insurance coverage, is strongly
related to the variable used to stratify. People 65 years
of age and over are much more likely to be insured
than those under 65 years. The same is true for case-control studies, where the condition of interest is used to stratify the target population and the resulting sample is more efficient for studying differences between those who have the condition and those who do not than would have been possible through a simple random sample of the population.
By the same token, the survey that was more
effective for some estimates as a result of stratification may be less efficient for other estimates than
a simple random sample would have been. For example, a survey stratified by age (65 years and over versus 18- to 64-year-olds) will not yield nearly as efficient an estimate of political party preference as it did for health insurance coverage, because party preference is not as correlated with being over 64 years as health insurance coverage is.
This situation also varies by how much more likely
people in one stratum were to be selected into the
sample. The worst case scenario is when the strata are
DON'T KNOWS (DKs)
An explicit Don't Know response option has been found to substantially increase (from 5 to as much as 30 percentage points) the proportion of respondents who report that they don't know, particularly for questions about issues with which respondents may not be familiar.
Because including an explicit Don't Know option
can have a dramatic impact on responses, the decision
about whether to explicitly offer such a response
option is a very important one for researchers when
creating a survey instrument. Two perspectives
nonattitude and satisficingprovide competing theoretical arguments about this decision.
The nonattitude perspective suggests that respondents who genuinely do not know an answer nevertheless may choose a substantive response option
when no other option is available. The nonattitude
perspective comes from Philip Converse's observation that survey interviews may exert implicit pressure on respondents to appear to have an opinion on
a wide range of topics. When respondents are faced
with a question to which they genuinely do not know
the answer, many may be uncomfortable admitting
that they know little about the topic or that they do
not know the answer, and this may be particularly
true when multiple questions for which they are
uninformed are included in a survey interview.
Respondents who do not have attitudes on an issue
may respond to a question about the issue essentially
by randomly selecting responses from among the
choices offered. Including an explicit Don't Know
response option would provide these respondents
with a way to accurately report that they do not know
how to answer the question.
In contrast, as noted previously, the satisficing
perspective suggests that respondents may choose
an explicitly offered Don't Know response option as
an alternative to completing the work necessary to
choose a substantive response that they would otherwise be able to provide. Thus, the satisficing perspective suggests that Don't Know responses should not
always be viewed as accurate reports of nonattitudes.
This perspective on Don't Know responding is based
on the argument that answering survey questions is
a demanding cognitive task. When answering each
question in a survey, respondents must understand
and interpret the question, search their memory for
relevant information, integrate that information into
an opinion, and translate that opinion into an understandable response. This work may overwhelm
209
respondents' abilities or motivation. In such situations, some respondents may satisfice by seeking out
ways to avoid doing this work while still appearing as
if they are carrying on a survey interview appropriately. When respondents satisfice, they look for a cue
in the question suggesting how to do so. An explicitly
offered Don't Know response option provides such a cue, allowing respondents who are otherwise disposed to satisfice to do so by saying, "Don't know." If a Don't Know option was not offered (and the question provided no other cue about how to satisfice), these respondents might be pushed to do the
cognitive work necessary to carefully answer the survey question.
Evidence about why respondents choose to report
that they do not know generally supports the satisficing perspective. Omitting Don't Know response options from survey questions does not appear to substantially reduce data quality. There is little evidence that explicitly offered Don't Know response options
provide an advantage to researchers.
Allyson Holbrook
See also Forced Choice; Missing Data; Nonattitude;
Response Alternatives; Satisficing
DOUBLE-BARRELED QUESTION
A double-barreled question asks about more than one
construct in a single survey question. Best practices
for questionnaire design discourage use of certain
types of questions. Questions with unknown terms or
complicated syntax should not be used when designing a questionnaire. Foremost among these recommendations is to avoid double-barreled questions.
The word "and" is a hallmark of a double-barreled
question. Double-barreled questions most frequently
arise in attitudinal questions. In these types of questions, two attitude targets (e.g., political candidates
and policy decisions) are asked as one construct (e.g.,
Do you favor candidate X and higher taxes or candidate Y and lower taxes?). Response formation problems arise when the respondent prefers candidate X
and lower taxes or candidate Y and higher taxes.
Statements that align two different constructs also are
double-barreled (e.g., Do you agree or disagree with
the following statement: Managers in my organization
are helpful, but the lack of diversity in the organization is disappointing). The word "but" plays the role of the conjunction "and," linking two divergent question constructs into one double-barreled question.
Double-barreled questions require more time for
respondents to answer than single-barreled forced
choice questions. Comprehension breakdowns are
responsible for part of the problems with double-barreled questions. Respondents struggle to understand
exactly which construct among the multiple constructs
that appear in the question wording is the most important, resulting in higher rates of requests for clarification for double-barreled questions than for single-barreled questions. Breakdowns may also occur when
generating a response and in mapping the retrieved or
generated response to the response options. As a result,
higher rates of item nonresponse and unstable attitudes are likely to occur with double-barreled questions. This also leads to analytic problems and
questions of construct validity, as the analyst does not
know which barrel led to the respondent's answer.
Some double-barreled questions ask about one
construct in the question wording, but introduce
a second construct through the response options.
These questions are sometimes called one-and-a-half-barreled questions. For example, "Do you agree or disagree with Candidate Z's views on
DOUBLE NEGATIVE
A double negative refers to the use of two negatives
in one statement or question. In questionnaire design,
this is almost always a situation to be avoided. A
double negative usually creates an unnecessary
amount of confusion in the mind of the respondent
and makes it nearly impossible for the researcher to
accurately determine what respondents were agreeing
or disagreeing to.
Perhaps the most infamous example of a double negative occurred in November 1992 in a survey
conducted by the prestigious Roper Organization,
a respected survey research center founded in 1947.
Roper was commissioned by the American Jewish
Committee to conduct a survey of adults in the
United States to measure public attitudes and beliefs
about Jews. The following question slipped through
the usual quality control steps:
The term Holocaust usually refers to the killing of
millions of Jews in Nazi death camps during World
War II. Does it seem possible or does it seem
impossible to you that the Nazi extermination of the
Jews never happened?
DROP-DOWN MENUS
Drop-down menus are often used in Web surveys and
are one of the basic form elements in HTML (hypertext markup language) used for closed-ended survey
questions, in addition to radio buttons and check
boxes. They are also referred to as drop-down lists
and drop-down boxes.
Drop-down menus commonly display a single
option that can be left blank, and other response
options become visible after clicking on the side of
the box. A response is selected by then clicking on
one of the displayed choices, and multiple selections
can be allowed (i.e., check all that apply). They are
suitable and used for long categorical lists, such as
lists of states or institutions, but they can also be used for shorter lists and for ordinal lists. While radio buttons and
drop-down menus can fulfill the same purpose, there
are some key differences. Drop-down menus take less
space, as all the options do not need to be visible at
all times. They also require two clicks, instead of
a single click, to select a response option.
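For readers unfamiliar with the underlying markup, a minimal sketch follows (illustrative only; the helper simply assembles an HTML <select> element, the form element this entry describes):

```python
def dropdown_html(name, prompt, options):
    """Render a closed-ended question as an HTML drop-down menu:
    a <select> element with one <option> per response choice."""
    opts = "\n".join(f'  <option value="{o}">{o}</option>' for o in options)
    return (f"<label>{prompt}\n"
            f'<select name="{name}">\n'
            f'  <option value=""></option>\n'   # initial blank option
            f"{opts}\n"
            f"</select></label>")

print(dropdown_html("state", "In which state do you live?",
                    ["Alabama", "Alaska", "Arizona"]))
```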
Experimental studies show no difference between
radio buttons and drop-down menus in terms of time
and break-off rates, but find higher rates of item
Andy Peytchev
See also Check All That Apply; Closed-Ended Question;
Radio Buttons; Web Survey
DUAL-FRAME SAMPLING
Dual-frame sampling designs are a subset of multiple-frame designs in which units within the population of
interest are selected via independent probability samples taken from each of two frames. These two frames
make up the population of interest, and they typically
overlap. The dual-frame sampling approach is often
useful when the amount of undercoverage from using
a single frame is substantially improved by the introduction of two (or more) frames. The degree of overlap in the two frames is usually not known prior to sampling, but should this information be available,
estimates of the amount of undercoverage to be
expected from the dual-frame approach can be assessed more accurately. The resulting estimates from each
of the two frames in the dual-frame sampling design
are combined to form a single composite dual-frame
estimate of the population parameter(s) of interest. A
generic figure illustrating the basic structure of a two-frame design is provided in Figure 1, which depicts a population/universe in which Frame A and Frame B overlap in an intersection (ab).
Considering this figure, we can see that there are
three possible overlap situations that may occur
when using two frames in the sampling design,
including the following:
1. Illustrated in Figure 1 is the circumstance in
which neither of the two frames is completely
included in the other, implying that Frame A and
Frame B have some degree of overlap (i.e., like cell
phone and landline phone ownership). This approach
serves to improve the overall coverage of the target
population, thus reducing undercoverage; this situation is very common for dual-frame designs in practice. Another spin on this approach comes when
estimates from a rare population are desired. For
example, using random-digit dialing (RDD) to survey
the state to estimate the quality of life of breast cancer
survivors one year beyond their cancer diagnosis is
possible through the use of a health eligibility
screener; however, within a given state, the proportion of adult citizens who are one-year breast cancer
survivors may be small, making the screener approach
alone prohibitively expensive. The State Cancer
Registry, however, provides a list of those diagnosed
with cancer and is considered complete somewhere
around 2 years post-diagnosis. So using this frame at
the one-year point would certainly be accompanied
by a degree of undercoverage and may contain errors in diagnosis, but in general it would include more individuals from the target population of interest.
Using a dual-frame approach with an RDD frame
with a health screener along with the cancer registry
frame may be a more viable and precise approach for
estimating the quality of life parameter of interest.
2. Not illustrated in Figure 1 is the circumstance
in which Frame A is a complete subset of Frame B
(i.e., a rare segment of the population, like homeless,
institutionalized, or members of a health maintenance
organization who were prescribed a particular type of
drug). In this case, Frame B may provide complete
coverage of the population frame (i.e., complete
household address list for customers within a business
district of a large retail corporation), while Frame A
may consist of a subset of population units from
Table 1   Example for computing a dual-frame estimate of the total purchases made by "rewards program" members, based on samples from telephone (Frame A) and email (Frame B) frames

Sample Unit   Selected from Frame   In the Overlap?   Used in   Sampling Weight   Annual Purchases for Selected Unit
 1            A                     No                Y^a       16                $354.39
 2            A                     No                Y^a       16                $205.76
 3            A                     No                Y^a       16                $329.39
 4            A                     Yes               Y^ab      16                $255.53
 5            A                     Yes               Y^ab      16                $264.48
 6            B                     No                Y^b       10                $408.70
 7            B                     No                Y^b       10                $415.37
 8            B                     No                Y^b       10                $479.48
 9            B                     No                Y^b       10                $437.05
10            B                     No                Y^b       10                $311.97
11            B                     No                Y^b       10                $360.17
12            B                     Yes               Y^ba      10                $357.44
13            B                     Yes               Y^ba      10                $394.40
14            B                     Yes               Y^ba      10                $439.34
15            B                     Yes               Y^ba      10                $494.85
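A minimal sketch of how the components in Table 1 can be combined into a single dual-frame estimate of total purchases follows (illustrative only; the composite form and the choice of λ = 0.5 follow Hartley's classic approach rather than any particular estimator in the original entry):

```python
def dual_frame_total(rows, lam=0.5):
    """Hartley-type composite dual-frame estimate of a population total:
    Y = Y_a + Y_b + lam * Y_ab + (1 - lam) * Y_ba, where each component
    is a weighted sum over the corresponding sample units."""
    comp = {"Y^a": 0.0, "Y^b": 0.0, "Y^ab": 0.0, "Y^ba": 0.0}
    for used_in, weight, y in rows:
        comp[used_in] += weight * y
    return (comp["Y^a"] + comp["Y^b"]
            + lam * comp["Y^ab"] + (1 - lam) * comp["Y^ba"])

# Rows from Table 1: (component, sampling weight, annual purchases).
rows = [
    ("Y^a", 16, 354.39), ("Y^a", 16, 205.76), ("Y^a", 16, 329.39),
    ("Y^ab", 16, 255.53), ("Y^ab", 16, 264.48),
    ("Y^b", 10, 408.70), ("Y^b", 10, 415.37), ("Y^b", 10, 479.48),
    ("Y^b", 10, 437.05), ("Y^b", 10, 311.97), ("Y^b", 10, 360.17),
    ("Y^ba", 10, 357.44), ("Y^ba", 10, 394.40), ("Y^ba", 10, 439.34),
    ("Y^ba", 10, 494.85),
]
print(round(dual_frame_total(rows), 2))
```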
Further Readings
Brick, J. M., Dipko, S., Presser, S., Tucker, C., & Yuan, Y. (2006). Nonresponse bias in a dual frame sample of cell and landline numbers. Public Opinion Quarterly, 70(5), 780–793.
Hartley, H. O. (1974). Multiple frame methodology and selected applications. Sankhya: The Indian Journal of Statistics, 36(Ser. C, Pt. 3), 99–118.
Lohr, S. L., & Rao, J. N. K. (2000). Inference from dual frame surveys. Journal of the American Statistical Association, 95(449), 271–280.
DUPLICATION
Duplication refers to the prevalence of an element
more than one time on a sampling frame, assuming
analyses in order to correct the issue of duplication and reduce the potential bias it may create.
Paul J. Lavrakas
See also Cell Phone Sampling; Elements; Probability of
Selection; Random-Digit Dialing (RDD); Sampling
Frame; Target Population; Weighting
E
e

e is a term used in the calculation of survey response rates; it represents the proportion of sampled cases with unknown eligibility that are estimated to be eligible cases.

To determine response and other outcome rates for surveys, all cases in the sample first need to be classified into one of four categories: (1) completed cases; (2) eligible cases, no interview (nonrespondents); (3) cases of unknown eligibility, no interview; and (4) not eligible cases (out of sample). Then the eligibility status of the unknown cases needs to be estimated. The proportion of unknown cases that is estimated to be nonrespondents (i.e., eligible cases with no interviews) is known as the e-rate and is represented as e in equations. For example, in the formula for Response Rate 3, according to the standards of the American Association for Public Opinion Research (AAPOR), the response rate is the number of complete interviews (I) divided by the number of complete interviews (I) plus the number of partial interviews (P), plus the number of nonrespondents due to refusals (R), noncontact (NC), and other reasons (O), plus the number of unknown cases (unknown if household [UH] and other unknowns [UO]) times their estimated eligibility rate (e):

$$RR3 = \frac{I}{(I + P) + (R + NC + O) + e\,(UH + UO)}.$$

If, for example, e were estimated to be .55, then 55% of the unknown cases would appear in the base (denominator). The 45% estimated not to be eligible would be excluded from the calculation of this response rate.

In estimating e, AAPOR requires that one must be guided by the best available scientific information and one must not select a proportion in order to boost the response rate. AAPOR has documented eight general methods for estimating the eligibility rate.
ECOLOGICAL FALLACY
The ecological fallacy is a type of faulty reasoning
that sometimes is made in the interpretation of results
that come from the analysis of aggregate data. This
mistake occurs when data that exist at a group or
aggregate level are analyzed and interpretations are
then made (generalized) as though they automatically
apply at the level of the individuals who make up
those groups. For example, if a researcher used zip-code-level census data to determine that the proportion of women in the labor force was inversely correlated with the prevalence of mobile homes in that zip
code, it does not necessarily follow that women who
live in mobile homes are less likely to be employed
than are women who do not live in mobile homes.
800 POLL
An 800 poll is a one-question unscientific survey
that is taken by having daily newspaper readers, television viewers, and/or radio listeners call into a toll-free 1-800-number that involves no cost to the caller.
A different 800-number is given for each response
that the poll allows the self-selected respondents to
choose as their answer to whatever the survey question is. These polls are typically sponsored over
a one-day period (or part of a day) by media organizations that produce news. For example, callers who
agree and those who disagree with whichever issue position is being surveyed use separate 800-numbers. It is possible to offer callers more than two
answer choices, and thus more than two 800-numbers,
but typically the polls utilize only two choices.
Such polls have no scientific standing because
there is no way to know what target population is
represented by those who choose to dial in. Since this
is a nonprobability sample, there is no valid way to
calculate the size of the sampling error. Additional
threats to their validity include the possibility that the
same person will call in more than once. Also,
because the response choices normally are limited to
two, the question wording and the response choices
often are not well crafted.
Nonetheless, they offer a vehicle through which
media organizations can provide their audience with
a feeling of involvement in the news, since the poll
results are typically reported by the news organization
within the next day of the poll being conducted. In
some cases, the news organizations acknowledge that
the poll results they are reporting are unscientific, and
in other cases they do not.
With the widespread use of the Internet by the general public, 800 polls have been mostly replaced by
similar one-question unscientific surveys on the homepages of news media organizations' Web sites.
It is important to understand that these unscientific
800 polls are entirely unrelated to the scientific use of
800-numbers by some survey organizations as a mode of allowing scientifically sampled respondents to respond to a mail survey by telephone rather than completing the questionnaire and mailing it back. That is, some survey organizations that conduct mail surveys provide their
respondents with a toll-free 800-number to call in
ELECTION NIGHT PROJECTIONS

Sources of Information
There are four possible sources of information about
the election outcome in any given state that are used to
make Election Night projections for that state: (1) the
actual vote at sample precincts, (2) a statewide exit poll
of voters at those precincts, (3) a statewide telephone
poll of absentee (early) voters, and (4) the tabulated
vote reported by counties throughout the state.
Precinct Sample
Figure 1   Map of New Jersey counties, grouped into five geographic strata: 1 – Northern Cities; 2 – Bergen/Passaic Counties; 3 – Central; 4 – South; 5 – Northwest
Figure 2   Past statewide election results in New Jersey by geographic stratum (the percentage under each stratum is its share of the statewide vote; cell entries are each candidate's percentage of the vote within the stratum; statewide percentages appear after each candidate's name)

                           1. Northern  2. Bergen/   3. Central  4. South  5. Northwest
                           Urban        Passaic
                           (19%)        (16%)        (21%)       (29%)     (15%)
2005 GOVERNOR
  CORZINE (D)P (53%)       69           56           51          51        40
  FORRESTER (R) (43%)      28           41           44          45        56
2004 PRESIDENT
  KERRY (D)P (53%)         66           53           53          51        42
  BUSH (R) (46%)           33           46           47          48        57
2002 SENATE
  LAUTENBERG (D)P (54%)    67           55           53          53        39
  FORRESTER (R) (44%)      31           43           44          45        58
2001 GOVERNOR
  MCGREEVEY (D)P (56%)     67           56           57          56        42
  SCHUNDLER (R) (42%)      31           43           40          42        55
2000 PRESIDENT
  GORE (D)P (56%)          68           56           56          56        42
  BUSH (R) (40%)           29           41           40          41        54
2000 SENATE
  CORZINE (D)P (50%)       64           50           51          48        36
  FRANKS (R) (47%)         33           48           46          50        61
  DEMOCRAT (50%)           63           50           50          48        39
  REPUBLICAN (50%)         37           50           50          52        61
election; (2) being competitive, so that it shows a reasonable distribution of the vote; and (3) being typical,
in that it reflects the ideology of the political parties
that the candidates represent.
A listing of all precincts and vote counts in that
past election is obtained and geographic codes are
added. The precincts are sorted by geographic area
In most states, an exit poll is also taken. A subsample of between 15 and 60 precincts is selected
for the exit poll in each state; the actual number
used depends on the importance of the race. The
subsampling is done in such a way as to preserve
the states original order and stratification. The
interviewers at the exit poll precincts tally the questionnaires three times during the day: morning,
afternoon, and about an hour before the polls close
in that state. These tallies are the first data used in
the projection models.
Once the polls in a state close, a preliminary tabulation is conducted by the state. Many states now put
the results by county on their Web sites in a timely
manner. But in all states, the vote results, broken down
by county, are provided by the Associated Press and
are used directly in the election projection models.
Models Used
Models With Precinct Data
Unlike the precinct data, which are a sample of precincts, the county data are an evolving census of the
vote count. Most counties start with a trickle of precincts approximately a half-hour after the polls close
in a state and eventually reach 100% of the precincts,
often many hours later. In some states, the absentee
vote is counted with the precinct Election Day vote;
in other states, it is counted separately and added to
the vote, sometimes at the beginning of the night and
other times at other irregular intervals. In a few states,
at least some of the absentee vote is not counted until
days later. An analyst trying to make a projection
from the county reports has to be cautious when estimating how much vote has come in at a given time,
since the vote count only roughly follows the proportion of precincts reported. In addition, even when
a county has 100% of the vote reported, it can still
have errors of as much as 0.5%.
A county estimate is made by inflating the votes in
the county based on the inverse of the percentage of
the precincts reported. Then the counties are cumulated by stratum and inflated to the stratum size to
account for counties that have not reported yet. The
stratum estimates are then added and the votes percentaged at the state level. An error term on this estimate is formed by using a regression equation that is
based on historical data over different time intervals
and elections relating the percentage of precincts
reporting in a county at a given time to the deviation
of the candidate percentages from the final outcome.
The estimates of the individual county errors are combined by stratum and then adjusted to the relative
sizes of the strata to form an estimate of the error of
the overall state estimate.
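A minimal sketch of this county-level estimation logic follows (hypothetical counties and stratum sizes; the regression-based error term described above is omitted):

```python
def project_state_vote(counties, stratum_size):
    """Rough state-level projection from partial county returns:
    inflate each county's votes by the inverse of the fraction of
    precincts reported, cumulate by stratum, inflate each stratum to
    its known size, then percentage the vote at the state level."""
    strata = {}
    for c in counties:
        infl = 1.0 / c["frac_precincts_reported"]
        cand, tot = strata.get(c["stratum"], (0.0, 0.0))
        strata[c["stratum"]] = (cand + infl * c["candidate_votes"],
                                tot + infl * c["total_votes"])
    state_cand = state_total = 0.0
    for s, (cand, tot) in strata.items():
        scale = stratum_size[s] / tot   # account for non-reporting counties
        state_cand += scale * cand
        state_total += stratum_size[s]
    return state_cand / state_total

counties = [
    {"stratum": "north", "frac_precincts_reported": 0.40,
     "candidate_votes": 22_000, "total_votes": 40_000},
    {"stratum": "south", "frac_precincts_reported": 0.25,
     "candidate_votes": 9_000, "total_votes": 20_000},
]
stratum_size = {"north": 110_000, "south": 90_000}  # expected total vote
print(round(project_state_vote(counties, stratum_size), 3))
```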
ELECTION POLLS
Election polls are surveys that are taken before, during, and after election season and are used to predict
and explain election outcomes. The media conduct
election polls to satisfy their viewers' and readers' desire for horse race journalism and to help editors
and reporters plan their coverage of elections and
politicians. Candidates and political parties use them
for strategic purposes, including fund-raising and
helping to position their campaigns in the best possible light. Political scientists and other academics conduct election polls to understand the influence of
campaign dynamics on voting behavior.
Election polls employ various survey methods and
come in a variety of types. In the United States over
the past few decades, most election polls have been
random sample telephone polls, drawn from various
a Fox News/Opinion Dynamics trial heat survey conducted in August 2004 gave Democratic presidential
candidate John Kerry a 6-point lead over Republican
George Bush, but the same survey conducted 2
months later in October gave Bush a 4-point lead,
a swing of 10 percentage points. Such surveys are at
the heart of what has become known as horse race
journalism, which refers to the news media's perceived obsession with focusing on who is likely to win an election.
Pre-Primary and Primary Surveys
Surveys are conducted early in a campaign to help
benchmark baseline information about voter demographics and the public's perceptions of the candidates' image, message, and issue positions. The most
useful benchmark questions for a candidate concern
name recognition, strengths compared to challengers,
and performance while in office (if the candidate is an
incumbent). The results of these surveys are circulated
within a candidate's campaign organization and help
shape strategy.
These surveys are conducted before and during the
season of the primary elections, when campaigns are
striving to demonstrate the viability of their candidate.
The results are used by the candidates to stimulate
fund-raising efforts, and may be leaked to the news
media if favorable to the candidate and/or unfavorable
to the opponent(s). The value of these pre-primary
polls depends on their timing. If conducted too early,
respondents may not know enough about a candidate.
If conducted too late, the results may have little value
to the candidates.
Tracking Polls

Tracking polls produce up-to-date estimates of which candidate leads the race and are typically conducted over the last
few weeks of the campaign. They are used by the
media to complement its horse race coverage and by
candidates to monitor late shifts in support, especially
any shifts that may occur after a campaign-staged
event or other newsworthy events that may arise.
They produce a rolling average estimate derived from
daily samples, usually 100–200 interviews each, that typically are aggregated across 3-day periods (e.g., Monday–Tuesday–Wednesday, Tuesday–Wednesday–Thursday, Wednesday–Thursday–Friday). Tracking
polls have been criticized for employing inconsistent
sampling procedures; they are often conducted only
in the evenings, rarely attempt to deal with hard-to-reach respondents, and select respondents based on
whoever answers the phone rather than randomly
within the household. Tracking polls can be very
expensive compared to other pre-election surveys.
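The rolling average itself is simple arithmetic. The sketch below uses invented daily percentages; real tracking polls pool the interviews themselves, which is equivalent here only under the assumption of equal daily sample sizes.

```python
# Sketch: rolling 3-day aggregation of daily tracking-poll results.
# Daily support percentages are hypothetical; equal daily sample sizes assumed.
daily_support = [48, 51, 47, 50, 52, 49]

rolling = [round(sum(daily_support[i:i + 3]) / 3, 1)
           for i in range(len(daily_support) - 2)]
print(rolling)  # [48.7, 49.3, 49.7, 50.3]
```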
Exit Polls
Probably the most controversial of election polls are
exit polls. Exit polls have two primary purposes: helping project winners on the evening of Election Day
and helping explain election outcomes in the days following the election. These polls consist of interviews
of voters as they leave sampled polling places; they are
asked a short list of questions concerning vote decision,
issue positions, and voter demographics. Exit polls use
multiple-stage sampling methods. First, the polling
organization randomly samples counties in the states of
interest, and then precincts within counties, and then
interviewers in the sampled precincts select respondents based on a pre-determined systematic sampling interval.
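A sketch of that within-precinct systematic selection follows, under the assumption that the interval is set in advance from expected turnout; all numbers are hypothetical.

```python
# Sketch: intercept every kth exiting voter, with k set from expected turnout.
expected_turnout = 900
target_interviews = 100
k = max(1, expected_turnout // target_interviews)  # here, every 9th voter

selected_positions = list(range(k, expected_turnout + 1, k))
print(len(selected_positions), selected_positions[:4])  # 100 [9, 18, 27, 36]
```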
ELEMENTS
Within the context of survey research, an element is
the basic unit that represents whatever is being sampled and from which survey data are to be gathered.
Thus, the elements used in different surveys will
depend on the purpose of the survey and may be
adults, children, households, employees, businesses,
students, teachers, schools, school districts, uniformed
personnel, civilian personnel, police districts, libraries,
books within libraries, pages within books, or many
other things.
Within a target population, all the members of that
population are its elements. Within a sampling frame,
all the elements from the target population that can be
listed constitute the frame. All the elements that are
selected for study from the sampling frame make up
what is commonly called the survey sample. However, all the selected elements from which data are
gathered also are commonly referred to as the
sample.
Paul J. Lavrakas
See also Sample; Sampling Frame; Sampling Pool; Target
Population
ELIGIBILITY
Eligibility refers to whether or not a sampled unit is
eligible to have data gathered from it; that is, is the unit part of the survey's target population or is it not?
For example, the target population for a survey might
be all adults who are 18–34 years of age. As such, if
a household is sampled and screened via random-digit
dialing and no one living there fits the age criteria,
then the household is ineligible. If there is at least one
person ages 18–34 years, then the household is eligible. Ultimately, eligibility versus ineligibility is central to the issue of how well a sampling frame and
a sample drawn from that frame cover the target
population and whether or not coverage error results.
Eligibility also is linked to survey costs, since samples
drawn from frames that contain a large portion of
ineligible units are much more costly to process. As
straightforward as it may appear to determine eligibility for a survey, it often is not at all easy to do and
many mistakes (errors) may occur in the process. Mistakes in determining eligibility may lead to coverage
bias.
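Returning to the 18–34 example, the screening logic reduces to a single test per household. The function name and data below are illustrative, not from the original entry.

```python
# Sketch: a household is eligible if at least one member is aged 18-34.
def household_eligible(ages):
    return any(18 <= age <= 34 for age in ages)

print(household_eligible([52, 29]))  # True: one member is in the target range
print(household_eligible([52, 71]))  # False: the household is screened out
```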
For example, most surveys have geopolitical
boundaries for their samples, as they are not national
in scope. In each of these surveys, the target population typically is limited to those residents living
within the geopolitical boundaries (e.g., a particular
county). If the mode of sampling and data collection
is the telephone, as it often is, then some form of geographic screening must be instituted for interviewers
to determine the eligibility of the household or person
being contacted. In the case of boundaries that are
commonly known and well understood by the public
(e.g., one's county of residence), eligibility is readily
determined without many errors, as long as the
respondent does not know what answer will make her
or him eligible or ineligible for the survey. (If the
respondent knows this in advance of answering the
screening questions, some respondents will self-select
themselves in or out of the interview erroneously.) On
the other hand, if the geographic boundaries that
define eligible residency are not well known (e.g.,
EMAIL SURVEY
An email survey is one that sends the survey instrument (e.g., questionnaire) to a respondent via email
and most often samples respondents via email. These
electronic mail surveys first came into use in the late
1980s, and many scholars at the time thought that
they represented the future of survey research. Since
then, Web (Internet) surveys have become the predominant mode of electronic surveying, because of
the relatively poor performance of email surveys in
terms of ease of use and response rates.
ENCODING
Encoding information is the cognitive process through
which experiences are translated into memory. However, for the social sciences, encoding often means
the process of translating thoughts, ideas, or questions
into words. Different phrases and words, and different definitional and connotative frameworks, may conflict across audiences and contexts. In survey research, the
encoding of widely understood and definitive meaning
into a question is essential to valid measurement.
Researchers must be cognizant of how different
groups will interpret (or decode) their questions. A
strong survey instrument ensures that the researcher
and the respondent share the same understanding of
both the questions asked and the answers given. Compounding problems emerge when a respondent is conditioned by the survey questionnaire or must choose
between response options with similar meanings.
One example of poor encoding might be translating an idea into a question that the respondents interpret inconsistently as a group, interpret differently
from the researcher, or both. For example, a survey
question might ask how respondents feel about
democracy. In order to interpret responses to this
question, a researcher must assume that everyone in
the sample shares the same definition of democracy
as everyone in the sample population and also shares
the surveyor's definition. Further, the researcher must receive the respondent's answers with the same understanding with which the respondent delivers them. In survey projects, consistent encoding and decoding are essential to relaying respondents' true responses.
Failing to anticipate how different groups of people
will interpret the survey instrument will affect both
the internal and external validity of a research project.
There are several sources of differential encoding and decoding in survey research. A researcher may ask questions in a second or translated language and may confuse meanings in that second
language. Values, ideas, and definitions may differ
EPSEM SAMPLE
Sampling involves the selection of a portion of
the population being studied. In probability sampling, each element in the population has a known,
nonzero chance of being selected through the use
of a random selection procedure. EPSEM refers to
an equal probability of selection method. It is not
a specific sampling method such as systematic sampling, stratified sampling, or multi-stage sampling.
Rather it refers to the application of a sampling
technique that results in the population elements
having equal probabilities of being included in the
sample. EPSEM samples are self-weighting; that is,
the reciprocal of the probability of selection of each
element in the selected sample is the same. Thus the
base sampling weight for each selected element
in the sample is a constant equal to or greater than
one (1.00).
The most common examples of equal probability
of selection methods are (a) simple random sampling,
(b) unrestricted random sampling, (c) systematic random sampling, (d) stratified sampling, and (e) proportionate stratified sampling. Simple random sampling
refers to an equal-probability sample of elements selected without replacement. Unrestricted random sampling refers to an equal-probability sample of elements selected with replacement. Systematic random sampling refers
to the selection of elements using a sampling interval
and a random start. Stratified sampling refers to the
formation of mutually exclusive and exhaustive
groupings of elements. Proportionate stratified sampling then entails selecting a sample from the strata
so that the proportion of the total sample allocated to
each stratum equals the proportion of the total elements in the population in each stratum. So for example, if a stratum contains 25% of the population
elements, 25% of the sample would be selected from
that stratum.
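The 25% example generalizes directly: every stratum is sampled at the overall rate. A minimal sketch with hypothetical frame counts:

```python
# Sketch: allocate a sample across strata in proportion to stratum size.
stratum_sizes = {"A": 2500, "B": 5000, "C": 2500}  # hypothetical frame counts
n = 400                                            # total sample size
N = sum(stratum_sizes.values())

allocation = {s: round(n * size / N) for s, size in stratum_sizes.items()}
print(allocation)  # {'A': 100, 'B': 200, 'C': 100}; each stratum sampled at rate n/N
```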
A multi-stage sample can also be an EPSEM sample. The simplest example is a multi-stage design
based on equal probabilities of selection at each stage
of sampling. A more common practical example is
a multi-stage design that results in an overall equal
probability of selection of each element in the population, but at each stage the probabilities of selection
are not equal. In two-stage sampling the clusters are
usually of unequal size (i.e., the number of elements
in the clusters varies from cluster to cluster, with some
small clusters and some large clusters). If a probability
proportional to size (PPS) sample of clusters is drawn,
the larger the cluster the greater its probability for
selection. So, at the first stage, the probabilities of
selection are unequal. At the second stage of sampling, an equal number of elements are selected using
simple random sampling from the sample clusters.
So, at the second stage, the within-cluster probability
of selection of an element is higher if the cluster is
smaller. However, the product of the first-stage selection probability of the cluster and the second-stage
probability of selection of the element within the cluster is a constant for all elements in the population.
Thus, an EPSEM sample is achieved.
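That constancy can be checked directly. The sketch below assumes hypothetical cluster sizes and, for simplicity, treats the PPS draws as independent; it is an illustration of the two-stage logic, not a full sampling routine.

```python
# Sketch: two-stage design with PPS at stage 1 and an equal take at stage 2.
cluster_sizes = [40, 120, 200, 640]  # hypothetical numbers of elements per cluster
a = 2    # clusters to sample (PPS, treated as with-replacement for simplicity)
b = 10   # elements sampled per selected cluster

N = sum(cluster_sizes)
for size in cluster_sizes:
    p_cluster = a * size / N   # stage 1: probability proportional to cluster size
    p_within = b / size        # stage 2: equal take, so elements in smaller
                               # clusters have a higher within-cluster chance
    print(size, p_cluster * p_within)  # constant a*b/N = 0.02 for every element
```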
Michael P. Battaglia
See also Hansen, Morris; Multi-Stage Sample; Probability Proportional to Size (PPS) Sampling; Probability Sample; Simple Random Sample; Stratified Sampling; Systematic Sampling
ERROR OF NONOBSERVATION
Errors of nonobservation refer to survey errors that
are related to the exclusion versus inclusion of an eligible respondent or other sample record. This term
principally refers to sampling error, coverage error,
and nonresponse error. This is distinguished from
errors of observation, which refer to errors that are
related to the measurement of the content of surveys.
The term errors of nonobservation is based on the
language and assumptions of survey methodology. It
is similar to the concepts that psychometricians use to describe the errors that impact external validity.

Sampling Error

Inference from sample surveys assumes that an underlying population is being studied and that samples are
taken from this underlying population. Sample statistics, including sampling errors, are calculated to determine the variability of a statistic as measured in
a survey compared to the actual or true value of that
statistic in the population.
Since not all members of a population are included
in a sample, survey statistics are usually different
from population values. For any population, there are
all sorts of possible combinations of records that
might be included in any particular sample. In many
cases, the results of a survey will be close to what
would be found in an underlying population; in some
cases they may be far off. The sampling error is traditionally taken as a measure of how the statistics
obtained from any particular survey might differ or
vary from those of the actual underlying population.
In terms of understanding errors of nonobservation,
sample errors from probability samples primarily refer
to errors regarding certainty about how close a survey
statistic comes to the actual value of the statistic in
an underlying population. That is, nonobservational
errors due to sampling primarily impact the variability
of survey statistics or the precision of the survey measure. Although there is almost always error in the
form of variance, because survey results are rarely
exactly in line with population statistics, these variable errors are random and thus cancel each other out
across many samples.
The characteristics of sampling error are primarily
mathematical and are based on several assumptions.
Sampling statistics assume that a sample of respondents or other units is taken from an underlying collection, list, or frame of all members of a population.
Coverage Error
Coverage error refers to the error that occurs when
the frame or list of elements used for a sample does
not correspond to the population a survey is intended
to study. This can occur in several ways. For example,
some sample records might correspond to multiple
members of a population. In contrast, some sample
records might be duplicates or correspond to the same
member of a population. The most problematic situation is undercoverage, where a sample frame excludes
some members of the population it is intended to
cover.
The primary danger involved in coverage error is
coverage bias, which can occur when a sample frame
systematically differs from the population it is
intended to include. The extent of coverage bias
depends both on the percentage of a population that is
not covered in the sample frame and the differences
on any statistic between those included in the sample
frame and those excluded from the sample frame. For
example, household surveys systematically exclude
persons who are homeless, telephone surveys systematically exclude persons who do not have telephone
service, and most telephone surveys have systematically excluded people who have cellular (mobile) telephone service but not traditional residential landline
telephone service. In cases where the excluded proportion of a surveys target population is small, and
where differences between sampled respondents and
others are small, researchers usually do not have to
worry about bias in results because of these exclusions. However, if the magnitude of this coverage
error is large, or if the differences between covered
and noncovered respondents are great, or if a survey
is attempting to make very precise estimates of the
characteristics of a population, then nonignorable coverage bias may result.
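This dependence on both the uncovered share and the covered/noncovered difference can be written as a simple product. The sketch below uses invented values; the same decomposition applies to the nonresponse bias discussed next.

```python
# Sketch: coverage bias = (share not covered) x (difference between the
# covered and noncovered groups on the statistic). Values are hypothetical.
share_not_covered = 0.10
mean_covered = 0.52       # statistic among those on the sampling frame
mean_not_covered = 0.40   # same statistic among those excluded from the frame

bias = share_not_covered * (mean_covered - mean_not_covered)
print(bias)  # 0.012: the frame-based estimate runs 1.2 percentage points high
```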
Nonresponse Error
Nonresponse error refers to error that occurs when
persons or other elements included in a sample fail to
respond to a survey. There are two types of nonresponse to surveys: unit nonresponse and item nonresponse. Item nonresponse (i.e., missing data) occurs
when a respondent who completes a survey fails to
provide answers to a question. Many Americans, for
example, refuse to tell survey researchers their
income. Unit nonresponse, in contrast, occurs when
sampled respondents fail to respond to a survey at all.
Unit nonresponse occurs for a variety of reasons.
Some respondents are unable to complete a survey,
for example, because of a health condition or because
they speak a language other than the ones a survey is
administered in. Other respondents are unavailable to
complete a survey, for example, because they are not
at home when an interviewer calls. Still other respondents refuse to complete surveys and thus are not
included in final data.
The primary danger involving survey nonresponse
is potential bias in results. Similar to coverage error,
the magnitude of nonresponse error depends both on
the percentage of the population who fail to respond to a given question and the differences between
respondents and nonrespondents on any survey
statistic.
Chase H. Harrison
See also Coverage Error; External Validity; Missing Data;
Nonresponse Error; Random Error; Sampling Error
ERRORS OF COMMISSION
Errors of commission are sometimes also called
false positives. They refer to instances in which
someone or something is erroneously included for
consideration when they or it should have been
excluded. In survey research, this error typically
occurs when the eligibility of a unit is determined.
For example, when someone is screened at the start of
interviewer contact to determine whether or not he or
she is eligible, and the person answers erroneously in
a way that makes the interview proceed as though the
person were eligible for data collection when in fact
the person is not eligible, then this is an error of commission. In these cases, data are gathered and analyzed for someone who should not have been
interviewed.
Errors of commission occur for many reasons, but
most are due to questionnaire-related error, interviewer-related error, and/or respondent-related error.
The introduction to a questionnaire, where eligibility
screening is typically carried out, might be worded
poorly and thus cause incorrect data to be gathered,
leading to people who in fact are not eligible being
erroneously treated as eligible. An interviewer may
administer an eligibility screening sequence poorly,
thus causing the respondent to misunderstand what
is being asked, leading to answers that result in an
ineligible person being treated as eligible. Finally,
a respondent may be unable to understand the eligibility screening questions and thus may give incorrect
answers. Or the respondent may not be paying enough attention to the screening questions or may not be willing or able to give an accurate answer.
ERRORS OF OMISSION
Errors of omission are also sometimes called false
negatives. They refer to instances in which someone
or something is erroneously excluded from consideration when they or it should have been included. In
survey research, this error typically occurs when the
eligibility of a unit is determined. For example, when
someone is screened at the start of interviewer contact
to determine whether or not he or she is eligible, and
the person answers erroneously in a way that keeps
the interview from proceeding, as though the person were ineligible for data collection, when in fact the person is eligible, this is an error of omission. In these
cases, data are not gathered from someone who
should have been interviewed.
Errors of omission occur for many reasons, but
most are due to questionnaire-related error, interviewer-related error, and/or respondent-related
error. The introduction to a questionnaire, where
eligibility screening is typically carried out, might
be worded poorly and thus cause incorrect data to
be gathered, leading to people who in fact are eligible being erroneously treated as ineligible. An
interviewer may administer an eligibility screening
sequence poorly, thus causing the respondent to
misunderstand what is being asked, leading to
answers that result in an eligible person being treated as ineligible. Finally, a respondent may be
unable to understand the eligibility screening questions and thus may give incorrect answers. Or the
respondent may not be paying enough attention to
the screening questions or may not be willing or
able to give an accurate answer.
An example of an error of omission occurs whenever a survey screening sequence is worded in such
a way that it is readily apparent to the respondent
what answers will qualify her or him for the interview
and what answers will disqualify her or him. Since
many people are reluctant to directly refuse to participate in surveys in which they have little or no interest,
such screening sequences are an easy way for the
respondent to get out of doing the survey without
outright refusing the interviewer.
ESTABLISHMENT SURVEY
An establishment survey is a survey that seeks to
measure the behavior, structure, or output of organizations rather than individuals. Establishment
surveys include surveys of businesses that are critical
to our understanding of trends in the economy,
such as the Economic Census conducted by the
U.S. Census Bureau. However, establishment surveys also include surveys of universities and colleges, hospitals, and nursing homes. There has
been considerable discussion about the best practices involved with conducting establishment surveys in recent years, as the response rates achieved
by many establishment surveys have declined, similar to those of household surveys. This reduction in
response rates has spurred the development of
a more robust literature on conducting establishment surveys, as well as investigation of how to
increase cooperation through improved questionnaire design and contacting procedures. Understanding establishment surveys requires examining
the ways in which they are different from household surveys by focusing on the unique sampling,
survey, and questionnaire design issues that need
to be considered when studying establishments, as
well as effective strategies for contacting the
appropriate respondents within establishments to
complete these surveys.
There are questionnaire design issues that are particular to establishment surveys. With surveys of all
types, shorter questionnaires generally produce higher
response rates. Given the amount of time required by
the establishment to collect information to respond to
some surveys, limiting the number of data items will help maintain response rates.
The modes that are used in establishment surveys have evolved considerably in recent years.
For many years, mail was the predominant mode
used when surveying businesses and other institutions due to the low cost of mail surveys. Similar
to recent trends in household surveys, surveys
offering sampled establishments a choice of modes
are now common. The increasing popularity of
offering a Web survey response option to selected
establishments is primarily due to the fact that
Web access in workplaces is now widespread.
Also, the use of combinations of mail, Web, and
other modes such as touchtone data entry in a survey reduces mode effects, since these modes are
self-administered and questions can be displayed in
essentially the same format. Surveys using mail
and Web have also employed telephone contacts
with establishments as reminder prompts or to collect the data as a last resort. In contrast, for voluntary surveys or surveys among populations that
Data Collection
As discussed previously, the level of burden associated
with completing establishment surveys has depressed
response rates in recent years, and maintaining similar
response rate levels on periodic surveys has been
accomplished only through more intense follow-up
efforts and the employment of more effective data collection techniques. One of the techniques employed in
establishment surveys is ensuring that the initial contact to respondents in establishment surveys is personalized, professional, and succinct. Personalizing an
advance mailing by sending it to an individual rather
than to the establishment at large increases the probability that the mailing will be opened, and if the targeted contact is not the most appropriate person to
respond to the survey, he or she is more likely to pass
along the materials to someone else at the establishment who is better able to provide the information.
However, efforts to direct materials to specific individuals at the establishment are dependent on the information available from the sampling frame or on pre-field efforts to identify appropriate respondents.
The design of a successful establishment survey
includes a thorough understanding of the terminology used among the establishments studied, as
discussed earlier. In the same vein, successful contacting of establishments during data collection
should be done by staff who are familiar with this
terminology and can provide relevant information
about the value of completing the survey. Organizations conducting establishment surveys that include
telephone interviewing or prompting have found it
useful to provide additional specialized training to
interviewing staff in order to avoid refusals and
increase survey response. Also, a highly trained and
knowledgeable staff will be better able to negotiate
with gatekeepers, who tend to be more prevalent in
establishment surveys due to the nature of organizations and the need for decision makers to insulate
themselves from communication not central to their
organizations' missions.
Many establishment surveys are conducted regularly to enable time-series data analyses, and often the
same set of businesses is asked to complete surveys at
regular intervals, such as annually. These longitudinal
surveys face additional challenges, such as respondent
fatigue, and the survey instruments used are generally
changed very little between rounds of the survey in
order to maintain time-series continuity. Even though
questions on these surveys remain fairly static over
time, pretesting (including cognitive interviews prior
to data collection or response behavior follow-up
surveys after respondents complete the survey) can
identify areas of confusion that respondents may
encounter while completing the survey. The decision
to change survey items on longitudinal surveys should
factor in the potential impact on respondents, some of
whom complete the survey in multiple years and may
be frustrated by changes in the measures or definitions. That concern must be balanced with the knowledge that there will also be respondents completing
the survey for the first time, either due to staff turnover or because of new establishments added to the
sample. Longitudinal surveys should strive to accommodate both situations to ensure similar response
rates between establishments that are new to the sample and those completing the survey multiple times.
David DesRoches
See also Advance Contact; Cognitive Aspects of Survey
Methodology (CASM); Cognitive Interviewing;
Directory Sampling; Gatekeeper; Informant;
Longitudinal Studies; Probability Proportional to
Size (PPS) Sampling; Questionnaire Design
ETHICAL PRINCIPLES
In the discipline of survey research, ethical principles
are defined as the standard practices for privacy and
confidentiality protection for human subject participants. Ethical principles in survey research are in
place to protect individual participant(s) beginning at
the start of study recruitment, through participation
and data collection, to dissemination of research findings in a manner that is confidential, private, and
respectful. These principles guide accepted research
practices as they apply to the conduct of both quantitative and qualitative methods in survey research.
Respondent/Participant
Informed Consent
Informed consent is designed to protect survey participants' rights to voluntary participation and confidentiality and thus relates to basic concerns over
respect, beneficence, and justice, as discussed previously. To make an informed decision to participate,
individuals must understand that the survey involves
research and the purpose of the research. The consent
statement should communicate the expected burden
(typically for survey research, the length of commitment) and any potential discomfort that may result
from participation, such as distress resulting from
the sensitivity of questions.
Researcher/Investigator
Ethical principles also provide a foundation to guide
the researcher in the design of a study. The principles
supply a standard for practice that is used to guide
research design, conduct, analysis, and reporting of
findings. Applied and ethical theories are the basis for
several codes of professional ethics and practices (e.g., the code of the American Association for Public Opinion Research).
EVENT HISTORY CALENDAR

Key Components
The primary aim of the event history calendar
approach is to maximize the accuracy of autobiographical recall. Just as event history calendars represent the past both thematically and temporally, the
structure of autobiographical knowledge is believed to
be organized in a similar fashion. Theoretically, the
thematic and temporal associations of events within
the structure of autobiographical knowledge afford
retrieval cues that can be implemented in event history calendar interviewing and aid respondents to
reconstruct their pasts more completely and accurately. One type of retrieval cue involves the sequencing of periods of stability and the transitions between
them with regard to what happened earlier and later
in time within the same timeline. For example, one
may remember that one's employment period in one
company immediately preceded another period of
employment with a different company. In between
these periods resides the transition point from one
period to another, and both the length of periods and
the timing of transition points are recorded within
event history calendar timelines. In addition to
sequential retrieval, the use of parallel retrieval cues across timelines aids recall as well.
Background
The first administrations of the event history calendar
interviewing methodology can be traced to 1969,
with the Monterrey Mobility Study of 1,640 men ages
21–60 in Monterrey, Mexico, and with the Hopkins Study that recruited a U.S. national probability sample of 953 men ages 30–39. The method became more
widely recognized as a viable approach to questionnaire design in the late 1980s, with its implementation
in the University of Michigan's Study of American
Families. Complementing these earliest efforts in the
fields of demography and sociology, the event history
calendar has been administered by researchers in
a variety of disciplines, including criminology, economics, nursing, psychiatry, psychology, epidemiology, social work, and survey methodology. It has
been successfully used to collect mobility, labor,
wealth, partnering, parenting, crime, violence, health,
and health risk histories. Although aptly used in scientific investigations seeking to uncover the causes and
consequences that govern well-being within populations, it has also been used in more qualitative efforts,
including those in clinical settings that examine an
individual's patterns of behavior to assess potential
beneficial interventions.
Administration Methods
Event history calendars have been administered in
a variety of ways, including as paper-and-pencil
and computer-assisted interviewing instruments, and in
face-to-face, telephone, and self-administered modes.
The method has mostly been implemented in the interviewing of individuals, but the interviewing of collaborative groups has also been done. The computerization
of event history calendars affords the automation of
completeness and consistency checks. Web-based applications are also being explored.
Robert F. Belli and Mario Callegaro
See also Aided Recall; Conversational Interviewing; Diary;
Interviewer Variance; Reference Period; Standardized
Survey Interviewing
EXHAUSTIVE
Exhaustive is defined as a property or attribute of survey questions in which all possible responses are captured by the response options made available, either
explicitly or implicitly, to a respondent. Good survey
questions elicit responses that are both valid and reliable measures of the construct under study. Not only
do the questions need to be clear, but the response
options must also provide the respondent with clear
and complete choices about where to place his or her
answer. Closed-ended or forced choice questions are
often used to ensure that respondents understand what
a question is asking of them. In order for these question types to be useful, the response categories must
be mutually exclusive and exhaustive. That is, respondents must be given all possible options, and the
options cannot overlap. Consider the following question, which is frequently used in a number of different
contexts.
Please describe your marital status. Are you . . .
Married
Divorced
Widowed
Separated
Never married
In situations in which the researcher cannot possibly identify all response options a priori, or cannot
assume a single frame of reference for the subject
matter, an Other [specify] option can be added. For
example, questions about religion and race always
should include an Other [specify] option. In the
case of religion, there are too many response options
to list. For race, traditional measures often do not adequately capture the variety of ways in which respondents conceptualize race. Thus, an Other [specify]
option allows respondents to describe their race in
a way that is most accurate to them.
Linda Owens
See also Closed-Ended Question; Forced Choice; Mutually
Exclusive; Open-Ended Question
EXIT POLLS
Exit polls are in-person surveys in which data are
gathered immediately after people have engaged in
the behavior about which they are being surveyed,
such as voting in an election. The survey methods that
are used in exit polls apply to the measurement of
a wide variety of behaviors, but in the minds of most
people exit polls are most closely associated with
what is done on Election Day to help project the outcomes of elections.
Thus, exit polls are important,
not so much because they are used to help make the
projections reported by the major television networks
on Election Night, but because the information they
gather about the voters' demographics and attitudes
toward the candidates and the campaign issues provides very powerful and important explanations about
why the electorate voted as it did. It is only through
the use of accurate exit poll data that the so-called
mandate of the election can be measured and reported
accurately without relying on the partisan spin that
the candidates, their campaign staff, and political
pundits typically try to put on the election outcome.
For example, in 1980, Ronald Reagan's strategists
described his sound defeat of Jimmy Carter as a turn
to the right by American voters and as an impetus
for a conservative legislative agenda for the new Congress. In contrast, 1980 exit-poll data showed there
was no ideological shift among American voters.
Instead, they were primarily concerned about President Carter's inability to influence the economy and
settle the Iran hostage crisis, and they wanted a new
president whom they hoped would do a better job
in reducing inflation. As another example, in the
1998 exit polls, voters indicated that they were basing
their votes for Congress on evaluations of their local
candidates and not on any concerns about the allegations regarding President Bill Clinton and Monica
Lewinsky contained in Kenneth Starr's report.
EXPERIMENTAL DESIGN
Experimental design is one of several forms of scientific inquiry employed to identify the cause-and-effect
relation between two or more variables and to assess
the magnitude of the effect(s) produced. The independent variable is the experiment or treatment applied
(e.g., a social policy measure, an educational reform,
different incentive amounts and types) and the dependent variable is the condition (e.g., attitude, behavior)
presumed to be influenced by the treatment. In the
course of the experiment it is necessary to demonstrate the existence of covariation between variables,
its nonspuriousness, and to show that the cause
occurred before the effect. This sort of inquiry can
take the form of an artificial experiment, carried out
in a laboratory scenario, or a natural experiment
implemented in a real-life context, where the level of
control is lower. For both cases, the literature presents several taxonomies, from which four main
types are considered: (1) true or classical experimental, (2) pre-experimental, (3) single-subject experimental, and (4) quasi-experimental. In addition, there are
a number of variations of the classic experimental
design as well as of the quasi-experimental design.
In a true or classic experimental design, there are
at least two groups of individuals or units of analysis:
the experiment group and the control group. Participants are randomly assigned to both groups. These
two groups are identical except that one of them is
exposed to the experiment or causal agent, and the
other, the control group, is not. In many instances,
a pretest and a posttest are administered to all individuals in the two groups; but the pretest is not
a necessary aspect of the true experiment. If there is
a significant difference between members of the two
groups, it is inferred that there is a cause-and-effect
link between that treatment and the outcome.
The pre-experimental design does not have a control group to be compared with the experiment group.
There is a pretest and a posttest applied to the same
participants. In a single-subject experimental design,
there is only one participant, or a small number, that is studied repeatedly over time.
EXTERNAL VALIDITY
External validity refers to the extent to which the
research findings based on a sample of individuals
or objects can be generalized to the same population from which the sample was selected.
If the results of the survey apply only to the sample, rather than to the target population from which
the sample was selected, then one might question why
the survey was conducted in the first place, as the
results are not likely to have any value. In order to
avoid this situation, the researcher should make certain that the sampling design leads to the selection
of a sample that is representative of the population.
This can be accomplished by using an appropriate
probability sampling method (e.g., simple random sampling, systematic sampling, stratified sampling, cluster
sampling, multi-stage sampling) to select a representative sample from the target population. Generally this
means drawing a sample whose characteristics closely match those of the target population.
This threat to external validity refers to the characteristics of the survey study's setting that may
limit the generalizability of the results of the study.
The major concern with this threat to the external
validity is that the findings of a particular survey
research study may be influenced by some unique
circumstances and conditions, and if so, then the
results are not generalizable to other survey research
studies with different settings. The research site,
specific experimental setting arrangements, intervention delivery method, and experimenter's competency level are examples of possible setting factors that can well limit the generalizability of the
results.
One of the methods that can be used for minimizing the survey research setting threat is replicating the
study across different sites with different individuals
and at different times. Thus, in order for the results of a survey to be externally valid, they should generalize across settings or from one set of environmental
conditions to another. This concept is also referred to
in the literature as ecological validity.
This threat to external validity refers to the possible impact on respondents of knowing that they are
participating in a survey research study. This impact
is known in the social science literature as the
Hawthorne effect or reactivity. Thus, the participants' awareness of their participation in a survey and their thoughts about the study's purpose can influence the study's outcomes and findings. Performance, achievement, attitude, and behavior are examples of such outcomes that may be affected. The research findings
may be different if the participants were unaware of
their participation in the study, although this generally
is not practical when using survey research. Nevertheless, the prudent researcher and research consumer keep this threat to external validity in mind when
deciding how well the survey results generalize
beyond the study itself.
One of the methods that can be used for avoiding
the awareness threat in an experimental study is by
giving the participants in the control group a placebo
treatment and giving the participants in the experimental group the new treatment. However, it is
important to stress the fact that the researcher who
uses a placebo treatment in a survey study must consider the following ethical issues:
1. Should the participants in the study be informed
that some of them will be given the placebo treatment and the others will be given the experimental
treatment?
2. Should the participants be informed that they will
not know which treatment (placebo or experimental)
they have received until the conclusion of the study?
3. Should the participants be informed that if they
initially received the placebo treatment, they will
be given the opportunity to receive the more effective treatment at the conclusion of the study if the
results of the experimental study indicate the effectiveness of the experimental treatment?
Multiple-Treatment Interferences
This threat arises when it is unclear which of several interventions, treatments, training, testing, and surveying is responsible for the results of the survey study
unless the various treatments all are controlled with an
experimental design, which often is impractical. For
example, if a study were conducted within a survey to
try to raise response rates and several interventions
were combined (e.g., higher incentives, new recruitment scripts, special interviewer training) to form
a treatment package for the experimental group of
respondents, the researchers would not have confidence in knowing which of the interventions or their
interactions brought about any observed increase in
response rates. Thus, any observed effects could not
generalize to other studies that did not use all of the
same interventions in the same combination.
F
FACE-TO-FACE INTERVIEWING
Advantages
By far, the main advantage of the face-to-face interview is the presence of the interviewer, which makes
it easier for the respondent to either clarify answers or
ask for clarification for some of the items on the questionnaire. Sometimes, interviewers can use visual aids
(e.g., so-called show cards) to assist respondents in
making a decision or choice. Properly trained interviewers are always necessary lest there be problems
such as interviewer bias, which can have disastrous
effects on the survey data. Relatively high response
rates and a near absence of item nonresponse are also added bonuses. The opportunity for probing exists, whereby the interviewer can get more detailed
information about a particular response.
Each format has particular advantages and disadvantages and is suited for particular purposes. The
purpose of the study, the length of the interview, and
the cost constraints are all factors to consider.
In structured face-to-face interviews the interviewer asks each respondent the same questions in
the same way. This is the most basic and most common face-to-face survey type. A structured interview
may include open-ended and closed-ended questions.
This type of interview is usually used for large projects for which the researcher wants the same data to
be collected from each respondent.
Semi-structured face-to-face interviews mainly consist of open-ended questions based on topics the
researcher wants covered in the interview. Although
the interview focuses on key topics, there is also the
opportunity to discuss, in more detail, some particular
areas of interest. The interviewer has the opportunity to
explore answers more widely or other areas of discussion spontaneously introduced by the respondent. The
face-to-face interviewer may also have a set of prompts
to help respondents if they struggle to answer any of the questions.

Before interviewers are sent to conduct face-to-face interviews, there are some tasks that must be
accomplished. One of these is to craft an introduction
to inform respondents about the interview. They may
want to know about the reason(s) for conducting the
survey, how the information will be used, and how
the results may impact their lives in the future. The
respondent usually will want to know the length of
time that the interview will take. Last but not least,
the respondent must be given reason to trust the interviewer, and the introduction should set this in motion.
Immediately before the interview, the interviewer
does a check to ensure that she or he has all the
equipment and materials needed. She or he should
review the questionnaire, as needed. The place chosen
to do the interview should be as serene as possible
and be free from unnecessary disruptions. If
respondents believe that their answers are confidential, then they will be less hesitant to respond. To
increase the sense of confidentiality, names and
addresses should not be placed on the questionnaires
(where the respondent could see them). Instead, the
researchers should use code numbers on the questionnaires and keep the names and addresses in a separate
document. Also, there should be no other people
present: only the interviewer and the respondent.
Incentives can improve response rates, so researchers need to decide on these beforehand. This can take
the form of rewarding respondents before (noncontingent incentives) or after (contingent incentives) filling
out questionnaires, or both before and after. Deployment of either form of incentive can be immediate
when doing face-to-face interviewing; thus, gratification is essentially instant. Cash money is the simplest form of incentive.
FACTORIAL DESIGN
Factorial designs are a form of true experiment, where
multiple factors (the researcher-controlled independent variables) are manipulated or allowed to vary,
and they provide researchers two main advantages.
First, they allow researchers to examine the main
effects of two or more individual independent variables simultaneously. Second, they allow researchers
to detect interactions among variables. An interaction
is when the effects of one variable vary according to the level of another variable.
Table 1   Mean symptom improvement, by type of psychotherapy and level of medication dose

                 Psychotherapy 1   Psychotherapy 2   Row means
Low dose               15                 5              10
Medium dose            15                15              15
Column means           15                10            12.5
To determine the main effect of type of psychotherapy on symptom improvement, the table is read down by column, comparing the overall means for the two groups receiving
Psychotherapy 1 versus the two groups receiving Psychotherapy 2 (15 vs. 10). The data show that patients
receiving Psychotherapy 1 showed greater symptom
improvement compared with patients receiving Psychotherapy 2.
To determine whether an interaction exists, we
examine whether the size of the Factor 1 (type of psychotherapy) effect differs according to the level of
Factor 2 (medication dosage level). If so, an interaction exists between the factors. In this example, the
effect of dose differs according to which therapy the
patient received. Thus, there is an interaction between
type of psychotherapy and medication dosage. For
patients receiving the low dose medication, there was
significantly less improvement under Psychotherapy 2
than under Psychotherapy 1. But dosage level made
no difference when given with Psychotherapy 1.
Because the effect of the individual variables differs
according to the levels of the other variable, it is common practice to stress that any significant main effects
must be interpreted in light of the interaction rather
than on their own. In this example, while drug dosage
showed a main effect with the medium dose leading to
greater symptom improvement, on average, this effect
held only for patients receiving Psychotherapy 2. The
outcome of a study using a factorial design can also be
depicted graphically. Figure 1 shows a bar chart and
a line chart of the group means. For the bar chart, an
interaction is apparent because the difference between
the bars for low dosage is larger than the difference
between the bars for medium dosage. For the line
chart, an interaction is apparent because the lines are
not parallel. Line charts should be used only with factorial designs when it makes sense to talk about values in between the levels of the factor on the horizontal axis.
Figure 1   Graphical representations of the outcome of a factorial study: bar chart (left) and line chart (right). Both panels plot mean improvement (0–20) against level of medication dose (low, medium), separately for Psychotherapy Type 1 and Type 2.
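The main effects and the interaction can be read off the cell means directly. The following sketch uses the values from Table 1 and the simple-difference definition of a 2 × 2 interaction; it is an illustration, not part of the original entry.

```python
# Sketch: main effects and interaction from the 2x2 cell means in Table 1.
means = {("low", 1): 15, ("low", 2): 5, ("medium", 1): 15, ("medium", 2): 15}

# Main effect of psychotherapy: column means; main effect of dose: row means.
col_means = {t: (means[("low", t)] + means[("medium", t)]) / 2 for t in (1, 2)}
row_means = {d: (means[(d, 1)] + means[(d, 2)]) / 2 for d in ("low", "medium")}

# Interaction: the therapy effect at one dose minus the therapy effect at the other.
effect_low = means[("low", 1)] - means[("low", 2)]           # 10
effect_medium = means[("medium", 1)] - means[("medium", 2)]  # 0
print(col_means, row_means, effect_low - effect_medium)      # nonzero -> interaction
```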
FACTORIAL SURVEY METHOD (ROSSI'S METHOD)

Data Collection
Each respondent is asked to assign the value of a specified outcome variable (such as healthiness, marital
happiness, actual wage, just wage, or fairness of an
actual wage) corresponding to a fictitious unit (e.g.,
a person or a family), which is described in terms of
potentially relevant characteristics such as age, gender, study or eating habits, access to medical care or
housing, and the like. The descriptions are termed
vignettes. One of Rossi's key insights was that fidelity
to a rich and complex reality can be achieved by
generating the population of all logically possible
combinations of all levels of potentially relevant characteristics and then drawing random samples to present to respondents. Accordingly, the vignettes are
described in terms of many characteristics, each characteristic is represented by many possible realizations,
and the characteristics are fully crossed (or, in some
cases, to avoid nonsensical combinations, almost fully
crossed). Three additional important features of the
Rossi design are (1) in the population of vignettes, the
correlations between vignette characteristics are all
zero or close to zero, thus reducing or eliminating
problems associated with multi-colinearity; (2) the
vignettes presented to a respondent are under the control of the investigator (i.e., they are fixed), so that
endogenous problems in the estimation of positive-belief and normative-judgment equations arise only if
Data Analysis
The analysis protocol begins with inspection of the
pattern of ratings, which, in some substantive contexts, may be quite informative (e.g., the proportion
of workers judged underpaid and overpaid), and
continues with estimation of the positive-belief and
normative-judgment equations. Three main approaches
are (1) the classical ordinary least squares approach;
(2) the generalized least squares and seemingly unrelated regressions approach, in which the respondent-specific equations may have different error variances
and the errors from the respondent-specific equations
may be correlated; and (3) the random parameters
approach, in which the respondents constitute a random sample and some or all of the parameters of the
respondent-specific equations are viewed as drawn
from a probability distribution. Under all approaches,
an important step involves testing for interrespondent
homogeneity.
Depending on the substantive context and on characteristics of the data, the next step is to estimate the
determinants equation and the consequences equation.
Again depending on the context, the determinants
equation may be estimated jointly with the positive-belief and normative-judgment equations.
Prospects
For many years Rossi's method was somewhat difficult to implement, given the computational resources required to generate the vignette population and to estimate respondent-specific equations. Recent advances in desktop computational power, however, render vignette generation, random sampling, and data analysis straightforward, thus setting the stage for a new generation of studies using Rossi's method.
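A minimal sketch of the vignette-generation step: the full cross of characteristic levels, followed by a random sample to present to one respondent. The characteristics and levels are invented for illustration; in the full cross, the characteristics are uncorrelated by construction, as the entry notes.

```python
# Sketch: generate the vignette population as the full cross of levels,
# then draw a random sample to present to a respondent.
import itertools
import random

levels = {  # hypothetical characteristics and levels
    "age": [25, 40, 55],
    "gender": ["woman", "man"],
    "occupation": ["nurse", "teacher", "engineer"],
    "weekly_hours": [20, 40],
}

population = [dict(zip(levels, combo))
              for combo in itertools.product(*levels.values())]
print(len(population))  # 3 * 2 * 3 * 2 = 36 possible vignettes

random.seed(1)
sample = random.sample(population, 10)  # the vignettes shown to one respondent
```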
Research teams around the world are exploring
several new directions, building on the accumulating body of findings.
FALLBACK STATEMENTS
Oftentimes when interviewers first make contact with
a sampled respondent, the respondent is hesitant or
otherwise reluctant to agree to participate in the survey. In most surveys, researchers can anticipate the
nature of the concerns that will be expressed by respondents and thus can prepare targeted fallback statements to deploy for those respondents who express specific concerns.
Paul J. Lavrakas
See also Tailoring
FALSIFICATION
Interviewer falsification, the act by a survey interviewer of faking an interview or turning in falsified
results as if they were the real thing, is a well-known,
long-standing, and recurrent problem that has drawn
occasional attention in the research literature since the
early days of the fields development. It has traditionally been referred to as curbstoning, a term that
captures the image of the interviewer, out on field
assignment, who settles on the street curbing to fill
interview forms with fabricated responses instead of
knocking on doors to obtain real interviews.
In recent years, the problem has drawn renewed
attention because the U.S. federal government's
Office of Research Integrity (ORI) made clear in
2002 that it considers interviewer falsification in any
study funded by the U.S. Public Health Service to be
a form of scientific misconduct. Because that designation can invoke potentially grave consequences for
researchers and their organizations, a summit conference of representatives of governmental, private, and
academic survey research organizations was convened
in Ann Arbor, Michigan, in April 2003 by Robert M.
Groves (with ORI support) to compile best practices for the detection, prevention, and repair of
interviewer falsification. This entry draws freely on
the statement generated from those meetings, which
has been endorsed by the American Association for
Public Opinion Research and by the Survey Research
Methods Section of the American Statistical Association. This entry defines falsification, discusses its
prevalence and causes, and outlines methods of detection, prevention, and repair.
Falsification Defined
Interviewer falsification means the intentional departure from the designated interviewer guidelines or
instructions, unreported by the interviewer, which
could result in the contamination of data. Intentional
means that the interviewer is aware that the action
deviates from the guidelines and instructions; honest
mistakes or procedural errors by interviewers are not
considered falsification. This behavior includes both
fabrication (data are simply made up) and falsification
(results from a real interview are deliberately misreported). It covers (a) fabricating all or part of an interview, (b) deliberately misreporting disposition codes
and falsifying process data (e.g., recording a refusal
case as ineligible or reporting a fictitious contact
attempt), (c) deliberately miscoding the answer to
a question in order to avoid follow-up questions,
(d) deliberately interviewing a nonsampled person in
order to reduce effort required to complete an interview, or (e) intentionally misrepresenting the data collection process to the survey management.
FAST BUSY
A fast busy is a survey disposition that is specific to
telephone surveys. It occurs when an interviewer dials
a number in the sampling pool and hears a very rapid
busy signal. Fast busy signals are sometimes used by
telephone companies to identify nonworking telephone numbers, but they occasionally occur when
heavy call volumes fill all of the local telephone circuits. Telephone numbers in the sampling pool that
result in a fast busy disposition usually are considered
ineligible. As a result, fast busy case dispositions are
considered final dispositions and typically are not
redialed by an interviewer, although in some cases
they may be dialed again in case the fast busy condition is only temporary.
From a telephone interviewing standpoint, the
practical difference between a fast busy signal and
a normal busy signal is that the pace of the fast busy
signal is noticeably faster than that of a normal busy
signal. It is important to note that the disposition of
fast busies is different from that of busies, and thus
fast busies need to have a survey disposition code that
is different from the code used for normal busies. As
a result, telephone interviewers need to understand
the difference between busies and fast busy signals,
along with the different dispositions of cases that
reach normal busies and fast busy signals. This
knowledge will ensure that interviewers correctly code the fast busy cases they reach.
FAVORABILITY RATINGS
A favorability rating is a statistical indicator that is
produced from data that typically are gathered in
political polls. These ratings indicate whether the public's overall sentiment toward a politician is favorable
(positive), unfavorable (negative), or neutral. Journalists often report favorability ratings as part of their
coverage of political campaigns and elections.
A favorability rating about a politician is calculated
by using data gathered in so-called approval questions.
These questions ask poll respondents whether they
approve or disapprove of X, where X typically is
the name of a politician. The favorability rating for that
person is calculated by subtracting the proportion of
those interviewed who say they disapprove of the person (or her or his policies, or both) from the proportion
that say they approve. That is, the disapprove (negative) percentage is subtracted from the approve (positive) percentage; if there are more who disapprove than
approve, then the favorability rating will be a negative
number.
For example, if 65% of the people polled said they disapproved of the job George W. Bush was doing as president, while 30% said they approved (with 5% expressing no opinion), then the favorability rating would be 30 − 65 = −35, a decidedly negative rating.
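As a minimal arithmetic sketch of this calculation (the function name and the poll figures are our own illustration, not part of any standard polling toolkit):

    def favorability_rating(approve_pct, disapprove_pct):
        # Favorability rating: approve percentage minus disapprove percentage.
        return approve_pct - disapprove_pct

    # The hypothetical poll above: 30% approve, 65% disapprove, 5% no opinion.
    print(favorability_rating(30, 65))  # prints -35, a net-negative rating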
FEDERAL COMMUNICATIONS
COMMISSION (FCC) REGULATIONS
The Federal Communications Commission (FCC) regulates many of the telecommunications that survey and marketing researchers use through the rules under the Telephone Consumer Protection Act (TCPA), which directed the FCC to balance the fair practices of telemarketers with consumer privacy concerns.
Although some TCPA provisions apply only to commercial and sales-related communications (for example, the Junk Fax Prevention Act (JFPA), the National Do Not Call Registry, and restrictions on call abandonment and time of day), they still impact researchers. However, the TCPA restrictions on war dialing, artificial or prerecorded messages, and cellular phone calling apply to all callers, including survey researchers.
Call Abandonment
The TCPA prohibits telemarketers from abandoning
more than 3% of all telemarketing calls that are
answered live by a person. A call is considered
abandoned if it is not connected to a live sales representative within 2 seconds of the called person's completed greeting. Although these restrictions apply only
to telemarketing calls, professional associations recommend that researchers strictly limit their call abandonment rates.
War Dialing
War dialing is the practice of using automated equipment to dial telephone numbers, generally sequentially,
and software to determine whether each number is associated with a fax line or voice line. The TCPA prohibits
anyone from doing so. However, the restriction only
applies if the purpose of the call is to determine
whether the line is a facsimile or a voice line. For
example, calling a number already known to be a voice
line for the purpose of determining if it is a working
or nonworking number could be outside the scope of
the TCPA.
Telemarketing and Consumer Fraud and Abuse Prevention Act (TSR)
This federal act, also known as the Telemarketing
Sales Rule or TSR, established rules in 1994 to prohibit certain deceptive telemarketing activities, and it
regulates sales and fund-raising calls to consumers, as
well as consumer calls in response to solicitation by
mail. The TSR also prohibits activities commonly
known as SUGing and FRUGing. SUGing is the practice of selling under the guise of research, while
FRUGing is fund-raising under the guise of research.
Selling, in any form, is differentiated from survey
research, and the FTC recognizes that in the TSR.
Occasionally, survey research companies will offer an
incentive or gift to the respondent in appreciation of
his or her cooperation. Such an incentive or gift could
be a cash donation to a charity, a product sample, or
a nominal monetary award. But sales or solicitation is
not acceptable or permitted in legitimate and professionally conducted survey research and violates
federal law.
Telemarketers have various restrictions in the TSR
but perhaps the best-known provisions relate to the
National Do Not Call Registry. To enforce the law,
the TSR allows consumers to bring private civil lawsuits in federal district courts.
275
Deceptive Practices
The FTC regulates survey researchers in a broad way: breaking promises can mean breaking the law. Violating a stated privacy policy can be actionable under Section 5 of the original FTC authorization act (15 U.S.C. §§ 41–58) as an unfair or deceptive trade practice, as well as under similar laws at the state level.
Howard Fienberg
FEELING THERMOMETER
The feeling thermometer is a common survey tool
used by researchers to determine and compare respondents' feelings about a given person, group, or issue.
Feeling thermometers enable respondents to express
their attitudes about a person, group, or issue by
applying a numeric rating of their feelings toward that
person, group, or issue to an imaginary scale. Using
a feeling thermometer, respondents express their feelings in terms of degrees, with their attitudes corresponding to temperatures. A rating of 0, "very cold," indicates that a respondent does not like a given person, group, or issue at all; a rating of 100, "very warm,"
translates to the respondent liking that person, group,
or issue very much. In general, researchers consider
ratings below 50 to indicate a respondent dislikes or
has a negative view of a person, group, or issue; conversely, respondent ratings above 50 are indicative of
positively held feelings or attitudes. The midpoint of
the feeling thermometer, 50, is reserved to indicate
that a respondent's feelings toward a person, group,
or issue are completely neutral: He or she does not
like or dislike, approve or disapprove, have positive or negative feelings toward the person, group,
or issue.
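In analysis, thermometer ratings are often collapsed into the three broad categories just described. The following sketch assumes hypothetical data and our own function name; it is an illustration, not a standard recode:

    def thermometer_category(rating):
        # Collapse a 0-100 feeling thermometer rating into three categories.
        if rating < 50:
            return "negative"   # colder than the midpoint: dislikes the object
        if rating > 50:
            return "positive"   # warmer than the midpoint: likes the object
        return "neutral"        # exactly 50: completely neutral

    print([thermometer_category(r) for r in (0, 35, 50, 72, 100)])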
Despite the seemingly simple and straightforward
concept of feeling thermometers, they are susceptible
to high levels of variance due to a variety of reasons
associated with how individuals respond to feeling
thermometers. Studies have found that some respondents tend to be warmer than others in applying the
scale, whereas other respondents tend to be colder.
Further, some respondents, for whatever reason, restrict their ratings to relatively small
portions of the thermometer, whereas others are just
more open to using the entire spectrum. Additionally,
an inverse relationship has been found between
respondents' levels of education and thermometer ratings, with higher ratings associated with the less educated respondents.
Feeling thermometers were first used in the 1964
American National Election Study. Because feeling
thermometers were introduced in an election study,
people commonly associate the use of feeling thermometers with political science research. Although
political scientists do utilize feeling thermometers in
a wide variety of studies, many researchers in other
disciplines, including psychology and sociology, use them as well.
FIELD CODING
Field coding involves the use by an in-person or
telephone interviewer of a standardized listing of
response options to categorize open-ended responses
given by respondents to questions that provide no
specific response options to the respondent. This
approach differs from the administration of a closed-ended question, where the response options are read
to the respondent, and differs from the administration
of open-ended questions, where the response is typically recorded verbatim.
With field coding, an interviewer typically asks the
respondent an open-ended question and waits for
a response. As the respondent replies, the interviewer
records the information into one or more of the predetermined response options. Should the respondent
give an answer that is not on the interviewers list
of response options, the interviewer either must interpret the answer as close as possible to one of the predetermined options or must record it verbatim in an "other" category.
FIELD DIRECTOR
In the survey research community, the title Field
Director is commonly used to denote the person with
overall responsibility for the data collection component of a survey that uses off-site interviewing personnel as data collectors. Not to be confused with the
Principal Investigator or the Project Director, which
may be positions held by other staff on the project,
the Field Director role is commonly limited in functional scope to all aspects of collecting data in the
field. The Field Director also may be called the Field
Manager or Data Collection Task Leader. (The Field
Director title is sometimes used to refer to the person
in charge of data collection in a centralized telephone
call center, although some consider this a less appropriate use of the term.)
FIELD PERIOD
The field period of a survey is the time frame during
which the survey instrument is in the field, as
opposed to the time when the survey instrument is
under development or review in the office. It is the
period during which interviews are conducted and
data are collected for a particular survey. Originally,
it referred to the period of time when personal face-to-face interviews were being conducted by field interviewers. Over the course of years, the field period
has come to be regarded as the period of days or
months over which data for a survey were gathered
from respondents, regardless of the mode of data collection that was used.
The purpose of the survey is directly related to the
field period that is established. A field period might
be as short as a few hours for an overnight public
opinion poll or a few days for time- or event-sensitive
surveys. For surveys in which the subject is less time- or event-sensitive, the field period might extend for
several weeks or months. In establishing the field
period for a survey, the purpose of the survey is perhaps the most significant factor. To the extent that
a survey is designed to gauge public opinion in
response to a specific event or activity, a short field
period is appropriate. This is often the case in political polling.
FIELD SURVEY
The term "field" is used in survey research to refer to
the geographical setting where data collection takes
place. Typically this refers to in-person interviewing
and thus the name, field survey.
One of the key decisions when designing a survey
is the choice of the mode of data collection. Field
interviewing is one of three traditional modes of survey data collection (along with telephone and mail).
In field surveys, which are also referred to as face-to-face or personal-visit surveys, an interviewer visits
the respondents home or office (or another location)
and conducts the interview. This entry outlines the
major advantages and disadvantages of field data collection and the variations that are found in modern
survey research and concludes with a brief overview
of the development of present-day field surveys.
Variations
Field surveys can be implemented in a number of
ways and can be used to collect a wide range of data.
It is common to record interviewer observations on
characteristics of the neighborhood and housing unit.
In surveys that ask for sensitive information such
as drug use or sexual behavior, some questions may
be self-administered; that is, respondents read and
answer the questions on their own either during or
after the interview. For example, the National Survey
on Drug Use and Health, a large annual field survey
of approximately 70,000 U.S. persons 12 years old
and older, which is sponsored by the U.S. Substance
Abuse and Mental Health Services Administration
and conducted by RTI International, uses ACASI
(audio computer-assisted self-interviewing) in which
respondents listen to questions using earphones and
enter their responses on a laptop computer.
Field survey protocols may include the administration of tests of physical performance (e.g., walking
speed, grip strength) or cognitive ability (e.g., memory tasks, word recognition) or the recording of physical measurements (e.g., height, blood pressure).
Biological specimens such as blood or saliva or environmental specimens such as soil or dust may be
taken as part of the in-person visit, as is done, for
example, in the National Health and Nutrition Examination Survey. In mixed-mode studies, sample members may first be asked to complete the survey using
Development of Present-Day Field Studies
The roots of modern field survey research can be
found in part in the studies of the poor carried out by
Charles Booth and colleagues in London during the
late 19th and early 20th centuries.
In the decades to follow, the use of field surveys
grew dramatically as there were attempts to systematically record and analyze sample survey data on a variety of phenomena, from consumer preferences to
unemployment. However, in the private and academic
sectors, field surveys would later be replaced by mail
and telephone surveys that were cheaper and that, based on methodological studies, were expected to yield similar data.
Today, most national field data collections are sponsored by the federal government.
Historically, field interviews were completed using
paper-and-pencil questionnaires, but by the end of the
20th century, most large field studies in the United
States had transitioned to computer-assisted personal
interviewing (CAPI) instruments that were administered using laptop computers. The first national household survey to use CAPI in the United States was the
1987 Nationwide Food Consumption Survey, conducted by National Analysts.
Ashley Bowers
See also Area Probability Sample; Audio Computer-Assisted
Self-Interviewing (ACASI); Computer-Assisted Personal
Interviewing (CAPI); Current Population Survey (CPS);
Face-to-Face Interviewing; Field Work; National Health
and Nutrition Examination Survey (NHANES)
FIELD WORK
Field work encompasses the tasks that field staff, such
as field interviewers or telephone interviewers, perform before or during the data collection field period
of a survey. Field work refers to both telephone and
in-person studies. For telephone interviewing, field
work is usually restricted to the field period. For inperson studies, field work may take place before, during, and after the field period for the study.
The field work for telephone studies usually involves
working at a telephone survey center with computer-assisted telephone interviewing (CATI) stations. In
RDD (random-digit dialing) surveys, the computer at
the CATI station dials randomly generated telephone
numbers that are programmed into the CATI system.
Given that the telephone numbers are randomly generated, part of the telephone interviewers field work is to
screen the telephone numbers to determine whether
they are eligible for the study sample. For instance, if
a study is sampling members of households, the interviewer has to determine whether the phone number
dialed reaches a household or a business or is a nonworking number. Usually, the interviewer samples eligible
respondent(s) at the household by asking the person
who answers the phone questions designed to randomly
select a person or persons in the household. The telephone interviewer then administers the CATI questionnaire to the sampled respondent over the telephone or
makes an appointment for a callback when the designated respondent will be available. In case of a refusal,
the telephone interviewer uses his or her knowledge
about the study and about refusal conversion skills to
convince the respondent to participate. If not successful,
the interviewer records the reason for the refusal in
detail so that the case can be contacted again for refusal
conversion.
The field work for in-person studies is more extensive than for telephone interviewing. Field work conducted in preparation for the field period for random
sampling studies can include listing of dwelling unit
addresses, which is performed by listers or enumerators. These field staff members work at the selected
geographical areas for the study identifying eligible
units and listing their addresses for sampling. For list
samples, field work may involve contacting institutions to obtain lists of employees, members, or clients
for sampling.
Field work for an in-person interviewer (or enumerator) during the study's field period may include
locating the sampled units on a map and planning the
most efficient way to travel to the area to conduct the
interviews. Once in the area, the field interviewer contacts the sampled unit to request participation in the
study. Field work may involve screening households
or businesses to identify eligible respondents using
a screening questionnaire. Once the eligible respondents are identified, the field interviewer administers
the main questionnaire to the sampled respondents
usually via computer-assisted in-person interviewing
(CAPI) but also sometimes via a paper questionnaire.
In some studies, field work also involves administering literacy assessments, collecting samples (e.g., hair,
urine), and taking other health measurements of
respondents, such as height and weight, in addition to
administering a CAPI questionnaire. Field interviewers also ensure that respondents fill out any
required study forms such as consent forms. Field
interviewers may also call respondents to schedule
appointments for additional interviews or assessments
as part of their field work. Field interviewers may
plan and implement refusal conversion strategies to
convert refusals incurred by the interviewer or transferred from another interviewer.
Other tasks that are part of field work for in-person
interviewers are recording the result of contacts on
a computer or case folder and submitting completed
work via mail and online data uploads. Field interviewers report to a field supervisor or to a home office.
FINAL DISPOSITIONS
Final dispositions (or final sample dispositions) are
a set of codes or categories used by survey researchers
to document the ultimate outcome of contact attempts
on individual cases in a survey sample. Assigned after
field work on a survey has been completed, final dispositions provide survey researchers with a terminal,
or ending, status of each unit or case within the sampling pool. Survey researchers use final sample dispositions for two reasons: (1) to calculate response rates
and (2) to help assess whether the sample might contain nonresponse error.
One important purpose of final dispositions is to
calculate survey response rates. It is common practice
for survey researchers to compute the response rates
at the end of a survey's field period. Response rates
are a common measure of survey quality, and typically it is assumed that the higher the response rate is,
the higher the quality of the survey data is. Because
the final dispositions categorize the outcome of each
case (or unit) in the sampling pool, final dispositions
make it possible for survey researchers to calculate
survey response rates.
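As a rough sketch of how final disposition counts feed a response rate, the function below implements a simplified AAPOR-style rate; the disposition counts and the eligibility share e are illustrative assumptions, not figures from any actual survey:

    def response_rate(completes, refusals, noncontacts, other, unknown, e=1.0):
        # Completed interviews divided by all known-eligible cases plus an
        # assumed-eligible share (e) of the cases of unknown eligibility.
        # e = 1.0 mirrors AAPOR RR1; an estimated e < 1 approximates RR3.
        eligible = completes + refusals + noncontacts + other + e * unknown
        return completes / eligible

    # Hypothetical final disposition counts for a telephone survey
    print(round(response_rate(600, 150, 100, 25, 125, e=0.8), 3))  # 0.615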
Ineligible cases include units sampled in error, nonresidential units, vacant households, household units with no eligible respondent, and situations
where quotas have been filled. In addition, for telephone household surveys, ineligible cases include fax
or data lines, nonworking numbers, or nonresidential
numbers.
Converting Temporary Dispositions to Final Dispositions
At the end of a survey field period, many cases will
already have reached a logical final disposition. These
cases include completed interviews, refusals, and ineligible numbers, among others. However, some cases
will not have reached a final disposition and will still
have a temporary disposition code. (Temporary disposition codes are used to record the outcomes of contact attempts when the contact has not resulted in
a final disposition.) Examples of temporary disposition codes include maximum call limit met, callback,
no callback by date of collection cut-off, ring-no-answer, busy, and appointments that were not kept by
the interviewer or the respondent.
Temporary disposition codes must be replaced with
final case dispositions before these cases can be
included in the calculation of response rates. For these
cases, researchers must assign final dispositions by
reviewing the pattern of disposition codes and call/
contact outcomes recorded for each individual case and
using this information to determine the final disposition
code that best describes the case. (Computer algorithms can be written to make most, if not all, of these
decisions.) In considering the proper final disposition
code to use, survey researchers must consider the best
information from all contact attempts. Because the
information across contact attempts might be contradictory, three factors merit special attention: (1) the case's situation on status day (usually the first day of the field
period or the first day that a case was contacted); (2)
the certainty of the information on case contact attempts
(information across contact attempts might be uncertain
and researchers in these cases most often should take
the conservative approach of assuming a case is eligible
or possibly eligible unless there is reliable information
to suggest otherwise); and (3) the hierarchy of disposition codes (disposition codes in which there was human
contact take precedence over others, and generally in
these cases, the last disposition in which there was
human contact will serve as the final disposition). For
example, if the last contact attempt with a sampled
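The hierarchy logic just described lends itself to such computer algorithms. The sketch below is only an illustration under assumed disposition labels; the precedence set and the history format are our own, not the published standard definitions:

    # Outcomes involving human contact outrank other outcomes.
    HUMAN_CONTACT = {"complete", "refusal", "appointment", "callback"}

    def final_disposition(attempt_history):
        # Prefer the last outcome that involved human contact; otherwise
        # fall back on the last recorded temporary disposition.
        contacts = [d for d in attempt_history if d in HUMAN_CONTACT]
        return contacts[-1] if contacts else attempt_history[-1]

    print(final_disposition(["ring-no-answer", "busy", "refusal",
                             "ring-no-answer"]))  # prints: refusal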
FINITE POPULATION
Most statistical theory is premised on an underlying
infinite population. By contrast, survey sampling theory and practice are built on a foundation of sampling
from a finite population. This basic difference has
myriad ramifications, and it highlights why survey
sampling is often regarded as a separate branch of statistical thinking. On a philosophical level, finite population sampling brings statistical theory to a human, and thus necessarily finite, level.
Before describing the basic notion of finite population sampling, it is instructive to explore the analogies
and differences with sampling from infinite populations.
These analogies were first described in Jerzy Neyman's seminal articles in the 1930s and are discussed in basic sampling theory textbooks such as William Cochran's
in the 1970s. In the general framework of finite population sampling, we consider samples of size n from
a finite population of size N, that is, a population with
N elements or members.
The bridge of finite to infinite population sampling
is also seen in terms of a finite population correction
(fpc) that applies to the variances under most sampling
designs. Finite population sampling typically begins
with simple random sampling (SRS), the simplest form
of sampling design, which can be considered with
replacement or without replacement. For SRS designs,
the fpc may be expressed as 1 − n/N, or 1 − f, where f = n/N is the sampling fraction or the sampling rate.
Consider estimating a mean or total under simple random sampling (without replacement). A sample of n observations for a data element of interest, say, pairs of shoes sold, is randomly selected from the N members of the universe, say, of all shoe stores or dwellings, respectively, in a geographic region. (This can also be done by strata in stratified random sampling. Other strategies can be more complex. Also, this concept can be applied to ratios of totals, such as price per unit.) An estimated mean or total will be found by extrapolating from the sum of the n observations in the sample, Σ_{i=1}^{n} y_i, to the full universe of N elements, including the unobserved elements i = n + 1, ..., N.
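The surviving text does not show the estimators themselves; the display below reconstructs the standard Cochran-style SRS results the passage appears to reference, offered as a reconstruction rather than a quotation:

    \hat{Y} = \frac{N}{n} \sum_{i=1}^{n} y_i, \qquad
    \bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i, \qquad
    \widehat{\mathrm{Var}}(\bar{y}) = (1 - f)\,\frac{s^2}{n}, \quad f = \frac{n}{N},

where \hat{Y} estimates the population total, \bar{y} estimates the population mean, s^2 is the sample variance of the n observations, and 1 − f is the fpc discussed above.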
Official Disclaimer: This is not an endorsement by the U.S.
Department of Energy or the Energy Information
Administration.
See also Elements; Finite Population; n; N; Sample;
Sampling Error; Sampling Fraction; Sampling Variance;
Simple Random Sample; Stratified Sampling; Survey;
Universe; Weighting
FOCUS GROUP
A focus group is a qualitative research method in
which a trained moderator conducts a collective interview of typically six to eight participants from similar
backgrounds, similar demographic characteristics, or
both. Focus groups create open lines of communication across individuals and rely on the dynamic interaction between participants to yield data that would
be impossible to gather via other approaches, such
as one-on-one interviewing. When done well, focus
groups offer powerful insights into people's feelings
and thoughts and thus a more detailed, nuanced, and
richer understanding of their perspectives on ideas,
products, and policies.
This entry begins by describing the historical background of focus groups. The entry then discusses issues
that researchers might consider in choosing to use focus
groups, including their strengths and limitations.
Next, the entry describes the types of focus groups;
the steps taken to prepare for focus groups; and the
analysis of, and reports pertaining to, the data
Focus groups are often used, among other purposes, to follow up on previously collected data and to aid in the construction of future large-scale quantitative survey studies. In all of these instances, it is important to allow
data to emerge freely from participants and to listen for the deeper understanding of the range of ideas.
In other situations, focus groups are not an appropriate choice for researchers. Group interviews should
be avoided when participants are not comfortable with
each other or with the topic, when a project requires
rigorous statistical data, when consensus or emotionally charged information is desired, or when confidentiality is necessary. Additionally, focus groups should
not be used when the act of holding a group, and soliciting opinions and reactions on a potentially sensitive
issue, implies a commitment to a group of participants
that cannot be kept (i.e., those who use this method
have a special obligation to be sensitive to the suggestive force of this method as well as the communities with whom they work).
Relative to other qualitative methods, focus groups
most closely resemble open-ended interviewing and
participant observation. As in open-ended interviews,
focus group moderators approach groups with a protocol
of questions and encourage participants to focus on an
identified topic. Unlike open-ended interviews, however, focus group moderators can be flexible with how
the questions are asked and should use the conversation
(as opposed to the individual interview) as the unit of
analysis. Like participant observation, focus groups
afford the opportunity to observe interaction among
individuals and require that moderators surrender some
power, at least, to the group. Unlike participant observation, though, focus groups produce large amounts of
data on the researcher's specific interest in a short
period of time. Two criteria, then, help researchers discern if focus groups are a good methodological choice
for them relative to these closely aligned approaches:
Would the research project be better off with the additional individual-level data acquired from interviews
(than the group-level conversation data from focus
groups)? Would the research project be better off with
the contextual information afforded by naturally occurring events witnessed during participant observation
(than the focused, yet less naturalistic, data gathered
during a focused group conversation)?
The strengths of focus groups include how groups provide for exploration and discovery (to learn more about ideas or people who are poorly
understood), context and depth (to discover the background behind thoughts, experiences, and differences),
interpretation (to uncover how things are as they are
and how they got that way), and sharing and comparing
across participants (to offer and sharpen ideas and perspectives through the group process). In all of these
instances, researchers benefit from listening and learning from a conversation across individuals.
The limitations of focus groups are similar to those
of other qualitative methods and stem from the inherent
flexibility of the group interview format. Focus groups
have been critiqued for not yielding generalizable findings (as they typically employ small samples, often three or four focus groups, that rarely are selected using probability sampling techniques). Focus group procedures
can be viewed with suspicion, as questions are not
asked the same way each time with regard to ordering
or phrasing, and responses are not independent (and
thus the unit of analysis becomes the group). Focus
group data can be nettlesome, as the results are difficult
to quantify and conclusions depend on the interpretations of researchers.
Types
Many focus group experts acknowledge that these
group conversations can take several forms. Perhaps
the most common type is a full group, in which
a group of 6 to 10 participants (who are recruited
because they share at least one commonality of relevance to the researcher) are gathered together and led
by one moderator (possibly with the aide of a facilitator who helps with procedural aspects of the focus
group) for 90 to 120 minutes. Other types of groups
involve at least one derivation from this approach.
Two-way focus groups allow for one group to watch
another focus group and to discuss the observed interactions and conclusions.
Dual moderator focus groups feature two moderators in which one guides the conversation and another
makes sure that all desired topics are covered. Dueling
moderator focus groups, unlike dual moderator groups,
feature two moderators who intentionally take opposite sides on the issue under
discussion (and then watch the conversation that
emerges as a response from the group). Respondent
moderator focus groups invite one or more of the participants to act as the moderator on a temporary basis in
order to add another layer of perspective to the conversation. Client participant focus groups enable one or
more clients of the group to engage in the discussion,
either covertly or overtly, to add their desired perspective to the discussion. In addition to these takes on the
standard format, focus groups can also feature fewer
participants (mini-groups are composed of four or five
participants), teleconference focus groups encourage
interaction over a telephone or network, and online
focus groups rely on computers and Internet networks
to facilitate a conversation between participants.
Preparation Steps
Focus group preparation involves the following steps.
First, researchers must decide what kind of people
should be studied, how many groups should be conducted, what type of group plan should be adopted for
each group type (e.g., per group recruited on at least
one variable of interest to the researcher), and how
participants will be recruited or sampled. Although it
is rarely used, a probability sampling design can be
used to sample participants. Often recruitment is done
via telephone. It is recommended that at least three to
four groups per group type be conducted.
Deciding upon how large an incentive should be
offered is an important decision as offering too low
an incentive will increase recruitment costs (because
many people will refuse), possibly to the level where
it would have been cost-effective to start out with
a larger incentive in the first place. In deciding about
the amount of incentive, consideration should be
given to travel time and travel cost for the participants
to come to the focus group facility.
Second, researchers should decide on a moderator.
Moderators should not be of an age, ethnic background, or gender that might inhibit group members
from participating in the conversation; must be comfortable with the reality that participants will have
varying levels of comfort in speaking in front of the
group; and must be mindful of their nonverbal behaviors (so as not to affect the group conversation).
Third, researchers should decide upon the desired
level of structure for the group and on the scope of
the protocol (also called a questioning route, topic
guide, or discussion guide). Generally speaking, the
focus group protocol should feature 10 to 12 questions
for a 90-minute group.
Fourth, basic logistical issues of recruitment and
compensation of participants must be considered.
Participants should be selected on a variable of interest to the researchers, and efforts must be made to
ensure that these individuals possess the desired background knowledge or experience to yield valuable
data for the project (while also not having so much
experience that they will silence other members of the
group). Researchers should create careful screeners
that outline the desired characteristics of group members. Researchers can also attempt to overrecruit participants for each group and then, after the participants
have arrived to the location, selectively tell potentially
problematic group members that the group is overenrolled (and thank such members and send them home
with any promised compensation). While it might
seem wasteful to pay an individual for not participating in the group, it can be far more costly to keep that
individual in the group if there is a risk that he or she
will threaten the group dynamics. It also is a good
policy to invite one or two more people to participate
than may be needed because of no-shows.
Fifth, moderators should attend to the best practices
of facilitating the session. Sessions should be held
around a round (or oval or rectangular) table. The moderator should be in a position to see all participants to
help control the flow and content of the conversation,
and if the session is being video recorded, the recording
device should be behind the moderator. Name cards
(with first names only) can be placed around the table
to assign the participants to specific places and to facilitate the recognition of names through the conversation
and during potential transcription.
Sixth, focus group data can be obtained in a variety
of ways. Full transcription is the most costly but the
most accurate means of generating a record of the
group conversation and lends itself to myriad ways of
content analyzing it. Other options include tape-based
coding (in which researchers take notes from audio- or
videotapes searching for pre-established themes); note-based coding (in which researchers rely on their field
notes; in such instances the same researcher and moderator should be employed to ensure consistency across
the field notes); and memory-based coding (recommended only for experienced moderators who have
a strong sense of what they are looking for in group
conversations).
Ethical Considerations
There are several ethical considerations with focus
groups. One consideration involves judging if participants are at risk. Researchers can protect participants
by providing them with a statement of informed consent (e.g., clarifying that participants are over 18 years
of age and aware that they are participating in a study).
Another ethical risk involves attending to basic privacy issues. Researchers can protect the privacy of
their participants by restricting access to information
that reveals their identities, for example, protecting
identifying information, referring to participants only
by their first names or pseudonyms, protecting access
to the transcripts and tapes of the focus groups,
removing or modifying identifying information on
transcripts, protecting them against the sponsor of the
group, and encouraging the moderator to remind participants not to overdisclose during group discussions. Yet another risk lies in the discussion of
potentially stressful topics. Researchers can protect
participants against stress by emphasizing that participation is voluntary and by setting boundaries for the group discussion.
FORCED CHOICE
Forced choice refers to a specific format for response
options in survey questionnaires. In a forced choice
format, respondents are not given a specific option
to reflect a nonresponse type choice, such as "no opinion," "don't know," "not sure," or "not applicable." Respondents must select a response choice that
provides a specific answer to the survey item.
The elimination of item nonresponse choices in
the forced choice format increases the number of survey records with responses that are usable for analysis. Survey designers use the forced choice format to
encourage respondents to provide an actual response.
The forced choice format is common in key survey
questions, especially qualifier (screener) questions.
For example, question items about household income
and number of household members might use forced
choice response formats in a survey of households
below the poverty level so as to make certain that
everyone provides an answer to allow the researchers
to determine whether a given respondent is eligible or
ineligible for the survey.
Interviewer-administered surveys sometimes use
a more flexible version of the forced choice format
where the item nonresponse choices are available for
the interviewer to see, and thus to code, but are not read aloud to the respondent.
FRAME
A frame is used to identify elements in the population.
Elements are the fundamental unit of observation in the
survey. A frame may look very different depending on
how the population of interest is defined and how its
elements are defined. A well-defined appropriate frame
is essential to the sampling process, the development of
weights for use in analyses of survey data, the minimization of coverage error, and the understanding of what
coverage error may exist. This entry describes the basic
concept of a frame; the impact it has on sampling,
weighting, and coverage; and how it is developed in
relation to the survey population and the survey sample.
It also discusses several commonly used frames and
their specific issues.
A major goal of most surveys is to describe a specific
population. For example, the U.S. government conducts
two surveys specifically to estimate the rate of unemployment in the country each month: the Current
Employment Statistics program (a survey of business
establishments) and the Current Population Survey
(CPS; a survey of people). Each month, the U.S. Census Bureau interviews a sample of people for the CPS.
However, selecting that sample is difficult as there is no
accurate, up-to-date list of people in the United States
with contact information. Without such materials, it is
difficult to draw a sample. But the U.S. Census Bureau
can construct a frame of housing units in the country
using various sources (the decennial census, building permits, etc.). Therefore, the U.S. Census Bureau
defines the survey population as people living in housing units. This revised definition of the survey population is important because it allows for a better frame to
be constructed. Of course, a disadvantage is that the
homeless are not included, but this is judged to be
acceptable to meet the goal of this survey.
Among the domains of research that use statistical
techniques, survey research is unique in assigning so
much importance to the source of sample units.
Whereas most statisticians view sample units as a way
to describe a process of interest, survey statisticians
view sample units as a way to describe a population of
interest. Other statisticians would only be interested in the underlying process.
FREQUENCY DISTRIBUTION
A frequency distribution is a tabular representation of
a survey data set used to organize and summarize the
data. Specifically, it is a list of either qualitative or
quantitative values that a variable takes in a data set
and the associated number of times each value occurs
(frequencies).
The frequency distribution is the basic building
block of statistical analytical methods and the first
step in analyzing survey data. It helps researchers
(a) organize and summarize the survey data in a tabular format.
Thus, the simple frequency distribution of the listing of the 25 students' math exam scores will look
like Table 1. (Of note, this table does not contain any
percentages, which could be added to the table and
are what is called relative frequency.)
In some situations, this simple frequency distribution tabulation is impractical, even impossible, or
simply not needed by the researcher, for instance,
when the variable under consideration has continuous
values with decimal points (e.g., 88.5, 75.6, 94.4)
instead of discrete values (e.g., 88, 75) or when the
number of possible data points (values) is too large to
construct such a simple frequency distribution.
Table 1  Simple frequency distribution of the 25 math exam scores

    Score    Tally    Frequency    Cumulative Frequency
    65       //       2            2
    69       /        1            3
    75       ///      3            6
    80       ///      3            9
    83       ///      3            12
    85       ///      3            15
    89       ///      3            18
    95       ///      3            21
    98       ///      3            24
    100      /        1            25
    Total             25
Table 2  Grouped frequency distribution of the 25 math exam scores

    Score Interval    Tally         Frequency    Cumulative Frequency
    65-71             ///           3            3
    72-78             ///           3            6
    79-85             /////////     9            15
    86-92             ///           3            18
    93-100            ///////       7            25
    Total                           25
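Both tables can be produced mechanically. The sketch below recomputes the simple and cumulative frequencies of Table 1; the score list is reconstructed from the table and is illustrative only:

    from collections import Counter
    from itertools import accumulate

    scores = [65, 65, 69, 75, 75, 75, 80, 80, 80, 83, 83, 83, 85, 85, 85,
              89, 89, 89, 95, 95, 95, 98, 98, 98, 100]

    freq = Counter(scores)                 # score -> frequency
    values = sorted(freq)
    cum = dict(zip(values, accumulate(freq[v] for v in values)))

    for v in values:
        print(f"{v:>5} {freq[v]:>9} {cum[v]:>10}")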
FRUGING
When a survey is not conducted to gather valid information but instead to stimulate fund-raising for a cause or organization, the practice is known as FRUGing (fund-raising under the guise of research). The term calls attention both to the deception involved and to the research consequences of such an action. Furthermore, in addition to being an unethical practice, FRUGing telephone calls are also illegal in the United States under the Federal Trade Commission's 2003 Telemarketing Sales Rule.
In Canada, FRUGing is known as SUGing (soliciting under the guise of research), leading to confusion
in the United States, the United Kingdom, and continental Europe, where SUGing is defined as selling
under the guise of research.
Geoffrey R. Urland and Kevin B. Raines
See also Nonresponse; SUGing; Survey Ethics;
Telemarketing
F-TEST
An F-test is any statistical hypothesis test whose test
statistic assumes an F probability distribution. The
F-test is frequently associated with analysis of variance (ANOVA) and is most commonly used to test
the null hypothesis that the means of normally distributed groups are equal, although it can be used to test
a variety of different hypotheses. The F-test was
devised as an extension to the t-test: F is equal to the
squared value of t (t² = F). Although the F-test produces the same information as the t-test when testing
one independent variable with a nondirectional
hypothesis, the F-test has a distinct advantage over
the t-test because multiple independent groups can
easily be compared. Survey researchers often use the
F-test because of its flexibility to compare multiple
groups and to identify whether the relationship they
are studying among a set or combination of independent variables has occurred by chance.
For example, if a survey researcher hypothesizes that
confidence in government varies between two groups of
persons with different levels of education (e.g., those
with a college degree and those without a college
degree), a t-test and an F-test would produce the same
results. More often, one is interested in comparing multiple or subsets of independent variables. The F-test
gives researchers the ability to examine the independent
(main) effects of education and the combined (main)
effects of a set of socioeconomic status (SES) variables
(e.g., education, income, and occupation) as well as the
potential effects of the interaction among these variables on confidence in government.
F-tests are also often used to test the effects of subsets of independent variables when comparing nested
regression models. For instance, the researcher could
compare the F-tests from a model with only the SES
variables, a model with a set of variables measuring
satisfaction with government services (e.g., police,
fire, water, and recreation), and an overall model with
both sets of variables to determine whether, as
a group, the SES and government services variables
make a statistically significant contribution to explaining differences in confidence in government.
The F-test compares the observed value to the
critical value of F. If the observed value of F (which
is derived by dividing the mean squared regression
by the mean squared error) is larger than the critical
value of F (obtained using the F-distribution table),
then the relationship is deemed statistically significant and the null hypothesis is rejected. There are
two types of degrees of freedom associated with the
F-test: The first is derived by subtracting 1 from
the number of independent variables and the second
by subtracting the number of independent variables
from the total number of cases. In output tables from
statistical software packages, such as SPSS, SAS, or
STATA, the F value is listed with the degrees of
freedom and a p-value. If the p-value is less than the
alpha value chosen (e.g., p < .05), then the relationship is statistically significant and the null hypothesis
is rejected. It is important to note that the F-test is
sensitive to non-normality when testing for equality
of variances and thus may be unreliable if the data
depart from the normal distribution.
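As one illustration, the group-comparison F-test described above can be computed with scipy.stats.f_oneway; the three education groups and their confidence scores below are fabricated for the example:

    from scipy import stats

    # Hypothetical confidence-in-government scores for three education groups
    no_degree = [3.1, 2.8, 3.5, 3.0, 2.6, 3.3]
    college   = [3.9, 4.1, 3.6, 4.0, 3.8, 4.2]
    postgrad  = [4.0, 4.4, 4.1, 3.9, 4.3, 4.5]

    f_value, p_value = stats.f_oneway(no_degree, college, postgrad)
    # For a one-way ANOVA: df1 = groups - 1 = 2, df2 = cases - groups = 15
    print(f"F(2, 15) = {f_value:.2f}, p = {p_value:.4f}")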
Kelly N. Foster and Leah Melani Christian
G
GALLUP, GEORGE

George Gallup's career was intertwined with elections. This started when he was a student and was
enlisted to help his mother-in-law run for statewide
office in Iowa. She won her first election by the narrowest of margins, but Gallup began to survey her
constituents and used the resulting information to help
her build increasing electoral margins. He was known
as a person of high ethical standards, and as his political polling work expanded and became more public,
he stopped voting in presidential elections so his published polls would be free of any allegations of personal preference or bias.
During this period, he formed the American
Institute of Public Opinion, from which he began to
conduct national surveys of public opinion and produce
a newspaper column. He also founded the Audience
Research Institute where he did work on the response
of film audiences to new releases, including the development of an innovative method to measure consumer
reactions to films and new products at the Mirror of
America, a converted theater in Hopewell, New Jersey.
He became a vice president of Young & Rubicam
and served in that capacity until 1947 when he turned
full time to the other businesses he had developed. He
managed three firms in Princeton, New Jersey. The
American Institute of Public Opinion conducted the
Gallup Poll and produced three syndicated newspaper
articles a week from it. A second firm, Gallup and
Robinson, conducted market research for a number of
clients. In addition, a third firm, the Gallup Organization,
conducted special surveys tailored to the interest and
needs of individual clients.
Gallup catapulted to fame in 1936 because of his
belief that he could apply face-to-face interviewing
GALLUP POLL
The Gallup Poll is the longest continuous measure of
public opinion in the United States, having been conducted for more than 70 years, and is the most widely
recognized brand name in the field of survey research.
On Sunday, October 20, 1935, George Gallup
officially launched his scientific polling operation
nationwide with America Speaks: The National
Weekly Poll of Public Opinion. About three dozen
newspapers carried his first release, including The
Washington Post, whose editor heralded the event by
hiring a blimp to pull a streamer over the city to
announce the new column. Gallup called his operation
the American Institute of Public Opinion, which he
located in Princeton, New Jersey, where he also lived.
To attract subscribers, he made a money-back guarantee that his poll-based prediction of the 1936 presidential election would be more accurate than that of
The Literary Digest, which had correctly predicted
Herbert Hoover's win in the 1928 election within less
than one percentage point of the election outcome.
Gallup made good on his promise, predicting Franklin
Delano Roosevelt would beat Alf Landon, while the
Digests final poll predicted a Landon landslide.
Gallup kept the name American Institute of Public
Opinion for more than 20 years, but within a very
short time, his poll was known simply as the Gallup
Poll. He, too, used that name, giving souvenir cards
to cooperative respondents with the announcement,
"You have been interviewed for THE GALLUP POLL: The American Institute of Public Opinion."
The Gallup Poll increased its newspaper subscribers substantially over the years, though it suffered
The Gallup International Association remains a viable organization. A Gallup Poll in the United States clearly refers
to the original operation founded by George Gallup.
But reports from Gallup International, and even
from a Gallup Poll in some countries, are not necessarily from the U.S.-based Gallup Organization.
David W. Moore
See also Gallup, George; Poll; Pollster; Public Opinion;
Public Opinion Research
GATEKEEPER
A gatekeeper is a person who stands between the data
collector and a potential respondent. Gatekeepers, by
virtue of their personal or work relationship to
a respondent, are able to control who has access, and
when, to the respondent. Furthermore, they may be
encountered on both field (in-person) and telephone
data collection surveys. They may also be encountered in mail surveys in which a respondent's material
must be sent to, or in care of, another individual for
distribution to the respondent (e.g., sending materials
to a parent for distribution to a respondent away at
college or in the military, or sending materials to an
employer for distribution to sampled employees).
Gatekeepers can take many forms, including guards
or doormen at secured residential or business complexes; and secretaries, administrative assistants, or office staff.
GENERAL SOCIAL SURVEY (GSS)

This is accomplished by the regular collection and distribution of the NORC General Social Survey (GSS) and
its allied surveys in the International Social Survey
Programme (ISSP).
Origins
Two social science movements in the 1960s spawned
the GSS. First, the social indicators movement stressed
the importance of measuring trends and of adding noneconomic measures to the large repertoire of national
accounts indices. Second, scholarly egalitarianism was
advocating that data be made available to scientists at
all universities and not restricted to elite senior investigators at large research centers. In 1971, these ideas
were presented together in a modest proposal to the
National Science Foundation (NSF) for twenty-some
questions that called for the periodic asking of items
on national samples with these data immediately distributed to the social science community for analysis
and teaching. Approval from NSF plus supplemental
funding from the Russell Sage Foundation spawned the
first GSS in 1972.
Growth
From 1972 to 2004, the GSS conducted 25 independent, cross-sectional, in-person surveys of adults
living in households in the United States, and in
1982 and 1987, it carried out oversamples of African
Americans. There are a total of 46,510 respondents.
During most years until 1994 there were annual surveys of about 1,500 respondents. Currently about
3,000 cases are collected in a biennial GSS.
Additionally, since 1982 the GSS has expanded
internationally. The cross-national research started as
a bilateral collaboration between the GSS and the
Allgemeine Bevölkerungsumfrage der Sozialwissenschaften (ALLBUS) of the Zentrum für Umfragen, Methoden und Analysen in Germany in 1982 and
1984. In 1984, they joined with the British Social
Attitudes Survey of the National Centre for Social
Research and the National Social Science Survey at
Australian National University to form the ISSP.
Along with institutes in Italy and Austria, the founding
four fielded the first ISSP in 1985. ISSP surveys have
been collected annually since that time, and there are
now 41 member countries (the founding four plus
Austria, Belgium, Brazil, Bulgaria, Canada, Chile,
the Czech Republic, Cyprus, Denmark, the Dominican Republic, and others).
Content
The GSS lives up to its title as "General." The 4,624 variables in the 1972–2004 cumulative data set run
from ABANY (legal abortion if a woman wants one
for any reason) to ZOMBIES (behavioral medication
for children) and have core batteries on such topics as
civil liberties, confidence in institutions, crime/violence,
gender roles, government spending, intergroup relations,
psychological well-being, religion, and work.
The balance of components has changed over time,
but currently half of the GSS is replicating core topics,
one sixth deals with cross-national topics, and one third
consists of in-depth, topical modules. Recent ISSP
modules include the environment, gender and work,
national identity, and the role of government. Recent
topical modules include work organizations, multiculturalism, emotions, gender, mental health, giving/
volunteering, altruism, Internet, and genetics. The data
sets are available on the GSS Web site.
Research Opportunities
Several important types of research are facilitated by
the GSS design. First, the replication of items allows
the study of societal change. Moreover, because all
surveys and all variables are organized in one cumulative file, researchers do not have to patch together
time series from different and often incompatible data
sets. By just running the data by YEAR, more than
1,600 trends can be tracked.
Second, replication also means that subgroups can
be pooled across surveys to aggregate an adequate
sample for analysis. For example, Blacks at about
12% of the population account for about 175 respondents in a 1,500-case sample, too few for detailed analysis. But in the 1972–2004 GSSs there are 6,399 Blacks, more than enough for analysis.
Third, researchers can both track trends and pool
cases. For example, Blacks from the 1970s, 1980s,
1990s, and 2000s can be combined to have four time
points and still have between 1,216 and 2,208 Blacks
in each subsample.
GEOGRAPHIC SCREENING
Most surveys target a specific geopolitical area, so that
estimates produced from their data can be representative of that area. For some surveys, the area consists of
an entire nation, but other surveys aim to produce
regional estimates (such as those for states, counties, or
zip codes). Thus, such surveys require some sort of
geographic screening, or determination that a sampled
case falls within the target geography, to establish
study eligibility. If the screening is inherent in the sampling design itself, no further information is required.
Other studies require additional screening steps, either
prior to sample release or during the field period.
Decisions about the level of geographic screening for
a study arise from the sampling frame to be used.
When the sampling frame for a desired geographic
area can be tied clearly to that area, no screening is
needed beyond the design of the sample itself. For
example, the sampling frame for a mail-based survey
is composed of addresses that are known to be within
a specific geographic area. Thus, geographic screening is part of the sampling design itself. Similarly, the
sampling frame for an area probability sample is, by
definition, geopolitically based, and therefore, no
additional geographic screening is needed.
Telephone surveys typically use sampling frames
that are defined by areas such as the nation as a whole,
states, counties, cities, Census tracts, or zip codes.
Samples of telephone numbers are generated by linking
telephone exchanges to the desired target geography.
In random-digit dialing (RDD) surveys of relatively
small areas, it is impossible to match telephone numbers exactly to the boundaries of the target area.
Researchers must determine whether the level of agreement between sampled telephone exchanges and the
geography of interest is sufficient for their purposes or
whether further questioning of the respondents to establish their location is warranted. This questioning can be
complex and difficult to operationalize, thus leading to
errors of omission and commission in which some eligible people are incorrectly screened out and some
ineligible people are incorrectly screened in.
GESTALT PSYCHOLOGY
Often summarized by the phrase "The whole is greater than the sum of its parts," Gestalt psychology refers to
an approach to understanding everyday human experiences as a whole rather than breaking them down into
a collection of individual stimuli, behaviors, or both.
This approach recognizes the ability of the human brain to piece together separate stimuli, in context with one another and their surroundings, so that the overall impression of an object, event, or other stimulus conveys more information to the observer than the individual component stimuli provide. In other words, the individual may actually experience something that is not present in the
stimuli themselves. A common example of this is
watching a motion picture at a theater. Motion pictures
(on film) actually consist of a series of still shots presented in rapid succession to give the impression of
movement. Any one frame of the movie alone is simply
a still photograph. When presented in rapid succession,
however, the brain is able to fill in the gaps so that the
individual has the experience of fluid motion.
This ability of the human brain, referred to as the phi phenomenon, was used by Max Wertheimer to demonstrate the value of a holistic approach to studying psychology. Since that time, many other principles of Gestalt psychology have been identified.
These include emergence, reification, multi-stability,
and invariance. Emergence occurs whenever there is
confusion between figure and ground in an image.
The figure of an image refers to the subject or object,
whereas the ground refers to the setting or background. The classic example of emergence in psychology texts is a black and white picture that
initially appears to be random splotches of black ink
(figure) on a white paper (ground). When the individual trains his or her eye on the white portion of the
picture as the figure instead of the ground, a picture
of a spotted Dalmatian dog appears.
Reification is similar to emergence in that the phenomenon is based on the visual relationship between
figure and ground. Reification, however, is more often
associated with the arrangement of geometric shapes,
whereby the relationship of the shapes (figures) on the ground begins to form a shape out of the ground. Hence, the ground becomes the figure. Multi-stability refers to
the tendency for an ambiguous figure to be interpreted
as two or more different figures such that the brain cannot decide which figure is correct. This phenomenon
can be isolated to the figure itself (e.g., Necker's cube), as well as a product of figure/ground confusion (e.g., Rubin's figure/vase illusion). Finally, the principle of invariance refers to the brain's ability to recognize
GRAPHICAL LANGUAGE
Respondents interpret the meaning of survey questions
from both the verbal and graphical language used in
the questionnaire. Graphical language includes various
elements such as contours and lines, images, numbers,
and symbols and their attributes such as movement,
spatial location, color or contrast, and size. These
graphical elements influence how respondents perceive
survey information and therefore significantly impact
the survey response process. Graphical language can
convey meaning independently, or it can influence or
modify how written text is perceived. Thus, it can be
compared to paralanguage that is conveyed aurally through a speaker's voice (e.g., inflection, tone) and to
nonverbal communication in face-to-face interactions
(e.g., gaze, facial expressions, body language, and gestures). Because paper and Web surveys transmit information visually, survey designers can strategically use
graphical language to convey information and meaning
to respondents. However, graphical language may also
confuse survey respondents when used in competing
ways, carelessly, or inconsistently.
Graphical language acts like a visual paralanguage
to emphasize or draw attention to information in a survey, create groupings and subgroupings of information,
and improve navigation through the survey. Graphical
attributes such as size, contrast, color, layout, and position can influence the meaning assigned to written text
in many ways. For example, in Figure 1, the larger size and use of reverse print for the question number and the "Next Page" button and the underlining of the word satisfaction in the question stem help draw respondents' attention to this information. In addition, locating the question number "1" in the upper left of the screen helps convey to respondents that the number one means this is where they should begin.
Furthermore, graphical language can encourage
respondents to perceive information as belonging
together in a group and therefore as related conceptually.
The Gestalt principles of proximity, similarity, connectedness, and common region indicate that information is grouped visually when items are located
near each other, share similar graphical attributes
(shape, size, color/contrast, etc.), and are connected
or enclosed within a common region such as a square.
For example, in Figure 1, using similar size, font, and reverse print for the question number and "Next Page" button encourages respondents to group them
visually and then conceptually as tools to aid in navigating through the survey. In addition, using a larger
size for the question stem but similar font size for
each item helps respondents perceive the subgroups
within the question group (i.e., response items separate from question stem). Grouping is also established by the gray lines in Figure 1 that connect the
text of each item to the appropriate answer spaces
and by positioning the radio buttons in closer proximity horizontally than vertically.
In addition to acting like a visual paralanguage,
graphical elements such as symbols, logos, pictures,
and other images can independently influence the tone
of printed survey contacts (letters or emails), instructions to respondents, individual questions, and response
categories. Appropriate logos and images on contact
letters and survey instruments can increase respondent
motivation and commitment to completing the survey.
Moreover, pictures and other images can be used to
convey information or enhance the meaning of written
text in much the same way that facial expressions, body
language, and gestures do in face-to-face communication. For example, in Figure 1 the combination of facial expression images and numbers is used to convey the meaning of each scale point.
Since research on Web surveys has shown that pictures and other graphical images can modify the meaning respondents assign to particular questions and
concepts, images must be chosen carefully to avoid
negative impacts on measurement, such as when the
inclusion of pictures of an endangered species artificially increased respondent support for that species.
Moreover, research has shown that including sizable
graphical elements in Web surveys can slow page
download times, thus increasing respondent burden and
sometimes nonresponse. The increased use of Web surveys has heightened the attention given to graphical language in survey questionnaire design because graphical
language is easy and inexpensive to include and modify
in Web surveys (i.e., no printing costs). In addition to
increasing the need for research into the effects of
Figure 1   (not reproduced)
GUTTMAN SCALE
Given a data set of a sample of N persons and of a selection of n survey questions (variables) designed for measuring a particular trait (such as people's position on a political issue or their ability in a specific field of human activity), a Guttman Scale is the hypothesis that the data set would have a cumulative structure, in the following sense: For any two persons in the observed sample, one of them would exhibit all the manifestations of the trait that the other person would, and possibly additional ones. That is, there would be no two persons in the sample such that each exhibits a manifestation of the trait that the other does not.
Table 1   Profile based on observed responses and score x assigned to the profile (profiles shown: 1111, 2111, 3111, 3211, 3221, 3321, 3322)

Example

Consider, for example, public attitudes toward intervention in foreign countries as the trait in question. Presenting an appropriate sample of the adult population with the following questions can serve in measuring this attitude:
appliances, and more. Different versions of a coefficient of reproducibility have been proposed for assessing the degree of fit of data to a Guttman Scale. It is
important to realize that deviations from a Guttman
Scale can be of two kinds: (1) random deviations,
suggesting the existence of a Guttman Scale with
noise, and (2) structured deviations, suggesting that
two or more scales are needed to measure the studied
trait meaningfully. Developing the procedures of multiple scaling, using partial order scalogram analysis by base coordinates (POSAC), Samuel Shye has generalized the Guttman Scale to dimensionalities higher than one.
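One widely cited version of the coefficient of reproducibility for dichotomous items counts the cells that deviate from the ideal cumulative pattern implied by each respondent's total score. A minimal sketch under that counting rule (the rule and data are illustrative; other versions of the coefficient exist, as noted above):

```python
import numpy as np

def coefficient_of_reproducibility(X):
    """Reproducibility for an N x n matrix of 0/1 item responses."""
    X = np.asarray(X)
    N, n = X.shape
    X = X[:, np.argsort(-X.sum(axis=0))]      # most-endorsed item first
    scores = X.sum(axis=1)
    # Ideal cumulative pattern for score s: 1s on the s easiest items.
    ideal = (np.arange(n)[None, :] < scores[:, None]).astype(int)
    errors = (X != ideal).sum()
    return 1.0 - errors / (N * n)

# A perfectly cumulative (Guttman) data set gives a coefficient of 1.0.
X = [[1, 0, 0], [1, 1, 0], [1, 1, 1], [0, 0, 0]]
print(coefficient_of_reproducibility(X))      # 1.0
```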
Misconceptions
Several misconceptions have accompanied the notion
of the Guttman Scale throughout the years:
1. Some have sought to construct a Guttman
Scale by eliminating variables (or respondents) from
their data. Assuming one has had a rationale for selecting variables as representing a concept (and for defining
the sampled population), such eliminations in a particular application may be questionable and should be
avoided, except possibly in the context of the larger
cycle of scientific investigation where concepts are
reshaped and redefined. As noted, a Guttman Scale is
essentially a hypothesis, which may or may not be supported by data.
2. Many believe that a Guttman Scale necessarily
involves only dichotomous variables. Indeed, most
illustrations and many applications in the literature are
with such variables. As the example presented earlier
shows, this need not be the case. However, when
a Guttman Scale is found in dichotomous variables, the
variables are naturally ordered according to their sensitivity in detecting the presence of the measured trait.
H
HAGAN AND COLLIER SELECTION METHOD

Hagan and Collier reported favorable results in a sample compared with one that selected respondents by the T-C-B procedure. Demographic characteristics were similar, and the refusal rate at respondent selection was almost 5% less than with the T-C-B method. Both
methods have a small within-unit coverage bias
because adults in households of more than two adults
of the same sex whose ages are between the oldest
and youngest adults have no chance of selection.
Also, in three-adult households, one of the three
adults would have the chance of being designated the
respondent twice. Troldahl and Carter considered
these violations of random sampling to be very small.
Research using census data has shown that the bias
caused by the Hagan-Collier method is very slight.
An example of Hagan-Collier question wording is "May I please speak to the youngest man?" Another example is "For this survey, I need to speak with the youngest adult male in your household over the age of 17, if there is one." If there is none, the following question is asked: "Then may I please speak with the youngest adult female?" Wording should include the fact that
the designated respondent is not the one who happens
to be at home at the time but, instead, is the one who
lives in the household. Interviewers need training in
awareness that a woman in a one-person household fits
as either the youngest woman or the oldest woman, that "youngest man" can apply to an elderly male, and that
informants can be confused and think the interviewer is
asking for an old man (or a young woman), among
other things.
Cecilie Gaziano
HALF-OPEN INTERVAL
The half-open interval is a linking procedure that is
used in some surveys to address issues of noncoverage.
Sampling frames or lists are not perfect, and survey
researchers often use frames with problems such as
missing elementary units, blanks for relevant information, clusters of elementary units, and duplicate listings.
Of these problems, sample frames that are missing elementary units, known as noncoverage, frequently present important practical problems. For example,
housing lists incorporating addresses are often used in
household surveys. The housing list is often out of
date, and when an interviewer visits the housing unit
selected from the list, there can be newly constructed
housing units that were not on the original list used for
sampling. When there is noncoverage of the target
population due to an imperfect sample frame, specific
remedies are required to improve the frame coverage.
To account for the units missing from a frame,
researchers may use a linking procedure. This is a useful
device in many situations where the missing units are
scattered individually or in small clusters. The linking procedure is often called the half-open interval: each sampled listing is linked to the interval that begins at that listing and ends just before the next listing on the frame, so any unlisted units the interviewer finds within this interval are attached to the sampled listing and thereby receive a chance of selection.
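A minimal sketch of the linking rule, using simplified address lists (a real implementation works from the frame's sort order and the interviewer's field listing):

```python
def half_open_interval(frame, field_units, sampled):
    """Return the sampled listing plus any unlisted units found in the
    half-open interval [sampled listing, next listing on the frame)."""
    i = frame.index(sampled)
    nxt = frame[i + 1] if i + 1 < len(frame) else None
    start = field_units.index(sampled)
    end = field_units.index(nxt) if nxt in field_units else len(field_units)
    return field_units[start:end]   # includes sampled, excludes next listing

frame = ["101 Main", "105 Main", "109 Main"]            # out-of-date frame
field = ["101 Main", "103 Main (new)", "105 Main", "109 Main"]
print(half_open_interval(frame, field, "101 Main"))
# ['101 Main', '103 Main (new)']: the new unit is covered via linking
```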
HANG-UP DURING
INTRODUCTION (HUDI)
A telephone interview that is terminated by the
respondent during the introduction of the interview
shortly after an interviewer has made contact is called
a hang-up during introduction (HUDI). HUDI is a form of refusal of the survey request that is growing in frequency and provides little or no opportunity for the interviewer to overcome the respondent's objection.
The most difficult time to assure success in a telephone interview is during the first few seconds of the
call. During this time, the interviewer has to identify
the purpose and legitimacy of the call. In the past two
decades, there has been an increasing tendency for
respondents to hang up on the interviewer during this
time without completing a full interaction with the
interviewer. In contrast, in the 1970s and 1980s, when telephone surveys were first gaining their legitimacy as a valid survey method of the public, a social norm kept most people from abruptly hanging up on a stranger (the interviewer who called them). However, with the problems caused by excessive telemarketing in the 1990s and with busy lifestyles, people are now far less reluctant to just hang up.
Early work in the late 1980s found that 40% of refusals occur in the first two sentences of the introduction.
Similarly, more recent research has found that HUDIs
last an average of 15 seconds. A study in 2003 found
that one in four HUDIs occur without the respondent
saying anything at all to the interviewer, and a 2005
study found two fifths of respondents hanging up on the
interviewer without speaking. Respondents may give brief and abrupt objections, most frequently an indication of "not interested" or "don't have time," and then abruptly hang up.
Urbanicity has been found to be negatively associated with response rate. This finding is reflected in the
incidence of HUDIs by metropolitan area size. A 2005
study found a 6 percentage point gap in the occurrence
of HUDIs in the 10 largest metropolitan areas compared
Further Readings

In D. W. Maynard, H. Houtkoop-Steenstra, N. C. Schaeffer, & J. van der Zouwen (Eds.), Standardization and tacit knowledge: Interaction and practice in the survey interview (pp. 161-178). New York: Wiley.
Lavrakas, P. J. (1993). Telephone survey methods: Sampling, selection, and supervision. Newbury Park, CA: Sage.
Mooney, C., & O'Hare, B. C. (2006, May). HUDIs: A look at quick refusals. Paper presented at the American Association for Public Opinion Research, Montreal, Quebec, Canada.
Oksenberg, L., & Cannell, C. F. (1988). Effects of interviewer vocal characteristics on nonresponse. In R. M. Groves, P. P. Biemer, L. E. Lyberg, J. T. Massey, W. L. Nicholls, & J. Waksberg (Eds.), Telephone survey methodology (pp. 257-272). New York: Wiley.
HANSEN, MORRIS
(1910-1990)
Morris Hansen was one of the most innovative and
influential statisticians of the 20th century. He helped
pioneer the work of sampling techniques and the use
of the total survey error perspective in designing surveys. He also developed quality control procedures
for surveys that helped legitimize the accuracy of survey research. He attended the University of Wyoming
(B.S.) and American University (M.S.) and had a long
career at the U.S. Census Bureau and Westat, until his
death in 1990.
Morris Hansen was born in Thermopolis, Wyoming,
in 1910 and spent his formative years in Worland,
Wyoming. He earned a B.S. in accounting from the
University of Wyoming in 1934. After graduation he
started his career at the U.S. Census Bureau in
Washington, D.C. Fascinated by statistics in college,
Morris started his formal training as a statistician when
he arrived in Washington, taking evening courses at the
Graduate School of the U.S. Department of Agriculture
and eventually earning a master's degree in statistics
from American University in 1940.
During his first years at the U.S. Census Bureau,
Morris started to establish himself as a highly skilled
statistician. At age 26, Morris worked on the sample
design for an unemployment survey (which would
later evolve into the Current Population Survey)
for the federal government, an innovative project
because, at the time, the government preferred using
census data for their studies. Hansen convinced them
HIT RATE
In general, hit rate is a ratio or a proportion, and the
term is used in many environments and disciplines
with specific definitions for both the denominator and
numerator. In the online world, it usually means the
number of hits a Web page receives during some
period of time. In marketing, it can mean the number
of sales achieved as a percentage of the number of
sales calls made. In survey research, hit rate is most
commonly used to refer to the proportion of telephone
numbers in a sample that are working residential
numbers. However, hit rate is sometimes used to
mean incidence. Both meanings of hit rate are essential components of sample size calculations.
In its common usage, hit rate is synonymous with
the terms working residential rate and working phone
rate. For a residential telephone sample, eligible units
would be those numbers that connect to a household,
while ineligible numbers would include nonworking
or disconnected numbers, data/fax lines or numbers
that connect to an ineligible unit such as a business.
For an in-person survey it might mean the proportion
of occupied housing units in the frame, and for a mail
survey it could mean the proportion of deliverable
mail pieces in the list.
Hit rate is also sometimes used as a surrogate for
incidence or the proportion of qualified contacts to all
contacts. For example, a survey might require screening
households for further eligibility, such as living within
a particular geography, having a certain income, or
belonging to a specific racial or ethnic group. In these
cases the hit rate would be the probability of finding
members of that target population among all contacts.
Understanding and being able to estimate these
two hit rates is integral to sample design. Most formulas for calculating the number of sample units needed
to complete a set number of interviews include both
of these definitions of hit rate (working phone rate
and incidence) in conjunction with estimates of contact rate and completion rate for the survey.
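A sketch of one such formula, treating the expected yield per sampled number as the product of the two hit rates with the contact and completion rates (the rates shown are illustrative):

```python
import math

def numbers_needed(target_completes, working_rate, incidence,
                   contact_rate, completion_rate):
    """Telephone numbers to draw so expected completes reach the target."""
    yield_per_number = (working_rate * incidence
                        * contact_rate * completion_rate)
    return math.ceil(target_completes / yield_per_number)

# 1,000 completes at a 55% working residential ("hit") rate, 80%
# incidence, 70% contact rate, and 65% completion rate:
print(numbers_needed(1000, 0.55, 0.80, 0.70, 0.65))   # 4996 numbers
```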
Linda Piekarski
See also Contact Rate; Eligibility; Nonresidential; Out of
Sample; Target Population
The Dominance of Horse Race Journalism
Over the past 40 years, the rise in horse race journalism has been called by Thomas Patterson the "quiet revolution" in U.S. election reporting. Patterson's now classic analysis finds that coverage focusing on the "game schema," which frames elections in terms of strategy and political success, rose from 45% of news stories sampled in 1960 to more than 80% of stories in 1982. In comparison, coverage focusing on the "policy schema," which frames elections in terms of policy and leadership, dropped from 50% of coverage in 1960 to just 10% of coverage analyzed in 1992.
Other analyses confirm the contemporary dominance of the horse race interpretation in election
Matthew Nisbet and Michael Huge argue that the strategy frame's preferred "he said, she said" style leads to a false balance in the treatment of technical issues where there is clear expert consensus. Polling experts
offer other reservations. For example, Frankovic and
others warn that overreliance on horse race journalism
and polling potentially undermines public trust in the
accuracy and validity of polling.
Matthew C. Nisbet
See also Bandwagon and Underdog Effects; Polls; Pollster;
Precision Journalism; Public Opinion; Public Opinion
Research; Tracking Polls; World Association for Public
Opinion Research (WAPOR)
HOT-DECK IMPUTATION
Hot-deck imputation is a popular and widely used imputation method to handle missing data.
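A minimal sketch of one common variant, the random hot deck within imputation classes, in which each missing value is replaced by a value drawn at random from observed donors in the same class (the data layout and class variable are assumptions):

```python
import numpy as np
import pandas as pd

def random_hot_deck(df, target, class_vars, seed=0):
    """Fill missing `target` values from random donors in the same class."""
    rng = np.random.default_rng(seed)
    out = df.copy()

    def fill(cell):
        donors = cell.dropna().to_numpy()
        miss = cell.isna()
        if donors.size and miss.any():
            cell = cell.copy()
            cell[miss] = rng.choice(donors, size=miss.sum())
        return cell

    out[target] = out.groupby(class_vars)[target].transform(fill)
    return out

df = pd.DataFrame({"region": ["N", "N", "N", "S", "S"],
                   "income": [40.0, np.nan, 55.0, 30.0, np.nan]})
print(random_hot_deck(df, "income", ["region"]))
```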
HOUSEHOLD REFUSAL
The household refusal disposition is used in telephone, in-person, and mail surveys to categorize a case
in which contact has been made with a household, but
someone in the household has refused either a request
by an interviewer to complete an interview (telephone
or in-person survey) or a mailed request to complete
and return a questionnaire (mail survey). The household refusal typically occurs before a designated
respondent is selected. Household refusals are considered eligible cases in calculating response and cooperation rates.
In a telephone survey, a case is coded with the
household refusal disposition when an interviewer dials
a telephone number, reaches a person, and begins the
introductory script, and the person who answers the
telephone declines to complete the interview. In calls
ending in a household refusal, the person the interviewer spoke with may provide an explanation for the refusal, such as "We don't do surveys," "I don't have time," "We're not interested," or "Please take us off your list." In other instances, the person contacted may
simply hang up. It is important to note that for a case
to be coded as a household refusal, the refusal either
must occur before the interviewer selects the designated respondent or must be generated by a household
member other than the designated respondent. If
a refusal was generated by the person known to be the
designated respondent, the case should be coded with
the respondent refusal disposition, not the household
refusal disposition. Past research has shown that the
majority of refusals in a telephone survey come from
household refusals.
Household refusals in an in-person survey occur
when an interviewer contacts a household, a household
member answers the door, the interviewer begins the
introductory script, and the person declines to proceed
with the survey request. As in a telephone survey,
cases should be considered household refusals when
the refusal occurs before the interviewer selects a designated respondent or when the refusal is provided
by a household member other than the designated
respondent. A case in an in-person survey should be coded with the respondent refusal disposition, not
HTML BOXES
Hypertext markup language (HTML) boxes are used
in Web-based survey applications and come in all
shapes and sizes, but they all allow respondents to enter or select their answers.
I

IGNORABLE NONRESPONSE
The survey response rate is an often-used criterion
for evaluating survey data quality. The general and
conservative underlying assumption of this is that
nonresponse is not ignorable. To achieve high response
rates, survey organizations must devote a great deal
of resources to minimize nonresponse. They might
lengthen the field period for data collection, use expensive locating sources to find sample members, use multiple and more expensive modes of contact, and devote
additional resources (e.g., through incentives) to convince sample members to cooperate with the survey
request. Complex statistical techniques may also be used
after data collection to compensate for nonresponse bias.
All of these techniques dramatically increase the cost of
conducting surveys. In light of this, recent trends of increasing survey nonresponse make the questions of whether and when nonresponse is ignorable especially important.
If nonresponse does not yield biased estimates, then by
implication, it is not advantageous to spend additional
resources on minimizing it.
It is difficult to conduct research that evaluates nonresponse error because data for nonresponders to the
survey have to be available from some other source.
When available, administrative records can be used to
evaluate assumptions about nonresponders. However,
such studies are rare and expensive to conduct. Other
methods used to evaluate nonresponse error include
comparing hard-to-reach respondents with easy-to-reach
and cooperative respondents or comparing estimates
in surveys with identical questionnaires but different
response rates.
Though there is relatively sparse evidence that measures nonresponse error in large surveys, nonresponse
Researchers who use survey data often assume that
nonresponse (either unit or item nonresponse) in the
survey is ignorable. That is, data that are gathered
from responders to the survey are often used to make
inferences about a more general population. This
implies that the units with missing or incomplete data
are a random subsample of the original sample and do
not differ from the population at large in any appreciable (i.e., meaningful and nonignorable) way. By
definition, if nonresponse is ignorable for certain variables, then it does not contribute to bias in the estimates of those variables.
Because nonresponse error (bias) is a function of
both the nonresponse rate and the difference between
respondents and nonrespondents on the statistic of
interest, it is possible for high nonresponse rates to
yield low nonresponse errors (if the difference between
respondents and nonrespondents is quite small). The
important question, however, is whether there truly are
no meaningful differences between respondents and
nonrespondents for the variables of interest. In a major
article on this topic, reported in 2006 by Robert M.
Groves, no consistent patterns were found between the
amount of nonresponse and the amount of nonresponse
bias across the myriad surveys that were investigated.
That is, in many cases the nonresponse was ignorable
and in others it surely was not, and this happened
regardless of whether there was a great deal of nonresponse or very little.
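For a sample mean, this relationship is often written, under a deterministic view of response and with notation assumed here, as

$$\mathrm{bias}(\bar{y}_r) \approx \frac{M}{N}\left(\bar{Y}_R - \bar{Y}_M\right),$$

where $M/N$ is the nonrespondent share of the population and $\bar{Y}_R$ and $\bar{Y}_M$ are the respondent and nonrespondent means. For example, a 30% nonresponse rate with respondent and nonrespondent means of 0.52 and 0.42 yields a bias of only 0.3 × 0.10 = 0.03, whereas the same 30% nonresponse with equal means yields no bias at all.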
IMPUTATION
Imputation, also called ascription, is a statistical
process that statisticians, survey researchers, and other
scientists use to replace data that are missing from
a data set due to item nonresponse. Researchers do
imputation to improve the accuracy of their data sets.
Missing data are a common problem with most
databases, and there are several approaches for handling this problem. Imputation fills in missing values,
and the resultant completed data set is then analyzed
as if it were complete. Multiple imputation is a method
for reflecting the added uncertainty due to the fact that
imputed values are not actual values, and yet still
Among valid procedures, those that give the shortest intervals or most powerful tests are preferable.
Missing-Data Mechanisms
and Ignorability
Missing-data mechanisms were formalized by Donald B. Rubin in the mid-1970s, and subsequent statistical literature distinguishes three cases: (1) missing completely at random (MCAR), (2) missing at random (MAR), and (3) not missing at random (NMAR). This terminology is consistent with much older terminology in classical experimental design for completely randomized, randomized, and not randomized studies. Letting $Y$ be the $N$ (units) by $P$ (variables) matrix of complete data and $R$ be the $N \times P$ matrix of indicator variables for observed and missing values in $Y$, the missing-data mechanism gives the probability of $R$ given $Y$ and possible parameters $\xi$ governing this process: $p(R \mid Y; \xi)$.
MCAR
Single Imputation
Single imputation refers to imputing one value for
each missing datum, where the resulting completed
data set is analyzed using standard complete-data
methods. R. J. A. Little and Rubin offer the following
guidelines for creating imputations. They should be
1. Conditional on observed variables
2. Multivariate, to reflect associations among missing
variables
3. Randomly drawn from their joint predictive distribution rather than set equal to expectations to ensure that
correct variability is reflected
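A minimal sketch of guidelines 1 and 3 for a single continuous variable: the imputation is conditional on an observed covariate and is drawn with residual noise rather than set to the regression expectation (the model and data are illustrative):

```python
import numpy as np

def stochastic_regression_impute(x, y, seed=0):
    """Impute missing y by draws from a simple linear predictive model."""
    rng = np.random.default_rng(seed)
    obs = ~np.isnan(y)
    slope, intercept = np.polyfit(x[obs], y[obs], 1)
    resid_sd = np.std(y[obs] - (slope * x[obs] + intercept), ddof=2)
    y = y.copy()
    miss = ~obs
    # Add residual noise so that correct variability is reflected.
    y[miss] = slope * x[miss] + intercept + rng.normal(0, resid_sd, miss.sum())
    return y

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, np.nan, 6.2, 8.0, np.nan])
print(stochastic_regression_impute(x, y))
```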
Multiple Imputation
Multiple imputation (MI) was introduced by Rubin in
1978. It is an approach that retains the advantages
of single imputation while allowing the uncertainty
due to the process of imputation to be directly
assessed by the analyst using only complete-data software, thereby leading to valid inferences in many
situations. MI is a simulation technique that replaces
the missing values $Y_{mis}$ with $m > 1$ plausible values, where each single imputation of $Y_{mis}$ creates a completed data set, and thus MI creates $m$ completed data sets. Each completed data set is analyzed using standard complete-data methods, yielding estimates $\hat{y}_l$, $l = 1, \ldots, m$, which are averaged to give the MI point estimate $\bar{y}_{MI} = m^{-1} \sum_{l=1}^{m} \hat{y}_l$, with associated total variance $T = \bar{W} + (1 + m^{-1})B$, where $\bar{W}$ is the average of the $m$ completed-data variance estimates and

$$B = \frac{1}{m-1} \sum_{l=1}^{m} \left(\hat{y}_l - \bar{y}_{MI}\right)^2$$

is the between-imputation variance; the factor $(1 + m^{-1})$ reflects the fact that only a finite number of completed-data estimates $\hat{y}_l$, $l = 1, \ldots, m$, are averaged together to obtain the final point estimate, and the quantity $\hat{\gamma} = (1 + m^{-1})B/T$ estimates the fraction of information about $y$ that is missing due to missing data. Inferences from multiply imputed data are based on $\bar{y}_{MI}$, $T$, and a Student's $t$ reference distribution. Thus, for example, interval estimates for $y$ have the form $\bar{y}_{MI} \pm t_{1-\alpha/2} \sqrt{T}$, where $t_{1-\alpha/2}$ is the $(1-\alpha/2)$ quantile of the $t$-distribution. Rubin and Schenker provided the approximate value $\nu_{RS} = (m-1)\hat{\gamma}^{-2}$ for the degrees of freedom of the $t$-distribution, under the assumption that with complete data, a normal reference distribution would have been appropriate. J. Barnard and Rubin relaxed the assumption of Rubin and Schenker to allow for a $t$ reference distribution with complete data, and proposed the value $\nu_{BR} = (\nu_{RS}^{-1} + \hat{\nu}_{obs}^{-1})^{-1}$ for the degrees of freedom in the multiple-imputation analysis, where $\hat{\nu}_{obs} = (1 - \hat{\gamma})\, \nu_{com} (\nu_{com} + 1)/(\nu_{com} + 3)$ and $\nu_{com}$ denotes the complete-data degrees of freedom.
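These combining rules are mechanical to apply. A sketch, assuming the m completed-data point estimates and their variances are already in hand (the inputs are invented for illustration):

```python
import numpy as np
from scipy import stats

def mi_combine(estimates, variances, alpha=0.05):
    """Combine m completed-data estimates and variances (Rubin's rules)."""
    est = np.asarray(estimates, dtype=float)
    m = len(est)
    qbar = est.mean()                  # MI point estimate
    w = np.mean(variances)             # within-imputation variance
    b = est.var(ddof=1)                # between-imputation variance B
    t = w + (1 + 1/m) * b              # total variance T
    gamma = (1 + 1/m) * b / t          # est. fraction of missing information
    df = (m - 1) / gamma**2            # Rubin-Schenker degrees of freedom
    half = stats.t.ppf(1 - alpha/2, df) * np.sqrt(t)
    return qbar, t, df, (qbar - half, qbar + half)

print(mi_combine([10.1, 9.8, 10.4, 10.0, 9.9], [0.25] * 5))
```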
INBOUND CALLING
Telephone survey calls involving call centers are classified as inbound or outbound, depending on whether
the call is being received by the call center (inbound)
or initiated in the call center (outbound).
Inbound calling in the survey research context usually arises from one of the following situations:

- As support for an existing mail, Web, computer-assisted telephone interview (CATI), or in-person survey. In a pre-survey letter or email to the respondent, or on the actual instrument in the case of a self-completion survey, a phone number is provided for respondents to call if they have any questions about the survey or if they want to schedule a particular time for an interviewer to call. Depending on the survey design, respondents might also be given the option to complete the survey via interview during the inbound call.

- As an additional data collection mode in a multi-mode survey. Longitudinal surveys, where a reliable mail or email address exists for communication, often offer respondents an incentive to phone in and complete the interview at their own convenience before the main field phase. Web and mail surveys often contain a phone number for respondents to call if they would prefer an interview to a self-completion mode.

- As the only data collection option, as might be appropriate for surveys of
INCENTIVES
Providing incentives, such as cash, to potential survey
respondents is an effective way to increase response
rates and thereby possibly reduce the potential for nonresponse bias. Incentives work best when combined
with a multiple contact recruitment approach, but incentives demonstrate their effectiveness in improving
response rate even at the time of first contact. For
interviewer-mediated surveys, the judicious use of
incentives can reduce the number of contacts required
to complete an interview. Because incentives increase
early responses, incentives have demonstrated their
time-efficiency and cost-effectiveness by reducing the
labor and postage costs of additional contacts.
Theories of Incentives
Several theories are used to explain why incentives
work. The most common explanations rely on social
Cognitive dissonance, as explained by social psychologist Leon Festinger and his colleagues in the
that the relationship is not strictly linear. These conflicting findings suggest that there are linear effects
with lower amounts of incentives (where each additional dollar produces a higher response rate) and
a flattening effect with higher amounts of incentives
(additional dollars produce little change in response
rate, e.g., going from $20 to $25). The diminishing
return between the size of the incentive and response rate casts doubt on the economic exchange theory of incentives as the sole or primary explanation of how
incentives work.
While noncash incentives are less effective than
monetary awards, some noncontingent, noncash awards
produce significant increases in response rates. Pens
and charity donations appear to be the favored options
for noncash incentives among those that have been
reported in the research literature. Pens vary in effectiveness in increasing response rate, but the research on
the role of charity contributions is more conclusive.
Charity donations on behalf of the respondents do not
produce a significant increase in response rates.
Data Quality
Initially, survey researchers worried that incentives
would reduce data quality by reducing intrinsic motivation to complete a survey, leading to less thoughtful
or less complete responses. Subsequent research has
largely appeased this concern. Studies find that incentives improve or have no effect on data quality. When
comparing item nonresponse to closed-ended questions, research tends to be equally split between finding
no differences and finding fewer item nonresponses in
the incentive groups, compared to no-incentive groups.
For example, the Singer meta-analysis of interviewer-mediated studies found that about half of the studies had no differences in data quality and half found improved data quality when respondents
received an incentive. When examining item nonresponse and length of response to open-ended questions,
research suggests that incentives lead to higher-quality
responses, as indicated by more distinct ideas and longer responses.
Response Bias
Survey researchers worry that incentives might alter
survey responses, producing response bias. The possibility of response bias has been evaluated by comparing responses from those respondents who receive an
Sample Bias
In general, studies have found few differences in the
demographic composition of their samples, when
comparing no-incentive groups and those groups who
received incentives of various types and sizes. Thus,
incentives appear to be effective in recruiting respondents from a broad range of demographic groups.
Unfortunately, these findings also suggest that incentives may not be effective in recruiting respondents
who are especially hard to reach or otherwise unlikely
to participate in surveys (e.g., young adults), unless
differential amounts of incentives are given to such
subgroups. For example, Singer and colleagues
reported that only one third of the studies in their
meta-analysis provided support for the role of incentives in improving the representation of groups that
typically are underrepresented (low-income and nonwhite) in surveys, but these studies did not provide
differential incentives to these groups. Most of the
interviewer-mediated studies affirm that incentives do
not change the demographic composition between
incentive and no-incentive groups, unless differential
incentives are used. Nor is there reliable and consistent evidence that incentives reduce nonresponse bias.
INDEPENDENT VARIABLE
In survey research, an independent variable is
thought to influence, or at least be correlated with,
another variable: the dependent variable. For example, researchers hypothesize that childhood exposure
to violent television can lead to violent behavior
in adulthood. In such a study, exposure to violent
television programming as a child is an independent
variable and violent behavior in adulthood is the
dependent variable.
An independent variable is commonly denoted by
an x and a dependent variable by y, with the implication that x causes y or, in the case of noncausal
covariation, x is related to y.
Determining whether one variable influences
another is of central importance in many surveys and
studies, as making this determination helps researchers accept or reject hypotheses and thereby build
social science knowledge. Relationships between variables help researchers to describe social phenomena.
In experimental studies, with random assignment of
respondents to experimental conditions, a researcher
can choose which variables are independent, because
these are the variables controlled by the researcher. In
population studies, patterns in data help researchers
determine which variables are independent.
More than one independent variable may influence
a dependent variable. Quantitative tools and approaches
can assist researchers in accepting or rejecting their
hypotheses about the relationships among independent
variables and a dependent variable. In some analyses,
researchers will control for the influence of certain
independent variables in order to determine the strength
of the relationship for other independent variables.
Using the example of childhood exposure to television violence again, another independent variable in
the study could be parental control over television
viewing. Yet another independent variable could be
level of physical violence in the home. The complexity of a hypothesized causal model such as this
increases with the number of independent variables
and interaction effects among independent variables.
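A simulation sketch of such a model, with three hypothetical independent variables predicting adult violent behavior; every number is invented, and each fitted slope is read as a partial effect, that is, the association controlling for the other independent variables:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
tv = rng.normal(size=n)          # childhood exposure to violent TV
control = rng.normal(size=n)     # parental control over viewing
home = rng.normal(size=n)        # physical violence in the home
y = 0.5 * tv - 0.3 * control + 0.4 * home + rng.normal(size=n)

# Least squares with an intercept; each slope is a partial effect.
X = np.column_stack([np.ones(n), tv, control, home])
coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
print(coefs)    # roughly [0, 0.5, -0.3, 0.4]
```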
Heather H. Boyd
See also Dependent Variable; Experimental Design;
Interaction Effect; Noncausal Covariation; Random
Assignment
INELIGIBLE
The ineligible disposition is used in all surveys,
regardless of the mode (telephone, in-person, mail, or
Internet). Ineligible cases are cases that were included
in the sampling frame but fail to meet one or more of
the criteria for being included in the survey. Cases
coded as ineligible do not count in computing survey
response rates.
One of the most common reasons a case would be
coded as ineligible is when a survey uses screening criteria to determine whether the respondent, household, or
organization contacted as part of a survey is eligible to
complete the survey. For example, a survey may require
respondents or households to be located within a specific
geographic area, such as a specific county, town or
village, or neighborhood. A case would be considered
ineligible if it were discovered that the respondent or
household was located outside of the geographic boundaries of the survey population. In most instances, if it
were discovered during the screening process that the
sampled respondent had moved out of the geographic
boundaries of the survey during the field period, that
case also would be considered ineligible.
An additional example of how screening criteria may result in a case being considered ineligible occurs in surveys of the general population. These surveys use residential status as a screening criterion, and as a result, all cases that result in contact with a nonresidential unit, such as businesses, schools, or governmental organizations, would be considered ineligible. In in-person surveys, this often is discovered when an interviewer visits
a sampled address and discovers that it is not a residence. In telephone surveys, this would be discovered
when interviewers make telephone calls to businesses; fax or data lines; nonworking, changed, and disconnected telephone numbers; and numbers that reach pagers. In landline telephone surveys, numbers that reach cell phones would be treated as ineligible. Also,
in a telephone survey an answering machine message
INFERENCE
Inference is a process whereby a conclusion is drawn
without complete certainty, but with some degree of confidence.
Design-Based and
Model-Based Inferences
There are two approaches to making inferences from
survey data. First, in the design-based approach, inferences are made by looking at how statistics vary as
samples are repeatedly drawn using the same sampling
procedures as were employed in the actual sampling.
Second, in the model-based approach, inferences
are made by looking at how statistics vary as the
population, as described by a probability model, is
allowed to vary without changing the sample. The
model-based approach is also called the prediction
approach because the model is used to predict the
population units not in the sample. It is called the
superpopulation approach as well because the population can be regarded as selected from a still larger
population according to the probability model.
Inference procedures (e.g., hypothesis testing or
estimating confidence intervals) can be carried out
under either the design-based or the model-based
approach. The design-based approach is more traditional in survey sampling. The model-based approach,
on the other hand, is more consistent with statistical
approaches used outside of survey sampling.
Confidence Intervals
Confidence intervals allow one to infer with a high
degree of confidence that a quantity being estimated lies
within an interval computed by a specified procedure.
Hypothesis Testing
The purpose of hypothesis testing is to ascertain
whether an observed difference in the sample is statistically significant or whether it can instead be adequately explained by chance alone. Hypothesis tests
are designed so that, if there is in fact no difference,
the probability of (erroneously) rejecting the hypothesis
that there is no difference (i.e., the null hypothesis) is
kept to a specified low level; often this probability,
called the Type I error, is set to .05. A well-designed
hypothesis test will also minimize the other potential
error on inference, namely, not rejecting the hypothesis
of no difference when a difference actually exists (i.e.,
Type II error). In survey sampling, it is often the case
that two sample averages are independent and approximately normally distributed so the hypothesis that their
difference is zero can be tested using properties of the
normal distribution (this is called a t-test).
There is a close relationship between confidence
intervals and hypothesis testing in that a hypothesis of
no difference is rejected if and only if the confidence
interval for the difference does not include zero. If one
has confidence intervals for each of two independent
averages and the confidence intervals do not overlap,
one may reject the hypothesis of no difference between
the two averages. But if the two confidence intervals
do overlap, it is still possible that the hypothesis of no
difference in the sample averages can be rejected.
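A small numeric illustration of that last point, with made-up means and standard errors: the two 95% confidence intervals overlap, yet the interval for the difference excludes zero, so the hypothesis of no difference is still rejected.

```python
import numpy as np
from scipy import stats

m1, se1 = 52.0, 1.5    # first sample average and its standard error
m2, se2 = 47.5, 1.4    # second, independent sample average
z = stats.norm.ppf(0.975)

ci1 = (m1 - z * se1, m1 + z * se1)    # about (49.1, 54.9)
ci2 = (m2 - z * se2, m2 + z * se2)    # about (44.8, 50.2): overlaps ci1

se_diff = np.sqrt(se1**2 + se2**2)    # SE of the difference
ci_diff = (m1 - m2 - z * se_diff, m1 - m2 + z * se_diff)
print(ci1, ci2, ci_diff)              # ci_diff ~ (0.5, 8.5): excludes 0
```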
Michael P. Cohen
See also Confidence Interval; Design-Based Estimation;
Model-Based Estimation; Null Hypothesis; Point
Estimate; Regression Analysis; Superpopulation; t-Test;
Type I Error; Type II Error
INFORMANT
An informant in a survey is someone asked to provide
information about another person, persons, or an organization, for example, when a parent is interviewed
about a young child who could not answer the survey
questions. Informants (also known as proxies) tend to
be used in surveys when the target respondent is
unable to respond or when it is not feasible to collect
responses from all members of a group under study.
As the use of informants to collect quantitative data
has become integral to survey research, due to the
cost-effectiveness of the approach, so has the study of
the effects of using informants on the data that are
collected. The substitution of informants limits the
types of data that can be collected with accuracy, and
proper selection methods should be used to minimize
the resulting response bias that has been noted to
occur occasionally when data collected from informants are compared to self-reported data.
There are several types of surveys in which the use
of informants is the most efficient means of collecting
responses. Informants are frequently used in surveys
when members of the population under study are
unable to provide responses because a physical or
cognitive impairment prevents them from responding.
Because informants are the only way to collect information about these target populations, informants are
used despite the fact that several methodological
experiments have shown that using informants produces response bias. Specifically, in surveys asking
about disability, informants tend to overreport more
obvious types of disability (such as difficulty with
activities of daily living) and to underreport less obvious types (such as mental health problems).
Another survey situation in which the informant
method of data collection has been used is when it
is not economically feasible to interview all individual respondents, such as all members of a household
or an organization. Studies using this method generally develop selection rules that ensure that the
selected informant is likely to be able to provide
accurate responses about others in the household or
Informed Consent
INFORMED CONSENT
As outlined in the Belmont Report of 1979, the core
elements underlying the ethical treatment of research
participants are autonomy (respect for persons),
beneficence, and justice. Providing adequate information and obtaining active consent for research participation are central to autonomy and respect for
persons. Acknowledging the importance of autonomy
requires that every potential research participant must
be afforded adequate time and opportunity to make
his or her own informed and voluntary decision about
whether or not he or she wishes to participate in
a research study. This requires the provision of adequate information about the study and, in theory, also
requires that no pressure be exerted to participate.
The principle of autonomy also requires that special
protections be given to potentially vulnerable populations such as minors, the mentally ill, or prisoners.
Individuals in these groups may be in a position of
increased potential for coercion (e.g., prisoners) or
may be less capable of understanding information that
would enable them to make an informed decision
about study participation.
Informed consent includes the process by which
research participants gain an understanding of the procedures, risks, and benefits that may be associated with
their taking part in a study. In virtually all surveys, the
key elements of voluntary and informed consent
can be provided in a concise way at the beginning of
a telephone or face-to-face interview, in a cover letter
for a self-administered survey, or in the introductory
screen of a Web or other electronic survey. This is true
regardless of level of risk and is consistent with the
contemporary view of consent as an ongoing process
rather than a paper document. The main elements of
consent include the following:
- An explanation of the purpose(s) of the study
- An indication of the approximate amount of time it will take to complete the study
- A description of what the respondents will be asked to do
- A description of any foreseeable risks or discomforts, if any
- A description of any direct benefits to the respondents or others
- A statement describing the extent to which responses will be confidential
- A statement of the voluntary nature of participation
immigrants) and may reduce cooperation unnecessarily. Telephone surveys utilizing random-digit dialing and Web survey invitations sent via email cannot incorporate signed consent in the protocol prior to the initial contact because respondents' names and street addresses are unknown to the researcher. Thus, a waiver of documentation of consent is typically the most desirable approach for most survey protocols, especially those utilizing telephone and electronic modes.
In the typical survey that presents minimal risk,
lengthy and detailed information about the objectives
of the survey and the questions to be asked may
increase respondent burden and bias responses without safeguarding respondent rights. In these surveys,
the usual practice of a short introduction, including the purpose of the study; the approximate amount of time it will take; the sponsor, responsible survey organization, or both; and the general topics to be covered, is typically deemed sufficient. This statement should also include information about the confidentiality of the responses.
More detailed information is required when survey participation may pose substantial risk. In general, respondents should be informed that the
content includes sensitive topics or questions about
any illegal behaviors, but they should not be told so
much as to bias their answers (e.g., they should not
be informed of the study hypothesis). This is consistent with much other social science research performed in laboratory settings where explanations of
the hypotheses at the outset would render the study
invalid, although such studies may require special
debriefing of the respondents after the survey is
completed. It is also important to include a statement
indicating that respondents can skip questions that
cause them discomfort and questions they do not
want to answer.
The introductory statement and the reminders about
the voluntary nature of response help ensure respondent autonomy (respect for persons) without affecting
substantive responses. If appropriate, at the end of the
interview, respondents can be debriefed to see if any of
the matters covered were upsetting, to give further
information on study purposes, or to answer respondent
questions.
Mary E. Losch
See also Debriefing; Ethical Principles; Institutional Review
Board (IRB); Protection of Human Subjects; Voluntary
Participation
the center made an institutional move from the College of Literature, Science, and Arts to ISR, becoming the institute's fourth center.
Training future generations of empirical social scientists is the long-term commitment of the ISR. The ISR
has offered the Summer Institute in Survey Research
Techniques for more than 60 years; designed to meet
the needs of professionals, the Summer Institute teaches
practice, theory, implementation, and analysis of surveys. For more than 45 years, the Inter-University
Consortium for Political and Social Research (ICPSR)
Summer Program has provided studies in basic methodological and technical training, data analysis, research
design, social methodology, and statistics, in addition to
advanced work in specialized areas. A graduate-level
Program in Survey Methodology was established in
2001, seeking to train future survey methodologists in communication studies, economics, education, political science, psychology, sociology, and statistics. The program offers certificate, master of science, and doctoral degrees through the University of Michigan.
Jody Smarr
See also Consumer Sentiment Index; Joint Program in
Survey Methods (JPSM)
INSTITUTIONAL REVIEW
BOARD (IRB)
Institutional review boards (IRBs) are committees
charged with the review and monitoring of research
(including surveys) involving human participants. The
basic principles of human research protection used
today in the United States were outlined in the
Nuremberg Code and were developed in response to
the Nazi atrocities. Voluntary informed consent to
research participation is at the core of that code. In
response to research participant abuse in the first half
of the 20th century, IRBs were mandated in the
United States by the National Research Act of 1974. In 1978,
potential risks and potential benefits; (6) determination of the adequacy of privacy and confidentiality
protections; and (7) determination of the best review
intervals and any necessary monitoring of data
collection.
The IRB has the authority to approve a protocol, to
disapprove a protocol, or to ask for revisions or modifications of a protocol before approving the protocol.
The IRBs decision and any required modifications
are made in writing to the investigator. If a project is
disapproved, the investigator is notified of the rationale and is provided with an opportunity to respond to
the IRB regarding the disapproval.
Many protocols require review by the fully convened IRB; however, most minimal risk research
receives an expedited review (conducted by the IRB
chair or his or her designee) if it falls into one of
seven categories defined by the regulations. Other
minimal risk research may be reviewed initially and
then granted an exemption from review if it meets
specific criteria defined by the regulations and the
local IRB. The determination of whether or not
a research project meets the definition for exempt or
expedited review is discretionary on the part of the
IRB. The application of the various types of review is
determined by the local IRB. At most academic institutions, for example, all research is reviewed by the
IRB to assure that basic ethical standards are met.
Minimal risk surveys are typically reviewed under an
exempt or expedited category.
The IRB typically requires documentation of
informed consent (e.g., a form signed by the participant that outlines the project, the risks, and benefits).
However, the IRB may also approve a waiver of documentation of consent. This often is done for survey
projects where obtaining a signature would not be
possible or feasible or necessary in light of the minimal risks involved. For example, in a random-digit
dialing telephone interview, consent is obtained (typically in oral mode), but no signature is required.
The IRB must review projects at least annually
(for those lasting more than 1 year) but may require
a shorter interval for projects that are more than minimal risk. The IRB is also authorized to have one or
more members observe the recruiting, consent, and
research process or may enlist a third party to observe
to ensure that the process meets the desired ethical
standards and desired levels of risk and confidential
treatment of data. A protocol approved by an IRB
may still be disapproved by the institution for some
INTERACTION EFFECT
An interaction effect is the simultaneous effect of two
or more independent variables on at least one dependent
variable in which their joint effect is significantly greater
(or significantly less) than the sum of the parts. The
presence of interaction effects in any kind of survey
research is important because it tells researchers how
two or more independent variables work together to
impact the dependent variable. Including an interaction
term effect in an analytic model provides the researcher
with a better representation and understanding of the
relationship between the dependent and independent
variables. Further, it helps explain more of the variability in the dependent variable. Omitting an interaction effect from a model where a nonnegligible interaction does in fact exist may result in a misrepresentation of the relationship between the independent and dependent variables. It could also lead to bias in estimating model parameters.
As a goal of research, examination of the interaction
between independent variables contributes substantially
Figure 1   Ordinal interaction
Figure 2   Disordinal interaction
Figure 3   No interaction
(Each figure plots percentage response rate against incentive amount, $0 (A1), $1 (A2), and $5 (A3), for two delivery modes, First Class (B1) and Federal Express (B2); figures not reproduced.)
342
Y^ = bo + b1 D + b2 T + b3 D x T
Y^ = bo + b1 + b2 + b3 ) T
An interaction effect between two continuous independent variables on a dependent variable is expressed as a multiplicative term in a multiple regression analysis. A full regression model that predicts, say, parents' attitude toward schools (Y) from schools' involvement in the community (SIC) and parents' socioeconomic status (SES) as two continuous independent variables and their interaction (SIC × SES) is presented in Equation 4:
Ŷ = b0 + b1 SIC + b2 SES + b3 (SIC × SES). (4)
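As an illustration only, a model of this form can be estimated by ordinary least squares in any standard statistical package. The following minimal sketch uses Python's statsmodels formula interface on simulated data; the variable names (attitude, sic, ses) and all coefficient values are hypothetical rather than taken from any actual study.

# A minimal sketch of estimating Equation 4 by OLS on simulated data;
# variable names and coefficient values are hypothetical.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 500
sic = rng.normal(50, 10, n)   # schools' involvement in the community
ses = rng.normal(0, 1, n)     # parents' socioeconomic status
# Outcome simulated with a true interaction coefficient (b3 = 0.05)
attitude = 20 + 0.4 * sic + 2.0 * ses + 0.05 * sic * ses + rng.normal(0, 3, n)
df = pd.DataFrame({"attitude": attitude, "sic": sic, "ses": ses})

# "sic * ses" expands to sic + ses + sic:ses, i.e., both main effects
# plus the multiplicative interaction term of Equation 4
model = smf.ols("attitude ~ sic * ses", data=df).fit()
print(model.params)                # estimates of b0, b1, b2, and b3 (sic:ses)
print(model.pvalues["sic:ses"])    # significance test of the interaction term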
INTERACTIVE VOICE RESPONSE (IVR)

Interactive voice response (IVR) data collection is known by several names, including touchtone data entry. These terms all refer to computerized telephone data collection systems where
respondents answer survey items via automated
self-administered procedures, as opposed to giving
verbal answers to live interviewers.
IVR has two primary uses in survey research. First,
IVR can be used to replace interviewer-administered
telephone data collection. Potential respondents may be
contacted first by telephone interviewers and then
switched to IVR, or they can be contacted by another
mode (e.g., mail or Internet) and provided a call-in
number to use for completing an IVR interview.
Another use of IVR is to provide telephone survey
respondents greater privacy in responding to questions
of a potentially sensitive nature. When used for this second purpose, IVR interviews are typically conducted
through initial contact from telephone interviewers who
then switch respondents to IVR for the sensitive items.
The respondent is then switched back to the interviewer
after answering the sensitive questions. Regardless
of the specific purpose, IVR data collection typically
involves a relatively short, simple interview or a brief
module that is part of a longer interview.
IVR offers a number of potential advantages over
interviewer-administered modes of telephone data collection. First, because a pre-recorded voice is employed
to administer all survey items, IVR respondents are all
read questions, response options, and instructions in
the exact same way. This provides a higher degree of
interview standardization than interviewer-administered
telephone data collection, where interviewers' vocal qualities, reading skills, and presentation skills vary.
A second advantage of IVR is providing greater
privacy for answering questions, especially survey
items of a potentially sensitive nature that could be
affected by socially desirable reporting. Because IVR
respondents enter their answers by pressing touchtone
buttons, they do not have to be concerned about giving their responses to a live interviewer or about
others (e.g., family members) hearing their responses.
Research indicates respondents in IVR mode give
more nonnormative answers to questions of a sensitive
nature compared to respondents in computer-assisted
telephone interviewing (CATI) mode. Examples include
greater reports of illicit substance use, certain sexual
behaviors, and negative satisfaction ratings.
Third, when IVR surveys are conducted with no
interviewer involvement, this mode can be a much more
cost-effective method than CATI. This version of IVR
data collection saves on both the costs of recruiting,
training, and supervising interviewers as well as telecommunication costs. Because telephone charges are
incurred only when potential respondents call in to
complete IVR interviews, these costs are lower than
costs in the interviewer-administered mode where
multiple outbound calls are typically made to contact
each sampled unit.
IVR does have some potential limitations compared
to interviewer-administered modes of telephone data
collection. First, IVR mode provides unique opportunities for unit nonresponse or incomplete interviews.
When respondents are left to complete an IVR interview on their own time, they are less likely to participate without the motivation provided by interviewers.
Similarly, in surveys where interviewers first recruit
respondents and then switch them to IVR mode,
respondents have an opportunity to terminate the interview at that point, without being exposed to the persuasive efforts of an interviewer.
Second, concerns about respondent patience and
terminations limit the length and complexity of IVR
surveys. More lengthy IVR surveys introduce considerable risk that respondents will terminate interviews
prior to completion, because no interviewer is present
to encourage continued participation. Complicated
surveys may also increase the risk of respondents
either terminating the interview or providing inaccurate responses or no responses to some items to
reduce the survey burden.
Third, the lack of an interviewer creates a potential
risk of measurement error in IVR interviews, as in
other self-administered modes of data collection.
Without an interviewer to clarify response tasks,
probe inadequate responses, and record answers accurately, IVR data can introduce other sources of
respondent-related measurement error that are not as
likely to occur in interviewer-administered surveys.
Although IVR systems can provide a help button for
respondents to use when they are having difficulty
completing a question, respondents may be hesitant to
use this option when they need it or they may not
receive sufficient help to accurately answer an item.
Finally, when IVR data collection is used as part
of an interviewer-administered survey where interviewers recruit and screen eligible subjects, cost savings compared to CATI data collection mode may
not be realized. Adding an IVR module to a CATI
survey would actually add costs related to the additional programming and management required for
using these two modes. Costs for recruiting, training,
INTERCODER RELIABILITY
Intercoder reliability refers to the extent to which two or more independent coders agree on the coding of the content of interest when applying the same coding scheme. In surveys, such coding is most often applied to respondents' answers to open-ended questions, but in other types of research, coding can also be used to analyze other types of written or visual content (e.g., newspaper stories, people's facial expressions, or television commercials). Intercoder reliability is often referred to as interrater or interjudge reliability. It is a critical component in the content analysis of open-ended survey responses: without it, the interpretation of the content cannot be considered objective and valid, although high intercoder reliability is not the only criterion necessary to argue that coding is valid. Intercoder reliability is a standard measure of research quality, and a low level of intercoder reliability may suggest weaknesses in coding methods, including poor operational definitions with unclear coding categories and poor coder training.
Although there are more than 30 different statistical measures or indices of intercoder reliability, only a handful of measures are widely used, and there is no consensus on the single best measure.
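As an illustration of how such an index is computed, the following minimal sketch calculates simple percent agreement and Cohen's kappa, one widely used chance-corrected measure of intercoder agreement, for two coders; the codes shown are invented for the example.

# Percent agreement and Cohen's kappa for two coders (hypothetical codes).
from collections import Counter

coder1 = ["A", "A", "B", "B", "C", "A", "B", "C", "C", "A"]
coder2 = ["A", "B", "B", "B", "C", "A", "A", "C", "C", "A"]
n = len(coder1)

# Observed agreement: share of units coded identically by both coders
p_o = sum(c1 == c2 for c1, c2 in zip(coder1, coder2)) / n

# Expected chance agreement, from each coder's marginal distribution
m1, m2 = Counter(coder1), Counter(coder2)
p_e = sum((m1[k] / n) * (m2[k] / n) for k in set(m1) | set(m2))

kappa = (p_o - p_e) / (1 - p_e)
print(f"percent agreement = {p_o:.2f}, kappa = {kappa:.2f}")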
INTERNAL VALIDITY
As explained in the 1960s by Donald Campbell and
Julian Stanley in their seminal book Experimental
and Quasi-Experimental Designs for Research, internal
validity refers to the extent to which the methodological
research design used by a researcher can provide empirical evidence to test the possible cause-and-effect
relationship between an independent variable (the antecedent), X, and a dependent variable (the consequence),
Y. Without adequate internal validity, researchers may
offer logical, reasoned arguments to speculate about the
possible causal nature of any correlational relationship
they observe between X and Y in their data, but they
cannot use the internal strength of their research design
to bolster such reasoning. Thus, although researchers
can (and routinely do) draw causal inferences about X
causing Y based on speculation from the results of
many types of nonexperimental research designs, it is
only with carefully planned experiments and quasiexperiments that researchers can draw internally valid
conclusions about cause-and-effect relationships with
a high degree of confidence.
Unfortunately, many survey researchers have not
received training in experimental design and do not
appear to appreciate the importance of planning
Example of a Survey Lacking Internal Validity
The results of a nonexperimental pre-election study
on the effects of political advertising were posted on
the American Association for Public Opinion Research listserv (AAPORnet) in 2000. This survey-based
research was conducted via the Internet and found
that a certain type of advertising was more persuasive
to potential voters than another type. By using the
Internet as the data collection mode, this survey was
able to display the ads, which were presented as digitized video segments, in real time to respondents/
subjects as part of the data collection process and,
thereby, simulate the televised messages to which
voters routinely are exposed in an election campaign.
Respondents were shown all of the ads and then asked
to provide answers to various questions concerning
their reactions to each type of ad and its influence on
their voting intentions. This was done in the individual respondent's own home in a room where the
respondent normally would be watching television.
Here, the Internet was used very effectively to provide
mundane realism to the research study by having survey respondents react to the ads in a context quite
similar to one in which they would be exposed to real
political ads while they were enjoying a typical evening at home viewing television. Unlike the majority
of social science research studies that are conducted
under conditions far removed from real life, this study
went a long way toward eliminating the potential
artificiality of the research environment as a serious
threat to its overall validity.
Another laudable design feature of this study was
that the Internet sample of respondents was chosen
with a rigorous scientific sampling scheme so that it
could reasonably be said to represent the population of
potential American voters. The sample came from
a large, randomly selected panel of households that
had received Internet technology (WebTV) from the
survey organization in their homes. Unlike most social
science research studies that have studied the effects of
political advertising by showing the ads in a research
costs and yet would have been more powerful in supporting causal interpretations.
Conducting Internally Valid Survey Research
Cook and Campbell define internal validity as the
approximate validity with which one infers that
a relationship between two variables is causal or that
the absence of a relationship implies the absence of
cause. There are three conditions for establishing that
a relationship between two variables (X and Y) is
a causal one, as in X causes Y. The researcher
must demonstrate
1. Covariation, that there is a reliable statistical relationship between X and Y.
2. Temporal order, that X occurs before Y occurs.
3. An attempt to eliminate plausible explanations other than changes in X for any observed changes
in the dependent variable (Y). The use of a true
experimental design with random assignment of
respondents to different levels of X is of special
value for this last condition to be met.
Too often the selection of the respondents that constitute different comparison groups turns out to be the main threat to a study's internal validity. For example, if a survey researcher sampled two different municipalities and measured the health of residents in each community, the researcher would have no statistical or methodological grounds on which to base any attributions about whether living in one community or the other caused any observed differences in average health between the respective communities. In
this example, no controlled effort was built into the
study to make the two groups equivalent through random assignment, and of course, in this example that
would not be possible. As such, any observed differences between the health in one community versus
another could be due to countless other reasons than
place of residence, including a host of demographic
and behavioral differences between the residential
populations of each community.
Thus, any time two (or more) groups have been
selected for comparison via a process other than random assignment, the researchers most often will have
no solid grounds on which to draw valid inferences
about what may have caused any observed difference
between the two groups. (An exception is the possibility
that a researcher deployed a quasi-experimental design,
one without true random assignment, but one that may
validity because the dependent variable often is gathered immediately after the administration of the independent variable (e.g., most wording experiments built
into a questionnaire require that the respondent answer
the question immediately after being exposed to the
wording), but in other instances the researcher must be
very conscious of the possibility that history may have
undermined the integrity of the experimental design.
Instrumentation
Mortality
Internal Validity/Random Assignment Versus External Validity/Random Sampling
Now that it has been explained that random assignment is the cornerstone of experimentation and the
establishment of internal validity of a research design,
it is worth observing that many survey researchers
and students new to the field appear to confuse random assignment with random sampling, or at least
seem not to appreciate the distinction. Random sampling is very much a cornerstone of external validity,
especially when it is done within the context of a probability sampling design. The beauty and strength of
high-quality survey research is that a researcher often
can meld both random assignment and random sampling, thereby having strong internal validity and
strong external validity.
Researchers who use the survey mode of data collection typically are much more familiar with the science of sampling than they are with the science of
experimentation. Although many of them may not
have prior familiarity with the term external validity,
they are very familiar with the principles underlying
the concerns of external validity: If one wants to represent some known target population of interest accurately, then one had best utilize a sampling design that (a) well represents that population via a properly constructed sampling frame and (b) uses a random probability sampling scheme to select respondents from it,
thereby allowing one to generalize research findings
from the sample to the population with confidence and
within a known degree of sampling error. Within
a total survey error framework, avoiding coverage error and avoiding nonresponse error are each a necessary condition for achieving strong external validity, and together they constitute the sufficient condition. Thus,
survey researchers need to use sampling frames that
fully cover the target population they purport to represent and need to achieve an adequate response rate
that avoids meaningful nonresponse bias.
The linkage between internal validity and external
validity concerns whether any cause-and-effect relationship that has been observed in a survey research
Conclusion
One cannot overstate the importance of the research
power that is afforded by an experimental design
in allowing a researcher to test the causal nature of
the relationship between variables with confidence
(i.e., with strong internal validity). As noted earlier,
this often can be done at little or no additional cost in
the data collection process, and sometimes it can even
save costs as it may reduce the amount of data that
must be gathered from any one respondent.
Paul J. Lavrakas
See also American Association for Public Opinion Research
(AAPOR); Convenience Sampling; Coverage Error;
Dependent Variable; Differential Attrition; Experimental
Design; External Validity; Independent Variable;
Interaction Effect; Noncausal Covariation; Nonresponse
Error; Random Assignment; Self-Selected Sample;
Split-Half; Total Survey Error (TSE); Validity
INTERNATIONAL FIELD DIRECTORS AND TECHNOLOGIES CONFERENCE (IFD&TC)

The Conference
The intent of the IFD&TC (according to its charter)
is to provide informal [emphasis added] interaction
between field director, field technology, and survey
management personnel of a type not usually available
in professional conventions or through professional
journals. The sessions [are] informal and focus on work
in progress or recently completed, and on exchanges of
information, practices, and opinions on relevant subjects of common interest. Finished papers ready for
publication, public distribution, or production in official formal proceedings are not required [and not
encouraged]. Extensive time is provided for discussion during sessions and for casual interchange during lengthy breaks between sessions. Because the
attendees generally do not represent organizations
in competition with each other, the discussions also
are unusually frank, open, and mutually sharing.
These characteristics have fostered strong personal
loyalty by regular attendees and a welcoming attitude toward those new to the field.
Presentations frequently focus on the practical
aspects of survey data collection by personal interviews,
History
Further Readings
IFD&TC: http://www.ifdtc.org
IFD&TC. (1993). IFD&TC charter (as amended 1999). Retrieved October 22, 2007, from http://www.ifdtc.org/charter.htm
INTERNET SURVEYS
Internet surveys refer to surveys that sample respondents via the Internet, gather data from respondents
via the Internet, or both. Using the Internet to conduct
survey research provides a great many opportunities
and a great many challenges to researchers.
Advantages
The major advantage of the use of the Internet in data
collection is the very low cost per respondent, as compared to other modes of data collection. This has made
the Internet survey an extremely attractive option to
a wide range of survey researchers, primarily in the
areas of opinion polling and market research, where
the principles of probability sampling are not always
considered to be of prime importance and large numbers of respondents are judged as valuable. The initial set-up costs entailed in the design of high-quality collection instruments via the Internet may be somewhat higher than those required for the design of paper questionnaires or computer-assisted telephone interviewing (CATI) instruments. However, this is more than offset by the operational savings due to self-administration of the survey instrument. The savings
in the direct costs of interviewers, their training and
control, are substantial. While other self-administered
instruments, such as mail questionnaires and simple
email collection, share with Internet surveys the advantage of not requiring the intermediary function of interviewers, for Internet surveys the costs involved in the
control of unit and item nonresponse, callbacks, and
editing are minimal and lower, in general, than those for
other self-administered modes.
An important advantage of the use of the Internet
for data collection lies in the advanced enhancements
of the visual and aural aspects of the collection instrument. The use of color, animation, and even video and
audio effects can, if used with care, facilitate the completion of the questionnaire for the respondent. The
real-time interaction between the collection instrument
and the respondent is a definite improvement over the
fixed form of questionnaire required by other modes of
data collection. Thus the use of drop-down menus, the
possibility of referring easily by hyperlinks and radio buttons or boxes to instructions and classifications, the possibility of displaying photos of products or magazine covers, and other features all make the task of
completing an Internet questionnaire much easier than
for conventional instruments. Online editing, logical
checks, and complex skip patterns can be employed in
ways that are virtually invisible to the respondent.
Future Developments
The Internet is a powerful and inexpensive method of
data collection, with many advantages and enormous
potential in cases where it can be used in the context of
probability sampling. However, in many current applications, coverage and frame problems prevent its being
used for probability sampling-based surveys to ensure
valid inferences. Future developments may change the
situation. Thus the proposed introduction of a unique
universal personal communications number for use with
all modes of telecommunication (fixed-line and mobile
phones, fax, and email) may solve many of the problems associated with coverage and absence of frameworks, at least for multi-mode surveys.
Gad Nathan
See also Computer-Assisted Personal Interviewing (CAPI);
Computer-Assisted Telephone Interviewing (CATI);
Coverage; Coverage Error; Drop-Down Menus; Email
Survey; Face-to-Face Interviewing; Missing Data;
Mixed-Mode; Multi-Mode Surveys; Partial Completion;
Propensity-Weighted Web Survey; Radio Buttons;
Sampling Frame; Undercoverage; Web Survey
INTERPENETRATED DESIGN
An interpenetrated survey design is one that randomly
assigns respondent cases to interviewers. This is done
to lower the possibility that interviewer-related measurement error is of a nature and size that would bias
the survey's findings. This type of design addresses
survey errors associated with the survey instrument
and the recording of responses by the interviewer. One
way to reduce subjective interviewer error is to develop
a survey using an interpenetrated design, that is, by
ensuring a random assignment of respondents to interviewers. Surveys employing an interpenetrated design,
when such is warranted, will tend to reduce the severity of interpretation errors resulting from the conflation
of interviewer bias with some other statistically relevant variable that might serve as a basis for assigning
respondents. It will also typically reduce the overall
standard error of response variance, especially for types
of questions that inherently require some judgment or
interpretation in recording by the interviewer.
Example of an Interpenetrated Survey Design
Assume a survey of 100 women from known high-risk
populations (e.g., low income, substandard education,
history of domestic violence), who are being queried
about their tobacco use. The survey will be administered face-to-face by five interviewers and will feature
a mix of demographic and binary-response questions,
as well as several open-ended questions about the
respondents' psychosocial triggers for smoking, which the interviewer will interpret to assign a clinical risk index score.
In an interpenetrated design, the 100 women will
be randomly assigned across the five interviewers.
This means that any potential skewing of recorded
results arising from bias or judgment by any single
interviewer will be relatively equally shared by all of
the respondents assigned to that interviewer and could
therefore be considered background noise in terms
of finding correlations within and among classes in
the data. By contrast, in a noninterpenetrated design,
it is possible that a correlating variable or class could
be overemphasized or underemphasized by the relative weight of interviewer bias across a nonrandom
assignment of respondents to interviewers.
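A minimal sketch of the random assignment step described above, assuming equal workloads of 20 cases per interviewer; the case numbers and interviewer identifiers are hypothetical.

# Randomly assign 100 respondent cases to 5 interviewers (20 cases each).
import random

random.seed(42)
respondents = list(range(1, 101))   # case numbers 1 through 100
interviewers = ["INT-1", "INT-2", "INT-3", "INT-4", "INT-5"]

random.shuffle(respondents)
assignment = {
    interviewer: sorted(respondents[i * 20:(i + 1) * 20])
    for i, interviewer in enumerate(interviewers)
}
for interviewer, cases in assignment.items():
    print(interviewer, cases)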
Challenges in Implementing an Interpenetrated Design
The use of an interpenetrated design can mitigate the inflation of statistical error engendered by systematic error in survey design, for surveys whose measurement tools or questions fail to adequately control for interviewer bias, in cases where such bias could affect findings.
It can be difficult to engineer an effective interpenetrated design, however. There may be situations,
particularly with large-scale face-to-face surveys,
when geography or interviewer expertise with a particular class of respondent reduces the design's capacity
to fully randomize the assignment of interviewer to
respondent. There may be some benefit to determining whether a mixed strategy might be appropriate,
with a partial randomization along respondent demographic, location, or cohort lines that are not believed
to be relevant to the hypothesis of the survey or to its
final analysis. As with any survey design, the question
of which variables should be considered relevant must
be approached with great caution.
Colm O'Muircheartaigh and Pamela Campanelli,
in their targeted meta-analysis of the British Household Surveys of the 1990s, concluded that there was
a significant increase in the inflation of variance
rooted in measurable interviewer effects that were
comparable in scope to the variance attributed to survey-design effects. Their findings, and the work of
other statisticians, suggest that a clear understanding
of the interviewer effect on the intraclass correlation
(rho) is necessary for effective survey designs, and
the use of an interpenetrated design can be quite
INTERRATER RELIABILITY
The concept of interrater reliability essentially refers to
the relative consistency of the judgments that are made
of the same stimulus by two or more raters. In survey
research, interrater reliability relates to observations
that in-person interviewers may make when they gather
observational data about a respondent, a household,
or a neighborhood in order to supplement the data gathered via a questionnaire. Interrater reliability also applies
to judgments an interviewer may make about the respondent after the interview is completed, such as recording
on a 0 to 10 scale how interested the respondent appeared
to be in the survey. Another example of where interrater reliability applies to survey research occurs whenever a researcher has interviewers complete a refusal report form immediately after a refusal takes place; at issue is how reliable the data are that the interviewer records on the
refusal report form. The concept also applies to the reliability of the coding decisions that are made by coders
when they are turning open-ended responses into quantitative scores during open-ended coding.
Interrater reliability is rarely quantified in these
survey examples because of the time and cost it
would take to generate the necessary data, but if it
were measured, it would require that a group of interviewers or coders all rate the same stimulus or set of
stimuli. Instead, interrater reliability in applied survey
research is more like an ideal that prudent researchers
strive to achieve whenever data are being generated
by interviewers or coders.
An important factor that affects the reliability of
ratings made by a group of raters is the quantity and
the quality of the training they receive. Their reliability can also be impacted by the extent to which they
are monitored by supervisory personnel and the quality of such monitoring.
A common method for statistically quantifying the
extent of agreement between raters is the intraclass
correlation coefficient, also known as Rho. In all of
the examples mentioned above, if rating data are not
reliable, that is, if the raters are not consistent in the
ratings they assign, then the value of the data to
researchers may well be nil.
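As an illustration only, a one-way intraclass correlation can be computed from the between-target and within-target mean squares of an analysis of variance; the rating data below are hypothetical (five stimuli, each scored by the same three raters).

# One-way intraclass correlation, ICC(1), from hypothetical ratings.
import numpy as np

ratings = np.array([   # rows = rated targets, columns = raters
    [8, 7, 8],
    [3, 4, 2],
    [9, 9, 8],
    [5, 6, 5],
    [2, 1, 2],
], dtype=float)
n_targets, k = ratings.shape

grand_mean = ratings.mean()
target_means = ratings.mean(axis=1)

# Between-target and within-target mean squares from a one-way ANOVA
ms_between = k * ((target_means - grand_mean) ** 2).sum() / (n_targets - 1)
ms_within = ((ratings - target_means[:, None]) ** 2).sum() / (n_targets * (k - 1))

icc = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
print(f"ICC(1) = {icc:.2f}")   # values near 1 indicate highly consistent raters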
Paul J. Lavrakas
See also Coding; Content Analysis; Open-Ended Question;
Refusal Report Form (RRF); ρ (Rho)
INTERVAL ESTIMATE
Interval estimates aim at estimating a parameter using
a range of values rather than a single number. For
example, the proportion of people who voted for a particular candidate is estimated to be 43% with a margin
of error of three (3.0) percentage points based on
a political poll. From this information, an interval estimate for the true proportion of voters who favored the
candidate would then consist of all the values ranging
from a low of 40% to a high of 46%which is usually
presented as (0.40, 0.46). If the interval estimate is
derived using the probability distribution of the point
estimate, then the interval estimate is often referred to
as a confidence interval where the confidence coefficient quantifies the probability that the process and
subsequent derivation will produce an interval estimate
that correctly contains the true value of the parameter.
While point estimates use information contained in
a sample to compute a single numeric quantity to estimate a population parameter, they do not incorporate
the variation in the population. Interval estimates, on
the other hand, make use of the point estimate along
with estimates of the variability in the population to
derive a range of plausible values for the population
parameter. The width of such intervals is often a function of the margin of error, which is itself a function
of the degree of confidence, the overall sample size,
and sampling design as well as the variability within
the population. In practice, intervals that are narrower
usually provide more specific and useful information
about the location of the population parameter as compared to wider intervals that are often less informative
or more generic (e.g., "the population proportion of voters in favor of a candidate is between 0 and 1" would be an interval estimate that is not informative).
Interval estimates can be derived for any population parameter, including proportions, means, totals, quantiles, variances, regression parameters, and so on. The generic format of an interval estimate for a population parameter θ can be written as θ̂ ± (DV × SE(θ̂)), where θ̂ is the point estimate of θ, SE(θ̂) is its estimated standard error, and DV represents a distribution value determined by the desired level of confidence (e.g., 1.96 from the standard normal distribution for 95% confidence).
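A minimal sketch of this generic form using the polling example above; the sample size of 1,000 is hypothetical but yields approximately the three-percentage-point margin of error cited.

# Interval estimate as point estimate ± (DV × SE); n = 1000 is hypothetical.
p_hat = 0.43   # point estimate: proportion favoring the candidate
n = 1000       # hypothetical number of respondents
dv = 1.96      # distribution value for 95% confidence (standard normal)

se = (p_hat * (1 - p_hat) / n) ** 0.5
lower, upper = p_hat - dv * se, p_hat + dv * se
print(f"95% interval estimate: ({lower:.2f}, {upper:.2f})")   # (0.40, 0.46)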
The following data from a hypothetical stratified sample (10 cases drawn from each of eight strata) illustrate the computation:

Stratum Size    Sample Size    Sample Mean    Sample Variance
150,000         10             $88.50         660.49
100,000         10             $108.90        900.00
50,000          10             $110.25        576.00
50,000          10             $100.00        784.00
50,000          10             $106.75        729.00
75,000          10             $176.40        1,296.00
25,000          10             $200.20        1,444.00
300,000         10             $98.70         529.00

Across all strata, the stratified estimate of the mean is $109.83, with an estimated variance of 13.26.
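The stratified estimates in the table can be reproduced with the standard formulas: the weighted mean is the sum over strata of W_h times the stratum sample mean, and its estimated variance is the sum over strata of W_h squared times the stratum sample variance divided by the stratum sample size, with W_h the stratum's share of the population. The sketch below does this computation; the finite population correction, negligible here, is ignored.

# Stratified mean and its estimated variance from the table's figures.
stratum_sizes = [150_000, 100_000, 50_000, 50_000, 50_000, 75_000, 25_000, 300_000]
sample_means = [88.50, 108.90, 110.25, 100.00, 106.75, 176.40, 200.20, 98.70]
sample_vars = [660.49, 900.00, 576.00, 784.00, 729.00, 1296.00, 1444.00, 529.00]
n_h = 10   # sample size in every stratum

N = sum(stratum_sizes)
weights = [size / N for size in stratum_sizes]

mean_st = sum(w * m for w, m in zip(weights, sample_means))
var_st = sum(w**2 * s2 / n_h for w, s2 in zip(weights, sample_vars))
print(f"stratified mean = {mean_st:.3f}, variance = {var_st:.2f}")
# -> stratified mean = 109.825 (i.e., $109.83), variance = 13.26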
INTERVAL MEASURE
An interval measure is one where the distance between
the attributes, or response options, has an actual meaning and the intervals are equal. Differences in the
values represent differences in the attribute. For example, the difference between 3 and 4 is the same as the
difference between 234 and 235. Interval measures
have fixed measurement units, but they do not have
a fixed, or absolute, zero point. Because of this, it is
technically not correct to declare that something is so
many times larger or smaller than something else,
although this often is done nonetheless.
Unlike other less sophisticated levels of measurement (e.g., nominal and ordinal measures), interval
measures have real meaning. The relationship between
the value and attribute is meaningful. For instance,
temperature (Fahrenheit or Celsius) can be divided into
groups of one degree and assigned a different value
for each of the intervals such that anything from 50
degrees to 50.99 degrees has a value of 50. The distance between 50 and 51 degrees has an actual value,
one degree. On the other hand, one cannot say that 90
degrees is twice as hot as 45 degrees because there is
not an absolute zero.
Within social science research, interval measures are
not particularly common because there are only a limited number of attributes that can take on an interval
form. When used, they tend to be based on constructed
James W. Stoutenborough

See also Level of Measurement; Likert Scale; Mean; Median; Mode; Nominal Measure; Ordinal Measure; Ratio Measure
INTERVIEWER
Interviewers are survey staff who contact the people
from whom the study seeks to gather data (i.e., respondents) to explain the study, encourage them to participate, and attempt to obtain a completed interview. Once
a sample member agrees to participate in a survey, the
interviewer is then responsible for administering the
survey questions (i.e., survey instrument) to the respondent. At times, the skill sets necessary to successfully
complete these two tasks (sample recruitment and data collection) differ in conflicting ways. In encouraging
participation, interviewers must adapt (i.e., tailor) their
approach to gain cooperation based on their interaction
with the respondent, whereas for administering the
questionnaire in most surveys they are encouraged to
use a standardized, scripted approach.
Traditionally, there have been two types of survey
interviewers: telephone interviewers and field interviewers. Telephone interviewers administer survey
instruments over the telephone, whereas field interviewers go to a respondent's home (or business for
business surveys) to contact respondents and complete
the in-person interview face-to-face. More recently
and at a growing pace, interviewers are also being
used to provide technical assistance (e.g., help desk)
to self-administered surveys, such as mail or Web.
This entry presents the responsibilities of interviewers, along with the various skills needed and the
training and supervision of interviewers. Next, this
entry discusses common interview techniques and the
impact of interviewers on the data collected. Lastly,
this entry addresses the importance of interviewers to
the data collection effort.
Importance of Interviewers to the Data Collection Effort
As the main visible representatives of the survey,
and the people who are actually collecting the data,
interviewers play a crucial part in data collection
efforts. Interviewers are often the only contact that
a respondent has with the survey team and as such
need to provide a positive, professional image of the
study. Interviewers are also a key part of obtaining an
acceptable response rate by convincing respondents to
participate in the survey and by conducting refusal conversions with respondents who initially refuse to participate. If an interviewer is indifferent or ineffective in obtaining cooperation from respondents, this can result in significant refusals or passive noncooperation, which can be detrimental to a study's response rate and data
quality.
Lisa Carley-Baxter
See also Computer-Assisted Personal Interviewing (CAPI);
Computer-Assisted Telephone Interviewing (CATI);
Conversational Interviewing; Gatekeeper; Interviewer
Characteristics; Interviewer Effects; Interviewer
Monitoring; Interviewer-Related Error; Interviewer
Training; Interviewer Variance; Nondirective Probing;
Refusal Avoidance; Refusal Conversion; Respondent;
Respondent-Interviewer Rapport; Standardized Survey
Interviewing; Survey Costs; Tailoring; Verbatim
Responses
INTERVIEWER CHARACTERISTICS
Interviewer characteristics refer to the personal attributes of the interviewer who is conducting a survey
with a respondent. These attributes may include physical attributes, such as gender, age, and voice qualities, and attitudinal or behavioral attributes, such as
confidence or friendliness. Both visual and audio cues
are available to respondents in face-to-face surveys,
but respondents have only audio cues in telephone
surveys. The characteristics of the interviewer introduce additional factors into the interaction between
interviewer and respondent that may affect data collection and data quality. Research has shown that
interviewer characteristics affect unit nonresponse,
item nonresponse, and response quality.
Physical Characteristics
Physical characteristics of the interviewer, such as age,
gender, or race, may be used by the respondent in the
decision whether to agree to the survey request and to
set expectations of the interview experience. Studies on
the effects of these attributes on interview outcomes
show mixed results. There is evidence that older interviewers are more likely to be consistent in administering surveys and introduce less response variation. No
consistent effects of gender have been found on data
quality, although female interviewers, on average,
achieve higher response rates. There has been considerable study of interviewer race effects. The matching of
characteristics of the interviewer to the respondent has
been shown to improve respondent cooperation and
data quality. Respondents appear to be more comfortable and thus cooperative with someone similar to
themselves, especially in interviews on sensitive topics
such as inequality and racial discrimination.
In telephone interviews, interviewer characteristics
can only be conveyed through the audio interaction with
the respondent. Physical characteristics that can be perceived over the phone include gender, age, and possibly,
race and ethnic origin, as well as voice characteristics
such as loudness and rate of speech. These characteristics can be measured both acoustically and through
subjective perception. Acoustic measures of voice properties that have been studied include fundamental
frequency of the voice sound waves, the variation in
fundamental frequency, and measures of rate of speech
Effects
Interviewer characteristics contribute to defining the
social interaction of the interviewer and respondent.
Interviewer variation exists and contributes to survey
outcomes. The mechanisms that define the effect of the
characteristics are dependent on other survey conditions, including the respondent and the survey design.
Barbara C. O'Hare
See also Interviewer Effects; Interviewer-Related Error;
Interviewer Variance; Respondent-Interviewer Rapport
INTERVIEWER DEBRIEFING
Interviewer debriefing is a process used to gather
feedback from telephone and field (in-person) interviewers regarding a particular survey effort. As the
project staff members who most closely interact with
respondents, interviewers provide a unique perspective on how questions are answered by respondents
and which questions may be difficult to ask or answer.
They also can provide other, general observations
about the administration of the survey instrument.
These debriefing sessions can be held either in-person
(as is usually the case for telephone interviewer
debriefing sessions) or over the phone (as is often the
case for field interviewer debriefing sessions).
Prior to conducting an interviewer debriefing,
a member of the project staff (either the project director or the person who has managed the data collection effort) usually creates a detailed questionnaire for
interviewers to complete prior to the session. Most of
the questions should be short-answer (closed-ended)
but also should include space so that the interviewer
can provide feedback and concrete examples from their
own experience in administering the survey on specified topics. Providing an additional open-ended question at the end of the questionnaire can also encourage
the interviewer to comment on any other circumstances
not covered in the debriefing questionnaire.
The debriefing questionnaire is given to interviewers
by their supervisor, and they are instructed to complete
it prior to the debriefing session. Interviewers are usually given several days to complete the questionnaire
and are encouraged to initially spend a concentrated
amount of time filling out the questionnaire and then
continue adding to it in subsequent days as additional
examples or comments occur to them. This process
allows interviewers to spend multiple days thinking
about their comments and allows them to relate those
comments that are important to them. Further, quiet
interviewers may be overlooked or not speak up much
INTERVIEWER EFFECTS
In many surveys, interviewers play an important role in
the data collection process. They can be effective in
gaining cooperation of the sample persons, helping clarify survey tasks, or motivating the respondent to provide complete and accurate answers. Thus, interviewers
can contribute to data quality, but they can also contribute to measurement error. Interviewers can affect
respondents' answers through their mere presence as
well as their behaviors when administering the survey.
There are several ways in which interviewers seem
to influence respondents' answers. First, the presence
of an interviewer can stimulate respondents to take
social norms into account when answering a survey
question. Pressure to conform to social norms can lead
to the underreporting of socially undesirable behavior
and the overreporting of socially desirable behavior.
Second, observable interviewer characteristics, such
as age, gender, or race, can affect many stages of the
answer process, for example, by changing the salience
of the question topic and therefore altering the retrieval
process or by influencing the respondent's judgments
of which answers would be socially appropriate. Third,
the interviewer's verbal and nonverbal behavior can
also affect respondents' answers. For example, the interviewer's feedback, facial expressions, or rate of
speech can be taken by respondents as reflecting (dis)approval of their answers or how important the interviewer thinks the question is. Finally, the interviewer
can make errors when delivering and recording the
answers to a question. These errors are particularly
problematic if they are systematic, for example, not
reading certain questions exactly as worded, delivering
them incorrectly, omitting necessary probes, or neglecting some response categories.
It is important to note that the effects of interviewers on respondents' answers are not equally
strong across all types of questions. Social norms
apply only to certain behavioral and attitudinal questions. Interviewers' observable characteristics play
a role only if they are related to the question content.
Early studies on interviewer effects have shown, for
example, race-of-interviewer effects for racial items
and gender-of-interviewer effects in gender-related
attitude questions but no effects with attitude questions related to other subjects. Similarly, the effects of
interviewer behavior also vary by question type. They
are more likely to occur if respondents are forced to
answer questions about unfamiliar topics, questions
about topics that are not salient, questions that are difficult to understand, or questions that leave room for
differing interpretations to be elicited by the interviewer. Interviewer errors in question delivery are
INTERVIEWER MONITORING
Interviewer monitoring is a process of observing and
evaluating the performance of an individual who is
conducting an interview to gather survey data. Interviewer monitoring is typically conducted in an effort
to reduce interviewer-related measurement error by
allowing the researcher to understand where in the
interview mistakes are being made, with whom they are being made, and under what circumstances.
Interviewer monitoring is also necessary as a deterrent
to interviewer falsification. If the interviewer is made
aware that he or she will be monitored and is kept
blind as to when the monitoring will occur, the temptation to falsify data can be greatly reduced. This
entry contains an overview of the role of interviewer
monitoring, followed by a summary of the types of
data collected while monitoring, the ways monitoring
data can be used to improve the quality of surveys,
and finally a summary of the monitoring techniques
employed in telephone and face-to-face interviews.
rather than the data gathered while monitoring interviewers. These are often reported in conjunction with
the production-based observations mentioned earlier.
Further Readings
Steve, K., Burks, A. T., Lavrakas, P. J., Brown, K., & Hoover, B. (2008). Monitoring telephone interviewer performance. In J. Lepkowski, C. Tucker, M. Brick, E. de Leeuw, L. Japec, P. J. Lavrakas, et al. (Eds.), Advances in telephone survey methodology (pp. 401–422). New York: Wiley.
INTERVIEWER MONITORING FORM (IMF)
An interviewer monitoring form (IMF) is a framework
or set of guidelines used to facilitate interviewer monitoring. A carefully developed IMF is central to the
development of standardized processes for measuring
interviewer-related error and conducting interviewer
training and debriefing. The primary purpose for
developing an IMF is to minimize interviewer effects
and increase interviewer productivity. This entry contains a brief summary of what is known about interviewer monitoring forms, why they are used, and
what they should include.
It is widely accepted that interviewer behavior can
be a significant source of measurement error in surveys and that effort to observe and reduce this error is
a necessary element of conducting survey research
whenever interviewers are used to collect data. Interviewer monitoring forms are typically used to monitor
the performance and behavior of telephone interviewers. This is because telephone interviewers are centrally located and can be observed
from a remote location while the interview is taking
place. Traditionally it has been up to the individual
organization conducting the research to develop its
own IMF. Over the past 40 years or more, a wide
variety of methods have been developed.
Although many different approaches to monitoring
the performance of telephone interviewers have been
proposed, there currently exists no standard, widely
accepted IMF through which interviewer performance
can be assessed and compared across interviewers, studies, or research organizations. This is due in large part
to the complexity of the interview process and the wide
variety of purposes for conducting telephone interviews.
It also, in part, reflects the diversity of real-world
motivators that influence the nature of the IMF that an
individual organization might develop. Still, there exists
a growing body of research into understanding what an
IMF should contain and how it should be used.
INTERVIEWER NEUTRALITY
Interviewer neutrality occurs when an interviewer provides no indication of desired responses (remains unbiased) during the interview process. Interviewers are
trained to betray no opinion about survey questions to
minimize interviewer-related error that occurs when
responses are influenced by respondent perception of
what the interviewer indicates is an appropriate answer.
The process of collecting data using interviewers is
designed to obtain valid information (i.e., a respondent's accurate responses), but to be effective the information must be collected in a consistent and neutral
manner that minimizes bias. Neutral administration
of surveys requires the training of interviewers to not
reveal their own opinions or preferences, either
verbally or nonverbally, which could induce respondents to provide inaccurate answers in response to perceived interviewer preferences. Specifically, rapport-building behaviors, interviewer feedback, and respondent vulnerability to social desirability and acquiescence need to be considered.
Interviewer neutrality can be accomplished by
training interviewers to gather data in a nonjudgmental
manner and to use a normal tone of voice throughout
the interview process. It is important that interviewers
avoid using words or nonverbal cues that imply criticism, surprise, approval, or disapproval. Verbal
behavior such as "Yes, I agree" or "I feel the same way," or nonverbal behavior such as smiling, frowning, giving an intense look, or an extended pause may
be interpreted by the respondent as approval or disapproval of an answer. Although interviewers are
encouraged to establish rapport with respondents to
promote respondent motivation, the interviewer must
be continually aware of the risk of expressing personal opinions or preferences. When interviewers provide verbal or nonverbal feedback throughout the
interview, it is vital to avoid using any feedback techniques that may be interpreted as approval or disapproval. Interviewers should avoid expressing personal
opinions on the topics covered in the survey, as well
as communicating any personal information that the
respondent may use to infer the interviewer's opinions. The validity of the data can be threatened if
respondents are aware of interviewer opinions or preferences. Because the goal of interviews is to provide
an environment in which respondents feel comfortable
reporting accurate answers, it is critical that the interviewer's opinions or preferences do not influence the
respondent in any way.
Finally, social desirability (wanting to provide
socially acceptable answers) and acquiescence response
bias (the tendency to agree with perceived interviewer opinions) can threaten the validity of the
data. Social desirability bias occurs when respondents answer questions to present themselves in
a favorable light (providing answers they feel are
most socially approved). Acquiescence response bias
occurs when respondents agree with statements from
the questions that are spoken by the interviewer
regardless of content and can lead to responses that
merely reflect agreement with what the interviewer
is reading rather than the respondents own opinions.
Training for interviewer neutrality should seek to
minimize the effects of social desirability and
INTERVIEWER PRODUCTIVITY
Interviewer productivity refers to the ways of measuring what is achieved by telephone and in-person survey
interviewers when they work to (a) gain cooperation
from sampled respondents and (b) complete interviews
with these respondents. Measuring interviewer productivity is a major concern for those conducting and
managing surveys, for several reasons. Knowledge of
productivity is essential to survey budgeting and developing realistic estimates of survey costs. Managing
a survey requires an understanding about how many
completed interviews, refusals, noncontacts, ineligibles,
and callbacks can be expected for a given survey. Productivity information is often used to reward interviewers who are performing well or to retrain those
who are not being productive (enough). Interviewer
productivity information is also a necessary aspect of
planning and scheduling the number of interviewers
needed for fielding a survey and for monitoring survey
progress. It is also important to communicate productivity expectations to interviewers in advance of the
start of data collection so they know how to perform
adequately.
Literature
Interviewer productivity is discussed in the literature on
survey methods. Don Dillman has noted the importance
Performance Evaluations
Interviewers want to know how their performance is
evaluated, and productivity information is a useful way
Future Directions
Most survey organizations are concerned about interviewer productivity and related issues, including declining response rates, a lack of trained interviewers, and the problems of managing a part-time workforce of interviewers. A survey
of telephone survey organizations conducted in 2007
found that almost all survey call centers (84%) regularly collect productivity information on interviewers.
However, only 54% reported that they actively use this
information as a way to analyze and to make decisions
about interviewers. Standards of interviewer productivity differ among survey research organizations and are
dependent on the specific goals of each organization
and of each survey. But no survey organization can survive if it does not pay attention to and regularly measure interviewer productivity. In the future, interviewer
productivity measures will become even more important, as survey costs increase and organizations look for
ways to counteract ever-declining survey response rates.
John Tarnai and Danna L. Moore
INTERVIEWER-RELATED ERROR
Interviewer-related error is a form of measurement
error and includes both the bias and the variance that
interviewers can contribute to the data that are gathered in face-to-face and telephone surveys. In interviewer-administered surveys, although interviewers
can contribute much to the accuracy of the data that
are gathered, they also can contribute much of the
nonsampling error that finds its way into those data.
The methodological literature includes startling
examples of measurement error due to interviewer mistakes. In 1983, an interviewer's incorrect recording of one wealthy respondent's income resulted in the erroneous report that the richest half percent of the U.S.
population held 35% of the national wealth. This finding, widely publicized, was interpreted to show that
Reaganomics favored the wealthy. When the error was
detected and corrected, the actual estimate was 27%, only a slight increase from the 1963 figure. Most
to open questions, and any other more easily quantified measure related to error. Some questions will
show high error rates across all interviewers. These
are the "banana peels" in the questionnaire, best
addressed by redesign and retraining across-the-board.
Others will show high variance in error rate by interviewer. These are the interviewer-specific correlated
errors that can be addressed through targeted retraining or more draconian solutions.
The most common approach to measurement of
interviewer-related error involves observing field
interviewing and monitoring telephone interviews.
Interviewer observation can cover only a minimum
of actual interview time (even in telephone centers)
and is expensive, labor intensive, and difficult to analyze in real time. Quantitative coding of interviewer
behavior is used in telephone survey labs to offer
corrective feedback, but, in part because of the disconnect between day-to-day managers and principal
investigators, there is little evidence that these protocols protect the integrity of key research hypotheses.
INTERVIEWER TRAINING
Interviewer training refers to the instruction that
survey research interviewers receive at various stages
of their employment, and in various ways, to make
it more likely that they will perform their jobs effectively. It is absolutely essential for achieving high-quality survey samples, interviews, and resulting data.
Organizations that hire people to conduct standardized
survey interviews understand that one of the most
critical success factors is how well interviewers are
trained and managed. The purpose of training interviewers is to teach the principles, skills, and basic
procedures needed to conduct telephone or face-toface interviewing in a manner that achieves highquality, reliable, and valid information for research.
Training Standards
Interviewer training is an important way to convey to
interviewers what is expected of them and how to adequately perform the job of interviewing in a standardized
way. Survey training activities are very important to
survey quality and are considered a best practice for
survey organizations. The International Organization for Standardization has established a set of standards for market,
opinion, and social research that includes a requirement
for at least 6 hours of training for new telephone interviewers. The goal of this requirement is to ensure that
all interviewers receive a minimum amount of training
in standardized survey interviewing, thereby ensuring
better quality survey results.
Danna L. Moore and John Tarnai
See also Computer-Assisted Personal Interviewing (CAPI);
Computer-Assisted Telephone Interviewing (CATI);
Confidentiality; Interviewer Monitoring; Interviewer
Productivity; Interviewer-Related Error; Nonresponse;
Nonresponse Error; Probing; Refusal Avoidance
Training (RAT); Refusal Conversion; Role Playing;
Survey Ethics; Tailoring
INTERVIEWER VARIANCE
Interviewer variance describes the part of the overall
variability in a survey statistic that is associated with the
interviewer. Clusters of respondents interviewed by the
same person tend to have more similar responses than
do clusters of respondents interviewed by different interviewers. This cluster effect can appear, for example, if
an interviewer uses inappropriate or inconsistent probing
techniques, has idiosyncratic interpretations of questions
and rewords them accordingly, or differs in the way he
or she reads answer categories. In addition, interviewer-specific interactions between the interviewer and respondent can lead to an intra-interviewer covariance term
that contributes to the variance of the estimate.
The effect of interviewers on responses can increase
the variability of survey estimates in a way parallel to
the effect of clustered samples. The standard errors of
such survey estimates are inflated compared to those
computed for a simple random sample. Thus, ignoring
the clustering of respondents within interviewers can
yield misleading results in significance tests or in the
coverage rates of confidence intervals. Most statistical
packages use linearization or replication methods to correct the variance estimation for different kinds of sampling designs. To account for an interviewer clustering
effect, those procedures require either an interviewer
identification variable or appropriate replicate weights
created by the data collector as part of the data set.
The overall variance of the respondent mean is inflated by interviewer variance according to the function deff = 1 + ρ(w − 1), where w is the average number of interviews conducted by individual interviewers and ρ is the intraclass correlation coefficient among responses produced by a common interviewer. If all respondents interviewed by the same interviewer answered in exactly the same way, ρ would be equal to 1. The size of ρ reported by various researchers has shown substantial variation among surveys and survey variables. The average value for ρ in many (mostly telephone) studies is 0.01, but values of about 0.05 are not uncommon, while for some surveys and items a ρ as high as 0.2 has been observed. These seemingly small values can have a large impact. If the average workload for an interviewer in a survey is 100, a ρ of 0.01 yields deff = 1 + 0.01(100 − 1) = 1.99, nearly doubling the variance of the estimate relative to a simple random sample.
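For illustration, a minimal Python sketch of this computation, using hypothetical values of ρ and interviewer workload:

    # Illustrative sketch of the interviewer design effect deff = 1 + rho*(w - 1)
    # and the resulting effective sample size; all values below are hypothetical.

    def interviewer_deff(rho: float, workload: float) -> float:
        """Design effect from interviewer clustering: deff = 1 + rho * (w - 1)."""
        return 1.0 + rho * (workload - 1.0)

    def effective_sample_size(n: int, rho: float, workload: float) -> float:
        """Respondents divided by deff: the simple-random-sample equivalent."""
        return n / interviewer_deff(rho, workload)

    if __name__ == "__main__":
        n = 1000  # hypothetical number of completed interviews
        for rho in (0.01, 0.05, 0.2):
            for w in (10, 50, 100):
                print(f"rho={rho:<5} workload={w:<4} "
                      f"deff={interviewer_deff(rho, w):5.2f} "
                      f"n_eff={effective_sample_size(n, rho, w):7.1f}")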
INTERVIEWING
Survey interviewing is typically a formal, standardized conversation between a person asking questions
(the interviewer) and a person giving answers to those
questions (the respondent). The respondents are selected
because they belong to a population of interest. The
population can be very broad (e.g., residents of a city
or state, registered voters) or very narrow (e.g.,
people who have been diagnosed with a particular
disease; females who smoke cigarettes, have less
than a high school education, and watch the local
news on a specified television station). In addition to
asking questions, the interviewers may also play other
roles, such as gaining initial cooperation from the respondents or showing respondents how to answer self-administered questionnaires on paper or by computer. While some survey data may be collected by self-administration (e.g., mail or Internet-based surveys), many surveys, particularly long, complicated ones, require the use of an interviewer. Thus interviewing is
an important aspect of survey research. This entry provides an overview of factors relevant to interviewing,
many of which are discussed in greater detail in other
entries in this volume.
Interviewer Effects
Interviewing can cause interviewer effects, a subset
of a larger problem called measurement error. The
primary causes of interviewer effects are improper
administration of the questionnaire and the effects of
interviewer characteristics themselves. Interviewers can
negatively affect the administration of a questionnaire
in many ways, such as by misreading questions and
deviating from a standardized script. Improper questionnaire administration can be held to a minimum
through the use of professional, well-trained interviewers. The way to minimize interviewer-related error
is to standardize the interviewing as much as possible.
Interviewer Characteristics
Interviewer characteristics are those things about a particular interviewer that may, in some circumstances, affect
how a respondent will answer survey questions. They
include gender, age, race, ethnicity, and in some countries, perceived or actual class, caste, or clan. Interviewer
characteristics can affect respondent-interviewer rapport and respondent answers, positively or negatively. In
some cases, if there is a major concern that an interviewer characteristic may affect the respondent, interviewers may be matched with respondents on that
characteristic. For example, only female interviewers
would interview female respondents or only older male
interviewers would interview older male respondents.
Interviewer Training
The quality of survey interviewing is dependent on good
interviewer training. Surveys conducted by professional
survey organizations use professional interviewers who
have been trained extensively. At the other extreme,
some surveys are conducted by volunteers who have
been given little or no instruction. Regardless of
whether or not interviewers are paid, it is essential that
they are trained to do the job expected of them. While
a large part of interviewer training relates to the administration of the questionnaire (e.g., asking the questions
as worded; reading or not reading answer categories, as
directed; knowing when and how to probe), there are
many other important aspects to this training. They
include (a) identifying the appropriate sample unit (e.g.,
a household, a telephone number, an individual), (b)
obtaining and maintaining respondent cooperation, and
(c) following prescribed rules regarding contacts (number of attempts, days and times of attempts, etc.). In addition to the information in the questionnaire's introduction, interviewers should be given backup information, including fallback statements, about the study so that they can answer potential respondent questions.
Such questions may include which firm is conducting
the survey, who is sponsoring or funding the survey,
why the household or respondent has been selected,
approximately how long the interview will take,
whether or not the information will be confidential, and
whom to contact if they want further information.
Regardless of the length of the questionnaire and
the experience of the interviewers, the person conducting the survey or supervising the data collection should
prepare a written interviewer training manual. The
manual should be used for initial training and also be
available for review throughout the data collection
period. In fact, if certain instructions, probes, lists, and
so forth, are important and may need to be referred to
during an interview, they should be either printed out
separately or available on the computer screen, so that
an interviewer can refer to them instantly.
Most important is to make clear to the interviewers what the conventions are on the questionnaire.
Interviewers should be neutral, not only in their words, but also in their nonverbal
behavior and dress. An interviewer who personally
disapproves of a respondent's answers may communicate that feeling through raised eyebrows, a frown, or
tone of voice. Such nonneutral behaviors can significantly bias subsequent responses.
While neutrality is a must, interviewers should be
supplied with information to assist respondents after the
interview if circumstances warrant that. For example, if
a survey of smokers is about smoking cessation, interviewers should have a telephone number available for
quit-smoking help. The survey sponsor may decide to
offer the information only to respondents who ask for it
or to all respondents at the end of the interview. Information on help with domestic violence should be available if relevant to the topic of the questionnaire and if
the respondent has mentioned such violence. In cases
like these, the assistance should be offered to all respondents who mention the problem, whether they request it
or not, but the offer should come only after the interview has been completed, so as not to bias subsequent
questions.
Interviewer Monitoring, Supervision, and Validation
Interviewer training and supervision do not end with the training held prior to the start of data collection. It is important to continue to supervise interviewers, including monitoring and validating their work and retraining them when necessary. Interviewing that is conducted from a centralized facility offers the ability for greater supervision
and monitoring. Telephone calls may be monitored
and, with CATI surveys, monitoring can include following along with the questionnaire as viewed on
another computer screen as the interview is conducted.
With the appropriate computer connections, it is possible for a sponsor or client to monitor interviews from
another site. Interviews conducted at other sites (e.g., interviewers calling from their home phones, face-to-face interviews conducted in homes, malls, or clinics)
offer fewer opportunities to monitor and supervise,
making training and validation even more important.
Validation is the term survey researchers use for checking interviews that have been conducted, to verify to the extent possible that they have actually been conducted, conducted correctly, and with the appropriate respondents. The major way to validate an interview is to recontact the respondent (usually by telephone).
Calling Rules
Calling rules (or contact rules) refer to the rules of data
collection set up by the survey administrator. They
include the dates to begin and end the data collection,
the days of the week and times of day that calling will
be allowed, how many contact attempts each potential case should receive (typically a maximum number), and
how to code the outcome of each contact attempt.
Except in telephone surveys where a predictive dialer
is used to assign an already connected call to an interviewer, interviewers can play an integral part in the
success of the study by reviewing the history of each
case and deciding the best way to proceed before the
next contact is attempted. An important part of this
process is to gather information from a prospective
respondent or household about the best day of the
week and time of day to attempt another contact. The
more contact attempts made, the more likely a potential
respondent will turn into an actual respondent.
Diane O'Rourke
See also Audio Computer-Assisted Self-Interviewing (ACASI);
Bias; Bilingual Interviewing; Calling Rules; Closed-Ended
Question; Computer-Assisted Personal Interviewing
(CAPI); Computer-Assisted Self-Interviewing (CASI);
Computer-Assisted Telephone Interviewing (CATI);
Confidentiality; Conversational Interviewing; Face-to-Face
Interviewing; Fallback Statements; Falsification; Field
Period; Field Work; Interviewer; Interviewer Characteristics;
Interviewer Effects; Interviewer Monitoring
INTRACLUSTER HOMOGENEITY
Intracluster (or intraclass) homogeneity is a concept related to the degree of similarity between elements in the same cluster. The intracluster (or intraclass) correlation coefficient, ρ, measures the degree of homogeneity among population elements within the sampling clusters. Intracluster homogeneity is computed as the Pearson correlation coefficient between pairs of elements that are in the same cluster.
In terms of the variance components in an analysis of variance (ANOVA), intracluster homogeneity measures the extent to which the total element variance in the population is due to the between-cluster variance. In other words, ρ measures intracluster homogeneity in terms of the portion of the total variance that is attributable to cluster membership. When there is complete homogeneity within clusters, the between-cluster variance accounts for all the variance in the population and ρ is equal to 1.0. When there is complete heterogeneity within clusters, the within-cluster variance accounts for all the variance in the population and ρ is a negative number equal to −1/(m − 1), where m is the size of the cluster. Finally, when the clusters are composed of random elements from the population with no relationship to each other, ρ is zero.
In practice, sampling clusters often are naturally occurring groupings such as city blocks, census tracts, schools, hospitals, households, and so on. With respect to many population
characteristics (demographic, socioeconomic, political, behavioral, epidemiological, health care, and the
like), elements in the same clusters tend to be more
similar than those in different clusters, resulting in
a positive correlation among elements in the same
clusters. By confining the sample to a subset of clusters, cluster sampling tends to reduce the spread and
representativeness of the sample. Compared to a simple random sample of the same size (in terms of the
number of elements), a cluster sample is more likely
to lead to extreme estimates and hence increased sampling variance.
Intracluster homogeneity is an important tool for
measuring sample efficiency and for survey planning.
The efficiency of a complex sample design may be
measured by the design effect (deff), defined as the
ratio of the variance under the complex design to the
variance of a simple random sample of the same size.
For a complex design that involves unequal selection
probabilities, stratification, and clustering, the design
effect may be decomposed into three multiplicative
components: (a) weighting effect, (b) stratification
effect, and (c) clustering effect. The clustering effect
of a two-stage cluster sample may be expressed as
deff = 1 + ρ(m − 1),
where m is the size of the subsample selected from
each cluster. When the subsample size differs across
clusters, the average subsample size may be used as
an approximation. The clustering effect of a threestage cluster sample may be approximated by
deff = 1 + m1 1m2 1 + m2 12 ,
where m1 is the average number of secondary
sampling units (SSUs) selected within each primary
sampling unit (PSU), m2 is the average number of
ultimate sampling units selected from each SSU, 1 is
the intracluster homogeneity between SSUs within the
same PSU, and 2 is the intracluster homogeneity
between the population units within the same SSU.
An estimated design effect can then be used to determine the sample size, effective sample size, and other
relevant design parameters given the variance requirements and/or cost constraints of the survey.
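For illustration, the two design-effect approximations can be computed directly; the design parameters below are hypothetical:

    # Illustrative sketch of the design-effect approximations given above;
    # variable names (m1, m2, rho1, rho2) follow the entry's notation.

    def deff_two_stage(m: float, rho: float) -> float:
        """deff = 1 + rho * (m - 1) for a two-stage cluster sample."""
        return 1.0 + rho * (m - 1.0)

    def deff_three_stage(m1: float, m2: float, rho1: float, rho2: float) -> float:
        """deff = 1 + (m1 - 1) * m2 * rho1 + (m2 - 1) * rho2."""
        return 1.0 + (m1 - 1.0) * m2 * rho1 + (m2 - 1.0) * rho2

    if __name__ == "__main__":
        # Hypothetical design: 10 SSUs per PSU, 8 respondents per SSU.
        deff = deff_three_stage(m1=10, m2=8, rho1=0.02, rho2=0.05)
        n = 4000  # nominal number of respondents
        print(f"deff = {deff:.2f}, effective n = {n / deff:.0f}")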
Y. Michael Yang
INTRODUCTION
The survey introduction is a key step that affects the survey's response rate; therefore, interviewers need special training in it. A key part of an interviewer's role is to gain cooperation from the respondent. This opportunity for enlisting cooperation occurs within a time period of variable length, which starts with the interviewer's initial contact with the sampled unit and continues until the selected respondent agrees to participate in the survey or provides a definitive no. Depending on the survey design, this conversation is often conducted over multiple callbacks. This time period has traditionally been called the doorstep introduction. In this entry, it is referred to as the survey introduction.
Interviewers' abilities to gain cooperation during the introduction to a survey vary greatly. For example, in exploring interviewers' survey introduction skill versus respondents' reluctance, Pamela Campanelli and her colleagues found that interviewers' skill could affect response rates by 13 to 20 percentage points for in-person surveys.
This entry details the five parts of the survey introduction and discusses the differences and similarities
of introductions for surveys of establishments and surveys of households.
Parts of Introduction
The survey introduction can be thought of as being made up of five parts, which generally occur in the following order.
2. Advance Letters
Be Prepared
Interviewers who are prepared will increase credibility, and those who are not prepared will lose credibility. Take the following survey introduction excerpt recorded by Campanelli and her colleagues from an actual interview.
Respondent: What's the research about?
Interviewer: It's called political tracking. If you come to any question you're not sure of or don't know how to answer you just leave it, put don't know, things like that.
Respondent: Right, so it's about politics?
Interviewer: Well just how you think, what you think, that sort of thing.
Eleanor Singer and her colleagues found that interviewers' expectations about the ease of persuading
respondents to be interviewed were significantly related
to their response rates.
Every respondent is an individual, so general standardized replies to respondent concerns are the least
effective. Robert M. Groves and Mick P. Couper have
found that to be successful, interviewers need to
remain flexible and be fully prepared to tailor their
manner and explanation for each and every situation.
It also is recommended that interviewers analyze how their reply is working and, if it isn't proving successful, to try a different one.
Be Succinct
Good interviewers make their replies clear, coherent, and to the point; thus, providing long rationales should be avoided. Good interviewers should also acknowledge the respondent's viewpoint and never argue directly with the respondent during the introduction (or at any time, for that matter).
Maintain Interaction
Respondent reluctance is often specific to a particular situation. Good interviewers need to watch for
signs and back off before receiving a final no. It is
much better to retreat and leave the door open for
another contact than to get an outright refusal.
Don't Ask Questions That Can Be Answered No
Surveys of Establishments
Surveys of establishments that are not directed to a named individual within the company can have response rates 30 percentage points lower than those that are. Exploratory telephone calls are often used to
find out this key information. Once this is known, the
respondent may be contacted by telephone with an
invitation to participate, and this would follow closely
the steps for household surveys (outlined earlier in
this entry). In other cases, a self-administered mode
may be used where the respondent is sent a postal
questionnaire (and may be given a Web option) and
telephone contact is limited to a follow-up method
after several reminders are sent.
Pamela Campanelli
See also Advance Contact; Designated Respondent;
Gatekeeper; Household Informant; Interviewer
Neutrality; Refusal Avoidance; Respondent-Interviewer
Rapport; Screening; Tailoring; Within-Unit Selection
INVERSE SAMPLING
Inverse sampling is an adaptive sampling technique credited to J. B. S. Haldane's work in the 1940s. Under many study designs, it is desirable to estimate the frequencies of an attribute in a series of populations, each containing the attribute at an unknown rate. Rather than fixing the number of draws in advance, inverse sampling continues sampling until a predetermined number of observations possessing the attribute has been obtained.
Applications
Applications for inverse sampling can have broad
appeal. One such application is the ability to determine the better of two binomial populations (or the
one with the highest probability of success). For
example, in a drug trial, where the outcome is success
or failure, inverse sampling can be used to determine
the better of the two treatment options and has been
shown to be as efficient as, and potentially less costly than, a fixed sample size design. Milton Sobel and
George Weiss present two inverse sampling techniques to conduct such an analysis: (1) vector-at-a-time
(VT) sampling and (2) play-the-winner (PW) sampling. VT inverse sampling involves two observations,
one from each population, that are drawn simultaneously. Sampling continues until r successful observations are drawn from one of the populations. PW
inverse sampling occurs when one of the populations is
randomly selected and an observation is randomly
selected from that population. Observations continue to
be selected from that population until a failure occurs,
at which point sampling is conducted from the other
population. Sampling continues to switch back and forth
between populations until r successful observations are obtained from one of the populations.
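A minimal Python simulation of the play-the-winner rule as described above, with assumed (hypothetical) success probabilities for the two populations:

    # Hypothetical simulation of play-the-winner (PW) inverse sampling for two
    # binomial populations; p_a and p_b are assumed success probabilities.
    import random

    def play_the_winner(p_a: float, p_b: float, r: int, seed: int = 1) -> str:
        """Sample until one population yields r successes; return its label."""
        rng = random.Random(seed)
        probs = {"A": p_a, "B": p_b}
        successes = {"A": 0, "B": 0}
        current = rng.choice(["A", "B"])  # start from a randomly chosen population
        while max(successes.values()) < r:
            if rng.random() < probs[current]:
                successes[current] += 1   # success: keep sampling this population
            else:
                current = "B" if current == "A" else "A"  # failure: switch
        return max(successes, key=successes.get)

    if __name__ == "__main__":
        wins = sum(play_the_winner(0.6, 0.4, r=10, seed=s) == "A" for s in range(500))
        print(f"A declared better in {wins}/500 replications")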
On the other form of the questionnaire, respondents were asked the same questions in reverse
sequence. These alternative sequences produced a significant question order effect. In this case, where there
are only two questions to consider, simply reversing
the sequence of the items amounts to fully randomizing them and controlling for order effects. But if there are more than two items, the number of possible orderings grows rapidly, and full randomization of item order becomes more difficult to implement.
ITEM RESPONSE THEORY
Item response theory (IRT) is an approach used for survey development, evaluation, and scoring. IRT models describe the relationship between a person's response to a survey question and his or her standing on a latent (i.e., unobservable) construct (e.g., math ability, depression severity, or fatigue level) being measured by multiple survey items. IRT modeling is used to (a) evaluate the psychometric properties of a survey, (b) test for measurement equivalence in responses to surveys administered across diverse populations, (c) link two or more surveys measuring similar domains on a common metric, and (d) develop tailored questionnaires that estimate a person's standing on a construct with the fewest number of questions. This entry discusses IRT model basics, the application of IRT to survey research, and obstacles to the widespread application of IRT.
In the two-parameter logistic IRT model, the probability that a person at trait level θ endorses item i is
P_i(θ) = 1 / (1 + e^(−a_i(θ − b_i))),
where a_i is the item's discrimination parameter and b_i is its threshold parameter.
[Figure 1. Item response curves representing the probability of a false or true response to the item I am unhappy some of the time, conditional on a person's depression level (low to severe). The threshold (b = 0.25) indicates the level of depression needed for a person to have a 50% probability of responding false or true.]
[Figure 2. Item information curves plotted against depression level (low to severe) for three items, I don't seem to care what happens to me (a = 2.83), I am unhappy some of the time (a = 2.20), and I cry easily (a = 1.11), with thresholds b = 0.23, b = 0.25, and b = 1.33.]
[Figure 3. Scale information plotted against depression level (low to severe), with reference lines marking reliability values of r = .96, .95, .93, .90, and .80.]
[Figure 4. Category response curves showing the probability of each of six response options (None of the time, Little of the time, Some of the time, A good bit of time, Most of the time, All of the time) conditional on the measured trait, ranging from poor to excellent.]
IRT modeling thus supports (a) evaluating the psychometric properties of existing scales and guiding survey revisions, (b) testing measurement equivalence across research populations, (c) linking two or more questionnaires on a common metric, and (d) developing item banks for computerized adaptive testing applications.
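For illustration, a short Python sketch of the two-parameter logistic model and the item information it implies, a²P(1 − P); the parameter values echo the illustrative depression items shown in the figures:

    # Minimal sketch of the two-parameter logistic (2PL) model described above.
    # The (a, b) values are taken from the illustrative figures; treat them as
    # examples only.
    import math

    def p_endorse(theta: float, a: float, b: float) -> float:
        """2PL probability of endorsing an item at trait level theta."""
        return 1.0 / (1.0 + math.exp(-a * (theta - b)))

    def item_information(theta: float, a: float, b: float) -> float:
        """Fisher information of a 2PL item: a^2 * P * (1 - P)."""
        p = p_endorse(theta, a, b)
        return a * a * p * (1.0 - p)

    if __name__ == "__main__":
        items = {"I am unhappy some of the time": (2.20, 0.25),
                 "I cry easily": (1.11, 1.33)}
        for text, (a, b) in items.items():
            print(f"{text!r}: P(theta=1) = {p_endorse(1.0, a, b):.2f}, "
                  f"info(theta=1) = {item_information(1.0, a, b):.2f}")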
Limitations
The survey research field has much to gain from IRT
methods; however, there are limitations to widespread
application. First, many researchers who have been trained in classical test theory statistics may not be comfortable with the advanced knowledge of measurement theory that IRT modeling requires. In addition, the
supporting software and literature are not well adapted
for researchers outside the field of educational measurement. Another obstacle is that the algorithms in the
IRT parameter-estimation process require large sample
sizes to provide stable estimates (from 100 for the simplest IRT model to 1,000 or more for more complex
models). Despite the conceptual and computational
challenges, the many potential practical applications of
IRT modeling cannot be ignored.
Bryce B. Reeve
See also Language Translations; Questionnaire Design;
Questionnaire-Related Error; Response Alternatives
J
finite population under study, θ̂(j) = Σ_{i=1, i≠j}^{n} y_i / (n − 1) is the estimate computed by omitting the jth observation.
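Although only a fragment of the surrounding discussion survives, the formula is the standard leave-one-out computation, sketched below in Python for the sample mean together with the usual jackknife variance estimate (the example data are made up):

    # Sketch of the leave-one-out computation behind the jackknife, assuming
    # the statistic of interest is the sample mean as in the formula above.

    def jackknife_leave_one_out(y: list[float]) -> list[float]:
        """theta_hat_(j) = sum of all y_i with i != j, divided by (n - 1)."""
        n = len(y)
        total = sum(y)
        return [(total - y_j) / (n - 1) for y_j in y]

    def jackknife_variance(y: list[float]) -> float:
        """Jackknife variance estimate: (n-1)/n * sum((theta_(j) - mean)^2)."""
        n = len(y)
        loo = jackknife_leave_one_out(y)
        mean_loo = sum(loo) / n
        return (n - 1) / n * sum((t - mean_loo) ** 2 for t in loo)

    if __name__ == "__main__":
        sample = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]
        print(jackknife_leave_one_out(sample))
        print(f"jackknife variance of the mean: {jackknife_variance(sample):.4f}")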
to upgrade the skills of people who are already working as survey researchers. The most demanding are
two certificate programs tailored to students who
already have an advanced degree in another field but
are seeking to learn more about survey methods. Each
of the certificate programs is a bundle of six semester-length courses. In addition to its semester-length classes, JPSM also offers 20 or more short courses each year. The short courses are 1- or 2-day classes taught by experts on a given topic. Approximately 700 persons attend JPSM short courses each year. The citation programs recognize persons who have completed specific combinations of short courses and a semester-length class in survey methods.
JPSM also has one program, the Junior Fellows Program, designed for undergraduates; it seeks to recruit promising undergraduates to the field of survey
research. The program consists of a weekly seminar
at JPSM and an internship with one of the federal statistical agencies. In a typical year, about 30 students
take part in the Junior Fellows Program; more than
200 Fellows have taken part since the program began.
Future Directions
As JPSM enters its second decade, it seeks to expand
its ties to other programs (especially the Michigan Program in Survey Methodology), to become a more conventional department at the University of Maryland,
and to continue to provide innovative educational offerings. JPSM's graduates have already made their mark
on the federal statistical system and on the field more
generally; their impact on survey research is likely to
grow in the years to come.
Roger Tourangeau
JOURNAL OF OFFICIAL
STATISTICS (JOS)
The Journal of Official Statistics (JOS) was launched
in 1985. It is published by Statistics Sweden, the
National Statistical Institute of Sweden. It replaced the Statistical Review (Statistisk Tidskrift), Statistics Sweden's earlier journal.
K
KEY INFORMANT
Within the context of survey research, key informant
refers to the person with whom an interview about
a particular organization, social program, problem,
or interest group is conducted. In a sense, the key
informant is a proxy for her or his associates at the
organization or group. Key informant interviews are
in-depth interviews of a select (nonrandom) group of
experts who are most knowledgeable of the organization or issue. They often are used as part of program
evaluations and needs assessments, though they can
also be used to supplement survey findings, particularly
for the interpretation of survey results. Key informants
are chosen not because they are in any way representative of the general population that may be affected
by whatever issue is being studied, but because they
are believed to have the most knowledge of the subject
matter.
Key informant interviews are especially beneficial
as part of an initial assessment of an organization or
community issue, allowing for a broad, informative
overview of what the issues are. In survey studies, key
informant interviews can be valuable in the questionnaire development process, so that all question areas
and possible response options are understood. Relying
on this method is also appropriate when the focus of
study requires in-depth, qualitative information that
cannot be collected from representative survey respondents or archival records. While the selection of key
informants is not random, it is important that there be
a mix of persons interviewed, reflecting all possible sides of the issue at study. Key informant interviews are most commonly conducted face-to-face and can include closed- and open-ended questions. They are often audio-taped and transcribed so that qualitative analyses of the interviews can be performed. Key informant interviews are rarely used as the sole method of data collection for a study or particular issue, as there is little generalizability that can come from them. However, they have a useful role, especially at the beginning stages of research studies when information gathering and hypothesis building are the goal.
Jennifer A. Parsons
See also Informant; Proxy Respondent
KISH, LESLIE (1910–2000)
Leslie Kish was a statistician, sociologist, and cofounder of the Institute for Social Research at the University of Michigan. His work had a profound and lasting influence on survey sampling.
Kish is best known in survey practice for the Kish selection tables used to choose a respondent within a sampled household. The tables indicate first the numbers of adults in households from 1 through 6 and then, below, the number of the adult to be interviewed. Each table lists the target individual in systematic order that differs among
the tables, and the first three tables are set up in such
a way that males are more likely to be selected
because males tend to be underrepresented in surveys.
In his second example, used more frequently in
research today than the first procedure, there are eight
tables labeled A, B1, B2, C, D, E1, E2, and F. Tables
A, C, D, and F will be used one sixth of the time, and
the others will be used one twelfth of the time. These
tables give equal chances of selection to individuals
in households of 1, 2, 3, 4, and 6 adults. Dwelling
units with 5 adults are overrepresented to compensate for the inability of this method to represent
households of more than 6 adults, a very small proportion of the population. Kish estimated, at the time
he was writing, that only 1 or 2 adults per 1,000
would not be represented, usually young females.
Here is one example of the Kish method question wording, used by Robert W. Oldendick and his colleagues:
In order to select the right person to interview, we need to list all the people living in your household who are 18 years of age or older. First, could you tell me the ages of all the males living in your household who are 18 years of age or older, that is, from the oldest to the youngest? Next, could you tell me the ages of all the females living in your household who are 18 years of age or older, that is again, from the oldest to the youngest?
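A simplified Python sketch in the spirit of table-driven within-unit selection follows; the selection table here is hypothetical and is not one of Kish's published tables:

    # Simplified illustration of table-driven within-household selection in the
    # spirit of the Kish method; SELECTION_TABLE is hypothetical, not one of
    # Kish's published tables A-F.
    import random

    # For each household size (1-6 adults), each column names the rank
    # (1 = oldest) of the adult to interview.
    SELECTION_TABLE = {
        1: [1, 1, 1, 1, 1, 1],
        2: [1, 2, 1, 2, 1, 2],
        3: [1, 2, 3, 1, 2, 3],
        4: [1, 2, 3, 4, 1, 2],
        5: [1, 2, 3, 4, 5, 1],
        6: [1, 2, 3, 4, 5, 6],
    }

    def select_adult(ages_desc: list[int], rng: random.Random) -> int:
        """Return the age of the selected adult; ages listed oldest to youngest."""
        n = min(len(ages_desc), 6)   # households beyond 6 adults are truncated
        column = rng.randrange(6)    # which pre-assigned table column to apply
        rank = SELECTION_TABLE[n][column]
        return ages_desc[rank - 1]

    if __name__ == "__main__":
        rng = random.Random(42)
        household = [71, 45, 19]     # ages of adults, oldest first
        print("interview the adult aged", select_adult(household, rng))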
Refusals tend to increase with any method that requires two consents: the first from the informant who answers the phone and the second from the designated respondent.
Several researchers report that the Kish method is still
a reliable and noninvasive method when interviewers
are skilled and well trained in it. It is important that
interviewers have faith in the method because they
otherwise unconsciously may communicate negative
attitudes toward a method, thereby decreasing the
cooperation rate. Several comparative studies have
included the Kish method with other methods on coverage, response, costs, and other characteristics.
Cecilie Gaziano
See also Last-Birthday Selection; Within-Unit Coverage;
Within-Unit Selection
KNOWLEDGE GAP
This entry focuses on the concept of the knowledge gap, which is a specific hypothesis within the area of mass communication research.
KNOWLEDGE QUESTION
A knowledge question is designed to capture the
extent to which people have stored factual information in long-term memory and how well they can
retrieve and respond with that information when
asked a survey question about a given topic.
Knowledge, as a concept, is distinct from attitudes
and opinions. Knowledge, moreover, is not synonymous with the term information, which describes a wider breadth of content.
L
LANGUAGE BARRIER
Language barrier dispositions occur in U.S. surveys
when a household member or the sampled respondent
does not speak or read English (or another target language) well enough to complete the interview. The
language barrier disposition is used in all surveys,
regardless of the mode (telephone, in-person, mail,
and Internet). Language barrier dispositions in surveys
in the United States are not common, but their frequency is growing. Approximately 20% of the U.S.
population in 2005 spoke a language other than
English in their home, according to the U.S. Census
Bureau. Furthermore, the 2005 U.S. Census estimates
show upwards of 5 million residents being linguistically isolated, in that they can speak little or no
English. Language barriers are more likely to occur
when data collection is conducted in central city areas
and in rural areas of the Southwest.
The language barrier disposition functions as both
a temporary and a final disposition. Cases may be
coded temporarily with a language barrier disposition
and then contacted again (in the case of a telephone
or in-person survey) by an interviewer who speaks the
same language as the household or the sampled
respondent. In a mail or Internet survey, the survey
organization may re-mail or resend a translated version of the questionnaire. However, cases with a language barrier final disposition often are considered
eligible cases and thus are factored into survey nonresponse rates. The exception to this would be if the target population for a survey specified that respondents must speak English (or another specific language) to be eligible to complete the questionnaire.
Survey researchers may include a variety of categories within the language barrier disposition. One category of language barrier is used in telephone or in-person surveys for those cases in which there is no one in the household present at the time of contact who can speak or understand the language in which the introduction is spoken. Other categories of language barriers include cases in which the sampled respondent does not speak the language in which the interview is conducted or does not read the language in which the questionnaire is printed (for mail surveys) or displayed (for Internet surveys). Finally, a third category of language barrier occurs in in-person and telephone surveys when an interviewer fluent in the language spoken by the household is not available to be assigned to the case at the time of contact.
Because cases with a language barrier disposition increase the nonresponse rates in a survey, researchers fielding surveys in areas known to be multi-lingual often use practical strategies to ensure that households or respondents who do not speak the language in which the interview is to be conducted can complete the survey. In telephone and in-person surveys, these strategies usually include employing multi-lingual interviewers and arranging to have the survey questionnaire and all supporting materials translated into one or more additional languages. In mail and Internet surveys, these strategies usually include having the questionnaire and supporting materials translated into one or more additional languages and then re-sending the translated materials to the sampled cases.
LANGUAGE TRANSLATIONS
Many survey projects at national and cross-national
levels use questionnaires developed in one language
and translated into another. The quality of these translations is a crucial factor in determining the comparability
of the data collected. Conversely, poor translations of
survey instruments have been identified as frequent and
potentially major sources of survey measurement error.
This entry outlines key aspects of conducting and monitoring survey language translations.
Essentially, translation allows researchers to collect
data from people who cannot be interviewed in the language(s) in which a questionnaire is already available.
In countries such as the United States, long-standing
linguistic minorities and newly immigrated groups
make translation into multiple languages essential to
ensure adequate coverage and representation of different segments of the national population. The 2000 U.S.
Census was available in six languages, with language assistance guides available in dozens more.
Translation Monitoring and Documentation
Systematic monitoring of translation quality also calls
for technical support. In addition, team translation
efforts rely on documentation from one stage for efficient execution of the next. Numerous projects use
templates aligning source and target texts to facilitate
the translation and documentation processes. In some
projects, final draft versions are monitored centrally;
in others, this responsibility is left with the individual
language teams. Those translation teams who are
monitoring these processes centrally need to align and compare versions across the different languages.
Harmonization
As used here, the term harmonization is an extension of
the principle of asking the same questions. In multilingual studies, several countries or groups may use or
share the same language (e.g., Spanish or French).
Sometimes a study stipulates that efforts must then be
undertaken to harmonize questions across those populations sharing a language. In a harmonizing procedure,
countries sharing a language compare and discuss their
individually translated versions with a view to removing
any unnecessary differences in a final harmonized version. The degree to which harmonization is obligatory
differs; many studies recognize that regional standards
of a shared language result in necessary differences in
translations.
Team Translation
Views on what counts as good survey translation
practice have changed noticeably in the last decade.
Translation guidelines produced by the European
Social Survey, by the U.S. Bureau of Census, by the
World Health Organizations World Mental Health
Initiative, and by the International Workshop on
Comparative Survey Design and Implementation, all
available online, emphasize the benefits to be gained
from organizing survey translations in a team effort.
These team procedures consist of (at least) five
steps that may be reiterated: (1) draft translations by
translators; (2) review by the entire team; (3) adjudication by the team or a subgroup of it; (4) pretesting
and translation adjustment; and (5) documentation.
Back Translation
A more traditional form of survey translation usually
has one translator translate. Assessment of the translation is often made by having a second translator produce a back translation of the translated text. Thus, if
an English questionnaire is translated into Czech, the
back translation step translates the Czech translation
into English, without the translator seeing the original
English version. The assessment of the Czech text is
made by comparing the two English versions (source
and back translation). If these are thought to be comparable (enough), the Czech translation is considered
to have passed muster.
Back translation was one of the earliest procedures
to establish itself in survey research and, over time,
became associated with quality assurance. In translation studies, however, translation quality is normally
assessed by focusing on the target translation, not
source-language texts. Back translation can be an aid
to researchers who do not understand the target language and want to gain a sense of what is in a text.
However, as increasingly recognized, back translation
cannot function as a refined tool of quality assessment.
Adaptations
Translation alone is often not sufficient to render questions appropriately and adaptations may be required.
Adaptations, as the term is used here, are changes that
are not purely driven by language. They take different
forms, partly determined by the needs or context of
a given target group. Some adaptations are simple,
such as changing Fahrenheit measurements to Centigrade. Others are more complex, depending also on the
purpose of the question. Questions on health, for
instance, may need to mention different symptoms for
various cultures to determine the presence of the same
complaint or illness across populations. In tests of
knowledge, different questions may be needed to avoid
favoring one population, and in skills and competence
research, target-language versions depend heavily on
adaptation. Visual components may be adapted to
accommodate the direction in which material is read
in a given culture or language. Some populations may
be unfamiliar with questionnaires and need more
navigational guidance and instructions. Careful adaptation helps researchers produce target-culture questions
that will collect data comparable to that collected with
the source questions.
LAST-BIRTHDAY SELECTION
Survey researchers are usually concerned with choosing respondents within households after households
are selected randomly. Within-unit coverage and nonresponse are key issues, so researchers want to select
the correct respondent and gain his or her cooperation.
Each of these goals has costs. One popular quasi-random compromise is the last-birthday (LB) method
of selecting respondents from within a sampled
household in random-digit dialing surveys. It circumvents the pitfalls of pure or nearly pure random methods by being relatively quick, easy, and likely to
secure cooperation. Probability methods can involve
a potentially lengthy and intrusive process of querying
the informant (person who answers the phone) about
all household members eligible to be interviewed
before selecting the correct respondent from the
resulting list.
An example of LB question wording is, In order to determine whom to interview, could you tell me, of the people who currently live in your household who are 18 or older, including yourself, who had the most recent birthday? I don't mean who is the youngest adult, but rather, who had the most recent birthday? If the respondent does not know all the birthdays, the following question can be asked: Of the ones you do know, who had the most recent birthday? Some
researchers have sought to study the best wording to
choose the correct respondent and secure cooperation.
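A minimal Python sketch of the last-birthday rule, assuming birthdays are known and ignoring complications such as ties and leap-day birthdays:

    # Sketch of the last-birthday rule: among eligible adults, select the one
    # whose birthday most recently preceded the interview date. Leap-day
    # birthdays and ties are ignored for simplicity.
    from datetime import date

    def last_birthday_person(birthdays: dict[str, date], today: date) -> str:
        """Return the name whose birthday (month/day) occurred most recently."""
        def days_since_birthday(bday: date) -> int:
            this_year = bday.replace(year=today.year)
            if this_year > today:                      # birthday hasn't come yet
                this_year = this_year.replace(year=today.year - 1)
            return (today - this_year).days
        return min(birthdays, key=lambda name: days_since_birthday(birthdays[name]))

    if __name__ == "__main__":
        household = {"Ana": date(1970, 3, 14), "Ben": date(1988, 11, 2)}
        print(last_birthday_person(household, date(2008, 4, 1)))  # -> "Ana"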
The first published description of birthday methods tested a next-birthday (NB) method, assuming that the incidence of births is random; household selection constituted the first stage of selection in a two-step process. The researchers, Charles Salmon and John Nichols, considered this method to be the second stage of sampling, with all members of a household having an equal probability of being chosen.
LEANING VOTERS
Leaning voters is a term in politics and survey research methods that has several meanings.
LEVEL OF ANALYSIS
A social science study using survey data can be set at
the micro level when individuals are analyzed, or it
can be set at a higher, more macro level when aggregates of individuals such as households, wards, precincts, firms, neighborhoods, communities, counties,
provinces, states, or nations become the unit of analysis. This structural level, spanning the range from most
micro to the most macro, at which a social scientific
investigation is carried out is called level of analysis. A
particular study may also cut across several levels of
aggregation. For example, a multi-level study of the
educational effectiveness of a certain education program may include pupil-specific, classroom-specific,
school-specific, and school-district-specific information
and analyze the data at each and all of the levels.
The choice of level of analysis should be driven by researchers' theory and, subsequently, their research questions. There are two large, contrasting issues of concern over why the level of an analysis must be chosen carefully.
LEVEL OF MEASUREMENT
Level of measurement refers to the relationship between
the numeric values of a variable and the characteristics
that those numbers represent. There are five major
levels of measurement: nominal, binary, ordinal, interval, and ratio. The five levels of measurement form
a continuum, because as one moves from the nominal
level to the ratio level, the numeric values of the variable take on an increasing number of useful mathematical properties.
Nominal
Nominal variables are variables for which there is no
relationship between the numeric values of the variable
and characteristics those numbers represent. For example, one might have a variable region, which takes
on the numeric values 1, 2, 3, and 4, where 1 represents
North, 2 represents South, 3 represents East,
and 4 represents West. Region is a nominal variable
because there is no mathematical relationship between
the number 1 and the region North, or the number 2
and the region South, and so forth.
For nominal variables, researchers cannot compute
statistics like the mean, variance, or median because
they will have no intuitive meaning; the mode of the
distribution can be computed, however. Nominal variables also cannot be used in associational analyses
like covariance or correlation and cannot be used in
regressions. To use nominal variables in associational
analyses, the nominal variable must be separated into
a series of binary variables. Only nonparametric statistical tests can be used with nominal variables.
Binary
Binary or dummy variables are a special type of
nominal variable that can take on exactly two mutually
exclusive values. For instance, one might have a variable that indicates whether or not someone is registered
to vote, which would take on the value 1 if the person
is registered and 0 if the person is not registered. The
values are mutually exclusive because someone cannot
be both registered and not registered, and there are no
other possibilities. Like with nominal variables, there is
no mathematical relationship between the number 1
and being registered to vote, but unlike nominal variables, binary variables can be used in associational analyses. Technically, only nonparametric statistical tests should be used with binary variables, but the social science literature is filled with examples where researchers have used parametric tests.
Ordinal
Ordinal variables are variables for which the values
of the variable can be rank ordered. For instance,
a researcher might ask someone their opinion about
how the president is doing his job, where 1 = strongly
approve, 2 = somewhat approve, 3 = somewhat disapprove, and 4 = strongly disapprove. In this case, the
values for job approval can be ranked, and researchers
can make comparisons between values, for example,
saying that someone who gives a job approval value of
1 approves of the president more than someone who
gives a job approval value of 3.
However, a researcher cannot make exact mathematical comparisons between values of the variable;
for example, it cannot be assumed that a respondent
who gives a job approval of 4 disapproves of the president twice as much as someone else who gives a job
approval of 2. Researchers can, however, compare
values using greater than or less than terminology and logic.
The mode and the median can be computed for an
ordinal variable. The mean of an ordinal variable is
less meaningful, because there is no exact numerical
distance between the number assigned to each
value and the value itself.
Ordinal variables can be used in associational analyses, but the conclusions drawn are dependent upon the way that numbers were assigned to the values of the variable.
Interval
With interval variables, distances between the values
of the variable are equal and mathematically meaningful, but the assignment of the value zero is arbitrary. Unlike with ordinal variables, the differences
between values assigned to the variable are meaningful, and researchers use the full range of parametric
statistics to analyze such variables.
As with ordinal variables, interval variables can be
used in associational analyses, but the conclusions
drawn are dependent upon the way that numbers were
assigned to the values of the variable. Interval variables
can be rescaled to have a different value arbitrarily set
to zero, and this would change both the sign and
numerical outcome of any associational analyses. Parametric statistics can be used with interval variables.
Ratio
With ratio variables, distances between values of the
variable are mathematically meaningful, and zero is
a nonarbitrarily assigned value. Anything that can be counted (votes, money, age, hours per day asleep) is a ratio variable.
Values assigned to ratio variables can be added,
subtracted, multiplied, or divided. For instance, one
can say that a respondent who views 6 hours of television per day views twice as many hours as another
respondent who views only 3 hours, because for this
variable, zero is nonarbitrary. By contrast, one cannot
say that 60 degrees feels twice as warm as 30 degrees,
because 0 degrees is an arbitrary construct of the temperature scale.
With ratio variables, researchers can calculate the mean, median, mode, and variance and can use ratio variables in the full range of parametric statistical analyses.
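For illustration, a small Python sketch pairing each level with a summary statistic that is meaningful for it (the data are made up):

    # Sketch mapping each level of measurement to a meaningful summary;
    # the example data are invented.
    from statistics import mode, median, mean

    region = [1, 3, 3, 2, 4, 3]      # nominal: 1=North, 2=South, 3=East, 4=West
    approval = [1, 2, 2, 3, 4, 2]    # ordinal: 1=strongly approve ... 4=strongly disapprove
    temp_f = [60.0, 30.0, 45.0]      # interval: zero is arbitrary
    tv_hours = [6.0, 3.0, 0.0, 2.0]  # ratio: zero is meaningful

    print("region mode:", mode(region))          # only the mode is safe for nominal data
    print("approval median:", median(approval))  # order, but not distance, is meaningful
    print("mean temperature:", mean(temp_f))     # differences meaningful; ratios are not
    print("mean TV hours:", mean(tv_hours))      # ratios meaningful: 6 is twice 3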
LEVERAGE-SALIENCY THEORY
Leverage-saliency theory, as first proposed by
Robert M. Groves and his colleagues in 2000, is a unifying theory to help explain survey nonresponse, with
the goal of helping to identify strategies to counter
nonresponse.
Nonresponse is a critical challenge to survey
research. Those who do not respond to surveys (or to
parts of questionnaires) may differ in important ways
from those who do respond. Leverage-saliency theory
attempts to describe the underpinnings of individual
behavior related to the individual's choosing to cooperate or not to cooperate with a survey request. The theory posits that different people place a different level of importance on various attributes associated with a survey request. These attributes are like weights on a scale, tipping the scale toward the sample person either acceding to or declining a particular survey request. An
implication of the theory is that the response propensity
of any one person deciding to cooperate or not with
a specific survey request will vary across different survey requests and that few people will always agree or
never agree to participate when they are sampled for
a survey.
This entry describes several key attributes of the
survey request and how these interact in terms of their
leverage, value disposition, and saliency to affect
survey response. The entry includes suggestions as to
how interviewers can tailor their requests to make
more salient those attributes with the greatest amount
of positive leverage for an individual from the sample.
LIKELY VOTER
A likely voter is someone who is registered to vote in
an upcoming election and is deemed likely to vote in
that election, for example, by pollsters trying to forecast the election outcome. Pre-election pollsters face a unique challenge. At their most germane, they seek to sample a population that is unknown and indeed technically unknowable, because it does not and will not exist until on and around Election Day; this population is the voting public. For a survey that seeks to measure the attitudes and intentions of voters in a given election, the only recourse is to estimate this population through a process known as likely voter modeling.
There are no fixed rules for likely voter modeling, and techniques vary. But all begin with a similar approach, using known or self-reported information, or both, about respondents (for example, actual or self-reported voter registration status, actual or self-reported voting history, self-reported interest in the campaign, and self-reported intention to vote) to determine their likelihood of actually participating in the coming election. More controversially, some also may use weighting adjustments for political party affiliation or for demographic targets drawn from exit polls or other sources.
There are three main types of likely voter modeling:
(1) screening, (2) scaling, and (3) probability (propensity) modeling. In screening, respondents are identified
as likely voters on the basis of their answers to a series
of questions. In scaling, or cutoff modeling, qualification
requires selected answers to, say, any five of eight questions. The third approach employs a probability model
to build a probable electorate in which each respondent
is assigned a weight (which can range from 0 to 1)
reflecting her or his estimated likelihood of voting.
In the first two approaches, respondents are either classified as likely voters and included in the sample or as nonvoters and excluded; in the third, all respondents are included, but with varying weights. Results are identified as representing the views of likely or probable voters and, in some cases, are distilled further to certain or definite voters.
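For illustration, the following Python sketch contrasts a screening cutoff with a propensity-style weight; the items, cutoff, and component weights are hypothetical and do not represent any pollster's actual model:

    # Illustrative sketch contrasting screening/scaling and probability
    # (propensity) approaches; all items and weights below are made up.

    def screen(answers: dict[str, int], cutoff: int = 5) -> bool:
        """Scaling/cutoff approach: qualify if enough items are answered 'likely'."""
        return sum(answers.values()) >= cutoff

    def propensity_weight(registered: bool, voted_last: bool, interest: float) -> float:
        """Probability approach: assign a 0-1 likelihood-of-voting weight.
        The component weights below are hypothetical."""
        w = 0.15 * registered + 0.45 * voted_last + 0.40 * interest
        return max(0.0, min(1.0, w))

    if __name__ == "__main__":
        r = {"registered": 1, "voted_2004": 1, "follows_campaign": 1,
             "knows_polling_place": 0, "intends_to_vote": 1}
        print("screened in:", screen(r, cutoff=4))
        print("propensity weight:", propensity_weight(True, True, 0.8))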
Some polling organizations use a single, pre-established likely voter model; others run several models, assessing results across a range of scenarios positing
differing levels of voter turnout and then investigating
differences across models when they occur. To some
extent, all likely voter models involve human (professional) judgment as to the elements they include, the
turnout level or levels they anticipate, and the weights
applied; at the same time, they are empirically based
and ultimately tested (and ideally are later refined)
against the actual election outcome.
Likely voter modeling is fraught with hazard. As
easily as estimates are improved by good modeling,
they can be worsened by poor modeling, for example,
through the inadvertent inclusion of nonvoters, the
exclusion of actual voters, or both. Poor likely voter
modeling is the likeliest cause of inaccurate final estimates in otherwise rigorous pre-election polls.
Poor modeling can negatively impact results well
before the final estimate. Ill-conceived likely voter
models can introduce volatility in estimates: swings in
candidate support that do not reflect actual changes in
opinion but rather changes in the characteristics of
respondents moving into and out of the model. The goal
of good likely voter modeling is to report real changes,
not changes that are an artifact of the model itself.
Likely voter modeling increases survey expense (or
decreases effective sample size) because it requires discarding or weighting down interviews with nonvoters.
To avoid this downside, while still claiming to produce
a likely voter survey, some pollsters use weak or
lightly screened models that include an unreasonable
number of nonvoters. Weeks or months before Election
Day, these estimates cannot be held to account by
actual results, but they can produce different estimates
in different surveys, making variations in models look
like volatility in the electorate.
Indeed, one useful way of evaluating a likely voter
model is to compare the turnout level it estimates with
reasonable expectations for that election. For example,
a model that includes 55% of the general population as
likely voters in a primary election where anticipated
actual turnout is 15% would be a poor one. However, even models that winnow down to an appropriate turnout level may miss the mark by misstating the size of particular groups within the electorate.
LIKERT SCALE
The Likert scale, named for Rensis Likert (pronounced
lick-urt) who published a seminal report describing
its use, possibly is the most widely employed form of
attitude measurement in survey research. Similar to
nearly all psychometric scale measures, the Likert scale
consists of multiple items that typically are summed or
averaged to produce a more reliable measure than
could be obtained by use of a single item.
The Likert scale is a special type of the more general class of summated rating scales constructed from
multiple ordered-category rating items. Its distinguishing characteristics are as follows:
• Each item uses a set of symmetrically balanced bipolar response categories indicating varying levels of agreement or disagreement with a specific stimulus statement expressing an attitude or opinion (e.g., Ripe cherries are delicious).
• The response category points for each item are individually labeled (e.g., Strongly Agree, Agree, Disagree, Strongly Disagree).
• The descriptive text of these labels is chosen so that gradations between each pair of consecutive points seem similar.
Table 1. Examples of ordered-category rating items
A. I think the president has been doing a wonderful job while in office.
   Strongly Disagree | Disagree | Neutral | Agree | Strongly Agree
B. Mostly Agree | Somewhat Agree | Somewhat Disagree | Mostly Disagree | Completely Disagree
C. How satisfied or dissatisfied are you with the reliability of this product?
   Very Satisfied | Somewhat Satisfied | Neither Satisfied nor Dissatisfied | Somewhat Dissatisfied | Very Dissatisfied
D. Compared with adults in general, how would you rate your own health?
   Excellent | Very Good | Good | Fair | Poor | Very Poor
E. When you drink coffee, how frequently do you choose to drink decaf coffee?
   Never | Rarely | Occasionally | Often | Nearly Always
F. Choose the one box along the continuum between each pair of antonyms that best describes how you view the service representative who assisted you.
   Rude ... Polite
   Intelligent ... Stupid
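For illustration, a minimal Python sketch of conventional Likert scoring, with reverse coding of a negatively worded item; the item names and keying are hypothetical:

    # Sketch of scoring a Likert scale: reverse-code negatively worded items,
    # then sum or average; item names and keying here are hypothetical.

    POINTS = 5  # 1 = Strongly Disagree ... 5 = Strongly Agree

    def score_likert(responses: dict[str, int], reversed_items: set[str]) -> float:
        """Average the item ratings after reverse-coding flagged items."""
        scores = [POINTS + 1 - v if item in reversed_items else v
                  for item, v in responses.items()]
        return sum(scores) / len(scores)

    if __name__ == "__main__":
        answers = {"job_wonderful": 4, "economy_worse": 2, "trust_leadership": 5}
        # "economy_worse" is negatively worded, so a rating of 2 becomes a 4.
        print(score_likert(answers, reversed_items={"economy_worse"}))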
LIST-ASSISTED SAMPLING
List-assisted sampling is a technique used in telephone surveys, which utilizes information from the
Bell Core Research (BCR) telephone frame and directory listings to produce a simple random sample. This
is accomplished by stratifying the BCR telephone
frame into two strata. The high-density stratum consists of 100-banks that contain at least one listed number, and the low-density stratum consists of 100-banks without a listed number. The proportion of
the sample drawn from each stratum depends on the
requirements of the study. This technique started to
be widely used by telephone survey researchers in the
early 1990s because it increased the efficiency of traditional random-digit dialing (RDD) methods, in particular, the Mitofsky-Waksberg method. List-assisted
sampling helps to provide a solid foundation, as well
as lending statistical justification, for increasing the
efficiency of the sample while not sacrificing coverage. As a result, this sampling technique is used
widely by telephone researchers to reduce costs and
shorten the data collection period. List-assisted sampling can be done in a few different ways, namely, by
(a) dual frame design, (b) directory-based stratification, and (c) directory-based truncation. There are
some slight biases associated with list-assisted samples in that those 100-banks without a listed number
will contain some residential numbers but are not as
likely to be included in the sample. However, these
biases are minor when compared to other samples
with a more complete sample frame, especially when
the gains in efficiency are taken into account.
Directory-Based Sampling
With the rise of national directories of the listed numbers in the United States (e.g., White Pages, MetroMail), calling efficiency was greatly increased. Sample
designs could be based on a one-stage selection procedure from these national directories, and only a very
small percentage of numbers would be found to be
nonresidential, depending on how often the database
was updated. While this provided researchers with
a purely residential sample, it also excluded numbers
that were unlisted residences. This increased the problem of coverage error in telephone surveys. It presented a problem to survey researchers as they realized not only that they were excluding unlisted residential numbers from the sample frame, but also that households vary significantly based on their decision to list their phone number. This pressing problem gave rise to list-assisted sampling, which sought to preserve the efficiency associated with directory-based sampling but also to increase coverage to something close to that of the Mitofsky-Waksberg method.
List-Assisted Sampling
To produce a sample frame that utilizes the best in
coverage from the Mitofsky-Waksberg method while
maintaining the efficiency seen in directory-based sampling methods, the sample frame information used in
both of these methods needs to be combined. By enumerating the numbers of the BCR frame and matching
these numbers to the directory of listed numbers,
researchers are able to establish the listed status of each
number on the BCR frame without dialing a single
digit. Once the listing status of each number is known,
researchers can then draw a sample of numbers directly
from the BCR frame, without utilizing a two-stage
design. There are three predominant ways of producing
a list-assisted sample frame: (1) the dual frame design,
(2) directory-based stratification, and (3) directory-based truncation.
Dual Frame Design
In a dual frame design, one sample typically is drawn from the directory-listed frame and a second from the BCR frame; the two samples are then fielded independently of each other. The efficiencies gained by using a directory-based sampling
frame are balanced with the coverage offered by the
BCR frame. However, there are problems with this
approach. First, the BCR frame still contains all of the
problems that were associated with the Mitofsky-Waksberg method: unknown residential status, clustering,
inclusion of empty 100-banks, and so on. Second,
combining the two samples into one data set provides
a whole new set of estimation problems. Ideally one
would use a one-stage sampling procedure based on
a single frame.
Directory-Based Stratification
The directory-based stratification method is a one-stage sample based on the BCR frame with the listed
status of each number obtained by comparing the
frame to the directory. Once the listed status of each
phone number is known, the BCR frame is separated
into two strata. The first stratum contains 100-banks
with at least one listed number, known as the high-density stratum, and the second stratum contains 100-banks without a listed number, known as the low-density stratum (in this case, density refers to residential status, not listed status). Phone numbers are then
randomly selected from both strata; however, more
numbers are usually drawn from the high-density stratum to increase the efficiency of the calling effort.
Ultimately, it is up to the researcher, based on the
needs of the data collection effort, which percentage
of the final sample is drawn from the high-density
stratum versus the low-density stratum. Again, the
give-and-take between efficiency and coverage plays
a key role in the decision-making process.
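The stratified draw just described can be sketched in a few lines of Python. This is an illustration under assumed inputs, not a production sampler: `banks` (hypothetical) maps each 100-bank prefix to its count of listed numbers, and the share drawn from the high-density stratum is left to the researcher.

```python
# Illustrative sketch (all names hypothetical): a directory-based stratified
# draw. `banks` maps each 100-bank prefix (area code + exchange + first two
# digits of the suffix) to its count of directory-listed numbers.
import random

def draw_list_assisted(banks, n_total, high_density_share=0.8):
    high = [b for b, n_listed in banks.items() if n_listed >= 1]
    low = [b for b, n_listed in banks.items() if n_listed == 0]
    n_high = round(n_total * high_density_share)
    sample = []
    for stratum, n in ((high, n_high), (low, n_total - n_high)):
        for _ in range(n):
            bank = random.choice(stratum)  # pick a 100-bank in the stratum
            # Append a random two-digit suffix, yielding a full telephone
            # number within that 100-bank.
            sample.append(f"{bank}{random.randrange(100):02d}")
    return sample
```

Because the two strata are sampled at different rates, estimates computed from such a sample must be weighted by each stratum's selection probability.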
Directory-Based Truncation

There are far fewer 100-banks that contain residential numbers but no listed numbers. Hence the bias associated with directory-based truncated designs is only slightly higher when compared to designs that draw sample from both strata.

Implications
List-assisted sampling has greatly improved the efficiency of field work with regard to large, random-selection telephone surveys. By incorporating the directory listings into the BCR frame, researchers have found that the problems associated with each frame individually can be overcome when these two frames are combined. No longer beholden to the Mitofsky-Waksberg method of RDD sampling, telephone surveys have been fielded much more quickly (and at less cost), and as a result, study findings have been released to the public (or client) earlier. Increasing efficiency in
LISTED NUMBER
LIST-EXPERIMENT TECHNIQUE
The list-experiment technique is a survey measurement technique that uses an experimental design to
Table 1 and Table 2  (example item lists shown to the list experiment's control and treatment groups; table contents not recovered)
exposed to one more item in the list than was the control group. Given that this is a design with strong
internal (cause-and-effect) validity, the researchers
can be confident about the findings.
As powerful as it is in improving the accuracy of
measuring sensitive survey topics, the list-experiment
technique has a methodological flaw of its own and
also has a major analytical disadvantage associated
with it. The flaw comes about because anyone in the
treatment group who is shown the list with the sensitive issue and for whom all the statements apply is
disclosing that fact by giving an answer that equals
the total number of items in the list. In the handgun
example in Table 2, anyone who says "five" is known to have a handgun. This may cause some people who should answer "five" to instead answer "four" so as not to allow the interviewers and researchers
to know for certain that they have a handgun. Because
of this flaw, the list-experiment technique likely yields
an underestimate of the prevalence of the sensitive
characteristic. A way to reduce the impact of this flaw
is for the researchers to make certain that the list of
statements shown to everyone contains at least one
nonsensitive statement that has an extremely low rate
of occurrence. If this is done, there will be very few
respondents in the Treatment Group who are put in
the position of responding with an answer that gives
away that the sensitive statement applies to them.
Analytically, because respondents merely answer
with the number of statements that apply, as opposed
to indicating which statements apply to them,
researchers cannot do analyses on the sensitive item
itself at the level of the individual respondent. This is
due to the fact that they cannot know whether the
individual respondent possesses the sensitive characteristic, except in the case of two types of respondents. In the previously mentioned handgun example,
only for people in the treatment condition who
answered "five" can it be known for certain that they have a handgun at home, and only for those who answered "zero" can it be known they do not have a handgun at home. All others, who answered anything other than "five" or "zero," may or may not
have a handgun. As such, analyses (e.g., multiple
regression) about which of the respondents have
handguns, and why, cannot be conducted. Group-level
analyses can be conducted, for example, analyses that
indicate which demographic characteristics are more
likely to correlate with possessing the sensitive characteristic, but not individual-level analyses.
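The group-level estimate itself is a simple difference in mean counts between the two groups. A minimal sketch, with hypothetical data:

```python
# Sketch of the difference-in-means estimator for a list experiment
# (hypothetical counts): the treatment group saw one extra, sensitive item,
# so the difference in mean counts estimates that item's prevalence.

def list_experiment_prevalence(control_counts, treatment_counts):
    mean_control = sum(control_counts) / len(control_counts)
    mean_treatment = sum(treatment_counts) / len(treatment_counts)
    return mean_treatment - mean_control

control = [1, 2, 2, 3, 1, 2]    # counts from the shorter (control) list
treatment = [2, 2, 3, 3, 1, 3]  # counts from the list with the added item
print(list_experiment_prevalence(control, treatment))  # 0.5 -> about 50%
```

Consistent with the analytical limitation noted above, the estimate describes the group, not any individual respondent.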
LIST SAMPLING
List sampling is one of the basic ways that survey
samples can be created. The basic concept of list sampling is deceptively simple. The process is to choose
a subset of the elements (the sample) from a listing of
all elements (the sampling frame) using a specific
selection process. The selection process may have
several features, for example, sampling with replacement or sampling without replacement.
In list sampling, as in other sample selection processes, issues arise about whether the sample estimate
is an unbiased and reliable estimate for the characteristic or attribute in the full list of elements. Bias and reliability are measures of how well the estimator for the
attribute computed using list sample data corresponds
to the true value for the attribute in the full list.
Bias is the difference between the true value and
the expected value of the estimator. A specific estimator can be determined to be unbiased or nearly
unbiased on the basis of sampling theory for the estimator and the sampling process, but the bias cannot
be estimated explicitly from a sample.
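A minimal sketch of the two selection features named above, using Python's standard library; the frame and sample size are hypothetical:

```python
# Minimal sketch (frame and sample size hypothetical): selecting a list
# sample from a sampling frame with and without replacement.
import random

frame = list(range(1, 1001))              # the listing of all elements

without_repl = random.sample(frame, 100)  # sampling without replacement
with_repl = random.choices(frame, k=100)  # sampling with replacement

# The sample mean estimates the frame mean (here 500.5) under both schemes.
print(sum(without_repl) / len(without_repl))
print(sum(with_repl) / len(with_repl))
```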
LITIGATION SURVEYS
Litigation surveys are surveys that are used in legal
proceedings. Like surveys in nonlitigation contexts,
they measure opinions, attitudes, and behavior among
representative samples of the general public, consumers, employees, and other populations. They afford
a useful mechanism for collecting and evaluating opinions or experiences of large groups of individuals
bearing on disputed issues in civil and criminal lawsuits. When administered by independent persons
who qualify as court expert witnesses in survey methodology by reason of their knowledge, experience,
and education, these group assessments may provide
more trustworthy information on legal questions than
the testimony of a handful of witnesses chosen by an
interested party.
State and federal courts throughout the United
States now regularly admit survey evidence. However, before 1963, survey respondents' statements
were considered hearsay (i.e., statements made outside of court, which are offered to establish facts),
Log-In Polls
LOG-IN POLLS
A log-in poll is an unscientific poll that typically is
conducted by news and entertainment media on their
Web sites to engage their visitors (audiences) by providing them an opportunity to register their opinion
about some topic that the media organization believes
has current news or entertainment value. Typically
two choices are given for someone to express her or
his opinion. One choice might be for those who agree
with the issue and the other might be for those who
disagree. For example, a log-in poll question might be
to indicate whether a Web site visitor agrees or disagrees that Congress should impeach the President.
These polls are not accurate measures of public
opinion on the topic. The people who choose to register their opinion on the Web site represent no known
target population, and as such, the media organization
cannot know to whom the findings generalize. Most
often, response options such as "undecided" are not
LONGITUDINAL STUDIES
Longitudinal studies, or panel studies, are studies in which
the research settings involve multiple follow-up measurements on a random sample of individuals, such as
their achievement, performance, behavior, or attitude,
over a period of time with logically spaced time points.
The purpose of longitudinal research studies is to gather
and analyze quantitative data, qualitative data, or both,
on growth, change, and development over time. Generally, the significance of the longitudinal research studies
stems from the fact that the knowledge, skills, attitudes,
perceptions, and behaviors of individual subjects usually
develop, grow, and change in essential ways over
a period of time. Longitudinal studies require formulating longitudinal research questions and hypotheses,
using longitudinal data collection methods (e.g., panel
surveys), and using longitudinal data analysis methods.
Researchers across disciplines have used different
terms to describe the design of the longitudinal studies
that involve repeatedly observing and measuring the
same individual subjects (respondents) over time.
Some of the terms used are longitudinal research
designs, repeated-measures designs, within-subjects
designs, growth modeling, multi-level growth modeling, time-series models, and individual change models.
Advantages
Compared to the cross-sectional research designs, longitudinal research designs have many significant advantages, including (a) revealing change and growth in an
outcome (dependent) variable (e.g., attitude, perception,
behavior, employment, mobility, retention), and (b) predicting the long-term effects of growth or change on a particular outcome (dependent) variable. Most importantly,
longitudinal research studies can address longitudinal
issues and research questions that are impossible to
address using the cross-sectional research designs. Across
all disciplines and fields of study, with the advancement
in technology and the use of high-speed computers, more
and more data are being collected over many different
occasions and time points on the same individuals, leading to complex longitudinal data structures.
Challenges
Such longitudinal research studies present researchers
and evaluators across all disciplines with many
439
methodological and analytical challenges. For example, a common problem in analyzing longitudinal data
in many disciplines is that complete data for all measurements taken at different time points for all individuals may not be available for many reasons. One
possible reason is that some subjects are not available
for some of the data collection time points to provide
measurements or responses. Another reason is that
some subjects might drop out from the study at any time point, that is, attrition. Further, mortality can be another reason for having incomplete longitudinal data from which to make valid conclusions about growth
or change.
Categories
Longitudinal research designs and the corresponding
analytic methods can be classified into two broad
categories based on the methodological and statistical
assumptions of each category.
Longitudinal data can be analyzed using repeated-measures analysis via SPSS (Statistical Package for the Social Sciences) or SAS (originally statistical analysis software) software when the individuals' longitudinal repeated measurements on a dependent variable
are taken over different periods of time. More complex repeated-measure designs are the ones that have
at least one independent between-subjects factor
(e.g., gender, grade, ethnicity) in addition to having
the individuals' longitudinal repeated measurements
on the dependent variable taken over different periods
of time (within-subject factor). This type of longitudinal design, with both within-subjects factors (repeated
measurements) and between-subjects factors (independent variables), can also be analyzed using factorial repeated-measures designs via SPSS.
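The entry names SPSS and SAS; as an illustration, an analogous one-way repeated-measures analysis can be run in Python with statsmodels. The data here are hypothetical, and note that statsmodels' AnovaRM handles within-subjects factors only.

```python
# Illustrative alternative to the SPSS/SAS route named above: a one-way
# repeated-measures ANOVA in statsmodels. Data must be in "long" format,
# one row per subject per time point (hypothetical values).
import pandas as pd
from statsmodels.stats.anova import AnovaRM

data = pd.DataFrame({
    "subject": [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4],
    "time":    ["t1", "t2", "t3"] * 4,
    "score":   [5, 6, 8, 4, 5, 7, 6, 7, 9, 5, 7, 8],
})

# "time" is the within-subjects factor; AnovaRM does not support
# between-subjects factors, for which a mixed model would be needed.
result = AnovaRM(data, depvar="score", subject="subject", within=["time"]).fit()
print(result)
```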
Using these traditional repeated-measures analytic
methods requires the complete longitudinal data,
where every individual has all the measurements for
all the time points with equal time intervals between
the repeated measurements. Missing longitudinal data
for some individuals at different time points or having
unequal time intervals between measurements poses
great complications for longitudinal researchers in
using these traditional statistical methods to analyze
the longitudinal data.
M

MAIL QUESTIONNAIRE

The term mail questionnaire refers to the instrumentation of a self-administered survey that has been laid out and reproduced on a paper-based printed medium with the intention that data collection operations will be implemented via traditional postal service deliveries. A survey researcher's solicitation and collection of data via postal communications that include a mail questionnaire is called a mail (or postal) survey.

In mail surveys, the initial delivery of the mail questionnaire is typically in the form of a survey packet. In addition to the mail questionnaire itself, this packet typically includes a cover letter explaining the purpose of the study and encouraging participation, a postage-paid pre-addressed return envelope, and some form of pre-paid incentive or participation gift intended as a social gesture of appreciation. (A crisp, new, uncirculated $1 bill is perhaps the most commonly used incentive device, particularly for commercially conducted mail surveys.) The survey packet often is preceded by an advance notification of some sort (e.g., a postcard informing the recipient that a research questionnaire will soon follow in the mail), and several reminder communications (which usually include one or more replacement questionnaire mailings to nonresponders) are typical.

Advantages

As a data collection methodology, mail questionnaire research offers several advantages. One advantage is low cost relative to costs for similar-quality surveys using interviewer-administered research methods. At one time, the mail survey was clearly the low-cost king, but the emergence of Internet-based surveys offers researchers a second lower-cost alternative.

Another important benefit of the use of mail questionnaires is that, when properly designed and executed, the data collected are generally of high quality. That is, the psychometric performance claimed for scale measures is typically realized. This is not surprising, since a substantial proportion of measurement scales commonly used in basic social research have been developed using self-administered paper questionnaires. Furthermore, the fact that the mail questionnaire is paper based may conjure up the feeling of an examination, resulting in relatively higher levels of respondent effort and attentiveness in filling out the form; indeed, there is evidence that other forms of self-administered interviewing, such as computer-based Web surveys or kiosk surveys, may not yield data of similar integrity.

A third advantage is that, when professionally and diligently implemented among sample cases that are accurately targeted to the sample population, mail questionnaire surveys can be expected to achieve response rates that are similar to or even higher than those that would be achieved by interviewer-administered methods.

Although many factors that can be manipulated by the researcher have been associated with the achievement of mail survey response rates (for example, type of postage used, content and appearance of the cover letter, and type and value of the incentive), the
Disadvantages
Despite the substantive and important benefits that
can be realized through the postal survey methodology, mail questionnaires also have many drawbacks.
Sampling Control. There is only a moderate level
of sampling control. While the researcher is able to
identify potential respondents by address, it must be
assumed that the intended respondent associated with
an address is the individual who actually completes
the mail questionnaire interview form.
Contingency Questions. Contingency questions, when
a particular answer to one question creates the need
for a skip pattern to another question later in the questionnaire, can be problematic. Thus, it is very important that researchers pretest how well skip instructions
can be followed by respondents, especially when
a skip leads a respondent to jump several pages ahead
in the questionnaire.
Corrections and Changes. The pre-printed nature
of the survey instrument offers little flexibility to the
researcher. Corrections and changes once a survey
field period starts, if possible at all, are difficult to
implement reliably. Likewise, though experimental
design manipulations are possible (for example, different ordering of questions or response alternatives across different randomly assigned respondents), they
MAIL SURVEY
A mail survey is one in which the postal service, or
another mail delivery service, is used to mail the survey materials to sampled survey addresses. What is
mailed usually consists of a cover letter, the survey
questionnaire, and other materials, such as a postage-paid return envelope, an informational brochure to
help legitimize the survey organization, detailed
instructions about how to participate in the survey,
and/or a noncontingent cash incentive.
In some mail surveys, it is the household or the
business at the address that is sampled, but in other
mail surveys it is a specific person at the address who
is sampled. In the case of a specific person being sampled, sometimes there is a specifically named person
(e.g., Martha Johnson) who is sampled and other
times it is the person with some specific characteristic,
such as "householder" or "Chief Information Officer." In most instances, respondents are asked to mail
back the questionnaire to the researchers once they
have completed it. Some mail surveys provide respondents with multiple modes to choose for their
response, including dialing into a toll-free telephone
number or going to an Internet site, to complete the
questionnaire, rather than mailing the questionnaire
back in the return envelope the researchers provide.
Reducing Error
Three major types of survey error should be guarded
against when conducting a mail survey. These are (1)
coverage bias, (2) unit nonresponse error, and (3) error
due to missing data (item nonresponse error).
Coverage Bias
Unit Nonresponse
The largest concern about mail surveys is unit nonresponse. In convincing sampled respondents to cooperate with a mail survey request, there generally is no
interviewer involved in recruiting respondents and thus
in persuading the reluctant ones to cooperate. As such,
almost all mail surveys that strive for reasonably good
response rates must do multiple follow-up mailings
to initial nonresponders. This adds to expense but is a
necessary technique for gaining good response from
a sample in a mail survey. Without follow-up mailings,
which obviously lengthen the field period of the survey, a likely majority of the sample will not respond at
all. If the nonresponders differ from responders in the
variables of interest to the researchers, then the larger
the nonresponse rate, the greater the extent of nonresponse bias in the survey estimates.
Item Nonresponse
Because a mail survey questionnaire is not administered by an interviewer, there is no one present at
the time data are being entered into the questionnaire
to persuade a respondent to answer all the questions
asked and to answer them fully and accurately. To
the extent certain questions are avoided (improperly
skipped) by respondents, and if those respondents
who do not provide accurate answers differ on the
variables from those respondents who do answer these
items, then missing data will bias survey estimates for
these variables.
Reminder mailings are likely to be the most important technique for producing high response rates in mail
surveys. Reminder mailings typically contain a modified
version of the cover letter, a new questionnaire, and
Oversight of the outgoing mailings includes managing all the printing that must be done and assembling
(stuffing) all the envelopes that must be mailed out. As
mindless an activity as stuffing envelopes can be, errors
due to carelessness can be frequent if the staff members assigned to this task are not attentive to what they
are doing. As part of quality control, whoever is overseeing the mailings should randomly sample outgoing
envelopes before they are mailed to determine the
quality with which they were assembled.
Oversight of how the incoming returns are processed is also extremely important. Each incoming
returned envelope must be dated and opened soon after
it arrives to determine what exactly was returned. That
is, not all returns will be completed questionnaires;
some respondents will return blank questionnaires with
or without indication of why they did so, and some will
contain returned cash incentives. Once it has been
determined what was returned in each envelope, this
information is logged into the database being used to
help track the survey's progress. Essentially this is
done on a daily basis for each day that incoming mail
is delivered throughout the field period, so that the
researchers can receive daily information about
whether the sample is performing as expected. In this
way, changes to the methodology can be made if necessary, for example, deciding to extend the field period
with another follow-up mailing or to change the incentive sent in the next follow-up mailing.
Isaac Dialsingh
See also Address-Based Sampling; Advance Contact;
Anonymity; Confidentiality; Contingent Incentives;
Coverage Error; Cover Letter; Gestalt Psychology;
Graphical Language; Missing Data; Noncontingent
Incentives; Quality Control; Questionnaire Design;
Questionnaire Length; Sampling Frame; Total Design
Method (TDM); Unit Nonresponse
MAIN EFFECT

Figure 1  Percent of item nonresponse by survey type (paper-and-pencil vs. computer-assisted) and mode of administration (self- vs. interviewer-administered); only mode of administration shows a main effect [figure not recovered]

Figure 2  Percent of item nonresponse by survey type and mode of administration; both factors show main effects [figure not recovered]
administration (Xi). There is no main effect for survey type because each level of that factor contains identical amounts of item nonresponse (Xp and Xc).
Figure 2 shows a slightly more complex relationship, with main effects for both survey type and mode of administration. The main effect for the factor survey type shows that paper-and-pencil surveys have a greater amount of item nonresponse (Xp) when collapsed across mode of administration than the amount of item nonresponse in computer-assisted surveys (Xc). The main effect for the factor mode of administration shows that self-administration results in more item nonresponse (Xs) than interviewer administration (Xi) when collapsed across levels of survey type.
Main effects are not determined by merely eyeballing graphs, however. Identifying main effects requires statistical calculations examining whether the differences between levels of a factor on the dependent variable are greater than would be expected due to chance. The statistical calculation examining main effects results in an F-statistic that is computed by dividing the variance between levels of the factor by the variance within levels of the factor. The Statistical
Package for the Social Sciences (SPSS) and SAS are
two statistical software packages that can compute
main effects using factorial analysis of variance and
will inform the user whether there are statistically significant main effects.
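As a sketch of the calculation described above, a one-way F test for the mode-of-administration factor might look like the following; the item-nonresponse percentages are hypothetical replicate measurements for each level of the factor.

```python
# Illustrative sketch (hypothetical data): testing one factor's main effect
# on item nonresponse with a one-way ANOVA F statistic.
from scipy import stats

self_administered = [8.1, 7.6, 8.4, 7.9]         # % item nonresponse
interviewer_administered = [1.2, 0.8, 1.1, 0.9]

f_stat, p_value = stats.f_oneway(self_administered, interviewer_administered)
print(f"F = {f_stat:.1f}, p = {p_value:.4f}")  # a large F -> main effect
```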
In addition to main effects, factors can interact with
one another to produce an interaction. If an interaction
is present, caution should be used when interpreting
main effects.
Dennis Dew
See also Analysis of Variance (ANOVA); Dependent Variable;
Experimental Design; Factorial Design; F-Test; Independent
Variable; Interaction Effect; Random Assignment; SAS;
Statistical Package for the Social Sciences (SPSS)
MARGINALS
As it applies to survey research, a marginal is a number at the margins (at the edge or perimeter) of
a cross-tabulation table of two or more variables. Statistical software that is used by survey researchers,
such as SAS and the Statistical Package for the Social
Sciences (SPSS), routinely creates cross-tabs with the
marginals showing as the default setting.
Table 1 shows a cross-tabulation between two
variables, educational attainment (Not High School Grad, High School Grad, College Grad) and belief in the existence of extraterrestrial life (Believe, Not Sure, Do Not Believe), from a survey conducted in
1996. This table displays the number of respondents
(i.e., absolute frequency counts) that fall into each
of the conditions (cells) shown in the table. The
marginals in the Total column show the number of
Table 1

                   Not HS Grad   HS Grad   College Grad   Total
Believe                 35          247         138         420
Not Sure                25          146          77         248
Do Not Believe          26          114          31         171
Total                   86          507         246         839

Source: Buckeye State Poll, December 1996; Ohio State University Center for Survey Research.
Table 2

                 Not HS Grad (%)   HS Grad (%)   College Grad (%)   Total (%)
Believe            8.3 / 40.7      58.8 / 48.7     32.9 / 56.1        50.1
Not Sure          10.1 / 29.1      58.9 / 28.8     31.0 / 31.3        29.6
Do Not Believe    15.2 / 30.2      66.7 / 22.5     18.1 / 12.6        20.4
Total             10.3             60.4            29.3              100

Note: Each cell shows the row percentage followed by the column percentage; the Total column gives each belief category's share of all 839 respondents.

Source: Buckeye State Poll, December 1996; Ohio State University Center for Survey Research.
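As an illustration, the counts in Table 1 and the row percentages in Table 2 can be reproduced with pandas (variable names here are hypothetical, not from the entry):

```python
# Illustration: rebuilding Table 1's counts and marginals, and Table 2's
# row percentages (up to rounding), with pandas.
import pandas as pd

counts = pd.DataFrame(
    {"Not HS Grad": [35, 25, 26],
     "HS Grad": [247, 146, 114],
     "College Grad": [138, 77, 31]},
    index=["Believe", "Not Sure", "Do Not Believe"],
)
counts["Total"] = counts.sum(axis=1)   # row marginals
counts.loc["Total"] = counts.sum()     # column marginals

row_pct = counts.div(counts["Total"], axis=0).mul(100).round(1)
print(counts)
print(row_pct)

# pd.crosstab(rows, cols, margins=True) produces the same layout, with the
# marginals in an "All" row and column, directly from respondent-level data.
```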
MARGIN OF ERROR (MOE)
effect (deff). Accordingly, there are a number of different ways to determine the MOE, all of which are
dependent upon the particular sampling design used.
Although these equations change, the essential importance of the sample size does not.
Often, the MOE is incorrectly interpreted. For
instance, the results of the Pew poll did not indicate
that Bush and Kerry were statistically tied. Nor did
they indicate that Nader could have received 0% of the
vote, even though the MOE on the Bush percentage
was ±2.5%; that is because there was a separate (and
smaller) MOE on the Nader percentage. They also did
not imply that a lead mattered only if it was greater
than 2.5%. The MOE indicated only that the expected
true percentage for the population of likely voters
would be within the MOE of the poll's reported percentage X% of the time, where X represents the chosen
confidence level.
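A minimal sketch of the simple-random-sampling formula with an optional design-effect adjustment (the function and its defaults are illustrative assumptions, not the entry's own notation):

```python
# Sketch: MOE for an estimated proportion under simple random sampling,
# with an optional design effect (deff) to approximate more complex
# designs. z = 1.96 corresponds to a 95% confidence level.
import math

def margin_of_error(p, n, z=1.96, deff=1.0):
    return z * math.sqrt(deff * p * (1 - p) / n)

# A candidate polling 46% among n = 1,500 likely voters:
print(round(100 * margin_of_error(0.46, 1500), 1))  # about 2.5 points
```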
James W. Stoutenborough
See also Confidence Interval; Confidence Level; Design
Effect (deff); Multi-Stage Sample; N; Poll; Pollster;
Population; Random Sampling; Respondent; Sample
Size; Sampling Error; Simple Random Sample;
Total Survey Error (TSE); Variance
MASS BELIEFS
The concept of mass beliefs refers to the norms,
attitudes, and opinions held by the general public as
opposed to those held by elites (e.g., politicians, journalists, and scholars). The term does not imply that
all members of the public (masses) hold the same
beliefs, but rather that certain beliefs are held in common by subsets of the general citizenry that are nonnegligible in size. Nowadays, surveys and polls often
However, most Americans were not believed to possess an ideology. Philip Converse said in the early
1960s that mass beliefs as expressed through ideology
were largely unimportant in American politics and that
numbers alone did not create political power. His view
did not rule out ideology as a driving force. While
ideology was not a mass characteristic, it was characteristic of political elites, who behaved accordingly,
using central ideological principles and information
that logically related to the exercise of power. Because
ideology did not apply to everyone, the idea of
constraint was advanced. Constraint referred to
the extent to which an individual's beliefs were interrelated and the degree to which one belief could be predicted if another were known. Disputed research
analyses from the 1970s, relying on data collected
during turbulent times that stimulated new issues (the
Vietnam War, Watergate, race conflicts, urban crises,
economic instabilities), suggested that a newer generation of voters generally was more aroused. Contributory factors were said to include weakening party
identification, dissatisfaction with political processes,
increases in education that correlated with ideological
perspectives, and increases in coherent and consistent
issue voting.
Challenges to the hypothesis of minimal constraint
in mass belief systems (unconnected beliefs, ideas,
attitudes, and positions across issues) have been
voiced since the 1980s. Those against the hypothesis
argued that voters reason about parties, issues, and
candidates and make inferences from observations
gleaned from mass media, political party campaigns,
and informal opinion leaders. Knowledge about the
past and present and projections to the future play into
voter decisions. Compared to elites, such reasoning
probably is based on lower levels of information and
political understanding, because voters use informational shortcuts to simplify thought processes. Recent
theory downplays the electorate's mass nature
focuses on a more educated polity that pays attention
to issues that are personally relevant and well thought
out. Converse published a rejoinder in 2000, noting
that the average level of electorate information was
low, but there was high variance that could be
explained by ability, motivation, and opportunity. He
pointed out that degree of ideology and information
diminished rapidly as one moved from the elite to the
masses, except for lingering affective traces. Today,
the conceptualization of voters as motivated by issue
interest groups and ideological stances has led to
more concern for how a diverse electorate is fragmented or compartmentalized (for example, along racial
lines). Fewer theoreticians view society as divided
dichotomously into power elite versus the masses.
Whether or not the population is passive or active
has been a key consideration in the modern variant of
the earlier debate. Widespread adoption of converging
computer, Internet, and media technologies has given
individuals and groups the potential freedom to interact
on a global scale, freeing them from a strictly consumer role to a producer role that is equally (or more)
attractive. In contemporary debate, Robert D. Putnam's "bowling alone" perspective sees socially isolated people as lacking social capital and sometimes uncritically
accepting whatever is suggested, adopting the course
of least resistance. Another voice articulates the vision
of a somewhat antiauthoritarian brave new virtual
community of Internet enthusiasts who dynamically,
voluntarily, critically, and cooperatively search for and/
or produce particularized, selected cognitive information and who relish a variety of emotional sensations.
Whether one or the other critical perspective will be
vindicated depends in large part on whether future
communication technologies are controlled by elite,
powerful, centralized, and top-down organizations and
institutions or by a decentralized system that allows
access so that ordinary users can build social capital,
become more energized in social processes, and can
exercise more individual and collective power.
Ronald E. Ostman
See also Attitudes; Political Knowledge; Public Opinion;
Social Capital
MATCHED NUMBER
A matched telephone number is one that has a mailing
address associated with it. Typically, it also has a name
matched to it. The majority of matched telephone numbers are also listed telephone numbers, but some are
unlisted. Unlisted numbers (those not listed with directory assistance or published in any local telephone
book) can be matched to an address (and possibly
a name) because the commercial vendors that perform
the matching use databases that contain some unlisted
telephone numbers with addresses and names, such as
those that can be retrieved from public records in many
states (e.g., vehicle registration lists, public tax bills,
and other public records and databases). However, this
matching process is not 100% reliable, since people
often move or change their telephone numbers.
Whether or not a telephone number can be matched
is predictive of the likelihood that a completed interview will be attained with that household in a telephone
survey. A greater proportion of interviews are completed with numbers that are matched than are completed with unmatched numbers. A primary reason for
this is that matched numbers have an address associated with them. As such, researchers can send advance
mailings to these households when they are sampled
for a telephone survey to alert them ("warm them up")
to the fact that an interviewer will be calling them.
Advance letters with as small a cash incentive as $2
have been found to raise cooperation rates by approximately 10 percentage points in general population
telephone surveys in the United States. Another important reason that cooperation rates in telephone surveys
are higher for matched numbers is that those whose
numbers are able to be matched are generally less likely
to regard a telephone interviewer contacting them as an
invasion of their privacy.
On average, matched telephone numbers require
fewer callbacks than unmatched numbers to reach
a proper final disposition. Thus, the calling rules used
by a survey center to process matched numbers should
differ from the rules used to process unmatched numbers. However, unless a survey center has its telephone
samples screened for matched/unmatched status or
receives this information for each number in the sample from its sample vendor, it will not be possible for
the survey center to take the matched/unmatched status
into account as its computer-assisted telephone interviewing (CATI) system processes the callback attempts.
Paul J. Lavrakas
See also Advance Contact; Advance Letter; Calling Rules;
Cold Call; Computer-Assisted Telephone Interviewing
(CATI); Listed Number; Random-Digit Dialing (RDD);
Telephone Surveys
MEAN
The mean is a descriptive statistic that survey researchers commonly use to characterize the data from their
studies. Along with the median and mode, the mean
constitutes one of the measures of central tendency, a general term for a set of values or measurements located at or near the middle of the data set. The arithmetic mean is the most commonly used measure of central tendency and is what is commonly referred to as the average of the data values. The mean is calculated by taking the sum of the data set and dividing by the number of values in the data set.
MEAN SQUARED ERROR (MSE)

The mean squared error (MSE) is the average squared difference between the predicted parameter and the observed parameter. Formally, this can be defined as

MSE(θ̂) = E[(θ̂ − θ)²] = Var(θ̂) + [Bias(θ̂)]².

This decomposition allows the researcher to speak in terms of the variance explained by the model and the variance left to
random error. A model that has a nonzero bias term
can be somewhat problematic since the MSE value
serves as the basis for the coefficient standard error,
which is then compared to the coefficient magnitude
to create the t statistic. A biased MSE can affect
these estimates in many ways.
A positive bias term implies that the estimated
value is higher than the true value, ultimately drawing
the t statistic closer to zero, resulting in an increase
in Type II error. A negative bias term implies that
the estimated value is lower than the true value,
which pushes the t statistic away from zero, resulting in an increase in Type I error. Additionally, a
relatively low MSE value does not necessarily imply
that the parameter estimate is unbiased, since a relatively high bias term can be compensated for by
a minimal variance in the estimated parameter. All
of these things should be kept in mind when using
the MSE value for variable selection and model
comparison.
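The decomposition can be illustrated with a short simulation (all values hypothetical): a deliberately biased estimator of a population mean whose empirical MSE matches its variance plus its squared bias.

```python
# Illustrative simulation (hypothetical values) of MSE = variance + bias^2
# for a deliberately biased estimator of a population mean.
import random

TRUE_MEAN, BIAS, N_TRIALS = 10.0, 0.5, 20_000

estimates = []
for _ in range(N_TRIALS):
    sample = [random.gauss(TRUE_MEAN, 2.0) for _ in range(25)]
    estimates.append(sum(sample) / len(sample) + BIAS)  # biased estimator

mean_est = sum(estimates) / N_TRIALS
variance = sum((e - mean_est) ** 2 for e in estimates) / N_TRIALS
mse = sum((e - TRUE_MEAN) ** 2 for e in estimates) / N_TRIALS
print(mse, variance + (mean_est - TRUE_MEAN) ** 2)  # nearly identical
```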
However, when determining how well a statistical
model fits the data, MSE can be a valuable tool,
because it allows one to calculate the average error that
the parameter estimate produces, which can then be
partitioned into the variance of the estimated parameter
and some bias term. With MSE, one can compare one's present model error to the error that one would expect given the data, which is useful for interpreting a model's
explanatory power as well as comparing it to other
models that attempt to achieve the same end. Ultimately, MSE can be used to help minimize the errors
of a given model and is one of many tools that survey
researchers and other social scientists use to conduct
meaningful quantitative research.
Bryce J. Dietrich
See also Confidence Interval; Random Error; Significance
Level; Standard Error; Standard Error of the Mean;
t-Test; Type I Error; Type II Error; Unbiased Statistic;
Variance
MEASUREMENT ERROR
Measurement is the assignment of symbols, usually
numbers, to objects according to a rule. Measurement
involves both creating a rule and making assignments.
The symbols to be assigned represent attributes of the
object. Error in measurement is any deviation of the
assigned symbol from the true value that should
be designated to the object. A term that is used to
refer to how accurately something is measured is construct validity.
For example, a researcher might want to measure
a person's level of education. In this case, the person
is the object and level of education is the attribute for which the researcher wants a value assigned
to each object. The goal of measurement is to assign
to the person a symbola numberthat represents
her or his true educational attainment. In order to
achieve this goal, the researcher needs first to define
education and its range of values. Then the researcher
needs to devise a method to designate a value of education for the person. There are myriad ways to do
this, including observing the person's dress or behavior, documenting the vocabulary the person uses in everyday discourse, retrieving information from school records, testing the person's knowledge of various
subjects, or asking the person to report how many
years of schooling she or he completed. The information obtained then is converted to a value or category
of education.
For example, a test of knowledge to measure educational attainment may cover all significant areas of
knowledge, but the individual questions in the test
instrument may be confusing. Or, the questions may be
appropriate, but the way in which scores are combined
in categories may give undue weight to some sorts of
knowledge. Any of these sources of error (mismatch between the concept and the operational definition, ill-formed observational techniques, or misguided scoring methods) can affect the validity of a measure. For
a measure to be truly valid, it must be error-free. In
practice, however, the best researchers can claim is that
one measure is more valid than others. No measure is
completely error-free.
The types of error affecting a measure's validity can be divided into two classes: systematic error (or
bias) and random error (or variance). In order to identify these types of error, researchers need to be able to
observe how a measure performs in repeated trials
with the same people in the same conditions. Systematic error is the tendency for a measure to produce
scores that are consistently different from the true
score in one way or another. A measure is biased if,
in repeated applications, its scores tend to deviate
from the true score in one direction. This might occur,
for example, if a particular measure of education
tended to systematically over- or underestimate the
level of attainment. Another form of systematic error
occurs when the scores produced by a measuring
technique have errors that are correlated with the true
value of the variable. This might occur if a measure
of education tended to consistently underestimate the
values for people with higher educational attainment
and to consistently overestimate the values for people
with lower levels.
A measure is affected by random error if, in
repeated trials, its scores deviate (vary) from the true
score with no consistent pattern. The results of
repeated applications produce scores that are all over
the map, not clustering at one level or another. The
less a measure is subject to random error, the more
reliable it is said to be. Reliable measures are ones
that tend to produce consistent scores over repeated
trials, even if those scores are not actually valid ones.
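A short simulation can make the contrast concrete (all values hypothetical): a biased measure deviates consistently in one direction, while a measure subject to random error scatters around the true value.

```python
# Sketch (hypothetical values): simulated repeated measurements of one
# true value, contrasting systematic error (bias) with random error.
import random
import statistics

TRUE_VALUE = 12  # e.g., true years of schooling

biased = [TRUE_VALUE + 2 + random.gauss(0, 0.3) for _ in range(1000)]
noisy = [TRUE_VALUE + random.gauss(0, 3.0) for _ in range(1000)]

# The biased measure is reliable (low spread) but consistently wrong;
# the noisy measure is right on average but unreliable.
print(statistics.mean(biased), statistics.stdev(biased))  # ~14.0, ~0.3
print(statistics.mean(noisy), statistics.stdev(noisy))    # ~12.0, ~3.0
```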
Summarizing terms to this point, a measure is valid
to the degree that it is error-free. Two types of error
are threats to validity: systematic and random. Systematic error (bias or correlated error) decreases the
validity of a measure because the scores produced are
consistently wrong. Random error decreases validity
respondents. If respondents need to sort through possible meanings of what is being asked, they will need
to expend more effort. If they fail to do the extra
work, erroneous responses can result.
Task difficulty (i.e., respondent burden) must be
considered when constructing questions. Independent
of the words used or the syntax of the question, the
kind of information requested will affect how well
respondents perform. Some questions ask respondents
to recall events in their lives. The further removed in
time these events are, the less likely respondents will
be to recall them or to report them within the correct
time frame. This is particularly true of events that are
not salient. For example, asking a respondent what he
or she purchased at the grocery store a month ago presents a task that is very difficult if not impossible
to perform accurately. Asking an adult to report on
events that occurred in childhood is another example
of a task that is fraught with difficulty.
Apart from burdens on memory, there are other
sorts of task difficulty that can be posed by questions.
For example, some survey items ask respondents to
perform quasi- or actual arithmetical calculations (for example, to report what percentage of their time they spend on different activities, or to report their net worth). To answer such questions with any degree of
accuracy would require substantial time and calculation. A single survey question does not provide the
requisite conditions for an accurate response.
Another form of difficulty is posed by questions
that ask respondents to report on behaviors or feelings
that are socially disapproved. It is harder to admit to
being lazy or to harboring prejudicial feelings than
it is to report on completed projects and good will
toward others, especially when a questionnaire is
administered by an interviewer. Generally speaking,
saying "Yes" (i.e., acquiescing) may be easier than saying "No" for some people. Norms of social desirability may also vary by culture, so task difficulty
needs to be anticipated with this factor in mind.
Questions also present respondents with different
kinds of response alternatives or options that may lead
to measurement error. Open-ended questions require
respondents not only to retrieve and organize relevant
information but also to express themselves in ways
that they think are responsive to the queries. It is frequently argued that respondents should be allowed to
speak for themselves in this way rather than being
confined to selecting a response from a list of categories. But the freedom to formulate a response to an
open-ended question can involve considerable cognitive effort on the part of a respondent, particularly if
the question concerns a topic that is not familiar.
Articulation of an open-ended response will also be
more or less difficult for people who are taciturn or
gregarious in general.
Closed-ended questions ask respondents to choose
a response from a list of pre-set categories. In general,
this task may be an easier one for respondents than formulating an open-ended response, because it is clearer
what sort of answer is expected. On the other hand,
respondents can be influenced in their response selection by the range of alternatives offered. Additionally,
the list of options may be discordant with the body, or
stem, of the question. For example, a question may ask
a respondent how frequently he or she has what is,
in general, a very rare experience. Respondents may
also find themselves without a category that appears
to correctly map onto their experience or feelings.
Some respondents may believe their response best
fits between the categories that are offered to choose
from. Finally, offering or withholding a "Don't know"
option for selection among the list of response alternatives can have a profound effect on the number of
respondents who will provide this response.
Individual questions are set among others in a questionnaire. The context within which a given question
appears can affect responses to it. For example, if
respondents are asked for their attitude toward a particular government policy in a series of questions, their
responses to later items may assume that the earlier
answers are taken into account. Another sort of context
effect can occur when questions placed later in a long
questionnaire do not receive the same level of attention
that the lead-off items got. Fatigue may lead respondents to minimize effort in the later questions.
These are some of the ways in which question
wording, form, and context can lead to measurement
error. The way in which questions are communicated
(i.e., the survey mode of data collection) to respondents
can also have an effect on measurement validity.
Respondents can encounter questions in a face-to-face
interview, a telephone contact (either via an interviewer
or Interactive Voice Response technology), a paper-and-pencil form, a laptop computer instrument, or an
Internet survey. When respondents speak directly to
interviewers, their answers may be affected by what
they think the interviewer expects or will approve.
When respondents complete a questionnaire without
interviewer involvement, their responses can be shaped
by the layout of and graphics used in the self-administered form. Both interviewers and respondents can
make errors in the way that they record responses (for example, failing to record responses verbatim or ignoring a questionnaire skip pattern).
Measurement error can also occur when survey
responses are processed at the end of a study.
Responses to open questions need to be summarized
in categories. The coding process can misplace
a respondent's intended answer or it may not find
a place for it at all. Achieving reliable placement of
similar responses across coders can be an arduous
and expensive process. Closed questions have many
fewer coding problems, but they are not immune to
data entry errors.
A substantial part of the field of survey methodology is about identifying, reducing, and/or correcting
measurement error. Given the complexity of survey
investigation, involving the abstract realm of conceptualization and the intricate practicalities of getting
information from respondents, confronting measurement error is an enormous task. But while measurement error will never be eliminated, it is becoming
better understood. And the study of the phenomenon
itself provides much insight into human behavior.
Understanding measurement errors teaches us truths
about how people think and communicate.
Peter V. Miller
See also Acquiescence Response Bias; Bias; Closed-Ended
Question; Coder Variance; Coding; Cognitive Aspects
of Survey Methodology (CASM); Construct Validity;
Don't Knows (DKs); Forced Choice; Gestalt
Psychology; Graphical Language; Interviewer-Related
Error; Mode of Data Collection; Mode-Related Error;
Open-Ended Question; Primacy Effects; Questionnaire
Design; Questionnaire Length; Questionnaire-Related
Error; Question Order Effects; Random Error; Recency
Effects; Record Check; Respondent Burden;
Respondent Fatigue; Respondent-Related Error;
Response Alternatives; Self-Reported Measure;
Social Desirability; Systematic Error; Total Survey
Error (TSE); True Value; Validity; Variance
MEDIAN
Median is a descriptive statistic that researchers commonly use to characterize the data from their studies.
Along with the mean (average) and mode, the median
constitutes one of the measures of central tendency, a general term for a set of values or measurements
located at or near the middle of the data set. The
median is calculated by sorting the data set from the
lowest to highest value and taking the numeric value
occurring in the middle of the set of observations. For
example, in a data set containing the values 1, 2, 3, 4,
5, 6, 7, 8, and 9, the median would be the value 5 as
it is the value within the data set that appears in the
middle, with four observations less than and four
observations greater than the median value. The
median can also be thought of as the 50th percentile.
It is possible that a data set can have a median that
is not a specific observation within the data set. This
happens when the data set has an even number of
observations. In this instance, the median would be
the mean of the two middle numbers. For example, in
a data set containing the values 1, 2, 3, 4, 5, and 6,
the median would fall between the values 3 and 4. In
this instance, the median would be 3.5. There are
three observations less than and three observations
greater than the median value.
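A minimal sketch of the two worked examples above, using Python's statistics module:

```python
# Minimal sketch of the entry's two examples: the median of an odd-sized
# data set is the middle value; with an even count it is the mean of the
# two middle values.
import statistics

print(statistics.median([1, 2, 3, 4, 5, 6, 7, 8, 9]))  # 5 (middle value)
print(statistics.median([1, 2, 3, 4, 5, 6]))           # 3.5 (mean of 3 and 4)
```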
Unlike the mean, the median is not influenced by
extreme outlying data points within the data set. For
instance, in a response to a survey question about
annual personal income, if one respondent reports an
income that is 10 times greater than the next closest
person, this respondent would be an outlier and would
skew the mean value upward. However, the median
would be unaffected by this outlier and would more
accurately represent the middle of the data set. Thus,
the median is often used when a data set has outlying
data points that could influence the mean and thereby
misrepresent the middle of the data set. This also is
common in survey questions on home prices or issues
related to costs and finances, when extreme outliers
can dramatically affect the mean value. In this instance,
MEDIA POLLS
News organizations conduct or sponsor public opinion
research as part of their ongoing news coverage,
including but not limited to election campaign coverage. Media polls, also called news polls, have
attained wide prevalence as a reporting tool but also
remain a focus of debate and occasional controversy.
Media polls are a central part of what Philip Meyer
has explained as "precision journalism."
News may be defined as timely information, professionally gathered and presented, about events and
conditions that affect or interest an audience. It can
include what people do and also what people think.
Polls provide a systematic means of evaluating both,
elevating anecdote about behavioral and attitudinal
trends into empirically based analysis.
News organizations long have reported, and continue to report, characterizations of public preferences without the use of polls, relying on expert
(and sometimes inexpert) evaluations, informed (and
sometimes uninformed) speculation, punditry, proselytizing, and conventional wisdom. Rigorous polling
improves on these.
The media also have reported polls (rigorous and
otherwise), whether provided by outside sources (e.g.,
government, academic, or corporate entities, interest
groups, and public relations firms); syndicated or circulated by independent polling companies (in some cases
for promotional purposes); or self-initiated. Only polls
in the last category are classified as media polls, as
opposed more broadly to polls reported by the media.
Media polls may best be understood as a means of
covering a news beat: the beat of public opinion. In
a process that in many ways closely reflects other news
Neither has a strong empirical foundation; candidate leads in fact change hands, and turnout rises and
falls from jurisdiction to jurisdiction independently of
the reported standings in any one contest on the ballot. Nor does denying the public information that is
readily available to the campaigns and related interest
groups seem preferable. (The media's tendency to focus on the "horse race" in covering election polls, to
the exclusion of richer evaluative data, is a more persuasive concern.)
Broader criticisms suggest that media polls may
manufacture opinions by measuring attitudes that in
fact are nonexistent, lightly held, or ill founded; that
they selectively accentuate issues through their choice
of topics; or that they misreport attitudes through inadequate sampling methodology, ill-constructed questions, or ill-conceived analysis. None of these is
specific to media polls per se, but to all polls; and all,
again, can be answered. The absence of opinions on
given questions can be expressed and tabulated; measurements of strength of sentiment can be included;
choice of topics to cover is a feature of all news reportage, not news polls alone; and the quality of output is
an individual- or product-level matter rather than an
indictment of the enterprise overall.
However, media polls, like all polls, require careful
scrutiny of their methodological and analytical bona
fides. Indeed, given the credibility they lend data,
news organizations that conduct or sponsor polls have
a special responsibility to uphold the highest possible
standards of methodological rigor. Some acquit themselves well. Others fall short.
A related aspect of media involvement in polls is
in standards and vetting operations, in which news
organizations set basic disclosure requirements and
methodological standards for the survey research they
will report, then undertake concerted efforts to ensure
that any data under consideration meet those standards. This is a vital function, too long and still too
frequently avoided, to ensure the integrity of news
reports that incorporate polls and other data.
Ultimately, perhaps the best rationale for media polls
stems from the fundamental premise of an independent
news media. Non-news organizations often conduct
polls in their attempts to influence the public discourse,
and they will continue to do so, regardless of whether or not news organizations conduct their own polls. Political campaigns will measure their candidates' standings and the attitudes behind those preferences and
use those data to direct and sharpen their messages and
METADATA
There is no single definition that adequately describes
metadata, though it often is referred to as "data about data" or "a description of data." In other words, metadata is a set of highly structured and/or encoded data
that describes a large set of data. It explains the data
to be collected, processed, and published and answers
questions regarding every facet of the documented
data. The data described can be an individual data item or a collection of data items, with the primary purpose of managing, understanding, and facilitating the use of the data. Most
important, metadata describes diverse data products
by emphasizing the similarities between them, thus
allowing people to understand the diverse data a
certain organization has produced.
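As a concrete sketch, variable-level metadata for a single survey item might be represented as a structured record like the following (the field names are illustrative assumptions, loosely modeled on codebook conventions rather than on any formal standard):

```python
# A hypothetical variable-level metadata record for one survey item.
# Field names are illustrative, not drawn from a formal standard.
income_metadata = {
    "name": "hh_income",
    "label": "Total household income, past 12 months",
    "type": "numeric",
    "units": "US dollars",
    "valid_range": (0, 9_999_996),
    "missing_codes": {9_999_998: "Don't know", 9_999_997: "Refused"},
    "universe": "All responding households",
    "collection_mode": "CAPI",
    "first_wave_collected": 2004,
}
```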
Further Readings
METHODS BOX
A methods box is a short news story or sidebar that
accompanies poll stories and provides methodological
details and clarifications about the survey, including
how the respondents were sampled, how the interviews were conducted, the process of weighting the
results, and the survey's possible error.

Many newspapers include a boxed feature (sometimes in a smaller type size) alongside a major poll story, and most television networks include the equivalent on their Web sites with details about their survey methodology. For example, it is policy at The New York Times that a methods box accompanies all of its poll stories.
Further Readings
MINIMAL RISK
Minimal risk is a concept that relates to the protection
of human subjects and thus to survey ethics.

Figure 1. Minimal risk versus nonminimal risk as the probability of harm increases

It is up to each individual IRB to determine the thresholds for likelihood of harm and severity of harm (should harm occur) that move a research program beyond the realm of minimal risk.
It is generally understood that, for several reasons,
a system for classifying research as minimal risk is
a useful and important function of an IRB. These reasons include the following:
1. It helps the IRB be more efficient and effective
in its oversight of research activities across its organization. Research programs not designated as minimal
risk are deemed most appropriate for more focused
oversight attention than minimal risk research because
they have been identified to entail greater likelihoods
of causing harm to human research participants.
2. It identifies those research programs suitable for
expedited review by an IRB. Research protocols or
changes to the protocol requested by the investigator
that are reviewed on an expedited basis are not
reviewed by the full membership of the IRB, but
rather by the IRB director alone. (Research classified
as minimal risk need not be reviewed on an expedited
basis, however, as an IRB director always has the prerogative to refer a matter to the full board for review.)
3. It identifies those research studies for which the
IRB is permitted to allow a modification to the normal
requirements for obtaining informed consent. Specifically, minimal risk status is one of the four requirements under Title 45, Subpart A, Section 46.116, Paragraph (d) of the Code of Federal Regulations that render a research program eligible for an IRB to consider approving the elimination of one or more of the normally required elements of informed consent.
MISREPORTING
Misreporting is the deliberate or nondeliberate reporting of inaccurate or untruthful answers to survey questions. It is often referred to as response error. While
survey researchers may attempt to gather accurate and
truthful responses, respondents are not always willing
or able to comply. Misreporting is a major concern for
data collected about sensitive topics such as abortion,
prejudice, sexual behavior, and income. Misreporting
can also occur in the case of nonthreatening questions.
Collecting inaccurate and untruthful responses limits
the validity of conclusions that can be drawn from survey data.
Respondents may be motivated to deliberately misreport answers when asked sensitive topic questions about behaviors or attitudes for four main reasons: (1) social desirability concerns, (2) protection of the respondent's own self-concept, (3) embarrassment, and (4) fear that unauthorized disclosure may cause harm. When respondents answer questions based on one of these types of motivation, attitudes and behaviors that are socially desirable tend to be overreported, and those that are socially undesirable tend to be underreported.
Further Readings
MISSING DATA
An important indicator of data quality is the fraction
of missing data. Missing data (also called item nonresponse) means that for some reason data on particular items or questions are not available for analysis.
In practice, many researchers tend to solve this problem by restricting the analysis to complete cases
through listwise deletion of all cases with missing
data on the variables of interest. However, this results
in loss of information, and therefore estimates will be
less efficient. Furthermore, there is the possibility of
systematic differences between units that respond to
a particular question and those that do not respond, that is, item nonresponse error. If this is the case, the
basic assumptions necessary for analyzing only complete cases are not met, and the analysis results may
be severely biased.
Modern strategies to cope with missing data are
imputation and direct estimation. Imputation replaces
the missing values with plausible estimates to make
the data set complete. Direct estimation means that all
available (incomplete) data are analyzed using a maximum likelihood approach. The increasing availability
of user-friendly software will undoubtedly stimulate
the use of both imputation and direct estimation
techniques.
However, a prerequisite for the statistical treatment
of missing data is to understand why the data are missing. For instance, a missing value originating from
accidentally skipping a question differs from a missing
value originating from reluctance of a respondent to
reveal sensitive information. Finally, the information
that is missing can never be replaced. Thus, the first
goal in dealing with missing data is to have none.
Prevention is an important step in dealing with missing data. Reduction of item nonresponse will lead to
more information in a data set, to more data to investigate patterns of the remaining item nonresponse
and select the best corrective treatment, and finally
to more data on which to base imputation and a correct analysis.
A basic distinction is that data are (a) missing completely at random (MCAR), (b) missing at random
(MAR), or (c) not missing at random (NMAR). This
distinction is important because it refers to quite
different processes that require different strategies in
data analysis.
Data are MCAR if the missingness of a variable is
unrelated to its unknown value and also unrelated to
the values of all other variables. An example is inadvertently skipping a question in a questionnaire. When
data are missing completely at random, the missing
values are a random sample of all values and are not
related to any observed or unobserved variable. Thus,
results of data analyses will not be biased, because
there are no systematic differences between respondents and nonrespondents, and problems that arise are
mainly a matter of reduced statistical power. It should
be noted that the standard solutions in many statistical
packages, those of listwise and pairwise deletion, both assume that the data are missing completely at random.
Three main patterns can be discerned in item missing data: (1) the data are missing systematically by
design (e.g., contingency questions); (2) all the data are
missing after a certain point in the questionnaire (partial
completion); and (3) data are missing for some questions for some respondents (item nonresponse).
Missing by Design
In this case, all questions are applicable to all respondents, but for reasons of efficiency not all questions
are posed to all respondents. Specific subsets of questions are posed to different groups of respondents,
often following a randomized design in an experiment
(i.e., random assignment) that makes the missingness
mechanism MCAR. Again, since the missingness
mechanism is accessible, the incomplete data can be
handled statistically and the analyses give unbiased
results.
Partial Completion
The default options of statistical software are usually listwise or pairwise deletion or some simple imputation technique such as mean substitution. These
solutions are generally inadequate. Listwise deletion
removes all units that have at least one missing value
and is clearly wasteful because it discards information.
Pairwise deletion removes cases only when a variable
in a specific calculation is missing. It is less wasteful
than listwise deletion, but it can result in inconsistent
correlation matrices in multivariate analyses, because
different elements in the correlation matrix may be
based on different subsamples. Simplistic imputation
techniques (e.g., mean substitution) often produce
biased point estimates and will always underestimate
the true sampling variances. Listwise and pairwise
deletion and simple imputation are likely to be biased,
because these methods are all based on the strong
assumption of MCAR, which seldom is warranted.
Therefore, the best policy is to prevent missing data as
much as possible and, when they do occur, to employ an appropriate imputation method. In the hot-deck method, the data file is sorted into a number of imputation classes according to a set of auxiliary variables. Missing values are then replaced by observed values taken at random from other respondents in the same imputation class.
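As a concrete illustration, the following minimal sketch (hypothetical field and function names) implements a random within-class hot deck in Python:

```python
import random
from collections import defaultdict

def hot_deck_impute(records, impute_var, class_vars, seed=0):
    """Replace missing values (None) of impute_var with values drawn
    at random from observed donors in the same imputation class.
    Classes are defined by the auxiliary variables in class_vars."""
    rng = random.Random(seed)

    # Pool the observed (donor) values within each imputation class.
    donors = defaultdict(list)
    for rec in records:
        if rec[impute_var] is not None:
            key = tuple(rec[v] for v in class_vars)
            donors[key].append(rec[impute_var])

    # Replace each missing value with a random donor from its class.
    for rec in records:
        if rec[impute_var] is None:
            key = tuple(rec[v] for v in class_vars)
            if donors[key]:  # leave the value missing if no donor exists
                rec[impute_var] = rng.choice(donors[key])
    return records

# Example: impute income within age-group-by-region classes.
data = [
    {"age": "18-34", "region": "N", "income": 31000},
    {"age": "18-34", "region": "N", "income": None},
    {"age": "35-64", "region": "S", "income": 52000},
]
hot_deck_impute(data, "income", ["age", "region"])
```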
There are two fundamental problems associated with
imputation. First, using the information in the observed
data to predict the missing values exaggerates the structure in the completed data. Second, analyzing the completed data set uses a spuriously high number of cases
and thus leads to biased significance tests. Donald
Rubin proposes to solve both problems by using multiple imputation: Each missing value is replaced by two
or more (M) plausible estimates to create M completed
data sets. The plausible values must include an error
term from an appropriate distribution, which solves the
problem of exaggerating the existing structure in the
data. Analyzing the M differently completed data sets
and combining the estimates into an overall estimate
solves the problem of the biased significance test.
In the multiple imputation approach, analyzing M
data sets and having to combine the results is cumbersome but not especially complex. What is difficult is
generating the M data sets in a proper manner. A nonparametric method is to (a) compute for each respondent the propensity to have missing values on a specific
variable, (b) group respondents into imputation classes
based on this propensity score, and (c) use hot-deck
imputation with these imputation classes. Parametric
imputation methods assume a model for the data and
use Bayesian methods to generate estimates for the
missing values. These methods are described in detail
by Joseph L. Schafer. When multiple imputation is
used, it is important that the model for the data generation is very general and includes those variables that are
important for predicting either missingness or the variables of interest.
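Building on the hot-deck sketch above, a minimal and purely illustrative way to generate the M completed data sets is to rerun a stochastic imputation procedure with a different random seed for each replicate:

```python
import copy

def multiply_impute(records, impute_var, class_vars, m=5):
    """Create m completed data sets by repeating a stochastic
    imputation procedure (here, the hot deck sketched earlier)
    with a different random seed for each replicate."""
    return [
        hot_deck_impute(copy.deepcopy(records), impute_var, class_vars, seed=k)
        for k in range(m)
    ]
```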
Edith D. de Leeuw and Joop Hox
See also Contingency Question; Error of Nonobservation;
Hot-Deck Imputation; Ignorable Nonresponse;
Imputation; Multiple Imputation; Nonignorable
Nonresponse; Nonresponse Error; Panel Attrition; Partial
Completion; Random Assignment; Unit Nonresponse
Further Readings
MITOFSKY-WAKSBERG SAMPLING
Mitofsky-Waksberg sampling is a two-stage, clustered
approach for selecting a random sample of telephone
numbers. Developed by Warren Mitofsky and Joseph
Waksberg in the 1970s, this was an innovative
approach designed to improve the operational efficiency
of telephone samples through reductions in the proportion of unproductive numbers dialed. Prior to the development of Mitofsky-Waksberg sampling, unrestricted
random-digit dial (RDD) was used, but this method
was operationally inefficient as it led interviewers to
call far too many nonworking numbers. Mitofsky-Waksberg sampling (including modified versions of the
basic approach) was the predominant approach used for
selecting samples for RDD telephone surveys throughout the 1970s and 1980s, but it was largely supplanted
by list-assisted RDD by the early 1990s.
An understanding of the various approaches used
for RDD sampling requires some knowledge of the
structure of a telephone number. In the United States,
telephone numbers are 10-digit strings. The first three
digits are the area code, and the first six digits are the
telephone exchange. A 100-bank is a set of telephone
numbers having the same first eight digits. Historically,
telephone numbers were geographically clustered.
However, under provisions of the Telecommunications
Act of 1996, customers are able to retain their telephone numbers when switching from one telephone
service provider to another, even when that switch
involves a geographic move or a switch between
Implementation
In the first stage of selection in the Mitofsky-Waksberg
approach, the set of telephone exchanges is limited
to those exchanges designated for residential use, and
a sample of 100-banks is selected for the sampling
frame. A random two-digit suffix is appended to each
sampled 100-bank to obtain the prime number. Each
prime number in the sample is dialed to determine
whether it is a residential number. If the prime number
is a residential number, the 100-bank is retained in the
sample, and in the second stage of selection, additional
telephone numbers (secondary numbers) are selected
in that 100-bank. If the prime number is not a residential number, then the 100-bank is excluded from the
second stage of sampling. Following the second stage
of selection, attempts are made to complete interviews
until a predetermined fixed number (k) of residential
numbers is identified among the secondary numbers in
the 100-bank. The total number of residential numbers
in the sample is m(k + 1).
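A simplified sketch of this two-stage logic follows (the function, its parameters, and the is_residential dialing oracle are illustrative assumptions, not part of the original method's specification):

```python
import random

def mitofsky_waksberg(banks, m, k, is_residential, seed=0):
    """Simplified two-stage Mitofsky-Waksberg selection.

    banks: candidate 100-banks, given as 8-digit prefixes
    m: number of retained (residential) 100-banks desired
    k: additional residential numbers to find per retained bank
    is_residential: oracle standing in for the result of dialing
    Returns roughly m * (k + 1) residential telephone numbers."""
    rng = random.Random(seed)
    candidates = list(banks)
    rng.shuffle(candidates)
    sample, retained = [], 0
    for bank in candidates:
        if retained == m:
            break
        # Stage 1: dial a random "prime" number in the 100-bank.
        prime = bank + f"{rng.randrange(100):02d}"
        if not is_residential(prime):
            continue  # nonresidential prime: the bank is discarded
        # Stage 2: dial random secondary numbers in the retained bank
        # until k further residential numbers are identified.
        found, dials = {prime}, 0
        while len(found) < k + 1 and dials < 1000:  # guard for sparse banks
            dials += 1
            number = bank + f"{rng.randrange(100):02d}"
            if is_residential(number):
                found.add(number)
        sample.extend(sorted(found))
        retained += 1
    return sample
```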
A disadvantage of the Mitofsky-Waksberg method
is that the selection is sequential; all primary numbers
must be resolved before the second stage of sampling
can occur, and each secondary unit must be resolved
before additional units can be selected. Noncontact
cases (ringno answer and answering machine results)
are problematic in that regard. Richard Potthoff, J.
Michael Brick, and Joseph Waksberg each developed
modified Mitofsky-Waksberg methods to address the
sequential nature of the sample.
Efficiency
To evaluate the efficiency of the Mitofsky-Waksberg approach, both the precision of survey estimates and the cost of data collection must be considered.
Effect on Precision
Let m denote the number of 100-banks in the sample, \(\sigma^2\) the unit variance of a characteristic y, and \(\rho\) (rho) the intraclass correlation of the characteristic y (i.e., the correlation in y among units in the same 100-bank). Mitofsky-Waksberg sampling results in an equal probability sample of residential telephone numbers. Therefore, the effect of using this approach on the variances of survey estimates is due to clustering of the sample of telephone numbers within exchanges. The variance of the sample mean \(\bar{y}\) is approximately

\[ V_1 = \frac{\sigma^2}{m(k+1)} \left( 1 + \rho k \right). \]
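As a hypothetical numeric illustration, with an intraclass correlation of \(\rho = 0.04\) and \(k = 9\) (each retained 100-bank yielding ten residential numbers), the clustering inflates the variance of the sample mean by the factor

\[ 1 + \rho k = 1 + 0.04 \times 9 = 1.36, \]

a 36% increase relative to an unclustered sample of the same total size.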
Effect on Cost

The corresponding cost comparison expresses the ratio \(C_{MW} / C_{unrest}\), the cost of a Mitofsky-Waksberg sample relative to an unrestricted RDD sample of equivalent size, in terms of the cluster size \(k + 1\), the proportions t and p, and the unit costs \(C_u\) and \(C_p\) of unproductive and productive dialings.

Further Readings

MIXED-MODE
Mixed-mode surveys (sometimes referred to as multimode surveys) combine different ways (modes) of
collecting data for a single project. Different methodologies may be used during distinct phases of a survey,
such as recruitment, screening, and questionnaire
administration, or they may make use of different
survey modes during a single phase, like data collection. Mixed-mode surveys may involve combinations
of more traditional survey modes such as face to face,
telephone, and mail, or may include some of the
newer modes like Internet, cell phone, diaries, or
interactive voice response (IVR).
of the research question(s) posed, population of interest, and amount of data to be collected. Next is the
desire to reduce the total survey error in a project,
which is the error from all potential sources, including
coverage, sampling, nonresponse, and measurement
error. The decision is also affected by the time frame
available for data collection. Some modes (such as
mail surveys) require considerably longer field periods than other modes (such as telephone surveys).
Finally, cost is an important consideration, given that
researchers typically need to operate within a fixed
budget.
Further Readings
MODE
The mode is a type of descriptive statistic that researchers commonly use to characterize the data from their studies. Along with the mean (average) and the median, it is a measure of central tendency: the mode of a set of data is the value that occurs most frequently.
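For example, a quick way to compute the mode in Python (with illustrative data):

```python
from collections import Counter

responses = [3, 5, 3, 4, 3, 2, 5, 3]  # answers on a 1-to-5 scale
mode_value, frequency = Counter(responses).most_common(1)[0]
# mode_value == 3; it occurs frequency == 4 times
```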
MODE EFFECTS
Survey researchers use the term mode to refer to the
way in which data are collected in the survey. Often,
mode will be used to refer specifically to the way
the questionnaire is administered (e.g., as a self-administered paper-and-pencil questionnaire, on the
Internet, or as a face-to-face interview). However,
mode can be discussed as a facet of various phases
of a survey project, including sampling, contact, and
recruitment, as well as the format of the questionnaire
Components of Mode
and Causes of Mode Effects
When considering mode effects, it can be helpful to
think about the social, psychological, physiological, and
technical facets that comprise a given mode. For example, a face-to-face interview mode usually involves
a one-on-one social interaction between a respondent
and an interviewer, which generally carries with it certain social norms. Physiologically, the respondents must
have the ability to hear survey questions if they are to
be presented verbally by an interviewer. The interviewer
and respondent must also be able to converse in a common language. Finally, there are a number of logistical
issues surrounding traveling to and from sampled persons and finding a place to conduct the interview.
Interviewer Presence
important factor in her or his decision to report information accurately or to participate in the interview at
all. With respect to mode, some of the most robust
findings about privacy indicate that self-administered
questionnaires produce fewer socially desirable
responses than questionnaires involving an interviewer. This is particularly the case for sensitive
behaviors, such as sexual practices or the use of illegal substances. In many cases, higher reports of sensitive behaviors are taken to be more accurate (when
a comparison with another data source is not available). However, with sexual activity, social desirability effects seem to work in opposite directions for
men and women, with women reporting fewer sex
partners to an interviewer and men reporting more. It
is clear that the presence of an interviewer can produce mode-related measurement error, but the direction of that error is not always clear.
Channels of Communication
different applications and variations of computer-assisted interviewing (CAI), some of which retain
interviewers as part of the data collection, as in
computer-assisted personal interviews (CAPI) or
computer-assisted telephone interviews (CATI), and
others that allow respondents to administer the questionnaire themselves, such as audio computer-assisted
self-interviewing (ACASI), interactive voice response
(IVR), and Web surveys. Web surveys are a type of
computerized data collection that has grown in popularity over the past decade, primarily due to their low
cost and relative ease of implementation. In this type
of computerized data collection, the respondent interacts directly with the technology, and so the ability to
use the technology is not only an issue of design and
management from the researcher's point of view but
is also an issue of respondent acceptance of and ability to employ the technology.
error deals specifically with the sampling and estimation decisions made by analysts and is primarily a statistical problem, which is why mode effects are not
clearly and directly related to sampling per se. However, there are mode effects in coverage, which is
a part of the sampling and inference process.
Item Nonresponse, Measurement Error, and Mode
The mode of survey administration can also have an
impact on survey results at the level of the survey
question (item). This includes both item nonresponse
(that is, missing data) and measurement error on items
that have been reported.
Item Nonresponse
Just as potential respondents can refuse to participate in a survey, those who agree to participate can
choose not to answer individual questions. Such item nonresponse is relatively rare for typical survey questions (below 5%) but can be fairly high for certain
kinds of questions like income (upward of 30% to
40% in general population surveys). For most types
of survey questions, face-to-face and telephone surveys produce much lower rates of missing data. The
cause is thought to be related to the presence of the
interviewer, specifically the task of making sure that
every question is asked, thereby eliminating or at least
vastly reducing item nonresponse due to respondents
they are the only ones remembered. Sometimes primacy and recency effects are lumped into a general
family of nonoptimal responding called satisficing, but
it seems reasonable to think that they may also be due
to cognitive and perceptual limitations.
Mixed-Mode Surveys
Considering all the variations of survey modes, the
impacts of effects and costs, there seem to be benefits
and drawbacks for different modes given the specific
statistics or measures needed. Survey researchers have
begun to use multiple modes in single surveys as ways
to counterbalance mode effects as sources of error by
building on the strengths of one mode to offset the limitations of another. One example is an in-person nonresponse follow-up with nonrespondents to a mail survey. Using an interviewer-administered mode
for most of the survey questions but a self-administered
mode for sensitive questions is another.
Matthew Jans
See also Audio Computer-Assisted Self-Interviewing
(ACASI); Computer-Assisted Personal Interviewing
(CAPI); Computer-Assisted Telephone Interviewing
(CATI); Diary; Interactive Voice Response (IVR);
Interviewer Effects; Interviewer-Related Error;
Interviewer Variance; Mixed-Mode; Mode of Data
Collection; Mode-Related Error; Multi-Mode
Surveys; Primacy Effects; Recency Effects; Satisficing;
MODEL-BASED ESTIMATION
The primary goal of survey sampling is the accurate
estimation of totals, means, and ratios for characteristics of interest within a finite population. Rather than
assuming that sample observations are realizations of
random variables satisfying some model, it is standard
to treat only the sample selection process itself as random. This is called randomization- or design-based
inference. Because they rely on averages taken across
all possible samples and not on the sample actually
drawn, design-based methods can sometimes produce
If \(E(\varepsilon_k^2 \mid x_k) \propto x_k\) (i.e., the variance of \(\varepsilon_k\) given \(x_k\) is proportional to \(x_k\)) for all \(k \in U\), then one can show that the sample minimizing the model variance of \(t_{\text{ratio}}\) as an estimator for T is the cutoff sample containing the n elements in U with the largest x values. One does not even have to add the assumption that \(\varepsilon_k \mid x_k\) has the same distribution within and outside the sample, since the random variable is defined conditional on the size of \(x_k\), which is the only criterion used in cutoff sampling.
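For reference, the ratio model and ratio estimator discussed here are assumed to take the standard form sketched below (the notation follows the surrounding text, but the exact form is supplied as an assumption):

\[ y_k = \beta x_k + \varepsilon_k, \quad E(\varepsilon_k \mid x_k) = 0, \qquad t_{\text{ratio}} = \frac{\sum_{k \in S} y_k}{\sum_{k \in S} x_k} \sum_{k \in U} x_k = N \bar{x}_U \, \frac{\bar{y}_S}{\bar{x}_S}, \]

where U is the finite population of size N, S the realized sample of size n, and \(T = \sum_{k \in U} y_k\) the total to be estimated.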
Many surveys designed to measure change are based
on either cutoff samples or samples selected for convenience. In this context, \(x_k\) is a previous value known for all elements in U, and \(y_k\) a current value known only for elements in S. The ratio model and the ignorability of the sampling mechanism are assumed (perhaps only implicitly), and \(t_{\text{ratio}}\) is computed.
When the sampling mechanism is ignorable, there
are many unbiased estimators for T under the ratio
model. Some are more model efficient (have less model variance) than \(t_{\text{ratio}}\) when \(\sigma_k^2 = E(\varepsilon_k^2 \mid x_k)\) is not proportional to \(x_k\). Usually, however, assumptions about the \(\sigma_k^2\) are on less firm ground than the ratio model to which they are attached. Moreover, the model itself, although apparently reasonable in many situations, may fail because the expectation of \(\varepsilon_k \mid x_k\) subtly increases or decreases with the size of \(x_k\).
Design-based methods offer protection against
possible model failure and the nonignorability of the
sampling mechanism. These methods, however, often
depend on a different kind of assumption: that the
realized sample is sufficiently large for estimators
to be approximately normal. Combining design and
model-based methods is often a prudent policy, especially when samples are not very large. A working
model that is little more than a rough approximation
of the stochastic structure of the characteristics of
interest can help in choosing among alternative estimation strategies possessing both good model- and
design-based properties. It may even help assess the precision of alternative strategies. For the ratio estimator under this model, one variance estimator with both properties is

\[ v_{\text{ratio}} = \left( \frac{N \bar{x}_U}{n \bar{x}_S} \right)^2 \frac{n}{n-1} \sum_{k \in S} \left( y_k - x_k \frac{\bar{y}_S}{\bar{x}_S} \right)^2, \]
which is also a nearly unbiased estimator for the strategy's design variance under mild conditions. Empirical
studies have shown that variance estimators with both
good model- and design-based properties tend to produce confidence intervals with closer-to-predicted coverage rates than purely model- or design-based ones.
Phillip S. Kott
See also Convenience Sampling; Cutoff Sampling;
Design-Based Estimation; Finite Population;
Ratio Estimator; Simple Random Sample
Further Readings
MODE OF DATA COLLECTION

Modes of data collection can differ along a number of dimensions, including whether an interviewer is present,
how the questions are presented and the responses
recorded, the infrastructure required, field time, and
costs.
One of the primary distinctions between modes of
data collection is the presence or absence of an interviewer. When an interviewer is present, the survey
questions are generally read to the respondent, and the
mode is referred to as interviewer-administered. Telephone and face-to-face (in-person) surveys are examples of interviewer-administered data collection. When
an interviewer is not present and the respondent must
deal directly with a paper or electronic questionnaire,
the mode is generally said to be self-administered.
Examples of these include mail and Internet-based
surveys.
The method of presenting the questions and receiving the responses also defines the mode of data collection. Questions presented visually are typically read by
respondents, whereas those presented verbally are heard
by respondents. The way in which the respondent
receives the stimuli of the question has been shown to
affect how a person responds to a particular survey
question. Likewise, responses provided to survey questions can be written by hand, typed, or spoken. Each of
these methods presents different memory and perception issues. Questions and response options that are read
to respondents generally need to be shorter than those
that are read by respondents, because of working memory limitations. When response categories are received
visually, respondents tend to choose categories early in
the list (a primacy effect). When they are received
aurally, respondents tend to choose categories toward
the end of the list (a recency effect). Thus, researchers
must pay special attention to possible mode effects on
data quality, especially in mixed-mode surveys in which
some answers to a question come from respondents
who were contacted via one mode (e.g., mail) and other
answers to these same questions come from a different
mode (e.g., telephone).
The infrastructure (and thus the financing) needed
to conduct a survey also differs by mode. A self-administered, Web-based survey of several thousand
individuals could potentially be carried out by an individual person, while a face-to-face survey of the same
size would require a staff of interviewers and field
managers. If a telephone survey is being considered,
a centralized telephone interviewing facility often is
required. Within any specific mode, the infrastructure requirements can also vary.
MODE-RELATED ERROR
Face-to-face (in-person) surveys, telephone, mail, and
Web surveys are common types of data collection
modes in current survey research. These modes can
be classified into two categories: (1) self-administered versus interviewer-administered, depending on whether interviewers are involved in administering the questions, and (2) paper-and-pencil versus computer-assisted, depending on whether computer technology is used to record the answers.
See also Mode of Data Collection; Nonresponse Error; Paper-and-Pencil Interviewing (PAPI); Primacy Effects; Recency
Effects; Sampling Error; Sensitive Topics; Social
Desirability; Telephone Surveys; Total Survey Error
(TSE); Web Survey
Further Readings
MULTI-LEVEL INTEGRATED
DATABASE APPROACH (MIDA)
The multi-level integrated database approach (MIDA)
is an enhancement to survey sampling that uses databases to collect as much information as practical
about the target sample at both the case level and at
various aggregate levels during the initial sampling
stage. The goal of MIDA is to raise the final quality,
and thus the accuracy, of survey data; it can do this in
a variety of ways.
Building an MIDA
The following description of MIDA uses the example
of national samples of U.S. households based on
addresses and as such is directly appropriate for postal
and in-person samples. However, similar approaches
can be applied to other modes and populations (e.g.,
national random-digit dialing [RDD] telephone samples,
panel studies, list-based samples, and local surveys).
The first step in MIDA is to extract all relevant
public information at both the case level and aggregate levels from the sampling frame from which the
sample addresses are drawn. In the United States,
general population samples of addresses are typically
nearly devoid of household-level information. However,
U.S. address samples are rich in aggregate-level information. Address or location, of course, is the one
known attribute of all cases, whether respondents or
nonrespondents. Moreover, address-based sampling
frames are typically based on the U.S. Census.
Further Readings
MULTI-MODE SURVEYS
Multi-mode surveys (sometimes called mixed-mode
surveys) involve collecting information from survey
respondents using two or more modes and combining
the responses for analysis. Multi-mode surveys have
become increasingly popular because of the rise of
new modes of data collection, the impact of computer
technology, and decreasing response rates to traditional
survey modes (particularly telephone surveys). The
development of new modes of data collection has
expanded the methods available to survey researchers.
Multi-mode survey designs are extremely flexible, as various combinations of modes can be employed to adapt to the particular needs of each research study.
Multi-mode surveys are often used to compensate for
coverage biases of individual modes and to increase
overall response rates. However, these reductions in coverage and nonresponse error must be balanced against
potential increases in measurement error that may arise
from combining responses collected using different
modes.
Survey designs involve choosing the optimal mode
or combination of modes while minimizing overall
total survey error (coverage, sampling, nonresponse,
and measurement). The decision of whether to use
multiple modes for data collection involves several
issues. Surveyors should consider the best mode or
Multi-Mode Surveys
487
A third type of multi-mode design involves surveying members of the same sample or of different samples using multiple modes over time, where the
survey mode changes for different periods or phases
of data collection. For example, face-to-face personal
interviews may be used for the initial period of data
collection in a longitudinal survey, but subsequent
data collection periods may survey respondents by
telephone, mail, or Internet.
Combining Data From Different Modes
in the Same Larger Survey
The final type of multi-mode survey involves combining independently collected data from different
samples, subgroups, or populations. For example,
many international surveys are conducted in which
data may be collected in one country by personal
interviews and in another country by telephone or
mail and then combined for analysis. In addition, data
may also be collected independently in different studies and then combined for comparative analysis (e.g.,
comparing data collected for a particular city or state
to nationally collected data).
Data Quality
Combining data collected from different survey modes
for analysis may introduce mode-related measurement
error and reduce data quality. Mode effects arise
because social, cultural, and technological factors associated with particular modes influence how respondents
complete the survey response process. Respondents' answers to survey questions are influenced by how
information is communicated with respondents, their
varying familiarity with and use of the medium or technology, whether the respondent or an interviewer controls the delivery of the survey questions, and the
presence of an interviewer. To reduce measurement differences, optimal design of survey questionnaires for
multi-mode surveys should focus on presenting an
equivalent stimulus to respondents across different
modes. This type of unified or universal mode design
should recognize how differences in meaning may
depend on how information is communicated with
respondents. In addition, questionnaire design for multi-mode surveys in which most of the responses are
Further Readings
MULTIPLE-FRAME SAMPLING
Most survey samples are selected from a single sampling frame that presumably covers all of the units in
the target population. Multiple-frame sampling refers
to surveys in which two or more frames are used and
independent samples are respectively taken from each
of the frames. Inferences about the target population
are based on the combined sample data. The method
is referred to as dual-frame sampling when the survey
uses two frames.
Sampling designs are often dictated by several key
factors, including the target population and parameters
of interest, the population frame or frames for sampling selection of units, the mode of data collection,
inference tools available for analyzing data under the
chosen design, and the total cost. There are two major
motivations behind the use of multiple-frame sampling method: (1) to achieve a desired level of precision with reduced cost and (2) to have a better
coverage of the target population and hence to reduce
possible biases due to coverage errors. Even if a complete frame, such as a household address list, is available, it is often more cost-effective to take a sample
of reduced size from the complete frame and supplement the sample by additional data taken from other
frames, such as telephone directories or institutional
lists that might be incomplete but less expensive to use.

Further Readings
Skinner, C. J., Holmes, D. J., & Holt, D. (1994). Multiple-frame sampling for multivariate stratification. International Statistical Review, 62, 333–347.
MULTIPLE IMPUTATION
Multiple imputation (MI) is actually somewhat of a
misnomer. The phrase is best understood as the name
for a post-imputation variance estimation tool that
involves repetitions of the imputation process. The
father of multiple imputation, Donald Rubin, originally envisioned MI as a tool for the preparation of
public use files (PUFs). He advocated that data publishers use MI in order to simplify and improve the
analyses conducted by PUF consumers. So far, few
data publishers have adopted MI. More usage of MI
has been found in highly multivariate analyses with
complex missing data structures, such as in the scoring of standardized tests with adaptive item sampling.
In that literature, the multiple imputations are most
often referred to as plausible values.
Motivation
MI is most commonly used in conjunction with
Bayesian imputation methods, in which samples drawn
from the posterior distribution of the missing data given
the observed data are used to fill in the missing values.
However, as long as there is some element of randomness in the imputation process, one can imagine executing the process multiple times and storing the answers
from each application (i.e., replication). The variance
of a statistic of interest across these replications can
then be calculated. This variance can be added to the
naïve estimate of variance (obtained by treating all
imputed data as if they were observed) to produce a variance estimate for the statistic that reflects the uncertainty
due to both sampling and imputation. That is the essence
of multiple imputation.
Controversy
Basic Formulae
Suppose that the entire imputation process of choice
is repeated m times and that all m imputed values are
stored along with the reported data. Conceptually, the
process produces m completed data sets representing
m replicates of this process. If there were originally
p columns with missing data, then there will be mp
corresponding columns in the new multiply imputed
dataset. The user then applies his or her full-sample
analysis procedure of choice m times, once to each
set of p columns. Suppose that \(\hat{\theta}_{Ik}\) is the point estimate of some parameter, \(\theta\), based on the kth set of p columns. (The subscript I indicates employment of imputed data.) Also suppose that \(\hat{V}_{Ik}\) is the variance estimate for \(\hat{\theta}_{Ik}\) provided by the standard complex survey analysis software when applied to the kth set of p columns.
Assuming that the imputation method of choice
has a stochastic component, such as imputation that is
based on a linear regression model to predict imputed
values from covariates, multiple imputations can be
used to improve the point estimate and provide better
leverage for variance estimation. Rubin's point estimate is \(\hat{\theta}_m = m^{-1} \sum_{k=1}^{m} \hat{\theta}_{Ik}\), and his variance estimate is

\[ T_m = m^{-1} \sum_{k=1}^{m} \hat{V}_{Ik} + \frac{m+1}{m} \cdot \frac{1}{m-1} \sum_{k=1}^{m} \left( \hat{\theta}_{Ik} - \hat{\theta}_m \right)^2 = \bar{U}_m + B_m. \]

With a proper imputation method, \(T_m\) captures the variance caused by both the missing data and the imputation procedure.
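A minimal sketch of these combining rules in Python (illustrative names; it assumes the m point and variance estimates have already been computed from the m completed data sets):

```python
from statistics import mean

def rubin_combine(theta_hats, var_hats):
    """Combine point and variance estimates from m multiply imputed
    data sets using Rubin's rules.

    theta_hats: point estimates, one per completed data set
    var_hats:   complete-data variance estimates, one per data set
    Returns (theta_m, T_m): the MI point estimate and total variance."""
    m = len(theta_hats)
    theta_m = mean(theta_hats)               # overall point estimate
    u_bar = mean(var_hats)                   # within-imputation variance
    b = sum((t - theta_m) ** 2 for t in theta_hats) / (m - 1)  # between
    t_m = u_bar + (m + 1) / m * b            # total variance
    return theta_m, t_m

# Example with m = 5 imputed data sets (illustrative numbers).
theta_m, t_m = rubin_combine(
    [2.1, 2.4, 2.2, 2.3, 2.0],
    [0.10, 0.11, 0.09, 0.10, 0.12],
)
```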
Pathological Examples
Consider now what can go wrong from the application of multiple imputation to an improper imputation
procedure. If the imputation procedure is a deterministic method (i.e., has no stochastic component), such
as mean imputation or nearest-neighbor imputation,
then \(B_m = 0\) (i.e., no variability in estimates across the imputation replicates), leading to an underestimate of \(\mathrm{var}(\hat{\theta}_m)\).
Overestimation of variances is possible as well, as
in the famous example of Robert Fay. Here a more
peculiar but less subtle example is considered. Suppose that for the variable Y, the data publisher randomly picks one respondent and imputes that single
value to all nonrespondents. Suppose further that there
are two domains, A and B, and that the parameter of
interest is the difference in the mean of Y across
them, despite the fact that, unbeknown to the analyst,
this difference is zero. Assume a simple random sample with replacement of size n with missingness
completely at random. Assume that the response rates
in the two strata, \(R_A\) and \(R_B\), are unequal.
Then \(\hat{\theta}_{Ik} = \bar{y}_{AR} R_A - \bar{y}_{BR} R_B + Y_{Rk} (R_B - R_A)\), where \(\bar{y}_{AR}\) and \(\bar{y}_{BR}\) are the means among respondents in the two domains, and \(Y_{Rk}\) is the universal donor chosen on the kth imputation. From this, the limit \(\hat{\theta}_\infty = \lim_{m \to \infty} \hat{\theta}_m\) can be derived.
Guidelines
From the pathological examples, we see that it is possible through the choice of imputation method to induce
either too much or too little variability among the plausible values. What method will induce just the right
amount? How to choose m? These are open questions
for all but the simplest analyses. Bayesian Markov chain
Monte Carlo (MCMC) methods are probably a good
choice, but the search for simpler alternatives continues.
There is no general theory on how to optimally choose m. Although \(\mathrm{var}(\hat{\theta}_m)\), \(\mathrm{var}(\bar{U}_m)\), and \(\mathrm{var}(B_m)\) are all nonincreasing functions of m, the computing demand is an increasing function of m, and examples have been discovered in which \(\mathrm{bias}(T_m)\) also increases with m. A common choice is to use m = 5. A
larger number may be particularly desirable if the
item nonresponse rate is high.
Despite the discomfort caused by the lack of firm
answers to these questions, no better post-imputation
variance estimation methods have been found that
apply to multivariate analyses, such as a regression of
one variable on two or more other variables, each
with a distinct missing data pattern. The alternatives
that have been identified are mostly applicable only to
univariate statistics, with some extensions to multivariate analyses in which variables are missing in tandem
or block style instead of Swiss cheese style. However,
if the publisher did not condition the imputation in
such a way as to protect the relationships of interest
to the user, then the user may wish to consider replacing the published set of plausible values with his or
her own.
David Ross Judkins
See also Hot-Deck Imputation; Imputation; Variance
Estimation
Further Readings
MULTIPLICITY SAMPLING
Multiplicity sampling is a probability sampling technique that is used to enhance an existing sampling
frame by adding elements through a form of network
sampling. It is especially useful when surveying for
rare attributes (e.g., rare hereditary diseases).
In sample surveys, population elements are linked
to units in the sampling frame, or frame units. For
example, persons can be linked to a specific household by familial relationship among persons residing
in the same housing unit. A counting rule identifies
the linkage between population elements and frame
units. In most surveys, population elements are linked
to one and only one frame unit, and thus there is
a one-to-one correspondence of element to unit. For
multiplicity sampling, a counting rule is established
that defines the linkage between population elements
and frame units in which one or more population
elements are linked to one or more frame units. The
counting rule defines a one-to-many or a many-to-many correspondence between population elements
and frame units. The count of frame units linked by
the counting rule to each population element is called
the multiplicity of a frame unit.
For an unbiased estimate of the number of population elements, the design-based sampling weight for
each selected frame unit is adjusted for the number of
frame units linked to the population element by dividing the sampling weights by the multiplicity. The
multiplicity is needed for only those units selected in
the sample. Multiplicity sampling uses the linkage of
the same population element to two or more frame
units to allow the sample of frame units to identify
more population elements.
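In code, the adjustment described above amounts to a simple division (a sketch with hypothetical names; the household example that follows gives the survey context):

```python
def multiplicity_adjusted_weight(design_weight, multiplicity):
    """Divide the design-based sampling weight by the number of frame
    units linked to the population element (its multiplicity), keeping
    the estimated count of population elements unbiased when elements
    can be reached through more than one frame unit."""
    return design_weight / multiplicity

# A person reachable through her own household and the households of
# two adult siblings has multiplicity 3.
multiplicity_adjusted_weight(1500.0, 3)  # -> 500.0
```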
For example, in a household survey to estimate the
frequency of a target condition in a population, the standard household survey would enumerate only persons
with the target condition in sampled households. With
multiplicity sampling, a counting rule based on adult
biological siblings residing in households would identify a person with a specific attribute (the population
element) linked to their own household and to the
households of his or her adult biological siblings. Each
sampled household member would be asked (a) whether the member or an adult biological sibling has the specific condition, (b) the number of adult siblings with the condition,
and (c) the number of households containing adult biological siblings. The person with the attribute would be
Further Readings
MULTI-STAGE SAMPLE
A multi-stage sample is one in which sampling is done
sequentially across two or more hierarchical levels,
such as first at the county level, second at the census
tract level, third at the block level, fourth at the household level, and so on.
Further Readings
MUTUALLY EXCLUSIVE
Response options to a survey question are mutually
exclusive when only one response option can be true
for a single respondent. Consider a survey question that
asks respondents, "How long do you spend commuting each day (round trip): less than 15 minutes, 15 to 30 minutes, 30 minutes to one hour, or one hour or longer?" A respondent who commutes for 30 minutes
each day could choose either the second or the third
response option, so the options are not mutually exclusive. Because response options overlap, a researcher
examining responses to this question cannot differentiate between respondents in adjacent categories. Not
providing mutually exclusive response options is a
common mistake made when writing survey questions.
One could rewrite this survey question to have mutually exclusive response options: "less than 15 minutes; at least 15 minutes but less than 30 minutes; at least 30 minutes but less than 1 hour; 1 hour or more."
While a bit wordier, the response options in this
revised question are mutually exclusive.
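A minimal sketch of a recode enforcing this property with half-open intervals (the function name and category labels are illustrative):

```python
def commute_category(minutes):
    """Half-open intervals make the categories mutually exclusive:
    every commute time falls into exactly one option."""
    if minutes < 15:
        return "less than 15 minutes"
    if minutes < 30:
        return "at least 15 minutes but less than 30 minutes"
    if minutes < 60:
        return "at least 30 minutes but less than 1 hour"
    return "1 hour or more"

commute_category(30)  # -> "at least 30 minutes but less than 1 hour"
```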
Response options are sometimes intentionally not mutually exclusive, as in a check-all-that-apply item. For example, a survey question measuring racial identification may allow
respondents to select more than one response option.
Similarly, a question about voter turnout could be written to allow multiple responses: "Why did you not vote in this election? Please select all that are true for you: (1) you were too busy, (2) you did not have a strong preference for a candidate, (3) you were ill or did not feel well, or (4) some other reason?"
Responses to these questions can then be transformed
for analysis into multiple variables reflecting whether
respondents selected each response option.
Allyson Holbrook
See also Check All That Apply; Exhaustive; Response
Alternatives
Further Readings
N
n
The sample size is traditionally labeled n, as opposed
to the total population size, which is termed N. The
sample size, n, can refer to either the original number
of population elements selected into the sample
(sometimes called the designated sample size or
sampling pool), or it can refer to the final number
of completed surveys or items for which data were
collected (sometimes called the final sample size or
final sample). In the same vein, it could refer to
any number in between such as, for example, the
number of elements that have been sampled and contacted but not interviewed. Or it could refer to the
number of elements for which complete data are
available. Another interpretation or use of the term n
is the number of elements on the data file and available for analysis.
It is almost always true that n is smaller than N and
usually by orders of magnitude. In fact, the ratio (n/N)
is often referred to as the sampling fraction. Often the
population size N is so large relative to n that one can
safely assume that with-replacement sampling holds even if in practice without-replacement sampling is
implemented. The relative sizes of n and N also play
a role in determining whether the finite population
correction factor (1 − n/N) is sufficiently different
from 1 to play a role in the calculation of sampling
variance.
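As a quick numeric sketch (the values are illustrative):

```python
n, N = 400, 12_000            # illustrative sample and population sizes
sampling_fraction = n / N     # n/N ~ 0.033
fpc = 1 - n / N               # finite population correction ~ 0.967
# With n/N this small, the fpc is close to 1, so with-replacement
# variance formulas are a safe approximation in practice.
```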
See also Element; Finite Population Correction (fpc) Factor; N; Population; Sample; Sample Size; Sampling Fraction; Sampling Without Replacement

Further Readings
N
The total population size is traditionally labeled N, as
opposed to the sample size, which is termed n. The
population size N refers to the total number of elements in the population, target population, or universe.
N also refers to the number of elements on the
sampling frame from which the sample is to be
drawn. Since in many cases, the list of population elements contains foreign elements, the accurate number
of eligible population elements is less than the number of elements on the list. In other cases, the population list not only contains foreign elements but also
contains omissions and inaccuracies. These further
put into question the validity of the value of N, which
should be assessed carefully both before and after
sample selection and survey implementation.
In some situations N is unknown, and in fact one
of the objectives of the survey is to estimate N and its
distributional characteristics. In other situations N is
known only approximately, and its estimate is refined
based on the information obtained from the survey.
Karol Krotki
Further Readings
NATIONAL COUNCIL ON
PUBLIC POLLS (NCPP)
Founded in 1969, the National Council on Public
Polls (NCPP) is an association of public opinion polling organizations. Initiated by George Gallup of the
American Institute of Public Opinion, the primary
goal of NCPP is to foster the understanding, interpretation, and reporting of public opinion polls through
the disclosure of detailed, survey-specific information
and methods to the general public and the media.
NCPP recognizes that the goal of public opinion
polls is to provide reliable, valid, and accurate information. If polls succeed in achieving these goals,
scientifically conducted surveys can characterize the
publics view on issues, policies, elections, and concerns of the day. But with the enormous amount of
polling information available, competing methods of
collecting information, and sometimes contradictory
results, it is often difficult for the general public and
the media to distinguish polls that accurately reflect what people think from polls that do not.
NCPP does not pass judgment on specific polls,
polling methods, or polling entities but rather advocates that polling organizations whose results reside
in the public realm disclose pertinent information
about how their surveys are conducted. NCPP maintains that if provided an adequate basis for judging
the reliability and validity of poll results, consumers
of surveys may assess these studies for themselves.
It is with this goal in mind that NCPP developed
a code for member organizations to abide by when
reporting survey findings that are intended for or end
up in the public domain. These Principles of Disclosure include three levels of disclosure, as described
on the NCPP Web site.
Level 1 disclosure requires that all reports of
survey findings issued for public release by member
organizations include the following information, and,
in addition, member organizations should endeavor to
have print and broadcast media include these items in
their news stories:
Sponsorship of the survey
Fieldwork provider (if the member organization did
not, itself, conduct the interviews)
Dates of interviewing
Sampling method employed (e.g., random-digit dialed
telephone sample, list-based telephone sample, area
probability sample, probability mail sample, other
probability sample, opt-in Internet panel, nonprobability convenience sample, use of any oversampling)
Population that was sampled (e.g., general population; registered voters; likely voters; or any specific
population group defined by gender, race, age, occupation, or any other characteristic)
Size of the sample that serves as the primary basis
of the survey report
Size and description of the subsample, if the survey
report relies primarily on less than the total sample
Margin of sampling error (if a probability sample)
Survey mode (e.g., telephone/interviewer, telephone/
automated, mail, Internet, fax, email)
Complete wording and ordering of questions mentioned in or upon which the news release is based
Percentage results of all questions reported
Complete wording of questions (per Level 1 disclosure) in any foreign languages in which the survey
was conducted
Weighted and unweighted size of any subgroup
cited in the report
Minimum number of completed questions to qualify
a completed interview
Whether interviewers were paid or unpaid (if interviewer-administered data collection)
Details of any incentives or compensation provided
for respondent participation
Description of weighting procedures (if any) used to
generalize data to the full population
Sample dispositions adequate to compute contact,
cooperation, and response rates
Further Readings
NATIONAL ELECTION POOL (NEP)
Thereafter, the VNS members disbanded that organization and formed NEP in its place.
Unlike VNS, the new pool did not have its own
staff but hired outside vendorsEdison Media
Research and Mitofsky International. Under NEP,
Edison-Mitofsky used in essence the same survey and
sample precinct methodology as VNS (which Warren
Mitofsky and Murray Edelman and others had developed at CBS prior to the formation of VRS) but ran
the data through new computer systems. However,
NEP abandoned the broader VNS vote count function;
the AP, which had maintained its own comprehensive
vote count during the VNS era, with stringers collecting votes in statewide and down-ballot races in every county in the country (or towns and cities in the New England states, where the official vote is not tallied centrally by counties), became the sole U.S. source
of unofficial vote count. AP vote tabulation data are
incorporated into the Edison-Mitofsky projections
models as they become available on Election Night,
helping NEP members call winners in races that were
too close to be called from early voter surveys, exit
polls, and sample precinct vote count alone.
The first election NEP covered was the California
gubernatorial recall in November 2003. NEP covered
23 Democratic presidential primaries and caucuses in
early 2004; the general election in all 50 states and
the District of Columbia in November of that year;
and elections in 32 states in the 2006 midterms.
The pool faced controversy again in the 2004 general election when estimates from exit poll interviews
early in the day leaked on Web sites and indicated
Democrat John Kerry would win the race for president.
Even with more complete samples later in the day,
some survey estimates fell outside sampling error tolerances when compared to actual vote. Several hypotheses for the discrepancies were offered, and the pool
and Edison-Mitofsky took corrective action, including
changes to interviewer recruitment and training procedures and measures to stanch leaks of early, incomplete
exit poll data. One of those measures, a quarantine room, was established in 2006 and monitored very closely by NEP; it strictly limited the access that NEP's sponsors could have to the exit poll data on Election Day prior to 5:00 p.m. EST, and this resulted in no early leaks in 2006.
NEP planned to cover 23 states in the 2008 Democratic and Republican presidential primaries and all
50 states plus the District of Columbia in the general
election in November 2008. The pool now typically
supplements the Election Day exit polls with telephone surveys for early or absentee voters in about
a dozen states in a presidential general election.
Michael Mokrzycki
See also Election Night Projections; Exit Polls
Further Readings
NATIONAL ELECTION STUDIES (NES)

History
The NES grew out of the studies created by the Survey
Research Center and the Center for Political Studies of
the Institute for Social Research at the University of
Michigan. The program always lacked sufficient funding, which limited improvements to the study. The
funding that it did receive was primarily used to conduct the survey. As a result, there were rarely changes
to the core questions of the study. This also meant that
those not directly involved in the program had little
influence on the types of questions offered.
In 1977, through the initiative of political scientist Warren E.
Miller, the National Science Foundation (NSF) formally established the National Election Studies. With
sufficient funding, the NES was expected to fulfill two
expectations. First, it was expected to continue the
Research Opportunity
Unlike many public opinion surveys, the NES has
been made available to anyone who wants to use it.
A researcher can download the individual responses
of each person surveyed since 1948. These data sets
are available from Inter-university Consortium for
Political and Social Research (ICPSR) or directly
from the American National Election Studies Web
page. The NES also provides a number of other
resources, including technical reports, tables, and
graphs. To date, there are more than 5,000 entries
on the NES bibliography, demonstrating the wide-ranging research options that are available from
analysis of these data.
James W. Stoutenborough
See also Election Polls; Face-to-Face Interviewing; Multi-Stage Sample; Perception Question; Pilot Test; Reliability;
Telephone Surveys; Validity
Further Readings
NATIONAL HEALTH AND NUTRITION EXAMINATION SURVEY (NHANES)

Background
The NHANES program began in the early 1960s. The
first surveys did not have a nutritional component.
They were called the National Health Examination
Surveys (NHES). When nutrition assessments were
added in the 1970s, the survey name changed to the
National Health and Nutrition Examination Survey
(NHANES).
The NHES and NHANES surveys were conducted
periodically through 1994 and targeted selected age
groups. Since 1999, NHANES has been conducted
every year and includes people of all ages.
NHANES is a cross-sectional survey with a stratified, multi-stage probability sample design. The
NHANES sample is selected from the civilian, noninstitutionalized U.S. population and is nationally representative. NHANES examines about 5,000 persons
annually. Participants are selected in 15 counties
across the country each year. These data provide an
overview of the health and nutrition of the U.S. population at one point in time. NHANES data are also
linked to Medicare and National Death Index records
to conduct follow-up studies based on mortality and
health care utilization.
Data Collection
NHANES consists of three major pieces: (1) health
interviews, (2) medical examinations, and (3) laboratory measures. The health interviews take place in the
participants' homes. These are conducted face to face,
using computer-assisted personal interviewing (CAPI)
software on pen-top computers. CAPI was first used
in 1992, during the Third National Health and Nutrition Examination Survey (NHANES III). Before
1992, NHANES interviews were conducted using
pencil and paper.
The home interviews are followed by physical
examinations. These are done in the NHANES
Mobile Examination Centers (MECs). The MEC is
made up of four interconnected 18-wheel tractor trailers. Each of the four trailers houses multiple examination rooms. The MEC visit also includes a dietary
recall interview and a health interview covering topics
too sensitive to ask in the home.
Laboratory specimens, including blood and urine,
are also collected in the MEC. Some laboratory
tests are conducted on-site, in the MEC laboratory.
Others are done at laboratories across the country.
Small amounts of urine and blood are also stored for
future testing, including genetic testing.
After the MEC examinations, certain subsets of
NHANES respondents participate in telephone interviews. All participants receive a report of the results
from selected examination and laboratory tests that
have clinical relevance.
The topics covered by NHANES vary over time.
Because current NHANES data are released in two-year cycles, survey content is modified at two-year
intervals. Some topics stay in the survey for multiple
two-year periods. When the data needs for a topic are
met, it is cycled out of NHANES, and new topics are
added. Rotating content in and out over time has several benefits. It gives NHANES the flexibility needed
to focus on a variety of health and nutrition measurements. It provides a mechanism for meeting emerging
health research needs in a timely manner. This continuous survey design also makes early availability of the
data possible.
NATIONAL HEALTH INTERVIEW SURVEY (NHIS)
Release of Data
NCHS publicly releases NHIS microdata annually
from both the core and supplements. Microdata collected during 2004 were released less than 7 months
after the end of the data collection year. Currently, all
public use files and supporting documentation for data
years 1970 through the year of the most recent release
are available without charge from the NHIS Web site.
Previous years of public use files from 1963 through
1969 will soon be available for downloading from the
NCHS Web site as well.
Since data year 2000, NCHS has been releasing
quarterly estimates for 15 key health indicators through
its Early Release (ER) Program. After each new quarter of data collection, these estimates are updated and
then released on the NCHS Web site 6 months after
the data collection quarter. The 15 measures covered
by ER include (1) lack of health insurance coverage
and type of coverage, (2) usual place to go for medical
care, (3) obtaining needed medical care, (4) obesity,
(5) leisure-time physical activity, (6) vaccinations,
(7) smoking and alcohol consumption, and (8) general
health status. For each of these health measures, a graph
of the trend since 1997 is presented, followed by figures and tables showing age-specific, sex-specific, and
race/ethnicity-specific estimates for the new data quarter. Key findings are highlighted. A separate in-depth
report on health insurance is also updated and released
every 3 months as part of the ER Program. Both quarterly ER reports are released only electronically, on the
NCHS Web site.
In addition to releasing NHIS microdata to the
public, NCHS staff members publish their own analyses of the data. Series 10 reports provide results of
analyses of NHIS data in substantial detail. Among
those series reports are three volumes of descriptive
statistics and highlights published annually, based,
respectively, on data from the NHIS Family Core,
Sample Child Core, and Sample Adult Core. NCHS's series Advance Data From Vital and Health Statistics publishes single articles from the various NCHS programs. NCHS's annual report on the health status of
the United States (Health, United States) contains
numerous tables and other analytic results based on
NHIS data.
Multiple years of NHIS microdata are periodically
linked to other databases, such as the National Death
Index and Medicare records. The National Death
Index is an NCHS-maintained central computerized
NETWORK SAMPLING
Network sampling is widely used when rare populations are of interest in survey research. Typically,
sampling frames do not exist for rare populations
NEYMAN ALLOCATION
Stratified samples are commonly used when supplementary information is available to help with sample
design. The precision of a stratified design is influenced by how the sample elements are allocated to
strata. Neyman allocation is a method used to allocate the sample to strata based on the within-stratum variances, under the assumption of similar sampling costs across the strata. A Neyman allocation
scheme provides the most precision for estimating
a population mean given a fixed total sample size.
For stratified random sampling, the population is
divided into H mutually exclusive strata. In each
stratum, a simple random sample is drawn without
replacement. Neyman allocation assigns sample units
within each stratum proportional to the product of the
population stratum size (Nh ) and the within-stratum
standard deviation (Sh ), so that minimum variance for
a population mean estimator can be achieved. The
equation for Neyman allocation is
$$ n_h = \frac{N_h S_h}{\sum_{h=1}^{H} N_h S_h} \, n, $$

where $n_h$ is the number of sample units allocated to stratum $h$ and $n$ is the total sample size. Relatively more sample is allocated to a stratum when the stratum is large or when the variability within the stratum is large, so that the heterogeneity is compensated for.
Of note, Neyman allocation is a special case of optimal allocation whose objective in sample allocation
is to minimize variance of an estimator for a population mean for a given total cost. It is employed when
the costs of obtaining sampling units are assumed to
be approximately equal across all the strata. If the variances are uniform across all the strata as well, Neyman
allocation reduces to proportional allocation where the
number of sampled units in each stratum is proportional
to the population size of the stratum. When the within-stratum variances differ across strata and are specified
correctly, Neyman allocation will give an estimator
with smaller variance than proportional allocation.
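As an illustration of the allocation formula above, the following Python sketch computes $n_h$ for a few hypothetical strata; the stratum sizes and standard deviations are invented for the example and are not from this entry.

```python
# Illustrative sketch of Neyman allocation; all numbers are hypothetical.

def neyman_allocation(N, S, n):
    """Allocate a total sample of size n to strata in proportion to N_h * S_h."""
    products = [N_h * S_h for N_h, S_h in zip(N, S)]
    total = sum(products)
    return [n * p / total for p in products]

N = [5000, 3000, 2000]   # population stratum sizes N_h
S = [10.0, 4.0, 1.0]     # within-stratum standard deviations S_h
print(neyman_allocation(N, S, 1000))
# [781.25, 187.5, 31.25] -- the large, highly variable stratum dominates;
# rounding the allocations to integers is left to the survey designer.
```

Note that if the standard deviations were equal across strata, the allocation would be proportional to the stratum sizes alone, matching the reduction to proportional allocation described above.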
The major barrier to the application of Neyman
allocation is lack of knowledge of the population variances of the study variable within each stratum. In
some situations, historical estimates of strata variances
can be used to provide good approximation to Neyman
allocation for the current survey sample. For example,
the Medical Expenditure Panel Survey Insurance
Component (MEPS IC) is an annual survey of establishments that collects information about employer-sponsored health insurance offerings. To implement
Neyman allocation, stratum variance estimates were
obtained from the 1993 National Employer Health
Insurance Survey for the initial MEPS IC 1996 and
later from prior MEPS IC surveys.
In situations where estimated population variances
within each stratum are not easily available, an alternative is to find a surrogate variable (a proxy) that is
closely related to the variable of interest and use its variances to conduct a Neyman allocation. For example,
the U.S. Government Accountability Office conducted
a survey in 2004–2005 to estimate the average and
median purchase prices of specified covered outpatient
drugs (SCODs) in a population of 3,450 hospitals. Since
a direct measure of purchase prices for SCODs was not
available at the time of sample selection, the total hospital outpatient SCOD charges to Medicare were used as
a proxy to carry out the Neyman allocation.
In practice, Neyman allocation can also be applied
to some selected strata instead of all strata, depending
on specific survey needs. For example, the National
Drug Threat Survey 2004 was administered to a probability-based sample of state and local law enforcement
agencies. The sample frame of 7,930 law enforcement
agencies was stratified into a total of 53 strata. Of
those 53 strata, 50 strata were formed based on the
$$ n_h = \frac{S_h}{\sum_{h=1}^{H} S_h} \, n. $$
Here, $H$ refers to the total number of rank order statistics and $S_h$ denotes the standard deviation for the hth
rank order statistic.
Haiying Chen
See also Optimal Allocation; Proportional Allocation to
Strata; Ranked-Set Sampling; Stratified Sampling
900 POLL
A 900 poll is a one-question unscientific survey that
typically is taken by having television viewers or radio
listeners call into a 1-900-number that involves a cost
to the caller (sometimes a considerable cost). A different 900-number is given for each response that the
poll allows the self-selected respondents to choose as
their answer to whatever the survey question is. These
polls are typically sponsored over a brief period of time (often an hour or less), for example, within a television program or shortly after it ends. For example,
NOMINAL MEASURE
A nominal measure is part of a taxonomy of measurement types for variables developed by psychologist
Stanley Smith Stevens in 1946. Other types of measurement include ordinal, interval, and ratio. A nominal variable, sometimes referred to as a categorical
variable, is characterized by an exhaustive and mutually exclusive set of categories. Each case in the
population to be categorized using the nominal measure must fall into one and only one of the categories.
Examples of the more commonly used nominal measures in survey research include gender, race, religious affiliation, and political party.
Unlike other types of measurement, the categories
of a variable that is a nominal measure refer to discrete characteristics. No order of magnitude is implied
when comparing one category to another. After the
relevant attributes of all cases in the population being
Table 1

         Count   Proportion   Percentage (%)
Male       651        0.484             48.4
Female     694        0.516             51.6
TOTAL     1345        1.000            100.0

Ratio (males to females): 0.938
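The quantities in Table 1 follow directly from the two category counts; here is a minimal Python sketch that reproduces them (the counts are taken from the table itself):

```python
# Reproduce the quantities in Table 1 from the raw category counts.
counts = {"Male": 651, "Female": 694}
total = sum(counts.values())

for category, n in counts.items():
    print(f"{category}: count={n}, proportion={n / total:.3f}, "
          f"percentage={100 * n / total:.1f}%")

print(f"TOTAL: count={total}")
print(f"Ratio (males to females): {counts['Male'] / counts['Female']:.3f}")
```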
NONATTITUDE
Nonattitude refers to the mental state of having no attitude or opinion toward some object, concept, or other
type of stimulus. In survey research, this is manifested
by an overt "no opinion" or "don't know" response to an
attitude question, but it may also be hidden by a random
or guesswork choice of answers to avoid appearing
ignorant. Additionally, it is likely that not all "no opinion" or "don't know" responses reflect nonattitudes. This
makes it hard to estimate how many respondents have
nonattitudes toward the object.
Alternative Models With Latent Attitudes
Critics analyzing the same data rejected the idea that
a large part of the public had nonattitudes on leading
public issues. Alternative theories to explain the
observed instability and incoherence of responses
include the following:
1. Measurement error produced by vague and ambiguous questions, concealing real attitudes, which could
be revealed by better questions
2. The influence of temporary stimuli (events in the news or in personal life) leading to wide variations in momentary feelings around underlying attitudes
3. The possibility that each object has a variety of elements or considerations about which the individual
has positive or negative feelings, but samples them unsystematically in answering the questions (perhaps randomly, perhaps in response to recent events or cues given by question wording or sequence)
4. Those who more systematically inventory the considerations they hold in mind may have a near balance of positive and negative feelings, an ambivalence making their answers unstable from one interview to the next.
Do Nonattitudes Matter?
A public poorly equipped to relate its underlying attitudes to current policy issues would seem little more
likely to have a strong influence on policy than one
with nonattitudes. Benjamin Page and Robert Shapiro
counter that nonattitudes, or attitudes weakly connected to specific issues, do not cripple democracy,
Allen H. Barton
See also Attitude Measurement; Attitudes; Cognitive Aspects
of Survey Methodology (CASM); Deliberative Poll;
Don't Knows (DKs); Measurement Error; Reliability;
Validity
NONCAUSAL COVARIATION
Although correlation is a necessary condition for causation, it is not a sufficient condition. That is, if X and
Y can be shown to correlate, it is possible that X may
cause Y or vice versa. However, just because correlation is established between the two variables, it is not
certain that X causes Y or that Y causes X. In instances
when X and Y are correlated but there is no empirical
evidence that one causes the other, a researcher is left
with a finding of noncausal covariation. A researcher
NONCONTACT RATE
The noncontact rate for a survey measures the proportion of all sampled cases that are never contacted
despite the various efforts that the researchers may set
in motion to make contact. By default, if a sampled
case is never contacted, then no original data for the
survey can be gathered from it, other than observations
an in-person interviewer might make of the housing
structure or neighborhood. For surveys in which the
initial sampling unit is a household or business and
then there is a respondent sampled within that unit,
a noncontact rate can be calculated both at the unit
level and at the within-unit (respondent) level. In theory, a noncontact rate of zero (0.0) means that every
eligible sampled case was contacted, whereas a noncontact rate of one (1.0) means none of the sampled eligible cases were contacted. Neither of these extreme
conditions is likely to occur in a survey. However, the
best of commercial, academic, and government surveys
in the United States achieve noncontact rates of less
than 2%, meaning that more than 49 of every 50 eligible sampled cases are contacted at some point during
the field period.
In face-to-face and telephone surveys of the general public, businesses, or specifically named persons,
noncontacts result from no human at a household or
business ever being reached by an interviewer during
the surveys field period, despite what is likely to be
many contact attempts across different days of the
week and times of the day or evening. In mail and
Internet surveys, noncontacts result from the survey
request never reaching the sampled person, household, or business due to a bad address, transmittal
(delivery) problems, or the person never being at the
location to which the survey request is sent during the
field period.
Noncontacts
515
Calculating a noncontact rate is not as straightforward as it may first appear due to the many sampled cases in almost all surveys for which the researcher
is uncertain (a) whether they are eligible and/or (b)
whether they really were contacted but did not
behave in such a way that provided the researcher with
any certainty that contact actually occurred.
was made. Again, the researchers need to make a reasonable judgment about what proportion of these cases
should be counted as eligible and what portion of these
should be counted as being implicit refusals rather than
as noncontacts in the noncontact rate calculation. Any
of these cases that are counted as refusals should not
enter into the noncontact rate numerator.
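To make the bookkeeping concrete, here is a hedged Python sketch of one plausible way to compute a noncontact rate in the spirit of the discussion above; the case categories and the eligibility factor e are illustrative assumptions, not a prescription from this entry.

```python
# One plausible noncontact rate calculation; the names and the factor e
# are illustrative assumptions.

def noncontact_rate(noncontacts, interviews, partials, refusals,
                    other_eligible, unknown, e=0.5):
    """Noncontacts divided by the estimated number of eligible cases.

    e is the researcher's judgment of the share of unknown-eligibility
    cases that are actually eligible.
    """
    eligible = interviews + partials + refusals + other_eligible + noncontacts
    return noncontacts / (eligible + e * unknown)

rate = noncontact_rate(noncontacts=40, interviews=800, partials=20,
                       refusals=120, other_eligible=10, unknown=100)
print(round(rate, 3))  # 40 / (990 + 50) = 0.038
```

Cases judged to be implicit refusals would be moved out of the noncontact count and into the refusal count before the rate is computed.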
NONCONTACTS
Noncontacts are a disposition that is used in telephone,
in-person, mail, and Internet surveys both as a temporary and a final disposition. Two primary types of noncontacts can occur in surveys. The first type occurs
when a researcher attempts contact with a household or
other sampling unit, and no one is present to receive
the contact. The second type of noncontact occurs
when a researcher makes contact with a household or
other sampling unit, but the selected respondent is
unavailable to complete the questionnaire.
For example, the first type of noncontact occurs during in-person surveys when an interviewer visits
a household unit and finds no one there (but does find
clear evidence that the unit is occupied). Noncontacts
also occur when contact is made with a household or
other sampling unit but the selected respondent is not
available to complete the questionnaire at the time of
contact. For example, this type of noncontact occurs
with in-person surveys when an interviewer visits
a sampled address, determines that the address is
a household (or other sampled unit), administers the
introductory script and respondent selection procedures
to someone at the address, and then learns that the
selected respondent is not available to complete the
interview. This type of noncontact is very similar for
telephone surveys and occurs whenever an interviewer
dials a case, reaches a household, administers the introductory script and respondent selection procedures for
the survey, and learns that the designated respondent is
not available at the time of the call. Because contact
has been made with someone within the designated
sampling unit, cases that result in this type of noncontact usually are considered eligible cases and thus are
included when computing survey response rates.
Noncontacts may also occur in mail and Internet
surveys, but the nature of these surveys makes it very
difficult for researchers to know when this is happening
and makes it almost impossible to differentiate between
the two types of noncontacts. For example, in a mail
survey, the questionnaire may be delivered to a household when the residents are away for the entire field
period of the survey. Similarly, in an Internet survey
the respondent may be away from email and the Internet for the entire field period, or the questionnaire may
be sent to an email address that the respondent does
not check during the field period of the survey. Only if
the researcher receives information (such as, in the case
of an Internet survey, an automated email reply noting
that a respondent is away) specifying that the survey
questionnaire was sent to and received by the named
respondent is the survey researcher able to determine
conclusively that a noncontact has taken place.
Because noncontacts usually are considered to be
eligible cases or cases of unknown eligibility (depending on the type of noncontact), researchers continue
to process these cases throughout the field period. In
order to better manage survey sampling pools, many
researchers assign different disposition codes to the
two different types of noncontacts. These disposition
codes allow researchers to manage the sample more
precisely. For example, noncontacts in which no contact is made with anyone at the household or other
sampling unit often are recontacted on a variety of
days and times (or after a specified period of time in
a mail or Internet survey) to increase the chances of
making contact with someone at the household or
other sampling unit. For cases in which contact is
made with someone in a household or other sampling
unit (but the selected respondent is not available), the
researcher can work to identify a good time to recontact the selected respondent. Because these types of
noncontacts are a temporary disposition, it is important that researchers learn as much as possible about
when to try to contact the selected respondent and
then use any information learned to optimize the timing of additional contact attempts and, in doing so, to
maximize the chances of converting the noncontact
disposition into a completed interview.
Noncontact also can be used as a final disposition
if (a) it occurs on the final contact attempt for a case,
(b) previous contact was made during the field period
but there was no success in completing the questionnaire at that time, and (c) there was never a previous
refusal outcome for the case. If there was a previous
refusal outcome, the case should be given the final disposition of refusal even if the last contact attempt
resulted in a noncontact.
Matthew Courser
See also Busies; Callbacks; Final Dispositions; Response
Rates; Temporary Dispositions
Further Readings
NONCONTINGENT INCENTIVES
Noncontingent incentives are traditionally used in survey research as a way of increasing survey response
rates. The concept of noncontingent versus contingent incentives is that a noncontingent incentive is given to
the respondent regardless of whether the survey is
NONCOOPERATION RATE
Noncooperation occurs when a research unit is able to
cooperate but clearly demonstrates that it will not take
required steps to complete the research process. The
NONCOVERAGE
Every scientific survey has a target population that is
operationalized by a sampling frame. Ideally, all units
in the sampling frame should match those in the target
population on a one-to-one basis. In reality, misalignment between the two occurs and is termed coverage
error. Noncoverage is one of the elements of coverage
error arising from the imperfectness of a sampling
frame that fails to include some portion of the population. Because these frames cover less than what they
should, noncoverage is also termed undercoverage.
Noncoverage is the most frequently occurring coverage
problem, and it may have serious effects because this
problem cannot be recognized easily in the given
frame. Because the target population is defined with
extent and time, the magnitude of noncoverage
depends on the maintenance of the frame. Depending
on whether the households or people covered by the
frame differ from those not covered, noncoverage may
introduce biases (coverage error) in survey estimates.
The classic example of noncoverage is the Literary
Digest poll predicting Alf Landon as the overwhelming
winner over the incumbent president, Franklin D. Roosevelt, in the 1936 election. Although it had surveyed
10 million people, its frame comprised the
Literary Digest readers, a list of those with telephone
service, and a list of registered automobile owners.
Although the general voter population was the target
population of the poll, the sampling frame excluded
a large proportion of the target population and, more
important, a disproportionately higher proportion of the
middle- and low-income Democratic voters. The general voter population was more likely to differ in their
preference of presidential candidate than those who
NONDIFFERENTIATION
Survey respondents are routinely asked to answer
batteries of questions employing the same response
scale. For example, in an effort to understand consumer preferences, respondents might be asked to rate
several products on a scale of 1 to 5, with 1 being "very poor" and 5 being "very good." Nondifferentiation (sometimes called straight-lining) occurs when
respondents fail to differentiate between the items
with their answers by giving identical (or nearly identical) responses to all items using the same response
scale. That is, some respondents might give a rating
of 2 to all products, producing nondifferentiated
answers.
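A simple way to screen for such patterns is to flag respondents whose modal answer dominates a battery. The following Python sketch is illustrative only; the 90% threshold is an assumption, not a standard cutoff.

```python
# Flag possible straight-liners in a battery of items sharing one scale.
# The max_share threshold is an illustrative assumption.

def straight_liners(responses, max_share=0.9):
    """Return indices of respondents whose modal answer dominates the battery."""
    flagged = []
    for i, answers in enumerate(responses):
        modal_count = max(answers.count(v) for v in set(answers))
        if modal_count / len(answers) >= max_share:
            flagged.append(i)
    return flagged

battery = [
    [2, 2, 2, 2, 2, 2, 2, 2, 2, 2],  # identical answers: flagged
    [1, 3, 2, 4, 2, 5, 3, 2, 1, 4],  # differentiated: not flagged
]
print(straight_liners(battery))  # [0]
```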
In the survey literature, nondifferentiation is identified as a very strong form of satisficing. According to
the notion of satisficing, when respondents are unable
to or unwilling to carefully go through all the cognitive steps required in answering survey questions,
they may satisfice by looking for an easy strategy or
cues to provide a satisfactory (but not optimal)
answer. Nondifferentiation is such an easy response
strategy that it saves cognitive effort; respondents presumably do not retrieve information from memory
and do not integrate retrieved information into a judgment (or estimation). Instead, they may interpret each
question within a battery superficially and select a reasonable point on the response scale and stick with that
point for all items in the battery. The answers are thus
selected without referring to any internal psychological cues relevant to the specific attitude, belief, or
event of interest.
Like other satisficing behaviors, nondifferentiation
is most likely to occur when (a) respondents do not
have the ability to answer optimally, (b) respondents
are not motivated to answer carefully, and/or (c) the
questions are difficult to answer. Studies have demonstrated empirically that nondifferentiation is more
common among respondents with lower levels of cognitive capacity (such as respondents with less education or with less verbal ability) and more prevalent
toward the end of a questionnaire. In addition, nondifferentiation is less prevalent among respondents for whom the questions' topic is more personally important.
Nondifferentiation may occur regardless of the
mode of data collection. However, there is evidence
suggesting that nondifferentiation is more likely to
occur with modes that do not promote respondent
motivation or use more difficult response tasks. For
instance, Web surveys have been shown to promote
nondifferentiating responses, especially when questions are displayed in a grid format (i.e., a tabular
format where question stems are displayed in the
left-most column and response options are shown
along the top row). In addition, Web surveys appear
to lead to more nondifferentiation than interviewer-administered modes. Within interviewer-administered modes, respondents are found to give more nondifferentiating responses in telephone surveys than in face-to-face interviews.
Nondifferentiation is a form of measurement error
and thus decreases data quality (both validity and
reliability). Of considerable concern, the presence of
nondifferentiating responses artificially inflates intercorrelations among the items within the battery and
thus suppresses true differences between the items.
Therefore, measures should be taken to reduce the
extent of nondifferentiation in a survey. Survey
researchers, for example, should take measures to help
increase respondent motivation to provide thoughtful
answers (e.g., interviewers instructing or encouraging
respondents to think carefully before answering a
survey question) or to lessen the task difficulty (e.g.,
avoiding a grid format in a Web survey and avoiding placing a battery of similar items toward the end of a questionnaire).
NONDIRECTIVE PROBING
Probing inadequate survey answers for the additional
information that may be necessary to fully meet
a question's goal(s) is an important element of standardized survey interviewing. In training interviewers
to probe effectively, an important distinction should
be drawn between nondirective and directive forms
of this technique. Unlike directive probing, nondirective probing is designed to encourage and motivate
respondents to provide clarifying information without
influencing their answers. That is, this approach is
specifically designed to be neutral in order to avoid
increasing the probability that any specific type of
answer is encouraged, or discouraged, from respondents. When nondirective probing is employed, an
answer is never suggested by the interviewer. Some
examples of nondirective probing of closed-ended
questions include slowly repeating the original question or repeating the full set of response options (e.g.,
"Is that a 'Yes' or a 'No'?"). When asking open-ended questions, some nondirective probe examples include repeating respondent answers, using neutral
NONIGNORABLE NONRESPONSE
When patterns of nonresponse (either unit or item
nonresponse) are significantly correlated with variables of interest in a survey, then the nonresponse
contributes to biased estimates of those variables and
is considered nonignorable. Recent trends of increasing survey nonresponse rates make the question of whether nonresponse is ignorable increasingly salient to researchers.
Since data are only observed for responders,
researchers often use participating sample members or
members for whom there are complete responses to
make inferences about a more general population. For
example, a researcher estimating the average income
of single parents might use income data observed for
single-parent responders to make generalizations about
average income for all single parents, including those
who did not participate or who refused to answer the
relevant questions. The underlying assumption is that
single-parent sample members who do not respond or
respond with incomplete data are similar to single-parent sample members who participate fully. This
implies that the units with missing data or incomplete
data are a random subsample of the original sample
and do not differ from the population at large.
If this assumption is spurious (i.e., it is not true), that is, if units with missing or incomplete data differ in meaningful (nonignorable) ways from the rest of the sample on key variables of interest, then inferences with missing data can lead to biased estimates. For example, if lower-earning single parents
have high unit nonresponse rates because they are
more difficult to locate and contact, then the estimate
of income, the key variable, will be upwardly biased.
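The direction of the bias in this example can be checked with a small simulation. The following Python sketch assumes a hypothetical lognormal income distribution and response propensities of 40% for below-median earners versus 70% for the rest; all numbers are invented for illustration.

```python
# Simulate nonignorable nonresponse: lower earners respond less often,
# so the respondent mean overstates the population mean.
import random

random.seed(1)
incomes = [random.lognormvariate(10.5, 0.6) for _ in range(100_000)]
median = sorted(incomes)[len(incomes) // 2]

# Response propensity depends on the survey variable itself (nonignorable).
respondents = [y for y in incomes
               if random.random() < (0.4 if y < median else 0.7)]

full_mean = sum(incomes) / len(incomes)
resp_mean = sum(respondents) / len(respondents)
print(f"full-sample mean: {full_mean:,.0f}")
print(f"respondent mean:  {resp_mean:,.0f}  (upwardly biased)")
```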
NONPROBABILITY SAMPLING
Sampling involves the selection of a portion of the
finite population being studied. Nonprobability sampling does not attempt to select a random sample from
the population of interest. Rather, subjective methods
are used to decide which elements are included in the
sample. In contrast, in probability sampling, each element in the population has a known nonzero chance
of being selected through the use of a random selection procedure. The use of a random selection procedure such as simple random sampling makes it
possible to use design-based estimation of population
means, proportions, totals, and ratios. Standard errors
can also be calculated from a probability sample.
Why would one consider using nonprobability
sampling? In some situations, the population may not
be well defined. In other situations, there may not be
great interest in drawing inferences from the sample
to the population. Probably the most common reason
for using nonprobability sampling is that it is less
expensive than probability sampling and can often be
implemented more quickly.
Nonprobability sampling is often divided into three
primary categories: (1) quota sampling, (2) purposive
sampling, and (3) convenience sampling. Weighting
and drawing inferences from nonprobability samples
require somewhat different procedures than for probability sampling; advances in technology have influenced some newer approaches to nonprobability
sampling.
Quota Sampling
Quota sampling has some similarities to stratified
sampling. The basic idea of quota sampling is to set
a target number of completed interviews with specific
subgroups of the population of interest. Ideally, the
target size of the subgroups is based on known information about the target population (such as census
data). The sampling procedure then proceeds using
a nonrandom selection mechanism until the desired
number of completed interviews is obtained for each
subgroup. A common example is to set 50% of
the interviews with males and 50% with females in
a random-digit dialing telephone interview survey. A
sample of telephone numbers is released to the interviewers for calling. At the start of the survey field
period, one adult is randomly selected from a sample
household. It is generally more difficult to obtain
interviews with males. So, for example, if the total
desired number of interviews is 1,000 (500 males and
500 females), and the researcher is often able to
obtain 500 female interviews before obtaining 500
male interviews, then no further interviews would be
conducted with females and only males would be
selected and interviewed from then on, until the target
of 500 males is reached. Females in those latter sample households would have a zero probability of
selection. Also, because the 500 female interviews
were most likely obtained at earlier call attempts,
before the sample telephone numbers were thoroughly
worked by the interviewers, females living in harder-to-reach households are less likely to be included in
the sample of 500 females.
Quotas are often based on more than one characteristic. For example, a quota sample might have
interviewer-assigned quotas for age by gender and by
employment status categories. For a given sample
household, the interviewer might ask for the rarest
group first, and if a member of that group were present in the household, that individual would be interviewed. If a member of the rarest group were not
present in the household, then an individual in one of
the other rare groups would be selected. Once the
quotas for the rare groups are filled, the interviewer
would start to fill the quotas for the more common
groups.
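The male/female example above can be expressed as a simple accept-or-screen-out loop. This Python sketch simulates the mechanics only; the 55% female contact rate is a hypothetical stand-in for females being easier to reach.

```python
# Simulate quota-controlled acceptance for a 500/500 male/female quota.
# The 55% female contact rate is a hypothetical assumption.
import random

random.seed(7)
quotas = {"male": 500, "female": 500}
accepted = {"male": 0, "female": 0}
screened_out = 0

while any(accepted[g] < quotas[g] for g in quotas):
    g = "female" if random.random() < 0.55 else "male"
    if accepted[g] < quotas[g]:
        accepted[g] += 1
    else:
        # Once a group's quota is filled, further members of that group
        # are screened out and have no chance of selection.
        screened_out += 1

print(accepted, "screened out:", screened_out)
```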
Quota sampling is sometimes used in conjunction
with area probability sampling of households. Area
probability sampling techniques are used to select primary sampling units and segments. For each sample
Purposive Sampling
Purposive sampling is also referred to as judgmental
sampling or expert sampling. The main objective of
purposive sampling is to produce a sample that can
be considered representative of the population. The
term representative has many different meanings,
along the lines of the sample having the same distribution as the population on some key demographic
characteristic, but it does not seem to have any
agreed-upon statistical meaning. The selection of
a purposive sample is often accomplished by applying
expert knowledge of the population to select in a nonrandom manner a sample of elements that represents
a cross-section of the population. For example, one
might select a sample of small businesses in the United
States that represent a cross-section of small businesses
in the nation. With expert knowledge of the population,
one would first decide which characteristics are important to be represented in the sample. Once this is established, a sample of businesses is identified that meet
the various characteristics that are viewed as being
most important. This might involve selecting large
(1,000+ employees), medium (100–999 employees),
and small (<100 employees) businesses.
Another example of purposive sampling is the
selection of a sample of jails from which prisoner
participants will be sampled. This is referred to as
two-stage sampling, but the first-stage units are
not selected using probability sampling techniques.
Rather, the first-stage units are selected to represent
key prisoner dimensions (e.g., age and race), with
expert subject matter judgment being used to select
the specific jails that are included in the study. The
opposite approach can also be used: First-stage units
are selected using probability sampling, and then,
within the selected first-stage units, expert judgment is
employed to select the elements from which data will
be collected. Site studies or evaluation studies will
often use one of these two approaches. Generally,
there is not interest in drawing inferences to some
larger population or in making national estimates, say,
for all prisoners in U.S. jails.
A clear limitation of purposive sampling is that
another expert likely would come up with a different
sample when identifying important characteristics and
picking typical elements to be in the sample. Given
the subjectivity of the selection mechanism, purposive
sampling is generally considered most appropriate for
Convenience Sampling
Convenience sampling differs from purposive sampling in that expert judgment is not used to select
a representative sample of elements. Rather, the primary selection criterion relates to the ease of obtaining a sample. Ease of obtaining the sample relates
to the cost of locating elements of the population,
the geographic distribution of the sample, and obtaining the interview data from the selected elements.
Examples of convenience samples include mall intercept interviewing, unsystematically recruiting individuals to participate in the study (e.g., what is done
for many psychology studies that use readily available
undergraduates), visiting a sample of business establishments that are close to the data collection organization, seeking the participation of individuals visiting
a Web site to participate in a survey, and including
a brief questionnaire in a coupon mailing. In convenience sampling, the representativeness of the sample
is generally less of a concern compared to purposive
sampling.
For example, in the case of surveying those attending the Super Bowl using a convenience sample,
a researcher may want data collected quickly, using
a low-cost method that does not involve scientific
sampling. The researcher sends out several data collection staff members to interview people at the stadium on the day of the game. The interviewers may,
for example, carry clipboards with a questionnaire
they may administer to people they stop outside the
stadium an hour before the game starts or give it to
people to have them fill it out for themselves. This
variation of taking a convenience sample does not
allow the researcher (or the client) to have a clear
sense of what target population is being represented
by the sample. Although convenience samples are not
scientific samples, they do on occasion have value to
researchers and clients who recognize their considerable limitations, for example, providing some quick
exploration of a hypothesis that the researcher may
One issue that arises with all probability samples and with many nonprobability samples is the estimation procedures, specifically those used to draw inferences from
the sample to the population. Many surveys produce
estimates that are proportions or percentages (e.g., the
percentage of adults who do not exercise at all), and
weighting methods used to assign a final weight to each
completed interview are generally given considerable
thought and planning. For probability sampling, the first
step in the weight calculation process is the development
of a base sampling weight. The base sampling weight
equals the reciprocal of the selection probability of
a sampling unit. The calculation of the base sampling
weight is then often followed by weighting adjustments
related to nonresponse and noncoverage. Finally, post-stratification or raking is used to adjust the final
weights so that the sample is in alignment with the
population for key demographic and socioeconomic
characteristics. In nonprobability sampling, the calculation of a base sampling weight has no meaning,
because there are no known probabilities of selection.
One could essentially view each sampling unit as
having a base sampling weight of one.
Sometimes nonresponse and noncoverage weights
are developed for nonprobability samples, but the
most common technique is to use a weighting procedure such as post-stratification or raking to align the
nonprobability sample with the population that one
would ideally like to draw inferences about. The post-stratification variables are generally limited to demographic and socioeconomic characteristics. One limitation of this approach is that the variables available
for weighting may not include key characteristics
related to the nonprobability sampling mechanism that
was employed to select the sampling units. The results
of weighting nonprobability samples have been mixed
in the situation when benchmarks are available for
a key survey outcome measure (e.g., the outcome of
an election).
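A minimal sketch of the cell-based adjustment described above, assuming every case starts with a base weight of one; the age-group cells and all counts are hypothetical.

```python
# Post-stratification for a nonprobability sample: base weights of 1 are
# scaled so each cell matches its population count. All numbers hypothetical.

population = {"18-34": 30_000, "35-54": 40_000, "55+": 30_000}
sample_counts = {"18-34": 150, "35-54": 450, "55+": 400}   # skewed sample

weights = {cell: population[cell] / sample_counts[cell] for cell in population}
for cell, w in weights.items():
    print(f"{cell}: weight {w:.1f}")
# 18-34: 200.0, 35-54: 88.9, 55+: 75.0 -- the underrepresented cell is
# weighted up, but no weight can repair a selection mechanism that is
# correlated with the survey outcome itself.
```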
Recent Developments
in Nonprobability Sampling
Finally, it should be mentioned that newer versions
of nonprobability sampling have appeared in recent
years, some driven by changes in technology. These
have generally been labeled model-based sampling
approaches, and to some degree the use of the term
model-based sampling has replaced the term nonprobability sampling. Web surveys are one new example
of nonprobability sampling. Web surveys are generally convenience samples of households or adults
recruited to participate in surveys delivered over the
Web. The samples are usually set up as panels, that
is, a recruited household or adult is asked to respond
to some number of surveys over their tenure in the
sample. At the recruitment phase, characteristics of
the households or adults can be collected. This makes
it possible to limit future surveys to households or
adults with a specific characteristic (e.g., persons ages
18–24 years, female executives, retirees). Because
respondents use the Web to complete the questionnaire, these nonprobability sample Web surveys can
often be conducted much more quickly and far less
expensively than probability samples.
Another new type of nonprobability sampling is
based on selecting email addresses from companies
that compile email addresses that appear to be associated with individuals living in households. Some
companies set up email panel samples through a recruitment process and allow clients to send a questionnaire
to a sample of email addresses in their panel. For both
email and Web panel samples, the estimation methods
used to attempt to draw inferences from the sample to
the population are a very important consideration. The
use of propensity scores and post-stratification or raking has been explored by some researchers. The calculation of standard errors, as with all nonprobability
samples, is problematic and must rely on model-based
assumptions.
Another relatively new nonprobability sampling
method is known as respondent-driven sampling.
Respondent-driven sampling is described as a form of
snowball sampling. Snowball sampling relies on referrals from an initial nonprobability or probability sample of respondents to nominate additional respondents.
It differs from multiplicity sampling in that no attempt
is made to determine the probability of selection of
each subject in the target population. Snowball samples
are sometimes used to select samples of members of
NONRESIDENTIAL
Nonresidential dispositions occur in telephone and
in-person surveys of the general public when a case
contacted or called by an interviewer turns out to be
posed by vacation and other seasonal homes, and special rules may need to be developed to properly determine the disposition of these housing units.
The challenges and special cases mentioned illustrate that at times it may be difficult to determine
whether a telephone number or housing unit is residential. Although these instances are fairly uncommon, it is important to ensure that there is definitive
evidence that the case is nonresidential before applying the nonresidential disposition to a case. Obtaining this evidence may require additional investigation
(such as talking to neighbors or documenting visible
signs that the unit is uninhabited or not used as a residence). In surveys of the general public, a nonresidential outcome is treated as final and ineligible and thus
is not used in response rate calculations.
Matthew Courser
See also Call Forwarding; Dispositions; Final Dispositions;
Response Rates; Temporary Dispositions
NONRESPONSE
Nonresponse refers to the people or households who
are sampled but from whom data are not gathered, or
to other elements (e.g., cars coming off an assembly
line; books in a library) that are being sampled but for
which data are not gathered. The individual nonrespondents on a survey contribute to the nonresponse
rate, the aggregate tally of how many did not provide
data divided by how many should have. One
minus the nonresponse rate is, of course, the overall
response rate.
Classification
Calculating nonresponse is not as simple as it initially
seems. One survey may have mainly nonrespondents
who could not be reached (i.e., contacted) by the
researchers. Another reaches most of the sampled persons, but a large proportion refuse the survey. A third
has an easily reached, cooperative set of people in
a sampling frame, but many do not possess a characteristic that is part of the definition of the desired
survey population. These reasons for nonresponse,
usually termed noncontact, refusal, and ineligibility,
respectively, are differentiated within most classification procedures for nonresponse, such as the Standard
Definitions established by the American Association
for Public Opinion Research (AAPOR). The components of nonresponse may differ across the various
contact types or modes of survey. For example, the
noncontact count may be higher on a telephone survey in which calls to some numbers are never
answered than on a postal survey in which a letter
mailed is presumed to be a contact, except for a few
nondeliverables returned by the post office. On the
street, as distinguished from the logical world of survey methodology research articles and textbooks, the
distinctions among noncontact, refusal, and ineligibility are not always watertight. Is a foreign-born person
who opts out of a survey due to language problems,
or an elderly citizen with impaired hearing or eyesight, an ineligible or a refusal? Only the person
himself or herself really knows. What of those who,
in a face-to-face household survey, recognize the
approaching person as a survey taker and decline to
answer the doorbell? Refusal or noncontact?
Predictability
Nonresponse, like many social behaviors, is only
weakly predictable at the level of the individual yet
becomes more ordered at the aggregate. Any experienced survey methodologist, told the type of population to be sampled, the mode of contact (i.e.,
telephone, in-person, mail, Web, or multi-mode), the
length and content of the questions, the resources available for callbacks and follow-ups, and whether or not
payments or other incentives are to be used, can give
an informed estimate of the final response and nonresponse rates. Information on the sponsor, and whether
the fieldwork is to be conducted by a government
agency, a survey unit at a university, or a commercial
firm, will add further precision to the estimates. At the
same time, and in most cases, the attempt at prediction
of whether one person versus another will respond or
not generates only slight probabilities. Always depending on the particulars of the survey, people of greater
A Historical Perspective
on Survey Nonresponse
It used to be firmly believed by survey methodologists
that the response versus nonresponse rates on a survey
accurately indicated the quality of the data. One rule
of thumb, to which several generations of students
were exposed via a widely used textbook by Earl
Babbie, was that a mailed survey with 50% response
was adequate, 60% good, and 70% very
good. These notions were advanced in the early
1970s, when the highest quality surveys still tended to
be face-to-face and the telephone survey was still
under development. Face-to-face (personal) interview surveys of the national population in the United
States, if of high quality, were expected in this era to
have nonresponse rates of only some 20 to 25%. By
the mid-1970s, however, a growing number of voices
were detecting erosion in response rates to personal
interview surveys. The issue was even noted in The
New York Times on October 26, 1975. The telephone
survey was meanwhile gaining acceptance among survey researchers as they perfected the generation of
sampling frames via random-digit dialing (RDD).
Research was also under way during the 1970s on
enhancing the validity of mailed surveys, led by Don
Dillman's comprehensive strategy known as the total
design method or TDM.
Current Theories
Survey methodologists thus put considerable resources
into the quality of survey fieldwork. Since part of the
problem of high survey nonresponse in the 21st century
is lifestyle based (for example, the diminished predictability of when a householder will be reachable by telephone at home), the number of repeat calls or
callbacks has had to be increased for high-quality
telephone surveys. Survey research organizations are
also now more involved in attempts at conversion of
initial refusers than they may once have been. Considerable experimentation has been conducted with incentives for surveys. Modest gifts or small sums of money
work especially well on the self-enumerated modes of
survey, such as mailed or Web contact. The incentive
can be delivered up-front, pre-paid, not conditional
on the response taking place. Somewhat counterintuitively, pre-paid rewards prove more effective than
the post-paid rewards given only after the survey is
successfully collected from the respondent. Naturally all
such techniques add to the costs of surveying and tend
to slow down the speed at which data can be collected.
Especially for commercial survey firms, who must
watch the balance sheet, it is difficult to maintain
response rates near historical norms. To be sure, surveys via the Internet (the Web survey) hold some
promise for controlling the spiraling costs of survey
research. Since the year 2000, growing attention to
the Web mode has been evident in the research journals. At the moment, Web surveys work best for
well-defined subpopulations having known email
addresses. University students, for example, are routinely being surveyed by university administrators and
academic researchers, with response rates in the 40s
when carefully conducted. NSSE, the National Survey
of Student Engagement, is one well-known example.
The general public is harder to survey adequately via
the Web. A survey questionnaire simply dumped onto
an Internet site for those Web passers-by who wish to
answer (sometimes as many times as they like!) is not
a serious option. Better-quality Web surveys mimic
the traditional mailed survey. A sampling frame
exists, consisting in the Web case of email addresses,
and a sequence of contacts is carried out. Provision is
made via passwords to prevent one person answering
several times. It is imaginable that convergence will
human populations. Not all topics hold the same interest for all people; quite the opposite. Research by
Robert M. Groves and colleagues shows that response
rates can exceed the norm by tens of percentage points
when the topic of the survey and the subpopulation
being sampled match up well. It follows from this
leverage-saliency interpretation of nonresponse that
techniques such as cash incentives will be proportionately most effective when the topic–population match, and thus the salience of the topic, is weak. As such research
accumulates, it is increasingly realized that just knowing how many of the eligible people in a sample
responded to a survey is insufficient information.
People assessing the validity of a survey need to know
why people responded and why others did not.
The study of survey nonresponse is thus both
a technical issue for methodologists and a rich conceptual puzzle for social science. Ideas about survey
nonresponse adapt to the tectonics of social change,
making this research topic a living, changing body of
ideas rather than a cemetery of textbook certainties.
John Goyder
See also American Association for Public Opinion Research
(AAPOR); Leverage-Saliency Theory; Nonresponse
Error; Nonresponse Rates; Panel; Random-Digit Dialing
(RDD); Social Exchange Theory; Standard Definitions;
Total Design Method (TDM)
NONRESPONSE BIAS
Nonresponse bias occurs when sampled elements
from which data are gathered are different on the
measured variables in nonnegligible ways from those
that are sampled but from which data are not gathered. Essentially, all surveys are likely to have some
degree of nonresponse bias, but in many cases it
occurs at a very small and thus a negligible (i.e.,
ignorable) level. The size of the bias is a function of
(a) the magnitude of the difference between respondents and nonrespondents and (b) the proportion of all
sampled elements that are nonrespondents. Thus, even
if only one of these factors is large, the nonresponse
bias may well be nonnegligible.
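The two factors can be written compactly. The expression below is a standard formulation consistent with the Simple Model presented in the Nonresponse Error entry later in this volume; it is not stated explicitly in this entry.

$$\operatorname{Bias}(\bar{Y}_r) = \bar{Y}_r - \bar{Y}_n = \frac{nr}{n}\left(\bar{Y}_r - \bar{Y}_{nr}\right),$$

where $nr/n$ is the proportion of nonrespondents (factor b) and $\bar{Y}_r - \bar{Y}_{nr}$ is the respondent-nonrespondent difference (factor a).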
There are three basic types of survey nonresponse.
The first is refusals, which occur when sampled
individuals or households decline to participate. The
second is noncontacts, when sampled individuals are
never reached. The third type of nonresponse consists of situations in which the interviewer cannot
communicate with the sampled person because of
a language barrier or some mental or physical disability. Most nonresponse is the result of refusals
and noncontacts.
It has long been thought that response rates are
a good indicator of survey quality and nonresponse
bias; however, recent research has challenged this
notion. This is encouraging news for researchers
because it means that surveys with lower response
rates are not necessarily more biased than those with
higher response rates. But this does not mean that
nonignorable nonresponse bias cannot or will not
occur. Nonresponse bias will be present when the
likelihood of responding is correlated with the variable(s) being measured, and this correlation can vary
across variables even within the same survey.
Nonresponse bias has been a growing concern to
survey researchers as response rates have declined
over the years. There are a variety of reasons for this,
including an increase in refusals with the rise of telemarketing and an increase in technologies to screen
calls such as answering machines, voicemail, and caller ID. It is important to consider nonresponse due to
refusals and noncontacts separately because the characteristics of refusers can be different from those who
are difficult to reach.
Researchers can use several methods to maximize
response rates; much recent research has focused on
Recent Research
Through the 1990s, a common assumption in the field
of survey research was that surveys with higher
response rates always were more accurate (i.e., had
lower nonresponse bias) than those with lower
response rates. But this assumption has been challenged by recent research on opinion questions in
telephone polls and exit polls. In telephone polls,
Scott Keeter and his colleagues, and separately
Richard Curtin and his colleagues, have found that
fairly large changes in response rates had a minimal
impact on their survey estimates. Using a large number of in-person exit polls over different election
cycles, Daniel Merkle and Murray Edelman found
little or no relationship between response rates and
survey bias.
This research has been encouraging to survey
researchers because it shows that surveys with lower
response rates are not necessarily less accurate. However, this does not mean that survey researchers do
not need to worry about nonresponse bias. Robert
Groves conducted an extensive review of previous
nonresponse studies, most of which focused on nonopinion variables, and found a number of instances in
which nonresponse bias was present, even in some
NONRESPONSE ERROR
During the past decade, nonresponse error, which occurs when those units that are sampled but from which data are not gathered differ to a nonignorable extent from those sampled units that do provide data, has become an extremely important topic to
survey researchers. The increasing attention given to
this part of the total survey error is related to the
observation that survey participation is decreasing in
all Western societies. Thus, more efforts (and cost
expenditures) are needed to obtain acceptably high
response rates.
In general, the response rate can be defined as the
proportion of eligible sample units for which an interview (or other form of data collection) was completed. The calculation of the standard response rate
is straightforward: the number of achieved interviews
divided by the number of sample units for which an
interview could have been completed. These are eligible sample units: completed interviews, partial interviews, noncontacted but known eligible units, refusals,
and other noninterviewed eligible units.
Simple Model
Although the nonresponse rate is important information used to evaluate the nonresponse error, that rate
is only one component of the nonresponse error. The
biasing effect of nonresponse error is also related to
the difference between respondents and nonrespondents. A simple model for a sample mean can be used
to illustrate this point.
Given a sample with n units, r of these n units participated in the survey and nr units did not: n = r + nr. Y is a metric characteristic, with Ȳ_r the mean estimated for the r respondents, Ȳ_n the mean estimated for all n sample units, and Ȳ_nr the mean estimated for the nr nonrespondents. The estimated mean for all the units in the sample (Ȳ_n) equals a weighted sum of the estimated mean for the respondents (Ȳ_r) and the estimated mean for the nonrespondents (Ȳ_nr); the latter is weighted by the nonresponse rate nr/n, the former by the response rate r/n. This results in the following formal specification:

    Ȳ_n = (r/n) Ȳ_r + (nr/n) Ȳ_nr
        = (1 − nr/n) Ȳ_r + (nr/n) Ȳ_nr
        = Ȳ_r − (nr/n) (Ȳ_r − Ȳ_nr)

    Ȳ_r = Ȳ_n + (nr/n) (Ȳ_r − Ȳ_nr)
This expression makes it clear that the estimated
mean for the respondents is equal to the estimated
mean for all units in the sample plus a factor that
expresses the biasing effect of the nonresponse error.
When there is no nonresponse (nr = 0 and r = n), then Ȳ_r = Ȳ_n. In this situation, the estimated mean for the r
respondents is equal to the mean estimated for all n
sample units. This signifies that there is no nonresponse error. This is also the case when the estimated
mean of the respondents and the nonrespondents are
equal: Ȳ_r = Ȳ_nr. In this case, the decision to participate
is uncorrelated with Y, and as far as Y is concerned,
there is no difference between the group of respondents
and the group of nonrespondents.
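The arithmetic of the model is easy to verify with a minimal sketch (all numbers here are hypothetical, chosen only to illustrate the formula):

    # Nonresponse bias model with hypothetical numbers.
    n, r = 1000, 600                  # sampled units and respondents
    nr = n - r                        # nonrespondents
    y_r, y_nr = 5.0, 3.5              # assumed means for the two groups

    y_n = (r / n) * y_r + (nr / n) * y_nr   # mean for all n sampled units
    bias = (nr / n) * (y_r - y_nr)          # biasing effect of nonresponse

    print(round(y_n, 2))        # 4.4
    print(round(y_r - y_n, 2))  # 0.6, which equals the bias term
    print(round(bias, 2))       # 0.6

Here a 60% response rate combined with a respondent-nonrespondent difference of 1.5 on Y yields a bias of 0.6 in the respondent mean.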
Although this nonresponse model for a sample
mean is simple, it shows that the reduction of nonresponse error operationally is not straightforward. For
example, with an incentive one can increase the
response rate, but it is possible that some persons are
more susceptible to the incentive than others. When
the susceptibility to incentives is related to a substantive relevant characteristic, the use of the incentive
will increase the difference between respondents and
nonrespondents with respect to that characteristic,
thus increasing nonresponse error. In this situation,
a higher response rate due to the incentive does not
result in a smaller nonresponse error; instead, just the
opposite occurs.
The model also illustrates that for the evaluation of the
nonresponse error one must both compare the respondents with the nonrespondents and calculate the response
rate. After defining some additional basic concepts, the
information necessary for this evaluation follows.
Basic Concepts
Until now, the sample units in the example given were divided or classified into two general groups: respondents and nonrespondents. Table 1 shows a finer classification.

Table 1
    Respondents (r)
    Nonrespondents (nr)
      - Refusals (rf)
      - Noncontacts (nc)
      - Other nonrespondents (on)
    Ineligible (ie)

Information to Evaluate Nonresponse Error

Information from all the units in the sample (respondents and nonrespondents) is needed to evaluate the nonresponse error in a survey. This is the case both to calculate the rates discussed previously and to compare the group of respondents with the group of nonrespondents. Obtaining information from nonrespondents seems at first a contradiction in terms. Although it is a difficult task to collect information from the nonrespondents, there are some opportunities. Sometimes one can use the sampling frame. The sampling frame is the listing of all units in the target population. Sometimes the sampling frame (e.g., national register, staff register) contains information about some basic sociodemographic characteristics of the units (e.g., age, gender, civil status, educational level, professional group). When this kind of information is available, it offers an excellent opportunity to compare respondents and nonrespondents using some relevant background characteristics. Another extremely useful source of information
NONRESPONSE RATES
The nonresponse rate is defined as the percentage of all
potentially eligible units (or elements) that do not have
responses to (at least a certain proportion of) the
items in a survey questionnaire. Thus, a nonresponse
rate can be calculated at the unit level (in which all data
from a sampled respondent are missing) and/or the item
level (in which only data for certain variables from
a sampled respondent are missing).
The nonresponse rate is not the same as nonresponse error. Nonresponse error occurs when nonrespondents in a survey are systematically different
from respondents in nonnegligible ways that are germane to the objects of the study. For example, if the
survey attempts to assess public opinion on the president's new plan for the Iraq War, then nonresponse
error would occur if citizens who express their opinions on the issue were significantly more likely to
oppose the plan than were those who were sampled
but from whom no data were gathered. As such, the
nonresponse rate alone does not tell whether there is
nonresponse error or how much error exists in the survey. However, knowing the nonresponse rate is a necessary step toward estimating the nonresponse error.
All sampled cases in a survey can be categorized
into one of four major groups: (1) eligible cases with
sufficient data to be classified as responses, (2) eligible cases without sufficient data (i.e., nonresponses),
(3) cases of unknown eligibility, and (4) ineligible
cases.
Responses
Cases that are treated as responses can also be divided
into two groups: complete responses and partial
responses. Standards that are widely used to define partial responses versus complete responses include the
following: (a) the proportion of all applicable questions
completed; (b) the proportion of essential questions
completed; and (c) the proportion of all questions
administered. Essential questions may vary, depending
upon the purposes of a survey. In the case of the previously mentioned survey about public approval of the
new war plan, the crucial variables may include attitudes toward the plan, party identification, and so on.
Survey researchers should define what constitutes a
complete versus a partial response based on one of the
standards or a combination of the standards prior to data
collection. Complete responses, for example, could be
defined as cases that have answers to 95% or more of
essential questions, whereas partial responses might be
defined as cases with answers to 50%–94% of such questions. Cases with fewer than 50% of the essential questions answered could be treated as breakoffs, which is
a form of nonresponse. By this definition, only cases
with data for at least 50% of crucial questions would be
deemed as responses or partial responses.
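A minimal sketch of such a classification rule, using the illustrative 95% and 50% cutoffs from this example (the function name and counts are hypothetical):

    def classify_response(answered, essential):
        # Classify a case by the share of essential questions answered:
        # >= 95% complete, 50%-94% partial, below 50% breakoff.
        share = answered / essential
        if share >= 0.95:
            return "complete"
        if share >= 0.50:
            return "partial"
        return "breakoff"

    print(classify_response(20, 20))  # complete
    print(classify_response(13, 20))  # partial (65%)
    print(classify_response(9, 20))   # breakoff (45%)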
Nonresponses
Nonresponses can result from noncooperation or refusals, from noncontacts, and from other factors. The situations
in which instances of nonresponse occur vary across
surveys and across different sampling or data collection
modes. Refusals happen in random-digit dialing (RDD)
telephone or in-person surveys of households when
a household has been contacted and either a responsible
household member or the designated respondent has
refused to participate in the study. Breakoffs refer to
a premature termination of an interview. In mail or
online surveys, refusals occur when contact has been
made with a specifically named respondent or with
a household or organization where the respondent
works or lives, and the respondent or a responsible
member of the household or organization has refused
to participate. However, researchers often are not certain when this has happened as they typically receive
no direct evidence to this effect. A breakoff occurs
when partially completed questionnaires are returned
with some notification suggesting that the respondent
refused to complete the rest of the questionnaire. The
Unknown Eligibility
Unknown eligibility occurs in RDD telephone surveys
of households when it is not known if a sampled telephone number belongs to a residential household or
whether there is an eligible respondent living in the
household. In in-person household surveys, a case
is categorized as unknown eligibility when it is
unknown whether an eligible household is located at
the sampled address or whether an eligible respondent
lives in the place. In mail or online surveys, unknown
eligibility includes cases in which nothing is known
about whether a questionnaire has reached the respondent or whether the respondent is eligible.
Ineligible Cases
In landline RDD telephone surveys of households, ineligible cases may include phone numbers of households
that are located outside of the geographical area of
interest. For example, researchers may be interested in
Formulae to Calculate
Nonresponse Rates
The calculation of nonresponse rates varies depending
upon what are considered as potentially eligible units.
The first formula of nonresponse rate (NRR1) generates
the minimum nonresponse rate because the potential
eligible cases include all cases of responses, all cases of
nonresponses, and all cases of unknown eligibility:
    NRR1 = nonresponses / (responses + nonresponses + unknown eligibility)
The second way of calculating the nonresponse
rate (NRR2) requires researchers to estimate the proportion of cases with unknown eligibility (separately
by type, e.g., for answering machines, for ring-no-answers, for busy signals) that may actually be eligible (that is, the e term in the formula NRR2), based on the best available scientific evidence. By doing so,
researchers can make a more accurate estimate of eligible units, thereby having a better estimate of the
nonresponse rate. When using NRR2, the evidence
used for the estimation must be provided in detail.
For instance, researchers might know that 40% of
cases with ring-no-answer in an RDD telephone
survey actually are eligible according to past survey
experience. Then they can set the value of e as 0.4 for
the ring-no-answer final dispositions and calculate the
rate using the estimate.
    NRR2 = nonresponses / (responses + nonresponses + e × unknown eligibility)
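The two formulas are easy to compare with a short sketch (the disposition counts below are hypothetical):

    # Hypothetical final disposition counts for an RDD survey.
    responses, nonresponses, unknown = 800, 400, 300
    e = 0.4   # estimated share of unknown-eligibility cases that are eligible

    nrr1 = nonresponses / (responses + nonresponses + unknown)
    nrr2 = nonresponses / (responses + nonresponses + e * unknown)

    print(round(nrr1, 3))  # 0.267, the minimum nonresponse rate
    print(round(nrr2, 3))  # 0.303, higher because fewer unknowns count as eligible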
NONSAMPLING ERROR
Nonsampling error is a catchall phrase that refers to
all types of survey error other than the error associated with sampling. This includes error that comes
from problems associated with coverage, measurement, nonresponse, and data processing. Thus, nonsampling error encompasses all forms of bias and
variance other than that associated with the imprecision (variance) inherent in any survey sample.
Coverage error refers primarily to the bias and variance that may result when the sampling frame (the list
from which the sample is drawn) used to represent the
population of interest fails to adequately cover the
population, and the portion that is missed differs in nonignorable ways from the portion that is included on the
frame. Coverage error also includes bias and variance
that can result when a within-unit respondent selection
technique is used that does not adequately represent the
population at the level of the individual person. Nonresponse error refers to the bias and variance that may
result when not all those who are sampled have data
gathered from them, and these nonresponders differ in
nonignorable ways from responders on variables of
interest. Item-level nonresponse error includes the bias
and variance that may result from cooperating respondents who do not provide answers (data) to all the variables being measured if the data they would have
provided differ in nonignorable ways from the data that
the other respondents are providing on those variables.
Measurement error refers to bias and variance that may
result related to the questionnaire, the behavior of the
person who gathers the data, the behavior of respondents, and/or the mode of data collection. Data processing errors refer to the bias and variance that may
result from mistakes made while processing data,
including the coding and recoding of data, the transformation of data into new variables, the imputation of
missing data, the weighting of the data, and the analyses that are performed with data.
NONTELEPHONE HOUSEHOLD
Telephone surveys became an acceptable mode of
data collection in the United States in the 1970s, when
approximately 90% of households in the United States
had a telephone. According to the 2000 Census, 98%
of U.S. households contained a telephone. However,
the 2000 Census did not distinguish between wireline
and wireless service, so having a telephone could
mean having a landline phone, a wireless phone, or
both. In each year since the 2000 Census, more and
more households began to substitute wireless telephone service for their wireline or landline telephone
service. This phenomenon, often referred to as "cutting the cord," has introduced additional coverage bias in
traditional wireline random-digit dial (RDD) samples,
as in 2008 approximately 20% of U.S. households had
only a cell phone.
NONVERBAL BEHAVIOR
Nonverbal behavior is physical action that complements, supplements, or takes the place of spoken words
or sounds. Examples include, but are not limited to,
facial expressions, body postures, and gestures.
Data collection in survey research may include
the cataloging (observing and coding) of nonverbal
The conditional and contingent meanings of nonverbal behavior as a complement to verbal communication
can make coding and analyzing the information a challenging, but potentially very useful, aspect of research.
Heather H. Boyd
See also Attitude Measurement; Behavior Coding; Coding;
Face-to-Face Interviewing; Interviewer Neutrality;
Interviewer Training; Measurement Error; Social
Desirability
NULL HYPOTHESIS
A null hypothesis is one in which no difference (or no
effect) between two or more variables is anticipated
by the researchers. This follows from the tenets of science in which empirical evidence must be found to
disprove the null hypothesis before one can claim
support for an alternative hypothesis that states there
is in fact some reliable difference (or effect) in whatever is being studied. The null hypothesis is typically
stated in words to the effect that A equals B. The
concept of the null hypothesis is a central part of
formal hypothesis testing.
An example in survey research would be a split-half experiment that is used to test whether the order
of two question sequences within a questionnaire
affects the answers given to the items in one of the
sequences, for example, in crime surveys where fear
of crime and criminal victimization experience are
both measured. In this example, a researcher could
hypothesize that different levels of fear would be
reported if the fear items followed the victimization
items, compared to if they preceded the victimization
items. Half the respondents would be randomly
assigned to receive one order (fear items, then victimization items), and the other half would receive the
other order (victimization items, then fear items). The
null hypothesis would be that the order of these question sequences makes no difference in the answers
given to the fear of crime items. Thus, if the null
hypothesis is true, the researcher would not expect to
observe any reliable (i.e., statistically significant) difference in levels of fear reported under the two question
ordering conditions. If results indicate a statistically reliable difference in fear under the two conditions, then
the null hypothesis is rejected and support is accorded
to the alternative hypothesis, that is, that fear of crime,
as reported in a survey, is affected by whether it precedes or follows victimization questions.
Another way of understanding the null hypothesis
in survey research is to think about the crime survey
example and the confidence intervals that can be calculated around the fear of crime measures in the two
ordering conditions. The null hypothesis would be
that the 95% confidence intervals for the fear measures under the two orders (conditions) would overlap
and thus not be reliably (significantly) different from
each other at the .05 (alpha) level. The alternative
hypothesis would be that the confidence intervals
would not overlap, and thus the fear measures gathered under one order are reliably different from the
same fear measures gathered under the other order.
Rejecting a null hypothesis when it is in fact true is termed a Type I error. Not rejecting a null hypothesis when it is in fact false is termed a Type II error.
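The logic of the crime survey example can be sketched numerically (all values below are hypothetical; a two-sample z-test stands in for whatever test a researcher would actually use):

    import math

    # Hypothetical mean fear-of-crime scores and standard errors
    # under the two question orders.
    mean_a, se_a = 2.80, 0.05   # fear items asked first
    mean_b, se_b = 3.00, 0.05   # victimization items asked first

    # 95% confidence intervals for the two conditions
    ci_a = (mean_a - 1.96 * se_a, mean_a + 1.96 * se_a)
    ci_b = (mean_b - 1.96 * se_b, mean_b + 1.96 * se_b)
    print(ci_a, ci_b)   # about (2.70, 2.90) vs. (2.90, 3.10): no overlap

    # z-test of the null hypothesis that question order makes no difference
    z = (mean_b - mean_a) / math.sqrt(se_a**2 + se_b**2)
    print(round(z, 2))  # 2.83 > 1.96, so the null hypothesis is rejected at .05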
Paul J. Lavrakas
See also Alpha, Significance Level of Test; Alternative
Hypothesis; Confidence Interval; Experimental Design;
p-Value; Split-Half; Statistical Power; Type I Error;
Type II Error
NUMBER CHANGED
When a telephone survey is conducted, some dialings
will result in a message stating that the number that was dialed has been changed. The message often
includes the new number. Number changed dispositions in RDD samples are normally classified as
ineligible, but there are some circumstances for which
NUMBER PORTABILITY
Number portability is the ability of users of telecommunications services in the United States and most
other countries to keep their existing telephone number when changing from one local service provider to
another. Under Federal Communications Commission
(FCC) rules, the implementation of local number portability (LNP) in the United States has caused some
problems for telephone survey researchers. In the
early days of wireline-to-wireline portability, both
numbers associated with a porting request could be
dialed successfully. When wireline-to-wireless porting
was implemented, there were concerns that this would
influence the ability of researchers and telemarketers
to comply with the portion of the Telephone Consumer Protection Act (TCPA) of 1991 that limited
calls to certain types of phone numbers, including
wireless phone numbers.
In 1996, the U.S. Congress amended the Communications Act of 1934 to establish a framework that
would promote competition and reduce regulation in
all telecommunications areas. It was recognized that
certain barriers to competition would need to be eliminated, specifically the inability of customers to switch
from one service provider to another and retain the
same phone number. New FCC rules were enacted that
gave consumers the ability to switch from one service
provider to another and keep their existing telephone
numbers. The rules were applicable only locally, that
is, only within a local exchange or rate center. If a person moved from one geographic area to another, the
number would not be portable. Telephone companies
were allowed to charge a fee to cover their porting
costs. However, because of these costs, most small
landline companies were not required to port numbers
to wireless carriers until the FCC had completed a study
about the effects of porting rules on small companies.
Local number portability in the United States was
implemented in phases. Portability between local landline (wire) service providers was implemented in 1998.
The FCC had also required that three categories of
CMRS (commercial mobile radio service) providers
cellular providers, broadband personal communications
service (PCS) providers, and covered specialized
NUMBER VERIFICATION
The verification of a telephone number in a telephone
survey is done by an interviewer who confirms with
a respondent that the number that was ostensibly dialed
is in fact the number that was reached. The need to do
this has been reduced over the years as more landline
telephone surveys have come to be conducted with
computer-assisted telephone interviewing (CATI) systems that include software and hardware to place the
calls to the sampled telephone numbers, as opposed to
having interviewers manually dial the numbers. However, the need for number verification has not been
eliminated completely, as even with automatic dialing
equipment mistakes sometimes occur. Furthermore, in
the United States and due to current federal telecommunications regulations, all cell phone numbers that
are sampled for a telephone survey must be hand-dialed (unless the cell phone owner has given the survey organization prior consent to be called), and thus
interviewers will need to verify whether they have
dialed the sampled number correctly.
There are several reasons that a landline (or cell
phone) telephone survey may not reach the correct
number, even when using equipment to place the
calls. For example, national and local telephonic systems are subject to occasional error when wires get
crossed (the electronic signals get mixed up), thus
leading to a call reaching a number other than the one intended. Call forwarding,
O
OPEN-ENDED QUESTION
The selection of question structure is fundamental to
the process of questionnaire construction. The openended question is one type of structure; the other,
more commonly used alternative is the closed-ended
question. The open-ended question does not provide
answer categories. The person (respondent) who is
asked an open-ended question formulates the answer
and gives the response in his or her own words.
Although this structure gives the respondent more
freedom in crafting an answer, it also increases the
cognitive effort. Without answer choices as cues to
aid in understanding the question and deciding on an
answer, the respondent has to perform additional cognitive tasks before he or she responds.
All open-ended questions are alike in that the respondent is not given answer choices. However, the reasons for using this structure and the level of cognitive
effort needed to respond can vary. The following are
seven examples that illustrate different reasons for
open-ended questions.
1. Engage the respondent. One of the main objectives is to engage the respondent (e.g., "In your opinion, what is the most important issue facing the United States today?").

5. Establish knowledge. A test question can distinguish between more and less informed respondents to enhance the understanding of opinion formation (e.g., "Who are the U.S. senators from your state?").
There are two basic methods used to collect information: self-administered (the respondent self-records
answers by writing on a paper questionnaire or by
entering a response into a computer) and interview
(the respondent answers questions that are read and
then recorded verbatim by another person, i.e., the
interviewer). With the self-administered method, the
respondent is responsible for providing a quality
answer. The types of respondent-related errors specific
to the open-ended structure are (a) missing answers;
(b) incomplete responses; (c) misunderstood terminology; and (d) illegible writing. The format of a self-administered paper or electronic questionnaire can provide some assistance in reducing these errors. For
example, the size of the space provided is a visual cue
on how much information the respondent is expected
to report: a smaller space results in less information,
while more information is provided when there is
a larger space. When there is a request to write in
numeric factual information (e.g., "What was the last date you saw your doctor?"), clear instructions on how
to provide the month, day, and year will reduce the
variation, and possible errors, on how respondents
report this information.
An interview can reduce some of the self-administered errors because an interviewer guides the
respondent; however, there are other data quality
Coding verbatim responses is necessary with open-ended questions. While one of the main advantages of
using an open-ended structure is getting specific, individual information, the lists of verbatim answers need
to be organized to be useful for data analysis and
reports. Developing numeric codes to accurately represent the verbatim responses is challenging. The quality
of open-ended data is diminished when careful attention is not given to code development. Errors can also
occur when a person reads a verbatim answer and has
to make a judgment about the most appropriate code to
assign. Thorough training on how to make these judgments improves the accuracy and the confidence in
reliable coding results. When multiple people are coding verbatim responses, the quality of the data also
depends on intercoder reliability. To minimize the
amount of coding that is needed on a questionnaire
completed with an interviewer, a list of precoded
answers can be provided. While the question is still
asked using an open-ended structure, the interviewer
uses the precoded list to classify a verbatim answer.
There are two possible sources of error associated with
a list of precoded choices: the reliability of the interviewer's judgment in selecting the appropriate answer
and the accuracy of the items on the precoded list.
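Intercoder reliability itself is straightforward to compute; the following sketch calculates Cohen's kappa for two coders (the codes and cases are hypothetical):

    from collections import Counter

    def cohens_kappa(codes_a, codes_b):
        # Kappa = (observed agreement - chance agreement) / (1 - chance agreement)
        n = len(codes_a)
        p_o = sum(a == b for a, b in zip(codes_a, codes_b)) / n
        freq_a, freq_b = Counter(codes_a), Counter(codes_b)
        p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / n**2
        return (p_o - p_e) / (1 - p_e)

    coder1 = ["econ", "war", "war", "health", "econ", "war"]
    coder2 = ["econ", "war", "health", "health", "econ", "econ"]
    print(round(cohens_kappa(coder1, coder2), 2))  # 0.52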
To obtain quality information, open-ended questions
require sufficient time and financial resources to support the actions required for data quality. Time needs
to be allowed when the questions are being answered
and for coding the answers after the completed questionnaires are returned. A typical open-ended question
can take two or three times longer for a respondent to
complete than a closed-ended question because of the
OPINION NORMS
From a social science perspective, public opinion is
much more than an aggregation of polling statistics.
While individual survey responses are an instrumental
component of understanding public opinion as a social
force, the context in which the individual operates
(e.g., media environment, typical discussion patterns)
is an equally important consideration in obtaining
a better understanding of the evolution of public opinion. Recognizing the normative aspects of a public
opinion climate allows researchers to understand
better how individuals come to possess opinions and
how those opinions are shared with others.
Past work on behavioral norms offers insight as
to how contextual forces, or climates of opinion,
can influence the actions and expressions of group
variables within the question (e.g., substituting candidates or discussion topics, level of support or opposition, and reference groups named within the question).
With this response data in hand, Jay Jackson's
Return Potential Model (RPM) provides one example for quantifying the normative opinion climate.
Normative intensity can be evaluated using the
RPM by calculating the mean group deviation from
a midpoint or neutral response option. This measure
specifies the approval or disapproval associated with
an opinion as well as the extremity or strength with
which a norm is held (e.g., slight approval versus
strong approval of expressing a specified opinion). It
is quantified by using the following equation:

    I_i = x̄_i − m,

where intensity (I_i) is the mean deviation from the researcher's specified midpoint value (m). It is important to note that intensity is a bidirectional concept
that can apply to strong disapproval as well as strong
approval. In other words, the degree to which a certain
opinion norm is characterized by negative intensity
(i.e., an opinion is strongly opposed) is related to the
probable level of social sanctions for expressing that
opinion. For high-intensity norms, violation will most
likely result in some form of social isolation. On the
other hand, expressing opinions associated with low-intensity norms may be seen as odd but will likely
bear little social cost.
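A minimal sketch of the intensity computation (the ratings, scale, and midpoint below are hypothetical):

    # RPM normative intensity: respondents rate approval of expressing
    # an opinion on a 1-5 scale; the researcher-specified midpoint is m = 3.
    ratings = [4, 5, 4, 3, 5, 4, 2, 5]
    m = 3

    x_bar = sum(ratings) / len(ratings)
    intensity = x_bar - m      # I_i = mean deviation from the midpoint
    print(intensity)           # 1.0, indicating approval of moderate strength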
When measuring the normative opinion climate, it
also is important to know how much agreement exists
within a specified group. The RPM also captures normative crystallization, which quantifies
the amount of consensus associated with a norm.
Specifically, this assessment of crystallization measures the level of agreement regarding what opinions
are appropriate to express. Mathematically, crystallization is inversely associated with variance among
respondents' normative views and therefore becomes
greater when there is relatively more agreement about
public opinion. To properly derive crystallization (C),
the following equation is used:
    C_i = 1 / |x̄_i − x̄_mean|,
where C_i is the inverse of the deviation from the mean
approval rating among all respondents. Highly crystallized opinion norms are well understood and solidified.
Low crystallization is likely to be associated with
normative ambiguity. For example, if it is perceived
OPINION QUESTION
Opinions that individuals hold about an issue are
potentially quite complex, considering how opinions
comprise beliefs, feelings, and values on that issue.
As a result, survey questions designed to assess
a respondent's opinion on an issue can tap a combination of the respondent's feelings and thoughts. As
composite measures, however, opinions are gauged
through a variety of opinion items.
The most basic opinion question, sometimes called
an attitude question, is designed to measure the
direction of opinion. That is, where does the respondent stand on the issue or attitude object? Such opinion items, typically closed-ended, can be dichotomous
in nature (e.g., "Do you support or oppose abortion?" Answer categories: Support; Oppose. "Are you in favor of the death penalty for a person convicted of murder?" Answer categories: Yes; No). Opinion items
OPINIONS
Opinions in survey research can be defined as subjective attitudes, beliefs, or judgments that reflect matters
of personal (subjective) preference. Some opinions
may not be confirmable or deniable by factual evidence (e.g., a person's attitude toward the use of capital punishment), whereas others may be (e.g., the
belief that a particular presidential candidate will be
elected). Moreover, the strength of one's opinions may depend on one's level of knowledge or attentiveness on a subject.
The term opinion is often used interchangeably
with attitude and belief, but opinions are a broader
category that includes both attitudes and beliefs.
One's subjective position on the truth of a subject is
a belief, such as whether global warming is the result
of human-made activity or if abstinence education
lowers the rate of teenage pregnancies. Some beliefs
could, at some point, be resolved with finality through
science. Attitudes are a type of opinion that have an
evaluative component; that is, they refer to a positive
or negative evaluation of a person, idea, or object.
Attitudes are latent, subjective constructs that cannot
be observed directly and cannot be confirmed or
denied with factual information.
Opinions are measured in surveys, which is just
one way that opinions can be expressed. Opinions can
also be expressed via other behaviors such as voting,
participation in marches or demonstrations, or attempting to influence another persons opinion. Although
surveys involve measures of individuals opinions, surveys typically are designed to measure public opinion,
which can be defined as the aggregate of opinions
across the public.
OPTIMAL ALLOCATION
Optimal allocation is a procedure for dividing the
sample among the strata in a stratified sample survey.
The allocation procedure is called "optimal" because
in a particular survey sampling design (stratified simple random sampling) it produces the smallest variance for estimating a population mean and total
(using the standard stratified estimator) given a fixed
budget or sample size.
A sample survey collects data from a population in
order to estimate population characteristics. A stratified sample selects separate samples from subgroups
(called strata) of the population and can often
increase the accuracy of survey results. In order to
implement stratified sampling, it is necessary to be
able to divide the population at least implicitly into
strata before sampling. Given a budget that allows
gathering data on n subjects or a budget amount $B,
there is a need to decide how to allocate the resources
for data gathering to the strata. Three factors typically affect the distribution of resources to the strata:
(1) the population size, (2) the variability of values,
and (3) the data collection per unit cost in the strata.
One also can have special interest in characteristics of
some particular strata that could affect allocations.
In a stratified simple random sample, a sample of size n_h is selected from stratum (subpopulation) h, which has a population size of N_h (h = 1, 2, ..., H). The standard estimator of the population total is

    Σ_{h=1}^{H} N_h ȳ_h,

where ȳ_h is the mean (arithmetic average) of the sample values in stratum h, Σ denotes summation across the strata, and S_h² denotes the variance of the values in stratum h. If the rate of sampling is small in all strata, then (ignoring the finite population correction terms 1 − n_h/N_h) the variance of this estimator is approximately

    Σ_{h=1}^{H} N_h² S_h² / n_h.
Table 1
Year
Population Size:
Total Enrollment
Standard
Deviation: Sh
First
5,000
2.9 hours
1.0
223.8
224
Second
4,058
4.4 hours
1.1
262.8
263
Third
4,677
5.8 hours
1.2
382.3
382
Fourth
6,296
8.9 hours
1.4
731.1
731
Total
20,031
1,600.0
1,600
Ng Sg =
g=1
Cg
Sample Size:
Rounded Values
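The allocation in Table 1 can be reproduced with a short sketch (the stratum figures are taken from the table; the code itself is only illustrative):

    import math

    # Optimal allocation: n_h proportional to N_h * S_h / sqrt(c_h).
    n_total = 1600
    strata = {                     # N_h, S_h (hours), c_h (cost per unit)
        "First":  (5000, 2.9, 1.0),
        "Second": (4058, 4.4, 1.1),
        "Third":  (4677, 5.8, 1.2),
        "Fourth": (6296, 8.9, 1.4),
    }

    weights = {h: N * S / math.sqrt(c) for h, (N, S, c) in strata.items()}
    total = sum(weights.values())
    for h, w in weights.items():
        n_h = n_total * w / total
        print(h, round(n_h, 1), round(n_h))   # e.g., First 223.8 224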
ORDINAL MEASURE
Within the context of survey research, measurement
refers to the process of assigning values to characteristics of individuals to indicate their position on an
develop categories that are balanced and approximately equal distance from one another. Likert scales,
a popular type of ordinal scale, demonstrate this
balance well. Likert scales are bipolar and include
categories with both positive and negative values. A
typical example is one in which respondents are asked
their level of agreement with a particular statement,
with response options ranging from "strongly disagree," "somewhat disagree," "neither," "somewhat agree," to "strongly agree." With regard to labeling,
decisions include whether to label all of the categories
or just the end points with verbal descriptions, or
whether to label the categories with a combination of
verbal descriptions and numbers. Overall, data quality
is optimized when every scale point is represented by
a verbal description.
Jennifer Dykema, Steven Blixt, and John Stevenson
See also Balanced Question; Bipolar Scale; Closed-Ended
Question; Interval Measure; Level of Measurement;
Likert Scale; Nominal Measure; Ratio Measure
Further Readings
OUTBOUND CALLING
Telephone calls involving call centers are classified as
inbound or outbound, depending on whether the call
is being received by the call center (inbound) or initiated in the call center (outbound).
Any telephone survey that starts with a list of telephone numbers involves outbound calling, although
telephone surveys often use inbound calling in support functions and can also be entirely inbound.
The list of numbers used might be randomly generated, such as a random-digit dialed (RDD) sample,
created from public listings such as the white pages or
from prior respondent contact (for example, a customer
satisfaction survey might use the phone numbers provided by the customer at the point of purchase).
Particularly with RDD surveys, only a small proportion of dials made will result in live connects, and
therefore the efficiency of outbound calling can be
Further Readings
Kelly, J., Link, M., Petty, J., Hobson, K., & Stork, P. (2008).
Establishing a new survey research call center. In
J. Lepkowski, C. Tucker, M. Brick, E. de Leeuw,
L. Japec, P. Lavrakas, et al. (Eds.), Advances in telephone
survey methodology (pp. 317–339). New York: Wiley.
Trussell, N., & Lavrakas, P. J. (2005, May). Testing the
impact of caller ID technology on response rates in
a mixed mode survey. Paper presented at the 60th annual
American Association for Public Opinion Research
Conference, Miami Beach, FL.
OUTLIERS
An outlier, as the term suggests, means an observation
in a sample lying outside of the bulk of the sample
data. For example, the value 87 is an outlier in the
following distribution of numbers: 2, 5, 1, 7, 11, 9, 5, 6,
87, 4, 0, 9, 7. This original meaning has been expanded
to include those observations that are influential in estimation of a population quantity. Influence of an observation in estimation is intuitively understood as the
degree to which the presence or absence of that observation affects the estimate in terms of the variance.
The notion of outliers is common in all statistical
disciplines. However, it has a distinctive meaning in
sample surveys for mainly two reasons: (1) sample
surveys mostly deal with finite populations, often
without assuming a parametric distribution; and (2)
sample surveys often employ complex sample designs
with unequal inclusion probabilities. Moreover, the
meaning and handling of outliers differ also, depending on the stage of the survey process at hand: sample
design stage, editing stage, and estimation stage.
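For the small numeric example above, a simple boxplot-type rule suffices to flag the outlier (rough quartiles are used here, since quartile definitions vary for small samples):

    # Flag outliers with a 1.5 * IQR fence rule.
    data = [2, 5, 1, 7, 11, 9, 5, 6, 87, 4, 0, 9, 7]

    s = sorted(data)
    n = len(s)
    q1, q3 = s[n // 4], s[(3 * n) // 4]   # rough quartiles
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    print([x for x in data if x < low or x > high])   # [87]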
The occurrence of outliers is frequently unavoidable
when a multi-purpose survey is conducted. It may be
nearly impossible to make the design efficient for all
variables of interest in a large-scale multi-purpose survey. The outliers that have the most impact come from
sample units that have a large sample value coupled
with a large sampling weight. Probability proportional
to size sampling (PPS) or size stratification is often used
in the design stage to prevent such a situation from
occurring. If the measure of size (MOS) is not reliable,
it is difficult to eliminate outliers entirely, unless
OUT OF ORDER
The out of order survey disposition is used in telephone
surveys when the telephone number dialed by an interviewer is nonworking or not in service. Although each
local telephone company in the United States handles
out-of-order telephone numbers differently, most companies in urban and suburban areas include a recording
on these numbers that says something like, "The number you have dialed is a nonworking number," or "The number you dialed is not in service at this time." Some
telephone companies in rural areas, however, do not include these recordings as standard practice. Thus, interviewers dialing out-of-order numbers in these areas may hear nothing, or the number may ring and ring, making it
difficult to determine whether these numbers should be
coded using the out of order disposition or as a ringno
answer noncontact.
In most telephone surveys, a case with an out of
order disposition would be considered ineligible. Out
of order dispositions are considered final dispositions
and typically are not redialed again during the field
period of a survey.
An exception to these rules occurs if a telephone number in the sampling pool is temporarily disconnected or temporarily out of service, or if the number
has a recording that indicates that the line is being
checked for trouble. If there is clear evidence that a
number in the sampling pool is temporarily out of order,
this number should not be considered ineligible but
instead should be considered a case of unknown eligibility. Assuming the field period permits it, most survey
organizations will wait a few days and attempt to redial
numbers that appear to be temporarily out of service.
Matthew Courser
See also Final Dispositions; Missing Data; Noncontacts;
Response Rates; Temporary Dispositions; Unknown
Eligibility
OUT OF SAMPLE
The out of sample survey disposition is used in all
types of surveys, regardless of mode. Cases with out
of sample dispositions are considered ineligible and
are not contacted again. As a result, the out of sample
disposition is considered a final disposition.
In a telephone survey, out of sample dispositions usually occur when the telephone number dialed by an interviewer rings to a household, business, or individual that
is outside of the geographic sampling area for a survey.
For example, in a random-digit dial (RDD) survey of the
general public, this is most common when the survey is
sampling relatively small geographic areas such as counties, towns or villages, or neighborhoods for which telephone prefix boundaries do not conform exactly (or even
closely) to geopolitical boundaries. Out-of-sample cases
usually are discovered only if the questionnaire for the
telephone survey includes screening questions that verify
that the respondent or household is located within the
geographic area being sampled for the survey.
In an in-person survey, out-of-sample cases include
ineligible housing units that were listed as being
within the sampling area but actually are outside of it.
Out-of-sample cases in these surveys also can include
other households or businesses that were incorrectly
included in a list sample: any unit that is not properly part of the target population for the survey.
In a mail or Internet survey of named respondents,
out-of-sample cases occur when the named respondent
is found to be ineligible to participate in the survey
based on screening information he or she provides on
the questionnaire. For example, a respondent who indicates that he or she is not a doctor would be considered
out of sample in a mail or Internet survey of physicians.
Matthew Courser
See also Final Dispositions; Ineligible; Response Rates;
Temporary Dispositions
OVERCOVERAGE
Overcoverage occurs in survey sample frames when the frame contains more records than there are members of the target population it is meant to cover. This primarily results from two situations. In one case, there are records in the sample frame that do not correspond to respondents or members of the target population. In other cases, the same respondent is targeted by duplicate or multiple records in the sample frame. In either case, the sample frame contains sample records that should not be interviewed.
Different types of overcoverage are commonly
referred to as "ineligible units" or "multiple records."
Different researchers use the term overcoverage inconsistently, so it is important to consider whether overcoverage in a given sample frame is caused by
ineligible units, multiple records, or both.
Sample frames ideally contain a perfect one-to-one
correspondence between sample records and members
of the target population for a survey. In some cases,
multiple records refer back to a single member of the
target population. This type of overcoverage is sometimes called a multiplicity of elements. In other
cases, sample records fail to lead to members of the
target population. These cases are sometimes referred
to as "blanks" or "foreign elements."
Multiple sample records that refer to a single member of the target population are common in sample
frames. In cases in which directories or lists are used
as sampling frames, respondents can be included more
than once if lists are compiled from multiple sources.
More commonly, multiple records lead back to a single
respondent when the sample frame and target populations are measured (covered) inconsistently. For example, if telephone numbers are sampled for a survey of
households, a household with multiple telephones will
be included multiple times. If sales records are used as
a source for a consumer survey, then consumers who
have made multiple purchases might be included in
a sample multiple times.
Overcoverage caused by duplicate or multiple
records can be adjusted for either by cleaning the sample frame or by providing sample weights to adjust for
different probabilities that a respondent is included in
the sample frame. Frame cleaning can be accomplished
either before or during the survey field process.
Cleaning before involves checking the sample frame
for duplicate or multiple records and eliminating them.
This de-duping is a basic part of constructing and
checking a sample frame and is made enormously easier and more practicable by increased computer power.
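A minimal sketch of de-duping and the associated weighting adjustment (the frame below is hypothetical: telephone records mapped to household IDs, so a two-line household appears twice):

    from collections import Counter

    frame = ["HH1", "HH2", "HH2", "HH3", "HH4", "HH4", "HH4"]

    multiplicity = Counter(frame)              # records per household
    deduped = list(multiplicity)               # one record per household
    weights = {hh: 1 / k for hh, k in multiplicity.items()}

    print(deduped)   # ['HH1', 'HH2', 'HH3', 'HH4']
    print(weights)   # HH2 weighted 0.5, HH4 weighted 1/3, offsetting extra chances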
The second type of overcoverage, in which sample
records do not contain valid members of the target
population, is present in almost all sample frames.
This occurs for a variety of reasons. In some cases,
sample records do not correspond to anything similar
to the target population. For example, telephone samples often contain disconnected telephone numbers or
numbers that have not been assigned. Household surveys may send an interviewer to an empty lot. A business directory might contain mailing addresses for
establishments that went out of business many years
ago. These listings are referred to as "blanks," "empty records," "empty listings," or more colloquially as "bad records" or "duds." In other cases, the
sample record reaches a unit that can be screened for
eligibility, but the record turns out to not be a member
of the target population for the survey. For example,
a survey of eligible voters may reach nonvoters, a
survey that targets telephone households in one city
instead may reach some households in a neighboring
town, or a survey of college students may reach some
recent graduates. These records are called "foreign elements," "out-of-scope units," or, colloquially, "screen-outs."
Survey researchers attempt to identify blanks or
foreign elements by screening to determine whether
they are eligible for the survey. In some cases, this
can be done automatically. For example, residential
telephone samples can be screened against databases
of known business numbers, and other electronic
matching can identify other foreign elements. In many
cases, however, an interviewer or other field staff
member needs to contact a sample record to determine if it is an eligible member of the surveys target
population. This is especially true for studies that utilize general population sample frames to identify rare
subpopulations. For example, a survey of parents with
disabled children who live at home may need to contact and screen all households in the sample to locate
the eligible households, even though the majority of
households do not have children living there and most
others have children who are not disabled. These low-incidence samples can add great cost to a survey.
Blanks and foreign elements generally do not lead
to biased or distorted survey results, but they often
result in a loss of both sample and economic efficiency. Surveys that desire a specific degree of statistical precision need to increase (inflate) initial sample
OVERREPORTING
In many surveys, respondents tend to report more
socially desired behaviors than they actually performed. In addition to this type of misreporting, called overreporting, respondents are also inclined to understate that they have engaged in socially undesirable behaviors, which is called underreporting. Similar to underreporting, overreporting is assumed to be connected to social desirability bias and thus occurs at the cognitive editing stage of the question-answer
process.
Among other topics, overreporting of voting and
being registered to vote has been a focus of methodological research for decades. Because respondents in national, state-level, and local election polls tend to overstate that they have voted in the election, voter turnout has traditionally been overestimated. Usually, overreporting is identified by applying post-survey validation using record checks (as in the National Election Study).
Since not every survey can afford a cost-intensive
validation study, several attempts have been made in
order to reduce vote overreporting, either by softening
the question wording so that respondents will not feel
embarrassed to admit that they have not voted or by asking a set of preceding questions on voting behavior in
other, prior elections. It was assumed that respondents
P
would be done by using a scaled response format such
as "Strongly Prefer Candidate A; Prefer Candidate A; Slightly Prefer Candidate A; Slightly Prefer Candidate B; Prefer Candidate B; Strongly Prefer Candidate B." Generally the midpoint of the preference scale (which in this example would be "Prefer Neither Candidate A nor Candidate B") is not offered to respondents because it is reasoned that the likelihood that there is complete indifference between the two is extremely low. Providing this "no preference" choice
may encourage some respondents to satisfice and use
the middle option too readily.
Scoring using paired comparison data is straightforward. In the previous example a "Strongly Preferred" response would be scored with a 3, a "Preferred" response would be scored with a 2, and a "Slightly Preferred" response would be scored with a 1. If Candidate A were paired with Candidate D, and Candidate A were strongly preferred over Candidate D by a given respondent, then the respondent would be assigned a +3 score for Candidate A for that pairing, and the respondent would get a −3 score for Candidate D for that pairing. The scaled scores for a specific candidate for each respondent would be the sum of the respondent's individual scores from each of the pairings in which that candidate was included. In the example of five candidates being paired in all possible ways, there would be c(c − 1)/2 possible paired comparisons, with c representing the number of things being paired. Thus in this example there are 5(5 − 1)/2 = 10 pairs: AB, AC, AD, AE, BC, BD, BE, CD, CE, and DE. (The pairings would be presented in a random order to respondents.) Each pairing would require
a separate question in the survey; thus, this five-candidate comparison would require 10 questions being
asked of each respondent. If one of the candidates in
this example were strongly preferred by a specific
respondent over each of the other four candidates she or
he was paired with, that candidate would get a score of
+12 for this respondent. If a candidate were so disliked
that every time she or he was paired with one of
the other four candidates a given respondent always
chose Strongly Preferred for the other candidates,
then the strongly disliked candidate would be assigned
a scaled score of 12 for that respondent. Computing
scale scores for each thing that is being rated is easy to
do with a computer, and these scaled scores provide very
reliable indications of the relative preferences a respondent has among the different items being compared. That
is, asking a respondent to rank all of the things being
compared in one fell swoop (i.e., with one survey question) will yield less reliable and valid data than using
a paired comparison design to generate the ranking.
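The pairing and scoring logic can be sketched as follows (one hypothetical respondent's answers; the sign of each score favors the first item in the pair):

    from itertools import combinations

    candidates = ["A", "B", "C", "D", "E"]
    pairs = list(combinations(candidates, 2))
    print(len(pairs))   # 5 * 4 / 2 = 10 pairings

    # Hypothetical answers on the -3..+3 preference scale, keyed by pair.
    answers = {("A", "B"): 3, ("A", "C"): 2, ("A", "D"): 3, ("A", "E"): 1,
               ("B", "C"): -1, ("B", "D"): 2, ("B", "E"): -2,
               ("C", "D"): 3, ("C", "E"): 1, ("D", "E"): -3}

    scores = {c: 0 for c in candidates}
    for (first, second), s in answers.items():
        scores[first] += s    # the preferred item gains the score
        scores[second] -= s   # its partner receives the mirror-image score
    print(scores)   # {'A': 9, 'B': -4, 'C': 3, 'D': -11, 'E': 3}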
Unfortunately, increasing the numbers of things
being compared in a paired comparison design
quickly causes many more pairings to be required for
judgment by each respondent. Thus, if a researcher
wanted 10 things to be compared, a total of 45 pairings would need to be judged by each respondent.
In instances when a large number of pairings are to
be made by respondents, a researcher is wise to add
some reliability checks into the set of pairings. This is
done by randomly selecting some of the pairs and
reversing the order of the things within those pairings.
For example, if the AF pairing were randomly chosen
as one of the reliability checks, and if A was compared with F earlier in the question sequence by the
respondent, then later on in the sequence F would be
compared with A. The respondent is not likely to
recall that she or he had already made this comparison
if the set of items being compared is large. Adding
such reliability checks allows the researcher to identify those respondents who are not taking the task
seriously and instead are answering haphazardly.
Paul J. Lavrakas
See also Interval Measure; Ranking; Rating; Respondent
Burden; Satisficing
PANEL
A panel refers to a survey sample in which the same
units or respondents are surveyed or interviewed on
two or more occasions (waves). Panels can give information about trends or changes in the characteristics
of a given population over a period of time. A panel
usually can measure changes in the characteristics of
interest with greater precision than a series of independent samples of comparable size. A survey using
a panel design is often called a longitudinal survey,
which is one particular type of repeated survey.
The sample design for a panel is very different from
the one for an independent sample or a series of independent samples. In the sample design for a panel, more
stable stratification variables over time can (and should)
be employed than when using independent samples,
because whereas a panel design may be statistically efficient in a short run, it may not be over a longer period
of time. Also, the design for a panel should incorporate
the changes in the population that the panel is meant to
represent, such as births and other additions and deaths
and other removals of sample units, in an optimal way
so as not to cause disruption to the ongoing survey
operations at different points of time or waves.
Rotating Designs
Even if a panel is a "bad" sample, that is, if it does
not well represent a given population over time, the
organization carrying out the panel survey may have
to continue to maintain that panel. But there is a solution to such a problem: a rotating
panel design, in which a part of the panel sample is
replaced at each subsequent point in time. This design
is intermediate (i.e., a hybrid) between a panel sample
and independent samples. As the simplest example,
one may choose a panel design involving overlaps of
half of the sample elements, as shown by AB, BC,
CD, DE, EF, and so on at different points of time. In
this example, the B panel sample appears in the first
and second waves of data collection; the C panel sample appears in the second and third waves, and so on.
Such rotating designs not only reduce respondent burden but also provide an opportunity to refresh the
sample at each point of time with cases that better
reflect the current makeup of the target population.
Examples
A number of panel surveys with economic or social
science focus have been conducted around the world.
One of the oldest panel surveys is the Panel Study of
Income Dynamics (PSID) conducted by the Survey
Research Center at the University of Michigan. This
panel study has collected information on the dynamic
aspects of economic and demographic behavior,
including sociological and psychological measures.
The panel of the PSID is a representative sample of
U.S. family units and individuals (men, women, and
children). The panel, originating in 1968, consisted of
two independent samples: a cross-sectional national
sample and a national sample of low-income families.
The cross-sectional sample, which yielded about
3,000 completed interviews, was an equal probability
sample of households from the 48 contiguous states.
The national sample of low-income families came
from the Survey of Economic Opportunity (SEO)
PANEL CONDITIONING
Panel conditioning is an effect sometimes observed in
repeated surveys when a sample unit's response is
influenced by prior interviews or contacts. Various
possibilities have been suggested to explain the cause.
Panel conditioning can affect the resulting estimates
by introducing what is sometimes called time-in-sample bias or rotation group bias.
In many surveys, the household, business, or other
sample unit is contacted more than once over a period
of time, usually to reduce the total survey cost, to produce longitudinal estimates, or to decrease the standard error of the estimate of change in the items of
interest. In various documented studies, the levels of
unemployed persons, expenditures, illness, victimizations, house repairs, and other characteristics were
significantly higher or lower in earlier survey contacts
than in later ones.
Potential scenarios to explain this behavior are
extensive. At times a respondent recalls the answer to
Applications
Most panel data applications have been limited to
a simple regression with error components disturbances, such as the following:
    y_it = x′_it β + μ_i + ν_it,   i = 1, ..., n;  t = 1, ..., T,
estimator cannot estimate the effect of any time-invariant variable such as gender, race, or religion.
These variables are wiped out by the within transformation. This is a major disadvantage if the effect of
these variables on earnings is of interest.
If μ_i denotes independent random variables with zero mean and constant variance σ²_μ, this model is known as the random-effects (RE) model. The preceding moments are conditional on the x_it's. In addition, μ_i and ν_it are assumed to be conditionally independent. The RE model can be estimated by generalized least squares (GLS), which can be obtained using a least squares regression of y*_it = y_it − θȳ_i· on x*_it similarly defined, where θ is a simple function of the variance components σ²_μ and σ²_ν. The corresponding GLS estimator of β is known as the RE estimator. Note that for this RE model, one can estimate the effects of individual-invariant variables. The best quadratic unbiased (BQU) estimators of the variance components are ANOVA-type estimators based on the true disturbances, and these are minimum variance unbiased (MVU) under normality of the disturbances. One can obtain feasible estimates of the variance components by replacing the true disturbances with OLS or fixed-effects residuals.
A specification test based on the difference between the fixed- and random-effects estimators is known as the Hausman test. The null hypothesis is that the individual effects are not correlated with the $x_{it}$'s. The basic idea behind this test is that the fixed-effects estimator $\tilde{\beta}_{FE}$ is consistent whether or not the effects are correlated with the $x_{it}$'s. This is true because the fixed-effects transformation (replacing $y_{it}$ by $\tilde{y}_{it} = y_{it} - \bar{y}_{i.}$) wipes out the $\mu_i$ effects from the model. However, if the null hypothesis is true, the fixed-effects estimator is not efficient under the random-effects specification, because it relies only on the within variation in the data. On the other hand, the random-effects estimator $\hat{\beta}_{RE}$ is efficient under the null hypothesis but is biased and inconsistent when the effects are correlated with the $x_{it}$'s. The difference between these estimators, $\hat{q} = \tilde{\beta}_{FE} - \hat{\beta}_{RE}$, tends to zero in probability under the null hypothesis and is nonzero under the alternative. The variance of this difference is equal to the difference in variances, $\mathrm{var}(\hat{q}) = \mathrm{var}(\tilde{\beta}_{FE}) - \mathrm{var}(\hat{\beta}_{RE})$, because $\mathrm{cov}(\hat{q}, \hat{\beta}_{RE}) = 0$ under the null hypothesis. Hausman's test statistic is based on $m = \hat{q}' \, [\mathrm{var}(\hat{q})]^{-1} \, \hat{q}$ and is asymptotically distributed as chi-square with $k$ degrees of freedom under the null hypothesis.
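The following is a minimal numpy sketch of these estimators for a single regressor with no intercept, using simulated data; the variable names, the simplified degrees-of-freedom corrections, and the use of the between regression to estimate $\theta$ are illustrative assumptions rather than part of the original entry:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a balanced error-components panel: y_it = beta*x_it + mu_i + nu_it,
# with x correlated with mu_i, so the Hausman null should be rejected.
n, T, beta = 500, 5, 2.0
mu = rng.normal(size=n)
x = rng.normal(size=(n, T)) + 0.5 * mu[:, None]
y = beta * x + mu[:, None] + rng.normal(size=(n, T))

def slope(xv, yv):
    # No-intercept least squares slope for a scalar regressor.
    return (xv * yv).sum() / (xv ** 2).sum()

# Fixed-effects (within) estimator: demeaning within units wipes out mu_i.
xw, yw = x - x.mean(1, keepdims=True), y - y.mean(1, keepdims=True)
b_fe = slope(xw, yw)
s2_nu = ((yw - b_fe * xw) ** 2).sum() / (n * (T - 1) - 1)  # sigma_nu^2 estimate
var_fe = s2_nu / (xw ** 2).sum()

# Feasible GLS (random-effects) via the theta transformation:
# regress y* = y - theta*ybar_i on x* defined the same way.
xb, yb = x.mean(1), y.mean(1)
s2_1 = T * ((yb - slope(xb, yb) * xb) ** 2).sum() / (n - 1)  # T*sigma_mu^2 + sigma_nu^2
theta = 1 - np.sqrt(s2_nu / s2_1)
xs, ys = x - theta * x.mean(1, keepdims=True), y - theta * y.mean(1, keepdims=True)
b_re = slope(xs, ys)
var_re = s2_nu / (xs ** 2).sum()

# Hausman statistic: m = q^2 / (var_FE - var_RE), chi-square(1) under the null.
q = b_fe - b_re
m = q ** 2 / (var_fe - var_re)
print(f"b_FE = {b_fe:.3f}, b_RE = {b_re:.3f}, Hausman m = {m:.1f}")
```

Because the simulated $x_{it}$ is built to be correlated with $\mu_i$, the fixed-effects estimate stays near 2.0 while the random-effects estimate drifts away from it, and the Hausman statistic is large.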
Other Considerations
Panel data are not a panacea and will not solve all the
problems that a time-series or a cross-section study
could not handle. For example, with macro-panels
made up of a large number of countries over a long
time period, econometric studies have argued that panel data will yield more powerful unit root tests than individual time series. This in turn should help shed more
light on the purchasing power parity (PPP) and the
growth convergence questions. This led to a flurry of
PANEL FATIGUE
Panel fatigue refers to the phenomenon in survey
research whereby the quality of data that is gathered
from a particular member of a survey panel diminishes
if she or he is expected to stay in the panel for too long
a duration (i.e., for too many waves) of data collection.
In the extreme, panel fatigue leads to premature panel
nonresponse for particular panel members prior to their
PANEL SURVEY
The essential feature of a longitudinal survey design is
that it provides repeated observations on a set of variables for the same sample units over time. The different
types of longitudinal studies (e.g., retrospective studies,
panel surveys, and record linkages) are distinguished by
the different ways of deriving these repeated observations. In a panel survey, repeated observations are
derived by following a sample of persons (a panel) over
time and by collecting data from a sequence of interviews (or waves). These interviews are usually conducted at fixed occasions that in most cases are regularly spaced.
There are many variations under this general
description of a panel survey, including (a) cohort
panel surveys, (b) household panel surveys, and (c)
rotating panel surveys. These three types of panel surveys can be distinguished, first, by the sampling units
and the population the survey aims to represent.
The focus can be entirely on individuals or on individuals within their household context. A second distinction is between surveys comprising a single panel
of indefinite life and surveys comprising multiple overlapping panels of fixed life. Choosing the appropriate
design for a panel survey depends on the priorities
and goals of the (potential) data users and requires
an assessment of the benefits of the different sorts of
information collected and the costs required for deriving them.
PAPER-AND-PENCIL
INTERVIEWING (PAPI)
Prior to the 1980s, essentially all survey data collection that was done by an interviewer was done via paper-and-pencil interviewing, which came to be known as PAPI. Following the microcomputer revolution of the early 1980s, computer-assisted interviewing (CAI), for example, computer-assisted personal interviewing (CAPI), computer-assisted self-interviewing (CASI), and computer-assisted telephone interviewing (CATI), had become commonplace by the 1990s, essentially eliminating most uses of PAPI, with some exceptions. PAPI still is used in instances where data are being gathered from a relatively small sample, with a noncomplex questionnaire, on an accelerated start-up time basis,
PARADATA
Paradata, also termed process data (but not to be confused with metadata), contain information about the
primary data collection process (e.g., survey duration,
interim status of a case, navigational errors in a survey questionnaire). Paradata can provide a means of
additional control over or understanding of the quality
of the primary data (the responses to the survey
questions).
Collecting Paradata
Since paradata are defined simply as data describing
the primary data collection process, paradata can
be collected in every survey mode. However, the
amount, type, and level of detail of the captured paradata will vary depending on whether the data have to
be manually recorded or whether they are automatically logged by computer software. A crude distinction can also be made between paradata describing
the data collection process as a whole (calls, follow-up procedures, etc.) and more specific paradata referring to how a survey questionnaire was filled in.
Uses of Paradata
Paradata can assist survey questionnaire pretests. For
instance, data on how long it took to answer survey
questions could be of importance in this phase. Long
response latencies could indicate problems with particular questions. Paradata from keystroke files can
reveal where errors were made, which may indicate
poor interface design.
Paradata can also be collected during the actual
field work. Recently, researchers have used paradata
to adapt the survey design while the field work is still
ongoing in order to improve survey cost efficiency
and to achieve more precise, less biased estimates
(these are so-called responsive design surveys).
In interviewer-administered surveys, paradata can
be used to evaluate interviewer behavior. Time data
can help identify interviewers who administered all or
parts of the questionnaire too quickly. As in pretests,
keystroke data can reveal where errors are being
made. If these analyses are conducted during the field
work, corrective measures can still be implemented in
this phase.
Data Preparation
If the researcher has clear a priori assumptions about
which information is important, paradata can be
stored in conventional matrix form data sets. This, for
instance, is the case when call sheets are used or when
simple survey question durations are recorded. The
variables are pre-defined, and the interviewer or computer software is instructed to compute and record
their values. These variables can then be used in conventional analyses.
If some variables cannot be pre-defined, the data
can be collected in a relatively unstructured way. Keystroke data are of this type. Data are collected by adding each action to a data string. Since not every
interviewer or respondent will perform the same number of actions, or in the same sequence, these data
strings will be of different lengths and their structure
will vary from one observation to the next. In addition,
different parts of the data strings could be important
depending on the focus of the analysis. SAS macros or
other software capable of recognizing string patterns
can be used to extract the useful information from the
strings before the actual analysis.
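As an illustration of the latter, the short Python sketch below parses hypothetical keystroke strings with a regular expression; the log format ("question:key@seconds" tokens separated by "|") is an assumption made up for this example, not a standard:

```python
import re

# Hypothetical keystroke logs: one variable-length string per respondent.
logs = [
    "Q1:TAB@0.8|Q1:ENTER@2.1|Q2:BACKSPACE@5.3|Q2:ENTER@9.0",
    "Q1:ENTER@1.2|Q2:ENTER@3.4",
]

pattern = re.compile(r"(Q\d+):(\w+)@([\d.]+)")

for log in logs:
    actions = pattern.findall(log)  # list of (question, key, time) tuples
    # A simple data-quality indicator: number of corrections (backspaces).
    backspaces = sum(1 for _, key, _ in actions if key == "BACKSPACE")
    duration = float(actions[-1][2]) - float(actions[0][2])
    print(f"{len(actions)} actions, {backspaces} backspaces, {duration:.1f} s")
```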
Dirk Heerwegh
See also Computer-Assisted Personal Interviewing (CAPI);
Computer-Assisted Self-Interviewing (CASI);
Computer-Assisted Telephone Interviewing (CATI);
Response Latency; SAS; Web Survey
PARAMETER
A parameter is a numerical quantity or attribute of
a population that is estimated using data collected
from the population. Parameters are to populations
as statistics are to samples. For example, in survey
research, the true proportion of voters who vote for
a presidential candidate in the next national election
may be of interest. Such a parameter may be estimated using a sample proportion computed from data
gathered via a probability sample of registered voters.
Or, the actual annual average household out-of-pocket medical expenses for a given year (parameter) could be estimated from data provided by the
Medical Expenditures Survey. Or, the modal race of
students within a particular school is an example of
an attribute parameter that could be estimated using
data acquired via a cluster sample of classrooms or
students from the particular school. An important
parameter in the realm of survey nonresponse is the
likelihood of response. The binary event (respond or not respond) can be modeled as a Bernoulli random variable with unknown parameter, p, which can
vary by sampling unit as a function of demographic,
socioeconomic, or other variables.
Parameters may also refer to specific aspects of
a sampling distribution of a test statistic or reference
distribution. For example, when estimating the mean
weight loss for subscribers of a particular weight loss
plan (the parameter), the most straightforward point
estimate is the sample mean. The corresponding
confidence interval is then computed with respect to
a reference distribution, usually approximated by a normal distribution with a location and a scale parameter. The location parameter, or mean, of
PARTIAL COMPLETION
The partial completion survey disposition is used in
all types of surveys, regardless of mode. In a telephone
or in-person interview, a partial completion results
when the respondent provides answers for some of
the questions on the survey questionnaire that were
asked by the interviewer but is unable or unwilling to
allow the interviewer to administer all of the questions in the interview (item nonresponse). Partial completions in telephone or in-person surveys can occur
when an appointment or other commitment prevents
the respondent from completing the interview or when
the respondent begins the interview but then refuses to
complete the entire interview process (called a breakoff). In a mail survey, a partial completion results
when the respondent receives a paper-and-pencil survey questionnaire, answers only some of the questions
on the questionnaire, and returns the questionnaire to
the researcher. In an Internet survey, a partial completion occurs when the respondent logs into the survey,
enters answers for some of the questions in the questionnaire, and submits the questionnaire electronically
to the researcher. If a partial is not a hostile breakoff,
most survey firms attempt to recontact the respondent
who completed the partial interview (by telephone,
mail, or Internet, depending on the survey mode) in
order to attempt to get a completed interview.
In practice, the difference among completed interviews, partial completions, and breakoffs is that completed interviews contain the smallest number of item
nonresponses, while breakoffs contain the largest number of item nonresponses. Most survey organizations
have developed rules that explicitly define the difference
among breakoffs, partial completions, and completed
interviews. Common rules used by survey organizations
to determine whether an interview with item nonresponse can be considered a completed interview
include (a) the proportion of all applicable questions
answered, and (b) the proportion of critically important
or essential questions administered. For example, cases
in which a respondent has answered fewer than 50% of
the applicable questions might be defined as breakoffs;
Matthew Courser
See also Breakoff; Completed Interview; Final
Dispositions; Item Nonresponse; Response Rates;
Temporary Dispositions
PERCENTAGE FREQUENCY
DISTRIBUTION
A percentage frequency distribution is a display of
data that specifies the percentage of observations that
exist for each data point or grouping of data points. It
is a particularly useful method of expressing the relative frequency of survey responses and other data.
Many times, percentage frequency distributions are
displayed as tables or as bar graphs or pie charts.
The process of creating a percentage frequency
distribution involves first identifying the total number
of observations to be represented; then counting the
total number of observations within each data point or
grouping of data points; and then dividing the number
of observations within each data point or grouping of
data points by the total number of observations. The
sum of all the percentages corresponding to each data
point or grouping of data points should be 100%. The
final step of creating a percentage frequency distribution involves displaying the data.
For example, as part of a study examining the
relationship between number of trips to a physician and
socioeconomic status, one might survey 200 individuals
about the number of trips each made to a physician over
the past 12 months. The survey might ask each individual
to choose from the following responses: "0 times during the past year," "1–3 times during the past year," "4–6 times during the past year," "7–9 times during the past year," and "10 or more times during the past year."
Figure 1   Percentage frequency distribution of trips to a physician during the past year (0 times: 40.0%; 1–3 times: 25.0%; 4–6 times: 20.0%; 7–9 times: 10.0%; 10 or more times: 5.0%)
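A percentage frequency distribution like the one in Figure 1 can be computed in a few lines; the Python sketch below uses raw counts consistent with the figure's percentages (an assumption, since only the percentages survive in the source):

```python
from collections import Counter

# Illustrative raw counts for n = 200 respondents, matching Figure 1.
responses = (["0 times"] * 80 + ["1-3 times"] * 50 + ["4-6 times"] * 40
             + ["7-9 times"] * 20 + ["10 or more times"] * 10)

counts = Counter(responses)
total = sum(counts.values())
for category, count in counts.items():
    # Divide each category count by the total number of observations.
    print(f"{category:>17}: {count / total:6.1%}")
# The printed percentages sum to 100%.
```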
PERCENTILE
A percentile is a statistic that gives the relative standing of a numerical data point when compared to all other data points in a distribution. In the example $P_{.84} = 66$, $P_{.84}$ is called the percentile rank and the data point of 66 is called the percentile point. The .84 in the percentile rank $P_{.84}$ is a proportion that tells us the relative standing of the percentile point of 66 compared to all other data points in the distribution being examined. Reporting percentiles can be a useful way to present data in that it allows an audience to quickly determine the relative standing of a particular data point.
By itself, a raw score or data point says little about its relative position within a data set. Percentiles provide a number expressing a data point's relative position within a data set. At a glance, the percentile shows the reader whether a particular numerical data point is high, medium, or low in relation to the rest of the data set. Salaries, IQ scores, standardized test scores such as the SAT and GRE, body mass index (BMI), height, and weight are all frequently expressed as percentiles. Some percentile values commonly used in reporting are the median, $P_{.50}$, below which 50% of the cases fall; the lower quartile, $P_{.25}$, below which 25% of the cases fall; and the upper quartile, $P_{.75}$, below which 75% of the cases fall. The range between the lower quartile and the upper quartile is called the interquartile range.
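For computation, standard library routines return percentile points directly, and a percentile rank can be obtained by counting; here is a brief Python sketch with made-up data:

```python
import numpy as np

scores = np.array([52, 55, 58, 60, 61, 63, 64, 66, 70, 75])  # illustrative data

# Percentile points for the lower quartile, median, and upper quartile.
print(np.percentile(scores, [25, 50, 75]))

# Percentile rank of the data point 66: the proportion of values at or below it.
print((scores <= 66).mean())  # 0.8, so P_.80 = 66 for this data set
```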
PERCEPTION QUESTION
Perception is the subjective process of acquiring, interpreting, and organizing sensory information. Survey
questions that assess perception, as opposed to those
assessing factual knowledge, are aimed at identifying
the processes that (a) underlie how individuals acquire,
interpret, organize, and generally make sense of (i.e., form beliefs about) the environment in which they live; and (b) help measure the extent to which such perceptions affect individual behaviors and attitudes as a function of an individual's past experiences, biological makeup, expectations, goals, and/or culture.
Perception questions differ from other types of survey questions (behavioral, knowledge, attitudinal, or demographic) in that questions that measure perception
ask respondents to provide information on how they
perceive such matters as the effectiveness of programs,
their health status, or the makeup of their community,
among other specific measures assessing biological,
physiological, and psychological processes.
Broadly, research on perception is driven by
many different kinds of questions that assess how individual senses and perceptions operate; how and why
individuals are susceptible to perceptions or misperceptions; which structures in the brain support perception;
and how individual perceptions acquire meaning.
Research on the psychology of perception suggests that
the actions of individuals are influenced by their perceptions of the opinions, values, and expectations of others,
including those individuals identified as important by
the respondent. This is of particular import to survey
methodologists, because an individual's perceptions may influence her or his survey responses and, moreover, may be inaccurate. When this inaccuracy is systematic (biasing) rather than random, it has consequences for interpreting the survey data collected. Theory development on perception indicates that
the accuracy of reported perceptions is related to communication mode, coordination efforts, and the salience
of the percept. Other research finds that perceptions
may be distorted by intimate relationships, attraction,
personality traits, or indirect cues or interviewer effects,
PERTURBATION METHODS
Perturbation methods are procedures that are applied to
data sets in order to protect the confidentiality of survey respondents. The goal of statistical disclosure control (SDC) is to provide accurate and useful data, especially public use data files, while also protecting confidentiality. Various methods have been suggested, and these may be classified in two ways: (1) methods that do not alter the original data but reduce the amount of data released; and (2) methods that alter individual values while maintaining the reported level of detail. The first set of methods may be described as data coarsening; the second set may be described as statistical perturbation methods.

Perturbation methods have the advantage of maintaining more of the actual data collected from survey respondents than does data coarsening. Variables selected for
perturbation may be those containing sensitive information about a respondent (such as income) or those that
may potentially identify a respondent (such as race).
These methods can be used for data released at the
microdata level (individual respondent records) or at
the tabular level (in the form of frequency tables).
Depending on the data, their values, and method of data
release, researchers may select one perturbation method
over another, use multiple perturbation techniques, or
use these techniques in addition to data coarsening.
Examples of perturbation methods are described
below, with a focus primarily on perturbation of microdata. This is not an exhaustive list, as new methods are
continually being developed.
Data swapping. In this method, selected records are
paired with other records in the file based on a predetermined set of characteristics. Data values from some
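A toy Python sketch of the idea follows; in practice records are paired on a predetermined set of matching characteristics, whereas random pairing is used here purely for brevity, and the variable names are made up:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy microdata: (region, income). Swap the identifying variable 'region'
# between paired records; the sensitive value stays with its record, and
# marginal totals for both variables are preserved.
records = [("A", 51_000), ("A", 43_000), ("B", 62_000), ("B", 39_000)]

idx = rng.permutation(len(records))
swapped = list(records)
for i, j in zip(idx[0::2], idx[1::2]):
    (ra, ya), (rb, yb) = swapped[i], swapped[j]
    swapped[i], swapped[j] = (rb, ya), (ra, yb)
print(swapped)
```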
on the political values of the public; and (f) methodological research focused on issues facing the survey research community. The center also conducts extensive polling during national elections, typically including a broad "scene-setter" poll during the summer prior to the election, several polls during the fall focused on the issues and candidate images, a predictive survey conducted the weekend before the election, and a post-election survey to gauge the public's reaction to the outcome of the election and expectations for the future. In conjunction with the Pew Internet & American Life Project, the People & the Press Center also conducts a post-election survey on the public's use of the Internet to follow and participate in the election campaigns.
The Pew Internet & American Life Project,
founded in 1999, uses surveys to study the social and
political impact of the Internet in the United States.
Most of its work consists of random-digit dialed
(RDD) telephone surveys of the general public and
special populations, but it also uses Internet surveys
and qualitative methods in some of its research. Its
director is Lee Rainie, a former journalist who served
as managing editor of U.S. News & World Report.
The project conducts regular tracking surveys to monitor trends in Internet use for a range of activities,
including email use, broadband adoption, search
engine use, blog creation and readership, and use of
the Internet in such areas as health care, hobbies, the
arts, social interaction, education, shopping, decisions
about major purchases, and political activity.
The Pew Hispanic Center makes extensive use of
survey research to study the Hispanic population of
the United States and its impact on the country. The
center conducts regular RDD telephone surveys of the
Hispanic population, focusing on such topics as political engagement, employment, identity, education, and
remittances. It also conducts extensive secondary analysis of U.S. Census surveys such as the decennial census, the Current Population Survey, and the American
Community Survey. Its founding director is Roberto
Suro, a former journalist who served as a reporter for
The Washington Post, Time magazine, and The New
York Times.
In addition to partnering with the People & the Press Center on annual surveys of religion and politics in the United States, the Pew Forum on Religion & Public Life has also used surveys to study Pentecostalism in Asia, Latin America, and Africa as well as in the United States. It also has worked with the Pew Hispanic Center
PILOT TEST
Pilot tests are "dress rehearsals" of full survey operations that are implemented to determine whether problems exist that need to be addressed prior to putting the
production survey in the field. Traditional pilot tests
are common and have been a part of the survey process
since the 1940s. In recent years, by the time a pilot test
is conducted, the questionnaire has frequently already
undergone review (and revision) through expert review,
focus groups, and/or cognitive interviews.
The terms pretest and pilot test are sometimes used
interchangeably; however, in recent years pretest has
taken on the meaning of testing within a survey laboratory, rather than in the field with the general population. Some organizations or survey researchers now
refer to pilot tests as field pretests. Pilot testing is one
of the most critical aspects of a successful survey
operation resulting in good survey data. Going into
the field for a full production survey without knowing
whether the questionnaire and/or field interviewer
procedures work is a recipe for disaster.
Objectives
In surveys, nonsampling measurement error can be
caused by problems associated with interviewers,
Procedures
The pilot test procedures should mirror the procedures
that will be used in the production survey. For pilot
tests, the sample size is typically 50–100 cases. In fact, sample is an inappropriate term to use, since a nonrandom convenience sample rather than a random sample is typically used. It is important to have a sample that is as similar as possible in its characteristics to the respondents in the production survey sample. But due to costs and
staff efficiencies, pilot tests are frequently done in one
to three locations in the country, in particular when
data collection is face-to-face, with a relatively small
interview staff. If a project involves surveying persons
with unique characteristics, it is extremely important
that persons with the targeted characteristics be
included in the pilot test sample.
All of the procedures used in the production survey
should be used in the pilot test. This includes modes
of survey administration, respondent rules, interviewer
staffing, and interviewer training.
It is not beneficial to include only the best or most
experienced interviewers in the pilot test. Those interviewers often have enough experience that they know
how to make a problematic question work, but their
solutions sometimes lie outside of standardized interviewing practices and are therefore inconsistent from
case to case. Using some inexperienced and/or low-caliber interviewers allows problematic situations to
arise naturally and be evaluated. If the situation arises
during the pilot test, it is likely it will be encountered
in the production survey. It is better to find out about
the problem during the pilot test, so that the problem
can be addressed prior to production interviewing.
Implementation
When the pilot test is implemented, an estimate of
average interview time for questionnaire completion
can be obtained that has implications on the survey
budget. If the interview time exceeds that which is
allowed or budgeted for, then decisions will need to
be made about reducing the number of the questions
in the survey, reducing the sample size, or changing
Evaluation Methods
The most common form of evaluation of a pilot
test is interviewer debriefings. Interviewer debriefings
usually consist of the pilot test interviewers meeting together with the project manager and going through the questionnaire question by question, identifying the problems they encountered. Feedback on
interviewer training and interviewer procedures is also
solicited, so that revisions can be made prior to the
production survey. Relying solely on interviewers for
feedback about the questionnaire is insufficient to
gain an objective view, however. Some interviewers
may have difficulty separating the problems the
respondents encountered from issues about the questionnaire that the interviewers do not like. And some
interviewers are more vocal than others, and the problems they perceive exist with the questionnaire may
actually be less consequential than problems observed
with different items by a less vocal interviewer.
Interviewer evaluation of pilot tests has expanded
during the past 20 years to include standardized rating
forms. With these forms, interviewers consider their
pilot test experience and provide ratings as to whether,
and the extent to which, the question was problematic.
This allows all interviewers to weigh in equally on
problem identification and provides empirical data for
the researcher to analyze.
During the past 20 years, additional methods have
been used to evaluate pilot tests. These include behavior
coding, respondent debriefing, and vignettes. Behavior
Implications
It is very important for survey organizations to build
pilot testing into their timelines and budgets for each
survey. Pilot testing is frequently the first activity to
get cut when budget and time run tight, yet unknown
problems of the questionnaire or interviewer procedures may lead to increased nonsampling measurement error and subsequently lower-quality survey
data. Incorporating pilot tests into the routine procedures for survey development and planning is vital in
POINT ESTIMATE
Point estimates are single numeric quantities (i.e.,
points) that are computed from sample data for the
purpose of providing some statistical approximation
to population parameters of interest. For example,
suppose surveys were being designed to estimate the
following population quantities: (a) the proportion of
teenagers within a school district who consumed at
least one alcoholic beverage last year, (b) the mean
number of candy bars consumed last week by county
hospital nurses within a state, (c) the total number of
text messages sent by cell phone customers of a particular cell phone provider within the last month, and (d) the
correlation between education and annual expenditures
on magazine subscriptions within the past year for U.S.
citizens. In every case, a single numeric quantity, or
statistic, can be computed from collected sample data
to estimate the population parameters of interest. In
contrast to point estimates, interval estimates are computed using point estimates to provide an estimated
range of values for the parameter.
Point estimates generally have a form that is consistent with the population parameter they are intending to estimate. Table 1 illustrates the computation of a weighted point estimate from a sample of 10 students.
Table 1   Survey weights and weighted values for a sample of 10 students

Student       Value    Survey Weight    Weighted Value
1               3           40               120
2               5           40               200
3               1           40                40
4               6           18               108
5               9           18               162
6               7           18               126
7               8           18               144
8               4           18                72
9               2           15                30
10              2           15                30
Column Sum     47          240             1,032
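One natural point estimate from Table 1, the weighted mean, is the sum of the weighted values divided by the sum of the weights; a short Python check of the arithmetic:

```python
# Values and survey weights from Table 1 (10 students).
values = [3, 5, 1, 6, 9, 7, 8, 4, 2, 2]
weights = [40, 40, 40, 18, 18, 18, 18, 18, 15, 15]

weighted_sum = sum(w * y for w, y in zip(weights, values))  # 1,032
total_weight = sum(weights)                                 # 240
print(weighted_sum / total_weight)                          # 4.3
```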
POLITICAL KNOWLEDGE
Political knowledge has long been considered an integral part of public opinion, as well as a topic of survey research interest in its own right. While many definitions of political knowledge exist, most reflect an understanding of it as factual information about politics and government that individuals retain in their memory.
Political knowledge is similar to, but somewhat narrower than, political awareness, political sophistication, or political expertise. Political knowledge is
an important source of the considerations people draw
upon when asked to respond to an attitude or opinion
question and is thus an integral aspect of many theories
of public opinion formation and change. It also is
a key concept in democratic theory, with scholars and
philosophers since the time of Plato debating whether
the public knows, or could know, enough to play an
appropriate role in the governance of a society.
Political knowledge is typically measured in surveys
with questions asking respondents to recall specific
facts or to recognize names or events. There also are
questions asking respondents to rate their own level of
knowledge, either in general or on specific subjects.
Related to these are questions that ask whether the
respondent has heard or read anything about an event,
person, or issue. Another type of measure is an interviewer rating of the respondent's level of knowledge.
In the absence of these kinds of measures, researchers
sometimes use other surrogates, such as level of formal
education. While such surrogates may be correlated
POLL
Poll is a term commonly used to refer to a public opinion survey in which the main purpose is to collect information concerning people's opinions, preferences, perceptions, attitudes, and evaluations of public opinion issues. The performance of public officials; confidence in public and private institutions; and enacted policies on topics such as poverty, education, taxes, immigration, public safety, the war on terror, same-sex marriage, abortion, or gun control are only some of the topics that polls typically focus upon. Also, the term poll is associated with the measurement of current presidential and congressional vote intention, using either candidates' actual names or generic references such as "Democrat" or "Republican" in order to forecast election results.
A poll is conducted on the basis of sampling principles; that is, a representative group of persons from
a specific population is interviewed in order to generalize results to the whole population. A common practice in a poll is to establish an approximate confidence
interval, also known as sampling error or margin of
error. For instance, a poll may be reported stating that
with 95% confidence the margin of error is plus/
minus 3% around the estimated percentage, given
a sample size of 1,000 cases. This type of statement
means that if the poll were taken repeatedly with different samples of the same size, one might expect that
95% of the time (i.e., 19 times out of 20) the poll estimate would be within the confidence interval.
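The quoted margin of error follows from the usual normal approximation for an estimated proportion; a small Python sketch (the "worst case" p = 0.5 is an illustrative assumption):

```python
import math

def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    # 95% margin of error for an estimated proportion under simple random sampling.
    return z * math.sqrt(p * (1 - p) / n)

# With n = 1,000 and p = 0.5, the margin is about plus or minus 3 points.
print(f"{margin_of_error(0.5, 1000):.3f}")  # ~0.031
```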
A poll may be conducted using self-administered
or interviewer-administered methods or a combination
of both, depending on the survey aims as well as budget constraints. Telephone and face-to-face modes are
the predominant interviewer-administered methods
used for public opinion polling. Mail, computer-, and Web-based methods are self-administered modes that are also used. Such modes may be chosen for financial reasons, because of time constraints, or when public issues are highly controversial or socially (un)desirable. By removing the interviewer's presence, a pollster may elicit more truthful
answers; however, having no live interviewers may
considerably reduce the willingness of a respondent to
participate.
Question wording plays a critical role in a poll;
variations in the question stem may achieve different
results. For instance, a question asking, "Would you vote or not for making marijuana legally available for doctors to prescribe in order to reduce pain and suffering?" may yield very different results than a question worded, "Do you support legalizing marijuana for medical use?" Question order is also an important feature in opinion polling, because respondents tend to
answer polling questions within the context in which
they are asked; hence, the content of preceding questions may affect subsequent answers.
Mostly, polling questionnaires have closed-ended
questions with either ordered response scales or categorical response options. Sometimes polls also include
a few open-ended questions, which usually are followup questions from previous closed-ended questions.
Thus, respondents are provided statements or alternatives among which they can choose, as simple as
a Yes/No format and as extensive as a numeric 11point scale ranging from 0 to 10 and anchored with
Extremely Dissatisfied and Extremely Satisfied.
Also, differences in the way response options are presented may yield different patterns of response even
when the question wording is kept constant. For
instance, providing explicit response option categories
such as Dont Know, No Opinion, or Neutral
may change the overall distribution of the other response options.
The order of response options also matters in opinion polling, because in some interviewing modes respondents tend to select the response option listed either in the first or last position, known as a primacy effect and a recency effect, respectively. Furthermore, other elements
such as field work timing, weighting procedures, and
nonresponse may lead to simultaneous publication of
apparently contradictory poll results. Overall, polls are
POLLSTER
A pollster is a person who measures public attitudes
by conducting opinion polls. Pollsters design, conduct, and analyze surveys to ascertain public views on
POPULATION
All definitions of the population start with a definition
of an individual, the elementary unit about which
inferences are to be drawn. The population is then the
collection or aggregation of the individuals or other
elements about which inferences are to be made.
In statistical usage, a population is any finite or
infinite collection of individual elements. That inferences are to be made to the collection of individuals
POPULATION OF INFERENCE
The population of inference refers to the population
(or universe) to which the results from a sample survey are meant to generalize. Surveys are used to study
POPULATION OF INTEREST
Most scientific research has some specific groups of
interest and attempts to make generalizations about
the characteristics of those groups. This is what is
termed the population of interest. For example, a public health study assesses medical needs among senior
citizens; an educational study examines the relationship between high school students' academic performance and their parents' academic attainment; and a marine biology project attempts to investigate the life cycle of humpback whales. The population of interest in the first study is senior citizens; in the second, high school students; and in the third, humpback whales. The same applies to applied social science
studies that employ surveys.
While closely related to one another, the population
of interest is more loosely defined than the population
of inference and the target population. In fact, the
definition of the population of interest is too loose to
be directly implemented in survey data collection. In
the case of the senior citizen study, who constitutes the
senior citizens? Is there any age criterion? What is the
reference time period for the survey? The age criterion
applied to the scope of senior citizens yields different
sets of people depending on the selected ages and time
POPULATION PARAMETER
Population parameters, also termed population characteristics, are numerical expressions summarizing
various aspects of the entire population. One common
example is the population mean,
$\bar{Y} = \sum_{i=1}^{N} Y_i / N$,

which is estimated from sample data by the sample mean, $\bar{y} = \sum_{j=1}^{n} y_j / n$.
POSITIVITY BIAS
Positivity bias refers to the phenomenon in which members of the public evaluate an individual positively even when they hold negative evaluations of the group to which that individual belongs. It is commonly seen within
the political science literature that examines positive
Examples
Within public opinion research, there are many examples of positivity bias to draw from. For instance, during the Watts riots in Los Angeles in August 1965,
a survey of African Americans revealed that only 2 of
the 13 political leaders assessed received negative evaluations. This was unexpected since the riots were in
response to racially motivated brutality by the police
department. Although evaluations of the police department and city government were low, the political leaders were generally held in relatively high regard.
Another example comes from surveys conducted
in the middle of the Watergate scandal in the 1970s.
Even in the midst of a scandal, only 4 of the 18 leaders assessed received negative evaluations. Once
again, the public generally held the government in relatively low regard, yet 77% of the leaders evaluated
received positive evaluations.
Positivity bias has a long history of scholarly
research in the social sciences. It is still common to
see published research examining these phenomena,
but it is rare to see it explicitly referred to as positivity
bias. This phenomenon has drawn the interest of a host
of scholars seeking to understand why the public
Source
The source of the positivity bias is debatable. Within
psychology, some believe that these biases are a part
of the consistency paradigm, while others suggest it is
more closely associated with the person perception literature. Political scientists tend to think of it in terms
of perceived similarity, such that respondents believe
the individuals they are evaluating are more similar at
the individual level than the group to which the individual belongs.
Research has also attempted to determine whether
the positivity bias is an artifact of the survey instrument. In a battery of experimental tests, research has
shown that this bias is not associated with a respondent's desire to please the interviewer, regardless of the instrument used. Another possible explanation is
the way these evaluations are measured. These types
of questions typically use a Likert-based, bipolar
scale. Research has been conducted to determine if
measurement options explain the positivity bias. Studies have also sought to determine whether the use of the subject's official title influenced a respondent's evaluation. In
the end, results consistently suggest that the survey
instrument is not responsible.
James W. Stoutenborough
See also Approval Rating; Bipolar Scale; Demographic
Measure; Likert Scale; Respondent; Social Desirability
Further Readings
POST-STRATIFICATION
Stratification is a well-known sampling tool built on
the premise that like units in a population should be
treated similarly. It is a statistical fact that grouping
similar units, when sampling, can generally reduce the variance of the survey estimates obtained. Stratification can be done when selecting units for study, or
it can be carried out afterward. The latter application
is usually termed post-stratification.
Illustration
To illustrate the differences between stratification and
post-stratification, assume a researcher is interested in
the total poundage of a population that consisted of
10,000 baby mice and one adult elephant. Suppose,
further, that the average baby mouse weighed 0.2 pounds but the elephant weighed three and one-half tons, or 7,000 pounds. This would mean, if the whole population were to be enumerated, that the researcher would obtain a total of 10,000 × 0.2 + 7,000 = 9,000 pounds. There is, however, a price to pay for not stratifying before selection. To be specific, if two mice are selected, the expected estimate is

(10,001/2) × 0.2 + (10,001/2) × 0.2 = 2,000.2.
The remaining 10,000 samples, with one elephant and one mouse, do even more poorly unaided. For them the researcher will have an unadjusted expected estimate of

(10,001/2) × 0.2 + (10,001/2) × 7,000 = 35,004,500.1.
This second result, however, can be saved by post-stratification in the same way as was previously illustrated for stratification, since each such sample has one mouse and one elephant. The calculations here are

[10,000 / (10,001/2)] × (10,001/2) × 0.2 + [1 / (10,001/2)] × (10,001/2) × 7,000 = 9,000.
In this case, the researcher gets back what the stratified estimator provided, but there is clearly a big risk
that she or he might get a sample that would not be
usable (i.e., that post-stratification cannot save).
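A compact Python rendering of the illustration, under the stated assumption that the simple random sample of two happens to contain one mouse and one elephant:

```python
# Population: 10,000 mice (0.2 lb each) and 1 elephant (7,000 lb).
N = 10_001
sample = [("mouse", 0.2), ("elephant", 7_000.0)]
stratum_sizes = {"mouse": 10_000, "elephant": 1}

# Unadjusted expansion estimate: each sampled unit represents N/n members.
unadjusted = sum((N / len(sample)) * y for _, y in sample)
print(unadjusted)  # 35,004,500.1

# Post-stratification: reweight each unit by its stratum total divided by
# the number of sample units falling in that stratum.
post_stratified = sum(
    stratum_sizes[h] / sum(1 for g, _ in sample if g == h) * y
    for h, y in sample
)
print(post_stratified)  # 9,000.0
```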
Generalizations
Both stratification and post-stratification can be used
with designs that run the gamut from a simple random
sample to a complex multi-stage one. All that is needed
is to have a usable population total Y that can be estimated from the sample, say, by y. For each such pair
POST-SURVEY ADJUSTMENTS
Post-survey adjustments refer to a series of statistical
adjustments applied to survey data prior to data analysis and dissemination. Although no universal definition
exists, post-survey adjustments typically include data
editing, missing data imputation, weighting adjustments, and disclosure limitation procedures.
Data Editing
Data editing may be defined as procedures for detecting and correcting errors in so-called raw survey data.
Data editing may occur during data collection if the
interviewer identifies obvious errors in the survey
responses. As a component of post-survey adjustments, data editing involves more elaborate, systematic, and automated statistical checks performed by
computers. Survey organizations and government statistical agencies may maintain special statistical software to implement data editing procedures.
Data editing begins by specifying a set of editing
rules for a given editing task. An editing program is
then designed and applied to the survey data to identify and correct various errors. First, missing data and
not applicable responses are properly coded, based
on the structure of the survey instrument. Second,
range checks are performed on all relevant variables
to verify that no invalid (out-of-range) responses
are present. Out-of-range data are subject to further
review and possible correction. Third, consistency
checks are done to ensure that the responses to two or
more data items are not in contradiction. In computerized data collection, such as computer-assisted telephone interviewing (CATI), computer-assisted personal interviewing (CAPI), or Web surveys, real-time data editing is typically built into the data collection system so that the validity of the data is evaluated
as the data are collected.
Missing data result from two types of nonresponse: unit nonresponse and item nonresponse. Unit nonresponse occurs
when no data are collected from a sampled unit, while
item nonresponse occurs when no data are obtained for
some items from a responding unit. In general, imputation is employed for item nonresponse, while weighting is reserved for unit nonresponse.
Imputation is the substitution of missing data with
estimated values. Imputation produces a complete,
rectangular data matrix that can support analyses
where missing values might otherwise constrain what
can be done. The statistical goal of imputation is to
reduce the potential bias of survey estimates due to
item nonresponse, which can only be achieved to the
extent that the missing data mechanism is correctly
identified and modeled.
Many imputation techniques are used in practice.
Logical imputation or deductive imputation is used
when a missing response can be logically inferred or
deduced with certainty from other responses provided
by the respondent. Logical imputation is preferred
over other imputation methods because of its deterministic nature. Hot-deck imputation fills in missing
data using responses from other respondents (donor
records) that are considered similar to the respondent with missing data with respect to some characteristics. In cold-deck imputation, the donor may be from
a data source external to the current survey. In regression imputation, a regression model is fitted for the
variable with missing data and then used to predict
the missing responses.
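A toy Python sketch of hot-deck imputation within a single imputation class follows; the class variable and values are invented for illustration:

```python
import random

random.seed(2)

# Fill a missing income using a donor from the same age group.
respondents = [
    {"age_group": "18-34", "income": 28_000},
    {"age_group": "18-34", "income": 35_000},
    {"age_group": "35-54", "income": 61_000},
    {"age_group": "35-54", "income": None},  # item nonresponse
]

for r in respondents:
    if r["income"] is None:
        donors = [d["income"] for d in respondents
                  if d["income"] is not None and d["age_group"] == r["age_group"]]
        r["income"] = random.choice(donors)  # copy a donor record's value
print(respondents)
```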
Both deck-based imputation and regression imputation
could lead to underestimation of the variance, because
the imputed values are either restricted to those that
have been observed or they tend to concentrate at the
center of the distribution. The multiple imputation
method provides a valuable alternative imputation strategy because it supports statistical inferences that reflect
the uncertainty due to imputation. Instead of filling in
a single value for each missing response, the multiple
imputation method replaces each missing value with
a set of plausible values. The multiply imputed data sets
are then analyzed together to account for the additional
variance introduced by imputation.
Weighting adjustments increase the weight of the
respondents to compensate for the nonrespondents.
Weighting adjustments typically begin with the calculation of the base weight to account for the sample
design. The base weight, defined as the reciprocal of
the probability of selection, accounts for the unsampled
PRECISION
Precision in statistical surveys relates to the variation of
a survey estimator for a population parameter that is
attributable to having sampled a portion of the full
population of interest using a specific probability-based
sampling design. It refers to the size of deviations from
a survey estimate (i.e., a survey statistic, such as a mean
or percentage) that occurs over repeated application of
the same sampling procedures using the same sampling
frame and sample size. While precision is a measure of
the variation among survey estimates, over repeated
application of the same sampling procedures, accuracy
is a measure of the difference between the survey
estimate and the true value of a population parameter.
Precision and accuracy measure two dimensions for
a survey estimate. A sampling design can result in
survey estimates with a high level of precision and
accuracy (the ideal). In general, sampling designs are
developed to achieve an acceptable balance of accuracy
and precision.
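Precision measures such as the sampling variance, the standard error, and the relative standard error (discussed next) can be computed directly; a brief Python sketch for an estimated proportion from a simple random sample, with illustrative numbers:

```python
import math

p, n = 0.42, 1_200                       # estimated proportion and sample size
sampling_variance = p * (1 - p) / n      # measure of precision under SRS
standard_error = math.sqrt(sampling_variance)
relative_standard_error = standard_error / p  # SE scaled by the estimate
print(f"SE = {standard_error:.4f}, RSE = {relative_standard_error:.3f}")
```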
The sampling variance is a commonly used measure
of precision. Precision can also be represented by the
standard error (the square root of the sampling variance
for a survey estimate), the relative standard error
(the standard error scaled by the survey estimate), the
PRECISION JOURNALISM
Precision journalism is a term that links the application of social science research methods (including
PRECODED QUESTION
Precoded questions refer to survey items for which
response categories may be identified and defined
exhaustively, or very nearly so, prior to data collection activities. Precoding questions involves specifying the coding frame (i.e., the set of possible answers)
and associating each response category within the
frame with a value label (which is typically, but not
necessarily, numeric).
The term precoded also refers to a type of question that is asked by interviewers as though it were an open-ended question but that has precoded responses against which interviewers are to match (code) respondents' answers, rather than copying down the verbatim response given by the respondent. This also is referred to as field coding.
The use of precoded questions delivers two important benefits to the survey researcher. First, their use
minimizes the time to prepare the answers for statistical analysis following the completion of data collection activities. Second, because the data are coded as
Value Labels
Value labels for precoded questions are typically
assigned in a manner that coincides with the measurement level (nominal, ordinal, interval, or ratio)
implied by the item in order to aid interpretive analysis of the survey data. For example, value labels
assigned to response possibilities that correspond to
interval or ratio level measures typically are numerical, with the set of numbered values chosen to reflect
the ordered and evenly spaced characteristics assumed
by these measurement levels. When ordinal measures
are involved, numerals are typically used for the value
label codes, and the number values chosen appear in
a consecutive sequence that is directionally consistent
with the ordinal character of the measures response
categories. (Although the use of alphabetical characters would be equally effective in communicating the
directional sequence of the response categories, use of
numerals is almost always preferred because manipulation of alphabetical or string value labels by statistical analysis software applications typically is far
more cumbersome than when numeric value labels
are used.) In contrast, code value labels for items
featuring nominal levels of measurement may be
assigned in an arbitrary manner, as they bear no
meaning or relationship to the response categories
themselves. Therefore, while sequenced numerals or
letters may be used for these value labels, these often
are assigned in an order corresponding to the sequence
in which the response choices are documented in the
research instrumentation.
These guidelines are reflected in the self-administered questionnaire examples in Figure 1, shown with
the code value labels represented by the small numerals near the check boxes corresponding to the response
choices offered.
It is worth observing that, since the value labels for
a nominal measure item are arbitrary and meaningless, it often is functional to use alphabetical value
labels to aid the analyst's memory in associating the
value labels with the code definitions themselves. In
the nominal measurement item shown in Figure 1, for
example, the letter O (for Own) might have been chosen for the value label instead of 1, R (for Rent) might
have been assigned instead of 2, and A (for Another)
might have been used instead of 3.
Figure 1   Types of precoded questions by level of measurement (interval-level, ordinal-level, and nominal-level). The nominal-level example reads: "Do you own your home, rent your home, or do you have some other living arrangement?" with coded response choices Own home (1), Rent home (2), and Another living arrangement (3).
Jonathan E. Brill
See also Closed-Ended Question; Coder Variance; Coding;
Field Coding; Interval Measure; Level of Measurement;
Nominal Measure; Open-Ended Question; Ordinal
Measure; Ratio Measure; Verbatim Responses
PREDICTIVE DIALING
Predictive dialing is a telephone call placement technology that is used in survey research to improve utilization of interviewer time during computer-assisted
telephone interviewing (CATI) surveys. In random-digit dialing (RDD) surveys, typically fewer than 15% of dialings are answered by a human; for the
Predictive Versus
Nonpredictive Dialing
From an operational standpoint, what distinguishes
predictive dialing from nonpredictive dialing is the
elimination of the 1:1 interviewer-to-call ratio. If one
call is placed for one available interviewer, then the
interviewer will be idle while the call is placed and
the signal returned, which could be for 30 seconds
or more. Autodialing (mechanized dialing while still
using a 1:1 ratio), along with automatic signal recognition for engaged (busy), fax, disconnected, and
unanswered lines, will partially reduce nonproductive
time and increase interviewers talk time. Predictive
dialers include these autodialing technologies but also
allow for the probability that not all call results will
require interviewer intervention, so typically there are
a great many more calls being placed than the number
of interviewers available to take them, with the dialer
handling the calls that do not require interviewer
involvement.
Predictive dialing algorithms utilize time-series or
queuing methods, and the variables utilized vary by
manufacturer: for example, interviewers in the queue,
those nearing the end of a call, connect rates, average
call length, and so on. Ideally, all predictive dialing
systems strive to make the number of connects equal
to the number of interviewers available. Variation
from this ideal has two possible results: first, there are
more connects than interviewers, leading to either
abandoned calls or dead air, and ultimately alienated respondents. Second, there are too few connections, which results in a decrease in a call center's
Telemarketing Versus
Research Applications
Predictive dialing originated in the telemarketing
industry and is still extensively used there; however,
most predictive dialers used by survey research companies operate quite differently than their telemarketing counterparts. Specifically, the algorithms used to
predict how many telephone numbers to dial do not
utilize interviewer wait time as a criterion but instead
use the quantity of numbers that need to be dialed to
yield connects for already waiting interviewers. This
difference is significant both in terms of operation and
the designed intention to minimize abandoned calls.
Obviously, both algorithms result in abandoned calls,
but the research approach is much more conservative.
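A minimal Python sketch of the research-style rule just described, dialing only enough numbers to serve interviewers who are already waiting (the function name and input values are illustrative assumptions):

```python
import math

def numbers_to_dial(waiting_interviewers: int, connect_rate: float) -> int:
    # Quantity of numbers needed to yield one connect per waiting interviewer.
    return math.ceil(waiting_interviewers / connect_rate)

# E.g., 4 waiting interviewers at a 15% connect rate -> dial 27 numbers.
print(numbers_to_dial(4, 0.15))
```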
Another significant difference between telemarketing applications and research applications is the "dead air" that respondents often hear after answering their phone to a telemarketing call. This is the delay between the moment a potential respondent answers the telephone and says "Hello" and the time an interviewer is routed the call and can reply to the respondent, the reply being the signal that a human is actually on the other end of the line. This delay can arise from
several sources:
1. The switching process. If the dialer sends a case
to an interviewer before the dial is made, there is in
effect a complete circuit between the interviewer and
PRE-ELECTION POLLS
One of the most common and visible applications
of survey research is pre-election polling. These polls
typically include one or more trial heat questions,
which ask respondents how they will vote in the
upcoming election, along with other measures of
voter knowledge, attitudes, and likely voting behavior.
Unlike most polls, the validity of pre-election polls
can be assessed by comparing them with actual election outcomes.
Numerous organizations in the United States and
around the world conduct pre-election polls, and the
number and frequency of such polls has been growing.
Many pre-election polls are public, conducted by news
organizations, academic institutions, and nonprofit
organizations. Many others are privately conducted by
partisan party organizations and campaigns to assist in
candidates message development, resource allocation,
and overall strategy. The results of most of these private polls are not made public.
Accuracy
Pre-election polls in the United States have a generally
good record of accurately forecasting the outcome of
elections. According to the National Council on Public
Polls (NCPP), the average candidate error for national
polls in the 2004 presidential election was 0.9 percentage point; in 2000, the average error was 1.1 percentage points. Polls in state-level races were also
very accurate. The average candidate error in 2004
History
Election polling has a long history that precedes the
development of modern probability sampling. Straw
polls of the 19th century were a popular means by
which public opinion in elections was gauged, and an
early 20th-century magazine, the Literary Digest, conducted a very large and widely followed straw poll in several presidential elections through 1936. The Literary Digest poll's spectacular failure in the 1936 presidential election, in which it incorrectly forecast the defeat of President Franklin Roosevelt by the Republican candidate, Alf Landon, by a large margin, discredited straw polls. Although the poll drew more than 2,000,000 respondents, the self-selected nature of the mail survey respondents and the fact that the sampling frame was biased toward more affluent voters led to a gross overrepresentation of Republican voters in the Literary Digest sample. A Gallup poll based on a sample that
more closely conformed to the principles of probability
sampling was quite accurate in the 1936 election and
helped to affirm the legitimacy of modern polling
methods.
Purposes
Conducting pre-election polls entails many decisions,
each of which can affect the accuracy of the poll:
(a) the timing of the poll, (b) the sampling method,
(c) the determination of likely voters, (d) the trial heat
question employed, and (e) the choice of other measures to be included in the poll. Pre-election polls are
conducted throughout the election cycle but are considered appropriate for forecasting the election outcome only if taken very close to Election Day. But
pre-election polls are conducted for many purposes
other than forecasting. Some are conducted at the
beginning of a campaign to provide information about
the public interest in the election and awareness of the candidates.
PREFIX
Understanding how telephone numbers are assigned is very important when designing or implementing a telephone sample. Telephone numbers in the
United States, Canada, and the Caribbean consist of
10 digits divided into three components. The first three
digits are the area code or Numbering Plan Area
(NPA). Area codes usually cover large geographic
areas. The next three digits represent smaller geographic areas within an area code and are referred to as
the prefix (NXX). The term prefix is also used to refer
to the first three digits of the seven-digit local numbers.
When the NXX is combined with the final four digits
of a phone number, the result is a unique seven-digit
local number that is associated with a unique end
user within an NXX. In most areas of the United
States, local seven-digit dialing of landlines is the
norm. Within these areas, the area code component of
a 10-digit number need not be dialed, and calls are
switched using only the prefix. Although area codes
are unique, prefixes are unique only within an area
code. Thus, in making a long-distance telephone call or
a call to a different area code, the area code is required
in order to define a unique prefix. Not all prefixes are
available for assignment of local numbers. Some are
reserved for other uses, such as directory assistance
(555) and special services such as 911 and 411.
Although the terms prefix, wire center, rate center,
central office, and exchange are frequently used interchangeably, there are distinctions. The wire center, rate
center, or central office is the building containing the
telephone equipment and switches where individual
telephone lines are connected and through which calls
to and from individual local numbers are routed. The
term exchange usually refers to the geographic area
served by a particular rate center or wire center. Thus
an exchange can be serviced by multiple prefixes
(NXX codes). In some areas of the United States, these
prefixes might belong to different area codes.
Each prefix contains 10,000 possible local numbers (0000 through 9999). Within prefixes, local telephone numbers are typically assigned in consecutive blocks or banks.
PRE-PRIMARY POLLS
Pre-primary polls are those conducted before primary
elections in an attempt to measure voters' preferences.
reported in these early primary states. Pollsters compete with campaigns for the time and attention of
voters, who by some accounts are driven crazy by telephone calls and emails. The increasing cacophony of
polling and electioneering in early presidential primary
states presents new problems for pre-primary polling.
Trevor N. Tompson
See also Election Polls; Horse Race Journalism; Leaning
Voter; Likely Voter; Media Polls; Pre-Election Polls
PRIMACY EFFECT
The primacy effect is one aspect of a well-known phenomenon called the serial position effect, which
occurs when one is asked to recall information from
memory. The other aspect of the serial position effect
is the recency effect. Psychologists discovered these
effects more than a century ago; for example, in the
1890s, Mary Whiton Calkins experimented with
these effects while she was a student of William
James. Essentially, the serial position effect means
that the recall of a list of items is easiest for a few
items at the end of the list and for a few items at the
beginning of the list. The recall of items in the middle
of the list is generally poor. The primacy effect refers
to the recall of items at the beginning of the list, while
the recency effect refers to the recall of items at the
end of the list. If one graphed the number of recalled
items as a function of position in the list, one would
obtain a U-shaped function.
Suppose that an experiment were performed in
which the following list of 24 words was read aloud at
the rate of one word per second to a group of persons:
apple, basketball, cat, couch, potato, book, bus,
lamp, pencil, glasses, guitar, truck, photo, rose,
apartment, movie, clock, car, dog, bowl, shoe, bicycle, plane, university
PRIMARY SAMPLING UNIT (PSU)

In multi-stage sampling, selecting aggregates within previously selected aggregates may proceed through several levels of a hierarchy until, at the last stage, the individual element
(e.g., households) is selected. The selected elements
are chosen only within already chosen aggregates
within the hierarchy.
Primary sampling units are not necessarily formed
from a list of elements or individuals. They may be
conceptual representations of groups for which lists
could be obtained, one from each primary sampling
unit. For example, suppose a sample of adults living
in a large country with widespread population is to be
selected, and no list of adults of sufficient quality
(e.g., good coverage, covering nearly all adults) is
available. Aggregation of adults on an existing list
cannot take place. However, units can be formed that are, for all intents and purposes, aggregates of adults within them. A
common procedure, area sampling, uses geographic
areas at successive levels of a geographic hierarchy as
sampling units, with implicit aggregation of adults (or
other elements) who usually reside within them.
For example, in the United States, a common area
primary sampling unit is the county. Within counties,
geographic areas such as towns or townships or other
administrative units defined by geographic borders
can be identified. Within towns or other administrative units, the geography may be further divided into
smaller units, such as city blocks or enumeration areas
with boundaries formed by streets or highways, rivers,
and other relatively permanent, readily identified features. These blocks or enumeration areas are created
by a government statistical agency for the purpose
of providing census counts of various types of units
within them.
The geographic hierarchy from county to town to
block may be used as successive sampling units in a survey aimed at ultimately selecting elements that can be
associated uniquely with the final stage area unit.
In the first stage of selection to obtain a sample of
adults, a sample of counties is selected. The counties
from which sample counties are selected are primary
sampling units. The selected counties are sometimes
called primary selections.
One could, in principle, create a list of all adults in
selected counties and select adults who usually reside
in the selected counties in a second and final stage of
selection. However, the cost of creating lists of adults
at the county level would be prohibitive.
Additional area units, blocks, become sampling units within selected counties and are second-stage sampling units.
PRIMING
Priming is a psychological process in which exposure
to a stimulus activates a concept in memory that is
then given increased weight in subsequent judgment
tasks. Priming works by making the activated concept
accessible so that it can be readily used in evaluating
related objects. For example, hearing news about the
economy may prime individuals to focus on economic
considerations when assessing a president's performance because economic concepts are activated,
accessible, and presumably relevant for this type of
evaluation. In this way, priming affects the opinions
that individuals express, not by changing their attitudes, but by causing them to alter the criteria they
use to evaluate the object in question.
Priming is a widely used concept with applications
in the fields of psychology, political science, and
communication. It is also relevant to survey researchers in that priming can inadvertently occur within
questionnaires and interviewing, and surveys can be
used to study the priming process.
Survey researchers recognize that their instruments
may be susceptible to producing unintended priming
effects that could bias key measurements. Inadvertent
PRIOR RESTRAINT
Prior restraint refers to a legal principle embodied in the
First Amendment to the U.S. Constitution that relates to
guarantees of freedom of the press. At the most fundamental level, it provides protection against censorship
by the government, and it is particularly relevant to survey research because of legal disputes about the presentation of exit poll results on Election Night.
Prior restraint actually refers to any injunction that
would prohibit the publication of information, an
infringement on the so-called freedom of the press.
Many consider it an especially serious issue because it
prevents the dissemination of information to the public,
as distinct from an injunction issued after the information has been released that prohibits further dissemination or provides for some kind of relief as in the
instance of libel, slander, or defamation of character.
The case law history of the principle is generally
credited as starting with Near v. Minnesota, in which
the U.S. Supreme Court decided in 1931 that a small
Minnesota paper could not be prevented in advance
from publishing information about elected officials. In
this 5-to-4 decision, the Justices left open the possibility
of prior restraint in some cases, especially those
involving national security. Censorship was practiced
during World War II, but the issue of national security
was not addressed explicitly until The New York Times v. United States, the 1971 Pentagon Papers case.
PRIVACY
Within the realm of survey research, privacy consists of the right to control access to one's self and one's personal information. Private behavior occurs in a context in which an individual can reasonably expect that no observation or recording is taking place. Privacy is distinct from confidentiality in that privacy refers to the protection of the rights of individuals, whereas confidentiality refers to the protection of the data collected. Identity, health and financial information, criminal justice involvement and court records, education, and work performance data are commonly regarded as private, despite the fact that many are readily accessible through credit-reporting and background-check agencies. The distinction between public and private behaviors is often ambiguous. Some information that becomes part of the public record, such as a person's
PRIVACY MANAGER
The privacy manager sample disposition is specific
to telephone surveys. Privacy manager services are
telecommunications technologies that are available to
households and businesses as an optional subscription
from most local U.S. telephone companies. Many privacy manager services work with caller identification
(caller ID) services to identify and block incoming
calls that do not display a telephone number that the
household has identified as a known number. Because
very few households or businesses will have the name
or telephone number of a telephone survey interviewing organization as a known number, privacy manager technologies make it significantly more difficult
for telephone interviewers to contact households that
subscribe to these services.
In addition to blocking calls from unknown
numbers, most privacy manager services also block
calls whose identification is displayed as anonymous,
unavailable, out of area, or private. Callers with blocked
numbers usually have the option to temporarily unblock their numbers.
PROBABILITY
In general, probability is a numerical representation of how likely the occurrence of certain observations is.

Probability models are also used to model the effects of measurement errors and, more generally, to model the effects of unmeasured or unknown factors.
Theory of Probability
In a precise theory, the events associated with the
observation of an experiment possess probabilities. To
define the properties of probability, one has to define
certain operations on these events. The product of two
events A and B occurs if both A and B occur and is
denoted by AB. For example, one event may be, when
rolling a die, having a value less than 4; another event
may be having an even value. The product of these is
having a value that is less than 4 and is even, that is, 2.
Or, an event may be hitting the right-hand side of a circular target, another may be hitting its upper half, and
the product is hitting the upper right-hand-side quarter.
If two events cannot occur at the same time, their product is the impossible event, denoted ∅. Another operation is the
sum of two events, denoted as A + B. The sum occurs
if at least one of the two original events occurs. The
sum of having a value less than 4 and of having an
even value is having anything from among 1, 2, 3, 4,
and 6. The sum of the right-hand-side of the target and
of its upper half is three quarters of the target, obtained
by omitting the lower left-hand-side quarter.
There are three fundamental rules governing
probabilities:
1. The probability of an event is a number between 0 and 1:

   0 ≤ P(A) ≤ 1.

2. If an event always occurs (surely), its probability is 1:

   P(S) = 1.

3. If two events are such that they cannot occur at the same time, then the probability of their sum is the sum of their probabilities:

   P(A + B) = P(A) + P(B), if AB = ∅.
The last property may be extended to cover the sum of infinitely many events. To see why this may be necessary, think of how many times a die needs to be rolled to obtain the first 6. Let A1 be the event that the first 6 occurs on the first roll, A2 that it occurs on the second roll, and so on. One cannot say that one of A1 through A10, or even one of A1 through A1000, must occur. The event that one eventually rolls a 6 is therefore the sum of infinitely many events.
Independence
The events A and B are independent if

   P(AB) = P(A)P(B).

In the die-rolling example, the events of having an even number and of having a value less than 5 are independent, because the former (2, 4, 6) has probability 1/2 and the latter (1, 2, 3, 4) has probability 2/3, while their product (2, 4) has probability 1/3, that is, the product of 1/2 and 2/3. On the other hand, the events of having an even number and of having a value less than 4 are not independent, because the former has probability 1/2, the latter has probability 1/2, and their product has probability 1/6, not 1/4. The interpretation of this fact may be that among the first three numbers, even numbers are less likely than among the first six numbers. Indeed, among the first six numbers, three are even; among the first three, only one is even.
Conditional Probability

The conditional probability shows how the probability of an event changes if one knows that another event occurred. For example, when rolling the die, the probability of having an even number is 1/2, because one half of all possible outcomes are even. If, however, one knows that the event of having a value less than 4 has occurred, then, knowing this, the probability of having an even number is different, and this new probability is called the conditional probability of having an even number, given that the number is less than 4. Because there is only one even number out of the first three, this conditional probability is 1/3.

The conditional probability of A given B is denoted by P(A | B) and is precisely defined as

   P(A | B) = P(AB) / P(B),

that is, the probability of the product of the events, divided by the probability of the condition. In the example above, P(even | less than 4) = (1/6) / (1/2) = 1/3.

Applications

The foregoing simple examples are not meant to suggest that formulating and applying probabilistic models is always straightforward. Sometimes even relatively simple models lead to technically complex analyses. Further, the inference based on a probabilistic model is relevant only if, in addition to the correct analysis of the model, it is also based on a model that appropriately describes the relevant aspects of reality. The results will largely depend on the choice of the model.

Tamas Rudas
PROBABILITY OF SELECTION
In survey sampling, the term probability of selection
refers to the chance (i.e., the probability from 0 to 1)
that a member (element) of a population can be chosen for a given survey. When a researcher is using a probability sample, the term also means that every member of the sampling frame that is used to represent the population has a known nonzero chance of being selected. That chance can be calculated as a member's probability of being chosen out of all the members in the population. For example, a chance of 1 out of 1,000 is a probability of 0.001 (1/1,000 = 0.001). Since every member in a probability sample has some chance of being selected, the calculated probability is always greater than zero.
Because every member has a known chance of being
selected, it is possible to compute representative unbiased estimates of whatever a researcher is measuring
with the sample. Researchers are able to assume with
some degree of confidence that whatever they are estimating represents that same parameter in the larger
population from which they drew the sample. For
nonprobability samples (such as quota samples, intercept samples, snowball samples, or convenience samples), it is not feasible to confidently assess the
reliability of survey estimates, since the selection
probability of the sample members is unknown.
In order to select a sample, researchers generally
start with a list of elements, such as addresses or telephone numbers. This defined list is called the sampling frame. It is created in advance as a means to
select the sample to be used in the survey. The goal
in building the sampling frame is to have it be as
inclusive as possible of the larger (target) population
that it covers. As a practical reality, sample frames
can suffer from some degree of undercoverage and
may be plagued with duplication. Undercoverage
leads to possible coverage error, whereas duplication
leads to unequal probabilities of selection because
some elements have more than one chance of being
selected. Minimizing and even eliminating duplication
may be possible, but undercoverage may not be a solvable problem, in part because of the cost of the potential solution(s).
In designing a method for sampling, the selection
probability does not necessarily have to be the same
(i.e., equal) for each element of the sample as it would
be in a simple random sample. Some survey designs
purposely oversample members from certain subclasses
of the population to have enough cases to compute
more reliable estimates for those subclasses. In this
case, the subclass members have higher selection
probabilities by design; however, what is necessary in such cases is to weight the data so as to compensate for the unequal probabilities of selection.
PROBABILITY PROPORTIONAL TO SIZE (PPS) SAMPLING
Probability proportional to size (PPS) sampling
includes a number of sample selection methods in
which the probability of selection for a sampling unit
is directly proportional to a size measure, X_i, which is known for all sampling units and thought to be approximately proportional to the unknown Y_i. The X_i must all be strictly greater than 0. In single-stage sampling it
can be used to reduce the variance of survey estimates.
If the observed values, Y_i, are exactly proportional to X_i, the variance of an estimated total will be exactly 0. When the Y_i are approximately proportional to the X_i,
the variance can still be greatly reduced relative to
equal probability sampling schemes. PPS sampling is
also used in the early stages of multi-stage samples to
achieve equal probability samples of the final-stage
sampling units or EPSEM samples.
In all cases, suppose one is estimating a population total for a variable Y_i over N units:

   Y = Σ_{i=1}^{N} Y_i.

For PPS with-replacement sampling, the usual (Hansen-Hurwitz) estimator is

   Ŷ_wr = (1/n) Σ_{i=1}^{n} y_i / p_i,

where the single-draw selection probability is

   p_i = X_i / Σ_{i=1}^{N} X_i.
When a unit's size measure is so large that np_i would exceed 1, one can select unit i with probability 1 and select a PPS without-replacement sample of size n - 1 from the remaining elements.
The commonly used estimator associated with PPS without-replacement sampling is the Horvitz-Thompson estimator,

   Ŷ_wor = Σ_{i=1}^{n} y_i / p_i,

where p_i is now the probability that unit i is included in the sample.
i=1
In order to compute the variance of the HorvitzThompson estimator, it is also necessary to know the
pairwise selection probabilities. These probabilities,
denoted by pij , are the probabilities that population
units i and j both get selected in the same sample.
Similarly to the formal definition for the unit probabilities, the pairwise selection probabilities are defined
in terms of the sum over a subset of sample selection
probabilities; in this case, the summation is limited to
samples that contain both unit i and unit j:
X
Ps :
pij =
s i,j
   V(Ŷ_wor) = (1/2) Σ_{i=1}^{N} Σ_{j≠i} (p_i p_j - p_ij) (y_i/p_i - y_j/p_j)².

This form of the variance is defined only for samples of size 2 or greater; samples of size 1 meet the PPS with-replacement definition.
The variance estimator has a similar form:

   v(Ŷ_wor) = (1/2) Σ_{i=1}^{n} Σ_{j≠i} ((p_i p_j - p_ij) / p_ij) (y_i/p_i - y_j/p_j)².
   K = [2 + Σ_{i=1}^{n} 1/(1 - p_i)]^(-1).
PPS Systematic
PPS systematic sampling provides a relatively simple
method for selecting larger PPS samples but does not
provide for unbiased variance estimation since most
pairwise probabilities will be 0. For larger samples,
the ordering of the frame before selecting the sample
is often desirable, since it imposes an implicit stratification along the ordering dimension. For example,
some socioeconomic balance can be achieved in
a sample by ordering counties or other geographic
units based on percentage above or below a specified
income level in the most recent decennial census.
Approximate variance formulas that recognize the
implicit stratification are often used.
PPS systematic sample selection is relatively simple. Note that

   p_i = n X_i / Σ_{i=1}^{N} X_i

and the p_i sum to n. To select a PPS systematic sample from an ordered list, select a uniform (0, 1] random number, R. Then form the partial sums of the ordered list,

   S_i = Σ_{j=1}^{i} p_j.

Select the n units i that satisfy

   S_{i-1} < R + (k - 1) ≤ S_i   for k = 1, 2, . . . , n.
Of note, PPS systematic sampling can be viewed
as a sequential sampling method.
James R. Chromy
See also EPSEM Sample; Equal Probability of Selection; Multi-Stage Sample; Probability Sample; Sampling Without Replacement
PROBABILITY SAMPLE
In gathering data about a group of individuals or items,
rather than conducting a full census, very often a sample is taken from a larger population in order to save
time and resources. These samples can be classified
into two major groups describing the way in which
they were chosen: probability samples and nonprobability samples. Both types of samples are made up of
a basic unit called an individual, observation, or elementary unit. These are the units whose characteristics
are to be measured from a population. In probability
samples, each member of the population has a known
nonzero probability of being chosen into the sample.
By a random process, elements are selected and receive
a known probability of being included in the sample;
this is not the case in nonprobability sampling.
In order to estimate some quantity of interest with
a desired precision from a sample, or to contrast characteristics between groups from a sample, one must
rely on knowing to whom or to what population one
is referring. Well-designed probability samples ensure
that the reference population is known and that selection bias is minimized. The best samples are simply
smaller versions of the larger population. The process
by which a sample of individuals or items is identified
will affect the reliability, validity, and ultimately the
accuracy of the estimates and inferences made.
Underlying Concepts
The concepts behind probability sampling underlie
statistical theory. From the finite population of N units, a sample of n units is selected at random.

Telephone numbers can be grouped into 100-blocks: sets of numbers with the same first two digits of the four-digit suffix. For example, 555-444-12XX would be considered one 100-block listed on the
sampling frame. Once a household is found in an
area code + exchange + block, all of the numbers in
the particular block are called, dramatically reducing
the number of nonhousehold calls, as is done in the
Mitofsky-Waksberg approach to RDD sampling.
Importance of Well-Designed
Probability Surveys
The elements of design, including stratification, clustering, and statistical weights, should not be ignored in
analyzing the results of a probability sample survey.
Ignoring the sampling design may lead to underestimating variability, thereby affecting inferences. Software has
advanced greatly in recent years and has become more
accessible to researchers wishing to estimate population characteristics and their associated standard errors
using sample survey data. This should encourage
researchers to carry out well-designed probability surveys to attain their study objectives and to use appropriate methods to analyze their data.
Erinn M. Hade and Stanley Lemeshow
See also Census; Cluster Sample; Coverage Error;
Elements; Finite Population; Mitofsky-Waksberg
Sampling; Multi-Stage Sample; n; N; Nonprobability
Sample; Probability Proportional to Size (PPS)
Sampling; Purposive Sample; Quota Sampling; Random;
Random-Digit Dialing (RDD); Random Sampling;
Replacement; Sampling Frame; Simple Random Sample;
Snowball Sampling; Strata; Stratified Sampling;
Systematic Sampling; Target Population; Unbiased
Statistic; Unit of Observation; Validity; Variance
PROBABLE ELECTORATE
The probable electorate is defined as those citizens
who are registered to vote and who very likely will vote in the upcoming election.
PROBING
Probing involves the use of specific words or other
interviewing techniques by an interviewer to clarify or
to seek elaboration of a person's response to a survey
question. When respondents provide incomplete or
irrelevant answers to survey questions, it becomes necessary for interviewers to query respondents further in
an effort to obtain a more complete or specific answer
to a given question. Although survey researchers may
take great care in constructing questionnaires in terms
of the wording of both questions and response options,
some respondents may not provide a response in the
format pre-specified by the researcher, especially when
answering closed-ended questions. Likewise, respondents may offer vague or overly simplistic replies to
open-ended questions, that is, questions with no predetermined response categories. Additionally, respondents may simply provide the response "I don't know," which the researcher may or may not want to accept as a valid response.
The following are examples of responses requiring
probing:
Interviewer question: "In the past 12 months, has a doctor, nurse, or other health professional given you advice about your weight?" [The valid response options are "Yes" or "No."]

Irrelevant respondent answer: "My husband is on a diet."

Unclear respondent answer: "People are always telling me I need to gain some weight."
1. Repeating the question. This technique is used with the expectation that after hearing the survey question a second time, the respondent will understand what information the question is intended to collect.
2. Silent probe. Interviewers may also use a silent
probe, which is a pause or hesitation intended to indicate to a respondent that the interviewer may be waiting
for additional information or clarification on a response.
Oftentimes, interviewers will utilize this technique
during later stages of an interview, once a respondent's
habits or response patterns have become more obvious.
3. Neutral question or statement. When a respondent offers an inadequate response, interviewers use neutral questions or statements to encourage the respondent to elaborate on the initial response. Examples of good neutral probes are "What do you mean?" "How do you mean?" "Please tell me what you have in mind," and "Please tell me more about . . ." Note that these probes maintain neutrality and do not lead the respondent by asking things such as "Do you mean you don't support the bill?"
4. Clarification probe. When a response to an item is unclear, ambiguous, or contradictory, interviewers will use clarification probes. Examples of good clarification probes are "Can you give me an example?" or "Could you be more specific?" Although these probes are helpful, interviewers must be careful not to appear to challenge the respondent when clarifying a statement.
Interviewers must know when to draw the line between
probing a respondent for more or better information
and making the respondent feel pressured to answer an
item, which could lead to an outright refusal to continue participation in the rest of the survey.
Most interviewers will agree that "I don't know" is the response requiring probing that occurs most frequently. Because a "don't know" response is vague and can mean any number of things,
interviewers are often challenged with the need to
choose the correct probe. Interviewers are also challenged not to cross the line between probing and pressuring in this situation. Interviewers are trained to remind respondents that participating in a survey is not a test and that there are no right and wrong answers. A good practice for interviewers is encouraging a respondent who provides a "don't know" response to an item to give his or her best estimate, or best guess. Encouraging the respondent to answer
PROPENSITY SCORES
Propensity scoring was developed as a statistical technique for adjusting for selection bias in causal estimates of treatment effects in observational studies.
Unlike randomized experiments, in observational studies researchers have no control over treatment assignment, and, as a result, individuals who receive different
treatments may be very different in terms of their
observed covariates. These differences, if left unadjusted, can lead to biased estimates of treatment effects.
For example, if smokers tend to be older than nonsmokers, then comparisons of smokers and nonsmokers
will be confounded with age. Propensity scores can be
used to adjust for this observed selection bias.
Survey researchers have used propensity scores to
adjust for nonresponse bias, which arises when respondents and nonrespondents differ systematically in terms
of observed covariates, and to adjust for selection (coverage) bias, which arises when some of the population
is systematically omitted from the sample.
Propensity scores can also be used directly in nonresponse weighting adjustments. To adjust for nonresponse, survey weights can be multiplied by the
inverse of the estimated propensity score for that
respondent. This avoids the assumptions involved with
grouping together respondents with similar but not
identical response propensity scores into weighting
classes. However, this method relies more heavily on
the correct specification of the propensity score model,
since the estimated propensities are used directly in the
nonresponse adjustment and not just to form weighting
cells; and this method can produce estimators with very
large variances, since respondents with very small estimated propensity scores receive very large weights.
Propensity scores can also be used for imputation for nonresponse through propensity-matching
techniques. The idea here is similar to that of nearest neighbor imputation or predictive-mean-matching
imputation. Nonrespondents are matched to respondents with similar propensity scores, and then these
respondents serve as donors for the nonrespondents' missing information. This can be done using
single or multiple imputation. For single imputation,
the respondent who is closest in terms of estimated
propensity score can be chosen as the donor or can be
selected at random from a pool of donors who are all
relatively close. For multiple imputation, a pool of
potential donors can be created by specifying tolerances for the difference in propensity scores between
the nonrespondent and the respondents, or by grouping
together the nearest k respondents, or by stratifying
on the estimated propensity scores. The approximate
Bayesian bootstrap can then be used to create proper
multiple imputations that correctly approximate the
uncertainty in the imputed values. To do this, a bootstrap sample of potential donors is selected at random
with replacement from the available pool of potential
donors for each nonrespondent. Imputation donors for
each nonrespondent are then selected at random with
replacement from this bootstrap sample. This process
is repeated to create multiple imputations for each
nonrespondent.
History
Propensity scoring has traditionally been applied in biostatistics to estimate causal effects. Harris Interactive, a New York-based Web survey business, first applied propensity scoring to correct for selection bias in Web surveys. Harris Interactive uses special Webographic questions as propensity variables. Webographic questions, also called lifestyle, attitudinal, or psychographic questions, are meant to capture the difference between the online and the offline population.
can be used to conduct statistical tests. Another popular method is to stratify the propensity scores into
quintiles and to use the resulting five-level categorical variable in a post-stratification step.
Limitations
First, it is not possible to adjust for unbalanced,
unobserved variables that correlate with outcomes
unless the unobserved variable strongly correlates
with observed variables. Second, to calibrate the
Web survey, the propensity scoring approach requires
a reference sample. For example, Harris Interactive
conducts regular RDD phone surveys for that purpose. This requirement currently limits the appeal of
this method and makes it most useful for panel surveys. Propensity scoring attempts to achieve balance.
That is, after the propensity weighting adjustment,
the distribution of the propensity variables should be
the same for both the Web sample and the reference
sample. The traditional goal of finding a logistic regression model that fits the data well is therefore not necessarily useful. Instead, the researcher should verify
whether balance was achieved. Preliminary research
seems to indicate that the propensity adjustment
reduces the bias considerably but does not remove it
altogether for all outcome variables. One direction
for future research is to find out which set of propensity variables works best for which set of outcomes.
Of additional note, propensity scoring has also been
applied to adjust for nonresponse bias when data are
available for both nonrespondents and respondents of
a survey.
Matthias Schonlau
Further Readings
Schonlau, M., Van Soest, A., Kapteyn, A., Couper, M., & Winter, J. (in press). Selection bias in Web surveys and the use of propensity scores. Sociological Methods and Research.
Schonlau, M., Zapert, K., Payne Simon, L., Sanstad, K., Marcus, S., Adams, J., et al. (2004). A comparison between a propensity weighted Web survey and an identical RDD survey. Social Science Computer Review, 22(1), 128–138.
PROPORTIONAL ALLOCATION TO STRATA
Proportional allocation is a procedure for dividing
a sample among the strata in a stratified sample survey.
A sample survey collects data from a population in
order to estimate population characteristics. A stratified
sample selects separate samples from subgroups of the
population, which are called strata and can often
increase the accuracy of survey results. In order to
implement stratified sampling, it is necessary to be able
to divide the population at least implicitly into strata
before sampling. Given a budget that allows gathering
data on n subjects or a budget amount $B, there is
a need to decide how to allocate the resources for data
gathering to the strata. Three factors typically affect
the distribution of resources to the strata: (1) the population size, (2) the variability of values, and (3) the data
collection per unit cost in the strata. One also can have
special interest in characteristics of some particular
strata that could affect allocations.
Assuming that the goal of the survey is to estimate a total or average for the entire population, that the variability of values is not known to differ substantially by strata, and that data collection costs are believed to be roughly equal by strata, one could consider allocating sample size to strata proportional to strata population sizes. That is, if there are H strata with population size N_h in stratum h, h = 1, 2, . . . , H, and one can afford to collect data on n units, then the proportional allocation sample size for stratum h is

   n_h = n N_h / N,

where N = Σ_{h=1}^{H} N_h is the total population size. As an illustration, Table 1 shows the proportional allocation of a sample of n = 1,600 across four class years with a total enrollment of 20,031.
Table 1

Year      Population Size:     Sample Size:        Sample Size:
          Total Enrollment     n_h = n N_h / N     Rounded Values
First           5,000              399.4                399
Second          4,058              324.1                324
Third           4,677              373.6                374
Fourth          6,296              502.9                503
Total          20,031            1,600.0              1,600
Table 2

College            Population Size:     Sample Size:        Sample Size:      Sample Size: Adjusted
                   Total Enrollment     n_h = n N_h / N     Rounded Values    to 200 Minimum
Agriculture             2,488               198.7                199                200
Business                3,247               259.4                259                246
Design                  1,738               138.8                139                200
Engineering             4,356               347.9                348                331
Human Sciences          2,641               211.0                211                201
                        5,561               444.2                444                422
Total                  20,031             1,600.0              1,600              1,600
PROTECTION OF HUMAN SUBJECTS

The Nuremberg Code laid the foundation for most of the current ethical standards in research. The American Psychological
Association (APA) began writing ethical guidelines in
1947, and its first ethical code was published in 1953.
Numerous revisions of the APA ethical guidelines for
the protection of human participants have taken place
over the years, and the latest revision occurred in 2002.
Note that APA guidelines address more issues than the
Nuremberg Code did; one difference is the issue of
Table 2   Summary of the 2002 American Psychological Association guidelines for the protection of human research participants

1. Research proposals submitted to institutional review boards (IRB) must contain accurate information. Upon approval, researchers cannot make changes without seeking approval from the IRB.

2. Informed consent is usually required and includes: (a) the purpose of the research, expected duration, and procedures; (b) the right to decline to participate and to withdraw once the study has begun; (c) the potential consequences of declining or withdrawing; (d) any potential risks, discomfort, or adverse effects; (e) any potential research benefits; (f) the limits of confidentiality; (g) incentives for participation; and (h) whom to contact for questions about the research and participants' rights. Researchers provide opportunity for the participants to ask questions and receive answers.

3. When research is conducted that includes experimental treatments, participants shall be informed at the beginning of the study of (a) the experimental nature of the treatment; (b) the services that will or will not be available to the control group(s), if appropriate; (c) the means by which assignment to treatment or control groups is made; (d) available treatment alternatives if the person does not want to participate or withdraws after the study has begun; and (e) compensation for the costs of participating.

4. Informed consent shall be obtained when voices or images are recorded as data unless (a) the research consists only of naturalistic observations taking place in public places and the recording is not anticipated to cause harm, or (b) the research design includes deception, and consent for the use of the recording is obtained during debriefing.

5. When psychologists conduct research with clients/patients, students, or subordinates as participants, steps must be taken to protect the participants from adverse consequences of declining or withdrawing from participation. When research participation is a course requirement, or an opportunity to earn extra credit, an alternative choice of activity is available.

6. Informed consent may not be required where (a) the research would not reasonably be expected to cause distress or harm and (b) it involves (i) the study of normal educational practices, curricula, or classroom management methods in educational settings; (ii) only anonymous questionnaires, archival research, or naturalistic observations that would not place participants at risk of civil or criminal liability nor damage their financial standing, employability, or reputation, and confidentiality is protected; or (iii) the study of organizational factors conducted in the workplace that poses no risk to participants' employability, where confidentiality is preserved.

7. Psychologists make reasonable efforts to avoid offering excessive or inappropriate financial or other inducements for research participation that would result in coercion.

8. Deception in research shall be used only if it is justified by the study's significant prospective scientific value and nondeceptive alternatives are not feasible. Deception shall not be used if it will cause physical pain or severe emotional distress.

9. (a) Participants shall be promptly offered information about the outcome of the study; (b) if delay or withholding of the study outcome is necessary, reasonable measures must be taken to reduce the risk of harm; and (c) when researchers realize that research procedures have harmed a participant, they take reasonable steps to minimize the harm.
PROXY RESPONDENT
If a respondent reports on the properties or activities
of another person or group of persons (e.g., an entire
household or a company), the respondent is said to be
a proxy respondent. In some cases, proxy responses
may be a part of the design of the survey. In the
U.S. Current Population Survey, for example, a single
responsible adult is asked to report for all members
of the household 14 years of age or older. In many
surveys, adults are asked to report for children. In
other cases, proxy responses are used only when there
is a particular reason that the targeted person cannot
report. In household travel surveys, for example, data
are typically sought directly from each adult member
of the household. Only if a particular adult is on extended travel, has language or medical difficulties that would interfere with responding, or has some similar constraint would a proxy respondent be used.
Since a proxy response is treated the same as a self-reported response, an obvious benefit of allowing proxy
responses is to increase the response rate. If the targeted person is unavailable for the entire survey period,
the only way to get a response may be to accept a proxy
response. In surveys in which information is sought
about all members of a household, the lack of a response
from one member, if only self-reported data are permitted, could jeopardize the utility of the information from
the others.
Not allowing proxy responding also may increase
nonresponse bias. Those unavailable to respond for
themselves are more likely to be on long trips, in the
hospital, or away at college, and so on, than those who are available to respond for themselves.
PSEUDO-POLLS
The term pseudo-poll refers to a number of practices
that may appear to be legitimate polls but are not. A
legitimate poll uses scientific sampling to learn about
the opinions and behaviors of a population. Pseudo-polls include unscientific (and thus unreliable) attempts
to measure opinions and behaviors as well as other practices that look like polls but are designed for purposes
other than legitimate research.
A variety of techniques are used to conduct unscientific assessments of opinion, all of which are considered pseudo-polls. These can be used by media and
other organizations as an inexpensive way to measure
public opinion and get the audience involved. However, they are very problematic from a data quality
standpoint and should not be referred to as polls.
PSYCHOGRAPHIC MEASURE
A psychographic measure is a variable that represents
a personal characteristic of an individual that is not demographic in nature, such as values, attitudes, interests, opinions, and lifestyles.
PUBLIC OPINION
Public opinion is one of the oldest and most widely
invoked concepts in the social sciences and has many
meanings and controversies associated with it. The
concept of public opinion has evolved during the past
three millennia as the notion of democracy evolved.
Public opinion plays a central role in democracy, and,
as such, the ability to measure public opinion accurately has become a critical endeavor for modern-day
democracies.
The press, the French social theorist Gabriel Tarde argued, could circulate what he called currents of opinion through the literate population without mass meetings and crowds
intervening. Tarde noted the conditions for a public as being a group of readers, not just scholars, dispersed across a wide territory, who were joined
together by their shared thoughts on certain topics that
were created by their reading of the same news
reports. Tarde thus inextricably linked the public and
the institution of journalism.
In the early 1900s, pioneering social psychologist
Charles Horton Cooley defined public opinion not as
the mere aggregation of opinion but as a social
product, that is, a result of communication and reciprocal influence. Cooley wrote an extensive treatise on
his theory of public opinion in which he made clear
that it was the product of coordinated communication
in which people give time and attention to studying
a problem individually. A nation can be said to have
a form of public opinion when individuals must take
into consideration their own ideas but also fresh ideas
that flow from others engaged in thinking about the
same topics. The outcome of the process for Cooley
was not agreement but maturity and stability of
thought that resulted from sustained attention and
discussion about a topic. In other words, people might
not agree after deliberation more than they did before,
but their differences after careful consideration are
better thought out and less changeable. Cooley
believed that after such deliberation, people would
know what they really think about a topic as well as
what others around them think. To Cooley, this was
the important difference between true opinion and
simply one's casual impression. Today, we would
refer to this as opinion quality.
James Bryce was an early theorist of public opinion whose views, from the late 1880s, are of particular importance because of their influence on George
Gallup, who did more than anyone in his era to bring
together survey research, public opinion, and democracy. Bryce theorized that the history of democracy
was best explained by stages in which political
regimes took public opinion seriously. The first stage
reflects systems in which public opinion is theoretically acknowledged as the basis of governance but
the public is passive and allows a dominant group to
exercise authority. The second stage moves theory
into practice, such as when the early United States
adopted a constitution built on the principle of the
consent of the governed and the people begin to exercise power. The third stage reflects public opinion
Is the Public Competent?

This is perhaps the oldest and most important question about public opinion, and it has been asked since
ancient times. Some of the early social scientists took
a very dim view of the competency of public opinion.
Gustave Le Bon, one of the early writers on mass psychology, was a famous skeptic of popular capabilities. Walter Lippmann was similarly skeptical of the capabilities of public opinion. His
influential books, Public Opinion and The Phantom
Public, make the point clearly that democratic theory
requires too much of ordinary people. Furthermore,
the press, in Lippmann's view, was not adequate to
the task of public enlightenment as required in democratic society.
When the issue of public competence is raised, it is useful to determine whether the supposed incompetence should be attributed to inherent limitations of people or, more properly, to the public's level of literacy and educational attainment. Other possible explanations include the type and amount of communication that is available to the masses; the role models and values conveyed to the public about expectations related to civic virtues; and whether the public actually has the requisite skills, such as how to participate in dialogue, debate, and deliberation of political choices. Lippmann's chief
critic was the philosopher and educational reformer
John Dewey. His 1927 book, The Public and Its Problems, argued that the public was fully capable but that
too many people lacked the required levels of communication and the quality of public media needed for
informed decision making.
Do Majorities Exercise Tyranny Over Minorities?
Tyranny of the majority is a third enduring problem of public opinion. The fear is that people whose
views are in the majority will use their power to
oppress those holding minority views. This raises
questions of civil liberties such as the protections of
freedom of expression and assembly as well as protections for speakers from retribution. There has
emerged in the literature of public opinion over the
years considerable attention to social pressures that
hinder people's willingness to speak their minds on
controversial issues. These constraints range from
people's unwillingness to be out of step with others to
real fear of repercussions or reprisals from employers
or government officials.
PUBLIC OPINION QUARTERLY (POQ)
Public Opinion Quarterly (POQ) is the journal of the
American Association for Public Opinion Research
(AAPOR). Founded in 1937 at Princeton University,
the journal is an interdisciplinary publication with
contributors from a variety of academic disciplines,
government agencies, and commercial organizations.
The mission of the journal is to advance the study of
public opinion and survey research. At this writing,
POQ is the premier journal in this field, with high
citation rankings in political science, interdisciplinary
social science, and communication.
POQ was originally focused on the impact of the
mass media on public opinion and behavior. The conjunction of two developments contributed to the journal's founding: the consolidation of channels of mass
communication (at that time, print, motion pictures,
and radio) and the growth of social science research
on human behavior. The mid-1930s was a time of
intense concern about the impact of messages distributed simultaneously to mass audiences. Franklin
Roosevelts fireside chats, Father Coughlins radio
commentaries, and Adolf Hitlers speeches from Nuremburg were exemplars of the new phenomenon of
a single persuasive voice reaching millions of people
across the spectrum of society at the same time. This
era also saw the consolidation of commercial interest
in the opportunities presented by the mass media to
influence public views and consumer behavior.
Advertising and public relations had become significant societal institutions. By this time, too, it was
standard for government entities to have press offices
and information bureaus to attempt to affect public
views of policies and programs. Public opinion polls,
commercial ventures supported by media concerns,
had become a significant focus of attention in elections and in commercial research. The rationale for
POQ, presented in the editorial foreword to the first
issue, was that mass opinion had become a major force in modern life.
PURPOSIVE SAMPLE
A purposive sample, also referred to as a judgmental
or expert sample, is a type of nonprobability sample.
The main objective of a purposive sample is to
produce a sample that can be logically assumed to
be representative of the population. This is often
accomplished by applying expert knowledge of the
population to select in a nonrandom manner a sample
of elements that represents a cross-section of the
population.
In probability sampling, each element in the population has a known nonzero chance of being selected
through the use of a random selection procedure. In
contrast, nonprobability sampling does not involve
known nonzero probabilities of selection. Rather, subjective methods are used to decide which elements
should be included in the sample. In nonprobability
sampling, the population may not be well defined.
Nonprobability sampling is often divided into three
categories: purposive sampling, convenience sampling, and quota sampling.
PUSH POLLS
A push poll is a form of negative persuasion telephone calling during a political campaign that is
meant to simulate a poll but is really intended to convince voters to switch candidates or to dissuade them
from going to the polls to vote. To an unskilled recipient of such a call, it sounds like a traditional telephone survey at the start, but the tone and content of
the questioning soon changes to the provision of negative information about one of the candidates. A distinguishing characteristic of a push poll is that often
none of the data are analyzed; the purpose of the
call is to move voters away from a preferred candidate. Push polling is so antithetical to legitimate polling that in 1996 the American Association for Public
Opinion Research (AAPOR), the American Association of Political Consultants (AAPC), and the
National Council on Public Polls (NCPP) issued
a joint statement condemning the practice. Since then,
the Council for Marketing and Opinion Research
(CMOR) has joined in this denunciation.
The support of the AAPC was important because
campaigns often test thematic content or different
approaches to arguments for the purposes of developing
campaign strategy. In these circumstances, the polling
firm is collecting data for analysis to determine which
themes or approaches are most effective. But in a push
poll, no data are analyzed; the calls are made with the
intent of getting voters to switch their support from one
candidate to another or to so turn them off from the
political process that they will decide to stay home on
Election Day rather than vote. And a great many more
people are contacted than are needed in the standard
sample size for a legitimate election campaign poll.
In a period of declining response rates, polling and
survey research organizations are always concerned
about establishing good rapport with respondents,
in the short term with an eye toward completing a specific interview and, more generally, with the purpose
of maintaining a positive image of the industry that
will promote future cooperation in surveys. Having
a bad experience with something that seems like a very
biased poll is harmful to both of these interests.
In general, push polling efforts occur late in the
campaign, when public disclosure is more problematical. Many of those called cannot distinguish
between a legitimate call from an established polling firm and a push poll.
p-VALUE
The probability value (abbreviated p-value), which can range from 0.0 to 1.0, refers to a numeric quantity computed for sample statistics within the context of hypothesis testing. More specifically, the p-value is the probability of observing a test statistic that is at least as extreme as the quantity computed using the sample data. The probability is computed using a reference distribution that is generally derived under the null hypothesis. The phrase "extreme" is usually interpreted with respect to the direction of the alternative hypothesis; if the test is two-tailed, then the p-value represents the probability of observing a test statistic that is at least as large in absolute value as the number computed from sample data. Put another way, the p-value is the probability, computed assuming the null hypothesis is true, of obtaining a result at least as extreme as the one actually observed by chance alone.
P-values often are reported in the literature for
analysis of variance (ANOVA), regression and correlation coefficients, among other statistical techniques.
The smaller the p-value, the stronger the evidence against the null hypothesis. Generally, if the p-value is less than the level of significance for the test (i.e., α), then the null hypothesis is rejected
in favor of the alternative and the result is said to be
statistically significant.
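As an illustrative sketch (using hypothetical data, not an example from this entry), the following Python fragment computes a two-sample t statistic and its two-sided p-value with SciPy and applies the decision rule just described:

from scipy import stats

# Hypothetical ages of rural and urban respondents.
rural_ages = [34, 45, 52, 40, 61, 48, 55, 38, 59, 44]
urban_ages = [29, 41, 36, 33, 47, 39, 31, 42, 35, 38]

# Two-sample t-test: returns the test statistic and a two-sided p-value.
t_stat, p_value = stats.ttest_ind(rural_ages, urban_ages)

alpha = 0.05  # pre-specified level of significance
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
print("reject H0" if p_value < alpha else "fail to reject H0")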
In the context of survey research, interest may center on comparing the results of a battery of questionnaire items across demographic or other groups. For
example, a sequence of t-tests may be conducted to
compare differences in the mean ages, education level,
number of children in the home, annual income, and so
on, for people who are surveyed from rural areas versus those from urban areas. Each test will generate
a test statistic and a corresponding two-sided p-value.
Comparing each of these separate p-values to .05 or
.10, or some other fixed level of alpha, can give some indication of which group differences are statistically significant.
Table 1  Possible Values for X and the Proportion of Samples Having This Value for X
Q
QUALITY CONTROL
The term quality control refers to the efforts and
procedures that survey researchers put in place to
ensure the quality and accuracy of data being collected using the methodologies chosen for a particular study. Quality-control efforts vary from study to
study and can be applied to questionnaires and the
computerized programs that control them, sample
management systems to ensure proper case processing,
the monitoring of appropriate interviewer behavior,
and other quality-control aspects of the survey process, all of which can affect the quality of data and
thus the results.
Monitoring Interviews
In surveys that use interviewer-administered questionnaires, monitoring interviews has become a standard
quality-control practice. Telephone interviewer monitoring is accomplished by using silent monitoring
equipment so that neither the interviewer nor the
respondent is aware of it. In addition, facilities may
use equipment that allows the monitors to listen to an
interview and simultaneously visually observe the
progress of an interview on a computer screen. Monitoring serves four objectives, all aimed at maintaining
a high level of data quality. These objectives include
(1) obtaining information about the interview process
(such as procedures or interview items) that can be
used to improve the survey, (2) providing information
about the overall data quality for the survey in order
to keep the data collection process in statistical control, (3) improving interviewer performance by reinforcing good interviewing behavior, and (4) detecting
and preventing data falsification.
Generally speaking, there are two different types
of telephone monitoring conducted: quality assurance
monitoring and interviewer performance monitoring.
Quality assurance monitoring involves the use of
a coding scheme that allows monitors to score interviewers on a variety of aspects such as proper question delivery, use of nondirective probing, and correct
entry of responses. Interviewer performance monitoring takes more of a qualitative approach.
Training Interviewers
In surveys that use interviewer-administered questionnaires, proper training of interviewers is often seen as
the heart of quality control. To become successful and
effective, interviewers first must be introduced to the
survey research process through a general training, in
which fundamental concepts such as basic interviewing techniques, obtaining cooperation, and maintaining respondent confidentiality are covered. Following
general training, interviewers should be trained on the
specifics of an individual project. Training sessions
should strive to maximize active participation. Topics
covered should be reinforced with group discussion
and interaction, trainer demonstrations, and classroom
practice and discussion. Role playing and practice are
also key elements of effective basic project training. Assessment of the knowledge gained and retained by interviewers is also a part of survey quality control.
Supervisory Meetings
During field and telephone data collection periods, supervisors
often hold regularly scheduled meetings, sometimes
referred to as quality circle meetings, to discuss data
collection issues. These sessions are conducted to
build rapport and enthusiasm among interviewers and
supervisory staff while assisting in the refinement of
an instrument and providing ongoing training for the
staff. Such meetings have the potential to identify previously unrecognized problems with an instrument,
such as questions that respondents may not understand or questions that are difficult to administer. In
addition these sessions may reveal computer-assisted
interviewing software problems and issues with data
collection procedures and operations.
Verification Interviews
In addition to monitoring for quality and conducting
regular meetings with interviewers, many data collection supervisors take an additional step to ensure
data quality and prevent falsification. Verification by
telephone and reviewing interview timing data are
particularly useful and important for field interviews
where data are not collected in a central facility; when data are collected this way, interviewers are more susceptible to falsification, also known as curbstoning.
Telephone verification is conducted by first telling
respondents who have just completed face-to-face
interviews that they may be contacted at a later date
for quality-control purposes. Telephone interviewers
will then contact selected respondents to verify the
face-to-face interview was completed properly and
with the correct sample member. Any suspicious
results are then conveyed to the data collection supervisors for investigation.
Some face-to-face surveys also review questionnaire timing data as part of standard quality-control
procedures. Under this method, as interviews are completed and transmitted from the field, timing data are
captured for each module or section within the instrument, along with data on the overall length of the interview. By recording and analyzing this
information, researchers can create reports that can be
used to detect suspiciously short interview administration times. This quality-control method allows for
early detection and resolution of problems, leading to
more complete and accurate data for analysis.
Examining Systems
and Data for Quality
In addition to training and implementing procedures
to ensure quality control through interviewers, and
regardless of the mode of data collection and whether
or not the questionnaire is interviewer administered,
researchers must also take steps to verify that the systems and procedures being used to collect and process
data are functioning properly. By using systematic
testing plans, researchers can verify routes through
the computerized questionnaire and ensure that a program is working as specified in terms of display, skip
pattern logic, and wording. Researchers should also
review frequencies of the collected data throughout data collection (especially early in the field period) to identify unexpected patterns that may signal programming or data collection errors.
Some prominent examples of quality of life measures are Cantril's Self-Anchoring Scale; Campbell, Converse, and Rodgers's Domain Satisfactions and Index of Well-Being; Andrews and Withey's Global Well-Being Index; and Diener's Quality of Life Index.
Among the measurement challenges in this area are
the following: the positivity bias (the tendency for
respondents to use the high end of the scale for perceptual measures of psychological indicators), the ethnocentrism bias (indicators that reflect values of the
researchers), the subjectivity bias (a reliance on nonobjective perceptual indicators), the quantitative bias
(a reliance on quantitative measures that may be less
meaningful than qualitative indicators), measurement
biases (such as response set and social desirability),
weighting biases (problems of determining the relative
importance of indicators), and measurement complexity (because there are so many important dimensions
of life quality, measurement is cumbersome).
Douglas M. McLeod
See also Bias; Level of Analysis; Perception Question;
Positivity Bias; Public Opinion; Social Desirability
QUESTIONNAIRE
The questionnaire is the main instrument for collecting
data in survey research. Basically, it is a set of standardized questions, often called items, which follow
a fixed scheme in order to collect individual data about
one or more specific topics. Sometimes questionnaires
are confused with interviews. In fact, the questionnaire involves a particular kind of interview: a formal contact in which the conversation is governed by the wording and order of questions in the instrument.
Questionnaire Construction
When questionnaires are constructed, four primary
requirements must be met:
1. Theoretical knowledge of the topic of research,
achieved through the reconnaissance of the relevant
literature (if such exists) and/or in-depth interviews
or other qualitative methods of research (ethnographies, focus groups, brainstorming, etc.) that may
serve as pilot studies.
2. Valid and reliable operationalization of concepts
and hypotheses of research. Most questionnaire
items, in fact, originate from the operationalization
phase. To check the validity (the degree to which
an item or scale measures what it was designed to
measure) and reliability (the consistency or replicability of measurements) of a set of items, various
techniques can be used: external, construct, and
face validity, among others, in the first case; and
parallel forms, testretest, split-half, intercoder
techniques, in the case of reliability.
3. Experience in writing a questionnaire, or at least
the availability of good repertoires of published
questionnaires.
4. A knowledge of the target population. This is crucial information: The target population must be able to answer the questions accurately.
Generally, the ordering of items within a questionnaire follows this pattern: (a) general and neutral
questions (to build rapport and thereby obtain the respondent's confidence), (b) questions that require
greater effort (e.g., complex, core questions), (c) sensitive questions, and (d) demographic questions. This
general pattern has been found to increase data quality
for most surveys.
Nevertheless, the order in which questions are
organized and the use of filter questions can yield
various shapes, for example, funnel format (from general to specific questions), inverted funnel format
(from specific to general questions), diamond format,
X-format, box format, and mixed formats. The
researcher will need to choose the form that best fits
his or her goals. Classic and recent studies confirm that the order of the questions is an important contextual factor that influences respondents' answers.
Questionnaire Wording
and Question Types
A general rule, in constructing questions, is to be
clear in terminology and simple in structure. More
specifically:
Questions should use simple vocabulary.
Their syntax should be simple, without subordinate
clauses and without double negatives.
They should not contain two questions in one (double-barreled questions).
Questions must be concrete with respect to time and
events.
They should not lead the respondent to particular
answers.
The number of response alternatives should be limited unless additional visual cues are employed.
All response alternatives should appear acceptable, even the most extreme ones.
The response alternatives should be exhaustive and
mutually exclusive.
Basically, two broad types of questions can be distinguished: open-ended and closed-ended (or fixed
alternative) questions. The latter are more frequent in
survey research. Open-ended questions are suitable
when the researcher thinks it would be better to leave
the respondents free to express their thoughts with
their own words. However, they require a subsequent
postcoding, for instance through content analysis, lexical correspondence analysis, or a qualitative treatment.
QUESTIONNAIRE DESIGN
Questionnaire design is the process of designing the
format and questions in the survey instrument that will
be used to collect data about a particular phenomenon.
In designing a questionnaire, all the various stages of
survey design and implementation should be considered. These include the following nine elements:
(1) determination of goals, objectives, and research
questions; (2) definition of key concepts; (3) generation
of hypotheses and proposed relationships; (4) choice of
survey mode (mail, telephone, face-to-face, Internet);
(5) question construction; (6) sampling; (7) questionnaire administration and data collection; (8) data summarization and analysis; (9) conclusions and communication of results.
One goal of the questionnaire design process is to
reduce the total amount of measurement error in
a questionnaire. Survey measurement error results
from two sources: variance and bias. Question wording (including response alternatives), ordering, formatting, or all three, may introduce variance and bias into
measurement, which affects the reliability and validity
of the data and conclusions reached. Increased variance involves random increases in the distance between the reported survey value and the true survey value (e.g., attitude, behavior, factual knowledge). Increased bias involves a systematic (directional) difference from the true survey value. Questionnaire-related error may be introduced by four different
sources: the interviewer, the item wording/format/
ordering, the respondent, and the mode of data collection. Careful questionnaire design can reduce the likelihood of error from each of these sources.
Every item included should relate directly to the goals and research questions of the survey; parsimony is a virtue in questionnaire design. To properly define goals and research questions, key concepts
of interest need to be detailed. A list of concepts of
interest and how they relate to one another aids in
selecting specific questions to include. These concepts are then transformed into (i.e., operationalized as) survey questions that measure them in some way, proceeding from abstract concepts to specific measurements. Once the goals and objectives of the survey
have been established, it is possible to generate
specific research questions and, possibly, directional
hypotheses based on theory and previous findings.
The items used in the questionnaire should produce
data that are appropriate to the desired analyses.
Questionnaire Format
The layout of a questionnaire, no matter what type,
should reduce the cognitive burden of respondents
(and interviewers) and contain an intuitive and logical
flow. For example, in most cases, questions on related
topics should be grouped together and questions
should maintain the chronology of events. Questionnaire format should be as easy as possible to understand and use for the interviewer or the respondent
depending on the mode of administration. Questions
should be numbered individually, clearly spaced, and
visually distinct from one another. It is especially
important that self-administered questionnaires provide clear and concise instructions and have a simple
layout (e.g., common question format, visual instructions for skipping questions). For interviewer-administered questionnaires, instructions that are to be
read aloud to respondents should be visually distinct
from instructions that are for only the interviewer;
for example, set off by italics, CAPS, bold type, or
parentheses ( ). Ideally, important questions should
appear early in a questionnaire to avoid the possible
negative effects of respondent fatigue on motivation,
recall, and item nonresponse.
Questionnaires that have a professional appearance
are taken more seriously by respondents (e.g., displaying professional affiliations). Social validation is also
an important factor; questionnaires that end by thanking respondents for their time and effort and reminding
respondents of the usefulness of the data they have
provided are rated as more enjoyable by respondents.
Questionnaire length is an important issue for both cost reasons and effects on respondent behavior.
Question Wording
Unclear concepts, poorly worded questions, and
difficult or unclear response choices may make the
questionnaire difficult for both respondents and interviewers. Questionnaires should contain items that are
both reliable and valid. Reliability is the consistency
of the measurement; that is, the question is interpreted
and responded to similarly over repeated trials.
Construct validity is whether or not the measurement,
as worded, properly reflects the underlying construct
of interest.
Questions may be troublesome for respondents
with respect to comprehension, retrieval, and response
formation. First, the respondent must be able to
understand the question. Questions should use simple
(enough) terms and concepts that respondents are
likely to know, and they should not contain vague and
ambiguous terms; in other words, questions should be
worded appropriately for the reading and knowledge
level of the respondents. Questions should avoid
unnecessary complexity, such as compound sentences,
double-barreled questions (two questions within one),
and double negatives. Longer questions that provide
sufficient detail (without becoming convoluted) and
are explicit generally will enhance comprehension.
Even though questions may be perfectly understandable for respondents, there may be problems with retrieval of the desired information from respondents' memory. Questions should ask for information that is as recent as possible and provide cues that match the original encoding context.
Response Alternatives
Responses can be either open-ended or closed-ended.
Open-ended questions provide no predetermined response categories and allow the respondent to answer
with whatever information he or she considers relevant. This answer format is more cognitively demanding for respondents, but it often results in more
detailed and informative responses. Open-ended questions are good for gathering information on a topic for
which there is no clear set of response categories.
Open-ended questions work better with interviewer-administered questionnaires than in surveys that use self-administered questionnaires. An open-ended response format may result in a great deal of information, but it may be information that is not easily comparable or easily codable. Open-ended data also can
be quite time-consuming and very costly to process.
Closed-ended questions ask respondents to select
among a predetermined set of response categories.
These response categories must be exhaustive and
mutually exclusive. The closed-ended method reduces
the cognitive burden of the respondent and enhances
the ability to compare responses. The data are already
coded (assigned a numerical value) and can be easily
quantified, which saves data processing time and
money. Closed-ended questions are ideal for self-administered questionnaires because they avoid the
greater subjectivity and volatility of open-ended questions. However, if the researcher is not careful, the
selection of response alternatives may bias respondents by framing thinking and by predetermining
what are considered appropriate answers.
Response categories for event information may
be divided by occurrence, absolute frequency (actual
amount), relative frequency (quantifiers that denote more or less of something, such as "always," "most of the time," "sometimes," "basically never"), regularity, and specific dates. Attitude answers may take the form of ratings, rankings, agree-disagree scales,
the form of ratings, rankings, agreedisagree scales,
and forced choice. Attitude questions essentially
require an evaluation of some phenomenon (e.g.,
approval/disapproval of the president). In selecting
terms to use to describe attitudes, it is best to use
terms that are simple and easy to understand, as well
as terms that are likely to be interpreted similarly by
all respondents.
Response scales require decisions concerning the
length of scale, whether or not to include midpoint or
explicit "don't know" options, and whether to use verbal labels, numbers, or both. A very common scale for closed-ended survey items is a Likert scale (e.g., "strongly agree," "somewhat agree," "neither agree nor disagree," "somewhat disagree," or "strongly disagree"). Rating scales can be presented as either a verbal description (e.g., strongly agree, strongly disagree),
numbers (e.g., 1 to 5), or a combination of both.
Meaningful verbal labels with relatively equal intervals provide a linguistic reference point for numerical
scales, which significantly aids respondents.
Including a middle category is desirable when
there is a readily identifiable and meaningful midpoint
to a scale (e.g., increase, remain the same, decrease).
The number of response options should attempt to
maximize discrimination between response categories
while maintaining the usefulness and meaningfulness
of response categories. Research shows that five to
nine response categories work best. There is tremendous
debate over the usefulness of explicitly including
"don't know" responses. Some argue that questionnaire designers are incorrect in assuming respondents
always have an opinion and that surveys often force
respondents to artificially produce an opinion. The
lack of opinion regarding a topic has been referred to as displaying a nonattitude. Others argue that explicit "don't know" response choices encourage respondents to opt out of answering questions on which they actually hold opinions.
QUESTIONNAIRE LENGTH
Questionnaire length refers to the amount of time it
takes a respondent to complete a questionnaire. Survey instruments can vary in length from less than
a minute to more than an hour. The length of a questionnaire is important because it can directly affect
response rates, survey costs, and data quality. Longer
questionnaires result in higher data collection costs
and greater respondent burden and may lead to lower
response rates and diminished quality of response.
However, practical experience and experimental data
suggest that below a specific threshold, questionnaire
length bears little relationship to response rate or data
quality and has only a minor impact on survey costs.
For telephone and online surveys, although longer
questionnaires will result in higher data collection
costs, it is generally recognized that questionnaire
length is a major factor for these modes only when it
exceeds 20 minutes. The field cost of increasing the
length of a telephone interview 1 minute beyond the
20-minute mark is 2 to 3 times as high as the cost of
adding another minute to a 15-minute telephone interview. This higher cost reflects the lower response
rates associated with longer interviews (due to the
effects of the increase in respondent burden) and the
greater number of interviewer hours needed to
achieve the same sample size. The cost of administering a Web survey is similarly increased when the
length of interview exceeds 20 minutes. In this case,
additional monetary incentives and rewards must be
offered to prompt respondents to complete a longer
interview. For mail surveys, questionnaire length has
a small impact on field costs compared to phone and
online surveys. Longer mail questionnaires may
require more postage and data entry will be more
expensive, but these cost increases are generally not as
large as those for other modes of data collection. With
in-person interviewing, respondents do not appear to be as affected by the length of the questionnaire as they are in other survey modes.
QUESTIONNAIRE-RELATED ERROR
Error related to the questionnaire is one component
of total survey error. This type of error is traditionally viewed as a discrepancy between the respondent's answer and the true score or answer.
Questionnaire-related errors may stem from the content of the questionnaire (the wording, ordering, and/
or formatting of the questions themselves), the method
of delivery (whether by interviewer, by telephone, on
paper, by computer, or some other mode), the materials that accompany the questionnaire (such as show
cards in a face-to-face survey), or all of these things.
The focus in this entry is primarily on response error
by respondents, although these other factors can be
considered secondary aspects of questionnaire-related
error.
Recent work on questionnaire error draws on cognitive psychology to inform the design of standardized
instruments. One prominent framework of the survey
response process (or the mental steps a person undergoes when answering survey questions), advanced by
Roger Tourangeau, Lance Rips, and Kenneth Rasinski,
consists of four major components: (1) comprehension
and interpretation of the question; (2) retrieval of relevant information from memory; (3) judgment, or
deciding on an answer; and (4) response, including
mapping the answer to response categories and reporting the answer.
Errors may arise at any step, and the questionnaire may affect the likelihood and extent of those
errors. Although there are many design principles that minimize error, often there are trade-offs that must be weighed in applying them.
Comprehension
Taking steps to ensure question comprehension by
respondents is often the first stage in developing
a questionnaire that minimizes error due to misunderstanding or misinterpreting questions. In general, the
more complex or ambiguous the task, the more error
there will be. Questions may contain words unfamiliar
to the respondent, may have complex syntax, or may
assume a higher level of respondent cognitive ability
than is warranted. Sometimes researchers use specialized terminology rather than colloquial language.
Respondents may also not pay close attention and
may miss part of the question.
One difficulty is that many words in natural language are ambiguous, and researchers often must use
vague quantifiers that have an imprecise range of
application. Some examples include categories of
words denoting frequency (e.g., never, sometimes,
pretty often, very often), probability expressions (e.g.,
very unlikely, somewhat unlikely, likely), amounts
(e.g., none, some, many, all), and strength (e.g.,
extremely dissatisfied, very dissatisfied, somewhat dissatisfied). There is evidence that respondents have
varied associations with words that denote frequency,
regularity, or size. For example, "How often . . . ?" results in variability based on how respondents interpret the word often. It is advisable to use more specific quantifiers, such as "On how many days . . . ?" or "How many times in the last week/month/year . . . ?"
Simple words or categories that appear to have
common understandings, such as "household" or "sibling," can be interpreted differently. When possible, such terms should be defined so as to minimize variability in how respondents interpret them. For example, "household" can include persons who use the
address as their permanent address but who are not
physically living there, such as college students who
live in dormitories, individuals on military duty, or
individuals who may reside at that address for only
part of the year. "Sibling" may include biological siblings, half siblings, and adopted siblings or stepsiblings.
For example, a question about housework during the past 7 days may be more manageable if respondents are asked separately about time
spent cleaning, time spent cooking, and time spent
washing and ironing (or other household activities).
Providing multiple cues and decomposing the task
can improve recall. In addition, spending more time
on the task gives respondents more time to search
memory. Thus longer introductions and a slower
interviewing pace can be beneficial. Finally, some
research suggests it is possible to motivate respondents to search their memories more carefully by
highlighting the importance of accurate information;
the impact of this, however, is normally small.
Trade-Offs
Although the questionnaire design principles mentioned earlier minimize many errors, they may conflict with one another, and designers must balance these competing considerations.
QUESTION ORDER EFFECTS

Questions asked earlier in a questionnaire can establish the context in which later questions are interpreted. Question order effects are also persistent, and separating
items for which there is likely to be an order effect
with questions on unrelated topics generally does not
eliminate such effects. In designing any questionnaire,
therefore, careful consideration must be given to the
potential for question order effects. Conducting cognitive interviews and a thorough pretest of the questionnaire also will help to identify such possible effects.
Procedures such as randomization of sets of items
or systematically varying the order of questions will
serve to minimize the impact of question order or
enable the researcher to specify the magnitude of
such effects.
It is difficult to identify in advance the contexts
in which question order effects will occur, and
some effects will certainly arise despite the best
efforts of the designers of survey questionnaires to
minimize them. Identifying such effects is important
in any analysis, but it may be of particular consequence in trend analyses, in which the focus is on
change in opinions or attitudes. Even when the identical question is asked at different points in time,
varying results may be due not to actual change but
rather to the different order (context) in which the
questions were asked.
Robert W. Oldendick
See also Cognitive Interviewing; Context Effect; Pilot Test;
Random Order; Random Start; Response Bias; Topic
Saliency
QUESTION STEM
A question stem is the part of the survey question that
presents the issue about which the question is asking.
With closed-ended questions, a survey question consists of two parts: (1) the wording or text that presents the issue the respondent is being asked to consider (along with any instructions, definitions, etc.) and (2) the answer options (response alternatives) from which a respondent may choose. The question stem is the first of these two parts.
Survey researchers must strive to craft question
stems that meet a number of important criteria. First
and foremost, question stems must be written so that,
to the degree that this can be controlled, all respondents understand the question being posed to them as
meaning the same thing. If a given question is perceived differently by different types of respondents, it
essentially becomes multiple questions capturing different things, depending on respondent perception.
Because of this, survey researchers must make every
attempt to word question stems simply and write to
the population of interest. Question designers should
consider providing definitions of terms and any guidance they can in a given question stem to ensure uniform understanding of concepts and terms.
Although questions including definitions, examples, and so forth, are created with the best of intentions, they may be too cumbersome for respondents.
Survey researchers should avoid question stems that
are too long and wordy; question stems should be as
succinct as possible. Survey researchers are encouraged to break up complex issues into separate questions in order to aid respondent understanding. In
doing so, the creation of double-barreled questions is
also avoided.
In addition to being worded to ensure respondents' understanding, question stems must also be carefully
worded to avoid affecting the results of a given item.
Question stems should be written without the use of
inflammatory phrases and should not purposely contain emotional or slanted language that could influence respondents and bias results. As the survey
research industry continues to move toward the use of
mixed-mode data collection and the full conversion of
surveys from one mode to another, question stems
must be checked carefully for mode appropriateness
as well. Different modes rely on different words, instructions, and phrasing to communicate ideas, and a question stem that works well in one mode may not transfer directly to another.
QUESTION WORDING AS
DISCOURSE INDICATORS
Many of the questions used by survey researchers
serve to solicit people's opinions on topical issues, and the distribution of the answers to survey questions makes up what is usually considered to be the public's opinion. Thus, most research on survey questions treats these questions as stimuli and focuses
on their communicative and evocative functions. As
such, scholars examine response biases associated
with certain question formats or wordings, the cognitive processes underlying these biases, and the flow
and conversational logic of the interviewing process.
But survey questions also may be viewed as indicators of the political discourse from which they emerge.
Summary
Public opinion surveys provide a natural arena in
which discourse frames thrive and can be traced.
Surveys closely follow current events and issues in
the news. They constitute a key discourse domain of
ongoing social action, a kind of self-contained system
of communication highly indicative of the world of
political discourse in general. They provide critical
historical instances in the form of mini-texts, in which
major themes on the public agenda, assumptions, and
ideologies are crystallized in the form of questions.
Therefore, content analysis of question wording provides an appealing approach for the study of political
discourse.
Jacob Shamir
See also Content Analysis; Poll; Pollster; Public Opinion;
Questionnaire; Questionnaire Design
Further Readings
QUOTA SAMPLING
Quota sampling falls under the category of nonprobability sampling. Sampling involves the selection
of a portion of the population being studied. In probability sampling each element in the population has
a known nonzero chance of being selected through
the use of a random selection procedure such as simple random sampling. Nonprobability sampling does
not involve known nonzero probabilities of selection.
Rather, subjective methods are used to decide which
elements should be included in the sample. In nonprobability sampling the population may not be well
defined. Nonprobability sampling is often divided into
three categories: purposive sampling, convenience
sampling, and quota sampling.
Quota sampling has some similarities to stratified
sampling. The basic idea of quota sampling is to set
a target number of completed interviews with specific
subgroups of the population of interest. The sampling
procedure then proceeds using a nonrandom selection
mechanism until the desired number of completed
interviews is obtained for each subgroup. A common
example is to set 50% of the interviews with males
and 50% with females in a random-digit dialing telephone interview survey. A sample of telephone numbers is released to the interviewers for calling. At the
start of the survey, one adult is randomly selected
from a sample household. It is generally more difficult
to obtain interviews with males. So for example,
if the total desired number of interviews is 1,000
(500 males and 500 females), and interviews with 500
females are obtained before interviews with 500
males, then no further interviews would be conducted
with females and only males would be randomly
selected and interviewed until the target of 500 males
is reached. Females in those sample households would
have a zero probability of selection. Also, because the
500 female interviews were most likely obtained at
earlier call attempts, before the sample telephone
numbers were thoroughly worked by the interviewers,
females living in harder-to-reach households are less
likely to be included in the sample of 500 females.
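The stopping behavior just described can be sketched in a few lines of Python; this is a toy simulation with hypothetical parameters, not a production sampling system:

import random

def quota_sample(target_per_group=500, p_male=0.49):
    # Simulated calling: each contacted household yields one randomly
    # selected adult; gender arrives by chance with probability p_male.
    completed = {"male": 0, "female": 0}
    screened_out = 0
    while min(completed.values()) < target_per_group:
        gender = "male" if random.random() < p_male else "female"
        if completed[gender] < target_per_group:
            completed[gender] += 1    # interview counts toward the quota
        else:
            screened_out += 1         # quota already met: zero selection chance
    return completed, screened_out

counts, screened = quota_sample()
print(counts, "contacts screened out after a quota filled:", screened)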
Quotas are often based on more than one characteristic. For example, a quota sample might have
interviewer-assigned quotas for age by gender by
employment status categories. For a given sample
household the interviewer might ask for the rarest
group first, and if a member of that group is present
in the household, that individual will be interviewed.
If a member of the rarest group is not present in the
household, then an individual in one of the other rare
groups will be selected. Once the quotas for the rare
groups are filled, the interviewer will shift to filling
the quotas for the more common groups.
The most famous example of the limitations of this type of quota sampling approach is the failure of the major preelection polls in the 1948 U.S. presidential election. Because interviewers exercise discretion in choosing respondents within their quotas, quota sampling can bias survey estimates. In the case of the 1948 election, the sampling bias was associated with too many Republicans being selected. Another
problem with quota sampling is that the sampling
procedure often results in a lower response rate than
would be achieved in a probability sample. Most
quota samples stop attempting to complete interviews with active sample households once the quotas have been met. If a large amount of sample is
active at the time the quotas are closed, then the
response rate will be very low.
Michael P. Battaglia
See also Convenience Sampling; Nonprobability
Sampling; Probability Sample; Purposive Sample;
Simple Random Sample; Stratified Sampling
R
RADIO BUTTONS
A radio button is a type of survey response format used
in electronic questionnaire media such as Web surveys,
email surveys, personal digital assistant (PDA) applications, and other electronic documents. The radio button
response format allows respondents to select one, and
only one, of a set of two or more mutually exclusive
response options. Respondents select the response
option of interest by using a mouse pointer, keyboard
key, or touch-screen stylus. The term radio button is
a reference to the punch-in buttons invented years ago
on radios to choose a preset station, such as in an automobile; as each new button is pressed, the previously
pressed button is returned to the neutral position.
Radio buttons are often displayed as a series of open
circles with a value or value label shown next to each
button. They can be listed vertically, horizontally, or in
a grid or matrix. Other electronic response formats
include (a) drop-down boxes for longer lists of options,
(b) check boxes for check-all-that-apply questions, and
(c) text input fields for open-ended responses. Radio
buttons, like other on-screen electronic response formats, require the respondent to exercise care when
selecting the option of interest, making the format susceptible to careless respondent error. This is particularly true for respondents with less extensive computer
experience.
Although drop-down boxes require less screen space
than do radio buttons, the latter more closely resemble
the format used in paper-based questionnaires. Additionally, while all radio button options are immediately visible to the respondent, drop-down boxes require a series of actions by the respondent to open the drop-down list, scroll through the full set of options, and select a response. This difference can result in greater respondent burden, increased cognitive difficulty for the respondent, and response distribution differences due to primacy effects.
Adam Safir
See also Drop-Down Menus; Internet Surveys; Mutually
Exclusive; Primacy Effects; Questionnaire Design;
Respondent Error; Web Survey
RAKING
Raking (also called raking ratio estimation) is a post-stratification procedure for adjusting the sample weights
in a survey so that the adjusted weights add up to
known population totals for the post-stratified classifications when only the marginal population totals are
known. The resulting adjusted sample weights provide
a closer match between the sample and the population
across these post-strata than the original sample. Raking, however, assumes that nonrespondents in each
post-stratum are like respondents. Nonetheless, when
implemented with care, raking can improve the mean
squared error of sample estimates.
The term raking is used to describe this statistical technique because the raking ratio (the ratio of the population, or control, total for a given post-stratum to the marginal row or column total from the sample for that same post-stratum) is calculated and then applied to each of the cells in that row (or column). This is done for each of the post-strata and
repeated iteratively multiple times until the marginal
row and column totals converge to the population totals.
In essence, the raking ratio is raked over the cells in
the respective rows and columns until convergence to
the population totals is achieved, hence the term.
Raking was developed by W. Edwards Deming
and Frederick F. Stephan in the late 1930s and used
with the 1940 U.S. Census to ensure consistency
between the census and samples taken from it. The
computational procedure is the same as iterative proportional fitting used in the analysis of contingency
tables, but the latter is not typically formulated in
terms of weight adjustment.
The Two-Variable Raking Procedure

For two variables, begin with the first variable (the rows): multiply each entry in row 1 by the control total for row 1 divided by the sum of the sample entries in that row, so that the adjusted entries in row 1 sum to the control total for that row. Then do the
corresponding adjustment (multiplying the appropriate
raking ratio for row 2 by each row 2 entry) to row 2.
Continue in this way until all rows have been adjusted
with each row summing to its respective control total.
For the second variable, multiply the first column
by the control total for column 1 divided by the sum
of the cell entries for column 1. At this point column
1 adds up to the control total for column 1, but the
rows may no longer add up to their respective control
totals. Continue multiplying each column by its
respective raking ratio until all columns add up to
their respective control totals. Then repeat the raking
ratio adjustments on the rows, then the columns in an
iterative fashion. (The appropriate raking ratios for
the rows and columns will likely change in each iteration until the process converges.) Raking stops when
all the rows and columns are within a pre-specified
degree of tolerance or if the process is not converging.
If raking converges, the new sampling weight for
a sampling unit in a particular cell is the initial sampling weight times the raking-adjusted cell estimate
divided by the initial cell estimate.
Because raking adjusts by a series of multiplications, any zero cell entry will remain zero. This property is advantageous in the case of structural zeros:
cells that must be zero by definition or that are known
to be zero in the entire population. In other cases, if the
initial sample estimate for a cell is zero, but the population value may be positive, for example, the number of
people suffering from a rare disease in a geographic
area, it may be advisable to replace the initial zero estimate for the cell by a small positive value.
For two-variable raking, the process is known to
converge if all initial cell entries are nonzero. If there
are zero cells, for some configurations the process will
not converge.
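The two-variable procedure can be sketched as follows in Python; the cell counts and control totals are hypothetical, and the convergence check shown is one simple choice among several:

import numpy as np

def rake(cells, row_totals, col_totals, tol=1e-6, max_iter=100):
    # cells: initial weighted sample counts; totals: known control totals.
    est = np.asarray(cells, dtype=float).copy()
    rows = np.asarray(row_totals, dtype=float)
    cols = np.asarray(col_totals, dtype=float)
    for _ in range(max_iter):
        est *= (rows / est.sum(axis=1))[:, None]   # rake the rows
        est *= cols / est.sum(axis=0)              # rake the columns
        if np.allclose(est.sum(axis=1), rows, rtol=tol):
            return est                             # rows still match: converged
    raise RuntimeError("raking did not converge")

# Hypothetical 2 x 3 table: gender (rows) by age group (columns).
sample = [[80, 120, 100],
          [110, 140, 90]]
adjusted = rake(sample, row_totals=[3000, 3200], col_totals=[2000, 2600, 1600])
print(adjusted.round(1))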
The Many-Variable
Raking Procedure
Raking can also be performed when there are more
than two distinct variables. For three variables, one
has control totals for rows, columns, and pillars. The
rows are adjusted to add up to the control totals, then
the columns, then the pillars. These raking steps are
repeated until convergence occurs. Raking is similarly
specified for more than three control variables. Convergence is difficult to ascertain in advance, but the
presence of zero cells is an important factor.
Convergence can be monitored with a nonnegative discrepancy measure computed over all cells ij with c_ij > 0; the measure is zero only when r_ij = c_ij for all such cells. The corresponding conditions hold for more-than-two-variable raking.
Practical Considerations
Raking is often employed when the control totals are
not known exactly but can be estimated with higher
accuracy than can the cell estimates. For example, for
household surveys in the United States, demographic
estimates from the Current Population Survey or the
American Community Survey may be deemed accurate enough to use as control totals.
Much experience and fine tuning are needed in
raking to determine the number of control variables to
use and the number of cells. Too many control variables may lead to convergence problems or highly
variable weights, resulting in increased variance. In
addition, if the variables used in the raking are not
highly correlated with all the variables in the sample,
then the variances for the uncorrelated variables can
increase. Too few control variables may result in sample estimates that do not match the population as
closely as otherwise could be achieved.
Michael P. Cohen
See also Bias; Mean Square Error; Post-Stratification;
Weighting
RANDOM
A process is random if its outcome is one of several
possible outcomes and is unknown prior to the execution of the process; that is, it results merely by chance.
Some survey operations are carried out as random
processes because of advantages resulting from multiple possible outcomes or from the absence of advance
knowledge about the outcome. Such survey operations are said to be randomized and are discussed in
more detail later in this entry. Random numbers are
the realized outcomes of a numeric-valued random
process. They are used by researchers or survey-operations staff to carry out randomized survey operations. This entry discusses properties and sources of
random numbers.
For some randomization operations, the needed
random numbers must be integers uniformly distributed between 1 and some maximum value. For other
operations the needed random numbers must be uniformly distributed between 0 and 1. Statistical tests
exist for testing the quality of random numbers. These
include tests for goodness of fit to the uniform distribution, tests for the absence of trends in sequences of
random numbers, and tests for the absence of nonzero
correlations between pairs of random numbers at fixed
lags. Random numbers obtained by using simple
devices such as dice, slips of paper drawn from a hat,
or numbered balls removed from an urn often fail one
or more of these tests and should not be used in professional survey work.
Tables of random numbers appearing in textbooks
about statistical methods or survey sampling procedures can be used as sources of small quantities of
random numbers. Early tables of random numbers
were created by using series of digits from an irrational number, such as e or π, or by using a special-purpose electronic roulette wheel. A much easier
way to create tables of random numbers and to
produce large quantities of random numbers for automated operations is to have a computer program generate pseudorandom numbers. These are numbers that
appear to be random but are actually deterministic. A
computer program cannot generate truly random numbers because a program's outputs are completely determined by its logic and inputs, so the outputs are predictable in principle. Algorithms for generating pseudorandom numbers appear easy to program, but there are a number of machine-dependent
considerations and choices of parameters associated
with successful implementation, so it is usually better
to use existing computer code that has been thoroughly evaluated.
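As a brief illustration, most modern languages expose a well-tested pseudorandom generator; this Python sketch (with a hypothetical seed) produces the two kinds of random numbers described earlier:

import random

rng = random.Random(20240101)   # hypothetical seed; fixes the sequence

# An integer uniformly distributed between 1 and a maximum value,
# e.g., selecting one adult at random from a four-adult household:
selected_adult = rng.randint(1, 4)

# A number uniformly distributed between 0 and 1,
# e.g., for choosing a random start in systematic sampling:
u = rng.random()

print(selected_adult, round(u, 6))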
Richard Sigman
See also Random Assignment; Random Order; Random
Sampling; Random Start
Further Readings
RANDOM ASSIGNMENT
Random assignment is a term that is associated with
true experiments (called controlled clinical trials in
medical research) in which the effects of two or more
treatments are compared with one another. Participants (respondents, subjects, etc.) are allocated to
treatment conditions in such a way that each participant has the same chance of being a member of a particular treatment group. The groups so constituted are
therefore as similar as possible at the beginning of the
experiment before the treatments are introduced. If
they differ at the end of the experiment, there will be
a statistical basis for determining whether or not, or to
what extent, it is the treatments that were the cause of
the difference.
Traditionally, survey research and experimental
research have not had much in common with one
another. However, the following examples illustrate
some of the many ways that survey researchers can
benefit from random assignment and experimental
design.
A survey researcher would like to determine whether it would be better to use answer sheets for a particular self-administered survey: Respondents can be randomly assigned to receive either the answer-sheet version or the standard version, and the resulting data quality compared.
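A minimal sketch of such an assignment in Python (the group names and respondent IDs are hypothetical):

import random

def randomly_assign(participants, groups=("answer_sheet", "standard")):
    # Shuffle, then deal participants out in turn, so every participant
    # has the same chance of landing in each treatment group.
    pool = list(participants)
    random.shuffle(pool)
    assignment = {g: [] for g in groups}
    for i, person in enumerate(pool):
        assignment[groups[i % len(groups)]].append(person)
    return assignment

respondent_ids = range(1, 21)   # hypothetical respondent IDs
print(randomly_assign(respondent_ids))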
RANDOM-DIGIT DIALING (RDD)
The original RDD sampling technique was formally introduced in 1964. The approach involved
identifying area code-prefix combinations assigned
for regular household telephones, and then appending
a random 4-digit number to create a sampled telephone number. This process was repeated until the
desired sample size was achieved. In this scheme,
each number has an equal probability of selection.
Households, on the other hand, have unequal probabilities of selection because they may have more than
one telephone line in the household; thus, the household could be sampled on any of these numbers. But
this inequality is easily remedied by either weighting
adjustments or further subsampling, as is often done
in case-control RDD samples. In the early days, the
area code-prefix combinations were identified from
local telephone directories, but later eligible combinations became available nationally from computerized
files. In this RDD method, all possible telephone
numbers, not just those listed in telephone directories,
could be sampled. Furthermore, by taking advantage
of the area code-prefix combination, the efficiency of
sampling (i.e., contacting a residence with the dialed
number) increased from less than 1% to over 20%.
Alternatives to further improve the efficiency of the
calling were offered, such as the add-a-digit method,
but most of these methods suffered from other problems that limited their acceptance.
Mitofsky-Waksberg (MW) RDD Sampling
The next major advance was named the Mitofsky-Waksberg (MW) RDD sampling technique after its
two developers, Warren Mitofsky and Joseph Waksberg. The method was developed after it was
observed that residential numbers tended to be clustered within 100-banks. More specifically, a large portion of the eligible 100-banks had no residential
numbers assigned, whereas in those with any residential numbers, over 30% of the numbers were residential. This was due largely to the ways local telephone
companies assigned the numbers to customers. The
MW scheme takes advantage of this clustering to
improve calling efficiency using an equal probability,
two-stage sampling scheme. The first stage begins with
the random selection of 100-banks from the frame of
eligible area code-prefixes. Two random digits are
appended to form 10-digit telephone numbers that are
dialed. If a sampled number is not residential, then
the 100-bank is not sampled in the first stage. If the
number is residential, then the 100-bank is sampled,
and a set of additional numbers within the 100-bank
are sampled to achieve a fixed number of residences
within the 100-bank that are sampled. This scheme is
very elegant theoretically because 100-banks are sampled with probability exactly proportional to the number of residential numbers in the bank even though the
number of residential numbers is unknown at the time
of sampling. All households have the same probability
of selection, with the same caveat given previously
concerning multiple lines within the household. The
biggest advantage of the MW method is its efficiency:
About 60% to 65% of the numbers sampled in the
second stage are residential. This was a marked
improvement over the original RDD sampling. Due to
its efficiency, the MW technique became the main
method of RDD sampling for nearly two decades.
List-Assisted Sampling
Despite the popularity of the MW sampling technique, many alternatives were suggested because the
method had some implementation and operational difficulties and the sample was clustered (the 100-banks
are the first-stage clusters). The clustering reduced
the statistical efficiency of the sample. In the early
1990s alternative methods of stratified RDD sampling
were being explored, and technological developments
allowed easy and inexpensive access to data to support
these schemes. The most recent method of RDD sampling, list-assisted RDD sampling, was spawned from
this environment and was quickly adopted by practitioners. In list-assisted RDD sampling all 100-banks in
eligible area code-prefix combinations are classified by
the number of directory-listed telephone numbers in the
bank. A sample of 100-banks with x or more listed telephone numbers is selected (when x = 1 this is called
the listed stratum). The other stratum (called the zero-listed stratum when x = 0) may be sampled at a lower
rate or may not be sampled at all because it contains so
few residential telephone numbers. Commercial firms
developed systems to implement this method of sampling inexpensively. When research showed that
excluding the zero-listed stratum resulted in minimal
noncoverage bias, list-assisted sampling from only the
listed stratum became the standard approach.
The list-assisted method is an equal probability
sampling method that avoids both the operational
and statistical disadvantages of the MW method. The
calling efficiency of the list-assisted technique is
approximately equal to that of the MW method. The
efficiency has decreased over time as there has been
less clustering of residential numbers in 100-banks,
but the reduced clustering affects the efficiency of the MW method as well.
See also Add-a-Digit Sampling; Cluster Sample; List-Assisted Sampling; Mitofsky-Waksberg Sampling; Number Portability; Telephone Surveys; Voice over Internet Protocol (VoIP) and the Virtual Computer-Assisted Telephone Interviewing (CATI) Facility
RANDOM ERROR
Random error refers to the fact that any survey measure taken over and over again may well be different
(by some small amount) upon each measure, merely
due to chance measurement imprecision. Thus, compared to the true value of the measure for a given
respondent, the observed value will be on the high
side some of the time and on the low side other times.
This deviation of the observed value from the true
value often is signified by e in the following formula:
X = T + e,
where X is the observed value and T is the true value.
In theory, over many similar measures of the same
variable taken from the same respondent, the average
of the observed values will equal the true value. That
is, the random error in measuring the variable of interest will deviate from the true value so that, over many measurements, the sum of the deviations is expected to be zero.
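A short simulation illustrates the formula: with a zero-mean error term, the average of many observed values approaches the true value (all numbers below are hypothetical):

import random

T = 50.0        # true value for a given respondent
sigma = 2.0     # spread of the chance measurement imprecision

# Repeated measures X = T + e, with e drawn from a zero-mean distribution.
measures = [T + random.gauss(0.0, sigma) for _ in range(100_000)]

mean_x = sum(measures) / len(measures)
print(f"average observed value {mean_x:.3f} versus true value {T}")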
RANDOMIZED RESPONSE
Researchers who study sensitive topics often are confronted with higher refusal rates and obtain more socially desirable answers. To tackle these problems,
Stanley L. Warner introduced the randomized response
technique (RRT). This is an interview method that
guarantees total privacy and therefore, in theory, can
overcome the reluctance of respondents to reveal sensitive or potentially harmful information. Warner's original
method used a randomization device (usually colored
beads, coins, or dice) to direct respondents to answer
one out of two statements, such as:
A: I am a communist. (A: selected with probability p)
B: I am not a communist. (not-A: selected with probability 1 − p)
Without revealing to the interviewer which statement was selected by the dice, the respondent answers true or not true according to whether or not he or she agrees with the selected statement.
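In the general randomized response literature (stated here for reference, since the entry's own derivation is not reproduced above), with p ≠ 1/2 and λ̂ the observed proportion of true answers among n respondents, Warner's estimator of the prevalence π of the sensitive trait and its estimated variance are:

\hat{\pi} \;=\; \frac{\hat{\lambda} - (1 - p)}{2p - 1},
\qquad
\widehat{\operatorname{Var}}(\hat{\pi}) \;=\; \frac{\hat{\lambda}(1 - \hat{\lambda})}{n\,(2p - 1)^{2}},
\qquad p \neq \tfrac{1}{2}.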
Other Randomized
Response Techniques
Since Warners original idea, many adaptations and
refinements have been developed. First, methods were
developed to improve the power of the design. As
previously noted, compared to a direct question, Warner's method typically needs much larger samples. Variations on Warner's original method have been
developed that have a larger statistical power. These
methods include the unrelated question technique and
the forced response technique. In addition, randomized response methods have been developed that are
easier for the respondent, such as Kuk's card method
and the item count technique.
The unrelated question technique uses a randomization device to direct the respondent to one of two unrelated questions: the sensitive question or an innocuous question (e.g., a question about the respondent's month of birth).
Analysis of Randomized
Response Data
For all randomized response techniques, straightforward analysis techniques exist to estimate the prevalence of the sensitive behavior and the corresponding
sampling variance. For all techniques except the item
count technique, it is also possible to use a modified form of logistic regression to relate the sensitive characteristic to covariates.
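A small simulation (with hypothetical prevalence and design probability) illustrates how the prevalence of a sensitive trait is recovered from the randomized answers under Warner's design:

import random

def simulate_warner(n, true_pi, p):
    # n respondents; true_pi: actual prevalence of the sensitive trait;
    # p: probability the device points to statement A ("I am ...").
    yes = 0
    for _ in range(n):
        has_trait = random.random() < true_pi
        statement_a = random.random() < p     # outcome hidden from interviewer
        answers_true = has_trait if statement_a else not has_trait
        yes += answers_true
    lam = yes / n                             # observed proportion of "true"
    return (lam - (1 - p)) / (2 * p - 1)      # Warner's estimator

print(round(simulate_warner(100_000, true_pi=0.12, p=0.7), 3))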
RANDOM ORDER
Random order refers to the randomization of the order
in which questions appear in a questionnaire. The purpose is to overcome a type of measurement error
known as context effects. This randomization is most
often done using a computer program that controls
a computer-assisted interview (CAI) being conducted
in person, over the phone, or self-administered. Prior to
the common use of CAI in survey research, sets of
questions were randomized using modified Kish tables
that were generated prior to the beginning of the survey
and printed on labels indicating to the interviewers
what order the questions were to be asked. The order
changed randomly for each label to be printed. The
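In a CAI instrument this is typically a one-line operation. A minimal sketch, with hypothetical question IDs:

    # Independent random question order for each respondent.
    import random

    questions = ["Q23", "Q24", "Q25", "Q26", "Q27", "Q28"]

    def random_order(qs):
        """Return a fresh random permutation for one respondent."""
        order = qs[:]            # copy so the master list is untouched
        random.shuffle(order)    # Fisher-Yates shuffle
        return order

    print(random_order(questions))   # e.g., ['Q26', 'Q23', 'Q28', 'Q24', 'Q27', 'Q25']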
RANDOM SAMPLING

Random sampling refers to a variety of selection techniques in which sample members are selected by chance, but with a known probability of selection. Most social science, business, and agricultural surveys rely on random sampling techniques for the selection of survey participants or sample units, where the sample units may be persons, establishments, land points, or other units for analysis. Random sampling is a critical element of the overall survey research design.

This entry first addresses some terminological considerations. Second, it discusses the two main components of random sampling: randomness and known probabilities of selection. Third, it briefly describes specific types of random samples, including simple random sampling (with and without replacement), systematic sampling, and stratification, with mention of more complex designs.

Terminological Considerations

Some authors, such as William G. Cochran, use the term random sampling to refer specifically to simple random sampling. Other texts use the term random sampling to describe the broader class of probability sampling. For this reason, authors such as Leslie Kish generally avoid the term random sampling. In this entry, random sampling is used in the latter sense, referring to the broader class of probability sampling.

Critical Elements

The two critical elements of random sampling are randomness and known probabilities of selection.

Randomness

Sample Designs

Simple Random Sampling
In practice, simple random sampling and systematic sampling are rarely used alone for large surveys,
but are often used in combination with other design
features that make for a more complex design. Complex random sampling designs tend to have smaller
sampling error or lower cost, or both. The complex
random sampling designs may incorporate elements
that resemble purposive sampling, such as stratification. Double sampling, cluster sampling, multi-stage
sampling, and sampling with probability proportional
to size are other examples of complex probability or
random sampling.
Stratified Sampling
Stratification involves dividing the target population into groupings, or strata, of relatively homogeneous members and selecting a random sample independently within each stratum. For example, a list of individuals may be divided into strata based on age and gender, or the population of businesses may be divided into strata based on geography, size, and industry. The variables used for stratification need to be available for all population members and presumably are related to the survey response variables or the propensity to respond. Because members within a stratum tend to be more alike, the emphasis is on selecting enough members to represent each stratum, which typically yields smaller sampling variance than a simple random sample of the same total size.
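A minimal sketch of the selection step, with illustrative strata and allocations:

    # Stratified random sampling: an independent SRS within each stratum.
    import random

    population = [
        {"id": i, "stratum": "small" if i < 700 else "large"} for i in range(1000)
    ]
    sample_sizes = {"small": 50, "large": 30}    # assumed per-stratum allocation

    sample = []
    for stratum, n_h in sample_sizes.items():
        members = [u for u in population if u["stratum"] == stratum]
        sample.extend(random.sample(members, n_h))   # SRS without replacement

    print(len(sample))    # 80 units: 50 from the small stratum, 30 from the large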
Inference
The survey design includes both the sampling technique and the corresponding estimators for inferences.
With scientific random sampling and known probabilities, mathematical formulas exist for making inferences about the target population and for estimating
the sampling error attributed to the inferences. Confidence intervals, tests of hypotheses, and other statistics are possible with random sampling and estimates
of sampling error. While an expert may judiciously
select a sample that is a good representation of the
target population on some measure, a purposive sample of this sort cannot, by itself, be used to estimate
the precision of the sample-based estimates because
no such mathematical formulas are possible. Neither
can the sampling error be estimated from a convenience sample.
Under simple random sampling, the distribution of
the sample mean often approximates the normal distribution, where the variance decreases with sample
size. That is, the sample mean is a good estimator for
the population mean, and the error associated with the
sample-based estimate is smaller for larger samples.
This result is based on the central limit theorem.
Alternatively, in some circumstances, especially
when sample sizes are small, the distribution of
sample statistics may approximate Poisson, hypergeometric, or other distributions. The approximate
distribution is what enables the researcher to make
inferences about the population based on sample
estimates.
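A minimal sketch of such an inference, using fabricated data, the normal approximation, and the finite population correction:

    # A 95% confidence interval for the population mean under SRS.
    import math, random

    N = 10_000                                   # population size (assumed)
    population = [random.gauss(100, 15) for _ in range(N)]
    n = 400
    sample = random.sample(population, n)        # SRS without replacement

    mean = sum(sample) / n
    s2 = sum((x - mean) ** 2 for x in sample) / (n - 1)
    se = math.sqrt((1 - n / N) * s2 / n)         # fpc shrinks the standard error
    ci = (mean - 1.96 * se, mean + 1.96 * se)    # normal approximation via the CLT
    print(tuple(round(v, 2) for v in ci))        # interval around 100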
Complex probability designs may have more complex mathematical forms for statistical inference and may require specialized software to properly handle the estimation. Some methods of variance estimation developed for complex designs may be less stable in some circumstances, an issue to be aware of as the field moves toward smaller samples and estimation using subsets of the sample. Nevertheless, inference remains well founded under complex designs as long as the design features are properly reflected in the estimation.
Further Readings
RANDOM START
The term random start has two separate meanings in survey research: one related to questionnaire item order and one related to sampling. This entry discusses both meanings.
Random Start in
Questionnaire Construction
In terms of questionnaire item order, a random start
means that a series of similar items (such as a set of
agree-disagree questions) is administered in a way that
the first question in the series that is asked of any one
respondent is randomly chosen to start the series,
and then the order of the subsequent questions is not
randomized. For example, if Q23 through Q28 (six
items in all) made up a set of attitude questions that use
an agree-disagree response scale, by using a random
start a researcher will have one sixth of the respondents
asked Q23 first, one sixth asked Q24 first, one sixth
Q25 first, and so on. All respondents are asked all the
questions in the series, and each question follows its
numerical predecessor, except when it is the first item
asked in the series; for example, Q26 always follows
Q25, except when Q26 is the first question asked (then
Q25 is asked last as the random start order of the set
would be Q26-Q27-Q28-Q23-Q24-Q25). The purpose
of using a random start is to help control for possible
item order effects, so that not all respondents will be
asked the questions in the exact same order.
A random start set of questions differs from a random order set of questions in that the latter randomizes all questions within the set in all possible ways and then randomly assigns a respondent to one of the possible randomized orders. Random start questions are more readily analyzed and interpreted than random order questions, especially when there are more than a few questions in the set. Thus, for example, even with a set of 10 items using a random start deployment, a researcher with an overall sample size of 1,000 would have 100 respondents for each of the 10 random start orders in which the items would be asked. That would allow the researcher to investigate for order effects in how the random start series of questions were answered. In the case of a set of 10 items asked in a true random order, the permutations would be so great (10! = 3,628,800 possible orders) that the researcher could not accumulate enough respondents per order to analyze order effects.
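A random start is a rotation rather than a full shuffle, which is why only k orders (rather than k!) are possible. A minimal sketch, reusing the hypothetical Q23-Q28 series:

    # Random start: rotate the series from a random entry point.
    import random

    items = ["Q23", "Q24", "Q25", "Q26", "Q27", "Q28"]

    def random_start_order(qs):
        start = random.randrange(len(qs))    # each item leads 1/k of the time
        return qs[start:] + qs[:start]       # cyclic order is preserved

    print(random_start_order(items))   # e.g., ['Q26', 'Q27', 'Q28', 'Q23', 'Q24', 'Q25']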
RANKED-SET SAMPLING (RSS)

To provide a concrete illustration of how RSS is conducted, we consider the setting where our goal is to estimate the unknown population mean, \(\bar{X}\), using the information in n measured observations. RSS takes
advantage of available information from additional
potential sample units to enable us to measure
selected units that are, collectively, more representative of the population of interest. The net result of
RSS is a set of measurements that are more likely to
span the range of values in the population than can be
guaranteed from SRS. Following is a more precise
description of this process.
Suppose we wish to obtain a ranked-set sample of
k measured observations. First, an initial simple random sample of k units from the population is selected
and rank-ordered on the attribute of interest. This
ranking can result from a variety of mechanisms,
including expert opinion (called judgment ranking),
visual comparisons, or the use of easy-to-obtain auxiliary variables; it cannot, however, involve actual measurements of the attribute of interest on the sample
units. The unit that is judged to be the smallest in this
ranking is included as the first item in the ranked-set
sample, and the attribute of interest is formally measured on this unit.
Denote the measurement obtained from the smallest item in the rank-ordered set of k units by \(x^*_1\). The remaining k − 1 unmeasured units in the first random sample are not considered further in the selection of this ranked-set sample or eventual inference about the population. The sole purpose of these other k − 1 units is to help select an item for measurement that represents the smaller values in the population.
Next, a second independent random sample of size
k is selected from the population and judgment ranked
without formal measurement on the attribute of interest. This time we select the item judged to be the second smallest of the k units in this second random
sample and include it in our ranked-set sample for
measurement of the attribute of interest. This second
measured observation is denoted by \(x^*_2\).
From a third independent random sample we select the unit judgment ranked to be the third smallest, \(x^*_3\), for measurement and inclusion in the ranked-set sample. This process is continued until we have selected the unit judgment ranked to be the largest of the k units in the kth random sample, denoted by \(x^*_k\), for measurement and inclusion in the ranked-set sample.
The entire process described above is referred to as a cycle, and the number of observations in each random sample, k, is called the set size. Thus, to complete a single ranked-set cycle, we judgment rank k independent random samples of size k each, involving a total of \(k^2\) sample units to obtain k measured observations \(x^*_1, x^*_2, \ldots, x^*_k\). These k observations represent a balanced ranked-set sample with set size k, where balanced refers to the fact that we have collected one judgment order statistic for each of the ranks i = 1, ..., k. To obtain a ranked-set sample with a desired total number of measured observations n = km, we repeat the entire cycle process m independent times, yielding the data \(x^*_{1j}, \ldots, x^*_{kj}\), for j = 1, ..., m. The RSS estimator of the population mean \(\bar{X}\) is

\[
\bar{x}_{\mathrm{RSS}} = \frac{1}{km} \sum_{j=1}^{m} \sum_{i=1}^{k} x^*_{ij}.
\]
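A minimal simulation sketch of a balanced RSS design; the ranking step below uses the true values, which corresponds to the ideal perfect-ranking case, and the population parameters are illustrative:

    # One balanced ranked-set sampling cycle with set size k inspects k^2
    # units but measures only k of them.
    import random

    def rss_cycle(population, k):
        """Return k measured observations from one RSS cycle."""
        measurements = []
        for i in range(k):                     # ith set yields the ith order statistic
            s = random.sample(population, k)   # independent SRS of size k
            s.sort()                           # 'judgment ranking' (perfect here)
            measurements.append(s[i])          # measure only the ith-ranked unit
        return measurements

    population = [random.gauss(50, 10) for _ in range(10_000)]
    k, m = 4, 25                               # set size, cycles: n = km = 100
    data = [x for _ in range(m) for x in rss_cycle(population, k)]
    rss_mean = sum(data) / len(data)           # the RSS estimator of the mean
    print(round(rss_mean, 2))                  # close to 50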
Further Readings
RANKING
There are a variety of formats that can be used in asking survey questions, from items that require a simple
yes or no response to other types of forced-choice
items, rating scales, and multi-part items in which
respondents' opinions are determined through a series
of questions. Ranking is a question response format
used when a researcher is interested in establishing
some type of priority among a set of objects, whether
they be policies, attributes, organizations, individuals,
or some other topic or property of interest.
For example, if city officials were interested in
whether citizens thought it was most important to
improve city services in the area of police protection,
fire protection, garbage and trash collection, road
maintenance and repair, or parks and recreation, one
method they could use would be an open-ended question that asked respondents, "What service that the city provides do you think is most important to improve in the next 12 months?" The percentage of
respondents that mentioned each of these services
could then be used as a measure of priority for service
improvement. Another method for determining this
would be to ask a series of rating questions in which
respondents were asked whether it was very important, somewhat important, not too important, or not at
all important for the city to improve services in each
of these five areas, and the option that received the
highest percentage of very important responses or
had the highest average rating would have priority for
service improvement. A third way to accomplish this
would be through a ranking question, for example:
"The city provides a number of services, including police protection, fire protection, garbage and trash collection, road maintenance and repair, and parks and recreation. Of these, which do you think is most important for the city to improve in the next 12 months?" After respondents selected their most important service, they would then be asked, "And which service is next most important for the city to improve?" and so on, until all five services had been ranked.
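Ranking data of this kind are often summarized by mean ranks, where a lower mean rank indicates higher priority. A minimal sketch with fabricated rankings from three respondents:

    # Summarizing ranking data by mean rank (1 = most important).
    rankings = [
        {"police": 1, "fire": 2, "trash": 3, "roads": 4, "parks": 5},
        {"roads": 1, "police": 2, "fire": 3, "parks": 4, "trash": 5},
        {"police": 1, "roads": 2, "trash": 3, "fire": 4, "parks": 5},
    ]

    services = rankings[0].keys()
    mean_rank = {s: sum(r[s] for r in rankings) / len(rankings) for s in services}
    for s, mr in sorted(mean_rank.items(), key=lambda kv: kv[1]):
        print(s, round(mr, 2))    # police first (1.33), parks last (4.67)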
RARE POPULATIONS
A rare population is generally defined as a small proportion of a total population that possesses one or
more specific characteristics. Examples include billionaires; people with a certain illness, such as gall
bladder cancer; or employees in a highly technical
occupation. Although the literature offers no precise
definition of rare or small in this context, researchers
have proposed proportions of .10 or less to identify
rare populations. (When this proportion is larger,
standard sampling techniques can usually be used
efficiently.) In addition, sampling frames are often
nonexistent or incomplete for most rare populations.
Although researchers can use convenience sampling
(e.g., snowball samples) to study rare populations,
most efforts in this area have focused on probability
sampling of rare populations. The costs and benefits
of the various approaches can be difficult to define
a priori and depend on the type and size of the rare
population.
Sampling Strategies
Generally, sampling frames for the total population
do not contain information identifying members of
the rare population; if they do, then the sampling process is trivial. If not, then screening must be used to
identify members of the group. With screening, members of the rare population are identified at the beginning of the interview process. However, the costs of
screening can often exceed the costs of interviewing,
especially if the rare population is only a very small
proportion of the sampling frame.
There are several ways in which screening costs
can be reduced. Mail questionnaires can be used
to identify members of the rare population if the
sampling frame has correct address information; for
in-person interviews, the use of telephone screening
can reduce interviewer costs. If the rare population is
geographically clustered, for example, in certain
states or urban areas, screening based on clusters
becomes less costly.
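The arithmetic behind these screening cost concerns is simple; all prevalence and cost figures in the sketch below are assumptions for illustration.

    # Expected screening effort for a rare population.
    prevalence = 0.02           # 2% of the frame belongs to the rare population
    target_interviews = 500
    cost_per_screen = 5.0       # cost of one screening contact
    cost_per_interview = 40.0

    expected_screens = target_interviews / prevalence    # 25,000 contacts
    total_cost = (expected_screens * cost_per_screen
                  + target_interviews * cost_per_interview)
    print(expected_screens, total_cost)   # 25000.0 screens, 145000.0 total

Here the screening budget ($125,000) dwarfs the interviewing budget ($20,000), which is the pattern described above.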
Besides cost, another potential drawback to the
screening approach is response errors of commission and omission (i.e., false positives and false negatives, respectively) during the screening process,
especially if many different questions must be correctly answered in order to identify a member of the
rare population. Using less stringent criteria during
screening to identify members is one approach to
reducing response errors, because misclassified members (false positives) can be excluded after the full
interview is complete. The main problem here is
devising a screening process that avoids erroneously
classifying members of the rare population as nonmembers (false negatives).
Whereas for many rare populations it is impossible
to derive a complete sampling frame, one or more
incomplete lists may be available. For example, hospital or pharmacy records may identify some, but not
all, members of a population with a specific rare disease. A dual frame approach could then be used, in
which the partial list is combined with screening of
the total population to reduce screening costs. Alternatively, the partial list could be used with a cluster
sampling approach to identify areas where members
of the rare population are located.
In some situations multiplicity or network sampling can be used to locate and interview members of a rare population. Typically, a member of a household is interviewed and queried as to whether other household or family members belong to the rare population. Estimation methods also have been developed for adaptive samples that allow generalization. Generally, these methods depend on assumptions about the probability distribution generating the sample, although recent work has attempted to relax this restriction.
Stephen R. Porter
See also Adaptive Sampling; Cluster Sample; Convenience
Sampling; Dual-Frame Sampling; Errors of Commission;
Errors of Omission; Multiplicity Sampling; Network
Sampling; Screening; Snowball Sampling
Further Readings
RATING
In constructing a survey questionnaire there are a number of response formats that can be used in asking
respondents about their views on various topics. One
of the most commonly used formats in survey questionnaires is the rating, or rating scale.
As the name implies, ratings are regularly used in
surveys to evaluate an object along some dimension,
such as performance or satisfaction. In a typical rating
question, respondents are asked to make judgments
along a scale varying between two extremes, such as
from very good to very poor, or extremely positive to extremely negative, and the like. The following illustrates a common rating scale question:
"How would you rate the job (PERSON'S NAME) is doing as governor... Would you say she is doing an excellent job, a good job, a fair job, a poor job, or a very poor job?"
Further Readings
RATIO MEASURE
Ratio measure refers to the highest (most complex)
level of measurement that a variable can possess. The
properties of a variable that is a ratio measure are the
following: (a) Each value can be treated as a unique
category (as in a nominal measure); (b) different values
have order of magnitude, such as greater than or less
than or equal to (as in an ordinal measure); (c) basic
mathematical procedures can be conducted with the
values, such as addition and division (as with an interval measure); and (d) the variable can take on the value
of zero. An example of a ratio measure is someone's annual income.

A ratio measure may be expressed as either a fraction or a percentage; in addition, a ratio measure may be written as two numbers separated by a colon. For example, if Susan earns $40 per hour and John earns $20 per hour, then the fraction of Susan's pay that John earns can be expressed as 20/40 (i.e., 1/2), the percentage of Susan's pay that John earns as 50%, and the ratio of John's pay to Susan's as 1:2.
There are many situations in which a ratio measure
is more appropriate than a total or mean or other
descriptive statistic when reporting the results of a survey. Ratio measures are utilized in the business world,
in health care, in banking, and in government, as well
as in other applications.
In the business world, there are a number of applications involving ratio measures. For example, suppose that one purpose of a survey of a sample of
businesses is to assess liquidity. A commonly used
measure of liquidity is the current ratio measure; this
ratio measure is an important indicator of whether the
company can pay its bills and remain in business. It is
calculated by dividing a company's total current
assets by total current liabilities for the most recently
reported quarter. Thus, if a business has total current
assets of $450,000 and total current liabilities of
$200,000, then its liquidity ratio measure would be
$450,000/$200,000 = 2.25. In a survey, however, a number of businesses are sampled, and each reports its total current assets and total current liabilities. To
determine the liquidity of the sample of businesses,
one would calculate the total of the reported assets of
all of the sampled businesses and divide by the total
of the reported liabilities of the same businesses. That
is, if the total assets of the sample equal $10 billion and the total liabilities equal $5 billion, then the current ratio measure for the sample of businesses would be $10 billion/$5 billion = 2.0.
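A minimal sketch of this computation with fabricated figures for three businesses; note that the combined measure totals first and divides second, which is not the same as averaging the individual businesses' ratios:

    # Ratio-of-totals liquidity measure for a sample of businesses.
    assets = [450_000, 300_000, 250_000]
    liabilities = [200_000, 300_000, 100_000]

    ratio_of_totals = sum(assets) / sum(liabilities)     # 1,000,000 / 600,000
    mean_of_ratios = sum(a / l for a, l in zip(assets, liabilities)) / len(assets)
    print(round(ratio_of_totals, 3), round(mean_of_ratios, 3))   # 1.667 vs. 1.917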
Further Readings
RAW DATA
The term raw data is used most commonly to refer
to information that is gathered for a research study
before that information has been transformed or analyzed in any way. The term can apply to the data as
soon as they are gathered or after they have been
cleaned, but not in any way further transformed or
analyzed. The challenge for most researchers who
collect and analyze data is to extract useful information from the raw data they start with.
First, it is important to note that data is a plural
noun (datum is the singular noun). Second, all collections of raw data are, by definition, incomplete. Some
data are uncollected because of a variety of constraints.
Lack of time and money are common reasons researchers fail to collect all available raw data to answer
a given research question. Increasingly, however, data
collection has become easier and more efficient in
many ways, and more and more of the data that are
intended to be collected are collected. (Of note,
a counter trend in survey research has led to a decrease
in the amount of intended data that are able to be collected. This is the trend for sampled respondents to
be more reluctant to cooperate with survey requests.)
Absent the ability to process this additional raw data,
merely having more data may not increase knowledge,
but enhanced computing power has produced an ability
to process more data at little or no additional cost.
Third, the expression raw data implies a level of processing that suggests these data cannot provide useful information without further effort. Raw suggests the data have not yet been summarized or otherwise analyzed.
REACTIVITY
Reactivity occurs when the subject of the study (e.g., a survey respondent) is affected either by the instruments used in the study or by the awareness of being studied.
Further Readings
RECENCY EFFECT
A recency effect is one type of response order effect,
whereby the order in which response options are
offered to respondents affects the distribution of
responses. Recency effects occur when response
options are more likely to be chosen when presented
at the end of a list of response options than when presented at the beginning. In contrast, primacy effects
occur when response options are more likely to be
chosen when presented at the beginning of a list of
response options than when presented at the end.
Response order effects are typically measured by presenting response options in different orders to different groups of respondents. For example, half of the respondents in a survey might be asked, "Which of the following is the most important problem facing the country today: the economy or lack of morality?" and the other half, "Which of the following is the most important problem facing the country today: lack of morality or the economy?" Any difference between the two response distributions then indicates an order effect.
Further Readings
RECODED VARIABLE
Analysis with survey research data often first requires
recoding of variables. Recode is a term used to
describe the process of making changes to the values
of a variable. Rarely can researchers proceed directly
to data analysis after receiving or compiling a raw
data set. The values of the variables to be used in the
analysis usually have to be changed first. The reasons
for recoding a variable are many. There may be errors
in the original coding of the data that must be corrected. A frequency distribution run or descriptive statistics may reveal out-of-range values that need to be corrected, or continuous values that need to be collapsed into analysis categories.
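A minimal sketch of two common recodes, with hypothetical variable names and codes:

    # Recoding: correcting a stray code and collapsing values into categories.
    import pandas as pd

    df = pd.DataFrame({"age": [23, 67, 45, 31], "party": [1, 2, 9, 1]})

    # A frequency run showed 9 is not a valid party code; set it to missing.
    df["party"] = df["party"].replace({9: pd.NA})

    # Collapse a continuous variable into analysis categories.
    df["age_group"] = pd.cut(df["age"], bins=[17, 29, 49, 120],
                             labels=["18-29", "30-49", "50+"])
    print(df)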
Further Readings
RECONTACT
The term recontact has more than one meaning in
survey research. It is used to refer to the multiple
attempts that often must be made to make contact
with a sampled element, such as a household or person, in order to gain cooperation and gather data and
achieve good response rates. Recontacting hard-to-reach respondents is the most effective way to reduce
survey nonresponse that is due to noncontact. This
holds whether the mode of contact is via multiple
mailings, multiple emails, multiple telephone calls, or
multiple in-person visits.
The other meaning of recontact in survey research
refers to the efforts that are made to validate completed
interviews soon after they have been completed,
by having a supervisory staff member recontact the
respondent to validate that the interview was in fact
completed and that it was completed accurately. These
recontacts often are conducted via telephone even
when the original interview was completed by an interviewer face-to-face with the respondent. Recontacts of
this type are reasoned to reduce the likelihood of interviewer falsification, assuming that interviewers know
that such quality assurance efforts are carried out by
survey management.
Paul J. Lavrakas
See also Falsification; Noncontacts; Nonresponse; Quality
Control; Response Rates; Validation
Further Readings
RECORD CHECK
A record check study is one in which the validity of
survey self-reports is assessed by comparing them to
evidence in organizational records. Record check
studies have been employed to evaluate the quality of
data from existing surveys and also to test the relative
quality of data from different prospective survey
designs. Such investigations are often viewed as the
gold standard for judging the validity of self-report
information on behavior and experiences.
Record checks are infrequently conducted because
of the limited availability of record information, as well
as the cost and effort of obtaining and matching record
and survey data for the same individuals. Although
there are many potential sources of records, in practice
it can be difficult to gain access to information that
gives insight into the quality of survey data. The cooperation of record-holding organizations is required and,
unfortunately, often hard to obtain. Records must be
collected and preserved in a manner that permits investigators to use the information to make valid inferences. Organizations assemble records for their own
aims, not for the purpose of validating survey data.
Thus, information that survey investigators deem
important may be missing or poorly maintained in
records. Records may be organized in such a way that
they require substantial transformation before they can
be employed in a record check study. Finally, records
are not kept on many things about which survey investigators are interested; record check studies are therefore confined to those matters in which human
behavior comes in contact with bureaucratic systems.
Other important behaviors, as well as attitudes, perceptions, and other mental constructs cannot be examined
through the record check technique.
These points notwithstanding, it is clear that evidence from record check studies has been influential in
understanding reporting errors and in designing important surveys. Record checks have been conducted in
a number of contexts, for example, health, labor, voting, and victimization. In research done in connection
with the National Health Interview Survey, medical
records were employed to check the quality of reporting of doctor visits, hospitalizations, and morbidity.
Records were obtained from physicians and hospitals
detailing the date of medical encounters, along with
the reasons for the visits. Respondents were then interviewed about their health experiences, and the survey
reports were matched against the records. Comparisons
revealed, for example, whether respondents reported
a recorded medical contact or not. A result of one of
these studies was that the reference period for the question on doctor visits was set at 2 weeks before the
interview because comparison of survey reports with
visit records showed that accuracy of recall was
sharply lower for longer reference periods.
Another important series of record checks involved
the American National Election Study. For several
post-election waves of this biennial survey, respondent
reports of voting were checked against public voting
records. In this case, respondents were asked about
their voting first and then interviewers attempted to
verify the reports by examining voting records in
county clerk offices. These comparisons suggested that
there is an overreporting bias in self-reported voting.
More people said that they voted than actually did.
Since these studies were conducted, there have been
many papers written on the possible causes and consequences of vote overreporting. In addition, the American National Election Study question used to measure
voting was modified in an attempt to reduce the
hypothesized social desirability bias attached to self-report of voting. Finally, there have been investigations
of whether statistical models predicting voting differed
in their predictive power if the analysis was confined
to only valid votes (those reported votes that matched the official records) or also included the invalid votes.
The findings suggest that the models performed equivalently in these cases.
In the development and the redesign of the National
Crime Victimization Survey, police records were
employed to validate reports of victimization. People
who had reported crimes to the police were interviewed and asked if they had been victimized. The
comparison of record information and self-reports suggested that victimization is underreported, particularly
(as in the case of medical events) as the time between
the police report and the survey interview increased
and for less serious crimes. These findings influenced
the designers of the National Crime Victimization Survey to set a 6-month reference period for asking questions about victimization and to employ a variety of cues in the screening questions to aid respondents' recall.
REFERENCE PERIOD
The reference period is the time frame for which survey respondents are asked to report activities or experiences of interest. Many surveys intend to measure frequencies of events or instances within a given period of time; for example, "How many times did you consult a medical practitioner during the last two months?" or "Think about the 2 weeks ending yesterday: have you cut down on any of the things you usually do about the house, at work, or in your free time because of illness or injury?" Most of the time, the reference period starts at some point in the past and ends at or shortly before the day of the interview.
2. Besides recall loss and telescoping, lengthy reference periods have additional disadvantages because
they increase the time until results can be reported. For
example, in an ongoing panel study on crime victimization, a 12-month reference period requires a longer
period of time between the first occurrence of a given
victimization and the availability of the report.
3. When it comes to answering a frequency question, respondents make use of at least two different
strategies to generate a reasonable response: They
either recall every single event that has occurred in the
reference period and count the number of instances, or
they estimate the number of instances using various
estimation strategies (rate-based estimation, guessing), depending on the characteristic of the event in
question (similar vs. dissimilar, regular vs. irregular).
Generally speaking, recall and count is considered
to be advantageous compared to estimating in terms
of the validity of the response. Even though the literature offers a great variety of findings, it is safe to
assume that with longer reference periods, the proportion of respondents who estimate the number of events
increases.
Considering these implications, survey researchers
must balance the size of the variance (and cost)
against the biases and disadvantages associated with
a lengthy reference period.
Marek Fuchs
See also Bounding; Respondent-Related Error; Telescoping
Further Readings
REFUSAL
Further Readings
REFUSAL AVOIDANCE
Because refusals make up a large portion of survey nonresponse, researchers want to minimize their occurrence as much as possible. Refusal avoidance is the researcher's awareness of, and efforts to eliminate or mitigate, factors that influence potential respondents toward refusing an invitation to participate in a survey. The relevance of concerted efforts to lower refusal rates is that the researcher is trying to reduce survey error through improvement in response rates, which in turn may lower the potential for nonresponse bias.
REFUSAL AVOIDANCE
TRAINING (RAT)
Research interviewers have the difficult task of obtaining the cooperation of respondents. Successful interviewers must be able to counter the objections of reluctant respondents in the opening moments of contact. When effectively developed and administered, refusal avoidance training may substantially increase an interviewer's toolbox of possible effective responses
to stated concerns from reluctant respondents and
ingrain quick responses.
Charles D. Shuttles
See also Interviewer Training; Refusal; Refusal Avoidance;
Tailoring
Further Readings
REFUSAL CONVERSION
Refusal conversions are the procedures that survey
researchers use to gain cooperation from a sampled
respondent who has refused an initial survey request.
Refusal conversion may include different versions of
the survey introductions and other written scripts or
materials (e.g., cover letters), study contact rules,
incentives, and interviewer characteristics and training. This is a common procedure for many surveys,
but it requires careful consideration of the details of
the refusal conversion efforts and the potential costs
versus the potential benefits of the effort.
The goal of converting initial refusals is to raise
the survey response rate, under the assumption that
this may lower the potential for refusal-related unit
nonresponse error. The research literature contains
reports of successfully converting refusals in telephone surveys between 5% and 40% of the time.
There is little reported in the research literature about
whether refusal conversion efforts are effective in
reducing nonresponse bias.
Gaining cooperation from a potential respondent
during initial contacts is much more effective than
attempts to convert that respondent after an initial
refusal has been encountered. Thus, all researchers
Refusal Conversion
should pay attention to survey design and administration to effectively maximize cooperation and minimize refusals for initial contacts. However, despite
researchers' best efforts to avoid refusals, they will
still occur; thus, refusal conversion is an avenue to
gain some amount of completed questionnaires from
initial refusers. The basic procedures used to carry out
the administration of the survey may need to be modified during refusal conversion attempts.
Refusal conversions often are attempted in surveys
that are conducted via mail or the Internet. In these
survey modes, refusal conversion essentially is part
and parcel of the larger process of recontacting
respondents who have not yet returned a completed
questionnaire under the assumption that some portion
of them initially have decided to refuse to do so.
Procedures for doing these follow-up contacts of initial nonresponders in mail and Internet surveys are
described elsewhere in this encyclopedia and will not
be discussed here in any detail. Instead, the remainder
of this entry focuses mostly on refusal conversion as
applied in face-to-face and telephone interviewing.
Approaches
There are two basic ways to structure the refusal conversion process: (1) ignoring the fact that a respondent
has refused an initial survey request and using the
exact same survey request approach, or (2) implementing a revised or different approach when trying to convert the initial refusal. The first approach relies upon
the hope that the timing will be more opportune or the
respondent's inclination to participate will be more
favorable when recontacted than at the time of the first
contact. When using this approach, the researcher
should modify survey procedures by designing recontact rules that use as much lag time (i.e., amount of
time between the initial refusal attempt and the subsequent refusal conversion attempt) as possible. If the
field period allows, lag time of 5 to 14 days has been
found to increase the likelihood of encountering a different respondent (in household or organizational surveys) or reaching the original respondent at a more
convenient or favorable time.
Many survey organizations use the second approach
of doing a more extensive modification of survey procedures for subsequent contacts to a respondent after the
initial refusal. For mail or Internet mode written survey
requests, this may mean using different text in cover letters that acknowledges the previous attempt, addresses the concerns the respondent raised, and possibly offers an additional incentive.
Recontact Determination
Deciding which initially refusing cases to try to convert is also an important part of the refusal conversion process. In almost all interviewer-administered
surveys, the interviewer should never attempt to convert all initial refusers, because some of them will
have refused so vehemently or pointedly as to make
recontact wholly inappropriate. For example, it is generally agreed among survey professionals that any initial refuser who states something to the effect of "Don't ever contact me again!" should not be subjected to
a refusal conversion attempt. The use of a refusal
report form can be very helpful in determining which
initial refusers to try to convert. This can be done
manually by supervisory personnel reviewing the
prior refusals, or if the data from the refusal report
form are entered into a computer, an algorithm can be
devised to select those initial refusers who should
be tried again and select out those that should not be
contacted again.
Further Readings
REFUSAL RATE
The refusal rate is the proportion of all potentially
eligible sample cases that declined the request to
be interviewed. Before calculating the refusal rate,
researchers must make some decisions about how to
handle the types of nonresponse in the calculation.
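A sketch of how such rates can be computed, in the spirit of the refusal rate formulas (REF1, REF2, REF3) in AAPOR's Standard Definitions; the case categories and formulas here are paraphrased and should be checked against that document.

    # Refusal rates under three treatments of unknown-eligibility cases.
    def refusal_rates(I, P, R, NC, O, UH, UO, e):
        """I: interviews; P: partials; R: refusals; NC: noncontacts; O: other;
        UH, UO: unknown eligibility; e: estimated eligible share of unknowns."""
        known = (I + P) + (R + NC + O)
        ref1 = R / (known + (UH + UO))       # all unknowns treated as eligible
        ref2 = R / (known + e * (UH + UO))   # only an estimated share e is eligible
        ref3 = R / known                     # unknown-eligibility cases excluded
        return ref1, ref2, ref3

    print([round(r, 3) for r in refusal_rates(
        I=600, P=50, R=200, NC=100, O=50, UH=80, UO=20, e=0.5)])
    # [0.182, 0.19, 0.2]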
REFUSAL REPORT FORM (RRF)
RRFs capture structured information about all individual refusals that most often is lost when interviewers
are given only informal instructions to write down
notes about the refusals if they think it is appropriate.
RRFs are used in mediated interviews, that is,
those that are conducted by interviewers, either in person or on the phone. Information about the refusal is
recorded based on estimates that interviewers make
either visually (for in-person) or audibly (for telephone and in-person).
There is no standardized format for an RRF. (Paul J.
Lavrakas appears to have been the first to describe the
use and value of such a form and to show an example
of it in a book on telephone survey methods.) Researchers develop RRFs with variables that are most relevant
to the study and that are reasonable to be estimated by
interviewers given the situation. For example, in an in-person survey, an interviewer may be able to provide
information about the type of home, neighborhood setting, and so forth. Telephone interviewers could not
begin to estimate those details from a brief phone conversation, but past research has shown that they can provide
accurate estimates of certain demographic characteristics for the person being spoken to.
In any case, the RRF usually tries to capture two
types of information, linked to the two benefits specified previously. Demographic information about the
refuser (e.g., gender, age, race) and details about the
context of the refusal (e.g., strength of the refusal, reasons given for refusal, perceived barriers to participating, etc.) may help interviewers, in future attempts, to
convert that refuser into a cooperating respondent.
Given that the vast majority of refusals in telephone
surveys typically occur in the first few seconds of the
interviewer-respondent interaction, the interviewer has
very little time to develop rapport, anticipate potential
barriers, and alleviate respondent objections. Although
there has been little published about the use of RRFs,
research by Sandra L. Bauman, Daniel M. Merkle, and
Paul J. Lavrakas suggests that telephone interviewers
can accurately make estimates of gender, race, and age
in a majority of cases, even when the interactions are
brief. These estimates can be used to help determine
the presence of nonresponse bias.
Interviewers are also able to provide details about
the refusal that can help survey management determine which cases should be targeted for refusal conversion attempts.
Further Readings
REGISTRATION-BASED
SAMPLING (RBS)
Registration-based sampling (RBS) is a sampling
frame and sampling technique that has been used,
with growing frequency in the past two decades, for
conducting election polls. RBS frames for a given
geopolitical area can be built by researchers using the lists of registered voters maintained by election officials.
Further Readings
REGRESSION ANALYSIS
Regression analysis is the blanket name for a family
of data analysis techniques that examine relationships between variables. The techniques allow survey
researchers to answer questions about associations
between different variables of interest. For example,
how much do political party identification and Internet usage affect the likelihood of voting for a particular candidate? Or how much do education-related
variables (e.g., grade point average, intrinsic motivation, classes taken, and school quality) and demographic variables (e.g., age, gender, race, and family
income) affect standardized test performance? Regression allows surveyors to simultaneously look at the
influence of several independent variables on a dependent variable. In other words, instead of having to calculate separate tables or tests to determine the effect
of demographic and educational variables on test performance one variable at a time, regression estimates all of these effects simultaneously in a single model.
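A minimal sketch of such a model fit by ordinary least squares on fabricated data; the variable names and coefficient values are illustrative only.

    # Multiple regression: two predictors of test performance, fit by OLS.
    import numpy as np

    rng = np.random.default_rng(0)
    n = 500
    gpa = rng.normal(3.0, 0.5, n)
    income = rng.normal(60, 20, n)                                # in $1,000s
    test = 400 + 50 * gpa + 1.5 * income + rng.normal(0, 25, n)   # data-generating model

    X = np.column_stack([np.ones(n), gpa, income])    # intercept + predictors
    beta, *_ = np.linalg.lstsq(X, test, rcond=None)   # simultaneous estimation
    print(np.round(beta, 2))    # approximately [400, 50, 1.5]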
REINTERVIEW
A reinterview occurs when an original respondent is recontacted by someone from a survey organization (usually not the original interviewer) and some or all
of the original questions are asked again. Reinterviewing can serve more than one purpose in survey
research, including (a) verifying that interviews were
actually completed as the researchers intended with
sampled respondents, (b) checking on the reliability
of the data that respondents provided when they were
originally interviewed, and (c) further studying the
variance of survey responses.
As part of quality assurance efforts to monitor the
quality of survey data collection when the questionnaire is interviewer-administered, some part of the
recontacts made to selected respondents may include
asking some of the original questions again, especially
the questions that the researchers deem as key ones.
Although there are many reasons why a respondent
may not provide the exact same answer during the
reinterview, including some legitimate reasons, the
purpose of re-asking some of the questions is not to catch the respondent in an error but to assess the reliability of the original data and the quality of the interviewing.
Further Readings
RELATIVE FREQUENCY
Relative frequency refers to the percentage or proportion of times that a given value occurs within a set of
numbers, such as in the data recorded for a variable in
a survey data set. In the following example of a distribution of 10 values (1, 2, 2, 3, 5, 5, 7, 8, 8, 8), while
the absolute frequency of the value, 8, is 3, the relative frequency is 30% as the value, 8, makes up 3 of
the 10 values. In this example, if the source of the
data has a wider range of possible scores than the
observed values (such as a 0-to-10 survey scale), then it is permissible to report that some of the possible values
(e.g., 0, 4, 6, 9, and 10) were not observed in this set
of data and that their respective relative frequencies
were zero (i.e., 0%).
In survey research, the relative frequency is a much
more meaningful number than is the absolute frequency.
For example, in a news article using results from a poll
of 800 citizens, it is more meaningful to know that
approximately two thirds of them (67.5%) are dissatisfied with the job the president is doing than to know that
540 citizens who were polled think this way.
Relative frequency can be displayed in a frequency table, which displays each value in a distribution ordered from lowest to highest, along with the absolute and cumulative frequencies associated with each value. Relative frequency also can be displayed graphically in a bar graph (histogram) or pie chart.
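A minimal sketch of the computation for the distribution used in this entry:

    # Absolute vs. relative frequency.
    from collections import Counter

    values = [1, 2, 2, 3, 5, 5, 7, 8, 8, 8]
    counts = Counter(values)                             # absolute frequencies
    n = len(values)
    rel = {v: counts[v] / n for v in sorted(counts)}     # relative frequencies
    print(counts[8], rel[8])                             # 3 and 0.3 (i.e., 30%)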
Paul J. Lavrakas
See also Frequency Distribution; Percentage Frequency
Distribution
RELIABILITY
The word reliability has at least four different meanings. The first of these is in an engineering context,
where reliability refers to the likelihood that a piece
of equipment will not break down within some specified period of time. The second meaning is synonymous with dependability, as in "She is a very reliable employee" (i.e., she is a good worker). The third has to do with the sampling variability of a statistic. A percentage, for example, is said to be reliable if it varies little from sample to sample, that is, if it has a small standard error. The fourth meaning, the one of principal interest in survey measurement, concerns the consistency of a measuring instrument. In recent years there have appeared a variety of other approaches to the reliability of measuring instruments. One of these is based
upon generalizability theory; another is based upon
item-response theory; a third is based upon structural
equation modeling.
Thomas R. Knapp
See also Intercoder Reliability; Item Response Theory;
TestRetest Reliability; Validity
Further Readings
In a repeated cross-sectional design with independent samples, at each occasion within the 3-month period the covariances are all zero. If there is sample overlap, the covariances usually become positive and therefore increase the sampling variance. Simple moving averages are crude filters, and this feature affects trend estimates calculated by using filters. It would be better to use independent samples if averages over time are the main estimates of interest.
David Steel
See also Attrition; Composite Estimation; Cross-Sectional
Survey Design; Longitudinal Studies; Panel Conditioning;
Panel Surveys; Respondent Burden; Rolling Averages;
Rotating Panel Design; Sampling Variance; Trend
Analysis; Wave
Further Readings
REPLACEMENT
Replacement is a term used in two different contexts in
surveys. With-replacement sampling refers to methods
of sampling in which the unit selected in a particular
draw is returned to the finite population and can be
selected in other draws. The other context refers to the
substitution (replacement) of a sampled unit with
another unit as a result of difficulties in contacting or
obtaining cooperation. The entry discusses both with-replacement sampling and replacement substitution.
With-Replacement Sampling
One approach to selecting an equal probability sample
of n units from a finite population of N units is to
draw one unit randomly from the N and then independently draw subsequent units until all n are selected.
If the selected unit can be sampled more than once,
then the method is sampling with replacement. This
particular form of sampling is called multinomial
sampling. There are many other ways of drawing
a sample with replacement. For example, suppose the
multinomial sampling procedure is used, but each unit
is assigned its own probability of selection. If all the
assigned probabilities of selection are not the same,
then this is a simple way to draw a with-replacement,
unequal probability sample.
When sampling with replacement, theoretically the
same unit should be interviewed independently the
number of times it is selected. Because this is operationally infeasible in most cases, the unit is interviewed once and the sampling weights are adjusted to
account for the number of times it was sampled.
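A minimal sketch of multinomial (with-replacement) sampling with unequal probabilities and the weight adjustment just described; the probabilities and sizes are illustrative.

    # With-replacement sampling: interview each distinct unit once and scale
    # its per-draw weight by the number of times it was selected.
    import numpy as np

    rng = np.random.default_rng(1)
    N, n = 8, 5
    p = np.array([3, 1, 1, 1, 1, 1, 3, 1], dtype=float)   # unequal probabilities
    p /= p.sum()

    hits = rng.multinomial(n, p)            # times each unit was selected
    base_weight = 1 / (n * p)               # per-draw (Hansen-Hurwitz) weight
    for unit in np.flatnonzero(hits):       # each selected unit, interviewed once
        w = hits[unit] * base_weight[unit]  # weight scaled by its hit count
        print(unit, hits[unit], round(w, 3))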
In practice, with-replacement sampling is not used
frequently. However, a with-replacement sample does
have some very important advantages, especially in
the estimation of the precision of estimates in multistage samples. When a small fraction of the population
units is sampled, it is often convenient to assume with-replacement sampling has been used when estimating variances. The variances of with-replacement and without-replacement samples are nearly equal when this condition exists, so the computationally simpler with-replacement variance estimator can be used.
Substitution
When data cannot be collected for a sampled unit,
some surveys replace or substitute other units for the
sampled ones to achieve the desired sample size (e.g., in Nielsen's television meter panels). Often, the substitutions are made during field data collection; this is called field substitution, and the substituted unit
may be called a reserve unit.
There are many different ways of selecting substitutes. Almost all substitution methods try to select the
substitute from a set of units that match the characteristics of the nonresponding unit in some way. Some
methods use probability mechanisms; for example, the
units in the population that match the nonresponding
unit are sampled with equal probability. Other methods
are not based on probability mechanisms; for example,
the interviewer is allowed to choose another household
in the same block as the nonresponding household.
Replacement substitution tries to deal with unit
nonresponse by replacing the nonresponding unit with
another unit. In this sense, it is the equivalent of an
imputation method. Just like imputation, substitution
makes it difficult to accurately assess the statistical
properties of estimates, such as their bias and variance. Substitute responses are typically treated as if
they were the responses of the originally sampled
units, which is usually not completely appropriate.
J. Michael Brick

See also Imputation; Nonresponse; Sampling Without Replacement
REPLICATE METHODS
FOR VARIANCE ESTIMATION
Replicate methods for variance estimation are commonly used in large sample surveys with many variables. The procedure uses estimators computed on
subsets of the sample, where subsets are selected in
a way that reflects the sampling variability. Replication variance estimation is an appealing alternative to
Taylor linearization variance estimation for nonlinear
functions. Replicate methods have the advantage of
transferring the complexity of variance estimation
from data set end users to the statistician working on
creating the output data set. By providing weights for
each subset of the sample, called replication weights,
end users can estimate the variance of a large variety
of nonlinear estimators using standard weighted sums.
Jackknife, balanced half-samples, and bootstrap methods are three main replication variance methods used
in sample surveys. The basic procedure for constructing the replication variance estimator is the same for
the three different methods. This entry describes the
form of replication variance estimators, compares the
three main approaches used in surveys, and ends with
an illustration of the jackknife method.
Description
Replication variance methods involve selecting subsets from the original sample. Subsets can be created
by removing units from the sample, as in the jackknife and balanced half-samples methods, or by
resampling from the sample, as in the bootstrap
method. A replicate of the estimator of interest is created for each subset. The replicate is typically constructed in the same way as the estimator of interest is
constructed for the entire sample. Replication variance estimators are made by comparing the squared
deviations of the replicates to the overall estimate.
Thus, the replication variance estimator has the form

\[
\hat{V} = \sum_{k=1}^{L} c_k \left( \hat{\theta}_k - \hat{\theta} \right)^2,
\]

where k identifies the replicate, L is the number of replicates, \(\hat{\theta}_k\) is the estimate computed from the kth replicate, \(\hat{\theta}\) is the full-sample estimate, and \(c_k\) is a factor that depends on the replication method and the sample design.
In the jackknife illustration, Table 1 lays out the example sample: a stratified design in which stratum 1 contains two elements with original weight 21 each and stratum 2 contains three elements with original weight 15 each, together with the five sets of delete-one jackknife replicate weights. When an element is deleted to form a replicate, the weights of the remaining elements in its stratum are scaled up, to 42 in stratum 1 and to 22.5 in stratum 2.

Table 2 then shows, for each replicate k, the factor c_k, the replicate totals T̂y(k) and T̂x(k), the replicate ratio R̂(k) = T̂y(k)/T̂x(k), and the contributions to the two variance estimates:

Replicate   c_k     T̂y(k)    T̂x(k)    R̂(k)    c_k(T̂y(k) − T̂y)²    c_k(R̂(k) − R̂)²
1           20/42    987.0    918.0    1.08    10,290               0.0022
2           20/42    693.0    750.0    0.92    10,290               0.0033
3           28/45    952.5    976.5    0.98     7,875               0.0006
4           28/45    682.5    706.5    0.97    15,435               0.0011
5           28/45    885.0    819.0    1.08     1,260               0.0034
Sum                4,200.0  4,170.0            45,150               0.0106

Summing the last two columns gives the jackknife variance estimates: 45,150 for the estimated total T̂y and 0.0106 for the estimated ratio R̂.
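The computation in Table 2 is easy to reproduce. In the sketch below, the full-sample totals T̂y = 840 and T̂x = 834 are reconstructed from the table's variance contributions rather than stated in the entry, so treat them as assumptions.

    # Jackknife replication variance estimates for a total and a ratio,
    # using the replicate quantities from Table 2.
    from fractions import Fraction

    c = [Fraction(20, 42)] * 2 + [Fraction(28, 45)] * 3   # c_k by replicate
    Ty_rep = [987, 693, 952.5, 682.5, 885]                # replicate y totals
    Tx_rep = [918, 750, 976.5, 706.5, 819]                # replicate x totals

    Ty, Tx = 840, 834            # full-sample totals (reconstructed assumption)
    R = Ty / Tx                  # full-sample ratio estimate

    var_Ty = sum(float(ck) * (tyk - Ty) ** 2 for ck, tyk in zip(c, Ty_rep))
    var_R = sum(float(ck) * (tyk / txk - R) ** 2
                for ck, tyk, txk in zip(c, Ty_rep, Tx_rep))

    print(var_Ty)            # 45150.0, matching Table 2
    print(round(var_R, 4))   # 0.0105; Table 2 shows 0.0106 from rounded entries

Further Readings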
REPLICATION
Replication is reanalysis of a study, building on a new
data set that was constructed and statistically analyzed
in the same way as the original work. Repeating the
statistical analysis on the original data set is known as
verification (or replication of the statistical analysis).
Replicability should be maximized in both quantitative and qualitative works, as replication studies and
verification of existing data sets may be extremely
useful in evaluating the robustness of the original
findings and in revealing new and interesting results.
Even if a given work will never actually be replicated, it still needs to be replicable, or else there is no
possibility of refuting its findings; that is, it fails to meet the falsifiability criterion for scientific work. Because replicability is an underlying principle in science, most disciplines in the social sciences hold some replication standards for publications, determining, for example, what data and documentation authors must make available to others.
REPRESENTATIVE SAMPLE
A representative sample is one that has strong external
validity in relationship to the target population the sample is meant to represent. As such, the findings from
the survey can be generalized with confidence to the
population of interest. There are many factors that
affect the representativeness of a sample, but traditionally attention has been paid mostly to issues related to
sample design and coverage. More recently, concerns
have extended to issues related to nonresponse.
Determining Representativeness
When using a sample survey to make inferences about
the population from which the sampled elements were
drawn, researchers must judge whether the sample is
actually representative of the target population. The
best way of ensuring a representative sample is to (a)
have a complete list (i.e., sampling frame) of all elements in the population and know that each and every
element (e.g., people or households) on the list has
a nonzero chance (but not necessarily an equal
chance) of being included in the sample; (b) use random selection to draw elements from the sampling
frame into the sample; and (c) gather complete data
from each and every sampled element. In most sample surveys, only the goal of random selection of elements is met. Complete and up-to-date lists of the
populations of interest are rare. In addition, there are
sometimes elements in the target population with
a zero probability of selection. For example, in random-digit dialing telephone surveys, households without a telephone may belong to the population of
interest, but if they do, then they have a zero chance of being included in the sample. Noncoverage and differential nonresponse of this sort can leave a responding sample that is, for example, more white and less ethnic than the general population of interest. One way to deal with these problems is to reduce noncoverage and nonresponse in the design and field effort itself.
To correct for these biases, survey researchers
invoke a second way to deal with these problems, that
is, post-stratification. Post-stratification is the process
of weighting some of the respondents or households
in the responding sample relative to others so that the
characteristics of the responding sample are essentially equal to those of the target population for those
characteristics that can be controlled to census data
(e.g., age, race, ethnicity, sex, education, and geography). By invoking post-stratification adjustments, the
bias due to sample noncoverage and differential nonresponse theoretically is reduced.
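A minimal sketch of the weighting step, with illustrative age cells and population shares:

    # Post-stratification: weight each cell so sample shares match the census.
    population_share = {"18-34": 0.30, "35-54": 0.35, "55+": 0.35}   # census (assumed)
    sample_counts = {"18-34": 150, "35-54": 350, "55+": 500}         # respondents

    n = sum(sample_counts.values())
    weights = {cell: population_share[cell] / (sample_counts[cell] / n)
               for cell in sample_counts}
    print({c: round(w, 2) for c, w in weights.items()})
    # {'18-34': 2.0, '35-54': 1.0, '55+': 0.7}: underrepresented young adults
    # are weighted up, overrepresented older adults are weighted down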
The final correction in which researchers should
engage is to limit the inferential population of a survey
to those elements on the sampling frame with nonzero
probability of inclusion. For example, a careful and
conservative researcher who conducts a traditional
landline random-digit dialing survey of adults in
Georgia would limit inferences to adults living in
households with landline telephones, who respond to
surveys, in the state of Georgia. In practice this is
rarely done because research sponsors typically want
the survey to be representative of all adults in a given
geopolitical area (e.g., Georgia), and too often empirical reports are written assuming (and implying) this is
the case.
Future Research
Striving for representative samples is key when conducting sample survey research. However, it is important that consumers of survey-based information
recognize that the standards for true representativeness are rarely met, but the biases produced by failures often are not severe enough to threaten the
ultimate value of the survey findings. Because of
these challenges, it is critical that research continue
into the problems of sample design flaws of popular
techniques (e.g., Internet surveys) and into the impact
of unit and item nonresponse on findings from the
survey.
Michael Edward Davern
See also Coverage Error; External Validity;
Inference; Nonprobability Sample; Nonresponse
Error; Population of Inference; Post-Stratification;
Probability Sample; Sampling Error; Sampling Frame;
Target Population
Further Readings
RESEARCH CALL CENTER

Research Versus Nonresearch Call Centers
Nonresearch call centers fall into two main groups:
(1) predominantly outbound (such as a telemarketing
or debt collection center) and (2) inbound (such as
a customer assistance contact center for a bank or
a catalogue sales support operation).
The common denominator among all call centers,
including research call centers, is that there is a group
of staff (interviewers or agents) sitting in booths either
making or receiving calls. At this front stage of contact, there are rarely any formal educational requirements beyond high school reading ability; however,
a clear voice is a necessity. Because calling volumes
typically peak for only a small part of the day and
often vary over the course of a year, most positions
are part-time and often seasonal. As a result, many
call centers in the same geographical locale tend to
share the same labor pool, and the physical buildings
in which they operate and the furniture needed tend to
be very similar.
Technologically, there are a lot of similarities as
well. All need a telephone system that can support
many simultaneous calls and a computer system that
will track and store the outcomes of the calls (such as
completed questionnaires, completed applications for
credit, or queries made and resolutions offered). However, an outbound center will require more sophisticated
dialer equipment to place calls, whereas an inbound
center will require a more sophisticated automatic call
distributor and interactive voice response system to handle the queuing and directing of incoming calls to the
most appropriate agent (not always an interviewer).
It is in the processes and procedures that the differences become more pronounced. For example, in comparing an outbound research survey operation and an
outbound telemarketing operation, one of the main
objectives of a research center is a high response rate,
whereas for a telemarketing operation the overriding
objective is to obtain a high volume of sales. Essentially, this is the difference between survey quality and
telemarketing quantity, and this difference will play
out in many ways. A research operation will make
multiple calls to the same number following complex
calling rules and will spend much more time on that
one sample item. Shortcuts cannot be risked: the interviewers almost always will be paid by the hour rather
than by the complete interview, in part on the assumption that this will help ensure they conduct the entire
research task exactly as required (including gaining
cooperation from as many respondents as possible and
reading questions exactly as worded to minimize interviewer bias), and the dialer technology will be set to
a slower rate to allow the interviewer time to read the
call history notes from the previous call and to ensure
that if the number answers, the interviewer is ready to
take the call. The telemarketing operation will instead
discard numbers very quickly and move on to fresh numbers; they will give their agents considerable latitude in the scripts and often pay commission rather
than an hourly rate, and they will use high-volume predictive dialers to maximize the time their agents spend
talking and selling as opposed to listening for answering machines or correctly classifying businesses.
These positions usually carry both project management responsibility and day-to-day floor management.
In larger operations these functions may be separated
into specialist roles, but even where fewer than 50
booths are involved there will usually be two managers
carrying both functions to provide redundancy. These
are professional-level positions, requiring formal qualifications in survey methodology or a related field, along
with substantial personnel management skills.
Large centers will also have specialized positions
in technical support, human resource management,
and training, whereas in smaller centers these functions will typically be spread among the supervisory
and management staff.
RESEARCH DESIGN
A research design is a general plan or strategy for conducting a research study to examine specific testable
research questions of interest. The nature of the research
questions and hypotheses, the variables involved, the
sample of participants, the research settings, the data
collection methods, and the data analysis methods are
factors that contribute to the selection of the appropriate
research design. Thus, a research design is the structure,
or the blueprint, of research that guides the process of
research from the formulation of the research questions
Solomon Four-Group Design
the targeted population and randomly assigning the random sample to one of four groups. Two of the groups
are pretested (Experimental and Control Groups 1) and
the other two are not (Experimental and Control Groups
2). One of the pretested groups and one of the not pretested groups receive the experimental treatment. All
four groups are posttested on the dependent variable
(experimental outcome). The design is represented as
Experimental Group 1: R  O1  E  O2
Control Group 1:      R  O1  C  O2
Experimental Group 2: R      E  O2
Control Group 2:      R      C  O2

(R = random assignment; O1 = pretest observation; O2 = posttest observation; E = experimental treatment; C = control condition)
Here, the researcher has two independent variables
with two levels. One independent variable is the
experimental conditions with two levels (experimental
and control groups), and the other independent variable is the pretesting condition with two levels (pretested and not pretested groups). The value of this
design is that it allows the researcher to determine if
the pretest (O1) has an effect on the resulting answer
given in the posttest.
An example of this design would be one that builds
on the previous example of the experiment to test the
effect of the information about the value of the new
bridge. However, in the Solomon four-group design,
there would be two more randomly assigned groups of
respondents, ones who were not asked whether they
favored or opposed the bridge funding at the beginning
of the questionnaire. Instead, one of these groups
would be the second control group, asked only their
opinions about the bridge funding later in the questionnaire. The other group would be the second experimental group, asked their opinions about the bridge
funding only later in the questionnaire but after first
being given the information about the value of the
bridge. This design would allow the researchers to test
not only the effects of the information but also whether
the saliency of the bridge funding, by asking about it
first before giving the new information, affected opinions given later about the funding.
Experimental Factorial Designs
Experimental factorial designs allow the researcher to examine
the effects of one or more independent variables individually on the dependent variable (experimental outcome) as well as their interactions. These interactions
cannot be examined by using single independent variable experimental designs.
The term factorial refers to experimental designs
with more than one independent variable (factor).
Many different experimental factorial designs can be
formulated depending on the number of the independent variables. The Solomon four-group design is an
example of a 2 × 2 factorial design with treatment
conditions (treatment and control groups) crossed
with pretesting conditions (pretested and not pretested
groups).
toward science. Because the effects are being measured at the level of the individual student, but the
students themselves were not randomly assigned to
the control and treatment condition, this is a quasiexperiment, not a true experiment.
Longitudinal Research Designs
Correlational Research
Correlational research is a type of descriptive nonexperimental research because it describes and assesses
the magnitude and degree of an existing relationship
between two or more continuous quantitative variables
with interval or ratio types of measurements or discrete
variables with ordinal or nominal type of measurements. Thus, correlational research involves collecting
data from a sample of individuals or objects to determine the degree of the relationships between two or
more variables for the possibility to make predictions
based on these relationships. There are many different
methods for calculating a correlation coefficient, which
depends on the metric of data for each of the variables.
The most common statistic that measures the degree of
the relationship between a pair of continuous quantitative variables, having interval and ratio types of measurements, is the Pearson product–moment correlation
coefficient, which is represented by the letter r.
Alternative correlation coefficients can be used
when the pair of variables has nominal or ordinal types
of measurement. If the pair of variables is dichotomous
(a nominal type of measurement having only two categories), the Phi coefficient should be used. If the pair
of variables has ordinal type of measurement, the
Spearman rank order correlation coefficient should be
used.
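As a brief illustration of the coefficients just described, the following sketch (with invented data; SciPy is used here simply as one library that implements these statistics) computes Pearson's r, the Spearman rank-order correlation, and the Phi coefficient:

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Continuous (interval/ratio) variables -> Pearson product-moment r.
x = np.array([2.0, 4.1, 6.3, 8.0, 9.5])
y = np.array([1.9, 4.4, 5.8, 8.2, 9.1])
r, r_pvalue = pearsonr(x, y)

# Ordinal variables -> Spearman rank-order correlation.
rho, rho_pvalue = spearmanr([1, 2, 3, 4, 5], [2, 1, 3, 5, 4])

# Two dichotomous (0/1) variables -> Phi coefficient, which equals
# Pearson's r computed on the 0/1 codes.
a = np.array([0, 0, 1, 1, 1, 0, 1, 0])
b = np.array([0, 1, 1, 1, 0, 0, 1, 0])
phi, _ = pearsonr(a, b)

print(round(r, 3), round(rho, 3), round(phi, 3))
```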
Another type of correlational research involves
predicting one or more continuous quantitative dependent variables from one or more continuous quantitative independent variables. The most common
statistical methods used for prediction purposes are
simple and multiple regression analyses.
The significance of correlational research stems from
the fact that many complex and sophisticated statistical
analyses are based on correlational data. For example,
logistic regression analysis and discriminant function
Research Design
analysis are quite similar to simple and multiple regression analyses with the exception that the dependent (criterion) variable is categorical and not continuous as
in simple and multiple regression analyses. Canonical
analysis is another statistical method that examines the
relationship between a set of predictor (independent)
variables and a set of criterion (dependent) variables.
Path analysis and structural equation modeling are other
complex statistical methods that are based on correlational data to examine the relationships among more
than two variables and constructs.
Causal-Comparative Research
Case Study
Case study is an in-depth examination and intensive description of a single individual, group, or organization based on collected information from a variety of sources, such as observations, interviews, documents, participant observation, and archival records.
The goal of the case study is to provide a detailed and
comprehensive description, in narrative form, of the
case being studied.
Ethnographic Research

Grounded Theory Research
Grounded theory research is an inductive qualitative research design that is used for generating and
developing theories and explanations based on systematically collected qualitative data. The data collection process in grounded theory research is usually an
ongoing iterative process that starts with collecting
and analyzing qualitative data that leads to tentative
theory development. Then, more qualitative data are
collected and analyzed that lead to further clarification
and development of the theory. The qualitative data
collection and further theory development process
continues until the particular theory is developed that
is grounded in the data.
Exploratory Mixed-Methods
Research Designs
RESEARCH HYPOTHESIS
A research hypothesis is a specific, clear, and testable
proposition or predictive statement about the possible
outcome of a scientific research study based on a particular property of a population, such as presumed differences between groups on a particular variable or
relationships between variables. Specifying the research
hypotheses is one of the most important steps in planning a scientific quantitative research study. A quantitative researcher usually states an a priori expectation
about the results of the study in one or more research
hypotheses before conducting the study, because the
design of the research study and the planned analyses often are determined by the stated hypotheses.
Thus, one of the advantages of stating a research
hypothesis is that it requires the researcher to fully think
through what the research question implies, what measurements and variables are involved, and what statistical methods should be used to analyze the data. In other
words, every step of the research process is guided by
the stated research questions and hypotheses, including
the sample of participants, research design, data collection methods, measuring instruments, data analysis
methods, possible results, and possible conclusions.
The research hypotheses are usually derived from
the stated research questions and the problems being
investigated. After the research hypotheses are stated,
inferential statistical methods are used to test these
hypotheses to answer the research questions and make
conclusions regarding the research problems. Generally, in quantitative research designs, hypothesis testing and the use of inferential statistical methods begin
with the development of specific research hypotheses
that are derived from the study research questions.
Research hypotheses differ from research questions in
that hypotheses are specific statements in terms of the
anticipated differences and relationships, which are
based on theory or other logical reasoning and which
can be tested using statistical tests developed for testing the specific hypotheses.
The following are two examples of research questions and possible corresponding research hypotheses.
Difference-Between-Groups
Hypotheses Versus
Relationship Hypotheses
Research hypotheses can take many different forms
depending on the research design. Some hypotheses
Research Management
Directional Versus
Nondirectional Hypotheses
In hypothesis testing, it is important to distinguish
between directional hypotheses and nondirectional
hypotheses. The researcher should make the choice
between stating and testing directional or nondirectional hypotheses depending on the amount of information or knowledge the researcher has about the groups
and the variables under investigation. Generally,
a directional hypothesis is based on more informed reasoning (e.g., past research) than when only a nondirectional hypothesis is ventured. Research Hypothesis 1
stated earlier is an example of a nondirectional research
hypothesis. It can be represented along with its null
hypothesis symbolically as
H0: μ1 = μ2
H1: μ1 ≠ μ2

μ1 and μ2 are the mean attitudes toward statistics held by the male and female Ph.D. students, respectively.
Research Hypothesis 2 stated earlier is an example of a directional research hypothesis, and it can be represented along with its null hypothesis symbolically as

H0: μ1 ≤ μ2
H1: μ1 > μ2

μ1 and μ2 are the mean levels of rapport with their students for professors who use a student-centered teaching method and those who use a teacher-centered teaching method, respectively.
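As an illustration of how such hypotheses are tested (a sketch with invented scores, added here for concreteness; the two-sided and one-sided p-values correspond to the nondirectional and directional alternatives above):

```python
# A sketch with invented attitude scores; SciPy >= 1.6 is assumed for
# the one-sided "alternative" argument.
import numpy as np
from scipy.stats import ttest_ind

group1 = np.array([3.1, 4.2, 3.8, 4.5, 3.9])  # e.g., one group's scores
group2 = np.array([3.0, 3.6, 3.2, 3.9, 3.4])  # e.g., the other group's

# Nondirectional test: H1 is mu1 != mu2 (two-sided p-value).
t_stat, p_two_sided = ttest_ind(group1, group2)

# Directional test: H1 is mu1 > mu2 (one-sided p-value).
t_stat, p_one_sided = ttest_ind(group1, group2, alternative="greater")

print(round(p_two_sided, 3), round(p_one_sided, 3))
```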
Sema A. Kalaian and Rafa M. Kasim
RESEARCH MANAGEMENT
The function of research management is to coordinate
disparate activities involved in planning and executing
any research project. Ability to grasp the big picture,
coupled with attention to detail, is required in this
function. Regardless of research team size or the size
of the project, the same management activities are
required.
Research Plan
Research management begins at the initial phase of
the research process. As survey research texts point
out, identification of the research problem is the first
step. From this, the research team develops testable
hypotheses and the research plan. This research plan
or process is the cornerstone of research management,
and the more detailed the plan is, the more efficient
the project implementation phase will be.
Identification of Sample
Once the research team has identified the population of interest, it is time to address sampling issues.
Sample composition required to meet project goals,
sample size (based on power calculations and tempered by monetary considerations), and source of the
sample are specified in the research plan.
Timeline
Effective research management requires the development of a realistic timeline to ensure that the project
proceeds in an organized and logical manner. Recognizing that the research process seldom proceeds on
Questionnaire Development
Research management's role in questionnaire development is to ensure that included survey questions
answer the research questions and are appropriate for
the selected survey mode. Competent research management occasionally requires interjecting realism into the
questionnaire development process. Occasionally, those
involved in writing survey questions assume that
potential respondents have similar subject matter
knowledge and language comprehension levels as
themselves. When necessary, it is the responsibility of
research management to disabuse them of this belief.
Unique challenges are encountered when sampling
a multi-cultural, multi-lingual population because
without interviewing in multiple languages, researchers cannot extrapolate their findings to the entire
population. At the beginning of such projects, it is
important to secure the services of an experienced
translator or translation company. Researchers may be
tempted to economize on the translation and discover,
after the fact, that simply speaking and reading a language does not mean that the person is capable of
translating a questionnaire. Specialized subject-matter
knowledge greatly aids in the translation process. Cultural competencies and sensitivities also affect the
translation. After the initial translation is completed,
the questionnaire often is back-translated to ensure
accuracy of the original translation.
Protection of Human Subjects
project to multiple human subject committees. Adequate time should be included in the project timeline to
account for differences in the frequency with which the
committees meet and the necessity of coordinating
changes required by each committee into a cohesive
project approved by all.
Projects that do not involve federal, state, or university funds are subject to the review requirements of the
client. Some commercial organizations have an institutional review board that performs the functions of the
Committee for the Protection of Human Subjects, and
projects must be approved through this committee.
When working with clients whose organizations do not
require the approval of an institutional review board, it
is prudent to obtain approval, in writing, for all aspects
of the project prior to data collection.
Data Collection
Respondents may not have given much thought to issues addressed in open-ended questions and, therefore, may provide incomplete, somewhat incoherent, or other unusual responses.
Prior to coding open-ended responses, it is suggested
that clients be provided with the text responses and help
develop a coding scheme for the research organization.
In doing so, the client will gain the greatest benefit from
the open-ended question responses. When clients are
reluctant to do this, those coding the text responses
should look for common response themes, receive client approval for the coding scheme, and code the data
accordingly.
Data obtained using computer-assisted modes are
relatively clean when the software programming is
thoroughly checked for accuracy, response categories
are appropriate, and when incomplete or out-of-range
responses are not allowed. Paper-and-pencil administration often results in incomplete data because of
item nonresponse, out-of-range responses, and illegible responses. Data cleaning becomes more onerous
in this situation, and research management's role is to
ensure that data quality is not compromised. Decisions regarding data cleaning, recoding, and acceptable data quality should be delineated in the data set
documentation.
Data Analysis
Before data analysis and report writing, the audience for the report needs to be determined. The client
should be asked about the degree of complexity
desired and the type of report needed. While it may
be tempting to dazzle a client by providing high-level statistical analysis in reports, if the client is
unable to use the analysis, then the report may not be
useful. All specifications of this deliverable should be
delineated in writing prior to the start of analysis. A
function of project management is to ensure that clients receive usable deliverables that meet their needs
and, at the same time, enhance the survey organization's reputation.
Data Set and Documentation Development
measures of data quality; and data weighting, if appropriate. Project complexity affects data set documentation development and can take weeks or months to
complete. Just as with data collection, responsibility
for overseeing this activity is a function of research
management.
Project Success
A thorough understanding of the research process, coupled with a basic knowledge of fiscal management,
is required for competent project management. Ability
to communicate both orally and in written form with
researchers, support staff, and interviewers is vital to
project success. While not required, being personable
and having some tact makes communication with the
research team and others easier. Although a research
team usually consists of many persons, ultimately
research project success is dependent upon competent
research management.
Bonnie D. Davis
See also Audio Computer-Assisted Self-Interviewing
(ACASI); Codebook; Coding; Computer-Assisted
Personal Interviewing (CAPI); Computer-Assisted
Self-Interviewing (CASI); Computer-Assisted
Telephone Interviewing (CATI); Interviewing;
Protection of Human Subjects; Questionnaire;
Research Design; Survey Costs
RESEARCH QUESTION
A research question is an operationalization of the purpose(s) to which a survey project aims. The research
question(s) should state the research problem in a way
that allow(s) for appropriate research methods to be identified.
RESIDENCE RULES
A design objective in many surveys is to measure
characteristics of the population and housing in specified geographic areas. Some surveys, in addition,
attempt to estimate counts of the population in these
areas. A key component of the process needed to
accomplish these goals is a clear set of residence
rules, the rules that determine who should be included
and who should be excluded from consideration as
a member of a household.
Usual Residence
The U.S. Bureau of the Census has traditionally used
the concept of usual residence. This is defined as the
place where the person lives and sleeps most of the
time and may be different from his or her legal address
or voting address. As a general principle, people should
be counted (i.e., included in a survey) at their usual
place of residence. Some people have no usual place of
residence; these people should be counted where they
are staying at the time of the survey.
This gives rise to a set of rules that can be used to
address most situations. They include the following:
- People away on vacation and business should be counted at their permanent residence.
- Students:
  - Boarding school students should be counted at their parental homes. (This is the major exception to the usual rule.)
  - Students living away at college should be counted where they are living at college and therefore excluded from their parental homes.
  - Students living at home while attending college are counted at home.
- Nonfamily members in the home are included if this is where they live and sleep most of the time. Included here are live-in household help, foster children, roomers, and housemates or roommates.
- Military personnel are counted where they live or sleep most of the time, which generally means that they are not included in their permanent residence households. For example, personnel serving in Iraq, while absent, are not included in the household where they resided before they left and to which they will return.
permanent residence, it would be considered a temporary vacancy. In the same situation in the decennial
census, the household would be enumerated at its permanent residence, and the vacation home would be
classified as a vacant unit.
A place of residence is usually a housing unit,
occupied by a household comprised of one or more
persons, or vacant. It should be noted that not everyone lives in a household. Under current terminology,
people who are not living in households are considered to be living in group quarters (unless they are literally homeless). Group quarters fall into two broad
categories. Institutional facilities include prisons,
nursing homes, mental hospitals, and other places
where the residents are generally not free to come and
go at will. Noninstitutional facilities include college
dormitories, military barracks, and other places where
the residents are free to move in, out, and about.
In practice, survey research activities are rarely concerned with de jure residence. De facto residence, the
place where people and households actually reside, is
the relevant issue.
Patricia C. Becker
See also American Community Survey (ACS); U.S. Bureau
of the Census
RESPONDENT
A respondent is the person who is sampled to provide
the data that are being gathered in a survey. (In some
social science disciplines, such as psychology, the person from whom data are gathered is called the subject.) A respondent can report data about himself or
herself or can serve as a proxy in reporting data about
others (e.g., other members of the household). Even
when the respondent is serving as a proxy, he or she is
directly generating the data and contributing to their
accuracy, or lack thereof.
Some surveys sample respondents directly, whereas
other surveys begin by sampling larger units, such as
households, and then choose a respondent within the unit
from whom to gather data. In interviewer-administered
surveying, such as what is done face-to-face or via the
telephone, rapport first must be developed by the interviewer with the respondent in order to gain cooperation
and then gather accurate data. In self-administered surveys, such as those conducted via mail and the Internet,
there is no one representing the researcher's interests who is available to mediate the respondent's behavior,
and cooperation typically is gained from the respondent
via printed materials, such as a cover letter, which are
sent to the respondent to read.
Gaining cooperation from sampled respondents has
become progressively more difficult in the past two
decades as lifestyles have become more hectic. The
quality of the data that a survey gathers will be no
better than the quality of the effort the respondent
makes and her or his ability to provide it. A respondent's willingness and ability to provide accurate data
will vary considerably across respondents and also
from time to time for the same respondent.
There are many factors that influence whether
a respondent will agree to participate in a survey and
Respondent Debriefing
whether a respondent will provide complete and accurate data after agreeing to participate. It remains the
responsibility of the researcher to choose the best survey methods, within the limitations of the researcher's
finite budget for conducting the survey, that make it
most likely that a sampled respondent will agree to
cooperate when contacted, and once the respondent
agrees, that he or she will provide the highest possible
quality of data when answering the questionnaire.
Paul J. Lavrakas
See also Cover Letter; Proxy Respondent; Respondent–Interviewer Rapport; Respondent-Related Error;
Within-Unit Selection
RESPONDENT BURDEN

The degree to which a survey respondent perceives participation in a survey research project as difficult, time-consuming, or emotionally stressful is known as respondent burden. Interview length, cognitive complexity of the task, required respondent effort, frequency of being interviewed, and the stress of psychologically invasive questions all can contribute to respondent burden in survey research. The researcher must consider the effects of respondent burden prior to administering a survey instrument, as too great an average burden will yield lower-quality data and is thereby counterproductive.

Mechanisms that researchers may use to minimize respondent burden include pretesting, time testing, cognitive interviewing, and provision of an incentive. With pretesting, cognitive interviewing, and debriefing of respondents (and sometimes of interviewers as well) after the completion of the pretest, a researcher may glean insights into how to reduce any especially onerous aspects of the survey task. For example, it may be possible to break up a long series of questions that uses the same response format into two or three shorter series and space them throughout the questionnaire so that they do not appear, and are not experienced, as overly repetitive and, thus, as burdensome to complete. With sensitive and otherwise emotionally stressful questions, special transition statements that an interviewer relates to the respondent prior to these being asked may lessen the burden. Using incentives, especially contingent (i.e., performance-based) incentives, as well as noncontingent ones, often will raise the quality of the data that are gathered when the survey task is burdensome, as respondents will strive to reach the level of data quality sought by the researcher, either because it will qualify them for an incentive or because they have a heightened feeling of obligation to the researcher to provide good-quality data, or both.

Reduction of respondent burden may result in decreased nonresponse at both the unit level and the item level. When incentives are offered, data quality also should increase, as fewer respondents will turn to satisficing strategies to get through the survey task as easily as possible regardless of how well they are complying with the task.

Ingrid Graf
RESPONDENT DEBRIEFING
Respondent debriefing is a procedure that sometimes
is carried out at the end of a surveys data collection
phase. That is, when an individual participant has
completed all aspects of the survey, debriefing occurs
for that person. Debriefing is usually provided in the
same format as the survey itself, that is, paper and
pencil, online, or verbally via telephone. There are
two major reasons that a researcher may want to
debrief respondents: (1) The researcher may want to
gather feedback from the respondent about the
respondent's experience participating in the study or
about more details concerning the topic of the survey,
and (2) the researcher may have used some form
of deception as part of the study and will use the
debriefing to inform the respondent of this and try to
undo any harm the deception may have caused the
respondent.
RESPONDENT-DRIVEN
SAMPLING (RDS)
Respondent-driven sampling (RDS) is a method for
drawing probability samples of hidden, or alternatively, hard-to-reach, populations. Populations such as
these are difficult to sample using standard survey
research methods for two reasons: First, they lack
a sampling frame, that is, an exhaustive list of population members from which the sample can be drawn.
Second, constructing a sampling frame is not feasible
because one or more of the following are true: (a) The
population is such a small part of the general population that locating them through a general population
survey would be prohibitively costly; (b) because the
population has social networks that are difficult for outsiders to penetrate, access to the population requires
personal contacts; and (c) membership in the population is stigmatized, so gaining access requires establishing trust. Populations with these characteristics are
important to many research areas, including arts and
culture (e.g., jazz musicians and aging artists), public
policy (e.g., immigrants and the homeless), and public
health (e.g., drug users and commercial sex workers).
These populations have sometimes been studied
using institutional or location-based sampling, but such
studies are limited by the incomplete sampling frame;
for example, in New York City only 22% of jazz musicians are musician union members, and they are on average 10 years older, with nearly double the income, than nonmembers, who are not on any public list.
This entry examines the sampling method that RDS
employs, provides insights gained from the mathematical model on which it is based, and describes the types
of analyses in which RDS can be used.
Sampling Method
RDS accesses members of hidden populations through
their social networks, employing a variant of a snowball sample.
Mathematical Model
RDS is based on a mathematical model of the network-recruitment process, which functions somewhat
Additional Analyses
RDS emerged as a member of a relatively new class
of probability sampling methods termed adaptive or
link-tracing designs, which show that network-based
sampling methods can be probability sampling methods. RDS continues to evolve. Though originally limited to univariate analysis of nominal variables, it has
been extended to permit analysis of continuous variables and multivariate analysis. The significance of
these methods is that they expand the range of populations from which statistically valid samples can be
drawn, including populations of great importance to
public policy and public health.
Douglas D. Heckathorn
See also Adaptive Sampling; Network Sampling;
Nonprobability Sampling; Probability Sample;
Sampling Frame; Snowball Sampling
RESPONDENT FATIGUE
Respondent fatigue is a well-documented phenomenon
that occurs when survey participants become tired of
the survey task and the quality of the data they provide
begins to deteriorate. It occurs when survey participants' attention and motivation drop toward later
sections of a questionnaire. Tired or bored respondents
may more often answer "don't know," engage in straight-line responding (i.e., choosing answers
down the same column on a page), give more perfunctory answers, or give up answering the questionnaire
altogether. Thus, the causes for, and consequences of,
respondent fatigue, and possible ways of measuring
and controlling for it, should be taken into account
when deciding on the length of the questionnaire, question ordering, survey design, and interviewer training.
RESPONDENT–INTERVIEWER RAPPORT
Respondents and interviewers interact during the conduct of surveys, and this interaction, no matter how
brief, is the basis for a social relationship between the
two. Often this relationship begins when the interviewer calls or visits the respondent in an attempt to
initiate and complete an interview. Other times, the
respondent may call the interviewer in order to complete an interview. During the social interaction of
conducting an interview, the respondent and interviewer will typically develop a rapport.
The establishment of rapport between the respondent and the interviewer, or lack thereof, is a key element in the interviewer gaining the respondent's
cooperation to complete an interview. If a good rapport is not established, the likelihood of the interviewer completing an interview decreases. Further,
good rapport will make the respondent comfortable
with answering questions that could be considered
personal or embarrassing. It is important for interviewers to convey a neutral, nonjudgmental attitude
toward respondents regardless of the survey topic or
RESPONDENT REFUSAL
The respondent refusal disposition is used in telephone, in-person, mail, and Internet surveys to categorize a case in which contact has been made with
the designated respondent, but he or she has refused
a request by an interviewer to complete an interview
(telephone or in-person survey) or a request to complete and return a questionnaire (mail or Internet survey). A case can be considered a respondent refusal
only if the designated respondent has been selected
and it is clear that he or she has stated he or she will
not complete the interview or questionnaire. Respondent refusals are considered eligible cases in calculating response and cooperation rates.
In a telephone survey, a case is coded with the
respondent refusal disposition when an interviewer
dials a telephone number, reaches a person, begins his
or her introductory script, and selects the designated
respondent, and the respondent declines to complete
the interview. In calls ending in a respondent refusal,
the designated respondent may provide an explanation
for the refusal such as, "I don't do surveys," "I don't have time," "I'm not interested," or "Please take me off your list." In other instances, the respondent contacted may simply hang up.
Respondent refusals in an in-person survey occur
when an interviewer contacts a household, a household
member answers the door, the interviewer begins his
or her introductory script, and selects the designated
respondent, and the designated respondent declines
to complete the interview. Common explanations in
in-person surveys for respondent refusals parallel those
for telephone surveys.
Cases in a mail or Internet survey of specifically
named persons are coded with the respondent refusal
disposition when contact has been made with the sampled person and he or she declines to complete and
return the questionnaire. Because little may be known
in a mail survey about who in the household generated
the refusal, it can be difficult to determine whether
a household refusal or respondent refusal disposition
is most appropriate. Different invitation methods for
Internet surveys (such as contacting sampled respondents at their email addresses) make respondent refusals
the most common type of refusal in an Internet survey.
Respondent refusals usually are considered a final
disposition. Because refusal rates for all types of surveys
have increased significantly in the past decade, many
RESPONDENT-RELATED ERROR
Respondent-related error refers to error in a survey
measure that is directly or indirectly attributable to
the behaviors or characteristics of respondents; it is
distinguished from error resulting from other survey
components, such as questionnaires, interviewers, or
modes of administration. However, respondents may
interact with these other components of survey design
in producing errors. It is useful to dichotomize
respondent-related errors into those that arise from
nonobservation or nonresponse (e.g., during efforts to
obtain interviews) and those that result from observation or measurement (e.g., during the administration
of the survey).
Errors in survey measures have two components:
bias and variable errors. Bias results when responses
provided in the survey differ from their true values,
which are typically unknown and unmeasured, in
a systematic way across repeated measurements. For
example, in contrast to the responses they enter onto
self-administered questionnaires, respondents are more
likely to underreport abortions they have had when
data are collected by an interviewer. Variable errors
result from differences across the source of the error.
746
Respondent-Related Error
For example, respondents sometimes provide different answers to interviewers who deviate from the
rules of standardized interviewing. Survey methodologists also make a distinction between error associated with objective versus subjective questions.
For objective questions about events and behaviors,
error is expressed as the difference between the
respondents answer and what might have been
observed if observation had been possible. For subjective questions about attitudes and opinions, error
is conceptualized as sources of variation in the
answers other than the concept the researcher is trying to measure.
RESPONSE
In survey research, a response generally refers to the
answer a respondent provides when asked a question.
However, a response to a survey can also be considered to occur when a sampled respondent decides
whether or not to participate in the survey. Both types
of responses may affect data quality; thus, a well-trained researcher will pay close attention to each.
RESPONSE ALTERNATIVES
Response alternatives are the choices that are provided to respondents in a survey when they are asked
a question. Response alternatives are generally associated with closed-ended items, although open-ended
items may provide a limited number of such choices.
Response alternatives can take a number of different forms, related to the type of question presented.
In a Likert-type item, in which respondents are asked
the extent to which they agree or disagree with a
statement, the response alternatives might be strongly agree, agree, neither agree nor disagree, disagree, and strongly disagree.
response would be provided. For example, in the question, "Do you think state spending on roads and highways should be increased, kept about the same as it is now, decreased, or don't you have an opinion on this issue?" the response alternative "or don't you have an opinion on this issue" is an explicit "don't know" or "no opinion" option. When such options are provided as part of the question, the percentage of respondents who choose this alternative is higher than when it is not present and respondents have to volunteer that they don't know.
The fact that a larger percentage of respondents will choose a "don't know" response when offered explicitly applies more generally to all types of responses. For example, if survey respondents are presented with the question, "Do you think that federal government spending on national defense should be increased or decreased?" some percentage will volunteer that they believe it should be kept about the same as it is now. If the same group of respondents were asked the question, "Do you think federal government spending on national defense should be increased, decreased, or kept about the same as it is now?" a much higher percentage would select the "same as it is now" response alternative. Similarly, for items in which respondents are presented with some choices (generally the most likely options) but not an exhaustive list, researchers will often leave an "other" option. An example of such an "other" response is provided by the question, "Which television network do you watch most frequently, ABC, CBS, Fox, NBC, or some other network?" Although some respondents will provide other responses such as UPN or the CW, the percentage of such responses will typically be smaller than if these other networks had been mentioned specifically in the question.
An additional aspect of response alternatives is balance. In the design of most questions, researchers
strive to provide balance in the response alternatives.
If presenting a forced choice item, the strength of the
arguments on one side of a question should be similar
to the strength of the arguments on the other. Similarly, in asking an agree–disagree item, an interviewer would ask, "Do you agree or disagree that . . ."; asking only "Do you agree that . . ." would lead more respondents to choose this option. In asking about approval or disapproval of some individual or proposal, a question that asks, "Do you completely approve, approve a great deal, approve somewhat, or disapprove of . . ." is not balanced. Asking "Do you approve a great deal, approve somewhat, disapprove somewhat, or disapprove a great deal of . . ." provides a more balanced item.
Response alternatives are an important consideration in question design. The number of choices presented, how they are balanced, whether there is an
explicit middle alternative, and an explicit "no opinion" option can all influence the results obtained by
a survey question.
Robert W. Oldendick
See also Balanced Question; Closed-Ended Question;
Feeling Thermometer; Forced Choice; Likert Scale;
Open-Ended Question; Ranking; Rating; Response
Order Effects; Show Card; Unfolding Question
RESPONSE BIAS
Response bias is a general term that refers to conditions or factors that take place during the process of
responding to surveys, affecting the way responses
are provided. Such circumstances lead to a nonrandom
deviation of the answers from their true value.
Because this deviation takes on average the same
direction among respondents, it creates a systematic
error of the measure, or bias. The effect is analogous
to that of collecting height data with a ruler that consistently adds (or subtracts) an inch to the observed
units. The final outcome is an overestimation (or underestimation) of the true population parameter. Unequivocally identifying whether a survey result is affected by
response bias is not as straightforward as researchers
would wish. Fortunately, research shows some conditions under which different forms of response bias can
be found, and this information can be used to avoid
introducing such biasing elements.
The concept of response bias is sometimes used
incorrectly as a synonym of nonresponse bias. This
use of the term may lead to misunderstanding. Nonresponse bias is related to the decision to participate in
a study and the differences between those who decide
to cooperate and those from whom data are not gathered. Response bias, on the other hand, takes place
once a respondent has agreed to answer, and it may
theoretically occur across the whole sample as well as
to specific subgroups of the population.
Avoidance Strategies
Researchers have proposed multiple ways to control
and correct for response bias. Some of these strategies
are broad ways to deal with the issue, whereas others
depend on the specific type of effect. Conducting
careful pretesting of the questions is a general way to
detect possible biasing problems. An example of a specific strategy to detect acquiescence, for instance, is
writing items that favor, as well as items that oppose,
the object of the attitude the researcher wants to measure (e.g., abortion). Respondents that answer in an
acquiescent way would tend to agree with statements
in both directions.
If response bias is a function of how questions are
worded, how they are ordered, and how they are presented to the respondent, then careful questionnaire
design can help minimize this source of error. However, there are sources of error that cannot be predicted
in advance and therefore are much more difficult to
avoid. In that case, there are strategies that can be followed to identify response bias. If validation data are
available, survey outcomes can be checked against
them and response bias can be more precisely identified. Similarly, the use of split ballot experiments
can provide insights about deviations across different
conditions and point out biasing factors. Including
interviewer characteristics and other external data to
analyze survey outcomes can help reveal hidden
effects. At any rate, the researcher should check for
possible response biases in data, take bias into account
when interpreting results, and try to identify the cause
of the bias in order to improve future survey research.
Ana Villar
See also Acquiescence Response Bias; Context Effect;
Interviewer-Related Error; Measurement Error; Primacy Effect
RESPONSE LATENCY
Response latency is the speed or ease with which
a response to a survey question is given after a respondent is presented with the question. It is used as an
indicator of attitude accessibility, which is the strength
of the link between an attitude object and a respondent's evaluation of that object. While response latency
has been used for some time in cognitive psychology
lab experiments, its use in surveys came about more
recently. In telephone surveys, response latency is
measured in milliseconds as the elapsed time from
when an interviewer finishes reading a question until
a respondent begins to answer.
There are four stages that survey respondents use
when answering questions: (1) question comprehension,
(2) retrieval of information from memory, (3) integration of the information to form a judgment, and (4)
selection of an appropriate response option. Response
latency measures the time it takes to retrieve, form, and
report an answer to a survey question. Response latency
data can provide much useful information about attitude
accessibility that can be incorporated into data analysis.
For example, when attitudes are modeled to predict
subsequent behavior, respondents with more accessible
attitudes (indicated by shorter response times) often
exhibit a stronger relationship between the attitude and
the subsequent behavior. Attitude accessibility as measured by response latency is just one way of measuring
the strength of an attitude, but it can be consequential
for attitude stability, a respondent's resistance to persuasion, as well as the influence of the attitude on behavior.
Response latency has also been used as a method of
pretesting survey questionnaires in order to identify
problematic questions.
Response latency was first used in cognitive psychology lab experiments where the timer measuring
the response latency is a function of the participant's
own reaction time to a self-administered instrument.
When adapted for use in telephone surveys, it is generally measured via a voice-activated or automatic
timer (which requires special equipment) that senses
when a response is given or through an interviewer-activated or active timer embedded into the programming of computer-assisted telephone interviewing (CATI) software. The active timer requires the
interviewer to start and stop the timer at the appropriate time using the computer keyboard and then verify
that the time measurement is valid. Response latencies are coded as invalid if the interviewer fails to
apply the timer correctly or if the respondent asks for
the question to be repeated. Response latency can also
be measured using a latent or unobtrusive timer that
is programmed into CATI software and is invisible to
both interviewers and respondents. Latent timers simply measure the total duration of each question from
the time the question appears on the interviewer's screen until the moment the respondent's answer is recorded. Such timers also can be used in computer-assisted personal interviewing (CAPI) and computerized self-administered questionnaires (CSAQ).
Regardless of how the response latency data are collected, the distribution of responses is frequently
skewed, and the data require careful examination and
cleaning before analysis. Invalid data and extreme outliers should be removed and the data transformed to
eliminate the skew. Depending on how the data are
collected, researchers using response latency data may
also want to control for baseline differences among
respondents in answering questions and interviewers in
recording survey responses because some respondents
are naturally faster in answering questions and some
interviewers are naturally faster in recording responses.
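A minimal sketch of the cleaning steps just described (the latency values, the invalid-timer code, and the 10-second outlier cutoff are all invented for illustration):

```python
import numpy as np

# Hypothetical latencies in milliseconds; 0 marks an invalid timer code.
latencies_ms = np.array([420, 610, 380, 550, 700, 480, 15000, 0])

valid = latencies_ms[latencies_ms > 0]    # drop invalid timer codes
trimmed = valid[valid < 10_000]           # drop extreme outliers
log_latency = np.log(trimmed)             # reduce the positive skew

print(np.round(log_latency, 2))
```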
J. Quin Monson
See also Attitudes; Attitude Strength; Comprehension;
Computer-Assisted Personal Interviewing (CAPI);
Computer-Assisted Telephone Interviewing (CATI);
Computerized Self-Administered Questionnaires (CSAQ);
Outliers; Retrieval
RESPONSE PROPENSITY
Response propensity is the theoretical probability that
a sampled person (or unit) will become a respondent
in a specific survey. Sampled persons differ in their
likelihood to become a respondent in a survey. These
differences are a result of the fact that some persons
are easier to get into contact with than are others,
some persons are more willing and able to participate
in a specific survey than others, or both. Response
propensity is an important concept in survey research,
as it shapes the amount and structure of unit nonresponse in a survey. The covariance between response
propensity and the survey variable of interest determines the bias in survey estimates due to nonresponse.
Theoretically, using response propensities, a researcher
could entirely correct for nonresponse bias. However,
such a correction requires that the researcher know the
true value of this unobservable variable. In practice,
researchers can use only estimates of response propensities, so-called propensity scores, using a logistic
model that hopefully captures the concept well. The
extent to which the nonresponse bias can be corrected
using propensity scores depends on the quality of the
propensity score model.
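The covariance relationship noted above is often written, in an approximation commonly attributed to Jelke Bethlehem, as Bias(ȳr) ≈ Cov(ρ, y)/ρ̄, where ρ is response propensity, y is the survey variable, and ρ̄ is the mean propensity, so the bias vanishes when propensity and the survey variable are uncorrelated. The following minimal sketch (with invented frame variables and data; scikit-learn's logistic regression stands in for whatever propensity model a researcher might actually specify) estimates propensity scores and forms inverse-propensity nonresponse weights:

```python
# In practice the covariates must be available for respondents and
# nonrespondents alike (e.g., from the sampling frame).
import numpy as np
from sklearn.linear_model import LogisticRegression

# One row per sampled person: urban indicator (0/1) and age.
X = np.array([[1, 34], [0, 61], [1, 45], [0, 23], [1, 58], [0, 41]])
responded = np.array([1, 1, 0, 0, 1, 0])  # 1 = became a respondent

model = LogisticRegression().fit(X, responded)
propensity_scores = model.predict_proba(X)[:, 1]

# Weight respondents by the inverse of their estimated propensity so
# that people resembling the hard-to-reach count for more.
weights = 1.0 / propensity_scores[responded == 1]
print(np.round(propensity_scores, 2), np.round(weights, 2))
```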
[Figure 1: Response Propensity. The figure diagrams the influences on response propensity: sampled-person characteristics (sociodemographic characteristics, psychological disposition) and survey design features (topic, sponsor, burden, incentives, interviewers, survey period, number of callbacks, refusal conversion, and incentive type and amount).]
The advantage of using the concept of response propensity over traditional methods is that this approach
allows incorporating theoretical notions about survey
participation into the procedure of bias reduction. From
this perspective it makes sense to use a procedure that
reflects the two-step process of survey participation and estimates multivariate logistic models for contactability and cooperation separately. The estimated models can then be used in a two-stage procedure to adjust the original sampling weights, which correct for unequal selection probabilities, into weights that also correct for nonresponse bias. The quality of the procedure depends on
the quality of the response propensity models. In practice, the quality of these models is still rather low.
However, it is a challenge to find concepts that have
strong relationships with contactability and cooperation, to obtain measures of these concepts on both
respondents and nonrespondents, and to use these in
estimating response propensity to correct for nonresponse bias.
Adriaan W. Hoogendoorn
See also Contactability; Contact Rate; Contacts; Cooperation;
Cooperation Rate; Leverage-Saliency Theory;
Nonresponse; Nonresponse Bias; Post-Stratification;
Propensity Scores; Refusal Conversion; Response
RESPONSE RATES
A response rate is a mathematical formula that is calculated by survey researchers and is used as a tool to
understand the degree of success in obtaining completed interviews from a sample. In probability samples, where the intent of a survey is to project the
results of the data onto a population (e.g., all adults
in the United States), statistical theory rests on an
assumption that data are collected from every unit, or
person, selected. In practice, it is extremely rare for
Background
Although nonresponse has been studied since the
1940s, serious efforts to standardize the measurement
of nonresponse have arisen only within the last quarter of the 20th century. Furthermore, the common use
of standardized response rate measurements has not
yet been fully realized throughout the survey research
profession.
Traditionally, there has been a great deal of overlap and inconsistency in both the definitions and formulas used to understand the concept of response
rates. These discrepancies present a difficulty to the
survey research profession because they often confuse
consumers of survey information. Using consistent
outcome rates is important because it allows the level
of nonresponse to be compared more easily between
different surveys. This provides researchers and clients or other end-users with a meaningful target when
planning the design of research. Equally important,
standard outcome rates offer an important benchmark
for understanding how well surveys performed.
For example, a lack of consistency prohibits the
accurate comparison of nonresponse between two
unique surveys, obscures agreement in target levels of
nonresponse in research proposals, and hampers methodological research exploring nonresponse error.
In response to the historical differences among
response rate calculations, the survey research profession has gradually worked toward a uniformly accepted
set of formulas and definitions for nonresponse. These
efforts are now spearheaded by the American Association for Public Opinion Research (AAPOR), which
maintains a series of definitions, formulas, and dispositions that are continuously updated to reflect new technologies and changes in the survey research profession.
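To make one of these standard formulas concrete, AAPOR's Response Rate 1 (RR1) divides completed interviews by all eligible cases plus all cases of unknown eligibility; the sketch below uses invented case counts:

```python
# AAPOR Response Rate 1 (RR1): completed interviews divided by all
# eligible cases plus all cases of unknown eligibility.
def aapor_rr1(I, P, R, NC, O, UH, UO):
    """I = completes, P = partial interviews, R = refusals and
    breakoffs, NC = non-contacts, O = other eligible nonrespondents,
    UH/UO = unknown-eligibility households/other cases."""
    return I / ((I + P) + (R + NC + O) + (UH + UO))

rr1 = aapor_rr1(I=800, P=50, R=300, NC=250, O=40, UH=100, UO=60)
print(round(rr1, 3))  # 0.5 -> 800 completes out of 1,600 cases
```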
The meaning of response rates is often misinterpreted. It is important to view survey response rates in the context of
the survey design.
Recent research in survey methods lends support to
the idea that response rates must be considered along
with other information. This convention contrasts somewhat with previous notions of researchers who believed
a certain minimum response rate would offer sufficient protection (or mitigation) against nonresponse error, a belief now recognized to be mistaken.
RETRIEVAL
Basic memory processes fundamentally affect the
answers survey respondents give to survey questions,
and retrieval is one of these memory processes.
Retrieval refers to the active recovery of previously
encoded information from long-term memory into
conscious, working memory. Long-term memory that
requires conscious retrieval is divided into (a) memory for facts and (b) memory for events. It is important for questionnaire designers to be aware of the
limitations that retrieval processes place on the survey
experience for the respondent.
The process of retrieving information and memories
for events is closely related to the initial process of
encoding, when a memory for the event is first stored
by transferring information from active experience into
long-term storage. The most difficult retrieval situation
is unaided (free) recall, where there are no cues provided that relate to the desired information. However,
questionnaires can avoid this situation by including
cues in the survey questions that aid in recall. Retrieval
is maximized when the cues present during encoding
closely match the cues present in the questionnaire. If, for example, memory for a particular event is typically encoded in terms of time (e.g., filing your taxes), then survey questions that cue respondents to think about the event temporally will aid recall. The more cues there are, the more precise the retrieved information will be; for example, memory for a person's name may be aided by cueing specific events relating to that person, picturing that person's physical characteristics, thinking about the sound of his or her voice, or thinking about common acquaintances. Although the match between
encoding and retrieval cues is important, it is also the
case that, all other things being equal, some cues are
generally better than others in aiding recall for events.
Cues relating to the type of event work best; next best
are cues relating to location and people and, finally,
cues relating to time.
The depth of initial encoding is also important to later success in retrieving that information from memory; this phenomenon is referred to as the levels-of-processing effect. It is easier to remember information that was initially processed more deeply in comparison to information that was processed superficially. For example, information that is processed both visually (i.e., pictorially) and verbally is easier to remember than information processed in only one fashion.
Retrieval is greatly affected by the type of information that a survey question requires respondents to
access. Memory access is first and foremost affected
by how long ago the memory was initially encoded.
Retrieval for older information and events is more
difficult and error-prone, whereas memory for more
recent information is easier to access. Certain kinds of information are also inherently easier or harder to retrieve from memory. Dates and names, for example, are forgotten easily. Information about a unique event is easier to remember, whereas a specific event that is a repetitive and common experience of the respondent is more difficult to remember and more prone to errors. For example, it is easy to remember the details
of meeting the president because this information is
distinctive in comparison to other information in
memory. In contrast, retrieving specific information
about your last trip to the grocery store is more difficult to remember accurately because there are so
many similar events in your past that the details for
each event seem to blur into a general memory for
going to the grocery store. This difficulty leads to
errors in identifying the source of a memory; that is,
respondents are unable to accurately identify the specific originating event of the retrieved information.
It is important to note that the retrieval of information from long-term memory is a reconstructive process that is fraught with possible error. When memory
for an event or fact is accessed, this remembrance is
not a perfect replica of that event or initially encoded
fact. Retrieval errors affect survey responses in three
common ways: (1) Forgetting occurs when respondents are simply unable to access information stored
in memory, or the retrieved memory is partial, distorted, or simply false (false memories may occur
even when respondents are quite confident in the
accuracy and validity of the memory); (2) estimation
occurs when respondents make a best educated
guess that stands in for the retrieval of exact information, which often occurs because respondents satisfice;
that is, they attempt to finish a survey by answering
questions with as little effort as possible; and (3) telescoping occurs when respondents inaccurately include
events that belong outside the reference period a survey
question requires.
REVERSE DIRECTORY
A reverse directory has two general definitions. The
first is a residential telephone directory, which has
been converted from a surname alphabetical ordered
listing to a street name ordered listing. The second is
a listing of addresses in a city, also known as a city
directory.
A telephone directory covering a city or other geographic area consists of residential listed telephone numbers. The directory is in alphabetical order by surname. A first name or initials generally accompany the surname. The street address is then listed, although in some areas a household may decide not to have its address listed in the telephone directory. The telephone number
ρ (RHO)
A statistic that quantifies the extent to which population units within clusters are similar to one another (i.e., the degree of homogeneity within clusters) is called the intraclass correlation coefficient (ICC) and is often denoted by the Greek letter rho (ρ).
When population units are grouped or clustered
into larger units, which are themselves easier to identify and sample (i.e., children grouped into classrooms or elderly citizens grouped into nursing homes), one- or two-stage cluster sampling becomes an appealing,
cost-effective, and practical choice for a sampling
strategy. These benefits are often counterbalanced by
the usual expected loss in efficiency and precision in
estimates derived from cluster samples that is, in large
part, due to the fact that units within clusters tend to
be more similar to each other compared to units in the
general population for many outcomes of interest.
The computation of ρ essentially provides a rate of homogeneity for elements within a cluster relative to the overall population variance, as seen by the following equation for a population of C clusters, each containing M units, with values yij, cluster means ȳiU, and overall mean ȳU:

ICC = ρ = 1 − [M/(M − 1)] × (SSW/SST),

where SSW = Σi Σj (yij − ȳiU)² is the within-cluster sum of squares, SST = Σi Σj (yij − ȳU)² is the total sum of squares, and S² = SST/(CM − 1) is the overall population variance. For the population shown in Table 1,

ICC = ρ = 1 − [8/(8 − 1)] × (5.932/114.690) = 0.941.
Table 1    Population values of fruit and vegetable intake for five health club franchises

Cluster  Staff Member  Average Daily Fruit and Vegetables  Cluster Average  (yij − ȳiU)²  (yij − ȳU)²
1        1             2.05                                2.35             0.090         2.250
1        2             2.20                                                 0.023         1.823
1        3             2.15                                                 0.040         1.960
1        4             2.30                                                 0.003         1.563
1        5             2.45                                                 0.010         1.210
1        6             2.50                                                 0.023         1.103
1        7             3.05                                                 0.490         0.250
1        8             2.10                                                 0.063         2.103
2        1             4.00                                4.03             0.001         0.203
2        2             3.75                                                 0.078         0.040
2        3             4.10                                                 0.005         0.303
2        4             4.35                                                 0.102         0.640
2        5             4.05                                                 0.000         0.250
2        6             3.90                                                 0.017         0.123
2        7             4.00                                                 0.001         0.203
2        8             4.05                                                 0.000         0.250
3        1             3.35                                3.38             0.001         0.040
3        2             3.50                                                 0.014         0.002
3        3             3.65                                                 0.073         0.010
3        4             3.20                                                 0.032         0.123
3        5             3.10                                                 0.078         0.203
3        6             3.40                                                 0.000         0.023
3        7             3.45                                                 0.005         0.010
3        8             3.40                                                 0.000         0.023
4        1             6.10                                6.40             0.090         6.503
4        2             7.10                                                 0.490         12.603
4        3             6.25                                                 0.023         7.290
4        4             5.95                                                 0.203         5.760
4        5             7.05                                                 0.422         12.250
4        6             6.40                                                 0.000         8.123
4        7             6.05                                                 0.123         6.250
4        8             6.30                                                 0.010         7.563
5        1             0.50                                1.61             1.232         9.303
5        2             0.75                                                 0.740         7.840
5        3             1.25                                                 0.130         5.290
5        4             1.65                                                 0.002         3.610
5        5             2.30                                                 0.476         1.563
5        6             2.25                                                 0.410         1.690
5        7             2.05                                                 0.194         2.250
5        8             2.10                                                 0.240         2.103
Total                                                                       5.932 (SSW)   114.690 (SST)

Note: ȳU = 3.553.
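For readers who want to verify the computation, the following sketch recomputes SSW, SST, and ρ from the Table 1 values (the grouping into five franchises of eight staff members each is taken from the table layout).

```python
# Sketch: recomputing rho (the ICC) from the Table 1 population values,
# using rho = 1 - (M/(M - 1)) * (SSW/SST) with C = 5 clusters of M = 8.

clusters = [
    [2.05, 2.20, 2.15, 2.30, 2.45, 2.50, 3.05, 2.10],  # franchise 1
    [4.00, 3.75, 4.10, 4.35, 4.05, 3.90, 4.00, 4.05],  # franchise 2
    [3.35, 3.50, 3.65, 3.20, 3.10, 3.40, 3.45, 3.40],  # franchise 3
    [6.10, 7.10, 6.25, 5.95, 7.05, 6.40, 6.05, 6.30],  # franchise 4
    [0.50, 0.75, 1.25, 1.65, 2.30, 2.25, 2.05, 2.10],  # franchise 5
]

M = len(clusters[0])                                 # cluster size, 8
values = [y for cluster in clusters for y in cluster]
grand_mean = sum(values) / len(values)               # y-bar_U, about 3.553

# SSW: squared deviations of each value from its own cluster's mean.
ssw = sum((y - sum(c) / M) ** 2 for c in clusters for y in c)
# SST: squared deviations of each value from the grand mean.
sst = sum((y - grand_mean) ** 2 for y in values)

rho = 1 - (M / (M - 1)) * (ssw / sst)
print(f"SSW = {ssw:.3f}, SST = {sst:.3f}, rho = {rho:.3f}")
# SSW = 5.932, SST = 114.690, rho = 0.941
```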
ROLE PLAYING
Role playing is an educational technique that is used in the training of survey interviewers. It involves face-to-face or telephone interviewers practicing the tailored use of the introduction of the survey in which an attempt is made to gain cooperation from a respondent,
and practicing the proper way to administer the questions to respondents, or both. Role playing generally
takes place toward the end of training, after the interviewers have been methodically exposed to the entire
questionnaire by the trainer, on a question-by-question
basis. The role playing part of training may last as long
as 2 hours depending on the length and complexity of
the questionnaire. Interviewers generally enjoy the role
playing part of training, in part because it allows for
their active participation in the training process.
Role playing typically takes two forms. One is
where the supervisor or trainer takes on the persona of
a respondent and interviewers in the training session go
through the introduction or the questionnaire, or both,
in a round-robin fashion asking the questions to the
trainer and going on to the next appropriate question
depending on the answer just given by the trainer. The trainer often varies his or her persona while this is happening: sometimes being cooperative and other times being uncooperative when the introduction is being practiced, and sometimes being an easy respondent and other times being a difficult respondent when questions are being asked. The second form of role playing is where interviewers are paired up and take turns interviewing each other with the questionnaire. As this is taking place, supervisory personnel typically move around the room observing the practice and making tactful suggestions for improvement where appropriate.
ROLLING AVERAGES
Rolling averages, also known as moving averages, are
a type of chart analysis technique used to examine survey data collected over extended periods of time, for
example, in political tracking polls. They are typically
utilized to smooth out data series. The ultimate purpose
of rolling averages is to identify long-term trends. They
are calculated by averaging a group of observations of
a variable of interest over a specific period of time. Such
averaged number becomes representative of that period
in a trend line. It is said that these period-based averages
roll, or move, because when a new observation is
gathered over time, the oldest observation of the pool
being averaged is dropped out and the most recent
observation is included into the average. The collection
of rolling averages is plotted to represent a trend.
An example of how rolling averages are calculated is
as follows. Imagine that a survey analyst is interested in
computing 7-day rolling averages (i.e., a 1-week period) for a period of 52 weeks (364 days, approximately 1 year).
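A minimal sketch of the computation, using a short hypothetical daily tracking series, is as follows.

```python
# Sketch: 7-day rolling averages for a hypothetical daily tracking series.
# Each day the oldest of the 7 pooled observations drops out and the
# newest is added, so the average "rolls" forward through the series.

def rolling_averages(daily_values, window=7):
    return [
        sum(daily_values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(daily_values))
    ]

daily_support = [48, 51, 50, 49, 53, 52, 50, 47, 49, 51]  # hypothetical %
print(rolling_averages(daily_support))
# The first value averages days 1-7, the second days 2-8, and so on.
```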
The Tools
The iPOLL database contains a half-million questions
and responses from surveys conducted in the United
States since 1935. Using simple search terms, it is
possible to identify hundreds of relevant questions on
most public policy and social issues in a matter of
seconds. Of these half-million questions, about 60%
are from surveys where the entire data set has been
archived with the Roper Center; the other 40% are
from surveys that have yet to arrive, are only available in report form, or are archived elsewhere. Using
the integrated RoperExpress service, it is possible to
move directly from a question in the iPOLL database
to the documentation and download the data set, as
well. This service permits researchers to conduct secondary analyses (e.g., cross-tabulations that reveal
more details about respondents answering in a particular way) on their desktop computers.
The drill-down functionality that permits this on-demand download of the documentation and data files is distinctive to the Roper Center's data archives.
Although systems exist that permit searching at the
question level, most of those organizations do not
maintain the raw data set files. Conversely, there are
institutions that offer direct download of data set files
but have not developed a question-level retrieval
system.
example, different periods of time. However, composite estimation can reduce variances further in many
cases, especially where estimates of means or totals are
desired.
Many factors are considered before implementing
a rotating panel design. Of primary concern are the
types of data and estimates to be produced and how
they rank in importance. The relative importance of
estimates of level (totals, means, and proportions) at
specific points in time, change across periods, and
averages over time must be weighed. The variances
and biases of these estimators will typically depend on
the type of design, the amount of rotation, and the correlations among the rotation group estimates over time.
Operational aspects, such as survey costs, the mode
of data collection, and respondent burden, are important considerations as well. For surveys that are conducted repeatedly, much of the cost may be incurred
the first time a sample unit is contacted. For example,
canvassing a region to devise an area sample can be
very expensive, especially if the units are interviewed
only once. Even with list sampling, the tasks may
include (a) preparing introductory questionnaire materials, (b) finding the household or establishment (for
a personal visit), (c) explaining the survey or special
procedures to the respondent, (d) filing background
information about the sample unit, and (e) preparing
the database for each unit. Many of these steps need
not be repeated if the same unit is canvassed a second
or third time, as in a rotation design. For example, in
some household surveys, the first contact is made in
person because a telephone number is not available.
When a rotating panel design is used, subsequent contacts can often be completed by telephone, reducing
the cost per interview.
Issues of respondent burden and response are relevant in a rotating panel design. The burden on an individual household or establishment typically increases
with the number of interviews and the length of the
questionnaire. With increased burden might come
decreased response rate or lower-quality data. For
household surveys, how to handle people who move
and what effect they have on the estimates can be
a problem. These issues become more prominent as the
time between contacts with the sample unit increases.
For business surveys, various approaches have been proposed for continuous sample selection while controlling the overlap over time and the burden. The permanent random numbers technique has been implemented in several countries, whereby a unique random number is assigned to each sample unit and retained across survey occasions.
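The following sketch illustrates the general permanent-random-number idea under simplifying assumptions (each frame unit keeps one Uniform(0,1) draw across occasions, and an occasion samples the units whose number falls in a chosen interval); actual national implementations differ in their details.

```python
# Sketch of the permanent-random-number (PRN) idea: every frame unit keeps
# a fixed Uniform(0,1) draw across survey occasions, and an occasion
# samples the units whose PRN falls in a chosen interval. Shifting the
# interval's start rotates units in and out while controlling the overlap
# between occasions. (Hypothetical frame; real implementations differ.)
import random

random.seed(7)
frame = {unit_id: random.random() for unit_id in range(1, 1001)}

def prn_sample(frame, start, fraction):
    """Units whose PRN lies in [start, start + fraction), wrapping at 1.0."""
    end = (start + fraction) % 1.0
    if start < end:
        return {u for u, prn in frame.items() if start <= prn < end}
    return {u for u, prn in frame.items() if prn >= start or prn < end}

wave1 = prn_sample(frame, start=0.00, fraction=0.10)  # ~10% sample
wave2 = prn_sample(frame, start=0.02, fraction=0.10)  # start shifted
print(len(wave1), len(wave2), len(wave1 & wave2))     # heavy overlap
```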
S
SALIENCY

Saliency refers to the degree to which a topic or event resonates with a prospective respondent or sample member. The more a topic or event resonates with a sample member, the more salient, or important, that topic or event tends to be in that person's life. Conversely, topics or events that resonate little or hold little importance for the sample member are said to have little saliency.

What are the implications of saliency for survey researchers? Saliency actually operates at two levels: the question level and the survey level.

On the question level, saliency refers to the importance of an event or action in a person's life. More important events or actions are better remembered than actions or events of low saliency. Consequently, saliency can affect the accuracy with which an event is remembered, which, in turn, can affect the accuracy of the response. More important or unusual events are generally remembered with greater accuracy than are common or frequent events. For example, most people can tell you, with little effort, their date of birth, the highest degree they have completed, what major illnesses they have suffered, or how many children they have. Similarly, given the significance of the event, many can tell you where they were when they heard the news of 9/11 or when John F. Kennedy was shot. Items of lesser importance, on the other hand, have lower saliency and are thus more difficult to remember. For example, recalling the number of times you have visited the grocery store in the past month or the number of movies you have seen in the past year can be difficult. While remembering how many movies you have seen in the past year is probably difficult, remembering the number you have seen in the past week is probably not. This illustrates an important point: the lower the saliency of an item, the shorter the reference period should be.

On the survey level, the saliency of the survey topic refers to the degree to which the subject matter of the survey resonates for the population being surveyed. If the questions being asked are of great interest to the average sample member, the survey is said to be highly salient, whereas surveys where the subject being investigated is of little interest are said to have low saliency. Gaining cooperation or attaining a high response rate is made more difficult when the saliency is low, because sample members have little motivation to respond. On the other hand, when the central topic of a survey is one of great interest to those being surveyed, sample members are more likely to respond. For them, the burden of responding is compensated for by their interest in the topic. Thus, saliency is an important factor when thinking about response rates and the level of effort required to attain a certain response rate. For example, a questionnaire with high saliency and low respondent burden (e.g., takes minimal time to complete, is easy and straightforward to understand) will require much less effort to attain a high response rate than will a survey that has both low saliency and high respondent burden (i.e., takes a long time to complete, requires a great deal of effort to understand and complete).
Geraldine Mooney

Further Readings

Groves, R. M., Singer, E., & Corning, A. (2000). Leverage-salience theory of survey participation. Public Opinion Quarterly, 64, 299–308.
SAMPLE
In survey research, a sample is a subset of elements
drawn from a larger population. If all elements from
the larger population are sampled for measurement,
then a census is being conducted, not a sample survey.
In a broad context, survey researchers are interested in obtaining some type of information for some
population, or universe, of interest. A sampling frame,
that is, a frame that represents the population of interest, must be defined. The sampling frame may be
identical to the population, or it may be only part of
the population and is therefore subject to some undercoverage, or it may have an indirect relationship to
the population (e.g., the population is males and the
frame is telephone numbers). It is sometimes possible
to obtain the desired information from the entire
population through a census. Usually, however, for
reasons of cost and time, survey researchers will only
obtain information for part of it, referred to as a sample of the population. There may be several different
samples selected, one for each stage of a multi-stage
sample. For example, there may be a sample of counties, a sample of blocks within sampled counties, a
sample of addresses within sampled blocks, and a sample of persons from sampled addresses.
A sample can be obtained in many different ways,
as defined by the sample design. Survey researchers
usually will want to have a probability sample, which
ensures that all units in the frame have a known nonzero probability of selection, rather than a convenience sample or nonprobability sample, which do not provide this assurance.
SAMPLE DESIGN
A sample design is the framework, or road map, that
serves as the basis for the selection of a survey sample
and affects many other important aspects of a survey
as well. In a broad context, survey researchers are
interested in obtaining some type of information
through a survey for some population, or universe, of
interest. One must define a sampling frame that represents the population of interest, from which a sample
is to be drawn. The sampling frame may be identical
to the population, or it may be only part of it and is
therefore subject to some undercoverage, or it may
have an indirect relationship to the population (e.g.,
the population is preschool children and the frame is
a listing of preschools). The sample design provides
the basic plan and methodology for selecting the
sample.
A sample design can be simple or complex. For
example, if the sampling frame consists of a list of
every unit, together with its address, in the population
of interest, and if a mail survey is to be conducted,
then a simple list sampling would be appropriate; for
example, the sample design is to have a sampling
interval of 10 (select every 10th unit) from the list.
The sample design must vary according to the nature
of the frame and the type of survey to be conducted
(the survey design). For example, a researcher may
want to interview males through a telephone survey.
In this case, the sample design might be a relatively
simple one-stage sample of telephone numbers generated using random-digit dialing.
SAMPLE MANAGEMENT
In survey research, sample management refers to the efforts that must be made to coordinate the processing of sampled cases during a survey's field period so as to achieve the highest possible response rates within the finite budget that is allocated for data collection. This requires coordination between whatever software is used to help manage the sample and the person(s) doing the managing.
The person or persons charged with sample management typically do this on a daily basis throughout
the field period. For smaller surveys that involve a
few hundred cases, this might be done by one person
who may also have responsibility for other supervisory tasks. This person may manage the sample manually. In very large surveys with thousands of cases (including some large random-digit dialing surveys with millions of sampled telephone numbers), an entire team of supervisory personnel, whose sole responsibility is sample management, will be needed, along with specialized computer software to manage the sample.
Managing a survey sample differs greatly depending on whether or not the survey uses interviewers
to gather the data. The following sections address
sample management when the survey is interviewer-administered and when it is self-administered.
Sample Management in
Interviewer-Administered Surveys
In interviewer-assisted surveying, both for in-person
and telephone surveys, sample management can be
viewed as being composed of two primary responsibilities that often are in direct conflict with each other.
One of these responsibilities is to allocate enough
sample cases to keep interviewers productive and, in
turn, to employ the right number of interviewers to
process the sample so as to achieve the number of
completed interviews that are required for the survey
within the field period. The other responsibility is to
activate just enough cases so that the field period
ends with the highest possible response rates at the
lowest possible cost.
Based on past experience, the sample manager of
an interviewer-administered survey will have an informed expectation of approximately how many sample
cases will need to be activated from the available
sample (i.e., the sampling pool) during the survey
field period to achieve the required sample size of
completed interviews. Oftentimes the sample will be
divided into sample replicates, each containing random subsets of cases. Once a replicate of cases has
been activated, the entire replicate must be fully processed to allow each case to reach its proper final
outcome.
If cases are not worked enough during the field
period, response rates will suffer. Thus, the sample
managers need to activate the last replicate of cases,
leaving enough time in the field period to allow those
cases to be processed fully. If not enough time is left
to process this final set of cases, the most hard-toreach respondents within those cases will not be contacted and interviewed. In contrast, if cases are contacted too frequently during the field period, for
example, by having an overabundance of interviewers
constantly making contact attempts, response rates
also will suffer because respondents will become
annoyed if the survey organization contacts them too
frequently. Thus, the challenge for the sample managers is to have the right number of sample cases
active at any one time and the right number of interviewers scheduled to process these cases.
To this end, the sample manager likely will need
to track the progress that interviewers are making on
the sample on a daily basis. The metrics that the manager will consider include the hourly or daily productivity of interviewers: Are they gaining completions at the rate that was expected?
Sample Management in
Self-Administered Surveys
In mail, Internet, and Web surveys, in which interviewers play no part in the data collection, managing
Sample Management in
Mixed-Mode Surveys
As response rates have dropped during the past
two decades, the need for researchers to implement
mixed-mode surveys, which often can achieve higher
response rates at cost-favorable levels, has become
more commonplace. For example, using an address-based sampling frame would allow a researcher to
implement a design that begins the field period with
a mail survey, as well as offering an Internet mode
for completing the questionnaire and other survey
task(s). Then, nonresponders for whom the researchers have a telephone number matched to their address
would be followed up using telephone interviewers.
Finally, in-person interviewers would be sent during
the last stage of the field period to all addresses that
have not responded to previous modes of contact.
Managing a mixed-mode sample design such as this
is much more complex than managing a sample for
a survey that uses a single mode of data collection.
In the case of the mixed-mode approach, with multiple channels allowed for data to be gathered and to be returned to the survey organization, it is paramount that the sample managers have the proper software systems in place to capture and safely store the incoming data and to accurately track, on a daily basis, the status of every case that remains active.
If this does not happen, the response rates for the
survey will suffer in ways that could have been
avoided had a better sample management system
been deployed.
Paul J. Lavrakas
See also Address-Based Sampling; Case; Case-Control
Study; Contact Rate; Control Sheet; Face-to-Face
Interviewing; Field Period; Hit Rate; Internet Surveys;
Mail Survey; Mixed Mode; Refusal Conversion;
Response Rates; Sample; Sample Replicates; Sampling
Pool; Supervisor; Supervisor-to-Interviewer Ratio; Survey
Costs; Telephone Surveys; Total Design Method (TDM);
Web Survey
SAMPLE PRECINCT
Sample precinct refers to a sampling unit used in data
collection and analysis on Election Day. The term is
most commonly associated with media exit polling
and election night projections. For voting purposes,
jurisdictions such as counties or townships are divided
into precincts based on geography. A precinct is the
smallest voting unit in a jurisdiction. Voters are
assigned to a precinct, which typically has one polling
place where voters can go to cast their ballots. Data
are collected from sample precincts, which are used
to form estimates of the vote and voter opinions and
characteristics.
For Election Day analysis, a sample of precincts
is selected because it would be too costly and time
consuming to collect data from all precincts in a
jurisdiction (e.g., a state or city). Prior to the election, a representative sample of precincts is selected
for a given jurisdiction. A listing of all precincts in
the jurisdiction is compiled, and a probability sample
of precincts is selected. Typically, stratified sampling
is used to increase the precision of the estimates.
Precincts can be stratified using such variables as historical voting patterns, geography, and race/ethnicity.
In practice, sample precincts are used for two
purposes. The first is to provide data to project the
outcome of the election using actual votes from the
sample precincts. Workers, sometimes called stringers,
are sent to the sample precincts on Election Day. As
soon as possible after the polling place closes, the stringer's job is to get the actual vote totals from the polling place official and call these results in to a central location, where they are tabulated and fed into computerized statistical models for analysis. Sometimes it is
also possible to get the sample precinct votes via the
Internet or by phone. The sample precinct models can
take various forms, including ratio and regression estimators. For statewide elections, a typical sample size
for this purpose is about 80 precincts and usually varies from 15 to 100 depending on the characteristics of
SAMPLE REPLICATES
A sample replicate is a random subset of the entire
available sample (i.e., sampling pool) that has been
drawn for a particular survey. Sample replicates help
survey managers coordinate the progress that is made
on data collection during the surveys field period.
Sample replicates often are made up of 1,000 randomly assigned sampled elements, although sometimes replicates may be as small as 100.
The value of structuring a survey sampling pool
into replicates is that it is not possible to know in
advance exactly how many of the telephone numbers,
emails, or street addresses in a sampling pool actually
SAMPLE SIZE
The sample size of a survey most typically refers to
the number of units that were chosen from which data
were gathered. However, sample size can be defined
in various ways. There is the designated sample size,
which is the number of sample units selected for contact or data collection. There is also the final sample
size, which is the number of completed interviews or
units for which data are actually collected. The final
sample size may be much smaller than the designated
sample size if there is considerable nonresponse,
ineligibility, or both. Not all the units in the designated sample may need to be processed if productivity
in completing interviews is much higher than anticipated to achieve the final sample size. However, this
assumes that units have been activated from the designated sample in a random fashion. Survey researchers
may also be interested in the sample size for subgroups of the full sample.
In planning to conduct a survey, the survey
researcher must decide on the sample design. Sample
size is one aspect of the sample design. It is inversely
related to the variance, or standard error of survey
estimates, and is a determining factor in the cost of
the survey. In the simplest situation, the variance is
a direct function of the sample size. For example, if
a researcher is taking a simple random sample and is
interested in estimating a proportion p, then
Variance = p(1 − p)/n,
where n is the sample size. More generally,
Variance = f × p(1 − p)/n,
where f is the design effect, which reflects the effect
of the sample design and weighting on the variance.
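As a numeric sketch (the values of p, n, and f here are hypothetical), the design effect inflates the simple random sampling variance and correspondingly deflates the effective sample size discussed below.

```python
# Sketch: variance of an estimated proportion under a complex design,
# Variance = f * p(1 - p)/n, plus the effective sample size n/f.
# p, n, and the design effect f are hypothetical.

p, n, f = 0.30, 1000, 1.8

srs_variance = p * (1 - p) / n       # simple random sampling baseline
design_variance = f * srs_variance   # inflated by the design effect
effective_n = n / f                  # SRS size giving equal precision

print(f"SRS variance    = {srs_variance:.6f}")     # 0.000210
print(f"design variance = {design_variance:.6f}")  # 0.000378
print(f"effective n     = {effective_n:.0f}")      # 556
```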
In the planning effort for a more complex survey,
it is preferable to not focus directly on sample size. It
is best to either (a) set a budget and determine the
sample size and sample design that minimize the variance for the available budget, or (b) set a desired variance or required reliability, possibly using statistical
power analysis, and then determine the sample size
and sample design that minimize costs while achieving the desired variance. Sample size does not solely
determine either the variance or the cost of a survey
and thus is not generally, by itself, a meaningful planning criterion. More useful is effective sample size,
which adjusts the sample size for the sample design,
weighting, and other aspects of the survey operation.
Even better than fixing the budget and minimizing
the variance is to minimize the mean square error
(MSE); alternatively, the researcher can set a desired MSE and
minimize the cost to obtain it. MSE is defined as the
sum of the variance and the bias squared and thus
accounts for more than just the variance. One sample
design option may have a larger sample size and
a lower variance than a second option but have
a larger MSE, and thus it would be a poor choice.
A common misconception is that the needed sample size is a function of the size of the population of
SAMPLING
Sampling is the selection of a given number of units
of analysis (people, households, firms, etc.), called
cases, from a population of interest. Generally, the
sample size (n) is chosen in order to reproduce, on a
small scale, some characteristics of the whole population (N).
Sampling is a key issue in social research designs.
The advantages of sampling are evident: feasibility of
the research, lower costs, economy of time, and better
organization of the work. But there is an important problem to deal with: sampling error. Because a sample is a model of reality (like a map, a doll, or an MP3) and not the reality itself, the sampling error measures this inevitable distance of the model from reality. Obviously, the smaller it is, the closer the estimates are to reality. Unfortunately, in some cases, the sampling error is unknowable.
There are two main families of sampling methods:
probability (random) sampling and nonprobability
sampling, respectively typical of (but not exclusive
to) quantitative and qualitative research.
Probability sampling, definitively codified in the
1930s by the Polish statistician Jerzy Neyman, is characterized by the condition that all units of the population have a (theoretically) equal, calculable (i.e., known), nonzero probability of being included in the sample.
For a simple random sample, for example, the size n needed to estimate a proportion p with tolerated error E at the confidence level corresponding to z can be written as

n = z²pqN / [E²(N − 1) + z²pq],

where q = 1 − p and N is the population size.
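Evaluated for hypothetical inputs, say N = 10,000, z = 1.96 (95% confidence), p = q = 0.5, and a tolerated error of E = 0.03, the formula gives a required sample of roughly 964.

```python
# Sketch: evaluating the sample size formula with hypothetical inputs.
z, p, N, E = 1.96, 0.5, 10_000, 0.03
q = 1 - p

n = (z**2 * p * q * N) / (E**2 * (N - 1) + z**2 * p * q)
print(f"required n = {n:.0f}")
# required n = 964 (without the finite-population term in the
# denominator, the familiar answer would be about 1,067)
```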
With nonprobability sampling, in contrast, the researchers cannot say whether the results are representative or not, and the risk of nonsampling errors is large.
The big issue with sampling remains representativeness (i.e., external validity). Are probability samples really representative? The answer to this question
is not trivial. In fact, probability samples cannot guarantee representativeness, for at least four reasons:
1. Survey researchers cannot say whether a sample
is indeed representative or not, because they generally
sample precisely to find out something about an unknown reality. This is the so-called sampling paradox.
2. To prove or correct (in case of post-stratification)
the representativeness of a sample, the estimates are
often compared to census data. In this case, the
researcher must take into account two further problems: (a) Census data may be too old, and (b) they
could represent a benchmark only with respect to certain variables, so the best thing the researcher can
obtain is a sample that is representative only with respect to those limited variables (mainly demographics).
3. The researcher must take into account nonsampling errors, trying to minimize them (e.g., through
weighting). There are four major types of nonsampling errors: coverage errors (e.g., when the list of the population is incomplete); measurement errors (due to bad questionnaires); nonresponse errors (associated with refusals, noncontacts, movers, illiteracy, language barriers, and missing data); and processing errors (coding or inputting errors). These errors are
often quite difficult to know and control.
4. Nonprobability samples may be representative
by chance (e.g., many quota samples prove to be representative a posteriori).
These are the reasons why, on one hand, nonprobability samples are used even in important surveys, and, on the other, a hybrid direction is gradually gaining a foothold in the social research community, as the success of mixed strategies like respondent-driven sampling shows.
A recent frontier in sampling is the alliance with
the new technologies. Though rather promising,
however, Internet sampling, cell-phone sampling,
and others still have to deal with many problems.
For example, the number of Internet users is significantly lower among older people. For this reason,
SAMPLING BIAS
Sampling bias occurs when a sample statistic does not
accurately reflect the true value of the parameter in
the target population, for example, when the average
age for the sample observations does not accurately
reflect the true average of the members of the target
population. Typically, sampling bias focuses on one
of two types of statistics: averages and ratios. The sources of sampling bias for these two types of statistics differ; consequently, they will be treated separately in this entry.
Sampling Error
Imperfect sampling frames occur frequently in research and can be classified into four general categories:
(1) frames where elements are missing, (2) frames
where elements cluster, (3) frames that include foreign
elements, and (4) frames with duplicate listings of elements. For example, household telephone surveys
using random-digit dialing samples for landline telephones exclude households that are cell phone only
(missing elements), include telephone numbers in
some dwelling units that include more than a single
household (element cluster), include some telephone
numbers that are dedicated solely to fax machines
(foreign elements), and include some households with
more than a single landline telephone number (duplicate listings). All of these can cause sampling bias if
they are not taken into account in the analysis stage.
Sampling bias due to nonresponse results from
missing elements that should have been included in
the sample but were not; these can be classified as
noncontact (missing), unable to answer, or refusal.
Noncontacts are those elements that are selected into the sample but cannot be located or for which no contact can be made; the unable-to-answer either do not have the necessary information or lack the required health or skills to provide the answer; and refusals are elements that, once located, decline to participate. Each
will contribute to sampling bias if the sampled nonresponding elements differ from the sampled elements
from which data are gathered. This sampling bias for
averages can be characterized as follows:
nnonresponders
Yresponders Ypopulation =
Yresponders Ynonresponders :
nsample
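A quick numeric sketch of this identity, with hypothetical values, shows how the respondent mean overstates the population mean when nonresponders differ systematically.

```python
# Sketch: the bias identity above with hypothetical numbers. Of 1,000
# sampled persons, 600 respond (mean age 44.0) and 400 do not (mean 38.0).

n_sample, n_nonresponders = 1000, 400
ybar_resp, ybar_nonresp = 44.0, 38.0

bias = (n_nonresponders / n_sample) * (ybar_resp - ybar_nonresp)
print(f"bias of the respondent mean = {bias:.1f} years")  # 2.4 years high
```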
Consistent measurement bias in repeated observations can be represented by the model yij = mi + bi + eij,
where yij represents the observed values of some variable y on j repeated observations of individual i, mi
represents the true value of what the researchers wish
to measure for individual i, bi represents the consistent bias in individual is response, and eij represents
the random error associated with observation j for
individual i. In sample surveys, consistent measurement bias can occur for a number of reasons, such as when the questions focus on issues that are subject to a social desirability bias (e.g., illegal drug use).
SAMPLING ERROR
Sampling error consists of two components: sampling
variance and sampling bias. Sometimes overall sampling error is referred to as sampling mean squared error:

MSE(p) = E[(p − P)²] = Var(p) + (p0 − P)²,
where P is the true population value, p is the measured sample estimate, and p0 is the hypothetical mean
value of realizations of p averaged across all possible
replications of the sampling process producing p.
Sampling variance is the part that can be controlled
by sample design factors such as sample size, clustering strategies, stratification, and estimation procedures. It is the error that reflects the extent to which
repeated replications of the sampling process result in
different estimates. Sampling variance is the random
component of sampling error since it results from
luck of the draw and the specific population elements that are included in each sample. The presence
of sampling bias, on the other hand, indicates that
there is a systematic error that is present no matter
how many times the sample is drawn.
Using an analogy with archery, when all the
arrows are clustered tightly around the bull's-eye, we
say we have low variance and low bias. At the other
extreme, if the arrows are widely scattered over the
target and the midpoint of the arrows is off-center, we
say we have high variance and high bias. In-between
situations occur when the arrows are tightly clustered
but far off-target, which is a situation of low variance
and high bias. Finally, if the arrows are on-target but
widely scattered, we have high variance coupled with
low bias.
Efficient samples that result in estimates that are
close to each other and to the corresponding population value are said to have low sampling variance,
low sampling bias, and low overall sampling error. At
the other extreme, samples that yield estimates that
fluctuate widely and vary significantly from the corresponding population values are said to have high sampling variance, high sampling bias, and high overall
sampling error. By the same token, samples can have
average level sampling error by achieving high levels
of sampling variance combined with low levels of
sampling bias, or vice versa. (In this discussion it is assumed, for the sake of explanation, that the samples are drawn repeatedly and measurements are made for each drawn sample. In practice, of course, this is not the case, as only one sample is actually drawn.)
Sampling Variance
Sampling variance can be measured, and there exist
extensive theory and software that allow for its calculation. All random samples are subject to sampling
variance that is due to the fact that not all elements in
the population are included in the sample and each
random sample will consist of a different combination
of population elements and thus will produce different
estimates. The extent to which these estimates differ
across all possible estimates is known as sampling
variance. Inefficient designs that employ no or weak
stratification will result in samples and estimates that
fluctuate widely. On the other hand, if the design
incorporates effective stratification strategies and minimal clustering, it is possible to have samples whose
estimates are very similar, thereby generating low variance between estimates, thus achieving high levels of
sampling precision.
The main design feature that influences sampling
variance is sample size. This can be seen readily from
the following formula for the sampling variance to
estimate a proportion based on a simple random sample design:
var(p) = pq/n,

where p is the sample estimate of the population proportion, q = 1 − p, and n is the sample size. (Formula 2
and subsequent formulae are relevant for proportions.
Similar formulae are available for other statistics such
as means, but they are more complicated.)
It can be easily seen from this formula that as n increases, the variance decreases in inverse proportion. Because sampling variance is usually measured in terms of the confidence interval and standard
error (which is the square root of the sampling variance), we usually refer to the impact of an increase in
sample size in terms of the square root of that increase. Thus, to double the sampling precision, that is,
reduce the sampling variance by 50%, we would have
to increase the sample size by a factor of 4.
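The square-root relationship can be seen directly in a small sketch using var(p) = pq/n with p = 0.5.

```python
# Sketch: quadrupling n halves the standard error (doubles precision).
import math

p = 0.5
for n in (250, 1000, 4000):
    se = math.sqrt(p * (1 - p) / n)   # standard error of the proportion
    print(f"n = {n:>4}: standard error = {se:.4f}, "
          f"95% margin of error = {1.96 * se:.4f}")
```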
Sampling variance, or its inverse, sampling precision, is usually reported in terms of the standard error,
confidence interval, or more popularly, the margin
of error. Under a simple random sample design, the
Restricting the sample first to 100 clusters (e.g., counties), and then taking 10 households within each cluster, reduces the travel costs but reduces our ability to
spread the sample effectively over the entire country.
This reduction in efficiency is further exacerbated by
the fact that within each cluster, usually a geographically contiguous area, households tend to be more
alike than households across these units. This phenomenon, called intraclass homogeneity, tends to
drive up the sampling variance because efficiency is
lost and the original sample of 1,000 might, in effect,
have only the impact of 100 if, in the extreme case,
the clusters are perfectly homogeneous.
Thus, in summary, with respect to sample design
optimization, stratification is beneficial in that it
reduces sampling variance, whereas clustering is to be
avoided when possible or at least minimized as its
effect is to increase sampling variance. Usually the
effect of clustering is more marked than that of stratification. In many situations, though, clustering is necessary for cost reasons; thus, the best clustered design
strategy involves finding a compromise between the
cost savings and the penalty to be paid in terms of
lower precision.
Another important factor that influences sampling
variance is weighting. Weighting refers to adjustment
factors that account for design deviations such as
unequal probabilities of selection, variable nonresponse rates, and the unavoidable introduction of bias
at various steps in the survey process that are corrected for through a process called post-stratification.
The effect of weighting is to increase the sampling
variance, and the extent of this increase is proportional to the variance among the weights.
A useful concept that quantifies and summarizes
the impact of stratification, clustering, and weighting
on sampling variance is the design effect, usually
abbreviated as deff. It is the ratio of the true sampling
variance taking into account all the complexities of the
design to the variance that would have been achieved if
the sample had been drawn using a simple random
sample, incorporating no stratification, clustering, or
weighting. A value of 1.00 indicates that the complexity
of the design had no measurable impact on the sampling variance. Values less than 1.00 are rare; values
larger than 5.00 are generally considered to be high.
The design effect is closely related to rho (ρ), the intraclass correlation, mentioned previously. The
following formula shows the relationship between
the two:
deff = 1 + ρ(b − 1),

where deff is the design effect, ρ is the intraclass correlation, and b is the average cluster size.
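For example, with a hypothetical intraclass correlation of ρ = 0.05 and an average cluster size of b = 10, the design effect is 1.45, so a clustered sample of 1,000 has roughly the precision of 690 simple random sample cases.

```python
# Sketch: deff = 1 + rho * (b - 1) for a hypothetical clustered sample of
# n = 1,000 with average cluster size b = 10 and intraclass correlation 0.05.

rho, b, n = 0.05, 10, 1000

deff = 1 + rho * (b - 1)   # 1.45: variance 45% higher than under SRS
effective_n = n / deff     # about 690 SRS cases' worth of precision
print(f"deff = {deff:.2f}, effective n = {effective_n:.0f}")
```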
The correct calculation of sampling variance,
incorporating all the complexities of the design, is not
straightforward. However, there is extensive software
currently available that uses either the empirical bootstrap replication approach or the more theoretically
based Taylor Series expansion. These systems typically allow for many types of stratification, clustering,
and weighting, although the onus is always on the user or the data producer to ensure that relevant information, such as the stratum identifier, cluster identifier, and weight, is present in the data set.
Sampling Bias
This component of sampling error results from a systematic source that causes the sampling estimates,
averaged over all realizations of the sample, to differ
consistently from their true target population values.
Whereas sampling variance can be controlled through
design features such as sample size, stratification, and
clustering, we need to turn to other methods to control
and reduce bias as much as possible.
Sampling bias can only be measured if we have
access to corresponding population values. Of course,
the skeptic will point out that if such information
were available, there would be little point in drawing
a sample and implementing a survey. However, there
are situations in which we can approximate sampling bias by comparing underlying information such
as basic demographics for the sample with corresponding data from another, more reliable, source
(e.g., census or large national survey) to identify areas
in the data space for which the sample might be
underrepresented or overrepresented.
One major source of sampling bias is frame coverage; that is, the frame from which the sample is drawn
is defective in that it fails to include all elements in
the population. This is a serious error because it cannot be detected, and in some cases its impact cannot
even be measured. This issue is referred to as undercoverage because the frame is missing elements that it
should contain. The opposite phenomenon, overcoverage, is less serious. Overcoverage occurs when the
frame includes foreign elements, that is, elements that
do not belong to the target population. However, these
elements, if sampled, can be identified during the field
by gender and age, even for minor geographical subdivisions such as tracts. To take a hypothetical example, suppose the sample distribution by gender turns
out to be 40% male, 60% female, a not uncommon
result in a typical random-digit dialing telephone survey. Furthermore, assume that the corresponding census numbers are close to 50:50. Weighting would
assign relative adjustment factors of 50/40 to males
and 50/60 to females, thus removing the possible bias
due to an overrepresentation of females in the sample.
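In code, the adjustment factors for this hypothetical example are simply the census shares divided by the sample shares.

```python
# Sketch: post-stratification adjustment factors for the gender example
# above (sample 40% male / 60% female vs. census 50% / 50%).

sample_share = {"male": 0.40, "female": 0.60}
census_share = {"male": 0.50, "female": 0.50}

weights = {g: census_share[g] / sample_share[g] for g in sample_share}
print(weights)  # {'male': 1.25, 'female': 0.833...}, i.e., 50/40 and 50/60
```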
Challenges
Overall sampling error needs to be viewed in terms of
a combination of sampling variance and sample bias.
The ultimate goal is to minimize the mean squared
error. Survey researchers know how to measure sampling variance, and they have a good handle on how
it can be reduced. Sampling bias represents more of
a challenge as it is often difficult to measure and even
if it is measurable, bias reduction is often expensive
and problematic to achieve.
It is illustrative to discuss surveys that are based on nonprobability judgment, quota, or convenience samples, that is, samples that are not based on a probability-based design. One currently prominent example
is the Internet-based panel, which consists of members who choose (self-select) to belong to these
panels. That is, the panel members are not selected
randomly and then invited to join the panel, but
rather, the members themselves decide to join the
panels, hence the term opt-in populations. This means
that the underlying frame suffers from undercoverage
and many potential types of bias, only some of which
are known. These samples might be appropriate for
certain studies (e.g., focus groups), in which generalizing with confidence to the population is not an absolute prerequisite. But, in general, these surveys fall
short of required methodological rigor on two counts.
In the first place, the probabilities of selection are usually unknown and often unknowable, thus precluding
any chance of calculating sampling variance. Second,
these surveys suffer from coverage and selection bias
issues that, in many cases, are not even measurable.
With the advent of relevant software, surveys now
regularly produce large-scale sampling variance results
showing not only standard errors and confidence intervals but also design effects and measures of intraclass
correlation. The results typically are presented for the
entire sample and also for important subpopulations
789
that are relevant for data users. These are useful not
only to shed light on the quality of the data but also to
inform future sample designs. The choice of estimates
and subpopulations for which to publish sampling
errors is not simple, and some researchers have developed generalized variance functions that allow users
to estimate their own sampling errors based on the
type of variables in question, the sample size, and
level of clustering. However, these results are usually
limited to sampling variance, and much less is calculated, produced, and disseminated with respect to
sampling bias. This is due largely to the difficulty of
calculating these measures and to the challenge of separating sampling bias from other sources of bias, such
as nonresponse bias and response bias.
Karol Krotki
See also Bootstrapping; Clustering; Coverage Error; Design
Effects (deff); Duplication; Intracluster Homogeneity;
Margin of Error (MOE); Mean Square Error;
Nonprobability Sampling; Nonresponse Bias;
Overcoverage; Post-Stratification; Probability of
Selection; Response Bias; ρ (Rho); Sample Size;
Sampling Bias; Sampling Frame; Sampling Variance;
Self-Selection Bias; Simple Random Sample; Strata;
Stratified Sampling; Systematic Sampling; Taylor Series
Linearization; Undercoverage; Weighting
SAMPLING FRACTION
A sampling fraction, denoted f, is the proportion of a universe that is selected for a sample. The sampling fraction is important for survey estimation because in sampling without replacement, the sample variance is reduced by a factor of (1 − f), called the finite population correction or adjustment.
In a simple survey design, if a sample of n is selected with equal probability from a universe of N, then the sampling fraction is defined as f = n/N. In this case, the sampling fraction is equal to the probability of selection. In the case of systematic sampling, f = 1/I, where I is the sampling interval.
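A small worked sketch with hypothetical n and N follows.

```python
# Sketch: sampling fraction and finite population correction for a
# hypothetical equal-probability sample of n = 500 from N = 20,000.

n, N = 500, 20_000
f = n / N        # sampling fraction: 0.025
fpc = 1 - f      # variance multiplier under without-replacement sampling
print(f"f = {f}, finite population correction = {fpc}")
```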
SAMPLING FRAME
A survey may be a census of the universe (the study
population) or may be conducted with a sample that
represents the universe. Either a census or a sample
survey requires a sampling frame. For a census, the
frame will consist of a list of all the known units in
the universe, and each unit will need to be surveyed.
For a sample survey, the frame represents a list of the
target population from which the sample is selected.
Ideally it should contain all elements in the population, but oftentimes these frames do not.
Quality Issues
Ideally, the sampling frame will list every member
of the study population once, and only once, and will
include only members of the study population. The
term coverage refers to the extent to which these criteria are met. In addition, the frame should be complete in terms of having information needed to select
the sample and conduct the survey, and the information on the frame should be accurate.
Needless to say, almost no sampling frame is perfect. Examining the quality of a frame using the criteria discussed in this section may lead to looking for an alternative frame or to taking steps to deal with the frame's shortcomings.
Problems in frame coverage include both undercoverage and overcoverage. Undercoverage means
that some members of the universe are neither on the
frame nor represented on it. Some examples of undercoverage are the following:
1. All RDD landline frames exclude households with
no telephone service, and those with only cellular
phone service.
2. Frames drawn from telephone directories exclude
those households (listed in #1 above) plus those
with unpublished and recently published numbers.
3. New construction may be excluded from lists of
addresses used as sampling frames for surveys conducted by mail or personal visit.
4. Commercial lists of business establishments exclude
many new businesses and may underrepresent small
ones.
Frames can also suffer from undercoverage introduced by self-selection bias, as in the case of panels
recruited for Internet research, even if the panels were
recruited from a survey that used a probability sample
with a good frame.
Overcoverage means that some elements on the
frame are not members of the universe. For example,
SAMPLING INTERVAL
When a probability sample is selected through use of
a systematic random sampling design, a random start
is chosen from a collection of consecutive integers ranging from 1 to the size of the sampling interval, k.
Figure 1    Illustration of the sampling interval and its implication for systematic random sampling

Note: Population units are ordered according to identification numbers 1, 2, . . . , nk = N. The sampling interval consists of k consecutive units, from which a random start is selected. In the illustration, unit 2 is selected as the random start, implying that every subsequent kth unit (i.e., 2, k + 2, 2k + 2, . . .) is selected for the sample.
class of which a sample of 16 was desired. The corresponding (fractional) sampling interval is k = 200/16 = 12.5.
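One common way of handling a fractional interval (sketched below; other rounding rules exist) is to draw a real-valued random start in [0, k) and truncate successive multiples of k to unit positions.

```python
# Sketch: systematic selection with the fractional interval k = 12.5
# (class of N = 200, sample of n = 16). A real-valued random start is
# drawn from [0, k) and successive multiples of k are truncated to unit
# positions, so the fractional interval still yields exactly n units.
import math
import random

random.seed(3)
N, n = 200, 16
k = N / n                                        # fractional interval, 12.5

start = random.uniform(0, k)
selected = [math.floor(start + i * k) + 1 for i in range(n)]  # 1-based
print(selected)                                  # 16 unit numbers in 1..200
```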
Trent D. Buskirk
See also Cluster Sample; n; N; Random Start; Systematic
Sampling; Weighting
SAMPLING POOL
Sampling pool is a survey operations term, one that
statisticians sometimes refer to as the designated sample size, which was proposed by Paul J. Lavrakas in
the 1980s to refer to the set of elements selected from
a sampling frame that may or may not all be used in
completing data collection for a given survey project.
The value of using this term is to be able to have
a unique term to differentiate the sampling pool that
a researcher starts with from the final sample (i.e., the
final sample size) the researcher finishes with. Traditionally, survey researchers have used the word sample to refer to both the final number of completed
interviews a survey is striving to attain and the
number of elements used to gain those completed
interviews. Because noncontacts, nonresponse, and
other reasons (e.g., ineligibility) cause many elements
in a sampling pool to not end as completed interviews, the final sample is essentially always smaller
in size than the sampling pool and, in many cases, is
substantially smaller, for example, 1/10 or 1/20 the
size of the sampling pool.
For example, if researchers have estimated that
they will need 10,000 telephone numbers for a random-digit dialing (RDD) survey that has a goal of
completing 800 interviews, the survey call center that
does the interviewing may not need to activate all of
those numbers during the data collection period. That
is, their processing of the RDD numbers toward the
goal of 800 completions may be more efficient than
expected, and they may not need to activate all the numbers in the sampling pool.
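The planning logic behind sizing a sampling pool can be sketched as a simple chain of expected rates. In the sketch below, every rate name and value is an illustrative planning assumption, not a standard:

def required_sampling_pool(target_completes, working_rate,
                           eligibility_rate, contact_rate,
                           cooperation_rate):
    # Expected completions per element are the product of the
    # stage-by-stage rates; invert that product to size the pool.
    overall_yield = (working_rate * eligibility_rate
                     * contact_rate * cooperation_rate)
    return round(target_completes / overall_yield)

# Example: 800 completions; assume 60% of RDD numbers are working,
# 55% of those reach eligible households, 75% of those are contacted,
# and 40% of contacts yield completed interviews.
print(required_sampling_pool(800, 0.60, 0.55, 0.75, 0.40))  # about 8,081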
SAMPLING VARIANCE
Sampling variance is the variance of the sampling
distribution for a random variable. It measures the
spread or variability of the sample estimate about its
expected value in hypothetical repetitions of the sample. Sampling variance is one of the two components
of sampling error associated with any sample survey
that does not cover the entire population of interest.
The other component of sampling error is coverage
bias due to systematic nonobservation. The totality of
sampling errors in all possible samples of the same
size generates the sampling distribution for a given
variable. Sampling variance arises because only a sample rather than the entire population is observed. The
particular sample selected is one of a large number of
possible samples of the same size that could have
been selected using the same sample design. To the
extent that different samples lead to different estimates for the population statistic of interest, the sample estimates derived from the different samples will
differ from each other.
The positive square root of the sampling variance
is called the standard error. For example, the square
root of the variance of the sample mean is known as
the standard error of the mean. The sample estimate
and its standard error can be used to make inferences
about the underlying population, for example, through
constructing confidence intervals and conducting
hypothesis testing. It is important to note, however, that sampling variance is measured about the expected value of the sample estimate, which need not equal the true population value if the estimator is biased. For a simple random sample of size n drawn without replacement from a population of N units (sampling fraction f = n/N), the sampling variance of the sample mean $\bar{y}$ is estimated by

$$v(\bar{y}) = \frac{1-f}{n} \sum_{i=1}^{n} \frac{(y_i - \bar{y})^2}{n-1}.$$
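As a small numerical illustration, the estimator above can be computed directly. The function name and toy data below are illustrative only:

import statistics

def var_of_sample_mean(y, N):
    # v(ybar) = ((1 - f) / n) * s^2 under simple random sampling
    # without replacement, where f = n/N and s^2 is the sample
    # element variance with denominator n - 1.
    n = len(y)
    f = n / N
    return (1 - f) / n * statistics.variance(y)

y = [4, 7, 5, 9, 6, 8]              # a sample of n = 6 from N = 60 units
v = var_of_sample_mean(y, N=60)
print(v, v ** 0.5)                  # sampling variance and standard error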
SAMPLING WITHOUT REPLACEMENT

Each possible sample s can be represented by its vector of sample membership indicators $(s_k : k \in U)$. For example, with three population units, the possible samples might be $(0, 1, 1)$, $(1, 0, 1)$, and $(1, 1, 0)$, selected with probabilities $P(s) = 1/4$, $1/4$, and $2/4$, respectively. Summing $P(s)$ over the set $\{s : s_k = 1\}$ gives the definition of the first inclusion probability of the sample unit $k \in U$. Now, the matrix of second moments of $S$ is $P = E[SS^T]$; under simple random sampling without replacement its off-diagonal elements are

$$\pi_{ij} = \frac{n(n-1)}{N(N-1)}, \quad i \neq j \in U.$$
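These definitions can be verified numerically by enumerating the possible samples of a design. The sketch below uses the three-sample design from the example above; all variable names are illustrative:

# Indicator vectors for the possible samples and their probabilities P(s).
design = {(0, 1, 1): 0.25, (1, 0, 1): 0.25, (1, 1, 0): 0.50}
N = 3

# First-order inclusion probabilities: sum P(s) over {s : s_k = 1}.
pi = [sum(p for s, p in design.items() if s[k] == 1) for k in range(N)]
print(pi)            # [0.75, 0.75, 0.5]; the values sum to n = 2

# Matrix of second moments P = E[S S^T]; its off-diagonal entries
# are the second-order inclusion probabilities pi_ij.
P = [[sum(p * s[i] * s[j] for s, p in design.items()) for j in range(N)]
     for i in range(N)]
print(P)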
SAS
SAS (pronounced sass) is the name of one of the world's largest software development corporations. Originally an acronym for Statistical Analysis System, SAS was created by Jim Goodnight, John Sall, and other researchers at North Carolina State University in the early 1970s. What began as a locally developed set of programs for agricultural research quickly became so popular that in 1976 the SAS Institute was formed in Raleigh, North Carolina, to meet the growing demand. The company immediately formed an alliance with IBM and created the first SAS Users Group International (SUGI), which continues to provide assistance to SAS users, distribute newsletters, maintain a popular Web site, and hold conferences throughout the world.
Within 5 years, SAS outgrew its original site and
moved to its current campus in Cary, North Carolina.
By this time, it was installed in thousands of sites
around the world. Within 10 years, not only was SAS
installed on 65% of all mainframe sites, but partnerships had been established with Microsoft and Apple
as the personal computer revolution began. Throughout the 1990s and early 2000s, SAS was the
recipient of many prestigious awards for its technical
accomplishments. Annual revenues in 1976 were
$138,000; by 2006 they were $1.9 billion.
The features of SAS software cover a large family
of products with applications in the government, academic, and private sectors. The characteristics most
used by survey research professionals are common to
most data analysis software, although the implementation can be very different from one software package
to another. First, SAS has the ability to read electronically stored data in almost any format from almost
any medium. Second, it has an enormous array of data
transformation options with which to recode existing
variables and create new ones. Third, SAS has a vast number of data analysis procedures, ranging from the commonly used to the most exotic.
James Wolf
Further Readings
SAS: http://www.sas.com
SATISFICING
The notion of satisficing is consistent with cognitive
theory articulated by Roger Tourangeau, Lance Rips,
and Kenneth Rasinski that survey respondents must
execute four stages of cognitive processing to answer
survey questions optimally. Respondents must (1) interpret the intended meaning of the question, (2) retrieve
relevant information from memory, (3) integrate the
information into a summary judgment, and (4) map
the judgment onto the response options offered. When
respondents diligently perform each of these four steps,
they are said to be optimizing. However, instead of
seeking to optimize, respondents may choose to perform one or more of the steps in a cursory fashion, or
they may skip one or more steps altogether. Borrowing
Herbert Simon's terminology, Jon Krosnick has referred
to this behavior as satisficing in his seminal paper
published in 1991.
Whereas some people may begin answering
a questionnaire without ever intending to devote the
effort needed to optimize, others might begin to
answer a questionnaire with the intention to optimize, but their enthusiasm may fade when they face
a long questionnaire or questions that are difficult
to understand or answer. As they proceed through
the questionnaire, these respondents may become
increasingly fatigued, distracted, and uninterested.
But even after motivation begins to fade, the
fatigued and unmotivated respondent is nevertheless expected to continue to provide answers to
questions with the implicit expectation that he or
she will answer each one carefully. At this point,
a respondent may continue to expend the effort necessary to provide optimal responses or may choose to satisfice.
Careful questionnaire design (e.g., using commonly used words and avoiding jargon, keeping the questionnaire to a reasonable length, and more) can help to increase the likelihood of optimal responding to survey questions.
Sowmya Anand
See also Acquiescence Response Bias; Cognitive Aspects
of Survey Methodology (CASM); Nondifferentiation;
Questionnaire Length; Respondent Burden; Respondent
Fatigue; Response Bias; Response Order Effects
SCREENING
Screening is the process by which elements sampled
from a sampling frame are evaluated to determine
whether they are eligible for a survey. Ideally, all
members of the sampling frame would be eligible, but
eligibility information is often not available prior
to constructing the frame. In this case, the sampling
frame must be narrowed to include only eligible sample members by subsectioning the frame, matching it
against an external administrative data source, or collecting eligibility information directly from a sampled
respondent or a proxy for that respondent.
Screening Types
When the sample frame is subsectioned or matched
against an external administrative data source, this is
referred to as passive screening because a respondent
is not directly involved in the process. Passive screening uses existing data to determine who, from a sampling frame of individuals, establishments, or other entities, is
likely eligible for a survey. For instance, a survey of
pediatric specialty hospitals in the western United
States may begin with a list of all hospitals across the
United States. Based on the original list itself, or
another that has been merged with the original, the list can be narrowed to the subset of hospitals likely to be eligible.
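A minimal sketch of such passive screening, assuming the frame and the administrative source share an identifier; every name and value below is invented for illustration:

# Sampling frame of hospitals (illustrative records).
frame = [
    {"id": 1, "name": "Mercy Hospital",    "state": "CA"},
    {"id": 2, "name": "Lakeside Clinic",   "state": "OH"},
    {"id": 3, "name": "St. Luke Hospital", "state": "WA"},
]
# External administrative list of pediatric specialty facilities.
pediatric_ids = {1, 3}
western_states = {"CA", "OR", "WA", "NV", "AZ"}   # illustrative definition

# Passive screening: retain frame records that match on both criteria.
eligible = [r for r in frame
            if r["id"] in pediatric_ids and r["state"] in western_states]
print([r["name"] for r in eligible])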
Screening Techniques
Because screening is often conducted very early in
the interview process, it is vital that screening techniques are designed to foster, rather than discourage,
participation. Screening questions and interviews
should be as brief and to the point as possible. The
screener language should not be written in a way that
might bias the response decision process. For instance,
a survey on charitable donations among wealthy people may include a screener that simply obtains the
respondent's basic demographic information and
general income range rather than collecting a great
deal of information about the respondent's sources
of wealth and charitable behaviors. Such detailed
questions can be administered in the main survey
once the correct respondents have been identified
and the specifics of the survey topic and benefits
have been explained. A brief and persuasive screening approach can help limit the potential for nonresponse bias by maximizing the rate of successful screener completions.
SEAM EFFECT
The seam effect, also called the seam bias, is a phenomenon specific to longitudinal panel surveys: it refers to the tendency for estimates of change measured across the boundary (seam) between two successive waves of data collection to be substantially larger than estimates of change measured between reference periods within a single wave.
Figure 1
Survey of Income and Program Participation month-to-month transition rates (1st-to-2nd through 11th-to-12th month) for receipt of Social Security benefits and food stamps
SEGMENTS
Segments is a term for sample units in area probability
sampling (a specific kind of cluster sampling). Most
often, segments are sample units in the second stage
of area probability sampling and are more formally
referred to as secondary sampling units (SSUs).
As second stage sample units, segments are neighborhoods or blocks (either census-defined or practically defined by field workers) within the selected
primary sampling units, which are often counties or
whole metropolitan areas. Occasionally, segments can
refer to the first-stage sample units or to larger areas
than neighborhoods or blocks, such as entire census
tracts or even counties, but this entry describes them
in their more common usage as second-stage sample
units.
Individual units (often housing units, or sometimes
clusters of housing units) within the selected segments
are selected for inclusion in the sample. Traditionally,
field workers are sent out to the selected segments to
list every housing unit. New lists are becoming available for segments in urban areas built from postal
address lists. These lists are not yet available in rural
areas, but as rural addresses in the United States get converted to city-style street addresses, the coverage of such lists is expected to improve.
(When respondents are clustered within segments of average size $b$ and responses within a segment share an intraclass correlation $\rho$, a sample of $n$ units yields an effective sample size of roughly $n / [1 + \rho(b - 1)]$.)
SELF-ADMINISTERED QUESTIONNAIRE
A self-administered questionnaire (SAQ) refers to a
questionnaire that has been designed specifically to be
completed by a respondent without intervention of the
researchers (e.g., an interviewer) collecting the data.
An SAQ is usually a stand-alone questionnaire though
it can also be used in conjunction with other data collection modalities directed by a trained interviewer.
Traditionally the SAQ has been distributed by mail or
in person to large groups, but now SAQs are being
used extensively for Web surveys. Because the SAQ
is completed without ongoing feedback from a trained
interviewer, special care must be taken in how the
questions are worded as well as how the questionnaire
is formatted in order to avoid measurement error.
A major criterion for a well-designed SAQ is
proper wording and formatting of the instructions, the
questions, and the answer categories. Don Dillman has written extensively on design principles for achieving this.
SELF-REPORTED MEASURE
Self-reported measures are measures in which respondents are asked to report directly on their own behaviors, beliefs, attitudes, or intentions. For example,
many common measures of attitudes such as Thurstone scales, Likert scales, and semantic differentials
are self-report. Similarly, other constructs of interest
to survey researchers, such as behavioral intentions,
beliefs, and retrospective reports of behaviors, are
often measured via self-reports.
Self-reported measures can be contrasted to other
types of measures that do not rely on respondents'
reports. For example, behavioral measures involve
observing respondents' behaviors, sometimes in a constrained or controlled environment. Similarly, physiological measures like galvanic skin response, pupillary
response, and subtle movements of facial muscles rely
on biological responses rather than self-report. Measures of other variables, such as weight, height, or cholesterol level, could also be assessed without self-report by weighing or measuring respondents or by taking blood samples.
Because being viewed favorably by others is more likely to bring rewards and minimize punishments than being viewed unfavorably, people may
sometimes be motivated to construct favorable images
of themselves for other people (e.g., for interviewers),
sometimes via deceit. Such systematic and intentional
misrepresentation by respondents when answering
questionnaires has been well documented. For example, people are more willing to report socially embarrassing attitudes, beliefs, and behaviors when their
reports are anonymous and when respondents believe researchers have other means of learning the truth about their thoughts and actions. Thus, some people
sometimes distort their answers to questions in order
to present themselves as having more socially desirable attitudes, beliefs, or behavioral histories, and
people's reports may therefore be distorted by social
desirability bias.
Allyson Holbrook
See also Behavioral Question; Cognitive Aspects of Survey
Methodology (CASM); Computer-Assisted Self-Interviewing (CASI); Likert Scale; Question Order
Effects; Response Latency; Semantic Differential
Technique; Social Desirability
SELF-SELECTED SAMPLE
A sample is self-selected when the inclusion or exclusion of sampling units is determined by whether the
units themselves agree or decline to participate in the
sample, either explicitly or implicitly.
SELF-SELECTED LISTENER
OPINION POLL (SLOP)
A self-selected listener opinion poll, also called
SLOP, is an unscientific poll that is conducted by
broadcast media (television stations and radio stations) to engage their audiences by providing them an
opportunity to register their opinion about some topic
that the station believes has current news value. Typically two local telephone numbers are broadcast for
listeners to call to register their opinion on a topic.
One number might be for those who agree with the
issue, and the other might be for those who disagree.
For example, a question might ask listeners to indicate whether they agree or disagree that the mayor should fire the police commissioner.
Because the people who choose to call in do not
represent a known target population, the findings from
such polls have no external validity as they cannot be
generalized to any particular population and therefore
are not valid as measures of public opinion. Although
these polls may provide some entertainment value for
the station and its audience, especially for those who
call in, they are not scientifically valid measures of
news or public opinion.
Paul J. Lavrakas
See also Call-In Polls; Computerized-Response Audience
Polling (CRAP); 800 Poll; External Validity; 900 Poll;
Pseudo-Polls
Self-selected samples are common when surveyors seek to obtain information from many people, quickly, at relatively low cost. For example, in an
Internet survey of the prevalence of student drug use,
students could be recruited by first asking school principals to volunteer their schools for participation, then
asking teachers within schools to volunteer their classrooms to participate, and third asking students to volunteer to participate by going to a Web site to fill out
the questionnaire.
Incidental Truncation
When units are sampled if, and only if, they engage
in some behavior, then the distribution of the outcome
variable is truncated at an unknown point. This is incidental truncation. For example, consider a survey in
which researchers are interested in estimating the mean
Scholastic Aptitude Test (SAT) score in each of the 50 U.S. states. However, students' SAT scores can be
sampled only if they choose to take the SAT, which
results in incidental truncation because it is very likely
that among those who do not take the SAT, there will
be scores even lower (were they to take the test) than
the lowest score among those taking the SAT.
Solutions
Essentially, for unbiased point estimates and standard
errors, researchers need to be able to assume that
selection into or out of the sample is either random
(i.e., a sampled unit is an independent draw from the
marginal population distribution of the outcome variable) or conditionally random (i.e., a sampled unit is
an independent draw from the conditional population
distribution of the outcome variable, conditioning on
measured covariates). In most instances, it is highly
unlikely that sampling units self-select into or out of
a sample for reasons completely independent of the
outcome of interest. More often, sampling units can
be thought of as self-selecting into or out of a sample
for reasons that are conditionally independent from
the outcome variable. If this assumption is met, several solutions are possible.
One solution that is available when dealing with
a probability sample with nonresponse is to identify
those units who opted not to participate, identify variables measured on both responders and nonresponders
that are also related to the outcome variable, and
stratify the sample on these measured variables. This
amounts to constructing weighting classes. The surveyor can then adjust the raw sampling weight for each responding unit by the inverse of its response propensity within the weighting class (i.e., weighting class adjustment).
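A minimal sketch of this weighting class adjustment; the class labels, base weights, and response outcomes below are invented for illustration:

sample = [
    # (unit id, weighting class, base weight, responded?)
    (1, "urban", 100.0, True),
    (2, "urban", 100.0, False),
    (3, "urban", 100.0, True),
    (4, "rural", 150.0, True),
    (5, "rural", 150.0, False),
    (6, "rural", 150.0, False),
]

# Weighted response propensity within each weighting class.
totals = {}
for _, cls, w, resp in sample:
    t, r = totals.get(cls, (0.0, 0.0))
    totals[cls] = (t + w, r + (w if resp else 0.0))

# Respondents' weights are inflated by the inverse of the propensity.
adjusted = [(uid, w * totals[cls][0] / totals[cls][1])
            for uid, cls, w, resp in sample if resp]
print(adjusted)    # urban weights x 1.5, rural weights x 3.0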
SELF-SELECTION BIAS
Self-selection bias is the problem that very often
results when survey respondents are allowed to decide
entirely for themselves whether or not they want to
participate in a survey. To the extent that respondents'
propensity for participating in the study is correlated
with the substantive topic the researchers are trying
to study, there will be self-selection bias in the resulting data. In most instances, self-selection will lead
to biased data, as the respondents who choose to participate will not well represent the entire target
population.
A key objective of doing surveys is to measure
empirical regularities in a population by sampling
a much smaller number of entities that represent the
whole target population. Modern sampling theory is predicated on the notion that whether an entity is selected for interview should be determined by a random mechanism implemented by the researcher. This mechanism ensures that, for defined subpopulations formed by a partition of the entire population, either the probability of selection is proportional to the number in the subpopulation or, after weighting, the weighted sample size is proportional to the number in the subpopulation. Further, the notion that sampling is random rules out selection based on behaviors or attributes about which the researchers are attempting to learn.
SEMANTIC DIFFERENTIAL
TECHNIQUE
The semantic differential measurement technique is
a form of rating scale that is designed to identify the
connotative meaning of objects, words, and concepts.
The technique was created in the 1950s by psychologist Charles E. Osgood. The semantic differential
technique measures an individual's unique, perceived meaning of an object, a word, or another individual.
Figure 1
An example semantic differential item for the term Energy: respondents rate the term on bipolar adjective scales such as Good/Bad, Weak/Strong, Active/Passive, Wet/Dry, Cold/Hot, Meaningful/Meaningless, Fresh/Stale, Bright/Dark, Brave/Cowardly, and Beautiful/Ugly.
Semantic Space
Factor analysis of the semantic differential data allows
the researcher to explore a respondent's semantic
space. The semantic space represents the three underlying attitudinal dimensions that humans are hypothesized to use to evaluate everything. Research has
demonstrated that these dimensions are present regardless of the social environment, language, or culture of
the respondent. The three dimensions are evaluation,
power, and activity.
The evaluation factor can be thought of as the
good/bad factor. Common bipolar responses are
good/bad, fresh/stale, friendly/unfriendly, or
interesting/uninteresting. The power factor, which
is sometimes called the potency factor, is the strong/
weak factor. Common semantic differential responses
for the power factor include strong/weak, powerful/
powerless, large/small, or brave/cowardly.
The activity factor is characterized as the active/
passive factor. A number of bipolar responses can
measure this, including active/passive, tense/
relaxed, and fast/slow.
Using these scales, the researcher can attain a reliable measure of a respondent's overall reaction to something. Researchers can obtain a subject's dimensional average by dividing the scales into their appropriate dimensions and averaging the response scores. Once completed, these measurements are thought of as the concept's profile.
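A minimal sketch of this dimensional scoring, assuming 7-point bipolar items; the item-to-dimension assignment and all ratings below are illustrative rather than fixed standards:

# Assign each bipolar item to one of the three hypothesized dimensions.
dimensions = {
    "evaluation": ["good_bad", "fresh_stale", "beautiful_ugly"],
    "power":      ["strong_weak", "brave_cowardly"],
    "activity":   ["active_passive", "fast_slow"],
}
# One respondent's ratings on 7-point scales (illustrative values).
ratings = {"good_bad": 6, "fresh_stale": 5, "beautiful_ugly": 7,
           "strong_weak": 4, "brave_cowardly": 5,
           "active_passive": 6, "fast_slow": 3}

# Dimensional averages form the concept's profile.
profile = {dim: sum(ratings[item] for item in items) / len(items)
           for dim, items in dimensions.items()}
print(profile)   # e.g., {'evaluation': 6.0, 'power': 4.5, 'activity': 4.5}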
Critique
Although the semantic differential technique has been
widely used since its inception, there are some concerns about it. Theoretically, using the scale rests on
the assumption that humans' connotations of a word are not the same, which is why this technique is needed. Paradoxically, the scale assumes that the chosen adjectives mean the same thing to everyone. For the
scale to work, it must be assumed that humans share
the same connotations for some words (i.e., the bipolar
adjectives). This is a fairly strong assumption. For
instance, looking at Figure 1, the bipolar pairs of adjectives generally are set out with positive adjectives
on the left. However, for the cold/hot dyad it is not
clear that cold is always associated with positive
thoughts. Indeed, depending upon the respondents
past, cold could easily evoke negative thoughts.
Another concern is that attitudes do not always
match up with behavior. As such, attitudes can be poor
predictors of action or behavior. The semantic differential ought to be able to overcome these concerns
simply by design, but that does not mean that it can
do so on its own. If respondents give socially desirable
answers, it will negatively impact the reliability of the
measure. Also, if a respondent begins to consistently
answer in the same way (i.e., all neutral or always
agreeing), the reliability must be questioned. Yet
another critique is that the semantic differential does
not actually identify individual emotions. Because of
its design, the semantic differential technique cannot
distinguish beyond the one continuum, but then it
never was intended to do so in the first place.
James W. Stoutenborough
See also Attitudes; Bipolar Scale; Likert Scale; Rating;
Respondent; Social Desirability
SENSITIVE TOPICS
There is no widely accepted definition of the term
sensitive topics, even though most survey researchers
would probably agree that certain subjects, such as
income, sex, and religion, are definitely examples of
the concept. In their classic text Asking Questions: A
Practical Guide to Questionnaire Design, Seymour
Sudman and Norman Bradburn avoided the term
altogether and instead talked about threatening
questions.
Part of the problem is that topics or questions can
be sensitive in at least three different, though related,
senses. The first sense is that of intrusiveness. Some
questions are inherently offensive to some (or most)
respondents; some topics are seen as inappropriate for
a survey. Respondents may find it offensive to be
asked about their religion in a government survey or
about their income in a study done by market
researchers. Sudman and Bradburn used the phrase
taboo topics, and some topics are clearly out of
bounds in certain contexts. It would be odd (and
impolite) to ask a new coworker intimate details about
his or her sexual life or medical history; in the same
way, respondents may regard some survey topics or
questions as none of the researcher's business. A second sense of sensitivity involves the risk that the
information may fall into the wrong hands. Teenagers
in a survey on smoking may worry that their parents
will overhear their answers; respondents to the American Community Survey may worry that the Internal
Revenue Service will be able to access their answers
to the income questions on the survey. Questions are
sensitive in this second sense when they raise concerns that some third party (whether another household member or some agency or business other than
the survey firm) will learn what the respondents have
reported. For example, in business surveys, responding firms may worry that their competitors will gain
proprietary information about them. The final sense of
sensitivity involves the social desirability of the
behavior or attitude that is the subject of the question.
Consequences of Sensitivity
Despite respondents' potential objections to such
topics, surveys often include questions about sensitive
topics. To cite one example, since 1971 the federal
government has sponsored a series of studies (most
recently, the National Survey on Drug Use and
Health) to estimate the prevalence and correlates of
illicit drug use in the United States. Because surveys
cannot always avoid asking about sensitive subjects,
it is important to know how people react to them and
how those reactions may affect the statistics derived
from the surveys.
Asking about sensitive topics in surveys is thought
to have three negative consequences for survey statistics. First, sensitive items may produce higher-thannormal levels of item nonresponse (missing data),
reducing the number of cases available for analysis
and possibly biasing the results. Often, the income questions in a survey are among those with the highest levels of item nonresponse.
Some methods used in surveys with sensitive questions increase the privacy of the data collection
process in an attempt to reduce the motivation to misreport. For example, the data may be collected in a private setting, either away from anyone else in the
respondents home or in a setting outside the home.
Evidence shows that interviews are often conducted
in the presence of bystanders and, at least in some
cases (e.g., when the bystander is the respondent's
parent), this inhibits truthful reporting. Because of
the threat that other family members may learn sensitive information, other settings, such as schools, may
be better for collecting sensitive information from
teenagers.
Another method that may increase the respondent's
sense of privacy is self-administration of the questions. Both paper questionnaires and computerized
self-administration (such as audio computer-assisted
self-interviewing) increase reporting of sensitive
information, and some studies indicate that computerized self-administration is even more effective than
self-administration via paper questionnaires.
Collecting the data anonymously (with no identifying information that links the respondents to their
answers) may produce further gains in reporting.
However, having respondents put their completed
questionnaires in a sealed ballot box does not appear
to enhance the basic effect of self-administration,
although there is little firm evidence on this point.
Indirect Question Strategies
Indirect question strategies, such as the randomized response technique and the list experiment technique, ask sensitive questions in ways that prevent the interviewer, and even the researcher, from linking any individual respondent to an embarrassing answer.
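As one concrete illustration, Warner's randomized response design has each respondent answer the sensitive statement with probability p and its negation otherwise, so the interviewer never knows which version was answered. A simulation sketch, with illustrative parameter values:

import random

def warner_estimate(responses, p):
    # If lam is the observed proportion of yes answers, Warner's
    # estimator of the sensitive prevalence is
    # pi_hat = (lam - (1 - p)) / (2p - 1), valid for p != 0.5.
    lam = sum(responses) / len(responses)
    return (lam - (1 - p)) / (2 * p - 1)

rng = random.Random(7)
true_pi, p = 0.20, 0.70          # assumed true prevalence and design p
responses = []
for _ in range(10_000):
    has_trait = rng.random() < true_pi
    asked_sensitive = rng.random() < p
    responses.append(has_trait if asked_sensitive else not has_trait)
print(warner_estimate(responses, p))   # close to 0.20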
Conclusion
Surveys that ask people sensitive questions are here
to stay. Unfortunately, people come to surveys
armed with a lifetime of experience of fending off
unwelcome questions. They can avoid the questions by avoiding the survey entirely, by refusing to
answer specific sensitive questions, or by deliberately misreporting. Many surveys have adopted self-administration to improve reporting; this and the randomized response technique both seem to be effective.
The key to both procedures is that the interviewer is
not aware of the embarrassing information being
revealed. Unfortunately, none of the survey techniques
for dealing with sensitive questions eliminates misreporting entirely. Research methodologists still have
a lot to learn about how best to collect sensitive information in surveys.
Roger Tourangeau
See also Anonymity; Audio Computer-Assisted Self-Interviewing (ACASI); Confidentiality; List Experiment
Technique; Misreporting; Missing Data; Mode of Data
Collection; Overreporting; Privacy; Randomized
Response; Refusal; Response Bias; Social Desirability;
Underreporting; Unfolding Question; Unit Nonresponse
SEQUENTIAL SAMPLING
For survey sampling applications, the term sequential
sampling describes any method of sampling that reads
an ordered frame of N sampling units and selects the
sample with specified probabilities or specified expectations. Sequential sampling methods are particularly well suited to computer-based selection because they require only a single pass through the frame.
Systematic Sampling
For the simplest case, where sampling is with equal
probabilities and k = N=n is an integer, a random integer, I, between 1 and k is drawn and when the Ith element is encountered it is included in the sample. I is
then incremented by k, and the (I + k)th element is
included in the sample when encountered. The process continues until n units have been designated for
the sample.
A more general form of systematic sampling can be applied where sampling is with unequal probabilities and/or $k \neq N/n$. Define the desired probability of selection for each unit as $p_i$, for $i = 1, 2, \ldots, N$. For an equal probability design, $p_i = n/N$. For unequal probability designs, it is only necessary that $0 < p_i \leq 1$ and $\sum_{i=1}^{N} p_i = n$. To select the sample, it is necessary to draw a uniform (0,1) random number, $R$. For Unit 1, define $S = p_1$. If $R \leq S$, then select Unit 1 and increment $R$ by 1. For each subsequent Unit $i$, increase $S$ by $p_i$, and if $R \leq S$, then select Unit $i$ and increment $R$ by 1.
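This rule translates directly into a single pass over the ordered frame. The sketch below is illustrative; setting every p_i = n/N reproduces the equal-probability case described earlier:

import random

def sequential_select(p, seed=None):
    # One-pass sequential selection with target inclusion
    # probabilities p_i; requires 0 < p_i <= 1 and sum(p) = n.
    rng = random.Random(seed)
    R = rng.random()       # uniform (0,1) random number
    S = 0.0                # running total of the p_i
    selected = []
    for i, p_i in enumerate(p, start=1):
        S += p_i
        if R <= S:         # select unit i and advance the threshold
            selected.append(i)
            R += 1.0
    return selected

# Equal-probability example: N = 10, n = 3, p_i = 0.3 for every unit.
print(sequential_select([0.3] * 10, seed=4))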
PPS Sequential
PPS sequential sampling is defined for probability minimum replacement (PMR) sampling; sampling without replacement is then shown as a special case. Rather than working with the probability of selection, PMR selection schemes work with the expected number of hits, or selections, for each unit, designated by $E(n_i)$, where $n_i$ is the number of times unit $i$ is selected for a particular sample. When a size measure $X_i$ is used, the expected number of selections per unit $i$ is set as

$$E(n_i) = \frac{n X_i}{\sum_{j=1}^{N} X_j}.$$
Define the cumulative expected and actual numbers of selections

$$T_i = \sum_{j=1}^{i} E(n_j) \quad \text{and} \quad \hat{T}_i = \sum_{j=1}^{i} n_j,$$

and write $T_i = I_i + F_i$, where $I_i$ is the integer part and $F_i$ the fractional part of $T_i$. Selection proceeds so that $\hat{T}_i$ always equals $I_i$ or $I_i + 1$; for example, if $\hat{T}_{i-1} = I_{i-1} + 1$ and $F_{i-1} \geq F_i > 0$, then $\hat{T}_i = I_i + 1$ with probability $F_i / F_{i-1}$.
SHOW CARD
Within the context of survey research, a show card or
show sheet is a visual aid used predominantly during
in-person surveys. It is a card, a piece of paper, or an
electronic screen containing answer categories to
a question, from which the respondent chooses the
answer to the survey question. The respondent may
either look at the answer categories listed on the show
card when providing the answer or mark the answer
directly on the show card. The answers listed on the
show card can be in the form of words, numbers,
scales, pictures, or other graphical representations.
SHOWCARD A
Please provide the letter (a, b, c, d, e, or f) next to the line that best matches your household's total income from last year. Your household's total income should include the money earned by all members of your household before taxes.
a) less than $25,000
b) $25,000 to $49,999
c) $50,000 to $74,999
d) $75,000 to $99,999
e) $100,000 to $150,000
f) more than $150,000
Figure 1
Example of a show card listing household income categories
SIGNIFICANCE LEVEL
The significance level (also called the Type I error rate or the level of statistical significance) refers to the probability of rejecting a null hypothesis that is in fact true. This quantity ranges from zero (0.0) to one (1.0) and is typically denoted by the Greek letter alpha (α). The significance level is sometimes referred to as the probability of obtaining a result by chance alone. Because this quantity represents an error rate, lower values are generally preferred. In the literature, nominal values of α generally range from 0.05 to 0.10. The significance level is also referred to as the size of the test in that the magnitude of the significance level determines the end points of the critical or rejection region for hypothesis tests. As such, in hypothesis testing, the p-value is often compared to the significance level in order to determine whether a test result is statistically significant.
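A minimal sketch of that comparison for a two-sided z test; the test statistic value is illustrative:

import math

def two_sided_p_value(z):
    # P(|Z| >= |z|) for a standard normal test statistic.
    return math.erfc(abs(z) / math.sqrt(2))

alpha = 0.05            # significance level fixed before seeing the data
z = 2.1                 # illustrative test statistic
p = two_sided_p_value(z)
print(p, p < alpha)     # p is about 0.036, significant at alpha = 0.05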
SMALL AREA ESTIMATION

Small area estimation is very important for the well-being of many citizens.
A basic area-level model links the small area parameters $\theta_i$ to area-level auxiliary data $x_i$ through

$$\theta_i = x_i^T \beta + v_i, \quad i = 1, \ldots, m,$$

with sampling model $\hat{\theta}_i = \theta_i + e_i$, where $e_i \stackrel{ind}{\sim} N(0, D_i)$ and $v_i \stackrel{ind}{\sim} N(0, \sigma_v^2)$ are independent. In the hierarchical Bayes (HB) approach, the model is specified in stages; the later stages assert that $v_i \mid \beta, \sigma_v^2, \sigma^2 \sim N(0, \sigma_v^2)$, independently for $i = 1, \ldots, m$, with independent priors on the second-stage parameters: an improper uniform prior for $\beta$ on $R^p$, and inverse gamma priors (possibly improper) for $\sigma_v^2$ and $\sigma^2$.
The HB predictor for $\theta_i$ is obtained from the predictive distribution of the unsampled units $Y_{ij}$, $j = n_i + 1, \ldots, N_i$, $i = 1, \ldots, m$, given the sampled units. Datta and Ghosh gave the predictive distribution in two steps. In Step 1, the predictive distribution is given for fixed hyperparameters $\sigma_v^2$ and $\sigma^2$. In Step 2, the posterior distribution of the hyperparameters is given. Under squared error loss, for known hyperparameters,
the Bayes predictor is also the BLUP. The HB predictor, typically with no closed-form, is obtained by integrating the above Bayes predictor with respect to the
posterior distribution of the hyperparameters. Instead
of assigning priors to the hyperparameters, if one estimates them from the marginal distribution of the data
and replaces the variance components by their estimates in the Bayes predictor of i , the result is the EB
predictor. In fact, the EB predictor is identical to the
EBLUP of $\theta_i$. The HB predictor and associated measure of uncertainty given by the posterior variance can
be computed by numerical integration or Gibbs
sampling.
While the EBLUP is applicable to mixed linear
models, the HB and the EB approaches can be applied
even to generalized linear models, thereby making
a unified analysis of both discrete and continuous data
feasible.
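For intuition, the EB idea can be sketched for an area-level model in which each direct estimate is shrunk toward an overall mean by a factor reflecting its sampling variance. All values below are invented, a common mean stands in for the regression on covariates, and sigma_v2 would in practice be estimated from the data rather than assumed:

# EB predictor: theta_hat_i = g_i * y_i + (1 - g_i) * beta_hat,
# with shrinkage factor g_i = sigma_v2 / (sigma_v2 + D_i).
y = [12.0, 15.0, 9.0, 20.0]     # direct estimates for m = 4 small areas
D = [4.0, 1.0, 9.0, 2.25]       # known sampling variances D_i
sigma_v2 = 3.0                  # assumed model variance component

beta_hat = sum(y) / len(y)      # crude estimate of the common mean
g = [sigma_v2 / (sigma_v2 + d) for d in D]
eb = [gi * yi + (1 - gi) * beta_hat for gi, yi in zip(g, y)]
print(eb)                       # noisier areas are shrunk more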
Gauri Sankar Datta and Malay Ghosh
See also Composite Estimation; Parameter
SNOWBALL SAMPLING
Snowball sampling is a technique that can be applied
in two survey contexts. The first context involves
surveying members of a rare population. The second
involves studying mutual relationships among population members. In both cases, respondents are expected
to know about the identity of other members of the
same population group.
Studying Relationships
In the early 1960s, sociologist Leo Goodman proposed a method based on probability sampling for studying relationships among individuals in a population.
An initial zero-stage (Stage 0) probability sample is
drawn. Each person in the sample is asked to name k
persons with some particular relationship; example
relationships are best friends, most frequent business
associate, persons with most valued opinions, and
so on. At Stage 1 these k persons are contacted and
asked to name k persons with the same relationship. The Stage 2 sample consists of new persons
named at Stage 1, that is, persons not in the original
sample. At each subsequent stage, only newly identified persons are sampled at the next stage. The process may be continued for any number of stages,
designated by s.
The simplest relationships involve two persons
where each names the other. If the initial sample is
a probability sample, an unbiased estimate of the
number of pairs in the population that would name
each other can be obtained. More complex relationships such as closed rings can be studied with more
stages of sampling. For example, person A identifies
person B; person B identifies person C; and person
C identifies person A.
If the initial sample is drawn using binomial sampling so that each person has probability p of being in
the sample and s = k = 1, an unbiased estimate of the
number of mutual relationships in the population designated by M11 is
$$\hat{M}_{11} = \frac{y}{2p},$$
where y is the number of persons in the Stage 0 sample who named a person who also names them when
questioned either in the initial sample or in Stage 1.
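The unbiasedness of this estimator is easy to check by simulation. The population structure below (disjoint mutual pairs and binomial Stage 0 sampling) is invented for illustration:

import random

rng = random.Random(11)
M = 50                                   # true number of mutual pairs
pairs = [(2 * i, 2 * i + 1) for i in range(M)]
p = 0.2                                  # binomial sampling probability

estimates = []
for _ in range(2000):
    in_sample = {person for pair in pairs for person in pair
                 if rng.random() < p}
    # y: sampled persons whose named partner names them back; here
    # every pair is mutual, so each sampled member contributes 1.
    y = sum(1 for a, b in pairs for person in (a, b)
            if person in in_sample)
    estimates.append(y / (2 * p))
print(sum(estimates) / len(estimates))   # close to M = 50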
The theory for estimating the population size for
various types of interpersonal relationships has been,
or can be, developed assuming binomial sampling and
may apply, at least approximately, when using other
initial sample designs more commonly applied in
practice, for example, simple random sampling (without replacement).
James R. Chromy
See also Coverage Error; Multi-Stage Sample;
Nonprobability Sampling; Probability Sample; Rare Populations
SOCIAL CAPITAL
Building on the work of sociologist James Coleman,
political scientist Robert Putnam popularized the term
social capital to describe how basic features of civic
life, such as trust in others and membership in groups,
provide the basis for people to engage in collective
action. Even though social capital is not explicitly
political, it structures various types of activities that
are essential to maintaining civil and democratic institutions. Thus, social capital is defined as the resources
of information, norms, and social relations embedded
in communities that enable people to coordinate collective action and to achieve common goals.
It is important to recognize that social capital
involves both psychological (e.g., trusting attitudes)
and sociological (e.g., group membership) factors
and, as such, is a multi-level construct. At the macro
level, it is manifested in terms of connections between
local organizations, both public and private. At the
meso level, it is observed in the sets of interpersonal
networks of social affiliation and communication in
which individuals are embedded. And at the micro
level, it can be seen in the individual characteristics
that make citizens more likely to participate in community life, such as norms of reciprocity and feelings
of trust in fellow citizens and social institutions.
Research on social capital, despite its multi-level
conception, has focused on the micro level with individuals as the unit of analysis, typically using cross-sectional surveys to measure citizens' motivation,
attitudes, resources, and knowledge that contribute to
the observable manifestation of social capital: civic
participation. The meso-network level is represented
through individuals' reports of their egocentric networks
in terms of size and heterogeneity as well as frequency
of communication within these networks. Examinations
of individuals' connections to community institutions round out this micro-level research.
SOCIAL DESIRABILITY
Social desirability is the tendency of some respondents to report an answer in a way they deem to be
more socially acceptable than would be their true
answer. They do this to project a favorable image of
themselves and to avoid receiving negative evaluations. The outcome of the strategy is overreporting of
socially desirable behaviors or attitudes and underreporting of socially undesirable behaviors or attitudes.
Social desirability is classified as one of the respondent-related sources of error (bias).
Social desirability bias intervenes in the last stage
of the response process when the response is communicated to the researcher. In this step, a more or less
deliberate editing of the response shifts the answer in
the direction the respondent feels is more socially
acceptable. Since the beginning of survey research,
there have been many examples of socially desirable answers: for example, overreporting of having
a library card, having voted, and attending church, and
underreporting of bankruptcy, drunken driving, illegal
drug use, and negative racial attitudes.
The concept of social desirability has four nested
characteristics: (1) The highest layer is a cultural characteristic, followed by (2) a personality characteristic,
(3) mode of data collection, and (4) an item characteristic. The cultural characteristic is determined by the
norms of that particular group or culture. For example, social desirability differs in collectivist and individualist societies. Members of individualist societies
tend to reveal more information about themselves to
out-group representatives (the interviewer/researcher)
than members of collectivist societies where the distinction between in-group and out-group is sharper.
Within a society, specific cultural groups differ in
level of social desirability. For example, studies conducted in the United States have shown higher scores
of social desirability for minority groups when compared to majority groups.
The second characteristic is tied to a personality
trait. Researchers describe it as the need to conform
to social standards and ultimately as a response style.
Some scales have been developed to measure the tendency of respondents to portray themselves in a favorable light, for example, the Marlowe-Crowne scale,
the Edwards Social Desirability scale, and the Eysenck
Lie scale.
The third characteristic is at the mode of data
collection level. Social desirability has been found to
interact with some attributes of the interviewer and
the respondent, such as race/ethnicity, gender, social
class, and age. One of the most consistent findings in
the literature is that self-administered methods of
data collection, such as mail surveys and Internet surveys, decrease the prevalence of social desirability
bias. The general explanation is that the absence of
the interviewer reduces the fear of receiving a negative
evaluation, and responses are, therefore, more accurate. The item characteristic is the question wording.
For items that have shown social desirability bias
or that are expected to show it, some question wording techniques successfully reduce it. Methods include
loading the question with reasonable excuses, using
forgiving words, and assuming that respondents have
engaged in the behavior instead of asking if they have.
Another strategy is the randomized response method.
From a practical point of view, the researcher
should be aware of potential social desirability
effects, especially in cross-cultural research. Although
the researcher has no control over the cultural and personal characteristics, question wording and mode of
data collection can be used to decrease potentially
socially desirable responses. Particular care should be
taken when mixing modes of data collection.
Mario Callegaro
SOCIAL ISOLATION
Theories of social isolation (or social location)
have been used to explain lower cooperation in
responding to surveys among certain subgroups in
society, such as the elderly, minority racial and ethnic
groups, and lower socioeconomic status groups. The
social isolation theory of unit nonresponse states that
subgroups in a society that are less connected to the
dominant culture of the society (that is, those who do not feel part of the larger society or bound by its norms) will be less likely to cooperate with a survey
request that represents the interests of the dominant
society. According to the leverage-saliency theory,
respondents decide to participate in a survey depending on survey attributes such as how long the interview might take, the presence of an incentive, and
what the data might be used for. They may also make
this decision depending on who is sponsoring the survey and what the topic of the survey is. The hypothesis of social isolation comes into play in these last two
aspects.
Survey researchers have noted that sometimes sentiments of civic duty prompt survey participation
and that these feelings of civic duty are associated
with survey participation, especially when a properly
constituted authority is requesting the participation of
the sampled respondent. In theory, organizations with
greater legitimacy, for example, those representing
federal government agencies, are more likely to positively influence a respondent's decision to participate
in a survey than, for example, a commercial survey
research firm with less authority. However, according
to some researchers, individuals who are alienated or
isolated from the broader society feel less attachment
to that society and thus have a lower sense of civic
duty, which in turn will lead to a higher refusal rate
in surveys conducted by and for those perceived as
being in authority. Similar reasoning is often advanced
regarding interest in and knowledge of the survey
topic, in particular when focusing on political and election surveys. There are a number of studies that link such feelings of isolation to lower survey cooperation.
SOLOMON FOUR-GROUP DESIGN

The Solomon four-group design crosses the presence of a pretest with the presence of a treatment, where O1 denotes the pretest, X the treatment, and O2 the posttest:

Group 1:   O1   X   O2
Group 2:   O1        O2
Group 3:         X   O2
Group 4:              O2
Improving Validity
The Solomon four-group design provides information
relevant to both internal and external validity. Regarding internal validity, for example, many single-group
quasi-experiments use both pretests and posttests. If
a pretest sensitization occurs, it could be mistaken for
a treatment effect, so that scores could change from
pretest to posttest even if the treatment had no effect
at all. In randomized experiments using the Solomon four-group design, pretest sensitization can be detected by comparing the pretested groups with the posttest-only groups.
Examples
John Spence and Chris Blanchard used a Solomon
four-group design to assess pretest sensitization on
feeling states and self-efficacy and treatment effects
when using an aerobic fitness intervention to influence
feeling states and self-efficacy. The pretest and posttest
questionnaires were (a) the Subjective Exercise Experience Scale, a 12-item questionnaire that measures
psychological responses to stimulus properties of exercise, and (b) a self-efficacy questionnaire, which asked
respondents about their ability to exercise at a high
level for various time lengths. The intervention required
respondents to pedal on a cycle ergometer at 60 rotations per minute for 12 minutes at three different target
heart rates. There was no main effect for pretest sensitization for feeling states, nor was there an interaction
between pretest administration and the treatment. However, the aerobic intervention did affect feeling states:
Treated respondents reported a better sense of well-being
and decreased psychological distress. There were no significant main effects or interactions for self-efficacy.
Bente Traeen used a Solomon four-group design as
part of a group randomized trial, in which schools
were randomly assigned to the four conditions. The
intervention consisted of a sexual education course to
increase contraceptive use; the pretests and posttests
were self-report questionnaires asking about sexual
activity. The researcher found an interaction between
the pretest and intervention, indicating that the pretest
may have affected the size of the treatment effect.
M. H. Clark and William R. Shadish
See also Experimental Design; External Validity; Factorial
Design; Interaction Effect; Internal Validity; Main Effect;
Random Assignment
SPIRAL OF SILENCE
Originally developed in the early 1970s by German
political scientist and pollster Elisabeth Noelle-Neumann, the spiral of silence remains one of the few
theoretical approaches that attempts to understand
public opinion from a process-oriented perspective.
The general conceptual premise of this theory is that
there are different styles of communicative behavior
for those who are in the majority versus those in the
minority for a given issue.
According to the theory, those who are in the
majority are more likely to feel confident in expressing their opinions, while those in the minority fear
that expressing their views will result in social ostracism, and therefore they remain silent. These perceptions can lead to a spiraling process, in which
majority viewpoints are publicly overrepresented,
while minority viewpoints are increasingly withheld
and therefore underrepresented. This spiral results in
escalating social pressure to align with the majority
viewpoint, which in turn can lead to declining expressions of minority viewpoints. As the apparent majority position gains strength (i.e., is expressed more
and more confidently in public), those who perceive
themselves as being in the minority will be more
likely to withhold their opinion.
In responding to those who have attempted to
explain public opinion as a rational, informed process,
Noelle-Neumann has argued repeatedly that public
opinion is a method of social control, akin to the latest
fashion trends, which exert pressure to conform and
comply. At the heart of the spiral of silence theory is
the notion that people are highly motivated to fit in
and get along with others. Noelle-Neumann refers to
this motivation to avoid group exclusion as the fear
of isolation. Noelle-Neumann's argument for the existence of the fear of isolation is based primarily on classic social psychological research on conformity.
Measurement
Early research on the spiral of silence was almost
exclusively limited to survey and interview contexts.
In most survey-based studies, the key dependent variable is generally conceptualized as the respondent's
willingness to express his or her opinion. Yet the operationalization of this expression has varied widely.
Noelle-Neumann's original research focused on West Germans' views of East Germany. Specifically, her
research sought to determine whether West Germans
were for or against recognizing East Germany as a second German state, what their impressions were of the
views of other West Germans, and what their perceptions were of how other West Germans would view
this issue in one year's time. As Noelle-Neumann's
research program expanded, survey questions were
created with the goal of simulating a typical public situation in which an individual would have the choice
to express or withhold his or her own opinion on a
variety of political and social issues. The public situations included riding in a train compartment with a
stranger and conversing at a social gathering where
the respondent did not know all of the guests. By varying the opinion of the hypothetical conversation partner or group, Noelle-Neumann and colleagues were
able to gauge the effect of respondents' perceptions
that they were either in the minority or the majority
on a given issue. Results from this initial wave of
research were mixed, with some combinations of issue
and setting producing a silencing effect while others
produced no such effect.
In addition to replicating Noelle-Neumann's research by asking survey respondents about hypothetical opinion expression situations involving strangers, scholars have expanded on this line of work by gauging respondents' willingness to speak out by measuring voting intention, evaluating willingness to discuss issues with family and friends, assessing financial contributions to candidates, and determining respondents'
willingness to express their views on camera for news
broadcasts. Several of these methods have been criticized for various methodological weaknesses. For
example, some scholars have noted that voting is a private act that differs significantly in its social context
from expressing an opinion to another person or in
a group. Expressing an opinion to family and friends
introduces a variety of social-psychological considerations, including interpersonal dynamics, as well as
knowledge about the opinions of other group members, as knowledge about the opinions of close
friends and family members is likely to be much
greater than knowledge about the opinions of other
groups. Whereas financial contributions can be associated with individuals through public finance
records, writing and mailing a check is certainly
a behavior distinct from publicly voicing support for
a candidate. Asking respondents about a hypothetical
chance to express their opinion on a news broadcast
evokes a social setting in which they could face
recriminations for their views, but it also lacks the
immediacy of reaction in comparison to an interpersonal exchange.
Inconsistent Results
In the late 1990s, Carroll Glynn and colleagues conducted a meta-analysis of spiral of silence research.
Results indicated that there was a statistically significant relationship between individual perceptions that
others hold similar opinions and individual willingness to express an opinion, but the relationship was
quite weak across 9,500 respondents from 17 published and unpublished studies. Suggesting that hypothetical opinion expression situations might not evoke
the psychological response to contextual pressures,
this study called for experimentally created opinion
environments in future spiral of silence work. With
experimental research, respondents could be presented
with the opportunity to express an opinion or to keep
silent in the face of experimentally controlled majorities and minorities through the use of confederates
or deliberately constructed groups. The meta-analysis
also tested the impact of moderators on willingness to express opinions.
Future Applications
The spiral of silence has remained a viable approach
in social science research perhaps because it addresses
the inherent multi-level nature of public opinion
processes. At the macro or contextual level, whether
media outlets tend toward consonance or present
a multi-faceted representation of events, these outlets
create a mediated context in which opinions are
formed and shaped. It is also within these mediated
contexts that the interpersonal exchange of opinions
occurs. In other words, media outlets create the backdrop for interpersonal exchanges of opinion, and these
opinion exchanges can further influence the process
of opinion formation, spiraling or otherwise.
With this in mind, studying the spiral of silence
within a given context requires careful planning in
terms of the following:
The unit of analysis: How will opinion expression be operationalized?
Contextual parameters: Within what context will opinion expression occur?
Contextual characteristics: How will the opinion climate be measured?
Individual factors: What can moderate this relationship?
Issue contexts: What types of issues are most susceptible to the spiral of silence?
With the increasing use of multi-level modeling in the social sciences, scholars are afforded the opportunity to examine these levels simultaneously.
SPLIT-HALF
Split-half designs are commonly used in survey
research to experimentally determine the difference
between two variations of survey protocol characteristics, such as the data collection mode, the survey
recruitment protocol, or the survey instrument. Other
common names for such experiments are split-sample,
split-ballot, or randomized experiments. Researchers
using split-half experiments are usually interested
in determining the difference on outcomes such as
survey statistics or other evaluative characteristics
between the two groups.
In this type of experimental design, the sample is
randomly divided into two halves, and each half
receives a different treatment. Random assignment of
sample members to the different treatments is crucial
to ensure the internal validity of the experiment by
guaranteeing that, on average, any observed differences between the two groups can be attributed to
treatment effects rather than to differences in subsample composition. Split-half experiments have been
successfully used in various survey settings to study
measurement error bias as well as differences in survey nonresponse rates.
This experimental design has been used by questionnaire designers to examine the effects of questionnaire characteristics on answers provided by survey
respondents. Current knowledge of question order
effects, open- versus closed-response options, scale
effects, response order effects, and inclusion versus
exclusion of response options such as Don't Know
is based on split-half experiments. These experiments
have been conducted both in field surveys and in laboratory settings. Researchers usually assume that the
experimental treatment in questionnaire split-half
designs that produces the better result induces less
measurement bias in the survey statistic of interest.
However, these researchers often conduct such experiments because they do not have a gold standard or
true value against which to compare the results of the
study. Thus, the difference in the statistic of interest
between the two experimental groups is an indicator
of the difference in measurement error bias. Without
a gold standard, it does not, however, indicate the
amount of measurement error bias that remains in the
statistic of interest.
Split-half experiments have also proven useful for
studying survey nonresponse. Experimenters interested
in how variations in field recruitment procedures (such as interviewer training techniques, amounts or types of incentive, advance and refusal letter characteristics, survey topic, and the use of new technologies such as computer-assisted self-interviewing) affect
unit response, contact, and refusal rates have used
split-half designs. These experiments often state that
the design feature that had the higher response rate is
the better outcome. Nonresponse bias is less frequently
evaluated using a split-half design. Such designs have
also been used to study item nonresponse as a function
of both questionnaire characteristics and survey
recruitment protocol characteristics.
Split-half experiments are an important experimental design when examining the effect of different survey protocol features on survey statistics. However,
they do not automatically reveal which protocol or
instrument choice is better. In general, to determine
the better treatment, the researcher fielding a splithalf experiment should use theory to predict the
desirable outcome. For example, the type of advance
letter that produces a higher response rate will often
be considered superior, under the hypothesis that
higher response rates lead to more representative samples. Alternatively, the mode of data collection that
increases the number of reports of sensitive behaviors
is considered better under the assumption that respondents underreport sensitive behaviors. Although split-half experiments are a powerful experimental design
that isolate, on average, the effects that different treatments, survey protocols, or other procedures have on
various types of outcomes, they are made practically
useful when the survey researcher has a theory on
which outcome should be preferred.
A final point of interest: despite the name "split-half," survey researchers are not obligated, from either a statistical or a methodological standpoint, to split their samples exactly in half (i.e., 50/50). For example, a news organization may be conducting a survey on attitudes toward the Iraq war and want to determine whether alternative wording for the primary support/opposition attitude question will elicit different levels of support for or opposition to the president's policies. However, the news organization may not want to dilute its measurement of this key attitude, for news purposes, by assigning half of its sample to the alternative wording, and it may choose to assign far less than 50% of respondents to the alternative wording.
Sonja Ziniel
See also Experimental Design; Internal Validity;
Measurement Error; Missing Data; Nonresponse;
Nonresponse Bias; True Value
STANDARD DEFINITIONS
Broadly, the term standard definitions refers to
the generally accepted nomenclature, procedures,
and formulas that enable survey researchers to calculate outcome rates for certain kinds of sample
surveys and censuses. Specifically, Standard Definitions is the shorthand name for a booklet published by the American Association for Public Opinion Research (AAPOR).
Categorization of Outcomes
Just as doctors have to prepare equipment (and even
the patient) before testing blood, the researcher also
has to carefully categorize the outcome of each
attempt to reach respondents in the sample and use
that information to determine final outcome codes
for each case. All this has to be done before the
researcher can calculate final outcome rates. Classifying final dispositions can come at any time during
the data collection period for some clear-cut cases:
completed interviews, refusals, or, in random-digit
dialing (RDD) samples, telephone numbers that are
not working or have been disconnected.
However, during the data collection period, many
attempts to reach respondents must be assigned temporary outcome codes for cases where the final disposition may not be known until the field work is
over. An example for mail surveys is when the first
attempt at mailing a respondent a questionnaire
results in a returned envelope with no questionnaire,
in which case the researcher could mail a second
questionnaire with a letter asking the respondent to
complete it. An example for a phone survey is one in
which a respondent agreed to complete the interview
but asked the interviewer to call back at a more
convenient time. In both examples, the researcher
likely would be using temporary outcome codes (i.e.,
part of what has come to be considered paradata)
that would be useful in keeping track of the respondent sampled and would be useful in guiding the
researcher later when it came time to assign final
outcome codes.
Classification Problems
One of the problems that plague survey researchers is
when case outcomes are not clear-cut. For example,
in face-to-face interviews with people in their own
households, a researcher may never find any eligible
respondent at home during the field period. There
may clearly be an eligible respondent, but the interviewer is never able to talk to him or her because he
or she is unavailable during the data collection period.
This is a case of nonresponse, but it cannot be classified as a respondent's refusal.
Some telephone survey case outcomes are even
harder to classify. If the researcher uses an RDD
sample of landline telephone numbers, many will be
business, government, disconnected, fax or computer
lines, or not working; these clearly are not considered
to be part of the eligible telephone numbers for a residential sample and are excluded from outcome rate
calculations. But what happens, for example, when
during the data collection period, interviewers call
a dozen times and it always rings but no one answers?
This example falls into a group of outcomes that force
the researcher to make assumptions about the proportion that might be truly eligible. There are many ways
to deal with this. One is to assume that the group of phone numbers with unknown eligibility splits into the same proportion of eligible-to-ineligible phone numbers as the group with known eligibility in the sample. This may not always be accurate, but it may be the most reasonable assumption available.
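To make that assumption concrete, the following sketch (illustrative Python; the disposition counts are hypothetical, and the formula is only in the spirit of AAPOR's e-adjusted outcome rates, not a quotation of Standard Definitions) estimates the eligible share e among unknown-eligibility numbers from the cases whose eligibility is known:

def e_adjusted_response_rate(completes, refusals, noncontacts, ineligible, unknown):
    known_eligible = completes + refusals + noncontacts
    # e: proportion eligible among cases with known eligibility status
    e = known_eligible / (known_eligible + ineligible)
    # Count only the estimated-eligible share of unknowns in the denominator
    return completes / (known_eligible + e * unknown)

print(round(e_adjusted_response_rate(600, 250, 150, 400, 500), 3))   # 0.442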
Sample Validity
Because using standard definitions of case outcomes
is relatively recent and their adoption has not been
universal, AAPOR outcome rates appear to be used
more among academic researchers and those who do
media polls and less among commercial market
researchers. Researchers have only just begun exploring how differences in outcome rates affect sample
validity. Some early research suggests that extremely
high outcome rates in samples of likely voters actually can hinder the ability of public opinion researchers to be as accurate as they need to be in pre-election
polls. Other research on general population samples suggests that, for some types of measures, response rates between 40% and 70% do not hamper the validity of samples. In fact, a team at the University of Michigan, the University of Maryland, and the Pew Center for the People and the Press found that surveys with substantially different response rates produced very similar estimates for most measures.
STANDARD ERROR
Statistics are derived from sample data, and because
they are not taken from complete data, they inevitably
vary from one sample to another. The standard error
is a measure of the expected dispersion of sample
estimates around the true population parameter. It is
used to gauge the accuracy of a sample estimate: A
larger standard error suggests less confidence in the
sample statistic as an accurate measure of the population characteristic. Standard errors can be calculated
for a range of survey statistics including means, percentages, totals, and differences in percentages. The
discussion that follows focuses on sample means and
proportions.
In a survey of families, let X represent family
income, and let X̄ denote the mean family income,
which is an example of a statistic resulting from the
data. Although the survey may be designed to provide a precise estimate of this mean, the estimate will still vary from one sample to the next. For a simple random sample, the standard error of the sample mean is estimated as s/√n, where s is the sample standard deviation and n is the sample size; with a sample variance of 8,100 and n = 1,000, this works out to √(8100/1000) = 2.846.
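A minimal computational sketch of this calculation (illustrative Python; s = 90, i.e., a variance of 8,100, and n = 1,000 are assumptions consistent with the 2.846 result above, since the worked example's surrounding text is abridged here):

import math

def standard_error_of_mean(s, n):
    # Standard error of a sample mean under simple random sampling: s / sqrt(n)
    return s / math.sqrt(n)

print(round(standard_error_of_mean(90, 1000), 3))   # 2.846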
Trent D. Buskirk
STANDARDIZED SURVEY INTERVIEWING
Ideally, interviewers adhering to standardized procedures read (either from paper or from a computer
screen) survey questions and all response alternatives
precisely as worded by the designers of the survey,
and they repeat the full question and all response
alternatives when asked to repeat the question. In the
strictest forms of standardized survey interviewing,
interviewers also leave the interpretation of questions
entirely up to respondents and only respond to any
requests for clarification with neutral probes like "whatever it means to you" and "let me repeat the question." The logic is that if only some respondents receive clarification or help with answering, then the stimulus (the question wording and response alternatives) differs across respondents, and thus
there is no guarantee that the data are comparable.
The broad consensus is that standardized interviewing procedures are the most desirable for sample
surveys and that more idiosyncratic or ethnographic
forms of interviewing that are useful for other more
qualitative research purposes are risky or undesirable
in surveys. But this consensus can manifest itself
somewhat differently in different survey organizations, where the precise procedures that count as
standardized in one center can differ from those in
other organizations. For example, organizations differ
on whether providing clarification to a respondent
counts as nonstandardized or standardized and on
whether repeating only the most appropriate response
alternatives is better than repeating them all. Survey
organizations can also vary in how extensively they
train and monitor their interviewers for adherence
to the standardized procedures they advocate, which
means that in practice some standardized surveys turn
out to be less standardized than others.
Controversies
Starting in the 1990s, methodological researchers who closely examined interview recordings began documenting that strictly standardized procedures sometimes can create uncomfortable interactions that not only frustrate respondents but also lead to demonstrably poorer data quality. When interviewers force respondents into answers that do not reflect their circumstances, when they repeat information that the respondent already knows, when they ask for information the respondent has already provided, or when they refuse to clarify what their questions mean, respondents can become alienated and recalcitrant.
STATA
STATA is a general-purpose interactive statistical
software package available in major platforms such as
Windows, Unix, and Macintosh. In part due to its up-to-date coverage of statistical methodology and flexibility in implementing user-defined modules, STATA
has gained considerable popularity among social and
behavioral scientists, including survey researchers, in
recent years despite its initial learning curve for the
uninitiated.
STATA comes in four versions: (1) Small STATA, a student version; (2) Intercooled STATA, the standard version; (3) STATA/SE, a version for large data sets; and (4) STATA/MP, a parallel-processing-capable version of STATA/SE. Depending on the size of the data and the number of variables, as well as computer
capacity, most survey researchers will likely choose
Intercooled STATA and STATA/SE, in that order of
preference.
With a fairly developed graphics capacity, STATA
offers a vast array of commands for all kinds of statistical analysis, from analysis of variance to logistic
regression to quantile regression to zero-inflated Poisson regression. Although not an object-oriented language like C, R, or S, STATA is fairly programmable,
and that is why there is a huge collection of user-written macros, known as ado-files, that supplement STATA's main program and are typically well documented and regularly maintained. These ado-files satisfy a spectrum of needs among common users. Two examples provide a sense of the
range: SPost, which is a set of ado-files for the postestimation interpretation of regression models for
categorical outcomes, and svylorenz, which is a
module for computing distribution-free variance estimates for quantile group share of a total, cumulative
quantile group shares (and the Gini index) when estimated from complex survey data.
STATA already has good capacity for analyzing
survey data in its main program. For example, the
svyset command declares the data to be complex survey type, specifies variables containing survey design
information, and designates the default method for
variance estimation. Many regression-type commands
work with the cluster option, which gives a cluster-correlated robust estimate of variance. Adjustment for
survey design effects can also be achieved by using
the svy prefix command (e.g., svy: logit) before a
command for a specific operation. STATA supports
three major types of weight: frequency weight
(fweight) denoting the number of duplicated cases,
probability or sampling weight (pweight) for indicating the inverse of the probability that the observation
is included due to sampling design, and analytic
weight (aweight) being inversely proportional to the
variance of an observation. (Another weight, importance weight, can be used by a programmer for a particular computation.)
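As a brief illustration of the commands named above (a sketch in Stata syntax; the dataset and variable names psuid, strataid, pw, y, x1, and x2 are placeholders, not from this entry):

* Declare the complex design: primary sampling unit, sampling weight, and strata
svyset psuid [pweight=pw], strata(strataid)

* Fit a design-adjusted logistic regression using the svy prefix
svy: logit y x1 x2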
The xt series of commands are designed for analyzing panel (or time-series cross-sectional) data.
These, coupled with the reshape command for changing the data from the wide to the long form (or vice
versa) when survey data from multiple panels are
combined into one file, are very attractive features of
STATA for the survey data analyst.
Tim F. Liao
See also SAS; Statistical Package for the Social Sciences
(SPSS)
STATISTIC
A statistic is a numerical summary of observed values
for a random variable (or variables) in a sample. The
term random indicates that the variable's value, and thus its statistic, may differ across samples that are drawn from the same population. A statistic is
used to estimate a parameter, a numerical summary of
a given variable (or variables) in the population.
A statistic is commonly represented with Latin letters rather than with the Greek letters typically used for a parameter. For example, a statistic such as a standard deviation is represented by s, whereas the corresponding population parameter is represented by σ.
Describing characteristics of a sample variable
(or variables) is often called descriptive statistics.
Examples include computing the sample's mean or standard deviation on a variable such as age or weight, or describing proportions for categorical variables such as race or marital status. The process of using the sample's descriptive data to generate
population estimates is typically referred to as inferential statistics.
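A minimal illustration (hypothetical data, not from this entry) of descriptive statistics computed from a sample:

import statistics

ages = [34, 41, 29, 50, 38, 45, 31]      # a small hypothetical sample
x_bar = statistics.mean(ages)             # sample mean, a statistic estimating mu
s = statistics.stdev(ages)                # sample standard deviation, estimating sigma
print(round(x_bar, 1), round(s, 1))       # 38.3 7.6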
It should be recognized that if data are available for
the entire population of interest, inference is unnecessary as the parameter can be calculated directly.
Kirsten Barrett
See also Inference; Parameter; Population; Random; Sample
STATISTICAL POWER
The probability of correctly rejecting a null hypothesis that is false is called the statistical power (or simply, power) of the test. A related quantity is the Type
II error rate (β) of the test, defined as the probability
of not rejecting a false null hypothesis. Because
power is based on the assumption that the null
hypothesis is actually false, the computations of statistical power are conditional probabilities based on
specific alternative values of the parameter(s) being
tested. As a probability, power will range from 0 to 1
Statistical Power
845
0.020
0.015
Power
0.010
0.005
0.000
300
350
400
450
500
0
550
1
600
650
700
Figure 1
Illustration of the statistical power computation for a one-tailed hypothesis test for a single population mean
assuming the Type I error rate is .05
The statistical power computed for three different effect sizes for one-sided hypothesis tests involving a single mean is depicted in Figure 2. Notice from the figure that the statistical power increases as the sample size increases. Also notice that for any given possible sample size, the power increases as the effect size increases. Finally, note that the height of the solid curve in Figure 2 (corresponding to an effect size of .50) at the possible sample size of 25 is approximately .81, consistent with previous calculations.
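The figure's values can be approximated with the one-sided z-test power formula, 1 − Φ(z(1−α) − d·√n), where d is the effect size and Φ is the standard normal distribution function. A sketch (illustrative Python using only the standard library; the function itself is an assumption, not taken from this entry):

from statistics import NormalDist

def power_one_sided_mean(effect_size, n, alpha=0.05):
    # Power of a one-sided z-test for a single mean: 1 - Phi(z_crit - d * sqrt(n))
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha)                      # critical value under the null
    return 1 - z.cdf(z_crit - effect_size * n ** 0.5)

print(round(power_one_sided_mean(0.50, 25), 2))        # about 0.80, near the .81 read from Figure 2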
Statistical power computations are to hypothesis
testing of parameters as precision is to interval
estimation of parameters. For example, if the goal of
a survey sample is to produce reliable statewide estimates of obesity, average consumer spending per
month, or other variables of interest, then precision is
the defining computation that drives estimates of sample size. However, if interest is given to making comparisons for average obesity rates across several strata
or post-strata, or in comparing average unemployment
rates from one year to another within a state, then statistical power computations can be useful in understanding how likely one is to detect a given effect
with a particular sample size.

Figure 2. Statistical power for one-sided hypothesis tests of a single mean, plotted against possible sample sizes (10 to 100) for effect sizes of 0.25, 0.50, and 1.0

Of course, these quantities are closely intertwined statistically, but typically they imply different uses of the data at hand.
Trent D. Buskirk
See also Alternative Hypothesis; Finite Population
Correction (fpc) Factor; Null Hypothesis; Sample Size;
Type I Error; Type II Error
STATISTICS CANADA
Statistics Canada is Canada's national statistical
agency. Prior to 1971, Statistics Canada was known
as the Dominion Bureau of Statistics. The bureau was
created in 1918 as a permanent home for the national
census and to develop a standardized form and database for the collection of vital statistics. In addition to
the census and collection of vital statistics, Statistics
Canada administers more than 400 surveys and
compiles a number of different administrative data
sources. Some examples of their cross-sectional surveys include the Labour Force Survey, Survey of
Household Spending, and the General Social Survey.
Statistics Canada also completes several longitudinal
surveys, such as National Population Health Survey
Household Component, National Graduates Survey,
National Longitudinal Survey of Children and Youth,
and the Survey of Labour and Income Dynamics.
These surveys about employment, income, education,
health, and social conditions are a handful of the
many Statistics Canada surveys.
Many census data reports are free to the public
through Statistics Canada's online data source Community Profiles. CANSIM, another online data source, includes summary tables from many of Statistics Canada's surveys and administrative data sources. Some of these summary tables are free, and other tables have a nominal fee. Government officials, academics, the media, and members of the general public receive daily updates about these data sources through The Daily, Statistics Canada's official release bulletin.
Census
The first national census was in 1871, a few years after
Confederation (1867). The census included the four
original provinces and expanded as additional provinces
joined Confederation. When the prairie provinces joined
Confederation, an Agricultural Census was initiated
(1906) and repeated every 5 years to help monitor the
growth of the west. Since 1956, these two censuses
have been administered together every 5 years. Agricultural producers across the country received a copy of
the Census of Population questionnaire as well as a copy
of the Census of Agriculture questionnaire. Participation
in the census is required by law.
The intent of the first national census was to determine population estimates to ensure representation
by population in Parliament, a purpose still relevant
today. In Canada, the census is also used to calculate
transfer payments between the different government
jurisdictions (federal government to the provinces and
territories; provinces and territories to municipalities).
The transfer payments fund schools, hospitals, and
other public services.
Prior to the 1950s, the census involved multiple
questionnaires and ranged from 200 to 500 questions.
To reduce response burden and administration costs,
sample surveys were initiated to gather certain data
and the census questionnaires were reduced in size.
Two versions of the census questionnaire were introduced. The short version is administered to all Canadian households, except every fifth household, which
receives a long version of the questionnaire. The short
version gathers information about names of household
members, relationship to the head of household, sex,
age, marital status, and language. The long version
includes these questions as well as additional questions about education, ethnicity, mobility, income,
and employment. The content of the long version is
more apt to change across time, with the short version
staying more consistent in content.
In 1971, several innovations were introduced to the
census. First, one in three households received the
long questionnaire, but this innovation was abandoned
in subsequent years in favor of the one in five sampling approach. Second, the census moved from being
interviewer-administered to being self-administered
for most of the population. The latter innovation has
continued, and in 2006, this self-administered mode included the option of completing the questionnaire online.
STEP-LADDER QUESTION
A step-ladder question refers to a type of question
sequence that yields more complete and accurate data
than would a single question on the same topic. Step-ladder questions are used by survey researchers in an
attempt to reduce item nonresponse (missing data),
measurement error, or both, although they add slightly
to the cost of gathering the data since more than one
question needs to be asked.
For example, asking someone into which of the following income categories his or her 2007 total household income fell (less than $20,000; $20,000–$39,999; $40,000–$59,999; $60,000–$79,999; $80,000–$99,999; $100,000 or more) will lead to a good deal of Don't Know or Refused answers. Researchers have found
that a step-ladder question about income will substantially reduce item nonresponse and thus the need to
impute those missing values.
A step-ladder question sequence for the income
variable referenced in the previous paragraph, programmed to be asked in a computer-assisted
interview, would be as follows:
Q1. Was your total household income from all
sources in 2007 more than $19,999?
< 1 > YES (GO TO Q2)
< 2 > NO (GO TO Q6)
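The remainder of the printed sequence (Q2 through Q6) is not reproduced here. Purely to illustrate the branching logic, the following hypothetical sketch (illustrative Python; the bracket boundaries mirror the example above, but the follow-up question wording is assumed) walks a respondent up the ladder until a "no" answer pins down the bracket:

BOUNDARIES = [20_000, 40_000, 60_000, 80_000, 100_000]

def ask_yes_no(prompt):
    return input(prompt + " (y/n) ").strip().lower().startswith("y")

def income_bracket():
    lower = 0
    for b in BOUNDARIES:
        # Each "no" answer pins the respondent into the current bracket
        if not ask_yes_no(f"Was your 2007 total household income more than ${b - 1:,}?"):
            return f"${lower:,} to ${b - 1:,}" if lower else f"less than ${b:,}"
        lower = b
    return f"${BOUNDARIES[-1]:,} or more"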
STRATA
Strata in stratified sampling are distinct subsets of all
entries on a sampling frame. These subsets are strategically defined by one or more stratification variables
to improve the statistical quality of findings for the
population as a whole or for important population
subgroups. Once strata are formed and the sample is
allocated among strata, stratified sampling is accomplished by randomly selecting a sample within each
stratum.
Sampling strata may be formed explicitly from
a list of population members or as a part of the selection process in individual stages of a cluster sample.
Strata may also be considered implicit when the
frame is sorted by variables that could otherwise
define strata, and sampling is done in each of a set of
groups of neighboring entries throughout the frame
(e.g., as in systematic sampling). For simplicity, this
entry focuses largely on explicit stratification of individual population members, although the same principles of stratum formation apply to other forms of
stratified sampling as well.
Stratification Variables
The final set of strata used for sampling is usually
some type of cross-classification of the stratification
variables. For example, stratification of individuals by
two gender categories (i.e., male or female) and four
educational attainment categories (i.e., less than a high
school diploma, high school diploma, some college,
or college degree and beyond) would imply 2 × 4 = 8
fully cross-classified strata. If the number of males in
the population with some college or a college degree
and beyond were considered to be too small, a partially collapsed set of seven strata might be formed
from these two stratification variables by using all
four female strata and by combining all males with some college or a college degree and beyond into a single stratum.
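Purely as an illustration of forming the fully cross-classified strata described above (illustrative Python):

from itertools import product

genders = ["male", "female"]
education = ["less than high school", "high school diploma",
             "some college", "college degree and beyond"]

strata = list(product(genders, education))   # all gender-by-education combinations
print(len(strata))                           # 8 fully cross-classified strata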
STRATIFIED SAMPLING
Sample selection is said to be stratified if some form
of random sampling is separately applied in each of
a set of distinct groups formed from all of the entries
on the sampling frame from which the sample is to be
drawn. By strategically forming these groups, called
strata, stratification becomes a feature of sample
designs that can improve the statistical quality of survey estimates.
Forming Strata
Strata are formed by deciding on one or more stratification variables and then defining the actual strata in
terms of those variables. Improving the statistical
quality of sample estimates through stratification
implies that the stratification variables should (a) be
statistically correlated with the measurements that
define the main population characteristics to be estimated, (b) effectively isolate members of important
analysis subgroups, or both. Indeed, the first goal
implies that useful strata will be internally homogeneous with respect to the main study measurements,
while the second goal suggests that key subgroups
should be identifiable by one or more strata. Although
the number of formed strata need not be large, it
should be large enough to meet the needs of later
analysis.
There are two common misconceptions about stratum formation. One is that subgroup analysis will be
valid only if the subgroups correspond to one or more complete sampling strata. In fact, valid estimates can be produced for subgroups defined by
portions of one or more strata, although their precision will be slightly less than if complete strata define
the subgroup. Another myth is that incorrect stratum
assignment (e.g., due to measurement error in the
values of the stratification variables) will invalidate the resulting estimates; in practice, such errors generally reduce the precision gains from stratification rather than invalidate the estimates.

Under optimum allocation, the sample size assigned to a stratum is directly related to the stratum's size and to the standard deviation of the study measurements within it, and is also inversely related to the square root of the average cost of adding another sample member to the stratum. A practical limitation of optimum allocation is
that its statistical utility is tied to the existence of good
measures of stratum standard deviations and unit costs,
which may be hard to find.
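The relationship just described is consistent with the standard cost-optimal (Neyman-type) allocation; in common notation (an assumption here, since this excerpt does not reproduce the entry's own formula), the sample allocated to stratum h is

n_h = n \cdot \frac{N_h S_h / \sqrt{c_h}}{\sum_k N_k S_k / \sqrt{c_k}}

where N_h is the stratum population size, S_h the stratum standard deviation, and c_h the average cost of adding another sample member to stratum h.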
STRAW POLLS
Straw polls originated as small, informal public opinion surveys that later evolved into large-scale, random sample surveys used primarily to determine the
viability of potential political candidates. Nowadays,
pollsters offer a litany of straw polls before nearly
every national election and several state elections.
The phrase straw poll has its origins in the idea of "straws in the wind," which were used to determine which way the wind was blowing. The earliest use of a
straw poll was by a newspaper, The Harrisburg Pennsylvanian, during the 1824 presidential election.
Historically, straw polls were not scientific, and often
they were conducted haphazardly. They relied on
relatively large and sometimes massive samples to
achieve some semblance of accuracy. Early on, this included going to public places where groups of people were gathered, such as bars, political rallies, and train stations, and asking them their voting preferences. There was no sampling science at work at all.
However, as straw polling moved into the 20th
century, this situation began to change. Yet, even with
more than 2 million potential voters surveyed, straw
polls were often quite biased, as demonstrated in the
often-cited example of the Literary Digest's 1936 declaration that, by a landslide, Alf Landon was going to defeat Franklin D. Roosevelt in that year's presidential election.
SUBGROUP ANALYSIS
Subgroup analysis involves subdividing respondents
in a survey into groups on the basis of demographic
characteristics (e.g., race, ethnicity, age, education, or
gender) or other variables (e.g., party identification,
health insurance status, or attitudes toward the death
penalty). Analyses of subgroups can be done for
a number of reasons. A researcher might analyze differences in variable means or distributions across subgroups to identify disparities or other differences. For
example, a researcher studying health care insurance
may want to test whether there are differences in the
proportion of respondents in different income, education, or race subgroups who are covered by private
health care insurance.
Researchers may also want to compare bivariate
relationships or multivariate analyses across subgroups
to test whether relationships between variables are
moderated by subgroup membership. Alternatively,
these subgroup comparisons may be conducted to test
or establish the generalizability of a relationship or
model across subgroups of respondents. For example,
a researcher may want to compare the extent to which
respondents' pre-election reports of their presidential
candidate preferences correspond to their post-election
reports of vote choice for respondents who are strong
partisans versus those who are weak partisans.
In research involving experimentally manipulated
variables, researchers may compare the characteristics
of respondents assigned to experimental groups to
test whether random assignment has been successful.
Researchers also may compare the effect of an experimental manipulation across subgroups to determine if the effect differs by subgroup.
SUDAAN
SUDAAN is a statistical software package for the
analysis of correlated data, including but not limited
to the analysis of correlated data encountered in complex sample surveys. Correlation between observations
may exist, for example, in multi-stage studies where
units such as geographic areas, schools, telephone
area codes, or hospitals are sampled at the first stage
of selection. A correlation may exist between observations due to the increased similarity of the observations located in the same cluster compared to
observations between clusters. During any statistical
analysis with survey data, the complex design features
of the study, including clustering, unequal weighting,
stratification, and selection with or without replacement, should be accounted for during the analysis.
Standard software packages that do not account for the
complex design features can produce biased results. In
fact, in most instances the variance of statistics will likely be underestimated if one were to use a standard software package that does not account for the complex design features of a study. Software packages,
such as SUDAAN, that are specifically designed for
the analysis of data from survey studies will properly
account for many of the complex design features of
the study during an analysis.
SUDAAN originated in 1972 at Research Triangle
Institute (RTI) and, over the years, has evolved into
an internationally recognized, leading software package for analyzing correlated and complex survey data.
SUDAAN 9.0, which was launched in August 2004,
offers several additions and enhancements to prior
releases. RTI statisticians are currently working on
the next major release of SUDAAN and plan to have
that available in 2008. SUDAAN can be run on any
DOS or Windows platform and is also available for
the SUN/Solaris and LINUX operating systems (32- and 64-bit operating systems). SUDAAN is available
as a stand-alone software package or can be used in
SAS-callable format. The SAS-callable format is of
particular value to SAS users because it allows
SUDAAN to be run directly within any existing SAS
job. SUDAAN's syntax is very similar to that of SAS (e.g., a
semicolon must appear at the end of every statement),
and SUDAAN has the ability to read data files in various formats such as SAS, SAS XPORT, SPSS, and
ASCII.
As noted earlier in this entry, SUDAAN provides
estimates that correctly account for complex design
features of a survey. These features include the
following:
Unequally weighted or unweighted data
Stratification
With- or without-replacement designs
Multi-stage and cluster designs
Repeated measures
General cluster-correlation (e.g., correlation due to
multiple measures taken from patients)
Multiply imputed analysis variables
Further Readings
SUDAAN: http://www.rti.org/sudaan
SUFFIX BANKS
Telephone numbers in the United States, Canada, and
the Caribbean consist of 10 digits divided into three
components: The first three digits are the area code;
the next three digits are the prefix or exchange; and
the final four digits are the suffix, or local number.
SUGING
SUGing, or Selling Under the Guise of research (also
known as sugging), is the act of telemarketing or
pre-qualifying customers while pretending to conduct
a legitimate survey. Because it is, by definition, a practice that seeks to deceive the respondent, it is an
unethical business and research practice. In addition,
there is essentially no interest in the data being
gathered other than to help make the sales pitch more
effective.
It also has the side effect of increasing the victim's
suspicion of subsequent legitimate survey contacts,
thereby reducing the population of individuals readily
willing to answer survey questions. In short, SUGing
not only represents illegitimate and thus unethical
research in itself, but it also has a long-term negative
impact on legitimate research projects. SUGing is proscribed by numerous public opinion, market research,
and direct marketing trade associations.
In SUGing, the respondent's answers to the purported survey questions serve as a means of setting up a sales pitch. This is in contrast with legitimate survey research, in which obtaining the respondent's answers
is the desired goal. Questions used in SUGing, therefore, are often superficial or biased to create a
favorable disposition toward the sales solicitation. In
some cases, however, SUGing solicitations can be
extremely detailed in order to develop a rich profile of the respondent to facilitate a custom-tailored
sales pitch that arrives later (e.g., via mail), which
itself may represent another violation of ethical obligations to protect respondent confidentiality. Regardless of the nature of the questions, a key feature of
a SUGing solicitation is its deceptiveness. There is no
indication of the true nature of the interview until the
sales pitch is given.
SUGing is illegal in several jurisdictions, including
the United States, Canada, and the European Union.
As a complicating factor, though, there is some international disagreement about the exact meaning
of SUGing. Although public opinion and market
research organizations in the United States, the United
Kingdom, and continental Europe all define SUGing
as selling under the guise of research, in Canada the S
also stands for soliciting (giving the term a similar
meaning to FRUGing, which is fund-raising under the
guise of research), and deceptive survey-like sales solicitations fall under this broader definition.
SUPERPOPULATION
When data for a variable are gathered from a finite
population and that variable is regarded to be a random variable, then the finite population is referred
to as being a realization from a superpopulation. A
superpopulation is the infinite population that elementary statistical textbooks often describe as part of the
enumeration of a finite population. It is because sampling theory is based on making inference for a well-defined finite population that the concept of superpopulation is needed to differentiate between a finite
population and an infinite superpopulation.
This distinction is important for two reasons:
(1) Sampling theory estimation and inference can
SUPERVISOR
Survey research supervisors are tasked with two main
responsibilities: managerial or administrative duties
and research-specific duties. It is the supervisors who
are responsible for ensuring projects are proceeding
on schedule and are within budget, the research
process is carried out with due regard for ethical obligations, and all research goals are satisfied. To do
this, supervisors must have a firm grasp on the operations of their research firm, knowledge of the entire
research process, and an understanding of business
policies and practices. A comprehensive knowledge
of the science of survey research is also very helpful
to supervisors. In addition, direct experience with various survey tasks is an inherent necessity for survey
research supervisors.
Administrative Duties
Research operation supervisors typically oversee
a staff of department or area supervisors, administrative personnel, interviewers, and other research assistants. Department supervisors are often responsible
for specific tasks, such as data collection, computerassisted data collection programming, or data coding
and editing. These supervisors are responsible for the
day-to-day supervisory activities.
Research Duties
Project Development
Research supervisors are often tasked with developing the operational features of projects from the
ground up. They may work in cooperation with their
client or with topical experts, as necessary, to develop
the operationalization of the research design, questionnaire, and data processing stages. Of course, this
varies greatly depending on the specific research firm
or project. At times, research supervisors may be fulfilling only specific research requests such as data collection. This is often the case when research firms
conduct data collection for government agencies.
Survey Design

Supervisors are responsible for the successful processing of the sample. This includes issuing new cases to interviewers on a timely basis; overseeing the system that sets callbacks and recontacts; sending out follow-up mailings; and making certain that the final sample size of completed interviews is accomplished within the needed time frame and, ideally, on budget.

If problems arise during the field period, especially ones that were unanticipated by the researchers, it is the responsibility of supervisors to bring these problems immediately to the attention of their superiors and to seek their immediate resolution. They must ensure that the solution is implemented properly, and they must monitor the effectiveness of the solution and provide continuous feedback to their superiors.
SUPERVISOR-TO-INTERVIEWER RATIO
This entry addresses the issue of how the average
number of interviewers that are assigned to a survey
supervisor affects data quality. There are many factors
that affect survey interviewer performance and success. Individual skill set, training, experience, and
questionnaire design all influence the ability of interviewers to gain cooperation and gather accurate
data. The one ongoing factor that has a significant
impact on data quality in surveys that are intervieweradministered is the interaction of the supervisor with
the telephone interviewer. Sometimes called a monitor,
coach, team lead, or project/section/floor supervisor,
this staff position gives management the greatest
leverage to influence the human aspect of such survey
projects.
The supervisors of survey interviewers may fulfill
many roles in a survey project, including any or all of
the following:
Monitor the quality of the data being collected.
Motivate the interviewers to handle a very difficult
job.
Provide insight into how a study is going and what
the problem areas are.
Provide training to improve the interviewing skills
of the interviewers.
Supply input for the interviewer performance
evaluation.
SURVEY
Survey is a ubiquitous term that is used and understood differently depending on the context. Its
definition is further complicated because it is used
interchangeably as a synonym for other topics and
activities listed under the broad classification of survey research or survey methods. There are multiple
uses of the term survey that are relevant to particular
linguistic applications. Survey is used as a noun when
it refers to a document (e.g., fill out this survey) or
a process (e.g., to conduct a survey); as an adjective
(e.g., to use survey methods); or as a verb (e.g., to
survey a group of people). Survey is used interchangeably with and strongly associated with the terms poll
and public opinion polling, and survey methods are
used to conduct a census, an enumeration of the total
population. In addition to being used to identify a type
of research tool, survey research is a subject that can
be studied in an educational course or workshop, and
it is an academic discipline for undergraduate and
graduate degrees. This entry focuses on the basic definition from which these other uses of the term survey
have evolved by outlining the essential elements of
this term as it relates to the scientific study of people.
A survey is a research method used by social scientists (e.g., economists, political scientists, psychologists, and sociologists) to empirically and scientifically
study and provide information about people and social
phenomena. A survey is scientific because there is an
established process that can be followed, documented,
and replicated. This process is rigorous and systematic.
The typical steps in the survey process are (a) problem
formation, (b) hypothesis development, (c) research
design, (d) sample design and selection, (e) questionnaire development, (f) data collection, (g) data analysis, (h) reporting and dissemination, and (i) application
of information. Underscoring the complexity of a survey, each of these steps in the process also has a set
of essential and accepted practices that is followed,
documented, and replicated, and specific professional training is required to learn these practices.
The documentation that accompanies a survey provides the information necessary to evaluate the survey results and to expand the understanding and
analysis of the information provided from the survey. Sampling error and nonsampling error are two
dimensions of survey error. Most scientific surveys seek to quantify sampling error and to minimize nonsampling error.
SURVEY COSTS

Managerial Accounting

Managerial accounting defines the principles used to develop costs. It differs from financial accounting in that it is intended to provide insight for better organizational decision making. As such, managerial accounting has no standards, and its formulas and data are often unique to an organization. To learn more about managerial accounting, see the Institute of Management Accountants' Web site (www.imanet.org).

Much of what goes into managerial accounting is built on an organization's operations and is often proprietary and confidential. However, there are some basic concepts that are universal to costing in all survey operations. These concepts, and the costing principles derived from them, provide the foundation for any organization to understand and build its own managerial accounting system for survey costing and budget management. Understanding them helps researchers understand how their budget and costs are related to survey quality.

Managerial accounting uses six basic concepts to define costs:

1. Cost object: The thing managers want to know how to cost

2. Cost driver: Any factor that affects costs
For example, if the cost object is a computer-assisted telephone interviewing (CATI) survey, cost
drivers might include the following list of items:
Length and complexity of CATI programming
Number of open-ended questions and associated coding
Desired data outputs and associated reporting
Once the cost drivers have been identified for a particular cost object, a list of direct and indirect costs
associated with those cost drivers can be constructed.
These cost categories will also usually be constant
across organizations, although the actual costs associated with each can be highly variable. Using the
CATI example, an organization might see the following direct and indirect costs:
Direct Costs:
Interviewer labor
Telephone long distance
Production floor management
CATI programming
Cost of sample
Data manipulation
Manual reports
Indirect Costs:
Concurrency/bandwidth
Data manipulation
Manual reports
Length and complexity of programming
Rate Calculations
Because of the dominance of variable costs in survey data collection modes that rely on interviewing labor (in-person and telephone), formulas have been developed to help predict costs for these modes accurately. This is particularly true of survey modes that are applied to difficult-to-reach populations and those with large sample sizes.
In these cases, variable direct costs can fluctuate tremendously with the slightest change in specification.
There are three fundamental formulas used to predict cost for surveys that include interviewing labor.
These are (1) net effective incidence rate, (2) respondent cooperation rate, and (3) production rate.
These three rates often have different names, depending on the country or even on which professional association's standards are being employed within an organization. It is not the intent here to suggest that any one association's or organization's labels are preferred.
Fortunately, the research industry in the United
States has reached a point where the mathematical formulas for calculating net effective incidence and
respondent cooperation rate are generally agreed upon.
Production rates are derived using incidence and
respondent cooperation calculations. They are managerial accounting equations and, as such, are proprietary
and confidential. However, a universal standard for
CATI surveys is offered here as an example and as
a baseline for any organization or researcher building
an accurate production rate.
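Purely as a baseline illustration (illustrative Python; the disposition counts are hypothetical, and, as noted above, the exact definitions of these rates vary by organization):

def net_effective_incidence(eligible, screened):
    # Share of screened contacts who qualify for the survey
    return eligible / screened

def respondent_cooperation_rate(completes, refusals, breakoffs):
    # Completes as a share of eligible persons actually reached
    return completes / (completes + refusals + breakoffs)

def production_rate(completes, interviewer_hours):
    # Completed interviews per interviewer hour; a key driver of labor cost
    return completes / interviewer_hours

print(round(production_rate(300, 2400), 3))   # 0.125 completes per interviewer hour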
Incidence is another way of saying "frequency of a desired characteristic occurring in some group of people." It is sometimes expressed as eligibility. There are as many names for incidence as there are types of research. The term incidence is business language used primarily in commercial research.
SURVEY ETHICS
Survey ethics encompasses a set of ethical procedures
that are intended to guide all survey researchers.
These procedures are essential to the research process
so that explicit care is taken that (a) no harm is
done to any survey respondent, and (b) no survey
respondent is unduly pressured or made to feel obligated to participate in a survey. This entry discusses
informed consent, the rights of respondents, and the
social responsibility of researchers.
Informed Consent
The ethics of survey research and its importance often
are not covered thoroughly in research methods
textbooks and courses. However, the acquisition of
knowledge through survey research requires public
trust and, therefore, researchers must adhere to ethical
practices and principles involving human subjects.
Most governmental bodies throughout the industrialized world have established ethical guidelines for conducting survey research. Research proposals often are
subject to regulatory review by one or more ethics
boards (e.g., an institutional review board) that have
been established to ensure that the ethical guidelines
set forth by the governing body will be followed. In
addition to regulatory bodies overseeing research
activities involving human subjects, most professional
organizations and associations also have established
guidelines and standards of conduct for conducting
research that are expected to be maintained by organizational members. The primary tenet among these
governing bodies and organizations, as it relates to
carrying out research on human subjects in an ethical
manner, is that the researcher be cognizant of research
participants' rights and minimize the possibility of
risk (i.e., avoid exposing research participants to the
possibility of physical or psychological harm, discomfort, or danger).
To that end, central to all research ethics policy is
that research participants must give their informed
consent voluntarily. The purpose of informed consent
is to reasonably ensure that survey respondents understand the nature and the purpose of the survey, what
is expected of them if they participate, the expected
length of time necessary for them to complete the survey (and, if a longitudinal study, the frequency with
which their participation will be requested), how the
data will be utilized, and their rights as research participants, including their right to confidentiality. Based
on the information provided by the researcher, potential respondents can then make an informed determination as to whether they are willing to participate in
a given study (i.e., give their consent). In addition
to the willingness to participate, it is fundamental
that potential respondents have the competence to
Respondents' Rights

Respondents' rights are paramount to any ethical survey research project and an integral part of informed
consent. Researchers have an obligation to their
subjects to minimize the possibility of risk. Granted,
respondents participating in the bulk of survey
research are not generally at a high risk of physical or
psychological harm. However, some survey research
topics are very sensitive in nature and may cause
a considerable amount of discomfort for some respondents. In addition, researchers are ethically bound to
report any child abuse that is suspected. Therefore,
it is essential to minimize risk through a thorough
advance disclosure of any possible harm or discomfort that may result from survey participation. If there
are compelling scientific reasons that respondents
must be kept blind to, or entirely deceived about,
some of the aspects of a study before they give their
consent, and while the study is being conducted (e.g.,
an experimental design would be compromised if the
respondents knew it was being conducted), then it is
the responsibility of the researcher to debrief the
respondents about any deception they may have experienced, even if it can be argued that the deception
was trivial in nature. It also is the responsibility of the
researcher to undo any harm he or she may have
caused associated with any deception or other withholding of information.
Furthermore, respondents are afforded additional
rights that also minimize any risks associated with
their participation, including the right that their
responses will be kept confidential and the right
to privacy. Confidentiality protects respondents' identities from being disclosed.
Social Responsibility
Social responsibility is key to survey ethics. Research
findings contribute to the larger body of knowledge
for purposes of better understanding social behavior.
SURVEY METHODOLOGY
Survey Methodology is a peer-reviewed journal published twice yearly by Statistics Canada. The journal
publishes articles dealing with various aspects of statistical development relevant to a statistical agency,
such as design issues in the context of practical constraints, use of different data sources and collection
techniques, total survey error, survey evaluation,
research in survey methodology, time-series analysis,
seasonal adjustment, demographic methods, data integration, estimation and data analysis methods, and
general survey systems development. Emphasis is
placed on development and evaluation of specific
methodologies as applied to data collection, data processing, estimation, or analysis.
The journal was established in 1975, as a Statistics
Canada in-house journal intended primarily to provide a forum in a Canadian context for publication
of articles on the practical applications of the many
aspects of survey methodology. Its basic objectives
and policy remained unchanged for about 10 years.
During this period, however, the journal grew to the
point that pressing demands and interests could not be
met within the framework established at its inception.
In 1984 several major changes were introduced,
which included broadening the scope of the editorial
policy and expansion of the editorial board. A separate management board was also established. The
editorial board now consists of the editor, a deputy
editor, approximately 30 associate editors, and six
assistant editors. M. P. Singh was the editor of the
journal from the beginning until his death in 2005.
The current editor is John Kovar. The associate editors are all leading researchers in survey methods and
come from academic institutions, government statistical agencies, and private-sector firms around the
world.
Since its December 1981 issue, one unique feature
of the journal is that it is fully bilingual, with submissions being accepted in either English or French and
all published papers being fully translated. The French
name of the journal is Techniques d'enquête.
From time to time the journal has included special
sections containing a number of papers on a common
theme. Topics of these special sections have included,
among others, census coverage error, small area
estimation, and composite estimation in surveys.
SURVEY SPONSOR
The sponsor of a survey is responsible for funding
part or all of the sampling and data collection activities and typically has first or exclusive rights to the
data. Often, survey sponsors do not have the in-house
capacity (i.e., facilities or interviewing staff) to
administer a survey and, as a result, contract these
data collection activities to a third party. These thirdparty organizations focus on survey administration,
such as selecting the sample, interviewing and distributing questionnaires, and data entry and analyses.
Sometimes these third parties (i.e., survey research firms) are mistakenly perceived by respondents to be the survey's sponsor.
Private-sector companies and politicians, for example, share concerns about response bias and response
quality, but they also have concerns about disclosing their sponsorship of a survey. These types of sponsors may be reluctant to disclose themselves as a sponsor to a sampled respondent, at least prior to the survey being completed, fearing that this information may bias survey responses, with respondents conveying their opinions about the sponsor rather than answering the survey
questions. In addition, private-sector companies and
politicians may be reluctant to identify their sponsorship out of fear that their competitors may learn about
their proprietary market research.
Prior to the 1990s, several experiments were conducted related to the effects of the survey sponsor on
response rates, data quality, and response bias. In general, the studies found that government or university
sponsorship yielded higher response rates compared
to sponsorship by a commercial organization. This
body of research is not consistent about whether university sponsorship or government sponsorship produces a higher response rate. A well-cited British study
by Christopher Scott suggested that government sponsorship yielded a slightly higher response rate than sponsorship by the London School of Economics or the British Market Research Bureau (a commercial agency). Thomas A. Heberlein and Robert Baumgartner's meta-analysis in the 1970s found that government-sponsored research produced higher initial response rates, but after controlling for topic saliency and other factors, sponsorship had no significant effect on the final response rate. John Goyder's test of the Heberlein and Baumgartner model
of predicting response rates, using a more expansive
database of research, found that government-sponsored
surveys have significantly higher response rates.
Although many studies document differences in response rates depending on sponsorship, many researchers remain unconvinced that survey sponsor has a
significant effect on response patterns. The advantage
of noncommercial sponsorship (university, government) may diminish if the survey sponsor is identified as the survey organization conducting the survey
(rather than a separate entity that is paying for the
study). Commercial survey research firms (when no
additional sponsor is identified) may produce slightly
lower response rates than university-administered
surveys, but these differences may not be statistically
significant.
Although research tends to suggest an effect of sponsorship on response rates, the effects of sponsorship on response quality and response bias are less clearly established.
SYSTEMATIC ERROR
Systematic errors result from bias in measurement or
estimation strategies and are evident in the consistent
over- or underestimation of key parameters. Good survey research methodology seeks to minimize systematic error through probability-based sample selection
and the planning and execution of conscientious survey
design. There are two key sources of systematic error
in survey research: sample selection and response bias.
In the context of survey research, systematic errors
may be best understood through a comparison of
samples in which respondents are randomly selected
(i.e., a probability sample) and samples in which
respondents are selected because they are easily
accessible (i.e., a convenience sample). Consider, for
example, a research project in which the analysts wish
to assess attitudes about a town's public library.
Good survey research methodology strives to minimize both the selection bias and response bias that
induce systematic error.
SYSTEMATIC SAMPLING
Systematic sampling is a random method of sampling
that applies a constant interval to choosing a sample
of elements from the sampling frame. It is in common
use in part because little training is needed to select
one. Suppose a sample of size n is desired from
a population of size N = nk. Systematic sampling uses
a random number r between 1 and k to determine the
first selection. The remaining selections are obtained
by taking every kth listing thereafter from the ordered
list to yield the sample of size n. Because it designates the number of records to skip to get to the next
selection, k is referred to as the skip interval.
Systematic sampling is particularly desirable when
on-site staff, rather than a data collection vendor,
select the sample. For example, one-page sampling
instructions can be developed for a survey of mine
employees that explain how the mine operator is to
order the employees and then make the first and subsequent sample selections using a pre-supplied random starting point. Systematic sampling is also
useful in sampling on a flow basis, such as sampling
clients entering a facility to obtain services such as
emergency food assistance or health care. In this situation, the facility manager is approached to determine how many clients might visit in the specified time period. This population estimate $\hat{N}$ is used to determine the skip interval k based upon the desired sample size n of clients from the facility. At the end of the specified time period, the actual population N of clients is recorded.
Frequently, the population size N, together with the desired sample size n, results in a skip interval k that is a real number as opposed to an integer value.
In this case, the simplest solution is to round k down
to create a skip interval that is an integer. This
approach results in a variable sample size (n or
n + 1), but it is preferable for use by nontechnical
staff. Alternately, the list can be regarded as circular,
with a random number between 1 and N selected and
then a systematic sample of size n selected using the
integer value of k obtained by rounding down. An
easily programmable option is to select a real number
between 1 and k as the random starting point and then
continue adding the real number k to the random
starting point to get n real numbers, which are
rounded down to integers to determine the records
selected for the sample.
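The last of these options is easy to program. The following minimal sketch (the function name and the assumption that the ordered frame fits in a Python list are illustrative, not from this entry) draws a real-valued random start between 1 and k, repeatedly adds the real-valued k, and rounds down to pick record positions:

```python
import random

def systematic_sample(frame, n):
    """Systematic sampling with a real-valued skip interval:
    draw a real random start between 1 and k = N/n, keep adding k,
    and round each running value down to a 1-based record position.
    Assumes len(frame) >= n."""
    N = len(frame)
    k = N / n                      # skip interval (may be fractional)
    start = random.uniform(1, k)   # real-valued random starting point
    positions = [start + i * k for i in range(n)]
    return [frame[int(p) - 1] for p in positions]

# Example: select 7 of 100 records.
print(systematic_sample(list(range(1, 101)), 7))
```

Unlike rounding k down to an integer, this approach always yields exactly n selections.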
T
TAILORING

Tailored messages can be based on age, sex, educational attainment, and other sociodemographics, as well as cognitive, motivational, and behavioral attributes. These messages usually refer to interventions created specifically for individuals with characteristics unique to them, and they are usually based on data collected from them as well.
Tailoring is a term that is used in different ways in
social behavioral research. It often is used to describe
health behavior and health education messages that
have been crafted to appeal to and influence the individual, with the intent of modifying the individual's
attitudes and behavior such that he or she engages in
healthier endeavors. A message that is tailored is
a personalized message that attempts to speak to the
individual as an individual and not as a member of
any group or stratum. Additionally, a tailored message
attempts to create a customized, personally meaningful communication based on knowledge (data and information) about individuals. A more formal definition comes from M. W. Kreuter and C. S. Skinner and indicates that a tailored message is "any combination of information or change strategies intended to reach one specific person, based on characteristics that are unique to that person, related to the outcome of interest, and have been derived from an individual assessment." This definition highlights the two features
of a tailored health promotion intervention that distinguish it from other commonly used approaches: (a)
Its collection of messages or strategies is intended for
a particular person rather than a group of people, and
(b) these messages or strategies are based on individual-level factors that are related to the behavioral outcome of interest. Tailoring techniques are utilized
also by marketing practitioners but with the marketing
goal of selling consumer goods and services.
when the message begins to be segmented or subdivided by individual attributes. While the point where
that occurs still may be debated, targeted interventions
can be thought of as strategies based on group-level
variables and tailored interventions as strategies and
messages that are customized based on individual-level variables.
Theoretical Approach
Tailored communications are not theory-specific, in
that many different kinds of theory might be used
to anchor an approach to attitudinal or behavioral
modification. The important point is that a theoretical
approach is utilized, as a systematic means to achieve
a desired end, and different theories are best used as
a basis for tailoring in relation to different behaviors
and audiences. The parameters of a message (the quality, appearance, literacy level, and packaging of tailored communications) can vary widely. In the
ideal, tailored materials eliminate all superfluous
information and just transmit the right message to the
right individual, through the right channel at the right
time. Of course, reality is far from that ideal, as few
behaviors have been subject to enough rigorous theory-based research. It is the utilization of theory that
guides a reasoned and grounded approach that clearly
delineates what is superfluous and what part of the
message contains the active ingredients, that is, the
parts that get people to think, act, buy, cooperate with
a survey request, and so forth. Moreover, tailored
materials can be designed well or poorly. So, just
because tailoring is used as a messaging technique
does not mean that other elements of creative design
become meaningless. In fact, those other elements are
likely to become more important. Tailoring holds
great promise and great challenges as well. For example, tailoring population-level interventions so that
they also are personalized at the individual level could
have a wide-scale impact on public health outcomes.
TARGET POPULATION
The target population for a survey is the entire set of
units for which the survey data are to be used to make
inferences. Thus, the target population defines those units to which the findings of the survey are meant to generalize. Establishing study objectives is the first
step in designing a survey. Defining the target population should be the second step.
Target populations must be specifically defined, as
the definition determines whether sampled cases are
eligible or ineligible for the survey. The geographic
and temporal characteristics of the target population
need to be delineated, as well as types of units being
included. In some instances, the target population is
restricted to exclude population members that are difficult or impossible to interview. For instance, area
household surveys tend to define their target populations in terms of the civilian, noninstitutionalized
population of the United States. Any exclusion made
to the target population must also be reflected in the
inferences made using the data and the associated presentations of findings. Undercoverage of the target
population occurs when some population units are not
linked or associated with the sampling frame and
hence have no chance of inclusion in the survey. The
subset of the target population that has a chance of
survey inclusion because of their membership in, or
linkage to, the sampling frame is referred to as the
survey population. Traditional telephone surveys have
a survey population of households with landline
telephone service but typically are used to make
inferences to target populations of all households,
regardless of telephone service. In some instances,
surveys have more than one target population because
analysis is planned at multiple levels. For instance,
a health care practices survey might have a target
population defined in terms of households, another
defined in terms of adults, and another defined in
terms of children. Such a survey might sample households, collect household-level data, and then sample
an adult and a child for interviews at those levels.
For business surveys, the target population definition must also specify the level of the business that
comprises the units of the target population. Surveys
of business finances typically define their target populations in terms of the enterprise, the organizational level that has ownership and ultimate responsibility
TAYLOR SERIES LINEARIZATION (TSL)

The Taylor series linearization (TSL) method is used for variance estimation of statistics that are more complex than simple sums of sample values.
Two factors that complicate variance estimation
are complex sample design features and the nonlinearity of many common statistical estimators from
complex sample surveys. Complex design features
include stratification, clustering, multi-stage sampling, unequal probability sampling, and sampling without replacement. Nonlinear statistical estimators for complex sample surveys include means, proportions, and regression coefficients. For example,
consider the estimator of a subgroup total, $\hat{y}_d = \sum_i w_i d_i y_i$, where $w_i$ is the sampling weight, $y_i$ is the observed value, and $d_i$ is a zero/one subgroup membership indicator for the $i$th sampling unit. This is a linear estimator because the estimate is a linear combination of the observed values $y_i$ and $d_i$. On the other hand, the domain mean, $\bar{y}_d = \sum_i w_i d_i y_i \big/ \sum_i w_i d_i$, is a nonlinear estimator as it is the ratio of two random variables and is not a linear combination of the observed data.
Unbiased variance estimation formulae for linear estimators are available for most complex sample designs. However, for nonlinear estimators, unbiased variance estimators generally are not available, and the TSL method instead approximates the nonlinear estimator by a linear one. For an estimator $\hat{\theta} = F(\hat{y}, \hat{x})$ that is a smooth function of two linear statistics $\hat{y}$ and $\hat{x}$ with expectations $\mu_y$ and $\mu_x$, the first-order Taylor expansion about $(\mu_y, \mu_x)$ gives

$$\hat{\theta} \approx F(\mu_y, \mu_x) + \frac{\partial F}{\partial y}(\hat{y} - \mu_y) + \frac{\partial F}{\partial x}(\hat{x} - \mu_x),$$

with the partial derivatives evaluated at $(\mu_y, \mu_x)$, so that

$$\mathrm{Var}(\hat{\theta}) \approx \left(\frac{\partial F}{\partial y}\right)^{2}\mathrm{Var}(\hat{y}) + \left(\frac{\partial F}{\partial x}\right)^{2}\mathrm{Var}(\hat{x}) + 2\,\frac{\partial F}{\partial y}\,\frac{\partial F}{\partial x}\,\mathrm{Cov}(\hat{y}, \hat{x}).$$

This approximation can easily be extended to functions of more than two linear sample statistics. An approximate estimate of the variance of $\hat{\theta}$ is then obtained by substituting sample-based estimates of $\mu_y$, $\mu_x$, $\mathrm{Var}(\hat{y})$, and $\mathrm{Var}(\hat{x})$ in the previous formula.

An equivalent computational procedure is formed by recognizing that the variable portion of the Taylor series approximation is $\hat{z} = \frac{\partial F}{\partial y}\hat{y} + \frac{\partial F}{\partial x}\hat{x}$, so that $\mathrm{Var}(\hat{\theta}) \approx \mathrm{Var}\left(\frac{\partial F}{\partial y}\hat{y} + \frac{\partial F}{\partial x}\hat{x}\right) = \mathrm{Var}(\hat{z})$. Because $\hat{y}$ and $\hat{x}$ are linear estimators, the Taylor series variance approximation can be computed using the linearized values $z_i = w_i\left(\frac{\partial F}{\partial y}y_i + \frac{\partial F}{\partial x}x_i\right)$, so that $\hat{z} = \sum_i z_i$. As before, substituting sample-based estimates of $\mu_y$ and $\mu_x$, namely $\hat{y}$ and $\hat{x}$, in the formula for $z_i$, and then using the variance formula of a linear estimator for the sample design in question to estimate the variance of $\hat{z}$, yields an approximate estimate of the variance of $\hat{\theta}$. This reduces the problem of estimating the variance of a nonlinear statistic to that of estimating the variance of the sum of the linearized values. As an example, the linearized values for the mean $\bar{y}_d$ are $z_i = w_i d_i (y_i - \bar{y}_d) \big/ \sum_i w_i d_i$.
This illustration of the TSL method was for an estimator that is an explicit function of the observed data
such as a mean, proportion, or linear regression coefficient. There are extensions of the TSL method to estimators that are implicitly defined through estimating
equations, such as the regression coefficients of logistic, log-link, or Cox proportional hazards models.
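To make the computational recipe concrete, the following is a minimal sketch (hypothetical code, not part of the original entry) that forms the linearized values for a weighted domain mean and estimates their variance. It assumes, for simplicity, single-stage sampling with replacement in the final variance step; a real complex design would instead apply its own stratum and cluster variance formula to the linearized values.

```python
import numpy as np

def tsl_domain_mean(y, w, d):
    """TSL variance approximation for the domain mean
    ybar_d = sum(w*d*y) / sum(w*d).

    y: observed values; w: sampling weights; d: 0/1 domain indicators.
    The final variance step assumes single-stage with-replacement
    sampling (an illustrative simplification)."""
    y, w, d = (np.asarray(a, dtype=float) for a in (y, w, d))
    denom = np.sum(w * d)
    ybar_d = np.sum(w * d * y) / denom
    # Linearized values: z_i = w_i * d_i * (y_i - ybar_d) / sum(w*d)
    z = w * d * (y - ybar_d) / denom
    # Variance of the total of the z_i under with-replacement sampling:
    # n/(n-1) * sum((z_i - mean(z))^2)
    n = len(z)
    var = n / (n - 1) * np.sum((z - z.mean()) ** 2)
    return ybar_d, var

ybar, var = tsl_domain_mean(
    y=[4.0, 2.5, 3.0, 5.0, 1.5], w=[10, 12, 8, 9, 11], d=[1, 0, 1, 1, 1]
)
print(ybar, var ** 0.5)  # domain mean and its approximate standard error
```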
Rick L. Williams
See also Balanced Repeated Replication (BRR); Clustering;
Complex Sample Surveys; Jackknife Variance Estimation;
Multi-Stage Sample; Probability of Selection; Replicate
Methods for Variance Estimation; Sampling Without
Replacement; Stratified Sampling; SUDAAN
TECHNOLOGY-BASED TRAINING
Technology-based training uses computer-based
tools to enhance the training process, typically by
involving trainees actively rather than passively.
For survey research, technology-based training is
usually used to build and sharpen interviewer skills,
particularly skills required for successfully interacting
with survey respondents. Traditionally, interaction-skills
training relied on peer-to-peer role playing or passive
TELEMARKETING
Organizations engage in various forms of direct marketing to sell their products and services, solicit
money, or nurture client relationships. Telemarketing
is one such direct marketing technique that can target
either individual consumers or other organizations.
Groups that engage in telemarketing include nonprofit
organizations, institutions, and political interest
groups. A telemarketing call could come from a charity soliciting a small donation, an alumni group from
a university, or a political party seeking support for
a candidate. However, the prototypical use of telemarketing is for purposes of private enterprise. For businesses, telemarketing is a common form of direct
selling, which originated with the advent of the telecommunications industry. Though it had been used
with varying degrees of success for decades, it was
not until the 1970s that the telephone became a standard tool of mass marketing. Cold calling, the unsolicited calling of large groups of people, is just one
form of profit-seeking telemarketing. It should be recognized that although many firms do engage in cold
calling, the two primary technological advances that
popularized telemarketing practice are not closely tied
to this practice.
By the late 1970s, AT&T widely disseminated
toll-free numbers, which allowed customers to call
into a business to gather more product information,
receive specific services, and place long-distance
orders free of a telephone line usage fee. The second
technology that can be credited with the rise of telemarketing is the electronic management of database
information. As computers increased in capability and
declined in price, businesses capitalized on the ability
to marry customer information and phone numbers with their own product catalogs. Consequently, while
cold calling is a shot in the dark for businesses, the
TELEPHONE CONSUMER
PROTECTION ACT OF 1991
In 1991, the U.S. Congress, in response to concerns and
complaints about the increasing number of unsolicited
telemarketing calls to consumers, passed the Telephone
Consumer Protection Act (TCPA). The TCPA (Public
Law 102-243) updated the Communications Act of
1934 and is the primary law governing telemarketing.
The TCPA mandated that the Federal Communications
Commission (FCC) amend its rules and regulations to
implement methods for protecting the privacy rights of
citizens by restricting "the use of the telephone network for unsolicited advertising," as stated in the TCPA.
Although the TCPA was specifically directed at telemarketing activities and abuses, some of its prohibitions
and some of the subsequent FCC Rules and Regulations Implementing the Telephone Consumer Protection
Act of 1991 are not specific to telemarketing. Rather
they are aimed at protecting individual privacy in general and, as such, can impact telephone survey research.
TCPA
The TCPA specifically restricted the use of automatic
telephone dialing systems, prerecorded voice messages, and unsolicited fax advertisements:
It shall be unlawful for any person within the
United States to (A) make any call (other than a call
made for emergency purposes or made with the
prior express consent of the called party) using any
Do-Not-Call Legislation
The TCPA had required the FCC to compare and
evaluate alternative methods and procedures (including the use of electronic databases, telephone network
technologies, special directory markings, industry-based or company-specific do-not-call systems, and
any other alternatives, individually or in combination)
for their effectiveness in protecting such privacy
rights, and in terms of their cost and other advantages
and disadvantages. Originally the FCC had opted for
company-specific do-not-call lists. In the absence of
a national database, many states enacted their own
do-not-call legislation and lists. Some states also
implemented special directory markings. Consumers
still were not satisfied.
Responding to ongoing consumer complaints, the
Do-Not-Call Implementation Act of 2003 (Public Law
No. 108-10) was signed into law on March 11, 2003.
This law established the FTC's National Do Not Call
(DNC) Registry in order to facilitate compliance with
TELEPHONE HOUSEHOLDS

Telephone households are households that can be reached by telephone because they have landline (wired) telephone service, cell (wireless) telephone service, or both. The approximately 3% of U.S. households without service at a given point in time (the nontelephone households) are households that may have service a few months of the year but cannot afford it consistently. These households are disproportionately
low-income renters, who live in very rural areas or
inner-city poverty areas. Telephone surveys cannot
reach (cover) households without any telephone service,
and if the topic of the survey is correlated with whether
or not a household has telephone service, the telephone
survey may suffer from nonnegligible coverage error.
It was not until the 1970s in the United States that
telephone service existed in at least 90% of households, although at that time in certain regions of the
country less than 80% of households had telephone
service. Nowadays the vast majority of households in
the United States have both landline and cell telephone service, although reliable and up-to-date statistics on the exact proportions of landline only, cell
phone only, and those that have both types of telephone service are not routinely available, especially
not at the nonnational level. However, as of late 2007,
a federal government study determined that 20% of
U.S. households had only cell phone service and
approximately 77% had landline service (with most
of these also having a cell phone). Survey researchers
who choose to sample the public via telephone must
pay close attention to the prevalence of telephone
households in the geographic areas that are to be sampled. Many challenges exist for telephone survey
researchers as of 2008 due in part to (a) the rapid
growth of the cell phone only phenomenon, especially
among certain demographic segments of the population (e.g., renters and adults under the age of 30),
(b) number portability, (c) difficulties in knowing
how to properly sample from both the landline and
cell frame, especially at the state and local level, and
(d) difficulties in knowing how to weight the resulting
data that come from both frames.
Paul J. Lavrakas
TELEPHONE PENETRATION
The term telephone penetration refers to the number of
households in a given survey area with one or more
telephones. Traditionally, this has meant one or more
landline or wired telephones, not including cell phones.
A major challenge for drawing proper survey samples is ensuring that the sample represents a very high
proportion of the population of interest. Traditionally,
bias as a result of undercoverage in telephone surveys
was credited to households without phones. As time
has gone on, the rate of households without phones has
been declining, leading to a decline in this source of bias. In its
infancy, telephone interviewing was used only as a method of support for other interviewing techniques, such as
face-to-face interviewing. However, by the 1970s the
penetration of telephones in U.S. households had exceeded 90%, and this higher penetration allowed telephone interviewing to evolve into a centralized and precise data collection method, refined further through the use of networked computers and computer-assisted telephone interviewing (CATI).
Telephone penetration in the United States has
been on the rise ever since the invention of the telephone in 1876. The percentage of households in the
United States with telephones increased from less
than 40% in 1940, to above 95% as of 2008. Because
of coverage issues, telephone surveys, while often
regarded as cheaper than face-to-face interviews,
lacked respectability for much of the early 20th century. In the first half of the 20th century, telephone
ownership was a privilege for those who could afford
it. Its use was not common, and ownership was limited to a select group of citizens. For this very reason,
telephone surveys, typically sampled from telephone directories, were limited to special populations.
Perhaps one of the most famous examples of the
effect of undercoverage on the results of a survey
TELEPHONE SURVEYS
Surveys for which data collection is conducted via a telephone interview represent a major source of survey data.
Evolution
Advances in computer and software technology have
allowed the practice of telephone interviewing to
evolve over the past few decades. Currently the
majority of survey organizations rely on a paperless
system that assigns a specific identification number to
each telephone number that is part of the sample, that
is, to each potential respondent. This allows a record
to be kept of every time that number is called and
what the outcome of each call was (such as no one
was home, a busy signal was received, an interview
was completed, etc.). It also enables a record to be
kept of the date and time that each call to a number
was placed, which helps to ensure that numbers in the
unresolved sample are called back at the appropriate
time (such as calling back a respondent who is available only in the afternoons).
Length
Telephone surveys vary greatly in length. Many news
and election-related polls can be on the shorter side:
approximately 2 to 5 minutes. Surveys employed
for academic research can range up to 45 minutes
or more, and market research surveys are often somewhere in between. Longer surveys tend to elicit
a lower participation rate due to respondent fatigue.
However, many media polls and marketing surveys,
for example, are not as concerned with the response
rate of surveys and instead focus on obtaining a specific number of completed interviews without regard
for the number of telephone numbers at which no person was reached or for which the person at the number refused to participate. Other surveys, in particular
government and academic surveys, are more concerned with these aspects because they need to obtain
results that are more certain to be representative of
the population of interest.
Advantages
Telephone surveys have many advantages over face-to-face interviewing, mail surveys, and Internet surveys. When compared to face-to-face and mail surveys, telephone surveys can be completed much more
quickly and cheaply, which is almost always a major
consideration in the field of survey research. Although
telephone surveys are still more expensive than Internet surveys, they have the advantage that they are
much better able to represent the general population,
despite the challenges posed by cell phone only
households in the United States. Even though Internet
access and use have been on the rise over the past
several years, the percentage of U.S. households that
have such access is still not as high as the percentage
that can be reached by telephone (97% in 2008).
Therefore, in surveys that attempt to contact members
of a large or diverse population, the representativeness
of the data is questionable when Internet surveys are
employed instead of telephone surveys. Furthermore,
surveys that use interviewers, such as telephone surveys, will almost always elicit higher response rates,
all other factors being equal, than surveys that do not
use interviewers, such as Internet surveys.
Challenges
The telephone survey represents a major way that
a large population can be accurately surveyed, but it
still possesses many challenges to survey accuracy.
Response rates have been declining as the use of caller ID and cellular telephones has been on the rise,
which represents a major problem for the industry.
As response rates decline, the accuracy of the data
becomes more questionable. For example, if only
a small portion of the selected sample participates in
a survey, it could mean that those people who were
sampled but did not participate are, in some way,
inherently different from those who do participate and
would therefore give different responses to the interview questions. The end result would be that the data
would be skewed toward a certain set of opinions due
to nonresponse bias. Coverage rates in landline telephone surveys also have decreased to below 80% as
of 2008.
Innovations such as caller ID and privacy manager allow potential respondents to screen out calls
before they even find out the purpose of the call.
Telemarketers are generally thought to be a compounding factor because so many households receive
a large number of telephone solicitations from telemarketers, and potential respondents begin to assume
that any telephone number appearing on caller ID
that is not recognizable is likely to be a telemarketer.
Another contributor to the decline in telephone survey response rates is the popularity of do-not-call lists.
Created to reduce the number of unwanted telemarketing calls, many people assume that if they are on
such a list, they should not be called by anyone
whom they do not personally know. Even though
such lists do not apply to legitimate survey research
organizations, many potential respondents do not
make such distinctions. Additionally, the increased
use of the telephone to conduct surveys over the past
few decades has contributed to the decline in
response rates. As people become used to receiving
solicitations for their opinions, it becomes easier to
refuse those requests.
Finally, the use of SUGing and FRUGing has also
contributed to the decline in response rates for telephone surveys. SUGing (Soliciting Under the Guise
of survey research) and FRUGing (Fund-Raising
Under the Guise of survey research) are done by organizations that attempt to give the purpose of their calls greater legitimacy by claiming to be interested only in
surveying opinions. This tactic quickly degenerates
into a sales or fund-raising call but leaves the lasting
impression on the respondent that telephone surveys
often are merely a disguised way of selling products
or raising money.
Mary Outwater
See also Caller ID; Cell Phone Only Household; Cell Phone
Sampling; Computer-Assisted Telephone Interviewing
(CATI); Do-Not-Call Registries; FRUGing; Interviewer
Monitoring; List-Assisted Sampling; Nonresponse Bias;
Number Portability; Random-Digit Dialing (RDD);
Response Rates; SUGing
Further Readings
TELESCOPING
Telescoping describes a phenomenon that threatens
the validity of self-reported dates, durations, and frequencies of events. Respondents often are asked in
surveys to retrospectively report when something
occurred, how long something lasted, or how often
something happened within a certain time period. For
example, health surveys often ask respondents the
date of the last time, how often, or how many days
they were hospitalized during the last calendar year.
Answering this type of question requires the respondent to remember exact dates and temporal sequences
and to determine whether an event happened within
a certain time period. At this stage of the response
process, dates or events can be forgotten entirely or
telescoped forward or backward. While forgetting
describes not remembering an event at all, telescoping
focuses on errors made by incorrectly dating events
that were recalled.
Survey researchers distinguish between two types
of telescoping: forward and backward. Forward telescoping occurs when an event is erroneously remembered as having occurred more recently than it
actually did. A backward telescoped event is erroneously remembered as having occurred earlier than its
actual date. In general, empirical data show that forward telescoping is more likely to occur than backward telescoping.
Why telescoping occurs is not fully understood.
Two main theories have emerged in the literature: the
time compression and variance theories. However,
TEMPORARY DISPOSITIONS
Temporary disposition codes are used to record the
outcomes of specific contact attempts during a survey
that are not final dispositions, and provide survey
researchers with the status of each unit or case within
the sampling pool at any given point in the field
period. Temporary disposition codes provide a record
of what happened during each contact attempt prior to
the case ending in some final disposition and, as such,
provide survey researchers with the history of each
active case in a survey sample. Temporary dispositions function as an important quality assurance component in a survey, regardless of the mode in which
the survey is conducted. They also serve as paradata in some methodological research studies. However, the primary purpose of temporary dispositions is
to assist researchers in controlling the sampling pool
during the field period.
Temporary dispositions usually are tracked through
the use of an extensive system of numeric codes or
categories that are assigned to each element in the
sampling pool once the field period of the survey
has begun. Common temporary sample dispositions
include the following:
No one at home/answering machine
Busy signal (telephone survey only)
Fast busy (telephone survey only)
Callback
Privacy manager (telephone survey only)
Unable to participate
Unavailable respondent
Household refusal (a temporary disposition if refusal
conversions are planned)
Respondent refusal (a temporary disposition if refusal
conversions are planned)
TEST-RETEST RELIABILITY
Test-retest reliability is a statistical technique used
to estimate components of measurement error by
repeating the measurement process on the same subjects, under conditions as similar as possible, and comparing the observations. The term reliability in this
context refers to the precision of the measurement
(i.e., small variability in the observations that would be
made on the same subject on different occasions) but
is not concerned with the potential existence of bias.
In the context of surveys, test-retest is usually in the form of an interview-reinterview procedure,
where the survey instrument is administered on multiple occasions (usually twice), and the responses on
these occasions are compared.
Ideally the reinterview (henceforth referred to as
T2 ) should exactly reproduce the conditions at the
original interview (T1 ). Unfortunately, learning, recall
bias, and true changes may have occurred since the
original interview, all leading to T2 not matching T1
answers.
Which measurement error components are of interest may affect certain decisions, such as whether to
use raw data (if the focus is on response error alone)
or edited data and imputed data (if editing and imputation errors are of interest). Oftentimes raw responses
need to undergo light editing for practical reasons.
Reliability measures (discussed later in this entry)
may be calculated for individual questions to identify
ones that may be poorly worded or others that may
need to be followed by probing. Also, reliability measures may be calculated for population subgroups or
by type of interviewer (e.g., experienced vs. novice)
to detect problems and improve precision.
Further considerations in the design of a test-retest study include whether the interview-reinterview sample should be embedded in the main
survey sample or exist as an additional, separate sample. An embedded sample is obviously cheaper, but
sometimes a separate sample may be more advantageous. Another issue to consider is using the same
interviewer at T2 or not. Using the same interviewer
in part of the sample and assigning different interviewers to the rest may help in assessing interviewer
effects. The duration of time between T1 and T2 (the
test-retest lag) often has an effect on the consistency between the interviews. Thus, limiting the lag to a certain time range may be an important consideration.
Furthermore, it is important to properly inform the
respondents of the need for the reinterview, because
respondents' perceptions of its necessity may affect the
quality of the data they provide. After the reinterview
is completed, additional special questions may be
Table 1  Interview-reinterview cross-classification of a dichotomous response

                 T2 = Yes    T2 = No    Row total
T1 = Yes         p_yy        p_yn       p_y+
T1 = No          p_ny        p_nn       p_n+
Column total     p_+y        p_+n
For dichotomous responses, agreement between T1 and T2 often is summarized by Cohen's kappa, although certain situations expose kappa's undesirable properties (e.g., when the marginal distributions of the prevalence are asymmetric). Weighted kappa is a version of Cohen's kappa for
ordinal responses, where differences between the
levels of the response variable are weighted, based on
judgment of the magnitude of the discrepancy. For
continuous responses, a measure called the concordance correlation coefficient was proposed by L. Lin
in 1989.
L. Pritzker and R. Hansen proposed another measure called the index of inconsistency (IOI). Its complement, 1 - IOI, is referred to as the index of
reliability. The IOI is defined as the ratio of the measurement variance to the total variance, and thus can
be viewed as an intracluster correlation. (Of note:
Some measures defined for continuous responses can
also be used for dichotomous or ordinal data.) Finally,
various types of models have also been used to make
inference on reliability. While modeling involves
making certain distributional assumptions, it allows
for adjusting for covariates.
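For the dichotomous case laid out in Table 1, both Cohen's kappa and the IOI can be computed directly from the cell proportions. The sketch below is illustrative code, not part of the original entry; it computes expected agreement and disagreement under independence of the T1 and T2 margins, a convention under which 1 - IOI coincides with kappa:

```python
def kappa_and_ioi(p_yy, p_yn, p_ny, p_nn):
    """Cohen's kappa and an index of inconsistency (IOI) from the
    2x2 interview-reinterview cell proportions (must sum to 1)."""
    assert abs((p_yy + p_yn + p_ny + p_nn) - 1.0) < 1e-9
    p_y1, p_n1 = p_yy + p_yn, p_ny + p_nn   # T1 margins (row totals)
    p_y2, p_n2 = p_yy + p_ny, p_yn + p_nn   # T2 margins (column totals)
    observed_agree = p_yy + p_nn
    expected_agree = p_y1 * p_y2 + p_n1 * p_n2
    kappa = (observed_agree - expected_agree) / (1 - expected_agree)
    # IOI: observed disagreement relative to the disagreement expected
    # if the T1 and T2 responses were independent
    ioi = (p_yn + p_ny) / (p_y1 * p_n2 + p_n1 * p_y2)
    return kappa, ioi

print(kappa_and_ioi(0.55, 0.05, 0.10, 0.30))  # kappa ~ 0.68, IOI ~ 0.32
```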
Moshe Feder
See also Measurement Error; Probing; Raw Data;
Reinterview; Reliability
THIRD-PERSON EFFECT
The third-person effect is a term that refers to the
documented belief held by many people that mass
communication has different and greater effects on
others than on themselves, and because of this perception, some of these people will support certain policies and actions based upon the presumed effect on
others. The phenomenon has been linked to public
opinion research, and it often is studied through survey research methods.
The phenomenon was first described by W. Phillips Davison in a 1983 Public Opinion Quarterly article. Davison himself suggested that the effect was of minor theoretical significance and his observations were based on sketchy data. Nevertheless,
his observations were considered intriguing by many
others who read and elaborated on his work, and in
2004 the third-person effect was named one of the
most popular communication-theory frameworks of
the early 21st century.
Davison explained the third-person effect in the
following terms:
People will tend to overestimate the influence that
mass communications have on the attitudes and
behaviors of others.
People will expect the communication to have a
greater effect on others than on themselves.
Whether or not these individuals are among the
ostensible audience for the message, the impact that
they expect this communication to have on others
may lead them to take some action.
Scholars have sought to explain the third-person effect through a variety of theoretical perspectives, including attribution, social comparison, social desirability, social distance, unrealistic optimism, symbolic
interactionism, and self-categorization, among others.
The third-person effect has also been connected to the
larger scholarly body of public opinion theory and
research, including such phenomena as spiral of silence
and pluralistic ignorance. Additional theoretical attention has been paid to the role that message variables
may play in contributing to the third-person effect.
Evidence is mixed across studies, but there is some
indication that the desirability of a message, that is,
whether it is pro- or anti-social, can influence the
nature and magnitude of the third-person effect. In
addition, some studies have found that certain messages can elicit a reverse third-person effect, that is,
a situation in which individuals report a greater effect
of the message on one's self than on others. Another
key consideration in third-person research has to do
with the nature of the "other" and his or her relation
to, and psychological distance from, the self.
Turning to issues more methodological in nature,
early formal tests of third-person propositions tended
to employ lab-based experimental designs, but the
phenomenon has been studied extensively with survey
research methodology as well. Results across studies
indicate that the third-person effect occurs independently of the methodological approach used to study
it. Several studies have investigated whether the phenomenon is simply a methodological artifact relating
to question wording, question order, or study design
(e.g., whether a between- or within-subjects design is
employed). Results of these studies indicate that the
third-person effect is more than a mere methodological quirk.
What originally appeared to be an assortment of
observations of seemingly minor theoretical significance subsequently has achieved the status of a vibrant
media-effects approach that has important implications for survey research and public opinion theory.
Charles T. Salmon
See also Experimental Design; Public Opinion; Public
Opinion Research; Question Order Effects; Social
Desirability; Spiral of Silence
TOPIC SALIENCY
The saliency of a topic, that is, its importance or relevance to potential respondents, can affect response patterns to surveys. There are several explanations as to how and why topic saliency affects
response rates. First, if a potential respondent believes
the topic to be important, he or she may be more
likely to rationalize incurring the costs of responding
to the survey. Second, responding to a survey topic
that is of personal interest may have intrinsic rewards,
such as providing an opportunity to exhibit one's knowledge or share one's opinion. Third, responding
to a survey about a salient topic may be motivated by
perceived direct benefits. Survey participation may be
viewed as an opportunity to advance one's own needs,
interests, or agenda. All of these explanations may help account for why a given respondent is more apt to complete a survey about a salient topic.
Research suggests that people are more likely to
respond to a survey if the topic is of interest to them.
For example, teachers are more likely than nonteachers to respond to, and cooperate with, a survey
about education and schools; senior citizens are more
likely than younger adults to respond to a survey
about Medicare and health.
In addition to its impact on survey participation,
topic saliency is important for attitude formation and
response retrieval. Theories about the representation
of attitudes in memory suggest that attitudes reported
by the respondent as being more important or as being
more strongly held are also more stable over time.
Attitudes about salient topics require less cognitive
effort to recall, resulting in attitude responses that are
more stable over time, more stable when presented
TOTAL DESIGN METHOD (TDM)
Critiques
One of the effects of publication of the TDM was to
shift the emphasis away from looking for a magic bullet that might improve response rates toward providing a template, which, if followed carefully, was
likely to produce high response rates. Its use also has
shown that mail response rates of the public can often
rival or even surpass those obtained in telephone surveys; this was true especially for surveys conducted
in the 1980s and 1990s. Thus, the overall effect was
to give legitimacy to mail surveys as a way of undertaking serious research.
It also became apparent that the TDM was not an
adequate design for mail surveys in certain respects.
order to get a complete response. Also, whereas government surveys carry great legitimacy and obtain
relatively high response rates without the use of
cash incentives, private-sector surveys make the use
of such incentives more appropriate. The variety of
situations calling for design variations is quite large,
ranging from employee surveys sent through internal
company mail (for which there is strong employer
encouragement to respond) to election surveys that
use voter registration lists and require focusing contacts on a deadline date.
The development of the tailored design, or the
new TDM as it is sometimes called, also reflected
a rapidly growing science based on factors that influence responses to survey questions. Research that
began in the 1990s made it clear that people who
respond to mail surveys are influenced by the graphical layout of questionnaires as well as the use of symbols and numbers, and that their reactions to these
features of construction are influenced by behavioral
principles such as the gestalt laws of psychology.
Whereas the original TDM included a focus on visual
layout, the scientific underpinnings of those principles
had not yet been demonstrated or methodically tested.
Another aspect of the science base that provided
a basis for tailored design was nearly two decades of
research on question order and context effects. Thus,
measurement principles developed for tailored design
gave much greater importance to what questions were
to be asked and the order in which they needed to be
asked as well as how they were to be displayed on
pages, while giving somewhat less attention to what
questions would appeal most to respondents.
Development of a tailored design perspective was
facilitated by the development of information technologies that make it possible to treat different sample
units in different ways, efficiently and accurately.
Active monitoring of response and the timely tailoring
of specific follow-up letters to different types of
respondents were not feasible in the 1970s.
Internet surveys have now been encompassed
within the tailored design perspective. Because mail
and Web surveys are both self-administered as well
as visual, they share many commonalities. Whereas
the technological interface between respondent and
response mechanisms clearly distinguishes Web from
mail, similar exchange processes appear to occur
between respondent and the questionnaire and its
sponsor. Both mail and the Internet face the challenge of not having an interviewer who can provide
TOTAL SURVEY ERROR (TSE)
available to fund them. The quality of a survey statistic such as a mean, a percentage, or a correlation coefficient is assessed by multiple criteria: the timeliness
of reporting, the relevance of the findings, the credibility of researchers and results, and the accuracy of
the estimates, just to mention a few. Among those
criteria the accuracy of the estimate is not necessarily
the most important one. However, the accuracy is a
dimension of the overall survey quality for which
survey methodology offers a wide range of guidelines
and instructions. Also, standard measures for the magnitude of the accuracy are available. The accuracy of
a survey statistic is determined by its distance to or
deviation from the true population parameter. If, for
example, a survey aims to determine the average
household income in a certain population, any deviation of the sample estimate from the true value (that is, what would have been determined if all members of the target population were asked their income and they all answered accurately) would decrease accuracy.
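Stated formally, one standard way to quantify this accuracy (a general survey-statistics convention, supplied here for concreteness rather than quoted from this entry) is the mean squared error of an estimator $\hat{\theta}$ for the population parameter $\theta$:

$$\mathrm{MSE}(\hat{\theta}) = E\big[(\hat{\theta} - \theta)^2\big] = \mathrm{Var}(\hat{\theta}) + \big[\mathrm{Bias}(\hat{\theta})\big]^2.$$

The variance term captures random error, and the squared bias captures systematic error. The error sources that contribute to this deviation fall into two broad classes: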
1. The first class of error sources applies to the representation of the target population by the weighted
net sample (representation).
2. The second class of error components adds to the
total error by affecting the edited survey responses
obtained from a respondent (measurement).
Coverage Error
For a sample to be drawn, a sampling frame is necessary in order to provide the researcher with access to
the members of the population from whom data are to
be gathered. The incompleteness of this frame and the
possible bias of its composition cause misrepresentations of the population by the sample. If a group is
underrepresented in the frame (for example, individuals who own mobile phones as their only telecommunications device are missing from traditional random-digit dialing (RDD) sampling frames because they do not have a landline telephone number), the sociodemographic or substantive characteristics of this group cannot be considered when computing the survey statistic.
This causes a lack of accuracy of the survey estimate
since some groups might be underrepresented in the
frame and subsequently in any sample that is selected
from the frame, resulting in coverage bias.
Sampling Error
If the sample design, for example, called for a disproportional stratified sample, an appropriate weighting procedure would have to be devised to compensate for the unequal selection probabilities when estimating the population parameter. In addition, and as
noted earlier in this entry, the net sample may need to
be adjusted for possible nonresponse bias. Both procedures require complex computations that take into
account information from the gross sample, official
statistics, or both. Whereas the first approach may
potentially increase the random error of the estimate,
the second approach may introduce systematic errors
into the sample and thus bias the estimate.
Specification Error
Measurement Error
Also within the CASM framework, a detailed theoretical approach on how respondents consider and
answer survey questions has been developed. As a
result, the question-answer process has been described
psychologically in great detail. Using this framework,
several systematic and random respondent errors have
been identified related to what may happen when
respondents answer survey questions. For example,
satisficing behavior (as opposed to optimizing response behavior), as well as mood effects, has been demonstrated to occur in methodological research.
TOUCHTONE DATA ENTRY (TDE)
Because of the limitations of most telephone keypads (only 12 keys, including # and *), TDE is best
used only for binary responses (e.g., enter 1 for Yes,
2 for No), limited choice sets (e.g., 1 for Yes, 2 for
No, 3 for Undecided), and simple numeric entry (e.g.,
Please enter your year of birth in four digits).
Other numeric information can be collected via
TDE, but additional read-back checks (e.g., You
entered 10 thousand and 43 dollars, is that correct?
Press 1 for Yes, 2 for No) are necessary because
many phones do not have a visual display whereby
the respondent can readily see if he or she made an
error.
Drawbacks of TDE include the following:
The need for the respondent to have a touchtone
phone
The increasing tendency of phones to have the keypad embedded in the handset (such as with cordless
and mobile phones), making TDE physically awkward as the respondent juggles trying to listen with
trying to respond
Jenny Kelly
See also Interactive Voice Response (IVR); Telephone Surveys
TRACKING POLLS
A tracking poll is a series of individual surveys
repeated continuously over time to measure attitudinal
and behavioral changes in a target population. While
most commonly associated with election campaigns,
tracking polls also are used for a variety of other
research needs, from measurement of consumer sentiment to ad tracking in marketing research.
The term tracking often mistakenly is used to
describe any trend data obtained over time. In fact it
correctly applies only to continuous measurements in
stand-alone samples. In all applications, the aim of
a tracking poll is to produce an ongoing, rather than
a one-time or episodic, assessment of evolving attitudes.
Tracking polls provide substantial flexibility in
data analysis. The consistent methodology produces
useful time trends. Tracking data can be segregated
before and after an event of interest (a major policy address, a campaign gaffe, or an advertising launch) to assess that event's impact on attitudes or behavior.
These same data can be aggregated to maximize sample sizes for greater analytical power. And tracking
surveys can be reported in rolling averages, adding
new waves of data while dropping old ones to smooth
short-term or trendless variability or sampling noise.
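As a simple illustration of the rolling-average reporting just described, the following sketch (hypothetical numbers and code, not from this entry) averages each night's estimate with the two preceding nights; in practice, nights often would be combined by pooling respondents or weighting by nightly sample size rather than by averaging point estimates:

```python
def rolling_averages(nightly, window=3):
    """Report tracking-poll estimates as rolling averages: each value
    averages the most recent `window` nightly estimates, adding the
    newest night and dropping the oldest."""
    return [
        sum(nightly[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(nightly))
    ]

# Seven hypothetical nights of candidate support (percent)
nightly = [48.0, 51.0, 47.5, 50.0, 52.0, 49.5, 51.5]
print(rolling_averages(nightly))  # five 3-night rolling values
```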
The most publicized use of tracking polls is in
election campaigns, particularly in the often-frenzied
closing days of a contest when campaign advertising
spikes, candidates voice their final appeals, voter
interest peaks, and tentative preferences become final
choices. Campaigns conduct their own private tracking polls to find their best prospects, target their message, and gauge their progress. The news media use
tracking polls to understand and report the sentiments
behind these crystallizing choices and to evaluate preferences among population groups, as well as to track
those preferences themselves.
Election tracking surveys customarily are composed of a series of stand-alone, 1-night surveys, combined and reported in 3- or 4-day rolling averages. In
well-designed election tracking polls, the limitations
of 1-night sampling on dialing and callback regimens
are mitigated by sample adjustments, such as a mix of
new and previously dialed sample and the integration
of scheduled callbacks into fieldwork protocols.
Election tracking polls have acquired a negative
reputation in some quarters because of their reputed
volatility. Thoughtful analysis by Robert Erikson and
his colleagues, however, has established that volatility
in pre-election polls, where it exists, is introduced by
idiosyncratic likely voter modeling rather than by
tracking poll methodology.
Humphrey Taylor, chairman of the Harris Poll, is
credited with creating the first daily political tracking
polls, a series of twice-daily, face-to-face surveys
done for the Conservative Party in the last 4 weeks of
the 1970 British general election: One survey tracked
media exposure, while the other tracked issue priorities and party preferences on the issues. Taylor (personal communication, January 29, 2007) relates that
his tracking data picked up a collapse in confidence
in the Labor governments ability to manage the economy following bad economic news three days before
the election. And this was the reason why almost all
the polls which stopped polling too soon failed to pick
up a last-minute swing to the Conservatives. The
Tories won by two points.
Given such sensitivity, tracking polls have developed into an election fixture. In the largest media-sponsored tracking poll in the 2004 election cycle,
ABC News and The Washington Post conducted
21,265 interviews over the final 32 days of the presidential campaign. In a variation on the election tracking theme, an academic poll that focused on political
issues, the National Annenberg Election Survey, interviewed 81,422 respondents in continuous interviewing
from October 2003 through Election Day 2004.
In another use of tracking, ABC, with the Post (and
previously with an earlier partner, Money magazine),
adapted election tracking methodology in creating the
TRAINING PACKET
A training packet contains the materials that are used
to train survey interviewers, either in general interviewing skills or for a specific survey project. The
interviewer training packet is an important part of
interviewer training and quality data collection. Interviewers often will use the training packet they receive
throughout a study to review procedures, and thus it
serves an important role in continuous, on-the-job
training.
The entire research process relies upon well-trained, knowledgeable interviewers to ensure the success of a project and the integrity of the data. Managers who are tasked with training interviewers must
prepare a thorough training curriculum and system to
provide the interviewers with the necessary tools to
carry out their duties in an ethical and quality-minded
manner. Trainers must always keep in mind that many
interviewers do not have formal academic experience
with the research process. Thus, it is important that
interviewer training not only discusses the techniques
Administrative
Typically, general training begins with an overview
of the research organization and applicable administrative tasks. Administrative information includes time
reporting, assignment logging, and work hour scheduling. The addition of an overview of the research
firm provides interviewers with the big picture by
showing them how they fit into the organization as
well as the research process.
Interviewing
The following outlines important interviewing training topics that should be addressed in the training
packet. It is important to note that some of these topics may be relevant only to certain types of interviewing modes, such as CATI or face-to-face (field) interviewing, but most are universally applicable.
Interviewer Bias
Initial Interaction
The first few seconds of contact between interviewer and respondent often set the tone for all that
follows. Within the first few seconds, respondents
often will have made up their mind whether they will
continue speaking with an interviewer or will terminate the contact. Interviewers must build rapport
quickly, deliver their introduction, and screen for eligible respondents within this initial interaction.
The training packet should include instructions and
examples about how the interviewer can deliver the
introduction of the interview with confidence. Interviewers may have some input into the development of
the introductory text, but once it is set they cannot
alter it beyond their own ability to properly tailor it to
each individual respondent.
The introduction is typically followed by a screening section. This should be explained in the training
packet and may be complex with multiple questions,
or it may be simple with only a single question. Interviewers must be trained to take extra precaution when
asking screening questions. All too often, a respondent
will automatically answer in agreement with an interviewer only to have the interviewer discover later that
the respondent actually did not qualify to participate.
Interviewers should be trained to always encourage
respondents to think before they respond. This can be
done through instructions about using an appropriate pace and a thought-provoking tone of voice.
Approach
The training packet also should include information about the following techniques used by interviewers to properly administer the questionnaire.
When a respondent gives an ambiguous answer, an
interviewer must probe respondents for clarification.
Probing is the use of a nonleading phrase to encourage a respondent to provide a more precise response.
Probing should be described in the training packet,
and examples of appropriate and inappropriate probes
should be included. Interviewers must be comfortable
with this technique, as it is needed in a variety of
situations. It can be used when respondents do not
adhere to a closed-ended question or when an unclear
open-ended response is given.
Respondents will sometimes stray from the questions. Interviewers must know how to refocus respondents' attention to the task. Focusing is necessary
when a respondent becomes irritated by a particular
question and proceeds to rant about the failures of the
questionnaire or when a respondent gives open-ended
responses to closed-ended questions. Interviewers can
use reassuring phrases demonstrating that they are listening to the respondent but that they must proceed
with the interview. The training packet also should
include instructions on how to accomplish this.
Jeff Toor
See also Computer-Assisted Personal Interviewing (CAPI);
Computer-Assisted Telephone Interviewing (CATI);
Ethical Principles; Face-to-Face Interviewing;
Interviewer-Related Error; Interviewer Training;
Interviewer Neutrality; Mode of Data Collection; Probing;
Respondent-Interviewer Rapport; Tailoring
TREND ANALYSIS
Trend analysis is a statistical procedure performed to
evaluate hypothesized linear and nonlinear relationships between two quantitative variables. Typically, it
is implemented either as an analysis of variance
(ANOVA) for quantitative variables or as a regression
analysis. It is commonly used in situations when data
have been collected over time or at different levels of
a variable; especially when a single independent variable, or factor, has been manipulated to observe its
effects on a dependent variable, or response variable
(such as in experimental studies). In particular, the
means of a dependent variable are observed across
conditions, levels, or points of the manipulated independent variable to statistically determine the form,
shape, or trend of such a relationship.
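A minimal regression-style sketch of this idea (hypothetical data; a full ANOVA trend analysis would instead partition sums of squares with orthogonal polynomial contrasts) fits linear and quadratic trends to group means observed at five levels of a manipulated factor:

```python
import numpy as np

levels = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # manipulated factor levels
means = np.array([2.1, 2.9, 3.8, 4.2, 4.3])    # mean response per level

for degree, label in ((1, "linear"), (2, "quadratic")):
    coefs = np.polyfit(levels, means, degree)   # least-squares polynomial fit
    resid = means - np.polyval(coefs, levels)
    print(f"{label}: coefficients={np.round(coefs, 3)}, "
          f"residual SS={np.sum(resid**2):.4f}")
```

A markedly smaller residual sum of squares for the quadratic fit would suggest curvature in the trend beyond the linear component.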
TROLDAHL-CARTER-BRYANT
RESPONDENT SELECTION METHOD
Researchers often desire a probability method of
selecting respondents within households after drawing
a probability sample of households. Ideally, they
wish to simultaneously maximize response rates (by
TRUE VALUE
A true value, also called a true score, is a psychometric concept that refers to the measure that would have
been observed on a construct were there not any error
involved in its measurement. Symbolically, this often
is represented by X = T + e, where T is the (error-free) true value, e is the error in measurement, and X is the observed score. For example, a survey respondent's true value (T) on a measure (e.g., a 5-item Likert scale measuring attitudes toward her ancestors'
native land) might be 13, but her observed score (X)
on a given day may be 15 or 17 or 10. There are
many reasons why the observed score may deviate
from the true value, but the researcher will rarely
know exactly why or exactly by how much. In theory,
if the error on the observed score is truly random,
then a researcher could take the same measure from
the same respondent over and over again, and assuming the true value on that measure for that respondent
did not actually change over the time the various measures were being taken, then the mean (average) score
of the various observed scores would be the true
value. The concept of a true value relates to the concepts of reliability and validity.
For survey researchers, the true value has some
very practical applications. First, it reminds researchers that random error is always to be assumed and that
any one measurement should never be regarded as
error-free. Second, if the researcher has the test-retest
reliability on a given variable (x), then the researcher
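The long-run logic of the true score model can be illustrated with a short simulation; the following is a minimal sketch in Python, where the true value, the error scale, and the number of repeated measurements are all hypothetical choices for illustration.

import numpy as np

rng = np.random.default_rng(7)

T = 13                       # the respondent's (error-free) true value
e = rng.normal(loc=0.0, scale=2.0, size=100_000)  # purely random error
X = T + e                    # observed scores: X = T + e

print(X[:5].round(1))        # individual observed scores scatter around 13
print(X.mean().round(2))     # the long-run mean recovers T (about 13.0)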
TRUST IN GOVERNMENT
Trust in government has multiple meanings, many
sources, and numerous consequences. David Easton
distinguished between diffuse support (that is, trust in the institutions and mechanisms of government) and specific support (trust in current officeholders).
Researchers also can ask about trust in specific institutions within government (legislative, judicial, executive, law enforcement), at all levels (federal, state, and
local).
Political life, even at a routine and mundane level,
entails risk. When we vote for a candidate, pay our
taxes, or obey the law, we hope that our efforts will
not be wasted or exploited. Civic life, in a nutshell,
demands that we trust our government to uphold its
end of the democratic bargain. Survey research has
tracked the rise and fall in political trust over the decades and has tried to identify the factors that sustain
this tenuous but critical component of civilization.
Many surveys use a single item that simply asks
respondents how much they trust the government.
By far the most frequently used item is Question 1
from the biennial National Elections Study (NES):
How much of the time do you think you can trust the government in Washington to do what is right: just about always, most of the time, or only some of the time?
t-TEST
A t-test is a statistical process to assess the probability
that a particular characteristic (the mean) of two
populations is different. It is particularly useful when
data are available for only a portion of one or both
populations (known as a sample). In such a case, a t-test will enable an estimate of whether any differences
in means between the two groups are reflective of differences in the respective populations or are simply
due to chance. This statistical process is called a t-test
because it uses a t-distribution to generate the relevant
probabilities, typically summarized in a t-table.
There are many types of t-tests, each of which is
appropriate for a particular context and nature of the
data to be analyzed. Each t-test has its own set of
assumptions that should be checked prior to its use.
For instance, comparing the average value of a characteristic for independent samples (i.e., when individuals in each sample have no systematic relationship
to one another) and for dependent samples (such as
when related pairs are divided into two groups)
requires different kinds of t-tests.
Frequently, a t-test will involve assessing whether
a sample comes from a specific population or whether
average values between two samples indicate differences in the respective populations. In the latter case,
three factors determine the results of a t-test: the size
of the difference in average value between the two
groups; the number of data points, or observations, in
each group; and the amount of variation in values
within each group.
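The following is a minimal sketch in Python using scipy; the group sizes, means, and seed are invented, and the assumptions noted above still must be checked before relying on either test.

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Two independent samples, e.g., mean ratings from two survey groups.
group_a = rng.normal(loc=5.0, scale=1.5, size=120)
group_b = rng.normal(loc=5.4, scale=1.5, size=130)
t_stat, p_value = stats.ttest_ind(group_a, group_b)   # independent-samples t-test
print(f"independent: t = {t_stat:.2f}, p = {p_value:.4f}")

# Dependent (paired) samples require a different t-test.
before = rng.normal(loc=5.0, scale=1.0, size=80)
after = before + rng.normal(loc=0.2, scale=0.5, size=80)
t_stat, p_value = stats.ttest_rel(before, after)      # paired-samples t-test
print(f"paired: t = {t_stat:.2f}, p = {p_value:.4f}")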
TYPE I ERROR
Type I error refers to one of two kinds of error of
inference that could be made during statistical hypothesis testing. The concept was introduced by J. Neyman and E. Pearson in 1928 and formalized in 1933.
A Type I error occurs when the null hypothesis (H0),
that there is no effect or association, is rejected when
it is actually true. A Type I error is often referred to
as a false positive, which means that the hypothesis
test showed an effect or association, when in fact
there was none.
In contrast, a Type II error occurs when the null
hypothesis fails to be rejected when it is actually
false. The relation between the Type I error and Type
II error is summarized in Table 1.
The probability of a Type I error is usually denoted by the Greek letter alpha (α) and is often called the significance level or the Type I error rate. In the table, the Greek letter beta (β) is the probability of a Type II error. In most studies the probability of a Type I error is chosen to be small (for example, 0.1, 0.05, or 0.01), whether expressed as a proportion, as a percentage (10%, 5%, or 1%), or as odds (1 time in 10, 1 time in 20, or 1 time in
100). Selecting the alpha level of 0.05, for example,
means that if the test were to be conducted many
times where the null hypothesis is true, one can
expect that 5% of the tests will produce an erroneously positive result. A Type I error is an error due to
chance and not because of a systematic error such as
model misspecification or confounding.
A Type I error could be illustrated with an example of a disease diagnostic test. The null hypothesis is
that a person is healthy and does not have a particular
disease. If the result of a blood test to screen for the
disease is positive, the probability that the person has
the disease is high; however, because of the test used,
a healthy person may also show a positive test result
by chance. Such a false positive result is a Type I
error for the disease diagnostic test.

Table 1   Possible outcomes of a statistical hypothesis test

Statistical Decision    H0 True                                   H0 False
Reject H0               Type I error (probability = α)            Correct decision (probability = 1 − β)
Do not reject H0        Correct decision (probability = 1 − α)    Type II error (probability = β)

Note that a Type I error depends on the way the hypothesis test is formulated, that is, whether "healthy" or "sick" is taken
as the null condition. In survey research, an example
of a Type I error would occur when a pollster finds
a statistically significant association between age and
attitudes toward normalization of U.S. relations with
Cuba, if in fact no such relationship exists apart from
the particular polling data set.
When testing multiple hypotheses simultaneously
(e.g., conducting post-hoc testing or data mining), one
needs to consider that the probability of observing
a false positive result increases with the number of
tests. In particular, a family-wise Type I error is the occurrence of one or more false discoveries (Type I errors) among all the hypotheses when performing multiple pairwise tests. In this situation, an adjustment for
multiple comparisons, such as the Bonferroni, Tukey, or Scheffé adjustments, is needed. When the number of
tests is large, such conservative adjustments would
often require prohibitively small individual significance levels. Y. Benjamini and Y. Hochberg proposed
that, in such situations, other statistics such as false
discovery rates are more appropriate when deciding
to reject or accept a particular hypothesis.
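The meaning of the alpha level, and the reason multiple testing inflates the family-wise error rate, can both be checked by simulation; the following is a minimal sketch in Python, where the sample sizes, the number of tests, and the seed are arbitrary illustrations.

import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
alpha, n_tests = 0.05, 10_000

# Simulate many tests in which the null hypothesis is TRUE:
# both samples come from the same population.
false_positives = 0
for _ in range(n_tests):
    a = rng.normal(0.0, 1.0, 50)
    b = rng.normal(0.0, 1.0, 50)      # no real difference exists
    if stats.ttest_ind(a, b).pvalue < alpha:
        false_positives += 1
print(false_positives / n_tests)      # close to 0.05, as expected

# Bonferroni adjustment for m simultaneous tests: test each
# hypothesis at alpha / m to control the family-wise error rate.
m = 20
print(alpha / m)                      # per-test significance level 0.0025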
Georgiy Bobashev
See also Alternative Hypothesis; Errors of Commission; Null
Hypothesis; Statistical Power; Type II Error
TYPE II ERROR
Type II error refers to one of two errors that could be
made during hypothesis testing. The concept was
introduced by J. Neyman and E. Pearson in 1928 and
formalized in 1933. A Type II error occurs when the
null hypothesis (H0), that there is no effect or association, fails to be rejected when it is actually false. A
Type II error is often referred to as a false negative
because the hypothesis test led to the erroneous conclusion that no effect or association exists, when in
fact an effect or association does exist. In contrast to
Type II errors, a Type I error occurs when the null
hypothesis is rejected when it is actually true. The
features of Type II and Type I errors are summarized
in Table 1, above.
In the table, the maximum probability of a Type II error is denoted by the Greek letter beta (β), and 1 − β is often referred to as the statistical power of the test. It is intuitive to require that the probability of a Type II error be small; however, a decrease of β causes an increase in the Type I error, denoted by the Greek letter alpha (α), for the same sample size.
In many statistical tasks, data collection is limited
by expense or feasibility. Thus, the usual strategy is
to fix the Type I error rate and collect sufficient data
to give adequate power for appropriate alternative
hypotheses. Although power considerations vary
depending on the purpose of the study, a typical target is a power of .80. In survey research, an example of a Type II error would occur when a pollster fails to find a statistically significant association between age and attitudes toward normalization of U.S. relations with Cuba, if in fact such
a relationship exists apart from the particular polling
data set.
If a number of studies have been conducted on the
same topic, it is possible sometimes to pool the data
or the results and conduct a meta-analysis to reduce
Type II error. Because of the relationship between the two types of error, many considerations regarding the Type I error (such as multiple tests and rejecting H0 by chance) also apply to the Type II error.
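The interplay of alpha, beta, effect size, and sample size can be made concrete with a small power simulation; the following is a minimal sketch in Python, where the true difference of half a standard deviation, the sample size, and the seed are arbitrary illustrations.

import numpy as np
from scipy import stats

rng = np.random.default_rng(11)
alpha, n_sims, n = 0.05, 5_000, 40

# Simulate tests in which the null hypothesis is FALSE:
# the two populations really differ by half a standard deviation.
rejections = 0
for _ in range(n_sims):
    a = rng.normal(0.0, 1.0, n)
    b = rng.normal(0.5, 1.0, n)       # a real effect exists
    if stats.ttest_ind(a, b).pvalue < alpha:
        rejections += 1

power = rejections / n_sims           # estimates 1 - beta
print(f"power = {power:.2f}, beta (Type II error rate) = {1 - power:.2f}")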
Georgiy Bobashev
See also Alternative Hypothesis; Errors of Omission; Null
Hypothesis; Statistical Power; Type I Error
U
UNABLE TO PARTICIPATE

The unable to participate survey disposition is used in all types of surveys, regardless of mode. It occurs when the selected respondent for a survey is incapable of completing the telephone or in-person interview or of completing and returning the Web-based or mail questionnaire. Cases coded with the unable to participate disposition often are considered eligible cases in calculating survey response rates.

A variety of permanent and temporary reasons may cause respondents to be unable to participate in a survey. Permanent reasons for being unable to participate in a survey include language barriers, physical or mental disabilities, and chronic severe illnesses. The sampled respondent in a telephone or in-person survey may not speak English (or another target language) well enough to complete the interview. Respondents with a mental disability may not have the cognitive capacity to complete the survey questionnaire or the interview. Being hospitalized beyond the survey's field period would also be considered a permanent reason for being unable to participate in the survey. In the case of a mail or Internet survey, being illiterate would constitute a permanent reason for being unable to participate. The unable to participate survey disposition usually is considered a final disposition in cases where there is a permanent or long-term reason for the respondent not being capable of participating in the survey.

In other cases, the reason for a respondent being unable to participate in a survey may be temporary in nature. For example, the sampled respondent might be intoxicated or might be ill or in the hospital for a brief period. In these cases, the unable to participate disposition should be considered a temporary disposition, although the interviewer should do his or her best to determine how long this situation will last (even if just an estimated number of days) and note this in the case history file; then the case should be recontacted after an appropriate amount of time has passed.

It is important to note that a respondent's claim of being unable to participate in the survey may be a tactic to avoid completing the interview or the questionnaire. Although these instances are not common, such cases, when detected, should be considered indirect respondent refusals.
Matthew Courser
See also Final Dispositions; Language Barrier; Respondent
Refusal; Response Rates; Temporary Dispositions;
Unavailable Respondent
UNAIDED RECALL
One of the first decisions a designer of a questionnaire
must make is whether the information sought can best
in which the answers to previous unstructured questions are incorporated as reminders. At the same time,
new behaviors are checked with previous ones to
avoid duplications. In this way, the first round of
unaided questions can be said to bound the responses
and provide a kind of baseline for subsequent rounds
of questions and answers. Bounding can reduce telescoping but does not address the problem of omissions. Bounding, in its full implementation, requires
expensive panel data in which the same respondents
are repeatedly interviewed. But modified bounding
questions can be implemented in a single interview
by asking first about an earlier time period and then
referring back to that period when asking about subsequent time frames.
The mode by which the question is administered
is also important to consider, that is, face-to-face,
telephone, or self-administered interview, such as in
a mail or Internet survey. If the questionnaire is self-administered, presumably the respondent could take
as much time to think about the answer as needed. A
person might feel under more time pressure if an
interviewer is waiting for the answer on the telephone
than in person. In self-administered online questionnaires, the designer might want to control cueing
effects of question context or order by requiring people to answer questions in a fixed order and presenting them one at a time, thus limiting people's ability to
scroll ahead to learn more about the question context.
If the question is being posed to a respondent by an
in-person interviewer, it also places some burden on
the interviewer to gather verbatim answers on the fly
as the respondent provides the information. To ensure
this information is gathered completely and accurately, the interviewer must be carefully instructed to
reproduce the respondent's comments with precision.
Accuracy and completeness can be enhanced by
training interviewers to use procedures such as asking
the respondent to slow down, to repeat, and to review
the answer with the respondent more than once to
make sure the answer accurately reflects what the
respondent intended. Such interviewers might also be
equipped with voice recording technology or be specifically trained to write down material rapidly and accurately. Interviewers must also be given specific and
uniform instructions regarding the appropriate probing
that they are to do to make certain that the respondent
has told all they need to in order to answer the questions completely. Standard probing protocols are important to minimize the likelihood that a respondent's
UNAVAILABLE RESPONDENT
The unavailable respondent survey disposition is used
in all types of surveys, regardless of mode. It occurs
when the selected respondent for a survey is not available (a) at the time of interview, (b) for some portion
of the survey field period (temporarily unavailable),
or (c) until after the field period of the survey has
ended (permanently unavailable). Cases in which the
respondent is unavailable, regardless of the duration,
are considered eligible cases in calculating survey
response rates.
In most surveys, fewer than half of the completed interviews result from the first contact with the respondent, regardless of whether that contact occurs
in person, by telephone, by mail, or by Internet. Many
respondents simply are not available at the time of an
interviewer's telephone call or visit to the respondent's residence. Respondents may also be away from
their residence, may not have retrieved or processed
their postal mail or their electronic mail, or may be
too busy to complete the survey immediately after it
is received. In these situations, it is important to make
additional contacts with the selected respondent. For
example, in-person and telephone interviewers may
ask questions to determine a good day or time for
another call or visit to the respondent; survey firms
may send a reminder postcard or email message to
the respondent asking him or her again to complete
and return the survey questionnaire.
In other situations, respondents might be unavailable for several days or longer (not just at the time of
contact by an interviewer or at the time a survey is
delivered). Respondents may be on vacation or away
on a business trip. As long as these temporary periods
of unavailability do not extend beyond the field period
of a survey, most survey firms hold these cases for an
appropriate amount of time and then make additional
attempts to contact these respondents and to obtain
their cooperation.
There also are instances when the respondent is
unavailable during the entire field period of a survey.
These instances can include, for example, extended
vacations and business travel to other countries.
Unless the field period of the survey is extended,
these cases usually are coded as permanently unavailable and are not processed further or contacted again.
Matthew Courser
See also Callbacks; Final Dispositions; Response Rates;
Temporary Dispositions
UNBALANCED QUESTION
An unbalanced question is one that has a question
stem that does not present the respondent with all
UNBIASED STATISTIC
An unbiased statistic is a sample estimate of a population parameter whose sampling distribution has a mean
that is equal to the parameter being estimated. Some
In the survey sampling context, a well-known statistic for estimating the population total or mean in the
presence of additional auxiliary information is the
ratio estimator. Although this estimator is not unbiased
for the corresponding population parameter, its bias is
in part a decreasing function of the sample size.
It perhaps should go without saying, but an unbiased statistic computed from a given sample is
not always equal to the corresponding parameter.
Unbiasedness is strictly a long-run concept.
Thus, although not necessarily equal to the population parameter for any given sample, the expected
value of the statistic across repeated samples is, in
fact, the parameter itself. On any given occasion,
a biased statistic might actually be closer to the
parameter than would an unbiased statistic. When
estimating a population variance, for example, division of the sum of squares by n rather than n − 1 might provide a better estimate. That statistic is the maximum likelihood estimator of a population variance (under normality). So care should be taken when choosing a statistic based on both its bias and its precision (that is, its variability).
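Both points, that unbiasedness is a long-run property and that the n divisor trades bias for precision, can be seen in a small simulation; the following is a minimal sketch with numpy, where the population, the sample size, and the replication count are arbitrary illustrations.

import numpy as np

rng = np.random.default_rng(5)
population = rng.exponential(scale=10.0, size=100_000)
true_var = population.var()           # the population variance (divisor N)

n, n_samples = 20, 50_000
est_n1 = np.empty(n_samples)          # divisor n - 1 (unbiased)
est_n = np.empty(n_samples)           # divisor n (biased low)
for i in range(n_samples):
    s = rng.choice(population, size=n, replace=False)
    est_n1[i] = s.var(ddof=1)
    est_n[i] = s.var(ddof=0)

# Across repeated samples, the n - 1 estimator centers on the
# parameter; any single estimate still misses it.
print(true_var, est_n1.mean(), est_n.mean())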
Thomas R. Knapp
See also Bias; Design-Based Estimation; Model-Based
Estimation; Population Parameter; Precision; Variance
UNDECIDED VOTERS
Undecideds, in election campaign lingo, are voters
who have yet to decide which candidate (or side of an issue or referendum) they will support on Election
Day. In pre-election polls, there are several ways of
measuring support for a candidate or issue and, concomitantly, the number of undecided voters. A few
researchers argue for an open-ended question format,
in which the respondent volunteers the name of his or
her preferred candidate. Answering such open-ended
questions requires respondents to recall candidate
names on their own, which means that these questions
Discriminant analysis and other multivariate techniques first look at candidates' supporter profiles using
researcher-specified variables such as party identification or gender. Then the technique allocates undecided
voters to candidates based on how similar they are to
the candidates' supporters on these variables. The
technique usually involves using a random half of the
sample to generate the statistical formula or model that
will classify undecided respondents and using the
other half to test the accuracy of the formula.
A fourth method of allocation, which some campaign researchers favor, has two steps. First, because
party identification is so closely correlated with candidate choice, undecided partisan voters are allocated to their party's candidate: undecided Democrats would be allocated to the Democratic candidate and undecided Republicans to the Republican candidate. Next,
the remaining undecided voters without a party affiliation are allocated proportionally.
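A minimal sketch of this two-step allocation in Python follows; the vote tallies and category labels are invented for illustration, though the method itself is as described above.

# Hypothetical tallies: decided voters for each candidate, and
# undecided voters broken out by party identification.
decided = {"Democrat": 430, "Republican": 410}
undecided = {"Democrat": 40, "Republican": 35, "None": 45}

# Step 1: allocate undecided partisans to their party's candidate.
allocated = {cand: decided[cand] + undecided[cand] for cand in decided}

# Step 2: allocate the remaining undecideds, who lack a party
# affiliation, proportionally to each candidate's decided share.
total_decided = sum(decided.values())
for cand in allocated:
    allocated[cand] += undecided["None"] * decided[cand] / total_decided

print(allocated)  # approximately {'Democrat': 493, 'Republican': 467}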
There are other less-empirical ways of allocating
undecided voters. One is called the incumbent rule,
as advocated by pollster Nick Panagakis. Although
popular references call it a rule, it is less a formula
than it is a conventional wisdom guideline adopted by
some pollsters over the past two decades. The rule,
which has received very little attention from political
scientists, suggests that in an election where there is
a well-known incumbent with less than 50% of the
support and a well-known challenger, the majority of
the undecided voters ultimately will vote for the challenger and against the incumbent. The rule assumes that
the election is a referendum on the incumbent, and if
voters do not back the incumbent after a term in office,
they probably are not supporters. The rule appears to
hold better in state and local elections than in presidential elections. Recently there has been some evidence
that the incumbent rule may be weakening.
Robert P. Daves
See also Leaning Voters; Likely Voter; Pre-Election Polls;
Trial Heat Question
UNDERCOVERAGE
Undercoverage occurs when an element of the target
population is not represented on the survey frame and
therefore not given any chance of selection in the survey sample; that is, the element has zero probability of
selection into the sample. Undercoverage is the most
serious type of coverage error because it can be difficult to detect and even more difficult to solve. Therefore, preventing undercoverage is often a priority
during survey design. Large survey operations often
plan and budget for extensive coverage evaluations.
For example, a large sample survey called the Accuracy and Coverage Evaluation was conducted by the
U.S. Census Bureau during Census 2000, with separate
staff and separate physical office space. Its primary
purpose was to evaluate the coverage of the census.
In household surveys of the general population,
there are generally two levels at which undercoverage
is a concern. First, households may be missing from
the frame. For example, in a random-digit dialing
telephone survey, households without telephone service will be missed. In addition, while landline telephone coverage of most U.S. populations had been
increasing, the recent popularity of cell phones has
begun to jeopardize coverage of traditional landline
telephone frames because of the rapidly growing cell
phone only population (more than 20% of all U.S. households as of 2008). In surveys that use an area
frame, households can be missed during the listing operation for a variety of reasons; for example, they may be difficult to spot visually, misclassified as vacant or as a business, or incorrectly assigned across a boundary into an unsampled geographic area.
Second, even when a household is on the frame,
some people within the household may not be covered. Unfortunately, this type of undercoverage cannot be prevented by good frame construction. There
are two major theories on why people fail to identify
UNDERREPORTING
When answering questions on sensitive behaviors,
many respondents show a specific variant of response
error: They tend to report fewer instances of undesired behaviors than they have actually experienced. This is called underreporting. It is assumed
that respondents avoid reporting unfavorable conduct
because they do not want to admit socially undesirable
behaviors, which in turn leads to this type of misreporting. Similar effects are known for responses to survey questions about unpopular attitudes.
Currently, it is not known whether underreporting
is the result of a deliberate manipulation of the true
answer or whether it occurs subconsciously. Nevertheless, for the most part it is assumed to be a response
error that is associated with the cognitive editing stage of the question-answer process. Misreporting due to
forgetting or other memory restrictions is considered
to be a minor source of underreporting.
Several validation studies use records of the true
answers to provide evidence for underreporting. For
example, studies on abortion or illegal drug use show
that fewer respondents admit those behaviors, or if
UNFOLDING QUESTION
An unfolding question refers to a type of question
sequence that yields more complete and accurate data
than would a single question on the same topic.
Unfolding questions are used by survey researchers in
an attempt to reduce item nonresponse (i.e., missing
data) and measurement error.
For example, asking someone into which of the following income categories her 2007 total household
UNIT
Sampling, variance, analysis, reporting, and dissemination units cover most of the unit types of interest in
survey methodology. Sampling units can be primary, secondary, tertiary, and beyond (also known as first-stage, second-stage, and so on). These refer
to a hierarchy of smaller and smaller geographic or
organizational structures that have been exploited in
the sample design to improve efficiency. In face-to-face surveys of the general population, the stages are
UNIT COVERAGE
The proportion of the intended referential universe
that is represented (on a weighted basis) in the sampling frame of a survey is referred to as the coverage
rate. Unit coverage refers to how extensively the units
(e.g., households, schools, businesses) in the universe
are covered by (i.e., included in) the sampling
frame, which includes the dynamic process whereby
an enumerator (lister) or interviewer decides whether
a household or person should be included in the
frame.
For purposes of setting public policy, it is generally considered desirable to hear the voices of a broad
swath of the public. Even in voter intention surveys,
pollsters are more confident in their predictions if
their screening survey for likely voters covers most of
society. As such, survey researchers should strive to
use sampling frames that cover the units of the population as well as possible.
Coverage is a function of both how a sample
is selected and how contact with sample units is
attempted. Being covered does not necessarily mean that the person responds, but it does require that the person cooperate enough to confirm his or her
existence. For example, an interviewer may approach a house that appears to be abandoned and
knock on the door. If the occupant shouts out
a demand that the interviewer leave, then he or she
is covered. If, on the other hand, the occupant
remains stealthily silent so that the interviewer erroneously classifies the house as vacant, then he or
she is not covered. As another example, a person
who answers the phone when contacted as part of
a random-digit dialing survey and then hangs up
without saying anything else is covered, but a person
who never answers the phone because of a reaction
to the caller ID is typically not covered. Errors in
coverage that lead to households or people being
missing from the frame make up undercoverage.
There are a variety of reasons for undercoverage.
One is that some members of society are secretive,
antisocial, or both. Examples include illegal migrants,
squatters, fugitives, members of insular religious
cults, and members of rebellion-like movements.
Another reason is mental or physical illness that
makes it impossible for someone to communicate
with a survey interviewer. A third is high mobility,
as in the case of migrant farm workers. A fourth is
physical remoteness of dwellings. A fifth is shared
addresses and improvised apartments. Electronic connectedness matters for some surveys. The desire to
protect children also can be a factor. For example, it
is known that overtly screening a sample for households with children can lead to much lower coverage
of children than a general interview where information about children is collected as a secondary issue.
A continuum of eagerness and capacity for social
engagement can range from schizophrenic homeless
wanderers and misanthropic hermits at one extreme to
politicians who live almost purely in the public eye at
the other. Professional experience in the United States
indicates that people in the top half of that continuum
can be covered easily, provided that the survey has
a legitimate social research purpose. People in the
second quartile of the continuum require more effort
but can still usually be covered with fairly low-cost
methods. People between the 15th and 25th percentiles of the continuum can usually only be covered
with fresh area samples conducted by door-to-door
interviewers under government auspices. People
between the 5th and 15th percentiles usually can be
covered only in official surveys conducted by the
Census Bureau. People between the 2nd and 5th percentiles can be covered only through heroic means,
such as those employed in the decennial censuses.
People in the first two percentiles are mostly not coverable by any procedure. Or rather, the extraordinary
procedures that are required to try to cover them also
result in overcoverage where some other people are
UNIT NONRESPONSE
Unit nonresponse in a survey occurs when an eligible
sample member fails to respond at all or does not
provide enough information for the response to be
deemed usable (not even as a partial completion).
Unit nonresponse can be contrasted with item nonresponse (missing data), wherein the sample member
responds but does not provide a usable response to
a particular item or items. Unit nonresponse can be
a source of bias in survey estimates, and reducing unit
nonresponse is an important objective of good survey
practice.
Efforts to reduce unit nonresponse can have drawbacks. Incentives and follow-up activities are costly. If
the follow-up is too intense, survey members may provide low-quality responses so as not to be bothered further. Proxy respondents may not be able to provide
data as accurate as the intended respondent. In some
cases, the disadvantages may even override the benefits
of the reduced unit nonresponse, in part because not all
unit nonresponse leads to unit nonresponse error.
In many surveys there are hard-core nonrespondents who are almost impossible to persuade to respond. In
some cases, unfortunately, these nonrespondents may
be of special interest. In household travel surveys, for
example, there is great interest in recent immigrants
because these individuals may not have obtained a driver's license and hence may be dependent on public
transportation, walking, and bicycling as modes of
transportation. Yet because of possible language barriers, lack of telephone coverage, and other factors,
recent immigrants may be especially difficult to get to
respond.
UNIT OF OBSERVATION
A unit of observation is an object about which information is collected. Researchers base conclusions on
information that is collected and analyzed, so using
defined units of observation in a survey or other study
helps to clarify the reasonable conclusions that can be
drawn from the information collected.
UNIVERSE
The universe consists of all survey elements that qualify for inclusion in the research study. The precise
definition of the universe for a particular study is set
by the research question, which specifies who or what
is of interest. The universe may be individuals, groups
of people, organizations, or even objects. For example, research about voting in an upcoming election
would have a universe comprising all voters. If the research were about political parties' sponsorship of candidates, the universe would include all political
parties. In research that involves taking a sample of
UNKNOWN ELIGIBILITY
Unknown eligible is a category of survey dispositions
that is used in all survey modes (telephone, in-person,
mail, and Internet) when there is not enough information known about the case to determine whether the
case is eligible or ineligible. For this reason, most survey organizations make additional contact attempts
on cases categorized into one of the unknown eligibility categories to try to definitively categorize the case
into either the eligible or ineligible category. Because
the unknown eligible category is broad, researchers
usually use a more detailed set of survey disposition
codes to better categorize the outcomes of cases that
fall into the unknown eligible category.
In a telephone survey, cases are categorized as
unknown eligibles under two circumstances: (1) when
it is not known whether an eligible household exists
at a telephone number and (2) when it is known that
a household exists at the number, but it is unknown
whether an eligible respondent exists at the number.
For cases in which it is not known whether a household exists at the sampled telephone number, cases of
unknown eligibility include those numbers that were
included in the sample but never dialed by an interviewer. Another common subcategory of cases of unknown eligibility includes numbers that always
completed. A fourth type of cases of unknown eligibility occurs in mail surveys when a questionnaire
is returned as undeliverable but includes forwarding
information. The fifth, and most common, case is
those mailings for which the researcher never receives
back any information on whether or not the mailing
was received, including no returned questionnaire.
Cases of unknown eligibility occur for two principal reasons in an Internet survey: (1) when an invitation to complete the survey is sent, but nothing is
known about whether the invitation was received, and
(2) when the researcher learns that the invitation was
never delivered. In the former case, many invitations
to complete an Internet survey are sent via email, and
researchers may receive no notification if the email
address the invitation was sent to is incorrect, not working, or is not checked by the sampled respondent. In
the latter case, researchers may receive a message that
indicates that the email address is incorrect and that
the invitation cannot be delivered. In either instance,
usually nothing is known about the sampled respondent and his or her eligibility to complete the survey.
Practice varies across survey organizations in how cases of unknown eligibility are treated in computing a survey's response and cooperation rates.
The most conservative approach is to treat these cases
as eligible cases; the most liberal approach, which
makes the survey appear more successful, would treat
cases of unknown eligibility as ineligible and thus
exclude them from the response rate calculations. In
most instances, neither of these extremes is prudent,
and instead researchers often treat a proportion of
unknown eligibility cases (referred to as e) as being
eligible in their response rate calculation. When doing
this, the researcher should have an empirical basis
upon which to calculate the proportion counted as
eligible or ineligible.
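The following is a minimal sketch in Python of such a calculation, in the spirit of AAPOR's RR3-style response rates; the function name, disposition labels, and case counts are hypothetical illustrations, not AAPOR's notation.

def response_rate(I, R, NC, O, UH, UO, e):
    # Response rate treating a proportion e of the cases of unknown
    # eligibility (UH: unknown whether household; UO: unknown, other)
    # as eligible. I = interviews; R = refusals; NC = noncontacts;
    # O = other eligible nonresponse.
    return I / (I + R + NC + O + e * (UH + UO))

counts = dict(I=600, R=150, NC=100, O=50, UH=200, UO=100)
print(response_rate(**counts, e=1.0))  # conservative: all unknowns eligible (0.50)
print(response_rate(**counts, e=0.0))  # liberal: all unknowns ineligible (0.67)
print(response_rate(**counts, e=0.4))  # e estimated empirically (0.59)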
Matthew Courser
See also e; Final Dispositions; Ineligible; Out of Sample;
Response Rates; Temporary Dispositions
UNLISTED HOUSEHOLD
An unlisted household is a residential unit that does
not have a telephone number listed in published
directories or one that would be given out by directory assistance. A telephone number may be unlisted
for several reasons. Household residents actively may
have chosen to keep their telephone number unlisted.
Alternatively, although the residents of an unlisted
household may not have specifically chosen to keep
their telephone number unlisted, the number may be
too new to be found in published directories or via
directory assistance. This can be a result of a recent
move or change in telephone number.
Although the literature suggests important differences between the characteristics of respondents
who choose to keep their telephone numbers
unlisted and respondents whose telephone numbers
are unlisted for other reasons, consistent findings
have been reported on the larger differences
between the characteristics of households with listed
numbers versus those with unlisted numbers. Directory-listed sampling frames have tended to exhibit
an overrepresentation of (a) established households,
(b) middle- and higher-income households, (c) two
or more person households, (d) older householders,
(e) married householders, (f) better-educated householders, (g) retired householders, (h) white-race
households, and (i) home owners.
In general, sampling frames based on listed telephone numbers exclude unlisted households. This
coverage shortfall has the potential to introduce bias
into surveys that use a directory-listed sampling frame
to construct a representative sample of the general
population of telephone households. The coverage
issue is most acute in urban areas and the western
region of the United States, where unlisted telephone rates are higher. Additionally, as the proportion of households without listed telephone numbers
in the general population increases, the potential for
bias due to their exclusion from directory-listed
sampling frames is likely to increase commensurately. Random-digit dialing (RDD) is an alternative
sample design that can overcome this coverage limitation. However, although RDD sampling frames
include both listed and unlisted households, the
higher proportion of nonworking numbers in RDD
sampling frames can reduce the efficiency of the
sample as compared to that of a directory-listed sampling frame. It also may reduce the external validity of
the sample, as respondents with an unlisted number
are more prone to survey nonresponse due to both
noncontact and refusal.
Adam Safir
See also Coverage Error; Directory Sampling; External
Validity; List-Assisted Sampling; Noncontacts; Random-Digit Dialing (RDD); Refusal; Sampling Frame;
Unpublished Number
UNMATCHED NUMBER
An unmatched telephone number is one that does
not have a mailing address associated with it. Typically, it also does not have a name matched to it.
The vast majority of unmatched telephone numbers
are also unlisted telephone numbers. The proportion
of unmatched numbers in a telephone survey that uses
matching as a technique will depend upon the extent
to which the survey organization uses multiple vendors to do its matching and the quality of the matching techniques and databases the vendors use.
However, in random-digit dialing surveys in the
United States, regardless of how many matching vendors are used, a large minority of numbers can never
be matched to an address because that information
simply is not accessible to any matching company.
UNPUBLISHED NUMBER
The unpublished number disposition is used both in
random-digit dialing (RDD) surveys of the general
public and in telephone surveys of named people that
use a list-based sample. Unpublished telephone
numbers are numbers that are not listed in the local
USABILITY TESTING
Although usability testing can apply to all types of
products, for survey research, it can best be described
as a method for measuring how well interviewers and
respondents can use a computer-assisted interview
such as a CAPI, CATI, CASI, or Web-based survey,
for its intended purpose. It is important to separate
usability testing from testing functionality, which
focuses only on the proper operation of a computerized instrument (software and hardware), not the
individual using the system. The purpose of usability
testing is to determine whether or not the form being
used to collect data helps or hinders a user's ability to
deploy it.
In developing and designing survey instruments,
researchers have always strived to ensure that data collection instruments are the best they can be through
a variety of testing and evaluation methods put into
place prior to data collection. Traditionally, cognitive interviewing and other cognitive methods have
provided important tools for examining the thought
processes that affect the quality of answers provided
by survey respondents to survey questions. In addition, question appraisal systems are used to provide
a structured, standardized instrument review methodology that assists a survey design expert in evaluating questions relative to the tasks they require of
respondents, specifically with regard to how respondents understand and respond to survey questions.
Focus groups can be used to obtain qualitative data
V
VALIDATION

Within survey research, validation has more than one meaning, but this entry focuses on the concept of validating a completed interview, in particular one completed via paper and pencil, as soon as possible after it has been completed; that is, making certain that (a) all questions that should have been asked and answered were done so, and (b) answers that were written in for open-ended questions are legible, understandable, and relevant to the question being asked. The advantages of doing this type of validation as soon as possible after interviews are completed are three-fold. First, it allows for mistakes to be corrected soon after they have been made, at a time when an interviewer's memory of the interview is still relatively fresh. Second, especially early in the field period of a survey, it helps to identify the need for retraining all or some interviewers if consistent problems are detected by the person doing the validation. Third, early in the survey period, it allows the researchers to modify their questionnaire if that is necessary (for example, to fix a skip pattern that is not working as intended) before the problem negatively affects too many of the completed cases that otherwise may need to be reinterviewed.

With the advent of computer-assisted interviewing, whether the questionnaire is administered by an interviewer or is self-administered by the respondent, the need for manual validation of completed interviews is greatly reduced but not necessarily eliminated. It is reduced because the logic that is built into the computer interviewing software will not allow many of the problems to occur that are often present in paper-and-pencil interviewing (PAPI). These include erroneously skipped questions that should have been asked of a particular respondent and other questions that were asked when they should have been skipped. Problems like these in PAPI interviewing can be caught through manual validation and quickly corrected, whether they indicate the need to change the skipping instructions for certain contingency questions or the need to retrain particular interviewers.

Furthermore, the need for manual validation is not eliminated entirely by computer-assisted interviewing because the answers to open-ended questions cannot be checked for quality via computer software. Imagine a researcher who did not have open-ended answers validated until after all data were gathered, only to find significant problems in what was recorded by interviewers or respondents. Such problems could result from many reasons, including an open-ended question that was not well understood by most respondents. If this validation did not happen until all the data were gathered, the researcher may need to scrap this variable entirely. Instead, by having open-ended responses manually validated in the early stages of a survey, the researchers can catch such problems before they affect too many respondents. This will help ensure data quality and ultimately avoid disasters where those data for an entire variable are worthless.

Paul J. Lavrakas
VALIDITY
The word validity is primarily a measurement term,
having to do with the relevance of a measuring instrument for a particular purpose, but it has been broadened
to apply to an entire study. A research investigation is
said to have internal validity if there are valid causal
implications and is said to have external validity if the
results are generalizable.
As far as measurement is concerned, the most
important property of a measuring instrument is the
extent to which it has been validated with respect to
some gold standard whose validity is taken for granted. For example, if scores on a test of
mathematical aptitude (the instrument to be validated)
correlate highly with scores on a subsequent test of
mathematical achievement (the gold standard), all is
well, and the aptitude test would be regarded as valid.
In the early 20th century, the methodological literature referred to three kinds of validity: (1) content
validity (expert judgment, i.e., postulation of the gold
standard itself), (2) criterion-related validity (the type
of validity mentioned in the previous paragraph), and
(3) construct validity (the extent to which the scores
obtained using a particular measuring instrument
agreed with theoretical expectations). There were also
subtypes of criterion-related validity (concurrent and
predictive) and construct validity (convergent and discriminant), but more recently the general label construct validity not only has become more popular
but also has been alleged to include content validity
and criterion-related validity as well.
Connection Between Reliability and Validity
Reliability is concerned with the consistency of
results whether or not those results are valid. It is easy
to imagine a situation in which ratings given to various contestants (in a figure skating event, for example) are consistent (reliable) from one judge to
another, but all the ratings are wrong due to the
personal biases of the judges.
There is a way to investigate the validity of an
instrument with respect to its reliability. Suppose that
a survey researcher were interested in pursuing further
the relationship between performance on an aptitude
test (the predictor) and performance on an achievement test (the criterion with respect to which the
predictor instrument is to be validated) in which a
correlation between the two of .54 has been obtained.
The researcher would like to estimate what the correlation would be if it were based upon true scores
rather than observed scores. There is a formula called
the correction for attenuation that can be used for that
purpose. The obtained correlation is divided by the
product of the square roots of the estimates of the
respective reliability coefficients. If the reliability
coefficient for the aptitude test were estimated to be
.64 and the reliability coefficient for the achievement
test were estimated to be .81, application of that formula would yield a value of .75, which is considerably higher than the .54. That makes sense, because
the true correlation has been attenuated (i.e., reduced)
by the less-than-perfect reliabilities of the two tests.
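Written out (with $r_{xy}$ denoting the obtained correlation and $r_{xx}$ and $r_{yy}$ the two reliability estimates; these symbols are supplied here for clarity), the computation just described is

$$r_{\text{corrected}} = \frac{r_{xy}}{\sqrt{r_{xx}\, r_{yy}}} = \frac{.54}{\sqrt{(.64)(.81)}} = \frac{.54}{(.80)(.90)} = \frac{.54}{.72} = .75.$$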
VARIABLE
A variable is something that varies in value, as opposed
to a constant (such as the number 2), which always
has the same value. These are observable features of
something that can take on several different values or
can be put into several discrete categories. For example,
respondents' scores on an index are variables because
they have different values, and religion can be considered a variable because there are multiple categories.
Scientists are sometimes interested in determining the
values of constants, such as π, the ratio of the area of
a circle to its squared radius. However, survey research
involves the study of variables rather than constants.
A quantity X is a random variable if, for every number a, there is a probability p that X is less than or equal to a. A discrete random variable is one that attains only certain values, such as the number of children one has. By contrast, a continuous random variable is one that can take on any value within a range, such as a person's income (measured in the smallest
possible fractions of a dollar).
Data analysis often involves hypotheses regarding
the relationships between variables, such as "If X increases in value, then Y tends to increase (or decrease) in value." Such hypotheses involve relationships
between latent variables, which are abstract concepts.
These concepts have to be operationalized into manifest
variables that can be measured in actual research. In
surveys, this operationalization involves either using
VARIANCE
Variance, or dispersion, roughly refers to the degree
of scatter or variability among a collection of observations. For example, in a survey regarding the effectiveness of a political leader, ratings from individuals
will differ. In a survey dealing with reading ability
among children, the expectation is that children will
differ. Even in the physical sciences, measurements
might differ from one occasion to the next because of
the imprecision of the instruments used. In a very real
sense, it is this variance that motivates interest in statistical techniques.
A basic issue that researchers face is deciding how
variation should be measured when trying to characterize a population of individuals or things. That is, if
$X_1, \ldots, X_n$ denote the $n$ observations sampled, the most familiar measure is the sample variance,

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2, \quad \text{where } \bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$$

is the sample mean, and $s$ (the square root of $s^2$) is the standard deviation.
For some purposes, the use of the standard deviation stems from the fundamental result that the probability of an observation being within some specified
distance from the mean, as measured by s, is completely determined under normality. For example, the
probability that an observation is within one standard
deviation of the mean is .68, and the probability of
being within two standard deviations is .954. These
properties have led to a commonly used measure of
effect size (a measure intended to characterize the
extent to which two groups differ) as well as a frequently employed rule for detecting outliers (unusually
Figure 1   A normal curve and a contaminated normal curve
Effect Size
The variance has played a role in a variety of other
situations, stemming in part from properties enjoyed
by the normal distribution. One of these is a commonly used measure of effect size for characterizing
how two groups differ:
$$\Delta = \frac{\mu_1 - \mu_2}{\sigma_p},$$

where $\mu_1$ and $\mu_2$ are the two population means and $\sigma_p$ is the pooled standard deviation.
Standard Errors
The notion of variation extends to sampling distributions in the following manner. Imagine a study based on $n$ observations resulting in a simple random sample mean, say $\bar{X}$. Now imagine the study is repeated many times (in theory, infinitely many times), yielding the sample means $\bar{X}_1, \bar{X}_2, \ldots$, with each sample mean again based on $n$ observations. The variance of these sample means is called the squared standard error of the sample mean, which is known to be

$$E(\bar{X} - \mu)^2 = \sigma^2 / n$$

under random sampling. That is, it is the variance of these sample means. Certainly the most useful and important role played by the variance is making inferences about the population mean based on the sample mean. The reason is that the standard error of the sample mean, $\sigma/\sqrt{n}$, suggests how inferences about $\mu$ should be made assuming normality, but the details go beyond the scope of this entry.
Winsorized Variance
Another measure of dispersion that has taken on an
increasing importance in recent years is the Winsorized variance; it plays a central role when comparing
groups based on trimmed means. It is computed
as follows. Consider any $\gamma$ such that $0 \le \gamma < .5$. Let $g = \gamma n$, rounded down to the nearest integer. So if $\gamma n = 9.8$, say, $g = 9$. Computing a $\gamma$-trimmed mean refers to removing the $g$ smallest and $g$ largest observations and averaging those that remain. Winsorizing $n$ observations refers to setting the $g$ smallest values to the smallest value not trimmed and simultaneously setting the $g$ largest values equal to the largest value not trimmed. The sample variance, based on the Winsorized values, is called the Winsorized variance.
Both theory and simulation studies indicate that trimming can reduce problems associated with means due
to nonnormality, particularly when sample sizes are
small. Although not intuitive, theory indicates that the
squared standard error of a trimmed mean is related
to the Winsorized variance, and so the Winsorized
variance has played a role when making inferences
about a population trimmed mean. Also, it has been
found to be useful when searching for robust analogs
of Pearson's correlation.
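The following is a minimal sketch in Python of the computations described above, with γ = .2 and an invented data set containing one extreme outlier; the functions are illustrative implementations, not a library API.

import numpy as np

def winsorized_variance(x, gamma=0.2):
    # Set the g smallest values to the smallest value not trimmed and
    # the g largest values to the largest value not trimmed, then take
    # the usual sample variance of the Winsorized values.
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    g = int(gamma * n)            # gamma * n, rounded down
    x[:g] = x[g]
    x[n - g:] = x[n - g - 1]
    return x.var(ddof=1)

def trimmed_mean(x, gamma=0.2):
    # Remove the g smallest and g largest observations; average the rest.
    x = np.sort(np.asarray(x, dtype=float))
    g = int(gamma * x.size)
    return x[g:x.size - g].mean()

data = [2, 3, 3, 4, 4, 5, 5, 6, 7, 40]  # one extreme outlier
print(np.var(data, ddof=1))        # ordinary sample variance, inflated (~129)
print(winsorized_variance(data))   # ~1.6, far less affected by the outlier
print(trimmed_mean(data))          # 4.5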
Rand R. Wilcox
See also Mean; Outliers; Standard Deviation; Standard Error;
Variance Estimation
VARIANCE ESTIMATION
Variance refers to the degree of variability (dispersion) among a collection of observations. Although
estimation of the size of the variance in a distribution of numbers often is a complex process, it is an
extremely important endeavor for survey researchers, as it supports valid inferences about population parameters.
Calculation Methods
The two most popular classes of methods for calculating correct variance estimates for complex sample surveys are replicate methods for variance estimation and Taylor series linearization methods. A third class of techniques that is sometimes used is generalized variance functions.
Replicate methods for variance compute multiple
estimates in a systematic way and use the variability in
these estimates to estimate the variance of the fullsample estimator. The simplest replicate method for
variance is the method of random groups. The method
of random groups was originally designed for interpenetrating samples or samples that are multiple repetitions (e.g., 10) of the same sampling strategy. Each of
these repetitions is a random group, from which an
estimate is derived. The overall estimator is the average of these estimates, and the estimated variance of the overall estimator is based on the sampling variance among the individual group estimates.
Of course, this technique can be used for any complex
Variance Estimation
sample survey by separating the sample into subsamples that are as equivalent as possible (e.g., by sorting by design variables and using systematic sampling
to divide into random groups). This simplest of replicate methods for variance is simple enough to do by
hand, but is not as robust as more modern replication-based methods that can now be easily calculated by
computers.
Balanced repeated replication (BRR), also known
as balanced half-samples, was originally conceived
for use when two primary sampling units (PSUs) are
selected from each stratum. A half-sample then consists of all cases from exactly one primary sampling
unit from each stratum (with each weight doubled).
Balanced half-sampling uses an orthogonal set of
half-samples as specified by a Hadamard matrix. The
variability of the half-sample estimates is taken as an
estimate of the variance of the full-sample estimator.
Jackknife variance calculation, also called Jackknife repeated replication (JRR), works in a similar
way to BRR. The JRR method creates a series of replicate estimates by removing only one primary sampling
unit from only one stratum at a time (doubling the
weight for the stratum's other primary sampling unit).
For a simple example, let us use two strata (1 and
2) with two PSUs in each (1a, 1b, 2a, and 2b). BRR
could form estimates by using only 1a and 2a, only
1b and 2b, only 1a and 2b, and only 1b and 2a (each
orthogonal combination of PSUs). JRR drops only
one PSU at a time and forms estimates by using 1b
(doubly weighted), 2a, and 2b; or 1a (doubly
weighted), 2a, and 2b; or 1a, 1b, and 2a (doubly
weighted); or 1a, 1b, and 2b (doubly weighted).
Even if a complex sample survey does not have
exactly two PSUs in each stratum, the BRR and JRR
techniques can be used by defining pseudo-strata and
pseudo-PSUs so that there are two pseudo-PSUs in
each pseudo-stratum. Of course, these methods work
only as well as these definitions work.
A more general replication method for variance is
bootstrapping. The idea of bootstrapping is that, from
the full original sample drawn, the researcher repeatedly draws simple random samples of the same sample size with replacement. With each draw, different
observations will be duplicated, and these differences
will result in different estimates. The variance of these
estimates is the variance estimate.
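The following is a minimal sketch of the bootstrap in Python for the simplest case, a mean computed from a simple random sample; the data, the number of replicates B, and the seed are illustrative, and for a complex design the resampling would have to respect the strata and primary sampling units.

import numpy as np

rng = np.random.default_rng(9)
sample = rng.exponential(scale=3.0, size=200)  # stands in for the full original sample

B = 2000
boot_means = np.empty(B)
for b in range(B):
    # Resample n observations WITH replacement from the original sample.
    resample = rng.choice(sample, size=sample.size, replace=True)
    boot_means[b] = resample.mean()

# The variance of the replicate estimates is the variance estimate.
print(boot_means.var(ddof=1))
print(sample.var(ddof=1) / sample.size)  # textbook s^2/n, for comparison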
Taylor series linearization methods work by approximating the estimator of the population parameter of
interest by a linear function of the observations. These
VERBATIM RESPONSES
A verbatim response refers to what an interviewer
records as an answer to an open-ended question.
VERIFICATION
Within survey research, one of the meanings of verification refers to efforts that are made to confirm that
a telephone or in-person interview was actually completed with a particular respondent. Within this context, verification is one of the techniques used to
guard against interviewer falsification. Such verification often is done via telephone even if the interview
was done in person. In most cases, a small number (e.g., fewer than five) of randomly selected completed interviews from each interviewer are verified by having
a person in a supervisory position contact the respondents and verify that they were, in fact, interviewed
by the interviewer. In other instances, interviewers
who may be under observation for previous interviewing infractions may have a greater number of
their interviews verified. Oftentimes, interviewers are
instructed to tell respondents at the end of an interview that someone may contact them to verify that
the interview was completed.
Other meanings of verification, within the context
of survey research, concern efforts that researchers
sometimes make to verify whether a respondent's
answer to a key variable is accurate. These procedures
include so-called record checks in which the
researcher gains access to an external database (such
as those assembled by various government agencies)
that can be used to officially verify whether the
answer given by a respondent is correct. For example, in a survey of voting behavior, if respondents are asked whether they voted in the past election, researchers could go to public election records to learn whether a given respondent did or did not vote, thereby verifying the answer given in the survey. However, in this entry, it
is the former meaning of the term verification that is
addressed.
In theory, informing interviewers that some of their
completed interviews will be verified is assumed to
motivate them against any falsification. However, little
empirical work has been done to test this assumption.
Nevertheless, many clients expect this will be done,
and as such, verification is a process that many survey
organizations build into their contracts. Of note, verification should not be confused with interviewer monitoring, which is a process used in real time to observe
the behavior of interviewers as they interact with
respondents in (a) gaining cooperation from the respondents and (b) gathering data from them.
Verification serves additional purposes beyond its
primary purpose of confirming whether or not a given
interview was conducted. In addition, a systematic
and routine verification process provides information
that a survey organization can use to evaluate the job
performance of individual interviewers. For example,
when respondents are contacted to verify a completed
interview, they also can be asked a few additional
questions, at very little additional cost to the survey
organization, to help evaluate the quality of the interviewer's work. A system of routine verification can
also serve as a check on the quality of the interviewer
monitoring system a survey organization has instituted, as verification is a fail-safe approach to detecting falsification, which ideally should be detected via
on-the-job monitoring.
Paul J. Lavrakas
See also Falsification; Interviewer Monitoring; Record
Check; Reverse Record Check; Validation
VIDEO COMPUTER-ASSISTED
SELF-INTERVIEWING (VCASI)
Video computer-assisted self-interviewing (VCASI),
sometimes referred to as audio-visual CASI
(AVCASI), involves administration of an electronic
self-administered questionnaire, which includes prerecorded video clips of an interviewer asking survey
questions. This mode of data collection is an extension of audio computer-assisted self-interviewing
(ACASI). Responses are recorded either by keyboard,
VIDEOPHONE INTERVIEWING
Videophone interviewing enables researchers to
closely reproduce face-to-face interviewing without
requiring interviewers to physically enter respondents' homes.
VIGNETTE QUESTION
A vignette is a sort of illustration in words. In survey research, a vignette question describes an event, happening, circumstance, or other scenario, the wording of which often is experimentally controlled by the researcher; at least one of the different versions of the vignette is randomly assigned to different subsets of respondents.
For example, imagine a vignette that describes
a hypothetical crime that was committed, and the
respondents are asked closed-ended questions to rate
how serious they consider the crime to be and what
VISUAL COMMUNICATION
Visual communication involves the transmission of
information through the visual sensory system, or the
eyes and sense of sight. In surveys, visual communication relies heavily on verbal communication (i.e.,
written text) but can also include nonverbal communication (i.e., through images conveying body language,
gestures, or facial expressions) and paralinguistic
communication (i.e., through graphical language).
Visual communication can be used to transmit information independently or in combination with aural
communication. When conducting surveys, the mode
of data collection determines whether information can
be transmitted visually, aurally, or both. Whether
survey information is transmitted visually or aurally
influences how respondents first perceive and then
cognitively process information to provide their
responses.
Visual communication can consist of not only verbal communication but also nonverbal and paralinguistic forms of communication, which convey
additional information that reinforces or modifies the
meaning of written text. Nonverbal communication
(i.e., information transferred through body language,
gestures, eye contact, and facial expressions) is most
common in face-to-face surveys but can also be conveyed with graphic images in paper and Web surveys.
Paralinguistic communication is traditionally thought
of as information transmitted aurally through the
speaker's voice (e.g., quality, tone, pitch, inflection).
However, recent literature suggests that graphical features, such as layout, spacing, font size, typeface, color,
and symbols, that accompany written verbal communication can serve the same functions as aural paralinguistic communication. That is, these graphical features and
images, if perceived and cognitively processed, can
enhance or modify the meaning of written text in paper
and Web surveys.
The process of perceiving visual information can
be divided into two broad and overlapping stages:
pre-attentive and attentive processing. In pre-attentive
processing, the eyes quickly and somewhat effortlessly scan the entire visual field and process abstract
visual features such as form, color, motion, and spatial position. The eyes are then drawn to certain basic
visual elements that the viewer distinguishes as
objects from other competing elements in the visual
field that come to be perceived as background. Once
VOICE OVER INTERNET PROTOCOL (VOIP) AND THE VIRTUAL COMPUTER-ASSISTED TELEPHONE INTERVIEW FACILITY
VOLUNTARY PARTICIPATION
Voluntary participation refers to a human research subject's exercise of free will in deciding whether to participate in a research activity. International law, national law, and the codes of conduct of scientific communities protect this right. In deciding whether participation is voluntary, special attention must be paid to the likely participants' socioeconomic circumstances in determining which steps must be put in place to protect the exercise of free will. The level of effort involved in clarifying voluntariness is not fixed and depends on several circumstances, such as the respondents' abilities to resist pressures like financial inducements, authority figures, or other forms of persuasion. Special care, therefore, must be taken to eliminate undue pressure (real and perceived) when research subjects have a diminished capacity to refuse.
Basic Requirements
The essential qualities of voluntariness include the
following:
Some Exceptions
These requirements apply to research and should not
be confused with taking tests, completing forms for
employment, or filling out applications for public
benefits.
W
WAVE
Paul J. Lavrakas

See also Attrition; Longitudinal Studies; Panel; Panel Survey
WEB SURVEY
Web surveys allow respondents to complete questionnaires that are delivered to them and administered
over the World Wide Web. This offers advantages
over other survey methods but also poses problems
for quality data collection.
A main advantage of Web surveys, compared with
other survey methods, is lower costs. Lower costs
enable researchers to collect larger samples than are affordable with other survey methods and have made Web
surveys an option for many who cannot afford other
survey methods. Technology allowing for complex
questionnaire designs using features available in computer-assisted interviewing programs (such as CAPI,
CASI, and CATI) is inexpensive. There are other
advantages to Web surveys. The ability to quickly contact respondents over the Internet and process questionnaires through Web sites can shorten field periods
compared to other survey methods. Web surveys can be
used in conjunction with other methods, giving respondents another option for survey participation. The
Web's role as an electronic meeting place has increased
the ability to survey some rare, isolated, or marginal
populations. In principle it is possible to conduct
WEIGHTING
Weighting is a correction technique that is used by survey researchers. It refers to statistical adjustments that
are made to survey data after they have been collected
in order to improve the accuracy of the survey estimates. There are two basic reasons that survey researchers weight their data. One is to correct for unequal
probabilities of selection that often have occurred during sampling. The other is to try to help compensate for
survey nonresponse. This entry addresses weighting as
it relates to the second of these purposes.
Basics of Weighting to Correct for Nonresponse
Suppose that the objective of a survey is to estimate the population mean, $\bar{Y}$, of a variable $Y$. Suppose further that a simple random sample of size $n$ is selected with equal probabilities and without replacement from a population of size $N$. The sample can be represented by a series of $N$ indicators $t_1, t_2, \ldots, t_N$, where the $k$th indicator $t_k$ assumes the value 1 if element $k$ is selected in the sample, and otherwise it assumes the value 0. In case of complete response, the sample mean,

$$\bar{y} = \frac{1}{n} \sum_{k=1}^{N} t_k Y_k,$$

is an unbiased estimator of the population mean. Under nonresponse, the mean of the responding elements is no longer unbiased; its bias is approximately equal to

$$\frac{R_{\rho Y} S_{\rho} S_Y}{\bar{\rho}},$$

in which $R_{\rho Y}$ is the correlation between the response probabilities and the values of the target variable, $S_{\rho}$ and $S_Y$ are their standard deviations, and $\bar{\rho}$ is the mean response probability. To reduce this bias, each observed element $k$ is assigned a correction weight $w_k$, leading to the weighted estimator

$$\bar{y}_W = \frac{1}{n} \sum_{k=1}^{N} w_k t_k Y_k. \qquad (4)$$
The correction weights are constructed with the help of auxiliary information. If $\bar{X}$, the population mean of an auxiliary variable $X$, is known, the weights can be chosen so that the weighted sample mean of $X$ reproduces this known population mean:

$$\frac{1}{n} \sum_{k=1}^{N} w_k t_k X_k = \bar{X}.$$
Post-Stratification
The most frequently used weighting technique to correct for the effects of nonresponse is post-stratification.
One or more qualitative auxiliary variables are needed
to apply this technique. In the following explanation,
only one such variable is considered. Extension to
more variables is straightforward.
Suppose there is an auxiliary variable, $X$, having $L$ categories. $X$ divides the population into $L$ strata. The number of population elements in stratum $h$ is denoted by $N_h$, for $h = 1, 2, \ldots, L$. Hence, the population size is equal to $N = N_1 + N_2 + \cdots + N_L$.
Post-stratification assigns identical correction weights to all elements in the same stratum. The weight $w_k$ for an element $k$ in stratum $h$ (in the case of a simple random sample with equal probabilities) is defined by

$$w_k = \frac{N_h / N}{n_h / n},$$

in which $n_h$ denotes the number of responding sample elements in stratum $h$.
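A small numerical sketch of these weights (Python with numpy; the population counts and responses are hypothetical):

import numpy as np

N_h = np.array([7000, 3000])          # known population counts per stratum
stratum = np.array([0, 0, 0, 1, 1])   # stratum of each responding element
y = np.array([10.0, 12.0, 11.0, 20.0, 22.0])

N, n = N_h.sum(), len(y)
n_h = np.bincount(stratum)            # respondent counts per stratum
w = (N_h / N)[stratum] / (n_h / n)[stratum]   # w_k = (N_h / N) / (n_h / n)
y_w = (w * y).mean()                  # weighted estimator (4); 14.0 here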
Linear Weighting
Even in the case of full response, the precision of estimators can be improved if suitable auxiliary information is available. Suppose there are $p$ auxiliary variables. The values of these variables for element $k$ are denoted by the vector $X_k = (X_{k1}, X_{k2}, \ldots, X_{kp})'$. The vector of population means is denoted by $\bar{X}$. If the auxiliary variables are correlated with the survey variable, $Y$, then for a suitably chosen vector $B = (B_1, B_2, \ldots, B_p)'$ of regression coefficients for a best fit of $Y$ on $X$, the residuals, $E_k = Y_k - X_k' B$, vary less than the values of the target variable itself. The
ordinary least squares solution $B$ can be estimated, in the case of full response, by

$$b = \left( \sum_{k=1}^{N} t_k X_k X_k' \right)^{-1} \sum_{k=1}^{N} t_k X_k Y_k. \qquad (7)$$
The generalized regression estimator is then defined as

$$\bar{y}_{GREG} = \bar{y} + (\bar{X} - \bar{x})' b,$$

in which $\bar{x}$ is the vector of sample means of the auxiliary variables. The generalized regression estimator is asymptotically design unbiased. This estimator reduces the bias caused by nonresponse if the underlying regression model fits the data well.
The generalized regression estimator can be rewritten in the form of the weighted estimator (4), where the correction weight, $w_k$, for observed element $k$ is equal to $w_k = v' X_k$, and $v$ is a vector of weight coefficients, which is equal to

$$v = n \left( \sum_{k=1}^{N} t_k X_k X_k' \right)^{-1} \bar{X}.$$
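The algebra can be checked with a short sketch (Python with numpy; the data are hypothetical, and the factor $n$ in $v$ follows the reconstruction above). The weighted estimator (4) built from $w_k = v' X_k$ reproduces the generalized regression estimator:

import numpy as np

# Rows of X are the auxiliary vectors X_k (an intercept plus one variable).
X = np.array([[1.0, 2.0], [1.0, 4.0], [1.0, 6.0], [1.0, 8.0]])
y = np.array([3.0, 5.0, 8.0, 9.0])
Xbar = np.array([1.0, 5.5])              # known vector of population means

n = len(y)
v = n * np.linalg.solve(X.T @ X, Xbar)   # v = n (sum_k X_k X_k')^{-1} Xbar
w = X @ v                                # correction weights w_k = v' X_k
y_w = (w * y).mean()                     # weighted estimator (4)

b = np.linalg.solve(X.T @ X, X.T @ y)    # regression coefficients, as in (7)
y_greg = y.mean() + (Xbar - X.mean(axis=0)) @ b
assert np.isclose(y_w, y_greg)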
Multiplicative Weighting
Correction weights produced by linear weighting are
the sum of a number of weight coefficients. It is
also possible to compute correction weights in a
different way, namely, as the product of a number of
weight factors. This weighting technique is usually
called raking or iterative proportional fitting. Here
this process is denoted by multiplicative weighting,
because weights are obtained as the product of a number of factors contributed by the various auxiliary
variables.
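A compact version of the iterative loop (Python with numpy; the two auxiliary variables and their population distributions are hypothetical, and production code would add convergence and empty-cell checks):

import numpy as np

def rake(w, margins, targets, n_iter=50):
    # margins: one category-index array per auxiliary variable;
    # targets: the corresponding population counts. Each pass multiplies
    # the weights by one adjustment factor per auxiliary variable.
    w = np.asarray(w, dtype=float).copy()
    for _ in range(n_iter):
        for m, t in zip(margins, targets):
            current = np.bincount(m, weights=w, minlength=len(t))
            w *= (t / current)[m]
    return w

sex = np.array([0, 0, 1, 1, 1])          # hypothetical respondent data
age = np.array([0, 1, 0, 1, 1])
w = rake(np.ones(5), [sex, age],
         [5 * np.array([0.5, 0.5]), 5 * np.array([0.4, 0.6])])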
Calibration Estimation
Jean-Claude Deville and Carl-Erik Särndal have created a general framework for weighting of which linear weighting and multiplicative weighting are special
cases. This framework is called calibration. Assuming
simple random sampling with equal probabilities, the
starting point is that adjustment weights have to
satisfy two conditions:
1. The adjustment weights wk have to be as close as
possible to 1.
2. The weighted sample distribution of the auxiliary
variables has to match the population distribution;
that is,
$$\bar{x}_W = \frac{1}{n} \sum_{k=1}^{N} t_k w_k X_k = \bar{X}. \qquad (10)$$
The first condition sees to it that resulting estimators are unbiased, or almost unbiased, and the
second condition guarantees that the weighted sample is representative with respect to the auxiliary
variables used.
A distance measure, $D(w_k, 1)$, that measures the difference between $w_k$ and 1 in some way, is introduced. The problem is now to minimize

$$\sum_{k=1}^{N} t_k D(w_k, 1), \qquad (11)$$

subject to condition (10).
Other Issues
There are several reasons why a survey statistician
may want to have control over the values of the
adjustment weights. One reason is that extremely
large weights are generally considered undesirable.
Use of such weights may lead to unstable estimates of
population parameters. Another reason to have some
control over the values of the adjustment weights is
that application of linear weighting might produce
negative weights.
The calibration approach allows for a weighting
technique that keeps the adjustment weights within
pre-specified boundaries and, at the same time,
enables valid inference. Many surveys have complex
sample designs. One example of such a complex
design is cluster sampling. Many household surveys
are based on cluster samples. First, a sample of households is selected. Next, several or all persons in the
selected households are interviewed. The collected
information can be used to make estimates for two
populations: the population consisting of all households and the population consisting of all individual
persons. In both situations, weighting can be carried
out to correct for nonresponse. This results in two
weights assigned to each record: one for the household and one for the individual. Having two weights in each record complicates further analysis.

The generalized regression estimation offers a solution. The trick is to sum the dummy variables corresponding to the qualitative auxiliary variables for the individuals over the household. Thus, quantitative auxiliary variables are created at the household level. The resulting weights are assigned to the households. Furthermore, all elements within a household are assigned the same weight, and this weight is equal to the household weight. This approach forces estimates computed using the element weights to be consistent with estimates based on the cluster weights.
Jelke Bethlehem
See also Auxiliary Variable; Bias; Cluster Sample;
Differential Nonresponse; Nonresponse; Post-Stratification; Probability of Selection; Raking;
Representative Sample; Unit Nonresponse
WESVAR
WesVar is a Windows-based software package for
analyzing data collected by surveys with complex
sample designs. Sample designs that can be analyzed include stratified or unstratified designs, single- or multi-stage designs, and one-time or longitudinal designs. A key aspect of WesVar is the
Further Readings
WesVar: http://www.westat.com/westat/wesvar/index.html
WITHIN-UNIT COVERAGE
Within-unit coverage refers to the accuracy of selection of the respondent to be interviewed after a household is contacted, generally by phone or face-to-face methods, and the informant in the household identifies the prospective respondent according to the interviewer's instructions. Theoretically, each eligible person living in the unit (e.g., all adults) should have
a known and nonzero chance of selection. A number
of probability, quasi-probability, and nonprobability
methods exist for choosing the respondent, and only
the probability methods allow estimation of chances of
selection. Humans, however, do not always behave
according to the logic of probability methods, and this
leads to errors of within-unit coverage. Missed persons
within households contribute more to noncoverage of
the Current Population Survey, for example, than do
missed housing units.
In general, methods that do not nominate the correct respondent by sex and age are more likely to produce within-unit coverage errors than are those that
do specify those characteristics. Those methods that
ask for a full listing of the household's adults, such
as the Kish technique, can be time consuming and
threatening to some informants. The last-birthday and
next-birthday procedures were devised to be nonintrusive and yield within-unit probability samples; yet,
between about 10% and 25% of the time, the informant chooses the wrong respondent. This can happen
because of misunderstanding the question, not knowing all the birthdays of household members, or the
informant's deliberate self-selection instead of the
designated respondent. If birthdays do not correlate
with the topics of survey questions, this is not very
problematic. Training interviewers rigorously and
having them verify the accuracy of the respondent's
selection can help to overcome difficulties with the
birthday methods. Rigorous training also helps considerably with the Kish procedure, an almost pure
probability method.
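The contrast between a full-roster probability selection and the birthday shortcut can be sketched as follows (Python; the household roster and its days_to_birthday field are hypothetical simplifications of what an informant would report):

import random

def select_respondent(adults, method="next_birthday", seed=None):
    rng = random.Random(seed)
    if method == "roster":
        # Full-roster probability selection: every listed adult has an
        # equal, known chance (the goal the Kish grid achieves with tables).
        return rng.choice(adults)
    # Next-birthday method: ask for the adult whose birthday comes next.
    return min(adults, key=lambda a: a["days_to_birthday"])

household = [{"name": "A", "days_to_birthday": 120},
             {"name": "B", "days_to_birthday": 40}]
chosen = select_respondent(household)   # selects B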
Another problem is undercoverage of certain kinds
of people, especially minorities, the poor, the less educated, inner-city dwellers, renters, young males (particularly young black or Hispanic males), and older
WITHIN-UNIT SELECTION
In survey research, a sample of households usually
must be converted into a sample of individuals. This
often is accomplished by choosing respondents within
households using methods intended to yield a sample
similar to the population of interest. Ideally, this is
done with a probability technique because all unit
members will have a known, nonzero chance of selection, thus allowing generalization to a population.
Probability methods, however, tend to be time consuming and relatively intrusive because they ask about
household composition, potentially alienating prospective respondents and therefore increasing nonresponse.
Most researchers have limited resources, so often
they need quicker, easier, and less expensive quasi-probability or nonprobability methods that they
believe will yield samples adequately resembling the
population being studied. Although surveyors wish to
minimize nonresponse and reduce noncoverage, they
have to balance these choices to fit their goals and
resources. They have to decide if the benefits of probability methods outweigh their possible contributions
to total survey costs or if departures from probability
selection will contribute too much to total survey
error. This does not mean that they tolerate substandard practices but that they consider which trade-offs
will be the most acceptable within their budget and
time restrictions. One paramount goal is to gain the
respondent's cooperation in as short a time as possible,
and a second is to obtain a reasonable balance of
demographic (usually age and gender) distributions.
Most households are homogeneous in other demographic characteristics, such as education and race.
Many well-respected polling and other survey research
organizations have to make these kinds of choices.
Selection of respondents within households is particularly important in telephone surveys because most
refusals occur at the inception of contact. Although it
Z
ZERO-NUMBER BANKS
Telephone numbers in the United States consist of
10 digits: The first three are the area code; the next
three are the prefix or exchange; the final four digits
are the suffix or local number. The 10,000 possible
numbers for a suffix can be subdivided into banks of
consecutive numbers: 1,000-banks (Nnnn); 100-banks
(NNnn); or 10-banks (NNNn). Zero-number banks,
or zero-listed banks, are banks that do not contain
directory-listed residential numbers. Although including zero-number banks in a random-digit dialing
frame allows for 100% coverage of landline residential telephone numbers, their inclusion can substantially reduce sample efficiency.
Based on regular analyses conducted by Survey
Sampling International, only 29% of 100-banks in
POTS (Plain Old Telephone Service) prefixes, and
other prefixes shared by different types of service,
have at least one directory-listed number. In prefixes
that have at least one directory-listed telephone number, only 47% of the possible 100-banks contain at
least one directory-listed number. Because of these
inefficiencies, most random-digit dialing surveys today
use list-assisted, truncated frames, that is, frames truncated to include only listed banks, or those 100-banks that contain at least one directory-listed residential telephone number.
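Constructing such a truncated frame is mechanically simple, as the following sketch shows (Python; the directory-listed numbers are hypothetical). It keeps only the 100-banks containing at least one listed number and generates random-digit numbers within them:

import random

listed = ["6145550123", "6145550177", "6145559901"]

# A 100-bank is identified by the first 8 digits: area code, prefix,
# and the NN of the NNnn suffix.
listed_banks = sorted({num[:8] for num in listed})

def list_assisted_sample(banks, k, seed=1):
    # Choose a listed 100-bank, then append a random two-digit ending.
    rng = random.Random(seed)
    return [rng.choice(banks) + f"{rng.randrange(100):02d}"
            for _ in range(k)]

sample = list_assisted_sample(listed_banks, k=5)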
This truncation introduces some coverage error by
excluding 100-banks that contain residential numbers
but are missing from the truncated frame. The most
common reason for this omission is that newly opened
banks have not yet been published in a telephone directory. Because directories are published only once a year, the time lag between number assignment and directory publication can result in new blocks not being represented. Another source of error is common in rural areas, particularly those serviced by small, local telephone companies. A formal directory may not be readily available to compilers, but numbers are listed in a paperback book that looks like the local real estate listings available at the supermarket.

Alternative sample designs are available for researchers who opt to include zero-listed banks. One approach, the Mitofsky-Waksberg method, takes advantage of the tendency of telephone numbers to cluster in 100-banks. It starts with a sample of primary numbers in prefixes available for landline residential use. If a primary number is determined to be a working residential number, a cluster of additional numbers is generated in the same 100-bank. Another approach is to use disproportionate stratified samples of both zero-listed banks and listed banks. For many years, this design was the sampling protocol for all surveys conducted as part of the U.S. Department of Health and Human Services' Behavioral Risk Factor Surveillance System.

Research by Michael Brick and others suggests that this coverage error amounts to 3% to 4% of telephone households. However, work by Brick and by Clyde Tucker and Jim Lepkowski confirms that the efficiency gains of list-assisted designs make them preferable in most cases. In fact, in 2003 the Behavioral Risk Factor Surveillance System protocol was revised to include only the listed-bank stratum.
Linda Piekarski
Z-SCORE
The z-score is a statistical transformation that specifies
how far a particular value lies from the mean of a normal distribution in terms of standard deviations.
z-scores are particularly helpful in comparing observations that come from different populations and from
distributions with different means, standard deviations,
or both. A z-score has meaning only if it is calculated
for observations that are part of a normal distribution.
z-scores are sometimes referred to as standard
scores. When the values of a normal distribution are
transformed into z-scores, the transformed distribution
is said to be standardized such that the new distribution has a mean equal to 0 and a standard deviation
equal to 1.
The z-score for any observation is calculated by subtracting the population mean from the value of the observation and dividing the difference by the population standard deviation, or

$$z = \frac{x - \mu}{\sigma}.$$

Positive z-scores mean that the observation in question is greater than the mean; negative z-scores mean that it is less than the mean. For instance, an observation
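A minimal sketch of the transformation (Python; the scores, means, and standard deviations are hypothetical):

def z_score(x, mu, sigma):
    # z = (x - mu) / sigma: distance from the mean in standard deviations.
    return (x - mu) / sigma

# A score of 80 on a test with mean 70 and standard deviation 5 is more
# extreme than a score of 85 on a test with mean 75 and deviation 10.
print(z_score(80, 70, 5), z_score(85, 75, 10))   # 2.0 and 1.0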
Index
Entry titles and their page numbers are in bold; illustrative material is identified by (figure) and (table).
Abandoned calls. See Predictive dialing
ABANY (legal abortion if a woman wants one for any
reason), 1:301
ABC News/Washington Post Poll, 1:1–2, 1:149–150
National Election Poll and, 2:497–498
on presidential approval ratings, 1:30
tracking polls of, 2:903–904
Ability, satisficing and, 2:798
Absolute agreement. See Reliability
Absolute frequency. See Relative frequency
Access lines, 1:2
Accuracy. See Precision
Accuracy, error vs., xxxvi
Acquiescence response bias, 1:3–4, 1:37
Action codes, 2:887–888
Action research. See Research design
Active listening skills. See Refusal avoidance
training (RAT)
Active screening. See Screening
Activities of daily living (ADLs), 2:502
Adaptations, translation and, 1:416–417
Adaptive cluster sampling (ACS), 1:4–6, 1:5 (figure)
Adaptive designs, 2:742
Adaptive sampling, 1:4–6, 1:5 (figure), 2:690, 2:742
adaptive cluster sampling (ACS), 1:4–6
adaptive web sampling, 1:6
background, 1:4
Adaptive web sampling (AWS), 1:6
Add-a-digit sampling, 1:6–7
Adding random noise. See Perturbation methods
Address-based sampling (ABS), 1:7–8
Address matching. See Matched numbers
Adjustment errors. See Total survey error (TSE)
Ado-files, 2:842
Advance contact, 1:8–9
Advance letter, 1:9–11, 1:259–260
Advertising Research Foundation (ARF), 1:156
Agency for Health Care Quality and Research, 1:94
F-tests, 1:295–296
Full coding, selective vs., 1:55
Fuller, W. A., 1:213, 2:868
Full orthogonal balance, 1:48
Fund-raising under the guise of research.
See FRUGing
Gabler, Siegfried, 1:195, 1:196
Gallup, George, 1:20, 1:170, 1:297–298, 2:505, 2:589
on bandwagon and underdog effects, 1:49
basic question format and, 1:29–30
Landon/Roosevelt election, 2:852
National Council on Public Polls (NCPP)
and, 2:496
precision journalism and, 2:600
on public opinion, 2:637
Gallup and Robinson, 1:297
Gallup International Association, 1:299
Gallup Organization, 1:29–31, 1:298
Gallup Poll, 1:298–299, 1:419, 2:769
Game schema, 1:313
Gamson, William A., 1:396
Gatekeepers, 1:299300
Gawiser, Sheldon, 2:497
Gaziano, Cecilie, 1:391
Gender, mode effects and, 1:476
General callback, 1:73
General inverse sampling. See Inverse sampling
Generalized least squares (GLS), 2:568
General Motors (GM), 1:20
General Social Survey (GSS), 1:134, 1:300–302, 1:354
content of, 1:301
growth of, 1:301
impact of, 1:302
level of analysis, 1:420
NORC and, 2:505, 2:506
origins of, 1:301
random order used in, 2:681
research opportunities, 1:301–302
Geographic screening, 1:302–303
German Socio-Economic Panel, 2:571
Gestalt psychology, 1:303–305
Ghosh, Malay, 2:822–823
Gibbs sampling. See Small area estimation
Global Attitudes Project (Pew Center), 2:583
Global Well-Being Index (Andrews & Withey), 2:652
Glynn, Carroll J., 2:643, 2:832
Goodman, Leo, 2:824
Goodnight, Jim, 2:796
Goyder, John, 2:869
Gramm-Leach-Bliley Act, 1:275
correcting, 1:381
managing, 1:380–381
measuring, 1:380
preventing, 1:378–380
Interviewer-respondent matching. See Sensitive topics
Interviewer's Manual (University of Michigan), 1:390, 1:392
Interviewer talk time. See Predictive dialing
Interviewer training, 1:381–384
additional types of, 1:382–384
basic and project-specific, 1:382, 1:382 (table)
standards, 1:384
training elements and expectations, 1:381–382
Interviewer training packet. See Training packet
Interviewer variance, 1:384–385
Interviewer wait time. See Predictive dialing
Interviewing, 1:385–388
calling rules, 1:388
completed, partial, 1:282
face-to-face, 1:261
fallback statements, 1:266–267
history of, 1:385–386
interviewer characteristics and, 1:386
interviewer effects and, 1:386
interviewer falsification, 1:267–270
interviewer training and, 1:386–387
key informant, 1:407
monitoring, supervision, validation, 1:387–388
open-ended questions and, 2:548
training packets and, 2:904–906
Intraclass correlation coefficient. See ρ (Rho)
Intracluster homogeneity, 1:388–390
Introduction, 1:390–393, 2:905
Introduction to Variance Estimation (Wolter), 2:794
Intrusiveness. See Sensitive topics
Inverse sampling, 1:393–395
applications, 1:394
designs, 1:394–395
iPoll Database. See Roper Center for Public Opinion Research
Issue-attention cycle, 1:13
Issue definition (framing), 1:395–397
Issue publics. See Nonattitude
Item, 2:652
Item bank. See Item response theory
Item characteristic curve. See Item response theory
Item count technique. See Sensitive topics
Item discrimination, 1:398
Item nonresponse. See Missing data
Item order effects. See Question order effects
Item order randomization, 1:397
Item response theory, 1:398–402