
Research Design & Data Collection: The Basics
Research & Data Basics

Review of Basic Terminology


• A variable is a thing (a concept, property, etc.) that:
– we can name and either qualify (by assigning sub-names or adjectives) or
quantify (by counting or measuring)
– can assume more than one value

• A measurement or score is the value of a variable measured
for a particular individual

• A case, unit of observation, or data-point is a set of
simultaneous measurements of different variables for an
individual

• Data (or a data matrix) are a collection of cases measured for
multiple individuals

01:830:400 Spring 2019 2



Data Matrix (Data Frame in R)

[Figure: an example data matrix, with labels marking a variable, a measurement, and a case]
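The terminology above can be sketched in a few lines of code. This is only an illustration in plain Python (not the R data frame the slide refers to), and the variable names and values are made up: each row is a case, each column is a variable, and each cell is a measurement.

```python
# A tiny data matrix with invented values.
# Rows are cases, columns are variables, cells are measurements.
data = [
    {"height_cm": 170.2, "weight_kg": 65.0, "smoker": "no"},
    {"height_cm": 182.5, "weight_kg": 80.3, "smoker": "yes"},
    {"height_cm": 158.9, "weight_kg": 54.1, "smoker": "no"},
]

# One case (row): all measurements for a single individual.
case = data[1]

# One measurement (cell): the value of one variable for one individual.
measurement = data[1]["height_cm"]

# One variable (column): the same property measured across all individuals.
variable = [row["height_cm"] for row in data]

print(case)
print(measurement)
print(variable)
```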


Types of Data
• Qualitative/Categorical data occur when we assign
objects/events into labeled (i.e., nominal or ordinal) groups
– E.g., race, gender, yes/no response
– In R, categorical variables are called “factors” and can be ordered
(ordinal) or unordered (nominal)

• Quantitative/Numerical data occur when we obtain some
number that describes the quantitative trait of interest.
– This includes interval & ratio scale measurements, as well as ordinal
ranks
– These numbers can be either discrete or continuous
– E.g., height, weight, income
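One way to see why the distinction matters is to look at which operations make sense for each type. The sketch below uses plain Python as a stand-in for R factors; the responses and the ordering of education levels are assumptions chosen for illustration.

```python
from collections import Counter
from statistics import mean

# Hypothetical responses, for illustration only.
eye_color = ["brown", "blue", "brown", "green"]  # nominal: labels with no order
education = ["BA", "HS", "PhD", "BA"]            # ordinal: labels with an order
height_cm = [170.2, 182.5, 158.9, 175.0]         # quantitative, continuous
siblings = [0, 2, 1, 3]                          # quantitative, discrete

# Nominal data support counting, but not sorting or averaging.
eye_counts = Counter(eye_color)

# Ordinal data also support sorting, once the level order is stated explicitly.
education_levels = ["HS", "BA", "PhD"]  # assumed ordering, lowest to highest
education_sorted = sorted(education, key=education_levels.index)

# Quantitative data support arithmetic, such as taking a mean.
mean_height = mean(height_cm)
mean_siblings = mean(siblings)

print(eye_counts)
print(education_sorted)
print(mean_height, mean_siblings)
```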


Uses for statistics


• Description (descriptive statistics)
– “Statistics” (e.g., mean, SD, IQ, GNP, GDP, SLG, NFL passer rating, etc.)
– Other summaries & visualizations (e.g., tables, plots, EDA)
– E.g., “Who is the GOAT?”, “Which country has a better economy?”

• Prediction (probability theory)
– Ostensibly the purview of probability theory, but often a critical part of
statistics & data science
– E.g., “When will my hard drive fail?”, “What’s the probability that I will be the
tallest person in my class?”, “Who will win the Super Bowl?”

• Inference (inferential statistics)
– The focus of most behavioral research (and of most of this course)
– Estimation & modeling
– Hypothesis testing, decision making, & model selection (CDA)
– Drawing conclusions from incomplete data
• E.g., from sample statistics → population parameters
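The step from sample statistics to population parameters can be illustrated with a small simulation. This is a minimal sketch, not a method taken from the course: the population (heights with mean 170 and SD 10), the sample size, and the normal-approximation confidence interval are all assumptions made for the example.

```python
import random
from statistics import mean, stdev

random.seed(400)  # fixed seed so the sketch is reproducible

# Hypothetical population: 100,000 heights in cm (made-up parameters).
population = [random.gauss(170, 10) for _ in range(100_000)]
population_mean = mean(population)  # the parameter; normally unknown

# In practice we only observe a random sample, here 100 individuals.
sample = random.sample(population, 100)
sample_mean = mean(sample)  # a descriptive statistic of the sample
standard_error = stdev(sample) / len(sample) ** 0.5

# Approximate 95% confidence interval for the population mean
# (normal approximation: 1.96 standard errors on either side).
ci_low = sample_mean - 1.96 * standard_error
ci_high = sample_mean + 1.96 * standard_error

print(f"sample mean = {sample_mean:.1f}, 95% CI = ({ci_low:.1f}, {ci_high:.1f})")
print(f"population mean = {population_mean:.1f}")
```

The sample mean describes only the 100 observed individuals; the interval is the inferential part, a statement about the unobserved population mean.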


Basic Research Designs (for inference)


• True Experiments
• Observational (correlational) Studies
– This includes “quasi-experiments” and “natural experiments”

• Both designs seek to determine the relationship between one
or more response (dependent) variables and one or more
explanatory (independent) variables.

• However, different research designs:
– produce different forms of data
– answer different types of questions
– may require different statistical techniques


True Experiments
• The goal of an experiment is to demonstrate a cause-and-effect
relationship between two (or more) variables
– I.e., to show that changing the value of one variable causes changes to
occur in a second variable.

• In a simple experiment:
– The explanatory (independent) variable is manipulated to create
treatment conditions.
– The response (dependent) variable is observed and measured to obtain
scores for a group of individuals in each of the treatment conditions.

• The critical elements of an experiment are:
– Manipulation of an independent variable
– Control of all extraneous variables (e.g., using random assignment)
– Measurement and comparison of the dependent variable across conditions

Observational Studies
• A study is observational if it simply measures the two (or
more) variables as they exist naturally, without manipulation.

• Correlation (alone) between two variables does not imply
causation!

http://xkcd.com/552/
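A small simulation shows how two variables can be correlated without either causing the other. In this sketch a third variable (a confound) drives both; the variable names, the noise model, and the sample size are all hypothetical.

```python
import random

random.seed(552)  # fixed seed so the sketch is reproducible


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


n = 5000
# Hypothetical confound: temperature (standardized) drives both variables.
temperature = [random.gauss(0, 1) for _ in range(n)]
ice_cream_sales = [t + random.gauss(0, 1) for t in temperature]
sunburn_cases = [t + random.gauss(0, 1) for t in temperature]

# Neither variable appears in the other's formula, yet they are correlated.
r = pearson_r(ice_cream_sales, sunburn_cases)
print(f"r = {r:.2f}")
```

Under this noise model the correlation is about 0.5 in expectation, even though ice cream sales and sunburn cases have no causal link to each other in the simulation.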


Sampling, Assignment, & Validity


• Random sampling occurs when individuals are selected
such that each member of the population has an equal
chance of inclusion
– This is critical for external validity. Failure to sample randomly may
result in statistics that don’t reflect the whole population
– Stratified sampling for reduced variability

• Random assignment occurs when individuals are assigned
to different treatment groups using a random process
– This is critical for internal validity. Failure to assign randomly confounds
the independent variable; any measured difference in a dependent
variable could be due solely to the assignment
– Blocking for reduced variability
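The two random processes above are easy to confuse, so here is a minimal sketch that keeps them separate. The population of 1,000 numbered individuals and the group sizes are assumptions for illustration; stratified sampling and blocking are not shown.

```python
import random

random.seed(2019)  # fixed seed so the sketch is reproducible

# Hypothetical population of 1,000 individuals, identified by number.
population = list(range(1000))

# Random sampling: every individual has the same chance of being selected.
# This decides WHO is in the study (external validity).
sample = random.sample(population, 20)

# Random assignment: shuffle the sampled individuals, then split them.
# This decides WHICH CONDITION each participant gets (internal validity).
shuffled = sample[:]
random.shuffle(shuffled)
treatment = shuffled[:10]
control = shuffled[10:]

print("treatment:", sorted(treatment))
print("control:  ", sorted(control))
```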


Issues with Confounds & Sampling


• Surveys, convenience sampling, response bias

• Poorly controlled experiments vs. well-controlled
observational studies
– Confounds, demand characteristics, & experimenter effects
– Placebo & nocebo effects
– Double-blind studies
– The negative health effects of smoking
– Cops vs. Crime in DC (natural experiments)


Sampling Bias Example: Landon vs. FDR


A historical example of a biased sample yielding misleading results

In 1936, Landon was the Republican nominee, opposing the re-election of FDR.


Literary Digest Poll


● The Literary Digest polled about 10 million Americans, and got responses
from about 2.4 million.
● The poll showed that Landon would likely be the overwhelming winner
and FDR would get only 43% of the votes.
● Election result: FDR won, with 62% of the votes.
● The magazine was completely discredited because of the poll, and
was soon discontinued.


What Went Wrong?


• The magazine had surveyed three groups:
– its own readers
– registered car owners
– registered telephone users

• These groups all had incomes well above the national
average (during the Great Depression), which produced a
sample of voters who were much more likely to support the
Republican candidate than the average American at the time

• The large sample size (2.4 million) did not help because the
sample was biased.
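The point that a large biased sample can do worse than a small random one can be checked with a simulation. This is a sketch under invented assumptions, not a reconstruction of the 1936 data: the share of higher-income voters (30%) and the support rates for a candidate "A" (60% vs. 30%) are made-up numbers.

```python
import random

random.seed(1936)  # fixed seed so the sketch is reproducible


def make_voter():
    """Create one hypothetical voter; all rates are invented."""
    higher_income = random.random() < 0.30
    p_support = 0.60 if higher_income else 0.30
    return {"higher_income": higher_income, "votes_a": random.random() < p_support}


def share_for_a(voters):
    """Proportion of the given voters who support candidate A."""
    return sum(v["votes_a"] for v in voters) / len(voters)


population = [make_voter() for _ in range(200_000)]
true_share = share_for_a(population)

# Biased sample: very large, but drawn only from higher-income voters.
higher_income_voters = [v for v in population if v["higher_income"]]
biased_sample = random.sample(higher_income_voters, 50_000)
biased_share = share_for_a(biased_sample)

# Random sample: much smaller, but drawn from the whole population.
random_sample = random.sample(population, 1_000)
random_share = share_for_a(random_sample)

print(f"true share:    {true_share:.3f}")
print(f"biased sample: {biased_share:.3f}  (n = 50,000)")
print(f"random sample: {random_share:.3f}  (n = 1,000)")
```

Increasing the size of the biased sample only makes its estimate more precise around the wrong value; it does not move the estimate toward the true share.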
