Biostatistics
Statistical significance
Other key terms
Appropriate statistical tests
Fun examples from the Spring 2005 dataset!
Research
Methods
What exactly is research?
Important Components of Empirical Research
Problem statement, research questions, purposes, benefits
Theory, assumptions, background literature
Variables and hypotheses
Operational definitions and measurement
Research design and methodology
Instrumentation, sampling
Data analysis
Conclusions, interpretations, recommendations
Sampling
What is your population of interest?
To whom do you want to generalize your results?
All students (18 and over)
Undergraduates only
Greeks
Athletes
Other
Sampling
A sample is a smaller (but hopefully representative) collection of units from a population used to determine truths about that population (Field, 2005)
Why sample?
Resources (time, money) and workload
Gives results with known accuracy that can be calculated mathematically
Types of Samples
Probability (Random) Samples
Simple random sample
Systematic random sample
Stratified random sample
Proportionate
Disproportionate
Cluster sample
Non-Probability Samples
Convenience sample
Purposive sample
Quota
Sample Size
Depends on expected response rate
Average response rate of 85% for paper surveys
Final sample desired / .85 = sample to draw
All students       Final Desired N
600-2,999          600
3,000-9,999        700
10,000-19,999      800
20,000-29,000      900
30,000             1,000
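The response-rate adjustment on this slide can be sketched in a few lines of Python. This is only an illustration: the function name `sample_to_draw` is hypothetical, and rounding up to a whole respondent is an assumption the slide does not state.

```python
import math


def sample_to_draw(final_desired_n, response_rate=0.85):
    """Surveys to distribute so that roughly final_desired_n are returned.

    Rounding up is an assumption; the slide gives only the division.
    """
    if not 0 < response_rate <= 1:
        raise ValueError("response_rate must be in (0, 1]")
    return math.ceil(final_desired_n / response_rate)


# Final desired N values from the table, at the 85% paper response rate
for desired in (600, 700, 800, 900, 1000):
    print(desired, "->", sample_to_draw(desired))
```

For a final desired N of 600, the sketch gives 706 surveys to distribute.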
Random error
Unrelated to true measures
Example: Momentary fatigue
Validity
The extent to which a test measures what it is
supposed to measure
A subjective judgment made on the basis of
experience and empirical indicators
Asks "Is the test measuring what you think it's measuring?"
Affected by systematic error/bias
Levels of
Measurement
Nominal
Gender: Male, Female
Vaccinations: Yes, No, Unsure
Ordinal
Personal health status: Excellent, Very good, Good, Fair, Poor
Last 30 days: Never used, Not in last 30 days, 1-2 days, 3-5 days, 6-9 days, 10-19 days, 20-29 days, All 30 days
Interval
Ratio
Number of drinks
Number of sexual partners
Perception percentages
Blood alcohol concentration (BAC)
Biostatistics
Types of Statistics
Descriptive statistics
Describe the basic features of data in a
study
Provide summaries about the sample and
measures
Inferential statistics
Investigate questions, models, and
hypotheses
Infer population characteristics based on
sample
Make judgments about what we observe
Descriptive Statistics
Central Tendency
Mode
Median
Mean
Variation
Range
Variance
Standard Deviation
Frequency
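As a small illustration of the measures listed on this slide, the sketch below computes each one with Python's standard `statistics` module. The six values in `drinks` are hypothetical and are not taken from the Spring 2005 dataset.

```python
import statistics

# Hypothetical responses: drinks reported by six students (not from the dataset)
drinks = [0, 2, 3, 3, 5, 8]

summary = {
    # Central tendency
    "mode": statistics.mode(drinks),
    "median": statistics.median(drinks),
    "mean": statistics.mean(drinks),
    # Variation
    "range": max(drinks) - min(drinks),
    "variance": statistics.variance(drinks),  # sample variance (divides by n - 1)
    "std_dev": statistics.stdev(drinks),      # square root of the sample variance
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

The standard deviation is in the same units as the data, which is why it is usually reported alongside the mean rather than the variance.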
Descriptive Statistics
Examples
Categorical Variables (Nominal/Ordinal)
Q1 Gen health
                         Frequency   Percent   Valid Percent   Cumulative Percent
Valid     1 excellent         9145      16.9            17.0                 17.0
          2 very good        23767      43.9            44.2                 61.2
          3 good             16442      30.4            30.6                 91.8
          4 fair              3737       6.9             6.9                 98.7
          5 poor               565       1.0             1.1                 99.8
          6 don't know         132        .2              .2                100.0
          Total              53788      99.4           100.0
Missing   System               323        .6
Total                        54111     100.0
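The three percentage columns in the Q1 frequency table differ only in their denominators. A minimal sketch, using the counts from the table, shows how each column is derived; the variable names are illustrative only.

```python
# Counts from the Q1 Gen health frequency table
counts = {
    "excellent": 9145,
    "very good": 23767,
    "good": 16442,
    "fair": 3737,
    "poor": 565,
    "don't know": 132,
}
missing = 323

valid_total = sum(counts.values())   # respondents who answered the question
total = valid_total + missing        # all respondents, including missing

rows = []
cumulative = 0
for label, n in counts.items():
    cumulative += n
    rows.append((
        label,
        n,
        round(100 * n / total, 1),                 # Percent: out of all cases
        round(100 * n / valid_total, 1),           # Valid Percent: out of valid cases
        round(100 * cumulative / valid_total, 1),  # Cumulative Percent
    ))

for row in rows:
    print(row)
```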
Descriptive Statistics
Examples
Categorical Variables (Nominal/Ordinal)
Q49 Year in school * Q46 Sex Crosstabulation
Q49 Year in school            Q46 Sex: 1 female      Q46 Sex: 2 male        Total
(category labels missing)     Count   % of Total     Count   % of Total     Count   % of Total
Row 1                          7366        14.5%      4154         8.2%     11520        22.7%
Row 2                          6755        13.3%      3678         7.2%     10433        20.6%
Row 3                          6195        12.2%      3333         6.6%      9528        18.8%
Row 4                          5192        10.2%      2676         5.3%      7868        15.5%
Row 5                          1380         2.7%       985         1.9%      2365         4.7%
Row 6                          5088        10.0%      3246         6.4%      8334        16.4%
Row 7                           203          .4%       105          .2%       308          .6%
Row 8                           266          .5%       145          .3%       411          .8%
Total                         32445        63.9%     18322        36.1%     50767       100.0%
Descriptive Statistics
Examples
Continuous Variables (Interval/Ratio)
Descriptive Statistics
                             N       Range   Minimum   Maximum   Mean      Std. Deviation   Variance
Q48 Weight in pounds         51935   534     52        586       153.16    35.791           1281.031
HT_INCH Height in Inches     52017   56.00   48.00     104.00    67.2035   4.01241          16.099
Q13 How many drinks          53374   88      0         88        4.42      4.401            19.370
Q12 Hours alcohol            53326   65      0         65        2.99      2.726            7.430
BAC Blood Alcohol Content    50604   2.47    .00       2.47      .0731     .08357           .007
Valid N (listwise)           50218
Hypotheses
Null hypotheses
Presumed true until statistical evidence
in the form of a hypothesis test indicates
otherwise
There is no effect/relationship
There is no difference in means
Alternative hypotheses
Tested using inferential statistics
There is an effect/relationship
There is a difference in means
Alpha: probability of making a Type I error
Power: probability of correctly rejecting the null (1 - Beta)
Effect size: measure of the strength of the relationship between two variables
                       Null is true                      Null is false
Reject null            Alpha: Type I error               1 - Beta: Power (CORRECT REJECTION)
Fail to reject null    1 - Alpha: CORRECT NONREJECTION   Beta: Type II error
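To make the link between Alpha, Beta, and Power concrete, here is a minimal sketch for a one-tailed, one-sample z test. It is an illustration under stated assumptions, not part of the original slides: the function name `z_test_power` is hypothetical, and the effect, standard deviation, and sample size in the example are made up.

```python
from statistics import NormalDist


def z_test_power(effect, sigma, n, alpha=0.05):
    """Power of an upper-tailed, one-sample z test.

    effect: true mean minus the null mean (hypothetical value)
    sigma:  population standard deviation, assumed known
    """
    z = NormalDist()
    critical = z.inv_cdf(1 - alpha)    # rejection cutoff under the null
    shift = effect * n ** 0.5 / sigma  # how far the true mean moves the test statistic
    return 1 - z.cdf(critical - shift)


power = z_test_power(effect=0.5, sigma=1.0, n=25)
beta = 1 - power  # probability of a Type II error
print(f"power = {power:.3f}, beta = {beta:.3f}")

# With no true effect, the rejection rate is just alpha (the Type I error rate)
print(f"rejection rate under the null = {z_test_power(0.0, 1.0, 25):.3f}")
```

Raising the sample size or the effect size increases power; lowering Alpha reduces it.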
One-Sample Statistics
                   N       Mean   Std. Deviation   Std. Error Mean
How many drinks    53374   4.42   4.401            .019

One-Sample Test
Test Value = 5
                                                                  95% Confidence Interval of the Difference
t         df      Sig. (2-tailed)   Mean Difference               Lower     Upper
-30.352   53373   .000              -.578                         -.62      -.54
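The one-sample t statistic can be approximately reproduced from the summary statistics alone (N = 53374, mean 4.42, standard deviation 4.401, test value 5). The sketch below is a hedged illustration: it uses the rounded values shown in the output, so it will not match the printed t exactly, and it uses a normal approximation for the p-value and confidence interval, which is an assumption that is reasonable only because the degrees of freedom are very large.

```python
import math
from statistics import NormalDist

# Rounded summary statistics from the one-sample output
n, mean, sd, test_value = 53374, 4.42, 4.401, 5

std_error = sd / math.sqrt(n)          # standard error of the mean
mean_diff = mean - test_value
t = mean_diff / std_error              # one-sample t statistic

# Normal approximation (assumption): with df = 53373 the t and z
# distributions are practically indistinguishable.
z = NormalDist()
p_two_tailed = 2 * z.cdf(-abs(t))
half_width = z.inv_cdf(0.975) * std_error
ci = (mean_diff - half_width, mean_diff + half_width)

print(f"SE = {std_error:.3f}, t = {t:.2f}, p = {p_two_tailed:.3f}")
print(f"95% CI of the difference: ({ci[0]:.2f}, {ci[1]:.2f})")
```

The small gap between this t and the printed value comes from rounding the mean to two decimals before computing the difference.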
Test of a single proportion of one categorical variable
20% of college students report their health is excellent
Hypotheses
Ho: p = .20
HA: p < .20 (one-tailed)
Binomial Test
                       Category   N       Observed Prop.   Test Prop.   Asymp. Sig. (1-tailed)
Gen health   Group 1   <= 1       9145    .170             .2           .000a,b
             Group 2   >1         44643   .830
             Total                53788   1.000
a. Alternative hypothesis states that the proportion of cases in the first group < .2.
b. Based on Z Approximation.
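Footnote b says the significance is based on a Z approximation. A minimal sketch of that kind of calculation, using the counts from the table, follows. It omits any continuity correction the software may apply, so treat it as an approximation of the approach rather than a reproduction of the exact output.

```python
import math
from statistics import NormalDist

# Counts from the Binomial Test table
successes, n = 9145, 53788   # "excellent" responses, valid cases
p0 = 0.20                    # test proportion under the null

p_hat = successes / n
std_error = math.sqrt(p0 * (1 - p0) / n)   # standard error under the null
z = (p_hat - p0) / std_error

# One-tailed (lower) p-value, matching HA: p < .20
p_one_tailed = NormalDist().cdf(z)

print(f"observed proportion = {p_hat:.3f}")
print(f"z = {z:.2f}, one-tailed p = {p_one_tailed:.3f}")
```

With a sample this large, even a three-point gap between .170 and .20 produces a very large z, which is why the table reports .000.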
Test of a relationship between two continuous variables
There is a relationship between the number of drinks students report drinking the last time they drank and the number of sex partners they have had within the last school year
Hypotheses
Ho: ρ = 0
HA: ρ ≠ 0
                                           How many drinks   Partners you had
How many drinks     Pearson Correlation    1                 .238**
                    Sig. (2-tailed)                          .000
                    N                      53374             52576
Partners you had    Pearson Correlation    .238**            1
                    Sig. (2-tailed)        .000
                    N                      52576             52896
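The significance of a Pearson correlation can be checked from r and N alone using the standard t transformation. The sketch below applies it to the values in the table (r = .238, N = 52576). The r-squared line is an added illustration of effect size and does not appear in the original output.

```python
import math

# Values from the correlation table
r, n = 0.238, 52576

# t statistic for testing Ho: rho = 0, with n - 2 degrees of freedom
t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Share of variance in one variable accounted for by the other
r_squared = r ** 2

print(f"t = {t:.1f} on {n - 2} df")
print(f"r squared = {r_squared:.3f}")
```

The t value is very large because N is very large, yet r squared is under 6%, a reminder that statistical significance and strength of relationship are separate questions.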
Group Statistics
Partners you had
Sex      N       Mean   Std. Deviation   Std. Error Mean
female   32687   1.34   2.017            .011
male     18474   1.82   3.627            .027

Independent Samples Test
Partners you had
Levene's Test for Equality of Variances: F = 867.978, Sig. = .000
                                                                                                         95% Confidence Interval of the Difference
                              t         df          Sig. (2-tailed)   Mean Difference   Std. Error Difference   Lower    Upper
Equal variances assumed       -19.360   51159       .000              -.483             .025                    -.532    -.434
Equal variances not assumed   -16.704   25065.988   .000              -.483             .029                    -.540    -.426
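Because the equality-of-variances test is significant, the "equal variances not assumed" row is the one to read. A minimal sketch of that unequal-variance (Welch) calculation from the group summary statistics is shown below. The function name `welch_t` is hypothetical, and because it uses the rounded means and standard deviations from the table, the result is close to but not identical with the printed output.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Unequal-variance t statistic, Welch-Satterthwaite df, and SE of the difference."""
    v1 = sd1 ** 2 / n1   # squared standard error, group 1
    v2 = sd2 ** 2 / n2   # squared standard error, group 2
    se_diff = math.sqrt(v1 + v2)
    t = (mean1 - mean2) / se_diff
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df, se_diff


# Rounded group statistics: female first, then male
t, df, se_diff = welch_t(1.34, 2.017, 32687, 1.82, 3.627, 18474)
print(f"t = {t:.2f}, df = {df:.0f}, SE of difference = {se_diff:.3f}")
```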
Descriptives
Blood Alcohol Content
                            N       Mean    Std. Deviation   Std. Error   Minimum   Maximum
residence hall              21285   .0741   .08215           .00056       .00       1.27
frat/sorority house         781     .1127   .09278           .00332       .00       .75
other university housing    3620    .0622   .07357           .00122       .00       1.41
off campus                  18151   .0773   .08539           .00063       .00       2.47
with parents                4279    .0606   .08490           .00130       .00       1.17
other                       2266    .0579   .08296           .00174       .00       1.26
Total                       50382   .0731   .08357           .00037       .00       2.47
ANOVA
Blood Alcohol Content
                  Sum of Squares   df      Mean Square   F        Sig.
Between Groups    3.188            5       .638          92.123   .000
Within Groups     348.695          50376   .007
Total             351.884          50381
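Each entry in the ANOVA table follows from the sums of squares and degrees of freedom. The sketch below rebuilds the mean squares and the F ratio from the printed values. The eta-squared line is an added effect-size illustration that is not part of the original output.

```python
# Values from the ANOVA table for Blood Alcohol Content
ss_between, df_between = 3.188, 5
ss_within, df_within = 348.695, 50376

ms_between = ss_between / df_between   # mean square between groups
ms_within = ss_within / df_within      # mean square within groups
f_ratio = ms_between / ms_within

# Proportion of total variation attributable to group membership
# (added here for illustration; not shown in the original table)
eta_squared = ss_between / (ss_between + ss_within)

print(f"MS between = {ms_between:.3f}, MS within = {ms_within:.3f}")
print(f"F = {f_ratio:.2f}, eta squared = {eta_squared:.4f}")
```

The F ratio is large, but residence type accounts for less than 1% of the variation in BAC, which is worth keeping in mind when interpreting a significant result from a sample of about 50,000.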
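(I) Residence               (J) Residence               Mean Difference (I-J)   Std. Error   Sig.
residence hall              frat/sorority house         -.03865*                .00337       .000
                            other university housing    .01190*                 .00135       .000
                            off campus                  -.00316*                .00085       .003
                            with parents                .01350*                 .00141       .000
                            other                       .01623*                 .00183       .000
frat/sorority house         residence hall              .03865*                 .00337       .000
                            other university housing    .05055*                 .00354       .000
                            off campus                  .03548*                 .00338       .000
                            with parents                .05215*                 .00356       .000
                            other                       .05488*                 .00375       .000
other university housing    residence hall              -.01190*                .00135       .000
                            frat/sorority house         -.05055*                .00354       .000
                            off campus                  -.01506*                .00138       .000
                            with parents                .00160                  .00178       .947
                            other                       .00433                  .00213       .323
off campus                  residence hall              .00316*                 .00085       .003
                            frat/sorority house         -.03548*                .00338       .000
                            other university housing    .01506*                 .00138       .000
                            with parents                .01667*                 .00144       .000
                            other                       .01940*                 .00185       .000
with parents                residence hall              -.01350*                .00141       .000
                            frat/sorority house         -.05215*                .00356       .000
                            other university housing    -.00160                 .00178       .947
                            off campus                  -.01667*                .00144       .000
                            other                       .00273                  .00217       .809
other                       residence hall              -.01623*                .00183       .000
                            frat/sorority house         -.05488*                .00375       .000
                            other university housing    -.00433                 .00213       .323
                            off campus                  -.01940*                .00185       .000
                            with parents                -.00273                 .00217       .809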
Ever - Depression * Frat or sorority?
                                      Frat or sorority?
                                      yes       no         Total
Ever - Depression   yes   Count            681      7692      8373
                          Expected Count   715.6    7657.4    8373.0
                    no    Count            3744     39657     43401
                          Expected Count   3709.4   39691.6   43401.0
Total                     Count            4425     47349     51774
                          Expected Count   4425.0   47349.0   51774.0

Chi-Square Tests
                               Value    df   Asymp. Sig. (2-sided)   Exact Sig. (2-sided)   Exact Sig. (1-sided)
Pearson Chi-Square             2.185b   1    .139
Continuity Correctiona         2.122    1    .145
Likelihood Ratio               2.211    1    .137
Fisher's Exact Test                                                  .141                   .073
Linear-by-Linear Association   2.185    1    .139
N of Valid Cases               51774
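The expected counts and the Pearson chi-square value can be rebuilt from the four observed counts. The sketch below does this in plain Python. The p-value line uses the fact that, for 1 degree of freedom, the chi-square upper-tail probability equals `erfc(sqrt(x / 2))`; that shortcut applies only to the 2 x 2 case shown here.

```python
import math

# Observed counts: rows = ever depressed (yes, no), columns = frat/sorority (yes, no)
observed = [[681, 7692],
            [3744, 39657]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count under independence: row total * column total / N
expected = [[r * c / n for c in col_totals] for r in row_totals]

chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# Upper-tail probability for a chi-square with 1 df
p_value = math.erfc(math.sqrt(chi_square / 2))

print(f"expected (yes, yes) = {expected[0][0]:.1f}")
print(f"chi-square = {chi_square:.3f}, p = {p_value:.3f}")
```

Because p is above .05, the null hypothesis of no relationship between fraternity or sorority membership and ever having depression is not rejected.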
Important Points to
Remember
Questions???