Theresa Jackson Hughes, MPH American College Health Association December 2006
Biostatistics
Research Methods
To do successful research, you don't need to know everything, you just need to know of one thing that isn't known.
Arthur Schawlow
That's the nature of research - you don't know what in hell you're doing.
Harold "Doc" Edgerton
If we knew what it was we were doing, it would not be called research, would it?
Albert Einstein
Sampling
What is your population of interest?
To whom do you want to generalize your results?
All students (18 and over)
Undergraduates only
Greeks
Athletes
Other
Sampling
A sample is a smaller (but hopefully representative) collection of units from a population used to determine truths about that population (Field, 2005)
Why sample?
Resources (time, money) and workload
Gives results with known accuracy that can be calculated mathematically
The sampling frame is the list from which the potential respondents are drawn
Registrar's office
Class rosters
Must assess sampling frame errors
Types of Samples
Probability (Random) Samples
Simple random sample
Systematic random sample
Stratified random sample
  Proportionate
  Disproportionate
Cluster sample
Non-Probability Samples
Convenience sample
Purposive sample
Quota sample
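The probability designs listed above can be sketched in a few lines of Python. This is a minimal illustration, not a survey-grade tool: the sampling frame of student IDs, the stratum names, and all function names are hypothetical, and the stratified version rounds each stratum's allocation, so totals may differ by a unit or two from the target.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n units from the frame, each unit equally likely, without replacement."""
    rng = random.Random(seed)
    return rng.sample(frame, n)

def systematic_random_sample(frame, n, seed=None):
    """Pick a random start, then take every k-th unit from the frame."""
    if n <= 0 or n > len(frame):
        raise ValueError("n must be between 1 and the size of the frame")
    rng = random.Random(seed)
    k = len(frame) // n            # sampling interval
    start = rng.randrange(k)       # random start within the first interval
    return [frame[start + i * k] for i in range(n)]

def proportionate_stratified_sample(strata, fraction, seed=None):
    """Sample the same fraction from every stratum (proportionate allocation)."""
    rng = random.Random(seed)
    return {
        name: rng.sample(units, round(len(units) * fraction))
        for name, units in strata.items()
    }

# Hypothetical sampling frame: 1,000 student IDs from a registrar's list.
frame = list(range(1, 1001))
srs = simple_random_sample(frame, 100, seed=1)
systematic = systematic_random_sample(frame, 100, seed=1)

# Hypothetical strata: 800 undergraduates and 200 graduate students.
strata = {"undergraduate": frame[:800], "graduate": frame[800:]}
stratified = proportionate_stratified_sample(strata, 0.10, seed=1)
```

A disproportionate stratified sample would use a different fraction per stratum, typically to oversample small groups.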
Sample Size
Depends on expected response rate
Average 85% for paper
Final sample desired / 0.85 = number of people to sample
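The adjustment above is a single division. A small sketch, with a hypothetical function name and the slide's 85% figure as the default response rate:

```python
import math

def required_initial_sample(final_sample_desired, expected_response_rate=0.85):
    """Number of people to invite so that the expected number of
    respondents reaches the desired final sample size."""
    if not 0 < expected_response_rate <= 1:
        raise ValueError("expected_response_rate must be in (0, 1]")
    # Round up: you cannot invite a fraction of a person.
    return math.ceil(final_sample_desired / expected_response_rate)

# Hypothetical target of 1,000 completed surveys.
invites_paper = required_initial_sample(1000)
# A lower assumed response rate (hypothetical 30%) needs a much larger draw.
invites_low = required_initial_sample(1000, expected_response_rate=0.30)
```

The response rate is an estimate, so the result is a planning figure rather than a guarantee.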
Random error
Unrelated to true measures
Example: Momentary fatigue
Validity
The extent to which a test measures what it is supposed to measure
A subjective judgment made on the basis of experience and empirical indicators
Asks "Is the test measuring what you think it's measuring?"
Affected by systematic error/bias
In order to be valid, a test must be reliable; but reliability does not guarantee validity.
Levels of Measurement
Nominal
  Gender: Male, Female
  Vaccinations: Yes, No, Unsure
Ordinal
  Personal health status: Excellent, Very good, Good, Fair, Poor
  Last 30 days: Never used, Not in last 30 days, 1-2 days, 3-5 days, 6-9 days, 10-19 days, 20-29 days, All 30 days
Interval
  Body Mass Index (BMI)
Ratio
  Number of drinks
  Number of sexual partners
  Perception percentages
  Blood alcohol concentration (BAC)
Biostatistics
It is commonly believed that anyone who tabulates numbers is a statistician. This is like believing that anyone who owns a scalpel is a surgeon.
R. Hooke
Types of Statistics
Descriptive statistics
  Describe the basic features of data in a study
  Provide summaries about the sample and measures
Inferential statistics
  Investigate questions, models, and hypotheses
  Infer population characteristics based on sample
  Make judgments about what we observe
Descriptive Statistics
Central Tendency: Mode, Median, Mean
Variation: Range, Variance, Standard Deviation
Frequency
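As a rough illustration, each of these summaries can be computed with Python's standard library. The `drinks` values below are made-up data, not survey results, and the variance and standard deviation use the sample (n - 1) formulas.

```python
import statistics
from collections import Counter

# Hypothetical responses to "How many drinks?" (not real survey data).
drinks = [0, 2, 4, 4, 5, 9]

# Central tendency
mean = statistics.mean(drinks)
median = statistics.median(drinks)
mode = statistics.mode(drinks)

# Variation
value_range = max(drinks) - min(drinks)
variance = statistics.variance(drinks)   # sample variance (divides by n - 1)
std_dev = statistics.stdev(drinks)       # square root of the sample variance

# Frequency table: how often each value occurs
frequency = Counter(drinks)

print(mean, median, mode, value_range, variance, round(std_dev, 3))
print(sorted(frequency.items()))
```

The standard deviation is the square root of the variance, which is why it is reported in the same units as the original measure.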
[Frequency table: general health status. Valid categories: 1 excellent, 2 very good, 3 good, 4 fair, 5 poor, 6 don't know, Total. Missing: System. Total.]
Descriptive Statistics

Variable                     N      Range   Minimum  Maximum  Mean     Std. Deviation  Variance
Q48 Weight in pounds         51935  534     52       586      153.16   35.791          1281.031
HT_INCH Height in Inches     52017  56.00   48.00    104.00   67.2035  4.01241         16.099
Q13 How many drinks          53374  88      0        88       4.42     4.401           19.370
Q12 Hours alcohol            53326  65      0        65       2.99     2.726           7.430
BAC Blood Alcohol Content    50604  2.47    .00      2.47     .0731    .08357          .007
Valid N (listwise)           50218
Hypotheses
Null hypotheses
Presumed true until statistical evidence in the form of a hypothesis test indicates otherwise
  There is no effect/relationship
  There is no difference in means
Alternative hypotheses
Tested using inferential statistics
  There is an effect/relationship
  There is a difference in means
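Null is true
  Reject null: Type I error (probability = Alpha)
  Fail to reject null: CORRECT NONREJECTION (probability = 1 - Alpha)
Null is false
  Reject null: CORRECT REJECTION (probability = 1 - Beta, Power)
  Fail to reject null: Type II error (probability = Beta)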
Effect Size
Measure of the strength of the relationship between two variables
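One common effect size for a difference between two group means is Cohen's d, the mean difference divided by the pooled standard deviation. The slides do not name a specific measure, so treat the choice of Cohen's d as an assumption. The sketch below uses the group statistics from this deck's independent-samples example on number of sexual partners (female: mean 1.34, SD 2.017, N 32687; male: mean 1.82, SD 3.627, N 18474).

```python
import math

def cohens_d(mean1, sd1, n1, mean2, sd2, n2):
    """Standardized mean difference using the pooled standard deviation."""
    pooled_variance = (
        (n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2
    ) / (n1 + n2 - 2)
    return (mean1 - mean2) / math.sqrt(pooled_variance)

# Group statistics as reported in the deck (female first, then male).
d = cohens_d(1.34, 2.017, 32687, 1.82, 3.627, 18474)
print(f"Cohen's d = {d:.2f}")
```

By Cohen's conventional benchmarks (roughly 0.2 small, 0.5 medium, 0.8 large), the magnitude here is small, even though the same difference is highly statistically significant in a sample this large.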
One-Sample Test (Test Value = 5)
Gen health: t = -30.352, df = 53373
95% Confidence Interval of the Difference: Lower -.62, Upper -.54
a. Alternative hypothesis states that the proportion of cases in the first group < .2. b. Based on Z Approximation.
Men and women report significantly different numbers of sexual partners over the past 12 months
Hypotheses: $H_0: \mu_1 = \mu_2$, $H_1: \mu_1 \neq \mu_2$
Test: Independent Samples t-test OR One-way ANOVA
Result: Reject null

Group Statistics: Partners you had

Sex     N      Mean  Std. Deviation  Std. Error Mean
female  32687  1.34  2.017           .011
male    18474  1.82  3.627           .027

Independent Samples Test: Partners you had

Levene's Test for Equality of Variances: F = 867.978, Sig. = .000

t-test for Equality of Means
                             t        df         95% CI of the Difference (Lower, Upper)
Equal variances assumed      -19.360  51159      -.532, -.434
Equal variances not assumed  -16.704  25065.988  -.540, -.426
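Because Levene's test is significant, the "equal variances not assumed" row is the one to read. As a rough check, that row can be approximated from the group statistics alone (female: mean 1.34, SD 2.017, N 32687; male: mean 1.82, SD 3.627, N 18474). The function name is hypothetical, and the inputs are the rounded values printed in the output, so the result will be close to, but not exactly, the reported t and df.

```python
import math

def welch_t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Return (t, df) for a two-sample t-test that does not assume
    equal variances, computed from summary statistics only."""
    v1 = sd1 ** 2 / n1          # squared standard error of group 1 mean
    v2 = sd2 ** 2 / n2          # squared standard error of group 2 mean
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

t_stat, df = welch_t_from_summary(1.34, 2.017, 32687, 1.82, 3.627, 18474)
print(f"t = {t_stat:.2f}, df = {df:.0f}")
```

The small gap between this result and the reported t = -16.704 comes from using means rounded to two decimals.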
Descriptives
Groups: residence hall, frat/sorority house, other university housing, off campus, with parents, other, Total

ANOVA: Blood Alcohol Content

Source          Sum of Squares  df     Mean Square  F       Sig.
Between Groups  3.188           5      .638         92.123  .000
Within Groups   348.695         50376  .007
Total           351.884         50381
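The F statistic in the ANOVA output is the between-groups mean square divided by the within-groups mean square, and each mean square is a sum of squares divided by its degrees of freedom. The sketch below recomputes it from the printed sums of squares (3.188 on 5 df between groups, 348.695 on 50376 df within groups). It also computes eta squared, the share of total variance explained by group, which is an added illustration and not part of the original output.

```python
def anova_from_sums_of_squares(ss_between, df_between, ss_within, df_within):
    """Return (F, eta_squared) for a one-way ANOVA from its sums of squares."""
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    f_stat = ms_between / ms_within
    eta_squared = ss_between / (ss_between + ss_within)
    return f_stat, eta_squared

# Values as printed in the ANOVA table for Blood Alcohol Content.
f_stat, eta_squared = anova_from_sums_of_squares(3.188, 5, 348.695, 50376)
print(f"F = {f_stat:.1f}, eta squared = {eta_squared:.4f}")
```

The recomputed F is close to the reported 92.123 (the printed sums of squares are rounded). Under this calculation, residence explains less than 1% of the variance in BAC, which illustrates how a highly significant result in a very large sample can still be small in practical terms.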
Crosstab: Ever - Depression (yes, no, Total)

Chi-Square Tests

Test                          Value      df  Asymp. Sig. (2-sided)
Pearson Chi-Square            2.185 (b)  1   .139
Continuity Correction (a)     2.122      1   .145
Likelihood Ratio              2.211      1   .137
Fisher's Exact Test: Exact Sig. .073
Linear-by-Linear Association
N of Valid Cases

a. Computed only for a 2x2 table
b. 0 cells (.0%) have expected count less than 5. The minimum expected count is 715.62.
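For a 2x2 table the chi-square statistic has 1 degree of freedom, and in that special case its p-value can be computed with the standard library, because a chi-square variable with 1 df is the square of a standard normal variable. The sketch below recovers the asymptotic significance values from the reported statistics (2.185, 2.122, 2.211). The function name is hypothetical, and the shortcut applies only when df = 1.

```python
import math

def chi_square_p_value_df1(chi_square):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom.

    Uses the identity P(X > x) = erfc(sqrt(x / 2)) for X ~ chi-square(1).
    """
    if chi_square < 0:
        raise ValueError("chi-square statistic cannot be negative")
    return math.erfc(math.sqrt(chi_square / 2))

p_pearson = chi_square_p_value_df1(2.185)
p_continuity = chi_square_p_value_df1(2.122)
p_likelihood = chi_square_p_value_df1(2.211)
print(f"{p_pearson:.3f} {p_continuity:.3f} {p_likelihood:.3f}")
```

All three p-values exceed .05, so the null hypothesis of no association is not rejected at that level.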
A significant association does not indicate causation
Statistical significance is not always the same as practical significance
Multiple factors contribute to whether your results are significant
It gets easier and easier as you practice!
Questions???