
Theresa Jackson Hughes, MPH
American College Health Association
December 2006

Research Methods
  Sampling Frame and Sampling
  Generalizability
  Bias
  Reliability and Validity
  Levels of measurement

Biostatistics
  Statistical significance
  Other key terms
  Appropriate statistical tests
  Fun examples from the Spring 2005 dataset!

Research Methods

To do successful research, you don't need to know everything, you just need to know of one thing that isn't known.

Arthur Schawlow

That's the nature of research - you don't know what in hell you're doing.

Harold "Doc" Edgerton

If we knew what it was we were doing, it would not be called research, would it?

Albert Einstein

Scientific research is systematic, controlled, empirical, and critical investigation of natural phenomena guided by theory and hypotheses about the presumed relations among such phenomena.

Kerlinger, 1986

Problem statement, research questions, purposes, benefits
Theory, assumptions, background literature
Variables and hypotheses
Operational definitions and measurement
Research design and methodology
Instrumentation, sampling
Data analysis
Conclusions, interpretations, recommendations

Sampling

What is your population of interest?

To whom do you want to generalize your results?

  All students (18 and over)
  Undergraduates only
  Greeks
  Athletes
  Other

Sampling

A sample is a smaller (but hopefully representative) collection of units from a population used to determine truths about that population (Field, 2005)

Why sample?
  Resources (time, money) and workload
  Gives results with known accuracy that can be calculated mathematically

The sampling frame is the list from which the potential respondents are drawn
  Registrar's office
  Class rosters

Must assess sampling frame errors

Types of Samples

Probability (Random) Samples

  Simple random sample
  Systematic random sample
  Stratified random sample
    Proportionate
    Disproportionate
  Cluster sample

Non-Probability Samples

  Convenience sample
  Purposive sample
  Quota sample
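As a rough illustration of how three of the probability designs listed above differ in practice, here is a minimal Python sketch. The function names, the 100-unit frame, and the undergraduate/graduate strata are all hypothetical and are not taken from the slides or the dataset.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n units so that every unit has the same chance of selection."""
    rng = random.Random(seed)
    return rng.sample(frame, n)


def systematic_sample(frame, n, seed=None):
    """Pick a random start, then take every k-th unit from the frame."""
    k = len(frame) // n  # sampling interval
    if k < 1:
        raise ValueError("n cannot exceed the size of the sampling frame")
    start = random.Random(seed).randrange(k)
    return frame[start::k][:n]


def proportionate_stratified_sample(strata, n, seed=None):
    """Sample each stratum in proportion to its share of the population.

    Rounding can make the overall total differ slightly from n.
    """
    rng = random.Random(seed)
    total = sum(len(units) for units in strata.values())
    sample = {}
    for name, units in strata.items():
        n_h = round(n * len(units) / total)  # stratum sample size
        sample[name] = rng.sample(units, n_h)
    return sample


# Hypothetical frame: 100 student IDs, 60 undergraduates and 40 graduates.
frame = list(range(1, 101))
strata = {"undergraduate": frame[:60], "graduate": frame[60:]}

srs = simple_random_sample(frame, 10, seed=1)
sys_sample = systematic_sample(frame, 10, seed=1)
strat = proportionate_stratified_sample(strata, 10, seed=1)
```

A disproportionate stratified design would replace the proportional `n_h` with a size chosen per stratum, for example to oversample a small group.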

Sample Size

Depends on expected response rate

  Average 85% for paper
    FINAL SAMPLE DESIRED / .85 = SAMPLE

  At an expected response rate of 25%
    FINAL SAMPLE DESIRED / .25 = SAMPLE

Size of Campus     Final Desired N
<600               All students
600-2,999          600
3,000-9,999        700
10,000-19,999      800
20,000-29,000      900
30,000             1,000
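The division above can be sketched in a few lines of Python. The function name is hypothetical, and rounding up to a whole student is an assumption that the slides do not state.

```python
import math


def students_to_invite(final_n_desired, expected_response_rate):
    """Return how many students to sample to end up with final_n_desired responses."""
    if not 0 < expected_response_rate <= 1:
        raise ValueError("expected_response_rate must be in (0, 1]")
    # Rounding up is an assumption; the slide only gives the division.
    return math.ceil(final_n_desired / expected_response_rate)


# Final desired N of 600 (campus of 600-2,999 students)
at_85_percent = students_to_invite(600, 0.85)  # 85% expected response
at_25_percent = students_to_invite(600, 0.25)  # 25% expected response
```

With a final desired N of 600, an 85% response rate calls for a sample of 706, while a 25% response rate calls for 2,400.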

Systematic Error or Bias: unknown or unacknowledged error created during the design, measurement, sampling, procedure, or choice of problem studied

Error tends to go in one direction

Examples: Selection, Recall, Social desirability

Random Error

Unrelated to true measures

Example: Momentary fatigue

Reliability

The extent to which a test is repeatable and yields consistent scores
Affected by random error/bias

Validity

The extent to which a test measures what it is supposed to measure
A subjective judgment made on the basis of experience and empirical indicators
Asks "Is the test measuring what you think it's measuring?"
Affected by systematic error/bias

In order to be valid, a test must be reliable; but reliability does not guarantee validity.

Levels of Measurement


Nominal
  Gender: Male, Female
  Vaccinations: Yes, No, Unsure

Ordinal
  Personal health status: Excellent, Very good, Good, Fair, Poor
  Last 30 days: Never used, Not in last 30 days, 1-2 days, 3-5 days, 6-9 days, 10-19 days, 20-29 days, All 30 days

Interval
  Body Mass Index (BMI)

Ratio
  Number of drinks
  Number of sexual partners
  Perception percentages
  Blood alcohol concentration (BAC)

Biostatistics

It is commonly believed that anyone who tabulates numbers is a statistician. This is like believing that anyone who owns a scalpel is a surgeon.

R. Hooke

Gregg Easterbrook

Author Unknown

Types of Statistics

Descriptive statistics
  Describe the basic features of data in a study
  Provide summaries about the sample and measures

Inferential statistics
  Investigate questions, models, and hypotheses
  Infer population characteristics based on sample
  Make judgments about what we observe

Descriptive Statistics

Central Tendency
  Mode
  Median
  Mean

Variation
  Range
  Variance
  Standard Deviation

Frequency
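As a small illustration of these measures, the following Python sketch uses only the standard library. The ten "number of drinks" values are hypothetical and are not drawn from the Spring 2005 dataset.

```python
import statistics
from collections import Counter

# Hypothetical responses to a "how many drinks" question (not real data).
drinks = [0, 2, 2, 3, 4, 4, 4, 5, 7, 9]

# Frequency: how often each value occurs
frequency = Counter(drinks)

# Central tendency
mode = statistics.mode(drinks)
median = statistics.median(drinks)
mean = statistics.mean(drinks)

# Variation
value_range = max(drinks) - min(drinks)
variance = statistics.variance(drinks)  # sample variance (n - 1 denominator)
std_dev = statistics.stdev(drinks)      # square root of the sample variance
```

The sample (n - 1) versions of the variance and standard deviation are used here on the assumption that the data are a sample rather than the whole population.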

Categorical Variables (Nominal/Ordinal)

Q1 Gen health

                        Frequency   Percent   Valid Percent   Cumulative Percent
Valid     1 excellent   9145        16.9      17.0            17.0
          2 very good   23767       43.9      44.2            61.2
          3 good        16442       30.4      30.6            91.8
          4 fair        3737        6.9       6.9             98.7
          5 poor        565         1.0       1.1             99.8
          6 don't know  132         .2        .2              100.0
          Total         53788       99.4      100.0
Missing   System        323         .6
Total                   54111       100.0
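To show how the Percent, Valid Percent, and Cumulative Percent columns relate to the raw frequencies, here is a short Python sketch using the Q1 counts from the table. The function name and the dictionary layout are illustrative choices, not SPSS output.

```python
def frequency_table(counts, missing):
    """Build percent, valid percent, and cumulative percent from raw counts."""
    valid_total = sum(counts.values())
    grand_total = valid_total + missing
    rows = []
    running = 0
    for label, n in counts.items():
        running += n
        rows.append({
            "label": label,
            "frequency": n,
            # Percent uses all cases, including missing ones.
            "percent": round(100 * n / grand_total, 1),
            # Valid percent uses only cases with an answer.
            "valid_percent": round(100 * n / valid_total, 1),
            # Cumulative percent accumulates the valid cases.
            "cumulative_percent": round(100 * running / valid_total, 1),
        })
    return rows


# Frequencies from the Q1 Gen health table
q1_counts = {
    "excellent": 9145,
    "very good": 23767,
    "good": 16442,
    "fair": 3737,
    "poor": 565,
    "don't know": 132,
}
rows = frequency_table(q1_counts, missing=323)
```

The cumulative column only makes sense when the categories have a natural order, as they do for this ordinal item.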

Categorical Variables (Nominal/Ordinal)

Q49 Year in school * Q46 Sex Crosstabulation

                                 Q46 Sex
                                 1 female              2 male                Total
Q49 Year in school               Count   % of Total    Count   % of Total    Count   % of Total
1 1st year undergrad             7366    14.5%         4154    8.2%          11520   22.7%
2 2nd year under                 6755    13.3%         3678    7.2%          10433   20.6%
3 3rd year under                 6195    12.2%         3333    6.6%          9528    18.8%
4 4th year under                 5192    10.2%         2676    5.3%          7868    15.5%
5 5th year or more under         1380    2.7%          985     1.9%          2365    4.7%
6 graduate                       5088    10.0%         3246    6.4%          8334    16.4%
7 adult special                  203     .4%           105     .2%           308     .6%
8 other                          266     .5%           145     .3%           411     .8%
Total                            32445   63.9%         18322   36.1%         50767   100.0%

Continuous Variables (Interval/Ratio)

Descriptive Statistics

                             N       Range   Minimum   Maximum   Mean      Std. Deviation   Variance
Q48 Weight in pounds         51935   534     52        586       153.16    35.791           1281.031
HT_INCH Height in Inches     52017   56.00   48.00     104.00    67.2035   4.01241          16.099
Q13 How many drinks          53374   88      0         88        4.42      4.401            19.370
Q12 Hours alcohol            53326   65      0         65        2.99      2.726            7.430
BAC Blood Alcohol Content    50604   2.47    .00       2.47      .0731     .08357           .007
Valid N (listwise)           50218

Hypotheses

Null hypotheses
  Presumed true until statistical evidence in the form of a hypothesis test indicates otherwise
    There is no effect/relationship
    There is no difference in means

Alternative hypotheses
  Tested using inferential statistics
    There is an effect/relationship
    There is a difference in means

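Alpha: the probability of making a Type I error
  Reject null when null is true
  Level of significance, p value

Beta: the probability of making a Type II error
  Fail to reject null when null is false

Power: 1 - Beta

                       Null is true                       Null is false
Reject null            Alpha (Type I error)               1 - Beta (Power): CORRECT REJECTION
Fail to reject null    1 - Alpha: CORRECT NONREJECTION    Beta (Type II error)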

Effect Size

Measure of the strength of the relationship between two variables

College students report drinking an average of 5 drinks the last time they partied/socialized

Hypotheses

H0: μ = 5
HA: μ ≠ 5

One-Sample Statistics

N       Mean   Std. Deviation   Std. Error Mean
53374   4.42   4.401            .019

One-Sample Test (Test Value = 5)

                    95% Confidence Interval of the Difference
t         df        Lower    Upper
-30.352   53373     -.62     -.54
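The test statistic can be approximated from the summary figures alone. The Python sketch below is an illustration, not the SPSS procedure: it works from the rounded mean and standard deviation, so it gives t of about -30.4 rather than the reported -30.352, and it uses 1.96 as a normal approximation to the t critical value, which is an assumption that is reasonable only because the sample is very large.

```python
import math


def one_sample_t(mean, sd, n, test_value):
    """One-sample t statistic computed from summary statistics."""
    se = sd / math.sqrt(n)          # standard error of the mean
    t = (mean - test_value) / se
    return t, n - 1, se


# Rounded summary figures from the One-Sample Statistics table
t, df, se = one_sample_t(mean=4.42, sd=4.401, n=53374, test_value=5)

# Approximate 95% confidence interval of the difference (normal approximation)
difference = 4.42 - 5
ci = (difference - 1.96 * se, difference + 1.96 * se)
```

Because the whole interval lies below zero, the sample mean is significantly lower than the hypothesized value of 5.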

20% of college students report their health is excellent

Hypotheses

H0: p = .20
HA: p < .20 (one-tailed)

Binomial Test: Gen health

           Category   N       Observed Prop.   Test Prop.   Asymp. Sig. (1-tailed)
Group 1    <= 1       9145    .170             .2           .000 (a, b)
Group 2    > 1        44643   .830
Total                 53788   1.000

a. Alternative hypothesis states that the proportion of cases in the first group < .2.
b. Based on Z Approximation.
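Footnote b says the significance value is based on a Z approximation. A minimal sketch of such an approximation follows; it omits any continuity correction, so it should be read as an assumption about the general technique rather than a reproduction of the exact SPSS calculation.

```python
import math


def one_proportion_z(successes, n, p0):
    """Z approximation for testing a single proportion against p0."""
    p_hat = successes / n
    se = math.sqrt(p0 * (1 - p0) / n)   # standard error under the null
    z = (p_hat - p0) / se
    # Lower-tail probability P(Z <= z) from the standard normal CDF
    p_lower = 0.5 * math.erfc(-z / math.sqrt(2))
    return p_hat, z, p_lower


# 9145 of 53788 valid respondents rated their health as excellent
p_hat, z, p_lower = one_proportion_z(successes=9145, n=53788, p0=0.20)
```

The resulting z is far below zero and the one-tailed probability is effectively 0, consistent with the .000 shown in the table.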

There is a relationship between the number of drinks students report drinking the last time they drank and the number of sex partners they have had within the last school year

Hypotheses

H0: ρ = 0
HA: ρ ≠ 0

                                          How many drinks   Partners you had
How many drinks     Pearson Correlation   1                 .238**
                    Sig. (2-tailed)                         .000
                    N                     53374             52576
Partners you had    Pearson Correlation   .238**            1
                    Sig. (2-tailed)       .000
                    N                     52576             52896
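One way to see why a correlation of .238 is highly significant at this sample size is to convert r into a t statistic. The sketch below uses the standard conversion with n - 2 degrees of freedom. The function name is hypothetical, and the r-squared line is added here as a simple effect-size reading; neither comes from the SPSS output.

```python
import math


def correlation_t(r, n):
    """t statistic for testing whether a Pearson correlation differs from 0."""
    df = n - 2
    t = r * math.sqrt(df / (1 - r ** 2))
    return t, df


# r and pairwise N from the correlation table
t, df = correlation_t(r=0.238, n=52576)

# Share of variance in one variable associated with the other
r_squared = 0.238 ** 2
```

The t statistic is very large, yet r-squared is under 6%, which illustrates how a statistically significant relationship can still be modest in strength.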

Men and women report significantly different numbers of sexual partners over the past 12 months

Hypotheses
H0: μ1 = μ2
HA: μ1 ≠ μ2

Test: Independent Samples t-test OR One-way ANOVA

Result: Reject null

Group Statistics: Partners you had

Sex      N       Mean   Std. Deviation   Std. Error Mean
female   32687   1.34   2.017            .011
male     18474   1.82   3.627            .027

Independent Samples Test: Partners you had

                               Levene's Test for
                               Equality of Variances   t-test for Equality of Means
                                                                               95% Confidence Interval of the Difference
                               F         Sig.          t         df            Lower    Upper
Equal variances assumed        867.978   .000          -19.360   51159         -.532    -.434
Equal variances not assumed                            -16.704   25065.988     -.540    -.426
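Levene's test is significant here, so the "equal variances not assumed" row is the one to read. That row corresponds to a Welch-type t-test, which can be approximated from the group statistics as sketched below. Because the inputs are the rounded means and standard deviations, the results are close to, but not identical with, the reported t of -16.704 and df of 25065.988. The Cohen's d calculation is an added effect-size illustration and does not appear in the SPSS output.

```python
import math


def welch_t(m1, s1, n1, m2, s2, n2):
    """Welch t statistic and Welch-Satterthwaite degrees of freedom."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    se = math.sqrt(v1 + v2)
    t = (m1 - m2) / se
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df, se


def cohens_d(m1, s1, n1, m2, s2, n2):
    """Standardized mean difference using the pooled standard deviation."""
    pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    return (m1 - m2) / math.sqrt(pooled_var)


# Rounded group statistics: females first, then males
female = (1.34, 2.017, 32687)
male = (1.82, 3.627, 18474)

t, df, se = welch_t(*female, *male)
d = cohens_d(*female, *male)
```

The difference is clearly significant, while d is a little under 0.2 in absolute value, a small effect by common rules of thumb.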

Mean BAC reported differs across student residences

Hypotheses

Test: One-way ANOVA

Result: Reject null

Descriptives: Blood Alcohol Content

                            95% Confidence Interval for Mean
                            Lower Bound   Upper Bound
residence hall              .0730         .0752
frat/sorority house         .1062         .1193
other university housing    .0598         .0646
off campus                  .0760         .0785
with parents                .0581         .0631
other                       .0545         .0613
Total                       .0724         .0738

ANOVA: Blood Alcohol Content

                  Sum of Squares   df      Mean Square   F        Sig.
Between Groups    3.188            5       .638          92.123   .000
Within Groups     348.695          50376   .007
Total             351.884          50381

Multiple Comparisons
Dependent Variable: Blood Alcohol Content
Games-Howell

                                                        Mean Difference                        95% Confidence Interval
(I) Currently live         (J) Currently live           (I-J)             Std. Error   Sig.    Lower Bound   Upper Bound
residence hall             frat/sorority house          -.03865*          .00337       .000    -.0483        -.0290
                           other university housing     .01190*           .00135       .000    .0081         .0157
                           off campus                   -.00316*          .00085       .003    -.0056        -.0007
                           with parents                 .01350*           .00141       .000    .0095         .0175
                           other                        .01623*           .00183       .000    .0110         .0215
frat/sorority house        residence hall               .03865*           .00337       .000    .0290         .0483
                           other university housing     .05055*           .00354       .000    .0404         .0606
                           off campus                   .03548*           .00338       .000    .0258         .0451
                           with parents                 .05215*           .00356       .000    .0420         .0623
                           other                        .05488*           .00375       .000    .0442         .0656
other university housing   residence hall               -.01190*          .00135       .000    -.0157        -.0081
                           frat/sorority house          -.05055*          .00354       .000    -.0606        -.0404
                           off campus                   -.01506*          .00138       .000    -.0190        -.0111
                           with parents                 .00160            .00178       .947    -.0035        .0067
                           other                        .00433            .00213       .323    -.0017        .0104
off campus                 residence hall               .00316*           .00085       .003    .0007         .0056
                           frat/sorority house          -.03548*          .00338       .000    -.0451        -.0258
                           other university housing     .01506*           .00138       .000    .0111         .0190
                           with parents                 .01667*           .00144       .000    .0125         .0208
                           other                        .01940*           .00185       .000    .0141         .0247
with parents               residence hall               -.01350*          .00141       .000    -.0175        -.0095
                           frat/sorority house          -.05215*          .00356       .000    -.0623        -.0420
                           other university housing     -.00160           .00178       .947    -.0067        .0035
                           off campus                   -.01667*          .00144       .000    -.0208        -.0125
                           other                        .00273            .00217       .809    -.0035        .0089
other                      residence hall               -.01623*          .00183       .000    -.0215        -.0110
                           frat/sorority house          -.05488*          .00375       .000    -.0656        -.0442
                           other university housing     -.00433           .00213       .323    -.0104        .0017
                           off campus                   -.01940*          .00185       .000    -.0247        -.0141
                           with parents                 -.00273           .00217       .809    -.0089        .0035
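The F ratio in the ANOVA table follows directly from the sums of squares and degrees of freedom. Here is a short Python sketch of that arithmetic. The function name is hypothetical, and eta squared is included as an added effect-size illustration that is not part of the SPSS output.

```python
def anova_f(ss_between, df_between, ss_within, df_within):
    """Mean squares, F ratio, and eta squared from an ANOVA table."""
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    f_ratio = ms_between / ms_within
    # Share of total variation accounted for by group membership
    eta_squared = ss_between / (ss_between + ss_within)
    return ms_between, ms_within, f_ratio, eta_squared


# Sums of squares and df from the Blood Alcohol Content ANOVA table
ms_between, ms_within, f_ratio, eta_squared = anova_f(
    ss_between=3.188, df_between=5, ss_within=348.695, df_within=50376
)
```

The computed F of about 92.1 matches the table to within rounding, while eta squared is under 1%, so residence explains very little of the variation in BAC even though the group means differ significantly.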

Is there an association between being a member of a fraternity/sorority and ever being diagnosed with depression?

Hypotheses

H0: There is no association between being a member of a fraternity/sorority and ever being diagnosed with depression.
HA: There is an association between being a member of a fraternity/sorority and ever being diagnosed with depression.

Ever - Depression * Frat or sorority? Crosstabulation

                                        Frat or sorority?
Ever - Depression                       yes      no        Total
yes         Count                       681      7692      8373
            Expected Count              715.6    7657.4    8373.0
no          Count                       3744     39657     43401
            Expected Count              3709.4   39691.6   43401.0
Total       Count                       4425     47349     51774
            Expected Count              4425.0   47349.0   51774.0

Chi-Square Tests

                                Value       df   Asymp. Sig. (2-sided)   Exact Sig. (2-sided)   Exact Sig. (1-sided)
Pearson Chi-Square              2.185 (b)   1    .139
Continuity Correction (a)       2.122       1    .145
Likelihood Ratio                2.211       1    .137
Fisher's Exact Test                                                                             .073
Linear-by-Linear Association
N of Valid Cases

a. Computed only for a 2x2 table
b. 0 cells (.0%) have expected count less than 5. The minimum expected count is 715.62.
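The Pearson chi-square value can be reproduced from the observed counts. The sketch below computes the expected counts and the statistic by hand. The p-value shortcut via `math.erfc` is valid only for 1 degree of freedom, which holds for a 2x2 table, and the function name is a hypothetical choice.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square for a 2x2 table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    expected = []
    for i, row in enumerate(table):
        expected_row = []
        for j, observed in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected count under H0
            expected_row.append(e)
            chi2 += (observed - e) ** 2 / e
        expected.append(expected_row)
    # Upper-tail p-value; this identity holds for df = 1 only
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value, expected


# Rows: ever diagnosed with depression (yes, no)
# Columns: fraternity/sorority member (yes, no)
observed = [[681, 7692], [3744, 39657]]
chi2, p_value, expected = chi_square_2x2(observed)
```

The p-value of about .139 is above .05, so these data do not give grounds to reject the null hypothesis of no association.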

A significant association does not indicate causation
Statistical significance is not always the same as practical significance
Multiple factors contribute to whether your results are significant
It gets easier and easier as you practice!

Questions???

