A tutorial lecture given at the annual meeting of the American College of Sports Medicine, Seattle, June 4, 1999. Will G Hopkins, Physiology and Physical Education, University of Otago, Dunedin, NZ. will.hopkins@otago.ac.nz
Outline
Planning
Performing
Publishing
Meta-Analysis
Conclusions
Example: in a study of 64 subjects, the correlation between height and weight was 0.68 (likely range 0.52 to 0.79).
[Figure: a statistic's observed value flanked by its lower and upper confidence limits, on a scale passing through 0.00]
Confidence interval: difference between the upper and lower confidence limits.
Amazing facts about confidence intervals
(for normally distributed statistics) To halve the interval, you have to quadruple sample size.
A 99% interval is 1.3 times wider than a 95% interval. You need 1.7 times the sample size for the same width.
A 90% interval is 0.8 of the width of a 95% interval. You need 0.7 times the sample size for the same width.
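These ratios follow directly from the normal critical values; a quick stdlib check (using Python's statistics.NormalDist, not anything from the talk):

```python
from statistics import NormalDist

z = NormalDist().inv_cdf  # standard normal quantile function

z95 = z(0.975)  # critical value for a 95% interval
z99 = z(0.995)  # 99% interval
z90 = z(0.95)   # 90% interval

# Interval width is proportional to the critical value; required sample
# size for a given width is proportional to the critical value squared.
width_99_vs_95 = z99 / z95        # ~1.31: a 99% interval is ~1.3x wider
n_99_vs_95 = (z99 / z95) ** 2     # ~1.73x the sample size for same width
width_90_vs_95 = z90 / z95        # ~0.84 of the width
n_90_vs_95 = (z90 / z95) ** 2     # ~0.70 of the sample size

print(round(width_99_vs_95, 2), round(n_99_vs_95, 2),
      round(width_90_vs_95, 2), round(n_90_vs_95, 2))
```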
Rearranging: for 2.5% of the time, true value > function′(observed value, data, critical value) = upper confidence limit.
- For 95% of subjects, the change was/would be between 0.12 and 0.92 L.
- The average change in the population would be between 0.12 and 0.92 L.
- The change for the average subject would be between 0.12 and 0.92 L.
- There may be individual differences in the change.
P value: Definition
The probability of a more extreme absolute value than the observed value if the true value were zero or null. Example: 20 subjects, correlation = 0.25, p = 0.29.
[Figure: sampling distribution of correlations for no effect and n = 20, with the observed effect (r = 0.25) marked relative to no effect; x-axis: correlation coefficient]
[Figure: likely ranges (0.37 to 0.87; 0.00 to 0.74; -0.22 to 0.62; -0.44 to 0.44) and their relation to significance: an interval excluding zero gives p < 0.05, an interval with a limit at zero gives p = 0.05, an interval including zero gives p > 0.05]
Two independent estimates of a normally distributed statistic with equal confidence intervals are significantly different at the 5% level if the overlap of their intervals is less than 0.29 (= 1 − √2/2) of the length of the interval. If the intervals are very unequal...
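The 0.29 can be reconstructed from the normal-theory argument; a minimal check:

```python
import math

# For two estimates with equal 95% intervals (half-width h = 1.96*SE each),
# the difference is significant at the 5% level when it exceeds
# 1.96 * sqrt(2) * SE (the SE of the difference is sqrt(2)*SE).
# At that threshold the intervals still overlap by 2h - 1.96*sqrt(2)*SE,
# which as a fraction of the interval length 2h is:
overlap_fraction = 1 - math.sqrt(2) / 2
print(round(overlap_fraction, 2))  # 0.29
```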
[Figure: pairs of overlapping confidence intervals illustrating p < 0.05, p = 0.05, and p > 0.05]
Lots of tests for significance implies more chance of at least one false alarm: "inflated type I error".
Ditto inflated type II error? Deal with inflated type I error by reducing the critical p value. Should we adjust confidence intervals too? No.
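The inflation is easy to quantify; a sketch, where the 0.05/k correction is a standard Bonferroni-style adjustment not named on the slide:

```python
# If each of k independent tests uses a critical p of 0.05, the chance of
# at least one false alarm (inflated type I error) is 1 - 0.95**k.
# A common fix is to test each effect at 0.05/k instead.
k = 10
false_alarm = 1 - (1 - 0.05) ** k   # ~0.40 for 10 tests
adjusted_p = 0.05 / k               # reduced critical p value
print(round(false_alarm, 2), adjusted_p)
```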
For a significant effect (p < 0.05):
- The effect is probably big.
- There's a < 5% chance the effect is zero.
- There's a < 2.5% chance the effect is < zero.
- There's a high chance the effect is > zero.
- The effect is publishable.
For a non-significant effect (p > 0.05):
- The effect is not publishable.
- There is no effect.
- The effect is probably zero or trivial.
- There's a reasonable chance the effect is < zero.
Planning Research
Smallest worthwhile effects:
- correlation = 0.10
- relative risk = 1.2 (or frequency difference = 10%)
- difference in means = 0.2 of a between-subject standard deviation
- change in means = 0.5 of a within-subject standard deviation
Example: 760 subjects to detect a correlation of 0.10. Example: 68 subjects to detect a 0.5% change in a crossover study when the within-subject variation is 1%.
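For comparison, a textbook power calculation via Fisher's z transform; it gives a figure near, but not exactly, the 760 quoted here, which presumably comes from a somewhat different method:

```python
import math
from statistics import NormalDist

def n_for_correlation(r, alpha=0.05, power=0.80):
    """Sample size to detect correlation r at the given two-tailed alpha
    and power, using Fisher's z transform (a textbook formula)."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)   # 1.96 for alpha = 0.05
    z_beta = z(power)            # 0.84 for 80% power
    return math.ceil(((z_alpha + z_beta) / math.atanh(r)) ** 2) + 3

print(n_for_correlation(0.10))  # 783, in the same ballpark as 760
```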
But 95% likely range doesn't work properly with traditional sample-size estimation (maybe).
Example: Correlation of 0.06, sample size of 760... 47.5% + 47.5% (=95%) likely range: Not significant, but could be substantial. Huh?
47.5% + 30% likely range: Not significant, and can't be substantial. OK!
But sample size needed to detect or delimit smallest effect is overkill for larger effects.
Example: confidence limits for correlations of 0.10 and 0.80 with a sample size of 760...
[Figure: the two confidence intervals plotted on a correlation scale with ticks at -0.1, 0.1, 0.3, and 0.9]
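The limits in this example can be computed with the Fisher's z transform mentioned later in the talk; a sketch:

```python
import math
from statistics import NormalDist

def correlation_ci(r, n, level=0.95):
    """Confidence limits for a correlation via Fisher's z transform:
    transform, add/subtract critical value * SE, back-transform."""
    z = math.atanh(r)                    # Fisher's z transform
    se = 1 / math.sqrt(n - 3)            # SE of z
    crit = NormalDist().inv_cdf((1 + level) / 2)
    return math.tanh(z - crit * se), math.tanh(z + crit * se)

# With n = 760, the interval for r = 0.10 spans trivial to small values,
# while the interval for r = 0.80 is very narrow: overkill for big effects.
for r in (0.10, 0.80):
    lo, hi = correlation_ci(r, 760)
    print(r, round(lo, 2), round(hi, 2))
```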
So why not start with a smaller sample and do more subjects only if necessary? Yes, I call it...
Performing Research
[Figure: sample sizes needed for correlations of each magnitude, as shown: trivial 380, small 350, moderate 270, large 155, very large 46, nearly perfect; scale ticks at -0.1, 0, 0.1, 0.3, and 0.9]
And all the big effects have been researched anyway? No, not really.
Publishing Research
In the Methods
"We show the precision of our estimates of outcome statistics as 95% confidence limits (which define the likely range of the true value in the population from which we drew our sample)."
Amazingly useful tips on calculating confidence limits
- Simple differences between means: stats program.
- Other normally distributed statistics: mean and p value.
- Relative risks: stats program.
- Correlations: Fisher's z transform.
- Standard deviations and other root mean square variations: chi-squared distribution.
- Coefficients of variation: standard deviation of 100× the natural log of the variable. Back-transform for CV > 5%. Use the adjustment of Tate and Klett to get shorter intervals for SDs and CVs from small samples.
Example: coefficient of variation for 10 subjects in 2 tests. [Figure: confidence limits for the unadjusted and adjusted estimates]
- Ratios of independent standard deviations: F distribution.
- R² (variance explained): convert to a correlation.
- Use the spreadsheet at sportsci.org/stats for all the above.
- Effect-size (mean/standard deviation): non-central F distribution or bootstrapping.
- Really awful statistics: bootstrapping.
For a large-enough sample, you can recreate (sort of) the population by duplicating the sample endlessly. Draw 1000 samples (of the same size as your original) from this population. Calculate your outcome statistic for each of these samples, rank them, then find the 25th and 975th placegetters. These are the confidence limits.
Problems
In the Results
In TEXT
Change or difference in means. First mention: ...0.42 (95% confidence/likely limits/range -0.09 to 0.93) or ...0.42 (95% confidence/likely limits/range ±0.51). Thereafter: ...2.6 (1.4 to 3.8) or 2.6 (±1.2) etc.
Correlations, relative risks, odds ratios, standard deviations, ratios of standard deviations: can't use ± because the confidence interval is skewed: ...a correlation of 0.90 (0.67 to 0.97)... ...a coefficient of variation of 1.3% (0.9 to 1.9)...
In TABLES
Confidence intervals
Variable   r      likely range
A          0.70   0.37 to 0.87
B          0.44   0.00 to 0.74
C          0.25   -0.22 to 0.62
D          0.00   -0.44 to 0.44
P values
Variable   r      p
A          0.70   0.007
B          0.44   0.05
C          0.25   0.29
D          0.00   1.00
Asterisks
Variable   r
A          0.70**
B          0.44*
C          0.25
D          0.00
In FIGURES
[Figure: change in power (%), on a scale from -10 to 10, with confidence limits for three groups: told carbohydrate, told placebo, not told]
[Figure: effects with confidence limits for altitude-training strategies (live low train low; live high train high; live high train low) at sea level and at altitude; values shown: 3, 2, 12, 14]
In the Discussion
Interpret the observed effect and its 95% confidence limits qualitatively.
Example: you observed a moderate correlation, but the true value of the correlation could be anything between trivial and very strong.
[Figure: qualitative scale for correlations with labels small, moderate, large, very large, nearly perfect, and ticks at 0.3 and 0.9]
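Reading limits off such a scale can be automated; a sketch, assuming the intermediate thresholds 0.1, 0.5, and 0.7 that usually accompany the labelled 0.3 and 0.9 (only those two boundaries appear on the slide):

```python
# Qualitative scale for correlation magnitudes. The 0.1, 0.5, and 0.7
# thresholds are assumptions; the slide labels only 0.3 and 0.9.
THRESHOLDS = [(0.1, "trivial"), (0.3, "small"), (0.5, "moderate"),
              (0.7, "large"), (0.9, "very large")]

def magnitude(r):
    """Return the qualitative label for a correlation."""
    r = abs(r)
    for limit, label in THRESHOLDS:
        if r < limit:
            return label
    return "nearly perfect"

# Variable B from the table earlier: r = 0.44, likely range 0.00 to 0.74.
# The observed correlation is moderate, but the true value could be
# anything from trivial to very large.
print(magnitude(0.44), magnitude(0.00), magnitude(0.74))
```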
Meta-Analysis
Deriving a single estimate and confidence interval for an effect from several studies.
Here's how it works for two:
[Figure: meta-analysis of two studies, shown for equal and for unequal confidence intervals: Study 1, Study 2, and the combined Study 1+2]
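The talk doesn't give the arithmetic; a standard fixed-effect (inverse-variance) sketch of how two estimates combine:

```python
import math

def combine(estimates):
    """Inverse-variance (fixed-effect) combination of (estimate, SE)
    pairs. Each study contributes weight 1/SE^2, so a study with a
    narrower confidence interval pulls the combined estimate toward
    itself; the combined interval is narrower than either study's."""
    total_w = sum(1 / se ** 2 for _, se in estimates)
    mean = sum(x / se ** 2 for x, se in estimates) / total_w
    return mean, 1 / math.sqrt(total_w)

# Equal intervals: combined estimate is halfway between the studies,
# and the combined SE shrinks by a factor of sqrt(2).
print(combine([(0.5, 0.2), (0.7, 0.2)]))
# Unequal intervals: the more precise study dominates.
print(combine([(0.5, 0.1), (0.7, 0.3)]))
```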
Conclusions