What is Statistics?
- Statistics is the science of learning from data, and of measuring, controlling, and communicating uncertainty; and it
thereby provides the navigation essential for controlling the course of scientific and societal advances
- Average number of peer-reviewed medical research articles published/year over 1994-2001: 398,778 (MEDLINE)
- ~ 275,000 involved human subjects
- ~ 25,000 involved randomized, controlled trials
Yeah, yeah, yeah. So, how will this knowledge make me a better doctor?
- Understanding what the results of these studies mean (and don't mean) can help in deciding between various
treatments
- The more you know, the better the chance that you will be able to communicate with your patients.
Example: The impact of microscopic extrathyroid extension on outcome in patients with clinical T1 and T2 well-differentiated
thyroid cancer
- Patients and Methods. From an institutional database, we identified 984 patients (54%) who underwent surgery for
cT1/T2N0 disease. Of these, 869 patients were pT1/T2 and 115 were upstaged to pT3 based on the finding of
microscopic ETE. Disease-specific survival (DSS) and recurrence-free survival (RFS) were analyzed for each group using
the Kaplan-Meier method. In the pT3 group, factors predictive of outcome were analyzed by univariate and
multivariate analyses.
- Results. There was no difference in the 10-year DSS (99% vs 100%; P = .733) or RFS (98% vs 95%; P = .188) on
comparison of the pT1/pT2 and pT3 cohorts. Extent of surgery and administration of postoperative RAI were not
significant for recurrence on univariate or multivariate analysis in the pT3 cohort.
o No difference in outcomes between the group upstaged to pT3 and the pT1/pT2 group
- Conclusion. Outcomes in patients with cT1T2N0 WDTC are excellent and not affected by microscopic ETE. The extent of
resection and administration of postoperative RAI in patients with microscopic ETE does not impact survival or
recurrence.
No difference?
- 99% = 100%?!?
- 98% = 95%?!?
- What do P = .733 and P = .188 mean?
Getting Started
- Variable: a characteristic that can be measured or observed. If a characteristic is the same for every member of the
population, it is referred to as a constant.
- Types of variables
o Quantitative (Continuous, Discrete)
o Categorical (Binary, Non-binary)
Ex. ZIP code (it is written as a number, but it only labels a location, so it is categorical, not a true numerical value)
Quantitative Variables
- Variables that take on numeric values for which arithmetic operations (differences, averages, etc.) make sense.
o Continuous: Can take on any value over one or more intervals (height, body fat percentage, LDL)
o Discrete: Takes on one of a finite or countably infinite set of values (white blood cell count, number of cases of
mono in a school system, number of patients treated in an emergency room in one day)
(There has to be a gap between consecutive possible values.)
Categorical Variables
- Variables that identify which of at least two categories an observation falls in.
o Binary: Two possible categories (disease presence, Rh factor, vital status (dead or alive))
o Non-binary: Three or more categories (eye color, type of melanoma (lentigo, nodular, etc.))
Some comments
- Descriptive statistics involves using summary values and graphical displays to explore the distribution of one or more
variables in a data set or the relationship between two or more variables.
- Inferential statistics involves drawing conclusions about a population with a certain degree of confidence or error rate.
Histogram/Frequency Table
Questions
Usefulness?
- Mean:
o Not resistant to outliers. Many inferential methods based on the mean do not produce reliable results when
outliers (especially extreme outliers) are in a data set.
o Utilizes all of the observations in a data set, hence it takes advantage of as much information as possible
- Median
o Resistant to outliers.
o Does not utilize all of the data
o Inferential methods based on the median can be used when outliers exist.
- Mode
o Used with categorical data/discrete variables with a small number of unique values.
o No inferential methods
- Trimmed mean: mean is calculated after removing the smallest p% and largest p% of the data values. (Resistant to
outliers, uses more data than the median)
o Dangerous to do if outlier is important
- Five-number summary: consists of the minimum, lower (first) quartile, median, upper (third) quartile, and maximum.
Divides the ordered data into four groups, each containing 25% of the data.
o Put it in order, cut it in half, cut each half in half
MAO Example: Monoamine oxidase in 18 schizophrenia patients (nmoles benzylaldehyde product/10^8 platelets)
Median: midpoint of the two middle values, 8.7 and 9.7, which is 9.2
MAO Example (cont.)
- Five-number summary:
4.1 (min) 7.4 (lower quartile) 9.2 (median) 11.9 (upper quartile) 18.8 (max)
- Mode:
7.8 (occurred twice)
- 5.56% (1 out of 18) trimmed mean:
9.6 (4.1 and 18.8 were removed, and the average of the remaining 16 values was calculated)
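The summaries in the MAO example can be reproduced with a short Python sketch. The 18 raw values are not listed in these notes, so the list below is an assumption: it was chosen because it is consistent with every summary quoted for the example. The helper names `five_number_summary` and `trimmed_mean` are illustrative, and the quartiles use the "cut each half in half" rule described above, which is only one of several accepted methods.

```python
from statistics import mean, median, multimode

# ASSUMED data: these 18 values are not given in the notes. They were chosen
# because they are consistent with the summaries quoted for the MAO example.
mao = [4.1, 5.2, 6.8, 7.3, 7.4, 7.8, 7.8, 8.4, 8.7,
       9.7, 9.9, 10.6, 10.7, 11.9, 12.7, 14.2, 14.5, 18.8]


def five_number_summary(values):
    """Min, Q1, median, Q3, max via 'put in order, cut in half, cut each half in half'.

    Needs at least two values. When n is odd, the middle value is left out of both halves.
    """
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]     # smallest half of the ordered data
    upper = data[-half:]    # largest half of the ordered data
    return (data[0], median(lower), median(data), median(upper), data[-1])


def trimmed_mean(values, k):
    """Mean after dropping the k smallest and the k largest observations."""
    data = sorted(values)
    return mean(data[k:len(data) - k])


print("five-number summary:", five_number_summary(mao))
print("mode(s):", multimode(mao))
print("mean:", round(mean(mao), 4))
print("trimmed mean (1 value cut from each end):", round(trimmed_mean(mao, 1), 1))
```

Dropping one value from each end of 18 observations corresponds to the 5.56% trim used in the example.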
Measures of Variability
- Standard deviation: roughly the average distance between each observation and the mean.
- Interquartile range (IQR): distance between the lower and upper quartiles
o Middle 50%
- Range: distance between the minimum and the maximum
Usefulness?
- Standard Deviation
o Not resistant to outliers because mean is not resistant
o Utilizes all of the observations in a data set.
- Interquartile Range
o Resistant to outliers
o Multiple accepted methods for calculating IQR
- Range
o Not resistant to outliers
o No idea what is going on between the extremes
Mean = 9.8056
- Interpretations
- Standard deviation: On average, the MAO values for the 18 people in the sample are within 3.6183 units of the
mean.
- IQR: The middle 50% of the MAO values fall within a 4.5-unit interval.
- Range: All of the data fall within a 14.7-unit interval.
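The three measures of variability interpreted above can be computed as follows. This is a sketch under one loud assumption: the raw MAO values are not listed in these notes, so the list below is an assumed data set that is consistent with the quoted summaries. The quartiles come from the "cut each half in half" rule; other accepted methods can give a slightly different IQR.

```python
from statistics import median, stdev

# ASSUMED data: not given in the notes, chosen to be consistent with the quoted summaries.
mao = [4.1, 5.2, 6.8, 7.3, 7.4, 7.8, 7.8, 8.4, 8.7,
       9.7, 9.9, 10.6, 10.7, 11.9, 12.7, 14.2, 14.5, 18.8]

data = sorted(mao)
half = len(data) // 2
q1 = median(data[:half])     # lower quartile: median of the smallest half
q3 = median(data[-half:])    # upper quartile: median of the largest half

sd = stdev(data)                     # sample standard deviation (divides by n - 1)
iqr = q3 - q1                        # width of the middle 50% of the data
value_range = data[-1] - data[0]     # maximum minus minimum

print("standard deviation:", round(sd, 4))
print("IQR:", round(iqr, 1))
print("range:", round(value_range, 1))
```

Note that `stdev` divides by n - 1, which is the usual sample standard deviation; whether the lecture used this version is an assumption.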
Colorectal F Pancreas M
- Expect the male rates to be more spread out than the female rates because the male rates have the larger standard deviation.
- The Mississippi rate is 1.682 standard deviations above the mean rate for females, while the Kentucky rate is 1.605
standard deviations above the mean rate for males. Hence, the Mississippi rate is farther above average for females
than the Kentucky rate is for males.
- The farther this standardized value is from 0, the farther the observation is from the average.
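The "number of standard deviations above the mean" used in this comparison is a standardized score (z-score). A minimal sketch follows; the state rates, means, and standard deviations behind the 1.682 and 1.605 figures are not in these notes, so the numbers in the usage lines are purely hypothetical.

```python
def z_score(value, mean, sd):
    """How many standard deviations `value` lies above (+) or below (-) the mean."""
    return (value - mean) / sd


# HYPOTHETICAL numbers, only to show the calculation (not the lecture's data).
z_a = z_score(20.0, 16.0, 2.5)   # group A: rate 20.0, mean 16.0, SD 2.5
z_b = z_score(14.0, 12.0, 2.0)   # group B: rate 14.0, mean 12.0, SD 2.0

# The larger z-score marks the rate that is farther above its own group's average.
print(z_a, z_b, "A is farther above average" if z_a > z_b else "B is farther above average")
```

Standardizing puts the two groups on a common scale, which is what makes a female rate and a male rate comparable even though their means and spreads differ.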
Segue to Inference
Sampling Distribution
- Suppose we choose 40 of these patients at random and calculate the average survival time of the 40 who were
selected. What values would we typically expect for the average of the 40? Would it be possible to obtain an average
between 1600 and 1700 days? How about between 0 and 100 days? Which would be more likely?
An average between 0 and 100 days would be the more likely of the two.
Distribution of mean survival time of 40 patients
Question?
- Two experimental treatments for late-stage pancreatic cancer have been developed. Each one is tested on a cohort of
forty patients. The first cohort has an average survival time of 370 days, while the second cohort has an average
survival time of 450 days. Does either one of these results provide evidence that the average survival time using the
new treatment is greater than 343 days?
- 1,897 out of the 10,000 average survival times were
370 days or higher. (P = .1897)
- 319 out of the 10,000 average survival times were 450
days or higher. (P = .00319)
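The counting step behind these P-values can be sketched as a simulation: draw many random samples of 40 patients, record each sample's average, and count how often the average is at least as large as the observed one. The patient data behind the lecture are not in these notes, so the population below is hypothetical (a right-skewed set of survival times with a mean near 343 days), and the function names are illustrative. The resulting proportions will not match the lecture's figures.

```python
import random

random.seed(1)  # fixed seed so the sketch is repeatable

# HYPOTHETICAL population: 5,000 right-skewed survival times (days), mean near 343.
population = [random.gammavariate(3.0, 343 / 3.0) for _ in range(5000)]


def simulate_sample_means(pop, sample_size=40, reps=10_000):
    """Draw `reps` random samples without replacement; return each sample's mean."""
    return [sum(random.sample(pop, sample_size)) / sample_size for _ in range(reps)]


def empirical_p_value(sample_means, observed):
    """Proportion of simulated means that are at least as large as `observed`."""
    return sum(m >= observed for m in sample_means) / len(sample_means)


means = simulate_sample_means(population)
p_370 = empirical_p_value(means, 370)
p_450 = empirical_p_value(means, 450)
print("P(mean >= 370):", p_370)
print("P(mean >= 450):", p_450)
```

Because the samples are drawn from the status quo population, each proportion estimates how often a result that large would appear if the treatment made no difference.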
Implication
- If the average survival time under experimental treatment A is 343 days (no different from the status quo),
approximately 19% of all samples of 40 patients will have an average survival time of 370 days or longer.
- If the average survival time under experimental treatment B is 343 days (no different from the status quo),
approximately 0.3% of all samples of 40 patients will have an average survival time of 450 days or longer.
- Do the data seem out of line enough with the status quo to believe that the new treatment does increase survival
time?
- (Ball pit analogy)
- Null hypothesis (H0): statement assumed to be true. Typically implies no difference, no impact, maintaining the
status quo. If the null hypothesis is true, we know how our statistic should behave.
- Alternative hypothesis (Ha): statement of what a statistical hypothesis test is set up to establish. Also called the
research hypothesis.
Hypothyroidism Analogy
- P-value: the probability of observing a statistic that is at least as extreme as the one produced by the sample, assuming
that the null hypothesis is true.
- Logic: lower P-values imply that it is harder to obtain a specific result if the null hypothesis is true. Hence, lower
P-values correspond to having more evidence against the null hypothesis.
Significance Level (α)
THE RULE
If P-value ≤ α, reject H0.
If not, fail to reject (do not reject) H0.
If we reject H0, we say that the result is statistically significant at the (insert α) level.
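The rule can be written as a small function. This is a sketch that assumes the comparison is "P-value at or below the significance level" and uses 0.05 as a default level; the function name `decide` is illustrative. The P-values in the usage loop are ones quoted elsewhere in these notes.

```python
def decide(p_value, alpha=0.05):
    """Apply the rule: reject H0 when the P-value is at or below alpha."""
    if p_value <= alpha:
        return f"reject H0 (statistically significant at the {alpha} level)"
    return "fail to reject H0"


# P-values quoted in these notes: .733 and .188 (thyroid study), .029 (running study).
for p in (0.733, 0.188, 0.029):
    print(p, "->", decide(p))

# The same P-value can lead to a different decision at a stricter level.
print(0.029, "at alpha = 0.01 ->", decide(0.029, alpha=0.01))
```

The significance level is chosen before looking at the data; the function only makes the comparison explicit.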
Why Fail to Reject?
HE ENDED HIS LECTURE AT THIS POINT BECAUSE HE RAN OUT OF TIME. HE WILL
RECORD HIMSELF AND RELEASE THE REST OF THE LECTURE FOR OUR OWN BENEFIT.
Other Things You Might See
- Test statistic: a measure of how different the relevant statistic(s) is/are from what is specified in the null hypothesis.
In general, values that are farther away from zero imply a greater difference, but what is classified as being far away
depends on a number of factors, which is why it is easier to work with the P-value.
- Power: The power of a hypothesis test is the probability that you will reject H0 when the statement in Ha is true. In
some cases, the value β will be reported. In this setting, β is the probability of failing to reject H0 when, in fact, Ha is
true, so power = 1 - β. From a medical test standpoint, the power of a test is equivalent to the sensitivity of a test.
Hence, β is the probability of a false negative.
All others constant, α and β are inversely related.
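The trade-off between the two error rates can be illustrated with a simple one-sided z-test. This is a sketch under stated assumptions, not the lecture's calculation: the true improvement (60 days) and the standard error of the sample mean (30 days) are hypothetical, the statistic is assumed to be normally distributed, and the function name is illustrative.

```python
from statistics import NormalDist


def power_one_sided_z(effect, standard_error, alpha):
    """Power of an upper-tailed z-test when the true mean exceeds the null mean by `effect`."""
    z = NormalDist()                    # standard normal distribution
    critical = z.inv_cdf(1 - alpha)     # reject H0 when the z statistic exceeds this value
    return 1 - z.cdf(critical - effect / standard_error)


# HYPOTHETICAL setting: true improvement of 60 days, standard error of 30 days.
for alpha in (0.01, 0.05, 0.10):
    power = power_one_sided_z(effect=60, standard_error=30, alpha=alpha)
    beta = 1 - power                    # probability of failing to reject a false H0
    print(f"alpha = {alpha:.2f}  power = {power:.3f}  beta = {beta:.3f}")
```

Holding the effect and the standard error fixed, a stricter significance level raises the bar for rejection, so the chance of missing a real effect goes up.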
Strategy to Accelerate or Augment the Antidepressant Response and for An Early Onset of SSRI Activity. Adjunctive
Amisulpride to Fluvoxamine in Major Depressive Disorder
- Abstract: The topic of early response to antidepressant treatment has been extensively studied in major depressive
disorder (MDD). We serendipitous observed an increase tolerability, a rapid response to therapy and an early onset of
antidepressant fluvoxamine activity when associated with amisulpride in patients with major depressive disorder. The
purpose of this study was to investigate our preliminary observations.
- 20 women (mean age 51.3 years) with DSM-IV TR [23] diagnostic criteria for major depressive disorder and a Hamilton
Depression Rating Scale (HDRS) [24-26] score higher than 20.
- Exclusion criterion: age under 35 years.
- Each patient was given fluvoxamine (100 mg/day) and amisulpride (50 mg/day) throughout the 6-week trial.
- Clinical symptoms were evaluated using the Hamilton Depression Rating Scale (HDRS) [24-26] at the end of weeks 1, 2, 3,
and 6.
Output Table
- Comparing average HDRS at different time periods, looking to see if the average HDRS score is different from the
average score at the beginning of the six-week period. The P-values imply that they are different at those times.
- A separate analysis stated: "The ANOVA one way for repeated measures carried out on the basis of the HDRS score at
baseline and at week 1, 2, 3 and 6 stage expressed a statistically significant improvement of depressive symptoms
(F=4.5; DF 9,80,4,76,99; P < 0.00001)."
- Some articles will not state the actual P-value. Instead, you may see something akin to the following.
The P-value is larger than .10, result is insignificant at commonly used levels
Result significant at .10, but not at .05
Result significant at .05, but not at .01
Result significant at .01, but not at .001
(Some journals will differentiate at a .025 or a .02 level, but that information will be on the journal's website.)
Confidence Intervals
Margin of Error
- Important properties
As sample size increases, margin of error decreases (interval gets narrower)
As the confidence level increases, margin of error increases (interval gets wider)
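Both properties can be checked numerically. The sketch below uses a z-based margin of error for a mean, z* × s / √n, which is a simplifying assumption (for a sample as small as 18, a t-based interval would normally be used). The standard deviation 3.6183 and sample size 18 are taken from the MAO example; the larger sample sizes are hypothetical, as is the function name.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(sd, n, confidence=0.95):
    """Half-width of a z-based confidence interval for a mean (a simplification)."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # two-sided critical value
    return z_star * sd / sqrt(n)


sd = 3.6183   # standard deviation from the MAO example

# Larger sample size -> smaller margin of error (narrower interval).
for n in (18, 72, 288):
    print("n =", n, " margin =", round(margin_of_error(sd, n), 3))

# Higher confidence level -> larger margin of error (wider interval).
for level in (0.90, 0.95, 0.99):
    print("confidence =", level, " margin =", round(margin_of_error(sd, 18, level), 3))
```

Because the sample size sits under a square root, quadrupling it only halves the margin of error.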
Confidence Level
- Confidence interval for the hazard ratio (similar to, but not the same as, relative risk). A value of one implies that the
two groups have the same hazard rate.
- 95% CI for the hazard ratio for the two age groups is (1.925, 5.870). Estimated hazard of dying within 10 years is
approximately two to six times as high for the over 45 group as for the under 45 group.
- 95% CI for the hazard ratio for the two sex groups is (.902, 2.499). Values below one imply a lower hazard rate of dying
within 10 years for one sex, while values above one imply a lower hazard rate for the other sex. The fact that 1
falls within the interval implies that the hazard rates are not significantly different for males and females.
- Above interval looks at the hazard ratio for thyroid cancer patients with microscopic extensions versus without
microscopic extensions. Again, the confidence interval (0.714, 2.710) contains 1, so there isn't a significant difference in
the 10-year death rates. The last output reveals a similar result based on whether the patient had a radioactive iodine
treatment.
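The reading of these three intervals follows one check: does the interval contain 1, the value that means "same hazard rate in both groups"? A minimal sketch, using the intervals quoted above (the function name and the dictionary labels are illustrative):

```python
def excludes_no_difference(lower, upper, null_value=1.0):
    """True when the confidence interval does not contain the no-difference value."""
    return not (lower <= null_value <= upper)


# 95% confidence intervals for hazard ratios quoted in these notes.
intervals = {
    "age group (over 45 vs under 45)": (1.925, 5.870),
    "sex": (0.902, 2.499),
    "microscopic extension (with vs without)": (0.714, 2.710),
}

for label, (lower, upper) in intervals.items():
    verdict = "significant difference" if excludes_no_difference(lower, upper) else "no significant difference"
    print(f"{label}: ({lower}, {upper}) -> {verdict}")
```

For a ratio the no-difference value is 1; for a difference in means the same check would use 0 instead.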
Correlation
- Oftentimes, we are interested in how the values of variables change with one another. One of the primary statistical
measures of such an association is called correlation. The most commonly used version of correlation is called the
Pearson product-moment correlation coefficient, which measures how close an association between two quantitative
variables is to being linear.
Properties of r
If r = 1 or r = -1, the data are perfectly linear.
If the association is positive, r > 0. If the association is negative, r < 0.
If r = 0, there is no linear association, but there could be a strong nonlinear association.
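These properties can be checked with a direct implementation of the Pearson correlation coefficient. The sketch below uses small made-up data sets chosen only to hit each case; `pearson_r` is an illustrative name.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


xs = [-3, -2, -1, 0, 1, 2, 3]   # made-up values for illustration

print("y = 2x + 1:", pearson_r(xs, [2 * x + 1 for x in xs]))   # perfectly linear, positive
print("y = -x:    ", pearson_r(xs, [-x for x in xs]))          # perfectly linear, negative
print("y = x^2:   ", pearson_r(xs, [x ** 2 for x in xs]))      # exact curve, no linear trend
```

The last case is the one to remember: y is completely determined by x, yet the coefficient shows no linear association.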
- One of the biggest mistakes is when someone believes that when two variables are correlated, changing the value of
one variable will cause a change in the other variable. Finding out that two variables are correlated should result in
questions about the biological, chemical, and/or physical link between the variables.
Example
- Children ages 3-10 had the length of their feet measured, as well as their reading ability, based on their lexile score.
The correlation between the two variables was close to one, which implies that kids with longer feet have higher
reading levels. As a result, government agencies began awarding grants to scientists to research how to increase the
length of children's feet.
Serious Example
- Runners showed an increased respiratory exchange ratio during the light cycle (P =
0.029) suggesting that voluntary running shifted resting substrate metabolism
toward glucose oxidation, relative to lipid oxidation.
- The observations from this study indicate that running longer distances is
associated with decreased breast tumor burden in old mice, suggesting that
physiological factors generated by exercising before tumor onset are protective
against tumor progression.
- Most important statement!
The mechanisms for this protective effect are not known, but the data
show that older mice are useful models to address specific questions in cancer research and support further
studies on the ability of exercise training to protect older women at risk for breast cancer.