
Statistics

What is Statistics?

- Statistics is the science of learning from data, and of measuring, controlling, and communicating uncertainty; and it
thereby provides the navigation essential for controlling the course of scientific and societal advances

Why Do Medical Professionals Need an Understanding of Statistics?

- Average number of peer-reviewed medical research articles published/year over 1994-2001: 398,778 (MEDLINE)
- ~ 275,000 involved human subjects
- ~ 25,000 involved randomized, controlled trials

Yeah, yeah, yeah. So, how will this knowledge make me a better doctor?

- Understanding what the results of these studies mean (and don't mean) can help in deciding between various
treatments
- The more you know, the better the chance that you will be able to communicate with your patients.

Example: The impact of microscopic extrathyroid extension on outcome in patients with clinical T1 and T2 well-differentiated
thyroid cancer

- Patients and Methods. From an institutional database, we identified 984 patients (54%) who underwent surgery for
cT1/T2N0 disease. Of these, 869 patients were pT1/T2 and 115 were upstaged to pT3 based on the finding of
microscopic ETE. Disease-specific survival (DSS) and recurrence-free survival (RFS) were analyzed for each group using
the Kaplan-Meier method. In the pT3 group, factors predictive of outcome were analyzed by univariate and
multivariate analyses.
- Results. There was no difference in the 10-year DSS (99% vs 100%; P = .733) or RFS (98% vs 95%; P = .188) on
comparison of the pT1/pT2 and pT3 cohorts. Extent of surgery and administration of postoperative RAI were not
significant for recurrence on univariate or multivariate analysis in the pT3 cohort.
o No difference with stage 3 and stage 1 odds
- Conclusion. Outcomes in patients with cT1T2N0 WDTC are excellent and not affected by microscopic ETE. The extent of
resection and administration of postoperative RAI in patients with microscopic ETE does not impact survival or
recurrence.

No difference?

- 99% = 100%?!?
- 98% = 95%?!?
- What do P = .733 and P = .188 mean?

Getting Started

- Variable: a characteristic that can be measured or observed. If a characteristic is the same for every member of the
population, it is referred to as a constant.
- Types of variables
o Quantitative (Continuous, Discrete)
o Categorical (Binary, Non-binary)
Ex. ZIP CODE (number but just describes something, not actually a numerical value)

Quantitative Variables

- Variables that take on numeric values for which arithmetic operations (differences, averages, etc.) make sense.
o Continuous: Can take on any value over one or more intervals (height, body fat percentage, LDL)
o Discrete: Takes on one of a finite or countably infinite set of values (white blood cell count, number of cases of
mono in a school system, number of patients treated in an emergency room in one day)
(there has to be a gap between consecutive possible values)

Categorical Variables

- Variables that identify which of at least two categories an observation falls in.
o Binary: Two possible categories (disease presence, Rh factor, vital state: dead or alive)
o Non-binary: Three or more categories (eye color, type of melanoma (lentigo, nodular, etc.))

Some comments

- The choice of statistical methods depends partially on the type(s) of variable(s).


- We often measure many continuous variables to a specific level of precision (nearest inch, gram, etc.). This process is called
discretization. It does not imply that the variable is discrete.
- Numeric values don't automatically imply that a variable is quantitative (e.g., databases will have breast cancer staging
classified as 0, 1, 2, 3, or 4).

Descriptive vs. Inferential Statistics

- Descriptive statistics involves using summary values and graphical displays to explore the distribution of one or more
variables in a data set or the relationship between two or more variables.
- Inferential statistics involves drawing conclusions about a population with a certain degree of confidence or error rate.

Some Descriptive Statistics

- Frequency/relative frequency distributions


- Histogram
- Numeric Summaries
Measures of central tendency (mean, median, mode)
Measures of variability (standard deviation,
interquartile range, range)
- Boxplot
- Z-score (it's not just about the average!)

[Figure: Breast Cancer % Late Stage at Diagnosis by Health Insurance Status - Women Ages 40-79, New Jersey, 2006-2008]

- Maybe more people on Medicaid are diagnosed at a late stage than at an early stage

Frequency/Relative Frequency

- Frequency: number of times an observation with a specific value occurs.
- Relative Frequency: fraction/proportion of all observations that have a specific value. Can also be expressed as a
percentage. (Note: not all percentages are relative frequencies. Blood alcohol content and body fat percentage are two
examples.)
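As a small illustration of the two definitions above, the sketch below counts frequencies and relative frequencies in Python. The stage labels are made-up values for illustration, not data from the lecture.

```python
from collections import Counter

# Hypothetical observations (not from the lecture data)
stages = ["early", "late", "early", "early", "late"]

frequency = Counter(stages)                # number of times each value occurs
relative_frequency = {
    value: count / len(stages)             # fraction of all observations
    for value, count in frequency.items()
}

for value in frequency:
    print(value, frequency[value], f"{relative_frequency[value]:.0%}")
```

The relative frequencies always add up to 1 (100%), which is one way to tell them apart from percentages such as body fat percentage.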

Histogram/Frequency Table

- Used for quantitative data


- Set of data values broken into equal width intervals
(width usually determined by software, can be
changed)
- Bar height = frequency (or relative frequency)
- Variable represented on horizontal axis
- Changing interval width changes appearance of graph.

Histogram/Frequency Table on Right

- From state data
- Intervals are closed on the left and open on the right
o The left end point is included and the right end point is not (so the first incidence-rate interval goes up to 39.99 but
doesn't include 40)
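The "closed on the left, open on the right" rule can be sketched in a few lines of Python. The starting point (30), the interval width (10), and the rates themselves are assumptions chosen only so that the first interval ends at 40, as in the note above. The actual state data are not reproduced here.

```python
from collections import Counter

def bin_label(value, start, width):
    """Return the half-open interval [lo, hi) that contains value."""
    index = int((value - start) // width)
    lo = start + index * width
    return (lo, lo + width)

# Hypothetical incidence rates (not the state data from the slide)
rates = [31.2, 39.99, 40.0, 44.5, 52.1, 58.7]

counts = Counter(bin_label(r, start=30, width=10) for r in rates)
for (lo, hi), n in sorted(counts.items()):
    print(f"[{lo}, {hi}): frequency {n}, relative frequency {n / len(rates):.3f}")
```

Here 39.99 lands in [30, 40) while 40.0 lands in [40, 50), which is exactly what the left-closed, right-open convention means.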

Comparisons (Death Rates)

- Death rate: deaths out of every 100,000 people
- This doesn't tell you which cancer is more dangerous
o It doesn't tell you what percentage of the people who get it die
- Colorectal cancer is more common than pancreatic cancer, which is why it has a higher death rate

Questions

- Are there site/sex combinations that tend to have higher/lower death rates than others?
o Higher: M colorectal (more numbers to the right)
o Death rate for males will be more than for females
o For breast cancer: it's more common in females, but the % survival for F and M is the same
- Are there site/sex combinations that tend to have more variable (less consistent) death rates than others?
o Variability: how widespread the data is
o Male colorectal: more variability
o Pancreas data appears the same for F and M

Numeric Summaries

- Mean: arithmetic average
o Not resistant: will be affected by an outlier
- Median: middle observation in an ordered list of data
o Resistant: won't really be affected by an outlier
- Mode: observation with the highest frequency
- Resistance of a statistic: depends on whether the value of the statistic is impacted by an extremely low or extremely high
observation (outlier)
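To see resistance in action, here is a minimal Python sketch using the standard `statistics` module. The data values are invented for illustration. The only point is to compare the summaries before and after one extreme observation is added.

```python
import statistics

# Hypothetical data (not from the lecture)
values = [4, 5, 5, 6, 7]
with_outlier = values + [60]   # same data plus one extreme observation

print("mean:  ", statistics.mean(values), "->", statistics.mean(with_outlier))
print("median:", statistics.median(values), "->", statistics.median(with_outlier))
print("mode:  ", statistics.mode(values), "->", statistics.mode(with_outlier))
```

The mean moves from 5.4 to 14.5, while the median only moves from 5 to 5.5 and the mode stays at 5.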

Usefulness?

- Mean:
o Not resistant to outliers. Many inferential methods based on the mean do not produce reliable results when
outliers (especially extreme outliers) are in a data set.
o Utilizes all of the observations in a data set, hence it takes advantage of as much information as possible
- Median
o Resistant to outliers.
o Does not utilize all of the data
o Inferential methods based on the median can be used when outliers exist.
- Mode
o Used with categorical data/discrete variables with a small number of unique values.
o No inferential methods

Important Related Measures

- Trimmed mean: mean is calculated after removing the smallest p% and largest p% of the data values. (Resistant to
outliers, uses more data than the median)
o Dangerous to do if outlier is important
- Five number summary: consists of the minimum, lower (first) quartile, median, upper (third) quartile, and maximum.
Divides the data set into ordered sets that consist of 25% of the data.
o Put it in order, cut it in half, cut each half in half
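The "put it in order, cut it in half, cut each half in half" recipe and the trimmed mean can be sketched as follows. This is one of several accepted quartile conventions, so software may report slightly different quartiles. The function names and the sample values are hypothetical, and `trimmed_mean` takes a count `k` of values to drop from each end rather than a percentage.

```python
import statistics

def five_number_summary(data):
    """Min, lower quartile, median, upper quartile, max (median-of-halves rule)."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]        # lower half (middle value left out when n is odd)
    upper = ordered[n - half:]    # upper half
    return (ordered[0], statistics.median(lower), statistics.median(ordered),
            statistics.median(upper), ordered[-1])

def trimmed_mean(data, k):
    """Mean after removing the k smallest and k largest values."""
    ordered = sorted(data)
    if 2 * k >= len(ordered):
        raise ValueError("k removes every observation")
    return statistics.mean(ordered[k:len(ordered) - k])

# Hypothetical data (not from the lecture)
sample = [3, 7, 8, 5, 12, 14, 21, 13, 18, 40]

print(five_number_summary(sample))
print(statistics.mean(sample), trimmed_mean(sample, 1))
```

Dropping one value from each end of these ten observations is a 10% trimmed mean. It pulls the average down from 14.1 to 12.25 because the value 40 no longer counts.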

MAO Example

- Monoamine oxidase in 18 schizophrenia patients (nmoles benzylaldehyde product/10^8 platelets)
- Median: middle of 8.7 and 9.7

MAO Example (cont.)

- Five-number summary: 4.1 (min), 7.4 (lower quartile), 9.2 (median), 11.9 (upper quartile), 18.8 (max)
- Mode: 7.8 (occurred twice)
- 5.56% (1 out of 18) trimmed mean: 9.6 (4.1 and 18.8 were removed; the average of the remaining 16 was calculated)

Measures of Variability

- Standard Deviation: roughly the average distance between each observation and the mean.
- Interquartile Range: distance between the lower and upper quartiles
o Middle 50%
- Range: distance between the minimum and the maximum
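A short Python sketch of how the standard deviation and range are computed may help here. It assumes the sample version of the standard deviation (dividing by n - 1), since the notes do not say which divisor was used, and the data values are invented.

```python
import math

def sample_sd(data):
    """Sample standard deviation: square root of the sum of squared deviations over n - 1."""
    mean = sum(data) / len(data)
    squared_deviations = [(x - mean) ** 2 for x in data]
    return math.sqrt(sum(squared_deviations) / (len(data) - 1))

def data_range(data):
    """Distance between the minimum and the maximum."""
    return max(data) - min(data)

# Hypothetical data (not from the lecture)
values = [2, 4, 4, 4, 5, 5, 7, 9]
with_outlier = values + [30]

print(sample_sd(values), data_range(values))
print(sample_sd(with_outlier), data_range(with_outlier))
```

Both measures jump when the single value 30 is added, which is what "not resistant to outliers" means in practice.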

Usefulness?

- Standard Deviation
o Not resistant to outliers because mean is not resistant
o Utilizes all of the observations in a data set.
- Interquartile Range
o Resistant to outliers
o Multiple accepted methods for calculating IQR
- Range
o Not resistant to outliers
o No idea what is going on between the extremes

MAO example revisited

Mean = 9.8056

Deviation: data value - mean (4.1 - 9.8 = -5.7)


Standard deviation = 3.6183 (absolute standpoint)
Interquartile Range (IQR) = 11.9-7.4 = 4.5
Range = 18.8-4.1 = 14.7

MAO example revisited (cont.)

- Interpretations
- Standard deviation: On average, the MAO values for the 18 people in the sample are within 3.6183 units of the
mean.
- IQR: The middle 50% of the MAO values fall within a 4.5-unit interval.
- Range: All of the data fall within a 14.7-unit interval.

Death Rates (slide 20) Revisited

[Figure: four histograms of death rates, labeled Colorectal M, Colorectal F, Pancreas M, and Pancreas F]

- Match the summary statistics with the four histograms.

Boxplot (aka Box-and-whisker plot)


- Visual representation of five-number summary
- Displays outliers, if they exist
- Shows the max, min, quartiles, and median
- Dot = outlier
- If there is an outlier, the whiskers end at the min and max WITHOUT the outlier

Z-score

- Putting things on the same scale so you can compare them


- Mississippi's colorectal cancer death rate is 16.5 per 100,000 women, while Kentucky's rate is 23.4 per 100,000 men. A
Mississippi politician states that their rate for women isn't as bad as Kentucky's for men because their rate is 2.484
above average, while Kentucky's is 3.624 above average. Is this an appropriate comparison?

- Expect the male rates to be more spread out than the female rates because the male rates have a larger SD
- The Mississippi rate is 1.682 standard deviations above the mean rate for females, while the Kentucky rate is 1.605
standard deviations above the mean rate for males. Hence, the Mississippi rate is farther above average for females
than the Kentucky rate is for males.

- Farther away from 0, farther away you are from the average
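The z-score calculation can be sketched in Python as (value - mean) / standard deviation. The first part uses the MAO mean and standard deviation quoted earlier in these notes. The second part back-calculates the standard deviations implied by the stated deviations and z-scores for the two states. Those implied values are derived here, not quoted from the lecture.

```python
def z_score(value, mean, sd):
    """Number of standard deviations that value lies above (+) or below (-) the mean."""
    return (value - mean) / sd

# MAO example: mean and SD as reported in the notes
mao_mean, mao_sd = 9.8056, 3.6183
z_max = z_score(18.8, mao_mean, mao_sd)   # largest MAO value
z_min = z_score(4.1, mao_mean, mao_sd)    # smallest MAO value
print(round(z_max, 2), round(z_min, 2))

# Death rates: z = deviation / SD, so SD = deviation / z (back-calculated assumption)
implied_sd_female = 2.484 / 1.682
implied_sd_male = 3.624 / 1.605
print(round(implied_sd_female, 2), round(implied_sd_male, 2))
```

The implied male standard deviation is larger than the female one, which is why a bigger raw gap for Kentucky still translates into a smaller z-score.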

Segue to Inference

Stereotypical survival curve will dive down like that

Sampling Distribution

- Suppose we choose 40 of these patients at random and calculate the average survival time of the 40 who were
selected. What values would we typically expect for the average of the 40? Would it be possible to obtain an average
between 1600 and 1700 days? How about between 0 and 100 days? Which would be more likely?
o The 0-100 days would be more likely

Distribution of mean survival time of 40 patients

- 10,000 random samples of 40 patients
- NOT ONE has an average between 0 and 100 days, but that range is still closer to the observed averages than 1600-1700

Question?

- Two experimental treatments for late-stage pancreatic cancer have been developed. Each one is tested on a cohort of
forty patients. The first cohort has an average survival time of 370 days, while the second cohort has an average
survival time of 450 days. Does either one of these results provide evidence that the average survival time using the
new treatment is greater than 343 days?
- 1,897 out of the 10,000 average survival times were
370 days or higher. (P = .1897)
- 319 out of the 10,000 average survival times were 450
days or higher. (P = .00319)
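The resampling idea behind these counts can be sketched in Python. The real patient survival times are not available in these notes, so the sketch assumes a hypothetical right-skewed population (exponential with mean 343 days). Its proportions will therefore not match the counts quoted above, and only the logic carries over.

```python
import random
import statistics

random.seed(0)            # fixed seed so the sketch is reproducible
STATUS_QUO_MEAN = 343     # days, the status quo average from the notes

def sample_mean(n):
    """Average survival time of n patients drawn from the assumed population."""
    # Assumption: exponential survival times with mean 343 days
    return statistics.fmean(random.expovariate(1 / STATUS_QUO_MEAN) for _ in range(n))

# 10,000 random samples of 40 patients, as in the lecture
means = [sample_mean(40) for _ in range(10_000)]

def proportion_at_least(threshold):
    """Fraction of simulated sample means at or above threshold (an empirical P-value)."""
    return sum(m >= threshold for m in means) / len(means)

p_370 = proportion_at_least(370)
p_450 = proportion_at_least(450)
print(p_370, p_450)
```

Whatever the population, an average of 450 days is reached far less often than 370 days when nothing has changed, so it is stronger evidence against the status quo.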

Implication

- If the average survival time for experimental treatment A is 343 days (no different from the status quo), approximately
19% of all samples of 40 patients will have an average survival time of 370 days or longer.
- If the average survival time for experimental treatment B is 343 days (no different from the status quo), approximately
0.3% of all samples of 40 patients will have an average survival time of 450 days or longer.
- Do the data seem out of line enough with the status quo to believe that the new treatment does increase survival
time?
- (Ball pit analogy)

Basics of a Hypothesis Test

- Null hypothesis (H0): statement assumed to be true. Typically implies no difference, no impact, maintaining the
status quo. If the null hypothesis is true, we know how our statistic should behave.
- Alternative hypothesis (Ha): statement of what a statistical hypothesis test is set up to establish. Also called the
research hypothesis.

Hypothyroidism Analogy

- H0: patient doesn't have hypothyroidism
- Ha: patient does have hypothyroidism
- Data: blood test
- If H0 is true, TSH and thyroxine should both fall within their normal ranges
- Layman's view: If TSH is high enough, and thyroxine is low enough, H0 will be rejected and a diagnosis of
hypothyroidism will be made.

P-value

- Probability of observing a statistic that is at least as extreme as the one produced by the sample, assuming that the null
hypothesis is true.
- Logic: lower p-values imply that it is harder to obtain a specific result if the null hypothesis is true. Hence, lower
p-values correspond to having more evidence against the null hypothesis.

Significance Level (α)

- Threshold for concluding whether the null hypothesis is false.
- Maximum acceptable error rate for mistakenly concluding that the alternative hypothesis is true, when, in fact, the null
hypothesis is true.
- From a medical test standpoint, α is the probability of a false positive.
- Most common values: .10, .05, .025, .01
o If you pick .05, then 5% of the time you will reject a true null hypothesis due to chance alone

THE RULE

If P-value ≤ α, reject H0.
If not, fail to reject (do not reject) H0.
If we reject H0, we say that the result is statistically significant at the (insert α) level.
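The rule is simple enough to write as a tiny Python function. The function name `decide` and the default significance level of .05 are choices made here for illustration. The P-values passed in are ones quoted elsewhere in these notes.

```python
def decide(p_value, alpha=0.05):
    """Apply the rule: reject H0 when the P-value is at or below the significance level."""
    if p_value <= alpha:
        return "reject H0"
    return "fail to reject H0"

# Pancreatic cancer example: treatment A (P = .1897) and treatment B (P = .0031)
print(decide(0.1897))   # fail to reject H0
print(decide(0.0031))   # reject H0

# Thyroid cancer example: P = .733 for disease-specific survival
print(decide(0.733))    # fail to reject H0
```

This is why the thyroid study reports "no difference" for 99% vs 100%: the P-value is far above any commonly used significance level.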

Why Fail to Reject?

- A court trial is analogous to a hypothesis test, with
o H0: defendant is innocent
o Ha: defendant is guilty
- If there is enough evidence against the defendant, the jury rejects the null, returning a guilty verdict. If there isn't
enough evidence, the jury returns a not guilty verdict, not an innocent verdict. The purpose of the trial is not to
prove innocence, it is to prove guilt.
- Rejecting H0 is GUILTY, but not rejecting H0 is NOT GUILTY

The Result Is Significant. Now What?

- Pancreatic Cancer Treatment B: P-value = .0031
- We have concluded that the new treatment increases survival, on average. Does this mean it increases survival for
everyone? Does it mean that more people are cured, but the side effects result in more early deaths? The result should
create more questions to be examined.

(He ended his lecture at this point because he ran out of time. He will record himself and release the rest of the
lecture for our own benefit.)

Other Things You Might See

- Test Statistic: measure of how different the relevant statistic(s) is/are from what is specified in the null hypothesis. In
general, values that are farther away from zero imply a greater difference, but what is classified as being far away
depends on a number of factors, which is why it is easier to work with the P-value.
- Power: The power of a hypothesis test is the probability that you will reject H0 when the statement in Ha is true. In
some cases, the value β will be reported. In this setting, β is the probability of failing to reject H0 when, in fact, Ha is
true, so power = 1 - β.
o From a medical test standpoint, the power of a test is equivalent to the sensitivity of a test. Hence, β is the
probability of a false negative.
o All others constant, α and β are inversely related.

Strategy to Accelerate or Augment the Antidepressant Response and for An Early Onset of SSRI Activity. Adjunctive
Amisulpride to Fluvoxamine in Major Depressive Disorder

- Abstract: The topic of early response to antidepressant treatment has been extensively studied in major depressive
disorder (MDD). We serendipitous observed an increase tolerability, a rapid response to therapy and an early onset of
antidepressant fluvoxamine activity when associated with amisulpride in patients with major depressive disorder. The
purpose of this study was to investigate our preliminary observations.

Fluvoxamine Study (cont.)

- 20 women (mean age 51.3 years) with DSM-IV TR [23] diagnostic criteria for major depressive disorder and a Hamilton
Depression Rating Scale (HDRS) [24-26] score higher than 20.
- Exclusion factor was the age under 35 years.
- Each patient was given fluvoxamine (100 mg/day) and amisulpride (50 mg/day) throughout the 6-week trial.
- Clinical symptoms were evaluated by using Hamilton Depression Rating Scale (HDRS) [24-26] at the end of week 1, 2, 3,
6.

Output Table

- Comparing average HDRS at different time periods, looking to see if the average HDRS score is different from the
average score at the beginning of the six-week period. The P-values imply that they are different at those times.
- A separate analysis stated: "The ANOVA one way for repeated measures carried out on the basis of the HDRS score at
baseline and at week 1, 2, 3 and 6 stage expressed a statistically significant improvement of depressive symptoms
(F=4.5; DF 9,80,4,76,99; P < 0.00001)."

One last note about P-values

- Some articles will not state the actual P-value. Instead, you may see something akin to the following.
o The P-value is larger than .10; result is insignificant at commonly used levels
o Result significant at .10, but not at .05
o Result significant at .05, but not at .01
o Result significant at .01, but not at .001
o (Some journals will differentiate at a .025 or a .02 level, but that information will be on the journal's website.)

Confidence Intervals

- Back to the pancreatic cancer example:


- We concluded that the mean survival time is greater than 343 days, but this does not give us any additional
information about the actual value of the mean survival time.
- A confidence interval provides us with an estimate of a population value (parameter) with a specified level of
confidence that the interval contains the parameter we are trying to estimate.
- Confidence intervals can be two-sided (most common) or one-sided.
- A confidence interval utilizes information about the variability of a statistic, such as the standard deviation of the
10,000 sample means, along with the desired level of confidence, to produce a margin of error.
- Many commonly used confidence intervals have the form:
estimate ± margin of error

Margin of Error

- Important properties
As sample size increases, margin of error decreases (interval gets narrower)
As the confidence level increases, margin of error increases (interval gets wider)
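Both properties can be checked with a small Python sketch. It assumes the simplest textbook case, a margin of error for a mean of the form z × SD / √n with a known standard deviation. The notes do not state which formula the lecturer used, and the standard deviation of 200 days is a made-up value.

```python
import math
from statistics import NormalDist

def margin_of_error(sd, n, confidence):
    """z-based margin of error for a sample mean (assumes a known population SD)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # two-sided critical value
    return z * sd / math.sqrt(n)

SD = 200  # hypothetical standard deviation of survival time, in days

# Larger sample, same confidence level: narrower interval
print(margin_of_error(SD, 40, 0.95), margin_of_error(SD, 160, 0.95))

# Same sample size, higher confidence level: wider interval
print(margin_of_error(SD, 40, 0.95), margin_of_error(SD, 40, 0.99))
```

Because of the square root, quadrupling the sample size from 40 to 160 cuts the margin of error in half.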

Confidence Level

- Common confidence levels: 90%, 95%, 98%, 99%


- Relationship to the significance level?
- Interpretation of a 95% confidence level for a population mean: If we were to take repeated random samples of a fixed
size and calculate a 95% confidence interval for the population mean, on average, 95% of the resulting intervals would
include the value of the population mean.
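The "repeated samples" interpretation can be simulated directly. The sketch below assumes a hypothetical normal population with mean 343 and standard deviation 200 (both invented for illustration), builds a 95% z-interval from each of 2,000 samples of size 40, and counts how often the interval contains the true mean.

```python
import math
import random
from statistics import NormalDist, fmean

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical population values (assumptions, not lecture data)
MU, SIGMA, N, REPS = 343.0, 200.0, 40, 2000

z = NormalDist().inv_cdf(0.975)          # critical value for 95% confidence
margin = z * SIGMA / math.sqrt(N)        # margin of error with known SIGMA

covered = 0
for _ in range(REPS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    estimate = fmean(sample)
    # Does this sample's interval contain the true population mean?
    if estimate - margin <= MU <= estimate + margin:
        covered += 1

coverage = covered / REPS
print(coverage)
```

The printed proportion should land close to 0.95. Any single interval either contains the population mean or it does not, and the 95% describes the long-run behavior of the method.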

Thyroid Cancer Example

- Confidence interval for the hazard ratio (similar to, but not the same as, relative risk). A value of one implies that the
two groups have the same hazard rate.

- 95% CI for the hazard ratio for the two age groups is (1.925, 5.870). Estimated hazard of dying within 10 years is
approximately two to six times as high for the over 45 group as for the under 45 group.
- 95% CI for the hazard ratio for the two sex groups is (.902, 2.499). Values below one imply a lower hazard rate of dying
within 10 years for one sex, while values above one imply a lower hazard rate for the other sex. The fact that 1
falls within the interval implies that the hazard rates are not significantly different for males and females.

- The above interval looks at the hazard ratio for thyroid cancer patients with microscopic extensions versus without
microscopic extensions. Again, the confidence interval (0.714, 2.710) contains 1, so there isn't a significant difference in
the 10-year death rates. The last output reveals a similar result based on whether the patient had a radioactive iodine
treatment.
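The "does the interval contain 1?" check used throughout this example can be written as a short Python sketch. The function name is hypothetical, and the intervals are the three quoted in these notes.

```python
def excludes_no_difference(interval, null_value=1.0):
    """True when the confidence interval does not contain the 'no difference' value."""
    low, high = interval
    return not (low <= null_value <= high)

# 95% confidence intervals for hazard ratios, as quoted in the notes
intervals = {
    "age (over 45 vs under 45)": (1.925, 5.870),
    "sex": (0.902, 2.499),
    "microscopic extension": (0.714, 2.710),
}

for label, ci in intervals.items():
    verdict = "significant" if excludes_no_difference(ci) else "not significant"
    print(f"{label}: {ci} -> {verdict} at the 5% level")
```

For a ratio, the "no difference" value is 1. For a difference between two means it would be 0, which is why the function takes `null_value` as a parameter.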

Correlation

- Oftentimes, we are interested in how the values of variables change with one another. One of the primary statistical
measures of such an association is called correlation. The most commonly used version of correlation is called the
Pearson product-moment correlation coefficient, which measures how close an association between two quantitative
variables is to being linear.

Properties of r


- If r = 1 or r = -1, the data are perfectly linear.
- If the association is positive, r > 0. If the association is negative, r < 0.
- If r = 0, there is no linear association, but there could be a strong nonlinear association.
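These properties can be checked with a hand-written Pearson correlation in Python. The three small data sets are invented to hit the three cases: a perfect increasing line, a perfect decreasing line, and a perfect curve (y = x squared) that has no linear association at all.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data chosen to illustrate each property
xs = [-2, -1, 0, 1, 2]
r_up = pearson_r(xs, [3 * x + 1 for x in xs])    # perfectly linear, increasing
r_down = pearson_r(xs, [-2 * x for x in xs])     # perfectly linear, decreasing
r_curve = pearson_r(xs, [x * x for x in xs])     # perfect parabola

print(r_up, r_down, r_curve)
```

The parabola case gives r = 0 even though y is completely determined by x, which is the "strong nonlinear association" warning in the last bullet.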

Other Information About r

- Versions exist for other types of variables.


- A major use of r in medical research is to look for relationships between variables.
- Hypothesis tests exist for correlation. The primary null hypothesis is that r = 0 (that there is no linear association
between the variables). Smaller P-values provide more evidence that there is a linear association between the variables.

Correlation Isn't Causation!

- One of the biggest mistakes is when someone believes that when two variables are correlated, changing the value of
one variable will cause a change in the other variable. Finding out that two variables are correlated should result in
questions about the biological, chemical, and/or physical link between the variables.

Example
- Children ages 3-10 had the length of their feet measured, as well as their reading ability, based on their lexile score.
The correlation between the two variables was close to one, which implies that kids with longer feet have higher
reading levels. As a result, government agencies began awarding grants to scientists to research how to increase the
length of children's feet.

Serious Example

- Pre-tumor exercise decreases breast cancer in old mice in a distance-dependent manner


o A negative correlation was observed between daily distance ran, prior to tumor injection, and absolute tumor mass
measured at necropsy (Pearson's r = -0.89, P = 0.0066).
o A correlation was also observed between distance ran before tumor implant
and the histological score for mitotic index (Pearson's r = -0.85, P = 0.034).

Breast Cancer/Exercise (cont.)

- Runners showed an increased respiratory exchange ratio during the light cycle (P =
0.029) suggesting that voluntary running shifted resting substrate metabolism
toward glucose oxidation, relative to lipid oxidation.
- The observations from this study indicate that running longer distances is
associated with decreased breast tumor burden in old mice, suggesting that
physiological factors generated by exercising before tumor onset are protective
against tumor progression.
- Most important statement!
The mechanisms for this protective effect are not known, but the data
show that older mice are useful models to address specific questions in cancer research and support further
studies on the ability of exercise training to protect older women at risk for breast cancer.
