
Chapter 1: General Principles

1.2: Biostatistics
Lori B. Daniels, MD, MAS, FACC Consulting Fees/Honoraria: Roche Diagnostics, Alere, Inc.; Research Grants: Roche Diagnostics.
Learner Objectives

Upon completion of this module, the reader will be able to:
1. Correctly identify the study design used in a given medical study, and list its uses.
2. Describe the p value and interpret its meaning and relationship to hypothesis testing.
3. Calculate sensitivity, specificity, and positive and negative predictive values for a diagnostic test.
4. Compare various methods to account for confounding variables in clinical studies, including multivariable regression and propensity analysis.
5. Recognize how survival analysis differs from other regression analyses and identify when survival analysis should be used.

Introduction
One of the strengths of the field of cardiology is its strong evidence base. Cardiology is known for its large clinical trials, which provide a large amount of new information about treatments and practices. A well-qualified cardiologist must understand biostatistics to help decide whether results presented in the literature can be believed and should be applied to their treatment of patients. The purpose of this module is to provide a basic foundation in biostatistics so that the reader can better evaluate clinical literature. The focus is on the interpretation of research methods, rather than on calculations and computational details. This module emphasizes the biostatistics methods that the cardiovascular specialist is most likely to encounter in modern medical literature. An additional resource is the American Heart Association Scientific Statement that reviews the appropriate statistical evaluation of novel markers of cardiovascular risk. It provides an excellent summary and explanation of some of the most frequently used biostatistics within the field of cardiovascular medicine.1

Study Designs

Medical research study designs fall into two major categories: 1) observational and 2) interventional. In observational studies, subjects are observed but no medical intervention is performed. The observations may be made prospectively (i.e., forward-looking cohort studies), retrospectively (i.e., backward-looking case-control studies), or at a single point in time (i.e., cross-sectional studies). Interventional studies, or clinical trials, evaluate the effects of an intervention on outcomes and are considered to provide a stronger level of evidence than observational studies. Understanding how a study is designed is essential to understanding the conclusions that can be drawn from it.

Case Series
A case series is a descriptive account of a collection of patients, in which each case shares some characteristic of interest. A case series can be the first step in identifying a new disease process, describing a novel physical or imaging finding, or reporting on a novel treatment method. Case series reports can serve as a catalyst to other studies.

Case-Control Studies
Case-control studies are retrospective studies that start with individuals who already have a disease or trait of interest (i.e., the cases), then match them with control subjects who lack that disease or trait. The studies then attempt to look back at events, exposures, and characteristics to see whether any difference exists between the two groups. The idea is to find a risk factor that is present in the history of the cases, but not the controls.


Cross-Sectional Studies
Cross-sectional studies are descriptive studies about the characteristics of a group of individuals at a single point in time. These studies describe what is happening right now in a group of people. Cross-sectional studies can be used to establish norms (e.g., for a new biomarker), evaluate the usefulness of a new diagnostic procedure, or poll individuals about their attitudes (e.g., towards health care).

Cohort Study

A cohort study is an observational study that enrolls a group of subjects with something in common and follows their natural history prospectively over a period of time. The purpose is to determine which characteristics, exposures, or risk factors are associated with a given outcome. Unlike cross-sectional or case-control studies, however, the outcome of interest in a cohort study occurs in the future, after the subject is enrolled. In the cardiovascular literature, one of the most prominent cohort studies is the Framingham study of cardiovascular risk factors, which started in 1948, when more than 6,000 individuals from the same Massachusetts town were enrolled. The cohort was then followed with examinations every two years to determine the association of various risk factors with cardiovascular diseases.


Clinical Trial
A clinical trial is a study undertaken to determine whether a particular procedure or treatment can improve an outcome for a selected group of individuals. In controlled clinical trials, the intervention being tested is compared with another procedure or drug, generally a placebo or the current standard of care. Randomization assigns subjects to either the active treatment or the placebo group by chance, thereby eliminating bias in patient assignment and allowing patient characteristics to be evenly distributed between groups. In double-blind studies, neither the study investigator nor the subject knows whether the subject is in the treatment group or the control group, thus eliminating potential bias. The most robust clinical trial design is considered to be the randomized, double-blind, placebo-controlled trial, because it can provide evidence of causation (i.e., the best indication that any effects seen are due to the intervention).

Descriptive Statistics

Measures of Central Tendency
The correct measures to use for describing a population depend on the type of data being analyzed. The mean measures the middle of a distribution of numerical variables, if that variable has a normal (i.e., bell-shaped) distribution in the population being studied. The mean, also called the arithmetic mean, is the average of the observations. The mean is sensitive to extreme values, especially in small samples, so it is not used for skewed data. Instead, the median is used to measure the middle of a distribution of numerical variables that are skewed. Medians are also used for ordinal data, which are data that have an inherent order among categories (e.g., New York Heart Association classification for heart failure severity). The median is the point at which half the observations are larger and half are smaller. Unlike the mean, it is unaffected by extreme values.

Measures of Variation
Range: The range is the simplest measure of spread and is defined as the highest observed value minus the lowest observed value. One disadvantage of the range is that it tends to increase as the number of observations increases, since extreme values are more likely to occur with a greater number of data points. Consequently, reporting percentile values, such as the 25th and 75th percentiles or the 5th and 95th percentiles, is often preferred. The interquartile range (i.e., the difference between the 75th and 25th percentiles) is often used in conjunction with the median to describe a set of skewed observations.
Standard deviation: The most commonly used measure of dispersion is the standard deviation (SD), a measure of the spread of data about the mean. The SD is calculated as the square root of the variance, and the variance is the average of the squared deviations from the mean. If the distribution of observations is bell-shaped, then approximately 68% of observations lie within 1 SD of the mean, 95% within 2 SDs, and 99.7% within 3 SDs. Even if the distribution is not bell-shaped, at least 75% of the values will always fall within 2 SDs of the mean. The mean and SD are also useful for determining whether a set of variables is skewed when only summary statistics are provided: if the mean is smaller than two times the SD, the data are probably skewed.
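As an illustration of these summary measures, the short Python sketch below (a minimal example using NumPy; the data values are invented for illustration) computes the mean, median, SD, and interquartile range of a small right-skewed sample, showing how a single extreme value pulls the mean but not the median.

```python
import numpy as np

# Hypothetical, right-skewed sample of biomarker-like values; one extreme value.
values = np.array([12, 15, 18, 22, 25, 27, 30, 34, 40, 310], dtype=float)

mean = values.mean()                      # sensitive to the extreme value
median = np.median(values)                # robust to the extreme value
sd = values.std(ddof=1)                   # sample standard deviation
q25, q75 = np.percentile(values, [25, 75])
iqr = q75 - q25                           # interquartile range

print(f"mean = {mean:.1f}, median = {median:.1f}")
print(f"SD = {sd:.1f}, IQR = {iqr:.1f} ({q25:.1f} to {q75:.1f})")
print(f"mean smaller than 2 x SD? {mean < 2 * sd}")  # crude screen suggesting skew
```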

Hypothesis Testing
The purpose of a hypothesis test is to permit generalizations about a population based upon observations made in a sample from that population. When making comparisons between two groups (e.g., a group that received some therapy vs. a group that received a placebo), the hypothesis being tested is that some difference exists between the two groups. The null hypothesis, which must be disproven in order to claim a difference, is that the two groups are equal.

Errors in Hypothesis Testing


Erroneous conclusions can arise from hypothesis tests in two ways. A type I error is analogous to a false-positive diagnostic test: it incorrectly concludes significance (and rejects the null hypothesis) when the result is not really significant. A type II error is analogous to a false-negative diagnostic test: it incorrectly concludes no significance when the result is, in fact, significant. The probability of making a type II error is known as beta (β). The significance level of a test, also known as alpha (α), is the probability of making a type I error (i.e., incorrectly concluding significance). For many statistical tests, the p value can be compared with the significance level either to detect a statistically significant difference (i.e., reject the null hypothesis) or to conclude that the null hypothesis cannot be rejected at that significance level. For most studies, a significance level of 0.05 is chosen.

Power
The power of a statistical test is its ability to detect significance when a result is indeed significant. The power of a statistical test corresponds to the sensitivity of a diagnostic test, or the ability to detect a disease that is present. Investigators want the statistical test to be sensitive enough to detect significance when it should be detected, thereby minimizing the risk of a type II error. Power is calculated as 1 − β, or 1 minus the probability of making a type II error.
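A simple way to make the idea of power concrete is simulation. The sketch below (not part of this module; the effect size, SD, and group size are hypothetical) repeatedly draws two samples with a real underlying difference, runs a two-sample t test each time, and counts how often p < α — an estimate of 1 − β.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_per_group, true_diff, sd, alpha = 50, 5.0, 15.0, 0.05
n_sims = 2000

rejections = 0
for _ in range(n_sims):
    control = rng.normal(0.0, sd, n_per_group)
    treated = rng.normal(true_diff, sd, n_per_group)
    # Two-sample t test; count how often p < alpha when a real difference exists.
    _, p = stats.ttest_ind(treated, control)
    if p < alpha:
        rejections += 1

power = rejections / n_sims  # estimated power = 1 - beta
print(f"Estimated power: {power:.2f}")
```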



P Values
The p value is the probability of obtaining a result at least as extreme as the one observed, if the null hypothesis is true (i.e., if the groups being compared are truly equal). The p value can also be thought of as the probability that the observed result is due to chance alone. After a statistical test has been performed, if its p value is less than α (often set at 0.05), the null hypothesis is rejected. Importantly, a significant p value does not provide absolute proof that a difference between groups exists; rather, a p value of 0.05 or less means that if the groups did not truly differ, results as extreme as those observed would occur only 1 in 20 times or less. Similarly, failure to detect a significant difference does not mean that a difference does not exist. The p value has often been subject to misinterpretation. The p value is not the probability that the null hypothesis is true. It also does not indicate the size or importance of the observed effect. Even an effect that is highly statistically significant (e.g., p < 0.0001) could be clinically insignificant if the magnitude of the difference between groups (i.e., the effect size) is small, or if the observation is not relevant to clinical practice.


Confidence Intervals
Confidence intervals describe the variability, or margin of error, of a given result. Since point estimates, such as means or hazard ratios, carry no indication of their precision, interval estimates are often provided alongside them. Because a significance level (i.e., α) of 0.05 is commonly used, investigators typically report 95% confidence intervals. A 95% confidence interval means that if the experiment were repeated many times and a confidence interval calculated each time, 95% of those intervals would contain the true population value.
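As a minimal sketch (using a normal approximation and invented blood pressure values), a 95% confidence interval for a mean can be computed as the mean plus or minus 1.96 standard errors:

```python
import numpy as np

# Hypothetical sample of systolic blood pressures (mm Hg).
bp = np.array([118, 126, 131, 140, 122, 135, 128, 119, 142, 130], dtype=float)

mean = bp.mean()
se = bp.std(ddof=1) / np.sqrt(len(bp))    # standard error of the mean

# 95% confidence interval via the normal approximation (z = 1.96).
lower, upper = mean - 1.96 * se, mean + 1.96 * se
print(f"mean = {mean:.1f}, 95% CI {lower:.1f} to {upper:.1f}")
```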

Sample Size and Effect Size


A number of statistical considerations go into determining the required sample size for a study, including the magnitude of the difference to be detected (i.e., the effect size), the frequency of the outcomes to be measured, and the statistical power desired (i.e., the desired ability to avoid a type II error). With sample sizes that are too small, investigators run the risk of failing to detect an important difference between the study groups (a type II error). If sample sizes are larger than needed, however, more patients are exposed to research risks and extra resources are used. As noted earlier, an effect size is a measure of how much difference exists between two groups, such as a treatment group and a control group. An effect size can be based on the difference between two means (i.e., numerical outcomes such as cholesterol level); on proportions, odds ratios, relative risks, or hazard ratios (i.e., nominal outcomes such as dead vs. alive); or on correlations (i.e., measures of association). In cardiovascular research, investigators are often interested in the magnitude of the relationship between two characteristics, such as the presence of a risk factor and the occurrence of a specific outcome. Two ways to express this relationship are the relative risk (RR) and the odds ratio (OR). RRs are used in cohort or other prospective studies, whereas ORs are used in case-control studies. The OR is the odds that a person with the adverse outcome was exposed to the risk factor, divided by the odds that a person without the adverse outcome was exposed. The RR, used in prospective studies, is the ratio of the incidence of the outcome in people with the risk factor to the incidence in people without the risk factor. The absolute risk reduction (ARR) is the absolute value of the difference between the two event rates (i.e., the incidence in those with the exposure vs. the incidence in those without). The ARR is a useful measure because it can be used to calculate the number needed to treat (NNT) to prevent one event: the NNT is the reciprocal of the ARR (i.e., NNT = 1/ARR). The relative risk reduction (RRR) is also frequently presented; it is the amount of risk reduction relative to the baseline risk and is calculated as the ARR divided by the baseline event rate (i.e., the incidence in those without the exposure).

Example: A new antiplatelet agent is being tested for its ability to decrease the incidence of myocardial infarction (MI) at 60 days. One thousand patients are randomized to either the new drug or a placebo, resulting in 500 people in each group. After 60 days, 15 patients in the active treatment group and 25 patients in the placebo group have experienced the primary outcome (i.e., MI). What is the NNT with this new medication to prevent one MI? What is the RRR?

Answer: The incidence of MI in the treatment group was 15/500 = 0.03. The incidence in the placebo group was 25/500 = 0.05. Therefore, the ARR = 0.05 − 0.03 = 0.02, or 2%. The NNT = 1/ARR = 1/0.02 = 50. Therefore, 50 patients would need to be treated with the new medication for 60 days to prevent one MI. The RRR = 0.02/0.05 = 0.40, or 40%.
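The worked example above can be reproduced in a few lines of Python; this is only a sketch, using the event counts from the example.

```python
def risk_measures(events_treated, n_treated, events_control, n_control):
    """Return absolute risk reduction, number needed to treat, and relative risk reduction."""
    risk_treated = events_treated / n_treated
    risk_control = events_control / n_control
    arr = risk_control - risk_treated       # absolute risk reduction
    nnt = 1 / arr                            # number needed to treat
    rrr = arr / risk_control                 # relative risk reduction
    return arr, nnt, rrr

# Numbers from the example: 15/500 MIs on the new drug vs. 25/500 on placebo.
arr, nnt, rrr = risk_measures(15, 500, 25, 500)
print(f"ARR = {arr:.2%}, NNT = {nnt:.0f}, RRR = {rrr:.0%}")
```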

Assessing Yield of Diagnostic Tests


An important part of cardiology is evaluating the accuracy of diagnostic tests. Even with advanced diagnostic technology, such as cardiac CT scans, nuclear stress tests, and electrophysiology (EP) studies for diagnosing inducible arrhythmias, the possibility of false-positive and false-negative test results exists. The accuracy of a diagnostic test depends on both its sensitivity and its specificity.

Sensitivity
Sensitivity is the probability of a positive test result in patients who have the condition. It is calculated as: true positives ÷ (true positives + false negatives) (Table 1). The higher a test's sensitivity, the lower the chance of missing the disease. A very sensitive test, when negative, rules out disease. A helpful mnemonic for this is SNOUT: a SeNsitive test is good for ruling disease OUT.

Specificity
The specificity of a test is the test's ability to identify individuals who do not have disease. More precisely, specificity is the probability of a negative test result in a patient who does not have the condition being measured. Specificity is calculated as: true negatives ÷ (true negatives + false positives) (Table 1). The higher a test's specificity, the fewer normal individuals are misdiagnosed as having the disease. A very specific test, when positive, rules in disease. A helpful mnemonic for this is SPIN: a SPecific test rules disease IN.

Positive and Negative Predictive Values


Performance of a diagnostic test can also be assessed by the positive predictive value (PPV) and the negative predictive value (NPV). The PPV is the probability that a patient whose test is positive actually has the disease. It is calculated as: true positives ÷ (true positives + false positives). The NPV is the probability that a patient whose test is negative does not have the disease. It is calculated as: true negatives ÷ (true negatives + false negatives) (Table 1). Unlike sensitivity and specificity, which tend to be inherent characteristics of the diagnostic test, PPV and NPV depend on the prevalence of disease in the population being tested.


Table 1: Summary of Sensitivity, Specificity, Positive Predictive Value, and Negative Predictive Value

                  Patients with disease    Patients without disease
Positive test     a                        b
Negative test     c                        d

Sensitivity = a/(a + c)
Specificity = d/(b + d)
Positive predictive value = a/(a + b)
Negative predictive value = d/(c + d)
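The Table 1 definitions translate directly into code. The following is a minimal sketch; the cell counts used to exercise it are taken from the stress-test example later in this section.

```python
def test_characteristics(a, b, c, d):
    """a = true positives, b = false positives, c = false negatives, d = true negatives (Table 1)."""
    sensitivity = a / (a + c)
    specificity = d / (b + d)
    ppv = a / (a + b)   # depends on disease prevalence in the tested population
    npv = d / (c + d)   # depends on disease prevalence in the tested population
    return sensitivity, specificity, ppv, npv

# Counts from the stress-test example below (45, 95, 5, 855).
sens, spec, ppv, npv = test_characteristics(45, 95, 5, 855)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}, PPV = {ppv:.2f}, NPV = {npv:.2f}")
```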

Bayes Theorem

Another method for calculating the predictive value of a positive test uses Bayes theorem, a mathematical formula. Bayes theorem is used in clinical medicine to determine the probability of a particular disease in a group of people with a specific characteristic, such as a positive test result. It considers the overall rate of that disease and the likelihood of that specific characteristic in healthy versus diseased individuals. Bayes theorem can be used in clinical decision making to estimate the probability of a particular diagnosis, given the presence of a particular symptom, sign, or abnormal test result. One example is the likelihood of having coronary artery disease in a patient with an abnormal stress test. This likelihood depends on the patient's pretest probability of disease, as well as on the accuracy of the stress test. In a patient with a low pretest probability of disease, an abnormal stress test may be just as likely to be a false positive as a true positive. Conversely, a patient with an extremely high risk of having coronary disease may still be more likely to have coronary disease than not, even if his stress test is normal.



Likelihood Ratio
The likelihood ratio is an alternative method for incorporating information about the sensitivity and specificity of a test, and uses odds rather than probabilities. It can be calculated for both a positive test result and a negative test result. The positive likelihood ratio (+LR) can be used to calculate the odds of disease if a test is positive. It is calculated as the probability of a positive test result in patients with the disease (i.e., the sensitivity) divided by the probability of a positive result in patients without the disease (i.e., the false-positive rate, or 1 − specificity). The post-test odds of disease after a positive test can then be calculated by multiplying the likelihood ratio by the prior, or pretest, odds (i.e., [+LR] × pretest odds = post-test odds). The post-test odds can be converted to a post-test probability by dividing them by (1 + post-test odds). This result is the same as the PPV (i.e., the probability that a patient whose test is positive actually has the disease). The negative likelihood ratio (−LR) is defined as: (1 − sensitivity) ÷ specificity. It can be used to find the odds of disease when a test is negative.

Example: Suppose a cardiovascular stress test has a sensitivity and a specificity of 90% for detection of coronary artery disease. A 30-year-old woman with no cardiovascular risk factors but with atypical chest pain is referred for this test. Her pretest probability of coronary artery disease is low (assume 5% for the purposes of this exercise). If the test comes back as abnormal, what is the likelihood that the patient actually has coronary artery disease?

Answer: First, generate a 2 × 2 contingency table (Table 2). If 1,000 low-risk individuals are tested, 50 would be expected to have disease, while 950 would be disease-free. Since the test is 90% sensitive, the number of individuals with a true-positive test result will be 0.90 × 50 = 45, leaving 5 patients (i.e., 50 − 45) with a false-negative result. Since the test is also 90% specific, the number with a true-negative result will be 0.90 × 950 = 855, leaving 95 patients (i.e., 950 − 855) with a false-positive result. Thus, 140 patients overall will have positive stress test results, and only 45 patients will be true positives. In other words, an abnormal stress test in a low-risk patient is more than twice as likely to be a false positive as a true positive (i.e., PPV = 45/140 = 32%). The test will be even less helpful if the true pretest probability is lower than 5%.

Table 2: 2 × 2 Contingency Table Illustrating Bayes Theorem

Disease Status         Positive Test    Negative Test    Total
Coronary disease       45               5                50
No coronary disease    95               855              950
Total                  140              860              1,000

Given: Sensitivity = 90%, Specificity = 90%, Pretest probability = 5%
Calculations:
PPV = true positives ÷ all positive tests = 45/140 = 0.32
NPV = true negatives ÷ all negative tests = 855/860 = 0.99
+LR = sensitivity ÷ (1 − specificity) = 0.90/(1 − 0.90) = 9
Pretest odds = pretest probability ÷ (1 − pretest probability) = 0.05/(1 − 0.05) = 0.053
Post-test odds = pretest odds × (+LR) = 0.053 × 9 = 0.48
Post-test probability = post-test odds ÷ (1 + post-test odds) = 0.48/(1 + 0.48) = 0.32 (same as the PPV)
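The same likelihood-ratio arithmetic can be wrapped in a short function; this is a sketch that reproduces the Table 2 calculations from the pretest probability and the test's operating characteristics.

```python
def post_test_probability(pretest_prob, sensitivity, specificity):
    """Convert a pretest probability to a post-test probability for a positive result."""
    lr_positive = sensitivity / (1 - specificity)          # +LR
    pretest_odds = pretest_prob / (1 - pretest_prob)       # probability -> odds
    post_test_odds = pretest_odds * lr_positive            # Bayes update on the odds scale
    return post_test_odds / (1 + post_test_odds)           # odds -> probability

# Stress-test example: 90% sensitive, 90% specific, 5% pretest probability.
print(f"Post-test probability = {post_test_probability(0.05, 0.90, 0.90):.2f}")  # about 0.32, same as the PPV
```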

Receiver Operating Characteristic Curves

Sensitivity, specificity, and the parameters outlined earlier work well for evaluating diagnostic tests with binary results (i.e., tests in which results are either normal or abnormal). However, a large number of cardiovascular tests have numeric results with a wide range of possible values (e.g., cholesterol levels, natriuretic peptides, and cardiac troponins). For continuous tests like these, the sensitivity and specificity of the test depend on which cut point is used to distinguish normal from abnormal. Receiver operating characteristic (ROC) curves are used to display the relationship between sensitivity and specificity for tests with continuous outcomes. ROC curves are plotted with the sensitivity (i.e., the true-positive rate) on the y-axis and the false-positive rate (i.e., 1 − specificity) on the x-axis. In essence, ROC curves are a function of the sensitivity and specificity for each individual value of a test (Figure 1). ROC curves can be used to assess discrimination, which is one way to assess the usefulness of a continuous test. Discrimination is defined as the probability that the predicted risk (i.e., test value) is higher for someone with disease than for someone without disease. A useful measurement derived from ROC curves is the area under the ROC curve (AUC), which is a measure of discrimination. A test with an AUC of 1.0 would have perfect discrimination, while a test with an AUC of 0.5 would be no better than tossing a coin. The term c-statistic is often used interchangeably with AUC. The c-statistic can be used to compare two different tests or two different models (e.g., one with standard risk factors alone vs. a second that also includes a novel risk marker).2 However, a number of researchers have recently warned against using a comparison of c-statistics as the sole measure of worth when evaluating a novel marker or test.3

Figure 1: Receiver Operating Characteristic (ROC) Curve and Area Under the Curve (AUC). This figure presents an ROC curve for various cutoff levels of B-type natriuretic peptide (BNP; 50, 80, 100, 125, and 150 pg/ml) used in diagnosing acute heart failure among emergency department patients with acute dyspnea; the area under the curve is 0.91 (95% confidence interval, 0.90-0.93). Reproduced with permission from Maisel AS, Krishnaswamy P, Nowak RM, et al., on behalf of the Breathing Not Properly Multinational Study Investigators. Rapid measurement of B-type natriuretic peptide in the emergency diagnosis of heart failure. N Engl J Med 2002;347:161-7.
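The definition of discrimination above can be computed directly: the AUC equals the probability that a randomly chosen subject with disease has a higher test value than a randomly chosen subject without disease (ties counted as one half). The sketch below uses invented BNP values purely for illustration.

```python
import numpy as np

def auc_by_ranking(values_disease, values_no_disease):
    """Probability that a diseased subject's value exceeds a non-diseased subject's (ties = 0.5)."""
    wins = 0.0
    for x in values_disease:
        for y in values_no_disease:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(values_disease) * len(values_no_disease))

# Hypothetical BNP values (pg/ml) in patients with and without acute heart failure.
bnp_hf = np.array([620, 480, 150, 890, 300, 410])
bnp_no_hf = np.array([40, 95, 120, 60, 180, 75])
print(f"AUC = {auc_by_ranking(bnp_hf, bnp_no_hf):.2f}")
```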

Discrimination and Reclassification


Recently, clinical researchers who evaluate diagnostic tests have begun placing more emphasis on measures of reclassification.4 Reclassification is the ability of a diagnostic test to more accurately stratify individuals into higher and lower risk categories. It is essentially a more clinically relevant form of discrimination. A better prediction ability is implied by an increased probability that patients with disease will be categorized as high risk, and a decreased probability that patients without disease will be categorized as high risk. The opposite situation implies a worse prediction ability. The net reclassification index (NRI) is one method for assessing discrimination in this way. Using pre-defined risk categories, it measures how the new test result induces changes from one risk category to another. It is a measure of the difference in proportions moving up and down among people with disease versus those without disease. The integrated discrimination index (IDI) considers the change in the estimated predicted probability of disease as a continuous variable, rather than via categories.5,6
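As a minimal sketch of the category-based NRI (under the stated assumption that each model assigns every subject to a pre-defined risk category, with hypothetical data), the calculation sums the net proportion moving up among those with events and the net proportion moving down among those without events:

```python
def net_reclassification_index(old_cat, new_cat, event):
    """Category-based NRI: old_cat/new_cat are risk-category indices, event is True/False per subject."""
    up_event = down_event = up_nonevent = down_nonevent = 0
    n_event = sum(event)
    n_nonevent = len(event) - n_event
    for old, new, had_event in zip(old_cat, new_cat, event):
        if had_event:
            up_event += new > old
            down_event += new < old
        else:
            up_nonevent += new > old
            down_nonevent += new < old
    return (up_event - down_event) / n_event + (down_nonevent - up_nonevent) / n_nonevent

# Hypothetical categories (0 = low, 1 = intermediate, 2 = high risk) before and after adding a marker.
old = [0, 1, 1, 2, 0, 1, 2, 0]
new = [1, 2, 1, 2, 0, 0, 2, 0]
events = [True, True, False, True, False, False, True, False]
print(f"NRI = {net_reclassification_index(old, new, events):.2f}")
```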


Comparing Groups
Comparing Two Groups
The appropriate test to use to evaluate differences between two or more independent groups depends on a number of factors, including whether the data are continuous or categorical, and whether the data are normally distributed. When the research question asks whether the means of two groups are equal, the most commonly used test is the t test, also called the Student t test. When the question is whether the proportions in two independent groups are equal, the most commonly used test is the chi-square test.
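Both tests are available in SciPy; the sketch below uses invented cholesterol values and event counts purely to illustrate the calls.

```python
import numpy as np
from scipy import stats

# t test: compare mean LDL cholesterol (mg/dl) between two hypothetical groups.
ldl_treated = np.array([92, 88, 101, 95, 85, 90, 98, 87])
ldl_control = np.array([110, 118, 105, 122, 99, 115, 108, 112])
t_stat, p_ttest = stats.ttest_ind(ldl_treated, ldl_control)

# Chi-square: compare the proportion with an event between two groups (2 x 2 table of counts).
table = np.array([[15, 485],    # treated: events, non-events
                  [25, 475]])   # control: events, non-events
chi2, p_chi2, dof, expected = stats.chi2_contingency(table)

print(f"t test p = {p_ttest:.4f}; chi-square p = {p_chi2:.3f}")
```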




Correlation
The correlation coefficient (denoted by r), or correlation, is a measure of the relationship between two numerical measurements made on the same set of subjects. A positive r value greater than zero describes two variables that are directly and positively associated, whereas a negative r value implies an inverse association. When the correlation coefficient is squared (denoted by r2), it provides a more intuitive way to think about correlations. For example, an r2 of 0.58 means that 58% of the variability in one of the measures is accounted for (i.e., predicted) by knowing the measurement of the other variable. Example: If the correlation (r) between age and B-type natriuretic peptide (BNP) level is 0.2, then r2 = (0.2)2 = 0.04. Thus, 4% of the variability in BNP level is due to age.
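The example above is easy to reproduce; this sketch uses invented paired age and BNP values.

```python
import numpy as np

# Hypothetical paired measurements: age (years) and BNP (pg/ml) in the same subjects.
age = np.array([45, 52, 60, 67, 71, 58, 49, 75])
bnp = np.array([20, 35, 28, 60, 85, 40, 22, 95])

r = np.corrcoef(age, bnp)[0, 1]   # Pearson correlation coefficient
r_squared = r ** 2                # proportion of variability in BNP explained by age

print(f"r = {r:.2f}, r^2 = {r_squared:.2f}")
```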

Linear and Logistic Regression

Regression models are used to predict the value of one characteristic (i.e., the dependent variable) based on knowledge of one or more other variables (i.e., the independent variables). An example is using cardiac risk factors (e.g., age, cholesterol level, blood pressure) to predict the risk of coronary artery disease. Linear regression may be used when the dependent variable (the variable being predicted) is a continuous numeric variable. For example, linear regression may be used to predict systolic blood pressure from age, weight, and sex. Linear regression modeling assumes that the relationships between the dependent and independent variables are linear, that the sample used to develop the model can be generalized to the population it is supposed to represent, and that the scatter of the data points about the regression line is normally distributed. When the dependent variable is a binary variable (e.g., presence vs. absence of coronary disease), logistic regression is used. Logistic regression is commonly used in cardiovascular research for evaluating outcomes and diagnostic tests. For example, a new biomarker may appear to predict the risk of heart failure, but logistic regression may show that, after adjusting for confounding factors associated with both the biomarker level and the outcome of interest (e.g., age, sex, renal function), the biomarker is no longer a significant predictor.

Multivariable, or multiple, regression simply means that more than one independent variable is included in the prediction equation. Multivariable techniques allow researchers to adjust for (i.e., weight) the relative contributions of each independent variable on the dependent variable. Multivariable models provide only estimates of the true relationship; models created from larger sample sizes carry less uncertainty than models created from smaller ones. Models can be overfit by attempting to adjust for too many independent variables. For logistic regression models, a good rule of thumb is that 10 outcome events are needed for each independent variable included in the model.
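The confounding scenario described above can be sketched with statsmodels; the dataset is simulated and the variable names (biomarker, age, sex, egfr, hf) are hypothetical, chosen only for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated data: heart failure (hf) depends on age and renal function (egfr),
# and the biomarker tracks age -- so its crude association with hf is confounded.
rng = np.random.default_rng(1)
n = 500
age = rng.normal(70, 10, n)
egfr = rng.normal(60, 15, n)
sex = rng.integers(0, 2, n)
biomarker = 0.05 * age + rng.normal(0, 1, n)
logit_p = -10 + 0.12 * age - 0.02 * egfr
hf = rng.random(n) < 1 / (1 + np.exp(-logit_p))
df = pd.DataFrame({"hf": hf.astype(int), "biomarker": biomarker,
                   "age": age, "sex": sex, "egfr": egfr})

# Unadjusted model vs. model adjusted for the confounders.
crude = smf.logit("hf ~ biomarker", data=df).fit(disp=False)
adjusted = smf.logit("hf ~ biomarker + age + sex + egfr", data=df).fit(disp=False)
print("crude coefficient, p:", crude.params["biomarker"], crude.pvalues["biomarker"])
print("adjusted coefficient, p:", adjusted.params["biomarker"], adjusted.pvalues["biomarker"])
```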

Propensity Analysis

Propensity analysis is an alternative to multivariable regression that is used to control for a group of confounding factors. Propensity scores provide an alternative method for estimating treatment effects when treatment assignment is not random. In randomized trials, randomization generates treatment groups that are generally balanced for confounding factors, especially when large numbers of patients are involved. In observational studies, however, assignment of patients to treatment groups is frequently imbalanced for a number of measured and unmeasured variables (i.e., covariates) and can therefore provide a biased estimation of treatment or exposure effects. This makes it difficult to confidently attribute any differences between the groups to the treatment. Propensity analysis attempts to reduce the effect of confounding variables by developing a propensity score based on possible confounders and then adjusting predictive models with that score. Propensity scores can also be used in retrospective studies to match individuals, by selecting and comparing individuals with similar propensity scores. A disadvantage of propensity analysis is that, despite including a large number of covariates, propensity scores may still leave significant biases unaccounted for; no score can account for all potential measurable and unmeasurable confounders.
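The following is a minimal, greedy 1:1 matching sketch, not a full propensity analysis: the score is estimated with logistic regression, and each treated subject is paired with the untreated subject whose score is closest. The data are simulated and the covariates are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
n = 400
age = rng.normal(65, 10, n)
diabetes = rng.integers(0, 2, n)
# Treatment assignment is not random: older, diabetic patients are treated more often.
p_treat = 1 / (1 + np.exp(-(-6 + 0.08 * age + 0.8 * diabetes)))
treated = rng.random(n) < p_treat

# Step 1: estimate each patient's propensity score from the measured covariates.
X = np.column_stack([age, diabetes])
propensity = LogisticRegression().fit(X, treated).predict_proba(X)[:, 1]

# Step 2: greedy 1:1 nearest-neighbor matching on the propensity score.
treated_idx = np.where(treated)[0]
control_idx = list(np.where(~treated)[0])
pairs = []
for i in treated_idx:
    j = min(control_idx, key=lambda k: abs(propensity[k] - propensity[i]))
    pairs.append((i, j))
    control_idx.remove(j)          # match each control at most once
    if not control_idx:
        break

print(f"{len(pairs)} matched pairs formed")
```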


Analyzing Survival Data

Many cardiovascular studies are designed to evaluate whether a new treatment will improve outcomes. In this situation, investigators may be interested in long-term outcomes including the incidence of hospitalization, MI, or death. When research involves time-related variables like these, the timing of the event, not just the fact that it happened, is important. For example, in a study comparing survival times between groups of individuals who have undergone a standard procedure versus a novel procedure, a death on the first day after the procedure carries a different weight than a death that occurs after five years. Survival analysis provides a way to account for this type of time-dependent data: rather than counting the number of events, the time it takes for an event to happen is considered. Sometimes there may be limited follow-up data for a subject in a long-term study. For example, if a patient is lost to follow-up, researchers may not know whether the patient had the outcome of interest. In other situations, a patient may have died from causes unrelated to the cardiovascular disease being studied (e.g., dying in a motor vehicle accident). Rather than excluding these patients, survival analysis uses a method called censoring. Censoring is also used when analysis of survival is done while some patients in the study are still living; the observations on these patients are censored at the end of the follow-up period, since one does not know how long these patients will remain alive. Censoring allows the data to be included up until the last follow-up time that was available.

Kaplan-Meier Plots

Kaplan-Meier plots are used to depict survival data that include censored observations (Figure 2). They show the proportion of patients who remain free of a given outcome, such as death, over time. Since the number of patients available for such an analysis decreases with time, estimates of survival become less precise over time. The log-rank test, also called the Mantel-Cox test, is often used to compare the survival distributions of two groups. The hazard ratio can be calculated for survival data and indicates the risk of an outcome at any time in the group with an exposure (i.e., the treatment group) compared with the risk in the control group.

Figure 2: Kaplan-Meier Survival Plot. This figure presents a Kaplan-Meier plot depicting survival in Rancho Bernardo Study participants with detectable versus undetectable cardiac troponin T (p < 0.001). TnT = troponin T. Reproduced with permission from Daniels LB, Laughlin GA, Clopton P, Maisel AS, Barrett-Connor E. Minimally elevated cardiac troponin T and elevated N-terminal pro-B-type natriuretic peptide predict mortality in older adults: results from the Rancho Bernardo Study. J Am Coll Cardiol 2008;52:450-459.
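To show how censoring enters the Kaplan-Meier estimate, the sketch below hand-rolls the product-limit calculation for a handful of hypothetical follow-up times: censored subjects leave the risk set without being counted as events.

```python
def kaplan_meier(times, events):
    """Return (time, survival) pairs; events[i] is 1 for death, 0 for a censored observation."""
    order = sorted(range(len(times)), key=lambda i: times[i])
    at_risk = len(times)
    survival = 1.0
    curve = []
    for i in order:
        if events[i] == 1:                       # an observed death at this time
            survival *= (at_risk - 1) / at_risk  # product-limit step
            curve.append((times[i], survival))
        at_risk -= 1                             # deaths and censored subjects both leave the risk set
    return curve

# Hypothetical follow-up (years); 0 = censored (alive at last contact or lost to follow-up).
times = [1.0, 2.5, 3.0, 3.0, 4.2, 5.0, 6.1, 7.0]
events = [1,   0,   1,   1,   0,   1,   0,   1]
for t, s in kaplan_meier(times, events):
    print(f"t = {t:.1f} y, estimated survival = {s:.2f}")
```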

Cox Proportional Hazards Models


Proportional hazards models are a type of survival model that permits researchers to control for confounding variables. Survival models relate the time to an event to one or more associated covariates. In a proportional hazards model, the assumption is that the effect of the covariate is multiplicative with respect to the hazard rate. For example, if taking drug X reduces the risk of death at 30 days by 25%, it also reduces the risk by 25% at 6 months, at 1 year, or at any other time. Cox proportional hazards models are frequently used in cardiovascular literature to assess the effect of a new treatment on outcomes, after adjusting for possible confounders such as patient age, sex, and other comorbid conditions.
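A Cox model of this kind can be sketched with the third-party lifelines package (an assumption for illustration; it is not part of this module), here adjusting a simulated treatment effect for age and sex:

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter  # third-party survival-analysis package

rng = np.random.default_rng(3)
n = 300
treated = rng.integers(0, 2, n)
age = rng.normal(70, 8, n)
sex = rng.integers(0, 2, n)
# Simulated exponential event times: treatment lowers the hazard, older age raises it.
hazard = 0.02 * np.exp(-0.4 * treated + 0.03 * (age - 70))
time_to_event = rng.exponential(1 / hazard)
observed = time_to_event < 10          # administrative censoring at 10 years
time = np.minimum(time_to_event, 10)

df = pd.DataFrame({"time": time, "event": observed.astype(int),
                   "treated": treated, "age": age, "sex": sex})
cph = CoxPHFitter().fit(df, duration_col="time", event_col="event")
print(cph.summary[["exp(coef)", "p"]])   # adjusted hazard ratios and p values
```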


Key Points
- Medical research studies can be observational (i.e., case series, case-control, cross-sectional, and cohort studies) or interventional (i.e., clinical trials), and provide varying levels of evidence depending on the design.
- A statistically significant difference between two groups can be identified either by comparing the p value against the stated alpha or by determining that the confidence interval excludes a finding of no difference.
- Those taking the American Board of Internal Medicine examination should be able to calculate sensitivity, specificity, PPV, and NPV, and should understand the concepts of Bayes theorem and post-test probability.
- ROC curves and the AUC are two ways to assess the usefulness of a diagnostic test.
- T tests are used to compare means between two independent groups; the chi-square test is used to compare proportions between two groups.
- The square of the correlation coefficient, r2, tells the percentage of the variability in one measure that is predicted by knowing the measurement of the other variable.
- Linear regression and logistic regression are used to predict the value of a dependent variable (i.e., outcome measure) from one or more independent variables (i.e., covariates).
- Kaplan-Meier plots can be used to plot survival or other time-dependent data.
- Cox proportional hazards models are used with survival data to adjust for confounding variables.

References
1. Hlatky MA, Greenland P, Arnett DK, et al. Criteria for evaluation of novel markers of cardiovascular risk: a scientific statement from the American Heart Association. Circulation 2009;119:2408-16.
2. Pepe MS, Janes H, Longton G, Leisenring W, Newcomb P. Limitations of the odds ratio in gauging the performance of a diagnostic, prognostic, or screening marker. Am J Epidemiol 2004;159:882-90.
3. Cook NR. Use and misuse of the receiver operating characteristic curve in risk prediction. Circulation 2007;115:928-35.
4. Ridker PM. C-reactive protein and the prediction of cardiovascular events among those at intermediate risk: moving an inflammatory hypothesis toward consensus. J Am Coll Cardiol 2007;49:2129-38.
5. Pencina MJ, D'Agostino RB Sr, D'Agostino RB Jr, Vasan RS. Evaluating the added predictive ability of a new marker: from area under the ROC curve to reclassification and beyond. Stat Med 2008;27:157-72; discussion 207-12.
6. Pencina MJ, D'Agostino RB Sr, Steyerberg EW. Extensions of net reclassification improvement calculations to measure usefulness of new biomarkers. Stat Med 2011;30:11-21.

