Documente Academic
Documente Profesional
Documente Cultură
Bias from misclassification can occur if the effect mesure for the correctly
classified data is meaningfully different from the estimated effect actually
observed in misclassified data.
Subjects are misclassified if their location in one of the four cells of yhe
correctly classified data changes to a different cell location in the
(misclassified) data that is actually observed.
Missclassification Table
True
D Not D
D 6 3 9
Classified
Not D 2 7 9
8 10
Study Questions (Q9.1) the following questions refer to the previous table.
1. How many truly diseased persons are misclassified?
Summary
Misclassification of disease status may occur from any of the following
sources: incorrect diagnosis, subject self-report, and coding errors.
1. Measurements are usually made at only one time point and/or in one
location of a residence.
Exposure error may occur when exposure is determined solely from self-
report by subjects, particularly when recalling prior exposure status. This is
typically a problem in case-control studies, since cases may be more motivated to
recall past exposures than controls. Recall error can also occur in cohort studies.
For example, in the Sydney Beach Users study described in Chapter 2, subjects
were asked to report their swimming may have occurred.
Summary
Misclassification of exposure status may occur from any of the
following sources: imprecise measurement, subject self-report, interviewer
bias, and incorrect coding of exposure data
A misclassification table provides a convenient summary of
how exposure can be misclassified from true exposure to observed exposure.
Misclassification can sometimes occur for both exposure and disease in the same
study. For axample, the table to the right considers hypothetical cohort data from
subject surveyed on the beaches of Sydney, Australian during the summer months
of a recent year. The study objective was’ to determine if those who swam at the
beach were more likely to become ill than those who did not swim.
264 56 30 130
Ill
27 33 3 77
66 14 7 33
Not Ill
243 297 27 693
Observed Data
Swam’ Not Swam
Summary
Misclassification can sometimes occur for both exposure and
disease in the same study.
An example of such misclassification is likely if both the exposure
variable and the disease variable are determined by subject recall.
When there is misclassification of both exposure and disease, the
observed data results from how the cell frequencies in each of the four cell of
the 2x2 table for the true data get split up into data the four cell of the 2x2
table for the observed data.
D’ 48 14 62
Chapter 9 Information Bias CLASSIFIED Page 8
Not D’ 12 126 138
60 140
Twelve subjects who were truly diseased were misclassified as no-diseased and
14subjects who were truly not diseased were misclassified as diseased. 48 subjects
who were truly diseased and 125 subjects who were truly non-diseased were
correctly classified.
In a perfect world, we would Perfect World
hope that no want was misclassified, True Proportion
D Not D
that is, we would want our table to D 60 0 60 60 = 1
look like the table below. Then the Classified 60
Not D 0 140 140 140 = 1
proportion correctly classified as 60 140
diseased would be 1 and the
proportion correctly classified as non-diseased is also 1.
In the real world, however,
48/60=80 sensitivity = prob(D’|D ) Given
Sensitivity = prob(E’|E)
Specificity = prob(not E’|E)
Summary
The underlying parameters that must be considered when assessing
information bias are called sensitivity and specificity.
Sensitivity gives the probability that a subject who is truly
diseased (or exposed) will be classified as diseased (or exposed) in one’s
study.
Specificity gives the probability that a subject who is truly non-
diseased (or unexposed) is classified as non-diseased (or unexposed) in one’s
study.
Yhe ideal value for bothsensitivity and specificity is 1, or
100%,which means there isno misclassification.
100
300 700 0
Misclassifying Exposure
500 x 725
OR’ = = 2.6
500 x 275
5. Why is there bias due to misclassifying exposure (Note: the correct odds ratio
is 3.5)?
6. What is the direction of the bias?
This example illustrates a general rule about non differential
misclassification. Whenever there is non differential misclassification of both
exposure and disease, the bias is always towards the null, provided that there are
no other variables being controlled that might also be misclassified.
General rule
ˆ Non-Differential Miscalssification
of Both Exposure and Disease
OR OR’ 1 OR’ OR
null
Chapter 9 Information Bias Page 12
Summary
Nondifferential misclassification of disease the sensitivities and specificities
for misclassifying disease do not differ by exposure.
Nondifferential misclassification of exposure the sensitivities and
specificities for misclassifying exposure do not differ by disease
Nondifferential misclassification of both disease and exposure lead to a bias
toward the null.
Differential Misclassification
This table describes the true exposure and disease status for the same 2000
subjects desribed in the previous section for a hypothetical case-control study of
the relationship between diet and coronary heart disease :
The row totals from each of the misclassification table for exposure allow
us to determine the study data that would actually be observed as a result of
misclassification :
600 x 725
OR’= = 3.95
400 x 275
OR OR’ 1 OR’ OR
With differential misclassification either the sensitivities and specificities for
misclassifying D differ by E or the sensitivities and specificities for
misclassifying E differ by D.
Differential misclassification of either D or E towards the null or away from
the null.
To quantify bias, we need to adjust the cell frequencies for the observed
data to obtain a two-way table of corrected cell frequencies from which we can
compute a corrected effect measure. We can then compare the observed and
possibly biased estimate with our corrected estimate to determine the extent of the
possible bias and its direction.
Study Question (Q9.7) Use the information in the table below about the
observed and corrected effect measure to determine whether the observed bias is
towards or away from the null.
Suppose we have determined thet whatever bias that exists result from non
differential misclassification of exposure or disease.
5. Which values of sensitivity and specificity are associated with the most
bias?
6. Which values of sensitivity and specificity are associated with the least
bias?
7. Based on the sensitivity analysis described above, how might you decide
which corrected OR is the “best”?
Summary
2. If the sensitivity had been ) 0,99, what could you conclude about a truly
diseased patient who had a negative test result?
Specificity describes the tests performance among patients who are truly
without the disease. It is defined as the conditional probability of a negative test
result given the absence of disease, P(Test – I True -).
4. If the sensitivity had been 0,99, what would you conclude about a truly non
diseased patient who had a positive test result?
Summary
• The procedure used to define true disease is called the gold standard.
The probability of true disease status for an individual patient given the result of a
diagnostic test is called the test’s predictive value. The predictive value is
particularly useful to the clinician for individual patient diagnosis because it
directly estimates the probability that the patient truly does or does not have the
disease depending on the results of the diagnostic test. That’s what the clinician
wants to know
Study Questions (Q9.9) Suppose T+ denotes the event that a patient truly has a
disease condition of interest, whereas D+ denotes the event of a positive
diagnostic test result in the same patient.
+ - total
Diagnosis + 48 14 62
Test result (D)
- 12 126 138
Total 60 140 200
3. Based on the data in the misclassification table, what is the estimate of the
average patient’s probability of having DVT prior to performing an
ultrasound?
4. Has the use of an ultrasound improved disease diagnosis for persons with
positive ultrasound results? Explain briefly.
6. Based on the data in the misclassification table, what is the estimate of the
average patient’s probability of not having DVT prior to performing an
ultrasound?
7. Has the use of an ultrasound improved disease diagnosis for persons with
negative ultrasound results? Explain briefly
Summary
• The predictive value (or post-test probability) is the probability of true
disease status given the result of a diagnostic test.
• The predictive value can be obtained directly from the misclassification
table generated by a diagnostic test study.
Quiz (Q9.10) For the classification table shown on the right, determine each of
the following :
Truth
D not D
D 90 20 110
Classified not D 10 180 190
100 200 300
Once again, the sensitivity and specificity for the classification table shown on the
right are 90%. For this table, answer each of the following.
8 what is the prevalence of the
Truth
disease? D not D
9 What is the positive predictive D 90 90 180
Classified not D 10 810 820
values?
100 900 1000
10 The prevalence in this table is
smaller than in the previous two tables, therefore, the positive value is ???
than in the previous two tables
11 These results illustrate the fact that if the prevalence is small, the
predictive value can be quite ??? even if the sensitivity and specificity
parameters are quite ???
Nomenclature
Truth
E not E
E
Classified
No
tE