
Stat 653 HW3 Divya Nair

Exercise 1 (2.1). An article in the New York Times (February 17, 1999) about the PSA blood test for
detecting prostate cancer stated that, of men who had this disease, the test fails to detect prostate cancer
in 1 in 4 (so called false-negative results), and of men who did not have it, as many as two-thirds receive
false-positive results. Let C(C̄) denote the event of having (not having) prostate cancer and let +(−) denote
a positive (negative) test result.

a. Which is true: $P(- \mid C) = \frac{1}{4}$ or $P(C \mid -) = \frac{1}{4}$? $P(\bar{C} \mid +) = \frac{2}{3}$ or $P(+ \mid \bar{C}) = \frac{2}{3}$?

Solution. The statement "of men who had this disease, the test fails to detect prostate cancer in 1 in 4"
conditions on having the disease, so it means precisely that $P(- \mid C) = \frac{P(- \cap C)}{P(C)} = \frac{1}{4}$.
Similarly, "of men who did not have it, as many as two-thirds receive false-positive results" conditions on
not having the disease, so it means precisely that $P(+ \mid \bar{C}) = \frac{P(+ \cap \bar{C})}{P(\bar{C})} = \frac{2}{3}$.
Hence, $P(- \mid C) = \frac{1}{4}$ and $P(+ \mid \bar{C}) = \frac{2}{3}$ are true.

b. What is the sensitivity of this test?

Solution. Sensitivity is the probability that the diagnostic test is positive given that a subject has the
disease. Using the complement rule for conditional probability and the known probability from part
(a), $P(+ \mid C) = 1 - P(- \mid C) = 1 - \frac{1}{4} = \frac{3}{4}$.

c. Of men who take the PSA test, suppose P (C) = 0.01. Find the cell probabilities in the 2 × 2 table for
the joint distribution that cross classifies Y = diagnosis (+, −) with X = true disease status (C, C̄).

Solution. The 2×2 table with all the cell probabilities is given below.

                             Diagnosis
True Disease Status      +          −         Total
C                      0.0075     0.0025      0.01
C̄                      0.66       0.33        0.99
Total                  0.6675     0.3325      1

The values in this table are filled in the following way:

Since $P(C) = 0.01$, its complement is $P(\bar{C}) = 1 - P(C) = 0.99$. This fills in all the values in the third
column.

Next, applying the definition of conditional probability to $P(- \mid C)$ and using the known probability
from part (a), we have $P(- \cap C) = P(- \mid C) \cdot P(C) = \frac{1}{4} \times 0.01 = 0.0025$. Consequently,
$P(+ \cap C) = 0.01 - 0.0025 = 0.0075$. These calculations fill in all the values in the first row.

Using the complement rule for conditional probability, we have $P(- \mid \bar{C}) = 1 - P(+ \mid \bar{C}) = 1 - \frac{2}{3} = \frac{1}{3}$.
Also, applying the definition of conditional probability to $P(- \mid \bar{C})$ gives
$P(- \cap \bar{C}) = P(- \mid \bar{C}) \cdot P(\bar{C}) = \frac{1}{3} \times 0.99 = 0.33$. Hence, $P(+ \cap \bar{C}) = 0.99 - 0.33 = 0.66$,
$P(+) = 0.0075 + 0.66 = 0.6675$, and $P(-) = 0.0025 + 0.33 = 0.3325$. This completes the table.

d. Using (c), find the marginal distribution for the diagnosis.

Solution. As computed in part (c), P (+) = 0.6675 and P (−) = 0.3325.

e. Using (c) and (d), find $P(C \mid +)$, and interpret.

Solution. As computed in parts (c) and (d), $P(C \mid +) = \frac{P(C \cap +)}{P(+)} = \frac{0.0075}{0.6675} = 0.01124$. This means
that, among men who test positive, the probability of actually having prostate cancer is only
0.01124, or about 1.1%.
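
As a numerical check of parts (c) through (e), the joint table can be rebuilt from the three given quantities. The following is a minimal Python sketch; the variable names and the dictionary layout are illustrative choices, not part of the original solution.

```python
# Inputs taken from the exercise: prevalence, false-negative rate, false-positive rate.
p_c = 0.01                 # P(C)
p_neg_given_c = 1 / 4      # P(- | C)
p_pos_given_not_c = 2 / 3  # P(+ | not C)

# Joint probabilities: P(test result and disease status) = P(result | status) * P(status).
joint = {
    ("C", "+"): (1 - p_neg_given_c) * p_c,
    ("C", "-"): p_neg_given_c * p_c,
    ("notC", "+"): p_pos_given_not_c * (1 - p_c),
    ("notC", "-"): (1 - p_pos_given_not_c) * (1 - p_c),
}

# Marginal distribution of the diagnosis (column totals of the table).
p_pos = joint[("C", "+")] + joint[("notC", "+")]
p_neg = joint[("C", "-")] + joint[("notC", "-")]

# Conditional probability of disease given a positive test.
p_c_given_pos = joint[("C", "+")] / p_pos

print(f"P(+) = {p_pos:.4f}, P(-) = {p_neg:.4f}, P(C | +) = {p_c_given_pos:.5f}")
```

The printed values can be compared with the table in part (c) and the answer in part (e).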


Exercise 2 (2.2). For diagnostic testing, let X = true status (1 = disease, 2 = no disease) and Y =
diagnosis (1 = positive, 2 = negative). Let πi = P (Y = 1 | X = i), i = 1, 2.

a. Explain why sensitivity = π1 and specificity = 1 − π2 .

Solution. Sensitivity is the probability that the diagnostic test is positive given that a subject has the
disease, that is, P (Y = 1 | X = 1). Said differently, sensitivity is the probability of success for the
subjects in row 1 of the contingency table, and so its probability is given by π1 .
Specificity is the probability that the test is negative given that the subject does not have the disease,
that is, P (Y = 2 | X = 2). In other words, specificity is the probability of failure for the subjects in
row 2 of the contingency table. Hence, its probability is given by 1 − π2 .

b. Let γ denote the probability that a subject has the disease. Given that the diagnosis is positive, use
Bayes' theorem to show that the probability a subject truly has the disease is $\frac{\pi_1 \gamma}{\pi_1 \gamma + \pi_2 (1 - \gamma)}$.

Solution. Recall that Bayes's Theorem is $P(A \mid B) = \frac{P(B \mid A) \cdot P(A)}{P(B)}$. The probability that the
diagnosis is positive, $P(Y = 1)$, is

P (Y = 1) = P (Y = 1 ∩ X = 1) + P (Y = 1 ∩ X = 2)
= P (Y = 1 | X = 1) · P (X = 1) + P (Y = 1 | X = 2) · P (X = 2).

Thus, the probability that a subject truly has the disease, $P(X = 1 \mid Y = 1)$, is given by

\[
\begin{aligned}
P(X = 1 \mid Y = 1) &= \frac{P(Y = 1 \mid X = 1) \cdot P(X = 1)}{P(Y = 1)} \\
&= \frac{P(Y = 1 \mid X = 1) \cdot P(X = 1)}{P(Y = 1 \mid X = 1) \cdot P(X = 1) + P(Y = 1 \mid X = 2) \cdot P(X = 2)} \\
&= \frac{\pi_1 \gamma}{\pi_1 \gamma + \pi_2 (1 - \gamma)}.
\end{aligned}
\]

c. For mammograms for detecting breast cancer, suppose γ = 0.01, sensitivity = 0.86, and specificity
= 0.88. Given a positive test result, find the probability that the woman truly has breast cancer.

Solution. From part (b), the probability that the woman truly has breast cancer is given by $\frac{\pi_1 \gamma}{\pi_1 \gamma + \pi_2 (1 - \gamma)}$.
Since specificity = 1 − π2 = 0.88, we get that π2 = 1 − 0.88 = 0.12. Thus,

\[
\frac{\pi_1 \gamma}{\pi_1 \gamma + \pi_2 (1 - \gamma)} = \frac{0.86 \times 0.01}{0.86 \times 0.01 + 0.12(1 - 0.01)} = 0.0675.
\]
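
The formula from part (b) is easy to wrap in a small function, which makes it simple to see how the answer changes with prevalence. This is a sketch only; the function name `positive_predictive_value` and its argument names are hypothetical, chosen here for illustration.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """P(disease | positive test) via Bayes' theorem, as derived in part (b)."""
    pi1 = sensitivity        # P(Y = 1 | X = 1)
    pi2 = 1 - specificity    # P(Y = 1 | X = 2)
    gamma = prevalence       # P(X = 1)
    return pi1 * gamma / (pi1 * gamma + pi2 * (1 - gamma))


# Mammogram values from part (c).
ppv = positive_predictive_value(sensitivity=0.86, specificity=0.88, prevalence=0.01)
print(round(ppv, 4))
```

Calling the function with a larger prevalence shows that the low value in part (c) is driven mainly by how rare the disease is.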

d. To better understand the answer in (c), find the joint probabilities for the 2×2 cross classification of
X and Y. Discuss their relative sizes in the two cells that refer to a positive test result.

Solution. The 2×2 table with joint probabilities is given below.

                      Diagnosis
True Status      Y = 1      Y = 2      Total
X = 1            0.0086     0.0014     0.01
X = 2            0.1188     0.8712     0.99
Total            0.1274     0.8726     1


The values in the table are found in the following way:

P (Y = 1 ∩ X = 1) = P (Y = 1 | X = 1) · P (X = 1) = 0.86 × 0.01 = 0.0086


P (Y = 2 ∩ X = 1) = 0.01 − 0.0086 = 0.0014
P (X = 2) = 1 − 0.01 = 0.99
P (Y = 1 ∩ X = 2) = P (Y = 1 | X = 2) · P (X = 2) = 0.12 × 0.99 = 0.1188
P (Y = 2 ∩ X = 2) = 0.99 − 0.1188 = 0.8712.

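The probability that a woman has breast cancer and tests positive (0.0086) is much lower than the probability
that a woman does not have breast cancer but tests positive (0.1188). Because false positives account for most
positive results, the probability found in (c) is small.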

Exercise 3 (2.3). According to the recent UN figures, the annual gun homicide rate is 62.4 per one million
residents in the United States and 1.3 per one million residents in the UK.

a. Compare the proportion of residents killed annually by guns using (i) difference of proportions, (ii)
relative risk.

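Solution. Let $p_1$ and $p_2$ denote the proportions of residents killed annually by guns in the United States
and in the UK, respectively.

(i) Difference of proportions:

\[
p_1 - p_2 = 62.4 \text{ per one million} - 1.3 \text{ per one million} = 61.1 \text{ per one million} = 0.0000611.
\]

(ii) The relative risk $\frac{\pi_1}{\pi_2}$ is estimated by $\frac{p_1}{p_2} = \frac{62.4}{1.3} = 48$.

The difference of proportions is a very small number, whereas the relative risk shows that the rate in the
United States is 48 times the rate in the UK.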

b. When both proportions are very close to 0, as here, which measure is more useful for describing the
strength of association? Why?

Solution. The relative risk is the more useful measure for describing the strength of association because
the difference of proportions is so small that it could mislead one into thinking that the difference in the
annual gun homicide rate between the two countries is negligible.

Exercise 4 (2.4). A newspaper article preceding the 1994 World Cup semifinal match between Italy and
Bulgaria stated that Italy is favored 10-11 to beat Bulgaria, which is rated at 10-3 to reach the final.
Suppose this means that the odds that Italy wins are $\frac{11}{10}$ and the odds that Bulgaria wins are $\frac{3}{10}$. Find the
probability that each team wins, and comment.

Solution. The probability of success is given by $\frac{\text{odds}}{\text{odds} + 1}$. The probability that Italy wins is
$\frac{11/10}{11/10 + 1} = 0.5238$, and the probability that Bulgaria wins is $\frac{3/10}{3/10 + 1} = 0.2308$.
These two probabilities sum to 0.7546 rather than 1.

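
The conversion from odds to probability used above can be sketched in Python as follows. The helper name `odds_to_probability` is an illustrative assumption, not something defined in the text.

```python
def odds_to_probability(odds):
    """Convert odds of an event into its probability: odds / (odds + 1)."""
    return odds / (odds + 1)


p_italy = odds_to_probability(11 / 10)
p_bulgaria = odds_to_probability(3 / 10)

print(f"Italy: {p_italy:.4f}, Bulgaria: {p_bulgaria:.4f}, sum: {p_italy + p_bulgaria:.4f}")
```

Printing the sum alongside the two probabilities makes it easy to check whether the quoted odds are coherent for a single match.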
Exercise 5 (2.5). Consider the following two studies reported in the New York Times :

a. A British study reported (December 3, 1998) that, of smokers who get lung cancer, women are 1.7
times more vulnerable than men to get small-cell lung cancer. Is 1.7 an odds ratio, or a relative risk?

Solution. The number 1.7 is a relative risk since the proportion of women who get small-cell lung
cancer is being compared to the proportion of men who get small-cell lung cancer.


b. A National Cancer Institute study about tamoxifen and breast cancer reported (April 7, 1998) that
the women taking the drug were 45% less likely to experience invasive breast cancer compared with the
women taking placebo. Find the relative risk for (i) those taking the drug compared to those taking
placebo, (ii) those taking placebo compared to those taking the drug.

Solution. The relative risk for those taking the drug compared to those taking placebo is
$\frac{\pi_1}{\pi_2} = 1 - 0.45 = 0.55$. On the other hand, the relative risk for those taking placebo compared to those
taking the drug is $\frac{\pi_2}{\pi_1} = \frac{1}{0.55} = 1.8182$.

Exercise 6 (2.6). In the United States, the estimated annual probability that a woman over the age of 35
dies of lung cancer equals 0.001304 for current smokers and 0.000121 for nonsmokers [M. Pagano and K.
Gauvreau, Principles of Biostatistics, Belmont, CA: Duxbury Press (1993), p. 134].

a. Calculate and interpret the difference of proportions and the relative risk. Which is more informative
for this data? Why?

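Solution. The difference of proportions is $p_1 - p_2 = 0.001304 - 0.000121 = 0.001183$. The relative
risk is $\frac{\pi_1}{\pi_2} = \frac{0.001304}{0.000121} = 10.7769$, that is, the annual probability of dying of lung cancer is
nearly 11 times as high for current smokers as for nonsmokers. The relative risk is more informative here
since the difference of proportions is so small.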

b. Calculate and interpret the odds ratio. Explain why the relative risk and odds ratio take similar values.

Solution. The odds ratio is given by
$\frac{\pi_1/(1-\pi_1)}{\pi_2/(1-\pi_2)} = \frac{0.001304/(1-0.001304)}{0.000121/(1-0.000121)} = 10.7896$. Since the odds ratio is
greater than 1, the odds of dying of lung cancer are higher for women over the age of 35 who smoke than for
women over the age of 35 who do not smoke; the estimated odds are about 10.8 times as high. The relative
risk and odds ratio take similar values because both π1 and π2 are close to zero. Consequently, 1 − π1 and
1 − π2 are both approximately 1, so they nearly cancel in the odds ratio formula, leaving the formula
for the relative risk.
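
The three measures used in this exercise can be computed side by side to see how close the relative risk and odds ratio are when both probabilities are small. This is a minimal sketch; the function name `compare_proportions` is a hypothetical helper introduced here.

```python
def compare_proportions(p1, p2):
    """Return (difference of proportions, relative risk, odds ratio) for p1 versus p2."""
    difference = p1 - p2
    relative_risk = p1 / p2
    odds_ratio = (p1 / (1 - p1)) / (p2 / (1 - p2))
    return difference, relative_risk, odds_ratio


# Annual lung cancer death probabilities for smokers and nonsmokers from the exercise.
difference, relative_risk, odds_ratio = compare_proportions(0.001304, 0.000121)
print(f"difference = {difference:.6f}, RR = {relative_risk:.4f}, OR = {odds_ratio:.4f}")
```

Since the odds ratio equals the relative risk multiplied by (1 − π2)/(1 − π1), the two agree closely whenever both probabilities are near zero.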

Exercise 7 (2.7). For adults who sailed on the Titanic on its fateful voyage, the odds ratio between gender
(female, male) and survival (yes, no) was 11.4. (For data, see R. Dawson, J. Statist. Educ. 3, no. 3, 1995.)

a. What is wrong with the interpretation, "The probability of survival for females was 11.4 times that for
males"? Give the correct interpretation.

Solution. The odds ratio is the ratio of the odds of an event occurring in one group to the odds of that
event occurring in another group, so 11.4 compares odds, not probabilities. The correct interpretation is,
"The odds of survival for females were 11.4 times the odds of survival for males."

b. The odds of survival for females equaled 2.9. For each gender, find the proportion who survived.

Solution. The odds ratio is $\theta = \frac{\text{odds}_F}{\text{odds}_M}$. It is given in the problem that $\theta = 11.4$ and $\text{odds}_F = 2.9$,
which gives $\text{odds}_M = \frac{2.9}{11.4} = 0.2544$. The probability of survival for males is given by
$\pi_M = \frac{\text{odds}_M}{\text{odds}_M + 1} = \frac{0.2544}{0.2544 + 1} = 0.2028$, and the probability of survival for females is given by
$\pi_F = \frac{\text{odds}_F}{\text{odds}_F + 1} = \frac{2.9}{2.9 + 1} = 0.7436$.

c. Find the value of R in the interpretation, "The probability of survival for females was R times that for
males."

Solution. For the given interpretation to be sensible, R here has to be the relative risk, which is given
by $\frac{\pi_F}{\pi_M} = \frac{0.7436}{0.2028} = 3.6667$.
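
The chain of calculations in parts (b) and (c) can be reproduced in a few lines. The sketch below assumes only the two numbers given in the exercise (the odds ratio 11.4 and the female odds 2.9); all variable names are illustrative.

```python
theta = 11.4   # odds ratio (female vs. male) given in the exercise
odds_f = 2.9   # odds of survival for females given in part (b)

# Male odds follow from theta = odds_f / odds_m.
odds_m = odds_f / theta

# Convert each odds value to a probability with odds / (odds + 1).
pi_f = odds_f / (odds_f + 1)
pi_m = odds_m / (odds_m + 1)

# The relative risk R compares probabilities rather than odds.
relative_risk = pi_f / pi_m

print(f"odds_M = {odds_m:.4f}, pi_M = {pi_m:.4f}, pi_F = {pi_f:.4f}, R = {relative_risk:.4f}")
```

The output shows how an odds ratio of 11.4 corresponds to a much smaller ratio of probabilities.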


Exercise 8 (2.8). A research study estimated that under a certain condition, the probability a subject
would be referred for heart catheterization was 0.906 for whites and 0.847 for blacks.

a. A press release about the study stated that the odds of referral for cardiac catheterization for blacks
are 60% of the odds for whites. Explain how they obtained 60% (more accurately, 57%).

Solution. The odds ratio is
$\theta = \frac{\text{odds}_B}{\text{odds}_W} = \frac{\pi_B/(1-\pi_B)}{\pi_W/(1-\pi_W)} = \frac{0.847/0.153}{0.906/0.094} = 0.5744$, which is approximately 57%.

b. An Associated Press story that described the study stated, "Doctors were only 60% as likely to order
cardiac catheterization for blacks as for whites." What is wrong with this interpretation? Give the
correct percentage for this interpretation. (In stating results to the general public, it is better to use
the relative risk than the odds ratio. It is simpler to understand and less likely to be misinterpreted.
For details, see New Engl. J. Med., 341: 279-283, 1999.)

Solution. The given interpretation compares the probability of cardiac catheterization for blacks with
the probability of cardiac catheterization for whites, but 60% describes the odds ratio
instead. The interpretation can be corrected by using the relative risk, which is
$\frac{\pi_B}{\pi_W} = \frac{0.847}{0.906} = 0.9349 \approx 93\%$.

Exercise 9 (2.9). An estimated odds ratio for adult females between the presence of squamous cell carcinoma
(yes, no) and smoking behavior (smoker, nonsmoker) equals 11.7 when the smoker category consists of
subjects whose smoking level s is 0 < s < 20 cigarettes per day; it is 26.1 for smokers with s ≥ 20 cigarettes
per day (R. Brownson et al., Epidemiology, 3: 61-64, 1992). Show that the estimated odds ratio between
carcinoma and smoking levels (s ≥ 20, 0 < s < 20) equals $\frac{26.1}{11.7} = 2.2$.

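Solution. Let $\text{odds}_c$, $\text{odds}_s$, and $\text{odds}_{ss}$ denote the estimated odds of squamous cell carcinoma for
nonsmokers, for smokers with 0 < s < 20 cigarettes per day, and for smokers with s ≥ 20 cigarettes per day,
respectively. In a 2 × 2 table, the estimated odds ratio between the presence of squamous cell carcinoma (Y )
and smoking level (X) of 0 < s < 20 cigarettes per day is given by $\frac{\text{odds}_s}{\text{odds}_c} = 11.7$. Similarly, the estimated
odds ratio between the presence of squamous cell carcinoma and smoking level of s ≥ 20 cigarettes per day
is given by $\frac{\text{odds}_{ss}}{\text{odds}_c} = 26.1$. Then the estimated odds ratio between carcinoma and smoking levels (s ≥ 20,
0 < s < 20) is $\frac{\text{odds}_{ss}}{\text{odds}_s} = \frac{26.1 \times \text{odds}_c}{11.7 \times \text{odds}_c} = 2.2$.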
Exercise 10 (2.10). Data posted at the FBI website (www.fbi.gov) stated that of all blacks slain in 2005,
91% were slain by blacks, and of all whites slain in 2005, 83% were slain by whites. Let Y denote race of
victim and X denote race of murderer.

a. What conditional distribution do these statistics refer to, Y given X, or X given Y?

Solution. Clearly, these statistics refer to X given Y.

b. Calculate and interpret the odds ratio between X and Y.

Solution. The given information is filled in the following 2×2 contingency table, where b stands for
black and w stands for white. Each row gives the conditional distribution of X (race of murderer) given
Y (race of victim).

                 X
Y            b        w
b          0.91     0.09
w          0.17     0.83

The odds ratio between X and Y is then $\frac{\pi_1/(1-\pi_1)}{\pi_2/(1-\pi_2)} = \frac{0.91/0.09}{0.17/0.83} = 49.37$. The estimated odds that the
murderer was black are 49.37 times as high when the victim was black as when the victim was white.


c. Given that a murderer was white, can you estimate the probability that the victim was white? What
additional information would you need to do this? (Hint: How could you use Bayes's Theorem?)

Solution. By Bayes's Theorem, $P(Y = w \mid X = w) = \frac{P(X = w \mid Y = w) \cdot P(Y = w)}{P(X = w)}$, where w stands
for white. To estimate this probability we need $P(Y = w)$ and $P(X = w)$.

Exercise 11 (2.12). A statistical analysis that combines information from several studies is called a meta
analysis. A meta analysis compared aspirin with placebo on incidence of heart attack and of stroke, separately
for men and for women (J. Am. Med. Assoc., 295: 306-313, 2006). For the Women's Health Study, heart
attacks were reported for 198 of 19,934 taking aspirin and for 193 of 19,942 taking placebo.

a. Construct a 2×2 table that cross classifies the treatment (aspirin, placebo) with whether a heart attack
was reported (yes, no).

Solution. The given information is recorded in a 2×2 table below.

                  Heart Attack
Treatment      Y         N          Total
A             198      19,736      19,934
P             193      19,749      19,942

b. Estimate the odds ratio and interpret.

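Solution. The estimated odds ratio is
$\hat{\theta} = \frac{n_{11}/n_{12}}{n_{21}/n_{22}} = \frac{198/19{,}736}{193/19{,}749} = 1.0266$. Since $\hat{\theta}$ is slightly greater than 1,
the estimated odds of a heart attack for women taking aspirin are 1.0266 times the estimated odds for women
taking placebo, that is, the sample odds are about 2.7% higher with aspirin.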

c. Find a 95% confidence interval for the population odds ratio for women. Interpret. (As of 2006, results
suggested that for women, aspirin was helpful for reducing risk of stroke but not necessarily risk of
heart attack.)

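Solution. The confidence interval for $\log \theta$ is given by $\log \hat{\theta} \pm Z_{\alpha/2} \cdot \sigma_{\log \hat{\theta}}$, where $\log$ denotes the
natural logarithm and $\sigma_{\log \hat{\theta}} = \sqrt{\frac{1}{n_{11}} + \frac{1}{n_{12}} + \frac{1}{n_{21}} + \frac{1}{n_{22}}}$.
The calculations needed to compute the 95% confidence interval are shown below.

\[
\log \hat{\theta} = \log \frac{198/19{,}736}{193/19{,}749} = 0.0262
\]

\[
\sigma_{\log \hat{\theta}} = \sqrt{\frac{1}{198} + \frac{1}{19736} + \frac{1}{193} + \frac{1}{19749}} = 0.1017.
\]

Thus, $\log \hat{\theta} \pm Z_{\alpha/2} \cdot \sigma_{\log \hat{\theta}}$ becomes $0.0262 \pm 1.96 \times 0.1017 = (-0.1731, 0.2255)$, and so the 95%
confidence interval for $\theta$ is $(e^{-0.1731}, e^{0.2255}) = (0.841, 1.253)$. Since the interval contains
1, it is plausible that the true odds of heart attack are the same for both treatments.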
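
The interval calculation for the odds ratio can be checked with a short Python sketch. The function name `odds_ratio_ci` and its signature are assumptions made for this illustration; the critical value 1.96 is passed in explicitly, and `math.log` is the natural logarithm.

```python
import math


def odds_ratio_ci(n11, n12, n21, n22, z):
    """Sample odds ratio with a Wald interval computed on the natural-log scale."""
    theta_hat = (n11 * n22) / (n12 * n21)
    se_log = math.sqrt(1 / n11 + 1 / n12 + 1 / n21 + 1 / n22)
    log_theta = math.log(theta_hat)  # natural logarithm, not base 10
    lower = math.exp(log_theta - z * se_log)
    upper = math.exp(log_theta + z * se_log)
    return theta_hat, lower, upper


# Women's Health Study counts: aspirin (198 yes, 19736 no), placebo (193 yes, 19749 no).
theta_hat, lower, upper = odds_ratio_ci(198, 19736, 193, 19749, z=1.96)
print(f"theta_hat = {theta_hat:.4f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

Working on the log scale and exponentiating the endpoints keeps the interval positive, which is why the interval is not symmetric about the estimate.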

Exercise 12 (2.13). Refer to Table 2.1 about belief in an afterlife.

              Belief in After Life
Gender       Y        N       Total
F           509      116       625
M           398      104       502
Total       907      220      1127

a. Construct a 90% confidence interval for the difference of proportions, and interpret.


Solution. The confidence interval for the difference of proportions is given by
$(p_1 - p_2) \pm Z_{\alpha/2} \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$.
Here, $p_1 = \frac{509}{625} = 0.8144$ and $p_2 = \frac{398}{502} = 0.7928$. Thus, the 90% confidence interval is

\[
(0.8144 - 0.7928) \pm 1.645 \times \sqrt{\frac{0.8144 \times 0.1856}{625} + \frac{0.7928 \times 0.2072}{502}} = 0.0216 \pm 0.0392 = (-0.0176, 0.0608).
\]

Since this interval contains 0, the data do not provide evidence that π1 and π2 differ.
It is plausible that females and males are equally likely to believe in an afterlife.
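
The interval for the difference of proportions can be verified numerically. The following is a minimal sketch; the function name `diff_of_proportions_ci` is a hypothetical helper, and the critical value 1.645 for 90% confidence is supplied as an argument.

```python
import math


def diff_of_proportions_ci(x1, n1, x2, n2, z):
    """Wald confidence interval for p1 - p2 from two independent binomial samples."""
    p1 = x1 / n1
    p2 = x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff, diff - z * se, diff + z * se


# Females: 509 of 625 believe; males: 398 of 502 believe.
diff, lower, upper = diff_of_proportions_ci(509, 625, 398, 502, z=1.645)
print(f"difference = {diff:.4f}, 90% CI = ({lower:.4f}, {upper:.4f})")
```

Whether 0 falls between the two printed endpoints determines whether a difference between the groups is supported at this confidence level.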

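b. Construct a 90% confidence interval for the odds ratio, and interpret.

Solution. The confidence interval for the log odds ratio is given by $\log \hat{\theta} \pm Z_{\alpha/2} \cdot \sigma_{\log \hat{\theta}}$, where $\log$ denotes
the natural logarithm. All the calculations are shown below.

\[
\begin{aligned}
\log \hat{\theta} &= \log \frac{509/116}{398/104} = 0.1368 \\
\sigma_{\log \hat{\theta}} &= \sqrt{\frac{1}{509} + \frac{1}{116} + \frac{1}{398} + \frac{1}{104}} = 0.15071 \\
\log \hat{\theta} \pm Z_{\alpha/2} \cdot \sigma_{\log \hat{\theta}} &= 0.1368 \pm 1.645 \times 0.15071 \\
&= (-0.1111, 0.3847)
\end{aligned}
\]

The 90% confidence interval for $\theta$ is $(e^{-0.1111}, e^{0.3847}) = (0.895, 1.469)$. Since this interval contains
1, it is plausible that the true odds of belief in an afterlife are the same for males and females.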

c. Conduct a test of statistical independence. Report the p-value and interpret.

Solution. The null hypothesis is that the two variables are independent, that is, $\pi_{ij} = \pi_{i+} \pi_{+j}$
for all i and j. The alternative hypothesis is that the two variables are dependent.
The Pearson chi-squared statistic for testing $H_0$ is given by $X^2 = \sum_{i,j} \frac{(n_{ij} - \hat{\mu}_{ij})^2}{\hat{\mu}_{ij}}$. An
estimate of the expected frequency is given by $\hat{\mu}_{ij} = \frac{n_{i+} n_{+j}}{n}$. A calculation of the estimated expected
frequencies for each cell is given below.

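\[
\begin{aligned}
\hat{\mu}_{11} &= \frac{625 \times 907}{1127} = 502.9947 \\
\hat{\mu}_{12} &= \frac{625 \times 220}{1127} = 122.0053 \\
\hat{\mu}_{21} &= \frac{502 \times 907}{1127} = 404.0053 \\
\hat{\mu}_{22} &= \frac{502 \times 220}{1127} = 97.9947
\end{aligned}
\]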
The Pearson chi-squared statistic is

\[
\begin{aligned}
X^2 &= \frac{(509 - 502.9947)^2}{502.9947} + \frac{(116 - 122.0053)^2}{122.0053} + \frac{(398 - 404.0053)^2}{404.0053} + \frac{(104 - 97.9947)^2}{97.9947} \\
&= 0.8246.
\end{aligned}
\]

The degrees of freedom are (I − 1)(J − 1) = (2 − 1)(2 − 1) = 1. The p-value is 0.3638. Since the p-value is
large, we fail to reject the null hypothesis: the data do not provide evidence that belief in an afterlife
and gender are associated.
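
The Pearson statistic and its p-value can be reproduced with the standard library alone. This sketch relies on the fact that, for 1 degree of freedom, a chi-squared variable is the square of a standard normal, so the upper-tail probability equals `math.erfc(sqrt(x / 2))`; that shortcut applies only to 2×2 tables, and the variable names are illustrative.

```python
import math

# Observed counts: rows are gender (F, M), columns are belief (yes, no).
table = [[509, 116],
         [398, 104]]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
n = sum(row_totals)

# Pearson chi-squared: sum over cells of (observed - expected)^2 / expected,
# with expected counts (row total * column total / n) under independence.
x2 = 0.0
for i, row in enumerate(table):
    for j, observed in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n
        x2 += (observed - expected) ** 2 / expected

# Upper-tail probability for chi-squared with df = 1 (valid for 2x2 tables only).
p_value = math.erfc(math.sqrt(x2 / 2))

print(f"X^2 = {x2:.4f}, p-value = {p_value:.4f}")
```

For tables larger than 2×2 the degrees of freedom exceed 1, and a general chi-squared tail function would be needed instead of the `erfc` shortcut.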
