
Discus – A software program to assess judgment of glaucomatous damage in

optic disc photographs.

Short title Discus

Words, Figures, Tables 3600, 3, 2

Codes & Presentations GL, Poster at ARVO meeting in May 2008 (program # 3625)

Keywords glaucoma, optic disc, sensitivity, specificity, diagnostic performance


Authors Jonathan Denniss, MCOptom (1,2)
Damian Echendu, OD, MSc (1)
David B Henson, PhD, FCOptom (1,2)
Paul H Artes, PhD (1,2,3) (corresponding author)

Affiliations & Correspondence
(1) Research Group for Eye and Vision Sciences, University of Manchester, England
(2) Manchester Royal Eye Hospital, Manchester, England
(3) Ophthalmology and Visual Sciences, Dalhousie University,
Rm 2035, West Victoria, 1276 South Park St, Halifax, Nova Scotia B3H 2Y9, Canada
paul@dal.ca

Commercial Relationships None

Support College of Optometrists PhD studentship (JD)


Nova Scotia Health Research Foundation Grant Med-727 (PHA)

1 Abstract

2 Aim
3 To describe a software package (Discus) for evaluating clinicians’ assessment of optic disc damage,
4 and to provide reference data from a group of expert observers.

5 Methods
6 Optic disc images were selected from patients with manifest or suspected glaucoma or ocular
7 hypertension who attended the Manchester Royal Eye Hospital. Eighty images came from eyes
8 without evidence of visual field (VF) loss in at least 4 consecutive tests (VF-negatives), and 20
9 images from eyes with repeatable VF loss (VF-positives). Software was written to display these
10 images in randomized order, for up to 60 seconds. Expert observers (n=12) rated optic disc damage
11 on a 5-point scale (definitely healthy, probably healthy, not sure, probably damaged, definitely
12 damaged).

13 Results
14 Optic disc damage as determined by the expert observers predicted VF loss with less than perfect
15 accuracy (mean area under the receiver-operating characteristic curve [AUROC], 0.78; range 0.72 to 0.85). When the
16 responses were combined across the panel of experts, the AUROC reached 0.87, corresponding to a
17 sensitivity of ~60% at 90% specificity. While the observers’ performances were similar, there were
18 large differences between the criteria they adopted (p<0.001), even though all observers had been
19 given identical instructions.

20 Conclusion
21 Discus provides a simple and rapid means for assessing important aspects of optic disc interpretation.
22 The data from the panel of expert observers provide a reference against which students, trainees, and
23 clinicians may compare themselves. The program and the analyses described in this paper are freely
24 accessible from http://discusproject.blogspot.com/.

25 Introduction

26 The detection of early damage of the optic disc is an important yet difficult task.1, 2
27 In many patients with glaucoma, optic disc damage is the first clinically detectable sign of disease. In
28 the Ocular Hypertension Treatment Study, for example, almost 60% of patients who converted to
29 glaucoma developed optic disc changes before exhibiting reproducible visual field damage.3, 4
30 Broadly similar findings were obtained in the European Glaucoma Prevention Study; in
31 approximately 40% of those participants who developed glaucoma, optic disc changes were
32 recognised before visual field changes.5 However, the diverse range of optic disc appearances in a
33 healthy population, combined with the many ways in which glaucomatous damage may affect the
34 appearance of the disc, makes it difficult to detect features of early damage.6, 7

35 While several imaging technologies have been developed in recent decades (confocal scanning laser
36 tomography, nerve fibre layer polarimetry, and optical coherence tomography) which provide
37 reproducible assessment of the optic disc and retinal nerve fibre layer, the diagnostic performances
38 of these technologies have not been consistently better than those achieved by clinicians.8-11 Subjective
39 assessment of the optic disc, either by slitlamp biomicroscopy or by inspection of photographs,
40 therefore still plays a pivotal role in the clinical care of patients at risk from glaucoma.8

41 Many papers describe the optic disc changes in glaucoma6, 7, 12-14 and several authors have examined
42 the agreement between clinicians in diagnosing glaucoma, in differentiating between different
43 types of optic disc damage, or in estimating specific parameters such as cup/disc ratios.15-25
44 However, because there is no objective reference standard for optic disc damage, it is difficult for
45 students, trainees, or clinicians to assess their judgments against an external reference.

46 In this paper, we describe a software package (“Discus”) which observers can use to view and
47 interpret a set of selected optic disc images under controlled conditions. We further present reference
48 data from 12 expert observers against which future observers can be evaluated, or evaluate
49 themselves.

50 Methods

51 Selection of Images
52 To obtain a set of optic disc images with a wide spectrum of early glaucomatous damage, data were
53 selected from patients who had attended the Optometrist-led Glaucoma Assessment (OLGA) clinics
54 at the Royal Eye Hospital (Manchester, UK) between June 2003 and May 2007. This clinic sees
55 patients who are deemed at risk of developing glaucoma, for example due to ocular hypertension, or
56 who have glaucoma but are thought of as being at low risk of progression and are well controlled on
57 medical therapy. Patients undergo regular examinations (normally at intervals of 6 months) by
58 specifically trained optometrists. During each visit, visual field examinations (Humphrey Field
59 Analyzer program 24-2, SITA-Standard) and non-stereoscopic fundus photography are performed
60 (Topcon TRC-50EX, field-of-view 20 degrees, resolution 2000×1312 pixels, 24 bit colour).

61 For this study, images were considered for inclusion if the patient had undergone at least 4 visual
62 field tests on each eye (n=665). The 4 most recent visual fields were then analysed to establish two
63 distinct groups, visual field (VF-) positive and VF-negative (Table 1). Images from patients who did
64 not meet the criteria of either group were excluded.

Table 1: Inclusion criteria for VF-positive and VF-negative groups. For inclusion in the VF-negative
group, the criteria had to be met in both eyes. In addition, the between-eye differences in MD and PSD
had to be less than 1.0 dB.

                 MD                          PSD

VF-positive      between -2.5 and -10.0 dB   between 3.0 and 15.0 dB

VF-negative      better than (>) -1.5 dB     better than (<) 2.0 dB

65 If both eyes of a patient met these criteria, a single eye was randomly selected. A small number of
66 eyes (n=17) were excluded owing to clearly non-glaucomatous visual field loss (for example,
67 hemianopia) or non-glaucomatous lesions visible on the fundus photographs (eg chorioretinal scars).
68 There were 155 eyes in the VF-positive and 144 eyes in the VF-negative group.

69 To eliminate any potential clues other than glaucomatous optic disc damage, we matched the image
70 quality in VF-negative and VF-positive groups. One of the authors (DE) viewed the images on a
71 computer monitor in random order and graded each one on a five-point scale for focus and
72 uniformity of illumination. During grading, the observer was unaware of the status of the image (VF-
73 positive or -negative), and the area of the disc had been masked from view. A final set of 20 VF-
74 positive images and 80 VF-negative images was then created such that the distribution of image
75 quality was similar in both groups (Table 2). The total size of the image set (100), and the ratio of

76 VF-positive to VF-negative images (20:80), had been decided on beforehand to limit the duration of
77 the experiments and to keep the emphasis on discs with early damage.

Table 2: Characteristics of VF-positive and VF-negative groups. Image quality was scored subjectively
on a scale from 1 to 5. Differences between groups were tested for statistical significance by
Mann-Whitney U (MWU) tests.

                     Image Quality   Age, y        MD, dB         PSD, dB

VF-positive (n=20)   1.82 (1.20)     66.0 (13.1)   -6.20 (1.76)   5.58 (2.15)

VF-negative (n=80)   1.68 (1.33)     61.3 (9.3)    +0.60 (0.4)    1.50 (0.16)

p-value (MWU)        0.67            0.35          <0.001         <0.001

78 Expert Observers
79 For the present study, 12 expert observers were recruited: glaucoma fellowship-trained
80 ophthalmologists working in glaucoma sub-speciality clinics (n=10) or scientists involved in
81 optic disc research in glaucoma (n=2). Observers were approached ad hoc during scientific
82 meetings or contacted by e-mail or letter with a request for participation.

83 Prior to the experiments, the observers were given written instructions detailing the selection of the
84 image set. The instructions also stipulated that responses should be given on the basis of apparent
85 optic disc damage rather than the perceived likelihood of visual field damage.

86 Experiments
87 In order to present images under controlled conditions, and to collect the observers’ responses, a
88 software package, Discus (3.0E, Fig 1), was developed in Delphi (CodeGear, San Francisco, CA).
89 Details on availability and configuration of the software are provided in the Appendix.

90 The software displayed the images, in random order, on a computer monitor. After the observer had
91 triggered a new presentation by hitting the “Next” button, an image was displayed until the observer
92 responded by clicking one of 5 buttons (definitely healthy, probably healthy, not sure, probably
93 damaged, definitely damaged). After a time-out period of 60 seconds the image would disappear, but
94 observers were allowed unlimited time to give a response. To guard against occasional finger-errors,
95 observers were also allowed to change their response, as long as this occurred before the “Next”
96 button was hit.

97 To assess the consistency of the observers, 26 images were presented twice (2 in the VF-positive
98 group, 24 in the VF-negative group). No feedback was provided during the sessions.

Fig 1: Screenshot of the Discus software. Images remained on display for up to 60 seconds, or until
the observer clicked on one of the 5 response categories. A new presentation was triggered by hitting
the “Next” button.

99

100 Analysis
101 The responses were transformed to a numerical scale ranging from -2 (“definitely healthy”) to +2
102 (“definitely damaged”). The proportion of repeated images in which the responses differed by one or
103 more categories was calculated, for each observer. For all subsequent analyses, however, only the
104 last of the two responses was used. All analyses were carried out in the freely available open-source
105 environment R, and the ROCR library was used to plot the ROC curves.26, 27
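The analyses described here were carried out in R with the ROCR library; as an illustration of the underlying AUROC computation, a minimal Python sketch (with hypothetical ratings on the -2 to +2 scale, not the study data) might look like:

```python
from itertools import product

def auroc(pos, neg):
    """Area under the ROC curve via the Mann-Whitney formulation:
    the probability that a randomly chosen VF-positive image receives
    a higher rating than a randomly chosen VF-negative image, with
    ties counting one half."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p, n in product(pos, neg))
    return wins / (len(pos) * len(neg))

# Hypothetical observer: ratings on the -2 ("definitely healthy")
# to +2 ("definitely damaged") scale
pos = [2, 1, 1, 0, 2]     # responses to VF-positive images
neg = [-2, -1, -1, 0, 1]  # responses to VF-negative images
print(auroc(pos, neg))    # 0.9 for this toy data
```

This rank-based formulation is equivalent to the trapezoidal area under the empirical ROC curve for ordinal scores.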

106 Individual observers’ ROC curves
107 To obtain an objective measure of individual observers’ performance at discriminating between eyes
108 with and without visual field damage, ROC curves were derived from each set of responses. For this
109 analysis, the visual field status was the reference standard, and responses in the “not sure” category
110 were interpreted as between “probably healthy” and “probably damaged”. If an observer had used all
111 five response categories, the ROC curve would contain 4 points (A – D). Point A, the most
112 conservative criterion (most specific but least sensitive) gave the sensitivity and specificity to visual
113 field damage when only the “definitely damaged” responses were treated as test positives while all
114 other responses (“probably damaged”, “not sure”, “probably healthy”, “definitely healthy”) were
115 interpreted as test negatives. For point D, the least conservative criterion (most sensitive but least
116 specific), only “definitely healthy” responses were interpreted as test negatives, and all other
117 responses as test positives.
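The four operating points follow mechanically from the ordinal responses. A Python sketch of this derivation (continuing the -2 to +2 coding; the example data are hypothetical, not from the study):

```python
def operating_points(pos, neg):
    """Sensitivity/specificity pairs for the four cut-offs A-D on the
    five-point scale coded -2..+2. Cut-off t treats responses >= t as
    test positive: t=+2 gives point A (most conservative, only
    "definitely damaged" counts as positive), t=-1 gives point D
    (least conservative, only "definitely healthy" counts as negative)."""
    points = {}
    for label, t in zip("ABCD", (2, 1, 0, -1)):
        sensitivity = sum(r >= t for r in pos) / len(pos)
        specificity = sum(r < t for r in neg) / len(neg)
        points[label] = (sensitivity, specificity)
    return points

pts = operating_points([2, 1, 1, 0, 2], [-2, -1, -1, 0, 1])
print(pts["A"])  # (0.4, 1.0): most specific, least sensitive
print(pts["D"])  # (1.0, 0.2): most sensitive, least specific
```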

118 Individual observers’ criteria


119 When using a subjective scale, as in the current study, the responses are dependent on the observer’s
120 interpretation of the categories and their individual inclination to respond with “probably damaged”
121 or “definitely damaged” (response criterion). A cautious observer, for example, might regard a
122 particular ONH as “probably damaged” whilst an equally skilled but less cautious observer might
123 respond with “not sure” or “probably healthy”. To investigate the variation in criteria within our
124 group, we compared the observers’ mean responses across the entire image set.

125 Combining responses of expert observers


126 To estimate the performance of a panel of experts, and to obtain a reference other than visual field
127 damage for judging current as well as future observers’ responses, the mean response of the 12
128 expert observers was calculated for each of the 100 images.

129 To assess whether the expert group (n=12) was sufficiently large, we investigated how the performance
130 of the combined panel changed depending on the number of included observers. Areas under the
131 ROC curve were calculated for all possible combinations of 2, 3, 4…11 observers to derive the mean
132 performance, as well as the minimum and maximum.
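This sub-panel analysis can be sketched as follows (a Python illustration rather than the R code used in the study; the observer ratings are hypothetical):

```python
from itertools import combinations, product

def auroc(pos, neg):
    # P(a random positive outscores a random negative), ties count half
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p, n in product(pos, neg))
    return wins / (len(pos) * len(neg))

def panel_auc_range(obs_pos, obs_neg, k):
    """Minimum, mean, and maximum AUROC over all k-observer sub-panels.
    A sub-panel's score for an image is the mean of its members'
    ratings; obs_pos[i][j] is observer i's rating (-2..+2) of
    VF-positive image j."""
    aucs = []
    for subset in combinations(range(len(obs_pos)), k):
        mean_pos = [sum(obs_pos[i][j] for i in subset) / k
                    for j in range(len(obs_pos[0]))]
        mean_neg = [sum(obs_neg[i][j] for i in subset) / k
                    for j in range(len(obs_neg[0]))]
        aucs.append(auroc(mean_pos, mean_neg))
    return min(aucs), sum(aucs) / len(aucs), max(aucs)

# Three hypothetical observers rating 2 VF-positive and 2 VF-negative images
obs_pos = [[2, 1], [1, 2], [0, 2]]
obs_neg = [[-1, 1], [-2, 2], [-1, 0]]
print(panel_auc_range(obs_pos, obs_neg, 2))
```

Averaging ratings before computing the AUROC is what allows a sub-panel to outperform its individual members.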

133 Relationship between responses of individual observers and expert panel


134 As a measure of overall agreement between the expert observers, independent of their individual
135 response criteria, the Spearman rank correlation coefficient between the 12 sets of responses was
136 computed. The underlying rationale of this analysis is that, by assigning each image to one of five
137 ordinal categories, each observer had in fact ranked the 100 images. If two observers had performed
138 identical ranking, the Spearman coefficient would be 1, regardless of the actual responses assigned.
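The rank-correlation measure can be written without external libraries; a Python sketch of the standard definition (ties receive the mean of the ranks they occupy), using ratings that mirror the criterion-shift example discussed later in the paper:

```python
def spearman(x, y):
    """Spearman rank correlation: the Pearson correlation of the ranks,
    with tied values assigned the mean of the ranks they occupy."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0.0] * len(v)
        i = 0
        while i < len(v):
            j = i
            while j + 1 < len(v) and v[order[j + 1]] == v[order[i]]:
                j += 1
            avg_rank = (i + j) / 2 + 1  # 1-based mean rank of the tied block
            for k in range(i, j + 1):
                r[order[k]] = avg_rank
            i = j + 1
        return r
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vxx = sum((a - mx) ** 2 for a in rx)
    vyy = sum((b - my) ** 2 for b in ry)
    return cov / (vxx * vyy) ** 0.5

# Two observers who differ in criterion but rank three images identically:
# "probably damaged"/"probably healthy"/"definitely healthy" versus
# "definitely damaged"/"not sure"/"probably healthy"
print(spearman([1, -1, -2], [2, 0, -1]))  # 1.0
```

Because only the ranks enter the computation, a uniform shift in an observer's criterion leaves the coefficient unchanged.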

139 Results

140 The experiments took between 13 and 46 minutes (mean, 29 min) to complete. On average, the
141 observers responded 7 seconds after the images were first presented on the screen, and the median
142 response latencies of individual observers ranged from 4 to 16 seconds. The reproducibility of
143 individual observers’ responses was moderate: on average, discrepancies of one category were seen
144 in 44% (12) of 26 repeated images (range, 23 – 62%).

145 Individual observers’ results are shown in Fig. 2A-L. The points labelled A, B, C, and D represent
146 the trade-off between the positive rates in the VF-positive (vertical axis) and VF-negative groups
147 (horizontal axis) achieved with the four possible classification criteria. Point A, for example, shows
148 the trade-off when only discs in the “definitely damaged” category are regarded as test-positives.
149 Point B gives the trade-off when discs in both “definitely damaged” and “probably damaged”
150 categories are regarded as test-positives. For D, the least conservative criterion, only responses of
151 “definitely healthy” were interpreted as negatives. To indicate the precision of these estimates, the
152 95% confidence intervals were added to point B.

153 Areas under the curve (AUROC) ranged from 0.71 (95% CI, 0.58, 0.85) to 0.88 (95% CI, 0.82,
154 0.96), with a mean of 0.79. There was no relationship between observers’ overall performance and
155 their median response latency (Spearman’s rho = 0.34, p = 0.29).

156 In contrast to their similar overall performance, the observers’ response criteria differed substantially
157 (p<0.001, Friedman test). For example, the proportion of discs in the VF-positive category which
158 were classified as “definitely damaged” ranged from 15% to 90%, while the proportion of discs in
159 the VF-negative category classified as “definitely healthy” ranged from 8% to 68%. In Fig 2A-L, the
160 response criterion is represented by the inclination of the red line with its origin in the bottom right
161 corner. If the responses had been exactly balanced between the “damaged” and “healthy” categories,
162 the inclination of the line would be 45 degrees. A more horizontal line represents a more
163 conservative criterion (less likely to respond with “probably damaged” or “definitely damaged”),
164 while a more vertical line represents a less conservative criterion. There was no relationship between
165 the observers’ performance (AUROC) and their response criterion (Spearman’s rho 0.41, p = 0.18).

166 To derive the “best possible” performance as a reference for future observers, the responses of the
167 expert panel were combined by calculating the mean response obtained for each image. The ROC
168 curve for the combined responses (grey curve in Fig. 2A-L) enclosed an area of 0.87.


Fig. 2. Receiver-operating characteristic (ROC) curves for the classification of optic disc photographs
by the 12 expert observers (A-L), with a reference standard of visual field damage. The x-axis (positive
rate in the VF-negative group) measures specificity to visual field damage, while the y-axis (positive
rate in the VF-positive group) gives the sensitivity. Point A (most conservative criterion) shows the
trade-off between sensitivity and specificity when only “definitely damaged” responses are interpreted
as test positives. Point D (the least conservative criterion) shows the trade-off when all but
“definitely healthy” responses are interpreted as test positives. Boxplots (right) give the
distributions of response latencies, and the number of times each response was selected.

Fig. 2 (cont). To facilitate comparison, the grey ROC curve and the dotted grey line represent the
performance and the criterion of the group as a whole, respectively. Results provided in numerical
format are the area under the ROC curve (AUC); the AUC as a percentage of that of the entire group,
(individual ROC area – 0.5) / (expert panel ROC area – 0.5); the Spearman rank correlation of the
individual’s responses with those of the entire group; the mean difference between repeated responses;
and the average response as a measure of criterion (-2=”definitely healthy”, -1=”probably healthy”,
0=”not sure”, 1=”probably damaged”, and 2=”definitely damaged”).

171 To investigate how the performance of an expert panel varies with the number of contributing
172 observers, the area under the ROC curve was derived for all possible combinations of 2, 3, 4, etc, up
173 to 11 observers (Fig. 3). The limit of the ROC area was approached with 6 or more observers, and it
174 appeared that a further increase in the number of observers would not have had a substantial effect
175 on the performance of the panel.

Fig. 3: Performance (area under ROC curve) of the combined expert panel as a function of the number
of included observers. All possible combinations of 2 to 11 observers were evaluated. The mean area
under the ROC curve approaches its limit with approximately 6 observers.

176

177 Individual observers’ Spearman rank correlation coefficients with the combined expert panel ranged
178 from 0.62 to 0.86, with a median of 0.79. There was no relationship between the Spearman
179 coefficient and the area under the ROC curve (r = 0.09, p = 0.78).

180

181 Discussion

182 The objective of this work was to establish an easy-to-use tool for clinicians, trainees, and students to
183 assess their skill at interpreting optic discs for signs of glaucoma-related damage, and to provide data
184 from a panel of experts as a reference for future observers. The study also showed that meaningful
185 experiments with Discus can be performed within a relatively short time.

186 All observers in this study had ROC areas significantly smaller than 1, and even when the judgments
187 of the observers were averaged, the combined responses of the panel failed to discriminate perfectly
188 between optic discs in the VF-positive and VF-negative groups. These findings are not surprising,
189 given the lack of a strong association between structure and function in early glaucoma that has been
190 reported by many previous studies.28-33 However, the experiments provide a powerful illustration of
191 how difficult it is to make diagnostic decisions in glaucoma based solely on the optic disc.

192 At a specificity fixed at 90%, the combined panel’s sensitivity to visual field loss was 60%.
193 This is within the range of performances previously reported for clinical observers and objective
194 imaging tools.9, 34-37 Unfortunately, objective imaging data are not available for the patients in the
195 current dataset and we are therefore unable to perform a direct comparison. However, the
196 methodology developed in this paper may prove useful for future studies that compare diagnostic
197 performance between clinicians and imaging tools in different clinical settings. A potential weakness
198 of our study was the relatively small size of the expert group (n = 12). However, by averaging every
199 possible combination of 2 to 11 observers within the group, we demonstrated that our panel was
200 likely to have attained near-maximum performance, and that a larger group of observers was unlikely
201 to have changed our findings substantially.

202 One challenging issue is how to derive complete and easily interpretable summary measures of
203 performance, in the absence of a reference standard of optic disc damage. Such summary measures
204 would be useful for giving feedback and for establishing targets for students and trainees. We used
205 visual field data as the criterion to separate optic disc images into VF-positive and VF-negative
206 groups, and there was no selection based on the presence or type of optic disc damage which would
207 have biased our sample.38-40 The ROC area therefore measures the statistical separation between an
208 observer’s responses to optic discs in eyes with and without visual field damage.41, 42 However,
209 owing to the lack of a strong correlation between structure and function, visual field loss is not an
210 ideal metric for optic disc damage in early glaucoma. For example, it is likely that a substantial
211 proportion of the VF-negative images show early structural damage, whereas some optic discs in the
212 VF-positive group may still appear healthy.

213 We have attempted to address the lack of a reference standard in two complementary
214 ways. First, a new observer’s ROC area can be compared to that of the expert panel, such that the

215 statistic is re-scaled to cover a potential range from near zero (corresponding to chance performance,
216 AUROC = 0.5) to around 100% (AUROC = 0.87, performance of expert panel).
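The re-scaling can be written out explicitly; a small Python sketch using the panel AUROC of 0.87 reported above (the value 0.78 below is simply an illustrative observer, taken from the mean AUROC in the abstract):

```python
def rescaled_performance(individual_auc, panel_auc=0.87):
    """Re-scale an observer's AUROC so that chance performance
    (AUROC = 0.5) maps to 0 and the expert panel's AUROC (0.87 in
    this study) maps to 1, i.e. 100%."""
    return (individual_auc - 0.5) / (panel_auc - 0.5)

print(rescaled_performance(0.87))  # 1.0: matches the panel
print(rescaled_performance(0.50))  # 0.0: chance performance
print(round(rescaled_performance(0.78), 2))  # 0.76: an observer at AUROC 0.78
```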

217 Second, we suggest that the Spearman rank correlation coefficient may be useful as a measure of
218 agreement between a future observer’s responses and those of the expert panel.43 Because this
219 coefficient takes into account the relative ranking of the responses, and not their overall magnitude, it
220 is independent of the observer’s response criterion. Consider, for example, three images graded as
221 “probably damaged”, “probably healthy”, and “definitely healthy” by the expert group. An observer
222 responding with “definitely damaged”, “not sure”, and “probably healthy” would differ in criterion
223 but agree on the relative ranking of damage, and their rank correlation with the expert panel would
224 be 1.0 (perfect). Our data suggest that observers may achieve similar ROC areas with rather different
225 responses (consider observers D and F as an example), and the lack of association between the ROC
226 area and the rank correlation means that these statistics measure somewhat independent aspects of
227 decision-making.

228 A surprising finding was that individual observers in our study adopted very different response
229 criteria, even though they had been provided with identical written instructions and identical
230 information on the source of the images and the distribution of visual field damage in the sample
231 (compare observers A and E, for example). It is possible that we might have been able to control the
232 criteria more closely, for example by instructing observers to use the “probably damaged” category if
233 they believed that the chances for the eye to be healthy were less than, say, 10%. More importantly,
234 however, our findings underscore the need to distinguish between differences in diagnostic
235 performance, and differences in diagnostic criterion, whenever subjective ratings of optic disc
236 damage are involved. This is the principal reason why we avoided the use of kappa statistics,
237 which measure overall agreement but do not isolate differences in criterion.44, 45

238 The outpatient clinic from which our images were obtained sees a relatively high proportion of
239 patients suspected of having glaucoma who do not have visual field loss. Because our image sample
240 is not representative of an unselected population, the ROC curves are likely to underestimate
241 clinicians’ true performance at detecting glaucoma by ophthalmoscopy. However, the use of a
242 “difficult” data set may also be seen as an advantage as it allows observers’ performance to be
243 assessed on the type of optic disc more likely to cause diagnostic problems in clinical practice.

244 In addition to the source of our images, there are several other reasons why the performance on
245 Discus should not be regarded as providing a truly representative measure of an observer’s real-
246 world diagnostic capability. First, we used non-stereoscopic images. Stereoscopic images would
247 have been more representative of slitlamp biomicroscopy, the current standard of care, and there is
248 evidence that many features of glaucomatous damage may be more clearly apparent in stereoscopic
249 images.46 However, the gain over monoscopic images is probably not large.47-50 Second, Discus does
250 not permit a comparison of fellow eyes which often provides important clues in patients with early
251 damage.51 Third, by displaying photographic images on a computer monitor we cannot
252 assess an observer’s aptitude at obtaining an adequate view of the optic disc in real patients.
253 Notwithstanding these limitations, we believe that Discus provides a useful assessment of some
254 important aspects of recognising glaucomatous optic disc damage. Further studies with Discus are
255 now being undertaken to examine the performance of ophthalmology residents and other trainees as
256 compared to our expert group. These studies will also provide insight into which features of
257 glaucomatous optic disc damage are least well recognised, and how clinicians use information on
258 prior probability in their clinical decision-making.

259 Conclusions

260 The Discus software may be useful in the assessment and training of clinicians involved in the
261 detection of glaucoma. It is freely available from http://discusproject.blogspot.com, and interested
262 users may analyse their results using an automated web server on this site.

263 Acknowledgements

264 Robert Harper, Amanda Harding and Jo Marcks of the OLGA clinic at the Manchester Royal Eye
265 Hospital supported this project and contributed ideas. Jonathan Layes (Medicine) and Bijan Farhoudi
266 (Computer Science) of Dalhousie University helped to improve the software and to implement an
267 automated analysis on our server. We are most grateful to all 12 anonymous observers for their
268 participation.

269 Appendix

270 At present, Discus is available only for the Windows operating system. The software can be called
271 with different start-up parameters. These parameters (and their defaults) are:

272 1) Duration of image presentations, in ms (10000)


273 2) Rate of repetitions in the visual field positive group (0.1)
274 3) Rate of repetitions in the visual field negative group (0.3)
275 4) Save-To-Desktop status (1)
276 If the Save-To-Desktop status is set to 1, a tab delimited file will be saved to the desktop. The user
277 can then upload this file to our server and retrieve their results after a few seconds.

278

279 References

280

281 1. Weinreb RN, Tee Khaw P. Primary open-angle glaucoma. Lancet 2004;363:1711-1720.
282 2. Garway-Heath DF. Early diagnosis in glaucoma. In: Nucci C, Cerulli L, Osborne NN,
283 Bagetta G (eds), Progress in Brain Research; 2008:47-57.
284 3. Gordon MO, Beiser JA, Brandt JD, et al. The Ocular Hypertension Treatment Study:
285 Baseline Factors That Predict the Onset of Primary Open-Angle Glaucoma. Archives of
286 Ophthalmology 2002;120:714.
287 4. Keltner JL, Johnson CA, Anderson DR, et al. The association between glaucomatous visual
288 fields and optic nerve head features in the Ocular Hypertension Treatment Study.
289 Ophthalmology 2006;113:1603-1612.
290 5. Predictive Factors for Open-Angle Glaucoma among Patients with Ocular Hypertension in
291 the European Glaucoma Prevention Study. Ophthalmology 2007;114:3-9.
292 6. Broadway DC, Nicolela MT, Drance SM. Optic Disk Appearances in Primary Open-Angle
293 Glaucoma. Survey of Ophthalmology 1999;43:223-243.
294 7. Jonas JB, Budde WM, Panda-Jonas S. Ophthalmoscopic evaluation of the optic nerve head.
295 Survey of Ophthalmology 1999;43:293-320.
296 8. Lin SC, Singh K, Jampel HD, et al. Optic Nerve Head and Retinal Nerve Fiber Layer
297 Analysis: A Report by the American Academy of Ophthalmology. Ophthalmology
298 2007;114:1937-1949.
299 9. Sharma P, Sample PA, Zangwill LM, Schuman JS. Diagnostic Tools for Glaucoma Detection
300 and Management. Survey of Ophthalmology 2008;53.
301 10. Zangwill LM, Bowd C, Weinreb RN. Evaluating the Optic Disc and Retinal Nerve Fiber
302 Layer in Glaucoma II: Optical Image Analysis. Seminars in Ophthalmology 2000;15:206 -
303 220.
304 11. Mowatt G, Burr JM, Cook JA, et al. Screening Tests for Detecting Open-Angle Glaucoma:
305 Systematic Review and Meta-analysis. Invest Ophthalmol Vis Sci 2008;49:5373-5385.
306 12. Fingeret M, Medeiros FA, Susanna Jr R, Weinreb RN. Five rules to evaluate the optic disc
307 and retinal nerve fiber layer for glaucoma. Optometry 2005;76:661-668.
308 13. Susanna Jr R, Vessani RM. New findings in the evaluation of the optic disc in glaucoma
309 diagnosis. Current Opinion in Ophthalmology 2007;18:122-128.
310 14. Caprioli J. Clinical evaluation of the optic nerve in glaucoma. Transactions of the American
311 Ophthalmological Society 1994;92:589.
312 15. Lichter PR. Variability of expert observers in evaluating the optic disc. Transactions of the
313 American Ophthalmological Society 1976;74:532.
314 16. Tielsch JM, Katz J, Quigley HA, Miller NR, Sommer A. Intraobserver and interobserver
315 agreement in measurement of optic disc characteristics. Ophthalmology 1988;95:350-356.
316 17. Nicolela MT, Drance SM, Broadway DC, Chauhan BC, McCormick TA, LeBlanc RP.
317 Agreement among clinicians in the recognition of patterns of optic disk damage in glaucoma.
318 American journal of ophthalmology 2001;132:836-844.
319 18. Spalding JM, Litwak AB, Shufelt CL. Optic nerve evaluation among optometrists. Optom Vis
320 Sci 2000;77:446-452.
321 19. Harper R, Reeves B, Smith G. Observer variability in optic disc assessment: implications for
322 glaucoma shared care. Ophthalmic Physiol Opt 2000;20:265-273.
20. Harper R, Radi N, Reeves BC, Fenerty C, Spencer AF, Batterbury M. Agreement between ophthalmologists and optometrists in optic disc assessment: training implications for glaucoma co-management. Graefe's Archive for Clinical and Experimental Ophthalmology 2001;239:342-350.
21. Spry PG, Spencer IC, Sparrow JM, et al. The Bristol Shared Care Glaucoma Study: reliability of community optometric and hospital eye service test measures. British Journal of Ophthalmology 1999;83:707-712.
22. Abrams LS, Scott IU, Spaeth GL, Quigley HA, Varma R. Agreement among optometrists, ophthalmologists, and residents in evaluating the optic disc for glaucoma. Ophthalmology 1994;101:1662-1667.
23. Varma R, Steinmann WC, Scott IU. Expert agreement in evaluating the optic disc for glaucoma. Ophthalmology 1992;99:215-221.
24. Azuara-Blanco A, Katz LJ, Spaeth GL, Vernon SA, Spencer F, Lanzl IM. Clinical agreement among glaucoma experts in the detection of glaucomatous changes of the optic disk using simultaneous stereoscopic photographs. American Journal of Ophthalmology 2003;136:949-950.
25. Sung VCT, Bhan A, Vernon SA. Agreement in assessing optic discs with a digital stereoscopic optic disc camera (Discam) and Heidelberg retina tomograph. British Journal of Ophthalmology 2002;86:196-202.
26. Ihaka R, Gentleman R. R: A language for data analysis and graphics. Journal of Computational and Graphical Statistics 1996;5:299-314.
27. Sing T, Sander O, Beerenwinkel N, Lengauer T. ROCR: visualizing classifier performance in R. Bioinformatics 2005;21:3940-3941.
28. Anderson RS. The psychophysics of glaucoma: improving the structure/function relationship. Progress in Retinal and Eye Research 2006;25:79-97.
29. Garway-Heath DF, Holder GE, Fitzke FW, Hitchings RA. Relationship between electrophysiological, psychophysical, and anatomical measurements in glaucoma. Investigative Ophthalmology and Visual Science 2002;43:2213-2220.
30. Johnson CA, Cioffi GA, Liebmann JR, Sample PA, Zangwill LM, Weinreb RN. The relationship between structural and functional alterations in glaucoma: a review. Seminars in Ophthalmology 2000;15:221-233.
31. Harwerth RS, Quigley HA. Visual field defects and retinal ganglion cell losses in patients with glaucoma. Archives of Ophthalmology 2006;124:853-859.
32. Caprioli J. Correlation of visual function with optic nerve and nerve fiber layer structure in glaucoma. Survey of Ophthalmology 1989;33:319-330.
33. Caprioli J, Miller JM. Correlation of structure and function in glaucoma. Quantitative measurements of disc and field. Ophthalmology 1988;95:723-727.
34. Deleon-Ortega JE, Arthur SN, McGwin Jr G, Xie A, Monheit BE, Girkin CA. Discrimination between glaucomatous and nonglaucomatous eyes using quantitative imaging devices and subjective optic nerve head assessment. Investigative Ophthalmology and Visual Science 2006;47:3374-3380.
35. Mardin CY, Jünemann AGM. The diagnostic value of optic nerve imaging in early glaucoma. Current Opinion in Ophthalmology 2001;12:100-104.
36. Greaney MJ, Hoffman DC, Garway-Heath DF, Nakla M, Coleman AL, Caprioli J. Comparison of optic nerve imaging methods to distinguish normal eyes from those with glaucoma. Investigative Ophthalmology and Visual Science 2002;43:140-145.
37. Harper R, Reeves B. The sensitivity and specificity of direct ophthalmoscopic optic disc assessment in screening for glaucoma: a multivariate analysis. Graefe's Archive for Clinical and Experimental Ophthalmology 2000;238:949-955.
38. Whiting P, Rutjes AWS, Reitsma JB, Glas AS, Bossuyt PMM, Kleijnen J. Sources of variation and bias in studies of diagnostic accuracy: a systematic review. Annals of Internal Medicine 2004;140:189-202.
39. Medeiros FA, Ng D, Zangwill LM, Sample PA, Bowd C, Weinreb RN. The effects of study design and spectrum bias on the evaluation of diagnostic accuracy of confocal scanning laser ophthalmoscopy in glaucoma. Investigative Ophthalmology and Visual Science 2007;48:214-222.
40. Harper R, Henson D, Reeves BC. Appraising evaluations of screening/diagnostic tests: the importance of the study populations. British Journal of Ophthalmology 2000;84:1198.
41. Hanley JA. Receiver operating characteristic (ROC) methodology: the state of the art. Critical Reviews in Diagnostic Imaging 1989;29:307-335.
42. Hanley JA, McNeil BJ. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 1982;143:29-36.
43. Svensson E. A coefficient of agreement adjusted for bias in paired ordered categorical data. Biometrical Journal 1997;39:643-657.
44. Fleiss JL. Measuring nominal scale agreement among many raters. Psychological Bulletin 1971;76:378-382.
45. Feinstein AR, Cicchetti DV. High agreement but low kappa: I. The problems of two paradoxes. Journal of Clinical Epidemiology 1990;43:543-549.
46. Morgan JE, Sheen NJL, North RV, Choong Y, Ansari E. Digital imaging of the optic nerve head: monoscopic and stereoscopic analysis. British Journal of Ophthalmology 2005;89:879-884.
47. Hrynchak P, Hutchings N, Jones D, Simpson T. A comparison of cup-to-disc ratio measurement in normal subjects using optical coherence tomography image analysis of the optic nerve head and stereo fundus biomicroscopy. Ophthalmic and Physiological Optics 2004;24:543-550.
48. Parkin B, Shuttleworth G, Costen M, Davison C. A comparison of stereoscopic and monoscopic evaluation of optic disc topography using a digital optic disc stereo camera. British Journal of Ophthalmology 2001;85:1347-1351.
49. Vingrys AJ, Helfrich KA, Smith G. The role that binocular vision and stereopsis have in evaluating fundus features. Optometry and Vision Science 1994;71:508-515.
50. Rumsey KE, Rumsey JM, Leach NE. Monocular vs. stereospecific measurement of cup-to-disc ratios. Optometry and Vision Science 1990;67:546-550.
51. Harasymowycz P, Davis B, Xu G, Myers J, Bayer A, Spaeth GL. The use of RADAAR (ratio of rim area to disc area asymmetry) in detecting glaucoma and its severity. Canadian Journal of Ophthalmology 2004;39:240-244.
