
Ensuring reliability of outcome measures in multicenter clinical trials of treatments for acute ischemic stroke. The program developed for the Trial of Org 10172 in Acute Stroke Treatment (TOAST)
MA Albanese, WR Clarke, HP Adams, Jr and RF Woolson
Stroke 1994;25;1746-1751
Stroke is published by the American Heart Association. 7272 Greenville Avenue, Dallas, TX 72514
Copyright © 1994 American Heart Association. All rights reserved. Print ISSN: 0039-2499. Online ISSN: 1524-4628


Downloaded from stroke.ahajournals.org by on April 8, 2011



Ensuring Reliability of Outcome Measures in Multicenter Clinical Trials of Treatments for Acute Ischemic Stroke
The Program Developed for the Trial of ORG 10172 in Acute Stroke Treatment (TOAST)
Mark A. Albanese, PhD; William R. Clarke, PhD; Harold P. Adams, Jr, MD;
Robert F. Woolson, PhD; and TOAST Investigators

Background and Purpose Ensuring the reliability and validity of outcome measures used in clinical trials is essential to the success of the trial. The Trial of Org 10172 in Acute Stroke Treatment (TOAST) is a multicenter clinical trial that is recruiting patients with acute ischemic stroke seen at medical centers across the United States.

Methods This paper describes an approach to train physicians to use three clinical measures: the National Institutes of Health (NIH) Stroke Scale, a supplemental motor examination, and the Glasgow Outcome Scale. The program included education, certification, remediation when needed, monitoring, and reliability assessment. The goal was to ensure that interrater assessments were as equivalent to one another as possible.

Results Of the first 95 clinicians who began the certification process, 75 passed during the first evaluation. Eighteen of the other physicians were able to complete the process after remediation. The intraclass correlations of both the NIH Stroke Scale and supplemental motor examination exceeded 0.95. The κ values for the Glasgow Outcome Scale were 0.61 and 0.62 for the first and second ratings of the videotape, respectively.

Conclusions Our experience suggests that a program that includes educational and certification processes can be performed as part of the design of a multicenter clinical trial. The method of providing educational and testing videotapes to each site so that physicians can be trained and certified is an effective, inexpensive, and practical approach for enhancing and certifying the expertise of the large number of physicians involved in a multicenter study. (Stroke. 1994;25:1746-1751.)

Key Words • clinical trials • stroke assessment • stroke outcome
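The abstract reports reliability in two ways. Intraclass correlations are used for the summed NIH Stroke Scale and supplemental motor examination scores, and κ values are used for the Glasgow Outcome Scale and for individual items. The Methods cite Bartko12 for the intraclass correlation and say only that it estimates the reliability of a single rater evaluating a single patient. They do not name the exact model or the κ variant, and Table 2 notes that κ procedures differed between studies.

The following is a minimal sketch that fills those gaps with stated assumptions, not the TOAST computation:
- It uses the two-way random-effects, single-rater intraclass correlation, often written ICC(2,1), for a fully crossed patients-by-raters design.
- It uses Fleiss' multirater κ for categorical ratings.
- The function names are hypothetical.

```python
"""Reliability statistics for a patients-by-raters design (illustrative sketch).

Assumptions, not confirmed by the paper: the single-rater, two-way
random-effects intraclass correlation ICC(2,1) for summed scale scores, and
Fleiss' multirater kappa for categorical ratings.
"""
from collections import Counter
from typing import Hashable, List, Sequence


def icc_2_1(scores: Sequence[Sequence[float]]) -> float:
    """Single-rater ICC(2,1); scores[i][j] is rater j's total for patient i."""
    n = len(scores)       # patients (rows)
    k = len(scores[0])    # raters (columns); every rater scores every patient
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(scores[i][j] for i in range(n)) / n for j in range(k)]

    # Two-way ANOVA sums of squares: patients, raters, and residual error.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


def fleiss_kappa(ratings: Sequence[Sequence[Hashable]]) -> float:
    """Fleiss' kappa; ratings[i] lists each rater's category for patient i."""
    n = len(ratings)
    m = len(ratings[0])   # raters per patient, assumed constant
    counts: List[Counter] = [Counter(r) for r in ratings]
    categories = set().union(*counts)

    # Observed agreement: share of agreeing rater pairs, averaged over patients.
    p_obs = sum(
        sum(c * (c - 1) for c in cnt.values()) / (m * (m - 1)) for cnt in counts
    ) / n
    # Chance agreement from the pooled proportion of ratings in each category.
    p_exp = sum(
        (sum(cnt[cat] for cnt in counts) / (n * m)) ** 2 for cat in categories
    )
    return (p_obs - p_exp) / (1 - p_exp)
```

Consider a second rater who scores every patient exactly one point higher than the first, as in `icc_2_1([[1, 2], [2, 3], [3, 4]])`. The result is 2/3 rather than 1, because this absolute-agreement form counts a systematic rater offset as disagreement. Fleiss' κ is undefined (division by zero) if every rating falls into a single category.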

Received April 29, 1994; final revision received June 13, 1994; accepted June 14, 1994.
From the Office of Consultation and Research in Medical Education (M.A.A.), Division of Biostatistics, Department of Preventive Medicine (W.R.C., R.F.W.), and Division of Cerebrovascular Diseases, Department of Neurology (H.P.A.), University of Iowa College of Medicine, Iowa City.
Reprint requests to Harold P. Adams, Jr, MD, Division of Cerebrovascular Diseases, Department of Neurology, The University of Iowa, Iowa City, Iowa 52242.
© 1994 American Heart Association, Inc.

A clinical trial testing alternative treatments for acute ischemic stroke needs to ensure the quality of all outcome data. Even relatively objective data such as blood concentrations of drug are dependent on the method of determination and have been inconsistent when sampled at different times.1 The problem becomes even more complex when assessments depend on judgments based on clinical observations. Differences in expertise and other characteristics specific to an observer can interfere with the accuracy of such ratings. For example, Azen et al2 noted that interrater agreement (κ value) among experienced ophthalmologists who independently rated photographs of the ocular fundus for proliferative retinopathy ranged from 0.41 to 0.53 on a scale of 0 to 1.0. We previously reported that interphysician agreement in the diagnosis of a subtype of ischemic stroke can be difficult to achieve even when all physicians are reviewing identical data.3 While a κ value of 0.4 to 0.6 might be considered "moderate" agreement, it reflects far less than perfect consistency of interpretation between experts.4 Keeping such inconsistencies to a minimum can be highly advantageous in the performance of a clinical trial. Bellamy et al5 developed a program to standardize the use of a battery of assessments for osteoarthritis among rheumatologists. The program included a 2-hour orientation session, evaluation of six patients using the rating instruments, a meeting to discuss and resolve discrepancies, and another examination of the six patients. This process resulted in marked increases in consistency among raters. The authors concluded that such a rigorous training process might allow the sample sizes of clinical trials to be reduced by as much as 50% without any loss of statistical power.

The key outcome measures of patients with stroke are clinical, including reduction in mortality and a better quality of life among survivors. The benchmarks of these outcome measures are global judgments of disability, neurological impairment, and the extent to which the patient is able to carry on with life as before the stroke. Determinations of the validity and reliability of judgments of these outcomes are important if the results of a trial are to be credible. Similarly, because of the large number of patients needed to test any intervention for acute ischemic stroke, a multicenter project that may span a continent and last several years is usually mandated. In this context, establishing and maintaining interrater agreement across sites and over time, as examiners are replaced because of attrition, is critical.

The purposes of this article are to (1) describe the program used for training physicians to use the clinical measures (National Institutes of Health [NIH] Stroke Scale, supplemental motor examination, and Glasgow Outcome Scale) for the clinical trial of the stroke medication ORG 10172, (2) report reliability data from its first year of operation, (3) compare the reliability results with findings from other, smaller-scale studies that used approaches that would be impractical on a larger scale, and (4) discuss the feasibility of using this type of reliability program in future clinical trials.

Methods
The Trial of ORG 10172 in Acute Stroke Treatment (TOAST) is a multicenter clinical trial that is recruiting patients with acute ischemic stroke at several medical centers across the United States. The goal of the trial is to test the utility of ORG 10172 in improving outcome after acute ischemic stroke. ORG 10172 is a low-molecular-weight, nonheparin-containing glycosaminoglycuronan that has selective antithrombotic effects. As clinical measures, TOAST uses a modified Barthel Index, the NIH Stroke Scale, a supplemental motor examination, and the Glasgow Outcome Scale.6,7

Instruments
A modified Barthel Index, the Glasgow Outcome Scale, and the NIH Stroke Scale were selected based on an extensive review of the literature and consideration of the expected effect of administering ORG 10172. The selection of these rating instruments was aimed at providing data on global outcome (Glasgow Outcome Scale), information about disability (modified Barthel Index), and neurological impairments (NIH Stroke Scale and supplemental motor examination). These are the data that Candelise8 concluded are the true benchmarks for efficacy in patients with acute stroke. The scales were also selected because of their simplicity and ease of administration.

The modified Barthel Index is a 10-item checklist that is given to the patient or to a person who is aware of the patient's living conditions. It can be administered during a face-to-face interview or by telephone. It asks straightforward questions about the patient's ability to perform tasks such as going to the bathroom or dressing. Because it is an interview with specified alternatives, it requires no special expertise to complete. Little training is needed beyond familiarizing personnel with the questions and the criteria for each response. For this reason, it was not included in the reliability studies in the trial and will not be considered further.

The Glasgow Outcome Scale provides a global rating of the patient's status. It is a single item that has five possible classifications of patient status: (1) no or minimal disability or handicap, (2) moderate disability, (3) severe disability, (4) persistent vegetative state, and (5) death. Although this rating instrument was developed to rank outcomes after head injuries, it has also been used in trials of interventions for stroke.9 It is well known to clinicians, and it provides a global classification of the patient's status. Although it may not be as sensitive as other scales, it does differentiate clinically important differences in outcome. Wade10 recently concluded that it is a reasonable measure to estimate outcomes among patients with acute neurological diseases. Although each grade of the Glasgow Outcome Scale has been defined, the scale still requires physician interpretation; thus, a trial should include methods to teach physicians the discriminating features of each grade and to ensure interrater agreement.

The NIH Stroke Scale quantifies the neurological deficits commonly found in patients with stroke. It consists of 15 independently scored items, such as level of consciousness, motor responses, or language. Studies of the reliability and validity of the scale have shown interrater agreement (κ value) on the various items to range from 0.45 to 0.95. The scores on the baseline administration of the rating instrument have also been correlated with computed tomographic measurements of lesion volume at 7 days, a finding that demonstrates the concurrent validity and the predictive validity of the scale.7 Yet, physicians who use this scale must be familiar with the subtleties of each item being examined.

Because many patients with stroke may have different degrees of weakness between proximal and distal limb muscles, and because the NIH Stroke Scale does not include independent assessments of proximal and distal motor function, the supplemental motor examination was also included in the TOAST study. This eight-item scale has been used in other clinical trials in stroke. It scores six grades of movement, from normal to none, at both shoulders, wrists, hips, and ankles.

Reliability Studies Design
The plan for the reliability studies of the NIH Stroke Scale, supplemental motor examination, and Glasgow Outcome Scale consisted of education, certification, remediation when needed, monitoring, and reliability assessment.

Education
The educational program consisted of an instructional manual with detailed definitions and directions for administration of the three rating tools and an instructional videotape. The videotape included a physician's examination of three patients with recent stroke who had a broad range of impairments. Our intention was to provide a representative range of examples of the many individual items. The physician demonstrated techniques to assess patient capability for each item on the NIH Stroke Scale, supplemental motor examination, and Glasgow Outcome Scale. The physician disclosed his specific scoring for each item after a short pause during which the observer could complete his/her own rating. This allowed the observers at each center to verify their own ratings against a common standard score. The tape could be repeated, in whole or in part, as the trainee desired.

Certification and Remediation
Physicians were required to demonstrate proficiency in the use of the scales before they could examine patients enrolled in the trial. The certification process consisted of viewing and rating a videotape of six patients being examined using the NIH Stroke Scale, supplemental motor examination, and Glasgow Outcome Scale. All patients were representative of those who would be eligible for the TOAST study. Physician ratings were matched against a standard of performance established by physicians at the clinical coordinating center. Physicians who met or exceeded the standard of performance were certified, whereas those who did not do well on the first testing were asked to repeat part or all of the educational and certification process. The repeated portions were those that dealt with areas in which they departed from the standard. The standard used for the first-evaluation tape rating was also used to determine certification after remediation.

Whereas the certification process was unambiguous, setting the standard of performance was complicated. The standard was based on the scores of the physician who examined the patient. Because those scores were obtained during the examination of a live patient and might have been influenced by features that were not obvious on the videotape, they were adjusted using the marks made by other physicians at the clinical coordinating center who independently evaluated each item on the three rating instruments while viewing the tapes. After reviewing their differences in scoring, the physicians decided which deviations from the standard were acceptable.

Criteria for unacceptable deviations in scoring the three rating instruments were: (1) any total score derived from the



1748 Stroke Vol 25, No 9 September 1994

three different rating instruments that deviated by four or more points in absolute value from the standard on any of the six patients, (2) scores of individual items that deviated by three or more points in absolute value from the standard on any of the six patients, (3) any individual item scores that deviated from the standard by any amount for three or more of the six patients, or (4) Glasgow Outcome Scale scores that deviated from the standard by any amount for two or more of the six patients.

The individual physician's ratings were compared with the standard using a form that displayed the four criteria for unacceptable deviations. Deviations of more than three points in absolute value in any of the six patients (criterion 2) were highlighted for discussion. Any physician whose ratings met none of the criteria for unacceptable deviations was certified.

Monitoring
Because ratings can drift during the conduct of the trial, the quality of the data can gradually deteriorate. This is especially true when skills are not used on a daily basis. Drift is being tested by two methods. The first was to have physicians rate the tapes a second time 1 to 3 weeks after their certification. The second method, which is in progress, involves selecting a random sample of 20 physicians who were certified 1 to 2 years earlier and having them rerate the evaluation videotape. The results of these analyses will be used to decide whether ongoing monitoring or retraining is necessary and, if so, of what type.

Reliability Assessment
As part of the data collection and review process, the reliability and reproducibility of scores were determined. Interrater agreement and reliability were calculated by comparing each physician's scores with those of the others. Intrarater reproducibility reflected score drift across time and, as described above, was measured by comparing two sets of scores for the six patients obtained approximately 1 to 3 weeks apart. The second ratings were used to determine the stability of ratings over time and were not part of the certification process. Reliabilities for the overall scores of the NIH Stroke Scale and the supplemental motor examination, derived by summing individual item ratings, were intraclass correlations.12 The values computed were estimates of the reliability of a single rater evaluating a single patient. The reliabilities of scores for individual items were computed using the κ value, a procedure applicable to categorical response data.7

Results
Compliance by physicians with this demanding protocol was very good. Of the first 95 clinicians who began the certification process, 75 passed during the first evaluation. Eighteen of the other physicians were able to complete the process after remediation. While compliance was good, it required a substantial amount of staff time to remind the clinicians to complete the process. Reliability data are reported only for the 75 physicians who had a complete set of ratings. (Physicians who failed certification on the first testing were omitted because their second evaluations of the entire videotape did not fall in the same time frame as those of physicians who were certified on their first attempt.)

Table 1 shows the results of the reliability assessments for the three rating instruments. The values for the NIH Stroke Scale and supplemental motor examination are intraclass correlations based on summing all of the items on the respective instruments. The values for the Glasgow Outcome Scale are based on the κ statistic. The intraclass correlations of both the NIH Stroke Scale and supplemental motor examination exceeded 0.95. The κ value for the Glasgow Outcome Scale was 0.62. These values meet most definitions of acceptable reliability for research purposes, although higher values for the Glasgow Outcome Scale would be desirable.

TABLE 1. Reliability Results for Total Scores of 75 Raters for 6 Videotaped Patients

Scale                            No. of Items   Standard Error*   Interrater Reliability†   Intrarater Reliability‡
NIH Stroke Scale                 15             1.81              .96                       .97
Supplemental motor examination    8             1.29              .98                       .98
Glasgow Outcome Scale             1                               .62                       .75

NIH indicates National Institutes of Health.
*Standard error reflects the amount of variation in a single patient's score that could be expected on repeated assessments.
†Intraclass correlations for NIH Stroke Scale and supplemental motor examination; κ values for Glasgow Outcome Scale.

We also compared our data with those obtained in earlier reliability studies, which were based on examination of live patients. Table 2 compares the κ values for each item on the NIH Stroke Scale in the present study with those for live patients reported by Brott et al7 and Goldstein et al.13 In general, the reliabilities from Goldstein et al were lower than ours, which in turn were lower than those of Brott et al. The items with consistently the highest reliabilities across the three studies were assessments of the right arm and right leg. Those with consistently the lowest reliabilities were assessments of facial movement, limb ataxia, and dysarthria.

Discussion
Our study suggests that a program that includes educational and certification processes is a feasible way to ensure the quality of data collected in a multicenter clinical trial. Furthermore, providing educational and testing videotapes to each site allows physicians to be trained and certified at their convenience. This is a relatively inexpensive and practical approach to certifying the expertise of the large number of physicians involved in a multicenter study. The results shown in Table 2 suggest that the rating of videotaped examinations can achieve a degree of reliability comparable to that of the live clinical setting.

We concluded that the physicians in TOAST are generally well prepared to evaluate the videotaped patients using the protocols for the NIH Stroke Scale, supplemental motor examination, and Glasgow Outcome Scale. Because this was not an experimental study


employing a control group, we cannot say that our training program caused the physicians' preparedness. However, it is likely that at least some of the physicians benefited from the training program. Furthermore, the results reported by Bellamy et al5 document the potential gains in rater reliability from a rigorous training program. Our intent was simply to develop a feasible approach for extending this concept to a large-scale clinical trial.

The lower κ values for the Glasgow Outcome Scale are troubling, but given that the scale consists of only a single five-option item, values in this range are not unexpected. Furthermore, the six patients included in the videotape were all representative of the middle three grades; obviously, no patients in grade 5 (dead) were included, and no patients without deficits were videotaped. Thus, our reliability assessment focused on the three ratings that would be the most difficult to discriminate. Therefore, we believe that the outcomes determined with the Glasgow Outcome Scale by physicians in our study will be accurate within tolerance.

The reliabilities of scoring of the NIH Stroke Scale in the study of Goldstein et al13 (Table 2) tended to be lower than ours, which in turn tended to be lower than those reported by Brott et al.7 Although the results led to comparable conclusions, the three studies differ significantly in design. The data reported by Brott et al were collected from a team of four healthcare professionals, including nonphysicians, who observed a neurologist examine each of 24 patients. The Goldstein et al data were collected by four neurologists who were randomly paired and then independently rated each patient, up to a total of 20 patients. Our data were collected from 75 neurologists who viewed a videotape of single examinations of six patients. The differences in testing conditions may explain the variations in results among the three projects. Any transient progression of signs or fatigue may partly explain the lower agreement in the study of Goldstein et al. Conversely, the ability to detect other cues while examining an actual patient may explain the better results of Brott et al. Our use of a videotape may partially explain some of our discrepancies on individual items.

TABLE 2. NIH Stroke Scale Item Reliabilities (κ Values)

                          Present Study                                 Earlier Studies*
                          Interrater      Interrater
Item                      First Rating    Second Rating   Intrarater    Brott7   Goldstein13
Level of consciousness    .52             .44             .75           .49      .50
Questions                 .69             .65             .81           .80      .64
Commands                  .77             .71             .87           .58      .41
Gaze                      .74             .74             .84           .82      .33
Visual fields             .80             .80             .89           .81      .57
Facial movement           .30             .28             .59           .57      .22
Right arm                 .96             .95             .97           .85      .77
Left arm                  .77             .82             .86           †        †
Right leg                 .81             .79             .87           .83      .78
Left leg                  .54             .49             .72           †        †
Limb ataxia               .48             .47             .70           .57      -.16
Sensory                   .58             .59             .77           .60      .50
Best language             .66             .65             .83           .64      .79
Dysarthria                .52             .51             .76           .55      .32
Neglect                   .69             .66             .83           .58      .61

NIH indicates National Institutes of Health.
*The designs of the reliability studies and the methods used to estimate κ values were different between the three studies. Thus, differences in κ values may be due to either design issues or κ computational procedures.
†In Brott et al7 and Goldstein et al13 only "best arm" and "best leg" ratings were collected.

The poorest reliability occurred in the determination of impairments in facial movement or limb ataxia. We initially attributed the low reliability for scoring of these two items to the use of the videotapes. However, Brott et al7 and Goldstein et al13 reported similar problems with these items even though they tested the scales in a live-patient setting. In fact, one of the studies reported a negative reliability value for the assessment of limb ataxia.13 The poor reliability in assessing limb ataxia may be explained by the difficulty of differentiating incoordination secondary to weakness from incoordination due to true ataxia in a patient who has paresis. Still, the incoordination may be important even if it is secondary to weakness, and scoring this item may be important. Assessment of limb ataxia seems to be a weakness of the NIH Stroke Scale. Because of the poor interrater agreement, either this item should be de-emphasized (and perhaps omitted from any overall score) or physicians should be


continually reminded to score it even in the presence of weakness.

In summary, our study suggests that with minimal instruction, physicians can use the NIH Stroke Scale, supplemental motor examination, and Glasgow Outcome Scale to assess patients with stroke reliably and reproducibly. Because this instruction can be provided with written materials and videotape, these methods are well suited to large-scale clinical trials of stroke management.

Appendix

TOAST Research Group

TOAST Participating Clinical Centers
University of Iowa Hospitals and Clinics, Iowa City, Iowa: B.H. Bendixen, PhD, MD (Principal Investigator[s] [PI]). H.P. Adams, Jr, MD; P.H. Davis, MD; M.R. Jacoby, MD; F.J. Gomez, MD; M.E. Dyken, MD; E.Y. Uc, MD; J.M. Wojcieszek, MD; and L.J. Kappelle, MD (Co-Investigator[s] [Co-I]). A.B. Tanna, RN, and V.L. Mitchell, RN (Study Coordinator[s] [SC]).
St Louis University Medical Center, St Louis, Mo: C.R. Gomez, MD (PI). M.D. Malkoff, MD; R. Tulyapronchote, MD; C.M. Sauer, MD; G. Riaz, MD; J.G. Schmidt, MD; and M.M. Malik, MD (Co-I). G.A. Banet, RN, MSN (SC).
Marshfield Clinic, Marshfield, Wis: P.N. Karanjia, MD, MRCP (PI). K.P. Madden, MD; K.H. Ruggles, MD; S.F. Mickel, MD; P.G. Gottschalk, MD; P.L. Hansotia, MD; R.W. Sorenson, MD; D.M. Jacobson, MD; and B.C. Hiner, MD (Co-I). K. Mancl and E. Lukasik (SC).
Albuquerque VA Medical Center, Albuquerque, NM: A. Bruno, MD (PI). E.D. Lakind, PhD, MD; D.R. Jeffrey, Jr, MD; E.K. Mladinich, MD; J. Iqbal, MD; M. Reiners, MD; D.W. Barrett, MD; D. Shibuya, MD; J.K. Williams, Jr, MD; P. Russell, DO; M.K. King, MD; and J.E. Chapin, MD (Co-I). S. Carter, RN, and L. Jeffries, RN (SC).
University of Illinois Medical Center, Chicago, Ill: C.M. Helgason, MD (PI). D.B. Hier, MD; R.A. Shapiro, DO; and S.U. Brint, MD (Co-I). J. Hoff, RN, and D. O'Connell, RN (SC).
University of Southern California School of Medicine, Los Angeles, Calif: M.J. Fisher, MD (PI). S.F. Ameriso, MD; M.H. Garabedian, MD; R.F. Macko, MD; M. Hanna, MD; and G.A. Yegyan, MD (Co-I). A. Martin, HT, HTL, and A. Scicli (SC).
University of California, San Diego, Medical Center, San Diego, Calif: C.M. Jackson, MD, and J.F. Rothrock, MD (PI). P.D. Lyden, MD; M.L. Brody, MD; and R.M. Zweifler, MD (Co-I). N.M. Kelly, BSN (SC).
University of Mississippi Medical Center, Jackson, Miss: D.L. Gordon, MD (PI). A.A. Thiel, MD; R.K. Fredericks, MD; and R. Singh, MD (Co-I). J. Dendinger, RN (SC).
Rush-Presbyterian-St Luke's Medical Center, Chicago, Ill: P.B. Gorelick, MD, MPH (PI). B.J. Riskin, MD; D.B. Mirza, MD; M.A. Kelly, MD; A. Bijari, MD; J.C. Murray, MD; J. Curtin, MD; F.G. Bozzola, MD; and J.C. Kofman, MD (Co-I). N. Brown and W.C. Dollear, RN, MPH (SC).
Mt Sinai Medical Center, New York, NY: J.M. Weinberger, MD (PI). S. Tuhrim, MD; S.H. Rudolph, MD; D.R. Horowitz, MD; K.F. Sheinart, MD; and T.M. Gondolo, MD (Co-I). J. Ali, RN, and A. Bitton, RN (SC).
Northwestern University Medical School, Chicago, Ill: J. Biller, MD (PI). J.L. Saver, MD; J.I. Frank, MD; J.T. Patrick, MD; and E. Fernandez-Beer, MD (Co-I). L. Chadwick, RN (SC).
Rhode Island Hospital, Providence, RI: E. Feldmann, MD (PI). J.L. Wilterdink, MD (Co-I). L. Ricks, BSN (SC).
Columbia-Presbyterian Medical Center, New York, NY: J.P. Mohr, MD (PI). R.L. Sacco, MD (Co-I). M. Clavijo (SC).
Montefiore Medical Center, Bronx, NY: D.M. Rosenbaum, MD (PI). S.A. Sparr, MD, and P.M. Katz, MD (Co-I). E. Klonowski (SC).
University of Missouri Health Sciences Center, Columbia, Mo: J.A. Byer, MD (PI). H.H. White, MD (Co-PI). S. Sundrani, MD; M.J. Zafar, MD; R. Arora, MD; E.C. Gamboa, MD; and M. Stacy, MD (Co-I). A. Bonnett, RN, and C. Kelley, RN (SC).
SUNY Health Sciences Center, Syracuse, NY: A. Culebras, MD (PI). G.C. Carey, MD; N.M. Martir, MD; P.F. Kent, MD; H. Rabiee, MD; R.A. Guevara, MD; and M.S. Bangco, MD (Co-I). D. Pastor, RN, and C. Ficarra (SC).
Oregon Health Sciences University, Portland, Ore: B.M. Coull, MD (PI). D.P. Briley, MD, and W.M. Clark, MD (Co-I). C. Kenny, T. Austin, BS; and P.L. de Garmo, ANP (SC).
Hennepin County Medical Center, Minneapolis, Minn: D.C. Anderson, MD (PI). R.M. Tarrel, DO; M.A. Nance, MD; S.R. Bundlie, MD; and J.J. Doyle, MD (Co-I). M. Dierich, RN (SC).
Iowa Methodist Medical Center, Des Moines, Iowa: B.B. Love, MD (PI). L.K. Struck, MD (Co-I). C. Mueller, BSN (SC).
Medical University of South Carolina, Charleston, SC: E.L. Hogan, MD (PI). T.D. Carter, MD; P. Gurecki, MD; and J.W. Ph-er, MD (Co-I). B.K. Muntz-Pope, BSN, CNRN (SC).
Long Island Jewish Medical Center, New Hyde Park, NY: R.B. Libman, MD (PI). T.G. Kwiatkowski, MD, and R.M. Kanner, MD (Co-I). R. Donnaruma, RN, MA, and V. Cullen, RN (SC).
Yale University School of Medicine, New Haven, Conn: P.B. Fayad, MD, and L.M. Brass, MD (PI). F.J. Pavalkis, PA (SC).
Kern Medical Center, Bakersfield, Calif: C.J. Wrobel, MD (PI). O.B. Leramo, MD (Co-I). S. Buxton, RN (SC).
Rochester General Hospital, Rochester, NY: J. Hollander, MD (PI). G.W. Honch, MD (Co-I). C. Weber, RN, MS (SC).
Beth Israel Hospital, Boston, Mass: C.I. Mayman, MD (PI). S.J. Warach, MD (Co-I). M.L. Tijerina (SC).
Wichita Institute for Clinical Research, Inc, Wichita, Kan: M.A. Mandelbaum, MD (PI). R.U. Hassan, MD; D.H. Abbas, MD; and C.G. Olmstead, MD (Co-I). L. Sedlacek, RN, MN (SC).
Maimonides Medical Center, Brooklyn, NY: A.E. Miller, MD (PI). M.J. Keilson, MD; K.M. Bruining, MD; and E.E. Drexler, MD (Co-I). L. Morgante, RN (SC).
St Paul Ramsey Medical Center, St Paul, Minn: M. Ramirez-Lassepas, MD (PI). J.W. Tulloch, MD; M.R. Quinones, MD; A. Clavel, MD; M.F. Mendez, MD; S. Zhang, MD; and T.A. Ala, MD (Co-I). C. Espinosa and K.L. Johnston (SC).
Boston University School of Medicine, Boston, Mass: C.S. Kase, MD (PI). P.A. Wolf, MD, and V.L. Babikian, MD (Co-I). E.E. Licata-Gehr, RN, MS, and N.C. Allen, RN, MSN (SC).
Evanston Hospital, Evanston, Ill: D. Homer, MD (PI). S. Neely, MD (Co-I). J. Carpenter, RN, MSN (SC).
Albany Medical Center, Albany, NY: S.H. Horowitz, MD (PI). N.S. Lava, MD (Co-I). M. Manning, RN (SC).

Clinical Coordinating Center
University of Iowa Hospitals and Clinics, Iowa City, Iowa: H.P. Adams, Jr, MD (Project Director). B.H. Bendixen, PhD, MD; P.H. Davis, MD; and B.B. Love, MD (Medical Monitors). K.J. Grirasman, RN (Center Coordinator). J.D. Olson, MD, PhD, and B.J. Pennell (Central Laboratory). K. Johnson, RPh (Central Pharmacy). S.H. Cornell, MD; D.L. Crosby, MD; and T.M. Simpson, MD (Central Radiology). V. Krumbholz (Financial Administrative Assistant). C.R. Zalesky (Secretary).

Data Management Center
University of Iowa, Iowa City, Iowa: R.F. Woolson, PhD (Director). W.R. Clarke, PhD (Associate Director). P.A. Wasek, BA (Center Coordinator). J.A. Dieleman, BA (Systems Coordinator). J.M. Paulsen, BS, and J.P. Boreen, BS (Programmers). M.F. Jones, MA; B.M. Robb, BA; L.A. Oberbroeckling, BS; and M.D. Hansen, MS (Research Assistants). K.M. Hicklin (Secretary).

Committees
Advisory Committee: M.L. Dyken, MD; R.F. Frankowski, PhD; C.S. Greenberg, MD; L.A. Harker, MD; and J.P. Whisnant, MD (Members). H.P. Adams, Jr, MD, and R.F. Woolson, PhD (Members Ex Officio). In-house Safety Committee: W.R. Clarke, PhD; R.W. Fincham, MD; T.C. Kisker, MD; J.D. Olson, MD; R.B. Wallace, MD; and R.F. Woolson, PhD. NIH Safety and Monitoring Committee: H.J. Day, MD; K.M. Detre, MD, DrPH; J.C. Grotta, MD; E.C. Haley, Jr, MD; W.T. Longstreth, Jr, MD; and J.R. Marler, MD.

Acknowledgments
This study was sponsored by US Public Health Service grants NIH-NINDS R01-NS-27863 and NIH-NINDS R01-NS-27960. Support, including supply of the study drug, was also provided by Organon Inc, West Orange, NJ.

References
1. Holt DW, Marsden JT, Johnston A. Quality assessment of cyclosporine measurements: comparison of current methods. Transplant Proc. 1990;22:1234-1239.
2. Azen SP, Irvine AR, Davis MD, Stern W, Lonn L, Hilton G, Schwartz A, Boone D, Quillen-Thomas B, Lyons M, Lean JS, Silicone Study Group. The validity and reliability of photographic documentation of proliferative vitreoretinopathy. Ophthalmology. 1989;96:352-357.
3. Gordon DL, Bendixen BH, Adams HP Jr, Clarke W, Kappelle LJ, Woolson RF, and the TOAST Investigators. Interphysician agreement in the diagnosis of subtypes of acute ischemic stroke: implications for clinical trials. Neurology. 1993;43:1021-1027.
4. Lyden PD, Lau GT. A critical appraisal of stroke evaluation and rating scales. Stroke. 1991;22:1345-1352.
5. Bellamy N, Carette S, Ford PM, Kean WF, le Riche NGH, Lussier A, Wells GA, Campbell J. Osteoarthritis antirheumatic drug trials, I: effects of standardization procedures on observer dependent outcome measures. J Rheumatol. 1992;19:436-443.
6. Mahoney FI, Barthel DW. Functional evaluation: the Barthel Index. Md State Med J. 1965;14:61-65.
7. Brott T, Adams HP, Olinger CP, Marler JR, Barsan WG, Biller J, Spilker J, Holleran R, Hertzberg V, Rorick M, Moomaw CJ, Walker M. Measurements of acute cerebral infarction: a clinical examination scale. Stroke. 1989;20:864-870.
8. Candelise L. Clinical trial methodology in stroke multicentre studies: keep the protocol simple. In: Amery WK, Bousser M, Rose FC, eds. Clinical Trial Methodology in Stroke. London, England: Baillière Tindall; 1989.
9. Kassell NF, Torner JC, Haley EC Jr, Jane JA, Adams HP Jr, Kongable GL, and Participants. The international cooperative study on the timing of aneurysm surgery. J Neurosurg. 1990;73:18-36.
10. Wade DT. Measurement in Neurological Rehabilitation. New York, NY: Oxford University Press; 1992.
11. Fleiss JL. Statistical Methods for Rates and Proportions. 2nd ed. New York, NY: John Wiley & Sons Inc; 1981.
12. Bartko JJ. The intraclass correlation coefficient as a measure of reliability. Psychol Rep. 1966;19:3-11.
13. Goldstein LB, Bertels C, Davis JN. Interrater reliability of the NIH stroke scale. Arch Neurol. 1989;46:660-662.
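The four criteria for unacceptable deviations, described under Certification and Remediation, amount to a mechanical check of each trainee's videotape ratings against the standard. The following Python sketch expresses that check. Its data layout is hypothetical: one dictionary per videotaped patient, mapping an instrument name to its list of item scores, with `gos` as an illustrative key.

Two readings of the text are also assumptions:
- Criterion 3 is counted per item across the NIH Stroke Scale and supplemental motor examination items.
- The single-item Glasgow Outcome Scale is judged by its own, stricter criterion 4 rather than by criterion 3.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical data layout (not from the paper): one dict per videotaped
# patient, mapping an instrument name to that instrument's item scores.
PatientScores = Dict[str, List[int]]
GOS = "gos"  # hypothetical key for the single-item Glasgow Outcome Scale


def unacceptable_deviations(rater: List[PatientScores],
                            standard: List[PatientScores]) -> Set[int]:
    """Return the numbers (1-4) of the criteria the rater violates.

    An empty set means no unacceptable deviation, i.e., certification.
    """
    violated: Set[int] = set()
    # Patients on which each (instrument, item index) deviated by any amount.
    item_misses: Dict[Tuple[str, int], int] = {}
    gos_misses = 0

    for r_pat, s_pat in zip(rater, standard):
        for instrument, s_items in s_pat.items():
            r_items = r_pat[instrument]
            # Criterion 1: an instrument's total off by 4 or more points.
            if abs(sum(r_items) - sum(s_items)) >= 4:
                violated.add(1)
            for idx, (r, s) in enumerate(zip(r_items, s_items)):
                # Criterion 2: a single item off by 3 or more points.
                if abs(r - s) >= 3:
                    violated.add(2)
                # Tally any-amount item deviations for criterion 3
                # (assumption: GOS is left to criterion 4).
                if r != s and instrument != GOS:
                    key = (instrument, idx)
                    item_misses[key] = item_misses.get(key, 0) + 1
        if r_pat[GOS] != s_pat[GOS]:
            gos_misses += 1

    # Criterion 3: the same item deviated on 3 or more patients.
    if any(count >= 3 for count in item_misses.values()):
        violated.add(3)
    # Criterion 4: Glasgow Outcome Scale deviated on 2 or more patients.
    if gos_misses >= 2:
        violated.add(4)
    return violated
```

As an example, consider a trainee who matches a six-patient standard everywhere except for a one-grade Glasgow Outcome Scale difference on two patients. That trainee gets `{4}`, so criterion 4 alone would send them to remediation.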
