
The Clinical Neuropsychologist

1998, Vol. 12, No. 1, pp. 43-55

1385-4046/98/1201-043$12.00
Swets & Zeitlinger

Hopkins Verbal Learning Test-Revised:
Normative Data and Analysis of Inter-Form and
Test-Retest Reliability*
Ralph H. B. Benedict1, David Schretlen2, Lowell Groninger3, and Jason Brandt2
1Department of Neurology, State University of New York (SUNY) at Buffalo, 2Department of Psychiatry and
Behavioral Sciences, Johns Hopkins School of Medicine, Baltimore, MD, and 3University of Maryland at
Baltimore County, Catonsville

ABSTRACT
The Hopkins Verbal Learning Test (HVLT) is a brief verbal learning and memory test with six alternate
forms. The HVLT is ideal in situations calling for repeated neuropsychological examinations, but it lacks
a delayed recall trial which is essential for the assessment of abnormal forgetting. We present a revised
version of the HVLT which includes a delayed recall trial, and therefore delays the yes/no recognition trial.
The equivalence of test forms was examined in two separate studies using between-groups and within-subjects research designs. In both studies, the six forms of the revised HVLT (HVLT-R) were found to be
equivalent with respect to the recall trials, but there were some modest differences in recognition. Recommendations for the use of the HVLT-R in serial neuropsychological examinations are provided, as well as
normative data tables from a sample of 541 subjects, spanning ages 17 to 88 years.

The national crisis in health care financing and
the upsurge in managed health care are having
profound impacts on the practice of clinical neuropsychology. Increasing emphasis is placed on
the medical necessity of neuropsychological
evaluations and the cost of psychological testing
procedures. In a recent survey of practicing
neuropsychologists in the United States, 64%
reported a decrease in the number of reimbursable hours per evaluation (Sweet, Westergaard,
& Moberg, 1995). Medicare intermediaries are
recommending limiting the use of clinical neuropsychological testing procedures and discouraging the use of predetermined test batteries
(Medicare Part B of New York, 1995). These
pressures are prompting a move away from
lengthy, comprehensive diagnostic evaluations
toward briefer neuropsychological examinations
that can be used to assist in neurologic or psychiatric diagnosis, track the effects of treatment
or changes in cognition, and predict everyday
functional capacities.
* Data collection at the SUNY Buffalo site was supported, in part, by a test development grant from Psychological Assessment Resources, Inc. Data collection at the Johns Hopkins University was supported, in part, by a
memory research grant from the DeVilbiss Fund and NIA grant #1R01AG11859-01A Aging Brain Imaging and
Cognition. The authors gratefully acknowledge the assistance of Melissa Dobraski and Barnett Shpritz in
data collection.
Administration and scoring instructions, and Hopkins Verbal Learning Test-Revised test forms, can be obtained
from Dr. Benedict at cost.
Address correspondence to: Ralph H. B. Benedict, SUNY Buffalo School of Medicine, Department of Neurology,
Buffalo General Hospital, 100 High Street (D-6), Buffalo, NY 14203, USA.
Accepted for publication: June 11, 1997.

Whereas memory tests are essential in even
brief neuropsychological examinations, many of
the more popular memory tests are lengthy (e.g.,
Delis, Kramer, Kaplan, & Ober, 1987; Wechsler,
1987). Most memory tests are also highly susceptible to the effects of task-specific practice
because patients are asked to learn the same material repeatedly. McCaffrey and colleagues
(McCaffrey, Ortega, Orsillo, Nelles, & Haase,
1992) reported a practice effect of one standard
deviation (SD) in magnitude on the Visual Reproduction subtest from the Wechsler Memory
Scale (Wechsler, 1945) when it was re-administered after 1 week. In contrast, our group found
that when a similar test with alternate forms was
administered to normal subjects using the same
test-retest interval, the change was on the order
of 0.2 SD (Benedict, Schretlen, Groninger,
Dobraski, & Shpritz, 1996). These findings
highlight the importance of using different,
equivalent forms of the same test when repeated
assessments of memory are necessary.
Many investigators have developed multiple-form verbal memory tests. Parker, Eaton,
Whipple, Heseltine, and Bridge (1995) recently
introduced the University of Southern California
Repeatable Episodic Memory Test (USC-REMT), a word-list learning task which includes
only semantically unrelated words in order to
maximize the demand for subjective organization during encoding and retrieval. The USC-REMT has seven alternate forms which were
administered to 50 highly educated, middle-aged
men, 36 of whom tested positive for HIV-1. Preliminary reliability data are encouraging, but the
USC-REMT is limited by the lack of delayed
recall and recognition trials. Shapiro and Harrison (1990) reported the equivalence of four
forms of the Rey Auditory Verbal Learning Test
(RAVLT; Rey, 1964) in within-subjects testing
of 17 neurology inpatients and 25 college students. The weaknesses of this study were a
highly variable intertest interval (range = 2 to 13
days) and a small sample size. Geffen,
Butterworth, and Geffen (1994) examined the
equivalence of the original version of the
RAVLT and an alternate form in 51 normal subjects. The authors included delayed recall and
recognition trials. The sample was more representative of the general population and the test-retest interval was more carefully standardized.
Analyses of variance revealed no significant
effect of test form. There was a difference of 1.1
words on the total recall measure (sum of trials
1-5), but the averaged standard deviation for the
measure was 7.55. More recently, Uchiyama et
al. (1995) examined the equivalence of the original form of the RAVLT and a form of the test
introduced by Taylor (1959). Again, within-subjects comparison of the two forms revealed no
significant difference for trials 1 to 5. Although
these studies generally support the interform
reliability of the RAVLT and similar measures,
to the best of our knowledge, no study has examined the equivalence of three or more forms,
including delayed recall and recognition trials,
in a large sample.
Since its introduction in 1991, the Hopkins
Verbal Learning Test (HVLT; Brandt, 1991) has
grown in popularity largely because it is brief,
well tolerated by geriatric and demented patients, and has six alternate forms. The HVLT
consists of a 12-item word list which is read to
subjects on three successive learning trials. Free
recall scores are recorded for each learning trial.
A yes/no recognition task is then presented immediately following the third learning trial. Subjects are asked to identify all target words by
responding "yes," and to reject 12 nontarget
words by responding "no." Brandt reported that
recall across the six forms is equivalent. Although some interform differences were found
on the recognition task, the differences were
small in magnitude and were judged to have little practical significance. A major limitation of
the HVLT is its lack of a delayed recall trial. In
this paper, we introduce the revised Hopkins
Verbal Learning Test (HVLT-R) which includes
a 20-25-min delayed recall trial prior to the recognition task. The HVLT-R was standardized in
a sample of 541 normal healthy volunteers and
test-retest and interform reliability were established. Preliminary normative data are also presented.

METHODS
Subjects
The participants were recruited from three sources:
(1) the State University of New York (SUNY) at
Buffalo and surrounding metropolitan area (n =
432), (2) undergraduate psychology classes from
the University of Maryland at Baltimore County
(UMBC) (n = 18), and (3) a study of normal aging,
brain magnetic resonance imaging, and cognition
being conducted at the Johns Hopkins University
(JHU) in Baltimore MD (n = 91). The SUNY Buffalo subjects were recruited through newspaper
advertisement and paid from $10 to $15 for completing a 30-60 min neuropsychological examination. At UMBC, the undergraduate volunteers received class credit for participation in successive
hour-long exams. The subjects tested at JHU were
ascertained through random digit dialing to ensure
the recruitment of a sample whose health and demographic characteristics are broadly representative of community-dwelling adults in the Baltimore metropolitan region. The SUNY Buffalo and
JHU subjects were screened via telephone and/or
face-to-face interview for the presence or history
of medical (e.g., prior open-heart surgery), neurologic (e.g., prior stroke or mild head trauma), or
psychiatric conditions (e.g., prior alcohol dependence, current major depressive disorder) which
could affect current cognitive performance. Only
those subjects who were judged to be optimally
healthy or who had only minor health problems
(e.g., obesity, uncomplicated diabetes mellitus,
controlled hypertension) were included. The 18
undergraduate college students were presumed to
have normal cognitive functioning.
Altogether, there were 541 normal healthy subjects who completed at least one administration of
the HVLT-R. The average age of the sample population was 48.1 years (SD = 17.3) with a range of
17 to 88 years. The education level ranged from 5
to 20 years, with a mean of 13.8 (SD = 2.3). There
were 200 (37%) men in the sample and 341 (63%)
women. The racial breakdown was as follows: 459
(85%) Caucasian, 77 (14%) African American,
and 5 (1%) Other. Estimated IQs were calculated
for 450 subjects using the method of Barona,
Reynolds, and Chastain (1984), yielding a mean
estimated IQ of 107.2 (SD = 8.1) and a range of 80
to 121. North American Adult Reading Test (Blair
& Spreen, 1989) IQ estimates were also available
for 343 subjects, yielding a mean Full Scale IQ
estimate of 107.1 (SD = 9.7), and a range of 80 to
126.
The Hopkins Verbal Learning Test-Revised
The administration of the HVLT-R begins with the
reading of a 12-item word list with a 2-s
interstimulus interval. After the final word is read,
the patient or subject is asked to recall as many
items as possible, in any order. The examiner records each response in sequence. Two additional
learning trials are then administered, after which
the patient is alerted that he/she might be asked to
recall the list again at a later time. Trial 4, the delayed recall trial, follows a 20-25-min interval
filled with unrelated tasks. No cues are provided
for trial 4; patients are merely told that a list of
words was read to them and that they should try to
recall as many of the words as possible. Again,
responses are recorded verbatim. Following trial 4,
a list of 24 words including the 12 target words
and 12 nontarget words (6 semantically related to
the targets, 6 unrelated to the targets), is read. For
each stimulus, the patient is asked to respond
"yes" if the word was on the target list and "no"
if the word was not previously presented. The resulting true- and false-positive responses are recorded by the examiner.
We have found that the HVLT-R can be completed by most patients or subjects in fewer than
15 min (exclusive of the delay interval), and that
the test is well tolerated by patients with a wide
range of disorders, including those with severe
dementia. This paper provides data for the following 11 measures: trial 1, trial 2, trial 3, learning,
total recall, trial 4, percent retained, recognition
true-positives, recognition false-positives, discrimination index, and response bias. The number of
words recalled by the patient is recorded following
trials 1, 2, 3, and the delayed recall trial 4. The total recall score is the sum of the words recalled on
trials 1 through 3 (trial 1 + trial 2 + trial 3). The learning measure is calculated as the higher of trial 2 recall and
trial 3 recall, minus trial 1 recall. The percent retained is calculated as the trial 4 recall score
divided by the better of trials 2 and 3, times 100. From
the recognition task, the total numbers of true- and
false-positives are recorded, from which a recognition discrimination index is calculated (true-positives minus false-positives). Finally, Br, a
measure of recognition response bias derived from
Two-High Threshold Theory, is calculated in accordance with the recommendations of Snodgrass
and Corwin (1988).1
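The derived scores described above can be sketched in code. What follows is a minimal, hypothetical Python helper, not the authors' scoring software: the function name, field names, and the handling of a zero denominator are assumptions made for illustration. Percent retained is taken here as trial 4 recall relative to the better of trials 2 and 3, the reading consistent with mean retention values near 90%. Br follows the Snodgrass and Corwin (1988) Two-High Threshold formula, with 0.5 added to each count and the 12-item list size increased by 1.

```python
from dataclasses import dataclass

N_TARGETS = 12  # length of the HVLT-R word list, as stated in the text


@dataclass
class DerivedScores:
    """Hypothetical container for the derived HVLT-R measures."""
    learning: int
    total_recall: int
    percent_retained: float
    discrimination_index: int
    response_bias: float


def score_hvltr(trial1, trial2, trial3, trial4, true_pos, false_pos):
    """Compute derived measures from raw recall and recognition counts."""
    best_learning_trial = max(trial2, trial3)

    # Corrected hit and false-alarm rates: (count + 0.5) / (N + 1).
    hit_rate = (true_pos + 0.5) / (N_TARGETS + 1)
    fa_rate = (false_pos + 0.5) / (N_TARGETS + 1)

    # Assumption: retention is undefined (NaN) when nothing was learned.
    if best_learning_trial == 0:
        percent_retained = float("nan")
    else:
        percent_retained = 100.0 * trial4 / best_learning_trial

    return DerivedScores(
        learning=best_learning_trial - trial1,
        total_recall=trial1 + trial2 + trial3,
        percent_retained=percent_retained,
        discrimination_index=true_pos - false_pos,
        # Br = FA / (1 - (H - FA)); the corrections keep the denominator positive.
        response_bias=fa_rate / (1.0 - (hit_rate - fa_rate)),
    )
```

Under these assumptions, a perfect recognition record (12 true-positives, no false-positives) gives Br = .50, the neutral value.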
1 Br is calculated in accordance with Two-High Threshold Theory as follows:
$$B_r = \frac{(\#\,\text{False-Positives} + 0.5)/13}{1 - \left[(\#\,\text{True-Positives} + 0.5)/13 - (\#\,\text{False-Positives} + 0.5)/13\right]}$$
A look-up table for Br scores corresponding to each
possible pairing of HVLT-R recognition true- and
false-positives is available from the first author.

Procedures
In almost all cases, the HVLT-R was administered as part of a larger battery of neuropsychological tests. The selection of test form was random for each subject in the SUNY Buffalo sample, and as a result, this sample included roughly
equal numbers of subjects per test form. The 18
college students from UMBC completed all six
HVLT-R forms and were assigned to a test form
sequence according to a Latin squares research
design. Each UMBC student returned to the laboratory for five follow-up assessments at weekly
intervals. On each occasion, the subjects completed a new HVLT-R form as well as a test of
nonverbal learning and brief problem-solving tests
that were used to distract them during the 20-25
min delayed recall interval. The JHU subjects
were examined with either form 2 or form 6 of
the HVLT-R, in accordance with a research protocol. Assignment of subjects to one of these test
forms was random.
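A counterbalanced form sequence of the kind used with the UMBC students can be illustrated with a cyclic Latin square. The paper does not say which square was used, so the cyclic construction, the function names, and the order in which subjects are mapped to rows are all assumptions made for this sketch.

```python
def cyclic_latin_square(n_forms):
    """Return an n x n Latin square; row r is the form order 1..n shifted by r."""
    return [
        [(row + col) % n_forms + 1 for col in range(n_forms)]
        for row in range(n_forms)
    ]


def assign_sequences(n_subjects, n_forms=6):
    """Assign each subject one row of the square, cycling through the rows.

    With 18 subjects and 6 forms, each row is used three times, so every
    form appears equally often at every session.
    """
    square = cyclic_latin_square(n_forms)
    return [square[subject % n_forms] for subject in range(n_subjects)]


if __name__ == "__main__":
    for subject, sequence in enumerate(assign_sequences(18), start=1):
        print(f"subject {subject:2d}: forms {sequence}")
```

A cyclic square balances which form appears at which session, but not which form follows which. A design that also balances carry-over effects would need a different construction.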
Forty elderly subjects from the SUNY Buffalo
sample returned to the same laboratory to complete
a different form of the HVLT-R. On each occasion,
these subjects completed a brief battery of other
neuropsychological tests covering the domains of
language, visual-spatial, and executive function.
The selection of HVLT-R form was random for
each examination, provided that the same form was
not repeated with the same subject. The mean age
of this sample was 68.8 years (SD = 5.8, range 56-82)
and the average level of education was 13.9
years (SD = 2.7, range 8-20). The test-retest interval ranged from 14 to 134 days, with a mean of
46.6 (SD = 30.1).
Data Analysis
Although all of the HVLT-R measures were limited by a restricted range to some degree, trial 1,
trial 2, trial 3, learning, total recall, trial 4, and
response bias conformed roughly to a normal distribution of scores and parametric statistics were
employed for these measures. Statistical analyses
of the remaining measures employed
nonparametric tests as the distributions for these
measures deviated clearly from normal. For example, 217 (40%) cases achieved a percent retained
score of 100. Extreme kurtosis was particularly
salient on the recognition task, where 419 (77%)
of subjects made 12 of 12 correct target word detections, and 361 (67%) subjects made no false-positive errors. Finally, given the high statistical
power of our large sample and the multiple comparisons, we set alpha at .01 to avoid interpreting
very small effects.

RESULTS
Between-Group Analysis of Inter-Form
Equivalence
HVLT-R test forms were administered randomly
to the 432 SUNY Buffalo subjects, resulting in
comparable sample sizes per form: form 1 = 92,
form 2 = 70, form 3 = 60, form 4 = 67, form 5 =
62, form 6 = 81. Age, education, Barona IQ, and
NAART IQ values did not differ across form
group as indicated by one-way ANOVA (Age
F(5,426) = 1.5; Education F(5,426) = 0.6;
Barona IQ F(5,426) = 2.2; NAART IQ F(5,246)
= 1.4). Neither the Caucasian to African American/Other ratio nor the male to female ratio varied significantly across form as demonstrated by
chi-square analysis (Sex χ² = 4.0; Race/Ethnicity χ² = 0.6).
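The degrees of freedom reported for these one-way ANOVAs follow from the number of forms and the per-form sample sizes. The sketch below is a plain-Python illustration, not the software used in the study: the function names are assumptions, and any scores passed to it would be hypothetical. It computes the F ratio for a set of groups and shows how the six form sizes listed above give F(5,426).

```python
def anova_degrees_of_freedom(group_sizes):
    """Between- and within-group degrees of freedom for a one-way ANOVA."""
    k = len(group_sizes)
    n_total = sum(group_sizes)
    return k - 1, n_total - k


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of score lists."""
    df_between, df_within = anova_degrees_of_freedom([len(g) for g in groups])
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    ss_between = 0.0
    ss_within = 0.0
    for g in groups:
        group_mean = sum(g) / len(g)
        # Between-groups: group size times squared distance from grand mean.
        ss_between += len(g) * (group_mean - grand_mean) ** 2
        # Within-groups: squared distance of each score from its group mean.
        ss_within += sum((x - group_mean) ** 2 for x in g)

    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


if __name__ == "__main__":
    # Per-form sample sizes for forms 1 to 6, taken from the text.
    form_sizes = [92, 70, 60, 67, 62, 81]
    print(anova_degrees_of_freedom(form_sizes))
```

The sketch assumes that at least one group has some within-group variance. Otherwise the F ratio is undefined.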
Analyses of inter-form equivalence employed
one-way ANOVA for the normally distributed
measures, and the nonparametric Kruskal-Wallis
statistic for the remaining measures. As can be
seen in Table 1, the forms are equivalent with
respect to the free-recall scores, percent retained, and recognition true-positives. Large and
significant effects were found, however, for recognition false-positives, discrimination index,
and response bias. All three findings can be attributed to marked differences in the number of
false-positives produced by the HVLT-R forms.
As shown in Figure 1, there are essentially two
clusters of HVLT-R forms with forms 1, 2, and
4 resulting in a higher number of false-positives
than forms 3, 5, and 6. Scheffé and Kruskal-Wallis comparisons revealed no significant differences among the forms within each cluster.
For response bias, Scheffé tests revealed significant differences between form 2 and forms 3, 5,
and 6, and a significant difference between
forms 4 and 6. For both false-positives and discrimination index, Kruskal-Wallis comparisons
were significant (all p values < .001) for each
possible pairing of test form between the clusters.
Table 1. Between-Groups Analyses of Inter-Form Equivalence.

HVLT-R Variable         Lowest Mean           Highest Mean            (SD)      F or K-W   p
Trial 1                 7.6 (form 5)          8.1 (form 4)            (1.7)     0.5        0.81
Trial 2                 9.7 (form 3)          10.2 (forms 1,4,5)      (1.6)     0.9        0.46
Trial 3                 10.6 (form 3)         10.9 (form 5)           (1.3)     0.6        0.73
Total Recall            28.1 (form 3)         28.9 (forms 1,6)        (4.0)     0.5        0.78
Learning                2.8 (form 3)          3.4 (form 5)            (1.5)     1.3        0.26
Trial 4                 10.0 (form 2)         10.5 (form 5)           (1.8)     0.5        0.78
Percent Retained        0.91 (form 2)         0.96 (form 3)           (0.12)    7.6        0.18
Recog True-Positives    11.7 (forms 1,4)      11.9 (form 3)           (0.6)     7.8        0.17
Recog False-Positives   0.2 (forms 3,5)       0.8 (form 2)            (0.8)     50.1       < .0001
Discrimination Index    11.0 (form 4)         11.7 (forms 3,5)        (1.1)     35.2       < .0001
Response Bias           0.48 (form 6)         0.59 (form 2)           (0.15)    6.9        < .0001

Note. Recog = Recognition; SD = mean standard deviation for all test forms; K-W = Kruskal-Wallis Chi-Square
statistic.

Fig. 1. Frequency distribution of the percentage of subjects giving 0, 1, 2, or more than 2 false-positive responses on each form of the HVLT-R. Nonparametric statistical analyses revealed that forms 1, 2, and
4 are similar, as are forms 3, 5, and 6, consistent with visual inspection of the frequency distribution.

Within-Subjects Analysis of Inter-Form
Equivalence
Comparison of scores across the six test forms,
among the 18 students who completed each
form, was accomplished using repeated measures ANOVA and nonparametric tests as required. Figure 2 presents the average number of
words recalled for each form across the four recall trials, collapsed across the session administered. The figure clearly demonstrates that the
free-recall scores were similar across form, as
was found in the between-groups analysis. A 6
(form) × 4 (trial) ANOVA, with repeated measures on both factors, revealed a significant
main effect for trial (F(3,51) = 113.7, p < .001),
but no form main effect (F(5,85) = 0.8) nor form
by trial interaction (F(15,255) = 0.5). Separate
one-way repeated measures ANOVAs were conducted for the learning, total recall, and response
bias measures. None were statistically significant.

Fig. 2. Number of words recalled over the three learning and delayed recall trials of the HVLT-R. Subjects
were 18 college undergraduate students who completed all six forms at successive one-week intervals.

Interform differences for percent retained,
true-positives, false-positives, and the discrimination index were assessed with the Friedman
nonparametric ANOVA. Although none of the
comparisons reached statistical significance,
trends were identified suggesting similar interform differences on the recognition task as were
found in the between-groups analysis. Wilcoxon
Matched-Pairs Signed-Ranks tests revealed marginally significant discrimination index differences for the following pairings: 1 versus 3 (Z =
2.2, p = .03), 1 versus 5 (Z = 2.3, p = .02), 2 versus 3 (Z = 2.0, p = .05), 2 versus 5 (Z = 2.2, p =
.03), and 4 versus 5 (Z = 2.1, p = .04).

Test-Retest Reliability and Practice Effects
Test-retest reliability was estimated in elderly
volunteers taking a different form of the
HVLT-R on two occasions. The reliability coefficients were within acceptable limits for the
free recall measures (Table 2). The low coefficients for some measures may have been due,
in part, to changing distributions of scores as
subjects became more familiar with the testing
procedure. For example, 74% of subjects
achieved a learning score of 2 or 3 on the second examination, which restricts the range of
possible scores, thereby reducing the reliability
coefficient. Response bias was similarly affected by restricted range on the second examination, with 48% of subjects achieving a score
of .50 at Test 1, and 74% achieving the same
score at Test 2. Paired t tests and Wilcoxon tests
revealed very little in the way of practice effects in this sample.

Table 2. Test-Retest Data.

                               Test 1            Test 2
                               M      (SD)       M      (SD)       r        T or Z
Trial 1                        7.6    (2.0)      8.2    (2.0)      0.55b    2.0
Trial 2                        10.0   (1.6)      10.1   (1.6)      0.67b    0.5
Trial 3                        10.7   (1.5)      11.1   (1.6)      0.78b    2.3
Learning                       3.3    (1.7)      2.9    (1.5)      0.41a    1.1
Total Recall                   28.2   (4.4)      29.3   (4.7)      0.74b    2.1
Trial 4                        9.9    (2.0)      10.5   (2.0)      0.66b    2.2
Percent Retained               0.91   (0.13)     0.95   (0.13)     0.39     1.4
Recognition True-Positives     11.8   (0.4)      11.8   (0.5)      0.46a    0.1
Recognition False-Positives    0.5    (0.9)      0.3    (0.7)      0.25     1.0
Discrimination Index           11.3   (1.0)      11.5   (1.0)      0.40     1.6
Response Bias                  0.54   (0.17)     0.51   (0.15)     0.05     0.7

a = p < .01, b = p < .001.

Normative Data
The normative data sample included subjects
from the reliability studies above, and the JHU
sample. As expected, there was a modest yet
significant relationship between younger age
and better HVLT-R performance. The Pearson r
correlations ranged from .07 for learning to .31
for total recall. All but the learning coefficient
were in the expected direction, greater than .15,
and significant at the p < .001 probability level.
Partial correlation coefficients examining the
relationship between education and HVLT-R
performance while controlling for age were considerably smaller, but in the expected direction.
The largest education coefficient was found for
total recall (r = .16).
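A first-order partial correlation of this kind can be computed from the three pairwise Pearson coefficients. The sketch below uses the standard formula and is not the authors' analysis code. The function and argument names are assumptions, and the input values in the usage example are hypothetical because the paper does not report the full set of pairwise correlations.

```python
import math


def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation between x and y with z held constant.

    Here x could be education, y an HVLT-R score, and z age:
        r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2))
    """
    denominator = math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))
    if denominator == 0.0:
        raise ValueError("partial correlation is undefined when |r_xz| or |r_yz| is 1")
    return (r_xy - r_xz * r_yz) / denominator


if __name__ == "__main__":
    # Hypothetical pairwise correlations, for illustration only.
    print(round(partial_correlation(r_xy=0.25, r_xz=-0.20, r_yz=-0.31), 3))
```

When the control variable is uncorrelated with both of the other variables, the partial correlation equals the zero-order correlation.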
Tables 3 to 6 present normative data divided
into four age-based groups. Because standard
scores based on a normal distribution can be
misleading where the score distribution deviates
substantially from normality, percentile ranks
were also included in the tables and are recommended in the interpretation of scores for percent retained, recognition true-positives, recognition false-positives, and discrimination index.
Because the interform reliability studies demonstrated equivalence of form for the recall but not
recognition measures, we divided the HVLT-R
forms into the two clusters found in the reliability analyses and presented recognition data for
these clusters separately.
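The difference between a standard score and a percentile rank on a ceiling-limited measure can be illustrated with a short sketch. The sample below is hypothetical. It borrows only the 77% ceiling rate reported for recognition true-positives, and the split of the remaining scores is invented. The mid-rank definition of percentile rank used here is one common convention and not necessarily the one behind the normative tables. The function names are likewise assumptions.

```python
from statistics import mean, pstdev


def z_score(raw, norm_mean, norm_sd):
    """Standard score of a raw score relative to a normative mean and SD."""
    return (raw - norm_mean) / norm_sd


def percentile_rank(sample, raw):
    """Mid-rank percentile: percent scoring below, plus half of those tied."""
    below = sum(1 for score in sample if score < raw)
    tied = sum(1 for score in sample if score == raw)
    return 100.0 * (below + 0.5 * tied) / len(sample)


# Hypothetical true-positive scores: 77% at the ceiling of 12.
hypothetical_sample = [12] * 77 + [11] * 15 + [10] * 8

if __name__ == "__main__":
    m, sd = mean(hypothetical_sample), pstdev(hypothetical_sample)
    for raw in (10, 11, 12):
        print(raw, round(z_score(raw, m, sd), 2),
              percentile_rank(hypothetical_sample, raw))
```

Because most of the hypothetical sample sits at the maximum score, the distribution is far from normal. A z-score read against a normal curve can therefore disagree with the observed percentile rank.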

DISCUSSION
In response to the growing demand for brief,
repeatable tests of memory, we report on a revised version of the Hopkins Verbal Learning
Test which now includes a 20-25 min delayed
recall trial, a measure of forgetting, and a delayed recognition trial. Our results indicate that
the HVLT-R has acceptable reliability, and that
the test forms are equivalent with respect to
learning and delayed recall. There are modest
interform differences on the delayed recognition
task, and we recommend that this factor be taken
into account in the interpretation of HVLT-R
data. Using the same recognition task immediately after trial 3, Brandt (1991) also found that
form 3 results in better target discrimination
than forms 1 and 4. The findings were attributed
to differences in the number of false-positive
errors. When viewed together, existing research
with the HVLT (or HVLT-R) indicates that form
3 is less likely to produce false-positive recognition errors than forms 1, 2, and 4. Although the
modest degree of difference is likely to have
little practical significance, we recommend that
when the HVLT-R is used in repeated examinations, forms 1, 2, and 4 or forms 3, 5, and 6
be used together, where possible. Analyses of
the recall trials data indicate that all six forms
are interchangeable.

Table 3. HVLT-R Normative Data for 46 Male and 56 Female Young Adults.

                                             M              (SD)      Range
Age (years)                                  24.2           (4.6)     17-30
Education (years)                            13.8           (2.1)     8-18

Recall Measures All Forms, (n = 102)
Trial 1                                      8.1            (1.7)     4-12
Trial 2                                      10.3           (1.4)     7-12
Trial 3                                      11.0           (1.2)     7-12
Learning                                     3.1            (1.4)     0-8
Total Recall                                 29.4           (3.7)     19-36
Trial 4                                      10.6           (1.6)     6-12
Percent Retained                             95.1           (11.0)    58-120

Recognition Measures Forms 1,2,4, (n = 51)
True-Positives                               11.7           (0.6)     9-12
False-Positives                              0.7            (1.1)     0-5
Discrimination Index                         11.0           (1.4)     7-12
Response Bias                                0.56           (0.16)    0.17-0.92

Recognition Measures Forms 3,5,6, (n = 51)
True-Positives                               11.8           (0.5)     10-12
False-Positives                              [illegible]    (0.2)     0-1
Discrimination Index                         11.8           (0.6)     9-12
Response Bias                                0.49           (0.09)    0.17-0.75

Percentile Ranks: [column entries illegible in source]

Note. Scores corresponding to each percentile rank were rounded to the nearest whole number for all measures except response bias.

Table 4. HVLT-R Normative Data for 79 Male and 156 Female Middle-Aged Adults.

                                             M        (SD)      Range
Age (years)                                  42.1     (6.5)     31-54
Education (years)                            13.8     (1.9)     10-20

Recall Measures All Forms, (n = 235)
Trial 1                                      7.8      (1.7)     3-12
Trial 2                                      9.9      (1.5)     4-12
Trial 3                                      10.9     (1.2)     7-12
Learning                                     3.2      (1.5)     0-8
Total Recall                                 28.8     (3.8)     17-36
Trial 4                                      10.3     (1.7)     4-12
Percent Retained                             93       (11.2)    50-113

Recognition Measures Forms 1,2,4, (n = 120)
True-Positives                               11.8     (0.4)     10-12
False-Positives                              0.7      (0.9)     0-5
Discrimination Index                         11.2     (1.1)     5-12
Response Bias                                0.59     (0.16)    0.25-0.90

Recognition Measures Forms 3,5,6, (n = 115)
True-Positives                               11.8     (0.5)     9-12
False-Positives                              0.2      (0.5)     0-2
Discrimination Index                         11.6     (0.8)     9-12
Response Bias                                0.49     (0.13)    0.13-0.83

Percentile Ranks: [column entries illegible in source]

Note. Scores corresponding to each percentile rank were rounded to the nearest whole number for all measures except response bias.

Table 5. HVLT-R Normative Data for 50 Male and 79 Female Older-Aged Adults.

                                             M        (SD)      Range
Age (years)                                  61.9     (4.3)     55-69
Education (years)                            13.8     (2.6)     6-20

Recall Measures All Forms, (n = 129)
Trial 1                                      7.4      (1.9)     3-12
Trial 2                                      9.7      (1.7)     4-12
Trial 3                                      10.6     (1.4)     6-12
Learning                                     3.3      (1.5)     1-6
Total Recall                                 27.5     (4.3)     15-36
Trial 4                                      9.8      (1.8)     5-12
Percent Retained                             91       (12.9)    56-120

Recognition Measures Forms 1,2,4, (n = 68)
True-Positives                               11.5     (0.9)     8-12
False-Positives                              0.7      (0.9)     0-4
Discrimination Index                         10.8     (1.4)     7-12
Response Bias                                0.56     (0.18)    0.10-0.90

Recognition Measures Forms 3,5,6, (n = 61)
True-Positives                               11.7     (0.6)     9-12
False-Positives                              0.4      (0.8)     0-4
Discrimination Index                         11.3     (1.1)     7-12
Response Bias                                0.52     (0.16)    0.17-0.87

Percentile Ranks: [column entries illegible in source]

Note. Scores corresponding to each percentile rank were rounded to the nearest whole number for all measures except response bias.

Table 6. HVLT-R Normative Data for 25 Male and 50 Female Elderly Adults.

                                             M        (SD)      Range
Age (years)                                  75.2     (4.5)     70-88
Education (years)                            13.4     (2.9)     5-20

Recall Measures All Forms, (n = 75)
Trial 1                                      6.7      (2.0)     3-12
Trial 2                                      8.8      (2.1)     4-12
Trial 3                                      9.7      (2.0)     5-12
Learning                                     3.2      (1.7)     0-7
Total Recall                                 25.2     (5.5)     14-35
Trial 4                                      8.7      (2.8)     0-12
Percent Retained                             86       (20.7)    0-120

Recognition Measures Forms 1,2,4, (n = 45)
True-Positives                               11.3     (0.9)     9-12
False-Positives                              0.7      (0.9)     0-4
Discrimination Index                         10.6     (1.5)     6-12
Response Bias                                0.51     (0.18)    0.13-0.90

Recognition Measures Forms 3,5,6, (n = 30)
True-Positives                               11.4     (0.7)     10-12
False-Positives                              0.7      (1.2)     0-5
Discrimination Index                         10.7     (1.6)     5-12
Response Bias                                0.50     (0.21)    0.17-0.83

Percentile Ranks: [column entries illegible in source]

Note. Scores corresponding to each percentile rank were rounded to the nearest whole number for all measures except percent retained and response bias.

The results of the test-retest reliability analysis should be viewed as preliminary, due to the
wide range of test-retest interval (14 to 134
days) and its restriction to a geriatric sample.
Despite the questionable interform reliability of
the recognition task, the HVLT-R still holds
several advantages over existing verbal learning
tests which provide a more comprehensive evaluation of memory (e.g., Delis et al., 1987). The
HVLT-R has six alternate forms which are
equivalent with respect to learning and recall,
and two sets of three forms which can be used
interchangeably for the assessment of delayed
recognition. The test is also easy to administer
and is tolerated well by elderly and demented
patients. These factors contribute to a cost-effective and less strenuous examination of learning and memory. Despite these advantages, the
HVLT-R, like its predecessor the HVLT, may
lack sufficient difficulty to detect deficits in
young, mildly impaired patients. As is apparent
in Tables 3-6, the test also suffers from a limited
range of scores, particularly on the recognition
task.
Research on the validity of the HVLT-R is
ongoing. In a recent article describing the psychometric qualities of the Brief Visuospatial
Memory Test-Revised (BVMT-R; Benedict et
al., 1996), the HVLT-R was included in a construct validity experiment. The HVLT-R was
administered to 126 patients, aged 55 and over,
diagnosed with vascular or mixed dementia
(22%), dementia of the Alzheimer type (21%),
schizophrenia (16%), mood disorder (19%), and
smaller numbers of patients with dementia due to
other etiologies. The rest of the test battery included the Controlled Oral Word Association
Test (Benton & Hamsher, 1983) with letter (S, P)
and category (animals, supermarket items) cues,
a 30-item short form of the Boston Naming Test
(Kaplan, Goodglass, & Weintraub, 1983), the Developmental Test of Visual-Motor Integration
(Beery & Buktenica, 1982), and the Trail Making Test (Reitan, 1958). In the principal components analysis with varimax rotation, the HVLT-R recall and discrimination index scores loaded
on a separate verbal learning and memory factor, and the response bias measure loaded on a
separate factor along with a response bias measure obtained from the BVMT-R. Other validity
studies with the HVLT-R have been concluded
and will be addressed in a future article.
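
For readers unfamiliar with the technique, the principal components analysis with varimax rotation mentioned above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are synthetic (six hypothetical variables generated from two independent latent factors), the helper names `pca_loadings` and `varimax` are ours, and the rotation uses a common SVD-based formulation of the varimax criterion, which is not necessarily the procedure implemented in the software used for the study.

```python
import numpy as np

def pca_loadings(data, n_components):
    """Unrotated principal component loadings from the correlation matrix."""
    corr = np.corrcoef(data, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)            # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]   # keep the largest components
    return eigvecs[:, order] * np.sqrt(eigvals[order])

def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonal varimax rotation (common SVD-based iteration)."""
    n_vars, n_factors = loadings.shape
    rotation = np.eye(n_factors)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        col_ss = np.sum(rotated ** 2, axis=0)           # sum of squares per factor
        gradient = loadings.T @ (rotated ** 3 - rotated * col_ss / n_vars)
        u, s, vt = np.linalg.svd(gradient)
        rotation = u @ vt                               # nearest orthogonal matrix
        if s.sum() < criterion * (1 + tol):             # stop when no longer improving
            break
        criterion = s.sum()
    return loadings @ rotation, rotation

# Synthetic data only (NOT the study's data): variables 0-2 depend on the first
# latent factor, variables 3-5 on the second, plus independent noise.
rng = np.random.default_rng(0)
latent = rng.standard_normal((500, 2))
pattern = np.array([[0.8, 0.0], [0.8, 0.0], [0.8, 0.0],
                    [0.0, 0.8], [0.0, 0.8], [0.0, 0.8]])
data = latent @ pattern.T + 0.6 * rng.standard_normal((500, 6))

unrotated = pca_loadings(data, n_components=2)
rotated, rotation = varimax(unrotated)
print(np.round(rotated, 2))  # rows = variables, columns = rotated factors
```

In the rotated matrix, each variable should load mainly on one factor, which is the sense in which scores are said to "load on a separate factor." The rotation is orthogonal, so it redistributes variance among factors without changing each variable's communality.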

REFERENCES
Barona, A., Reynolds, C. R., & Chastain, R. (1984). A
demographically based index of pre-morbid intelligence for the WAIS-R. Journal of Consulting and
Clinical Psychology, 52, 885-887.
Beery, K. E., & Buktenica, N. A. (1982). Revised Administration, Scoring, and Teaching manual for the
Developmental Test of Visual-Motor Integration.
Cleveland, OH: Modern Curriculum Press.
Benedict, R. H. B., Schretlen, D., Groninger, L.,
Dobraski, M., & Shpritz, B. (1996). Revision of
the Brief Visuospatial Memory Test: Studies of
normal performance, reliability, and validity. Psychological Assessment, 8, 145-153.
Benton, A. L., & Hamsher, K. (1983). Multilingual
Aphasia Examination. Iowa City, IA: AJA Associates.
Blair, J. R., & Spreen, O. (1989). Predicting premorbid IQ: A revision of the National Adult Reading Test. The Clinical Neuropsychologist, 3, 129-136.
Brandt, J. (1991). The Hopkins Verbal Learning Test:
Development of a new memory test with six equivalent forms. The Clinical Neuropsychologist, 5,
125-142.
Delis, D. C., Kramer, J. H., Kaplan, E., & Ober, B. A.
(1987). California Verbal Learning Test: Adult
Version. San Antonio, TX: The Psychological Corporation.
Geffen, G. M., Butterworth, P., & Geffen, L. B.
(1994). Test-retest reliability of a new form of the
auditory verbal learning test (AVLT). Archives of
Clinical Neuropsychology, 9, 303-316.
Kaplan, E. F., Goodglass, H., & Weintraub, S. (1983).
The Boston Naming Test (2nd ed.). Philadelphia,
PA: Lea & Febiger.
McCaffrey, R. J., Ortega, W. H., Osillo, S. M., &
Nelles, W. B. (1992). Practice effects in repeated
neuropsychological assessments. The Clinical Neuropsychologist, 6, 32-42.
Medicare Part B of New York (1995, August). The
Medicare News Brief 95-12. Medicare Part B:
Crompond, NY.
Parker, E. S., Eaton, E. M., Whipple, S. C., Heseltine,
P. N. R., & Bridge, T. P. (1995). University of
Southern California Repeatable Episodic Memory
Test. Journal of Clinical and Experimental Neuropsychology, 17, 926-936.
Reitan, R. M. (1958). Validity of the Trail Making Test
as an indicator of organic brain damage. Perceptual and Motor Skills, 8, 271-276.

Rey, A. (1964). L'examen clinique en psychologie.
Paris: Presses Universitaires de France.
Shapiro, D. M., & Harrison, D. W. (1990). Alternate
forms of the AVLT: A procedure and test of form
equivalency. Archives of Clinical Neuropsychology, 5, 405-410.
Snodgrass, J. G., & Corwin, J. (1988). Pragmatics of
measuring recognition memory: Applications to
dementia and amnesia. Journal of Experimental
Psychology: General, 117, 34-50.
Sweet, J. J., Westergaard, C. K., & Moberg, P. J.
(1995). Managed care experiences of clinical
neuropsychologists. The Clinical Neuropsychologist, 9, 214-218.

Taylor, E. M. (1959). The appraisal of children with
cerebral deficits. Cambridge, MA: Harvard University Press.
Uchiyama, C. L., D'Elia, L. F., Dellinger, A. M.,
Becker, J. T., Selnes, O. A., Wesch, J. E., Chen, B. B.,
Satz, P., Van Gorp, W., & Miller, E. N. (1995). Alternate forms of the Auditory-Verbal Learning
Test: Issues of test comparability, longitudinal reliability, and moderating demographic variables.
Archives of Clinical Neuropsychology, 10, 133-145.
Wechsler, D. (1945). A standardized memory scale
for clinical use. Journal of Psychology, 19, 87-95.
Wechsler, D. (1987). Wechsler Memory Scale-Revised. New York: Psychological Corporation.