261-296, 1995
Copyright © 1995 Elsevier Science Ltd
Printed in the USA. All rights reserved
0272-7356/95 $9.50 + .00
Pergamon
Department of Psychiatry, University of British Columbia
ONCE CONSIDERED RARE, obsessive-compulsive disorder (OCD) is now recognized as one of the most common psychiatric disorders. It has been described as a
"hidden epidemic" (Jenike, 1989, p. 539) with a lifetime prevalence of 2.3% in North
America and similar rates of occurrence in other countries (Weissman et al., 1994).
OCD is characterized by recurrent obsessions and/or compulsions of sufficient severity to be time consuming, cause marked distress, and interfere with daily functioning
(American Psychiatric Association [APA], 1994). Obsessions are intrusive thoughts,
Correspondence should be addressed to Steven Taylor, Department of Psychiatry, 2255
Wesbrook Mall, University of British Columbia, Vancouver, B.C., Canada, V6T 2A1.
S. Taylor
FOR EVALUATION
Obsessions and Compulsions
METHODS: CONTENT, RELIABILITY, AND VALIDITY
Behavioral Assessment
Behavioral theorists and therapists have long emphasized the importance of in vivo
assessment of problem behaviors (Cone, 1988). Accordingly, such methods have been
used in several studies of the efficacy of behavior therapy for OCD. In this section I
review behavioral assessment methods most commonly used in outcome studies:
behavioral avoidance tests, diary methods, and direct observation.
Behavioral Avoidance Tests (BATs)
Behavioral avoidance tests were developed originally as in vivo measures of fear and
avoidance exhibited by phobic individuals (Lang & Lazovik, 1963). In a BAT for snake
phobics, for example, the subject is asked to approach as close as possible to a snake,
which is presented under standardized conditions. The distance from the animal at
the point of closest approach is used as a measure of avoidance. The subject also is
asked to rate his or her peak level of distress (subjective units of distress; SUDS) on a
0-100 scale, where higher scores correspond to greater distress.
Several types of BATs have been used to assess OC-related fear and avoidance. Foa
and coworkers (Foa, Steketee, Grayson, Turner, & Latimer, 1984; Foa, Steketee, &
Milby, 1980) used a single-task BAT. Here, the therapist presents the patient with a
feared OC-related stimulus (e.g., a compulsive washer would be presented with a contaminated object such as a trash can). The patient is asked to approach as close as possible to the object and report his or her SUDS level at the point of closest approach.
Avoidance behavior is assessed by distance from the object, or some other proximity
measure such as whether or not the patient is able to touch the object without wearing
gloves. The task is performed before and after treatment to assess changes in OC-related fears. Note that for some BATs (e.g., those for compulsive checkers) avoidance is
indicated by the presence of compulsive rituals when the person is exposed to a personally threatening stimulus.
A single-task BAT may fail to capture the range of an individual's OC-related fear
and avoidance; some OCs may fear and avoid a range of different stimuli, while others
have more circumscribed fear and avoidance. Accordingly, Rachman and colleagues
(e.g., Rachman, Hodgson, & Marks, 1971; Rachman, Marks, & Hodgson, 1973;
Rachman et al., 1979) devised a multitask BAT, in which OC patients each complete
urally occurring avoidance. Low-demand BATs are likely to be better measures of such
avoidance (Kern, 1983). Unfortunately, the published studies provide little information about the instructions given to OC patients, and so it is difficult to estimate the
degree of demand placed on patients. Given these difficulties, along with the dearth
of reliability and validity data, it is not surprising that many OCD investigators no
longer use BATs (Emmelkamp, 1982; Foa, Steketee, & Milby, 1980).
Direct Observation
Direct observation of the frequency or duration of compulsive rituals has been used
in several case studies. Mills, Agras, Barlow, and Mills (1973) assessed washing compulsions of OC inpatients by installing a device that recorded the number of times the
patient approached and used the sink. For another inpatient, Mills et al. mounted a
video camera in the patient's room to assess the duration of rituals associated with going
to bed. Both methods were validated against ratings made by ward staff, and were sensitive to treatment effects. Turner and colleagues (Turner, Hersen, Bellack, Andrasik,
& Capparell, 1980; Turner, Hersen, Bellack, & Wells, 1979) also used observational
methods to assess rituals. Ward staff were trained to use time sampling procedures to
assess rituals, and achieved good inter-rater reliability (rs = .87 to .99). These measures
were sensitive to the effects of treatment.
Direct observation methods have not been used in controlled outcome trials, and
would be difficult to use to monitor compulsions in outpatients. Their test-retest reliability, convergent validity, and discriminant validity remain to be determined.
Diary Methods
Diary methods are popular ways of assessing the frequency, duration, severity, and
context of problematic behaviors. For example, panic attack diaries are popular measures in treatment-outcome studies of panic disorder (e.g., Clark et al., 1994), and
diary methods have been used in studies of a variety of other disorders, including social phobia
(Glass & Arnkoff, 1994), chronic pain (Philips, 1988), and insomnia (Lacks & Morin,
1992). Diary methods appear to be useful for assessing the frequency, duration, and situational determinants of obsessions and compulsions. Several studies
have used these methods in OCD outcome studies (e.g., Boersma, Den Hengst,
Dekker, & Emmelkamp, 1976; Foa, Steketee, & Milby, 1980; Hackman & McLean,
1975), but there appear to be no published data on their psychometric properties.
In their review of the assessment of obsessions and compulsions, Mavissakalian and
Barlow (1981) noted that a major aim of treatment is to reduce the frequency and
duration of obsessive-compulsive behaviors. Yet, they were surprised to find that frequency counts of target behaviors are rarely used in OCD treatment studies. The situation has not changed over the past decade; diary methods are rarely used despite
their value and ease of applicability. The development of reliable and valid diary
methods may help us understand how OCD treatments influence the patient in his or
her habitual environment.
OC SCALES FROM THE SYMPTOM CHECKLIST-90-REVISED
(SCL-90-R OC) AND ITS PREDECESSORS
The Hopkins Symptom Checklist (HSCL; Derogatis, Lipman, Rickels, Uhlenhuth, &
Covi, 1974) is a 58-item self-report inventory containing five scales: Somatization, OC
symptoms, interpersonal sensitivity, general anxiety, and depression. The OC scale
consists of eight items, each of which assesses a different symptom. The subject is asked to
use a 4-point scale to rate the extent to which he or she was distressed by each symptom
over the past week. For each item, a rating of 0 indicates either that the symptom
was absent or that it was present but did not evoke distress. A rating of 3 indicates the
symptom was present and evoked extreme distress. Thus, the HSCL scales measure the
number and severity of symptoms. The HSCL was subsequently expanded to form the
Symptom Checklist-90 (SCL-90; Derogatis, Lipman, & Covi, 1973), which includes the
original five scales and four new scales: Hostility, phobic anxiety, paranoid ideation,
and psychoticism. Two items were added to the OC scale, and a 5-point (0-4) rating
was used throughout, using the same instructions and anchor points as the 4-point rating. A minor revision was published as the SCL-90-R (Derogatis, 1977), where the OC
scale was unchanged apart from minor changes in wording. In summary, the OC scales
from the HSCL, SCL-90, and SCL-90-R are very similar to one another.
Reliability
Internal consistency. Good internal consistencies have been reported for all versions of
the OC scale, with coefficient α of .87 for the HSCL version (Derogatis et al., 1974), .86
for the SCL-90 version (Derogatis, Rickels, & Rock, 1976), and .88 to .91 for the SCL-90-R
version (Shutty, DeGood, & Schwartz, 1986; Woody, Steketee, & Chambless, in press-b).
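To make the internal consistency index concrete, the following is a minimal sketch of how coefficient α can be computed from item-level ratings. The function name `cronbach_alpha` and the ratings are hypothetical illustrations, not data or procedures from the studies cited; the calculation assumes complete data (no missing ratings).

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Coefficient alpha for a list of respondents, each a list of k item ratings."""
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Variance of each item across respondents
    item_vars = [pvariance([row[i] for row in scores]) for i in range(k)]
    # Variance of the summed (total) scores
    total_var = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical 0-4 distress ratings from five respondents on four items
ratings = [
    [0, 1, 0, 1],
    [1, 1, 2, 1],
    [2, 3, 2, 2],
    [3, 3, 4, 3],
    [4, 4, 3, 4],
]
alpha = cronbach_alpha(ratings)
print(f"alpha = {alpha:.2f}")
```

Because the same variance estimator is used in the numerator and denominator, the choice between population and sample variance does not change the result.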
Discriminant validity. The HSCL OC scale has been found to have medium-to-large
correlations with measures of various types of psychopathology, with rs ranging from
.36 (MMPI Scale 7, Psychasthenia) to .52 (IPAT anxiety scale; Steketee & Doppelt,
1986). Three studies found the SCL-90 OC scale had medium-to-large correlations
with measures of depression (rs = .41 to .81), anxiety (rs = .54 to .64), hostility/anger
(rs = .43 to .65), and psychotic symptoms (r = .57; Clark & Friedman, 1988; Derogatis
et al., 1976; Dinning & Evans, 1977). The SCL-90-R OC scale has been found to have
large correlations with the SCL-90-R anxiety scale (r = .56) and SCL-90-R depression
scale (r = .79). In summary, the OC scales of the HSCL, SCL-90, and SCL-90-R have
medium-to-large correlations with measures of various types of psychopathology,
including depression, anxiety, hostility/anger, and psychotic symptoms. These correlations tend to be larger than the convergent validity correlations, which indicates
poor discriminant validity.
The SCL-90-R OC scale and predecessors have acceptable internal consistency and
adequate test-retest reliability for periods up to at least 7 days. There have been few
studies of criterion-related validity, and available findings offer mixed support.
Convergent validity appears adequate, but discriminant validity is poor. The OC scales
have medium-to-large correlations with a variety of psychopathologic measures, which
suggests the OC scales are largely measures of general (nonspecific) distress. This possibility is supported by a review of the item content of the scales. The scales overemphasize nonspecific distress and underemphasize OC symptoms; half the items of the
HSCL OC scale and 40% of items of the later versions refer to nonspecific symptoms
found in several anxiety and mood disorders; that is, "your mind going blank," "trouble remembering things," "difficulty making decisions," and "trouble concentrating."
A further problem is that the scales confound the frequency of symptoms with the amount
of distress evoked by them. This makes scores ambiguous; a high score indicates the
symptom was present and evoked distress, but a low score may indicate low frequency, low distress, or both. Given these problems, the SCL-90-R OC scale and its predecessors are not recommended as treatment outcome measures.
LEYTON OBSESSIONAL INVENTORY (LOI)
cards in boxes marked "yes" or "no." An assessor then instructed the subject to make
resistance and interference ratings for select items. Separate versions of the LOI
were devised for men and women, distinguished by minor differences in wording
(Cooper, 1970). The postbox format proved cumbersome and time consuming,
requiring 30-45 min per subject (Cooper & McNeil, 1968). The LOI was later converted to a self-report questionnaire, and the wording was revised to make a common
version for both genders (Kazarian, Evans, & Lefave, 1977; Snowdon, 1980).
Snowdon (1980) reported the postbox and questionnaire versions were highly correlated (r = .72). For most purposes the postbox and questionnaire versions are probably
interchangeable, although treatment outcome studies favor the latter because
of its ease of administration.
Reliability
Internal consistency. The symptom, interference, and resistance subscales have acceptable-to-good internal consistencies, with coefficients α ranging from .75 to .90 (Richter,
Cox, & Direnfeld, 1994; Stanley et al., 1993).
Test-retest reliability. Kim et al. (1989) assessed a sample of OCs and obtained good
7-day test-retest reliabilities, with intraclass correlations ranging from .80 (interference subscale) to .83 (resistance subscale). Kim, Dysken, and Kuskowski (1990)
administered the LOI three times over 14 days to another sample of OCs. The intraclass correlation for the total scale (sum of symptom and trait items) was .73.
Intraclass correlations were .79 and .84 for the interference and resistance subscales,
respectively. These results suggest the subscales have acceptable test-retest reliabilities, at least over a 14-day interval.
Validity
Criterion-related validity. Cooper (1970) and Millar (1980) found that OCs scored
higher than normal controls on each of the symptom, interference, and resistance subscales.
Kendell and DiScipio (1970) found that OCs scored higher than depressed
patients on these subscales. Millar (1983) found that OCs scored higher than
depressed patients on the interference and resistance subscales, but not on the symptom
subscale. Stanley et al. (1993) used a structured interview, the Anxiety
Disorders Interview Schedule-Revised (ADIS-R; DiNardo & Barlow, 1988), to establish
the diagnostic status of their patients. The ADIS-R has good reliability and validity for the diagnosis of DSM-III-R anxiety disorders (DiNardo, Moras, Barlow, Rapee,
& Brown, 1993). Stanley et al. found OCs differed from patients with other anxiety
disorders on the LOI symptom, interference, and resistance subscales. In summary,
most studies were limited by the fact that the validity of the criterion (diagnostic status) is unknown, because diagnoses were based on chart reviews or unstructured
interviews. Nevertheless, most studies support the criterion-related validity of the
symptom, interference, and resistance subscales.
Convergent validity. The LOI symptom, interference, and resistance subscales tend to
have large correlations (mean r = .62, range = .38 to .77) with other OC measures (i.e.,
SCL-90-R OC scale, Maudsley Obsessional Compulsive Inventory, Padua Inventory,
and Yale-Brown Obsessive Compulsive Scale; Hodgson & Rachman, 1977; Kim et al.,
1990; Kraaijkamp, Emmelkamp, & van den Hout, 1986; Richter et al., 1994; Sanavio,
1988; Stanley et al., 1993). This indicates good convergent validity.
Discriminant validity. Kendell and DiScipio (1970) found the LOI symptom subscale
had a large correlation (r = .53) with the neuroticism scale of the Eysenck Personality
Inventory (EPI). Stanley et al. (1993) found the LOI subscales had moderate correlations (rs = .36 to .37) with the EPI neuroticism scale. The subscales had small correlations (rs < .27) with SCL-90-R scales assessing somatization, anxiety, phobia,
depression, and interpersonal sensitivity. The LOI subscales had small-to-medium correlations with the SCL-90-R hostility, paranoid ideation, and psychoticism scales
(rs = .29 to .42). There was little to distinguish the pattern of correlations of the LOI
subscales. Richter et al. (1994) found the subscales had medium-to-large correlations
(rs = .43 to .50) with the Hamilton depression scale. In all, the correlations between
the LOI subscales and non-OC measures tend to be smaller than the convergent validity correlations. These findings support the discriminant validity of the LOI subscales.
Comment
The LOI subscales have acceptable psychometric properties, yet they also have several important drawbacks. The three subscales are highly intercorrelated
(mean r = .81, range = .70 to .91; Rachman et al., 1973; Richter et al., 1994;
Stanley et al., 1993), which suggests it is redundant to use them all as outcome measures. The symptom subscale was developed to assess "house-proud housewives" and,
therefore, contains many items concerned with cleanliness and tidiness. It has few
items assessing other symptom domains. For example, only three items pertain to
checking, which is a serious limitation because checking is one of the most common
compulsions (APA, 1994). This means that OCs with checking rituals may obtain spuriously low scores on the LOI symptom subscale.
A further problem is that the resistance subscale may yield misleading results because it
confounds the intensity of resistance with the number of obsessions and compulsions
reported by the person. That is, the resistance scale is constructed such that an item
is rated for resistance only if the subject indicates that he or she experiences the symptom
described in the item. Thus, high scores on the resistance scale can be obtained
only from subjects endorsing many symptoms. Although there can be no resistance
unless the person has at least one OC symptom, this does not mean that resistance is
naturally correlated with the number of obsessions and compulsions reported by the
person. Indeed, the LOI resistance scale can produce a misleading picture of the
patient's degree of resistance. To illustrate, two patients might equally struggle to
resist their symptoms. If patient A has more symptoms than patient B, then patient A
will obtain a higher score on the LOI resistance subscale, giving the misleading
impression that patient A is exerting stronger resistance.
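The confound can be shown with a small numerical sketch. The ratings, the 0-3 resistance range, and the per-symptom mean used for comparison are all assumptions made for illustration; they are not the LOI's actual scoring rules, and the mean is only one conceivable way of separating intensity from symptom count.

```python
def resistance_scores(ratings):
    """Return (summed score, mean per endorsed symptom) for a list of
    resistance ratings, one rating per endorsed symptom."""
    total = sum(ratings)
    mean = total / len(ratings) if ratings else 0.0
    return total, mean


# Hypothetical patients who resist every symptom at the same intensity (2)
patient_a = [2] * 12  # endorses 12 symptoms
patient_b = [2] * 4   # endorses 4 symptoms

a_total, a_mean = resistance_scores(patient_a)
b_total, b_mean = resistance_scores(patient_b)

print(a_total, b_total)  # summed scores differ threefold
print(a_mean, b_mean)    # intensity per symptom is identical
```

The summed score tracks the number of endorsed symptoms, whereas the per-symptom mean is the same for both patients, which is the ambiguity described above.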
The LOI resistance subscale also entails the questionable assumption that greater
resistance is associated with more psychopathology. Indeed, it can be argued that
greater resistance is associated with less psychopathology (Goodman, Price,
Rasmussen, Mazure, Fleishmann, et al., 1989). Resistance to compulsions, for example, is a means of attaining mastery over symptoms and an important component of
behavior therapy for OCD (Rachman & Hodgson, 1980; Steketee, 1993). Resisting
obsessions also can lead to symptom reduction, to the extent that resistance involves
refusing to act on one's obsessional fears; for example, refusing to avoid fear-evoking
stimuli can lead to habituation of obsessional fears. Resistance by means of deliberately suppressing obsessions can (under certain conditions) lead to a paradoxical increase in obsession frequency (Salkovskis & Campbell, 1994). Despite this exception,
it is generally found that if measures of resistance are not confounded with symptom
prevalence, then the degree of resistance is negatively correlated with the severity of
obsessions and compulsions (Goodman, Price, Rasmussen, Mazure, Fleishmann et al.,
1989; Woody et al., in press-a). Given the questionable assumptions underlying the
construction of the LOI resistance subscale, it appears to be of dubious value in assessing resistance to obsessions and compulsions.
MAUDSLEY OBSESSIONAL COMPULSIVE INVENTORY (MOCI)
Hodgson and Rachman (1977) generated 65 true-false items to assess overt rituals and
related obsessions. The items were administered to 50 OCs and 50 non-OC neurotics.
The groups were discriminated by 30 items, which were retained to form the MOCI.
The authors then administered the scale to 100 OCs and factor analyzed the responses.
Five factors were obtained. Four were used to form the MOCI subscales: (a) washing (11 items), (b) checking (9 items), (c) obsessional slowness/repetition (7 items),
and (d) doubting/conscientiousness (7 items). The fifth factor, which assessed obsessional rumination, had salient loadings for only two items and so it was disregarded.
The subscales are essentially symptom checklists; that is, their scores reflect the
amount of time consumed by OC symptoms. To illustrate, a high score on the checking subscale indicates that the person spends a great deal of time checking and
rechecking. A high score on the doubting/conscientiousness subscale indicates the
person has serious doubts about whether he/she has performed tasks adequately, and
a sense of incompleteness even when tasks are performed carefully (Rachman &
Hodgson, 1980).
Relationships among Subscales
The MOCI subscales were developed because they corresponded to separate factors,
and so conveyed unique (non-redundant) information. Factor analytic studies have
replicated the washing, checking, and doubting/conscientiousness factors (Chan,
1990; Kraaijkamp et al., 1986; Rachman & Hodgson, 1980; Sanavio & Vidotto, 1985;
Sternberger & Burns, 1990b), but only Kraaijkamp et al. (1986) found support for a
factorially distinct slowness subscale. In most studies, items from the slowness subscale
tended to load on other factors, such as the doubting/conscientiousness factor.
Although most of the subscales are conceptually and factorially distinct, this does
not mean they are entirely unrelated. The doubting/conscientiousness subscale
assesses doubts or uncertainties about the adequacy of one's actions. Such doubts can
lead to the repetition of actions, such as repeated checking, washing, and slowness in
completing tasks. Accordingly, the doubting/conscientiousness subscale is correlated
with the checking subscale (mean r = .50) and has smaller but nontrivial correlations
with the washing subscale (mean r = .27) and the slowness subscale (mean r = .21;
Chan, 1990; Hodgson & Rachman, 1977; Richter et al., 1994).
Reliability
Internal consistency. Studies using clinical samples have generally obtained acceptable
internal consistencies for the checking, cleaning, and doubting/conscientiousness
subscales, with coefficients α ranging from .60 to .87 (Hodgson & Rachman, 1977;
Kraaijkamp et al., 1986; Rachman & Hodgson, 1980; Richter et al., 1994). Studies of
student samples yielded lower internal consistencies, ranging from .40 to .62 (Chan,
1990; Sanavio & Vidotto, 1985; Sternberger & Burns, 1990b). Lower αs may have
been due to range restriction. Studies of clinical and nonclinical samples have generally
found very low internal consistencies for the slowness subscale, with αs ranging
from 0 to .44 (Chan, 1990; Kraaijkamp et al., 1986; Rachman & Hodgson, 1980;
Sanavio & Vidotto, 1985). Very low αs for the slowness subscale may be due to item
heterogeneity (see the Comment section below).
Test-retest reliability. Hodgson and Rachman (1977) examined the 4-week test-retest
reliability for a sample of university students. Kendall's (1963) τ was used to examine
the concordance between item responses across the retest interval. For the sum of
MOCI items, test-retest reliability was found to be acceptable (τ = .8). Kraaijkamp et
al. (1986) used the same procedure to examine the 4-week test-retest reliability in a
mixed sample of OCs and depressed patients. Reliability was good (τ = .84) and MOCI
total scores correlated .92 across the test-retest interval. Sternberger and Burns
(1990b), using a sample of university students, found the 6-7 month test-retest reliability was acceptable for the MOCI total score (r = .69). In summary, the available data
suggest the MOCI total score has acceptable test-retest reliability over a period of at
least 6-7 months. Test-retest reliabilities of the subscales have yet to be reported.
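As a rough illustration of how a test-retest coefficient for total scores is obtained, the sketch below computes a Pearson correlation between scores from two administrations. The function name `pearson_r` and the scores are hypothetical; they are not data from the studies cited, and the sketch does not cover the item-level τ analysis.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical total scores for five respondents at test and at retest
test_scores = [4, 9, 12, 15, 20]
retest_scores = [5, 8, 13, 14, 21]

r = pearson_r(test_scores, retest_scores)
print(f"test-retest r = {r:.2f}")
```

A high r indicates that respondents keep their rank order and relative spacing across administrations; it does not show that mean scores were stable.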
Validity
Criterion-related validity. Hodgson, Rankin, and Stockwell (1979, unpublished, cited
in Rachman & Hodgson, 1980) found the MOCI total scale discriminated between
phobics and OCs. Kraaijkamp et al. (1986) found the MOCI total scale reliably discriminated OCs from normal controls, anorectics, and patients with non-OC anxiety
disorders. Diagnoses were made from information obtained from the Present State
Exam (Wing, Cooper, & Sartorius, 1974). The MOCI did not discriminate between
OCs and depressed patients, although the latter may have been a chance result because comparisons were based on 89 OCs and only 6 depressives. Compared to normal
controls and the combined psychiatric samples (anorexia, depression, and non-OC
anxiety disorders), OCs had higher scores on all MOCI subscales, except the slowness
subscale, where OCs and normals did not differ.
Hodgson and Rachman (1977) obtained retrospective therapist ratings of the
severity of washing and checking rituals for OCs. Patients were diagnosed on the basis
of unstructured clinical interviews. Patients were classified according to dichotomized
therapist ratings (slight or no problem vs. moderate or severe problem) for washing rituals and compared to dichotomized scores (low vs. high) on the MOCI washing
subscale. The same type of classification was made for therapist ratings of checking rituals and the MOCI checking subscale. Concordance between therapist ratings and
MOCI subscale scores was assessed by the γ coefficient (Goodman & Kruskal, 1963).
Acceptable concordance (γ = .7) was obtained for both washing and checking classifications. Kraaijkamp et al. (1986) performed the same analysis for a sample of OCs,
classified as either washers or checkers by two independent raters. The γ coefficient
was calculated separately for each rater, and was .74 and .78 for checking rituals, and
.85 and .89 for washing rituals. In sum, the results support the criterion-related validity of the MOCI total score and its washing, checking, and doubting/conscientiousness subscales. The only study of the slowness subscale (Kraaijkamp et al., 1986) failed
to support its criterion-related validity.
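For readers unfamiliar with the γ coefficient, the following sketch computes Goodman and Kruskal's γ from a cross-classification of ordered categories. The cell counts are hypothetical and the function name `goodman_kruskal_gamma` is an illustrative choice; the sketch only shows the form of the calculation, not the data of the studies cited.

```python
def goodman_kruskal_gamma(table):
    """Gamma for a contingency table whose rows and columns are ordered
    from low to high: (concordant - discordant) / (concordant + discordant)."""
    n_rows, n_cols = len(table), len(table[0])
    concordant = discordant = 0
    for i in range(n_rows):
        for j in range(n_cols):
            for k in range(i + 1, n_rows):
                for m in range(n_cols):
                    if m > j:    # higher on both variables
                        concordant += table[i][j] * table[k][m]
                    elif m < j:  # higher on one, lower on the other
                        discordant += table[i][j] * table[k][m]
    if concordant + discordant == 0:
        raise ValueError("gamma is undefined when there are no untied pairs")
    return (concordant - discordant) / (concordant + discordant)


# Hypothetical counts. Rows: therapist rating (slight/none, moderate/severe).
# Columns: dichotomized subscale score (low, high).
counts = [
    [20, 5],
    [6, 19],
]
gamma = goodman_kruskal_gamma(counts)
print(f"gamma = {gamma:.2f}")
```

For a 2 x 2 table the expression reduces to (ad - bc) / (ad + bc), so γ ignores tied pairs and can be large even when a fair number of patients are misclassified.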
Convergent validity. The MOCI tends to have large correlations (mean r = .57, range = .23
to .77) with other OC measures (i.e., SCL-90-R OC scale and predecessors, subscales of
the Leyton Obsessional Inventory, Compulsive Activity Checklist, Padua Inventory, and
Yale-Brown Obsessive Compulsive Scale; Freund, Steketee, & Foa, 1987; Goodman et
al., 1989b; Hodgson & Rachman, 1977; Kraaijkamp et al., 1986; Richter et al., 1994;
Sanavio, 1988; Steketee & Doppelt, 1986; Steketee & Freund, 1993; Sternberger &
Burns, 1990a, 1990b; van Oppen, 1992; van Oppen et al., 1995; Woody et al., in press-a,
in press-b). These results support the convergent validity of the MOCI.
Discriminant validity. Chan (1990) found the MOCI correlated .54 with the Beck
Depression Inventory, and Richter et al. (1994) found the MOCI correlated .41 with
the Hamilton depression scale. Sternberger and Burns (1990b) found the MOCI had
small-to-medium correlations with all SCL-90-R scales (rs = .26 to .36) except the SCL-90-R
OC scale (r = .51). In general, the results show that correlations with non-OC
measures tend to be lower than correlations with OC measures, which supports the
discriminant validity of the MOCI.
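The comparison underlying this conclusion can be laid out explicitly. The sketch below uses the correlations quoted in the text for the MOCI total score; treating the mean convergent r as the benchmark, and the helper name `at_or_above_benchmark`, are assumptions made for illustration rather than a formal test of discriminant validity.

```python
# Correlations quoted in the text for the MOCI total score
convergent_mean_r = 0.57  # mean r with other OC measures
discriminant_rs = {
    "Beck Depression Inventory": 0.54,
    "Hamilton depression scale": 0.41,
    "SCL-90-R non-OC scales (largest r)": 0.36,
}


def at_or_above_benchmark(discriminant, benchmark):
    """Names of non-OC measures whose r reaches the convergent benchmark."""
    return [name for name, r in discriminant.items() if r >= benchmark]


violations = at_or_above_benchmark(discriminant_rs, convergent_mean_r)
margins = {name: round(convergent_mean_r - r, 2)
           for name, r in discriminant_rs.items()}

print(violations)  # empty list: every non-OC correlation is below the benchmark
print(margins)     # how far below the benchmark each correlation falls
```

No non-OC correlation reaches the benchmark, although the margin for the Beck Depression Inventory is small, so the comparison supports discriminant validity only descriptively.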
Convergent and discriminant validity of the MOCI subscales. Several studies have examined
the convergent and discriminant validities of the MOCI washing and checking subscales. The MOCI washing subscale has been found to have large correlations with the
Padua Inventory contamination subscale (rs = .53 to .87) and small-to-medium correlations with the Padua checking subscale (rs = -.05 to .33) (Sternberger & Burns,
1990a; van Oppen, 1992; van Oppen et al., 1995). A similar pattern of results was
obtained for the MOCI checking subscale, which had large correlations with the
Padua checking subscale (rs = .62 to .84) and small-to-medium correlations with the
Padua contamination subscale (rs = .24 to .35) (Sternberger & Burns, 1990a; van
Oppen, 1992; van Oppen et al., 1995). The MOCI washing subscale has small-to-medium
correlations with the MOCI checking subscale (rs = .25 to .46; Chan, 1990;
Hodgson & Rachman, 1977; Sternberger & Burns, 1990b). These results indicate good
convergent and discriminant validities of the MOCI washing and checking subscales.
Sher and colleagues (Sher, Frost, & Otto, 1983; Sher, Mann, & Frost, 1984) found
that students with high scores on the MOCI checking subscale, compared to students
with low scores, had higher scores on a self-report measure of the frequency of
checking of everyday actions (e.g., checking lights and door locks). Frost and Sher
(1989) administered the MOCI to a sample of college students 1 month before an
exam. During the exam, students were asked to indicate how many times they
checked their answers. The MOCI checking subscale was correlated .27 with checking frequency, whereas the other subscales were unrelated to checking
frequency (rs = -.08 to .02).
The MOCI checking and washing subscales have medium-to-large correlations
(rs = .30 to .51) with the Beck Depression Inventory and Hamilton depression scale
(Chan, 1990; Richter et al., 1994). These tend to be lower than the convergent validity correlations, and so support the discriminant validity of the checking and washing
subscales. There is insufficient information to evaluate the convergent and discriminant validity of the other subscales.
The MOCI total scale has generally acceptable psychometric properties, as do its
washing and checking subscales. The other subscales require further investigation.
Available evidence suggests the slowness subscale is in need of revision. The MOCI
subscales were developed on the basis of factor analysis, and subsequent studies support
the factorial distinction between all but the slowness subscale. The latter has
poor internal consistency, which is not surprising given its item content. Two of its
items are related to ruminations, two items refer to compulsive counting and the need
for routine, and only three items make direct reference to obsessional slowness.
Although the MOCI total scale has adequate psychometric properties, it also has
important limitations. The scale was developed to assess obsessions and compulsions
associated with overt rituals (Hodgson & Rachman, 1977). Yet some items do not
directly pertain to obsessions or compulsions (e.g., "Some numbers are extremely
unlucky," "Neither of my parents was very strict during my childhood"). The MOCI
assesses washing and checking compulsions, which are the most common types of
compulsions (APA, 1994; Rachman & Hodgson, 1980), but does not assess other
important compulsions such as hoarding and covert rituals. It provides a limited
assessment of obsessional ruminations (two items).
The MOCI does not assess important parameters of OCD, such as interference and
resistance to compulsions. Interference can be inferred only from the number of symptoms
endorsed by the subject. Moreover, because the MOCI emphasizes cleaning and
checking rituals, patients with these compulsions may obtain higher overall MOCI
scores than patients with other, equally severe OC symptoms. This means it is possible
that patients with moderate washing and checking compulsions may obtain higher
MOCI scores than patients with severe obsessions or hoarding compulsions
(Goodman, Price, Rasmussen, Mazure, Delgado, et al., 1989).
The MOCI could be improved by addressing these issues. The internal consistency
of the slowness subscale could be enhanced by increasing the length of the scale by
adding items central to the construct of obsessional slowness. The addition of subscale(s)
to assess obsessions also would improve the coverage of the MOCI. The addition of resistance and interference subscales would enhance the breadth of assessment.
COMPULSIVE ACTIVITY CHECKLIST (CAC)
The CAC was developed originally as a 62-item interviewer-administered schedule
to assess the extent to which OC symptoms interfere with everyday activities (Philpott,
1975). Each item lists an activity (e.g., washing, dressing, using electrical appliances),
which is rated on a 4-point scale, ranging from 0 (performance of activity within normal limits) to 3 (complete impairment). Impairment is rated according to four criteria: frequency, duration, avoidance, and oddity of behavior. To illustrate, a score of 3
would be given if (a) the activity takes three times longer than usual, (b) is three times
as frequent as usual, (c) definitely appears very odd, or (d) avoidance markedly interferes with activity. Criteria for "normal" and "odd" behavior are left to the judgement
of the interviewer. Interviewers are instructed to elicit concrete information to allow
them to make a rating (e.g., "How long does it take you to brush your hair?").
The CAC has been revised several times, mainly by deleting items and changing to
a self-report format. Marks, Hallam, Connolly, and Philpott (1977) developed clinician-rated and self-report versions, each containing 39 items. Freund et al. (1987)
developed a 38-item observer-rated version, and Cottraux, Bouvard, Defayolle, and
Messy (1988) developed an 18-item self-report version. Most recently, Steketee and
Freund (1993) developed a 28-item self-report version. Each revision was intended to
increase item homogeneity and discriminability of OCs from other populations,
although as we will see, the versions have very similar psychometric properties.
Instructions and the rating scale remained essentially unchanged. In summary, each
version is a measure of global impairment due to obsessions or compulsions, taking
into account duration, frequency, and avoidance.
Reliability
Internal consistency. Good internal consistency has been reported for the 37-item self-report CAC (α = .94; Cottraux et al., 1988) and for the 38-item observer-rated version
(α = .91; Freund et al., 1987). Similar results were obtained for the 38-item self-report
version (αs = .86 to .95; Sternberger & Burns, 1990b; Steketee & Freund, 1987) and
for the 28-item self-report version (α = .87; Steketee & Freund, 1987). Internal consistency of the other, less popular versions has not been reported.
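For readers who want to see how an internal consistency coefficient of this kind is obtained, the following is a minimal sketch of coefficient alpha in Python. The function name and the toy data are illustrative assumptions; they are not taken from any of the studies cited above.

```python
from statistics import pvariance


def cronbach_alpha(item_scores):
    """Coefficient alpha for a respondents-by-items table.

    item_scores: list of respondents, each a list of k item ratings
    (hypothetical data layout, assumed for this sketch).
    """
    k = len(item_scores[0])
    # Variance of each item across respondents.
    item_vars = [pvariance([resp[i] for resp in item_scores]) for i in range(k)]
    # Variance of the total (summed) score across respondents.
    total_var = pvariance([sum(resp) for resp in item_scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Toy example: three items rated 0-3 by four respondents.
ratings = [[0, 0, 0], [1, 1, 1], [2, 2, 2], [3, 3, 3]]
alpha = cronbach_alpha(ratings)
```

Because alpha depends on the ratio of summed item variances to total-score variance, using population or sample variances gives the same value as long as the choice is applied consistently.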
Interrater reliability and relationship between self-report and observer-rated versions. Marks,
Stern, Mawson, Cobb, and McDonald (1980) had two independent assessors administer the 39-item CAC to a sample of OCs. Total scores correlated .95 between
observers, and the observer-rated and self-report versions correlated .83. Freund et al.
(1987) obtained moderate inter-rater agreement (r = .64) for the 38-item CAC. The
mean CAC score, averaged across raters, correlated .94 with the 38-item self-report
CAC. These results suggest the observer-rated CAC has adequate interrater reliability.
The self-report and observer-rated versions are highly correlated. It is possible the correlations between observer-rated and self-report versions were inflated by criterion
contamination. That is, patients may have rated their responses to the self-report version simply by recalling their responses to the observer-rated version.
Test-retest reliability. Freund et al. (1987) averaged CAC ratings from two interviewers to examine the test-retest reliability of the 38-item CAC. Test-retest reliability was .68 for a retest interval ranging from 5 to 60 days (mean = 37 days).
Cottraux et al. (1988) administered the 37-item self-report CAC to a sample of normal controls and found the 1-month test-retest reliability was .62. Sternberger and
Burns (1990b), using a sample of university students, obtained a 6-7 month
test-retest reliability of .74. Extrapolating from these results, it seems likely that the
self-report and observer-rated versions have good test-retest reliability over a period of weeks, if not months.
Validity
Criterion-related validity. Using the 37-item self-report CAC, Cottraux et al. (1988)
found that OCs (diagnosed by an unspecified method) had higher scores than panic
disordered patients, social phobics, and normal controls. Steketee and Freund (1993)
compared OCs (diagnosed by an unspecified method) to patients with other anxiety
disorders and to university students. OCs had significantly higher scores on 29 of 38
items of the self-report CAC. In the absence of information on the reliability and
validity of the diagnoses, these findings offer only tentative support for the criterion-related validity of the CAC.
Convergent validity. The self-report and observer-rated CACs tend to have medium correlations (mean r = .40, range = .19 to .84) with other OC measures (i.e., SCL-90-R OC
scale, Maudsley Obsessional Compulsive Inventory, Padua Inventory, and Likert scale
ratings of symptom severity; Cottraux et al., 1988; Freund et al., 1987; Marks et al.,
1980; Steketee & Freund, 1993; Sternberger & Burns, 1990b). These results support
the convergent validity of the CAC.
Discriminant validity. Freund et al. (1987) found the 38-item observer-rated CAC had a
medium correlation with the SCL-90-R OC scale (r = .38) and slightly smaller correlations with the other SCL-90-R scales (rs = .14 to .31). Foa et al. (1987) found the observer-rated CAC had medium correlations (rs = .33 to .47) with measures of depression
(i.e., the Beck Depression Inventory and patient and observer-rated Likert measures of
depression severity). To summarize, the observer-rated CAC has correlations with non-OC measures that tend to be similar in magnitude to correlations with OC measures.
This indicates weak discriminant validity. The same conclusion probably holds for the
self-report CAC, because the self-report and observer-rated CACs are highly correlated.
Comment
Since its appearance in the 1970s, the CAC has been through several revisions. The most
popular are the 38-item self-report and observer-rated versions. The 28- to 38-item self-report
and observer-rated versions have very similar psychometric properties. Test-retest
reliability and internal consistency are good. Interrater reliability appears adequate.
Criterion-related and convergent validities are acceptable, but discriminant validity
appears weak. A further problem with the CAC is that it provides only an indirect measure of OC symptoms because it assesses only the degree of interference in everyday
activities. It does not directly assess obsessions or compulsions. Moreover, scores on the
CAC are ambiguous because they confound slowness, avoidance, and oddity of behavior. The lack of a structured interview is a further limitation for the observer-rated version because psychometric properties may depend on the skill and training experiences
of the interviewer(s) rather than the properties inherent to the CAC.
OC SCALE FROM THE COMPREHENSIVE
PSYCHOPATHOLOGICAL
RATING SCALE (CPRS-OC)
The Comprehensive Psychopathological Rating Scale (CPRS; Asberg, Montgomery,
Perris, Schalling, & Sedvall, 1978) is a set of 63 clinician-rated items that assess a range
of psychiatric signs and symptoms. Each item defines a sign or symptom, which is rated
on a 4-point (0-3) severity scale. Each point on the rating scale also is accompanied by
a description. For example, a rating of 3 on the rituals item is indicated by extensive
rituals or checking habits that are time consuming and incapacitating. The interviewer is required to elicit sufficient information to rate each item, using an unstructured
clinical interview. The CPRS-OC consists of eight items selected from the CPRS
because a sample of 24 OCs scored higher on these items than on the remaining items
(Thoren, Asberg, Cronholm, Jornestedt, & Traskman, 1980). The items are as follows:
rituals, inner tension, compulsive thoughts, concentration difficulties, worrying over trifles, sadness, lassitude, and indecision. Four of these items also are
included in the CPRS depression scale.
Reliability and Validity
The CPRS-OC has been used in several pharmacotherapy trials (Table 3), even though
its psychometric properties are largely unknown. Internal consistency and test-retest
reliability have yet to be examined. Thoren et al. (1980) reported moderate-to-high interrater correlations for individual items (rs = .30 to .93) and for the total score (r = .97).
Criterion-related, convergent, and discriminant validities have yet to be examined.
Comment
There are several limitations to the CPRS-OC. Its psychometric properties are largely
unknown, and only two of its eight items are specific to OCD: compulsive thoughts
(obsessions) and rituals. The remaining items are either features of depression
(lassitude, concentration difficulties, indecision, sadness) or are nonspecific
features of anxiety states (worrying over trifles, inner tension). Insel et al. (1983)
modified the CPRS-OC by deleting items assessing sadness, inner tension, and worry.
The resulting 5-item scale still shares two items with the depression scale. Even in its
revised form the CPRS-OC appears to be largely a measure of nonspecific distress.
LIKERT SCALES
A variety of single-item 9-point Likert scales have been developed to assess various
aspects of OCD, including global measures of severity of obsessions and compulsions,
and specific scales, including measures of the degree of OC-related fear, degree of avoidance, time spent ritualizing, and severity of urges to ritualize (e.g., Emmelkamp, 1982;
Foa et al., 1983, 1992). The scales may be rated by the patient or by an interviewer.
Reliability
Interrater reliability and relationship between self-report and observer-rated versions. Foa et
al. (1983) obtained high inter-rater correlations for Likert measures of severity of
obsessions and severity of compulsions (rs = .92 to .97). Cottraux et al. (1990) reported large correlations (rs = .74 to .89) between self-report and observer-rated versions of two types of Likert measures (OC-related anxiety/discomfort and duration of
compulsions). Large correlations also have been obtained among patients', therapists', and independent observers' ratings of a range of OC features, including main
fear, avoidance, and compulsion severity (rs = .64 to .83; Foa, Steketee, Kozak, &
Dugger, 1987). Thus, there is evidence of good interrater reliability, and high correlations between self-report and observer-rated Likert scales.
Test-retest reliability. Steketee, Freund, and Foa (1988) reported that the test-retest reliability of Likert scales (assessing main fear, avoidance, general functioning, anxiety,
and depression) ranged from .40 to .87 for self-report ratings, and .20 to .50 for
observer ratings over a mean 60-day interval. These data suggest considerable variation in test-retest reliabilities. Unfortunately, reliabilities were not reported for individual scales (only the above-mentioned ranges were given), so it is not possible to
identify which scales had the lowest reliability. In summary, the test-retest reliability
of Likert scales requires further investigation.
Validity
Criterion-related validity. There have been no published studies of the criterion-related validity of these scales. It may be assumed that the scales should have good
criterion-related validity because patients without OCD would have low (or zero)
scores on items measuring global severity of obsessions, compulsions, etc. However,
this assumption may not be warranted because unwanted intrusive thoughts often
occur in people without OCD (Rachman & de Silva, 1978; Salkovskis & Harrison,
1984), and compulsion-like behaviors (e.g., excessive checking) can occur in
patients with disorders other than OCD (e.g., generalized anxiety disorder; Craske,
Rapee, Jackel, & Barlow, 1989).
Convergent validity. Likert measures of OC symptoms generally have moderate correlations (mean r = .32, range = .17 to .62) with other OC measures, including the
GLOBAL OBSESSIVE COMPULSIVE SCALE (GOCS)
The GOCS (Insel et al., 1983) is a single-item Likert-like measure of the overall
severity of OC symptoms. It is a clinician-rated scale based on other NIMH global
rating scales, such as the global measures of mania and depression (Murphy, Pickar,
& Alterman, 1982). It differs from the Likert scales described in the previous section in two ways: the number of rating points (15 vs. 9), and the clustering of
descriptors on the scale. The observer completes the GOCS by selecting one of 15
severity levels, ranging from 1 (minimal symptoms or within normal range) to 15
(very severe). Severity levels are clustered into five main groups (i.e., ratings of 1-3,
4-6, 7-9, 10-12, and 13-15), with detailed descriptors for each cluster. For example, ratings from 10-12 represent severe obsessive-compulsive behavior, defined as
"symptoms that are crippling to the patient, interfering so that daily activity is an
active struggle. Patient may spend full time resisting symptoms. Requires much
help from others to function."
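The clustering of the 15 severity levels into five groups of three can be expressed as a simple mapping. The sketch below is illustrative only: the function names are hypothetical, and the clusters are identified by number because only the descriptor for the 10-12 cluster is quoted in the text.

```python
def gocs_cluster(rating):
    """Return the cluster number (1-5) for a GOCS rating of 1-15."""
    if not 1 <= rating <= 15:
        raise ValueError("GOCS ratings range from 1 to 15")
    # Ratings 1-3 -> cluster 1, 4-6 -> cluster 2, ..., 13-15 -> cluster 5.
    return (rating - 1) // 3 + 1


def cluster_range(cluster):
    """Return the (lowest, highest) rating covered by a cluster number."""
    return 3 * (cluster - 1) + 1, 3 * cluster


# A rating of 11 falls in the fourth cluster, which spans ratings 10-12.
example_cluster = gocs_cluster(11)
example_range = cluster_range(example_cluster)
```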
Reliability and Validity
The GOCS has been used in several treatment outcome studies (Table 3) even though
little is known about its psychometric properties. Inter-rater reliability has yet to be
determined. Two studies have examined test-retest reliability. Kim et al. (1992)
reported a 2-week intraclass correlation of .98, and Kim, Dysken, Kuskowski, and
Hoover (1993) obtained a 2-week intraclass correlation of .87.
There have been no studies of criterion-related validity or discriminant validity. With
regard to convergent validity, one study found the GOCS had a medium correlation
(r = .33) with the SCL-90-R OC scale, and several studies obtained large correlations
(mean r = .69, range = .63 to .77) with the YBOCS (Black, Kelly, Myers, & Noyes, 1990;
Goodman, Price, Rasmussen, Mazure, Delgado et al., 1989; Kim et al., 1992, 1993). The
convergent validity of the GOCS is promising, albeit in need of further evaluation.
Correlations between the GOCS and YBOCS may have been spuriously inflated because
in each case the scales were administered by the same interviewer. This means that ratings made on the YBOCS may have influenced those on the GOCS, or vice versa.
Comment
The GOCS has the advantage of being a simple 1-item scale, which, no doubt,
accounts for its popularity in treatment outcome studies. However, little is known
about its reliability and validity. GOCS ratings are based on unstructured clinical interviews, and so its psychometric properties may vary widely from one study to the next,
depending on the adequacy of the interviews. The GOCS provides only a global
assessment of OC symptoms, and fails to capture information about the severity of different types of OC symptoms.
YALE-BROWN OBSESSIVE-COMPULSIVE SCALE
The YBOCS investigational items assess the following: amount of time free of obsessions or compulsions, insight into the irrationality of obsessions and compulsions,
avoidance, degree of indecisiveness, overvalued sense of personal responsibility,
obsessional slowness/inertia, pathological doubting, global severity, overall response
to treatment, and reliability of information obtained from the patient. They are rated
by the interviewer on 0-4 or 0-6 scales, similar to those used for the core items.
YBOCS resistance items are rated such that greater resistance is associated with
lower scores, because greater resistance is associated with less impairment in social
and occupational functioning. This scoring rule is supported by the finding that resistance scores are correlated with less severe OC symptoms, as assessed by other YBOCS
items (Goodman, Price, Rasmussen, Mazure, Fleishmann et al., 1989;
Woody et al., in press-a).
In practice, most published treatment outcome studies used only the sum of the 10
core items. Scores on the obsession and compulsion subscales are infrequently used,
and the Symptom Checklist has yet to be used as an outcome measure. In the following, the review is confined to the psychometric properties of the 10-item YBOCS
because there is little or no available information on the properties of the Symptom
Checklist or the investigational items. Accordingly, I will use the acronym YBOCS to
refer to the scale formed by the sum of the 10 core items.
Reliability
Interrater reliability. Price, Goodman, Charney, Rasmussen, and Heninger (1987) obtained an intraclass correlation of .99 when the YBOCS was administered by two independent raters to 10 OCs. Goodman, Price, Rasmussen, Mazure, Fleishmann et al.
(1989) assessed the inter-rater reliability of the YBOCS by having six trained raters
evaluate videotaped interviews of six OCs. The intraclass correlation was .80. In a second study reported in the same article, four trained raters evaluated videotaped interviews of 40 OCs, yielding an intraclass correlation of .98. Jenike et al. (1990) used four
raters to assess 40 OCs and obtained an intraclass correlation of .96 for the YBOCS.
No information was presented on whether the ratings were based on audiotapes,
videotapes, or live interviews. Woody et al. (in press-a) had an interviewer obtain
YBOCS ratings from live interviews of 30 OCs, and then a second rater listened to
audiotapes of the interviews. The intraclass correlation was .93.
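The studies above do not state which form of the intraclass correlation was computed. As an illustration only, the following sketch computes a one-way random-effects intraclass correlation for a single rating; this choice of form, the function name, and the toy data are assumptions and may differ from what the cited authors used.

```python
def icc_oneway(ratings):
    """One-way random-effects intraclass correlation (single rating).

    ratings: list of subjects, each a list of k ratings, one per rater
    (hypothetical data layout). Assumes the ratings are not all identical.
    """
    n = len(ratings)
    k = len(ratings[0])
    grand_mean = sum(sum(r) for r in ratings) / (n * k)
    subject_means = [sum(r) / k for r in ratings]
    # Between-subjects mean square.
    ms_between = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)
    # Within-subjects mean square (disagreement among raters).
    ms_within = sum(
        (x - m) ** 2 for r, m in zip(ratings, subject_means) for x in r
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Toy example: two raters who agree exactly on three patients.
perfect_agreement = icc_oneway([[10, 10], [20, 20], [30, 30]])
```

When raters agree exactly, the within-subjects mean square is zero and the coefficient equals 1; disagreement among raters lowers it.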
The results of these studies suggest the YBOCS has excellent inter-rater reliability.
However, it is possible that inter-rater reliability was spuriously inflated, at least to some
degree. The reliability estimates were obtained by having one evaluator rerate taped
interviews of another evaluator. This shows that one can score another's interview reliably, but not that one can administer the instrument reliably. It is quite a different task
to reproduce a rater's score, based on a taped interview, than to interview the patient
from scratch and obtain a score that matches that of another rater who also interviews the patient independently. If the original (criterion) rater makes the mistake of
giving the patient actual rating categories to choose from (instead of the interviewer
rating the categories), then extremely high reliabilities can be obtained on rerating if
the patient's self-ascribed category is the rating. The irony, of course, is that just
accepting the patient's self-rating, rather than doing the difficult work of evaluating
the details of the symptoms, and assigning a rating, appears more reliable.
Unfortunately, it appears to be common for evaluators using the YBOCS to make the
mistake of giving patients the rating categories, despite their having received training
to the contrary.
¹The author acknowledges, with thanks, an anonymous reviewer as the source of these comments.
Internal consistency. The YBOCS has acceptable-to-good internal consistency, with coefficient α ranging from .69 to .91 (Goodman, Price, Rasmussen, Mazure, Fleishmann
et al., 1989; Richter et al., 1994; Woody et al., in press-a).
Test-retest reliability. Kim et al. (1990, 1992, 1993) administered the YBOCS to three
samples of OCs three times over a 2-week period. Intraclass correlations ranged from
.81 to .97. Woody et al. (in press-a) administered the YBOCS to 24 OCs on two occasions over test-retest intervals ranging from 10 to 103 days (mean = 49 days). The
intraclass correlation was .61, and was probably reduced because of the long retest
interval. The findings suggest the YBOCS has good test-retest reliability over at least
a 2-week interval.
Validity
Criterion-related validity. The YBOCS was intended for use with patients diagnosed
with OCD, and so there has been only one study of its criterion-related validity.
Rosenfeld, Dar, Anderson, Robak, and Greist (1992) found that patients with OCD
(method of diagnosis unspecified) had higher YBOCS scores than patients with other
anxiety disorders and normal controls.
Convergent validity. The YBOCS tends to have large correlations (mean r = .51, range = .17
to .77) with other OC measures (i.e., anxiety and avoidance ratings from behavioral
avoidance tests, SCL-90-R OC scale, subscales of the Leyton Obsessional Inventory,
Maudsley Obsessional Compulsive Inventory, Likert scales of symptom severity, Global
Obsessive Compulsive Scale; Black et al., 1990; Goodman, Price, Rasmussen, Mazure,
Delgado et al., 1989; Kim et al., 1990, 1992; Richter et al., 1994; Woody et al., in press-a,
in press-b). These results indicate that the YBOCS has good convergent validity.
Discriminant validity. Studies of discriminant validity have been less encouraging. The
YBOCS has large correlations with the Hamilton depression scale (mean r = .64, range
= .53 to .91) and large correlations with the Hamilton anxiety scale (mean r = .62,
range = .47 to .85; Goodman, Price, Rasmussen, Mazure, Delgado et al., 1989;
Hewlett, Vinogradov, & Agras, 1992; Price et al., 1987; Richter et al., 1994). These
studies show that correlations between the YBOCS and measures of depression and
general anxiety tend to be as large as the convergent validity correlations. This suggests the 10-item YBOCS has poor discriminant validity.
Comment
The YBOCS provides a comprehensive assessment of OC symptoms and their parameters. The core items have good interrater reliability and acceptable internal consistency. Although there is evidence of adequate convergent validity, the 10-item
YBOCS has weak discriminant validity. The psychometric properties of the Symptom
Checklist and investigational items remain to be investigated.
The Symptom Checklist requires the assessor to inquire about a wide range of obsessive and compulsive phenomena. This is important for a comprehensive assessment
because patients may feel embarrassed or otherwise reluctant to discuss their obsessions
and compulsions, and they may not mention these symptoms unless the interviewer
directly asks about them. A shortcoming of the Symptom Checklist is that it provides a
limited assessment of cognitive compulsions (e.g., repeating special words or phrases to
undo disturbing thoughts). The Checklist was recently expanded by Foa and Kozak to
assess these phenomena (personal communication, April, 1994).
The YBOCS provides separate scores to measure the severity of obsessions and
compulsions. However, most outcome studies simply combine these into a total score.
Kim et al. (1989) observed that if a patient has only obsessions or compulsions, the
YBOCS total score may be spuriously low even if symptoms are severe. The use of subscales would provide more information about the effects of treatment (e.g., some
treatments may have a greater effect on compulsions than obsessions) and would help
circumvent the problem raised by Kim et al.
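The concern about total scores can be made concrete with a small scoring sketch. It assumes, for illustration, that the 10 core items divide evenly into five obsession and five compulsion items, each rated 0-4; that layout, the function name, and the example ratings are assumptions of this sketch rather than details stated in the surrounding text.

```python
def ybocs_scores(obsession_items, compulsion_items):
    """Sum core-item ratings into subscale scores and a total score.

    Assumes five obsession and five compulsion items, each rated 0-4
    (an illustrative assumption, not taken from the text above).
    """
    obsessions = sum(obsession_items)
    compulsions = sum(compulsion_items)
    return {
        "obsessions": obsessions,
        "compulsions": compulsions,
        "total": obsessions + compulsions,
    }


# Hypothetical patient with maximally severe compulsions and no obsessions.
scores = ybocs_scores([0, 0, 0, 0, 0], [4, 4, 4, 4, 4])
```

Under these assumptions the compulsion subscale is at its ceiling (20 of 20), whereas the total is only half of its possible maximum (20 of 40), which is the pattern that reporting subscale scores would make visible.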
YBOCS interviews (including the Symptom Checklist, core items, and investigational items) are time consuming, requiring an average of 40 min per patient from a
trained interviewer (Rosenfeld et al., 1992). Recently, Rosenfeld et al. (1992) developed a self-administered computerized version which was well received by patients
and yielded comparable ratings to those obtained from the interview version (97%
agreement). Self-report versions also have been developed (Leckman, Walker,
Goodman, Pauls, & Cohen, 1994; Warren, Zgourides, & Monto, 1993), although their
psychometric properties remain to be determined.
CONTENT, RELIABILITY, AND VALIDITY: SUMMARY AND CONCLUSIONS
Behavioral Assessment
The psychometric properties of the assessment methods reviewed in this article are
summarized in Table 1. As the table shows, little is known about the psychometric properties of behavioral assessment methods. Behavioral Avoidance Tests (BATS) have the
advantage of providing in vivo measures of OC-related fear and avoidance. Unfortunately,
these measures are sometimes difficult to construct, and often focus on external fear stimuli to the neglect of internal sources of fear (e.g., fear of having a bad thought). BATS
also fail to assess covert avoidance (e.g., imagining a glove on one's hand while touching
a contaminant). Although these limitations could be addressed by including self-report
measures of such forms of fear and avoidance, BATS are used increasingly less often in
treatment outcome studies (Emmelkamp, 1982; Foa et al., 1992).
Diary measures of naturally occurring target behaviors are popular in treatment-outcome studies of panic disorder (e.g., Clark et al., 1994), and have been used
in studies of other disorders, including social phobia (Glass & Arnkoff, 1994) and
chronic pain (Philips, 1988). Surprisingly, these methods are used infrequently in
OCD outcome studies. The assessment of OCD would be advanced by the development and validation of such measures.
Direct observation methods have been used occasionally in case studies of inpatients (e.g., Mills et al., 1973). Although these methods are more difficult to apply to
Self-Report Inventories
Self-report inventories are popular because of their ease of administration. They differ
markedly in their breadth of measurement; some provide measures of different OC
phenomena (e.g., the MOCI subscales) whereas others are simply global measures of
symptom severity (e.g., the SCL-90-R OC scale). As summarized in Table 1, the inventories also differ in their psychometric properties. The SCL-90-R OC scale (and predecessors) has adequate reliability and convergent validity, but uncertain criterion-related
validity and poor discriminant validity. The item content of the SCL-90-R OC scale and
predecessors suggests they are essentially measures of nonspecific distress. This is consistent with their high correlations with measures of general psychopathology.
The LOI subscales have adequate reliability and validity. The MOCI total scale also
has adequate psychometric properties. The MOCI subscales have adequate internal
consistency, apart from the slowness subscale. The MOCI washing and checking subscales have adequate validities, whereas the validities of the other subscales remain to
be evaluated. The self-report CAC has adequate psychometric properties, apart from
questionable discriminant validity. The self-report Likert scales have adequate convergent and discriminant validity, although their other psychometric properties remain to
be determined.
Some self-report inventories confound the assessment of important variables.
Distress caused by symptoms and symptom frequency are confounded in the SCL-90-R
OC scale (and predecessors). The LOI subscales are highly intercorrelated, which raises the question of whether there is any advantage to having separate symptom, interference, and resistance subscales. The high correlations arise, in part, from the fact that
the assessment of interference and resistance is confounded with symptom prevalence.
Freund et al. (1987) claimed two advantages of the CAC over the MOCI: (a) the former uses a 4-point rather than a dichotomous scale, and so the CAC may be more sensitive to gradations in symptom severity; and (b) the CAC focuses on highly specific
behaviors, with each point on the rating scale labeled with a detailed written description. The first point is unlikely to be correct because Dominguez, Jacobson, de la
Gandara, Goldstein, and Steinbrook (1989) found that the original version of the MOCI
correlated .96 with a revised MOCI that used a 4-point Likert rating. The advantage
claimed in Freund et al.'s second point also is questionable, because the MOCI assesses specific OC symptoms. In comparison, the CAC does not directly assess OC symptoms; it
merely assesses interference in everyday activities that may be due to obsessions, compulsions, or both. The CAC provides no indication as to the nature of the interference
because its ratings confound slowness, avoidance, and oddity of behavior. This means
that high scores on the CAC are ambiguous; they could arise from obsessional slowness, compulsive repeating, avoidance, and/or obsessional doubting and indecision.
In summary, in terms of breadth of measurement, reliability, and validity, there is
much to recommend the MOCI over the other self-report measures. The MOCI total
scale has comparable reliability and validity to other inventories. Compared to global
measures of OC symptoms (e.g., the LOI Symptom subscale), the MOCI subscales
permit a more detailed assessment of OC symptoms. The MOCI has the further advantage of not confounding the assessment of important variables. However, the MOCI
has three main shortcomings: (a) it provides a limited assessment of obsessions, (b)
the slowness subscale has weak psychometric properties, and (c) it provides no measure of symptom interference or resistance.
TABLE 1. Properties of Measures: Summary
Columns: Reliability (Internal Consistency, Interrater, Test-Retest); Validity (Criterion-Related, Convergent, Discriminant)
Rows: Behavioral Approach Tests; Direct Observation Methods; Diary Methods; SCL-90-R OC scale and predecessors; LOI subscales (Symptom, Resistance, Interference); MOCI (Total scale; Washing, Checking, Doubting, and Slowness subscales); CAC (Self-Report, Observer-Rated); CPRS-OC; Likert Scales; GOCS; YBOCS (10-item)
[Cell entries (+, ?, na) cannot be matched to rows and columns in this copy.]
Observer-Rated Scales
The CPRS-OC, GOCS, and observer-rated CAC provide only global measures of
symptom severity. The CPRS-OC and GOCS have been used in numerous treatment
outcome studies (Table 3), despite the lack of data supporting their reliability and
validity (Table 1). Although there are more data on the reliability and validity of the
observer-rated Likert scales, each of these measures suffers the important limitation of
being based on unstructured clinical interviews. As a consequence, the psychometric
properties of these scales may vary with the skill and (unspecified) training experiences of the interviewer. This is less of a problem when the interviewer follows a structured interview protocol such as that used in the YBOCS.
The YBOCS yields a wealth of information on OC symptoms and their parameters.
Each item is accompanied by detailed probe questions, which structures the interview
and ensures that appropriate information is collected. The YBOCS has acceptable
reliability and convergent validity, although discriminant validity is weak. It is unlikely that this is a weakness specific to the YBOCS, because the GOCS is highly correlated with the YBOCS, and so it may have similar problems with discriminant validity.
Moreover, the item content of the CPRS-OC suggests it is a measure of nonspecific distress, and so it is also likely to have even worse discriminant validity than the YBOCS.
Apart from the time required to administer the YBOCS (approximately 40 min), it is
generally superior to the other observer-rated scales covered in this review. The
YBOCS (including the Symptom Checklist and investigational items) has advantages
over self-report measures, including greater coverage (i.e., it assesses a range of OC
symptoms and parameters), greater flexibility, and the opportunity for the interviewer to determine whether the patient is reporting OC symptoms or other phenomena, such as tics
or paraphilic symptoms. Considering reliability, validity, and breadth of measurement, the YBOCS appears to be the best available observer-rated scale.
SENSITIVITY TO TREATMENT EFFECTS
Treatment outcome measures need to be more than reliable and valid; they also must
be sensitive to changes in symptom severity. Behavior therapy (in vivo exposure plus
response prevention) and clomipramine are established treatments for OCD, with
their efficacy demonstrated on numerous outcome measures (Abel, 1993; Cox et al.,
1993; van Balkom et al., 1994). Accordingly, studies of these therapies were used in
meta-analyses of the sensitivity of OC measures.
Method
A meta-analysis was conducted separately for clomipramine and behavior therapy, using
the procedures described by Wolf (1986). Studies were included if they (a) included
samples of more than five subjects, (b) used one or more of the measures covered in
this review, and (c) reported sufficient information to compute effect sizes. Suitable
studies were located by searching Psychological Abstracts and Medline data bases, and
by consulting recent treatment-outcome reviews (e.g., Abel, 1993; Cox et al., 1993; van
Balkom et al., 1994). When necessary and feasible, authors of published reports were
contacted in an effort to obtain information necessary to compute effect sizes. Studies
using subsamples of larger studies were excluded unless they reported outcome measures that were not reported in the larger studies. Also excluded were studies that used
combined pharmacological and behavioral treatment within a single therapy trial.
Thirty-five suitable studies were identified, which provided 19 trials of clomipramine
and 20 trials of behavior therapy (some studies reported more than one trial).
The effect size for each measure was computed according to the following formula: Effect size (ES) = $(M_{pre} - M_{post}) / SD_{pooled}$,
where $M_{pre}$ and $M_{post}$ are the pre- and
posttreatment means for a given treatment trial, and $SD_{pooled}$ is the mean of the pre- and
posttreatment standard deviations. Hedges' (1982) correction was used to calculate
mean effect sizes. This adjusts for differences in sample size by weighting each effect
size according to the number of subjects it was based upon.
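As an illustration of the computation just described, the sketch below calculates a pre-post effect size for one trial and a mean effect size weighted by the number of subjects. The weighting shown is a simplified reading of the sentence above and is not necessarily identical to Hedges' procedure; the function names and the numbers are hypothetical.

```python
def effect_size(m_pre, m_post, sd_pre, sd_post):
    """Pre-post effect size: mean change divided by the mean of the two SDs."""
    sd_pooled = (sd_pre + sd_post) / 2
    return (m_pre - m_post) / sd_pooled


def weighted_mean_effect_size(trials):
    """Mean effect size, weighting each trial by its number of subjects.

    trials: list of (effect_size, n_subjects) pairs (hypothetical layout).
    """
    total_n = sum(n for _, n in trials)
    return sum(es * n for es, n in trials) / total_n


# Hypothetical trial: mean score falls from 25 to 15, with SDs of 6 and 4.
es_single = effect_size(25, 15, 6, 4)

# Hypothetical pair of trials with 10 and 30 subjects.
es_mean = weighted_mean_effect_size([(2.0, 10), (1.0, 30)])
```

In this toy case the larger trial pulls the weighted mean toward its own effect size, which is the intended consequence of weighting by sample size.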
Behavioral Assessment
Nine published treatment studies of behavior therapy or clomipramine used a BAT to
assess treatment outcome. Only four studies provided enough information to compute
effect sizes for behavior therapy, and only one for clomipramine. The results for behavior therapy are presented in Table 2. Here it can be seen that effect sizes for SUDS and
avoidance varied markedly across studies, and effect sizes had no obvious relationship to
number or duration of treatment sessions, sample size, or type of BAT. Overall, the findings suggest that BATs are sensitive to treatment effects. However, further research is
required to determine which type of BAT is most sensitive to treatment effects.
Direct observation and diary methods also require further investigation. These measures appear sensitive to treatment effects in case studies and small open trials (e.g.,
Foa et al., 1980; Mills et al., 1973; Turner et al., 1979, 1980, 1985). However, the studies using these methods either did not meet criteria for inclusion in the meta-analysis,
or they did not provide sufficient information to compute relevant effect sizes.
Self-Report Inventories and Observer-Rated Scales
Table 3 shows the mean effect sizes for self-report inventories and observer-rated
scales, along with the number of trials on which the calculations were based. Before
comparing the treatment sensitivity of the scales, it is necessary to determine whether
the effect sizes of each measure were based on different amounts of treatment. For
the behavior therapy trials, the amount of therapy per outcome trial was computed by
multiplying the number of treatment sessions by the duration of each session. The
inventories and scales were defined as independent variables and were compared, by
means of a one-way ANOVA, in terms of the amount of therapy associated with them.
The inventories and scales did not differ with regard to this variable, F(13, 40) < 1.
For the clomipramine trials the amount of therapy was defined as the mean dose per patient multiplied by the number of weeks of treatment. This was used as a dependent variable in a one-way ANOVA, where the inventories and scales were independent variables. Again, the inventories and scales did not differ on this variable, F(8, 27) = 1.17, p > .1. Thus, the effect sizes obtained for the inventories and scales were not confounded by differences in the amount of treatment associated with each measure. This means the effect sizes could be directly compared to determine the relative sensitivity of the measures.
[Table 2: Effect sizes for BATs in behavior therapy studies. Columns: Study; Sample Size; Type of BAT (multitask, multitask set as homework, or single task); Effect Size (SUDS, Avoidance).]
[Table 3: Mean effect sizes for self-report inventories and observer-rated scales. Columns: Instrument; Behavior Therapy (Exposure and Response Prevention): Number of Trials, Hedges Adjusted Mean Effect Size, SD; Clomipramine: Number of Trials, Hedges Adjusted Mean Effect Size, SD; Grand Mean (All Trials). Rows include, under Self-Report, the MOCI, CAC, and LOI subscales (Symptom, Resistance, Interference), and, under Observer-Rated, the CPRS-OC, GOCS, and YBOCS (10-item).]
a One trial used the HSCL (effect size = 0.47) and one used the SCL-90-R (effect size = .50).
Note. Studies used: Allen & Rack, 1975; Benkelfat et al., 1989, 1990; Boersma et al., 1976; Clomipramine Collaborative Study Group, 1991; Cottraux et al., 1990; Emmelkamp & Beens, 1991; Emmelkamp & Kraanen, 1977; Emmelkamp et al., 1980, 1988, 1989; Fals-Stewart et al., 1993; Foa et al., 1984, 1992; Freund et al., 1987; Gehris et al., 1990; Hewlett et al., 1992; Insel et al., 1983; Kozak et al., 1988; Mavissakalian et al., 1990; Pato et al., 1991; Pigott et al., 1990, 1992; Rachman et al., 1979; Rack, 1973; Solyom & Sookman, 1977; Steketee & Doppelt, 1986; Tamimi et al., 1991; Thoren et al., 1980; Vallejo et al., 1992; van den Hout et al., 1988; Welkowitz et al., 1989; Woody et al., in press-a, in press-b; Zohar & Insel, 1987.
In terms of Cohen's (1988) classification scheme, large effect sizes are > .80, and
medium effects are .50 to .79. Table 3 shows the inventories and scales generally yielded medium-to-large effects, suggesting they all were sensitive to treatment effects. The
OC scales from the SCL-90-R and HSCL produced the smallest effects. The other self-report inventories (LOI subscales, self-report CAC, and MOCI total scale) produced
similar effect sizes to one another. MOCI subscales have been used as outcome measures
in only one study (Mavissakalian, Jones, Olson, & Perel, 1990) and so it is difficult to
gauge their sensitivities. Mavissakalian et al. found that all subscales were sensitive to
the effects of clomipramine. The largest effect was for the checking subscale (effect size
= 1.15), followed by the washing (1.00), doubting/conscientiousness (0.77), and slowness subscales (0.47). Although the effect sizes suggest the subscales are sensitive to
treatment effects, they should be interpreted with caution because Mavissakalian et al.
did not describe the types of obsessions and compulsions in their sample. If their sample was mostly patients with compulsive checking, then we would expect the checking
subscales to have the largest effect size.
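As a small worked illustration, the Cohen (1988) cutoffs quoted above can be applied to the four MOCI subscale effect sizes reported for Mavissakalian et al. (1990). The function name and the "below medium" label are hypothetical, and treating a value of exactly .80 as large is an assumption, since the text gives the boundary only as "> .80".

```python
def classify_effect_size(es: float) -> str:
    """Label an effect size using the cutoffs quoted in the text."""
    magnitude = abs(es)
    if magnitude >= 0.80:   # assumption: .80 itself counts as large
        return "large"
    if magnitude >= 0.50:   # medium effects are .50 to .79
        return "medium"
    return "below medium"   # hypothetical label for anything under .50


# MOCI subscale effect sizes as reported in the text.
moci_subscales = {
    "checking": 1.15,
    "washing": 1.00,
    "doubting/conscientiousness": 0.77,
    "slowness": 0.47,
}

labels = {name: classify_effect_size(es) for name, es in moci_subscales.items()}
for name, label in labels.items():
    print(f"{name}: {label}")
```

On these cutoffs the checking and washing subscales fall in the large range, doubting/conscientiousness in the medium range, and slowness just below it.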
As Table 3 suggests, observer-rated scales produced larger effect sizes than self-report scales for the clomipramine trials, t(36) = 7.14, p < .001, and there was a trend
in this direction for behavior therapy trials, t(56) = 1.99, p < .052. Table 3 suggests the
findings for the Likert scales were an exception to these results, since the effect sizes
of self-report and observer-rated versions do not appear to differ. These impressions
were supported by statistical analyses of the behavior therapy trials, which is where the
Likert scales were used (Table 3). The scales were classified into four groups: (a) self-report inventories (SCL-90-R OC, LOI, MOCI, self-report CAC), (b) self-report Likert
scales, (c) observer-rated Likert scales, and (d) other observer-rated scales (observer-rated CAC and YBOCS). The groups were used as independent variables and effect
size was the dependent variable. The one-way ANOVA was significant, F(3, 54) = 8.88,
p < .001, and Newman-Keuls post hoc comparisons revealed that the effect size of
Group 1 (self-report inventories) was significantly smaller than those of the other
groups (p < .05), and that the other groups did not differ from one another (ps > .05).
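The form of this group comparison can be sketched with a hand-computed one-way ANOVA. The effect sizes below are invented for illustration and are not the values from the trials analyzed here; the function name is hypothetical, and the sketch stops at the F ratio and its degrees of freedom, without the p value or the Newman-Keuls comparisons.

```python
from __future__ import annotations

from statistics import mean


def one_way_anova_f(groups: dict[str, list[float]]) -> tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [x for values in groups.values() for x in values]
    grand_mean = mean(all_values)
    # Between-groups sum of squares: group size times squared deviation of group mean.
    ss_between = sum(
        len(values) * (mean(values) - grand_mean) ** 2 for values in groups.values()
    )
    # Within-groups sum of squares: squared deviations from each group's own mean.
    ss_within = sum(
        (x - mean(values)) ** 2 for values in groups.values() for x in values
    )
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical effect sizes per scale group, invented for illustration only.
groups = {
    "self-report inventories": [0.4, 0.6, 0.8],
    "self-report Likert scales": [1.4, 1.6, 1.8],
    "observer-rated Likert scales": [1.5, 1.7, 1.9],
    "other observer-rated scales": [1.3, 1.5, 1.7],
}

f_stat, df_between, df_within = one_way_anova_f(groups)
print(f"F({df_between}, {df_within}) = {f_stat:.2f}")
```

In this made-up data set the first group sits well below the other three, which is the pattern that would produce a large F ratio followed by post hoc differences involving only that group.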
Why do observer-rated scales generally yield larger effects? Lambert, Hatch,
Kingston, and Edwards (1986) found similar results for measures of depression and
suggested that trained observers might be better than patients at detecting changes in
symptom severity. This advantage does not appear to be present for Likert scales. It is
not clear why this occurred. Studies using Likert scales have provided little information
on how the scales were administered. Apart from providing descriptors for the anchor
points on the scales, the studies have provided no information on the time frame used
to assess symptoms or other pertinent details. It may be that self-report Likert scales are
more sensitive than self-report inventories because of their greater specificity; that is,
the subject may be instructed to rate specific symptoms over a specific time period.
DISCUSSION
Current Status of the Assessment of Obsessions and Compulsions
The selection of measures for treatment outcome studies is based on multiple criteria. Among the most important are (a) content (range of phenomena assessed); (b)
reliability and validity, and whether there is sufficient available information to evaluate these properties; and (c) sensitivity to changes in symptom severity. Some measures
are popular in OCD treatment-outcome studies, yet have unknown psychometric
properties (i.e., the CPRS-OC and GOCS; see Table 1). Some measures provide only
global measures of OC symptoms (LOI symptom subscale and CAC) and others
appear to be largely measures of nonspecific distress (SCL-90-R OC scale, its predecessors, and CPRS-OC). Some measures confound important variables (e.g., symptom
prevalence and distress are confounded in the SCL-90-R OC scale; symptom prevalence
and degree of resistance are confounded in the LOI resistance subscale; obsessional
slowness, avoidance, and oddity of behavior are confounded in the CAC). When
breadth of measurement, reliability, validity, and sensitivity to treatment effects are
considered together, the YBOCS appears to be the best available measure for treatment outcome research.
Future Directions
Refining existing measures. Further research is needed to firmly establish the reliability and validity of many of the measures currently used in treatment outcome
research. For example, studies of test-retest reliability have been confined to relatively short periods (days to weeks). For most measures, temporal stability (in the
absence of treatment) over longer periods of time remains unknown. This is an
important omission because OCD is a chronic disorder (APA, 1994) and a good measure of OC symptoms should be stable over periods of months or years. Test-retest
reliability over periods of months also is important for treatment studies that extend
over such time periods, and for studies of long-term effects of treatment.
Behavioral assessment of OCD has fallen into neglect in recent years, despite the
potential advantages of various assessment methods. Diary methods are commonly
used in studies of other disorders (e.g., panic disorder, chronic pain) and may
become more popular in OCD studies if their reliability and validity can be established. Confidence in the accuracy of reliability and validity statistics for the YBOCS
would be improved if investigators reported the results of integrity checks for the
YBOCS structured interview, including descriptions of the nature and incidence of
protocol violations. The psychometric properties of the other observer-rated scales
also could be improved by using structured interviews to derive the ratings.
Increasing specificity. A better understanding of treatment effects may be obtained by
using measures of specific OC symptoms or symptom parameters, rather than relying
on global measures of symptom severity. For example, rather than relying on the sum
of the 10-item YBOCS as a global measure, its obsession and compulsion subscales
could be used separately. This would enable investigators to determine whether a
given treatment (e.g., behavior therapy) is more effective in reducing some OC symptoms (e.g., compulsions) than others (e.g., obsessions).
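Scoring the subscales separately might look like the following sketch. It assumes the conventional YBOCS layout, which is not spelled out in this section: items 1 to 5 cover obsessions, items 6 to 10 cover compulsions, and each item is rated 0 to 4. The function name and the pre- and posttreatment ratings are hypothetical.

```python
from __future__ import annotations


def ybocs_subscale_scores(item_ratings: list[int]) -> dict[str, int]:
    """Split the 10 core item ratings into obsession and compulsion subscale sums."""
    if len(item_ratings) != 10:
        raise ValueError("expected ratings for the 10 core YBOCS items")
    if any(r < 0 or r > 4 for r in item_ratings):
        raise ValueError("each item is assumed to be rated 0-4")
    obsessions = sum(item_ratings[:5])   # assumed: items 1-5
    compulsions = sum(item_ratings[5:])  # assumed: items 6-10
    return {
        "obsessions": obsessions,
        "compulsions": compulsions,
        "total": obsessions + compulsions,
    }


# Hypothetical ratings for one patient, invented for illustration only.
pre = [3, 3, 3, 2, 3, 3, 3, 4, 3, 3]
post = [3, 2, 3, 2, 2, 1, 1, 2, 1, 1]

pre_scores = ybocs_subscale_scores(pre)
post_scores = ybocs_subscale_scores(post)
change = {key: pre_scores[key] - post_scores[key] for key in pre_scores}
print(change)
```

For this invented patient the total score alone would hide the fact that nearly all of the improvement came from the compulsion items, which is the kind of differential effect the subscale approach is meant to reveal.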
Cognitive assessment. Sanavio (1988) argued that existing self-report measures fail to adequately assess obsessions. Accordingly, he developed the Padua inventory, which contains four subscales: (a) checking, (b) contamination fears, (c) mental dyscontrol
(impaired control of mental activities), and (d) fear of behavioral dyscontrol (urges and
worries about losing control of one's behavior). The last two subscales pertain to intrusive thoughts, and were retained essentially unchanged in a recent revision of the Padua
inventory (van Oppen et al., 1995). Although these scales may be useful for outcome
290
S. Taylor
research, they are highly correlated with nonspecific distress (van Oppen, 1992), and
some items are measures of general worry rather than obsessions (Freeston et al., 1994).
Worry and obsessions share many features, although they can be distinguished conceptually (Turner, Beidel, & Stanley, 1992), and subjects can readily discriminate between
them when provided with written definitions (Wells & Morrison, 1994).
A more promising self-report assessment of obsessions is the Obsessive Intrusions
Inventory (Purdon & Clark, 1993), which is a 52-item measure of intrusive thoughts,
images, and impulses. Preliminary validation studies are encouraging, and suggest the
scale is a measure of obsessions rather than other types of thoughts (Purdon & Clark,
1993). This scale may be a valuable addition to treatment outcome batteries, especially if it proves sensitive to changes in symptom severity.
Comprehensive cognitive assessment also requires measures of the patient's beliefs
relating to his or her obsessions and compulsions. The YBOCS assesses some of these,
such as exaggerated beliefs of personal responsibility. Other beliefs also may be
important. People with OCD often fear they will act on their obsessional thoughts.
They often state that having an unwanted thought about performing a particular act
is as bad as performing the act itself (thought-action fusion; Rachman, 1993). Such
beliefs are important predictors of the persistence of obsessions (Purdon & Clark,
1994) and may be usefully included as part of a comprehensive assessment battery.
In cognitive-behavioral therapy, patients are often instructed in adaptive methods
of thought control (e.g., see Salkovskis, 1989). To determine whether these interventions are successful, it would be useful to have a method for assessing adaptive and
maladaptive strategies for controlling obsessions. Wells and Davies (1994) recently
developed the Thought Control Questionnaire, which may be useful for this purpose.
Such an assessment should prove useful in studies of the process of change in cognitive-behavior therapy, and may provide insights as to the nature of cognitive changes
in pharmacotherapy.
Special populations. OCD sometimes arises in childhood or early adolescence (APA,
1994) and so some researchers are turning their attention to the early detection and
treatment of childhood obsessions and compulsions. Child and adolescent versions of
several scales have been developed, including the LO1 (Berg, Rapoport, & Flament,
1985) and YBOCS (Goodman & Price, 1990). Little is known about their psychometric properties or sensitivity to treatment effects, although preliminary findings are
encouraging (Berg et al., 1985; Flament et al., 1985; Goodman & Price, 1990). As in
adult samples, self-report measures yield smaller treatment effect sizes than observer-rated scales (Flament et al., 1985; Leonard et al., 1989). Further research is needed
on the assessment of obsessions and compulsions in children, and in other populations such as the elderly and minority groups.
Criteria for clinically significant change. Most OCD outcome studies focus on the statistical
significance of findings; few discuss the clinical significance of the results. Yet, consideration of clinical significance adds an important dimension to treatment evaluation.
This is illustrated by a recent multicenter study of fluoxetine (Prozac) in the treatment
of OCD. Tollefson et al. (1994) assigned 335 OC patients to placebo or fluoxetine. The
latter was given at one of three doses (20, 40, or 60 mg/day). In order to be included
in the study, patients had to have a YBOCS score > 15, which was defined as indicating
OC symptoms of moderate or greater severity. According to this cutoff, which is often
used in pharmacotherapy studies of OCD (Rosenfeld et al., 1992), one may define
YBOCS ≤ 15 as indicative of clinically mild OC symptoms. After 13 weeks of treatment,
Foa, E. B., Steketee, G., Grayson, J. B., Turner, R. M., & Latimer, P. R. (1984). Deliberate exposure and blocking of obsessive-compulsive rituals: Immediate and long-term effects. Behavior Therapy, 15, 450-472.
Foa, E. B., Steketee, G., Kozak, M. J., & Dugger, D. (1987). Effects of imipramine on depression and obsessive-compulsive symptoms. Psychiatry Research, 21, 123-136.
Foa, E. B., Steketee, G., & Milby, J. B. (1980). Differential effects of exposure and response prevention in obsessive-compulsive washers. Journal of Consulting and Clinical Psychology, 48, 71-79.
Freeston, M. H., Ladouceur, R., Rheaume, J., Letarte, H., Gagnon, F., & Thibodeau, N. (1994). Self-report of obsessions and worry. Behaviour Research and Therapy, 32, 29-36.
Freund, B., Steketee, G. S., & Foa, E. B. (1987). Compulsive activity checklist (CAC): Psychometric analysis with obsessive-compulsive disorder. Behavioral Assessment, 9, 67-79.
Frost, R., & Sher, K. (1989). Checking behavior in a threatening situation. Behaviour Research and Therapy, 27, 385-389.
Gehris, T. L., Kathol, R. G., Black, D. W., & Noyes, R. (1990). Urinary free cortisol levels in obsessive-compulsive disorder. Psychiatry Research, 32, 151-158.
Glass, C. R., & Arnkoff, D. B. (1994). Validity issues in self-statement measures of social phobia and social anxiety. Behaviour Research and Therapy, 32, 255-267.
Goodman, L. A., & Kruskal, W. H. (1963). Measures of association for cross-classifications: III. Approximate sampling theory. Journal of the American Statistical Association, 58, 310-364.
Goodman, W. K., & Price, L. H. (1990). Rating scales for obsessive-compulsive disorder. In M. A. Jenike, L. Baer, & W. E. Minichiello (Eds.), Obsessive-compulsive disorders: Theory and management (pp. 154-166). Chicago: Year Book Medical Publishers.
Goodman, W. K., Price, L. H., Rasmussen, S. A., Mazure, C., Fleischmann, R. L., Hill, C. L., Heninger, G. R., & Charney, D. S. (1989). The Yale-Brown obsessive compulsive scale. I. Development, use, and reliability. Archives of General Psychiatry, 46, 1006-1011.
Goodman, W. K., Price, L. H., Rasmussen, S. A., Mazure, C., Delgado, P., Heninger, G. R., & Charney, D. S. (1989). The Yale-Brown obsessive compulsive scale. II. Validity. Archives of General Psychiatry, 46, 1012-1016.
Goodman, W. K., Rasmussen, S. A., Price, L. H., Mazure, C., Heninger, G. R., & Charney, D. S. (1989). Manual for the Yale-Brown obsessive compulsive scale (rev.). New Haven, CT: Connecticut Mental Health Center.
Hackman, A., & McLean, C. A. (1975). A comparison of flooding and thought stopping in the treatment of obsessional neurosis. Behaviour Research and Therapy, 13, 263-269.
Hageman, W. J., & Arrindell, W. A. (1993). A further refinement of the reliable change (RC) index by improving the pre-post difference score: Introducing RCID. Behaviour Research and Therapy, 31, 693-700.
Hedges, L. (1982). Estimation of effect size from a series of independent experiments. Psychological Bulletin, 92, 490-499.
Hewlett, W. A., Vinogradov, S., & Agras, W. S. (1992). Clomipramine, clonazepam, and clonidine treatment of obsessive-compulsive disorder. Journal of Clinical Psychopharmacology, 12, 420-430.
Hodgson, R. J., & Rachman, S. (1977). Obsessional-compulsive complaints. Behaviour Research and Therapy, 15, 389-395.
Insel, T. R., Murphy, D. L., Cohen, R. M., Alterman, I., Kilton, C., & Linnoila, M. (1983). Obsessive-compulsive disorder: A double blind trial of clomipramine and clorgyline. Archives of General Psychiatry, 40, 605-612.
Jacobson, N. S., & Truax, P. (1991). Clinical significance: A statistical approach to defining meaningful change in psychotherapy research. Journal of Consulting and Clinical Psychology, 59, 12-19.
Jenike, M. A. (1989). Obsessive-compulsive and related disorders: A hidden epidemic. New England Journal of Medicine, 321, 539-541.
Jenike, M. A., Hyman, S., Baer, L., Holland, A., Minichiello, W. E., Buttolph, L., Summergrad, P., Seymour, R., & Ricciardi, J. (1990). A controlled trial of fluvoxamine in obsessive-compulsive disorder: Implications for a serotonergic theory. American Journal of Psychiatry, 147, 1209-1215.
Kazarian, S. S., Evans, D. R., & Lefave, K. (1977). Modification and factorial analysis of the Leyton Obsessional Inventory. Journal of Clinical Psychology, 33, 422-425.
Kendall, M. G. (1963). Rank correlation methods (3rd ed.). London: Griffin.
Kendell, R. E., & DiScipio, W. J. (1970). Obsessional symptoms and obsessional personality traits in patients with depressive illness. Psychological Medicine, 1, 65-72.
Kern, J. M. (1983). Relationships between obtrusive laboratory and unobtrusive naturalistic behavioral fear assessments: Treated and untreated subjects. Behavioral Assessment, 6, 45-60.
Kim, S. W., Dysken, M. W., & Katz, R. (1989). Rating scales for obsessive compulsive disorder. Psychiatric Annals, 19, 74-79.
Kim, S. W., Dysken, M. W., & Kuskowski, M. (1990). The Yale-Brown obsessive-compulsive scale: A reliability and validity study. Psychiatry Research, 34, 99-106.
Kim, S. W., Dysken, M. W., & Kuskowski, M. (1992). The Symptom Checklist-90 obsessive-compulsive subscale: A reliability and validity study. Psychiatry Research, 41, 37-44.
Kim, S. W., Dysken, M. W., Kuskowski, M., & Hoover, K. M. (1993). The Yale-Brown obsessive compulsive scale and the NIMH global obsessive compulsive scale (GOCS): A reliability and validity study. International Journal of Methods in Psychiatric Research, 3, 37-44.
Kozak, M. J., Foa, E. B., & Steketee, G. (1988). Process and outcome of exposure treatment with obsessive-compulsives: Psychophysiological indicators of emotional processing. Behavior Therapy, 19, 157-169.
Kraaijkamp, H. J. M., Emmelkamp, P. M. G., & van den Hout, M. A. (1986). The Maudsley obsessional-compulsive inventory: Reliability and validity. Unpublished manuscript, Department of Clinical Psychology, University of Groningen, The Netherlands.
Lacks, P., & Morin, C. M. (1992). Recent advances in the assessment and treatment of insomnia. Journal of Consulting and Clinical Psychology, 60, 586-594.
Lambert, M. J., Hatch, D. R., Kingston, M. D., & Edwards, B. C. (1986). Zung, Beck, and Hamilton rating scales as measures of treatment outcome: A meta-analytic comparison. Journal of Consulting and Clinical Psychology, 54, 54-59.
Lang, P. J., & Lazovik, A. D. (1963). Experimental desensitization of a phobia. Journal of Abnormal and Social Psychology, 66, 519-525.
Leckman, J. F., Walker, B. E., Goodman, W. K., Pauls, D. L., & Cohen, D. J. (1994). "Just right" perceptions associated with compulsive behavior in Tourette's syndrome. American Journal of Psychiatry, 151, 675-680.
Leonard, H. L., Swedo, S. E., Rapoport, J. L., Koby, E. V., Lenane, M. C., Cheslow, D. L., & Hamburger, S. D. (1989). Treatment of obsessive-compulsive disorder with clomipramine and desipramine in children and adolescents. Archives of General Psychiatry, 46, 1088-1092.
Marks, I. M., Hallam, R. S., Connolly, J., & Philpott, R. (1977). Nursing in behavioural psychotherapy. London, UK: Royal College of Nursing.
Marks, I. M., Stern, R. S., Mawson, D., Cobb, J., & McDonald, R. (1980). Clomipramine and exposure for obsessive-compulsive rituals: I. British Journal of Psychiatry, 136, 1-25.
Mavissakalian, M. R., & Barlow, D. H. (1981). Assessment of obsessive-compulsive disorder. In D. H. Barlow (Ed.), Behavioral assessment of adult disorders (pp. 209-238). New York: Guilford.
Mavissakalian, M., Jones, B., Olson, S., & Perel, J. M. (1990). Clomipramine in obsessive-compulsive disorder: Clinical response and plasma levels. Journal of Clinical Psychopharmacology, 10, 261-268.
Millar, D. G. (1980). A repertory grid study of obsessionality: Distinctive cognitive structure or distinctive cognitive content? British Journal of Medical Psychology, 53, 59-66.
Millar, D. G. (1983). Hostile emotion and obsessional neurosis. Psychological Medicine, 13, 813-819.
Mills, H. L., Agras, W. S., Barlow, D. H., & Mills, J. R. (1973). Compulsive rituals treated by response prevention. Archives of General Psychiatry, 28, 524-530.
Murphy, D. L., Pickar, D., & Alterman, I. S. (1982). Methods for the quantitative assessment of depressive and manic behavior. In E. I. Burdock, A. Sudilovsky, & S. Gershon (Eds.), The behavior of psychiatric patients (pp. 355-392). New York: Dekker.
Nietzel, M. T., Bernstein, D. A., & Russell, R. L. (1988). Assessment of anxiety and fear. In A. S. Bellack & M. Hersen (Eds.), Behavioral assessment (3rd ed., pp. 280-312). New York: Pergamon.
Nunnally, J. C. (1978). Psychometric theory (2nd ed.). New York: McGraw-Hill.
Pato, M. T., Pigott, T. A., Hill, J. L., Grover, G. N., Bernstein, S., & Murphy, D. L. (1991). Controlled comparison of buspirone and clomipramine in obsessive-compulsive disorder. American Journal of Psychiatry, 148, 127-129.
Philips, H. C. (1988). The psychological management of chronic pain. New York: Springer.
Philpott, R. (1975). Recent advances in the behavioral assessment of obsessional illness: Difficulties common to these and other measures. Scottish Medical Journal, 20(Suppl. 1), 33-46.
Pigott, T. A., L'Heureux, F., Hill, J. L., Bihari, K., Bernstein, S. E., & Murphy, D. L. (1992). A double-blind study of adjuvant buspirone hydrochloride in clomipramine-treated patients with obsessive-compulsive disorder. Journal of Clinical Psychopharmacology, 12, 11-18.
Pigott, T. A., Pato, M. T., Bernstein, S. E., Grover, G. N., Hill, J. L., Tolliver, T. J., & Murphy, D. L. (1990). Controlled comparisons of clomipramine and fluoxetine in the treatment of obsessive-compulsive disorder. Archives of General Psychiatry, 47, 926-932.
Price, L. H., Goodman, W. K., Charney, D. S., Rasmussen, S. A., & Heninger, G. R. (1987). Treatment of severe obsessive-compulsive disorder with fluvoxamine. American Journal of Psychiatry, 144, 1059-1061.
Purdon, C., & Clark, D. A. (1993). Obsessive intrusive thoughts in nonclinical subjects. Part I. Content and relation with depressive, anxious, and obsessional symptoms. Behaviour Research and Therapy, 31, 713-720.
Purdon, C., & Clark, D. A. (1994). Obsessive intrusive thoughts in nonclinical subjects. Part II. Cognitive appraisal, emotional response and thought control strategies. Behaviour Research and Therapy, 32, 403-410.
Rachman, S. (1993). Obsessions, responsibility and guilt. Behaviour Research and Therapy, 31, 149-154.
Rachman, S., Cobb, J., Grey, S., McDonald, D., Mawson, D., Sartory, G., & Stern, R. (1979). The behavioral treatment of obsessive-compulsive disorders, with and without clomipramine. Behaviour Research and Therapy, 17, 467-478.
Rachman, S., & de Silva, P. (1978). Abnormal and normal obsessions. Behaviour Research and Therapy, 16, 233-248.
Tollefson, G. D., Rampey, A. H., Potvin, J. H., Jenike, M. A., Rush, A. J., Dominguez, R. A., Koran, L. M., Shear, M. K., Goodman, W., & Genduso, L. A. (1994). A multicenter investigation of fixed-dose fluoxetine in the treatment of obsessive-compulsive disorder. Archives of General Psychiatry, 51, 559-567.
Turner, S. M., Beidel, D. C., & Stanley, M. A. (1992). Are obsessional thoughts and worry different cognitive phenomena? Clinical Psychology Review, 12, 257-270.
Turner, S. M., Hersen, M., Bellack, A. S., Andrasik, F., & Capparell, H. V. (1980). Behavioral and pharmacological treatment of obsessive-compulsive disorders. Journal of Nervous and Mental Disease, 168, 651-657.
Turner, S. M., Hersen, M., Bellack, A. S., & Wells, K. C. (1979). Behavioral treatment of obsessive compulsive neurosis. Behaviour Research and Therapy, 17, 95-106.
Turner, S. M., Jacob, R. G., Beidel, D. C., & Himmelhoch, J. (1985). Fluoxetine treatment of obsessive-compulsive disorder. Journal of Clinical Psychopharmacology, 5, 207-212.
Vallejo, J., Olivares, J., Marcos, T., Bulbena, A., & Menchon, J. M. (1992). Clomipramine versus phenelzine in obsessive-compulsive disorder: A controlled clinical trial. British Journal of Psychiatry, 161, 665-670.
van Balkom, A., van Oppen, P., Vermeulen, A., van Dyck, R., Nauta, M., & Vorst, H. (1994). A meta-analysis on the treatment of obsessive compulsive disorder: A comparison of antidepressants, behavior, and cognitive therapy. Clinical Psychology Review, 14, 359-381.
van den Hout, M., Emmelkamp, P. M. G., Kraaykamp, H., & Griez, E. (1988). Behavioral treatment of obsessive-compulsives: Inpatient vs. outpatient. Behaviour Research and Therapy, 26, 331-332.
van Oppen, P. (1992). Obsessions and compulsions: Dimensional structure, reliability, convergent and divergent validity of the Padua Inventory. Behaviour Research and Therapy, 30, 631-637.
van Oppen, P., Emmelkamp, P. M. G., van Balkom, A., & van Dyck, R. (in press). The sensitivity to change of measures for obsessive-compulsive disorder. Journal of Anxiety Disorders.
van Oppen, P., Hoekstra, R. J., & Emmelkamp, P. M. G. (1995). The structure of obsessive-compulsive symptoms. Behaviour Research and Therapy, 33, 15-23.
Warren, R., Zgourides, G., & Monto, M. (1993). Self-report versions of the Yale-Brown obsessive-compulsive scale: An assessment of a sample of normals. Psychological Reports, 73, 574.
Weissman, M. M., Bland, R. C., Canino, G. J., Greenwald, S., Hwu, H., Lee, C. K., Newman, S. C., Oakley-Browne, M. A., Rubio-Stipec, M., Wickramaratne, P. J., Wittchen, H.-U., & Yeh, E.-K. (1994). The cross national epidemiology of obsessive compulsive disorder. Journal of Clinical Psychiatry, 55(Suppl. 3), 5-10.
Welkowitz, L. A., Bond, R. N., & Anderson, L. T. (1989). Social skills and initial response to behavior therapy for obsessive-compulsive disorder. Phobia Practice and Research Journal, 2, 67-85.
Wells, A., & Davies, M. I. (1994). The thought control questionnaire: A measure of individual differences in the control of unwanted thoughts. Behaviour Research and Therapy, 32, 871-878.
Wells, A., & Morrison, A. P. (1994). Qualitative dimensions of normal worry and normal obsessions: A comparative study. Behaviour Research and Therapy, 32, 867-870.
Wing, J. K., Cooper, J. E., & Sartorius, N. (1974). The measurement and classification of psychiatric symptoms. London: Cambridge University Press.
Wolf, F. M. (1986). Meta-analysis: Quantitative methods for research synthesis. Newbury Park, CA: Sage.
Woody, S. R., Steketee, G., & Chambless, D. L. (in press-a). Reliability and validity of the Yale-Brown Obsessive Compulsive Scale. Behaviour Research and Therapy.
Woody, S. R., Steketee, G., & Chambless, D. L. (in press-b). The usefulness of the obsessive compulsive scale of the Symptom Checklist-90-Revised. Behaviour Research and Therapy.
Zohar, J., & Insel, T. R. (1987). Obsessive-compulsive disorder: Psychobiological approaches to diagnosis, treatment, and pathophysiology. Biological Psychiatry, 22, 667-687.