Clinical Psychology Review, Vol. 15, No. 4, pp. 261-296, 1995
Copyright © 1995 Elsevier Science Ltd
Printed in the USA. All rights reserved
0272-7356/95 $9.50 + .00
Pergamon
ASSESSMENT OF OBSESSIONS AND COMPULSIONS: RELIABILITY, VALIDITY, AND SENSITIVITY TO TREATMENT EFFECTS
Steven Taylor
Department of Psychiatry, University of British Columbia
ABSTRACT. Advances in the treatment of obsessive-compulsive disorder (OCD) require reliable
and valid measures of sufficient sensitivity to detect treatment effects. The present article critically
reviews the instruments used in OCD treatment-outcome research. Behavioral methods, self-report
inventories, and observer-rated scales are reviewed with respect to content, reliability, validity, and
sensitivity to treatment effects. The latter was determined by meta-analyses of trials of behavior
therapy (exposure plus response prevention) and clomipramine. Little is known about the psychometric
properties of behavioral assessment methods, and they are used increasingly less often in
outcome research despite certain advantages. Self-report inventories tend to have acceptable
reliability and validity, except for the SCL-90-R OC scale (and its predecessors), which has weak
discriminant validity and appears to be essentially a measure of nonspecific distress. Little is known
about the reliability and validity of most observer-rated scales, despite the fact that they are
popular in treatment outcome research. All measures appear sensitive to treatment effects, although
observer-rated scales tend to yield larger effect sizes than self-report measures. For treatment outcome
research, the Yale-Brown Obsessive Compulsive Scale (YBOCS) appears to be the best available
instrument in terms of range of obsessive-compulsive features assessed, reliability, validity, and
sensitivity to treatment effects. Computer-administered and self-report versions of the YBOCS have
been developed, which appear promising but require further evaluation. The effects of treatment may
be best understood by using measures of specific symptoms rather than relying on global measures
of symptom severity. The YBOCS can be readily used for this purpose. The article concludes by
considering additional requirements for a comprehensive assessment of obsessions and compulsions.
ONCE CONSIDERED RARE, obsessive-compulsive disorder (OCD) is now recognized as one of the most common psychiatric disorders. It has been described as a
"hidden epidemic" (Jenike, 1989, p. 539) with a lifetime prevalence of 2.3% in North
America and similar rates of occurrence in other countries (Weissman et al., 1994).
OCD is characterized by recurrent obsessions and/or compulsions of sufficient severity to be time consuming, cause marked distress, and interfere with daily functioning
(American Psychiatric Association [APA], 1994). Obsessions are intrusive thoughts,
impulses, or images, such as repetitive thoughts of violence or contamination.
Compulsions are repetitive, intentional behaviors that the person feels compelled to
perform, often with a desire to resist. Compulsions are performed either in response
to an obsession or according to certain rules, and are often intended to neutralize or
prevent some feared event. However, either the compulsive activity is not connected
in a realistic way to what it is designed to neutralize or prevent, or it is clearly excessive (APA, 1994). Common compulsions include excessive washing and checking.
Correspondence should be addressed to Steven Taylor, Department of Psychiatry, 2255
Wesbrook Mall, University of British Columbia, Vancouver, B.C., Canada, V6T 2A1.
Cognitive-behavioral therapies and certain pharmacotherapies are effective treatments of OCD (Abel, 1993; Cox, Swinson, Morrison, & Lee, 1993; van Balkom et al.,
1994). However, there is much room for improvement in treatment efficacy.
Investigators are continually searching for new treatments, and for optimal combinations of existing therapies. Advances in these endeavours require reliable, valid,
and sensitive assessment instruments. In this article I critically review behavioral
methods, self-report inventories, and observer-rated scales used in OCD treatment-outcome studies.
Should a comprehensive assessment of obsessions and compulsions include measures of so-called OC personality traits? It was once argued that OCD arises from
traits of excessive parsimony, obstinacy, and orderliness (Salzman, 1968), which characterize obsessive-compulsive personality disorder (OCPD; APA, 1994). However,
later research has shown that OCPD rarely precedes the occurrence of OCD, and
occurs no more commonly in OCD than in other anxiety disorders. In OCD, as in
other anxiety disorders, avoidant, dependent, and histrionic personality disorders
are far more common than OCPD (for reviews see Baer & Jenike, 1992; Stein,
Hollander, & Skodol, 1993; Taylor & Livesley, 1995). Accordingly, the present review
will focus on measures of obsessive-compulsive (OC) symptoms, not on measures of
OC personality traits.
The review commences with a statement of criteria for evaluating the psychometric properties of each instrument. Then I examine the range of phenomena assessed
by the instruments, along with their reliabilities and validities. Following this, I review
their abilities to detect treatment-related changes in OC symptoms.
CRITERIA FOR EVALUATION
Nunnally's (1978) criteria will be used to evaluate internal consistency; coefficient
α ≥ .70 will be defined as acceptable and α ≥ .80 will be defined as good. Criterion-related (known-groups) validity will be examined by determining whether scores differ
across diagnostic groups. For example, OC checkers should score higher than OC
washers on measures of compulsive checking. In evaluating criterion-related validity,
I will consider the reliability and validity of the procedures used to establish diagnostic status.
Correlation coefficients are typically used to determine test-retest reliability and
convergent and discriminant validity. Comparison of correlations across studies is
complicated by the fact that statistical significance varies with sample size. To illustrate, two studies may find measures x and y are correlated .50. The correlation would
be nonsignificant (p > .05) if Study 1 used a sample size of 11, and significant (p < .01)
if Study 2 used a sample size of 22. Reliance on statistical significance would lead to
the erroneous conclusion that the studies obtained inconsistent results. To circumvent this difficulty, I will use Cohen's (1988) scheme to evaluate the substantive significance of correlations: "large" correlations are defined as those ≥ .50, "medium"
correlations are from .30 to .49, and "small" correlations are between .10 and .29.
Using Nunnally's (1978) criteria, acceptable test-retest reliability is indicated by r ≥ .70,
and good test-retest reliability is indicated by r ≥ .80. Acceptable convergent validity
will be defined by medium-to-large correlations.
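The sample-size illustration and the effect-size scheme described above can be sketched in a few lines of Python. This is a minimal sketch rather than the article's own computation: it uses the Fisher z approximation and assumes a one-tailed test (the article does not state which test was used), it treats r = .50 as falling in the large band, and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def one_tailed_p(r: float, n: int) -> float:
    """Approximate one-tailed p-value for a Pearson r via the Fisher z transform.

    Assumption: normal approximation with standard error 1 / sqrt(n - 3).
    """
    z = math.atanh(r) * math.sqrt(n - 3)
    return 1.0 - NormalDist().cdf(z)


def cohen_label(r: float) -> str:
    """Label the magnitude of a correlation using the bands quoted in the text."""
    size = abs(r)
    if size >= 0.50:  # assumption: .50 itself counts as large
        return "large"
    if size >= 0.30:
        return "medium"
    if size >= 0.10:
        return "small"
    return "negligible"  # below the smallest band; this label is an assumption


if __name__ == "__main__":
    # Same correlation, two hypothetical sample sizes.
    for n in (11, 22):
        p = one_tailed_p(0.50, n)
        print(f"n = {n}: one-tailed p = {p:.3f}, size = {cohen_label(0.50)}")
```

Under these assumptions both studies receive the same magnitude label, even though only the larger sample crosses the significance threshold, which is the inconsistency the effect-size scheme is meant to avoid.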
With regard to discriminative validity, OC measures are not expected to be uncorrelated with measures of other psychopathology because OC symptoms naturally
co-occur with many forms of psychopathology, including general anxiety, depression,
irritability, and somatic preoccupation (APA, 1994; Millar, 1983; Rachman & Hodgson,
1980). However, this does not mean that discriminant validity is irrelevant in evaluating measures of obsessions and compulsions. An OC measure with good discriminant
validity should be correlated more highly with other OC measures than with measures
of general anxiety, depression, etc. That is, the OC measure should correlate more
highly with measures of the same construct than with measures of different constructs
(Campbell & Fiske, 1959). Acceptable discriminant validity will be said to occur when
correlations with non-OC measures are smaller than correlations with OC measures.
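The decision rule for discriminant validity can be written as a short check. The sketch below is illustrative only: comparing mean absolute correlations is an assumption about how "smaller" is operationalized, the function name is hypothetical, and the example values are invented rather than taken from any study.

```python
from statistics import mean


def has_acceptable_discriminant_validity(convergent_rs, discriminant_rs):
    """Return True when correlations with non-OC measures are, on average,
    smaller than correlations with other OC measures.

    Assumption: the comparison is made on mean absolute correlations.
    """
    convergent = mean(abs(r) for r in convergent_rs)
    discriminant = mean(abs(r) for r in discriminant_rs)
    return discriminant < convergent


if __name__ == "__main__":
    # Hypothetical correlations for an imaginary OC measure.
    with_oc_measures = [0.62, 0.55, 0.48]
    with_non_oc_measures = [0.30, 0.25, 0.41]
    print(has_acceptable_discriminant_validity(with_oc_measures, with_non_oc_measures))
```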
ASSESSMENT METHODS: CONTENT, RELIABILITY, AND VALIDITY
Behavioral Assessment
Behavioral theorists and therapists have long emphasized the importance of in vivo
assessment of problem behaviors (Cone, 1988). Accordingly, such methods have been
used in several studies of the efficacy of behavior therapy for OCD. In this section I
review behavioral assessment methods most commonly used in outcome studies:
behavioral avoidance tests, diary methods, and direct observation.
Behavioral Avoidance Tests (BATs)
Behavioral avoidance tests were developed originally as in vivo measures of fear and
avoidance exhibited by phobic individuals (Lang & Lazovik, 1963). In a BAT for snake
phobics, for example, the subject is asked to approach as close as possible to a snake,
which is presented under standardized conditions. The distance from the animal at
the point of closest approach is used as a measure of avoidance. The subject also is
asked to rate his or her peak level of distress (subjective units of distress; SUDS) on a
0-100 scale, where higher scores correspond to greater distress.
Several types of BATs have been used to assess OC-related fear and avoidance. Foa
and coworkers (Foa, Steketee, Grayson, Turner, & Latimer, 1984; Foa, Steketee, &
Milby, 1980) used a single-task BAT. Here, the therapist presents the patient with a
feared OC-related stimulus (e.g., a compulsive washer would be presented with a contaminated object such as a trash can). The patient is asked to approach as close as possible to the object and report his or her SUDS level at the point of closest approach.
Avoidance behavior is assessed by distance from the object, or some other proximity
measure such as whether or not the patient is able to touch the object without wearing
gloves. The task is performed before and after treatment to assess changes in OC-related fears. Note that for some BATs (e.g., those for compulsive checkers) avoidance is
indicated by the presence of compulsive rituals when the person is exposed to a personally threatening stimulus.
A single-task BAT may fail to capture the range of an individual's OC-related fear
and avoidance; some OCs may fear and avoid a range of different stimuli, while others
have more circumscribed fear and avoidance. Accordingly, Rachman and colleagues
(e.g., Rachman, Hodgson, & Marks, 1971; Rachman, Marks, & Hodgson, 1973;
Rachman et al., 1979) devised a multitask BAT, in which OC patients each complete
a number of different fear-related tasks. To illustrate, Rachman et al. (1979) asked
each patient to carry out five tasks that usually gave rise to compulsive rituals. For each
task an independent assessor scored the patient's performance (1 = task completed,
0 = task avoided), and scores were summed across tasks to yield a 0-5 avoidance score.
The assessor also rated the patient's discomfort during each task, using a 0-8 scale
(0 = no discomfort, 8 = extreme discomfort). Discomfort scores were summed to yield
a 0-40 discomfort scale.
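The scoring of this five-task BAT can be sketched as follows. The class and function names are hypothetical, and the example ratings are invented for illustration; only the scoring rules (1/0 per task, 0-8 discomfort per task, both summed over five tasks) come from the description above.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    completed: bool  # True = task completed (scored 1), False = task avoided (scored 0)
    discomfort: int  # assessor rating: 0 (no discomfort) to 8 (extreme discomfort)


def score_multitask_bat(tasks):
    """Sum task scores (range 0-5) and discomfort ratings (range 0-40) over five tasks."""
    if len(tasks) != 5:
        raise ValueError("expected exactly five tasks")
    for task in tasks:
        if not 0 <= task.discomfort <= 8:
            raise ValueError("discomfort ratings must be between 0 and 8")
    task_score = sum(1 if task.completed else 0 for task in tasks)
    discomfort_score = sum(task.discomfort for task in tasks)
    return task_score, discomfort_score


if __name__ == "__main__":
    # Hypothetical patient: three tasks completed, two avoided.
    results = [
        TaskResult(True, 2),
        TaskResult(True, 4),
        TaskResult(False, 8),
        TaskResult(True, 3),
        TaskResult(False, 7),
    ]
    print(score_multitask_bat(results))
```

Because each task contributes a single 0/1 score, two patients with the same summed score may have avoided entirely different tasks, so the individual task results are worth keeping alongside the totals.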
In the most recent effort to capture the complexity of OC-related fear and avoidance, Woody, Steketee, and Chambless (in press-a) developed a multitask BAT in
which tasks consisted of several steps. For each patient the authors identified three
tasks that were difficult or impossible for the patients to complete without significant
anxiety or rituals (e.g., switching off electrical appliances without checking). Each
task was broken down into 3-7 steps and the degree of avoidance and ritualizing was
assessed on a 3-point scale (0 = no avoidance/rituals, 1 = partial avoidance/rituals,
2 = unable to do task). SUDS levels also were recorded.
Reliability and validity. Studies of phobia and agoraphobia have found BAT measures
of fear and avoidance to have acceptable test-retest reliability and good convergent
validity with other measures of fear and avoidance (for a review see Nietzel, Bernstein,
& Russell, 1988). Kern (1983) reported good criterion-related validity of BAT measures of fear and avoidance for animal phobics. These results can be taken as evidence
in support of the reliability and validity of the BATs used for OCs. However, this conclusion rests on the untested assumption that findings from phobics and agoraphobics generalize to OCs.
Unfortunately, there have been only two published studies of the psychometric
properties of BATs for OCD. Woody et al. (in press-a), using a multistep-multitask
BAT, obtained medium-sized correlations between BAT measures (fear and avoidance) and the Yale-Brown obsessive compulsive inventory (rs = .38 to .43). Woody,
Steketee, and Chambless (in press-b) obtained small correlations between BAT fear
and avoidance measures and the SCL-90-R OC scale (rs = -.02 to .26). Further studies, using a broader range of OC measures, are required before conclusions can be
drawn about the convergent validity of the various versions of the BAT.
Comment. BATs have the advantage of providing in vivo measures of fear and avoidance in OCD. The BAT appears well suited for assessing fear and avoidance of contaminated stimuli associated with washing compulsions. It is more difficult to design
BATs for patients with other types of compulsions, such as checking or ordering rituals (Steketee, 1993). Indeed, Rachman et al. (1971) were unable to devise BATs for 2
of their 10 OC patients. A further problem is that fear and avoidance can be situation
specific; a compulsive washer might fear and avoid touching objects in his or her
home (because of fear of contaminating the house), yet this person may fearlessly
handle objects in other situations (Rachman, 1993; Rachman & Hodgson, 1980).
When a BAT is conducted in the clinic it may fail to capture the severity of fear and
avoidance that occurs in the patient's habitual environment. Some studies have
assigned BATs as homework assignments (e.g., Cottraux et al., 1990), but this introduces the problem of determining whether the BAT was appropriately conducted.
There is no standardized protocol for administering the BAT. Indeed, several versions are available. Performance on the BAT varies with the perceived demands to
perform the task (Nietzel et al., 1988). If the assessor strongly encourages the patient
to approach the feared object, then this may not provide an accurate measure of naturally
occurring avoidance. Low-demand BATs are likely to be better measures of such
avoidance (Kern, 1983). Unfortunately, the published studies provide little information about the instructions given to OC patients, and so it is difficult to estimate the
degree of demand placed on patients. Given these difficulties, along with the dearth
of reliability and validity data, it is not surprising that many OCD investigators no
longer use BATs (Emmelkamp, 1982; Foa, Steketee, & Milby, 1980).
Direct Observation
Direct observation of the frequency or duration of compulsive rituals has been used
in several case studies. Mills, Agras, Barlow, and Mills (1973) assessed washing compulsions of OC inpatients by installing a device that recorded the number of times the
patient approached and used the sink. For another inpatient, Mills et al. mounted a
video camera in a patient's room to assess the duration of rituals associated with going
to bed. Both methods were validated against ratings made by ward staff, and were sensitive to treatment effects. Turner and colleagues (Turner, Hersen, Bellack, Andrasik,
& Capparell, 1980; Turner, Hersen, Bellack, & Wells, 1979) also used observational
methods to assess rituals. Ward staff were trained to use time sampling procedures to
assess rituals, and achieved good inter-rater reliability (rs = .87 to .99). These measures
were sensitive to the effects of treatment.
Direct observation methods have not been used in controlled outcome trials, and
would be difficult to use to monitor compulsions in outpatients. Their test-retest reliability, convergent validity, and discriminant validity remain to be determined.
Diary Methods
Diary methods are popular ways of assessing the frequency, duration, severity, and
context of problematic behaviors. For example, panic attack diaries are popular measures in treatment-outcome studies of panic disorder (e.g., Clark et al., 1994) and
have been used in studies of a variety of other disorders, including social phobia
(Glass & Arnkoff, 1994), chronic pain (Philips, 1988), and insomnia (Lacks & Morin,
1992). Diary methods appear to be useful methods for assessing the frequency, duration, and situational determinants of obsessions and compulsions. Several studies
have used these methods in OCD outcome studies (e.g., Boersma, Den Hengst,
Dekker, & Emmelkamp, 1976; Foa, Steketee, & Milby, 1980; Hackman & McLean,
1975), but there appear to be no published data on their psychometric properties.
In their review of the assessment of obsessions and compulsions, Mavissakalian and
Barlow (1981) noted that a major aim of treatment is to reduce the frequency and
duration of obsessive-compulsive behaviors. Yet, they were surprised to find that frequency counts of target behaviors are rarely used in OCD treatment studies. The situation has not changed over the past decade; diary methods are rarely used despite
their value and ease of applicability. The development of reliable and valid diary
methods may help us understand how OCD treatments influence the patient in his or
her habitual environment.
OC SCALES FROM THE SYMPTOM CHECKLIST-90-REVISED
(SCL-90-R OC) AND ITS PREDECESSORS
The Hopkins Symptom Checklist (HSCL; Derogatis, Lipman, Rickels, Uhlenhuth, &
Covi, 1974) is a 58-item self-report inventory containing five scales: Somatization, OC
symptoms, interpersonal sensitivity, general anxiety, and depression. The OC scale
consists of eight items, which each assess a different symptom. The subject is asked to
use a 4-point scale to rate the extent to which he or she was distressed by each symptom
over the past week. For each item, a rating of 0 indicates either that the symptom
was absent or that it was present but did not evoke distress. A rating of 3 indicates the
symptom was present and evoked extreme distress. Thus, the HSCL scales measure the
number and severity of symptoms. The HSCL was subsequently expanded to form the
Symptom Checklist-90 (SCL-90; Derogatis, Lipman, & Covi, 1973), which includes the
original five scales and four new scales: Hostility, phobic anxiety, paranoid ideation,
and psychoticism. Two items were added to the OC scale, and a 5-point (0-4) rating
was used throughout, using the same instructions and anchor points as the 4-point rating. A minor revision was published as the SCL-90-R (Derogatis, 1977), where the OC
scale was unchanged apart from minor changes in wording. In summary, the OC scales
from the HSCL, SCL-90, and SCL-90-R are very similar to one another.
Reliability
Internal consistency. Good internal consistencies have been reported for all versions of
the OC scale, with coefficient α of .87 for the HSCL version (Derogatis et al., 1974), .86
for the SCL-90 version (Derogatis, Rickels, & Rock, 1976), and .88 to .91 for the SCL-90-R
version (Shutty, DeGood, & Schwartz, 1986; Woody, Steketee, & Chambless, in press-b).
Test-retest reliability. Derogatis et al. (1974) reported a 7-day test-retest Pearson r of
.84 for the HSCL OC scale. Steketee and Doppelt (1986), also using the HSCL OC
scale, obtained an intraclass correlation of .56 for a mean test-retest period of 24 days
(range 4-60 days). Using the SCL-90 OC scale, Kim, Dysken, and Katz (1989) reported a 7-day intraclass correlation of .74, and Kim, Dysken, and Kuskowski (1992)
obtained a 14-day intraclass correlation of .79. In summary, each version of the OC
scale has been found to have acceptable test-retest reliability for intervals up to at
least 7 days.
Validity
Criterion-related validity. Steketee and Doppelt (1986) found OCs (n = 62) scored
higher than a mixed group of nonpsychotic patients (n = 49) on the HSCL OC scale.
The authors reported no information on how the patients were diagnosed. Woody et
al. (in press-b) diagnosed patients with the Structured Clinical Interview for DSM-III-R (Spitzer, Williams, Gibbon, & First, 1990). Despite having good statistical power,
Woody et al. found that OCs (n = 54) did not significantly differ from panic disordered patients (n = 31) on the SCL-90-R OC scale. In summary, there is mixed evidence for the criterion-related validity of the SCL-90-R OC scale and its predecessors.
Convergent validity. The SCL-90-R OC scale (and its predecessors) each tend to have
medium-sized correlations (mean r = .37, range = -.02 to .72) with other OC measures
(i.e., anxiety and avoidance ratings from behavioral avoidance tests, subscales of the
Leyton Obsessional Inventory, the Maudsley Obsessional Compulsive Inventory, the
Compulsive Activity Checklist, Likert scale ratings of global symptom severity, Global
Obsessive Compulsive Scale, Padua Inventory, and Yale-Brown Obsessive Compulsive
Scale; Kim et al., 1992; Stanley et al., 1993; Steketee & Doppelt, 1986; Sternberger
& Burns, 1990a, 1990b; van Oppen, Hoekstra, & Emmelkamp, 1995; Woody et al.,
in press-b). Thus, the OC scales of the SCL-90-R and predecessors have acceptable
convergent validity.
Discriminant validity. The HSCL OC scale has been found to have medium-to-large
correlations with measures of various types of psychopathology, with rs ranging from
.36 (MMPI Scale 7, Psychasthenia) to .52 (IPAT anxiety scale; Steketee & Doppelt,
1986). Three studies found the SCL-90 OC scale had medium-to-large correlations
with measures of depression (rs = .41 to .81), anxiety (rs = .54 to .64), hostility/anger
(rs = .43 to .65), and psychotic symptoms (r = .57; Clark & Friedman, 1988; Derogatis
et al., 1976; Dinning & Evans, 1977). The SCL-90-R OC scale has been found to have
large correlations with the SCL-90-R anxiety scale (r = .56) and SCL-90-R depression
scale (r = .79). In summary, the OC scales of the HSCL, SCL-90, and SCL-90-R have
medium-to-large correlations with measures of various types of psychopathology,
including depression, anxiety, hostility/anger, and psychotic symptoms. These correlations tend to be larger than the convergent validity correlations, which indicates
poor discriminative validity.
The SCL-90-R OC scale and predecessors have acceptable internal consistency and
adequate test-retest reliability for periods up to at least 7 days. There have been few
studies of criterion-related validity, and available findings offer mixed support.
Convergent validity appears adequate, but discriminant validity is poor. The OC scales
have medium-to-large correlations with a variety of psychopathologic measures, which
suggests the OC scales are largely measures of general (nonspecific) distress. This possibility is supported by a review of the item content of the scales. The scales overemphasize nonspecific distress and under-emphasize OC symptoms; half the items of the
HSCL OC scale and 40% of items of the later versions refer to nonspecific symptoms
found in several anxiety and mood disorders; that is, "your mind going blank," "trouble remembering things," "difficulty making decisions," and "trouble concentrating."
A further problem is the scales confound the frequency of symptoms with the amount
of distress evoked by them. This makes scores ambiguous; a high score indicates the
symptom was present and evoked distress, but a low score may indicate low frequency, low distress, or both. Given these problems, the SCL-90-R OC scale (and predecessors) are not recommended as treatment outcome measures.
LEYTON OBSESSIONAL INVENTORY (LOI)
The LOI was developed to assess obsessionality in "house-proud" (perfectionistic)
housewives (Cooper & McNeil, 1968), and subsequently used to assess clinical OC
phenomena (Cooper, 1970). The inventory consists of 69 items that each describe
an OC symptom (46 items) or so-called OC character trait (23 items). The subject
completes the LOI by responding "yes" or "no" according to whether or not the
items are self-descriptive. A subset of 39 items then are rerated to assess: (a) the
amount of interference caused by the symptom or behavior described in the item (4-point scale), and (b) the degree to which the subject resists performing the activity
described in the item (5-point scale). Resistance and interference ratings are made
only for items endorsed with a "yes" response. Thus, the LOI contains four subscales: prevalence of OC symptoms, prevalence of OC traits, degree of interference, and degree of resistance. The present article is concerned with OC symptoms
rather than putative OC character traits, and so I will focus on the symptom, interference, and resistance subscales.
The LOI originally had a postbox response format, where each item was printed
on a separate card and the subject rated the self-descriptiveness of items by placing
cards in boxes marked "yes" or "no." An assessor then instructed the subject to make
resistance and interference ratings for select items. Separate versions of the LOI
were devised for men and women, distinguished by minor differences in wording
(Cooper, 1970). The postbox format proved cumbersome and time consuming,
requiring 30-45 min per subject (Cooper & McNeil, 1968). The LOI was later converted to a self-report questionnaire, and the wording was revised to make a common
version for both genders (Kazarian, Evans, & Lefave, 1977; Snowdon, 1980).
Snowdon (1980) reported the postbox and questionnaire versions were highly correlated (r = .72). For most purposes the postbox and questionnaire versions are probably
interchangeable, although treatment outcome studies favor the latter because
of its ease of administration.
Reliability
Internal consistency. The symptom, interference, and resistance subscales have acceptable-to-good internal consistencies, with coefficient αs
ranging from .75 to .90 (Richter, Cox, & Direnfeld, 1994; Stanley et al., 1993).
Test-retest reliability. Kim et al. (1989) assessed a sample of OCs and obtained good
7-day test-retest reliabilities, with intraclass correlations ranging from .80 (interference subscale) to .83 (resistance subscale). Kim, Dysken, and Kuskowski (1990)
administered the LOI three times over 14 days to another sample of OCs. The intraclass correlation for the total scale (sum of symptom and trait items) was .73.
Intraclass correlations were .79 and .84 for the interference and resistance subscales,
respectively. These results suggest the subscales have acceptable test-retest reliabilities, at least over a 14-day interval.
Validity
Criterion-related validity. Cooper (1970) and Millar (1980) found that OCs scored
higher than normal controls on each of the symptom, interference, and resistance subscales.
Kendell and DiScipio (1970) found that OCs scored higher than depressed
patients on these subscales. Millar (1983) found that OCs scored higher than
depressed patients on the interference and resistance subscales, but not on the symptom
subscale. Stanley et al. (1993) used a structured interview, the Anxiety
Disorders Interview Schedule-Revised (ADIS-R; DiNardo & Barlow, 1988), to establish
the diagnostic status of their patients. The ADIS-R has good reliability and validity for the diagnosis of DSM-III-R anxiety disorders (DiNardo, Moras, Barlow, Rapee,
& Brown, 1993). Stanley et al. found OCs differed from patients with other anxiety
disorders on the LOI symptom, interference, and resistance subscales. In summary,
most studies were limited by the fact that the validity of the criterion (diagnostic status) is unknown, because diagnoses were based on chart reviews or unstructured
interviews. Nevertheless, most studies support the criterion-related validity of the
symptom, interference, and resistance subscales.
Convergent validity. The LOI symptom, interference, and resistance subscales tend to
have large correlations (mean r = .62, range = .38 to .77) with other OC measures (i.e.,
SCL-90-R OC scale, Maudsley Obsessional Compulsive Inventory, Padua Inventory,
and Yale-Brown Obsessive Compulsive Scale; Hodgson & Rachman, 1977; Kim et al.,
1990; Kraaijkamp, Emmelkamp, & van den Hout, 1986; Richter et al., 1994; Sanavio,
1988; Stanley et al., 1993). This indicates good convergent validity.
Discriminant validity. Kendell and DiScipio (1970) found the LOI symptom subscale
had large correlations (r = .53) with the neuroticism scale of the Eysenck Personality
Inventory (EPI). Stanley et al. (1993) found the LOI subscales had moderate correlations (rs = .36 to .37) with the EPI neuroticism scale. The subscales had small correlations (rs < .27) with SCL-90-R scales assessing somatization, anxiety, phobia,
depression, and interpersonal sensitivity. The LOI subscales had small-to-medium correlations with the SCL-90-R hostility, paranoid ideation, and psychoticism scales
(rs = .29 to .42). There was little to distinguish the pattern of correlations of the LOI
subscales. Richter et al. (1994) found the subscales had medium-to-large correlations
(rs = .43 to .50) with the Hamilton depression scale. In all, the correlations between
the LOI subscales and non-OC measures tend to be smaller than the convergent validity correlations. These findings support the discriminant validity of the LOI subscales.
Comment
The LOI subscales have acceptable psychometric properties, yet they also have several important drawbacks. The three subscales are highly intercorrelated with one
another (mean r= .81, range = .70 to .91: Rachman et al., 1973; Richter et al., 1994;
Stanley et al., 1993), which suggests it is redundant to use them all as outcome measures. The symptom subscale was developed to assess "house-proud" housewives and,
therefore, contains many items concerned with cleanliness and tidiness. It has few
items assessing other symptom domains. For example, only three items pertain to
checking, which is a serious limitation because checking is one of the most common
compulsions (APA, 1994). This means that OCs with checking rituals may obtain spuriously low scores on the LOI symptom subscale.
A further problem is the resistance subscale may yield misleading results because it
confounds the intensity of resistance with the number of obsessions and compulsions
reported by the person. That is, the resistance scale is constructed such that an item
is rated for resistance only if the subject indicates that he or she experiences the symptom
described in the item. Thus, high scores on the resistance scale can be obtained
only from subjects endorsing a lot of symptoms. Although there can be no resistance
unless the person has at least one OC symptom, this does not mean that resistance is
naturally correlated with the number of obsessions and compulsions reported by the
person. Indeed, the LOI resistance scale can produce a misleading picture of the
patient's degree of resistance. To illustrate, two patients might equally struggle to
resist their symptoms. If patient A has more symptoms than patient B, then patient A
will obtain a higher score on the LOI resistance subscale, giving the misleading
impression that patient A is exerting stronger resistance.
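The confound in this two-patient illustration can be made concrete with a small sketch. The ratings, symptom counts, and function names below are hypothetical, and the per-item mean is shown only as a contrast to the summed score; it is not an index proposed in the article.

```python
from statistics import mean


def summed_resistance(ratings):
    """Summed score: resistance ratings added over endorsed items only."""
    return sum(ratings)


def mean_resistance(ratings):
    """Contrast index: average resistance per endorsed item (0.0 if none endorsed)."""
    return mean(ratings) if ratings else 0.0


# Hypothetical patients who resist every endorsed symptom equally (rating of 3).
patient_a = [3] * 12  # endorses 12 symptoms
patient_b = [3] * 4   # endorses 4 symptoms

if __name__ == "__main__":
    print("Summed:", summed_resistance(patient_a), summed_resistance(patient_b))
    print("Per item:", mean_resistance(patient_a), mean_resistance(patient_b))
```

In this sketch the summed scores differ only because patient A endorses more symptoms, while the per-item averages are identical, which is the sense in which the summed score confounds intensity of resistance with symptom count.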
The LOI resistance subscale also entails the questionable assumption that greater
resistance is associated with more psychopathology. Indeed, it can be argued that
greater resistance is associated with less psychopathology (Goodman, Price,
Rasmussen, Mazure, Fleishmann, et al., 1989). Resistance to compulsions, for example, is a means of attaining mastery over symptoms and an important component of
behavior therapy for OCD (Rachman & Hodgson, 1980; Steketee, 1993). Resisting
obsessions also can lead to symptom reduction, to the extent that resistance involves
refusing to act on one's obsessional fears; for example, refusing to avoid fear-evoking
stimuli can lead to habituation of obsessional fears. Resistance by means of deliberately suppressing obsessions can (under certain conditions) lead to a paradoxical increase in obsession frequency (Salkovskis & Campbell, 1994). Despite this exception,
it is generally found that if measures of resistance are not confounded with symptom
prevalence, then the degree of resistance is negatively correlated with the severity of
obsessions and compulsions (Goodman, Price, Rasmussen, Mazure, Fleishmann et al.,
1989; Woody et al., in press-a). Given the questionable assumptions underlying the
construction of the LOI resistance subscale, it appears to be of dubious value in assessing resistance to obsessions and compulsions.
MAUDSLEY OBSESSIONAL COMPULSIVE INVENTORY (MOCI)
Hodgson and Rachman (1977) generated 65 true-false items to assess overt rituals and
related obsessions. The items were administered to 50 OCs and 50 non-OC neurotics.
The groups were discriminated by 30 items, which were retained to form the MOCI.
The authors then administered the scale to 100 OCs and factor analyzed the responses. Five factors were obtained. Four were used to form the MOCI subscales: (a) washing (11 items), (b) checking (9 items), (c) obsessional slowness/repetition (7 items),
and (d) doubting/conscientiousness (7 items). The fifth factor, which assessed obsessional rumination, had salient loadings for only two items and so it was disregarded.
The subscales are essentially symptom checklists; that is, their scores reflect the
amount of time consumed by OC symptoms. To illustrate, a high score on the checking subscale indicates that the person spends a great deal of time checking and
rechecking. A high score on the doubting/conscientiousness subscale indicates the
person has serious doubts about whether he/she has performed tasks adequately, and
a sense of incompleteness even when tasks are performed carefully (Rachman &
Hodgson, 1980).
Relationships among Subscales
The MOCI subscales were developed because they corresponded to separate factors,
and so conveyed unique (non-redundant) information. Factor analytic studies have
replicated the washing, checking, and doubting/conscientiousness factors (Chan,
1990; Kraaijkamp et al., 1986; Rachman & Hodgson, 1980; Sanavio & Vidotto, 1985;
Sternberger & Burns, 1990b), but only Kraaijkamp et al. (1986) found support for a
factorially distinct slowness subscale. In most studies, items from the slowness subscale
tended to load on other factors, such as the doubting/conscientiousness factor.
Although most of the subscales are conceptually and factorially distinct, this does
not mean they are entirely unrelated. The doubting/conscientiousness subscale
assesses doubts or uncertainties about the adequacy of one's actions. Such doubts can
lead to the repetition of actions, such as repeated checking, washing, and slowness in
completing tasks. Accordingly, the doubting/conscientiousness subscale is correlated
with the checking subscale (mean r = .50) and has smaller but nontrivial correlations
with the washing subscale (mean r = .27) and the slowness subscale (mean r = .21;
Chan, 1990; Hodgson & Rachman, 1977; Richter et al., 1994).
Reliability
Internal consistency. Studies using clinical samples have generally obtained acceptable
internal consistencies for the checking, cleaning, and doubting/conscientiousness
subscales, with coefficient αs ranging from .60 to .87 (Hodgson & Rachman, 1977;
Kraaijkamp et al., 1986; Rachman & Hodgson, 1980; Richter et al., 1994). Studies of
student samples yielded lower internal consistencies, ranging from .40 to .62 (Chan,
1990; Sanavio & Vidotto, 1985; Sternberger & Burns, 1990b). Lower αs may have
been due to range restriction. Studies of clinical and nonclinical samples have generally found very low internal consistencies for the slowness subscale, with αs ranging
from 0 to .44 (Chan, 1990; Kraaijkamp et al., 1986; Rachman & Hodgson, 1980;
Sanavio & Vidotto, 1985). Very low αs for the slowness subscale may be due to item
heterogeneity (see the Comment section below).
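For readers who want to see how coefficient α behaves, the following is a minimal sketch using the standard formula α = k/(k − 1) × (1 − Σ item variances / variance of total scores). The true-false responses are invented for illustration and are not MOCI data; the function name is likewise an assumption.

```python
from statistics import pvariance


def cronbach_alpha(rows):
    """Coefficient alpha for a list of respondents, each a list of k item scores."""
    k = len(rows[0])
    # Variance of each item (column) across respondents.
    item_variance_sum = sum(pvariance(column) for column in zip(*rows))
    # Variance of the total (summed) score across respondents.
    total_variance = pvariance([sum(row) for row in rows])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical true-false (1/0) responses from six people to three items.
full_sample = [
    [1, 1, 1],
    [1, 1, 0],
    [0, 0, 0],
    [1, 0, 0],
    [0, 0, 0],
    [1, 1, 1],
]

# Crude stand-in for range restriction: drop the highest scorers.
restricted_sample = [row for row in full_sample if sum(row) <= 2]

alpha_full = cronbach_alpha(full_sample)
alpha_restricted = cronbach_alpha(restricted_sample)
print(round(alpha_full, 3), round(alpha_restricted, 3))
```

With these made-up responses, removing the high scorers lowers α, which is consistent with the suggestion that restricted score ranges in student samples could depress internal consistency. The sketch does not show that range restriction is the actual cause in the cited studies.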
Test-Retest reliability. Hodgson and Rachman (1977) examined the 4-week test-retest
reliability for a sample of university students. Kendall's (1963) τ was used to examine
the concordance between item responses across the retest interval. For the sum of
MOCI items, test-retest reliability was found to be acceptable (τ = .8). Kraaijkamp et
al. (1986) used the same procedure to examine the 4-week test-retest reliability in a
mixed sample of OCs and depressed patients. Reliability was good (τ = .84) and MOCI
total scores correlated .92 across the test-retest interval. Sternberger and Burns
(1990b), using a sample of university students, found the 6-7 month test-retest reliability was acceptable for the MOCI total score (r = .69). In summary, the available data
suggest the MOCI total score has acceptable test-retest reliability over a period of at
least 6-7 months. Test-retest reliabilities of the subscales have yet to be reported.
Validity
Criterion-Related validity. Hodgson, Rankin, and Stockwell (1979, unpublished, cited
in Rachman & Hodgson, 1980) found the MOCI total scale discriminated between
phobics and OCs. Kraaijkamp et al. (1986) found the MOCI total scale reliably discriminated OCs from normal controls, anorectics, and patients with non-OC anxiety
disorders. Diagnoses were made from information obtained from the Present State
Exam (Wing, Cooper, & Sartorius, 1974). The MOCI did not discriminate between
OCs and depressed patients, although the latter may have been a chance result because comparisons were based on 89 OCs and only 6 depressives. Compared to normal
controls and the combined psychiatric samples (anorexia, depression, and non-OC
anxiety disorders), OCs had higher scores on all MOCI subscales, except the slowness
subscale, where OCs and normals did not differ.
Hodgson and Rachman (1977) obtained retrospective therapist ratings of the
severity of washing and checking rituals for OCs. Patients were diagnosed on the basis
of unstructured clinical interviews. Patients were classified according to dichotomized
therapist ratings (slight or no problem vs. moderate or severe problem) for washing rituals and compared to dichotomized scores (low vs. high) on the MOCI washing
subscale. The same type of classification was made for therapist ratings of checking rituals and the MOCI checking subscale. Concordance between therapist ratings and
MOCI subscale scores was assessed by the γ coefficient (Goodman & Kruskal, 1963).
Acceptable concordance (γ = .7) was obtained for both washing and checking classifications. Kraaijkamp et al. (1986) performed the same analysis for a sample of OCs,
classified as either washers or checkers by two independent raters. The γ coefficient
was calculated separately for each rater, and was .74 and .78 for checking rituals, and
.85 and .89 for washing rituals. In sum, the results support the criterion-related validity of the MOCI total score and its washing, checking, and doubting/conscientiousness subscales. The only study of the slowness subscale (Kraaijkamp et al., 1986) failed
to support its criterion-related validity.
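The γ coefficient used in these analyses is computed from concordant and discordant pairs in a cross-classification of two ordered variables: γ = (C − D)/(C + D). The sketch below applies that definition to a 2 × 2 table of dichotomized therapist ratings by dichotomized subscale scores. The cell counts are hypothetical and do not come from either study.

```python
def goodman_kruskal_gamma(table):
    """Gamma for a contingency table whose rows and columns are both ordered.

    table[i][j] is the number of cases in row category i and column category j.
    Raises ZeroDivisionError if there are no concordant or discordant pairs.
    """
    concordant = 0
    discordant = 0
    n_rows, n_cols = len(table), len(table[0])
    for i in range(n_rows):
        for j in range(n_cols):
            for k in range(i + 1, n_rows):      # cases in a higher row category
                for l in range(n_cols):
                    if l > j:                   # also higher on columns: concordant
                        concordant += table[i][j] * table[k][l]
                    elif l < j:                 # lower on columns: discordant
                        discordant += table[i][j] * table[k][l]
    return (concordant - discordant) / (concordant + discordant)


# Hypothetical counts.
# Rows: therapist rating (slight/no problem, moderate/severe problem).
# Columns: subscale score (low, high).
counts = [
    [20, 5],
    [6, 19],
]

gamma = goodman_kruskal_gamma(counts)
print(round(gamma, 2))
```

For a 2 × 2 table this reduces to (ad − bc)/(ad + bc), so γ depends only on the cross-product ratio of the cells and ignores tied pairs.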
Convergent validity. The MOCI tends to have large correlations (mean r = .57, range = .23
to .77) with other OC measures (i.e., SCL-90-R OC scale and predecessors, subscales of
the Leyton Obsessional Inventory, Compulsive Activity Checklist, Padua Inventory, and
Yale-Brown Obsessive Compulsive Scale; Freund, Steketee, & Foa, 1987; Goodman et
al., 1989b; Hodgson & Rachman, 1977; Kraaijkamp et al., 1986; Richter et al., 1994;
Sanavio, 1988; Steketee & Doppelt, 1986; Steketee & Freund, 1993; Sternberger &
Burns, 1990a, 1990b; van Oppen, 1992; van Oppen et al., 1995; Woody et al., in press-a, in press-b). These results support the convergent validity of the MOCI.
Discriminant validity. Chan (1990) found the MOCI correlated .54 with the Beck
Depression Inventory, and Richter et al. (1994) found the MOCI correlated .41 with
the Hamilton depression scale. Sternberger and Burns (1990b) found the MOCI had
small-to-medium correlations with all SCL-90-R scales (rs = .26 to .36) except the SCL-90-R OC scale (r = .51). In general, the results show that correlations with non-OC
measures tend to be lower than correlations with OC measures, which supports the
discriminant validity of the MOCI.
Convergent and discriminant validity of the MOCI subscales. Several studies have examined
the convergent and discriminant validities of the MOCI washing and checking subscales. The MOCI washing subscale has been found to have large correlations with the
Padua Inventory contamination subscale (rs = .53 to .87) and small-to-medium correlations with the Padua checking subscale (rs = -.05 to .33) (Sternberger & Burns,
1990a; van Oppen, 1992; van Oppen et al., 1995). A similar pattern of results was
obtained for the MOCI checking subscale, which had large correlations with the
Padua checking subscale (rs = .62 to .84) and small-to-medium correlations with the
Padua contamination subscale (rs = .24 to .35) (Sternberger & Burns, 1990a; van
Oppen, 1992; van Oppen et al., 1995). The MOCI washing subscale has small-to-medium correlations with the MOCI checking subscale (rs = .25 to .46; Chan, 1990;
Hodgson & Rachman, 1977; Sternberger & Burns, 1990b). These results indicate good
convergent and discriminant validities of the MOCI washing and checking subscales.
Sher and colleagues (Sher, Frost, & Otto, 1983; Sher, Mann, & Frost, 1984) found
that students with high scores on the MOCI checking subscale, compared to students
with low scores, had higher scores on a self-report measure of the frequency of
checking of everyday actions (e.g., checking lights and door locks). Frost and Sher
(1989) administered the MOCI to a sample of college students 1 month before an
exam. During the exam, students were asked to indicate how many times they
checked their answers. The MOCI checking subscale was correlated .27 with checking frequency, whereas the other subscales were unrelated to checking
frequency (rs = -.08 to .02).
The MOCI checking and washing subscales have medium-to-large correlations
(rs = .30 to .51) with the Beck Depression Inventory and Hamilton depression scale
(Chan, 1990; Richter et al., 1994). These tend to be lower than the convergent validity correlations, and so support the discriminant validity of the checking and washing
subscales. There is insufficient information to evaluate the convergent and discriminant validity of the other subscales.
Comment
The MOCI total scale has generally acceptable psychometric properties, as do its
washing and checking subscales. The other subscales require further investigation.
Available evidence suggests the slowness subscale is in need of revision. The MOCI
subscales were developed on the basis of factor analysis, and subsequent studies support the factorial distinction between all but the slowness subscale. The latter has
poor internal consistency, which is not surprising given its item content. Two of its
items are related to ruminations, two items refer to compulsive counting and the need
for routine, and only three items make direct reference to obsessional slowness.
Although the MOCI total scale has adequate psychometric properties, it also has
important limitations. The scale was developed to assess obsessions and compulsions
associated with overt rituals (Hodgson & Rachman, 1977). Yet some items do not
directly pertain to obsessions or compulsions (e.g., "Some numbers are extremely
unlucky," "Neither of my parents was very strict during my childhood"). The MOCI
assesses washing and checking compulsions, which are the most common types of
compulsions (APA, 1994; Rachman & Hodgson, 1980), but does not assess other
important compulsions such as hoarding and covert rituals. It provides a limited
assessment of obsessional ruminations (two items).
The MOCI does not assess important parameters of OCD, such as interference and
resistance to compulsions. Interference only can be inferred by the number of symptoms endorsed by the subject. Moreover, because the MOCI emphasizes cleaning and
checking rituals, patients with these compulsions may obtain higher overall MOCI
scores than patients with other, equally severe OC symptoms. This means it is possible
that patients with moderate washing and checking compulsions may obtain higher
MOCI scores than patients with severe obsessions or hoarding compulsions
(Goodman, Price, Rasmussen, Mazure, Delgado, et al., 1989).
The MOCI could be improved by addressing these issues. The internal consistency
of the slowness subscale could be enhanced by increasing the length of the scale by
adding items central to the construct of obsessional slowness. The addition of subscale(s) to assess obsessions also would improve the coverage of the MOCI. The addition of resistance and interference subscales would enhance the breadth of assessment.
COMPULSIVE ACTIVITY CHECKLIST (CAC)
The CAC was developed originally as a 62-item interviewer-administered schedule
to assess the extent to which OC symptoms interfere with everyday activities (Philpott,
1975). Each item lists an activity (e.g., washing, dressing, using electrical appliances),
which is rated on a 4-point scale, ranging from 0 (performance of activity within normal limits) to 3 (complete impairment). Impairment is rated according to four criteria: frequency, duration, avoidance, and oddity of behavior. To illustrate, a score of 3
would be given if (a) the activity takes three times longer than usual, (b) is three times
as frequent as usual, (c) definitely appears very odd, or (d) avoidance markedly interferes with activity. Criteria for "normal" and "odd" behavior are left to the judgement
of the interviewer. Interviewers are instructed to elicit concrete information to allow
them to make a rating (e.g., "How long does it take you to brush your hair?").
The CAC has been revised several times, mainly by deleting items and changing to
a self-report format. Marks, Hallam, Connolly, and Philpott (1977) developed clinician-rated and self-report versions, each containing 39 items. Freund et al. (1987)
developed a 38-item observer-rated version, and Cottraux, Bouvard, Defayolle, and
Messy (1988) developed an 18-item self-report version. Most recently, Steketee and
Freund (1993) developed a 28-item self-report version. Each revision was intended to
increase item homogeneity and discriminability of OCs from other populations,
although as we will see, the versions have very similar psychometric properties.
Instructions and the rating scale remained essentially unchanged. In summary, each
version is a measure of global impairment due to obsessions or compulsions, taking
into account duration, frequency, and avoidance.
Reliability
Internal consistency. Good internal consistency has been reported for the 37-item self-report CAC (α = .94; Cottraux et al., 1988) and for the 38-item observer-rated version
(α = .91; Freund et al., 1987). Similar results were obtained for the 38-item self-report
version (αs = .86 to .95; Sternberger & Burns, 1990b; Steketee & Freund, 1987) and
for the 28-item self-report version (α = .87; Steketee & Freund, 1987). Internal consistency of the other, less popular versions has not been reported.
Inter-rater reliability and relationship between self-report and observer-rated versions. Marks,
Stern, Mawson, Cobb, and McDonald (1980) had two independent assessors administer the 39-item CAC to a sample of OCs. Total scores correlated .95 between
observers, and the observer-rated and self-report versions correlated .83. Freund et al.
(1987) obtained moderate inter-rater agreement (r = .64) for the 38-item CAC. The
mean CAC score, averaged across raters, correlated .94 with the 38-item self-report
CAC. These results suggest the observer-rated CAC has adequate interrater reliability.
The self-report and observer-rated versions are highly correlated. It is possible the correlations between observer-rated and self-report versions were inflated by criterion
contamination. That is, patients may have rated their responses to the self-report version simply by recalling their responses to the observer-rated version.
Test-Retest reliability. Freund et al. (1987) averaged CAC ratings from two interviewers to examine the test-retest reliability of the 38-item CAC. Test-retest reliability was .68 for a retest interval ranging from 5 to 60 days (mean = 37 days).
Cottraux et al. (1988) administered the 37-item self-report CAC to a sample of normal controls and found the 1-month test-retest reliability was .62. Sternberger and
Burns (1990b), using a sample of university students, obtained a 6-7 month
test-retest reliability of .74. Extrapolating from these results, it seems likely that the
self-report and observer-rated versions have good test-retest reliability over a period of weeks, if not months.
Validity
Criterion-Related validity. Using the 37-item self-report CAC, Cottraux et al. (1988)
found that OCs (diagnosed by an unspecified method) had higher scores than panic
disordered patients, social phobics, and normal controls. Steketee and Freund (1993)
compared OCs (diagnosed by an unspecified method) to patients with other anxiety
disorders and to university students. OCs had significantly higher scores on 29 of 38
items of the self-report CAC. In the absence of information on the reliability and
validity of the diagnoses, these findings offer only tentative support for the criterion-related validity of the CAC.
Convergent validity. The self-report and observer-rated CACs tend to have medium correlations (mean r = .40, range = .19 to .84) with other OC measures (i.e., SCL-90-R OC
scale, Maudsley Obsessional Compulsive Inventory, Padua Inventory, and Likert scale
ratings of symptom severity; Cottraux et al., 1988; Freund et al., 1987; Marks et al.,
1980; Steketee & Freund, 1993; Sternberger & Burns, 1990b). These results support
the convergent validity of the CAC.
Discriminant validity. Freund et al. (1987) found the 38-item observer-rated CAC had a
medium correlation with the SCL-90-R OC scale (r = .38) and slightly smaller correlations with the other SCL-90-R scales (rs = .14 to .31). Foa et al. (1987) found the observer-rated CAC had medium correlations (rs = .33 to .47) with measures of depression
(i.e., the Beck Depression Inventory and patient and observer-rated Likert measures of
depression severity). To summarize, the observer-rated CAC has correlations with non-OC measures that tend to be similar in magnitude to correlations with OC measures.
This indicates weak discriminant validity. The same conclusion probably holds for the
self-report CAC, because the self-report and observer-rated CACs are highly correlated.
Comment
Since its appearance in the 1970s, the CAC has been through several revisions. The most
popular are the 38-item self-report and observer-rated versions. The 28-38 item self-report and observer-rated versions have very similar psychometric properties. Test-retest
reliability and internal consistency are good. Interrater reliability appears adequate.
Criterion-related and convergent validities are acceptable, but discriminant validity
appears weak. A further problem with the CAC is that it provides only an indirect measure of OC symptoms because it assesses only the degree of interference in everyday
activities. It does not directly assess obsessions or compulsions. Moreover, scores on the
CAC are ambiguous because they confound slowness, avoidance, and oddity of behavior. The lack of a structured interview is a further limitation for the observer-rated version because psychometric properties may depend on the skill and training experiences
of the interviewer(s) rather than the properties inherent to the CAC.
OC SCALE FROM THE COMPREHENSIVE PSYCHOPATHOLOGICAL RATING SCALE (CPRS-OC)
The Comprehensive Psychopathological Rating Scale (CPRS, Asberg, Montgomery,
Perris, Schalling, & Sedvall, 1978) is a set of 63 clinician-rated items that assess a range
of psychiatric signs and symptoms. Each item defines a sign or symptom, which is rated
on a 4-point (0-3) severity scale. Each point on the rating scale also is accompanied by
a description. For example, a rating of 3 on the "rituals" item is indicated by "extensive
rituals or checking habits that are time consuming and incapacitating." The interviewer is required to elicit sufficient information to rate each item, using an unstructured
clinical interview. The CPRS-OC consists of eight items selected from the CPRS
because a sample of 24 OCs scored higher on these items than on the remaining items
(Thoren, Asberg, Cronholm, Jornestedt, & Traskman, 1980). The items are as follows:
"rituals," "inner tension," "compulsive thoughts," "concentration difficulties," "worrying over trifles," "sadness," "lassitude," and "indecision." Four of these items also are
included in the CPRS depression scale.
Reliability and Validity
The CPRS-OC has been used in several pharmacotherapy trials (Table 3), even though
its psychometric properties are largely unknown. Internal consistency and test-retest
reliability have yet to be examined. Thoren et al. (1980) reported moderate-to-high interrater correlations for individual items (rs = .30 to .93) and for the total score (r = .97).
Criterion-related, convergent, and discriminant validities have yet to be examined.
Comment
There are several limitations to the CPRS-OC. Its psychometric properties are largely
unknown, and only two of its eight items are specific to OCD: "compulsive thoughts"
(obsessions) and "rituals." The remaining items are either features of depression
("lassitude," "concentration difficulties," "indecision," "sadness") or are nonspecific
features of anxiety states ("worrying over trifles," "inner tension"). Insel et al. (1983)
modified the CPRS-OC by deleting items assessing sadness, inner tension, and worry.
The resulting 5-item scale still shares two items with the depression scale. Even in its
revised form the CPRS-OC appears to be largely a measure of nonspecific distress.
LIKERT SCALES
A variety of single-item 9-point Likert scales have been developed to assess a variety of
aspects of OCD, including global measures of severity of obsessions and compulsions,
and specific scales, including measures of the degree of OC-related fear, degree of avoidance, time spent ritualizing, and severity of urges to ritualize (e.g., Emmelkamp, 1982;
Foa et al., 1983, 1992). The scales may be rated by the patient or by an interviewer.
Reliability
Inter-rater reliability and relationship between self-report and observer-rated versions. Foa et
al. (1983) obtained high inter-rater correlations for Likert measures of severity of
obsessions and severity of compulsions (rs = .92 to .97). Cottraux et al. (1990) reported large correlations (rs = .74 to .89) between self-report and observer-rated versions of two types of Likert measures (OC-related anxiety/discomfort and duration of
compulsions). Large correlations also have been obtained among the patients', therapists', and independent observers' ratings of a range of OC features, including main
fear, avoidance, and compulsion severity (rs = .64 to .83; Foa, Steketee, Kozak, &
Dugger, 1987). Thus, there is evidence of good interrater reliability, and high correlations between self-report and observer-rated Likert scales.
Test-Retest reliability. Steketee, Freund, and Foa (1988) reported the test-retest reliability of Likert scales (assessing main fear, avoidance, general functioning, anxiety,
and depression) ranged from .40 to .87 for self-report ratings, and .20 to .50 for
observer ratings over a mean 60-day interval. These data suggest considerable variation in test-retest reliabilities. Unfortunately, reliabilities were not reported for individual scales (only the above-mentioned ranges were given), so it is not possible to
identify which scales had the lowest reliability. In summary, the test-retest reliability
of Likert scales requires further investigation.
Validity
Criterion-Related validity. There have been no published studies of the criterion-related validity of these scales. It may be assumed that the scales should have good
criterion-related validity because patients without OCD would have low (or zero)
scores on items measuring global severity of obsessions, compulsions, etc. However,
this assumption may not be warranted because unwanted intrusive thoughts often
occur in people without OCD (Rachman & de Silva, 1978; Salkovskis & Harrison,
1984), and compulsion-like behaviors (e.g., excessive checking) can occur in
patients with disorders other than OCD (e.g., generalized anxiety disorder; Craske,
Rapee, Jackel, & Barlow, 1989).
Convergent validity. Likert measures of OC symptoms generally have moderate correlations (mean r = .32, range = .17 to .62) with other OC measures, including the
SCL-90-R OC scale (and predecessors), Compulsive Activity Checklist, Yale-Brown
Obsessive Compulsive Scale, and Padua Inventory (Cottraux et al., 1988; Foa et al.,
1983; Freund et al., 1987; Steketee & Doppelt, 1986; van Oppen, Emmelkamp, van
Balkom, & van Dyck, in press; Woody et al., in press-a, in press-b). This suggests that
the Likert scales generally have acceptable convergent validity.
Discriminant validity. Foa et al. (1983) found that Likert measures of OC symptoms had
small-to-medium correlations with HSCL measures of depression, somatization, anxiety, and interpersonal sensitivity (rs = .09 to .36). Foa et al. (1987) found Likert ratings
of the patient's main fear and severity of compulsions had small correlations with the
Beck Depression Inventory, and with an interview rating of depression (rs < .29), and
small-to-medium correlations with a patient-rating of depression severity (rs = .28 to
.30). In all, these results suggest adequate discriminant validity of the Likert measures.
Comment
Likert scales are popular because of their ease of administration and scoring.
Multiple scales can be used to assess multiple aspects of OCD. Indeed, the Yale-Brown
Obsessive Compulsive Scale (discussed below) can be regarded as a compilation of
such scales. A limitation of Likert scales is that researchers using them have provided
little information on the instructions accompanying the self-report versions, and no
information on the questions asked by interviewers using the observer-rated versions.
This makes it difficult to determine whether different investigators are administering
the measures in the same way.
NIMH GLOBAL OBSESSIVE COMPULSIVE SCALE (GOCS)
The GOCS (Insel et al., 1983) is a single-item Likert-like measure of the overall
severity of OC symptoms. It is a clinician-rated scale based on other NIMH global
rating scales, such as the global measures of mania and depression (Murphy, Pickar,
& Alterman, 1982). It differs from the Likert scales described in the previous section in two ways: the number of rating points (15 vs. 9), and the clustering of
descriptors on the scale. The observer completes the GOCS by selecting one of 15
severity levels, ranging from 1 (minimal symptoms or within normal range) to 15
(very severe). Severity levels are clustered into five main groups (i.e., ratings of 1-3,
4-6, 7-9, 10-12, and 13-15), with detailed descriptors for each cluster. For example, ratings from 10-12 represent "severe obsessive-compulsive behavior," defined as
"symptoms that are crippling to the patient, interfering so that daily activity is an
active struggle. Patient may spend full time resisting symptoms. Requires much
help from others to function."
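The clustering of the 15 severity levels into five groups of three can be expressed as simple integer arithmetic. The sketch below is illustrative only: the function name is hypothetical, and it returns the bounds of the cluster rather than a verbal label, because descriptors for most clusters are not reproduced here.

```python
def gocs_cluster(rating):
    """Return the (lowest, highest) rating of the cluster containing `rating`."""
    if not isinstance(rating, int) or not 1 <= rating <= 15:
        raise ValueError("rating must be an integer from 1 to 15")
    lowest = 3 * ((rating - 1) // 3) + 1   # clusters start at 1, 4, 7, 10, 13
    return (lowest, lowest + 2)


for rating in (1, 6, 11, 15):
    print(rating, gocs_cluster(rating))
```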
Reliability and Validity
The GOCS has been used in several treatment outcome studies (Table 3) even though
little is known about its psychometric properties. Inter-rater reliability has yet to be
determined. Two studies have examined test-retest reliability. Kim et al. (1992)
reported a 2-week intraclass correlation of .98, and Kim, Dysken, Kuskowski, and
Hoover (1993) obtained a 2-week intraclass correlation of .87.
There have been no studies of criterion-related validity or discriminant validity. With
regard to convergent validity, one study found the GOCS had a medium correlation
(r = .33) with the SCL-90-R OC scale, and several studies obtained large correlations
(mean r = .69, range = .63 to .77) with the YBOCS (Black, Kelly, Myers, & Noyes, 1990;
Goodman, Price, Rasmussen, Mazure, Delgado et al., 1989; Kim et al., 1992, 1993). The
convergent validity of the GOCS is promising, albeit in need of further evaluation.
Correlations between the GOCS and YBOCS may have been spuriously inflated because
in each case the scales were administered by the same interviewer. This means that ratings made on the YBOCS may have influenced those on the GOCS, or vice versa.
Comment
The GOCS has the advantage of being a simple 1-item scale, which, no doubt,
accounts for its popularity in treatment outcome studies. However, little is known
about its reliability and validity. GOCS ratings are based on unstructured clinical interviews, and so its psychometric properties may vary widely from one study to the next,
depending on the adequacy of the interviews. The GOCS provides only a global
assessment of OC symptoms, and fails to capture information about the severity of different types of OC symptoms.
YALE-BROWN OBSESSIVE-COMPULSIVE SCALE
The YBOCS is a semistructured interview designed to assess symptom severity and
response to treatment for patients diagnosed with OCD (Goodman, Price, Rasmussen,
Mazure, Fleishmann et al., 1989; Goodman, Price, Rasmussen, Mazure, Delgado et al.,
1989; Goodman, Rasmussen et al., 1989). It consists of three parts:
1. Definitions and examples of obsessions and compulsions, which the interviewer
reads to the patient.
2. A Symptom Checklist, containing over 50 common obsessions and compulsions,
including obsessions about aggression, contamination, and counting, and compulsions about cleaning, checking, ordering, and hoarding. The interviewer
asks the patient whether the symptoms are present currently or were present in
the past. The interviewer then asks the patient to list the most prominent obsessions, compulsions, and OC-related avoidance behaviors.
3. The YBOCS proper, which consists of 10 core items and 11 investigational
items. The latter are included on a provisional basis and require further evaluation. The core items assess five parameters of obsessions (items 1-5) and compulsions (items 6-10): (a) duration/frequency, (b) interference in social and
occupational functioning, (c) associated distress, (d) degree of resistance, and
(e) perceived control over obsessions or compulsions. Thus, the YBOCS assesses parameters of symptom severity independent of symptom content.
For the YBOCS proper, each core item is rated by the interviewer on a 5-point
scale, ranging from 0 (none) to 4 (extreme). The rater must determine whether the
patient is presenting with real obsessions or compulsions, and not symptoms of
another disorder such as paraphilia. All items are accompanied by probe questions,
and written definitions accompanying each point on the 0-4 scales. Items are rated
in terms of the average severity of each parameter over the past week. To illustrate,
item 1 assesses the average time spent on all obsessions over the past week. The
accompanying rating scale ranges from 0 (no obsessions) to 4 (extreme, greater
than 8 hours/day or near constant intrusions). Scores on the 10 core items are
summed to yield scores for the obsessions subscale, the compulsions subscale, and
the total (10-item) YBOCS scale.
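The scoring rule just described (10 core items, each rated 0-4, with items 1-5 covering obsessions and items 6-10 covering compulsions) can be sketched as follows. The function name and the example ratings are hypothetical; the sketch covers only the arithmetic of summing the core items and says nothing about how the interview itself is conducted.

```python
def score_ybocs(core_items):
    """Sum 10 core item ratings (each 0-4) into subscale and total scores.

    core_items[0:5] are taken to be the obsession items (items 1-5) and
    core_items[5:10] the compulsion items (items 6-10).
    """
    if len(core_items) != 10:
        raise ValueError("expected exactly 10 core item ratings")
    if any(not isinstance(x, int) or not 0 <= x <= 4 for x in core_items):
        raise ValueError("each rating must be an integer from 0 to 4")
    obsessions = sum(core_items[:5])
    compulsions = sum(core_items[5:])
    return {
        "obsessions": obsessions,             # possible range 0-20
        "compulsions": compulsions,           # possible range 0-20
        "total": obsessions + compulsions,    # possible range 0-40
    }


# Hypothetical ratings for one patient.
example = score_ybocs([3, 2, 3, 1, 2, 4, 3, 3, 2, 3])
print(example)
```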
The YBOCS investigational items assess the following: amount of time free of obsessions or compulsions, insight into the irrationality of obsessions and compulsions,
avoidance, degree of indecisiveness, overvalued sense of personal responsibility,
obsessional slowness/inertia, pathological doubting, global severity, overall response
to treatment, and reliability of information obtained from the patient. They are rated
by the interviewer on 0-4 or 0-6 scales, similar to those used for the core items.
YBOCS resistance items are rated such that greater resistance is associated with
lower scores, because greater resistance is associated with less impairment in social
and occupational functioning. This scoring rule is supported by the finding that resistance scores are correlated with less severe OC symptoms, as assessed by other YBOCS
items (Goodman, Price, Rasmussen, Mazure, Fleishmann et al., 1989;
Woody et al., in press-a).
In practice, most published treatment outcome studies used only the sum of the 10
core items. Scores on the obsession and compulsion subscales are infrequently used,
and the Symptom Checklist has yet to be used as an outcome measure. In the following, the review is confined to the psychometric properties of the 10-item YBOCS
because there is little or no available information on the properties of the Symptom
Checklist or the investigational items. Accordingly, I will use the acronym YBOCS to
refer to the scale formed by the sum of the 10 core items.
Reliability
Inter-rater reliability. Price, Goodman, Charney, Rasmussen, and Heninger (1987) obtained an intraclass correlation of .99 when the YBOCS was administered by two independent raters to 10 OCs. Goodman, Price, Rasmussen, Mazure, Fleishmann et al.
(1989) assessed the inter-rater reliability of the YBOCS by having six trained raters
evaluate videotaped interviews of six OCs. The intraclass correlation was .80. In a second study reported in the same article, four trained raters evaluated videotaped interviews of 40 OCs, yielding an intraclass correlation of .98. Jenike et al. (1990) used four
raters to assess 40 OCs and obtained an intraclass correlation of .96 for the YBOCS.
No information was presented on whether the ratings were based on audiotapes,
videotapes, or live interviews. Woody et al. (in press-a) had an interviewer obtain
YBOCS ratings from live interviews of 30 OCs, and then a second rater listened to
audiotapes of the interviews. The intraclass correlation was .93.
The results of these studies suggest the YBOCS has excellent inter-rater reliability.
However, it is possible that inter-rater reliability was spuriously inflated, at least to some
degree. The reliability estimates were obtained by having one evaluator rerate taped
interviews of another evaluator. This shows that one can score another's interview reliably, but not that one can administer the instrument reliably. It is quite a different task
to reproduce a rater's score, based on a taped interview, than to interview the patient
from scratch and obtain a score that matches that of another rater who also interviews the patient independently. If the original (criterion) rater makes the mistake of
giving the patient actual rating categories to choose from (instead of the interviewer
rating the categories), then extremely high reliabilities can be obtained on rerating if
the patient's self-ascribed category is the rating. The irony, of course, is that just
accepting the patient's self-rating, rather than doing the difficult work of evaluating
the details of the symptoms and assigning a rating, appears more reliable.
Unfortunately, it appears to be common for evaluators using the YBOCS to make the
mistake of giving patients the rating categories, despite their having received training
to the contrary.
1 The author acknowledges, with thanks, an anonymous reviewer as the source of these comments.
Internal consistency. The YBOCS has acceptable-to-good internal consistency, with coefficients α ranging from .69 to .91 (Goodman, Price, Rasmussen, Mazure, Fleishmann
et al., 1989; Richter et al., 1994; Woody et al., in press-a).
Test-retest reliability. Kim et al. (1990, 1992, 1993) administered the YBOCS to three
samples of OCs three times over a 2-week period. Intraclass correlations ranged from
.81 to .97. Woody et al. (in press-a) administered the YBOCS to 24 OCs on two occasions over test-retest intervals ranging from 10 to 103 days (mean = 49 days). The
intraclass correlation was .61, and was reduced probably because of the large retest
interval. The findings suggest the YBOCS has good test-retest reliability over at least
a cl-week interval.
Validity
Criterion-related validity. The YBOCS was intended for use with patients diagnosed
with OCD, and so there has been only one study of its criterion-related validity.
Rosenfeld, Dar, Anderson, Robak, and Greist (1992) found that patients with OCD
(method of diagnosis unspecified) had higher YBOCS scores than patients with other
anxiety disorders and normal controls.
Convergent validity. The YBOCS tends to have large correlations (mean r = .51, range = .17
to .77) with other OC measures (i.e., anxiety and avoidance ratings from behavioral
avoidance tests, SCL90-R OC scale, subscales of the Leyton Obsessional Inventory,
Maudsley Obsessional Compulsive Inventory, Likert scales of symptom severity, Global
Obsessive Compulsive Scale; Black et al., 1990; Goodman, Price, Rasmussen, Mazure,
Delgado et al., 1989; Kim et al., 1990, 1992; Richter et al., 1994; Woody et al., in press-a,
in press-b). These results indicate that the YBOCS has good convergent validity.
Discriminant validity. Studies of discriminant validity have been less encouraging. The
YBOCS has large correlations with the Hamilton depression scale (mean r = .64, range
= .53 to .91) and large correlations with the Hamilton anxiety scale (mean r = .62,
range = .47 to .85; Goodman, Price, Rasmussen, Mazure, Delgado et al., 1989;
Hewlett, Vinogradov, & Agras, 1992; Price et al., 1987; Richter et al., 1994). These
studies show that correlations between the YBOCS and measures of depression and
general anxiety tend to be as large as the convergent validity correlations. This suggests the 10-item YBOCS has poor discriminant validity.
Comment
The YBOCS provides a comprehensive assessment of OC symptoms and their parameters. The core items have good inter-rater reliability and acceptable internal consistency. Although there is evidence of adequate convergent validity, the 10-item
YBOCS has weak discriminant validity. The psychometric properties of the Symptom
Checklist and investigational items remain to be investigated.
The Symptom Checklist requires the assessor to inquire about a wide range of obsessive and compulsive phenomena. This is important for a comprehensive assessment
because patients may feel embarrassed or otherwise reluctant to discuss their obsessions
and compulsions, and they may not mention these symptoms unless the interviewer
directly asks about them. A shortcoming of the Symptom Checklist is that it provides a
limited assessment of cognitive compulsions (e.g., repeating special words or phrases to
undo disturbing thoughts). The Checklist was recently expanded by Foa and Kozak to
assess these phenomena (personal communication, April, 1994).
The YBOCS provides separate scores to measure the severity of obsessions and
compulsions. However, most outcome studies simply combine these into a total score.
Kim et al. (1989) observed that if a patient has only obsessions or compulsions, the
YBOCS total score may be spuriously low even if symptoms are severe. The use of subscales would provide more information about the effects of treatment (e.g., some
treatments may have a greater effect on compulsions than obsessions) and would help
circumvent the problem raised by Kim et al.
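The point about patients who have only obsessions or only compulsions can be made concrete with a small worked example. The sketch below assumes a hypothetical patient rated at the maximum on every obsession item and at zero on every compulsion item; the helper `percent_of_maximum` and all ratings are illustrative assumptions, not data from any study.

```python
MAX_ITEM_RATING = 4  # each core item is rated from 0 to 4


def percent_of_maximum(score, n_items):
    """Express a summed score as a percentage of its maximum possible value."""
    return 100.0 * score / (n_items * MAX_ITEM_RATING)


# Hypothetical patient: extreme obsessions, no compulsions.
obsession_items = [4, 4, 4, 4, 4]
compulsion_items = [0, 0, 0, 0, 0]

obsession_pct = percent_of_maximum(sum(obsession_items), 5)
total_pct = percent_of_maximum(sum(obsession_items) + sum(compulsion_items), 10)
```

Under these assumptions the obsessions subscale sits at its ceiling while the total score reaches only half of its range, which is the sense in which the total can understate severity.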
YBOCS interviews (including the Symptom Checklist, core items, and investigational items) are time consuming, requiring an average of 40 min per patient from a
trained interviewer (Rosenfeld et al., 1992). Recently, Rosenfeld et al. (1992) developed a self-administered computerized version which was well received by patients
and yielded comparable ratings to those obtained from the interview version (97%
agreement).*
Self-report versions also have been developed (Leckman, Walker,
Goodman, Pauls, & Cohen, 1994; Warren, Zgourides, & Monto, 1993), although their
psychometric properties remain to be determined.
CONTENT, RELIABILITY, AND VALIDITY: SUMMARY AND CONCLUSIONS
The psychometric properties of the assessment methods reviewed in this article are
summarized in Table 1. As the table shows, little is known about the psychometric properties of behavioral assessment methods. Behavioral Avoidance Tests (BATs) have the
advantage of providing in vivo measures of OC-related fear and avoidance. Unfortunately,
these measures are sometimes difficult to construct, and often focus on external fear stimuli to the neglect of internal sources of fear (e.g., fear of having a bad thought). BATs
also fail to assess covert avoidance (e.g., imagining a glove on one's hand while touching
a contaminant). Although these limitations could be addressed by including self-report
measures of such forms of fear and avoidance, BATs are used increasingly less often in
treatment outcome studies (Emmelkamp, 1982; Foa et al., 1992).
Diary measures of naturally occurring target behaviors are popular in treatment-outcome studies of panic disorder (e.g., Clark et al., 1994), and have been used
in studies of other disorders, including social phobia (Glass & Arnkoff, 1994) and
chronic pain (Philips, 1988). Surprisingly, these methods are used infrequently in
OCD outcome studies. The assessment of OCD would be advanced by the development and validation of such measures.
Direct observation methods have been used occasionally in case studies of inpatients (e.g., Mills et al., 1973). Although these methods are more difficult to apply to
outpatients, it may be possible to have significant others make ratings of particular
patient behaviors (e.g., the frequency or duration of handwashing). Such ratings, if
found reliable and valid, would provide valuable information about the occurrence
of OC symptoms in the patient's habitual environment. There have yet to be published studies of the feasibility of this approach.
*The computer-administered and interview versions were administered in counterbalanced
order. Inter-version agreement may have been inflated by criterion contamination. However,
this would occur only if the interviewer asked the patient to make the ratings for each YBOCS
item (which violates YBOCS protocol) and if the patient simply recalled the interview ratings
when completing the computerized version (and vice versa when the versions were completed
in reverse order). Rosenfeld et al. (1992) did not report whether protocol violations occurred,
nor did they report assessing for such violations. Accordingly, it is possible that inter-version
agreement may have been inflated.
Self-Report Inventories
Self-report inventories are popular because of their ease of administration. They differ
markedly in their breadth of measurement; some provide measures of different OC
phenomena (e.g., the MOCI subscales) whereas others are simply global measures of
symptom severity (e.g., the SCL90-R OC scale). As summarized in Table 1, the inventories also differ in their psychometric properties. The SCL90-R OC scale (and predecessors) has adequate reliability and convergent validity, but uncertain criterion-related
validity and poor discriminant validity. The item content of the SCL90-R OC scale and
predecessors suggests they are essentially measures of nonspecific distress. This is consistent with their high correlations with measures of general psychopathology.
The LOI subscales have adequate reliability and validity. The MOCI total scale also
has adequate psychometric properties. The MOCI subscales have adequate internal
consistency, apart from the slowness subscale. The MOCI washing and checking subscales have adequate validities, whereas the validities of the other subscales remain to
be evaluated. The self-report CAC has adequate psychometric properties, apart from
questionable discriminant validity. The self-report Likert scales have adequate convergent and discriminant validity, although their other psychometric properties remain to
be determined.
Some self-report inventories confound the assessment of important variables.
Distress caused by symptoms and symptom frequency are confounded in the SCL90-R
OC scale (and predecessors). The LOI subscales are highly intercorrelated, which raises the question of whether there is any advantage to having separate symptom, interference, and resistance subscales. The high correlations arise, in part, from the fact that
the assessment of interference and resistance is confounded with symptom prevalence.
Freund et al. (1987) claimed two advantages of the CAC over the MOCI: (a) the former uses a 4-point rather than a dichotomous scale, and so the CAC may be more sensitive to gradations in symptom severity; and (b) the CAC focuses on highly specific
behaviors, with each point on the rating scale labeled with a detailed written description. The first point is unlikely to be correct because Dominguez, Jacobson, de la
Gandara, Goldstein, and Steinbrook (1989) found that the original version of the MOCI
correlated .96 with a revised MOCI that used a 4-point Likert rating. The advantage
claimed in Freund et al.'s second point also is questionable, because the MOCI assesses specific OC symptoms. In comparison, the CAC does not directly assess OC symptoms; it
merely assesses interference in everyday activities that may be due to obsessions, compulsions, or both. The CAC provides no indication as to the nature of the interference
because its ratings confound slowness, avoidance, and oddity of behavior. This means
that high scores on the CAC are ambiguous; they could arise from obsessional slowness, compulsive repeating, avoidance, and/or obsessional doubting and indecision.
In summary, in terms of breadth of measurement, reliability, and validity, there is
much to recommend the MOCI over the other self-report measures. The MOCI total
scale has comparable reliability and validity to other inventories. Compared to global
measures of OC symptoms (e.g., the LOI Symptom subscale), the MOCI subscales
permit a more detailed assessment of OC symptoms. The MOCI has further advantages of not confounding the assessment of important variables. However, the MOCI
has three main shortcomings: (a) it provides a limited assessment of obsessions, (b)
the slowness subscale has weak psychometric properties, and (c) it provides no measure of symptom interference or resistance.

TABLE 1. Properties of Measures: Summary
Columns: Reliability (Internal Consistency, Interrater, Test-Retest[a]); Validity (Criterion-Related, Convergent, Discriminant)
Rows: Behavioral Approach Tests; Direct Observation Methods; Diary Methods; SCL90-R OC and predecessors; LOI subscales (Symptom, Resistance, Interference); MOCI (Total scale, Washing subscale, Checking subscale, Doubting subscale, Slowness subscale); CAC (Self-Report, Observer-Rated); CPRS-OC; Likert Scales; GOCS; YBOCS (10-item)
[Cell entries are scrambled in the source and cannot be reliably assigned to rows and columns.]
[a] For at least 7 days.
Note. + = good or adequate; - = inadequate; ? = insufficient information; na = not applicable;
SCL90-R OC = OC subscale of Symptom Checklist-90, Revised; LOI = Leyton Obsessional
Inventory; MOCI = Maudsley Obsessional Compulsive Inventory; CAC = Compulsive Activity
Checklist; CPRS-OC = Comprehensive Psychopathological Rating Scale, OC subscale; GOCS =
Global Obsessive Compulsive Scale; YBOCS = Yale-Brown Obsessive Compulsive Scale.
Observer-Rated Scales
The CPRS-OC, GOCS, and observer-rated CAC provide only global measures of
symptom severity. The CPRS-OC and GOCS have been used in numerous treatment
outcome studies (Table 3), despite the lack of data supporting their reliability and
validity (Table 1). Although there are more data on the reliability and validity of the
observer-rated Likert scales, each of these measures suffers the important limitation of
being based on unstructured clinical interviews. As a consequence, the psychometric
properties of these scales may vary with the skill and (unspecified) training experiences of the interviewer. This is less of a problem when the interviewer follows a structured interview protocol such as that used in the YBOCS.
The YBOCS yields a wealth of information on OC symptoms and their parameters.
Each item is accompanied by detailed probe questions, which structures the interview
and ensures that appropriate information is collected. The YBOCS has acceptable
reliability and convergent validity, although discriminant validity is weak. It is unlikely that this is a weakness specific to the YBOCS, because the GOCS is highly correlated with the YBOCS, and so it may have similar problems with discriminant validity.
Moreover, the item content of the CPRS-OC suggests it is a measure of nonspecific distress, and so it is also likely to have even worse discriminant validity than the YBOCS.
Apart from the time required to administer the YBOCS (approximately 40 min), it is
generally superior to the other observer-rated scales covered in this review. The
YBOCS (including the Symptom Checklist and investigational items) has advantages
over self-report measures, including greater coverage (i.e., it assesses a range of OC
symptoms and parameters), greater flexibility, and the opportunity for the interviewer to determine whether the patient is reporting OC symptoms or other phenomena, such as tics
or paraphilic symptoms. Considering reliability, validity, and breadth of measurement, the YBOCS appears to be the best available observer-rated scale.
SENSITIVITY TO TREATMENT EFFECTS
Treatment outcome measures need to be more than reliable and valid; they also must
be sensitive to changes in symptom severity. Behavior therapy (in vivo exposure plus
response prevention) and clomipramine are established treatments for OCD, with
their efficacy demonstrated on numerous outcome measures (Abel, 1993; Cox et al.,
1993; van Balkom et al., 1994). Accordingly, studies of these therapies were used in
meta-analyses of the sensitivity of OC measures.
Method
A meta-analysis was conducted separately for clomipramine and behavior therapy, using
the procedures described by Wolf (1986). Studies were included if they (a) included
samples of more than five subjects, (b) used one or more of the measures covered in
this review, and (c) reported sufficient information to compute effect sizes. Suitable
studies were located by searching Psychological Abstracts and Medline data bases, and
by consulting recent treatment-outcome reviews (e.g., Abel, 1993; Cox et al., 1993; van
Balkom et al., 1994). When necessary and feasible, authors of published reports were
contacted in an effort to obtain information necessary to compute effect sizes. Studies
using subsamples of larger studies were excluded unless they reported outcome measures that were not reported in the larger studies. Also excluded were studies that used
combined pharmacological and behavioral treatment within a single therapy trial.
Thirty-five suitable studies were identified, which provided 19 trials of clomipramine
and 20 trials of behavior therapy (some studies reported more than one trial).
The effect size for each measure was computed according to the following formula: Effect size (ES) = $(M_{\mathrm{pre}} - M_{\mathrm{post}})/SD_{\mathrm{pooled}}$,
where $M_{\mathrm{pre}}$ and $M_{\mathrm{post}}$ are the pre- and
posttreatment means for a given treatment trial, and $SD_{\mathrm{pooled}}$
is the mean of the pre- and
posttreatment standard deviations. Hedges' (1982) correction was used to calculate
mean effect sizes. This adjusts for differences in sample size by weighting each effect
size according to the number of subjects it was based upon.
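The effect-size computation described above can be sketched as follows. This is an illustration under stated assumptions rather than the author's actual procedure: the function names are hypothetical, all input values are invented, and the weighted mean simply weights each effect size in proportion to its sample size, which is one plain reading of the text; the exact weights used in Hedges' (1982) correction may differ.

```python
def effect_size(m_pre, m_post, sd_pre, sd_post):
    """Pre-post effect size: mean change divided by the mean of the two SDs."""
    sd_pooled = (sd_pre + sd_post) / 2.0
    if sd_pooled <= 0:
        raise ValueError("standard deviations must yield a positive pooled SD")
    return (m_pre - m_post) / sd_pooled


def weighted_mean_effect_size(effect_sizes, sample_sizes):
    """Mean effect size with each trial weighted by its number of subjects.

    Assumption: weights are directly proportional to sample size.
    """
    if len(effect_sizes) != len(sample_sizes) or not effect_sizes:
        raise ValueError("need one sample size per effect size")
    total_n = sum(sample_sizes)
    return sum(es * n for es, n in zip(effect_sizes, sample_sizes)) / total_n


# Hypothetical trial: mean score falls from 25 to 15; SDs are 5 (pre) and 7 (post).
single_trial_es = effect_size(25.0, 15.0, 5.0, 7.0)

# Hypothetical pair of trials with effect sizes 1.0 (n = 10) and 2.0 (n = 30).
mean_es = weighted_mean_effect_size([1.0, 2.0], [10, 30])
```

Because symptom scores are expected to fall with treatment, a positive value indicates improvement, and larger trials pull the weighted mean toward their own effect size.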
Nine published treatment studies of behavior therapy or clomipramine used a BAT to
assess treatment outcome. Only four studies provided enough information to compute
effect sizes for behavior therapy, and only one for clomipramine. The results for behavior therapy are presented in Table 2. Here it can be seen that effect sizes for SUDS and
avoidance varied markedly across studies, and effect sizes had no obvious relationship to
number or duration of treatment sessions, sample size, or type of BAT. Overall, the findings suggest that BATs are sensitive to treatment effects. However, further research is
required to determine which type of BAT is most sensitive to treatment effects.
Direct observation and diary methods also require further investigation. These measures appear sensitive to treatment effects in case studies and small open trials (e.g.,
Foa et al., 1980; Mills et al., 1973; Turner et al., 1979, 1980, 1985). However, the studies using these methods either did not meet criteria for inclusion in the meta-analysis,
or they did not provide sufficient information to compute relevant effect sizes.
Self-Report Inventories and Observer-Rated Scales
Table 3 shows the mean effect sizes for self-report inventories and observer-rated
scales, along with the number of trials on which the calculations were based. Before
comparing the treatment sensitivity of the scales, it is necessary to determine whether
the effect sizes of each measure were based on different amounts of treatment. For
the behavior therapy trials, the amount of therapy per outcome trial was computed by
multiplying the number of treatment sessions by the duration of each session. The
inventories and scales were defined as independent variables and were compared, by
means of a one-way ANOVA, in terms of the amount of therapy associated with them.
The inventories and scales did not differ with regard to this variable, F(13, 40) < 1.
For the clomipramine trials the amount of therapy was defined as the mean dose
per patient multiplied by the number of weeks of treatment. This was used as a dependent variable in a one-way ANOVA, where the inventories and scales were independent
TABLE 2. Sensitivity to Treatment Effects: Behavioral Avoidance Tests
Columns: Study; Number of Treatment Sessions[a]; Sample Size; Type of BAT; Effect Size (SUDS, Avoidance)
Rows: Cottraux et al. (1990); Foa et al. (1984); Rachman et al. (1979); Woody et al. (in press-a); Hedges adjusted mean
[Cell values are scrambled in the source and cannot be reliably assigned to rows and columns.]
Values in extraction order: sessions and sample sizes: up to 25, 15, 17, [illegible], 15, 20, 11, 10, 51; type of BAT: Multitask (set as homework), Single task, Multitask, Multitask; effect sizes: 1.87, 1.64, 5.36, 3.69, 1.03, 1.50, 1.93, 2.80, 1.09, 1.34, 0.87.
Note. SUDS = Subjective units of distress.
[a] All treatments were behavior therapy (exposure and response prevention).
[b] Not reported.
TABLE 3. Sensitivity to Treatment Effects: Self-Report Inventories and Observer-Rated Scales
Columns: Instrument; Behavior Therapy (Exposure and Response Prevention): Number of Trials, Hedges Adjusted Mean Effect Size, SD; Clomipramine: Number of Trials, Hedges Adjusted Mean Effect Size, SD; Grand Mean (All Trials)
Rows: SCL90-R OC and predecessors[a]; LOI subscales (Symptom, Resistance, Interference); MOCI; CAC (Self-Report, Observer-Rated); CPRS-OC; GOCS; YBOCS (10-item); Likert Scales: Observer-Rated (Main Fear [MF], Avoidance of stimuli associated with MF, Severity of compulsions); Likert Scales: Self-Report (Main Fear [MF], Avoidance of stimuli associated with MF, Severity of compulsions)
[Cell values are scrambled in the source and cannot be reliably assigned to rows and columns.]
Values in extraction order: 3, 1, 2.03, 0.89, 1.08, 1.33, 0.45, 0.38, -, 0.34, 2, 4, 4, 4, 4, 0.30, 0.28, 0.67, 0.56, 0.47, 0.87, 1.46, 0.38, -, 0.52, 0.66, 0.05, 0.45, 0.02, 0.86, 1.78, 1.02, 1.03, 0.97, 1.03, 0.49, 2.03, 0.88, 1.64, 1.09, 0.80, 0.68, 0.94, 0.49, 1.75, 0.62, 0.24, 1.67, 1.74, 0.58, 1.74, 1.98, 2.27, 1.75, 0.63, 1.98, 2.27, 1.75, 5, 4, 1.84, 0.87, 2.11, 3.47, 0, 0, 1.84, 0.60, 0.98, 1.56, 2.11, 3.47, 0.66, 5, 4, 1.56, 0, 0.
[a] One trial used the HSCL (effect size = 0.47) and one used the SCL90-R (effect size = 0.50).
Note. Studies used: Allen & Rack, 1975; Benkelfat et al., 1989, 1990; Boersma et al., 1976; Clomipramine Collaborative Study Group, 1991; Cottraux et al., 1990; Emmelkamp & Beens, 1991; Emmelkamp & Rmanen, 1977; Emmelkamp et al., 1980, 1988, 1989; Fals-Stewart et al., 1993; Foa et al., 1984, 1992; Freund et al., 1987; Cehris et al., 1990; Hewlett et al., 1992; Insel et al., 1983; Kozak et al., 1988; Mavissakalian et al., 1990; Pato et al., 1991; Pigott et al., 1992, 1990; Rachman et al., 1979; Rack, 1973; Solyom & Sookman, 1977; Steketee & Doppelt, 1986; Tamimi et al., 1991; Thoren et al., 1980; Vallejo et al., 1992; van den Hout et al., 1988; Welkowitz et al., 1989; Woody et al., in press-a, in press-b; Zohar & Insel, 1987.
variables. Again, the inventories and scales did not differ on this variable, F(8, 27) = 1.17,
p > .1. Thus, the effect sizes obtained for the inventories and scales were not confounded by differences in the amount of treatment associated with each measure.
This means the effect sizes could be directly compared to determine the relative sensitivity of the measures.
In terms of Cohen's (1988) classification scheme, large effect sizes are > .80, and
medium effects are .50 to .79. Table 3 shows the inventories and scales generally yielded medium-to-large effects, suggesting they all were sensitive to treatment effects. The
OC scales from the SCL90-R and HSCL produced the smallest effects. The other self-report inventories (LOI subscales, self-report CAC, and MOCI total scale) produced
similar effect sizes to one another. MOCI subscales have been used as outcome measures
in only one study (Mavissakalian, Jones, Olson, & Perel, 1990) and so it is difficult to
gauge their sensitivities. Mavissakalian et al. found that all subscales were sensitive to
the effects of clomipramine. The largest effect was for the checking subscale (effect size
= 1.15), followed by the washing (1.00), doubting/conscientiousness
(0.77), and slowness subscales (0.47). Although the effect sizes suggest the subscales are sensitive to
treatment effects, they should be interpreted with caution because Mavissakalian et al.
did not describe the types of obsessions and compulsions in their sample. If their sample was mostly patients with compulsive checking, then we would expect the checking
subscale to have the largest effect size.
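The classification scheme above can be applied mechanically to the four MOCI subscale effect sizes reported in this paragraph. The sketch below is illustrative only: the function name `classify_effect` is hypothetical, a value of exactly .80 is treated as large (the text leaves that boundary open), and anything under .50 is simply labeled "below medium" rather than subdivided further.

```python
def classify_effect(es):
    """Label an effect size using the cutoffs quoted in the text.

    Assumption: exactly 0.80 counts as large; values under 0.50 are
    grouped together as "below medium".
    """
    magnitude = abs(es)
    if magnitude >= 0.80:
        return "large"
    if magnitude >= 0.50:
        return "medium"
    return "below medium"


# MOCI subscale effect sizes for clomipramine, as reported in the text.
moci_subscales = {
    "checking": 1.15,
    "washing": 1.00,
    "doubting/conscientiousness": 0.77,
    "slowness": 0.47,
}

labels = {name: classify_effect(es) for name, es in moci_subscales.items()}
```

On these cutoffs the checking and washing subscales fall in the large range, doubting/conscientiousness in the medium range, and slowness just under the medium threshold.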
As Table 3 suggests, observer-rated scales produced larger effect sizes than self-report scales for the clomipramine trials, t(36) = 7.14, p < .001, and there was a trend
in this direction for behavior therapy trials, t(56) = 1.99, p < .052. Table 3 suggests the
findings for the Likert scales were an exception to these results, since the effect sizes
of self-report and observer-rated versions do not appear to differ. These impressions
were supported by statistical analyses of the behavior therapy trials, which is where the
Likert scales were used (Table 3). The scales were classified into four groups: (a) self-report inventories (SCL90-R OC, LOI, MOCI, self-report CAC), (b) self-report Likert
scales, (c) observer-rated Likert scales, and (d) other observer-rated scales (observer-rated CAC and YBOCS). The groups were used as independent variables and effect
size was the dependent variable. The one-way ANOVA was significant, F(3, 54) = 8.88,
p < .001, and Newman-Keuls post hoc comparisons revealed that the effect size of
Group 1 (self-report inventories) was significantly smaller than those of the other
groups (p < .05), and that the other groups did not differ from one another (ps > .05).
Why do observer-rated scales generally yield larger effects? Lambert, Hatch,
Kingston, and Edwards (1986) found similar results for measures of depression and
suggested that trained observers might be better than patients at detecting changes in
symptom severity. This advantage does not appear to be present for Likert scales. It is
not clear why this occurred. Studies using Likert scales have provided little information
on how the scales were administered. Apart from providing descriptors for the anchor
points on the scales, the studies have provided no information on the time frame used
to assess symptoms or other pertinent details. It may be that self-report Likert scales are
more sensitive than self-report inventories because of their greater specificity; that is,
the subject may be instructed to rate specific symptoms over a specific time period.
DISCUSSION
Current Status of the Assessment of Obsessions and Compulsions
The selection of measures for treatment outcome studies is based on multiple criteria. Among the most important are (a) content (range of phenomena assessed); (b)
reliability and validity, and whether there is sufficient available information to evaluate these properties; and (c) sensitivity to changes in symptom severity. Some measures
are popular in OCD treatment-outcome studies, yet have unknown psychometric
properties (i.e., the CPRS-OC and GOCS; see Table 1). Some measures provide only
global measures of OC symptoms (LOI symptom subscale and CAC) and others
appear to be largely measures of nonspecific distress (SCL90-R OC scale, its predecessors, and CPRS-OC). Some measures confound important variables (e.g., symptom
prevalence and distress are confounded in the SCL90-R OC scale; symptom prevalence
and degree of resistance are confounded in the LOI resistance subscale; obsessional
slowness, avoidance, and oddity of behavior are confounded in the CAC). When
breadth of measurement, reliability, validity, and sensitivity to treatment effects are
considered together, the YBOCS appears to be the best available measure for treatment outcome research.
Future Directions
Refining existing measures.
Further research is needed to firmly establish the reliability and validity of many of the measures currently used in treatment outcome
research. For example, studies of test-retest reliability have been confined to relatively short periods (days to weeks). For most measures, temporal stability (in the
absence of treatment) over longer periods of time remains unknown. This is an
important omission because OCD is a chronic disorder (APA, 1994) and a good measure of OC symptoms should be stable over periods of months or years. Test-retest
reliability over periods of months also is important for treatment studies that extend
over such time periods, and for studies of long-term effects of treatment.
Behavioral assessment of OCD has fallen into neglect in recent years, despite the
potential advantages of various assessment methods. Diary methods are commonly
used in studies of other disorders (e.g., panic disorder, chronic pain) and may
become more popular in OCD studies if their reliability and validity can be established. Confidence in the accuracy of reliability and validity statistics for the YBOCS
would be improved if investigators reported the results of integrity checks for the
YBOCS structured interview, including descriptions of the nature and incidence of
protocol violations. The psychometric properties of the other observer-rated scales
also could be improved by using structured interviews to derive the ratings.
Increasing specificity. A better understanding of treatment effects may be obtained by
using measures of specific OC symptoms or symptom parameters, rather than relying
on global measures of symptom severity. For example, rather than relying on the sum
of the 10-item YBOCS as a global measure, its obsession and compulsion subscales
could be used separately. This would enable investigators to determine whether a
given treatment (e.g., behavior therapy) is more effective in reducing some OC symptoms (e.g., compulsions) than others (e.g., obsessions).
Cognitive assessment.
Sanavio (1988) argued that existing self-report measures fail to adequately assess obsessions. Accordingly, he developed the Padua Inventory, which contains four subscales: (a) checking, (b) contamination fears, (c) mental dyscontrol
(impaired control of mental activities), and (d) fear of behavioral dyscontrol (urges and
worries about losing control of one's behavior). The last two subscales pertain to intrusive thoughts, and were retained essentially unchanged in a recent revision of the Padua
Inventory (van Oppen et al., 1995). Although these scales may be useful for outcome
research, they are highly correlated with nonspecific distress (van Oppen, 1992), and
some items are measures of general worry rather than obsessions (Freeston et al., 1994).
Worry and obsessions share many features, although they can be distinguished conceptually (Turner, Beidel, & Stanley, 1992), and subjects can readily discriminate between
them when provided with written definitions (Wells & Morrison, 1994).
A more promising self-report assessment of obsessions is the Obsessive Intrusions
Inventory (Purdon & Clark, 1993), which is a 52-item measure of intrusive thoughts,
images, and impulses. Preliminary validation studies are encouraging, and suggest the
scale is a measure of obsessions rather than other types of thoughts (Purdon & Clark,
1993). This scale may be a valuable addition to treatment outcome batteries, especially if it proves sensitive to changes in symptom severity.
Comprehensive cognitive assessment also requires measures of the patient's beliefs
relating to his or her obsessions and compulsions. The YBOCS assesses some of these,
such as exaggerated beliefs of personal responsibility. Other beliefs also may be
important. People with OCD often fear they will act on their obsessional thoughts.
They often state that having an unwanted thought about performing a particular act
is as bad as performing the act itself (thought-action fusion; Rachman, 1993). Such
beliefs are important predictors of the persistence of obsessions (Purdon & Clark,
1994) and may be usefully included as part of a comprehensive assessment battery.
In cognitive-behavioral therapy, patients are often instructed in adaptive methods
of thought control (e.g., see Salkovskis, 1989). To determine whether these interventions are successful, it would be useful to have a method for assessing adaptive and
maladaptive strategies for controlling obsessions. Wells and Davies (1994) recently
developed the Thought Control Questionnaire, which may be useful for this purpose.
Such an assessment should prove useful in studies of the process of change in cognitive-behavior therapy, and may provide insights as to the nature of cognitive changes
in pharmacotherapy.
Special populations. OCD sometimes arises in childhood or early adolescence (APA,
1994) and so some researchers are turning their attention to the early detection and
treatment of childhood obsessions and compulsions. Child and adolescent versions of
several scales have been developed, including the LOI (Berg, Rapoport, & Flament,
1985) and YBOCS (Goodman & Price, 1990). Little is known about their psychometric properties or sensitivity to treatment effects, although preliminary findings are
encouraging (Berg et al., 1985; Flament et al., 1985; Goodman & Price, 1990). As in
adult samples, self-report measures yield smaller treatment effect sizes than observer-rated scales (Flament et al., 1985; Leonard et al., 1989). Further research is needed
on the assessment of obsessions and compulsions in children, and in other populations such as the elderly and minority groups.
Criteria for clinically significant change. Most OCD outcome studies focus on the statistical
significance of findings; few discuss the clinical significance of the results. Yet, consideration of clinical significance adds an important dimension to treatment evaluation.
This is illustrated by a recent multicenter study of fluoxetine (Prozac) in the treatment
of OCD. Tollefson et al. (1994) assigned 335 OC patients to placebo or fluoxetine. The
latter was given at one of three doses (20, 40, or 60 mg/day). In order to be included
in the study, patients had to have a YBOCS score > 15, which was defined as indicating
OC symptoms of moderate or greater severity. According to this cutoff, which is often
used in pharmacotherapy studies of OCD (Rosenfeld et al., 1992), one may define
YBOCS ≤ 15 as indicative of clinically mild OC symptoms. After 13 weeks of treatment,
fluoxetine produced statistically significant reductions in OC symptoms. However, the
outcome is less impressive when clinical significance is considered. For all three dose
levels the mean posttreatment YBOCS score was greater than 15, suggesting that half the
sample continued to meet criteria for clinically severe OCD.
The clinical utility of the YBOCS and other instruments may be improved by having experts specify the clinical significance of each score or range of scores. The criteria developed by Jacobson and Truax (1991) and refined by others (e.g., Hageman
& Arrindell, 1993) also could be used to define the clinical significance of OCD treatments. This would include specifying how many subjects are within, say, 2 SDs of the
normal range after treatment. This requires general population norms for OCD measures. Currently, norms are available for many self-report inventories (e.g., the MOCI)
but not for the observer-rated scales. No norms are available for the YBOCS because
the scale was developed for use with patients diagnosed with OCD. However, obsessions and compulsions occur in people without OCD (e.g., Rachman & de Silva, 1978;
Salkovskis & Harrison, 1984) and recent studies have begun to use the YBOCS with
normal samples (Rosenfeld et al., 1992). Thus, there is no reason why general population norms for the YBOCS (and other observer-rated scales) cannot be developed.
Such norms would facilitate the use of these scales and enhance the information
obtained from outcome studies.
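To make the Jacobson and Truax (1991) approach concrete, the following is a minimal sketch in Python under stated assumptions. It assumes a scale on which higher scores indicate more severe symptoms, a normal-range cutoff of the normative mean plus 2 SDs, and the reliable change index computed as the pre-post difference divided by the standard error of the difference. The function names are hypothetical, and because general population norms for the YBOCS are not yet available, any normative mean, SD, or reliability value supplied would have to come from data that do not currently exist.

```python
import math


def normal_range_cutoff(normal_mean, normal_sd, n_sds=2.0):
    """Upper bound of the normal range: normative mean + n_sds * SD."""
    return normal_mean + n_sds * normal_sd


def reliable_change_index(pre, post, sd, reliability):
    """Reliable change index: (post - pre) / S_diff.

    S_diff = sqrt(2 * SE**2), with SE = sd * sqrt(1 - reliability).
    Negative values indicate a drop in symptom scores.
    """
    se = sd * math.sqrt(1.0 - reliability)
    s_diff = math.sqrt(2.0 * se ** 2)
    return (post - pre) / s_diff


def clinically_significant(pre, post, normal_mean, normal_sd, sd, reliability):
    """True if the posttreatment score falls within the normal range AND
    the improvement exceeds measurement error (RC <= -1.96)."""
    within_normal = post <= normal_range_cutoff(normal_mean, normal_sd)
    reliable = reliable_change_index(pre, post, sd, reliability) <= -1.96
    return within_normal and reliable
```

Counting the patients for whom `clinically_significant` returns true gives the proportion of a sample that both improved reliably and ended treatment within the normal range, which is the kind of figure the criteria above call for.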
REFERENCES
Abel, J. L. (1993). Exposure with response prevention and serotonergic antidepressants in the treatment of obsessive compulsive disorder: A review and implications for interdisciplinary treatment. Behaviour Research and Therapy, 31, 463-473.
Allen, J. J., & Rack, P. H. (1975). Changes in obsessive-compulsive patients as measured by the Leyton inventory before and after treatment with clomipramine. Scottish Medical Journal, 20(Suppl. 1), 41-44.
American Psychiatric Association. (1994). Diagnostic and statistical manual of mental disorders (4th ed.). Washington, DC: Author.
Asberg, M., Montgomery, S. A., Perris, C., Schalling, D., & Sedvall, G. (1978). A comprehensive psychopathological rating scale. Acta Psychiatrica Scandinavica, 271(Suppl.), 5-27.
Baer, L., & Jenike, M. A. (1992). Personality disorders in obsessive compulsive disorder. Psychiatric Clinics of North America, 15, 803-812.
Benkelfat, C., Murphy, D. L., Zohar, J., Hill, J. L., Grover, G., & Insel, T. R. (1989). Clomipramine in obsessive-compulsive disorder: Further evidence for a serotonergic mechanism of action. Archives of General Psychiatry, 46, 23-28.
Benkelfat, C., Nordahl, T. E., Semple, W. E., King, A. C., Murphy, D. L., & Cohen, R. M. (1990). Local cerebral glucose metabolic rates in obsessive-compulsive disorder. Archives of General Psychiatry, 47, 840-848.
Berg, C. J., Rapoport, J. L., & Flament, M. (1985). The Leyton Obsessional Inventory - Child version. Psychopharmacology Bulletin, 21, 1057-1059.
Black, D. W., Kelly, M., Myers, C., & Noyes, R. (1990). Tritiated imipramine binding in obsessive-compulsive volunteers and psychiatrically normal controls. Biological Psychiatry, 27, 319-327.
Boersma, K., Den Hengst, S., Dekker, J., & Emmelkamp, P. M. G. (1976). Exposure and response prevention in the natural environment: A comparison with obsessive-compulsive patients. Behaviour Research and Therapy, 14, 19-24.
Campbell, D. T., & Fiske, D. W. (1959). Convergent and discriminant validation by the multitrait-multimethod matrix. Psychological Bulletin, 56, 81-105.
Chan, D. W. (1990). The Maudsley obsessional-compulsive inventory: A psychometric investigation on Chinese normal subjects. Behaviour Research and Therapy, 28, 413-420.
Clark, A., & Friedman, M. J. (1983). Factor structure and discriminant validity of the SCL-90 in a Veteran psychiatric population. Journal of Personality Assessment, 47, 396-404.
Clark, D. M., Salkovskis, P. M., Hackmann, A., Middleton, H., Anastasiades, P., & Gelder, M. (1994). A comparison of cognitive therapy, applied relaxation and imipramine in the treatment of panic disorder. British Journal of Psychiatry, 164, 759-769.
Clomipramine Collaborative Study Group. (1991). Clomipramine in the treatment of patients with obsessive-compulsive disorder. Archives of General Psychiatry, 48, 730-738.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Erlbaum.
Cone, J. D. (1988). Psychometric considerations and the multiple models of behavioral assessment. In A. S. Bellack & M. Hersen (Eds.), Behavioral assessment (3rd ed., pp. 42-66). New York: Pergamon.
Cooper, J. (1970). The Leyton obsessional inventory. Psychological Medicine, 1, 48-64.
Cooper, J., & McNeil, J. (1968). A study of houseproud housewives and their interaction with their children. Journal of Child Psychology and Psychiatry, 9, 173-188.
Cottraux, J., Bouvard, M., Defayolle, M., & Messy, P. (1988). Validity and factorial structure of the compulsive activity checklist. Behavior Therapy, 19, 45-53.
Cottraux, J., Mollard, E., Bouvard, M., Marks, I., Sluys, M., Nury, A. M., Douge, R., & Cialdella, P. (1990). A controlled study of fluvoxamine and exposure in obsessive-compulsive disorder. International Clinical Psychopharmacology, 5, 17-30.
Cox, B. J., Swinson, R. P., Morrison, B., & Lee, P. S. (1993). Clomipramine, fluoxetine, and behavior therapy in the treatment of obsessive-compulsive disorder: A meta-analysis. Journal of Behavior Therapy and Experimental Psychiatry, 24, 149-153.
Craske, M., Rapee, R., Jackel, L., & Barlow, D. (1989). Qualitative dimensions of worry in DSM-III-R generalized anxiety subjects and nonanxious controls. Behaviour Research and Therapy, 27, 397-402.
Derogatis, L. R. (1977). SCL-90: Administration, scoring, and procedures manual I for the revised version. Baltimore, MD: Clinical Psychometrics Research.
Derogatis, L. R., Lipman, R. S., & Covi, L. (1973). SCL-90: An outpatient psychiatric rating scale - Preliminary report. Psychopharmacology Bulletin, 9, 13-28.
Derogatis, L. R., Lipman, R. S., Rickels, K., Uhlenhuth, E. H., & Covi, L. (1974). The Hopkins symptom checklist (HSCL): A self-report symptom inventory. Behavioral Science, 19, 1-15.
Derogatis, L. R., Rickels, K., & Rock, A. F. (1976). The SCL-90 and the MMPI: A step in the validation of a new self-report scale. British Journal of Psychiatry, 128, 260-289.
DiNardo, P. A., & Barlow, D. H. (1988). Anxiety disorders interview schedule-Revised (ADIS-R). New York: Graywind.
DiNardo, P. A., Moras, K., Barlow, D. H., Rapee, R. M., & Brown, T. A. (1993). Reliability of DSM-III-R anxiety disorder categories: Using the anxiety disorders interview schedule-Revised (ADIS-R). Archives of General Psychiatry, 50, 251-256.
Dinning, W. D., & Evans, R. G. (1977). Discriminant and convergent validity of the SCL-90 in psychiatric inpatients. Journal of Personality Assessment, 41, 304-310.
Dominguez, R. A., Jacobson, A. F., de la Gandara, J., Goldstein, B. J., & Steinbrook, R. M. (1989). Drug response assessed by the modified Maudsley obsessive-compulsive inventory. Psychopharmacology Bulletin, 25, 215-218.
Emmelkamp, P. M. G. (1982). Phobic and obsessive-compulsive disorders: Theory, research, and practice. New York: Plenum.
Emmelkamp, P. M. G., & Beens, H. (1991). Cognitive therapy with obsessive-compulsive disorder: A comparative evaluation. Behaviour Research and Therapy, 29, 293-309.
Emmelkamp, P. M. G., & Kraanen, J. (1977). Therapist-controlled exposure in vivo versus self-controlled exposure in vivo: A comparison with obsessive-compulsive patients. Behaviour Research and Therapy, 15, 491-495.
Emmelkamp, P. M. G., van der Helm, M., van Zanten, B. L., & Plochg, I. (1980). Treatment of obsessive-compulsive patients: The contribution of self-instruction training to the effectiveness of exposure. Behaviour Research and Therapy, 18, 61-66.
Emmelkamp, P. M. G., van Linden van den Heuvell, C., Ruphan, M., & Sanderman, R. (1989). Home-based treatment of obsessive-compulsive patients: Intersession interval and therapist involvement. Behaviour Research and Therapy, 27, 89-93.
Emmelkamp, P. M. G., Visser, S., & Hoekstra, R. J. (1988). Cognitive therapy vs exposure in vivo in the treatment of obsessive-compulsives. Cognitive Therapy and Research, 12, 103-114.
Fals-Stewart, W., Marks, A. P., & Schafer, J. (1993). A comparison of behavioral group therapy and individual behavior therapy in treating obsessive-compulsive disorder. Journal of Nervous and Mental Disease, 181, 189-193.
Flament, M. F., Rapoport, J. L., Berg, C. J., Sceery, W., Kilts, C., Mellstrom, B., & Linnoila, M. (1985). Clomipramine treatment of childhood obsessive compulsive disorder. Archives of General Psychiatry, 42, 977-983.
Foa, E. B., Grayson, J. B., Steketee, G. S., Doppelt, H. G., Turner, R. M., & Latimer, P. R. (1983). Success and failure in the behavioral treatment of obsessive-compulsives. Journal of Consulting and Clinical Psychology, 51, 287-297.
Foa, E. B., Kozak, M. J., Steketee, G. S., & McCarthy, P. R. (1992). Treatment of depressive and obsessive-compulsive symptoms in OCD by imipramine and behavior therapy. British Journal of Clinical Psychology, 31, 279-292.
Foa, E. B., Steketee, G., Grayson, J. B., Turner, R. M., & Latimer, P. R. (1984). Deliberate exposure and blocking of obsessive-compulsive rituals: Immediate and long-term effects. Behavior Therapy, 15, 456-472.
Foa, E. B., Steketee, G., Kozak, M. J., & Dugger, D. (1987). Effects of imipramine on depression and obsessive-compulsive symptoms. Psychiatry Research, 21, 123-136.
Foa, E. B., Steketee, G., & Milby, J. B. (1980). Differential effects of exposure and response prevention in obsessive-compulsive washers. Journal of Consulting and Clinical Psychology, 48, 71-79.
Freeston, M. H., Ladouceur, R., Rheaume, J., Letarte, H., Gagnon, F., & Thibodeau, N. (1994). Self-report of obsessions and worry. Behaviour Research and Therapy, 32, 29-36.
Freund, B., Steketee, G. S., & Foa, E. B. (1987). Compulsive activity checklist (CAC): Psychometric analysis with obsessive-compulsive disorder. Behavioral Assessment, 9, 67-79.
Frost, R., & Sher, K. (1989). Checking behavior in a threatening situation. Behaviour Research and Therapy, 27, 385-389.
Gehris, T. L., Kathol, R. G., Black, D. W., & Noyes, R. (1990). Urinary free cortisol levels in obsessive-compulsive disorder. Psychiatry Research, 32, 151-158.
Glass, C. R., & Arnkoff, D. B. (1994). Validity issues in self-statement measures of social phobia and social anxiety. Behaviour Research and Therapy, 32, 255-267.
Goodman, L. A., & Kruskal, W. H. (1963). Measures of association for cross-classifications: III. Approximate sampling theory. Journal of the American Statistical Association, 58, 310-364.
Goodman, W. K., & Price, L. H. (1990). Rating scales for obsessive-compulsive disorder. In M. A. Jenike, L. Baer, & W. E. Minichiello (Eds.), Obsessive-compulsive disorders: Theory and management (pp. 154-166). Chicago: Year Book Medical Publishers.
Goodman, W. K., Price, L. H., Rasmussen, S. A., Mazure, C., Fleischmann, R. L., Hill, C. L., Heninger, G. R., & Charney, D. S. (1989). The Yale-Brown obsessive compulsive scale. I. Development, use, and reliability. Archives of General Psychiatry, 46, 1006-1011.
Goodman, W. K., Price, L. H., Rasmussen, S. A., Mazure, C., Delgado, P., Heninger, G. R., & Charney, D. S. (1989). The Yale-Brown obsessive compulsive scale. II. Validity. Archives of General Psychiatry, 46, 1012-1016.
Goodman, W. K., Rasmussen, S. A., Price, L. H., Mazure, C., Heninger, G. R., & Charney, D. S. (1989). Manual for the Yale-Brown obsessive compulsive scale (rev.). New Haven, CT: Connecticut Mental Health Center.
Hackman, A., & McLean, C. A. (1975). A comparison of flooding and thought stopping in the treatment of obsessional neurosis. Behaviour Research and Therapy, 13, 263-269.
Hageman, W. J., & Arrindell, W. A. (1993). A further refinement of the reliable change (RC) index by improving the pre-post difference score: Introducing RCID. Behaviour Research and Therapy, 31, 693-700.
Hedges, L. (1982). Estimation of effect size from a series of independent experiments. Psychological Bulletin, 92, 490-499.
Hewlett, W. A., Vinogradov, S., & Agras, W. S. (1992). Clomipramine, clonazepam, and clonidine treatment of obsessive-compulsive disorder. Journal of Clinical Psychopharmacology, 12, 420-430.
Hodgson, R. J., & Rachman, S. (1977). Obsessional-compulsive complaints. Behaviour Research and Therapy, 15, 389-395.
Insel, T. R., Murphy, D. L., Cohen, R. M., Alterman, I., Kilts, C., & Linnoila, M. (1983). Obsessive-compulsive disorder: A double blind trial of clomipramine and clorgyline. Archives of General Psychiatry, 40, 605-612.
Jacobson, N. S., & Truax, P. (1991). Clinical significance: A statistical approach to defining meaningful change in psychotherapy research. Journal of Consulting and Clinical Psychology, 59, 12-19.
Jenike, M. A. (1989). Obsessive-compulsive and related disorders: A hidden epidemic. New England Journal of Medicine, 321, 539-541.
Jenike, M. A., Hyman, S., Baer, L., Holland, A., Minichiello, W. E., Buttolph, L., Summergrad, P., Seymour, R., & Ricciardi, J. (1990). A controlled trial of fluvoxamine in obsessive-compulsive disorder: Implications for a serotonergic theory. American Journal of Psychiatry, 147, 1209-1215.
Kazarian, S. S., Evans, D. R., & Lefave, K. (1977). Modification and factorial analysis of the Leyton Obsessional Inventory. Journal of Clinical Psychology, 33, 422-425.
Kendall, M. G. (1963). Rank correlation methods (3rd ed.). London: Griffin.
Kendell, R. E., & DiScipio, W. J. (1970). Obsessional symptoms and obsessional personality traits in patients with depressive illness. Psychological Medicine, 1, 65-72.
Kern, J. M. (1983). Relationships between obtrusive laboratory and unobtrusive naturalistic behavioral fear assessments: Treated and untreated subjects. Behavioral Assessment, 6, 45-60.
Kim, S. W., Dysken, M. W., & Katz, R. (1989). Rating scales for obsessive compulsive disorder. Psychiatric Annals, 19, 74-79.
Kim, S. W., Dysken, M. W., & Kuskowski, M. (1990). The Yale-Brown obsessive-compulsive scale: A reliability and validity study. Psychiatry Research, 34, 99-106.
Kim, S. W., Dysken, M. W., & Kuskowski, M. (1992). The Symptom Checklist-90 obsessive-compulsive subscale: A reliability and validity study. Psychiatry Research, 41, 37-44.
Kim, S. W., Dysken, M. W., Kuskowski, M., & Hoover, K. M. (1993). The Yale-Brown obsessive compulsive scale and the NIMH global obsessive compulsive scale (GOCS): A reliability and validity study. International Journal of Methods in Psychiatric Research, 3, 37-44.
Kozak, M. J., Foa, E. B., & Steketee, G. (1988). Process and outcome of exposure treatment with obsessive-compulsives: Psychophysiological indicators of emotional processing. Behavior Therapy, 19, 157-169.
Kraaijkamp, H. J. M., Emmelkamp, P. M. G., & van den Hout, M. A. (1986). The Maudsley obsessional-compulsive inventory: Reliability and validity. Unpublished manuscript, Department of Clinical Psychology, University of Groningen, The Netherlands.
Lacks, P., & Morin, C. M. (1992). Recent advances in the assessment and treatment of insomnia. Journal of Consulting and Clinical Psychology, 60, 586-594.
Lambert, M. J., Hatch, D. R., Kingston, M. D., & Edwards, B. C. (1986). Zung, Beck, and Hamilton rating scales as measures of treatment outcome: A meta-analytic comparison. Journal of Consulting and Clinical Psychology, 54, 54-59.
Lang, P. J., & Lazovik, A. D. (1963). Experimental desensitization of a phobia. Journal of Abnormal and Social Psychology, 66, 519-525.
Leckman, J. F., Walker, B. E., Goodman, W. K., Pauls, D. L., & Cohen, D. J. (1994). "Just right" perceptions associated with compulsive behavior in Tourette's syndrome. American Journal of Psychiatry, 151, 675-680.
Leonard, H. L., Swedo, S. E., Rapoport, J. L., Koby, E. V., Lenane, M. C., Cheslow, D. L., & Hamburger, S. D. (1989). Treatment of obsessive-compulsive disorder with clomipramine and desipramine in children and adolescents. Archives of General Psychiatry, 46, 1088-1092.
Marks, I. M., Hallam, R. S., Connolly, J., & Philpott, R. (1977). Nursing in behavioural psychotherapy. London, UK: Royal College of Nursing.
Marks, I. M., Stern, R. S., Mawson, D., Cobb, J., & McDonald, R. (1980). Clomipramine and exposure for obsessive-compulsive rituals: I. British Journal of Psychiatry, 136, 1-25.
Mavissakalian, M. R., & Barlow, D. H. (1981). Assessment of obsessive-compulsive disorder. In D. H. Barlow (Ed.), Behavioral assessment of adult disorders (pp. 209-238). New York: Guilford.
Mavissakalian, M., Jones, B., Olson, S., & Perel, J. M. (1990). Clomipramine in obsessive-compulsive disorder: Clinical response and plasma levels. Journal of Clinical Psychopharmacology, 10, 261-268.
Millar, D. G. (1980). A repertory grid study of obsessionality: Distinctive cognitive structure or distinctive cognitive content? British Journal of Medical Psychology, 53, 59-66.
Millar, D. G. (1983). Hostile emotion and obsessional neurosis. Psychological Medicine, 13, 813-819.
Mills, H. L., Agras, W. S., Barlow, D. H., & Mills, J. R. (1973). Compulsive rituals treated by response prevention. Archives of General Psychiatry, 28, 524-530.
Murphy, D. L., Pickar, D., & Alterman, I. S. (1982). Methods for the quantitative assessment of depressive and manic behavior. In E. I. Burdock, A. Sudilovsky, & S. Gershon (Eds.), The behavior of psychiatric patients (pp. 355-392). New York: Dekker.
Nietzel, M. T., Bernstein, D. A., & Russell, R. L. (1988). Assessment of anxiety and fear. In A. S. Bellack & M. Hersen (Eds.), Behavioral assessment (3rd ed., pp. 280-312). New York: Pergamon.
Nunnally, J. C. (1978). Psychometric theory (2nd ed.). New York: McGraw-Hill.
Pato, M. T., Pigott, T. A., Hill, J. L., Grover, G. N., Bernstein, S., & Murphy, D. L. (1991). Controlled comparison of buspirone and clomipramine in obsessive-compulsive disorder. American Journal of Psychiatry, 148, 127-129.
Philips, H. C. (1988). The psychological management of chronic pain. New York: Springer.
Philpott, R. (1975). Recent advances in the behavioral assessment of obsessional illness: Difficulties common to these and other measures. Scottish Medical Journal, 20(Suppl. 1), 33-46.
Pigott, T. A., L'Heureux, F., Hill, J. L., Bihari, K., Bernstein, S. E., & Murphy, D. L. (1992). A double-blind study of adjuvant buspirone hydrochloride in clomipramine-treated patients with obsessive-compulsive disorder. Journal of Clinical Psychopharmacology, 12, 11-18.
Pigott, T. A., Pato, M. T., Bernstein, S. E., Grover, G. N., Hill, J. L., Tolliver, T. J., & Murphy, D. L. (1990). Controlled comparisons of clomipramine and fluoxetine in the treatment of obsessive-compulsive disorder. Archives of General Psychiatry, 47, 926-932.
Price, L. H., Goodman, W. K., Charney, D. S., Rasmussen, S. A., & Heninger, G. R. (1987). Treatment of severe obsessive-compulsive disorder with fluvoxamine. American Journal of Psychiatry, 144, 1059-1061.
Purdon, C., & Clark, D. A. (1993). Obsessive intrusive thoughts in nonclinical subjects. Part I. Content and relation with depressive, anxious, and obsessional symptoms. Behaviour Research and Therapy, 31, 713-720.
Purdon, C., & Clark, D. A. (1994). Obsessive intrusive thoughts in nonclinical subjects. Part II. Cognitive appraisal, emotional response and thought control strategies. Behaviour Research and Therapy, 32, 403-410.
Rachman, S. (1993). Obsessions, responsibility and guilt. Behaviour Research and Therapy, 31, 149-154.
Rachman, S., Cobb, J., Grey, B., McDonald, D., Mawson, D., Sartory, G., & Stern, R. (1979). The behavioral treatment of obsessive-compulsive disorders, with and without clomipramine. Behaviour Research and Therapy, 17, 467-478.
Rachman, S., & de Silva, P. (1978). Abnormal and normal obsessions. Behaviour Research and Therapy, 16, 233-248.
Rachman, S., & Hodgson, R. J. (1980). Obsessions and compulsions. Englewood Cliffs, NJ: Prentice Hall.
Rachman, S., Hodgson, R., & Marks, I. M. (1971). The treatment of chronic obsessive-compulsive neurosis. Behaviour Research and Therapy, 9, 237-247.
Rachman, S., Marks, I. M., & Hodgson, R. (1973). The treatment of obsessive-compulsive neurotics by modelling and flooding in vivo. Behaviour Research and Therapy, 11, 463-471.
Rack, P. M. (1973). Clomipramine in the treatment of obsessive states with special reference to the Leyton obsessional inventory. Journal of International Medical Research, 1, 397-402.
Richter, M. A., Cox, B. J., & Direnfeld, D. M. (1994). A comparison of three assessment instruments for obsessive-compulsive symptoms. Journal of Behavior Therapy and Experimental Psychiatry, 25, 143-147.
Rosenfeld, R., Dar, R., Anderson, D., Kobak, K. A., & Greist, J. H. (1992). A computer-administered version of the Yale-Brown obsessive-compulsive scale. Psychological Assessment, 4, 329-332.
Salkovskis, P. M. (1989). Obsessional disorders. In K. Hawton, P. M. Salkovskis, J. Kirk, & D. M. Clark (Eds.), Cognitive behaviour therapy for psychiatric problems (pp. 129-168). Oxford: Oxford University Press.
Salkovskis, P. M., & Campbell, P. (1994). Thought suppression induces intrusion in naturally occurring negative intrusive thoughts. Behaviour Research and Therapy, 32, 1-8.
Salkovskis, P. M., & Harrison, J. (1984). Abnormal and normal obsessions: A replication. Behaviour Research and Therapy, 22, 549-552.
Salzman, L. (1968). Obsessional personality. New York: Science House.
Sanavio, E. (1988). Obsessions and compulsions: The Padua inventory. Behaviour Research and Therapy, 26, 169-177.
Sanavio, E., & Vidotto, G. (1985). The components of the Maudsley obsessional-compulsive questionnaire. Behaviour Research and Therapy, 23, 659-662.
Sher, K., Frost, R., & Otto, R. (1983). Cognitive deficits in compulsive checkers: An exploratory study. Behaviour Research and Therapy, 21, 357-363.
Sher, K., Mann, B., & Frost, R. (1984). Cognitive dysfunction in compulsive checkers: Further explorations. Behaviour Research and Therapy, 22, 493-502.
Shutty, M. S., DeGood, D. E., & Schwartz, D. P. (1986). Psychological dimensions of distress in chronic pain patients: A factor analytic study of Symptom Checklist-90 responses. Journal of Consulting and Clinical Psychology, 54, 836-842.
Snowdon, J. (1980). A comparison of written and postbox forms of the Leyton Obsessional Inventory. Psychological Medicine, 10, 165-170.
Solyom, L., & Sookman, D. (1977). A comparison of clomipramine hydrochloride (Anafranil) and behavior therapy in the treatment of obsessive neurosis. Journal of International Medical Research, 5(Suppl. 5), 49-61.
Spitzer, R. L., Williams, J., Gibbon, M., & First, M. (1990). Structured clinical interview for DSM-III-R - Patient edition (with psychotic screen) version 1.0. Washington, DC: American Psychiatric Press.
Stanley, M. A., Prather, R. C., Beck, J. G., Brown, T. C., Wagner, A. L., & Davis, M. L. (1993). Psychometric analyses of the Leyton Obsessional Inventory in patients with obsessive-compulsive and other anxiety disorders. Psychological Assessment, 5, 187-192.
Stein, D. J., Hollander, E., & Skodol, A. E. (1993). Anxiety disorders and personality disorders: A review. Journal of Personality Disorders, 7, 87-104.
Steketee, G. S. (1993). Treatment of obsessive compulsive disorder. New York: Guilford.
Steketee, G. S., & Doppelt, H. (1986). Measurement of obsessive-compulsive symptomatology: Utility of the Hopkins Symptom Checklist. Psychiatry Research, 19, 135-145.
Steketee, G. S., & Freund, B. (1993). Compulsive activity checklist (CAC): Further psychometric analyses and revision. Behavioural Psychotherapy, 21, 13-25.
Steketee, G., Freund, B., & Foa, E. B. (1988). Likert scaling. In M. Hersen & A. S. Bellack (Eds.), Dictionary of behavioral assessment techniques (pp. 289-291). New York: Pergamon.
Sternberger, L. G., & Burns, G. L. (1990a). Obsessions and compulsions: Psychometric properties of the Padua inventory with an American college population. Behaviour Research and Therapy, 28, 341-345.
Sternberger, L. G., & Burns, G. L. (1990b). Compulsive activity checklist and the Maudsley obsessional-compulsive inventory: Psychometric properties of two measures of obsessive-compulsive disorder. Behavior Therapy, 21, 117-127.
Tamimi, R. R., Mavissakalian, M. R., Jones, B., & Olson, S. (1991). Clomipramine versus fluvoxamine in obsessive-compulsive disorder. Annals of Clinical Psychiatry, 3, 275-279.
Taylor, S., & Livesley, W. J. (1995). The influence of personality on the clinical course of neurosis. Current Opinion in Psychiatry, 8, 93-97.
Thoren, P., Asberg, M., Cronholm, B., Jornestedt, L., & Traskman, L. (1980). Clomipramine treatment of obsessive-compulsive disorder: I. A controlled clinical trial. Archives of General Psychiatry, 37, 1281-1285.
Tollefson, G. D., Rampey, A. H., Potvin, J. H., Jenike, M. A., Rush, A. J., Dominguez, R. A., Koran, L. M., Shear, M. K., Goodman, W., & Genduso, L. A. (1994). A multicenter investigation of fixed-dose fluoxetine in the treatment of obsessive-compulsive disorder. Archives of General Psychiatry, 51, 559-567.
Turner, S. M., Beidel, D. C., & Stanley, M. A. (1992). Are obsessional thoughts and worry different cognitive phenomena? Clinical Psychology Review, 12, 257-270.
Turner, S. M., Hersen, M., Bellack, A. S., Andrasik, F., & Capparell, H. V. (1980). Behavioral and pharmacological treatment of obsessive-compulsive disorders. Journal of Nervous and Mental Disease, 168, 651-657.
Turner, S. M., Hersen, M., Bellack, A. S., & Wells, K. C. (1979). Behavioral treatment of obsessive compulsive neurosis. Behaviour Research and Therapy, 17, 95-106.
Turner, S. M., Jacob, R. G., Beidel, D. C., & Himmelhoch, J. (1985). Fluoxetine treatment of obsessive-compulsive disorder. Journal of Clinical Psychopharmacology, 5, 207-212.
Vallejo, J., Olivares, J., Marcos, T., Bulbena, A., & Menchon, J. M. (1992). Clomipramine versus phenelzine in obsessive-compulsive disorder: A controlled clinical trial. British Journal of Psychiatry, 161, 665-670.
van Balkom, A., van Oppen, P., Vermeulen, A., van Dyck, R., Nauta, M., & Vorst, H. (1994). A meta-analysis on the treatment of obsessive compulsive disorder: A comparison of antidepressants, behavior, and cognitive therapy. Clinical Psychology Review, 14, 359-381.
van den Hout, M., Emmelkamp, P. M. G., Kraaykamp, H., & Griez, E. (1988). Behavioral treatment of obsessive-compulsives: Inpatient vs. outpatient. Behaviour Research and Therapy, 26, 331-332.
van Oppen, P. (1992). Obsessions and compulsions: Dimensional structure, reliability, convergent and divergent validity of the Padua Inventory. Behaviour Research and Therapy, 30, 631-637.
van Oppen, P., Emmelkamp, P. M. G., van Balkom, A., & van Dyck, R. (in press). The sensitivity to change of measures for obsessive-compulsive disorder. Journal of Anxiety Disorders.
van Oppen, P., Hoekstra, R. J., & Emmelkamp, P. M. G. (1995). The structure of obsessive-compulsive symptoms. Behaviour Research and Therapy, 33, 15-23.
Warren, R., Zgourides, G., & Monto, M. (1993). Self-report versions of the Yale-Brown obsessive-compulsive scale: An assessment of a sample of normals. Psychological Reports, 73, 574.
Weissman, M. M., Bland, R. C., Canino, G. J., Greenwald, S., Hwu, H., Lee, C. K., Newman, S. C., Oakley-Browne, M. A., Rubio-Stipec, M., Wickramaratne, P. J., Wittchen, H.-U., & Yeh, E.-K. (1994). The cross national epidemiology of obsessive compulsive disorder. Journal of Clinical Psychiatry, 55(Suppl. 3), 5-10.
Welkowitz, L. A., Bond, R. N., & Anderson, L. T. (1989). Social skills and initial response to behavior therapy for obsessive-compulsive disorder. Phobia Practice and Research Journal, 2, 67-85.
Wells, A., & Davies, M. I. (1994). The thought control questionnaire: A measure of individual differences in the control of unwanted thoughts. Behaviour Research and Therapy, 32, 871-878.
Wells, A., & Morrison, A. P. (1994). Qualitative dimensions of normal worry and normal obsessions: A comparative study. Behaviour Research and Therapy, 32, 867-870.
Wing, J. K., Cooper, J. E., & Sartorius, N. (1974). The measurement and classification of psychiatric symptoms. London: Cambridge University Press.
Wolf, F. M. (1986). Meta-analysis: Quantitative methods for research synthesis. Newbury Park, CA: Sage.
Woody, S. R., Steketee, G., & Chambless, D. L. (in press-a). Reliability and validity of the Yale-Brown Obsessive Compulsive Scale. Behaviour Research and Therapy.
Woody, S. R., Steketee, G., & Chambless, D. L. (in press-b). The usefulness of the obsessive compulsive scale of the Symptom Checklist-90-Revised. Behaviour Research and Therapy.
Zohar, J., & Insel, T. R. (1987). Obsessive-compulsive disorder: Psychobiological approaches to diagnosis, treatment, and pathophysiology. Biological Psychiatry, 22, 667-687.
