Entitled
By
Tara M. Hill
Counselor Education
May 2009
UMI Number: 3364311
Copyright 2009 by
Hill, Tara M.
Copyright © 2009, Tara M. Hill

This document is copyrighted material. Under copyright law, no part of this document may be reproduced without the express written permission of the author.

University of Toledo
May 2009
Abstract

The Substance Abuse Subtle Screening Inventory-3 (SASSI-3; Miller & Lazowski, 1999) is a screening instrument designed to identify individuals who may be substance dependent. Many researchers have reported reliability and validity results for this instrument, with mixed findings that at times have contradicted those published by the authors of the instrument. This study is the first to apply the Rasch measurement model to the SASSI-3's psychometric properties, and it includes a discussion of the methods used to evaluate the instrument. The results demonstrated that the whole SASSI-3 meets fundamental measurement properties and can discriminate groups of people from high to low on the substance dependency variable. However, the face valid scales continue to demonstrate higher functioning when used independently of the subtle items. Based on these results, future research recommendations include combining the Face Valid Alcohol and Face Valid Other Drug scales to determine the functioning of these two scales together.
Acknowledgment
Thank you Sarah Richards; your love and support were essential and I could not have
done any of this without you. Sue Nagy, Beth and Jim Hill, and Shaunda Jennings;
thanks for always believing in me. John Laux; your guidance, feedback, and humor were essential.
The rest of my committee, Paula Dupuy, Holly Harper, and Greg Stone; your support,
edits, and feedback were essential and appreciated. Thank you to Megan Mahon and
Table of Contents
Abstract iii
Acknowledgment v
Table of Contents vi
List of Tables x
Organization of Chapters 10
Substance Dependence 11
SASSI Psychometrics 22
Reliability 22
Validity 24
Response Validation 45
Unidimensionality 48
Independence 48
Summary 49
Overview 50
Participants 52
Instrument - The Substance Abuse Subtle Screening Inventory-3 (SASSI-3) 54
Variable 58
Procedures 59
Limitations 62
Random Answering Pattern (RAP) 138
Implications 194
Limitations 197
Conclusion 197
References 199
List of Tables
Table 16 - Dichotomous SASSI-3 Group 1 Paired Aligned Items Fit Statistics 146
Table 21 - Summary of Collapsing Strategy for Whole SASSI-3 Face Valid Response
Options 168
Table 22 - Summary of Collapsing Strategy for Whole SASSI-3 Group 2 Face Valid
Table 23 - Summary of Person and Item Separation Findings and RPCA's for Direct
List of Figures
Figure 1 - Response Option 0123 Output for Face Valid Alcohol Group 1 67
Figure 19 - Item Map FAM Group 1 128
Figure 24 - Corrected Response Option Curve 0112 Dichotomous SASSI-3 Group 1.. 144
Figure 27 - Corrected Response Option Curve 0112 Dichotomous SASSI-3 Group 2.. 152
Figure 29 - Response Options Curves 0123 Dichotomous Whole SASSI-3 Group 1.... 166
Figure 30 - Corrected Response Options Curve 0112 Whole SASSI-3 Group 1 169
Figure 33 - Corrected Response Options Curve 0112 Whole SASSI-3 Group 2 174
Chapter One
Introduction
Substance dependency and abuse are expensive problems in the United States of
America and have negative impacts on its citizens (Substance Abuse and Mental Health
Services Administration [SAMHSA], 2008). In addition to the loss of life, there is a loss
in work productivity, reduction in days attended at school, money spent for medical care,
and convictions and prison sentences due to alcohol and drug problems (SAMHSA,
2008). Based on this information, it is important for people who struggle with alcohol and
drug abuse to get proper diagnosis and treatment. Part of the diagnostic process can
involve mental health professionals' use of substance use screening instruments. Due to
the clinical implications of the assessment process, it is necessary that substance abuse
screening instruments demonstrate sound psychometric properties. There
are four substance dependence screens that are most frequently selected by professional
addiction counselors as aids in their diagnostic processes (Juhnke, Vacc, Curtis, Coll, & Paredes,
2003). These are the Substance Abuse Subtle Screening Inventory (SASSI-3; Miller &
Lazowski, 1999), the Michigan Alcoholism Screening Test (MAST; Selzer, 1971), the
Minnesota Multiphasic Personality Inventory-2 (MMPI-2; Butcher, Dahlstrom, Graham, Tellegen, &
Kaemmer, 1989), and the Addiction Severity Index (ASI; McLellan, Luborski, Cacciola,
Griffith, McGranhan, & O'Brien, 1992). Of these four, the SASSI-3 (Miller & Lazowski,
1999) was identified by professional addiction counselors as being the most important
(Juhnke et al., 2003) for the following reasons: a) it measures alcohol dependence as well
as other drug dependence; b) it contains checks on profile validity
(e.g., defensiveness and random answering); and c) it is scored and interpreted according
to objective decision rules.
The SASSI-3 (Miller & Lazowski, 1999) is a paper-and-pencil, self-administered,
two-sided substance dependence screen that includes 67 true-false items on the front
and two columns of items on the back, each of which presents the respondent with
four rating scale choices of never, once or twice, several times, and repeatedly. The two
columns of items on the back of the screening are labeled Face Valid Alcohol (12 items)
and Face Valid Other Drug (14 items), respectively. These two groups of items directly
ask the respondent to identify the extent of his or her use and the impact the use has had on
his or her life. The items on the front, however, are meant to be more subtle in nature and
therefore elicit less defensiveness, a response commonly identified among clients who
abuse substances.
The SASSI-3's items form ten scales. Seven of these scales, either independently
or in combination, are used for clinical decision making regarding the probability of a
client's substance dependence. This final disposition is made through nine decision rules.
If any of these decision rules are affirmative then the respondent is likely to be substance
dependent. The seven scales used in the decision rules include the Face Valid Alcohol
scale (FVA), the Face Valid Other Drugs scale (FVOD), the Symptoms scale (SYM), the
Obvious Attributes scale (OAT), the Subtle Attributes scale (SAT), the Supplemental
Addiction Measure scale (SAM), and the Defensiveness scale (DEF). A check of profile
validity is provided by way of the Random Answering Pattern scale (RAP). If the RAP
score is greater than one then the decision rules may be invalid due to the likelihood that
the respondent did not answer the items in a meaningful way. The final two scales are the
Correctional (COR) and the Family vs. Controls scales (FAM). These final scales lend
additional clinical information which may be included in treatment goals for the
respondent. All of the SASSI-3's scales will be discussed in greater detail in subsequent chapters.
Reliability and validity test results have been published on the SASSI-2 and -3.
The results of these investigations vary in their degree of agreement with what is
found in the SASSI-3 Manual (Miller & Lazowski, 1999). For instance, the reliability
findings published in the SASSI-3 Manual (Miller & Lazowski) identified high internal
consistency scores for the individual scales. However, these results have yet to be fully
replicated by other researchers whose findings were as much as seven to twenty points
lower (Clements, 2001; Laux, Salyers, & Kotova, 2005; Myerholtz & Rosenberg, 1998).
Only moderate agreement was found between the SASSI-3 and other instruments
purporting to measure similar constructs (Laux, Salyers, & Kotova, 2005; Myerholtz &
Rosenberg, 1998). An independent investigation (Gray, 2001), performed using factor analysis, failed to render the same ten factor
solution as reported by Miller and Lazowski in the SASSI-3 Manual (1999). However,
the factor structure of two of the SASSI-3's scales, the Face-Valid Alcohol scale and the
Face-Valid Other Drugs scale, did concur with the SASSI-3 Manual's data regarding
these two scales (Laux, Perera-Diltz, Smirnoff, & Salyers, 2005; Laux, Salyers, &
Kotova, 2005). Finally, the SASSI-3 Manual (Miller & Lazowski) reports high overall
accuracy, sensitivity, and specificity rates when comparing the SASSI-3's classification
decisions against clinical diagnoses.
However, these high accuracy, sensitivity, and specificity rates have not been replicated
by independent researchers (Arneth et al., 2001; Clements, 2002; Svanum & McGrew,
1995).
Taken together, the findings of independent researchers (e.g., Clements, 2002; Feldstein &
Miller, 2007; Gray, 2001; Laux, Salyers, & Kotova, 2005; Svanum & McGrew, 1995)
appear to question the SASSI-3's reliability and validity in the context of that which is
published by the SASSI Institute. However, there has been no discussion in the literature
of whether the SASSI-3 meets the fundamental requirements of measurement: unidimensionality,
linearity, invariance, and independence. These terms will be introduced and explained as
they apply to the SASSI-3 investigation. Unidimensionality means that an instrument is
evaluating just one construct (Bond & Fox, 2007). In this study, the instrument of interest
is the SASSI-3 and the construct that it purports to measure is substance dependence
(Miller & Lazowski, 1999). The authors of the SASSI-3 acknowledge that it was not their
intention to create a unidimensional instrument; rather, their purpose in developing the
SASSI-3 was to advance an instrument that could discriminate between those who have a
high probability of substance dependence and those who do not (Miller & Lazowski).
Linearity means that the measured construct can be conceptualized in terms of a yardstick (Bond & Fox). A hierarchy of items is created
according to level of difficulty, with easy items on one end and difficult items at the other.
For the SASSI-3, easier items sit at the bottom, requiring little of the trait to
answer, and harder items sit at the top, requiring a greater degree of
substance dependence to answer. Just as a yardstick measures height, where the taller one is the
more height he or she has, for the SASSI-3 the more items a person endorses the more
likely he or she is to be substance dependent. Invariance means that the items will be
stable across samples; that is, regardless of the sample being measured, the alignment of the items on the
instrument will not vary. The items will align in equal-interval levels like the markings on a yardstick.
Statement of the Problem
Juhnke, Vacc, Curtis, Coll, and Paredes (2003) reported that one of the screening
instruments most frequently used by addictions counselors is the Substance Abuse Subtle
Screening Inventory-3 (SASSI-3; Miller & Lazowski, 1999). The SASSI-3 has been
used in a variety of settings including but not limited to community mental health
agencies, college counseling centers, prisons, and alcohol and drug treatment facilities
(Miller & Lazowski, 1999). The SASSI-3's psychometric properties have been studied by several
independent researchers (Arneth, Bogner, Corrigan, & Schmidt, 2001; Clements, 2002;
Feldstein & Miller, 2007; Gray, 2001; Laux, Perera-Diltz, Smirnoff, & Salyers, 2005;
Laux, Salyers, & Kotova, 2005; Lazowski, Miller, Boye, & Miller, 1998; Peters et al.,
2000; Svanum & McGrew, 1995), whose results have been found to differ, at times significantly, from
those reported in the SASSI-3 Manual (Miller & Lazowski). These differences may be
related to the traditional methods of testing reliability and validity used by researchers.
related to the traditional methods of testing reliability and validity used by researchers.
However, what is unclear is whether the SASSI-3 meets the fundamental requirements of
measurement. And, if there is doubt about whether the SASSI-3 meets the fundamental
requirements of measurement, then there is also doubt about the implications of the
diagnoses it informs and the subsequent treatment recommendations that are prescribed.
It is therefore important to examine the fundamental measurement
properties of the SASSI-3, as this may lead to improvement in the instrument's accuracy.
Purpose of the Study
The purpose of this study is to evaluate the SASSI-3's compliance
with the fundamental principles of measurement as represented using the Rasch model.
Specifically, this study will examine the
measurement properties of the entire instrument and the individual scales, evaluate the
reliability of the response options by identifying whether the participants are utilizing the
response scales as intended by the authors of the SASSI-3, and assess the linearity,
invariance, and independence of the instrument's items.
Research Hypothesis 1: An analysis of the SASSI-3's items will
produce a unidimensional factor structure that accounts for 60% or more of the items'
total variance.
Research Hypothesis 2: An analysis of item fit will produce infit and outfit
statistics indicative of low item error for the SASSI-3's scales.
Research Question 3: Are measures from the SASSI-3's ten scales reliable enough to
discriminate between those who are substance dependent and those who are not?
Research Hypothesis 5: An analysis of the whole SASSI-3 will
produce a unidimensional factor structure that accounts for 60% or more of the whole
instrument's total variance.
Research Question 6: Does the whole SASSI-3 adequately measure the substance
dependence construct?
Research Hypothesis 6: An analysis of item fit will produce infit and outfit
statistics indicative of low item error for the SASSI-3 instrument as a whole.
Research Question 7: Are measures from the whole SASSI-3 reliable for clinical screening
purposes?
Research Hypothesis 7a: Rasch reliability statistics will demonstrate acceptable levels
of person and item reliability.
Research Hypothesis 7b: The holistic SASSI-3 construct (as evidenced in the
person and item separation statistics) will be able
to clearly discriminate between those who are substance dependent and those who are
not.
Accurate screening saves time and money. In the current state of the economy, with budget cuts, mental health and drug
treatment benefits being reduced, and alcohol and drug facilities closing due to funding issues,
any improvement in a screening instrument's accuracy will save time and money. An improvement to
the SASSI-3's accuracy rates would allow counselors to make better informed diagnostic and treatment
recommendations. With improved accuracy rates,
the right clients will receive treatment, which leads to higher treatment success rates,
and which may in turn build public support for levies and additional funding for drug treatment programs.
In order to clarify the term "substance dependence," the following definition will
be used throughout this study. The Diagnostic and Statistical Manual of Mental Disorders IV Text Revision (DSM-IV-TR; American
Psychiatric Association, 2000) defines substance dependence as a maladaptive
pattern involving substance use, within the past twelve months, which leads "to clinically
significant impairment or distress" (p. 197). Three of the following seven criteria must be
met for an individual to be considered substance dependent: 1) The individual has
developed tolerance to the substance; 2) The individual continues
use of the substance to avoid withdrawal symptoms; 3) The individual uses more
than intended; 4) The individual tries, to no avail, to control or reduce the substance use despite
cravings; 5) The individual spends excessive time in substance seeking, use, or
recovering behaviors; 6) The individual often neglects social, work, or other obligations
in favor of the substance use; and 7) Despite negative consequences, both physical and
psychological, the individual continues to use.
Organization of Chapters
Chapter one introduced the problem and provided a rationale for the study. Chapter
two reviews the relevant literature. Chapter three presents the methodology to be used in
this study. Chapter four will consist of the results from the analysis, and Chapter five will
discuss those results, along with the study's implications and limitations.
Chapter Two
This chapter will review the literature regarding substance
dependence and its impact on society and the Substance Abuse Subtle Screening Inventory-3
(SASSI-3; Miller & Lazowski, 1999). Specifically, the review will begin with a
discussion of substance dependence, diagnosis, and screening, and a review of the SASSI's
development and scales. A discussion of an alternative measurement framework,
namely the Rasch model, will follow. Finally, the chapter will close with a summary of
the reviewed literature.
Substance Dependence
Substance dependence and abuse in the United States has a negative impact on
society. For example, the latest Substance Abuse and Mental Health Services
Administration (SAMHSA, 2008) report details the costs that occur as a
result of substance dependence. According to the 2006 SAMHSA report, the number of
visits to an emergency department due to drug abuse increased roughly four percent,
while the US population only increased roughly three percent. Additionally, between
2004 and 2006, visits related to non-legitimate use of prescription medications increased
38 percent. The National Highway Traffic Safety Administration (2007) reports that
someone is killed roughly every 40 minutes by a drunk driver. In addition to the loss of
life, there is a loss in work productivity, reduction in days attended at school, money
spent for medical care, and convictions and prison sentences due to alcohol and drug
problems (SAMHSA, 2008). Substance dependence has a great economic toll on society.
Often, persons with substance use problems are sent for screening assessments to
determine whether a full substance abuse evaluation is necessary (Adger & Werner, 1994). There are several screening instruments available
for this purpose.
Juhnke, Vacc, Curtis, Coll, and Paredes (2003) surveyed professional addiction
counselors to determine which screening instruments were used most frequently. The
results of this survey suggest that there are four instruments that are most frequently
employed. These were the Substance Abuse Subtle Screening Inventory (SASSI-3; Miller
& Lazowski, 1999), the Michigan Alcoholism Screening Test (MAST; Selzer, 1971),
the Minnesota Multiphasic Personality Inventory-2 (MMPI-2; Butcher, Dahlstrom, Graham, Tellegen, &
Kaemmer, 1989), and the Addiction Severity Index (ASI; McLellan, Luborski, Cacciola,
Griffith, McGranhan, & O'Brien, 1992). These professional addiction counselors also
identified the SASSI-3 as the most important assessment instrument (Juhnke et al., 2003).
In addition to the findings of Juhnke et al. (2003), and because of the mixed findings in the
literature regarding the instrument's reliability and validity, the SASSI-3 will serve as the
focus of this study.

The SASSI-3 is a screening instrument designed to discriminate between people who have a high probability of being substance
dependent and those with a low probability of having a substance dependence disorder,
even when respondents do not openly acknowledge their
"symptoms" (Lazowski, Miller, Boye, & Miller, 1998, p. 115; Miller & Lazowski, 1999).
People with substance abuse and dependence disorders often deny the existence and
extent of the problem. The original SASSI was uniquely created to address the problem
of client denial through the use of both direct, or content-obvious, and indirect, or content-subtle, items (Miller & Lazowski,
1999). Introduced in 1988, the SASSI has gone through two revisions; the current version,
the SASSI-3, was published in 1999. The conversion from the
SASSI-2 to -3 was driven by a desire to reduce the rate of false positives, which was 15.5
percent (Miller & Lazowski, 1999). The conversion process included the creation of a
new seven-item scale, the Symptoms scale, and the elimination of two items whose
wording was deemed to be objectionable. The seven items forming the Symptoms scale
were unused items already included among the SASSI-2's item pool (Gray, 2001;
Lazowski et al., 1998). Gray asserted that the differences between the SASSI-2 and -3 were
minor and as such, the literature base supporting the SASSI-2 could "readily be
generalized" to the SASSI-3 (p. 104). For the purpose of this study, both reliability and
validity findings for the SASSI-2 will be reviewed alongside those for the SASSI-3.
The current SASSI-3 instrument is printed on one page, front and back. The front
consists of 67 true-false items. The items on side one are typically referred to as indirect
or subtle as most, but not all, of the items do not directly inquire about the impact of
drinking or drug related behaviors. These 67 items make up eight of the SASSI-3's total
ten scales. The authors recommend that side one be administered first as the items on this
side are less likely to elicit defensiveness than those on side two, which directly ask about
substance use.
Side two of the SASSI-3 includes the face valid items which inquire directly
about alcohol and drug use, behaviors, and the impact thereof. Because the items on side
two are obvious in their intent to measure substance use, there is a potential that
respondents might fake-good or minimize their substance use, if any (Miller & Lazowski,
1999). The response choices for the items on side two are placed along a four-point
Likert-type scale with the options of "never" (0), "once or twice" (1), "several times" (2),
and "repeatedly" (3). For each scale, the score of each item is summed to produce a total
score. It takes approximately fifteen minutes to complete, score, and interpret the SASSI-
3. Counselors use a transparent overlay to calculate raw scores for each of the SASSI-3's
scales. These raw scores are then transferred to a profile sheet, which can be used to
approximate the individual's T-scores and percentile scores. A discussion of the scoring
rules used for clinical decision making appears later in this chapter.
The SASSI-3 has ten scales, three of which are worded in such a way as to inquire
directly about the respondent's use of drugs and alcohol and the impact of that use.
The directly worded scales are the Face Valid Alcohol (FVA), Face Valid Other Drug
(FVOD), and the Obvious Attributes (OAT) scales. The other seven scales are stated in a
subtle manner. The subtle scales are the Subtle Attributes (SAT), Supplemental Addiction Measure (SAM),
Symptoms (SYM), Defensiveness (DEF), Family vs. Controls
(FAM), Correctional (COR), and the Random Answering Pattern (RAP) scales. All of the
scales are said to discriminate statistically between those who are and who are not
substance dependent (Miller & Lazowski, 1999). The FVA, FVOD, OAT, SAT, SAM,
SYM, and DEF are used in clinical decision making. This means that these scales
contribute to the decision rules for the clinician to further assess for treatment needs. The
RAP provides an indicator of how closely the respondent paid attention to the content of
the items, and the FAM and COR are experimental in nature and not used in the clinical
decision making process. Further discussion of the dichotomous clinical decision making
Prior to engaging in clinical decision making regarding whether a respondent is
likely to be substance dependent, counselors must first check the respondent's score on
the Random Answering Pattern (RAP) scale. The RAP scale is a measure of random or
careless answering. In this regard the RAP scale is a global measure of the validity of the
respondent's approach to the process, and not the content, per se. The RAP scale is
typically reviewed first to verify whether the respondent completed the instrument in an
appropriate manner (Miller & Lazowski, 1999). This scale is a "measure of response
validity" (Laux, Salyers, & Kotova, 2005, p. 43). Either random responding or a
misunderstanding of the directions is sufficient to
cause doubt about the validity of the SASSI-3's results. The SASSI-3 Manual
recommends that if the RAP scale score is 2 or greater, the screener should "interpret
with caution" due to the possibility that the respondent did not answer the questions in a
meaningful way or did not understand the directions (Miller & Lazowski, 1999, p. 11).
The RAP scale consists of six true-false items that produce a range of scores from 0-6. If
the RAP scores suggest that the respondent did not answer in a random manner, the
counselor moves forward with the interpretation of the remaining SASSI-3 scales.
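The RAP check just described (sum six true-false items and flag any raw score of 2 or greater) can be sketched in code. This is an illustrative sketch only: the function name and input format are hypothetical and are not part of any SASSI scoring software.

```python
def rap_is_valid(rap_items):
    """Check the Random Answering Pattern (RAP) validity scale.

    rap_items: six 0/1 endorsements from the RAP's true-false items,
    producing a raw score in the range 0-6. Per the manual's guidance,
    a raw score of 2 or greater suggests random or careless responding,
    so the profile should be interpreted with caution.
    """
    assert len(rap_items) == 6, "the RAP scale has exactly six items"
    rap_score = sum(rap_items)
    return rap_score < 2  # True: proceed with interpreting the remaining scales

# One endorsed RAP item is still interpretable:
print(rap_is_valid([1, 0, 0, 0, 0, 0]))  # True
# Two or more endorsements cast doubt on the profile:
print(rap_is_valid([1, 1, 0, 0, 1, 0]))  # False
```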
The first of the two scales derived from the items on the face valid or second side
is the Face Valid Alcohol scale (FVA). The response choices are arranged along a four-
point Likert-type scale with the options of "never" (0), "once or twice" (1), "several
times" (2), and "repeatedly" (3). The raw score range is 0-24 for the FVA scale. The
FVA scale consists of twelve questions inquiring directly of alcohol use behavior and the
impacts of use. Examples of item content include alcohol consumption with noon meals
and suicide attempts while under the influence of alcohol. As the reader can plainly see,
these items are face-valid in that the intent of the items to measure alcohol use is obvious.
High FVA scores represent intentional recognition and admission of alcohol use. Low
FVA scores may reflect an absence of alcohol use, or they could be the product of efforts
The second face valid scale is the Face Valid Other Drug (FVOD) scale. The
FVOD items consist of fourteen questions inquiring directly of drug use behavior and the
impact of use. The response choices for the FVOD items are a four-point Likert-type
scale with the options of "never" (0), "once or twice" (1), "several times" (2), and
"repeatedly" (3) with a total raw score range of 0-27. Examples of item content include
whether the respondent has had legal trouble as a result of drug use and used drugs to
avoid withdrawal symptoms. Miller and Lazowski (1999) reported that the higher the
scores on the FVA and FVOD the more progressed the substance dependence disorder.
The Symptoms (SYM) scale is purported to measure the signs, symptoms and
correlates of substance dependence in a direct manner (Miller & Lazowski, 1999). There
are eleven items on this scale with dichotomous response options, "true" and "false" and
a raw score range of 0-11. Examples of item content include inquiring of the respondent
whether he or she has concern regarding memory loss and family history of alcohol or
drug use.
The Obvious Attributes Scale (OAT) scale also utilizes direct items. High OAT
scores have been shown to indicate a willingness to "admit symptoms" (Myerholtz &
Rosenberg, 1998, p. 440), recognize "problematic behaviors" (Miller & Lazowski, 1999,
p. 15), and "personal limitations" (Laux, Salyers & Kotova, 2005, p. 43) frequently
associated with substance dependence. While these items are direct in that they ask the
respondent to admit to personal foibles, they do not require the respondent to make the
connection that their foibles are associated with any particular source. Consequently, a
respondent could produce an elevated OAT score without elevating one of the Face-Valid
scales. An interpretation of this arrangement of scores might be that the respondent was
aware that problems were occurring without understanding that these problems were a
consequence of personal substance use. Examples of OAT item content include behaviors
such as impulse control problems and low tolerance for frustration. There are twelve
items on the OAT scale with a raw score range of 0-12. Examples of item content include
whether responsibilities have been avoided or forgotten as a result of substance use and
The Subtle Attributes scale (SAT) consists of eight criterion-keyed items with a
range of raw scores between 0-8. Examples of item content include inquiries of whether
the respondent obeys laws and has excessive energy with a decreased need for sleep.
These eight items were selected solely on the basis of their ability to statistically distinguish
substance dependent from non-dependent individuals; their manifest content is
inconsequential. The advantage to such items is that people who may be motivated to
conceal their substance use or those who are "in denial" about the extent to which they
may have a problem have no way to intentionally manipulate these items. Thus, they tend
to answer these questions differently than those who do not have a substance dependence
disorder (Laux, Salyers, & Kotova, 2005). The SAT scale is purported to measure the
predisposition of the respondent to developing a substance dependence disorder
(Myerholtz & Rosenberg, 1998). Additionally, this scale has been able to discriminate
between substance abusers and non-abusers, regardless of their attempts to fake good or
fake bad.
The Defensiveness scale (DEF) consists of eleven criterion-keyed items and, like
the RAP scale, is used as a validity scale. The DEF scale measures "denial or deliberate
concealment of problems" (Myerholtz & Rosenberg, 1998) and is used in the decision
rules. As a result of the DEF scale being developed to discriminate between respondents
using the standard versus fake-good instructions, a respondent who has high scores on the
DEF scale may be making an effort to present him or herself in a positive way (Laux,
Salyers, & Kotova, 2005). Likewise, respondents achieve low DEF scores by endorsing a
high number of personal faults and foibles. Consequently, the DEF scale is also viewed
as an indirect measure of self esteem, depression, and, at the very lowest range, potential
suicidal ideation. The range of raw scores is 0-11 with scores of eight or higher
representing significant enough denial as to call the SASSI-3's results into question.
Examples of item content include inquiring about the amount of dangerous activities in
which the respondent has engaged and whether he or she is a restless person.
The Supplemental Addiction Measure (SAM) scale consists of fourteen
criterion-keyed items and has a range of scores between 0-14. Examples of item content
include whether the respondent feels worn out and whether he or she has experienced
periods of memory loss. This is the SASSI-3's third and final validity scale. The SAM is
purported to discriminate between substance dependent respondents who are
defensive and those who are non-substance dependent with a more pervasive
defensiveness characteristic (Laux, Salyers, & Kotova, 2005; Miller & Lazowski, 1999).
The SAM scale is used to tease out whether elevated DEF scores reflect substances
specific defensiveness (high SAM score) or defensiveness due to some other reason (low
SAM score).
The Family versus Control Subjects (FAM) scale consists of fourteen criterion-
keyed items with a range of scores between 0-14. Examples of item content include
inquiry regarding whether the respondent would like more self-control and whether he or
she has ever broken the law. There are several potential interpretations of the FAM scale.
The SASSI-3 authors designed the FAM scale to assess the amount of focus a respondent
has on others (Miller & Lazowski, 1999). Myerholtz and Rosenberg (1998) reported that
the FAM scale identifies co-dependency. Still other researchers say it discriminates
between those who experienced substance abuse in their family of origin versus those
who did not (Laux, Salyers, & Kotova, 2005). The FAM is not used in the screening
decision rules but can be used to assess possible additional clinical issues that need to be
addressed in treatment.
The Correctional (COR) scale consists of fifteen criterion-keyed items with a
range of scores between 0-15. Examples of item content include whether the respondent
has wanted to leave his or her residence and whether he or she would like to hit another
person. Respondents with a high score on the COR scale endorse items in a similar
pattern as those who have extensive criminal histories and legal involvement (Miller &
Lazowski, 1999). This scale is purported to assess the level of treatment or supervision
needed by the respondent, if there is evidence of a criminal history (Miller &
Lazowski). This scale is also not part of the screening decision rules. The reader is
cautioned that there is no published data to suggest that the COR scale predicts future
illegal behavior.
The SASSI-3 uses nine decision rules to arrive at a decision
about the respondent's likelihood of having a substance dependence disorder. Each of the
nine rules has between one and five criteria. These criteria are cutoff scores for seven of
the ten scales. If the cutoff score is met or exceeded, the rule is indicated as "yes". If
unmet, the rule is indicated as "no". Rules 1 and 2 are based solely on the FVA and
FVOD scales, respectively. Rules 3, 4, and 5 are based solely on data from the SYM,
OAT, and SAT scales respectively. The remaining rules 6-9, are based on a combination
of the various scales, both direct and indirect. Decision rule 6 requires a score of seven or
more on the OAT and five or more on the SAT to be a "yes". Decision rule 7 includes
two criteria. The first criterion is an FVA score of nine or more or an
FVOD score of fifteen or more. The second criterion is a SAM score of eight or more. If both
criteria are met, then Decision rule 7 is a "yes". Decision rule 8 requires a score of five or
more on the OAT, eight or more on the DEF, and eight or more on the SAM to be a "yes".
Decision rule 9 includes four criteria. The first criterion is an FVA score
of fourteen or more or an FVOD score of eight or more. The remaining criteria are two or more on
the SAT, four or more on the DEF, and four or more on the SAM. If all four criteria are met,
then Decision rule 9 is a "yes". An indication of "yes" on any of the nine decision rules
indicates a high probability of substance dependence. If all decision rules are answered
"no," the respondent has a low probability of being substance dependent. However, if a
respondent has a low probability of being substance dependent but scored eight or
more on the DEF scale, the counselor is cautioned that the results may be a false
negative.
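The combined rules (6 through 9) described above amount to simple cutoff logic, sketched below. This is an illustration only: the function name and input format are hypothetical, the cutoffs are those stated in the text, and rules 1 through 5 (single-scale cutoffs given in the manual) are omitted because their exact values are not reproduced here.

```python
def combined_rules(scores):
    """Evaluate SASSI-3 decision rules 6-9 as described in the text.

    scores: dict of raw scale scores, e.g. {"FVA": 9, "FVOD": 3, ...}.
    Returns True when any combined rule fires ("yes"), indicating a
    high probability of substance dependence; rules 1-5 are omitted.
    """
    # Rule 6: OAT of seven or more AND SAT of five or more.
    rule6 = scores["OAT"] >= 7 and scores["SAT"] >= 5
    # Rule 7: (FVA of nine or more OR FVOD of fifteen or more) AND SAM of eight or more.
    rule7 = (scores["FVA"] >= 9 or scores["FVOD"] >= 15) and scores["SAM"] >= 8
    # Rule 8: OAT of five or more AND DEF of eight or more AND SAM of eight or more.
    rule8 = scores["OAT"] >= 5 and scores["DEF"] >= 8 and scores["SAM"] >= 8
    # Rule 9: (FVA of fourteen or more OR FVOD of eight or more) AND
    # SAT of two or more AND DEF of four or more AND SAM of four or more.
    rule9 = ((scores["FVA"] >= 14 or scores["FVOD"] >= 8)
             and scores["SAT"] >= 2
             and scores["DEF"] >= 4
             and scores["SAM"] >= 4)
    return any([rule6, rule7, rule8, rule9])

profile = {"FVA": 10, "FVOD": 2, "OAT": 4, "SAT": 1, "SAM": 8, "DEF": 3}
print(combined_rules(profile))  # True: rule 7 fires (FVA >= 9 and SAM >= 8)
```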
SASSI Psychometrics
The following section will present and critique the available data regarding the
SASSI's reliability and validity. Initially, the researcher will provide the psychometric
data provided in the SASSI-3 Manual (Miller & Lazowski, 1999). Then, the data
provided by independent researchers will be introduced. This section will conclude with a
critique of the available literature as well as a recommendation for a new approach to the
question of the SASSI-3's psychometrics. To begin, the researcher will provide a brief
definition of each of these psychometric concepts in the following sections.
Reliability. Reliability means that an instrument yields stable results for a given
sample (Bartholomew, 1996; Mark, 1996; Traub, 1994). It is important to understand that
data about an instrument's reliability is sample specific (Gray, 2001). That is, reliability
is an attribute of and is specific to the sample and its data, rather than a characteristic of
the instrument. There are several methods of assessing the reliability of an instrument's
data. These methods are the test-retest, internal consistency, split-half, and inter-rater
reliability tests. Split-half and inter-rater reliability tests are not appropriate tests of
reliability for a screen such as this and thus have not been used to evaluate the SASSI-3.
Correlational statistics are used to evaluate reliability and include Cronbach's alpha,
often referred to as the alpha coefficient (Cronbach, 1951), the Pearson product-moment
correlation, and t-tests.
Test-retest reliability is used to explore the stability of the results from a given
instrument over a brief time period (Sproll, 1995). A scale is said to have test-retest
reliability when the value it assigns to a trait does not fluctuate between the pretest and
the posttest. To assess it, researchers administer the instrument once and then a second
time following a two- or four-week time delay. A correlation coefficient between the first
and second administrations is calculated, and the instrument is said to be reliable if the
test-retest yields stable scores across the time delay.
Internal consistency reliability is the extent to which the items on an instrument are
measuring a similar construct (Sproll, 1995). To obtain the internal consistency estimate,
the instrument is administered once, after which a statistical procedure reports the overall
mean correlation of each item's variance with every other item on the instrument. The
instrument is reported to be reliable if the items are strongly correlated with one another
(Reis & Judd, 2000). The internal consistency of an instrument is commonly referred to
as the alpha coefficient. The statistics used to evaluate internal consistency are
Cronbach's Alpha (Cronbach, 1951) and the Kuder-Richardson 20 and 21 (KR-20 & KR-
21; Kuder & Richardson, 1937). Each of these formulas measures internal consistency;
however, they are used in different circumstances. Cronbach's alpha can be used with
instruments employing any type of response option scale (i.e., from two-choice scales to
scales with more than two response options, such as Likert-type scales). The KR-20 and
KR-21 were designed specifically and exclusively for dichotomously scored items. Internal
consistency only answers the question of whether or not an instrument provides consistent results.
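The alpha coefficient described above can be computed directly; the following is a minimal standard-library sketch, with invented item scores for illustration:

```python
from statistics import pvariance

def cronbach_alpha(items):
    """Cronbach's alpha: items is a list of per-item score lists, each the
    same length (one score per respondent). With dichotomous (0/1) items
    this value coincides with KR-20."""
    k = len(items)
    item_variances = sum(pvariance(item) for item in items)
    totals = [sum(scores) for scores in zip(*items)]  # per-respondent totals
    return k / (k - 1) * (1 - item_variances / pvariance(totals))

# Two perfectly consistent hypothetical items yield an alpha of 1.0.
alpha = cronbach_alpha([[1, 2, 3, 4], [1, 2, 3, 4]])
```

The formula compares the sum of the individual item variances to the variance of respondents' total scores: the more the items covary, the closer alpha is to 1.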
Validity. Validity has been considered in the past as the most fundamental property of an
instrument: the degree to which it measures what it purports to measure (Mark, 1996).
This means that if an instrument is reported as being able to assess substance dependence,
the instrument will indeed measure substance dependence and not self-esteem, anxiety, or
depression. While reliability can be investigated empirically using statistics and formulas,
"validity is more of a theoretical issue" (2000, p. 300). There are three ways to explore an
instrument's validity: content validity, construct validity, and criterion-referenced validity.
Content validity is sometimes referred to as "face validity", but the two concepts
are not synonymous. Face validity is defined by Mosier (1947) as "the extent to which
the items appear to measure a construct that is meaningful to lay persons or typical
examinees" (Cited in Cocker & Algina, 1986, p. 223). Content validity refers to the
extent to which the items in the instrument accurately reflect the domain of interest
(Bartholomew, Henderson, & Marcia, 2000). Due to concerns about denial and
defensiveness among people with substance abuse and dependence disorders, there is
some debate about the appropriateness of using face valid items to screen for these
disorders.
Content validity is the first evaluation used to classify an instrument as valid and
is typically done in the earliest stages of test development (Bartholomew, Henderson, &
Marcia, 2000). Several steps are followed to assess an instrument's content validity. The
first step is to establish the researcher's intent with regard to the instrument and develop a
pool of items. These items are then evaluated by content expert judges to ascertain their
degree of agreement with the objectives of the instrument (Crocker & Algina, 1986). The
correlation of these matches among judges is evaluated for congruence. Highly congruent
matches between judges, items, and objectives mean that the instrument has content
validity.
Construct validity is the degree to which an instrument measures the theoretical construct
for which it is responsible (Mark, 1996). Wallen and Fraenkel (1991) outlined a three step process
useful in identifying whether an instrument is high in construct validity. The steps are 1)
to create a clear definition of the variable, 2) based on a theory underlying the variable,
develop hypotheses which are formed about how people who possess a "lot" vs. a "little"
of the variable will respond to a particular situation, and 3) test the hypotheses both
logically and empirically - that is, by collecting additional information (Wallen &
Fraenkel, p. 95). For example, a researcher might ask: Is multicultural awareness the only
quality being measured by this instrument? In this manner, the instrument's structure is
evaluated for underlying constructs which may be present (Sproll, 1995, p. 77). To
investigate construct validity, the researcher developing the above-mentioned instrument
would want to compare it to the individual's level of cultural awareness as assessed by
another source of data, hoping for a high correlation. The researcher may also compare the individual's level of
racism (an opposing trait), using multiple sources of data, hoping for a low correlation.
These two types of correlations are used to demonstrate two types of construct validation,
convergent validity and divergent validity, respectively. Convergent validity is the degree
to which an instrument's results agree with other measures of the same construct
(Mark, 1996). Using the example above, the researcher can correlate the instrument's
scores with another measure of multicultural awareness. A high level of positive
correlation would mean that the instrument has convergent validity. Divergent validity,
in contrast, is demonstrated through low or negative correlations with opposing constructs.
Using the example above, the researcher can interview an individual regarding his or her beliefs
about racially charged political events and qualitatively analyze the results hoping for a
negative correlation between the results of the interview and the instrument. Or, the
researcher can evaluate two differing groups of people hoping to differentiate between
them using correlations with the instrument. A negative correlation would demonstrate
divergent validity.
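The convergent and divergent correlations described in this example can be illustrated with hypothetical data (all scores below are invented for illustration, not drawn from any study):

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical scores: instrument vs. a related trait (convergent check)
# and vs. an opposing trait (divergent check).
awareness_scale = [10, 14, 18, 22, 30]
cultural_sensitivity = [12, 15, 17, 25, 28]  # expect strong positive r
racism_measure = [30, 24, 20, 15, 9]         # expect strong negative r

convergent_r = pearson_r(awareness_scale, cultural_sensitivity)
divergent_r = pearson_r(awareness_scale, racism_measure)
```

A high positive `convergent_r` and a negative `divergent_r` together constitute the two-sided evidence of construct validity described above.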
Construct validity can be evaluated using several methods. The construct validity
exploration of the SASSI-3 has only been completed using the convergent, divergent,
differentiation analysis, and factor analysis methods; therefore, only those methods are
discussed here. Convergent validity testing usually involves the use of alternative methods
of measurement if the primary construct is evaluated via a survey instrument (Litwin, 1995).
Convergent validity can be evaluated with correlation coefficients; a correlation of .70 or
more is acceptable in social science research (Nunnally, 1978). There is less agreement as
to the acceptable level of the kappa coefficient as there are several different
interpretations which distinguish the levels of kappa (see Altman, 1991; Carletta, 1996;
Landis & Koch, 1977; Viera & Garrett, 2005). Altman (1991) adapted the Landis and
Koch (1977) interpretation table for kappa indicating that a kappa score of less than .20 is
poor agreement, .21-.40 is fair agreement, .41-.60 is moderate agreement, .61-.80 is good
agreement, and .81-1.00 is very good agreement. Factor analysis is another method of
exploring construct validity. It allows researchers to identify the structure of the instrument and
validate whether it is measuring a common factor (Sproll, 1995). Using the factor
analytic method of validity testing, researchers compute a correlation matrix between the
subjects and items and then conduct a reduction technique to identify the number of
underlying constructs accounting for the variation in the variables (Crocker & Algina,
1986).
Criterion referenced validity is the degree of agreement the instrument has with a
'gold standard' for "assessing the same variable" (Litwin, 1995, p. 37). This 'gold
standard', the criteria against which the instrument is compared, is regarded as the best
measure of the construct (Litwin). The instrument being tested may be a more efficient,
cost effective, quicker or shorter method of evaluating the same construct. Criterion
referenced validity tests have a five step design as identified by Crocker and Algina
(1986). Those steps include 1) identifying a construct and a method to evaluate it; 2)
selecting a sample; 3) collecting and maintaining the data for future evaluation; 4) when
available, obtaining data on the comparison construct for each participant; and 5) using a
statistical procedure to determine the strength of the relationship between the instrument
and the criterion. Criterion-referenced validity takes two forms, predictive and concurrent.
An instrument is said to have predictive validity if it can predict, through the use of a
correlation, a future second variable (Sproll, 1995). A common example of this type of
validity test involves pre-college entrance examinations such as the Scholastic Aptitude
Test (SAT). Admissions departments often base their determinations on SAT scores
among other criteria because the SAT is said to predict future performance in college.
Concurrent validity, in contrast, is the degree to which an instrument agrees with a criterion
construct that is present at the time of the evaluation (Crocker & Algina, 1986; Mark,
1996). For the SASSI-3, the relevant criterion is a clinical diagnosis using the Diagnostic
and Statistical Manual of Mental Disorders-IV-Text Revision (DSM-IV-TR) criteria,
rendered by a licensed mental health professional.
A useful screening instrument will correctly identify those who meet the 'gold standard'
criteria and correctly identify those who do not meet the 'gold standard' criteria. The
terms used to describe these two conditions are sensitivity and specificity, respectively
(Altman, 1991).
With regard to this study, sensitivity refers to the SASSI-3's ability to correctly identify
persons with a substance use disorder, and specificity is the ability to correctly identify
those who do not have a substance dependence disorder. These two concepts are closely
related to the concepts of false positives and false negatives. False positive is when a
screen incorrectly identifies someone as having a substance use disorder. False negative
is when a screen incorrectly says that a person does not have a substance use disorder. If
an instrument is high in sensitivity and specificity, it is low in false positives and false
negatives.
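These four quantities have a direct computational relationship, sketched below with invented screening counts for illustration:

```python
def sensitivity(true_positives, false_negatives):
    """Proportion of truly dependent respondents the screen flags."""
    return true_positives / (true_positives + false_negatives)

def specificity(true_negatives, false_positives):
    """Proportion of truly non-dependent respondents the screen clears."""
    return true_negatives / (true_negatives + false_positives)

# Hypothetical outcomes against a diagnostic gold standard:
# 85 true positives, 15 false negatives, 90 true negatives, 10 false positives.
sens = sensitivity(85, 15)  # false-negative rate is 1 - sens
spec = specificity(90, 10)  # false-positive rate is 1 - spec
```

As the text notes, raising sensitivity lowers the false-negative rate and raising specificity lowers the false-positive rate, since each pair sums to 1.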
SASSI-3 reliability from the SASSI-3 Manual. The authors of the SASSI-3 Manual
report that the two-week test-retest stability coefficients are 1.0 for the face valid scales
and between .92 and .97 for the clinical scales for the sample taken from voluntary
programs, and a sexual offender treatment program across the United States (Miller &
Lazowski, 1999). They report the alpha coefficient as .93 for the entire instrument.
However, the reported alphas by scale are low with the exception of the face valid scales.
The FVA and FVOD scales' alphas are .93 and .95 respectively. The SYM, OAT, and
SAT scales' alphas progressively decrease from .79 to .69 to .27. The DEF scale
alpha is .63 and is followed again by a decrease in values for the SAM and FAM scales'
alphas of .37 and .33 respectively. The COR scale has a .71 alpha. These varying alpha
values are explained by the authors by identifying that the instrument was not developed
to be unidimensional and therefore, the alpha findings are "not a primary consideration"
(Miller & Lazowski, 1999, p. 26). Support for their findings has been mixed. Independent
investigations have found the SASSI-3 data to be at varying levels of reliability, with
inconsistent findings when compared to the results found by the originators of the
instrument. Consistent with Miller and Lazowski (1999), several researchers have
identified that stability coefficients are the most meaningful reliability test because the
SASSI-3 was not constructed to be a unidimensional measure (Lazowski et al., 1998;
Miller & Rosenberg, 1998). However, this assertion has been both supported and
contradicted, producing inconsistent findings (Feldstein & Miller, 2007). In their study
investigating the efficacy
of the SASSI-3, Lazowski, Miller, Boye, and Miller (1998) utilized a two-week
test-retest to explore reliability with a similar population as that reported in the SASSI-3
Manual. They found SASSI-3 score stability to be between 1.0 for the face valid scales
and between .92 and .97 for the subtle scales. These findings are consistent with the
findings reported in the SASSI-3 Manual which is 1.0 for both the FVA and FVOD
scales (Miller & Lazowski, 1999). With a college sample, Laux, Salyers and Kotova
(2005) also found high stability scores in a two-week test-retest reliability investigation
of the face valid scales. Myerholtz and Rosenberg (1998) tested the reliability of the
SASSI-2 using the test-retest method with
college students. Using several subsamples, these researchers found that the two-week
stability coefficient using the Pearson product-moment correlation coefficients for the
FVA and FVOD scales were .82 and .89 respectively. This demonstrates a moderately
high level of correlation indicating that the set of scores remained relatively stable.
However, other studies have found higher correlation coefficients (i.e., Laux, Salyers, &
Kotova [2005] found .94 for the FVA). The face valid scale stability findings range from
1.0-.97 (Lazowski et al., 1998; Miller & Lazowski, 1999). In the social sciences, it is
generally acceptable if the stability is above .70 (Nunnally, 1978). The clinical scales
indicate a more widely spread correlation coefficient across the scales. According to
Myerholtz and Rosenberg (1998), the stability coefficients ranged from .78 to .54,
averaging .71 across the six clinical scales. This indicates less stability than reported by
Miller and Lazowski (1999) but moderate stability in the set of scores between testing times.
With respect to overall classification, a significant but rarely reported finding was that
some participants' classifications changed between the two administrations: some initially
classified as chemically dependent were later classified as non-dependent, and three were
found to be chemically dependent after initially being classified as non-chemically dependent.
In the 4-week test-retest condition with college students, Myerholtz and Rosenberg (1998)
found that the stability results were mixed. The Pearson
product moment correlation coefficients for the Face Valid (FVA and FVOD) scales were
.76 and .93. This demonstrates a moderate and high level of correlation between time one
and two, indicating that the set of scores remained stable across the scales. Myerholtz and
Rosenberg found for the clinical scales the correlation coefficient for the 4-week test-
retest group ranged from .78 to .42, averaging .63 across the six clinical scales. This
indicates less stability in the set of scores between testing times for the clinical scales. In
addition, of the 47 participants, nine (19%) were found to have a change in classification
from the first to the second testing time four weeks later indicating "poor" reliability
(Myerholtz & Rosenberg, 1998, p. 441). Four (10.5%) of the 38 participants initially
found to be non-chemically dependent for test 1 were classified as chemically dependent
four weeks later on the retest. Five (56%) of the nine participants initially found to be
chemically dependent were classified as non-chemically dependent four weeks later on the
second administration. These 4-week stability estimate
results cannot be placed in the context of the SASSI-3 authors' findings as the SASSI-3
Manual only reports 2-week correlation coefficients (Miller & Lazowski, 1999).
The Myerholtz and Rosenberg (1998) findings indicate that there is higher
stability in scores for test-retest for the SASSI-2 direct scales and poor stability for the
clinical scales. They conclude that because the SASSI-2 purports to screen for an
"enduring trait of chemical dependency", the inventory should have more robust clinical
scales and fewer changes in status over testing situations (Myerholtz & Rosenberg, p.
445).
Four studies have investigated the internal consistency of the SASSI-3 or its
scales. The face valid scales have been found to have high internal consistency. The
reported subtle scales' internal consistency varies from good to poor. The coefficient
alpha for the FVA was .92 in a study comparing the SASSI-3 to other substance abuse
screening instruments with a college population (Laux, Salyers, & Kotova, 2005). The
coefficient alpha for the FVOD was .95 in a study which supports the psychometric
properties of the scale using a college student population (Laux, Perrera-Diltz, Smirnoff,
& Salyers, 2005). These two coefficient alpha findings for the FVA and FVOD scales are
consistent with those of Clements (2002) and Miller and Lazowski (1999). The decision
rule findings for SASSI-3 produced a .49 coefficient alpha (Clements, 2001). This means
that the items included in the scales used by the decision rules to classify a person as
substance dependent have low internal consistency as a set. Clements also found that the
three direct scales had the highest coefficient alphas and the subtle scales the lowest.
SASSI-3 validity data from the SASSI-3 Manual. While the SASSI-3 Manual
(Miller & Lazowski, 1999) identifies two scales as "face valid," content validity is not
reported for those scales, any other scale, or the instrument as a whole. Stating the
obvious, however, the FVA and FVOD scales appear to be meant to reflect their
content validity. In a study conducted by Lazowski, Miller, Boye, and Miller (1998), the
researchers explored previous research that compared the SASSI-3 to the
MMPI-2 Addiction Potential Scale (Weed et al.), the MAC-R (MacAndrew, 1965), the MAST
(Selzer, 1971), and the Millon Clinical Multiaxial Inventory-II (MCMI-II) Alcohol
Dependence Scale and Drug Dependence Scale (Millon, 1987). They found that people
who scored positive for substance dependence on the SASSI-3 had higher mean scores, and
all of those who scored non-dependent on the SASSI-3 had lower mean scores, on the
comparison measures.
Miller and Lazowski report in the SASSI-3 Manual (1999) that, when using clinical
diagnosis as the criterion, the SASSI-3 achieved 94.2 percent specificity. Again comparing
the SASSI-3 to clinical diagnosis, Lazowski, Miller, Boye, and Miller (1997) found a high
overall accuracy rate.
SASSI validity data from independent researchers. In the current SASSI literature,
researchers have compared the SASSI to other survey instruments measuring the same
construct (Laux, Salyers, & Kotova, 2005; Lazowski, Miller, Boye, & Miller, 1998;
Myerholtz & Rosenberg, 1998). When comparing the SASSI-2 to other instruments
which also purport to screen for alcohol and drug problems, Myerholtz and Rosenberg
(1998) found that the SASSI-2 had less than acceptable (.61) convergent validity with the
CAGE (Ewing, 1984; Mayfield et al., 1974). "CAGE" is an acronym, the letters of which
represent the following alcohol-related traits and behaviors: C- have you ever felt you
should cut down on your drinking, A- have people annoyed you by criticizing your
drinking, G- have you ever felt bad or guilty about your drinking, and, E- have you ever
had a drink first thing in the morning to steady your nerves or to get rid of a hangover?
Laux, Salyers, and Kotova (2005) compared the SASSI-3's classification
agreement with the MAST, CAGE, and MAC-R (see Table 1). Using the Altman
approach to kappa interpretation, Laux, Salyers, and Kotova (2005) identified that the
agreement between the SASSI-3 and the CAGE and MAST is in the "high-moderate"
range.
Table 1

Kappa Coefficients Between the SASSI and Other Screening Instruments

Myerholtz & Rosenberg (1998): .61 (good), .58 (moderate), .34 (fair), .22 (fair)
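The Altman (1991) interpretation bands applied in Table 1 can be expressed as a small helper function; this is a sketch of the published band boundaries, not part of any SASSI scoring procedure:

```python
def altman_kappa_band(kappa):
    """Altman's (1991) adaptation of the Landis and Koch (1977) bands:
    <.21 poor, .21-.40 fair, .41-.60 moderate, .61-.80 good, >.80 very good."""
    if kappa < 0.21:
        return "poor"
    if kappa < 0.41:
        return "fair"
    if kappa < 0.61:
        return "moderate"
    if kappa < 0.81:
        return "good"
    return "very good"
```

Running the Table 1 coefficients through this helper reproduces the labels reported there.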
A factor analytic evaluation of the SASSI-2 was published by Gray (2001) who
found through confirmatory factor analysis and exploratory factor analysis that the ten
factor solution suggested by the ten scales identified in the SASSI-3 Manual was not a
good fit for his data. In fact, a two factor solution, with items mostly representing the
FVA and the FVOD scales, accounted for up to 53 percent of the variance (Gray). The
subtle items did not organize into the scales as identified by the SASSI-3 Manual and
were found to be "multivocal" (Gray, p. 109). This dimensionality was confirmed later in
two studies exploring the FVA and FVOD scales, respectively (Laux, Salyers, & Kotova,
2005; Laux, Perrera-Diltz, Smirnoff, & Salyers, 2005). Arneth et al. (2001)
examined the SASSI-3's predictive validity among patients with traumatic brain injury
(TBI). These authors compared the TBI patients' SASSI-3 results with their blood
alcohol level (BAL) at the time of their injury in an effort to see which would better
predict chemical dependency. They found that the SASSI-3's results and BAL were
equally predictive.
Researchers (Arneth et al., 2001; Clements, 2002; Peters et al., 2000; Svanum &
McGrew, 1995) investigating the SASSI's criterion referenced validity have used
diagnoses by licensed mental health professionals using the DSM-IV-TR criteria as the
gold standard. The results of these studies are mixed and are not consistent with the
results published in the SASSI-3 Manual (Miller & Lazowski, 1999). For example, in a
study of the SASSI-2, Svanum and McGrew (1995) found a sensitivity of 33 percent.
With a different population, Peters et al. (2000) found an overall accuracy rate for the SASSI-2 of 69.4
percent with a sensitivity of 73.3 percent and a specificity of 62.2 percent. A similarly
lower finding came from a study of TBI patients using the SASSI-3 and diagnostic
criteria (Arneth et al., 2001). The accuracy rate was found to be 69.2 percent, sensitivity
rate equaled 70.8 percent, and specificity was 68.5 percent. All of which were reported to
be statistically significantly different than the normative sample at the p<.001 level
(Arneth et al.). Using a sample of college students, Clements (2002) also found lower
hypothesized that if the cutoff scores were lower for the college population, the SASSI-3
may have higher sensitivity. Upon further investigation, Clements found that adjusting the
cutoff scores changed the instrument's classification accuracy. In sum, independent
researchers have been unable to replicate many of the sensitivity, specificity, and overall
accuracy findings reported by the instrument's authors.
The SASSI-3 is one of the most frequently used substance dependence screening
instruments used by counselors and has been identified as the "most important"
instrument of its kind (Juhnke et al., 2003). Unfortunately, there is significant question
about the instrument's reliability and validity. This may be due to differences in the
methods researchers use to evaluate these properties, or to error from several sources.
Those sources, which may contribute to changes in scores from the initial to the
follow-up test, include: (a) the individual's attempts to recall what was previously asked or
how they answered, (b) changes in the characteristic being assessed, and (c) changes in
the conditions or environment and the interaction between the individual and those conditions.
The major limitation in testing the internal consistency of the SASSI-3 is
the fact that many of the instrument's scales were not developed to measure one
construct. Rather, their test construction and item selection were guided by the criterion
of discriminating between people who are substance dependent and those who are not, regardless of the
item's content (Miller & Lazowski, 1999). Therefore, internal consistency is a less
relevant reliability measure for the SASSI-3 (Miller & Lazowski, 1999; John & Benet-
Martinez, 2000).
bias" (Brewer, 2000, p. 9) which can reduce validity. Researchers can use two different
methods to evaluate the construct which will aide in eliminating mono-operational bias.
construct, he or she should use a method other than a survey instrument to evaluate a
validity". While these concepts can compare two measures and varying sources of data,
they in fact are fundamentally different in their intentions. Construct validity involves
SASSI with other substance dependence screening instruments, the findings result in
mixed outcomes at best. These mixed outcomes may be the result of a lack of clarity and
validity are sample specific. If the sample changes, the results of the reliability and
validity investigations will change as well (Keeves & Masters, 1999). The performance
of the person is dependent on the instrument in classical test theory because there is an
interaction between the instrument and the sample (Keeves & Masters). As a
consequence of this interaction, no inferences can be made about the performance of any
one person on any particular item. Instead, all that can be known is the individual's
performance on the test as a whole (John & Benet-Martinez, 2000). Additionally, there is
no way to empirically evaluate the quality of any individual item (Kagee & deBruin,
2007). Nor is there a way to empirically evaluate the response scales of the instrument in
classical test theory (Keeves & Masters). Finally, researchers often assign numbers to an
ordinal scale and then assume that those numbers are interval and mean the same for each
item in order to use them in statistical analyses (Keeves & Masters). Each of these
limitations can be addressed through the use of measurement models in which a person's
performance and the items are independently scaled along a continuous, interval-level dimension.
Rasch Measurement
The Rasch model offers a means of evaluating an instrument's reliability and validity
(Fox & Jones, 1998). This method of evaluating an instrument rests on fundamental
principles of measurement (Thurstone, 1927). These principles are the same principles
utilized when measuring the height of a house, the weight of a baby, or the volume of a
container of liquid. These principles, (a) unidimensionality; (b) linearity; (c) invariance;
and (d) independence, can be applied to instruments which are designed to measure
psychological constructs in humans (Stone, 2007). Each of the
Thurstonian principles will be described in detail and will include an example of its
application to a popular measure of general distress, the Symptom Checklist-90-Revised
(SCL-90-R; Derogatis, 1975), as evaluated using the Rasch method by Elliott et al. (2006).
Unidimensionality means that an instrument measures only one characteristic of an object
at a time (Bond & Fox, 2007). For instance, a scale only measures weight, not height. A
ruler only measures length and not temperature. In counseling research, this means that an
instrument should only measure one trait or construct at a time. Elliot et al. (2006) used
the Rasch model to evaluate the psychometric properties of the SCL-90-R. The researchers found that
the instrument measured the construct of "general clinical distress" as evidenced by the
measurement principal components analysis finding that the instrument accounted for 78
percent of the total variance (Elliot et al., p. 359). This means that the SCL-90-R is
measuring one construct; the items on this instrument aligned in a hierarchical fashion
according to difficulty.
Linearity implies that an object of measurement has more or less of the construct.
For instance, a person has more height or less height than another person, more weight or
less weight than another. In counseling research, an example is that an instrument should
measure more or less of a construct such as more or less anxiety, or more or less
depression in a person. This is evident in the SCL-90-R because the analysis using the
above principles of measurement found increasing levels of severity both among the
items and the people (Elliot et al., 2006). There was a continuum of items from more to
less severe and a corresponding range of agreeability for people from "non-clinical" to
"extreme distress," indicating a hierarchical structure.
Invariance means that the unit of measurement remains constant across samples
(Stone, 2007). For instance, five inches is equal to five inches regardless
of where on the ruler one begins to measure or what one is measuring. In counseling
research, this means that an instrument regardless of whether one starts measuring with
the low end units or the middle units of the "ruler" will result in measuring the same size
unit. In the psychometric analysis of the SCL-90-R, Elliott et al. (2006) found that the
instrument could be used to measure people at the high end of the ruler, demonstrating
that individuals were experiencing extreme clinical distress, and at the low end of the
ruler, demonstrating this part of the sample was experiencing non-clinical distress.
Independence means that a measuring instrument "must not be seriously affected in its
measuring function by the object of measurement" (as cited in Wright, 1960, p. ix). For
instance, whether a person is weighing apples at the produce
market, a baby at birth, or gold, the scale is an instrument used to measure and ounces are
the unit of measurement regardless of the item being measured. In addition, the scale
does not measure the color of the apples, the length of the baby, or the karats of gold. In
counseling research, the same standard should hold. Bond and Fox (2007) observe that in
a world where machines weigh produce, print tickets, and dispense medications, people
have come to rely on systems of calibrated measurement, and they ask why we "change
our definition of and standards for measurement when the human condition is the focus
of our attention" (Bond & Fox, p. 1).
The Rasch model operationalizes these principles of measurement (Fox & Jones, 1998).
Many researchers have explored the psychometric properties of
several different psychological constructs and instruments using the Rasch model. Some
of those investigations include hostility (Strong, Kahler, Greene, & Schinka, 2005), the
Symptom Checklist-90-Revised (Elliott et al., 2006), school readiness (Banerji, Smith, &
Dedrick, 1997), detainees distress (Kagee & de Bruin, 2007), and evidenced based
practices in the criminal justice system (Henderson, Taxman, & Young, 2008).
The Rasch model is user friendly for instrument development and for instrument
evaluation. Winsteps is the computer program used for this evaluation. Winsteps provides
researchers easy-to-read
tables, charts, and graphs. The variable, scales, items, and people can be represented
through clear pictorial representations such as the person-item map and the response
probability curves. These charts and graphs will be referred to throughout this section and
those that follow.
Elliot et al. (2006) used the Rasch model to explore psychometric properties of
the SCL-90-R. Their method outlined the process by which other researchers can
evaluate instruments. This method, to be described below, involves the following steps:
1) evaluate the separation and reliability for the entire instrument, 2) validate the response
scale, 3) analyze the item fit, 4) conduct the construct analysis, 5) evaluate the instrument for
unidimensionality by reviewing the fit statistics and the principal components analysis,
and 6) investigate whether the items function the same with a different sample. After
each step, the person and item separation and reliabilities will be evaluated for changes.
Separation and Reliability. Classical test theory's internal consistency is analogous to the
Rasch model's person separation and item separation
reliabilities (Fox & Jones, 1998). The separation statistic assists in identifying the number
of distinct groups among the items and people (Elliot et al., 2006). From the separation
statistic the strata index can be determined (Bond & Fox, 2007). The strata inform
researchers of the statistically distinct groups of people and items found. It is suggested
that a separation of two is the minimum acceptable standard (Wright & Masters, 1982 as
cited in Elliot et al.). A separation of two or greater creates three or more distinct groups
of items or people. The output, known as item map, is another indication of person and
item separation as they can be visually distinguishable on this diagram (Elliot et al.).
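The relationship between separation and strata stated above can be sketched with the Wright and Masters strata formula; the formula itself is an assumption here, since the text gives only the separation-of-two benchmark:

```python
def strata(separation):
    """Strata index H = (4G + 1) / 3, where G is the separation statistic
    (attributed to Wright & Masters); a separation of 2 yields 3 strata."""
    return (4 * separation + 1) / 3
```

With a separation of 2 the formula returns 3, matching the text's statement that a separation of two or greater creates three or more distinct groups of items or people.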
The first step in evaluating an instrument is to review the separation and
reliability (Elliot et al. 2006). The Rasch outputs offer two sets of statistics for separation
and reliability, one is for the items and the other is for the participants which is called
"person". The second step is to evaluate the separation and reliability for the subscales.
The separation and reliability statistics will be the basis upon which the researcher will
compare any changes made to the response scales or elimination of misfitting items. For
example, if the researcher eliminates a misfitting item, this may affect the separation and
reliability statistics. If it increases these two statistics, then the outcome of the change is
positive. If it decreases these two statistics, then the change may limit the information
available from the instrument.
Response Validation. Researchers can use the Rasch model to determine whether
participants utilized the rating scale as established by the developers. This process is
known as rating scale analysis; the analysis can reveal that the response scales may not be
working as the researchers intended (Elliot et al., 2006).
Completing a rating scale analysis allows researchers to test their hypotheses regarding
whether the rating scale was clear, had the correct amount of response choices, and
whether the participants were using the scale as developed (Fox & Jones, 1998).
Conducting an analysis of the rating scale also allows researchers to evaluate whether the
instrument's items function unidimensionally (Elliot et al.). For this analysis, the
commonly accepted rule is that the distance between two adjacent response options
(threshold) should be more than 1.4 but not more than 5 logits (Linacre, 1999). A logit is
a unit of measurement that is arranged on an equal interval log scale (Bond & Fox, 2007).
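Because the logit scale underlies both the threshold rule and the item calibrations, a generic sketch of the dichotomous Rasch model may help; the ability and difficulty values below are invented for illustration and are not estimates from this study:

```python
import math

def p_endorse(ability: float, difficulty: float) -> float:
    """Dichotomous Rasch model: P = exp(B - D) / (1 + exp(B - D)),
    with person ability B and item difficulty D both in logits."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

# For the same person, an easier item (lower D) is more likely endorsed.
print(round(p_endorse(0.0, -1.0), 3))  # 0.731 for the easier item
print(round(p_endorse(0.0, 2.0), 3))   # 0.119 for the harder item
```

When ability equals difficulty, the probability of endorsement is exactly .50, which is what allows persons and items to be placed on the same equal-interval logit ruler.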
A second way to evaluate the rating scales is to visually inspect the response probability
curves output. For each item, a probabilistic curve is created from the data. This curve
demonstrates the likelihood of each response option being chosen by the sample. If any
response option curve does not exceed 50 percent probability of being selected or the
threshold falls below 1.4 or exceeds 5 logits, test developers should consider re-evaluating the rating scale, redefining the options, or logically collapsing two response options into one. Analyzing the rating scale in this way assists in scale development and individual diagnosis. In the SCL-90-R Rasch analysis,
the researchers found that the respondents were not using the response scale as expected,
and therefore, for the instrument's response scales to function as intended and maintain
separation and reliability, it became necessary to collapse the five point Likert-type scale
to a three point scale (Elliot et al.). This means that the original scale, i.e., (1) not at all, (2) a little bit, (3) moderately, (4) quite a bit, (5) extremely, needed to be adjusted because individuals did not respond to these categories in five distinct ways; instead, individuals responded in three distinct ways: (1) not at all, (2) a little bit and moderately, and (3) quite a bit and extremely. By collapsing the rating scale in this manner, the instrument's separation and reliability were maintained.
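The collapsing just described is a simple recoding of categories; a hypothetical sketch (the category mapping follows the text, the responses are invented):

```python
# Collapse the 5-point SCL-90-R scale into the 3 categories respondents
# actually distinguished (mapping from the text; sample data invented).
COLLAPSE = {1: 1,        # not at all
            2: 2, 3: 2,  # a little bit / moderately
            4: 3, 5: 3}  # quite a bit / extremely

def collapse_responses(responses):
    return [COLLAPSE[r] for r in responses]

print(collapse_responses([1, 2, 3, 4, 5]))  # [1, 2, 2, 3, 3]
```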
Item Fit Analysis. Item fit is similar to construct validation from a factor analysis
point of view. The purpose of item fit analysis is to investigate whether any item is
measuring "something qualitatively different" than the construct of focus (Elliot et al.,
2006, p. 362). Fit statistics are sensitive to "unexpected variance in response patterns"
(Henderson, Taxman & Young, 2008, p. 166). Bond and Fox (2007) identify the in-fit
mean square cutoff as 1.4 for items that are measuring something different. If an item is
over 1.4, the researcher should consider that the item in question is not measuring the
construct of interest. To explore item redundancy, the same criterion is applied along with the standardized residual correlations, which function like tests of significance for each item. Items with high standardized residual correlations are redundant and do not contribute to the information provided by the data. Items that are considered redundant have an out-fit z-standard score under 0.7 and are among the highest standardized residual correlations; eliminating them should not negatively impact the separation and reliability statistics. Fit statistics are thus a test of unidimensionality.
Construct Analysis. The item map is useful in the evaluation of instruments when conducting construct analysis. The Rasch model
analyzes linearity by allowing for item ordering along a continuum ranked by difficulty
(Elliot et al., 2006). A way to conceptualize Rasch construct analysis is to consider a flag
pole as the variable of substance dependence, from less, (i.e., low on the pole) to more
(i.e., high on the pole). Flags on the left of the pole are items. Items are arranged from
difficult to endorse at the top, to easy to endorse at the bottom of the pole. People are
arranged on the right of the pole from possessing more substance dependence at the top
to less substance dependence at the bottom of the pole. In considering the FVA scale of
the SASSI-3, the items inquire about the behaviors of respondents involving alcohol such
as drinking with lunch or suicide attempts when drinking. Respondents are more likely to
endorse the item regarding drinking with lunch than the item inquiring about suicidal behavior when drinking. Therefore, the second item is considered more difficult. This
item continuum allows researchers to compare the order of items to clinical and theoretical expectations (Banerji, Smith, & Dedrick, 1997). Two methods are used to investigate an instrument's unidimensionality. The first method is to investigate the fit statistics, described above, in which a misfitting item may indicate a secondary dimension (Elliot et al., 2006). The second method is the Rasch principal components analysis (RPCA), which identifies the explained variance for the instrument (Elliot et al.). In the study investigating the SCL-90-R, the researchers found that while the instrument is not completely measuring a unidimensional construct, the additional dimensions are trivial in comparison to the overall distress construct identified in the RPCA, which demonstrated that the measure accounted for at least 78 percent of the variance.
A final way to evaluate an instrument's validity and reliability is to compare two samples to verify the consistency
of the measure (Elliot et al., 2006). This analysis allows researchers to assess whether a
measure maintains its meaning across different samples. This is the measurement
property of invariance or specific objectivity (Bartholomew, 1996; Bond & Fox, 2007).
Items should line up according to their level of difficulty, regardless of the population measured. In the study investigating the SCL-90-R, Elliot et al. identified that there was no meaningful or statistical difference between the clinical and non-clinical samples on the item hierarchy.
Summary
This chapter reviewed the published evidence of the SASSI's reliability and validity. The SASSI authors' and independent researchers' results varied in
many ways. Limitations of the traditional approaches taken were highlighted and a
different method was introduced, the Rasch model. The Rasch model has been used to
successfully investigate the quality of a measure of general clinical distress, the SCL-90-R (Elliot et al., 2006). If the SASSI-3 can be found to work as a single "ruler" of substance dependence, then the instrument can be said to meet fundamental measurement properties.
Chapter Three
Methods
Overview
Chapter Three presents the research methodology that was used to answer the research questions.
The participants were samples collected from two previous research investigations. The
demographic information of these samples is presented in this chapter. The SASSI-3 will
be reviewed and the procedure by which it was analyzed via the Rasch model will be
outlined.
The purpose of this study was to evaluate the SASSI-3 using the Rasch model of measurement. The research questions are as follows:
Research Hypothesis 1: A Rasch principal components analysis will
produce a unidimensional factor structure that accounts for 60% or more of the items'
total variance.
Research Question 2: Do the SASSI-3's scales adequately measure the construct?
Research Hypothesis 2: An analysis of item fit will produce infit and outfit statistics indicative of low item error.
Research Question 3: Are measures from the SASSI-3 reliable for diagnostic
classification purposes?
Hypothesis 3a: The SASSI-3's measures will demonstrate acceptable levels of internal consistency.
Hypothesis 3b: The SASSI-3 decision rule scales (as evidenced in the person-item map) will be reliable for classification purposes.
Research Question 4: Does the SASSI-3 clearly discriminate between those who are substance dependent and those who are not?
Research Hypothesis 5: A Rasch principal components analysis will produce a unidimensional factor structure that accounts for 60 percent or more of the items' total variance.
Research Question 6: Does the whole SASSI-3 adequately measure the substance
dependence construct?
Research Hypothesis 6: An analysis of item fit will produce infit and outfit
statistics indicative of low item error for the SASSI-3 instrument as a whole.
Research Question 7: Are measures from the whole SASSI-3 reliable for diagnostic classification purposes?
Research Question 8: Is the whole SASSI-3 able to clearly discriminate between those who are substance dependent and those who are not?
Participants
The participants in this study consist of a total of 358 adults from two previous
research investigation samples collected from the greater Toledo Area (see Laux, Salyers,
& Kotova 2005 for the study involving the first sample). Institutional Review Board
approval was granted for the first study involving a sample of 230 students, men
accounted for 21.2 percent of the sample (n=49), and women accounted for 78.8 percent
(n=181). The students attended a large Midwestern university and were enrolled in social work or counseling courses (mean number
of years in college was 3.5, SD = 2, range = 0-10, median=4). The sample self-identified
ethnicity included 62.6 percent (n=144) European American, 24.8 percent (n=57) African
American, 3 percent (n=7) Native American, 2.6 percent (n=6) biracial, 1.7 percent (n=4) Hispanic, .4 percent (n=1) Asian American, and 4.8 percent (n=11) did not report (Laux et al., 2005). The mean age for this sample was 28.1 years (SD=10.4, range=18-59,
median=26).
The second sample was drawn from a community agency and court cooperative program designed to assist in reunifying drug
and alcohol abusing parents with their children. The data was collected by the
professionals involved in the daily administration of the program and provided to the Substance Abuse and Mental Health Services Administration (SAMHSA). This second sample contained a total of 235 adults with 20.9 percent (n=49) men, 77.0 percent
(n=181) women, and 2.1 percent (n=5) did not report. The sample self-identified ethnicity
included 61.3 percent (n=144) European Americans, 24.3 percent (n=57) African
Americans, 3 percent (n=7) Native Americans, 1.7 percent (n=4) Hispanics, 2.6 percent
(n=6) biracial, 0.4 percent (n=1) Asian American, and 6.8 percent (n=16) did not report.
The mean age for this sample was 28 years (SD=11, range 19-59, median=23).
The samples were combined and selected, by utilizing a random numbers table, to
create two groups. The samples were combined to ensure that a portion of each of the
groups contains individuals with problems related to substance abuse necessitating some
therapeutic intervention. If the SASSI-3 functions as a measure, these samples should
represent a wide range on the substance dependent ruler. The first group was used for the
initial validation of the SASSI-3; the first purpose of this study. The second group was
used to evaluate the SASSI's independence against the first sample; the second purpose
of this study.
The Substance Abuse Subtle Screening Inventory-3 (Miller & Lazowski, 1999)
was developed to identify individuals who had a high probability of being substance
dependent (Miller & Lazowski). The instrument was first published in 1988 and has been revised twice since, with the SASSI-3 as the current version.
The SASSI-3 is a paper-and-pencil screening instrument printed on both sides of one
page. It is brief, easy to administer and score, and is economical. The front consists of 67
true and false items. The back has 26 items with rating scale choices of 0-3 indicating
never, once or twice, several times, and repeatedly. The front side includes subtle items,
which purportedly indirectly inquire about substance abuse related issues. However,
several of these items directly pertain to past alcohol and drug use.
The developers of the SASSI-3 identified ten scales upon which to measure
individuals for the probability of substance dependence (Miller & Lazowski, 1999).
These ten scales include the Face Valid Alcohol scale (FVA), the Face Valid Other Drug
scale (FVOD), the Symptom scale (SYM), the Obvious Attributes scale (OAT), the
Subtle Attributes scale (SAT), the Defensiveness scale (DEF), the Supplemental Addiction
Measure scale (SAM), the Family vs. Control Subjects scale (FAM), the Correctional
scale (COR), and the Random Answering Pattern scale (RAP). The FVA and FVOD
scales' items directly question the respondent about his or her alcohol and other drug use.
The SYM scale assesses respondents' symptoms and consequences of drug and alcohol
use. Obvious traits associated with substance use are measured through the OAT scale,
while subtle traits are measured through the SAT scale. The DEF scale is a validity scale
which measures respondents' defensiveness to the SASSI-3's items. The SAM scale is
meant to discriminate between persons whose high DEF scores are due to substance
specific defensiveness from those whose elevated DEF scales are due to some other
source of defensiveness. The FAM scale evaluates the amount that the respondent
focuses his or her own feelings or thoughts on herself or himself versus the feelings or
thoughts of others. The COR scale reports on the similarity of a respondent's scores to a
group of persons known to have a history of criminal behavior. Finally, the RAP is a
validity scale which determines whether a respondent was answering in a random pattern.
If a respondent's RAP score is greater than one, then the respondent's screening
may be invalid. Therefore, prior to scoring, the RAP scale should be reviewed. The
scoring procedures include nine decision rules which are used to determine the likelihood
of substance dependence for the respondent. For each of the scoring rules, should the respondent's scores meet the rule's criterion, the respondent is classified as having a high probability of being substance dependent.
The results reported by independent researchers (Arneth, Bogner, Corrigan, & Schmidt, 2001; Clements, 2002; Feldstein & Miller, 2007; Gray,
2001; Laux, Perera-Diltz, Smirnoff, & Salyers, 2005; Laux, Salyers & Kotova, 2005;
Lazowski, Miller, Boye, & Miller, 1998; Svanum & McGrew, 1995; Sweet & Saules,
2003) have been mixed when compared to the results reported by the authors of the
SASSI (Miller & Lazowski, 1999). Often the findings have not reflected the high levels reported by the SASSI's authors. Test-retest stability is one commonly reported index of an instrument's reliability (Bartholomew, 1996; Mark, 1996; Traub, 1994). The two-week test-retest reliability found for the SASSI by the authors (Miller & Lazowski, 1999) was 1.0 for the FVA and FVOD scales and between .92 and .97 for the clinical scales. This
finding was supported by Laux, Salyers, and Kotova (2005) but challenged by Myerholtz
and Rosenberg (1998). Myerholtz and Rosenberg found .82 and .89 for the FVA and FVOD scales in one administration interval; in another, they found the FVA and FVOD scales to be .76 and .93, respectively.
With regard to internal consistency, Miller and Lazowski (1999) found that the SASSI
had a .93 coefficient alpha. While the internal consistency finding is less meaningful
because the SASSI was not developed to be a unidimensional instrument, this provides some supporting evidence for the instrument. Independent internal consistency findings for the face valid scales have been consistent with Miller and Lazowski. However, Clements (2001) produced only a .49 coefficient alpha for the instrument. The SASSI's validity has been evaluated in several ways, including content, construct, and criterion-referenced
approaches. Lazowski, Miller, Boye, and Miller (1999) found that people who score high
on the SASSI also score high on other instruments measuring the same construct such as
the MAST and the MMPI-2 Addiction Potential Scale. Likewise, people who scored low on the SASSI scored low on these other instruments. However, comparisons of the SASSI to other instruments produced mixed results. For example, the
overall classification agreement findings for the SASSI and CAGE agreement was .49
(Laux, Salyers, & Kotova, 2005) and .61 (Myerholtz & Rosenberg, 1998). When the
SASSI was compared to a modified CAGE, the agreement rate dropped to .58 (Myerholtz
& Rosenberg). The SASSI and MAC agreement was lower still with a .22 agreement
(Myerholtz & Rosenberg) but in another study had a higher agreement rate result at .52
(Laux, Salyers, & Kotova). However, when the SASSI was compared to the MAC-R, the agreement rates again differed across studies.
Based on an exploratory factor analysis, the authors of the SASSI (Miller &
Lazowski, 1999) identified a ten factor solution; however, the only other study to
investigate the factor structure of the SASSI was unable to replicate this finding (Gray,
2001). Gray's data factor analysis identified a two factor solution comprised of mostly
the FVA and FVOD items, which accounted for 53 percent of the SASSI-3's total
variance. Two studies have also confirmed the factor structure of the FVA and FVOD
scales, respectively (Laux, Salyers, & Kotova, 2005; Laux, Perera-Diltz, Smirnoff, &
Salyers, 2005).
The SASSI's criterion-related validity has been evaluated by comparing its classifications with diagnoses made by a professional using the criteria from the Diagnostic and Statistical Manual IV Text Revision. Using this criterion, Lazowski, Miller, Boye, and Miller (1997) reported a high accuracy rate for the SASSI. Later, in the SASSI-3 Manual, Miller and Lazowski (1999) reported a lower but still
acceptable accuracy rate for the SASSI to be 93 percent, sensitivity to be 93.3 percent,
and specificity to be 94.2 percent. However, the results from independent researchers
have again been mixed. Using the same gold standard, Svanum and McGrew (1995)
found the sensitivity to be 33 percent and specificity to be 87 percent for their college
student sample. Five years later using an incarcerated population the results improved
with an overall accuracy rate of 69.4 percent, sensitivity of 73.3 percent, and specificity
of 62.2 percent. Using a traumatic brain injury sample, the overall accuracy rate was
again lower than that found by the SASSI authors at 69.2 percent, sensitivity of 70.8
percent and specificity of 68.5 percent (Arneth et al., 2001). Finally, Clements also found accuracy rates lower than those reported by the SASSI's authors.
Variable
A Rasch analysis requires a clearly defined variable being investigated. For this study, the variable being evaluated is substance dependence. The SASSI-3 was designed to discriminate between those who are likely to be substance dependent from those who are not (Miller & Lazowski). The authors of the SASSI-3 also asserted that it was not their intention to
develop a unidimensional instrument; however, they comment that the scales measuring homogeneous traits have higher internal consistencies. The high coefficient alpha findings identified by several authors suggest such homogeneity within scales (Laux, Salyers, & Kotova, 2005; Laux, Perera-Diltz, Smirnoff, & Salyers, 2005; Miller & Lazowski). Independent evidence of unidimensionality has been reported for the FVA and FVOD. The FVA, FVOD, SAT, OAT, and SYM scales and the SAM have been associated with the substance dependence construct itself, whereas
constructs other than substance dependence, such as validity and additional clinical
issues, are being measured with the FAM, DEF, COR and RAP scales (Miller &
Lazowski, 1999).
Procedures
One of the many advantages of using the Rasch model is that the outputs from the
analysis are in the form of easy to read "pictures". The pictures are graphs and charts
which demonstrate visually the response scales and the "ruler" upon which the items and
people can be aligned. The pictures will be described below as they apply to each step in
the procedure. The following method includes the steps used to evaluate the SASSI-3's
measurement properties.
Steps in conducting a Rasch Analysis. This study will follow the process of Rasch
analysis using the example set by Elliot et al. (2006). When conducting a Rasch analysis,
at each step described below, the person and item separations and reliabilities will be
reviewed for changes and improvements as a guide to determine whether the change was
effective.
Step one- Response validation. The purpose of exploring the response validity
first is to establish whether the participants are using the response scales as intended by
the authors of the SASSI-3 (Elliot et al., 2006). In addition, response validation is the first step in determining whether the items function unidimensionally (Elliot et al.). There are
two ways the response options will be validated. The first is by visually reviewing the
probability curves. Each response option should have over .50 probability of being
chosen. The second is by examining the thresholds. Each response option (1 to 2, or, 2 to
3, etc.) threshold should be between 1.4 and 5 units in distance from the next response
option. If the threshold is less than 1.4 or greater than 5 and the probability of being chosen is less than .50, then it is recommended that the response options be revised.
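A minimal sketch of these two checks (criteria as stated above; the option statistics passed in are invented examples, not study values):

```python
def option_ok(peak_probability: float, threshold_gap) -> bool:
    """An option passes if its probability curve peaks at .50 or above and
    the distance to the adjacent threshold, when one exists (None = N/A),
    is between 1.4 and 5 logits."""
    if peak_probability < 0.50:
        return False
    if threshold_gap is not None and not (1.4 <= threshold_gap <= 5.0):
        return False
    return True

print(option_ok(0.95, None))  # True: lowest option, no prior threshold
print(option_ok(0.40, 5.41))  # False: fails both criteria
```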
Step two - Item fit analysis. Item fit analysis is a form of construct validation and
a test of unidimensionality. Using a z-standardized score cutoff of 2.0, any item over this value, or any item with a negative point-biserial value, is likely either redundant or measuring a construct other than the one intended. As such, items failing to meet these standards will be considered for elimination. The fit analysis will also be conducted for the people in the
Step three - Construct analysis. Construct analysis is a test of the Thurstonian ordering of items, conducted through a review of the person-item map output. A way to conceptualize the Rasch construct analysis output is to consider a
flag pole as the variable of substance dependence, from less, low on the pole, to more
high on the pole. Flags on the left of the pole are items. Items are arranged from difficult
to endorse at the top to easy to endorse at the bottom of the pole. People are arranged on
the right of the pole from possessing more substance dependence at the top to less
substance dependence at the bottom of the pole. The linear measure construct item map is
the variable of substance dependence extrapolated from the instrument. In this way one
can see the degrees of separation along the variable and where the separations are.
Step four - Assess for unidimensionality. The primary way to evaluate unidimensionality is through the Rasch principal components analysis (RPCA). RPCA and traditional principal components procedures differ, however, in that the RPCA approach not only provides first order
factor results, but additionally provides the researcher with evidence of the presence of
unsuspected secondary variables, if they exist (Bond & Fox, 2007). If the RPCA is over
60 percent and the remaining residuals do not explain greater than five percent of the variance, the instrument can be considered unidimensional.
Step five - Test of independence. An appropriate and comparable sample is one in which the researcher would expect to find a wide range of
the construct being measured. For example, if one was interested in measuring the
construct of intelligence, the researcher would generally need a sample that included persons ranging from low to high intelligence in order to determine whether or not the instrument included items at all points along this continuum. In this
study an appropriate and comparable sample would be composed of persons whose use of
substances ranges from none at all to those whose use has progressed to the point where
they are experiencing significant consequences in their lives. Tests of independence will
inform the researcher of whether the meaning of the instrument and the item hierarchy,
ranging from easy to difficult, remains consistent. The resulting person-item map from
the first group is visually compared against the second group's person-item map. These
maps are evaluated for consistency by observing the arrangement of items. That is, if the
items fall in relatively the same point along the hierarchy difficulty continuum for both
groups, then the researcher can conclude that the measure is sample independent.
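As a rough sketch of that visual comparison (item names and logit difficulties invented; a real check would also tolerate small calibration shifts between samples):

```python
def same_hierarchy(difficulties_g1: dict, difficulties_g2: dict) -> bool:
    """True if both groups order the items identically by difficulty."""
    order = lambda d: sorted(d, key=d.get)  # easiest to hardest
    return order(difficulties_g1) == order(difficulties_g2)

g1 = {"lunch": -1.2, "argued": 0.3, "suicide": 2.1}
g2 = {"lunch": -0.9, "argued": 0.5, "suicide": 1.8}
print(same_hierarchy(g1, g2))  # True: the easy-to-difficult order matches
```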
The SASSI-3's ability to discriminate between substance dependence and non-dependence (specific research questions six and seven) was evaluated through the use of the person-item map. The individuals in the second group were coded according to the traditional SASSI-3 decision rule (high probability vs. low probability of substance dependence). Using the person-item map, these people were then inspected to determine whether the two classifications separated along the measure.
Limitations
Broadly, the limitations of this study are associated with self-report instruments as well as with the Rasch model itself.
One limitation of this study is that the SASSI-3 is a self report instrument. Self
report, generally speaking, is one of the easiest ways to collect information on a construct
of interest. However, respondents often answer items in socially desirable ways (Donaldson & Grant-Vallone, 2002) or in a manner that artificially minimizes (faking good) or maximizes (faking bad) the severity
of their presenting issues. Such response styles may be a particular concern regarding
substance dependence screening due to possible secondary gains from results that are
positive or negative. Although the SASSI-3's authors purport to have limited the impact of faking good or bad, initial evidence suggests that, when instructed to do so, college
students can fake-good and fake-bad on this instrument (Burck, Laux, Harper, & Ritchie,
2009). As such, self-report may be a potential limitation for this study in that the participants may not have responded truthfully.
A second limitation is the utilization of the Rasch model. Despite its multiple uses
and high reputation for instrument validation, some do offer critiques against the Rasch
model. These critics state that Rasch model analysis is not a theory building method as is
factor analysis and that the Rasch model theory is too simplistic (Bond & Fox, 2007).
According to the Rasch model, theory drives the development of the instrument. Critics also state that it is ineffective to utilize Rasch analysis for multidimensional instruments, as the Rasch model only works for unidimensional instruments (Kubinger, 2005). Since the SASSI was developed using the
Diagnostic and Statistical Manual (APA, 2000) criteria and was based on the substance dependence construct, theory did guide its development. In addition, although, according to the SASSI-3 Manual (Miller & Lazowski, 1999), the SASSI-3 was not intended as a unidimensional measure, as seen above in the reliability studies, the instrument or some of its scales may nonetheless function unidimensionally.
Another criticism involves the SASSI-3's scoring procedures. When two
individuals' raw scores are compared, some researchers may report a person's ability in
an invalid manner (Kubinger, 2005). This can happen when two people have the same
raw total score yet one person (person A) correctly answered the first ten easiest questions on a 25-question test but the second person (person B) correctly answered the
ten hardest questions on the same test. Both raw scores equal ten, yet, person B was able
to answer more difficult questions than person A. Therefore, the scoring may not
necessarily be correct. One way researchers using the Rasch model can rectify this problem is by carefully analyzing response patterns rather than raw totals prior to publishing or interpreting results.
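Kubinger's point can be made concrete with an invented 25-item illustration:

```python
# Items indexed from easiest (0) to hardest (24); a person's record is the
# set of items answered in the keyed direction (all values invented).
person_a = set(range(10))      # person A: the ten easiest items
person_b = set(range(15, 25))  # person B: the ten hardest items

print(len(person_a) == len(person_b))  # True: identical raw totals of 10
print(max(person_b) > max(person_a))   # True: B reached much harder items
```

Identical raw totals thus conceal very different response patterns, which is exactly the information a Rasch calibration preserves.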
In Rasch analysis, the data must fit the model as opposed to factor analysis in
which researchers can adjust the model to fit the data. An additional limitation will be
that the data do not fit the model. This means that the instrument is not a measure of
substance dependence. However, this limitation will be unknown until after the analysis.
Finally, as Bond and Fox (2007) point out, "critics argue that we can't physically
align bits of the human psyche together to produce measures, as we can with centimeters
to make meters" (p. 6). This means that a string of substance dependence units cannot be physically laid end to end the way units of length can.
Chapter Four
Results
This chapter presents the results of the Rasch analysis on the archived data from a
study of the SASSI-3 on two samples of adults. The samples were a combination of two
samples from previous research. One sample was taken from a study involving a
community cooperative program with child protective services and family court, and the
other sample was taken from a study involving adults from a large metropolitan
university. Using a random numbers table, each sample was divided into two halves, and one half from each sample was combined with a half from the other to create two mixed groups. Each group therefore contained part of the sample from the community cooperative program with child protective services and family court study and part of the sample from the university study. These groups were referred to as
Group 1 and Group 2 throughout the course of the study. Group 1 consisted of 174
participants, men accounted for 23.6 percent (n=41) of the sample, and women accounted
for 76.4 percent (n=133). This sample's self-identified ethnicity included 58 percent (n=101) European American, 26.4 percent (n=46) African American, 1.7 percent (n=3) Native American, 2.3 percent (n=4) biracial, and 2.3 percent (n=4) Hispanic. Group 2
consisted of 175 participants, men accounted for 20 percent (n=35) of the sample, women
accounted for 79.4 percent (n=139), and one person did not report his/her sex. This sample's self-identified ethnicity included 63.4 percent (n=111) European American, 22.3 percent
(n=39) African American, 2.3 percent (n=4) Native American, 2.3 percent (n=4) biracial,
3.4 percent (n=6) Hispanic, and 1.1 percent (n=2) Asian American.
The initial person and item separation and reliabilities for the FVA scale were 2.51/.86 and 7.47/.98, respectively. The minimum acceptable standard for separation is
2.0 (Wright & Masters, 1982). A separation of 2.0 translates statistically into 3 strata.
This means that the FVA scale's reliability is excellent and its ability to distinguish differences in the people is good. In this case, the FVA can be said to be a linear measure. To determine whether improvements could be made, the researcher conducted analyses of the FVA
scale's response options, items, and underlying factor structure. Step one of the Rasch
analysis involved evaluating how the respondents were using the response options. Each
of the FVA scale's twelve items include four choices of responses to which respondents
can indicate the frequency to which they engage in the item's behaviors. These response
options and corresponding point values are: 0-Never, 1-Once or Twice, 2-Several Times, and 3-Repeatedly. A visual inspection of the probability curves (Figure 1) revealed that the respondents used all response options as expected by the authors of the SASSI-3, with the exception of option 1-Once or Twice, which did not reach the .50 probability criterion.
Figure 1
FVA G1 category probability curves (probability of each response option, 0-3, plotted against the person-minus-item measure).
This failure suggested that the sample did not reliably distinguish between option 1-Once
or Twice and the next adjacent category, 2-Several times. However, because all of the
other response options appeared to work as intended and because no improvements were
found in the person and item separation and reliabilities after collapsing strategies were
attempted, no changes to the response scale were made at this time (see Table 2 for the collapsing strategies evaluated).
Table 2
Rating Scale Analysis for the FVA Scale, Group 1

Rating Scale   Probability Curve(1)              Threshold(2)                    PS&R       IS&R       RPCA
0,1,2,3        0=0.95, 1=0.40, 2=0.60, 3=0.90    0-1=N/A, 1-2=5.41, 2-3=22.67    2.65/.87   7.81/.98   94.8%
0,1,1,2        0=0.95, 1=0.80, 2=0.95            0-1=N/A, 1-2=24.00              2.48/.86   7.50/.98   95.3%
0,0,1,2        0=0.95, 1=0.80, 2=0.95            0-1=N/A, 1-2=44.00              1.99/.80   5.56/.97   92.4%
0,1,2,2        0=0.95, 1=0.40, 2=0.95            0-1=N/A, 1-2=6.76               2.40/.85   7.99/.98   97.3%

Note. (1) >= .5 is acceptable. (2) >= 1.4 is acceptable. PS&R = Person Separation & Reliability. IS&R = Item Separation & Reliability. RPCA = Rasch Principal Components Analysis (percent of variance explained).
In an effort to further improve the FVA's separation and reliability results, the
researcher inspected the FVA items and respondents for fit. Item and people are
considered to fit if the z-standardized score is less than 2.0 and the point-biserial is not
negative. If items or people are outside of these cutoffs, they are considered to be misfits
and should be considered for possible elimination. This inspection led to a final iterative
elimination of twelve people. No items failed to meet the standards set forth for item fit.
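The fit screen applied in this step (z-standardized score below 2.0 and a non-negative point-biserial) can be sketched as follows; the entry names and statistics are invented for illustration and do not reproduce the study's values:

```python
Z_CUTOFF = 2.0

def misfits(fit_stats: dict) -> list:
    """Flag entries whose z-standardized fit is 2.0 or more, or whose
    point-biserial correlation is negative."""
    return [name for name, (zstd, pt_biserial) in fit_stats.items()
            if zstd >= Z_CUTOFF or pt_biserial < 0]

example = {"item_a": (0.4, 0.55),   # fits: low zstd, positive point-biserial
           "item_b": (2.3, 0.40),   # misfits on the zstd criterion
           "item_c": (1.1, -0.05)}  # misfits on the point-biserial criterion
print(misfits(example))  # ['item_b', 'item_c']
```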
The elimination of misfitting people resulted in the final person and item separations and
reliabilities of 2.65/.87 for persons and 7.81/.98 for items on the FVA scale. These separations and
reliabilities are improvements from the initial findings, which suggested a well defined
linear construct that accurately measures the people. The FVA scale is divided into ten levels of difficulty and discriminates among nearly four groups of people ranging from low to high on the substance dependence variable.
The third step in the analysis involved a review of the person-item map (Figure 2).
Figure 2
FVA G1 item map (174 persons, 12 items, 4 categories). Items from most to least difficult to endorse: FVA12 (commit suicide), FVA9 (effects recur), FVA11 (nervous/shakes), FVA6 (trouble), FVA10 (relationship), FVA8 (argued), FVA3 (energy), FVA7 (depressed), FVA1 (lunch), FVA2 (feelings), FVA5 (physical problems), FVA4 (intended).
The analysis produced a hierarchy of items ordered from most difficult to least difficult to endorse. When two items are aligned at the same place on the hierarchy, the items are either theoretically redundant or simply at the same level of difficulty. Overfitting items can be eliminated if the infit mean-square is below 0.6 and the z-standardized score is -2.0 or less. Although the aligned pair FVA3/FVA7 appeared to measure the same theoretical content and sat at the same level of difficulty, both items were retained because neither misfit these standards.
Group 1's FVA hierarchy is displayed visually in Figure 2. The initial Rasch principal components analysis (RPCA) indicated that 91.9 percent of the total variance was explained by the instrument. With the elimination of twelve misfitting people, the RPCA increased to 95.1 percent of the total variance explained by the instrument,
which demonstrated improvement in the FVA. However, the item/person map means and
standard deviations were separated by nearly a standard deviation indicating that the
items were more difficult to endorse than the people were able to agree to them. An
example of this may be like a spelling bee. In this scenario the students would be third
grade level spellers and the words would be tenth grade spelling words. The words would
In the final step of the analysis, the extrapolated variable was compared using the
data from a second comparable group using the same process. The FVA scale, using
Group 2 data, demonstrated similar person and item separation and reliability results as
were produced in the analyses of the first data set. While no changes were needed to the
response options, option 1-Once or Twice, as was reported in the analysis of the first data set, only met the probability curve at 0.4 (see Figure 3 for the response curve and Table 3 for the threshold values).
Figure 3
FVA Group 2 category probability curves (Winsteps output): the probability of each response category (0-3) plotted against the person-minus-item measure.
Table 3

Rating Scale   Probability Curve¹          Threshold²             PS & R     IS & R     RPCA
0,1,2,3        0=0.90, 1=0.45,             0-1=N/A, 1-2=15.15,    2.78/.89   7.76/.98   98.6%
               2=0.65, 3=0.95              2-3=19.12
0,1,1,2        0=0.90, 1=0.90, 2=0.90      0-1=N/A, 1-2=57.32     2.71/.88   7.50/.98   99.4%
0,0,1,2        0=0.95, 1=0.60, 2=0.95      0-1=N/A, 1-2=29.38     2.19/.83   5.52/.97   99.7%
0,1,2,2        0=0.90, 1=0.50, 2=0.90      0-1=N/A, 1-2=11.88     2.38/.85   8.45/.98   97.9%

Note. ¹ ≥ .5 is acceptable. ² ≥ 1.4 is acceptable. PS & R = Person Separation & Reliability. IS & R = Item Separation & Reliability. RPCA = Rasch Principal Components Analysis.
After the iterative elimination of 23 misfitting people, the final person and item separation and reliability findings were 2.78/.89 and 7.76/.98, respectively. No items failed to meet the cutoff for item fit; therefore, none were eliminated. Two items (FVA1 and FVA2) were aligned at the same place on the hierarchy, and neither met the statistical standards for item overfit. Therefore, they measure unique qualities and neither could be eliminated. The final RPCA for the scale was also comparable, at 98.6 percent of the variance being accounted for by the items. As was presented for the FVA for Group 1, the hierarchy of FVA item endorsement difficulty for Group 2 is presented in Figure 4.
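The item-fit screening rules quoted throughout this chapter can be sketched as a pair of simple predicates. This is a minimal illustration using hypothetical fit statistics, not the actual SASSI-3 output: an item is flagged as overfitting (a candidate for elimination as redundant) when its infit mean-square is below 0.6 and its z-standardized fit is -2.0 or less, and flagged as misfitting when the z-standardized fit exceeds 2.0 or the point-biserial correlation is negative.

```python
def overfits(infit_mnsq: float, zstd: float) -> bool:
    """Overfit rule quoted in the text: infit mean-square < 0.6 and ZSTD <= -2.0."""
    return infit_mnsq < 0.6 and zstd <= -2.0

def misfits(zstd: float, point_biserial: float) -> bool:
    """Misfit rule quoted in the text: ZSTD > 2.0 or a negative point-biserial."""
    return zstd > 2.0 or point_biserial < 0.0

# Hypothetical items: (name, infit mean-square, z-standardized fit, point-biserial)
items = [("FVA1", 0.95, 0.4, 0.61),   # acceptable fit
         ("FVA2", 0.55, -2.3, 0.70),  # overfits (candidate redundancy)
         ("FVA3", 1.60, 2.8, -0.05)]  # misfits (candidate elimination)

for name, mnsq, zstd, pbis in items:
    status = ("overfit" if overfits(mnsq, zstd)
              else "misfit" if misfits(zstd, pbis) else "acceptable")
    print(name, status)
```

The fit statistics above are invented for illustration; in the analyses reported here they would come from the Winsteps item and person fit tables.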
Figure 4
FVA Group 2 item map (figure not reproduced).
A side-by-side comparison of the Groups' respective item-endorsement difficulty, presented on Table 4, indicated that ten of the twelve items remained constant on the hierarchy across Groups. Of the three items found to be the most difficult to endorse by both Groups, the second and third most difficult were interchanged. Again, the items fell into ten levels of difficulty, and the scale distinguished nearly four groups of people.
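The counts of "levels" and "groups" quoted throughout this chapter follow from the separation index. A minimal sketch, assuming the standard Rasch formulas relating separation G, reliability, and strata (reported reliabilities may differ slightly, since Winsteps computes them from model error rather than this identity); the numeric inputs are the final FVA Group 1 values reported in the text:

```python
def reliability_from_separation(g: float) -> float:
    """Reliability implied by a separation index: R = G^2 / (1 + G^2)."""
    return g ** 2 / (1 + g ** 2)

def strata(g: float) -> float:
    """Number of statistically distinct levels: H = (4G + 1) / 3."""
    return (4 * g + 1) / 3

person_sep, item_sep = 2.65, 7.81   # final FVA Group 1 separations
print(round(reliability_from_separation(person_sep), 2))  # ~0.88
print(round(strata(person_sep), 1))  # ~3.9 -> "nearly four groups of people"
print(round(strata(item_sep), 1))    # ~10.7 -> about ten item levels
```

This is why a person separation near 2.65 is read as distinguishing nearly four groups, while an item separation near 7.81 yields roughly ten levels of item difficulty.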
Table 4
Item Hierarchy
Group 1 Group 2
Difficult to endorse
Easy to endorse
Face Valid Other Drug Scale (FVOD)
The initial person and item separation and reliabilities for the FVOD scale were 2.50/.86 and 4.69/.96, respectively. Additionally, the initial RPCA was 83.9 percent. Viewing these findings collectively, the FVOD can be said to be a linear construct. To improve the scale's functioning where possible, the response options, items, and the underlying factor structure were explored. At each step the person and item separation and reliability were reviewed as a way to track changes in the scale's functioning. Like the FVA scale, the FVOD scale has the same four response options: 0-Never, 1-Once or Twice, 2-Several Times, 3-Repeatedly. Unlike the FVA scale, the FVOD response scale did not seem to function as well. Inspection of the probability curve and thresholds (Figure 5) indicated that options 1-Once or Twice and 2-Several Times were not functioning as intended.
Figure 5
FVOD Group 1 category probability curves (Winsteps output): the probability of each response category (0-3) plotted against the person-minus-item measure.
Specifically, the calibration thresholds reflected the respondents' misuse of response options 1 and 2, which were both below 0.2 on the probability curve. Respondents were not reliably distinguishing between option 1-Once or Twice and 2-Several Times. Considering the logic behind the options and the statistical evidence provided by the thresholds, the researcher decided to collapse the middle two response categories in an effort to improve the FVOD's functioning. That is, the researcher reanalyzed the data with options 1 and 2 combined, leaving three response options (0, 1, 2).
This change produced an improvement in the person separation and reliability and a minor decrease in the item separation and reliability, with scores of 2.61/.87 and 4.43/.95, respectively. The RPCA conducted after response options 1 and 2 were combined decreased from 83.9 percent to 76.8 percent of the total variance accounted for. Despite this decline, the final value was still above the minimum accepted standard. Further evaluation demonstrated that the three-option response scale seemed to work better in this model: every response option exceeded the statistical cutoffs, reaching at least .50 on the probability curves and sitting more than 1.4 units from the adjacent option's threshold, despite the decrease in item separation and reliability and in the RPCA (see Figure 6).
Figure 6
FVOD Group 1 category probability curves after collapsing options 1 and 2 (Winsteps output): the probability of each response category (0, 1, 2) plotted against the person-minus-item measure.
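The collapsing strategy of combining the middle two response categories amounts to a simple recode of the raw responses before reanalysis. A minimal sketch, using a hypothetical response vector rather than actual SASSI-3 data:

```python
# Recode the original 0,1,2,3 rating scale to the 0,1,1,2 structure:
# 0-Never stays 0; 1-Once or Twice and 2-Several Times merge into 1;
# 3-Repeatedly becomes 2.
COLLAPSE_0112 = {0: 0, 1: 1, 2: 1, 3: 2}

def collapse(responses, mapping=COLLAPSE_0112):
    """Apply a category-collapsing recode to a vector of raw responses."""
    return [mapping[r] for r in responses]

raw = [0, 3, 1, 2, 2, 0, 3, 1]   # hypothetical FVOD item responses
print(collapse(raw))             # [0, 2, 1, 1, 1, 0, 2, 1]
```

The recoded data would then be resubmitted to the Rasch estimation, which is how the revised separation, reliability, and RPCA figures reported above were obtained.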
No other collapsing strategy met the threshold and probability statistical standards for the FVOD scale (see Table 5).
Table 5

Rating Scale   Probability Curve¹          Threshold²             PS & R     IS & R     RPCA
0,1,2,3        0=0.95, 1=0.30,             0-1=N/A, 1-2=1.27,     2.68/.88   5.05/.96   88.3%
               2=0.35, 3=0.95              2-3=4.36
0,1,1,2        0=0.95, 1=0.60, 2=0.95      0-1=N/A, 1-2=20.46     2.78/.89   4.77/.96   84.3%
0,0,1,2        0=0.95, 1=0.35, 2=0.95      0-1=N/A, 1-2=0.02      2.27/.84   4.11/.94   82.5%
0,1,2,2        0=0.95, 1=0.30, 2=0.95      0-1=N/A, 1-2=2.16      2.56/.87   5.24/.96   88.9%

Note. ¹ ≥ .5 is acceptable. ² ≥ 1.4 is acceptable. PS & R = Person Separation & Reliability. IS & R = Item Separation & Reliability. RPCA = Rasch Principal Components Analysis.
Step two of the Rasch instrument validation analysis involved reviewing the person and item fit. Inspection of the items and persons led to a final iterative elimination of twelve people whose responses were inconsistent. No items failed to meet the standards set forth for item fit. The item fit standards include the z-standard score being below 2.0 and a positive point-biserial. The elimination of these twelve people resulted in final person and item separations and reliabilities of 2.78/.87 for persons and 4.77/.96 for the FVOD scale. These findings suggest a well-defined linear construct that reliably distinguishes differences among the people. The FVOD scale can be divided into about six levels of item difficulty. The third step in the analysis was to review the person-item map (Figure 7) to explore the extrapolated construct.
Figure 7
FVOD Group 1 person-item map (Winsteps output). Items from most to least difficult to endorse: FVOD9 (doctor), FVOD7 (trouble w/law), FVOD3 (more aware), FVOD12 (avoid withdrawal), FVOD14 (treatment program), FVOD4 (sex), FVOD8 (really stoned), FVOD1 (improve thinking), FVOD5 (help), FVOD10 (activities), FVOD13 (life), FVOD6 (forget), FVOD2 (feel better), FVOD11 (aod).
The hierarchy of items ran from the most difficult items to endorse to the least difficult items to endorse. When two items are aligned at the same place on the hierarchy, the items are either theoretically redundant or at the same level of difficulty. In Rasch this means the items overfit. Overfitting items can be eliminated if the infit mean-square is below 0.6 and the z-standardized score is -2.0 or less. Despite appearing to measure the same theoretical content, one of the items from each of the aligned groups could have been eliminated because all of the items in these combinations fall within the item fit standards and are at the same level of difficulty. Group 1's item hierarchy is visually displayed on Table 7. The final Group 1 RPCA indicated that 84.3 percent of the total variance was explained by the scale. This improvement was achieved by adjusting the response scale and eliminating the twelve misfitting people. In addition, the item/person map means and standard deviations (Figure 6) were close in proximity, indicating that the difficulty of the items was similar to the ability of the people. It should be noted that while the means were close in proximity, only the most extreme people were identified on the FVOD scale, while the majority of the sample sat at the bottom of the scale. This may be because only a small number of people in the sample endorsed the drug-use items.
In the final step of the analysis, the extrapolated variable constructed from Group 2's data was compared to the variable constructed from Group 1's data, following the same process. The FVOD scale, using Group 2 data, demonstrated similar person and item separation and reliability findings. As with Group 1, the response options were not being used as intended by the authors of the SASSI-3. By reviewing the thresholds and probability curves, the following collapsing strategy was developed (see Table 6): the two middle response options, 1-Once or Twice and 2-Several Times, were combined. This allowed for a better functioning response scale and an increase in the person separation and reliability.
Table 6

Rating Scale   Probability Curve¹          Threshold²             PS & R     IS & R     RPCA
0,1,2,3        0=0.95, 1=0.30,             0-1=N/A, 1-2=1.24,     2.82/.89   4.77/.96   88.7%
               2=0.45, 3=0.95              2-3=13.94
0,1,1,2        0=0.95, 1=0.65, 2=0.95      0-1=N/A, 1-2=28.18     2.97/.90   4.19/.95   84.3%
0,0,1,2        0=0.95, 1=0.45, 2=0.95      0-1=N/A, 1-2=11.36     2.49/.86   4.21/.95   89.8%
0,1,2,2        0=0.95, 1=0.30, 2=0.95      0-1=N/A, 1-2=3.18      2.65/.88   4.83/.96   90.5%

Note. ¹ ≥ .5 is acceptable. ² ≥ 1.4 is acceptable. PS & R = Person Separation & Reliability. IS & R = Item Separation & Reliability. RPCA = Rasch Principal Components Analysis.
After the iterative elimination of thirteen misfitting people, the final person and item separation and reliability findings were 2.97/.90 and 4.19/.95, respectively. No items failed to meet the cutoff for item fit; therefore, no items were eliminated. The final RPCA for the scale was also comparable to that of Group 1's RPCA, as 84.3 percent of the variance was accounted for by the FVOD scale's items. This means that the FVOD scale, using 14 items, can be separated into about six groups of items and identifies about four groups of people reliably. As was reported for the FVOD for Group 1, the hierarchy of FVOD item endorsement difficulty for Group 2 is presented in Figure 8.
Figure 8
FVOD Group 2 person-item map (Winsteps output). Items from most to least difficult to endorse: FVOD9 (doctor), FVOD7 (trouble w/law), FVOD12 (avoid withdrawal), FVOD3 (more aware), FVOD14 (treatment program), FVOD4 (sex), FVOD5 (help), FVOD13 (life), FVOD8 (really stoned), FVOD10 (activities), FVOD11 (aod).
Four pairs of items were aligned on the variable: FVOD12/FVOD3, FVOD14/FVOD4, FVOD13/FVOD8, and FVOD1/FVOD6. All of the items met the statistical standards for item fit and appear to measure different content; therefore, the items in each pair were retained. A side-by-side comparison of the Groups' respective item-endorsement difficulty indicated that eight of the scale's fourteen items remained constant on the hierarchy across Groups (see Table 7). The two items found most difficult to endorse by both Groups and the three items found easiest to endorse by both Groups remained consistent. However, the items around the means were not aligned across Groups.
Table 7
Item Hierarchy
Group 1 Group 2
Difficult to endorse
Easy to endorse
Symptoms Scale (SYM)
The initial review of the person and item separations and reliabilities findings indicated 1.28/.62 for persons and 4.90/.96 for items of the SYM scale, respectively. This means that the scale can be divided into six groups of items in terms of difficulty, but it does not measure the people in a reliable way. While the person separation does not meet the standard of 2.0, a person separation of 1.28 suggests that we may marginally distinguish between two groups of people, with significant error. The purpose of the SYM scale is to distinguish between two groups: those who have a high probability of substance dependence and those who do not.
Typically, the standard first step in conducting a Rasch analysis is to evaluate the
scale's items' range of responses. However, the SYM scale's items, as well as those of all
the other SASSI-3 scales, have only two response choices: true or false. Dichotomous
response options have an equal probability of being selected. Therefore, the review of the
response scales for the SYM and all subsequent scales was unnecessary. Step two of the
Rasch instrument validation analysis involved reviewing the person and item fit. Further evaluation was conducted in an effort to improve this scale's separation and reliability results. The researcher reviewed the fit statistics for the items and persons. This review led to a final iterative elimination of twelve people and two items that failed to meet the standards set forth for fit. These eliminations resulted in a decrease in function for the scale, as evidenced by the final person and item separations and reliabilities of 1.16/.57 for persons and 6.65/.98 for the remaining eight items on the SYM scale. This suggests a reasonably well-defined linear construct, but the construct does not do a good job of reliably measuring the people.
The third step in the analysis involved a review of the person-item map to explore the extrapolated construct. The resulting hierarchy of items ran from most difficult to endorse to least difficult to endorse (Figure 9).
Figure 9
SYM Group 1 person-item map (Winsteps output). Items shown from most to least difficult to endorse include Q55 (morning), Q35 (memory), Q54 (neglected), and Q56 (teenager).
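The person-item maps place persons and items on the same interval scale; under the dichotomous Rasch model used for the SYM and all other true/false SASSI-3 scales, the probability of endorsing an item depends only on the difference between the person's measure and the item's difficulty. A minimal sketch (the maps report rescaled units rather than raw logits, so the second example below uses logits purely for illustration):

```python
import math

def p_endorse(person_measure: float, item_difficulty: float) -> float:
    """Dichotomous Rasch model: P(X=1) = exp(B - D) / (1 + exp(B - D))."""
    return 1.0 / (1.0 + math.exp(-(person_measure - item_difficulty)))

# A person located at the same point as an item endorses it 50% of the time,
# which is why items far above the person distribution are rarely endorsed.
print(round(p_endorse(50.0, 50.0), 2))   # 0.5
# An item one logit above the person is endorsed less often.
print(round(p_endorse(0.0, 1.0), 2))     # 0.27
```

This is why the gap between the person mean and item mean on the maps translates directly into items being "more difficult to endorse than the people were able to agree to them."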
Group 1's item hierarchy is visually demonstrated in Table 8. The initial Rasch principal components analysis (RPCA) indicated that 63.7 percent of the total variance was explained by the instrument. With the elimination of one item (Q60 [Drink away from home]) from the SYM scale, the RPCA increased to 92.9 percent of the total variance explained by the instrument. This demonstrated improvement in the scale through the elimination of misfitting items. In addition, the item/person map means and standard deviations were close in proximity, indicating that the items were about as difficult to endorse as the people were able to agree to them.
The extrapolated variable was compared using the data from Group 2 by
following the same process in the analysis of the measurement function of the scale.
After the iterative elimination of sixteen misfitting people and two items, the final person and item separation and reliability findings increased to 1.33/.64 and 7.23/.98, respectively. No other items failed to meet the cutoff for item fit; therefore, no further items were eliminated. The final RPCA for the scale was also comparable, at 96.6 percent of the variance being accounted for by the items. This means that the SYM scale items can be divided into roughly ten groups. Because the separation of people is not greater than 2.0, the SYM scale does not reliably distinguish between those with a high probability of substance dependence and those without. As was presented for the SYM for Group 1, the hierarchy of SYM item endorsement difficulty for Group 2 is also presented on
Figure 10.
Figure 10
SYM Group 2 person-item map (Winsteps output). Items from most to least difficult to endorse: Q58 (into trouble), Q54 (neglected), Q59 (family problems), Q56 (teenager), Q42 (too often), Q40 (remember).
No items were aligned on the item map for Group 2. A side-by-side comparison of the Groups' respective item-endorsement difficulty indicated that seven of the scale's nine items remained constant on the hierarchy across Groups (see Table 8).
Table 8
Item Hierarchy
Difficult to endorse
Number
Easy to endorse
Eliminated items:
Q60 Drink away from home Q60 Drink away from home
Obvious Attributes Scale (OAT)
The initial review of the person and item separations and reliabilities findings indicated 1.16/.57 for persons and 4.89/.96 for items of the OAT scale, respectively. Like
the SYM, the OAT scale has only true and false possible response options. Therefore,
these response options had an equal probability of being selected. Based on this, the
review of the response scales was unnecessary. Step two of the Rasch instrument
validation analysis involved reviewing the person and item fit. Further evaluation was
conducted in an effort to improve this scale's separation and reliability results. Inspection
of the items and persons led to a final iterative elimination of seven people whose
responses were inconsistent. No items failed to meet the standards set forth for item fit,
and therefore, none were eliminated. The elimination of these seven people resulted in
the final person and item separations and reliabilities of 1.20/.59 for persons and 5.15/.96 for the twelve items on the OAT scale. These findings suggested a reasonably well-defined linear construct which can be divided into seven groups of items. However, the construct does not reliably distinguish any characteristic differences among the people.
The third step in the analysis involved a review of the person-item map to explore the extrapolated construct. The resulting hierarchy of items ran from most difficult to endorse to least difficult to endorse. When two items are aligned at the same place on the hierarchy, the items are either theoretically redundant or at the same level of difficulty. This means the items overfit. By way of reminder, overfitting items can be eliminated if the infit mean-square is below 0.6 and the z-standardized score is -2.0 or less. Despite appearing to measure the same theoretical content, one of the items from the aligned group of Q20 and Q54 could be eliminated because each item in this combination falls within the item fit standards and is at the same level of difficulty. A
visual representation of the item hierarchy can be viewed in Table 9. The initial Rasch principal components analysis (RPCA) indicated that 53.6 percent of the total variance was explained by the instrument. However, there was also an indication of three contrasts, which may point to the construct being multidimensional. With the elimination of misfitting people from the OAT scale, the RPCA increased to 60.3 percent of the total variance explained by the instrument, with the three underlying contrasts remaining. This demonstrated a minimal improvement in the OAT scale, which is just within the RPCA range of acceptability. In addition, on the item/person map the person and item means were separated by three items, with one item separating the upper standard deviations and two items separating the lower standard deviations of the persons and items. These distances indicate that the items were more difficult to endorse than the people were able to agree to them (see Figure 11).
Figure 11
OAT Group 1 person-item map (Winsteps output). Items from most to least difficult to endorse: Q23 (clever), Q17 (respectful), Q20 (disapproval), Q52 (resentful), Q7 (not lived), Q48 (punished), Q4 (police).
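The iterative elimination of misfitting people applied to each scale can be sketched as a loop: fit the model, flag persons whose fit statistics breach the standards, drop them, and refit until none remain. A minimal sketch with hypothetical fit statistics standing in for an actual Rasch estimation step:

```python
# Each person is (id, z-standardized fit, point-biserial); a person misfits
# when the ZSTD exceeds 2.0 or the point-biserial is negative, matching the
# person-fit standards quoted in this chapter.
def misfitting(person):
    _, zstd, pbis = person
    return zstd > 2.0 or pbis < 0.0

def iterative_elimination(persons, refit):
    """Repeatedly drop misfitting persons, recomputing fit after each pass."""
    while True:
        if not any(misfitting(p) for p in persons):
            return persons
        persons = refit([p for p in persons if not misfitting(p)])

# Hypothetical sample; refit is the identity here, whereas the real analysis
# would re-estimate measures and fit statistics in Winsteps after each pass.
sample = [(1, 0.3, 0.55), (2, 2.6, 0.40), (3, 1.1, -0.10), (4, -0.8, 0.62)]
kept = iterative_elimination(sample, refit=lambda ps: ps)
print([pid for pid, _, _ in kept])   # [1, 4]
```

Because removing a person changes everyone else's estimates, the real procedure re-estimates after each pass, which is why the eliminations reported in this chapter are described as iterative.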
The extrapolated variable was compared using the data from a second comparable group, following the same process in the final step of the analysis. The OAT scale, using Group 2 data, demonstrated similar person and item separation and reliability findings. After the iterative elimination of eight misfitting people and one item which failed to meet the standards for item fit, the final person and item separation and reliability findings increased to 1.21/.62 and 5.83/.97, respectively. Item Q48 (Rarely punished) and Q7 (Not lived) were aligned at the same place on the variable, which implied item redundancy. Item Q7 overfit, meaning it met the statistical standards for item elimination. However, this elimination reduced the person separation and reliability findings to 1.09/.54, while the item separation and reliability remained relatively constant at 5.81/.91. Because of this reduction in the person separation and reliability findings, Q7 was retained. The final RPCA for the scale was also comparable at 73.5 percent of the variance being accounted for by the items. The variance accounted for in the RPCA for Group 2 was substantially higher than the RPCA for Group 1 (a difference of 13.2%). This means that the OAT scale items can be divided into eight groups of difficulty, but the scale cannot reliably distinguish any differences among the people. As was presented for the OAT for Group 1, the hierarchy of OAT item endorsement difficulty for Group 2 is presented in Figure 12.
Figure 12
OAT Group 2 person-item map (Winsteps output). Items from most to least difficult to endorse: Q23 (clever), Q17 (respectful), Q53 (responsibilities), Q20 (disapproval), Q52 (resentful), Q39 (law), Q19 (leave home), Q48 (punished), Q7 (not lived), Q4 (police), Q11 (sitting still).
A side-by-side comparison of the two Groups' respective item-endorsement difficulty indicated that six of the scale's twelve items remained constant on the hierarchy across Groups (see Table 9). Five of the six consistent items were found by both Groups to be among the more difficult items to endorse.
Table 9
Item Hierarchy
Difficult to endorse
Easy to endorse
Eliminated items:
Subtle Attributes Scale (SAT)
Upon initial review of the person and item separations and reliabilities, the findings indicated .45/.17 for persons and 7.52/.98 for items of the SAT scale, respectively. The SAT items can be reliably divided into ten levels of difficulty. However, the SAT scale distinguishes no differences among the people. Because the SAT scale has only dichotomous response options, the review of the response scales was unnecessary. Step two of the Rasch instrument validation analysis involved reviewing the person and item fit. Further evaluation was conducted in an effort to improve this scale's separation and reliability results. Inspection of the items and persons led to a final iterative elimination of seven people whose fit statistics did not meet the 2.0 z-standardized criterion or had negative point-biserial values; twelve people in all were eliminated for misfitting. No items failed to meet the standards set forth for item fit, and therefore, none were eliminated. This resulted in the final person and item separations and reliabilities of .49/.20 for persons and 6.72/.98 for the eight items on the SAT scale. These findings represented a slight increase in person separation and reliability but a decrease in item separation. While these results suggest a reasonably well-defined linear construct, the construct fails to distinguish differences among the people.
The third step in the analysis involved a review of the person-item map to explore the extrapolated construct. The resulting hierarchy of items ran from most difficult to endorse to least difficult to endorse (Figure 13).
Figure 13
SAT Group 1 person-item map (Winsteps output). Items from most to least difficult to endorse: Q61, Q18, Q50, Q49, Q6, Q44, Q28.
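The RPCA "percent of variance explained" statistics reported throughout this chapter compare the variance reproduced by the Rasch measures with the total observed variance, the remainder being residual variance available for extra contrasts. A conceptual sketch only (this simplifies the actual Winsteps computation, and the observed and expected scores below are hypothetical):

```python
def percent_variance_explained(observed, expected):
    """Percent of total observation variance reproduced by the model-expected scores."""
    n = len(observed)
    mean_obs = sum(observed) / n
    total = sum((x - mean_obs) ** 2 for x in observed)          # total variance
    residual = sum((x - e) ** 2 for x, e in zip(observed, expected))  # unexplained
    return 100.0 * (total - residual) / total

# Hypothetical dichotomous observations and Rasch-expected scores.
obs = [0, 1, 0, 1, 1, 0, 1, 1, 0, 1]
exp = [0.1, 0.8, 0.2, 0.9, 0.7, 0.3, 0.8, 0.9, 0.1, 0.6]
print(round(percent_variance_explained(obs, exp), 1))   # 79.2
```

Removing misfitting people shrinks the residual term, which is why the RPCA figures quoted in this chapter rise after each round of eliminations.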
The initial Rasch principal components analysis (RPCA) indicated that 68.2 percent of the total variance was explained by the instrument. When the misfitting people were eliminated from the SAT, the RPCA increased to 92.8 percent of the total variance explained by the instrument, with no underlying contrasts remaining. This change in the variance accounted for demonstrated an improvement in the SAT scale. In addition, the item/person map means and standard deviations were separated by several items. This separation indicates a great distance between the endorsability of the items and the agreeability of the people. However, the items appeared to span the entire range of the variable.
In the final step of the analysis of the scale, the extrapolated variable was compared using the data from a second comparable group and the same process. The SAT scale, using Group 2 data, demonstrated similar person and item separation and reliability findings. After the iterative elimination of eleven misfitting people, the final person and item separation and reliability findings increased to .62/.28 and 8.32/.99, respectively. No items failed to meet the cutoff for item fit; therefore, no items were eliminated. The final RPCA for the scale was also comparable at 92.9 percent of the variance being accounted for by the items. As was presented for the SAT for Group 1, the hierarchy of SAT item endorsement difficulty for Group 2 is presented in Figure 14.
Figure 14
SAT Group 2 person-item map (Winsteps output). Items from most to least difficult to endorse: Q32, Q18, Q50, Q49, Q6, Q44, Q28.
A side-by-side comparison of the Groups' respective item-endorsement difficulty indicated that six of the scale's eight items remained constant on the hierarchy across Groups (see Table 10). The two items found to be the most difficult to endorse by both Groups were interchanged. This means that the SAT scale can be divided into eleven levels of difficulty. However, the scale does not discriminate differences among the people in Group 2.
Table 10
Item Hierarchy
Group 1 Group 2
Difficult to endorse
Easy to endorse
Supplemental Addiction Measure (SAM)
The initial review of the person and item separations and reliabilities findings indicated 1.06/.53 for persons and 3.01/.90 for items of the SAM scale, respectively. This means that the scale does a good job of distinguishing the items into four levels of difficulty, but it does not meet the minimum standard of a separation of 2.0 for people. The scale does not differentiate any groups of people in this Group. Due to the SAM having only true and false possible responses, the review of the response scales was unnecessary. Step two of the Rasch instrument validation analysis involved reviewing the person and item fit. Further evaluation was conducted in an effort to improve this scale's separation and reliability results. Inspection of the items and persons led to a final iterative elimination of 21 people and one item that failed to meet the standards set forth for person and item fit. This elimination process resulted in the final person and item separations and reliabilities of 1.32/.64 for persons and 4.17/.95 for the remaining ten items on the SAM scale. This suggests a reasonably well-defined linear construct. However, the construct does not reliably discriminate among the people in this Group.
The third step in the analysis involved a review of the person-item map to explore the extrapolated construct. The resulting hierarchy of items ran from most difficult to endorse to least difficult to endorse (Figure 15).
Figure 15
SAM Group 1 person-item map (Winsteps output). Items shown from most to least difficult to endorse include Q54 (neglected), Q46 (undesirable), and Q13 (worn out).
When two items are aligned at the same place on the hierarchy, the items are either theoretically redundant or at the same level of difficulty. In Rasch this means the items overfit. Overfitting items can be eliminated if the infit mean-square is below 0.6 and the z-standardized score is -2.0 or less. Despite appearing to measure the same theoretical content, one of the items from each of the aligned groups of Q16/Q9, Q42/Q7, and Q29/Q40 could be eliminated because all of the items in these combinations fall within the item fit standards and are at the same level of difficulty. A visual representation of the hierarchy for Group 1's SAM scale items is demonstrated in Table 11. The initial Rasch principal components analysis (RPCA) indicated that 26.8 percent of the total variance was explained by the instrument. Additionally, five underlying contrasts were indicated. With the elimination of 21 people and one item (Q5 [Made mistakes]) from the SAM scale, the RPCA increased to 47.4 percent of the total variance explained by the instrument, and the remaining contrasts were reduced to four. This demonstrated some improvement in the scale due to the elimination of the misfitting items and people. However, the minimal accepted standard for RPCA is greater than or equal to 60 percent. Even with the improvements, the SAM does not appear to function as a linear construct.
The extrapolated variable was compared using the data from a second comparable
group using the same process in the final step of the analysis. The SAM scale, using
Group 2 data, demonstrated similar person and item separation and reliability findings.
After the iterative elimination of twenty-three misfitting people and two items (Q16 and Q5) that failed to meet the standards for item fit, the final person and item separation and
reliability findings increased to 1.42/.69 and 5.45/.97, respectively. The final RPCA for the scale was also comparable at 71.6 percent of the variance being accounted for by the items, with no underlying contrasts. As was presented for the SAM for Group 1, the hierarchy of SAM item endorsement difficulty for Group 2 is also presented in Figure 16.
Figure 16
SAM Group 2 person-item map (Winsteps output). Items from most to least difficult to endorse: Q9 (daydream), Q39 (broken law), Q54 (neglected), Q48 (punished), Q42 (too often), Q4 (trouble), Q40 (couldn't remember), Q13 (worn out).
The item combination of Q46/Q40 was aligned at the same place on the variable. Neither of the items met both statistical standards for item elimination, and neither seemed to be measuring the same theoretical content. In addition, elimination of Q40, which had the highest overfit statistics, did not improve the scale: this change decreased the person separation and reliability and the RPCA to 1.32/.64 and 70.2 percent, respectively. Therefore, item Q40 remained in the hierarchy. However, the removal of item Q46 increased the scale's person and item separation and reliability findings as well as the RPCA (1.46/.68, 5.65/.97, and 77.5 percent, respectively). Therefore, item Q46 was removed due to redundancy and the improvement in the scale. A side-by-side comparison of the Groups' respective item-endorsement difficulty indicated that several of the scale's fourteen items remained constant on the hierarchy across Groups (Table 11). The item found to be the most difficult to endorse and the two items found to be the least difficult to endorse by both Groups were consistent across both scales. Yet the hierarchy developed from Group 1's data comprised thirteen items, while the hierarchy developed from Group 2's data comprised eleven items. This means that the SAM scale items can be divided into seven levels of difficulty, but the scale does not reliably discriminate differences among the people.
Table 11
Item Hierarchy
Group 1 Group 2
Difficult to endorse
Q4 Police trouble
Table 11 (Continued)
Easy to endorse
Eliminated items:
Q16 Wasn't up to it
Defensiveness Scale (DEF)
The review of the initial person and item separations and reliabilities findings indicated .84/.41 for persons and 5.64/.97 for items of the DEF scale, respectively. This
means that while the DEF scale items can be divided into seven levels of difficulty, they
do not reliably discriminate differences among the people. Because of the dichotomous
nature of the response options (true and false) both options have an equal probability of
being selected. Therefore, the review of the response scales was unnecessary.
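The separation, reliability, and "levels of difficulty" figures quoted throughout follow fixed Rasch relationships, so the DEF values above can be checked directly. This is an illustrative sketch for the reader, not part of the original analysis:

```python
import math

def reliability_from_separation(g):
    """Rasch relationship: reliability R = G^2 / (1 + G^2)."""
    return g ** 2 / (1 + g ** 2)

def separation_from_reliability(r):
    """Inverse relationship: G = sqrt(R / (1 - R))."""
    return math.sqrt(r / (1 - r))

def strata(g):
    """Statistically distinct levels along the variable: H = (4G + 1) / 3."""
    return (4 * g + 1) / 3

# DEF scale values reported above: items 5.64/.97, persons .84/.41
print(round(reliability_from_separation(5.64), 2))  # 0.97
print(round(reliability_from_separation(0.84), 2))  # 0.41
print(int(strata(5.64)))                            # 7 levels of item difficulty
```

The same arithmetic reproduces the other statements in this chapter: an item separation of 5.64 yields (4 × 5.64 + 1)/3 ≈ 7.9, truncated to the seven difficulty levels reported, while a person separation of .84 yields a reliability too low to distinguish groups of people.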
Step two of the Rasch instrument validation analysis involved reviewing the
person and item fit. Further evaluation was conducted in an effort to improve this scale's
separation and reliability results. Inspection of the items and persons led to a final
iterative elimination of ten people whose responses were inconsistent and two items (Q8
[Friendly] and Q25 [Dangerous]) which failed to meet the standards set forth for item fit.
This resulted in the final person and item separation and reliability findings of .93/.47 for
persons and 6.62/.98 for the remaining ten items on the DEF scale. While this suggests a
reasonably well defined linear construct, which can be divided into nine levels of
difficulty with high reliability (.98), the variable does not reliably discriminate any
differences among the people.
The third step in the analysis involved a review of the person-item map to explore
the extrapolated construct (Figure 17).
Figure 17
[Person-item map for the DEF scale, Group 1. Items shown include Q65 Restless, Q64 Happy, and Q31 No good; each '#' represents 3 persons.]
The resulting hierarchy of items resulted in a pattern from most difficult to endorse to
least difficult to endorse. The initial Rasch principal components analysis (RPCA)
indicated that 42.8 percent of the total variance was explained by the instrument.
Additionally, five underlying contrasts were indicated. Following the elimination of ten
people and two items from the DEF scale, the RPCA increased to 71.6 percent of the total
variance having been explained by the instrument and the number of remaining contrasts
was reduced to one. This increase in the RPCA demonstrated the improvement in the
DEF scale by eliminating misfitting items and people. In addition, the item/person map
means and standard deviations were separated by several items but span the length of the
variable. This indicates that the items were marginally as difficult to endorse as the
people were able to agree to them.
The extrapolated variable was compared using the data from a second comparable
group using the same process. The DEF scale, using Group 2 data, demonstrated similar
person and item separation and reliability findings as were found using Group 1's data.
After the iterative elimination of thirteen misfitting people and one item that failed to
meet the standards for item fit, the final person and item separation and reliability
findings improved. Although this improved the scale, the DEF scale still did not
distinguish the people in a reliable manner.
The final RPCA for the scale was also comparable at 80.5 percent of the variance being
accounted for by the items. As was presented for the DEF for Group 1, the hierarchy of
DEF item endorsement difficulty for Group 2 is also presented on Table 12. Items Q25
and Q9 were aligned on the variable for the data provided by Group 2. Further evaluation
of this alignment indicated that neither of the items met the statistical standards for
elimination, and the items appeared to measure different content areas, Q25 (Dangerous)
and Q9 (Don't like to daydream). It should be noted that in the hierarchy produced by the
data from Group 1, item Q25 was eliminated for misfitting. Eliminating item Q25 from
the hierarchy produced by the data from Group 2 improved the item separation and
reliability findings and the RPCA for the DEF scale to 7.45/.98 and 83.9 percent,
respectively. See Figure 18 for the DEF item map.
Figure 18
[Person-item map for the DEF scale, Group 2.]
A side-by-side comparison of the two Groups' respective item-endorsement difficulty
indicated that six of the scale's twelve items remained constant on the hierarchy across
Groups (Table 12). Three of the six consistent items were found by both Groups to be the
most difficult to endorse. The other three items were found by both Groups to be the least
difficult to endorse.
Table 12
Item Hierarchy
Group 1 Group 2
Difficult to endorse
Q9 Daydream Q8 Friendly
Q31 No good
Easy to endorse
Eliminated items:
Q8 Friendly Q1 Lie
Q25 Dangerous
Family versus Control Scale (FAM)
The initial review of the FAM scale's person and item separation and reliability
findings indicated .71/.33 for persons and 5.03/.96 for items, respectively. This means
that the FAM scale items can be divided into seven levels of difficulty but the scale did
not reliably distinguish any differences among Group 1. Since the FAM scale's items are
limited to true and false responses, both options have an equal probability of being
selected. Therefore, the review of the response scales was unnecessary. Step two of the
Rasch instrument validation analysis involved reviewing the person and item fit. Further
evaluation was conducted in an effort to improve this scale's separation and reliability
results. Inspection of the items and persons led to a final iterative elimination of fourteen
people whose responses were inconsistent and three items (Q27 [Too much], Q63 [Loss
for words], and Q8 [Friendly]) which failed to meet the standards set forth for item fit.
This resulted in the final person and item separation and reliability findings of 1.00/.50 for
persons and 5.71/.97 for the FAM scale's remaining thirteen items. These separation and
reliability findings suggest a reasonably well defined linear construct which can be
divided into seven levels of difficulty, but the construct still fails to reliably distinguish
differences among the people.
The third step in the analysis involved a review of the person-item map to explore
the extrapolated construct. The resulting hierarchy of items resulted in a pattern from
most difficult to endorse to least difficult to endorse. When two items are aligned at the
same place on the hierarchy, the items are either theoretically redundant or at the
same level of difficulty. In Rasch terms this means the items overfit. Overfitting items can be
eliminated if the infit mean-square is below 0.6 and the z-standardized score is -2.0 or
less. Despite not appearing to be measuring the same theoretical content and all of the
items in these combinations falling within the item fit standards, one of the items from
each of the aligned groups of Q25/Q9 and Q23/Q55 can be eliminated because they are at
the same level of difficulty. The initial Rasch principal components analysis (RPCA)
indicated that 35.9 percent of the total variance was explained by the instrument.
Additionally, three underlying contrasts were indicated. With the elimination of fourteen
people and three items from the FAM scale, the RPCA increased to 78.1 percent of the
total variance having been explained by the instrument with no remaining contrasts. This
increase demonstrated the improvement in the FAM scale achieved by eliminating
misfitting items and people. In addition, the item/person map means and standard deviations were separated by several
items but span the length of the variable. This indicates that the items were easier to
endorse than the people were able to agree to them (see Figure 19).
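The elimination rule for overfitting items stated earlier (infit mean-square below 0.6 with a z-standardized score of -2.0 or less) can be expressed as a simple predicate. The item statistics below are hypothetical, for illustration only:

```python
def is_overfit(infit_mnsq, infit_zstd):
    """Overfit rule used in this analysis: infit MNSQ < 0.6 AND ZSTD <= -2.0.
    Overfitting items are candidates for elimination as redundant."""
    return infit_mnsq < 0.6 and infit_zstd <= -2.0

# hypothetical statistics for two aligned items
print(is_overfit(0.55, -2.3))  # True: eligible for elimination
print(is_overfit(0.72, -1.4))  # False: retained
```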
Figure 19
[Person-item map for the FAM scale, Group 1. Items shown include Q1 Lie, Q54 Neglected, Q65 Restless, Q39 Broken law, Q50 Energy, Q23 Crooks, Q55 Steady, and Q61 Antacid; each '#' represents 3 persons.]
The extrapolated variable was compared using the data from a second comparable
group using the same process in the final step of the analysis. The FAM scale, using
Group 2 data, demonstrated similar person and item separation and reliability findings.
After the iterative elimination of six misfitting people, the final person and item
separation and reliability findings changed to .43/.16 and 5.41/.97, respectively. No items
failed to meet the cutoff for item fit; therefore, no items were eliminated. The final RPCA
for the scale was also comparable at 40 percent of the variance being accounted for by the
items. As was presented for the FAM for Group 1, the hierarchy of FAM item
endorsement difficulty for Group 2 is also presented on Table 13. Items Q25 and Q9 were
again aligned on the same variable for the hierarchy created by the data from Group 2,
which indicates redundancy. In addition, the item combination of Q39 and Q54 was also
aligned on the hierarchy. Item Q54 fit statistics indicated that the item overfit. However,
elimination of the item drastically decreased the person separation and reliability findings
while only narrowly increasing the item separation and reliability findings and RPCA, to
.19/.04, 5.54/.97, and 40.4 percent, respectively. Therefore, the item Q54 remained in the
scale, as its removal was found to drastically reduce the ability of the instrument to
discriminate between the people, and the item also appeared to measure a different
content area (Figure 20).
Figure 20
[Person-item map for the FAM scale, Group 2. Items shown include Q23 Crooks and Q61 Antacid; each '#' represents 4 persons.]
A side-by-side comparison of the Groups' respective item-endorsement difficulty
indicated that six of the scale's fifteen items remained constant on the hierarchy across
Groups (Table 13). The two items found by both Groups to be the most difficult to
endorse (Q25 [Dangerous] and Q9 [Daydream]) and the item least difficult to endorse
(Q61 [Antacid]) were consistent, as was a set of three items (Q3 [Go along with], Q38
[Feel sure], and Q50 [Full of energy]) which were clustered around the mean for both
hierarchies. This means that the FAM scale fails to work in terms of discriminating
differences among the people and also fails to account for an acceptable portion of the
variance.
Table 13
Item Hierarchy
Group 1 Group 2
Difficult to endorse
Q61 Antacid
Easy to endorse
Eliminated items:
Table 13 (Continued)
Correctional Scale (COR)

The initial review of the person and item separation and reliability findings
indicated 1.10/.55 for persons and 5.35/.97 for items of the COR scale, respectively. The
COR scale items can be divided into seven levels of difficulty but it cannot reliably
discriminate differences among the people. The COR scale has true and false responses.
Therefore, the review of the response scales was unnecessary. Step two of the Rasch
instrument validation analysis involved reviewing the person and item fit. Inspection of
the items and persons led to a final iterative elimination of six people and one item (Q1
[Lie]) which failed to meet the standards set forth for item fit. This resulted in the final
person and item separation and reliability findings of 1.13/.56 for persons and 5.84/.97 for
the remaining eleven items on the COR scale. While these separation and reliability
findings suggest a reasonably well defined linear construct, the variable does not reliably
distinguish differences among the people.
The third step in the analysis involved a review of the person-item map to explore
the extrapolated construct. The resulting hierarchy of items resulted in a pattern from
Figure 21
[Person-item map for the COR scale, Group 1. Items shown include Q18 Obey, Q31 No good, Q24 Teachers, Q39 Never broken law, Q19 Leave home, and Q42 Too often; each '#' represents 3 persons.]
The initial RPCA indicated that 60.1 percent of the total variance was explained by the
instrument. Additionally, two underlying contrasts were indicated. With the elimination
of six people and one item from the COR scale, the RPCA increased to 85.1 percent of
the total variance having been explained by the instrument with no remaining contrasts.
This demonstrated improvement in this scale by eliminating misfitting items and people.
In addition, the item/person map means and standard deviations were separated by
several items but span the length of the variable. This indicates that the items were more
difficult to endorse than the people were able to agree to them.
The extrapolated variable was compared using the data from a second comparable
group using the same process in the last step of the analysis. The COR scale, using Group
2 data, demonstrated similar person and item separation and reliability findings. After the
iterative elimination of three misfitting people and one item (Ql), the final person and
item separation and reliability findings increased to 1.28/.62 and 6.17/.97, respectively.
Using the data from Group 2, two combinations of items aligned at the same place on the
hierarchy, Q42/Q7 and Q36/Q40. By examining the item fit statistics for the aligned
items, only one item appeared to overfit statistically, and all of the items appeared to
measure different content. There was no improvement for this scale despite the removal
of the overfitting item (Q42): the person and item separation and reliability findings and
RPCA declined to 1.12/.56, 6.16/.97, and 80.6 percent, respectively. Therefore, item Q42
remained in the hierarchy. The final RPCA for the scale was also comparable at 82.8
percent of the variance being accounted for by the items. As was presented for the COR
for Group 1, the hierarchy of COR item endorsement difficulty for Group 2 is also
presented on Table 14.
Figure 22
[Person-item map for the COR scale, Group 2. Items shown include Q18 Obey, Q31 No good, Q24 Teachers, Q39 Never broken law, Q19 Leave home, Q42 Too often, Q7 Trouble, Q36 Hit people, Q40 Couldn't remember, and Q41 Think; each '#' represents 3 persons.]
A side-by-side comparison of the Groups' respective item-endorsement difficulty
indicated that seven of the scale's remaining eleven items remained constant on the
hierarchy across Groups (Table 14). All seven of these items were among those found to
be among the most difficult to be endorsed by both Groups. This means that the COR
scale items can be divided into eight difficulty levels, but the scale does not do an
adequate job of discriminating differences among the people.
Table 14
Item Hierarchy
Group 1 Group 2
Difficult to endorse
Table 14 (Continued)
Easy to endorse
Eliminated items:
Q1 Lie Q1 Lie
Random Answering Pattern Scale (RAP)

The initial review of the person and item separation and reliability findings
indicated 0.00/0.00 for persons and 3.63/.93 for items of the RAP scale, respectively. This
means that the RAP scale does not distinguish any differences among Group 1, but
initially the items can be divided into five levels of difficulty. The RAP has true and false
response options, so both have an equal probability of being
selected. Therefore, the review of the response scales was unnecessary. Step two of the
Rasch instrument validation analysis involved reviewing the person and item fit.
Inspection of the items and persons led to a final iterative elimination of twelve
misfitting people who failed to meet the statistical standards for fit. No items failed to
meet the standards set forth for item fit. This resulted in the final person and item
separations and reliabilities of 0.00/0.00 for persons and 0.00/0.00 for the RAP scale.
This indicated no change for items and a decrease in reliability and separation for
persons. This suggests that the RAP scale is functioning as developed, which is in a
random manner. The third step in the analysis involved a review of the person-item map
to explore the extrapolated construct. There was no resulting hierarchy of items due to the
nature of the scale being random. The initial RPCA indicated that 48 percent of the total
variance was explained by the instrument. Additionally, two underlying contrasts were
indicated. With the elimination of twelve misfitting people from the RAP scale, the
RPCA decreased to 1.4 percent of the total variance having been explained by the
instrument, with two remaining contrasts which accounted for 98.6 percent of the
unexplained variance. The RPCA also indicated that the RAP scale was functioning as
designed, in a random manner.
The extrapolated variable was compared using the data from a second comparable
group using the same process in the last step of the analysis. The RAP scale, using Group
2 data, demonstrated similar person and item separation and reliability findings. After the
iterative elimination of fourteen misfitting people, the final person and item separation
and reliability findings decreased to .00/.00 and .00/.00, respectively. The final RPCA for
the scale was also comparable at 18.6 percent of the variance being accounted for by the
items with three additional contrasts. As with the RAP scale using data from Group 1, the
RAP scale using data from Group 2 had similar results. This means that the RAP scale
functions in a random manner, as designed.
Dichotomous SASSI-3
The dichotomous SASSI-3 scoring rubric classifies the respondent as either having or not
having a substance dependence disorder. This rubric requires the clinician to reference
the respondent's scores on eight of the SASSI-3's ten subscales. The first five of these nine
steps require the scorer to reference the individual SASSI-3 subscales. The remaining four
steps are a function of two or more subscales used in combination. In total, these nine steps,
involving eight of the ten subscales, employ only 70 of the SASSI-3's total of 93 items.
The initial person and item separation and reliability findings for the dichotomous SASSI-3 were
3.54/.93 and 5.60/.97, respectively. Additionally, the initial RPCA was 59.7 percent with one
contrast accounting for more than 5 percent of additional variance. Viewed cumulatively,
all of these findings suggest that the dichotomous SASSI-3, according to the Rasch
analysis, can be said to be a logical linear construct in which the items can be divided into
seven levels of difficulty and which discriminates five levels of differences among the
people. However, it is just under the required 60 percent cutoff of explained variance for
unidimensionality. To determine whether improvements could be made, the researcher
conducted analyses of the dichotomous SASSI-3 scale's properties.
The first step in a Rasch analysis was to evaluate the response scales. Because the
dichotomous SASSI-3 involves items from both the front and back of the instrument,
with both true/false and Likert-type response options, it is important to evaluate the
response scales for validity. Inspection of the probability curve and thresholds indicated that response
options 1-Once or Twice and 2- Several times did not meet the standards for cutoffs for
the face valid response scales as identified earlier (see Figure 23).
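The probability curves being inspected derive from the Rasch model; for the dichotomous (true/false) items the model reduces to a single logistic expression. A minimal sketch with illustrative logit measures:

```python
import math

def rasch_prob(person_measure, item_difficulty):
    """Dichotomous Rasch model: P(X = 1) = exp(B - D) / (1 + exp(B - D)),
    where B is the person measure and D the item difficulty (in logits)."""
    return math.exp(person_measure - item_difficulty) / (
        1 + math.exp(person_measure - item_difficulty))

# a person located exactly at an item's difficulty endorses it half the time
print(round(rasch_prob(1.0, 1.0), 2))  # 0.5
# a person one logit above the item endorses it about 73% of the time
print(round(rasch_prob(2.0, 1.0), 2))  # 0.73
```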
Figure 23
[Probability curves and thresholds for the dichotomous SASSI-3 face valid response options, Group 1.]
The collapsing strategy was applied to the response scales of the face valid scales as a
whole rather than to the FVA and FVOD scales separately. Therefore, employing the
same collapsing strategy for both the FVA and FVOD response scales resulted in a
positive increase in the item separation and reliability to 5.72/.97 (see Table 15). No
additional examination was warranted because the response options for the remaining
scales are true and false.
Table 15
Summary of Collapsing Strategy for Dichotomous Group 1 Face Valid Response Options

Rating Scale   Probability Curve1                 Threshold2                 PS&R      IS&R      RPCA
0,1,2,3        0 = 0.95, 1 = 0.20,                0-1 = N/A, 1-2 = 9.29,     3.54/.93  5.60/.97  59.7%
               2 = 0.30, 3 = 0.95                 2-3 = 3.61
0,1,1,2        0 = 0.95, 1 = 0.40, 2 = 0.95       0-1 = N/A, 1-2 = 6.04      3.35/.92  5.72/.97  49.3%
0,0,1,2        0 = 0.95, 1 = 0.20, 2 = 0.95       0-1 = N/A, 1-2 = 11.44     3.06/.90  5.75/.97  58.1%
0,1,2,2        0 = 0.95, 1 = 0.20, 2 = 0.95       0-1 = N/A, 1-2 = 14.49     3.43/.92  5.43/.97  53.1%

Note. 1 = ≥ .5 is acceptable. 2 = ≥ 1.4 is acceptable. PS&R = Person Separation & Reliability. IS&R = Item Separation & Reliability.
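The collapsing strategies compared in Table 15 amount to recoding the four face valid response options before re-estimation; for example, the 0,1,1,2 strategy merges 1-Once or twice with 2-Several times into a single middle category. A sketch of the recode, on hypothetical responses:

```python
# 0,1,1,2 strategy: the two middle options collapse into one category
COLLAPSE_0112 = {0: 0, 1: 1, 2: 1, 3: 2}

# hypothetical raw face valid responses for one person
raw = [0, 1, 2, 3, 2, 0]
recoded = [COLLAPSE_0112[r] for r in raw]
print(recoded)  # [0, 1, 1, 2, 1, 0]
```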
Figure 24 depicts the corrected response scale in which the middle two response
options were collapsed.
Figure 24
[Corrected category probability curves for the dichotomous SASSI-3 face valid items, Group 1, after collapsing the middle two response options.]
Step two of the Rasch instrument validation analysis involved reviewing the
person and item fit. Further evaluation was conducted for the dichotomous SASSI-3 to
determine whether the separation and reliability results could be improved. Inspection of
the items and persons led to a final iterative elimination of 25 people whose responses
were inconsistent and eighteen items which failed to meet the standards set forth for item
fit. This resulted in the final person and item separations and reliabilities of 3.32/.92 and
5.50/.92, respectively, for the dichotomous SASSI-3. This suggests that the resulting
items formed a well defined linear construct that does a good job in measuring the
people.
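The "standards set forth for item fit" rest on infit and outfit mean-squares computed from standardized residuals. A sketch of the standard definitions, using hypothetical dichotomous responses and model probabilities rather than the study data:

```python
import numpy as np

def fit_mean_squares(observed, expected):
    """Outfit: unweighted mean of squared standardized residuals.
    Infit: information-weighted mean, with weights = model variance p(1 - p)."""
    variance = expected * (1 - expected)
    z_squared = (observed - expected) ** 2 / variance
    outfit = z_squared.mean()
    infit = (z_squared * variance).sum() / variance.sum()
    return infit, outfit

# hypothetical responses and Rasch expected probabilities for one item
obs = np.array([1, 0, 1, 1, 0], dtype=float)
exp = np.array([0.8, 0.3, 0.6, 0.9, 0.2])
infit, outfit = fit_mean_squares(obs, exp)
print(round(infit, 2), round(outfit, 2))  # 0.4 0.34
```

Values near 1.0 indicate responses consistent with the model; values well below the 0.6 cutoff flag overfit, and large values flag misfit.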
The third step in the analysis of the dichotomous SASSI-3 involved a review of
the person-item map to explore the extrapolated construct. The resulting hierarchy of
items produced a pattern from most difficult to endorse to least difficult to endorse,
which is available for visual review on Table 19. Items aligned at the same position
on the variable may imply redundancy. Therefore, the item fit statistics for the pairs of
items were reviewed to identify which items fit best. The least best fitting item of each pair
was eliminated. In addition, because the items from the face valid scales were found to
discriminate differences among people, these items were analyzed last for elimination as
they seem to contribute the most to the effectiveness of the instrument. Table 16 lays out
the fit statistics for the aligned item pairs.
Table 16

      Infit           Outfit
      MNSQ   ZSTD     MNSQ   ZSTD
Q4    1.06   .7       1.04   .3

Note. * = better fitting item of the pair. Items with no * were eliminated. MNSQ = mean-square. ZSTD = z-standardized.
The elimination of items Q17, Q35, Q42, Q29, and Q48 resulted in an increase in person
and item separation and reliability findings of 3.21/.91 and 5.62/.97, respectively, and a
RPCA of 97.8 percent. The above process was repeated until no further improvements
were made in person and item separation and reliabilities. Finally, with the elimination of
29 misfitting items and 25 misfitting people, the resulting 41-item dichotomous SASSI-3
scale had person and item separation and reliability findings of 3.06/.90 and 5.63/.97,
respectively.
Figure 25
[Person-item map for the 41-item dichotomous SASSI-3, Group 1. Items at the top (most difficult to endorse) include FVA10 Relationship, FVA11 Nervous/shakes, FVA6 Trouble, FVOD12 Avoid withdrawal, FVOD3 More aware, and Q55.]
The improvement was achieved in this scale by adjusting the response scale and
eliminating misfitting people and items. In addition, the item/person map means and
standard deviations were separated by nearly one standard deviation, indicating that the
items were more difficult to endorse than the people were able to agree to them.
In the final step of the analysis, the extrapolated variable was compared using the
data from a second comparable group using the same process. The dichotomous SASSI-3,
using Group 2 data, demonstrated similar person and item separation and reliability
findings. As with Group 1, the response options were not being used as intended (see
Figure 26).
Figure 26
[Category probability curves for the dichotomous SASSI-3 face valid items, Group 2, showing the original response options.]
A collapsing strategy was developed by reviewing the thresholds and probability curves
(see Table 17). This strategy led the researcher to combine the two middle response
options: 1-Once or Twice and 2-Several times. This combination allowed for a better
functioning response scale and an increase in the item separation and reliability findings.
Table 17
Summary of Collapsing Strategy for Dichotomous Group 2 Face Valid Response Options

Rating Scale   Probability Curve1                 Threshold2                 PS&R      IS&R      RPCA
0,1,2,3        0 = 0.95, 1 = 0.20,                0-1 = N/A, 1-2 = 9.64,     3.53/.93  6.15/.97  62.2%
               2 = 0.30, 3 = 0.95                 2-3 = 6.29
0,1,1,2        0 = 0.95, 1 = 0.45, 2 = 0.95       0-1 = N/A, 1-2 = 8.32      3.39/.92  6.22/.97  56.7%
0,0,1,2        0 = 0.95, 1 = 0.25, 2 = 0.95       0-1 = N/A, 1-2 = 8.66      3.11/.91  6.22/.97  61.4%
0,1,2,2        0 = 0.95, 1 = 0.20, 2 = 0.95       0-1 = N/A, 1-2 = 14.44     3.42/.92  5.96/.97  56.7%

Note. 1 = ≥ .5 is acceptable. 2 = ≥ 1.4 is acceptable. PS&R = Person Separation & Reliability. IS&R = Item Separation & Reliability.
The corrected response option curves for Group 2 are presented in Figure 27.
Figure 27
[Corrected category probability curves and structure for the dichotomous SASSI-3 face valid items, Group 2.]
There were 35 misfitting people and 18 items that failed to meet the standards for
item fit. The elimination of these people and items increased the final person and item
separation and reliability results to 3.45/.92 and 6.11/.97, respectively. The item map
identified five pairs of aligned items. These items were considered for elimination (see
Table 18).
Table 18
Infit Outfit
While Group 2's final RPCA for the scale increased to 85.5 percent of the variance being
accounted for by the items, the person and item separation and reliability findings
decreased to 3.17/.91 and 5.78/.97, respectively. Therefore, these items remained in the
instrument. As was presented for the Dichotomous SASSI-3 for Group 1, the hierarchy of
Figure 28
[Person-item map for the dichotomous SASSI-3, Group 2.]
A side-by-side comparison of the Groups' respective item-endorsement difficulty
indicated that the scale maintained some of its consistency on the hierarchy of item
difficulty across groups (see Table 19). This means that the dichotomous SASSI-3's
items can be divided into eight levels of difficulty with high (.97) reliability. Further,
these items distinguish four levels of differences among the groups with high (.91)
reliability.
Table 19
Item Hierarchy
Group 1 Group 2
Difficult to endorse
Table 19 (Continued)
Table 19 (Continued)
Q7 Not lived
Q4 Police trouble
Easy to endorse
Eliminated items:
Q1 Lie Q1 Lie
Table 19 (Continued)
Q8 Friendly Q9 Daydream
Q49 Cigarettes
Table 19 (Continued)
Q52 Resentful
Q61 Antacid
Q64 Happy
Q65 Restless
The RPCA analyses indicated that the following scales were unidimensional in
structure (i.e., each scale accounted for equal to or greater than 60 percent of the scale's
total variance): FVA, FVOD, SYM, OAT, SAT, DEF, FAM, COR, and the dichotomous
SASSI-3 scale. The SAM and the RAP scales' RPCA failed to meet the minimum 60
percent criterion. Therefore, the researcher rejected Research Hypothesis 1.
The following scales' item fit produced infit and outfit statistics indicative of low
item error: FVA, FVOD, SAM, SYM, OAT, SAT, DEF, FAM, COR, and the
dichotomous SASSI-3 scale. The RAP scale's items did not meet the acceptable
standards for item fit. Therefore, the researcher rejected Research Hypothesis 2.
The following scales produced reliability statistics indicative of acceptable
internal consistency: FVA, FVOD, OAT, SAT, SAM, DEF, FAM, COR, and the
dichotomous SASSI-3 scale. The RAP scale did not produce acceptable reliability
statistics for internal consistency. Therefore, the researcher rejected Research Hypothesis
3a.
The following scales remained reliably defined across samples: FVA, FVOD,
SYM, OAT, SAT, SAM, DEF, FAM, COR, RAP, and dichotomous SASSI-3. Therefore,
the researcher did not reject this hypothesis.
The following scales demonstrated high discriminatory ability: FVA, FVOD, and
the dichotomous SASSI-3 scale. The SYM, OAT, SAT, SAM, DEF, FAM, COR, and
RAP did not demonstrate discriminatory ability. Therefore, the researcher rejected
Research Hypothesis 4.
Whole SASSI-3
The SASSI-3 has a total of 93 items. Eleven of the SASSI-3's 93 items are not
used on any of the ten scales. Twenty-six of these 93 items load on more than one scale.
While the 26 shared items each have dichotomous response options, nine are keyed true
on at least one of the scales and false on another (see Table 20). Items that do not fall in
the same direction or cannot be coded as such are deemed to be misfitting. While there is
a key indicating the expected or "correct" response as identified by the authors of the
SASSI-3, twenty items either have opposite correct answers on two different scales or
have no correct answer listed. Because this creates interdependence and artificially high
intercorrelations, it was expected that many of these items would appear to be redundant
or misfit.
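The interdependence created by shared, oppositely keyed items can be illustrated with simulated data: a single item keyed true on one scale and false on another builds a spurious negative component into otherwise independent scale totals. This is a sketch with hypothetical data, not the SASSI-3 items themselves:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000  # simulated respondents

shared = rng.integers(0, 2, size=n)                    # one shared item
scale_a_rest = rng.integers(0, 2, size=(n, 5)).sum(axis=1)
scale_b_rest = rng.integers(0, 2, size=(n, 5)).sum(axis=1)

scale_a = scale_a_rest + shared        # item keyed true on scale A
scale_b = scale_b_rest + (1 - shared)  # same item keyed false on scale B

# the scales share no real construct, yet correlate through the shared item
r = np.corrcoef(scale_a, scale_b)[0, 1]
print(round(r, 2))
```

With independent remaining items, the shared item alone drives the observed correlation, which is the interdependence the text expects to surface as redundancy or misfit in the Rasch results.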
Table 20
Table 20 (Continued)
Note. * = item's response options are true on at least one scale and false on another.
The initial person and item separation and reliability findings for the SASSI-3 were
3.29/.92 and 5.73/.97, respectively. Additionally, the initial RPCA was 54.6 percent with
only one contrast that accounted for more than 5 percent of additional variance.
Combined, these findings suggest that the SASSI-3 is a logical linear construct that meets
the minimum standards of distinguishing differences among the sample. However, the
instrument is multidimensional and accounts for less than the accepted level of 60 percent
of the total variance. This suggests that the SASSI-3 is measuring more than just one
construct, and whatever additional constructs are being measured account for a portion of
the remaining variance. Further analyses were conducted to determine whether
augmentations could be made that would render the SASSI-3 a unidimensional
instrument. Because the SASSI-3 involves items from both the front and back of the
instrument, with both true/false and Likert-type response options, it is important to
evaluate the response scales for validity. Inspection of the probability curves and
thresholds indicated that response options 1-Once or twice and 2-Several times (Figure 29)
did not meet the standards for cutoffs for the face valid response scales as identified
earlier.
Figure 29
[Category probability curves and structure for the whole SASSI-3 face valid response options, Group 1.]
Table 21
Summary of Collapsing Strategy for Whole SASSI-3 Face Valid Response Options

Rating Scale   Probability Curve1                 Threshold2                 PS&R      IS&R      RPCA
0,1,2,3        0 = 0.95, 1 = 0.20,                0-1 = N/A, 1-2 = 10.43,    3.29/.92  5.72/.97  77.8%
               2 = 0.25, 3 = 0.95                 2-3 = 2.44
0,1,1,2        0 = 0.95, 1 = 0.40, 2 = 0.95       0-1 = N/A, 1-2 = 3.84      3.00/.90  5.74/.97  49.7%
0,0,1,2        0 = 0.95, 1 = 0.20, 2 = 0.95       0-1 = N/A, 1-2 = 13.48     2.78/.89  5.71/.97  53%
0,1,2,2        0 = 0.95, 1 = 0.20, 2 = 0.95       0-1 = N/A, 1-2 = 16.54     3.12/.91  5.70/.97  52.1%

Note. 1 = ≥ .5 is acceptable. 2 = ≥ 1.4 is acceptable. PS&R = Person Separation & Reliability. IS&R = Item Separation & Reliability.
Therefore, the collapsing strategy for the face valid response scales positively increased
the item separation and reliability to 5.74/.97 (Figure 30). As stated above, because the
response options for the other scales are true and false, no additional examination was
warranted.
Figure 30
[Corrected category probability curves for the whole SASSI-3 face valid items, Group 1, after collapsing the response scales.]
Step two of the Rasch instrument validation analysis involved reviewing the
person and item fit. Inspection of the items and persons led to a final iterative
elimination of 22 people and 20 items which failed to meet the standards set forth for
item fit. This resulted in the final person and item separation and reliability findings of 3.82/.94
for persons and 5.63/.97 for the SASSI-3 with an RPCA of 69.2 percent. This suggests
that the remaining items form a well defined linear construct that can be divided into
seven levels of difficulty, and it does a reliable (.97) job in discriminating five different
groups among the people from low to high agreeability on the hierarchy of items.
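The RPCA percentages quoted here (such as the 69.2 percent just reported) express the share of total observed variance captured by the Rasch measures, with the residual variance examined for contrasts. A toy sketch of the computation on simulated data, not the SASSI-3 responses:

```python
import numpy as np

rng = np.random.default_rng(0)

# simulated data: a linear item-difficulty structure plus random noise
measures = np.tile(np.linspace(-2, 2, 10), (200, 1))    # modeled component
observed = measures + rng.normal(0, 1, size=(200, 10))  # observed scores
residuals = observed - measures

# variance explained by measures = 1 - (residual variance / total variance)
explained = 1 - residuals.var() / observed.var()
print(f"{explained:.1%}")
```

When the residual variance shrinks, as it does after eliminating misfitting people and items, the explained share rises toward the 60 percent unidimensionality criterion used in this study.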
The third step in the analysis involved a review of the person-item map to explore the extrapolated construct. The resulting hierarchy produced a pattern of items ranging from most difficult to endorse to least difficult to endorse (see Table 22). No items shared the same position on the scale. Group 1's final RPCA indicated that 69.2 percent of the total variance was explained by the SASSI-3's remaining 73 items. This improvement was achieved by adjusting the response scale and eliminating misfitting people and items. In addition, the item and person means on the item/person map were separated by nearly one standard deviation (Figure 31). This separation indicates that the items were, on average, more difficult to endorse than the sample's average level on the variable.
Figure 31

[Person-item map (Wright map) for the whole SASSI-3, Group 1: persons and items are arrayed on a common logit scale from most difficult to endorse at the top (e.g., FVA8-argued, Q31 no good, Q53 responsibilities, FVA7-depressed, FVOD8-really stoned) to least difficult to endorse at the bottom (e.g., Q27 drunk too much, Q14 moving), with M, S, and T marking the mean and one and two standard deviations of each distribution. Each '#' represents two persons.]
In the final step of the analysis, the extrapolated variable was compared using the data from a second comparable group and the same process. The SASSI-3, using Group 2 data, demonstrated similar person and item separation and reliability findings, 3.57/.93 and 5.26/.97, respectively, and an RPCA of 60.9 percent. As with Group 1, the response scale's category structure was reviewed first (Figure 32).
Figure 32

[Winsteps category structure output for the original Group 2 face valid response scale: category label, measure, and standard error, with score-to-measure zones, 50% cumulative probability, coherence (M->C, C->M), and estimated discrimination for each category.]
The researcher developed a collapsing strategy after reviewing the thresholds and probability curves. The two middle response options, 1-Once or Twice and 2-Several Times, were not clearly distinguished by respondents and were collapsed (Figure 33).
Figure 33

[Winsteps category structure and probability curves for the collapsed Group 2 face valid response scale: the probability of each response category (0, 1, 2) is plotted against the person-minus-item measure. The observed average reported in the output is the mean of measures in each category, not a parameter estimate.]
This allowed for a better functioning response scale and an increase in the person and item separation and reliability findings (Table 21).
Table 21
Summary of Collapsing Strategy for Whole SASSI-3 Group 2 Face Valid Response Options

Scale     Rating Probability Curve¹                         Threshold²                              PS&R       IS&R       RPCA
0,1,2,3   0 = 0.95; 1 = 0.20; 2 = 0.30; 3 = 0.95            0-1 = N/A; 1-2 = 9.84; 2-3 = 5.16       3.57/.93   5.26/.97   60.9%
0,1,1,2   0 = 0.95; 1 = 0.40; 2 = 0.95                      0-1 = N/A; 1-2 = 6.92                   3.32/.92   5.25/.97   58%
0,0,1,2   0 = 0.95; 1 = 0.25; 2 = 0.95                      0-1 = N/A; 1-2 = 9.54                   3.11/.91   5.23/.96   59.9%
0,1,2,2   0 = 0.95; 1 = 0.20; 2 = 0.95                      0-1 = N/A; 1-2 = 15.34                  3.37/.92   5.25/.97   60.6%
Note. 1 = ≥ .5 is acceptable. 2 = ≥ 1.4 is acceptable. PS&R = Person Separation & Reliability. IS&R = Item Separation & Reliability. RPCA = Rasch principal components analysis.
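The collapsing strategies summarized in Table 21 (and in the corresponding Group 1 summary) amount to recoding the original four response categories into fewer ordered categories before re-estimating the model. A minimal sketch of that recoding step; the response data are invented, and the category labels follow the text (0 = Never, 1 = Once or Twice, 2 = Several Times, 3 = the most frequent option):

```python
# Each strategy maps the original categories 0-3 onto fewer ordered categories.
# "0,1,1,2" means: 0 -> 0, 1 -> 1, 2 -> 1, 3 -> 2 (the two middle options merge).
STRATEGIES = {
    "0,1,2,3": {0: 0, 1: 1, 2: 2, 3: 3},  # original scale, no collapsing
    "0,1,1,2": {0: 0, 1: 1, 2: 1, 3: 2},
    "0,0,1,2": {0: 0, 1: 0, 2: 1, 3: 2},
    "0,1,2,2": {0: 0, 1: 1, 2: 2, 3: 2},
}

def collapse(responses, strategy):
    """Recode a list of 0-3 responses under a named collapsing strategy."""
    mapping = STRATEGIES[strategy]
    return [mapping[r] for r in responses]

# Invented responses to one face valid item
raw = [0, 1, 2, 3, 2, 1]
recoded = collapse(raw, "0,1,1,2")  # -> [0, 1, 1, 2, 1, 1]
```

The recoded data are then refit, and the thresholds, probability curves, and separation statistics in the table are compared across strategies to pick the best-functioning scale.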
After the iterative elimination of the 13 misfitting people and the 22 items that failed to meet the standards for fit, Group 2's final person and item separation and reliability findings improved to 4.12/.94 and 5.10/.96, respectively. The final RPCA for the scale was also comparable, with 74.8 percent of the variance accounted for by the items. This means that the items can be divided into seven levels of difficulty that discriminate between five groups of people in the sample, with high reliability (.94 and .96, respectively). As was reported for Group 1, the hierarchy of item endorsement difficulty for Group 2 is also presented in Table 22. A side-by-side comparison of the groups' respective item-endorsement difficulties indicated that the scale maintained some of its consistency in the hierarchy across groups. Of the 20 items deleted from Group 1's hierarchy, all but one were also deleted from Group 2's hierarchy.
Table 22
Item Hierarchy

[Side-by-side listing of the Group 1 and Group 2 item hierarchies, ordered from most difficult to endorse to least difficult to endorse; the body of the table spans several pages in the original. The eliminated items listed include Q1 Lie (both groups), Q64 Happy, and Q65 Restless.]
The whole SASSI-3's RPCA indicated that greater than 60 percent (74.8%) of the variability is accounted for by the instrument. Based on this finding, the researcher failed to reject the corresponding research hypothesis.
The SASSI-3's remaining 63 items' infit and outfit statistics fall below the 2.0 z-standardized cutoff and have positive point-biserial correlations. Based on this finding, the researcher failed to reject the corresponding research hypothesis.
The SASSI-3 maintains its item consistency, as the items align in the same general area on the variable across samples. Based on this finding, the researcher failed to reject the corresponding research hypothesis.
The SASSI-3 discriminates five different groups (person separation = 3.82) among the sample with high reliability (.94). Based on this finding, the researcher failed to reject Research Hypothesis 8.
Summary
This study had two general research questions. General Research Question 1 concerned the measurement properties of the SASSI-3's individual scales; General Research Question 2 was: Does modern measurement theory assist in improving the SASSI-3 instrument holistically? Based on the results reported in this chapter, the researcher failed to reject both general research questions. Generally, the evidence supports that the face valid scales meet fundamental measurement properties and the subtle scales do not. Additionally, the face valid scales improve the instrument when combined with the subtle scales, but they perform best when used independently.
Chapter Five
Discussion
Substance dependence is a significant problem in America and one that has a negative impact on its citizens (Substance Abuse and Mental Health Services Administration [SAMHSA], 2008). Substance dependence is associated with untimely deaths, loss in work productivity, reduction in days attended at school, increased costs due to substance dependence-associated medical care, and criminal activity (SAMHSA, 2008). It is important for people who struggle with alcohol and drug dependency to get a proper diagnosis and treatment to help reduce and eliminate these consequences. Proper diagnosis depends in part on the accuracy of the tools used in formulating a diagnosis. As such, given the clinical decisions these tools inform, it is critical that they are psychometrically sound and accurately measure the behaviors they are designed to measure: substance abuse.
A number of substance abuse screening instruments are available to assist in this process. A study of master's-level addictions counselors revealed four substance abuse screens that these counselors most frequently select as aids in their diagnostic processes (Juhnke, Vacc, Curtis, Coll, & Paredes, 2003). These four screens are the Substance Abuse Subtle Screening Inventory-3 (SASSI-3; Miller & Lazowski, 1999), the Michigan Alcoholism Screening Test (MAST; Selzer, 1971), the Minnesota Multiphasic Personality Inventory-2 (MMPI-2; Butcher, Dahlstrom, Graham, Tellegen, & Kaemmer, 1989) MacAndrew Scale-Revised (MAC-R; MacAndrew, 1965), and the Addiction Severity Index (ASI; McLellan, Luborski, Cacciola, Griffith, McGranhan, & O'Brien, 1992). Of these four, the Substance Abuse Subtle Screening Inventory-3 (Miller & Lazowski, 1999) was identified by these counselors as being the most important (Juhnke et al., 2003) for the following reasons: the SASSI-3, unlike the other three, screens for both alcohol and other drug abuse, and it provides several measures of response bias (e.g., defensiveness and random answering patterns) that aid in interpreting respondents' data.
A robust but conflicting literature base has developed to address the SASSI-3's reliability and validity, with varying degrees of agreement with what is published in the SASSI-3 Manual (Miller & Lazowski, 1999). In fact, research conducted by investigators not associated with the SASSI-3's publishers appears to question the SASSI-3's reliability and validity. Despite this well-developed body of literature, nothing is known about the SASSI-3's alignment with the fundamental principles of measurement (Thurstone, 1927). Psychometric concepts central to fundamental measurement include unidimensionality, linearity, and invariance. Unidimensionality refers to an instrument's evaluating just one construct (Bond & Fox, 2007). In this study, the construct purportedly measured by the SASSI-3 is substance dependence (Miller & Lazowski, 1999). Linearity refers to an ever-increasing level of difficulty across an instrument's items (Bond & Fox): easier-to-answer items fall on one end of the spectrum and harder-to-answer items fall on the other. An item that is easy to answer affirmatively about one's substance use might include the following: "I can drink one or two drinks without passing out." Most persons who consume alcohol could very likely answer that item affirmatively. A more difficult item could be "I experience delirium tremens when I stop drinking." It is likely that fewer persons' substance dependence has progressed to this level; consequently, it is harder for most people to answer this question affirmatively. Additionally, invariance means that the items will be aligned on an equal interval scale; that is, for example, that the distance between "sometimes" and "frequently" is constant. Finally, an instrument that is invariant will demonstrate equal alignment of the items' response options, regardless of the sample in which it is used.
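The notion of easier and harder items can be made concrete with the dichotomous Rasch model, in which the probability of an affirmative answer depends only on the difference between a person's level on the construct and the item's difficulty. The two item texts below are the hypothetical examples from the paragraph above; the difficulty values in logits are illustrative assumptions, not estimates from the study's data:

```python
import math

def endorse_probability(theta, difficulty):
    """Dichotomous Rasch model: P(X=1) = exp(theta - b) / (1 + exp(theta - b))."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))

# Illustrative item difficulties in logits
easy_item = -2.0   # "I can drink one or two drinks without passing out."
hard_item = 3.0    # "I experience delirium tremens when I stop drinking."

theta = 0.0  # a person at the mean of the construct
p_easy = endorse_probability(theta, easy_item)  # ~0.88
p_hard = endorse_probability(theta, hard_item)  # ~0.05
```

The same person is far more likely to endorse the easy item than the hard one, which is exactly the ordering a linear, unidimensional hierarchy of items expresses.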
Despite its popularity among addictions counselors (Juhnke, Vacc, Curtis, Coll, & Paredes, 2003) and its use in a wide range of settings, the SASSI-3's psychometric properties have been found to differ (Arneth, Bogner, Corrigan, & Schmidt, 2001; Clements, 2002; Feldstein & Miller, 2007; Gray, 2001; Laux, Perera-Diltz, Smirnoff, & Salyers, 2005; Laux, Salyers, & Kotova, 2005; Lazowski, Miller, Boye, & Miller, 1998; Peters et al., 2000; Svanum & McGrew, 1995), at times significantly, from those reported in the Manual (Miller & Lazowski, 1999). These differences may be related to the traditional methods of testing reliability and validity used by researchers. However, what remains unclear is whether the SASSI-3 measures what it purports to measure. If there is doubt about what the SASSI-3 is measuring, then there is also doubt about the implications of the diagnoses it informs and the subsequent treatment decisions. This investigation therefore examined the SASSI-3 using the Rasch model (Rasch, 1960, 1980). Specifically, it focused on the unidimensionality of the entire instrument and of the individual scales. Additionally, it evaluated the rating scales by identifying whether the participants utilized them as the authors of the SASSI-3 intended, and it assessed the linearity and invariance of the instrument.
This study explored the measurement properties of the SASSI-3 in three parts. The first part was to look at each scale individually. The SASSI-3 authors identified a factor structure which resulted in ten scales (Miller & Lazowski, 1995). Those ten scales included the Face Valid Alcohol (FVA), Face Valid Other Drug (FVOD), Symptoms (SYM), Obvious Attributes (OAT), Subtle Attributes (SAT), Supplemental Addiction Measure (SAM), Defensiveness (DEF), Family vs. Controls (FAM), Correctional (COR), and Random Answering Pattern (RAP) scales. The second part involved exploring together all of the items that contribute to the dichotomous decision of likelihood of substance dependence or not; this included the face valid scales and the OAT, SAT, SAM, SYM, and DEF scales only. The third part of the investigation involved exploration of the entire instrument, including all 93 items. The following summarizes these findings in this order: each SASSI-3 scale, the dichotomous SASSI-3, and the whole SASSI-3.
The FVA scale includes twelve items. Each item is accompanied by a four-point Likert-type rating scale response option. The respondent is directed to identify the number of times he or she has engaged in the particular behavior listed in the item. The results of this investigation indicate that the FVA was unidimensional because its RPCA was above 60 percent and it had no underlying contrasts. After adjusting the rating scale for improved functioning and eliminating misfitting people, it was found that the FVA scale's items could be divided into ten levels of difficulty. These ten levels discriminated between nearly four groups of people ranging from low to high agreeability on the items.
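The unidimensionality decision rule applied throughout this chapter (Rasch dimension explaining more than 60 percent of the variance, with no underlying contrast accounting for more than 5 percent) can be expressed as a simple check. This sketch assumes the variance figures are given in percent; the example values echo the RPCA findings described in the text:

```python
def is_unidimensional(variance_explained, contrast_shares,
                      threshold=60.0, contrast_cap=5.0):
    """Decision rule as described in the text: the Rasch dimension must
    explain more than `threshold` percent of variance, and no residual
    contrast may account for more than `contrast_cap` percent."""
    return (variance_explained > threshold
            and all(c <= contrast_cap for c in contrast_shares))

is_unidimensional(69.2, [3.1, 2.4])        # True  (whole-instrument style result)
is_unidimensional(60.3, [6.0, 5.5, 5.2])   # False (OAT-style: contrasts too large)
```

Note that the contrast shares in the calls above are hypothetical; the study reports only that the OAT scale had three contrasts exceeding 5 percent.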
The FVOD scale includes fourteen items. As with the FVA, each of the FVOD's items is accompanied by a four-point Likert-type rating scale response option. The respondent is requested to identify the number of times he or she has engaged in the particular behavior listed in the item. The FVOD was unidimensional because its RPCA was above 60 percent and it had no underlying contrasts. After adjusting the rating scale for improved functioning and eliminating misfitting people, it was found that the FVOD scale's items could be divided into six levels of difficulty. These six levels discriminated between nearly four groups of people ranging from low to high agreeability on the items.
The SYM scale includes ten items, each with a dichotomous true-false response
option. After eliminating two items, the RPCA indicated that 92.9 percent of the variance
could be explained by the scale. However, despite the remaining eight items being
divided into as many levels of difficulty, the scale did not distinguish any differences
among the people in the sample. Therefore, this scale failed to meet fundamental
measurement properties.
The OAT scale includes twelve items, each with a dichotomous true-false
response option. The final RPCA indicated 60.3 percent of the total variance was
accounted for by the OAT scale with three underlying contrasts accounting for greater
than 5 percent of the variance. This implied that the OAT scale possibly had multiple
dimensions. Additionally, while the items were divided into seven levels of difficulty, the scale did not distinguish any differences among the group. Therefore, this scale failed to meet the fundamental measurement properties.
The SAT scale includes eight items, each with a dichotomous true-false response option. The final RPCA indicated that 92.8 percent of the variance was explained by the SAT scale. Additionally, while the scale's items divided into as many levels of difficulty, the SAT scale did not distinguish any differences among the group. Therefore, this scale failed to meet the fundamental measurement properties.
The SAM scale includes fourteen items, each with a dichotomous true-false
response option. While the items could be divided into four levels of difficulty, the final
RPCA for Group 1 indicated that 47.4 percent of the variance was accounted for by the
SAM scale. In contrast, the RPCA for Group 2 indicated that 80.8 percent of the variance
was accounted for by the scale. Neither group's person separation met or exceeded the 2.0 standard. Additionally, the SAM scale did not distinguish any differences among the group. Therefore, this scale failed to meet the fundamental measurement properties.
The DEF scale includes twelve items, each with a dichotomous true-false
response option. The final RPCA indicated that 71.6 percent of the total variance could
be explained by the DEF scale. Additionally, the items, while dividing into nine groups, did not distinguish any differences among the group. Therefore, this scale failed to meet the fundamental measurement properties.
The FAM scale included fourteen items, each with a dichotomous true-false
response option. After the elimination of three misfitting items, the final RPCA indicated
that 78.1 percent of the variance was explained by the FAM scale. However, while the
FAM scale's items could be divided into seven levels of difficulty, they could not discriminate any differences among the people. Therefore, the FAM scale failed to meet the fundamental measurement properties.
The COR scale included twelve items, each with a dichotomous true-false response option. After the elimination of one misfitting item, the final RPCA indicated that 85.1 percent of the variance was explained by the COR scale. However, while the items could be divided into seven levels of difficulty, the COR scale did not distinguish any differences among the group. Therefore, the COR scale failed to meet the fundamental measurement properties.
The RAP scale included six items, each with a dichotomous true-false response option. The final RPCA indicated that 1.4 percent of the variance was explained by the scale. The items could not be divided into any levels of difficulty, and no distinctions could be made among the people. In addition, the scale had no reliability (.00). This scale did not meet the fundamental measurement properties. However, this may well reflect the original intention of the SASSI-3 authors, as the scale is meant to identify people who respond to the instrument in a random way.
The dichotomous SASSI-3 includes 70 items with both a four-point Likert-type response scale and a dichotomous true-false response scale. After adjusting the four-point Likert-type scale for maximum meaning and eliminating 29 misfitting items, the final RPCA accounted for 81 percent of the variance explained, with no underlying contrasts. The items were divided into four levels of difficulty. These levels discriminated seven different groups of people ranging from high to low on the variable. Therefore, the dichotomous SASSI-3 met the fundamental measurement properties.
The whole SASSI-3 included all 93 items, including both the four-point Likert-type response scales and the dichotomous true-false response scales. After adjusting the four-point Likert-type response scale for maximum meaning and eliminating 20 items, the RPCA indicated that the instrument functioned as a unidimensional measure, with 69.2 percent of the variance explained and no evidence of underlying contrasts. The items were divided into seven levels of difficulty, which could discriminate five different groups among the people from high to low on the variable. Therefore, the whole SASSI-3 instrument can work as a unidimensional instrument, used to distinguish people high on the variable from those low on the variable.
The SASSI-3 authors purport that the unique integration of subtle items with direct items provides additional information that is often difficult to assess due to the clinical denial often present in people dealing with substance dependence issues (Miller & Lazowski, 1985). However, in their review of the empirical SASSI-3 literature, Feldstein and Miller (2007) concluded that the SASSI-3's subtle scales have fair to poor psychometric properties, and that in the studies they reviewed no substantiation was found for the claims about the unique contribution of the subtle items to the detection of substance dependence (p. 49). The findings of the present study are supportive of Feldstein and Miller's summary conclusions. The subtle scales in this study did not function as unidimensional measures and failed to meet fundamental measurement properties. In addition, the face valid scales had higher person and item separation and reliability findings and RPCAs than the dichotomous or whole SASSI-3 (Table 23).
Table 23
Summary of Person and Item Separation Findings and RPCAs for Direct Versus Direct

[Table body not reproduced in this copy.]

Note. PS&R = Person Separation & Reliability. IS&R = Item Separation & Reliability. RPCA = Rasch principal components analysis.
In 2006, Tellegen et al. introduced a revised version of the MMPI-2. Tellegen and his co-authors noted that many of the MMPI-2's items loaded on two or more of the MMPI-2's Clinical scales. They concluded that these multi-item overlaps reduced specificity among the eight Clinical scales. In an effort to improve these basic scales' specificity, these authors published a newer version of the MMPI called the MMPI-2 Restructured Clinical (RC) scales. This reduction and restructuring of the Clinical scales resulted in RC scales that have higher validity and reliability estimates (Nichols, 2006; Rogers, Sewell, Harrison, & Jordan, 2006). As noted earlier, many of the SASSI-3's items load on one or more of the dichotomous scales. Employing the types of analyses presented in this study has the potential to produce the same results for the Substance Abuse Subtle Screening Inventory-3. Specifically, the findings of the present study were supportive of this assertion: a reduction in the number of items improved the reliability and separation findings.
Finally, the hierarchies that were established for the SASSI-3 items, whether from the face valid scales alone, the dichotomous SASSI-3, or the whole SASSI-3, maintained the same general positions across all four measures. For example, FVA12 (Suicide) was more difficult to endorse on each of the scales, and FVA4 (More than intended) was less difficult to endorse on each of the scales. These consistent patterns of item difficulty are indicative of the linearity of the SASSI-3 measure. That is, the SASSI-3 measures less to more of the variable of substance dependence consistently and reliably across samples. This is not unlike intelligence tests, the purpose of which is to measure less to more of the variable of intelligence consistently and reliably across samples. The more difficult the item, the more of the quality or characteristic one possesses.
Implications
The first recommendation is to reduce the number of scales. The SASSI-3 as a whole meets the fundamental measurement properties needed to screen for substance dependency; this means that the SASSI-3 can be made more efficient.
A second recommendation is to reduce the number of items. Eliminating multivocal items (items that are true on one scale and false on another) and items that are not on any scale may have a broader effect on the instrument's measurement properties, because these added to the misfitting items (see Table 20). These deleted items misfit or overfit the instrument in a consistent manner, and deleting them improved the instrument's person and item separation and reliability findings.
The respondents failed to utilize the response options as the SASSI-3 authors intended. The standards for a functioning rating scale include a probability curve peak of .5 or better and a threshold distance of greater than or equal to 1.4 units between two adjacent response choices. It appeared from the data reviewed in this study that the respondents did not distinguish between response option 1-Once or Twice and response option 2-Several Times. A review of the response options is therefore warranted. One possible revision to the response options would be to vary the weights assigned to each level of behavior acknowledgment. For example, to respond "frequently" to the question of how often one consumes alcohol with lunch has different clinical implications than responding "frequently" to a question about attempting suicide while consuming alcohol. Under the current SASSI-3 scoring system, each of these responses is scored a "3," even though a person who frequently attempts suicide while consuming alcohol is of much greater clinical concern than someone who frequently consumes alcohol with lunch.
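The weighting idea proposed above could take the form of item-specific scoring weights rather than the uniform 0-3 scoring. A minimal sketch under that assumption; the item names, weights, and categories below are hypothetical and are not drawn from the SASSI-3 scoring key:

```python
# Hypothetical item-specific weights: the same response category carries
# more weight on a clinically severe item than on a mild one.
WEIGHTS = {
    "drinks_with_lunch":      {0: 0, 1: 1, 2: 2, 3: 3},
    "suicide_while_drinking": {0: 0, 1: 4, 2: 8, 3: 12},  # severe item weighted up
}

def weighted_score(responses):
    """Sum item-specific weights for a dict of item -> response category (0-3)."""
    return sum(WEIGHTS[item][category] for item, category in responses.items())

# "Frequently" (3) on each item no longer contributes equally to the total:
total = weighted_score({"drinks_with_lunch": 3, "suicide_while_drinking": 3})  # 15
```

Under such a scheme, a "frequently" on the severe item dominates the total score, reflecting its greater clinical weight.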
These analyses indicate that the subtle scales do not contribute in a meaningful way to the instrument. This was evidenced by the fact that when the face valid scales were used independently, the person and item separation and reliability findings, as well as the RPCAs, were higher than when the face valid scales were combined with the subtle scale items. Therefore, the subtle items could be removed without losing any of the measurement properties.
Recommendations for future research include combining each of the FVA and the FVOD with the subtle items to investigate the resulting measurement properties. These new instruments may produce an alcohol-only and a drug-only screening instrument. However, it is important to explore whether the face valid scales have higher measurement properties with or without the subtle items. Combining the face valid scales into one scale is also an area of research to investigate. While the results of the Rasch analysis demonstrated that the FVOD and FVA scales function independently, investigating whether they function together, with some modification in the wording of the items to make them universal to substances instead of to drugs or alcohol exclusively, may benefit the SASSI-3. Finally, reworking the response options for the face valid scales may contribute to the functioning of the instrument.
Limitations
Despite the Rasch model's multiple uses and high reputation for instrument validation, critiques of the model are put forward by individuals who are solely committed to the use of factor analysis. Bond and Fox (2007) report that these critics state that Rasch analysis is not a theory-building method, as factor analysis is, and that the Rasch model's theory is too simplistic. In Rasch, the theory drives the development of the instrument; if the construct of interest is not unidimensional, Rasch analysis will prove ineffective, as the Rasch model only works for unidimensional constructs.
A specific limitation of this study is the assumption made by the researcher that the sample drawn from the data gathered from the community family court project included people with a higher likelihood of substance dependence. This assumption was made primarily due to the respondents' involvement with the project. However, just because a respondent was involved with the project did not necessarily imply a higher likelihood of substance dependence.
Conclusion
The purpose of this study was to investigate the measurement properties of the SASSI-3 as a measure of substance dependence. This study produced two major findings. The first involves the SASSI-3's dimensionality: although the SASSI-3, as written, is not intended to be unidimensional, it can function as a unidimensional instrument with minor adjustments to the response options and elimination of some misfitting and redundant items. The second major finding of this study is that the subtle scales and subtle items do not appear to contribute to the functioning of the instrument. The implications of these findings are that changing the response scale and eliminating multivocal items (items that are true on one scale and false on another), items with no scale, and other items that misfit or are redundant will improve the functioning of the instrument. A shorter instrument may also improve time management and save money for community agencies and drug and alcohol treatment providers, and it may only improve on the effectiveness of the instrument. However, more research is needed to confirm the findings of this study. As has been suggested for the MMPI-2 RC, immediate change to a new instrument without research to confirm and validate these findings would be premature.
References
Adger, H., & Werner, M. J. (1994). The pediatrician. Alcohol Health and Research World.
(Eds.), Test validity (pp. 19-32). Princeton, NJ: Lawrence Erlbaum Associates, Inc.
Arneth, P. M., Bogner, J. A., Corrigan, J. D., & Schmidt, L. (2001). The utility of the Substance Abuse Subtle Screening Inventory-3 for use with individuals with brain injury.
Banerji, M., Smith, R. M., & Dedrick, R. F. (1997). Dimensionality of an early childhood
Bond, T. G., & Fox, C. M. (2007). Applying the Rasch model: Fundamental measurement in the human sciences (2nd ed.). Mahwah, NJ: Lawrence Erlbaum Associates.
Burck, A. M., Laux, J. M., Harper, H. L., & Ritchie, M. (2008). Detecting college student
Butcher, J. N., Dahlstrom, W. G., Graham, J. R., Tellegen, A., & Kaemmer, B. (1989). Minnesota Multiphasic Personality Inventory-2 (MMPI-2): Manual for administration and scoring. Minneapolis, MN: University of Minnesota Press.
Crocker, L., & Algina, J. (1986). Introduction to classical and modern test theory. Fort Worth, TX: Harcourt Brace Jovanovich.
Elliott, R., Fox, C. M., Beltyukova, S. A., Stone, G. E., Gunderson, J., & Zhang, X.
Ewing, J. A. (1984). Detecting alcoholism: The CAGE questionnaire. JAMA, 252, 1905-1907.
Feldstein, S. W., & Miller, W. R. (2007). Does subtle screening for substance abuse work? A review of the Substance Abuse Subtle Screening Inventory (SASSI). Addiction, 102, 41-50.
Gray, B. T. (2001). A factor analytic study of the Substance Abuse Subtle Screening Inventory (SASSI). Educational and Psychological Measurement, 61, 102-118.
Henderson, C. E., Taxman, F. S., & Young, D. W. (2007). A Rasch model analysis of
Juhnke, G. A., Vacc, N. A., Curtis, R. C., Coll, K. M., & Paredes, D. M. (2003). Assessment instruments used by addictions counselors. Journal of Addictions and Offender Counseling.
Kagee, A., & deBruin, G. P. (2007). The South African former detainees distress scale:
Keeves, J. P., & Masters, G. N. (1999). Introduction. In G. N. Masters & J. P. Keeves
Kubinger, K. D. (2005). Psychological test calibration using the Rasch model: Some critical suggestions on traditional approaches. International Journal of Testing, 5(4), 377-394.
Landis, J. R., & Koch, G. G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33, 159-174.
Laux, J. M., Perera-Diltz, D., Smirnoff, J. B., & Salyers, K. M. (2005). The SASSI-3 face
Laux, J. M., Salyers, K. M., & Kotova, E. (2005). Psychometric evaluation of the SASSI-3
Lazowski, L. E., Miller, F. G., Boye, M. W., & Miller, G. A. (1998). Efficacy of the Substance Abuse Subtle Screening Inventory-3 (SASSI-3) in identifying substance dependence disorders in clinical settings. Journal of Personality Assessment, 71, 114-128.
Linacre, J. M. (1999). Investigating rating scale category utility. Journal of Outcome Measurement, 3, 103-122.
Litwin, M. (1995). How to measure survey reliability and validity. Thousand Oaks, CA: SAGE Publications.
Mayfield, D., McLeod, G., & Hall, P. (1974). The CAGE questionnaire: Validation of a new alcoholism screening instrument. American Journal of Psychiatry, 131, 1121-1123.
Miller, W. R., & Lazowski, L. (1999). Adult SASSI-3 Manual. Springfield, IN: SASSI Institute.
Miller, W. R., & Feldstein, S. W. (2007). SASSI: A response to Lazowski & Miller.
Millon, T. (1987). Manual for the Millon Clinical Multiaxial Inventory-II (MCMI-II).
Myerholtz, L., & Rosenberg, H. (1998). Screening college students for alcohol problems: Psychometric assessment of the SASSI-2. Journal of Studies on Alcohol, 59, 439-446.
National Highway Traffic Safety Administration (2006). 2006 annual assessment
Nichols, D. S. (2006). The trials of separating bath water from baby: A review and critique of the MMPI-2 Restructured Clinical Scales.
Peters, R. H., Greenbaum, P. E., Steinberg, M. L., Carter, C. R., Ortiz, M. M., Fry, B. C., 75, 349-358.
Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. Copenhagen: Danish Institute for Educational Research.
Rasch, G. (1980). Probabilistic models for some intelligence and attainment tests (Expanded ed.). Chicago, IL: The University of Chicago Press.
Rogers, R., Sewell, K. W., Harrison, K. S., & Jordan, M. J. (2006). The MMPI-2 Restructured Clinical Scales: A paradigmatic shift in scale development.
Salins, P. (2008). Does the SAT predict college success? Retrieved 1/23/09 from ml.
Selzer, M. L. (1971). The Michigan Alcohol Screening Test: The quest for a new diagnostic instrument. American Journal of Psychiatry, 127, 1653-1658.
Sproll, N. L. (1995). Handbook of research methods: A guide for practitioners and students in the social sciences (2nd ed.). Metuchen, NJ, & London, England: Scarecrow Press.
Stevens, J. (1996). Applied multivariate statistics for the social sciences (3rd ed.). Mahwah, NJ: Lawrence Erlbaum Associates.
Strong, D. R., Kahler, G. W., Greene, R. L., & Schinka, J. (2005). Isolating a primary
Substance Abuse and Mental Health Services Administration (2008). Drug abuse
Substance Abuse and Mental Health Services Administration (2008). Results from the 2007 national survey on drug use and health: National findings. Retrieved on
Svanum, S., & McGrew, J. (1995). Prospective screening of substance dependence: The
Sweet, R. I., & Saules, K. K. (2003). Validity of the Substance Abuse Subtle Screening
Tellegen, A., Ben-Porath, Y. S., Sellbom, M., Arbisi, P. A., McNulty, J. L., & Graham, J. R. (2006). Further evidence on the validity of the MMPI-2 Restructured Clinical (RC) Scales: Addressing questions raised by Rogers, Sewell, Harrison and Jordan (2006).
Traub, R. (1994). Reliability for the social sciences: Theory and applications (Measurement Methods for the Social Sciences Series). Thousand Oaks, CA: SAGE Publications.
Viera, A. J., & Garrett, J. M. (2005). Understanding interobserver agreement: The kappa statistic. Family Medicine, 37, 360-363.
Wallen, N. E., & Fraenkel, J. R. (1991). Educational research: A guide to the process.
Weed, N. C., Butcher, J. N., McKenna, T., & Ben-Porath, Y. S. (1992). New measures for assessing alcohol and drug abuse with the MMPI-2: The APS and AAS. Journal of Personality Assessment, 58, 389-404.
intelligence and attainment tests (pp. ix-xix). Chicago, IL: The University of Chicago Press.