
A Dissertation

Entitled

A Psychometric Study of the Substance Abuse Subtle Screening Inventory-3

Using Rasch Analysis

By

Tara M. Hill

Submitted as partial fulfillment of the requirements for

the Doctor of Philosophy degree in

Counselor Education

Advisor: Dr. John Laux

Dr. Paula Dupuy

Dr. Holly Harper

Dr. Gregory Stone

College of Health Science and Human Service

College of Graduate Studies

The University of Toledo

May 2009
UMI Number: 3364311

Copyright 2009 by
Hill, Tara M.

Copyright © 2009

This document is copyrighted material. Under copyright law, no parts of this document

may be reproduced without the expressed permission of the author.


An Abstract of

A Psychometric Study of the Substance Abuse Subtle Screening Inventory-3

Using Rasch Analysis

Tara M. Hill

Submitted as partial fulfillment of the requirements for

Doctor of Philosophy degree in Counselor Education

University of Toledo

May 2009

The Substance Abuse Subtle Screening Inventory-3 (SASSI-3; Miller & Lazowski,

1999) is a popular screening instrument used to assist professionals in the assessment of

individuals who may be substance dependent. Many researchers have reported reliability

and validity results on this instrument with mixed results, which at times have

contradicted those published by the authors of the instrument. This study is the first to

explore the fundamental measurement properties of the SASSI-3 from a Thurstonian

perspective. Included in this dissertation are a comprehensive review of the literature on

the SASSI-3's psychometric properties and a discussion of the methods used to evaluate

the instrument. The results demonstrated that the whole SASSI-3 meets fundamental

measurement properties and can discriminate groups of people from high to low on the

substance dependency variable. However, the face valid scales continue to demonstrate

higher functioning when used independently of the subtle items. Based on these results,

future research recommendations include combining the Face Valid Alcohol and Face

Valid Other Drug scales to determine the functioning of these two scales together.

Acknowledgment

Thank you Sarah Richards; your love and support were essential and I could not have

done any of this without you. Sue Nagy, Beth and Jim Hill, and Shaunda Jennings;

thanks for always believing in me. John Laux; your guidance, feedback, humor,

persistence, and encouragement made it possible for me to complete my dissertation.

The rest of my committee, Paula Dupuy, Holly Harper, and Greg Stone; your support,

edits, and feedback were essential and appreciated. Thank you to Megan Mahon and

Amber Lange, who in this process, became my lifetime friends.

Table of Contents

Abstract  iii
Acknowledgment  v
Table of Contents  vi
List of Tables  x
List of Figures  xii
Chapter One Introduction  1
Statement of the Problem  6
Purpose of the Study  7
Research Questions and Corresponding Hypotheses  7
Significance of the Study  9
Definition of Substance Dependence  9
Organization of Chapters  10
Chapter Two Review of the Literature  11
Substance Dependence  11
Alcohol and Drug Screenings  12
SASSI-3 - The Instrument  13
The SASSI-3 Scales  15
Interpreting the SASSI-3  21
SASSI Psychometrics  22
Reliability  22
Validity  24
SASSI-3 reliability from the SASSI-3 Manual  30
SASSI-3 reliability from independent researchers  30
SASSI-3 validity data from the SASSI-3 Manual  34
SASSI validity data from independent researchers  35
Limitations of the Psychometric Findings on the SASSI-3  38
Rasch Measurement  40
Rasch Separation and Reliability  44
Response Validation  45
Construct analysis  47
Unidimensionality  48
Independence  48
Summary  49
Chapter Three Methods  50
Overview  50
Research Questions and Correlating Hypotheses  50
Participants  52
Instrument - The Substance Abuse Subtle Screening Inventory-3 (SASSI-3)  54
Variable  58
Procedures  59
Steps in conducting a Rasch Analysis  59
Step one - Response validation  60
Step two - Item fit analysis  60
Step three - Construct analysis  60
Step five - Assess for measure independence  61
Limitations  62
Chapter Four Results  65
Face Valid Alcohol Scale (FVA)  66
Face Valid Other Drug Scale (FVOD)  79
Symptoms Scale (SYM)  93
Obvious Attributes Scale (OAT)  100
Subtle Attributes Scale (SAT)  107
Supplemental Addiction Measure (SAM)  112
Defensiveness Scale (DEF)  118
Family versus Control Scale (FAM)  126
Correctional Scale (COR)  133
Random Answering Pattern (RAP)  138
Dichotomous SASSI-3  139
Review of Research Hypotheses 1-4  162
Review of Research Hypotheses 5-8  182
Chapter Five Discussion  184
The SASSI-3 Scales  188
The Dichotomous SASSI-3 and the Whole SASSI-3  191
Integration of Findings with Other Research  192
Implications  194
Suggestions for Future Research  196
Limitations  197
Conclusion  197
References  199
List of Tables

Table 1 - Kappa Coefficient Agreement between Instruments by Authors  36
Table 2 - Summary of Collapsing Strategy for FVA Group 1 Response Options  69
Table 3 - Summary of Collapsing Strategy for FVA Group 2 Response Options  74
Table 4 - FVA Test of Independence  78
Table 5 - Summary of Collapsing Strategy for FVOD Group 1 Response Options  83
Table 6 - Summary of Collapsing Strategy for FVOD Group 2 Response Options  88
Table 7 - FVOD Test of Independence  90
Table 8 - SYM Test of Independence  99
Table 9 - OAT Test of Independence  106
Table 10 - SAT Test of Independence  111
Table 11 - SAM Test of Independence  117
Table 12 - DEF Test of Independence  125
Table 13 - FAM Test of Independence  132
Table 14 - COR Test of Independence  137
Table 15 - Summary of Collapsing Strategy for Dichotomous Group 1 Face Valid Response Options  143
Table 16 - Dichotomous SASSI-3 Group 1 Paired Aligned Items Fit Statistics  146
Table 17 - Summary of Collapsing Strategy for Dichotomous Group 2 Face Valid Response Options  151
Table 18 - Dichotomous SASSI-3 Group 2 Paired Aligned Items Fit Statistics  154
Table 19 - Dichotomous SASSI-3 Test of Independence  158
Table 20 - Multivocal Items, Items on No Scale  164
Table 21 - Summary of Collapsing Strategy for Whole SASSI-3 Face Valid Response Options  168
Table 21 - Summary of Collapsing Strategy for Whole SASSI-3 Group 2 Face Valid Response Options  176
Table 22 - Whole SASSI-3 Test of Independence  177
Table 23 - Summary of Person and Item Separation Findings and RPCA's for Direct Versus Direct and Indirect Scales Combined  193
List of Figures

Figure 1 - Response Option 0123 Output for Face Valid Alcohol Group 1  67
Figure 2 - Item Map FVA Group 1  71
Figure 3 - Response Options Curve 0123 for FVA Group 2  73
Figure 4 - Item Map FVA Group 2  76
Figure 5 - Response Option Curve 0123 FVOD Group 1  79
Figure 6 - Corrected Response Option Curve 0112 FVOD Group 1  81
Figure 7 - Item Map FVOD Group 1  85
Figure 8 - Item Map FVOD Group 2  90
Figure 9 - Item Map SYM Group 1  95
Figure 10 - Item Map SYM Group 2  97
Figure 11 - Item Map OAT Group 1  102
Figure 12 - Item Map OAT Group 2  104
Figure 13 - Item Map SAT Group 1  108
Figure 14 - Item Map SAT Group 2  110
Figure 15 - Item Map SAM Group 1  113
Figure 16 - Item Map SAM Group 2  115
Figure 17 - Item Map DEF Group 1  120
Figure 18 - Item Map DEF Group 2  123
Figure 19 - Item Map FAM Group 1  128
Figure 20 - Item Map FAM Group 2  130
Figure 21 - Item Map COR Group 1  134
Figure 22 - Item Map COR Group 2  136
Figure 23 - Response Option 0123 Dichotomous SASSI-3 Group 1  141
Figure 24 - Corrected Response Option Curve 0112 Dichotomous SASSI-3 Group 1  144
Figure 25 - Item Map Dichotomous SASSI-3 Group 1  148
Figure 26 - Response Option 0123 Dichotomous SASSI-3 Group 2  149
Figure 27 - Corrected Response Option Curve 0112 Dichotomous SASSI-3 Group 2  152
Figure 28 - Item Map Dichotomous SASSI-3 Group 2  156
Figure 29 - Response Options Curves 0123 Dichotomous Whole SASSI-3 Group 1  166
Figure 30 - Corrected Response Options Curve 0112 Whole SASSI-3 Group 1  169
Figure 31 - Item Map Whole SASSI-3 Group 1  171
Figure 32 - Response Options Curves 0123 Whole SASSI-3 Group 2  173
Figure 33 - Corrected Response Options Curve 0112 Whole SASSI-3 Group 2  174
Chapter One

Introduction

Substance dependency and abuse are expensive problems in the United States of

America and have negative impacts on its citizens (Substance Abuse and Mental Health

Services Administration [SAMHSA], 2008). In addition to the loss of life, there is a loss

in work productivity, reduction in days attended at school, money spent for medical care,

and convictions and prison sentences due to alcohol and drug problems (SAMHSA,

2008). Based on this information, it is important for people who struggle with alcohol and

drug abuse to get proper diagnosis and treatment. Part of the diagnostic process can

involve mental health professionals' use of substance use screening instruments. Due to

the clinical implications of the assessment process, it is necessary that substance abuse

screening tools are psychometrically sound and accurately measure substance

dependence. A number of substance dependence screening instruments are available to

assist in this process. According to a study of professional addictions counselors, there

are four substance dependence screens that are most frequently selected by these

counselors as aids in their diagnostic processes (Juhnke, Vacc, Curtis, Coll, & Paredes,

2003). These are the Substance Abuse Subtle Screening Inventory (SASSI-3; Miller &

Lazowski, 1999), the Michigan Alcoholism Screening Test (MAST; Selzer, 1971), the

MacAndrew Scale-Revised (Mac-R; MacAndrew, 1965) from the Minnesota Multiphasic Personality Inventory-2 (MMPI-2; Butcher, Dahlstrom, Graham, Tellegen, & Kaemmer, 1989), and the Addiction Severity Index (ASI; McLellan, Luborski, Cacciola,

Griffith, McGranhan, & O'Brien, 1992). Of these four, the SASSI-3 (Miller & Lazowski,

1999) was identified by professional addiction counselors as being the most important

(Juhnke et al. 2003) for the following reasons: a) it measures alcohol dependence as well

as dependence on other drugs of abuse; b) it provides several measures of response bias

(e.g., defensiveness and random answering); and c) it is scored and interpreted according

to sex-specific national normative data.

The SASSI-3 (Miller & Lazowski, 1999) is a paper-and-pencil, self-administered, two-sided substance dependence screen that includes 67 true-false items on the front and two columns of items on the back, each of which presents the respondent with four rating scale choices: never, once or twice, several times, and repeatedly. The two columns of items on the back of the screening are labeled Face Valid Alcohol (12 items) and Face Valid Other Drug (14 items), respectively. These two groups of items directly ask the respondent to identify the extent of his or her substance use and the impact that use has had on his or her life. The items on the front, in contrast, are meant to be more subtle in nature and therefore elicit less defensiveness, a response commonly identified among clients who are questioned directly about their substance abuse behaviors.

The SASSI-3's items form ten scales. Seven of these scales, either independently

or in combination, are used for clinical decision making regarding the probability of a

client's substance dependence. This final disposition is made through nine decision rules.

If any of these decision rules are affirmative then the respondent is likely to be substance

dependent. The seven scales used in the decision rules include the Face Valid Alcohol

scale (FVA), the Face Valid Other Drugs scale (FVOD), the Symptoms scale (SYM), the

Obvious Attributes scale (OAT), the Subtle Attributes scale (SAT), the Supplemental

Addiction Measure scale (SAM), and the Defensiveness scale (DEF). A check of profile

validity is provided by way of the Random Answering Pattern scale (RAP). If the RAP score is greater than one, the decision rules may be invalid due to the likelihood that the respondent did not answer the items in a meaningful way. The final two scales are the

Correctional (COR) and the Family vs. Controls scales (FAM). These final scales lend

additional clinical information which may be included in treatment goals for the

respondent. All of the SASSI-3's scales will be discussed in greater detail in subsequent

sections of this document.
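The "any rule fires" structure of the clinical decision making described above can be sketched in code. The actual nine decision rules and their cutoff scores are proprietary to the SASSI Institute and are not reproduced here; the function below uses invented placeholder cutoffs purely to illustrate the logic of checking the RAP validity scale first and then evaluating the rules.

```python
# Hypothetical sketch of the SASSI-3 decision-rule structure described above.
# The real nine rules and their cutoffs are proprietary (Miller & Lazowski,
# 1999); every cutoff below is an invented placeholder.

def high_probability_dependent(scores, rap_max=1):
    """Return True if any decision rule fires, False if none do,
    and None if the profile is likely invalid (RAP too high)."""
    if scores.get("RAP", 0) > rap_max:           # validity check comes first
        return None                              # random/careless answering
    rules = [
        scores.get("FVA", 0) >= 16,              # placeholder cutoff
        scores.get("FVOD", 0) >= 14,             # placeholder cutoff
        scores.get("SYM", 0) >= 6 and scores.get("OAT", 0) >= 7,
    ]                                            # ...remaining rules omitted
    return any(rules)

profile = {"FVA": 20, "FVOD": 3, "SYM": 2, "OAT": 1, "RAP": 0}
print(high_probability_dependent(profile))       # one rule fires -> True
```

An affirmative result would then prompt the clinician to assess further, mirroring the screening (rather than diagnostic) purpose of the instrument.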

Reliability and validity test results have been published on the SASSI-2 and -3.

The results of these investigations vary in their agreement with what is found in the SASSI-3 Manual (Miller & Lazowski, 1999). For instance, the reliability

findings published in the SASSI-3 Manual (Miller & Lazowski) identified high internal

consistency scores for the individual scales. However, these results have yet to be fully

replicated by other researchers whose findings were as much as seven to twenty points

lower (Clements, 2001; Laux, Salyers, & Kotova, 2005; Myerholtz & Rosenberg, 1998).

Only moderate agreement was found between the SASSI-3 and other instruments

purporting to measure similar constructs (Laux, Salyers, & Kotova, 2005; Myerholtz &

Rosenberg, 1998). An independent investigation of the SASSI-3's construct validity

(Gray, 2001), performed using factor analysis, failed to render the same ten factor

solution as reported by Miller and Lazowski in the SASSI-3 Manual (1999). However,

the factor structure of two of the SASSI-3's scales, the Face-Valid Alcohol scale and the

Face-Valid Other Drugs scale, did concur with the SASSI-3 Manual's data regarding

these two scales (Laux, Perera-Diltz, Smirnoff, & Salyers, 2005; Laux, Salyers, &

Kotova, 2005). Finally, the SASSI-3 Manual (Miller & Lazowski) reports high overall

accuracy, sensitivity, and specificity rates when comparing the SASSI-3's classification results to substance dependence diagnoses made using the Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition, Text Revision (American Psychiatric Association [APA], 2000).

However, these high accuracy, sensitivity and specificity rates have not been replicated

by independent researchers (Arneth et al., 2001; Clements, 2002; Svanum & McGrew,

1995).

Independent researchers (Arneth et al., 2001; Clements, 2002; Feldstein &

Miller, 2007; Gray, 2001; Laux, Salyers, & Kotova, 2005; Svanum & McGrew, 1995)

appear to question the SASSI-3's reliability and validity in the context of that which is

published by the SASSI-Institute. However, there has been no discussion in the literature

about SASSI-3's alignment with the fundamental principles of measurement (Thurstone,

1927). These fundamental principles of measurement include unidimensionality,

linearity, invariance, and independence. These terms will be introduced and explained as

they apply to the SASSI-3 investigation. Unidimensionality means that an instrument is

evaluating just one construct (Bond & Fox, 2007). In this study, the instrument of interest

is the SASSI-3 and the construct that it purports to measure is substance dependence

(Miller & Lazowski, 1999). The authors of the SASSI-3 acknowledge that it was not their

intention to develop a unidimensional instrument; rather, the goal in developing the

SASSI-3 was to advance an instrument that could discriminate between those who have a

high probability of substance dependence and those who do not (Miller & Lazowski).

However, the instrument appears to be addressing the construct of substance dependence

with the exception of a couple of its scales. Therefore, in addition to unidimensionality, it would be interesting to explore the SASSI-3's other measurement properties. Linearity is

conceptualized in terms of a yardstick (Bond & Fox). A hierarchy of items is created according to level of difficulty, with easy items at one end and difficult items at the other. In this hierarchy, the items at the bottom require a lesser degree of substance dependence to endorse, while the harder items at the top require a greater degree of substance dependence. Just as a yardstick measures height, with the taller person having more height, so on the SASSI-3 the more items a person endorses, the more likely he or she is to be substance dependent. Invariance means that the items will be aligned on an equal-interval "yardstick" measuring substance dependence, like inches on a yardstick. Independence means that regardless of the sample being measured, the alignment of the items on the instrument will not vary.

Statement of the Problem

Juhnke, Vacc, Curtis, Coll, and Paredes (2003) reported that one of the screening

instruments most frequently used by addictions counselors is the Substance Abuse Subtle

Screening Inventory-3 (SASSI-3; Miller & Lazowski, 1999). The SASSI-3 has been

used in a variety of settings including but not limited to community mental health

agencies, college counseling centers, prisons, alcohol and drug treatment facilities,

inpatient hospitalization programs, and rehabilitation treatment centers (Miller &

Lazowski, 1999). The SASSI-3's psychometric properties have been studied by several independent researchers (Arneth, Bogner, Corrigan, & Schmidt, 2001; Clements, 2002; Feldstein & Miller, 2007; Gray, 2001; Laux, Perera-Diltz, Smirnoff, & Salyers, 2005; Laux, Salyers, & Kotova, 2005; Lazowski, Miller, Boye, & Miller, 1998; Peters et al., 2000; Svanum & McGrew, 1995), whose findings have been found to differ, at times significantly, from those reported in the SASSI-3 Manual (Miller & Lazowski). These differences may be

related to the traditional methods of testing reliability and validity used by researchers.

However, what is unclear is whether the SASSI-3 meets the fundamental requirements of

measurement. And, if there is doubt about whether the SASSI-3 meets the fundamental

requirements of measurement, then there is also doubt about the implications of the

diagnoses it informs and the subsequent treatment recommendations that are prescribed

based on these diagnoses. Therefore, it is necessary to explore the measurement

properties of the SASSI-3, as this may lead to improvement in the instrument's accuracy

rates, validity, and reliability across samples.

Purpose of the Study

The purpose of the study is to investigate the SASSI-3's psychometric alignment

with the fundamental principles of measurement as represented using the Rasch model

(Rasch, 1960, 1980). Specifically, this investigation will examine the

measurement properties of the entire instrument and the individual scales, evaluate the

reliability of the response options by identifying whether the participants are utilizing the

response scales as intended by the authors of the SASSI-3, and assess the linearity,

invariance, and independence of the instrument.
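The fundamental principles just listed are operationalized by the Rasch model referenced above. In its dichotomous form, the model states that the probability that person $n$, with substance dependence level $\theta_n$, endorses item $i$, with difficulty $\delta_i$, is:

```latex
P(X_{ni} = 1 \mid \theta_n, \delta_i) = \frac{e^{\theta_n - \delta_i}}{1 + e^{\theta_n - \delta_i}}
```

Because only the difference $\theta_n - \delta_i$ enters the model, persons and items are located on a single linear (logit) scale, which is what supports the "yardstick" interpretation used throughout this study.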

Research Questions and Corresponding Hypotheses

The following research questions will be addressed in this study.

General Research Question 1: Does modern measurement methodology assist in

the revalidation of the SASSI-3?

Research Question 1: Do the ten scales on the SASSI-3 represent a

unidimensional measure of substance dependence?

Research Hypothesis 1: A Rasch principal components analysis will

produce a unidimensional factor structure that accounts for 60% or more of the items'

total variance.

Research Question 2: Do the ten scales included on the SASSI-3 adequately

measure the construct?

Research Hypothesis 2: An analysis of item fit will produce infit and outfit

statistics indicative of low item error.

Research Question 3: Are measures from the SASSI-3 ten scales reliable for

diagnostic classification purposes?

Research Hypothesis 3: The SASSI-3 ten scales (as evidenced in the

item-map) will remain reliably defined across independent samples.

Research Question 4: Do the SASSI-3 Rasch analyzed ten scales clearly

discriminate between those who are substance dependent and those who are not?

Hypothesis 4: The SASSI-3 Rasch analyzed ten scales demonstrate high

discriminatory ability (via high Rasch Person Separation).

General Research Question 2: Does modern measurement theory assist in

improving the SASSI-3 instrument holistically?

Research Question 5: Does the SASSI-3 instrument, as a whole, represent a

unidimensional measure of substance dependence?

Research Hypothesis 5: A Rasch principal components analysis will

produce a unidimensional factor structure that accounts for 60% or more of the whole

instrument's total variance.

Research Question 6: Does the whole SASSI-3 adequately measure the substance

dependence construct?

Research Hypothesis 6: An analysis of item fit will produce infit and outfit

statistics indicative of low item error for the SASSI-3 instrument as a whole.

Research Question 7: Is the whole SASSI-3 reliable for diagnostic classification

purposes?

Research Hypothesis 7a: Rasch Reliability statistics demonstrate acceptable levels

of internal consistency for the SASSI-3 instrument as a whole.

Research Hypothesis 7b: The holistic SASSI-3 construct (as evidenced in the

item-map) will remain reliably defined across independent samples.

Research Question 8: Does the whole SASSI-3 instrument demonstrate an ability

to clearly discriminate between those who are substance dependent and those who are

not?

Research Hypothesis 8: The holistic SASSI-3 demonstrates high discriminatory

ability (via high Rasch Person Separation).

Significance of the Study

The significance of investigating the SASSI-3's measurement properties lies in its potential to save time and money. In the current economy, with budget cuts, reductions in mental health and drug treatment benefits, and alcohol and drug facilities closing due to funding issues, any improvement to the screening instrument has practical value. An improvement to the SASSI-3's reliability and validity will improve diagnostic accuracy (sensitivity and specificity) and potentially improve subsequent substance abuse treatment recommendations based on these improved accuracy rates. With improved accuracy, the right clients will receive treatment, which leads to higher treatment success rates and, in turn, to greater public support for levies and more funding for drug treatment programs.

Definition of Substance Dependence

In order to clarify the term "substance dependence," the following operational definition, provided by the Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition, Text Revision (DSM-IV-TR; American Psychiatric Association, 2000), will be used. Substance dependence is a maladaptive behavioral pattern involving substance use within the past twelve months which leads "to clinically significant impairment or distress" (p. 197). Three of the following seven criteria must be met for an individual to be considered substance dependent: 1) the individual has demonstrated increased tolerance; 2) the individual has withdrawal symptoms or uses the substance to avoid withdrawal symptoms; 3) the individual has used more of the substance than intended; 4) the individual tries, to no avail, to control or reduce the substance use despite cravings; 5) the individual spends excessive time in substance seeking, use, or recovering behaviors; 6) the individual often neglects social, work, or other obligations in favor of the substance use; and 7) despite negative consequences, both physical and psychological, the individual continues to use the substance.
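As a minimal illustration of this three-of-seven rule, the criteria can be tallied programmatically. The short labels below are shorthand invented for this sketch, not clinical terminology.

```python
# Illustrative tally of the DSM-IV-TR dependence criteria listed above:
# a dependence classification requires at least three of the seven
# criteria within a twelve-month period.
DEPENDENCE_CRITERIA = [
    "tolerance",               # 1) increased tolerance
    "withdrawal",              # 2) withdrawal, or use to avoid withdrawal
    "more_than_intended",      # 3) used more than intended
    "failed_cutback",          # 4) unsuccessful attempts to cut down
    "excessive_time",          # 5) much time seeking, using, recovering
    "neglected_roles",         # 6) obligations neglected in favor of use
    "continued_despite_harm",  # 7) continued use despite consequences
]

def meets_dependence(endorsed):
    """True if at least three of the seven criteria are endorsed."""
    return len(set(endorsed) & set(DEPENDENCE_CRITERIA)) >= 3

print(meets_dependence(["tolerance", "withdrawal", "failed_cutback"]))  # True
```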

Organization of Chapters

Chapter one introduced the problem and provided a rationale for the study. Chapter two reviews the relevant literature. Chapter three presents the methodology to be used in

this study. Chapter four will consist of the results from the analysis and Chapter five will

include a discussion of the findings.

Chapter Two

Chapter two provides a review of the literature with an overview on substance

dependence and its impact on society and the Substance Abuse Subtle Screening Inventory-3

(SASSI-3; Miller & Lazowski, 1999). Specifically, the review will begin with a

discussion of substance dependence, diagnosis and screening, and a review of the SASSI,

and its psychometric properties. Then, a discussion of modern measurement theory,

namely the Rasch model will follow. Finally, the chapter will close with a summary of

the SASSI-3 and the Rasch model.

Substance Dependence

Substance dependence and abuse in the United States has a negative impact on

society. For example, the latest Substance Abuse and Mental Health Services

Administration (SAMHSA; 2008) reports an increase in the use of medical services as a

result of substance dependence. According to the 2006 SAMHSA report, the number of

visits to an emergency department due to drug abuse increased roughly four percent,

while the US population only increased roughly three percent. Additionally, between

2004 and 2006, visits related to non-legitimate use of prescription medications increased

38 percent. The National Highway Traffic Safety Administration (2007) reports that

someone is killed roughly every 40 minutes by a drunk driver. In addition to the loss of

life, there is a loss in work productivity, reduction in days attended at school, money

spent for medical care, and convictions and prison sentences due to alcohol and drug

problems (SAMHSA, 2008). Substance dependence has a great economic toll on society.

Often, persons with substance use problems are sent for screening assessments to

determine if a substance use diagnosis is appropriate and, if so, to determine what

treatment should be prescribed. It is important to have appropriate screening instruments

available to effectively aid in the diagnosis and treatment process.

Alcohol and Drug Screenings

There are several methods by which counselors determine whether further

assessment and treatment for a substance dependence disorder diagnosis is appropriate.

One of these methods is to administer a substance dependence questionnaire or screening

instrument. The purpose of a substance dependence screening instrument is to aid and

assist the counselor in determining whether additional substance dependence assessment

is necessary (Adger & Werner, 1994). There are several screening instruments available

to assist clinicians in assessing for alcohol and drug related problems.

Juhnke, Vacc, Curtis, Coll, and Paredes (2003) surveyed professional addiction

counselors to determine which screening instruments were used most frequently. The

results of this survey suggest that there are four instruments that are most frequently

employed. These were the Substance Abuse Subtle Screening Inventory (SASSI-3; Miller

& Lazowski, 1999), the Michigan Alcoholism Screening Test (MAST; Selzer, 1971),

MacAndrew Scale-Revised (Mac-R; MacAndrew, 1965) from the Minnesota Multiphasic Personality Inventory-2 (MMPI-2; Butcher, Dahlstrom, Graham, Tellegen, & Kaemmer, 1989), and the Addiction Severity Index (ASI; McLellan, Luborski, Cacciola,

Griffith, McGranhan, & O'Brien, 1992). These professional addiction counselors also

identified the SASSI-3 as the most important assessment instrument (Juhnke et al., 2003). Because of these findings, and because of the mixed literature regarding the instrument's reliability and validity, the SASSI-3 will serve as the

focus of this investigation.

SASSI-3 - The Instrument

The Substance Abuse Subtle Screening Inventory-3 (SASSI-3) is an instrument

designed to discriminate between people who have a high probability of being substance

dependent from those with a low probability of having a substance dependence disorder,

regardless of whether or not they are able or "willing to acknowledge relevant

symptoms" (Lazowski, Miller, Boye, & Miller, 1998, p. 115; Miller & Lazowski, 1999).

People with substance abuse and dependence disorders often deny the existence and

extent of the problem. The original SASSI was uniquely created to address the problem

of denial commonly identified by treatment providers through the inclusion of both

direct, or content obvious, and indirect, or content subtle, items (Miller & Lazowski,

1999). Introduced in 1988, the SASSI has gone through two revisions. Currently, the

SASSI is available in three versions. There is an adult version (SASSI-3), an adolescent

version (SASSI-2A), and a version for Spanish speaking persons.

The SASSI-3's current version was published in 1999. The conversion from the

SASSI-2 to -3 was driven by a desire to reduce the rate of false positives, which was 15.5

percent (Miller & Lazowski, 1999). The conversion process included the creation of a

new seven-item scale, the Symptoms scale, and the elimination of two items whose

wording was deemed to be objectionable. The seven items forming the Symptoms scale were unused items already included among the SASSI-2's item pool (Gray, 2001; Lazowski et al., 1998). Gray asserted that the differences between the SASSI-2 and -3 were

minor and as such, the literature base supporting the SASSI-2 could "readily be

generalized" to the SASSI-3 (p. 104). For the purpose of this study, both reliability and

validity findings for the SASSI-2 and SASSI-3 will be reported.

The current SASSI-3 instrument is printed on one page, front and back. The front

consists of 67 true-false items. The items on side one are typically referred to as indirect

or subtle as most, but not all, of the items do not directly inquire about the impact of

drinking or drug related behaviors. These 67 items make up eight of the SASSI-3's total

ten scales. The authors recommend that side one be administered first as the items on this

side are less likely to elicit defensiveness than those on side two, which directly ask about

substance use (Miller & Lazowski, 1999).

Side two of the SASSI-3 includes the face valid items which inquire directly

about alcohol and drug use, behaviors, and the impact thereof. Because the items on side

two are obvious in their intent to measure substance use, there is a potential that

respondents might fake-good or minimize their substance use, if any (Miller & Lazowski,

1999). The response choices for the items on side two are placed along a four-point

Likert-type scale with the options of "never" (0), "once or twice" (1), "several times" (2),

and "repeatedly" (3). For each scale, the score of each item is summed to produce a total

score. It takes approximately fifteen minutes to complete, score, and interpret the SASSI-

3. Counselors use a transparent overlay to calculate raw scores for each of the SASSI-3's

scales. These raw scores are then transferred to a profile sheet, which can be used to

approximate the individual's T-scores and percentile scores. A discussion of the scoring

rules will follow a general description of the SASSI-3's scales.
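The scoring sequence just described (sum the 0-3 item responses to a raw score, then convert to a standardized score) can be sketched as follows. The SASSI-3's actual sex-specific norms are proprietary, so the normative means and standard deviations below are invented placeholders; the conversion shown is the standard linear T-score transformation (mean 50, SD 10), not the manual's actual profile-sheet lookup.

```python
# Sketch of the raw-score-to-T-score step described above. The SASSI-3's
# sex-specific normative means and SDs are proprietary; the values below
# are invented placeholders for illustration only.
NORMS = {"FVA": (4.0, 3.0), "FVOD": (3.0, 4.0)}  # scale: (mean, SD), hypothetical

def raw_scale_score(item_responses):
    """Sum 0-3 Likert responses (never=0 ... repeatedly=3) into a raw score."""
    return sum(item_responses)

def t_score(scale, raw):
    """Standard linear T-score: mean 50, SD 10."""
    mean, sd = NORMS[scale]
    return 50 + 10 * (raw - mean) / sd

fva_raw = raw_scale_score([0, 1, 2, 3, 1, 0, 0, 1, 0, 2, 0, 0])  # 12 FVA items
print(fva_raw, round(t_score("FVA", fva_raw), 1))  # -> 10 70.0
```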

The SASSI-3 Scales

The SASSI-3 has ten scales, three of which are worded to inquire directly about the

respondent's use of drugs and alcohol and the impact of that use.

The directly worded scales are the Face Valid Alcohol (FVA), Face Valid Other Drug

(FVOD) and the Obvious Attributes (OAT) scales. The other seven scales are stated in a

subtle manner. The subtle scales are the Subtle Attributes (SAT), Supplemental

Addiction Measure (SAM), Symptoms (SYM), Defensiveness (DEF), Family versus Control

(FAM), Correctional (COR), and the Random Answering Pattern (RAP) scales. All of the

scales are said to discriminate statistically between those who are and who are not

substance dependent (Miller & Lazowski, 1999). The FVA, FVOD, OAT, SAT, SAM,

SYM, and DEF are used in clinical decision making. This means that these scales

contribute to the decision rules for the clinician to further assess for treatment needs. The

RAP provides an indicator of how closely the respondent paid attention to the content of

the items, and the FAM and COR are experimental in nature and not used in the clinical

decision making process. Further discussion of the dichotomous clinical decision making

rules will follow.

Prior to engaging in clinical decision making regarding whether a respondent is

likely to be substance dependent, counselors must first check the respondent's score on

the Random Answering Pattern (RAP) scale. The RAP scale is a measure of random or

careless answering. In this regard the RAP scale is a global measure of the validity of the

respondent's approach to the process, and not the content, per se. The RAP scale is

typically reviewed first to verify whether the respondent completed the instrument in an

appropriate manner (Miller & Lazowski, 1999). This scale is a "measure of response

validity" (Laux, Salyers, & Kotova, 2005, p. 43). An elevated RAP score may reflect random

responding, carelessness, or a misunderstanding of the directions, any of which is

sufficient to cast doubt on the validity of the SASSI-3's results. The SASSI-3 Manual

recommends that if the RAP scale score is 2 or greater, the screener should "interpret

with caution" due to the possibility that the respondent did not answer the questions in a

meaningful way or did not understand the directions (Miller & Lazowski, 1999, p. 11).

The RAP scale consists of six, true-false items that produce a range of scores from 0-6. If

the RAP scores suggest that the respondent did not answer in a random manner, the

counselor moves forward with the interpretation of the remaining SASSI-3 scales.

The first of the two scales derived from the items on the face valid or second side

is the Face Valid Alcohol scale (FVA). The response choices are arranged along a four-

point Likert-type scale with the options of "never" (0), "once or twice" (1), "several

times" (2), and "repeatedly" (3). The raw score range is 0-24 for the FVA scale. The

FVA scale consists of twelve questions inquiring directly about alcohol use behavior and the

impacts of use. Examples of item content include alcohol consumption with noon meals

and suicide attempts while under the influence of alcohol. These items are face valid in

that their intent to measure alcohol use is obvious.

High FVA scores represent intentional recognition and admission of alcohol use. Low

FVA scores may reflect an absence of alcohol use, or they could be the product of efforts

to minimize or deny alcohol consumption.

The second face valid scale is the Face Valid Other Drug (FVOD) scale. The

FVOD items consist of fourteen questions inquiring directly about drug use behavior and the

impact of use. The response choices for the FVOD items are a four-point Likert-type

scale with the options of "never" (0), "once or twice" (1), "several times" (2), and

"repeatedly" (3) with a total raw score range of 0-27. Examples of item content include

whether the respondent has had legal trouble as a result of drug use and has used drugs to

avoid withdrawal symptoms. Miller and Lazowski (1999) reported that the higher the

scores on the FVA and FVOD the more progressed the substance dependence disorder.

The Symptoms (SYM) scale is purported to measure the signs, symptoms and

correlates of substance dependence in a direct manner (Miller & Lazowski, 1999). There

are eleven items on this scale with dichotomous response options, "true" and "false" and

a raw score range of 0-11. Examples of item content include inquiring of the respondent

whether he or she has concern regarding memory loss and family history of alcohol or

drug use.

The Obvious Attributes Scale (OAT) scale also utilizes direct items. High OAT

scores have been shown to indicate a willingness to "admit symptoms" (Myerholtz &

Rosenberg, 1998, p. 440), recognize "problematic behaviors" (Miller & Lazowski, 1999,

p. 15), and "personal limitations" (Laux, Salyers & Kotova, 2005, p. 43) frequently

associated with substance dependence. While these items are direct in that they ask the

respondent to admit to personal foibles, they do not require the respondent to make the

connection that their foibles are associated with any particular source. Consequently, a

respondent could produce an elevated OAT score without elevating one of the Face-Valid

scales. An interpretation of this arrangement of scores might be that the respondent was

aware that problems were occurring without understanding that these problems were a

consequence of personal substance use. Examples of OAT item content include behaviors

such as impulse control problems and low tolerance for frustration. There are twelve

items on the OAT scale with a raw score range of 0-12. Examples of item content include

whether responsibilities have been avoided or forgotten as a result of substance use and

whether substance dependence has resulted in family conflicts.

The Subtle Attributes scale (SAT) consists of eight criterion-keyed items with a

range of raw scores between 0-8. Examples of item content include inquiries of whether

the respondent obeys laws and has excessive energy with a decreased need for sleep.

These eight items were selected solely on the basis of their ability to statistically distinguish

between persons known to be substance dependent and persons known to not be

substance dependent. The specific content of empirically selected items is

inconsequential. The advantage to such items is that people who may be motivated to

conceal their substance use or those who are "in denial" about the extent to which they

may have a problem have no way to intentionally manipulate these items. Thus, they tend

to answer these questions differently than those who do not have a substance dependence

disorder (Laux, Salyers, & Kotova, 2005). The SAT scale is purported to measure the

predisposition of the respondent to developing a substance dependence disorder

(Myerholtz & Rosenberg, 1998). Additionally, this scale has been able to discriminate

between substance abusers and non-abusers, regardless of their attempts to fake good or

fake bad (Miller & Lazowski, 1999).

The Defensiveness scale (DEF) consists of eleven criterion-keyed items and, like

the RAP scale, is used as a validity scale. The DEF scale measures "denial or deliberate

concealment of problems" (Myerholtz & Rosenberg, 1998) and is used in the decision

rules. As a result of the DEF scale being developed to discriminate between respondents

using the standard versus fake-good instructions, a respondent who has high scores on the

DEF scale may be making an effort to present him or herself in a positive way (Laux,

Salyers, & Kotova, 2005). Likewise, respondents achieve low DEF scores by endorsing a

high number of personal faults and foibles. Consequently, the DEF scale is also viewed

as an indirect measure of self-esteem, depression, and, at the very lowest range, potential

suicidal ideation. The range of raw scores is 0-11 with scores of eight or higher

representing significant enough denial as to call the SASSI-3's results into question.

Examples of item content include inquiring about the amount of dangerous activities in

which the respondent has engaged and whether he or she is a restless person.

The Supplemental Addiction Measure (SAM) scale consists of fourteen

criterion-keyed items and has a range of scores between 0-14. Examples of item content

include whether the respondent feels worn out and whether he or she has experienced

periods of memory loss. This is the SASSI-3's third and final validity scale. The SAM is

designed to discriminate between respondents who are substance dependent and

defensive and those who are non-substance dependent with a more pervasive

defensiveness characteristic (Laux, Salyers, & Kotova, 2005; Miller & Lazowski, 1999).

The SAM scale is used to tease out whether elevated DEF scores reflect substances

specific defensiveness (high SAM score) or defensiveness due to some other reason (low

SAM score).

The Family versus Control Subjects (FAM) scale consists of fourteen criterion-

keyed items with a range of scores between 0-14. Examples of item content include

inquiry regarding whether the respondent would like more self-control and whether he or

she has ever broken the law. There are several potential interpretations of the FAM scale.

The SASSI-3 authors designed the FAM scale to assess the amount of focus a respondent

has on others (Miller & Lazowski, 1999). Myerholtz and Rosenberg (1998) reported that

the FAM scale identifies co-dependency. Still other researchers say it discriminates

between those who experienced substance abuse in their family of origin versus those

who did not (Laux, Salyers, & Kotova, 2005). The FAM is not used in the screening

decision rules but can be used to assess possible additional clinical issues needing to be

addressed in treatment.

The Correctional (COR) scale consists of fifteen criterion-keyed items with a

range of scores between 0-15. Examples of item content include whether the respondent

has wanted to leave his or her residence and whether he or she would like to hit another

person. Respondents with a high score on the COR scale endorse items in a similar

pattern as those who have extensive criminal histories and legal involvement (Miller &

Lazowski, 1999). This scale is purported to assess the level of treatment or supervision

needed by the respondent if there is evidence of a criminal history (Miller &

Lazowski). This scale is also not part of the screening decision rules. The reader is

cautioned that no published data suggest that the COR scale predicts future illegal behavior.

Interpreting the SASSI-3

The SASSI-3 uses nine decision rules to arrive at a decision

about the respondent's likelihood of having a substance dependence disorder. Each of the

nine rules has between one and five criteria. These criteria are cutoff scores for seven of

the ten scales. If the cutoff score is met or exceeded, the rule is indicated as "yes". If

unmet, the rule is indicated as "no". Rules 1 and 2 are based solely on the FVA and

FVOD scales, respectively. Rules 3, 4, and 5 are based solely on data from the SYM,

OAT, and SAT scales respectively. The remaining rules 6-9, are based on a combination

of the various scales, both direct and indirect. Decision rule 6 requires a score of seven or

more on the OAT and five or more on the SAT to be a "yes". Decision rule 7 includes

two criteria. The first criterion is an FVA score of nine or more or an FVOD score of

fifteen or more. The second criterion is a SAM score of eight or more. If both criteria are

met, then Decision rule 7 is a "yes". Decision rule 8 requires a score of five or more on

the OAT, eight or more on the DEF, and eight or more on the SAM to be a "yes".

Decision rule 9 includes four criteria. The first criterion is an FVA score of fourteen or

more or an FVOD score of eight or more. The remaining criteria are a score of two or

more on the SAT, four or more on the DEF, and four or more on the SAM. If all four

criteria are met, then Decision rule 9 is a "yes". An indication of "yes" on any of the nine decision rules

indicates a high probability of substance dependence. If all decision rules are answered

"no", it is an indication of a low probability of substance dependence. However, if the

respondent has a low probability of being substance dependent but had a score of eight or

more on the DEF scale, then the counselor is cautioned that the results may be a false

negative.
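The logic of the combination rules can be sketched as follows. The cutoffs are transcribed from the description above for rules 6 through 9 only; the single-scale cutoffs for rules 1 through 5 are not reproduced here, and this sketch is an illustration of the rule structure, not a substitute for the SASSI-3 Manual's official scoring materials.

```python
# A sketch of the SASSI-3 combination decision rules (6-9) as described in the
# text. Scale raw scores are passed as a dict. Illustrative only; rules 1-5
# (single-scale cutoffs) are omitted.

def rule6(s):
    return s["OAT"] >= 7 and s["SAT"] >= 5

def rule7(s):
    return (s["FVA"] >= 9 or s["FVOD"] >= 15) and s["SAM"] >= 8

def rule8(s):
    return s["OAT"] >= 5 and s["DEF"] >= 8 and s["SAM"] >= 8

def rule9(s):
    return ((s["FVA"] >= 14 or s["FVOD"] >= 8)
            and s["SAT"] >= 2 and s["DEF"] >= 4 and s["SAM"] >= 4)

def high_probability(scores):
    """A "yes" on any rule indicates a high probability of dependence."""
    return any(rule(scores) for rule in (rule6, rule7, rule8, rule9))

scores = {"FVA": 10, "FVOD": 2, "OAT": 3, "SAT": 1, "SAM": 9, "DEF": 5}
print(high_probability(scores))  # rule 7 fires: FVA >= 9 and SAM >= 8 -> True
```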

SASSI Psychometrics

The following section will present and critique the available data regarding the

SASSI's reliability and validity. Initially, the researcher will provide the psychometric

data provided in the SASSI-3 Manual (Miller & Lazowski, 1999). Then, the data

provided by independent researchers will be introduced. This section will conclude with a

critique of the available literature as well as a recommendation for a new approach to the

question of the SASSI-3's psychometrics. To begin, the researcher will provide a brief

review of the psychometric constructs "reliability" and "validity" as used in these

sections.

Reliability. Reliability means that an instrument yields stable results for a given

sample (Bartholomew, 1996; Mark, 1996; Traub, 1994). It is important to understand that

data about an instrument's reliability is sample specific (Gray, 2001). That is, reliability

is an attribute of and is specific to the sample and its data, rather than a characteristic of

the instrument. There are several methods of assessing the reliability of an instrument's

data. These methods are the test-retest, internal consistency, split-half, and inter-rater

reliability tests. Split-half and inter-rater reliability tests are not appropriate tests of

reliability for a screen such as this and thus have not been used to evaluate the SASSI-3.

Correlation statistical tests are used to evaluate reliability and include Cronbach's alpha,

often referred to as the alpha coefficient (Cronbach, 1951), the Pearson product-moment

correlation coefficient, the Kuder-Richardson 20 (KR-20; Kuder & Richardson, 1937),

and t-tests.

Test-retest reliability is used to explore the stability of the results from a given

instrument over a brief time period (Sproll, 1995). A scale is said to have test-retest

reliability when the value it assigns to a trait does not fluctuate between pretest and

posttest administrations. To test the stability of an instrument, a sample is administered

the instrument once and then a second time following a two- or four-week time delay. A

correlation coefficient between the first and second administration is calculated, the

magnitude of which is reported as the stability coefficient. The instrument is reported to

be reliable if the results of the test-retest yield stable scores across the time delay, from

time one to time two.
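The stability coefficient described above is a Pearson product-moment correlation between the two administrations. A minimal sketch, using made-up scale scores for five hypothetical respondents:

```python
# A minimal sketch of a test-retest stability coefficient: the Pearson
# product-moment correlation between scores at time one and time two.
# The scores below are invented for illustration.
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

time1 = [12, 7, 20, 3, 15]  # hypothetical scale scores, first administration
time2 = [11, 8, 19, 4, 14]  # same respondents two weeks later

print(round(pearson_r(time1, time2), 3))  # high stability, ~0.996
```

A coefficient near 1.0 indicates that the rank ordering and spacing of scores was nearly unchanged across the delay.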

Internal consistency assesses the degree to which an instrument's items are

measuring a similar construct (Sproll, 1995). The internal consistency estimate provides

an indicator of the homogeneity of a set of items on an instrument (Mark, 1996). To

investigate an instrument's internal consistency, the instrument need only be

administered once, after which a statistical procedure will report the overall mean

correlation of each item's variance with each other item on the instrument. The

instrument is reported to be reliable if the items correlate strongly with one another

(Reis & Judd, 2000). The internal consistency of an instrument is commonly referred to

as the alpha coefficient. The statistics used to evaluate internal consistency are the

Cronbach's Alpha (Cronbach, 1951) and the Kuder-Richardson 20 and 21 (KR-20 & KR-

21; Kuder & Richardson, 1937). Each of these formulas measures internal consistency;

however, they are to be used in different circumstances. The Cronbach's Alpha can be

used with instruments employing any type of response option scales (i.e., two choice

scales to more than two response option categories such as the Likert-type response

scale). The KR-20 and KR-21 were designed to be used specifically and exclusively for

dichotomously scored instruments, correct/incorrect or with only two response options,

such as yes/no or true/false (Crocker & Algina, 1986).
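The two formulas can be sketched side by side. Cronbach's alpha accepts any response scale; KR-20 is the special case for dichotomous (0/1) items, where each item's variance reduces to p(1 - p). The toy data are invented, and population (n-denominator) variances are assumed throughout, under which the two formulas agree exactly for dichotomous data.

```python
# Sketches of the two internal-consistency formulas discussed above.
# Cronbach's alpha works for any response scale; KR-20 is the dichotomous
# special case. Toy data invented for illustration.

def variance(xs):
    """Population variance (n in the denominator)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def cronbach_alpha(items):
    """items: one list of scores per item, same persons in the same order."""
    k = len(items)
    totals = [sum(person) for person in zip(*items)]
    item_var = sum(variance(it) for it in items)
    return (k / (k - 1)) * (1 - item_var / variance(totals))

def kr20(items):
    """KR-20 for 0/1 items: alpha with p*(1-p) as each item's variance."""
    k = len(items)
    totals = [sum(person) for person in zip(*items)]
    pq = sum((sum(it) / len(it)) * (1 - sum(it) / len(it)) for it in items)
    return (k / (k - 1)) * (1 - pq / variance(totals))

dichotomous = [[1, 1, 0, 1], [1, 0, 0, 1], [1, 1, 0, 0]]  # 3 items, 4 persons
print(round(cronbach_alpha(dichotomous), 3), round(kr20(dichotomous), 3))  # 0.632 0.632
```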

As demonstrated above, while the reliability of an instrument is the first question

to be explored when investigating an instrument's psychometric capabilities, reliability

only answers the question of whether or not an instrument provides consistent results.

The question of whether or not an instrument measures what it is purported to measure is

answered by an investigation of its validity. Validity will be discussed below.

Validity. Validity has been considered in the past as "most fundamental and

important in psychometrics" (Angoff, 1988, p. 19). An instrument that is valid measures

what it reports to measure (Mark, 1996). This means that if an instrument is reported as

being able to assess substance dependence, the instrument will indeed measure substance

abuse and not self-esteem, anxiety, or depression. While reliability can be investigated

empirically using statistics and formulas, "validity is more of a theoretical issue, and

therefore, its assessment is less straightforward" (Bartholomew, Henderson, & Marcia,

2000, p. 300). There are three ways to explore an instrument's validity. Those methods

are grouped as content validity, construct validity, and criterion-referenced validity.

Content validity is sometimes referred to as "face validity", but the two concepts

are not synonymous. Face validity is defined by Mosier (1947) as "the extent to which

the items appear to measure a construct that is meaningful to lay persons or typical

examinees" (cited in Crocker & Algina, 1986, p. 223). Content validity refers to the

extent to which the items in the instrument accurately reflect the domain of interest

(Bartholomew, Henderson, & Marcia, 2000). Due to concerns about denial and

defensiveness among people with substance abuse and dependence disorders, there is

some debate about the appropriateness of using face valid items to screen for these

disorders.

Content validity is the first evaluation used to classify an instrument as valid and

is typically done in the earliest stages of test development (Bartholomew, Henderson, &

Marcia, 2000). Several steps are followed to assess an instrument's content validity. The

first step is to establish the researcher's intent with regard to the instrument and develop a

pool of items. These items are then evaluated by content expert judges to ascertain their

degree of agreement with the objectives of the instrument (Crocker & Algina, 1986). The

correlation of these matches among judges is evaluated for congruence. Highly congruent

matches between judges, items, and objectives mean that the instrument has content

validity.

Construct validity explores whether the trait of interest is the characteristic

impacting the test-taker's performance or if another underlying characteristic is partially

responsible (Mark, 1996). Wallen and Fraenkel (1991) outlined a three step process

useful in identifying whether an instrument is high in construct validity. The steps are 1)

to create a clear definition of the variable, 2) based on a theory underlying the variable,

develop hypotheses about how people who possess a "lot" vs. a "little"

of the variable will respond to a particular situation, and 3) test the hypotheses both

logically and empirically - that is, by collecting additional information (Wallen &

Fraenkel, p. 95). For example, consider a hypothetical instrument measuring multicultural

awareness: is multicultural awareness the only quality being measured, or is the test-taker's

affability impacting his or her performance as well? In this

manner, the instrument's structure is evaluated for underlying constructs which may be

affecting the outcome.

Construct validity also evaluates "various relationships" to the variable of interest

(Sproll, 1995, p. 77). To investigate construct validity, the researcher developing the

above mentioned instrument would want to compare it to the individual's level of cultural

understanding measured by using multiple sources of data. The researcher would be

hoping for high correlation. The researcher may also compare the individual's level of

racism (an opposing trait), using multiple sources of data, hoping for a low correlation.

These two types of correlations are used to demonstrate two types of construct validation,

convergent validity and divergent validity, respectively. Convergent validity is the degree

to which an instrument correlates positively with another measure of a similar construct

(Mark, 1996). Using the example above, the researcher can count the number of multi-

cultural events an individual attends in a year to correlate with multicultural

awareness. A high level of positive correlation would mean that the instrument has

convergent validity. Discriminant validation is demonstrated when an instrument is

negatively correlated with another measure of an opposing construct (Mark). Again, in

the example above the researcher can interview an individual regarding his or her beliefs

about racially charged political events and qualitatively analyze the results hoping for a

negative correlation between the results of the interview and the instrument. Or, the

researcher can evaluate two differing groups of people hoping to differentiate between

them using correlations with the instrument. A negative correlation would demonstrate

discriminant validity.

Construct validity can be evaluated using four methods: correlational exploration of

convergent and discriminant validity, differentiation between groups analysis, factor

analysis, and the multitrait-multimethod matrix. Construct validation of the SASSI-3 has

only been completed using the convergent, discriminant, differentiation analysis, and

factor analysis methods. Therefore, for the purposes of

this study, the multitrait-multimethod matrix method will not be discussed.

Convergent validity is associated with testing an instrument's construct validity. It

usually involves the use of alternative methods of measurement, if the primary construct

is evaluated via a survey instrument (Litwin, 1995). Convergent validity can be evaluated

using a correlation or kappa coefficient (Litwin). A correlation coefficient level of 0.70 or

more is acceptable in social science research (Nunnally, 1978). There is less agreement as

to the acceptable level of the kappa coefficient as there are several different

interpretations which distinguish the levels of kappa (see Altman, 1991; Carletta, 1996;

Landis & Koch, 1977; Viera & Garrett, 2005). Altman (1991) adapted the Landis and

Koch (1977) interpretation table for kappa indicating that a kappa score of less than .20 is

poor agreement, .21-.40 is fair agreement, .41-.60 is moderate agreement, .61-.80 is good

agreement, and .81-1.00 is very good agreement with another instrument.
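These interpretation bands can be expressed as a simple lookup. This sketch follows the Altman (1991) adaptation of the Landis and Koch (1977) benchmarks as described above; the boundary handling at exact cutpoints is an assumption.

```python
# A sketch of Altman's (1991) adaptation of the Landis and Koch (1977)
# benchmarks for interpreting a kappa coefficient. Boundary handling at the
# exact cutpoints is assumed here.

def interpret_kappa(kappa):
    """Map a kappa coefficient to its qualitative agreement band."""
    if kappa <= 0.20:
        return "poor"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "good"
    return "very good"

print(interpret_kappa(0.35))  # fair
print(interpret_kappa(0.75))  # good
```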

Factor analysis is another method used by researchers to evaluate construct

validity. Factor analysis allows researchers to identify the structure of the instrument and

validate whether it is measuring a common factor (Sproll, 1995). Using the factor

analytic method of validity testing, researchers compute a correlation matrix between the

subjects and items and then conduct a reduction technique to identify the number of

underlying constructs accounting for the variation in the variables (Crocker & Algina,

1986).
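The reduction step described above can be sketched by examining the eigenvalues of a correlation matrix, here using the common Kaiser criterion (retain eigenvalues greater than 1) as one illustrative reduction technique. The 4x4 correlation matrix is invented: two pairs of strongly intercorrelated items suggest two underlying factors.

```python
# A minimal sketch of the reduction step in exploratory factor analysis: from
# a correlation matrix, inspect eigenvalues to judge how many underlying
# constructs account for the shared variance. The Kaiser criterion (retain
# eigenvalues > 1) is one common heuristic. Matrix invented for illustration.
import numpy as np

corr = np.array([
    [1.0, 0.8, 0.1, 0.1],
    [0.8, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.7],
    [0.1, 0.1, 0.7, 1.0],
])

eigenvalues = np.linalg.eigvalsh(corr)[::-1]  # sorted largest first
n_factors = int(np.sum(eigenvalues > 1.0))
print(n_factors)  # two clusters of correlated items -> 2
```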

Criterion referenced validity is the degree of agreement the instrument has with a

'gold standard' for "assessing the same variable" (Litwin, 1995, p. 37). This 'gold

standard', the criteria against which the instrument is compared, is regarded as the best

measure of the construct (Litwin). The instrument being tested may be a more efficient,

cost effective, quicker or shorter method of evaluating the same construct. Criterion

referenced validity tests have a five step design as identified by Crocker and Algina

(1986). Those steps include 1) identifying a construct and a method to evaluate it; 2)

selecting a sample; 3) collecting and maintaining the data for future evaluation; 4) when

available, obtaining data on comparison construct for each participant; and 5) using a

correlation coefficient to evaluate the degree of relationship between the primary

construct and comparison construct.

There are two subtypes of criterion validity, predictive and concurrent. An

instrument is said to have predictive validity if it can predict, through the use of a

correlation, a future second variable (Sproll, 1995). A common example of this type of

validity test involves pre-college entrance examinations such as the Scholastic Aptitude

Test (SAT). Admissions departments often base their determinations on SAT scores

among other criteria because the SAT is said to predict future performance in college

(Salins, 2008). An instrument is said to have concurrent validity, if it measures a

construct that is present at the time of the evaluation (Crocker & Algina, 1986; Mark,

1996). An example of concurrent validity would be to evaluate a person's substance

dependence using the SASSI-3 against their admission of substance dependence or a

clinical diagnosis using the Diagnostic and Statistical Manual of Mental Disorders,

Fourth Edition, Text Revision (DSM-IV-TR; American Psychiatric Association, 2000) by a licensed mental

health professional.

Criterion referenced validity requires investigating the instrument's ability to

correctly identify those who meet the 'gold standard' criteria and correctly identify those

who do not meet the 'gold standard' criteria. The terms used to describe the two

conditions described above are sensitivity and specificity, respectively (Altman, 1991).

With regard to this study, sensitivity refers to the SASSI-3's ability to correctly identify

persons with a substance use disorder, and specificity is the ability to correctly identify

those who do not have a substance dependence disorder. These two concepts are closely

related to the concepts of false positives and false negatives. A false positive occurs when a

screen incorrectly identifies someone as having a substance use disorder. A false negative

occurs when a screen incorrectly indicates that a person does not have a substance use disorder. If

an instrument is high in sensitivity and specificity, it is low in false positives and false

negatives.
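The definitions above reduce to two ratios over a screen's confusion counts. A minimal sketch, with invented counts standing in for a hypothetical validation study against a diagnostic gold standard:

```python
# A sketch of sensitivity and specificity as defined above: the screen's hit
# rates against a "gold standard" diagnosis. Counts invented for illustration.

def sensitivity(true_pos, false_neg):
    """Proportion of truly dependent respondents the screen correctly flags."""
    return true_pos / (true_pos + false_neg)

def specificity(true_neg, false_pos):
    """Proportion of non-dependent respondents the screen correctly clears."""
    return true_neg / (true_neg + false_pos)

# Hypothetical screening results against a clinician's diagnosis:
tp, fn = 90, 10  # dependent group: 90 correctly flagged, 10 false negatives
tn, fp = 85, 15  # non-dependent group: 85 correctly cleared, 15 false positives

print(sensitivity(tp, fn), specificity(tn, fp))  # 0.9 0.85
```

High sensitivity and high specificity together imply low false-negative and false-positive rates, which is the relationship stated in the text.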

SASSI-3 reliability from the SASSI-3 Manual. The authors of the SASSI-3 Manual

report that the two-week test-retest stability coefficients are 1.0 for the face valid scales

and between .92 and .97 for the clinical scales for the sample taken from voluntary

participants at addiction treatment centers, psychiatric hospitals, vocational rehabilitation

programs, and a sexual offender treatment program across the United States (Miller &

Lazowski, 1999). They report the alpha coefficient as .93 for the entire instrument.

However, the reported alphas by scale are low with the exception of the face valid scales.

The FVA and FVOD scales' alphas are .93 and .95 respectively. The SYM, OAT, and

SAT scales' alphas progressively decrease from .79 to .69 to .27. The DEF scale

alpha is .63 and is followed again by a decrease in values for the SAM and FAM scales'

alphas of .37 and .33 respectively. The COR scale has a .71 alpha. These varying alpha

values are explained by the authors, who note that the instrument was not developed

to be unidimensional and therefore, the alpha findings are "not a primary consideration"

(Miller & Lazowski, 1999, p. 26). The support for their findings has been mixed and is

addressed in the following section.

SASSI-3 reliability from independent researchers. Independent research

investigations have found the SASSI-3 data to be at varying levels of reliability with

inconsistent findings when compared to the results found by the originators of the

instrument. Consistent with Miller and Lazowski (1999), several researchers have

identified that stability coefficients are the most meaningful reliability test because the

SASSI-3 was not constructed to be a unidimensional measure (Lazowski et al., 1998;

Myerholtz & Rosenberg, 1998). However, this assertion has been both supported and

challenged in research as independent test-retest reliability studies have shown

inconsistent findings (Feldstein & Miller, 2007). In their study investigating the efficacy

of the SASSI-3, Lazowski, Miller, Boye, and Miller (1998) utilized a two-week test-retest

to explore reliability with a similar population as that reported in the SASSI-3

Manual. They found SASSI-3 score stability to be between 1.0 for the face valid scales

and between .92 and .97 for the subtle scales. These findings are consistent with the

findings reported in the SASSI-3 Manual, which reports 1.0 for both the FVA and FVOD

scales (Miller & Lazowski, 1999). With a college sample, Laux, Salyers and Kotova

(2005) also found stable scores in a two-week test-retest reliability investigation of the

SASSI-3 for all of the scales.

However, Myerholtz and Rosenberg (1998) tested the reliability of the SASSI-2 using

the test-retest method with

college students. Using several subsamples, these researchers found that the two-week

stability coefficient using the Pearson product-moment correlation coefficients for the

FVA and FVOD scales were .82 and .89 respectively. This demonstrates a moderately

high level of correlation indicating that the set of scores remained relatively stable.

However, other studies have found higher correlation coefficients (i.e., Laux, Salyers, &

Kotova [2005] found .94 for the FVA). The face valid scale stability findings range from

.97 to 1.0 (Lazowski et al., 1998; Miller & Lazowski, 1999). In the social sciences, it is

generally acceptable if the stability is above .70 (Nunnally, 1978). The clinical scales

show a wider spread of correlation coefficients across the scales. According to

Myerholtz and Rosenberg (1998), the stability coefficients ranged from .54 to .78,

averaging .71 across the six clinical scales. This indicates less stability than reported by

Miller and Lazowski (1999) but moderate stability in the set of scores between testing

situations for the clinical scales.

With respect to overall classification, a significant but rarely reported finding for

SASSI psychometric reports, Myerholtz and Rosenberg (1998) found that of a

subsample of 55 participants, five (10%) had a change in classification. Two

were found to be non-chemically dependent after initially being classified as chemically

dependent and three were found to be chemically dependent after initially being classified

as non-chemically dependent.

In the four-week test-retest reliability investigation using a subsample of college

students, Myerholtz and Rosenberg (1998) found mixed stability results. The Pearson

product moment correlation coefficients for the Face Valid (FVA and FVOD) scales were

.76 and .93. This demonstrates a moderate and high level of correlation between time one

and two, indicating that the set of scores remained stable across the scales. Myerholtz and

Rosenberg found that for the clinical scales the correlation coefficients for the 4-week

test-retest group ranged from .42 to .78, averaging .63 across the six clinical scales. This

indicates less stability in the set of scores between testing times for the clinical scales. In

addition, of the 47 participants, nine (19%) were found to have a change in classification

from the first to the second testing time four weeks later indicating "poor" reliability

(Myerholtz & Rosenberg, 1998, p. 441). Four (10.5%) of the 38 participants initially

found to be non-chemically dependent for test 1 were classified as chemically dependent

four weeks later on the retest. Five (56%) of the nine participants initially found to be

chemically dependent on the first administration were classified as non-chemically

dependent four weeks later on the second administration. These 4-week stability estimate

results cannot be placed in the context of the SASSI-3 authors' findings as the SASSI-3

Manual only reports 2-week correlation coefficients (Miller & Lazowski, 1999).
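Test-retest stability of this kind is simply the Pearson product-moment correlation between time-one and time-two scores. A minimal sketch, using hypothetical scores rather than data from any of the studies above:

```python
import math

def pearson_r(time1, time2):
    """Pearson product-moment correlation between two lists of scores."""
    n = len(time1)
    mean1, mean2 = sum(time1) / n, sum(time2) / n
    cov = sum((x - mean1) * (y - mean2) for x, y in zip(time1, time2))
    ss1 = math.sqrt(sum((x - mean1) ** 2 for x in time1))
    ss2 = math.sqrt(sum((y - mean2) ** 2 for y in time2))
    return cov / (ss1 * ss2)

# Hypothetical raw scores for five respondents at two testing times.
time1_scores = [2, 5, 9, 12, 20]
time2_scores = [3, 4, 10, 11, 22]
r = pearson_r(time1_scores, time2_scores)  # high stability, close to 1
```

Note that a high correlation only shows that respondents kept their relative ordering; as the classification findings above illustrate, individuals near a cutoff can still change diagnostic category between administrations.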

The Myerholtz and Rosenberg (1998) findings indicate higher test-retest stability in scores for the SASSI-2 direct scales and poor stability for the

clinical scales. They conclude that because the SASSI-2 purports to screen for an

"enduring trait of chemical dependency," the inventory should have more robust clinical

scales and fewer changes in status over testing situations (Myerholtz & Rosenberg, p.

445).

Four studies have investigated the internal consistency of the SASSI-3 or its

scales. The face valid scales have been found to have high internal consistency. The

reported subtle scales' internal consistency varies from good to poor. The coefficient

alpha for the FVA was .92 in a study comparing the SASSI-3 to other substance abuse

screening instruments with a college population (Laux, Salyers, & Kotova, 2005). The

coefficient alpha for the FVOD was .95 in a study which supports the psychometric

properties of the scale using a college student population (Laux, Perera-Diltz, Smirnoff,

& Salyers, 2005). These two coefficient alpha findings for the FVA and FVOD scales are

consistent with those of Clements (2002) and Miller and Lazowski (1999). The decision

rule findings for the SASSI-3 produced a .49 coefficient alpha (Clements, 2001). This means that the items in the scales the decision rules use to classify a person as substance dependent or non-dependent only moderately represent a single construct.

Clements also found that the three direct scales had the highest coefficient alphas, and the

subtle scales had low alphas.
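Coefficient alpha can be sketched directly from its definition: the ratio of summed item variances to total-score variance, scaled by the number of items. The responses below are hypothetical, not data from the cited studies:

```python
def cronbach_alpha(item_scores):
    """Coefficient alpha. item_scores is a list of per-item score lists,
    all covering the same respondents in the same order."""
    k = len(item_scores)

    def variance(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    totals = [sum(person) for person in zip(*item_scores)]
    item_var = sum(variance(item) for item in item_scores)
    return (k / (k - 1)) * (1 - item_var / variance(totals))

# Hypothetical responses: three items answered by four respondents.
responses = [
    [1, 2, 3, 4],
    [1, 2, 3, 3],
    [2, 2, 3, 4],
]
alpha = cronbach_alpha(responses)  # high: the items rise and fall together
```

Because alpha rewards items that covary, criterion-keyed scales like the SASSI-3 subtle scales can be expected to score poorly on it, which is the point made in the limitations section below.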

SASSI-3 validity data from the SASSI-3 Manual. While the SASSI-3 Manual (Miller & Lazowski, 1999) identifies two scales as "face valid," content validity is not reported for those scales, for any other scale, or for the instrument as a whole. The item content of the FVA and FVOD scales does, however, appear intended to reflect the construct directly. In a study conducted by Lazowski, Miller, Boye, and Miller (1998), the

researchers explored previous research that compared the SASSI-3 to the MMPI - 2

Addiction Acknowledgement Scale (Weed, Butcher, McKenna & Ben-Porath, 1992),

MMPI-2 Addiction Potential Scale (Weed et al.), the MAC-R (MacAndrew, 1965), the MAST

(Selzer, 1971), and the Millon Clinical Multiaxial Inventory-II (MCMI-II) Alcohol

Dependence Scale and Drug Dependence Scale (Millon, 1987). They found that people

who scored positive for substance dependent on the SASSI-3 had higher mean scores and

all of those that scored non-dependent on the SASSI-3 had lower mean scores on the

above listed instruments (Lazowski et al. 1998).

In the SASSI-3 Manual, Miller and Lazowski (1999) report that, when compared against clinical diagnosis, the SASSI-3 achieved 93 percent accuracy, 93.3 percent sensitivity, and 94.2 percent specificity. Again comparing the SASSI-3 to clinical diagnosis, Lazowski, Miller, Boye, and Miller (1997) found an overall accuracy rate of 97 percent, a sensitivity rate of 97 percent, and a specificity rate of 95 percent.
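Accuracy, sensitivity, and specificity all derive from a two-by-two table of screening decisions against clinical diagnoses. A sketch with hypothetical counts (not the Manual's raw data, which are not reported here):

```python
def screening_stats(tp, fp, tn, fn):
    """Accuracy, sensitivity, and specificity from a 2x2 screening table.

    tp/fn: dependent cases screened positive/negative;
    tn/fp: non-dependent cases screened negative/positive.
    """
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)   # dependent cases correctly flagged
    specificity = tn / (tn + fp)   # non-dependent cases correctly cleared
    return accuracy, sensitivity, specificity

# Hypothetical counts for 200 screened individuals.
acc, sens, spec = screening_stats(tp=93, fp=6, tn=94, fn=7)
```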

SASSI validity data from independent researchers. In the current SASSI literature,

researchers have compared the SASSI to other survey instruments measuring the same

construct (Laux, Salyers, & Kotova, 2005; Lazowski, Miller, Boye, & Miller, 1998;

Myerholtz & Rosenberg, 1998). When comparing the SASSI-2 to other instruments

which also purport to screen for alcohol and drug problems, Myerholtz and Rosenberg

(1998) found that the SASSI-2 had less than acceptable (.61) convergent validity with the

CAGE (Ewing, 1984; Mayfield et al., 1974). "CAGE" is an acronym, the letters of which

represent the following alcohol-related traits and behaviors: C- have you ever felt you

should cut down on your drinking, A- have people annoyed you by criticizing your

drinking, G- have you ever felt bad or guilty about your drinking, and E- have you ever

had a drink first thing in the morning to steady your nerves or to get rid of a hangover

(eye-opener). This acronym is a question prompt for clinicians in their screenings of

clients. Laux, Salyers, and Kotova (2005) compared the SASSI-3's classification

agreement with the MAST, CAGE, and MAC-R (see Table 1). Using the Altman approach to Kappa interpretation, Laux, Salyers, and Kotova (2005) identified that the

agreement between the SASSI-3 and the CAGE and MAST is in the "high-moderate

range" (p. 47).

Table 1

Kappa Coefficient Agreement between Instruments by Authors

                      SASSI &   SASSI &    SASSI &   SASSI &   SASSI &
                      CAGE      Modified   MAC       MAST      MAC-R
                                CAGE

Laux, Salyers, &      .49       —          —         .52       .29
Kotova (2005)         Moderate                       Moderate  Fair

Myerholtz &           .61       .58        .34       .22       —
Rosenberg (1998)      Good      Moderate   Fair      Fair

Note. Cells with no value indicate that no data were reported.
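The kappa coefficients in Table 1 summarize classification agreement corrected for chance. Cohen's kappa can be sketched as follows, with hypothetical screening classifications rather than data from either study:

```python
def cohen_kappa(a, b):
    """Cohen's kappa for two binary (0/1) classification lists."""
    n = len(a)
    po = sum(x == y for x, y in zip(a, b)) / n          # observed agreement
    pos_a, pos_b = sum(a) / n, sum(b) / n
    pe = pos_a * pos_b + (1 - pos_a) * (1 - pos_b)      # chance agreement
    return (po - pe) / (1 - pe)

# Hypothetical positive (1) / negative (0) screens for ten respondents.
screen_a = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
screen_b = [1, 1, 0, 0, 0, 0, 0, 1, 1, 0]
kappa = cohen_kappa(screen_a, screen_b)  # roughly .58, "moderate"
```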

A factor analytic evaluation of the SASSI-2 was published by Gray (2001), who found through confirmatory and exploratory factor analysis that the ten-factor solution suggested by the ten scales identified in the SASSI-3 Manual was not a good fit for his data. In fact, a two-factor solution, with items mostly representing the FVA and FVOD scales, accounted for up to 53 percent of the variance (Gray). The

subtle items did not organize into the scales as identified by the SASSI-3 Manual and

were found to be "multivocal" (Gray, p. 109). This dimensionality was confirmed later in

two studies exploring the FVA and FVOD scales, respectively (Laux, Salyers, &

Kotova, 2005; Laux, Perera-Diltz, Smirnoff, & Salyers, 2005).

One group of researchers (Arneth, Bogner, Corrigan, & Schmidt, 2001) examined the SASSI-3's predictive validity among patients with traumatic brain injury (TBI). These authors compared the TBI patients' SASSI-3 results with their blood

alcohol level (BAL) at the time of their injury in an effort to see which would better

predict chemical dependency. They found that the SASSI-3's results were as predictive of chemical dependence as was the BAL.

Researchers (Arneth et al., 2001; Clements, 2002; Peters et al., 2000; Svanum &

McGrew, 1995) investigating the SASSI's criterion-referenced validity have used

diagnoses by licensed mental health professionals using DSM-IV-TR criteria as the gold standard. The results of these studies are mixed and are not consistent with the results published in the SASSI-3 Manual (Miller & Lazowski, 1999). For example, in a study of the SASSI-2, Svanum and McGrew (1995) found a sensitivity of 33 percent and

87 percent specificity in their college student population. Using an incarcerated

population, Peters et al. (2000) found an overall accuracy rate for the SASSI-2 of 69.4

percent with a sensitivity of 73.3 percent and a specificity of 62.2 percent. A similarly

lower finding came from a study of TBI patients using the SASSI-3 and diagnostic

criteria (Arneth et al., 2001). The accuracy rate was 69.2 percent, the sensitivity rate was 70.8 percent, and the specificity rate was 68.5 percent, all of which were reported to be statistically significantly different from the normative sample at the p < .001 level

(Arneth et al.). Using a sample of college students, Clements (2002) also found lower

sensitivity and specificity findings, 65 percent and 89 percent, respectively. Clements

hypothesized that if the cutoff scores were lower for the college population, the SASSI-3

may have higher sensitivity. Upon further investigation, Clements found that if the cutoff

scores were reduced, the sensitivity increased to 89 percent. Clearly, independent

researchers have been unable to replicate many of the sensitivity, specificity, and overall classification rate findings reported in the SASSI-3 Manual.

Limitations of the Psychometric Findings on the SASSI-3

The SASSI-3 is among the substance dependence screening instruments most frequently used by counselors and has been identified as the "most important"

instrument of its kind (Juhnke et al., 2003). Unfortunately, independent research reports significant questions about and disparities in the SASSI-3's reliability and

validity. This may be due to differences in the methods researchers use to evaluate

reliability and validity.

A limitation of the test-retest method of reliability testing involves extraneous sources of score variance. Those sources, which may contribute to changes in scores from the initial test to the follow-up test, include: (a) the individual's attempts to recall what was previously asked or how he or she answered, (b) changes in the characteristic being assessed, and (c) changes in the testing conditions or environment and the interaction between the individual and those changes (Traub, 1994).

The major limitation of testing the internal consistency of the SASSI-3 is

the fact that many of the instrument's scales were not developed to measure one

construct. Rather, their test construction and item selection were guided by the criterion

keyed method, a procedure that identifies an item's statistical ability to discriminate

between people who are substance dependent and those who are not, regardless of the

item's content (Miller & Lazowski, 1999). Therefore, internal consistency is a less

relevant reliability measure for the SASSI-3 (Miller & Lazowski, 1999; John & Benet-

Martinez, 2000).

A limitation of construct validation studies is the potential for "mono-operational

bias" (Brewer, 2000, p. 9), which can reduce validity. Researchers can use two different methods to evaluate the construct, which aids in eliminating mono-operational bias.

For example, if a researcher is using a survey instrument to evaluate the primary

construct, he or she should use a method other than a survey instrument to evaluate a

similar construct. Another limitation of exploring construct validity is that researchers

often use a variety of terminology when referring to the relationships between

instruments. Researchers often interchange "convergent validity" with "concurrent validity." While both concepts can compare two measures and varying sources of data,

they in fact are fundamentally different in their intentions. Construct validity involves

validating the instrument's theoretical underpinnings. Concurrent validity compares the

instrument to the existence of a criterion. In exploring the convergent validity of the

SASSI with other substance dependence screening instruments, the findings result in

mixed outcomes at best. These mixed outcomes may be the result of a lack of clarity and

agreement among researchers as to the correct terminology when referring to validity.

The primary limitation of these traditional methods of establishing reliability and validity is that their results are sample specific. If the sample changes, the results of the reliability and

validity investigations will change as well (Keeves & Masters, 1999). The performance

of the person is dependent on the instrument in classical test theory because there is an

interaction between the instrument and the sample (Keeves & Masters). As a

consequence of this interaction, no inferences can be made about the performance of any

one person on any particular item. Instead, all that can be known is the individual's

performance on the test as a whole (John & Benet-Martinez, 2000). Additionally, there is

no way to empirically evaluate the quality of any individual item (Kagee & deBruin,

2007). Nor is there a way to empirically evaluate the response scales of the instrument in

classical test theory (Keeves & Masters). Finally, researchers often assign numbers to an

ordinal scale and then assume that those numbers are interval and mean the same for each

item in order to use them in statistical analyses (Keeves & Masters). Each of these

limitations can be addressed through the use of measurement models in which a person's

performance and the items are independently scaled "along a continuous intervally scaled

latent trait" (Henderson, Taxman, & Young, 2008, p. 165).

Rasch Measurement

Rasch analysis (Rasch, 1960, 1980) is an alternative method of evaluating an

instrument's reliability and validity (Fox & Jones, 1998). This method of evaluating the

reliability and validity of a psychological instrument employs the guiding principles of

measurement (Thurstone, 1927). These principles are the same principles utilized when

measuring the height of a house, the weight of a baby, or the volume of a container of

liquid. The Thurstonian principles of measurement, which include (a) unidimensionality;

(b) linearity; (c) invariance; and (d) independence, can be applied to instruments which

are designed to measure psychological constructs in humans (Stone, 2007). Each of the

Thurstonian principles will be described in detail and will include an example of its

application to a popular measure of general distress, the Symptom Checklist-90-Revised (Derogatis, 1975), as evaluated using the Rasch method by Elliott et al. (2006).

Unidimensionality means that an instrument measures one construct or

characteristic of an object at a time (Bond & Fox, 2007). For instance, a scale only

measures weight, not height. A ruler only measures length and not temperature. In

counseling research, this means that an instrument should only measure one trait or

construct. One example of such an instrument is the Symptom Checklist-90-Revised (SCL-90-R; Derogatis, 1975). Elliott et al. (2006) conducted a Rasch analysis to evaluate the psychometric properties of the SCL-90-R. The researchers found that

the instrument measured the construct of "general clinical distress" as evidenced by the

measurement principal components analysis, which found that the instrument accounted for 78 percent of the total variance (Elliott et al., p. 359). This means that the SCL-90-R is

measuring one construct; the items on this instrument aligned in a hierarchical fashion

according to difficulty.

Linearity implies that an object of measurement has more or less of the construct.

For instance, a person has more height or less height than another person, more weight or

less weight than another. In counseling research, an example is that an instrument should

measure more or less of a construct such as more or less anxiety, or more or less

depression in a person. This is evident in the SCL-90-R because the analysis using the

above principles of measurement found increasing levels of severity both among the

items and the people (Elliot et al., 2006). There was a continuum of items from more to

less difficult to answer from "general malaise" to "psychosis" and an increasing

agreeability for people from "non-clinical" to "extreme distress" indicating a hierarchical

arrangement of items (Elliot et al., p. 364).

Invariance means that a unit of measurement can be repeated without

modification in different parts of a continuous instrument and it will remain constant

across samples (Stone, 2007). For instance, five inches is equal to five inches regardless

of where on the ruler one begins to measure or what one is measuring. In counseling

research, this means that an instrument yields the same size unit regardless of whether one starts measuring with the low-end units or the middle units of the "ruler." In the psychometric analysis of the SCL-90-R, Elliott et al. (2006) found that the

instrument could be used to measure people at the high end of the ruler, demonstrating

that individuals were experiencing extreme clinical distress, and at the low end of the

ruler, demonstrating this part of the sample was experiencing non-clinical distress.

Independence means that as Thurstone stated, "a measurement must not be

seriously affected in its measuring function by the object of measurement" (as cited in

Wright, 1960, p. ix). For instance, whether a person is weighing apples at the produce

market, a baby at birth, or gold, the scale is an instrument used to measure and ounces are

the unit of measurement regardless of the item being measured. In addition, the scale

does not measure color of the apples, length of the baby, or karats of gold. In counseling

research, it follows that regardless of the population, criminals or mothers of infants, an

instrument designed to evaluate depression measures depression and not additional

factors related to being a criminal or a mother. To weigh produce, issue speeding tickets, and dispense medications, people have come to rely on systems of calibrated

measurement. The following questions are then raised by some researchers:

How is it that when we go to our offices to conduct educational research,

undertake some psychological investigation, or implement a standardized

survey, we then go about treating and analyzing those data as if the

requirements for measurement that existed at home in the morning no longer

apply in the afternoon? Why do we change our definition of and standards for

measurement when the human condition is the focus of our attention? (Bond

& Fox, p. 1)

Developing and improving instruments designed to measure constructs of human

psychological characteristics and impairments using the guiding principles of

measurement is important to furthering knowledge in the field of counseling research

(Fox & Jones, 1998). Many researchers have explored the psychometric properties of

several different psychological constructs and instruments using the Rasch model. Some

of those investigations include hostility (Strong, Kahler, Greene, & Schinka, 2005), the

Symptom Checklist-90-Revised (Elliott et al., 2006), school readiness (Banerji, Smith, &

Dedrick, 1997), detainees' distress (Kagee & de Bruin, 2007), and evidence-based

practices in the criminal justice system (Henderson, Taxman, & Young, 2008).

The Rasch model is user friendly for instrument development and for analyzing instruments' psychometric properties. Although the model is based on a probability equation involving logarithms, Winsteps (Linacre, 2009), the computer program currently used for this evaluation, handles the computation. Winsteps provides researchers easy-to-read

tables, charts, and graphs. The variable, scales, items, and people can be represented

through clear pictorial representations such as the person-item map and the response

probability curves. These charts and graphs will be referred to throughout this section and

will be carefully described to aid in understanding of the concepts.
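For the dichotomous case, the probability equation mentioned above models the chance of endorsing an item as a logistic function of the difference between the person measure and the item difficulty, both in logits. A minimal sketch (illustrative only, not the Winsteps estimation routine):

```python
import math

def rasch_prob(theta, delta):
    """Dichotomous Rasch model: probability that a person with measure
    theta (logits) endorses an item of difficulty delta (logits)."""
    return math.exp(theta - delta) / (1 + math.exp(theta - delta))

p_equal = rasch_prob(0.0, 0.0)   # person measure equals item difficulty
p_easy = rasch_prob(0.0, -2.0)   # an easier item is more likely endorsed
```

When the person measure equals the item difficulty, the endorsement probability is exactly one half, which is what anchors items and people on the same logit scale.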

Elliott et al. (2006) used the Rasch model to explore the psychometric properties of

the SCL-90-R. Their method outlined the process by which other researchers can

evaluate instruments. This method, to be described below, involves the following steps:

1) evaluate the separation and reliability for the entire instrument, 2) response validation,

3) analyze the item fit, 4) evaluate the construct analysis, 5) evaluate the instrument for

unidimensionality by reviewing the fit statistics and the principal components analysis,

and 6) investigate whether the items function the same with a different sample. After

each step, the person and item separation and reliabilities will be evaluated for changes.

Rasch Separation and Reliability. In classical terms, the concept of internal

consistency is analogous to the Rasch model's person separation and item separation

reliabilities (Fox & Jones, 1998). The separation statistic assists in identifying the number

of distinct groups among the items and people (Elliot et al., 2006). From the separation

statistic the strata index can be determined (Bond & Fox, 2007). The strata inform

researchers of the statistically distinct groups of people and items found. It is suggested

that a separation of two is the minimum acceptable standard (Wright & Masters, 1982 as

cited in Elliot et al.). A separation of two or greater creates three or more distinct groups

of items or people. The output known as the item map is another indication of person and item separation, as the groups can be visually distinguished on this diagram (Elliott et al.).
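The relationship between reliability, separation, and strata can be sketched numerically, using the conventional formulas G = sqrt(R / (1 - R)) and strata = (4G + 1) / 3 (Wright & Masters, 1982):

```python
import math

def separation_from_reliability(reliability):
    """Separation index G implied by a reliability coefficient R."""
    return math.sqrt(reliability / (1 - reliability))

def strata(separation):
    """Number of statistically distinct levels: (4G + 1) / 3."""
    return (4 * separation + 1) / 3

g = separation_from_reliability(0.80)  # reliability .80 implies G = 2
levels = strata(g)                     # three distinct strata
```

This makes the minimum standard concrete: a separation of two corresponds to a reliability of .80 and three statistically distinct groups of items or people.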

The first step in evaluating an instrument is to review the separation and

reliability (Elliott et al., 2006). The Rasch outputs offer two sets of separation and reliability statistics: one for the items and one for the participants, which is called "person." The second step is to evaluate the separation and reliability for the subscales.

The separation and reliability statistics will be the basis upon which the researcher will

compare any changes made to the response scales or elimination of misfitting items. For

example, if the researcher eliminates a misfitting item, this may affect the separation and

reliability statistics. If it increases these two statistics, then the outcome of the change is positive. If it decreases these two statistics, then it may limit the available information

provided by the data.

Response Validation. Researchers can use the Rasch model to determine whether

participants utilized the rating scale as established by the developers. This process is

called response validation and is important to do prior to interpreting further results, as

the response scales may not be working as the researchers intended (Elliot et al., 2006).

Completing a rating scale analysis allows researchers to test their hypotheses regarding

whether the rating scale was clear, had the correct amount of response choices, and

whether the participants were using the scale as developed (Fox & Jones, 1998).

Conducting an analysis of the rating scale also allows researchers to evaluate whether the

instrument's items function unidimensionally (Elliot et al.). For this analysis, the

commonly accepted rule is that the distance between two adjacent response options

(threshold) should be more than 1.4 but not more than 5 logits (Linacre, 1999). A logit is

a unit of measurement that is arranged on an equal interval log scale (Bond & Fox, 2007).

A second way to evaluate the rating scales is to visually inspect the response probability

curves output. For each item, a probabilistic curve is created from the data. This curve

demonstrates the likelihood of each response option being chosen by the sample. If any

response option curve does not exceed a 50 percent probability of being selected, or the threshold advance falls below 1.4 or in excess of 5 logits, test developers should consider re-evaluating the rating scale, redefining the options, or logically collapsing two response

options into one.
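The threshold-advance rule can be sketched as a simple check over a set of estimated thresholds (the values below are hypothetical, not Winsteps output):

```python
def check_threshold_advances(thresholds, low=1.4, high=5.0):
    """Return (pair index, advance, within range) for each adjacent pair
    of rating-scale thresholds, given in logits (Linacre, 1999)."""
    advances = [b - a for a, b in zip(thresholds, thresholds[1:])]
    return [(i, adv, low <= adv <= high)
            for i, adv in enumerate(advances, start=1)]

# Hypothetical thresholds for a four-category rating scale.
report = check_threshold_advances([-2.1, -0.4, 2.3])
```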

Response validation is an important step in instrument construction because it

assists in scale development and individual diagnosis. In the SCL-90-R Rasch analysis,

the researchers found that the respondents were not using the response scale as expected,

and therefore, for the instrument's response scales to function as intended and maintain

separation and reliability, it became necessary to collapse the five point Likert-type scale

to a three-point scale (Elliott et al.). This means that the original scale, i.e., (1) not at all, (2) a little bit, (3) moderately, (4) quite a bit, (5) extremely, needed to be adjusted because individuals did not respond to these categories in five distinct ways; instead,

individuals responded in three distinct ways: (1) not at all, (2) a little bit and moderately,

and (3) quite a bit and extremely. By collapsing the rating scale in this manner, the

instrument could be improved (Elliot et al.).
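The collapsing described above amounts to a recoding of the response categories before re-estimation. A sketch of the Elliott et al. (2006) mapping:

```python
# Category mapping that collapses the original five responses into the
# three empirically distinct categories described above.
COLLAPSE = {
    1: 1,        # not at all
    2: 2, 3: 2,  # a little bit / moderately
    4: 3, 5: 3,  # quite a bit / extremely
}

def recode(responses):
    return [COLLAPSE[r] for r in responses]

collapsed = recode([1, 3, 5, 2, 4])  # -> [1, 2, 3, 2, 3]
```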

Item Fit Analysis. Item fit is similar to construct validation from a factor analysis

point of view. The purpose of item fit analysis is to investigate whether any item is

measuring "something qualitatively different" than the construct of focus (Elliot et al.,

2006, p. 362). Fit statistics are sensitive to "unexpected variance in response patterns"

(Henderson, Taxman & Young, 2008, p. 166). Bond and Fox (2007) identify the in-fit

mean square cutoff as 1.4 for items that are measuring something different. If an item is

over 1.4, the researcher should consider that the item in question is not measuring the

construct of interest. To explore item redundancy, the same criterion is applied as well as

an evaluation of the standardized residual correlations (Elliot et al.). Standardized

residual correlations are similar to tests of significance for each item. Items that have high standardized residual correlations are redundant and do not contribute to the

information provided by the data. Items that are considered redundant have an out-fit z-

standard score of under 0.7 and also are among the highest standardized residual

correlation. Researchers should consider eliminating misfitting items as long as they do

not negatively impact the separation and reliability statistics. Fit statistics provide a test of unidimensionality (Elliott et al.).
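The two screening criteria above can be sketched as a simple filter over a table of fit statistics (hypothetical values; the redundancy decision in the text also requires inspecting standardized residual correlations, which this sketch omits):

```python
def flag_items(fit_stats, infit_cutoff=1.4, zstd_cutoff=0.7):
    """fit_stats: list of dicts with 'name', 'infit_mnsq', 'outfit_zstd'.
    Returns items flagged as misfitting and as candidates for redundancy
    (the z-std check alone; the residual-correlation check is omitted)."""
    misfit = [s["name"] for s in fit_stats if s["infit_mnsq"] > infit_cutoff]
    redundant = [s["name"] for s in fit_stats if s["outfit_zstd"] < zstd_cutoff]
    return misfit, redundant

# Hypothetical fit statistics for three items.
stats = [
    {"name": "item1", "infit_mnsq": 0.95, "outfit_zstd": 1.1},
    {"name": "item2", "infit_mnsq": 1.62, "outfit_zstd": 2.0},
    {"name": "item3", "infit_mnsq": 0.88, "outfit_zstd": 0.3},
]
misfit, redundant = flag_items(stats)  # ['item2'], ['item3']
```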

Construct analysis. Linearity is a concept that is unique to Rasch and is applicable

in the evaluation of instruments when conducting construct analysis. The Rasch model

analyzes linearity by allowing for item ordering along a continuum ranked by difficulty

(Elliot et al., 2006). A way to conceptualize Rasch construct analysis is to consider a flag

pole as the variable of substance dependence, from less, (i.e., low on the pole) to more

(i.e., high on the pole). Flags on the left of the pole are items. Items are arranged from

difficult to endorse at the top, to easy to endorse at the bottom of the pole. People are

arranged on the right of the pole from possessing more substance dependence at the top

to less substance dependence at the bottom of the pole. In considering the FVA scale of

the SASSI-3, the items inquire about the behaviors of respondents involving alcohol such

as drinking with lunch or suicide attempts when drinking. Respondents are more likely to

endorse the item regarding drinking with lunch than the item inquiring about suicidal

behavior when drinking. Therefore, the second item is considered more difficult. This

item continuum allows researchers to compare the order of items to clinical and

theoretical logic (Elliot et al.).
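The flag-pole analogy corresponds to the person-item map that Winsteps produces. A crude text version, with hypothetical item names, difficulties, and person measures:

```python
def item_person_map(item_list, person_list, lo=-3, hi=3):
    """Crude text map: items on the left, people on the right, arranged
    from more of the trait (top) to less (bottom), in rounded logits."""
    rows = []
    for level in range(hi, lo - 1, -1):
        left = ", ".join(n for n, d in item_list if round(d) == level)
        right = ", ".join(n for n, m in person_list if round(m) == level)
        rows.append(f"{left:>30} |{level:+d}| {right}")
    return "\n".join(rows)

# Hypothetical item difficulties and person measures, in logits.
fva_items = [("suicide attempt when drinking", 2), ("drink with lunch", -2)]
people = [("person A", 2), ("person B", -1)]
print(item_person_map(fva_items, people))
```

In the printed map, the harder-to-endorse item and the more dependent person sit near the top, matching the flag-pole description above.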

Unidimensionality. The extent to which an instrument measures only one

construct is its unidimensionality, which involves establishing the instrument's invariance (Banerji, Smith, & Dedrick, 1997). Two methods are used to investigate an instrument's

unidimensionality. The first method is to investigate the fit statistics, which was

discussed above. If an item is misfitting, it may be expressing the existence of an additional dimension (Elliott et al., 2006). The second method of investigating unidimensionality is the Rasch principal-components analysis (RPCA). The RPCA explains the overall amount of

variance for the instrument (Elliot et al.). In the study investigating the SCL-90-R, the

researchers found that while the instrument does not measure a completely unidimensional construct, the additional dimensions are trivial in comparison to the overall distress identified in the RPCA, which demonstrated that the measure accounted for 78 percent of the total variance (Elliott et al.).

Independence. The last step in using the Rasch model to investigate an

instrument's validity and reliability is to compare two samples to verify the consistency

of the measure (Elliot et al., 2006). This analysis allows researchers to assess whether a

measure maintains its meaning across different samples. This is the measurement

property of invariance or specific objectivity (Bartholomew, 1996; Bond & Fox, 2007).

Items should line up according to their level of difficulty, regardless of the population

being evaluated. Thus, an inherent quality of an item is its difficulty (Bartholomew). In

the study investigating the SCL-90-R, Elliott et al. identified that there was no

meaningful or statistical difference between the clinical and non-clinical samples on the

instrument when the item maps were compared.
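A rough numerical version of this invariance check is to compare item difficulties estimated separately in two samples (hypothetical values; this is a screen, not a formal differential item functioning analysis):

```python
def invariance_flags(diffs_a, diffs_b, tolerance=0.5):
    """Flag items whose estimated difficulty (in logits) shifts more than
    `tolerance` between two samples; unflagged items keep their place
    on the "ruler" across samples."""
    return {item: abs(diffs_a[item] - diffs_b[item]) > tolerance
            for item in diffs_a}

# Hypothetical difficulties from a clinical and a non-clinical sample.
clinical = {"q1": -1.2, "q2": 0.3, "q3": 1.8}
nonclinical = {"q1": -1.0, "q2": 0.4, "q3": 1.7}
flags = invariance_flags(clinical, nonclinical)  # no item flagged
```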

Summary

Chapter Two provided a review of the SASSI's psychometric properties of

reliability and validity. The SASSI authors' and independent researchers' results varied in

many ways. Limitations of the traditional approaches taken were highlighted and a

different method was introduced, the Rasch model. The Rasch model has been used to

successfully investigate the quality of the measure of general clinical distress as evaluated

by the SCL-90-R. Of particular interest is the concept of unidimensionality. This is

important for the SASSI-3 because it is purported to be multidimensional. If the SASSI-3

can be found to work as a single "ruler" of substance dependence, then, the instrument

will function in a more efficient, effective, and holistic manner.

Chapter Three

Methods

Overview

Chapter Three presents the research methodology that was used to answer the research

question regarding whether the SASSI-3 adequately measures substance dependence

according to the Thurstonian principles of measurement as analyzed by the Rasch model.

The participants were samples collected from two previous research investigations. The

demographic information of these samples is presented in this chapter. The SASSI-3 will

be reviewed and the procedure by which it was analyzed via the Rasch model will be

outlined.

Research Questions and Correlating Hypotheses

The purpose of this study is to investigate the psychometric properties of the

SASSI-3 using the Rasch model of measurement. The research questions are as follows:

General Research Question 1: Does modern measurement methodology assist in

the revalidation of the SASSI-3?

Research Question 1: Do the items included on the SASSI-3 represent a

unidimensional measure of substance dependence?

Research Hypothesis 1: A Rasch principal components analysis will

produce a unidimensional factor structure that accounts for 60 percent or more of the items'

total variance.

Research Question 2: Do the items included on the SASSI-3 adequately measure

the construct?

Research Hypothesis 2: An analysis of item fit will produce infit and outfit

statistics indicative of low item error.

Research Question 3: Are measures from the SASSI-3 reliable for diagnostic

classification purposes?

Research Hypothesis 3a: Rasch reliability statistics demonstrate acceptable levels

of internal consistency.

Research Hypothesis 3b: The SASSI-3 decision rule scales (as evidenced in the

item-map) will remain reliably defined across independent samples.

Research Question 4: Do the SASSI-3 Rasch analyzed decision rule scales clearly

discriminate between those who are substance dependent and those who are not?

Research Hypothesis 4: The SASSI-3 Rasch analyzed decision rules demonstrate high

discriminatory ability (via high Rasch Person Separation).

General Research Question 2: Does modern measurement theory assist in

improving the SASSI-3 instrument holistically?

Research Question 5: Does the SASSI-3 instrument, as a whole, represent a

unidimensional measure of substance dependence?

Research Hypothesis 5: A Rasch principal components analysis will

produce a unidimensional factor structure that accounts for 60 percent or more of the

whole instrument's total variance.

Research Question 6: Does the whole SASSI-3 adequately measure the substance

dependence construct?

Research Hypothesis 6: An analysis of item fit will produce infit and outfit

statistics indicative of low item error for the SASSI-3 instrument as a whole.

Research Question 7: Is the whole SASSI-3 reliable for diagnostic classification

purposes?

Research Hypothesis 7: The holistic SASSI-3 construct (as evidenced in the

item-map) will remain reliably defined across independent samples.

Research Question 8: Does the whole SASSI-3 instrument demonstrate an ability

to clearly discriminate between those who are substance dependent and those who are

not?

Research Hypothesis 8: The holistic SASSI-3 demonstrates high discriminatory

ability (via high Rasch Person Separation).

Participants

The participants in this study consist of a total of 358 adults drawn from two previous research samples collected in the greater Toledo area (see Laux, Salyers, & Kotova, 2005, for the study involving the first sample). Institutional Review Board approval was granted for the first study, which involved a sample of 230 students; men accounted for 21.2 percent of the sample (n=49) and women for 78.8 percent (n=181). Participants included 165 undergraduate students and 65 graduate students at a large Midwestern university enrolled in social work or counseling courses (mean number

of years in college was 3.5, SD = 2, range = 0-10, median=4). The sample's self-identified ethnicity included 62.6 percent (n=144) European American, 24.8 percent (n=57) African American, 3 percent (n=7) Native American, 2.6 percent (n=6) biracial, 1.7 percent (n=4) Hispanic, .4 percent (n=1) Asian American, and 4.8 percent (n=11) did not report (Laux et al., 2005). The mean age for this sample was 28.1 years (SD=10.4, range=18-59, median=26).

The participants in the second sample included clients involved in a local

community agency and court cooperative program designed to assist in reunifying drug

and alcohol abusing parents with their children. The data was collected by the

professionals involved in the daily administration of the program and provided to the

evaluators of an expansion and enhancement grant awarded by Substance Abuse and

Mental Health Services Administration (SAMHSA). The second sample contained a total of 235 adults: 20.9 percent (n=49) men, 77.0 percent (n=181) women, and 2.1 percent (n=5) who did not report their sex. The sample's self-identified ethnicity included 61.3 percent (n=144) European Americans, 24.3 percent (n=57) African Americans, 3 percent (n=7) Native Americans, 1.7 percent (n=4) Hispanics, 2.6 percent (n=6) biracial, 0.4 percent (n=1) Asian American, and 6.8 percent (n=16) did not report. The mean age for this sample was 28 years (SD=11, range 19-59, median=23).

The samples were combined and then divided, by means of a random numbers table, into two groups. Combining the samples ensured that each group contained individuals whose problems related to substance abuse necessitated some therapeutic intervention. If the SASSI-3 functions as a measure, these samples should

represent a wide range on the substance dependent ruler. The first group was used for the

initial validation of the SASSI-3; the first purpose of this study. The second group was

used to evaluate the SASSI's independence against the first sample; the second purpose

of this study.
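The splitting procedure described above can be sketched as follows. The participant IDs, the even half-split of each source sample, and the seeded random generator (standing in for the study's random numbers table) are illustrative assumptions, not the study's actual procedure.

```python
import random

def split_into_groups(sample_a, sample_b, seed=2009):
    """Combine two source samples and divide them into two mixed groups,
    drawing a random half of each source sample into each group. The seeded
    generator stands in for the study's random numbers table (an assumption)."""
    rng = random.Random(seed)
    group1, group2 = [], []
    for sample in (sample_a, sample_b):
        ids = list(sample)
        rng.shuffle(ids)
        half = len(ids) // 2
        group1.extend(ids[:half])
        group2.extend(ids[half:])
    return group1, group2

# Hypothetical participant IDs standing in for the 230-student and 235-client samples.
students = ["S%d" % i for i in range(230)]
clients = ["C%d" % i for i in range(235)]
group1, group2 = split_into_groups(students, clients)
print(len(group1), len(group2))  # 232 233 under this even half-split assumption
```

Because half of each source sample lands in each group, both groups mix university and community-program participants, which is the property the study relied on.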

Instrument - The Substance Abuse Subtle Screening Inventory-3 (SASSI-3)

The Substance Abuse Subtle Screening Inventory-3 (Miller & Lazowski, 1999)

was developed to identify individuals who had a high probability of being substance

dependent (Miller & Lazowski). The instrument was first published in 1988, revised in

1999, and is now in its third edition.

The SASSI-3 is a paper-and-pencil screening instrument printed on both sides of one page. It is brief, easy to administer and score, and economical. The front consists of 67 true-false items. The back has 26 items with rating scale choices of 0-3 indicating never, once or twice, several times, and repeatedly. The front side includes subtle items, which purportedly inquire indirectly about substance abuse related issues. However, several of these items directly pertain to past alcohol and drug use.

The developers of the SASSI-3 identified ten scales upon which to measure

individuals for the probability of substance dependence (Miller & Lazowski, 1999).

These ten scales include the Face Valid Alcohol scale (FVA), the Face Valid Other Drug

scale (FVOD), the Symptom scale (SYM), the Obvious Attributes scale (OAT), the

Subtle Attributes scale (SAT), the Defensiveness scale (DEF), the Supplemental Addiction Measure scale (SAM), the Family vs. Control Subjects scale (FAM), the Correctional

scale (COR), and the Random Answering Pattern scale (RAP). The FVA and FVOD

scales' items directly question the respondent about his or her alcohol and other drug use.

The SYM scale assesses respondents' symptoms and consequences of drug and alcohol

use. Obvious traits associated with substance use are measured through the OAT scale,

while subtle traits are measured through the SAT scale. The DEF scale is a validity scale

which measures respondents' defensiveness to the SASSI-3's items. The SAM scale is

meant to discriminate between persons whose high DEF scores are due to substance

specific defensiveness from those whose elevated DEF scales are due to some other

source of defensiveness. The FAM scale evaluates the extent to which the respondent focuses on his or her own feelings and thoughts versus the feelings and thoughts of others. The COR scale reports on the similarity of a respondent's scores to those of a

group of persons known to have a history of criminal behavior. Finally, the RAP is a

validity scale which determines whether a respondent was answering in a random pattern.

If a respondent's RAP score is greater than one, then the respondent's screening

may be invalid. Therefore, prior to scoring, the RAP scale should be reviewed. The

scoring procedures include nine decision rules which are used to determine the likelihood

of substance dependence for the respondent. For each of the scoring rules, should the

respondent's scores exceed the cutoff, he or she is considered to be highly likely to be

substance dependent.
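As a rough sketch, the screening logic described above might look like the following. The scale names are from the SASSI-3, but the cutoff values and the any-rule-met classification are placeholders for illustration only, not the published decision rules.

```python
def screen(scores, cutoffs):
    """Hypothetical sketch of decision-rule logic in the style described above:
    the screening is invalid if RAP exceeds 1; otherwise the respondent is
    classified as high probability of substance dependence if any scale score
    meets its cutoff. These cutoffs are placeholders, not the published rules."""
    if scores.get("RAP", 0) > 1:
        return "invalid"
    if any(scores.get(scale, 0) >= cut for scale, cut in cutoffs.items()):
        return "high probability"
    return "low probability"

cutoffs = {"FVA": 15, "FVOD": 12}  # placeholder cutoff values for illustration only
print(screen({"RAP": 2, "FVA": 20}, cutoffs))             # invalid
print(screen({"RAP": 0, "FVA": 20, "FVOD": 3}, cutoffs))  # high probability
print(screen({"RAP": 0, "FVA": 4, "FVOD": 3}, cutoffs))   # low probability
```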

The results of independent investigations of the SASSI's psychometric properties (Arenth, Bogner, Corrigan, & Schmidt, 2001; Clements, 2002; Feldstein & Miller, 2007; Gray, 2001; Laux, Perera-Diltz, Smirnoff, & Salyers, 2005; Laux, Salyers, & Kotova, 2005; Lazowski, Miller, Boye, & Miller, 1998; Svanum & McGrew, 1995; Sweet & Saules, 2003) have been mixed when compared to the results reported by the authors of the SASSI (Miller & Lazowski, 1999). Often the findings have not reflected the high levels of reliability and validity found by the SASSI authors.

The stability of an instrument's results for a given sample is referred to as the

instrument's reliability (Bartholomew, 1996; Mark, 1996; Traub, 1994). The two-week test-retest reliability reported for the SASSI by its authors (Miller & Lazowski, 1999) was 1.0 for the FVA and FVOD scales and between .92 and .97 for the clinical scales. This finding was supported by Laux, Salyers, and Kotova (2005) but challenged by Myerholtz and Rosenberg (1998), who found coefficients of .82 and .89 for the FVA and FVOD scales, respectively. In a four-week test-retest investigation, Myerholtz and Rosenberg (1998) found the FVA and FVOD scales to be .76 and .93, respectively.

With regard to internal consistency, Miller and Lazowski (1999) found that the SASSI

had a .93 coefficient alpha. Although the SASSI was not developed to be a unidimensional instrument, this high coefficient alpha provides evidence of an underlying unidimensional construct of substance dependence. Findings for the face valid scales have been consistent with Miller and Lazowski. However, Clements (2002) obtained a coefficient alpha of only .49 for the instrument, indicating that it only moderately represented a single construct.

An instrument is valid if it measures what it purports to measure. Validity is evaluated in several ways, including content, construct, and criterion-referenced approaches. Lazowski, Miller, Boye, and Miller (1998) found that people who score high on the SASSI also score high on other instruments measuring the same construct, such as the MAST and the MMPI-2 Addiction Potential Scale. Likewise, people who scored low

on the SASSI also scored low on similar instruments. Independent researchers'

comparisons of the SASSI to other instruments produced mixed results. For example, the

overall classification agreement between the SASSI and the CAGE was .49

(Laux, Salyers, & Kotova, 2005) and .61 (Myerholtz & Rosenberg, 1998). When the

SASSI was compared to a modified CAGE, the agreement rate dropped to .58 (Myerholtz

& Rosenberg). Agreement between the SASSI and the MAC was lower still at .22 (Myerholtz & Rosenberg), although another study found a higher agreement rate of .52 (Laux, Salyers, & Kotova). When the SASSI was compared to the MAC-R, however, the agreement was again lower at .29 (Laux et al., 2005).

Based on an exploratory factor analysis, the authors of the SASSI (Miller &

Lazowski, 1999) identified a ten factor solution; however, the only other study to

investigate the factor structure of the SASSI was unable to replicate this finding (Gray,

2001). Gray's data factor analysis identified a two factor solution comprised of mostly

the FVA and FVOD items, which accounted for 53 percent of the SASSI-3's total

variance. Two studies have also confirmed the factor structure of the FVA and FVOD

scales, respectively (Laux, Salyers, & Kotova, 2005; Laux, Perera-Diltz, Smirnoff, &

Salyers, 2005).

Using a substance dependence diagnosis provided by a licensed mental health

professional using the criteria from the Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition, Text Revision (DSM-IV-TR; American Psychiatric Association, 2000) as the 'gold standard' criterion, Lazowski, Miller, Boye, and Miller (1998) reported an accuracy rate for the SASSI of 97 percent, sensitivity of 97 percent, and specificity of 95 percent. A year later, in the SASSI-3 Manual, Miller and Lazowski (1999) reported a lower but still acceptable accuracy rate for the SASSI to be 93 percent, sensitivity to be 93.3 percent,

and specificity to be 94.2 percent. However, the results from independent researchers

have again been mixed. Using the same gold standard, Svanum and McGrew (1995)

found the sensitivity to be 33 percent and specificity to be 87 percent for their college

student sample. Five years later, using an incarcerated population, the results improved

with an overall accuracy rate of 69.4 percent, sensitivity of 73.3 percent, and specificity

of 62.2 percent. Using a traumatic brain injury sample, the overall accuracy rate was

again lower than that found by the SASSI authors at 69.2 percent, sensitivity of 70.8

percent and specificity of 68.5 percent (Arenth et al., 2001). Finally, Clements (2002) also found lower sensitivity and specificity ratings of 65 and 89 percent, respectively.

Variable

When investigating the psychometric properties of an instrument, particularly

whether an instrument is an accurate measure of a construct, it is imperative to define the

variable being investigated. For this study, the variable being evaluated is substance

dependence (Miller & Lazowski, 1999). The SASSI-3 is intended to discriminate

between those who are likely to be substance dependent from those who are not (Miller

& Lazowski). The authors of the SASSI-3 also asserted that it was not their intention to

develop a unidimensional instrument. However, Myerholtz and Rosenberg (1998)

comment that the scales measuring homogeneous traits have higher internal

consistencies. The high coefficient alpha findings identified by several authors have provided evidence for the unidimensionality of certain SASSI-3 subscales (Clements, 2002; Laux, Salyers, & Kotova, 2005; Laux, Perera-Diltz, Smirnoff, & Salyers, 2005; Miller & Lazowski). Independent evidence of unidimensionality has been reported for the FVA

and FVOD scales. The FVA, FVOD, SAT, OAT, and SYM scales and the SAM scale have been

identified as measuring substance dependence (Miller & Lazowski, 1999). However,

constructs other than substance dependence, such as validity and additional clinical

issues, are being measured with the FAM, DEF, COR and RAP scales (Miller &

Lazowski, 1999).

Procedures

One of the many advantages of using the Rasch model is that the outputs from the

analysis are in the form of easy to read "pictures". The pictures are graphs and charts

which demonstrate visually the response scales and the "ruler" upon which the items and

people can be aligned. The pictures will be described below as they apply to each step in

the procedure. The following method includes the steps used to evaluate the SASSI-3's

measurement properties.

Steps in conducting a Rasch Analysis. This study will follow the process of Rasch

analysis using the example set by Elliot et al. (2006). When conducting a Rasch analysis,

at each step described below, the person and item separations and reliabilities will be

reviewed for changes and improvements as a guide to determine whether the change was

effective.

Step one - Response validation. The purpose of exploring the response validity

first is to establish whether the participants are using the response scales as intended by

the authors of the SASSI-3 (Elliot et al., 2006). In addition, response validation is the first step in determining whether the items function unidimensionally (Elliot et al.). There are

two ways the response options will be validated. The first is by visually reviewing the

probability curves. Each response option should have over .50 probability of being

chosen. The second is by examining the thresholds. Each response option (1 to 2, or, 2 to

3, etc.) threshold should be between 1.4 and 5 units in distance from the next response

option. If the threshold advance is less than 1.4 or greater than 5 and the probability of a category being chosen is less than .50, then it is recommended that the response options be revised.
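The two response-validation checks can be expressed as a simple routine. The peak probabilities and threshold values below are hypothetical numbers chosen to illustrate the .50 and 1.4-5 logit criteria.

```python
def validate_categories(peak_probabilities, thresholds):
    """Flag rating scale categories that violate the two validation criteria
    described above: each category's probability curve should peak at .50 or
    higher, and each threshold should sit 1.4 to 5 logits above the previous one."""
    problems = []
    for category, peak in peak_probabilities.items():
        if peak < 0.50:
            problems.append("category %s: peak probability %.2f < .50" % (category, peak))
    for i in range(1, len(thresholds)):
        advance = thresholds[i] - thresholds[i - 1]
        if not 1.4 <= advance <= 5.0:
            problems.append("threshold advance %.2f outside 1.4-5 logits" % advance)
    return problems

# Hypothetical values for a 0-3 rating scale.
peaks = {0: 0.95, 1: 0.40, 2: 0.60, 3: 0.90}
thresholds = [-2.0, -1.5, 1.0]  # Andrich thresholds between adjacent categories
print(validate_categories(peaks, thresholds))  # flags category 1 and the 0.50 advance
```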

Step two - Item fit analysis. Item fit analysis is a form of construct validation and

a test of unidimensionality. Using a z-standardized fit cutoff of 2.0, any item over this value, or any item with a negative point-biserial correlation, is likely either redundant or measuring a construct other than the one intended. As such, items failing to meet

these standards should be either eliminated or revised to increase the differences of

meaning between these points. The fit analysis will also be conducted for people in the

sample using the same statistical standards.
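A minimal sketch of the fit screen described above, using hypothetical fit statistics:

```python
def flag_misfits(fit_stats):
    """Return the names whose fit statistics fail the standards described
    above: a z-standardized fit value over 2.0 or a negative point-biserial.
    The same screen applies to items and to persons."""
    return [name for name, (zstd, point_biserial) in fit_stats.items()
            if zstd > 2.0 or point_biserial < 0]

# Hypothetical statistics: name -> (z-standardized fit, point-biserial correlation).
item_stats = {
    "Item1": (0.8, 0.45),
    "Item2": (2.6, 0.30),   # z-standardized value over 2.0
    "Item3": (1.1, -0.05),  # negative point-biserial
}
print(flag_misfits(item_stats))  # ['Item2', 'Item3']
```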

Step three - Construct analysis. Construct analysis is a test for the Thurstonian

concept of linearity. Linearity means that a hierarchy will be established by a person-item

map output. One way to conceptualize the Rasch construct analysis output is to picture a flagpole as the variable of substance dependence, running from less (low on the pole) to more (high on the pole). Flags on the left of the pole are items, arranged from difficult

to endorse at the top to easy to endorse at the bottom of the pole. People are arranged on

the right of the pole from possessing more substance dependence at the top to less

substance dependence at the bottom of the pole. The linear measure construct item map is

the variable of substance dependence extrapolated from the instrument. In this way one

can see the degrees of separation along the variable and where the separations are.

Step four - Assess for unidimensionality. The primary way to evaluate the

unidimensionality of an instrument is through Rasch principal components analysis

(RPCA). RPCA is conceptually similar to the correlation matrix that is developed

through principal components exploratory factor analysis (Stevens, 1996). These

procedures differ, however, in that the RPCA approach not only provides first order

factor results, but additionally provides the researcher with evidence of the presence of

unsuspected secondary variables, if they exist (Bond & Fox, 2007). If the variance explained by the RPCA is over 60 percent and the remaining residual contrasts each explain no more than five percent of the variance, then a researcher can conclude that the instrument is unidimensional.
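The RPCA decision rule reduces to two checks, sketched here with hypothetical variance figures:

```python
def is_unidimensional(variance_explained, residual_contrasts):
    """Apply the RPCA rule described above: the measures should explain at
    least 60 percent of total variance, and no residual contrast should
    explain more than five percent."""
    return variance_explained >= 60.0 and all(c <= 5.0 for c in residual_contrasts)

print(is_unidimensional(91.9, [3.2, 1.8]))   # True  (hypothetical figures)
print(is_unidimensional(53.0, [12.4, 6.1]))  # False (hypothetical figures)
```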

Step five - Assess for measure independence. To evaluate an instrument's

independence, the instrument must be compared to a second comparable sample. A

comparable sample is one in which the researcher would expect to find a wide range of

the construct being measured. For example, if one was interested in measuring the

construct of intelligence, the researcher would generally need a sample that included

persons of low, average, and above average intelligence so as to be able to determine

whether or not the instrument included items at all points along this continuum. In this

study an appropriate and comparable sample would be composed of persons whose use of

substances ranges from none at all to those whose use has progressed to the point where

they are experiencing significant consequences in their lives. Tests of independence will

inform the researcher of whether the meaning of the instrument and the item hierarchy,

ranging from easy to difficult, remains consistent. The resulting person-item map from

the first group is visually compared against the second group's person-item map. These

maps are evaluated for consistency by observing the arrangement of items. That is, if the

items fall in relatively the same point along the hierarchy difficulty continuum for both

groups, then the researcher can conclude that the measure is sample independent.
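One way to quantify the visual comparison of the two person-item maps is to correlate the item difficulty orderings across groups. The Spearman formula and the logit difficulties below are illustrative, not part of the study's actual procedure.

```python
def rank_agreement(difficulties_1, difficulties_2):
    """Spearman rank correlation between two groups' item difficulty
    hierarchies; values near 1.0 indicate the item ordering, from easy to
    difficult, is consistent across samples."""
    items = sorted(difficulties_1)
    def ranks(difficulties):
        ordered = sorted(items, key=lambda item: difficulties[item])
        return {item: rank for rank, item in enumerate(ordered)}
    r1, r2 = ranks(difficulties_1), ranks(difficulties_2)
    n = len(items)
    d_squared = sum((r1[item] - r2[item]) ** 2 for item in items)
    return 1 - 6 * d_squared / (n * (n * n - 1))

# Hypothetical logit difficulties for three items in two groups.
group1_difficulties = {"ItemA": -1.2, "ItemB": 0.3, "ItemC": 1.8}
group2_difficulties = {"ItemA": -0.9, "ItemB": 0.5, "ItemC": 2.1}
print(rank_agreement(group1_difficulties, group2_difficulties))  # 1.0
```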

The SASSI-3's demarcation between individuals with substance dependence and those without (specific research questions six and seven) was identified through the use of the person-item map. The individuals in the second group were coded according to the traditional SASSI-3 decision rule (high probability vs. low probability of substance dependence). Using the person-item map, these people were evaluated to see where they fell on the substance dependence construct.

Limitations

Broadly, the limitations of this study are associated with self-report instruments and with criticisms of the Rasch model. Both are discussed below.

One limitation of this study is that the SASSI-3 is a self-report instrument. Self-report, generally speaking, is one of the easiest ways to collect information on a construct of interest. However, respondents often answer items in socially desirable ways (Donaldson &

Grant-Vallone, 2002). That is, sometimes people answer self-report questions in a

manner that artificially minimizes (faking-good) or maximizes (faking bad) the severity

of their presenting issues. Such response styles may be a particular concern regarding

substance dependence screening due to possible secondary gains from results that are

positive or negative. Although the SASSI-3's authors purport to have limited the impact

of faking on the SASSI-3 by incorporating scales that discriminate between respondents

faking good or bad, initial evidence suggests that, when instructed to do so, college

students can fake-good and fake-bad on this instrument (Burck, Laux, Harper, & Ritchie,

2009). As such, self-report may be a potential limitation for this study in that the

objectivity of the SASSI-3 has not been established.

A second limitation is the utilization of the Rasch model. Despite its multiple uses

and strong reputation for instrument validation, some do offer critiques of the Rasch model. These critics state that Rasch analysis is not a theory-building method, as factor analysis is, and that the Rasch model's theory is too simplistic (Bond & Fox, 2007).

According to the Rasch model, the theory drives the development of the instrument. This

principle is contradictory to exploratory factor analysis, which is employed to facilitate

the process of theory building. If a researcher is interested in exploring multidimensions,

it is ineffective to utilize Rasch analysis as the Rasch model only works for

unidimensional instruments (Kubinger, 2005). Since the SASSI was developed using the

Diagnostic and Statistical Manual (APA, 2000) criteria and was based on the

understanding of substance dependence, the instrument is already grounded in theory. In

addition, according to the SASSI-3 Manual (Miller & Lazowski, 1999), the SASSI-3

measures the probability of substance dependence. While it was not developed to be a

unidimensional measure, as seen above in the reliability studies, the instrument or some

of its scales appear to meet some characteristics of unidimensionality.

Another criticism involves the SASSI-3's scoring procedures. When two individuals' raw scores are compared, a researcher may report a person's ability in an invalid manner (Kubinger, 2005). This can happen when two people have the same raw total score, yet one person (person A) correctly answered the ten easiest questions on a 25-question test while the second person (person B) correctly answered the ten hardest questions on the same test. Both raw scores equal ten, yet person B was able to answer more difficult questions than person A. Therefore, the raw scores may not carry the same meaning. One way researchers using the Rasch model can address this problem is by carefully examining response patterns, not just total scores, prior to publishing or utilizing their findings.
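Rasch fit statistics address exactly this situation: person A and person B earn the same raw score, but person B's pattern of answering only the hardest items is improbable under the model and produces a large outfit mean-square. The item difficulties, ability value, and 25-item test below are hypothetical.

```python
import math

def p_endorse(theta, b):
    """Rasch model probability that a person of ability theta answers an item
    of difficulty b correctly (both in logits)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def outfit(theta, responses, difficulties):
    """Outfit mean-square: the average standardized squared residual. Values
    well above 1.0 flag improbable response patterns."""
    residuals = []
    for x, b in zip(responses, difficulties):
        p = p_endorse(theta, b)
        residuals.append((x - p) ** 2 / (p * (1 - p)))
    return sum(residuals) / len(residuals)

# A hypothetical 25-item test with difficulties spread from -3 to +3 logits.
difficulties = [-3 + 6 * i / 24 for i in range(25)]
person_a = [1] * 10 + [0] * 15  # correct on the ten easiest items
person_b = [0] * 15 + [1] * 10  # correct on the ten hardest items
theta = -0.6  # illustrative ability estimate for a raw score of ten

print(round(outfit(theta, person_a, difficulties), 2))  # well below 1: orderly pattern
print(round(outfit(theta, person_b, difficulties), 2))  # far above 1: improbable pattern
```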

In Rasch analysis, the data must fit the model, as opposed to factor analysis, in which researchers can adjust the model to fit the data. An additional limitation, then, may be that the data do not fit the model, which would mean that the instrument is not a measure of substance dependence. This limitation will be unknown until after the analysis.

Finally, as Bond and Fox (2007) point out, "critics argue that we can't physically

align bits of the human psyche together to produce measures, as we can with centimeters

to make meters" (p. 6). This means that a string of substance dependence units cannot be

added together to distinguish between someone who is or is not substance dependent.

Chapter Four

Results

This chapter presents the results of the Rasch analysis on the archived data from a

study of the SASSI-3 on two samples of adults. The samples were a combination of two

samples from previous research. One sample was taken from a study involving a

community cooperative program with child protective services and family court, and the

other sample was taken from a study involving adults from a large metropolitan

university. Using a random numbers table, each sample was divided in two, and one half of each was combined with one half of the other to create two mixed groups. Each group therefore included some participants from the community cooperative program study with child protective services and family court and some participants from the university study. These groups were referred to as Group 1 and Group 2 throughout the course of the study. Group 1 consisted of 174

participants; men accounted for 23.6 percent (n=41) of the sample and women for 76.4 percent (n=133). This sample's self-identified ethnicity included 58 percent (n=101) European American, 26.4 percent (n=46) African American, 1.7 percent (n=3) Native American, 2.3 percent (n=4) biracial, and 2.3 percent (n=4) Hispanic. Group 2

consisted of 175 participants; men accounted for 20 percent (n=35) of the sample, women for 79.4 percent (n=139), and one person did not report his or her sex. This sample's self-identified ethnicity included 63.4 percent (n=111) European American, 22.3 percent (n=39) African American, 2.3 percent (n=4) Native American, 2.3 percent (n=4) biracial, 3.4 percent (n=6) Hispanic, and 1.1 percent (n=2) Asian American.

Face Valid Alcohol Scale (FVA)

The initial person and item separation and reliabilities for the FVA scale were 2.51/.81 and 7.47/.98, respectively. The minimum acceptable standard for separation is

2.0 (Wright & Masters, 1982). A separation of 2.0 translates statistically into 3 strata.

This means that the FVA scale's reliability is excellent and its ability to distinguish differences among the people is good. In this case, the FVA can be said to be a linear

construct which accurately measures the sample. However, in an effort to determine

whether improvements could be made, the researcher conducted analysis of the FVA

scale's response options, items, and underlying factor structure. Step one of the Rasch

analysis involved evaluating how the respondents were using the response options. Each of the FVA scale's twelve items includes four response options with which respondents can indicate the frequency with which they engage in the item's behaviors. These response options and their corresponding point values are: 0-Never, 1-Once or Twice, 2-Several Times, and 3-Repeatedly. Examination of the probability curves and the thresholds (Figure 1) revealed that the respondents used all response options as expected by the authors of the SASSI-3 except for the option 1-Once or Twice.


Figure 1

Response Option 0123 Output for Face Valid Alcohol Group 1

[Winsteps output: summary of category structure (Model="R") and category probability curves for 174 persons, 12 items, and 4 categories. Structure calibrations: category 1 = -8.50, category 2 = -4.94, category 3 = 13.44. The probability curve for category 1-Once or Twice peaks below .50.]
The 1-Once or Twice response option failed to reach the .50 probability curve cutoff.

This failure suggested that the sample did not reliably distinguish between option 1-Once

or Twice and the next adjacent category, 2-Several times. However, because all of the

other response options appeared to work as intended and because no improvements were

found in the person and item separation and reliabilities after collapsing strategies were

attempted, no changes to the response scale were made at this time (see Table 2 for

collapsing strategy options).
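Each collapsing strategy in Table 2 is simply a recoding of the raw response categories before re-running the analysis; a minimal sketch:

```python
def collapse(responses, recode):
    """Recode raw 0-3 responses under a collapsing strategy, e.g.
    recode (0, 1, 1, 2) merges categories 1 and 2 into a single category."""
    return [recode[r] for r in responses]

raw = [0, 1, 2, 3, 2, 0]  # hypothetical responses on the 0-3 scale
print(collapse(raw, (0, 1, 1, 2)))  # [0, 1, 1, 2, 1, 0]
print(collapse(raw, (0, 1, 2, 2)))  # [0, 1, 2, 2, 2, 0]
```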

Table 2

Summary of Collapsing Strategy for FVA Group 1 Response Options

Rating Scale   Probability Curve (1)            Threshold (2)                  PS&R       IS&R       RPCA

0,1,2,3        0=0.95, 1=0.40, 2=0.60, 3=0.90   0-1=N/A, 1-2=5.41, 2-3=22.67   2.65/.87   7.81/.98   94.8%

0,1,1,2        0=0.95, 1=0.80, 2=0.95           0-1=N/A, 1-2=24.00             2.48/.86   7.50/.98   95.3%

0,0,1,2        0=0.95, 1=0.80, 2=0.95           0-1=N/A, 1-2=44.00             1.99/.80   5.56/.97   92.4%

0,1,2,2        0=0.95, 1=0.40, 2=0.95           0-1=N/A, 1-2=6.76              2.40/.85   7.99/.98   97.3%

Note. (1) ≥ .5 is acceptable. (2) ≥ 1.4 is acceptable. PS&R = Person Separation & Reliability. IS&R = Item Separation & Reliability. RPCA = Rasch principal components analysis.

In an effort to further improve the FVA's separation and reliability results, the researcher inspected the FVA items and respondents for fit. Items and people are considered to fit if the z-standardized score is less than 2.0 and the point-biserial correlation is not negative. Items or people outside of these cutoffs are considered misfits and should be considered for possible elimination. This inspection led to a final iterative elimination of twelve people. No items failed to meet the standards set forth for item fit. The elimination of misfitting people resulted in final person and item separations and reliabilities of 2.65/.87 for persons and 7.81/.98 for items. These separations and reliabilities are improvements over the initial findings and suggest a well-defined linear construct that accurately measures the people. The FVA scale is divided into ten levels of difficulty and discriminates among nearly four groups of people ranging from low to high agreeability on the items.
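The translation from separation indices to "ten levels of difficulty" and "nearly four groups of people" follows the strata formula of Wright and Masters (1982):

```python
def strata(separation):
    """Number of statistically distinct levels implied by a separation index,
    using the strata formula H = (4G + 1) / 3 (Wright & Masters, 1982)."""
    return (4 * separation + 1) / 3

print(round(strata(2.0), 2))   # 3.0: the minimum standard yields three strata
print(round(strata(2.65), 2))  # 3.87: nearly four distinguishable groups of people
print(round(strata(7.81), 2))  # 10.75: roughly ten levels of item difficulty
```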

The third step in the analysis involved a review of the person-item map (Figure 2)

to explore the extrapolated construct.

Figure 2

Item Map FVA Group 1

[FVA Group 1 item map (174 persons, 12 items, 4 categories). Items are listed from most difficult to endorse (top) to least difficult to endorse (bottom); items on the same line sit at the same level of difficulty:

FVA12-commit suicide
FVA9-effects recur
FVA11-Nervous/shakes
FVA6-trouble; FVA10-relationship
FVA8-argued
FVA3-energy; FVA7-depressed
FVA1-lunch
FVA2-feelings
FVA5-physical probs
FVA4-intended]
The items formed a hierarchy from most difficult to endorse to least difficult to endorse. When two items are aligned at the same place on the hierarchy, the items are either theoretically redundant or simply at the same level of difficulty. Overfitting items can be eliminated if the infit mean-square is below 0.6 and the z-standardized score is -2.0 or less. Although FVA3 and FVA7 are aligned at the same level of difficulty and appear to measure the same theoretical content, both items were retained because neither misfit. Group 1's FVA hierarchy is displayed visually in Figure 2. The initial Rasch principal components analysis (RPCA) indicated that 91.9 percent of the total variance was

explained by the instrument. With the elimination of twelve misfitting people the RPCA

increased to 95.1 percent of the total variance having been explained by the instrument,

which demonstrated improvement in the FVA. However, the person and item means on the map were separated by nearly a standard deviation, indicating that the items were, on the whole, more difficult to endorse than the people were willing to agree to them. This is analogous to a spelling bee in which third-grade spellers are given tenth-grade words: the words are more difficult than the students are able to spell.

In the final step of the analysis, the extrapolated variable was compared against the

variable constructed from the data of a second comparable group, using the same process.

The FVA scale, using Group 2 data, demonstrated person and item separation and

reliability results similar to those produced in the analyses of the first data set. While no

changes were needed to the response options, option 1-Once or Twice, as was reported in

the analysis of the first data set, only met the probability curve at 0.4 (see Figure 3 for the

response curve and Table 3 for a summary of the collapsing strategy).
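The category probability curves plotted in Figure 3 follow the rating scale (Andrich) model, in which the probability of each response category depends on the person-minus-item measure and the step calibrations. The sketch below is illustrative of the mechanics only, not the Winsteps implementation; the three step calibrations are the ones reported in Figure 3, which appear to be in a rescaled metric, so the resulting probabilities are purely demonstrative:

```python
import math

def category_probs(person_minus_item, thresholds):
    """Rating scale model: P(category k) is proportional to
    exp(sum over j<=k of (measure - threshold_j)), with category 0
    as the zero-logit baseline."""
    logits = [0.0]  # log-numerator for category 0
    running = 0.0
    for tau in thresholds:
        running += person_minus_item - tau
        logits.append(running)
    expd = [math.exp(v) for v in logits]
    total = sum(expd)
    return [e / total for e in expd]

# Step calibrations reported in Figure 3 for the four FVA categories:
probs = category_probs(-11.6, [-16.78, -5.79, 22.57])
print([round(p, 3) for p in probs])
```

At a measure near a category's reported position, that category's curve dominates; the collapsing question is whether any category ever reaches an acceptably high peak.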

Figure 3

Response Options Curve 0123 for FVA Group 2


INPUT: 173 Persons, 12 Items; MEASURED: 149 Persons, 12 Items, 4 CATS
SUMMARY OF CATEGORY STRUCTURE. Model = "R"

Category  Count   %   Obsvd Avg  Expect  Infit  Outfit  Structure   Category
Label                                    MNSQ   MNSQ    Calibratn   Measure
0           934  59     -43.4    -43.0    1.05   1.11       NONE    (-29.72)
1           306  19     -15.1    -16.9     .97    .62     -16.78      -11.60
2           238  15       .18      .85    1.00   1.14      -5.79        9.09
3           104   7     32.26    32.94    1.15   1.06      22.57    ( 33.99)
MISSING       2   0    -38.61

OBSERVED AVERAGE is the mean of the measures in each category; it is not a parameter estimate.

Category  Structure          Score-to-Measure         50% Cum.   Coherence    Estim.
Label     Measure    S.E.    At Cat.    Zone           Probablty  M->C  C->M  Discr
0            NONE            (-29.72)   -INF , -21.65              90%   87%
1          -16.78     .77     -11.60   -21.65,  -2.53    -19.15    47%   60%   1.11
2           -5.79     .97       9.09    -2.53,  24.02     -4.00    58%   50%    .82
3           22.57    1.57    ( 33.99)   24.02,  +INF      23.12    80%   61%    .94

M->C = Does Measure imply Category?  C->M = Does Category imply Measure?

[Category probability curves (modes at the structure-measure intersections) omitted:
along the Person-minus-Item measure axis (-40 to 50), the curve for category 1
peaks below .5 probability.]
Table 3

Summary of Collapsing Strategy for FVA Group 2 Response Options

Rating    Probability
Scale     Curve¹         Threshold²     PS & R      IS & R      RPCA

0,1,2,3   0 = 0.90       0-1 = N/A      2.78/.89    7.76/.98    98.6%
          1 = 0.45       1-2 = 15.15
          2 = 0.65       2-3 = 19.12
          3 = 0.95

0,1,1,2   0 = 0.90       0-1 = N/A      2.71/.88    7.50/.98    99.4%
          1 = 0.90       1-2 = 57.32
          2 = 0.90

0,0,1,2   0 = 0.95       0-1 = N/A      2.19/.83    5.52/.97    99.7%
          1 = 0.60       1-2 = 29.38
          2 = 0.95

0,1,2,2   0 = 0.90       0-1 = N/A      2.38/.85    8.45/.98    97.9%
          1 = 0.50       1-2 = 11.88
          2 = 0.90

Note. ¹ ≥ .5 is acceptable. ² ≥ 1.4 is acceptable. PS & R = Person Separation & Reliability. IS & R = Item Separation &

Reliability. RPCA = Rasch principal components analysis.
After the iterative elimination of 23 misfitting people, the final person and item

separation and reliability findings increased to 2.78/.89 and 7.76/.98, respectively. No

items failed to meet the cutoff for item fit; therefore, none were eliminated. Two items

(FVA1 and FVA2) were aligned at the same place on the hierarchy, but neither met the

statistical standards for item overfit. They therefore measure unique qualities, and neither

could be eliminated. The final RPCA for the scale was also comparable, with 98.6

percent of the variance accounted for by the items. As was presented for the FVA

for Group 1, the hierarchy of FVA item endorsement difficulty for Group 2 is

presented in Figure 4.

Figure 4

Item Map FVA Group 2


INPUT: 173 Persons, 12 Items; MEASURED: 149 Persons, 12 Items, 3 CATS

[Winsteps person-item map; each '#' represents 2 persons. From most difficult
(rare) to least difficult (frequent) to endorse, the items fall in this order:
FVA12-commit suicide; FVA11-nervous/shakes; FVA9-effects recur;
FVA10-relationship; FVA6-trouble; FVA8-argued; FVA7-depressed; FVA3-energy;
FVA1-lunch and FVA2-feelings (aligned); FVA5-physical probs; FVA4-intended.]
A side-by-side comparison of the Groups' respective item-endorsement difficulty,

presented in Table 4, indicated that ten of the twelve items remained constant on the

hierarchy across Groups. Of the three items found by both Groups to be the most difficult

to endorse, the second and third most difficult were interchanged. Again, the items

fell into ten levels of difficulty, and the scale distinguished nearly four groups of people

within the sample from low to high on the variable.

Table 4

FVA Test of Independence

Item Hierarchy

Group 1 Group 2

Difficult to endorse

Item number Item Item number Item

FVA12 Suicide FVA12 Suicide

FVA9 Drinking effects FVA11 Shakes after sobering up

FVA11 Shakes after sobering up FVA9 Drinking effects

FVA6 Trouble at work, school FVA10 Relationship problems

FVA10 Relationship problems FVA6 Trouble at work, school

FVA8 Argued w/ friends/family FVA8 Argued w/ friends/family

FVA7 Depressed after sobering FVA7 Depressed after sobering

FVA3 Drink for energy FVA3 Drink for energy

FVA1 Midday drinks FVA1 Midday drinks

FVA2 To express feelings FVA2 To express feelings

FVA5 Physical problems FVA5 Physical problems

FVA4 More than intended FVA4 More than intended

Easy to endorse

Face Valid Other Drug Scale (FVOD)

The initial person and item separation and reliabilities for the FVOD scale were

2.50/.86 and 4.69/.96, respectively. Additionally, the initial RPCA was 83.9 percent.

Viewed collectively, these findings indicate that the FVOD is a linear construct that

accurately measures the sample. However, to see whether improvements were

possible, the response options, items, and underlying factor structure were explored.

At each step, the person and item separation and reliability were reviewed as a way to

evaluate the changes made to the instrument.

Like the FVA scale, the FVOD scale has the same four response options: 0-

Never, 1-Once or Twice, 2-Several Times, 3-Repeatedly. Unlike the FVA scale's,

however, the FVOD response scale did not function as well. Inspection of the probability

curve and thresholds (Figure 5) indicated that options 1-Once or Twice and 2-Several

Times did not meet the acceptable standards.

Figure 5

Response Option Curve 0123 FVOD Group 1


INPUT: 174 Persons, 14 Items; MEASURED: 174 Persons, 14 Items, 4 CATS
SUMMARY OF CATEGORY STRUCTURE. Model = "R"

Category  Count   %   Obsvd Avg  Expect  Infit  Outfit  Structure   Category
Label                                    MNSQ   MNSQ    Calibratn   Measure
0           698  45     -15.0    -15.2    1.14   1.20       NONE    (-16.26)
1           259  17     -6.42    -5.69     .91    .95       -.41       -4.39
2           234  15      1.85     2.04     .94   1.12       -.70        4.26
3           369  24      9.56     9.33     .94    .86       1.10    ( 16.43)
MISSING       8   1     -3.46

OBSERVED AVERAGE is the mean of the measures in each category; it is not a parameter estimate.

Category  Structure          Score-to-Measure         50% Cum.   Coherence    Estim.
Label     Measure    S.E.    At Cat.    Zone           Probablty  M->C  C->M  Discr
0            NONE            (-16.26)   -INF , -10.44              82%   65%
1            -.41     .69      -4.39   -10.44,   -.07     -6.44    27%   47%    .94
2            -.70     .74       4.26     -.07,  10.43      -.18    30%   47%   1.04
3            1.10     .78    ( 16.43)   10.43,  +INF       6.55    78%   44%   1.07

M->C = Does Measure imply Category?  C->M = Does Category imply Measure?

[Category probability curves (modes at the structure-measure intersections) omitted:
along the Person-minus-Item measure axis (-30 to 30), the curves for categories 1
and 2 never rise above roughly .2 probability and are overtaken by categories 0
and 3 throughout.]

Specifically, the calibration thresholds reflected the respondents' misuse of response

options 1 and 2, both of which fell below 0.2 on the probability curve. Respondents

were not reliably distinguishing between option 1-Once or Twice and 2-Several Times.

Considering the logic behind the options and the statistical evidence provided by the

thresholds, the researcher decided on a collapsing strategy that combined the middle

two response categories in an effort to improve the FVOD's functioning. That is, the

researcher reanalyzed the data with three response options: 0; 1 and 2 combined; and 3.
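The collapsing strategies compared in this analysis amount to recoding the raw 0-3 responses before re-running the Rasch analysis. A minimal sketch of the 0,1,1,2 recode used here (the variable and function names are illustrative):

```python
# Map raw SASSI-3 face-valid responses (0-Never, 1-Once or Twice,
# 2-Several Times, 3-Repeatedly) onto the collapsed 0,1,1,2 scheme:
# the two middle categories merge into a single middle category.
COLLAPSE_0112 = {0: 0, 1: 1, 2: 1, 3: 2}

def collapse(responses, scheme=COLLAPSE_0112):
    """Recode a list of raw responses; None marks missing data."""
    return [scheme[r] if r is not None else None for r in responses]

print(collapse([0, 1, 2, 3, 2, None]))  # -> [0, 1, 1, 2, 1, None]
```

The other strategies tabled (0,1,1,2 vs. 0,0,1,2 vs. 0,1,2,2) differ only in which adjacent categories share a code in the mapping dictionary.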

This change produced an improvement in the person separation and reliability and a

minor decrease in the item separation and reliability, with scores of 2.61/.87 and 4.43/.95,

respectively. The RPCA conducted after response options 1 and 2 were combined

decreased from 83.9 percent to 76.8 percent of the total variance accounted for. Despite

this decline, the final value was still above the minimum accepted standard. Further

evaluation demonstrated that the three-option response scale worked better in

this model: despite the decrease in item separation and reliability and RPCA, every

response option exceeded the statistical cutoffs requiring at least .50 on the probability

curve and a threshold more than 1.4 units in distance from the next response option

(see Figure 6).

Figure 6

Corrected Response Option Curve 0112 FVOD Group 1

INPUT: 174 Persons, 14 Items; MEASURED: 174 Persons, 14 Items, 3 CATS
SUMMARY OF CATEGORY STRUCTURE. Model = "R"

Category  Count   %   Obsvd Avg  Expect  Infit  Outfit  Structure   Category
Label                                    MNSQ   MNSQ    Calibratn   Measure
0           698  45    -20.25    -20.4    1.10   1.14       NONE    (-20.35)
1           493  31     -3.55    -2.92     .98   1.11      -8.09         .00
2           369  24     13.58    13.13     .91    .85       8.09    ( 20.35)
MISSING       8   1     -5.65

OBSERVED AVERAGE is the mean of the measures in each category; it is not a parameter estimate.

Category  Structure          Score-to-Measure         50% Cum.   Coherence    Estim.
Label     Measure    S.E.    At Cat.    Zone           Probablty  M->C  C->M  Discr
0            NONE            (-20.35)   -INF , -11.59              80%   73%
1           -8.09     .69        .00   -11.59,  11.59     -9.65    48%   63%    .95
2            8.09     .79    ( 20.35)   11.59,  +INF       9.65    75%   54%   1.08

M->C = Does Measure imply Category?  C->M = Does Category imply Measure?

[Category probability curves (modes at the structure-measure intersections) omitted:
with the collapsed three-category scale, the middle category now peaks at about .5
probability midway along the Person-minus-Item axis (-30 to 30), with clear
crossovers to categories 0 and 2 on either side.]

No other collapsing strategy met the threshold and probability statistical standards for

response scales (see Table 5).

Table 5

Summary of Collapsing Strategy for FVOD Group 1 Response Options

Rating    Probability
Scale     Curve¹         Threshold²     PS & R      IS & R      RPCA

0,1,2,3   0 = 0.95       0-1 = N/A      2.68/.88    5.05/.96    88.3%
          1 = 0.30       1-2 = 1.27
          2 = 0.35       2-3 = 4.36
          3 = 0.95

0,1,1,2   0 = 0.95       0-1 = N/A      2.78/.89    4.77/.96    84.3%
          1 = 0.60       1-2 = 20.46
          2 = 0.95

0,0,1,2   0 = 0.95       0-1 = N/A      2.27/.84    4.11/.94    82.5%
          1 = 0.35       1-2 = 0.02
          2 = 0.95

0,1,2,2   0 = 0.95       0-1 = N/A      2.56/.87    5.24/.96    88.9%
          1 = 0.30       1-2 = 2.16
          2 = 0.95

Note. ¹ ≥ .5 is acceptable. ² ≥ 1.4 is acceptable. PS & R = Person Separation & Reliability. IS & R = Item Separation &

Reliability. RPCA = Rasch principal components analysis.
Step two of the Rasch instrument validation analysis involved reviewing the

person and item fit. Inspection of the items and persons led to a final iterative

elimination of twelve people whose responses were inconsistent. No items failed to meet

the standards set forth for item fit. The item fit standards require a z-standardized score

below 2.0 and a positive point-biserial correlation. The elimination of these twelve people

resulted in final person and item separations and reliabilities of 2.78/.87 for persons and

4.77/.96 for the items of the FVOD scale. These findings suggest a well-defined linear

construct that reliably distinguishes differences among the people. The FVOD scale can

be divided into roughly six groups of items, and it distinguishes four groups of people.
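The correspondence between separation indices and "groups" of items or people follows the standard Rasch conversions: separation G relates to reliability R by G = sqrt(R / (1 - R)), and separation converts to the number of statistically distinct strata by H = (4G + 1) / 3. The sketch below assumes those standard formulas and uses the FVOD separations reported above:

```python
import math

def separation_from_reliability(rel):
    """Separation index: G = sqrt(R / (1 - R))."""
    return math.sqrt(rel / (1 - rel))

def strata(separation):
    """Statistically distinct levels spanned: H = (4G + 1) / 3."""
    return (4 * separation + 1) / 3

# FVOD person separation 2.78 -> about four distinguishable groups of people;
# item separation 4.77 -> roughly six to seven levels of items.
print(round(strata(2.78), 2), round(strata(4.77), 2))
```

This is why a person separation below 2.0 (reliability below about .8) is read throughout this study as a failure to distinguish even two groups reliably.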

The third step in the analysis was to review the person-item map (Figure 7) to

examine the extrapolated construct.

Figure 7

Item Map FVOD Group 1


INPUT: 174 Persons, 14 Items; MEASURED: 162 Persons, 14 Items, 3 CATS

[Winsteps person-item map; each '#' represents 6 persons. From most difficult
(rare) to least difficult (frequent) to endorse, the items fall in this order:
FVOD9-doctor; FVOD7-trouble w/law; FVOD3-more aware; FVOD12-avoid withdrawal
and FVOD14-treatment program (aligned); FVOD4-sex; FVOD8-really stoned;
FVOD1-improve thinking and FVOD5-help (aligned); FVOD10-activities;
FVOD13-life and FVOD6-forget (aligned); FVOD2-feel better; FVOD11-aod.]
The hierarchy of items formed a pattern from the most difficult items to endorse to the least

difficult items to endorse. When two items are aligned at the same place on the hierarchy,

the items are either theoretically redundant or at the same level of difficulty; in Rasch

terms, the items overfit. Overfitting items can be eliminated if the infit mean-square is

below 0.6 and the z-standardized score is -2.0 or less. Despite appearing to measure

the same theoretical content, one of the items from each of the aligned pairs

FVOD12/FVOD14, FVOD1/FVOD5, and FVOD13/FVOD6 could be eliminated because

all of the items in these combinations fall within the item fit standards and are at the same

level of difficulty. Group 1's item hierarchy is visually displayed in Table 7. The final

Group 1 RPCA indicated that 84.3 percent of the total variance was explained by the

scale. This improvement in the scale was achieved by adjusting the response scale and

eliminating the twelve misfitting people. In addition, the item/person map means and

standard deviations (Figure 7) were close in proximity, indicating that the difficulty of the

items was similar to the ability of the people. It should be noted that, while the means

were close in proximity, only the most extreme people were identified on

the FVOD scale; the majority of the sample was at the bottom of the scale. This

may be because only a small number of people in the sample were found to be

dependent on drugs other than alcohol.

In the final step of the analysis, the variable extrapolated from Group 1's data was

compared against the variable constructed from Group 2's data using the same

process. The FVOD scale, using Group 2 data, demonstrated similar person and item

separation and reliability findings. As with Group 1, the response options were not being

used as intended by the authors of the SASSI-3. By reviewing the thresholds and

probability curves, the following collapsing strategy was developed (see Table 6): the

two middle response options, 1-Once or Twice and 2-Several Times, were combined. This

allowed for a better functioning response scale and an increase in the person and item

separation and reliability findings.

Table 6

Summary of Collapsing Strategy for FVOD Group 2 Response Options

Rating    Probability
Scale     Curve¹         Threshold²     PS & R      IS & R      RPCA

0,1,2,3   0 = 0.95       0-1 = N/A      2.82/.89    4.77/.96    88.7%
          1 = 0.30       1-2 = 1.24
          2 = 0.45       2-3 = 13.94
          3 = 0.95

0,1,1,2   0 = 0.95       0-1 = N/A      2.97/.90    4.19/.95    84.3%
          1 = 0.65       1-2 = 28.18
          2 = 0.95

0,0,1,2   0 = 0.95       0-1 = N/A      2.49/.86    4.21/.95    89.8%
          1 = 0.45       1-2 = 11.36
          2 = 0.95

0,1,2,2   0 = 0.95       0-1 = N/A      2.65/.88    4.83/.96    90.5%
          1 = 0.30       1-2 = 3.18
          2 = 0.95

Note. ¹ ≥ .5 is acceptable. ² ≥ 1.4 is acceptable. PS & R = Person Separation & Reliability. IS & R = Item Separation &

Reliability. RPCA = Rasch principal components analysis.
After the iterative elimination of thirteen misfitting people, the final person and item

separation and reliability findings increased to 2.97/.90 and 4.19/.95, respectively. No

items failed to meet the cutoff for item fit; therefore, no items were eliminated. The final

RPCA for the scale was also comparable to that of Group 1's RPCA, as 84.3 percent of

the variance was accounted for by the FVOD scale's items. This means that the 14-item

FVOD scale can be separated into about six groups of items and reliably identifies about

four groups of people. As was reported for the FVOD for Group 1, the hierarchy of

FVOD item endorsement difficulty for Group 2 is presented in Figure 8.

Figure 8

Item Map FVOD Group 2


INPUT: 173 Persons, 14 Items; MEASURED: 158 Persons, 14 Items, 3 CATS

[Winsteps person-item map; each '#' represents 5 persons. From most difficult
(rare) to least difficult (frequent) to endorse, the items fall in this order:
FVOD9-doctor; FVOD7-trouble w/law; FVOD12-avoid withdrawal and FVOD3-more
aware (aligned); FVOD14-treatment program and FVOD4-sex (aligned); FVOD5-help;
FVOD13-life and FVOD8-really stoned (aligned); FVOD10-activities;
FVOD1-improve thinking and FVOD6-forget (aligned); FVOD2-feel better;
FVOD11-aod.]
Four pairs of items were aligned on the variable: FVOD12/FVOD3, FVOD14/FVOD4,

FVOD13/FVOD8, and FVOD1/FVOD6. All of the items met the statistical standards for

item fit and appear to measure different content. One item from each of the pairs could be

eliminated. A side-by-side comparison of the Groups' respective item-endorsement

difficulty indicated that eight of the scale's fourteen items remained constant on the

hierarchy across Groups (see Table 7). The two items found most difficult to endorse by

both Groups and the three items found easiest to endorse by

both Groups remained consistent. However, the items around the means were not aligned

across Groups.

Table 7

FVOD Test of Independence

Item Hierarchy

Group 1 Group 2

Difficult to endorse

Item Number Item Item Number Item

FVOD9    Talk a Dr. into it         FVOD9    Talk a Dr. into it

FVOD7    Trouble with law           FVOD7    Trouble with law

FVOD3    Become more aware          FVOD12   Avoid withdrawal

FVOD12   Avoid withdrawal           FVOD3    Become more aware

FVOD14   Treatment program          FVOD14   Treatment program

FVOD4    Increase sexual pleasure   FVOD4    Increase sexual pleasure

FVOD8    Really stoned              FVOD5    Forget helplessness

FVOD1    Improved thinking          FVOD13   Drugs keep from life

FVOD5    Forget helplessness        FVOD8    Really stoned

FVOD10   Drug-related activities    FVOD10   Drug-related activities

FVOD13   Drugs keep from life       FVOD1    Improved thinking

FVOD6    To forget pressure         FVOD6    To forget pressure

FVOD2    Feel better                FVOD2    Feel better

FVOD11   Drugs/alcohol together     FVOD11   Drugs/alcohol together

Easy to endorse

Symptoms Scale (SYM)

The initial review of the person and item separations and reliabilities

indicated findings of 1.28/.62 for persons and 4.90/.96 for the items of the SYM scale,

respectively. This means that the scale can be divided into six groups of items in terms of

difficulty, but it does not measure the people in a reliable way. While the person

separation does not meet the standard of 2.0, a person separation of 1.28 suggests that we

may marginally distinguish between two groups of people, with significant error. The

purpose of the SYM scale is to distinguish between two groups: those who have a high

probability of substance dependence and those who do not.

Typically, the standard first step in conducting a Rasch analysis is to evaluate the

range of responses to the scale's items. However, the SYM scale's items, as well as those

of all the other SASSI-3 scales, have only two response choices: true or false.

Dichotomous response options have an equal probability of being selected. Therefore, the

review of the response scales for the SYM and all subsequent scales was unnecessary.

Step two of the Rasch instrument validation analysis involved reviewing the person and

item fit. Further evaluation was conducted in an effort to improve this scale's separation

and reliability results. The researcher reviewed the fit statistics for the items and persons.

This review led to a final iterative elimination of twelve people and two items that failed

to meet the standards set forth for item fit. These eliminations resulted in a decrease in

function for the scale, as evidenced by the final person and item separations and

reliabilities of 1.16/.57 for persons and 6.65/.98 for the remaining eight items on the SYM

scale. This suggests a reasonably well-defined linear construct, but one that does not do a

good job of measuring the people.
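Because the SYM items are scored true/false, the underlying measurement model is the dichotomous Rasch model, in which the probability of endorsing an item is a logistic function of the person measure minus the item difficulty. A minimal sketch (in logits; the measures reported in this study appear to use a rescaled metric, so the values below are illustrative):

```python
import math

def p_endorse(person_measure, item_difficulty):
    """Dichotomous Rasch model:
    P(X = 1) = exp(B - D) / (1 + exp(B - D)) for person B and item D."""
    return 1.0 / (1.0 + math.exp(-(person_measure - item_difficulty)))

# A person located exactly at an item's difficulty has a 50% chance of
# endorsing it; easier-to-endorse items (lower D) yield higher probabilities.
print(round(p_endorse(0.0, 0.0), 2))  # -> 0.5
```

With only two categories there are no step calibrations to inspect, which is why no response-scale review was needed for the SYM and the remaining dichotomous scales.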

The third step in the analysis involved a review of the person-item map to explore

the extrapolated construct. The hierarchy of items formed a pattern from the most

difficult items to endorse to the least difficult items to endorse (see Figure 9).

Figure 9

Item Map SYM Group 1

INPUT: 174 Persons, 10 Items; MEASURED: 162 Persons, 9 Items, 2 CATS

[Winsteps person-item map; each '#' represents 2 persons. From most difficult
(rare) to least difficult (frequent) to endorse, the items fall in this order:
Q55 MORNING; Q58 INTO TROUBLE; Q35 MEMORY; Q59 FAMILY PROBLEMS; Q54 NEGLECTED;
Q56 TEENAGER; Q42 TOO OFTEN; Q40 REMEMBER; Q27 TOO MUCH.]
Group 1's item hierarchy is visually demonstrated in Table 8. The initial Rasch principal

components analysis (RPCA) indicated that 63.7 percent of the total variance was

explained by the instrument. With the elimination of one item (Q60 [Drink away from

home]) from the SYM scale, the RPCA increased to 92.9 percent of the total variance

having been explained by the instrument. This demonstrated improvement in the scale

through the elimination of misfitting items. In addition, the item/person map means and

standard deviations were close in proximity, indicating that the items were as difficult to

endorse as the people were able to agree to them.

The extrapolated variable was compared using the data from Group 2,

following the same process in the analysis of the measurement function of the scale.

After the iterative elimination of sixteen misfitting people and two items, the final person

and item separation and reliability findings increased to 1.33/.64 and 7.23/.98,

respectively. No further items failed to meet the cutoff for item fit; therefore, no others

were eliminated. The final RPCA for the scale was also comparable, at 96.6 percent of the

variance being accounted for by the items. This means that the SYM scale items can be

divided into roughly ten groups. And, because the separation of people is not greater than

2.0, the SYM scale does not reliably distinguish between those with a high probability of

substance dependence and those without. As was presented for the SYM for Group 1,

the hierarchy of SYM item endorsement difficulty for Group 2 is presented in

Figure 10.

Figure 10

Item Map SYM Group 2

INPUT: 173 Persons, 10 Items; MEASURED: 157 Persons, 8 Items, 2 CATS

[Winsteps person-item map; each '#' represents 3 persons. From most difficult
(rare) to least difficult (frequent) to endorse, the items fall in this order:
Q55 MORNING; Q58 INTO TROUBLE; Q54 NEGLECTED; Q59 FAMILY PROBLEMS;
Q56 TEENAGER; Q42 TOO OFTEN; Q40 REMEMBER; Q27 TOO MUCH.]
No items were aligned on the item map for Group 2. A side-by-side comparison of the

Groups' respective item-endorsement difficulty indicated that seven of the scale's nine

items remained constant on the hierarchy across Groups (see Table 8).

Table 8

SYM Test of Independence

Item Hierarchy

Group 1                                Group 2

Difficult to endorse

Item Number  Item                      Item Number  Item

Q55          Morning drink             Q55          Morning drink

Q58          Get in trouble            Q58          Get in trouble

Q35          Memory problems           Q54          Neglected obligations

Q59          Family problems           Q59          Family problems

Q54          Neglected obligations     Q56          Teen use

Q56          Teen use                  Q42          Used too often

Q42          Used too often            Q40          Couldn't remember

Q40          Couldn't remember         Q27          Used too much

Q27          Used too much

Easy to endorse

Eliminated items:

Q60          Drink away from home      Q60          Drink away from home

                                       Q35          Memory problems
Obvious Attributes Scale (OAT)

The initial review of the person and item separations and reliabilities

indicated findings of 1.16/.57 for persons and 4.89/.96 for the items of the OAT scale,

respectively. Like the SYM, the OAT scale has only true and false response options;

therefore, these response options had an equal probability of being selected, and the

review of the response scales was unnecessary. Step two of the Rasch instrument

validation analysis involved reviewing the person and item fit. Further evaluation was

conducted in an effort to improve this scale's separation and reliability results. Inspection

of the items and persons led to a final iterative elimination of seven people whose

responses were inconsistent. No items failed to meet the standards set forth for item fit,

and therefore, none were eliminated. The elimination of these seven people resulted in

final person and item separations and reliabilities of 1.20/.59 for persons and 5.15/.96

for the twelve items on the OAT scale. These findings suggested a reasonably well-defined

linear construct, which can be divided into seven groups of items. However, the

construct does not reliably distinguish any characteristic differences among the people

from lower to higher on the variable.

The third step in the analysis involved a review of the person-item map to explore

the extrapolated construct. The hierarchy of items formed a pattern from the most

difficult items to endorse to the least difficult items to endorse. When two items are aligned at the

same place on the hierarchy, the items are either theoretically redundant or at the

same level of difficulty. This means the items overfit. By way of reminder, overfitting

items can be eliminated if the infit mean-square is below 0.6 and the z-standardized score

is -2.0 or less. Despite appearing to measure the same theoretical content, one of the

items from the aligned pair of Q20 and Q54 could be eliminated because each item in this

combination falls within the item fit standards and is at the same level of difficulty. A

visual representation of the item hierarchy can be viewed in Table 9. The initial Rasch

principal components analysis (RPCA) indicated that 53.6 percent of the total variance

was explained by the instrument. However, there was also an indication of three

underlying contrasts. Underlying contrasts can demonstrate the presence of underlying

constructs, which may point to the construct being multidimensional. With the elimination

of misfitting people from the OAT scale, the RPCA increased to 60.3 percent of the total

variance having been explained by the instrument, with the three underlying contrasts

remaining. This demonstrated a minimal improvement in the OAT scale and is just within

the RPCA range of acceptability. In addition, on the item/person map, three items

separated the person mean from the item mean, with one item separating the upper

standard deviations and two items separating the lower standard deviations of the persons

and items. These distances indicate that the items were

more difficult to endorse than the people were able to agree to them (see Figure 11).

Figure 11

Item Map OAT Group 1

INPUT: 173 Persons, 12 Items; MEASURED: 165 Persons, 10 Items, 4 CATS

[Winsteps person-item map; each '#' represents 2 persons. From most difficult
(rare) to least difficult (frequent) to endorse, the items fall in this order:
Q23 clever; Q17 respectful; Q53 responsibilities; Q20 disapproval and
Q52 resentful (aligned); Q33 blame; Q39 law; Q19 leave home; Q7 not lived;
Q48 punished; Q4 police; Q11 sitting still.]
In the final step of the analysis, the extrapolated variable was compared using the

data from a second comparable group, following the same process. The OAT scale, using

Group 2 data, demonstrated similar person and item separation and reliability findings.

After the iterative elimination of eight misfitting people and one item which failed to

meet the standards for item fit, the final person and item separation and reliability

findings increased to 1.21/.62 and 5.83/.97, respectively. Item Q48 (Rarely punished) and

Q7 (Not lived) were aligned at the same place on the variable, which implied item

redundancy. Item Q7 overfit, meaning it met the statistical standards for item elimination.

However, this elimination reduced the person separation and reliability findings to

1.09/.54, while the item separation and reliability findings remained relatively constant at

5.81/.91. Because of this reduction in the person separation and reliability

findings, Q7 was retained. The final RPCA for the scale was comparable, at 73.5

percent of the variance being accounted for by the items. The variance accounted for in

the RPCA for Group 2 was substantially higher than the RPCA for Group 1 (a difference of

13.2%). This means that the OAT scale items can be divided into eight groups of

difficulty, but the scale cannot reliably distinguish any differences among the group of people. As

was presented for the OAT for Group 1, the hierarchy of OAT item endorsement

difficulty for Group 2 is presented in Figure 12.

Figure 12

Item Map OAT Group 2


INPUT: 173 Persons, 12 Items; MEASURED: 165 Persons, 10 Items, 4 CATS

[Winsteps person-item map; each '#' represents 3 persons. From most difficult
(rare) to least difficult (frequent) to endorse, the items fall in this order:
Q23 clever; Q17 respectful; Q53 responsibilities; Q20 disapproval;
Q52 resentful; Q39 law; Q19 leave home; Q48 punished and Q7 not lived
(aligned); Q4 police; Q11 sitting still.]
A side-by-side comparison of the two Groups' respective item-endorsement difficulty

indicated that six of the scale's twelve items remained constant on the hierarchy across

Groups (see Table 9). Five of the six consistent items were found by both Groups to be

the most difficult to endorse.

Table 9

OAT Test of Independence

Item Hierarchy

Group 1                                Group 2

Difficult to endorse

Item Number  Item                      Item Number  Item

Q23          Clever crooks             Q23          Clever crooks

Q17          Respectful                Q17          Respectful

Q53          Responsible               Q53          Responsible

Q20          Disapproving looks        Q20          Disapproving looks

Q52          Resentful                 Q52          Resentful

Q33          Take the blame            Q39          Broken the law

Q39          Broken the law            Q19          Leave home

Q19          Leave home                Q48          Rarely punished

Q7           Not lived                 Q7           Not lived

Q48          Rarely punished           Q4           Police

Q4           Police                    Q11          Sitting still

Q11          Sitting still

Easy to endorse

Eliminated items:

                                       Q33          Take the blame
Subtle Attributes Scale (SAT)

Upon initial review of the person and item separations and reliabilities, the

findings indicated .45/.17 for persons and 7.52/.98 for the items of the SAT scale,

respectively. The SAT items can be reliably divided into ten levels of difficulty. However,

the SAT scale distinguishes no differences among the people. Because the SAT scale has

only dichotomous response options, the review of the response scales was unnecessary.

Step two of the Rasch instrument validation analysis involved reviewing the person and

item fit. Further evaluation was conducted in an effort to improve this scale's separation

and reliability results. Inspection of the items and persons led to a final iterative

elimination of seven people whose fit statistics did not meet the 2.0 z-standardized or

negative point-biserial values. Twelve people were eliminated for misfitting. No items

failed to meet the standards set forth for item fit, and therefore, none were eliminated.

This resulted in final person and item separations and reliabilities of .49/.20 for

persons and 6.72/.98 for the eight items on the SAT scale. These findings represented a slight

increase in person separation and reliability but a decrease in item separation. While

these results suggest a reasonably well-defined linear construct, the construct fails to

discriminate differences between the people.

The third step in the analysis involved a review of the person-item map to explore

the extrapolated construct. The hierarchy of items formed a pattern from the most

difficult items to endorse to the least difficult items to endorse (see Figure 13).

107
Figure 13

Item Map SAT Group 1

[WINSTEPS person-item map for the SAT scale, Group 1. Items from most difficult to endorse (top) to least difficult to endorse (bottom): Q32, Q61, Q18, Q50, Q49, Q6, Q44, Q28. Each '#' represents 4 persons.]

108
The initial Rasch principal components analysis (RPCA) indicated that 68.2 percent of the total variance was explained by the instrument. When the misfitting people were eliminated from the SAT, the RPCA increased to 92.8 percent of the total variance explained by the instrument, with no underlying contrasts remaining. This increase in explained variance reflected an improvement in the SAT scale. In addition, the person and item means and standard deviations on the item/person map were separated by several items. This separation indicates a considerable gap between the items' endorsement difficulty and the people's agreeability. However, the items appeared to span the entire scale of the variable.
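The RPCA percentages quoted in this chapter represent the share of total observed variance attributable to the Rasch measures. A schematic illustration (the variance components below are hypothetical values chosen to reproduce the SAT's initial 68.2 percent figure, not the study's actual eigenvalue decomposition):

```python
def pct_variance_explained(measure_var, residual_var):
    """Percent of total variance explained by the Rasch measures:
    100 * V_measures / (V_measures + V_residual)."""
    return 100.0 * measure_var / (measure_var + residual_var)

# Hypothetical decomposition in eigenvalue units (illustration only)
print(round(pct_variance_explained(29.0, 13.5), 1))  # 68.2
```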

In the final step of the analysis of the scale, the extrapolated variable was compared using the data from a second comparable group and the same process. The SAT scale, using Group 2 data, demonstrated similar person and item separation and reliability findings. After the iterative elimination of eleven misfitting people, the final person and item separation and reliability findings increased to .62/.28 and 8.32/.99, respectively. No items failed to meet the cutoff for item fit; therefore, no items were eliminated. The final RPCA for the scale was also comparable, at 92.9 percent of the variance being accounted for by the items. As was presented for the SAT for Group 1, the hierarchy of SAT item endorsement difficulty for Group 2 is presented in Figure 14.

109
Figure 14

Item Map SAT Group 2

INPUT: 173 Persons 8 Items MEASURED: 162 Persons 8 Items 4 CATS

[WINSTEPS person-item map for the SAT scale, Group 2. Items from most difficult to endorse (top) to least difficult to endorse (bottom): Q61, Q32, Q18, Q50, Q49, Q6, Q44, Q28. Each '#' represents 5 persons.]

110
A side-by-side comparison of the Groups' respective item-endorsement difficulty indicated that six of the scale's eight items remained constant on the hierarchy across Groups (see Table 10). The two items found to be the most difficult to endorse by both Groups were interchanged. This means that the SAT scale can be divided into eleven levels of difficulty. However, the scale does not discriminate differences among the people in Group 2.
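The cross-group comparison above amounts to counting the rank positions at which the two hierarchies agree. A small sketch using the SAT orderings reported in Table 10:

```python
def positions_held(hierarchy_1, hierarchy_2):
    """Items occupying the same rank position in both groups' hierarchies."""
    return [a for a, b in zip(hierarchy_1, hierarchy_2) if a == b]

# SAT item hierarchies, most to least difficult to endorse (Table 10)
group_1 = ["Q32", "Q61", "Q18", "Q50", "Q49", "Q6", "Q44", "Q28"]
group_2 = ["Q61", "Q32", "Q18", "Q50", "Q49", "Q6", "Q44", "Q28"]

# The two most difficult items swap places; the remaining six hold position.
print(len(positions_held(group_1, group_2)))  # 6
```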

Table 10

SAT Test of Independence

Item Hierarchy

Group 1 Group 2

Difficult to endorse

Item Number Item Item Number Item

Q32 Break more laws Q61 Antacid

Q61 Antacid Q32 Break more laws

Q18 Obey the law Q18 Obey the law

Q50 Full of energy Q50 Full of energy

Q49 Smoke Q49 Smoke

Q6 Not my fault Q6 Not my fault

Q44 Responsible Q44 Responsible

Q28 Uninteresting Q28 Uninteresting

Easy to endorse

111
Supplemental Addiction Measure (SAM)

The initial review of the person and item separation and reliability findings indicated 1.06/.53 for persons and 3.01/.90 for items of the SAM scale, respectively. This means that the scale reliably divides the items into four levels of difficulty but does not meet the minimum person separation standard of 2.0; the scale does not differentiate any groups of people within this Group. Because the SAM has only true and false response options, the review of the response scales was unnecessary. Step two of the Rasch instrument validation analysis involved reviewing the person and item fit. Further evaluation was conducted in an effort to improve this scale's separation and reliability results. Inspection of the items and persons led to a final iterative elimination of 21 people and one item that failed to meet the standards set forth for person and item fit. This elimination process resulted in final person and item separations and reliabilities of 1.32/.64 for persons and 4.17/.95 for the remaining ten items on the SAM scale. This suggests a reasonably well defined linear construct. However, the construct does not reliably discriminate among the people in this Group.

The third step in the analysis involved a review of the person-item map to explore the extrapolated construct. The items formed a hierarchy from most difficult to endorse to least difficult to endorse (see Figure 15).

112
Figure 15

Item Map SAM Group 1

INPUT: 174 Persons 14 Items MEASURED: 153 Persons 13 Items 4 CATS

[WINSTEPS person-item map for the SAM scale, Group 1. Items from most difficult to endorse (top) to least difficult to endorse (bottom): Q51; Q16 and Q9 (aligned); Q39; Q54; Q42; Q48 and Q7 (aligned); Q29 and Q40 (aligned); Q4; Q46; Q13. Each '#' represents 2 persons.]
113
When two items are aligned at the same place on the hierarchy, the items are either theoretically redundant or at the same level of difficulty. In Rasch terms, such items overfit. Overfitting items can be eliminated if the infit mean-square is below 0.6 and the z-standardized score is -2.0 or less. Because all of the items in the aligned pairs Q16/Q9, Q42/Q7, and Q29/Q40 fall within the item fit standards and are at the same level of difficulty, and each pair appears to measure the same theoretical content, one item from each pair could be eliminated. A visual representation of the hierarchy of Group 1's SAM scale items is presented in Table 11. The initial Rasch principal components analysis (RPCA) indicated that 26.8 percent of the total variance was explained by the instrument. Additionally, five underlying contrasts were indicated. With the elimination of 21 people and one item (Q5 [Made mistakes]) from the SAM scale, the RPCA increased to 47.4 percent of the total variance explained by the instrument, and the number of remaining contrasts was reduced to four. This demonstrated some improvement in the scale due to the elimination of the misfitting items and people. However, the minimum accepted standard for RPCA is greater than or equal to 60 percent. Even with these improvements, the SAM does not appear to function as a linear construct, due to its low RPCA.
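The fit-screening rule used above can be sketched as a simple classifier. The cutoffs are those stated in the text; the example values are hypothetical, not actual WINSTEPS output from this study:

```python
def classify_fit(infit_mnsq, zstd):
    """Apply the fit cutoffs described in the text: an item overfits
    (a redundancy candidate) when infit mean-square < 0.6 and ZSTD <= -2.0;
    it misfits when ZSTD >= 2.0; otherwise it is retained as acceptable."""
    if infit_mnsq < 0.6 and zstd <= -2.0:
        return "overfit"
    if zstd >= 2.0:
        return "misfit"
    return "acceptable"

# Hypothetical fit values for illustration
print(classify_fit(0.55, -2.4))  # overfit
print(classify_fit(1.45, 2.6))   # misfit
print(classify_fit(0.95, 0.3))   # acceptable
```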

In the final step of the analysis, the extrapolated variable was compared using the data from a second comparable group and the same process. The SAM scale, using Group 2 data, demonstrated similar person and item separation and reliability findings. After the iterative elimination of twenty-three misfitting people and two items (Q16 and Q5) that failed to meet the standards for item fit, the final person and item separation and

114
reliability findings increased to 1.42/.69 and 5.45/.97, respectively. The final RPCA for the scale was also comparable, at 71.6 percent of the variance being accounted for by the items, with no underlying contrasts. As was presented for the SAM for Group 1, the hierarchy of SAM item endorsement difficulty for Group 2 is presented in Figure 16.

Figure 16

Item Map SAM Group 2

INPUT: 173 Persons 14 Items MEASURED: 150 Persons 9 Items 4 CATS

[WINSTEPS person-item map for the SAM scale, Group 2. Items from most difficult to endorse (top) to least difficult to endorse (bottom): Q51, Q9, Q39, Q54, Q48, Q42, Q4 and Q40 (aligned), Q13. Each '#' represents 3 persons.]

115
The item combination Q46/Q40 was aligned at the same place on the variable. Neither item met both statistical standards for item elimination, and the two items did not appear to measure the same theoretical content. In addition, elimination of Q40, which had the higher overfit statistics, did not improve the scale; this change decreased the person separation and reliability and the RPCA to 1.32/.64 and 70.2 percent, respectively. Therefore, item Q40 remained in the hierarchy. However, the removal of item Q46 increased the scale's person and item separation and reliability findings as well as the RPCA, to 1.46/.68, 5.65/.97, and 77.5 percent, respectively. Therefore, item Q46 was removed due to redundancy and the resulting improvement in the scale. A side-by-side comparison of the Groups' respective item-endorsement difficulty indicated that three of the scale's fourteen items remained constant on the hierarchy across Groups (Table 11). The item found to be the most difficult to endorse and the two items found to be the least difficult to endorse were consistent across both Groups. Yet the hierarchy developed from Group 1's data comprised thirteen items, while the hierarchy developed from Group 2's data comprised eleven items. This means that the SAM scale items can be divided into seven levels of difficulty, but the scale does not reliably discriminate any differences among the people in Group 2.

116
Table 11

SAM Test of Independence

Item Hierarchy

Group 1 Group 2

Difficult to endorse

Item Number Item Item number Item



Q51 Sat about Q51 Sat about

Q16 Wasn't up to it Q9 Daydream

Q9 Daydream Q39 Broken law

Q39 Broken law Q54 Neglected obligations

Q54 Neglected obligations Q48 Rarely punished

Q42 Too often Q42 Too often

Q7 Not lived Q4 Police trouble

Q29 Control myself Q40 Couldn't remember

Q40 Couldn't remember Q13 Worn out

Q4 Police trouble

Q46 Undesirable people

Q13 Worn out

117
Table 11 (Continued)

Easy to endorse

Eliminated items:

Q5 Make mistakes Q5 Make mistakes

Q16 Wasn't up to it

Q46 Undesirable People

Defensiveness Scale (DEF)

The review of the initial person and item separations and reliabilities findings

indicated .84/.41 for persons and 5.64A97 for items of the DEF scale, respectively. This

means that while the DEF scale items can be divided into seven levels of difficulty, they

do not reliably discriminate differences among the people. Because of the dichotomous

nature of the response options (true and false) both options have an equal probability of

being selected. Therefore, the review of the response scales was unnecessary.
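For dichotomous scales such as the DEF, the underlying Rasch model expresses the probability of endorsing an item as a logistic function of the difference between the person measure and the item's endorsement difficulty. A standard formulation, shown in raw logits for reference (the measures plotted in this study's item maps appear to be rescaled):

```python
import math

def p_endorse(theta, b):
    """Dichotomous Rasch model: P(X = 1) = exp(theta - b) / (1 + exp(theta - b)),
    where theta is the person measure and b is the item difficulty, in logits."""
    return math.exp(theta - b) / (1 + math.exp(theta - b))

# A person whose measure equals the item's difficulty endorses it half the time.
print(p_endorse(1.0, 1.0))            # 0.5
# Endorsement becomes more likely as the person measure exceeds the difficulty.
print(round(p_endorse(2.0, 0.0), 2))  # 0.88
```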

Step two of the Rasch instrument validation analysis involved reviewing the person and item fit. Further evaluation was conducted in an effort to improve this scale's separation and reliability results. Inspection of the items and persons led to a final iterative elimination of ten people whose responses were inconsistent and two items (Q8 [Friendly] and Q25 [Dangerous]) which failed to meet the standards set forth for item fit. This resulted in final person and item separations and reliabilities of .93/.47 for persons and 6.62/.98 for the remaining ten items on the DEF scale. While this suggests a reasonably well defined linear construct, which can be divided into nine levels of

118
difficulty with high reliability (.98), the variable does not reliably discriminate any

differences among the people.

The third step in the analysis involved a review of the person-item map to explore

the extrapolated construct (see Figure 17).

119
Figure 17

Item Map DEF Group 1

INPUT: 174 Persons 12 Items MEASURED: 164 Persons 10 Items 4 CATS

[WINSTEPS person-item map for the DEF scale, Group 1. Items from most difficult to endorse (top) to least difficult to endorse (bottom): Q61, Q22, Q51, Q9, Q16, Q41, Q1, Q65, Q64, Q31. Each '#' represents 3 persons.]

120
The items formed a hierarchy from most difficult to endorse to least difficult to endorse. The initial Rasch principal components analysis (RPCA) indicated that 42.8 percent of the total variance was explained by the instrument. Additionally, five underlying contrasts were indicated. Following the elimination of ten people and two items from the DEF scale, the RPCA increased to 71.6 percent of the total variance explained by the instrument, and the number of remaining contrasts was reduced to one. This increase in the RPCA demonstrated the improvement in the DEF scale achieved by eliminating misfitting items and people. In addition, the person and item means and standard deviations on the item/person map were separated by several items but spanned the length of the variable. This indicates that the items were only marginally as difficult to endorse as the people were able to agree to them.

The extrapolated variable was compared using the data from a second comparable group and the same process. The DEF scale, using Group 2 data, demonstrated person and item separation and reliability findings similar to those found using Group 1's data. After the iterative elimination of thirteen misfitting people and one item that failed to meet the standards for item fit, the final person and item separation and reliability findings increased to .96/.48 and 7.07/.98, respectively. While these eliminations improved the scale, the DEF scale still did not distinguish the people in a reliable manner. The final RPCA for the scale was also comparable, at 80.5 percent of the variance being accounted for by the items. As was presented for the DEF for Group 1, the hierarchy of DEF item endorsement difficulty for Group 2 is presented in Table 12. Items Q25 and Q9 were aligned on the variable for the data provided by Group 2. Further evaluation

121
of this alignment indicated that neither of the items met the statistical standards for overfit. Additionally, the items appeared to measure two theoretically different content areas: Q25 (Dangerous) and Q9 (Don't like to daydream). It should be noted that in the hierarchy produced by the data from Group 1, item Q25 was eliminated for misfitting. Eliminating item Q25 from the hierarchy produced by the data from Group 2 improved the DEF scale's item separation and reliability findings and RPCA to 7.45/.98 and 83.9 percent, respectively. See Figure 18 for the DEF item map.

122
Figure 18

Item Map DEF Group 2

INPUT: 173 Persons 12 Items MEASURED: 160 Persons 10 Items 4 CATS

[WINSTEPS person-item map for the DEF scale, Group 2. Items from most difficult to endorse (top) to least difficult to endorse (bottom): Q61, Q22, Q51, Q8, Q9, Q16, Q41, Q65, Q64, Q31. Each '#' represents 3 persons.]

123
A side-by-side comparison of the two Groups' respective item-endorsement difficulty indicated that six of the scale's twelve items remained constant on the hierarchy across Groups (Table 12). Three of the six consistent items were found by both Groups to be the most difficult to endorse; the other three were found by both Groups to be the least difficult to endorse.

124
Table 12

DEF Test of Independence

Item Hierarchy

Group 1 Group 2

Difficult to endorse

Item number Item Item number Item

Q61 Antacid Q61 Antacid

Q22 Avoided people Q22 Avoided people

Q51 Sat about Q51 Sat about

Q9 Daydream Q8 Friendly

Q16 Wasn't up to it Q25 Dangerous

Q41 Think carefully Q9 Daydream

Q1 Lie Q16 Wasn't up to it

Q65 Restless Q41 Think carefully

Q64 Happy Q65 Restless

Q31 No good Q64 Happy

Q31 No good

Easy to endorse

Eliminated items:

Q8 Friendly Q1 Lie

Q25 Dangerous

125
Family versus Control Scale (FAM)

The initial review of the FAM scale's person and item separation and reliability findings indicated .71/.33 for persons and 5.03/.96 for items, respectively. This means that the FAM scale items can be divided into seven levels of difficulty, but the scale did not reliably distinguish any differences among Group 1. Because the FAM scale's items are limited to true and false responses, there are no intermediate category thresholds to evaluate; therefore, the review of the response scales was unnecessary. Step two of the Rasch instrument validation analysis involved reviewing the person and item fit. Further evaluation was conducted in an effort to improve this scale's separation and reliability results. Inspection of the items and persons led to a final iterative elimination of fourteen people whose responses were inconsistent and three items (Q27 [Too much], Q63 [Loss for words], and Q8 [Friendly]) which failed to meet the standards set forth for item fit. This resulted in final person and item separations and reliabilities of 1.00/.50 for persons and 5.71/.97 for the FAM scale's remaining items. These separation and reliability findings suggest a reasonably well defined linear construct which can be divided into seven levels of difficulty, but the construct still fails to reliably distinguish any differences among the people.

The third step in the analysis involved a review of the person-item map to explore the extrapolated construct. The items formed a hierarchy from most difficult to endorse to least difficult to endorse. When two items are aligned at the same place on the hierarchy, the items are either theoretically redundant or at the same level of difficulty. In Rasch terms, such items overfit. Overfitting items can be

126
eliminated if the infit mean-square is below 0.6 and the z-standardized score is -2.0 or less. Although the aligned pairs Q25/Q9 and Q23/Q55 did not appear to measure the same theoretical content, all of the items in these combinations fell within the item fit standards, so one item from each pair could be eliminated because the paired items are at the same level of difficulty. The initial Rasch principal components analysis (RPCA) indicated that 35.9 percent of the total variance was explained by the instrument. Additionally, three underlying contrasts were indicated. With the elimination of fourteen people and three items from the FAM scale, the RPCA increased to 78.1 percent of the total variance explained by the instrument, with no remaining contrasts. This demonstrated improvement in the scale achieved by eliminating misfitting items and people. In addition, the person and item means and standard deviations on the item/person map were separated by several items but spanned the length of the variable. This indicates that the items were easier to endorse than the people were able to agree to them (see Figure 19).

127
Figure 19

Item Map FAM Group 1

INPUT: 174 Persons 15 Items MEASURED: 160 Persons 12 Items 4 CATS

[WINSTEPS person-item map for the FAM scale, Group 1. Items from most difficult to endorse (top) to least difficult to endorse (bottom): Q25, Q9, Q1, Q54, Q65, Q39, Q50, Q38, Q3, Q23 and Q55 (aligned), Q61. Each '#' represents 3 persons.]

128
In the final step of the analysis, the extrapolated variable was compared using the data from a second comparable group and the same process. The FAM scale, using Group 2 data, demonstrated similar person and item separation and reliability findings. After the iterative elimination of six misfitting people, the final person and item separation and reliability findings were .43/.16 and 5.41/.97, respectively. No items failed to meet the cutoff for item fit; therefore, no items were eliminated. The final RPCA for the scale was also comparable, at 40 percent of the variance being accounted for by the items. As was presented for the FAM for Group 1, the hierarchy of FAM item endorsement difficulty for Group 2 is presented in Table 13. Items Q25 and Q9 were again aligned at the same place on the variable for the hierarchy created by the data from Group 2, which indicates redundancy. In addition, the item combination Q39/Q54 was also aligned on the hierarchy. Item Q54's fit statistics indicated that the item overfit. However, elimination of the item drastically decreased the person separation and reliability findings while only narrowly increasing the item separation and reliability findings and RPCA, to .19/.04, 5.54/.97, and 40.4 percent, respectively. Therefore, item Q54 remained in the scale, as its removal was found to drastically reduce the instrument's ability to discriminate between the people, and the item also appeared to measure a different content area (Figure 20).

129
Figure 20

Item Map FAM Group 2

INPUT: 173 Persons 15 Items MEASURED: 167 Persons 14 Items 4 CATS

[WINSTEPS person-item map for the FAM scale, Group 2. Items from most difficult to endorse (top) to least difficult to endorse (bottom): Q9, Q25, Q63, Q1, Q39, Q65, Q50, Q38, Q3, Q27 and Q8 (aligned), Q55, Q23, Q61. Each '#' represents 4 persons.]

130
A side-by-side comparison of the Groups' respective item-endorsement difficulty indicated that six of the scale's fifteen items remained constant on the hierarchy across Groups (Table 13). The two items found by both Groups to be the most difficult to endorse (Q25 [Dangerous] and Q9 [Daydream]) and the item least difficult to endorse (Q61 [Antacid]) were consistent, as was a set of three items (Q3 [Go along with], Q38 [Feel sure], and Q50 [Full of energy]) clustered around the mean of both hierarchies. This means that the FAM scale fails to discriminate differences among the people and also fails to account for more than 60 percent of the variance.

131
Table 13

FAM Test of Independence

Item Hierarchy

Group 1 Group 2

Difficult to endorse

Item Number Item Item Number Item

Q25 Dangerous Q9 Daydream

Q9 Daydream Q25 Dangerous

Q1 Lie Q63 Loss for words

Q54 Neglected obligations Q1 Lie

Q65 Restless Q39 Broken a law

Q39 Broken a law Q65 Restless

Q50 Full of energy Q50 Full of energy

Q38 Feel sure Q38 Feel sure

Q3 Go along with Q3 Go along with

Q23 Clever crooks Q27 Too much

Q55 Morning drinks Q8 Friendly

Q61 Antacid Q55 Morning drinks

Q23 Clever crooks

Q61 Antacid

Easy to endorse

Eliminated items:

132
Table 13 (Continued)

Q8 Friendly Q54 Neglected obligations

Q27 Too much

Q63 Loss for words

Correctional Scale (COR)

The initial review of the person and item separation and reliability findings indicated 1.10/.55 for persons and 5.35/.97 for items of the COR scale, respectively. The COR scale items can be divided into seven levels of difficulty, but the scale cannot reliably discriminate differences among the people. Because the COR scale has true and false responses, the review of the response scales was unnecessary. Step two of the Rasch instrument validation analysis involved reviewing the person and item fit. Inspection of the items and persons led to a final iterative elimination of six people and one item (Q1 [Lie]) which failed to meet the standards set forth for item fit. This resulted in final person and item separations and reliabilities of 1.13/.56 for persons and 5.84/.97 for the remaining eleven items on the COR scale. While these separation and reliability findings suggest a reasonably well defined linear construct, the variable does not distinguish differences among the people.

The third step in the analysis involved a review of the person-item map to explore the extrapolated construct. The items formed a hierarchy from most difficult to endorse to least difficult to endorse (see Figure 21).

133
Figure 21

Item Map COR Group 1

INPUT: 174 Persons 12 Items MEASURED: 168 Persons 11 Items 4 CATS

[WINSTEPS person-item map for the COR scale, Group 1. Items from most difficult to endorse (top) to least difficult to endorse (bottom): Q32, Q18, Q31, Q24, Q39, Q19, Q42, Q40, Q7, Q41, Q36. Each '#' represents 3 persons.]

134
The initial RPCA indicated that 60.1 percent of the total variance was explained by the instrument. Additionally, two underlying contrasts were indicated. With the elimination of six people and one item from the COR scale, the RPCA increased to 85.1 percent of the total variance explained by the instrument, with no remaining contrasts. This demonstrated improvement in the scale achieved by eliminating misfitting items and people. In addition, the person and item means and standard deviations on the item/person map were separated by several items but spanned the length of the variable. This indicates that the items were more difficult to endorse than the people were able to agree to them.

In the last step of the analysis, the extrapolated variable was compared using the data from a second comparable group and the same process. The COR scale, using Group 2 data, demonstrated similar person and item separation and reliability findings. After the iterative elimination of three misfitting people and one item (Q1), the final person and item separation and reliability findings increased to 1.28/.62 and 6.17/.97, respectively. Using the data from Group 2, two combinations of items aligned at the same place on the hierarchy: Q42/Q7 and Q36/Q40. Examination of the item fit statistics for the aligned items indicated that only one item overfit statistically, and all of the items appeared to measure different content. Despite the removal of the overfitting item (Q42), review of the person and item separation and reliability findings and the RPCA indicated no improvement for this scale; these values declined to 1.12/.56, 6.16/.97, and 80.6 percent, respectively. Therefore, item Q42 remained in the hierarchy. The final RPCA for the scale was also comparable, at 82.8 percent of the variance being accounted for by the items. As was presented for the COR

135
for Group 1, the hierarchy of COR item endorsement difficulty for Group 2 is presented in Figure 22.

Figure 22

Item Map COR Group 2

INPUT: 172 Persons 12 Items MEASURED: 169 Persons 11 Items 4 CATS

[WINSTEPS person-item map for the COR scale, Group 2. Items from most difficult to endorse (top) to least difficult to endorse (bottom): Q32, Q18, Q31, Q24, Q39, Q19, Q42 and Q7 (aligned), Q36 and Q40 (aligned), Q41. Each '#' represents 3 persons.]

136
A side-by-side comparison of the Groups' respective item-endorsement difficulty indicated that seven of the scale's remaining eleven items remained constant on the hierarchy across Groups (Table 14). All seven of these items were among those found to be the most difficult to endorse by both Groups. This means that the COR scale items can be divided into eight difficulty levels, but the scale does not reliably distinguish among the people in Group 2.

Table 14

COR Test of Independence

Item Hierarchy

Group 1 Group 2

Difficult to endorse

Item Number Item Item Number Item

Q32 Break more laws Q32 Break more laws

Q18 Obey laws Q18 Obey laws

Q31 No good Q31 No good

Q24 School/teacher problems Q24 School/teacher problems

Q39 Broken laws Q39 Broken laws

Q19 Leave home Q19 Leave home

Q42 Too often Q42 Too often

Q40 Couldn't remember Q7 Not lived

Q7 Not lived Q40 Couldn't remember

137
Table 14 (Continued)

Q41 Think carefully Q36 Tempted to hit

Q36 Tempted to hit Q41 Think carefully

Easy to endorse

Eliminated items:

Q1 Lie Q1 Lie

Random Answering Pattern (RAP)

The initial review of the person and item separation and reliability findings indicated 0.00/0.00 for persons and 3.63/.93 for items of the RAP scale, respectively. This means that the RAP scale does not distinguish any differences among Group 1, although initially the items can be divided into five levels of difficulty. Because the RAP has only true and false response options, the review of the response scales was unnecessary. Step two of the Rasch instrument validation analysis involved reviewing the person and item fit. Inspection of the items and persons led to a final iterative elimination of twelve misfitting people who failed to meet the statistical standards for fit. No items failed to meet the standards set forth for item fit. This resulted in final person and item separations and reliabilities of 0.00/0.00 for persons and 0.00/0.00 for items on the RAP scale, indicating no change for persons and a decrease in separation and reliability for items. This suggests that the RAP scale is functioning as designed, which is in a random manner. The third step in the analysis involved a review of the person-item map

138
to explore the extrapolated construct. There was no resulting hierarchy of items due to the random nature of the scale. The initial RPCA indicated that 48 percent of the total variance was explained by the instrument. Additionally, two underlying contrasts were indicated. With the elimination of twelve misfitting people from the RAP scale, the RPCA decreased to 1.4 percent of the total variance explained by the instrument, with two remaining contrasts accounting for 98.6 percent of the unexplained variance. The RPCA also indicated that the RAP scale was functioning as intended, which is in a random manner.

In the last step of the analysis, the extrapolated variable was compared using the data from a second comparable group and the same process. The RAP scale, using Group 2 data, demonstrated similar person and item separation and reliability findings. After the iterative elimination of fourteen misfitting people, the final person and item separation and reliability findings were .00/.00 and .00/.00, respectively. The final RPCA for the scale was also comparable, at 18.6 percent of the variance being accounted for by the items, with three additional contrasts. As with Group 1, these results indicate that the RAP scale functions in a random pattern, as intended by the SASSI-3 authors.

Dichotomous SASSI-3

The SASSI-3 employs a 9-step decision rubric to arrive at a final determination of whether a respondent's answers indicate a "high probability" or "low probability" of having a substance dependence disorder. This rubric requires the clinician to reference the respondent's scores on eight of the SASSI-3's ten subscales. The first 5 of these 9

139
steps require the scorer to reference the individual SASSI-3 subscales. The remaining 4

steps are a function of two or more subscales used in combination. In total, these 9 steps,

involving eight of the ten subscales employ only 70 of the SASSI-3's total of 93 items.

The initial person and item separation and reliabilities for the dichotomous SASSI-3 were

3.54A93 and 5.60/.97, respectively. Additionally, the initial RPCA was 59.7 with one

contrast accounting for more than 5 percent of additional variance. Viewed cumulatively,

all of these findings suggest that the dichotomous SASSI-3, according to the Rasch

analysis, can be said to be a logical linear construct in which the items can be divided into

seven levels of difficulty and which discriminates five levels of differences among the

people. It is just under the required 60 percent cut off for explained variance for the

Rasch principle components. However, in an effort to determine whether improvements

could be made, the researcher conducted analyses of the dichotomous SASSI-3's scale's

response options, items, and underlying factor structure.
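The separation and reliability values reported here (and throughout this chapter) are algebraically linked in Rasch measurement: reliability R = G²/(1 + G²) for a separation index G, and the number of statistically distinct strata is commonly estimated as (4G + 1)/3. A brief Python sketch, using the initial Group 1 values above, shows how the reported reliability pairs and the seven item levels and five person levels follow from the separation indices:

```python
def reliability_from_separation(g: float) -> float:
    """Rasch separation-reliability identity: R = G^2 / (1 + G^2)."""
    return g * g / (1.0 + g * g)

def strata(g: float) -> float:
    """Statistically distinct levels spanned by the measures: H = (4G + 1) / 3."""
    return (4.0 * g + 1.0) / 3.0

person_sep, item_sep = 3.54, 5.60  # initial dichotomous SASSI-3, Group 1

print(round(reliability_from_separation(person_sep), 2))  # 0.93
print(round(reliability_from_separation(item_sep), 2))    # 0.97
print(round(strata(person_sep), 1))  # 5.1 -> about five person levels
print(round(strata(item_sep), 1))    # 7.8 -> about seven item levels
```

The same identity is a useful sanity check on transcribed values, since a given separation index admits only one reliability (for example, a separation of 7.78 cannot pair with a reliability of .89).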

The first step in a Rasch analysis was to evaluate the response scales. Because the

dichotomous SASSI-3 involves items from both the front and back of the instrument (i.e., both true/false and Likert-type response options), it is important to evaluate the response scales

for validity. Inspection of the probability curve and thresholds indicated that response

options 1-Once or Twice and 2-Several Times did not meet the standards for cutoffs for

the face valid response scales as identified earlier (see Figure 23).
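The two cutoffs used in this screen (stated in the note to Table 15) are that each category's probability curve should peak at .5 or above and that adjacent step calibrations should advance by at least 1.4 logits. A small Python sketch applies those rules to the values reported in Table 15 for the original 0,1,2,3 scale:

```python
def category_diagnostics(peaks, advances, min_peak=0.5, min_advance=1.4):
    """Flag rating-scale categories that fail the two screening rules used here:
    a modal probability curve (peak >= .5) and threshold advances >= 1.4 logits.
    Returns (categories with flat curves, advances that are too small)."""
    flat = [i for i, p in enumerate(peaks) if p < min_peak]
    narrow = [i for i, a in enumerate(advances, start=1) if a < min_advance]
    return flat, narrow

# Curve peaks for categories 0-3 and threshold advances 1-2 and 2-3 (Table 15)
print(category_diagnostics([0.95, 0.20, 0.30, 0.95], [9.29, 3.61]))  # ([1, 2], [])
```

Both middle categories fail the modal-peak rule, which is what motivates the collapsing described next; note that even the retained 0,1,1,2 recode leaves the merged category peaking at .40, slightly under the .5 guideline.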

Figure 23

Response Option 0123 Dichotomous SASSI-3 Group 1

[Winsteps output: 174 persons and 71 items measured, 8 categories; summary of category structure (Model "R") for grouping "B," items 46-71. Structure calibrations for response options 0-3 were NONE, 4.99, -4.30, and -.69, with category measures of (-14.07), -4.08, 3.50, and (14.72). The disordered calibrations, and probability curves in which categories 1 and 2 never become modal, indicate that the two middle response options were not functioning as intended.]
As the purpose of this exploration was to investigate the effectiveness of the dichotomous

SASSI-3 as a unidimensional instrument, it was important to maintain the continuity of

the response scales of the face valid scales and not separate the FVA and FVOD scales.

Therefore, employing the same collapsing strategy for both the FVA and FVOD response scales resulted in an increase in the item separation and reliability to 5.72/.97 (see

Table 15). No additional examination was warranted because the response options for the

other scales are dichotomous (true and false).
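The winning 0,1,1,2 strategy is a simple recode of the raw four-point face valid responses. A minimal sketch (the text names only the two middle options; the labels assumed for options 0 and 3 are illustrative):

```python
def collapse_0112(response: int) -> int:
    """Merge the two middle face valid options: 0 stays 0,
    1 (Once or Twice) and 2 (Several Times) both become 1, and 3 becomes 2."""
    recode = {0: 0, 1: 1, 2: 1, 3: 2}
    return recode[response]

raw = [0, 1, 2, 3, 2, 0]  # hypothetical responses to one FVA/FVOD item
print([collapse_0112(r) for r in raw])  # [0, 1, 1, 2, 1, 0]
```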

Table 15

Summary of Collapsing Strategy for Dichotomous Group 1 Face Valid Response Options

Rating Scale   Probability Curve¹                Threshold²                     PS&R       IS&R       RPCA
0,1,2,3        0=0.95, 1=0.20, 2=0.30, 3=0.95    0-1=N/A, 1-2=9.29, 2-3=3.61    3.54/.93   5.60/.97   59.7%
0,1,1,2        0=0.95, 1=0.40, 2=0.95            0-1=N/A, 1-2=6.04              3.35/.92   5.72/.97   49.3%
0,0,1,2        0=0.95, 1=0.20, 2=0.95            0-1=N/A, 1-2=11.44             3.06/.90   5.75/.97   58.1%
0,1,2,2        0=0.95, 1=0.20, 2=0.95            0-1=N/A, 1-2=14.49             3.43/.92   5.43/.97   53.1%

Note. ¹ ≥ .5 is acceptable. ² ≥ 1.4 is acceptable. PS&R = Person Separation & Reliability. IS&R = Item Separation & Reliability. RPCA = Rasch principal components analysis.

Figure 24 depicts the corrected response scale in which the middle two response

options 1-Once or Twice and 2-Several Times were combined.

Figure 24

Corrected Response Option Curve 0112 Dichotomous SASSI-3 Group 1


[Winsteps output: 174 persons and 71 items measured, 7 categories after collapsing; grouping "B," items 46-71. Structure calibrations for the recoded options 0, 1, and 2 were NONE, -3.02, and 3.02, with category measures of (-16.80), .00, and (16.80); the calibrations are now ordered, advancing by well more than the 1.4-logit criterion.]

Step two of the Rasch instrument validation analysis involved reviewing the

person and item fit. Further evaluation was conducted for the dichotomous SASSI-3 to

determine whether the separation and reliability results could be improved. Inspection of

the items and persons led to a final iterative elimination of 25 people whose responses were inconsistent and eighteen items that failed to meet the standards set forth for item fit. This resulted in final person and item separations and reliabilities of 3.32/.92 and 5.50/.97, respectively, for the dichotomous SASSI-3. This suggests that the resulting

items formed a well defined linear construct that does a good job in measuring the

people.
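The iterative elimination described above can be sketched as a repeated screen on infit and outfit mean-squares. The 0.5-1.5 cutoffs below are illustrative stand-ins for the standards set out earlier in the dissertation, and item Q99 is hypothetical (the Q17, Q35, and Q49 mean-squares are taken from Table 16):

```python
def misfit_screen(infit, outfit, lo=0.5, hi=1.5):
    """One pass of a misfit screen: keep units (items or persons) whose infit
    and outfit mean-squares both fall inside [lo, hi]; flag the rest. In a
    real Rasch run this is iterated, re-estimating measures between passes."""
    kept, dropped = [], []
    for name in infit:
        if lo <= infit[name] <= hi and lo <= outfit[name] <= hi:
            kept.append(name)
        else:
            dropped.append(name)
    return kept, dropped

infit = {"Q17": 1.06, "Q35": 1.10, "Q49": 0.81, "Q99": 1.80}
outfit = {"Q17": 1.02, "Q35": 2.11, "Q49": 0.70, "Q99": 2.40}
print(misfit_screen(infit, outfit))  # (['Q17', 'Q49'], ['Q35', 'Q99'])
```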

The third step in the analysis of the dichotomous SASSI-3 involved a review of

the person-item map to explore the extrapolated construct. The resulting hierarchy of items, ordered from most difficult to endorse to least difficult to endorse, is available for visual review in Table 19. Items aligned at the same position on the variable may imply redundancy. Therefore, the item fit statistics for each pair of aligned items were reviewed to identify which item fit best, and the worse fitting item of each pair was eliminated. In addition, because the items from the face valid scales were found to discriminate differences among people, these items were analyzed last for elimination, as they seem to contribute the most to the effectiveness of the instrument. Table 16 lays out the pairs of items and their fit statistics.

Table 16

Dichotomous SASSI-3 Group 1 Paired Aligned Items Fit Statistics

              Infit               Outfit
Items      MNSQ    ZSTD       MNSQ    ZSTD
Q17*       1.06     .40       1.02     .20
Q18        1.08     .60        .75    -.80
Q35*       1.10    1.0        2.11    3.8
Q52        1.08     .8         .99     .0
Q42*        .92   -1.0         .86   -1.0
Q49         .81   -2.4         .70   -2.2
Q29*       1.04     .5        1.06     .5
Q4         1.06     .7        1.04     .3
Q40         .95    -.6         .90
Q48*       1.20    2.4        1.27    1.9

Note. * = better fitting item of the pair. Items with no * were eliminated. MNSQ = mean-square. ZSTD = z-standardized.
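The choice between two items aligned at the same difficulty can be sketched as a distance-from-expectation rule on the mean-squares; this is a simplification, since the dissertation's choice also weighed the ZSTD values. Using the Q35/Q52 fit values from Table 16:

```python
def better_fitting(pair):
    """For two items aligned at the same difficulty, keep the one whose
    mean-squares sit closest to the Rasch model expectation of 1.0."""
    def badness(entry):
        _, infit_mnsq, outfit_mnsq = entry
        return abs(infit_mnsq - 1.0) + abs(outfit_mnsq - 1.0)
    ranked = sorted(pair, key=badness)
    return ranked[0][0], ranked[1][0]  # (keep, drop)

# (item, infit MNSQ, outfit MNSQ) for the Q35/Q52 pair from Table 16
print(better_fitting([("Q35", 1.10, 2.11), ("Q52", 1.08, 0.99)]))  # ('Q52', 'Q35')
```

Under this rule Q52 is retained and Q35 dropped, consistent with the eliminations reported in the text.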

The elimination of items Q17, Q35, Q42, Q29, and Q48 resulted in an increase in person

and item separation and reliability findings of 3.21/.91 and 5.62/.97, respectively, and an RPCA of 97.8 percent. The above process was repeated until no further improvements
were made in person and item separation and reliabilities. Finally, with the elimination of

29 misfitting items and 25 misfitting people, the resulting 42-item dichotomous SASSI-3

scale had a person and item separation and reliability finding of 3.06/.90 and 5.63/.97,

respectively (see Figure 25 for the item map).

Figure 25

Item Map Dichotomous SASSI-3 Group 1


[Person-item map: 149 persons and 42 items measured. Items ranged from the most difficult to endorse (Q32 and FVA12-commit suicide) at the top of the map to the least difficult (Q27 and Q22) at the bottom; the full hierarchy of item-endorsement difficulty is listed in Table 19.]
The final RPCA indicated that 81 percent of the total variance was explained by

the remaining 42 items on the dichotomous SASSI-3. This demonstrated improvement in

this scale by adjusting the response scale and eliminating misfitting people and items. In

addition, the item/person map means and standard deviations were separated by nearly

one standard deviation, indicating that, on average, the items were more difficult to endorse than the persons were willing to agree to them.

In the final step of the analysis, the extrapolated variable was compared using the

data from a second comparable group using the same process. The dichotomous SASSI-3,

using Group 2 data, demonstrated similar person and item separation and reliability

findings. As with Group 1, the response options were not being used as intended (see

Figure 26).

Figure 26

Response Option 0123 Dichotomous SASSI-3 Group 2

[Winsteps output: 173 persons and 71 items measured, 8 categories; grouping "B," items 46-71. Structure calibrations for response options 0-3 were NONE, 4.33, -5.31, and .98, with category measures of (-14.63), -4.37, 3.58, and (15.58). As in Group 1, the disordered calibrations and non-modal middle categories indicate a malfunctioning four-point scale.]

A collapsing strategy was developed by reviewing the thresholds and probability curves

(see Table 17). This strategy led the researcher to combine the two middle response

options: 1-Once or Twice and 2-Several times. This combination allowed for a better

functioning response scale and an increase in the item separation and reliability findings.

Table 17

Summary of Collapsing Strategy for Dichotomous Group 2 Face Valid Response Options

Rating Scale   Probability Curve¹                Threshold²                     PS&R       IS&R       RPCA
0,1,2,3        0=0.95, 1=0.20, 2=0.30, 3=0.95    0-1=N/A, 1-2=9.64, 2-3=6.29    3.53/.93   6.15/.97   62.2%
0,1,1,2        0=0.95, 1=0.45, 2=0.95            0-1=N/A, 1-2=8.32              3.39/.92   6.22/.97   56.7%
0,0,1,2        0=0.95, 1=0.25, 2=0.95            0-1=N/A, 1-2=8.66              3.11/.91   6.22/.97   61.4%
0,1,2,2        0=0.95, 1=0.20, 2=0.95            0-1=N/A, 1-2=14.44             3.42/.92   5.96/.97   56.7%

Note. ¹ ≥ .5 is acceptable. ² ≥ 1.4 is acceptable. PS&R = Person Separation & Reliability. IS&R = Item Separation & Reliability. RPCA = Rasch principal components analysis.

The corrected response option curves for Group 2 are presented in Figure 27.

Figure 27

Corrected Response Option Curve 0112 Dichotomous SASSI-3 Group 2


[Winsteps output: 173 persons and 71 items measured, 7 categories after collapsing; grouping "B," items 46-71. Structure calibrations for the recoded options 0, 1, and 2 were NONE, -4.16, and 4.16, with category measures of (-17.51), .00, and (17.51); the calibrations are ordered, advancing by well more than the 1.4-logit criterion.]

There were 35 misfitting people and 18 items that failed to meet the standards for

item fit. The elimination of these people and items increased the final person and item

separation and reliability results to 3.45/.92 and 6.11/.97, respectively. The item map

identified five pairs of aligned items. These items were considered for elimination. See

Table 18 for details.

Table 18

Dichotomous SASSI-3 Group 2 Paired Aligned Items Fit Statistics

              Infit               Outfit
Items      MNSQ    ZSTD       MNSQ    ZSTD
Q54         .62   -3.4         .45   -2.8
Q59*        .62   -3.4         .45   -2.8
Q35        1.36    2.7        1.92    3.3
Q50*       1.37    2.8        1.54    2.1
Q29*        .96    -.5         .97    -.1
Q49         .96   -1.2        2.50    5.9
Q56*        .96   -1.0        1.01     .1
Q7          .99    -.1         .94    -.3
Q42         .78   -2.9         .76   -1.3
Q48*       1.07     .8        1.19    1.0
Q22        1.16    1.0        2.20    2.3
Q28*       1.32    1.7        1.55    1.2

Note. * = better fitting item of the pair. Items with no * were eliminated.

While Group 2's final RPCA for the scale increased to 85.5 percent of the variance being accounted for by the items, the person and item separation and reliability findings decreased to 3.17/.91 and 5.78/.97, respectively. Therefore, these items remained in the instrument. As was presented for the dichotomous SASSI-3 for Group 1, the hierarchy of item-endorsement difficulty for Group 2 is presented in Figure 28.

Figure 28

Item Map Dichotomous SASSI-3 Group 2


[Person-item map: 139 persons and 53 items measured. Items ranged from the most difficult to endorse (FVA12-commit suicide and Q32) at the top of the map to the least difficult (Q22 and Q28) at the bottom; the full hierarchy of item-endorsement difficulty is listed in Table 19.]

A side-by-side comparison of the Groups' respective item-endorsement difficulty

indicated that the scale maintained some of its consistency on the hierarchy of item

difficulty across groups (see Table 19). This means that the dichotomous SASSI-3's

items can be divided into eight levels of difficulty with high (.97) reliability. Further,

these items distinguish four levels of differences among the groups with high (.91)

reliability.

Table 19

Dichotomous SASSI-3 Test of Independence

Item Hierarchy

Group 1                                     Group 2

Difficult to endorse
Q32      Break more laws                    FVA12    Suicide
FVA12    Suicide                            Q32      Break more laws
FVA9     Drinking effects                   FVA9     Drinking effects
FVOD9    Tried to talk a Dr. into it        FVA11    Shakes after sober
FVOD7    Legal trouble                      FVOD9    Tried to talk a Dr. into it
FVA10    Relationship problems              FVOD7    Legal trouble
FVA11    Shakes after sobering              Q61      Antacid
FVA6     Work/school problems               FVOD3    More aware
FVOD12   Avoid withdrawal                   Q23      Clever crooks
FVOD3    More aware                         Q55      Morning drinks
Q55      Morning drinks                     FVOD12   Avoid withdrawal
FVA8     Argued                             Q17      Respectful
FVOD14   Treatment programs                 FVA10    Relationship problems
FVOD4    Improve sex                        FVOD14   Treatment program
FVA3     For energy                         FVOD4    Improve sex
FVA7     Depressed after sober              Q18      Obey the law
FVOD8    Really stoned                      FVA6     Work/school problems
FVOD5    Forget helplessness                FVOD5    Forget helplessness
Q58      Get in trouble                     FVOD8    Really stoned
FVOD1    Improve thinking                   FVOD13   Keep from life
FVOD10   Drug-related activities            FVA8     Argued w/others
FVOD13   Keep from life                     FVOD1    Improved thinking
Q53      Responsibilities                   FVOD10   Drug-related activities
FVA1     Midday drinks                      FVA3     For energy
FVA2     Express feelings                   FVA7     Depressed after sober
FVOD6    Forget pressures                   FVOD2    Feel better
Q50      Full of energy                     FVOD6    Forget pressures
FVOD2    Feel better                        Q58      Get in trouble
Q20      Disapproving looks                 Q20      Disapproving looks
FVOD11   Drink & drug together              Q54      Neglected obligations
Q59      Family problems                    Q59      Family problems
Q39      Never broken laws                  FVA2     Express feelings
FVA5     Physical problems                  FVOD11   Drink & drug together
Q54      Neglected obligations              Q35      Memory problems
Q56      Teenage use                        Q50      Full of energy
FVA4     More than intended                 Q52      Resentful
Q7       Not lived                          Q39      Never broken laws
Q29      Control myself                     FVA5     Physical problems
Q46      Undesirable types                  Q19      Tempted to leave
Q13      Worn out                           Q29      Control myself
Q27      Too much                           Q49      Cigarettes
Q22      Avoided people                     Q56      Teenage use
                                            Q7       Not lived
                                            Q42      Too often
                                            Q48      Rarely punished
                                            Q60      Away from home
                                            FVA4     More than intended
                                            Q4       Police trouble
                                            Q40      Couldn't remember
                                            Q44      Who is to blame
                                            Q27      Too much
                                            Q22      Avoided people
                                            Q28      Uninteresting life
Easy to endorse

Eliminated items:
Q1       Lie                                Q1       Lie
Q4       Police trouble                     Q5       Well behaved
Q5       Well behaved                       Q6       Not my fault
Q6       Not my fault                       Q8       Friendly
Q8       Friendly                           Q9       Daydream
Q9       Daydream                           Q11      Sitting still
Q11      Sitting still                      Q13      Worn out
Q16      Wasn't up to it                    Q16      Wasn't up to it
Q17      Respectful                         Q25      Dangerous
Q18      Obey the law                       Q31      No good
Q19      Tempted to leave                   Q33      Take the blame
Q23      Clever crooks                      Q41      Think carefully
Q25      Dangerous                          Q46      Undesirable types
Q28      Uninteresting                      Q51      Sat about
Q31      No good                            Q53      Responsibilities
Q33      Take the blame                     Q64      Happy
Q35      Memory problems                    Q65      Restless
Q40      Couldn't remember                  FVA1     Midday drinks
Q41      Think carefully
Q42      Too often
Q44      Who is to blame
Q48      Rarely punished
Q49      Cigarettes
Q51      Sat about
Q52      Resentful
Q60      Away from home
Q61      Antacid
Q64      Happy
Q65      Restless

Review of Research Hypotheses 1-4

The RPCA analyses indicated that the following scales were unidimensional in

structure (i.e., each scale accounted for equal to or greater than 60 percent of the scale's

total variance): FVA, FVOD, SYM, OAT, SAT, DEF, FAM, COR, and the dichotomous

SASSI-3 scale. The SAM and RAP scales' RPCAs failed to meet the minimum 60

percent cutoff. Therefore, the researcher rejected Research Hypothesis 1.

The following scales' item fit produced infit and outfit statistics indicative of low

item error: FVA, FVOD, SAM, SYM, OAT, SAT, DEF, FAM, COR, and the

dichotomous SASSI-3 scale. The RAP scale's items did not meet the acceptable

standards for item fit. Therefore, the researcher rejected Research Hypothesis 2.

The following scales' reliability statistics demonstrated acceptable levels of

internal consistency: FVA, FVOD, OAT, SAT, SAM, DEF, FAM, COR, and the

dichotomous SASSI-3 scale. The RAP scale did not produce acceptable reliability

statistics for internal consistency. Therefore, the researcher rejected Research Hypothesis

3a.

The following scales remained reliably defined across samples: FVA, FVOD,

SYM, OAT, SAT, SAM, DEF, FAM, COR, RAP, and dichotomous SASSI-3. Therefore,

the researcher failed to reject Research Hypothesis 3b.

The following scales demonstrated high discriminatory ability: FVA, FVOD, and

the dichotomous SASSI-3 scale. The SYM, OAT, SAT, SAM, DEF, FAM, COR, and

RAP did not demonstrate discriminatory ability. Therefore, the researcher rejected

Research Hypothesis 4.

Whole SASSI-3

The SASSI-3 has a total of 93 items. Eleven of the SASSI-3's 93 items are not

used on any of the ten scales. Twenty-six of these 93 load on more than one scale. While

the 26 shared items each have dichotomous response options, nine are true on at least one

of the scales and false on another (see Table 20). Items that do not fall in the same

direction or cannot be coded as such are deemed to be misfitting. While there is a key

indicating the expected or "correct" response as identified by the authors of the SASSI-3,

twenty items either have opposite correct answers on two different scales or have no

correct answer listed. In addition, this item sharing creates interdependence and artificially high intercorrelations among the scales. Because of this interdependence, it was expected that many of these items would appear redundant or would misfit.

Table 20

Multivocal Items and Items on No Scale

Item Number   Item                         Scale
Q1            Lie                          DEF, FAM, COR
Q4            Never trouble for police     OAT, SAM
Q7            Not lived                    OAT, SAM, COR
Q8*           Friendly                     DEF, FAM
Q9            Daydream                     DEF, SAM, FAM
Q12           Take my advice               No scale
Q14           Enjoy moving                 No scale
Q15           Not to talk                  No scale
Q16           Wasn't up to it              DEF, SAM
Q18           Obey the law                 SAT, COR
Q21           Others can't handle it       No scale
Q23*          Clever crooks                OAT, FAM
Q25           Dangerous                    DEF, FAM
Q26           Bored                        No scale
Q27           Too much                     SYM, FAM
Q31*          No good                      DEF, COR
Q32           Break more laws              SAT, COR
Q34           Crying                       No scale
Q37           Successes                    No scale
Q39*          Never broken laws            OAT, SAM, FAM, COR
Q40           Couldn't remember            SYM, SAM, COR
Q41*          Think carefully              DEF, COR
Q42           Picked on                    SYM, SAM, COR
Q45           Make lists                   No scale
Q48           Rarely punished              OAT, SAM, COR
Q50*          Full of energy               SAT, FAM
Q51           Sat about                    DEF, SAM
Q54*          Neglected obligations        SYM, SAM, FAM, COR
Q55*          Morning drinks               SYM, FAM, COR
Q57           Dad was a drinker            No scale
Q61*          Antacid                      SAT, DEF, FAM
Q62           Never sad                    No scale
Q65           Restless                     DEF, FAM
Q66           Spontaneous                  No scale

Note. * = item's keyed response is true on at least one scale and false on another.
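The multivocal pattern flagged in Table 20 (an item keyed true on one scale and false on another) can be detected mechanically from an item-to-key map. The keyed directions below are illustrative placeholders, not the SASSI-3's actual scoring key:

```python
# Scale memberships with a keyed (True/False) direction per scale, for a
# hypothetical subset of the shared items in Table 20.
item_keys = {
    "Q8":  {"DEF": True, "FAM": False},
    "Q39": {"OAT": True, "SAM": True, "FAM": False, "COR": True},
    "Q16": {"DEF": True, "SAM": True},
}

def multivocal(items):
    """Return items keyed True on one scale and False on another --
    the pattern the text flags as a source of interdependence and misfit."""
    return [name for name, keys in items.items() if len(set(keys.values())) > 1]

print(multivocal(item_keys))  # ['Q8', 'Q39']
```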

The initial person and item separation and reliabilities for the SASSI-3 were

3.29/.92 and 5.73/.97, respectively. Additionally, the initial RPCA was 54.6 percent with

only one contrast that accounted for more than 5 percent of additional variance.

Combined, these findings suggest that the SASSI-3 is a logical linear construct that meets

the minimum standards of distinguishing differences among the sample. However, the

instrument is multidimensional and accounts for less than the accepted level of 60 percent

of the total variance. This suggests that the SASSI-3 is measuring more than just one

construct; whatever additional constructs are being measured account for approximately 40 percent of the SASSI-3's variance. As such, the researcher attempted to

determine whether augmentations could be made that would render the SASSI-3 a

unidimensional instrument. Because the SASSI-3 involves items from both the front and

back of the instrument (i.e., both true/false and Likert-type response options), it is important to

evaluate the response scales for validity. Inspection of the probability curves and

thresholds indicated that response options 1-Once or Twice and 2-Several Times did not meet the standards for cutoffs for the face valid response scales as identified earlier (see Figure 29).

Figure 29

Response Options Curves 0123 Whole SASSI-3 Group 1

[Winsteps output: 174 persons and 93 items measured, 8 categories; grouping "B," items 68-93. Structure calibrations for response options 0-3 were NONE, 6.11, -4.32, and -1.79, with category measures of (-13.54), -3.91, 3.38, and (14.09). The disordered calibrations and non-modal middle categories again indicate a malfunctioning four-point scale.]

Table 21

Summary of Collapsing Strategy for Whole SASSI-3 Face Valid Response Options

Rating Scale   Probability Curve¹                Threshold²                      PS&R       IS&R       RPCA
0,1,2,3        0=0.95, 1=0.20, 2=0.25, 3=0.95    0-1=N/A, 1-2=10.43, 2-3=2.44    3.29/.92   5.72/.97   77.8%
0,1,1,2        0=0.95, 1=0.40, 2=0.95            0-1=N/A, 1-2=3.84               3.00/.90   5.74/.97   49.7%
0,0,1,2        0=0.95, 1=0.20, 2=0.95            0-1=N/A, 1-2=13.48              2.78/.89   5.71/.97   53.0%
0,1,2,2        0=0.95, 1=0.20, 2=0.95            0-1=N/A, 1-2=16.54              3.12/.91   5.70/.97   52.1%

Note. ¹ ≥ .5 is acceptable. ² ≥ 1.4 is acceptable. PS&R = Person Separation & Reliability. IS&R = Item Separation & Reliability. RPCA = Rasch principal components analysis.

Therefore, applying the same collapsing strategy to the face valid response scales increased the item separation and reliability to 5.74/.97 (see Figure 30). As stated above, because the response options for the other scales are dichotomous (true and false), no additional examination was warranted.

Figure 30

Corrected Response Options Curve 0112 Whole SASSI-3 Group 1

[Winsteps output: 174 persons and 93 items measured, 7 categories after collapsing; grouping "B," items 68-93. Structure calibrations for the recoded options 0, 1, and 2 were NONE, -1.92, and 1.92, with category measures of (-16.15), .00, and (16.15); the calibrations are ordered and advance by more than the 1.4-logit criterion.]

Step two of the Rasch instrument validation analysis involved reviewing the

person and item fit. Inspection of the items and persons led to a final iterative elimination of 22 people and 20 items that failed to meet the standards set forth for item fit. This resulted in final person and item separations and reliabilities of 3.82/.94 for persons and 5.63/.97 for items on the SASSI-3, with an RPCA of 69.2 percent. This suggests

that the remaining items form a well defined linear construct that can be divided into

seven levels of difficulty, and it does a reliable (.97) job in discriminating five different

groups among the people from low to high agreeability on the hierarchy of items.

The third step in the analysis involved a review of the person-item map to explore

the extrapolated construct. The resulting hierarchy produced a pattern of items ranging

from most difficult to endorse to least difficult to endorse (see Table 22). No items shared the same position on the scale. Group 1's final RPCA indicated that 69.2 percent of the

total variance was explained by the SASSI-3's remaining 73 items. This improvement

was achieved by adjusting the response scale and eliminating misfitting people and items.

In addition, the item/person map means and standard deviations were separated by nearly

one standard deviation (Figure 31). This separation indicates that the items were more

difficult to endorse than the people were able to agree to them.

Figure 31

Item Map Whole SASSI-3 Group I

[Person-item map: 152 persons and 73 items measured. Items ranged from the most difficult to endorse (Q10-criticized, Q2-make mistakes, Q30-confused, Q43-picked on, and Q62-sad) at the top of the map to the least difficult (Q27 and Q14-moving) at the bottom; the full hierarchy of item-endorsement difficulty is listed in Table 22.]

In the final step of the analysis, the extrapolated variable was compared against the

data from a second, comparable group using the same process. The SASSI-3, using Group

2 data, demonstrated similar person and item separation and reliability findings, 3.57/.93

and 5.26/.97, respectively, and an RPCA of 60.9 percent. As with Group 1, the response

options were not being used as intended (Figure 32).

Figure 32

Response Options Curves 0123 Whole SASSI-3 Group 2

INPUT: 141 Persons 93 Items MEASURED: 141 Persons 93 Items 8 CATS

SUMMARY OF CATEGORY STRUCTURE (Model = "R"; grouping "B", item numbers 68-93):

Category | Observed Count | % | Observed Average | Structure Calibration | Category Measure
0 | 2126 | 58 | -12.76 | NONE | (-14.30)
1 | 499 | 14 | -5.94 | 4.84 | -4.22
2 | 507 | 14 | -2.07 | -5.00 | 3.52
3 | 499 | 14 | 3.82 | .16 | (15.12)
Missing | 35 | 1 | -4.39 | |

Note. Observed average is the mean of measures in the category; it is not a parameter estimate.

[Category probability curve plot omitted due to OCR damage. The curves for the middle categories, 1 and 2, remain low across the measure while categories 0 and 3 dominate the extremes, and the structure calibrations are disordered.]
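The category probability curves summarized in Figure 32 follow from the Andrich rating scale model, in which the probability of category k is proportional to exp of the sum of (b - d - tau_j) over thresholds j up to k. The sketch below is illustrative only (the study's analyses were run in Winsteps, not this script), and it borrows the Figure 32 structure calibrations purely as example threshold values:

```python
import math

def category_probs(b_minus_d, taus):
    """Andrich rating scale model: probability of each response category,
    given the person-minus-item measure (b - d) and the Andrich thresholds.
    P(k) is proportional to exp(sum_{j<=k} (b - d - tau_j)), with P(0) ~ exp(0)."""
    logits = [0.0]
    running = 0.0
    for tau in taus:
        running += b_minus_d - tau
        logits.append(running)
    expvals = [math.exp(v) for v in logits]
    total = sum(expvals)
    return [v / total for v in expvals]

# Disordered example thresholds (taken from Figure 32's structure
# calibrations, used here only for illustration): with disordered
# thresholds, a middle category can have near-zero probability everywhere.
probs = category_probs(0.0, [4.84, -5.00, 0.16])
print([round(p, 3) for p in probs])  # category 1's probability is tiny
```

With ordered, well-spaced thresholds, each category's curve would instead peak in its own region of the measure.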

The researcher developed a collapsing strategy after reviewing the thresholds and

probability curves. The two middle response options, 1 (Once or Twice) and 2 (Several

Times), were combined (Figure 33).

Figure 33

Corrected Response Options Curve 0112 Whole SASSI-3 Group 2

INPUT: 141 Persons 93 Items MEASURED: 141 Persons 93 Items 7 CATS

SUMMARY OF CATEGORY STRUCTURE (Model = "R"; grouping "B", item numbers 68-93):

Category | Observed Count | % | Observed Average | Structure Calibration | Category Measure
0 | 2126 | 58 | -15.83 | NONE | (-17.07)
1 | 1006 | 27 | -5.85 | -3.46 | .00
2 | 499 | 14 | 2.79 | 3.46 | (17.07)
Missing | 35 | 1 | -6.73 | |

Note. Observed average is the mean of measures in the category; it is not a parameter estimate.

[Category probability curve plot omitted due to OCR damage. After collapsing, each category's curve peaks in its own region of the measure and the structure calibrations (-3.46 and 3.46) are ordered.]

This allowed for a better-functioning response scale and an increase in the person and

item separation and reliability findings (see Table 21).
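The recoding behind this kind of collapse is mechanical. As a minimal sketch (hypothetical code, not the study's Winsteps setup), collapsing the two middle options of a 0-3 scale yields the 0,1,1,2 coding examined below:

```python
# Collapse the two middle response options (1 and 2) of a 0-3 Likert-type
# scale into a single category, producing the 0,1,1,2 coding. The study's
# actual recoding was done in Rasch software (Winsteps), not this script.
COLLAPSE_0112 = {0: 0, 1: 1, 2: 1, 3: 2}

def collapse(responses, mapping=COLLAPSE_0112):
    """Recode a list of raw category codes; None marks missing data."""
    return [None if r is None else mapping[r] for r in responses]

raw = [0, 1, 2, 3, 2, None, 0]
print(collapse(raw))  # [0, 1, 1, 2, 1, None, 0]
```

The other strategies in Table 21 (0,0,1,2 and 0,1,2,2) correspond to alternative mapping dictionaries.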

Table 21

Summary of Collapsing Strategy for Whole SASSI-3 Group 2 Face Valid Response Options

Rating Scale | Probability Curve(1) | Threshold(2) | PS&R | IS&R | RPCA
0,1,2,3 | 0 = 0.95; 1 = 0.20; 2 = 0.30; 3 = 0.95 | 0-1 = N/A; 1-2 = 9.84; 2-3 = 5.16 | 3.57/.93 | 5.26/.97 | 60.9%
0,1,1,2 | 0 = 0.95; 1 = 0.40; 2 = 0.95 | 0-1 = N/A; 1-2 = 6.92 | 3.32/.92 | 5.25/.97 | 58%
0,0,1,2 | 0 = 0.95; 1 = 0.25; 2 = 0.95 | 0-1 = N/A; 1-2 = 9.54 | 3.11/.91 | 5.23/.96 | 59.9%
0,1,2,2 | 0 = 0.95; 1 = 0.20; 2 = 0.95 | 0-1 = N/A; 1-2 = 15.34 | 3.37/.92 | 5.25/.97 | 60.6%

Note. (1) >= .5 is acceptable. (2) >= 1.4 is acceptable. PS&R = Person Separation & Reliability. IS&R = Item Separation & Reliability. RPCA = Rasch principal components analysis.

After the iterative elimination of the 13 misfitting people and the 22 items that

failed to meet the standards for fit, Group 2's final person and item separation and

reliability findings improved to 4.12/.94 and 5.10/.96, respectively. The final RPCA for

the scale was also comparable at 74.8 percent of the variance being accounted for by the

items. This means that the items can be divided into seven levels of difficulty that

discriminate among five groups of people in the sample, with high reliability

(.94 and .96, respectively). As was reported for the SASSI-3 for Group 1, the hierarchy of

item-endorsement difficulty for Group 2 is also presented in Table 22. A side-by-side

comparison of the two Groups' item-endorsement difficulties indicated that the scale

maintained some of its consistency in the hierarchy across groups. Of the 20 items

deleted from Group 1's hierarchy, all but one were also deleted from Group 2's

hierarchy.
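The reported "levels of difficulty" and "groups of people" follow from standard Rasch separation arithmetic: reliability R = G^2 / (1 + G^2) and the number of statistically distinct strata H = (4G + 1) / 3, where G is the separation index. A quick sketch using the Group 2 final values reported above:

```python
import math

def reliability(G):
    """Rasch separation index G -> reliability, R = G^2 / (1 + G^2)."""
    return G * G / (1 + G * G)

def strata(G):
    """Number of statistically distinct levels, H = (4G + 1) / 3."""
    return (4 * G + 1) / 3

# Group 2 final person and item values reported above: 4.12/.94 and 5.10/.96.
print(round(reliability(4.12), 2))  # 0.94
print(round(reliability(5.10), 2))  # 0.96
print(math.floor(strata(5.10)))     # 7 distinct levels of item difficulty
print(math.floor(strata(4.12)))     # 5 distinguishable groups of people
```

This is how separation indices of 4.12 and 5.10 translate into the five person groups and seven item-difficulty levels described in the text.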

Table 22

Whole SASSI-3 Test of Independence: Item Hierarchy

Items are listed from most difficult to endorse (top) to least difficult to endorse (bottom).

Group 1 | Group 2
Q10 Criticized | Q43 Picked on
Q2 Make mistakes | Q47 Laugh at jokes
Q30 Get confused | Q10 Criticized
Q43 Picked on | FVA12 Suicide
Q62 Never sad | Q32 Break more laws
Q47 Laugh at jokes | Q61 Antacid
FVA12 Suicide | FVA11 Shakes after sobering
Q32 Break more laws | FVOD9 Tried to talk a Dr. into it
FVA9 Drinking effects | Q23 Clever crooks
FVOD9 Tried to talk a Dr. into it | FVA9 Drinking effects
Q23 Clever crooks | Q15 Better to not talk
FVA11 Shakes after sobering | Q18 Obey the law
Q15 Better not to talk | Q34 Crying
Q18 Obey the law | Q37 Successes
Q55 Morning drinks | Q17 Respectful
FVA6 Work/home problems | Q55 Morning drinks
FVOD7 Trouble with the law | FVA10 Relationship problems
Q17 Respectful | FVOD12 Avoid withdrawal
FVA10 Relationship problems | FVOD3 Become more aware
FVOD12 Avoid withdrawal | Q31 No good
FVOD3 Become more aware | FVA6 Work/school trouble
Q37 Successes | FVOD4 Improve sex
FVOD14 Treatment program | FVOD7 Legal trouble
FVOD4 Improve sex | FVOD14 Treatment program
FVA3 For energy | FVA3 For energy
FVA8 Argued w/ family | FVA8 Argued w/ family
Q31 No good | FVOD8 Really stoned
Q53 Responsibilities | Q9 Daydream
FVA7 Depressed after sober | FVA7 Depressed after sober
FVOD8 Really stoned | FVOD10 Drug-related activities
Q58 Get in trouble | FVOD5 Forget helplessness
FVA1 Midday drinks | Q20 Disapproving looks
Q50 Full of energy | Q24 School problems
FVA2 Express feelings | Q53 Responsibilities
FVOD1 Improve thinking | FVA1 Midday drinks
FVOD13 Kept from life | FVOD1 Improve thinking
FVOD5 Forget helplessness | FVOD13 Kept from life
Q20 Disapproving looks | FVOD6 Forget pressure
Q24 School problems | Q50 Full of energy
Q52 Resentful | Q52 Resentful
Q67 Binge | Q58 Get in trouble
FVOD10 Drug-related activities | FVA2 Express feelings
FVOD6 Forget pressure | FVOD2 Feel better
Q57 Dad drinker | Q33 Take the blame
Q9 Daydream | Q67 Binge
FVOD11 Drink & drugs together | FVOD11 Drink & drug together
FVOD2 Feel better | FVA5 Physical problems
FVA5 Physical problems | Q57 Dad drinker
Q33 Take the blame | Q59 Family problems
Q59 Family problems | Q54 Neglected obligations
Q21 Others can't deal | Q63 Loss for words
Q56 Teenage use | Q19 Tempted to leave
Q19 Tempted to leave | Q21 Others can't deal
Q54 Neglected obligations | Q60 Away from home
Q42 Too often | FVA4 More than intended
Q49 Cigarettes | Q26 Get bored
Q63 Loss for words | Q48 Rarely punished
FVA4 More than intended | Q49 Cigarettes
Q60 Away from home | Q56 Teenage use
Q26 Bored | Q29 Control myself
Q40 Couldn't remember | Q46 Undesirable types
Q48 Rarely punished | Q7 Not lived
Q7 Not lived | Q4 Police trouble
Q29 Control myself | Q42 Too often
Q4 Police trouble | Q40 Couldn't remember
Q46 Undesirable types | Q11 Sitting still
Q11 Sitting still | Q66 Spontaneous
Q13 Worn out | Q14 Moving
Q6 Not my fault | Q13 Worn out
Q44 Who is to blame | Q44 Who is to blame
Q66 Spontaneous | Q27 Too much
Q27 Too much |
Q14 Enjoy moving |

Eliminated items:

Q1 Lie | Q1 Lie
Q3 Go along with | Q2 Make mistakes
Q5 Well behaved | Q3 Go along with
Q8 Friendly | Q5 Well behaved
Q12 Take my advice | Q6 Not my fault
Q16 Wasn't up to it | Q8 Friendly
Q22 Avoided people | Q12 Take my advice
Q25 Dangerous | Q16 Wasn't up to it
Q28 Uninteresting life | Q22 Avoided people
Q34 Crying | Q25 Dangerous
Q35 Memory problems | Q28 Uninteresting life
Q36 Tempted to hit | Q30 Get confused
Q38 Feel sure | Q35 Memory problems
Q39 Broken the law | Q36 Tempted to hit
Q41 Think carefully | Q38 Feel sure
Q45 Make lists | Q39 Never broken laws
Q51 Sat about | Q41 Think carefully
Q61 Antacid | Q45 Make lists
Q64 Happy | Q51 Sat about
Q65 Restless | Q62 Never sad
 | Q64 Happy
 | Q65 Restless

Review of Research Hypotheses 5-8

The whole SASSI-3's RPCA indicated that greater than 60 percent (74.8%) of the

variability was accounted for by the instrument. Based on this finding, the researcher failed

to reject Research Hypothesis 5.

The SASSI-3's remaining 63 items' infit and outfit statistics fall below the 2.0

z-standardized cutoff, and each item has a positive point-biserial correlation. Based on

this finding, the researcher failed to reject Research Hypothesis 6.

The SASSI-3 maintains its item consistency, as the items align in the same

general area on the variable across samples. Based on this finding, the researcher failed to

reject Research Hypothesis 7.

The SASSI-3 discriminates five different groups (person separation = 3.82) among the

sample with high reliability (.94). Based on this finding, the researcher failed to reject Research

Hypothesis 8.

Summary

This study had two general research questions. General Research Question 1 was

Does modern measurement methodology assist in the revalidation of the SASSI-3?

General Research Question 2 was Does modern measurement theory assist in improving

the SASSI-3 instrument holistically? Based on the results reported in this chapter, the

researcher answered both general research questions affirmatively. Generally, the evidence

supports that the face valid scales meet fundamental measurement properties and the

subtle scales do not. Additionally, the face valid scales improve the instrument when

combined with the subtle scales, but they perform best when used independently.

Chapter Five

Discussion

Substance abuse and dependency are expensive problems in the United States of

America that have a negative impact on its citizens (Substance Abuse and Mental

Health Services Administration [SAMHSA], 2008). Substance dependence is associated

with untimely deaths, loss in work productivity, reduction in days attended at school,

increased costs due to substance dependence associated medical care, and criminal

activity (SAMHSA, 2008). It is important for people who struggle with alcohol and drug

dependency to get proper diagnosis and treatment to help reduce and eliminate these

unwanted biopsychosocial consequences. Part of the diagnostic process can involve

mental health professionals' use of substance use screening instruments. The

effectiveness of substance dependence assessment and treatment is limited by the

accuracy of the tools used in formulating a diagnosis. As such, due to the clinical

implications of the assessment process, it is necessary that substance abuse screening

tools are psychometrically sound and accurately measure the behaviors they are designed

to measure—substance abuse.

A number of substance abuse screening instruments are available to assist in this

process. A study of masters addictions counselors revealed that there are four substance

abuse screens that these counselors most frequently select as aids in their diagnostic

processes (Juhnke, Vacc, Curtis, Coll, & Paredes, 2003). These four screens are the

Substance Abuse Subtle Screening Inventory-3 (SASSI-3; Miller & Lazowski, 1999), the

Michigan Alcoholism Screening Test (MAST; Selzer, 1971), the Minnesota Multiphasic

Personality Inventory-2's (MMPI-2; Butcher, Dahlstrom, Graham, Tellegen, &

Kaemmer, 1989) MacAndrew Scale-Revised (MAC-R; MacAndrew, 1965), and the

Addiction Severity Index (ASI; McLellan, Luborski, Cacciola, Griffith, McGranhan, &

O'Brien, 1992). Of these four, the Substance Abuse Subtle Screening Inventory-3 (Miller

& Lazowski, 1999) was identified by these counselors as being the most important

(Juhnke et al., 2003) for the following reasons: the SASSI-3, unlike the other three

identified screens, measures alcohol dependence as well as dependence on other drugs of

abuse; it provides several measures of response bias (e.g., defensiveness and random

answering); and it is scored and interpreted according to sex-specific national normative

data.

A robust but conflicting literature base has developed to address the SASSI-3's

psychometric characteristics. The results of these investigations show varying degrees

of agreement with what is published in the SASSI-3 Manual (Miller & Lazowski, 1999).

In fact, research conducted by investigators not associated with the SASSI-3's publishers

appears to question the SASSI-3's reliability and validity. Despite this well-developed

body of literature, nothing is known about the SASSI-3's alignment with the fundamental

principles of measurement (Thurstone, 1927). Psychometric concepts central to

Thurstonian principles of measurement are unidimensionality, linearity, invariance, and

independence. Unidimensionality refers to the degree to which an instrument is

evaluating just one construct (Bond & Fox, 2007). In this study, the construct purportedly

measured by the SASSI-3 is substance dependence (Miller & Lazowski, 1999). Linearity

refers to an ever increasing level of an instrument's items' difficulty (Bond & Fox). If an

instrument is linear, a hierarchy of items can be created according to level of difficulty.

Easier to answer items fall on one end of the spectrum and harder to answer items fall on

the opposite end of the spectrum. As applied to the measurement of substance

dependence, a disease theorized to be progressive in nature, easier to answer items about

one's substance use might include the following: "I can drink one or two drinks without

passing out." Most persons who consume alcohol could very likely answer that item

meaning that it is an easy item to answer. Conversely, a harder item to answer

affirmatively could be "I experience delirium tremens when I stop drinking." It is likely

that fewer persons' substance dependence has progressed to this level. Consequently, it is

harder for most people to answer this question affirmatively. Additionally, invariance

means that the items will be aligned on an equal interval. That is, for example, the

psychological distance between "never" and "sometimes" is the same as the psychological

distance between "sometimes" and "frequently." Finally, an instrument that is invariant will

demonstrate equal alignment of the items' response options, regardless of the sample in

which the construct is being measured.
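The linearity idea can be made concrete with the Rasch model's core equation: the probability that a person of measure B endorses an item of endorsement difficulty D is exp(B - D) / (1 + exp(B - D)). A small sketch, with invented logit values standing in for the two example items above:

```python
import math

def p_endorse(person_b, item_d):
    """Rasch dichotomous model: probability that a person of measure b
    endorses an item of difficulty d (both in logits)."""
    return math.exp(person_b - item_d) / (1 + math.exp(person_b - item_d))

# Invented logit values, for illustration only (not SASSI-3 estimates).
easy_item = -2.0   # e.g., "can drink one or two drinks without passing out"
hard_item = 3.0    # e.g., "experience delirium tremens when I stop drinking"
person = 0.0       # a person of average measure

print(round(p_endorse(person, easy_item), 2))  # 0.88 -> easy to endorse
print(round(p_endorse(person, hard_item), 2))  # 0.05 -> hard to endorse
```

Because endorsement probability depends only on the difference B - D, items can be placed on the same linear scale as persons, which is what the item hierarchies reported in Chapter Four represent.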

Despite its popularity among addictions counselors (Juhnke, Vacc, Curtis, Coll,

& Paredes, 2003) and its use in a wide range of settings, the SASSI-3's psychometric

properties have been found to differ (Arneth, Bogner, Corrigan & Schmidt, 2001;

Clements, 2002; Feldstein & Miller, 2007; Gray, 2001; Laux, Perea-Ditlz, Smirnoff &

Salyers, 2005; Laux, Salyers & Kotova, 2005; Lazowski, Miller, Boye, & Miller, 1998;

Peters et al., 2000; Svanum & McGrew, 1995), at times significantly, from those reported

in the Manual (Miller & Lazowski). These differences may be related to the traditional

methods of testing reliability and validity used by researchers. However, what is unclear

is whether the SASSI-3 meets the fundamental requirements of measurement.

Consequently, it is unclear whether or not the SASSI-3 is actually measuring what it

purports to measure. If there is doubt about what the SASSI-3 is measuring, then there is

also doubt about the implications of the diagnoses it informs and the subsequent

treatment recommendations that are prescribed based on these diagnoses.

Therefore, the purpose of the study was to investigate the SASSI-3's

psychometric alignment with the fundamental principles of measurement as represented

using the Rasch model (Rasch, 1960/1980). Specifically, this investigation focused on

the unidimensionality of the entire instrument and the individual scales. Additionally, it

evaluated the reliability of the rating scales by identifying whether the participants utilized

the scales as intended by the authors of the SASSI-3, and assessed the linearity,

invariance and independence of the instrument.

Summary of the Results

This study explored the measurement properties of the SASSI-3 in three parts.

The first part was to look at each scale individually. The SASSI-3 authors identified a

factor structure which resulted in ten scales (Miller & Lazowski, 1995). Those ten scales

included the Face Valid Alcohol (FVA), Face Valid Other Drug (FVOD), Obvious

Attributes (OAT), Subtle Attributes (SAT), Supplemental Addiction Measure (SAM),

Symptoms (SYM), Family versus Control (FAM), Defensiveness (DEF), Correctional

(COR), and the Random Answering Pattern (RAP) scales. The second part involved

exploring all the items together which contribute to the dichotomous decision of

likelihood of substance dependence or not. This included the face valid scales, the OAT,

SAT, SAM, SYM, and DEF scales only. The third part of the investigation involved the

exploration of the entire instrument including all 93 items. The following will summarize

these findings in the following order: each SASSI-3 scale, the dichotomous SASSI-3, and

the whole SASSI-3.

The SASSI-3 Scales

The FVA scale includes twelve items. Each item is accompanied by a four-point

Likert-type rating scale response option. The respondent is directed to identify the

number of times he or she has engaged in the particular behavior listed in the item. The

results of this investigation indicate that the FVA was unidimensional because its RPCA

was above 60 percent and it had no underlying contrasts. After adjusting the rating scale

for improved functioning and eliminating misfitting people, it was found that the FVA

scale's items could be divided into ten levels of difficulty. These 10 levels discriminated

between nearly four groups of people ranging from low to high agreeability on the items

with high reliability (.98).

The FVOD scale includes fourteen items. As does the FVA, each of the FVOD's

items is accompanied by a four-point Likert-type rating scale response option. The

respondent is requested to identify the number of times he or she has engaged in the

particular behavior listed in the item. The FVOD was unidimensional because its RPCA

was above 60 percent and it had no underlying contrasts. After adjusting the rating scale

for improved functioning and eliminating misfitting people, it was found that the FVOD

scale's items could be divided into six levels of difficulty. These six levels discriminated

between nearly four groups of people and ranged from low to high agreeability on the

items with high reliability (.96).

The SYM scale includes ten items, each with a dichotomous true-false response

option. After eliminating two items, the RPCA indicated that 92.9 percent of the variance

could be explained by the scale. However, despite the remaining eight items being

divided into as many levels of difficulty, the scale did not distinguish any differences

among the people in the sample. Therefore, this scale failed to meet fundamental

measurement properties.

The OAT scale includes twelve items, each with a dichotomous true-false

response option. The final RPCA indicated 60.3 percent of the total variance was

accounted for by the OAT scale with three underlying contrasts accounting for greater

than 5 percent of the variance. This implied that the OAT scale possibly had multiple

dimensions. Additionally, while the items were divided into seven levels of difficulty, the

scale did not distinguish any differences among the group. Therefore, this scale failed to

meet the fundamental measurement properties.

The SAT scale includes eight items, each with a dichotomous true-false response

option. The final RPCA indicated that 92.8 percent of the variance was explained by the

SAT scale. Additionally, while the scale's items divided into as many levels of difficulty,

the SAT scale did not distinguish any differences among the group. Therefore, this scale

failed to meet the fundamental measurement properties.

The SAM scale includes fourteen items, each with a dichotomous true-false

response option. While the items could be divided into four levels of difficulty, the final

RPCA for Group 1 indicated that 47.4 percent of the variance was accounted for by the

SAM scale. In contrast, the RPCA for Group 2 indicated that 80.8 percent of the variance

was accounted for by the scale. Neither Group's person separation met or exceeded the

2.0 standard. Additionally, the SAM scale did not distinguish any differences among the

group. Therefore, this scale failed to meet the fundamental measurement properties.

The DEF scale includes twelve items, each with a dichotomous true-false

response option. The final RPCA indicated that 71.6 percent of the total variance could

be explained by the DEF scale. Additionally, the items, while dividing into nine groups,

did not distinguish any differences among the group. Therefore, this scale failed to meet

the fundamental measurement properties.

The FAM scale included fourteen items, each with a dichotomous true-false

response option. After the elimination of three misfitting items, the final RPCA indicated

that 78.1 percent of the variance was explained by the FAM scale. However, while the

FAM scale's items could be divided into seven levels of difficulty, they could not

discriminate any differences among the people. Therefore, the FAM scale failed to meet

the fundamental measurement properties.

The COR scale included twelve items, each with a dichotomous true-false

response option. After the elimination of one misfitting item, the final RPCA indicated

that 85.1 percent of the variance was explained by the COR scale. However, while the

items could be divided into seven levels of difficulty, the COR scale did not distinguish

any differences among the group. Therefore, the COR scale failed to meet the

fundamental measurement properties.

The RAP scale included six items, each with a dichotomous true-false response

option. The final RPCA indicated that only 1.4 percent of the variance was explained by the

scale. The items could not be divided into any levels of difficulty, and no distinctions

could be made among the people. In addition, the scale had no reliability (.00).

This scale did not meet the fundamental measurement properties. However, this may

have been the SASSI-3 authors' intention, as the scale was designed to identify people

who were responding to the instrument in a random way.

The Dichotomous SASSI-3 and the Whole SASSI-3

The dichotomous SASSI-3 includes 70 items with both four-point Likert-type

response scales and dichotomous true-false response scales. After adjusting the four-

point Likert-type scale for maximum meaning and eliminating 29 misfitting items, the

RPCA indicated that the instrument functioned as a unidimensional measure, with

81 percent of the variance explained and no underlying contrasts. The

items were divided into four levels of difficulty. These levels discriminated seven

different groups of people ranging from high to low on the variable. Therefore, the

dichotomous SASSI-3 can work as a unidimensional instrument that distinguishes people

high on the variable from those low on the variable.

The whole SASSI-3 included all 93 items, including both the four-point Likert-

type response scales and the dichotomous true-false response scales. After adjusting

the four-point Likert-type response scale for maximum meaning and eliminating 20

items, the RPCA indicated that the instrument functioned as a unidimensional measure

with 69.2 percent of the variance explained and no evidence of underlying contrasts.

The items were divided into seven levels of difficulty, which discriminated five

different groups among the people from high to low on the variable.

Therefore, the whole SASSI-3 instrument can work as a unidimensional instrument used

to distinguish people high on the variable from those low on the variable.

Integration of Findings with Other Research

The SASSI-3 authors purport that the unique integration of subtle items with

direct items provides additional information that is often difficult to obtain because of the

clinical denial often present in people dealing with substance dependence issues (Miller

& Lazowski, 1985). However, in their review of the empirical SASSI-3 literature,

Feldstein and Miller (2007) concluded that the SASSI-3's subtle scales have fair to poor

internal consistency. These researchers also concluded that no independent "peer-

reviewed substantiation was found" for the claims regarding the unique contribution that

the combination of indirect and direct items provides in the screening of substance

dependence (p. 49). The findings of the present study are supportive of Feldstein and

Miller's summary conclusions. The subtle scales in this study did not function

independently as measures as indicated by their failure to meet the minimal fundamental

measurement properties. In addition, the face valid scales had higher person and item

separation and reliability findings and RPCAs than the dichotomous or whole SASSI-3

(See Table 23).

Table 23

Summary of Person and Item Separation Findings and RPCAs for Direct Versus Direct and Indirect Scales Combined

Scale | PS&R | IS&R | RPCA
FVA | 2.65/.87 | 7.81/.98 | 95.1%
FVOD | 2.78/.87 | 4.77/.96 | 84.3%
Dichotomous SASSI-3 | 3.06/.90 | 5.63/.97 | 81%
Whole SASSI-3 | 3.82/.94 | 5.63/.97 | 69.2%

Note. PS&R = Person Separation & Reliability. IS&R = Item Separation & Reliability. RPCA = Rasch principal components analysis.

In 2006, Tellegen et al. introduced a revised version of the MMPI-2. Tellegen and

his co-authors noted that many of the MMPI-2's items loaded on two or more of the

MMPI-2's Clinical scales. They concluded that these multi-item overlaps reduced

specificity among the 8 Clinical scales. In an effort to improve these Basic scales'

specificity, these authors published a newer version of the MMPI called the MMPI-

Restructured Clinical (RC). This reduction and restructuring of the Clinical scales

resulted in RC scales that have higher validity and reliability estimates (Nichols, 2006;

Rogers, Sewell, Harrison & Jordan, 2006). As noted earlier, many of the SASSI-3's items
load on one or more of the dichotomous scales. Employing the types of analyses

presented in this study has the potential to produce the same results for the Substance

Abuse Subtle Screening Inventory-3. Specifically, the findings of the present study were

supportive of this assertion. A reduction in the number of items improved the reliability

and measurement functioning of the SASSI-3.

Finally, the hierarchies that were established for the SASSI-3 items, regardless of

whether they were from the face valid only scales, the dichotomous, or the whole SASSI-

3, maintained the same general position across all four measures. For example, FVA12

(Suicide) was more difficult to endorse on each of the scales, and FVA4 (More than

intended) was less difficult to endorse on each of the scales. These consistent patterns of

item difficulty are indicative of the linearity of the SASSI-3 measure. That is, the SASSI-

3 measures less to more of the variable of substance dependence consistently and reliably

across samples. This is not unlike intelligence tests, the purpose of which is to measure

less to more of the variable of intelligence consistently and reliably across samples. The

more difficult the item the more of the quality or characteristic one possesses.

Implications

The implications of this study focus on recommendations to improve the SASSI-

3. The first recommendation is to reduce the number of scales. The SASSI-3 meets the

fundamental measurement properties to work, holistically, as a unidimensional measure

to screen for substance dependency. This means that the SASSI-3 can be made more

efficient and effective since it will be unidimensional.

A second recommendation is to reduce the number of items. Eliminating

multivocal items, items that are true on one scale and false on another, and items that are

not on any scale, may have a broader effect on the instrument's measurement properties

because these contributed to the misfitting items (see Table 20). These deleted items

misfit or overfit the instrument in a consistent manner. Deleting the

misfitting or the overfitting items improved the instrument's person and item separation

and reliability findings as well as the RPCA.

The respondents failed to utilize the response options as the SASSI-3 authors

intended, as judged against two standards: a probability-curve peak of .5 or better and a

threshold advance of greater than or equal to 1.4 units between two adjacent response

choices. It appeared from the data reviewed in this study that the respondents did not

make a qualitative distinction between response option 1 (Once or Twice) and response

option 2 (Several Times); no clear, statistically significant difference emerged between

them. A review of the definitions to clarify the options would be beneficial to promote

more accurate discrimination of the responses. A final consideration for revising the Likert-style

acknowledgment. For example, to respond "frequently" to the question of how often one

consumes alcohol with lunch has different clinical implications than responding

"frequently" to a question about attempting suicide while consuming alcohol. Under the

current SASSI-3 scoring system, each of these responses is scored a "3" even though,

from a biopsychosocial perspective, a person who frequently attempts suicide while

consuming alcohol is of much greater concern, clinically, than is someone who frequently

consumes alcohol with lunch.
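As a sketch of how the two rating-scale standards described above might be checked programmatically (a hypothetical helper, fed the Group 2 four-option values from Table 21 and Figure 32, not part of the study's actual Winsteps workflow):

```python
# Hypothetical checker for the two rating-scale standards: each category's
# probability curve should peak at >= .5, and adjacent structure
# calibrations (thresholds) should advance by at least 1.4 units.
def check_categories(peak_probs, thresholds, min_peak=0.5, min_advance=1.4):
    """Return (peaks_ok, advances_ok) for a rating scale's diagnostics."""
    peaks_ok = all(p >= min_peak for p in peak_probs)
    advances = [b - a for a, b in zip(thresholds, thresholds[1:])]
    advances_ok = all(d >= min_advance for d in advances)
    return peaks_ok, advances_ok

# Group 2's original 0,1,2,3 scale: middle categories peak at only .20 and
# .30, and the structure calibrations (4.84, -5.00, .16) are disordered,
# so the scale fails both standards.
print(check_categories([0.95, 0.20, 0.30, 0.95], [4.84, -5.00, 0.16]))
# (False, False)
```

A scale passing both checks would have every curve peaking above .5 and monotonically increasing, well-spaced thresholds.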

These analyses indicate that the subtle scales do not contribute in a meaningful

way to the instrument. This was evidenced by the fact that when the face valid scales

were used independently, the person and item separation and reliability findings as well

as the RPCAs were higher than when the face valid scales were combined with the subtle

scale items. Therefore, the subtle items could be removed without losing any of the

measurement properties.

Suggestions for Future Research

Recommendations for future research include combining both the FVA and the

FVOD with the subtle items to investigate the measurement properties. These new

instruments may produce an alcohol only and a drug only screening instrument.

However, it is important to explore whether the face valid scales have higher

measurement properties with or without the subtle items. Combining the face valid scales

into one scale may also be an area of research to investigate. While the results of the

Rasch analysis demonstrated that the FVOD and FVA scales function independently, investigating whether they would function together, with some modification of item wording to make the items universal to substances rather than specific to alcohol or other drugs, may benefit the SASSI-3. Finally, reworking the response options for

the face valid scales may contribute to the functioning of the instrument.

Limitations

Despite its multiple uses and strong reputation for instrument validation, critiques of the Rasch model are raised by individuals who are solely committed to the use of factor analysis. Bond and Fox (2007) report that these critics state that Rasch analysis is not a theory-building method, as is factor analysis, and that the Rasch model's theory is too simplistic. In Rasch, the theory drives the development of the instrument.

This principle is contradictory to exploratory factor analysis, which is designed to

facilitate theory building. For researchers interested in exploring multiple dimensions,

Rasch analysis will prove ineffective as the Rasch model only works for unidimensional

instruments (Kubinger, 2005).
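The unidimensionality requirement follows from the form of the model itself: the dichotomous Rasch model expresses the probability of endorsement as a function of a single person parameter and a single item parameter, both on one logit scale. A minimal generic sketch (the ability and difficulty values are illustrative, not SASSI-3 calibrations):

```python
import math

def rasch_probability(theta, b):
    """Dichotomous Rasch model: the probability that a person of
    ability theta endorses an item of difficulty b, both measured
    on a single logit scale."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# When ability equals difficulty, endorsement probability is exactly .5
print(rasch_probability(0.7, 0.7))   # → 0.5
# A person 2 logits above an item endorses it with p ≈ .88
print(round(rasch_probability(2.0, 0.0), 2))  # → 0.88
```

Because only one person parameter appears in the equation, any additional dimension in the data can surface only as misfit, which is why the model cannot serve multidimensional instruments.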

A specific limitation of this study included the assumption made by the researcher

that the sample drawn from the data gathered from the community family court project

included people with a higher likelihood of substance dependence. This assumption was

made primarily because of the respondents' involvement with the project. However, just because a

respondent was involved with the project did not necessarily imply a higher likelihood of

substance dependence.

Conclusion

The purpose of this study was to investigate the measurement properties of the

Substance Abuse Subtle Screening Inventory-3 (SASSI-3). The measurement properties

as outlined by Thurstone (1927) include unidimensionality, linearity, invariance, and

Independence. This study produced two major findings. The first involves the SASSI-3's

measurement properties. While it is commonly known that the SASSI-3, as it is currently

written, is not intended to be unidimensional, the SASSI-3 can function as a

unidimensional measure which meets fundamental measurement properties with some

minor adjustments to the response options and elimination of some misfitting and

redundant items. The second major finding of this study is that the subtle scales and

subtle items do not appear to contribute to the functioning of the instrument. The

implications of these findings are that changing the response scale and eliminating multivocal items (items that are true on one scale and false on another), items with no scale, and other items that misfit or are redundant will improve the functioning of the

instrument by improving its reliability. A higher functioning instrument will improve

time management and save money for community agencies and drug and alcohol

treatment facilities by effectively screening in people who need treatment and providing them with treatment that leads to effective outcomes.

Additional research on the SASSI-3 using modern measurement methodology can

only improve the effectiveness of the instrument. However, more research is needed to

confirm the findings of this study. As has been suggested for the MMPI-2 RC, immediate

change to a new instrument without research to confirm and validate these findings

would be premature.

References

Adger, H., & Werner, M. J. (1994). The pediatrician. Alcohol Health and Research

World, 18, 121-126.

Altman, D. G. (1991). Practical statistics for medical research. London, England:

Chapman & Hall.

American Psychiatric Association. (2000). Diagnostic and statistical manual of mental disorders (4th ed., text rev.). Washington, DC: American Psychiatric Association.

Angoff, W. H. (1988). Validity: An evolving concept. In H. Wainer & H. I. Braun,

(Eds.), Test validity (pp. 19-32). Princeton, NJ: Lawrence Erlbaum Associates,

Inc.

Arneth, P. M., Bogner, J. A., Corrigan, J. D., & Schmidt, L. (2001). The utility of the

Substance Abuse Subtle Screening Inventory-3 for use with individuals with brain

injury. Brain Injury, 15, 499-510.

Banerji, M., Smith, R. M., & Dedrick, R. F. (1997). Dimensionality of an early childhood

scale using Rasch analysis and confirmatory factor analysis. Journal of Outcome Measurement, 7(1), 56-85.

Bartholomew, D. J. (1996). The statistical approach to social measurement. London,

England: Academic Press Inc.

Bartholomew, K., Henderson, A. J. Z., & Marcia, J. E. (2000). Coded semistructured interviews in social psychological research. In H. T. Reis & C. M. Judd (Eds.), Handbook of research methods in social and personality psychology. New York, NY: Cambridge University Press.

Bond, T. G., & Fox, C. M. (2007). Applying the Rasch model: Fundamental

measurement in the human sciences (2nd ed.). Mahwah, NJ: Lawrence Erlbaum

Associates, Publishers.

Brewer, M. B. (2000). Research design and issues of validity. In H. T. Reis, & C. M.

Judd, (Eds.). Handbook of research methods in social and personality psychology

(pp. 3-16). Cambridge, UK: Cambridge University Press.

Burck, A. M., Laux, J. M., Harper, H. L., & Ritchie, M. (2008). Detecting college student impression management using the SASSI-3. Adams State College. Paper

submitted for publication.

Butcher, J. N., Dahlstrom, W. G., Graham, J. R., Tellegen, A., & Kaemmer, B. (1989).

Manual for the restandardized Minnesota Multiphasic Personality Inventory:

MMPI-2. Minneapolis, MN: University of Minnesota Press.

Carletta, J. (1996). Assessing agreement on classification tasks: The Kappa statistic.

Computational Linguistics, 22, 249-254.

Clements, R. (2002). Psychometric properties of the Substance Abuse Subtle Screening

Inventory - 3. Journal of Substance Abuse Treatment, 23, 419-423.

Crocker, L., & Algina, J. (1986). Introduction to classical and modern test theory. Fort

Worth, TX: Harcourt Brace Jovanovich College Publishers.

Derogatis, L. R. (1975). The SCL-90-R. Baltimore, MD: Clinical Psychometric Research.

Donaldson, S. I., & Grant-Vallone, E. J. (2002). Understanding self-report bias in

organizational behavior research. Journal of Business and Psychology, 17, 245-

260.

Elliott, R., Fox, C. M., Beltyukova, S. A., Stone, G. E., Gunderson, J., & Zhang, X. (2006). Deconstructing therapy outcome measurement with Rasch analysis of a measure of general clinical distress: The Symptom Checklist-90-Revised. Psychological Assessment, 18, 359-372.

Ewing, J. A. (1984). Detecting alcoholism: The CAGE questionnaire. JAMA, 252, 1905-

1907.

Feldstein, S. W., & Miller, W. R. (2007). Does subtle screening for substance abuse

work? A review of the Substance Abuse Subtle Screening Inventory (SASSI).

Addiction, 102, 41-50.

Fox, C. M., & Jones, J. A. (1998). Uses of Rasch modeling in counseling psychology research. Journal of Counseling Psychology, 45, 30-45.

Gray, B. T. (2001). A factor analytic study of the Substance Abuse Subtle Screening

Inventory (SASSI). Educational and Psychological Measurement, 61(1), 102-

118.

Henderson, C. E., Taxman, F. S., & Young, D. W. (2007). A Rasch model analysis of

evidence-based treatment practices in the criminal justice system. Drug and

Alcohol Dependence, 93, 163-175.

John, O. P., & Benet-Martinez, V. (2000). Measurement: Reliability, construct validation, and scale construction. In H. T. Reis & C. M. Judd (Eds.), Handbook of research methods in social and personality psychology (pp. 339-369). Cambridge, UK: Cambridge University Press.

Juhnke, G. A., Vacc, N. A., Curtis, R. C., Coll, K. M., & Paredes, D. M. (2003).

Assessment instruments used by addictions counselors. Journal of Addictions and

Offender Counseling, 23, 66-72.

Kagee, A., & deBruin, G. P. (2007). The South African former detainees distress scale:

Results of a Rasch item response theory analysis. South African Journal of

Psychology, 37, 518-529.

Keeves, J. P., & Masters, G. N. (1999). Introduction. In G. N. Masters & J. P. Keeves (Eds.), Advances in measurement in educational research and assessment. New York, NY: Pergamon.

Kubinger, K. D. (2005). Psychological test calibration using the Rasch model - Some critical suggestions on traditional approaches. International Journal of Testing, 5(4), 377-394.

Landis, J. R., & Koch, G. G. (1977). The measurement of observer agreement for

categorical data. Biometrics, 33, 159-174.

Laux, J. M., Perera-Diltz, D., Smirnoff, J. B., & Salyers, K. M. (2005). The SASSI-3 face valid other drugs scale: A psychometric investigation. Journal of Addictions and Offender Counseling, 26, 15-21.

Laux, J. M., Salyers, K. M., & Kotova, E. (2005). Psychometric evaluation of the SASSI-

3 in a college sample. Journal of College Counseling, 8, 41-51.

Lazowski, L. E., Miller, F. G., Boye, M. W., & Miller, G. A. (1998). Efficacy of the

Substance Abuse Subtle Screening Inventory-3 (SASSI-3) in identifying

substance dependence disorders in clinical settings. Journal of Personality

Assessment, 71, 114-128.

Linacre, J. M. (1999). Investigating rating scale category utility. Journal of Outcome

Measurement, 3, 103-122.

Linacre, J. M. (2009). Winsteps. Retrieved on March 21, 2009, from www.winsteps.com

Litwin, M. (1995). How to measure survey reliability and validity. Thousand Oaks, CA:

SAGE Publications.

Mark, R. (1996). Research made simple: A handbook for social workers. Thousand

Oaks, CA: SAGE Publications.

Mayfield, D., McLeod, G., & Hall, P. (1974). The CAGE questionnaire: Validation of a

new alcoholism screening instrument. American Journal of Psychiatry, 131,

1121-1123.

McAndrew, C. (1965). The differentiation of male alcoholic outpatients from

nonalcoholic psychiatric outpatients by means of the MMPI. Quarterly Journal of

Studies on Alcohol, 26, 238-246.

Miller, W. R., & Lazowski, L. (1999). Adult SASSI-3 Manual. Springfield, IN: SASSI

Institute.

Miller, W. R., & Feldstein, S. W. (2007). SASSI: A response to Lazowski & Miller.

Addiction, 102, 1001-1004.

Millon, T. (1987). Manual for the Millon Clinical Multiaxial Inventory-II (MCMI-II).

Minneapolis, MN: National Computer Systems.

Myerholtz, L., & Rosenberg, H. (1998). Screening college students for alcohol problems: Psychometric assessment of the SASSI-2. Journal of Studies on Alcohol, 59, 439-

446.

National Highway Traffic Safety Administration. (2006). 2006 annual assessment final report. Retrieved January 28, 2009, from http://www.nhtsa.dot.gov/.

Nichols, D. S. (2006). The trials of separating bath water from baby: A review and critique of the MMPI-2 Restructured Clinical Scales. Journal of Personality Assessment, 87, 121-138.

Nunnally, J. C. (1978). Psychometric theory (2nd ed.). New York, NY: McGraw-Hill.

Peters, R. H., Greenbaum, P. E., Steinberg, M. L., Carter, C. R., Ortiz, M. M., Fry, B. C.,

& Valle, S. K. (2000). Effectiveness of screening instruments in detecting

substance use disorders among prisoners. Journal of Substance Abuse Treatment,

18, 349-358.

Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests.

Chicago, IL: The University of Chicago Press.

Rasch, G. (1980). Probabilistic models for some intelligence and attainment tests

(expanded ed.). Chicago, IL: The University of Chicago Press.

Rogers, R., Sewell, K. W., Harrison, K. S., & Jordan, M. J. (2006). The MMPI-2

Restructured Clinical Scales: A paradigmatic shift in scale development. Journal

of Personality Assessment, 87, 139-147.

Salins, P. (2008). Does the SAT predict college success? Retrieved January 23, 2009, from

http://www.mindingthecampus.com/originals/2008/10/by_peter_salins one of.ht

ml.

Selzer, M. L. (1971). The Michigan Alcoholism Screening Test: The quest for a new

diagnostic instrument. American Journal of Psychiatry, 127, 1653-1658.

Sproll, N. L. (1995). Handbook of research methods: A guide for practitioners and

students in the social sciences (2nd ed.). Metuchen, NJ, & London, England:

Scarecrow Press, Inc.

Stevens, J. (1996). Applied multivariate statistics for the social sciences (3rd ed.). Mahwah,

NJ: Lawrence Erlbaum.

Stone, G. E. (2007). Thurstonian principles of measurement. PowerPoint presented at the weekly class meeting of Measurement I, Toledo, OH.

Strong, D. R., Kahler, G. W., Greene, R. L., & Schinka, J. (2005). Isolating a primary

dimension within the Cook-Medley hostility scale: A Rasch analysis. Personality

and Individual Differences, 39, 21-33.

Substance Abuse and Mental Health Services Administration (2008). Drug abuse

warning network, 2006: National estimates of drug-related emergency

department visits. Retrieved on January 7, 2009, from http://www.samhsa.gov/.

Substance Abuse and Mental Health Services Administration (2008). Results from the

2007 national survey on drug use and health: National findings. Retrieved on

January 7, 2009, from http://www.samhsa.gov/.

Svanum, S., & McGrew, J. (1995). Prospective screening of substance dependence: The

advantages of directness. Addictive Behaviors, 20, 205-213.

Sweet, R. I., & Saules, K. K. (2003). Validity of the Substance Abuse Subtle Screening

Inventory - Adolescent Version (SASSI-A). Journal of Substance Abuse

Treatment, 24, 331-340.

Tellegen, A., Ben-Porath, Y. S., Sellbom, M., Arbisi, P. A., McNulty, J. L., & Graham, J.

R. (2006). Further evidence on the validity of the MMPI-2 Restructured Clinical

(RC) Scales: Addressing questions raised by Rogers, Sewell, Harrison and Jordan

and Nichols. Journal of Personality Assessment, 87, 148-171.

Thurstone, L. L. (1927). A law of comparative judgment. Psychological Review, 34, 273-

286.

Traub, R. (1994). Reliability for the social sciences: Theory and applications (MMSS Vol. 3). Thousand Oaks, CA: SAGE Publications.

Viera, A. J., & Garrett, J. M. (2005). Understanding interobserver agreement: The kappa

statistic. Family Medicine, 37, 360-363.

Wallen, N. E., & Fraenkel, J. R. (1991). Educational research: A guide to the process.

New York, NY: McGraw-Hill.

Weed, N. C., Butcher, J. N., McKenna, T., & Ben-Porath, Y. S. (1992). New measures for assessing alcohol and drug abuse with the MMPI-2: The APS and AAS. Journal of Personality Assessment, 58, 389-404.

Wright, B. D. (1960). Foreword. In G. Rasch, Probabilistic models for some

intelligence and attainment tests (pp. ix-xix). Chicago, IL: The University of

Chicago Press.

