a Department of Marketing, School of Business Administration, University of Miami, 523D Jenkins Building, Coral Gables, FL 33124-6554, USA
b Department of Marketing, Moore School of Business, University of South Carolina, Columbia, SC 29208, USA
Abstract
A review of the assessment of face validity in consumer-related scale development research is reported, suggesting that concerns over the
lack of consistency and guidance regarding item retention during the expert judging phase of scale development are warranted. After
analyzing data from three scale development efforts, guidance regarding the application of different decision rules to use for item retention is
offered. Additionally, the results suggest that research using new, changed, or previously unexamined scale items should, at a minimum, be
judged for face validity.
© 2003 Elsevier Science Inc. All rights reserved.
Keywords: Face validity; Content validity; Scale development
1. Introduction
Concerns about the lack of consistency and guidance regarding how to use the expertise of judges to determine whether an item should be retained for further analysis in the scale development process motivated this investigation. Moreover, there is confusion regarding the difference between face and content validity, and no previous research has directly addressed the procedures used by consumer and marketing researchers to determine item retention during face validity assessment. Therefore, our first objective was to assimilate and explain the difference between content and face validity. Our second objective was to review the use of expert judges in previous consumer and marketing research. Finally, based on this review, our third objective was to test three decision rules frequently employed in consumer and marketing research, in order to investigate their relative effectiveness for assessing the face validity of scale items being considered in measure development processes.
We begin by differentiating between face and content validity, and then we describe the importance of having face valid items. Next, we present the results of a review of marketing and consumer behavior-related scale development efforts. The review consisted of an assessment of the scale development articles covered in Bearden and Netemeyer's (1999) second-edition compilation of marketing scales. These scales were chosen because they are among the most frequently employed and more rigorously constructed scales used by consumer and marketing researchers. The review was undertaken for two primary reasons. First, we wanted to determine the prevalence of expert judging as a tool to aid in item face validity assessment; second, we wanted to understand how previous researchers used expert judging to reduce an initial item pool and to determine which items should be analyzed further. After establishing that there has been a lack of consistency in the rules used for item retention, several data sets were analyzed in an attempt to provide future researchers with guidance regarding item retention decisions. Finally, the article concludes with remarks concerning the implications and limitations of our research, as well as a discussion of future research avenues.
2. Face validity assessment
2.1. The importance of having face valid items
Churchill (1979) proposed a widely accepted general
paradigm for developing measures of marketing constructs,
Table 1
Expert judging of face validity: 39 measures reported in Bearden and Netemeyer's (1999) Handbook of Marketing Scales
(Columns: initial number of items; number of items after judging; number of items in the final scale; number of expert judges.)
Table 2
Three decision rules employed by researchers in the use of expert judging for the face validity of items

Name of construct                     Sumscore   Complete   Not representative
Information Acquisition               .705***    .709***    .428*
Consideration Set Formation           .566**     .556**     .481*
Personal Outcome Decision Making      .335       .248       .548**
Social Outcome Decision Making        .730***    .683***    .742***
Persuasion Knowledge                  .501**     .544**     .128
Marketplace Interfaces                .183       .182       .152
Overall Consumer Self Confidence      .395***    .475***    .316***
Work Family Conflict                  .401**     .396*      not applicable
Family Work Conflict                  .242       .246       not applicable
Achievement Vanity                    .151*      .119       not applicable
Physical Vanity                       .063       .035       not applicable
Based on the sizes of the correlations, it appears that the not representative decision rule is not as effective at predicting ultimate inclusion of an item in a scale as the two alternative rules (i.e., sumscore and complete). That is, when determining whether to retain or delete items, researchers should not simply focus on the number of judges who rate an item as not at all representative of a construct (cf. Bearden et al., 1989; Netemeyer et al., 1995, 1996). In order to evaluate the sumscore and complete decision rules further, we considered two additional data sets. The nature of the decision rules used by these two sets of authors precludes further testing of the not representative decision rule.
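The correlations reported in Table 2 relate each item's decision-rule score to a dichotomous indicator of whether the item survived to the final scale, i.e., a point-biserial correlation. The sketch below illustrates that computation; the item scores and retention flags are hypothetical values chosen for illustration, not data from the studies discussed here.

```python
# Sketch of how the Table 2 correlations can be produced: each item's
# decision-rule score is correlated with a 0/1 "retained in final scale"
# indicator. A point-biserial correlation is simply the ordinary Pearson
# correlation of a continuous variable with a dichotomous one.
# All data below are hypothetical.

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Hypothetical sumscores for eight candidate items
rule_scores = [12, 9, 11, 7, 10, 6, 12, 8]
# Whether each item was retained in the final scale (1 = retained)
retained = [1, 0, 1, 0, 1, 0, 1, 0]

r = pearson(rule_scores, retained)
print(round(r, 3))  # a large positive r means the rule predicts retention
```

A rule whose scores correlate strongly with eventual inclusion is, by this logic, a more useful screening device during expert judging.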
The Work Family Conflict and Family Work Conflict Scales were considered first (Netemeyer et al., 1996). These authors required that, for an item to be retained, every judge indicate that the item was at least somewhat representative of the construct of interest. As shown in Table 2, both the sumscore and complete decision rules are statistically related to the inclusion of items in the final scale for one of the two constructs. Based on this, there is little difference between the use of the sumscore and complete decision rules. In order to evaluate these two rules further, the physical and achievement vanity scales of Netemeyer et al. (1995) were examined. Specifically, Netemeyer et al., in the second phase of their judging procedures, required that all four judges rate an item as at least somewhat representative for it to be retained. As can be seen in Table 2, only the sumscore decision rule is statistically related to the inclusion of items in the final achievement vanity scale. Neither of the decision rules was statistically related for the physical vanity scale. Again, however, the overall magnitudes of both the sumscore and complete decision rules suggest that they are performing similarly.
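The three decision rules compared above can be expressed concretely. The sketch below assumes a three-point representativeness rating (not representative / somewhat representative / very representative); the rating scale, function names, and example ratings are illustrative assumptions, not the original authors' procedures or data.

```python
# Illustrative sketch of the three item-retention decision rules discussed
# above. Assumption: each judge rates an item's representativeness of the
# construct on a 1-3 scale (1 = not representative at all, 2 = somewhat
# representative, 3 = very representative).

NOT_REP, SOMEWHAT, VERY = 1, 2, 3

def sumscore(ratings):
    """Sumscore rule: total the judges' ratings; retain the items with
    the highest totals."""
    return sum(ratings)

def complete(ratings):
    """Complete rule: retain an item only if every judge rates it as at
    least somewhat representative."""
    return all(r >= SOMEWHAT for r in ratings)

def not_representative(ratings):
    """Not representative rule: count the judges who rate the item as not
    representative at all; delete items exceeding some cutoff."""
    return sum(1 for r in ratings if r == NOT_REP)

# Hypothetical ratings of one candidate item by four judges
ratings = [3, 2, 3, 1]
print(sumscore(ratings))            # 9
print(complete(ratings))            # False: one judge rated the item a 1
print(not_representative(ratings))  # 1
```

Under the complete rule this item would be deleted outright, whereas under the sumscore rule its fate depends on how its total compares with those of the other candidate items, which is consistent with the differing retention outcomes discussed above.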
References

Allen MJ, Yen WM. Introduction to measurement theory. Monterey (CA): Brooks/Cole, 1979.
Allison NK. A psychometric development of a test for consumer alienation from the marketplace. J Mark Res 1978;15:565–75.
Anastasi A. Psychological testing. New York: Macmillan, 1988.
Babin BJ, Darden WR, Griffin M. Work and/or fun: measuring hedonic and utilitarian shopping value. J Consum Res 1994;20:644–56 (March).
Baumgartner H, Steenkamp J-BEM. Exploratory consumer buying behavior: conceptualization and measurement. Int J Res Mark 1996;13:121–37.
Bearden WO, Netemeyer RG. Handbook of marketing scales: multi-item measures for marketing and consumer behavior research. Thousand Oaks (CA): Sage Publications, 1999.
Bearden WO, Netemeyer RG, Teel JE. Measurement of consumer susceptibility to interpersonal influence. J Consum Res 1989;15:473–81.
Bearden WO, Hardesty DM, Rose RL. Consumer self-confidence: refinements in conceptualization and measurement. J Consum Res 2001;28:121–34.
Behrman DN, Perreault WD. Measuring the performance of industrial salespersons. J Bus Res 1982;10:355–70.
Bienstock CC, Mentzer JT, Bird MM. Measuring physical distribution service quality. J Acad Mark Sci 1997;25:31–44 (Winter).
Brooker G. An instrument to measure consumer self-actualization. In: Schlinger MJ, editor. Advances in consumer research, vol. 2. Ann Arbor (MI): Association for Consumer Research, 1975. p. 563–75.
Butaney G, Wortzel LH. Distributor power versus manufacturer power: the customer role. J Mark 1988;52:52–63 (January).
Churchill G. A paradigm for developing better measures of marketing constructs. J Mark Res 1979;16:64–73 (February).
Cialdini RB, Frost MR, Newsom JT. Preference for consistency: the development of a valid measure and the discovery of surprising behavioral implications. J Pers Soc Psychol 1995;69(2):318–28.
Cohen JB. An interpersonal orientation to the study of consumer behavior. J Mark Res 1967;4:270–78.
Feick LF, Price LL. The market maven: a diffuser of marketplace information. J Mark 1987;51:83–97.
Ferrell OC, Skinner SJ. Ethical behavior and bureaucratic structure in marketing research organizations. J Mark Res 1988;25:103–9 (February).
Gaski JF, Nevin JR. The differential effects of exercised and unexercised power sources in a marketing channel. J Mark Res 1985;22:130–42 (May).
Holsti O. Content analysis for the social sciences and humanities. Reading (MA): Addison-Wesley Publishing, 1969.
Kohli AK, Zaltman G. Measuring multiple buying influences. Ind Mark Manage 1988;17:197–204.
Kumar N, Stern LW, Achrol RS. Assessing reseller performance from the perspective of the supplier. J Mark Res 1992;29:238–53 (May).
Lichtenstein DR, Netemeyer RG, Burton S. Distinguishing coupon proneness from value consciousness: an acquisition–transaction utility theory perspective. J Mark 1990;54:54–67.
Lundstrum WJ, Lamont LM. The development of a scale to measure consumer discontent. J Mark Res 1976;13:373–81.
Malhotra NK. A scale to measure self-concepts, person concepts, and product concepts. J Mark Res 1981;16:456–64.