In Defense of External Invalidity

Douglas G. Mook, University of Virginia

April 1983, American Psychologist, p. 379
Copyright 1983 by the American Psychological Association, Inc.

ABSTRACT: Many psychological investigations are accused of "failure to generalize to the real world" because of sample bias or artificiality of setting. It is argued in this article that such "generalizations" often are not intended. Rather than making predictions about the real world from the laboratory, we may test predictions that specify what ought to happen in the lab. We may regard even "artificial" findings as interesting because they show what can occur, even if it rarely does. Or, where we do make generalizations, they may have added force because of artificiality of sample or setting. A misplaced preoccupation with external validity can lead us to dismiss good research for which generalization to real life is not intended or meaningful.

The greatest weakness of laboratory experiments lies in their artificiality. Social processes observed to occur within a laboratory setting might not necessarily occur within more natural social settings.
-- Babbie, 1975, p. 254

In order to behave like scientists we must construct situations in which our subjects . . . can behave as little like human beings as possible and we do this in order to allow ourselves to make statements about the nature of their humanity.
-- Bannister, 1966, p. 24

Experimental psychologists frequently have to listen to remarks like these. And one who has taught courses in research methods and experimental psychology, as I have for the past several years, has probably had no problem in alerting students to the "artificiality" of research settings. Students, like laypersons (and not a few social scientists for that matter), come to us quite prepared to point out the remoteness of our experimental chambers, our preoccupation with rats and college sophomores, and the comic-opera "reactivity" of our shock generators, electrode paste, and judgments of lengths of line segments on white paper.

They see all this. My problem has been not to alert them to these considerations, but to break their habit of dismissing well-done, meaningful, informative research on grounds of "artificiality."

The task has become a bit harder over the last few years because a full-fledged "purr" word has gained currency: external validity. Articles and monographs have been written about its proper nurture, and checklists of specific threats to its well-being are now appearing in textbooks. Studies unescorted by it are afflicted by--what else?--external invalidity. That phrase has a lovely mouth-filling resonance to it, and there is, to be sure, a certain poetic justice in our being attacked with our own jargon.

Warm Fuzzies and Cold Creepies

The trouble is that, like most "purr" and "snarl" words, the phrases external validity and external invalidity can serve as serious barriers to thought. Obviously, any kind of validity is a warm, fuzzy Good Thing; and just as obviously, any kind of invalidity must be a cold, creepy Bad Thing. Who could doubt it?

It seems to me that these phrases trapped even their originators, in just that way. Campbell and Stanley (1967) introduce the concept thus: "External validity asks the question of generalizability: To what populations, settings, treatment variables, and measurement variables can this effect be generalized?" (p. 5). Fair enough. External validity is not an automatic desideratum; it asks a question. It invites us to think about the prior questions: To what populations, settings, and so on, do we want the effect to be generalized? Do we want to generalize it at all?

But their next sentence is: "Both types of criteria are obviously important. . ." And ". . . the selection of designs strong in both types of validity is obviously our ideal" (Campbell & Stanley, 1967, p. 5).

I intend to argue that this is simply wrong. If it sounds plausible, it is because the word validity has given it a warm coat of downy fuzz. Who wants to be invalid--internally, externally, or in any other way? One might as well ask for acne. In a way, I wish the authors had stayed with the term generalizability, precisely because it does not sound nearly so good. It would then be easier to remember that we are not dealing with a criterion, like clear skin, but with a question, like "How can we get this sofa down the stairs?" One asks that question if, and only if, moving the sofa is what one wants to do.

But generalizability is not quite right either. The question of external validity is not the same as the question of generalizability. Even an experiment
that is clearly "applicable to the real world," perhaps because it was conducted there (e.g., Bickman's, 1974, studies of obedience on the street corner), will have some limits to its generalizability. Cultural, historical, and age-group limits will surely be present; but these are unknown and no single study can discover them all. Their determination is empirical.

The external-validity question is a special case. It comes to this: Are the sample, the setting, and the manipulation so artificial that the class of "target" real-life situations to which the results can be generalized is likely to be trivially small? If so, the experiment lacks external validity. But that argument still begs the question I wish to raise here: Is such generalization our intent? Is it what we want to do? Not always.

The Agricultural Model

These baleful remarks about external validity (EV) are not quite fair to its originators. In defining the concept, they had a particular kind of research in mind, and it was the kind in which the problem of EV is meaningful and important.

These are the applied experiments. Campbell and Stanley (1967) had in mind the kind of investigation that is designed to evaluate a new teaching procedure or the effects of an "enrichment" program on the culturally deprived. For that matter, the research context in which sampling theory was developed in its modern form--agricultural research--has a similar purpose. The experimental setting resembles, or is a special case of, a real-life setting in which one wants to know what to do. Does this fertilizer (or this pedagogical device) promote growth in this kind of crop (or this kind of child)? If one finds a significant improvement in the experimental subjects as compared with the controls, one predicts that implementation of a similar manipulation, in a similar setting with similar subjects, will be of benefit on a larger scale.

That kind of argument does assume that one's experimental manipulation represents the broader-scale implementation and that one's subjects and settings represent their target populations. Indeed, part of the thrust of the EV concept is that we have been concerned only with subject representativeness and not enough with representativeness of the settings and manipulations we have sampled in doing experiments.

Deese (1972), for example, has taken us to task for this neglect:

Some particular set of conditions in an experiment is generally taken to be representative of all possible conditions of a similar type. . . . In the investigation of altruism, situations are devised to permit people to make altruistic choices. Usually a single situation provides the setting for the experimental testing. . . . [the experimenter] will allow that one particular situation to stand for the unspecified circumstances in which an individual could be altruistic. . . . the social psychologist as experimenter is content to let a particular situation stand for an indefinite range of possible testing situations in a vague and unspecified way. (pp. 59-60)

It comes down to this: The experimenter is generalizing on the basis of a small and biased sample, not of subjects (though probably those too), but of settings and manipulations.¹

¹ In fairness, Deese goes on to make a distinction much like the one I intend here. "If the theory and observations are explicitly related to one another through some rigorous logical process, then the sampling of conditions may become completely unnecessary" (p. 60). I agree. "But a theory having such power is almost never found in psychology" (p. 61). I disagree, not because I think our theories are all that powerful, but because I do not think all that much power is required for what we are usually trying to do.

I thank James E. Deese and Wayne Shebilske for their comments on an earlier version of this article.
Requests for reprints should be sent to Douglas G. Mook, Department of Psychology, University of Virginia, Charlottesville, Virginia 22901.

The entire argument rests, however, on an applied, or what I call an "agricultural," conception of the aims of research. The assumption is that the experiment is intended to be generalized to similar subjects, manipulations, and settings. If this is so, then the broader the generalizations one can make, the more real-world occurrences one can predict from one's findings and the more one has learned about the real world from them. However, it may not be so. There are experiments--very many of them--that do not have such generalization as their aim.

This is not to deny that we have talked nonsense on occasion. We have. Sweeping generalizations about "altruism," or "anxiety," or "honesty" have been made on evidence that does not begin to support them, and for the reasons Deese gives. But let it also be said that in many such cases, we have seemed to talk nonsense only because our critics, or we ourselves, have assumed that the "agricultural" goal of generalization is part of our intent.

But in many (perhaps most) of the experiments Deese has in mind, the logic goes in a different direction. We are not making generalizations, but testing them. To show what a difference this makes, let me turn to an example.

A Case Study of a Flat Flunk

Surely one of the experiments that has had permanent impact on our thinking is the study of "mother love" in rhesus monkeys, elegantly conducted by Harlow. His wire mothers and terry-cloth mothers are permanent additions to our vocabulary of classic manipulations. And his finding that contact comfort was a powerful determinant of "attachment," whereas nutrition was small potatoes, was a massive spike in the coffin of the moribund, but still wriggling, drive-reduction theories of the 1950s.

As a case study, let us see how the Harlow wire- and cloth-mother experiment stands up to the criteria of EV.

The original discussion of EV by Campbell and Stanley (1967) reveals that the experimental investigation they had in mind was a rather complex mixed design with pretests, a treatment imposed or withheld (the independent variable), and a posttest. Since Harlow's experiment does not fit this mold, the first two of their "threats to external validity" do not arise at all: pretest effects on responsiveness and multiple-treatment interference.

The other two threats on their list do arise in Harlow's case. First, "there remains the possibility that the effects . . . hold only for that unique population from which the . . . [subjects were] selected" (Campbell & Stanley, 1967, p. 19). More generally, this is the problem of sampling bias, and it raises the spectre of an unrepresentative sample. Of course, as every student knows, the way to combat the problem (and never mind that nobody does it) is to select a random sample from the population of interest.

Were Harlow's baby monkeys representative of the population of monkeys in general? Obviously not; they were born in captivity and then orphaned besides. Well, were they a representative sample of the population of lab-born, orphaned monkeys? There was no attempt at all to make them so. It must be concluded that Harlow's sampling procedures fell far short of the ideal.

Second, we have the undeniable fact of the "patent artificiality of the experimental setting" (Campbell & Stanley, 1967, p. 20). Campbell and Stanley go on to discuss the problems posed by the subjects' knowledge that they are in an experiment and by what we now call "demand characteristics." But the problem can be generalized again: How do we know that what the subjects do in this artificial setting is what they would do in a more natural one? Solutions have involved hiding from the subjects the fact that they are subjects; moving from a laboratory to a field setting; and, going further, trying for a "representative sample" of the field settings themselves (e.g., Brunswik, 1955).

What then of Harlow's work? One does not know whether his subjects knew they were in an experiment; certainly there is every chance that they experienced "expectations of the unusual, with wonder and active puzzling" (Campbell & Stanley, 1967, p. 21). In short, they must have been cautious, bewildered, reactive baby monkeys indeed. And what of the representativeness of the setting? Real monkeys do not live within walls. They do not encounter mother figures made of wire mesh, with rubber nipples; nor is the advent of a terry-cloth cylinder, warmed by a light bulb, a part of their natural lifestyle. What can this contrived situation possibly tell us about how monkeys with natural upbringing would behave in a natural setting?

On the face of it, the verdict must be a flat flunk. On every criterion of EV that applies at all, we find Harlow's experiment either manifestly deficient or simply unevaluable. And yet our tendency is to respond to this critique with a resounding "So what?" And I think we are quite right to so respond.

Why? Because using the lab results to make generalizations about real-world behavior was no part of Harlow's intention. It was not what he was trying to do. That being the case, the concept of EV simply does not arise--except in an indirect and remote sense to be clarified shortly.

Harlow did not conclude, "Wild monkeys in the jungle probably would choose terry-cloth over wire mothers, too, if offered the choice." First, it would be a moot conclusion, since that simply is not going to happen. Second, who cares whether they would or not? The generalization would be trivial even if true. What Harlow did conclude was that the hunger-reduction interpretation of mother love would not work. If anything about his experiment has external validity, it is this theoretical point, not the findings themselves. And to see whether the theoretical conclusion is valid, we extend the experiments or test predictions based on theory.² We do not dismiss the findings and go back to do the experiment "properly," in the jungle with a random sample of baby monkeys.

² The term theory is used loosely to mean, not a strict deductive system, but a conclusion on which different findings converge. Harlow's demonstration draws much of its force from the context of other findings (by Ainsworth, Bowlby, Spitz, and others) with which it articulates.

The distinction between generality of findings and generality of theoretical conclusions underscores what seems to me the most important source of confusion in all this, which is the assumption that the purpose of collecting data in the laboratory is to predict real-life behavior in the real world. Of course, there are times when that is what we are trying to do, and there are times when it is not. When it is, then the problem of EV confronts us, full force. When it is not, then the problem of EV is either meaningless or trivial, and a misplaced preoccupation with it can seriously distort our evaluation of the research.

But if we are not using our experiments to predict real-life behavior, what are we using them for? Why else do an experiment?


There are a number of other things we may be doing. First, we may be asking whether something can happen, rather than whether it typically does happen. Second, our prediction may be in the other direction; it may specify something that ought to happen in the lab, and so we go to the lab to see whether it does. Third, we may demonstrate the power of a phenomenon by showing that it happens even under unnatural conditions that ought to preclude it. Finally, we may use the lab to produce conditions that have no counterpart in real life at all, so that the concept of "generalizing to the real world" has no meaning. But even where findings cannot possibly generalize and are not supposed to, they can contribute to an understanding of the processes going on. Once again, it is that understanding which has external validity (if it does)--not the findings themselves, much less the setting and the sample. And this implies in turn that we cannot assess that kind of validity by examining the experiment itself.

Alternatives to Generalization

"What Can" Versus "What Does"

"Person perception studies using photographs or brief exposure of the stimulus person have commonly found that spectacles, lipstick and untidy hair have a great effect on judgments of intelligence and other traits. It is suggested . . . that these results are probably exaggerations of any effect that might occur when more information about a person is available" (Argyle, 1969, p. 19). Later in the same text, Argyle gives a specific example: "Argyle and McHenry found that targeted persons were judged as 13 points of IQ more intelligent when wearing spectacles and when seen for 15 seconds; however, if they were seen during 5 minutes of conversation spectacles made no difference" (p. 135).

Argyle (1969) offers these data as an example of how "the results [of an independent variable studied in isolation] may be exaggerated" (p. 19). Exaggerated with respect to what? With respect to what "really" goes on in the world of affairs. It is clear that on these grounds, Argyle takes the 5-minute study, in which glasses made no difference, more seriously than the 15-second study, in which they did.

Now from an "applied" perspective, there is no question that Argyle is right. Suppose that only the 15-second results were known; and suppose that on the basis of them, employment counselors began advising their students to wear glasses or sales executives began requiring their salespeople to do so. The result would be a great deal of wasted time, and all because of an "exaggerated effect," or what I have called an "inflated variable" (Mook, 1982). Powerful in the laboratory (13 IQ points is a lot!), eyeglasses are a trivial guide to a person's intelligence and are treated as such when more information is available.

On the other hand, is it not worth knowing that such a bias can occur, even under restricted conditions? Does it imply an implicit "theory" or set of "heuristics" that we carry about with us? If so, where do they come from?

There are some intriguing issues here. Why should the person's wearing eyeglasses affect our judgments of his or her intelligence under any conditions whatever? As a pure guess, I would hazard the following: Maybe we believe that (a) intelligent people read more than less intelligent ones, and (b) that reading leads to visual problems, wherefore (c) the more intelligent are more likely to need glasses. If that is how the argument runs, then it is an instance of how our person perceptions are influenced by causal "schemata" (Nisbett & Ross, 1980)--even where at least one step in the theoretical sequence ([b] above) is, as far as we know, simply false.

Looked at in that way, the difference between the 15-second and the 5-minute condition is itself worth investigating further (as it would not be if the latter simply "invalidated" the former). If we are so ready to abandon a rather silly causal theory in the light of more data, why are some other causal theories, many of them even sillier, so fiercely resistant to change?

The point is that in thinking about the matter this way, we are taking the results strictly as we find them. The fact that eyeglasses can influence our judgments of intelligence, though it may be quite devoid of real-world application, surely says something about us as judges. If we look just at that, then the issue of external validity does not arise. We are no longer concerned with generalizing from the lab to the real world. The lab (qua lab) has led us to ask questions that might not otherwise occur to us. Surely that alone makes the research more than a sterile intellectual exercise.

Predicting From and Predicting To

The next case study has a special place in my heart. It is one of the things that led directly to this article, which I wrote fresh from a delightful roaring argument with my students about the issues at hand.

The study is a test of the tension-reduction view of alcohol consumption, conducted by Higgins and Marlatt (1973). Briefly, the subjects were made either highly anxious or not so anxious by the threat of electric shock, and were permitted access to alcohol as desired. If alcohol reduces tension and if people drink it because it does so (Cappell & Herman, 1972), then the anxious subjects should have drunk more. They did not.

Writing about this experiment, one of my better students gave it short shrift: "Surely not many alcoholics are presented with such a threat under normal conditions."

Indeed. The threat of electric shock can hardly be "representative" of the dangers faced by anyone except electricians, hi-fi builders, and Psychology 101 students. What then? It depends! It depends on what kind of conclusion one draws and what one's purpose is in doing the study.

Higgins and Marlatt could have drawn this conclusion: "Threat of shock did not cause our subjects to drink in these circumstances. Therefore, it probably would not cause similar subjects to drink in similar circumstances either." A properly cautious conclusion, and manifestly trivial.

Or they could have drawn this conclusion: "Threat of shock did not cause our subjects to drink in these circumstances. Therefore, tension or anxiety probably does not cause people to drink in normal, real-world situations." That conclusion would be manifestly risky, not to say foolish; and it is that kind of conclusion which raises the issue of EV. Such a conclusion does assume that we can generalize from the simple and protected lab setting to the complex and dangerous real-life one and that the fear of shock can represent the general case of tension and anxiety. And let me admit again that we have been guilty of just this kind of foolishness on more than one occasion.

But that is not the conclusion Higgins and Marlatt drew. Their argument had an entirely different shape, one that changes everything. Paraphrased, it went thus: "Threat of shock did not cause our subjects to drink in these circumstances. Therefore, the tension-reduction hypothesis, which predicts that it should have done so, either is false or is in need of qualification." This is our old friend, the hypothetico-deductive method, in action. The important point to see is that the generalizability of the results, from lab to real life, is not claimed. It plays no part in the argument at all.

Of course, these findings may not require much modification of the tension-reduction hypothesis. It is possible--indeed it is highly likely--that there are tensions and tensions; and perhaps the nagging fears and self-doubts of the everyday have a quite different status from the acute fear of electric shock. Maybe alcohol does reduce these chronic fears and is taken, sometimes abusively, because it does so.³ If these possibilities can be shown to be true, then we could sharpen the tension-reduction hypothesis, restricting it (as it is not restricted now) to certain kinds of tension and, perhaps, to certain settings. In short, we could advance our understanding. And the "artificial" laboratory findings would have contributed to that advance. Surely we cannot reasonably ask for more.

³ I should note, however, that there is considerable doubt about that as a statement of the general case. Like Harlow's experiment, the Higgins and Marlatt (1973) study articulates with a growing body of data from very different sources and settings, but all, in this case, calling the tension-reduction theory into question (cf. Mello & Mendelson, 1978).

It seems to me that this kind of argument characterizes much of our research--much more of it than our critics recognize. In very many cases, we are not using what happens in the laboratory to "predict" the real world. Prediction goes the other way: Our theory specifies what subjects should do in the laboratory. Then we go to the laboratory to ask, Do they do it? And we modify our theory, or hang onto it for the time being, as results dictate. Thus we improve our theories, and--to say it again--it is these that generalize to the real world if anything does.

Let me turn to an example of another kind. To this point, it is artificiality of setting that has been the focus. Analogous considerations can arise, however, when one thinks through the implications of artificiality of, or bias in, the sample. Consider a case study.

A great deal of folklore, supported by some powerful psychological theories, would have it that children acquire speech of the forms approved by their culture--that is, grammatical speech--through the impact of parents' reactions to what they say. If a child emits a properly formed sentence (so the argument goes), the parent responds with approval or attention. If the utterance is ungrammatical, the parent corrects it or, at the least, withholds approval.

Direct observation of parent-child interactions, however, reveals that this need not happen. Brown and Hanlon (1970) report that parents react to the content of a child's speech, not to its form. If the sentence emitted is factually correct, it is likely to be approved by the parent; if false, disapproved. But whether the utterance embodies correct grammatical form has surprisingly little to do with the parent's reaction to it.

What kind of sample were Brown and Hanlon dealing with here? Families that (a) lived in Boston, (b) were well educated, and (c) were willing to have squadrons of psychologists camped in their living rooms, taping their conversations. It is virtually certain that the sample was biased even with respect to the already limited "population" of upper-class-Bostonian-parents-of-young-children.

Surely a sample like that is a poor basis from which to generalize to any interesting population. But what if we turn it around? We start with the theoretical proposition: Parents respond to the grammar of their children's utterances (as by making approval contingent or by correcting mistakes). Now we make the prediction: Therefore, the parents

we observe ought to do that. And the prediction is disconfirmed.

Going further, if we find that the children Brown and Hanlon studied went on to acquire Bostonian-approved syntax, as seems likely, then we can draw a further prediction and see it disconfirmed. If the theory is true, and if these parents do not react to grammaticality or its absence, then these children should not pick up grammatical speech. If they do so anyway, then parental approval is not necessary for the acquisition of grammar. And that is shown not by generalizing from sample to population, but by what happened in the sample.

It is of course legitimate to wonder whether the same contingencies would appear in Kansas City working-class families or in slum dwellers in the Argentine. Maybe parental approval/disapproval is a much more potent influence on children's speech in some cultures or subcultures than in others. Nevertheless, the fact would remain that the parental approval theory holds only in some instances and must be qualified appropriately. Again, that would be well worth knowing, and this sample of families would have played a part in establishing it.

The confusion here may reflect simple historical accident. Considerations of sampling from populations were brought to our attention largely by survey researchers, for whom the procedure of "generalizing to a population" is of vital concern. If we want to estimate the proportion of the electorate intending to vote for Candidate X, and if Y% of our sample intends to do so, then we want to be able to say something like this: "We can be 95% confident that Y% of the voters, plus or minus Z, intend to vote for X." Then the issue of representativeness is squarely before us, and the horror stories of biased sampling and wildly wrong predictions, from the Literary Digest poll on down, have every right to keep us awake at night.

But what has to be thought through, case by case, is whether that is the kind of conclusion we intend to draw. In the Brown and Hanlon (1970)

of interest because they are not representative of a language-using species. And with all the quarrels their accomplishments have given rise to, I have not seen them challenged as "unrepresentative chimps," except by students on examinations (I am not making that up). The achievements of mnemonists (which show us what can happen, rather than what typically does) are of interest because mnemonists are not representative of the rest of us. And when one comes across a mnemonist one studies that mnemonist, without much concern for his or her representativeness even as a mnemonist.

But what do students read? "Samples should always be as representative as possible of the population under study." "[A] major concern of the behavioral scientist is to ensure that the sample itself is a good representative [sic] of the population." (The sources of these quotations do not matter; they come from an accidental sample of books on my shelf.)

The trouble with these remarks is not that they are false--sometimes they are true--but that they are unqualified. Representativeness of sample is of vital importance for certain purposes, such as survey research. For other purposes it is a trivial issue.⁴ Therefore, one must evaluate the sampling procedure in light of the purpose--separately, case by case.

Taking the Package Apart

Everyone knows that we make experimental settings artificial for a reason. We do it to control for extraneous variables and to permit separation of factors that do not come separately in Nature-as-you-find-it. But that leaves us wondering how, having stepped out of Nature, we get back in again. How do our findings apply to the real-life setting in all its complexity?

I think there are times when the answer has to be, "They don't." But we then may add, "Something else does. It is called understanding."
, case,nothing could be more unjustified than a state- the real w
mentof the kind,"Wecanbe . W%certain that X% 4 There is another sense in which "generalizingto a popu- ~ terestbea
of the utterances of Boston children, plus or minus lation" attends most psychological research: One usually tests the .,~ many J
significance of one's findings, and in doing so one speaks of sam- true, look
y, are true and are approved." The biased sample ple values, as estimates of population. parameters. In this con-
rules such a conclusion out of court at the outset. nection, though, the students are usually reassured that they ca~ ~f"target
But it was never intended. The intended conclusion always define the population in tenns of the sample and take It it certain]
was not about a population but about a theory. That from there-which effectively leaves them wondering what all the III ,set. We n
flap was about in the first place. ,
. .:fi!,ing,
parental approval tracks content rather than form, Perhaps this is the place to note that some of the case studies
ffiI
in these children,means that the parental approval I have presented may raise questions in the reader's mind t~at ~ults aI
theory of grammar acquisition either is simply false are not dealt with here. Some raise the problem of interpreting
,U\ which
or interacts in unsuspected ways with some attri- null conclusions; adequacy of controls for confounding variables "IUminati.
bute(s) of the home. may be worrisome; and the Brown and Hanlon ( 1970) study faced lengths (
the problem of observer effects (adequately dealt with, I think: does.Th
In yet other cases, the subjects are of interest see Mook, 1982). Except perhaps for the last one, however, th~
precisely because of their unrepresentativeness. issues are separate from the problem of external validity, whiCh
J'~rld pi
Washoe, Sarah, and our other special students are is the only concern here. ':";.yAltc
-
384 April 1983 · American Psychologist
As an example, consider dark adaptation. Psychophysical experiments, conducted in restricted, simplified, ecologically invalid settings, have taught us these things among others:

1. Dark adaptation occurs in two phases. There is a rapid and rather small increase in sensitivity, followed by a delayed but greater increase.

2. The first of these phases reflects dark adaptation by the cones; the second, by the rods.

Hecht (1934) demonstrated the second of these conclusions by taking advantage of some facts about cones (themselves established in ecologically invalid photochemical and histological laboratories). Cones are densely packed near the fovea; and they are much less sensitive than the rods to the shorter visible wavelengths. Thus, Hecht was able to tease out the cone component of the dark-adaptation curve by making his stimuli small, restricting them to the center of the visual field, and turning them red.

Now let us contemplate the manifest ecological invalidity of this setting. We have a human subject in a dark room, staring at a place where a tiny red light may appear. Who on earth spends time doing that, in the world of affairs? And on each trial, the subject simply makes a "yes, I see it/no, I don't" response. Surely we have subjects who "behave as little like human beings as possible" (Bannister, 1966). We might be calibrating a photocell for all the difference it would make.

How then do the findings apply to the real world? They do not. The task, variables, and setting have no real-world counterparts. What does apply, and in spades, is the understanding of how the visual system works that such experiments have given us. That is what we apply to the real-world setting - to flying planes at night, to the problem of reading X-ray prints on the spot, to effective treatment of night blindness produced by vitamin deficiency, and much besides.

Such experiments, I say, give us understanding of real-world phenomena. Why? Because the processes we dissect in the laboratory also operate in the real world. The dark-adaptation data are of interest because they show us a process that does occur in many real-world situations. Thus we could, it is true, look at the laboratory as a member of a class of "target" settings to which the results apply. But it certainly is not a "representative" member of that set. We might think of it as a limiting, or even defining, member of that set. To what settings do the results apply? The shortest answer is: to any setting in which it is relevant that (for instance) as the illumination dims, sensitivity to longer visible wavelengths drops out before sensitivity to short ones does. The findings do not represent a class of real-world phenomena; they define one.

Alternatively, one might use the lab not to explore a known phenomenon, but to determine whether such and such a phenomenon exists or can be made to occur. (Here again the emphasis is on what can happen, not what usually does.) Henshel (1980) has noted that some intriguing and important phenomena, such as biofeedback, could never have been discovered by sampling or mimicking natural settings. He points out, too, that if a desirable phenomenon occurs under laboratory conditions, one may seek to make natural settings mimic the laboratory rather than the other way around. Engineers are familiar with this approach. So, for instance, are many behavior therapists.

(I part company with Henshel's excellent discussion only when he writes, "The requirement of 'realism,' or a faithful mimicking of the outside world in the laboratory experiment, applies only to . . . hypothesis testing within the logico-deductive model of research" [p. 470]. For reasons given earlier, I do not think it need apply even there.)

The Drama of the Artificial

To this point, I have considered alternatives to the "analogue" model of research and have pointed out that we need not intend to generalize our results from sample to population, or from lab to life. There are cases in which we do want to do that, of course. Where we do, we meet another temptation: We may assume that in order to generalize to "real life," the laboratory setting should resemble the real-life one as much as possible. This assumption is the force behind the cry for "representative settings."

The assumption is false. There are cases in which the generalization from research setting to real-life settings is made all the stronger by the lack of resemblance between the two. Consider an example.

A research project that comes in for criticism along these lines is the well-known work on obedience by Milgram (1974). In his work, the difference between a laboratory and a real-life setting is brought sharply into focus. Soldiers in the jungles of Viet Nam, concentration camp guards on the fields of Eastern Europe - what resemblance do their environments bear to a sterile room with a shock generator and an intercom, presided over by a white-coated scientist? As a setting, Milgram's surely is a prototype of an "unnatural" one.

One possible reaction to that fact is to dismiss the work bag and baggage, as Argyle (1969) seems to do: "When a subject steps inside a psychological laboratory he steps out of culture, and all the normal rules and conventions are temporarily discarded and replaced by the single rule of laboratory culture - 'do what the experimenter says, no matter how absurd or unethical it may be'" (p. 20). He goes on to cite Milgram's work as an example.


All of this - which is perfectly true - comes in a discussion of how "laboratory research can produce the wrong results" (Argyle, 1969, p. 19). The wrong results! But that is the whole point of the results. What Milgram has shown is how easily we can "step out of culture" in just the way Argyle describes - and how, once out of culture, we proceed to violate its "normal rules and conventions" in ways that are a revelation to us when they occur. Remember, by the way, that most of the people Milgram interviewed grossly underestimated the amount of compliance that would occur in that laboratory setting.

Another reaction, just as wrong but unfortunately even more tempting, is to start listing similarities and differences between the lab setting and the natural one. The temptation here is to get involved in count-'em mechanics: The more differences there are, the greater the external invalidity. Thus:

One element lacking in Milgram's situation that typically obtains in similar naturalistic situations is that the experimenter had no real power to harm the subject if the subject failed to obey orders. The subject could always simply get up and walk out of the experiment, never to see the experimenter again. So when considering Milgram's results, it should be borne in mind that a powerful source of obedience in the real world was lacking in this situation. (Kantowitz & Roediger, 1978, pp. 387-388)

"Borne in mind" to what conclusion? Since the next sentence is "Nonetheless, Milgram's results are truly remarkable" (p. 388), we must suppose that the remarks were meant in criticism.

Now the lack of threat of punishment is, to be sure, a major difference between Milgram's lab and the jungle war or concentration camp setting. But what happened? An astonishing two thirds obeyed anyway. The force of the experimenter's authority was sufficient to induce normal decent adults to inflict pain on another human being, even though they could have refused without risk. Surely the absence of power to punish, though a distinct difference between Milgram's setting and the others, only adds to the drama of what he saw.

There are other threats to the external validity of Milgram's findings, and some of them must be taken more seriously. There is the possibility that the orders he gave were "legitimized by the laboratory setting" (Orne & Evans, 1965, p. 199). Perhaps his subjects said in effect, "This is a scientific experiment run by a responsible investigator, so maybe the whole business isn't as dangerous as it looks." This possibility (which is quite distinct from the last one, though the checklist approach often confuses the two) does leave us with nagging doubts about the generalizability of Milgram's findings. Camp guards and jungle fighters do not have this cognitive escape hatch available to them. If Milgram's subjects did say "It must not be dangerous," then his conclusion - people are surprisingly willing to inflict danger under orders - is in fact weakened.

The important thing to see is that the checklist approach will not serve us. Here we have two differences between lab and life - the absence of punishment and the possibility of discounting the danger of obedience. The latter difference weakens the impact of Milgram's findings; the former strengthens it. Obviously we must move beyond a simple count of differences and think through what the effect of each one is likely to be.

Validity of What?

Ultimately, what makes research findings of interest is that they help us understand everyday life. That understanding, however, comes from theory or the analysis of mechanism; it is not a matter of "generalizing" the findings themselves. This kind of validity applies (if it does) to statements like "The hunger-reduction interpretation of infant attachment will not do," or "Theory-driven inferences may bias first impressions," or "The Purkinje shift occurs because rod vision has these characteristics and cone vision has those." The validity of these generalizations is tested by their success at prediction and has nothing to do with the naturalness, representativeness, or even nonreactivity of the investigations on which they rest.

Of course there are also those cases in which one does want to predict real-life behavior directly from research findings. Survey research, and most experiments in applied settings such as factory or classroom, have that end in view. Predicting real-life behavior is a perfectly legitimate and honorable way to use research. When we engage in it, we do confront the problem of EV, and Babbie's (1975) comment about the artificiality of experiments has force.

What I have argued here is that Babbie's comment has force only then. If this is so, then external validity, far from being "obviously our ideal" (Campbell & Stanley, 1967), is a concept that applies only to a rather limited subset of the research we do.

A Checklist of Decisions

I am afraid that there is no alternative to thinking through, case by case, (a) what conclusion we want to draw and (b) whether the specifics of our sample or setting will prevent us from drawing it. Of course there are seldom any fixed rules about how to "think through" anything interesting. But here is a sample of questions one might ask in deciding whether the usual criteria of external validity should even be considered:

As to the sample: Am I (or is he or she whose work I am evaluating) trying to estimate, from sample characteristics, the characteristics of some population? Or am I trying to draw conclusions not about a population, but about a theory that specifies what these subjects ought to do? Or (as in linguistic apes) would it be important if any subject does, or can be made to do, this or that?

As to the setting: Is it my intention to predict what would happen in a real-life setting or "target" class of such settings? Our "thinking through" divides depending on the answer.

The answer may be no. Once again, we may be testing a prediction rather than making one; our theory may specify what ought to happen in this setting. Then the question is whether the setting gives the theory a fair hearing, and the external-validity question vanishes altogether.

Or the answer may be yes. Then we must ask, Is it therefore necessary that the setting be "representative" of the class of target settings? Is it enough that it be a member of that class, if it captures processes that must operate in all such settings? If the latter, perhaps it should be a "limiting case" of the settings in which the processes operate - the simplest possible one, as a psychophysics lab is intended to be. In that case, the stripped-down setting may actually define the class of target settings to which the findings apply, as in the dark-adaptation story. The question is only whether the setting actually preserves the processes of interest,5 and again the issue of external validity disappears.

We may push our thinking through a step further. Suppose there are distinct differences between the research setting and the real-life target ones. We should remember to ask: So what? Will they weaken or restrict our conclusions? Or might they actually strengthen and extend them (as does the absence of power to punish in Milgram's experiments)?

Thinking through is of course another warm, fuzzy phrase, I quite agree. But I mean it to contrast with the cold creepies with which my students assault research findings: knee-jerk reactions to "artificiality"; finger-jerk pointing to "biased samples" and "unnatural settings"; and now, tongue-jerk imprecations about "external invalidity." People are already far too eager to dismiss what we have learned (even that biased sample who come to college and elect our courses!). If they do so, let it be for the right reasons.

5 Of course, whether an artificial setting does preserve the process can be a very real question. Much controversy centers on such questions as whether the operant-conditioning chamber really captures the processes that operate in, say, the marketplace. If resolution of that issue comes, however, it will depend on whether the one setting permits successful predictions about the other. It will not come from pointing to the "unnaturalness" of the one and the "naturalness" of the other. There is no dispute about that.

REFERENCES

Argyle, M. Social interaction. Chicago: Atherton Press, 1969.
Babbie, E. R. The practice of social research. Belmont, Calif.: Wadsworth, 1975.
Bannister, D. Psychology as an exercise in paradox. Bulletin of the British Psychological Society, 1966, 19, 21-26.
Bickman, L. Social roles and uniforms: Clothes make the person. Psychology Today, July 1974, pp. 49-51.
Brown, R., & Hanlon, C. Derivational complexity and order of acquisition in child speech. In J. R. Hayes (Ed.), Cognition and the development of language. New York: Wiley, 1970.
Brunswik, E. Representative design and probabilistic theory in a functional psychology. Psychological Review, 1955, 62, 193-217.
Campbell, D. T., & Stanley, J. C. Experimental and quasi-experimental designs for research. Chicago: Rand McNally, 1967.
Cappell, H., & Herman, C. P. Alcohol and tension reduction: A review. Quarterly Journal of Studies on Alcohol, 1972, 33, 33-64.
Deese, J. Psychology as science and art. New York: Harcourt Brace Jovanovich, 1972.
Hecht, S. Vision II: The nature of the photoreceptor process. In C. Murchison (Ed.), Handbook of general experimental psychology. Worcester, Mass.: Clark University Press, 1934.
Henshel, R. L. The purposes of laboratory experimentation and the virtues of deliberate artificiality. Journal of Experimental Social Psychology, 1980, 16, 466-478.
Higgins, R. L., & Marlatt, G. A. Effects of anxiety arousal on the consumption of alcohol by alcoholics and social drinkers. Journal of Consulting and Clinical Psychology, 1973, 41, 426-433.
Kantowitz, B. H., & Roediger, H. L., III. Experimental psychology. Chicago: Rand McNally, 1978.
Mello, N. K., & Mendelson, J. H. Alcohol and human behavior. In L. L. Iverson, S. D. Iverson, & S. H. Snyder (Eds.), Handbook of psychopharmacology: Vol. 12. Drugs of abuse. New York: Plenum Press, 1978.
Milgram, S. Obedience to authority. New York: Harper & Row, 1974.
Mook, D. G. Psychological research: Strategy and tactics. New York: Harper & Row, 1982.
Nisbett, R. E., & Ross, L. Human inference: Strategies and shortcomings in social judgment. New York: Century, 1980.
Orne, M. T., & Evans, T. J. Social control in the psychological experiment: Anti-social behavior and hypnosis. Journal of Personality and Social Psychology, 1965, 1, 189-200.
