2. Belief in the law of small numbers

Amos Tversky and Daniel Kahneman

This chapter originally appeared in Psychological Bulletin, 1971, 76(2), 105-110. Copyright 1971 by the American Psychological Association. Reprinted by permission.

"Suppose you have run an experiment on 20 subjects, and have obtained a significant result which confirms your theory (z = 2.23, p < .05, two-tailed). You now have cause to run an additional group of 10 subjects. What do you think the probability is that the results will be significant, by a one-tailed test, separately for this group?"

If you feel that the probability is somewhere around .85, you may be pleased to know that you belong to a majority group. Indeed, that was the median answer of two small groups who were kind enough to respond to a questionnaire distributed at meetings of the Mathematical Psychology Group and of the American Psychological Association.

On the other hand, if you feel that the probability is around .48, you belong to a minority. Only 9 of our 84 respondents gave answers between .40 and .60. However, .48 happens to be a much more reasonable estimate than .85.¹

¹ The required estimate can be interpreted in several ways. One possible approach is to follow common research practice, where a value obtained in one study is taken to define a plausible alternative to the null hypothesis. The probability requested in the question can then be interpreted as the power of the second test (i.e., the probability of obtaining a significant result in the second sample) against the alternative hypothesis defined by the result of the first sample. In the special case of a test of a mean with known variance, one would compute the power of the test against the hypothesis that the population mean equals the mean of the first sample. Since the size of the second sample is half that of the first, the computed probability of obtaining z ≥ 1.645 is only .473. A theoretically more justifiable approach is to interpret the requested probability within a Bayesian framework and compute it relative to some appropriately selected prior distribution. Assuming a uniform prior, the desired posterior probability is .478. Clearly, if the prior distribution favors the null hypothesis, as is often the case, the posterior probability will be even smaller.

Apparently, most psychologists have an exaggerated belief in the likelihood of successfully replicating an obtained finding. The sources of such beliefs, and their consequences for the conduct of scientific inquiry, are what this paper is about. Our thesis is that people have strong intuitions about random sampling; that these intuitions are wrong in fundamental respects; that these intuitions are shared by naive subjects and by trained
scientists; and that they are applied with unfortunate consequences in the course of scientific inquiry.

We submit that people view a sample randomly drawn from a population as highly representative, that is, similar to the population in all essential characteristics. Consequently, they expect any two samples drawn from a particular population to be more similar to one another and to the population than sampling theory predicts, at least for small samples.

The tendency to regard a sample as a representation is manifest in a wide variety of situations. When subjects are instructed to generate a random sequence of hypothetical tosses of a fair coin, for example, they produce sequences where the proportion of heads in any short segment stays far closer to .50 than the laws of chance would predict (Tune, 1964). Thus, each segment of the response sequence is highly representative of the "fairness" of the coin. Similar effects are observed when subjects successively predict events in a randomly generated series, as in probability learning experiments (Estes, 1964) or in other sequential games of chance. Subjects act as if every segment of the random sequence must reflect the true proportion: if the sequence has strayed from the population proportion, a corrective bias in the other direction is expected. This has been called the gambler's fallacy.

The heart of the gambler's fallacy is a misconception of the fairness of the laws of chance. The gambler feels that the fairness of the coin entitles him to expect that any deviation in one direction will soon be cancelled by a corresponding deviation in the other. Even the fairest of coins, however, given the limitations of its memory and moral sense, cannot be as fair as the gambler expects it to be. This fallacy is not unique to gamblers. Consider the following example:

The mean IQ of the population of eighth graders in a city is known to be 100.
You have selected a random sample of 50 children for a study of educational achievements. The first child tested has an IQ of 150. What do you expect the mean IQ to be for the whole sample?

The correct answer is 101: the first child contributes 150 and each of the remaining 49 children has an expected IQ of 100, so the expected sample mean is (150 + 49 × 100)/50 = 101. A surprisingly large number of people believe that the expected IQ for the sample is still 100. This expectation can be justified only by the belief that a random process is self-correcting. Idioms such as "errors cancel each other out" reflect the image of an active self-correcting process. Some familiar processes in nature obey such laws: a deviation from a stable equilibrium produces a force that restores the equilibrium. The laws of chance, in contrast, do not work that way: deviations are not canceled as sampling proceeds, they are merely diluted.

Thus far, we have attempted to describe two related intuitions about chance. We proposed a representation hypothesis according to which people believe samples to be very similar to one another and to the population from which they are drawn. We also suggested that people believe sampling to be a self-correcting process. The two beliefs lead to the same consequences. Both generate expectations about characteristics of samples, and the variability of these expectations is less than the true variability, at least for small samples.

The law of large numbers guarantees that very large samples will indeed be highly representative of the population from which they are drawn. If, in addition, a self-corrective tendency is at work, then small samples should also be highly representative and similar to one another. People's intuitions about random sampling appear to satisfy the law of small numbers, which asserts that the law of large numbers applies to small numbers as well.

Consider a hypothetical scientist who lives by the law of small numbers. How would his belief affect his scientific work? Assume our scientist studies phenomena whose magnitude is small relative to uncontrolled variability, that is, the signal-to-noise ratio in the messages he receives from nature is low.
Our scientist could be a meteorologist, a pharmacologist, or perhaps a psychologist.

If he believes in the law of small numbers, the scientist will have exaggerated confidence in the validity of conclusions based on small samples. To illustrate, suppose he is engaged in studying which of two toys infants will prefer to play with. Of the first five infants studied, four have shown a preference for the same toy. Many a psychologist will feel some confidence at this point, that the null hypothesis of no preference is false. Fortunately, such a conviction is not a sufficient condition for journal publication, although it may do for a book. By a quick computation, our psychologist will discover that the probability of a result as extreme as the one obtained is as high as 3/8 under the null hypothesis.

To be sure, the application of statistical hypothesis testing to scientific inference is beset with serious difficulties. Nevertheless, the computation of significance levels (or likelihood ratios, as a Bayesian might prefer) forces the scientist to evaluate the obtained effect in terms of a valid estimate of sampling variance rather than in terms of his subjective biased estimate. Statistical tests, therefore, protect the scientific community against overly hasty rejections of the null hypothesis (i.e., Type I error) by policing its many members who would rather live by the law of small numbers. On the other hand, there are no comparable safeguards against the risk of failing to confirm a valid research hypothesis (i.e., Type II error).

Imagine a psychologist who studies the correlation between need for achievement and grades. When deciding on sample size, he may reason as follows: "What correlation do I expect? r = .35. What N do I need to make the result significant? (Looks at table.) N = 33. Fine, that's my sample." The only flaw in this reasoning is that our psychologist has forgotten about sampling variation, possibly because he believes that any sample must be highly representative of its population. However, if his guess about the correlation in the population is correct, the correlation in the sample is about as likely to lie below or above .35.
Hence, the likelihood of obtaining a significant result (i.e., the power of the test) for N = 33 is about .50.

In a detailed investigation of statistical power, J. Cohen (1962, 1969) has provided plausible definitions of large, medium, and small effects and an extensive set of computational aids to the estimation of power for a variety of statistical tests. In the normal test for a difference between two means, for example, a difference of .25σ is small, a difference of .50σ is medium, and a difference of 1σ is large, according to the proposed definitions. The mean IQ difference between clerical and semiskilled workers is a medium effect. In an ingenious study of research practice, Cohen (1962) reviewed all the statistical analyses published in one volume of the Journal of Abnormal and Social Psychology, and computed the likelihood of detecting each of the three sizes of effect. The average power was .18 for the detection of small effects, .48 for medium effects, and .83 for large effects. If psychologists typically expect medium effects and select sample size as in the above example, the power of their studies should indeed be about .50.

Cohen's analysis shows that the statistical power of many psychological studies is ridiculously low. This is a self-defeating practice: it makes for frustrated scientists and inefficient research. The investigator who tests a valid hypothesis but fails to obtain significant results cannot help but regard nature as untrustworthy or even hostile.
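The power figure of about .50 for a correlation of .35 tested on 33 subjects can be checked numerically. The sketch below is not taken from the paper: it assumes the Fisher z approximation to the sampling distribution of a correlation and a two-tailed test at the .05 level, and the function name `correlation_power` is hypothetical.

```python
from math import atanh, sqrt
from statistics import NormalDist


def correlation_power(rho, n, alpha=0.05):
    """Approximate power of a two-tailed test of r = 0 when the true correlation is rho.

    Assumption (not from the paper): Fisher's z transform, under which
    atanh(r) is roughly normal with standard error 1 / sqrt(n - 3).
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)   # two-tailed critical value
    shift = atanh(rho) * sqrt(n - 3)             # expected test statistic under rho
    upper = 1 - std_normal.cdf(z_crit - shift)   # significant in the expected direction
    lower = std_normal.cdf(-z_crit - shift)      # significant in the opposite direction
    return upper + lower


power_33 = correlation_power(0.35, 33)
power_132 = correlation_power(0.35, 4 * 33)      # same effect, four times the sample
print(f"N = 33:  power = {power_33:.2f}")
print(f"N = 132: power = {power_132:.2f}")
```

Under these assumptions the power at N = 33 comes out close to one-half, because the expected test statistic sits almost exactly on the critical value.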
Furthermore, as Overall (1969) has shown, the prevalence of studies deficient in statistical power is not only wasteful but actually pernicious: it results in a large proportion of invalid rejections of the null hypothesis among published results.

Because considerations of statistical power are of particular importance in the design of replication studies, we probed attitudes concerning replication in our questionnaire.

Suppose one of your doctoral students has completed a difficult and time-consuming experiment on 40 animals. He has scored and analyzed a large number of variables. His results are generally inconclusive, but one before-after comparison yields a highly significant t = 2.70, which is surprising and could be of major theoretical significance.

Considering the importance of the result, its surprisal value, and the number of analyses that your student has performed, would you recommend that he replicate the study before publishing? If you recommend replication, how many animals would you urge him to run?

Among the psychologists to whom we put these questions there was overwhelming sentiment favoring replication: it was recommended by 66 out of 75 respondents, probably because they suspected that the single significant result was due to chance. The median recommendation was for the doctoral student to run 20 subjects in a replication study. It is instructive to consider the likely consequences of this advice. If the mean and the variance in the second sample are actually identical to those in the first sample, then the resulting value of t will be 1.88. Following the reasoning of Footnote 1, the student's chance of obtaining a significant result in the replication is only slightly above one-half (for p = .05, one-tail test).
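The reasoning of Footnote 1 can be sketched in a few lines. The code below is an illustration, not the authors' own computation: it assumes a normal approximation throughout, treats the first result as the true effect, and assumes that t was computed as mean × sqrt(n - 1) / SD with the SD taken over n, a convention that reproduces the 1.88 quoted above. All function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def expected_z(z_first, n_first, n_second):
    """Expected z in a replication if the first sample's mean is the true mean (known variance)."""
    return z_first * sqrt(n_second / n_first)


def expected_t(t_first, n_first, n_second):
    """Same idea for t, assuming t = mean * sqrt(n - 1) / SD (SD with divisor n)."""
    return t_first * sqrt((n_second - 1) / (n_first - 1))


def replication_power(expected_stat, alpha=0.05, extra_var=0.0):
    """One-tailed probability that the replication statistic exceeds the critical value.

    extra_var adds predictive variance for uncertainty about the true effect;
    under a uniform prior on the mean it equals n_second / n_first.
    """
    z_crit = STD_NORMAL.inv_cdf(1 - alpha)
    return 1 - STD_NORMAL.cdf((z_crit - expected_stat) / sqrt(1 + extra_var))


# Opening question: z = 2.23 with 20 subjects, replication with 10.
z2 = expected_z(2.23, 20, 10)
power_plugin = replication_power(z2)                       # alternative fixed at the first result
power_uniform = replication_power(z2, extra_var=10 / 20)   # uniform prior on the mean

# Doctoral student: t = 2.70 with 40 animals, replication with 20.
t2 = expected_t(2.70, 40, 20)
power_student = replication_power(t2)                      # normal approximation to t

print(round(power_plugin, 3), round(power_uniform, 3))
print(round(t2, 2), round(power_student, 2))
```

Under these assumptions both probabilities for the opening question fall just under one-half, and the doctoral student's replication has an expected t of about 1.88 with a chance of significance slightly above one-half.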
Since we had anticipated that a replication sample of 20 would appear reasonable to our respondents, we added the following question:

Assume that your unhappy student has in fact repeated the initial study with 20 additional animals, and has obtained an insignificant result in the same direction, t = 1.24. What would you recommend now? Check one: [the numbers in parentheses refer to the number of respondents who checked each answer]

(a) He should pool the results and publish his conclusion as fact. (0)
(b) He should report the results as a tentative finding. (28)
(c) He should run another group of [median = 20] animals. (21)
(d) He should try to find an explanation for the difference between the two groups. (30)

Note that regardless of one's confidence in the original finding, its credibility is surely enhanced by the replication. Not only is the experimental effect in the same direction in the two samples but the magnitude of the effect in the replication is fully two-thirds of that in the original study. In view of the sample size (20), which our respondents recommended, the replication was about as successful as one is entitled to expect. The distribution of responses, however, reflects continued skepticism concerning the student's finding following the recommended replication. This unhappy state of affairs is a typical consequence of insufficient statistical power.

In contrast to Responses b and c, which can be justified on some grounds, the most popular response, Response d, is indefensible. We doubt that the same answer would have been obtained if the respondents had realized that the difference between the two studies does not even approach significance. (If the variances of the two samples are equal, t for the difference is .53.) In the absence of a statistical test, our respondents followed the representation hypothesis: as the difference between the two samples was larger than they expected, they viewed it as worthy of explanation. However, the attempt to "find an explanation for the difference between the two groups" is in all probability an exercise in explaining noise.
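The value of .53 for the difference between the two studies, and the "two-thirds" ratio of effect sizes, can be reconstructed from the reported t values alone. This is a sketch under stated assumptions, not the authors' own calculation: it assumes both samples share the same standard deviation, that each t was computed as mean × sqrt(n - 1) / SD with the SD taken over n, and that the difference is tested with an ordinary pooled two-sample t. The helper names are hypothetical.

```python
from math import sqrt


def standardized_effect(t, n):
    """Mean change in SD units, assuming t = mean * sqrt(n - 1) / SD (SD with divisor n)."""
    return t / sqrt(n - 1)


def t_for_difference(t1, n1, t2, n2):
    """Pooled two-sample t for the difference between two study means, equal SDs assumed."""
    d1 = standardized_effect(t1, n1)
    d2 = standardized_effect(t2, n2)
    # With a common SD of 1 (divisor n), the pooled unbiased SD is sqrt((n1 + n2) / (n1 + n2 - 2)).
    pooled_sd = sqrt((n1 + n2) / (n1 + n2 - 2))
    standard_error = pooled_sd * sqrt(1 / n1 + 1 / n2)
    return (d1 - d2) / standard_error


# Original study: t = 2.70 on 40 animals; replication: t = 1.24 on 20 animals.
ratio = standardized_effect(1.24, 20) / standardized_effect(2.70, 40)
t_diff = t_for_difference(2.70, 40, 1.24, 20)
print(f"replication effect / original effect = {ratio:.2f}")
print(f"t for the difference = {t_diff:.2f}")
```

Under these assumptions the replication effect is roughly two-thirds of the original, and the t for the difference is far below any conventional significance threshold.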
Altogether our respondents evaluated the replication rather harshly. This follows from the representation hypothesis: if we expect all samples to be very similar to one another, then almost all replications of a valid hypothesis should be statistically significant. The harshness of the criterion for successful replication is manifest in the responses to the following question:

An investigator has reported a result that you consider implausible. He ran 15 subjects, and reported a significant value, t = 2.46. Another investigator has attempted to duplicate his procedure, and he obtained a nonsignificant value of t with the same number of subjects. The direction was the same in both sets of data. You are reviewing the literature. What is the highest value of t in the second set of data that you would describe as a failure to replicate?

The majority of our respondents regarded t = 1.70 as a failure to replicate. If the data of two such studies (t = 2.46 and t = 1.70) are pooled, the value of t for the combined data is about 3.0 (assuming equal variances). Thus, we are faced with a paradoxical state of affairs, in which the same data that would increase our confidence in the finding when viewed as part of the original study, shake our confidence when viewed as an independent study. This double standard is particularly disturbing since, for many reasons, replications are usually considered as independent studies, and hypotheses are often evaluated by listing confirming and disconfirming reports.

Contrary to a widespread belief, a case can be made that a replication sample should often be larger than the original. The decision to replicate a once obtained finding often expresses a great fondness for that finding and a desire to see it accepted by a skeptical community. Since that community unreasonably demands that the replication be independently significant, or at least that it approach significance, one must run a large sample.
To illustrate, if the unfortunate doctoral student, whose thesis was discussed earlier, assumes the validity of his initial result (t = 2.70, N = 40), and if he is willing to accept a risk of only .10 of obtaining a t lower than 1.70, he should run approximately 50 animals in his replication study. With a somewhat weaker initial result (t = 2.20, N = 40), the size of the replication sample required for the same power rises to about 75.

That the effects discussed thus far are not limited to hypotheses about means and variances is demonstrated by the responses to the following question:

You have run a correlational study, scoring 20 variables on 100 subjects. Twenty-seven of the 190 correlation coefficients are significant at the .05 level; and 9 of these are significant beyond the .01 level. The mean absolute level of the significant correlations is .31, and the pattern of results is very reasonable on theoretical grounds. How many of the 27 significant correlations would you expect to be significant again, in an exact replication of the study, with N = 40?

With N = 40, a correlation of about .31 is required for significance at the .05 level. This is the mean of the significant correlations in the original study. Thus, only about half of the originally significant correlations (i.e., 13 or 14) would remain significant with N = 40. In addition, of course, the correlations in the replication are bound to differ from those in the original study. Hence, by regression effects, the initially significant coefficients are most likely to be reduced. Thus, 8 to 10 repeated significant correlations from the original 27 is probably a generous estimate of what one is entitled to expect. The median estimate of our respondents is 18. This is more than the number of repeated significant correlations that will be found if the correlations are recomputed for 40 subjects randomly selected from the original 100!
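The significance threshold of about .31 for 40 subjects can be checked with a short computation. The sketch below is an illustration rather than the authors' method: it assumes the Fisher z approximation for the critical value of a correlation at the two-tailed .05 level, and the function name `critical_r` is hypothetical.

```python
from math import sqrt, tanh
from statistics import NormalDist


def critical_r(n, alpha=0.05):
    """Smallest |r| that is significant (two-tailed) with n subjects, via Fisher's z.

    Assumption: atanh(r) is roughly normal with standard error 1 / sqrt(n - 3).
    """
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return tanh(z_crit / sqrt(n - 3))


r_100 = critical_r(100)   # threshold in the original study
r_40 = critical_r(40)     # threshold in the smaller replication
print(f"N = 100: |r| >= {r_100:.2f}")
print(f"N = 40:  |r| >= {r_40:.2f}")
```

Under this approximation the threshold rises from roughly .20 with 100 subjects to roughly .31 with 40, so a correlation equal to the original mean of .31 sits right at the cutoff in the replication.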
Apparently, people expect more than a mere duplication of the original statistics in the replication sample; they expect a duplication of the significance of results, with little regard for sample size. This expectation requires a ludicrous extension of the representation hypothesis; even the law of small numbers is incapable of generating such a result.

The expectation that patterns of results are replicable almost in their entirety provides the rationale for a common, though much deplored practice. The investigator who computes all correlations between three indexes of anxiety and three indexes of dependency will often report and interpret with great confidence the single significant correlation obtained. His confidence in the shaky finding stems from his belief that the obtained correlation matrix is highly representative and readily replicable.

In review, we have seen that the believer in the law of small numbers practices science as follows:

1. He gambles his research hypotheses on small samples without realizing that the odds against him are unreasonably high. He overestimates power.
2. He has undue confidence in early trends (e.g., the data of the first few subjects) and in the stability of observed patterns (e.g., the number and identity of significant results). He overestimates significance.
3. In evaluating replications, his or others', he has unreasonably high expectations about the replicability of significant results. He underestimates the breadth of confidence intervals.
4. He rarely attributes a deviation of results from expectations to sampling variability, because he finds a causal "explanation" for any discrepancy. Thus, he has little opportunity to recognize sampling variation in action. His belief in the law of small numbers, therefore, will forever remain intact.

Our questionnaire elicited considerable evidence for the prevalence of the belief in the law of small numbers.²
Our typical respondent is a believer, regardless of the group to which he belongs. There were practically no differences between the median responses of audiences at a mathematical psychology meeting and at a general session of the American Psychological Association convention, although we make no claims for the representativeness of either sample. Apparently, acquaintance with formal logic and with probability theory does not extinguish erroneous intuitions. What, then, can be done? Can the belief in the law of small numbers be abolished or at least controlled?

² W. Edwards (1968) has argued that people fail to extract sufficient information or certainty from probabilistic data; he called this failure conservatism. Our respondents can hardly be described as conservative. Rather, in accord with the representation hypothesis, they tend to extract more certainty from the data than the data, in fact, contain.

Research experience is unlikely to help much, because sampling variation is all too easily "explained." Corrective experiences are those that provide neither motive nor opportunity for spurious explanation. Thus, a student in a statistics course may draw repeated samples of given size from a population, and learn the effect of sample size on sampling variability from personal observation. We are far from certain, however, that expectations can be corrected in this manner, since related biases, such as the gambler's fallacy, survive considerable contradictory evidence.

Even if the bias cannot be unlearned, students can learn to recognize its existence and take the necessary precautions. Since the teaching of statistics is not short on admonitions, a warning about biased statistical intuitions may not be out of place. The obvious precaution is computation. The believer in the law of small numbers has incorrect intuitions about significance level, power, and confidence intervals. Significance levels are usually computed and reported, but power and confidence limits are not. Perhaps they should be.

Explicit computation of power, relative to some reasonable hypothesis, for instance, J.
Cohen's (1962, 1969) small, large, and medium effects, should surely be carried out before any study is done. Such computations will often lead to the realization that there is simply no point in running the study unless, for example, sample size is multiplied by four. We refuse to believe that a serious investigator will knowingly accept a .50 risk of failing to confirm a valid research hypothesis. In addition, computations of power are essential to the interpretation of negative results, that is, failures to reject the null hypothesis. Because readers' intuitive estimates of power are likely to be wrong, the publication of computed values does not appear to be a waste of either readers' time or journal space.

In the early psychological literature, the convention prevailed of reporting, for example, a sample mean as M ± PE, where PE is the probable error (i.e., the 50% confidence interval around the mean). This convention was later abandoned in favor of the hypothesis-testing formulation. A confidence interval, however, provides a useful index of sampling variability, and it is precisely this variability that we tend to underestimate.

The emphasis on significance levels tends to obscure a fundamental distinction between the size of an effect and its statistical significance. Regardless of sample size, the size of an effect in one study is a reasonable estimate of the size of the effect in replication. In contrast, the estimated significance level in a replication depends critically on sample size. Unrealistic expectations concerning the replicability of significance levels may be corrected if the distinction between size and significance is clarified, and if the computed size of observed effects is routinely reported. From this point of view, at least, the acceptance of the hypothesis-testing model has not been an unmixed blessing for psychology.

The true believer in the law of small numbers commits his multitude of sins against the logic of statistical inference in good faith. The representation hypothesis describes a cognitive or perceptual bias, which operates regardless of motivational factors. Thus, while the hasty rejection of the null hypothesis is gratifying, the rejection of a cherished hypothesis is aggravating, yet the true believer is subject to both. His intuitive expectations are governed by a consistent misperception of the world rather than by opportunistic wishful thinking. Given some editorial prodding, he may be willing to regard his statistical intuitions with proper suspicion and replace impression formation by computation whenever possible.
