Review: [untitled]
Author(s): Robert J. Boik
Reviewed work(s): Contrasts and Effect Sizes in Behavioral Research: A Correlational Approach, by Robert Rosenthal, Ralph L. Rosnow, and Donald B. Rubin
Source: Journal of the American Statistical Association, Vol. 96, No. 456 (Dec., 2001), pp. 1528-1529
Published by: American Statistical Association
Stable URL: http://www.jstor.org/stable/3085927

from a historical perspective, contrasting the Neyman-Pearson optimal testing methodology with Fisher's pure significance testing. It is argued that Neyman-Pearson testing should be viewed as testing within the boundaries delineated by the postulated statistical model, whereas Fisher's approach is in fact testing beyond these boundaries, that is, misspecification testing.

An appealing feature of the book is the many interesting historical excursions made throughout. Another one is its relentless effort to provide intuition, motivation, and reflection on the concepts and ideas that are introduced along the way. The text is not intended as a cookbook of statistical techniques but, more ambitiously, to put forward a methodological framework for econometric modeling.

The book provides many examples that illustrate the statistical theory as it is unfolded, but very few of them are concerned with real-world data. The same criticism applies to the exercises at the end of each chapter, which are almost exclusively of a theoretical nature. I feel that an opportunity is missed here. Another problem with the book is that it contains numerous typographical errors, and a number of substantive ones. Let me give two examples. On pages 156-157, the author suggests that the marginal density of X can be visualized as the projection of the bivariate density f(x, y) on the [x, f(x, y)] plane. One gains the false impression from Figure 4.2 that a marginal density is obtained from the shadow thrown on a plane by the bivariate density. On page 456, it is stated that an Ornstein-Uhlenbeck process has differentiable sample paths. Furthermore, some well-known concepts are not defined in the usual way; a standardized variable, for example, is defined on page 109 as X/[var(X)]^{1/2}, without centering.

It is unavoidable that some themes are not covered. As a text in econometrics it is surprising, however, that some important topics are not even mentioned, such as generalized least squares estimation, cointegration, unit roots, vector autoregressions, and panel data methods. It is even more surprising that important techniques for the analysis of observational data, like instrumental variables and the generalized method of moments, do not play any role at all.

In conclusion, while the book offers viewpoints not commonly encountered elsewhere, I found its contents too uneven to recommend it as a textbook for teaching econometrics.

Geert DHAENE
Katholieke Universiteit Leuven


Reliability: Modeling, Prediction and Optimization.
Wallace R. BLISCHKE and D. N. Prabhakar MURTHY. New York: Wiley, 2000. ISBN 0-471-18450-0. xxvii + 812 pp. $89.95 (H).

Blischke and Murthy state in the preface that the book is intended for a broad audience ranging from practitioners, like production engineers and managers, and applied statisticians, to graduate and advanced undergraduate students in industrial engineering, operations research, and statistics. I would recommend this book to practitioners and as a graduate-level book. The mathematical maturity needed for the reliability-modeling part of the book is such that it is not suitable at the undergraduate level. This review is organized around the five parts of the book.

Part A introduces reliability analysis and its application in practical situations. Chapter 1 discusses the basic concepts needed to study the various aspects related to failures of engineered objects, their consequences, and techniques for their avoidance. Chapter 2 introduces various case studies, like heart pacemaker failure data, industrial products sold in batches, and the reliability of hydraulic systems. This part of the book is very readable and is replete with figures and flow charts to explain the complex systems, product life cycles, and different frameworks for the study of reliability issues.

Parts B and C discuss reliability modeling. Part B deals with statistical methods for reliability data analysis for a very simple system or for a part of a system. The basic probabilistic and statistical concepts required for developing the statistical methods are discussed very tersely in Chapters 3 and 4. Part C builds on the statistical models introduced in Part B to treat more complex systems. Chapter 6 discusses reliability modeling at the component level and later develops it for multicomponent systems in Chapter 7. Topics of current interest, like statistical models for incomplete data (Type I and Type II censoring), which frequently arise in failure time studies, are discussed in Chapter 8. A Bayesian approach to data analysis is also discussed in Chapter 8. Reliability issues encountered in software quality and maintenance are dealt with in Chapter 9. A brief discussion of design of experiments is given in Chapter 10, and Chapter 11 discusses goodness-of-fit tests and model validation. This part of the book is written very tersely and may prove to be hard to understand for readers with little background in statistics.

Part D consists of seven chapters that discuss management decisions, from the perspective of both consumers and manufacturers, in dealing with unreliable systems. Software reliability and maintainability issues are introduced in Chapter 12. Chapter 17 discusses different postsale service issues, including warranties and service contracts. These are of importance in the marketing of unreliable products in situations where postsale services are bundled with the product and sold as a package. Lastly, Chapter 18 deals with various aspects of optimization of reliability.

Part E, the epilog, gives a complete discussion in Chapter 19 of two cases using the concepts introduced in the book. A very up-to-date list of references, including different sources of reliability databases and software useful for reliability modeling, is provided in Chapter 20. The book would have been more appealing as a textbook if implementation of the reliability models using one of the popular statistical software packages had also been discussed.

Rajeshwari SUNDARAM
University of North Carolina-Charlotte


Contrasts and Effect Sizes in Behavioral Research: A Correlational Approach.
Robert ROSENTHAL, Ralph L. ROSNOW, and Donald B. RUBIN. Cambridge, UK: Cambridge University Press, 2000. ISBN 0-521-65258-8. x + 212 pp. $54.95.

This book represents an elaboration and update of ideas presented in Rosenthal and Rosnow's slender 1985 volume on contrast analysis (Rosenthal and Rosnow 1985). The intended audience of the new book by Rosenthal, Rosnow, and Rubin (hereafter, R3) is the same as that of the 1985 book, and it includes graduate students and behavioral science investigators with modest mathematical skills. The new book embodies several recommendations made by the American Psychological Association's Task Force on Statistical Inference (Wilkinson 1999), hereafter called the Task Force. Robert Rosenthal cochaired the Task Force, and Donald Rubin was one of its distinguished members. The Task Force was assembled in response to the nearly 50-year-old debate concerning the use of statistics (especially statistical tests) in psychology (e.g., Nunnally 1960; Rozeboom 1960; Grant 1962; Binder 1963; Cohen 1994; Shrout 1997). The Task Force gave their approval to four unifying ideas in R3, namely carefully selecting a priori contrasts in ANOVA; estimating effect size; using counternull intervals as alternatives to confidence intervals for effect sizes; and using the binomial effect size display (BESD) to display and interpret effect size estimates. These ideas are discussed in detail later in this review.

The 1985 book was met with less than enthusiastic critical acclaim. Chief among the criticisms was that the presentations failed to provide a context for the concepts (Levin 1987). Too often, the analyses were "presented as recipes with little feel for the underlying theory or model" (Gordon 1987). Alas, even with the addition of Rubin as a coauthor, the new book is plagued with the same weakness. Equally troublesome is the uncritical promotion of counternull intervals and the BESD. The Task Force recommendations generally were well thought out, and they likely will have a powerful influence on the teaching and use of statistics in psychology. Nonetheless, they missed the mark with their endorsement of counternull intervals and the BESD. These techniques promise more than they deliver.

Much of R3 is devoted to the computation of test statistics and associated measures of effect size for contrasts in various experimental designs. Statistical models are never mentioned, nor is there any mention of distributional assumptions (yet p-values abound!). Repeated measures are analyzed with scant attention to the covariance structure among the measures and without mention of mixed models. Factorial designs are presented as though they necessarily include all higher-order interactions. This unfocused approach to experimental design undermines the book's mission of extolling the virtues of a focused approach to data analysis.

The book champions several effect size statistics, each of which is a sample correlation (or partial correlation) between the response and the coefficients of the contrast of interest. The issue of what population characteristics are being estimated by these correlations is never addressed. In Table 2.12 (p. 29), the sample statistics are treated as though they are parameters. The various sample correlations can differ substantially, but the reader is given little help in deciding which to use in a particular instance. Standard errors are not given, and this deficit reduces the usefulness of the statistics for later meta-analyses. Incidentally, the Task Force did not condemn correlation-based measures, but it did express a preference for nonstandardized measures of effect size (e.g., ordinary regression coefficients) whenever the units of measurement are meaningful.

Counternull intervals are promoted as substitutes for confidence intervals (p. 13). The endpoints of counternull intervals are the effect sizes under the null and alternative hypotheses for which the likelihood functions are equal. Counternull intervals are interpreted as confidence intervals, but otherwise are similar to the likelihood intervals described by Royall (1997, chaps. 4 and 6). In R3, the intervals are motivated by appealing to p-values rather than likelihood. Counternull intervals are useful for giving meaning to the likelihood of the null hypothesis, but they fall short as substitutes for confidence intervals. In particular, they suffer from the severe limitation that the "confidence coefficient" is not chosen by the investigator. Instead, the coefficient is data dependent and is equal to 1 minus the two-sided p-value that corresponds to the contrast. Accordingly, the confidence coefficient could be absurdly small or large (e.g., 1 - 5 × 10^{-10}, p. 98). If interval estimates of standardized effect size measures are desired, then a more sensible approach is to construct confidence intervals having fixed confidence coefficients (e.g., Steiger and Fouladi 1997).

Throughout the book, the BESD is used to display correlation-based effect sizes computed in ANOVA or frequency table settings. The BESD is constructed by fixing the margins of a 2 × 2 frequency table (groups × outcomes) to make the column sums and the row sums homogeneous. The cell frequencies are obtained by equating the estimated φ coefficient to r (the correlation-based effect size measure) and solving for the cell frequencies. The result is a 2 × 2 table with success proportion equal to (1 ± r)/2 in the two columns and equal to 1/2 on the average. Thompson and Schumacker (1997) argued that, in the two-sample comparison of means setting, the BESD is appropriate only when the response variable can be meaningfully dichotomized to produce a 50% success rate and that, in general, the BESD can seriously distort effect size measures. These criticisms were not addressed in R3. Instead, the BESD was employed in a routine, uncritical manner. On page 26, for example, the BESD is applied to data from a prospective study of the effects of aspirin on risk of heart attack. Fitting the conventional multiplicative (i.e., log-linear) model for these data yields an estimated odds ratio of 0.55 and an associated 95% confidence interval of 0.43-0.69. The BESD for these data consists of a 2 × 2 table in which average heart attack risk is 50% (48.3% in the aspirin group and 51.7% in the placebo group) and the odds ratio is 0.87. This application of the BESD is both unnecessary and inappropriate. The odds ratio itself is a meaningful measure of the multiplicative effect of exposure to aspirin; that is, the odds of heart attack in the aspirin group were only 55% as large as the odds in the placebo group. The BESD fails to preserve multiplicative effect size and instead preserves an irrelevant characteristic of the two-way table, namely the estimated φ coefficient. This application of the BESD illustrates the danger of performing data analysis without the benefit of a statistical model.
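To make the reviewer's point about the aspirin example concrete, the following minimal sketch simply redoes the arithmetic with the two percentages quoted above (48.3% and 51.7%); it is only a check of the numbers cited in the review, not a reconstruction of R3's own computations, and the helper function and variable names are hypothetical.

```python
# Arithmetic check of the aspirin BESD discussed above, using only the
# percentages quoted in the review (48.3% vs. 51.7% heart-attack "risk").
def odds(p: float) -> float:
    """Odds corresponding to a probability p."""
    return p / (1.0 - p)

# BESD column proportions, i.e., (1 - r)/2 and (1 + r)/2 with r = 0.034.
besd_aspirin, besd_placebo = 0.483, 0.517

besd_odds_ratio = odds(besd_aspirin) / odds(besd_placebo)
print(f"odds ratio implied by the BESD table: {besd_odds_ratio:.2f}")  # about 0.87

# The log-linear (model-based) odds ratio reported in the review is 0.55;
# the BESD table preserves the phi coefficient (r), not this multiplicative effect.
print("model-based odds ratio reported in the review: 0.55")
```

The gap between 0.87 and 0.55 is exactly the loss of multiplicative effect size that the review describes.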
The BESD is especially obtuse when applied to nonpairwise contrasts among the levels of an ANOVA factor. In this situation, the columns of the 2 × 2 table do not correspond to actual factor levels, but instead represent hypothetical groups at interpolated levels of the factor. One might be able to make sense of this display if the factor is quantitative, but it is doubtful that any clarity has been achieved.

There is a need for a book that helps behavioral scientists to use and to understand contrasts and associated effect size measures. The authors (especially Rosenthal and Rosnow) deserve commendation for their persistence in this endeavor. Hopefully, they will succeed next time.

Robert J. BOIK
Montana State University

REFERENCES

Binder, A. (1963), "Further Considerations on Testing Null Hypothesis and the Strategy and Tactics of Investigating Theoretical Models," Psychological Review, 70, 107-115.
Cohen, J. (1994), "The Earth is Round (p < .05)," American Psychologist, 49, 997-1003.
Gordon, I. (1987), "Review of Contrast Analysis: Focused Comparisons in the Analysis of Variance," by R. Rosenthal and R. Rosnow, Australian Journal of Statistics, 29, 108-109.
Grant, D. A. (1962), "Testing the Null Hypothesis and the Strategy and Tactics of Investigating Theoretical Models," Psychological Review, 69, 54-61.
Levin, J. (1987), "Review of Contrast Analysis: Focused Comparisons in the Analysis of Variance," by R. Rosenthal and R. Rosnow, Journal of the American Statistical Association, 82, 1191-1192.
Nunnally, J. (1960), "The Place of Statistics in Psychology," Educational and Psychological Measurement, 20, 641-650.
Rosenthal, R., and Rosnow, R. (1985), Contrast Analysis: Focused Comparisons in the Analysis of Variance, Cambridge: Cambridge University Press.
Royall, R. M. (1997), Statistical Evidence: A Likelihood Approach, London: Chapman & Hall.
Rozeboom, W. W. (1960), "The Fallacy of the Null-Hypothesis Significance Test," Psychological Bulletin, 57, 416-428.
Shrout, P. E. (1997), "Should Significance Tests Be Banned? Introduction to a Special Section Exploring the Pros and Cons," Psychological Science, 8, 1-2.
Steiger, J. H., and Fouladi, R. T. (1997), "Noncentrality Interval Estimation and the Evaluation of Statistical Models," in What If There Were No Significance Tests?, eds. L. L. Harlow, S. A. Mulaik, and J. H. Steiger, Mahwah, NJ: Lawrence Erlbaum Associates Publishers, 221-257.
Thompson, K., and Schumacker, R. (1997), "An Evaluation of Rosenthal and Rubin's Binomial Effect Size Display," Journal of Educational and Behavioral Statistics, 22, 109-117.
Wilkinson, L., and the Task Force on Statistical Inference (1999), "Statistical Methods in Psychology Journals: Guidelines and Explanations," American Psychologist, 54, 594-604.


Design and Analysis of Cluster Randomization Trials in Health Research.
Allan DONNER and Neil KLAR. New York: Oxford University Press, 2000. ISBN 0-340-69153-0. x + 178 pp. $60.00 (H).

The stated purpose of this book is to present a unified treatment of cluster-randomized designs for use as a reference source in study planning and analysis. It is also intended as a graduate-level textbook in research methodology for biostatisticians, epidemiologists, health service researchers, and public health professionals. Although the emphasis of this book is on health research, the material should be useful to researchers from fields such as education, psychology, and sociology as well.

Trials randomizing clusters (e.g., families, communities, schools), sometimes called group-randomization trials, have become widespread in the evaluation of interventions. Reasons for adopting a cluster-randomized design include controlling cost, minimizing experimental contamination, and meeting ethical requirements. Cluster-randomized designs are less efficient than designs that randomize individuals to groups because individuals in a cluster often are more similar than individuals in different clusters. The similarity between individuals in a cluster is measured by the intracluster correlation coefficient, ρ. The penalty for adopting a cluster-randomized design can be summarized in terms of the variance inflation factor (VIF), where, in the simplest situation, VIF = 1 + (m - 1)ρ and m is the cluster size. The variance inflation factor measures the ratio of the variance of a mean for a cluster-randomized design to the variance of the mean for a random sample with the same number of subjects. When ρ = 0, there is no penalty for using a cluster design (VIF = 1); however, if the cluster size is appreciable, the penalty can be large for a small (positive) correlation coefficient.

Many similarities exist between cluster-randomized designs and cluster sampling. Survey samplers have long known the impact of the cluster size on the variance of the mean (e.g., Hansen and Hurwitz 1942). However, the work available in the survey literature filtered slowly into other scientific disciplines. The authors summarize (Sec. 2.3) several methodological reviews of published studies utilizing cluster-randomized designs; they report that less than 25 percent of studies adequately accounted for the clustering in the study design, and approximately half of the studies did not account for clustering properly in the data analysis.
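As a purely numerical illustration of the design penalty quoted above, the sketch below tabulates VIF = 1 + (m - 1)ρ for a few hypothetical cluster sizes and intracluster correlations; the values are illustrative only and are not taken from the book.

```python
# Minimal sketch (hypothetical values): variance inflation factor for a
# cluster-randomized design with common cluster size m and intracluster
# correlation rho, VIF = 1 + (m - 1) * rho.
def variance_inflation_factor(m: int, rho: float) -> float:
    return 1.0 + (m - 1) * rho

for m in (10, 50, 200):            # hypothetical cluster sizes
    for rho in (0.01, 0.05):       # hypothetical intracluster correlations
        print(f"m={m:3d}  rho={rho:.2f}  VIF={variance_inflation_factor(m, rho):6.2f}")
```

Even ρ = 0.01 roughly triples the variance of a mean when clusters contain 200 subjects, which is the point the review makes about small positive correlation coefficients.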
