Psychology in the Schools, Vol. 44(5), 2007 © 2007 Wiley Periodicals, Inc.

Published online in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/pits.20238

ERRORS OF INFERENCE IN STRUCTURAL EQUATION MODELING


D. BETSY McCOACH, ANNE C. BLACK, AND ANN A. O’CONNELL
University of Connecticut

Although structural equation modeling (SEM) is one of the most comprehensive and flexible
approaches to data analysis currently available, it is nonetheless prone to researcher misuse and
misconceptions. This article offers a brief overview of the unique capabilities of SEM and dis-
cusses common sources of user error in drawing conclusions from these analyses. We make
recommendations to guide proper analytical practices and appropriate inferences and provide
references for more advanced study. © 2007 Wiley Periodicals, Inc.

Structural equation modeling (SEM) refers to a family of techniques that utilize the analysis
of covariances to explore relationships among a set of variables (Kline, 1998). SEM allows research-
ers to test the congruity of hypothesized models with their sample data (Breckler, 1990). Due to its
versatility and flexibility, SEM represents one of the most comprehensive approaches to research
design and data analysis available to social and behavioral scientists (Hoyle, 1995). SEM offers
many advantages over traditional data analytic techniques, yet it is not a panacea capable of fixing
flaws in theory, research design, sampling, or measurement. This article provides a short discus-
sion of the advantages of SEM, followed by descriptions of several common errors of inference
made when using SEM techniques. It is our hope that this exposition will help readers to become
more savvy and critical consumers and producers of research using SEM.

Advantages of SEM
SEM can accommodate modeling structures based on both latent and observed variables.
Latent variables refer to unobserved theoretical constructs. Often, researchers cannot mea-
sure constructs of interest directly and thus have to rely on multiple indicators of a single construct
to appropriately capture its meaning. For example, intelligence is a latent variable, which might be
measured using indicators (or measured variables) such as verbal IQ, non-verbal IQ, and academic
aptitude scores. Further, these latent constructs are often used as predictors (or outcomes) in
regression-type models, alongside other variables that may themselves be latent or be directly
observed. The analysis of latent variables is both statistically and conceptually appealing. SEM
allows researchers to distinguish between observed and latent variables and to explicitly model
both types of variables, thereby allowing researchers to test a wider variety of hypotheses than
would be possible with traditional statistical techniques such as multiple regression or ANOVA
(Kline, 1998).
Structural equation models explicitly account for measurement error (Raykov & Marcoul-
ides, 2000) and allow researchers to separate “true variance” (variance that is common among
indicators of a single construct) from “error variance” or “disturbance” (variance due to other
factors, including error in measurement). In standard multiple regression, measurement error within
the predictor variables attenuates the regression weight from the predictor variable to the depen-
dent variable (Baron & Kenny, 1986; Campbell & Kenny, 1999). This underestimation reduces the
analyst’s ability to explain the outcome. Because SEM analyses use multiple indicators to estimate
the effects of latent variables, SEM corrects for unreliability within the construct and provides
more accurate estimates of the relationship between the latent variable and the criterion.
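To make the attenuation point concrete, here is a small illustrative sketch (ours, not the authors'; the reliabilities and effect size are invented) showing how measurement error shrinks an observed correlation and how Spearman's classic correction recovers an estimate of the true relationship, which is essentially what a latent variable model does implicitly:

```python
# Hypothetical illustration of attenuation due to measurement error.
import numpy as np

rng = np.random.default_rng(42)
n = 10_000

true_x = rng.normal(size=n)                              # "true" predictor scores
true_y = 0.6 * true_x + rng.normal(scale=0.8, size=n)    # true relationship (r = .6)

rel_x, rel_y = 0.70, 0.80                                # assumed score reliabilities
zx = (true_x - true_x.mean()) / true_x.std()
zy = (true_y - true_y.mean()) / true_y.std()
obs_x = np.sqrt(rel_x) * zx + np.sqrt(1 - rel_x) * rng.normal(size=n)
obs_y = np.sqrt(rel_y) * zy + np.sqrt(1 - rel_y) * rng.normal(size=n)

r_true = np.corrcoef(true_x, true_y)[0, 1]
r_obs = np.corrcoef(obs_x, obs_y)[0, 1]                  # attenuated by unreliability
r_corrected = r_obs / np.sqrt(rel_x * rel_y)             # Spearman's disattenuation

print(f"true r = {r_true:.2f}, observed r = {r_obs:.2f}, corrected r = {r_corrected:.2f}")
```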

Correspondence to: D. Betsy McCoach, University of Connecticut, Neag School of Education, 249 Glenbrook Road,
Unit 2064, Storrs, CT 06269. E-mail: betsy.mccoach@uconn.edu


Another strength of SEM is the ease with which it allows researchers to model the direct,
indirect, and total effects of a system of variables, thereby facilitating the development and testing
of mediational models. A mediator variable is an intervening variable that explains the relation-
ship between a predictor variable and a dependent variable (Baron & Kenny, 1986). Mediational
models allow researchers to treat a single variable as both an independent variable and a depen-
dent variable, thereby providing the researcher with an opportunity to test a variety of complex
simultaneous equation models. Latent variable models are particularly well suited to modeling
mediation. In traditional models, measurement error in the mediator tends to produce an under-
estimate of the effect of the mediator and an overestimate of the effect of the independent variable
on the dependent variable when all of the path coefficients are positive (Baron & Kenny, 1986).
However, because latent variable models incorporate measurement error into the model, they
allow for a more accurate test of mediational effects (Baron & Kenny).
Perhaps most importantly, SEM provides a comprehensive statistical approach for test-
ing existing hypotheses about relations among observed and latent variables (Hoyle, 1995).
Using SEM, researchers can specify an amazing array of models a priori and assess the degree
to which a given model fits their sample data. SEM also allows researchers to test competing
theoretical models to determine which model best reproduces the observed variance/covariance
matrix.
However appealing and elegant SEM may be, it is a data analytic technique, and, as such, it
is incapable of resolving problems in theory or design. “These methods have greatly increased the
rigor with which one can analyze correlational data, and they solve many major statistical prob-
lems that have plagued this kind of data. However, they solve a much smaller proportion of the
interpretational—inferential in the broader sense—problems” (Cliff, 1983, p. 116, emphasis in
the original). This article considers several errors of inference that researchers may commit when
using SEM techniques. Our review addresses five related areas of common misconception and
misinterpretation: statistical hypothesis testing, model equivalence, model modification, model fit
versus model prediction, and causality. Our goal is to encourage readers to appropriately deal with
the interpretational as well as the statistical aspects of SEM.
1. Hypothesis Testing, Statistical Significance, and Power
Statistical significance testing in SEM follows the same general principles as it does in other
types of hypothesis testing (ANOVA, regression, etc.). Conceptually, the chi-square global test of
statistical significance in SEM assesses whether the specified model fits the data. Formally, the
chi-square test assesses whether the discrepancy between the model-implied covariance matrix
and the observed covariance matrix is likely to be due to sampling error. In addition, statistical
tests of individual parameters (such as the regression weights, disturbances, etc.) are used to assess
whether or not the population values of those parameters are zero. Therefore, all criticisms levied
at statistical significance testing in general (e.g., Kline, 2004; Wilkinson and the Task Force on
Statistical Inference, 1999) also apply to the practice of significance testing within the SEM frame-
work. The notion of sampling error is based on the assumption that the sample is randomly selected
from a given population. Given that many applications of SEM use nonprobability samples, the
generalizability of research results from these studies may be questionable.
Hypothesis testing for overall model assessment in SEM differs from traditional tests of
statistical significance, and this has major implications for power within the SEM framework. In
most hypothesis tests, the null hypothesis specifies there is no relationship among variables (or
no difference between groups). Generally speaking, in such a scenario, we want to reject the null
hypothesis and conclude that there is a statistically significant relationship (or difference). In
SEM, the logic is reversed. We test the null hypothesis that the covariance matrix implied by a
particular model equals the population covariance matrix of the observed variables. Assuming that the
distributional assumptions (normality, etc.) for the data are satisfied, one can use a test statistic
with a chi-square (χ²) distribution to test the null hypothesis that the specified model exactly
reproduces the population covariance matrix of the observed variables (Bollen & Long, 1993). We
evaluate exact model fit by comparing the chi-square of the specified model to the critical value of
chi-square for its degrees of freedom (i.e., the number of nonredundant elements of the covariance
matrix minus the number of parameters estimated in the model). When chi-square is statistically
significant, we reject the null hypothesis that the difference between the observed covariance matrix
and the model-implied covariance matrix is due merely to sampling error.
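As a concrete sketch (ours, not the authors'), the statistic can be computed from the sample covariance matrix S, a model-implied covariance matrix Sigma (which an SEM program would supply), the sample size, and the model's degrees of freedom:

```python
# Illustrative sketch of the likelihood-ratio (chi-square) test of exact fit.
import numpy as np
from scipy.stats import chi2

def exact_fit_test(S, Sigma, n, df, alpha=0.05):
    """S: sample covariance matrix; Sigma: model-implied covariance matrix;
    n: sample size; df: model degrees of freedom."""
    p = S.shape[0]
    # Maximum-likelihood discrepancy function (e.g., Bollen, 1989)
    f_ml = (np.log(np.linalg.det(Sigma)) - np.log(np.linalg.det(S))
            + np.trace(S @ np.linalg.inv(Sigma)) - p)
    chi_square = (n - 1) * f_ml
    critical = chi2.ppf(1 - alpha, df)
    p_value = chi2.sf(chi_square, df)
    return chi_square, critical, p_value   # reject exact fit if chi_square > critical
```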
Tests of exact fit have several shortcomings. First, χ² is very sensitive to sample size; there-
fore, almost any model estimated in a large sample will be rejected if there is even a minuscule
amount of data misfit. Most estimation methods in SEM require large sample sizes, yet the researcher
wants to fail to reject the null hypothesis, so large samples work against the researcher:
they provide more power to reject the null hypothesis that the model fits the data. Second, know-
ing that the model-implied covariance matrix does not exactly fit the population covariance matrix
does not tell us about the degree to which the model does or does not fit the data or why the
model does not fit the data. Further, given the complexities of reality and our desire for parsi-
mony, our models are often simplifications of the processes that we study. Therefore, it should
be no surprise that model-implied covariance matrices fail to exactly reproduce population covari-
ance matrices. In the social sciences, “it is implausible that any model that we use is anything
more than an approximation to reality. Since a null hypothesis that a model fits exactly in some
population is known a priori to be false, it seems pointless even to try to test whether it is true”
(Browne & Cudeck, 1993, p. 137).
In addition to conducting global tests of model fit, researchers can compare competing, nested
models using the chi-square difference test. “Whereas authors regularly acknowledge the effects
of sample size on the power of the omnibus chi-square test of exact fit, they only rarely address the
sample size effect on the power of nested chi-square tests” (Tomarken & Waller, 2003, p. 591). In
chi-square tests of differences for nested models, small sample sizes tend to favor more parsimo-
nious models, whereas large sample sizes tend to favor less parsimonious models.
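In code form, the comparison is simply the change in chi-square evaluated against a chi-square critical value with degrees of freedom equal to the difference in model df (an illustrative sketch, not the authors' code):

```python
# Chi-square difference test for two nested models (illustrative sketch).
from scipy.stats import chi2

def chi_square_difference(chi2_restricted, df_restricted, chi2_full, df_full, alpha=0.05):
    delta_chi2 = chi2_restricted - chi2_full   # restricted model fits no better than full
    delta_df = df_restricted - df_full
    critical = chi2.ppf(1 - alpha, delta_df)
    # A significant difference favors the less parsimonious (full) model.
    return delta_chi2, delta_df, delta_chi2 > critical
```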

Type I Error
Although researchers pay a great deal of attention to the control of Type I error in ANOVA-
type models, the issue of Type I error control has received considerably less attention in the SEM
literature (Green & Babyak, 1997; Hancock, 2000). Although this principle is often ignored,
controlling Type I errors across multiple tests of individual parameters is important in SEM.
Researchers who do not control Type I error rate “run the risk of including parameters in their
model that are statistically significant, but are due only to random sampling fluctuation” (Green &
Babyak, p. 39). Failing to control Type I error rate can lead researchers to free parameters that are
not actually statistically significantly different from their fixed values in the population. This
practice results in less parsimonious (and perhaps less replicable) models, affecting the research-
ers’ ability to draw valid inferences (Green & Babyak).
Hancock (2000) proposed a Scheffé-like procedure for model modifications in SEM; the
chi-square change must exceed the critical value of chi-square with the number of degrees of
freedom (df ) in the model to free the parameter of interest. For example, where a specified model
has 30 df, the modification index would need to exceed 43.77 (the critical value of chi-square with
30 df ) to consider making the proposed modification. If the researcher made the first proposed
modification, the next modification would need to exceed 42.56 (the critical value of chi-square
with 29 df ) in order to be adopted.
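A brief sketch (ours) of this criterion, using SciPy to reproduce the critical values cited above:

```python
# Hancock's (2000) Scheffe-like criterion: a modification index must exceed the
# chi-square critical value for the model's current degrees of freedom.
from scipy.stats import chi2

for model_df in (30, 29):
    print(model_df, round(chi2.ppf(0.95, model_df), 2))
# prints 30 43.77 and 29 42.56, the values cited in the text
```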


While Hancock’s approach seems quite logical, it could be overly conservative for large
measurement models, as they often have hundreds of degrees of freedom and highly correlated
indicators. However, this approach appears to be sensible for many different types of SEM and
helps protect against overly capitalizing on chance when making model modifications.

2. Equivalent Models
In SEM, the researcher must specify a particular model a priori, and there are often myriad
models that are statistically equivalent to the researcher’s hypothesized model. Equivalent
models have different causal structures, but produce identical fit to the data (Hershberger, 2006).
Two models are said to be equivalent if they reproduce the same set of model-implied covariance
(and other moment) matrices (Hershberger; Raykov & Penov, 1999; Tomarken & Waller, 2003).
Equivalent models produce identical values for the discrepancy between the model-implied matrix
and the observed matrix; therefore, they will result in identical values for model chi-square and
model fit indices. Perhaps the simplest example of model equivalence is to reverse the causal paths
in a path analytic diagram. For example, specifying that X → Y → Z is equivalent to specifying
that Z → Y → X. For complex models, there are often several (if not dozens of) functionally
equivalent models that the researcher has not tested; Hershberger (2006), Lee and Hershberger (1990),
and Stelzl (1986) demonstrate rules for generating multiple equivalent models. Even when a model
fits the data well, any
statistically equivalent models would fit the data equally well (Tomarken & Waller). Equivalent
models can lead to substantially different theoretical or substantive conclusions (Hershberger;
MacCallum, Wegener, Uchino, & Fabrigar, 1993; Tomarken & Waller). Unfortunately, researchers
often fail to recognize the existence of equivalent models or consider equivalent models when
interpreting the results of their research (MacCallum et al., 1993).
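The following hypothetical simulation (ours) illustrates the point for the three-variable chain: fitted to the same sample covariance matrix, the chains X → Y → Z and Z → Y → X imply identical covariance matrices and therefore yield identical chi-square and fit index values.

```python
# Illustrative demonstration that X -> Y -> Z and Z -> Y -> X are equivalent models.
import numpy as np

rng = np.random.default_rng(1)
n = 5_000
x = rng.normal(size=n)
y = 0.7 * x + rng.normal(size=n)
z = 0.5 * y + rng.normal(size=n)
S = np.cov(np.vstack([x, y, z]))              # sample covariance matrix, order (X, Y, Z)

def implied_cov_chain(S, a, b, c):
    """Model-implied covariance matrix for the chain a -> b -> c (indices into S)."""
    sigma = S.copy()
    # The chain constrains only the covariance between the two endpoints;
    # all other moments are reproduced exactly at the ML estimates.
    sigma[a, c] = sigma[c, a] = S[a, b] * S[b, c] / S[b, b]
    return sigma

forward = implied_cov_chain(S, 0, 1, 2)        # X -> Y -> Z
backward = implied_cov_chain(S, 2, 1, 0)       # Z -> Y -> X
print(np.allclose(forward, backward))          # True: identical implied matrices, identical fit
```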
In SEM, it is impossible to confirm a model. Although we may fail to confirm a model, we
can never actually establish its veracity (Cliff, 1983). Statistical tests and descriptive fit indices
can never prove that a model is correct (Tomarken & Waller, 2003). Rather, they suggest that the
discrepancy between the observed variance covariance and the model-implied variance covari-
ance matrix is relatively small. Therefore, one can reasonably conclude that the model “provides
an acceptable description of the data examined” (Biddle & Marlin, 1987, p. 9), in the sense that the
covariance matrix implied by the specified model sufficiently reproduces the actual covariance
matrix. Moreover, “when the data do not disconfirm a model, there are many other models that are
not disconfirmed either” (Cliff, p. 117), given the number of untested models that are statistically
equivalent to the specified model. Therefore, in the best case scenario, when we achieve good fit,
we can conclude our model “is one plausible representation of the underlying structure from a
larger pool of plausible models” (Tomarken & Waller, p. 580).
In addition, the possibility exists that an untested model will provide even better fit to the data.
Unfortunately, there is no way to rule this out entirely. Researchers should test an assortment of
plausible competing models; however, given the sheer number of rival alternatives, testing multiple
competing models cannot eliminate the possibility that an untested model fits the data better than
the researcher’s model does.
Therefore, any specified model is a tentative explanation and is subject to future disconfirmation.

3. Model Modification and Exploratory Analyses


One of the most common and controversial practices in SEM is model modification. When
the model that was specified a priori does not exhibit good fit to the data, the temptation to modify
the model to achieve better fit can be irresistible. A specification search involves making
modifications to a model in an effort to improve it in some way, either by specifying a more
parsimonious model or by “fitting” the data more closely by freeing parameters (MacCallum,
1986). Most commonly, the process of the specification search is informed by the values of
the modification indices and t values for parameter estimates (MacCallum). Ideally this process
is also guided by a well-grounded hypothesis or theory such that resulting models are “sub-
stantively meaningful.” Ultimately, the goal of the specification search is to produce a model
that accurately represents the relationships among the variables of interest within the popula-
tion (MacCallum). Fortunately or otherwise, current SEM software programs provide sug-
gested model modifications, based solely on statistical criteria. The researcher is then left to
determine what, if any, model modifications are warranted. Although such empirically driven
suggestions may be helpful for “simple” modifications, they do not tend to inform
“major changes in structure,” and some indications for change may be “nonsensical”
(Bollen, 1989, p. 296). Moreover, “the entire logic of confirmatory analysis is undermined
when the same set of data is used both to develop a model and to evaluate its fit” (Breckler,
1990, p. 268).
Further, there is no guarantee that making modifications to the model to improve model fit
results in a more “correct” model. If the initial model is incorrect, it is unlikely that specification
searches will result in the correct model (Kelloway, 1995). In his study comparing restricted and
unrestricted specification searches with samples of 100 and 300 observations, MacCallum (1986)
found high rates of Type II decision error and concluded that specification searches based on
“substantive and theoretical information” were more successful than those conducted without
restriction (made automatically based solely on indexes of fit) (p. 118). Further he noted that using
nonsignificant chi-square to prompt the end of a search may be misleading and that better fitting
models could be achieved by continuing to modify models after a nonsignificant chi-square was
found.
MacCallum, Roznowski, and Necowitz (1992) conducted a series of simulation studies, con-
cluding that sequential specification searches (where parameters are freed in succession based on
the largest modification indices) produced unstable and unreliable results, especially in small- to
moderate-sized samples. Even large samples were not immune to the problems inherent in such
searches. In addition, modified models did not consistently cross-validate at small- to moderate-
sample sizes. Further, among those models that did cross-validate, results were sometimes unsta-
ble across repeated samples. They concluded that “models produced by mechanical specification
searches in samples that are not extremely large are likely to be influenced by chance character-
istics of the sample” (MacCallum et al., p. 502).
Blindly following the suggestions of modification indices is contraindicated. The more mod-
ifications are made to fit a model to a particular set of sample data, the more likely the model is
to be influenced by the idiosyncrasies of those data and the less likely it is to estimate true
population parameters. Models with more parameter estimates may fit the data better simply
because of “chance fluctuations” in the sample data. In essence, we can overfit a model to a set
of data, rendering it ungeneralizable. “A model cannot be supported by a finding of good fit
to the data when that model has been modified so as to improve its fit to that same data” (Mac-
Callum, 2001, p. 129). It is also important to remember that a model that exhibits good fit is
not necessarily a good model. “It is easy to draw the wrong conclusions when the good fit of the
measurement model masks the bad fit of the causal [structural] model that is generally the prime
object of the investigation” (McDonald, 2004, p. 688). Modifying existing models until they
reach some criterion of good fit is a questionable practice. The true test of a “good” model is its
ability to be replicated with other, independent samples (Bollen, 1989).


4. Model Fit Versus Model Prediction

A model may exhibit adequate fit and yet do a poor job of predicting the criterion variable of
interest. Many researchers who would never neglect to report the R-squared value for a multiple
regression analysis seem to overlook the importance of reporting similar measures of variance
explained within a structural equation modeling perspective. In SEM, a great deal of emphasis is
placed on model fit. This may cause some researchers to lose sight of the fact that a good fitting
model can explain very little variability in the variable(s) of interest.
To assess model prediction for a given endogenous (dependent) variable, it is necessary to
compute the proportion of variance in the variable that is explained by the model. Determining the
proportion of variance in a given endogenous variable that is explained by the specified model is
quite straightforward for recursive models (models with unidirectional paths and uncorrelated
disturbances). The ratio of the variance of the disturbance (or error) to the observed variance
represents the proportion of unexplained variance in the endogenous variable. Therefore, R² is
simply 1 minus that ratio (Kline, 2005). Determining the variance explained in nonrecursive
models is more problematic. For information pertaining to the calculation of R-square in nonrecur-
sive models, the interested reader is referred to Bentler and Raykov (2000).
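For recursive models, the computation is simple enough to show in a few lines (an illustrative sketch with hypothetical values):

```python
# R-squared for an endogenous variable in a recursive model (Kline, 2005):
# 1 minus the ratio of disturbance variance to observed variance.
def r_squared(disturbance_variance, observed_variance):
    return 1.0 - disturbance_variance / observed_variance

print(r_squared(disturbance_variance=0.64, observed_variance=1.00))   # 0.36
```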
As Tomarken and Waller (2003) demonstrated, a good fitting model may account for less than
1% of the variance in an endogenous variable of interest. In fact, “a good fitting model can be obtained
even when none of the estimated parameters are different from 0” (Kelloway, 1995, p. 217). Con-
versely, a poor fitting model may account for a large amount of the variance in an endogenous vari-
able. There are at least two reasons for the lack of congruence between model fit and model prediction.
First, model fit is concerned with the degree to which the model can reproduce the pattern of observed
covariances, not whether the covariances among variables in the system are large or small. In addi-
tion, in SEM, as power increases, measures of model fit tend to decrease. Further, power tends to
increase as the reliability of indicators increases. In other words, the greater the proportion of
variance in the indicators that is explained by the latent variables, and the smaller the errors of the
manifest indicators, the more power the model will tend to have (Tomarken & Waller, 2003) and
the more likely the researcher will be to reject the null hypothesis that the observed covariance
matrix and the model-implied covariance matrix are equal. Browne, MacCallum, Kim, Anderson,
and Glaser (2002) demonstrated that chi-square and chi-square-based fit indices are more sensitive
to model misfit when unique variances (errors, disturbances, etc.) are smaller. Therefore, models
with highly reliable manifest indicators tend to exhibit worse fit than models with less reliable
indicators (Browne et al., 2002). Whereas fit indices such as the root mean square residual (RMR)
and standardized root mean square residual (SRMR) measure model misfit in the truest sense,
chi-square and chi-square-based fit indices measure “misfit detectability.” “Detectability of misfit
depends not only on misfit, but also on accuracy of measurement, as reflected in the unique
variances” (p. 417). Further, power is affected by the magnitudes of the covariances among the
variables and the number of variables per construct (Dolan, Wicherts, & Molenaar, 2004; Tomar-
ken & Waller, 2003). This means that a researcher using a subscale with a large number of items
that have large communalities will have more power than a researcher who uses a less reliable
scale (Tomarken & Waller). However, because power in this scenario refers to the power to reject
the null hypothesis that the model fits the data, the researcher who uses the longer, more reliable
scale may actually end up with a more poorly fitting model! Fornell and Larcker (1981) observed
this peculiar quality 25 years ago and remarked that the chi-square test “may indicate a good fit
between the hypothesized model and the data even though both the measures and the theory are
inadequate. In fact, the goodness of fit can actually improve as properties of the measures and/or
the relationships between the theoretical constructs decline” (Fornell & Larcker, p. 40).
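As an illustration (ours, not taken from Browne et al.), one common form of the SRMR can be computed directly from the standardized residual covariances; minor variants of the formula exist, so treat this as a sketch:

```python
# Standardized root mean square residual (SRMR), a residual-based measure of misfit.
import numpy as np

def srmr(S, Sigma):
    """S: sample covariance matrix; Sigma: model-implied covariance matrix."""
    p = S.shape[0]
    sd = np.sqrt(np.diag(S))
    std_resid = (S - Sigma) / np.outer(sd, sd)   # standardized residual covariances
    idx = np.tril_indices(p)                     # unique elements, including the diagonal
    return np.sqrt(np.mean(std_resid[idx] ** 2))
```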


What are the implications for applied researchers? First, as with other statistical techniques,
it is important to report measures of effect size, rather than simply relying on tests of statistical
significance. In the case of SEM, this requires documenting the proportion of variance explained
for all endogenous variables in SEM models. Unfortunately, many empirical papers neglect to
provide such information (Tomarken & Waller, 2005). Second, researchers should examine resid-
uals (or residual-based fit indices such as the SRMR) and R-square measures in addition to mea-
sures of model fit when they are evaluating the adequacy of a given model. Finally, if the proportion
of variance accounted for is quite low, the utility of a given model should be scrutinized.

5. The Inference of Cause in Structural Equation Modeling


All researchers are familiar with the adage correlation does not imply causation. In the absence
of experimental manipulation, is it ever reasonable to make causal inferences? Structural models
are postulated “models of causality that may or may not correspond to causal sequences in the real
world” (Kline, 2005, p. 324). Causality is an assumption rather than a consequence of SEM
(Brannick, 1995). Using SEM allows us to ascertain whether a hypothesized causal structure is
consistent or inconsistent with the data; however, the assertion of causal inferences ultimately
depends “on criteria that are separate from that analytic system” (Kazantzis, Ronan, & Deane,
2001, p. 1080). Conclusively demonstrating causality requires active control or manipulation of
variables. “With correlational data, it is not possible to isolate the empirical system sufficiently so
that the nature of the relations among the variables can be unambiguously ascertained” (Cliff,
1983, p. 119).
However, correlational data can be a valuable prelude to determining causality, as they expose
causal hypotheses to possible disconfirmation (Campbell & Stanley, 1963). In this sense, correla-
tional designs serve as an initial screening of variables before they are “passed through” to exper-
imentation; an absence of correlation precludes passing a variable along.
Bollen (1989) makes a distinction between demonstrating a causal relationship with “abso-
lute certainty” and perfect predictability with statements of “probabilistic” cause (pp. 41 and 43),
saying that for the former to be truly realized, two variables would have to coexist in a vacuum,
completely isolated from other influences, a condition he calls “impossible” (p. 41). A probabi-
listic cause exists if the occurrence of an event X increases the likelihood of another event Y,
without suggesting that Y must always follow X or that there is no other cause of Y.
There is at least some agreement in the literature about three requirements for an inference
of probabilistic cause from correlational analysis (Bollen, 1989; Kline, 2005; Schumacker &
Lomax, 1996): isolation, association, and direction of influence (Bollen). In addition, the pro-
posed causal relationship must correspond with a “plausible causal hypothesis and the absence
of plausible rival hypotheses to explain the correlation upon other grounds” (Campbell & Stan-
ley, 1963, p. 65).

Isolation
Isolation exists when one variable is demonstrated to be the function of another (or a set of
others) and no other influences. Bollen (1989) calls this relationship an “unobtainable ideal,”
and asserts that “pseudo-isolation” is an acceptable alternative. In a condition of pseudo-
isolation, Y is demonstrated to be the function of a set of exogenous variables plus disturbance,
an error term that represents the combined effect of all unknown influences. Further, the distur-
bance must be uncorrelated with the set of exogenous variables. Pseudo-isolation is violated
when a correlation exists between the disturbance term and one or more of the explanatory
variables. As a basis for inferring causal relations, the structural model must include all relevant
explanatory variables (i.e., the model must contain no specification error), those variables must
be measured reliably, and the disturbance terms must covary neither with each other nor with
other variables in the model (Bollen).

The Omitted Variable Problem


Given a desirable fit statistic, a researcher may assume that the model is completely specified
in that it includes all relevant variables. Unfortunately, fit indices are not necessarily sensitive to
this type of misspecification (Tomarken & Waller, 2005), and incorrect conclusions can be drawn
from well-fitting models that have omitted variables. Because the residual terms in a model will
account for any omitted variables, “the residual parameterizations afforded by SEM software can
mask the limitations of a rather incomplete model” (Tomarken & Waller, p. 49). Incomplete
models not only misrepresent the relationships among variables, but parameter estimates based on
these models are likely to be biased and standard errors inaccurate (Tomarken & Waller). Further,
an omitted variable “may account entirely for effects that are mistakenly attributed to variables
explicitly included in a model” (Tomarken & Waller, 2003, p. 584). Therefore, users of SEM must
assess the likelihood that their model omits important variables, and they should explicitly acknowl-
edge this possibility when reporting their results.

Association
The second requirement is that of association. Specifically, once pseudo-isolation is estab-
lished, the correlation between a dependent variable and its proposed causal variable, after con-
trolling for all other influences (i.e., the partial correlation), must be nonzero. Further, the degree
of association must be estimated under appropriate model assumptions. Most basically, research-
ers should ensure that data structures meet the necessary model assumptions before making infer-
ences about parameters. In addition, the researcher must verify a linear relationship between the
pair of relevant variables (or appropriately model a nonlinear relationship), must account for the
nature of any missing data, and must examine heteroskedasticity of disturbances and multicollin-
earity among predictors as potential sources of error in this parameter estimation (Bollen, 1989).
Following that, the estimated partial correlation between the proposed causal variable and the
dependent variable must be statistically significant, a necessary but insufficient condition for causal inference.
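As an illustrative sketch (ours, with hypothetical variable indices), a partial correlation controlling for all remaining variables in a set can be obtained from the inverse of the correlation matrix:

```python
# Partial correlation of variables i and j, controlling for all other variables.
import numpy as np

def partial_corr(R, i, j):
    """R: full-rank correlation (or covariance) matrix; i, j: variable indices."""
    P = np.linalg.inv(R)                         # precision matrix
    return -P[i, j] / np.sqrt(P[i, i] * P[j, j])
```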
Direction of Causation
Temporal precedence is critical in determining the direction of variable influence (Bollen,
1989). Simply put, the purported causal variable must precede the event that it “caused.” In the absence
of experimental manipulation, observations over time “are essential” in establishing this antecedent–
consequence relationship (Campbell & Stanley, 1963, p. 65), and SEM with longitudinal data
provides “the strongest case” for causal conclusions (Biddle & Marlin, 1987, p. 5).
Once the researcher is confident that the specified structural model meets these three condi-
tions and that a plausible hypothesis supports the inferred relationship, he or she must then rule out the
plausibility of rival hypotheses (Campbell & Stanley, 1963) and models. This is generally the most
difficult criterion to meet with nonexperimental data. If the model survives to this point, it may be
reasonable to conclude, probabilistically, that a causal relationship has been demonstrated. How-
ever, any statements to that effect must expressly state the tentativeness and degree of (un)cer-
tainty of the inference. In other words, correlational data may be suggestive of causal relations.
However, they cannot establish these relations (Cliff, 1983). Finally, design plays a crucial role in
the ability to make causal inferences. “The primary basis for causal inference in structural equa-
tion modeling is the same as the basis for causal inference in any other statistical technique; the
design of the data collection. Strong designs lead to strong causal inferences, weak designs to
weak inferences” (Kelloway, 1995, p. 216).


Conclusion
Models can be useful if they are not grossly wrong—useful for prediction, for testing and
developing theories, for clarifying the nature of the world. (MacCallum, 2001, p. 136)
Structural equation modeling has been both demonized and canonized in recent literature
(Meehl & Waller, 2002). In reality, SEM is not a magical technique that transforms correlational
data into uncontestable models. Rather, SEM is “a cutting edge statistical technique that is subject
to some very old and familiar problems, constraints, and misconceptions” (Tomarken & Waller,
2005, p. 56). When applied and interpreted correctly, SEM is an invaluable tool that helps us to
make sense of the complexities of our world, and SEM offers several advantages over traditional
statistical techniques. However, SEM does not replace the need for good design and sound judg-
ment. “No amount of sophisticated analyses can strengthen the inference obtainable from a weak
design” (Kelloway, 1995, p. 216), and no analytic method can replace the need for critical appraisal
and common sense. SEM allows researchers a great deal of flexibility and control over their
analyses. This flexibility may be used or misused. Therefore, producers and consumers of SEM should
realistically evaluate the strengths and limitations of this technique and should interpret the results
of SEM analyses cautiously and thoughtfully.

References
Baron, R.M., & Kenny, D.A. (1986). The moderator-mediator variable distinction in social psychological research: Con-
ceptual, strategic, and statistical considerations. Journal of Personality and Social Psychology, 51, 1173–1182.
Bentler, P.M., & Raykov, T. (2000). On measures of explained variance in non-recursive structural equation models.
Journal of Applied Psychology, 85, 125–131.
Biddle, B.J., & Marlin, M.M. (1987). Causality, confirmation, credulity, and structural equation modeling. Child Devel-
opment, 58, 4–17.
Bollen, K.A. (1989). Structural equations with latent variables. New York: John Wiley & Sons.
Bollen, K.A., & Long, J.S. (1993). Introduction. In K.A. Bollen & J.S. Long (Eds.) Testing structural equation models
(pp. 1–9). Newbury Park, CA: Sage.
Brannick, M.T. (1995). Critical comments on applying covariance structure modeling. Journal of Organizational Behavior,
16, 201–213.
Breckler, S.J. (1990). Applications of covariance structure modeling in psychology: Cause for concern. Psychological
Bulletin, 107, 260–273.
Browne, M.W., & Cudeck, R. (1993). Alternative ways of assessing model fit. In K.A. Bollen & J.S. Long (Eds.) Testing
structural equation models (pp. 136–162). Newbury Park, CA: Sage.
Browne, M.W., MacCallum, R.C., Kim, C., Anderson, B.L., & Glaser, R. (2002). When fit indices and residuals are
incompatible. Psychological Methods, 7, 403–421.
Campbell, D.T., & Kenny, D.A. (1999). Regression artifacts. New York: Guilford Press.
Campbell, D.T., & Stanley, J.C. (1963). Experimental and quasi-experimental designs for research. Dallas, TX:
Houghton-Mifflin.
Cliff, N. (1983). Some cautions concerning the application of causal modeling methods. Multivariate Behavioral Research,
18, 115–126.
Dolan, C.V., Wicherts, J.M., & Molenaar, P.C.M. (2004). A note on the relationship between the number of indicators and
their reliability in detecting regression coefficients in latent regression analysis. Structural Equation Modeling, 11,
210–216.
Fornell, C., & Larcker, D.F. (1981). Evaluating structural equation models with unobservable variables and measurement
error. Journal of Marketing Research, 18, 39–50.
Green, S.B., & Babyak, M.A. (1997). Control of type I errors with multiple tests of constraints in structural equation
modeling. Multivariate Behavioral Research, 32, 39–51.
Hancock, G.R. (2000). A sequential Scheffe-type respecification procedure for controlling Type I error in exploratory
structural equation model modification. Structural Equation Modeling, 6, 158–168.
Hershberger, S.L. (2006). The problem of equivalent structural models. In G.R. Hancock & R.O. Mueller (Eds.), Structural
equation modeling: A second course. Greenwich, CT: Information Age Publishing.
Hoyle, R.H. (1995). The structural equation modeling approach: Basic concepts and fundamental issues. In R.H. Hoyle
(Ed.), Structural equation modeling (pp. 158–176). Thousand Oaks, CA: Sage.


Kazantzis, N., Ronan, K.R., & Deane, F.P. (2001). Concluding causation from correlation: A comment on Burns and
Spangler (2000). Journal of Consulting and Clinical Psychology, 69, 1079–1083.
Kelloway, E.K. (1995). Structural equation modeling in perspective. Journal of Organizational Behavior, 16, 215–224.
Kline, R.B. (1998). Principles and practice of structural equation modeling. New York: Guilford Press.
Kline, R.B. (2004). Beyond significance testing. Washington, DC: American Psychological Association.
Kline, R.B. (2005). Principles and practice of structural equation modeling (2nd ed.). New York: Guilford Press.
Lee, S., & Hershberger, S. (1990). A simple rule for generating equivalent models in covariance structure modeling.
Multivariate Behavioral Research, 25, 313–334.
MacCallum, R.C. (1986). Specification searches in covariance structure modeling. Psychological Bulletin, 100, 107–120.
MacCallum, R.C. (2001). Working with imperfect models. Multivariate Behavioral Research, 38, 113–139.
MacCallum, R.C., Roznowski, M., & Necowitz, L.B. (1992). Model modifications in covariance structure analysis: The
problem of capitalization on chance. Psychological Bulletin, 111, 490–504.
MacCallum, R.C., Wegener, D.T., Uchino, B.N., & Fabrigar, L.R. (1993). The problem of equivalent models in applica-
tions of covariance structure analysis. Psychological Bulletin, 114, 185–199.
McDonald, R.P. (2004). The specific analysis of structural equation models. Multivariate Behavioral Research, 39, 687–713.
Meehl, P.E., & Waller, N.G. (2002). The path analysis controversy: A new statistical approach to strong appraisal of
verisimilitude. Psychological Methods, 7, 283–300.
Raykov, T., & Marcoulides, G.A. (2000). A first course in structural equation modeling. Mahwah, NJ: Erlbaum.
Raykov, T., & Penov, S. (1999). On structural equation model equivalence. Multivariate Behavioral Research, 34, 199–244.
Schumacker, R.E., & Lomax, R.G. (1996). A beginner’s guide to structural equation modeling. Mahwah, NJ: Erlbaum.
Stelzl, I. (1986). Changing a causal hypothesis without changing the fit: Some rules for generating equivalent path models.
Multivariate Behavioral Research, 21, 309–331.
Tomarken, A.J., & Waller, N.G. (2003). Potential problems with well-fitting models. Journal of Abnormal Psychology, 112,
578–598.
Tomarken, A.J., & Waller, N.G. (2005). Structural equation modeling: Strengths, limitations, and misconceptions. Annual
Review of Clinical Psychology, 1, 31–65.
Wilkinson, L., and the Task Force on Statistical Inference. (1999). Statistical methods in psychology journals: Guidelines
and explanations. American Psychologist, 54, 594–608.
