
Multiple regression analysis

Prof. Rakesh Pandey


Department of Psychology
Banaras Hindu University
Multivariate thinking
• Herbert Simon (1969) states that “we are
attempting to find the basic simplicity in the overt
complexity of life.”
• Margaret Wheatley (1994), a social scientist
working with organizations, suggests that “we are
seeking to uncover the latent order in a system.”
• This search for simplicity and latent order becomes
much more attainable when approached with a
mindset of multivariate thinking.
• Multivariate thinking is defined as a body of
thought processes that illuminate
interrelatedness between and within sets of
variables.
Multivariate statistics
• Example of univariate and bivariate methods
• MV statistics may be considered an extension of
univariate or bivariate statistics
• In this sense, MV statistics are the complete or general
case, whereas univariate and bivariate methods are
special cases of this general MV model
• In general, MV methods involve the analysis of many
variables
• Thus, MV provides tools to analyze a many-variable
research problem in a single analysis rather than through
a series of univariate or bivariate analyses
• BUT MV cannot be defined only in terms of the number of
IVs and DVs
Multivariate statistics
• Some features taken together can help us understand
MV
– The number of variables must be more than that contained
in a univariate or bivariate analysis
– Some subsets of these variables must be analyzed together
(i.e., combined in some manner)
– But which kind of variables has to be combined is a matter
of debate.
• For example, Stevens (2002) counts only DVs (this excludes MRA, FA,
etc.)
• Tabachnick & Fidell (2001) require combining more than one of
each type of variable (i.e., IV and DV). According to them, “With
multivariate statistics, you simultaneously analyze multiple
dependent and multiple independent variables. This capability is
important in both nonexperimental (correlational or survey) and
experimental research.”
– Both of these definitions in terms of the number of IVs or DVs
are problematic
Multivariate Statistics
• Why is there such confusion in defining MV?
• My speculation: a lack of distinction between variable and
variate
• Variables are directly measured in the process of data
collection, whereas variates are not directly measured in
the process of data collection
• Variates are weighted composites of two or more
directly measured variables
• Such weighted composites are often called variates,
composite variables, or synthetic variables (Grimm &
Yarnold, 2000).
• The most commonly used method of forming a composite
is a linear combination of weighted variables (composite
= w1X1 + w2X2 + …)
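
A minimal sketch of forming such a variate in Python (the data and the weights w1, w2 are illustrative, not from the slide):

```python
import numpy as np

# Scores of five participants on two directly measured variables (illustrative)
X1 = np.array([3.0, 5.0, 2.0, 4.0, 6.0])
X2 = np.array([1.0, 4.0, 2.0, 5.0, 3.0])

# Weights (chosen arbitrarily here; a real analysis estimates them)
w1, w2 = 0.6, 0.4

# The variate: a weighted linear combination of the measured variables
composite = w1 * X1 + w2 * X2
print(composite)
```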
Multivariate Statistics
• Thus, taking the meaning of variate into
account, MV statistics may be defined as
statistical methods that combine a set of
variables in the process of analysis.
• Multivariate analysis or statistics involves
combining variables to form composite
variables (Meyers, Gamst & Guarino, …).
• The composite variables may comprise IVs (e.g.,
MRA) and/or DVs (e.g., MANOVA), or the variables
may be combined without making any
distinction between IV and DV (e.g., factor analysis).
Benefits of MV
• Our thinking is stretched to embrace a larger context in which
we can envision more complex and realistic theories and models
than could be rendered with univariate methods.
• It provides tools to analyze data when there are many IVs
and/or DVs.
• A thorough grounding in multivariate thinking helps us
understand others' research, giving us a richer understanding
when reading the literature.
• Multivariate thinking helps expand our capabilities by informing
the application of these methods in our own research.
• Multivariate thinking enables researchers to examine large sets
of variables in an encompassing and integrated analysis, thereby
controlling the overall error rate and taking correlations
among variables into account.
• Multivariate thinking offers several assessment indices to
determine whether the overall (macro-level) analysis, as well as
specific parts (micro-level analyses), are behaving as expected.
Drawbacks of MV
• Statistical assumptions (e.g., normality, linearity,
and homoscedasticity) common to the general
linear model (McCullagh & Nelder, 1989) must be
met for most multivariate methods.
• Many more participants are usually needed to
adequately test a multivariate design compared
with smaller univariate studies.
• Interpretation of results from a multivariate
analysis may be difficult because of having
several layers to examine.
• Some researchers assume that multivariate
methods are too complex to be worth the time to
learn (an inaccurate perception)
Regression: Basics
• Correlation provides important information for
making predictions
• Regression analysis is a statistical tool for predicting
a variable based on its relationship with one or
more other variables
• The variable being predicted is called the criterion or
outcome variable
• The variables used to predict the outcome
variable are called predictor (or explanatory) variables
• Regression analysis is a way of predicting an
outcome variable from one predictor variable
(simple regression) or several predictor variables
(multiple regression)
Regression: Basics
• Correlation & regression go hand in hand
• Correlation provides information about the degree and
direction of correspondence between two or more
variables
• It is, however, limited to the given data set
• Regression analysis, on the other hand, uses the information
contained in the correlation and fits a model to the obtained
data, allowing us to go beyond the data to predict
outcomes on the basis of one or more predictors
Outcome = model + error
• The model we fit in a regression analysis is linear
• Thus the ‘model’ in the above equation can be replaced by
the things that define a line
• There are two things that define a line:
– 1. Slope (or gradient) & 2. Intercept; thus the model can be
replaced by the equation: Yi = (a + bXi) + εi
Bivariate regression equation
[Figure: scatter plot of Y-values against X with the best-fit line Y = a + bX; b is the slope and a the intercept]
Drawing best fit line: Least square method
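
Because the best-fit line minimizes the sum of squared errors, the slope and intercept have simple closed forms. A minimal sketch with illustrative data:

```python
import numpy as np

# Illustrative (x, y) data
x = np.array([0.0, 2.0, 4.0, 6.0, 8.0])
y = np.array([1.2, 2.9, 4.1, 5.8, 7.1])

# Least-squares estimates: b = cov(x, y) / var(x); a = mean(y) - b * mean(x)
b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a = y.mean() - b * x.mean()

print(f"Best-fit line: Y = {a:.3f} + {b:.3f}X")
```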
Multiple regression analysis (MRA)
• MRA is a statistical technique used to predict a
quantitatively measured continuous variable
(the criterion, outcome, or DV) from a set of
quantitatively measured continuous and/or
categorical variables (the predictors or IVs).
• When the DV or criterion is categorical and the
predictors are continuous, another type of
regression analysis (logistic regression
analysis) is used to predict the criterion.
• Thus, MRA is an extension of bivariate
regression to several IVs or predictor variables
• In this sense, MRA is equivalent to a bivariate
regression with one DV (Y) and a linear weighted
composite of the set of IVs
• If the IVs are not correlated with each other, this
composite IV may be the simple sum of the
IV scores (i.e., weight = 1)
• However, this is rarely the case in the real world. Thus
the composite of the IVs is created by summing the
orthogonal components of the IVs
• The orthogonal component of an IV is that
component of the IV from which the contribution of
the other IVs has been partialled out (a small sketch follows).
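
A small sketch of what “partialling out” means operationally (simulated data): the orthogonal component of X1 with respect to X2 is the residual left after regressing X1 on X2.

```python
import numpy as np

rng = np.random.default_rng(0)
X2 = rng.normal(size=100)
X1 = 0.7 * X2 + rng.normal(size=100)   # X1 is correlated with X2

# Regress X1 on X2; the residual is the component of X1 orthogonal to X2
b = np.sum((X2 - X2.mean()) * (X1 - X1.mean())) / np.sum((X2 - X2.mean()) ** 2)
a = X1.mean() - b * X2.mean()
X1_orth = X1 - (a + b * X2)

# The orthogonal component is (essentially) uncorrelated with X2
print(np.corrcoef(X1_orth, X2)[0, 1])   # approximately 0
```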
Diagrammatic representation of MRA
Conceptual Understanding
[Diagram: predictors X1, X2, X3, and X4, each weighted, combine into a new linear combination X′ that predicts the dependent variable]
MRA equation: Raw score form
• Y′ = A + B1X1 + B2X2 + … + BkXk
Standardized regression equation
• zY′ = β1z1 + β2z2 + … + βkzk
MRA: Three variable example (criterion Y, predictors X1 & X2)
Partial correlation
• rY1.2 = (rY1 - rY2 r12) / √[(1 - rY2²)(1 - r12²)]
Semi-partial correlation
• rY(1.2) = (rY1 - rY2 r12) / √(1 - r12²)
Multiple correlation & regression
• R²Y.12 = (rY1² + rY2² - 2 rY1 rY2 r12) / (1 - r12²)
Standardized regression coefficient
• β1 = (rY1 - rY2 r12) / (1 - r12²); β2 = (rY2 - rY1 r12) / (1 - r12²)
Unstandardized regression coefficient
• B1 = β1(SY/S1); B2 = β2(SY/S2); A = Ȳ - B1X̄1 - B2X̄2
• Y = A + B1X1 + B2X2
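
A numerical sketch (simulated data) that recovers the regression from the correlations using the formulas above, so they can be checked against any software output:

```python
import numpy as np

rng = np.random.default_rng(1)
X1 = rng.normal(size=200)
X2 = 0.5 * X1 + rng.normal(size=200)
Y = 1.0 + 0.8 * X1 + 0.4 * X2 + rng.normal(size=200)

rY1 = np.corrcoef(Y, X1)[0, 1]
rY2 = np.corrcoef(Y, X2)[0, 1]
r12 = np.corrcoef(X1, X2)[0, 1]

# Standardized coefficients from the correlations
beta1 = (rY1 - rY2 * r12) / (1 - r12 ** 2)
beta2 = (rY2 - rY1 * r12) / (1 - r12 ** 2)

# Squared multiple correlation
R2 = (rY1 ** 2 + rY2 ** 2 - 2 * rY1 * rY2 * r12) / (1 - r12 ** 2)

# Back to the raw-score (unstandardized) form
B1 = beta1 * Y.std() / X1.std()
B2 = beta2 * Y.std() / X2.std()
A = Y.mean() - B1 * X1.mean() - B2 * X2.mean()

print(f"Y' = {A:.3f} + {B1:.3f}X1 + {B2:.3f}X2,  R2 = {R2:.3f}")
```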
Different strategies to perform MRA
Strategies of MRA: The different ways
of creating the variate
• The Standard (simultaneous) method
– Also called ‘direct method’
• The statistical method
– Forward
– Backward
– Step-wise
• The sequential (hierarchical) method
The standard method
• All the variables are entered into the equation in one
step
• In this method each predictor's contribution is assessed
after controlling for the effect of the remaining predictors
– For instance, X1 (controlled for X2 & X3), X2 (controlled for
X1 & X3), X3 (controlled for X1 & X2)
• The total variance explained is the sum of the variance
accounted for by the orthogonal components of the IVs
plus the shared variance
• It provides a full-model solution, i.e., each predictor is
part of the equation or regression model (see the sketch below)
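
A minimal sketch of the standard (direct) method using Python's statsmodels (simulated data; variable names are illustrative). All predictors enter in a single step, and each reported coefficient is already adjusted for the others:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 150
X = rng.normal(size=(n, 3))                       # three predictors X1-X3
y = 2.0 + X @ np.array([0.6, 0.3, 0.0]) + rng.normal(size=n)

# Standard (simultaneous) entry: all predictors in the model at once
design = sm.add_constant(X)                       # adds the intercept column
model = sm.OLS(y, design).fit()
print(model.summary())                            # the full-model solution
```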
Statistical Method
The variables are entered one by one based on some
statistical criterion
• Forward (one variable is added at a time; a simplified
sketch follows this slide)
– The predictor contributing most is entered first
– The first variable is retained in the equation, and then the
variable having the highest partial correlation (correlation
after partialling out the first variable) is entered next
– The process continues until the addition of a variable results in
a non-significant contribution
• Backward (one variable is removed at a time)
– Starts like the standard method (all variables entered together)
– Then the one variable whose removal results in the smallest,
non-significant change in R square is removed
– A new model is fit after removing that variable, and the
evaluation is repeated to identify the next variable whose
removal results in a non-significant decrease in R square
– The process continues until only significant predictors remain in
the model
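
A simplified sketch of forward entry (simulated data; real packages use F-to-enter criteria, approximated here by the t-test p-value of the newly added predictor):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 200
X = rng.normal(size=(n, 4))                       # four candidate predictors
y = 1.0 + 0.7 * X[:, 0] + 0.4 * X[:, 2] + rng.normal(size=n)

selected, remaining = [], list(range(X.shape[1]))
while remaining:
    # p-value of each candidate when added to the current model
    pvals = {}
    for j in remaining:
        design = sm.add_constant(X[:, selected + [j]])
        pvals[j] = sm.OLS(y, design).fit().pvalues[-1]
    best = min(pvals, key=pvals.get)
    if pvals[best] >= 0.05:        # stop: no significant contribution left
        break
    selected.append(best)
    remaining.remove(best)

print("Predictors entered (0-indexed):", selected)
```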
Statistical Method
• Stepwise (a combination of the forward and
backward methods)
– Begins with the forward method, including variables
whose addition increases R square significantly, and
stops adding variables when an addition results in a
non-significant increase in R square
– But unlike the forward method, in which removal of a
variable is not permitted, the stepwise method then
evaluates the unique contribution of each variable
already entered in the equation, and
– Removes any variable whose unique contribution is
no longer significant. Thus, from this point onward the
backward method steps in
Sequential method
• The variables in this method are also not entered
simultaneously
• The variables are entered in a sequence, BUT the entry
sequence is controlled by the researcher and not by
any statistical criterion or computer program (as is done
in the statistical approach).
• The entry sequence of the variables is based on some
empirical ground or theory
• Unlike the statistical method, in which one variable is
entered (or removed) at a time, the sequential method
allows one or more variables to be entered at each step
• Since a block (or set) of variables can be entered at
each step, this method is also called block entry
analysis.
Sequential method
• It is also called covariance analysis because
the variables entered at previous steps are
treated as covariates and their effect is
controlled
• The most common uses of this method
include (a minimal sketch of block entry follows this list)
– Controlling the effect of some covariates
– Moderated regression analysis
– Incremental validity of constructs
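
A minimal sketch of sequential (block) entry in Python (simulated data; "age" as the covariate block and "predictor" as the block of theoretical interest are illustrative names):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 200
age = rng.normal(40, 10, n)                 # block 1: covariate
predictor = rng.normal(size=n)              # block 2: predictor of interest
y = 0.05 * age + 0.6 * predictor + rng.normal(size=n)

# Step 1: covariates only
m1 = sm.OLS(y, sm.add_constant(np.column_stack([age]))).fit()
# Step 2: add the block of theoretical interest
m2 = sm.OLS(y, sm.add_constant(np.column_stack([age, predictor]))).fit()

print(f"R2 after block 1 = {m1.rsquared:.3f}")
print(f"R2 after block 2 = {m2.rsquared:.3f} (R2 change = {m2.rsquared - m1.rsquared:.3f})")
```

The R square change at step 2 is the incremental variance explained by the new block after controlling for the covariates.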
Types of research questions addressed
• Predicting one variable from combined knowledge of several
other variables (simultaneous)
• Exploring the relationship of one variable with a set of other
variables (simultaneous)
• Which variables (from a larger set) are better predictors
of a given criterion (statistical)
• How much better we can predict a criterion when we add one or
more predictor variables to the mix (statistical)
• How much variance in a criterion variable can be explained by a set
of predictor variables (simultaneous)
• Among a set of predictor variables, which variable accounts for the
largest amount of variance in the criterion (statistical)
• The relative importance of variables in predicting a criterion
(statistical)
• Examining the moderating effect of one or more variables
(sequential)
• How much variance one or more predictors explain after controlling
for a set of variables (sequential)
Basic assumptions
• Each of the metric variables is normally
distributed (both IVs & DVs)
• The relationships between metric variables are
linear
• Homoscedasticity
• Independence of errors (Durbin-Watson test; a quick
check is sketched below)
• Failing to satisfy the assumptions does not
mean that our answer is wrong; it means that
our solution may under-report the strength of
the relationships.
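
A quick sketch of checking two of these assumptions on a fitted model's residuals (simulated data): the Durbin-Watson statistic for independence of errors and a Shapiro-Wilk test for normality.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson
from scipy import stats

rng = np.random.default_rng(5)
X = rng.normal(size=(100, 2))
y = 1.0 + X @ np.array([0.5, 0.3]) + rng.normal(size=100)

resid = sm.OLS(y, sm.add_constant(X)).fit().resid

print("Durbin-Watson:", durbin_watson(resid))          # near 2 => independent errors
print("Shapiro-Wilk p:", stats.shapiro(resid).pvalue)  # > .05 => normality not rejected
```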
Other issues in performing MRA
• Multicollinearity is a problem in regression
analysis that occurs when two (or more) independent
variables are highly correlated, e.g., r = 0.90 or
higher (a simple screening check is sketched after this list).
• The relationship between the independent
variables and the dependent variable is distorted
by the very strong relationship among the
independent variables, making it likely
that our interpretation of the relationships will be
incorrect.
• In the worst case, if the variables are perfectly
correlated, the regression cannot be computed.
• Problems of outliers and missing values
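
A simple multicollinearity screen (simulated data): inspect the predictor intercorrelations, and optionally the variance inflation factor (VIF), a common diagnostic that goes beyond the slide's pairwise-r rule of thumb:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(6)
X1 = rng.normal(size=100)
X2 = 0.95 * X1 + 0.05 * rng.normal(size=100)   # nearly redundant with X1
X3 = rng.normal(size=100)
X = np.column_stack([X1, X2, X3])

# Pairwise correlations: |r| around .90 or higher signals trouble
print(np.round(np.corrcoef(X, rowvar=False), 2))

# VIF per predictor (constant excluded); values above ~10 are often flagged
Xc = sm.add_constant(X)
print([round(variance_inflation_factor(Xc, i), 1) for i in range(1, Xc.shape[1])])
```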
Data and other requirements
• The criterion variable should be measured on a continuous
scale (i.e., an interval or ratio scale).
• The predictors should be measured on a ratio, interval, or
ordinal scale. Nominal data are not allowed, except
dichotomous variables (which can be dummy coded; see the
sketch below)
• Multiple regression requires a large number of
observations.
• The number of cases (participants) must substantially
exceed the number of predictor variables used in the
regression.
• The absolute minimum is five times as many
participants as predictor variables.
• A more acceptable ratio is 10:1, but some argue
that this should be as high as 40:1 for some statistical
selection methods
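
To illustrate the dichotomous exception (names and data illustrative): a two-category nominal variable can enter the regression once it is coded 0/1.

```python
import pandas as pd

df = pd.DataFrame({"gender": ["male", "female", "female", "male"]})
# A dichotomous nominal predictor enters the equation as a 0/1 dummy code
df["gender_female"] = (df["gender"] == "female").astype(int)
print(df)
```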
Tabulating & Interpreting MRA results
• Use a real data analysis to demonstrate (a minimal stand-in is sketched below)
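
Since the slide defers to a live demonstration, here is a minimal stand-in (simulated data): fit the model and extract the quantities usually tabulated (B, SE, t, p, and the overall R square and F).

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 120
df = pd.DataFrame({"X1": rng.normal(size=n), "X2": rng.normal(size=n)})
df["Y"] = 1.0 + 0.5 * df["X1"] + 0.3 * df["X2"] + rng.normal(size=n)

model = sm.OLS(df["Y"], sm.add_constant(df[["X1", "X2"]])).fit()

# Coefficient table in the form usually reported
table = pd.DataFrame({
    "B": model.params, "SE": model.bse,
    "t": model.tvalues, "p": model.pvalues,
})
print(table.round(3))
print(f"R2 = {model.rsquared:.3f}, F = {model.fvalue:.2f}, p = {model.f_pvalue:.4f}")
```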