
Week 14

Chapter 16
Partial Correlation and Multiple Regression and Correlation

In This Presentation
Partial correlations
Multiple regression
Using the multiple regression line to predict Y
The multiple correlation coefficient (R²)
Limitations of multiple regression and correlation

Introduction
Multiple regression and correlation allow us to:
1. Disentangle and examine the separate effects of the independent variables.
2. Use all of the independent variables to predict Y.
3. Assess the combined effects of the independent variables on Y.

Partial Correlation
Partial correlation measures the correlation between X and Y while controlling for a third variable, Z.
Comparing the bivariate (zero-order) correlation to the partial (first-order) correlation allows us to determine whether the relationship between X and Y is direct, spurious, or intervening.
Interaction cannot be detected with partial correlations.

Partial Correlation
Note the subscripts in the symbol for a partial correlation coefficient:
r_xy.z
which indicates that the correlation coefficient is for X and Y, controlling for Z. The first-order partial can be computed from the three zero-order correlations:
r_xy.z = (r_xy - (r_xz)(r_yz)) / √[(1 - r_xz²)(1 - r_yz²)]
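As a quick illustration of this formula, here is a minimal Python sketch; the three zero-order correlations passed in at the bottom are hypothetical values, not taken from any example in this chapter:

```python
import math

def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation r_xy.z: the correlation
    between X and Y after controlling for Z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))

# Hypothetical zero-order correlations, for illustration only
print(round(partial_corr(r_xy=0.50, r_xz=0.30, r_yz=0.25), 2))  # 0.46
```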

Partial Correlation
Example
The example uses husbands' hours of housework per week (Y), number of children (X), and husbands' years of education (Z) for a sample of 12 dual-career households.

Partial Correlation
Example
From the correlation matrix, the bivariate (zero-order) correlation between husbands' housework and number of children is +0.50, indicating a positive relationship.

Partial Correlation
Example
Calculating the partial (first-order) correlation between husbands' housework and number of children, controlling for husbands' years of education, yields +0.43.

Partial Correlation
Example
Comparing the bivariate correlation (+0.50) to the partial correlation (+0.43) shows little change.
The relationship between number of children and husbands' housework changes very little once husbands' education is controlled.
Therefore, we have evidence of a direct relationship.

Multiple Regression
Previously, the bivariate regression equation was:
Y = a + bX
In the multivariate case, the regression equation becomes:
Y = a + b1X1 + b2X2

Multiple Regression
Y = a + b1X1 + b2X2
Notation:
a is the Y intercept, where the regression line crosses the Y axis
b1 is the partial slope for X1 on Y: the change in Y for a one-unit change in X1, controlling for X2
b2 is the partial slope for X2 on Y: the change in Y for a one-unit change in X2, controlling for X1
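To see the partial slopes at work in prediction, here is a minimal sketch; the intercept and slopes below are made-up numbers, not estimates from any dataset:

```python
# Hypothetical coefficients for Y = a + b1*X1 + b2*X2
a, b1, b2 = 2.0, 0.5, -0.3

def predict(x1, x2):
    """Predicted Y for given values of X1 and X2."""
    return a + b1 * x1 + b2 * x2

print(predict(x1=4, x2=10))  # 2.0 + 0.5*4 - 0.3*10 = 1.0
```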

Multiple Regression using SPSS
Suppose we are interested in the link between daily calorie intake and female life expectancy in developing countries.
Suppose further that we wish to look at other variables that might predict female life expectancy.
One way to do this is to add additional variables to the equation and conduct a multiple regression analysis: e.g., literacy rates, on the assumption that people who can read can access health and medical information.

Multiple Regression using SPSS: Steps to Set Up the Analysis
In the Data Editor, go to Analyze / Regression / Linear and click Reset
Put Average Female Life Expectancy into the Dependent box
Put Daily Calorie Intake and People who Read (%) into the Independent(s) box
Under Statistics, select Estimates, Confidence Intervals, Model Fit, Descriptives, Part and Partial Correlations, R Square Change, and Collinearity Diagnostics, then click Continue
Under Options, check Include Constant in the Equation, click Continue and then OK
Compare your output to the next several slides; a Python equivalent is sketched below
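If you want to reproduce the same analysis outside SPSS, a hedged sketch using pandas and statsmodels follows; the file name and column names are assumptions and must be adjusted to match your copy of the data:

```python
import pandas as pd
import statsmodels.api as sm

# File and column names are hypothetical -- rename to match your data file
df = pd.read_csv("world95.csv")
y = df["female_life_expectancy"]
X = sm.add_constant(df[["daily_calorie_intake", "pct_who_read"]])

model = sm.OLS(y, X, missing="drop").fit()
print(model.summary())        # coefficients, t tests, R-square, F test
print(model.conf_int(0.05))   # 95% confidence intervals for B
```

Unlike SPSS, the summary() output does not report standardized betas; those can be obtained by z-scoring every variable before fitting.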

Interpreting Your SPSS Multiple Regression Output
First let's look at the zero-order (pairwise) correlations between Average Female Life Expectancy (Y), Daily Calorie Intake (X1), and People who Read (X2). Note that these are .776 for Y with X1, .869 for Y with X2, and .682 for X1 with X2.
Correlations

                                         Avg. female       Daily calorie   People who
                                         life expectancy   intake          read (%)
Pearson      Avg. female life expectancy      1.000            .776            .869
Correlation  Daily calorie intake              .776           1.000            .682
             People who read (%)               .869            .682           1.000
Sig.         Avg. female life expectancy        .              .000            .000
(1-tailed)   Daily calorie intake              .000             .              .000
             People who read (%)               .000            .000             .

N = 74 for every cell. Here r_YX1 = .776, r_YX2 = .869, and r_X1X2 = .682.

Examining the Regression Weights

Coefficients (Model 1)

                       B       Std. Error  Beta   t      Sig.  95% CI for B      Zero-order  Partial  Part  Tolerance  VIF
(Constant)             25.838  2.882              8.964  .000  [20.090, 31.585]
People who read (%)    .315    .034        .636   9.202  .000  [.247, .383]      .869        .738     .465  .535       1.868
Daily calorie intake   .007    .001        .342   4.949  .000  [.004, .010]      .776        .506     .250  .535       1.868

a. Dependent Variable: Average female life expectancy

Above are the raw (unstandardized) and standardized regression weights for the regression of female life expectancy on daily calorie intake and percentage of people who read.
The standardized regression coefficient (beta weight) for daily calorie intake is .342.
The beta weight for percentage of people who read is much larger, .636. This weight means that for every one-standard-deviation increase in the percentage of people who read, Y (female life expectancy) is predicted to increase by .636 standard deviations, controlling for calorie intake.
Note that both beta coefficients are significant at p < .001.
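A beta weight is simply the unstandardized slope rescaled by the two standard deviations (beta = b × s_x / s_y). A quick sketch of that relationship; the standard deviations below are hypothetical, chosen only to illustrate the arithmetic:

```python
b_read = 0.315             # unstandardized B for people who read, from the output
sd_x, sd_y = 23.0, 11.4    # hypothetical sample standard deviations

beta_read = b_read * sd_x / sd_y
print(round(beta_read, 3))  # ~.636 with these illustrative values
```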

R, R Square, and the SEE


Model Summary

Model 1:  R = .905   R Square = .818   Adjusted R Square = .813   Std. Error of the Estimate = 4.948
Change statistics:  R Square Change = .818   F Change = 159.922   df1 = 2   df2 = 71   Sig. F Change = .000

a. Predictors: (Constant), People who read (%), Daily calorie intake

Above is the model summary, which contains some important statistics. It gives us R and R square for the regression of Y (female life expectancy) on the two predictors. R is .905, which is a very high correlation. R square tells us what proportion of the variation in female life expectancy is explained by the two predictors, a very high .818. It also gives us the standard error of the estimate, which summarizes how far the observed values of Y typically fall from the values the regression equation predicts.
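R square can be checked by hand from the ANOVA table on the next slide: it is the regression sum of squares divided by the total sum of squares.

```python
ss_regression = 7829.451   # from the ANOVA table
ss_total = 9567.459

print(round(ss_regression / ss_total, 3))  # 0.818, matching R Square above
```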

F Test for the Significance of the Regression Equation
ANOVA

Model 1       Sum of Squares   df   Mean Square   F         Sig.
Regression    7829.451          2   3914.726      159.922   .000
Residual      1738.008         71     24.479
Total         9567.459         73

a. Predictors: (Constant), People who read (%), Daily calorie intake
b. Dependent Variable: Average female life expectancy

Next we look at the F test of the significance of the regression equation (in standardized form, Zy = .342 Zx1 + .636 Zx2). Is this so much better a predictor of female life expectancy (Y) than simply using the mean of Y that the difference is statistically significant? The F test is a ratio of the mean square for the regression equation to the mean square for the residual (the departures of the actual scores on Y from what the regression equation predicted). In this case we have a very large value of F, which is significant at p < .001. Thus it is reasonable to conclude that our regression equation is a significantly better predictor than the mean of Y.
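The arithmetic behind that F value is just the ratio of the two mean squares from the ANOVA table, as a one-line check shows:

```python
ms_regression = 3914.726   # 7829.451 / 2  (df1 = 2)
ms_residual = 24.479       # 1738.008 / 71 (df2 = 71)

print(round(ms_regression / ms_residual, 3))  # 159.922, matching the output
```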

Confidence Intervals around the Regression Weights
Coefficients (Model 1)

                       B       Std. Error  Beta   t      Sig.  95% CI for B      Zero-order  Partial  Part
(Constant)             25.838  2.882              8.964  .000  [20.090, 31.585]
Daily calorie intake   .007    .001        .342   4.949  .000  [.004, .010]      .776        .506     .250
People who read (%)    .315    .034        .636   9.202  .000  [.247, .383]      .869        .738     .465

a. Dependent Variable: Average female life expectancy

Finally, your output provides confidence intervals around the unstandardized regression coefficients. Thus we can say with 95% confidence that the unstandardized weight to apply to daily calorie intake to predict female life expectancy ranges between .004 and .010, and that the unstandardized weight to apply to percentage of people who read ranges between .247 and .383.
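Each interval is B plus or minus the critical t value (df = 71) times the coefficient's standard error. A sketch using scipy; note that the rounded B and Std. Error from the table give slightly narrower bounds than SPSS, which works with unrounded values:

```python
from scipy import stats

b, se, df = 0.007, 0.001, 71        # daily calorie intake, from the output
t_crit = stats.t.ppf(0.975, df)     # two-tailed 95% critical value, ~1.99

lower, upper = b - t_crit * se, b + t_crit * se
print(round(lower, 3), round(upper, 3))  # ~0.005 0.009 (SPSS: .004, .010)
```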


Limitations
Multiple regression and correlation are among the most powerful
techniques available to researchers. But powerful techniques have
high demands.
These techniques require:
Every variable is measured at the interval-ratio level
Each independent variable has a linear relationship with the
dependent variable
Independent variables do not interact with each other
Independent variables are uncorrelated with each other
When these requirements are violated (as they often are), these
techniques will produce biased and/or inefficient estimates. There
are more advanced techniques available to researchers that can
correct for violations of these requirements. Such techniques are
beyond the scope of this text.
