
Principal Components Analysis with SPSS


Karl L. Wuensch
Dept of Psychology
East Carolina University

When to Use PCA


You have a set of p continuous variables.
You want to repackage their variance into
m components.
You will usually want m to be < p, but not
always.

Components and Variables


Each component is a weighted linear combination of the variables:

$C_i = W_{i1}X_1 + W_{i2}X_2 + \dots + W_{ip}X_p$

Each variable is a weighted linear combination of the components:

$X_j = A_{1j}C_1 + A_{2j}C_2 + \dots + A_{mj}C_m$

Factors and Variables


In Factor Analysis, we exclude from the solution any variance that is unique, not shared by the variables.

$X_j = A_{1j}F_1 + A_{2j}F_2 + \dots + A_{mj}F_m + U_j$

$U_j$ is the unique variance for $X_j$.

Goals of PCA and FA


Data reduction.
Discover and summarize the pattern of intercorrelations among variables.
Test theory about the latent variables underlying a set of measured variables.
Construct a test instrument.
There are many other uses of PCA and FA.

Data Reduction
Ossenkopp and Mazmanian (Physiology and Behavior, 34: 935-941).
19 behavioral and physiological variables.
A single criterion variable: physiological response to four hours of cold-restraint.
Extracted five factors.
Used multiple regression to develop a model for predicting the criterion from the five factors.

Exploratory Factor Analysis


Want to discover the pattern of intercorrelations among variables.
Wilt et al., 2005 (thesis).
Variables are items on the SOIS at ECU.
Found two factors: one evaluative, one on the difficulty of the course.
Compared FTF students to DE students on structure and means.

Confirmatory Factor Analysis


Have a theory regarding the factor structure for a set of variables.
Want to confirm that the theory describes the observed intercorrelations well.
Thurstone: intelligence consists of seven independent factors rather than one global factor.
Often done with SEM software.

Construct A Test Instrument


Write a large set of items designed to test the constructs of interest.
Administer the survey to a sample of persons from the target population.
Use FA to help select those items that will be used to measure each of the constructs of interest.
Use Cronbach's alpha to check the reliability of the resulting scales (see the sketch below).
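A minimal sketch of Cronbach's alpha in Python (numpy only; the function and variable names are illustrative, not from the original materials):

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_respondents, n_items) score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]                          # number of items in the scale
    item_vars = items.var(axis=0, ddof=1)       # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of the total score
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)
```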

An Unusual Use of PCA


Poulson, Braithwaite, Brondino, and Wuensch
(1997, Journal of Social Behavior and
Personality, 12, 743-758).

Simulated jury trial: a seemingly insane defendant killed a man.
Criterion variable = recommended verdict:
Guilty
Guilty But Mentally Ill
Not Guilty By Reason of Insanity

Predictor variables = jurors' scores on 8 scales.
Discriminant function analysis.
Problem with multicollinearity.
Used PCA to extract eight orthogonal components.
Predicted recommended verdict from these 8 components.
Transformed results back to the original scales (sketched below).
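The back-transformation step can be sketched in Python. This is a hypothetical illustration, with ordinary regression standing in for the discriminant analysis; it is not the authors' actual code:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))              # stand-in for the 8 juror scales
y = X @ rng.normal(size=8) + rng.normal(size=200)

Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardize predictors
eigvals, V = np.linalg.eigh(np.corrcoef(Z, rowvar=False))
V = V[:, np.argsort(eigvals)[::-1]]        # component weights, orthogonal

scores = Z @ V                             # uncorrelated component scores
b, *_ = np.linalg.lstsq(scores, y - y.mean(), rcond=None)
beta_original = V @ b                      # coefficients back on the 8 scales
```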

A Simple, Contrived Example


Consumers rate importance of seven
characteristics of beer.
low Cost
high Size of bottle
high Alcohol content
Reputation of brand
Color
Aroma
Taste

FACTBEER.SAV at http://core.ecu.edu/psyc/wuenschk/SPSS/SPSS-Data.htm
Analyze, Data Reduction, Factor.
Scoot the beer variables into the box.

Click Descriptives and then check Initial Solution, Coefficients, KMO and Bartlett's Test of Sphericity, and Anti-image. Click Continue.

Click Extraction and then select Principal Components, Correlation Matrix, Unrotated Factor Solution, Scree Plot, and Eigenvalues Over 1. Click Continue.

Click Rotation. Select Varimax and Rotated Solution. Click Continue.

Click Options. Select Exclude Cases Listwise and Sorted By Size. Click Continue.

Click OK, and SPSS completes the Principal Components Analysis.
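For readers who prefer syntax to menus, here is a rough Python equivalent of the extraction step. It assumes the FACTBEER data have been exported to a CSV; the file and column names are guesses:

```python
import numpy as np
import pandas as pd

# Hypothetical export of FACTBEER.SAV; columns assumed to be
# cost, size, alcohol, reputat, color, aroma, taste.
df = pd.read_csv("factbeer.csv")
R = df.corr().to_numpy()                  # correlation matrix

eigvals, V = np.linalg.eigh(R)            # eigen-decomposition of R
order = np.argsort(eigvals)[::-1]         # sort roots largest first
eigvals, V = eigvals[order], V[:, order]

loadings = V * np.sqrt(eigvals)           # unrotated component loadings
print(eigvals, eigvals / len(eigvals))    # eigenvalues and their proportions
```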

Checking for Unique Variables 1


Check the correlation matrix.
If there are any variables not well
correlated with some others, might as well
delete them.

Checking for Unique Variables 2


Correlation Matrix

         cost    size  alcohol reputat  color   aroma   taste
cost     1.00    .832    .767   -.406    .018   -.046   -.064
size      .832  1.00     .904   -.392    .179    .098    .026
alcohol   .767   .904   1.00    -.463    .072    .044    .012
reputat  -.406  -.392   -.463   1.00    -.372   -.443   -.443
color     .018   .179    .072   -.372   1.00     .909    .903
aroma    -.046   .098    .044   -.443    .909   1.00     .870
taste    -.064   .026    .012   -.443    .903    .870   1.00

Checking for Unique Variables 3


Bartlett's test of sphericity tests the null hypothesis that the correlation matrix is an identity matrix, but it does not help identify individual variables that are not well correlated with others. (A computational sketch follows the table.)
KMO and Bartlett's Test

Kaiser-Meyer-Olkin Measure of Sampling Adequacy       .665
Bartlett's Test of Sphericity   Approx. Chi-Square  1637.9
                                df                      21
                                Sig.                  .000
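The chi-square statistic can be computed directly from the correlation matrix with the standard formula; a sketch (not SPSS's internal routine):

```python
import numpy as np
from scipy.stats import chi2

def bartlett_sphericity(R, n):
    """Bartlett's test of sphericity for correlation matrix R, sample size n."""
    p = R.shape[0]
    statistic = -(n - 1 - (2 * p + 5) / 6) * np.log(np.linalg.det(R))
    df = p * (p - 1) / 2
    return statistic, df, chi2.sf(statistic, df)   # chi-square, df, p value
```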

Checking for Unique Variables 4


For each variable, check the R2 between it and the remaining variables.
SPSS reports these as the initial communalities when you do a principal axis factor analysis.
Delete any variable with a low R2.

Checking for Unique Correlations


Look at the partial correlations: pairs of variables with large partial correlations share variance with one another but not with the remaining variables, and this is problematic.
Kaiser's MSA will tell you, for each variable, how much of this problem exists.
The smaller the MSA, the greater the problem.

Checking for Unique Correlations 2


An MSA of .9 is marvelous; .5 is miserable.
Variables with small MSAs should be deleted,
or additional variables added that will share variance with the troublesome variables (see the sketch below).
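Kaiser's MSA can be computed from the inverse of the correlation matrix; a sketch using the standard formula (squared correlations weighed against squared partial correlations):

```python
import numpy as np

def msa(R):
    """Per-variable Measure of Sampling Adequacy for correlation matrix R."""
    Rinv = np.linalg.inv(R)
    d = np.sqrt(np.diag(Rinv))
    partial = -Rinv / np.outer(d, d)      # partial correlations (off-diagonal)
    np.fill_diagonal(partial, 0.0)
    R2 = np.square(R)
    np.fill_diagonal(R2, 0.0)             # squared zero-order correlations
    P2 = np.square(partial)
    return R2.sum(axis=0) / (R2.sum(axis=0) + P2.sum(axis=0))
```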

Checking for Unique Correlations 3


Anti-image Matrices

Anti-image Correlation

          cost    size   alcohol reputat  color   aroma   taste
cost      .779a  -.543    .105    .256    .100    .135   -.105
size     -.543    .550a  -.806   -.109   -.495    .061    .435
alcohol   .105   -.806    .630a   .226    .381   -.060   -.310
reputat   .256   -.109    .226    .763a  -.231    .287    .257
color     .100   -.495    .381   -.231    .590a  -.574   -.693
aroma     .135    .061   -.060    .287   -.574    .801a  -.087
taste    -.105    .435   -.310    .257   -.693   -.087    .676a

a. Measures of Sampling Adequacy (MSA) on the main diagonal. Off-diagonal values are partial correlations x -1.

Extracting Principal Components 1


From p variables we can extract p components.
Each of p eigenvalues represents the amount of
standardized variance that has been captured
by one component.
The first component accounts for the largest
possible amount of variance.
The second captures as much as possible of
what is left over, and so on.
Each is orthogonal to the others.

Extracting Principal Components 2


Each variable has standardized variance = 1.
The total standardized variance in the p
variables = p.
The sum of the m = p eigenvalues = p.
All of the variance is extracted.
For each component, the proportion of
variance extracted = eigenvalue / p.

Extracting Principal Components 3


For our beer data, here are the
eigenvalues and proportions of variance
for the seven components:

Initial Eigenvalues

Component   Total   % of Variance   Cumulative %
1           3.313      47.327          47.327
2           2.616      37.369          84.696
3            .575       8.209          92.905
4            .240       3.427          96.332
5            .134       1.921          98.252
6            .09        1.221          99.473
7            .04         .527         100.000

Extraction Method: Principal Component Analysis.

How Many Components to Retain


From p variables we can extract p components.
We probably want fewer than p.
Simple rule: keep as many as have eigenvalues ≥ 1 (see the sketch below).
A component with eigenvalue < 1 captured less than one variable's worth of variance.
Visual Aid: Use a Scree Plot


Scree is rubble at base of cliff.
For our beer data,
[Scree plot: eigenvalues (y-axis) plotted against component number (x-axis).]

Only the first two components have eigenvalues greater than 1.
Big drop in eigenvalue between component 2 and component 3.
Components 3-7 are scree.
Try a 2-component solution.
Should also look at solutions with one fewer and with one more component.

Less Subjective Methods


Parallel Analysis and Velicer's MAP test.
SAS, SPSS, and Matlab scripts are available at https://people.ok.ubc.ca/brioconn/nfactors/nfactors.html

Parallel Analysis
How many components account for more
variance than do components derived
from random data?
Create 1,000 or more sets of random
data.
Each with same number of cases and
variables as your data set.
For each set, find the eigenvalues.

For the eigenvalues from the random sets, find the 95th percentile for each component.
Retain as many components as have eigenvalues from your data exceeding the 95th percentile from the random data sets (see the sketch below).
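A compact parallel-analysis sketch in Python (numpy only; 1,000 random sets and the 95th percentile, as described above):

```python
import numpy as np

def parallel_analysis(n_cases, n_vars, n_sets=1000, percentile=95, seed=0):
    """95th-percentile eigenvalues from random data of the same shape."""
    rng = np.random.default_rng(seed)
    eigs = np.empty((n_sets, n_vars))
    for i in range(n_sets):
        X = rng.normal(size=(n_cases, n_vars))          # random data set
        eigs[i] = np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))[::-1]
    return np.percentile(eigs, percentile, axis=0)

# Retain components whose observed eigenvalues exceed these criteria.
```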

Random Data Eigenvalues

Root   95th Percentile
1         1.344920
2         1.207526
3         1.118462
4         1.038794
5          .973311
6          .907173
7          .830506

Our data yielded eigenvalues of 3.313, 2.616, and 0.575 for the first three components.
Retain two components.

Velicer's MAP Test


Step by step, extract increasing numbers of components.
At each step, determine how much common variance is left in the residuals.
Retain the number of components from the step producing the smallest residual common variance (see the sketch below).
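A rough sketch of that logic (following the usual implementation of Velicer's procedure, not O'Connor's script verbatim):

```python
import numpy as np

def map_test(R):
    """Velicer's MAP: average squared partial correlation at each step."""
    p = R.shape[0]
    eigvals, V = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]
    loadings = V[:, order] * np.sqrt(eigvals[order])
    avg_sq = []
    for m in range(p):                    # partial out the first m components
        A = loadings[:, :m]
        C = R - A @ A.T                   # residual covariance
        d = np.sqrt(np.diag(C))
        partial = C / np.outer(d, d)
        off = partial[~np.eye(p, dtype=bool)]
        avg_sq.append(np.mean(off ** 2))
    return int(np.argmin(avg_sq)), avg_sq   # components to retain, trace
```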

Velicer's Minimum Average Partial (MAP) Test:

Velicer's Average Squared Correlations
0     .266624
1     .440869
2     .129252
3     .170272
4     .331686
5     .486046
6    1.000000

The smallest average squared correlation is .129252
The number of components is 2

Which Test to Use?


Parallel analysis tends to overextract.
MAP tends to underextract.
If they disagree, increase the number of random sets in the parallel analysis,
and inspect carefully the two smallest values from the MAP test.
May need to apply the meaningfulness criterion.

Loadings, Unrotated and Rotated


Loading matrix = factor pattern matrix = component matrix.
Each loading is the Pearson r between one variable and one component.
Since the components are orthogonal, each loading is also a weight for predicting X from the components.
Here are the unrotated loadings for our 2-component solution:

Component Matrix(a)

           Component
             1       2
COLOR      .760   -.576
AROMA      .736   -.614
REPUTAT   -.735   -.071
TASTE      .710   -.646
COST       .550    .734
ALCOHOL    .632    .699
SIZE       .667    .675

Extraction Method: Principal Component Analysis.
a. 2 components extracted.

All variables load well on the first component: economy and quality vs. reputation.
The second component is more interesting: economy versus quality.

Rotate these axes so that the two dimensions pass more nearly through the two major clusters (COST, SIZE, ALCOHOL and COLOR, AROMA, TASTE).
The number of degrees by which I rotate the axes is the angle PSI. For these data, rotating the axes -40.63 degrees has the desired effect (see the sketch below).
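In code, the rotation is just a 2 x 2 transformation of the loading matrix; a sketch (the sign convention for PSI may differ from the one SPSS uses):

```python
import numpy as np

psi = np.deg2rad(-40.63)                  # rotation angle from the text
T = np.array([[np.cos(psi), -np.sin(psi)],
              [np.sin(psi),  np.cos(psi)]])
rotated = loadings[:, :2] @ T             # loadings from the sketch above
```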

Component 1 = Quality versus reputation.
Component 2 = Economy (or "cheap drunk") versus reputation.
Rotated Component Matrix(a)

           Component
             1       2
TASTE      .960   -.028
AROMA      .958    .01
COLOR      .952    .06
SIZE       .07     .947
ALCOHOL    .02     .942
COST      -.061    .916
REPUTAT   -.512   -.533

Extraction Method: Principal Component Analysis.
Rotation Method: Varimax with Kaiser Normalization.
a. Rotation converged in 3 iterations.

Number of Components in the Rotated Solution

Try extracting one fewer component; try one more component.
Which produces the more sensible solution?
Error = the difference between the obtained structure and the true structure.
Overextraction (too many components) produces less error than underextraction.
If there is only one true factor and no unique variables, can get factor splitting.

In this case, the first unrotated factor ≈ the true factor.
But rotation splits the factor, producing an imaginary second factor and corrupting the first.
Can avoid this problem by including a garbage variable that will be removed prior to the final solution.

Explained Variance
Square the loadings and then sum them across
variables.
Get, for each component, the amount of
variance explained.
Prior to rotation, these are eigenvalues.
Here are the SSL for our data, after rotation:

Total Variance Explained

            Rotation Sums of Squared Loadings
Component   Total   % of Variance   Cumulative %
1           3.017      43.101          43.101
2           2.912      41.595          84.696

Extraction Method: Principal Component Analysis.

After rotation the two components together account for (3.02 + 2.91) / 7 = 85% of the total variance (see the sketch below).
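The same arithmetic on the rotated loadings from the sketches above:

```python
# Sums of squared loadings (SSL), one per retained component.
ssl = (rotated ** 2).sum(axis=0)          # about 3.02 and 2.91 here
total_prop = ssl.sum() / R.shape[0]       # (3.02 + 2.91) / 7 = .85
```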

If the last component has a small SSL, one should consider dropping it.
If SSL = 1, the component has extracted one variable's worth of variance.
If only one variable loads well on a component, the component is not well defined.
If only two load well, it may be reliable, if the two variables are highly correlated with one another but not with other variables.

Naming Components
For each component, look at how it is
correlated with the variables.
Try to name the construct represented by
that factor.
If you cannot, perhaps you should try a
different solution.
I have named our components aesthetic
quality and cheap drunk.

Communalities
For each variable, sum the squared loadings across components.
This gives you the R2 for predicting the variable from the components,
which is the proportion of the variable's variance that has been extracted by the components (see the sketch below).
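Continuing the sketches above, the extraction communalities are row sums of the squared rotated loadings:

```python
# Communality: proportion of each variable's variance reproduced
# by the two retained components.
h2 = (rotated ** 2).sum(axis=1)
```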

Here are the communalities for our beer data. "Initial" is with all 7 components; "Extraction" is for our 2-component solution.
Communalities

           Initial   Extraction
COST        1.000      .842
SIZE        1.000      .901
ALCOHOL     1.000      .889
REPUTAT     1.000      .546
COLOR       1.000      .910
AROMA       1.000      .918
TASTE       1.000      .922

Extraction Method: Principal Component Analysis.

Orthogonal Rotations
Varimax -- minimize the complexity of the components by making the large loadings larger and the small loadings smaller within each component.
Quartimax -- makes large loadings larger and small loadings smaller within each variable.
Equamax -- a compromise between these two. (A varimax sketch follows.)
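For the curious, here is the widely used SVD-based varimax algorithm in Python. It is a sketch, not SPSS's routine (which also applies Kaiser normalization to the rows first):

```python
import numpy as np

def varimax(A, max_iter=100, tol=1e-6):
    """Varimax-rotate a (p variables x k components) loading matrix."""
    p, k = A.shape
    T = np.eye(k)
    d_old = 0.0
    for _ in range(max_iter):
        L = A @ T
        B = A.T @ (L ** 3 - L @ np.diag((L ** 2).sum(axis=0)) / p)
        U, s, Vt = np.linalg.svd(B)
        T = U @ Vt                        # nearest orthogonal rotation
        d = s.sum()
        if d_old != 0 and d / d_old < 1 + tol:
            break                         # criterion stopped improving
        d_old = d
    return A @ T
```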

Oblique Rotations
Axes drawn through the two clusters in the
upper right quadrant would not be
perpendicular.

May better fit the data with axes that are not perpendicular, but at the cost of having components that are correlated with one another.
More on this later.
