
Principal Component Analysis Part II

Based on:
1. J.C. Davis, Statistics and Data Analysis in Geology, 2nd ed., John Wiley & Sons, New York, 1996
2. D.L. Massart, B.G.M. Vandeginste, S.N. Deming, Y. Michotte, and L. Kaufman, Chemometrics: A Textbook, Elsevier, Amsterdam, 1988
3. B. Jørgensen, course notes: Multivariate Data Analysis and Chemometrics, Department of Statistics, University of Southern Denmark, 2003
4. L. Eriksson, E. Johansson, N. Kettaneh-Wold, and S. Wold, Multi- and Megavariate Data Analysis: Principles and Applications, UMETRICS, 2001
5. Matlab online Manual, The MathWorks, Inc.

PART II: Procedures of PCA


1. 1st Step: Pre-treatment of Data Matrix - Scaling
2. 2nd Step: Calculation of covariance matrix
3. 3rd Step: Calculation of eigenvalues and eigenvectors of covariance matrix
4. 4th Step: Calculation of scores

1st Step: Pre-treatment of Data Matrix


1) Pre-treatment of Data Matrix-Scaling

Scaling

(Adapted from Multi- and Megavariate Data Analysis: Principles and Applications, L. Eriksson, E. Johansson, N. Kettaneh-Wold, and S. Wold, UMETRICS, 2001)

2) Example of Data Matrix: X

1) Pre-treatment of Data Matrix - Scaling
Unless the data are normalized, a variable with a large variance will dominate the analysis. The most common scaling technique is unit variance (UV) scaling.

Columns: variables; rows: samples. In UV scaling, each element of the data matrix is divided by the standard deviation s_k of its own column (variable):

[ a_11 ... a_1n ]                      [ a_11/s_1 ... a_1n/s_n ]
[  ...      ... ]    -- UV scaling ->  [    ...          ...   ]
[ a_m1 ... a_mn ]                      [ a_m1/s_1 ... a_mn/s_n ]

where s_1, s_2, ..., s_n are the standard deviations of the n variables.

1) Pre-treatment of Data Matrix-Scaling (Continued)

[Figure: measured values and the 'length' (standard deviation) of each variable; after unit variance scaling, the length of each variable is identical.]

Note: However, the mean values still remain different. Therefore mean-centering is applied as the second part of the data pre-treatment:
Step 1) The average value of each variable is calculated.
Step 2) This average is subtracted from the data.

1) Pre-treatment of Data Matrix- Scaling (Continued)

[Figure: measured values and the 'length' of each variable, after unit variance scaling (lengths identical) and after mean centering (means removed).]

Note: Unit Variance Scaling + Mean Centering = Auto-Scaling. Sometimes UV scaling is not needed, e.g. when all variables share the same unit, as with spectroscopic data.
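As an illustration, a minimal MATLAB sketch of auto-scaling, assuming X is an m-by-n data matrix with samples in rows and variables in columns (the variable names are chosen here for illustration):

% Auto-scaling = mean centering + unit variance (UV) scaling
Xc  = X - repmat(mean(X), size(X,1), 1);    % mean centering: subtract the column means
Xuv = Xc ./ repmat(std(X), size(X,1), 1);   % UV scaling: divide by the column standard deviations
% Skip the last line when all variables share the same unit (e.g. spectroscopic data).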

2) Example of Data Matrix: X


As an example, suppose we have the following bivariate observations:
No.  Element  Martynov-Batsanov's electronegativity (X1)  Zunger's pseudopotential core radii sum (X2)
 1      H        2.10      1.250
 2      Li       0.90      1.610
 3      Be       1.45      1.080
 4      B        1.90      0.795
 5      C        2.37      0.640
 6      N        2.85      0.540
 7      O        3.32      0.465
 8      F        3.78      0.405
 9      Na       0.89      2.650
10      Mg       1.31      2.030
11      Al       1.64      1.675
12      Si       1.98      1.420
13      P        2.32      1.240
14      S        2.65      1.100
15      Cl       2.98      1.010
16      K        0.80      3.690
17      Ca       1.17      3.000
18      Sc       1.50      2.750
19      Ti       1.86      2.580
20      V        2.22      2.430

We have 20 samples with 2 variables, so the size of the data matrix is 20×2.
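As a sketch (the names X and Xc are chosen here for illustration), this data matrix can be entered and mean-centered in MATLAB as follows:

% 20 x 2 data matrix: column 1 = Martynov-Batsanov electronegativity (X1),
% column 2 = Zunger's pseudopotential core radii sum (X2)
X = [2.10 1.250; 0.90 1.610; 1.45 1.080; 1.90 0.795; 2.37 0.640;
     2.85 0.540; 3.32 0.465; 3.78 0.405; 0.89 2.650; 1.31 2.030;
     1.64 1.675; 1.98 1.420; 2.32 1.240; 2.65 1.100; 2.98 1.010;
     0.80 3.690; 1.17 3.000; 1.50 2.750; 1.86 2.580; 2.22 2.430];
mean(X)                            % returns [1.9995 1.6180], the column averages
Xc = X - repmat(mean(X), 20, 1);   % mean-centered data matrix used in the following steps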

2) Example of Data Matrix: X (continued)

The scatter plot of X:

[Figure: scatter plot of the raw data, X1 on the horizontal axis and X2 on the vertical axis, both ranging from 0.0 to 4.0.]

2) Example of Data Matrix: X (continued)


If we choose mean centering as the scaling method, the average of each variable, (avg. of X1, avg. of X2) = (1.9995, 1.618), is subtracted from the data, so the origin of the new coordinate frame moves to that point.

[Figure, left: the raw data with the new frame after mean centering drawn through the point (1.9995, 1.618). Figure, right: the data after the scaling, now centered at the origin with both X1 and X2 running from -2.5 to 2.5.]

2nd Step: Calculation of covariance matrix


1) Calculation of Covariance matrix (S) of Data Matrix (X)
2) Property of Covariance matrix (S) of Data Matrix (X)

1) Calculation of Covariance matrix (S) of Data Matrix (X): If the data matrix X has m rows and n columns, the covariance matrix S is

S = cov(X) = (1/(m-1)) X^T X     (for a mean-centered X)

From the example, we have 20 rows and 2 columns, so the size of the covariance matrix S is 2 by 2. By the definition of the covariance matrix,

S = [  0.6881  -0.5929 ]     <- covariance matrix S from the original data matrix
    [ -0.5929   0.9026 ]

S = [  0.6881  -0.5929 ]     <- covariance matrix S from the scaled (mean-centered) data matrix
    [ -0.5929   0.9026 ]

How do we get a covariance matrix, S, via computer? Use the command, cov(a), in Matlab where a is a data matrix
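A short MATLAB sketch (using the mean-centered matrix Xc assumed in the earlier step) showing both the built-in command and the definition:

S = cov(Xc)                       % returns [0.6881 -0.5929; -0.5929 0.9026]
m = size(Xc, 1);                  % number of samples (20)
S_manual = (Xc' * Xc) / (m - 1);  % same result from the definition S = (1/(m-1)) X'X for centered X
% cov(X) of the raw data gives the same matrix, since cov removes the column means internally.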

2) Property of Covariance matrix(S) of Data Matrix(X):


S = [  0.6881  -0.5929 ]
    [ -0.5929   0.9026 ]

The diagonal elements are the variances: 0.6881 is the variance of X1 and 0.9026 is the variance of X2.
The off-diagonal element -0.5929 is the covariance between X1 and X2; its negative sign indicates an inverse correlation.

Variance (a one-dimensional concept): a measure of the spread of the data in a given data set

var(X) = Σ_{i=1..n} (X_i − X̄)(X_i − X̄) / (n − 1)

Covariance (a multi-dimensional concept): a measure of the spread of the data between dimensions (variables)

cov(X, Y) = Σ_{i=1..n} (X_i − X̄)(Y_i − Ȳ) / (n − 1)

where n is the sample number and X̄ and Ȳ are the means of the sets X and Y, respectively.
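These formulas can be checked element by element; a sketch assuming the mean-centered matrix Xc from the earlier step:

x = Xc(:,1);  y = Xc(:,2);  n = length(x);             % the two variables and the sample number
var_x  = sum((x - mean(x)).^2) / (n - 1)                % variance of X1  -> 0.6881
cov_xy = sum((x - mean(x)).*(y - mean(y))) / (n - 1)    % covariance of X1 and X2 -> -0.5929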

2) Property of Covariance matrix(S) of Data Matrix (X) (continued)


If we have an n-dimensional data set, we will have n! / (2 (n − 2)!) different covariance values.

For a 3-dimensional data set (x, y, z), the covariance matrix S will be

S = [ cov(x,x)  cov(x,y)  cov(x,z) ]
    [ cov(y,x)  cov(y,y)  cov(y,z) ]
    [ cov(z,x)  cov(z,y)  cov(z,z) ]

Since cov(a,b) = cov(b,a), the covariance matrix S is symmetrical!

1. From the properties of eigenvalues we already know that, for symmetric matrices, the eigenvectors are always at right angles to each other: ORTHOGONAL!!!
2. Therefore the eigenvectors of a covariance matrix are orthogonal: a VERY IMPORTANT concept in PCA.
3. If we measure m variables, we can compute an m×m covariance matrix, and from it extract m eigenvalues and m eigenvectors.

3rd Step of PCA: Calculation of eigenvalues and eigenvectors of covariance matrix


1) Calculation of Eigenvalues of Covariance matrix (S)
2) Calculation of the corresponding Eigenvectors of Covariance matrix (S)
3) Graphical representations of eigenvalues and eigenvectors
4) Summary of eigenvalues and eigenvectors
5) Advantages of PCA

1) Calculation of Eigenvalues of Covariance matrix (S)

S = [  0.6881  -0.5929 ]      (symmetrical, so its eigenvectors are orthogonal, 90°)
    [ -0.5929   0.9026 ]

The eigenvalues λ follow from det(S − λI) = 0:

| 0.6881 − λ     −0.5929    |
|   −0.5929    0.9026 − λ   |  =  0

(0.6881 − λ)(0.9026 − λ) − (−0.5929)² = 0

λ² − 1.5907 λ + 0.27 = 0

λ = [ 1.5907 ± √( (1.5907)² − 4 × 0.27 ) ] / 2

λ1 = 1.3978,  λ2 = 0.1928     <- the eigenvalues of covariance matrix S
(For the corresponding eigenvectors, see the next slide.)
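The same eigenvalues can be obtained numerically from the characteristic polynomial; a small MATLAB sketch using the coefficients derived above:

lambda = roots([1 -1.5907 0.27])   % roots of lambda^2 - 1.5907*lambda + 0.27 = 0
% returns approximately 1.3978 and 0.1928, the eigenvalues of S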

2) Calculation of the corresponding Eigenvectors of Covariance matrix (S)

For eigenvalue λ1 = 1.3978:

[ 0.6881 − 1.3978       −0.5929      ] [x1]     [ −0.7097  −0.5929 ] [x1]
[     −0.5929      0.9026 − 1.3978   ] [x2]  =  [ −0.5929  −0.4952 ] [x2]  =  0

Is it possible to solve this by hand? No (apart from the trivial solution x1 = x2 = 0): the two equations are linearly dependent, so they fix only the ratio x1 : x2, not unique values.

For eigenvalue λ2 = 0.1928:

[ 0.6881 − 0.1928       −0.5929      ] [x1]     [  0.4953  −0.5929 ] [x1]
[     −0.5929      0.9026 − 0.1928   ] [x2]  =  [ −0.5929   0.7098 ] [x2]  =  0

Again the system cannot be solved uniquely by hand (apart from x1 = x2 = 0), for the same reason.

So computerized solutions are indispensable!! See Next Slide!

2) Calculation of the corresponding Eigenvectors of Covariance matrix (S)

In Matlab, use the eigenvector decomposition function. The command is [P, Λ] = eig(S), where Λ is the diagonal matrix of eigenvalues. You will then get the following:

Λ = [ 0.1928      0     ]      <- the eigenvalues that we already calculated
    [    0      1.3978  ]

P = [ -0.7675   -0.6411 ]      <- eigenvectors: the first column corresponds to λ2 = 0.1928,
    [ -0.6411    0.7675 ]         the second column to λ1 = 1.3978

2) Calculation of the corresponding Eigenvectors of Covariance matrix (S)

P = [ -0.7675   -0.6411 ]      <- first column: eigenvector for λ2 = 0.1928;
    [ -0.6411    0.7675 ]         second column: eigenvector for λ1 = 1.3978

Note: for λ2 = 0.1928 the eigenvector can also be x1 = 0.7675 and x2 = 0.6411, since it also satisfies

[  0.4953  -0.5929 ] [x1]
[ -0.5929   0.7098 ] [x2]  =  0

and for λ1 = 1.3978 the eigenvector can also be x1 = 0.6411 and x2 = -0.7675, since it also satisfies

[ -0.7097  -0.5929 ] [x1]
[ -0.5929  -0.4952 ] [x2]  =  0

Because of this sign ambiguity, we sometimes get mirror images in PCA!
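A short MATLAB sketch tying these points together (S as computed earlier; D plays the role of Λ above). Note that eig may return either sign for each eigenvector column, which is exactly the mirror-image effect mentioned above:

[P, D] = eig(S)   % columns of P are the eigenvectors, D is diagonal with the eigenvalues
P' * P            % identity matrix: the eigenvectors are orthonormal
% P(:,k) and -P(:,k) are equally valid eigenvectors for the k-th eigenvalue.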

3) Graphical representations of eigenvalues and eigenvectors

Covariance matrix of the scaled data matrix:

S = [  0.6881  -0.5929 ]
    [ -0.5929   0.9026 ]

Eigenvalues of S: λ1 = 1.3978 and λ2 = 0.1928

Eigenvectors of S, which we already know:
For λ1 = 1.3978, eigenvector: (-0.6411, 0.7675) or (0.6411, -0.7675)
For λ2 = 0.1928, eigenvector: (-0.7675, -0.6411) or (0.7675, 0.6411)

Eigenvectors: the orientations of the principal axes of the ellipse enclosing the data.
Slope of major axis = ratio of eigenvector components = 0.7675 : (-0.6411)   (negative slope)
Slope of minor axis = ratio of eigenvector components = (-0.6411) : (-0.7675)   (positive slope)

Eigenvalues: the lengths of the principal axes of the ellipse, λ1 = 1.3978 (major axis) and λ2 = 0.1928 (minor axis).
(c.f. in 3D, the corresponding figure is an ellipsoid.)

3) Graphical representations of eigenvalues and eigenvectors (continued)

Covariance matrix of the scaled data matrix:

S = [  0.6881  -0.5929 ]
    [ -0.5929   0.9026 ]

[Figure: the mean-centered data (X1 and X2 from -2.5 to 2.5) with the two principal axes overlaid. PC1 (Principal Component 1) lies along the major axis, whose slope is the ratio of the eigenvector components of the largest eigenvalue (λ1); PC2 (Principal Component 2) lies along the minor axis, whose slope is the ratio of the eigenvector components of the second largest eigenvalue (λ2). The two axes are orthogonal (90°), and the eigenvalues give the lengths of the principal axes of the ellipse.]
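A plotting sketch of this figure (assuming Xc and the output [P, D] = eig(S) from the previous steps; here, as on the slide, the eigenvalues come out in ascending order, so the second column of P belongs to the largest eigenvalue):

figure; plot(Xc(:,1), Xc(:,2), 'o'); hold on; axis equal
t = [-2.5 2.5];                  % extent of the axes drawn through the origin
plot(t*P(1,2), t*P(2,2), '-')    % major axis: eigenvector of the largest eigenvalue (PC1)
plot(t*P(1,1), t*P(2,1), '--')   % minor axis: eigenvector of the smaller eigenvalue (PC2)
xlabel('X1'); ylabel('X2')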

4) Summary of eigenvalues and eigenvectors

S = [  0.6881  -0.5929 ]
    [ -0.5929   0.9026 ]

From the covariance matrix S, we know the following:
1. The total variance (the trace of the matrix) is 0.6881 + 0.9026 = 1.5907.
2. Variable X2 contributes 0.9026/1.5907 = 56.74%.
3. Variable X1 contributes 0.6881/1.5907 = 43.26%.

Λ = [ 0.1928      0     ]
    [    0      1.3978  ]

How about the eigenvalues?
1. The eigenvalues total 0.1928 + 1.3978 = 1.5906, which is the same as the total variance.
2. The eigenvalues represent the lengths of the two principal axes of the ellipse; therefore the axes represent the total variance of the data set.
3. The first principal axis contains 1.3978/1.5906 = 87.88% of the total variance; the second principal axis represents 0.1928/1.5906 = 12.12%.
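These percentages can be computed directly from the eigenvalues; a sketch assuming D from [P, D] = eig(S):

lambda = diag(D);                  % [0.1928; 1.3978]
total = sum(lambda)                % 1.5906, equal to trace(S), the total variance
explained = 100 * lambda / total   % approximately [12.12; 87.88] percent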

5) Advantages of PCA
From this example, let's suppose we need to reduce our system to only one variable. Then we need to discard either variable X2 or X1, which means losing 56.74% or 43.26% of the total variance. If, however, we convert our data set to scores on the first principal axis (PC1), we lose only 12.12% of the variation in our data set. This is a big advantage of PCA!!!

4th Step of PCA: Calculation of scores


1) Mathematical representations of transformation of axes
2) Calculations of scores
3) Proof of scores by overlapping onto the scaled data set

1) Mathematical representations of transformation of axes from X1-X2 to PC1-PC2 (PC: Principal Component)
[Figure: the mean-centered data with the new PC1-PC2 axes drawn in. The slope of the major axis (PC1) is the ratio of the eigenvector components of the largest eigenvalue (λ1); the slope of the minor axis (PC2) is the ratio of the eigenvector components of the second largest eigenvalue (λ2).]

PC1 = α1 X1 + α2 X2,   where the α's are the elements of the first eigenvector
PC2 = β1 X1 + β2 X2,   where the β's are the elements of the second eigenvector

For the 1st principal axis:   PC1_i = 0.6411 X1_i − 0.7675 X2_i
For the 2nd principal axis:   PC2_i = 0.7675 X1_i + 0.6411 X2_i

The PC1_i and PC2_i values are the scores; the coefficients (0.6411, −0.7675) and (0.7675, 0.6411) are the loadings.

2) Calculations of scores

For the 1st principal axis:   PC1_i = 0.6411 X1_i − 0.7675 X2_i
For the 2nd principal axis:   PC2_i = 0.7675 X1_i + 0.6411 X2_i
(the PC values are the scores; the coefficients are the loadings)

In matrix form:

[T] = [X] ([P]^T)^(-1) = [X][P]      (since P is orthonormal, (P^T)^(-1) = P)

where
[T]: n × m matrix of principal component scores
[X]: n × m matrix of observations (the scaled original data matrix)
[P]: m × m square matrix of eigenvectors, or loading matrix

Therefore (showing the first three samples and the last one):

[  0.1005  -0.368  ]                               [ -0.1588   0.3469 ]
[ -1.0995  -0.008  ]    [ 0.7675   0.6411 ]        [ -0.8490  -0.6987 ]
[ -0.5495  -0.538  ]  x [ 0.6411  -0.7675 ]   =    [ -0.7666   0.0606 ]
[   ...      ...   ]                               [    ...      ...  ]
[  0.2205   0.812  ]                               [  0.6898  -0.4818 ]
                                                       PC2       PC1
                                                      (scores)
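A MATLAB sketch of the score calculation (Xc assumed from the earlier steps; the loading matrix is written here with the sign convention and column order, PC2 then PC1, used on the slide):

P_used = [0.7675 0.6411; 0.6411 -0.7675];  % columns: PC2 loadings and PC1 loadings
T = Xc * P_used;                           % scores; since P is orthonormal, (P')^-1 = P
T([1 2 3 20], :)   % [-0.1588 0.3469; -0.8490 -0.6987; -0.7666 0.0606; 0.6898 -0.4818]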

3) Proof of scores by overlapping onto the scaled data set

[Figure, left: the original (scaled) data set plotted in the X1-X2 frame. Figure, right: the score plot of the scaled data set in the PC1-PC2 frame, with both axes running from -2.5 to 2.5. The score plot is simply a rotation of the scaled data: overlaying the scores on the scaled data set shows the same point pattern, rotated so that PC1 and PC2 become the coordinate axes.]
