
PCA vs PLS

Maya Hristakeva

University of California, Santa Cruz

May 13, 2009



Outline

1 Linear Regression

2 Principal Component Analysis

3 Partial Least Squares




Setup

Data matrix (instances as columns):

X = [x_1 ... x_T] ∈ R^(N×T)

Reference values:

y = [y_1 ... y_T]^T ∈ R^(T×1)

Goal: minimize square loss


min_w (1/2) Σ_{i=1}^T (x_i^T w − y_i)² ≡ min_w (1/2) ||X^T w − y||²




Variance and Covariance


Expectation of X = [x_1 ... x_T]:  E[X] = (1/T) Σ_{i=1}^T x_i

Variance of X:

var(X) = cov(X, X) = (1/T) Σ_{i=1}^T (x_i − E[X])(x_i − E[X])^T

Covariance of X = [x_1 ... x_T] and Z = [z_1 ... z_T]:

cov(X, Z) = (1/T) Σ_{i=1}^T (x_i − E[X])(z_i − E[Z])^T

In this presentation, we assume that X and y are mean-centered:

E[X] = 0 and E[y] = 0

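To make the setup concrete, here is a minimal NumPy sketch of the centering assumption. The sizes and the synthetic data are arbitrary choices of mine; the layout (instances as columns) follows the Setup slide.

    import numpy as np

    rng = np.random.default_rng(0)
    N, T = 5, 100                            # hypothetical sizes: N features, T instances
    X = rng.standard_normal((N, T))          # columns x_1 ... x_T are the instances
    y = X.T @ rng.standard_normal(N) + 0.1 * rng.standard_normal(T)

    X = X - X.mean(axis=1, keepdims=True)    # center each feature so that E[X] = 0
    y = y - y.mean()                         # center the targets so that E[y] = 0
    cov_XX = (X @ X.T) / T                   # var(X) from the formula above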





Linear Regression
Least Squares optimization problem:
L(w) = min_w (1/2) Σ_{i=1}^T (x_i^T w − y_i)² ≡ min_w (1/2) ||X^T w − y||²

Differentiate w.r.t. w:

∇_w L(w) = X(X^T w − y) = 0
X X^T w = X y

Exact solution:
w* = (X X^T)^{-1} X y
Note: X X^T is not always invertible

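A minimal sketch of this closed form, reusing the centered X and y from the sketch above. It assumes X X^T is invertible; np.linalg.lstsq is the safer call when it is not.

    # w* = (X X^T)^{-1} X y, via the normal equations rather than an explicit inverse
    w_star = np.linalg.solve(X @ X.T, X @ y)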



Ridge Regression

Regularization penalizes large values of ||w||²:

L(w) = min_w (1/2) ||X^T w − y||² + (λ/2) ||w||²
Differentiate w.r.t. w:

∇_w L(w) = X(X^T w − y) + λw = 0
(X X^T + λI) w = X y

Exact solution:
w* = (X X^T + λI)^{-1} X y
Note: X X^T + λI is always invertible for λ > 0

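The same sketch with the ridge term added, reusing N, X, and y from earlier. λ = 1.0 is an arbitrary illustrative value; any λ > 0 guarantees the system is solvable.

    lam = 1.0                                # regularization strength λ
    w_ridge = np.linalg.solve(X @ X.T + lam * np.eye(N), X @ y)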




Principal Component Analysis

Compression Loss Minimization

Find a rank-k projection matrix P for which the compression loss is minimized:

min_P Σ_{i=1}^T ||P x_i − x_i||² ≡ min_P ||PX − X||²
                                 = min_P tr( (I − P) X X^T )
                                 = max_P tr( P X X^T )
                                 = max_{P̃} tr( var(P̃^T X) )    (up to the constant factor 1/T)

where P is a projection matrix of rank k.




Projection Matrix Properties

Properties of P:

P² = P ∈ R^(N×N)  (P is idempotent)
P = Σ_{i=1}^k p_i p_i^T = P̃ P̃^T for P̃ = [p_1 ... p_k] ∈ R^(N×k)
p_i^T p_i = 1 (i.e. p_i has unit length)
p_i^T p_j = 0 for i ≠ j (i.e. p_i and p_j are orthogonal)

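A quick numerical check of these properties; the sizes and the QR construction are ad hoc, used only to produce a P̃ with orthonormal columns.

    import numpy as np

    rng = np.random.default_rng(2)
    P_tilde, _ = np.linalg.qr(rng.standard_normal((6, 3)))  # orthonormal columns: p_i^T p_j = δ_ij
    P = P_tilde @ P_tilde.T                                  # rank-3 projection matrix
    assert np.allclose(P @ P, P)                             # idempotent: P² = P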



Variance Maximization
Find k projection directions P̃ = [p_1 ... p_k] for which the variance of the compressed data P̃^T X is maximized:

max_{P̃} tr( var(P̃^T X) ) ≡ max_{P̃} tr( (1/T) Σ_{i=1}^T (P̃^T x_i)(P̃^T x_i)^T )
                          = max_{P̃} (1/T) Σ_{i=1}^T x_i^T P̃ P̃^T x_i       (using P̃ P̃^T = P)
                          = max_P tr( P · (1/T) Σ_{i=1}^T x_i x_i^T )
                          = max_P tr( P · (1/T) X X^T )

where C = (1/T) X X^T is the covariance matrix of X.




PCA Solution
Let C = X X^T (proportional to the covariance matrix of X), with eigendecomposition C = Σ_i γ_i c_i c_i^T:

max_P tr(PC) = max_P tr( P Σ_i γ_i c_i c_i^T )
             = max_P Σ_i γ_i (c_i^T P c_i)       where 0 ≤ c_i^T P c_i ≤ 1 and Σ_i c_i^T P c_i = k (since tr(P) = k)
             ≤ max_{0 ≤ δ_i ≤ 1, Σ_i δ_i = k} Σ_i γ_i δ_i
             = max_{1 ≤ i_1 < ... < i_k ≤ N} Σ_{j=1}^k γ_{i_j}  = the sum of the k largest eigenvalues of C

Hence, P consists of the eigenvectors corresponding to the k largest eigenvalues of C.
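A minimal sketch of this solution in NumPy (np.linalg.eigh is appropriate because C is symmetric; the function name is my own):

    import numpy as np

    def pca_directions(X, k):
        """Top-k eigenvectors of C = X X^T; X is N x T and assumed mean-centered."""
        C = X @ X.T
        eigvals, eigvecs = np.linalg.eigh(C)   # eigenvalues in ascending order
        return eigvecs[:, ::-1][:, :k]         # P̃; P = P̃ P̃^T is the rank-k projection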

Principal Component Regression


Principal Component Regression ≡ PCA + Linear Regression

Use PCA to find a rank-k projection matrix P = P̃ P̃^T:

min_P ||PX − X||²

Minimize the square loss on the compressed data:

arg min_w (1/2) ||(P̃^T X)^T w − y||²

Solution:

w* = (P̃^T X X^T P̃)^{-1} P̃^T X y ∈ R^(k×1)

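Continuing the earlier sketches (pca_directions and the centered X, y), PCR is then two more lines; k = 2 is an arbitrary choice:

    P_tilde = pca_directions(X, k=2)
    Z = P_tilde.T @ X                         # compressed data P̃^T X, of shape k x T
    w_pcr = np.linalg.solve(Z @ Z.T, Z @ y)   # w* = (P̃^T X X^T P̃)^{-1} P̃^T X y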



Summary of PCA

Finds a set of k orthogonal directions
Directions of maximum variance of the data (the top eigenvectors of X X^T)
Minimizes the compression error (i.e. gives the best rank-k approximation of X)
Ignores all information about y while constructing the projection matrix P






Partial Least Squares (PLS)


Finds components from X that are also relevant to y.

PLS finds projection directions for which the covariance between X and y is maximized:

arg max_{p_i} (cov(X^T p_i, y))² = arg max_{p_i} ( Σ_{j=1}^T (x_j^T p_i) y_j )²
                                 = arg max_{p_i} ( p_i^T ( Σ_{j=1}^T x_j y_j ) )²
                                 = arg max_{p_i} ( p_i^T X y )²
                                 = arg max_{p_i} (p_i^T X y)(p_i^T X y)^T
                                 = arg max_{p_i} p_i^T X y y^T X^T p_i

Finding the First PLS Direction p_1

arg max_{p_1} p_1^T X y y^T X^T p_1    s.t. p_1^T p_1 = 1

Introduce a Lagrange multiplier λ for the unit-norm constraint:

L(p_1, λ) = p_1^T X y y^T X^T p_1 − λ(p_1^T p_1 − 1)

∇_{p_1} L = X y y^T X^T p_1 − λ p_1 = 0
X y y^T X^T p_1 = λ p_1

Hence, p_1 is the eigenvector corresponding to the largest eigenvalue of X y y^T X^T; since this matrix has rank one, p_1 is simply X y scaled to unit length.

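A small sketch checking this closed form against an explicit eigendecomposition, reusing the centered X and y from the earlier sketches:

    v = X @ y                                     # X y
    p1 = v / np.linalg.norm(v)                    # rank-one case: p_1 ∝ X y
    _, eigvecs = np.linalg.eigh(np.outer(v, v))   # eigenvectors of X y y^T X^T
    assert np.isclose(abs(eigvecs[:, -1] @ p1), 1.0)   # same direction up to sign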



Finding the remaining k − 1 PLS directions

Since X y y^T X^T is a rank-1 matrix, an additional orthogonality constraint is used to find the remaining k − 1 PLS projection directions:

arg max_{p_i} p_i^T X y y^T X^T p_i
s.t. p_i^T p_i = 1 and p_i^T X X^T p_j = 0 for 1 ≤ j < i




PLS Regression
PLS Regression ≡ PLS Decomposition + Linear Regression

Use PLS to find the projection directions p_i:

max_{p_i} (cov(X^T p_i, y))²
s.t. p_i^T p_i = 1 and p_i^T X X^T p_j = 0 for 1 ≤ j < i

Minimize the square loss on the compressed data:

arg min_w (1/2) ||(P̃^T X)^T w − y||²

Solution:

w* = (P̃^T X X^T P̃)^{-1} P̃^T X y    for P̃ = [p_1 ... p_k]
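A sketch of the whole procedure under the slides' formulation. Because the objective matrix is rank one, each constrained maximization is solved by projecting X y onto the feasible set; the greedy construction and the function name are my own framing of the stated optimization problem, not a reference implementation.

    import numpy as np

    def pls_regression(X, y, k):
        """X: N x T (features x instances), y: length T; both assumed mean-centered."""
        N = X.shape[0]
        v = X @ y                        # each objective is (p^T v)^2 with v = X y
        G = X @ X.T
        P_tilde = np.zeros((N, 0))
        for _ in range(k):
            B = G @ P_tilde              # columns X X^T p_j; the constraint reads B^T p = 0
            p = v.copy()
            if B.size:
                p -= B @ np.linalg.lstsq(B, v, rcond=None)[0]  # project v into {p : B^T p = 0}
            p /= np.linalg.norm(p)
            P_tilde = np.column_stack([P_tilde, p])
        Z = P_tilde.T @ X                # compressed data P̃^T X
        w = np.linalg.solve(Z @ Z.T, Z @ y)   # w* = (P̃^T X X^T P̃)^{-1} P̃^T X y
        return P_tilde, w

With the synthetic X and y from the earlier sketches, pls_regression(X, y, k=2) returns both the directions P̃ and the weights w for the compressed regression.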

Summary

PCA and PLS:

Differ in the optimization problem they solve to find the projection matrix P
Are both linear decomposition techniques
Can be combined with loss functions other than the square loss

