
Dimensionality Reduction

Lecturer:

Javier Hernandez Rivera

30th September 2010 MAS 622J/1.126J: Pattern Recognition & Analysis


"A man's mind, stretched by new ideas, may never return to it's original Oliver Wendell Holmes Jr. dimensions"

Outline

Before:
- Linear Algebra
- Probability
- Likelihood Ratio
- ROC
- ML/MAP

Today:
- Accuracy, Dimensions & Overfitting (DHS 3.7)
- Principal Component Analysis (DHS 3.8.1)
- Fisher Linear Discriminant / LDA (DHS 3.8.2)
- Other Component Analysis Algorithms

Accuracy, Dimensions & Overfitting

Does classification accuracy depend on the dimensionality of the data?

Complexity (computational/temporal) is described with Big O notation:
O(c) < O(log n) < O(n) < O(n log n) < O(n^2) < O(n^k) < O(2^n) < O(n!) < O(n^n)

(Figure: training vs. testing performance as dimensionality grows)

Overfitting
(Figure: error vs. model complexity for regression and classification; beyond some complexity the test error rises, i.e. overfitting)

Dimensionality Reduction

Optimal data representation


2D example: evaluating students' performance
(Figure: scatter plot of # study hours vs. # As; the informative structure lies along one direction, the second dimension is unnecessary complexity)

Dimensionality Reduction

Optimal data representation

3D: Find the most informative point of view

Principal Component Analysis

Assumptions for the new basis:
- Large variance carries the important structure
- Linear projection
- Orthogonal basis: w_i^T w_j = 1 if i = j, and w_i^T w_j = 0 if i ≠ j

Projection: Y = W^T X
- X ∈ R^(d x n): data with d dimensions and n samples (x_ij is dimension i of sample j)
- W^T ∈ R^(k x d): k basis vectors of dimension d
- Y ∈ R^(k x n): projected data
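A minimal NumPy sketch of these assumptions (all names here are illustrative): an orthonormal basis W satisfies the conditions above, and projection is a single matrix product.

import numpy as np

d, n, k = 5, 100, 2
X = np.random.randn(d, n)                    # d-dimensional data, one sample per column

# Any matrix with orthonormal columns works as a basis; QR of a random matrix is one way to get one.
W, _ = np.linalg.qr(np.random.randn(d, k))   # W is d x k
assert np.allclose(W.T @ W, np.eye(k))       # w_i^T w_j = 1 if i = j, 0 otherwise

Y = W.T @ X                                  # projected data, k x n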

Principal Component Analysis

Optimal W?

1. Minimize the projection cost (Pearson, 1901): min ||x - x̂||
2. Maximize the projected variance (Hotelling, 1933)

Principal Component Analysis


Covariance algorithm (sketched in NumPy below):
1. Zero mean: μ_i = (1/n) Σ_{j=1}^n x_ij ;  x_ij ← x_ij - μ_i
2. Unit variance: σ_i^2 = (1/n) Σ_{j=1}^n (x_ij - μ_i)^2 ;  x_ij ← x_ij / σ_i
3. Covariance matrix: Σ = X X^T
4. Compute the eigenvectors of Σ: they form the rows of W^T
5. Project X onto the k principal components: Y = W^T X
   (Y is k x n, W^T is k x d, X is d x n, with k < d)
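A minimal sketch of this algorithm in NumPy (function and variable names are illustrative):

import numpy as np

def pca_cov(X, k):
    # X is d x n: d dimensions, n samples (one per column), as on the slide
    X = X - X.mean(axis=1, keepdims=True)    # 1. zero mean per dimension
    X = X / X.std(axis=1, keepdims=True)     # 2. unit variance per dimension
    S = X @ X.T                              # 3. covariance matrix (up to a 1/n factor)
    evals, evecs = np.linalg.eigh(S)         # 4. eigendecomposition of the symmetric matrix
    order = np.argsort(evals)[::-1]          #    sort eigenvalues, largest first
    W = evecs[:, order[:k]]                  #    d x k: top-k principal directions
    Y = W.T @ X                              # 5. projected data, k x n
    return Y, W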

Principal Component Analysis

How many PCs?

Y = W^T X  (Y is k x n, W^T is k x d, X is d x n)

Keep enough components to pass an energy threshold (typically 0.9 or 0.95):
(Σ_{i=1}^k λ_i) / (Σ_{i=1}^d λ_i) > threshold

(Figure: normalized eigenvalue per PC; e.g. the 3rd PC carries 14% of the energy)

- There are d eigenvectors
- Choose the first k eigenvectors based on their eigenvalues (see the sketch below)
- The final data has only k dimensions
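A small sketch of that rule (illustrative names; evals would be the eigenvalues from the covariance or SVD step):

import numpy as np

def choose_k(evals, threshold=0.95):
    evals = np.sort(evals)[::-1]                 # eigenvalues, largest first
    ratio = np.cumsum(evals) / np.sum(evals)     # energy kept by the first k PCs
    return int(np.searchsorted(ratio, threshold) + 1)

# e.g. choose_k([5.0, 3.0, 1.5, 0.4, 0.1], 0.9) == 3  (cumulative ratios 0.5, 0.8, 0.95, ...)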

PCA Applications: EigenFaces

X ∈ R^(d x n): images of faces
- Aligned
- Grayscale
- Same size
- Same lighting

W (d x d): the eigenfaces, i.e. the eigenvectors of X X^T
d = # pixels per image, n = # images

Sirovich and Kirby, 1987; Matthew Turk and Alex Pentland, 1991

PCA Applications: EigenFaces


(Figure: example faces and their eigenfaces)

Eigenfaces are standardized face "ingredients"; a human face may be considered a combination of these standard faces.

PCA Applications: EigenFaces


Dimensionality reduction: y = W^T x

Outlier detection: argmax_c ||y_car - y_c|| > θ_2
(a projected non-face image such as y_car is far from every face class)

Classification: argmin_c ||y_new - y_c|| < θ_1
(assign the new image to the nearest class in eigenface space; see the sketch below)
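A minimal sketch of nearest-class classification with a rejection threshold in the projected space (names, thresholds, and the class-mean representation are illustrative assumptions):

import numpy as np

def classify(y_new, class_means, theta1):
    # y_new: projected image (y = W^T x); class_means: one projected mean y_c per class
    dists = [np.linalg.norm(y_new - y_c) for y_c in class_means]
    c = int(np.argmin(dists))
    return c if dists[c] < theta1 else None   # None: too far from every class, reject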

PCA Applications: EigenFaces


Reconstruction: x ≈ y_1 w_1 + y_2 w_2 + ... + y_k w_k

(Figure: images reconstructed from partial information, adding 8 PCs at each step)
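A one-line sketch of that reconstruction (W and y as in the earlier PCA sketch; remember to add back any mean/scaling removed during preprocessing):

import numpy as np

def reconstruct(W, y):
    # W: d x k basis (eigenfaces), y: k projection coefficients of one sample (y = W^T x)
    return W @ y                    # x_hat = y_1 w_1 + y_2 w_2 + ... + y_k w_k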

Principal Component Analysis

Any problems?
Let's define a sample as an image of 100x100 pixels, so d = 10,000.
Then X X^T is 10,000 x 10,000, and computing its eigenvectors (roughly O(d^2) to O(d^3)) becomes infeasible.

Possible solution (only if n << d):
Use W^T = (X E)^T, where E ∈ R^(n x k) holds k eigenvectors of X^T X (an n x n matrix).
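A sketch of that trick with toy sizes (illustrative names; note the columns of X E may need normalization to be unit-length eigenvectors):

import numpy as np

d, n, k = 10_000, 100, 20
X = np.random.randn(d, n)
X = X - X.mean(axis=1, keepdims=True)

evals, E = np.linalg.eigh(X.T @ X)           # eigenvectors of the small n x n matrix
order = np.argsort(evals)[::-1][:k]
W = X @ E[:, order]                          # d x k: eigenvectors of X X^T (up to scale)
W = W / np.linalg.norm(W, axis=0)            # normalize columns
Y = W.T @ X                                  # projected data, k x n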

Principal Component Analysis


SVD algorithm (sketched in NumPy below):
1. Zero mean: μ_i = (1/n) Σ_{j=1}^n x_ij ;  x_ij ← x_ij - μ_i
2. Unit variance: σ_i^2 = (1/n) Σ_{j=1}^n (x_ij - μ_i)^2 ;  x_ij ← x_ij / σ_i
3. Compute the SVD of X: X = U S V^T
4. Projection matrix: W = U (keep its first k columns)
5. Project the data: Y = W^T X
   (Y is k x n, W^T is k x d, X is d x n, with k ≤ n < d)

Why W = U? (The next slide shows that the columns of U are the eigenvectors of X X^T.)
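A minimal sketch of PCA via SVD (illustrative names; np.linalg.svd returns the singular values as a vector):

import numpy as np

def pca_svd(X, k):
    # X is d x n; standardize each dimension as in steps 1-2
    X = X - X.mean(axis=1, keepdims=True)
    X = X / X.std(axis=1, keepdims=True)
    U, s, Vt = np.linalg.svd(X, full_matrices=False)   # X = U diag(s) V^T
    W = U[:, :k]                                       # first k left singular vectors
    return W.T @ X, W                                  # Y (k x n) and the basis W (d x k)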

Principal Component Analysis

X (d x n): data; columns are data points
U (d x d): left singular vectors; columns are eigenvectors of X X^T
S (d x n): singular values; diagonal matrix of sorted values
V^T (n x n): right singular vectors; rows are eigenvectors of X^T X

MATLAB: [U S V] = svd(A);
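For reference, a NumPy equivalent of that call, checking the shapes above (a hypothetical toy example; NumPy returns V^T directly and the singular values as a vector):

import numpy as np

A = np.random.randn(5, 3)                       # a small d x n example
U, s, Vt = np.linalg.svd(A, full_matrices=True)
print(U.shape, s.shape, Vt.shape)               # (5, 5) (3,) (3, 3)

S = np.zeros_like(A)                            # rebuild the d x n diagonal matrix S
S[:len(s), :len(s)] = np.diag(s)
assert np.allclose(U @ S @ Vt, A)               # X = U S V^T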

PCA Applications: Action Unit Recognition

T. Simon, J. Hernandez, 2009



Applications

- Visualize high-dimensional data (e.g. AnalogySpace)
- Find hidden relationships (e.g. topics in articles)
- Compress information
- Avoid redundancy
- Reduce algorithm complexity
- Outlier detection
- Denoising

Demos: http://www.cs.mcgill.ca/~sqrt/dimr/dimreduction.html

PCA for pattern recognition


Principal Component Analysis: the direction of highest variance can be bad for discriminability.

Fisher Linear Discriminant / Linear Discriminant Analysis: a direction of smaller variance can give good discriminability.

Linear Discriminant Analysis

Assumptions for the new basis:

Projection: y = w^T x
- Maximize the distance between the projected class means
- Minimize the projected class variance

Linear Discriminant Analysis


Objective:
argmax_w J(w) = (w^T S_B w) / (w^T S_W w)

m_i = (1/n_i) Σ_{x ∈ C_i} x                             (class means)
S_B = (m_2 - m_1)(m_2 - m_1)^T                           (between-class scatter)
S_W = Σ_{j=1}^{2} Σ_{x ∈ C_j} (x - m_j)(x - m_j)^T       (within-class scatter)

Algorithm (sketched in NumPy below):
1. Compute the class means m_1, m_2
2. Compute w = S_W^{-1} (m_2 - m_1)
3. Project the data: y = w^T x
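A minimal two-class sketch of that algorithm (data layout and names are illustrative assumptions):

import numpy as np

def fisher_lda(X1, X2):
    # X1, X2: d x n_i matrices, one column per sample of class 1 / class 2
    m1, m2 = X1.mean(axis=1), X2.mean(axis=1)               # class means
    Sw = (X1 - m1[:, None]) @ (X1 - m1[:, None]).T \
       + (X2 - m2[:, None]) @ (X2 - m2[:, None]).T          # within-class scatter
    w = np.linalg.solve(Sw, m2 - m1)                        # w = S_W^{-1} (m2 - m1)
    return w

# projection: y = w @ x for a single sample, or w @ X for a whole matrix of samples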

PCA vs LDA

PCA: perform dimensionality reduction while preserving as much of the variance in the high-dimensional space as possible.
LDA: perform dimensionality reduction while preserving as much of the class-discriminatory information as possible.

Other Component Analysis Algorithms

Independent Component Analysis (ICA)

- PCA: uncorrelated components
- ICA: independent components
- Blind source separation

M. Poh, D. McDuff, and R. Picard, 2010

Recall: independent implies uncorrelated, but uncorrelated does not imply independent.

Independent Component Analysis


(Diagram: three sound sources are mixed into three observed mixtures; ICA separates them into three outputs, one per original source)
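A sketch of blind source separation with scikit-learn's FastICA (the toy sources and mixing matrix below are illustrative assumptions, not from the slides):

import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
t = np.linspace(0, 8, 2000)
S = np.c_[np.sin(2 * t), np.sign(np.sin(3 * t)), rng.laplace(size=t.size)]  # 3 sources
A = rng.normal(size=(3, 3))                  # unknown mixing matrix
X = S @ A.T                                  # observed mixtures, one column per "microphone"

ica = FastICA(n_components=3, random_state=0)
S_est = ica.fit_transform(X)                 # recovered sources (up to order and scale)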

Other Component Analysis Algorithms

Curvilinear Component Analysis (CCA)


PCA/LDA: linear projection

CCA: non-linear projection while preserving proximity between points


What you should know

- Why dimensionality reduction?
- PCA/LDA assumptions
- PCA/LDA algorithms
- PCA vs LDA
- Applications
- Possible extensions

Recommended Readings

Tutorials
- A Tutorial on PCA. J. Shlens
- A Tutorial on PCA. L. Smith
- Fisher Linear Discriminant Analysis. M. Welling
- Eigenfaces (http://www.cs.princeton.edu/~cdecoro/eigenfaces/). C. DeCoro

Publications
- Eigenfaces for Recognition. M. Turk and A. Pentland
- Using Discriminant Eigenfeatures for Image Retrieval. D. Swets, J. Weng
- Eigenfaces vs. Fisherfaces: Recognition Using Class Specific Linear Projection. P. Belhumeur, J. Hespanha, and D. Kriegman
- PCA versus LDA. A. Martinez, A. Kak
