Lecturer:
Outline

Before:
- Linear Algebra
- Probability
- Likelihood Ratio
- ROC
- ML/MAP

Today:
- Accuracy, Dimensions & Overfitting (DHS 3.7)
- Principal Component Analysis (DHS 3.8.1)
- Fisher Linear Discriminant / LDA (DHS 3.8.2)
- Other Component Analysis Algorithms
Complexity (computational/temporal)

Big O notation:
O(c) < O(log n) < O(n) < O(n log n) < O(n^2) < O(n^k) < O(2^n) < O(n!) < O(n^n)

Complexity matters for both training and testing.
Overfitting

[Figure: error vs. model complexity, for regression and classification; beyond some complexity, test error rises — overfitting.]
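The error-vs-complexity picture can be sketched with polynomial regression (a minimal illustration; the sine target, noise level, and degrees are made-up values, not from the lecture). Training error keeps falling as the degree grows, while test error typically rises once the model starts fitting noise:

```python
import numpy as np

rng = np.random.default_rng(1)
f = lambda x: np.sin(2 * np.pi * x)           # true underlying function
x_train = np.linspace(0, 1, 15)
x_test = np.linspace(0.02, 0.98, 50)
y_train = f(x_train) + 0.2 * rng.normal(size=x_train.size)
y_test = f(x_test) + 0.2 * rng.normal(size=x_test.size)

results = {}
for deg in (1, 3, 9):                         # increasing model complexity
    coeffs = np.polyfit(x_train, y_train, deg)
    tr = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    te = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    results[deg] = (tr, te)
    print(f"degree {deg}: train MSE {tr:.3f}, test MSE {te:.3f}")
```

Training MSE is non-increasing in the degree (each polynomial space contains the lower-degree ones), which is exactly why training error alone cannot detect overfitting.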
Dimensionality Reduction

As the number of dimensions grows, so does unnecessary complexity.
Dimensionality Reduction
Y = Wᵀ X

  Y: k×n  =  Wᵀ: k×d  ·  X: d×n,  with k < d

Choose k so that the fraction of variance retained exceeds a threshold:

  (Σ_{i=1}^{k} λ_i) / (Σ_{i=1}^{n} λ_i) > Threshold (0.9 or 0.95)

There are d eigenvectors. Choose the first k eigenvectors, ranked by their eigenvalues; the final data has only k dimensions.
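The projection and the eigenvalue-threshold rule above can be sketched in NumPy (a minimal sketch; the `pca` helper and the toy data are illustrative, not from the lecture):

```python
import numpy as np

def pca(X, threshold=0.95):
    """PCA of X (d x n, one sample per column) via the covariance eigenproblem."""
    Xc = X - X.mean(axis=1, keepdims=True)              # center each dimension
    C = Xc @ Xc.T / Xc.shape[1]                         # d x d covariance
    eigvals, eigvecs = np.linalg.eigh(C)                # ascending order
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]  # sort descending
    ratio = np.cumsum(eigvals) / eigvals.sum()          # retained-variance fraction
    k = int(np.searchsorted(ratio, threshold)) + 1      # smallest k passing threshold
    W = eigvecs[:, :k]                                  # d x k
    Y = W.T @ Xc                                        # k x n projected data
    return Y, W

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 200))
X[3] = 2 * X[0] + 0.01 * rng.normal(size=200)           # nearly redundant dimension
Y, W = pca(X, threshold=0.95)
print(Y.shape)  # fewer than 5 rows: the redundant dimension is dropped
```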
X: d×n — face images (aligned, grayscale, same size, same lighting)
W: d×d — eigenfaces
d = # pixels per image, n = # images

Sirovich and Kirby, 1987; Matthew Turk and Alex Pentland, 1991
Eigenfaces are standardized face ingredients; a human face may be considered a combination of these standard faces.
Outlier Detection

  If ||y_car − y_c|| > θ2 for every class c, the projected sample y_car is an outlier (not a face).

Classification

  c* = argmin_c ||y_new − y_c||; assign y_new to class c* if ||y_new − y_c*|| < θ1.
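A minimal nearest-class-mean sketch of the two rules above; the class means and the thresholds θ1/θ2 here are made-up illustrative values, not from the lecture:

```python
import numpy as np

def classify(y_new, class_means, theta1=1.0, theta2=3.0):
    """Nearest class mean in the projected (eigen)space.

    Returns the class index, or None when no confident match exists.
    theta1/theta2 are illustrative thresholds."""
    dists = {c: np.linalg.norm(y_new - m) for c, m in class_means.items()}
    c_star = min(dists, key=dists.get)
    if dists[c_star] > theta2:
        return None                      # outlier: far from every class
    if dists[c_star] < theta1:
        return c_star                    # confident classification
    return None                          # ambiguous region between theta1 and theta2

means = {0: np.zeros(3), 1: np.full(3, 5.0)}
print(classify(np.array([0.1, -0.2, 0.3]), means))    # near class 0
print(classify(np.array([50.0, 50.0, 50.0]), means))  # far from both classes
```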
Reconstructed images (adding 8 PCs at each step)

Any problems?

Let's define a sample as an image of 100×100 pixels, so d = 10,000. Then X Xᵀ is 10,000 × 10,000 (O(d²) entries), and computing its eigenvectors is O(d³) — infeasible.
Y = W X

  Y: k×n  =  W: k×d  ·  X: d×n,  with k ≤ n < d

Why W = U?
X = U S Vᵀ

  X: d×n  =  U: d×d  ·  S: d×n  ·  Vᵀ: n×n

MATLAB: [U S V] = svd(A);
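One way to see why W = U: the left singular vectors of the (centered) data are exactly the eigenvectors of X Xᵀ, so when n < d the thin SVD sidesteps the d×d eigenproblem entirely. A NumPy sketch with illustrative sizes:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 100, 20                        # many pixels, few images (n < d)
X = rng.normal(size=(d, n))
Xc = X - X.mean(axis=1, keepdims=True)

# Thin SVD: U is d x n here, so the d x d matrix Xc Xc^T is never formed.
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

# Columns of U are eigenvectors of Xc Xc^T, with eigenvalues S**2:
C = Xc @ Xc.T
for i in range(3):
    v = U[:, i]
    assert np.allclose(C @ v, (S[i] ** 2) * v)
print("left singular vectors = eigenvectors of X X^T")
```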
Applications

- Visualize high-dimensional data (e.g. AnalogySpace)
- Find hidden relationships (e.g. topics in articles)
- Compress information
- Avoid redundancy
- Reduce algorithm complexity
- Outlier detection
- Denoising

Demos: http://www.cs.mcgill.ca/~sqrt/dimr/dimreduction.html
Fisher Linear Discriminant / Linear Discriminant Analysis

Smaller within-class variance → good discriminability.
y = wᵀ x

Maximize the distance between projected class means; minimize the projected class variance.
  J(w) = (wᵀ S_B w) / (wᵀ S_W w)

  S_W = Σ_j Σ_{x ∈ C_j} (x − m_j)(x − m_j)ᵀ
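For two classes, S_B = (m1 − m2)(m1 − m2)ᵀ and the maximizer of J(w) has the closed form w ∝ S_W⁻¹ (m1 − m2). A NumPy sketch with illustrative data (two classes separated along the first axis, with large variance along the second):

```python
import numpy as np

def fisher_direction(X1, X2):
    """Two-class Fisher discriminant: w maximizing (w^T S_B w) / (w^T S_W w).

    X1, X2: arrays of shape (n_i, d), one sample per row."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    # within-class scatter S_W, summed over both classes
    S_W = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
    w = np.linalg.solve(S_W, m1 - m2)   # closed-form maximizer: S_W^{-1} (m1 - m2)
    return w / np.linalg.norm(w)

rng = np.random.default_rng(0)
X1 = rng.normal(loc=[0, 0], scale=[1.0, 3.0], size=(200, 2))
X2 = rng.normal(loc=[3, 0], scale=[1.0, 3.0], size=(200, 2))
w = fisher_direction(X1, X2)
print(w)  # mostly along the first axis, where the class means differ
```

Note how w ignores the high-variance second axis, which carries no class information — exactly the "smaller variance, good discriminability" trade-off.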
PCA vs LDA
PCA: Perform dimensionality reduction while preserving as much of the variance in the high dimensional space as possible. LDA: Perform dimensionality reduction while preserving as much of the class discriminatory information as possible.
ICA (Independent Component Analysis)

[Figure: mixed input signals (mixtures) passed through ICA, which separates them into independent outputs.]
- Why dimensionality reduction?
- PCA/LDA assumptions
- PCA/LDA algorithms
- PCA vs LDA
- Applications
- Possible extensions
Recommended Readings

Tutorials
- A Tutorial on PCA. J. Shlens
- A Tutorial on PCA. L. Smith
- Fisher Linear Discriminant Analysis. M. Welling
- Eigenfaces (http://www.cs.princeton.edu/~cdecoro/eigenfaces/). C. DeCoro

Publications
- Eigenfaces for Recognition. M. Turk and A. Pentland
- Using Discriminant Eigenfeatures for Image Retrieval. D. Swets, J. Weng
- Eigenfaces vs. Fisherfaces: Recognition Using Class Specific Linear Projection. P. Belhumeur, J. Hespanha, and D. Kriegman
- PCA versus LDA. A. Martinez, A. Kak