Documente Academic
Documente Profesional
Documente Cultură
to Multiple Correspondence
Analysis
1
CNAM, Paris, France
2
CE.R.I.E.S., Neuilly sur Seine, France
3
Universit Franois Rabelais, Tours, France
1
5th International Conference of the ERCIM, 1-3 December 2012, Oviedo, Spain
Background
2. Sparse MCA
2.1 Definitions
2.2 Algorithm
2.3 properties
2.4 Toy example: dogs data set
Conclusion
3
5th International Conference of the ERCIM, 1-3 December 2012, Oviedo, Spain
1. From Sparse PCA to Group Sparse PCA
1.1 Lasso & elastic net
j 1
4
5th International Conference of the ERCIM, 1-3 December 2012, Oviedo, Spain
1. From Sparse PCA to Group Sparse PCA
1.2 Sparse PCA
Several attempts:
Simple PCA
by Vines (2000) : integer loadings
Rousson, V. and Gasser, T. (2004) : loadings (+ , 0, -)
6
5th International Conference of the ERCIM, 1-3 December 2012, Oviedo, Spain
1. From Sparse PCA to Group Sparse PCA
1.2 Sparse PCA
= UDV T
Let the SVD of XXbe with
Z = UD the
principal components
Ridge regression:
ridge arg min Z - X
2
2
XT X = VD2 V T with V T V = I
2
D
i,ridge = XT X + I X T XVi Vi 2 ii
-1
v% Vi
Dii +
= is an approximation to
V Vi XV
, and the i th
i i
approximated
component
9
5th International Conference of the ERCIM, 1-3 December 2012, Oviedo, Spain
1. From Sparse PCA to Group Sparse PCA
1.3 Group Sparse PCA
3 indicator variables 0 0
in the complete disjunctive table
Challenge of Sparse MCA : select categorical
variables, not categories
Principle: a straightforward extension of Group
Sparse PCA for
groups of indicator variables, with the chi-
square metric
5th International Conference of the ERCIM, 1-3 December 2012, Oviedo, Spain
11
2. Sparse MCA
2.1 Correspondence analysis: notations
%
LetF be the matrix of standardised residuals:
1 1
F - rc D
- -
%= D
F 2 T 2
r c
%= UV T
Singular Value Decomposition
F
12
5th International Conference of the ERCIM, 1-3 December 2012, Oviedo, Spain
2. Sparse MCA
2.2 Algorithm
%k
y = F
and the1 tuning parameter
= , ...,
K
%
For aF
T%
F = UDV T
fixed = UV T
, compute the SVD of
and update
p k
1
2
%
Total inertia
p j 1
p j 1 Z j.1,..., j-1
j 1
% Z% %
Z
are the residuals after adjusting
Z j.1,..., j-1 j 1,..., j-1 for
(regression projection) 14
5th International Conference of the ERCIM, 1-3 December 2012, Oviedo, Spain
2. Sparse MCA
Toy example: Dogs
Data:
n=27 breeds of dogs
p=6 variables
q=16 (total number of
columns)
X : 27 x 6 matrix of
categorical variables
K : 27 x 16 complete
disjunctive table K=(K1,
, K6)
1 bloc
=
1 SNP = 1 Kj matrix
5th International Conference of the ERCIM, 1-3 December 2012, Oviedo, Spain
2. Sparse MCA
2.4 Toy example: Dogs
5th International Conference of the ERCIM, 1-3 December 2012, Oviedo, Spain
3. Application on genetic data
Single Nucleotide Polymorphisms
Data:
n=502 individuals
p=100 SNPs (among more
than
300 000 of the original data
base)
q=281 (total number of
columns)
X : 502 x 100 matrix of
qualitative variables
K : 502 x 281 complete
disjunctive table K=(K1,
, K100)
5th International Conference of the ERCIM, 1-3 December 2012, Oviedo, Spain
3. Application on genetic data
Single Nucleotide Polymorphisms
Dim 1
Dim 2
Dim 3
Dim 4
Dim 5
Dim 6
=0.04
30 non-zero
loadings
on the 1st axe
20
5th International Conference of the ERCIM, 1-3 December 2012, Oviedo, Spain
Application on genetic data
Comparison of the loadings
Sparse
SNPs MCA
MCA
Dim 1 Dim 2 Dim 1 Dim 2
-
- 0.30
0.00
0.043 9
rs4253711.AA -0.323 0
rs4253711.AG 0.009 0.00
0.016 0.05
rs4253711.GG 0.024 0
- 7
0.00
0.006 0.08
0
6
- 0.00
. . -. .
0.42 0.
. . 0.025
.
rs4253724.AA -0.264 4. .
.
rs4253724.AT .
0.018 . .
0.11 .
0.00
0.014
rs4253724.TT 0.027 5 0
-
0.11
0.008 21
5th International Conference of the ERCIM, 1-3 December 2012, Oviedo, Spain 6 0.00
3. Application on genetic data
Single Nucleotide Polymorphisms
22
5th International Conference of the ERCIM, 1-3 December 2012, Oviedo, Spain
3. Application on genetic data
Single Nucleotide Polymorphisms
23
5th International Conference of the ERCIM, 1-3 December 2012, Oviedo, Spain
Application on genetic data
Comparison of the squared loadings
rs4253724
. 0.11
. 0.23
. 0.00
. 0.00
. 0.20
. 0.00
.
. 9
. 8. 3. 2. 6. 0.
. . . . . . .
rs26722 0.00 0.00 0.00 0.00 0.00 0.65
1 3 3 3 0 9
25
5th International Conference of the ERCIM, 1-3 December 2012, Oviedo, Spain
3. Application on genetic data
Single Nucleotide Polymorphisms
26
5th International Conference of the ERCIM, 1-3 December 2012, Oviedo, Spain
Conclusions and perspectives
Research in progress:
Criteria for choosing the tuning parameter
Extension of Sparse MCA
To select groups and predictors within a group, in order
to produce sparsity at both levels 27
5th International Conference of the ERCIM, 1-3 December 2012, Oviedo, Spain
References