
A Generalization of Sparse PCA to Multiple Correspondence Analysis

G. Saporta1, A. Bernard1,2, C. Guinot2,3

1 CNAM, Paris, France
2 CE.R.I.E.S., Neuilly sur Seine, France
3 Université François Rabelais, Tours, France

5th International Conference of the ERCIM, 1-3 December 2012, Oviedo, Spain
Background

In the case of high-dimensional data, PCA or MCA components are nearly impossible to interpret.

Two ways of obtaining simple structures:

1. Factor rotation (varimax, quartimax, etc.), well known in factor analysis for obtaining simple structure. Generalized to CA and MCA by Van de Velden et al. (2005) and Chavent et al. (2012). But the components are still combinations of all the original variables.

2. Sparse methods providing components that are combinations of a few original variables, like sparse PCA. Extension to categorical variables: development of a new sparse method, Sparse MCA.
Outline
1. From Sparse PCA to Group Sparse PCA
1.1 Lasso & elastic net
1.2 Sparse PCA
1.3 Group Lasso
1.4 Group Sparse PCA

2. Sparse MCA
2.1 Definitions
2.2 Algorithm
2.3 Properties
2.4 Toy example: dogs data set

3. Application to genetic data

Conclusion
1. From Sparse PCA to Group Sparse PCA
1.1 Lasso & elastic net

Lasso: a shrinkage and selection method for linear regression (Tibshirani, 1996). It imposes an L1 penalty on the regression coefficients:

$\hat{\beta}^{lasso} = \arg\min_{\beta} \| y - X\beta \|^2 + \lambda \sum_{j=1}^{p} |\beta_j|$

The lasso continuously shrinks the coefficients towards zero and produces a sparse model, but the number of selected variables is bounded by the number of units.

Elastic net: combines the ridge penalty and the lasso penalty to select more predictors than the number of observations (Zou & Hastie, 2005):

$\hat{\beta}^{en} = \arg\min_{\beta} \| y - X\beta \|^2 + \lambda_2 \|\beta\|^2 + \lambda_1 \|\beta\|_1, \quad \text{with } \|\beta\|_1 = \sum_{j=1}^{p} |\beta_j|$
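As a quick illustration of the two penalties (not part of the original slides), here is a minimal scikit-learn sketch on simulated data; the sample sizes and penalty strengths are arbitrary assumptions:

```python
import numpy as np
from sklearn.linear_model import Lasso, ElasticNet

rng = np.random.default_rng(0)
n, p = 50, 10
X = rng.normal(size=(n, p))
# Only the first two predictors actually drive y.
y = 3 * X[:, 0] - 2 * X[:, 1] + 0.1 * rng.normal(size=n)

lasso = Lasso(alpha=0.1).fit(X, y)                     # L1 penalty only
enet = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)   # mix of L1 and L2

# The L1 part drives the irrelevant coefficients exactly to zero.
print("lasso zeros:", int((lasso.coef_ == 0).sum()))
print("enet zeros:", int((enet.coef_ == 0).sum()))
```

With more predictors than observations the pure lasso can select at most n of them, which is the motivation for the elastic net's added ridge term.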
1. From Sparse PCA to Group Sparse PCA
1.2 Sparse PCA

In PCA, each PC is a linear combination of all the original variables, which makes the results difficult to interpret.

Challenge of SPCA: to obtain easily interpretable components (many zero loadings in the principal factors).

Principle of SPCA: modify PCA by imposing a lasso/elastic-net constraint to construct modified PCs with sparse loadings.

Warning: Sparse PCA does not provide a global selection of variables but a selection dimension by dimension: different from the regression context.
1. From Sparse PCA to Group Sparse PCA
1.2 Sparse PCA

Several attempts:

Simple PCA by Vines (2000): integer loadings
Rousson, V. and Gasser, T. (2004): loadings in (+, 0, -)
SCoTLASS by Jolliffe et al. (2003): extra L1 constraints

Our technique is based on H. Zou, T. Hastie, R. Tibshirani (2006).
1. From Sparse PCA to Group Sparse PCA
1.2 Sparse PCA

Let the SVD of X be $X = UDV^T$, with $Z = UD$ the principal components.

Ridge regression of a principal component on the p variables:

$\hat{\beta}^{ridge} = \arg\min_{\beta} \| Z_i - X\beta \|^2 + \lambda \|\beta\|^2$

Since $X^T X = V D^2 V^T$ with $V^T V = I$:

$\hat{\beta}_i^{ridge} = (X^T X + \lambda I)^{-1} X^T X V_i = \frac{D_{ii}^2}{D_{ii}^2 + \lambda} V_i$, hence $\hat{v}_i = \frac{\hat{\beta}_i}{\|\hat{\beta}_i\|} = V_i$

Loadings can be recovered by regressing (ridge regression) the PCs on the p variables: PCA can be written as a regression-type optimization problem.
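The ridge-recovery argument above can be checked numerically. In this sketch (the data and the value of λ are illustrative assumptions), regressing the first PC on X recovers the first loading vector up to sign:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 100, 5
X = rng.normal(size=(n, p))
X -= X.mean(axis=0)                      # centre the columns

# SVD: X = U D V^T, principal components Z = U D
U, d, Vt = np.linalg.svd(X, full_matrices=False)
z1 = U[:, 0] * d[0]                      # first principal component

# Ridge regression of z1 on X: beta = (X^T X + lam I)^{-1} X^T z1.
# Since X^T X = V D^2 V^T, beta = d1^2 / (d1^2 + lam) * v1,
# so after normalisation we recover the first loading vector v1.
lam = 0.5
beta = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ z1)
v1_hat = beta / np.linalg.norm(beta)

print(abs(v1_hat @ Vt[0]))               # 1.0 up to numerical error
```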
1. From Sparse PCA to Group Sparse PCA
1.2 Sparse PCA

Sparse PCA adds a lasso penalty to produce sparse loadings:

$\hat{\beta}_i = \arg\min_{\beta} \| Z_i - X\beta \|^2 + \lambda \|\beta\|^2 + \lambda_1 \|\beta\|_1$

$\hat{V}_i = \frac{\hat{\beta}_i}{\|\hat{\beta}_i\|}$ is an approximation to $V_i$, and $X\hat{V}_i$ the i-th approximated component.

This produces sparse loadings with zero coefficients to facilitate interpretation.

Alternated algorithm between the elastic net and the SVD.
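A single-component sketch of this alternating scheme, in the spirit of Zou, Hastie & Tibshirani (2006). The data, the function name `sparse_pc1`, and the penalty values `alpha` and `l1_ratio` are illustrative assumptions, not the authors' exact settings:

```python
import numpy as np
from sklearn.linear_model import ElasticNet

def sparse_pc1(X, alpha=0.05, l1_ratio=0.9, n_iter=50):
    """One sparse loading vector via the alternating scheme:
    an elastic-net regression step, then a rank-1 'SVD' step."""
    X = X - X.mean(axis=0)
    a = np.linalg.svd(X, full_matrices=False)[2][0]   # ordinary v1
    enet = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, fit_intercept=False)
    b = a.copy()
    for _ in range(n_iter):
        b = enet.fit(X, X @ a).coef_       # sparse regression step
        if not b.any():                    # everything shrunk to zero
            return b
        s = X.T @ (X @ b)                  # SVD step: a <- X^T X b, normalised
        a = s / np.linalg.norm(s)
    return b / np.linalg.norm(b)

# Hypothetical data: one latent factor drives the first two variables only.
rng = np.random.default_rng(2)
t = rng.normal(size=200)
X = 0.3 * rng.normal(size=(200, 8))
X[:, 0] += t
X[:, 1] += t
v = sparse_pc1(X)
print(np.flatnonzero(v))                  # indices of the non-zero loadings
```

The alternation mirrors the slide: the elastic net supplies sparsity, the normalisation step plays the role of the SVD update.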
1. From Sparse PCA to Group Sparse PCA
1.3 Group Lasso

The X matrix is divided into J sub-matrices $X_j$ of $p_j$ variables each.

Group Lasso: extension of the Lasso for selecting groups of variables.

The group Lasso estimate is defined as the solution to (Yuan & Lin, 2007):

$\hat{\beta}^{GL} = \arg\min_{\beta} \Big\| y - \sum_{j=1}^{J} X_j \beta_j \Big\|^2 + \lambda \sum_{j=1}^{J} \sqrt{p_j}\, \|\beta_j\|_2$

If $p_j = 1$ for all j, the group Lasso reduces to the Lasso.
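The criterion above can be minimised by proximal gradient descent with block soft-thresholding, which zeroes whole groups at once. A self-contained sketch on hypothetical data (the function name, λ and group layout are assumptions):

```python
import numpy as np

def group_lasso(X, y, groups, lam, n_iter=500):
    """Proximal-gradient sketch of the group Lasso (Yuan & Lin, 2007).
    `groups` is a list of column-index lists; the penalty
    lam * sum_j sqrt(p_j) * ||beta_j||_2 zeroes entire groups."""
    n, p = X.shape
    beta = np.zeros(p)
    step = 1.0 / np.linalg.norm(X, 2) ** 2        # 1/L with L = ||X||_2^2
    for _ in range(n_iter):
        z = beta - step * (X.T @ (X @ beta - y))  # gradient step on 0.5||y-Xb||^2
        for g in groups:                          # block soft-thresholding
            w = lam * step * np.sqrt(len(g))
            nrm = np.linalg.norm(z[g])
            z[g] = 0.0 if nrm <= w else (1 - w / nrm) * z[g]
        beta = z
    return beta

# Hypothetical example: only the first group of two variables matters.
rng = np.random.default_rng(3)
X = rng.normal(size=(100, 6))
y = 2 * X[:, 0] - X[:, 1] + 0.1 * rng.normal(size=100)
beta = group_lasso(X, y, groups=[[0, 1], [2, 3], [4, 5]], lam=5.0)
print(beta.round(2))
```

The two inactive groups come out exactly zero, illustrating the group-level selection that the slide describes.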
1. From Sparse PCA to Group Sparse PCA
1.4 Group Sparse PCA

The data matrix X is still divided into J groups $X_j$ of $p_j$ variables, but there is no response y.

Group Sparse PCA: a compromise between SPCA and the group Lasso.

Goal: select groups of continuous variables (zero coefficients for entire blocks of variables).

Principle: replace the penalty function in the SPCA criterion

$\arg\min_{\beta} \| Z - X\beta \|^2 + \lambda \|\beta\|^2 + \lambda_1 \|\beta\|_1$

by the one defined in the group Lasso:

$\hat{\beta}^{GL} = \arg\min_{\beta} \Big\| Z - \sum_{j=1}^{J} X_j \beta_j \Big\|^2 + \lambda \sum_{j=1}^{J} \sqrt{p_j}\, \|\beta_j\|_2$
2. Sparse MCA
2.1 Definition

In MCA, the original table is replaced by the complete disjunctive table: selecting one column of the original table (a categorical variable $X_j$ with $p_j$ categories) amounts to selecting the whole block of $p_j$ indicator variables in the complete disjunctive table.

Challenge of Sparse MCA: select categorical variables, not categories.

Principle: a straightforward extension of Group Sparse PCA to groups of indicator variables, with the chi-square metric.
2. Sparse MCA
2.1 Correspondence analysis: notations

Let F be the n x q disjunctive table divided by the number of units, with

$r = F \mathbf{1}_q$, $c = F^T \mathbf{1}_n$, $D_r = \mathrm{diag}(r)$, $D_c = \mathrm{diag}(c)$

Let $\tilde{F}$ be the matrix of standardised residuals:

$\tilde{F} = D_r^{-1/2} (F - rc^T) D_c^{-1/2}$

Singular Value Decomposition: $\tilde{F} = UDV^T$
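These definitions translate directly into code. A small sketch on a hypothetical 4-unit disjunctive table with two binary variables:

```python
import numpy as np

# Toy complete disjunctive table: 2 categorical variables (2 categories
# each), n = 4 units; each row has exactly one 1 per variable.
Z = np.array([[1, 0, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)

F = Z / Z.sum()                      # correspondence matrix, sums to 1
r = F.sum(axis=1)                    # row masses  r = F 1_q
c = F.sum(axis=0)                    # column masses c = F^T 1_n
# standardised residuals: F~ = D_r^{-1/2} (F - r c^T) D_c^{-1/2}
Ft = (F - np.outer(r, c)) / np.sqrt(np.outer(r, c))
U, d, Vt = np.linalg.svd(Ft, full_matrices=False)
print((d ** 2).sum())                # total inertia = q/p - 1 = 4/2 - 1 = 1
```

The squared singular values sum to the total inertia of the MCA, and each singular value is bounded by 1.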
2. Sparse MCA
2.2 Algorithm

1. Start at V[, 1:K], the loadings of the first K PCs.

2. Given a fixed $A = [\alpha_1, ..., \alpha_K]$, solve the group lasso problem for k = 1, ..., K (number of factors, K ≤ J) and j = 1, ..., J:

$\hat{\beta}^k = \arg\min_{\beta} \Big\| y^k - \sum_{j=1}^{J} \tilde{F}_j \beta_j^k \Big\|^2 + \lambda_k \sum_{j=1}^{J} \sqrt{p_j}\, \|\beta_j^k\|_2$

with $y^k = \tilde{F} \alpha_k$ and the tuning parameters $\lambda = \lambda_1, ..., \lambda_K$.

3. For a fixed $B = [\beta^1, ..., \beta^K]$, compute the SVD of $\tilde{F}^T \tilde{F} B = UDV^T$ and update $A = UV^T$.

4. Repeat steps 2-3 until convergence.

5. Normalization: $\hat{V}_k = \beta^k / \|\beta^k\|$
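A much-simplified single-axis sketch of steps 2-5, where the full group-lasso regression step is approximated by one block soft-thresholding pass over $\tilde{F}^T \tilde{F} \alpha$; this is a stand-in for the authors' algorithm, not their exact implementation, and the toy table and λ are assumptions:

```python
import numpy as np

def sparse_mca_axis(S, groups, lam, n_iter=100):
    """One sparse axis. S is the matrix of standardised residuals F~;
    `groups` lists the indicator columns of each categorical variable.
    Alternates a thresholded regression step with a normalisation step."""
    a = np.linalg.svd(S, full_matrices=False)[2][0]   # ordinary 1st axis
    b = a.copy()
    for _ in range(n_iter):
        b = S.T @ (S @ a)                  # regression target step
        for g in groups:                   # group penalty: whole blocks -> 0
            nrm = np.linalg.norm(b[g])
            w = lam * np.sqrt(len(g))
            b[g] = 0.0 if nrm <= w else (1 - w / nrm) * b[g]
        if not b.any():
            return b
        a = b / np.linalg.norm(b)          # normalisation / SVD step
    return a

# Toy disjunctive table: variables A and B identical, C independent.
Z = np.array([[1, 0, 1, 0, 1, 0],
              [1, 0, 1, 0, 0, 1],
              [0, 1, 0, 1, 1, 0],
              [0, 1, 0, 1, 0, 1]], dtype=float)
F = Z / Z.sum()
r, c = F.sum(axis=1), F.sum(axis=0)
S = (F - np.outer(r, c)) / np.sqrt(np.outer(r, c))
v = sparse_mca_axis(S, groups=[[0, 1], [2, 3], [4, 5]], lam=0.1)
print(v.round(2))                # the block of variable C is zeroed entirely
```

The indicator block of the independent variable C is removed as a whole, which is exactly the "select variables, not categories" behaviour the method targets.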
2. Sparse MCA
2.3 Properties

Properties                 MCA     Sparse MCA
Uncorrelated components    TRUE    FALSE
Orthogonal loadings        TRUE    FALSE
Barycentric property       TRUE    TRUE

% of inertia: $100\, \|\tilde{Z}_{j.1,...,j-1}\|^2 / \mathcal{I}_{tot}$, with total inertia $\mathcal{I}_{tot} = \frac{1}{p} \sum_{j=1}^{p} (p_j - 1)$

where $\tilde{Z}_{j.1,...,j-1}$ are the residuals of $\tilde{Z}_j$ after adjusting for $\tilde{Z}_1, ..., \tilde{Z}_{j-1}$ (regression projection).
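Because sparse components are correlated, the percentage of inertia uses regression-projection residuals. These adjusted inertias can be read off a QR decomposition of the component matrix (a standard device; the function name and data are illustrative assumptions):

```python
import numpy as np

def adjusted_inertias(Z):
    """Adjusted variance of possibly correlated components via QR:
    the k-th value is the squared norm of the part of Z_k orthogonal
    to Z_1, ..., Z_{k-1}, i.e. the regression-projection residual
    used in the '% of inertia' formula above."""
    R = np.linalg.qr(Z, mode='r')
    return np.diag(R) ** 2

# Hypothetical component matrix: 20 units, 3 correlated components.
rng = np.random.default_rng(4)
Z = rng.normal(size=(20, 3))
av = adjusted_inertias(Z)
print(av.sum() <= np.sum(Z ** 2) + 1e-9)   # never exceeds the total inertia
```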
2. Sparse MCA
2.4 Toy example: dogs data set

Data:
n = 27 breeds of dogs
p = 6 categorical variables
q = 16 (total number of columns)
X: 27 x 6 matrix of categorical variables
K: 27 x 16 complete disjunctive table, K = (K1, ..., K6)

1 block = 1 variable = 1 $K_j$ matrix
2. Sparse MCA
2.4 Toy example: dogs data set

For λ = 0.10:
11 non-zero loadings on the 1st axis
6 non-zero loadings on the 2nd axis
5 non-zero loadings on the 3rd axis
3 non-zero loadings on the 4th axis
2. Sparse MCA
2.4 Toy example: comparison of the loadings

[Table: loadings of the size categories (large, medium, small) and weight categories (lightweight, heavy, very heavy) on dimensions 1-4, for MCA and for Sparse MCA. In the Sparse MCA columns, entire blocks of loadings are exactly zero.]
2. Sparse MCA
2.4 Toy example: comparison of displays

Comparison between MCA and Sparse MCA on the first principal plane.
3. Application to genetic data
Single Nucleotide Polymorphisms

Data:
n = 502 individuals
p = 100 SNPs (among more than 300 000 in the original database)
q = 281 (total number of columns)
X: 502 x 100 matrix of qualitative variables
K: 502 x 281 complete disjunctive table, K = (K1, ..., K100)
3. Application to genetic data
Single Nucleotide Polymorphisms

For λ = 0.04:
30 non-zero loadings on the 1st axis
3. Application to genetic data
Comparison of the loadings

[Table: loadings of the categories of SNPs rs4253711 (AA, AG, GG) and rs4253724 (AA, AT, TT) on dimensions 1 and 2, for MCA and for Sparse MCA. In the Sparse MCA columns, whole blocks of loadings are exactly zero.]
3. Application to genetic data
Single Nucleotide Polymorphisms

Comparison between MCA and Sparse MCA on the first principal plane.
3. Application to genetic data
Single Nucleotide Polymorphisms

Comparison between MCA and Sparse MCA on the second principal plane.
3. Application to genetic data
Comparison of the squared loadings

SNPs         MCA Dim 1   MCA Dim 2   Rotation Dim 1   Rotation Dim 2   Sparse MCA Dim 1   Sparse MCA Dim 2
rs4253711    0.104       0.232       0.003            0.003            0.106              0.000
rs4253724    0.119       0.238       0.003            0.002            0.206              0.000
...
rs26722      0.001       0.003       0.003            0.003            0.000              0.659
rs35406      0.001       0.000       0.003            0.005            0.000              0.115
3. Application to genetic data
Single Nucleotide Polymorphisms

Comparison between MCA with rotation and Sparse MCA on the first principal plane.
3. Application to genetic data
Single Nucleotide Polymorphisms

Comparison between MCA with rotation and Sparse MCA on the second principal plane.
Conclusions and perspectives

We proposed two new methods in an unsupervised multiblock data context: Group Sparse PCA for continuous variables, and Sparse MCA for categorical variables.

Both methods produce sparse loading structures that make the interpretation and comprehension of the results easier.

However, these methods do not yield sparsity within groups.

Research in progress:
Criteria for choosing the tuning parameters
Extension of Sparse MCA to select groups and predictors within a group, in order to produce sparsity at both levels
References

Chavent, M., Kuentz-Simonet, V., Saracco, J. (2012). Orthogonal rotation in PCAMIX. Advances in Data Analysis and Classification, 6 (2), 131-146.

Jolliffe, I.T., Trendafilov, N.T., Uddin, M. (2003). A modified principal component technique based on the LASSO. Journal of Computational and Graphical Statistics, 12, 531-547.

Rousson, V., Gasser, T. (2004). Simple component analysis. Journal of the Royal Statistical Society: Series C (Applied Statistics), 53, 539-555.

Simon, N., Friedman, J., Hastie, T., Tibshirani, R. (2012). A Sparse-Group Lasso. Journal of Computational and Graphical Statistics.

Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B, 58, 267-288.

Van de Velden, M., Kiers, H. (2005). Rotation in Correspondence Analysis. Journal of Classification, 22 (2), 251-271.

Vines, S.K. (2000). Simple principal components. Journal of the Royal Statistical Society: Series C (Applied Statistics), 49, 441-451.

Yuan, M., Lin, Y. (2007). Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society, Series B, 68, 49-67.

Zou, H., Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society, Series B, 67, 301-320.

Zou, H., Hastie, T., Tibshirani, R. (2006). Sparse principal component analysis. Journal of Computational and Graphical Statistics, 15 (2), 265-286.
