
Correspondence Analysis and Related Methods, Part 2


1. What is multiple correspondence analysis (MCA)?
2. Why is MCA so useful as a method of visualizing
questionnaire data?
3. How is MCA implemented in XLSTAT?

Classical or simple CA analyses the relationships between two variables, although the method extends to other forms of tabular data, for example the product-attribute data shown previously, as well as ratings and preferences, at an individual or aggregate level.
Multiple CA analyses several categorical variables where we are interested in all the relationships within the set of variables, not between one set and another.
The best way to understand the difference is to see the data format required by the MCA program in XLSTAT: individual-level responses to several questions.

Source: Family & Changing Gender Roles Survey, ISSP (1994)
Responses to four questions concerning working women

Demographic categories

Between-set versus within-set

Questions: Should a woman work full-time, work part-time or stay at home (a fourth category covers missing data) [4 response categories]: (Q1) before she has children; (Q2) when she has a preschool child; (Q3) when her children are still at school; (Q4) when all her children have left home.
Demographics: Country [24], Sex [2], Age group [6]

Between-set means that there are two sets of variables and we are interested in the relationships between them, e.g. between the demographics and the question responses.
Within-set means that there is one set of variables and we are interested in the relationships amongst them, e.g. amongst the question responses. This is the multiple correspondence analysis (MCA) case.

Between-set example: Simple CA
Q3: Should a woman with a child at school work full-time, part-time or stay at home?
W = work full-time, w = work part-time, H = stay at home, ? = DK/unsure/missing

COUNTRY                 W        w        H        ?    Total
AUS                   256     1156      176      191     1779
DW                    101     1394      581      248     2324
DE                    278      691       62       66     1097
GB                    161      646       70      107      984
NIRL                  126      394       75       52      647
USA                   482      686      107      172     1447
A                      84      632      202       59      977
H                     285      736      447       32     1500
I                     171      670      167       10     1018
IRL                   223      424      209       82      938
NL                    539     1205      143       81     1968
N                     487     1242      205      153     2087
S                     295      833       39      105     1272
CZ                    228      585      198       13     1024
SLO                   341      428      222       41     1032
PL                    431      425      589      152     1597
BG                    270      427      335       94     1126
RUS                   175     1154      550      119     1998
NZ                    120      754       72      101     1047
CDN                   566      497      108      269     1440
IL                    468      664       92       63     1287
J                     203      671      313      120     1307
E                     738     1012      514      230     2494
RP                    243      448      484       25     1200
Total                7271    17774     5960     2585    33590
Average profile     0.216    0.529    0.177    0.077        1
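As a rough check on the table, the sketch below computes a simple CA directly from the counts with NumPy. It is not XLSTAT's implementation: it uses the standard textbook route (the singular value decomposition of the standardized residuals), and the variable names are illustrative. The total inertia it produces can be compared with the principal inertias and percentages quoted on the CA map of this table.

```python
import numpy as np

# Counts from the Q3 table: one row per country, columns W, w, H, ?.
counts = np.array([
    [256, 1156, 176, 191],   # AUS
    [101, 1394, 581, 248],   # DW
    [278,  691,  62,  66],   # DE
    [161,  646,  70, 107],   # GB
    [126,  394,  75,  52],   # NIRL
    [482,  686, 107, 172],   # USA
    [ 84,  632, 202,  59],   # A
    [285,  736, 447,  32],   # H
    [171,  670, 167,  10],   # I
    [223,  424, 209,  82],   # IRL
    [539, 1205, 143,  81],   # NL
    [487, 1242, 205, 153],   # N
    [295,  833,  39, 105],   # S
    [228,  585, 198,  13],   # CZ
    [341,  428, 222,  41],   # SLO
    [431,  425, 589, 152],   # PL
    [270,  427, 335,  94],   # BG
    [175, 1154, 550, 119],   # RUS
    [120,  754,  72, 101],   # NZ
    [566,  497, 108, 269],   # CDN
    [468,  664,  92,  63],   # IL
    [203,  671, 313, 120],   # J
    [738, 1012, 514, 230],   # E
    [243,  448, 484,  25],   # RP
], dtype=float)

P = counts / counts.sum()          # correspondence matrix
r = P.sum(axis=1)                  # row masses
c = P.sum(axis=0)                  # column masses = average profile

# Standardized residuals; their squared singular values are the principal inertias.
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
principal_inertias = np.linalg.svd(S, compute_uv=False) ** 2
total_inertia = principal_inertias.sum()
share_2d = principal_inertias[:2].sum() / total_inertia

print("average profile:", np.round(c, 3))
print("total inertia:", round(total_inertia, 4))
print("share of inertia on first two axes:", round(share_2d, 3))
```

The total inertia equals the chi-square statistic of the table divided by the grand total, which is why no separate chi-square call is needed.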


Simple CA
Should a woman with a child at school work full-time, part-time or stay at home?
[Figure: CA map of the 24 countries and the response categories W, w, H and ?. First axis: 0.0737 (50.6%); second axis: 0.0532 (36.5%); 87.1% of inertia explained.]

Simple CA of multiway tables
Should a woman with a child at school work full-time, part-time or stay at home?

Each country is split by gender, giving 24 × 2 = 48 country-gender groups. We say the variables country and gender are interactively coded.

W = work full-time, w = work part-time, H = stay at home, ? = DK/unsure/missing

COUNTRY                 W        w        H        ?    Total
AUSm                  117      596      114       82      909
AUSf                  138      559       60      109      866
DWm                    43      675      357      123     1198
DWf                    58      719      224      125     1126
.                       .        .        .        .        .
.                       .        .        .        .        .
.                       .        .        .        .        .
RPm                   347      445      294      111     1197
RPf                   390      566      218      118     1292
Total                7271    17774     5960     2585    33590
Average profile     0.216    0.529    0.177    0.077        1

The average profile stays the same, so the definitions of the centre and of geometric distance remain identical to those of the previous map; all that has been done is to split each country point into two profiles.

Simple CA of multiway tables
Should a woman with a child at school work full-time, part-time or stay at home?
[Figure: CA map of the country-gender points (e.g. CDNf, CDNm) and the response categories W, w, H and ?. First axis: 0.0797 (51.5%); second axis: 0.0546 (35.3%); 86.8% of inertia explained.]
Inertia before: 0.1456. Inertia with the male/female (MF) split: 0.1546, so 5.8% of the inertia is due to MF differences.
Bulgaria (BG) is the only country with a reverse MF difference.
Ireland (IRL) has the largest MF difference.

Simple CA of multiway tables
Should a woman with a child at school work full-time, part-time or stay at home?
Interactive coding of country (24), gender (2) and age (6), giving 288 combinations.
[Figure: CA map of the country-gender-age points (e.g. CDNf<25, PLm26-35, NZm>66) and the response categories W, w, H and ?. First axis: 0.1301 (54.3%); second axis: 0.0791 (33.0%); 87.3% of inertia explained.]
Points tend to lie in a curved pattern (called an arch or horseshoe).
Points that lie inside the arch are polarized, e.g. PLm26-35: 32% W, 22% w, 32% H, whereas NZm>66: 7% W, 73% w, 15% H. Average: 22% W, 53% w, 18% H.

Stacked tables
Should a woman with a child at school work full-time, part-time or stay at home? (columns W, w, H, ?)

Each variable is separately cross-tabulated with the question, and the tables are then stacked one on top of another: Country (24), Gender (2), Age (6), Education (7), Marital status (5), Social class (8).

Since the column margins of each table are identical (and the same as those of the interactively coded tables before), the basic geometry remains the same; only the detail is sacrificed here, because all the information is collapsed into main effects.
The inertia of the stacked table is the average of the inertias of its subtables.
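The averaging property of stacked tables can be checked numerically. The sketch below uses synthetic responses (random data, not the ISSP survey), and the helper names `total_inertia` and `crosstab` are introduced here only for illustration. It cross-tabulates two demographic variables with one question, stacks the tables, and compares the inertia of the stack with the mean of the subtable inertias.

```python
import numpy as np

def total_inertia(table):
    """Total inertia of a contingency table (chi-square statistic / grand total)."""
    P = table / table.sum()
    r = P.sum(axis=1, keepdims=True)   # row masses, shape (I, 1)
    c = P.sum(axis=0, keepdims=True)   # column masses, shape (1, J)
    return float((((P - r * c) ** 2) / (r * c)).sum())

def crosstab(rows, cols, n_rows, n_cols):
    """Count co-occurrences of two integer-coded variables."""
    table = np.zeros((n_rows, n_cols))
    np.add.at(table, (rows, cols), 1)
    return table

# Synthetic data: 2000 respondents, one question with 4 answers,
# plus gender (2 categories) and age group (6 categories).
rng = np.random.default_rng(0)
n = 2000
answer = rng.integers(0, 4, size=n)
gender = rng.integers(0, 2, size=n)
age = rng.integers(0, 6, size=n)

subtables = [crosstab(gender, answer, 2, 4), crosstab(age, answer, 6, 4)]
stacked = np.vstack(subtables)

stacked_inertia = total_inertia(stacked)
average_inertia = float(np.mean([total_inertia(t) for t in subtables]))
print(stacked_inertia, average_inertia)
```

The equality relies on every subtable being built from the same respondents, so that all subtables share the same column margins and the same grand total.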

Stacked tables
Should a (married) woman work full-time, part-time or stay at home ... before having children ... with a preschool child ... with a child at school ... when her children have left home? (columns W, w, H, ? repeated for each of the four questions)
Rows: Country (24), Gender (2), Age (6), Education (7), Marital status (5), Social class (8).

Tables can be stacked row-wise and column-wise, adding additional questions as columns.
This gives 24 contingency tables in a 6 × 4 pattern; row margins and column margins are the same.
The inertia of the stacked table is the average of the inertias of its subtables.

Stacked tables
Women in the workplace and 6 demographic variables
[Figure: CA map of the stacked table, showing the demographic categories and the response categories of the four questions (1W to 4?). First axis: 0.0188 (49.1%); second axis: 0.0084 (21.9%); 71.0% of inertia explained.]
Relationships between each demographic variable and each question are displayed jointly.
Relationships within questions and relationships within demographics are not displayed explicitly.
Join the categories of an ordinal variable to see trends, for example age.

Multiple correspondence analysis (MCA)
Women in the workplace: 4 questions
West & East German samples only

Original data     Indicator matrix
Questions         Qu. 1       Qu. 2       Qu. 3       Qu. 4
1 2 3 4           W w H ?     W w H ?     W w H ?     W w H ?
-----------------------------------------------------------------
1 3 2 2           1 0 0 0     0 0 1 0     0 1 0 0     0 1 0 0
2 3 3 2           0 1 0 0     0 0 1 0     0 0 1 0     0 1 0 0
4 3 3 2           0 0 0 1     0 0 1 0     0 0 1 0     0 1 0 0
4 4 4 4           0 0 0 1     0 0 0 1     0 0 0 1     0 0 0 1
4 4 4 4           0 0 0 1     0 0 0 1     0 0 0 1     0 0 0 1
1 3 2 1           1 0 0 0     0 0 1 0     0 1 0 0     1 0 0 0
. . .             . . .
and so on for 3415 rows

Response data is recoded as dummy variables.
N rows, Q questions, the q-th question has J_q categories, and the total number of categories is J (here N = 3415, Q = 4, J_q = 4 for all q, J = 16).
One definition of MCA is that it is the CA of the indicator matrix.
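The recoding into dummy variables can be sketched as follows. The six response patterns are the ones visible on the slide; the function name `indicator_matrix` is an illustrative choice and does not come from XLSTAT.

```python
import numpy as np

def indicator_matrix(codes, n_categories):
    """One-hot encode an (N, Q) array of 1-based category codes into an (N, J) matrix."""
    codes = np.asarray(codes)
    identity = np.eye(n_categories, dtype=int)
    # One block of dummy columns per question, placed side by side.
    blocks = [identity[codes[:, q] - 1] for q in range(codes.shape[1])]
    return np.hstack(blocks)

# Response patterns shown on the slide (1 = W, 2 = w, 3 = H, 4 = ?).
responses = [
    [1, 3, 2, 2],
    [2, 3, 3, 2],
    [4, 3, 3, 2],
    [4, 4, 4, 4],
    [4, 4, 4, 4],
    [1, 3, 2, 1],
]
Z = indicator_matrix(responses, n_categories=4)
print(Z)
```

Every row of Z contains exactly Q ones, one per question, so all respondents receive the same mass in the CA of Z.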

MCA: XLSTAT initial output

Eigenvalues and percentages of inertia:

                            F1       F2       F3       F4       F5    ... F12
Eigenvalue               0.692    0.513    0.365    0.307    0.218
Inertia (%)             23.061   17.108   12.156   10.248    7.254
Cumulative %            23.061   40.169   52.325   62.573   69.827
Adjusted Inertia         0.347    0.123    0.023    0.006
Adjusted Inertia (%)    66.152   23.482    4.456    1.118
Cumulative %            66.152   89.634   94.090   95.208

Total inertia in MCA of the indicator matrix Z:

\[ \text{total inertia}(Z) = \frac{J - Q}{Q} = \frac{16 - 4}{4} = 3 \]
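The total inertia of (J - Q)/Q = 3 and the count of twelve dimensions can be reproduced on any indicator matrix of the same shape. The sketch below uses synthetic random responses (not the German samples), so the individual eigenvalues will differ from the XLSTAT output; only the total and the number of non-trivial dimensions are expected to agree.

```python
import numpy as np

# Synthetic data: N respondents, Q questions, Jq categories each (0-based codes).
rng = np.random.default_rng(1)
N, Q, Jq = 500, 4, 4
codes = rng.integers(0, Jq, size=(N, Q))
Z = np.hstack([np.eye(Jq)[codes[:, q]] for q in range(Q)])
J = Z.shape[1]

# CA of the indicator matrix via the SVD of the standardized residuals.
P = Z / Z.sum()
r = P.sum(axis=1)
c = P.sum(axis=0)
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
eigenvalues = np.linalg.svd(S, compute_uv=False) ** 2

total = eigenvalues.sum()
n_dimensions = int((eigenvalues > 1e-10).sum())
percent = 100 * eigenvalues / total

print("total inertia:", round(total, 6), "expected:", (J - Q) / Q)
print("non-trivial dimensions:", n_dimensions)
```

Because the total is fixed at 3 regardless of the data, the "Inertia (%)" row of the output is simply each eigenvalue divided by 3, for example 0.692 / 3 for F1.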

Multiple correspondence analysis (MCA)
Women in the workplace: 4 questions
Burt matrix

[Table: 16 × 16 Burt matrix with rows and columns 1W, 1w, 1H, 1?, 2W, ..., 4?. Each diagonal block is a diagonal matrix holding one question's marginal frequencies: Q1 = 2500, 476, 79, 360; Q2 = 181, 1299, 1645, 290; Q3 = 379, 2083, 642, 311; Q4 = 1959, 896, 97, 463 (each set summing to N = 3415). The first row (1W) reads 172, 1107, 1130, 91 against Q2; 355, 1709, 345, 91 against Q3; and 1766, 537, 40, 157 against Q4.]

The Burt matrix is the stacked matrix of all two-way contingency tables, including each variable with itself.
If Z (N × J) is the indicator matrix, then the Burt matrix B (J × J) is $B = Z^{\mathsf{T}} Z$.
An alternative definition of MCA is that it is the CA of the Burt matrix.
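The link between the two definitions of MCA can be illustrated numerically: the principal inertias from the CA of the Burt matrix are the squares of those from the CA of the indicator matrix. The sketch below uses synthetic random responses, and the helper `principal_inertias` is an illustrative name, not an XLSTAT function.

```python
import numpy as np

def principal_inertias(table):
    """Principal inertias of a simple CA (squared singular values of the standardized residuals)."""
    P = table / table.sum()
    r = P.sum(axis=1)
    c = P.sum(axis=0)
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    return np.linalg.svd(S, compute_uv=False) ** 2

# Synthetic indicator matrix: 500 respondents, 4 questions, 4 categories each.
rng = np.random.default_rng(2)
N, Q, Jq = 500, 4, 4
codes = rng.integers(0, Jq, size=(N, Q))
Z = np.hstack([np.eye(Jq)[codes[:, q]] for q in range(Q)])

B = Z.T @ Z                      # Burt matrix, J x J
first_block = B[:Jq, :Jq]        # question 1 crossed with itself

lam_Z = principal_inertias(Z)    # eigenvalues from the indicator matrix
lam_B = principal_inertias(B)    # eigenvalues from the Burt matrix

print(np.round(lam_Z[:3], 4))
print(np.round(lam_B[:3], 4))
```

With the eigenvalues quoted in the XLSTAT output, the same relation gives 0.692² ≈ 0.479 and 0.513² ≈ 0.263, the principal inertias reported for the Burt version of the map.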

MCA (Burt matrix version)
Women in the workplace: 4 questions
[Figure: MCA map of the 16 response categories (1W to 4?). First axis: 0.479 (41.9%); second axis: 0.263 (23.0%); 64.9% of inertia explained (only 40.2% if the indicator matrix is analysed).]
Results are the same for the Burt matrix; only the principal inertias change.
Relationships amongst (within) the set of questions are displayed jointly.
Missing value categories have a strong association.

Multiple correspondence analysis (MCA)
Women in the workplace: 4 questions
Burt matrix: inertias of each subtable

[Table: the 16 × 16 Burt matrix again, with the inertia of each 4 × 4 subtable overlaid. Each of the four diagonal subtables has inertia 3.000; the six distinct off-diagonal subtables (each appearing twice by symmetry) have inertias 0.363, 0.644, 0.892, 0.424, 0.345 and 0.480.]

The percentage of variance explained is actually much higher: in MCA the overall inertia is inflated by the diagonal tables in the Burt matrix, and the percentage is actually about 90%.
The total inertia of the Burt matrix is the average of the inertias of its submatrices = 1.143.
Since the diagonal inertias are so high, they inflate the average, hence the low percentages.
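The arithmetic behind these statements can be retraced with the subtable inertias read from the slide (3.000 for each diagonal subtable, and 0.363, 0.644, 0.892, 0.424, 0.345, 0.480 for the six distinct off-diagonal ones). This is only a worked check; the adjusted inertias 0.347 and 0.123 are taken from the XLSTAT output, and the variable names are illustrative.

```python
Q = 4
diagonal_inertia = 3.000                                    # each of the Q diagonal subtables
off_diagonal = [0.363, 0.644, 0.892, 0.424, 0.345, 0.480]   # the Q(Q-1)/2 distinct pairs

# The Burt matrix has Q*Q subtables; each off-diagonal value occurs twice.
total_burt = (Q * diagonal_inertia + 2 * sum(off_diagonal)) / Q**2

# Averaging over the off-diagonal subtables only removes the inflation.
average_off_diagonal = sum(off_diagonal) / len(off_diagonal)

# Share of the off-diagonal inertia carried by the first two adjusted inertias.
two_dim_share = (0.347 + 0.123) / average_off_diagonal

print(round(total_burt, 3), round(average_off_diagonal, 3), round(two_dim_share, 3))
```

The diagonal subtables contribute 12 of the roughly 18.3 units in the numerator, which is why percentages computed against the full Burt inertia look so low.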

Adjustment of principal inertias (eigenvalues)
We can rescale an existing MCA solution in order to best fit the off-diagonal tables. All we need is the total inertia of the Burt matrix, inertia(B), and the principal inertias $\lambda_k^2$ of the Burt matrix in the solution space.
If we have computed the solution on the indicator matrix Z (as in the MCA module of XLSTAT), the eigenvalues calculated are $\lambda_k$, so the squares of all the principal inertias of Z need to be summed in order to get inertia(B). If you have analysed the Burt matrix B, inertia(B) is the total inertia.
Here are the steps to rescale the solution:

1. Calculate the average off-diagonal inertia:

\[ \text{average off-diagonal inertia} = \frac{Q}{Q-1}\left(\text{inertia}(B) - \frac{J-Q}{Q^2}\right) \]

2. Calculate the adjusted principal inertias:

\[ \text{adjusted principal inertias} = \left(\frac{Q}{Q-1}\right)^2\left(\lambda_k - \frac{1}{Q}\right)^2 \quad \text{only for } \lambda_k > \frac{1}{Q} \]

3. Calculate the adjusted percentages of inertia:

\[ \text{adjusted percentages of inertia} = \frac{\text{adjusted principal inertias}}{\text{average off-diagonal inertia}} \]
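The three steps can be followed with the numbers quoted in these slides. The sketch below assumes the eigenvalues are those of the indicator matrix (the first five of twelve from the XLSTAT output) and takes inertia(B) = 1.143 as stated for the Burt matrix, since the remaining eigenvalues needed to sum the squares are not listed. Because the inputs are rounded to three decimals, the results agree with the XLSTAT "Adjusted Inertia" rows only approximately.

```python
Q, J = 4, 16
eigenvalues_Z = [0.692, 0.513, 0.365, 0.307, 0.218]   # lambda_k from the XLSTAT output
inertia_B = 1.143                                     # total inertia of the Burt matrix

# Step 1: average off-diagonal inertia.
average_off_diagonal = Q / (Q - 1) * (inertia_B - (J - Q) / Q**2)

# Step 2: adjusted principal inertias, only for eigenvalues above 1/Q.
adjusted = [
    (Q / (Q - 1)) ** 2 * (lam - 1 / Q) ** 2
    for lam in eigenvalues_Z
    if lam > 1 / Q
]

# Step 3: adjusted percentages of inertia.
adjusted_percent = [100 * a / average_off_diagonal for a in adjusted]

for a, p in zip(adjusted, adjusted_percent):
    print(round(a, 3), round(p, 1))
```

The fifth eigenvalue, 0.218, is below 1/Q = 0.25 and is therefore dropped, which matches the empty adjusted cells for F5 in the output table.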

MCA (adjusted)
Women in the workplace: 4 questions
[Figure: MCA map of the 16 response categories (1W to 4?) with adjusted principal inertias. First axis: 0.347 (66.2%); second axis: 0.123 (23.5%); 89.7% of inertia explained.]

MCA (Burt matrix version)
Women in the workplace: 4 questions
[Figure: MCA map of the 16 response categories (1W to 4?). First axis: 0.479 (41.9%); second axis: 0.263 (23.0%); 64.9% of inertia explained.]

MCA
Women in the workplace: supplementary demographic groups
[Figure: MCA map with demographic categories added as supplementary points (labels DE, DW, F, A1 to A6, E1 to E6, E*, si, ma, se, di, wi); both axes run from -0.5 to 0.5.]

Related topics
1. Subset correspondence analysis
restricting analysis to a subset of categories (e.g. all
substantive responses excluding missing categories, or
missing categories by themselves, or middle categories)
2. Square asymmetric tables
mobility tables, brand-switching, migration...
3. Recoding of data before applying CA
ratings, preferences, paired comparisons, continuous-scale
data (ratio and interval)
4. Stability and inference
concentration ellipses, convex hulls, permutation tests
5. Canonical correspondence analysis (CCA)
CA with explanatory variables (a combination of dimension
reduction and regression)

Subset correspondence analysis

For example, we can analyse the women working data while ignoring the missing values. This is NOT just a CA of the table without the missing value columns: the masses and metric of the complete matrix are maintained.
In XLSTAT's MCA program you are given a menu for selecting which categories you want to retain or omit.
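The idea of keeping the masses and metric of the complete matrix can be sketched on a simple two-way table. This is an illustration under stated assumptions, not XLSTAT's MCA routine: it uses six rows of the Q3 country table from earlier in these slides, standardizes with the margins of the full table, and then decomposes only the selected columns. The function name `subset_inertias` is hypothetical.

```python
import numpy as np

# Six countries from the Q3 table; columns are W, w, H, ?.
counts = np.array([
    [256, 1156, 176, 191],   # AUS
    [101, 1394, 581, 248],   # DW
    [278,  691,  62,  66],   # DE
    [161,  646,  70, 107],   # GB
    [126,  394,  75,  52],   # NIRL
    [482,  686, 107, 172],   # USA
], dtype=float)

# Masses and standardized residuals are computed from the COMPLETE table.
P = counts / counts.sum()
r = P.sum(axis=1)
c = P.sum(axis=0)
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))

def subset_inertias(columns):
    """Principal inertias of the subset analysis restricted to the given columns."""
    return np.linalg.svd(S[:, columns], compute_uv=False) ** 2

substantive = subset_inertias([0, 1, 2])   # W, w, H
missing = subset_inertias([3])             # the ? column by itself
total = float((S ** 2).sum())              # total inertia of the full table

print(round(substantive.sum(), 4), round(missing.sum(), 4), round(total, 4))
```

Because the standardization is never recomputed, the inertia of a subset and the inertia of its complement add up to the total inertia of the full table, which would not hold if the missing column were simply deleted before running CA.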

Subset correspondence analysis
[Figure: subset MCA map of the substantive response categories (1W to 4H, missing categories omitted). First axis: 0.1240 (70.0%); second axis: 0.0241 (13.5%).]

Canonical correspondence analysis (CCA)

This has the same objective as CA but restricts the CA solution to be (linearly) related to external predictor variables. For example, we may want to find the best low-dimensional view of the responses that is related to age (either the age group or the original age variable).
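One common way to impose this restriction is to project the standardized residuals onto the space spanned by the predictors before taking the SVD. The sketch below follows that projection formulation on synthetic data (random responses and a random six-level age group); it is an assumption about how the restriction can be computed, not a description of the software used for the map that follows.

```python
import numpy as np

# Synthetic data: 600 respondents, 4 questions with 4 categories, 6 age groups.
rng = np.random.default_rng(3)
N, Q, Jq, n_groups = 600, 4, 4, 6
codes = rng.integers(0, Jq, size=(N, Q))
Z = np.hstack([np.eye(Jq)[codes[:, q]] for q in range(Q)])
age_group = rng.integers(0, n_groups, size=N)
X = np.eye(n_groups)[age_group]            # dummy-coded predictor

# Standardized residuals of the response matrix, as in ordinary CA.
P = Z / Z.sum()
r = P.sum(axis=1)
c = P.sum(axis=0)
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))

# Centre the predictors with the row masses and weight rows by sqrt(mass).
Xw = np.sqrt(r)[:, None] * (X - r @ X)

# Orthonormal basis of the predictor space (centred dummies lose one dimension).
U, s, _ = np.linalg.svd(Xw, full_matrices=False)
basis = U[:, s > 1e-10 * s.max()]

# Project the residuals onto the predictor space, then decompose as in CA.
S_constrained = basis @ (basis.T @ S)
constrained = np.linalg.svd(S_constrained, compute_uv=False) ** 2
total = float((S ** 2).sum())

print("constrained inertia:", round(constrained.sum(), 4), "of total", round(total, 4))
```

The constrained inertia can never exceed the total inertia, and with a six-level grouping variable at most five constrained dimensions exist.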

Canonical correspondence analysis (restricted to age group differences)
[Figure: CCA map showing the age groups agegp-1 to agegp-6 and the response categories Q1-1 to Q4-4. Axis labels: 0.685 (63.5%) and 0.465 (18.4%).]
