
RESEARCH AND SURVEY STATISTICS – STA3022F

SOLUTION TO TUTORIAL #1
Week 2 2008

CORRESPONDENCE ANALYSIS

QUESTION 1

Q1.1

H0: There is no association between car type and features


H1: There is a significant association between car type and features

The Chi-square statistic can be found from the ‘Eigenvalues and Inertia for all Dimensions’ table…

Chi2 = Chi2 on dimension 1 + Chi2 on dimension 2 = 36.0195 + 23.4993 = 59.5188


p-level = 0.0000

Therefore reject H0 at the 5% level and conclude that there is an association between car type and
features
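
The reported p-level can be checked by hand. The following is a minimal sketch, not STATISTICA output: the function name chi2_sf_even_df is our own, and it relies on the closed form for the upper-tail chi-square probability that holds only when the degrees of freedom are even (df = 6 here).

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with even df.

    Uses the closed form exp(-x/2) * sum_{i < df/2} (x/2)^i / i!,
    which is valid only when df is a positive even integer.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    terms = sum(half ** i / math.factorial(i) for i in range(df // 2))
    return math.exp(-half) * terms

# Chi-square contributions of the two dimensions, taken from the table.
chi2_total = 36.0195 + 23.4993
p_value = chi2_sf_even_df(chi2_total, df=6)

# The p-value is far below 0.05, so H0 is rejected at the 5% level.
print(round(chi2_total, 4), p_value < 0.05)
```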

Q1.2

Average column profile

68/143 = 0.476
26/143 = 0.182
49/143 = 0.343

Average row profile

24/143 = 0.168 53/143 = 0.371 39/143 = 0.273 27/143 = 0.189

You can also get this from the ‘Mass’ columns of the ‘Row Coordinates’ and ‘Column
Coordinates’ tables, remembering Row Mass = Average column profile, and Column Mass =
Average row profile.
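
As a small sketch of the calculation above, the masses are just the marginal totals divided by the grand total. The helper name masses is illustrative only; the totals are those used in the fractions above, and results are rounded to three decimals.

```python
def masses(totals):
    """Divide each marginal total by the grand total to get the masses."""
    grand_total = sum(totals)
    return [t / grand_total for t in totals]

row_totals = [68, 26, 49]          # one total per car type (3 rows)
column_totals = [24, 53, 39, 27]   # one total per feature (4 columns)

row_mass = masses(row_totals)        # equals the average column profile
column_mass = masses(column_totals)  # equals the average row profile

print([round(m, 3) for m in row_mass])
print([round(m, 3) for m in column_mass])
```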

Q1.3

Eigenvalues and Inertia for all Dimensions (Corres11 - Car Brands Attribute Association Study)
Input Table (Rows x Columns): 3 x 4
Total Inertia=.41622 Chi²=59.519 df=6 p=0.0000

Number of Dims.   Singular Values   Eigenvalues   Perc. of Inertia   Cumulatv Percent   Chi Squares
1                 0.501881          0.251885      60.51784           60.5178            36.01952
2                 0.405377          0.164331      39.48216           100.0000           23.49932

Eigenvalue(dimension 2) = Total inertia – Eigenvalue(dimension 1)

= 0.41622 – 0.251885 = 0.164335 (the table gives 0.164331; the small difference is rounding in the total inertia)

Perc of Inertia(dimension 1) = 0.251885/0.41622 = 60.52%

Perc of Inertia(dimension 2) = 0.164331/0.41622 = 39.48%

Attach most importance to dimension 1 when interpreting results, but also look at
dimension 2, since it contributes a substantial share (39.5%) of the explained variation.
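
The columns of the eigenvalue table are linked: each eigenvalue is the square of its singular value, and each chi-square is the eigenvalue multiplied by the sample size. The sketch below recomputes them, assuming n = 143 (the grand total used in Q1.2); the variable names are our own.

```python
singular_values = [0.501881, 0.405377]  # from the eigenvalue table
n = 143                                  # grand total of the contingency table

eigenvalues = [s ** 2 for s in singular_values]
total_inertia = sum(eigenvalues)
percent_of_inertia = [100 * ev / total_inertia for ev in eigenvalues]
chi_squares = [ev * n for ev in eigenvalues]

# One line per dimension: eigenvalue, % of inertia, chi-square contribution.
rows = zip(eigenvalues, percent_of_inertia, chi_squares)
for dim, (ev, pct, chi2) in enumerate(rows, start=1):
    print(dim, round(ev, 6), round(pct, 2), round(chi2, 2))
```

Small differences from the printed table in the last decimal places come from the singular values being rounded to six decimals.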

Q1.4

Use coordinates listed in ‘Row Coordinates’ and ‘Column Coordinates’ tables.

[Figure: 2D Plot of Row and Column Coordinates; Dimension: 1 x 2. Input Table (Rows x Columns): 3 x 4. Standardization: Row and column profiles. Horizontal axis: Dimension 1; Eigenvalue: .25188 (60.52% of Inertia). Vertical axis: Dimension 2; Eigenvalue: .16433 (39.48% of Inertia). Row coordinates plotted: BALLADE, LASER, NISSAN. Column coordinates plotted: STYLE, PRICE, ECONOMY, HANDLING.]

Q1.5

Considering dimension 1 only: (60.5% of inertia)


Since a number of car types and attributes are positioned far from the origin, their respective
row/col profiles differ significantly from the average row/col profile.
Ballade is strongly associated with Style, and Laser with Economy and Price. Nissan is moderately
associated with Handling.

On dimension 2 (39.5% of inertia), there are similar interpretations. Ballade is also associated with
Style, and Nissan with Handling. Since Laser falls between -0.2 and +0.2, it cannot be
associated with any features on dimension 2. However, it remains fair to say that Laser is
associated with both Economy and Price, because those observations are valid on the more
powerful dimension 1.

QUESTION 2

Q2.1

H0: There is no association between preferred hotel and age group


H1: There is a significant association between preferred hotel and age group

The Chi-square statistic can be found from the ‘Eigenvalues and Inertia for all Dimensions’ table…

Chi2 = Chi2 on dimension 1 + Chi2 on dimension 2 + Chi2 on dimension 3


= 55.4506 + 9.9929 + 2.9294 = 68.373
p-level = 0.0000

Therefore reject H0 at the 5% level and conclude that there is an association between preferred
hotel and age group

Q2.2

Average column profile

154/720 = 0.2139
203/720 = 0.2819
232/720 = 0.3222
131/720 = 0.1819

Average row profile

211/720 = 0.2931 140/720 = 0.1944 119/720 = 0.1653 101/720 = 0.1403 149/720 = 0.2069

You can also get this from the ‘Mass’ columns of the ‘Row Coordinates’ and ‘Column
Coordinates’ tables, remembering Row Mass = Average column profile, and Column Mass =
Average row profile.

Q2.3

Eigenvalues and Inertia for all Dimensions (Corres12 - Hotel Guests Profiling Study)
Input Table (Rows x Columns): 4 x 5
Total Inertia=.09496 Chi²=68.373 df=12 p=0.0000

Number of Dims.   Singular Values   Eigenvalues   Perc. of Inertia   Cumulatv Percent   Chi Squares
1                 0.277515          0.077015      81.10008           81.1001            55.45060
2                 0.117810          0.013879      14.61537           95.7154            9.99297
3                 0.063787          0.004069      4.28455            100.0000           2.92948

Eigenvalue(dimension 1) = Total inertia – Eigenvalue(dimension 2) – Eigenvalue(dimension 3)

= 0.09496 – 0.013879 – 0.004069 = 0.077012 (the table gives 0.077015; the small difference is rounding in the total inertia)

Perc of Inertia(dimension 1) = 0.077015/0.09496 = 81.10%

Attach most importance to dimension 1 when interpreting results. Dimension 2 can be ignored or given
very little attention, since it contributes much less to the explained variation (14.6%). Ignore
dimension 3; it is insignificant (4.3%).
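
One way to make the "how many dimensions to keep" judgement explicit is to accumulate the percentages of inertia until a chosen cut-off is reached. This is only a sketch: the function name dims_needed and the 80% cut-off are illustrative assumptions, not a rule from the course notes.

```python
def dims_needed(eigenvalues, threshold_percent):
    """Return the smallest number of leading dimensions whose cumulative
    percentage of inertia reaches threshold_percent."""
    total = sum(eigenvalues)
    cumulative = 0.0
    for count, ev in enumerate(eigenvalues, start=1):
        cumulative += 100 * ev / total
        if cumulative >= threshold_percent:
            return count
    # Fall back to all dimensions (guards against floating-point shortfall).
    return len(eigenvalues)

hotel_eigenvalues = [0.077015, 0.013879, 0.004069]  # from the Q2.3 table

# 80 is a hypothetical cut-off chosen only for illustration.
print(dims_needed(hotel_eigenvalues, 80))
```

With a stricter cut-off such as 95%, the same function would ask for two dimensions, which matches the cumulative percentages in the table.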

Q2.4

Use coordinates listed in ‘Row Coordinates’ and ‘Column Coordinates’ tables.

[Figure: 2D Plot of Row and Column Coordinates; Dimension: 1 x 2. Input Table (Rows x Columns): 4 x 5. Standardization: Row and column profiles. Horizontal axis: Dimension 1; Eigenvalue: .07701 (81.10% of Inertia). Vertical axis: Dimension 2; Eigenvalue: .01388 (14.62% of Inertia). Row coordinates plotted: CITY, HOL INN, SUN, PROTEA. Column coordinates plotted: <25, 25-35, 35-45, 45-55, >55.]

Q2.5

Considering dimension 1 only: (81.1% of inertia)


Only some row and column variable levels are ‘far’ from the origin on dimension 1. Generally,
there is a moderate association between 45-55 year olds and over 55 year olds and Holiday Inn,
and a moderate association between 25-35 year olds and City Lodge. There is a weak association
between under 25’s and City Lodge. There are no strong demographic features associated with
Southern Sun or Protea Hotels.

QUESTION 3

1. % of inertia for DIM 1 = 67.26 – (17.61+24.00) = 25.65

Chi-squared on DIM 1 = (25.65/17.61)*20040.67 = 29190.4


Chi-squared on DIM 2 = (24.00/17.61)*20040.67 = 27312.6

Total Chi-squared = 113788.88

H0: There is no association between language spoken and province lived in


H1: There is an association

Test chi-squared = 113788.88


D.o.F. = (r-1)(c-1) = (12-1)(9-1) = 88
Critical chi-squared = 110.90 (5% level, approximately)

Since Test > Crit, reject H0 and conclude there is a significant association.

2. A two-dimensional plot explains 49.6% of the variation, which is fairly poor.


One advantage: easy to interpret and plot on a page
One disadvantage: won’t show all relationships accurately

3. Limpopo & Tshivenda


Xhosa & Eastern Cape
Zulu & KZN

Or: Limpopo & Sepedi


Or: Limpopo & Xitsonga

4. English: shows no strong associations with any province and lies close to 0.2 on DIM 1
Sesotho: falls very close to zero on DIM 1 and has a mild association with Free State and
Western Cape on DIM 2
Might also say Afrikaans

5. Tshivenda is popular in Limpopo but is not widely spoken elsewhere. It is the 2nd least
spoken language and thus may appear to be under threat elsewhere.

Setswana is spoken by over 3.5 million people and is by far the dominant language in the
North West province. It is not under any threat.
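
The proportional scaling used in part 1 can be sketched as follows. Because each dimension's chi-squared is proportional to its percentage of inertia, a dimension with a known chi-squared serves as a reference. The function name chi2_from_reference is our own; the figures are the ones quoted in part 1.

```python
def chi2_from_reference(pct_target, pct_reference, chi2_reference):
    """Scale a known per-dimension chi-squared by the ratio of inertia percentages."""
    return pct_target / pct_reference * chi2_reference

# Reference dimension from part 1: 17.61% of inertia, chi-squared 20040.67.
pct_reference = 17.61
chi2_reference = 20040.67

pct_dim1 = 67.26 - (17.61 + 24.00)  # percentage of inertia on DIM 1
chi2_dim1 = chi2_from_reference(pct_dim1, pct_reference, chi2_reference)
chi2_dim2 = chi2_from_reference(24.00, pct_reference, chi2_reference)

print(round(pct_dim1, 2), round(chi2_dim1, 1), round(chi2_dim2, 1))
```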

QUESTION 4

1. H0: There is no association between risk type and preferred investment


H1: There is an association

Total chi-squared = Total inertia * n = 0.08815 * 340 = 29.971


Critical chi-squared = 12.59 (5% level, d.o.f. = (4-1)(3-1) = 6)
Since Test > Critical, reject H0 and conclude there is an association.

2. See figure below

[Figure: 2D Plot of Row and Column Coordinates; Dimension: 1 x 2. Input Table (Rows x Columns): 4 x 3. Standardization: Row and column profiles. Horizontal axis: Dimension 1; Eigenvalue: .08184 (92.84% of Inertia). Vertical axis: Dimension 2; Eigenvalue: .00632 (7.164% of Inertia). Row coordinates plotted: Unit trusts, Bonds, Fixed deposits, Options and futures. Column coordinates plotted: Risk lover, Risk averse, Risk neutral.]

3. Options & Risk lover


Fixed deposits & Risk averse
Possibly: Unit trusts & Neutral (but very weak)

4. Principal components analysis.


Factor analysis.
1) Each dimension/PC is expressed as a linear combination of the underlying variables
2) PC1 attempts to explain the maximum amount of variation (have the maximum
variance)
3) Subsequent PC’s are constructed so as to explain the maximum amount of left-over
variation and be unrelated/uncorrelated/orthogonal to PC1 (and any PC’s that came
before).

5. The ability of the map to accurately portray the point. They are all equal to one because you
only need 2 dimensions (= MIN(r-1,c-1) = MIN(3,2) = 2) to perfectly display all
relationships in the data.

6. Fixed deposits. Has largest mass (0.547)
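
The dimension count quoted in part 5 follows directly from the table size. A minimal sketch, with the helper name max_dimensions chosen for illustration:

```python
def max_dimensions(rows, columns):
    """Maximum number of dimensions in a correspondence analysis of an r x c table."""
    return min(rows - 1, columns - 1)

# Question 4 uses a 4 x 3 table, so two dimensions display it perfectly.
print(max_dimensions(4, 3))
```

The same rule gives two dimensions for the 3 x 4 table in Question 1 and three for the 4 x 5 table in Question 2, which matches the number of rows in each eigenvalue table.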
