
CONTINGENCY TABLES AND CHI-SQUARED TESTS

In this type of analysis we have two characteristics, such as gender and eye colour, which cannot be measured but which can be used to group people by variations within them. These characteristics may, or may not, be associated in some way. How can we decide? We can take a random sample from the population, note which variation of each characteristic is appropriate for each case and then cross-tabulate the data. The data are then analysed in order to see if the proportions of each characteristic in the sub-samples are the same as the overall proportions (easier to do than to describe!). As an example, if there is no relationship between gender and eye colour we would expect similar proportions of males and females to have blue eyes.

The Variables

If the variables are, as is usually the case, nominal (described by name only), frequencies may be cross-tabulated by each category within each variable. Ordinal variables may be used if there are only a limited number of orders so that each one can be classified as a separate category. Continuous variables may be grouped and then tabulated similarly, though the results will then vary according to the grouping categories.

Contingency Tables (Cross-Tabs)

You have met this type of table before as a contingency table when calculating probabilities. As a reminder: cases are allotted to categories and their frequencies cross-tabulated; e.g. in the gender/eye colour example there might be blue eyed males, blue eyed females, brown eyed males, brown eyed females, etc. These tables are known as contingency tables. All possible 'contingencies' are included in the 'cells', which are themselves mutually exclusive. The table is completed by calculating the 'row totals', the 'column totals' and the 'grand total'.

Expected Values

If the two variables, the characteristics under scrutiny, are completely independent, the proportions within the sub-totals of the contingency table would be expected to be the same as those of the totals for each variable. In practice we work with frequencies rather than proportions, distinguishing between 'observed' and 'expected' frequencies by enclosing the latter within brackets.

If gender and eye colour are independent and if a third of the population has blue eyes, we would expect a third of males to be blue eyed and a third of females to be blue eyed. These proportions are obviously contrived so as to be easy to work with. How can we cope with more awkward numbers? In the on-going example, the proportions are first calculated as fractions which are then multiplied by the total frequency to find the expected individual cell frequencies. This produces a formula which is applicable in all cases. For any cell the expected frequency is calculated by:

\[ E = \frac{\text{Row total} \times \text{Column total}}{\text{Overall total}} \]

where the relevant row and column are those crossing in that particular cell.
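As a rough illustration of the expected-frequency formula, the sketch below applies it to the blue-eyes case. The sample sizes (90 males, 60 females, 50 blue-eyed people) are hypothetical figures chosen so that a third of the sample is blue eyed; they do not come from the handout.

```python
def expected_frequency(row_total, column_total, overall_total):
    """Expected cell frequency if the two variables are independent."""
    return row_total * column_total / overall_total


# Hypothetical sample (not from the handout): 150 people, a third blue eyed.
males, females, blue_eyed = 90, 60, 50
overall = males + females

expected_blue_males = expected_frequency(males, blue_eyed, overall)
expected_blue_females = expected_frequency(females, blue_eyed, overall)

# Each expected count is a third of the corresponding gender total.
print(expected_blue_males, expected_blue_females)  # 30.0 20.0
```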

Chi-squared (χ²) Test for Independence

The hypothesis test which is carried out in order to see if there is any association between categorical variables, such as gender and eye colour, is known as the Chi-squared (χ²) test.

Example 1

The following table, compiled by a personnel manager, relates to a random sample of 180 staff taken from the whole workforce of the supermarket chain. We shall, in this example, test for association between a member of staff's gender and his/her type of job, at the 5% level of significance. Completing the row and column totals, as previously with probability, gives the full table.

                 Male   Female   Total
Supervisor         20       15      35
Shelf stacker      20       30      50
Till operator      10       35      45
Cleaner            10       40      50
Total              60      120     180

In this example we randomly selected a sample of 180 supermarket staff and found that two thirds, 120, of them were female and one third, 60, were male. Assuming there is no association between gender and job category and finding that we have 45 till operators, we would expect two thirds, 30, of them to be female and one third, 15, of them to be male. Note that these figures are a quarter of each gender respectively, which checks since till operators, 45, form a quarter of the total staff, 180.

We now calculate the other expected frequencies from the probabilities and put them into the table (see Section 4.8):

\[ P(\text{Supervisor}) = \frac{35}{180}, \qquad P(\text{Male}) = \frac{60}{180} \]

Assuming independence:

\[ P(\text{Supervisor and male}) = \frac{35}{180} \times \frac{60}{180} \]

Therefore the expected number of male supervisors is

\[ \frac{35}{180} \times \frac{60}{180} \times 180 = 11.67 \]

This is the expected frequency for members of staff who are both male and a supervisor. Note that it is a theoretical number which does not have to be an integer.

This simplifies to:

\[ \frac{\text{Row total} \times \text{Column total}}{\text{Overall total}} = \frac{35 \times 60}{180} = 11.67 \]

Calculating the other expected frequencies and inserting them in the table (in brackets):

                 Male          Female        Total
Supervisor       20 (11.67)    15 (23.33)       35
Shelf stacker    20 (16.67)    30 (33.33)       50
Till operator    10 (15.00)    35 (30.00)       45
Cleaner          10 (16.67)    40 (33.33)       50
Total            60           120              180
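The whole table of expected frequencies can be produced in one pass from the row and column totals. The following is a minimal sketch using the observed frequencies of Example 1; the variable names are illustrative only.

```python
# Observed frequencies from Example 1: rows are jobs, columns are (male, female).
observed = {
    "Supervisor":    (20, 15),
    "Shelf stacker": (20, 30),
    "Till operator": (10, 35),
    "Cleaner":       (10, 40),
}

row_totals = {job: sum(counts) for job, counts in observed.items()}
column_totals = [sum(col) for col in zip(*observed.values())]  # male, female
overall_total = sum(column_totals)

# Expected frequency = row total * column total / overall total,
# rounded to two decimal places as in the hand calculation.
expected = {
    job: tuple(round(row_totals[job] * c / overall_total, 2) for c in column_totals)
    for job in observed
}

for job, values in expected.items():
    print(job, values)
```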

These are the frequencies which would be expected if there is no association between gender and job category at the supermarket. If the observed frequencies matched the expected frequencies exactly, the data would be entirely consistent with the two variables being independent. We would obviously not expect to get exact agreement with the expected frequencies, so some critical amount of difference is allowed and we compare the difference from our observations with that allowed by the use of a standard table. Are the values observed so different to those expected that we must reject the idea of independence? Or are the results just due to sampling errors, with the variables actually being independent? It is to be hoped that you recognise the need for a hypothesis test!

The chi-squared (χ²) Hypothesis test

To find the answer, we analyse the data and compare the result to a standard table figure. We carry out a formal hypothesis test at 5% significance: the chi-squared test.

1) State Null Hypothesis, H0 (that of no association), and Alternative Hypothesis, H1.
2) Record observed frequencies, O, in each cell of the contingency table.
3) Calculate row, column and grand totals.
4) Calculate expected frequency, E, for each cell:

   \[ E = \frac{\text{row total} \times \text{column total}}{\text{grand total}} \]

   Note that no expected frequency should be less than 1 and the number of expected frequencies below 5 should not be over 20% of the total number of cells. Otherwise the test is invalid.
5) Find critical value from chi-square table, as appended, with (r - 1)(c - 1) degrees of freedom, where r and c are the number of rows and columns respectively.
6) Calculate test statistic:

   \[ \chi^2 = \sum \frac{(O - E)^2}{E} \]

7) Compare the two values and conclude whether the variables are independent or not.
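Steps 3 to 7 can be sketched as a small function. This is an illustration only, not a replacement for the hand calculation: the function names are hypothetical, the 2 × 2 table in the usage lines is made up for the purpose, and the critical value 3.841 is the 5%, 1 degree of freedom entry of the appended table.

```python
def chi_squared_independence(observed):
    """Return (test statistic, degrees of freedom, validity flag).

    `observed` is a list of rows of observed frequencies, without totals.
    """
    # Step 3: row, column and grand totals.
    row_totals = [sum(row) for row in observed]
    column_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    # Step 4: expected frequency for each cell.
    expected = [[r * c / grand_total for c in column_totals] for r in row_totals]

    # Validity rule: no E below 1, and at most 20% of cells with E below 5.
    cells = [e for row in expected for e in row]
    small = sum(1 for e in cells if e < 5)
    valid = min(cells) >= 1 and small <= 0.2 * len(cells)

    # Step 6: sum of (O - E)^2 / E over all cells.
    statistic = sum(
        (o - e) ** 2 / e
        for o_row, e_row in zip(observed, expected)
        for o, e in zip(o_row, e_row)
    )
    # Step 5 needs the degrees of freedom to look up the critical value.
    df = (len(row_totals) - 1) * (len(column_totals) - 1)
    return statistic, df, valid


def decide(statistic, critical_value):
    """Step 7: compare the test statistic with the critical value."""
    return "reject H0" if statistic > critical_value else "do not reject H0"


# Hypothetical 2 x 2 table (not from the handout).
statistic, df, valid = chi_squared_independence([[30, 20], [20, 30]])
print(statistic, df, valid, decide(statistic, 3.841))
```

The critical value is still read from the table by hand here; the function only returns the degrees of freedom needed to find the right row.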

In Example 1, we have already carried out steps 2, 3 and 4 of the procedure by calculating the expected values. Whether these are calculated before, or during, the test is up to personal preference. Some statisticians also prefer to calculate the test statistic, as this procedure is rather lengthy, before starting the test and to then insert the calculated value in the formal hypothesis test.

The test statistic is an overall measure of the difference between the expected and observed frequencies. Each cell difference is squared, so that positive and negative differences have the same weighting, and proportioned by the size of the expected cell contents. When the contributions from each cell are totalled their sum is compared with a critical value from the chi-squared table, hence the name of this test.

Null Hypothesis (H0): There is no association between gender and job category. (Remember that 'null' means none.)

Alternative Hypothesis (H1): There is an association between gender and job category.

Critical Value: from the chi-squared table.
Number of degrees of freedom (ν) = (r - 1)(c - 1) = (4 - 1)(2 - 1) = 3 × 1 = 3; the χ² table, as appended, is always one tailed.
Level of significance = 5%

\[ \chi^2_{5\%,\ \nu = 3} = 7.816 \]

Test statistic

The test statistic is calculated from the contingency table which includes both the observed and the expected values for the frequency of staff. The data may be tabulated, as we shall do in this example, or the contribution of each cell may be calculated directly as \( (O - E)^2 / E \) and then the test statistic found as the sum of these contributions:

                 Male          Female        Total
Supervisor       20 (11.67)    15 (23.33)       35
Shelf stacker    20 (16.67)    30 (33.33)       50
Till operator    10 (15.00)    35 (30.00)       45
Cleaner          10 (16.67)    40 (33.33)       50
Total            60           120              180

\[ \text{Test statistic} = \sum \frac{(O - E)^2}{E} \]

   O        E      (O - E)   (O - E)²/E
  20    11.67       8.33        5.946
  15    23.33      -8.33        2.974
  20    16.67       3.33        0.665
  30    33.33      -3.33        0.333
  10    15.00      -5.00        1.667
  35    30.00       5.00        0.833
  10    16.67      -6.67        2.669
  40    33.33       6.67        1.335
  Total                        16.422

Test Statistic: 16.422

Conclusion: Test statistic > Critical value, therefore reject H0. Conclude that there is an association between gender and job category in the supermarket chain. Looking again at the data we can see that far more males than expected were supervisors or shelf stackers and more females were cleaners or till operators.
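The hand calculation works with expected frequencies rounded to two decimal places and contributions rounded to three, so its total differs slightly from an unrounded calculation. The sketch below, which assumes that rounding scheme, reproduces both figures; either way the statistic is well above the critical value of 7.816.

```python
# Example 1, listed cell by cell in the order used in the working table.
observed = [20, 15, 20, 30, 10, 35, 10, 40]
row_totals = [35, 35, 50, 50, 45, 45, 50, 50]
column_totals = [60, 120] * 4
overall_total = 180

exact_e = [r * c / overall_total for r, c in zip(row_totals, column_totals)]
rounded_e = [round(e, 2) for e in exact_e]

# Hand-style working: rounded E, each contribution rounded to 3 d.p.
hand_total = sum(round((o - e) ** 2 / e, 3) for o, e in zip(observed, rounded_e))
# Unrounded working.
exact_total = sum((o - e) ** 2 / e for o, e in zip(observed, exact_e))

print(round(hand_total, 3), round(exact_total, 3))  # 16.422 16.429
```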

Example 2

In this example we first have to set up the contingency table from the following information collected from a questionnaire:

In a recent survey within a Supermarket chain, a random sample of 160 employees: stackers, sales staff and administrators, were asked to grade their attitude towards future wage restraint on the scale: Very favourable; favourable; unfavourable; very unfavourable.

Of the 40 stackers interviewed, 7 gave the response 'favourable', 24 the response 'unfavourable', and 8 the response 'very unfavourable'. There were 56 sales staff and from these, 10 responded 'very unfavourable', 9 responded 'favourable' and 3 responded 'very favourable'. The rest of the sample were administrators. Of these, 16 gave the response 'very favourable' and 2 the response 'very unfavourable'. In the whole survey, exactly half the employees interviewed responded 'unfavourable'.

We first draw up a contingency table showing these results and then test whether attitude towards future wage restraint is dependent on the type of employment.

Setting up the table: in this example there are three types of employee giving four different responses, i.e. we have a 3 × 4 (or a 4 × 3) table. Adding extra rows and columns for the subtotals and titles we need 5 × 6 cells. Have a go at compiling the table. As you come to each number in the frequency of response above insert it into the appropriate place; then find the missing figures by difference. There is sufficient information here to enable you to complete your table. When complete, check with that below before calculating the expected values.

                 V.favourable   Favourable   Unfavourable   V.unfavourable   Total
Stackers
Sales staff
Administrators
Total

The expected values can next be calculated:

\[ E = \frac{\text{Row total} \times \text{Column total}}{\text{Overall total}} \]

and inserted.

                 V.favourable   Favourable   Unfavourable   V.unfavourable   Total
Stackers                    1            7             24                8      40
Sales staff                 3            9             34               10      56
Administrators             16           24             22                2      64
Total                      20           40             80               20     160
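Finding the missing figures 'by difference' can be checked with a few subtractions. The sketch below uses only the numbers given in the survey description; the dictionary layout and names are illustrative.

```python
RESPONSES = ["V.favourable", "Favourable", "Unfavourable", "V.unfavourable"]
sample_size = 160

# Stackers: 40 in total, three responses given, the fourth found by difference.
stackers = {"Favourable": 7, "Unfavourable": 24, "V.unfavourable": 8}
stackers["V.favourable"] = 40 - sum(stackers.values())

# Sales staff: 56 in total, 'unfavourable' found by difference.
sales = {"V.favourable": 3, "Favourable": 9, "V.unfavourable": 10}
sales["Unfavourable"] = 56 - sum(sales.values())

# Administrators are the rest of the sample.
admin_total = sample_size - 40 - 56
admins = {"V.favourable": 16, "V.unfavourable": 2}
# Exactly half of all employees responded 'unfavourable'.
admins["Unfavourable"] = (
    sample_size // 2 - stackers["Unfavourable"] - sales["Unfavourable"]
)
admins["Favourable"] = admin_total - sum(admins.values())

table = {"Stackers": stackers, "Sales staff": sales, "Administrators": admins}
for job, counts in table.items():
    print(job, [counts[r] for r in RESPONSES], sum(counts.values()))
```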

Hypothesis test

Null Hypothesis (H0): There is no association between job category and attitude towards wage restraint.

Alternative Hypothesis (H1): There is an association between job category and attitude towards wage restraint.

Level of Significance: 5% level of significance

Critical value: Number of degrees of freedom (ν) = (r - 1)(c - 1) = ______
Level of significance = 5%
χ² table, 5%, 6 degrees of freedom = 12.59

Test statistic

\[ \text{Test statistic} = \sum \frac{(O - E)^2}{E} \]

(Complete the table.)

   O      E    (O - E)   (O - E)²/E
   1      5      -4         3.200
   7     10      -3         0.900
  24     20      +4         0.800
   8      5      +3         1.800
   3      7      -4         2.286
   9     14      -5         1.786
  34
  10
  16
  24
  22
   2
  Total                    32.969

Test statistic: 32.969

Conclusion: Test statistic > Critical value, therefore reject H0. Conclude that there is an association between job category and attitude towards future wage restraint. The administrators were for it but the others against it.

COMPLETED EXAMPLES FROM LECTURE HANDOUT


Example 2

The survey information is as given in the statement of Example 2 above. Compiling the contingency table, with the missing figures found by difference, gives:

                 V.favourable   Favourable   Unfavourable   V.unfavourable   Total
Stackers                    1            7             24                8      40
Sales staff                 3            9             34               10      56
Administrators             16           24             22                2      64
Total                      20           40             80               20     160

The expected values can next be calculated:

\[ E = \frac{\text{Row total} \times \text{Column total}}{\text{Overall total}} \]

and inserted (in brackets).

                 V.favourable   Favourable   Unfavourable   V.unfavourable   Total
Stackers              1 (5)        7 (10)       24 (20)          8 (5)         40
Sales staff           3 (7)        9 (14)       34 (28)         10 (7)         56
Administrators       16 (8)       24 (16)       22 (32)          2 (8)         64
Total                20           40            80              20            160

Hypothesis test

Null Hypothesis (H0): There is no association between job category and attitude towards wage restraint.

Alternative Hypothesis (H1): There is an association between job category and attitude towards wage restraint.

Level of Significance: 5% level of significance

Critical value: Number of degrees of freedom (ν) = (r - 1)(c - 1) = (3 - 1)(4 - 1) = 2 × 3 = 6
Level of significance = 5%
χ² table, Table 5 in Appendix D, 5%, 6 degrees of freedom = 12.59

Test statistic

\[ \text{Test statistic} = \sum \frac{(O - E)^2}{E} \]

   O      E    (O - E)   (O - E)²/E
   1      5      -4         3.200
   7     10      -3         0.900
  24     20      +4         0.800
   8      5      +3         1.800
   3      7      -4         2.286
   9     14      -5         1.786
  34     28      +6         1.286
  10      7      +3         1.286
  16      8      +8         8.000
  24     16      +8         4.000
  22     32     -10         3.125
   2      8      -6         4.500
  Total                    32.969

Test statistic: 32.969

Conclusion: Test statistic > Critical value, therefore reject H0. Conclude that there is an association between job category and attitude towards future wage restraint. The administrators were for it but the others against it.
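The closing remark, that the administrators were for wage restraint and the others against it, can be read off by comparing observed and expected counts over the two favourable columns. The sketch below does this and recomputes the test statistic without intermediate rounding, which gives 32.968 rather than the 32.969 obtained from the rounded contributions. The name `favourable_excess` is illustrative.

```python
# Observed frequencies from Example 2; columns are
# V.favourable, Favourable, Unfavourable, V.unfavourable.
observed = {
    "Stackers":       [1, 7, 24, 8],
    "Sales staff":    [3, 9, 34, 10],
    "Administrators": [16, 24, 22, 2],
}
column_totals = [sum(col) for col in zip(*observed.values())]
grand_total = sum(column_totals)

statistic = 0.0
favourable_excess = {}
for job, row in observed.items():
    expected = [sum(row) * c / grand_total for c in column_totals]
    statistic += sum((o - e) ** 2 / e for o, e in zip(row, expected))
    # Observed minus expected over the two favourable columns:
    # positive means more favourable responses than independence predicts.
    favourable_excess[job] = sum(row[:2]) - sum(expected[:2])

print(round(statistic, 3))
print(favourable_excess)
```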
Table 5 PERCENTAGE POINTS OF THE χ²-DISTRIBUTION

   ν      10%       5%     2.5%       1%     0.1%
   1    2.706    3.841    5.024    6.635    10.83
   2    4.605    5.991    7.378    9.210    13.82
   3    6.252    7.816    9.351    11.35    16.27
   4    7.780    9.488    11.14    13.28    18.47
   5    9.236    11.07    12.83    15.08    20.51
   6    10.64    12.59    14.45    16.81    22.46
   7    12.02    14.07    16.02    18.49    24.36
   8    13.36    15.51    17.53    20.09    26.13
   9    14.68    16.92    19.02    21.67    27.89
  10    15.99    18.31    20.48    23.21    29.59
  11    17.28    19.68    21.92    24.72    31.26
  12    18.55    21.03    23.34    26.22    32.91
  13    19.81    22.36    24.74    27.69    34.51
  14    21.06    23.68    26.12    29.14    36.12
  15    22.31    25.00    27.49    30.58    37.70
  16    23.54    26.30    28.85    32.00    39.25
  17    24.77    27.59    30.19    33.41    40.79
  18    25.99    28.87    31.53    34.81    42.31
  19    27.20    30.14    32.85    36.19    43.82
  20    28.41    31.41    34.17    37.57    45.32
  21    29.62    32.67    35.48    38.93    46.80
  22    30.81    33.92    36.78    40.29    48.27
  23    32.01    35.17    38.08    41.64    49.73
  24    33.20    36.42    39.36    42.98    51.18
  25    34.38    37.65    40.65    44.31    52.62
  26    35.56    38.89    41.92    45.64    54.05
  27    36.74    40.11    43.19    46.96    55.48
  28    37.92    41.34    44.46    48.28    56.89
  29    39.09    42.56    45.72    49.59    58.30
  30    40.26    43.77    46.98    50.89    59.70
  40    51.81    55.76    59.34    63.69    73.42
  50    63.17    67.50    71.42    76.15    86.66
  60    74.40    79.08    83.30    88.38    99.61
  70    85.53    90.53    95.02    100.4    112.3
  80    96.58    101.9    106.6    112.3    124.8
  90    107.6    113.1    118.1    124.1    137.2
 100    118.5    124.3    129.6    135.8    149.5
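Entries in the table can be cross-checked by computing the upper-tail probability of the χ² distribution at the tabulated value. The sketch below is not part of the handout: it uses a standard recurrence for the regularised upper incomplete gamma function with only the Python standard library, and the function name `chi2_upper_tail` is hypothetical.

```python
import math


def chi2_upper_tail(x, df):
    """P(chi-squared with `df` degrees of freedom exceeds x), for integer df >= 1."""
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)               # Q(1, y) = exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))    # Q(1/2, y) = erfc(sqrt(y))
    # Recurrence: Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / gamma(a + 1),
    # applied until a reaches df / 2.
    while 2 * a < df:
        q += y ** a * math.exp(-y) / math.gamma(a + 1)
        a += 1
    return q


# The 5% column entries used in this handout should each give roughly 0.05.
for df, value in [(1, 3.841), (3, 7.816), (6, 12.59)]:
    print(df, value, round(chi2_upper_tail(value, df), 4))
```

The same function gives an approximate p-value for a computed test statistic, for example `chi2_upper_tail(16.422, 3)` for Example 1.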
