Chi square

Nov 14, 2018

© All Rights Reserved


Department of Management,

FEB

Airlangga University

Why Chi Square? (χ²)

• We want to compare two variables, but…

• Not all variables are interval-level, so we cannot use regression.

• Hypothesis Tests for Difference of Means and Difference of

Proportions only allow us to compare two groups with one

value.

• We need something else. . .

What is Chi Square? (χ²)

comparing 2 or more nominal categories

• The Chi Square Statistic compares

nominal values in a cross-tabulation table,

making what are called row by column

comparisons or “r x c” tables.

Data Types

[Diagram: Data divides into Quantitative and Qualitative; Quantitative divides into Discrete and Continuous]

A Nominal variable is…

example gender where male = 1 and female = 2.

Hypothesis Tests

Qualitative Data

[Diagram: tests for qualitative data branch into Proportion (1 pop., 2 pop., 2 or more pop.) and Independence]

EPI809/Spring 2008

Chi-Square Tables

and consist of columns and rows, with columns

representing areas under the curve and rows

associated with the degrees of freedom (df)

which, for χ² tests of homogeneity and independence, are: df = (r − 1)(c − 1).

Chi-Square Tables

• Typical columns are: χ²_.100 χ²_.050 χ²_.025 χ²_.010 χ²_.005 (subscript = right-tail area)

• The decision rule for both a chi-square test of homogeneity and one of independence is:

DR: Reject H0 in favor of HA if and only if χ²calc > χ²crit.

Otherwise, fail to reject (FTR) H0.

• In the case of homogeneity, this is essentially:

DR: Reject similarity of processes in favor of distinctions between

them if and only if their profiles differ markedly from one another

so that the preponderance of evidence supports distinctions.

Otherwise, FTR H0.

χ²crit Determination

With 8 df and α = .05, χ²crit = 15.5073:

df    χ²_.100    χ²_.050    χ²_.025    χ²_.010    χ²_.005
2     4.6052     5.9915     7.3778     9.2103     10.5966
3     6.2514     7.8147     9.3484     11.3449    12.8381
4     7.7794     9.4877     11.1433    13.2767    14.8602
.     .          .          .          .          .
8     13.3616    15.5073    17.5346    20.0902    21.9550
.     .          .          .          .          .
30    40.2560    43.7729    46.9792    50.8922    53.6720

Using the Table…

Degrees of freedom: 5 − 1 = 4

Right-tail area: α = 0.05

Reject H0 if χ² > 9.488
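The table lookup can also be reproduced numerically. The sketch below is only an illustration and is not part of the original slides: it assumes integer degrees of freedom, uses the standard finite-series expressions for the chi-square right-tail area, and inverts them by bisection. The function names `chi2_right_tail` and `chi2_critical` are made up for this example.

```python
import math

def chi2_right_tail(x, df):
    """Right-tail area P(X > x) for a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= (x / 2) / j
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite correction series
    root = math.sqrt(x)
    term, total = root, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)  # root^(2r-1) / (1*3*...*(2r-1))
        total += term
    density = math.exp(-x / 2) / math.sqrt(2 * math.pi)
    return math.erfc(root / math.sqrt(2)) + 2 * density * total

def chi2_critical(alpha, df, upper=1000.0):
    """Value whose right-tail area equals alpha, found by bisection."""
    lo, hi = 0.0, upper
    for _ in range(200):
        mid = (lo + hi) / 2
        if chi2_right_tail(mid, df) > alpha:
            lo = mid  # tail area still too large: move right
        else:
            hi = mid
    return (lo + hi) / 2

for df in (2, 3, 4, 8, 30):
    print(df, round(chi2_critical(0.05, df), 4))
```

The printed values can be compared against the χ²_.050 column of the table above.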

Characteristics of the Chi-Square Distribution

… it is positively skewed

… it is non-negative

… it is based on degrees of freedom

…when the degrees of freedom change

a new distribution is created

…e.g.

Copyright © 2004 McGraw-Hill Ryerson Limited. All rights reserved.

[Figure: chi-square distribution curves for df = 3, df = 5, and df = 10; horizontal axis: χ²]

Summing up the properties of the χ² distribution:

• The χ² distribution ranges from zero to some positive value, i.e., from ‘no difference’ to some ‘big difference’.

• The χ² distribution is not symmetrical, but skewed to the right, from zero to a large positive χ². Chi square looks at differences from zero. Its value depends on the number of comparisons made, that is, the number of df. Note that the critical value of chi square gets bigger as the df get bigger, because the more comparisons are made, the more likely you are to find differences; the df correct for this.

• There are many different χ² distributions. Like the t distribution, χ² varies with degrees of freedom.

CHI SQUARE APPLICATIONS (χ²)

• Test of independence

• Test between proportions

• Test of normality

Basic Assumption of the Null

Hypothesis

• There is no difference in the population, the

difference you observe is just the chance

variation of your sample.

• We are comparing observed values (“frequency actually observed in our sample”, written “fo”) to some set of expected-by-chance frequencies (written “fe”).

(χ²) test for difference between two proportions

between two independent groups – two-way cross-classification table (contingency table)

• Ho: there is no difference between the two population proportions

• Ho: p1 = p2

• H1: the two population proportions are not the same

• H1: p1 ≠ p2

(χ²)

Goodness-of-Fit Test:

frequencies respectively

H0: There is no difference between the

observed and expected frequencies

H1: There is a difference between the

observed and the expected frequencies

Goodness-of-Fit Test:

… the test statistic is:

$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$$

with (k − 1) degrees of freedom, where k is the number of categories

Steps in Test of Hypothesis

1. Determine the appropriate test

2. Establish the level of significance: α

3. Formulate the statistical hypothesis

4. Calculate the test statistic

5. Determine the degrees of freedom

6. Compare the computed test statistic against a tabled/critical value

1. Determine Appropriate Test

measured on a nominal scale.

It can be applied to interval or ratio data that have

been categorized into a small number of groups.

It assumes that the observations are randomly

sampled from the population.

All observations are independent (an individual

can appear only once in a table and there are no

overlapping categories).

It does not make any assumptions about the shape

of the distribution nor about the homogeneity of

variances.


2. Establish Level of Significance

α is a predetermined value

The convention

• α = .05

• α = .01

• α = .001


3. Determine The Hypothesis:

Whether There is an Association

or Not

Ho : The two variables are independent

Ha : The two variables are associated


4. Calculating Test Statistics

Contrasts observed frequencies in each cell of a

contingency table with expected frequencies.

The expected frequencies represent the number of

cases that would be found in each cell if the null

hypothesis were true ( i.e. the nominal variables

are unrelated).

The expected frequency of two unrelated events is the product of the row and column frequencies divided by the number of cases.

$$F_e = \frac{F_r F_c}{N}$$

4. Calculating Test Statistics

$$\chi^2 = \sum \frac{(F_o - F_e)^2}{F_e}$$

5. Determine Degrees of

Freedom

df = (R-1)(C-1)

6. Compare computed test statistic

against a tabled/critical value

The computed value of the Pearson chi-

square statistic is compared with the critical

value to determine if the computed value is

improbable

The critical tabled values are based on

sampling distributions of the Pearson chi-

square statistic

If the calculated χ² is greater than the χ² table value, reject Ho

Example

preferences on gun control issues.

A questionnaire was developed and sent to a

random sample of 90 voters.

The researcher also collects information

about the political party membership of the

sample of 90 respondents.


Bivariate Frequency Table or Contingency Table

             Favor   Neutral   Oppose   f row
Democrat     10      10        30       50
Republican   15      15        10       40
f column     25      25        40       n = 90

f row = row frequency; f column = column frequency

1. Determine Appropriate Test

2. Voting Preference (3 levels) and Nominal

Alpha of .05

3. Determine The Hypothesis

in their opinion on gun control issue.

responses to the gun control survey and the

party membership in the population.


4. Calculating Test Statistics

             Favor        Neutral      Oppose       f row
Democrat     fo = 10      fo = 10      fo = 30      50
             fe = 13.9    fe = 13.9    fe = 22.2
Republican   fo = 15      fo = 15      fo = 10      40
             fe = 11.1    fe = 11.1    fe = 17.8
f column     25           25           40           n = 90

Democrat, Favor: fe = 50*25/90 = 13.9

Republican, Favor: fe = 40*25/90 = 11.1
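The expected frequencies in this table follow from Fe = Fr Fc / N applied to every cell. A minimal sketch of that calculation, using the observed counts from the gun control example; the function name `expected_frequencies` is an illustrative choice, not something from the slides.

```python
def expected_frequencies(observed):
    """Expected cell counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

# Rows: Democrat, Republican; columns: Favor, Neutral, Oppose
observed = [[10, 10, 30],
            [15, 15, 10]]

expected = expected_frequencies(observed)
for party, row in zip(("Democrat", "Republican"), expected):
    print(party, [round(fe, 1) for fe in row])
```

Each row of expected counts sums to the same row total as the observed counts, which is a quick sanity check on the arithmetic.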

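4. Calculating Test Statistics

$$\chi^2 = \frac{(10 - 13.89)^2}{13.89} + \frac{(10 - 13.89)^2}{13.89} + \frac{(30 - 22.22)^2}{22.22} + \frac{(15 - 11.11)^2}{11.11} + \frac{(15 - 11.11)^2}{11.11} + \frac{(10 - 17.78)^2}{17.78} = 11.03$$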

5. Determine Degrees of

Freedom

df = (R-1)(C-1) =

(2-1)(3-1) = 2

6. Compare computed test statistic

against a tabled/critical value

α = 0.05

df = 2

Critical tabled value = 5.991

Test statistic, 11.03, exceeds critical value

Null hypothesis is rejected

Democrats & Republicans differ

significantly in their opinions on gun

control issues


SPSS Output for Gun Control Example

Chi-Square Tests

                               Value     df   Asymp. Sig. (2-sided)
Pearson Chi-Square             11.025a   2    .004
Likelihood Ratio               11.365    2    .003
Linear-by-Linear Association   8.722     1    .003
N of Valid Cases               90

a. 0 cells (.0%) have expected count less than 5. The minimum expected count is 11.11.
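The first two rows of this output can be checked by hand. The sketch below is an illustration rather than a description of how SPSS computes them: it assumes the usual definitions of the Pearson statistic and the likelihood-ratio statistic G = 2 Σ fo ln(fo / fe), and it relies on the fact that for df = 2 the right-tail area of the chi-square distribution is exp(−x / 2). The linear-by-linear row is not reproduced.

```python
import math

# Rows: Democrat, Republican; columns: Favor, Neutral, Oppose
observed = [[10, 10, 30],
            [15, 15, 10]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

pearson = 0.0
likelihood_ratio = 0.0
for i, row in enumerate(observed):
    for j, fo in enumerate(row):
        fe = row_totals[i] * col_totals[j] / n
        pearson += (fo - fe) ** 2 / fe
        likelihood_ratio += 2 * fo * math.log(fo / fe)

df = (len(observed) - 1) * (len(observed[0]) - 1)

# Valid only because df == 2 here: P(X > x) = exp(-x / 2)
p_pearson = math.exp(-pearson / 2)
p_likelihood_ratio = math.exp(-likelihood_ratio / 2)

print(round(pearson, 3), df, round(p_pearson, 3))
print(round(likelihood_ratio, 3), df, round(p_likelihood_ratio, 3))
```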

Another case...

Approval for President Obama by Race

BLACKS WHITES

APPROVE 69 156

DISAPPROVE 21 144

The formula for χ² is:

$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$$

OR, sometimes written:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

Where fo is the observed frequency in each cell of a table.

O or fo is what we observe from our sample, the observed frequency. NOTE that χ² works with frequencies in each cell.

E or fe is the expected frequency: the number of people who would show up in each cell IF the null hypothesis were true, if there was no racial difference in approval, if the frequencies were due solely to chance.

For each cell in the table we compare what we observe to what we should expect by chance:

• Subtract the value of the hypothetical expectancy (fe) from the observed frequency (fo) for each cell.

• Square each of these deviations.

• Divide each of the squared differences by the expected value of each cell.

• Finally, sum these quotients over all cells to get χ².
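The four steps can be traced cell by cell on the approval-by-race table. This is a sketch only: the slides do not report the resulting statistic, so the number printed here is the output of this calculation and not a figure from the original deck. Expected counts are taken as row total × column total / N.

```python
# Rows: Approve, Disapprove; columns: Blacks, Whites
observed = [[69, 156],
            [21, 144]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

chi_square = 0.0
for i, row in enumerate(observed):
    for j, fo in enumerate(row):
        fe = row_totals[i] * col_totals[j] / n  # expected by chance
        deviation = fo - fe                     # step 1: subtract
        squared = deviation ** 2                # step 2: square
        contribution = squared / fe             # step 3: divide by fe
        chi_square += contribution              # step 4: sum over cells
        print(f"cell ({i}, {j}): fo={fo}, fe={fe:.2f}, contribution={contribution:.3f}")

df = (len(observed) - 1) * (len(observed[0]) - 1)
print("chi-square =", round(chi_square, 3), "with df =", df)
```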

The Chi Square statistic tests:

• Whether the difference between what you observe and what chance would predict is due to sampling error.

• The greater the deviation of what we observe from what we would expect by chance, the less plausible it is that the difference is due to chance alone.

DIFFERENCE BETWEEN EXPENSIVE

AND CHEAP BEER

• Consumer Reports routinely finds that many

people who claim they can taste the difference

can’t — they are influenced by the label.

• How would you test the idea that people cannot really tell the difference, and that they are really responding to the price label information? How do we disentangle the label effect from taste?

What is the null? ==> No difference

We expect: beer 1 = beer 2 = beer 3

before them 3 bottles, one labeled with name of

well-known high-priced beer, another a medium-

priced beer, and the third a low priced beer.

Bottles are counterbalanced to control for order effects.

All 150 Subjects taste each beer and state

preference.

The Full Table

              High Priced Beer   Medium Priced Beer   Low Priced Beer
Observed fo   77                 41                   32
Expected fe   50                 50                   50

Step 1. Hypothesis:

Null = the proportions preferring each beer

should be equal IF indeed the beers are equal and

if preferences are not influenced by the label. Here,

chance would predict 50 people in each group if

label did not matter. The ratios of O to E values

should be the same across all 3 comparisons if

label does not matter. The O : E ratios in each

column should be the same. Our alternative

hypothesis is that preferences will follow the status

of beer 1 > beer 2 > beer 3.

Step 2. The Distribution:

nominal variable on another nominal variable

the χ² distribution is appropriate -- we are doing a

row by column [r x c] analysis.

Set alpha at .05 for 95% confidence.

Step 4. Determine Critical Value of χ²*:

The chi square distribution changes shape by

degrees of freedom, just as does the t distribution.

Degrees of freedom change as a function of the

number of comparisons made.

Formula for degrees of freedom of χ²:

df = (r - 1) x (c - 1)

where r = number of rows; c = number of columns

We have a 3 by 2 table, so df = (3 - 1) x (2 - 1) = 2.

categories.)

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

Beer       Hi Priced   Med Priced   Lo Priced
Observed   77          41           32
Expected   50          50           50
O - E      27          -9           -18
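Carrying the O − E row through the rest of the formula gives the test statistic for the beer example. The sketch below assumes equal expected counts (150 subjects spread over 3 beers) and, because df = 2, uses the closed form exp(−χ² / 2) for the right-tail area; the variable names are illustrative.

```python
import math

observed = [77, 41, 32]                  # high, medium, low priced beer
n = sum(observed)
expected = [n / len(observed)] * len(observed)   # 50 each under the null

deviations = [o - e for o, e in zip(observed, expected)]
chi_square = sum(d ** 2 / e for d, e in zip(deviations, expected))
df = len(observed) - 1

# Right-tail area for df == 2 only: P(X > x) = exp(-x / 2)
p_value = math.exp(-chi_square / 2)

critical_value = 5.991                   # alpha = .05, df = 2, from the table
print("O - E:", deviations)
print("chi-square =", round(chi_square, 2), "df =", df)
print("reject null:", chi_square > critical_value)
```

The p-value from this sketch is far below .0005, consistent with the statement that fewer than 5 samples in 10,000 would show a difference this large by chance.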

Look up our computed value of χ² = 22.68 in the Chi Square table at 2 df. We find that 22.68 is beyond even the .01 critical value (9.2103).

chances in 10,000 would produce a difference this big just by chance. Or better, fewer than 5 samples in 10,000 of the same size would produce a difference this big.

Step 6. Interpret:

critical value of 5.991.

People do respond to price label information.

Goodness-of-Fit Test:

Normality

… the test investigates

if the observed frequencies in a frequency distribution

match the theoretical normal distribution


…to determine the mean and standard deviation

of the frequency distribution

- Compute the z-value for the lower class limit

and the upper class limit for each class

- Determine fe for each category

- Use the chi-square goodness-of-fit test to

determine if fo coincides with fe

attention

• Suppose we know the mean and standard deviation of the population but wish to find whether some sample data conform to the normal distribution:

d.f. = k − 1

• On the other hand, if we don’t know the mean and standard deviation of the population but wish to test whether some sample data follow the normal distribution:

d.f. = k − p − 1

where p is the number of population parameters being estimated from the sample data

Goodness-of-Fit Test:

Normality


Foundation is reported in the

following frequency distribution

normally distributed with a mean of $10 and a

standard deviation of $2?

… continued

Amount          fo
<$6             20
$6 up to $8     60
$8 up to $10    140
$10 up to $12   120
$12 up to $14   90
>$14            70
Total           500

… continued

first determine the z-value

$$z = \frac{X - \mu}{\sigma} = \frac{6 - 10}{2} = -2.00$$

Now…

find the probability of a z-value less than –2.00

… continued

Amount          fo    Area
<$6             20    .02
$6 up to $8     60    .14
$8 up to $10    140   .34
$10 up to $12   120   .34
$12 up to $14   90    .14
>$14            70    .02
Total           500

… continued

z-value less than –2.00 times the sample size

$$f_e = (.0228)(500) = 11.40$$

The other expected frequencies

are computed similarly

… continued

Amount          fo    Area   fe       (fo - fe)²/fe
<$6             20    .02    11.40    6.49
$6 up to $8     60    .14    67.95    .93
$8 up to $10    140   .34    170.65   5.50
$10 up to $12   120   .34    170.65   15.03
$12 up to $14   90    .14    67.95    7.16
>$14            70    .02    11.40    301.22
Total           500          500      336.33
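The whole table can be rebuilt from the class limits, the hypothesized mean of $10, and the standard deviation of $2. The sketch below is an illustration under one assumption that differs from the slides: it uses unrounded normal areas from `math.erf` instead of four-decimal table areas such as .0228, so its expected counts and total differ slightly from 11.40 and 336.33. The helper `normal_cdf` is defined here for the example.

```python
import math

def normal_cdf(x, mean, sd):
    """Cumulative probability P(X <= x) for a normal distribution."""
    return 0.5 * (1 + math.erf((x - mean) / (sd * math.sqrt(2))))

mean, sd = 10, 2
upper_limits = [6, 8, 10, 12, 14]          # class boundaries in dollars
observed = [20, 60, 140, 120, 90, 70]      # fo for the six classes
n = sum(observed)

# Cumulative areas at each boundary, padded with 0 and 1 for the open-ended classes
cumulative = [0.0] + [normal_cdf(x, mean, sd) for x in upper_limits] + [1.0]
areas = [hi - lo for lo, hi in zip(cumulative, cumulative[1:])]
expected = [area * n for area in areas]

chi_square = sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))
df = len(observed) - 1                     # mean and sd are given, so k - 1

for fo, area, fe in zip(observed, areas, expected):
    print(fo, round(area, 4), round(fe, 2), round((fo - fe) ** 2 / fe, 2))
print("chi-square =", round(chi_square, 2), "df =", df)
```

Most of the total comes from the >$14 class, where 70 observations fall in a tail that should hold about 11.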

… continued

Step 1 H0: The observations follow the normal distribution

H1: The observations do NOT follow the normal distribution

Step 2 α = 0.05

Step 4 χ² = 336.33

H0 is rejected.

The observations do NOT follow the normal distribution

A contingency table is used to investigate

whether two traits or characteristics

are related


… each observation is classified according to two criteria

…the usual hypothesis testing procedure is used

Degrees of freedom = (number of rows − 1)(number of columns − 1)

Expected Frequency = (row total)(column total)/grand total

Is there a relationship between the location of an accident and the gender of the person involved in the accident?

police were classified by type and gender.

At the .05 level of significance, can we

conclude that gender and the location of

the accident are related?

… continued

          Location
Sex       Work   Home   Other   Total
Male      60     20     10      90
Female    20     30     10      60
Total     80     50     20      150

The expected frequency for the work-male

intersection is computed as (90)(80)/150 =48

Similarly, you can compute the

expected frequencies for the other cells

… continued

Step 1 H0: Gender and Location are NOT related

H1: Gender and Location are related

Step 2 α = 0.05

H0 is rejected if χ² > 5.991, df = 2

Step 3

(…there are (2 − 1)(3 − 1) = 2 degrees of freedom)

$$\chi^2 = \frac{(60 - 48)^2}{48} + \dots + \frac{(10 - 8)^2}{8} = 16.667$$

H0 is rejected.

Gender and Location are related!
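The terms hidden behind the dots in the sum can be generated for any r x c table. A minimal reusable sketch, applied to the accident data; the function name `chi_square_independence` and its return shape are assumptions made for this example, and the critical value 5.991 is the tabled value for α = .05 and df = 2.

```python
def chi_square_independence(table):
    """Return (statistic, df, expected) for an r x c table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    statistic = sum(
        (fo - fe) ** 2 / fe
        for obs_row, exp_row in zip(table, expected)
        for fo, fe in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df, expected

# Rows: Male, Female; columns: Work, Home, Other
accidents = [[60, 20, 10],
             [20, 30, 10]]

statistic, df, expected = chi_square_independence(accidents)
critical_value = 5.991
print("expected:", expected)
print("chi-square =", round(statistic, 3), "df =", df)
print("reject H0:", statistic > critical_value)
```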
