
STAT8097 Business Statistics Forum Discussion Session 8 – Chi Square and Nonparametric Test (Chapter 12)

Question 1 (40 points)


Chi Square Test for the Difference Between Two Proportions
Consider a study on the impact of using celebrities in television advertisements. A researcher
investigated the relationship between the gender of a viewer and the viewer’s brand awareness. Three
hundred TV viewers were asked to identify products advertised by male celebrities. The data are
summarized in the following table.

                                      Gender
Brand Awareness                 Male    Female    Totals
Could Identify Product            95        41       136
Could Not Identify Product        55       109       164
Totals                           150       150       300

a. Is there evidence of a difference between males and females in the proportion who could
identify product advertised by male celebrities? Set up the null and alternative hypotheses to
gain insights into the impact of using celebrities in television advertisements.
b. Conduct the hypothesis test defined in (a), using the 0.05 level of significance.
c. Conduct the hypothesis test defined in (a), using the 0.01 level of significance.
d. Are the results of the hypothesis tests different for (b) and (c)?

Known:
n = 300
Males: n1 = 150; X1 = 95 could identify the product
Females: n2 = 150; X2 = 41 could identify the product

a. Is there evidence of a difference between males and females in the proportion who could
identify product advertised by male celebrities? Set up the null and alternative hypotheses to
gain insights into the impact of using celebrities in television advertisements.

Hypotheses:
H0: π1 = π2 (the proportion of males who could identify the product is equal to the proportion of
females who could identify the product)
H1: π1 ≠ π2 (the two proportions are different; that is, product identification is not
independent of gender)

If H0 is true, then the proportion of males who could identify the product advertised by male
celebrities should be the same as the proportion of females who could identify the product
advertised by male celebrities.

Therefore the decision rule is:

If $\chi^2_{STAT} > \chi^2_{U}$, reject H0; otherwise do not reject H0. Here $\chi^2_{U}$ is the
upper-tail critical value of the chi-square distribution at the chosen level of significance.

b. Conduct the hypothesis test defined in (a), using the 0.05 level of significance.

Before calculating X2 (the chi-square test statistic), we need to find the average (pooled) proportion (p̄):

$$\bar{p} = \frac{X_1 + X_2}{n_1 + n_2} = \frac{95 + 41}{150 + 150} = \frac{136}{300} = 0.4533 = 45.33\%$$

Next, calculate the expected frequency (fe) of each cell:

Males who could identify the product: fe = p̄ x n1 = 45.33% x 150 = 68
Females who could identify the product: fe = p̄ x n2 = 45.33% x 150 = 68
Males who could not identify the product: fe = (1 - p̄) x n1 = (1 - 45.33%) x 150 = 82
Females who could not identify the product: fe = (1 - p̄) x n2 = (1 - 45.33%) x 150 = 82
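The expected frequencies above can be reproduced with a short sketch. The variable names are illustrative and not part of the original solution; the counts come from the table in the question.

```python
# Observed counts from the question: viewers who could identify the product.
x1, n1 = 95, 150   # males
x2, n2 = 41, 150   # females

# Pooled (average) proportion, which estimates the common proportion under H0.
p_bar = (x1 + x2) / (n1 + n2)

# Expected frequency of each cell of the 2 x 2 table if H0 is true.
expected = {
    ("male", "could identify"): p_bar * n1,
    ("female", "could identify"): p_bar * n2,
    ("male", "could not identify"): (1 - p_bar) * n1,
    ("female", "could not identify"): (1 - p_bar) * n2,
}

print(round(p_bar, 4))
for cell, fe in expected.items():
    print(cell, round(fe, 2))
```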

We can summarize the observed and expected frequencies in a table before calculating X2:

                                       Male                  Female
Brand Awareness                Observed  Expected    Observed  Expected    Totals
Could Identify Product               95        68          41        68       136
Could Not Identify Product           55        82         109        82       164
Totals                              150       150         150       150       300

  f0     fe     (f0 - fe)     (f0 - fe)^2     (f0 - fe)^2 / fe
  95     68         27            729              10.72
  41     68        -27            729              10.72
  55     82        -27            729               8.89
 109     82         27            729               8.89
                                       Total       39.22

The result is X2 = 39.22
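As a cross-check, the test statistic can be computed directly from the observed and expected frequencies. This is a minimal sketch with illustrative variable names.

```python
# Observed and expected frequencies, cell by cell, in the same order.
observed = [95, 41, 55, 109]
expected = [68, 68, 82, 82]

# Chi-square statistic: sum of (f0 - fe)^2 / fe over all cells.
chi2_stat = sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))

print(round(chi2_stat, 2))  # 39.22
```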


Now, find the critical value from the chi-square distribution at the 0.05 level of significance.

First, find the degrees of freedom (df) using the formula

df = (number of rows - 1) x (number of columns - 1) = (2 - 1) x (2 - 1) = 1
α = 0.05

From the chi-square distribution table with df = 1 and α = 0.05:

The critical value = 3.84
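If no chi-square table is at hand, the critical value for df = 1 can be recovered from the standard normal distribution, because a chi-square variable with 1 degree of freedom is the square of a standard normal variable. The sketch below relies on that identity and only works for df = 1; the function name is illustrative.

```python
from statistics import NormalDist

def chi2_critical_df1(alpha):
    """Upper-tail critical value of the chi-square distribution with df = 1.

    Uses the identity chi2(1) = Z^2, so the upper-tail chi-square critical
    value equals the squared two-tail normal critical value.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z * z

print(round(chi2_critical_df1(0.05), 2))  # 3.84
```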

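[Figure: chi-square distribution with df = 1; the region of non-rejection lies to the left of the critical value 3.84 and the region of rejection to the right.]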

Since X2stat > critical value (39.22 > 3.84), we reject H0. In other words, there is evidence
that the proportion of males who could identify the product advertised by male celebrities
differs from the proportion of females who could identify the product advertised by male
celebrities.

c. Conduct the hypothesis test defined in (a), using the 0.01 level of significance.

Since we already have X2 = 39.22, we only need to find the critical value from the chi-square
distribution at the 0.01 level of significance.
First, find the degrees of freedom (df) using the formula
df = (number of rows - 1) x (number of columns - 1) = (2 - 1) x (2 - 1) = 1
α = 0.01

From the chi-square distribution table with df = 1 and α = 0.01:

The critical value = 6.63

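[Figure: chi-square distribution with df = 1; the region of non-rejection lies to the left of the critical value 6.63 and the region of rejection to the right.]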

Since X2stat > critical value (39.22 > 6.63), we reject H0. In other words, there is evidence
that the proportion of males who could identify the product advertised by male celebrities
differs from the proportion of females who could identify the product advertised by male
celebrities.

d. Are the results of the hypothesis tests different for (b) and (c)?

Even though we used different levels of significance (α = 0.05 and α = 0.01) to conduct the
hypothesis tests, the result remains the same: we reject H0, since X2 > critical value in both
tests.
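Another way to see why both levels of significance lead to the same decision is to look at the p-value, which is not computed in the solution above. For df = 1 the upper-tail probability has a closed form in terms of the complementary error function; the sketch below assumes that identity and uses an illustrative function name.

```python
import math

def chi2_p_value_df1(x):
    """Upper-tail probability P(chi-square with df = 1 > x).

    For df = 1, P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(x / 2))

p_value = chi2_p_value_df1(39.22)

# The same p-value is compared against each level of significance.
for alpha in (0.05, 0.01):
    decision = "reject H0" if p_value < alpha else "do not reject H0"
    print(alpha, decision)
```

Because the p-value is far below 0.01, it is also below 0.05, so the decision does not depend on which of the two levels is used.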
Question 2 (40 points)
Chi Square Test of Independence
A vehicle quality survey asked new owners a variety of questions about their recently purchased cars.
One question asked for the owner’s rating of the vehicle using categorical responses of average,
outstanding, and exceptional. Another question asked for the owner’s education level with the
categorical responses some high school, high school graduate, college graduate, and university
graduate. Assume the sample data below are for 500 owners who had recently purchased a car.

                                    Education
Quality Rating    Some HS    HS Grad    College Grad    University Grad
Average                35         30              20                 60
Outstanding            45         45              50                 90
Exceptional            20         25              30                 50

a. Compute the value of χ² for a test of independence.

First, determine Null Hypothesis and Alternative Hypothesis

H0: There is no relationship between quality rating of the vehicle and education.
H1: There is a relationship between quality rating of the vehicle and education.

To compute X2stat, use the following formula:

$$\chi^2_{STAT} = \sum_{\text{all cells}} \frac{(f_0 - f_e)^2}{f_e}$$

where
f0 = observed frequency in a particular cell of the row x column contingency table
fe = expected frequency in a particular cell if the null hypothesis of independence is true,
computed as (Row total x Column total) / n
Row total = sum of the frequencies in the row
Column total = sum of the frequencies in the column
n = overall sample size

Table of observed frequencies:

Quality Rating    Some HS    HS Grad    College Grad    University Grad    Total
Average                35         30              20                 60      145
Outstanding            45         45              50                 90      230
Exceptional            20         25              30                 50      125
Total                 100        100             100                200      500

Calculate the expected frequency (fe) of each cell. For example, for Average/Some HS,
fe = (100 x 145) / 500 = 29. Repeating this for the remaining cells gives the following table:

Quality Rating    Some HS    HS Grad    College Grad    University Grad
Average                29         29              29                 58
Outstanding            46         46              46                 92
Exceptional            25         25              25                 50
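The expected-frequency table can be generated for any number of rows and columns with the same (Row total x Column total) / n rule. This is a minimal sketch with illustrative variable names.

```python
# Observed frequencies: rows are quality ratings (Average, Outstanding,
# Exceptional); columns are education levels (Some HS, HS Grad,
# College Grad, University Grad).
observed = [
    [35, 30, 20, 60],
    [45, 45, 50, 90],
    [20, 25, 30, 50],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected frequency of each cell under independence.
expected = [[r * c / n for c in col_totals] for r in row_totals]

for row in expected:
    print(row)
```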

To calculate X2stat, use the following table:

Cell                             f0    fe    (f0 - fe)    (f0 - fe)^2    (f0 - fe)^2 / fe
Average/Some HS                  35    29        6            36              1.24
Average/HS Grad                  30    29        1             1              0.03
Average/College Grad             20    29       -9            81              2.79
Average/University Grad          60    58        2             4              0.07
Outstanding/Some HS              45    46       -1             1              0.02
Outstanding/HS Grad              45    46       -1             1              0.02
Outstanding/College Grad         50    46        4            16              0.35
Outstanding/University Grad      90    92       -2             4              0.04
Exceptional/Some HS              20    25       -5            25              1.00
Exceptional/HS Grad              25    25        0             0              0.00
Exceptional/College Grad         30    25        5            25              1.00
Exceptional/University Grad      50    50        0             0              0.00
                                                                  Total       6.56

The result of X2 = 6.56
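The statistic can also be computed without rounding each cell first. In the sketch below (illustrative names), the unrounded sum comes out at about 6.57 rather than 6.56; the small gap is only due to rounding the individual contributions to two decimals in the table and does not affect the test decision.

```python
observed = [
    [35, 30, 20, 60],
    [45, 45, 50, 90],
    [20, 25, 30, 50],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Sum (f0 - fe)^2 / fe over all cells, with fe = row total * column total / n.
chi2_stat = 0.0
for i, row in enumerate(observed):
    for j, fo in enumerate(row):
        fe = row_totals[i] * col_totals[j] / n
        chi2_stat += (fo - fe) ** 2 / fe

print(round(chi2_stat, 2))  # 6.57
```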


b. Use a 0.05 level of significance and a test of independence to determine if a new owner’s
vehicle quality rating is independent of the owner’s education?

To find the critical value:

First, find the degrees of freedom (df) using the formula
df = (number of rows - 1) x (number of columns - 1) = (3 - 1) x (4 - 1) = 2 x 3 = 6
α = 0.05

From the chi-square distribution table with df = 6 and α = 0.05:

The critical value = 12.59

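[Figure: chi-square distribution with df = 6; the region of non-rejection lies to the left of the critical value 12.59 and the region of rejection to the right.]
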
Since X2stat ≤ critical value (6.56 ≤ 12.59), we do not reject H0. There is insufficient evidence
of a relationship between quality rating of the vehicle and education; that is, the data are
consistent with the two being independent.

c. What is the p-value and what is your conclusion?
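
One way the p-value could be obtained without statistical software is sketched below. It assumes the test statistic 6.56 and df = 6 from parts (a) and (b), and uses the closed-form upper-tail probability that holds only when df is even; the function name is illustrative.

```python
import math

def chi2_upper_tail_even_df(x, df):
    """P(chi-square with `df` degrees of freedom > x), valid only for even df.

    For even df the upper tail equals exp(-x/2) * sum_{k < df/2} (x/2)^k / k!.
    """
    if df % 2 != 0:
        raise ValueError("this closed form requires an even df")
    half = x / 2
    terms = sum(half ** k / math.factorial(k) for k in range(df // 2))
    return math.exp(-half) * terms

p_value = chi2_upper_tail_even_df(6.56, df=6)
alpha = 0.05
print(round(p_value, 2), "reject H0" if p_value < alpha else "do not reject H0")
```

Under these assumptions the p-value is roughly 0.36, well above 0.05, which agrees with the decision in (b) not to reject H0.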

d. Use the overall percentage of average, outstanding and exceptional ratings to comment upon
how new owners rate the quality of their recently purchased cars.

Question 3 (20 points)


Kruskal-Wallis Test
Forty-minute workouts of one of the following activities three days a week will lead to weight loss.
The following sample data show the number of calories burned during 40-minute workouts for three
different activities.

Swimming    Tennis    Cycling
     408       415        385
     380       485        250
     425       450        295
     400       420        402
     427       530        268

a. At the 0.05 level of significance, is there evidence of a significant difference in the median
amount of calories burned among the different activities?

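State H0 and H1:

H0: M1 = M2 = M3; there is no significant difference in the median amount of calories burned among
the different activities

H1: Not all Mj are equal (j = 1, 2, 3); there is a significant difference in the median amount of
calories burned among the different activities

Decision rule:

Reject H0 if H > $\chi^2_{U}$, the upper-tail critical value of the chi-square distribution with
c - 1 degrees of freedom; otherwise do not reject H0.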

Then rank the 15 observations in the combined sample from smallest (rank 1) to largest (rank 15):

Swimming    Rank    Tennis    Rank    Cycling    Rank
     408       8       415       9        385       5
     380       4       485      14        250       1
     425      11       450      13        295       3
     400       6       420      10        402       7
     427      12       530      15        268       2
Rank sum      41                61                 18
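
The ranking step can be sketched as follows. The code assumes there are no tied values, which is true for this data set; with ties, average ranks would be needed instead. Variable names are illustrative.

```python
samples = {
    "Swimming": [408, 380, 425, 400, 427],
    "Tennis": [415, 485, 450, 420, 530],
    "Cycling": [385, 250, 295, 402, 268],
}

# Pool all observations and sort them from smallest to largest.
combined = sorted(v for values in samples.values() for v in values)

# With no ties, the rank of a value is its 1-based position in the sorted pool.
rank_of = {value: position + 1 for position, value in enumerate(combined)}

# Sum of ranks within each activity.
rank_sums = {
    name: sum(rank_of[v] for v in values) for name, values in samples.items()
}

print(rank_sums)  # {'Swimming': 41, 'Tennis': 61, 'Cycling': 18}
```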

Checking the ranks:

n = n1 + n2 + n3 = 5 + 5 + 5 = 15

The rank sums must add up to n(n + 1)/2:

$$T_1 + T_2 + T_3 = \frac{n(n+1)}{2} \;\Rightarrow\; 41 + 61 + 18 = \frac{15(15+1)}{2} \;\Rightarrow\; 120 = \frac{240}{2} = 120$$

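Next, find Hstat using the formula:

$$H = \left[\frac{12}{n(n+1)} \sum_{j=1}^{c} \frac{T_j^2}{n_j}\right] - 3(n+1)$$

n = n1 + n2 + n3 = 5 + 5 + 5 = 15

T1 = 41; T2 = 61; T3 = 18

$$H = \left[\frac{12}{15(15+1)}\left(\frac{41^2}{5} + \frac{61^2}{5} + \frac{18^2}{5}\right)\right] - 3(15+1) = (0.05 \times 1145.2) - (3 \times 16) = 57.26 - 48 = 9.26$$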

The result of Hstat = 9.26
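
The arithmetic for Hstat can be checked with a few lines. This is a minimal sketch that takes the rank sums and sample sizes as given; the variable names are illustrative.

```python
rank_sums = [41, 61, 18]   # T1, T2, T3
sizes = [5, 5, 5]          # n1, n2, n3
n = sum(sizes)

# Kruskal-Wallis statistic: [12 / (n(n+1)) * sum(Tj^2 / nj)] - 3(n+1).
sum_term = sum(t ** 2 / m for t, m in zip(rank_sums, sizes))
h_stat = 12 / (n * (n + 1)) * sum_term - 3 * (n + 1)

print(round(h_stat, 2))  # 9.26
```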

Next, find the critical value. The degrees of freedom are df = c - 1, where c = 3 is the number
of groups, so df = 3 - 1 = 2.

α = 0.05

From the chi-square distribution table with df = 2 and α = 0.05:

The critical chi-square value is 5.99

Compare Hstat with the critical chi-square value: 9.26 > 5.99

Since Hstat > critical value, we reject H0. There is a significant difference in the median
amount of calories burned among the different activities.
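
For df = 2 the chi-square distribution has a simple closed form, so both the critical value and an approximate p-value can be checked by hand. The sketch below assumes the usual chi-square approximation for the Kruskal-Wallis statistic and uses the identity P(chi-square with df = 2 > x) = exp(-x/2); the p-value is not part of the original solution.

```python
import math

alpha = 0.05
h_stat = 9.26

# For df = 2 the upper tail is exp(-x / 2), so the critical value solves
# exp(-x / 2) = alpha, which gives x = -2 * ln(alpha).
critical_value = -2 * math.log(alpha)

# Approximate p-value of the observed statistic under the same distribution.
p_value = math.exp(-h_stat / 2)

print(round(critical_value, 2))  # 5.99
print("reject H0" if h_stat > critical_value else "do not reject H0")  # reject H0
```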

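[Figure: chi-square distribution with df = 2; the region of non-rejection lies to the left of the critical value 5.99 and the region of rejection to the right.]
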
c. What is your conclusion?

Since Hstat > critical value (9.26 > 5.99), we reject H0 and conclude that there is a
significant difference in the median amount of calories burned among the different
activities.
