
Business Statistics: Communicating with Numbers

By Sanjiv Jaggia and Alison Kelly

McGraw-Hill/Irwin Copyright © 2013 by The McGraw-Hill Companies, Inc. All rights reserved.
Chapter 12 Learning Objectives (LOs)

LO 12.1: Conduct a goodness-of-fit test for a multinomial experiment.

LO 12.2: Determine whether two classifications of a population are independent.

LO 12.3: Conduct a goodness-of-fit test for normality.

LO 12.4: Perform the Jarque-Bera test for normality.

Is Brand Loyalty Related to Buyer’s Age?

• The retail analyst for a marketing firm wants to know if different customer groups prefer one brand over another. She looks at data from 600 sales.
• In particular, she feels that the brand Under Armour might appeal more to younger customers.
• The more established brands (Nike and Adidas) might be capturing the older-customer market.

Is Brand Loyalty Related to Buyer’s Age?

1. Determine whether the two classifications (age and brand name) are dependent at the 5% significance level.

2. Discuss how the findings from the test for independence can be used.

12.1 Goodness-of-Fit Test for a Multinomial Experiment

LO 12.1 Conduct a goodness-of-fit test for a multinomial experiment.

• This test determines whether two or more population proportions equal each other or any predetermined set of values.
• For example, are four candidates in an election equally favored by voters?
• Or, do people rate food quality in a restaurant comparably to last year?

LO 12.1 A Multinomial Experiment

A multinomial experiment consists of a series of n independent trials such that:

1. On each trial there are k possible outcomes (categories).
2. The probability pi of falling into category i is the same on each trial.
3. The k probabilities sum to 1: p1 + p2 + … + pk = 1

LO 12.1 The Hypothesis Test

• The null hypothesis: the population proportions are equal to one another, or each is equal to a specific value.
• Equal population proportions:
  H0: p1 = p2 = p3 = p4 = 0.25
  HA: Not all population proportions are equal to 0.25.
• Unequal population proportions:
  H0: p1 = 0.4, p2 = 0.3, p3 = 0.2, p4 = 0.1
  HA: At least one pi differs from its hypothesized value.

LO 12.1 Restaurant Food Quality

Last year, the management at a restaurant surveyed its patrons to rate the quality of its food. The results were as follows: Excellent 15%, Good 30%, Fair 45%, Poor 10%.

Based on this and other survey results, management made changes to the menu.

LO 12.1 This Year’s Results

This year, management surveyed 250 patrons, asking the same questions about food quality. Here are the results: Excellent 46, Good 83, Fair 105, Poor 16.

We want to know whether the results agree with those from last year or whether there has been a significant change.

LO 12.1 Methodology
• Compute an expected frequency for each category and compare it to what we actually observe.
• Compute the difference between the observed and expected frequencies for each category.
• If the results this year are consistent with last year’s, these differences will be relatively small.

LO 12.1 The ei (Expected Frequencies)
• We first compute the expected counts based on the survey of 250 restaurant patrons.
• If the survey is consistent with last year’s results, we expect e1 = p1(250) = 0.15(250) = 37.5 responses to be in the “Excellent” category.
• There actually were o1 = 46, a bit more than expected.

LO 12.1 Computing the Deviations
• In the first category, e1 = 37.5 and o1 = 46, so we get (o1 – e1) = 46 – 37.5 = 8.5.
• In the third category, the “Fair” responses, e3 = p3(250) = 0.45(250) = 112.5.
• There are 105 of these responses in the survey, so we compute (o3 – e3) = 105 – 112.5 = -7.5.
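
The same arithmetic can be carried out for all four categories at once. The following is a minimal sketch using the proportions and counts from the restaurant example; the variable names are illustrative and not part of the original slides.

```python
# Last year's proportions and this year's observed counts (restaurant example).
last_year_share = {"Excellent": 0.15, "Good": 0.30, "Fair": 0.45, "Poor": 0.10}
observed = {"Excellent": 46, "Good": 83, "Fair": 105, "Poor": 16}

n = sum(observed.values())  # number of patrons surveyed this year

# Expected frequency e_i = p_i * n and deviation o_i - e_i for each category.
expected = {cat: share * n for cat, share in last_year_share.items()}
deviation = {cat: observed[cat] - expected[cat] for cat in observed}

for cat in observed:
    print(f"{cat:<10} o = {observed[cat]:>3}   e = {expected[cat]:>6.1f}   "
          f"o - e = {deviation[cat]:>+5.1f}")
```

The deviations always sum to zero, which is why they are squared before being combined into a test statistic.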

LO 12.1 Standardizing the Deviations

LO 12.1 The Chi-Square Test

df = k – 1, where k is the number of categories
oi = observed frequency for category i
ei = expected frequency for category i
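
For reference, the standard Pearson goodness-of-fit statistic, written with the symbols defined on this slide, takes the following form. The exact typesetting on the original slide is an assumption.

```latex
\chi^2_{df} = \sum_{i=1}^{k} \frac{(o_i - e_i)^2}{e_i}, \qquad df = k - 1
```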

LO 12.1 The Critical Value (at α = 0.05)
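
A critical value such as the one used for the restaurant example (k = 4 categories, so df = 3) can be looked up in a chi-square table or computed numerically. The sketch below uses only the Python standard library; the helper names `chi2_sf` and `chi2_critical` are hypothetical, and the closed-form tail expressions apply to integer degrees of freedom only.

```python
import math

def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df = 2m: exp(-x/2) * sum_{j=0}^{m-1} (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    # Odd df = 2m+1:
    # erfc(sqrt(x/2)) + exp(-x/2) * sum_{j=1}^{m} (x/2)^(j - 1/2) / Gamma(j + 1/2)
    total = 0.0
    for j in range(1, (df - 1) // 2 + 1):
        total += half ** (j - 0.5) / math.gamma(j + 0.5)
    return math.erfc(math.sqrt(half)) + math.exp(-half) * total

def chi2_critical(alpha: float, df: int) -> float:
    """Value c with P(X > c) = alpha, located by bisection on chi2_sf."""
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, df) > alpha:  # widen the bracket until it contains c
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_sf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

critical = chi2_critical(0.05, df=3)
print(round(critical, 3))
```

Where SciPy is available, a library quantile function would normally be preferred over a hand-written bisection; the version here is only meant to show what the table lookup represents.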

LO 12.1 The Restaurant Example

Response      Percentage   Observed This   Expected Out
Category      Last Year    Year (oi)       of 250 (ei)    (oi – ei)   (oi – ei)²/ei
Excellent     15%          46              37.5             8.5       1.927
Good          30%          83              75.0             8.0       0.853
Fair          45%          105             112.5           -7.5       0.500
Poor          10%          16              25.0            -9.0       3.240
TOTAL         100%         250             250              0.0       6.520

• Since the computed test statistic of 6.520 is less than the critical value of 7.815, we do not reject H0.
• The menu changes did not produce a statistically significant change in the ratings at the 5% level.
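
The table's last column and the decision can be reproduced in a few lines. This is a small sketch based on the figures above; the critical value 7.815 is taken from the slides rather than computed.

```python
observed = [46, 83, 105, 16]                 # Excellent, Good, Fair, Poor (this year)
last_year_share = [0.15, 0.30, 0.45, 0.10]   # hypothesized proportions under H0

n = sum(observed)
expected = [share * n for share in last_year_share]

# Each term is a squared deviation divided by its expected frequency.
components = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi_square = sum(components)

critical_value = 7.815   # df = 3, alpha = 0.05 (value quoted on the slides)
reject_h0 = chi_square > critical_value

print([round(c, 3) for c in components])
print(round(chi_square, 3), "reject H0" if reject_h0 else "do not reject H0")
```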

LO 12.1 A Required Condition

• The test requires that the expected frequency (ei) in each category is at least 5.
• That was not a problem in the restaurant example, where the smallest expected frequency is 25.
• When the condition is not met, one remedy is to combine categories so that every ei ≥ 5.

LO 12.1 Example 12.1
• Five companies manufacture a particular product. Their market shares for 2010 are:

Company        1     2     3     4     5
Market Share   40%   32%   24%   2%    2%

• Current-year shares are not yet known, so a market analyst surveys 200 recent customers to gain an “advance look.”

LO 12.1 Example 12.1 (continued)
• The survey showed the following results:

Company     1    2    3    4    5    Total
Purchases   70   60   54   10   6    200

• A minor complication is that, for each of the two small companies, a 2% market share yields an expected frequency of only 4 (200 × 0.02), which is below the required minimum of 5.
• We will therefore combine companies 4 and 5 in performing the analysis.

LO 12.1 Example 12.1 Computations

           Market Share   Purchases This   Expected Out
Company    in 2010        Year (oi)        of 200 (ei)    (oi – ei)   (oi – ei)²/ei
1          40%            70               80             -10.0       1.250
2          32%            60               64              -4.0       0.250
3          24%            54               48               6.0       0.750
4 and 5    4%             16               8                8.0       8.000
TOTAL      100%           200              200              0.0       10.250

• Because the computed test statistic of 10.250 exceeds the critical value of 7.815 (df = 4 – 1 = 3 after combining categories), we reject H0.
• We conclude that there have been shifts in the market.
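
The pooling step and the resulting statistic can be sketched as follows. Merging every category whose expected frequency is below 5 into a single pooled category is one simple strategy that happens to match this example; it is an assumption here, not a general rule, and the pooled category itself must still reach the minimum.

```python
MIN_EXPECTED = 5

shares_2010 = {"1": 0.40, "2": 0.32, "3": 0.24, "4": 0.02, "5": 0.02}
purchases = {"1": 70, "2": 60, "3": 54, "4": 10, "5": 6}
n = sum(purchases.values())  # customers surveyed

# Companies whose expected frequency p_i * n is too small get pooled together.
small = [c for c, share in shares_2010.items() if share * n < MIN_EXPECTED]
kept = [c for c in shares_2010 if c not in small]

labels = list(kept)
observed = [purchases[c] for c in kept]
expected = [shares_2010[c] * n for c in kept]
if small:
    labels.append(" and ".join(small))
    observed.append(sum(purchases[c] for c in small))
    expected.append(sum(shares_2010[c] for c in small) * n)

chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1  # degrees of freedom after pooling

print(labels, observed, [round(e, 1) for e in expected])
print(round(chi_square, 3), df)
```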

12.2 Chi-Square Test for Independence
LO 12.2 Determine whether two classifications of a population are independent.

• The goodness-of-fit test examines a single qualitative variable. A test of independence, also called a chi-square test of a contingency table, analyzes the relationship between two qualitative variables.
• The competing hypotheses can be expressed as:
  H0: The two classifications are independent
  HA: The two classifications are dependent

LO 12.2 Contingency Tables

• A contingency table shows the frequencies for two qualitative variables (e.g., brand of product and type of customer).
• Each variable has two or more categories.
• The test for independence is based on the expected and observed frequencies for each cell in the table.

LO 12.2 Example
Does the brand of compression garment purchased depend on the customer’s age?

                      Brand Name
Age Group             Under Armour   Nike   Adidas
Under 35 years        174            132    90
35 years and older    54             72     78

LO 12.2 Notation
• We use the notation oij to denote the observed frequency in row i and column j.
• Similarly, eij is the expected frequency in row i and column j.
• Under the independence assumption, the expected frequency for each cell is:
  eij = (Row i total)(Column j total)/Sample Size

LO 12.2 The Chi-Square Statistic
We apply the chi-square test statistic in a similar manner as in the goodness-of-fit test. The formula is as follows:

$$\chi^2_{df} = \sum_{i} \sum_{j} \frac{(o_{ij} - e_{ij})^2}{e_{ij}},$$

where df = (rows – 1)(columns – 1).

LO 12.2 Computing Expected Frequencies

                      Brand Name                       Row
Age Group             Under Armour   Nike   Adidas    Totals
Under 35 years        174            132    90        396
35 years and up       54             72     78        204
Column Totals         228            204    168       600

• For row 1 and column 1, the expected frequency, e11, is (396)(228)/600 = 150.48.
• For row 1 and column 2, the expected frequency, e12, is (396)(204)/600 = 134.64.
• For e13, we calculate (396)(168)/600 = 110.88.
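
The row-total times column-total rule can be applied to every cell at once. The following sketch uses the observed counts from the table above; the list layout (rows are age groups, columns are brands) is an illustrative choice.

```python
observed = [
    [174, 132, 90],   # under 35 years: Under Armour, Nike, Adidas
    [54, 72, 78],     # 35 years and up
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# e_ij = (row i total)(column j total) / sample size
expected = [[r * c / n for c in col_totals] for r in row_totals]

for row in expected:
    print([round(e, 2) for e in row])
```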

LO 12.2 Expected Frequencies and Deviations

                      Brand Name                         Row
Age Group             Under Armour   Nike     Adidas     Totals
Under 35 years        150.48         134.64   110.88     396.00
35 years and up       77.52          69.36    57.12      204.00
Column Totals         228.00         204.00   168.00     600.00

The deviations (oij – eij) are:

                      Brand Name
Age Group             Under Armour   Nike     Adidas
Under 35 years        23.52          -2.64    -20.88
35 years and up       -23.52         2.64     20.88

LO 12.2 Squared Deviations
• We square each deviation and divide by the respective expected frequency. These values are shown in the following table.

                      Brand Name
Age Group             Under Armour   Nike   Adidas
Under 35 years        3.68           0.05   3.93
35 years and up       7.14           0.10   7.63

• The standardized, squared deviations sum to 22.53, the value of the test statistic.
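
Summing the six standardized squared deviations can be wrapped in a small function. This is a sketch only; the function name `chi_square_independence` is hypothetical, and it assumes a rectangular table of counts with no zero row or column totals.

```python
def chi_square_independence(table):
    """Return (test statistic, degrees of freedom) for a contingency table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n   # expected frequency under independence
            statistic += (o - e) ** 2 / e

    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df

observed = [
    [174, 132, 90],   # under 35 years: Under Armour, Nike, Adidas
    [54, 72, 78],     # 35 years and up
]
statistic, df = chi_square_independence(observed)
print(round(statistic, 2), df)
```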

LO 12.2 Summarizing the Example
Competing Hypotheses:
H0: Age and brand name are independent.
HA: Age and brand name are dependent.

The test statistic is calculated using:

$$\chi^2_{df} = \sum_{i} \sum_{j} \frac{(o_{ij} - e_{ij})^2}{e_{ij}},$$

where df = (r – 1)(c – 1) = (2 – 1)(3 – 1) = 2.

The critical value is 5.991 at the 5% significance level.

LO 12.2 Summarizing the Example
• We reject H0 because the value of the test statistic is larger than the critical value: 22.53 > 5.991. Therefore, age and brand name are not independent of one another.
• Alternatively, by selecting Formulas > Insert Function > CHISQ.DIST.RT and inputting X = 22.53 and Deg_freedom = 2, Excel will compute the p-value for our test, which is very close to 0.
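
Outside Excel, the same p-value can be checked by hand for this particular case. With exactly 2 degrees of freedom the chi-square upper-tail probability reduces to exp(-x/2); the sketch below relies on that special case and does not generalize to other df.

```python
import math

chi_square = 22.53   # test statistic from the example
df = 2               # (2 - 1)(3 - 1)

# For df = 2 only, P(X > x) = exp(-x / 2).
p_value = math.exp(-chi_square / 2)

# The same formula recovers alpha at the 5% critical value of 5.991.
alpha_at_critical = math.exp(-5.991 / 2)

print(f"p-value = {p_value:.2e}")
print(f"tail area beyond 5.991 = {alpha_at_critical:.4f}")
```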

12.3 Chi-Square Test for Normality
LO 12.3 Conduct a goodness-of-fit test for normality.

• The goodness-of-fit test can also be used to determine whether a population has a particular probability distribution. The expected frequencies are determined from this assumed distribution.
• These expected frequencies are then compared to the observed frequencies to compute the familiar chi-square test statistic.

LO 12.3 Testing for Normality

• The hypotheses for a test for normality:
  H0: The data follow a normal distribution with parameters µ and σ
  HA: The data do not follow this distribution
• The values of µ and σ are typically the point estimates calculated from the sample data.

LO 12.3 Example: A Sample of 50 Incomes
• Table 12.9 in the text shows 50 household incomes. The sample mean income is 63.80 (in $1,000s) with a standard deviation of 45.78.
• We next form k = 5 categories (up to 20, 20 to 40, etc.) and count how many households we observe with incomes in each category.
• For the expected frequency, we calculate the probability of an income falling in each category, assuming income follows our hypothesized distribution.

LO 12.3 Computing the Expected Counts
• There are 6 households in the first class of less than $20,000.
• If µ = 63.8 and σ = 45.78, we compute: P(X < 20) = P(Z < (20 – 63.8)/45.78) = P(Z < –0.96) = 0.1685.
• In this interval we expect 0.1685 × 50 = 8.43 households.
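
The expected count for the first class can be checked with the standard library's normal distribution. The sketch assumes, as a z-table lookup would, that z is rounded to two decimals; without rounding the result differs slightly in the second decimal place.

```python
from statistics import NormalDist

mu, sigma, n = 63.8, 45.78, 50   # sample estimates and sample size from the example
upper = 20.0                     # upper limit of the first class (in $1,000s)

z = (upper - mu) / sigma

# Probability using z rounded to two decimals, as in a printed z table.
p_table = NormalDist().cdf(round(z, 2))
# Probability using the unrounded z value.
p_exact = NormalDist(mu, sigma).cdf(upper)

expected_table = n * p_table
expected_exact = n * p_exact

print(f"z = {z:.4f}")
print(f"table lookup: p = {p_table:.4f}, expected = {expected_table:.2f}")
print(f"unrounded:    p = {p_exact:.4f}, expected = {expected_exact:.2f}")
```

The same call with the other class limits, taking differences of successive cumulative probabilities, would give the remaining expected counts.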

LO 12.3 Calculations for the Test

• df = k – 1 – 2, because two parameters of the normal distribution (µ and σ) are estimated from the sample.
• With k = 5, df = 2 and the critical value at the 5% significance level is 5.991.

LO 12.3 Concluding the Test
• Since the value of the test statistic, 8.12, exceeds our critical value of 5.991, we reject the null hypothesis.
• We conclude that these data do not come from a normal distribution with mean 63.8 and standard deviation 45.78.
• A criticism of this method is that we first have to convert raw data into a set of arbitrary classes.
• The result might be different if we had grouped the data differently.

12.4 The Jarque-Bera Test for Normality
LO 12.4 Perform the Jarque-Bera test for normality.

• An alternative to the goodness-of-fit test for normality is the test developed by Jarque and Bera.
• A normal distribution is not skewed, and its peakedness (kurtosis) takes a fixed, known value.
• The Jarque-Bera test uses these facts to derive a test statistic.

LO 12.4 Skewness and Kurtosis
• Skewness is a measure of a distribution’s lack of symmetry; the skewness coefficient is S = 0 for any normal distribution.
• Kurtosis is a measure of peakedness; the kurtosis coefficient as reported by Excel (excess kurtosis) is K = 0 for a normal distribution.
• We can obtain the values of S and K from Excel and use them to compute the appropriate test statistic.
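
For readers working outside Excel, the two coefficients can be computed directly. The sketch below implements the bias-adjusted sample formulas that Excel's SKEW and KURT functions use; the function names and the five-point sample are hypothetical and serve only to show the calculation. KURT requires at least four observations and SKEW at least three.

```python
import statistics

def excel_skew(data):
    """Sample skewness coefficient, matching the formula behind Excel's SKEW."""
    n = len(data)
    mean = statistics.mean(data)
    s = statistics.stdev(data)   # sample standard deviation (n - 1 denominator)
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in data)

def excel_kurt(data):
    """Sample excess kurtosis coefficient, matching the formula behind Excel's KURT."""
    n = len(data)
    mean = statistics.mean(data)
    s = statistics.stdev(data)
    fourth = sum(((x - mean) / s) ** 4 for x in data)
    return (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * fourth
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))

sample = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical, perfectly symmetric data
S = excel_skew(sample)
K = excel_kurt(sample)
print(round(S, 4), round(K, 4))
```

A symmetric sample gives S = 0, while the negative K here indicates a flatter shape than a normal distribution would have.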

LO 12.4 Hypotheses and Test Statistic
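
In its standard form, with S and K the skewness and excess-kurtosis coefficients and n the sample size, the Jarque-Bera hypotheses and test statistic are usually written as follows. The notation is assumed rather than taken from the original slide, and the chi-square distribution with 2 degrees of freedom holds approximately for large samples under H0.

```latex
H_0: S = 0 \text{ and } K = 0 \qquad H_A: S \neq 0 \text{ or } K \neq 0

JB = \frac{n}{6}\left[ S^2 + \frac{K^2}{4} \right] \sim \chi^2_{2}
```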

LO 12.4 Example 12.3
