
Practice Problems from Levine, Stephan, Krehbiel and Berenson, “Statistics for Managers,” 6th Ed., Prentice-Hall, 2011

BASIC DATA ANALYSIS:


Chapter 2: 2.32 a,c (use Excel), 2.99 a,b,d (not percentage polygon and
use Excel)
Chapter 3: 3.7, 3.10 (use Excel), 3.15 (use Excel), 3.17 (use Excel), 3.40,
3.45, 3.46 (use Excel)

PROBABILITY:
Chapter 4: 4.8, 4.9, 4.13, 4.21, 4.26, 4.28

RANDOM VARIABLES:
Chapter 5: 5.3, 5.4, 5.10, 5.11, 5.12, 5.13

NORMAL DISTRIBUTION:
Chapter 6: 6.5, 6.8, 6.10, 6.13a-c

SAMPLING:
Chapter 7: 7.15, 7.19, 7.20, 7.21, 7.22, 7.27, 7.30

CONFIDENCE INTERVALS:
Chapter 8: 8.9, 8.10, 8.16, 8.17, 8.28, 8.30, 8.33, 8.37, 8.38, 8.71, 8.73,
8.75

HYPOTHESIS TESTING:
Chapter 9: 9.9, 9.10, 9.11, 9.15, 9.22, 9.26, 9.46, 9.48, 9.57, 9.67, 9.68
Chapter 10: 10.7, 10.20 (use Excel), 10.30, 10.35, 10.59

REGRESSION ANALYSIS:
Chapter 13: 13.4 (use Excel), 13.9 (use Excel), 13.16, 13.21, 13.42, 13.47,
13.49, 13.50, 13.58 (use approximations),
13.74a-e, g-h (use Excel), 13.75 (use Excel)
Chapter 14: 14.3, 14.6 (use Excel), 14.10, 14.11, 14.12, 14.16 a-d, 14.25,
14.28, 14.41 a-c, e-i (use Excel), 14.49 a-c, e-h, i, k (use Excel)
Statistics for Managers
Using Microsoft Excel
SIXTH EDITION

David M. Levine
Department of Statistics and Computer Information Systems

Zicklin School of Business, Baruch College, City University of New York

David F. Stephan
Department of Statistics and Computer Information Systems

Zicklin School of Business, Baruch College, City University of New York

Timothy C. Krehbiel
Department of Management

Richard T. Farmer School of Business, Miami University

Mark L. Berenson
Department of Management and Information Systems

School of Business, Montclair State University

Prentice Hall
Boston Columbus Indianapolis New York San Francisco Upper Saddle River
Amsterdam Cape Town Dubai London Madrid Milan Munich Paris Montreal Toronto
Delhi Mexico City Sao Paulo Sydney Hong Kong Seoul Singapore Taipei Tokyo
2.4 Organizing Numerical Data 31

Problems for Section 2.4


LEARNING THE BASICS

2.27 Form an ordered array, given the following data from a sample of n = 7 midterm exam scores in accounting:

68 94 63 75 71 88 64

2.28 Form an ordered array, given the following data from a sample of n = 7 midterm exam scores in marketing:

88 78 78 73 91 78 85

2.29 The GMAT scores from a sample of 50 applicants to an MBA program indicate that none of the applicants scored below 450. A frequency distribution was formed by choosing class intervals 450 to 499, 500 to 549, and so on, with the last class having an interval from 700 to 749. Two applicants scored in the interval 450 to 499, and 16 applicants scored in the interval 500 to 549.
a. What percentage of applicants scored below 500?
b. What percentage of applicants scored between 500 and 549?
c. What percentage of applicants scored below 550?
d. What percentage of applicants scored below 750?

2.30 A set of data has values that vary from 11.6 to 97.8.
a. If these values are grouped into nine classes, indicate the class boundaries.
b. What class interval width did you choose?
c. What are the nine class midpoints?

APPLYING THE CONCEPTS

2.31 As player salaries have increased, the cost of attending baseball games has increased dramatically. The accompanying data file contains the following 2009 data concerning the total cost at each of the 30 Major League Baseball parks for four tickets, two beers, four soft drinks, four hot dogs, two game programs, two baseball caps, and parking for one vehicle:

164 326 224 180 205 162 141 170 411 187 185 165 151 166 114
158 305 145 161 170 210 222 146 259 220 135 215 172 223 216

Source: Data extracted from teammarketing.com, April 1, 2009.
a. Place the data into an ordered array.
b. Construct a frequency distribution and a percentage distribution.
c. Around which class grouping, if any, are the costs of attending a baseball game concentrated? Explain.

2.32 The accompanying data file contains the following data about the cost of electricity during July 2009 for a random sample of 50 one-bedroom apartments in a large city:

Raw Data on Utility Charges ($)
 96 171 202 178 147 102 153 197 127  82
157 185  90 116 172 111 148 213 130 165
141 149 206 175 123 128 144 168 109 167
 95 163 150 154 130 143 187 166 139 149
108 119 183 151 114 135 191 137 129 158

a. Construct a frequency distribution and a percentage distribution that have class intervals with the upper class boundaries $99, $119, and so on.
b. Construct a cumulative percentage distribution.
c. Around what amount does the monthly electricity cost seem to be concentrated?

2.33 One operation of a mill is to cut pieces of steel into parts that will later be used as the frame for front seats in an automobile. The steel is cut with a diamond saw and requires the resulting parts to be within ±0.005 inch of the length specified by the automobile company. Data are collected from a sample of 100 steel parts and stored in the accompanying data file. The measurement reported is the difference in inches between the actual length of the steel part, as measured by a laser measurement device, and the specified length of the steel part. For example, the first value, -0.002, represents a steel part that is 0.002 inch shorter than the specified length.
a. Construct a frequency distribution and a percentage distribution.
b. Construct a cumulative percentage distribution.
c. Is the steel mill doing a good job meeting the requirements set by the automobile company? Explain.

2.34 A manufacturing company produces steel housings for electrical equipment. The main component part of the housing is a steel trough that is made out of a 14-gauge steel coil. It is produced using a 250-ton progressive punch press with a wipe-down operation that puts two 90-degree forms in the flat steel to make the trough. The distance from one side of the form to the other is critical because of weatherproofing in outdoor applications. The company requires that the width of the trough be between 8.31 inches and 8.61 inches. Data are collected from a sample of 49 troughs and stored in the accompanying data file, which contains the widths of the troughs, in inches.

8.312 8.343 8.317 8.383 8.348 8.410 8.351 8.373
8.481 8.422 8.476 8.382 8.484 8.403 8.414 8.419
8.385 8.465 8.498 8.447 8.436 8.413 8.489 8.414
8.481 8.415 8.479 8.429 8.458 8.462 8.460 8.444
8.429 8.460 8.412 8.420 8.410 8.405 8.323 8.420
8.396 8.447 8.405 8.439 8.411 8.427 8.420 8.498
8.409
62 CHAPTER 2 Organizing and Visualizing Data

Data were collected from 600 customers and organized in the following contingency tables:

                        GENDER
DESSERT ORDERED    Male    Female    Total
Yes                  40        96      136
No                  240       224      464
Total               280       320      600

                        BEEF ENTREE
DESSERT ORDERED     Yes        No    Total
Yes                  71        65      136
No                  116       348      464
Total               187       413      600

a. For each of the two contingency tables, construct contingency tables of row percentages, column percentages, and total percentages.
b. Which type of percentage (row, column, or total) do you think is most informative for each gender? For beef entree? Explain.
c. What conclusions concerning the pattern of dessert ordering can the restaurant owner reach?

2.97 The following data represent the method for recording votes in the November 2006 election, broken down by percentage of counties in the United States, using each method and the number of counties using each method in 2000 and 2006.

Method                             Percentage of Counties Using Method in 2006 (%)
Electronic                         36.6
Hand-counted paper ballots          1.8
Lever                               2.0
Mixed                               3.0
Optically scanned paper ballots    56.2
Punch card                          0.4
Source: Data extracted from R. Wolf, "Paper-Trail Voting Gets Organized Opposition," USA Today, April 24, 2007, p. 2A.

                                   Number of Counties
Method                              2000     2006
Electronic                           309    1,142
Hand-counted paper ballots           370       57
Lever                                434       62
Mixed                                149       92
Optically scanned paper ballots    1,279    1,752
Punch card                           572       13
Source: Data extracted from R. Wolf, "Paper-Trail Voting Gets Organized Opposition," USA Today, April 24, 2007, p. 2A.

a. Construct a pie chart and a Pareto chart for the percentage of counties using the various methods.
b. What conclusions can you reach concerning the type of voting method used in November 2006?
c. What differences are there between the methods used in 2000 and 2006?

2.98 In summer 2000, a growing number of warranty claims on Firestone tires sold on Ford SUVs prompted Firestone and Ford to issue a major recall. An analysis of warranty claims data helped identify which models to recall. A breakdown of 2,504 warranty claims based on tire size is given in the following table:

Tire Size     Number of Warranty Claims
23575R15      2,030
311050R15       137
30950R15         82
23570R16         81
331250R15        58
25570R16         54
Others           62
Source: Data extracted from Robert L. Simison, "Ford Steps Up Recall Without Firestone," The Wall Street Journal, August 14, 2000, p. A3.

The 2,030 warranty claims for the 23575R15 tires can be categorized into ATX models and Wilderness models. The type of incident leading to a warranty claim, by model type, is summarized in the following table:

Incident Type       ATX Model Warranty Claims    Wilderness Warranty Claims
Tread separation    1,365                         59
Blowout                77                         41
Other/unknown         422                         66
Total               1,864                        166
Source: Data extracted from Robert L. Simison, "Ford Steps Up Recall Without Firestone," The Wall Street Journal, August 14, 2000, p. A3.

a. Construct a Pareto chart for the number of warranty claims by tire size. What tire size accounts for most of the claims?
b. Construct a pie chart to display the percentage of the total number of warranty claims for the 23575R15 tires that come from the ATX model and Wilderness model. Interpret the chart.
c. Construct a Pareto chart for the type of incident causing the warranty claim for the ATX model. Does a certain type of incident account for most of the claims?
d. Construct a Pareto chart for the type of incident causing the warranty claim for the Wilderness model. Does a certain type of incident account for most of the claims?

2.99 One of the major measures of the quality of service provided by an organization is the speed with which the
Chapter Review Problems 63

organization responds to customer complaints. A large family-held department store selling furniture and flooring, including carpet, had undergone a major expansion in the past several years. In particular, the flooring department had expanded from 2 installation crews to an installation supervisor, a measurer, and 15 installation crews. A business objective of the company was to reduce the time between when the complaint is received and when it is resolved. During a recent year, the company received 50 complaints concerning carpet installation. The data from the 50 complaints, organized in the accompanying data file, represent the number of days between the receipt of the complaint and the resolution of the complaint:

54  5  35 137  31 27 152  2 123 81 74 27
11 19 126 110 110 29  61 35  94 31 26  5
12  4 165  32  29 28  29 26  25  1 14 13
13 10   5  27   4 52  30 22  36 26 20 23
33 68

a. Construct a frequency distribution and a percentage distribution.
b. Construct a histogram and a percentage polygon.
c. Construct a cumulative percentage distribution and plot a cumulative percentage polygon (ogive).
d. On the basis of the results of (a) through (c), if you had to tell the president of the company how long a customer should expect to wait to have a complaint resolved, what would you say? Explain.

2.100 Data concerning 128 of the best-selling domestic beers in the United States are contained in the accompanying data file. The values for three variables are included: percentage alcohol, number of calories per 12 ounces, and number of carbohydrates (in grams) per 12 ounces.
Source: Data extracted from www.Beer100.com, June 15, 2009.
a. Construct a percentage histogram for each of the three variables.
b. Construct three scatter plots: percentage alcohol versus calories, percentage alcohol versus carbohydrates, and calories versus carbohydrates.
c. Discuss what you learn from studying the graphs in (a) and (b).

2.101 The accompanying data file contains the state cigarette tax, in dollars, for each state as of April 1, 2009.
a. Develop an ordered array.
b. Plot a percentage histogram.
c. What conclusions can you reach about the differences in the state cigarette tax between the states?

2.102 The accompanying data file contains the yields for a money market account, a one-year certificate of deposit (CD), and a five-year CD, for 23 banks in the metropolitan New York area, as of May 28, 2009.
Source: Data extracted from www.Bankrate.com, May 28, 2009.
a. Construct a stem-and-leaf display for each of the three variables.
b. Construct three scatter plots: money market account versus one-year CD, money market account versus five-year CD, and one-year CD versus five-year CD.
c. Discuss what you learn from studying the graphs in (a) and (b).

2.103 The accompanying data file includes the total compensation (in $) of CEOs of large public companies in 2008.
Source: Data extracted from D. Jones and B. Hansen, "CEO Pay Dives in a Rough 2008," www.usatoday.com, May 1, 2009.
a. Construct a frequency distribution and a percentage distribution.
b. Construct a histogram and a percentage polygon.
c. Construct a cumulative percentage distribution and plot a cumulative percentage polygon (ogive).
d. Based on (a) through (c), what conclusions can you reach concerning CEO compensation in 2008?

2.104 Studies conducted by a manufacturer of "Boston" and "Vermont" asphalt shingles have shown product weight to be a major factor in customers' perception of quality. Moreover, the weight represents the amount of raw materials being used and is therefore very important to the company from a cost standpoint. The last stage of the assembly line packages the shingles before the packages are placed on wooden pallets. The variable of interest is the weight in pounds of the pallet, which for most brands holds 16 squares of shingles. The company expects pallets of its "Boston" brand-name shingles to weigh at least 3,050 pounds but less than 3,260 pounds. For the company's "Vermont" brand-name shingles, pallets should weigh at least 3,600 pounds but less than 3,800. Data are collected from a sample of 368 pallets of "Boston" shingles and 330 pallets of "Vermont" shingles and stored in the accompanying data file.
a. For the "Boston" shingles, construct a frequency distribution and a percentage distribution having eight class intervals, using 3,015, 3,050, 3,085, 3,120, 3,155, 3,190, 3,225, 3,260, and 3,295 as the class boundaries.
b. For the "Vermont" shingles, construct a frequency distribution and a percentage distribution having seven class intervals, using 3,550, 3,600, 3,650, 3,700, 3,750, 3,800, 3,850, and 3,900 as the class boundaries.
c. Construct percentage histograms for the "Boston" shingles and for the "Vermont" shingles.
d. Comment on the distribution of pallet weights for the "Boston" and "Vermont" shingles. Be sure to identify the percentage of pallets that are underweight and overweight.

2.105 The accompanying data file includes the overall cost index, the monthly rent for a two-bedroom apartment, the cost of a cup of coffee with service, the cost of a fast-food hamburger meal, the cost of dry-cleaning a men's blazer, the cost of toothpaste, and the cost of movie tickets in 10 different cities.
100 CHAPTER 3 Numerical Descriptive Measures

Problems for Sections 3.1 and 3.2


LEARNING THE BASICS

3.1 The following is a set of data from a sample of n = 5:

7 4 9 8 2

a. Compute the mean, median, and mode.
b. Compute the range, variance, standard deviation, and coefficient of variation.
c. Compute the Z scores. Are there any outliers?
d. Describe the shape of the data set.

3.2 The following is a set of data from a sample of n = 6:

7 4 9 7 3 12

a. Compute the mean, median, and mode.
b. Compute the range, variance, standard deviation, and coefficient of variation.
c. Compute the Z scores. Are there any outliers?
d. Describe the shape of the data set.

3.3 The following set of data is from a sample of n = 7:

12 7 4 9 0 7 3

a. Compute the mean, median, and mode.
b. Compute the range, variance, standard deviation, and coefficient of variation.
c. Compute the Z scores. Are there any outliers?
d. Describe the shape of the data set.

3.4 The following is a set of data from a sample of n = 5:

7 -5 -8 7 9

a. Compute the mean, median, and mode.
b. Compute the range, variance, standard deviation, and coefficient of variation.
c. Compute the Z scores. Are there any outliers?
d. Describe the shape of the data set.

3.5 Suppose that the rate of return for a particular stock during the past two years was 10% and 30%. Compute the geometric rate of return. (Note: A rate of return of 10% is recorded as 0.10, and a rate of return of 30% is recorded as 0.30.)

3.6 Suppose that the rate of return for a particular stock during the past two years was 20% and -30%. Compute the geometric rate of return.

APPLYING THE CONCEPTS

3.7 A business school reported its findings from a study of recent graduates. A sample of n = 10 finance majors had a mean starting salary of $45,000, a median starting salary of $45,000, and a standard deviation of $10,000. A sample of n = 10 information systems majors had a mean starting salary of $56,000, a median of $45,000, and a standard deviation of $37,000. Discuss the central tendency, variation, and shape of starting salaries for the two majors.

3.8 The operations manager of a plant that manufactures tires wants to compare the actual inner diameters of two grades of tires, each of which is expected to be 575 millimeters. A sample of five tires of each grade was selected, and the results representing the inner diameters of the tires, ranked from smallest to largest, are as follows:

Grade X    568 570 575 578 584
Grade Y    573 574 575 577 578

a. For each of the two grades of tires, compute the mean, median, and standard deviation.
b. Which grade of tire is providing better quality? Explain.
c. What would be the effect on your answers in (a) and (b) if the last value for grade Y were 588 instead of 578? Explain.

3.9 According to the U.S. Census Bureau, in November 2008 the median sales price of new houses was $220,400, and the mean sales price was $287,500 (extracted from www.census.gov, January 21, 2009).
a. Interpret the median sales price.
b. Interpret the mean sales price.
c. Discuss the shape of the distribution of the price of new houses.

3.10 The accompanying data file contains the prices for two tickets, with online service charges, large popcorn, and two medium soft drinks, at a sample of six theater chains:

$36.15 $31.00 $35.05 $40.25 $33.75 $43.00

Source: Data extracted from K. Kelly, "The Multiplex Under Siege," The Wall Street Journal, December 24-25, 2005, pp. P1, P5.
a. Compute the mean and median.
b. Compute the variance, standard deviation, range, and coefficient of variation.
c. Are the data skewed? If so, how?
d. Based on the results of (a) through (c), what conclusions can you reach concerning the cost of going to the movies?

3.11 The accompanying data file contains the overall miles per gallon (MPG) of 2009 sedans priced under $20,000.

27 31 30 28 27 24 29 32
32 27 26 26 25 26 25 24

Source: Data extracted from "Vehicle Ratings," Consumer Reports, April 2009, p. 27.
a. Compute the mean, median, and mode.
b. Compute the variance, standard deviation, range, coefficient of variation, and Z scores.
c. Are the data skewed? If so, how?
d. Compare the results of (a) through (c) to those of Problem 3.12 (a) through (c) that refer to the miles per gallon of SUVs priced under $30,000.
3.2 Variation and Shape 101

3.12 The accompanying data file contains the overall miles per gallon (MPG) of 2009 small SUVs priced under $30,000.

24 23 22 21 22 22 18 19 19 19 21 21
21 18 19 21 17 22 18 18 22 16 16

Source: Data extracted from "Vehicle Ratings," Consumer Reports, April 2009, pp. 33-34.
a. Compute the mean, median, and mode.
b. Compute the variance, standard deviation, range, coefficient of variation, and Z scores.
c. Are the data skewed? If so, how?
d. Compare the results of (a) through (c) to those of Problem 3.11 (a) through (c) that refer to the miles per gallon of sedans priced under $20,000.

3.13 The accompanying data file contains the cost (in cents) per 1-ounce serving for a sample of 13 chocolate chip cookies. The data are as follows:

54 22 25 23 36 43 7 43 25 47 24 45 44

Source: Data extracted from "Chip, Chip, Hooray," Consumer Reports, June 2009, p. 7.
a. Compute the mean, median, and mode.
b. Compute the variance, standard deviation, range, coefficient of variation, and Z scores. Are there any outliers? Explain.
c. Are the data skewed? If so, how?
d. Based on the results of (a) through (c), what conclusions can you reach concerning the cost of chocolate chip cookies?

3.14 The accompanying data file contains the cost per ounce ($) for a sample of 14 dark chocolate bars.

0.68 0.72 0.92 1.14 1.42 0.94 0.77
0.57 1.51 0.57 0.55 0.86 1.41 0.90

Source: Data extracted from "Dark Chocolate: Which Bars Are Best?" Consumer Reports, September 2007, p. 8.
a. Compute the mean, median, and mode.
b. Compute the variance, standard deviation, range, coefficient of variation, and Z scores. Are there any outliers? Explain.
c. Are the data skewed? If so, how?
d. Based on the results of (a) through (c), what conclusions can you reach concerning the cost of dark chocolate bars?

3.15 Is there a difference in the variation of the yields of different types of investments? The accompanying data file contains the nationwide highest yields of money market accounts and five-year CDs as of May 17, 2009:

Money Market    Five-Year CD
2.25            3.70
2.20            3.66
2.12            3.65
2.03            3.50
2.02            3.50

Source: Data extracted from www.Bankrate.com, May 17, 2009.
a. For money market accounts and five-year CDs, separately compute the variance, standard deviation, range, and coefficient of variation.
b. Based on the results of (a), do money market accounts or five-year CDs have more variation in the highest yields offered? Explain.

3.16 The accompanying data file contains the starting admission price (in $) for one-day tickets to 10 theme parks in the United States:

58 63 41 42 29 50 62 43 40 40

Source: Data extracted from C. Jackson and E. Gamerman, "Rethinking the Thrill Factor," The Wall Street Journal, April 15-16, 2006, pp. P1, P4.
a. Compute the mean, median, and mode.
b. Compute the range, variance, and standard deviation.
c. Based on the results of (a) and (b), what conclusions can you reach concerning the starting admission price for one-day tickets?
d. Suppose that the first value was 98 instead of 58. Repeat (a) through (c), using this value. Comment on the difference in the results.

3.17 A bank branch located in a commercial district of a city has the business objective of developing an improved process for serving customers during the noon-to-1:00 P.M. lunch period. The waiting time, in minutes, is defined as the time the customer enters the line to when he or she reaches the teller window. Data are collected from a sample of 15 customers during this hour. The accompanying data file contains the results, which are listed below:

4.21 5.55 3.02 5.13 4.77 2.34 3.54 3.20
4.50 6.10 0.38 5.12 6.46 6.19 3.79

a. Compute the mean and median.
b. Compute the variance, standard deviation, range, coefficient of variation, and Z scores. Are there any outliers? Explain.
c. Are the data skewed? If so, how?
d. As a customer walks into the branch office during the lunch hour, she asks the branch manager how long she can expect to wait. The branch manager replies, "Almost certainly less than five minutes." On the basis of the results of (a) through (c), evaluate the accuracy of this statement.

3.18 Suppose that another bank branch, located in a residential area, is also concerned with the noon-to-1 P.M. lunch hour. The waiting time, in minutes, collected from a sample of 15 customers during this hour, is contained in the accompanying data file and listed below:

9.66 5.90 8.02 5.79 8.73 3.82 8.01 8.35
10.49 6.68 5.64 4.08 6.17 9.91 5.47

a. Compute the mean and median.
b. Compute the variance, standard deviation, range, coefficient of variation, and Z scores. Are there any outliers? Explain.
3.4 Numerical Descriptive Measures for a Population 113

EXAMPLE 3.16 Using the Chebyshev Rule

As in Example 3.15, a population of 12-ounce cans of cola is known to have a mean fill-weight of 12.06 ounces and a standard deviation of 0.02. However, the shape of the population is unknown, and you cannot assume that it is bell-shaped. Describe the distribution of fill-weights. Is it very likely that a can will contain less than 12 ounces of cola?

SOLUTION
μ ± σ = 12.06 ± 0.02 = (12.04, 12.08)
μ ± 2σ = 12.06 ± 2(0.02) = (12.02, 12.10)
μ ± 3σ = 12.06 ± 3(0.02) = (12.00, 12.12)

Because the distribution may be skewed, you cannot use the empirical rule. Using the Chebyshev rule, you cannot say anything about the percentage of cans containing between 12.04 and 12.08 ounces. You can state that at least 75% of the cans will contain between 12.02 and 12.10 ounces and at least 88.89% will contain between 12.00 and 12.12 ounces. Therefore, between 0 and 11.11% of the cans will contain less than 12 ounces.
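
The arithmetic in Example 3.16 can be checked with a short Python sketch. It assumes the usual statement of the Chebyshev rule, that at least (1 - 1/k²) × 100% of the values lie within k standard deviations of the mean for k > 1. The function names chebyshev_min_fraction and interval are illustrative only and do not come from the textbook.

```python
def chebyshev_min_fraction(k):
    """Lower bound on the fraction of values within k standard deviations.

    For k <= 1 the rule gives no information, so the bound is 0.
    """
    if k <= 1:
        return 0.0
    return 1.0 - 1.0 / (k * k)


def interval(mean, std_dev, k):
    """Return (mean - k*std_dev, mean + k*std_dev), rounded to 2 decimals."""
    return (round(mean - k * std_dev, 2), round(mean + k * std_dev, 2))


# Population parameters taken from Example 3.16 (cola fill-weights, in ounces).
mu, sigma = 12.06, 0.02

for k in (1, 2, 3):
    low, high = interval(mu, sigma, k)
    bound = chebyshev_min_fraction(k) * 100
    print(f"k={k}: ({low:.2f}, {high:.2f}) holds at least {bound:.2f}% of cans")
```

For k = 1 the bound is 0%, which matches the statement that nothing can be said about the interval from 12.04 to 12.08 ounces.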

You can use these two rules to understand how data are distributed around the mean when you
have sample data. With each rule, you use the value you calculated for X̄ in place of μ and the
value you calculated for S in place of σ. The results you compute using the sample statistics are
approximations because you used sample statistics (X̄, S) and not population parameters (μ, σ).

Problems for Section 3.4


LEARNING THE BASICS

3.37 The following is a set of data for a population with N = 10:

7 5 11 8 3 6 2 1 9 8

a. Compute the population mean.
b. Compute the population standard deviation.

3.38 The following is a set of data for a population with N = 10:

7 5 6 6 6 4 6 9 3

a. Compute the population mean.
b. Compute the population standard deviation.

APPLYING THE CONCEPTS

3.39 The accompanying data file contains the quarterly sales tax receipts (in thousands of dollars) submitted to the comptroller of the Village of Fair Lake for the period ending March 2009 by all 50 business establishments in that locale:

10.3 11.1  9.6  9.0 14.5
13.0  6.7 11.0  8.4 10.3
13.0 11.2  7.3  5.3 12.5
 8.0 11.8  8.7 10.6  9.5
11.1 10.2 11.1  9.9  9.8
11.6 15.1 12.5  6.5  7.5
10.0 12.9  9.2 10.0 12.8
12.5  9.3 10.4 12.7 10.5
 9.3 11.5 10.7 11.6  7.8
10.5  7.6 10.1  8.9  8.6

a. Compute the mean, variance, and standard deviation for this population.
b. What percentage of these businesses have quarterly sales tax receipts within ±1, ±2, or ±3 standard deviations of the mean?
c. Compare your findings with what would be expected on the basis of the empirical rule. Are you surprised at the results in (b)?

3.40 Consider a population of 1,024 mutual funds that primarily invest in large companies. You have determined that μ, the mean one-year total percentage return achieved by all the funds, is 8.20 and that σ, the standard deviation, is 2.75.
a. According to the empirical rule, what percentage of these funds are expected to be within ±1 standard deviation of the mean?
b. According to the empirical rule, what percentage of these funds are expected to be within ±2 standard deviations of the mean?
c. According to the Chebyshev rule, what percentage of these funds are expected to be within ±1, ±2, or ±3 standard deviations of the mean?
d. According to the Chebyshev rule, at least 93.75% of these funds are expected to have one-year total returns between what two amounts?

3.41 The accompanying data file contains the state cigarette tax, in dollars, for each of the 50 states as of April 1, 2009.
a. Compute the population mean and population standard deviation for the state cigarette tax.
b. Interpret the parameters in (a).
3.5 The Covariance and the Coefficient of Correlation 119

The value and revenues of the NBA teams are very highly correlated. The teams with the
lowest revenues have the lowest values. The teams with the highest revenues have the high-
est values. This relationship is very strong, as indicated by the coefficient of correlation,
r = 0.9848.
Although in general you cannot assume that just because two variables are correlated that
changes in one variable caused changes in the other variable, for this example, it makes sense
to conclude that changes in revenue will cause changes in the value of a team.

In summary, the coefficient of correlation indicates the linear relationship, or association,
between two numerical variables. When the coefficient of correlation gets closer to
+1 or -1, the linear relationship between the two variables is stronger. When the coeffi-
cient of correlation is near 0, little or no linear relationship exists. The sign of the coef-
ficient of correlation indicates whether the data are positively correlated (i.e., the larger
values of X are typically paired with the larger values of Y) or negatively correlated (i.e.,
the larger values of X are typically paired with the smaller values of Y). The existence of a
strong correlation does not imply a causation effect. It only indicates the tendencies pres-
ent in the data.
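
The summary above can be illustrated with a minimal Python sketch of the sample covariance and the coefficient of correlation, assuming the standard sample definitions (covariance divided by n - 1, and r equal to the covariance divided by the product of the two sample standard deviations). The data values and the function names are hypothetical and chosen only so that the result is easy to verify by hand; they are not taken from the textbook.

```python
import math


def sample_covariance(x, y):
    """Sample covariance: sum of products of deviations, divided by n - 1."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)


def sample_std(values):
    """Sample standard deviation (the covariance of a variable with itself, rooted)."""
    return math.sqrt(sample_covariance(values, values))


def correlation(x, y):
    """Coefficient of correlation r = cov(X, Y) / (S_X * S_Y)."""
    return sample_covariance(x, y) / (sample_std(x) * sample_std(y))


# Hypothetical data: y_up rises with x, y_down falls as x rises.
x = [1, 2, 3, 4, 5]
y_up = [2, 4, 6, 8, 10]
y_down = [10, 8, 6, 4, 2]

print("covariance (x, y_up):  ", sample_covariance(x, y_up))
print("correlation (x, y_up): ", correlation(x, y_up))
print("correlation (x, y_down):", correlation(x, y_down))
```

A perfectly increasing linear pairing gives r close to +1 and a perfectly decreasing one gives r close to -1; the covariance alone depends on the units of X and Y, which is why r is easier to interpret.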

Problems for Section 3.5


LEARNING THE BASICS

3.44 The following is a set of data from a sample of n = 11 items:

X 1 5 8 10 12 4 9 15 18
Y 21 15 24 1! 30 36 12 27 45 54

a. Compute the covariance.
b. Compute the coefficient of correlation.
c. How strong is the relationship between X and Y? Explain.

APPLYING THE CONCEPTS

3.45 A study of 218 students at Ohio State University suggests a link between time spent on the social networking site Facebook and grade point average. Students who rarely or never used Facebook had higher grade point averages than the students who use Facebook.
Source: Data extracted from M. B. Marklein, "Facebook Use Linked to Less Textbook Time," www.usatoday.com, April 14, 2009.
a. Does the study suggest that time spent on Facebook and grade point average are positively correlated or negatively correlated?
b. Do you think that there might be a cause-and-effect relationship between time spent on Facebook and grade point average? Explain.

3.46 The accompanying data file contains the calories and fat, in grams, of 16-ounce iced coffee drinks at Dunkin' Donuts and Starbucks:

Product                                                                  Calories    Fat
Dunkin' Donuts Iced Mocha Swirl latte (whole milk)                       240          8.0
Starbucks Coffee Frappuccino blended coffee                              260          3.5
Dunkin' Donuts Coffee Coolatta (cream)                                   350         22.0
Starbucks Iced Coffee Mocha Espresso (whole milk and whipped cream)      350         20.0
Starbucks Mocha Frappuccino blended coffee (whipped cream)               420         16.0
Starbucks Chocolate Brownie Frappuccino blended coffee (whipped cream)   510         22.0
Starbucks Chocolate Frappuccino blended creme (whipped cream)            530         19.0
Source: Data extracted from "Coffee as Candy at Dunkin' Donuts and Starbucks," Consumer Reports, June 2004, p. 9.

a. Compute the covariance.
b. Compute the coefficient of correlation.
c. Which do you think is more valuable in expressing the relationship between calories and fat: the covariance or the coefficient of correlation? Explain.
d. Based on (a) and (b), what conclusions can you reach about the relationship between calories and fat?

3.47 There are several methods for calculating fuel efficiency. The following table (stored in the accompanying data file) indicates mileage (in miles per gallon), as calculated by owners and by current government standards:
Problems for Section 4.1
LEARNING THE BASICS
4.1 Two coins are tossed.
a. Give an example of a simple event.
b. Give an example of a joint event.
c. What is the complement of a head on the first toss?

4.2 An urn contains 12 red balls and 8 white balls. One ball is to be selected from the urn.
a. Give an example of a simple event.
b. What is the complement of a red ball?

4.3 Given the following contingency table:

        B     B'
A      10     20
A'     20     40

What is the probability of
a. event A?
b. event A'?
c. event A and B?
d. event A or B?

4.4 Given the following contingency table:

        B     B'
A      10     30
A'     25     35

What is the probability of
a. event A'?
b. event A and B?
c. event A' and B'?
d. event A' or B'?

APPLYING THE CONCEPTS
4.5 For each of the following, indicate whether the type of probability involved is an example of a priori probability, empirical probability, or subjective probability.
a. The next toss of a fair coin will land on heads.
b. Italy will win soccer's World Cup the next time the competition is held.
c. The sum of the faces of two dice will be seven.
d. The train taking a commuter to work will be more than 10 minutes late.

4.6 For each of the following, state whether the events created are mutually exclusive and collectively exhaustive. If they are not mutually exclusive and collectively exhaustive, either reword the categories to make them mutually exclusive and collectively exhaustive or explain why that would not be useful.
a. Registered voters in the United States were asked whether they are registered as Republicans or Democrats.
b. Each respondent was classified by the origin of the car he or she drives: American, European, Japanese, or none.
c. People were asked, "Do you currently live in (i) an apartment or (ii) a house?"
d. A product was classified as defective or not defective.

4.7 Which of the following events occur with a probability of zero? For each, state why or why not.
a. A voter in the United States is registered as a Republican and as a Democrat.
b. A voter in the United States is female and registered as a Republican.
c. An automobile is a Ford and a Toyota.
d. An automobile is a Toyota and was manufactured in the United States.

4.8 According to an Ipsos poll, the perception of unfairness in the U.S. tax code is spread fairly evenly across income groups, age groups, and education levels. In an April 2006 survey of 1,005 adults, Ipsos reported that almost 60% of all people said the code is unfair, whereas slightly more than 60% of those making more than $50,000 viewed the code as unfair ("People Cry Unfairness," The Cincinnati Enquirer, April 16, 2006, p. A8). Suppose that the following contingency table represents the specific breakdown of responses:

                          INCOME LEVEL
U.S. TAX CODE    Less Than $50,000    More Than $50,000    Total
Fair                    225                  180              405
Unfair                  280                  320              600
Total                   505                  500            1,005

a. Give an example of a simple event.
b. Give an example of a joint event.
c. What is the complement of "tax code is fair"?
d. Why is "tax code is fair and makes less than $50,000" a joint event?

4.9 Referring to the contingency table in Problem 4.8, if a respondent is selected at random, what is the probability that he or she
a. thinks the tax code is unfair?
b. thinks the tax code is unfair and makes less than $50,000?
c. thinks the tax code is unfair or makes less than $50,000?
d. Explain the difference in the results in (b) and (c).
4.1 Basic Probability Concepts 141

4.10 Do people of different age groups differ in their response to e-mail messages? A survey by the Center for the Digital Future of the University of Southern California (data extracted from A. Mindlin, "Older E-mail Users Favor Fast Replies," The New York Times, July 14, 2008, p. B3) reported that 70.7% of users over 70 years of age believe that e-mail messages should be answered quickly, as compared to 53.6% of users 12 to 50 years old. Suppose that the survey was based on 1,000 users over 70 years of age and 1,000 users 12 to 50 years old. The following table summarizes the results:

                     AGE OF RESPONDENTS
ANSWERS QUICKLY      12-50    Over 70    Total
Yes                    536        707    1,243
No                     464        293      757
Total                1,000      1,000    2,000

a. Give an example of a simple event.
b. Give an example of a joint event.
c. What is the complement of a respondent who answers quickly?
d. Why is a respondent who answers quickly and is over 70 years old a joint event?

4.11 Referring to the contingency table in Problem 4.10, if a respondent is selected at random, what is the probability that
a. he or she answers quickly?
b. is over 70 years old?
c. he or she answers quickly or is over 70 years old?
d. Explain the difference in the results in (b) and (c).

4.12 According to a Gallup Poll, the extent to which employees are engaged with their workplace varies from country to country. Gallup reports that the percentage of U.S. workers engaged with their workplace is more than twice as high as the percentage of German workers. The study also shows that having more engaged workers leads to increased innovation, productivity, and profitability, as well as reduced employee turnover. The results of the poll are summarized in the following table:

                        COUNTRY
ENGAGEMENT       United States    Germany    Total
Engaged                    550        246      796
Not engaged              1,345      1,649    2,994
Total                    1,895      1,895    3,790

Source: Data extracted from M. Nink, "Employee Disengagement Plagues Germany," Gallup Management Journal, gmj.gallup.com, April 9, 2009.

If an employee is selected at random, what is the probability that he or she
a. is engaged with his or her workplace?
b. is a U.S. worker?
c. is engaged with his or her workplace or is a U.S. worker?
d. Explain the difference in the results in (b) and (c).

4.13 Where people turn for news is different for various age groups. A study conducted on this issue (data extracted from P. Johnson, "Young People Turn to the Web for News," USA Today, March 23, 2006, p. 9D) was based on 200 respondents who were between ages 36 and 50 and 200 respondents who were over age 50. Of the 200 respondents who were between ages 36 and 50, 82 got their news primarily from newspapers. Of the 200 respondents who were over age 50, 104 got their news primarily from newspapers. Construct a contingency table to evaluate the probabilities. If a respondent is selected at random, what is the probability that he or she
a. got news primarily from newspapers?
b. got news primarily from newspapers and is over 50 years old?
c. got news primarily from newspapers or is over 50 years old?
d. Explain the difference in the results in (b) and (c).

4.14 A sample of 500 respondents in a large metropolitan area was selected to study consumer behavior. Among the questions asked was "Do you enjoy shopping for clothing?" Of 240 males, 136 answered yes. Of 260 females, 224 answered yes. Construct a contingency table to evaluate the probabilities. What is the probability that a respondent chosen at random
a. enjoys shopping for clothing?
b. is a female and enjoys shopping for clothing?
c. is a female or enjoys shopping for clothing?
d. is a male or a female?

4.15 Each year, ratings are compiled concerning the performance of new cars during the first 90 days of use. Suppose that the cars have been categorized according to whether a car needs warranty-related repair (yes or no) and the country in which the company manufacturing a car is based (United States or not United States). Based on the data collected, the probability that the new car needs a warranty repair is 0.04, the probability that the car was manufactured by a U.S.-based company is 0.60, and the probability that the new car needs a warranty repair and was manufactured by a U.S.-based company is 0.025. Construct a contingency table to evaluate the probabilities of a warranty-related repair. What is the probability that a new car selected at random
a. needs a warranty repair?
b. needs a warranty repair and was manufactured by a U.S.-based company?
c. needs a warranty repair or was manufactured by a U.S.-based company?
d. needs a warranty repair or was not manufactured by a U.S.-based company?
4.2 Conditional Probability 147

To illustrate Equation (4.8), refer to Table 4.1 on page 136. Let

P(A) = probability of "planned to purchase"
P(B_1) = probability of "actually purchased"
P(B_2) = probability of "did not actually purchase"

Then, using Equation (4.8), the probability of planned to purchase is

$$
\begin{aligned}
P(A) &= P(A \mid B_1)P(B_1) + P(A \mid B_2)P(B_2) \\
     &= \left(\frac{200}{300}\right)\left(\frac{300}{1{,}000}\right) + \left(\frac{50}{700}\right)\left(\frac{700}{1{,}000}\right) \\
     &= \frac{200}{1{,}000} + \frac{50}{1{,}000} = \frac{250}{1{,}000} = 0.25
\end{aligned}
$$
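
The arithmetic in this illustration can be checked with a short script. The sketch below is a minimal example and not the book's Excel procedure: the function name `total_probability` is hypothetical, and the conditional probabilities 200/300 and 50/700 assume that Table 4.1 (not reproduced here) reports 300 households that actually purchased and 700 that did not.

```python
from fractions import Fraction


def total_probability(conditionals, priors):
    """Return P(A) = sum of P(A | B_i) * P(B_i) over the events B_1..B_k.

    The B_i must be mutually exclusive and collectively exhaustive,
    so their probabilities have to sum to 1.
    """
    if sum(priors) != 1:
        raise ValueError("P(B_i) values must sum to 1")
    return sum(c * p for c, p in zip(conditionals, priors))


# Assumed counts: 300 of 1,000 households purchased, 700 did not.
p_b = [Fraction(300, 1000), Fraction(700, 1000)]
# Assumed counts: 200 of the 300 purchasers had planned to purchase,
# and 50 of the 700 nonpurchasers had planned to purchase.
p_a_given_b = [Fraction(200, 300), Fraction(50, 700)]

p_a = total_probability(p_a_given_b, p_b)
print(float(p_a))  # 0.25
```

Exact fractions are used here only so that the check that the P(B_i) sum to 1 is not affected by floating-point rounding.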

Problems for Section 4.2

LEARNING THE BASICS
4.16 Given the following contingency table:

        B     B'
A      10     20
A'     20     40

What is the probability of
a. A | B?
b. A | B'?
c. A' | B'?
d. Are events A and B independent?

4.17 Given the following contingency table:

        B     B'
A      10     30
A'     25     35

What is the probability of
a. A | B?
b. A' | B'?
c. A | B'?
d. Are events A and B independent?

4.18 If P(A and B) = 0.4 and P(B) = 0.8, find P(A | B).

4.19 If P(A) = 0.7, P(B) = 0.6, and A and B are independent, find P(A and B).

4.20 If P(A) = 0.3, P(B) = 0.4, and P(A and B) = 0.2, are A and B independent?

APPLYING THE CONCEPTS
4.21 Where people turn for news is different for various age groups. Suppose that a study conducted on this issue (data extracted from P. Johnson, "Young People Turn to the Web for News," USA Today, March 23, 2006, p. 9D) was based on 200 respondents who were between ages 36 and 50 and 200 respondents who were over age 50. Of the 200 respondents who were between ages 36 and 50, 82 got their news primarily from newspapers. Of the 200 respondents who were over age 50, 104 got their news primarily from newspapers.
a. Given that a respondent is over age 50, what is the probability that he or she gets news primarily from newspapers?
b. Given that a respondent gets news primarily from newspapers, what is the probability that he or she is over age 50?
c. Explain the difference in the results in (a) and (b).
d. Are the two events whether the respondent is over age 50 and whether he or she gets news primarily from newspapers independent?

4.22 Do people of different age groups differ in their response to e-mail messages? A survey by the Center for the Digital Future of the University of Southern California (data extracted from A. Mindlin, "Older E-mail Users Favor Fast Replies," The New York Times, July 14, 2008, p. B3) reported that 70.7% of users over 70 years of age believe that e-mail messages should be answered quickly, as compared to 53.6% of users 12 to 50 years old. Suppose that the survey was based on 1,000 users over 70 years of age and 1,000 users 12 to 50 years old. The following table summarizes the results:

                     AGE OF RESPONDENTS
ANSWERS QUICKLY      12-50    Over 70    Total
Yes                    536        707    1,243
No                     464        293      757
Total                1,000      1,000    2,000

a. Suppose you know that the respondent is between 12 and 50 years old. What is the probability that he or she answers quickly?
b. Suppose you know that the respondent is over 70 years old. What is the probability that he or she answers quickly?
c. Are the two events, answers quickly and age, independent? Explain.
148 CHAPTER 4 Basic Probability

4.23 According to an Ipsos poll, the perception of unfairness in the U.S. tax code is spread fairly evenly across income groups, age groups, and education levels. In an April 2006 survey of 1,005 adults, Ipsos reported that almost 60% of all people said the code is unfair, whereas slightly more than 60% of those making more than $50,000 viewed the code as unfair ("People Cry Unfairness," The Cincinnati Enquirer, April 16, 2006, p. A8). Suppose that the following contingency table represents the specific breakdown of responses:

                     INCOME LEVEL
TAX CODE    Less Than $50,000    More Than $50,000    Total
Fair               225                  180              405
Unfair             280                  320              600
Total              505                  500            1,005

a. Given that a respondent earns less than $50,000, what is the probability that he or she said that the tax code is fair?
b. Given that a respondent earns more than $50,000, what is the probability that he or she said that the tax code is fair?
c. Is income level independent of attitude about whether the tax code is fair? Explain.

4.24 According to a Gallup Poll, the extent to which employees are engaged with their workplace varies from country to country. Gallup reports that the percentage of U.S. workers engaged with their workplace is more than twice as high as the percentage of German workers. The study also shows that having more engaged workers leads to increased innovation, productivity, and profitability, as well as reduced employee turnover. The results of the poll are summarized in the following table:

                        COUNTRY
ENGAGEMENT       United States    Germany    Total
Engaged                    550        246      796
Not Engaged              1,345      1,649    2,994
Total                    1,895      1,895    3,790

Source: Data extracted from M. Nink, "Employee Disengagement Plagues Germany," Gallup Management Journal, gmj.gallup.com, April 9, 2009.

a. Given that a worker is from the United States, what is the probability that the worker is engaged?
b. Given that a worker is from the United States, what is the probability that the worker is not engaged?
c. Given that a worker is from Germany, what is the probability that the worker is engaged?
d. Given that a worker is from Germany, what is the probability that the worker is not engaged?

4.25 A sample of 500 respondents in a large metropolitan area was selected to study consumer behavior, with the following results:

ENJOYS SHOPPING          GENDER
FOR CLOTHING        Male    Female    Total
Yes                  136       224      360
No                   104        36      140
Total                240       260      500

a. Suppose the respondent chosen is a female. What is the probability that she does not enjoy shopping for clothing?
b. Suppose the respondent chosen enjoys shopping for clothing. What is the probability that the individual is a male?
c. Are enjoying shopping for clothing and the gender of the individual independent? Explain.

4.26 Each year, ratings are compiled concerning the performance of new cars during the first 90 days of use. Suppose that the cars have been categorized according to whether a car needs warranty-related repair (yes or no) and the country in which the company manufacturing a car is based (United States or not United States). Based on the data collected, the probability that the new car needs a warranty repair is 0.04, the probability that the car is manufactured by a U.S.-based company is 0.60, and the probability that the new car needs a warranty repair and was manufactured by a U.S.-based company is 0.025.
a. Suppose you know that a company based in the United States manufactured a particular car. What is the probability that the car needs warranty repair?
b. Suppose you know that a company based in the United States did not manufacture a particular car. What is the probability that the car needs warranty repair?
c. Are need for warranty repair and location of the company manufacturing the car independent?

4.27 In 37 of the 59 years from 1950 through 2008, the S&P 500 finished higher after the first 5 days of trading. In 32 of those 37 years, the S&P 500 finished higher for the year. Is a good first week a good omen for the upcoming year? The following table gives the first-week and annual performance over this 59-year period:

              S&P 500'S ANNUAL PERFORMANCE
FIRST WEEK       Higher    Lower
Higher               32        5
Lower                11       11

a. If a year is selected at random, what is the probability that the S&P 500 finished higher for the year?
b. Given that the S&P 500 finished higher after the first 5 days of trading, what is the probability that it finished higher for the year?

4.3 Bayes' Theorem 149

c. Are the two events "first-week performance" and "annual performance" independent? Explain.
d. Look up the performance after the first 5 days of 2009 and the 2009 annual performance of the S&P 500 at finance.yahoo.com. Comment on the results.

4.28 A standard deck of cards is being used to play a game. There are four suits (hearts, diamonds, clubs, and spades), each having 13 faces (ace, 2, 3, 4, 5, 6, 7, 8, 9, 10, jack, queen, and king), making a total of 52 cards. This complete deck is thoroughly mixed, and you will receive the first 2 cards from the deck, without replacement (the first card is not returned to the deck after it is selected).
a. What is the probability that both cards are queens?
b. What is the probability that the first card is a 10 and the second card is a 5 or 6?
c. If you were sampling with replacement (the first card is returned to the deck after it is selected), what would be the answer in (a)?
d. In the game of blackjack, the picture cards (jack, queen, king) count as 10 points, and the ace counts as either 1 or 11 points. All other cards are counted at their face value. Blackjack is achieved if 2 cards total 21 points. What is the probability of getting blackjack in this problem?

4.29 A box of nine gloves contains two left-handed gloves and seven right-handed gloves.
a. If two gloves are randomly selected from the box, without replacement (the first glove is not returned to the box after it is selected), what is the probability that both gloves selected will be right-handed?
b. If two gloves are randomly selected from the box, without replacement (the first glove is not returned to the box after it is selected), what is the probability that there will be one right-handed glove and one left-handed glove selected?
c. If three gloves are selected, with replacement (the gloves are returned to the box after they are selected), what is the probability that all three will be left-handed?
d. If you were sampling with replacement (the first glove is returned to the box after it is selected), what would be the answers to (a) and (b)?

4.3 Bayes' Theorem
Bayes' theorem is used to revise previously calculated probabilities based on new information. Developed by Thomas Bayes in the eighteenth century (see references 1, 2, and 5), Bayes' theorem is an extension of what you previously learned about conditional probability.

Apply Bayes' theorem using the instructions in Section EGO.

You can apply Bayes' theorem to the situation in which M&R Electronics World is considering marketing a new model of television. In the past, 40% of the new model televisions have been successful, and 60% have been unsuccessful. Before introducing the new model television, the marketing research department conducts an extensive study and releases a report, either favorable or unfavorable. In the past, 80% of the successful new model televisions had received favorable market research reports, and 30% of the unsuccessful new model televisions had received favorable reports. For the new model of television under consideration, the marketing research department has issued a favorable report. What is the probability that the television will be successful?

Bayes' theorem is developed from the definition of conditional probability. To find the conditional probability of B given A, consider Equation (4.4b) (originally presented on page 142 and shown below):

$$
P(B \mid A) = \frac{P(A \text{ and } B)}{P(A)} = \frac{P(A \mid B)\,P(B)}{P(A)}
$$

Bayes' theorem is derived by substituting Equation (4.8) on page 146 for P(A) in the denominator of Equation (4.4b).

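BAYES' THEOREM

$$
P(B_i \mid A) = \frac{P(A \mid B_i)\,P(B_i)}{P(A \mid B_1)P(B_1) + P(A \mid B_2)P(B_2) + \cdots + P(A \mid B_k)P(B_k)} \tag{4.9}
$$

where B_i is the ith event out of k mutually exclusive and collectively exhaustive
events.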
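
As a rough check on the M&R Electronics World question posed above, the sketch below applies Bayes' theorem to the stated figures: a 0.40 prior probability of success, a 0.80 probability of a favorable report given success, and a 0.30 probability of a favorable report given failure. The function name `bayes_posterior` is hypothetical and is not part of the book's Excel instructions.

```python
def bayes_posterior(priors, likelihoods):
    """Return the revised probabilities P(B_i | A) for each event B_i.

    priors      -- P(B_i) for mutually exclusive, collectively exhaustive events
    likelihoods -- P(A | B_i) for the same events, in the same order
    """
    # Numerators of Bayes' theorem: P(A | B_i) * P(B_i)
    joint = [p * like for p, like in zip(priors, likelihoods)]
    # Denominator: P(A), the sum of all the joint probabilities
    p_a = sum(joint)
    return [j / p_a for j in joint]


# B_1 = successful television, B_2 = unsuccessful television
priors = [0.40, 0.60]
# A = favorable report: P(A | B_1) = 0.80, P(A | B_2) = 0.30
likelihoods = [0.80, 0.30]

posterior = bayes_posterior(priors, likelihoods)
print(round(posterior[0], 2))  # 0.64
```

Under these figures, the favorable report raises the probability of success from 0.40 to 0.32/0.50 = 0.64.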
5.2 Covariance and Its Application in Finance 165
5.3 Recently, a regional automobile dealership sent out fliers to prospective customers, indicating that they had already won one of three different prizes: a 2008 Kia Optima valued at $15,000, a $500 gas card, or a $5 Wal-Mart shopping card. To claim his or her prize, a prospective customer needed to present the flier at the dealership's showroom. The fine print on the back of the flier listed the probabilities of winning. The chance of winning the car was 1 out of 31,478, the chance of winning the gas card was 1 out of 31,478, and the chance of winning the shopping card was 31,476 out of 31,478.
a. How many fliers do you think the automobile dealership sent out?
b. Using your answer to (a) and the probabilities listed on the flier, what is the expected value of the prize won by a prospective customer receiving a flier?
c. Using your answer to (a) and the probabilities listed on the flier, what is the standard deviation of the value of the prize won by a prospective customer receiving a flier?
d. Do you think this is an effective promotion? Why or why not?

5.4 In the carnival game Under-or-Over-Seven, a pair of fair dice is rolled once, and the resulting sum determines whether the player wins or loses his or her bet. For example, the player can bet $1 that the sum will be under 7 (that is, 2, 3, 4, 5, or 6). For this bet, the player wins $1 if the result is under 7 and loses $1 if the outcome equals or is greater than 7. Similarly, the player can bet $1 that the sum will be over 7 (that is, 8, 9, 10, 11, or 12). Here, the player wins $1 if the result is over 7 but loses $1 if the result is 7 or under. A third method of play is to bet $1 on the outcome 7. For this bet, the player wins $4 if the result of the roll is 7 and loses $1 otherwise.
a. Construct the probability distribution representing the different outcomes that are possible for a $1 bet on under 7.
b. Construct the probability distribution representing the different outcomes that are possible for a $1 bet on over 7.
c. Construct the probability distribution representing the different outcomes that are possible for a $1 bet on 7.
d. Show that the expected long-run profit (or loss) to the player is the same, no matter which method of play is used.

5.5 The number of arrivals per minute at a bank located in the central business district of a large city was recorded over a period of 200 minutes, with the following results:

Arrivals    Frequency
0              14
1              31
2              47
3              41
4              29
5              21
6              10
7               5
8               2

a. Compute the expected number of arrivals per minute.
b. Compute the standard deviation.

5.6 The manager of the commercial mortgage department of a large bank has collected data during the past two years concerning the number of commercial mortgages approved per week. The results from these two years (104 weeks) indicated the following:

Number of Commercial
Mortgages Approved      Frequency
0                          13
1                          25
2                          32
3                          17
4                           9
5                           6
6                           1
7                           1

a. Compute the expected number of mortgages approved per week.
b. Compute the standard deviation.

5.2 Covariance and Its Application in Finance


In Section 5.1, the expected value, variance, and standard deviation of a discrete random variable
of a probability distribution are discussed. In this section, the covariance between two variables
is introduced and applied to portfolio management, a topic of great interest to financial analysts.

Covariance
The covariance, σ_XY, measures the strength of the relationship between two numerical random variables, X and Y. A positive covariance indicates a positive relationship. A negative covariance indicates a negative relationship. A covariance of 0 indicates that there is no linear relationship between the two variables; independent variables always have a covariance of 0, although a covariance of 0 does not by itself establish independence. Equation (5.4) defines the covariance for a discrete probability distribution.
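
To make the sign interpretation concrete, the sketch below computes a covariance for a discrete probability distribution. It assumes the standard definition, σ_XY = Σ [x_i - E(X)][y_i - E(Y)] P(x_i, y_i), which Equation (5.4) is taken to express. The function names and the three-outcome distribution are hypothetical and are not drawn from the book's examples.

```python
def expected_value(values, probs):
    """Return E(X) = sum of x_i * P(x_i)."""
    return sum(v * p for v, p in zip(values, probs))


def covariance(xs, ys, probs):
    """Return the sum of [x_i - E(X)] * [y_i - E(Y)] * P(x_i, y_i).

    Each position i describes one joint outcome (x_i, y_i) that occurs
    with probability probs[i].
    """
    mean_x = expected_value(xs, probs)
    mean_y = expected_value(ys, probs)
    return sum((x - mean_x) * (y - mean_y) * p
               for x, y, p in zip(xs, ys, probs))


# Hypothetical distribution: three outcomes with returns on X and Y
probs = [0.25, 0.50, 0.25]
x_returns = [-10, 10, 30]   # X rises across the three outcomes
y_returns = [20, 10, 0]     # Y falls across the same outcomes

cov_xy = covariance(x_returns, y_returns, probs)
print(cov_xy)  # -100.0
```

The result is negative because X is above its mean exactly when Y is below its mean, which is the pattern the text describes as a negative relationship.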
5.2 Covariance and Its Application in Finance 169
5.9 Two investments, X and Y, have the following characteristics:

E(X) = $50, E(Y) = $100, σ_X^2 = 9,000, σ_Y^2 = 15,000, and σ_XY = 7,500.

If the weight of portfolio assets assigned to investment X is 0.4, compute the
a. portfolio expected return.
b. portfolio risk.

APPLYING THE CONCEPTS
5.10 The process of being served at a bank consists of two independent parts: the time waiting in line and the time it takes to be served by the teller. Suppose that the time waiting in line has an expected value of 4 minutes, with a standard deviation of 1.2 minutes, and the time it takes to be served by the teller has an expected value of 5.5 minutes, with a standard deviation of 1.5 minutes. Compute the
a. expected value of the total time it takes to be served at the bank.
b. standard deviation of the total time it takes to be served at the bank.

5.11 In the portfolio example in this section (see page 168), half the portfolio assets are invested in the Dow Jones fund and half in a weak-economy fund. Recalculate the portfolio expected return and the portfolio risk if
a. 30% of the portfolio assets are invested in the Dow Jones fund and 70% in a weak-economy fund.
b. 70% of the portfolio assets are invested in the Dow Jones fund and 30% in a weak-economy fund.
c. Which of the three investment strategies (30%, 50%, or 70% in the Dow Jones fund) would you recommend? Why?

5.12 You are trying to develop a strategy for investing in two different stocks. The anticipated annual return for a $1,000 investment in each stock under four different economic conditions has the following probability distribution:

                                           Returns
Probability    Economic Condition    Stock X    Stock Y
0.1            Recession               -100         50
0.3            Slow growth                0        150
0.3            Moderate growth           80        -20
0.3            Fast growth              150        100

Compute the
a. expected return for stock X and for stock Y.
b. standard deviation for stock X and for stock Y.
c. covariance of stock X and stock Y.
d. Would you invest in stock X or stock Y? Explain.

5.13 Suppose that in Problem 5.12 you wanted to create a portfolio that consists of stock X and stock Y. Compute the portfolio expected return and portfolio risk for each of the following percentages invested in stock X:
a. 30%
b. 50%
c. 70%
d. On the basis of the results of (a) through (c), which portfolio would you recommend? Explain.

5.14 You are trying to develop a strategy for investing in two different stocks. The anticipated annual return for a $1,000 investment in each stock under four different economic conditions has the following probability distribution:

                                           Returns
Probability    Economic Condition    Stock X    Stock Y
0.1            Recession                -50       -100
0.3            Slow growth               20         50
0.4            Moderate growth          100        130
0.2            Fast growth              150        200

Compute the
a. expected return for stock X and for stock Y.
b. standard deviation for stock X and for stock Y.
c. covariance of stock X and stock Y.
d. Would you invest in stock X or stock Y? Explain.

5.15 Suppose that in Problem 5.14 you wanted to create a portfolio that consists of stock X and stock Y. Compute the portfolio expected return and portfolio risk for each of the following percentages invested in stock X:
a. 30%
b. 50%
c. 70%
d. On the basis of the results of (a) through (c), which portfolio would you recommend? Explain.

5.16 You plan to invest $1,000 in a corporate bond fund or in a common stock fund. The following information about the annual return (per $1,000) of each of these investments under different economic conditions is available, along with the probability that each of these economic conditions will occur:

               Economic           Corporate     Common
Probability    Condition          Bond Fund     Stock Fund
0.10           Recession             -70          -300
0.15           Stagnation             30          -100
0.35           Slow growth            80           100
0.30           Moderate growth       100           150
0.10           High growth           120           350

Compute the
a. expected return for the corporate bond fund and for the common stock fund.
6.2 The Normal Distribution 205


populations are naturally skewed (coining that term in passing), and he helped put to rest the notion that the normal distribution underlies all phenomena.

Today, unfortunately, people still make the type of mistake that Pearson refuted. Maybe you have heard about the small class of three students in which a professor announced that one student would get an A, one would get a B, and one would get a C "because grades need to be normally distributed." (That the professor was describing a uniform distribution was a double irony.) As a student, you are probably familiar with discussions about grade inflation (undoubtedly a phenomenon at many schools). But have you ever realized that a "proof" of this inflation (that there are "too few" low grades because grades are skewed toward A's and B's) wrongly implies that grades should be "normally" distributed? By the time you finish reading this book, you may realize that because college students represent small nonrandom samples, there are plenty of reasons to suspect that the distribution of grades would not be "normal."

Misunderstandings about the normal distribution have occurred both in business and in the public sector through the years. These misunderstandings have caused a number of business blunders and have sparked some famous public policy debates. As you study this chapter, make sure you understand the "normal" distribution and the assumptions that must hold for its proper use. Not verifying whether these assumptions hold is another common error made by decision makers using this distribution. And, most importantly, always remember that the name normal distribution does not mean to suggest normal in the everyday (dare we say "normal"?) sense of the word!

Problems for Section 6.2

LEARNING THE BASICS
6.1 Given a standardized normal distribution (with a mean of 0 and a standard deviation of 1, as in Table E.2), what is the probability that
a. Z is less than 1.57?
b. Z is greater than 1.84?
c. Z is between 1.57 and 1.84?
d. Z is less than 1.57 or greater than 1.84?

6.2 Given a standardized normal distribution (with a mean of 0 and a standard deviation of 1, as in Table E.2), what is the probability that
a. Z is between -1.57 and 1.84?
b. Z is less than 1.57 or greater than 1.84?
c. What is the value of Z if only 2.5% of all possible Z values are larger?
d. Between what two values of Z (symmetrically distributed around the mean) will 68.26% of all possible Z values be contained?

6.3 Given a standardized normal distribution (with a mean of 0 and a standard deviation of 1, as in Table E.2), what is the probability that
a. Z is less than 1.08?
b. Z is greater than -0.21?
c. Z is less than -0.21 or greater than the mean?
d. Z is less than -0.21 or greater than 1.08?

6.4 Given a standardized normal distribution (with a mean of 0 and a standard deviation of 1, as in Table E.2), determine the following probabilities:
a. P(Z > 1.08)
b. P(Z < -0.21)
c. P(-1.96 < Z < -0.21)
d. What is the value of Z if only 15.87% of all possible Z values are larger?

6.5 Given a normal distribution with μ = 100 and σ = 10, what is the probability that
a. X > 75?
b. X < 70?
c. X < 80 or X > 110?
d. Between what two X values (symmetrically distributed around the mean) are 80% of the values?

6.6 Given a normal distribution with μ = 50 and σ = 4, what is the probability that
a. X > 43?
b. X < 42?
c. 5% of the values are less than what X value?
d. Between what two X values (symmetrically distributed around the mean) are 60% of the values?

APPLYING THE CONCEPTS
6.7 In a recent year, about two-thirds of U.S. households purchased ground coffee. Consider the annual ground coffee expenditures for households purchasing ground coffee, assuming that these expenditures are approximately distributed as a normal random variable with a mean of $65.16 and a standard deviation of $10.00.
a. Find the probability that a household spent less than $35.00.
b. Find the probability that a household spent more than $60.00.
c. What proportion of the households spent between $40.00 and $50.00?
d. 99% of the households spent less than what amount?

6.8 Toby's Trucking Company determined that the distance traveled per truck per year is normally distributed, with a mean of 50.0 thousand miles and a standard deviation of 12.0 thousand miles.
a. What proportion of trucks can be expected to travel between 34.0 and 50.0 thousand miles in a year?
b. What percentage of trucks can be expected to travel either below 30.0 or above 60.0 thousand miles in a year?
c. How many miles will be traveled by at least 80% of the trucks?
d. What are your answers to (a) through (c) if the standard deviation is 10.0 thousand miles?
206 CHAPTER 6 The Normal Distribution and Other Continuous Distributions

6.9 The owner of a fish market determined that the mean c. What is the probability that a call lasted between 110 and
weight for salmon is 12.3 pounds, with a standard deviation 180 seconds?
of 2 pounds. Assuming that the weights of salmon are nor- d. 1% of all calls will last less than how many seconds?
mally distributed, what is the probability that a randomly
6.12 According to the American Society for Quality, a
selected salmon will weigh
certified quality engineer (CQE) is a professional who
a. between 12 and 15 pounds?
understands the principles of product and service quality
b. less than 10 pounds?
evaluation and control. In a 2008 survey, the mean salary of
c. Between what two values will 95% of the salmon weights
l,387 CQEs was $72,824, with a standard deviation of
fall?
$18,796 (I. Elaine Allen, "Salary Survey: Seeing Green,"
6.10 A set of final examination grades in an introductory Quality Progress, December 2008, pp. 20--53). Assume that
statistics course is normally distributed, with a mean of 73 the salaries of CQEs is approximately normally distributed.
and a standard deviation of 8. For a randomly selected CQE, what is the probability that he
a. What is the probability of getting a grade below 91 on Of she has a salary
this exam? a. below $50.000'1
b. What is the probability that a student scored between 65 b. above $75,000'1
and 89? c. above $100,000'1
c. The probability is 5% that a student taking the test scores 6.13 Many manufacturing problems involve the matching
higher than what grade?
of machine parts, such as shafts that fit into a valve hole. A
d. If the professor grades on a curve (i.e., gives A's to the top
particular design requires a shaft with a diameter of 22.000
10% of the class, regardless of the score), are you better
mm, but shafts with diameters between 21.990 mm and
off with a grade of 81 on this exam or a grade of 68 on a
22,0 I 0 mm are acceptable. Suppose that the manufacturing F

different exam, where the mean is 62 and the standard


process yields shafts with diameters normally distributed, C

deviation is 3? Show your answer statistically and explain.


with a mean of22.002 mm and a standard deviation of 0.005 tr
6.11 A statistical analysis of 1,000 long-distance telephone mm. For this process, what is
calls made from the headquarters of the Bricks and Clicks a. the proportion of shafts with a diameter between 21.99 5,
Computer Corporation indicates that the length of these mm and 22.00 mm? dl
01
calls is normally distributed, with 1L = 240 seconds and b. the probability that a shaft is acceptable? to
a = 40 seconds. c. the diameter that will be exceeded by only 2% of the shafts?
a. What is the probability that a call lasted less than 180 d. What would be your answers in (a) through (c) if
seconds? the standard deviation of the shaft diameters were
b. What is the probability that a call lasted between 180 and 0.004 mm?
300 seconds?

6.3 Evaluating Normality


As discussed in Section 6.2, many continuous variables used in business closely follow a normal distribution. This section presents two approaches for determining whether a set of data can be approximated by the normal distribution:
1. Comparing the characteristics of the data with the theoretical properties of the normal distribution
2. Constructing a normal probability plot

Comparing Data Characteristics to Theoretical Properties
The normal distribution has several important theoretical properties:

• It is symmetrical; thus, the mean and median are equal.
• It is bell-shaped; thus, the empirical rule applies.
• The interquartile range equals 1.33 standard deviations.
• The range is approximately equal to 6 standard deviations.
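
The properties listed above can be checked numerically for a set of data. The following is a minimal sketch in Python, not a procedure taken from the text: the function name `normality_summary` and the illustrative data (999 evenly spaced quantiles of a normal distribution with mean 100 and standard deviation 10) are assumptions made only for this example.

```python
from statistics import NormalDist, mean, median, quantiles, stdev


def normality_summary(data):
    """Return characteristics to compare with the theoretical normal properties."""
    s = stdev(data)                      # sample standard deviation
    q1, _, q3 = quantiles(data, n=4)     # first, second, and third quartiles
    return {
        "mean": mean(data),
        "median": median(data),
        "iqr_over_s": (q3 - q1) / s,                  # roughly 1.33 for normal data
        "range_over_s": (max(data) - min(data)) / s,  # roughly 6 for normal data
    }


# Hypothetical data: evenly spaced quantiles of a normal distribution,
# used here only so that the sketch has something bell-shaped to summarize.
reference = NormalDist(mu=100, sigma=10)
sample = [reference.inv_cdf(i / 1000) for i in range(1, 1000)]

summary = normality_summary(sample)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

For data that depart from normality, the mean and median drift apart and the two ratios move away from the values noted in the comments.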
In actual practice, a continuous variable may have characteristics that approximate these theoretical properties. However, many continuous variables are neither normally distributed nor approximately normally distributed. For such variables, the descriptive characteristics of


Problems for Section 7.4


7.15 Given a normal distribution with μ = 100 and σ = 10, if you select a sample of n = 25, what is the probability that X̄ is
a. less than 95?
b. between 95 and 97.5?
c. above 102.2?
d. There is a 65% chance that X̄ is above what value?

7.16 Given a normal distribution with μ = 50 and σ = 5, if you select a sample of n = 100, what is the probability that X̄ is
a. less than 47?
b. between 47 and 49.5?
c. above 51.1?
d. There is a 35% chance that X̄ is above what value?

7.17 For each of the following three populations, indicate what the sampling distribution for samples of 25 would consist of:
a. Travel expense vouchers for a university in an academic year.
b. Absentee records (days absent per year) in 2009 for employees of a large manufacturing company.
c. Yearly sales (in gallons) of unleaded gasoline at service stations located in a particular state.

7.18 The following data represent the number of days absent per year in a population of six employees of a small company:
3 6 7 9 10
a. Assuming that you sample without replacement, select all possible samples of n = 2 and construct the sampling distribution of the mean. Compute the mean of all the sample means and also compute the population mean. Are they equal? What is this property called?
b. Repeat (a) for all possible samples of n = 3.
c. Compare the shape of the sampling distribution of the mean in (a) and (b). Which sampling distribution has less variability? Why?
d. Assuming that you sample with replacement, repeat (a) through (c) and compare the results. Which sampling distributions have the least variability, those in (a) or (b)? Why?

7.19 The diameter of a brand of Ping-Pong balls is approximately normally distributed, with a mean of 1.30 inches and a standard deviation of 0.04 inch. If you select a random sample of 16 Ping-Pong balls,
a. what is the sampling distribution of the mean?
b. what is the probability that the sample mean is less than 1.28 inches?
c. what is the probability that the sample mean is between 1.31 and 1.33 inches?
d. The probability is 60% that the sample mean will be between what two values, symmetrically distributed around the population mean?

7.20 The U.S. Census Bureau announced that the median sales price of new houses sold in April 2009 was $221,600, and the mean sales price was $274,300 (www.census.gov/newhomesales, July 20, 2009). Assume that the standard deviation of the prices is $90,000.
a. If you select samples of n = 2, describe the shape of the sampling distribution of X̄.
b. If you select samples of n = 100, describe the shape of the sampling distribution of X̄.
c. If you select a random sample of n = 100, what is the probability that the sample mean will be less than $300,000?
d. If you select a random sample of n = 100, what is the probability that the sample mean will be between $275,000 and $290,000?

7.21 Time spent using e-mail per session is normally distributed, with μ = 8 minutes and σ = 2 minutes. If you select a random sample of 25 sessions,
a. what is the probability that the sample mean is between 7.8 and 8.2 minutes?
b. what is the probability that the sample mean is between 7.5 and 8 minutes?
c. If you select a random sample of 100 sessions, what is the probability that the sample mean is between 7.8 and 8.2 minutes?
d. Explain the difference in the results of (a) and (c).

7.22 The amount of time a bank teller spends with each customer has a population mean, μ, of 3.10 minutes and standard deviation, σ, of 0.40 minute. If you select a random sample of 16 customers,
a. what is the probability that the mean time spent per customer is at least 3 minutes?
b. there is an 85% chance that the sample mean is less than how many minutes?
c. What assumption must you make in order to solve (a) and (b)?
d. If you select a random sample of 64 customers, there is an 85% chance that the sample mean is less than how many minutes?

Sampling Distribution of the Proportion


Consider a categorical variable that has only two categories, such as the customer prefers your brand or the customer prefers the competitor's brand. You are interested in the proportion of items belonging to one of the categories, for example, the proportion of customers that prefer
242 CHAPTER 7 Sampling and Sampling Distributions
$$Z = \frac{0.30 - 0.40}{\sqrt{\dfrac{(0.40)(0.60)}{200}}} = \frac{-0.10}{\sqrt{\dfrac{0.24}{200}}} = \frac{-0.10}{0.0346} = -2.89$$

Using Table E.2, the area under the normal curve less than -2.89 is 0.0019. Therefore, if the true proportion of items of interest in the population is 0.40, then only 0.19% of the samples of n = 200 would be expected to have sample proportions less than 0.30.
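
The calculation above can be reproduced with a short Python sketch. This is an illustration under the normal approximation described in the text, not the book's own procedure (the book works with Table E.2 and Excel); the function name `proportion_z` is an assumption introduced here.

```python
from math import sqrt
from statistics import NormalDist


def proportion_z(p_sample, pi, n):
    """Z value for a sample proportion, given population proportion pi and sample size n."""
    standard_error = sqrt(pi * (1 - pi) / n)   # standard error of the proportion
    return (p_sample - pi) / standard_error


# Values from the worked example: sample proportion 0.30, pi = 0.40, n = 200
z = proportion_z(0.30, 0.40, 200)

# Area under the standardized normal curve to the left of z
probability = NormalDist().cdf(z)

print(f"Z = {z:.2f}")
print(f"P(p < 0.30) = {probability:.4f}")
```

The unrounded Z value gives a probability that can differ slightly in the last digit from a table lookup at the rounded value -2.89.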

Problems for Section 7.5


LEARNING THE BASICS
7.23 In a random sample of 64 people, 48 are classified as "successful."
a. Determine the sample proportion, p, of "successful" people.
b. If the population proportion is 0.70, determine the standard error of the proportion.

7.24 A random sample of 50 households was selected for a telephone survey. The key question asked was, "Do you or any member of your household own a cellular telephone that you can use to access the Internet?" Of the 50 respondents, 15 said yes and 35 said no.
a. Determine the sample proportion, p, of households with cellular telephones that can be used to access the Internet.
b. If the population proportion is 0.40, determine the standard error of the proportion.

7.25 The following data represent the responses (Y for yes and N for no) from a sample of 40 college students to the question "Do you currently own shares in any stocks?"
NNYNNYNYNYNNYNYYNNNY
NYNNNNYNNYYNNNYNNYNN
a. Determine the sample proportion, p, of college students who own shares of stock.
b. If the population proportion is 0.30, determine the standard error of the proportion.

APPLYING THE CONCEPTS
7.26 A political pollster is conducting an analysis of sample results in order to make predictions on election night. Assuming a two-candidate election, if a specific candidate receives at least 55% of the vote in the sample, that candidate will be forecast as the winner of the election. If you select a random sample of 100 voters, what is the probability that a candidate will be forecast as the winner when
a. the true percentage of her vote is 50.1%?
b. the true percentage of her vote is 60%?
c. the true percentage of her vote is 49% (and she will actually lose the election)?
d. If the sample size is increased to 400, what are your answers to (a) through (c)? Discuss.

7.27 You plan to conduct a marketing experiment in which students are to taste one of two different brands of soft drink. Their task is to correctly identify the brand tasted. You select a random sample of 200 students and assume that the students have no ability to distinguish between the two brands. (Hint: If an individual has no ability to distinguish between the two soft drinks, then the two brands are equally likely to be selected.)
a. What is the probability that the sample will have between 50% and 60% of the identifications correct?
b. The probability is 90% that the sample percentage is contained within what symmetrical limits of the population percentage?
c. What is the probability that the sample percentage of correct identifications is greater than 65%?
d. Which is more likely to occur: more than 60% correct identifications in the sample of 200 or more than 55% correct identifications in a sample of 1,000? Explain.

7.28 In an online survey of 4,001 respondents, 8% were classified as productivity enhancers who are comfortable with technology and use the Internet for its practical value (data extracted from M. Himowitz, "How to Tell What Kind of Tech User You Are," Newsday, May 27, 2007, p. F6). Suppose you select a sample of 400 students at your school, and the population proportion of productivity enhancers is 0.08.
a. What is the probability that in the sample, fewer than 10% of the students will be productivity enhancers?
b. What is the probability that in the sample, between 6% and 10% of the students will be productivity enhancers?
c. What is the probability that in the sample, more than 5% of the students will be productivity enhancers?
d. If a sample of 100 is taken, how does this change your answers to (a) through (c)?

7.29 Companies often make flextime scheduling available to help recruit and keep women employees who have children. Other workers sometimes view these flextime schedules as unfair. An article in USA Today indicates that 25% of male employees state that they have to pick up the slack for moms working flextime schedules (data extracted from D. Jones, "Poll Finds Resentment of Flextime," www.usatoday.com, May 11, 2007). Suppose you select a random sample of 100 male employees working for companies offering flextime.
a. What is the probability that 25% or fewer male employees will indicate that they have to pick up the slack for moms working flextime?
b. What is the probability that 20% or fewer will indicate that they have to pick up the slack for moms working flextime?
c. If a random sample of 500 is taken, how does this change your answers to (a) and (b)?

7.30 According to Gallup's poll on personal finances, 46% of U.S. workers say they feel that they will have enough money to live comfortably when they retire. (Data extracted from The Gallup Poll, www.gallup.com, May 6, 2008.) If you select a random sample of 200 U.S. workers,
a. what is the probability that the sample will have between 45% and 55% who say they feel they will have enough money to live comfortably when they retire?
b. the probability is 90% that the sample percentage will be contained within what symmetrical limits of the population percentage?
c. the probability is 95% that the sample percentage will be contained within what symmetrical limits of the population percentage?

7.31 The Agency for Healthcare Research and Quality reports that medical errors are responsible for injury in 1 out of every 25 hospital patients in the United States (data extracted from M. Ozan-Rafferty, "Hospitals: Never Have a Never Event," The Gallup Management Journal, gmj.gallup.com, May 7, 2009). These errors are tragic and expensive. Preventable health care-related errors cost the U.S. economy an estimated $29 billion each year. Suppose that you select a sample of 100 U.S. hospital patients.
a. What is the probability that the sample percentage will be between 5% and 10%?
b. The probability is 90% that the sample percentage will be within what symmetrical limits of the population percentage?
c. The probability is 95% that the sample percentage will be within what symmetrical limits of the population percentage?
d. Suppose you selected a sample of 400 U.S. hospital patients. How does this change your answers in (a) through (c)?

7.32 Yahoo HotJobs reported that 56% of full-time office workers believe that dressing down can affect jobs, salaries, or promotions (data extracted from J. Yang and K. Carter, "Dress Can Affect Size of Paycheck," www.usatoday.com, May 9, 2007).
a. Suppose that you take a sample of 100 full-time workers. If the true population proportion of workers who believe that dressing down can affect jobs, salaries, or promotions is 0.56, what is the probability that fewer than half in your sample hold that same belief?
b. Suppose that you take a sample of 500 full-time workers. If the true population proportion of workers who believe that dressing down can affect jobs, salaries, or promotions is 0.56, what is the probability that fewer than half in your sample hold that same belief?
c. Discuss the effect of sample size on the sampling distribution of the proportion in general and the effect on the probabilities in (a) and (b).
7.6 Online Topic: Sampling from Finite Populations
In this section, sampling without replacement from finite populations is discussed. To study this topic, read the Section 7.6 online topic file that is available on this book's companion Web site. (See Appendix Section D.8 to learn how to access the online topic files.)

@ Oxford Cereals Revisited

As the plant operations manager for Oxford Cereals, you were responsible for monitoring the amount of cereal placed in each box. To be consistent with package labeling, boxes should contain a mean of 368 grams of cereal. Thousands of boxes are produced during a shift, and weighing every single box was determined to be too time-consuming, costly, and inefficient. Instead, a sample of boxes was selected. Based on your analysis of the sample, you had to decide whether to maintain, alter, or shut down the process.
Using the concept of the sampling distribution of the mean, you were able to determine probabilities that such a sample mean could have been randomly selected from a population with a mean of 368 grams. Specifically, if a sample of size n = 25 is selected from a population with a
258 CHAPTER 8 Confidence Interval Estimation

95% confidence that the mean annual income of the 2 million customers is between $70,000 and $85,000. Explain the meaning of this statement.

8.6 Suppose that you are going to collect a set of data, either from an entire population or from a random sample taken from that population.
a. Which statistical measure would you compute first: the mean or the standard deviation? Explain.
b. What does your answer to (a) tell you about the "practicality" of using the confidence interval estimate formula given in Equation (8.1)?

8.7 Consider the confidence interval estimate discussed in Problem 8.5. Suppose that the population mean annual income is $71,000. Is the confidence interval estimate stated in Problem 8.5 correct? Explain.

8.8 You are working as an assistant to the dean of institutional research at your university. She wants to survey members of the alumni association who obtained their baccalaureate degrees 5 years ago to learn what their starting salaries were in their first full-time job after receiving their degrees. A sample of 100 alumni is to be randomly selected from the list of 2,500 graduates in that class. If her goal is to construct a 95% confidence interval estimate for the population mean starting salary, why is it unlikely that you will be able to use Equation (8.1) on page 255 for this purpose? Explain.

8.9 The manager of a paint supply store wants to estimate the actual amount of paint contained in 1-gallon cans purchased from a nationally known manufacturer. The manufacturer's specifications state that the standard deviation of the amount of paint is equal to 0.02 gallon. A random sample of 50 cans is selected, and the sample mean amount of paint per 1-gallon can is 0.995 gallon.
a. Construct a 99% confidence interval estimate for the population mean amount of paint included in a 1-gallon can.
b. On the basis of these results, do you think that the manager has a right to complain to the manufacturer? Why?
c. Must you assume that the population amount of paint per can is normally distributed here? Explain.
d. Construct a 95% confidence interval estimate. How does this change your answer to (b)?

8.10 The quality control manager at a light bulb factory needs to estimate the mean life of a large shipment of light bulbs. The standard deviation is 100 hours. A random sample of 64 light bulbs indicated a sample mean life of 350 hours.
a. Construct a 95% confidence interval estimate for the population mean life of light bulbs in this shipment.
b. Do you think that the manufacturer has the right to state that the light bulbs have a mean life of 400 hours? Explain.
c. Must you assume that the population light bulb life is normally distributed? Explain.
d. Suppose that the standard deviation changes to 80 hours. What are your answers in (a) and (b)?

8.2 Confidence Interval Estimate for the Mean (σ Unknown)


In the previous section, you learned that in most business situations, you do not know σ, the population standard deviation. This section discusses a method of constructing a confidence interval estimate of μ that uses the sample statistic S as an estimate of the population parameter σ.

Student's t Distribution
At the start of the twentieth century, William S. Gosset was working at Guinness, trying to help the Irish brewer brew better beer less expensively (see reference 4). As he had only small samples to study, he needed to find a way to make inferences about means without having to know σ. Writing under the pen name "Student,"¹ Gosset solved this problem by developing what today is known as the Student's t distribution, or the t distribution, for short.

¹Guinness considered all research conducted to be proprietary and a trade secret. The firm prohibited its employees from publishing their results. Gosset circumvented this ban by using the pen name "Student" to publish his findings.

If the random variable X is normally distributed, then the following statistic

$$t = \frac{\bar{X} - \mu}{\dfrac{S}{\sqrt{n}}}$$

has a t distribution with n - 1 degrees of freedom. This expression has the same form as the Z statistic in Equation (7.4) on page 236, except that S is used to estimate the unknown σ.
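
To make the statistic concrete, here is a minimal Python sketch that computes t and its degrees of freedom from a small sample. The function name `t_statistic`, the data values, and the value μ = 4.0 are illustrative assumptions, not figures from the text.

```python
from math import sqrt
from statistics import mean, stdev


def t_statistic(sample, mu):
    """Return (t, degrees of freedom) for a sample and a population mean mu."""
    n = len(sample)
    x_bar = mean(sample)      # sample mean, X-bar
    s = stdev(sample)         # sample standard deviation S (n - 1 in the denominator)
    t = (x_bar - mu) / (s / sqrt(n))
    return t, n - 1


# Illustrative data and a hypothetical population mean of 4.0
sample = [1, 2, 3, 4, 5, 6, 7, 8]
t, degrees_of_freedom = t_statistic(sample, mu=4.0)

print(f"t = {t:.4f} with {degrees_of_freedom} degrees of freedom")
```

The sketch stops at the statistic itself; finding a critical value of t still requires a t table or a statistics package.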

Properties of the t Distribution


The t distribution is very similar in appearance to the standardized normal distribution. Both
distributions are symmetrical and bell-shaped, with the mean and the median equal to zero.
However, the t distribution has more area in the tails and less in the center than does the
[Figure: confidence interval estimates of the mean for 20 samples of n = 10 selected from the population of 200 orders with σ unknown. The tabulated values are not legible in the source.]

Problems for Section 8.2


8.11 If X̄ = 75, S = 24, and n = 36, and assuming that the population is normally distributed, construct a 95% confidence interval estimate for the population mean, μ.

8.12 Determine the critical value of t in each of the following circumstances:
a. 1 - α = 0.95, n = 10
b. 1 - α = 0.99, n = 10
c. 1 - α = 0.95, n = 32
d. 1 - α = 0.95, n = 65
e. 1 - α = 0.90, n = 16

8.13 Assuming that the population is normally distributed, construct a 95% confidence interval estimate for the population mean for each of the following samples:
Sample A: 1 1 1 1 8 8 8 8
Sample B: 1 2 3 4 5 6 7 8
Explain why these two samples produce different confidence intervals even though they have the same mean and range.

8.14 Assuming that the population is normally distributed, construct a 95% confidence interval for the population mean, based on the following sample of size n = 7: 1, 2, 3, 4, 5, 6, and 20. Change the number 20 to 7 and recalculate the confidence interval. Using these results, describe the effect of an outlier (i.e., an extreme value) on the confidence interval.

8.15 A stationery store wants to estimate the mean retail value of greeting cards that it has in its inventory. A random sample of 100 greeting cards indicates a mean value of $2.55 and a standard deviation of $0.44.
a. Assuming a normal distribution, construct a 95% confidence interval estimate for the mean value of all greeting cards in the store's inventory.
b. Suppose there were 2,500 greeting cards in the store's inventory. How are the results in (a) useful in assisting the store owner to estimate the total value of the inventory?

8.16 Southside Hospital in Bay Shore, New York, commonly conducts stress tests to study the heart muscle after a person has a heart attack. Members of the diagnostic imaging department conducted a quality improvement project with the objective of reducing the turnaround time for stress tests. Turnaround time is defined as the time from when a test is ordered to when the radiologist signs off on the test results. Initially, the mean turnaround time for a stress test was 68 hours. After incorporating changes into the stress-test process, the quality improvement team collected a sample of 50 turnaround times. In this sample, the mean turnaround time was 32 hours, with a standard deviation of 9 hours (data extracted from E. Godin, D. Raven, C. Sweetapple, and F. R. Del Guidice, "Faster Test Results," Quality Progress, January 2004, 37(1), pp. 33-39).
a. Construct a 95% confidence interval estimate for the population mean turnaround time.
b. Interpret the interval constructed in (a).
c. Do you think the quality improvement project was a success?

8.17 The U.S. Department of Transportation requires tire manufacturers to provide tire performance information on the sidewall of a tire to better inform prospective customers as they make purchasing decisions. One very important measure of tire performance is the tread wear index, which indicates the tire's resistance to tread wear compared with a tire graded with a base of 100. A tire with a grade of 200 should last twice as long, on average, as a tire graded with a base of 100. A consumer organization wants to estimate the actual tread wear index of a brand name of tires that claims "graded 200" on the sidewall of the tire. A random sample of n = 18 indicates a sample mean tread wear index of 195.3 and a sample standard deviation of 21.4.
a. Assuming that the population of tread wear indexes is normally distributed, construct a 95% confidence interval estimate for the population mean tread wear index for tires produced by this manufacturer under this brand name.
b. Do you think that the consumer organization should accuse the manufacturer of producing tires that do not meet the performance information provided on the sidewall of the tire? Explain.
c. Explain why an observed tread wear index of 210 for a particular tire is not unusual, even though it is outside the confidence interval developed in (a).

8.18 The data file contains the price for two movie tickets, with online service charges, large popcorn, and two medium soft drinks at a sample of six theater chains:
$36.15 $31.00 $35.05 $40.25 $33.75 $43.00
Source: Data extracted from K. Kelly, "The Multiplex Under Siege," The Wall Street Journal, December 24-25, 2005, pp. P1, P5.
a. Construct a 95% confidence interval estimate for the population mean price for two movie tickets, with online service charges, large popcorn, and two medium soft drinks, assuming a normal distribution.
b. Interpret the interval constructed in (a).

8.19 The data file contains the overall miles per gallon (MPG) of 2009 sedans priced under $20,000.
27 31 30 28 27 24 29 32 32 27 26 26 25 26 25 24
Source: Data extracted from "Vehicle Ratings," Consumer Reports, April 2009, p. 27.
a. Construct a 95% confidence interval estimate for the population mean MPG of 2009 sedans (4 cylinder) priced under $20,000, assuming a normal distribution.
b. Interpret the interval constructed in (a).
c. Compare the results in (a) to those in Problem 8.20(a).

8.20 The data file contains the overall miles per gallon (MPG) of 2009 small SUVs priced under $30,000.
24 23 22 21 22 22 18 19 19 19 21 21 21 18 19 21 17 22 18 18 22 16 16
Source: Data extracted from "Vehicle Ratings," Consumer Reports, April 2009, pp. 33-34.
a. Construct a 95% confidence interval estimate for the population mean MPG of 2009 SUVs priced under $30,000, assuming a normal distribution.
b. Interpret the interval constructed in (a).
c. Compare the results in (a) to those in Problem 8.19(a).

8.21 The stocks included in the S&P 500 are those of large publicly held companies that trade on either the New York Stock Exchange or the NASDAQ. In 2008, the S&P 500 was down 38.5%, but what about financial compensation (salary, bonuses, stock options, etc.) to the 500 CEOs that run the companies? To learn more about the mean CEO compensation, an alphabetical list of the 500 companies was obtained and ordered from 1 (3M) to 500 (Zions Bancorp). Next, the random number table was used to select a random number from 1 to 50. The number selected was 10. Then, the companies numbered 10, 60, 110, 160, 210, 260, 310, 360, 410, and 460 were investigated and the total CEO compensation recorded. The data are as follows:

Number  Company                 Compensation
10      Aflac                   10,783,232
60      Big Lots                 9,862,262
110     Comerica                 4,108,245
160     EMC                     13,874,262
210     Harley-Davidson          6,048,027
260     Kohl's                  11,638,049
310     Molson Coors Brewing     5,558,499
360     Pfizer                   6,629,955
410     Sigma-Aldrich            3,983,596
460     United Parcel Service    5,168,664
Source: Data extracted from D. Jones and B. Hansen, "CEO Pay Dives in a Rough 2008," www.usatoday.com, May 1, 2009.

a. Construct a 95% confidence interval estimate for the mean 2008 compensation for CEOs of S&P 500 companies.
b. Construct a 99% confidence interval estimate for the mean 2008 compensation for CEOs of S&P 500 companies.
c. Comment on the effect that changing the level of confidence had on your answers in (a) and (b).

8.22 One of the major measures of the quality of service provided by any organization is the speed with which it responds to customer complaints. A large family-held department store selling furniture and flooring, including carpet, had undergone a major expansion in the past several years. In particular, the flooring department had expanded from 2 installation crews to an installation supervisor, a measurer, and 15 installation crews. The store had the business objective of improving its response to complaints. The variable of interest was defined as the number of days between when the complaint was made and when it was resolved. Data were collected from 50 complaints that were made in the last year. The data are as follows:
54 5 35 137 31 27 152 2 123 81 74 27
11 19 126 110 110 29 61 35 94 31 26 5
12 4 165 32 29 28 29 26 25 1 14 13
13 10 5 27 4 52 30 22 36 26 20 23
33 68
a. Construct a 95% confidence interval estimate for the population mean number of days between the receipt of a complaint and the resolution of the complaint.

Estimating the Proportion of Nonconforming Newspapers Printed

The operations manager at a large newspaper wants to estimate the proportion of newspapers printed that have a nonconforming attribute. Using the Define, Collect, Organize, Visualize, and Analyze steps, you define the variable of interest as whether the newspaper has excessive ruboff, improper page setup, missing pages, or duplicate pages. You collect the data by selecting a random sample of n = 200 newspapers from all the newspapers printed during a single day. You organize the results, which show that 35 newspapers contain some type of nonconformance, into a workbook. To analyze the data, you need to construct and interpret a 90% confidence interval for the proportion of newspapers printed during the day that have a nonconforming attribute.

SOLUTION Using Equation (8.3),

$$p = \frac{X}{n} = \frac{35}{200} = 0.175, \quad \text{and with a 90\% level of confidence } Z_{\alpha/2} = 1.645$$

$$p \pm Z_{\alpha/2}\sqrt{\frac{p(1 - p)}{n}} = 0.175 \pm (1.645)\sqrt{\frac{(0.175)(0.825)}{200}} = 0.175 \pm (1.645)(0.0269) = 0.175 \pm 0.0442$$

$$0.1308 \le \pi \le 0.2192$$

You conclude with 90% confidence that the population proportion of all newspapers printed that day with nonconformities is between 0.1308 and 0.2192. This means that between 13.08% and 21.92% of the newspapers printed on that day have some type of nonconformance.
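
The same interval can be computed with a few lines of Python. This is a minimal sketch of the normal-approximation interval used in the example, not the book's Excel procedure; the function name `proportion_confidence_interval` is an assumption, and the critical value comes from the standard library's `NormalDist` rather than from a table.

```python
from math import sqrt
from statistics import NormalDist


def proportion_confidence_interval(x, n, confidence):
    """Normal-approximation confidence interval for a population proportion."""
    p = x / n                                            # sample proportion
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # critical value Z(alpha/2)
    margin = z * sqrt(p * (1 - p) / n)                   # sampling error
    return p - margin, p + margin


# Values from the example: X = 35 nonconforming newspapers out of n = 200
lower, upper = proportion_confidence_interval(35, 200, 0.90)

print(f"{lower:.4f} <= pi <= {upper:.4f}")
```

Because the critical value is not rounded to 1.645, the limits may differ from the hand calculation in the fourth decimal place.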

Equation (8.3) contains a Z statistic because you can use the normal distribution to approximate the binomial distribution when the sample size is sufficiently large. In Example 8.4, the confidence interval using Z provides an excellent approximation for the population proportion because both X and n - X are greater than 5. However, if you do not have a sufficiently large sample size, you should use the binomial distribution rather than Equation (8.3) (see references 1 and 2). The exact confidence intervals for various sample sizes and proportions of successes have been tabulated by Fisher and Yates (reference 2).
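
For small samples, an interval based directly on the binomial distribution can also be computed rather than looked up. The sketch below is an illustration under stated assumptions: it uses the Clopper-Pearson construction, which is one common way to build an "exact" binomial interval and is not necessarily the method behind the tables cited above. The function names and the small-sample values (X = 2, n = 10) are hypothetical.

```python
from math import comb


def binomial_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))


def bisect(f, lo, hi, iterations=100):
    """Locate a root of f on [lo, hi], assuming f(lo) < 0 < f(hi)."""
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_proportion_interval(x, n, confidence):
    """Clopper-Pearson limits (an assumed choice of exact method)."""
    alpha = 1 - confidence
    # Lower limit: the proportion at which P(X >= x) equals alpha/2
    lower = 0.0 if x == 0 else bisect(
        lambda p: (1 - binomial_cdf(x - 1, n, p)) - alpha / 2, 0.0, 1.0)
    # Upper limit: the proportion at which P(X <= x) equals alpha/2
    upper = 1.0 if x == n else bisect(
        lambda p: alpha / 2 - binomial_cdf(x, n, p), 0.0, 1.0)
    return lower, upper


# Hypothetical small sample in which X is not greater than 5
lower, upper = exact_proportion_interval(2, 10, 0.95)
print(lower, upper)
```

Because this interval does not rely on the normal approximation, it is generally not symmetric around the sample proportion.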



Problems for Section 8.3

LEARNING THE BASICS
8.26 If n = 200 and X = 50, construct a 95% confidence interval estimate for the population proportion.

8.27 If n = 400 and X = 25, construct a 99% confidence interval estimate for the population proportion.

APPLYING THE CONCEPTS
8.28 The telephone company wants to estimate the proportion of households that would purchase an additional telephone line if it were made available at a substantially reduced installation cost. A random sample of 500 households is selected. The results indicate that 135 of the households would purchase the additional telephone line at a reduced installation cost.
a. Construct a 99% confidence interval estimate for the population proportion of households that would purchase the additional telephone line.
b. How would the manager in charge of promotional programs concerning residential customers use the results in (a)?

8.29 CareerBuilder.com surveyed 1,124 mothers who were currently employed full time. Of the women surveyed, 281 said that they were dissatisfied with their work-life balance, and 495 said that they would take a pay cut to spend more time with their kids (data extracted from D. Jones, "Poll Finds Resentment of Flextime," www.usatoday.com, May 11, 2007).
a. Construct a 95% confidence interval estimate for the population proportion of mothers employed full time who are dissatisfied with their work-life balance.
8.4 Determining Sample Size 269

b. Construct a 95% confidence interval estimate for the population proportion of mothers employed full time who would take a pay cut to spend more time with their kids.
c. Write a short summary of the information derived from (a) and (b).

8.30 Have you ever negotiated a pay raise? According to an Accenture survey, 52% of U.S. workers have (J. Yang and K. Carter, "Have You Ever Negotiated a Pay Raise?", "Snapshots," www.usatoday.com, May 22, 2009).
a. Suppose that the survey had a sample size of n = 500. Construct a 95% confidence interval for the proportion of all U.S. workers who have negotiated a pay raise.
b. Based on (a), can you claim that more than half of all U.S. workers have negotiated a pay raise?
c. Repeat parts (a) and (b), assuming that the survey had a sample size of n = 5,000.
d. Discuss the effect of sample size on confidence interval estimation.

8.31 In a survey of 1,000 airline travelers, 760 responded that the airline fee that is most unreasonable is additional charges to redeem points/miles (extracted from "Which Airline Fee Is Most Unreasonable?" USA Today, December 2, 2008, p. B1). Construct a 95% confidence interval estimate for the population proportion of airline travelers who think that the airline fee that is most unreasonable is additional charges to redeem points/miles.

8.32 In a survey of 2,395 adults, 1,916 reported that e-mails are easy to misinterpret, but only 1,269 reported that telephone conversations are easy to misinterpret (extracted from "Open to Misinterpretation," USA Today, July 17, 2007, p. 1D).
a. Construct a 95% confidence interval estimate for the population proportion of adults who report that e-mails are easy to misinterpret.
b. Construct a 95% confidence interval estimate for the population proportion of adults who report that telephone conversations are easy to misinterpret.
c. Compare the results of (a) and (b).

8.33 The utility of mobile devices raises new questions about the intrusion of work into personal life. In a survey by CareerJournal.com (data extracted from P. Kitchen, "Can't Turn It Off," Newsday, October 20, 2006, pp. F4-F5), 158 of 473 employees responded that they typically took work with them on vacation, and 85 responded that there are unwritten and unspoken expectations that they stay connected during vacation.
a. Construct a 95% confidence interval estimate for the population proportion of employees who typically take work with them on vacation.
b. Construct a 95% confidence interval estimate for the population proportion of employees who said that there are unwritten and unspoken expectations that they stay connected during vacation.
c. Interpret the intervals in (a) and (b).
d. Explain the difference in the results in (a) and (b).

8.4 Determining Sample Size


In each confidence interval developed so far in this chapter, the sample size was reported along with the results, with little discussion of the width of the resulting confidence interval. In the business world, sample sizes are determined prior to data collection to ensure that the confidence interval is narrow enough to be useful in making decisions. Determining the proper sample size is a complicated procedure, subject to the constraints of budget, time, and the amount of acceptable sampling error. In the Saxon Home Improvement example, if you want to estimate the mean dollar amount of the sales invoices, you must determine in advance how large a sampling error to allow in estimating the population mean. You must also determine, in advance, the level of confidence (i.e., 90%, 95%, or 99%) to use in estimating the population parameter.

Sample Size Determination for the Mean


To develop an equation for determining the appropriate sample size needed when constructing a confidence interval estimate for the mean, recall Equation (8.1) on page 255:

\[ \bar{X} \pm Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \]

The amount added to or subtracted from \(\bar{X}\) is equal to half the width of the interval. This quantity represents the amount of imprecision in the estimate that results from sampling error.² The sampling error, e, is defined as

\[ e = Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \]

²In this context, some statisticians refer to e as the margin of error.
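
Solving the sampling-error definition for n gives a way to work backward from a target error to a sample size. The sketch below assumes that rearrangement, n = (Z σ / e)², rounded up to the next whole number; the function names are illustrative, and the inputs (95% confidence, e = 5, σ = 15) are borrowed from Problem 8.34 only as sample values.

```python
from math import ceil, sqrt
from statistics import NormalDist


def z_value(confidence):
    """Z value that leaves alpha/2 in each tail of the standard normal."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def sampling_error(sigma, n, confidence):
    """e = Z * sigma / sqrt(n), the half-width of the interval for the mean."""
    return z_value(confidence) * sigma / sqrt(n)


def sample_size_for_mean(sigma, e, confidence):
    """Smallest whole n whose sampling error does not exceed e."""
    return ceil((z_value(confidence) * sigma / e) ** 2)


n = sample_size_for_mean(sigma=15, e=5, confidence=0.95)
print(n, sampling_error(15, n, 0.95))
```

Rounding up rather than to the nearest integer ensures that the resulting sampling error is no larger than the target.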

Because the general rule is to round the sample size up to the next whole integer to slightly oversatisfy the criteria, a sample size of 100 is needed. Thus, the sample size needed to satisfy the requirements of the company, based on the estimated proportion, desired confidence level, and sampling error, is equal to the sample size taken on page 267. The actual confidence interval is narrower than required because the sample proportion is 0.10, whereas 0.15 was used for π in Equation (8.5). Figure 8.14 shows a worksheet solution for determining the sample size.

FIGURE 8.14 Worksheet for determining sample size for estimating the proportion of sales invoices with errors for the Saxon Home Improvement Company

[Worksheet "For the Proportion of In-Error Sales Invoices." Recoverable cells: Z Value = -1.9600, from =NORMSINV((1 - B6)/2); Calculated Sample Size = 99.9563, from =(B9^2 * B4 * (1 - B4))/B5^2; Sample Size Needed = 100, from =ROUNDUP(B10, 0).]

Figure 8.14 displays the COMPUTE worksheet of the Sample Size Proportion workbook and reveals the formulas the worksheet uses. See Section EG8.4 to learn how to determine sample size and how to use the Sample Size Proportion workbook as a template for other problems.

Example 8.6 provides another application of determining the sample size for estimating the population proportion.
ltity EXAMPLE 8.6 You want to have 90% confidence of estimating the proportion of office workers who respond
!Um to e-mail within an hour to within ±0.05. Because you have not previously undertaken such a
ling Determining the study, there is no information available from past data. Determine the sample size needed.
Sample Size for
SOLUTION Because no information is available from past data, assume that 7T = 0.50. Using
the Population Equation (8.5) on page 272 and e = 0.05,7T 0.50, and Za/2 = 1.645 for 90% confidence,
Proportion
Z~i27T(1 - 7T)
11

(1.645 )2(0.50 )(0.50)


(0.05)2
1T,
270.6
;est
his Therefore, you need a sample of 271 office workers to estimate the population proportion to
111­ within ±0.05 with 90% confidence.
,ill
Ian

he
)r­
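
The sample size calculation for a proportion can be scripted in the same way as the worksheet. The following is a sketch under stated assumptions: the function name is made up, and the Z value comes from Python's standard library rather than a table, so the unrounded result differs from the hand calculation only past the first decimal place.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_proportion(pi, e, confidence):
    """Return (unrounded n, n rounded up) from n = Z^2 * pi * (1 - pi) / e^2."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    raw = z**2 * pi * (1 - pi) / e**2
    return raw, ceil(raw)   # always round up to oversatisfy the criteria


# Office-worker example: no prior estimate, so pi = 0.50; e = 0.05; 90% confidence
raw, needed = sample_size_for_proportion(pi=0.50, e=0.05, confidence=0.90)
print(f"{raw:.1f} -> {needed}")
```

Using π = 0.50 maximizes π(1 - π), so it yields the most conservative (largest) sample size when no prior estimate is available.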
Problems for Section 8.4

LEARNING THE BASICS
8.34 If you want to be 95% confident of estimating the population mean to within a sampling error of ±5 and the standard deviation is assumed to be 15, what sample size is required?

8.35 If you want to be 99% confident of estimating the population mean to within a sampling error of ±20 and the standard deviation is assumed to be 100, what sample size is required?

8.36 If you want to be 99% confident of estimating the population proportion to within a sampling error of ±0.04, what sample size is needed?

8.37 If you want to be 95% confident of estimating the population proportion to within a sampling error of ±0.02 and there is historical evidence that the population proportion is approximately 0.40, what sample size is needed?

APPLYING THE CONCEPTS
8.38 A survey is planned to determine the mean annual family medical expenses of employees of a large company. The management of the company wishes to be 95% confident that the sample mean is correct to within ±$50 of the population mean annual family medical expenses.
274 CHAPTER 8 Confidence Interval Estimation

A previous study indicates that the standard deviation is approximately $400.
a. How large a sample is necessary?
b. If management wants to be correct to within ±$25, how many employees need to be selected?

8.39 If the manager of a paint supply store wants to estimate, with 95% confidence, the mean amount of paint in a 1-gallon can to within ±0.004 gallon and also assumes that the standard deviation is 0.02 gallon, what sample size is needed?

8.40 If a quality control manager wants to estimate, with 95% confidence, the mean life of light bulbs to within ±20 hours and also assumes that the population standard deviation is 100 hours, how many light bulbs need to be selected?

8.41 If the inspection division of a county weights and measures department wants to estimate the mean amount of soft-drink fill in 2-liter bottles to within ±0.01 liter with 95% confidence and also assumes that the standard deviation is 0.05 liter, what sample size is needed?

8.42 A consumer group wants to estimate the mean electric bill for the month of July for single-family homes in a large city. Based on studies conducted in other cities, the standard deviation is assumed to be $25. The group wants to estimate, with 99% confidence, the mean bill for July to within ±$5.
a. What sample size is needed?
b. If 95% confidence is desired, how many homes need to be selected?

8.43 An advertising agency that serves a major radio station wants to estimate the mean amount of time that the station's audience spends listening to the radio daily. From past studies, the standard deviation is estimated as 45 minutes.
a. What sample size is needed if the agency wants to be 90% confident of being correct to within ±5 minutes?
b. If 99% confidence is desired, how many listeners need to be selected?

8.44 A growing niche in the restaurant business is gourmet-casual breakfast, lunch, and brunch. Chains in this group include Le Peep, Good Egg, Eggs & I, First Watch, and Eggs Up Grill. The mean per-person check for First Watch is approximately $7, and the mean per-person check for Eggs Up Grill is $6.50 (data extracted from J. Hayes, "Competition Heats Up as Breakfast Concepts Eye Growth," Nation's Restaurant News, April 24, 2006, pp. 8, 66).
a. Assuming a standard deviation of $2.00, what sample size is needed to estimate, with 95% confidence, the mean per-person check for Good Egg to within ±$0.25?
b. Assuming a standard deviation of $2.50, what sample size is needed to estimate, with 95% confidence, the mean per-person check for Good Egg to within ±$0.25?
c. Assuming a standard deviation of $3.00, what sample size is needed to estimate, with 95% confidence, the mean per-person check for Good Egg to within ±$0.25?
d. Discuss the effect of variation on the sample size needed.

8.45 What proportion of Americans get most of their news from the Internet? According to a poll conducted by Pew Research Center, 40% get most of their news from the Internet (data extracted from "Drill Down," The New York Times, January 5, 2009, p. B3).
a. To conduct a follow-up study that would provide 95% confidence that the point estimate is correct to within ±0.04 of the population proportion, how large a sample size is required?
b. To conduct a follow-up study that would provide 99% confidence that the point estimate is correct to within ±0.04 of the population proportion, how many people need to be sampled?
c. To conduct a follow-up study that would provide 95% confidence that the point estimate is correct to within ±0.02 of the population proportion, how large a sample size is required?
d. To conduct a follow-up study that would provide 99% confidence that the point estimate is correct to within ±0.02 of the population proportion, how many people need to be sampled?
e. Discuss the effects on sample size requirements of changing the desired confidence level and the acceptable sampling error.

8.46 A survey of 1,000 adults was conducted in March 2009 concerning "green practices." In response to the question of what was the most beneficial thing to do for the environment, 28% said buying renewable energy, 19% said using greener transportation, and 7% said selecting minimal or reduced packaging (data extracted from "Environmentally Friendly Choices," USA Today, March 31, 2009, p. D1). Construct a 95% confidence interval estimate of the population proportion of those who said that the most beneficial thing to do for the environment was
a. buy renewable energy.
b. use greener transportation.
c. select minimal or reduced packaging.
d. You have been asked to update the results of this study. Determine the sample size necessary to estimate, with 95% confidence, the population proportions in (a) through (c) to within ±0.02.

8.47 In a study of 500 executives, 315 stated that their company informally monitored social networking sites to stay on top of information related to their company (data extracted from "Checking Out the Buzz," USA Today, June 26, 2009, p. 1B).
a. Construct a 95% confidence interval for the proportion of companies that informally monitored social networking sites to stay on top of information related to their company.
b. Interpret the interval constructed in (a).
c. If you wanted to conduct a follow-up study to estimate the population proportion of companies that informally monitored social networking sites to stay on top of information related to their company to within ±0.01 with 95% confidence, how many executives would you survey?

Suppose that the personnel director also wishes to take a survey in a branch office. Answer these questions:
c. What sample size is needed to have 95% confidence in estimating the population mean absenteeism to within ±1.5 days if the population standard deviation is estimated to be 4.5 days?
d. How many clerical workers need to be selected to have 90% confidence in estimating the population proportion to within ±0.075 if no previous estimate is available?
e. Based on (c) and (d), what sample size is needed if a single survey is being conducted?

8.71 The market research director for Dotty's Department Store wants to study women's spending on cosmetics. A survey of the store's customers is designed in order to estimate the proportion of women who purchase their cosmetics primarily from Dotty's Department Store and the mean yearly amount that women spend on cosmetics. A previous survey found that the standard deviation of the amount women spend on cosmetics in a year is approximately $18.
a. What sample size is needed to have 99% confidence of estimating the population mean to within ±$5?
b. How many of the store's credit card holders need to be selected to have 90% confidence of estimating the population proportion to within ±0.045?

8.72 The branch manager of a nationwide bookstore chain (located near a college campus) wants to study characteristics of her store's customers. She decides to focus on two variables: the amount of money spent by customers and whether the customers would consider purchasing educational DVDs related to graduate preparation exams, such as the GMAT, GRE, or LSAT. The results from a sample of 70 customers are as follows:
• Amount spent: X̄ = $28.52, S = $11.39.
• 28 customers stated that they would consider purchasing the educational DVDs.
a. Construct a 95% confidence interval estimate for the population mean amount spent in the bookstore.
b. Construct a 90% confidence interval estimate for the population proportion of customers who would consider purchasing educational DVDs.
Assume that the branch manager of another store in the chain (also located close to a college campus) wants to conduct a similar survey in his store. Answer the following questions:
c. What sample size is needed to have 95% confidence of estimating the population mean amount spent in this store to within ±$2 if the standard deviation is assumed to be $10?
d. How many customers need to be selected to have 90% confidence of estimating the population proportion who would consider purchasing the educational DVDs to within ±0.04?
e. Based on your answers to (c) and (d), how large a sample should the manager take?

8.73 The branch manager of an outlet (Store 1) of a nationwide chain of pet supply stores wants to study characteristics of her customers. In particular, she decides to focus on two variables: the amount of money spent by customers and whether the customers own only one dog, only one cat, or more than one dog and/or cat. The results from a sample of 70 customers are as follows:
• Amount of money spent: X̄ = $21.34, S = $9.22.
• 37 customers own only a dog.
• 26 customers own only a cat.
• 7 customers own more than one dog and/or cat.
a. Construct a 95% confidence interval estimate for the population mean amount spent in the pet supply store.
b. Construct a 90% confidence interval estimate for the population proportion of customers who own only a cat.
The branch manager of another outlet (Store 2) wishes to conduct a similar survey in his store. The manager does not have access to the information generated by the manager of Store 1. Answer the following questions:
c. What sample size is needed to have 95% confidence of estimating the population mean amount spent in this store to within ±$1.50 if the standard deviation is estimated to be $10?
d. How many customers need to be selected to have 90% confidence of estimating the population proportion of customers who own only a cat to within ±0.045?
e. Based on your answers to (c) and (d), how large a sample should the manager take?

8.74 Scarlett and Heather, the owners of an upscale restaurant in Dayton, Ohio, want to study the dining characteristics of their customers. They decide to focus on two variables: the amount of money spent by customers and whether customers order dessert. The results from a sample of 60 customers are as follows:
• Amount spent: X̄ = $38.54, S = $7.26.
• 18 customers purchased dessert.
a. Construct a 95% confidence interval estimate for the population mean amount spent per customer in the restaurant.
b. Construct a 90% confidence interval estimate for the population proportion of customers who purchase dessert.
Jeanine, the owner of a competing restaurant, wants to conduct a similar survey in her restaurant. Jeanine does not have access to the information that Scarlett and Heather have obtained from the survey they conducted. Answer the following questions:
c. What sample size is needed to have 95% confidence of estimating the population mean amount spent in her restaurant to within ±$1.50, assuming that the standard deviation is estimated to be $8?
d. How many customers need to be selected to have 90% confidence of estimating the population proportion of customers who purchase dessert to within ±0.04?
e. Based on your answers to (c) and (d), how large a sample should Jeanine take?

8.75 The manufacturer of "Ice Melt" claims its product will melt snow and ice at temperatures as low as 0° Fahrenheit.
Chapter Review Problems 287

A representative for a large chain of hardware stores is interested in testing this claim. The chain purchases a large shipment of 5-pound bags for distribution. The representative wants to know, with 95% confidence and within ±0.05, what proportion of bags of Ice Melt perform the job as claimed by the manufacturer.
a. How many bags does the representative need to test? What assumption should be made concerning the population proportion? (This is called destructive testing; i.e., the product being tested is destroyed by the test and is then unavailable to be sold.)
b. Suppose that the representative tests 50 bags, and 42 of them do the job as claimed. Construct a 95% confidence interval estimate for the population proportion that will do the job as claimed.
c. How can the representative use the results of (b) to determine whether to sell the Ice Melt product?

8.76 An auditor needs to estimate the percentage of times a company fails to follow an internal control procedure. A sample of 50 from a population of 1,000 items is selected, and in 7 instances, the internal control procedure was not followed.
a. Construct a 90% one-sided confidence interval estimate for the population proportion of items in which the internal control procedure was not followed.
b. If the tolerable exception rate is 0.15, what should the auditor conclude?

8.77 An auditor for a government agency needs to evaluate payments for doctors' office visits paid by Medicare in a particular zip code during the month of June. A total of 25,056 visits occurred during June in this area. The auditor wants to estimate the total amount paid by Medicare to within ±$5 with 95% confidence. On the basis of past experience, she believes that the standard deviation is approximately $30.
a. What sample size should she select?
Using the sample size selected in (a), an audit is conducted, with the following results.
Amount of Reimbursement: X̄ = $93.70, S = $34.55
In 12 of the office visits, an incorrect amount of reimbursement was provided. For the 12 office visits in which there was an incorrect reimbursement, the differences between the amount reimbursed and the amount that the auditor determined should have been reimbursed were as follows (and stored in a data file):
$17 $25 $14 -$10 $20 $40 $35 $30 $28 $22 $15 $5
b. Construct a 90% confidence interval estimate for the population proportion of reimbursements that contain errors.
c. Construct a 95% confidence interval estimate for the population mean reimbursement per office visit.
d. Construct a 95% confidence interval estimate for the population total amount of reimbursements for this geographic area in June.
e. Construct a 95% confidence interval estimate for the total difference between the amount reimbursed and the amount that the auditor determined should have been reimbursed.

8.78 A home furnishings store that sells bedroom furniture is conducting an end-of-month inventory of the beds (mattress, bed spring, and frame) in stock. An auditor for the store wants to estimate the mean value of the beds in stock at that time. She wants to have 99% confidence that her estimate of the mean value is correct to within ±$100. On the basis of past experience, she estimates that the standard deviation of the value of a bed is $200.
a. How many beds should she select?
b. Using the sample size selected in (a), an audit was conducted, with the following results:
X̄ = $1,654.27, S = $184.62
Construct a 99% confidence interval estimate for the total value of the beds in stock at the end of the month if there were 258 beds in stock.

8.79 A quality characteristic of interest for a tea-bag-filling process is the weight of the tea in the individual bags. In this example, the label weight on the package indicates that the mean amount is 5.5 grams of tea in a bag. If the bags are underfilled, two problems arise. First, customers may not be able to brew the tea to be as strong as they wish. Second, the company may be in violation of the truth-in-labeling laws. On the other hand, if the mean amount of tea in a bag exceeds the label weight, the company is giving away product. Getting an exact amount of tea in a bag is problematic because of variation in the temperature and humidity inside the factory, differences in the density of the tea, and the extremely fast filling operation of the machine (approximately 170 bags per minute). The following data (stored in a data file) are the weights, in grams, of a sample of 50 tea bags produced in one hour by a single machine:
5.65 5.44 5.42 5.40 5.53 5.34 5.54 5.45 5.52 5.41
5.57 5.40 5.53 5.54 5.55 5.62 5.56 5.46 5.44 5.51
5.47 5.40 5.47 5.61 5.53 5.32 5.67 5.29 5.49 5.55
5.77 5.57 5.42 5.58 5.58 5.50 5.32 5.50 5.53 5.58
5.61 5.45 5.44 5.25 5.56 5.63 5.50 5.57 5.67 5.36
a. Construct a 99% confidence interval estimate for the population mean weight of the tea bags.
b. Is the company meeting the requirement set forth on the label that the mean amount of tea in a bag is 5.5 grams?
c. Do you think the assumption needed to construct the confidence interval estimate in (a) is valid?

8.80 A manufacturing company produces steel housings for electrical equipment. The main component part of the housing is a steel trough that is made from a 14-gauge steel coil. It is produced using a 250-ton progressive punch press
9.1 Fundamentals of Hypothesis-Testing Methodology 309

Problems for Section 9.1

LEARNING THE BASICS
9.1 If you use a 0.05 level of significance in a (two-tail) hypothesis test, what will you decide if Z_STAT = -0.76?

9.2 If you use a 0.05 level of significance in a (two-tail) hypothesis test, what will you decide if Z_STAT = +2.21?

9.3 If you use a 0.10 level of significance in a (two-tail) hypothesis test, what is your decision rule for rejecting a null hypothesis that the population mean is 500 if you use the Z test?

9.4 If you use a 0.01 level of significance in a (two-tail) hypothesis test, what is your decision rule for rejecting H0: μ = 12.5 if you use the Z test?

9.5 What is your decision in Problem 9.4 if Z_STAT = -2.61?

9.6 What is the p-value if, in a two-tail hypothesis test, Z_STAT = +2.00?

9.7 In Problem 9.6, what is your statistical decision if you test the null hypothesis at the 0.10 level of significance?

9.8 What is the p-value if, in a two-tail hypothesis test, Z_STAT = -1.38?

APPLYING THE CONCEPTS
9.9 In the U.S. legal system, a defendant is presumed innocent until proven guilty. Consider a null hypothesis, H0, that the defendant is innocent, and an alternative hypothesis, H1, that the defendant is guilty. A jury has two possible decisions: Convict the defendant (i.e., reject the null hypothesis) or do not convict the defendant (i.e., do not reject the null hypothesis). Explain the meaning of the risks of committing either a Type I or Type II error in this example.

9.10 Suppose the defendant in Problem 9.9 is presumed guilty until proven innocent, as in some other judicial systems. How do the null and alternative hypotheses differ from those in Problem 9.9? What are the meanings of the risks of committing either a Type I or Type II error here?

9.11 The U.S. Food and Drug Administration (FDA) is responsible for approving new drugs. Many consumer groups feel that the approval process is too easy and, therefore, too many drugs are approved that are later found to be unsafe. On the other hand, a number of industry lobbyists have pushed for a more lenient approval process so that pharmaceutical companies can get new drugs approved more easily and quickly (data extracted from R. Sharpe, "FDA Tries to Find Right Balance on Drug Approvals," The Wall Street Journal, April 20, 1999, p. A24). Consider a null hypothesis that a new, unapproved drug is unsafe and an alternative hypothesis that a new, unapproved drug is safe.
a. Explain the risks of committing a Type I or Type II error.
b. Which type of error are the consumer groups trying to avoid? Explain.
c. Which type of error are the industry lobbyists trying to avoid? Explain.
d. How would it be possible to lower the chances of both Type I and Type II errors?

9.12 As a result of complaints from both students and faculty about lateness, the registrar at a large university wants to adjust the scheduled class times to allow for adequate travel time between classes and is ready to undertake a study. Until now, the registrar has believed that there should be 20 minutes between scheduled classes. State the null hypothesis, H0, and the alternative hypothesis, H1.

9.13 Do students at your school study more than, less than, or about the same as students at other business schools? BusinessWeek reported that at the top 50 business schools, students studied an average of 14.6 hours per week (data extracted from "Cracking the Books," Special Report/Online Extra, www.businessweek.com, March 19, 2007). Set up a hypothesis test to try to prove that the mean number of hours studied at your school is different from the 14.6-hour per week benchmark reported by BusinessWeek.
a. State the null and alternative hypotheses.
b. What is a Type I error for your test?
c. What is a Type II error for your test?

9.14 The quality-control manager at a light bulb factory needs to determine whether the mean life of a large shipment of light bulbs is equal to 375 hours. The population standard deviation is 100 hours. A random sample of 64 light bulbs indicates a sample mean life of 350 hours.
a. At the 0.05 level of significance, is there evidence that the mean life is different from 375 hours?
b. Compute the p-value and interpret its meaning.
c. Construct a 95% confidence interval estimate of the population mean life of the light bulbs.
d. Compare the results of (a) and (c). What conclusions do you reach?

9.15 The manager of a paint supply store wants to determine whether the mean amount of paint contained in 1-gallon cans purchased from a nationally known manufacturer is actually 1 gallon. You know from the manufacturer's specifications that the standard deviation of the amount of paint is 0.02 gallon. You select a random sample of 50 cans, and the mean amount of paint per 1-gallon can is 0.995 gallon.
a. Is there evidence that the mean amount is different from 1.0 gallon (use α = 0.01)?
b. Compute the p-value and interpret its meaning.
c. Construct a 99% confidence interval estimate of the population mean amount of paint.
d. Compare the results of (a) and (c). What conclusions do you reach?
The t test is a robust test. It does not lose power if the shape of the population departs somewhat from a normal distribution, particularly when the sample size is large enough to enable the test statistic t to be influenced by the Central Limit Theorem (see Section 7.4). However, you can reach erroneous conclusions and can lose statistical power if you use the t test incorrectly. If the sample size, n, is small (i.e., less than 30) and you cannot easily make the assumption that the underlying population is at least approximately normally distributed, then nonparametric testing procedures are more appropriate (see references 1 and 2).
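
As a reminder of what the t test computes, the sketch below evaluates the one-sample test statistic and its degrees of freedom. It assumes the usual form t_STAT = (X̄ - μ) / (S / √n) with n - 1 degrees of freedom, which is not reproduced in this excerpt; the function name is illustrative, and the inputs (n = 16, X̄ = 56, S = 12, H0: μ = 50) are borrowed from Problem 9.16 only as sample values.

```python
from math import sqrt


def t_statistic(sample_mean, sample_sd, n, mu0):
    """Return (t_STAT, degrees of freedom) for a one-sample t test."""
    standard_error = sample_sd / sqrt(n)       # estimated standard error of the mean
    t_stat = (sample_mean - mu0) / standard_error
    return t_stat, n - 1


t_stat, df = t_statistic(sample_mean=56, sample_sd=12, n=16, mu0=50)
print(t_stat, df)  # prints 2.0 15
```

With n = 16, the sample falls under the small-sample caution above, so the result is trustworthy only if the population is at least approximately normally distributed.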

t Problems for Section 9.2 ­


LEARNING THE BASICS
9.16 If, in a sample of n = 16 selected from a normal population, X̄ = 56 and S = 12, what is the value of tSTAT if you are testing the null hypothesis H0: μ = 50?

9.17 In Problem 9.16, how many degrees of freedom are there in the t test?

9.18 In Problems 9.16 and 9.17, what are the critical values of t if the level of significance, α, is 0.05 and the alternative hypothesis, H1, is μ ≠ 50?

9.19 In Problems 9.16, 9.17, and 9.18, what is your statistical decision if the alternative hypothesis, H1, is μ ≠ 50?

9.20 If, in a sample of n = 16 selected from a left-skewed population, X̄ = 65, and S = 21, would you use the t test to test the null hypothesis H0: μ = 60? Discuss.

9.21 If, in a sample of n = 160 selected from a left-skewed population, X̄ = 65, and S = 21, would you use the t test to test the null hypothesis H0: μ = 60? Discuss.

APPLYING THE CONCEPTS
9.22 You are the manager of a restaurant for a fast-food franchise. Last month, the mean waiting time at the drive-through window for branches in your geographical region, as measured from the time a customer places an order until the time the customer receives the order, was 3.7 minutes. You select a random sample of 64 orders. The sample mean waiting time is 3.57 minutes, with a sample standard deviation of 0.8 minute.
a. At the 0.05 level of significance, is there evidence that the population mean waiting time is different from 3.7 minutes?
b. Because the sample size is 64, do you need to be concerned about the shape of the population distribution when conducting the t test in (a)? Explain.

9.23 A manufacturer of chocolate candies uses machines to package candies as they move along a filling line. Although the packages are labeled as 8 ounces, the company wants the packages to contain a mean of 8.17 ounces so that virtually none of the packages contain less than 8 ounces. A sample of 50 packages is selected periodically, and the packaging process is stopped if there is evidence that the mean amount packaged is different from 8.17 ounces. Suppose that in a particular sample of 50 packages, the mean amount dispensed is 8.159 ounces, with a sample standard deviation of 0.051 ounce.
a. Is there evidence that the population mean amount is different from 8.17 ounces? (Use a 0.05 level of significance.)
b. Determine the p-value and interpret its meaning.

9.24 In a recent year, the Federal Communications Commission reported that the mean wait for repairs for Verizon customers was 36.5 hours. In an effort to improve this service, suppose that a new repair service process was developed. This new process, when used for a sample of 100 repairs, resulted in a sample mean of 34.5 hours and a sample standard deviation of 11.7 hours.
a. Is there evidence that the population mean amount is different from 36.5 hours? (Use a 0.05 level of significance.)
b. Determine the p-value and interpret its meaning.

9.25 In a recent year, the Federal Communications Commission reported that the mean wait for repairs for AT&T customers was 25.3 hours. In an effort to improve this service, suppose that a new repair service process was developed. This new process, when used for a sample of 100 repairs, resulted in a sample mean of 22.3 hours and a sample standard deviation of 8.3 hours.
a. Is there evidence that the population mean amount is different from 25.3 hours? (Use a 0.05 level of significance.)
b. Determine the p-value and interpret its meaning.

9.26 The data file contains prices (in dollars) for two tickets, with online service charges, large popcorn, and two medium soft drinks at a sample of six theater chains:
36.15 31.00 35.05 40.25 33.75 43.00
Source: Data extracted from K. Kelly, "The Multiplex Under Siege," The Wall Street Journal, December 24-25, 2005, pp. P1, P5.
a. At the 0.05 level of significance, is there evidence that the mean price for two movie tickets, with online service charges, large popcorn, and two medium soft drinks, is different from $35?
b. Determine the p-value in (a) and interpret its meaning.
9.2 t Test of Hypothesis for the Mean (σ Unknown) 315

c. What assumption must you make about the population distribution in order to conduct the t test in (a) and (b)?
d. Because the sample size is 6, do you need to be concerned about the shape of the population distribution when conducting the t test in (a)? Explain.

9.27 In New York State, savings banks are permitted to sell a form of life insurance called savings bank life insurance (SBLI). The approval process consists of underwriting, which includes a review of the application, a medical information bureau check, possible requests for additional medical information and medical exams, and a policy compilation stage in which the policy pages are generated and sent to the bank for delivery. The ability to deliver approved policies to customers in a timely manner is critical to the profitability of this service. During a period of one month, a random sample of 27 approved policies is selected, and the total processing time, in days, is recorded (and stored in a data file):
73 19 16 64 28 28 31 90 60 56 31 56 22 18
45 48 17 17 17 91 92 63 50 51 69 16 17
a. In the past, the mean processing time was 45 days. At the 0.05 level of significance, is there evidence that the mean processing time has changed from 45 days?
b. What assumption about the population distribution is needed in order to conduct the t test in (a)?
c. Construct a boxplot or a normal probability plot to evaluate the assumption made in (b).
d. Do you think that the assumption needed in order to conduct the t test in (a) is valid? Explain.

9.28 The following data (stored in a data file) represent the amount of soft drink filled in a sample of 50 consecutive 2-liter bottles. The results, listed horizontally in the order of being filled, were
2.109 2.086 2.066 2.075 2.065 2.057 2.052 2.044 2.036 2.038
2.031 2.029 2.025 2.029 2.023 2.020 2.015 2.014 2.013 2.014
2.012 2.012 2.012 2.010 2.005 2.003 1.999 1.996 1.997 1.992
1.994 1.986 1.984 1.981 1.973 1.975 1.971 1.969 1.966 1.967
1.963 1.957 1.951 1.951 1.947 1.941 1.941 1.938 1.908 1.894
a. At the 0.05 level of significance, is there evidence that the mean amount of soft drink filled is different from 2.0 liters?
b. Determine the p-value in (a) and interpret its meaning.
c. In (a), you assumed that the distribution of the amount of soft drink filled was normally distributed. Evaluate this assumption by constructing a boxplot or a normal probability plot.
d. Do you think that the assumption needed in order to conduct the t test in (a) is valid? Explain.
e. Examine the values of the 50 bottles in their sequential order, as given in the problem. Is there a pattern to the results? If so, what impact might this pattern have on the validity of the results in (a)?

9.29 One of the major measures of the quality of service provided by any organization is the speed with which it responds to customer complaints. A large family-held department store selling furniture and flooring, including carpet, had undergone a major expansion in the past several years. In particular, the flooring department had expanded from 2 installation crews to an installation supervisor, a measurer, and 15 installation crews. The store had the business objective of improving its response to complaints. The variable of interest was defined as the number of days between when the complaint was made and when it was resolved. Data were collected from 50 complaints that were made in the last year. The data were stored in a data file and are as follows:
54 5 35 137 31 27 152 2 123 81 74 27
11 19 126 110 110 29 61 35 94 31 26 5
12 4 165 32 29 28 29 26 25 14 13
13 10 5 27 4 52 30 22 36 26 20 23
33 68
a. The installation supervisor claims that the mean number of days between the receipt of a complaint and the resolution of the complaint is 20 days. At the 0.05 level of significance, is there evidence that the claim is not true (i.e., that the mean number of days is different from 20)?
b. What assumption about the population distribution is needed in order to conduct the t test in (a)?
c. Construct a boxplot or a normal probability plot to evaluate the assumption made in (b).
d. Do you think that the assumption needed in order to conduct the t test in (a) is valid? Explain.

9.30 A manufacturing company produces steel housings for electrical equipment. The main component part of the housing is a steel trough that is made out of a 14-gauge steel coil. It is produced using a 250-ton progressive punch press with a wipe-down operation that puts two 90-degree forms in the flat steel to make the trough. The distance from one side of the form to the other is critical because of weatherproofing in outdoor applications. The company requires that the width of the trough be between 8.31 inches and 8.61 inches. The data file contains the widths of the troughs, in inches, for a sample of n = 49:
8.312 8.343 8.317 8.383 8.348 8.410 8.351 8.373 8.481 8.422
8.476 8.382 8.484 8.403 8.414 8.419 8.385 8.465 8.498 8.447
8.436 8.413 8.489 8.414 8.481 8.415 8.479 8.429 8.458 8.462
8.460 8.444 8.429 8.460 8.412 8.420 8.410 8.405 8.323 8.420
8.396 8.447 8.405 8.439 8.411 8.427 8.420 8.498 8.409
a. At the 0.05 level of significance, is there evidence that the mean width of the troughs is different from 8.46 inches?
b. What assumption about the population distribution is needed in order to conduct the t test in (a)?
c. Evaluate the assumption made in (b).
d. Do you think that the assumption needed in order to conduct the t test in (a) is valid? Explain.
Problems for Section 9.3
LEARNING THE BASICS
9.34 In a one-tail hypothesis test where you reject H0 only in the upper tail, what is the p-value if ZSTAT = +2.00?

9.35 In Problem 9.34, what is your statistical decision if you test the null hypothesis at the 0.05 level of significance?

9.36 In a one-tail hypothesis test where you reject H0 only in the lower tail, what is the p-value if ZSTAT = -1.38?

9.37 In Problem 9.36, what is your statistical decision if you test the null hypothesis at the 0.01 level of significance?

9.38 In a one-tail hypothesis test where you reject H0 only in the lower tail, what is the p-value if ZSTAT = +1.38?

9.39 In Problem 9.38, what is the statistical decision if you test the null hypothesis at the 0.01 level of significance?

9.40 In a one-tail hypothesis test where you reject H0 only in the upper tail, what is the critical value of the t-test statistic with 10 degrees of freedom at the 0.01 level of significance?

9.41 In Problem 9.40, what is your statistical decision if tSTAT = +2.39?

9.42 In a one-tail hypothesis test where you reject H0 only in the lower tail, what is the critical value of the tSTAT test statistic with 20 degrees of freedom at the 0.01 level of significance?

9.43 In Problem 9.42, what is your statistical decision if tSTAT = 1.15?

APPLYING THE CONCEPTS
9.44 In a recent year, the Federal Communications Commission reported that the mean wait for repairs for Verizon customers was 36.5 hours. In an effort to improve this service, suppose that a new repair service process was developed. This new process, used for a sample of 100 repairs, resulted in a sample mean of 34.5 hours and a sample standard deviation of 11.7 hours.
a. Is there evidence that the population mean amount is less than 36.5 hours? (Use a 0.05 level of significance.)
b. Determine the p-value and interpret its meaning.
c. Compare the results in (a) and (b) to those of Problem 9.24 (a) and (b) on page 314.

9.45 In a recent year, the Federal Communications Commission reported that the mean wait for repairs for AT&T customers was 25.3 hours. In an effort to improve this service, suppose that a new repair service process was developed. This new process, used for a sample of 100 repairs, resulted in a sample mean of 22.3 hours and a sample standard deviation of 8.3 hours.
a. Is there evidence that the population mean amount is less than 25.3 hours? (Use a 0.05 level of significance.)
b. Determine the p-value and interpret its meaning.
c. Compare the results in (a) and (b) to those of Problem 9.25 (a) and (b) on page 314.

9.46 The Glen Valley Steel Company manufactures steel bars. If the production process is working properly, it turns out steel bars that are normally distributed with mean length of at least 2.8 feet. Longer steel bars can be used or altered, but shorter bars must be scrapped. You select a sample of 25 bars; the mean length is 2.73 feet, and the sample standard deviation is 0.20 foot.
a. If you test the null hypothesis at the 0.05 level of significance, what decision do you make using the critical value approach to hypothesis testing?
b. If you test the null hypothesis at the 0.05 level of significance, what decision do you make using the p-value approach to hypothesis testing?
c. Interpret the meaning of the p-value in this problem.
d. Compare your conclusions in (a) and (b).

9.47 You are the manager of a restaurant that delivers pizza to college dormitory rooms. You have just changed your delivery process in an effort to reduce the mean time between the order and completion of delivery from the current 25 minutes. A sample of 36 orders using the new delivery process yields a sample mean of 22.4 minutes and a sample standard deviation of 6 minutes.
a. Using the six-step critical value approach, at the 0.05 level of significance, is there evidence that the population mean delivery time has been reduced below the previous population mean value of 25 minutes?
b. At the 0.05 level of significance, use the five-step p-value approach.
c. Interpret the meaning of the p-value in (b).
d. Compare your conclusions in (a) and (b).

9.48 Children in the United States account directly for $36 billion in sales annually. When their indirect influence over product decisions from stereos to vacations is considered, the total economic spending affected by children in the United States is $290 billion. It is estimated that by age 10, a child makes an average of more than five trips a week to a store (data extracted from M. E. Goldberg, G. J. Gorn, L. A. Peracchio, and G. Bamossy, "Understanding Materialism Among Youth," Journal of Consumer Psychology, 2003, 13(3), pp. 278-288). Suppose that you want to prove that children in your city average more than five trips a week to a store. Let μ represent the population mean number of times children in your city make trips to a store.
a. State the null and alternative hypotheses.
b. Explain the meaning of the Type I and Type II errors in the context of this scenario.
c. Suppose that you carry out a similar study in the city in which you live. You take a sample of 100 children and
9.4 Z Test of Hypothesis for the Proportion 321

find that the mean number of trips to the store is 5.47 and the sample standard deviation of the number of trips to the store is 1.6. At the 0.01 level of significance, is there evidence that the population mean number of trips to the store is greater than 5 per week?
d. Interpret the meaning of the p-value in (c).

9.49 The population mean waiting time to check out of a supermarket has been 10.73 minutes. Recently, in an effort to reduce the waiting time, the supermarket has experimented with a system in which there is a single waiting line with multiple checkout servers. A sample of 100 customers was selected, and their mean waiting time to check out was 9.52 minutes, with a sample standard deviation of 5.8 minutes.
a. At the 0.05 level of significance, using the critical value approach to hypothesis testing, is there evidence that the population mean waiting time to check out is less than 10.73 minutes?
b. At the 0.05 level of significance, using the p-value approach to hypothesis testing, is there evidence that the population mean waiting time to check out is less than 10.73 minutes?
c. Interpret the meaning of the p-value in this problem.
d. Compare your conclusions in (a) and (b).

9.4 Z Test of Hypothesis for the Proportion
In some situations, you want to test a hypothesis about the proportion of events of interest in the population, π, rather than test the population mean. To begin, you select a random sample and compute the sample proportion, p = X/n. You then compare the value of this statistic to the hypothesized value of the parameter, π, in order to decide whether to reject the null hypothesis. If the number of events of interest (X) and the number of events that are not of interest (n − X) are each at least five, the sampling distribution of a proportion approximately follows a normal distribution. You use the Z test for the proportion given in Equation (9.3) to perform the hypothesis test for the difference between the sample proportion, p, and the hypothesized population proportion, π.

Z TEST FOR THE PROPORTION

$$Z_{STAT} = \frac{p - \pi}{\sqrt{\dfrac{\pi(1 - \pi)}{n}}} \qquad (9.3)$$

where

p = sample proportion = X/n = (number of events of interest in the sample) / (sample size)
π = hypothesized proportion of events of interest in the population

The ZSTAT test statistic approximately follows a standardized normal distribution when X and (n − X) are each at least 5.

Alternatively, by multiplying the numerator and denominator by n, you can write the ZSTAT test statistic in terms of the number of events of interest, X, as shown in Equation (9.4).

Z TEST FOR THE PROPORTION IN TERMS OF THE NUMBER OF EVENTS OF INTEREST

$$Z_{STAT} = \frac{X - n\pi}{\sqrt{n\pi(1 - \pi)}} \qquad (9.4)$$
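
The two forms of the test statistic can be checked against each other numerically. The sketch below is one possible way to do that in Python; the function names, the use of math.erf to obtain the standardized normal cumulative probability, and the input values (X = 60, n = 100, π = 0.50) are illustrative assumptions, not part of the text.

```python
import math


def normal_cdf(z):
    """Cumulative probability of the standardized normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def z_stat_from_proportion(p, pi0, n):
    """ZSTAT written with the sample proportion p, as in Equation (9.3)."""
    return (p - pi0) / math.sqrt(pi0 * (1.0 - pi0) / n)


def z_stat_from_count(x, pi0, n):
    """ZSTAT written with the number of events of interest X, as in (9.4)."""
    return (x - n * pi0) / math.sqrt(n * pi0 * (1.0 - pi0))


def normal_approximation_ok(x, n):
    """Rule from the text: X and n - X must each be at least 5."""
    return x >= 5 and (n - x) >= 5


# Hypothetical sample, for illustration only.
x, n, pi0 = 60, 100, 0.50
z_a = z_stat_from_proportion(x / n, pi0, n)
z_b = z_stat_from_count(x, pi0, n)
p_value_two_tail = 2.0 * (1.0 - normal_cdf(abs(z_a)))

print("Normal approximation reasonable:", normal_approximation_ok(x, n))
print(f"ZSTAT via (9.3) = {z_a:.4f}, via (9.4) = {z_b:.4f}")
print(f"Two-tail p-value = {p_value_two_tail:.4f}")
```

Both functions should return the same value up to floating-point rounding, which is simply the algebraic identity between Equations (9.3) and (9.4). For a one-tail test, the p-value would instead be a single tail area rather than twice the tail area.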
324 CHAPTER 9 Fundamentals of Hypothesis Testing

Problems for Section 9.4


LEARNING THE BASICS
9.50 If, in a random sample of 400 items, 88 are defective, what is the sample proportion of defective items?

9.51 In Problem 9.50, if the null hypothesis is that 20% of the items in the population are defective, what is the value of ZSTAT?

9.52 In Problems 9.50 and 9.51, suppose you are testing the null hypothesis H0: π = 0.20 against the two-tail alternative hypothesis H1: π ≠ 0.20 and you choose the level of significance α = 0.05. What is your statistical decision?

APPLYING THE CONCEPTS
9.53 The U.S. Department of Education reports that 46% of full-time college students are employed while attending college (data extracted from "The Condition of Education 2009," National Center for Education Statistics, nces.ed.gov). A recent survey of 60 full-time students at Miami University found that 29 were employed.
a. Use the five-step p-value approach to hypothesis testing and a 0.05 level of significance to determine whether the proportion of full-time students at Miami University is different than the national norm of 0.46.
b. Assume that the study found that 36 of the 60 full-time students were employed and repeat (a). Are the conclusions the same?

9.54 Online magazines make it easy for readers to link to an advertiser's Web site directly from an advertisement placed in the digital magazine. A recent survey indicated that 56% of online magazine readers have clicked on an advertisement and linked directly to the advertiser's Web site. The survey was based on a sample size of n = 6,403 (data extracted from "Metrics," EContent, January/February, 2007, p. 20).
a. Use the five-step p-value approach to try to determine whether there is evidence that more than half of all the readers of online magazines have linked to an advertiser's Web site. (Use the 0.05 level of significance.)
b. Suppose that the sample size was only n = 100, and as before, 56% of the online magazine readers indicated that they had clicked on an advertisement to link directly to the advertiser's Web site. Use the five-step p-value approach to try to determine whether there is evidence that more than half of all the readers of online magazines have linked to an advertiser's Web site. (Use the 0.05 level of significance.)
c. Discuss the effect that sample size has on hypothesis testing.
d. What do you think are your chances of rejecting any null hypothesis concerning a population proportion if a sample size of n = 20 is used?

9.55 One of the issues facing organizations is increasing diversity throughout the organization. One of the ways to evaluate an organization's success at increasing diversity is to compare the percentage of employees in the organization in a particular position with a specific background to the percentage in a particular position with that specific background in the general workforce. Recently, a large academic medical center determined that 9 of 17 employees in a particular position were female, whereas 55% of the employees for this position in the general workforce were female. At the 0.05 level of significance, is there evidence that the proportion of females in this position at this medical center is different from what would be expected in the general workforce?

9.56 Of 1,000 respondents aged 24 to 35, 65% reported that they preferred to "look for a job in a place where I would like to live" rather than "look for the best job I can find, the place where I live is secondary" (data extracted from L. Belkin, "What Do Young Jobseekers Want? (Something Other Than a Job)," The New York Times, September 6, 2007, p. G2). At the 0.05 level of significance, is there evidence that the proportion of all young jobseekers aged 24 to 35 who preferred to "look for a job in a place where I would like to live" rather than "look for the best job I can find, the place where I live is secondary" is different from 60%?

9.57 One of the biggest issues facing e-retailers is the ability to reduce the proportion of customers who cancel their transactions after they have selected their products. It has been estimated that about half of prospective customers cancel their transactions after they have selected their products (data extracted from B. Tedeschi, "E-Commerce, a Cure for Abandoned Shopping Carts: A Web Checkout System That Eliminates the Need for Multiple Screens," The New York Times, February 14, 2005, p. C3). Suppose that a company changed its Web site so that customers could use a single-page checkout process rather than multiple pages. A sample of 500 customers who had selected their products were provided with the new checkout system. Of these 500 customers, 210 cancelled their transactions after they had selected their products.
a. At the 0.01 level of significance, is there evidence that the population proportion of customers who select products and then cancel their transaction is less than 0.50 with the new system?
b. Suppose that a sample of n = 100 customers (instead of n = 500 customers) were provided with the new checkout system and that 42 of those customers cancelled their transactions after they had selected their products. At the 0.01 level of significance, is there evidence that the population proportion of customers who select products and then cancel their transaction is less than 0.50 with the new system?
c. Compare the results of (a) and (b) and discuss the effect that sample size has on the outcome, and, in general, in hypothesis testing.

9.58 A recent study by the Pew Internet and American Life Project (pewinternet.org) found that Americans had a
9.64 How can a confidence interval estimate for the population mean provide conclusions to the corresponding two-tail hypothesis test for the population mean?

9.65 What is the six-step critical value approach to hypothesis testing?

9.66 What is the five-step p-value approach to hypothesis testing?

APPLYING THE CONCEPTS
9.67 An article in Marketing News (T. T. Semon, "Consider a Statistical Insignificance Test," Marketing News, February 1, 1999) argued that the level of significance used when comparing two products is often too low; that is, sometimes you should be using an α value greater than 0.05. Specifically, the article recounted testing the proportion of potential customers with a preference for product 1 over product 2. The null hypothesis was that the population proportion of potential customers preferring product 1 was 0.50, and the alternative hypothesis was that it was not equal to 0.50. The p-value for the test was 0.22. The article suggested that, in some cases, this should be enough evidence to reject the null hypothesis.
a. State, in statistical terms, the null and alternative hypotheses for this example.
b. Explain the risks associated with Type I and Type II errors in this case.
c. What would be the consequences if you rejected the null hypothesis for a p-value of 0.22?
d. Why do you think the article suggested raising the value of α?
e. What would you do in this situation?
f. What is your answer in (e) if the p-value equals 0.12? What if it equals 0.06?

9.68 La Quinta Motor Inns developed a computer model to help predict the profitability of sites that are being considered as locations for new hotels. If the computer model predicts large profits, La Quinta buys the proposed site and builds a new hotel. If the computer model predicts small or moderate profits, La Quinta chooses not to proceed with that site (data extracted from S. E. Kimes and J. A. Fitzsimmons, "Selecting Profitable Hotel Sites at La Quinta Motor Inns," Interfaces, Vol. 20, March-April 1990, pp. 12-20). This decision-making procedure can be expressed in the hypothesis-testing framework. The null hypothesis is that the site is not a profitable location. The alternative hypothesis is that the site is a profitable location.
a. Explain the risks associated with committing a Type I error in this case.
b. Explain the risks associated with committing a Type II error in this case.
c. Which type of error do you think the executives at La Quinta Motor Inns want to avoid? Explain.
d. How do changes in the rejection criterion affect the probabilities of committing Type I and Type II errors?

9.69 Webcredible, a UK-based consulting firm specializing in Web sites, intranets, mobile devices, and applications, conducted a survey of 1,132 mobile phone users between February and April 2009. The survey found that 52% of mobile phone users are now using the mobile Internet (data extracted from "Email and Social Networking Most Popular Mobile Internet Activities," www.webcredible.co.uk, May 13, 2009). The authors of the article imply that the survey proves that more than half of all mobile phone users are now using the mobile Internet.
a. Use the five-step p-value approach to hypothesis testing and a 0.05 level of significance to try to prove that more than half of all mobile phone users are now using the mobile Internet.
b. Based on your result in (a), is the claim implied by the authors valid?
c. Suppose the survey found that 53% of mobile phone users are now using the mobile Internet. Repeat parts (a) and (b).
d. Compare the results of (b) and (c).

9.70 The owner of a gasoline station wants to study gasoline purchasing habits by motorists at his station. He selects a random sample of 60 motorists during a certain week, with the following results:
• The amount purchased was X̄ = 11.3 gallons, S = 3.1 gallons.
• Eleven motorists purchased premium-grade gasoline.
a. At the 0.05 level of significance, is there evidence that the population mean purchase was different from 10 gallons?
b. Determine the p-value in (a).
c. At the 0.05 level of significance, is there evidence that less than 20% of all the motorists at the station purchased premium-grade gasoline?
d. What is your answer to (a) if the sample mean equals 10.3 gallons?
e. What is your answer to (c) if 7 motorists purchased premium-grade gasoline?

9.71 An auditor for a government agency is assigned the task of evaluating reimbursement for office visits to physicians paid by Medicare. The audit was conducted on a sample of 75 of the reimbursements, with the following results:
• In 12 of the office visits, an incorrect amount of reimbursement was provided.
• The amount of reimbursement was X̄ = $93.70, S = $34.55.
a. At the 0.05 level of significance, is there evidence that the population mean reimbursement was less than $100?
b. At the 0.05 level of significance, is there evidence that the proportion of incorrect reimbursements in the population was greater than 0.10?
c. Discuss the underlying assumptions of the test used in (a).
d. What is your answer to (a) if the sample mean equals $90?
e. What is your answer to (b) if 15 office visits had incorrect reimbursements?
10.1 Comparing the Means of Two Independent Populations 343
THINK ABOUT THIS: "This Call May Be Monitored ..."

If you have ever used a telephone to seek customer service, at least once you've probably heard a message that begins "this call may be monitored ..." Most of the time the message explains that the monitoring is for "quality assurance purposes," but do companies really monitor your calls to improve quality?

From one of our previous students, we've learned that a certain large financial corporation really does monitor calls for quality purposes. This student was asked to develop an improved training program for a call center that was hiring people to answer phone calls customers make about outstanding loans. For feedback and evaluation, she planned to randomly select phone calls received by each new employee and rate the employee on 10 aspects of the call, including whether the employee maintained a pleasant tone with the customer.

Who You Gonna Call?

Our previous student presented her plan to her boss for approval, but her boss, remembering the words of a famous statistician, said, "In God we trust, all others must bring data." That is, her boss wanted proof that her new training program would improve customer service. Faced with such a request, who would you call? She called one of us. "Hey, Professor, you'll never believe why I called. I work for a large company, and in the project I am currently working on, I have to put some of the statistics you taught us to work! Can you help?" The answer was "yes," and together they formulated this test:

• Randomly assign the 60 most recent hires to two training programs. Half would go through the preexisting training program, and half would be trained using the new program.
• At the end of the first month, compare the mean score for the 30 employees in the new training program against the mean score for the 30 employees in the preexisting training program.

She listened as her professor explained, "What you are trying to prove is that the mean score from the new training program is higher than the mean score from the current program. You can make the null hypothesis that the means are equal and see if you can reject it in favor of the alternative that the mean score from the new program is higher."

"Or, as you used to say, 'if the p-value is low, H0 must go!' Yes, I do remember!" she replied. Her professor chuckled and said, "Yes, that's correct. And if you can reject H0, you will have the proof to present to your boss." She thanked him for his help and got back to work, with the newfound confidence that she would be able to successfully apply the t test that compares the means of two independent populations.
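
The comparison described in this scenario can be sketched as a pooled-variance t statistic computed from summary statistics. The sketch assumes the usual pooled-variance formulas, with the pooled variance equal to [(n1 − 1)S1² + (n2 − 1)S2²] / (n1 + n2 − 2) and n1 + n2 − 2 degrees of freedom. The function name and the mean scores and standard deviations are hypothetical; only the group sizes of 30 come from the scenario.

```python
import math


def pooled_variance_t(mean1, std1, n1, mean2, std2, n2):
    """Return (tSTAT, degrees of freedom) for the pooled-variance t test
    of the difference between two independent sample means."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * std1 ** 2 + (n2 - 1) * std2 ** 2) / df
    standard_error = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return (mean1 - mean2) / standard_error, df


# Hypothetical call-quality scores: group 1 = new training program,
# group 2 = preexisting program, 30 employees in each group.
t, df = pooled_variance_t(mean1=8.0, std1=1.0, n1=30,
                          mean2=7.5, std2=1.0, n2=30)
print(f"tSTAT = {t:.3f} with {df} degrees of freedom")
```

Because the alternative hypothesis in the scenario is that the new program scores higher, the resulting tSTAT would be compared with an upper-tail critical value from the t distribution with 58 degrees of freedom.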

Problems for Section 10.1

LEARNING THE BASICS
10.1 If you have samples of $n_1 = 12$ and $n_2 = 15$, in performing the pooled-variance t test, how many degrees of freedom do you have?

10.2 Assume that you have a sample of $n_1 = 8$, with the sample mean $\bar{X}_1 = 42$, and a sample standard deviation $S_1 = 4$, and you have an independent sample of $n_2 = 15$ from another population with a sample mean of $\bar{X}_2 = 34$ and a sample standard deviation $S_2 = 5$.
a. What is the value of the pooled-variance $t_{STAT}$ test statistic for testing $H_0: \mu_1 = \mu_2$?
b. In finding the critical value $t_{\alpha/2}$, how many degrees of freedom are there?
c. Using the level of significance $\alpha = 0.01$, what is the critical value for a one-tail test of the hypothesis $H_0: \mu_1 \le \mu_2$ against the alternative, $H_1: \mu_1 > \mu_2$?
d. What is your statistical decision?

10.3 What assumptions about the two populations are necessary in Problem 10.2?

10.4 Referring to Problem 10.2, construct a 95% confidence interval estimate of the population mean difference between $\mu_1$ and $\mu_2$.

10.5 Referring to Problem 10.2, if $n_1 = 5$ and $n_2 = 4$, how many degrees of freedom do you have?

10.6 Referring to Problem 10.2, if $n_1 = 5$ and $n_2 = 4$, at the 0.01 level of significance, is there evidence that $\mu_1 > \mu_2$?

APPLYING THE CONCEPTS
10.7 According to a recent study, when shopping online for luxury goods, men spend a mean of $2,401, whereas women spend a mean of $1,527 (data extracted from R. A. Smith, "Fashion Online: Retailers Tackle the Gender Gap," The Wall Street Journal, March 13, 2008, pp. D1, D10). Suppose that the study was based on a sample of 600 men and 700 females, and the standard deviation of the amount spent was $1,200 for men and $1,000 for women.
a. State the null and alternative hypothesis if you want to determine whether the mean amount spent is higher for men than for women.
b. In the context of this study, what is the meaning of the Type I error?
c. In the context of this study, what is the meaning of the Type II error?
d. At the 0.01 level of significance, is there evidence that the mean amount spent is higher for men than for women?

10.8 A recent study ("Snack Ads Spur Children to Eat More," The New York Times, July 20, 2009, p. B3) found that children who watched a cartoon with food advertising ate, on average, 28.5 grams of Goldfish crackers as compared to an average 19.7 grams of Goldfish crackers for children who watched a cartoon without food advertising. Although there were 118 children in the study, neither the sample size in each group nor the sample standard deviations were reported. Suppose that there were 59 children in each group, and the sample standard deviation for those children who
352 CHAPTER 10 Two-Sample Tests

Because $t_{STAT} = -3.0448$ is less than $-1.8331$, you reject the null hypothesis, $H_0$ (the p-value is 0.0070 < 0.05). There is evidence that the mean delivery time is lower for the local pizza restaurant than for the national pizza chain.
This conclusion is different from the one you reached in Example 10.1 on page 340 when you used the pooled-variance t test for these data. By pairing the delivery times, you are able to focus on the differences between the two pizza delivery services and not the variability created by ordering pizzas at different times of day. The paired t test is a more powerful statistical procedure that is better able to detect the difference between the two pizza delivery services, because you are controlling for the time of day they were ordered.

Confidence Interval Estimate for the Mean Difference

Instead of, or in addition to, testing for the difference between the means of two related populations, you can use Equation (10.4) to construct a confidence interval estimate for the mean difference.

CONFIDENCE INTERVAL ESTIMATE FOR THE MEAN DIFFERENCE

$$\bar{D} \pm t_{\alpha/2}\frac{S_D}{\sqrt{n}}$$

or

$$\bar{D} - t_{\alpha/2}\frac{S_D}{\sqrt{n}} \le \mu_D \le \bar{D} + t_{\alpha/2}\frac{S_D}{\sqrt{n}} \qquad (10.4)$$

where $t_{\alpha/2}$ is the critical value of the t distribution, with $n - 1$ degrees of freedom, for an area of $\alpha/2$ in the upper tail.

Return to the example comparing mileage generated by real-life driving and by government standards on page 348. Using Equation (10.4), $\bar{D} = -2.3444$, $S_D = 2.8936$, $n = 9$, and $t_{\alpha/2} = 2.306$ (for 95% confidence and $n - 1 = 8$ degrees of freedom),

$$-2.3444 \pm (2.306)\frac{2.8936}{\sqrt{9}}$$

$$-2.3444 \pm 2.2242$$

$$-4.5686 \le \mu_D \le -0.1202$$

Thus, with 95% confidence, the mean difference in gasoline mileage between the real-life driving done by an AAA member and the driving done according to government standards is between -4.5686 and -0.1202 miles per gallon. Because the interval estimate contains only values less than zero, you can conclude that there is a difference in the population means. The mean miles per gallon for the real-life driving done by an AAA member is less than the mean miles per gallon for the driving done according to government standards.
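
The arithmetic in this example can be reproduced with a few lines of Python. This is only a sketch: the function name `paired_mean_ci` is an illustrative choice, and the critical value 2.306 is taken from the text rather than computed from the t distribution.

```python
import math


def paired_mean_ci(d_bar, s_d, n, t_crit):
    """Return (lower, upper) for D-bar +/- t_crit * S_D / sqrt(n), as in Equation (10.4)."""
    margin = t_crit * s_d / math.sqrt(n)
    return d_bar - margin, d_bar + margin


# Values quoted in the mileage example: D-bar, S_D, n, and t for 8 degrees of freedom.
lower, upper = paired_mean_ci(d_bar=-2.3444, s_d=2.8936, n=9, t_crit=2.306)
print(f"{lower:.4f} <= mu_D <= {upper:.4f}")
```

Because both limits are negative, the interval excludes zero, which is the basis for the conclusion stated above.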

Problems for Section 10.2

LEARNING THE BASICS
10.18 An experimental design for a paired t test has 20 pairs of identical twins. How many degrees of freedom are there in this t test?

10.19 Fifteen volunteers are recruited to participate in an experiment. A measurement is made (such as blood pressure) before each volunteer is asked to read a particularly upsetting passage from a book and after each volunteer reads the passage from the book. In the analysis of the data collected from this experiment, how many degrees of freedom are there in the test?

APPLYING THE CONCEPTS
10.20 Nine experts rated two brands of Colombian coffee in a taste-testing experiment. A rating on a 7-point scale (1 = extremely unpleasing, 7 = extremely pleasing) is given for each of four characteristics: taste, aroma, richness, and acidity. The following data (stored in the accompanying data file) display the summated ratings, accumulated over all four characteristics.

              BRAND
EXPERT      A      B
C.C.       24     26
S.E.       27     27
E.G.       19     22
B.L.       24     27
C.M.       22     25
C.N.       26     27
G.N.       27     26
R.M.       25     27
P.V.       22     23

a. At the 0.05 level of significance, is there evidence of a difference in the mean summated ratings between the two brands?
b. What assumption is necessary about the population distribution in order to perform this test?
c. Determine the p-value in (a) and interpret its meaning.
d. Construct and interpret a 95% confidence interval estimate of the difference in the mean summated ratings between the two brands.

10.21 In industrial settings, alternative methods often exist for measuring variables of interest. The data in the accompanying data file (coded to maintain confidentiality) represent measurements in-line that were collected from an analyzer during the production process and from an analytical lab (data extracted from M. Leitnaker, "Comparing Measurement Processes: In-line Versus Analytical Measurements," Quality Engineering, 13, 2000-2001, pp. 293-298).
a. At the 0.05 level of significance, is there evidence of a difference in the mean measurements in-line and from an analytical lab?
b. What assumption is necessary about the population distribution in order to perform this test?
c. Use a graphical method to evaluate the validity of the assumption in (b).
d. Construct and interpret a 95% confidence interval estimate of the difference in the mean measurements in-line and from an analytical lab.

10.22 Can students save money by comparison shopping for textbooks at Amazon.com? To investigate this possibility, a random sample of 19 textbooks used during the Spring 2009 semester at Miami University was selected. The prices for these textbooks at both a local bookstore and through Amazon.com were recorded. The prices for the textbooks are stored in the accompanying data file.
a. At the 0.01 level of significance, is there evidence of a difference between the mean price of textbooks at the local bookstore and Amazon.com?
b. What assumption is necessary about the population distribution in order to perform this test?
c. Construct a 99% confidence interval estimate of the mean difference in price. Interpret the interval.
d. Compare the results of (a) and (c).

10.23 In tough economic times, magazines and other media have trouble selling advertisements. Thus, one indicator of a weak economy is a reduction in the number of magazine pages devoted to advertisements. The accompanying data file contains the number of pages devoted to advertisements in May 2008 and May 2009 for 12 men's magazines (extracted from W. Levith, "Magazine Monitor," Mediaweek, April 20, 2009, p. 53).
a. At the 0.05 level of significance, is there evidence that the mean number of pages devoted to advertisements in men's magazines was higher in May 2008 than in May 2009?
b. What assumption is necessary about the population distribution in order to perform this test?
c. Use a graphical method to evaluate the validity of the assumption in (b).
d. Construct and interpret a 95% confidence interval estimate of the difference in the mean number of pages devoted to advertisements in men's magazines between May 2008 and May 2009.

10.24 Multiple myeloma, or blood plasma cancer, is characterized by increased blood vessel formation (angiogenesis) in the bone marrow that is a predictive factor in survival. One treatment approach used for multiple myeloma is stem cell transplantation with the patient's own stem cells. The following data (stored in the accompanying data file) represent the bone marrow microvessel density for patients who had a complete response to the stem cell transplant (as measured by blood and urine tests). The measurements were taken immediately prior to the stem cell transplant and at the time the complete response was determined:

Patient   Before   After
1          158      284
2          189      214
3          202      101
4          353      227
5          416      290
6          426      176
7          441      290

Source: Data extracted from S. V. Rajkumar, R. Fonseca, T. E. Witzig, M. A. Gertz, and P. R. Greipp, "Bone Marrow Angiogenesis in Patients Achieving Complete Response After Stem Cell Transplantation for Multiple Myeloma," Leukemia, 1999, 13, pp. 469-472.
10.3 Comparing the Proportions of Two Independent Populations 359
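CONFIDENCE INTERVAL ESTIMATE FOR THE DIFFERENCE BETWEEN TWO PROPORTIONS

$$(p_1 - p_2) \pm Z_{\alpha/2}\sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}$$

or

$$(p_1 - p_2) - Z_{\alpha/2}\sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}} \le (\pi_1 - \pi_2) \le (p_1 - p_2) + Z_{\alpha/2}\sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}} \qquad (10.6)$$

Margin note: The COMPUTE worksheet of the Z Two Proportions workbook computes the confidence interval estimate for the difference between two proportions in columns D and E (not shown in Figure 10.12 on page 357).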

To construct a 95% confidence interval estimate for the population difference between the proportion of guests who would return to the Beachcomber and who would return to the Windsurfer, you use the results on page 356 or from Figure 10.12 on page 357:

$$p_1 = \frac{X_1}{n_1} = \frac{163}{227} = 0.7181 \qquad p_2 = \frac{X_2}{n_2} = \frac{154}{262} = 0.5878$$

Using Equation (10.6),

$$(0.7181 - 0.5878) \pm (1.96)\sqrt{\frac{0.7181(1 - 0.7181)}{227} + \frac{0.5878(1 - 0.5878)}{262}}$$

$$0.1303 \pm (1.96)(0.0426)$$

$$0.1303 \pm 0.0835$$

$$0.0468 \le (\pi_1 - \pi_2) \le 0.2138$$

Thus, you have 95% confidence that the difference between the population proportion of guests who would return to the Beachcomber and the Windsurfer is between 0.0468 and 0.2138. In percentages, the difference is between 4.68% and 21.38%. Guest satisfaction is higher at the Beachcomber than at the Windsurfer.
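
The same interval can be checked numerically. The sketch below assumes the counts quoted in the example (163 of 227 and 154 of 262) and takes $Z_{\alpha/2} = 1.96$ from the text. The function name `two_proportion_ci` is illustrative. Because the code does not round intermediate results the way the hand calculation does, its limits may differ from the printed ones in the fourth decimal place.

```python
import math


def two_proportion_ci(x1, n1, x2, n2, z_crit):
    """Confidence interval for pi_1 - pi_2 using the unpooled standard error of Equation (10.6)."""
    p1 = x1 / n1
    p2 = x2 / n2
    std_error = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z_crit * std_error, diff + z_crit * std_error


# Beachcomber (sample 1) versus Windsurfer (sample 2), 95% confidence.
lower, upper = two_proportion_ci(x1=163, n1=227, x2=154, n2=262, z_crit=1.96)
print(f"{lower:.4f} <= (pi_1 - pi_2) <= {upper:.4f}")
```

An interval that lies entirely above zero is what supports the statement that the return proportion is higher for the Beachcomber.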

Problems for Section 10.3

LEARNING THE BASICS
10.27 Let $n_1 = 100$, $X_1 = 50$, $n_2 = 100$, and $X_2 = 30$.
a. At the 0.05 level of significance, is there evidence of a significant difference between the two population proportions?
b. Construct a 95% confidence interval estimate for the difference between the two population proportions.

10.28 Let $n_1 = 100$, $X_1 = 45$, $n_2 = 50$, and $X_2 = 25$.
a. At the 0.01 level of significance, is there evidence of a significant difference between the two population proportions?
b. Construct a 99% confidence interval estimate for the difference between the two population proportions.

APPLYING THE CONCEPTS
10.29 A survey of 500 shoppers was taken in a large metropolitan area to determine various information about consumer behavior. Among the questions asked was, "Do you enjoy shopping for clothing?" Of 240 males, 136 answered yes. Of 260 females, 224 answered yes.
a. Is there evidence of a significant difference between males and females in the proportion who enjoy shopping for clothing at the 0.01 level of significance?
b. Find the p-value in (a) and interpret its meaning.
c. Construct and interpret a 99% confidence interval estimate for the difference between the proportion of males and females who enjoy shopping for clothing.
d. What are your answers to (a) through (c) if 206 males enjoyed shopping for clothing?

10.30 A study funded by the Massachusetts Institute of Technology tested the notion that even when it comes to sugar pills, some people think a costly one works better than a cheap one. Researchers randomly divided 82 healthy paid volunteers into two groups. All the volunteers thought they would be testing a new pain reliever. One group was told the pain reliever they would be using cost $2.50 a pill, and the other group was told it cost only 10 cents a pill. In reality, the pills they were all about to take were simply sugar pills. The volunteers were given a light electric shock on the wrist. Then the volunteers were given a sugar pill, and a short time later they were shocked again. Of the volunteers who took the expensive pill, 35 of the 41 said they felt less pain afterward. Of the volunteers who took the cheap pill, 25 of the 41 said they felt less pain afterward (data extracted from R. Rubin, "Placebo Study Tests 'Costlier Is Better' Notion," www.usatoday.com, March 5, 2008).
a. Set up the null and alternative hypotheses to try to prove that people think an expensive pill works better than a cheap pill.
b. Conduct the hypothesis test defined in (a), using the 0.05 level of significance.
c. Does the result of your test in (b) make it appropriate to claim that people think an expensive pill works better than a cheap pill?

10.31 Some people enjoy the anticipation of an upcoming product or event and prefer to pay in advance and delay the actual consumption/delivery date. In other cases, people do not want a delay. An article in the Journal of Marketing Research reported on an experiment in which 50 individuals were told that they had just purchased a ticket to a concert and 50 were told that they had just purchased a personal digital assistant (PDA). The participants were then asked to indicate their preferences for attending the concert or receiving the PDA. Did they prefer tonight or tomorrow, or would they prefer to wait two to four weeks? The individuals were told to ignore their schedule constraints in order to better measure their willingness to delay the consumption/delivery of their purchase. The following table gives partial results of the study:

                        Concert   PDA
Tonight or tomorrow        28      47
Two to four weeks          22       3
Total                      50      50

Source: Data adapted from O. Amir and D. Ariely, "Decisions by Rules: The Case of Unwillingness to Pay for Beneficial Delays," Journal of Marketing Research, February 2007, Vol. XLIV, pp. 142-152.

a. What proportion of the participants would prefer delaying the date of the concert?
b. What proportion of the participants would prefer delaying receipt of a new PDA?
c. Using the 0.05 level of significance, is there evidence of a significant difference in the proportion willing to delay the date of the concert and the proportion willing to delay receipt of a new PDA?

10.32 Do people of different age groups differ in their response to e-mail messages? A survey by the Center for the Digital Future of the University of Southern California (data extracted from A. Mindlin, "Older E-mail Users Favor Fast Replies," The New York Times, July 14, 2008, p. B3) reported that 70.7% of users over 70 years of age believe that e-mail messages should be answered quickly as compared to 53.6% of users 12 to 50 years old. Suppose that the survey was based on 1,000 users over 70 years of age and 1,000 users 12 to 50 years old.
a. At the 0.01 level of significance, is there evidence of a significant difference between the two age groups that believe that e-mail messages should be answered quickly?
b. Find the p-value in (a) and interpret its meaning.

10.33 Are women more risk averse in the stock market? A sample of men and women were asked the following question: "If both the stock market and a stock you owned dropped 25% in three months, would you buy more shares while the price is low?" (data extracted from "Women Are More Risk Averse in the Stock Market," USA Today, September 25, 2006, p. 1C). Of 965 women, 338 said yes. Of 1,066 men, 554 said yes.
a. At the 0.05 level of significance, is there evidence that the proportion of women who would buy more shares while the price is low is less than the proportion of men?
b. Find the p-value in (a) and interpret its meaning.

10.34 An experiment was conducted to study the choices made in mutual fund selection. Undergraduate and MBA students were presented with different S&P 500 index funds that were identical except for fees. Suppose that 100 undergraduate students and 100 MBA students were selected. Partial results are shown in the following table:

                              STUDENT GROUP
FUND                      Undergraduate   MBA
Highest-cost fund               27         18
Not-highest-cost fund           73         82

Source: Data extracted from J. Choi, D. Laibson, and B. Madrian, "Why Does the Law of One Practice Fail? An Experiment on Mutual Funds," www.som.yale.edu/faculty/jjc83/fees.pdf.

a. At the 0.05 level of significance, is there evidence of a difference between undergraduate and MBA students in the proportion who selected the highest-cost fund?
b. Find the p-value in (a) and interpret its meaning.

10.35 Where people turn for news is different for various age groups (data extracted from P. Johnson, "Young People Turn to the Web for News," USA Today, March 23, 2006, p. 9D). Suppose that a study conducted on this issue was based on 200 respondents who were between the ages of 36 and 50 and 200 respondents who were above age 50. Of the 200 respondents who were between the ages of 36 and 50, 82 got their news primarily from newspapers. Of the 200 respondents who were above age 50, 104 got their news primarily from newspapers.
a. Is there evidence of a significant difference in the proportion that get their news primarily from newspapers between those respondents 36 to 50 years old and those above 50 years old? (Use $\alpha = 0.05$.)
b. Determine the p-value in (a) and interpret its meaning.
c. Construct and interpret a 95% confidence interval estimate for the difference between the population proportion of respondents who get their news primarily from newspapers between those respondents 36 to 50 years old and those above 50 years old.

10.4 F Test for the Ratio of Two Variances

Often you need to determine whether two independent populations have the same variability. By testing variances, you can detect differences in the variability in two independent populations. One important reason to test for the difference between the variances of two populations is to determine whether to use the pooled-variance t test (which assumes equal variances) or the separate-variance t test (which does not assume equal variances) while comparing the means of two independent populations.

The test for the difference between the variances of two independent populations is based on the ratio of the two sample variances. If you assume that each population is normally distributed, then the ratio $S_1^2/S_2^2$ follows the F distribution (see Table E.5). The critical values of the F distribution in Table E.5 depend on the degrees of freedom in the two samples. The degrees of freedom in the numerator of the ratio are for the first sample, and the degrees of freedom in the denominator are for the second sample. The first sample taken from the first population is defined as the sample that has the larger sample variance. The second sample taken from the second population is the sample with the smaller sample variance. Equation (10.7) defines the F test for the ratio of two variances.

F-TEST STATISTIC FOR TESTING THE RATIO OF TWO VARIANCES
The $F_{STAT}$ test statistic is equal to the variance of sample 1 (the larger sample variance) divided by the variance of sample 2 (the smaller sample variance).

$$F_{STAT} = \frac{S_1^2}{S_2^2} \qquad (10.7)$$

where
$S_1^2$ = variance of sample 1 (the larger sample variance)
$S_2^2$ = variance of sample 2 (the smaller sample variance)
$n_1$ = size of sample 1
$n_2$ = size of sample 2
$n_1 - 1$ = degrees of freedom from sample 1 (i.e., the numerator degrees of freedom)
$n_2 - 1$ = degrees of freedom from sample 2 (i.e., the denominator degrees of freedom)

The $F_{STAT}$ test statistic follows an F distribution with $n_1 - 1$ and $n_2 - 1$ degrees of freedom.

For a given level of significance, $\alpha$, to test the null hypothesis of equality of population variances:

$$H_0: \sigma_1^2 = \sigma_2^2$$

against the alternative hypothesis that the two population variances are not equal:

$$H_1: \sigma_1^2 \ne \sigma_2^2$$
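
As a rough illustration of Equation (10.7), the sketch below computes $F_{STAT}$ from two sample standard deviations and places the larger sample variance in the numerator, as the definition requires. The function name `f_stat` and the sample values (standard deviations 3.0 and 6.0, sizes 10 and 21) are hypothetical and are not taken from the text. The comparison with a critical value from Table E.5 is left out.

```python
def f_stat(s_a, n_a, s_b, n_b):
    """Return (F_STAT, numerator df, denominator df) for two samples.

    Whichever sample has the larger variance is treated as "sample 1",
    so the returned statistic is never less than 1.
    """
    var_a = s_a ** 2
    var_b = s_b ** 2
    if var_a >= var_b:
        return var_a / var_b, n_a - 1, n_b - 1
    return var_b / var_a, n_b - 1, n_a - 1


# Hypothetical samples: the second has the larger variance, so it becomes the numerator.
f_value, df_num, df_den = f_stat(s_a=3.0, n_a=10, s_b=6.0, n_b=21)
print(f_value, df_num, df_den)
```

Swapping the two samples in the call gives the same result, because the ordering is decided by the variances rather than by argument position.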
Chapter Review Problems 369

b. Define a Type I and Type II error for the hypotheses in (a).
c. What type of statistical test should you use?
d. What assumptions are needed to perform the test you selected?
e. Repeat (a) through (d) for research hypothesis 2.

10.59 A study conducted in March 2009 found that about half of U.S. adults trusted the U.S. government more than U.S. business to solve the economic problems of the United States. However, when the population is subdivided by political party affiliation, the results are very different. The study showed that 72% of Democrats trusted the government more, but only 29% of Republicans trusted the government more. Suppose that you are in charge of updating the study. You will take a national sample of Democrats and a national sample of Republicans and then try to use the results to show statistical evidence that the proportion of Democrats trusting the government more than business is greater than the proportion of Republicans trusting the government more than business.
a. What are the null and alternative hypotheses?
b. What is a Type I error in the context of this study?
c. What is a Type II error in the context of this study?

10.60 The American Society for Quality (ASQ) conducted a salary survey of all its members. ASQ members work in all areas of manufacturing and service-related institutions, with a common theme of an interest in quality. Two job titles associated with high salaries are manager and master black belt. (In Section 17.7, you will learn that a master black belt is a person who takes a leadership and training role in a Six Sigma quality improvement initiative.) Descriptive statistics concerning salaries for these two job titles are given in the following table:

                      Sample               Standard
Job Title             Size      Mean       Deviation
Manager               2,228     85,551     24,109
Master black belt       134    113,385     24,738

Source: Data extracted from I. E. Allen, "Salary Survey: Seeing Green," Quality Progress, December 2008, pp. 20-53.

a. Using a 0.05 level of significance, is there a difference in the variability of salaries between managers and master black belts?
b. Based on the result of (a), which t test defined in Section 10.1 is appropriate for comparing mean salaries?
c. Using a 0.05 level of significance, is there a difference between the mean salary of managers and the mean salary of master black belts?

10.61 Do male and female students study the same amount per week? In 2007, 58 sophomore business students were surveyed at a large university that has more than 1,000 sophomore business students each year. The accompanying data file contains the gender and the number of hours spent studying in a typical week for the sampled students.
a. At the 0.05 level of significance, is there a difference in the variance of the study time for male students and female students?
b. Using the results of (a), which t test is appropriate for comparing the mean study time for male and female students?
c. At the 0.05 level of significance, conduct the test selected in (b).
d. Write a short summary of your findings.

10.62 Two professors wanted to study how students from their two universities compared in their capabilities of using Excel spreadsheets in undergraduate information systems courses (data extracted from H. Howe and M. G. Simkin, "Factors Affecting the Ability to Detect Spreadsheet Errors," Decision Sciences Journal of Innovative Education, January 2006, pp. 101-122). A comparison of the student demographics was also performed. One school is a state university in the western United States, and the other school is a state university in the eastern United States. The following table contains information regarding the ages of the students:

            Sample              Standard
School      Size      Mean      Deviation
Western       93      23.28       6.29
Eastern      135      21.16       1.32

a. Using a 0.01 level of significance, is there evidence of a difference in the variances of the age of students at the western school and at the eastern school?
b. Discuss the practical implications of the test performed in (a). Address, specifically, the impact equal (or unequal) variances in age has on teaching an undergraduate information systems course.
c. To test for a difference in the mean age of students, is it most appropriate to use the pooled-variance t test or the separate-variance t test?

The following table contains information regarding the years of spreadsheet usage of the students:

            Sample                    Standard
School      Size      Mean Years      Deviation
Western       93         2.6            2.4
Eastern      135         4.0            2.1

d. Using a 0.01 level of significance, is there evidence of a difference in the variances of the years of spreadsheet usage of students at the western school and at the eastern school?
e. Based on the results of (d), use the most appropriate test to determine, at the 0.01 level of significance, whether there is evidence of a difference in the mean years of spreadsheet usage of students at the western school and at the eastern school.
13.2 Determining the Simple Linear Regression Equation 481

Problems for Section 13.2

LEARNING THE BASICS
13.1 Fitting a straight line to a set of data yields the following prediction line:

[equation not legible in the source]

a. Interpret the meaning of the Y intercept, $b_0$.
b. Interpret the meaning of the slope, $b_1$.
c. Predict the value of Y for X = 3.

13.2 If the values of X in Problem 13.1 range from 2 to 25, should you use this model to predict the mean value of Y when X equals
a. 3?
b. -3?
c. 0?
d. 24?

13.3 Fitting a straight line to a set of data yields the following prediction line:

$$\hat{Y}_i = 16 - 0.5X_i$$

a. Interpret the meaning of the Y intercept, $b_0$.
b. Interpret the meaning of the slope, $b_1$.
c. Predict the value of Y for X = 6.

APPLYING THE CONCEPTS
13.4 The marketing manager of a large supermarket chain would like to use shelf space to predict the sales of pet food. A random sample of 12 equal-sized stores is selected, with the following results (stored in the accompanying data file):

Store   Shelf Space (X) (Feet)   Weekly Sales (Y) ($)
1                5                      160
2                5                      220
3                5                      140
4               10                      190
5               10                      240
6               10                      260
7               15                      230
8               15                      270
9               15                      280
10              20                      260
11              20                      290
12              20                      310

a. Construct a scatter plot.
For these data, $b_0 = 145$ and $b_1 = 7.4$.
b. Interpret the meaning of the slope, $b_1$, in this problem.
c. Predict the weekly sales of pet food for stores with 8 feet of shelf space for pet food.

13.5 Circulation is the lifeblood of the publishing business. The larger the sales of a magazine, the more it can charge advertisers. However, a circulation gap has appeared between the publishers' reports of magazines' newsstand sales and subsequent audits by the Audit Bureau of Circulations. The accompanying data file contains the reported and audited newsstand yearly sales (in thousands) for the following 10 magazines:

Magazine      Reported (X)   Audited (Y)
YM               621.0          299.6
CosmoGirl        359.7          207.7
Rosie            530.0          325.0
Playboy          492.1          336.3
Esquire           70.5           48.6
TeenPeople       567.0          400.3
More             125.5           91.2
Spin              50.6           39.1
Vogue            353.3          268.6
Elle             263.6          214.3

Source: Data extracted from M. Rose, "In Fight for Ads, Publishers Often Overstate Their Sales," The Wall Street Journal, August 6, 2003, pp. A1, A10.

a. Construct a scatter plot.
For these data, $b_0 = 26.724$ and $b_1 = 0.5719$.
b. Interpret the meaning of the slope, $b_1$, in this problem.
c. Predict the audited newsstand sales for a magazine that reports newsstand sales of 400,000.

13.6 The owner of a moving company typically has his most experienced manager predict the total number of labor hours that will be required to complete an upcoming move. This approach has proved useful in the past, but the owner has the business objective of developing a more accurate method of predicting labor hours. In a preliminary effort to provide a more accurate method, the owner has decided to use the number of cubic feet moved as the independent variable and has collected data for 36 moves in which the origin and destination were within the borough of Manhattan in New York City and in which the travel time was an insignificant portion of the hours worked. The data are stored in the accompanying data file.
a. Construct a scatter plot.
b. Assuming a linear relationship, use the least-squares method to determine the regression coefficients $b_0$ and $b_1$.
c. Interpret the meaning of the slope, $b_1$, in this problem.
d. Predict the labor hours for moving 500 cubic feet.

13.7 A critically important aspect of customer service in a supermarket is the waiting time at the checkout (defined as the time the customer enters the line until he or she is served). Data were collected during time periods in which a constant number of checkout counters were open. The total number of customers in the store and the waiting times (in minutes) were recorded. The results are stored in the accompanying data file.
a. Construct a scatter plot.
b. Assuming a linear relationship, use the least-squares method to determine the regression coefficients $b_0$ and $b_1$.
c. Interpret the meaning of the slope, $b_1$, in this problem.
d. Predict the waiting time when there are 20 customers in the store.

13.8 The value of a sports franchise is directly related to the amount of revenue that a franchise can generate. The accompanying data file represents the value in 2009 (in millions of dollars) and the annual revenue (in millions of dollars) for the 30 major league baseball franchises. Suppose you want to develop a simple linear regression model to predict franchise value based on annual revenue generated.
a. Construct a scatter plot.
b. Use the least-squares method to determine the regression coefficients $b_0$ and $b_1$.
c. Interpret the meaning of $b_0$ and $b_1$ in this problem.
d. Predict the value of a baseball franchise that generates $200 million of annual revenue.

13.9 An agent for a residential real estate company in a large city would like to be able to predict the monthly rental cost for apartments, based on the size of an apartment, as defined by square footage. The agent selects a sample of 25 apartments in a particular residential neighborhood and gathers the data below (stored in the accompanying data file).
a. Construct a scatter plot.
b. Use the least-squares method to determine the regression coefficients $b_0$ and $b_1$.
c. Interpret the meaning of $b_0$ and $b_1$ in this problem.
d. Predict the monthly rent for an apartment that has 1,000 square feet.
e. Why would it not be appropriate to use the model to predict the monthly rent for apartments that have 500 square feet?
f. Your friends Jim and Jennifer are considering signing a lease for an apartment in this residential neighborhood. They are trying to decide between two apartments, one with 1,000 square feet for a monthly rent of $1,275 and the other with 1,200 square feet for a monthly rent of $1,425. Based on (a) through (d), which apartment do you think is a better deal?

13.10 A company that holds the DVD distribution rights to movies previously released only in theaters wants to estimate sales of DVDs based on box office success. The accompanying data file lists the box office gross (in $millions) for each of 30 movies and the number of DVDs sold (in thousands). For these data,
a. construct a scatter plot.
b. assuming a linear relationship, use the least-squares method to determine the regression coefficients $b_0$ and $b_1$.
c. interpret the meaning of the slope, $b_1$, in this problem.
d. predict the sales for a movie DVD that had a box office gross of $75 million.

Table for Problem 13.9

Apartment Monthly Rent ($) Size (Sq. Feet) Apartment Monthly Rent ($) Size (Sq. Feet)
1 950 850 14 1,800 1,369
2 1,600 1,450 15 1,400 1,175
3 1,200 1,085 16 1,450 1,225
4 1,500 1,232 17 1,100 1,245
5 950 718 18 1,700 1,259
6 1,700 1,485 19 1,200 1,150
7 1,650 1,136 20 1,150 896
8 935 726 21 1,600 1,361
9 875 700 22 1,650 1,040
10 1,150 956 23 1,200 755
11 1,400 1,100 24 800 1,000
12 1,650 1,285 25 1,750 1,200
13 2,300 1,985
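
The least-squares calculation that Problem 13.9 asks for in Excel can also be sketched in a few lines of Python. This is only an illustrative sketch, not the textbook's method: the helper name `least_squares` is hypothetical, and the data are typed in from the table for Problem 13.9 above, with size as the independent variable and monthly rent as the dependent variable.

```python
# Data typed in from the table for Problem 13.9 (apartments 1 through 25).
size = [850, 1450, 1085, 1232, 718, 1485, 1136, 726, 700, 956, 1100, 1285, 1985,
        1369, 1175, 1225, 1245, 1259, 1150, 896, 1361, 1040, 755, 1000, 1200]
rent = [950, 1600, 1200, 1500, 950, 1700, 1650, 935, 875, 1150, 1400, 1650, 2300,
        1800, 1400, 1450, 1100, 1700, 1200, 1150, 1600, 1650, 1200, 800, 1750]


def least_squares(x, y):
    """Return (b0, b1) for the prediction line Y-hat = b0 + b1 * X (hypothetical helper)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # b1 = SSXY / SSX, b0 = Y-bar - b1 * X-bar
    ssxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ssx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = ssxy / ssx
    b0 = y_bar - b1 * x_bar
    return b0, b1


b0, b1 = least_squares(size, rent)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(size, rent)]
predicted_rent_1000 = b0 + b1 * 1000  # prediction for a 1,000 square foot apartment

print(f"b0 = {b0:.2f}, b1 = {b1:.4f}")
print(f"Predicted rent for 1,000 sq. ft.: {predicted_rent_1000:.2f}")
```

The residuals of a least-squares line always sum to zero (up to rounding), which is a quick check that the coefficients were computed correctly.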

13.3 Measures of Variation


When using the least-squares method to determine the regression coefficients for a set of data,
you need to compute three measures of variation. The first measure, the total sum of squares
(SST), is a measure of variation of the Y_i values around their mean, Ȳ. The total variation, or
total sum of squares, is subdivided into explained variation and unexplained variation.
The explained variation, or regression sum of squares (SSR), represents variation due to the

This standard error of the estimate, equal to 0.9664 millions of dollars (i.e., $966,400), is
labeled Standard Error in the Figure 13.8 worksheet results. The standard error of the estimate
represents a measure of the variation around the prediction line. It is measured in the same
units as the dependent variable Y. The interpretation of the standard error of the estimate is
similar to that of the standard deviation. Just as the standard deviation measures variability
around the mean, the standard error of the estimate measures variability around the prediction
line. For Sunflowers Apparel, the typical difference between actual annual sales at a store and
the predicted annual sales using the regression equation is approximately $966,400.
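
The three measures of variation and the standard error of the estimate are tied together by a few standard simple linear regression identities: SST = SSR + SSE, r² = SSR / SST, and S_YX = sqrt(SSE / (n - 2)). The sketch below is illustrative only. The function name `variation_summary` is hypothetical, the SSR and SSE inputs are the ones given in Problem 13.12, and the sample size n = 10 is an assumption made just so the standard error can be shown.

```python
import math


def variation_summary(ssr, sse, n):
    """Return (SST, r squared, S_YX) for a simple linear regression (hypothetical helper)."""
    sst = ssr + sse                   # total variation = explained + unexplained
    r2 = ssr / sst                    # coefficient of determination
    s_yx = math.sqrt(sse / (n - 2))   # standard error of the estimate, n - 2 degrees of freedom
    return sst, r2, s_yx


# SSR = 36 and SSE = 4 as in Problem 13.12; n = 10 is a hypothetical sample size.
sst, r2, s_yx = variation_summary(ssr=36.0, sse=4.0, n=10)
print(f"SST = {sst}, r2 = {r2}, S_YX = {s_yx:.4f}")
```

Because S_YX is in the units of Y while r² is unitless, the two are usually reported together: r² says what share of the variation the model explains, and S_YX says how large a typical prediction error is.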

Problems for Section 13.3

LEARNING THE BASICS
13.11 How do you interpret a coefficient of determination,
r², equal to 0.80?

13.12 If SSR = 36 and SSE = 4, determine SST and then
compute the coefficient of determination, r², and interpret
its meaning.

13.13 If SSR = 66 and SST = 88, compute the coefficient
of determination, r², and interpret its meaning.

13.14 If SSE = 10 and SSR = 30, compute the coefficient
of determination, r², and interpret its meaning.

13.15 If SSR = 120, why is it impossible for SST to equal
[...]?

APPLYING THE CONCEPTS
13.16 In Problem 13.4 on page 481, the marketing
manager used shelf space for pet food to predict weekly
sales (stored in the accompanying data file). For those data,
SSR = 20,535 and SST = 30,025.
a. Determine the coefficient of determination, r², and
interpret its meaning.
b. Determine the standard error of the estimate.
c. How useful do you think this regression model is for
predicting sales?

13.17 In Problem 13.5 on page 481, you used reported
magazine newsstand sales to predict audited sales (stored in
the accompanying data file). For those data, SSR = 130,301.41
and SST = 144,538.64.
a. Determine the coefficient of determination, r², and
interpret its meaning.
b. Determine the standard error of the estimate.
c. How useful do you think this regression model is for
predicting audited sales?

13.18 In Problem 13.6 on page 481, an owner of a moving
company wanted to predict labor hours, based on the cubic
feet moved (stored in the accompanying data file). Using the
results of that problem,
a. determine the coefficient of determination, r², and
interpret its meaning.
b. determine the standard error of the estimate.
c. How useful do you think this regression model is for
predicting labor hours?

13.19 In Problem 13.7 on page 481, you used the number
of customers to predict the waiting time at the checkout line
in a supermarket (stored in the accompanying data file). Using
the results of that problem,
a. determine the coefficient of determination, r², and
interpret its meaning.
b. determine the standard error of the estimate.
c. How useful do you think this regression model is for
predicting the waiting time at the checkout line in a
supermarket?

13.20 In Problem 13.8 on page 482, you used annual
revenues to predict the value of a baseball franchise (stored in
the accompanying data file). Using the results of that problem,
a. determine the coefficient of determination, r², and
interpret its meaning.
b. determine the standard error of the estimate.
c. How useful do you think this regression model is for
predicting the value of a baseball franchise?

13.21 In Problem 13.9 on page 482, an agent for a real
estate company wanted to predict the monthly rent for
apartments, based on the size of the apartment (stored in the
accompanying data file). Using the results of that problem,
a. determine the coefficient of determination, r², and
interpret its meaning.
b. determine the standard error of the estimate.
c. How useful do you think this regression model is for
predicting the monthly rent?
d. Can you think of other variables that might explain the
variation in monthly rent?

13.22 In Problem 13.10 on page 482, you used box office
gross to predict sales of DVDs (stored in the accompanying
data file). Using the results of that problem,
a. determine the coefficient of determination, r², and
interpret its meaning.
b. determine the standard error of the estimate.
c. How useful do you think this regression model is for
predicting sales of DVDs?
d. Can you think of other variables that might explain the
variation in DVD sales?

Problems for Section 13.7

13.39 You are testing the null hypothesis that there is no
linear relationship between two variables, X and Y. From
your sample of n = 10, you determine that r = 0.80.
a. What is the value of the t test statistic t_STAT?
b. At the α = 0.05 level of significance, what are the
critical values?
c. Based on your answers to (a) and (b), what statistical
decision should you make?

13.40 You are testing the null hypothesis that there is no
linear relationship between two variables, X and Y. From
your sample of n = 18, you determine that b1 = +4.5 and
S_b1 = 1.5.
a. What is the value of t_STAT?
b. At the α = 0.05 level of significance, what are the
critical values?
c. Based on your answers to (a) and (b), what statistical
decision should you make?
d. Construct a 95% confidence interval estimate of the
population slope, β1.

13.41 You are testing the null hypothesis that there is no
linear relationship between two variables, X and Y. From
your sample of n = 20, you determine that SSR = 60 and
SSE = 40.
a. What is the value of F_STAT?
b. At the α = 0.05 level of significance, what is the critical
value?
c. Based on your answers to (a) and (b), what statistical
decision should you make?
d. Compute the correlation coefficient by first computing
r² and assuming that b1 is negative.
e. At the 0.05 level of significance, is there a significant
correlation between X and Y?

13.42 In Problem 13.4 on page 481, the marketing
manager used shelf space for pet food to predict
weekly sales. The data are stored in the accompanying data
file. From the results of that problem, b1 = 7.4 and S_b1 = 1.59.
a. At the 0.05 level of significance, is there evidence of a
linear relationship between shelf space and sales?
b. Construct a 95% confidence interval estimate of the
population slope, β1.

13.43 In Problem 13.5 on page 481, you used reported
magazine newsstand sales to predict audited sales. The data
are stored in the accompanying data file. Using the results of
that problem, b1 = 0.5719 and S_b1 = 0.0668.
a. At the 0.05 level of significance, is there evidence of a
linear relationship between reported sales and audited sales?
b. Construct a 95% confidence interval estimate of the
population slope, β1.

13.44 In Problem 13.6 on page 481, the owner of a moving
company wanted to predict labor hours, based on the number
of cubic feet moved. The data are stored in the accompanying
data file. Using the results of that problem,
a. at the 0.05 level of significance, is there evidence of a
linear relationship between the number of cubic feet
moved and labor hours?
b. construct a 95% confidence interval estimate of the
population slope, β1.

13.45 In Problem 13.7 on page 481, you used the number
of customers to predict the waiting time on the checkout
line. The data are stored in the accompanying data file. Using
the results of that problem,
a. at the 0.05 level of significance, is there evidence of a
linear relationship between the number of customers and
the waiting time on the checkout line?
b. construct a 95% confidence interval estimate of the
population slope, β1.

13.46 In Problem 13.8 on page 482, you used annual
revenues to predict the value of a baseball franchise. The data
are stored in the accompanying data file. Using the results of
that problem,
a. at the 0.05 level of significance, is there evidence of a
linear relationship between annual revenue and franchise
value?
b. construct a 95% confidence interval estimate of the
population slope, β1.

13.47 In Problem 13.9 on page 482, an agent for a real
estate company wanted to predict the monthly rent for
apartments, based on the size of the apartment. The data are
stored in the accompanying data file. Using the results of that
problem,
a. at the 0.05 level of significance, is there evidence of a
linear relationship between the size of the apartment and
the monthly rent?
b. construct a 95% confidence interval estimate of the
population slope, β1.

13.48 In Problem 13.10 on page 482, you used box office
gross to predict the sales of DVDs. The data are stored in
the accompanying data file. Using the results of that problem,
a. at the 0.05 level of significance, is there evidence of a
linear relationship between box office gross and sales of
DVDs?
b. construct a 95% confidence interval estimate of the
population slope, β1.

13.49 The volatility of a stock is often measured by its
beta value. You can estimate the beta value of a stock by
developing a simple linear regression model, using the
percentage weekly change in the stock as the dependent
variable and the percentage weekly change in a market index as
the independent variable. The S&P 500 Index is a common
index to use. For example, if you wanted to estimate the beta
for Disney, you could use the following model, which is
referred to as a market model:

(% weekly change in Disney) = β0
    + β1 (% weekly change in S&P 500 index) + ε

The least-squares regression estimate of the slope b1 is the
estimate of the beta value for Disney. A stock with a beta
value of 1.0 tends to move the same as the overall market. A
stock with a beta value of 1.5 tends to move 50% more than
the overall market, and a stock with a beta value of 0.6 tends
to move only 60% as much as the overall market. Stocks
with negative beta values tend to move in a direction
opposite that of the overall market. The following table gives
some beta values for some widely held stocks, using a year's
worth of data ending in May, 2009. Note that in the first 10
months of this time frame the S&P 500 lost approximately
40% of its value and then rebounded by about 10% in the
last 2 months:

Ticker Symbol    Beta
PG               0.54
T                0.73
DIS              1.10
AAPL             1.52
EBAY             1.69
F                2.86
Source: Data extracted from finance.yahoo.com, May 27, 2009.

a. For each of the six companies, interpret the beta value.
b. How can investors use the beta value as a guide for
investing?

13.50 Index funds are mutual funds that try to mimic the
movement of leading indexes, such as the S&P 500 or the
Russell 2000. The beta values (as described in Problem 13.49)
for these funds are therefore approximately 1.0, and the
estimated market models for these funds are approximately

(% weekly change in index fund) =
    0.0 + 1.0 (% weekly change in the index)

Leveraged index funds are designed to magnify the
movement of major indexes. Direxion Funds is a leading provider
of leveraged index and other alternative-class mutual fund
products for investment advisors and sophisticated investors.
Two of the company's most popular funds are shown in the
following table (extracted from www.direxionfunds.com,
January 7, 2009).

Name                     Ticker Symbol   Description
S&P 500 Bull 2.5x Fund   DXSLX           250% of the S&P 500 Index
China Bull 2x Fund       DXHLX           200% of the Xinhua China 25 Index

The estimated market models for these funds are
approximately

(% weekly change in DXSLX) = 0.0 + 2.5
    (% weekly change in the S&P 500)

(% weekly change in DXHLX) = 0.0 + 2.0
    (% weekly change in the Xinhua China 25)

Thus, if the S&P 500 Index gains 10% over a period of time,
the leveraged mutual fund DXSLX gains approximately
25%. On the downside, if the same index loses 20%,
DXSLX loses approximately 50%.
a. The objective of the Direxion Funds Small Cap Bull 2.5x
fund, DXRLX, is 250% of the performance of the Russell
2000 Index. What is its approximate market model?
b. If the Russell 2000 Index gains 10% in a year, what return
do you expect DXRLX to have?
c. If the Russell 2000 Index loses 20% in a year, what return
do you expect DXRLX to have?
d. What type of investors should be attracted to leveraged
index funds? What type of investors should stay away
from these funds?

13.51 The accompanying data file represents the calories and fat
(in grams) of 16-ounce iced coffee drinks at Dunkin' Donuts
and Starbucks:

Product                                               Calories   Fat
Dunkin' Donuts Iced Mocha Swirl latte (whole milk)    240         8.0
Starbucks Coffee Frappuccino blended coffee           260         3.5
Dunkin' Donuts Coffee Coolatta (cream)                350        22.0
Starbucks Iced Coffee Mocha Espresso (whole milk
  and whipped cream)                                  350        20.0
Starbucks Mocha Frappuccino blended coffee
  (whipped cream)                                     420        16.0
Starbucks Chocolate Brownie Frappuccino blended
  coffee (whipped cream)                              510        22.0
Starbucks Chocolate Frappuccino Blended Creme
  (whipped cream)                                     530        19.0
Source: Data extracted from "Coffee as Candy at Dunkin' Donuts and
Starbucks," Consumer Reports, June 2004, p. 9.

a. Compute and interpret the coefficient of correlation, r.
b. At the 0.05 level of significance, is there a significant
linear relationship between calories and fat?

13.52 There are several methods for calculating fuel economy.
The accompanying data file (data shown on page 504) contains
mileage as calculated by owners and by current government
standards:
a. Compute and interpret the coefficient of correlation, r.
b. At the 0.05 level of significance, is there a significant
linear relationship between the mileage as calculated by
owners and by current government standards?
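
The problems in this section rest on two closely related t tests: one for the slope, t_STAT = b1 / S_b1, and one for the correlation coefficient, t_STAT = r * sqrt(n - 2) / sqrt(1 - r²), each with n - 2 degrees of freedom. The sketch below is a hedged illustration rather than the book's Excel procedure. The function names are hypothetical, the slope inputs are those stated in Problem 13.40, the calories and fat values are typed in from the table in Problem 13.51, and the two critical values (2.1199 for 16 degrees of freedom, 2.5706 for 5 degrees of freedom) are assumed to be read from a t table for a two-tail test at the 0.05 level.

```python
import math


def slope_t_test(b1, s_b1, t_crit):
    """t test and confidence interval for the population slope (hypothetical helper)."""
    t_stat = b1 / s_b1
    ci = (b1 - t_crit * s_b1, b1 + t_crit * s_b1)
    reject_h0 = abs(t_stat) > t_crit
    return t_stat, ci, reject_h0


def correlation(x, y):
    """Sample coefficient of correlation r."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ssxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ssx = sum((xi - x_bar) ** 2 for xi in x)
    ssy = sum((yi - y_bar) ** 2 for yi in y)
    return ssxy / math.sqrt(ssx * ssy)


def correlation_t_stat(r, n):
    """t statistic for testing H0: no correlation, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


# Slope test with the inputs of Problem 13.40 (n = 18, so 16 degrees of freedom).
t_slope, ci_slope, reject_slope = slope_t_test(b1=4.5, s_b1=1.5, t_crit=2.1199)

# Correlation test with the iced coffee data of Problem 13.51 (n = 7, so 5 df).
calories = [240, 260, 350, 350, 420, 510, 530]
fat = [8.0, 3.5, 22.0, 20.0, 16.0, 22.0, 19.0]
r = correlation(calories, fat)
t_corr = correlation_t_stat(r, len(calories))
reject_corr = abs(t_corr) > 2.5706

print(f"slope: t = {t_slope:.2f}, 95% CI = ({ci_slope[0]:.4f}, {ci_slope[1]:.4f})")
print(f"correlation: r = {r:.4f}, t = {t_corr:.4f}, reject H0: {reject_corr}")
```

Passing the critical value in as a parameter keeps the sketch free of third-party dependencies; with a statistics library available, the critical value could be computed from the t distribution instead.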
13.8 Estimation of Mean Values and Prediction of Individual Values

FIGURE 13.21
Confidence interval estimate and prediction interval worksheet for the
Sunflowers Apparel data

Figure 13.21 displays the CIEandPI worksheet of the Simple Linear
Regression workbook. Create this worksheet using the instructions in
Section EG13.8.

Row  Column A                          Column B   Formula in column B
1    Confidence Interval Estimate and Prediction Interval
3    Data
4    X Value                           4
5    Confidence Level                  95%
7    Intermediate Calculations
8    Sample Size                       14         =COUNT(SLRData!A:A)
9    Degrees of Freedom                12         =B8 - 2
10   t Value                           2.1788     =TINV(1 - B5, B9)
11   Sample Mean                       2.9214     =AVERAGE(SLRData!A:A)
12   Sum of Squared Difference         37.9236    =DEVSQ(SLRData!A:A)
13   Standard Error of the Estimate    0.9664     =COMPUTE!B7
14   h Statistic                       0.1021     =1/B8 + (B4 - B11)^2/B12
15   Predicted Y (YHat)                7.6439     =TREND(SLRData!B2:B15, SLRData!A2:A15, B4)
17   For Average Y
18   Interval Half Width               0.6728     =B10 * B13 * SQRT(B14)
19   Confidence Interval Lower Limit   6.9711     =B15 - B18
20   Confidence Interval Upper Limit   8.3167     =B15 + B18
22   For Individual Response Y
23   Interval Half Width               2.2104     =B10 * B13 * SQRT(1 + B14)
24   Prediction Interval Lower Limit   5.4335     =B15 - B23
25   Prediction Interval Upper Limit   9.8544     =B15 + B23
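
The arithmetic behind the Figure 13.21 worksheet can be retraced outside Excel. The following is a minimal sketch, assuming the intermediate values are read from the worksheet: the sample mean (2.9214) and sum of squared differences (37.9236) are assumed readings of the scanned figures, chosen because they reproduce the interval limits shown, and the t value is hard-coded from the worksheet rather than computed.

```python
import math

# Values read from the Figure 13.21 worksheet (Sunflowers Apparel data).
n = 14              # sample size
x_value = 4.0       # X value at which to estimate
x_bar = 2.9214      # sample mean of X (assumed reading of the scanned value)
ssx = 37.9236       # sum of squared differences of X (assumed reading)
s_yx = 0.9664       # standard error of the estimate
y_hat = 7.6439      # predicted Y at X = 4
t_value = 2.1788    # two-tail 95% critical value for n - 2 = 12 degrees of freedom

# h statistic: 1/n + (X - X-bar)^2 / SSX
h = 1 / n + (x_value - x_bar) ** 2 / ssx

# Confidence interval for the mean response uses sqrt(h);
# prediction interval for an individual response uses sqrt(1 + h).
ci_half_width = t_value * s_yx * math.sqrt(h)
pi_half_width = t_value * s_yx * math.sqrt(1 + h)

confidence_interval = (y_hat - ci_half_width, y_hat + ci_half_width)
prediction_interval = (y_hat - pi_half_width, y_hat + pi_half_width)

print(f"h = {h:.4f}")
print(f"95% CI for mean Y:       {confidence_interval[0]:.4f} to {confidence_interval[1]:.4f}")
print(f"95% PI for individual Y: {prediction_interval[0]:.4f} to {prediction_interval[1]:.4f}")
```

The only difference between the two half widths is the extra 1 under the square root, which is why the prediction interval for an individual response is always wider than the confidence interval for the mean at the same X.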

Problems for Section 13.8

LEARNING THE BASICS
13.55 Based on a sample of n = 20, the least-squares
method was used to develop the following prediction line:
Ŷ_i = 5 + 3X_i. In addition,

S_YX = 1.0    X̄ = 2    Σ(X_i - X̄)² = 20 (sum over i = 1 to n)

a. Construct a 95% confidence interval estimate of the
population mean response for X = 2.
b. Construct a 95% prediction interval of an individual
response for X = 2.

13.56 Based on a sample of n = 20, the least-squares
method was used to develop the following prediction line:
Ŷ_i = 5 + 3X_i. In addition,

S_YX = 1.0    X̄ = 2    Σ(X_i - X̄)² = 20 (sum over i = 1 to n)

a. Construct a 95% confidence interval estimate of the
population mean response for X = 4.
b. Construct a 95% prediction interval of an individual
response for X = 4.
c. Compare the results of (a) and (b) with those of Problem
13.55 (a) and (b). Which interval is wider? Why?

APPLYING THE CONCEPTS
13.57 In Problem 13.5 on page 481, you used reported
sales to predict audited sales of magazines. The data are
stored in the accompanying data file. For these data
S_YX = 42.186 and h_i = 0.108 when X = 400.
a. Construct a 95% confidence interval estimate of the
mean audited sales for magazines that report newsstand
sales of 400,000.
b. Construct a 95% prediction interval of the audited sales
for an individual magazine that reports newsstand sales
of 400,000.
c. Explain the difference in the results in (a) and (b).

13.58 In Problem 13.4 on page 481, the marketing
manager used shelf space for pet food to predict
weekly sales. The data are stored in the accompanying data
file. For these data S_YX = 30.81 and h_i = 0.1373 when X = 8.
a. Construct a 95% confidence interval estimate of the
mean weekly sales for all stores that have 8 feet of shelf
space for pet food.
b. Construct a 95% prediction interval of the weekly sales
of an individual store that has 8 feet of shelf space for pet
food.
c. Explain the difference in the results in (a) and (b).
Chapter Review Problems

Internet site one or more times between classes, the student
was given a 1 for that time period. Because there were 13
time periods, a student's score on hit consistency could
range from 0 to 13.
The other three variables included the student's course
average, the student's cumulative grade point average
(GPA), and the total number of hits the student had on the
Internet site supporting the course. The following table gives
the correlation coefficient for all pairs of variables. Note
that correlations marked with an * are statistically
significant, using α = 0.001:

Variable                            Correlation
Course Average, Cumulative GPA      0.72*
Course Average, Total Hits          0.08
Course Average, Hit Consistency     0.37*
Cumulative GPA, Total Hits          0.12
Cumulative GPA, Hit Consistency     0.32*
Total Hits & Hit Consistency        0.64*
Source: Data extracted from D. Baugher, A. Varanelli, and E. Weisbord,
"Student Hits in an Internet-Supported Course: How Can Instructors
Use Them and What Do They Mean?" Decision Sciences Journal of
Innovative Education, 1 (Fall 2003), 159-179.

a. What conclusions can you reach from this correlation
analysis?
b. Are you surprised by the results, or are they consistent
with your own observations and experiences?

13.74 Management of a soft-drink bottling company has
the business objective of developing a method for allocating
delivery costs to customers. Although one cost clearly
relates to travel time within a particular route, another
variable cost reflects the time required to unload the cases of
soft drink at the delivery point. To begin, management
decided to develop a regression model to predict delivery
time based on the number of cases delivered. A sample of
20 deliveries within a territory was selected. The delivery
times and the number of cases delivered were organized in
the following table (and stored in the accompanying data file):

Customer   Number of Cases   Delivery Time (Minutes)
1           52               32.1
2           64               34.8
3           73               36.2
4           85               37.8
5           95               37.8
6          103               39.7
7          116               38.5
8          121               41.9
9          143               44.2
10         157               47.1
11         161               43.0
12         184               49.4
13         202               57.2
14         218               56.8
15         243               60.6
16         254               61.2
17         267               58.2
18         275               63.1
19         287               65.6
20         298               67.3

a. Use the least-squares method to compute the regression
coefficients b0 and b1.
b. Interpret the meaning of b0 and b1 in this problem.
c. Predict the delivery time for 150 cases of soft drink.
d. Should you use the model to predict the delivery time for
a customer who is receiving 500 cases of soft drink? Why
or why not?
e. Determine the coefficient of determination, r², and
explain its meaning in this problem.
f. Perform a residual analysis. Is there any evidence of a
pattern in the residuals? Explain.
g. At the 0.05 level of significance, is there evidence of a
linear relationship between delivery time and the number
of cases delivered?
h. Construct a 95% confidence interval estimate of the
mean delivery time for 150 cases of soft drink and a 95%
prediction interval of the delivery time for a single
delivery of 150 cases of soft drink.

13.75 Mixed costs are very common in business and
consist of a fixed cost element and a variable cost element.
Fixed costs are a recurring, constant cost that does not vary
when business activity varies. Variable costs are added costs
associated with each unit of business activity the
organization experiences. The relationship can be characterized by
the following equation:

Total costs = Fixed cost + (Cost per unit)
              × (Number of units of business activity)

In a leading managerial accounting textbook, the authors
discuss a hospital's total maintenance costs and use
regression analysis to estimate the fixed-cost element of
maintenance and the variable cost associated with the number of
patient-days. The hospital's total maintenance costs and
number of patient-days for seven months are listed in the
following table and stored in the accompanying data file:

Total Maintenance Costs   Patient-Days
$7,900                    5,600
$8,500                    7,100
$7,400                    5,000
$8,200                    6,500
$9,100                    7,300
$9,800                    8,000
$7,800                    6,200
Source: Data extracted from P. C. Brewer, R. H. Garrison, and E. W.
Noreen, Introduction to Managerial Accounting, 4th ed. (Boston:
McGraw-Hill Irwin, 2008).

a. Using total maintenance costs as the dependent variable
and patient-days as the independent variable, use the
least-squares method to find the regression coefficients
b0 and b1.
b. Which regression coefficient represents fixed cost?
c. Which regression coefficient represents the variable cost
per each patient-day?
d. Predict total maintenance costs for a month with 7,500
patient-days.

13.76 You want to develop a model to predict the selling
price of homes based on assessed value. A sample of 30
recently sold single-family houses in a small city is selected
to study the relationship between selling price (in thousands
of dollars) and assessed value (in thousands of dollars). The
houses in the city were reassessed at full value one year prior
to the study. The results are in the accompanying data file.
(Hint: First, determine which are the independent and
dependent variables.)
a. Construct a scatter plot and, assuming a linear
relationship, use the least-squares method to compute the
regression coefficients b0 and b1.
b. Interpret the meaning of the Y intercept, b0, and the slope,
b1, in this problem.
c. Use the prediction line developed in (a) to predict the
selling price for a house whose assessed value is
$170,000.
d. Determine the coefficient of determination, r², and
interpret its meaning in this problem.
e. Perform a residual analysis on your results and evaluate
the regression assumptions.
f. At the 0.05 level of significance, is there evidence of a
linear relationship between selling price and assessed
value?
g. Construct a 95% confidence interval estimate of the
population slope.

13.77 You want to develop a model to predict the assessed
value of houses, based on heating area. A sample of 15
single-family houses in a city is selected. The assessed value (in
thousands of dollars) and the heating area of the houses (in
thousands of square feet) are recorded; the results are stored
in the accompanying data file.
(Hint: First, determine which are the independent and
dependent variables.)
a. Construct a scatter plot and, assuming a linear
relationship, use the least-squares method to compute the
regression coefficients b0 and b1.
b. Interpret the meaning of the Y intercept, b0, and the slope,
b1, in this problem.
c. Use the prediction line developed in (a) to predict the
assessed value for a house whose heating area is 1,750
square feet.
d. Determine the coefficient of determination, r², and
interpret its meaning in this problem.
e. Perform a residual analysis on your results and evaluate
the regression assumptions.
f. At the 0.05 level of significance, is there evidence of a
linear relationship between assessed value and heating
area?

13.78 The director of graduate studies at a large college of
business would like to predict the grade point average (GPA)
of students in an MBA program based on Graduate
Management Admission Test (GMAT) score. A sample of
20 students who have completed two years in the program is
selected. The results are stored in the accompanying data file.
(Hint: First, determine which are the independent and
dependent variables.)
a. Construct a scatter plot and, assuming a linear
relationship, use the least-squares method to compute the
regression coefficients b0 and b1.
b. Interpret the meaning of the Y intercept, b0, and the slope,
b1, in this problem.
c. Use the prediction line developed in (a) to predict the
GPA for a student with a GMAT score of 600.
d. Determine the coefficient of determination, r², and
interpret its meaning in this problem.
e. Perform a residual analysis on your results and evaluate
the regression assumptions.
f. At the 0.05 level of significance, is there evidence of a
linear relationship between GMAT score and GPA?
g. Construct a 95% confidence interval estimate of the
mean GPA of students with a GMAT score of 600 and a
95% prediction interval of the GPA for a particular
student with a GMAT score of 600.
h. Construct a 95% confidence interval estimate of the
population slope.

13.79 An accountant for a large department store would
like to develop a model to predict the amount of time it takes
to process invoices. Data are collected from the past
32 working days, and the number of invoices processed and
completion time (in hours) are stored in the accompanying
data file.
(Hint: First, determine which are the independent and
dependent variables.)
a. Assuming a linear relationship, use the least-squares
method to compute the regression coefficients b0 and b1.
b. Interpret the meaning of the Y intercept, b0, and the slope,
b1, in this problem.
c. Use the prediction line developed in (a) to predict the
amount of time it would take to process 150 invoices.
d. Determine the coefficient of determination, r², and
interpret its meaning.
e. Plot the residuals against the number of invoices
processed and also against time.
f. Based on the plots in (e), does the model seem appropriate?
g. Based on the results in (e) and (f), what conclusions
can you make about the validity of the prediction made
in (c)?

13.80 On January 28, 1986, the space shuttle Challenger
exploded, and seven astronauts were killed. Prior to the
launch, the predicted atmospheric temperature was for
freezing weather at the launch site. Engineers for Morton
Thiokol (the manufacturer of the rocket motor) prepared
CHAPTER 14 Introduction to Multiple Regression

Problems for Section 14.1

LEARNING THE BASICS
14.1 For this problem, use the following multiple
regression equation:

Ŷ_i = 10 + 5X_1i + 3X_2i

a. Interpret the meaning of the slopes.
b. Interpret the meaning of the Y intercept.

14.2 For this problem, use the following multiple
regression equation:

Ŷ_i = 50 - 2X_1i + 7X_2i

a. Interpret the meaning of the slopes.
b. Interpret the meaning of the Y intercept.

APPLYING THE CONCEPTS
14.3 A shoe manufacturer is considering the development
of a new brand of running shoes. The business problem
facing the marketing analyst is to determine which variables
should be used to predict durability (i.e., the effect of
long-term impact). Two independent variables under consideration
are X1 (FOREIMP), a measurement of the forefoot
shock-absorbing capability, and X2 (MIDSOLE), a measurement of
the change in impact properties over time. The dependent
variable Y is LTIMP, a measure of the shoe's durability after
a repeated impact test. Data are collected from a random
sample of 15 types of currently manufactured running shoes,
with the following results:

Variable     Coefficients   Standard Error   t Statistic   p-Value
INTERCEPT    -0.02686       0.06905          -0.39         0.7034
FOREIMP       0.79116       0.06295          12.57         0.0000
MIDSOLE       0.60484       0.07174           8.43         0.0000

a. State the multiple regression equation.
b. Interpret the meaning of the slopes, b1 and b2, in this
problem.

14.4 A mail-order catalog business selling personal
computer supplies, software, and hardware
maintains a centralized warehouse. Management is
currently examining the process of distribution from the
warehouse. The business problem facing management
relates to the factors that affect warehouse distribution
costs. Currently, a small handling fee is added to each
order, regardless of the amount of the order. Data
collected over the past 24 months (stored in the accompanying
data file) indicate the warehouse distribution costs (in
thousands of dollars), the sales (in thousands of dollars), and
the number of orders received.
a. State the multiple regression equation.
b. Interpret the meaning of the slopes, b1 and b2, in this
problem.
c. Explain why the regression coefficient, b0, has no
practical meaning in the context of this problem.
d. Predict the monthly warehouse distribution cost when
sales are $400,000 and the number of orders is 4,500.
e. Construct a 95% confidence interval estimate for the
mean monthly warehouse distribution cost when sales are
$400,000 and the number of orders is 4,500.
f. Construct a 95% prediction interval for the monthly
warehouse distribution cost for a particular month when
sales are $400,000 and the number of orders is 4,500.
g. Explain why the interval in (e) is narrower than the
interval in (f).

14.5 A consumer organization wants to develop a
regression model to predict mileage (as measured by miles per
gallon) based on the horsepower of the car's engine and the
weight of the car (in pounds). Data were collected from a
sample of 50 recent car models, and the results are
organized and stored in the accompanying data file.
a. State the multiple regression equation.
b. Interpret the meaning of the slopes, b1 and b2, in this
problem.
c. Explain why the regression coefficient, b0, has no practical
meaning in the context of this problem.
d. Predict the miles per gallon for cars that have 60 horsepower
and weigh 2,000 pounds.
e. Construct a 95% confidence interval estimate for the
mean miles per gallon for cars that have 60 horsepower
and weigh 2,000 pounds.
f. Construct a 95% prediction interval for the miles per
gallon for an individual car that has 60 horsepower and
weighs 2,000 pounds.

14.6 The business problem facing a consumer products
company is to measure the effectiveness of different types of
advertising media in the promotion of its products. Specifically,
the company is interested in the effectiveness of radio
advertising and newspaper advertising (including the cost of
discount coupons). Data were collected from a sample of 22 cities
with approximately equal populations selected for study
during a test period of one month. Each city is allocated a specific
expenditure level for radio advertising and for newspaper
advertising. The sales of the product (in thousands of dollars)
and also the levels of media expenditure (in thousands of
dollars) during the test month are recorded, with the following
results shown at the top of page 533 and stored in the
accompanying data file:
a. State the multiple regression equation.
b. Interpret the meaning of the slopes, b1 and b2, in this
problem.
c. Interpret the meaning of the regression coefficient, b0.
d. Which type of advertising is more effective? Explain.
14.2 r2, Adjusted r2, and the Overall F Test 533

Sales     Radio Advertising   Newspaper Advertising
($000)    ($000)              ($000)
  973        0                  40
1,119        0                  40
  875       25                  25
  625       25                  25
  910       30                  30
  971       30                  30
  931       35                  35
1,177       35                  35
  882       40                  25
  982       40                  25
1,628       45                  45
1,577       45                  45
1,044       50                   0
  914       50                   0
1,329       55                  25
1,330       55                  25
1,405       60                  30
1,436       60                  30
1,521       65                  35
1,741       65                  35
1,866       70                  40
1,717       70                  40

14.7 The business problem facing the director of broadcasting operations for a television station was the issue of standby hours (i.e., hours in which unionized graphic artists at the station are paid but are not actually involved in any activity) and what factors were related to standby hours. The study included the following variables:
Standby hours (Y): Total number of standby hours in a week
Total staff present (X1): Weekly total of people-days
Remote hours (X2): Total number of hours worked by employees at locations away from the central plant
Data were collected for 26 weeks; these data are organized and stored in [file name illegible].
a. State the multiple regression equation.
b. Interpret the meaning of the slopes, b1 and b2, in this problem.
c. Explain why the regression coefficient, b0, has no practical meaning in the context of this problem.
d. Predict the standby hours for a week in which the total staff present have 310 people-days and the remote hours are 400.
e. Construct a 95% confidence interval estimate for the mean standby hours for weeks in which the total staff present have 310 people-days and the remote hours are 400.
f. Construct a 95% prediction interval for the standby hours for a single week in which the total staff present have 310 people-days and the remote hours are 400.

14.8 Nassau County is located approximately 25 miles east of New York City. The data organized and stored in [file name illegible] include the appraised value, land area of the property in acres, and age, in years, for a sample of 30 single-family homes located in Glen Cove, a small city in Nassau County. Develop a multiple linear regression model to predict appraised value based on land area of the property and age, in years.
a. State the multiple regression equation.
b. Interpret the meaning of the slopes, b1 and b2, in this problem.
c. Explain why the regression coefficient, b0, has no practical meaning in the context of this problem.
d. Predict the appraised value for a house that has a land area of 0.25 acres and is 45 years old.
e. Construct a 95% confidence interval estimate for the mean appraised value for houses that have a land area of 0.25 acres and are 45 years old.
f. Construct a 95% prediction interval estimate for the appraised value for an individual house that has a land area of 0.25 acres and is 45 years old.

14.2 r2, Adjusted r2, and the Overall F Test

This section discusses three methods you can use to evaluate the overall multiple regression model: the coefficient of multiple determination, r2, the adjusted r2, and the overall F test.

Coefficient of Multiple Determination

Recall from Section 13.3 that the coefficient of determination, r2, measures the proportion of the variation in Y that is explained by the independent variable X in the simple linear regression model. In multiple regression, the coefficient of multiple determination represents the proportion of the variation in Y that is explained by the set of independent variables. Equation (14.4) defines the coefficient of multiple determination for a multiple regression model with two or more independent variables.
536 CHAPTER 14 Introduction to Multiple Regression

Figure 14.1 on page 530, the FSTAT test statistic given in the ANOVA summary table is 48.4771. Because 48.4771 > 3.32, or because the p-value = 0.000 < 0.05, you reject H0 and conclude that at least one of the independent variables (price and/or promotional expenditures) is related to sales.

Problems for Section 14.2

LEARNING THE BASICS

14.9 The following ANOVA summary table is for a multiple regression model with two independent variables:

              Degrees of    Sum of      Mean
Source        Freedom       Squares     Squares    F
Regression       2            60
Error           18           120
Total           20           180

a. Determine the regression mean square (MSR) and the mean square error (MSE).
b. Compute the overall FSTAT test statistic.
c. Determine whether there is a significant relationship between Y and the two independent variables at the 0.05 level of significance.
d. Compute the coefficient of multiple determination, r2, and interpret its meaning.
e. Compute the adjusted r2.

14.10 The following ANOVA summary table is for a multiple regression model with two independent variables:

              Degrees of    Sum of      Mean
Source        Freedom       Squares     Squares    F
Regression       2            30
Error           10           120
Total           12           150

a. Determine the regression mean square (MSR) and the mean square error (MSE).
b. Compute the overall FSTAT test statistic.
c. Determine whether there is a significant relationship between Y and the two independent variables at the 0.05 level of significance.
d. Compute the coefficient of multiple determination, r2, and interpret its meaning.
e. Compute the adjusted r2.

APPLYING THE CONCEPTS

14.11 Eileen M. Van Aken and Brian M. Kleiner, professors at Virginia Polytechnic Institute and State University, investigated the factors that contribute to the effectiveness of teams [data extracted from "Determinants of Effectiveness for Cross-Functional Organizational Design Teams," Quality Management Journal, 4 (1997), 51-79]. The researchers studied 34 independent variables, such as team skills, diversity, meeting frequency, and clarity in expectations. For each of the teams studied, each of the variables was given a value of 1 through 100, based on the results of interviews and survey data, where 100 represents the highest rating. The dependent variable, team performance, was also given a value of 1 through 100, with 100 representing the highest rating. Many different regression models were explored, including the following:

Model 1
Team performance = β0 + β1(Team skills) + ε, adjusted r2 = 0.68

Model 2
Team performance = β0 + β1(Clarity in expectations) + ε, adjusted r2 = 0.78

Model 3
Team performance = β0 + β1(Team skills) + β2(Clarity in expectations) + ε, adjusted r2 = 0.97

a. Interpret the adjusted r2 for each of the three models.
b. Which of these three models do you think is the best predictor of team performance?

14.12 In Problem 14.3 on page 532, you predicted the durability of a brand of running shoe, based on the forefoot shock-absorbing capability and the change in impact properties over time. The regression analysis resulted in the following ANOVA summary table:

              Degrees of    Sum of      Mean
Source        Freedom       Squares     Squares    F        p-value
Regression       2          12.61020    6.30510    97.69    0.0001
Error           12           0.77453    0.06454
Total           14          13.38473

a. Determine whether there is a significant relationship between durability and the two independent variables at the 0.05 level of significance.
b. Interpret the meaning of the p-value.
c. Compute the coefficient of multiple determination, r2, and interpret its meaning.
d. Compute the adjusted r2.
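
The quantities these problems ask for all follow from the sums of squares and degrees of freedom in an ANOVA summary table. The sketch below is an illustration only: the function name `anova_summary` and its argument names are assumptions introduced here, not part of the textbook. It uses the table from Problem 14.12 (SSR = 12.61020, SSE = 0.77453, two independent variables, 14 total degrees of freedom, so n = 15), whose printed mean squares and F value give something to check against.

```python
def anova_summary(ssr, sse, k, n):
    """Overall-fit measures for a regression with k independent variables
    and n observations (hypothetical helper, for illustration only)."""
    sst = ssr + sse                      # total sum of squares
    msr = ssr / k                        # regression mean square
    mse = sse / (n - k - 1)              # mean square error
    f_stat = msr / mse                   # overall F test statistic
    r2 = ssr / sst                       # coefficient of multiple determination
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)   # adjusted r2
    return {"MSR": msr, "MSE": mse, "F": f_stat, "r2": r2, "adj_r2": adj_r2}


# Values read from the ANOVA table in Problem 14.12.
result = anova_summary(ssr=12.61020, sse=0.77453, k=2, n=15)
for name, value in result.items():
    print(f"{name}: {value:.5f}")
```

The printed MSR, MSE, and F can be compared with the 6.30510, 0.06454, and 97.69 shown in the table; r2 and the adjusted r2 are the remaining two quantities the problem requests.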
14.3 Residual Analysis for the Multiple Regression Model 537

14.13 In Problem 14.5 on page 532, you used horsepower and weight to predict mileage (stored in [file name illegible]). Using the results from that problem,
a. determine whether there is a significant relationship between mileage and the two independent variables (horsepower and weight) at the 0.05 level of significance.
b. interpret the meaning of the p-value.
c. compute the coefficient of multiple determination, r2, and interpret its meaning.
d. compute the adjusted r2.

14.14 In Problem 14.4 on page 532, you used sales and number of orders to predict distribution costs at a mail-order catalog business (stored in [file name illegible]). Using the results from that problem,
a. determine whether there is a significant relationship between distribution costs and the two independent variables (sales and number of orders) at the 0.05 level of significance.
b. interpret the meaning of the p-value.
c. compute the coefficient of multiple determination, r2, and interpret its meaning.
d. compute the adjusted r2.

14.15 In Problem 14.7 on page 533, you used the total staff present and remote hours to predict standby hours (stored in [file name illegible]). Using the results from that problem,
a. determine whether there is a significant relationship between standby hours and the two independent variables (total staff present and remote hours) at the 0.05 level of significance.
b. interpret the meaning of the p-value.
c. compute the coefficient of multiple determination, r2, and interpret its meaning.
d. compute the adjusted r2.

14.16 In Problem 14.6 on page 532, you used radio advertising and newspaper advertising to predict sales (stored in [file name illegible]). Using the results from that problem,
a. determine whether there is a significant relationship between sales and the two independent variables (radio advertising and newspaper advertising) at the 0.05 level of significance.
b. interpret the meaning of the p-value.
c. compute the coefficient of multiple determination, r2, and interpret its meaning.
d. compute the adjusted r2.

14.17 In Problem 14.8 on page 533, you used the land area of a property and the age of a house to predict appraised value (stored in [file name illegible]). Using the results from that problem,
a. determine whether there is a significant relationship between appraised value and the two independent variables (land area of a property and age of a house) at the 0.05 level of significance.
b. interpret the meaning of the p-value.
c. compute the coefficient of multiple determination, r2, and interpret its meaning.
d. compute the adjusted r2.

14.3 Residual Analysis for the Multiple Regression Model

In Section 13.5, you used residual analysis to evaluate the fit of the simple linear regression model. For the multiple regression model with two independent variables, you need to construct and analyze the following residual plots:

1. Residuals versus Ŷi
2. Residuals versus X1i
3. Residuals versus X2i
4. Residuals versus time

The first residual plot examines the pattern of residuals versus the predicted values of Y. If the residuals show a pattern for the predicted values of Y, there is evidence of a possible curvilinear effect (see Section 15.1) in at least one independent variable, a possible violation of the assumption of equal variance (see Figure 13.13 on page 491), and/or the need to transform the Y variable.
The second and third residual plots involve the independent variables. Patterns in the plot of the residuals versus an independent variable may indicate the existence of a curvilinear effect and, therefore, the need to add a curvilinear independent variable to the multiple regression model (see Section 15.1). The fourth plot is used to investigate patterns in the residuals in order to validate the independence assumption when the data are collected in time order. Associated with this residual plot, as in Section 13.6, you can compute the Durbin-Watson statistic to determine the existence of positive autocorrelation among the residuals.
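
As a rough illustration of where the plotted quantities come from, the sketch below fits a two-variable regression by least squares and builds the (x, residual) pairs for the first three plots. It is not the textbook's Excel procedure. The helper names `solve`, `fit`, and `durbin_watson` are assumptions made for this example, and the data are the 22-city advertising figures from Problem 14.6 as read from the scanned table (the 1,119 and 1,741 entries are readings of smudged cells). Those cities are not in time order, so the Durbin-Watson function is included only to show the formula.

```python
# Sales, radio, and newspaper advertising (all in $000) for 22 cities.
sales = [973, 1119, 875, 625, 910, 971, 931, 1177, 882, 982, 1628,
         1577, 1044, 914, 1329, 1330, 1405, 1436, 1521, 1741, 1866, 1717]
radio = [0, 0, 25, 25, 30, 30, 35, 35, 40, 40, 45,
         45, 50, 50, 55, 55, 60, 60, 65, 65, 70, 70]
newspaper = [40, 40, 25, 25, 30, 30, 35, 35, 25, 25, 45,
             45, 0, 0, 25, 25, 30, 30, 35, 35, 40, 40]


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                   # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit(y, x1, x2):
    """Least-squares b0, b1, b2 from the normal equations (X'X) b = X'y."""
    rows = [(1.0, u, v) for u, v in zip(x1, x2)]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(3)]
    return solve(xtx, xty)


def durbin_watson(e):
    """Durbin-Watson statistic; only meaningful for residuals in time order."""
    num = sum((e[i] - e[i - 1]) ** 2 for i in range(1, len(e)))
    return num / sum(v * v for v in e)


b0, b1, b2 = fit(sales, radio, newspaper)
predicted = [b0 + b1 * u + b2 * v for u, v in zip(radio, newspaper)]
residuals = [yi - yhat for yi, yhat in zip(sales, predicted)]

# Point sets for plots 1-3: residuals against predicted Y, X1, and X2.
vs_predicted = list(zip(predicted, residuals))
vs_radio = list(zip(radio, residuals))
vs_newspaper = list(zip(newspaper, residuals))
```

Each list of pairs can be handed to any charting tool; a plot with no visible pattern is what the assumptions call for.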
Figure 14.4 presents the residual plots for the OmniPower sales example. There is very little or no pattern in the relationship between the residuals and the predicted value of Y, the

542

Problems for Section 14.4

14.23 Use the following information from a multiple regression analysis:

a. Which variable has the largest slope, in units of a t statistic?
b. Construct a 95% confidence interval estimate of the population slope, β1.
c. At the 0.05 level of significance, determine whether each independent variable makes a significant contribution to the regression model. On the basis of these results, indicate the independent variables to include in this model.

14.24 Use the following information from a multiple regression analysis:

n = 20   b1 = 4   b2 = 3   Sb1 = 1.2   Sb2 = 0.8

a. Which variable has the largest slope, in units of a t statistic?
b. Construct a 95% confidence interval estimate of the population slope, β1.
c. At the 0.05 level of significance, determine whether each independent variable makes a significant contribution to the regression model. On the basis of these results, indicate the independent variables to include in this model.

14.25 In Problem 14.3 on page 532, you predicted the durability of a brand of running shoe, based on the forefoot shock-absorbing capability (FOREIMP) and the change in impact properties over time (MIDSOLE) for a sample of 15 pairs of shoes. Use the following results:

Variable     Coefficient   Standard Error   t Statistic   p-Value
INTERCEPT     -0.02686        0.06905          -0.39       0.7034
FOREIMP        0.79116        0.06295          12.57       0.0000
MIDSOLE        0.60484        0.07174           8.43       0.0000

a. Construct a 95% confidence interval estimate of the population slope between durability and forefoot shock-absorbing capability.
b. At the 0.05 level of significance, determine whether each independent variable makes a significant contribution to the regression model. On the basis of these results, indicate the independent variables to include in this model.

14.26 In Problem 14.4 on page 532, you used sales and number of orders to predict distribution costs at a mail-order catalog business (stored in [file name illegible]). Using the results from that problem,
a. construct a 95% confidence interval estimate of the population slope between distribution cost and sales.
b. at the 0.05 level of significance, determine whether each independent variable makes a significant contribution to the regression model. On the basis of these results, indicate the independent variables to include in this model.

14.27 In Problem 14.5 on page 532, you used horsepower and weight to predict mileage (stored in [file name illegible]). Using the results from that problem,
a. construct a 95% confidence interval estimate of the population slope between mileage and horsepower.
b. at the 0.05 level of significance, determine whether each independent variable makes a significant contribution to the regression model. On the basis of these results, indicate the independent variables to include in this model.

14.28 In Problem 14.6 on page 532, you used radio advertising and newspaper advertising to predict sales (stored in [file name illegible]). Using the results from that problem,
a. construct a 95% confidence interval estimate of the population slope between sales and radio advertising.
b. at the 0.05 level of significance, determine whether each independent variable makes a significant contribution to the regression model. On the basis of these results, indicate the independent variables to include in this model.

14.29 In Problem 14.7 on page 533, you used the total number of staff present and remote hours to predict standby hours (stored in [file name illegible]). Using the results from that problem,
a. construct a 95% confidence interval estimate of the population slope between standby hours and total number of staff present.
b. at the 0.05 level of significance, determine whether each independent variable makes a significant contribution to the regression model. On the basis of these results, indicate the independent variables to include in this model.

14.30 In Problem 14.8 on page 533, you used land area of a property and age of a house to predict appraised value (stored in [file name illegible]). Using the results from that problem,
a. construct a 95% confidence interval estimate of the population slope between appraised value and land area of a property.
b. at the 0.05 level of significance, determine whether each independent variable makes a significant contribution to the regression model. On the basis of these results, indicate the independent variables to include in this model.
14.6 Using Dummy Variables and Interaction Terms in Regression Models 555

score and whether the student received a grade of B or higher in the introductory statistics course (0 = no and 1 = yes).
a. Explain the steps involved in developing a regression model for these data. Be sure to indicate the particular models you need to evaluate and compare.
b. Suppose the regression coefficient for the variable whether the student received a grade of B or higher in the introductory statistics course is +0.30. How do you interpret this result?

14.40 A real estate association in a suburban community would like to study the relationship between the size of a single-family house (as measured by the number of rooms) and the selling price of the house (in thousands of dollars). Two different neighborhoods are included in the study, one on the east side of the community (=0) and the other on the west side (=1). A random sample of 20 houses was selected, with the results stored in [file name illegible]. For (a) through (k), do not include an interaction term.
a. State the multiple regression equation that predicts the selling price, based on the number of rooms and the neighborhood.
b. Interpret the regression coefficients in (a).
c. Predict the selling price for a house with nine rooms that is located in an east-side neighborhood. Construct a 95% confidence interval estimate and a 95% prediction interval.
d. Perform a residual analysis on the results and determine whether the regression assumptions are valid.
e. Is there a significant relationship between selling price and the two independent variables (rooms and neighborhood) at the 0.05 level of significance?
f. At the 0.05 level of significance, determine whether each independent variable makes a contribution to the regression model. Indicate the most appropriate regression model for this set of data.
g. Construct and interpret a 95% confidence interval estimate of the population slope for the relationship between selling price and number of rooms.
h. Construct and interpret a 95% confidence interval estimate of the population slope for the relationship between selling price and neighborhood.
i. Compute and interpret the adjusted r2.
j. Compute the coefficients of partial determination and interpret their meaning.
k. What assumption do you need to make about the slope of selling price with number of rooms?
l. Add an interaction term to the model and, at the 0.05 level of significance, determine whether it makes a significant contribution to the model.
m. On the basis of the results of (f) and (l), which model is most appropriate? Explain.

14.41 The marketing manager of a large supermarket chain faced the business problem of determining the effect on the sales of pet food of shelf space and whether the product was placed at the front (=1) or back (=0) of the aisle. Data are collected from a random sample of 12 equal-sized stores. The results are shown in the following table (and organized and stored in [file name illegible]):

         Shelf Space                Weekly Sales
Store    (Feet)         Location    (Dollars)
  1         5           Back          160
  2         5           Front         220
  3         5           Back          140
  4        10           Back          190
  5        10           Back          240
  6        10           Front         260
  7        15           Back          230
  8        15           Back          270
  9        15           Front         280
 10        20           Back          260
 11        20           Back          290
 12        20           Front         310

For (a) through (m), do not include an interaction term.
a. State the multiple regression equation that predicts sales based on shelf space and location.
b. Interpret the regression coefficients in (a).
c. Predict the weekly sales of pet food for a store with 8 feet of shelf space situated at the back of the aisle. Construct a 95% confidence interval estimate and a 95% prediction interval.
d. Perform a residual analysis on the results and determine whether the regression assumptions are valid.
e. Is there a significant relationship between sales and the two independent variables (shelf space and aisle position) at the 0.05 level of significance?
f. At the 0.05 level of significance, determine whether each independent variable makes a contribution to the regression model. Indicate the most appropriate regression model for this set of data.
g. Construct and interpret 95% confidence interval estimates of the population slope for the relationship between sales and shelf space and between sales and aisle location.
h. Compare the slope in (b) with the slope for the simple linear regression model of Problem 13.4 on page 481. Explain the difference in the results.
i. Compute and interpret the meaning of the coefficient of multiple determination, r2.
j. Compute and interpret the adjusted r2.
k. Compare r2 with the r2 value computed in Problem 13.16(a) on page 487.
l. Compute the coefficients of partial determination and interpret their meaning.
m. What assumption about the slope of shelf space with sales do you need to make in this problem?
n. Add an interaction term to the model and, at the 0.05 level of significance, determine whether it makes a significant contribution to the model.
o. On the basis of the results of (f) and (n), which model is most appropriate? Explain.
14.6 Using Dummy Variables and Interaction Terms in Regression Models 557

14.45 Zagat's publishes restaurant ratings for various locations in the United States. The file [file name illegible] contains the Zagat rating for food, decor, service, and cost per person for a sample of 50 restaurants located in a city and 50 restaurants located in a suburb. Develop a regression model to predict the cost per person, based on a variable that represents the sum of the ratings for food, decor, and service and a dummy variable concerning location (city vs. suburban). For (a) through (m), do not include an interaction term.
Source: Extracted from Zagat Survey 2008 New York City Restaurants and Zagat Survey 2007-2008 Long Island Restaurants.
a. State the multiple regression equation.
b. Interpret the regression coefficients in (a).
c. Predict the cost for a restaurant with a summated rating of 60 that is located in a city and construct a 95% confidence interval estimate and a 95% prediction interval.
d. Perform a residual analysis on the results and determine whether the regression assumptions are satisfied.
e. Is there a significant relationship between price and the two independent variables (summated rating and location) at the 0.05 level of significance?
f. At the 0.05 level of significance, determine whether each independent variable makes a contribution to the regression model. Indicate the most appropriate regression model for this set of data.
g. Construct a 95% confidence interval estimate of the population slope for the relationship between cost and summated rating.
h. Compare the slope in (b) with the slope for the simple linear regression model of Problem 13.90 on page 518. Explain the difference in the results.
i. Compute and interpret the meaning of the coefficient of multiple determination.
j. Compute and interpret the adjusted r2.
k. Compare r2 with the r2 value computed in Problem 13.90 (d) on page 518.
l. Compute the coefficients of partial determination and interpret their meaning.
m. What assumption about the slope of cost with summated rating do you need to make in this problem?
n. Add an interaction term to the model and, at the 0.05 level of significance, determine whether it makes a significant contribution to the model.
o. On the basis of the results of (f) and (n), which model is most appropriate? Explain.

14.46 In Problem 14.6 on page 532, you used radio advertising and newspaper advertising to predict sales (stored in [file name illegible]). Develop a regression model to predict sales that includes radio advertising, newspaper advertising, and the interaction of radio advertising and newspaper advertising.
a. At the 0.05 level of significance, is there evidence that the interaction term makes a significant contribution to the model?
b. Which regression model is more appropriate, the one used in this problem or the one used in Problem 14.6? Explain.

14.47 In Problem 14.5 on page 532, horsepower and weight were used to predict miles per gallon (stored in [file name illegible]). Develop a regression model that includes horsepower, weight, and the interaction of horsepower and weight to predict miles per gallon.
a. At the 0.05 level of significance, is there evidence that the interaction term makes a significant contribution to the model?
b. Which regression model is more appropriate, the one used in this problem or the one used in Problem 14.5? Explain.

14.48 In Problem 14.7 on page 533, you used total staff present and remote hours to predict standby hours (stored in [file name illegible]). Develop a regression model to predict standby hours that includes total staff present, remote hours, and the interaction of total staff present and remote hours.
a. At the 0.05 level of significance, is there evidence that the interaction term makes a significant contribution to the model?
b. Which regression model is more appropriate, the one used in this problem or the one used in Problem 14.7? Explain.

14.49 The director of a training program for a large insurance company has the business objective of determining which training method is best for training underwriters. The three methods to be evaluated are traditional, CD-ROM based, and Web based. The 30 trainees are divided into three randomly assigned groups of 10. Before the start of the training, each trainee is given a proficiency exam that measures mathematics and computer skills. At the end of the training, all students take the same end-of-training exam. The results are organized and stored in [file name illegible]. Develop a multiple regression model to predict the score on the end-of-training exam, based on the score on the proficiency exam and the method of training used. For (a) through (k), do not include an interaction term.
a. State the multiple regression equation.
b. Interpret the regression coefficients in (a).
c. Predict the end-of-training exam score for a student with a proficiency exam score of 100 who had Web-based training.
d. Perform a residual analysis on your results and determine whether the regression assumptions are valid.
e. Is there a significant relationship between the end-of-training exam score and the independent variables (proficiency score and training method) at the 0.05 level of significance?
f. At the 0.05 level of significance, determine whether each independent variable makes a contribution to the regression model. Indicate the most appropriate regression model for this set of data.
g. Construct and interpret a 95% confidence interval estimate of the population slope for the relationship between end-of-training exam score and proficiency exam.
h. Construct and interpret 95% confidence interval estimates of the population slope for the relationship between end-of-training exam score and type of training method.
CHAPTER 2 SOLUTIONS

2.32 (a) Electricity Costs Frequency Percentage


$80 to $99 4 8%
$100 to $119 7 14
$120 to $139 9 18
$140 to $159 13 26
$160 to $179 9 18
$180 to $199 5 10
$200 to $219 3 6

(c) The majority of utility charges are clustered between $120 and $180.

2.99 (a)
Range Frequency Percentage
0 but less than 25 17 34%
25 but less than 50 19 38%
50 but less than 75 5 10%
75 but less than 100 2 4%
100 but less than 125 3 6%
125 but less than 150 2 4%
150 but less than 175 2 4%

(b)
[Histogram: Frequency (vertical axis) against Days (horizontal axis) for the seven classes listed in (a)]

(d) You should tell the president of the company that over half of the complaints are
resolved within a month, but point out that some complaints take as long as three
or four months to settle.
CHAPTER 3 SOLUTIONS

3.7 For the finance majors, the mean and median starting salary are the same. Hence, the
distribution of starting salary is symmetrical. The average scatter is $10,000. For the
information systems graduates, the mean starting salary is greater than the median
starting salary. Hence, the distribution is right-skewed. The average scatter is $37,000,
which is almost 4 times higher than that of the finance majors.

3.10 (a), (b)


Price
Mean 36.5333
Median 35.6000
Mode #N/A
Standard Deviation 4.3896
Sample Variance 19.2687
Range 12.0000
Minimum 31.0000
Maximum 43.0000
Sum 219.2000
Count 6
First Quartile 33.7500
Third Quartile 40.2500
Interquartile Range 6.5000
Coefficient of Variation 12.0154%
(c) The mean is only slightly larger than the median, so the data are only slightly
right-skewed.
(d) The mean cost is $36.53 and the median cost is $35.60. The average scatter of
cost around the mean is $4.39. The difference between the highest and the lowest
cost is $12.

3.15 (a)
Money Market Five-Year CD

Standard Deviation 0.101637 Standard Deviation 0.094974


Sample Variance 0.01033 Sample Variance 0.00902
Range 0.23 Range 0.2
Coefficient of Variation 4.79% Coefficient of Variation 2.64%
(b) The money market accounts have more variation in the highest yields offered
compared to the five-year CD because the standard deviation, variance, range
and coefficient of variation are all higher for money market accounts.

3.17 Excel output:


Waiting Time

Mean 4.286667
Standard Error 0.422926
Median 4.5
Mode #N/A
Standard Deviation 1.637985
Sample Variance 2.682995
Kurtosis 0.832925
Skewness -0.83295
Range 6.08
Minimum 0.38
Maximum 6.46
Sum 64.3
Count 15
First Quartile 3.2
Third Quartile 5.55
Interquartile Range 2.35
Coefficient of Variation 38.2112%

(a) Mean = 4.287 Median = 4.5


(b) Variance = 2.683 Standard deviation = 1.638 Range = 6.08
Coefficient of variation = 38.21%
Z scores: 0.05, 0.77, 0.77, 0.51, 0.30, 1.19, 0.46, 0.66, 0.13, 1.11, 2.39,
0.51, 1.33, 1.16, 0.30
There are no outliers.
(c) Since the mean is less than the median, the distribution is left-skewed.
(d) The mean and median are both under 5 minutes and the distribution is left-
skewed, meaning that there are more unusually low observations than
there are high observations. But six of the 15 bank customers sampled (or
40%) had wait times in excess of 5 minutes. So, although the customer is
more likely to be served in less than 5 minutes, the manager may have
been overconfident in responding that the customer would “almost
certainly” not wait longer than 5 minutes for service.

3.40 (a) 68% (b) 95%


(c) within ±1σ: cannot be calculated, since Chebyshev's rule requires k > 1.
within ±2σ: \(1 - \frac{1}{k^2} = 1 - \frac{1}{2^2} = 0.75\), or 75%
within ±3σ: \(1 - \frac{1}{k^2} = 1 - \frac{1}{3^2} = 0.8889\), or 88.89%
(d) Set \(1 - \frac{1}{k^2} = 0.9375\), solve for k, and find k = 4.
Use μ ± 4σ: 8.20 - 4(2.75) = -2.8 and 8.20 + 4(2.75) = 19.2
So at least 93.75% of these funds are expected to have one-year total returns between -2.8
and 19.2.
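
The Chebyshev calculations in (c) and (d) can be reproduced with a few lines of Python. This is a minimal sketch; the function names `chebyshev_bound` and `k_for_coverage` are assumptions introduced for illustration, while the mean 8.20 and standard deviation 2.75 are taken from the solution above.

```python
import math


def chebyshev_bound(k):
    """Minimum proportion of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's rule requires k > 1")
    return 1 - 1 / k**2


def k_for_coverage(p):
    """Solve 1 - 1/k^2 = p for k."""
    return math.sqrt(1 / (1 - p))


mean, sd = 8.20, 2.75                # values used in the solution to 3.40(d)
k = k_for_coverage(0.9375)           # number of standard deviations needed
lower, upper = mean - k * sd, mean + k * sd

print(chebyshev_bound(2), round(chebyshev_bound(3), 4))
print(k, round(lower, 2), round(upper, 2))
```

Calling `chebyshev_bound(1)` raises an error, which mirrors the remark in (c) that the rule gives no bound for one standard deviation.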

3.45 (a) The study suggests that time spent on Facebook and grade point average are
negatively correlated.
(b) There could be a cause-and-effect relationship between time spent on Facebook
and grade point average. The more time spent on Facebook, the less time a
student would have available for study and, hence, results in lower grade point
average holding constant all the other factors that could have affected grade point
average.

3.46 (a) \(\operatorname{cov}(X, Y) = \dfrac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1} = \dfrac{3550}{6} = 591.6667\)
(b) \(r = \dfrac{\operatorname{cov}(X, Y)}{S_X S_Y} = \dfrac{591.6667}{(113.1371)(7.2678)} = 0.7196\)
(c) The correlation coefficient is more valuable for expressing the relationship
between calories and fat because it does not depend on the units used to measure
calories and fat.
(d) There is a strong positive linear relationship between calories and fat.
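
The arithmetic in (a) and (b) can be checked with the short sketch below. It starts from the summary figures quoted in the solution (a cross-product sum of 3550, n - 1 = 6, and standard deviations 113.1371 and 7.2678) because the raw calorie and fat data are not reproduced here; the variable and function names are illustrative assumptions.

```python
def covariance_from_sum(sum_cross_products, n):
    """Sample covariance given the sum of (Xi - Xbar)(Yi - Ybar) and sample size n."""
    return sum_cross_products / (n - 1)


def correlation(cov_xy, s_x, s_y):
    """Coefficient of correlation from the covariance and the two standard deviations."""
    return cov_xy / (s_x * s_y)


n = 7                                   # implied by the denominator n - 1 = 6
cov_xy = covariance_from_sum(3550, n)
r = correlation(cov_xy, s_x=113.1371, s_y=7.2678)

print(round(cov_xy, 4), round(r, 4))
```

Because r divides the covariance by both standard deviations, rescaling either variable (for example, calories to kilocalories) changes the covariance but leaves r unchanged, which is the point made in (c).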
CHAPTER 4 SOLUTIONS
4.8 (a) “Makes less than $50,000”.
(b) “Makes less than $50,000 and tax code is unfair”.
(c) The complement of “tax code is fair” is “tax code is unfair”.
(d) “Tax code is fair and makes less than $50,000” is a joint event because it
consists of two characteristics or attributes.

4.9 (a) P(tax code is unfair) = 600/1005 = 0.5970


(b) P(tax code is unfair and makes less than $50,000) = 280/1005 = 0.2786
(c) P(tax code is unfair or makes less than $50,000) = (600+505-280)/1005 = 0.8209
(d) The probability of “tax code is unfair or makes less than $50,000” includes the
probability of “tax code is unfair”, the probability of “makes less than $50,000”,
minus the joint probability of “tax code is unfair and makes less than $50,000”.

4.13 FOR PARTS (a)-(c) YOU CAN SET UP A CONTINGENCY TABLE


                 UNDER 50    OVER 50     TOTAL
NEWSPAPERS       82/400      104/400     186/400
notNEWSPAPERS    118/400     96/400      214/400
TOTAL            200/400     200/400     400/400

(a) P(N) = 186/400 = .465


(b) P(N∩O) = 104/400 = .26
(c) P(N or O) = P(N) + P(O) – P(N∩O) = 186/400 + 200/400 - 104/400 = .705
or use: P(N or O) = P(N∩O) + P(N∩O’) + P(N’∩O) = 104/400 + 82/400 + 96/400 = .705
(d) For part b we want the probability that (someone got news primarily from newspapers
and is over 50). For part c we want the probability that (someone got news primarily
from newspapers or is over 50). So with part c we have to include the probability that
(someone got news primarily from newspapers and is over 50), the probability that
(someone got news primarily from newspapers and is not over 50) and the probability
that (someone did not get news primarily from newspapers and is over 50).

4.21 USE CONTINGENCY TABLE FROM 4.13


(a) P(N|O) = P(N∩O)/P(O) = (104/400)/(200/400) = 104/200 = .52
(b) P(O|N) = P(N∩O)/P(N) = (104/400)/(186/400) = 104/186 = .5591
(c) The conditional events are reversed.
(d) Since P(newspapers | age over 50) = 0.52 is not equal to P(newspapers) = 0.465, the two
events, “respondent is above age 50” and “whether he or she gets their news
primarily from newspapers”, are not statistically independent.
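
The calculations in 4.13 and 4.21 can be checked with a small Python sketch built directly on the contingency-table counts. The dictionary layout and the helper `prob` are illustrative assumptions, not part of the textbook solution.

```python
from fractions import Fraction

# Counts from the 4.13 contingency table (N = newspapers, O = over 50).
counts = {
    ("N", "under50"): 82,
    ("N", "over50"): 104,
    ("notN", "under50"): 118,
    ("notN", "over50"): 96,
}
total = sum(counts.values())


def prob(pred):
    """Probability of the event described by pred(key) over the table cells."""
    return Fraction(sum(c for key, c in counts.items() if pred(key)), total)


p_n = prob(lambda key: key[0] == "N")
p_o = prob(lambda key: key[1] == "over50")
p_n_and_o = prob(lambda key: key == ("N", "over50"))
p_n_or_o = p_n + p_o - p_n_and_o        # general addition rule
p_n_given_o = p_n_and_o / p_o           # conditional probability
p_o_given_n = p_n_and_o / p_n
independent = p_n_given_o == p_n        # independence requires P(N|O) = P(N)

print(float(p_n), float(p_n_and_o), float(p_n_or_o))
print(float(p_n_given_o), float(p_o_given_n), independent)
```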

4.26 Set up a contingency table. Let W = needs warranty-related repairs, U = United States.
Given P(W) = .04, P(U) = .60, P(W∩U) = .025, use this information to fill in the table:
          U       U’      TOTAL
W         .025    .015    .04
W’        .575    .385    .96
TOTAL     .60     .40     1

(a) P(W|U) = P(W∩U)/P(U) = .025/.6 = .0417
(b) P(W|U’) = P(W∩U’)/P(U’) = .015/.4 = .0375
(c) P(W|U) = .0417 is not equal to P(W) = .04, so the two events are not statistically independent.
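
4.28 (a) P(both queens) = $\frac{4}{52} \cdot \frac{3}{51} = \frac{12}{2{,}652} = \frac{1}{221} = 0.0045$
(b) P(10 followed by 5 or 6) = $\frac{4}{52} \cdot \frac{8}{51} = \frac{32}{2{,}652} = \frac{8}{663} = 0.012$
(c) P(both queens), sampling with replacement = $\frac{4}{52} \cdot \frac{4}{52} = \frac{16}{2{,}704} = \frac{1}{169} = 0.0059$
(d) P(blackjack) = $\frac{16}{52} \cdot \frac{4}{51} + \frac{4}{52} \cdot \frac{16}{51} = \frac{128}{2{,}652} = \frac{32}{663} = 0.0483$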
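
The card probabilities in 4.28 can be verified with exact fractions. This is only a sketch; the variable names are illustrative.

```python
from fractions import Fraction

# (a) two queens in a row, drawing without replacement
both_queens = Fraction(4, 52) * Fraction(3, 51)

# (b) a 10 followed by a 5 or a 6, without replacement
ten_then_5_or_6 = Fraction(4, 52) * Fraction(8, 51)

# (c) two queens when the first card is replaced before the second draw
both_queens_replaced = Fraction(4, 52) * Fraction(4, 52)

# (d) blackjack: ten-valued card then ace, or ace then ten-valued card
blackjack = Fraction(16, 52) * Fraction(4, 51) + Fraction(4, 52) * Fraction(16, 51)

for name, p in [("a", both_queens), ("b", ten_then_5_or_6),
                ("c", both_queens_replaced), ("d", blackjack)]:
    print(name, p, round(float(p), 4))
```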
CHAPTER 5 SOLUTIONS

5.3 (a) Because the odds of winning are expressed with a base of 31,478, you would
conclude that the automobile dealership sent out 31,478 fliers.
(b) $\mu = \sum_{i=1}^{N} X_i P(X_i) = \$5.49$
(c) $\sigma = \sqrt{\sum_{i=1}^{N} [X_i - E(X)]^2 P(X_i)} = \$84.56$
(d) The total cost of the prizes is $15,000 + $500 + 31,476 * $5 = $172,880.
Assuming that the cost of producing the fliers is negligible, the cost of reaching a
single customer is $172,880/31478 = $5.49. The effectiveness of the promotion
will depend on how many customers will show up in the show room.

5.4 Here are the 36 different possible outcomes when you roll dice:
Under 7: (1,1) (1,2) (2,1) (2,2) (1,3) (3,1) (1,4) (4,1) (2,3) (3,2) (1,5) (5,1) (2,4) (4,2) (3,3)
Equal 7: (1,6) (6,1) (2,5) (5,2) (3,4) (4,3)
Over 7: (2,6) (6,2) (3,5) (5,3) (4,4) (3,6) (6,3) (4,5) (5,4) (4,6) (6,4) (5,5) (5,6) (6,5) (6,6)

     (a)                   (b)                   (c)
X       P(X)          X       P(X)          X       P(X)
$ -1    21/36         $ -1    21/36         $ -1    30/36
$ +1    15/36         $ +1    15/36         $ +4    6/36

(d) For (a): E(X) = (-1)(21/36) + (+1)(15/36) = -0.167
For (b): E(X) = (-1)(21/36) + (+1)(15/36) = -0.167
For (c): E(X) = (-1)(30/36) + (+4)(6/36) = -0.167
The expected value is $ -0.167 for each method of play.
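
The expected values in 5.4 can be confirmed by enumerating the 36 outcomes. The helper `expected_payoff` and its parameters are illustrative assumptions; the payoffs ($1 for under or over 7, $4 for exactly 7, on a $1 stake) are taken from the tables above.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely totals from rolling two dice.
totals = [a + b for a, b in product(range(1, 7), repeat=2)]


def expected_payoff(wins, win_amount, stake=1):
    """Expected value of a bet that pays win_amount when wins(total) is true
    and loses the stake otherwise."""
    p_win = Fraction(sum(1 for t in totals if wins(t)), len(totals))
    return win_amount * p_win - stake * (1 - p_win)


e_under = expected_payoff(lambda t: t < 7, 1)
e_over = expected_payoff(lambda t: t > 7, 1)
e_seven = expected_payoff(lambda t: t == 7, 4)

print(e_under, e_over, e_seven, round(float(e_under), 3))
```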

5.10 (a) E(total time) = E(time waiting) + E(time served) = 4 + 5.5 = 9.5 minutes
(b) $\sigma(\text{total time}) = \sqrt{1.2^2 + 1.5^2} = 1.9209$ minutes

5.11 (a) E(P) = 0.3(65) + 0.7(35) = $44
$\sigma_P = \sqrt{(0.3)^2(37{,}525) + (0.7)^2(11{,}025) + 2(0.3)(0.7)(-19{,}275)} = \$26.15$
$CV = \frac{\sigma_P}{E(P)}(100\%) = \frac{26.15}{44}(100\%) = 59.44\%$
(b) E(P) = 0.7(65) + 0.3(35) = $56
$\sigma_P = \sqrt{(0.7)^2(37{,}525) + (0.3)^2(11{,}025) + 2(0.7)(0.3)(-19{,}275)} = \$106.23$
$CV = \frac{\sigma_P}{E(P)}(100\%) = \frac{106.23}{56}(100\%) = 189.69\%$
(c) Investing 30% in the Dow Jones index and 70% in the weak-economy fund will
yield the lowest risk per unit average return at 59.44%. This will be the
investment recommendation if you are a risk averse investor.

5.12
X       P(X)    XP(X)   (X-E(X))^2 P(X)
-100    0.1     -10     2528.1
0       0.3     0       1044.3
80      0.3     24      132.3
150     0.3     45      2484.3
Total           59      6189.0

Y       P(Y)    YP(Y)   (Y-E(Y))^2 P(Y)
50      0.1     5       129.6
150     0.3     45      5548.8
-20     0.3     -6      346.8
-100    0.3     -30     3898.8
Total           14      9924.0

(a) E(X)=59, E(Y)=14
(b) Var(X)=6189, Stdev(X)=78.67, Var(Y)=9924, Stdev(Y)=99.62
(c)
(X-E(X))   (Y-E(Y))   P(X,Y)   (X-E(X))(Y-E(Y))P(X,Y)
-159       36         0.1      -572.4
-59        136        0.3      -2407.2
21         -34        0.3      -214.2
91         -114       0.3      -3112.2
Total                          -6306.0

Cov(X,Y) = $\sum (X - E(X))(Y - E(Y)) P(X, Y)$ = -6306
(d) Stock X gives the investor a lower standard deviation while yielding a higher expected
return so the investor should select stock X.

5.13
(a) E(P) = .3*59 + .7*14 = 27.5, $\sigma_P = \sqrt{.3^2(6189) + .7^2(9924) + 2(.3)(.7)(-6306)}$ = 52.64

(b) E(P) = .5*59 + .5*14 = 36.5, $\sigma_P = \sqrt{.5^2(6189) + .5^2(9924) + 2(.5)(.5)(-6306)}$ = 29.58

(c) E(P) = .7*59 + .3*14 = 45.5, $\sigma_P = \sqrt{.7^2(6189) + .3^2(9924) + 2(.7)(.3)(-6306)}$ = 35.74
(d) Based on the results of (a)-(c), you should recommend a portfolio with 70% of stock X and
30% of stock Y. This portfolio has the lowest coefficient of variation, so it has the lowest
risk per unit average return. The coefficients of variation for (a)-(c) are as follows:
(a) 191.42%, (b) 81.04%, (c) 78.55%
This portfolio also has the highest expected return.
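
The three portfolios in 5.13 differ only in the weight on stock X, so they can be compared with one small function. This is a sketch under the values computed in 5.12 (E(X) = 59, E(Y) = 14, Var(X) = 6189, Var(Y) = 9924, Cov(X,Y) = -6306); the function name `portfolio` is an illustrative choice. Results may differ from the printed ones in the last digit because the solution rounds intermediate values.

```python
from math import sqrt


def portfolio(w, e_x, e_y, var_x, var_y, cov_xy):
    """Expected return, risk (standard deviation) and CV (%) of a portfolio
    with weight w in X and 1 - w in Y."""
    e_p = w * e_x + (1 - w) * e_y
    sd_p = sqrt(w**2 * var_x + (1 - w) ** 2 * var_y + 2 * w * (1 - w) * cov_xy)
    return e_p, sd_p, 100 * sd_p / e_p


for w in (0.3, 0.5, 0.7):
    e_p, sd_p, cv = portfolio(w, 59, 14, 6189, 9924, -6306)
    print(f"w = {w}: E(P) = {e_p:.1f}, sigma_P = {sd_p:.2f}, CV = {cv:.2f}%")
```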
CHAPTER 6 SOLUTIONS

6.5 (a) P(X > 75) = $P\left(z > \frac{75 - 100}{10}\right)$ = P(z > -2.5) = .9938
(b) P(X < 70) = $P\left(z < \frac{70 - 100}{10}\right)$ = P(z < -3) = 1 - .99865 = .00135
(c) P(X < 80) + P(X > 110) = $P\left(z < \frac{80 - 100}{10}\right) + P\left(z > \frac{110 - 100}{10}\right)$ = P(z < -2) + P(z > 1)
= .0228 + (1 - .8413) = .0228 + .1587 = .1815
(d) Set $P\left(\frac{X_L - 100}{10} < z < \frac{X_U - 100}{10}\right)$ = P(-1.28 < z < 1.28) = .80, so that
$\frac{X_L - 100}{10}$ = -1.28 or $X_L$ = 100 - 1.28(10) = 87.2 and
$\frac{X_U - 100}{10}$ = 1.28 or $X_U$ = 100 + 1.28(10) = 112.8
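
The table lookups in 6.5 can be reproduced with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). This is a sketch; small differences from the printed answers arise because the solution rounds z to two decimals before using the table.

```python
from statistics import NormalDist

X = NormalDist(mu=100, sigma=10)   # distribution assumed in 6.5

p_a = 1 - X.cdf(75)                        # P(X > 75)
p_b = X.cdf(70)                            # P(X < 70)
p_c = X.cdf(80) + (1 - X.cdf(110))         # P(X < 80) + P(X > 110)
x_lower, x_upper = X.inv_cdf(0.10), X.inv_cdf(0.90)   # middle 80%

print(round(p_a, 4), round(p_b, 5), round(p_c, 4))
print(round(x_lower, 1), round(x_upper, 1))
```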

6.8
(a) P(34 < X < 50) = $P\left(\frac{34 - 50}{12} < z < \frac{50 - 50}{12}\right)$ = P(-1.33 < z < 0) = .9082 - .5 = .4082
(b) P(X < 30) + P(X > 60) = $P\left(z < \frac{30 - 50}{12}\right) + P\left(z > \frac{60 - 50}{12}\right)$ = P(z < -1.67) + P(z > .83)
= .0475 + (1 - .7967) = .2508

(c) Set $P\left(z > \frac{X - 50}{12}\right)$ = P(z > -.84) = .80, so that $\frac{X - 50}{12}$ = -.84 or X = 50 - .84(12) = 39.92
(39,920 miles)

(d) Keeping everything else constant, a smaller standard deviation causes more values to be
closer to the mean. As a result, we would expect (a) and (c) to increase and (b) to decrease.
(Δa) P(34 < X < 50) = $P\left(\frac{34 - 50}{10} < z < \frac{50 - 50}{10}\right)$ = P(-1.6 < z < 0) = .9452 - .5 = .4452

(Δb) P(X < 30) + P(X > 60) = $P\left(z < \frac{30 - 50}{10}\right) + P\left(z > \frac{60 - 50}{10}\right)$ = P(z < -2) + P(z > +1)
= .0228 + (1 - .8413) = .1815

(Δc) Set $P\left(z > \frac{X - 50}{10}\right)$ = P(z > -.84) = .80, so that $\frac{X - 50}{10}$ = -.84
or X = 50 - .84(10) = 41.6 (41,600 miles)

6.10 (a) P(X < 91) = $P\left(z < \frac{91 - 73}{8}\right)$ = P(z < 2.25) = .9878

(b) P(65 < X < 89) = $P\left(\frac{65 - 73}{8} < z < \frac{89 - 73}{8}\right)$ = P(-1 < z < 2) = .9772 - (1 - .8413) = .8185

(c) Set $P\left(z > \frac{X - 73}{8}\right)$ = P(z > 1.645) = .05, so that $\frac{X - 73}{8}$ = 1.645 or X = 73 + 1.645(8) = 86.16

(d) Option 1:

Set $P\left(z > \frac{X - 73}{8}\right)$ = P(z > 1.28) = .10, so that $\frac{X - 73}{8}$ = 1.28 or X = 73 + 1.28(8) = 83.24
Since 81 < 83.24, you will not get an A under this grading option.

Option 2:

Set $P\left(z > \frac{X - 62}{3}\right)$ = P(z > 1.28) = .10, so that $\frac{X - 62}{3}$ = 1.28 or X = 62 + 1.28(3) = 65.84
Since 68 > 65.84, you will get an A under this grading option.

You should prefer Option 2.


You can also find this result by directly comparing the z-scores for options 1 and 2:
$Z_{\text{option 1}}$ = (81-73)/8 = 1 < 2 = (68-62)/3 = $Z_{\text{option 2}}$. If the same curve is
used, option 2 would result in a higher grade.

6.13 (a) P(21.99 < X < 22.00) = $P\left(\frac{21.99 - 22.002}{.005} < z < \frac{22 - 22.002}{.005}\right)$ = P(-2.4 < z < -.4)
= P(z < -.4) - P(z < -2.4) = .3446 - .0082 = .3364

(b) P(21.99 < X < 22.01) = $P\left(\frac{21.99 - 22.002}{.005} < z < \frac{22.01 - 22.002}{.005}\right)$ = P(-2.4 < z < 1.6)
= P(z < 1.6) - P(z < -2.4) = .9452 - .0082 = .937

(c) Set $P\left(z > \frac{X - 22.002}{.005}\right)$ = P(z > 2.05) = .02, so that $\frac{X - 22.002}{.005}$ = 2.05 or
X = 22.002 + 2.05(.005) = 22.012
CHAPTER 7 SOLUTIONS

7.15 (a) $P(\bar{X} < 95) = P\left(z < \frac{95 - 100}{10/\sqrt{25}}\right)$ = P(z < -2.5) = .0062
(b) $P(95 < \bar{X} < 97.5) = P\left(\frac{95 - 100}{10/\sqrt{25}} < z < \frac{97.5 - 100}{10/\sqrt{25}}\right)$ = P(-2.5 < z < -1.25) = .9938 - .8944 = .0994
(c) $P(\bar{X} > 102.2) = P\left(z > \frac{102.2 - 100}{10/\sqrt{25}}\right)$ = P(z > 1.1) = 1 - .8643 = .1357
(d) Set $P\left(z > \frac{\bar{X} - 100}{10/\sqrt{25}}\right)$ = P(z > -.39) = .65, so that $\frac{\bar{X} - 100}{10/\sqrt{25}}$ = -.39 or $\bar{X} = 100 - .39(10/\sqrt{25})$ = 99.2

7.19
(a) Because the population diameter of Ping-Pong balls is approximately normally distributed, the
sampling distribution of the mean for samples of 16 will also be approximately normally distributed,
with standard error equal to $.04/\sqrt{16} = .01$.

(b) $P(\bar{X} < 1.28) = P\left(z < \frac{1.28 - 1.3}{.01}\right)$ = P(z < -2) = 1 - .9772 = .0228
(c) $P(1.31 < \bar{X} < 1.33) = P\left(\frac{1.31 - 1.3}{.01} < z < \frac{1.33 - 1.3}{.01}\right)$ = P(1 < z < 3) = .99865 - .8413 = .15735
(d) Set $P\left(\frac{\bar{X}_L - 1.3}{.01} < z < \frac{\bar{X}_U - 1.3}{.01}\right)$ = P(-.84 < z < .84) = .60, so that
$\frac{\bar{X}_L - 1.3}{.01}$ = -.84 or $\bar{X}_L$ = 1.3 - .84(.01) = 1.2916 and
$\frac{\bar{X}_U - 1.3}{.01}$ = +.84 or $\bar{X}_U$ = 1.3 + .84(.01) = 1.3084

7.20 (a) When n=2, the shape of the sampling distribution of X should resemble the shape of the
distribution of the population from which the sample is drawn. Since the mean is larger than the
median, the distribution of the sales price of the new houses is skewed to the right, and so is the
sampling distribution of X .

(b) If you select samples of n=100, the shape of the sampling distribution of the mean will be very close
to a normal distribution with a mean of $274,300 and a standard error of $\sigma/\sqrt{n} = 90{,}000/\sqrt{100} = 9{,}000$.

(c) $P(\bar{X} < 300{,}000) = P\left(z < \frac{300{,}000 - 274{,}300}{90{,}000/\sqrt{100}}\right)$ = P(z < 2.86) = .9979

(d) $P(275{,}000 < \bar{X} < 290{,}000) = P\left(\frac{275{,}000 - 274{,}300}{90{,}000/\sqrt{100}} < z < \frac{290{,}000 - 274{,}300}{90{,}000/\sqrt{100}}\right)$ = P(.08 < z < 1.74)
= .9591 - .5319 = .4272
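
Parts (b) to (d) of 7.20 can be checked by modelling the sample mean as a normal variable whose standard deviation is the standard error. This is a sketch using `statistics.NormalDist`; the printed answers round z to two decimals, so the exact probabilities differ slightly.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 274_300, 90_000, 100           # values used in 7.20
xbar = NormalDist(mu, sigma / sqrt(n))        # sampling distribution of the mean

p_c = xbar.cdf(300_000)                       # P(sample mean < 300,000)
p_d = xbar.cdf(290_000) - xbar.cdf(275_000)   # P(275,000 < sample mean < 290,000)

print(xbar.stdev, round(p_c, 4), round(p_d, 4))
```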

7.21 (a) $P(7.8 < \bar{X} < 8.2) = P\left(\frac{7.8 - 8}{2/\sqrt{25}} < z < \frac{8.2 - 8}{2/\sqrt{25}}\right)$ = P(-.5 < z < .5) = .6915 - .3085 = .383

(b) $P(7.5 < \bar{X} < 8) = P\left(\frac{7.5 - 8}{2/\sqrt{25}} < z < \frac{8 - 8}{2/\sqrt{25}}\right)$ = P(-1.25 < z < 0) = .8944 - .5 = .3944

(c) $P(7.8 < \bar{X} < 8.2) = P\left(\frac{7.8 - 8}{2/\sqrt{100}} < z < \frac{8.2 - 8}{2/\sqrt{100}}\right)$ = P(-1 < z < 1) = .8413 - .1587 = .6826

(d) With the sample size increasing from n=25 to n=100, more sample means will be closer to the
distribution mean. The standard error of the sampling distribution of size 100 is much smaller
than that of size 25, so the likelihood that the sample mean will fall within 0.2 minutes of the
mean is much higher for samples of size 100 (probability=0.6826) than for samples of size 25
(probability=0.3830).

7.22 (a) $P(\bar{X} > 3) = P\left(z > \frac{3 - 3.1}{.4/\sqrt{16}}\right)$ = P(z > -1) = .8413

(b) Set $P\left(z < \frac{\bar{X} - 3.1}{.4/\sqrt{16}}\right)$ = P(z < 1.04) = .85, so that $\frac{\bar{X} - 3.1}{.4/\sqrt{16}}$ = 1.04 or $\bar{X} = 3.1 + 1.04(.4/\sqrt{16})$ = 3.204

(c) To be able to use the standard normal distribution as an approximation for the area under
the curve, we must assume that the population is symmetrically distributed such that the
central limit theorem will likely hold for samples of n=16.

(d) Set $P\left(z < \frac{\bar{X} - 3.1}{.4/\sqrt{64}}\right)$ = P(z < 1.04) = .85, so that $\frac{\bar{X} - 3.1}{.4/\sqrt{64}}$ = 1.04 or $\bar{X} = 3.1 + 1.04(.4/\sqrt{64})$ = 3.152

7.27 (a) $P(.5 < p < .6) = P\left(\frac{.5 - .5}{\sqrt{(.5)(.5)/200}} < z < \frac{.6 - .5}{\sqrt{(.5)(.5)/200}}\right)$ = P(0 < z < 2.83) = .9977 - .5 = .4977

(b) Set $P\left(\frac{p_L - .5}{\sqrt{.5(.5)/200}} < z < \frac{p_U - .5}{\sqrt{.5(.5)/200}}\right)$ = P(-1.645 < z < 1.645) = .90, so that
$\frac{p_L - .5}{\sqrt{.5(.5)/200}}$ = -1.645 or $p_L = .5 - 1.645\sqrt{.5(.5)/200}$ = .4418 and
$\frac{p_U - .5}{\sqrt{.5(.5)/200}}$ = +1.645 or $p_U = .5 + 1.645\sqrt{.5(.5)/200}$ = .5582
NOTE: You can also just use the formula $\pi \pm z\sqrt{\frac{\pi(1 - \pi)}{n}}$

(c) $P(p > .65) = P\left(z > \frac{.65 - .5}{\sqrt{(.5)(.5)/200}}\right)$ = P(z > 4.24) ≈ 0

(d) If n=200, $P(p > .60) = P\left(z > \frac{.60 - .5}{\sqrt{(.5)(.5)/200}}\right)$ = P(z > 2.83) = 1 - .9977 = .0023

If n=1000, $P(p > .55) = P\left(z > \frac{.55 - .5}{\sqrt{(.5)(.5)/1000}}\right)$ = P(z > 3.16) = 1 - .99921 = .00079

More than 60% correct in a sample of 200 is more likely than more than 55%
correct in a sample of 1000.
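
The proportion calculations in 7.27 all use the same normal approximation with standard error $\sqrt{\pi(1-\pi)/n}$, so they can be sketched with one helper. The function name `proportion_dist` is an illustrative assumption.

```python
from math import sqrt
from statistics import NormalDist


def proportion_dist(pi, n):
    """Normal approximation to the sampling distribution of a sample proportion."""
    return NormalDist(pi, sqrt(pi * (1 - pi) / n))


d200 = proportion_dist(0.5, 200)
p_a = d200.cdf(0.6) - d200.cdf(0.5)                  # (a)
p_lower, p_upper = d200.inv_cdf(0.05), d200.inv_cdf(0.95)   # (b) middle 90%
p_d_200 = 1 - d200.cdf(0.60)                         # (d), n = 200
p_d_1000 = 1 - proportion_dist(0.5, 1000).cdf(0.55)  # (d), n = 1000

print(round(p_a, 4), round(p_lower, 4), round(p_upper, 4))
print(round(p_d_200, 4), round(p_d_1000, 5))
```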

7.30 (a) $P(.45 < p < .55) = P\left(\frac{.45 - .46}{\sqrt{(.46)(.54)/200}} < z < \frac{.55 - .46}{\sqrt{(.46)(.54)/200}}\right)$ = P(-.28 < z < 2.55) = .9946 - .3897 = .6049

(b) Set $P\left(\frac{p_L - .46}{\sqrt{.46(.54)/200}} < z < \frac{p_U - .46}{\sqrt{.46(.54)/200}}\right)$ = P(-1.645 < z < 1.645) = .90, so that
$\frac{p_L - .46}{\sqrt{.46(.54)/200}}$ = -1.645 or $p_L = .46 - 1.645\sqrt{.46(.54)/200}$ = .40 and
$\frac{p_U - .46}{\sqrt{.46(.54)/200}}$ = +1.645 or $p_U = .46 + 1.645\sqrt{.46(.54)/200}$ = .52
NOTE: You can also just use the formula $\pi \pm z\sqrt{\frac{\pi(1 - \pi)}{n}}$

(c) Set $P\left(\frac{p_L - .46}{\sqrt{.46(.54)/200}} < z < \frac{p_U - .46}{\sqrt{.46(.54)/200}}\right)$ = P(-1.96 < z < 1.96) = .95, so that
$\frac{p_L - .46}{\sqrt{.46(.54)/200}}$ = -1.96 or $p_L = .46 - 1.96\sqrt{.46(.54)/200}$ = .39 and
$\frac{p_U - .46}{\sqrt{.46(.54)/200}}$ = +1.96 or $p_U = .46 + 1.96\sqrt{.46(.54)/200}$ = .53
NOTE: You can also just use the formula $\pi \pm z\sqrt{\frac{\pi(1 - \pi)}{n}}$
CHAPTER 8 SOLUTIONS
8.9 (a) $\bar{X} \pm Z\frac{\sigma}{\sqrt{n}} = 0.995 \pm 2.58\frac{0.02}{\sqrt{50}}$; $0.9877 \le \mu \le 1.0023$
(b) Since the value of 1.0 is included in the interval, there is no reason to believe that
the mean is different from 1.0 gallon.
(c) No. Since σ is known and n = 50, from the Central Limit Theorem, we may assume
that the sampling distribution of $\bar{X}$ is approximately normal.
(d) The reduced confidence level narrows the width of the confidence interval.
(a) $\bar{X} \pm Z\frac{\sigma}{\sqrt{n}} = 0.995 \pm 1.96\frac{0.02}{\sqrt{50}}$; $0.9895 \le \mu \le 1.0005$
(b) Since the value of 1.0 is still included in the interval, there is no reason to
believe that the mean is different from 1.0 gallon.

8.10 (a) $\bar{X} \pm Z\frac{\sigma}{\sqrt{n}} = 350 \pm 1.96\frac{100}{\sqrt{64}}$; $325.5 \le \mu \le 374.50$

(b) No. The manufacturer cannot support a claim that the bulbs last an average 400
hours. Based on the data from the sample, a mean of 400 hours would represent a
distance of 4 standard errors above the sample mean of 350 hours.
(c) No. Since σ is known and n = 64, from the Central Limit Theorem, we may assume
that the sampling distribution of $\bar{X}$ is approximately normal.
(d) The confidence interval is narrower based on a process standard deviation of 80
hours rather than the original assumption of 100 hours.
(a) $\bar{X} \pm Z\frac{\sigma}{\sqrt{n}} = 350 \pm 1.96\frac{80}{\sqrt{64}}$; $330.4 \le \mu \le 369.6$
(b) Based on the smaller standard deviation, a mean of 400 hours would
represent a distance of 5 standard errors above the sample mean of
350 hours. No, the manufacturer cannot support a claim that the bulbs
have a mean life of 400 hours.
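
The intervals in 8.10 follow the same known-sigma formula, so they can be reproduced with a short function. This is a sketch; `z_interval` is an illustrative name, and it takes the critical value from `NormalDist` rather than from a table.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(xbar, sigma, n, confidence=0.95):
    """Confidence interval for the mean when sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # about 1.96 for 95%
    half_width = z * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width


lo_100, hi_100 = z_interval(350, 100, 64)   # 8.10 (a)
lo_80, hi_80 = z_interval(350, 80, 64)      # 8.10 (d)

print(round(lo_100, 1), round(hi_100, 1))
print(round(lo_80, 1), round(hi_80, 1))
```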

8.16 (a) $\bar{X} \pm t\frac{S}{\sqrt{n}} = 32 \pm 2.0096\frac{9}{\sqrt{50}}$; $29.44 \le \mu \le 34.56$
(b) The quality improvement team can be 95% confident that the population mean
turnaround time is now somewhere in between 29.44 hours and 34.56 hours.

(c) The project was a success because the initial turnaround time of 68 hours does
not fall inside the 95% confidence interval.

8.17 (a) $\bar{X} \pm t\frac{S}{\sqrt{n}} = 195.3 \pm 2.1098\frac{21.4}{\sqrt{18}}$; $184.6581 \le \mu \le 205.9419$
(b) No, a grade of 200 is in the interval.
(c) It is not unusual. A tread-wear index of 210 for a particular tire is only 0.69
standard deviation above the sample mean of 195.3.

8.28 (a) $p = \frac{X}{n} = \frac{135}{500} = 0.27$; $p \pm Z\sqrt{\frac{p(1 - p)}{n}} = 0.27 \pm 2.5758\sqrt{\frac{0.27(1 - 0.27)}{500}}$
$0.22 \le \pi \le 0.32$
(b) The manager in charge of promotional programs concerning residential
customers can infer that the proportion of households that would purchase an
additional telephone line if it were made available at a substantially reduced
installation cost is between 0.22 and 0.32 with a 99% level of confidence.

8.30 (a) $p = \frac{X}{n} = \frac{260}{500} = 0.52$; $p \pm Z\sqrt{\frac{p(1 - p)}{n}} = 0.52 \pm 1.96\sqrt{\frac{0.52(1 - 0.52)}{500}}$
$0.4762 \le \pi \le 0.5638$
(b) Since the 95% confidence interval contains 0.50, you cannot claim that more than
half of all U.S. workers have negotiated a pay raise.
(c) (a) $p = \frac{X}{n} = \frac{2600}{5000} = 0.52$
$p \pm Z\sqrt{\frac{p(1 - p)}{n}} = 0.52 \pm 1.96\sqrt{\frac{0.52(1 - 0.52)}{5000}}$
$0.5062 \le \pi \le 0.5338$
(b) Since the lower limit of the 95% confidence interval is greater than 0.50,
you can claim that more than half of all U.S. workers have negotiated a
pay raise.
(d) The larger the sample size, the narrower the confidence interval, holding
everything else constant.

158 p 1  p  0.33401  0.3340 


8.33 (a) p  = 0.3340 pZ  0.3340  1.96
473 n 473
0.29    0.38
85 p 1  p  0.17971  0.1797 
(b) p  = 0.1797 pZ  0.1797  1.96
473 n 473
0.15    0.21
(c) You can be 95% confident that the population proportion of employees who
typically took work with them on vacation is somewhere between 0.29 and
0.38 and the population proportion of employees who said that there are
unwritten and unspoken expectations that they stay connected is
somewhere between 0.15 and 0.21.
(d) The two confidence interval estimates do not overlap because the two separate
point estimates are far enough apart.

8.37 $n = \frac{Z^2 \pi(1 - \pi)}{e^2} = \frac{1.96^2 (0.4)(0.6)}{(0.02)^2}$ = 2,304.96 Use n = 2,305

8.38 (a) $n = \frac{Z^2 \sigma^2}{e^2} = \frac{1.96^2 \cdot 400^2}{50^2}$ = 245.86 Use n = 246
(b) $n = \frac{Z^2 \sigma^2}{e^2} = \frac{1.96^2 \cdot 400^2}{25^2}$ = 983.41 Use n = 984

8.71 (a) $n = \frac{Z^2 \sigma^2}{e^2} = \frac{2.58^2 \cdot 18^2}{5^2}$ = 86.27 Use n = 87
Note: If the Z-value used is carried out to 2.5758, the value of n is 85.986 and
only 86 women would need to be sampled.
(b) $n = \frac{Z^2 \pi(1 - \pi)}{e^2} = \frac{1.645^2 (0.5)(0.5)}{(0.045)^2}$ = 334.07 Use n = 335
If a single sample were to be selected for both purposes, the larger of the two
sample sizes (n = 335) should be used.
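
The sample-size answers in 8.37, 8.38 and 8.71 come from two formulas, each followed by rounding up to the next whole observation. A minimal sketch follows, with illustrative function names and the Z values exactly as used in the solutions.

```python
from math import ceil


def n_for_mean(z, sigma, e):
    """Sample size needed to estimate a mean within sampling error e."""
    return ceil(z**2 * sigma**2 / e**2)


def n_for_proportion(z, pi, e):
    """Sample size needed to estimate a proportion within sampling error e."""
    return ceil(z**2 * pi * (1 - pi) / e**2)


print(n_for_proportion(1.96, 0.4, 0.02))    # 8.37
print(n_for_mean(1.96, 400, 50))            # 8.38 (a)
print(n_for_mean(1.96, 400, 25))            # 8.38 (b)
print(n_for_mean(2.58, 18, 5))              # 8.71 (a)
print(n_for_proportion(1.645, 0.5, 0.045))  # 8.71 (b)
```

Halving the sampling error in 8.38 roughly quadruples the required sample size, because e enters the formula squared.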

8.73 (a) $\bar{X} \pm t\frac{S}{\sqrt{n}} = \$21.34 \pm 1.9949\frac{\$9.22}{\sqrt{70}}$; $\$19.14 \le \mu \le \$23.54$
(b) $p \pm Z\sqrt{\frac{p(1 - p)}{n}} = 0.3714 \pm 1.645\sqrt{\frac{0.3714(0.6286)}{70}}$
$0.2764 \le \pi \le 0.4664$
(c) $n = \frac{Z^2 \sigma^2}{e^2} = \frac{1.96^2 \cdot 10^2}{1.5^2}$ = 170.74 Use n = 171
(d) $n = \frac{Z^2 \pi(1 - \pi)}{e^2} = \frac{1.645^2 (0.5)(0.5)}{(0.045)^2}$ = 334.08 Use n = 335
(e) If a single sample were to be selected for both purposes, the larger of the two sample
sizes (n = 335) should be used.

8.75 (a) $n = \frac{Z^2 \pi(1 - \pi)}{e^2} = \frac{1.96^2 (0.5)(0.5)}{(0.05)^2}$ = 384.16 Use n = 385
If we assume that the population proportion is only 0.50, then a sample of 385
would be required. If the population proportion is 0.90, the sample size required
is cut to 103.
(b) $p \pm Z\sqrt{\frac{p(1 - p)}{n}} = 0.84 \pm 1.96\sqrt{\frac{0.84(0.16)}{50}}$
$0.7384 \le \pi \le 0.9416$
(c) The representative can be 95% confident that the actual proportion of bags that
will do the job is between 73.84% and 94.16%. He/she can accordingly perform a
cost-benefit analysis to decide if he/she wants to sell the Ice Melt product.
CHAPTER 9 SOLUTIONS
9.9 α is the probability of incorrectly convicting the defendant when he is innocent. β is the
probability of incorrectly failing to convict the defendant when he is guilty.
9.10 Under the French judicial system, unlike ours in the United States, the null hypothesis
assumes the defendant is guilty, the alternative hypothesis assumes the defendant is
innocent. A Type I error would be not convicting a guilty person and a Type II error
would be convicting an innocent person.
9.11 (a) A Type I error is the mistake of approving an unsafe drug. A Type II error is not
approving a safe drug.
(b) The consumer groups are trying to avoid a Type I error.
(c) The industry lobbyists are trying to avoid a Type II error.
(d) To lower both Type I and Type II errors, the FDA can require more information
and evidence in the form of more rigorous testing. This can easily translate into
longer time to approve a new drug.
9.15 (a) H0: μ = 1. The mean amount of paint is 1 gallon.
H1: μ ≠ 1. The mean amount of paint differs from 1 gallon.
Decision rule: Reject H0 if |ZSTAT| > 2.5758
Test statistic: $Z_{STAT} = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{.995 - 1}{.02/\sqrt{50}} = -1.7678$
Decision: Since |ZSTAT| < 2.5758, do not reject H0. There is not enough evidence
to conclude that the mean amount of paint contained in 1-gallon cans purchased
from a nationally known manufacturer is different from 1 gallon.
(b) p-value = 0.0771. If the population mean amount of paint contained in 1-gallon
cans purchased from a nationally known manufacturer is actually 1 gallon, the
probability of obtaining a test statistic that is more than 1.7678 standard error
units away from 0 is 0.0771.
(c) $\bar{X} \pm Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = .995 \pm 2.5758\frac{.02}{\sqrt{50}}$; $0.9877 \le \mu \le 1.0023$
You are 99% confident that the population mean amount of paint contained in 1-gallon cans
purchased from a nationally known manufacturer is somewhere between 0.9877 and
1.0023 gallon.
(d) Since the 99% confidence interval does contain the hypothesized value of 1, you
will not reject H0. The conclusions are the same.
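
The two-tail Z test in 9.15 can be sketched in a few lines. The function name `z_test_two_tail` is an illustrative assumption; the inputs (sample mean .995, σ = .02, n = 50, α = 0.01) are those of the solution.

```python
from math import sqrt
from statistics import NormalDist


def z_test_two_tail(xbar, mu0, sigma, n):
    """Z statistic and two-tail p-value for H0: mu = mu0 with sigma known."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


alpha = 0.01
z_stat, p_value = z_test_two_tail(0.995, 1, 0.02, 50)
critical = NormalDist().inv_cdf(1 - alpha / 2)   # about 2.5758
reject = abs(z_stat) > critical

print(round(z_stat, 4), round(p_value, 4), round(critical, 4), reject)
```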

9.22 (a) H0: μ = 3.7 H1: μ ≠ 3.7. Decision rule: Reject H0 if |tSTAT| > 1.9983 d.f. = 63
Test statistic: $t_{STAT} = \frac{\bar{X} - \mu}{S/\sqrt{n}} = \frac{3.57 - 3.7}{0.8/\sqrt{64}} = -1.3$
Decision: Since |tSTAT| < 1.9983, do not reject H0. There is not enough evidence
to conclude that the population mean waiting time is different from 3.7
minutes at the 0.05 level of significance.
(b) The sample size of 64 is large enough to apply the Central Limit Theorem and,
hence, you do not need to be concerned about the shape of the population
distribution when conducting the t-test in (a). In general, the t test is
appropriate for this sample size except for the case where the population is
extremely skewed or bimodal.

9.26 (a) H0: μ = 35 H1: μ ≠ 35
Decision rule: Reject H0 if |tSTAT| > 2.5706 d.f. = 5
Test statistic: $t_{STAT} = \frac{\bar{X} - \mu}{S/\sqrt{n}} = \frac{36.53 - 35}{4.3896/\sqrt{6}} = 0.8556$
Decision: Since |tSTAT| < 2.5706, do not reject H 0 . There is not enough evidence
to conclude that the mean price for two tickets with online service charges, large
popcorn, and two medium soft drinks is different from $35.
(b) The p-value is 0.4313. If the population mean is indeed $35, the probability of
observing a sample of 6 theater chains that will result in a sample mean farther
away from the hypothesized value than this sample is 0.4313.
(c) That the distribution of prices is normally distributed.
(d) With a small sample size, it is difficult to evaluate the assumption of normality.
However, the distribution may be symmetric since the mean and the median are
close in value.

9.46 H0: μ ≥ 2.8 feet.
The mean length of steel bars produced is at least 2.8 feet and the production equipment
does not need immediate adjustment.
H1: μ < 2.8 feet.
The mean length of steel bars produced is less than 2.8 feet and the production equipment
does need immediate adjustment.
(a) Decision rule: If tSTAT < -1.7109, reject H0.
Test statistic: $t_{STAT} = \frac{\bar{X} - \mu}{S/\sqrt{n}} = \frac{2.73 - 2.8}{0.2/\sqrt{25}} = -1.75$
Decision: Since tSTAT = -1.75 is less than -1.7109, reject H0. There is enough
evidence to conclude the production equipment needs adjustment.
(b) Decision rule: If p-value < 0.05, reject H0.
Test statistic: $t_{STAT} = \frac{\bar{X} - \mu}{S/\sqrt{n}} = \frac{2.73 - 2.8}{0.2/\sqrt{25}} = -1.75$
p-value = 0.0464
Decision: Since p-value = 0.0464 is less than α = 0.05, reject H0. There is enough
evidence to conclude the production equipment needs adjustment.
(c) The probability of obtaining a sample whose mean is 2.73 feet or less when the
null hypothesis is true is 0.0464.
(d) The conclusions are the same.

9.48 (a) H0: μ ≤ 5
The mean number of trips that children take to the store is no more than 5.
H1: μ > 5
The mean number of trips that children take to the store is more than 5.
(b) A Type I error occurs when you conclude the mean number of trips that children
take to the store is more than 5 when in fact the mean number is not more than
five.
A Type II error occurs when you conclude the mean number of trips that children
take to the store is not more than 5 when in fact the mean number is more than
five.
(c) Decision rule: If tSTAT > 2.3646 or when the p-value < 0.01, reject H0.
Test statistic: $t_{STAT} = \frac{\bar{X} - \mu}{S/\sqrt{n}} = \frac{5.47 - 5}{1.6/\sqrt{100}} = 2.9375$
p-value = 0.0021
Decision: Since tSTAT = 2.9375 is greater than 2.3646 or the p-value of 0.0021 is
less than 0.01, reject H0. There is enough evidence to conclude the population
mean number of trips to the store is greater than 5 per week.
(d) When the null hypothesis is true, the probability of obtaining a sample whose
mean is 5.47 trips or more is 0.0021.

9.57 (a) H0: π ≥ 0.5 H1: π < 0.5
Decision rule: If ZSTAT < -2.3263, reject H0.
Test statistic: $Z_{STAT} = \frac{p - \pi}{\sqrt{\frac{\pi(1 - \pi)}{n}}} = \frac{0.42 - 0.5}{\sqrt{\frac{0.5(1 - 0.5)}{500}}} = -3.5777$
Decision: Since ZSTAT = -3.5777 is lower than the critical bound of -2.3263, reject
H0. There is enough evidence to conclude that the proportion of customers who
selected products and then cancelled their transaction is less than 0.50 with the
new system.
(b) H0: π ≥ 0.5 H1: π < 0.5
Decision rule: If ZSTAT < -2.3263, reject H0.
Test statistic: $Z_{STAT} = \frac{p - \pi}{\sqrt{\frac{\pi(1 - \pi)}{n}}} = \frac{0.42 - 0.5}{\sqrt{\frac{0.5(1 - 0.5)}{100}}} = -1.6$
Decision: Since ZSTAT = -1.6 is larger than the critical bound of -2.3263, do not
reject H0. There is not enough evidence to conclude that the proportion of
customers who selected products and then cancelled their transaction is less than
0.50 with the new system.
(c) The larger the sample size, the smaller is the standard error. Even though the
sample proportion is the same value at 0.42 in (a) and (b), the test statistic is
more negative while the p-value is smaller in (a) compared to (b) because of the
larger sample size in (a).

9.67 (a) H0: π = 0.5 H1: π ≠ 0.5


(b) The level of significance is the probability of committing a Type I error, which is
the probability of concluding the proportion of customers who prefer product 1
over product 2 is not 50% when in fact 50% of customers prefer product one over
product two. The risk associated with Type II error is the probability of not
rejecting the claim that 50% of customers prefer product 1 over product 2 when it
should be rejected.
(c) If you reject the null hypothesis for a p-value of 0.22, there is a 22% probability
that you may have incorrectly concluded that the proportion of customers
preferring product 1 is not 50% when in fact the correct proportion is 50%.
(d) The article suggests raising the level of significance because the consequences of
incorrectly concluding the proportion is not 50% are not very severe.
(e) Before raising the level of significance of a test, you have to genuinely evaluate
whether the cost of committing a Type I error is really not as bad as you have
thought.
(f) If the p-value is actually 0.12, you will be more confident about rejecting the null
hypothesis. If the p-value is 0.06, you will be even more confident that a Type I
error is much less likely to occur.

9.68 (a) La Quinta Motor Inns commits a Type I error when it purchases a site that is not
profitable.
(b) Type II error occurs when La Quinta Motor Inns fails to purchase a profitable site.
The cost to the Inns when a Type II error is committed is the loss on the amount of
profit the site could have generated had the Inns decided to purchase the site.
(c) The executives at La Quinta Motor Inns are trying to avoid a Type I error by
adopting a very stringent decision criterion. Only sites that are classified as
capable of generating high profit will be purchased.
(d) If the executives adopt a less stringent rejection criterion by buying sites for which
the computer model predicts moderate or large profit, the probability of committing
a Type I error will increase. Many more of the sites the computer model predicts
that will generate moderate profit may end up not being profitable at all. On the
other hand, the less stringent rejection criterion will lower the probability of
committing a Type II error since more potentially profitable sites will be
purchased.
CHAPTER 10 SOLUTIONS
10.7 (a) H0: μ1 ≤ μ2 The mean amount spent is no higher for men than women.
H1: μ1 > μ2 The mean amount spent is higher for men than women.
(b) Type I error is the error made in concluding that the mean amount spent is higher
for men than women when the mean amount spent is in fact no higher for men
than women.
(c) Type II error is the error made in concluding that the mean amount spent is no
higher for men than women when the mean amount spent is in fact higher for
men than women.
(d) $z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} = \frac{(2401 - 1527) - 0}{\sqrt{\frac{1200^2}{600} + \frac{1000^2}{700}}}$ = 14.13 > 2.33 = $z_{.01}$, reject H0

There is evidence that the mean amount spent is higher for men than for women.
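
The test statistic in 10.7 (d) can be reproduced as follows. This is a sketch with an illustrative function name; it uses the sample standard deviations in place of the population values, as the solution does.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(xbar1, xbar2, s1, s2, n1, n2, hypothesized_diff=0.0):
    """Z statistic for the difference between two means (large samples)."""
    standard_error = sqrt(s1**2 / n1 + s2**2 / n2)
    return ((xbar1 - xbar2) - hypothesized_diff) / standard_error


z_stat = two_sample_z(2401, 1527, 1200, 1000, 600, 700)
critical = NormalDist().inv_cdf(1 - 0.01)   # upper-tail critical value, about 2.33
reject = z_stat > critical

print(round(z_stat, 2), round(critical, 2), reject)
```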

10.20
A     B     DIFFERENCE (B-A)
24    26    2
27    27    0
19    22    3
24    27    3
22    25    3
26    27    1
27    26    -1
25    27    2
22    23    1
$\bar{D}$ = 1.56
$S_D$ = 1.42
a. H0: μD = 0
H1: μD ≠ 0
$t_{STAT} = \frac{\bar{D} - \mu_D}{S_D/\sqrt{n}} = \frac{1.56 - 0}{1.42/\sqrt{9}}$ = 3.3 > 2.306 = $t_{8, .025}$, so reject H0.
There is enough evidence of a difference in the mean summated ratings between the two
brands.
b. You must assume the distribution of the differences between the two ratings is
approximately normal.

c. p-value is 0.0112. The probability of obtaining a mean difference in ratings that deviates
from 0 by at least 3.3 standard errors in either direction is approximately .0112 if
there is no difference in the mean summated ratings between the two brands.
d. $\bar{D} \pm t_{n-1, \alpha/2}\frac{S_D}{\sqrt{n}} = 1.56 \pm 2.306\frac{1.42}{\sqrt{9}}$ = 0.47 to 2.65
The difference in summated rating is defined as the rating on brand B minus the rating
on brand A.
10.30 (a) H0: π1 ≤ π2 H1: π1 > π2 Population 1 = expensive pill, 2 = cheap pill
(b) Decision rule: If ZSTAT > 1.6449, reject H0.
Test statistic:
$\bar{p} = \frac{X_1 + X_2}{n_1 + n_2} = \frac{35 + 25}{41 + 41} = 0.7317$
$Z_{STAT} = \frac{(p_1 - p_2) - (\pi_1 - \pi_2)}{\sqrt{\bar{p}(1 - \bar{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{(0.8537 - 0.6098) - 0}{\sqrt{0.7317(1 - 0.7317)\left(\frac{1}{41} + \frac{1}{41}\right)}} = 2.4924$
Decision: Since ZSTAT = 2.4924 is greater than the critical bound of 1.6449, reject H0.
There is sufficient evidence to conclude that people think an expensive pill works
better than a cheap pill.
(c) Yes, the result in (b) makes it appropriate to claim that people think an expensive pill
works better than a cheap pill.

10.35 (a) H0: π1 = π2 where Populations: 1 = Age 36 to 50, 2 = Age above 50
H1: π1 ≠ π2
Decision rule: If ZSTAT < -1.96 or ZSTAT > 1.96, reject H0.
Test statistic:
$Z_{STAT} = \frac{(p_1 - p_2) - (\pi_1 - \pi_2)}{\sqrt{\bar{p}(1 - \bar{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{(0.41 - 0.52) - 0}{\sqrt{0.465(1 - 0.465)\left(\frac{1}{200} + \frac{1}{200}\right)}} = -2.2054$
Decision: Since ZSTAT = -2.2054 is less than the lower critical bound of -1.96, reject
H0. There is sufficient evidence of a significant difference in the proportion who get
their news primarily from newspapers between those respondents 36 to 50 years old
and those above 50 years old.
(b) p-value is 0.0274. The probability of obtaining a difference in proportions that gives
rise to a test statistic that deviates from 0 by 2.2054 or more in either direction is
0.0274 if there is no difference in the proportion who get their news primarily from
newspapers between those respondents 36 to 50 years old and those above 50 years old.
(c) $(p_1 - p_2) \pm Z\sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}} = (0.41 - 0.52) \pm 1.96\sqrt{\frac{0.41(1 - 0.41)}{200} + \frac{0.52(1 - 0.52)}{200}}$
$-0.2072 \le \pi_1 - \pi_2 \le -0.0128$
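
The pooled Z test and the confidence interval in 10.35 can be sketched as below. The function names are illustrative; the counts 82 and 104 are the numerators from the 4.13 contingency table, which give the sample proportions 0.41 and 0.52 used here.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(x1, n1, x2, n2):
    """Pooled-variance Z statistic and two-tail p-value for H0: pi1 = pi2."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    z = (p1 - p2) / sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return z, 2 * (1 - NormalDist().cdf(abs(z)))


def difference_interval(x1, n1, x2, n2, confidence=0.95):
    """Confidence interval for pi1 - pi2 (unpooled standard error)."""
    p1, p2 = x1 / n1, x2 / n2
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half = z * sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) - half, (p1 - p2) + half


z_stat, p_value = two_proportion_z(82, 200, 104, 200)
ci_low, ci_high = difference_interval(82, 200, 104, 200)

print(round(z_stat, 4), round(p_value, 4))
print(round(ci_low, 4), round(ci_high, 4))
```

Note that the test uses the pooled proportion in its standard error, while the interval uses the two separate sample proportions, which matches the two formulas in the solution.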
10.59 (a) H0: π1 ≤ π2 H1: π1 > π2
Population: 1 = Democrats, 2 = Republicans
(b) Type I Error: Rejecting the null hypothesis that the proportion of Democrats trusting
the government more than business is no greater than the proportion of Republicans
trusting the government more than business when the proportion of Democrats
trusting the government more than business is indeed no greater than the proportion
of Republicans trusting the government more than business.
(c) Type II Error: Failing to reject the null hypothesis that the proportion of Democrats
trusting the government more than business is no greater than the proportion of
Republicans trusting the government more than business when the proportion of
Democrats trusting the government more than business is indeed greater than the
proportion of Republicans trusting the government more than business.
CHAPTER 13 SOLUTIONS
13.4 (a) [Scatter diagram: Sales (Y) versus shelf space (X)]

The scatter plot shows a positive linear relationship.


(b) For each increase in shelf space of an additional foot, there is an expected
increase in weekly sales of an estimated $7.40.
(c) Ŷ = 145 + 7.4X = 145 + 7.4(8) = $204.20

13.9
Regression Statistics
Multiple R            0.850061
R Square              0.722603
Adjusted R Square     0.710543
Standard Error        194.5954
Observations          25

ANOVA
              df    SS          MS          F           Significance F
Regression    1     2268777     2268777     59.91376    7.52E-08
Residual      23    870949.5    37867.37
Total         24    3139726

                Coefficients   Standard Error   t Stat      P-value
Intercept       177.1208       161.0043         1.1001      0.28267
X Variable 1    1.065144       0.137608         7.740398    7.52E-08
(a) [Scatter plot: Monthly Rent ($) versus Size (square feet)]

(b) Ŷ = 177.1 + 1.065X


(c) For each increase of 1 square foot in space, the expected monthly rental is
estimated to increase by $1.065. Since X cannot be zero, 177.1 has no practical
interpretation.
(d) Ŷ = 177.1 + 1.065X = 177.1 + 1.065(1000) = $1242.10
(e) An apartment with 500 square feet is outside the relevant range for the
independent variable.
(f) The apartment with 1200 square feet has the more favorable rent relative to size.
Based on the regression equation, a 1200 square foot apartment would have an
expected monthly rent of $1455.10, while a 1000 square foot apartment would
have an expected monthly rent of $1242.10.
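The two expected rents quoted in 13.9 (d) and (f) follow directly from the fitted equation. A minimal sketch, assuming the rounded coefficients 177.1 and 1.065 from part (b); the function name `predicted_rent` is illustrative.

```python
def predicted_rent(size_sq_ft, b0=177.1, b1=1.065):
    """Expected monthly rent in dollars for an apartment of the given size.

    Defaults are the rounded coefficients from part (b); the function
    itself is a hypothetical helper, not part of the original solution.
    """
    return b0 + b1 * size_sq_ft

for size in (1000, 1200):
    print(size, "square feet:", round(predicted_rent(size), 2))
```

The equation should only be applied to sizes inside the range of the data used to fit it, which is why part (e) declines to predict for 500 square feet.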
13.16 (a) \( r^2 = \frac{SSR}{SST} = \frac{20{,}535}{30{,}025} = 0.684 \). So, 68.4% of the variation in weekly sales can be
explained by the variation in shelf space.

(b) \[ S_{YX} = \sqrt{\frac{SSE}{n - 2}} = \sqrt{\frac{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{n - 2}} = \sqrt{\frac{9490}{10}} = 30.8058 \]
(c) Based on (a) and (b), the model should be moderately useful for predicting sales.
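Both measures in 13.16 come from the same sums of squares, which the sketch below makes explicit. It assumes n = 12 (implied by n − 2 = 10 in the solution) and uses SSE = SST − SSR; the function name `fit_summary` is illustrative.

```python
import math

def fit_summary(ssr, sst, n):
    """Return (r_squared, s_yx) for a simple linear regression.

    Hypothetical helper: r_squared = SSR / SST and
    s_yx = sqrt(SSE / (n - 2)), with SSE = SST - SSR.
    """
    sse = sst - ssr  # unexplained variation
    r_squared = ssr / sst
    s_yx = math.sqrt(sse / (n - 2))
    return r_squared, s_yx

# SSR and SST are from the solution; n = 12 is inferred from n - 2 = 10.
r2, s_yx = fit_summary(ssr=20535, sst=30025, n=12)
print(round(r2, 3), round(s_yx, 4))
```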
13.21 (a) r2 = 0.723. So, 72.3% of the variation in monthly rent can be explained by the
variation in square footage.
(b) SYX = 194.6
(c) Based on (a) and (b), the model should be moderately useful for predicting
monthly rent.
13.42 (a) H0: β1 = 0    H1: β1 ≠ 0
\( t_{STAT} = \frac{b_1 - \beta_1}{S_{b_1}} = \frac{7.4}{1.59} = 4.65 > t_{0.05/2} = 2.2281 \) with 10 degrees of freedom for
α = 0.05. Reject H0. There is enough evidence to conclude that the fitted linear
regression model is useful.
(b) b1 ± tα/2 Sb1 = 7.4 ± 2.2281(1.59), so 3.86 ≤ β1 ≤ 10.94
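The t test and the interval in 13.42 use the same three numbers, so they can be computed together. A sketch only: the critical value 2.2281 is taken from the solution rather than computed, and the function name `slope_inference` is illustrative.

```python
def slope_inference(b1, s_b1, t_crit):
    """t statistic for H0: beta1 = 0, the interval for beta1, and the decision.

    Hypothetical helper; t_crit must be supplied from a t table
    (2.2281 for 10 degrees of freedom at alpha = 0.05, per the solution).
    """
    t_stat = b1 / s_b1
    half_width = t_crit * s_b1
    interval = (b1 - half_width, b1 + half_width)
    reject_h0 = abs(t_stat) > t_crit
    return t_stat, interval, reject_h0

t_stat, (lower, upper), reject_h0 = slope_inference(b1=7.4, s_b1=1.59, t_crit=2.2281)
print(round(t_stat, 2), round(lower, 2), round(upper, 2), reject_h0)
```

The two results agree by construction: the interval excludes zero exactly when the two-tail test rejects H0 at the same α.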

13.47 (a) tSTAT = 7.74 > t0.05/2 = 2.0687 with 23 degrees of freedom for α = 0.05. Reject
H0. There is evidence that the fitted linear regression model is useful.
(b) 0.7803 ≤ β1 ≤ 1.3497

13.49 (a) Procter & Gamble’s stock moves only 54% as much as the overall market and is
much less volatile than the market. AT&T’s stock moves only 73% as much as
the overall market and is considered less volatile than the market. The stock of
Disney Company moves 10% more than the overall market and is considered a
little more volatile than the market. Apple’s stock moves 52% more than the
overall market and is considered volatile. eBay’s stock moves 69% more than
the overall market and is considered volatile. Ford’s stock moves 186% more
than the overall market and is considered extremely volatile.
(b) Investors can use the beta value as a measure of the volatility of a stock to assess
its risk.
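The wording in 13.49 (a) is a fixed translation of each beta into a percentage relative to the market. The sketch below shows that translation. The beta values are not listed in the solution; they are inferred here from the percentages it quotes (for example, "186% more" implies a beta of 2.86), and the function name `describe_beta` is illustrative.

```python
def describe_beta(beta):
    """Phrase a stock's beta relative to the overall market (beta = 1)."""
    if beta < 1:
        return f"moves {round(beta * 100)}% as much as the market"
    if beta > 1:
        return f"moves {round((beta - 1) * 100)}% more than the market"
    return "moves with the market"

# Betas inferred from the percentages quoted in the solution (an assumption).
betas = {
    "Procter & Gamble": 0.54,
    "AT&T": 0.73,
    "Disney": 1.10,
    "Apple": 1.52,
    "eBay": 1.69,
    "Ford": 2.86,
}
for company, beta in betas.items():
    print(company, describe_beta(beta))
```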

13.50 (a) (% daily change in DXRLX) = b0 + 2.5 (% daily change in Russell 2000 Index)
(b) If the Russell 2000 Index gains 10% in a year, the DXRLX is expected to gain an
estimated 25% on average.
(c) If the Russell 2000 Index loses 20% in a year, the DXRLX is expected to lose an
estimated 50% on average.
(d) Risk takers will be attracted to leveraged funds, but risk averse investors will stay
away.
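Parts (b) and (c) of 13.50 scale the index change by the slope of 2.5. A minimal sketch, assuming the intercept b0 is treated as zero, as those two answers implicitly do; the function name `expected_fund_change` is illustrative.

```python
def expected_fund_change(index_change_pct, b1=2.5, b0=0.0):
    """Estimated % change in the fund for a given % change in the index.

    b0 = 0.0 is an assumption: the solution does not give b0, and its
    answers to (b) and (c) ignore it.
    """
    return b0 + b1 * index_change_pct

print(expected_fund_change(10))   # index gains 10%
print(expected_fund_change(-20))  # index loses 20%
```

The same multiplier applies to gains and losses, which is the reason given in part (d) for why such funds appeal to risk takers.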

13.58 (a) YOU ARE ONLY RESPONSIBLE FOR THE APPROXIMATE INTERVAL:
DISREGARD THE h_i INFORMATION
(b0 + b1X) ± t10,.025 (sYX/√n) = (145 + 7.4(8)) ± 2.2281(30.81/√12)
= 204.2 ± 2.2281(30.81/√12) = 184.38 to 224.02.

(b) YOU ARE ONLY RESPONSIBLE FOR THE APPROXIMATE INTERVAL:
DISREGARD THE h_i INFORMATION
(b0 + b1X) ± t10,.025 sYX = (145 + 7.4(8)) ± 2.2281(30.81)
= 204.2 ± 2.2281(30.81) = 135.55 to 272.85

(c) Part (b) provides a prediction interval for the individual response given a specific
value of the independent variable, and part (a) provides an interval estimate for
the mean value, given a specific value of the independent variable. Because there
is much more variation in predicting an individual value than in estimating a
mean value, a prediction interval is wider than a confidence interval estimate.
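The difference in width described in 13.58 (c) is visible when both approximate intervals are computed side by side. This sketch uses only the approximate formulas from the solution (no h_i term); the function name `approximate_intervals` is illustrative, and the critical value is taken from the solution.

```python
import math

def approximate_intervals(y_hat, s_yx, n, t_crit):
    """Approximate interval for the mean response and for an individual response.

    Hypothetical helper using the simplified formulas from the solution:
    mean response:       y_hat +/- t * s_yx / sqrt(n)
    individual response: y_hat +/- t * s_yx
    """
    ci_half = t_crit * s_yx / math.sqrt(n)
    pi_half = t_crit * s_yx
    mean_interval = (y_hat - ci_half, y_hat + ci_half)
    individual_interval = (y_hat - pi_half, y_hat + pi_half)
    return mean_interval, individual_interval

y_hat = 145 + 7.4 * 8  # predicted weekly sales at X = 8 feet of shelf space
mean_interval, individual_interval = approximate_intervals(
    y_hat, s_yx=30.81, n=12, t_crit=2.2281
)
print([round(v, 2) for v in mean_interval])
print([round(v, 2) for v in individual_interval])
```

In these approximate formulas the individual-response interval is wider by a factor of √n, which here is √12, or about 3.46.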
13.74
Regression Statistics
Multiple R            0.985774
R Square              0.971751
Adjusted R Square     0.970182
Standard Error        1.986503
Observations          20

ANOVA
              df    SS          MS          F           Significance F
Regression    1     2443.466    2443.466    619.1956    2.152E-15
Residual      18    71.03149    3.946194
Total         19    2514.498

                Coefficients   Standard Error   t Stat      P-value
Intercept       24.83453       1.054219         23.55729    5.61E-15
X Variable 1    0.140026       0.005627         24.88364    2.15E-15

(a) b0 = 24.84, b1 = 0.14

(b) 24.84 is the portion of estimated mean delivery time that is not affected by the
number of cases delivered. For each additional case, the estimated mean delivery
time increases by 0.14 minutes.
(c) Ŷ = 24.84 + 0.14X = 24.84 + 0.14(150) = 45.84

(d) No, 500 cases is outside the relevant range of the data used to fit the regression
equation.

(e) r2 = 0.972. So, 97.2% of the variation in delivery time can be explained by the
variation in the number of cases.

(g) H0: β1 = 0
H1: β1 ≠ 0
\( t_{STAT} = \frac{0.140026 - 0}{0.005627} = 24.88 > 2.1009 = t_{18,.025} \), so reject H0.

(h) YOU ARE ONLY RESPONSIBLE FOR THE APPROXIMATE INTERVAL:
(b0 + b1X) ± t18,.025 (sYX/√n) = (24.84 + 0.14(150)) ± 2.1009(1.987/√20) = 44.91 to 46.77.

YOU ARE ONLY RESPONSIBLE FOR THE APPROXIMATE INTERVAL:
(b0 + b1X) ± t18,.025 sYX = (24.84 + 0.14(150)) ± 2.1009(1.987) = 41.67 to 50.01.

13.75

Regression Statistics
Multiple R            0.946795
R Square              0.896421
Adjusted R Square     0.875705
Standard Error        291.7399
Observations          7

ANOVA
              df    SS          MS          F           Significance F
Regression    1     3683011     3683011     43.27244    0.001219
Residual      5     425560.8    85112.15
Total         6     4108571

                Coefficients   Standard Error   t Stat      P-value
Intercept       3430.943       761.2418         4.507034    0.006358
X Variable 1    0.758937       0.115372         6.578179    0.001219

(a) b0 = 3430.9428, b1 = 0.7589

(b) The intercept b0 represents fixed cost.

(c) The slope b1 represents the variable cost per patient-day.

(d) Ŷ = 3430.9428 + 0.7589X = 3430.9428 + 0.7589(7500) = $9122.967
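The figure in 13.75 (d) does not match what the four-decimal slope 0.7589 gives by hand, which suggests it was computed from the unrounded Excel coefficients. The sketch below compares the two; this explanation is an inference from the regression output, and the function name `total_cost` is illustrative.

```python
def total_cost(patient_days, b0, b1):
    """Predicted cost: fixed cost b0 plus variable cost b1 per patient-day."""
    return b0 + b1 * patient_days

# Coefficients as written in part (d), rounded to four decimals.
rounded = total_cost(7500, b0=3430.9428, b1=0.7589)
# Coefficients as shown in the Excel output table.
unrounded = total_cost(7500, b0=3430.943, b1=0.758937)

print(round(rounded, 2), round(unrounded, 2))
```

With 7,500 patient-days, rounding the slope in the fifth decimal place shifts the prediction by roughly 28 cents, so predictions are best computed from the full-precision coefficients.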


CHAPTER 14 SOLUTIONS

14.3 (a) Ŷ = -0.02686 + 0.79116X1 + 0.60484X2


(b) For a given measurement of the change in impact properties over time, each
increase of one unit in forefoot impact absorbing capability is estimated to result
in a mean increase in the long-term ability to absorb shock of 0.79116 units. For
a given forefoot impact absorbing capability, each increase of one unit in
measurement of the change in impact properties over time is estimated to result
in a mean increase in the long-term ability to absorb shock of 0.60484 units.

14.6
Regression Statistics
Multiple R            0.899273
R Square              0.808692
Adjusted R Square     0.788555
Standard Error        158.9041
Observations          22

ANOVA
              df    SS          MS          F           Significance F
Regression    2     2028033     1014016     40.15823    1.5E-07
Residual      19    479759.9    25250.52
Total         21    2507793

                Coefficients   Standard Error   t Stat      P-value
Intercept       156.4304       126.7579         1.234089    0.232217
X Variable 1    13.08068       1.759374         7.434851    4.89E-07
X Variable 2    16.79528       2.963378         5.667613    1.83E-05

(a) Ŷ = 156.4 + 13.081X1 + 16.795X2


(b) For a given amount of newspaper advertising, each increase of $1000 in radio
advertising is estimated to result in a mean increase in sales of $13,081. For a
given amount of radio advertising, each increase of $1000 in newspaper
advertising is estimated to result in a mean increase in sales of $16,795.
(c) When there is no money spent on radio advertising and newspaper advertising,
the estimated mean amount of sales is $156,430.44.
(d) According to the results of (b), newspaper advertising is more effective as each
increase of $1000 in newspaper advertising will result in a higher mean increase
in sales than the same amount of increase in radio advertising.

14.10 (a) MSR = SSR/k = 30/2 = 15
MSE = SSE/(n − k − 1) = 120/10 = 12
(b) FSTAT = MSR/MSE = 15/12 = 1.25
(c) FSTAT = 1.25 < Fα = 4.103. Do not reject H0. There is not sufficient
evidence of a significant linear relationship.
(d) \( r^2 = \frac{SSR}{SST} = \frac{30}{150} = 0.2 \)
(e) \[ r_{adj}^2 = 1 - \left[(1 - r_{Y.12}^2)\frac{n - 1}{n - k - 1}\right] = 0.04 \]
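All five quantities in 14.10 follow from SSR, SSE, n, and k, as the sketch below shows. It assumes n = 13, which is implied by n − k − 1 = 10 with k = 2; the function name `anova_summary` and the dictionary keys are illustrative.

```python
def anova_summary(ssr, sse, n, k):
    """Mean squares, F statistic, r-squared, and adjusted r-squared.

    Hypothetical helper for a regression with k independent variables
    fitted to n observations.
    """
    sst = ssr + sse
    msr = ssr / k
    mse = sse / (n - k - 1)
    r2 = ssr / sst
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
    return {"msr": msr, "mse": mse, "f_stat": msr / mse, "r2": r2, "adj_r2": adj_r2}

# SSR = 30 and SSE = 120 are from the solution; n = 13 is inferred.
result = anova_summary(ssr=30, sse=120, n=13, k=2)
for name, value in result.items():
    print(name, round(value, 4))
```

The adjusted value falls well below r² here because the sample is small relative to the number of independent variables.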

14.11 (a) 68% of the total variability in team performance can be explained by team skills
after adjusting for the number of predictors and sample size. 78% of the total
variability in team performance can be explained by clarity in expectations after
adjusting for the number of predictors and sample size. 97% of the total
variability in team performance can be explained by both team skills and clarity
in expectations after adjusting for the number of predictors and sample size.
(b) Model 3 is the best predictor of team performance since it has the highest
adjusted r2.

14.12 (a) H0: β1 = β2 = 0


H1: At least one βi not equal to zero

FSTAT = 97.69 > Fα = 3.89. Reject H0. There is evidence of a significant linear
relationship with at least one of the independent variables.
(b) p-value = virtually zero. The probability of obtaining an F test statistic of 97.69
or larger is virtually zero if H0 is true.
(c) \( r_{Y.12}^2 = SSR / SST = 12.6102 / 13.38473 = 0.9421 \). So, 94.21% of the variation
in the long-term ability to absorb shock can be explained by variation in forefoot
absorbing capability and variation in midsole impact.
(d) \[ r_{adj}^2 = 1 - \left[(1 - r_{Y.12}^2)\frac{n - 1}{n - k - 1}\right] = 1 - \left[(1 - 0.9421)\frac{15 - 1}{15 - 2 - 1}\right] = 0.93245 \]

14.16 (a) H0: β1 = β2 = 0


H1: At least one βi not equal to zero
MSR = SSR/k = 2,028,033/2 = 1,014,016
MSE = SSE/(n − k − 1) = 479,759.9/19 = 25,251
FSTAT = MSR/MSE = 1,014,016/25,251 = 40.16
FSTAT = 40.16 > Fα = 3.522. Reject H0. There is evidence of a significant
linear relationship.
(b) p-value < 0.001. The probability of obtaining an F test statistic of 40.16 or larger
is less than 0.001 if H0 is true.
(c) \( r^2 = SSR / SST = 2{,}028{,}033 / 2{,}507{,}793 = 0.8087 \). So, 80.87% of the variation
in sales can be explained by variation in radio advertising and variation in
newspaper advertising.
(d) \[ r_{adj}^2 = 1 - \left[(1 - r_{Y.12}^2)\frac{n - 1}{n - k - 1}\right] = 1 - \left[(1 - 0.8087)\frac{22 - 1}{22 - 2 - 1}\right] = 0.7886 \]

14.25 (a) 95% confidence interval on β1: b1 ± tn−k−1 Sb1, 0.79116 ± 2.1788(0.06295)
0.65400 ≤ β1 ≤ 0.92832

(b) H0: β1=0


H1: β1≠0

For X1: tSTAT = b1/Sb1 = 0.79116/0.06295 = 12.57 > t12 = 2.1788 with 12
degrees of freedom for α = 0.05. Reject H0. There is evidence that the variable X1
contributes to a model already containing X2.

H0: β2=0
H1: β2≠0

For X2: tSTAT = b2/Sb2 = 0.60484/0.07174 = 8.43 > t12 = 2.1788 with 12
degrees of freedom for α = 0.05. Reject H0. There is evidence that the variable X2
contributes to a model already containing X1.
Both variables X1 and X2 should be included in the model.

14.28 (a) 95% confidence interval on β1: b1 ± tn−k−1 Sb1, 13.0807 ± 2.093(1.7594)
9.398 ≤ β1 ≤ 16.763

(b) H0: β1=0


H1: β1≠0

For X1: tSTAT = b1/Sb1 = 13.0807/1.7594 = 7.43 > t19 = 2.093 with 19 degrees
of freedom for α = 0.05. Reject H0. There is evidence that the variable X1
contributes to a model already containing X2.

H0: β2=0
H1: β2≠0

For X2: tSTAT = b2/Sb2 = 16.7953/2.9634 = 5.67 > t19 = 2.093 with 19 degrees
of freedom for α = 0.05. Reject H0. There is evidence that the variable X2
contributes to a model already containing X1.
Both variables X1 and X2 should be included in the model.

14.41
Regression Statistics
Multiple R            0.929
R Square              0.864
Adjusted R Square     0.834
Standard Error        21.318
Observations          12

ANOVA
              df    SS          MS          F        Significance F
Regression    2     25935.00    12967.50    28.53    0.00
Residual      9     4090.00     454.44
Total         11    30025.00

                Coefficients   Standard Error   t Stat   P-value
Intercept       130.00         15.69            8.29     0.00
X Variable 1    7.40           1.10             6.72     0.00
X Variable 2    45.00          13.05            3.45     0.01

(a) Ŷ = 130 + 7.4X1 + 45X2, where X1 = shelf space and X2 = aisle location (1 = front).
(b) Holding constant the effect of aisle location, for each additional foot of shelf space, sales
are estimated to increase by a mean of $7.40. For a given amount of shelf space,
a front-of-aisle location is estimated to increase sales by a mean of $45.
(c) YOU ARE ONLY RESPONSIBLE FOR THE APPROXIMATE INTERVALS:
(b0 + b1X1 + b2X2) ± t9,.025 (sYX/√n) = (130 + 7.4(8) + 45(0)) ± 2.2622(21.3/√12)
= 189.2 ± 2.2622(21.3/√12) = 175 to 203.
(b0 + b1X1 + b2X2) ± t9,.025 sYX = (130 + 7.4(8) + 45(0)) ± 2.2622(21.3)
= 189.2 ± 2.2622(21.3) = 141 to 237.
(e) H0: β1= β2=0
H1: At least one βj≠0
FSTAT = 28.53 > Fα = 4.26. Reject H0. There is evidence of a relationship
between sales and the two independent variables.

(f) H0: β1=0


H1: β1≠0
For X1: tSTAT = 6.72 > tα/2 = 2.2622. Reject H0. Shelf space makes a significant
contribution and should be included in the model.
H0: β2=0
H1: β2≠0
For X2: tSTAT = 3.45 > tα/2 = 2.2622. Reject H0. Aisle location makes a significant
contribution and should be included in the model.
Both variables should be kept in the model.
(g) b1 ± t9,.025 Sb1 = 7.4 ± 2.2622(1.1) = 4.9 to 9.9
b2 ± t9,.025 Sb2 = 45 ± 2.2622(13.05) = 15.5 to 74.5

(h) The slope here takes into account the effect of the other predictor variable, placement,
while the solution for Problem 13.4 did not.
(i) r²Y.12 = 0.864. So, 86.4% of the variation in sales can be explained by variation in shelf
space and variation in aisle location.



14.49
Regression Statistics
Multiple R            0.886397
R Square              0.785699
Adjusted R Square     0.760972
Standard Error        9.634874
Observations          30

ANOVA
              df    SS          MS          F           Significance F
Regression    3     8849.066    2949.689    31.77489    7.53E-09
Residual      26    2413.601    92.8308
Total         29    11262.67

                Coefficients   Standard Error   t Stat      P-value
Intercept       -63.9813       16.79967         -3.80849    0.000769
X Variable 1    1.125782       0.158856         7.086787    1.59E-07
X Variable 2    -22.2887       4.31543          -5.16488    2.18E-05
X Variable 3    8.088047       4.310281         1.876455    0.071861

(a) Ŷ = -63.9813 + 1.1258X1 - 22.2887X2 + 8.0880X3
where X1 = proficiency exam, X2 = traditional method dummy, X3 = CD-ROM-based dummy

(b) Holding constant the effect of training method, for each point increase in
proficiency exam score, the end-of-training exam score is estimated to increase
by a mean of 1.1258 points. For a given proficiency exam score, a trainee who has
been trained by the traditional method will have an estimated mean end-of-training
exam score that is 22.2887 points below that of a trainee who has been trained
using the web-based method. For a given proficiency exam score, a trainee who has
been trained by the CD-ROM-based method will have an estimated mean
end-of-training exam score that is 8.0880 points above that of a trainee who has
been trained using the web-based method.

(c) Ŷ = -63.9813 + 1.1258(100) = 48.5969
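Part (c) sets both dummy variables to zero, which corresponds to the web-based method as the baseline. The sketch below shows how the same equation produces a prediction for each training method. It uses the unrounded coefficients from the Excel output; the function name `predicted_score` and the method labels are illustrative.

```python
# Coefficients from the Excel output for 14.49 (unrounded where available).
COEF = {
    "intercept": -63.9813,
    "proficiency": 1.125782,
    "traditional": -22.2887,
    "cd_rom": 8.088047,
}

def predicted_score(proficiency, method):
    """Predicted end-of-training score; method is 'web', 'traditional', or 'cd_rom'.

    The web-based method is the baseline, so both dummies are 0 for it.
    """
    traditional = 1 if method == "traditional" else 0
    cd_rom = 1 if method == "cd_rom" else 0
    return (
        COEF["intercept"]
        + COEF["proficiency"] * proficiency
        + COEF["traditional"] * traditional
        + COEF["cd_rom"] * cd_rom
    )

for method in ("web", "traditional", "cd_rom"):
    print(method, round(predicted_score(100, method), 4))
```

Each dummy coefficient shifts the intercept for its group and leaves the slope on the proficiency score unchanged.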

(e) H0: β1= β2= β3=0


H1: At least one βj≠0

FSTAT = 31.77 with 3 and 26 degrees of freedom. The p-value is virtually 0.


Reject H0 at the 5% level of significance. There is evidence of a relationship between
end-of-training exam score and the independent variables.


(f) H0: β1=0


H1: β1≠0

For X1: tSTAT = 7.0868 and the p-value is virtually 0. Reject H0. Proficiency exam
score makes a significant contribution and should be included in the model.

H0: β2=0
H1: β2≠0

For X2: tSTAT = -5.1649 and the p-value is virtually 0. Reject H0. The traditional
method dummy makes a significant contribution and should be included in the
model.

H0: β3=0
H1: β3≠0

For X3: tSTAT = 1.8765 and the p-value = 0.07186. Do not reject H0. There is not
sufficient evidence to conclude that there is a difference in the CD-ROM based
method and the web-based method on the mean end-of-training exam scores.
Based on the above result, the regression model should use the proficiency exam
score and the traditional dummy variable.

(g) b1 ± t26,.025 Sb1 = 1.126 ± 2.0555(0.159) = 0.799 to 1.45

(h) b2 ± t26,.025 Sb2 = -22.29 ± 2.0555(4.315) = -31.16 to -13.42
b3 ± t26,.025 Sb3 = 8.09 ± 2.0555(4.31) = -0.769 to 16.95
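The three intervals in (g) and (h) can be tied back to the t tests in (f) by checking which of them contain zero. A sketch under the solution's rounded inputs, with the critical value 2.0555 taken from the solution; the helper names `interval` and `contains_zero` are illustrative.

```python
T_CRIT = 2.0555  # t critical value for 26 degrees of freedom, from the solution

# (coefficient, standard error) pairs as rounded in the solution.
coefficients = {
    "proficiency exam": (1.126, 0.159),
    "traditional dummy": (-22.29, 4.315),
    "CD-ROM dummy": (8.09, 4.31),
}

def interval(b, s_b, t_crit=T_CRIT):
    """95% confidence interval b +/- t * s_b for one regression coefficient."""
    half_width = t_crit * s_b
    return b - half_width, b + half_width

def contains_zero(bounds):
    """True when zero lies inside the interval."""
    lower, upper = bounds
    return lower <= 0 <= upper

for name, (b, s_b) in coefficients.items():
    bounds = interval(b, s_b)
    print(name, [round(v, 3) for v in bounds], "contains zero:", contains_zero(bounds))
```

Only the CD-ROM dummy's interval includes zero, which matches the one test in (f) that does not reject H0.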

(i) r²adj = 0.7610

(k) The slope of end-of-training exam score with proficiency score is the same
regardless of the training method.
